Operators that produce a shuffle in Spark

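| Purpose | Operator | Replaceable? If so, by what |
| --- | --- | --- |
| Deduplication | distinct() | No |
| Aggregation | reduceByKey() | groupByKey |
| | groupBy() | |
| | groupByKey() | reduceByKey |
| | aggregateByKey() | |
| | combineByKey() | |
| Sorting | sortByKey() | |
| | sortBy() | |
| Repartitioning | coalesce() | |
| | repartition() | |
| Set or table operations | intersection() | |
| | subtract() | |
| | subtractByKey() | |
| | join() | |
| | leftOuterJoin() | |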
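One way to confirm that an operator from the table introduces a shuffle is to walk the RDD lineage and look for a `ShuffleDependency`. The following is a minimal sketch, assuming a local Spark setup; the helper name `hasShuffle`, the app name, and the sample data are illustrative and not part of the original post.

```scala
import org.apache.spark.{ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Local context for illustration only (master and app name are assumptions).
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("shuffle-check"))

// Hypothetical helper: true if any RDD in the lineage depends on a shuffle.
def hasShuffle(rdd: RDD[_]): Boolean =
  rdd.dependencies.exists {
    case _: ShuffleDependency[_, _, _] => true
    case dep                           => hasShuffle(dep.rdd)
  }

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)), numSlices = 2)

val mappedHasShuffle  = hasShuffle(pairs.mapValues(_ + 1))   // narrow dependency only
val reducedHasShuffle = hasShuffle(pairs.reduceByKey(_ + _)) // goes through a shuffle

println(s"mapValues shuffles: $mappedHasShuffle")    // false
println(s"reduceByKey shuffles: $reducedHasShuffle") // true

sc.stop()
```

`rdd.toDebugString` prints the same lineage in text form, which can be easier to read for long pipelines.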

https://www.cnblogs.com/Alex-zqzy/p/9949117.html

Deduplication

def distinct(): RDD[T]

def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T]
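`distinct` shuffles because equal elements from different partitions have to meet in the same partition before duplicates can be dropped. In the general case it behaves like a `reduceByKey` over `(element, null)` pairs. The sketch below compares the two forms on made-up sample data, assuming a local Spark context.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only (master and app name are assumptions).
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("distinct-demo"))

val nums = sc.parallelize(Seq(1, 2, 2, 3, 3, 3), numSlices = 3)

// Built-in deduplication into 2 output partitions.
val viaDistinct = nums.distinct(2)

// Equivalent formulation: key by the element, keep one value per key.
val viaReduce = nums.map(x => (x, null)).reduceByKey((x, _) => x, 2).map(_._1)

val distinctValues = viaDistinct.collect().sorted // Array(1, 2, 3)
val reducedValues  = viaReduce.collect().sorted   // Array(1, 2, 3)
val outPartitions  = viaDistinct.getNumPartitions // 2

sc.stop()
```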

Aggregation

def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]

def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)]

def groupBy[K](f: T => K, p: Partitioner)(implicit kt: ClassTag[K], ord: Ordering[K] = null): RDD[(K, Iterable[T])]

def groupByKey(partitioner: Partitioner):RDD[(K, Iterable[V])]

def aggregateByKey[U: ClassTag](zeroValue: U, partitioner: Partitioner)(seqOp: (U, V) => U, combOp: (U, U) => U): RDD[(K, U)]

def aggregateByKey[U: ClassTag](zeroValue: U, numPartitions: Int)(seqOp: (U, V) => U, combOp: (U, U) => U): RDD[(K, U)]

def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C): RDD[(K, C)]

def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C, numPartitions: Int): RDD[(K, C)]

def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C, partitioner: Partitioner, mapSideCombine: Boolean = true, serializer: Serializer = null): RDD[(K, C)]
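The aggregation operators above can often express the same computation. The sketch below, on made-up sample data and assuming a local Spark context, computes a per-key sum with `reduceByKey` and with `groupByKey`, and a per-key average with `aggregateByKey` and with `combineByKey`. The results match, but `reduceByKey` combines values within each partition before the shuffle, while `groupByKey` sends every value across the network, so `reduceByKey` is usually the cheaper choice for reductions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only (master and app name are assumptions).
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("aggregation-demo"))

val scores = sc.parallelize(
  Seq(("a", 1), ("b", 4), ("a", 3), ("b", 6), ("c", 5)), numSlices = 2)

// Per-key sum, two ways.
val sumReduce = scores.reduceByKey(_ + _).collectAsMap().toMap
val sumGroup  = scores.groupByKey().mapValues(_.sum).collectAsMap().toMap

// Per-key (sum, count) with aggregateByKey: a zero value plus two merge functions.
val zero = (0, 0)
val sumCountAgg = scores.aggregateByKey(zero)(
  (acc, v) => (acc._1 + v, acc._2 + 1),  // fold a value into the partition-local accumulator
  (x, y) => (x._1 + y._1, x._2 + y._2))  // merge accumulators from different partitions

// The same with combineByKey: the first value of each key creates the accumulator.
val sumCountComb = scores.combineByKey(
  (v: Int) => (v, 1),
  (acc: (Int, Int), v: Int) => (acc._1 + v, acc._2 + 1),
  (x: (Int, Int), y: (Int, Int)) => (x._1 + y._1, x._2 + y._2))

val avgAgg  = sumCountAgg.mapValues { case (s, n) => s.toDouble / n }.collectAsMap().toMap
val avgComb = sumCountComb.mapValues { case (s, n) => s.toDouble / n }.collectAsMap().toMap

println(sumReduce) // sums: a -> 4, b -> 10, c -> 5
println(avgAgg)    // averages: a -> 2.0, b -> 5.0, c -> 5.0

sc.stop()
```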

Sorting

def sortByKey(ascending: Boolean = true, numPartitions: Int = self.partitions.length): RDD[(K, V)]

def sortBy[K](f: (T) => K, ascending: Boolean = true, numPartitions: Int = this.partitions.length)(implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T]
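Both sorting operators range-partition the data so that partition order matches key order, which is why they shuffle. `sortBy(f)` keys each element by `f` and then delegates to `sortByKey`. A minimal sketch on made-up sample data, assuming a local Spark context:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only (master and app name are assumptions).
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("sort-demo"))

val pairs = sc.parallelize(Seq((3, "c"), (1, "a"), (2, "b")), numSlices = 2)
val words = sc.parallelize(Seq("spark", "rdd", "shuffle"), numSlices = 2)

// Sort a pair RDD by key, descending.
val byKeyDesc = pairs.sortByKey(ascending = false).collect()
// Array((3,c), (2,b), (1,a))

// Sort arbitrary elements by a derived key (here: string length).
val byLength = words.sortBy(_.length).collect()
// Array(rdd, spark, shuffle)

sc.stop()
```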

Repartitioning

def coalesce(numPartitions: Int, shuffle: Boolean = false, partitionCoalescer: Option[PartitionCoalescer] = Option.empty)(implicit ord: Ordering[T] = null): RDD[T]

def repartition(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T]
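Note the default `shuffle = false` in the `coalesce` signature: with that default, `coalesce` only merges existing partitions and does not shuffle, so it cannot increase the partition count. `repartition(n)` is `coalesce(n, shuffle = true)` and always shuffles. The sketch below illustrates the difference through partition counts, assuming a local Spark context and made-up data.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only (master and app name are assumptions).
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("repartition-demo"))

val data = sc.parallelize(1 to 100, numSlices = 8)

// Shrinking without a shuffle: partitions are merged in place.
val shrunk = data.coalesce(2).getNumPartitions                     // 2

// Growing without a shuffle is not possible: the count stays at 8.
val notGrown = data.coalesce(16).getNumPartitions                  // 8

// Growing needs a shuffle, either explicitly or through repartition.
val grownShuffle = data.coalesce(16, shuffle = true).getNumPartitions // 16
val grownRepart  = data.repartition(16).getNumPartitions              // 16

sc.stop()
```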

Set or table operations

def intersection(other: RDD[T]): RDD[T]

def intersection(other: RDD[T], partitioner: Partitioner)(implicit ord: Ordering[T] = null): RDD[T]

def intersection(other: RDD[T], numPartitions: Int): RDD[T]

def subtract(other: RDD[T], numPartitions: Int): RDD[T]

def subtract(other: RDD[T], p: Partitioner)(implicit ord: Ordering[T] = null): RDD[T]

def subtractByKey[W: ClassTag](other: RDD[(K, W)]): RDD[(K, V)]

def subtractByKey[W: ClassTag](other: RDD[(K, W)], numPartitions: Int): RDD[(K, V)]

def subtractByKey[W: ClassTag](other: RDD[(K, W)], p: Partitioner): RDD[(K, V)]

def join[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, W))]

def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]

def join[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, W))]

def leftOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (V, Option[W]))]
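These operators shuffle because matching elements or keys from both RDDs must end up in the same partition. The sketch below runs each of them on small made-up datasets, assuming a local Spark context; the `users` and `orders` names are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only (master and app name are assumptions).
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("set-ops-demo"))

val left  = sc.parallelize(Seq(1, 2, 3, 4), numSlices = 2)
val right = sc.parallelize(Seq(3, 4, 5), numSlices = 2)

val common   = left.intersection(right).collect().sorted // Array(3, 4)
val onlyLeft = left.subtract(right).collect().sorted     // Array(1, 2)

val users  = sc.parallelize(Seq((1, "ann"), (2, "bob"), (3, "cat")), numSlices = 2)
val orders = sc.parallelize(Seq((1, "book"), (3, "pen"), (3, "ink")), numSlices = 2)

// Users whose key does not appear in orders.
val withoutOrders = users.subtractByKey(orders).collect().toSet

// Inner join: one output row per matching (user, order) pair.
val inner = users.join(orders).collect().toSet

// Left outer join: every user is kept; a missing order becomes None.
val leftOuter = users.leftOuterJoin(orders).collect().toSet

sc.stop()
```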
