Operators That Cause a Shuffle in Spark
2024-09-07 01:17:33
| Purpose | Operator | Replaceable? By what |
| --- | --- | --- |
| Deduplication | distinct() | No |
| Aggregation | reduceByKey() | groupByKey |
| | groupBy() | |
| | groupByKey() | reduceByKey |
| | aggregateByKey() | |
| | combineByKey() | |
| Sorting | sortByKey() | |
| | sortBy() | |
| Repartitioning | coalesce() | |
| | repartition() | |
| Set or table operations | intersection() | |
| | subtract() | |
| | subtractByKey() | |
| | join() | |
| | leftOuterJoin() | |
https://www.cnblogs.com/Alex-zqzy/p/9949117.html
Deduplication
def distinct(): RDD[T]
def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T]
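To see why `distinct` needs a shuffle, it helps to write it by hand: key every element, then let `reduceByKey` keep one representative per key. The sketch below assumes a local Spark installation; `DistinctSketch` and `distinctByHand` are hypothetical names used only for illustration, and the body is an approximation of the idea rather than Spark's exact implementation in every version.

```scala
import scala.reflect.ClassTag
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object DistinctSketch {
  // Hypothetical helper, not part of Spark: pairs each element with a
  // placeholder value, then reduceByKey (the shuffle step) keeps one pair per key.
  def distinctByHand[T: ClassTag](rdd: RDD[T], numPartitions: Int): RDD[T] =
    rdd.map(x => (x, null)).reduceByKey((x, _) => x, numPartitions).map(_._1)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("distinct-sketch")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(1, 2, 2, 3, 3, 3), numSlices = 3)
      // Both lines print the same deduplicated values once sorted on the driver.
      println(rdd.distinct().collect().sorted.mkString(","))
      println(distinctByHand(rdd, 2).collect().sorted.mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```

Equal elements may start out in different partitions, so they have to be brought together before duplicates can be dropped, and that movement is the shuffle.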
Aggregation
def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]
def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)]
def groupBy[K](f: T => K, p: Partitioner)(implicit kt: ClassTag[K], ord: Ordering[K] = null): RDD[(K, Iterable[T])]
def groupByKey(partitioner: Partitioner): RDD[(K, Iterable[V])]
def aggregateByKey[U: ClassTag](zeroValue: U, partitioner: Partitioner)(seqOp: (U, V) => U, combOp: (U, U) => U): RDD[(K, U)]
def aggregateByKey[U: ClassTag](zeroValue: U, numPartitions: Int)(seqOp: (U, V) => U, combOp: (U, U) => U): RDD[(K, U)]
def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C): RDD[(K, C)]
def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C, numPartitions: Int): RDD[(K, C)]
def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C, partitioner: Partitioner, mapSideCombine: Boolean = true, serializer: Serializer = null): RDD[(K, C)]
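The aggregation operators above can often produce the same result in different ways. The following is a minimal sketch, assuming a local Spark setup; the object and method names (`AggregationSketch`, `sumByReduce`, `sumByGroup`, `avgByAggregate`) are hypothetical. It computes a per-key sum with both `reduceByKey` and `groupByKey`, and a per-key average with `aggregateByKey`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object AggregationSketch {
  // reduceByKey merges values inside each partition before the shuffle.
  def sumByReduce(pairs: RDD[(String, Int)]): Map[String, Int] =
    pairs.reduceByKey(_ + _).collect().toMap

  // groupByKey ships every value across the shuffle, then sums afterwards.
  def sumByGroup(pairs: RDD[(String, Int)]): Map[String, Int] =
    pairs.groupByKey().mapValues(_.sum).collect().toMap

  // aggregateByKey lets the accumulator type (sum, count) differ from the value type.
  def avgByAggregate(pairs: RDD[(String, Int)]): Map[String, Double] =
    pairs
      .aggregateByKey((0, 0))(
        (acc, v) => (acc._1 + v, acc._2 + 1),      // fold a value into a partition-local accumulator
        (a, b) => (a._1 + b._1, a._2 + b._2))      // merge accumulators from different partitions
      .mapValues { case (sum, n) => sum.toDouble / n }
      .collect()
      .toMap

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("aggregation-sketch"))
    try {
      val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)), numSlices = 2)
      println(sumByReduce(pairs))    // a -> 9, b -> 6
      println(sumByGroup(pairs))     // same sums as above
      println(avgByAggregate(pairs)) // a -> 3.0, b -> 3.0
    } finally {
      sc.stop()
    }
  }
}
```

Both sum variants shuffle, but `reduceByKey` typically moves less data because it combines values on the map side first, which is the usual reason to prefer it over `groupByKey` for simple reductions.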
Sorting
def sortByKey(ascending: Boolean = true, numPartitions: Int = self.partitions.length): RDD[(K, V)]
def sortBy[K](f: (T) => K, ascending: Boolean = true, numPartitions: Int = this.partitions.length)(implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T]
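A short usage sketch for the two sorting operators, assuming a local Spark setup; `SortSketch` and its method names are hypothetical. A global sort has to range-partition the data, so every record may move to a different partition.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SortSketch {
  // sortBy orders by an arbitrary key function, here the count, descending.
  def byCountDesc(counts: RDD[(String, Int)]): Array[(String, Int)] =
    counts.sortBy(_._2, ascending = false).collect()

  // sortByKey orders a pair RDD by its key, ascending by default.
  def byWord(counts: RDD[(String, Int)]): Array[(String, Int)] =
    counts.sortByKey().collect()

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("sort-sketch"))
    try {
      val counts = sc.parallelize(Seq(("spark", 3), ("shuffle", 5), ("rdd", 1)), numSlices = 2)
      println(byCountDesc(counts).mkString(", ")) // (shuffle,5), (spark,3), (rdd,1)
      println(byWord(counts).mkString(", "))      // (rdd,1), (shuffle,5), (spark,3)
    } finally {
      sc.stop()
    }
  }
}
```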
Repartitioning
def coalesce(numPartitions: Int, shuffle: Boolean = false, partitionCoalescer: Option[PartitionCoalescer] = Option.empty)(implicit ord: Ordering[T] = null): RDD[T]
def repartition(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T]
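Note the `shuffle: Boolean = false` default in `coalesce`: with the default it only merges existing partitions and does not shuffle, while `repartition` always shuffles. The sketch below, which assumes a local Spark setup, checks this by walking the RDD lineage; `RepartitionSketch` and `hasShuffle` are hypothetical names introduced for illustration.

```scala
import org.apache.spark.{ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RepartitionSketch {
  // Hypothetical helper: returns true if any dependency in the lineage is a ShuffleDependency.
  def hasShuffle(rdd: RDD[_]): Boolean =
    rdd.dependencies.exists {
      case _: ShuffleDependency[_, _, _] => true
      case dep => hasShuffle(dep.rdd)
    }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("repartition-sketch"))
    try {
      val rdd = sc.parallelize(1 to 100, numSlices = 4)
      println(hasShuffle(rdd.coalesce(2)))                 // false: narrow dependency only
      println(hasShuffle(rdd.coalesce(2, shuffle = true))) // true
      println(hasShuffle(rdd.repartition(8)))              // true
      // Without a shuffle, coalesce cannot increase the partition count.
      println(rdd.coalesce(8).getNumPartitions)            // 4
      println(rdd.repartition(8).getNumPartitions)         // 8
    } finally {
      sc.stop()
    }
  }
}
```

So `coalesce` belongs in the list of shuffle operators only when it is called with `shuffle = true`; to reduce the number of partitions cheaply, the default form is usually enough.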
Set or table operations
def intersection(other: RDD[T]): RDD[T]
def intersection(other: RDD[T], partitioner: Partitioner)(implicit ord: Ordering[T] = null): RDD[T]
def intersection(other: RDD[T], numPartitions: Int): RDD[T]
def subtract(other: RDD[T], numPartitions: Int): RDD[T]
def subtract(other: RDD[T], p: Partitioner)(implicit ord: Ordering[T] = null): RDD[T]
def subtractByKey[W: ClassTag](other: RDD[(K, W)]): RDD[(K, V)]
def subtractByKey[W: ClassTag](other: RDD[(K, W)], numPartitions: Int): RDD[(K, V)]
def subtractByKey[W: ClassTag](other: RDD[(K, W)], p: Partitioner): RDD[(K, V)]
def join[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, W))]
def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]
def join[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, W))]
def leftOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (V, Option[W]))]
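The set and join operators above all have to bring matching elements or keys from two RDDs into the same partition. A small usage sketch, assuming a local Spark setup; the object name `SetAndJoinSketch` and the sample data are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SetAndJoinSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("set-and-join-sketch"))
    try {
      val a = sc.parallelize(Seq(1, 2, 3, 4))
      val b = sc.parallelize(Seq(3, 4, 5))
      println(a.intersection(b).collect().sorted.mkString(",")) // 3,4
      println(a.subtract(b).collect().sorted.mkString(","))     // 1,2

      val users = sc.parallelize(Seq((1, "ann"), (2, "bob"), (3, "cid")))
      val orders = sc.parallelize(Seq((1, "book"), (3, "cup")))
      // Inner join keeps only keys present on both sides.
      println(users.join(orders).collect().sortBy(_._1).mkString(", "))
      // Left outer join keeps every user; a missing order becomes None.
      println(users.leftOuterJoin(orders).collect().sortBy(_._1).mkString(", "))
      // subtractByKey keeps users whose key has no match in orders.
      println(users.subtractByKey(orders).collect().mkString(", ")) // (2,bob)
    } finally {
      sc.stop()
    }
  }
}
```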