Spark operators that trigger a shuffle
2024-10-18 23:35:33
Deduplication
def distinct(): RDD[T]
def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T]
Aggregation
def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]
def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)]
def groupBy[K](f: T => K, p: Partitioner)(implicit kt: ClassTag[K], ord: Ordering[K] = null): RDD[(K, Iterable[T])]
def groupByKey(partitioner: Partitioner): RDD[(K, Iterable[V])]
def aggregateByKey[U: ClassTag](zeroValue: U, partitioner: Partitioner)(seqOp: (U, V) => U, combOp: (U, U) => U): RDD[(K, U)]
def aggregateByKey[U: ClassTag](zeroValue: U, numPartitions: Int)(seqOp: (U, V) => U, combOp: (U, U) => U): RDD[(K, U)]
def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C): RDD[(K, C)]
def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C, numPartitions: Int): RDD[(K, C)]
def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C, partitioner: Partitioner, mapSideCombine: Boolean = true, serializer: Serializer = null): RDD[(K, C)]
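As a rough illustration of the aggregation operators above, the following sketch runs reduceByKey and aggregateByKey on a tiny in-memory RDD and prints the lineage, where a ShuffledRDD should appear. The object name, sample data, local[2] master, and partition counts are assumptions made for this example only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example object; not part of Spark.
object AggregationShuffleSketch {
  // Sum of values per key, shuffled into 2 partitions.
  def sumByKey(pairs: RDD[(String, Int)]): RDD[(String, Int)] =
    pairs.reduceByKey(_ + _, 2)

  // (sum, count) per key, built with aggregateByKey's two functions.
  def sumAndCountByKey(pairs: RDD[(String, Int)]): RDD[(String, (Int, Int))] =
    pairs.aggregateByKey((0, 0), 2)(
      (acc, v) => (acc._1 + v, acc._2 + 1),   // seqOp: fold one value into the accumulator
      (x, y) => (x._1 + y._1, x._2 + y._2)    // combOp: merge accumulators across partitions
    )

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("aggregation-shuffle-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4)), 2)
      val sums = sumByKey(pairs)
      println(sums.toDebugString)                            // lineage includes a ShuffledRDD
      println(sums.collect().sorted.mkString(", "))          // (a,4), (b,6)
      println(sumAndCountByKey(pairs).collect().sortBy(_._1).mkString(", ")) // (a,(4,2)), (b,(6,2))
    } finally sc.stop()
  }
}
```

Unlike groupByKey, both operators here combine values on the map side before the shuffle, so less data crosses the network.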
Sorting
def sortByKey(ascending: Boolean = true, numPartitions: Int = self.partitions.length): RDD[(K, V)]
def sortBy[K](f: (T) => K, ascending: Boolean = true, numPartitions: Int = this.partitions.length)(implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T]
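A minimal sketch of the two sort operators above, assuming a small pair RDD; the object name and sample data are made up for illustration. Both calls range-partition the data, which requires a shuffle, and collect() then returns the partitions in global order.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example object; not part of Spark.
object SortShuffleSketch {
  // Ascending sort on the key.
  def byKey(pairs: RDD[(String, Int)]): Array[(String, Int)] =
    pairs.sortByKey().collect()

  // Descending sort on the value, using sortBy with a key-extraction function.
  def byValueDesc(pairs: RDD[(String, Int)]): Array[(String, Int)] =
    pairs.sortBy(_._2, ascending = false).collect()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sort-shuffle-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val pairs = sc.parallelize(Seq(("c", 3), ("a", 1), ("b", 2)), 2)
      println(byKey(pairs).mkString(", "))        // (a,1), (b,2), (c,3)
      println(byValueDesc(pairs).mkString(", "))  // (c,3), (b,2), (a,1)
    } finally sc.stop()
  }
}
```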
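Repartitioning
def coalesce(numPartitions: Int, shuffle: Boolean = false, partitionCoalescer: Option[PartitionCoalescer] = Option.empty)(implicit ord: Ordering[T] = null): RDD[T]
def repartition(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T]
Note: coalesce shuffles only when shuffle = true; repartition(numPartitions) is equivalent to coalesce(numPartitions, shuffle = true).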
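The difference between coalesce and repartition can be sketched as follows. The object name, the helper hasShuffle, and the partition counts are assumptions for this example; hasShuffle simply looks for "ShuffledRDD" in the lineage string, which is a rough heuristic rather than an official API for detecting shuffles.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example object; not part of Spark.
object RepartitionShuffleSketch {
  // Heuristic: a ShuffledRDD somewhere in the printed lineage means a shuffle is planned.
  def hasShuffle(rdd: RDD[_]): Boolean = rdd.toDebugString.contains("ShuffledRDD")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("repartition-shuffle-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 100, 4)

      val narrowed = rdd.coalesce(2)                   // merges partitions, no shuffle
      val notGrown = rdd.coalesce(8)                   // cannot grow without a shuffle
      val grown    = rdd.coalesce(8, shuffle = true)   // shuffle allows more partitions
      val repart   = rdd.repartition(8)                // same as coalesce(8, shuffle = true)

      println(s"${narrowed.getNumPartitions} ${hasShuffle(narrowed)}") // 2 false
      println(s"${notGrown.getNumPartitions} ${hasShuffle(notGrown)}") // 4 false
      println(s"${grown.getNumPartitions} ${hasShuffle(grown)}")       // 8 true
      println(s"${repart.getNumPartitions} ${hasShuffle(repart)}")     // 8 true
    } finally sc.stop()
  }
}
```

When only reducing the partition count, coalesce without a shuffle is usually the cheaper choice; repartition is the one to reach for when the count must grow or the data needs rebalancing.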
Set and table-style operations
def intersection(other: RDD[T]): RDD[T]
def intersection(other: RDD[T], partitioner: Partitioner)(implicit ord: Ordering[T] = null): RDD[T]
def intersection(other: RDD[T], numPartitions: Int): RDD[T]
def subtract(other: RDD[T], numPartitions: Int): RDD[T]
def subtract(other: RDD[T], p: Partitioner)(implicit ord: Ordering[T] = null): RDD[T]
def subtractByKey[W: ClassTag](other: RDD[(K, W)]): RDD[(K, V)]
def subtractByKey[W: ClassTag](other: RDD[(K, W)], numPartitions: Int): RDD[(K, V)]
def subtractByKey[W: ClassTag](other: RDD[(K, W)], p: Partitioner): RDD[(K, V)]
def join[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, W))]
def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]
def join[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, W))]
def leftOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (V, Option[W]))]
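A small sketch of the set and join operators above on in-memory data. The object name, the helper shuffleCount, and the sample values are assumptions for this example. shuffleCount walks the RDD lineage and counts ShuffleDependency edges; it is a simple illustration and may count a shared parent more than once in branching lineages.

```scala
import org.apache.spark.{ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example object; not part of Spark.
object SetOpsShuffleSketch {
  // Count shuffle dependencies reachable from this RDD by walking its lineage.
  def shuffleCount(rdd: RDD[_]): Int =
    rdd.dependencies.map {
      case s: ShuffleDependency[_, _, _] => 1 + shuffleCount(s.rdd)
      case d                             => shuffleCount(d.rdd)
    }.sum

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("set-ops-shuffle-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val left  = sc.parallelize(Seq(1, 2, 3, 4), 2)
      val right = sc.parallelize(Seq(3, 4, 5), 2)
      println(left.intersection(right).collect().sorted.mkString(", ")) // 3, 4
      println(left.subtract(right).collect().sorted.mkString(", "))     // 1, 2

      val users  = sc.parallelize(Seq(("a", 1), ("b", 2)), 2)
      val labels = sc.parallelize(Seq(("a", "x"), ("c", "y")), 2)
      val joined = users.join(labels)
      println(joined.collect().mkString(", "))                          // (a,(1,x))
      println(users.leftOuterJoin(labels).collect().sortBy(_._1).mkString(", "))
      // (a,(1,Some(x))), (b,(2,None))

      // Neither input has a partitioner, so the join has to shuffle.
      println(shuffleCount(joined) > 0)                                 // true
    } finally sc.stop()
  }
}
```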