Deduplication

def distinct(): RDD[T]
def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T]
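
As a rough illustration of the two overloads above, the sketch below deduplicates a small collection in local mode. The object name `DistinctSketch`, the sample data, and the `local[2]` master are assumptions made for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// DistinctSketch is a hypothetical name used only for this example.
object DistinctSketch {
  // Returns the deduplicated elements (sorted for a stable result)
  // together with the number of partitions of the deduplicated RDD.
  def run(sc: SparkContext): (Seq[Int], Int) = {
    val rdd = sc.parallelize(Seq(1, 2, 2, 3, 3, 3), numSlices = 3)
    // Equal elements must end up in the same partition, so this step shuffles.
    val deduped = rdd.distinct(2)
    (deduped.collect().sorted.toSeq, deduped.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("distinct-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (values, parts) = run(sc)
      println(s"values=${values.mkString(",")} partitions=$parts")
    } finally sc.stop()
  }
}
```

Calling `distinct()` without an argument keeps the partition count of the parent RDD; passing `numPartitions` sets the partition count of the result.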

Aggregation

def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]
def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)]
def groupBy[K](f: T => K, p: Partitioner)(implicit kt: ClassTag[K], ord: Ordering[K] = null): RDD[(K, Iterable[T])]
def groupByKey(partitioner: Partitioner):RDD[(K, Iterable[V])]
def aggregateByKey[U: ClassTag](zeroValue: U, partitioner: Partitioner)(seqOp: (U, V) => U, combOp: (U, U) => U): RDD[(K, U)]
def aggregateByKey[U: ClassTag](zeroValue: U, numPartitions: Int)(seqOp: (U, V) => U, combOp: (U, U) => U): RDD[(K, U)]
def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C): RDD[(K, C)]
def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C, numPartitions: Int): RDD[(K, C)]
def combineByKey[C](createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C, partitioner: Partitioner, mapSideCombine: Boolean = true, serializer: Serializer = null): RDD[(K, C)]
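
To make the aggregation signatures above more concrete, here is a minimal sketch that computes per-key sums with `reduceByKey` and per-key averages with both `aggregateByKey` and `combineByKey`. The object name `AggregationSketch`, the sample data, and the choice of two partitions are assumptions for illustration only.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// AggregationSketch is a hypothetical name used only for this example.
object AggregationSketch {
  def run(sc: SparkContext): (Map[String, Int], Map[String, Double], Map[String, Double]) = {
    val scores = sc.parallelize(Seq(("a", 1), ("b", 4), ("a", 3), ("b", 6), ("a", 5)))

    // reduceByKey: merge values of the same key into 2 output partitions.
    val sums = scores.reduceByKey(_ + _, 2)

    // aggregateByKey: zero value and partitioner in the first parameter list,
    // seqOp (within a partition) and combOp (across partitions) in the second.
    val sumAndCount = scores.aggregateByKey((0, 0), new HashPartitioner(2))(
      (acc, v) => (acc._1 + v, acc._2 + 1),
      (x, y) => (x._1 + y._1, x._2 + y._2))
    val avgByAggregate = sumAndCount.mapValues { case (sum, n) => sum.toDouble / n }

    // combineByKey: the same (sum, count) accumulator, built from three functions.
    val avgByCombine = scores.combineByKey(
      (v: Int) => (v, 1),
      (acc: (Int, Int), v: Int) => (acc._1 + v, acc._2 + 1),
      (x: (Int, Int), y: (Int, Int)) => (x._1 + y._1, x._2 + y._2)
    ).mapValues { case (sum, n) => sum.toDouble / n }

    // Expected values: sums a=9, b=10; averages a=3.0, b=5.0.
    (sums.collect().toMap, avgByAggregate.collect().toMap, avgByCombine.collect().toMap)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("aggregation-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(run(sc)) finally sc.stop()
  }
}
```

All three operators combine values on the map side before the shuffle, which is the usual reason to prefer them over `groupByKey` when only an aggregate is needed.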

Sorting

def sortByKey(ascending: Boolean = true, numPartitions: Int = self.partitions.length): RDD[(K, V)]
def sortBy[K](f: (T) => K, ascending: Boolean = true, numPartitions: Int = this.partitions.length)(implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T]
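
The following sketch shows one way the two sorting operators above might be called: `sortByKey` on a pair RDD and `sortBy` with a key function on a plain RDD. `SortSketch` and the sample data are assumptions for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// SortSketch is a hypothetical name used only for this example.
object SortSketch {
  def run(sc: SparkContext): (Seq[(Int, String)], Seq[String]) = {
    // sortByKey is available on RDD[(K, V)] when an Ordering[K] is in scope.
    val pairs = sc.parallelize(Seq((3, "c"), (1, "a"), (2, "b")))
    val byKeyDescending = pairs.sortByKey(ascending = false).collect().toSeq

    // sortBy works on any RDD[T]; here the sort key is the word length.
    val words = sc.parallelize(Seq("pear", "fig", "banana"))
    val byLength = words.sortBy(_.length).collect().toSeq

    (byKeyDescending, byLength)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sort-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (byKey, byLength) = run(sc)
      println(byKey.mkString(", "))    // (3,c), (2,b), (1,a)
      println(byLength.mkString(", ")) // fig, pear, banana
    } finally sc.stop()
  }
}
```

Both operators range-partition the data, so `collect()` returns the elements in global sorted order.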

Repartitioning

def coalesce(numPartitions: Int, shuffle: Boolean = false, partitionCoalescer: Option[PartitionCoalescer] = Option.empty)(implicit ord: Ordering[T] = null): RDD[T]
def repartition(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T]
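
A small sketch of how the partition count responds to the two operators above. `RepartitionSketch`, the input range, and the partition counts are assumptions chosen for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// RepartitionSketch is a hypothetical name used only for this example.
object RepartitionSketch {
  // Returns the partition counts after three different calls on an 8-partition RDD.
  def run(sc: SparkContext): (Int, Int, Int) = {
    val rdd = sc.parallelize(1 to 100, numSlices = 8)

    // shuffle defaults to false: partitions are merged without a shuffle.
    val narrowed = rdd.coalesce(2)

    // Without a shuffle the partition count cannot grow, so it stays at 8.
    val notGrown = rdd.coalesce(16)

    // repartition always shuffles, so it can increase the partition count.
    val widened = rdd.repartition(16)

    (narrowed.getNumPartitions, notGrown.getNumPartitions, widened.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("repartition-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(run(sc)) // (2,8,16)
    finally sc.stop()
  }
}
```

Of the two, only `coalesce` with `shuffle = true` and `repartition` move data across the cluster; plain `coalesce` is the cheaper choice when only reducing the number of partitions.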

Set and table operations

def intersection(other: RDD[T]): RDD[T]
def intersection(other: RDD[T], partitioner: Partitioner)(implicit ord: Ordering[T] = null): RDD[T]
def intersection(other: RDD[T], numPartitions: Int): RDD[T]
def subtract(other: RDD[T], numPartitions: Int): RDD[T]
def subtract(other: RDD[T], p: Partitioner)(implicit ord: Ordering[T] = null): RDD[T]
def subtractByKey[W: ClassTag](other: RDD[(K, W)]): RDD[(K, V)]
def subtractByKey[W: ClassTag](other: RDD[(K, W)], numPartitions: Int): RDD[(K, V)]
def subtractByKey[W: ClassTag](other: RDD[(K, W)], p: Partitioner): RDD[(K, V)]
def join[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, W))]
def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]
def join[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, W))]
def leftOuterJoin[W](other: RDD[(K, W)]): RDD[(K, (V, Option[W]))]
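
The sketch below exercises the set-style operators (`intersection`, `subtract`, `subtractByKey`) and the table-style joins listed above on tiny inputs. The names `SetOpsSketch` and `Result`, and the `users`/`orders` data, are assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// SetOpsSketch and Result are hypothetical names used only for this example.
object SetOpsSketch {
  case class Result(
      common: Seq[Int],
      onlyLeft: Seq[Int],
      withoutOrders: Seq[(Int, String)],
      joined: Seq[(Int, (String, String))],
      leftJoined: Seq[(Int, (String, Option[String]))])

  def run(sc: SparkContext): Result = {
    val left = sc.parallelize(Seq(1, 2, 3, 4))
    val right = sc.parallelize(Seq(3, 4, 5))

    val users = sc.parallelize(Seq((1, "ann"), (2, "bob"), (3, "cy")))
    val orders = sc.parallelize(Seq((1, "book"), (3, "pen")))

    Result(
      // Elements present in both RDDs (the result is also deduplicated).
      common = left.intersection(right).collect().sorted.toSeq,
      // Elements of left that do not appear in right.
      onlyLeft = left.subtract(right).collect().sorted.toSeq,
      // Pairs of users whose key has no match in orders.
      withoutOrders = users.subtractByKey(orders).collect().toSeq,
      // Inner join: only keys present on both sides.
      joined = users.join(orders).collect().sortBy(_._1).toSeq,
      // Left outer join: every user, with None where no order exists.
      leftJoined = users.leftOuterJoin(orders).collect().sortBy(_._1).toSeq)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("set-ops-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(run(sc)) finally sc.stop()
  }
}
```

Results are sorted after `collect()` only to make them easy to compare; the operators themselves give no ordering guarantee.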
