fold and reduce both aggregate over a collection by applying an operation you specify. The major difference is the starting point of the aggregation: for fold(), you have to specify the starting value, while for reduce() the starting value is the first (or possibly an arbitrary) element in the collection.

A simple example: we can sum the numbers in a collection using either function:
(1 until 10).reduce( (a,b) => a+b ) 
(1 until 10).fold(0)( (a,b) => a+b )
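
The difference in starting point is easiest to see on an empty collection. The following is a minimal sketch using plain Scala collections; the names oneToNine and noNumbers are made up for illustration, and Try and reduceOption are used here only to show the failure without crashing.

```scala
import scala.util.Try

val oneToNine = 1 until 10           // 1, 2, ..., 9 (until excludes 10)
val noNumbers = List.empty[Int]

// On a non-empty collection, both give the same sum.
val sumWithReduce = oneToNine.reduce((a, b) => a + b)   // 45
val sumWithFold   = oneToNine.fold(0)((a, b) => a + b)  // 45

// On an empty collection, fold falls back to its starting value...
val emptyFold = noNumbers.fold(0)((a, b) => a + b)      // 0

// ...while reduce has no element to start from and throws.
val emptyReduce = Try(noNumbers.reduce((a, b) => a + b))

// reduceOption is the non-throwing variant: None for an empty collection.
val emptyReduceOption = noNumbers.reduceOption((a, b) => a + b)
```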

With fold, we want to start at 0 and cumulatively add each element. In this case, the operations passed to fold() and reduce() are identical, but it is helpful to think about fold in the following way: imagine that the two arguments of the operation we pass to fold() are (i) the current accumulated value and (ii) the next value in the collection:

(1 until 10).fold(0)( (accumulated_so_far, next_value) => accumulated_so_far + next_value )

So the result of each call, accumulated_so_far + next_value, is passed to the operation again as the first argument of the next call, and so on until the collection is exhausted.
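
To make the step-by-step accumulation visible, one option is scanLeft, which behaves like a left-to-right fold but keeps every intermediate value. This is only an illustrative sketch on an ordinary (non-parallel) Scala collection; scanLeft is not part of the discussion above, and the names smallRange, runningTotals and finalTotal are made up.

```scala
val smallRange = 1 until 5   // 1, 2, 3, 4

// Each element of the result is the accumulated value after one more step:
// start 0, then 0+1, 1+2, 3+3, 6+4.
val runningTotals = smallRange.scanLeft(0)(
  (accumulated_so_far, next_value) => accumulated_so_far + next_value
)

// fold returns only the last of those accumulated values.
val finalTotal = smallRange.fold(0)(
  (accumulated_so_far, next_value) => accumulated_so_far + next_value
)
```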

In the same way, we can count the number of elements in a collection using fold, by ignoring next_value and adding 1 at each step:

(1 until 10).fold(0)( (accumulated_so_far, next_value) => accumulated_so_far + 1 )
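
As a quick check on a local Scala collection, the counting fold should agree with size regardless of the values stored. The sketch below assumes a sequential collection; repeatedFives is a made-up example list.

```scala
// The operation ignores next_value, so only the number of steps matters.
val countOneToNine = (1 until 10).fold(0)(
  (accumulated_so_far, next_value) => accumulated_so_far + 1
)

// Same count logic on a list whose values are all identical.
val repeatedFives = List(5, 5, 5)
val countFives = repeatedFives.fold(0)(
  (accumulated_so_far, next_value) => accumulated_so_far + 1
)
```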

When it comes to Spark, here’s another thing to keep in mind. For both reduce and fold, you need to make sure your operation is both commutative and associative. For RDDs, reduce and fold are applied to each partition separately, and then the per-partition results are combined using the same operation. With fold, this can get you into trouble: the starting value is used once for every partition (an empty partition simply emits it) and again when the partition results are combined, so the number of partitions can erroneously affect the result of the calculation if you’re not careful about the operation or the starting value. This is what happens with the ( (a,b) => a+1 ) operation from above: combining the per-partition results adds 1 per partition instead of summing the counts, so the answer depends on how the data is partitioned (see http://stackoverflow.com/questions/29150202/pyspark-fold-method-output).
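
The partition effect can be imitated without Spark. The sketch below is an assumption-laden model, not Spark's implementation: partitionedFold is a hypothetical helper that folds each "partition" (here just a nested Scala list) and then folds the per-partition results with the same operation and starting value.

```scala
// Hypothetical stand-in for an RDD fold: fold every partition, then fold the results.
def partitionedFold(partitions: Seq[Seq[Int]], zero: Int)(op: (Int, Int) => Int): Int = {
  val perPartition = partitions.map(p => p.fold(zero)(op))
  perPartition.fold(zero)(op)
}

val data = (1 until 10).toList
val twoPartitions   = data.grouped(5).toList   // List(1..5), List(6..9)
val threePartitions = data.grouped(3).toList   // three lists of three elements

val add     = (a: Int, b: Int) => a + b
val countOp = (a: Int, b: Int) => a + 1

// Addition with starting value 0 is unaffected by partitioning.
val sumTwo   = partitionedFold(twoPartitions, 0)(add)     // 45
val sumThree = partitionedFold(threePartitions, 0)(add)   // 45

// The counting operation ends up counting partitions, not elements.
val countTwo   = partitionedFold(twoPartitions, 0)(countOp)    // 2
val countThree = partitionedFold(threePartitions, 0)(countOp)  // 3

// A starting value that is not neutral for the operation is added once per
// partition and once more in the final combine.
val offTwo   = partitionedFold(twoPartitions, 1)(add)     // 45 + 3 = 48
val offThree = partitionedFold(threePartitions, 1)(add)   // 45 + 4 = 49
```

In this model, the safe combinations are those where the operation is associative and commutative and the starting value is neutral for it (0 for addition); a count is then better expressed by mapping each element to 1 and summing.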
