Partitioner:

Partitioning and combining take place between the Map and Reduce phases. The partitioner groups together the intermediate data that should go to the same reducer, based on the keys. The number of partitions is equal to the number of reducers, so the partitioner divides the map output according to the number of reducers, and the data in a single partition is processed by a single reducer. HashPartitioner is the default partitioner in Hadoop.

A partitioner partitions the key-value pairs of the intermediate map output. It assigns each pair to a partition using a function of the key: a hash function by default, or a user-defined condition that plays the same role. The total number of partitions is the same as the number of reducer tasks for the job. Within each mapper, records with the same key go into the same partition, and because every mapper applies the same function, all values for a key end up at the same reducer.

Partitioning runs locally, on the machine that executes the map task.
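The hash-based assignment described above can be sketched in a few lines of Python. This is only an illustration, not Hadoop code: the function names are made up for the example, and `java_style_hash` is a stand-in string hash (in the style of Java's `String.hashCode`) chosen so that the result is deterministic. The partition formula itself, clearing the sign bit and taking the remainder by the number of reducers, follows what HashPartitioner does.

```python
from collections import defaultdict


def java_style_hash(key: str) -> int:
    """Deterministic 32-bit string hash in the style of Java's String.hashCode.

    Stand-in for the key's hashCode(); an assumption made for this sketch.
    """
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the value as a signed 32-bit integer, as Java would.
    return h - (1 << 32) if h >= (1 << 31) else h


def get_partition(key: str, num_reducers: int) -> int:
    """Clear the sign bit, then take the remainder by the reducer count."""
    return (java_style_hash(key) & 0x7FFFFFFF) % num_reducers


def partition_map_output(pairs, num_reducers):
    """Group one mapper's (key, value) pairs by their target partition."""
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[get_partition(key, num_reducers)].append((key, value))
    return dict(partitions)


# Hypothetical output of a single map task in a word-count job.
map_output = [("apple", 1), ("banana", 1), ("apple", 1), ("cherry", 1)]
partitions = partition_map_output(map_output, num_reducers=3)
```

Both `("apple", 1)` pairs land in the same partition because the partition index depends only on the key and the number of reducers.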

Combiner:

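A combiner is a 'mini-reducer' (semi-reducer) that does part of the reducer's work on the map side before the data is transferred to the reducers. Because less data crosses the network, it can reduce network congestion. For example, in a word-count job the combiner can sum the counts for each word within one map task's output, so that a single (word, partial count) pair is sent instead of many (word, 1) pairs.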
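A minimal Python sketch of the combiner idea for word count follows. It is not Hadoop code; `map_words` and `combine` are hypothetical names, and the input line is invented for the example.

```python
from collections import Counter


def map_words(line: str):
    """Map step: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner step: sum the values per key on the map side."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


map_output = map_words("to be or not to be")  # 6 pairs leave the mapper
combined = combine(map_output)                # 4 pairs are sent to the reducers
# combined == [("be", 2), ("not", 1), ("or", 1), ("to", 2)]
```

This only works because addition is associative and commutative, so the reducer gets the same totals whether or not the partial sums were computed first. Hadoop does not guarantee how many times a combiner runs, so the job must be correct without it.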

Shuffle:

Shuffling transfers the map output partitions to the corresponding reduce tasks. The final output of a map task can contain multiple partitions, and these partitions must go to different reduce tasks. When a map task completes, it notifies the application master, and the application master notifies the corresponding reducers to copy the map output to the reduce machines. Because shuffling can start before the whole map phase has finished, it saves time and lets the job complete sooner.
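The data movement in the shuffle can be sketched as follows. This is a simplified in-memory model, not Hadoop's implementation: the `shuffle` and `group_by_key` functions and the two map task outputs are hypothetical, and the notification through the application master is left out.

```python
from collections import defaultdict


def shuffle(map_outputs, num_reducers):
    """Copy partition r of every finished map task to reducer r."""
    reducer_inputs = {r: [] for r in range(num_reducers)}
    for partitions in map_outputs:  # one dict per finished map task
        for r, pairs in partitions.items():
            reducer_inputs[r].extend(pairs)
    return reducer_inputs


def group_by_key(pairs):
    """Sort the copied pairs and collect the values of each key."""
    grouped = defaultdict(list)
    for key, value in sorted(pairs):
        grouped[key].append(value)
    return dict(grouped)


# Hypothetical partitioned outputs of two map tasks (partition index -> pairs).
map_task_1 = {0: [("apple", 1)], 1: [("banana", 1)]}
map_task_2 = {0: [("apple", 1)], 1: [("cherry", 1)]}

reducer_inputs = shuffle([map_task_1, map_task_2], num_reducers=2)
grouped = {r: group_by_key(pairs) for r, pairs in reducer_inputs.items()}
# grouped[0] == {"apple": [1, 1]}
# grouped[1] == {"banana": [1], "cherry": [1]}
```

Each reducer ends up with exactly one partition index, collected from all map tasks, which is why the number of partitions matches the number of reducers.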

References:

https://www.cnblogs.com/hadoop-dev/p/5910459.html

https://blog.csdn.net/bitcarmanlee/article/details/60137837

http://geekdirt.com/blog/map-reduce-in-detail/

Using a hash function to map intermediate (K, V) pairs:

https://en.wikipedia.org/wiki/Hash_function

https://www.tutorialspoint.com/map_reduce/map_reduce_partitioner.htm

https://data-flair.training/blogs/hadoop-partitioner-tutorial/
