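Introduction

  • The Combiner is a component separate from the Mapper and the Reducer, although it is written by extending the Reducer class.
  • The Combiner runs on the map side: it pre-aggregates each map task's output before that output is sent to the Reducer, which reduces the amount of data that has to be transferred.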
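
To make the effect of this local pre-aggregation concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The class CombinerSketch and the methods mapTask and combine are hypothetical stand-ins invented for illustration; they are not part of the Hadoop API and only mimic what a single map task and its combiner would produce.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerSketch {

    // Hypothetical stand-in for one map task: emits (word, 1) for every word in its input.
    static List<Map.Entry<String, Integer>> mapTask(String input) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : input.split(" ")) {
            out.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return out;
    }

    // Hypothetical stand-in for the combiner: sums the counts per key
    // within the output of ONE map task, before anything reaches a reducer.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput) {
            combined.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = mapTask("a b a c a b");
        Map<String, Integer> combined = combine(mapOutput);
        System.out.println("records without combiner: " + mapOutput.size()); // 6
        System.out.println("records with combiner: " + combined.size());     // 3
        System.out.println(combined);                                        // {a=3, b=2, c=1}
    }
}
```

In this sketch six (word, 1) records shrink to three (word, count) records, which is the kind of saving a combiner is meant to provide before the shuffle.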

WordCount example

WordCountMapper

package com.neve.Combiner;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text outk = new Text();
    // Every word read is emitted with a count of 1
    private IntWritable outv = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // 1. Convert the Text line to a String
        String line = value.toString();
        // 2. Split the line into words
        String[] words = line.split(" ");
        // 3. Emit each word
        for (String word : words) {
            // Put the String into the reusable Text key
            outk.set(word);
            // Write out (word, 1)
            context.write(outk, outv);
        }
    }
}
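
One detail of the mapper worth checking is how line.split(" ") treats repeated spaces. The sketch below is plain Java with no Hadoop dependency; the class SplitSketch and both helper methods are hypothetical names, and splitOnWhitespace is only one possible alternative, not something the original code does.

```java
import java.util.Arrays;

public class SplitSketch {

    // Same tokenization as the mapper: split on a single space character.
    static String[] splitOnSpace(String line) {
        return line.split(" ");
    }

    // Possible alternative (an assumption, not the original code):
    // trim the line, then split on runs of whitespace.
    static String[] splitOnWhitespace(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String line = "hello  world"; // two spaces between the words
        System.out.println(Arrays.toString(splitOnSpace(line)));      // [hello, , world]
        System.out.println(Arrays.toString(splitOnWhitespace(line))); // [hello, world]
    }
}
```

With the single-space split, an empty string would be counted as a word whenever the input contains consecutive spaces. Whether that matters depends on the input data.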

WordCountReducer

package com.neve.Combiner;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable outv = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        outv.set(sum);
        context.write(key, outv);
    }
}

WordCountCombiner

package com.neve.Combiner;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WordCountCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable outv = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        outv.set(sum);
        context.write(key, outv);
    }
}

WordCountDriver

package com.neve.Combiner;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCountDriver {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // 1. Create the configuration
        Configuration configuration = new Configuration();
        // 2. Create the job
        Job job = Job.getInstance(configuration);
        // 3. Associate the driver class
        job.setJarByClass(WordCountDriver.class);
        // 4. Associate the mapper and reducer classes
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // 5. Set the mapper output key and value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // 6. Set the final output key and value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // 7. Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path("F:\\Workplace\\IDEA_Workplace\\hadoopstudy2\\input"));
        FileOutputFormat.setOutputPath(job, new Path("F:\\Workplace\\IDEA_Workplace\\hadoopstudy2\\output"));
        // Set the combiner
        job.setCombinerClass(WordCountCombiner.class);
        // 8. Submit the job and exit with a status that reflects success or failure
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}

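As the code shows, WordCountCombiner is identical to WordCountReducer, so in this example the reducer class can be used directly as the combiner: job.setCombinerClass(WordCountReducer.class). This works here because summing counts gives the same total whether it is done in one pass or in stages.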
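
The qualifier "in this example" matters. The sketch below, in plain Java with no Hadoop dependency, compares a sum with an average to show why a reducer can only double as a combiner when applying it in stages does not change the result. The class CombinerSafetySketch, its methods, and the sample values are hypothetical and chosen purely for illustration.

```java
public class CombinerSafetySketch {

    // Reduce-style sum: the operation used by the word count reducer.
    static int sum(int... values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Reduce-style average: shown only as a counterexample.
    static double average(double... values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total / values.length;
    }

    public static void main(String[] args) {
        // Hypothetical values for one key: {2, 4, 6} from one map task, {10} from another.

        // Sum: combining per task first gives the same answer as reducing everything at once.
        System.out.println(sum(2, 4, 6, 10));            // 22
        System.out.println(sum(sum(2, 4, 6), sum(10)));  // 22

        // Average: combining per task first changes the answer.
        System.out.println(average(2, 4, 6, 10));                    // 5.5
        System.out.println(average(average(2, 4, 6), average(10)));  // 7.0
    }
}
```

Summation survives the extra combining step, so the word count reducer is safe to reuse. An averaging reducer would not be, because the average of partial averages is generally not the overall average.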
