Launch a cluster with Elastic MapReduce on AWS.

Then log in to the master node and compile the following program:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Word Count hadoop-0.20");
        // set the job and mapper/reducer classes
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // set the output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // take the HDFS input and output directories from the command line
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
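To see what this job computes without a cluster, the same map/reduce logic can be sketched in plain Java. This is a local illustration only; the class `LocalWordCount` and method `countWords` are names introduced here, not part of the Hadoop program above:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Local sketch of the WordCount logic: tokenize each line on whitespace
// (as the mapper's StringTokenizer does) and sum the counts per word
// (as the reducer does). No Hadoop involved.
public class LocalWordCount {
    static Map<String, Integer> countWords(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(new String[] {
            "the quick brown fox", "the lazy dog"
        });
        // each entry corresponds to one line a reducer would emit
        System.out.println(counts);  // prints {brown=1, dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```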

Set the classpath, then compile, package, and run the job:

export CLASSPATH=$CLASSPATH:/home/hadoop/*:/home/hadoop/lib/*:'.'

javac WordCount.java

jar cvf WordCount.jar *.class

hadoop jar WordCount.jar WordCount s3://15-319-s13/book-dataset/pg_00 /output

After the job succeeds, the output directory lives in the Hadoop filesystem, so it can be viewed like this:

hadoop fs -cat /output/part-r-00000  | less
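Each line of part-r-00000 is a word and its count separated by a tab, the default key/value separator written by Hadoop's TextOutputFormat. A minimal parse of one such line (the class name here is illustrative):

```java
// Parse one line of WordCount output: "<word>\t<count>",
// the default separator written by TextOutputFormat.
public class OutputLineParser {
    static int countFor(String line) {
        String[] parts = line.split("\t", 2);  // split into word and count
        return Integer.parseInt(parts[1]);
    }

    public static void main(String[] args) {
        System.out.println(countFor("hadoop\t42"));  // prints 42
    }
}
```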

Main references:

http://kickstarthadoop.blogspot.com/2011/04/word-count-hadoop-map-reduce-example.html

http://kickstarthadoop.blogspot.com/2011/05/word-count-example-with-hadoop-020.html
