Hadoop AWS Word Count Example
2024-08-30 21:44:52
In AWS, launch a cluster with Elastic MapReduce (EMR) and log in to its master node; a typical SSH login is sketched below.
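The key file and hostname below are placeholders; take the real values (the EC2 key pair you chose and the master's public DNS) from the EMR console. EMR master nodes typically accept SSH logins as the hadoop user:

ssh -i ~/mykey.pem hadoop@<your-master-public-dns>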
Once logged in, compile the following program:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // emit (word, 1) for every token in the input line
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // sum the counts emitted for each word
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Word Count hadoop-0.20");

        // set the driver, mapper, and reducer classes
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // set the output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // accept the input and output directories at run time
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
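A side note: the new Job(conf, name) constructor matches the Hadoop 0.20-era mapreduce API this example targets; on Hadoop 2.x and later it is deprecated, and Job.getInstance(conf, "Word Count") is the usual replacement, with the rest of the driver unchanged.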
Set up the classpath, then compile, package, and run:
export CLASSPATH=$CLASSPATH:/home/hadoop/*:/home/hadoop/lib/*:'.'
javac WordCount.java
jar cvf WordCount.jar *.class
hadoop jar WordCount.jar WordCount s3://15-319-s13/book-dataset/pg_00 /output
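Two notes on the run command: the output directory must not already exist, otherwise FileOutputFormat will fail the job before it starts; and on EMR the output path can also point at S3 instead of HDFS, roughly as below (the bucket name is a placeholder):

hadoop jar WordCount.jar WordCount s3://15-319-s13/book-dataset/pg_00 s3://your-bucket/wordcount-output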
After the job finishes successfully, the output directory lives in the Hadoop file system, so it can be viewed like this:
hadoop fs -cat /output/part-r-00000 | less
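To list all the part files, or to pull the merged result down to the local file system, the usual HDFS commands apply (wordcount.txt is just an illustrative local file name):

hadoop fs -ls /output
hadoop fs -getmerge /output wordcount.txt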
Main references:
http://kickstarthadoop.blogspot.com/2011/04/word-count-hadoop-map-reduce-example.html
http://kickstarthadoop.blogspot.com/2011/05/word-count-example-with-hadoop-020.html