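1. Install Hadoop 2.8.3 on Linux (running in a virtual machine)

1.1 Install the JDK

1.2 Install Hadoop

1.3 Configure it: edit core-site.xml and hdfs-site.xml

1.4 Format (initialize) the NameNode (hdfs namenode -format)

1.5 Start DFS and YARN (start-dfs.sh, then start-yarn.sh)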

2. Install STS (Spring Tool Suite) and Maven on the Windows 10 host

3. Copy the entire Hadoop directory from Linux to Windows 10, set the HADOOP_HOME environment variable to that directory, and add %HADOOP_HOME%\bin to PATH

4. Copy the hadoop-eclipse-plugin-2.8.3 plugin into the plugins directory of STS, put winutils.exe into the hadoop\bin directory on Windows 10, and put hadoop.dll into Windows\System32
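
Steps 3 and 4 are easy to get subtly wrong, so a quick check before starting STS can save time. The following is a minimal sketch, assuming the environment variable is named HADOOP_HOME (the name Hadoop itself reads) and that winutils.exe belongs under its bin directory as described above. The class and method names are hypothetical and not part of Hadoop.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch only: checks the Windows-side Hadoop directory for the helper file from step 4.
// HadoopHomeCheck and missingHelpers are hypothetical names, not Hadoop APIs.
class HadoopHomeCheck {

    // Returns the names of expected files that are missing under <hadoopHome>/bin.
    static List<String> missingHelpers(File hadoopHome) {
        List<String> missing = new ArrayList<>();
        File bin = new File(hadoopHome, "bin");
        if (!new File(bin, "winutils.exe").isFile()) {
            missing.add("winutils.exe");
        }
        return missing;
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            System.err.println("HADOOP_HOME is not set");
            return;
        }
        List<String> missing = missingHelpers(new File(home));
        if (missing.isEmpty()) {
            System.out.println("Hadoop home looks complete: " + home);
        } else {
            System.out.println("Missing under " + home + File.separator + "bin: " + missing);
        }
    }
}
```

The check does not look for hadoop.dll, since that file lives in Windows\System32 rather than under the Hadoop directory.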

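5. Start STS and set up the hadoop-eclipse-plugin-2.8.3 plugin: set the Hadoop installation directory in STS, then create a location entry for the Hadoop server running on Linux and enter the IP and port of its DFS server. Once it connects, all directories on the Hadoop nodes are visible.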
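
If the plugin cannot list the directories, it helps to first confirm that the DFS server port on the virtual machine is reachable from Windows at all. Below is a minimal sketch using only the JDK. The IP address 192.168.56.101 and port 9000 are placeholder assumptions (use the values from your own fs.defaultFS setting), and the class name is hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch only: a plain TCP reachability test, not a Hadoop API.
class NameNodePortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder values: replace with the VM's IP and the DFS server port.
        String host = args.length > 0 ? args[0] : "192.168.56.101";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        boolean ok = isReachable(host, port, 2000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

A failed connection usually points to the VM's network mode or firewall rather than to the plugin configuration.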

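6. Create a MapReduce project and add a new file WordCount.java under the project's src folder (in the package helloWordCount)

7. Add the following code to it: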

package helloWordCount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: splits each input line on whitespace and emits (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Load the Hadoop settings so that the input and output directories can be found.
    // addResource(String) looks the name up on the classpath.
    conf.addResource("../core-site.xml");
    conf.addResource("../hdfs-site.xml");
    // Paths are hard-coded here instead of being parsed from the command line:
    // String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    String[] otherArgs = new String[] {"/input", "/output"};
    if (otherArgs.length < 2) {
      System.err.println("Usage: wordcount <in> [<in>...] <out>");
      System.exit(2);
    }
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Every argument except the last is an input path; the last is the output path.
    for (int i = 0; i < otherArgs.length - 1; ++i) {
      FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    FileOutputFormat.setOutputPath(job,
        new Path(otherArgs[otherArgs.length - 1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
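
To see what the job computes without a cluster, the map and reduce logic can be mimicked in memory. This is only a sketch with a hypothetical class name. It uses the same StringTokenizer splitting as TokenizerMapper and the same per-word summing as IntSumReducer, but it is not how Hadoop executes the job (no splits, shuffle, or HDFS are involved).

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Sketch only: an in-memory stand-in for the WordCount job, using plain JDK classes.
class LocalWordCount {

    // Tokenizes each line on whitespace ("map") and sums a count per word ("reduce").
    static Map<String, Integer> count(Iterable<String> lines) {
        // TreeMap keeps the words sorted, so the printed result is deterministic.
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                totals.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(count(Arrays.asList("hello hadoop", "hello world")));
    }
}
```

Running the same two lines through the real job should give the same counts in the output directory, one word and its count per line.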

8. When done, right-click WordCount.java and choose Run As -> Run on Hadoop

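The first run failed with an exception saying the specified folder could not be found. The cause was that Hadoop's basic settings had not been loaded. Adding the two conf.addResource(...) lines in main (underlined in the original post) fixed it, and everything worked.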
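
A likely explanation is that, without core-site.xml, the client never sees fs.defaultFS and falls back to Hadoop's built-in default of file:///, so /input is looked up on the local Windows disk instead of in HDFS. The sketch below illustrates that lookup-with-fallback idea using only the JDK XML parser. It is not Hadoop's Configuration class, the class and method names are hypothetical, and the address hdfs://192.168.56.101:9000 is a placeholder.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch only: reads one property from a Hadoop-style site file, with a fallback value.
class SiteConfigSketch {

    // Returns the <value> of the <property> whose <name> matches, or fallback if absent.
    static String lookup(String xml, String name, String fallback) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            NodeList names = p.getElementsByTagName("name");
            NodeList values = p.getElementsByTagName("value");
            if (names.getLength() == 0 || values.getLength() == 0) {
                continue; // skip malformed entries
            }
            if (names.item(0).getTextContent().trim().equals(name)) {
                return values.item(0).getTextContent().trim();
            }
        }
        return fallback;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder content; a real core-site.xml holds your own NameNode address.
        String coreSite = "<configuration><property><name>fs.defaultFS</name>"
                + "<value>hdfs://192.168.56.101:9000</value></property></configuration>";
        // With the site file loaded: prints hdfs://192.168.56.101:9000
        System.out.println(lookup(coreSite, "fs.defaultFS", "file:///"));
        // Without it: prints file:///
        System.out.println(lookup("<configuration/>", "fs.defaultFS", "file:///"));
    }
}
```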
