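When I first started with Hadoop, I assumed I had to install a Hadoop cluster before I could begin learning MapReduce (MR) programming. That is not actually necessary. Of course, if you have the machines, installing and configuring your own cluster is still the best option, because it makes the underlying mechanics easier to understand. This post shows how to write and debug MapReduce functions directly, without installing Hadoop.

Before we start, let's review how MapReduce works: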

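A Hadoop cluster is made up of two kinds of nodes, DataNodes and NameNodes. DataNodes store the data itself, while the NameNode stores the metadata about that data. When a MapReduce job starts, the InputFormat module first reads the data from the cluster's file system and splits it according to the configured split size (by default one block, 128 MB). A RecordReader (RR) then iterates over each split line by line and hands every record to the map function, which processes it according to your code and emits key/value pairs. Three things happen between map and reduce: the Combiner (pre-aggregation on the map side, essentially a map-side reduce), the Partitioner (which decides which reduce task receives each map output record, ideally spreading the load evenly), and shuffle and sort (which sorts the map output by key and transfers it to the reduce task chosen by the Partitioner). Finally, reduce processes its input according to your code and writes the result, again as key/value pairs.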
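The partitioning step described above can be sketched in plain Java. The formula is the one Hadoop's default `HashPartitioner` applies to a key's hash code; the class and method names (`PartitionSketch`, `partitionFor`) are hypothetical, and the sketch uses `String.hashCode()` rather than the hash of a Hadoop `Text` key, so the actual partition numbers in a real job will differ.

```java
final class PartitionSketch {

    // Clear the sign bit so the result is never negative, then take the
    // remainder to pick one of numReduceTasks partitions.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"hadoop", "map", "reduce", "hadoop"};
        for (String k : keys) {
            // The same key always lands on the same reducer.
            System.out.println(k + " -> reducer " + partitionFor(k, 3));
        }
    }
}
```

Because the partition depends only on the key, all values for one key reach the same reduce task, which is what makes the final per-key aggregation possible.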

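Note:

The format of the key/value pairs in the map phase is determined by the input format. With the default TextInputFormat, each line is processed as one record: the key is the byte offset of the start of the line within the file, and the value is the text of the line.
The format of the key/value pairs emitted by the map phase must match the format of the key/value pairs that the reduce phase takes as input.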
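To make the (offset, line) pairing concrete, here is a minimal plain-Java sketch that mimics what the note above describes. It is not Hadoop's actual reader: `LineOffsetSketch` and `records` are hypothetical names, and the sketch assumes UTF-8 text with `\n` line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class LineOffsetSketch {

    // Returns (byte offset of the line start -> line text), in file order.
    // Assumption: lines end with "\n" and the content is UTF-8.
    static Map<Long, String> records(String fileContent) {
        Map<Long, String> out = new LinkedHashMap<>();
        long offset = 0;
        for (String line : fileContent.split("\n")) {
            out.put(offset, line);
            // Advance by the encoded length of the line plus one byte for '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<Long, String> e : records("hello world\nhello hadoop\n").entrySet()) {
            System.out.println("key=" + e.getKey() + " value=" + e.getValue());
        }
        // prints:
        // key=0 value=hello world
        // key=12 value=hello hadoop
    }
}
```

The key is a byte offset, not a line number, so lines containing multi-byte characters (such as the Chinese text in the logs used later) advance it by more than their character count.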

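WordCount is the classic example for walking through this process: map splits each line into words and emits (word, 1) for every word, and reduce sums the counts for each word.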
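The WordCount flow can be imitated in a few lines of plain Java, with no Hadoop classes at all. This is only an in-memory sketch of the map, shuffle/sort, and reduce steps; `LocalWordCountSketch` and its method names are hypothetical, and a `TreeMap` stands in for the sorted grouping that the framework performs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

final class LocalWordCountSketch {

    // Map: emit (word, 1) for every whitespace-separated token in one line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return out;
    }

    // Shuffle and sort: group the values by key; TreeMap keeps keys sorted.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            List<Integer> values = grouped.get(p.getKey());
            if (values == null) {
                values = new ArrayList<>();
                grouped.put(p.getKey(), values);
            }
            values.add(p.getValue());
        }
        return grouped;
    }

    // Reduce: sum all the values collected for one key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three steps in order over the given input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("hello world", "hello hadoop")));
        // prints {hadoop=1, hello=2, world=1}
    }
}
```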

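Now let's start programming MR locally.

First, download a Hadoop release package from the official site (this post uses Hadoop 2.6.0). There is no need to install it; we only need the jars inside the package.

Download link: https://archive.apache.org/dist/hadoop/common/ (pick whichever version you like; the most widely downloaded release is fine)

Here is the MR code: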

    package loganalysis;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser; // only needed if you switch to command-line paths in main

    public class WordCount {

        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {

            private static final IntWritable one = new IntWritable(1);
            private final Text word = new Text();

            // Start searching after the leading timestamp ("2016-04-18 16:00:00 {").
            private static final int SEARCH_FROM = 21;

            // Returns the text that starts `skip` characters after the beginning of
            // `name` and runs up to the first `terminator`. Returns "" when the field
            // is missing or when its value is shorter than two characters.
            private static String extract(String line, String name, int skip, String terminator) {
                int start = line.indexOf(name, SEARCH_FROM);
                if (start == -1 || start + skip > line.length()) {
                    return "";
                }
                String rest = line.substring(start + skip);
                int end = rest.indexOf(terminator);
                return end <= 1 ? "" : rest.substring(0, end);
            }

            // With the default TextInputFormat, each line is one record: the key is the
            // byte offset of the line within the file and the value is the line's text.
            // The key/value types emitted here must match the reducer's input types.
            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String line = value.toString();

                // skip = length of the field name + the 3 characters of ":" with its quotes
                String areacode = extract(line, "areacode", 11, "\"");
                // imei sits inside the escaped requestinfo string, so skip the
                // 5 characters \":\" and stop at the next backslash.
                String imei = extract(line, "imei", 9, "\\");
                String responsedata = extract(line, "responsedata", 15, "\"");
                String requesttime = extract(line, "requesttime", 14, "\"");
                String requestip = extract(line, "requestip", 12, "\"");

                // Compare string contents with isEmpty(); != "" only compares references.
                if (!imei.isEmpty() && !areacode.isEmpty() && !responsedata.isEmpty()
                        && !requesttime.isEmpty() && !requestip.isEmpty()) {
                    word.set(imei + "\t" + areacode + "\t" + responsedata + "\t"
                            + requesttime + "\t" + requestip);
                    context.write(word, one);
                }
            }
        }

        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {

            private final IntWritable result = new IntWritable();

            // Sums the 1s emitted by the mapper (or the partial sums from the combiner).
            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Paths are hard-coded so the job can be run straight from the IDE.
            // To read them from the command line instead, use:
            // String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
            String[] otherArgs = new String[]{"/Users/mac/tmp/inputmr", "/Users/mac/tmp/output1"};
            if (otherArgs.length != 2) {
                System.err.println("Usage: wordcount <in> <out>");
                System.exit(2);
            }
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            // The output directory must not exist yet, otherwise the job fails.
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

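Note: apart from JDK 1.7, every jar the project needs comes from the share directory inside the Hadoop package.

If you are not sure which jars are needed, simply add all the jars under share\hadoop\ to the project.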
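If you would rather collect those jars programmatically than click through them in the IDE, a small helper like the one below can list them. It is only a sketch: `JarLister` and `findJars` are hypothetical names, the default path `hadoop-2.6.0/share/hadoop` is an assumption about where you unpacked the package, and the code needs Java 8 or later because it uses streams.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class JarLister {

    // Recursively collects every regular file ending in ".jar" under root.
    static List<Path> findJars(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .filter(p -> p.getFileName().toString().endsWith(".jar"))
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the unpacked package; pass your own path as args[0].
        Path root = Paths.get(args.length > 0 ? args[0] : "hadoop-2.6.0/share/hadoop");
        if (!Files.isDirectory(root)) {
            System.err.println("Not a directory: " + root);
            return;
        }
        // Print the jars as one classpath string for the current platform.
        String classpath = findJars(root).stream()
                .map(Path::toString)
                .collect(Collectors.joining(File.pathSeparator));
        System.out.println(classpath);
    }
}
```

The printed string can be passed to `javac -cp` and `java -cp`. The directory also contains source and test jars, which are harmless on the classpath but can be filtered out if you want a shorter list.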

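Note: my machine is a MacBook Pro. If you are on Windows, adjust the paths and prefix them with "file:///" (for example file:///D:/tmp/input and file:///D:/tmp/output). Use forward slashes inside the Java string literals, because a backslash starts an escape sequence (\t would be read as a tab, and \i does not compile):

    String[] otherArgs = new String[]{"file:///D:/tmp/input", "file:///D:/tmp/output"};
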
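A Windows path copied from Explorer uses backslashes, which Java treats as escape characters inside a string literal. The sketch below converts such a path into the `file:///` form with forward slashes and checks that it parses as a URI. `LocalPathSketch` and `toFileUri` are hypothetical names, and the helper deliberately does not handle spaces or other characters that need URI escaping.

```java
import java.net.URI;

final class LocalPathSketch {

    // Turns a Windows-style path such as D:\tmp\input into file:///D:/tmp/input.
    // Assumption: the path contains no spaces or characters that need escaping.
    static String toFileUri(String windowsPath) {
        return "file:///" + windowsPath.replace('\\', '/');
    }

    public static void main(String[] args) {
        // In Java source each backslash has to be doubled.
        String uri = toFileUri("D:\\tmp\\input");
        System.out.println(uri);                       // prints file:///D:/tmp/input
        System.out.println(URI.create(uri).getPath()); // prints /D:/tmp/input
    }
}
```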
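The core of this program is the map function. It extracts the key fields from each system log line, concatenates them, and passes the result to reduce as the key. The value is always 1, which makes later counting easy. Because this is only an example, just a few fields are extracted; real logs contain many more.

Here are the system logs used for testing: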
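Because the extraction logic is plain string handling, it can be debugged on its own before it is wired into a Mapper. The sketch below applies the same indexOf/substring approach and the same skip lengths as the mapper; `FieldExtractSketch`, `extract`, and `buildKey` are hypothetical names, and `SAMPLE` is a made-up, shortened line in the same format as the real logs, not one of the actual test records.

```java
final class FieldExtractSketch {

    // Hypothetical shortened log line: timestamp, then JSON with an escaped requestinfo string.
    static final String SAMPLE =
        "2016-04-18 16:00:00 {\"areacode\":\"Zhejiang\","
        + "\"requestinfo\":\"{\\\"imei\\\":\\\"A0000044ABFD25\\\",\\\"imsi\\\":\\\"460036951451601\\\"}\","
        + "\"requestip\":\"115.168.93.87\",\"requesttime\":\"2016-04-18 16:00:00\","
        + "\"responsedata\":\"OK\"}";

    // Finds `name` (searching from index 21, past the timestamp), skips `skip`
    // characters, and returns everything up to the first `terminator`.
    // Missing fields and values shorter than two characters yield "".
    static String extract(String line, String name, int skip, String terminator) {
        int start = line.indexOf(name, 21);
        if (start == -1 || start + skip > line.length()) {
            return "";
        }
        String rest = line.substring(start + skip);
        int end = rest.indexOf(terminator);
        return end <= 1 ? "" : rest.substring(0, end);
    }

    // Builds the tab-separated key, or "" if any of the five fields is missing.
    static String buildKey(String line) {
        String areacode = extract(line, "areacode", 11, "\"");
        String imei = extract(line, "imei", 9, "\\");
        String responsedata = extract(line, "responsedata", 15, "\"");
        String requesttime = extract(line, "requesttime", 14, "\"");
        String requestip = extract(line, "requestip", 12, "\"");
        if (imei.isEmpty() || areacode.isEmpty() || responsedata.isEmpty()
                || requesttime.isEmpty() || requestip.isEmpty()) {
            return "";
        }
        return imei + "\t" + areacode + "\t" + responsedata + "\t" + requesttime + "\t" + requestip;
    }

    public static void main(String[] args) {
        System.out.println(buildKey(SAMPLE));
    }
}
```

One consequence of the `end <= 1` check is that a one-character value is treated as missing, so such a record is dropped. Fixed offsets like these are also fragile; for production logs a real JSON parser would be a safer choice, at the cost of an extra dependency.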

2016-04-18 16:00:00 {"areacode":"浙江省丽水市","countAll":0,"countCorrect":0,"datatime":"4134362","logid":"201604181600001184409476","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966390499\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"13989589062\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"13989589062\"}","requestip":"36.16.128.234","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"宁夏银川市","countAll":0,"countCorrect":0,"datatime":"4715990","logid":"201604181600001858043208","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966400120\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1210\",\"imei\":\"A0000044ABFD25\",\"subjectNum\":\"15379681917\",\"imsi\":\"460036951451601\",\"queryNum\":\"\"}","requestip":"115.168.93.87","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果","userAgent":"ZTE-Me/Mobile"}
2016-04-18 16:00:00 {"areacode":"黑龙江省哈尔滨市","countAll":0,"countCorrect":0,"datatime":"5369561","logid":"201604181600001068429609","requestinfo":"{\"interfaceUserName\":\"12345678900987654321\",\"queryNum\":\"\",\"timestamp\":\"1460966400139\",\"sign\":\"4\",\"imsi\":\"460030301212545\",\"imei\":\"35460207765269\",\"subjectNum\":\"55588237\",\"subjectPro\":\"123456\",\"remark\":\"4\",\"channelno\":\"2100\"}","requestip":"42.184.41.180","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"浙江省丽水市","countAll":0,"countCorrect":0,"datatime":"4003096","logid":"201604181600001648238807","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966391025\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"13989589062\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"13989589062\"}","requestip":"36.16.128.234","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"广西南宁市","countAll":0,"countCorrect":0,"datatime":"4047993","logid":"201604181600001570024205","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966382871\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1006\",\"imei\":\"A000004853168C\",\"subjectNum\":\"07765232589\",\"imsi\":\"460031210400007\",\"queryNum\":\"13317810717\"}","requestip":"219.159.72.3","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"海南省五指山市","countAll":0,"countCorrect":0,"datatime":"5164117","logid":"201604181600001227842048","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966399159\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1017\",\"imei\":\"A000005543AFB7\",\"subjectNum\":\"089836329061\",\"imsi\":\"460036380954376\",\"queryNum\":\"13389875751\"}","requestip":"140.240.171.71","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"山西省","countAll":0,"countCorrect":0,"datatime":"14075772","logid":"201604181600001284030648","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966400332\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"1006\",\"imei\":\"A000004FE0218A\",\"subjectNum\":\"03514043633\",\"imsi\":\"460037471517070\",\"queryNum\":\"\"}","requestip":"1.68.5.227","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"四川省","countAll":0,"countCorrect":0,"datatime":"6270982","logid":"201604181600001173504863","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966398896\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"13666231300\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"13666231300\"}","requestip":"182.144.66.97","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}
2016-04-18 16:00:00 {"areacode":"浙江省","countAll":0,"countCorrect":0,"datatime":"4198522","logid":"201604181600001390637240","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966399464\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"05533876327\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"05533876327\"}","requestip":"36.23.9.49","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"000000","responsedata":"操作成功"}
2016-04-18 16:00:00 {"areacode":"江苏省连云港市","countAll":0,"countCorrect":0,"datatime":"4408097","logid":"201604181600001249944032","requestinfo":"{\"sign\":\"4\",\"timestamp\":\"1460966395908\",\"remark\":\"4\",\"subjectPro\":\"123456\",\"interfaceUserName\":\"12345678900987654321\",\"channelno\":\"100\",\"imei\":\"12345678900987654321\",\"subjectNum\":\"18361451463\",\"imsi\":\"12345678900987654321\",\"queryNum\":\"18361451463\"}","requestip":"58.223.4.210","requesttime":"2016-04-18 16:00:00","requesttype":"0","responsecode":"010005","responsedata":"无查询结果"}

Finally, here is a screenshot of the run result:
