A Mapper maps input key/value pairs to a set of intermediate key/value pairs.

For example, in word count:

Input: (docID, doc)

Output: (term, 1)
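The (docID, doc) → (term, 1) mapping can be sketched in plain Java, without Hadoop, as a function from one input record to a list of output pairs. The class name WordCountMapSketch is hypothetical, and splitting on whitespace with a regex is an assumption standing in for the tokenizer used later.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java sketch of the word-count map step (no Hadoop classes).
class WordCountMapSketch {

  // Maps one (docID, doc) record to a list of (term, 1) pairs.
  // docId mirrors the input pair but is not needed for word count.
  static List<Map.Entry<String, Integer>> map(String docId, String doc) {
    List<Map.Entry<String, Integer>> out = new ArrayList<>();
    for (String term : doc.split("\\s+")) {
      // split() yields an empty first element when doc starts with whitespace
      if (!term.isEmpty()) {
        out.add(new SimpleEntry<>(term, 1));
      }
    }
    return out;
  }
}
```

The same term can appear several times in the output; adding up the 1s is left to a later stage.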

Mapper Class Prototype:

Mapper<Object, Text, Text, IntWritable>
// Object      :: input key type    (KEYIN)
// Text        :: input value type  (VALUEIN)
// Text        :: output key type   (KEYOUT)
// IntWritable :: output value type (VALUEOUT)

Special Data Types for Mapper

IntWritable

A serializable and comparable wrapper for an int.

Example:

private final static IntWritable one = new IntWritable(1);
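What "serializable and comparable" means for IntWritable can be sketched with a small plain-Java class. IntBox below is hypothetical, not the Hadoop class; it only mirrors the shape of Hadoop's Writable contract (write(DataOutput) and readFields(DataInput)) plus Comparable, using java.io streams so it runs without Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical plain-Java stand-in for IntWritable.
class IntBox implements Comparable<IntBox> {
  private int value;

  IntBox() {}                                // empty box, filled by readFields
  IntBox(int value) { this.value = value; }

  int get() { return value; }

  // Serialize: DataOutput.writeInt emits 4 big-endian bytes.
  void write(DataOutput out) throws IOException {
    out.writeInt(value);
  }

  // Deserialize into this existing object, overwriting its value.
  void readFields(DataInput in) throws IOException {
    value = in.readInt();
  }

  @Override
  public int compareTo(IntBox other) {
    return Integer.compare(value, other.value);
  }

  // Helper: serialize one box to a byte array.
  static byte[] toBytes(IntBox box) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      box.write(out);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Helper: rebuild a box from bytes produced by toBytes.
  static IntBox fromBytes(byte[] data) {
    try {
      IntBox box = new IntBox();
      box.readFields(new DataInputStream(new ByteArrayInputStream(data)));
      return box;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

readFields fills an existing object instead of returning a new one, which is what lets a framework reuse a single instance across many records.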

Text

A serializable and comparable object for strings, compared at the byte level. It stores text in UTF-8 encoding.

Example:

private Text word = new Text();
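Because Text keeps UTF-8 bytes, its length and ordering are defined on bytes rather than on Java chars. The hypothetical helper below (plain Java, no Hadoop) encodes a String to UTF-8 and compares two encodings as unsigned bytes; it is a sketch of byte-level comparison, not Hadoop's own comparator.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating byte-level (UTF-8) string handling.
class Utf8Sketch {

  // Encode a String the way Text stores it: as UTF-8 bytes.
  static byte[] encode(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // Lexicographic comparison of two byte arrays, treating bytes as unsigned.
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;   // a shorter prefix sorts first
  }
}
```

For example, "héllo" is 5 chars but 6 UTF-8 bytes. For most text the byte order agrees with String.compareTo, but the two can differ for characters outside the Basic Multilingual Plane, because String.compareTo compares UTF-16 code units.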

Hadoop defines its own classes for general data types.

-- All "values" must implement the Writable interface;

-- All "keys" must implement the WritableComparable interface, because keys are also sorted.
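Keys need to be comparable because the framework sorts the mapper's output by key before it reaches the reducers, so equal keys end up next to each other. A plain-Java sketch of that sorting step, with Comparable standing in for WritableComparable (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of sorting intermediate pairs by key.
class SortByKeySketch {

  // Returns a copy of the pairs sorted by key; the input list is not modified.
  // List.sort is stable, so pairs with equal keys keep their original order.
  static <K extends Comparable<? super K>, V> List<Map.Entry<K, V>> sortByKey(
      List<Map.Entry<K, V>> pairs) {
    List<Map.Entry<K, V>> sorted = new ArrayList<>(pairs);
    sorted.sort(Map.Entry.comparingByKey());
    return sorted;
  }
}
```

Sorting ("to", 1), ("be", 1), ("or", 1), ("be", 1) puts the two ("be", 1) pairs side by side, which is what allows them to be grouped under a single key.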

Map Method for Mapper

Method header

public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException
// Object key:: data type of the input key;
// Text value:: data type of the input value;
// Context context:: where the mapper writes its output pairs (via context.write); it also gives access to the job configuration.

Tokenization

// Use the built-in Java StringTokenizer to split the input value (document) into words:
StringTokenizer itr = new StringTokenizer(value.toString());
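StringTokenizer with no delimiter argument splits on space, tab, newline, carriage return and form feed, and it keeps punctuation attached to words. A small wrapper (hypothetical name TokenizeSketch) makes this easy to check:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical wrapper around the same tokenization the map method uses.
class TokenizeSketch {

  // Splits a line with StringTokenizer's default delimiters " \t\n\r\f".
  static List<String> tokenize(String line) {
    List<String> tokens = new ArrayList<>();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      tokens.add(itr.nextToken());
    }
    return tokens;
  }
}
```

So "Hello," and "Hello" are counted as different words; that is a property of this simple tokenizer, not of MapReduce.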

Building (key, value) pairs

// Loop over all words:
while (itr.hasMoreTokens()) {
  // copy the next String token into the reusable Text object:
  word.set(itr.nextToken());
  // emit the (word, 1) pair through the Context:
  context.write(word, one);
}
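The loop reuses one word object for every token. This works because, on Hadoop's map output path, context.write serializes the pair when it is called, so later changes to word do not affect pairs already written. The sketch below imitates that behavior in plain Java: StringBuilder stands in for the mutable Text, and CopyingContext is a hypothetical stand-in for Context that snapshots the key on each write.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical sketch of safe object reuse in a mapper-style loop.
class ReuseSketch {

  // Stand-in for Context: stores a copy of each pair at write time.
  static class CopyingContext {
    final List<Map.Entry<String, Integer>> pairs = new ArrayList<>();

    void write(StringBuilder key, int value) {
      // toString() copies the current contents, not the mutable reference
      pairs.add(new SimpleEntry<>(key.toString(), value));
    }
  }

  static List<Map.Entry<String, Integer>> map(String line) {
    CopyingContext context = new CopyingContext();
    StringBuilder word = new StringBuilder();   // reused, like `word` in the mapper
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      word.setLength(0);                        // plays the role of word.set(...)
      word.append(itr.nextToken());
      context.write(word, 1);
    }
    return context.pairs;
  }
}
```

If write stored the StringBuilder reference instead of calling toString(), every pair would show the last token.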

Map Method Summary

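The map method is called once per input (key, value) pair and writes its output pairs to the Mapper.Context object; together, these writes form the mapper's series of intermediate (key, value) pairs.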

public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {
  StringTokenizer itr = new StringTokenizer(value.toString());
  while (itr.hasMoreTokens()) {
    word.set(itr.nextToken());
    context.write(word, one);
  }
}
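With Hadoop's default TextInputFormat, map is called once per line of input: the key is the line's byte offset in the file (a LongWritable, which is why declaring the key as Object is enough here) and the value is the line itself. The sketch below (hypothetical RecordSketch, plain Java) computes those (offset, line) records for a string, assuming '\n' line endings and UTF-8 text.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of (byte offset, line) input records.
class RecordSketch {

  static List<Map.Entry<Long, String>> records(String input) {
    List<Map.Entry<Long, String>> out = new ArrayList<>();
    long offset = 0;   // byte offset of the current line
    int start = 0;     // char index of the current line
    while (start < input.length()) {
      int end = input.indexOf('\n', start);
      if (end < 0) {
        end = input.length();   // last line without a trailing newline
      }
      String line = input.substring(start, end);
      out.add(new SimpleEntry<>(offset, line));
      // advance by the line's UTF-8 size plus one byte for '\n'
      offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
      start = end + 1;
    }
    return out;
  }
}
```

Each record would then be passed to map(key, value, context), which ignores the offset and tokenizes the line.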

Overview of Mapper Class

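import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Nested inside the enclosing WordCount class (hence "static").
public static class TokenizerMapper
    extends Mapper<Object, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(Object key, Text value, Context context
                  ) throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}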
