Wordcount -- MapReduce example -- Mapper
2024-10-11 04:32:32
Mapper maps input key/value pairs into intermediate key/value pairs.
E.g.
Input: (docID, doc)
Output: (term, 1)
Mapper Class Prototype:
Mapper<Object, Text, Text, IntWritable>
// Object:: INPUT_KEY
// Text:: INPUT_VALUE
// Text:: OUTPUT_KEY
// IntWritable:: OUTPUT_VALUE
Special Data Types for Mapper
IntWritable
A serializable and comparable wrapper for an integer.
Example:
private final static IntWritable one = new IntWritable(1);
Text
A serializable, deserializable and comparable object for strings, compared at the byte level. It stores text in UTF-8 encoding.
Example:
private Text word = new Text();
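To see what "UTF-8 at the byte level" means, here is a quick plain-Java illustration using the standard library's charset API rather than Hadoop's Text class (a sketch; Text stores and compares exactly these bytes):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Demo {
    // Returns the number of bytes a string occupies in UTF-8,
    // which is the representation Text stores and compares.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("word"));   // 4: one byte per ASCII char
        System.out.println(utf8Length("héllo"));  // 6: 'é' takes two bytes
    }
}
```

Because comparison happens on these bytes, two Text objects with the same characters always compare equal regardless of how the Java String was built.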
Hadoop defines its own classes for general data types.
-- All "values" must implement the Writable interface;
-- All "keys" must implement the WritableComparable interface;
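The Writable contract is a pair of methods, write(DataOutput) and readFields(DataInput). A minimal plain-Java round trip with the standard DataOutputStream/DataInputStream shows what serializing an int like IntWritable boils down to (a dependency-free sketch, not Hadoop's actual class):

```java
import java.io.*;

public class IntRoundTrip {
    // Serialize an int as 4 big-endian bytes (what IntWritable.write does),
    // then read it back (what IntWritable.readFields does).
    public static int roundTrip(int value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeInt(value);          // like Writable.write
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        return in.readInt();                                // like Writable.readFields
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(42)); // prints 42
    }
}
```

WritableComparable adds compareTo on top of this, so keys can be sorted during the shuffle phase.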
Map Method for Mapper
Method header
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException
// Object key:: Declare data type of input key;
// Text value:: Declare data type of input value;
// Context context:: Declare data type of output. Context is often used for output data collection.
Tokenization
// Use Java built-in StringTokenizer to split input value (document) into words:
StringTokenizer itr = new StringTokenizer(value.toString());
Building (key, value) pairs
// Loop over all words:
while (itr.hasMoreTokens()) {
// convert built-in String back to Text:
word.set(itr.nextToken());
// build (key, value) pairs into Context and emit:
context.write(word, one);
}
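StringTokenizer splits on whitespace by default (space, tab, newline, carriage return). A dependency-free check of what the loop above actually iterates over:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizeDemo {
    // Split a line into words exactly as the map method does,
    // using StringTokenizer's default whitespace delimiters.
    public static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(itr.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("to be or\tnot to be"));
        // [to, be, or, not, to, be]
    }
}
```

Note that punctuation is not stripped, so "word," and "word" would count as different keys; a real word-count job often normalizes tokens first.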
Map Method Summary
The map method writes its output through a Mapper.Context object, which collects a series of (key, value) pairs
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
Overview of Mapper Class
// Requires: import java.io.IOException;
//           import java.util.StringTokenizer;
//           import org.apache.hadoop.io.IntWritable;
//           import org.apache.hadoop.io.Text;
//           import org.apache.hadoop.mapreduce.Mapper;
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
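Without a Hadoop cluster, the mapper's behavior can be simulated in plain Java by collecting (word, 1) pairs into a list instead of writing to a Context (a sketch only; in Hadoop, Context also feeds the partitioner and shuffle):

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

public class MapSimulation {
    // Simulate TokenizerMapper.map on one input value: emit one (word, 1)
    // pair per token, as context.write(word, one) would.
    public static List<Map.Entry<String, Integer>> map(String value) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(value);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(map("the cat sat"));
        // [the=1, cat=1, sat=1]
    }
}
```

The framework then groups these pairs by key before the reducer sums the 1s into per-word counts.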