The benchmark classes described in "Benchmarking a Hadoop Cluster" in Hadoop: The Definitive Guide (3rd edition) no longer live in hadoop-*-test.jar in recent releases. In new versions, run the benchmark tests as follows:


1. TestDFSIO

write

TestDFSIO measures the I/O performance of HDFS. It uses a MapReduce job to read or write files in parallel: each file is read or written in its own map task, and the map output is used to collect statistics about the file it just processed.
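For reference, running TestDFSIO with no arguments prints its usage. A minimal sketch of the common options (names as in TestDFSIO 1.7; check the usage output of your own version):

yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO
# Usage (TestDFSIO 1.7, roughly):
#   -read | -write | -clean      select the phase (-clean removes the benchmark data)
#   -nrFiles N                   number of files, i.e. the number of map tasks
#   -fileSize MB                 size of each file
#   -resFile path                local file the result summary is appended to
#   -bufferSize bytes            I/O buffer size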

test 1: write 2 files, 10 MB each

%yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -write -nrFiles 2 -fileSize 10MB

Console output when the job is submitted:

// :: INFO fs.TestDFSIO: TestDFSIO.1.7
// :: INFO fs.TestDFSIO: nrFiles = 2
// :: INFO fs.TestDFSIO: nrBytes (MB) = 10.0
// :: INFO fs.TestDFSIO: bufferSize =
// :: INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
// :: INFO fs.TestDFSIO: creating control file: 10485760 bytes, 2 files
// :: INFO fs.TestDFSIO: created control files for: 2 files
// :: INFO client.RMProxy: Connecting to ResourceManager at cluster1/
// :: INFO client.RMProxy: Connecting to ResourceManager at cluster1/
// :: INFO mapred.FileInputFormat: Total input paths to process : 2
// :: INFO mapreduce.JobSubmitter: number of splits:2
// :: INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1384321503481_0003
// :: INFO impl.YarnClientImpl: Submitted application application_1384321503481_0003 to ResourceManager at cluster1/
// :: INFO mapreduce.Job: The url to track the job: http://cluster1:8888/proxy/application_1384321503481_0003/
// :: INFO mapreduce.Job: Running job: job_1384321503481_0003

From the console output we can see that:

(1) By default the files are written to the io_data folder under /benchmarks/TestDFSIO; the default location can be changed through the test.build.data system property.

(2) There are 2 map tasks (number of splits:2), which also confirms that each file is written or read by its own map task.
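A sketch of redirecting and cleaning up the benchmark data, assuming your TestDFSIO accepts generic -D options (it runs through ToolRunner in Hadoop 2.x) and the /benchmarks/TestDFSIO/io_data layout described above; the /user/hadoop/benchmark path is just an example:

# Write under a different base directory instead of /benchmarks/TestDFSIO
yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO \
    -D test.build.data=/user/hadoop/benchmark -write -nrFiles 2 -fileSize 10MB

# Inspect the files that were written
hadoop fs -ls /benchmarks/TestDFSIO/io_data

# Remove all benchmark data when you are done
yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -clean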

Console output after the job finishes:

// :: INFO mapreduce.Job:  map 100% reduce 100%
// :: INFO mapreduce.Job: Job job_1384321503481_0003 completed successfully
// :: INFO mapreduce.Job: Counters:
    File System Counters
        FILE: Number of bytes read=
        FILE: Number of bytes written=
        FILE: Number of read operations=
        FILE: Number of large read operations=
        FILE: Number of write operations=
        HDFS: Number of bytes read=
        HDFS: Number of bytes written=
        HDFS: Number of read operations=
        HDFS: Number of large read operations=
        HDFS: Number of write operations=
    Job Counters
        Launched map tasks=
        Launched reduce tasks=
        Data-local map tasks=
        Total time spent by all maps in occupied slots (ms)=
        Total time spent by all reduces in occupied slots (ms)=
    Map-Reduce Framework
        Map input records=
        Map output records=
        Map output bytes=
        Map output materialized bytes=
        Input split bytes=
        Combine input records=
        Combine output records=
        Reduce input groups=
        Reduce shuffle bytes=
        Reduce input records=
        Reduce output records=
        Spilled Records=
        Shuffled Maps =
        Failed Shuffles=
        Merged Map outputs=
        GC time elapsed (ms)=
        CPU time spent (ms)=
        Physical memory (bytes) snapshot=
        Virtual memory (bytes) snapshot=
        Total committed heap usage (bytes)=
    Shuffle Errors
        BAD_ID=
        CONNECTION=
        IO_ERROR=
        WRONG_LENGTH=
        WRONG_MAP=
        WRONG_REDUCE=
    File Input Format Counters
        Bytes Read=
    File Output Format Counters
        Bytes Written=
// :: INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
// :: INFO fs.TestDFSIO:            Date & time: :: PST
// :: INFO fs.TestDFSIO:        Number of files: 2
// :: INFO fs.TestDFSIO: Total MBytes processed: 20.0
// :: INFO fs.TestDFSIO:      Throughput mb/sec: 0.5591277606933184
// :: INFO fs.TestDFSIO: Average IO rate mb/sec: 0.5635650753974915
// :: INFO fs.TestDFSIO:  IO rate std deviation: 0.05000733272172887
// :: INFO fs.TestDFSIO:     Test exec time sec: 534.566
// :: INFO fs.TestDFSIO:

From this output we can see 2 map tasks and 1 reduce task; the summary gives the number of files written, the total data processed, the overall throughput, the average I/O rate, and the job execution time.
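On the difference between the two rates: assuming the TestDFSIO 1.7 semantics, each map task records the bytes it moved and the time it took; the throughput divides the summed bytes by the summed task time, while the average IO rate is the mean of the per-file rates (the standard deviation is taken over those same per-file rates), so the two numbers are usually close but not equal:

\text{Throughput} = \frac{\sum_{i=1}^{N} \text{bytes}_i}{\sum_{i=1}^{N} t_i},
\qquad
\text{Average IO rate} = \frac{1}{N} \sum_{i=1}^{N} \frac{\text{bytes}_i}{t_i}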

read

%yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -read -nrFiles 2 -fileSize 10MB

I won't analyze this one in detail; try it yourself.
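One thing worth knowing before trying it: -read reads back the files that -write created, so run the write test first. Each run also appends its summary to a local results file; assuming the default result file name:

# every write/read run appends its summary here by default
cat TestDFSIO_results.log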

2. MapReduce Test with Sort

Hadoop ships with a MapReduce program that exercises the entire MapReduce system. The benchmark runs in three steps:

# generate random data

# sort data

# validate results

The steps are as follows:

1. Generate random data

yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar randomwriter random-data

RandomWriter generates the random data. Running it on YARN starts a MapReduce job that launches, by default, 10 map tasks per node, each map producing 1 GB of random data.

To change the defaults, set: test.randomwriter.maps_per_host and test.randomwrite.bytes_per_map.
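A sketch of overriding the defaults with generic -D options, using the property names above (note that some Hadoop 2.x releases renamed them to mapreduce.randomwriter.mapsperhost / mapreduce.randomwriter.bytespermap, so check your version):

# 2 maps per node, 10 MB per map, instead of 10 maps x 1 GB
yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar randomwriter \
    -D test.randomwriter.maps_per_host=2 \
    -D test.randomwrite.bytes_per_map=10485760 \
    random-data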

2. Sort the data

yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar sort random-data sorted-data

3. Validate the results

yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar testmapredsort -sortInput random-data -sortOutput sorted-data

The command starts the SortValidator program, which runs a series of checks, for example verifying that sorted-data is an exact, correctly ordered copy of the unsorted input.
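Putting the three steps together, a sketch of a small driver script (paths as above; the wildcard must match exactly one examples jar in your installation):

#!/usr/bin/env bash
set -e  # abort on the first failing step

EXAMPLES_JAR=$(ls share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar)

yarn jar "$EXAMPLES_JAR" randomwriter random-data          # 1. generate
yarn jar "$EXAMPLES_JAR" sort random-data sorted-data      # 2. sort
yarn jar "$EXAMPLES_JAR" testmapredsort \
    -sortInput random-data -sortOutput sorted-data         # 3. validate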

3. Other Tests

MRBench (invoked with mrbench) runs a small job a number of times.

NNBench (invoked with nnbench) is a load test for the namenode.

Gridmix: not interested.
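For the first two, hedged example invocations (both live in the same jobclient tests jar as TestDFSIO; flag names are from the Hadoop 2.x tools, so check the usage output of mrbench/nnbench on your version):

TESTS_JAR=$(ls share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar)

# MRBench: run a tiny job 50 times to measure small-job turnaround
yarn jar "$TESTS_JAR" mrbench -numRuns 50

# NNBench: stress the namenode with many small-file creates
yarn jar "$TESTS_JAR" nnbench -operation create_write \
    -maps 10 -bytesToWrite 0 -numberOfFiles 1000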
