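Preparation

System: CentOS 6 or Red Hat 6 (a 64-bit OS is used here)

Software: JDK 1.7, hadoop-2.3.0, and the 64-bit native library package (it can be downloaded from CSDN; it is not provided here)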

Deployment plan

192.168.1.11 C6H1 NameNode, DataNode, ResourceManager, NodeManager, JournalNode

192.168.1.12 C6H2 NameNode, DataNode, JournalNode, NodeManager

192.168.1.13 C6H3 DataNode, JournalNode, NodeManager

Configuration steps
1. Disable related services, configure the hosts file, and unpack the archive

chkconfig iptables off

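service iptables stop # stop the firewall

vi /etc/selinux/config

SELINUX=disabled # comment out the old line and add this one, or edit the value in place

:wq

setenforce 0 # put SELinux into permissive mode right away; SELINUX=disabled takes effect after a reboot

# Configure hosts on every machine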

vi /etc/hosts

192.168.1.11 C6H1

192.168.1.12 C6H2

192.168.1.13 C6H3
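
If the hosts file is edited by a script rather than by hand, it helps to make the edit idempotent so that rerunning it does not duplicate entries. The following is a minimal sketch, assuming bash and awk are available; `add_host_entry` is a hypothetical helper name, not part of any system tool.

```shell
#!/usr/bin/env bash
# Sketch: append "IP hostname" to a hosts file only when that mapping is missing.
# add_host_entry is a hypothetical helper introduced for illustration.
add_host_entry() {
    local hosts_file="$1" ip="$2" name="$3"
    # awk exits 0 only if some line starts with this IP and lists this hostname.
    if ! awk -v ip="$ip" -v name="$name" '
            $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
            END { exit !found }
        ' "$hosts_file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Example usage (as root, on every node):
#   add_host_entry /etc/hosts 192.168.1.11 C6H1
#   add_host_entry /etc/hosts 192.168.1.12 C6H2
#   add_host_entry /etc/hosts 192.168.1.13 C6H3
```

Running the same calls a second time leaves the file unchanged.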

tar -zxvf hadoop-2.3.0.tar.gz -C /usr/local

Rename the extracted directory to hadoop2, so that /usr/local/hadoop-2.3.0 becomes /usr/local/hadoop2

2. Install the JDK

tar -zxvf jdk-1.7.xx.tar.gz -C /usr/src

cd /usr/src

mv /usr/src/jdk-1.7.xx /usr/local/jdk

vi /etc/profile

Add the environment variables; all of them are added here in one go

export JAVA_HOME=/usr/local/jdk

export ZOOKEEPER_HOME=/usr/local/zk

export HADOOP_HOME=/usr/local/hadoop2

export PATH=.:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

:wq # save and exit

source /etc/profile # take effect immediately

Verify

java -version

java version "1.7.0_51"

Java(TM) SE Runtime Environment (build 1.7.0_51-b13)

Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

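3. Configure key-based login

ssh-keygen -t rsa # generate a key pair; press Enter at every prompt. Run this on every machine

scp /root/.ssh/id_rsa.pub root@C6H1:/root/C6H2_key # copy the public keys from C6H2 and C6H3 to C6H1 (on C6H3, use /root/C6H3_key as the destination)

On C6H1:

cat /root/.ssh/id_rsa.pub > /root/.ssh/authorized_keys

cat /root/C6H2_key >> /root/.ssh/authorized_keys # >> appends; a single > overwrites the file

cat /root/C6H3_key >> /root/.ssh/authorized_keys

Copy /root/.ssh/authorized_keys from C6H1 into /root/.ssh/ on C6H2 and C6H3. The machines can then log in to one another without a password.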
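
The manual cat steps above are easy to get wrong, because a single > instead of >> wipes keys that were already collected. As a hedged alternative, the sketch below merges any number of public key files into one authorized_keys file without duplicates and tightens its permissions, since sshd commonly ignores key files with loose permissions. `merge_pubkeys` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: merge public keys into an authorized_keys file, skipping duplicates.
# merge_pubkeys is a hypothetical helper introduced for illustration.
merge_pubkeys() {
    local out="$1"; shift
    local tmp
    tmp="$(mktemp)"
    # Read the new key files plus any existing entries, dropping blank
    # lines and keeping only the first copy of each key line.
    cat "$@" "$out" 2>/dev/null | awk 'NF && !seen[$0]++' > "$tmp"
    mv "$tmp" "$out"
    chmod 600 "$out"
}

# Example usage on C6H1, with the file names used in this section:
#   merge_pubkeys /root/.ssh/authorized_keys \
#       /root/.ssh/id_rsa.pub /root/C6H2_key /root/C6H3_key
```

The merged file can then be copied to C6H2 and C6H3 with scp as described above.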

4. Configure core-site.xml

<configuration>

<!-- Cluster name (the HA nameservice used as the default file system) -->

<property>

<name>fs.defaultFS</name>

<value>hdfs://cluster1</value>

</property>

<!-- Base storage directory; by default the NameNode and DataNode data are stored under this directory -->

<property>

<name>hadoop.tmp.dir</name>

<value>/data/dfs/hadoop</value>

</property>


</configuration>
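
A quick way to confirm what a config file actually contains is to read a property back out of it. The sketch below is a line-based approximation, not a real XML parser: it assumes each <name> and <value> element sits on a single line, as in the listing above. `get_hadoop_prop` is a hypothetical helper name. Once Hadoop is on the PATH, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop itself resolves.

```shell
#!/usr/bin/env bash
# Sketch: print the <value> that follows a given <name> in a Hadoop XML file.
# get_hadoop_prop is a hypothetical helper; it assumes one element per line.
get_hadoop_prop() {
    local file="$1" key="$2"
    awk -v key="$key" '
        index($0, "<name>" key "</name>") { found = 1 }
        found && match($0, /<value>[^<]*<\/value>/) {
            # Drop the 7-character "<value>" prefix and 8-character "</value>" suffix.
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$file"
}

# Example usage:
#   get_hadoop_prop /usr/local/hadoop2/etc/hadoop/core-site.xml fs.defaultFS
```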

5. Configure hdfs-site.xml

<configuration>

<!-- Replication factor; the default is 3 -->

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

<!-- Cluster (nameservice) name -->

<property>

<name>dfs.nameservices</name>

<value>cluster1</value>

</property>

<!-- NameNode IDs in the cluster -->

<property>

<name>dfs.ha.namenodes.cluster1</name>

<value>C6H1,C6H2</value>

</property>

<!-- RPC address and port of the NameNode on C6H1 -->

<property>

<name>dfs.namenode.rpc-address.cluster1.C6H1</name>

<value>C6H1:9000</value>

</property>

<!-- RPC address and port of the NameNode on C6H2 -->

<property>

<name>dfs.namenode.rpc-address.cluster1.C6H2</name>

<value>C6H2:9000</value>

</property>

<!-- HTTP address and port of the NameNode on C6H1 -->

<property>

<name>dfs.namenode.http-address.cluster1.C6H1</name>

<value>C6H1:50070</value>

</property>

<!-- HTTP address and port of the NameNode on C6H2 -->

<property>

<name>dfs.namenode.http-address.cluster1.C6H2</name>

<value>C6H2:50070</value>

</property>

<!-- NameNode metadata (the shared edit log) is stored in the JournalNode cluster -->

<property>

<name>dfs.namenode.shared.edits.dir</name>

<value>qjournal://C6H1:8485;C6H2:8485;C6H3:8485/cluster1</value>

</property>

<!-- Class that clients use to find the active NameNode when cluster1 fails over -->

<property>

<name>dfs.client.failover.proxy.provider.cluster1</name>

<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

</property>

<!-- Fencing method used during NameNode failover; SSH is used here -->

<property>

<name>dfs.ha.fencing.methods</name>

<value>sshfence</value>

</property>

<!-- Location of the SSH private key -->

<property>

<name>dfs.ha.fencing.ssh.private-key-file</name>

<value>/root/.ssh/id_rsa</value>

</property>

<!-- Local disk path where each JournalNode stores the shared NameNode edits -->

<property>

<name>dfs.journalnode.edits.dir</name>

<value>/data/dfs/journal</value>

</property>

<!-- Local disk path for NameNode storage -->

<property>

<name>dfs.namenode.name.dir</name>

<value>/data/dfs/name</value>

</property>

<!-- Local disk path for DataNode storage -->

<property>

<name>dfs.datanode.data.dir</name>

<value>/data/dfs/data</value>

</property>

<!-- Enable WebHDFS (web access to the file system) -->

<property>

<name>dfs.webhdfs.enabled</name>

<value>true</value>

</property>

</configuration>
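
The storage paths configured above (/data/dfs/hadoop, /data/dfs/name, /data/dfs/data, and /data/dfs/journal) all sit under /data/dfs. The Hadoop daemons can generally create their own directories, so pre-creating them is optional; doing it up front mainly makes ownership and free space easy to check on each node. This is a minimal sketch, with `make_dfs_dirs` as a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: create the storage directories referenced in core-site.xml and
# hdfs-site.xml. make_dfs_dirs is a hypothetical helper for illustration.
make_dfs_dirs() {
    local base="${1:-/data/dfs}"
    local d
    for d in hadoop name data journal; do
        mkdir -p "$base/$d"
    done
}

# Example usage (as root, on every node):
#   make_dfs_dirs /data/dfs
```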

6. Configure mapred-site.xml

<configuration>

<!-- Unlike Hadoop 1, MapReduce is configured here to run on YARN -->

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

</configuration>

7. Configure yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->

<!-- ResourceManager host; only one can be set here, so it is a single point of failure -->

<property>

<name>yarn.resourcemanager.hostname</name>

<value>C6H1</value>

</property>

<!-- Auxiliary service: mapreduce_shuffle -->

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

</configuration>

8. Configure yarn-env.sh

export JAVA_HOME=/usr/local/jdk # JAVA_HOME path used by Hadoop

9. Configure mapred-env.sh

export JAVA_HOME=/usr/local/jdk # JAVA_HOME path used by Hadoop

10. Configure hadoop-env.sh

export JAVA_HOME=/usr/local/jdk # JAVA_HOME path used by Hadoop
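
Steps 8 to 10 put the same line into three files, so they can be scripted. The sketch below removes any existing uncommented `export JAVA_HOME=` line and writes the new one at the top of each file, so that it is set before anything later in the file reads it. `set_java_home` is a hypothetical helper name; the file names are the ones used in the steps above.

```shell
#!/usr/bin/env bash
# Sketch: set JAVA_HOME in hadoop-env.sh, yarn-env.sh and mapred-env.sh.
# set_java_home is a hypothetical helper introduced for illustration.
set_java_home() {
    local conf_dir="$1" java_home="$2" f tmp
    for f in hadoop-env.sh yarn-env.sh mapred-env.sh; do
        tmp="$(mktemp)"
        {
            # New export first, then the old file minus earlier exports.
            printf 'export JAVA_HOME=%s\n' "$java_home"
            grep -v '^[[:space:]]*export[[:space:]][[:space:]]*JAVA_HOME=' \
                "$conf_dir/$f" 2>/dev/null
        } > "$tmp"
        # Overwrite in place so the original file keeps its permissions.
        cat "$tmp" > "$conf_dir/$f"
        rm -f "$tmp"
    done
}

# Example usage:
#   set_java_home /usr/local/hadoop2/etc/hadoop /usr/local/jdk
```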

11. Configure slaves

vi /usr/local/hadoop2/etc/hadoop/slaves

C6H1

C6H2

C6H3

One hostname per line

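12. First-time initialization and startup

Initialization differs from Hadoop 1; follow the steps in order. Before formatting again, delete all directories under /data/dfs/, which holds the hadoop.tmp.dir path as well as the NameNode, DataNode, and JournalNode directories configured above.

1. Start the JournalNode on each of the three machines

hadoop-daemon.sh start journalnode

2. Format the NameNode on C6H1

hdfs namenode -format

3. Start the NameNode on C6H1

hadoop-daemon.sh start namenode

4. On C6H2, initialize the other NameNode by synchronizing the NameNode metadata from C6H1.

hdfs namenode -bootstrapStandby

5. Start the other NameNode (on C6H2)

hadoop-daemon.sh start namenode

6. Stop the NameNodes, then start all Hadoop services

stop-all.sh

start-all.sh # for later starts this command alone is enough; the first-time initialization must follow the steps above.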
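
After start-all.sh it is worth checking that each node runs the daemons assigned to it in the deployment plan. The sketch below compares the output of jps (shipped with the JDK) against a list of expected process names and prints whichever ones are missing. `missing_daemons` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the expected Hadoop daemons that jps does not report.
# missing_daemons is a hypothetical helper introduced for illustration.
missing_daemons() {
    local running d
    # jps prints "<pid> <main class>"; keep only the class name column.
    running="$(jps 2>/dev/null | awk '{ print $2 }')"
    for d in "$@"; do
        printf '%s\n' "$running" | grep -qx "$d" || printf '%s\n' "$d"
    done
}

# Example usage, following the deployment plan:
#   on C6H1: missing_daemons NameNode DataNode JournalNode ResourceManager NodeManager
#   on C6H2: missing_daemons NameNode DataNode JournalNode NodeManager
#   on C6H3: missing_daemons DataNode JournalNode NodeManager
```

No output means every listed daemon was found on that node.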

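Switch the active NameNode with an HDFS HA failover (this is a manual failover; automatic failover is not configured in this guide)

hdfs haadmin -failover --forceactive C6H1 C6H2

Failover from C6H1 to C6H2 successful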
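
To see which NameNode is active before or after a failover, `hdfs haadmin -getServiceState` can be queried for each NameNode ID. A minimal sketch, assuming the IDs C6H1 and C6H2 from dfs.ha.namenodes.cluster1; `show_ha_state` is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Sketch: print the HA state (active or standby) of each NameNode ID.
# show_ha_state is a hypothetical helper introduced for illustration.
show_ha_state() {
    local nn state
    for nn in "$@"; do
        # Fall back to a placeholder if the NameNode cannot be queried.
        state="$(hdfs haadmin -getServiceState "$nn" 2>/dev/null)" || state="unreachable"
        printf '%s: %s\n' "$nn" "$state"
    done
}

# Example usage:
#   show_ha_state C6H1 C6H2
```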

13. Test HDFS

Create a directory from the shell

hadoop fs -mkdir /data

hadoop fs -ls /

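14. Test MapReduce

vi /root/word.text

hello you

hello me

Upload a text file

hadoop fs -put /root/word.text /

Run wordcount from the bundled examples jar

Format: hadoop jar <jar path> wordcount <HDFS input path> <output path> (the output path must not already exist; it is created automatically)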

[root@C6H1 hadoop]# hadoop jar /usr/local/hadoop2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar wordcount /word.text /word_out1

14/03/16 09:36:21 INFO client.RMProxy: Connecting to ResourceManager at C6H1/192.168.1.11:8032

14/03/16 09:36:22 INFO input.FileInputFormat: Total input paths to process : 1

14/03/16 09:36:22 INFO mapreduce.JobSubmitter: number of splits:1

14/03/16 09:36:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1394933446304_0001

14/03/16 09:36:23 INFO impl.YarnClientImpl: Submitted application application_1394933446304_0001

14/03/16 09:36:23 INFO mapreduce.Job: The url to track the job: http://C6H1:8088/proxy/application_1394933446304_0001/

14/03/16 09:36:23 INFO mapreduce.Job: Running job: job_1394933446304_0001

14/03/16 09:36:31 INFO mapreduce.Job: Job job_1394933446304_0001 running in uber mode : false

14/03/16 09:36:31 INFO mapreduce.Job: map 0% reduce 0%

14/03/16 09:36:38 INFO mapreduce.Job: map 100% reduce 0%

14/03/16 09:36:44 INFO mapreduce.Job: map 100% reduce 100%

14/03/16 09:36:45 INFO mapreduce.Job: Job job_1394933446304_0001 completed successfully

14/03/16 09:36:45 INFO mapreduce.Job: Counters: 49

File System Counters

FILE: Number of bytes read=48

FILE: Number of bytes written=173817

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=108

HDFS: Number of bytes written=26

HDFS: Number of read operations=6

HDFS: Number of large read operations=0

HDFS: Number of write operations=2

Job Counters

Launched map tasks=1

Launched reduce tasks=1

Data-local map tasks=1

Total time spent by all maps in occupied slots (ms)=4262

Total time spent by all reduces in occupied slots (ms)=3556

Total time spent by all map tasks (ms)=4262

Total time spent by all reduce tasks (ms)=3556

Total vcore-seconds taken by all map tasks=4262

Total vcore-seconds taken by all reduce tasks=3556

Total megabyte-seconds taken by all map tasks=4364288

Total megabyte-seconds taken by all reduce tasks=3641344

Map-Reduce Framework

Map input records=2

Map output records=4

Map output bytes=34

Map output materialized bytes=48

Input split bytes=90

Combine input records=4

Combine output records=4

Reduce input groups=4

Reduce shuffle bytes=48

Reduce input records=4

Reduce output records=4

Spilled Records=8

Shuffled Maps =1

Failed Shuffles=0

Merged Map outputs=1

GC time elapsed (ms)=152

CPU time spent (ms)=1330

Physical memory (bytes) snapshot=308592640

Virtual memory (bytes) snapshot=1708167168

Total committed heap usage (bytes)=136450048

Shuffle Errors

BAD_ID=0

CONNECTION=0

IO_ERROR=0

WRONG_LENGTH=0

WRONG_MAP=0

WRONG_REDUCE=0

File Input Format Counters

Bytes Read=18

File Output Format Counters

Bytes Written=26
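
Once the job reports success, the word counts can be read back from the output directory. A minimal sketch, assuming the /word_out1 output path used above; reducer output files are named part-r-NNNNN, and the glob is quoted so that the HDFS shell expands it rather than the local shell. `show_job_output` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list a MapReduce output directory and print its reducer output.
# show_job_output is a hypothetical helper introduced for illustration.
show_job_output() {
    local out_dir="$1"
    # List the directory first, then print every reducer part file.
    hadoop fs -ls "$out_dir" && hadoop fs -cat "$out_dir/part-r-*"
}

# Example usage:
#   show_job_output /word_out1
```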

 

This cluster setup is based on the 吴超-沉思录 blog. Please credit the source when reposting. Thank you!
