Setting Up a Personal Hadoop Development Environment on Fedora 18

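1.    Background

This article describes a "personal hadoop" configuration, similar in spirit to "personal condor". The basic goal is to keep configuration files and log files in a single location, with a symlink pointing to the binaries produced by a development build. Data and configuration then persist when the built binaries are updated.

2.    Use cases

1.  Test in a local sandbox without changing the software already installed on the system

2.  A single source for configuration files and log files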

3.    References

Web pages:

http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment

http://vichargrave.com/create-a-hadoop-build-and-development-environment-for-hadoop/

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

http://wiki.apache.org/hadoop/

http://docs.hortonworks.com/CURRENT/index.htm#Appendix/Configuring_Ports/HDFS_Ports.htm

Books:

Hadoop: "The Definitive Guide"

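4.    Disclaimer

1.  This currently uses non-native development steps with Maven dependencies. For details on native packaging, see: https://fedoraproject.org/wiki/Features/Hadoop

2.  The single-node setup steps are listed below

5.    Prerequisites

1.      Configure passwordless ssh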

yum install openssh openssh-clients openssh-server

# generate a public/private key, if you don't already have one

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

chmod 600 ~/.ssh/*

# testing ssh:

ps -ef | grep sshd     # verify sshd is running

ssh localhost          # accept the host key when prompted

sudo passwd root       # Make sure the root has a password
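Hadoop's start scripts log in over ssh without a terminal, so it can help to confirm that the login really is prompt-free. The following is a minimal sketch, not part of the original procedure: `check_passwordless_ssh` and the `SSH_BIN` override are hypothetical names introduced here, and it assumes the host key for localhost has already been accepted once with a plain `ssh localhost`.

```shell
#!/usr/bin/env bash
# Succeeds only if key-based login works without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
# SSH_BIN is a hypothetical override, useful for dry runs.
check_passwordless_ssh() {
    local host="${1:-localhost}"
    "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null
}

if check_passwordless_ssh localhost; then
    echo "passwordless ssh to localhost: OK"
else
    echo "passwordless ssh to localhost: FAILED (check sshd and ~/.ssh permissions)"
fi
```

If the check fails right after the key was generated, the usual suspects are a stopped sshd or permissions on ~/.ssh that are too open.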

2.        Install the other dependencies

yum install cmake git subversion dh-make ant autoconf automake sharutils libtool asciidoc xmlto curl protobuf-compiler gcc-c++ 

3.        Install Java and the development environment

yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel java-1.7.0-openjdk-javadoc *maven*

Edit your .bashrc file:

 export JVM_ARGS="-Xmx1024m -XX:MaxPermSize=512m"
 export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=512m"

Note: the settings above apply to OpenJDK 7 on F18. You can test whether the environment is configured correctly with the following command.

mvn install -Dmaven.test.failure.ignore=true
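Before starting a long Maven build, a quick look at PATH can catch a package that failed to install. This is a rough sketch: `missing_tools` is a hypothetical helper, and the command list is only a partial guess at what the packages above provide (for example, `protoc` from protobuf-compiler and `javac` from the OpenJDK devel package).

```shell
#!/usr/bin/env bash
# Print every command from the argument list that is not on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Commands expected from the packages installed above (a partial list).
missing="$(missing_tools java javac mvn git cmake protoc ant)"
if [ -n "$missing" ]; then
    echo "not found on PATH:" $missing
else
    echo "all build tools found"
fi
```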

6.     Set up "personal-hadoop"

1.        Download and build Hadoop

git clone git://git.apache.org/hadoop-common.git
cd hadoop-common
git checkout -b branch-2.0.4-alpha origin/branch-2.0.4-alpha
mvn clean package -Pdist -DskipTests

2.        Create the sandbox environment

This setup assumes the home directory is /home/tstclair.

cd ~
mkdir personal-hadoop
cd personal-hadoop
mkdir -p conf data name logs/yarn
ln -sf <your-git-loc>/hadoop-dist/target/hadoop-2.0.4-alpha home
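The point of the `home` symlink is that it can be repointed at a newer build while conf, data, and logs stay where they are. The sketch below demonstrates that in a throwaway directory. The build-a and build-b directories are stand-ins invented for the demo, and it assumes an `ln` that supports `-n` (GNU coreutils does). Without `-n`, a second `ln -sf` follows the existing link and creates the new link inside the old build directory instead of replacing `home`.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing real is touched.
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/build-a" "$sandbox/build-b" "$sandbox/personal-hadoop/conf"

cd "$sandbox/personal-hadoop"
echo "my settings" > conf/marker

# First build: point "home" at build-a.
ln -sfn "$sandbox/build-a" home

# After a rebuild: repoint "home" at build-b.
# -n stops ln from following the existing "home" link into build-a.
ln -sfn "$sandbox/build-b" home

link_target="$(readlink home)"
echo "home -> $link_target"
echo "conf/marker still contains: $(cat conf/marker)"
```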

3.        Override your environment variables

Append the following to the .bashrc file in your home directory

# Hadoop env override:

export HADOOP_BASE_DIR=${HOME}/personal-hadoop

export HADOOP_LOG_DIR=${HOME}/personal-hadoop/logs

export HADOOP_PID_DIR=${HADOOP_BASE_DIR}

export HADOOP_CONF_DIR=${HOME}/personal-hadoop/conf

export HADOOP_COMMON_HOME=${HOME}/personal-hadoop/home

export HADOOP_HDFS_HOME=${HADOOP_COMMON_HOME}

export HADOOP_MAPRED_HOME=${HADOOP_COMMON_HOME}

# Yarn env override:

export HADOOP_YARN_HOME=${HADOOP_COMMON_HOME}

export YARN_LOG_DIR=${HADOOP_LOG_DIR}/yarn

#classpath override to search hadoop loc

export CLASSPATH=/usr/share/java/:${HADOOP_COMMON_HOME}/share

#Finally update your PATH

export PATH=${HADOOP_COMMON_HOME}/bin:${HADOOP_COMMON_HOME}/sbin:${HADOOP_COMMON_HOME}/libexec:${PATH}

4.        Verify the steps above

source ~/.bashrc
which hadoop    # verify it should be ${HOME}/personal-hadoop/home/bin  
hadoop -help    # verify classpath is correct.
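Beyond `which hadoop`, it may be worth confirming that each exported variable resolves to a real directory, since a typo in .bashrc otherwise only shows up when a daemon fails to start. A minimal sketch follows. `check_dir_vars` is a hypothetical helper and relies on bash indirect expansion.

```shell
#!/usr/bin/env bash
# Report each named variable that is unset or does not point at a directory.
# Returns non-zero if any check fails.
check_dir_vars() {
    local name value status=0
    for name in "$@"; do
        value="${!name:-}"      # bash indirect expansion
        if [ -z "$value" ]; then
            echo "$name is not set"; status=1
        elif [ ! -d "$value" ]; then
            echo "$name=$value is not a directory"; status=1
        fi
    done
    return "$status"
}

check_dir_vars HADOOP_BASE_DIR HADOOP_LOG_DIR HADOOP_CONF_DIR \
               HADOOP_COMMON_HOME YARN_LOG_DIR \
    && echo "all Hadoop directories resolve" \
    || echo "fix the entries above before continuing"
```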

5.        Create the initial single-source configuration files

Copy the default configuration files

cp ${HADOOP_COMMON_HOME}/etc/hadoop/* ${HADOOP_BASE_DIR}/conf

Update your hdfs-site.xml file:

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

Licensed under the Apache License, Version 2.0 (the "License");

you may not use this file except in compliance with the License.

You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software

distributed under the License is distributed on an "AS IS" BASIS,

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and

limitations under the License. See accompanying LICENSE file.

-->

<!-- Override tstclair with your home directory -->

<configuration>

<property>

<name>fs.default.name</name>

<value>hdfs://localhost/</value>

</property>

<property>

<name>dfs.name.dir</name>

<value>file:///home/tstclair/personal-hadoop/name</value>

</property>

<property>

<name>dfs.http.address</name>

<value>0.0.0.0:50070</value>

</property>

<property>

<name>dfs.data.dir</name>

<value>file:///home/tstclair/personal-hadoop/data</value>

</property>

<property>

<name>dfs.datanode.address</name>

<value>0.0.0.0:50010</value>

</property>

<property>

<name>dfs.datanode.http.address</name>

<value>0.0.0.0:50075</value>

</property>

<property>

<name>dfs.datanode.ipc.address</name>

<value>0.0.0.0:50020</value>

</property>

</configuration>
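The file above hard-codes /home/tstclair and asks the reader to override it. One way to do that without hand-editing is a small sed substitution, sketched below. `retarget_home` is a hypothetical helper, and it assumes the target home path contains neither `|` nor `&`, which would confuse the sed expression. A `.bak` copy of the file is kept next to the original.

```shell
#!/usr/bin/env bash
# Rewrite the hard-coded /home/tstclair prefix in a config file to a given
# home directory, editing the file in place and keeping a .bak backup.
retarget_home() {
    local file="$1" home_dir="${2:-$HOME}"
    sed -i.bak "s|/home/tstclair|${home_dir}|g" "$file"
}

conf="${HADOOP_CONF_DIR:-$HOME/personal-hadoop/conf}/hdfs-site.xml"
if [ -f "$conf" ]; then
    retarget_home "$conf"
    grep -n 'personal-hadoop' "$conf" || true
else
    echo "no hdfs-site.xml at $conf, nothing changed"
fi
```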

Update the mapred-site.xml file

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

Licensed under the Apache License, Version 2.0 (the "License");

you may not use this file except in compliance with the License.

You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software

distributed under the License is distributed on an "AS IS" BASIS,

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and

limitations under the License. See accompanying LICENSE file.

-->

<!-- Update or append these vars -->

<configuration>

<property>

<name>mapreduce.cluster.temp.dir</name>

<value>

</value>

<description>No description</description>

<final>true</final>

</property>

<property>

<name>mapreduce.cluster.local.dir</name>

<value>

</value>

<description>No description</description>

<final>true</final>

</property>

</configuration>

Finally, update the yarn-site.xml file

<?xml version="1.0"?>

<!--

Licensed under the Apache License, Version 2.0 (the "License");

you may not use this file except in compliance with the License.

You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software

distributed under the License is distributed on an "AS IS" BASIS,

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and

limitations under the License. See accompanying LICENSE file.

-->

<configuration>

<!-- Site specific YARN configuration properties -->

<property>

<name>yarn.resourcemanager.resource-tracker.address</name>

<value>localhost:8031</value>

<description>host is the hostname of the resource manager and

port is the port on which the NodeManagers contact the Resource Manager.

</description>

</property>

<property>

<name>yarn.resourcemanager.scheduler.address</name>

<value>localhost:8030</value>

<description>host is the hostname of the resourcemanager and port is the port

on which the Applications in the cluster talk to the Resource Manager.

</description>

</property>

<property>

<name>yarn.resourcemanager.scheduler.class</name>

<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>

<description>In case you do not want to use the default scheduler</description>

</property>

<property>

<name>yarn.resourcemanager.address</name>

<value>localhost:8032</value>

<description>the host is the hostname of the ResourceManager and the port is the port on

which the clients can talk to the Resource Manager. </description>

</property>

<property>

<name>yarn.nodemanager.local-dirs</name>

<value>

</value>

<description>the local directories used by the nodemanager</description>

</property>

<property>

<name>yarn.nodemanager.address</name>

<value>localhost:8034</value>

<description>the nodemanagers bind to this port</description>

</property>

<property>

<name>yarn.nodemanager.resource.memory-mb</name>

<value>10240</value>

<description>the amount of memory on the NodeManager in MB</description>

</property>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce.shuffle</value>

<description>shuffle service that needs to be set for Map Reduce to run </description>

</property>

</configuration>

7.    Start the single-node Hadoop cluster

Format the namenode

hadoop namenode -format
#verify output is correct.
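Formatting wipes the NameNode metadata, so on a sandbox that has been used before it can be worth checking first whether the name directory is already formatted. The sketch below relies on a formatted NameNode directory containing a `current/VERSION` file. `namenode_is_formatted` is a hypothetical helper, and the default path assumes the `name` directory created earlier under personal-hadoop.

```shell
#!/usr/bin/env bash
# A formatted NameNode directory contains current/VERSION.
namenode_is_formatted() {
    [ -f "$1/current/VERSION" ]
}

name_dir="${HADOOP_BASE_DIR:-$HOME/personal-hadoop}/name"
if namenode_is_formatted "$name_dir"; then
    echo "$name_dir is already formatted; skipping format"
else
    echo "$name_dir is not formatted; run: hadoop namenode -format"
fi
```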

Start hdfs:

start-dfs.sh

Open http://localhost:50070 in a browser and check that one live node is reported.

Next, start yarn

start-yarn.sh
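Once both start scripts have run, the JDK's `jps` tool lists the running Java processes, which gives a quick view of which daemons came up. The sketch below compares that listing against the five daemons a single-node setup normally runs (NameNode, DataNode, and SecondaryNameNode from start-dfs.sh; ResourceManager and NodeManager from start-yarn.sh). `missing_daemons` is a hypothetical helper that reads `jps` output on stdin.

```shell
#!/usr/bin/env bash
# Read `jps` output on stdin and print each expected daemon that is absent.
missing_daemons() {
    local listing daemon
    listing="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -w matches whole words, so "NameNode" does not match "SecondaryNameNode".
        printf '%s\n' "$listing" | grep -qw "$daemon" || echo "$daemon"
    done
}

if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
else
    echo "jps not found; it ships with the JDK"
fi
```

No output from `missing_daemons` means all five daemons were found in the listing.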

Check the log files to verify that everything started correctly

Finally, run a MapReduce job to check that Hadoop works correctly
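Reading every log by hand gets tedious, so a recursive grep for the usual log4j severity words can narrow things down. This is a sketch under assumptions: `logs_with_errors` is a hypothetical helper, the daemons are assumed to write `*.log` files under HADOOP_LOG_DIR, and the level names ERROR and FATAL are assumed to appear as whole words.

```shell
#!/usr/bin/env bash
# List *.log files under a directory that contain ERROR or FATAL as a word.
logs_with_errors() {
    grep -rlwE --include='*.log' 'ERROR|FATAL' "$1" 2>/dev/null
}

log_dir="${HADOOP_LOG_DIR:-$HOME/personal-hadoop/logs}"
bad="$(logs_with_errors "$log_dir")"
if [ -n "$bad" ]; then
    echo "errors found in:"
    echo "$bad"
else
    echo "no ERROR/FATAL lines under $log_dir"
fi
```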

cd ${HADOOP_COMMON_HOME}/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.0.4-alpha.jar randomwriter out

Source: http://timothysc.github.io/blog/2013/04/22/personalhadoop/
