1.  References

Reference documents:

http://sqoop.apache.org/

http://sqoop.apache.org/docs/1.99.7/admin/Installation.html

2.  Installation Environment

2.1.  Environment Description

A pseudo-distributed environment: CentOS 7.4 with Hadoop 2.7.5

+---------------------------+--------------+-------------------+--------------+
| Hostname                  | NameNode     | SecondaryNameNode | DataNodes    |
+---------------------------+--------------+-------------------+--------------+
| centoshadoop.smartmap.com | 192.168.1.80 | 192.168.1.80      | 192.168.1.80 |
+---------------------------+--------------+-------------------+--------------+

The Hadoop installation directory is /opt/hadoop/hadoop-2.7.5.

3.  Installation

Sqoop consists of two parts: a client and a server. The server is installed on one node of the cluster, and that node then acts as the entry point for all Sqoop clients. Hadoop must be installed on the server node. Clients can be installed on any number of machines, and a machine that runs only the client does not need Hadoop installed.

3.1.  Download Sqoop

http://sqoop.apache.org/

3.2.  Extract Sqoop

Extract the downloaded sqoop-1.99.7-bin-hadoop200.tar.gz into the /opt/hadoop/sqoop-1.99.7 directory.
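The extraction step can be scripted. A sketch, run here against a small stand-in tarball so the commands are self-contained; on the real host substitute sqoop-1.99.7-bin-hadoop200.tar.gz and /opt/hadoop:

```shell
# Build a stand-in tarball with the same top-level layout as the real archive,
# then extract it and rename the result the way /opt/hadoop/sqoop-1.99.7
# would be produced on the server.
mkdir -p staging/sqoop-1.99.7-bin-hadoop200/bin
tar -czf sqoop-demo.tar.gz -C staging sqoop-1.99.7-bin-hadoop200

mkdir -p target
tar -xzf sqoop-demo.tar.gz -C target
mv target/sqoop-1.99.7-bin-hadoop200 target/sqoop-1.99.7
ls target    # -> sqoop-1.99.7
```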

4.  Configuration

4.1.  Modify the Hadoop Configuration

4.1.1.  Modify core-site.xml

[root@centoshadoop hadoop]# vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/core-site.xml

<property>
  <name>hadoop.proxyuser.sqoop2.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.sqoop2.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
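A quick sanity check that the four proxy-user entries actually landed in the file; a sketch against a local sample copy (on the server, point grep at the real core-site.xml path):

```shell
# Sample core-site.xml fragment containing the four proxyuser properties
# added above (one per line so grep -c counts them).
cat > core-site-sample.xml <<'EOF'
<configuration>
  <property><name>hadoop.proxyuser.sqoop2.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.sqoop2.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.root.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.root.groups</name><value>*</value></property>
</configuration>
EOF

# Expect 4 proxyuser properties in total (2 for sqoop2, 2 for root).
grep -c 'hadoop.proxyuser' core-site-sample.xml    # -> 4
```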

4.1.2.  Modify container-executor.cfg

[root@centoshadoop hadoop]# vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/container-executor.cfg

allowed.system.users=sqoop2,hadoop,root

4.1.3.  Modify yarn-site.xml

[root@centoshadoop hadoop]# vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/yarn-site.xml

4.2.  Create a Directory for Third-Party JDBC Drivers

[root@centoshadoop sqoop-1.99.7]# mkdir /opt/hadoop/sqoop-1.99.7/jdbcDriver

[root@centoshadoop sqoop-1.99.7]# chmod g+rwx,o+rwx /opt/hadoop/sqoop-1.99.7/jdbcDriver
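To confirm the permission bits after the chmod, `stat` can print the mode string. Sketched against a scratch directory; on the server the path would be /opt/hadoop/sqoop-1.99.7/jdbcDriver:

```shell
# Create a scratch directory, apply the same chmod as above, and show that
# both group and other end up with rwx.
mkdir -p jdbcDriver-demo
chmod g+rwx,o+rwx jdbcDriver-demo
stat -c '%A' jdbcDriver-demo    # e.g. drwxrwxrwx
```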

4.3.  Upload the Driver to the JDBC Driver Directory

Upload mysql-connector-java-5.1.46.jar to the /opt/hadoop/sqoop-1.99.7/jdbcDriver directory.

4.4.  Modify the profile File

vi /etc/profile

# export JAVA_HOME=/opt/java/jdk1.7.0_45
export JAVA_HOME=/opt/java/jdk1.8.0_171
# export CLASSPATH=.:$JAVA_HOME/lib

export HADOOP_HOME=/opt/hadoop/hadoop-2.7.5
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export PIG_HOME=/opt/hadoop/pig-0.17.0
export HIVE_HOME=/opt/hadoop/hive-2.3.3

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export PATH=$PATH:$PIG_HOME/bin
export PATH=$PATH:$HIVE_HOME/bin

export SQOOP_HOME=/opt/hadoop/sqoop-1.99.7
export PATH=$PATH:$SQOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$SQOOP_HOME/server/lib:$SQOOP_HOME/shell/lib:$SQOOP_HOME/tools/lib
export SQOOP_SERVER_EXTRA_LIB=/opt/hadoop/sqoop-1.99.7/jdbcDriver/

export CLASSPATH=.:$JAVA_HOME/lib:$HIVE_HOME/lib
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:/usr/local/lib:$HADOOP_HOME/lib/native
export JAVA_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_LIBRARY_PATH
export HADOOP_USER_NAME=hadoop

4.5.  Upgrade the JDK to Version 1.8

Switch the JDK to version 1.8 and update every JAVA_HOME-related variable accordingly.
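The edits to /etc/profile only take effect in a reloaded shell. A sketch of the reload-and-verify step, shown against a throwaway profile file so the commands are self-contained; on the host it would be `source /etc/profile` followed by `java -version`:

```shell
# Throwaway profile with the JAVA_HOME switch from section 4.4.
cat > profile-demo.sh <<'EOF'
# export JAVA_HOME=/opt/java/jdk1.7.0_45
export JAVA_HOME=/opt/java/jdk1.8.0_171
export PATH=$PATH:$JAVA_HOME/bin
EOF

# Reload and confirm the new JDK path is active in this shell.
. ./profile-demo.sh
echo "$JAVA_HOME"    # -> /opt/java/jdk1.8.0_171
```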

4.6.  Modify the Sqoop Configuration Files

[root@centoshadoop bin]# vi /opt/hadoop/sqoop-1.99.7/conf/sqoop.properties

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/opt/hadoop/hadoop-2.7.5/etc/hadoop/

[root@centoshadoop bin]# vi /opt/hadoop/sqoop-1.99.7/bin/sqoop.sh

export HADOOP_COMMON_HOME=${HADOOP_HOME}/share/hadoop/common
export HADOOP_HDFS_HOME=${HADOOP_HOME}/share/hadoop/hdfs
export HADOOP_MAPRED_HOME=${HADOOP_HOME}/share/hadoop/mapreduce
export HADOOP_YARN_HOME=${HADOOP_HOME}/share/hadoop/yarn

5.  Start Hadoop

5.1.  Start YARN and HDFS

cd /opt/hadoop/hadoop-2.7.5/sbin

start-all.sh

5.2.  Start the historyserver

cd /opt/hadoop/hadoop-2.7.5/sbin

mr-jobhistory-daemon.sh start historyserver

6.  Verify the Sqoop Installation and Configuration

[root@centoshadoop sqoop-1.99.7]# sqoop2-tool verify

Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/hadoop/sqoop-1.99.7
Sqoop tool executor:
Version: 1.99.7
Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
1   [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
6   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hive-2.3.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.

7.  Initialize the Sqoop Repository

[root@centoshadoop sqoop-1.99.7]# sqoop2-tool upgrade

Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/hadoop/sqoop-1.99.7
Sqoop tool executor:
Version: 1.99.7
Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.UpgradeTool
0   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hive-2.3.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Tool class org.apache.sqoop.tools.tool.UpgradeTool has finished correctly.

[root@centoshadoop sqoop-1.99.7]#

8.  Using the Sqoop Tool

8.1.  Server

8.1.1.  Starting and Stopping sqoop2-server

[root@centoshadoop bin]# cd /opt/hadoop/sqoop-1.99.7/bin/

[root@centoshadoop bin]# sqoop2-server start

Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/hadoop/sqoop-1.99.7
Starting the Sqoop2 server...

[root@centoshadoop bin]# sqoop2-server stop

Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/hadoop/sqoop-1.99.7
Stopping the Sqoop2 server...
Sqoop2 server stopped.

[root@centoshadoop bin]# jps

29780 NameNode
30085 SecondaryNameNode
30247 ResourceManager
31767 SqoopJettyServer
29913 DataNode
32107 Jps
30364 NodeManager
29245 JobHistoryServer

[root@centoshadoop bin]#
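For scripting, the same check the jps listing above does by eye can be automated with grep. A sketch; a captured sample stands in for live output, on a real host pipe `jps` directly into grep:

```shell
# Filter jps output for the Sqoop server process; grep's exit status tells a
# script whether the server is up.
jps_sample='29780 NameNode
31767 SqoopJettyServer
30364 NodeManager'

if echo "$jps_sample" | grep -q 'SqoopJettyServer'; then
  echo 'Sqoop2 server is running'
fi
```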

8.2.  Client

8.2.1.  Start the Interactive Sqoop Shell

[root@centoshadoop bin]# cd /opt/hadoop/sqoop-1.99.7/bin/

[root@centoshadoop bin]# sqoop2-shell

sqoop:000> :exit

8.2.2.  Enable Verbose Error Output

sqoop:000> set option --name verbose --value true

Verbose option was changed to true

8.2.3.  Connect to the Sqoop Server

sqoop:000> set server -host 192.168.1.80

Server is set successfully

# set server --host ubuntu02 --port 12000 --webapp sqoop

sqoop:000> show version --all

client version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
server version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
API versions:
  [v1]

8.2.4.  Create a MySQL Link Object

sqoop:000> show connector

+------------------------+---------+------------------------------------------------------------+----------------------+
|          Name          | Version |                           Class                            | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
+------------------------+---------+------------------------------------------------------------+----------------------+

sqoop:000>

sqoop:000> create link -c generic-jdbc-connector

Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object

Name: mysql-link

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://192.168.1.80:3306/sqooptest?useSSL=false
Username: root
Password: ***
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#

SQL Dialect

Identifier enclose:       # enter a single space here

New link was successfully created with validation status OK and name mysql-link

sqoop:000>

8.2.5.  Create an HDFS Link Object

sqoop:000> create link -connector hdfs-connector

Creating link for connector with name hdfs-connector
Please fill following values to create new link object

Name: hdfs-link

HDFS cluster

URI: hdfs://192.168.1.80:9000
Conf directory:
Additional configs: /opt/hadoop/hadoop-2.7.5/etc/hadoop
There are currently 0 values in the map:
entry#

New link was successfully created with validation status OK and name hdfs-link

sqoop:000>

8.2.6.  Show Link Objects

sqoop:000> show link --all

2 link(s) to show:

link with name mysql-link (Enabled: true, Created by root at 5/23/18 3:48 PM, Updated by root at 5/23/18 3:48 PM)
Using Connector generic-jdbc-connector with name {1}
  Database connection
    Driver class: com.mysql.jdbc.Driver
    Connection String: jdbc:mysql://192.168.1.80:3306/sqooptest?useSSL=false
    Username: root
    Password:
    Fetch Size:
    Connection Properties:
      protocol = tcp
  SQL Dialect
    Identifier enclose: `

link with name hdfs-link (Enabled: true, Created by root at 5/23/18 4:00 PM, Updated by root at 5/23/18 4:00 PM)
Using Connector hdfs-connector with name {1}
  HDFS cluster
    URI: hdfs://192.168.1.80:9000
    Conf directory:
    Additional configs:

sqoop:000>

8.2.7.  Create a Job to Import Data from MySQL into HDFS

8.2.7.1. Prepare the Data

CREATE DATABASE sqooptest CHARACTER SET utf8;

CREATE TABLE topic (
  id int(11) NOT NULL AUTO_INCREMENT,
  name varchar(50) DEFAULT NULL,
  address varchar(100) DEFAULT NULL,
  time datetime DEFAULT NULL,
  content varchar(500) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=UTF8;

INSERT INTO topic(name, address, time, content) VALUES ('11', '111', '2018-05-01 08:00:00', 'hello world');

INSERT INTO topic(name, address, time, content) VALUES ('11', '111', '2018-05-01 08:00:00', NULL);

8.2.7.2. Create the Job (MySQL -> HDFS)

sqoop:000> create job -f mysql-link -t hdfs-link

Creating job for links with from name mysql-link and to name hdfs-link
Please fill following values to create new job object

Name: mysql2hdfs

Database source

Schema name: sqooptest   # required: the database name
Table name: topic        # required: the table name
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column: id
Partition column nullable:
Boundary query:

Incremental read

Check column:
Last value:

Target configuration

Override null value:
Null value:
File format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom codec:
Output directory: /sqoop/sqooptest/mysqltable/    # required
Append mode:

Throttling resources

Extractors:
Loaders:

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#

New job was successfully created with validation status OK and name mysql2hdfs

sqoop:000>

8.2.7.3. Start the Job

[root@centoshadoop sbin]# hadoop fs -mkdir -p /sqoop/sqooptest/mysqltable

[root@centoshadoop sbin]# hadoop fs -chmod -R 777 /sqoop

sqoop:000> show job

+----+------------+-------------------------------------+----------------------------+---------+
| Id |    Name    |           From Connector            |        To Connector        | Enabled |
+----+------------+-------------------------------------+----------------------------+---------+
| 1  | mysql2hdfs | mysql-link (generic-jdbc-connector) | hdfs-link (hdfs-connector) | true    |
+----+------------+-------------------------------------+----------------------------+---------+

sqoop:000>

sqoop:000> set option --name verbose --value true

Verbose option was changed to true

sqoop:000> start job -name mysql2hdfs

Submission details
Job Name: mysql2hdfs
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2018-05-24 08:57:14 CST
Lastly updated by: root
External ID: job_1527123411152_0002
        http://centoshadoop:8088/proxy/application_1527123411152_0002/
Source Connector schema: Schema{name= sqooptest . topic ,columns=[
  FixedPoint{name=id,nullable=true,type=FIXED_POINT,byteSize=4,signed=true},
  Text{name=name,nullable=true,type=TEXT,charSize=null},
  Text{name=address,nullable=true,type=TEXT,charSize=null},
  Date{name=time,nullable=true,type=DATE_TIME,hasFraction=true,hasTimezone=false},
  Text{name=content,nullable=true,type=TEXT,charSize=null}]}
2018-05-24 08:57:14 CST: BOOTING  - Progress is not available

sqoop:000>

8.2.7.4. View the Results

[root@centoshadoop sbin]# hadoop fs -ls /sqoop/sqooptest/mysqltable

Found 1 items
-rw-r--r--   1 root supergroup         97 2018-05-24 08:57 /sqoop/sqooptest/mysqltable/363e7159-49c3-40b2-937e-897acf2ad80b.txt

[root@centoshadoop sbin]# hadoop fs -cat /sqoop/sqooptest/mysqltable/363e7159-49c3-40b2-937e-897acf2ad80b.txt

3,'11','111','2018-05-01 08:00:00.000','hello world'
4,'11','111','2018-05-01 08:00:00.000',NULL

[root@centoshadoop sbin]#
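As the output above shows, the exported file is plain comma-separated text with single-quoted strings and a literal NULL for SQL NULLs. A small awk pass over a sample copied from that output sketches how a downstream script might count rows and NULL content fields; the field position and file name follow this particular example, not a fixed Sqoop contract:

```shell
# Sample lines copied from the hadoop fs -cat output above.
cat > export-sample.txt <<'EOF'
3,'11','111','2018-05-01 08:00:00.000','hello world'
4,'11','111','2018-05-01 08:00:00.000',NULL
EOF

# Column 5 is the content field in this table; count total rows and rows
# where content was exported as the literal string NULL.
awk -F',' '$5 == "NULL" { nulls++ } END { printf "rows=%d nulls=%d\n", NR, nulls + 0 }' export-sample.txt
# -> rows=2 nulls=1
```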
