Sqoop is a tool designed for efficiently transferring data between RDBMS and HDFS, we can import data from mysql, oracle, and other data bases into HDFS very easily; meanwhile we can dump data into data base from HDFS. For detailed documentation, please refer to sqoop documentation.

Before using Sqoop, please follow steps to setup it correctly.

Sqoop - Import

the following command is used for import

sqoop import (generic-args) (import-args)

given a table named stock_info, and the schema is:

Case 1: we can use below command to import stock_info data to hadoop hdfs file system:

sqoop import --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info --m 1

and the result looks like:

we can verify result in hdfs by running command

hadoop fs -cat /emp/part-m-*

Case 2: sepcify the target directory in hdfs by running the following import command

sqoop import --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info --m 1 --target-dir /temp

then we can verify result by executing the same command as above

Case 3: imcremental import by specifying --incremental, --check-column and --append arguments. Note we should change 'last_chg_date' when applying other tables.

sqoop import --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info --m 1 --target-dir /temp --incremental lastmodified --check-column last_chg_date --append

Case 4: specify target file format as parquet format by adding argument '--as-parquetfile'

sqoop import --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info --m 1 --target-dir /temp --incremental lastmodified --check-column last_chg_date --append --as-parquetfile

Case 5: import all tables

sqoop import-all-tables --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser

Sqoop - Export

export means to dump data from hdfs to mysql, oracle or other data bases, command syntax is like

sqoop export (generic-args) (export-args)

given there are many parquet files under stock_info folder which is imported by sqoop import command incrementally

then we want to dump data back into mysql data base, using the following command

sqoop export --connent jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info --export-dir /user/hlli/stock_info

finally verify data in mysql command line

select * from stock_info;

Incremental importing data

by using linux timer 'crontab' to schedule a job to execute importing periodically.

cd /var/spool/cron

touch hlli (please change hlli to your user name here)

vi hlli

*/ * * * * /usr/lib/sqoop/bin/sqoop import --connect jdbc:mysql://host:port/dbname --username loginuser --password loginuser --table stock_info --m 1 --target-dir /temp --incremental lastmodified --check-column last_chg_date --append --as-parquetfile

if it works, you will receive email in '/var/spool/mail/hlli'; meanwhile we can verify data by running command

hadoop fs -ls /

Commonly used Sqoop commands

sqoop help import

sqoop help export

sqoop help job

sqoop help codegen

sqoop help eval

sqoop help list-tables

sqoop help list-databases

sqoop help import-all-tables

References:

  1. http://sqoop.apache.org/
  2. http://man.linuxde.net/crontab

最新文章

  1. Python学习笔记(四)——编码和字符串
  2. Android性能优化典范(二)
  3. 在虚拟上安装kali
  4. 了解python
  5. Android journey 1@关于编码风格和命名规范
  6. iOS有关截图的操作
  7. 将Excel数据导入MySql
  8. mac 浏览器 强刷快捷键
  9. OC 之 const
  10. 编写C# Windows服务,用于杀死Zsd.exe进程
  11. C# 面向对象 , 继承
  12. csapp lab3 bufbomb 缓存区溢出攻击 《深入理解计算机系统》
  13. java jquery 函数多參数传递
  14. 网络编程应用:基于UDP协议【实现文件下载】--练习
  15. vscode restclient 插件
  16. CICD - Teamcity 配置之二:SSIS Package 自动部署
  17. Java学习-051-Detected both log4j-over-slf4j.jar AND bound slf4j-log4j12.jar on the class path, preempting StackOverflowError
  18. ARGB 颜色取值与透明度对照表
  19. JS笔记(二):对象
  20. unity3d 脚本周期函数

热门文章

  1. [Lua快速了解一下]Lua运行
  2. Java静态变量的用法:伪单例
  3. delphi将图片转换成Base64编码函数
  4. 数据库连接工具HeidiSql介绍(支持MySQL,MariaDB,Microsoft SQL或PostgreSQL)
  5. NFS4 挂载同主机多个目录
  6. Windows10电脑系统时间校准
  7. IPython绘图和可视化---matplotlib 入门
  8. Eclipse中文件夹变成包的解决办法(python版)
  9. XMLHttpRequest 与 Ajax 概要
  10. 洛谷 P1272 重建道路