Write Custom Java to Create LZO Files
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO
- Created by Lefty Leverenz, last modified on Sep 19, 2017
LZO Compression
General LZO Concepts
LZO is a lossless data compression library that favors speed over compression ratio. See http://www.oberhumer.com/opensource/lzo and http://www.lzop.org for general information about LZO and see Compressed Data Storage for information about compression in Hive.
Imagine a simple data file that has three columns:
- id
- first name
- last name
Let's populate a data file containing 4 records:
19630001 john lennon
19630002 paul mccartney
19630003 george harrison
19630004 ringo starr
Let's call the data file /path/to/dir/names.txt.
To make it into an LZO file, we can use the lzop utility, which creates a names.txt.lzo file.
Now copy the file names.txt.lzo to HDFS.
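The three steps above (create the file, compress it, copy it to HDFS) can be sketched as a short shell script. This is only an illustration under stated assumptions: WORK_DIR and HDFS_DIR are placeholders invented for the example, the fields are written tab-separated to match a tab-delimited table definition, and the lzop and hadoop steps are skipped when those tools are not on the PATH.

```shell
#!/bin/sh
# Sketch: build the sample file, compress it with lzop, copy it to HDFS.
# WORK_DIR and HDFS_DIR are placeholders chosen for this example.
WORK_DIR="${WORK_DIR:-$(mktemp -d)}"
HDFS_DIR="${HDFS_DIR:-/path/to/HDFS/dir/containing/lzo/files}"
DATA_FILE="$WORK_DIR/names.txt"

# Write the four records, tab-separated (an assumption of this sketch).
# printf reuses the format string for every group of three arguments.
printf '%s\t%s\t%s\n' \
  19630001 john lennon \
  19630002 paul mccartney \
  19630003 george harrison \
  19630004 ringo starr > "$DATA_FILE"

# lzop keeps the input file and writes names.txt.lzo next to it.
if command -v lzop >/dev/null 2>&1; then
  lzop -f "$DATA_FILE"
else
  echo "lzop not found; skipping compression" >&2
fi

# Copy the compressed file to HDFS when both the file and hadoop exist.
if [ -f "$DATA_FILE.lzo" ] && command -v hadoop >/dev/null 2>&1; then
  hadoop fs -put "$DATA_FILE.lzo" "$HDFS_DIR/"
fi
```

Unlike gzip, lzop leaves names.txt in place, so the uncompressed copy can be removed by hand once the upload has been checked.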
Prerequisites
Lzo/Lzop Installations
lzo and lzop need to be installed on every node in the Hadoop cluster. The details of these installations are beyond the scope of this document.
core-site.xml
Add the following to your core-site.xml:
com.hadoop.compression.lzo.LzoCodec
com.hadoop.compression.lzo.LzopCodec
For example:
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
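A quick way to confirm that the setting took effect is to read the effective value back and check that both LZO classes are listed. The sketch below assumes the hdfs command is on the PATH of a configured node; has_lzo_codecs is a helper invented for this example, not part of Hadoop.

```shell
#!/bin/sh
# Sketch: check that a codec list names both LZO codec classes.
# has_lzo_codecs is a helper made up for this example.
has_lzo_codecs() {
  # Wrap the list in commas so each class name must match a whole entry.
  case ",$1," in
    *,com.hadoop.compression.lzo.LzoCodec,*) ;;
    *) return 1 ;;
  esac
  case ",$1," in
    *,com.hadoop.compression.lzo.LzopCodec,*) ;;
    *) return 1 ;;
  esac
  return 0
}

# Read the effective value from the cluster configuration when possible.
if command -v hdfs >/dev/null 2>&1; then
  # Strip whitespace in case the value is wrapped across lines in the XML.
  codecs=$(hdfs getconf -confKey io.compression.codecs | tr -d '[:space:]')
  if has_lzo_codecs "$codecs"; then
    echo "LZO codecs are configured"
  else
    echo "LZO codecs are missing from io.compression.codecs" >&2
  fi
fi
```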
Next we run the command to create an LZO index file:
hadoop jar /path/to/jar/hadoop-lzo-cdh4-0.4.15-gplextras.jar com.hadoop.compression.lzo.LzoIndexer /path/to/HDFS/dir/containing/lzo/files
This creates names.txt.lzo.index on HDFS, next to names.txt.lzo.
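To check that indexing worked, look for the index file beside the compressed file. The sketch below relies on hadoop-lzo's convention of appending .index to the .lzo file name; index_path_for is a helper invented for this example, and the HDFS path is the same placeholder used in the indexer command.

```shell
#!/bin/sh
# Sketch: derive the expected index path and test for it on HDFS.
# index_path_for is a helper made up for this example.
index_path_for() {
  printf '%s.index\n' "$1"
}

LZO_FILE="/path/to/HDFS/dir/containing/lzo/files/names.txt.lzo"
INDEX_FILE=$(index_path_for "$LZO_FILE")

# "hadoop fs -test -e" exits with 0 when the path exists.
if command -v hadoop >/dev/null 2>&1; then
  if hadoop fs -test -e "$INDEX_FILE"; then
    echo "index present: $INDEX_FILE"
  else
    echo "index missing: $INDEX_FILE" >&2
  fi
fi
```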
Table Definition
The following hive -e command creates an LZO-compressed external table:
hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS hive_table_name (column_1 datatype_1......column_N datatype_N)
PARTITIONED BY (partition_col_1 datatype_1 ....col_P datatype_P)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT \"com.hadoop.mapred.DeprecatedLzoTextInputFormat\"
OUTPUTFORMAT \"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat\";"
Note: The double quotes have to be escaped so that the 'hive -e' command works correctly.
See CREATE TABLE and Hive CLI for information about command syntax.
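For the names example, one way to avoid the quote escaping is to keep the DDL in a file and run it with hive -f. The sketch below makes several assumptions that are not in the template: the table name names, the column names and types, the absence of partitions, and a LOCATION clause pointing at the placeholder HDFS directory that holds the .lzo files.

```shell
#!/bin/sh
# Sketch: keep the DDL in a file so no quote escaping is needed.
# Table name, column names, and LOCATION are assumptions of this example.
HQL_FILE="${HQL_FILE:-$(mktemp)}"

# The quoted 'EOF' stops the shell from expanding anything in the DDL.
cat > "$HQL_FILE" <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS names (
  id BIGINT,
  first_name STRING,
  last_name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
          OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION '/path/to/HDFS/dir/containing/lzo/files';
EOF

# Run the DDL only when the Hive CLI is available.
if command -v hive >/dev/null 2>&1; then
  hive -f "$HQL_FILE"
else
  echo "hive not found; DDL left in $HQL_FILE" >&2
fi
```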
Hive Queries
Option 1: Directly Create LZO Files
- Directly create LZO files as the output of the Hive query.
- Use the lzop command utility or your custom Java to generate .lzo.index files for the .lzo files.
Hive Query Parameters
SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec
SET hive.exec.compress.output=true
SET mapreduce.output.fileoutputformat.compress=true
For example:
hive -e "SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec; SET hive.exec.compress.output=true;SET mapreduce.output.fileoutputformat.compress=true; <query-string>"
Note: This option does not work if the data sets are large or the number of output files is large.
Option 2: Write Custom Java to Create LZO Files
- Create text files as the output of the Hive query.
- Write custom Java code to
  - convert Hive query generated text files to .lzo files
  - generate .lzo.index files for the .lzo files generated above
Hive Query Parameters
Prefix the query string with these parameters:
SET hive.exec.compress.output=false
SET mapreduce.output.fileoutputformat.compress=false
For example:
hive -e "SET hive.exec.compress.output=false;SET mapreduce.output.fileoutputformat.compress=false;<query-string>"
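The conversion and indexing steps of Option 2 can be prototyped before any Java is written. The sketch below is not the custom Java this option describes: it swaps in the lzop utility and the LzoIndexer class from the hadoop-lzo jar, driven from the shell. The HDFS directories, the file name 000000_0, and the helpers run and convert_and_index are assumptions made for the example. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch of the Option 2 workflow using lzop and LzoIndexer in place of
# custom Java. All paths below are placeholders chosen for this example.
HDFS_TEXT_DIR="${HDFS_TEXT_DIR:-/path/to/HDFS/dir/containing/text/files}"
HDFS_LZO_DIR="${HDFS_LZO_DIR:-/path/to/HDFS/dir/containing/lzo/files}"
LZO_JAR="${LZO_JAR:-/path/to/jar/hadoop-lzo-cdh4-0.4.15-gplextras.jar}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

# run prints a command in dry-run mode and executes it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# convert_and_index handles one uncompressed Hive output file by name:
# fetch it, compress it locally, upload the .lzo file, then index it.
convert_and_index() {
  name=$1
  tmp=$(mktemp -d)
  run hadoop fs -get "$HDFS_TEXT_DIR/$name" "$tmp/$name"
  run lzop "$tmp/$name"
  run hadoop fs -put "$tmp/$name.lzo" "$HDFS_LZO_DIR/"
  run hadoop jar "$LZO_JAR" com.hadoop.compression.lzo.LzoIndexer \
    "$HDFS_LZO_DIR/$name.lzo"
  rm -rf "$tmp"
}

# 000000_0 is a hypothetical name for one Hive output file.
convert_and_index 000000_0
```

Setting DRY_RUN=0 runs the commands for real. Because every file passes through local disk on a single machine, this approach only suits small outputs.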