https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO

 
 

LZO Compression

General LZO Concepts

LZO is a lossless data compression library that favors speed over compression ratio. See http://www.oberhumer.com/opensource/lzo and http://www.lzop.org for general information about LZO and see Compressed Data Storage for information about compression in Hive.

Imagine a simple data file that has three columns:

  • id
  • first name
  • last name

Let's populate a data file containing 4 records:

19630001 john lennon
19630002 paul mccartney
19630003 george harrison
19630004 ringo starr

Let's call the data file /path/to/dir/names.txt.
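
As a minimal sketch, the sample file can be generated from the shell. This assumes the columns are separated by tabs, which matches the FIELDS TERMINATED BY '\t' clause used in the table definition later on this page; the relative file name names.txt is used here only for illustration.

```shell
# Write the four sample records as tab-separated lines.
# printf reuses the format string for each group of three arguments.
printf '%s\t%s\t%s\n' \
    19630001 john lennon \
    19630002 paul mccartney \
    19630003 george harrison \
    19630004 ringo starr > names.txt
```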

To turn it into an LZO file, run the lzop utility on it; this creates a names.txt.lzo file.

Now copy the file names.txt.lzo to HDFS.
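
The two steps above can be sketched as a small shell function. It assumes lzop and the hadoop client are on the PATH; the function name compress_and_upload is hypothetical and chosen only for this example.

```shell
# Compress a local file with lzop and copy the result to HDFS.
# Usage: compress_and_upload <local-file> <hdfs-dir>
compress_and_upload() {
    local src="$1" hdfs_dir="$2"
    # By default lzop keeps the input file and writes "$src.lzo" next to it.
    lzop "$src" || return 1
    # Copy the compressed file into the target HDFS directory.
    hadoop fs -put "$src.lzo" "$hdfs_dir/"
}
```

For the example file this would be called as: compress_and_upload /path/to/dir/names.txt /path/to/HDFS/dir/containing/lzo/files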

Prerequisites

Lzo/Lzop Installations

lzo and lzop need to be installed on every node in the Hadoop cluster. The details of these installations are beyond the scope of this document.

core-site.xml

Add the following to your core-site.xml:

  • com.hadoop.compression.lzo.LzoCodec
  • com.hadoop.compression.lzo.LzopCodec

For example:

<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>

<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

Next we run the command to create an LZO index file:

hadoop jar /path/to/jar/hadoop-lzo-cdh4-0.4.15-gplextras.jar com.hadoop.compression.lzo.LzoIndexer  /path/to/HDFS/dir/containing/lzo/files

This creates names.txt.lzo.index on HDFS, next to names.txt.lzo.
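
One way to confirm that the indexer ran is to check for an index file next to each .lzo file. The sketch below assumes the index is written to the same HDFS directory with .index appended to the .lzo file name; the function name has_lzo_index is illustrative and not part of hadoop-lzo.

```shell
# Succeed if an index file exists for the given .lzo file on HDFS.
# Usage: has_lzo_index <hdfs-path-to-lzo-file>
has_lzo_index() {
    # "hadoop fs -test -e" exits with status 0 when the path exists.
    hadoop fs -test -e "$1.index"
}
```

For example: has_lzo_index /path/to/HDFS/dir/containing/lzo/files/names.txt.lzo && echo "indexed"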

Table Definition

The following hive -e command creates an LZO-compressed external table:

hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS hive_table_name (column_1 datatype_1......column_N datatype_N)
PARTITIONED BY (partition_col_1 datatype_1 ....col_P datatype_P)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT \"com.hadoop.mapred.DeprecatedLzoTextInputFormat\"
OUTPUTFORMAT \"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat\";"

Note: The double quotes have to be escaped so that the 'hive -e' command works correctly.

See CREATE TABLE and Hive CLI for information about command syntax.
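
For the names.txt.lzo example, a concrete, unpartitioned definition might look like the sketch below. The table name names, the column names, and the column types are assumptions made for illustration, as is the helper name write_names_ddl. Writing the statement to a file and running it with hive -f avoids the quote escaping that hive -e requires, so the class names can be given in plain single quotes.

```shell
# Write a CREATE EXTERNAL TABLE statement for the sample data to a file.
# Usage: write_names_ddl <output-file> <hdfs-location>
write_names_ddl() {
    local out="$1" location="$2"
    # Unquoted heredoc: $location is expanded, '\t' is kept literally.
    cat > "$out" <<EOF
CREATE EXTERNAL TABLE IF NOT EXISTS names (
  id BIGINT,
  first_name STRING,
  last_name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '$location';
EOF
}
```

After write_names_ddl names_lzo.hql /path/to/HDFS/dir/containing/lzo/files, the statement can be run with hive -f names_lzo.hql.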

Hive Queries

Option 1: Directly Create LZO Files

  1. Directly create LZO files as the output of the Hive query.
  2. Generate .lzo.index files for the .lzo files, for example with the LzoIndexer command shown above or with custom Java code.

Hive Query Parameters

SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec
SET hive.exec.compress.output=true
SET mapreduce.output.fileoutputformat.compress=true

For example:

hive -e "SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec; SET hive.exec.compress.output=true; SET mapreduce.output.fileoutputformat.compress=true; <query-string>"

Note: If the data sets are large or the number of output files is large, this option does not work.

Option 2: Write Custom Java to Create LZO Files

  1. Create text files as the output of the Hive query.
  2. Write custom Java code to
    1. convert Hive query generated text files to .lzo files
    2. generate .lzo.index files for the .lzo files generated above

Hive Query Parameters

Prefix the query string with these parameters:

SET hive.exec.compress.output=false
SET mapreduce.output.fileoutputformat.compress=false

For example:

hive -e "SET hive.exec.compress.output=false; SET mapreduce.output.fileoutputformat.compress=false; <query-string>"
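
The page calls for custom Java for the conversion step. As a rough shell alternative for small outputs, the sketch below streams one text file through lzop and then indexes the result with the LzoIndexer class used earlier. The function name lzo_convert is hypothetical, and the sketch assumes that LzoIndexer accepts a single file path as well as a directory.

```shell
# Convert one HDFS text file to .lzo and build its index.
# Usage: lzo_convert <hdfs-text-file> <path-to-hadoop-lzo-jar>
lzo_convert() {
    local src="$1" jar="$2"
    # Stream the file through lzop; "-put -" reads its data from stdin.
    # Only the exit status of the final put is checked here.
    hadoop fs -cat "$src" | lzop -c | hadoop fs -put - "$src.lzo" || return 1
    # Create the .lzo.index file for the new .lzo file.
    hadoop jar "$jar" com.hadoop.compression.lzo.LzoIndexer "$src.lzo"
}
```

Because every byte passes through the local machine, this approach only suits modest data volumes; a distributed job is the better fit for large outputs.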
