Error message:
insert overwrite table t_mobile_mid_use_p_tmp4_rcf
select '201411' as month_id,
a.prov_id, a.city, a.client_imsi, a.os_version,
b.install_status, b.install_date, b.unstall_status, b.unstall_date,
a.label_name, a.package_name, a.app_version, a.app_type_id, a.type_label_name,
b.run_time, monthSpace(b.install_date) as install_days,
a.flow, a.use_time, a.run_count, a.active_days, a.is_from_plugin,
from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss') as load_date
from t_mobile_mid_use_p_tmp3_1_rcf a
join t_mobile_client_p_rcf b on (a.client_imsi = b.client_imsi and a.label_name = b.label_name);
Query ID = ca_20141218152020_9e4ebfa2-f663-47b8-a0cf-5303b9c0e482
Total jobs = 1
14/12/18 15:21:02 WARN conf.Configuration: file:/tmp/ca/hive_2014-12-18_15-20-54_155_1926187970964040123-1/-local-10005/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/12/18 15:21:02 WARN conf.Configuration: file:/tmp/ca/hive_2014-12-18_15-20-54_155_1926187970964040123-1/-local-10005/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
Execution log at: /tmp/ca/ca_20141218152020_9e4ebfa2-f663-47b8-a0cf-5303b9c0e482.log
2014-12-18 03:21:03 Starting to launch local task to process map join; maximum memory = 1065484288
2014-12-18 03:21:08 Processing rows: 200000 Hashtable size: 199999 Memory usage: 112049704 percentage: 0.105
2014-12-18 03:21:09 Processing rows: 300000 Hashtable size: 299999 Memory usage: 160367688 percentage: 0.151
2014-12-18 03:21:10 Processing rows: 400000 Hashtable size: 399999 Memory usage: 209294088 percentage: 0.196
2014-12-18 03:21:11 Processing rows: 500000 Hashtable size: 499999 Memory usage: 257089944 percentage: 0.241
2014-12-18 03:21:12 Processing rows: 600000 Hashtable size: 599999 Memory usage: 305440536 percentage: 0.287
2014-12-18 03:21:14 Processing rows: 700000 Hashtable size: 699999 Memory usage: 347305664 percentage: 0.326
2014-12-18 03:21:14 Processing rows: 800000 Hashtable size: 799999 Memory usage: 403916624 percentage: 0.379
2014-12-18 03:21:16 Processing rows: 900000 Hashtable size: 899999 Memory usage: 452238592 percentage: 0.424
2014-12-18 03:21:16 Processing rows: 1000000 Hashtable size: 999999 Memory usage: 499593552 percentage: 0.469
2014-12-18 03:21:18 Processing rows: 1100000 Hashtable size: 1099999 Memory usage: 547966320 percentage: 0.514
2014-12-18 03:21:19 Processing rows: 1200000 Hashtable size: 1199999 Memory usage: 593792800 percentage: 0.557
2014-12-18 03:21:21 Processing rows: 1300000 Hashtable size: 1299999 Memory usage: 641564688 percentage: 0.602
2014-12-18 03:21:21 Processing rows: 1400000 Hashtable size: 1399999 Memory usage: 690130432 percentage: 0.648
2014-12-18 03:21:21 Processing rows: 1500000 Hashtable size: 1499999 Memory usage: 737340976 percentage: 0.692
2014-12-18 03:21:24 Processing rows: 1600000 Hashtable size: 1599999 Memory usage: 793258352 percentage: 0.745
2014-12-18 03:21:25 Processing rows: 1700000 Hashtable size: 1699999 Memory usage: 841009952 percentage: 0.789
2014-12-18 03:21:25 Processing rows: 1800000 Hashtable size: 1799999 Memory usage: 887464680 percentage: 0.833
2014-12-18 03:21:28 Processing rows: 1900000 Hashtable size: 1899999 Memory usage: 934581288 percentage: 0.877
2014-12-18 03:21:28 Processing rows: 2000000 Hashtable size: 1999999 Memory usage: 984062056 percentage: 0.924
Execution failed with exit status: 3
Obtaining error information
Task failed!
Task ID:
  Stage-5
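The log above already shows why the local task gave up. The query below is a minimal sketch of the arithmetic. It assumes that memory grows roughly linearly with the number of rows. Each hash-table row cost about 490 bytes. The task stopped at 92.4% of its 1,065,484,288-byte heap, which is just past the local-task abort threshold hive.mapjoin.localtask.max.memory.usage (0.90 by default; the value used on this cluster is not shown).

The log does not name the table being hashed, so the query extrapolates to both row counts reported further down. It also assumes a Hive version that accepts SELECT without FROM (0.13 or later).

```sql
-- Back-of-the-envelope estimate from the local-task log (assumes linear growth per row).
-- 984062056 bytes were in use after 2,000,000 hash-table rows; 1065484288 bytes was the max heap.
SELECT
  round(984062056 / 2000000, 1)                             AS approx_bytes_per_row,
  round(984062056 / 1065484288, 3)                          AS heap_fraction_used,
  round(984062056 / 2000000 * 34304843  / pow(1024, 3), 1)  AS approx_gib_if_tmp3_1_hashed,
  round(984062056 / 2000000 * 165830880 / pow(1024, 3), 1)  AS approx_gib_if_client_p_hashed;
```

Either way, the estimate comes out at roughly 16 GiB or 76 GiB. That is far more than the roughly 1 GB available to the local task.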
Official FAQ explanation:

Hive converted a join into a locally running and faster 'mapjoin', but ran out of memory while doing so. There are two bugs responsible for this.

Bug 1: Hive's metric for converting joins miscalculates the required amount of memory. This is especially true for compressed files and ORC files. Hive uses the file size as its metric, but compressed tables need more memory in their uncompressed in-memory representation.
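One way to see this mismatch for the tables in this post is to compare the on-disk size Hive sees with the row count. The sketch below assumes the tables are unpartitioned; a partitioned table needs a PARTITION clause in ANALYZE TABLE. Once statistics exist, DESCRIBE FORMATTED lists table parameters such as totalSize, numRows and rawDataSize.

```sql
-- Collect basic statistics (row count, raw data size) for both join inputs.
ANALYZE TABLE t_mobile_client_p_rcf COMPUTE STATISTICS;
ANALYZE TABLE t_mobile_mid_use_p_tmp3_1_rcf COMPUTE STATISTICS;

-- In the "Table Parameters" section of each output, compare totalSize
-- (bytes on HDFS, i.e. the file size) with numRows and rawDataSize.
DESCRIBE FORMATTED t_mobile_client_p_rcf;
DESCRIBE FORMATTED t_mobile_mid_use_p_tmp3_1_rcf;
```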

The latter option may lead to bug number two if you happen to have an affected Hadoop version.

Bug 2: Hive/Hadoop ignores 'hive.mapred.local.mem'! (More precisely: a bug in Hadoop 2.2, where hadoop-env.cmd sets the -Xmx parameter multiple times, effectively overriding the user-set hive.mapred.local.mem setting.) See:

  • 2) & 3) can be set in Big-Bench/engines/hive/conf/hiveSettings.sql
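Two session settings control the local task that builds the map-join hash table. The values below are illustrative assumptions, not recommendations:

- hive.mapred.local.mem sets the local task's memory in MB. Per the second bug described in the FAQ, an affected Hadoop 2.2 installation may silently override it.
- hive.mapjoin.localtask.max.memory.usage is the heap fraction at which the local task aborts. Its default of 0.90 is consistent with the log above stopping right after reaching 0.924.

```sql
-- Example values only; size them for the memory actually available on the client machine.
-- Memory (in MB) for the local task that builds the map-join hash table.
set hive.mapred.local.mem=4096;
-- Fraction of the local task's heap at which it aborts (0.90 is the default).
set hive.mapjoin.localtask.max.memory.usage=0.90;
```

For tables with tens of millions of rows, raising these limits is unlikely to be enough on its own.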

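Cause:

The join involves t_mobile_client_p_rcf and t_mobile_mid_use_p_tmp3_1_rcf. Going by file size, Hive's optimizer treated t_mobile_client_p_rcf as the small table, even though it has far more rows than t_mobile_mid_use_p_tmp3_1_rcf (see the counts below). Hive therefore tried to load that table's data into an in-memory hash table distributed through the DistributedCache, which caused the out-of-memory failure.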
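To confirm which table Hive picked as the small one, you can inspect the plan before running the statement. In the EXPLAIN output of the MapReduce engine, the alias listed under "Map Reduce Local Work" with a "HashTable Sink Operator" is the one that gets loaded into memory. In the sketch below, the SELECT list is shortened for readability, but the join is the same as in the failing query.

```sql
-- Same join as the failing INSERT, with a shortened column list.
EXPLAIN
SELECT a.client_imsi, a.label_name, b.install_date
FROM t_mobile_mid_use_p_tmp3_1_rcf a
JOIN t_mobile_client_p_rcf b
  ON (a.client_imsi = b.client_imsi AND a.label_name = b.label_name);
```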


======select count(1) from t_mobile_mid_use_p_tmp3_1_rcf;

/**
 *MapReduce Jobs Launched:
 *Job 0: Map: 14  Reduce: 1   Cumulative CPU: 102.42 sec   HDFS Read: 172923550 HDFS Write: 9 SUCCESS
 *Total MapReduce CPU Time Spent: 1 minutes 42 seconds 420 msec
 *OK
 *34304843
 *Time taken: 33.022 seconds, Fetched: 1 row(s)
 */
======select count(*) from t_mobile_client_p_rcf;
/**
 *MapReduce Jobs Launched:
 *Job 0: Map: 5  Reduce: 1   Cumulative CPU: 62.47 sec   HDFS Read: 116257926 HDFS Write: 10 SUCCESS
 *Total MapReduce CPU Time Spent: 1 minutes 2 seconds 470 msec
 *OK
 *165830880
 *Time taken: 37.75 seconds, Fetched: 1 row(s)
 */

Solution:

set hive.auto.convert.join=false; disables automatic conversion of joins to MapJoins (default: true).
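A less drastic alternative to switching the conversion off entirely is to tighten the size thresholds it uses. The values in the sketch below are illustrative:

- hive.mapjoin.smalltable.filesize (default 25000000 bytes) is the file size below which a table is treated as small.
- hive.auto.convert.join.noconditionaltask.size (default 10000000 bytes) bounds the combined size of the small tables when Hive skips the conditional task.

Both are compared against file sizes, so they still cannot account for row counts (the file-size metric described in the FAQ above). For this query, disabling the conversion remains the safer choice.

```sql
-- Illustrative values: only tables whose files are under about 1 MB are converted automatically.
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=1000000;
set hive.auto.convert.join.noconditionaltask.size=1000000;
```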

set hive.ignore.mapjoin.hint=false; stops Hive from ignoring MAPJOIN hints, so hints take effect again (default: true, meaning hints are ignored).
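With both settings applied, you can still request a map join explicitly for a table that really is small, while other joins run as regular shuffle joins. In the sketch below, dim_app_type and its columns are hypothetical and only illustrate the hint syntax. Neither table in this post is small enough to hash in memory.

```sql
set hive.auto.convert.join=false;
set hive.ignore.mapjoin.hint=false;

-- dim_app_type is a hypothetical, genuinely small lookup table.
-- MAPJOIN(d) asks Hive to load alias d into memory and stream alias a.
SELECT /*+ MAPJOIN(d) */
       a.client_imsi, a.label_name, d.app_type_name
FROM t_mobile_mid_use_p_tmp3_1_rcf a
JOIN dim_app_type d ON (a.app_type_id = d.app_type_id);
```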
