最近再hue 集群查询任务经常失败,经过几天的观察,终于找到原因,报错如下

Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1514128895713_0770_1_00, diagnostics=[Task failed, taskId=task_1514128895713_0770_1_00_000006, diagnostics=[TaskAttempt 0 failed, info=[Container container_1514128895713_0770_01_000008 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 1 failed, info=[Container container_1514128895713_0770_01_000026 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 2 failed, info=[Container container_1514128895713_0770_01_000036 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 3 failed, info=[Container container_1514128895713_0770_01_000042 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1514128895713_0770_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1514128895713_0770_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:6, Vertex vertex_1514128895713_0770_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1514128895713_0770_1_00, diagnostics=[Task failed, taskId=task_1514128895713_0770_1_00_000006, diagnostics=[TaskAttempt 0 failed, info=[Container container_1514128895713_0770_01_000008 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 1 failed, info=[Container container_1514128895713_0770_01_000026 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 2 failed, info=[Container container_1514128895713_0770_01_000036 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 3 failed, info=[Container container_1514128895713_0770_01_000042 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1514128895713_0770_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1514128895713_0770_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:6, Vertex vertex_1514128895713_0770_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1

分析:

  taskId=task_1514128895713_0770_1_00_000006 失败了几次,失败的原因是container被高优先级的任务抢占了。而task最大的失败次数默认是4.当集群上的任务比较多时,比较容易出现这个问题。还有一个原因我认为container设置的内存太小,我这是1.5G 不过没有修改,我设置了以下参数就解决了:

解决方案:

  

  问题解决了,希望帮到你

  

最新文章

  1. android端,webview内url跳转到app本地
  2. JavaScript思维导图—数组
  3. C4.5(决策树)
  4. addrinfo 结构
  5. Android---表格布局
  6. window常见事件
  7. android 学习随笔二(读写文件)
  8. Ubuntu升级没有声音的解决方法
  9. PAT-乙级-1045. 快速排序(25)
  10. 【转】COCOS2D-X之CCHttpRequest下载图片Demo
  11. 系统的启动模式(启动级别)的改动---使用upstart启动机制的
  12. RSA加密解密及数字签名Java实现--转
  13. C# ---- 串口数据YSI实例
  14. 关于在TabBar 中添加按钮,并通过block 或代理在控制器中实现响应
  15. MASM32快速起步
  16. Tomcat启动:Container StandardContext[] has not been started
  17. Java+Selenium向文本框输入内容以后模仿键盘的"ENTRY"
  18. Openresty 进行限流的方法
  19. Oracle高效分页查询(转)
  20. RockChip RK3326 系统编译问题总结

热门文章

  1. js跳转页面的方法
  2. 前端之BOM,DOM
  3. WPF中Brush类型
  4. 使用xpath爬取猫眼电影排行榜
  5. O006、CPU和内存虚拟化原理
  6. 【Groovy】 Groovy笔记
  7. Hyperledger Fabric 环境搭建(1)
  8. 4G漏洞
  9. deep_learning_tensorflow_get_variable()
  10. CentOS7使用阿里云的yum源