spark 2.1.1

系统中希望监控spark on yarn任务的执行进度,但是监控过程发现提交任务之后执行进度总是10%,直到执行成功或者失败,进度会突然变为100%,很神奇,

下面看spark on yarn任务提交过程:

spark on yarn提交任务时会把mainClass修改为Client

childMainClass = "org.apache.spark.deploy.yarn.Client"

spark-submit过程详见:https://www.cnblogs.com/barneywill/p/9820684.html

下面看Client执行过程:

org.apache.spark.deploy.yarn.Client

  def main(argStrings: Array[String]) {
...
val sparkConf = new SparkConf
// SparkSubmit would use yarn cache to distribute files & jars in yarn mode,
// so remove them from sparkConf here for yarn mode.
sparkConf.remove("spark.jars")
sparkConf.remove("spark.files")
val args = new ClientArguments(argStrings)
new Client(args, sparkConf).run()
... def run(): Unit = {
this.appId = submitApplication()
... def submitApplication(): ApplicationId = {
...
val containerContext = createContainerLaunchContext(newAppResponse)
... private def createContainerLaunchContext(newAppResponse: GetNewApplicationResponse)
: ContainerLaunchContext = {
...
val amClass =
if (isClusterMode) {
Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName
} else {
Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName
}

这里调用过程为Client.main->run->submitApplication->createContainerLaunchContext,然后会设置amClass,最终都会调用到ApplicationMaster,因为ExecutorLauncher内部也是调用ApplicationMaster,如下:

org.apache.spark.deploy.yarn.ExecutorLauncher

object ExecutorLauncher {

  def main(args: Array[String]): Unit = {
ApplicationMaster.main(args)
} }

下面看ApplicationMaster:

org.apache.spark.deploy.yarn.ApplicationMaster

  def main(args: Array[String]): Unit = {
...
SparkHadoopUtil.get.runAsSparkUser { () =>
master = new ApplicationMaster(amArgs, new YarnRMClient)
System.exit(master.run())
}
... final def run(): Int = {
...
if (isClusterMode) {
runDriver(securityMgr)
} else {
runExecutorLauncher(securityMgr)
}
... private def registerAM(
_sparkConf: SparkConf,
_rpcEnv: RpcEnv,
driverRef: RpcEndpointRef,
uiAddress: String,
securityMgr: SecurityManager) = {
...
allocator = client.register(driverUrl,
driverRef,
yarnConf,
_sparkConf,
uiAddress,
historyAddress,
securityMgr,
localResources) allocator.allocateResources()
reporterThread = launchReporterThread()
...
private def launchReporterThread(): Thread = {
// The number of failures in a row until Reporter thread give up
val reporterMaxFailures = sparkConf.get(MAX_REPORTER_THREAD_FAILURES) val t = new Thread {
override def run() {
var failureCount = 0
while (!finished) {
try {
if (allocator.getNumExecutorsFailed >= maxNumExecutorFailures) {
finish(FinalApplicationStatus.FAILED,
ApplicationMaster.EXIT_MAX_EXECUTOR_FAILURES,
s"Max number of executor failures ($maxNumExecutorFailures) reached")
} else {
logDebug("Sending progress")
allocator.allocateResources()
}
...

这里调用过程为ApplicationMaster.main->run,run中会调用runDriver或者runExecutorLauncher,最终都会调用到registerAM,其中会调用YarnAllocator.allocateResources,然后在launchReporterThread中会启动一个thread,其中也会不断调用YarnAllocator.allocateResources,下面看YarnAllocator:

org.apache.spark.deploy.yarn.YarnAllocator

  def allocateResources(): Unit = synchronized {
updateResourceRequests() val progressIndicator = 0.1f
// Poll the ResourceManager. This doubles as a heartbeat if there are no pending container
// requests.
val allocateResponse = amClient.allocate(progressIndicator)

可见这里会设置进度为0.1,即10%,而且是硬编码,所以spark on yarn的执行进度一直为10%,所以想监控spark on yarn的任务进度看来是徒劳的;

最新文章

  1. sql查找最后一个字符匹配
  2. android之旋转的刻度盘
  3. linux添加环境变量
  4. usaco 购买饲料 && 修剪草坪
  5. 安装SQL Server2005出现 IIS警告原因
  6. Struts1和Struts2的区别和对比
  7. 如何用PHP做到页面注册审核
  8. 基于Windows下python3.4.1IDLE常用快捷键小结
  9. Base64算法原理
  10. MSIL实用指南-一维数组的操作
  11. Android高效率编码-第三方SDK详解系列(二)——Bmob后端云开发,实现登录注册,更改资料,修改密码,邮箱验证,上传,下载,推送消息,缩略图加载等功能
  12. ffmypeg 视频处理类库使用方法
  13. 【java爬虫】---爬虫+jsoup轻松爬博客
  14. timestamp时间格式
  15. C语言排序算法学习笔记——交换类排序
  16. about the libiconv.2.dylib
  17. Android Studio开发第一篇QuickStart
  18. spring中MessageSource的配置使用方法3--ResourceBundleMessageSource【转】
  19. Python中fnmatch模块的使用
  20. ubuntu18.04 安装Navicat 解决字体方框问题

热门文章

  1. Python--day01(计算机基础)
  2. 为什么Fourier分析?
  3. 为什么开源外围包安装指导都是按照到/usr/local/目录下,/usr/local与/usr的区别
  4. Python中的 一些常用技巧函数[.join()]
  5. python之内置函数(一)
  6. JS快速排序 希尔排序 归并排序 选择排序
  7. P4783 【模板】矩阵求逆
  8. 【XSY2962】作业 数学
  9. Hdoj 1425.sort 题解
  10. CF1059C Sequence Transformation