Spark source code analysis:

https://yq.aliyun.com/articles/28400?utm_campaign=wenzhang&utm_medium=article&utm_source=QQ-qun&utm_content=m_11999

Spark shuffle:

http://blog.csdn.net/johnny_lee/article/details/22619585

Spark java.lang.OutOfMemoryError: Java heap space

My cluster: 1 master, 11 slaves; each node has 6 GB of memory.

My settings:
spark.executor.memory=4g, -Dspark.akka.frameSize=512
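(For reference, settings like these can also be applied programmatically through SparkConf instead of -D flags; a minimal sketch, assuming a Spark 1.x setup and the values above, with a hypothetical app name:)

import org.apache.spark.{SparkConf, SparkContext}

// Programmatic equivalent of the question's settings (Spark 1.x).
val conf = new SparkConf()
  .setAppName("ImageBundleJob")        // hypothetical; the master URL comes from spark-submit
  .set("spark.executor.memory", "4g")  // per-executor heap size
  .set("spark.akka.frameSize", "512")  // in MB; Spark 1.x used Akka for RPC
val sc = new SparkContext(conf)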
Here is the problem:

First, I read some data (2.19 GB) from HDFS into an RDD:
val imageBundleRDD = sc.newAPIHadoopFile(...)
Second, I do something with this RDD:

val res = imageBundleRDD.map(data => {
  val desPoints = threeDReconstruction(data._2, bg)
  (data._1, desPoints)
})
Last, I write the output to HDFS:

res.saveAsNewAPIHadoopFile(...)
When I run my program, it shows:

.....
14/01/15 21:42:27 INFO cluster.ClusterTaskSetManager: Starting task 1.0:24 as TID 33 on executor 9: Salve7.Hadoop (NODE_LOCAL)
14/01/15 21:42:27 INFO cluster.ClusterTaskSetManager: Serialized task 1.0:24 as 30618515 bytes in 210 ms
14/01/15 21:42:27 INFO cluster.ClusterTaskSetManager: Starting task 1.0:36 as TID 34 on executor 2: Salve11.Hadoop (NODE_LOCAL)
14/01/15 21:42:28 INFO cluster.ClusterTaskSetManager: Serialized task 1.0:36 as 30618515 bytes in 449 ms
14/01/15 21:42:28 INFO cluster.ClusterTaskSetManager: Starting task 1.0:32 as TID 35 on executor 7: Salve4.Hadoop (NODE_LOCAL)
Uncaught error from thread [spark-akka.actor.default-dispatcher-3] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[spark]

I have a few suggestions:

  • If your nodes are configured to have a 6g maximum for Spark (and are leaving a little for other processes), then use 6g rather than 4g: spark.executor.memory=6g. Make sure you're using as much memory as possible by checking the UI (it will say how much memory you're using).
  • Try using more partitions; you should have 2-4 per CPU core. IME increasing the number of partitions is often the easiest way to make a program more stable (and often faster). For huge amounts of data you may need far more than 4 per CPU; I've had to use 8000 partitions in some cases! (A combined configuration sketch follows this list.)
  • Decrease the fraction of memory reserved for caching, using spark.storage.memoryFraction. If you don't use cache() or persist in your code, this might as well be 0. Its default is 0.6, which means you only get 0.4 * 4g of memory for your heap. IME reducing the memory fraction often makes OOMs go away. UPDATE: From Spark 1.6 we apparently no longer need to play with these values; Spark will determine them automatically.
  • Similar to the above, but for the shuffle memory fraction. If your job doesn't need much shuffle memory then set it to a lower value (this might cause your shuffles to spill to disk, which can have a catastrophic impact on speed). Sometimes, when it's a shuffle operation that's OOMing, you need to do the opposite, i.e. set it to something large like 0.8, or make sure you allow your shuffles to spill to disk (that's the default since 1.0.0).
  • Watch out for memory leaks; these are often caused by accidentally closing over objects you don't need in your lambdas. The way to diagnose this is to look for the "task serialized as XXX bytes" lines in the logs; if XXX is larger than a few KB or more than a MB, you may have a memory leak. See http://stackoverflow.com/a/25270600/1586965
  • Related to the above: use broadcast variables if you really do need large objects (see the broadcast sketch after this list).
  • If you are caching large RDDs and can sacrifice some access time, consider serialising the RDD: http://spark.apache.org/docs/latest/tuning.html#serialized-rdd-storage. Or even cache it on disk (which sometimes isn't that bad if you're using SSDs); see the caching sketch after this list.
  • (Advanced) Related to the above: avoid String and heavily nested structures (like Map and nested case classes). If possible, try to use only primitive types and index all non-primitives, especially if you expect a lot of duplicates. Choose WrappedArray over nested structures whenever possible. Or even roll your own serialisation: YOU will have the most information about how to efficiently pack your data into bytes, USE IT!
  • (Bit hacky) Again when caching, consider using a Dataset to cache your structure, as it will use more efficient serialisation. This should be regarded as a hack compared to the previous bullet point. Building your domain knowledge into your algorithm/serialisation can minimise memory/cache space by 100x or 1000x, whereas all a Dataset will likely give you is 2x-5x in memory and 10x compressed (Parquet) on disk.
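Pulling the configuration suggestions together: a minimal sketch of what they might look like in code, assuming the Spark 1.x property names discussed above (the app name, input path, and partition count are purely illustrative):

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical combination of the tuning knobs from the list above (Spark 1.x).
val conf = new SparkConf()
  .setAppName("oom-tuning-sketch")             // hypothetical app name
  .set("spark.executor.memory", "6g")          // use the full per-node budget
  .set("spark.storage.memoryFraction", "0.1")  // little/no cache() => shrink the storage share
  .set("spark.shuffle.memoryFraction", "0.2")  // raise towards 0.8 instead if shuffles OOM
val sc = new SparkContext(conf)

// More, smaller partitions: roughly 2-4 per CPU core across the cluster.
val rdd = sc.textFile("hdfs:///some/input")    // hypothetical input
  .repartition(200)                            // illustrative count, not a recommendation

(As the UPDATE above notes, from Spark 1.6 the two memory fractions are managed automatically, so this sketch mainly applies to 1.x versions like the one in the question.)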
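On the closure and broadcast points: a minimal sketch of replacing a large object, which would otherwise be serialised into every task closure, with a broadcast variable (this reuses sc and rdd from the sketch above; lookupTable is a hypothetical stand-in):

// If a large object is referenced directly inside the lambda, it is serialised
// into every single task, and "task serialized as XXX bytes" grows accordingly.
val lookupTable: Map[String, Int] = Map("a" -> 1, "b" -> 2)  // stands in for something big

// Broadcast it instead: shipped to each executor once and shared by all its tasks.
val bcTable = sc.broadcast(lookupTable)
val counts = rdd.map(line => bcTable.value.getOrElse(line, 0))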
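And on serialised or on-disk caching: a sketch of the storage levels involved (again reusing rdd from above):

import org.apache.spark.storage.StorageLevel

// Serialised in-memory caching: slower to read back, but far more compact
// than the default deserialised MEMORY_ONLY representation.
rdd.persist(StorageLevel.MEMORY_ONLY_SER)

// Or cache on disk entirely; often acceptable when the nodes have SSDs.
// rdd.persist(StorageLevel.DISK_ONLY)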

http://spark.apache.org/docs/1.2.1/configuration.html

EDIT (so I can google this more easily): the following is also indicative of this problem:

java.lang.OutOfMemoryError: GC overhead limit exceeded

Answer 2:

Have a look at the start-up scripts; a Java heap size is set there. It looks like you're not setting this before running the Spark worker.

# Set SPARK_MEM if it isn't already set since we also use it for this process
SPARK_MEM=${SPARK_MEM:-512m}
export SPARK_MEM

# Set JAVA_OPTS to be able to load native libraries and to set heap size
JAVA_OPTS="$OUR_JAVA_OPTS"
JAVA_OPTS="$JAVA_OPTS -Djava.library.path=$SPARK_LIBRARY_PATH"
JAVA_OPTS="$JAVA_OPTS -Xms$SPARK_MEM -Xmx$SPARK_MEM"

You can find the documentation for the deploy scripts here.

 
