1. Comparing cache and persist

Test file for the cache examples (a 68M access.log):

-rw-r--r--@ 1 hadoop staff 68M 5 17 07:04 access.log

cache/persist is lazy: nothing is cached until an action is executed on the RDD.
unpersist executes eagerly: the cached blocks are removed immediately.
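A quick spark-shell sketch of that behavior, using the access.log file listed above (the path is an assumption; any sizable file works):

val lines = sc.textFile("access.log")  // path assumed
lines.cache()                          // lazy: only the storage level is recorded, nothing is cached yet
lines.count()                          // the first action materializes the cache (visible in the web UI's Storage tab)
lines.count()                          // subsequent actions read from the cache
lines.unpersist(true)                  // eager: blocks are removed from memory/disk immediately

The relevant Spark source, StorageLevel plus the RDD persist/cache/unpersist methods, is reproduced below: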

@DeveloperApi
class StorageLevel private(
    private var _useDisk: Boolean,
    private var _useMemory: Boolean,
    private var _useOffHeap: Boolean,
    private var _deserialized: Boolean,
    private var _replication: Int = 1)
  extends Externalizable { }

/**
 * Various [[org.apache.spark.storage.StorageLevel]] defined and utility functions for creating
 * new storage levels.
 */
object StorageLevel {
  val NONE = new StorageLevel(false, false, false, false)
  val DISK_ONLY = new StorageLevel(true, false, false, false)
  val DISK_ONLY_2 = new StorageLevel(true, false, false, false, 2)
  val MEMORY_ONLY = new StorageLevel(false, true, false, true)
  val MEMORY_ONLY_2 = new StorageLevel(false, true, false, true, 2)
  val MEMORY_ONLY_SER = new StorageLevel(false, true, false, false)
  val MEMORY_ONLY_SER_2 = new StorageLevel(false, true, false, false, 2)
  val MEMORY_AND_DISK = new StorageLevel(true, true, false, true)
  val MEMORY_AND_DISK_2 = new StorageLevel(true, true, false, true, 2)
  val MEMORY_AND_DISK_SER = new StorageLevel(true, true, false, false)
  val MEMORY_AND_DISK_SER_2 = new StorageLevel(true, true, false, false, 2)
  val OFF_HEAP = new StorageLevel(true, true, true, false, 1)
}

/**
 * Persist this RDD with the default storage level (`MEMORY_ONLY`).
 */
def persist(): this.type = persist(StorageLevel.MEMORY_ONLY)

/**
 * Persist this RDD with the default storage level (`MEMORY_ONLY`).
 */
def cache(): this.type = persist()

/**
 * Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
 *
 * @param blocking Whether to block until all blocks are deleted.
 * @return This RDD.
 */
def unpersist(blocking: Boolean = true): this.type = {
  logInfo("Removing RDD " + id + " from persistence list")
  sc.unpersistRDD(id, blocking)
  storageLevel = StorageLevel.NONE
  this
}

/** Get the RDD's current storage level, or StorageLevel.NONE if none is set. */
def getStorageLevel: StorageLevel = storageLevel
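As the source shows, cache() is just persist(StorageLevel.MEMORY_ONLY), and the predefined storage levels differ only in the five constructor flags (disk, memory, off-heap, deserialized, replication). A minimal sketch of exercising this API in spark-shell (again assuming the access.log path):

import org.apache.spark.storage.StorageLevel

val lines = sc.textFile("access.log")
lines.getStorageLevel                         // StorageLevel.NONE until persist/cache is called
lines.persist(StorageLevel.MEMORY_AND_DISK)   // keep in memory, spill partitions that do not fit to disk
lines.count()                                 // materialize the cache
lines.getStorageLevel                         // now reports useDisk = true, useMemory = true, deserialized = true
lines.unpersist(true)                         // drops the blocks and resets the level to StorageLevel.NONE

Note that an RDD's storage level can only be assigned once; to switch levels, unpersist first and then persist again.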

2. Serialization test: Java vs. Kryo

Serialization scenarios to compare:
default Java serialization of class User
Kryo serialization without registering class User
Kryo serialization with class User registered



Default Java serialization of class User

import scala.collection.mutable.ListBuffer
import org.apache.spark.storage.StorageLevel

class User(id: Int, username: String, age: String) extends Serializable

val users = new ListBuffer[User]
for (i <- 1 to 1000000) {
  users += new User(i, "name" + i, i.toString)
}

val usersRDD = sc.parallelize(users)
usersRDD.persist(StorageLevel.MEMORY_ONLY_SER)  // serialized cache; Java serialization is the default encoder
usersRDD.foreach(println(_))                    // action materializes the cache
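To compare the cached size across the three scenarios, check the Storage tab of the web UI, or read the same numbers programmatically; a small sketch using sc.getRDDStorageInfo:

// Print the storage level and in-memory size of every cached RDD.
sc.getRDDStorageInfo.foreach { info =>
  println(s"${info.name}: ${info.storageLevel}, memSize=${info.memSize} bytes")
}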

Kryo serialization without registering class User

import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf()
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
// Create the SparkContext from this conf (or restart spark-shell with --conf spark.serializer=...),
// then rerun the persist test above.

Without registration Kryo still works, but it has to write the full class name with every object, so the serialized cache is larger than it needs to be.

Kryo serialization with class User registered

sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
sparkConf.registerKryoClasses(Array(classOf[User]))  // registered classes are written as small numeric IDs instead of full class names
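Putting the third scenario together as a self-contained sketch (the object name, master URL, and the use of sc.getRDDStorageInfo for size comparison are my additions, not part of the original test):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel
import scala.collection.mutable.ListBuffer

class User(id: Int, username: String, age: String) extends Serializable

object KryoPersistTest {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
      .setAppName("KryoPersistTest")
      .setMaster("local[2]")                       // assumption: local run
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[User]))   // register User so Kryo does not write full class names

    val sc = new SparkContext(sparkConf)

    val users = new ListBuffer[User]
    for (i <- 1 to 1000000) {
      users += new User(i, "name" + i, i.toString)
    }

    val usersRDD = sc.parallelize(users)
    usersRDD.persist(StorageLevel.MEMORY_ONLY_SER) // serialized cache, now encoded by Kryo
    println(usersRDD.count())                      // action materializes the cache

    // Compare memSize here with the Java-serialization run above (the Storage tab shows the same numbers).
    sc.getRDDStorageInfo.foreach(info => println(s"${info.storageLevel} memSize=${info.memSize}"))

    sc.stop()
  }
}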
