Spark (Python): an example of creating an RDD from in-memory data:

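# sc is the SparkContext that the pyspark shell creates at startup
myData = ["Alice","Carlos","Frank","Barbara"]
myRdd = sc.parallelize(myData)   # distribute the local list as an RDD
myRdd.take(2)                    # return the first 2 elements to the driver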

----
In [52]: myData = ["Alice","Carlos","Frank","Barbara"]

In [53]: myRdd = sc.parallelize(myData)

In [54]: myRdd.take(2)
17/09/24 02:40:10 INFO spark.SparkContext: Starting job: runJob at PythonRDD.scala:393
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Got job 5 (runJob at PythonRDD.scala:393) with 1 output partitions
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Final stage: ResultStage 5 (runJob at PythonRDD.scala:393)
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Parents of final stage: List()
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Missing parents: List()
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Submitting ResultStage 5 (PythonRDD[32] at RDD at PythonRDD.scala:43), which has no missing parents
17/09/24 02:40:10 INFO storage.MemoryStore: Block broadcast_16 stored as values in memory (estimated size 3.2 KB, free 1767.1 KB)
17/09/24 02:40:10 INFO storage.MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 2.2 KB, free 1769.3 KB)
17/09/24 02:40:10 INFO storage.BlockManagerInfo: Added broadcast_16_piece0 in memory on localhost:33950 (size: 2.2 KB, free: 208.7 MB)
17/09/24 02:40:10 INFO spark.SparkContext: Created broadcast 16 from broadcast at DAGScheduler.scala:1006
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 5 (PythonRDD[32] at RDD at PythonRDD.scala:43)
17/09/24 02:40:10 INFO scheduler.TaskSchedulerImpl: Adding task set 5.0 with 1 tasks
17/09/24 02:40:10 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 5.0 (TID 5, localhost, partition 0,PROCESS_LOCAL, 2028 bytes)
17/09/24 02:40:10 INFO executor.Executor: Running task 0.0 in stage 5.0 (TID 5)
17/09/24 02:40:11 INFO python.PythonRunner: Times: total = 41, boot = 20, init = 14, finish = 7
17/09/24 02:40:11 INFO executor.Executor: Finished task 0.0 in stage 5.0 (TID 5). 979 bytes result sent to driver
17/09/24 02:40:11 INFO scheduler.DAGScheduler: ResultStage 5 (runJob at PythonRDD.scala:393) finished in 0.423 s
17/09/24 02:40:11 INFO scheduler.DAGScheduler: Job 5 finished: runJob at PythonRDD.scala:393, took 0.648315 s
17/09/24 02:40:11 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 5.0 (TID 5) in 423 ms on localhost (1/1)
17/09/24 02:40:11 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool
Out[54]: ['Alice', 'Carlos']

In [55]:
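
The log above reports that the job behind take(2) ran "with 1 output partitions", which suggests that only one partition had to be read to produce the two names. The following is a minimal pure-Python sketch of that idea, not Spark's actual implementation: the helper names split_into_partitions and take are hypothetical, the choice of 2 partitions is an assumption for illustration, and the real scheduler may scan partitions in larger steps than one at a time.

```python
def split_into_partitions(data, num_partitions):
    """Split a list into num_partitions contiguous slices of near-equal size."""
    n = len(data)
    return [
        data[i * n // num_partitions:(i + 1) * n // num_partitions]
        for i in range(num_partitions)
    ]


def take(partitions, n):
    """Gather the first n elements, scanning partitions in order.

    Returns the elements and the number of partitions that were read.
    Scanning stops as soon as n elements have been collected.
    """
    result = []
    scanned = 0
    for part in partitions:
        if len(result) >= n:
            break
        scanned += 1
        # Only pull as many elements as are still missing.
        result.extend(part[:n - len(result)])
    return result, scanned


my_data = ["Alice", "Carlos", "Frank", "Barbara"]
partitions = split_into_partitions(my_data, 2)  # 2 partitions is an assumption
first_two, scanned = take(partitions, 2)

print(partitions)   # [['Alice', 'Carlos'], ['Frank', 'Barbara']]
print(first_two)    # ['Alice', 'Carlos']
print(scanned)      # 1
```

In this model the first partition already holds two elements, so the second partition is never touched, which matches the single-task job seen in the transcript.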
