Spark(Python) 从内存中建立 RDD 的例子
Spark(Python) 从内存中建立 RDD 的例子:
myData = ["Alice","Carlos","Frank","Barbara"]
myRdd = sc.parallelize(myData)
myRdd.take(2)
----
In [52]: myData = ["Alice","Carlos","Frank","Barbara"]
In [53]: myRdd = sc.parallelize(myData)
In [54]: myRdd.take(2)
17/09/24 02:40:10 INFO spark.SparkContext: Starting job: runJob at PythonRDD.scala:393
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Got job 5 (runJob at PythonRDD.scala:393) with 1 output partitions
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Final stage: ResultStage 5 (runJob at PythonRDD.scala:393)
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Parents of final stage: List()
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Missing parents: List()
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Submitting ResultStage 5 (PythonRDD[32] at RDD at PythonRDD.scala:43), which has no missing parents
17/09/24 02:40:10 INFO storage.MemoryStore: Block broadcast_16 stored as values in memory (estimated size 3.2 KB, free 1767.1 KB)
17/09/24 02:40:10 INFO storage.MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 2.2 KB, free 1769.3 KB)
17/09/24 02:40:10 INFO storage.BlockManagerInfo: Added broadcast_16_piece0 in memory on localhost:33950 (size: 2.2 KB, free: 208.7 MB)
17/09/24 02:40:10 INFO spark.SparkContext: Created broadcast 16 from broadcast at DAGScheduler.scala:1006
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 5 (PythonRDD[32] at RDD at PythonRDD.scala:43)
17/09/24 02:40:10 INFO scheduler.TaskSchedulerImpl: Adding task set 5.0 with 1 tasks
17/09/24 02:40:10 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 5.0 (TID 5, localhost, partition 0,PROCESS_LOCAL, 2028 bytes)
17/09/24 02:40:10 INFO executor.Executor: Running task 0.0 in stage 5.0 (TID 5)
17/09/24 02:40:11 INFO python.PythonRunner: Times: total = 41, boot = 20, init = 14, finish = 7
17/09/24 02:40:11 INFO executor.Executor: Finished task 0.0 in stage 5.0 (TID 5). 979 bytes result sent to driver
17/09/24 02:40:11 INFO scheduler.DAGScheduler: ResultStage 5 (runJob at PythonRDD.scala:393) finished in 0.423 s
17/09/24 02:40:11 INFO scheduler.DAGScheduler: Job 5 finished: runJob at PythonRDD.scala:393, took 0.648315 s
17/09/24 02:40:11 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 5.0 (TID 5) in 423 ms on localhost (1/1)
17/09/24 02:40:11 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool
Out[54]: ['Alice', 'Carlos']
In [55]:
最新文章
- Atitit 教育与培训学校 的计划策划 v4 qc18
- C#.NET 大型通用信息化系统集成快速开发平台 4.1 版本 - 角色成员功能的改进支持公司加入到角色
- spring.net xml 命名空间
- js 闭包原理理解
- zw版【转发·台湾nvp系列Delphi例程】HALCON HImage与Bitmap格式转换
- Uva 10305 给任务排序
- 可辨别iPhone真假的网址
- 【转】jquery-取消冒泡
- homework03
- 实现C++模板类头文件和实现文件分离的方法
- JSON C# Class Generator是一个从JSON文本中生成C#内的应用程序
- QF——关于iOS的强引用,弱引用及strong,retain,copy,weak,assignd的关系
- error C3861: “gets”: 找不到标识符
- 关于响应式、媒体查询和media的关系 、流媒体布局flex 和em rem像素的使用 我有一些废话要讲.....
- MD5和Base64
- 栈和队列简单的STL模板
- php基础(五)日期
- vue.js基础知识篇(3):计算属性、表单控件绑定
- Docker(一):Docker安装
- 牛腩新闻发布系统--学习Web的小技巧汇总