import numpy as np

# Load the raw MovieLens 100k ratings (tab-separated: user id, movie id, rating, timestamp)
rating_data_raw = sc.textFile("%s/ml-100k/u.data" % PATH)
print(rating_data_raw.first())
num_ratings = rating_data_raw.count()
print("Ratings: %d" % num_ratings)

# In[35]:
# Split each line into fields and keep the rating (third field) as an int
rating_data = rating_data_raw.map(lambda line: line.split("\t"))
ratings = rating_data.map(lambda fields: int(fields[2]))
max_rating = ratings.reduce(lambda x, y: max(x, y))
min_rating = ratings.reduce(lambda x, y: min(x, y))
mean_rating = ratings.reduce(lambda x, y: x + y) / float(num_ratings)
median_rating = np.median(ratings.collect())
# num_users and num_movies are assumed to have been computed earlier;
# float() avoids integer division under Python 2
ratings_per_user = num_ratings / float(num_users)
ratings_per_movie = num_ratings / float(num_movies)
print("Min rating: %d" % min_rating)
print("Max rating: %d" % max_rating)
print("Average rating: %2.2f" % mean_rating)
print("Median rating: %d" % median_rating)
print("Average # of ratings per user: %2.2f" % ratings_per_user)
print("Average # of ratings per movie: %2.2f" % ratings_per_movie)

# In[36]:
# We can also use the stats function to get similar information to the above
ratings.stats()

The above is the brute-force approach.

A simpler approach is to call the RDD's built-in methods:

>>> all_data = sc.parallelize([1,2,3,4,5,6,7,8,100])
>>> all_data.mean()
15.11111111111111
>>> all_data.max()
100
>>> all_data.min()
1
>>> all_data.median()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'RDD' object has no attribute 'median'
>>> all_data.stats()
(count: 9, mean: 15.1111111111, stdev: 30.0903987804, max: 100.0, min: 1.0) 
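
As the AttributeError shows, an RDD has no median method. One workaround, in the same spirit as the np.median(ratings.collect()) call in the first listing, is to collect the values to the driver and compute the median locally. Below is a minimal sketch in plain Python; local_median is a hypothetical helper name, not part of Spark, and it assumes the collected data fits in driver memory.

```python
def local_median(values):
    """Return the median of an iterable of numbers (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the middle element is the median
        return ordered[mid]
    # Even count: average the two middle elements
    return (ordered[mid - 1] + ordered[mid]) / 2.0


# Same values as the all_data RDD above; with Spark available this
# would be local_median(all_data.collect())
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(local_median(data))  # 5
```

Note that the median (5) is far less affected by the outlier 100 than the mean (about 15.11) reported by mean() and stats().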
 
   
