Python Spark: computing the max, min, mean, and median
2024-09-20 21:37:40
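import numpy as np

# sc is the SparkContext provided by the pyspark shell; PATH is the directory that contains ml-100k
rating_data_raw = sc.textFile("%s/ml-100k/u.data" % PATH)
print(rating_data_raw.first())
num_ratings = rating_data_raw.count()
print("Ratings: %d" % num_ratings)

# each line of u.data is tab-separated: user id, item id, rating, timestamp
rating_data = rating_data_raw.map(lambda line: line.split("\t"))
ratings = rating_data.map(lambda fields: int(fields[2]))
max_rating = ratings.reduce(lambda x, y: max(x, y))
min_rating = ratings.reduce(lambda x, y: min(x, y))
mean_rating = ratings.reduce(lambda x, y: x + y) / float(num_ratings)
# collect() pulls every rating back to the driver, which is acceptable for 100k rows
median_rating = np.median(ratings.collect())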
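# num_users and num_movies are not defined in this snippet; counting the distinct
# user ids and item ids in the ratings file is one way to obtain them (an assumption)
num_users = rating_data.map(lambda fields: fields[0]).distinct().count()
num_movies = rating_data.map(lambda fields: fields[1]).distinct().count()
# float() avoids integer division under Python 2
ratings_per_user = num_ratings / float(num_users)
ratings_per_movie = num_ratings / float(num_movies)
print("Min rating: %d" % min_rating)
print("Max rating: %d" % max_rating)
print("Average rating: %2.2f" % mean_rating)
print("Median rating: %d" % median_rating)
print("Average # of ratings per user: %2.2f" % ratings_per_user)
print("Average # of ratings per movie: %2.2f" % ratings_per_movie)

# we can also use the stats function to get some similar information to the above
ratings.stats()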
The above is the brute-force approach.
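Each reduce call above is a separate pass over the data. As a minimal sketch, the min, max, and mean can be folded into a single pass with RDD.aggregate. The names seq_op, comb_op, ZERO, and summarize are illustrative and do not come from the original code.

```python
# Accumulator layout (an assumption of this sketch): (min, max, sum, count)
ZERO = (float("inf"), float("-inf"), 0, 0)


def seq_op(acc, x):
    """Fold one value into the accumulator within a partition."""
    lo, hi, total, n = acc
    return (min(lo, x), max(hi, x), total + x, n + 1)


def comb_op(a, b):
    """Merge the partial accumulators of two partitions."""
    return (min(a[0], b[0]), max(a[1], b[1]), a[2] + b[2], a[3] + b[3])


def summarize(rdd):
    """Return min, max, and mean of a numeric RDD in one pass."""
    lo, hi, total, n = rdd.aggregate(ZERO, seq_op, comb_op)
    if n == 0:
        raise ValueError("cannot summarize an empty RDD")
    return {"min": lo, "max": hi, "mean": total / float(n)}
```

Calling summarize(ratings) would then replace the three reduce calls. The median is not covered here, because it cannot be computed from a fixed-size accumulator.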
A simpler approach:
>>> all_data = sc.parallelize([1,2,3,4,5,6,7,8,100])
>>> all_data.mean()
15.11111111111111
>>> all_data.max()
100
>>> all_data.min()
1
>>> all_data.median()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'RDD' object has no attribute 'median'
>>> all_data.stats()
(count: 9, mean: 15.1111111111, stdev: 30.0903987804, max: 100.0, min: 1.0)
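As the traceback shows, an RDD has no median method. One option is to collect the values and call np.median, as in the first listing. For data too large to collect, a minimal sketch of an exact median is to sort the RDD, index it with zipWithIndex, and look up the middle rank or ranks. The helper name rdd_median is hypothetical and is not part of PySpark.

```python
def rdd_median(rdd):
    """Exact median of a numeric RDD via a distributed sort (a sketch)."""
    n = rdd.count()
    if n == 0:
        raise ValueError("median of an empty RDD is undefined")
    # Sort the values, then key each one by its rank so it can be looked up.
    by_rank = (rdd.sortBy(lambda x: x)
                  .zipWithIndex()
                  .map(lambda pair: (pair[1], pair[0]))
                  .cache())  # reused by up to two lookups below
    mid = n // 2
    if n % 2 == 1:
        return float(by_rank.lookup(mid)[0])
    # Even count: average the two middle values.
    return (by_rank.lookup(mid - 1)[0] + by_rank.lookup(mid)[0]) / 2.0
```

For the all_data RDD above, rdd_median(all_data) gives 5.0, the fifth of the nine sorted values. The sort makes this noticeably more expensive than mean() or max(), so it is only worth it when the data does not fit on the driver.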