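Today I looked into PFP Growth, the parallel frequent pattern mining algorithm, and how to run it from the Mahout command line. Here is a brief record of the experiment for future reference: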

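  • Environment: JDK 1.7 + Hadoop 2.2.0 single-node pseudo-distributed cluster + Mahout 0.6 (neither 0.8 nor 0.9 includes this algorithm; I was a little surprised that Mahout 0.6 runs fine alongside Hadoop 2.2.0 orz)
  • Part of the input data (each line is one shopping basket):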

4750,19394,25651,6395,5592

26180,10895,24571,23295,20578,27791,2729,8637

7380,18805,25086,19048,3190,21995,10908,12576

3458,12426,20578

1880,10702,1731,5185,18575,28967

21815,10872,18730

20626,17921,28930,14580,2891,11080

18075,6548,28759,17133

7868,15200,13494

7868,28617,18097,22999,16323,8637,7045,25733

12189,8816,22950,18465,13258,27791,20979

26728

17512,14821,18741

26619,14470,21899,6731

5184

28653,28662,18353,27437,5661,12078,11849,15784,7248,7061,18612,24277,4807,15584,9671,18741,3647,1000

...
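Each line is one basket of comma-separated item IDs. To inspect such a file locally before uploading it to HDFS, a small reader like the Python sketch below can help. The splitter regex is modeled on what I believe is Mahout's default transaction splitter (commas or tabs with optional surrounding spaces); treat that, and the name `read_baskets`, as assumptions rather than Mahout's actual code.

```python
import re
from typing import Iterable, List

# Assumption: modeled on what I believe is Mahout's default splitter pattern
SPLITTER = re.compile(r"[ ,\t]*[,|\t][ ,\t]*")

def read_baskets(lines: Iterable[str]) -> List[List[str]]:
    """Turn raw lines (one shopping basket per line) into lists of item IDs."""
    baskets = []
    for line in lines:
        line = line.strip()
        if not line:              # skip blank lines between baskets
            continue
        baskets.append([item for item in SPLITTER.split(line) if item])
    return baskets

sample = [
    "4750,19394,25651,6395,5592",
    "",
    "3458,12426,20578",
    "26728",
]
print(read_baskets(sample))
```

For a real file, pass the open file object instead, e.g. `with open("user2items.csv") as f: baskets = read_baskets(f)`.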

  • Command:

mahout fpg -i /workspace/dataguru/hadoopdev/week13/fpg/in/ -o /workspace/dataguru/hadoopdev/week13/fpg/out -method mapreduce -s 3

Parameters:

-i input path. Since the job runs on Hadoop, it must be an HDFS path; in this experiment the input file is /workspace/dataguru/hadoopdev/week13/fpg/in/user2items.csv

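-o output path, also on HDFS

-method mapreduce runs the parallel (MapReduce) version; the other value, sequential, mines on a single machine

-s 3 minimum support: only patterns that occur in at least 3 baskets are output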
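Support here is an absolute count of baskets, not a fraction: with a minimum support of 3, an itemset is kept only if all of its items appear together in at least 3 baskets, which is consistent with the smallest counts (3) in the output shown later. A minimal Python sketch of that definition on toy data; the helper names are illustrative only and not part of Mahout.

```python
from typing import List, Set

def support(itemset: Set[str], baskets: List[Set[str]]) -> int:
    """Absolute support: how many baskets contain every item of `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)

def is_frequent(itemset: Set[str], baskets: List[Set[str]], min_support: int = 3) -> bool:
    # min_support plays the role of -s: an absolute basket count
    return support(itemset, baskets) >= min_support

# Toy baskets with made-up item IDs, not the real data set
baskets = [{"a", "b", "c"}, {"a", "d", "e"}, {"a", "e"}, {"e", "f"}]
print(support({"a"}, baskets))             # 3
print(support({"a", "e"}, baskets))        # 2
print(is_frequent({"a"}, baskets))         # True
print(is_frequent({"a", "e"}, baskets))    # False
```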

The full list of parameters is given in the table below:

[Image: table of all mahout fpg parameters]

  • Output directory after the command finishes:

casliyang@singlehadoop:~$ hadoop dfs -ls /workspace/dataguru/hadoopdev/week13/fpg/out

DEPRECATED: Use of this script to execute hdfs command is deprecated.

Instead use the hdfs command for it.

Found 4 items

-rw-r--r--   3 casliyang supergroup       5567 2014-06-17 17:50 /workspace/dataguru/hadoopdev/week13/fpg/out/fList

drwxr-xr-x   - casliyang supergroup          0 2014-06-17 17:51 /workspace/dataguru/hadoopdev/week13/fpg/out/fpgrowth

drwxr-xr-x   - casliyang supergroup          0 2014-06-17 17:51 /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns

drwxr-xr-x   - casliyang supergroup          0 2014-06-17 17:50 /workspace/dataguru/hadoopdev/week13/fpg/out/parallelcounting
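As far as I understand the PFP job, these entries correspond to its stages: parallelcounting and fList hold the per-item counts and the frequency-sorted F-list, fpgrowth holds the output of the per-group FP-growth step, and frequentpatterns the aggregated result. What makes FP-growth parallelizable is the grouping step: F-list items are split into groups, and each transaction sends to group g only its prefix up to its last item belonging to g, so every pattern whose least frequent item lies in g can be counted completely inside that group's shard. The Python sketch below simulates the counting and grouping stages on one machine; the contiguous-block group assignment and the function names are my own simplifications, not Mahout's code.

```python
from collections import Counter
from typing import Dict, List

def build_flist(baskets: List[List[str]], min_support: int) -> List[str]:
    """Stage 1 (counting): items with support >= min_support,
    sorted by descending count, ties broken by item ID."""
    counts = Counter(item for basket in baskets for item in set(basket))
    frequent = [item for item, c in counts.items() if c >= min_support]
    return sorted(frequent, key=lambda item: (-counts[item], item))

def group_shards(baskets: List[List[str]], flist: List[str],
                 num_groups: int) -> Dict[int, List[List[str]]]:
    """Stage 2 (grouping): build the group-dependent transactions
    that each group's FP-growth worker would receive."""
    rank = {item: i for i, item in enumerate(flist)}
    per_group = -(-len(flist) // num_groups)            # ceiling division
    group_of = {item: rank[item] // per_group for item in flist}
    shards: Dict[int, List[List[str]]] = {g: [] for g in range(num_groups)}
    for basket in baskets:
        # keep frequent items only, in F-list order
        ordered = sorted({i for i in basket if i in rank}, key=rank.__getitem__)
        seen = set()
        # walk from the least frequent item back towards the most frequent one
        for pos in range(len(ordered) - 1, -1, -1):
            g = group_of[ordered[pos]]
            if g not in seen:
                seen.add(g)
                shards[g].append(ordered[:pos + 1])    # prefix up to this item
    return shards

# Toy data (made-up items)
baskets = [["a", "b", "c"], ["a", "b"], ["a", "c", "d"], ["b", "c"], ["a", "b", "c", "d"]]
flist = build_flist(baskets, min_support=2)
print(flist)                                   # ['a', 'b', 'c', 'd']
print(group_shards(baskets, flist, num_groups=2))
```

In the second shard, {c, d} still occurs twice, exactly its support in the full data, because both c and d belong to that group.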

The mined frequent patterns are in the frequentpatterns directory:

casliyang@singlehadoop:~$ hadoop dfs -ls /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns

DEPRECATED: Use of this script to execute hdfs command is deprecated.

Instead use the hdfs command for it.

Found 2 items

-rw-r--r--   3 casliyang supergroup          0 2014-06-17 17:51 /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns/_SUCCESS

-rw-r--r--   3 casliyang supergroup      10017 2014-06-17 17:51 /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns/part-r-00000

This file is a Hadoop SequenceFile and cannot be read directly; Mahout provides a command that converts it to plain text:

mahout seqdumper -s /workspace/dataguru/hadoopdev/week13/fpg/out/frequentpatterns/part-r-00000 -o /home/casliyang/outpattern

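Note: the path given with -o must be on the local Linux file system, and the target file must be created in advance; otherwise the command fails with an error.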

  • Part of the final output written to /home/casliyang/outpattern:

Key: 29099: Value: ([29099],18), ([29099, 4479],3)

Key: 29202: Value: ([29202],3)

Key: 29203: Value: ([29203],9), ([14020, 29203],3)

Key: 29224: Value: ([29224],3)

Key: 29547: Value: ([29547],5)

Key: 2963: Value: ([2963],8), ([2963, 21146],3)

Key: 2999: Value: ([2999],3)

Key: 3032: Value: ([3032],4)

Key: 3047: Value: ([3047],4)

Key: 3151: Value: ([3151],7), ([14020, 3151],4)

Key: 3181: Value: ([3181],3)

Key: 3228: Value: ([3228],14)

Key: 3313: Value: ([3313],3)

Key: 3324: Value: ([3324],3)

Key: 3438: Value: ([3438],3)

Key: 3458: Value: ([3458],4)

Key: 3627: Value: ([3627],11), ([3627, 11176],3)

...

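Meaning:

Key: the item ID

Value: the frequent patterns that contain this item, each followed by its support (the number of baskets in which the whole pattern occurs)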
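Once the dump is on the local file system, it can be read back for further processing. A minimal Python sketch, assuming every useful line has the form shown above (`Key: <item>: Value: ([items],support), ...`) and ignoring any other line; the regexes and `parse_seqdumper` are my own, not part of Mahout.

```python
import re
from typing import Dict, Iterable, List, Tuple

LINE_RE = re.compile(r"^Key: ([^:]+): Value: (.*)$")
PATTERN_RE = re.compile(r"\(\[([^\]]*)\],(\d+)\)")   # matches "([a, b],3)"

def parse_seqdumper(lines: Iterable[str]) -> Dict[str, List[Tuple[Tuple[str, ...], int]]]:
    """Map item ID -> list of (pattern items, support) from seqdumper text output."""
    result = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue    # blank lines and anything that is not a Key/Value line
        key, value = m.groups()
        result[key.strip()] = [
            (tuple(i.strip() for i in items.split(",")), int(sup))
            for items, sup in PATTERN_RE.findall(value)
        ]
    return result

sample = [
    "Key: 29099: Value: ([29099],18), ([29099, 4479],3)",
    "",
    "Key: 3228: Value: ([3228],14)",
]
print(parse_seqdumper(sample))
```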

With the frequent patterns mined, they can be processed further in code according to the business requirements.
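One common kind of processing is turning frequent patterns into association rules X → Y with confidence sup(X ∪ Y) / sup(X). From the output above, 29099 appears in 18 baskets and {29099, 4479} in 3, so the rule 29099 → 4479 has confidence 3/18 ≈ 0.17. A Python sketch, assuming the patterns are already loaded into a dict keyed by item ID (the dict shape and `derive_rules` are hypothetical); rules whose antecedent support is missing from the dump are skipped rather than guessed.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Assumed shape: item ID -> list of (pattern items, support), as in the dump
Patterns = Dict[str, List[Tuple[Tuple[str, ...], int]]]
Rule = Tuple[Tuple[str, ...], Tuple[str, ...], int, float]

def derive_rules(parsed: Patterns, min_conf: float) -> List[Rule]:
    """Turn frequent patterns into rules lhs -> rhs with confidence >= min_conf."""
    support: Dict[FrozenSet[str], int] = {}
    for plist in parsed.values():
        for items, sup in plist:
            support[frozenset(items)] = sup    # a pattern may be listed under several keys
    rules: List[Rule] = []
    for itemset, sup in support.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), k):
                lhs = frozenset(lhs_items)
                lhs_sup = support.get(lhs)
                if lhs_sup is None:
                    continue    # antecedent support not in the dump: skip, don't guess
                conf = sup / lhs_sup
                if conf >= min_conf:
                    rules.append((tuple(sorted(lhs)), tuple(sorted(itemset - lhs)), sup, conf))
    return sorted(rules, key=lambda r: -r[3])

# Two entries copied from the output above
parsed = {
    "29099": [(("29099",), 18), (("29099", "4479"), 3)],
    "2963": [(("2963",), 8), (("2963", "21146"), 3)],
}
for lhs, rhs, sup, conf in derive_rules(parsed, min_conf=0.1):
    print(lhs, "->", rhs, "support", sup, "confidence", round(conf, 3))
```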

Mahout really is a great open-source project.
