Following the official documentation: http://nlp.solutions.asia/?p=180

For the problems I ran into along the way, the fixes are based on:

http://blog.javachen.com/2014/05/20/nutch-intro/





Problem 1:

compile-core:

[javac] Compiling 180 source files to /root/nutch/build/classes

[javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._zipfs.jar; error in opening zip file

[javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._sunec.jar; error in opening zip file

[javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._sunjce_provider.jar; error in opening zip file

[javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._sunpkcs11.jar; error in opening zip file

[javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._jfxrt.jar; error in opening zip file

[javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._dnsns.jar; error in opening zip file

[javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._nashorn.jar; error in opening zip file

[javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._localedata.jar; error in opening zip file

[javac] error: error reading /usr/lib/jvm/jdk1.8.0_20/jre/lib/ext/._cldrdata.jar; error in opening zip file

[javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6

[javac] 9 errors

[javac] 1 warning

BUILD FAILED

/root/nutch/build.xml:101: Compile failed; see the compiler error output for details.



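The original ext folder contains none of these ._*.jar files, but it does contain jars with the same names minus the prefix (zipfs.jar and so on). I copied those directly, and the build passed.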
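The log above shows javac failing on every ._*.jar in the ext directory because none of them opens as a zip archive. Files with this prefix are often metadata sidecars left behind when a directory is copied or unpacked from macOS, though nothing here says where these particular files came from. A minimal sketch for listing such files before touching anything, assuming you pass the ext directory as the first argument (the class and method names are hypothetical, and the sketch only reports, it does not copy or delete):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;

final class BrokenJarScan {
    // Returns every *.jar in dir that cannot be opened as a zip archive.
    // The "*" glob also matches names with a leading dot, such as ._zipfs.jar.
    static List<Path> findUnreadableJars(Path dir) throws IOException {
        List<Path> broken = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                try (ZipFile ignored = new ZipFile(jar.toFile())) {
                    // Opened without error: this is a readable archive.
                } catch (IOException e) {
                    // ZipException is an IOException: not a valid zip file.
                    broken.add(jar);
                }
            }
        }
        return broken;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path jar : findUnreadableJars(dir)) {
            System.out.println(jar);
        }
    }
}
```

Run against the ext directory from the log, this should print the same nine paths that javac complained about, which makes it easy to confirm the problem is gone after the fix.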







Problem 2:

root@iZ280izbfjqZ:~/nutch/runtime/local# bin/nutch crawl urls -depth 3 -topN 5

Exception in thread "main" java.lang.ClassNotFoundException: org.apache.gora.sql.store.SqlStore

at java.net.URLClassLoader$1.run(URLClassLoader.java:372)

at java.net.URLClassLoader$1.run(URLClassLoader.java:361)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:360)

at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:259)

at org.apache.nutch.storage.StorageUtils.getDataStoreClass(StorageUtils.java:90)

at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:74)

at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)

at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)

at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)

at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)

at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)

See the following article for the fix:

http://blog.sina.com.cn/s/blog_3c9872d00101p4f0.html
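The stack trace shows StorageUtils.getDataStoreClass failing inside Class.forName, so the class org.apache.gora.sql.store.SqlStore is simply not visible to the class loader. A small sketch for checking that independently of a crawl, assuming it is run with the same jars on the classpath that Nutch uses (the class and method names are hypothetical):

```java
final class ClassPathCheck {
    // Returns true if the named class can be located on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, ClassPathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.gora.sql.store.SqlStore";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

If this prints false with the Nutch jars on the classpath, the jar providing SqlStore is missing from the build, which is the situation the linked article addresses.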





Problem 3:

root@iZ280izbfjqZ:~/nutch/runtime/local# bin/nutch crawl urls -depth 3 -topN 5

InjectorJob: Using class org.apache.gora.sql.store.SqlStore as the Gora storage class.

InjectorJob: total number of urls rejected by filters: 0

InjectorJob: total number of urls injected after normalization and filtering: 1

Exception in thread "main" java.lang.RuntimeException: job failed: name=generate: *, jobid=job_local1888916405_0002

at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:55)

at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:199)

at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)

at org.apache.nutch.crawl.Crawler.run(Crawler.java:152)

at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)

at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)

Open nutch/src/java/org/apache/nutch/crawl/GeneratorReducer.java and look at around line 100:





batchId=new Utf8(conf.get(GeneratorJob.BATCH_ID));





Change it to:

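// Requires: import java.util.Random;

// Build the batch ID locally as "<epoch seconds>-<random int>"

// instead of reading it from the job configuration.

int randomSeed = Math.abs(new Random().nextInt());

String batchIdStr = (System.currentTimeMillis()/1000)+"-"+randomSeed;

batchId = new Utf8( batchIdStr );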
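To see what the replacement lines produce, here is a standalone sketch of the same ID format. It is an illustration, not the Nutch code: the class and method names are hypothetical, the Utf8 wrapper is left out so that the sketch compiles without extra jars, the clock value and Random are passed in so the result can be checked, and Math.abs(new Random().nextInt()) is swapped for nextInt(Integer.MAX_VALUE) because Math.abs returns a negative number for Integer.MIN_VALUE.

```java
import java.util.Random;

final class BatchIdSketch {
    // Builds "<epoch seconds>-<non-negative random int>", the same shape
    // as the batch ID assembled in the replacement lines.
    static String newBatchId(long nowMillis, Random random) {
        // nextInt(bound) returns a value in [0, bound), so it is never negative.
        int randomSeed = random.nextInt(Integer.MAX_VALUE);
        return (nowMillis / 1000) + "-" + randomSeed;
    }

    public static void main(String[] args) {
        System.out.println(newBatchId(System.currentTimeMillis(), new Random()));
    }
}
```

Each call yields a different ID in practice, since the seconds part changes over time and the suffix is random.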





Problem 4:

Fixed by adding a batchId column to the webpage table:

alter table webpage add batchId varchar(767) DEFAULT NULL;

After that, everything worked. Time to celebrate.
