Configuring Nutch

(The nutch folder is already located under /home.)

1. Modify the system environment variables

sudo gedit /etc/profile

// Add the following lines:

#set nutch
export PATH=/home/nutch/runtime/local/bin:$PATH
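
Edits to /etc/profile are only read by new login shells, so an already-open terminal will not see the new entry until the file is re-read (for example with `source /etc/profile`) or you log in again. The following is a minimal sketch for checking whether the directory from the export line is on PATH. The helper name `path_has_dir` is hypothetical and not part of Nutch.

```shell
# Hypothetical helper (not part of Nutch): succeeds if directory $1 is listed in PATH.
path_has_dir() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Directory taken from the export line above.
NUTCH_BIN=/home/nutch/runtime/local/bin

if path_has_dir "$NUTCH_BIN"; then
  echo "on PATH: $NUTCH_BIN"
else
  echo "not on PATH yet: run 'source /etc/profile' or log in again"
fi
```

Wrapping PATH in colons before matching avoids false positives from directories that merely share a prefix with the one being checked.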

2. Test (the nutch and crawl scripts are in nutch/runtime/local/bin)

nutch
// Output:
Usage: nutch COMMAND
where COMMAND is one of:
  inject         inject new urls into the database
  hostinject     creates or updates an existing host table from a text file
  generate       generate new batches to fetch from crawl db
  fetch          fetch URLs marked during generate
  parse          parse URLs marked during fetch
  updatedb       update web table after parsing
  updatehostdb   update host table after parsing
  readdb         read/dump records from page database
  readhostdb     display entries from the hostDB
  elasticindex   run the elasticsearch indexer
  solrindex      run the solr indexer on parsed batches
  solrdedup      remove duplicates from solr
  parsechecker   check the parser for a given url
  indexchecker   check the indexing filters for a given url
  plugin         load a plugin and run one of its classes main()
  nutchserver    run a (local) Nutch server on a user defined port
  junit          runs the given JUnit test
 or
  CLASSNAME      run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
crawl
// Output:
Missing seedDir : crawl <seedDir> <crawlID> <solrURL> <numberOfRounds>
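
The "Missing seedDir" message just means crawl was started without arguments, which is expected here. As a rough sketch of a full invocation following that usage line: the seed directory name, seed URL, crawl ID, Solr URL and number of rounds below are all assumed placeholder values, and a real run also depends on Nutch configuration that these notes do not cover.

```shell
# Create a seed directory containing one seed URL (directory and URL are placeholders).
mkdir -p urls
echo "https://nutch.apache.org/" > urls/seed.txt

# Arguments follow the usage line: <seedDir> <crawlID> <solrURL> <numberOfRounds>.
# testCrawl, the Solr URL and the round count are assumed values; adjust to your setup.
if command -v crawl >/dev/null 2>&1; then
  crawl urls testCrawl http://localhost:8983/solr/ 2
else
  echo "crawl not found on PATH"
fi
```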
