When scraping the web, you'll often want more than one page of data. Xray supports pagination by finding the "next" or "more" link on each page and following it to each new page until it can no longer find that link. This lesson demonstrates how to paginate, along with more advanced selectors for links that are difficult to target.
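
To make that loop concrete, here is a minimal sketch of the follow-the-next-link idea using in-memory data rather than real HTTP requests. The `fakePages` object, the `collectPages` function, and the `next` field are hypothetical names made up for illustration. This is not how Xray is implemented internally, only a picture of the behaviour: keep going until no next link is found or a page limit is reached.

```javascript
// Hypothetical in-memory "site": each page lists its items and an optional next link.
const fakePages = {
  '/news?p=1': { items: ['a', 'b'], next: '/news?p=2' },
  '/news?p=2': { items: ['c', 'd'], next: '/news?p=3' },
  '/news?p=3': { items: ['e'], next: null } // no "More" link on the last page
};

// Follow `next` links from startUrl, stopping when the link is missing
// or when `limit` pages have been visited.
function collectPages(pages, startUrl, limit) {
  const results = [];
  let url = startUrl;
  let visited = 0;
  while (url && visited < limit) {
    const page = pages[url];
    if (!page) break; // unknown URL: nothing more to fetch
    results.push(...page.items);
    url = page.next;
    visited += 1;
  }
  return results;
}

console.log(collectPages(fakePages, '/news?p=1', 2));  // [ 'a', 'b', 'c', 'd' ]
console.log(collectPages(fakePages, '/news?p=1', 10)); // [ 'a', 'b', 'c', 'd', 'e' ]
```

The second call shows the "until it can no longer find that link" case: the limit is never reached because the third page has no next link.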

/**
* Created by Answer1215 on 8/22/2015.
*/
var Xray = require('x-ray');
var xray = new Xray();

// Scrape each story row, follow the "More" link, and stop after three pages.
xray('https://news.ycombinator.com/', '.athing', [{
    rank: '.rank',
    title: 'td:nth-child(3) a',
    link: 'td:nth-child(3) a@href'
}])
    .paginate('a[rel="nofollow"]:last-child@href')
    .limit(3)
    .write('./results2.json');

///////////////////////////////
// test
///////////////////////////////

// Match every rel="nofollow" link to see what the pagination selector has to choose from.
xray('https://news.ycombinator.com/', 'a[rel="nofollow"]', [{
    show: ''
}]).write('./results2.json');
/**
* [
{
"show": "Segment is hiring security engineers to help secure our container fleet"
},
{
"show": "Modafinil for cognitive neuroenhancement: a systematic review"
},
{
"show": "Ports and Power in the Indian Ocean"
},
{
"show": "Natural and Artificial Intelligence (1988) [pdf]"
},
{
"show": "Proofing Spirits with a Homemade Electrobalance"
},
{
"show": "Seth Nickell on Replacing the Aging Init Procedure on Linux (2003)"
},
{
"show": "More"
}
]
* */

// Add :last-child so that only the "More" link matches.
xray('https://news.ycombinator.com/', 'a[rel="nofollow"]:last-child', [{
    show: ''
}]).write('./results2.json');
/*
* [
{
"show": "More"
}
]
* */
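
The pagination selector in the code above ends in `@href`, which asks Xray for the link's `href` attribute rather than its text. As a rough illustration of how such a string divides into a CSS part and an attribute part, here is a simplified sketch. `splitSelector` is a hypothetical helper written for this note, not Xray's actual parser, and it assumes the last `@` in the string marks the attribute.

```javascript
// Hypothetical helper: split "css-selector@attribute" into its two parts.
// Assumes the last "@" marks the attribute; a string with no "@" has no attribute.
function splitSelector(input) {
  const at = input.lastIndexOf('@');
  if (at === -1) {
    return { selector: input, attribute: null };
  }
  return {
    selector: input.slice(0, at),
    attribute: input.slice(at + 1)
  };
}

const parts = splitSelector('a[rel="nofollow"]:last-child@href');
console.log(parts.selector);  // a[rel="nofollow"]:last-child
console.log(parts.attribute); // href
```

Read this way, the CSS part is what the second test narrows down to the single "More" link, and the attribute part is what gives `.paginate()` a URL to follow.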
