WikiScraper.java

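package master.haku.scrape;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class WikiScraper {

    public static void main(String[] args) {
        scrapeTopic("/wiki/Python");
    }

    // Fetches a Wikipedia article and prints the text of its first paragraph.
    public static void scrapeTopic(String url) {
        String html = getUrl("https://en.wikipedia.org" + url);
        Document doc = Jsoup.parse(html);
        // first() returns null when nothing matches, e.g. after a failed download.
        Element firstParagraph = doc.select("#mw-content-text > p").first();
        if (firstParagraph == null) {
            System.out.println("No paragraph found in the page content");
            return;
        }
        System.out.println(firstParagraph.text());
    }

    // Downloads the page at the given URL; returns an empty string on any error.
    public static String getUrl(String url) {
        URL urlObj;
        try {
            urlObj = new URL(url);
        } catch (MalformedURLException e) {
            System.out.println("The url was malformed!");
            return "";
        }

        StringBuilder outputText = new StringBuilder();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                urlObj.openConnection().getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                // readLine() strips the line terminator, so put it back.
                outputText.append(line).append('\n');
            }
        } catch (IOException e) {
            System.out.println("There was an error connecting to the URL");
            return "";
        }
        return outputText.toString();
    }
}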

Output:

A python is a constricting snake belonging to the Python (genus), or, more generally, any snake in the family Pythonidae (containing the Python genus).
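
The scraper builds its request URL by concatenating "https://en.wikipedia.org" with the article path. As a minimal sketch of an alternative, assuming only the standard java.net.URI class, the path can be resolved against a base URI instead, which also rejects paths that are not valid URI references. The class name WikiLinks and the method absoluteUrl are hypothetical and do not appear in the original code.

```java
import java.net.URI;

// Hypothetical helper, not part of the original WikiScraper class.
final class WikiLinks {

    // The trailing slash keeps resolution well-defined for paths without a leading slash.
    private static final URI BASE = URI.create("https://en.wikipedia.org/");

    // Resolves an article path such as "/wiki/Python" against the base URL.
    // Throws IllegalArgumentException if the path is not a valid URI reference.
    static String absoluteUrl(String path) {
        return BASE.resolve(path).toString();
    }

    public static void main(String[] args) {
        // prints https://en.wikipedia.org/wiki/Python
        System.out.println(absoluteUrl("/wiki/Python"));
    }
}
```

With a helper like this, scrapeTopic could pass the result of absoluteUrl(url) to getUrl rather than joining the two strings by hand.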
