Crawler (GET): scraping the baidu.com homepage

Tool: Python 3

Target: www.baidu.com

Workflow:

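1) Get past basic anti-crawler checks: use the packet-capture tool Fiddler to capture the page's request headers and copy the browser's User-Agent value, then pass it to a custom urllib.request.Request().

2) Fetch the data.

3) Store the data.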
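The three workflow steps can be sketched as small functions. This is a minimal sketch, not the author's code: the helper names `default_user_agent`, `build_request`, `fetch` and `save` are hypothetical, the User-Agent string is the one from the example code, and UTF-8 is assumed as the fallback when the server declares no charset. `default_user_agent` shows what urllib sends when no User-Agent is set (`Python-urllib/3.x`), which is exactly what step 1 replaces.

```python
import urllib.request

# Step 1: User-Agent captured with Fiddler (same string as in the example code)
UA_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 9.0; WOW32) AppleWebKit/532.36 "
                  "(KHTML, like Gecko) Chrome/66.0.3359.171 Safari/537.34"
}


def default_user_agent():
    """Return the User-Agent urllib adds when a request sets none (hypothetical helper)."""
    opener = urllib.request.build_opener()
    # OpenerDirector stores its default headers as (name, value) pairs
    return dict(opener.addheaders).get("User-agent")


def build_request(url, headers=UA_HEADERS):
    """Step 1: wrap the URL in a Request carrying a browser User-Agent (hypothetical helper).

    A header set on the Request takes precedence over the opener's default one.
    """
    return urllib.request.Request(url, headers=headers)


def fetch(request, timeout=10):
    """Step 2: send the GET request and decode the body to str (hypothetical helper).

    Assumes UTF-8 when the response declares no charset.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def save(text, path):
    """Step 3: write the decoded page to a UTF-8 text file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

Usage (requires network access): `save(fetch(build_request("http://www.baidu.com")), "baidu.txt")`. Decoding inside `fetch` keeps the stored file readable text rather than a bytes representation.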

# In Python 3, urllib.request takes over the role of Python 2's urllib2
import urllib.request

# Build a custom Request that carries the User-Agent captured with Fiddler
ua_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 9.0; WOW32) AppleWebKit/532.36 (KHTML, like Gecko) Chrome/66.0.3359.171 Safari/537.34"
}
request = urllib.request.Request("http://www.baidu.com", headers=ua_headers)

# Send the request to the server. With a data argument urlopen sends a POST;
# without one it sends a GET. response holds the server's reply.
response = urllib.request.urlopen(request)

# response is a file-like object, so it supports file methods such as read();
# read() returns the body as bytes
html = response.read()

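# Save the page to a file for easy reading. html is bytes, and str(html) would
# write its b'...' representation, so decode it first (assuming a UTF-8 page)
with open("baidu.txt", "w", encoding="utf-8") as f:
    f.write(html.decode("utf-8"))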
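The comment on `urlopen` says the presence of `data` decides between GET and POST. That can be checked offline with `Request.get_method()`, which reports the method urllib will use. A minimal sketch: the helper `method_for` is hypothetical, and the `wd` field and `/s` path are used only as an illustrative query string.

```python
import urllib.parse
import urllib.request


def method_for(url, data=None):
    """Return the HTTP method urllib would use for this request (hypothetical helper)."""
    request = urllib.request.Request(url, data=data)
    return request.get_method()


# No data argument: urllib sends a GET request
print(method_for("http://www.baidu.com"))  # GET

# With data (must be bytes): urllib sends a POST request
form = urllib.parse.urlencode({"wd": "python"}).encode("utf-8")
print(method_for("http://www.baidu.com", data=form))  # POST

# For a GET, parameters travel in the query string instead of the body
query_url = "http://www.baidu.com/s?" + urllib.parse.urlencode({"wd": "python"})
print(query_url)  # http://www.baidu.com/s?wd=python
print(method_for(query_url))  # GET
```

Nothing is sent over the network here; `Request` only records the URL and body, and `urlopen` is what actually issues the request.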

巴特西