Using a proxy in a Python crawler
2024-10-09 08:02:32
The following shows the GET approach. The proxy list is scraped from http://www.xicidaili.com/nn/
# -*- coding: utf-8 -*-
import urllib.request

import chardet
import requests
from bs4 import BeautifulSoup

ip_list = []  # cached proxy URLs, shared by the helpers below


def get_ip_list(url, headers):
    """Scrape the proxy table and return entries like 'http://ip:port'."""
    web_data = requests.get(url, headers=headers, timeout=10)
    soup = BeautifulSoup(web_data.text, 'lxml')
    ips = soup.find_all('tr')
    result = []
    for ip_info in ips[1:]:  # ips[0] is the table header row
        tds = ip_info.find_all('td')
        if len(tds) > 2:  # skip rows without IP and port cells
            result.append('http://' + tds[1].text.strip() + ':' + tds[2].text.strip())
    return result


def get_random_ip(ip_list):
    # Despite the name, this always returns the first proxy in the list;
    # deleteip() removes it when it fails, so the next one moves up.
    proxies = {'http': ip_list[0]}
    return proxies


def getip():
    global ip_list
    url = 'http://www.xicidaili.com/nn/'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'}
    if not ip_list:  # refill the cache only when it is empty
        ip_list = get_ip_list(url, headers=headers)
        print(ip_list)
    proxies = get_random_ip(ip_list)
    return proxies


def deleteip():
    global ip_list
    if ip_list:  # nothing to drop if the proxy list could not be fetched
        ip_list.pop(0)


def urllink(link):
    """Fetch the page HTML through a proxy and convert its encoding."""
    html_1 = None
    for i in range(12):  # try up to 12 proxies
        try:
            ip = getip()
            print(ip)
            proxy_support = urllib.request.ProxyHandler(ip)
            opener = urllib.request.build_opener(proxy_support)
            urllib.request.install_opener(opener)
            html_1 = urllib.request.urlopen(link, timeout=10).read()
            break
        except Exception as e:
            deleteip()  # discard the proxy that just failed
            print('Error', i, e)
    if html_1 is None:  # every attempt failed
        return ''
    encoding_dict = chardet.detect(html_1)
    web_encoding = encoding_dict['encoding']
    if web_encoding and web_encoding.lower() == 'utf-8':
        html = html_1.decode('utf-8', 'ignore')
    else:
        # Anything that is not detected as UTF-8 is treated as GBK.
        html = html_1.decode('gbk', 'ignore')
    return html


print(urllink("http://ccdas.ipmph.com/pc/clinicalExam/getClinicalExamDetail?articleId=8165"))
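The helper above is called get_random_ip but always takes the first proxy in the list. If random rotation is what you want, one possible variation is sketched below. It is not part of the original script: the names pick_proxy, fetch_via_proxy and fetch_with_rotation are made up for this example, it uses only the standard library, and it assumes the pool is a list of 'http://ip:port' strings like the one get_ip_list returns.

```python
import random
import urllib.request


def pick_proxy(pool):
    """Return a proxies dict for one randomly chosen entry of pool."""
    return {'http': random.choice(pool)}


def fetch_via_proxy(link, proxies, timeout=10):
    """Open link through the given proxy and return the raw bytes."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    with opener.open(link, timeout=timeout) as resp:
        return resp.read()


def fetch_with_rotation(link, pool, max_tries=12, fetch=fetch_via_proxy):
    """Try random proxies from pool until one works; return b'' if none does.

    Failed proxies are removed from a private copy of pool, so the
    caller's list is left untouched. The fetch argument is a hypothetical
    hook that lets you swap in another download function.
    """
    candidates = list(pool)
    for _ in range(max_tries):
        if not candidates:
            break  # every known proxy has already failed
        proxies = pick_proxy(candidates)
        try:
            return fetch(link, proxies)
        except Exception as e:
            print('proxy failed:', proxies['http'], e)
            candidates.remove(proxies['http'])
    return b''
```

A call would look like fetch_with_rotation(link, get_ip_list(url, headers)). The result is raw bytes, so the chardet step from urllink still applies afterwards.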