BS4 Crawler in Practice: CISP
2024-08-29 20:39:04
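This script crawls the CISP certificate numbers and validity periods that can currently be looked up on the official website, and saves them for loading into a database.
It is effectively brute-force enumeration; Burp's grep feature can achieve the same result.
Here is the Python code: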
#coding=utf-8
# Requires Python 3 with the requests and beautifulsoup4 packages.
import requests
import sys
from bs4 import BeautifulSoup

#demourl='http://www.itsec.gov.cn/export/sites/itsec/person/peregester/CNITSEC2012CISE01098/'
counter = 1
for i in range(2000, 2017):  # year embedded in the certificate number
    for t in ['CISE','CISA','CISO','CISM','CISE-E','CISO-E','CISM-E','CISA-E','CISP-Auditor']:
        # Serial number, zero-padded to 4 digits. The demo URL above shows a
        # 5-digit serial, so the width may need adjusting.
        for j in range(10000):
            SNum = "CNITSEC" + str(i) + t + str(j).zfill(4)
            url = "http://www.itsec.gov.cn/export/sites/itsec/person/peregester/%s/" % SNum
            print(counter, SNum, ' Checking .........')
            try:
                res = requests.get(url, timeout=10)
                res.encoding = 'utf-8'
                soup = BeautifulSoup(res.text, 'html.parser')
                # The header can be absent (e.g. chunked responses), so avoid a KeyError.
                clength = res.headers.get('content-length', '')
                if 200 <= int(res.status_code) <= 210:
                    # A hit: pull the certificate fields out of the detail page.
                    itsecid = soup.select('.detail_title')[0].text.strip()
                    tdm = soup.select('.tdm')
                    starttime = tdm[0].text.strip().replace("\n", "").replace(" ", "")
                    endtime = tdm[1].text.strip().replace("\n", "").replace(" ", "")
                    username = tdm[2].text.strip()
                    authlevel = tdm[3].text.strip()
                    print(clength)
                    print(itsecid)
                    print(starttime)
                    print(endtime)
                    print(username)
                    print(authlevel)
                    # Append one record per certificate found.
                    with open('cispall.txt', 'a', encoding='utf-8') as f:
                        f.write("%s%s%s%s%s %s\n" % (itsecid, starttime, endtime, username, authlevel, clength))
                else:
                    print(SNum, 'Non-existent ########')
                counter += 1
            except Exception:
                # Network errors or unexpected page layouts: report and move on.
                info = sys.exc_info()
                print('except error')
                print(info[0], ":", info[1])
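The three nested loops in the script build every candidate certificate number from a year, a certificate type, and a zero-padded serial. As a rough sketch, that construction can be pulled into a generator so it can be checked without sending any requests. The names `candidate_ids` and `CERT_TYPES` and the `width` parameter are hypothetical and not part of the original script.

```python
from itertools import product

# Certificate types copied from the script above.
CERT_TYPES = ['CISE', 'CISA', 'CISO', 'CISM', 'CISE-E', 'CISO-E',
              'CISM-E', 'CISA-E', 'CISP-Auditor']


def candidate_ids(years, cert_types, max_serial, width=4):
    """Yield candidate numbers in the same order as the nested loops.

    Hypothetical helper: `width` is the zero-padding of the serial part.
    """
    for year, cert_type, serial in product(years, cert_types, range(max_serial)):
        yield "CNITSEC%d%s%s" % (year, cert_type, str(serial).zfill(width))


if __name__ == "__main__":
    # Show the first few candidates only.
    ids = candidate_ids(range(2000, 2017), CERT_TYPES, 10000)
    for _, snum in zip(range(3), ids):
        print(snum)
```

With the ranges used in the script, this yields 17 × 9 × 10000 = 1,530,000 candidates, which gives a sense of how long a full sequential run takes.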
Process:
The output lines can be split on their field separators and then loaded into a database.
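The post does not show the database step, so the following is only a minimal sketch of one way it might look, using Python's built-in `sqlite3` module. It assumes each record is one line with six tab-separated fields; the tab delimiter, the `cisp` table, the `load_records` function, and the sample row are all hypothetical, so adjust `sep` to whatever the output file actually uses.

```python
import sqlite3

# Field order matches the record written by the script.
COLUMNS = ("itsecid", "starttime", "endtime", "username", "authlevel", "clength")


def load_records(conn, lines, sep="\t"):
    """Split delimited lines, insert the well-formed ones, return how many."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cisp ("
        "itsecid TEXT PRIMARY KEY, starttime TEXT, endtime TEXT, "
        "username TEXT, authlevel TEXT, clength TEXT)"
    )
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) == len(COLUMNS):  # skip malformed lines
            rows.append(fields)
    # INSERT OR REPLACE lets the loader be re-run on the same file.
    conn.executemany("INSERT OR REPLACE INTO cisp VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Made-up sample row, for illustration only.
    sample = ["CNITSEC2012CISE0001\t2012-01-01\t2015-01-01\tname\tCISE\t1234\n"]
    print(load_records(conn, sample))
```

To load a real file, pass an open file object as `lines`, for example `load_records(conn, open('cispall.txt', encoding='utf-8'))`.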