Source: http://stackoverflow.com/questions/699468/python-html-sanitizer-scrubber-filter

The code below filters content down to clean HTML. Note: this code comes from the Stack Overflow answer linked above.
Use lxml.html.clean! It's VERY easy!

    from lxml.html.clean import clean_html
    print(clean_html(html))

Suppose the following html:

    html = '''\
    <html>
     <head>
       <script type="text/javascript" src="evil-site"></script>
       &
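
To make it clearer what kind of filtering is going on, here is a minimal standard-library sketch of the same idea: drop script elements and strip event-handler attributes. It is only an illustration and not a substitute for lxml.html.clean. The names `TinySanitizer`, `tiny_sanitize` and `DROP_WITH_CONTENT` are hypothetical, and the rules are a small assumed subset of what a real cleaner covers, so it should not be trusted with hostile input. Also note that recent lxml releases (5.2 and later, to my knowledge) ship the cleaner as the separate lxml_html_clean package.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy for this sketch only; NOT lxml's actual rule set.
DROP_WITH_CONTENT = {"script", "style"}


class TinySanitizer(HTMLParser):
    """Re-emits markup while dropping scripts and risky attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        kept = []
        for name, value in attrs:
            # Drop inline event handlers such as onclick / onload.
            if name.startswith("on"):
                continue
            # Drop attribute values that are javascript: URLs.
            if value is not None and value.strip().lower().startswith("javascript:"):
                continue
            kept.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join([tag] + kept) + ">")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives decoded, so escape it again before emitting.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def tiny_sanitize(markup):
    parser = TinySanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


sample = (
    '<p onclick="steal()">Hi <a href="javascript:evil()">link</a>'
    '<script type="text/javascript" src="evil-site"></script></p>'
)
print(tiny_sanitize(sample))
```

Comments and doctype declarations are dropped simply because the sketch does not re-emit them. For real input, prefer clean_html (or a configured Cleaner) over anything hand-rolled like this.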