Beautiful Soup

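Beautiful Soup is an HTML/XML parser written in Python. It copes well with malformed markup and turns it into a parse tree, and it provides simple, commonly used operations for navigating, searching, and modifying that tree. It can save you a great deal of programming time. For Ruby, use Rubyful Soup.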

https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/
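A minimal sketch of the three kinds of operation described above (navigating, searching, and modifying), using the standard-library `html.parser` backend. The HTML snippet is made up for illustration. Its `<div>` and second `<p>` are deliberately left unclosed to show that Beautiful Soup still builds a complete tree.

```python
from bs4 import BeautifulSoup

# Neither the <div> nor the second <p> is ever closed; Beautiful Soup
# closes them itself when it builds the parse tree
html = "<div id='post'><p class='intro'>Hello <b>world</b></p><p>Unclosed paragraph"
soup = BeautifulSoup(html, "html.parser")

# Navigating: walk the tree through tag attributes and siblings
post_id = soup.div["id"]                               # "post"
bold_text = soup.p.b.string                            # "world"
second_p = soup.p.find_next_sibling("p").get_text()    # "Unclosed paragraph"

# Searching: find_all returns every matching tag, find returns the first
paragraphs = soup.find_all("p")                        # 2 tags
intro = soup.find("p", class_="intro")

# Modifying: replace a string and append a newly created tag
intro.b.string = "Beautiful Soup"
exclamation = soup.new_tag("i")
exclamation.string = "!"
intro.append(exclamation)

print(soup)
# <div id="post"><p class="intro">Hello <b>Beautiful Soup</b><i>!</i></p><p>Unclosed paragraph</p></div>
```

When serialized, the tree contains the closing tags that the input was missing.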

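# Add an article and filter its content

from bs4 import BeautifulSoup
from django.http import HttpResponse
from django.shortcuts import render

from . import models  # assumed: the app's models module defines Artical and ArticalDetail


def add_artical(request, username):
    if request.method == "POST":
        user = request.user
        artical_title = request.POST.get("artical_title")
        artical_content = request.POST.get("artical_content")

        # Parse the HTML tags; "html.parser" is the parser from the Python standard library
        bs = BeautifulSoup(artical_content, "html.parser")

        # Filter out disallowed tags
        for tag in bs.find_all(["script", "link"]):
            # Remove the disallowed tag (and everything inside it) from the tree
            tag.decompose()

        # Build the summary only after filtering, so script source code
        # does not end up in the article description
        desc = bs.get_text()[0:150] + "..."

        # Debug output; for the article content "123" this printed: 123 <class 'bs4.BeautifulSoup'>
        print(bs, type(bs))

        try:
            artical_obj = models.Artical.objects.create(user=user, desc=desc, title=artical_title)
            models.ArticalDetail.objects.create(content=str(bs), artical=artical_obj)
        except Exception:
            return HttpResponse("Failed to save the article")

        return HttpResponse("Article added")

    return render(request, "add_artical.html")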
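The filtering step can be pulled out of the view into a plain function, which makes it easy to test without a Django request. This is a minimal sketch: `clean_article_html` and `BANNED_TAGS` are hypothetical names. Appending "..." only when the text is actually cut is our own choice and is not in the original view. The summary is taken after the banned tags are removed, because `get_text()` would otherwise include the contents of `<script>` tags.

```python
from bs4 import BeautifulSoup

# Hypothetical blacklist, mirroring the tags filtered in the view
BANNED_TAGS = ["script", "link"]


def clean_article_html(content, desc_len=150):
    """Hypothetical helper: return (cleaned_html, desc) for user-submitted HTML."""
    soup = BeautifulSoup(content, "html.parser")

    # find_all accepts a list of names, so only the banned tags are visited
    for tag in soup.find_all(BANNED_TAGS):
        tag.decompose()  # removes the tag and everything inside it

    # Summary taken after filtering, so script source never ends up in it
    text = soup.get_text()
    desc = text[:desc_len] + "..." if len(text) > desc_len else text
    return str(soup), desc


html = "<p>Hello</p><script>alert('x')</script><link rel='stylesheet' href='evil.css'>"
cleaned, desc = clean_article_html(html)
print(cleaned)  # <p>Hello</p>
print(desc)     # Hello
```

A tag blacklist like this one does not catch everything. For example, event-handler attributes such as `onclick` on an allowed tag survive it. A whitelist of permitted tags and attributes is generally the safer design for user-submitted HTML.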

巴特西
