import re

def SplitHtmlTag(file):
    # Start matching at '<', skip every character that is not '>', up to the closing '>'
    re_html = re.compile(r'<[^>]+>')
    with open(file, "r") as f, open("result.txt", "w+") as c:
        for line in f:
            line = re_html.sub('', line)
            c.write(line)
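To try the pattern without reading and writing files, the same substitution can be applied to a plain string. This is a minimal sketch; `TAG_RE` and `strip_tags` are hypothetical names, not part of the original code:

```python
import re

# Same pattern as SplitHtmlTag: '<', then one or more non-'>' characters, then '>'.
TAG_RE = re.compile(r'<[^>]+>')

def strip_tags(text: str) -> str:
    """Remove everything that looks like an HTML tag from a string."""
    return TAG_RE.sub('', text)

print(strip_tags('<p>Hello <b>world</b></p>'))  # Hello world
```

Because `[^>]` also matches newlines, calling `strip_tags(f.read())` on the whole file also removes tags that span several lines, which the line-by-line loop misses. Like any regex approach it is not a real HTML parser: a `>` inside an attribute value, for example, ends the match early.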
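(?<!href="|">)(https?:\/\/[\w\-\.!~?&=+\*\'(),\/]+)((?!<\/a>).)*

This regex matches links in the text that start with http, but does not match links that are already wrapped in an A tag: the negative lookbehind (?<!href="|">) rejects any URL that directly follows href=" (the tag's attribute) or "> (the start of the tag's link text).

Test text: Test, Here\'s an interesting in-house litigation position with JPMorgan Chase in New York I thought you might b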
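Python's `re` module only accepts fixed-width look-behind, so `(?<!href="|">)` (branches of 6 and 2 characters) raises an error there. A minimal sketch, assuming the goal is to wrap bare URLs in `<a>` tags: the alternation is split into two look-behinds, which together reject the same positions, and the trailing `((?!</a>).)*` group is dropped because with `re.sub` it would also consume the rest of the line after each URL. `URL_RE` and `link_bare_urls` are hypothetical names.

```python
import re

# Two fixed-width look-behinds replace (?<!href="|">): the URL must be
# preceded neither by 'href="' nor by '">'.
URL_RE = re.compile(r'(?<!href=")(?<!">)(https?://[\w\-.!~?&=+*\'(),/]+)')

def link_bare_urls(text: str) -> str:
    """Wrap bare http(s) URLs in <a> tags, leaving already-linked URLs alone."""
    return URL_RE.sub(r'<a href="\1">\1</a>', text)

sample = 'See http://example.com and <a href="http://example.org">http://example.org</a>.'
print(link_bare_urls(sample))
```

Since every URL it wraps ends up preceded by `href="` or `">`, running the function a second time leaves its output unchanged.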