Original article: http://www.one2know.cn/nlp11/

  • The gensim.summarization library function

    gensim.summarization.summarize(text, ratio=0.2, word_count=None, split=False)

    Parameters:

    text : str

    Given text.

    ratio : float, optional

    Number between 0 and 1 that determines the proportion of the number of sentences of the original text to be chosen for the summary.

    word_count : int or None, optional

    Determines how many words the output will contain.

    If both parameters are provided, the ratio will be ignored.

    split : bool, optional

    If True, a list of sentences will be returned. Otherwise, the sentences will be joined and returned as a single string.
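To make the interaction of `ratio`, `word_count` and `split` concrete, here is a minimal sketch that applies the three options to sentences whose scores are already known. The helper `select_sentences` and the example scores are hypothetical stand-ins for gensim's ranking step, not gensim's source. The greedy word-count rule (keep adding the best-ranked sentences while that moves the total closer to the target) and the space-joined string are assumptions about how "how many words" and "joined strings" are realized.

```python
from typing import List, Optional, Union


def select_sentences(
    sentences: List[str],
    scores: List[float],
    ratio: float = 0.2,
    word_count: Optional[int] = None,
    split: bool = False,
) -> Union[str, List[str]]:
    """Hypothetical helper mimicking the documented summarize() options.

    `scores` stands in for the ranking step (assumption, not gensim code).
    """
    # Sentence indices, highest score first (sorted() is stable, so ties keep text order).
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    if word_count is None:
        # ratio: keep this fraction of the original number of sentences.
        chosen = ranked[: int(len(sentences) * ratio)]
    else:
        # word_count given, so ratio is ignored. Assumed greedy rule: add the
        # next sentence only if it brings the total closer to word_count.
        chosen, total = [], 0
        for i in ranked:
            n = len(sentences[i].split())
            if abs(word_count - total - n) > abs(word_count - total):
                break
            chosen.append(i)
            total += n
    # Return the picked sentences in their original order.
    picked = [sentences[i] for i in sorted(chosen)]
    return picked if split else " ".join(picked)


sents = ["A b c.", "D e.", "F g h i.", "J.", "K l m."]
scores = [0.1, 0.5, 0.3, 0.05, 0.2]
print(select_sentences(sents, scores, ratio=0.5, split=True))  # ['D e.', 'F g h i.']
print(select_sentences(sents, scores, ratio=0.5))              # D e. F g h i.
print(select_sentences(sents, scores, ratio=0.9, word_count=3, split=True))  # ['D e.']
```

The last call passes both `ratio` and `word_count`; only `word_count` takes effect, matching the note that the ratio is ignored when both are provided.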
  • Code
```python
# Requires gensim < 4.0: the gensim.summarization module was removed in gensim 4.0
from gensim.summarization import summarize  # TextRank-based extractive summarization
from bs4 import BeautifulSoup  # BeautifulSoup parses HTML documents
import requests  # requests downloads HTTP resources

urls = {  # paper title -> URL
    'Deconstructing Voice-over-IP':
        'http://scigen.csail.mit.edu/scicache/269/scimakelatex.25977.A.+G.+Hassan.html',
    'Exploration of the Location-Identity Split':
        'http://scigen.csail.mit.edu/scicache/270/scimakelatex.26087.Ali+Veli.Veli+Ali.Vel+Al.html',
}
# Actual abstracts, for comparison:
# 1.The implications of ambimorphic archetypes have been far-reaching and pervasive. After years of natural research into consistent hashing, we argue the simulation of public-private key pairs, which embodies the confirmed principles of theory. Such a hypothesis might seem perverse but is derived from known results. Our focus in this paper is not on whether the well-known knowledge-based algorithm for the emulation of checksums by Herbert Simon runs in Θ( n ) time, but rather on exploring a semantic tool for harnessing telephony (Swale).
# 2.Superblocks must work. Given the current status of homogeneous configurations, security experts particularly desire the simulation of 802.11b. we consider how the Internet can be applied to the refinement of Scheme.
for title, url in urls.items():
    r = requests.get(url, timeout=30)
    r.raise_for_status()  # stop on HTTP errors instead of summarizing an error page
    soup = BeautifulSoup(r.text, 'html.parser')
    data = soup.get_text()  # page text with the HTML tags stripped
    start = data.find('1 Introduction')
    end = data.find('Related Work', max(start, 0))  # look for the end heading after the start
    if start == -1 or end == -1:  # str.find returns -1 when a heading is missing
        print('Could not locate the introduction in {}'.format(url))
        continue
    # Extract the introduction: the text between the two section headings
    text = data[start + len('1 Introduction'):end].strip()
    print('PAPER URL: {}'.format(url))
    print('TITLE: {}'.format(title))
    print('GENERATED SUMMARY: {}'.format(summarize(text)))
    print()
```

Output:

PAPER URL: http://scigen.csail.mit.edu/scicache/269/scimakelatex.25977.A.+G.+Hassan.html
TITLE: Deconstructing Voice-over-IP
GENERATED SUMMARY: ...

PAPER URL: http://scigen.csail.mit.edu/scicache/270/scimakelatex.26087.Ali+Veli.Veli+Ali.Vel+Al.html
TITLE: Exploration of the Location-Identity Split
GENERATED SUMMARY: ...
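gensim's `summarize` ranks sentences with a variant of TextRank, and `gensim.summarization` no longer exists in gensim 4.0 and later. As a rough stand-in, here is a minimal pure-Python sketch of classic TextRank sentence extraction: sentences are graph nodes, edges are weighted by shared words divided by the sum of the logs of the sentence lengths (the measure from Mihalcea and Tarau's TextRank paper), and weighted PageRank with damping d = 0.85 is computed by repeated updates. The function names, the naive regex sentence splitter and the fixed iteration count are assumptions for illustration. gensim's own variant uses BM25-based edge weights and its own preprocessing, so its summaries will differ.

```python
import math
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a: List[str], b: List[str]) -> float:
    # TextRank overlap: number of shared words divided by log|a| + log|b|.
    overlap = len(set(a) & set(b))
    if overlap == 0:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return overlap / denom if denom > 0 else 0.0


def textrank_summary(text: str, ratio: float = 0.2, d: float = 0.85,
                     iters: int = 50) -> List[str]:
    # Hypothetical helper, not gensim's implementation.
    sentences = split_sentences(text)
    words = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    n = len(sentences)
    # Symmetric weighted adjacency matrix of the sentence graph (no self-loops).
    w = [[similarity(words[i], words[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total edge weight leaving each sentence
    scores = [1.0] * n
    for _ in range(iters):
        # Weighted PageRank: WS(i) = (1 - d) + d * sum_j w[j][i] / out[j] * WS(j)
        scores = [(1 - d) + d * sum(w[j][i] / out[j] * scores[j]
                                    for j in range(n) if out[j] > 0)
                  for i in range(n)]
    k = max(1, int(n * ratio))  # number of sentences to keep
    top = sorted(range(n), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]  # restore original text order


sample = ("Cats chase mice. Birds sing at dawn. "
          "Dogs chase cats and mice. Mice fear cats.")
print(textrank_summary(sample, ratio=0.25))  # ['Cats chase mice.']
```

The sentence about birds shares no words with the others, so it receives only the baseline score 1 - d and is never chosen. Calling `textrank_summary(text)` on the introduction extracted in the loop plays roughly the role of `summarize(text, split=True)`, though it will not pick the same sentences.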
