1: Download the lyrics of an English song, or an English article

love story-taylor swift
we were both young when i first saw you
i close my eyes and the flashback starts
i'm standing there on a balcony in summer air
see the lights, see the party, the ball gowns
see you make your way through the crowd
and say hello, little did i know
that you were romeo, you were throwing pebbles
and my daddy said stay away from juliet
and i was crying on the staircase, begging you please don't go
and i said
romeo take me somewhere we can be alone
i'll be waiting, all there's left to do is run
you'll be the prince and i'll be this princess
it's a love story
baby, just say yes
so i sneak out to the garden to see you
we keep quiet 'cause we're dead if they knew
so close your eyes, escape this town for a little while
oh, oh, oh
'cause you were romeo, i was a scarlet letter
and my daddy said stay away from juliet
but you were everything to me, i was begging you please don't go
and i said
romeo take me somewhere we can be alone
i'll be waiting, all there's left to do is run
you'll be the prince and i'll be the princess
it's a love story
baby, just say yes
romeo save me try to tell me how it feels
this might be stupid boy, but its so real
don't be afraid now we'll get out of this mess
it's a love story
baby, just say yes
i got tired of waiting wondering if you were ever coming around
my faith in you is better
when i met you on the outskirts of town
and i said
romeo save me ive been feeling so alone
ill keep waiting for you but you never come
is this in my head, i don't know what to think
he fell to the ground and pulled out a ring
and said
marry me juliet you'll never have to be alone
i love you and that's all i really know
i talked to your dad you'll pick out a white dress
it's a love story
baby, just say yes
oh, oh, oh
we were both young when i first saw you

2: Replace all separators such as , . ? ! ’ : with spaces

# str holds the text from step 1 (note that this name shadows Python's built-in str)
sep=''';,.?!'''
for i in sep:
    str=str.replace(i,' ')
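
The loop above calls replace once per separator. As a minimal sketch of an alternative, the same substitution can be done in a single pass with a translation table. The sample sentence and the names `text`, `table` and `cleaned` are assumptions for illustration; the variable is called `text` rather than `str` so that the built-in `str.maketrans` stays reachable.

```python
# Hypothetical sample text; the post applies this step to the full lyrics.
text = "It's a love story, baby: just say yes!"

# Separators from step 2, plus ':' which the step heading also mentions.
sep = ";,.?!:"

# Build a table that maps every separator character to a single space.
table = str.maketrans(sep, " " * len(sep))

# Apply all replacements in one pass over the text.
cleaned = text.translate(table)
print(cleaned.split())
```

Because each separator becomes exactly one space, the cleaned text keeps its original length, and `split()` later discards the extra whitespace.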

3: Convert all uppercase letters to lowercase
str=str.lower()

4: Generate the word list
str_list=str.split()

5: Count word frequencies
str_list=str.split()
print(str_list)
str_dict={}
for i in str_list:
    str_dict[i]=str_dict.get(i,0)+1
# Unwanted words are removed in step 7; here, print each word with its count
for w in str_dict:
    print(w,str_dict[w])
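
The dictionary-based count in step 5 can also be written with `collections.Counter` from the standard library. This is only a sketch of an equivalent approach, not the method used in this post; the sample string and the name `word_counts` are assumptions.

```python
from collections import Counter

# Hypothetical cleaned, lowercased sample standing in for the lyrics.
text = "it's a love story baby just say yes it's a love story"

# Counter tallies each word of the split list in one call.
word_counts = Counter(text.split())

# Print each word with its count, as in step 5.
for word, count in word_counts.items():
    print(word, count)
```

A `Counter` behaves like a dictionary, but looking up a word that never occurred returns 0 instead of raising `KeyError`.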
6: Sort
strList=list(str_dict.items())
strList.sort(key=lambda x:x[1] ,reverse=True)
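
Sorting on the count alone leaves words with equal counts in whatever order they were first seen. A small sketch of one way to make the ranking predictable is to break ties alphabetically. The counts below are made up for illustration, and `ranked` is a hypothetical name.

```python
# Hypothetical counts, not taken from the actual lyrics.
str_dict = {"love": 4, "story": 4, "yes": 5, "baby": 5, "romeo": 3}

# Sort by count descending (negated count), then by word ascending for ties.
ranked = sorted(str_dict.items(), key=lambda item: (-item[1], item[0]))
print(ranked)
```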
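7: Exclude grammatical words: pronouns, articles, conjunctions
exclude={'the','top','is','while','when','why'}
for i in exclude:
    str_dict.pop(i,None)  # pop avoids a KeyError when a word is not in the dictionary
# Sort again so the excluded words no longer appear in the ranking
strList=list(str_dict.items())
strList.sort(key=lambda x:x[1],reverse=True)
8: Output the 20 most frequent words (TOP 20)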
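for item in strList[:20]:  # slicing is safe even with fewer than 20 words
    print(item)
9: Save the text to be analysed as a UTF-8 encoded file, and obtain the content for word frequency analysis by reading that file.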
file=open('shuihuzhuan.txt','r',encoding='utf-8')
myarticle=file.read()
file.close()
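
Steps 2 to 9 can be combined into one function that reads a UTF-8 file and returns the most frequent words. The following is a minimal sketch under stated assumptions: the function name `top_words`, its parameters, and the demo text are hypothetical, and a temporary file stands in for the real input so the example runs on its own.

```python
import os
import tempfile

def top_words(path, exclude=(), n=20):
    """Return the n most frequent words in the UTF-8 text file at path."""
    with open(path, "r", encoding="utf-8") as f:
        text = f.read().lower()          # step 3: lowercase
    for ch in ";,.?!:":                  # step 2: separators become spaces
        text = text.replace(ch, " ")
    counts = {}
    for word in text.split():            # steps 4 and 5: split and count
        if word not in exclude:          # step 7: skip excluded words
            counts[word] = counts.get(word, 0) + 1
    # step 6: sort by count, highest first; step 8: keep the first n
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Demo on a small temporary file with made-up content.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("Romeo, take me somewhere. Romeo save me! The end.")
    demo_path = tmp.name

result = top_words(demo_path, exclude={"the"}, n=3)
print(result)
os.remove(demo_path)  # clean up the temporary file
```

Filtering excluded words while counting means the sort only has to run once.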

Part 2: Chinese word frequency count. Download a long Chinese article.

The code is as follows:

import jieba

# Read the article
file=open("hh.txt","r",encoding='utf-8')
mynotes=file.read()
file.close()

# Replace punctuation with spaces
sep = ''':。,?!;∶ ...“”'''
for i in sep:
    mynotes = mynotes.replace(i, ' ')

# Segment the text into words with jieba
mynotes_list = list(jieba.cut(mynotes))
exclude =[' ','\n','你','我','他','和','但','了','的','来','是','去','在','上','高']

# Count how often each word occurs
mynotes_dict={}
for w in mynotes_list:
    mynotes_dict[w] = mynotes_dict.get(w,0)+1

# Remove the excluded words (pop avoids a KeyError when a word is absent)
for w in exclude:
    mynotes_dict.pop(w,None)
for w in mynotes_dict:
    print(w,mynotes_dict[w])

# Sort by frequency, highest first
dictList = list(mynotes_dict.items())
dictList.sort(key=lambda x:x[1],reverse=True)
print(dictList)

# Print the 20 most frequent words
for item in dictList[:20]:
    print(item)

# Write the 20 most frequent words to a text file
outfile = open("mytop20.txt","a",encoding='utf-8')
for item in dictList[:20]:
    outfile.write(item[0]+" "+str(item[1])+"\n")
outfile.close()
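
The script opens mytop20.txt in append mode ("a"), so running it twice leaves two copies of the results in the file. As a sketch of an alternative, the output step could open the file in write mode inside a `with` block. The helper name `write_top`, the sample pairs, and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

def write_top(pairs, path, n=20):
    """Write up to n (word, count) pairs to path, one pair per line."""
    # "w" truncates the file first, so repeated runs do not pile up.
    with open(path, "w", encoding="utf-8") as out:
        for word, count in pairs[:n]:
            out.write(f"{word} {count}\n")

# Hypothetical sorted result, standing in for dictList.
dictList = [("山", 3), ("水", 2), ("人", 1)]

out_path = os.path.join(tempfile.mkdtemp(), "mytop20.txt")
write_top(dictList, out_path)
write_top(dictList, out_path)  # the second run overwrites the first

# Read the file back to check what was written.
with open(out_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines)
```

Passing `encoding="utf-8"` explicitly keeps Chinese words readable regardless of the platform's default encoding.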

  
