I have a large file (~4 GB) to process in Python, and I wondered whether it is OK to "read" such a large file. So I tried the following approaches:

The large file I actually need to process is not "./CentOS-6.5-i386.iso"; I am just using this file as an example here.

1: Normal Method (ignoring try/except/finally)

```python
def main():
    f = open(r"./CentOS-6.5-i386.iso", "rb")
    for line in f:  # reads one line (a bytes object) at a time
        print(line, end="")
    f.close()

if __name__ == "__main__":
    main()
```
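For reference, here is a minimal sketch of the normal method with the try/finally included rather than ignored. The `process_lines` name is hypothetical, and it counts lines instead of printing them:

```python
def process_lines(path):
    """Hypothetical example: count the lines of a file, closing it even on error."""
    f = open(path, "rb")
    try:
        count = 0
        for _ in f:  # iterating the file object reads one line at a time
            count += 1
        return count
    finally:
        f.close()  # runs whether the loop finishes normally or raises
```

Usage would be `process_lines(r"./CentOS-6.5-i386.iso")`. The `with` method below does the same cleanup with less code.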

2: "With" Method.

```python
def main():
    # the with statement closes the file automatically
    with open(r"./CentOS-6.5-i386.iso", "rb") as f:
        for line in f:
            print(line, end="")

if __name__ == "__main__":
    main()
```
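The practical advantage of the `with` version shows when processing fails halfway through a large file: the file is still closed. The small sketch below demonstrates this, assuming a temporary file as a stand-in for the ISO and a simulated error:

```python
import os
import tempfile

# A small temporary file stands in for the 4 GB ISO
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"line 1\nline 2\n")

try:
    with open(path, "rb") as f:
        for line in f:
            raise RuntimeError("simulated failure while processing %r" % (line,))
except RuntimeError as exc:
    print("caught:", exc)

print(f.closed)  # True: leaving the with block closed the file despite the error
os.remove(path)
```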

3: "readlines" Method [Bad Idea]

```python
# NO. readlines() is really bad for large files.
# MemoryError.
def main():
    # readlines() builds a list of every line before the loop starts
    for line in open(r"./CentOS-6.5-i386.iso", "rb").readlines():
        print(line, end="")

if __name__ == "__main__":
    main()
```
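Sometimes a list of lines really is convenient. For that case, `readlines()` also accepts an optional size hint: it stops reading once the total size of the lines read so far exceeds the hint. The sketch below reads in batches using that parameter. The `read_in_batches` name and the 1 MB default are my own assumptions:

```python
def read_in_batches(path, hint=1 << 20):
    """Yield lists of lines, each list holding roughly `hint` bytes.

    readlines(hint) stops once the lines read so far exceed `hint` bytes,
    so memory stays bounded by about one batch instead of the whole file.
    """
    with open(path, "rb") as f:
        while True:
            batch = f.readlines(hint)
            if not batch:  # an empty list means end of file
                break
            yield batch
```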

4: "fileinput" Method.

```python
import fileinput

def main():
    for line in fileinput.input(files=r"./CentOS-6.5-i386.iso", mode="rb"):
        print(line, end="")

if __name__ == "__main__":
    main()
```
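`fileinput` is most useful when several files should be treated as one stream of lines. The object it returns also works as a context manager (Python 3.2+). The sketch below counts lines per file in a single loop. The helper name `count_lines_per_file` is hypothetical:

```python
import fileinput


def count_lines_per_file(paths):
    """Count lines per file in one fileinput loop (binary mode).

    Files that contain no lines do not appear in the result.
    """
    counts = {}
    with fileinput.input(files=paths, mode="rb") as lines:
        for _ in lines:
            name = lines.filename()  # the file the current line came from
            counts[name] = counts.get(name, 0) + 1
    return counts
```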

5: "Generator" Method.

```python
def readFile():
    with open(r"./CentOS-6.5-i386.iso", "rb") as f:
        for line in f:
            yield line  # hand back one line, then pause until the next is requested

def main():
    for line in readFile():
        print(line, end="")

if __name__ == "__main__":
    main()
```
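One caveat applies to all of the line-based methods on a binary file such as an ISO image. In "rb" mode, a "line" is whatever lies between b"\n" bytes, and in binary data those bytes can be far apart, so a single "line" may be very large. A common alternative for binary files is a generator that yields fixed-size chunks. The sketch below shows this; the names `read_in_chunks` and `sha256_of_file` and the 1 MB chunk size are my own choices, not from the original:

```python
import hashlib
from functools import partial


def read_in_chunks(path, chunk_size=1024 * 1024):
    """Yield successive fixed-size chunks; the last one may be shorter."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) calls f.read(chunk_size) until it returns b""
        for chunk in iter(partial(f.read, chunk_size), b""):
            yield chunk


def sha256_of_file(path):
    """Example use: hash a multi-gigabyte file with bounded memory."""
    digest = hashlib.sha256()
    for chunk in read_in_chunks(path):
        digest.update(chunk)
    return digest.hexdigest()
```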

All of the methods above work well for small files, but not all of them work for large files: the readlines method fails. The readlines() function reads the entire file into memory at once and returns a list of all its lines.

When I ran the readlines method, I got the following error message:

With the readlines method, CPU and memory usage rose rapidly (see the following figure). Once memory usage went above 50%, Python raised a "MemoryError".

The other methods (Normal Method, With Method, fileinput Method, Generator Method) work well for large files. With these methods, CPU and memory usage (shown in the following figure) did not rise noticeably. Each of them iterates over the file object, keeping only the current line plus a small read buffer in memory.
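The memory difference can be measured without a 4 GB file. The sketch below uses the standard library's `tracemalloc` on a temporary file of about 10 MB as a stand-in for the ISO. The helper names are hypothetical, and the exact numbers will vary with the Python version, but readlines should peak far higher:

```python
import os
import tempfile
import tracemalloc


def peak_memory(func, path):
    """Return the peak traced memory (bytes) allocated while func(path) runs."""
    tracemalloc.start()
    try:
        func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def with_readlines(path):
    with open(path, "rb") as f:
        for line in f.readlines():  # the whole file becomes a list first
            pass


def with_iteration(path):
    with open(path, "rb") as f:
        for line in f:  # only the current line is alive at a time
            pass


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write((b"x" * 99 + b"\n") * 100_000)  # ~10 MB, 100,000 lines
    try:
        print("readlines peak:", peak_memory(with_readlines, path))
        print("iteration peak:", peak_memory(with_iteration, path))
    finally:
        os.remove(path)
```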

By the way, I recommend the generator method, because it shows clearly that you have taken the file size into account.
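A further benefit of the generator style is that processing steps can be chained, each still handling one line at a time. The sketch below uses hypothetical helper names (`read_lines`, `non_empty`, `count_matching`); `read_lines` follows the same idea as `readFile` above:

```python
def read_lines(path):
    """Lazily yield lines from a binary file."""
    with open(path, "rb") as f:
        yield from f


def non_empty(lines):
    """Strip line endings and drop blank lines; still lazy."""
    for line in lines:
        stripped = line.rstrip(b"\r\n")
        if stripped:
            yield stripped


def count_matching(path, needle):
    """Count lines containing `needle` without ever building a list of lines."""
    return sum(1 for line in non_empty(read_lines(path)) if needle in line)
```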

Reference:

How to read large file, line by line in python
