References:

http://oatest.dragonbravo.com/Authenticate/SignIn?returnUrl=%2f

http://drops.wooyun.org/tips/6313

http://blog.csdn.net/nwpulei/article/details/8457738

http://www.pythonclub.org/project/captcha/python-pil

http://blog.csdn.net/csapr1987/article/details/7728315  (creating QR code images)

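Installing the Python captcha recognition libraries

1. Install the image-processing library PIL (Python Imaging Library).

Download: http://www.pythonware.com/products/pil/

2. Install pytesseract, the Python wrapper for Google's Tesseract OCR engine. The Tesseract engine itself must also be installed and callable as "tesseract", as the package description at the end of this post notes.

Run the command prompt as administrator.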

cd C:\Python27\Scripts

pip install pytesseract
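Since pytesseract only wraps the tesseract command, it can help to confirm that the command is reachable before running any OCR. The following is a minimal sketch, assuming Python 3.3 or later (for `shutil.which`); the helper name `find_tesseract` is hypothetical and not part of pytesseract.

```python
import shutil


def find_tesseract(command="tesseract"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(command)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH; install Tesseract-OCR first")
    else:
        print("tesseract found at: " + path)
```

If the check reports that the command is missing, install Tesseract-OCR or add its install directory to PATH before continuing.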

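Recognizing monochrome captchas without noise

A captcha that is purely monochrome and free of interference is fairly easy to recognize. The code is as follows: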

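import os
import pytesseract
from PIL import Image  # with the original PIL, use "import Image" instead

# Raw string so the backslashes in the Windows path are not treated as escapes
os.chdir(r'C:\Users\Administrator\Downloads\picture')
image = Image.open('verifycode.jpg')
# Run Tesseract OCR on the image and return the recognized text
vcode = pytesseract.image_to_string(image)
print(vcode)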
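The string returned by OCR often carries stray whitespace or punctuation around the actual code. A small cleanup step can be sketched as follows, assuming the captcha contains only ASCII letters and digits; the helper name `clean_vcode` is hypothetical.

```python
import re


def clean_vcode(raw_text):
    """Keep only ASCII letters and digits from raw OCR output."""
    return re.sub(r"[^0-9A-Za-z]", "", raw_text)


# Example with a made-up OCR result containing spaces and a newline.
print(clean_vcode(" 7 3x9\n"))
```

If the target captchas use other characters, the character class in the regular expression would need to be adjusted.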

Recognizing color captchas with noise

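  1. Denoise with a median filter. This type of captcha contains noise pixels, so the first step is to remove them.
  2. Enhance the image. The median filter fades many of the noise pixels, but converting straight to monochrome would make them stand out again, so this step strengthens the image first (the code below does this by raising the contrast).
  3. Convert to monochrome. Binarization sets pixels below the threshold to 0 and pixels above it to 1, which turns the image black and white. In PIL's mode '1', a value of 0 is black and a nonzero value is white.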

The code is as follows:

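import os
from PIL import Image, ImageEnhance, ImageFilter

os.chdir(r'C:\Users\Administrator\Downloads\picture')
# GIFs load in palette mode, which filter() rejects, so convert to RGB first
image = Image.open('vcode.gif').convert('RGB')
# Step 1: median filter to remove noise
images = image.filter(ImageFilter.MedianFilter())
# Step 2: enhance the contrast by a factor of 2
enhancer = ImageEnhance.Contrast(images)
images = enhancer.enhance(2)
# Step 3: convert to 1-bit black and white
images = images.convert('1')
images.show()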
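To make steps 1 and 3 concrete, here is a pure-Python sketch of median filtering and thresholding on a tiny grayscale grid. It illustrates the idea only and is not how PIL implements it: the function names, the edge handling, and the threshold of 128 are all assumptions chosen for the example.

```python
def median_filter_3x3(pixels):
    """Replace each pixel with the median of its 3x3 neighbourhood.

    Edge pixels use only the neighbours that exist.
    """
    height, width = len(pixels), len(pixels[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = sorted(
                pixels[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            )
            row.append(window[len(window) // 2])
        result.append(row)
    return result


def binarize(pixels, threshold=128):
    """Map values below the threshold to 0 (black), all others to 1 (white)."""
    return [[0 if value < threshold else 1 for value in row] for row in pixels]


# A light 3x3 patch with one dark noise pixel in the centre.
noisy = [
    [200, 210, 205],
    [198, 10, 202],
    [207, 201, 199],
]
denoised = median_filter_3x3(noisy)
print(binarize(noisy))     # the noise pixel survives as a black dot
print(binarize(denoised))  # the noise pixel is gone after filtering
```

Binarizing the unfiltered grid keeps the dark pixel as a black dot, while filtering first removes it, which is why denoising comes before the monochrome conversion.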

A mind map of captcha image processing:

The pytesseract package description is attached below:

Metadata-Version: 1.1
Name: pytesseract
Version: 0.1.6
Summary: Python-tesseract is a python wrapper for google's Tesseract-OCR
Home-page: https://github.com/madmaze/python-tesseract
Author: Matthias Lee
Author-email: pytesseract@madmaze.net
License: GPLv3
Description: Python-tesseract is an optical character recognition (OCR) tool for python.
        That is, it will recognize and "read" the text embedded in images.
       
        Python-tesseract is a wrapper for google's Tesseract-OCR
        ( http://code.google.com/p/tesseract-ocr/ ).  It is also useful as a
        stand-alone invocation script to tesseract, as it can read all image types
        supported by the Python Imaging Library, including jpeg, png, gif, bmp, tiff,
        and others, whereas tesseract-ocr by default only supports tiff and bmp.
        Additionally, if used as a script, Python-tesseract will print the recognized
        text in stead of writing it to a file. Support for confidence estimates and
        bounding box data is planned for future releases.
       
       
        USAGE:
        ```
         > try:
         >     import Image
         > except ImportError:
         >     from PIL import Image
         > import pytesseract
         > print(pytesseract.image_to_string(Image.open('test.png')))
         > print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))
        ```
       
        INSTALLATION:
       
        Prerequisites:
        * Python-tesseract requires python 2.5 or later or python 3.
        * You will need the Python Imaging Library (PIL).  Under Debian/Ubuntu, this is
          the package "python-imaging" or "python3-imaging" for python3.
        * Install google tesseract-ocr from http://code.google.com/p/tesseract-ocr/ .
          You must be able to invoke the tesseract command as "tesseract". If this
          isn't the case, for example because tesseract isn't in your PATH, you will
          have to change the "tesseract_cmd" variable at the top of 'tesseract.py'.
          Under Debian/Ubuntu you can use the package "tesseract-ocr".
         
        Installing via pip:  
        See the [pytesseract package page](https://pypi.python.org/pypi/pytesseract)  
        ```
        $> sudo pip install pytesseract  
        ```
       
        Installing from source:  
        ```
        $> git clone git@github.com:madmaze/pytesseract.git  
        $> sudo python setup.py install 
        ```
       
        LICENSE:
        Python-tesseract is released under the GPL v3.
       
        CONTRIBUTERS:
        - Originally written by [Samuel Hoffstaetter](https://github.com/hoffstaetter)
        - [Juarez Bochi](https://github.com/jbochi)
        - [Matthias Lee](https://github.com/madmaze)
        - [Lars Kistner](https://github.com/Sr4l)
Keywords: python-tesseract OCR Python
Platform: UNKNOWN
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 3
