







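# -*- coding: utf-8 -*-
from numpy import *  # import the scientific computing package NumPy
from os import listdir
import operator  # standard-library operator module, used to sort the votes

# Core of the algorithm
def classifyO(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of rows, i.e. the number of training samples
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet  # tile (a NumPy function) repeats inX once per training row; diffMat holds the differences between the target and every training sample
    sqDiffMat = diffMat ** 2  # square each element
    sqDistances = sqDiffMat.sum(axis=1)  # sum the squares along each row
    distances = sqDistances ** 0.5  # square root gives the distance
    sortedDistIndicies = distances.argsort()  # indices that sort the distances in ascending order
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]  # label of the i-th nearest sample
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)  # most votes first
    return sortedClassCount[0][0]

def img2vector(filename):
    returnVect = zeros((1, 1024))  # one 32x32 image flattened into a 1x1024 row
    with open(filename) as fr:
        for i in range(32):
            lineStr = fr.readline()
            for j in range(32):
                returnVect[0, 32 * i + j] = int(lineStr[j])
    return returnVect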
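
To see the distance-and-vote logic on something small, here is a minimal sketch that follows the same steps with plain Python lists instead of NumPy arrays. The function name `knn_vote` and the four toy points are hypothetical and are not part of kNN.py.

```python
import math
from collections import Counter

def knn_vote(in_x, data_set, labels, k):
    """Return the majority label among the k training points nearest to in_x."""
    # Euclidean distance from in_x to every training point
    distances = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(in_x, row)))
        for row in data_set
    ]
    # Indices of the training points, nearest first
    order = sorted(range(len(distances)), key=lambda idx: distances[idx])
    # Count the labels of the k nearest points and take the most common one
    votes = Counter(labels[idx] for idx in order[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two points near (1, 1) and two near (0, 0)
group = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
group_labels = ['A', 'A', 'B', 'B']

result = knn_vote([0.0, 0.2], group, group_labels, 3)
print(result)
```

With k = 3 the three nearest points carry the labels B, B and A, so the majority vote is B.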


>>> import kNN
>>> testVector=kNN.img2vector('digits/testDigits/0_13.txt') # adjust the path to match your own directory
>>> testVector[0,0:31]
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0.])
>>> testVector[0,32:63]
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0.])
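
The two slices above show the first 31 pixels of image row 0 and of image row 1, assuming the usual row-major layout in which pixel (i, j) of a 32x32 image is stored at position 32*i + j of the flattened vector. The sketch below illustrates that index mapping on a made-up 4x4 image; the helper `flatten_image` and the sample rows are hypothetical, and a plain list stands in for the NumPy array.

```python
def flatten_image(rows):
    """Flatten a list of equal-length digit strings into one row-major list of floats."""
    width = len(rows[0])
    vect = [0.0] * (len(rows) * width)
    for i, line in enumerate(rows):
        for j in range(width):
            # pixel (i, j) lands at position width * i + j
            vect[width * i + j] = float(line[j])
    return vect

# Hypothetical 4x4 image; the real digit files are 32x32
image = ["0110",
         "1001",
         "1001",
         "0110"]

vect = flatten_image(image)
print(vect[0:4])   # first row of the image
print(vect[4:8])   # second row of the image
```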


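  We have now converted the data into a format the classifier can work with. The next step is to feed this data to the classifier and check its results. handwritingClassTest() is the code that tests the classifier; add it to kNN.py. Before doing so, make sure that from os import listdir appears at the top of the file. That statement imports the function listdir from the os module, which lists the file names in a given directory.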
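
For readers who have not used listdir before, a small sketch of its behavior follows. It builds a throwaway directory with `tempfile` so that it runs anywhere; the three file names are made up to resemble the digit files and are not taken from the data set.

```python
import os
import tempfile
from os import listdir

# Create a temporary directory with files named like the digit samples
work_dir = tempfile.mkdtemp()
for name in ['0_0.txt', '0_1.txt', '9_45.txt']:
    with open(os.path.join(work_dir, name), 'w') as handle:
        handle.write('0' * 32 + '\n')

# listdir returns bare file names (no directory part) in arbitrary order,
# so they are sorted here to make the result predictable
file_list = sorted(listdir(work_dir))
print(file_list)
print(len(file_list))
```

Because listdir returns only the names, the directory has to be joined back on before a file can be opened.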

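def handwritingClassTest():
    hwLabels = []
    trainingFileList = listdir('E:\\python excise\\digits\\trainingDigits')
    m = len(trainingFileList)  # number of training files
    trainingMat = zeros((m, 1024))  # one flattened image per row
    for i in range(m):
        fileNameStr = trainingFileList[i]
        hwLabels.append(int(fileNameStr.split('_')[0]))  # class digit, e.g. 9 for 9_45.txt
        trainingMat[i,:] = img2vector('digits/trainingDigits/%s' % fileNameStr)
    testFileList = listdir('E:/python excise/digits/testDigits')
    errorCount = 0.0
    mTest = len(testFileList)  # number of test files
    for i in range(mTest):
        fileNameStr = testFileList[i]
        classNumStr = int(fileNameStr.split('_')[0])  # true class of the test file
        classifierResult = classifyO(img2vector('digits/testDigits/%s' % fileNameStr), trainingMat, hwLabels, 3)  # k = 3 is assumed here
        print("the classifier came back with:%d,the real answeris:%d" % (classifierResult, classNumStr))
        if (classifierResult != classNumStr): errorCount += 1.0
    print("\nthe total number of error is:%d" % errorCount)
    print("\nthe total error rate is:%f" % (errorCount / float(mTest)))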

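  Explanation: the names of the files in the directory E:\\python excise\\digits\\trainingDigits are stored in the list trainingFileList. The length of that list is the number of files in the directory, and it is stored in the variable m. The code then creates a training matrix with m rows and 1024 columns, where each row holds one image. The class digit can be parsed from the file name, because the files in this directory follow a naming rule: for example, the file 9_45.txt belongs to class 9 and is the 45th instance of the digit 9. The class label is then stored in the vector hwLabels, and the image is loaded with the img2vector function described earlier.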
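
The file-name rule can be sketched in isolation. The helper `parse_label` is a hypothetical name introduced only for this illustration, and `1_7.txt` is a made-up file name that follows the same pattern as 0_13.txt and 9_45.txt.

```python
def parse_label(file_name):
    """Return the digit encoded before the underscore, e.g. '9_45.txt' -> 9."""
    stem = file_name.split('.')[0]      # '9_45'
    return int(stem.split('_')[0])      # 9

# '1_7.txt' is a made-up name that follows the same naming rule
file_names = ['0_13.txt', '9_45.txt', '1_7.txt']
hw_labels = [parse_label(name) for name in file_names]
print(hw_labels)
```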

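  Next, a similar procedure is applied to the files in the directory E:/python excise/digits/testDigits. The difference is that these files are not loaded into a matrix; instead, each file is classified with classifyO() and the result is compared with its true label. Because the values in the files are already between 0 and 1, no normalization is needed.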


>>> kNN.handwritingClassTest()
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:0,the real answeris:0
the classifier came back with:1,the real answeris:1
the classifier came back with:1,the real answeris:1
the classifier came back with:1,the real answeris:1
the classifier came back with:1,the real answeris:1
the classifier came back with:9,the real answeris:9
the classifier came back with:9,the real answeris:9
the classifier came back with:9,the real answeris:9
the classifier came back with:9,the real answeris:9
the classifier came back with:9,the real answeris:9
the classifier came back with:9,the real answeris:9
the classifier came back with:9,the real answeris:9
the classifier came back with:9,the real answeris:9
the classifier came back with:9,the real answeris:9
the classifier came back with:9,the real answeris:9
the classifier came back with:9,the real answeris:9
the classifier came back with:9,the real answeris:9
the classifier came back with:9,the real answeris:9
the classifier came back with:9,the real answeris:9
the classifier came back with:9,the real answeris:9

the total number of error is:11

the total error rate is:0.011628
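
As a quick sanity check on the final lines of the output, the error rate is the number of misclassified files divided by the number of test files. The output does not print the size of the test set; the value 946 used below is inferred from 11 / 0.011628 and should be treated as an assumption. The helper name `error_rate` is likewise hypothetical.

```python
def error_rate(error_count, test_count):
    """Fraction of test samples that were misclassified."""
    return error_count / float(test_count)

errors = 11
m_test = 946  # assumed: inferred from the reported rate, not printed in the output

rate = error_rate(errors, m_test)
print("the total error rate is:%f" % rate)
```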




