【Data Mining】Classification with kNN (Reposted)
2024-10-04 17:04:42
1. Algorithm Overview
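The idea behind kNN is simple. Compute the distance between the data point to be classified and every sample in the training set, and take the k samples with the smallest distances. Count how many of these k samples belong to each class. By majority vote, assign the test sample to the class with the most members. The distance measure can be Euclidean distance, Manhattan distance, or cosine distance.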
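The three distance measures named above can be sketched in plain Python as follows. This is only an illustration: the function names are hypothetical, and the article's own code calls scipy.spatial.distance.euclidean instead. SciPy's scipy.spatial.distance module also provides cityblock (Manhattan) and cosine (1 minus the cosine similarity).

```python
import math


def euclidean(u, v):
    """Euclidean (L2) distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """Manhattan (L1, city-block) distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    """Cosine distance: 1 minus the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


u = [1.0, 2.0, 3.0]
v = [4.0, 6.0, 3.0]
print(euclidean(u, v))   # 5.0
print(manhattan(u, v))   # 7.0
print(cosine_distance(u, v))
```

Cosine distance ignores vector length: [1, 2] and [2, 4] have a cosine distance of (almost exactly) 0 even though their Euclidean distance is about 2.24, so the choice of metric can change which neighbors kNN selects.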
import numpy as np
import scipy.spatial.distance as ssd


def read_data(fn):
    """Read a dataset and separate it into feature data and label data."""
    # read the dataset file, skipping the header row
    with open(fn) as f:
        raw_data = np.loadtxt(f, delimiter=',', dtype=float,
                              skiprows=1, usecols=None)
    # initialize
    charac = []
    label = []
    # features are all columns but the last; the label is the last column
    for row in raw_data:
        charac.append(row[:-1])
        label.append(int(row[-1]))
    return np.array(charac), np.array(label)


def knn(k, dtrain, dtest, dtr_label):
    """k-nearest neighbors algorithm"""
    pred_label = []
    # for each instance in the test dataset, calculate
    # its distance to every instance in the training dataset
    for di in dtest:
        distances = []
        for ij, dj in enumerate(dtrain):
            distances.append((ssd.euclidean(di, dj), ij))
        # sort the distances to get the k nearest neighbors
        k_nn = sorted(distances)[:k]
        # classify according to the most frequent label
        dlabel = []
        for dis, idtr in k_nn:
            dlabel.append(dtr_label[idtr])
        pred_label.append(np.argmax(np.bincount(dlabel)))
    return np.array(pred_label)


def evaluate(result):
    """Count correct and wrong predictions from pred_label - dte_label."""
    eval_result = np.zeros(2, int)
    for x in result:
        # pred_label == dte_label
        if x == 0:
            eval_result[0] += 1
        # pred_label != dte_label
        else:
            eval_result[1] += 1
    return eval_result


dtrain, dtr_label = read_data('iris-train.csv')
dtest, dte_label = read_data('iris-test.csv')
K = [1, 3, 7, 11]
print("knn classification result for iris data set:\n")
print("k | number of correct/wrong classified test records")
for k in K:
    pred_label = knn(k, dtrain, dtest, dtr_label)
    eval_result = evaluate(pred_label - dte_label)
    # print the evaluated result to the screen
    print(k, " | ", eval_result[0], "/", eval_result[1])
print()
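The script above depends on iris-train.csv and iris-test.csv, which are not included here. As a rough, self-contained sketch of the same procedure (distance to every training sample, keep the k smallest, majority vote, then count correct and wrong predictions), the following uses only the Python standard library on a small made-up 2-D dataset. The function names knn_predict and count_correct_wrong and the toy points are hypothetical. Ties in the vote go to the smallest label, which mirrors np.argmax(np.bincount(...)) in the original code.

```python
import math
from collections import Counter


def knn_predict(k, train_x, train_y, test_x):
    """Classify each test point by majority vote among its k nearest training points."""
    predictions = []
    for x in test_x:
        # (distance, index) pairs; sorting breaks distance ties by training index
        distances = sorted((math.dist(x, t), i) for i, t in enumerate(train_x))
        # count the labels of the k nearest neighbors
        votes = Counter(train_y[i] for _, i in distances[:k])
        # most frequent label; ties go to the smallest label
        predictions.append(min(votes, key=lambda lab: (-votes[lab], lab)))
    return predictions


def count_correct_wrong(pred, truth):
    """Return (number correct, number wrong), as evaluate() reports."""
    correct = sum(p == t for p, t in zip(pred, truth))
    return correct, len(truth) - correct


# Hypothetical toy data: class 0 near the origin, class 1 near (5, 5)
train_x = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [(0.5, 0.5), (5.5, 5.5), (4, 4)]
test_y = [0, 1, 1]

for k in [1, 3, 5]:
    pred = knn_predict(k, train_x, train_y, test_x)
    print(k, " | ", *count_correct_wrong(pred, test_y))
```

On this well-separated toy data every k classifies all three test points correctly; on real data such as iris, small k tends to be sensitive to noisy neighbors while large k smooths the decision, which is why the original script compares several values of k. math.dist requires Python 3.8 or later.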
2. References
[1] M. Saad Nurul Ishlah, Python: Simple K Nearest Neighbours Classifier.