Learning the k-nearest-neighbors classifier API with Python

The source code is available in my Git repository: https://github.com/linyi0604/MachineLearning

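from sklearn.datasets import load_iris
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

'''
k-nearest-neighbors classifier
Decides the label of each sample to be predicted from how the training data is distributed around it
A non-parametric method
Very high computational complexity and memory consumption
'''

'''
1 Prepare the data
'''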
# Load the iris data set
iris = load_iris()
# Check the size of the data
# print(iris.data.shape) # (150, 4)
# Show the data set description
# print(iris.DESCR)
'''
Iris Plants Database
====================

Notes
-----
Data Set Characteristics:
:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
- class:
- Iris-Setosa
- Iris-Versicolour
- Iris-Virginica
:Summary Statistics:

============== ==== ==== ======= ===== ====================
                Min  Max   Mean    SD   Class Correlation
============== ==== ==== ======= ===== ====================
sepal length:   4.3  7.9   5.84   0.83    0.7826
sepal width:    2.0  4.4   3.05   0.43   -0.4194
petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
============== ==== ==== ======= ===== ====================

:Missing Attribute Values: None
:Class Distribution: 33.3% for each of 3 classes.
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
:Date: July, 1988

This is a copy of UCI ML iris datasets.
http://archive.ics.uci.edu/ml/datasets/Iris

The famous Iris database, first used by Sir R.A Fisher

This is perhaps the best known database to be found in the
pattern recognition literature. Fisher's paper is a classic in the field and
is referenced frequently to this day. (See Duda & Hart, for example.) The
data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant. One class is linearly separable from the other 2; the
latter are NOT linearly separable from each other.

References
----------
- Fisher,R.A. "The use of multiple measurements in taxonomic problems"
Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
Mathematical Statistics" (John Wiley, NY, 1950).
- Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.
(Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
- Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
Structure and Classification Rule for Recognition in Partially Exposed
Environments". IEEE Transactions on Pattern Analysis and Machine
Intelligence, Vol. PAMI-2, No. 1, 67-71.
- Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE Transactions
on Information Theory, May 1972, 431-433.
- See also: 1988 MLC Proceedings, 54-64. Cheeseman et al"s AUTOCLASS II
conceptual clustering system finds 3 classes in the data.
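- Many, many more ...

There are 150 samples in total,
evenly distributed across three iris species.
Each sample has 4 features describing the shape of the petals and sepals.
'''

'''
2 Split the data into a training set and a test set
'''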
x_train, x_test, y_train, y_test = train_test_split(iris.data,
                                                    iris.target,
                                                    test_size=0.25,
                                                    random_state=33)

'''
3 k-nearest-neighbors classifier: fit the model and predict
'''
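# Standardize the features: fit the scaler on the training data only, then apply it to the test data
ss = StandardScaler()
x_train = ss.fit_transform(x_train)
x_test = ss.transform(x_test)

# Create a k-nearest-neighbors model object
knc = KNeighborsClassifier()
# Fit the model on the training data
knc.fit(x_train, y_train)
# Predict on the test data
y_predict = knc.predict(x_test)

'''
4 Evaluate the model
'''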
print("Accuracy:", knc.score(x_test, y_test))
print("Other metrics:\n", classification_report(y_test, y_predict, target_names=iris.target_names))
'''
Accuracy: 0.8947368421052632
Other metrics:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00         8
  versicolor       0.73      1.00      0.85        11
   virginica       1.00      0.79      0.88        19

 avg / total       0.92      0.89      0.90        38
'''
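
To see where the numbers in the report come from, the sketch below recomputes them from a confusion matrix. The matrix is an assumption: it is not printed by the script above, but it is the only 3x3 matrix consistent with the reported supports, precisions and recalls (all 8 setosa and all 11 versicolor correct, 4 of the 19 virginica predicted as versicolor). The variable names are illustrative only.

```python
import numpy as np

# Confusion matrix inferred from the report (rows: true class, columns: predicted class).
# This is a reconstruction from the printed numbers, not an output of the original script.
labels = ["setosa", "versicolor", "virginica"]
cm = np.array([
    [8, 0, 0],
    [0, 11, 0],
    [0, 4, 15],
])

support = cm.sum(axis=1)        # number of true samples per class
predicted = cm.sum(axis=0)      # number of predictions per class
true_positive = np.diag(cm)     # correctly classified samples per class

precision = true_positive / predicted   # of everything predicted as a class, how much is right
recall = true_positive / support        # of everything in a class, how much was found
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positive.sum() / cm.sum()

for name, p, r, f, s in zip(labels, precision, recall, f1, support):
    print(f"{name:<12}{p:.2f}  {r:.2f}  {f:.2f}  {s}")
print("accuracy:", accuracy)
```

Under this assumption, the versicolor precision is 11/15 and the virginica recall is 15/19, which round to the 0.73 and 0.79 shown in the report, and the accuracy is 34/38.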

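The description at the top of the script says that k-nearest neighbors decides from the distribution of the data, is non-parametric, and is expensive in computation and memory. The following is a minimal brute-force sketch of that idea, not the scikit-learn implementation: the function name `knn_predict` and the toy data are hypothetical, and it assumes Euclidean distance with a plain majority vote. `k=5` mirrors the default `n_neighbors` of `KNeighborsClassifier`.

```python
from collections import Counter

import numpy as np


def knn_predict(x_train, y_train, x_query, k=5):
    """Predict labels for x_query by majority vote among the k nearest training points."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for point in np.asarray(x_query, dtype=float):
        # Euclidean distance from this query point to every stored training sample.
        distances = np.linalg.norm(x_train - point, axis=1)
        # Indices of the k closest training samples.
        nearest = np.argsort(distances)[:k]
        # The most common label among those neighbors is the prediction.
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


# Tiny hypothetical data set: two well-separated clusters in 2-D.
toy_x = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
toy_y = np.array([0, 0, 0, 1, 1, 1])

toy_pred = knn_predict(toy_x, toy_y, [[0.3, 0.3], [4.8, 5.1]], k=3)
print(toy_pred)
```

Nothing is learned in a training step here: the whole training set is kept in memory and every prediction scans all of it, which is where the memory and computation costs come from. Because the vote depends on raw distances, features with larger scales would dominate, which is one reason to standardize the features before fitting.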