Reference: 'LogisticRegression in MLLib' (http://www.cnblogs.com/luweiseu/p/7809521.html)

Train a logistic regression model with PySpark MLlib, then use Matplotlib to plot the classification boundary.

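from pyspark.sql import Row
from pyspark.sql import HiveContext
import pyspark
from IPython.display import display
import matplotlib
import matplotlib.pyplot as plt
import os

# Point SPARK_HOME at the local Spark installation before creating the context.
os.environ['SPARK_HOME'] = "C:\\Users\\software\\spark-2.1.0-bin-hadoop2.7"
%matplotlib inline

# Reuse the active SparkContext if there is one, otherwise start a local one.
sc = pyspark.SparkContext.getOrCreate(pyspark.SparkConf().setMaster('local'))
sqlContext = HiveContext(sc)

# Get the data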
irisData = sc.textFile("iris.txt")

from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.classification import LogisticRegressionWithLBFGS

def toLabeledPoint(line):
    # Each line holds two feature values followed by an integer class label,
    # separated by whitespace.
    linesp = line.split()
    return LabeledPoint(int(linesp[2]), Vectors.dense(float(linesp[0]), float(linesp[1])))

data = irisData.map(toLabeledPoint)

# Split data into training (60%) and test (40%).
splits = data.randomSplit([0.6, 0.4], seed=11)
training = splits[0].cache()
test = splits[1]

# Train a three-class logistic regression model with an intercept term.
model = LogisticRegressionWithLBFGS.train(training, intercept=True, numClasses=3)

# Test data
def predicTest(lp):
    # Return a (prediction, label) pair for one LabeledPoint.
    label = lp.label
    features = lp.features
    prediction = model.predict(features)
    return (float(prediction), label)

predictionAndLabels = test.map(predicTest)

from pyspark.mllib.evaluation import MulticlassMetrics

# Accuracy
metrics = MulticlassMetrics(predictionAndLabels)
accuracy = metrics.accuracy
accuracy
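As a sanity check on what `metrics.accuracy` reports, the same quantity can be computed by hand: it is the fraction of test points whose predicted class equals the true label. The pure-Python sketch below illustrates this. The helper `accuracy_from_pairs` and the sample pairs are hypothetical and made up for illustration; they are not results from the iris run.

```python
def accuracy_from_pairs(pairs):
    """Fraction of (prediction, label) pairs where the two values agree."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (prediction, label) pair")
    correct = sum(1 for prediction, label in pairs if prediction == label)
    return float(correct) / len(pairs)


# Hypothetical pairs in the same (prediction, label) layout as predictionAndLabels.
sample_pairs = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (2.0, 2.0)]
sample_accuracy = accuracy_from_pairs(sample_pairs)
```

Assuming the test set is small enough to collect to the driver, `accuracy_from_pairs(predictionAndLabels.collect())` should agree with `metrics.accuracy`.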
# Plot the decision boundary
import numpy as np

## meshgrid
x0, x1 = np.meshgrid(
    np.linspace(0, 8, 500).reshape(-1, 1),
    np.linspace(0, 3.5, 200).reshape(-1, 1),
)
X_new = np.c_[x0.ravel(), x1.ravel()]

## predict
y_predict = [model.predict(Vectors.dense(X_new_i)) for X_new_i in X_new]

y = data.map(lambda d: d.label).collect()
X = data.map(lambda d: [d.features[0], d.features[1]]).collect()
y = np.array(y)
X = np.array(X)

## draw
# Reshape the flat predictions back to the grid shape expected by contourf.
zz = np.array(y_predict).reshape(x0.shape)

plt.figure(figsize=(10, 4))
plt.plot(X[y==2, 0], X[y==2, 1], "g^", label="Iris-Virginica")
plt.plot(X[y==1, 0], X[y==1, 1], "bs", label="Iris-Versicolor")
plt.plot(X[y==0, 0], X[y==0, 1], "yo", label="Iris-Setosa")

from matplotlib.colors import ListedColormap
custom_cmap = ListedColormap(['#fafab0','#9898ff','#a0faa0'])

# Shade each predicted class region (contourf has no linewidth argument, so it is dropped).
plt.contourf(x0, x1, zz, cmap=custom_cmap)
# plt.clabel(contour, inline=1, fontsize=12)
plt.xlabel("Petal length", fontsize=14)
plt.ylabel("Petal width", fontsize=14)
plt.legend(loc="center left", fontsize=14)
plt.axis([0, 7, 0, 3.5])
plt.show()
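
The shaded regions in the plot come from classifying every grid point and coloring it by the predicted class. Conceptually, a multinomial logistic model computes one linear score per class and predicts the class with the largest score. The sketch below illustrates that idea in plain Python. The `WEIGHTS` and `INTERCEPTS` values are hypothetical numbers chosen for illustration, not the coefficients learned above, and the generic softmax form used here is an assumption rather than Spark's internal parameterization.

```python
import math

# Hypothetical per-class weights for (petal length, petal width) and intercepts.
WEIGHTS = [(-2.0, -1.0), (0.5, 0.0), (1.5, 2.0)]
INTERCEPTS = [8.0, 0.0, -10.0]


def class_probabilities(features):
    """Softmax over one linear score per class."""
    scores = [
        sum(w * x for w, x in zip(weights, features)) + b
        for weights, b in zip(WEIGHTS, INTERCEPTS)
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(features):
    """Index of the most probable class."""
    probs = class_probabilities(features)
    return probs.index(max(probs))


def classify_grid(x_values, y_values):
    """Predict a class for every (x, y) grid point; one row per y value."""
    return [[predict((x, y)) for x in x_values] for y in y_values]


grid = classify_grid([1.0, 4.0, 6.0], [0.2, 1.3, 2.2])
```

Here `grid` plays the role of `zz` above: row `i` holds the predicted classes along the x-axis for `y_values[i]`, which is the same layout that `reshape(x0.shape)` produces for `contourf`.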

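Final result: [figure: the three Iris classes plotted by petal length and petal width over the shaded decision regions of the trained model]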
