Splitting a Dataset with sklearn's train_test_split

sklearn.model_selection.train_test_split randomly splits a dataset into a training set and a test set of a chosen proportion.

1. Usage:

 from sklearn.model_selection import train_test_split

 X_train, X_test, y_train, y_test = train_test_split(train_data, train_target, test_size=0.2, random_state=0)

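2. Parameters and return values:

train_data: the sample features to be split

train_target: the labels that correspond to the samples

test_size: the size of the test set. A float between 0 and 1 is the fraction of the dataset assigned to the test set; an integer is the absolute number of test samples

random_state: the seed for the random number generator. On the same dataset, the same seed always produces the same split, while different seeds generally produce different splits

X_train, y_train: the features and labels that make up the training set

X_test, y_test: the features and labels that make up the test set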
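
The two forms of test_size and the effect of random_state can be checked with a small sketch. The toy data here (10 single-feature samples with alternating labels) is made up purely for illustration and is not part of the original example.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with one feature each, labels alternate 0/1.
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]

# test_size as a float: the fraction of samples placed in the test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
frac_sizes = (len(X_tr), len(X_te))

# test_size as an int: the absolute number of test samples.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=3, random_state=0)
int_sizes = (len(X_tr2), len(X_te2))

# Same data and same seed: the split is reproduced exactly.
_, X_te_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=0)
same_split = X_te == X_te_again

print(frac_sizes, int_sizes, same_split)
# (8, 2) (7, 3) True
```

Fixing random_state is mainly useful for reproducible experiments; leaving it unset gives a different shuffle on each run.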

3. Example:

Build a dataset of 100 samples and randomly split off 20% of it as the test set.

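 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 # Python 3.6

 # Older scikit-learn releases exposed this function in sklearn.cross_validation,
 # a module that was removed in version 0.20:
 # from sklearn.cross_validation import train_test_split
 from sklearn.model_selection import train_test_split

 # Build 100 samples: 100 two-dimensional feature vectors with 100 matching labels
 X = [["feature ", "one "]] * 50 + [["feature ", "two "]] * 50
 y = [1] * 50 + [2] * 50

 # Randomly hold out 20% of the samples as the test set
 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
 print("train:", len(X_train), "test:", len(X_test))

 # Inspect the samples that ended up in the test set
 for features, label in zip(X_test, y_test):
     print("".join(features), label)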

 '''
 train: 80 test: 20
 feature two  2
 feature two  2
 feature one  1
 feature two  2
 feature two  2
 feature one  1
 feature one  1
 feature two  2
 feature two  2
 feature two  2
 feature two  2
 feature one  1
 feature two  2
 feature two  2
 feature two  2
 feature one  1
 feature one  1
 feature one  1
 feature two  2
 feature one  1
 '''
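
In the output listed above, the 20 test samples contain 12 of class 2 and only 8 of class 1, even though the full dataset is balanced 50/50. That is expected from a purely random split. If the class proportions need to be preserved, one option is the stratify parameter of train_test_split. This goes beyond the original example; the sketch below reuses the same data and only adds stratify=y.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Same data as in the example: 50 samples of class 1 and 50 of class 2.
X = [["feature ", "one "]] * 50 + [["feature ", "two "]] * 50
y = [1] * 50 + [2] * 50

# stratify=y asks for a split that keeps the class ratio of y
# in both the training set and the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

train_counts = dict(sorted(Counter(y_train).items()))
test_counts = dict(sorted(Counter(y_test).items()))
print("train:", train_counts, "test:", test_counts)
# train: {1: 40, 2: 40} test: {1: 10, 2: 10}
```

With stratification the test set contains exactly 10 samples of each class here, which keeps evaluation metrics from being skewed by an unlucky draw.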
