



[ML] LIBSVM Data: Classification, Regression, and Multi-label

Solver Classes

Among others, the H2O4GPU solver can be used for the following classes of problems:

    • GLM: Lasso, Ridge Regression, Logistic Regression, Elastic Net Regularization
    • KMeans
    • Gradient Boosting Machine (GBM) via XGBoost
    • Singular Value Decomposition (SVD) + Truncated Singular Value Decomposition
    • Principal Components Analysis (PCA)
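All of the GLM entries above fit a model by minimizing a penalized loss. To make the Lasso case concrete, here is a minimal CPU sketch of cyclic coordinate descent. It assumes scikit-learn's Lasso objective, (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1, with no intercept. It illustrates what is being optimized, not how H2O4GPU solves it on the GPU. `soft_threshold` and `lasso_cd` are hypothetical names.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t*|w|: shrink z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, alpha, n_iter=100):
    """Cyclic coordinate descent for
        (1 / (2 n)) * ||y - X w||^2 + alpha * ||w||_1
    (assumed scikit-learn-style objective, no intercept)."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)  # ||x_j||^2 for every column j
    r = y - X @ w                  # current residual y - Xw
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:   # an all-zero column never moves
                continue
            r += X[:, j] * w[j]    # residual with feature j taken out
            rho = X[:, j] @ r
            # exact minimizer of the objective along coordinate j
            w[j] = soft_threshold(rho, n * alpha) / col_sq[j]
            r -= X[:, j] * w[j]    # put feature j back with its new weight
    return w
```

With `alpha = 0` each coordinate update is an exact least-squares step, so the iterates approach the ordinary least-squares fit. Once `alpha` exceeds max_j |x_j^T y| / n, every coefficient stays at zero.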

Real-time benchmark: https://www.youtube.com/watch?v=LrC3mBNG7WU (about 20 times faster).



hadoop@unsw-ThinkPad-T490:~/NVIDIA_CUDA-.1_Samples/bin/x86_64/linux/release$ nvidia-smi
Thu Nov ::
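+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.31       Driver Version: 440.31       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|      GeForce MX250       Off  |         :3C:00.0 Off |                  N/A |
| N/A   58C    P0    N/A /  N/A |    390MiB /  2002MiB |       %      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|                    G    /usr/lib/xorg/Xorg                           190MiB |
|                    G    /usr/bin/gnome-shell                         136MiB |
|                    G    ...uest-channel-token=                        59MiB |
+-----------------------------------------------------------------------------+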
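The output above shows a GeForce MX250 with roughly 2 GB of memory, of which about 390 MiB is already taken by the desktop (Xorg, gnome-shell). The script below converts the training data to a dense matrix, so it can help to compare free GPU memory with that matrix's size before fitting. Here is a minimal sketch. It assumes the installed driver's nvidia-smi supports the `--query-gpu` and `--format=csv,noheader,nounits` options. `parse_gpu_csv`, `query_gpus` and `dense_mib` are hypothetical helpers, and the estimate ignores any extra workspace the solver allocates.

```python
import subprocess


def parse_gpu_csv(text):
    """Parse the output of
    nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv,noheader,nounits
    into a list of (name, used_mib, total_mib) tuples, one per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        name, used, total = (field.strip() for field in line.split(","))
        gpus.append((name, int(used), int(total)))
    return gpus


def query_gpus():
    """Run nvidia-smi and return the parsed per-GPU memory figures."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=name,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


def dense_mib(n_rows, n_cols, itemsize=8):
    """Approximate size in MiB of a dense matrix (float64 by default)."""
    return n_rows * n_cols * itemsize / 2**20
```

For example, calling `dense_mib(*train_x.shape)` after loading gives the size of the dense float64 training matrix. You can compare that number with `total - used` from `query_gpus()`.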



import time

import h2o4gpu
import numpy as np
from sklearn.datasets import load_svmlight_file
from sklearn.linear_model import Lasso  # CPU baseline (see the commented-out model below)
from sklearn.metrics import mean_squared_error, r2_score

# Optional disk cache (joblib): uncomment these lines to memoize get_data(),
# so repeated runs skip the slow text parsing.
# from joblib import Memory
# mem = Memory("./mycache")
# @mem.cache
def get_data(path, n_features=None):
    # load_svmlight_file returns a sparse CSR feature matrix and a 1-D target array;
    # the features are converted to a dense ndarray for the solver.
    x, y = load_svmlight_file(path, n_features=n_features)
    return x.toarray(), y


print("Loading data.")
train_x, train_y = get_data("/home/hadoop/YearPredictionMSD")
# Give the test matrix the same number of columns as the training matrix.
test_x, test_y = get_data("/home/hadoop/YearPredictionMSD.t", n_features=train_x.shape[1])

for max_iter in [100, 500, 1000, 2000, 4000, 8000]:
    print("Setting up solver, max_iter is {}".format(max_iter))
    model = h2o4gpu.Lasso(alpha=0.01, fit_intercept=False, max_iter=max_iter)
    # model = Lasso(alpha=0.1, fit_intercept=False, max_iter=500)  # CPU (scikit-learn) version

    time_start = time.time()
    model.fit(train_x, train_y)
    time_end = time.time()
    print("Training took {:.2f} s".format(time_end - time_start))

    time_start = time.time()
    y_pred_lasso = np.squeeze(model.predict(test_x))  # drop any extra axis in the predictions
    time_end = time.time()
    print("Prediction took {:.2f} s".format(time_end - time_start))

    print(y_pred_lasso.shape)
    print(test_y.shape)
    print(y_pred_lasso[:10])
    print(test_y[:10])

    mse = mean_squared_error(test_y, y_pred_lasso)
    print("mse on test data : %f" % mse)
    r2_score_lasso = r2_score(test_y, y_pred_lasso)
    print("r^2 on test data : %f" % r2_score_lasso)
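`load_svmlight_file` reads the sparse text format used by the LIBSVM datasets named in the title. Each line holds a target value followed by `index:value` pairs. Indices are 1-based in the LIBSVM files, and any feature not listed is zero. The minimal parser below shows how one line maps to a dense row. It assumes a single numeric target; multi-label files list several comma-separated labels, which this sketch does not handle. `parse_libsvm_line` is a hypothetical name, not a scikit-learn function.

```python
import numpy as np


def parse_libsvm_line(line, n_features):
    """Parse one 'label index:value ...' line (1-based indices)
    into (label, dense feature vector); unlisted features are zero."""
    parts = line.split()
    label = float(parts[0])
    x = np.zeros(n_features)
    for item in parts[1:]:
        idx, val = item.split(":")
        x[int(idx) - 1] = float(val)  # shift the 1-based index to 0-based
    return label, x
```

The sparse file stores only the listed pairs, while a dense row stores every feature. This is why converting the loaded matrices to dense arrays increases their memory footprint.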



