ML Boot Camp V Mail.Ru. - . .
.
ML Boot Camp III - 2017 - . 5 kaggle . , , 3 .
100.000 . , , , , , .
, , . , .
50 30+ , 16020, . .
python :
Csv vs pickle
csv . pickle 2 . :
import gzip
import pickle
with gzip.open('../run/local/pred_1.pickle.gz', 'wb') as f:
    pickle.dump((x, y), f)  # store both objects as one tuple
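A minimal round-trip sketch of the gzip-compressed pickle approach from the snippet above. The helper names `save_pair` and `load_pair`, the temporary directory, and the toy `x` and `y` values are illustrative assumptions, not part of the original pipeline.

```python
import gzip
import os
import pickle
import tempfile


def save_pair(path, x, y):
    """Write (x, y) as a single gzip-compressed pickle."""
    with gzip.open(path, "wb") as f:
        pickle.dump((x, y), f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pair(path):
    """Read back the (x, y) tuple written by save_pair."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


# Toy data standing in for real features and targets (assumption).
x = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
y = [0, 1]

# Hypothetical location; the article itself writes under ../run/local/.
path = os.path.join(tempfile.mkdtemp(), "pred_1.pickle.gz")
save_pair(path, x, y)
x_loaded, y_loaded = load_pair(path)
```

Only load pickle files you produced yourself, since unpickling untrusted data can execute arbitrary code.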
,
github. old/ , , , . - , .
2
2 , . . - , , . , .
2
2 . .
: -> -> 1 -> 2 . , , , . . , , , . , .
, . , .
2 1 . - , . . ( ), . , .
, .
Random subspace method, , random . ( * (2 ^ )). , , , 2 .
, - . - . , .
, . (2) xgboost. .
:
- 0.0001, , .
- : 0, 1. .
- .
- ( .2) , .
- ( .3) , , .
- , NaN.
. , , - .. .
, . . , .
:
- , , x. / ( ). .
- , .1, . . .
- , , . (ord()). -1.
- , .3, (one-hot encoding).
- .4, PCA kaggle mercedes, .
- . , 10 , . 10 9 (- ). , , .
- .6, .2.
- .7, 5.
- .7, 3.
- k-means, 2, 5, 10, 15, 25. .
- .10, , 3.
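The k-means item in the list above names cluster counts of 2, 5, 10, 15 and 25. One possible reading, sketched below, is to fit one k-means model per count and use the cluster labels as extra features. Whether the original used labels, distances to centroids, or something else cannot be recovered from the text, so the function `kmeans_label_features` and the random toy data are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans


def kmeans_label_features(X, cluster_counts=(2, 5, 10, 15, 25), seed=0):
    """Return one integer cluster-label column per requested cluster count."""
    columns = []
    for k in cluster_counts:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed)
        columns.append(km.fit_predict(X))
    return np.column_stack(columns)


# Synthetic data in place of the competition features (assumption).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))

cluster_feats = kmeans_label_features(X)
X_extended = np.hstack([X, cluster_feats])
```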
( ), , . . , . - , . , - . .
. . , .
. . 3 :
- , , ;
- , ;
- , , ;
, python. . 2 , , .
, 0 1 . logloss- , 0 1 1e-5. np.clip(z, 1e-5, 1-1e-5) . , 0.1-0.93.
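A small sketch of why predictions are clipped before computing logloss, using the same `np.clip(z, 1e-5, 1-1e-5)` bounds as above. Without clipping, a prediction of exactly 0 or 1 on the wrong class gives an infinite loss. The function name `clipped_logloss` and the toy labels are assumptions for illustration.

```python
import numpy as np


def clipped_logloss(y_true, z, eps=1e-5):
    """Binary logloss with predictions clipped into [eps, 1 - eps]."""
    y_true = np.asarray(y_true, dtype=float)
    z = np.clip(np.asarray(z, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(z) + (1 - y_true) * np.log(1 - z)))


y_true = [1, 0, 1, 0]
# Hard 0/1 predictions; the third one is confidently wrong.
z_hard = [1.0, 0.0, 0.0, 0.0]

loss = clipped_logloss(y_true, z_hard)
```

The single wrong answer contributes `-log(1e-5)`, so the loss stays finite but is still heavily penalised.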
hyperopt
hyperopt (
). , 20. 2
hyperopt bootstrapping 20 ,
. .
1
1 0 . . .
, 1 - . 2 :
- (keras)
- (XGBoost, LightGBM, rf, et)
. hyperopt.
- , . 64-64
leaky relu 1-5 , .
:
- ;
- ( 256);
- - , ( 0.7, , ); nan- batch normalization ;
- - (64-128);
- ;
- - (16);
- ;
- 1 .
. , , 2 .
(- 0 ), ReLU (- , , 0, ) - .
Parametric Relu,
Scaled Exponential Linear Units. - .
,
KFold sklearn. , .
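A hedged sketch of how `KFold` from sklearn is commonly used to produce out-of-fold predictions. The article applies it in the neural-network context; here `LogisticRegression` stands in for the network so the example stays small, and the function name, fold count and synthetic data are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold


def out_of_fold_predictions(X, y, n_splits=5, seed=0):
    """Predict each row with a model that never saw that row in training."""
    oof = np.zeros(len(y), dtype=float)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, valid_idx in kf.split(X):
        model = LogisticRegression(max_iter=1000)  # stand-in model (assumption)
        model.fit(X[train_idx], y[train_idx])
        oof[valid_idx] = model.predict_proba(X[valid_idx])[:, 1]
    return oof


# Synthetic binary problem (assumption).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

oof = out_of_fold_predictions(X, y)
```

Predictions built this way can be fed to a second-level model without leaking the training labels.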
, , . , . callback- keras , learning rate .
, ( ) learning rate - . .
, callback- . callback- . learning rate , , , callback-.
,
2 bagging
random forest extra trees 2 XGBoost LightGBM. - , - , , . LightGBM XGBoost .
( 3) . 2 . 1 .
LightGBM XGBoost , . 10000 . . random forest extra trees sklearn , hyperopt, , , . , , .
. 1 . . , , 1 , . 2 .
2
, 1 1 4 , 2 190 . . 2 1 ( 2 ).
2 , , .
2 . , .
, , , . 2 .
BayesianRidge,
Ridge . 20 .
hyperopt , - BayesianRidge Ridge sklearn, BayesianRidge Ridge.
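A minimal sketch of fitting `Ridge` and `BayesianRidge` from sklearn on a matrix of lower-level model predictions, as a second-level blender. The three synthetic prediction columns, the `alpha` value and the clipping bounds are assumptions. A real stacking setup would build the columns from out-of-fold predictions rather than fit and predict on the same rows as done here.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge, Ridge

rng = np.random.RandomState(0)
n = 300
y = rng.randint(0, 2, size=n).astype(float)

# Three noisy copies of the target imitate level-1 model outputs (assumption).
level1 = np.column_stack(
    [y + rng.normal(scale=s, size=n) for s in (0.4, 0.5, 0.6)]
)

ridge = Ridge(alpha=1.0).fit(level1, y)
bayes = BayesianRidge().fit(level1, y)

# Linear models can leave [0, 1], so clip before scoring with logloss.
pred_ridge = np.clip(ridge.predict(level1), 1e-5, 1 - 1e-5)
pred_bayes = np.clip(bayes.predict(level1), 1e-5, 1 - 1e-5)
```

`BayesianRidge` estimates its regularisation strength from the data, while `Ridge` needs `alpha` set by hand or by search.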
10 . cv 0.534-0.535 0.543-0.544, . , 30 . 30 10 .
0.535-0.536, 0.543 . 3 30 30 0.7 0.3 . 30 cv . random_state. 0.537.
, , , . 2 0.543 0.538 . , 12 7 3 , .
https://habrahabr.ru/post/335188/