Commit e7d64da

v0.4.0
1 parent 4909830 commit e7d64da

14 files changed: +454 -45 lines changed

.travis.yml

Lines changed: 6 additions & 1 deletion

@@ -2,10 +2,14 @@
 # check if .travis.yml is valid: http://lint.travis-ci.org/
 # to skip build for given commit put [ci skip] or [skip ci] in commit message
 
+# required for Python >= 3.7
+dist: xenial
+
 language: python
 
-# versions supported by scikit-learn
+# versions supported by scikit-learn and some additional versions
 python:
+  - "3.7"
   - "3.6"
   - "3.5"
   - "3.4"
@@ -16,6 +20,7 @@ branches:
   only:
     - master
     - dev
+    - py2
 
 install:
   - pip install numpy

CHANGELOG.md

Lines changed: 53 additions & 0 deletions

@@ -0,0 +1,53 @@
+# Changelog
+
+### v0.4.0 -- August 12, 2019
+
+Since v0.4.0 vecstack provides official support for Python 3.5 and higher only,
+but still there is unofficial support for Python 2.7 and Python 3.4.
+Please see [details](https://github.com/vecxoz/vecstack/blob/master/PY2.md).
+
+Scikit-learn API:
+* Fixed #31. `sklearn.externals.six` deprecation
+* Fixed #29. Out-of-memory in `np.random.choice` for very large ranges
+
+Functional API:
+* Feature #18. Added support for N-dimensional input. Useful for convolutional nets.
+* Added aliases for `mode` parameter values which correspond to respective `variant` parameter values of `StackingTransformer`:
+    * 'oof_pred_bag' == 'A'
+    * 'oof_pred' == 'B'
+
+### v0.3.0 -- April 6, 2018
+
+Introducing Scikit-learn API: `StackingTransformer`
+
+* Standard transformer class with `fit` and `transform` methods
+* Compatible with `Pipeline` and `FeatureUnion`
+
+### v0.2.2 -- February 23, 2018
+
+* Fixed #5. Wrong behavior during sparse matrix processing
+* Improved input data validation
+* Improved sparse matrix processing
+
+### v0.2.1 -- January 24, 2018 -- Maintenance release
+
+* Minor modifications
+
+### v0.2 -- January 23, 2018
+
+New features:
+
+* Classification with probabilities
+* Modes: compute only what you need (only OOF, only predictions, both, etc.)
+* Save resulting arrays and log with model parameters
+
+### v0.1 -- November 22, 2016 -- Initial release
+
+Features:
+
+* Functional stacking API
+* Regression
+* Classification with class labels
+* Ordinary and stratified k-fold split
+* User-defined metric
+* User-defined transformations for target and prediction
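
Note on the `Fixed #29` entry: the changelog does not say how vecstack resolved the out-of-memory problem, so the following is only a generic sketch of the underlying idea, written with Python's standard library rather than NumPy. Drawing unique values from a lazy `range` avoids materializing the whole range in memory. The function name `sample_unique` and its `seed` parameter are hypothetical and not part of vecstack.

```python
import random


def sample_unique(upper, k, seed=0):
    """Draw k distinct integers from [0, upper) without building the range.

    Hypothetical helper for illustration only; not part of vecstack.
    """
    rng = random.Random(seed)
    # range() is lazy and random.sample() indexes into it on demand,
    # so memory use depends on k rather than on upper.
    return rng.sample(range(upper), k)


if __name__ == "__main__":
    print(sample_unique(10**12, 5))
```

The same call with a small `upper` behaves identically, so a caller would not need a special case for large ranges.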

LICENSE.txt

Lines changed: 1 addition & 1 deletion

@@ -1,7 +1,7 @@
 MIT License
 
 Vecstack. Python package for stacking (machine learning technique)
-Copyright (c) 2016-2018 Igor Ivanov
+Copyright (c) 2016-2019 Igor Ivanov
 
 
 Permission is hereby granted, free of charge, to any person obtaining a copy

PY2.md

Lines changed: 22 additions & 0 deletions

@@ -0,0 +1,22 @@
+### Python 3.x
+
+Since v0.4.0 vecstack provides official support for Python 3.5 and higher only,
+but still there is unofficial support for Python 2.7 and Python 3.4. See details below.
+
+The reason for these changes is global movement in Python 3.x direction.
+Vecstack depends on scikit-learn which has already stopped support for Python < 3.5.
+Scikit-learn v0.20.x is the last version supporting Python 2.7 and Python 3.4.
+Vecstack follows this direction as well.
+Please see [python3statement.org](https://python3statement.org/) for more details.
+
+### Unofficial support for Python 2.7 and Python 3.4
+
+You can still install and run latest vecstack on Python 2.7 and Python 3.4.
+NOTE. It will require legacy versions of the following packages:
+* numpy<1.17
+* scipy<1.3
+* scikit-learn>=0.18,<0.21
+There is a dedicated branch on GitHub called `py2` with appropriate requirements in `setup.py`.
+Installation:
+
+`pip install https://github.com/vecxoz/vecstack/archive/py2.zip`
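
The support policy described in PY2.md can be summarized as a small version check. This is a minimal sketch, not code from vecstack: the helper name `support_level` is hypothetical, and treating every version not mentioned in PY2.md as "unsupported" is an assumption.

```python
import sys


def support_level(version_info=None):
    """Classify a Python version under the policy described in PY2.md.

    Hypothetical helper for illustration; vecstack does not ship it.
    """
    major, minor = (version_info or sys.version_info)[:2]
    if (major, minor) >= (3, 5):
        return "official"
    if (major, minor) in {(2, 7), (3, 4)}:
        # Needs the legacy numpy/scipy/scikit-learn pins and the py2 branch.
        return "unofficial"
    # Assumption: versions PY2.md does not mention are not supported.
    return "unsupported"


if __name__ == "__main__":
    print(support_level())
```

Calling `support_level((2, 7))` returns `"unofficial"`, which is the case where the `py2` branch install command applies.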

README.md

Lines changed: 8 additions & 7 deletions

@@ -22,10 +22,10 @@ Convenient way to automate OOF computation, prediction and bagging using any num
 * Predict [class labels or probabilities](https://github.com/vecxoz/vecstack/blob/master/vecstack/coresk.py#L119) in classification task
 * Apply any [user-defined metric](https://github.com/vecxoz/vecstack/blob/master/vecstack/coresk.py#L124)
 * Apply any [user-defined transformations](https://github.com/vecxoz/vecstack/blob/master/vecstack/coresk.py#L87) for target and prediction
-* Python 2, Python 3
+* Python 3.5 and higher, [unofficial support for Python 2.7 and 3.4](https://github.com/vecxoz/vecstack/blob/master/PY2.md)
 * Win, Linux, Mac
 * [MIT license](https://github.com/vecxoz/vecstack/blob/master/LICENSE.txt)
-* Depends on **numpy**, **scipy**, **scikit-learn>=18.0**
+* Depends on **numpy**, **scipy**, **scikit-learn>=0.18**
 
 # Get started
 * [FAQ](https://github.com/vecxoz/vecstack#stacking-faq)
@@ -292,14 +292,15 @@ Stacking API comparison:
 | Estimator implementation restrictions | Must have only `fit` and `predict` (`predict_proba`) methods | Must be fully scikit-learn compatible |
 | `NaN` and `inf` in input data | Allowed | Not allowed |
 | Can automatically save OOF and log in files | Yes | No |
+| Input dimensionality (`X_train`, `X_test`) | Arbitrary | 2-D |
 
 ### 21. How do parameters of `stacking` function and `StackingTransformer` correspond?
 
-| **stacking function**   | **StackingTransformer**           |
-|-------------------------|-----------------------------------|
-| `models=[Ridge()]`      | `estimators=[('ridge', Ridge())]` |
-| `mode='oof_pred_bag'`   | `variant='A'`                     |
-| `mode='oof_pred'`       | `variant='B'`                     |
+| **stacking function**                 | **StackingTransformer**           |
+|---------------------------------------|-----------------------------------|
+| `models=[Ridge()]`                    | `estimators=[('ridge', Ridge())]` |
+| `mode='oof_pred_bag'` (alias `'A'`)   | `variant='A'`                     |
+| `mode='oof_pred'` (alias `'B'`)       | `variant='B'`                     |
 
 ### 22. Why Scikit-learn API was implemented as transformer and not predictor?
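
The alias rows in the table above amount to a two-entry lookup. This is a minimal sketch of that mapping, assuming nothing about how vecstack handles aliases internally: `MODE_ALIASES` and `normalize_mode` are hypothetical names introduced here.

```python
# Alias table taken from the README table: 'A' and 'B' mirror the
# `variant` values of StackingTransformer.
MODE_ALIASES = {"A": "oof_pred_bag", "B": "oof_pred"}


def normalize_mode(mode):
    """Return the canonical mode name for an alias, or the value unchanged.

    Hypothetical helper; vecstack's own handling may differ.
    """
    return MODE_ALIASES.get(mode, mode)


if __name__ == "__main__":
    for value in ("A", "B", "oof_pred_bag"):
        print(value, "->", normalize_mode(value))
```

Passing canonical names through unchanged means that existing calls with `mode='oof_pred_bag'` need no modification.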

examples/04_sklearn_api_regression_pipeline.ipynb

Lines changed: 2 additions & 2 deletions

@@ -512,7 +512,7 @@
    "source": [
     "# 2. Pipeline\n",
     "\n",
-    "StackingTransformer is fully scikit-learn compatible so we can easily implement **arbitrary number of stacking layers** using Pipeline\n"
+    "StackingTransformer is fully scikit-learn compatible so we can easily implement **arbitrary number of stacking levels** using Pipeline\n"
    ]
   },
   {
@@ -535,7 +535,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# If we have several stacking layers our Pipeline steps would be:\n",
+    "# If we have several stacking levels our Pipeline steps would be:\n",
     "# steps = [('stack_L1', stack_L1),\n",
     "# ('stack_L2', stack_L2),\n",
     "# ('stack_L99', stack_L99), # :-)\n",

setup.py

Lines changed: 8 additions & 5 deletions

@@ -2,23 +2,26 @@
 
 from setuptools import setup
 
+long_desc = '''
+Python package for stacking (stacked generalization) featuring lightweight functional API and fully compatible scikit-learn API.
+Convenient way to automate OOF computation, prediction and bagging using any number of models.
+'''
+
 setup(name='vecstack',
-      version='0.3.0',
+      version='0.4.0',
       description='Python package for stacking (machine learning technique)',
-      long_description='Convenient way to automate OOF computation, prediction and bagging using any number of models',
+      long_description=long_desc,
       classifiers=[
           'License :: OSI Approved :: MIT License',
           'Operating System :: MacOS',
           'Operating System :: Microsoft :: Windows',
           'Operating System :: POSIX',
           'Operating System :: Unix',
           'Programming Language :: Python',
-          'Programming Language :: Python :: 2',
-          'Programming Language :: Python :: 2.7',
           'Programming Language :: Python :: 3',
-          'Programming Language :: Python :: 3.4',
           'Programming Language :: Python :: 3.5',
           'Programming Language :: Python :: 3.6',
+          'Programming Language :: Python :: 3.7',
           'Topic :: Scientific/Engineering',
           'Topic :: Scientific/Engineering :: Artificial Intelligence',
           'Topic :: Scientific/Engineering :: Information Analysis',

tests/test_func_api_classification_binary.py

Lines changed: 68 additions & 0 deletions

@@ -63,6 +63,33 @@
 y_test = y[ind_test]
 
 
+# Create 4-dim data
+np.random.seed(42)
+X_train_4d = np.random.normal(size=(400, 8, 8, 3))
+X_test_4d = np.random.normal(size=(100, 8, 8, 3))
+y_train_4d = np.random.randint(n_classes, size=400)
+
+# Reshape 4-dim to 2-dim
+X_train_4d_unrolled = X_train_4d.reshape(X_train_4d.shape[0], -1)
+X_test_4d_unrolled = X_test_4d.reshape(X_test_4d.shape[0], -1)
+
+#------------------------------------------------------------------------------
+#------------------------------------------------------------------------------
+
+class LogisticRegressionUnrolled(LogisticRegression):
+    """
+    For tests related to N-dim input.
+    Estimator accepts N-dim array and reshapes it to 2-dim array
+    """
+    def fit(self, X, y):
+        return super(LogisticRegressionUnrolled, self).fit(X.reshape(X.shape[0], -1), y)
+
+    def predict(self, X):
+        return super(LogisticRegressionUnrolled, self).predict(X.reshape(X.shape[0], -1))
+
+    def predict_proba(self, X):
+        return super(LogisticRegressionUnrolled, self).predict_proba(X.reshape(X.shape[0], -1))
+
 #-------------------------------------------------------------------------------
 #-------------------------------------------------------------------------------
 
@@ -775,7 +802,48 @@ def test_oof_pred_bag_mode_proba_2_models(self):
 
         assert_array_equal(S_train_1, S_train_3)
         assert_array_equal(S_test_1, S_test_3)
+
+    def test_N_dim_input(self):
+        """
+        This is `test_oof_pred_bag_mode` function with `LogisticRegressionUnrolled` estimator
+        """
+        S_test_temp = np.zeros((X_test_4d_unrolled.shape[0], n_folds))
+        # Using StratifiedKFold because by default cross_val_predict uses StratifiedKFold
+        kf = StratifiedKFold(n_splits = n_folds, shuffle = False, random_state = 0)
+        for fold_counter, (tr_index, te_index) in enumerate(kf.split(X_train_4d_unrolled, y_train_4d)):
+            # Split data and target
+            X_tr = X_train_4d_unrolled[tr_index]
+            y_tr = y_train_4d[tr_index]
+            X_te = X_train_4d_unrolled[te_index]
+            y_te = y_train_4d[te_index]
+            model = LogisticRegression(random_state=0, solver='liblinear', multi_class='ovr')
+            _ = model.fit(X_tr, y_tr)
+            S_test_temp[:, fold_counter] = model.predict(X_test_4d_unrolled)
+        S_test_1 = st.mode(S_test_temp, axis = 1)[0]
 
+        model = LogisticRegression(random_state=0, solver='liblinear', multi_class='ovr')
+        S_train_1 = cross_val_predict(model, X_train_4d_unrolled, y = y_train_4d, cv = n_folds,
+            n_jobs = 1, verbose = 0, method = 'predict').reshape(-1, 1)
+
+        models = [LogisticRegressionUnrolled(random_state=0, solver='liblinear', multi_class='ovr')]
+        S_train_2, S_test_2 = stacking(models, X_train_4d, y_train_4d, X_test_4d,
+            regression = False, n_folds = n_folds, shuffle = False, save_dir=temp_dir,
+            mode = 'oof_pred_bag', random_state = 0, verbose = 0, stratified = True)
+
+        # Load OOF from file
+        # Normally if cleaning is performed there is only one .npy file at given moment
+        # But if we have no cleaning there may be more than one file so we take the latest
+        file_name = sorted(glob.glob(os.path.join(temp_dir, '*.npy')))[-1] # take the latest file
+        S = np.load(file_name)
+        S_train_3 = S[0]
+        S_test_3 = S[1]
+
+        assert_array_equal(S_train_1, S_train_2)
+        assert_array_equal(S_test_1, S_test_2)
+
+        assert_array_equal(S_train_1, S_train_3)
+        assert_array_equal(S_test_1, S_test_3)
+
 #-------------------------------------------------------------------------------
 #-------------------------------------------------------------------------------
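
The `LogisticRegressionUnrolled` wrapper in this test relies on one operation: flattening every axis after the first so that a 2-D estimator can consume N-dimensional samples. The sketch below isolates that step using NumPy only. The helper name `unroll` is hypothetical and does not appear in the commit.

```python
import numpy as np


def unroll(X):
    """Flatten every axis after the first, keeping one row per sample."""
    X = np.asarray(X)
    return X.reshape(X.shape[0], -1)


if __name__ == "__main__":
    # Same shape as the synthetic training data in the test:
    # 400 samples of 8 x 8 x 3 values each.
    X_train_4d = np.zeros((400, 8, 8, 3))
    print(unroll(X_train_4d).shape)  # (400, 192)
```

Because `-1` lets NumPy infer the second dimension, the same helper leaves 2-D input unchanged.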

tests/test_func_api_classification_multiclass.py

Lines changed: 69 additions & 0 deletions

@@ -60,6 +60,33 @@
 y_test = y[ind_test]
 
 
+# Create 4-dim data
+np.random.seed(42)
+X_train_4d = np.random.normal(size=(400, 8, 8, 3))
+X_test_4d = np.random.normal(size=(100, 8, 8, 3))
+y_train_4d = np.random.randint(n_classes, size=400)
+
+# Reshape 4-dim to 2-dim
+X_train_4d_unrolled = X_train_4d.reshape(X_train_4d.shape[0], -1)
+X_test_4d_unrolled = X_test_4d.reshape(X_test_4d.shape[0], -1)
+
+#------------------------------------------------------------------------------
+#------------------------------------------------------------------------------
+
+class LogisticRegressionUnrolled(LogisticRegression):
+    """
+    For tests related to N-dim input.
+    Estimator accepts N-dim array and reshapes it to 2-dim array
+    """
+    def fit(self, X, y):
+        return super(LogisticRegressionUnrolled, self).fit(X.reshape(X.shape[0], -1), y)
+
+    def predict(self, X):
+        return super(LogisticRegressionUnrolled, self).predict(X.reshape(X.shape[0], -1))
+
+    def predict_proba(self, X):
+        return super(LogisticRegressionUnrolled, self).predict_proba(X.reshape(X.shape[0], -1))
+
 #-------------------------------------------------------------------------------
 #-------------------------------------------------------------------------------
 
@@ -772,7 +799,49 @@ def test_oof_pred_bag_mode_proba_2_models(self):
 
         assert_array_equal(S_train_1, S_train_3)
         assert_array_equal(S_test_1, S_test_3)
+
+
+    def test_N_dim_input(self):
+        """
+        This is `test_oof_pred_bag_mode` function with `LogisticRegressionUnrolled` estimator
+        """
+        S_test_temp = np.zeros((X_test_4d_unrolled.shape[0], n_folds))
+        # Using StratifiedKFold because by default cross_val_predict uses StratifiedKFold
+        kf = StratifiedKFold(n_splits = n_folds, shuffle = False, random_state = 0)
+        for fold_counter, (tr_index, te_index) in enumerate(kf.split(X_train_4d_unrolled, y_train_4d)):
+            # Split data and target
+            X_tr = X_train_4d_unrolled[tr_index]
+            y_tr = y_train_4d[tr_index]
+            X_te = X_train_4d_unrolled[te_index]
+            y_te = y_train_4d[te_index]
+            model = LogisticRegression(random_state=0, solver='liblinear', multi_class='ovr')
+            _ = model.fit(X_tr, y_tr)
+            S_test_temp[:, fold_counter] = model.predict(X_test_4d_unrolled)
+        S_test_1 = st.mode(S_test_temp, axis = 1)[0]
 
+        model = LogisticRegression(random_state=0, solver='liblinear', multi_class='ovr')
+        S_train_1 = cross_val_predict(model, X_train_4d_unrolled, y = y_train_4d, cv = n_folds,
+            n_jobs = 1, verbose = 0, method = 'predict').reshape(-1, 1)
+
+        models = [LogisticRegressionUnrolled(random_state=0, solver='liblinear', multi_class='ovr')]
+        S_train_2, S_test_2 = stacking(models, X_train_4d, y_train_4d, X_test_4d,
+            regression = False, n_folds = n_folds, shuffle = False, save_dir=temp_dir,
+            mode = 'oof_pred_bag', random_state = 0, verbose = 0, stratified = True)
+
+        # Load OOF from file
+        # Normally if cleaning is performed there is only one .npy file at given moment
+        # But if we have no cleaning there may be more than one file so we take the latest
+        file_name = sorted(glob.glob(os.path.join(temp_dir, '*.npy')))[-1] # take the latest file
+        S = np.load(file_name)
+        S_train_3 = S[0]
+        S_test_3 = S[1]
+
+        assert_array_equal(S_train_1, S_train_2)
+        assert_array_equal(S_test_1, S_test_2)
+
+        assert_array_equal(S_train_1, S_train_3)
+        assert_array_equal(S_test_1, S_test_3)
+
 #-------------------------------------------------------------------------------
 #-------------------------------------------------------------------------------
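
In `test_N_dim_input` the reference test-set prediction is built by collecting one prediction per fold in `S_test_temp` and taking the row-wise mode. The sketch below shows the same majority-vote idea in plain Python, without SciPy. The name `bag_by_mode` is hypothetical, and breaking ties toward the smallest label is an assumption made here for determinism; it may not match `st.mode` in every SciPy version.

```python
from collections import Counter


def bag_by_mode(fold_predictions):
    """Majority vote over per-fold class predictions.

    fold_predictions holds one row per test sample; each row lists the
    label predicted by every fold's model. Ties go to the smallest label
    (an assumption chosen here to keep the result deterministic).
    """
    bagged = []
    for row in fold_predictions:
        counts = Counter(row)
        top = max(counts.values())
        bagged.append(min(label for label, n in counts.items() if n == top))
    return bagged


if __name__ == "__main__":
    # Three test samples, three folds each.
    print(bag_by_mode([[0, 1, 1], [2, 2, 0], [1, 0, 2]]))  # [1, 2, 0]
```

For multiclass targets an all-different row such as `[1, 0, 2]` is possible, which is why an explicit tie rule matters more here than in the binary case.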
