vecxoz
diff --git a/.travis.yml b/.travis.yml
Lines changed: 6 additions & 1 deletion
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 53 additions & 0 deletions
diff --git a/LICENSE.txt b/LICENSE.txt
Lines changed: 1 addition & 1 deletion
diff --git a/PY2.md b/PY2.md
Lines changed: 22 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 8 additions & 7 deletions
diff --git a/examples/04_sklearn_api_regression_pipeline.ipynb b/examples/04_sklearn_api_regression_pipeline.ipynb
Lines changed: 2 additions & 2 deletions
diff --git a/setup.py b/setup.py
Lines changed: 8 additions & 5 deletions
diff --git a/tests/test_func_api_classification_binary.py b/tests/test_func_api_classification_binary.py
Lines changed: 68 additions & 0 deletions
diff --git a/tests/test_func_api_classification_multiclass.py b/tests/test_func_api_classification_multiclass.py
Lines changed: 69 additions & 0 deletions
@@ -2,10 +2,14 @@
 # check if .travis.yml is valid: http://lint.travis-ci.org/
 # to skip build for given commit put [ci skip] or [skip ci] in commit message
 
+# required for Python >= 3.7
+dist: xenial
+
 language: python
 
-# versions supported by scikit-learn
+# versions supported by scikit-learn and some additional versions
 python:
+  - "3.7"
   - "3.6"
   - "3.5"
   - "3.4"
@@ -16,6 +20,7 @@ branches:
   only:
   - master
   - dev
+  - py2
 
 install:
   - pip install numpy
 
@@ -0,0 +1,53 @@
+# Changelog
+
+### v0.4.0 -- August 12, 2019
+
+Since v0.4.0, vecstack officially supports only Python 3.5 and higher,  
+but unofficial support for Python 2.7 and Python 3.4 is still available.  
+Please see [details](https://github.com/vecxoz/vecstack/blob/master/PY2.md).
+
+Scikit-learn API:
+* Fixed #31. `sklearn.externals.six` deprecation
+* Fixed #29. Out-of-memory in `np.random.choice` for very large ranges
+
+Functional API:
+* Feature #18. Added support for N-dimensional input. Useful for convolutional nets.
+* Added aliases for `mode` parameter values which correspond to respective `variant` parameter values of `StackingTransformer`:
+  * 'oof_pred_bag' == 'A'
+  * 'oof_pred' == 'B'
+
+### v0.3.0 -- April 6, 2018
+
+Introducing Scikit-learn API: `StackingTransformer`
+
+* Standard transformer class with `fit` and `transform` methods
+* Compatible with `Pipeline` and `FeatureUnion`
+
+### v0.2.2 -- February 23, 2018
+
+* Fixed #5. Wrong behavior during sparse matrix processing
+* Improved input data validation
+* Improved sparse matrix processing
+
+### v0.2.1 -- January 24, 2018 -- Maintenance release
+
+* Minor modifications
+
+### v0.2 -- January 23, 2018
+
+New features:
+
+* Classification with probabilities
+* Modes: compute only what you need (only OOF, only predictions, both, etc.)
+* Save resulting arrays and log with model parameters
+
+### v0.1 -- November 22, 2016 -- Initial release
+
+Features:
+
+* Functional stacking API
+* Regression
+* Classification with class labels
+* Ordinary and stratified k-fold split
+* User-defined metric
+* User-defined transformations for target and prediction
@@ -1,7 +1,7 @@
 MIT License
 
 Vecstack. Python package for stacking (machine learning technique)
-Copyright (c) 2016-2018 Igor Ivanov
+Copyright (c) 2016-2019 Igor Ivanov
 Email: [email protected]
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 
@@ -0,0 +1,22 @@
+### Python 3.x
+
+Since v0.4.0, vecstack officially supports only Python 3.5 and higher,  
+but unofficial support for Python 2.7 and Python 3.4 is still available. See details below.  
+
+The reason for these changes is the general move of the Python ecosystem toward Python 3.x.  
+Vecstack depends on scikit-learn, which has already dropped support for Python < 3.5.  
+Scikit-learn v0.20.x is the last version supporting Python 2.7 and Python 3.4.  
+Vecstack follows this direction as well.  
+Please see [python3statement.org](https://python3statement.org/) for more details.  
+
+### Unofficial support for Python 2.7 and Python 3.4
+
+You can still install and run the latest vecstack on Python 2.7 and Python 3.4.  
+NOTE: This requires legacy versions of the following packages:  
+* numpy<1.17
+* scipy<1.3
+* scikit-learn>=0.18,<0.21
+There is a dedicated branch on GitHub called `py2` with appropriate requirements in `setup.py`.  
+Installation:  
+
+`pip install https://github.com/vecxoz/vecstack/archive/py2.zip`
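The version bounds listed in PY2.md can be checked with a few lines of plain Python. This is only a sketch: `LEGACY_BOUNDS`, `parse`, and `is_legacy_compatible` are hypothetical names introduced for illustration, and the parser assumes plain dotted numeric versions (no pre-release or local suffixes). Real installations should rely on `pip` to resolve these constraints.

```python
# Bounds copied from the PY2.md list; None means "no lower bound".
LEGACY_BOUNDS = {
    "numpy": (None, "1.17"),
    "scipy": (None, "1.3"),
    "scikit-learn": ("0.18", "0.21"),
}


def parse(version):
    """Turn a plain dotted version such as '1.16.6' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_legacy_compatible(package, version):
    """Return True if lower <= version < upper for the given package."""
    lower, upper = LEGACY_BOUNDS[package]
    parsed = parse(version)
    if lower is not None and parsed < parse(lower):
        return False
    return parsed < parse(upper)


print(is_legacy_compatible("numpy", "1.16.6"))         # True
print(is_legacy_compatible("scikit-learn", "0.21.0"))  # False
```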
@@ -22,10 +22,10 @@ Convenient way to automate OOF computation, prediction and bagging using any num
     * Predict [class labels or probabilities](https://github.com/vecxoz/vecstack/blob/master/vecstack/coresk.py#L119) in classification task
     * Apply any [user-defined metric](https://github.com/vecxoz/vecstack/blob/master/vecstack/coresk.py#L124)
     * Apply any [user-defined transformations](https://github.com/vecxoz/vecstack/blob/master/vecstack/coresk.py#L87) for target and prediction
-    * Python 2, Python 3
+    * Python 3.5 and higher, [unofficial support for Python 2.7 and 3.4](https://github.com/vecxoz/vecstack/blob/master/PY2.md)
     * Win, Linux, Mac
     * [MIT license](https://github.com/vecxoz/vecstack/blob/master/LICENSE.txt)
-    * Depends on **numpy**, **scipy**, **scikit-learn>=18.0**
+    * Depends on **numpy**, **scipy**, **scikit-learn>=0.18**
 
 # Get started
 * [FAQ](https://github.com/vecxoz/vecstack#stacking-faq)
@@ -292,14 +292,15 @@ Stacking API comparison:
 | Estimator implementation restrictions | Must have only `fit` and `predict` (`predict_proba`) methods | Must be fully scikit-learn compatible |
 | `NaN` and `inf` in input data | Allowed | Not allowed |
 | Can automatically save OOF and log in files | Yes | No |
+| Input dimensionality (`X_train`, `X_test`) | Arbitrary | 2-D |
 
 ### 21. How do parameters of `stacking` function and `StackingTransformer` correspond?
 
-| **stacking function**   | **StackingTransformer**           |
-|-------------------------|-----------------------------------|
-| `models=[Ridge()]`      | `estimators=[('ridge', Ridge())]` |
-| `mode='oof_pred_bag'`   | `variant='A'`                     |
-| `mode='oof_pred'`       | `variant='B'`                     |
+| **stacking function**                 | **StackingTransformer**           |
+|---------------------------------------|-----------------------------------|
+| `models=[Ridge()]`                    | `estimators=[('ridge', Ridge())]` |
+| `mode='oof_pred_bag'` (alias `'A'`)   | `variant='A'`                     |
+| `mode='oof_pred'` (alias `'B'`)       | `variant='B'`                     |
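
The correspondence in this table can be sketched as a small lookup. This is a hypothetical illustration only: `MODE_ALIASES` and `normalize_mode` are names invented here and are not taken from vecstack's code. The sketch assumes nothing beyond what the table states, namely that `'A'` and `'B'` are interchangeable with the long `mode` names.

```python
# Hypothetical mapping from the short aliases to the long mode names.
MODE_ALIASES = {
    "A": "oof_pred_bag",
    "B": "oof_pred",
}


def normalize_mode(mode):
    """Return the long mode name; values that are not aliases pass through."""
    return MODE_ALIASES.get(mode, mode)


print(normalize_mode("A"))         # oof_pred_bag
print(normalize_mode("oof_pred"))  # oof_pred
```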
 
 ### 22. Why Scikit-learn API was implemented as transformer and not predictor?
 
 
@@ -512,7 +512,7 @@
    "source": [
     "# 2. Pipeline\n",
     "\n",
-    "StackingTransformer is fully scikit-learn compatible so we can easily implement **arbitrary number of stacking layers** using Pipeline\n"
+    "StackingTransformer is fully scikit-learn compatible so we can easily implement **arbitrary number of stacking levels** using Pipeline\n"
    ]
   },
   {
@@ -535,7 +535,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# If we have several stacking layers our Pipeline steps would be:\n",
+    "# If we have several stacking levels our Pipeline steps would be:\n",
     "# steps = [('stack_L1', stack_L1),\n",
     "#          ('stack_L2', stack_L2),\n",
     "#          ('stack_L99', stack_L99), # :-)\n",
 
@@ -2,23 +2,26 @@
 
 from setuptools import setup
 
+long_desc = '''
+Python package for stacking (stacked generalization) featuring lightweight functional API and fully compatible scikit-learn API.
+Convenient way to automate OOF computation, prediction and bagging using any number of models.
+'''
+
 setup(name='vecstack',
-      version='0.3.0',
+      version='0.4.0',
       description='Python package for stacking (machine learning technique)',
-      long_description='Convenient way to automate OOF computation, prediction and bagging using any number of models',
+      long_description=long_desc,
       classifiers=[
           'License :: OSI Approved :: MIT License',
           'Operating System :: MacOS',
           'Operating System :: Microsoft :: Windows',
           'Operating System :: POSIX',
           'Operating System :: Unix',
           'Programming Language :: Python',
-          'Programming Language :: Python :: 2',
-          'Programming Language :: Python :: 2.7',
           'Programming Language :: Python :: 3',
-          'Programming Language :: Python :: 3.4',
           'Programming Language :: Python :: 3.5',
           'Programming Language :: Python :: 3.6',
+          'Programming Language :: Python :: 3.7',
           'Topic :: Scientific/Engineering',
           'Topic :: Scientific/Engineering :: Artificial Intelligence',
           'Topic :: Scientific/Engineering :: Information Analysis',
 
@@ -63,6 +63,33 @@
 y_test = y[ind_test]
 
 
+# Create 4-dim data
+np.random.seed(42)
+X_train_4d = np.random.normal(size=(400, 8, 8, 3))
+X_test_4d = np.random.normal(size=(100, 8, 8, 3))
+y_train_4d = np.random.randint(n_classes, size=400)
+
+# Reshape 4-dim to 2-dim
+X_train_4d_unrolled = X_train_4d.reshape(X_train_4d.shape[0], -1)
+X_test_4d_unrolled = X_test_4d.reshape(X_test_4d.shape[0], -1)
+
+#------------------------------------------------------------------------------
+#------------------------------------------------------------------------------
+
+class LogisticRegressionUnrolled(LogisticRegression):
+    """
+    For tests related to N-dim input.
+    Estimator accepts an N-dim array and reshapes it into a 2-dim array
+    """
+    def fit(self, X, y):
+        return super(LogisticRegressionUnrolled, self).fit(X.reshape(X.shape[0], -1), y)
+
+    def predict(self, X):
+        return super(LogisticRegressionUnrolled, self).predict(X.reshape(X.shape[0], -1))
+
+    def predict_proba(self, X):
+        return super(LogisticRegressionUnrolled, self).predict_proba(X.reshape(X.shape[0], -1))
+
 #-------------------------------------------------------------------------------
 #-------------------------------------------------------------------------------
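
The reshape used by `LogisticRegressionUnrolled`, `X.reshape(X.shape[0], -1)`, keeps the first (sample) axis and flattens all remaining axes into a single feature axis. A minimal plain-Python sketch of the resulting shapes follows; `unrolled_shape` is a hypothetical helper introduced only for this illustration.

```python
from math import prod


def unrolled_shape(shape):
    """Return the 2-D shape produced by flattening every axis after the first."""
    n_samples, *rest = shape
    # prod of an empty sequence is 1, so a 1-D input becomes (n_samples, 1).
    return (n_samples, prod(rest))


train_shape = unrolled_shape((400, 8, 8, 3))
test_shape = unrolled_shape((100, 8, 8, 3))
print(train_shape)  # (400, 192)
print(test_shape)   # (100, 192)
```

With the test data above, each 8 x 8 x 3 sample therefore becomes a row of 192 features.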
 
@@ -775,7 +802,48 @@ def test_oof_pred_bag_mode_proba_2_models(self):
 
         assert_array_equal(S_train_1, S_train_3)
         assert_array_equal(S_test_1, S_test_3)
+
+    def test_N_dim_input(self):
+        """
+        This is `test_oof_pred_bag_mode` function with `LogisticRegressionUnrolled` estimator
+        """
+        S_test_temp = np.zeros((X_test_4d_unrolled.shape[0], n_folds))
+        # Using StratifiedKFold because by default cross_val_predict uses StratifiedKFold
+        kf = StratifiedKFold(n_splits = n_folds, shuffle = False, random_state = 0)
+        for fold_counter, (tr_index, te_index) in enumerate(kf.split(X_train_4d_unrolled, y_train_4d)):
+            # Split data and target
+            X_tr = X_train_4d_unrolled[tr_index]
+            y_tr = y_train_4d[tr_index]
+            X_te = X_train_4d_unrolled[te_index]
+            y_te = y_train_4d[te_index]
+            model = LogisticRegression(random_state=0, solver='liblinear', multi_class='ovr')
+            _ = model.fit(X_tr, y_tr)
+            S_test_temp[:, fold_counter] = model.predict(X_test_4d_unrolled)
+        S_test_1 = st.mode(S_test_temp, axis = 1)[0]
 
+        model = LogisticRegression(random_state=0, solver='liblinear', multi_class='ovr')
+        S_train_1 = cross_val_predict(model, X_train_4d_unrolled, y = y_train_4d, cv = n_folds,
+            n_jobs = 1, verbose = 0, method = 'predict').reshape(-1, 1)
+
+        models = [LogisticRegressionUnrolled(random_state=0, solver='liblinear', multi_class='ovr')]
+        S_train_2, S_test_2 = stacking(models, X_train_4d, y_train_4d, X_test_4d,
+            regression = False, n_folds = n_folds, shuffle = False, save_dir=temp_dir,
+            mode = 'oof_pred_bag', random_state = 0, verbose = 0, stratified = True)
+
+        # Load OOF from file
+        # Normally if cleaning is performed there is only one .npy file at given moment
+        # But if we have no cleaning there may be more than one file so we take the latest
+        file_name = sorted(glob.glob(os.path.join(temp_dir, '*.npy')))[-1] # take the latest file
+        S = np.load(file_name)
+        S_train_3 = S[0]
+        S_test_3 = S[1]
+
+        assert_array_equal(S_train_1, S_train_2)
+        assert_array_equal(S_test_1, S_test_2)
+
+        assert_array_equal(S_train_1, S_train_3)
+        assert_array_equal(S_test_1, S_test_3)
+
 #-------------------------------------------------------------------------------
 #-------------------------------------------------------------------------------
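
In `test_N_dim_input`, the per-fold test predictions are combined row by row with `st.mode`, which amounts to a majority vote over folds. A plain-Python sketch of such a vote follows. `majority_vote` is a hypothetical helper, and breaking ties toward the smallest label is an assumption of this sketch rather than something the diff states.

```python
from collections import Counter


def majority_vote(fold_predictions):
    """Pick the most frequent label in each row (one label per fold)."""
    voted = []
    for row in fold_predictions:
        counts = Counter(row)
        top = max(counts.values())
        # Assumption: ties are broken by taking the smallest label.
        voted.append(min(label for label, n in counts.items() if n == top))
    return voted


# Three test samples, three folds each.
fold_predictions = [
    [0, 1, 1],
    [2, 2, 0],
    [1, 0, 2],
]
final = majority_vote(fold_predictions)
print(final)  # [1, 2, 0]
```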
 
 
@@ -60,6 +60,33 @@
 y_test = y[ind_test]
 
 
+# Create 4-dim data
+np.random.seed(42)
+X_train_4d = np.random.normal(size=(400, 8, 8, 3))
+X_test_4d = np.random.normal(size=(100, 8, 8, 3))
+y_train_4d = np.random.randint(n_classes, size=400)
+
+# Reshape 4-dim to 2-dim
+X_train_4d_unrolled = X_train_4d.reshape(X_train_4d.shape[0], -1)
+X_test_4d_unrolled = X_test_4d.reshape(X_test_4d.shape[0], -1)
+
+#------------------------------------------------------------------------------
+#------------------------------------------------------------------------------
+
+class LogisticRegressionUnrolled(LogisticRegression):
+    """
+    For tests related to N-dim input.
+    Estimator accepts an N-dim array and reshapes it into a 2-dim array
+    """
+    def fit(self, X, y):
+        return super(LogisticRegressionUnrolled, self).fit(X.reshape(X.shape[0], -1), y)
+
+    def predict(self, X):
+        return super(LogisticRegressionUnrolled, self).predict(X.reshape(X.shape[0], -1))
+
+    def predict_proba(self, X):
+        return super(LogisticRegressionUnrolled, self).predict_proba(X.reshape(X.shape[0], -1))
+
 #-------------------------------------------------------------------------------
 #-------------------------------------------------------------------------------
 
@@ -772,7 +799,49 @@ def test_oof_pred_bag_mode_proba_2_models(self):
 
         assert_array_equal(S_train_1, S_train_3)
         assert_array_equal(S_test_1, S_test_3)
+
+
+    def test_N_dim_input(self):
+        """
+        This is `test_oof_pred_bag_mode` function with `LogisticRegressionUnrolled` estimator
+        """
+        S_test_temp = np.zeros((X_test_4d_unrolled.shape[0], n_folds))
+        # Using StratifiedKFold because by default cross_val_predict uses StratifiedKFold
+        kf = StratifiedKFold(n_splits = n_folds, shuffle = False, random_state = 0)
+        for fold_counter, (tr_index, te_index) in enumerate(kf.split(X_train_4d_unrolled, y_train_4d)):
+            # Split data and target
+            X_tr = X_train_4d_unrolled[tr_index]
+            y_tr = y_train_4d[tr_index]
+            X_te = X_train_4d_unrolled[te_index]
+            y_te = y_train_4d[te_index]
+            model = LogisticRegression(random_state=0, solver='liblinear', multi_class='ovr')
+            _ = model.fit(X_tr, y_tr)
+            S_test_temp[:, fold_counter] = model.predict(X_test_4d_unrolled)
+        S_test_1 = st.mode(S_test_temp, axis = 1)[0]
 
+        model = LogisticRegression(random_state=0, solver='liblinear', multi_class='ovr')
+        S_train_1 = cross_val_predict(model, X_train_4d_unrolled, y = y_train_4d, cv = n_folds,
+            n_jobs = 1, verbose = 0, method = 'predict').reshape(-1, 1)
+
+        models = [LogisticRegressionUnrolled(random_state=0, solver='liblinear', multi_class='ovr')]
+        S_train_2, S_test_2 = stacking(models, X_train_4d, y_train_4d, X_test_4d,
+            regression = False, n_folds = n_folds, shuffle = False, save_dir=temp_dir,
+            mode = 'oof_pred_bag', random_state = 0, verbose = 0, stratified = True)
+
+        # Load OOF from file
+        # Normally if cleaning is performed there is only one .npy file at given moment
+        # But if we have no cleaning there may be more than one file so we take the latest
+        file_name = sorted(glob.glob(os.path.join(temp_dir, '*.npy')))[-1] # take the latest file
+        S = np.load(file_name)
+        S_train_3 = S[0]
+        S_test_3 = S[1]
+
+        assert_array_equal(S_train_1, S_train_2)
+        assert_array_equal(S_test_1, S_test_2)
+
+        assert_array_equal(S_train_1, S_train_3)
+        assert_array_equal(S_test_1, S_test_3)
+
 #-------------------------------------------------------------------------------
 #-------------------------------------------------------------------------------