Commit bcbca76

Global maintenance 2025 turn 2
1 parent 3fe47e8 commit bcbca76

4 files changed: +55 -11 lines changed

README.md

Lines changed: 9 additions & 9 deletions
@@ -1,6 +1,6 @@
 [![PyPI version](https://img.shields.io/pypi/v/vecstack.svg?colorB=4cc61e)](https://pypi.python.org/pypi/vecstack)
 [![PyPI license](https://img.shields.io/pypi/l/vecstack.svg)](https://github.com/vecxoz/vecstack/blob/master/LICENSE.txt)
-[![Build Status](https://travis-ci.org/vecxoz/vecstack.svg?branch=master)](https://travis-ci.org/vecxoz/vecstack)
+[![Build status](https://github.com/vecxoz/vecstack/actions/workflows/actions.yaml/badge.svg?branch=master)](https://github.com/vecxoz/vecstack/actions)
 [![Coverage Status](https://coveralls.io/repos/github/vecxoz/vecstack/badge.svg?branch=master)](https://coveralls.io/github/vecxoz/vecstack?branch=master)
 [![PyPI pyversions](https://img.shields.io/pypi/pyversions/vecstack.svg)](https://pypi.python.org/pypi/vecstack/)
 
@@ -137,7 +137,7 @@ S_test = stack.transform(X_test)
 28. [Can I use `(Randomized)GridSearchCV` to tune the whole stacking Pipeline?](https://github.com/vecxoz/vecstack#28-can-i-use-randomizedgridsearchcv-to-tune-the-whole-stacking-pipeline)
 29. [How to define custom metric, especially AUC?](https://github.com/vecxoz/vecstack#29-how-to-define-custom-metric-especially-auc)
 30. [Do folds (splits) have to be the same across estimators and stacking levels? How does `random_state` work?](https://github.com/vecxoz/vecstack#30-do-folds-splits-have-to-be-the-same-across-estimators-and-stacking-levels-how-does-random_state-work)
-31. [How does `vecstack.StackingTransformer` differ from `sklearn.ensemble.StackingClassifier`?](https://github.com/vecxoz/vecstack#31)
+31. [How does `vecstack.StackingTransformer` differ from `sklearn.ensemble.StackingClassifier`?](https://github.com/vecxoz/vecstack#31-how-does-vecstackstackingtransformer-differ-from-sklearnensemblestackingclassifier)
 
 ### 1. How can I report an issue? How can I ask a question about stacking or vecstack package?
 
@@ -410,13 +410,6 @@ It significantly differs. Please see a [detailed explanation](https://github.com
 9. You can also look at animation of [Variant A](https://github.com/vecxoz/vecstack#variant-a-animation) and [Variant B](https://github.com/vecxoz/vecstack#variant-b-animation).
 
 
-# References
-
-* [Ensemble Learning](https://en.wikipedia.org/wiki/Ensemble_learning) ([Stacking](https://en.wikipedia.org/wiki/Ensemble_learning#Stacking)) in Wikipedia
-* Classical [Kaggle Ensembling Guide](https://mlwave.com/kaggle-ensembling-guide/) or try [another link](https://web.archive.org/web/20210727094233/https://mlwave.com/kaggle-ensembling-guide/)
-* [Stacked Generalization](https://www.researchgate.net/publication/222467943_Stacked_Generalization) paper by David H. Wolpert
-
-
 # Variant A
 
 ![Fold 1 of 3](https://github.com/vecxoz/vecstack/raw/master/pic/dia1.png "Fold 1 of 3")
@@ -442,3 +435,10 @@ It significantly differs. Please see a [detailed explanation](https://github.com
 # Variant B. Animation
 
 ![Variant B. Animation](https://github.com/vecxoz/vecstack/raw/master/pic/animation2.gif "Variant B. Animation")
+
+
+# References
+
+* [Ensemble Learning](https://en.wikipedia.org/wiki/Ensemble_learning) ([Stacking](https://en.wikipedia.org/wiki/Ensemble_learning#Stacking)) in Wikipedia
+* Classical [Kaggle Ensembling Guide](https://mlwave.com/kaggle-ensembling-guide/) or try [another link](https://web.archive.org/web/20210727094233/https://mlwave.com/kaggle-ensembling-guide/)
+* [Stacked Generalization](https://www.researchgate.net/publication/222467943_Stacked_Generalization) paper by David H. Wolpert

setup.py

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@
 long_desc = '''
 Python package for stacking (stacked generalization) featuring lightweight functional API and fully compatible scikit-learn API.
 Convenient way to automate OOF computation, prediction and bagging using any number of models.
+All details, FAQ, and tutorials: https://github.com/vecxoz/vecstack
 '''
 
 setup(name='vecstack',

tests/test_sklearn_api_regression.py

Lines changed: 42 additions & 0 deletions
@@ -2042,6 +2042,48 @@ def test_compare_with_stackingregressor_from_sklearn(self):
         y_pred_rf = rf.fit(X_train, y_train).predict(X_train)
         assert_array_equal(S_train_sklearn, np.hstack([y_pred_et.reshape(-1, 1), y_pred_rf.reshape(-1, 1)]))
 
+    # -------------------------------------------------------------------------
+    # Added 20250924
+    # Explicitly check that `validate_data` checks number of features
+    # -------------------------------------------------------------------------
+
+    def test_inconsistent_shape_passed_to_transform(self):
+        """
+        When transforming non-training set there was a check:
+        ```
+        if X.shape[1] != self.n_features_:
+            raise ValueError('Inconsistent number of features.')
+        ```
+        It was needed because I used `check_array` function to validate data
+        and probably number of features was not checked.
+
+        Now I check data with `validate_data` which checks `self.n_features_in_`.
+        So my manual check can never happen and coverage dropped.
+        So I removed my manual check and created this test case to confirm explicitly that `validate_data` works.
+
+        In version 0.4.0 there was no specific test for this case,
+        probably because it was included in `check_estimator`.
+        """
+        estimators = [
+            ('lr', LinearRegression()),
+            ('ridge', Ridge())]
+
+        stack = StackingTransformer(estimators=estimators,
+                                    regression=True,
+                                    variant='B',
+                                    n_folds=5,
+                                    shuffle=False)
+
+        stack = stack.fit(X_train, y_train)
+        S_train = stack.transform(X_train) # OK
+        S_test = stack.transform(X_test) # OK
+
+        # Transform train set with different number of features - in fact it is identified as non-train set because shape is different
+        assert_raises(ValueError, stack.transform, X_train[:, 1:])
+
+        # Transform test set with different number of features
+        assert_raises(ValueError, stack.transform, X_test[:, :-1])
+
 # -----------------------------------------------------------------------------
 # -----------------------------------------------------------------------------
 

vecstack/coresk.py

Lines changed: 3 additions & 2 deletions
@@ -755,9 +755,10 @@ def transform(self, X, is_train_set=None):
         # Transform any other set
         # *********************************************************************
         else:
+            # Legacy check included in `validate_data`
             # Check n_features
-            if X.shape[1] != self.n_features_:
-                raise ValueError('Inconsistent number of features.')
+            # if X.shape[1] != self.n_features_:
+            # raise ValueError('Inconsistent number of features.')
 
             # Create empty numpy array for test predictions
             S_test = np.zeros((X.shape[0], self.n_estimators_ * self.n_classes_implicit_))
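
Note on the removed check: the sketch below illustrates, under stated assumptions, why the manual `n_features_` comparison becomes unreachable once input is validated with scikit-learn's `validate_data`. It assumes `validate_data` is importable from `sklearn.utils.validation` (scikit-learn 1.6 and later) and falls back to the private `BaseEstimator._validate_data` method on older releases. `MeanCenterer` and `_validate` are hypothetical names used only for illustration; they are not part of vecstack, and this is not the code in `coresk.py`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

try:
    # Public helper in scikit-learn >= 1.6
    from sklearn.utils.validation import validate_data
except ImportError:
    validate_data = None


def _validate(estimator, X, reset):
    """Hypothetical wrapper: reset=True records n_features_in_, reset=False checks it."""
    if validate_data is not None:
        return validate_data(estimator, X=X, reset=reset)
    # Fallback for older scikit-learn releases (private method)
    return estimator._validate_data(X, reset=reset)


class MeanCenterer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer, for illustration only."""

    def fit(self, X, y=None):
        # Stores self.n_features_in_ as a side effect of validation
        X = _validate(self, X, reset=True)
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X):
        # Raises ValueError if X.shape[1] differs from self.n_features_in_
        X = _validate(self, X, reset=False)
        return X - self.mean_


X = np.arange(12, dtype=float).reshape(4, 3)
centerer = MeanCenterer().fit(X)
centered = centerer.transform(X)  # OK: same number of features as in fit

try:
    centerer.transform(X[:, 1:])  # one feature fewer than in fit
    raised = False
except ValueError:
    raised = True
```

Because validation with `reset=False` already rejects the mismatched array, any later comparison of `X.shape[1]` against a stored feature count never fires, which matches the coverage drop described in the new test's docstring.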
