Addressing Random Seed Sensitivity in Feature Selection: A Survey of Methods and Recent Advances (2025–2026)
Stability Selection with Python Code
Overview
Stability Selection (Meinshausen & Bühlmann, 2010) is a general technique that improves the reliability of feature selection by combining subsampling with any base selection algorithm (e.g., Lasso, Random Forest).
Core Idea
- Draw B random subsamples (e.g., B = 100), each typically containing half of the observations drawn without replacement.
- Apply a base feature selector on each subsample.
- For each feature, compute its selection probability = (# times selected) / B.
- Retain only features whose selection probability is at least a threshold π (e.g., 0.6–0.9).
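The steps above can be written compactly. This follows the notation of Meinshausen & Bühlmann. I_1 to I_B are the subsamples, and Ŝ^λ(I_b) is the set of features the base selector picks on I_b at regularization value λ. Λ is a grid of regularization values. With a single fixed λ, the maximum drops out, and Π̂_k is simply the fraction of subsamples in which feature k is selected.

```latex
\hat{\Pi}_k^{\lambda} = \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}\left\{ k \in \hat{S}^{\lambda}(I_b) \right\},
\qquad
\hat{S}^{\mathrm{stable}} = \left\{ k : \max_{\lambda \in \Lambda} \hat{\Pi}_k^{\lambda} \ge \pi \right\}
```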
This filters out features that appear important only because of random fluctuations in the particular subsample drawn. The result is a stable, reproducible feature set, together with a theoretical bound on the expected number of falsely selected features.
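The guarantee reduces to simple arithmetic. Meinshausen & Bühlmann (2010) show that the expected number of falsely selected features V satisfies E[V] ≤ q² / ((2π − 1) p) for π in (0.5, 1]. Here q is the expected number of features the base selector picks on one subsample, and p is the total number of features. The bound assumes that the noise features are exchangeable and that the base selector does no worse than random guessing.

The sketch below evaluates the bound and solves it for the smallest threshold that meets a target E[V]. Both function names are hypothetical helpers, not part of any library. In practice q is either fixed by design, with the selector keeping exactly q features, or estimated from the subsample runs.

```python
def expected_false_selections_bound(q: float, p: int, pi: float) -> float:
    """Upper bound on E[V] (Meinshausen & Bühlmann, 2010); valid only for 0.5 < pi <= 1."""
    if not 0.5 < pi <= 1.0:
        raise ValueError("the bound requires 0.5 < pi <= 1")
    if not 0 <= q <= p:
        raise ValueError("q must lie between 0 and p")
    return q ** 2 / ((2 * pi - 1) * p)


def min_threshold(q: float, p: int, max_false: float) -> float:
    """Smallest pi for which the bound guarantees E[V] <= max_false.

    Rearranging q^2 / ((2*pi - 1) * p) <= max_false gives
    pi >= (1 + q^2 / (p * max_false)) / 2.
    A value above 1 means no threshold can meet the target for this q.
    """
    return 0.5 * (1.0 + q ** 2 / (p * max_false))


# Example: 50 features, about 10 selected per subsample.
print(expected_false_selections_bound(q=10, p=50, pi=0.9))  # 100 / (0.8 * 50) = 2.5
print(min_threshold(q=10, p=50, max_false=2.5))             # 0.5 * (1 + 100 / 125) = 0.9
```

The quadratic dependence on q is why the bound is only useful when the base selector is kept sparse on each subsample.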
Python Example (Lasso-based)
```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# 1. Generate synthetic data: 200 samples, 50 features, only 10 informative
X, y, true_coef = make_regression(
    n_samples=200, n_features=50, n_informative=10,
    noise=5.0, coef=True, random_state=0
)
X = StandardScaler().fit_transform(X)

# 2. Stability Selection parameters
n_bootstrap = 100      # number of subsamples (B); drawn without replacement, not bootstrap resamples
subsample_ratio = 0.5  # fraction of data per subsample
alpha = 1.0            # Lasso regularization strength, chosen for this data's scale (noise sd = 5);
                       # a tiny value such as 0.05 is close to least squares and keeps most noise features
threshold = 0.7        # selection probability cutoff (π)

n_samples, n_features = X.shape
selection_counts = np.zeros(n_features)

# 3. Repeated subsampling + Lasso selection
# (Simplification: one fixed alpha; Meinshausen & Bühlmann take the maximum
#  selection probability over a grid of regularization values.)
rng = np.random.default_rng(42)
sub_size = int(n_samples * subsample_ratio)
for _ in range(n_bootstrap):
    idx = rng.choice(n_samples, size=sub_size, replace=False)
    lasso = Lasso(alpha=alpha, max_iter=10000).fit(X[idx], y[idx])
    selection_counts += (lasso.coef_ != 0).astype(int)

# 4. Compute selection probabilities and apply threshold
selection_prob = selection_counts / n_bootstrap
stable_features = np.where(selection_prob >= threshold)[0]

# 5. Report
print(f"Stable features (prob ≥ {threshold}): {stable_features}")
print(f"Selection probabilities: {np.round(selection_prob, 2)}")
print(f"True informative features: {np.where(true_coef != 0)[0]}")
```

Key Parameters
| Parameter | Typical Value | Role |
|---|---|---|
| `n_bootstrap` (B) | 100 | Number of subsamples; more subsamples → less Monte Carlo noise in the probability estimates |
| `subsample_ratio` | 0.5 | Fraction of rows per subsample; half-sampling without replacement is standard |
| `alpha` | Tuned to the data (e.g., cross-validated) | Regularization strength of the base selector |
| `threshold` (π) | 0.6 – 0.9 | Higher threshold → fewer but more reliable features |
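Stability selection has its own source of randomness: the generator that draws the subsamples. Each estimated probability is an average of B independent 0/1 indicators. Its seed-to-seed standard deviation therefore shrinks in proportion to 1/√B, and rerunning with several seeds is a direct check on whether B is large enough.

The sketch below compares B = 20 with B = 200 on the same synthetic data as the Lasso example. The helper `selection_probabilities`, the seed list, and `alpha=1.0` are illustrative assumptions, not library API or recommended values.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Same synthetic data as the Lasso example.
X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       noise=5.0, random_state=0)
X = StandardScaler().fit_transform(X)


def selection_probabilities(X, y, n_subsamples, seed, alpha=1.0, ratio=0.5):
    """Share of half-subsamples (without replacement) in which Lasso keeps each feature.

    alpha=1.0 is an assumed value suited to this data's scale.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    size = int(n * ratio)
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=size, replace=False)
        coef = Lasso(alpha=alpha, max_iter=10000).fit(X[idx], y[idx]).coef_
        counts += (coef != 0).astype(float)
    return counts / n_subsamples


seeds = [0, 1, 2, 3, 4]  # illustrative subsampling seeds
threshold = 0.7
spread, unstable = {}, {}
for B in (20, 200):
    probs = np.array([selection_probabilities(X, y, B, s) for s in seeds])  # seeds x features
    spread[B] = probs.std(axis=0).mean()  # across-seed std of each feature, averaged over features
    kept = probs >= threshold
    # Features that are in the stable set for some seeds but not for others.
    unstable[B] = int((kept.any(axis=0) & ~kept.all(axis=0)).sum())
    print(f"B={B:>3}: mean std = {spread[B]:.3f}, "
          f"features whose membership depends on the seed = {unstable[B]}")
```

If features still change membership across seeds at the chosen B, either increase B or treat features near the threshold as undecided.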
Practical Notes
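- Swap `Lasso` for any selector that returns a subset of features, e.g., the top-k features by Random Forest importance, Boruta, or the top-k features by mutual information.
- For explicit error control, use the Meinshausen & Bühlmann bound E[V] ≤ q_Λ² / ((2π − 1) p). Here V is the number of falsely selected features, q_Λ is the expected number of features selected on one subsample across the regularization range Λ, and p is the total number of features. The bound requires π > 0.5, and it assumes that noise features are exchangeable and that the base selector does no worse than random guessing.
- The `stability-selection` package provides a scikit-learn compatible implementation.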
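The procedure only needs a function that returns a subset of features, so the selector can be passed in as a parameter. The sketch below assumes a hypothetical helper `stability_selection` that accepts any callable mapping a subsample to a boolean feature mask. It plugs in a hypothetical Random Forest selector that keeps the top-k features by impurity-based importance. The values k = 10, 50 trees, and 30 subsamples are illustrative assumptions. This is not the interface of the `stability-selection` package.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def stability_selection(X, y, selector, n_subsamples=50, ratio=0.5, threshold=0.7, seed=0):
    """Generic stability selection; `selector(X_sub, y_sub)` must return a boolean mask over features."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    size = int(n * ratio)
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=size, replace=False)  # half-sample without replacement
        counts += np.asarray(selector(X[idx], y[idx]), dtype=float)
    probs = counts / n_subsamples
    return np.flatnonzero(probs >= threshold), probs


def rf_top_k(k=10, n_estimators=100, random_state=0):
    """Hypothetical selector factory: keep the k features with the largest RF importances."""
    def select(X_sub, y_sub):
        # Fixed forest seed, so variation across subsamples comes only from the rows drawn.
        forest = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
        forest.fit(X_sub, y_sub)
        mask = np.zeros(X_sub.shape[1], dtype=bool)
        mask[np.argsort(forest.feature_importances_)[-k:]] = True
        return mask
    return select


X, y, true_coef = make_regression(n_samples=200, n_features=50, n_informative=10,
                                  noise=5.0, coef=True, random_state=0)
stable, probs = stability_selection(X, y, rf_top_k(k=10, n_estimators=50), n_subsamples=30)
print("Stable features:", stable)
print("True informative features:", np.flatnonzero(true_coef))
```

Because this selector keeps exactly k features per subsample, q = k is known by design and can be plugged directly into the error bound above.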
