An Introductory Survey on Polynomial Machine Learning: Taxonomic Axes and Hierarchical Levels

This report surveys Polynomial Machine Learning (PML) at an introductory level. PML refers to the family of techniques that exploit higher-order and interaction terms of input variables to learn nonlinear relationships. The discussion is organized along three taxonomic axes and arranged into six hierarchical levels. Deep mathematical or theoretical analysis is intentionally avoided; the goal is to provide a structured starting point for further study.
Taxonomic Axes and Hierarchical Levels
The taxonomy uses three classification axes as a coordinate system. The six levels are positions along these axes, with two levels per axis.
| Axis | Meaning | Levels |
|---|---|---|
| Axis A: Mathematical Foundation | On which mathematical property (orthogonality, domain) the polynomial is defined | Level 1, Level 2 |
| Axis B: Model Architecture | Into which computational structure (neural network, tensor decomposition) the polynomial is embedded | Level 3, Level 4 |
| Axis C: Application Pattern | How the polynomial is combined with or discovered from other models | Level 5, Level 6 |
Evolution Across Levels (Add / Subtract / Exchange)
The following table shows what is added (+), removed (−), or exchanged (↔) when moving from one level to the next.
| Transition | (+) Added | (−) Removed | (↔) Exchanged |
|---|---|---|---|
| L1 → L2 | Orthogonality constraint, link to probability distributions | Free use of arbitrary monomials | Power basis ↔ Orthogonal polynomial basis |
| L2 → L3 | Learnable weights, hierarchical layers, optional activations | Closed-form coefficient estimation | Single regression equation ↔ Multilayer neural network |
| L3 → L4 | Tensor decomposition, latent vector representation | Per-term explicit weight | Explicit term weight ↔ Latent vector inner product |
| L4 → L5 | Coupling with non-polynomial models (GP, CNN, physics) | Single-model assumption | Polynomial-only model ↔ Polynomial + residual hybrid |
| L5 → L6 | Discovery of the equation itself from data | Pre-fixed model form | Human-prescribed form ↔ Data-discovered form |
Taxonomy Hierarchy
Polynomial Machine Learning (PML)
│
[Axis A: Mathematical Foundation]
├── [Level 1] Classical Polynomial Models
│ ├── 1.1 Polynomial Regression
│ ├── 1.2 Polynomial Kernel Methods
│ └── 1.3 Response Surface Methodology (RSM)
│
├── [Level 2] Orthogonal Polynomial Basis Models
│ ├── Group 2-A: Theory-based Orthogonal Polynomials
│ │ ├── 2.1 Spatial-domain (Zernike / Chebyshev / Legendre / Fourier-Bessel)
│ │ ├── 2.2 Probabilistic-domain — Wiener-Askey (Hermite / Laguerre / Jacobi / Gegenbauer)
│ │ └── 2.3 Discrete-domain (Charlier / Krawtchouk / Meixner / Hahn)
│ ├── Group 2-B: Learning Frameworks Using Orthogonal Polynomials
│ │ ├── 2.4 Polynomial Chaos Expansion (PCE)
│ │ ├── 2.5 Sparse PCE & LARS-PCE
│ │ └── 2.6 Arbitrary PCE (aPCE)
│ └── Group 2-C: Data-driven Orthogonal Bases
│ └── 2.7 Karhunen-Loève (KL) Expansion / Proper Orthogonal Decomposition (POD)
│
[Axis B: Model Architecture]
├── [Level 3] Polynomial Neural Architectures
│ ├── 3.1 Group Method of Data Handling (GMDH)
│ ├── 3.2 Modern Polynomial Neural Networks (PNN)
│ ├── 3.3 Pi-Nets
│ ├── 3.4 Self-Organizing Polynomial NN (SOPNN)
│ └── 3.5 Kolmogorov-Arnold Networks (KAN)
│
├── [Level 4] Tensor & Factorization-based Polynomial Models
│ ├── 4.1 Factorization Machines (FM)
│ ├── 4.2 Higher-Order FM (HOFM)
│ ├── 4.3 Tensor Train / Tensor Regression
│ └── 4.4 Polynomial Tensor Decomposition
│
[Axis C: Application Pattern]
├── [Level 5] Hybrid & Surrogate Modeling
│ ├── 5.1 PCE-Kriging
│ ├── 5.2 Global + Residual Models (Spline / GNN / CNN)
│ ├── 5.3 Physics-Informed Polynomial Models
│ └── 5.4 Multi-fidelity Polynomial Surrogates
│
└── [Level 6] Symbolic & Sparse Polynomial Discovery
├── 6.1 Sparse Identification of Nonlinear Dynamics (SINDy)
├── 6.2 Symbolic Regression with Polynomial Basis
└── 6.3 LASSO / Elastic-Net Polynomial Feature Selection
Level 1. Classical Polynomial Models [Axis A]
The most fundamental layer, where linear models are extended directly using a power basis ($1, x, x^2, \ldots$). No orthogonality constraint applies.
1.1 Polynomial Regression
A monomial-basis regression of the form $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_d x^d + \epsilon$. As the degree increases, multicollinearity becomes severe, so it is standard practice to combine it with regularization such as Ridge or Least Absolute Shrinkage and Selection Operator (LASSO).
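The multicollinearity problem and the Ridge remedy can be illustrated with a short NumPy sketch. It assumes synthetic data from a quadratic truth and deliberately fits a degree-8 power basis. The intercept is left unpenalized, and the penalty `lam` is an arbitrary illustrative value, not a recommendation. The helper names are illustrative.

```python
import numpy as np

def poly_design(x, degree):
    """Power-basis design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    """Ridge estimate (X^T X + lam * P)^{-1} X^T y, where P = I except that the
    intercept (column 0) is left unpenalized."""
    P = np.eye(X.shape[1])
    P[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * P, X.T @ y)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 50)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.1, x.size)   # hypothetical quadratic truth

X = poly_design(x, degree=8)                       # deliberately over-parameterized
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_ridge = fit_ridge(X, y, lam=1e-2)

print(f"cond(X) = {np.linalg.cond(X):.1e}")
print("OLS   |beta[1:]| =", np.linalg.norm(beta_ols[1:]))
print("Ridge |beta[1:]| =", np.linalg.norm(beta_ridge[1:]))
```

OLS minimizes the residual sum of squares, so the ridge solution can never have a larger penalized-coefficient norm than OLS. In practice `lam` would be chosen by cross-validation.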
1.2 Polynomial Kernel Methods
The kernel $K(x, y) = (x^T y + c)^d$ enables computation of inner products in a polynomial feature space without explicit high-dimensional mapping. It is used in Support Vector Machines (SVM), Kernel Ridge Regression, and Kernel Principal Component Analysis (Kernel PCA).
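The kernel trick can be checked numerically for 2-D inputs and $d = 2$. In that case the kernel value equals the inner product of an explicit 6-dimensional feature map. The map `phi_degree2` below is written out by hand for this special case only.

```python
import numpy as np

def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x^T y + c)^d."""
    return (x @ y + c) ** d

def phi_degree2(x, c=1.0):
    """Explicit feature map for d = 2 and 2-D input, so that phi(x) @ phi(y) == (x @ y + c)**2."""
    x1, x2 = x
    return np.array([
        x1**2, x2**2, np.sqrt(2) * x1 * x2,          # degree-2 terms, including the interaction
        np.sqrt(2 * c) * x1, np.sqrt(2 * c) * x2,    # degree-1 terms
        c,                                           # constant term
    ])

x = np.array([0.5, -1.2])
y = np.array([2.0, 0.3])
k_implicit = poly_kernel(x, y, c=1.0, d=2)                   # one dot product
k_explicit = phi_degree2(x, c=1.0) @ phi_degree2(y, c=1.0)   # explicit 6-D mapping
print(k_implicit, k_explicit)
```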
1.3 Response Surface Methodology (RSM)
A second-order polynomial model $y = \beta_0 + \sum \beta_i x_i + \sum \beta_{ii} x_i^2 + \sum \beta_{ij} x_i x_j$ is used to find process optima. Combined with Central Composite Design (CCD) and Box-Behnken Design, it serves as a de facto standard for recipe optimization in etch, deposition, and Chemical Mechanical Planarization (CMP) processes (Myers et al. 2016).
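Once the second-order model is fitted, the candidate optimum is its stationary point. Write the model as $y = \beta_0 + x^T b + x^T B x$ with $B_{ii} = \beta_{ii}$ and $B_{ij} = \beta_{ij}/2$. The gradient $b + 2Bx$ then vanishes at $x_s = -\tfrac{1}{2}B^{-1}b$. The eigenvalues of $B$ classify this point: all negative means a maximum, all positive a minimum, and mixed signs a saddle.

The sketch below assumes two factors in coded units and a 3×3 factorial design in place of a CCD. The response is noise-free and hypothetical.

```python
import numpy as np
from itertools import product

def quad_features(X):
    """Columns: 1, x1, x2, x1^2, x2^2, x1*x2 (second-order RSM model, two factors)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])

# 3x3 factorial design in coded units; a CCD or Box-Behnken design could be used instead.
X = np.array(list(product([-1.0, 0.0, 1.0], repeat=2)))
# Hypothetical noise-free response surface, used only for illustration.
y = 80 + 2*X[:, 0] - 3*X[:, 1] - 4*X[:, 0]**2 - 5*X[:, 1]**2 + 1.0*X[:, 0]*X[:, 1]

b0, b1, b2, b11, b22, b12 = np.linalg.lstsq(quad_features(X), y, rcond=None)[0]
b = np.array([b1, b2])
B = np.array([[b11, b12 / 2], [b12 / 2, b22]])   # symmetric matrix of second-order terms
x_s = -0.5 * np.linalg.solve(B, b)                # stationary point: b + 2 B x = 0
eig = np.linalg.eigvalsh(B)                       # all negative -> maximum
print("stationary point:", x_s, "eigenvalues of B:", eig)
```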
Level 2. Orthogonal Polynomial Basis Models [Axis A]
Level 2 adds an orthogonality constraint on top of Level 1 to make coefficient estimation more stable and interpretable. Two polynomials $\phi_i, \phi_j$ ($i \neq j$) are orthogonal on $[a, b]$ under a non-negative weight function $w(x)$ if
$$\int_a^b \phi_i(x)\,\phi_j(x)\,w(x)\,dx = 0.$$
Benefits gained by exploiting orthogonality include: (i) coefficients can be estimated independently via orthogonal projection; (ii) adding higher-order terms does not perturb existing coefficients; (iii) output variance decomposes additively, enabling direct sensitivity analysis such as Sobol indices (see Appendix A); and (iv) the regression matrix is well-conditioned. A general treatment of orthogonal polynomials is given in Appendix B.
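Benefits (i) and (ii) can be seen in a small NumPy sketch. The Legendre coefficients of $f(x) = e^x$ on $[-1, 1]$ are computed one at a time by projection, using Gauss-Legendre quadrature for the integrals. Raising the truncation order leaves the existing coefficients unchanged. A monomial least-squares fit, by contrast, changes all of its coefficients. The test function and orders are arbitrary choices.

```python
import numpy as np
from numpy.polynomial import legendre as L

def legendre_coeffs(f, n, n_quad=50):
    """Project f onto P_0..P_n on [-1, 1] (w(x) = 1): c_k = <f, P_k> / h_k with h_k = 2/(2k+1)."""
    x, w = L.leggauss(n_quad)              # Gauss-Legendre nodes and weights
    V = L.legvander(x, n)                  # V[:, k] = P_k(x)
    h = 2.0 / (2 * np.arange(n + 1) + 1)
    return (V.T @ (w * f(x))) / h          # each coefficient is an independent integral

c3 = legendre_coeffs(np.exp, 3)
c6 = legendre_coeffs(np.exp, 6)
print(c3, c6[:4])                          # identical: adding P_4..P_6 leaves c_0..c_3 unchanged

# Contrast: monomial least-squares coefficients (increasing order) shift when the degree grows.
xs = np.linspace(-1, 1, 200)
m3 = np.polyfit(xs, np.exp(xs), 3)[::-1]
m6 = np.polyfit(xs, np.exp(xs), 6)[::-1]
print(m3, m6[:4])
```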
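The seven sub-items fall into three groups, determined by two sub-axes. The first is the source of the basis functions: analytically defined families (Group 2-A) and the learning frameworks built on them (Group 2-B), versus bases derived from data (Group 2-C). The second is the type of domain (spatial / probabilistic / discrete), which further splits Group 2-A.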
| Group | Domain | Items |
|---|---|---|
| 2-A: Theory-based | Spatial | 2.1 Zernike / Chebyshev / Legendre / Fourier-Bessel |
|  | Probabilistic | 2.2 Wiener-Askey (Hermite / Laguerre / Jacobi / Gegenbauer) |
|  | Discrete lattice | 2.3 Charlier / Krawtchouk / Meixner / Hahn |
| 2-B: Learning frameworks | Stochastic input → output regression | 2.4 PCE / 2.5 Sparse PCE / 2.6 aPCE |
| 2-C: Data-driven bases | Measurement data | 2.7 KL Expansion / POD |
2.1 Spatial-domain Orthogonal Polynomials
These polynomials are orthogonal on a fixed geometric domain (the unit disk or a square) onto which a wafer or die can be mapped. They are used to decompose measured spatial data into global patterns.
| Polynomial | Domain | Strength | Process Variation Use Case |
|---|---|---|---|
| Zernike | Circle (unit disk) | Orthogonal on the disk; standard basis for optical aberrations | Wafer-level Warp/Bow, global thickness and overlay decomposition |
| Chebyshev | Square | Suppresses Runge phenomenon, minimax approximation | Scanner slit area, intra-die pattern variation |
| Legendre | Square / interval | Uniform weight ($w(x) = 1$), simple integration | Flatness/roughness variation, linear trend separation |
| Fourier-Bessel | Circle | Stable at the edge, captures high-frequency content | Wafer edge roll-off, post-CMP edge zone |
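The sketch below shows a global-pattern decomposition with the first six Zernike terms. It makes several assumptions:

- Wafer coordinates are normalized by the wafer radius.
- The terms are unnormalized and ordered by hand (piston, tilts, defocus, two astigmatisms).
- The "thickness map" is synthetic.

The least-squares residual is what a Level 5.2 model would then learn.

```python
import numpy as np

def zernike_low_order(r, theta):
    """First six (unnormalized) Zernike terms on the unit disk."""
    return np.column_stack([
        np.ones_like(r),             # piston
        r * np.cos(theta),           # tilt x
        r * np.sin(theta),           # tilt y
        2 * r**2 - 1,                # defocus (bowl / dome)
        r**2 * np.cos(2 * theta),    # astigmatism 0/90 deg (saddle)
        r**2 * np.sin(2 * theta),    # astigmatism 45 deg
    ])

rng = np.random.default_rng(1)
n = 400
r = np.sqrt(rng.uniform(0, 1, n))           # uniform sampling over the disk area
theta = rng.uniform(0, 2 * np.pi, n)
true = np.array([100.0, 0.5, -0.2, 1.5, 0.3, 0.0])     # hypothetical map coefficients
Z = zernike_low_order(r, theta)
z = Z @ true + rng.normal(0, 0.05, n)       # synthetic measurements with noise

coef, *_ = np.linalg.lstsq(Z, z, rcond=None)
residual = z - Z @ coef                     # local structure left for a residual model
print(np.round(coef, 3))
```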
2.2 Probabilistic-domain Orthogonal Polynomials (Wiener-Askey Mapping)
When the input is a random variable, the orthogonal polynomial is selected so that its weight function matches the Probability Density Function (PDF) of that distribution. Xiu & Karniadakis (2002) extended the original Hermite-Gaussian pairing of PCE to the entire Askey scheme, providing a one-to-one mapping between distributions and orthogonal families. This report refers to that mapping as the Wiener-Askey mapping.
| Polynomial | Interval / Weight | Corresponding Distribution |
|---|---|---|
| Hermite | $(-\infty, \infty)$, $e^{-x^2/2}$ | Gaussian |
| Laguerre | $[0, \infty)$, $x^{\alpha} e^{-x}$ (generalized; $\alpha = 0$ gives $e^{-x}$) | Gamma / Exponential |
| Jacobi | $[-1, 1]$, $(1-x)^\alpha(1+x)^\beta$ | Beta |
| Gegenbauer | $[-1, 1]$, $(1-x^2)^{\alpha-1/2}$ | Special case of Beta |
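The pairing between a weight function and a distribution can be checked numerically. The sketch uses NumPy's probabilists' Hermite module (`numpy.polynomial.hermite_e`), whose Gauss quadrature rule integrates against $e^{-x^2/2}$. After rescaling by $1/\sqrt{2\pi}$, the weights compute expectations under $N(0, 1)$. The Gram matrix then comes out diagonal with entries $n!$.

```python
import numpy as np
from math import factorial, sqrt, pi
from numpy.polynomial import hermite_e as He

# Gauss quadrature for the weight exp(-x^2/2) (probabilists' Hermite He_n).
nodes, weights = He.hermegauss(20)
weights = weights / sqrt(2 * pi)        # weights now sum to 1: expectations under N(0, 1)

max_deg = 4
V = He.hermevander(nodes, max_deg)      # V[:, n] = He_n(x) at the nodes
gram = V.T @ (weights[:, None] * V)     # gram[m, n] = E[He_m(X) He_n(X)], X ~ N(0, 1)
print(np.round(gram, 10))               # diagonal with entries 0!, 1!, 2!, 3!, 4!
```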
2.3 Discrete-domain Orthogonal Polynomials
These are families orthogonal on integer lattices, suitable for discrete inputs such as defect counts.
- Charlier: Poisson distribution
- Krawtchouk: Binomial distribution
- Meixner: Negative Binomial distribution
- Hahn: Hypergeometric distribution
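As a discrete example, the sketch below builds Charlier polynomials from their three-term recurrence $a\,C_{n+1}(x) = (n + a - x)\,C_n(x) - n\,C_{n-1}(x)$, with $C_0 = 1$. It then checks orthogonality under a Poisson($a$) probability mass function, for which $\mathbb{E}[C_m C_n] = n!\,a^{-n}\,\delta_{mn}$. The rate $a = 3$ and the truncation of the infinite support at 80 are illustrative assumptions.

```python
import numpy as np
from math import factorial, lgamma

def charlier(n_max, x, a):
    """Charlier polynomials C_0..C_{n_max}(x; a) from the three-term recurrence
    a * C_{n+1} = (n + a - x) * C_n - n * C_{n-1}, with C_0 = 1 and C_{-1} = 0."""
    C = np.zeros((n_max + 1, x.size))
    C[0] = 1.0
    for n in range(n_max):
        c_prev = C[n - 1] if n > 0 else 0.0
        C[n + 1] = ((n + a - x) * C[n] - n * c_prev) / a
    return C

a = 3.0                                   # Poisson rate, e.g. a hypothetical mean defect count
x = np.arange(80, dtype=float)            # truncated support; Poisson(3) mass beyond 80 is negligible
pmf = np.exp(x * np.log(a) - a - np.array([lgamma(k + 1) for k in x]))
C = charlier(4, x, a)
gram = (C * pmf) @ C.T                    # gram[m, n] = E[C_m(X) C_n(X)], X ~ Poisson(a)
expected = np.diag([factorial(n) / a**n for n in range(5)])
print(np.round(gram, 8))
```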
2.4 Polynomial Chaos Expansion (PCE)
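PCE expands the response of a system with stochastic inputs as a series of orthogonal polynomials (Ghanem & Spanos 1991; Xiu & Karniadakis 2002). In practice the series is truncated to a finite set of terms:
$$Y = \sum_{\alpha \in \mathbb{N}_0^d} c_\alpha \Psi_\alpha(\xi) \approx \sum_{\alpha \in \mathcal{A}} c_\alpha \Psi_\alpha(\xi)$$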
| Symbol | Meaning |
|---|---|
| $Y$ | System output (scalar or vector), e.g., wafer thickness, critical dimension |
| $\xi = (\xi_1, \ldots, \xi_d)$ | Standardized stochastic input vector, each $\xi_i$ following a known distribution |
| $d$ | Number of stochastic input variables |
| $\alpha = (\alpha_1, \ldots, \alpha_d)$ | Multi-index, $\alpha_i \in \mathbb{N}_0$, indicating the polynomial degree per input |
| $\mathcal{A}$ | Set of multi-indices used (typically $\sum_i \alpha_i \leq p$) |
| $\Psi_\alpha(\xi)$ | Product of univariate orthogonal polynomials: $\prod_{i=1}^d \psi_{\alpha_i}(\xi_i)$ |
| $c_\alpha$ | Polynomial coefficient (target of learning) |
| $p$ | Truncation order of the expansion |
Thanks to orthogonality, Sobol sensitivity indices (see Appendix A) can be obtained analytically from $c_\alpha$.
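A common non-intrusive way to obtain $c_\alpha$ is least-squares regression on sampled inputs. The sketch below assumes:

- $d = 2$ standard Gaussian inputs;
- an orthonormal Hermite basis $\Psi_\alpha(\xi) = \prod_i He_{\alpha_i}(\xi_i)/\sqrt{\alpha_i!}$;
- total degree $p = 3$;
- a hypothetical model function.

With an orthonormal basis, the mean is the coefficient of the constant term and the variance is the sum of the other squared coefficients. For this model the exact values are 0.5 and 1.59.

```python
import numpy as np
from math import factorial, sqrt
from itertools import product
from numpy.polynomial import hermite_e as He

def total_degree_indices(d, p):
    """All multi-indices alpha in N_0^d with sum(alpha) <= p."""
    return [a for a in product(range(p + 1), repeat=d) if sum(a) <= p]

def psi(xi, alphas):
    """Orthonormal Hermite basis Psi_alpha(xi) = prod_i He_{alpha_i}(xi_i) / sqrt(alpha_i!)."""
    p = max(max(a) for a in alphas)
    cols = []
    for a in alphas:
        col = np.ones(len(xi))
        for i, ai in enumerate(a):
            col *= He.hermeval(xi[:, i], np.eye(p + 1)[ai]) / sqrt(factorial(ai))
        cols.append(col)
    return np.column_stack(cols)

def model(xi):
    """Hypothetical system response with standard Gaussian inputs."""
    return xi[:, 0] + 0.5 * xi[:, 1]**2 + 0.3 * xi[:, 0] * xi[:, 1]

rng = np.random.default_rng(0)
d, p = 2, 3
alphas = total_degree_indices(d, p)          # 10 terms for d = 2, p = 3
xi = rng.standard_normal((200, d))           # experimental design (Monte Carlo samples)
c, *_ = np.linalg.lstsq(psi(xi, alphas), model(xi), rcond=None)

coef = dict(zip(alphas, c))
mean = coef[(0, 0)]
var = sum(v**2 for a, v in coef.items() if any(a))   # all terms except alpha = (0, 0)
print(f"mean = {mean:.4f}, variance = {var:.4f}")
```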
2.5 Sparse PCE & LARS-PCE
Why does the term count explode? The number of PCE terms with $d$-dimensional input and total degree up to $p$ is
$$|\mathcal{A}| = \binom{d+p}{p} = \frac{(d+p)!}{d!\,p!}.$$
This counts all multi-indices $\alpha \in \mathbb{N}_0^d$ with $\sum_i \alpha_i \leq p$, so the count grows combinatorially. For example: $d=10, p=4 \Rightarrow 1{,}001$ terms; $d=20, p=4 \Rightarrow 10{,}626$; $d=50, p=4 \Rightarrow 316{,}251$. Semiconductor processes have tens to hundreds of inputs, so the number of terms quickly exceeds the number of available samples, making naive PCE infeasible.
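The term counts quoted above follow directly from the binomial coefficient. A one-function check using only Python's standard library:

```python
from math import comb

def n_pce_terms(d, p):
    """Number of multi-indices alpha in N_0^d with |alpha| <= p, i.e. C(d + p, p)."""
    return comb(d + p, p)

for d in (10, 20, 50):
    print(d, n_pce_terms(d, 4))
# 10 1001
# 20 10626
# 50 316251
```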
The remedy is sparsity. Least Angle Regression (LARS) or Orthogonal Matching Pursuit (OMP) selects only the most important polynomial terms. Adaptive Sparse PCE (Blatman & Sudret 2011) is a representative method.
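The greedy idea behind OMP fits in a few lines. Repeatedly add the candidate term most correlated with the current residual, then re-fit all selected terms by least squares.

This sketch is plain OMP. It is not LARS, and it is not the adaptive scheme of Blatman & Sudret, which also uses a cross-validation error to choose the number of terms. The dictionary is an orthonormal Legendre PCE basis with $d = 5$ and $p = 3$, giving 56 candidate terms. The 40 samples and the sparse "true" model are hypothetical. Whether the true terms are recovered from fewer samples than candidates depends on the sample and on how correlated the columns are.

```python
import numpy as np
from itertools import product
from numpy.polynomial import legendre as L

def omp(A, y, n_terms, tol=1e-10):
    """Orthogonal Matching Pursuit: greedily add the column most correlated with the
    residual, then re-fit all selected columns by least squares."""
    norms = np.linalg.norm(A, axis=0)
    support, coef = [], np.zeros(0)
    residual = y.copy()
    for _ in range(n_terms):
        if np.linalg.norm(residual) <= tol * np.linalg.norm(y):
            break                                    # residual already negligible
        scores = np.abs(A.T @ residual) / norms
        scores[support] = -np.inf                    # never pick the same column twice
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    return support, coef

rng = np.random.default_rng(3)
d, p, n = 5, 3, 40
alphas = [a for a in product(range(p + 1), repeat=d) if sum(a) <= p]   # 56 candidate terms
X = rng.uniform(-1.0, 1.0, (n, d))                                     # U(-1, 1) inputs
V = L.legvander(X, p) * np.sqrt(2 * np.arange(p + 1) + 1)              # orthonormal 1-D Legendre, (n, d, p+1)
Psi = np.column_stack([np.prod([V[:, i, a[i]] for i in range(d)], axis=0) for a in alphas])

true_terms = {(1, 0, 0, 0, 0): 2.0, (0, 2, 0, 0, 0): -1.0, (1, 0, 1, 0, 0): 0.5}   # hypothetical
y = sum(c * Psi[:, alphas.index(a)] for a, c in true_terms.items())
support, coef = omp(Psi, y, n_terms=10)
print("selected terms:", [alphas[j] for j in support])
```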
2.6 Arbitrary PCE (aPCE)
When the input distribution is unknown or non-standard, the orthogonal polynomial basis can be constructed directly from the empirical moments of the data (Oladyshkin & Nowak 2012). aPCE is useful for irregular process data where the Wiener-Askey mapping does not apply.
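One way to obtain a moment-based basis is sketched below. Factor the Hankel matrix of raw sample moments, $M_{ij} = \mu_{i+j}$, as $M = LL^T$. The polynomials $q(x) = L^{-1}[1, x, \ldots, x^p]^T$ are then orthonormal with respect to the empirical distribution. This is an equivalent construction, not necessarily the exact algorithm of Oladyshkin & Nowak. The bimodal sample is hypothetical, and high degrees make $M$ ill-conditioned.

```python
import numpy as np

def apc_basis(samples, degree):
    """Orthonormal polynomial basis built only from raw sample moments.
    Returns Q whose row k holds the monomial coefficients (increasing powers)
    of the k-th orthonormal polynomial."""
    mu = np.array([np.mean(samples**k) for k in range(2 * degree + 1)])        # raw moments
    M = np.array([[mu[i + j] for j in range(degree + 1)] for i in range(degree + 1)])  # Hankel matrix
    Lc = np.linalg.cholesky(M)          # M = L L^T
    return np.linalg.inv(Lc)            # q(x) = L^{-1} [1, x, ..., x^degree]^T

rng = np.random.default_rng(0)
# Hypothetical irregular process input: a bimodal mixture matching no standard Askey family.
s = np.concatenate([rng.normal(-1.0, 0.3, 600), rng.normal(1.5, 0.5, 400)])
Q = apc_basis(s, degree=3)
basis = np.polynomial.polynomial.polyvander(s, 3) @ Q.T   # column k = k-th polynomial at each sample
gram = basis.T @ basis / s.size                           # empirical E[q_m q_n]
print(np.round(gram, 8))
```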
2.7 Data-driven Orthogonal Bases (KL Expansion / POD)
Whereas 2.1–2.6 use polynomials defined a priori, 2.7 builds the basis directly from the data. The covariance structure of measurements is eigen-decomposed, and the eigenvectors corresponding to the largest eigenvalues serve as an orthogonal basis. Conceptually, the data itself reveals its own dominant variation modes (mode 1, mode 2, mode 3, …).
- Karhunen-Loève (KL) Expansion: Optimal orthogonal decomposition of a random field, in the mean-square sense for a given number of terms; suitable for extracting the principal modes of wafer-to-wafer (W2W) variation (Loève 1978).
- Proper Orthogonal Decomposition (POD): Discrete and practical version of KL; mathematically equivalent to Principal Component Analysis (PCA).
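POD of a set of wafer maps reduces to a singular value decomposition of the mean-centered data matrix, with rows as wafers and columns as measurement sites. The right singular vectors are the orthonormal modes, and the squared singular values give each mode's share of the variance. The 1-D site grid and the two planted modes below are hypothetical stand-ins for real wafer maps.

```python
import numpy as np

rng = np.random.default_rng(0)
n_wafers, n_sites = 100, 200
x = np.linspace(-1, 1, n_sites)                  # 1-D stand-in for measurement sites
mode_bowl, mode_tilt = x**2 - np.mean(x**2), x   # hypothetical W2W variation modes
D = (rng.normal(0, 1.0, (n_wafers, 1)) * mode_bowl
     + rng.normal(0, 0.5, (n_wafers, 1)) * mode_tilt
     + rng.normal(0, 0.01, (n_wafers, n_sites)))  # rows = wafers, columns = sites

Dc = D - D.mean(axis=0)                          # remove the mean wafer map
U, S, Vt = np.linalg.svd(Dc, full_matrices=False)
energy = S**2 / np.sum(S**2)                     # fraction of variance per POD mode
modes = Vt[:2]                                   # orthonormal spatial modes
scores = Dc @ modes.T                            # per-wafer mode amplitudes
print(np.round(energy[:4], 4))
```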
Level 3. Polynomial Neural Architectures [Axis B]
Level 3 embeds polynomial combinations (higher-order and interaction terms) inside neuron computations.
3.1 Group Method of Data Handling (GMDH)
The progenitor of Polynomial Neural Networks (PNN) (Ivakhnenko 1971). At each layer, second-order polynomial candidates over variable pairs $(x_i, x_j)$ are generated, and only nodes that pass an external validation criterion advance to the next layer, allowing the network to grow autonomously.
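One GMDH layer can be sketched in three steps:

1. Fit a quadratic partial description $w_0 + w_1 x_i + w_2 x_j + w_3 x_i x_j + w_4 x_i^2 + w_5 x_j^2$ for every input pair on a training split.
2. Score each fit on a held-out split (the external criterion).
3. Keep the best few nodes.

The split sizes, the number of kept nodes, and the target function are illustrative assumptions. A full GMDH would stack such layers until the validation error stops improving.

```python
import numpy as np
from itertools import combinations

def quad_pair_features(xi, xj):
    """Quadratic partial description in one variable pair: 1, xi, xj, xi*xj, xi^2, xj^2."""
    return np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])

def gmdh_layer(X_tr, y_tr, X_va, y_va, keep=3):
    """Fit one quadratic node per variable pair on the training split and keep the
    `keep` nodes with the lowest validation MSE (the external criterion)."""
    nodes = []
    for i, j in combinations(range(X_tr.shape[1]), 2):
        w, *_ = np.linalg.lstsq(quad_pair_features(X_tr[:, i], X_tr[:, j]), y_tr, rcond=None)
        pred_va = quad_pair_features(X_va[:, i], X_va[:, j]) @ w
        nodes.append((np.mean((y_va - pred_va) ** 2), (i, j), w))
    nodes.sort(key=lambda node: node[0])
    return nodes[:keep]

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (120, 4))
y = 1.0 + 2.0 * X[:, 0] * X[:, 2] + X[:, 2] ** 2     # hypothetical target depending on x0, x2
best = gmdh_layer(X[:80], y[:80], X[80:], y[80:])
for mse, pair, _ in best:
    print(pair, f"{mse:.2e}")
# The outputs of the kept nodes would become the inputs of the next layer.
```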
3.2 Modern Polynomial Neural Networks (PNN)
A modern extension of GMDH in which the degree, variable selection, and number of layers are determined adaptively.
3.3 Pi-Nets
Pi-Nets (Chrysos et al. 2020) express the output as a higher-order polynomial expansion of the input and use tensor decompositions (CANDECOMP/PARAFAC, Tucker) to keep parameter counts tractable. They achieve strong expressive power even without activation functions.
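The way Pi-Nets build a degree-$N$ polynomial without activations can be sketched with a recursion in the style of their coupled CP (CCP) model. Each step takes a Hadamard product with a new linear projection of the input, raising the degree by one. The NumPy code below is a forward pass only, with random weights and assumed sizes; it is not the authors' implementation. As a sanity check, the output along any ray $t z$ is a cubic in $t$, so its fourth finite difference vanishes.

```python
import numpy as np

def ccp_forward(z, U, C, beta):
    """CCP-style recursion: x_1 = U_1^T z, x_n = (U_n^T z) * x_{n-1} + x_{n-1},
    output = C x_N + beta. The result is a degree-N polynomial in z, with no activation."""
    x = U[0].T @ z
    for Un in U[1:]:
        x = (Un.T @ z) * x + x        # Hadamard product raises the degree by one
    return C @ x + beta

rng = np.random.default_rng(0)
d_in, k, d_out, N = 4, 8, 2, 3        # input dim, rank, output dim, polynomial degree (assumed)
U = [rng.normal(0, 0.5, (d_in, k)) for _ in range(N)]
C = rng.normal(0, 0.5, (d_out, k))
beta = np.zeros(d_out)

z = rng.normal(size=d_in)
vals = np.array([ccp_forward(t * z, U, C, beta) for t in range(5)])
fourth_diff = vals[4] - 4 * vals[3] + 6 * vals[2] - 4 * vals[1] + vals[0]
print(fourth_diff)                    # ~0 up to floating-point rounding
```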
3.4 Self-Organizing Polynomial Neural Networks (SOPNN)
An extension of GMDH (Oh & Pedrycz 2002) that allows partial polynomials of varying degree at each node.
3.5 Kolmogorov-Arnold Networks (KAN)
KAN (Liu et al. 2024) places learnable univariate functions (B-splines or polynomials) on the edges of the network. Choosing polynomial bases for the edge functions effectively yields a structured generalization of PNN.
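A KAN layer with polynomial edge functions can be sketched as follows. Each edge $(i, j)$ carries its own univariate function, here a Chebyshev series $\phi_{ij}(x_i) = \sum_k c_{ijk} T_k(x_i)$, and each output sums its incoming edges. This is a polynomial variant, not the B-spline parameterization of Liu et al. The Chebyshev basis, the assumption that inputs are already scaled to $[-1, 1]$, and the sizes are illustrative choices. Training is omitted.

```python
import numpy as np
from numpy.polynomial import chebyshev as Ch

def kan_layer(x, coef):
    """KAN-style layer with polynomial edge functions.
    x:    (batch, n_in) inputs, assumed already scaled to [-1, 1]
    coef: (n_in, n_out, deg+1) Chebyshev coefficients; edge (i, j) computes
          phi_ij(x_i) = sum_k coef[i, j, k] * T_k(x_i), and y_j = sum_i phi_ij(x_i)."""
    deg = coef.shape[2] - 1
    T = Ch.chebvander(x, deg)                # (batch, n_in, deg+1): T[b, i, k] = T_k(x[b, i])
    return np.einsum("bik,ijk->bj", T, coef)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (5, 3))
coef = rng.normal(0, 0.1, (3, 2, 4))         # 3 inputs -> 2 outputs, cubic edges (learnable in practice)
y = kan_layer(x, coef)
print(y.shape)
```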
Level 4. Tensor & Factorization-based Polynomial Models [Axis B]
This level handles higher-order interaction terms efficiently through tensor decomposition.
4.1 Factorization Machines (FM)
Factorization Machines (Rendle 2010) represent the interaction weights of a second-order polynomial regression as inner products of low-dimensional latent vectors rather than learning each weight independently.
FM is particularly effective on sparse data such as recommendation systems and click-through-rate prediction.
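The degree-2 FM model is $\hat{y}(x) = w_0 + \sum_i w_i x_i + \sum_{i<j} \langle v_i, v_j \rangle x_i x_j$ with $v_i \in \mathbb{R}^k$. Rendle (2010) showed that the pairwise term can be rewritten so that it costs $O(kn)$ instead of $O(kn^2)$. The sketch below compares that reformulation with the naive double loop on random, hypothetical parameters.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Degree-2 factorization machine with V of shape (n_features, k), using
    sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2]."""
    s = V.T @ x
    pairwise = 0.5 * np.sum(s**2 - (V**2).T @ (x**2))
    return w0 + w @ x + pairwise

def fm_predict_naive(x, w0, w, V):
    """Same model with the explicit O(k n^2) double loop over feature pairs."""
    n = len(x)
    pairwise = sum(V[i] @ V[j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
    return w0 + w @ x + pairwise

rng = np.random.default_rng(0)
n, k = 10, 3
x = rng.random(n) * (rng.random(n) < 0.3)    # sparse input, as in recommendation data
w0, w, V = 0.1, rng.normal(size=n), rng.normal(size=(n, k))
print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```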
4.2 Higher-Order Factorization Machines (HOFM)
HOFM (Blondel et al. 2016) extends FM to third- and fourth-order interactions efficiently using ANOVA kernels.
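The building block of HOFM is the ANOVA kernel of degree $m$, $A^m(p, x) = \sum_{j_1 < \cdots < j_m} \prod_{t=1}^{m} p_{j_t} x_{j_t}$. It sums over all $m$-way interactions without repeated features and can be evaluated in $O(mn)$ time by dynamic programming. The sketch below checks such a recursion against brute-force enumeration. It shows the kernel only, not the full HOFM model or its training.

```python
import numpy as np
from itertools import combinations

def anova_kernel(p, x, m):
    """ANOVA kernel A^m(p, x) by dynamic programming in O(m n):
    a[t, j] = a[t, j-1] + p[j-1] * x[j-1] * a[t-1, j-1], where a[t, j] sums all
    size-t index subsets drawn from the first j features."""
    n = len(x)
    a = np.zeros((m + 1, n + 1))
    a[0, :] = 1.0                                   # degree-0 kernel is 1
    for t in range(1, m + 1):
        for j in range(1, n + 1):
            a[t, j] = a[t, j - 1] + p[j - 1] * x[j - 1] * a[t - 1, j - 1]
    return a[m, n]

def anova_kernel_bruteforce(p, x, m):
    """Direct sum over all index subsets j_1 < ... < j_m."""
    return sum(np.prod([p[j] * x[j] for j in idx]) for idx in combinations(range(len(x)), m))

rng = np.random.default_rng(0)
p, x = rng.normal(size=6), rng.normal(size=6)
print(anova_kernel(p, x, 3), anova_kernel_bruteforce(p, x, 3))
```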
4.3 Tensor Train / Tensor Regression
The coefficients of a multivariate polynomial are viewed as a tensor and compressed via Tensor Train (TT) decomposition or Tucker decomposition. Canonical low-rank approximations, proposed as an alternative to sparse PCE for high-dimensional inputs, belong to the same family (Konakli & Sudret 2016).
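A sketch of the TT idea, assuming the coefficients $c_{\alpha_1 \cdots \alpha_d}$ of a tensor-product polynomial basis are stored as a dense $d$-way array. The standard TT-SVD procedure splits the array into small cores by successive truncated SVDs. The coefficient tensor here is hypothetical: a sum of two separable terms, so its TT ranks are at most 2. The truncation threshold `rel_eps` is an assumed value.

```python
import numpy as np

def tt_svd(T, rel_eps=1e-10):
    """TT-SVD: split a d-way tensor into cores G_k of shape (r_{k-1}, n_k, r_k)
    by successive truncated SVDs of its unfoldings."""
    shape, cores, r_prev = T.shape, [], 1
    C = T
    for k in range(len(shape) - 1):
        C = C.reshape(r_prev * shape[k], -1)
        U, S, Vt = np.linalg.svd(C, full_matrices=False)
        r = max(1, int(np.sum(S > rel_eps * S[0])))     # drop negligible singular values
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        C = S[:r, None] * Vt[:r]                          # carry the remainder to the next mode
        r_prev = r
    cores.append(C.reshape(r_prev, shape[-1], 1))
    return cores

def tt_full(cores):
    """Contract TT cores back into the full tensor."""
    out = cores[0]
    for G in cores[1:]:
        out = np.tensordot(out, G, axes=([out.ndim - 1], [0]))
    return out.reshape(out.shape[1:-1])

# Hypothetical coefficient tensor of a 4-input polynomial (degree <= 4 per input).
rng = np.random.default_rng(0)
T = sum(np.einsum("i,j,k,l->ijkl", *(rng.normal(size=5) for _ in range(4))) for _ in range(2))
cores = tt_svd(T)
print([G.shape for G in cores], T.size, sum(G.size for G in cores))
```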
4.4 Polynomial Tensor Decomposition
A formulation that casts tensor decomposition itself in polynomial form.
Level 5. Hybrid & Surrogate Modeling [Axis C]
Level 5 combines different polynomial techniques, or polynomial models with non-polynomial models, to maximize expressive power.
5.1 PCE-Kriging
PC-Kriging (Schöbi et al. 2015) captures global trends with PCE and models the residual as a Gaussian Process (GP). This makes it a natural candidate for virtual metrology, where a smooth global trend coexists with local deviations.
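The trend-plus-residual idea can be sketched in two stages. First, a low-order Legendre polynomial fitted by least squares captures the global trend. Second, a zero-mean GP interpolates what is left.

This is a simplification. PC-Kriging proper treats the PCE terms as the trend of universal Kriging and selects them (for example, with LARS) while estimating the GP hyperparameters. Here the kernel, length scale, and nugget `noise` are fixed, assumed values.

```python
import numpy as np
from numpy.polynomial import legendre as L

def rbf(a, b, length=0.2, var=1.0):
    """Squared-exponential covariance between 1-D point sets a and b."""
    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)

def fit_trend_plus_gp(x, y, deg=3, noise=1e-6, **kern):
    """Simplified two-stage sketch: Legendre trend by least squares, then a zero-mean
    GP on the residuals with fixed (not estimated) hyperparameters."""
    c, *_ = np.linalg.lstsq(L.legvander(x, deg), y, rcond=None)
    r = y - L.legval(x, c)                               # residual after the global trend
    K = rbf(x, x, **kern) + noise * np.eye(len(x))
    alpha = np.linalg.solve(K, r)

    def predict(xq):
        return L.legval(xq, c) + rbf(xq, x, **kern) @ alpha

    return predict

x = np.linspace(-1, 1, 15)
y = 1 + 0.5 * x - 0.8 * x**2 + 0.1 * np.sin(12 * x)     # hypothetical: global trend + local wiggle
predict = fit_trend_plus_gp(x, y)
print(predict(np.linspace(-1, 1, 5)))
```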
5.2 Global + Residual Models
Low-order orthogonal polynomials (e.g., Zernike) capture global shape, while a Spline (notably the Thin Plate Spline), Graph Neural Network (GNN), or Convolutional Neural Network (CNN) learns the fine-scale residual. This hybrid is effective for local distortions such as those caused by wafer chucking (clamping).
5.3 Physics-Informed Polynomial Models
A polynomial variant of Physics-Informed Neural Networks (PINN) (Raissi et al. 2019). Governing equations are included in the loss function, and the solution is expanded in a polynomial basis (typically Chebyshev or Legendre); this construction is sometimes called a Spectral PINN.
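A minimal sketch of the idea on a linear problem: $u''(x) = -\pi^2 \sin(\pi x)$ on $[-1, 1]$ with $u(\pm 1) = 0$, whose exact solution is $\sin(\pi x)$. The solution is expanded in Chebyshev polynomials. The loss is the squared PDE residual at collocation points plus a weighted boundary term.

Because the problem is linear, this loss is quadratic in the coefficients, so it is minimized here directly by least squares. A PINN-style implementation would minimize the same kind of loss by gradient descent. The degree, the collocation points, and the boundary weight `w_bc` are assumptions.

```python
import numpy as np
from numpy.polynomial import chebyshev as Ch

def source(x):
    """Right-hand side f of u'' = f; the exact solution is u(x) = sin(pi x)."""
    return -np.pi**2 * np.sin(np.pi * x)

deg = 20
xc = np.cos(np.pi * np.arange(1, 40) / 40)          # 39 interior collocation points (Chebyshev extrema)
basis = np.eye(deg + 1)                              # row k = coefficient vector of T_k

# PDE rows: second derivatives of T_0..T_deg at the collocation points.
D2 = np.column_stack([Ch.chebval(xc, Ch.chebder(basis[k], 2)) for k in range(deg + 1)])
# Boundary rows: values of T_0..T_deg at x = -1 and x = 1.
Bc = Ch.chebvander(np.array([-1.0, 1.0]), deg)

w_bc = 10.0                                          # assumed weight of the boundary term
A = np.vstack([D2, w_bc * Bc])
rhs = np.concatenate([source(xc), np.zeros(2)])
c, *_ = np.linalg.lstsq(A, rhs, rcond=None)          # minimizes PDE residual^2 + w_bc^2 * BC residual^2

xt = np.linspace(-1, 1, 101)
err = np.max(np.abs(Ch.chebval(xt, c) - np.sin(np.pi * xt)))
print(f"max error vs exact solution: {err:.1e}")
```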
5.4 Multi-fidelity Polynomial Surrogates
Combines low-fidelity (fast simulation) and high-fidelity (measurement) data (Kennedy & O’Hagan 2000). It is well suited to virtual metrology that fuses Technology Computer-Aided Design (TCAD) simulations with measurements.
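A polynomial sketch of the autoregressive idea behind Kennedy & O'Hagan (2000), $y_H(x) \approx \rho\, y_L(x) + \delta(x)$. A polynomial surrogate is first fitted to many cheap low-fidelity runs. Only $\rho$ and a linear discrepancy $\delta(x)$ are then estimated from a handful of high-fidelity points. Kennedy & O'Hagan model both parts with Gaussian processes; the polynomial discrepancy, the two model functions, and the sample locations here are hypothetical simplifications.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def y_low(x):
    """Hypothetical cheap simulator (low fidelity)."""
    return np.sin(3 * x)

def y_high(x):
    """Hypothetical expensive measurement (high fidelity)."""
    return 1.8 * np.sin(3 * x) + 0.3 * x - 0.1

# Step 1: polynomial surrogate of the low-fidelity model from many cheap runs.
x_lo = np.linspace(-1, 1, 200)
c_lo = P.polyfit(x_lo, y_low(x_lo), 11)

def f_lo(x):
    return P.polyval(x, c_lo)

# Step 2: from a few high-fidelity points, fit y_H ~ rho * f_lo(x) + d0 + d1 * x.
x_hi = np.array([-0.9, -0.4, 0.1, 0.5, 0.95])
A = np.column_stack([f_lo(x_hi), np.ones_like(x_hi), x_hi])
rho, d0, d1 = np.linalg.lstsq(A, y_high(x_hi), rcond=None)[0]

def predict(x):
    return rho * f_lo(x) + d0 + d1 * x

xt = np.linspace(-1, 1, 101)
max_err = np.max(np.abs(predict(xt) - y_high(xt)))
print(f"rho = {rho:.3f}, max error = {max_err:.1e}")
```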
Level 6. Symbolic & Sparse Polynomial Discovery [Axis C]
The last level of the taxonomy: discovering interpretable polynomial expressions directly from data.
6.1 Sparse Identification of Nonlinear Dynamics (SINDy)
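SINDy (Brunton et al. 2016) builds a library matrix from candidate functions (polynomials, trigonometric terms, etc.). It then applies sparse regression to recover a sparse coefficient matrix, thereby identifying the governing equations of a dynamical system. The regression can be LASSO or the sequentially thresholded least squares used in the original paper.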
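A compact sketch of SINDy with a polynomial library and the sequentially thresholded least-squares solver. Candidate monomials form $\Theta(X)$, and $\dot{X} \approx \Theta(X)\,\Xi$ is solved while repeatedly zeroing coefficients below a threshold.

To keep the example short, it assumes a cubic damped oscillator, a fixed threshold, and exact (noise-free) derivatives at random states. Real data would require estimating derivatives from measured trajectories.

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_library(X, degree=3):
    """Candidate terms: 1 and all monomials of the state variables up to `degree`."""
    n = X.shape[1]
    cols, names = [np.ones(len(X))], ["1"]
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(n), deg):
            cols.append(np.prod(X[:, idx], axis=1))
            names.append("*".join(f"x{i}" for i in idx))
    return np.column_stack(cols), names

def stlsq(Theta, dXdt, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares: solve, zero out small coefficients, re-solve."""
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(dXdt.shape[1]):
            big = ~small[:, k]
            Xi[big, k] = np.linalg.lstsq(Theta[:, big], dXdt[:, k], rcond=None)[0]
    return Xi

# Hypothetical ground truth: x0' = -0.1 x0^3 + 2 x1^3,  x1' = -2 x0^3 - 0.1 x1^3
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, (300, 2))
dXdt = np.column_stack([-0.1 * X[:, 0]**3 + 2 * X[:, 1]**3, -2 * X[:, 0]**3 - 0.1 * X[:, 1]**3])
Theta, names = poly_library(X)
Xi = stlsq(Theta, dXdt)
for k in range(2):
    print(f"x{k}' =", " + ".join(f"{Xi[j, k]:.3f}*{names[j]}" for j in np.flatnonzero(Xi[:, k])))
```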
6.2 Symbolic Regression with Polynomial Basis
Genetic-programming tools such as PySR (Cranmer 2023) search over expressions built from operators and terms. Restricting the operator set to addition, subtraction, and multiplication confines the search to polynomial expressions.
6.3 LASSO / Elastic-Net Polynomial Feature Selection
A classical approach: a large number of polynomial features are generated and then pruned via regularization (Tibshirani 1996; Zou & Hastie 2005).
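The generate-then-prune workflow can be sketched without external libraries. All monomials up to degree 3 are generated and standardized. A LASSO is then fitted by cyclic coordinate descent, soft-thresholding one coefficient at a time. The response function, the penalty `lam`, and the iteration count are hypothetical. A library implementation such as scikit-learn's `Lasso` would normally be used instead.

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_features(X, degree):
    """All monomials of the inputs from degree 1 up to `degree`, with readable names."""
    names, cols = [], []
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(X.shape[1]), deg):
            cols.append(np.prod(X[:, idx], axis=1))
            names.append("*".join(f"x{i}" for i in idx))
    return np.column_stack(cols), names

def lasso_cd(Z, y, lam, n_iter=500):
    """Coordinate descent for (1/2n)||y - Z b||^2 + lam * ||b||_1 (standardized Z, centered y)."""
    n, p = Z.shape
    b = np.zeros(p)
    r = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            r += Z[:, j] * b[j]                              # remove feature j from the fit
            rho = Z[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / (Z[:, j] @ Z[:, j] / n)  # soft threshold
            r -= Z[:, j] * b[j]
    return b

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (300, 3))
y = 2.0 * X[:, 0] * X[:, 1] - 1.5 * X[:, 2]**2 + rng.normal(0, 0.05, 300)   # hypothetical response
F, names = poly_features(X, degree=3)          # 19 candidate terms
Z = (F - F.mean(axis=0)) / F.std(axis=0)       # standardize so one penalty fits all terms
b = lasso_cd(Z, y - y.mean(), lam=0.05)
print({names[j]: round(b[j], 3) for j in np.flatnonzero(b)})
```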
Mapping to Wafer Process Variation Modeling (Within-Wafer WiW / Wafer-to-Wafer W2W)
| Application | Taxonomy Position | Method |
|---|---|---|
| WiW spatial decomposition (global shape) | Level 2.1 | Zernike / Chebyshev / Legendre / Fourier-Bessel regression |
| WiW spatial decomposition (local residual) | Level 5.2 | Global + Spline/GNN/CNN residual |
| W2W variation mode extraction | Level 2.7 | KL Expansion / POD |
| Process variation Uncertainty Quantification (UQ) | Level 2.4, 2.5 | PCE / Sparse PCE |
| Virtual Metrology | Level 5.1, 5.4 | PCE-Kriging, Multi-fidelity surrogate |
| Process recipe optimization (DOE) | Level 1.3 | RSM |
| Process dynamics discovery | Level 6.1 | SINDy |
Appendix A. Sobol Sensitivity Indices
Sobol indices (Sobol 1993) are a global-sensitivity measure that quantifies how much of the output variance is attributable to each input variable, or to combinations of inputs.
A.1 ANOVA Decomposition
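Suppose $Y = f(X_1, X_2, \ldots, X_d)$, with independent inputs, admits the Analysis of Variance (ANOVA) decomposition
$$f(X) = f_0 + \sum_{i} f_i(X_i) + \sum_{i<j} f_{ij}(X_i, X_j) + \cdots + f_{12\cdots d}(X_1, \ldots, X_d),$$
where $f_0 = \mathbb{E}[Y]$ and each component has zero mean with respect to each of its own variables. Under these conditions the components are mutually orthogonal. The output variance then decomposes as $\mathrm{Var}(Y) = \sum_i V_i + \sum_{i<j} V_{ij} + \cdots + V_{12\cdots d}$. Here $V_i = \mathrm{Var}(f_i(X_i))$ is the contribution of $X_i$ alone, and $V_{ij} = \mathrm{Var}(f_{ij}(X_i, X_j))$ captures the interaction between $X_i$ and $X_j$.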
A.2 First-order and Total-effect Indices
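$$S_i = \frac{V_i}{\mathrm{Var}(Y)}, \qquad S_i^T = \frac{\sum_{u \ni i} V_u}{\mathrm{Var}(Y)}$$
Here $u$ ranges over all index sets containing $i$, and $V_u$ is the corresponding variance term (so $V_{\{i\}} = V_i$ and $V_{\{i,j\}} = V_{ij}$). $S_i$ is the contribution of $X_i$ acting alone. The total-effect index $S_i^T$ is the total contribution of $X_i$ including all its interactions, so $S_i^T \geq S_i$.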
A.3 Closed-form Computation in PCE
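Consider a PCE $Y = \sum_\alpha c_\alpha \Psi_\alpha(\xi)$ with an orthonormal basis ($\mathbb{E}[\Psi_\alpha \Psi_\beta] = \delta_{\alpha\beta}$, $\Psi_{\mathbf{0}} = 1$). Orthogonality makes the mean and variance simple functions of the coefficients:
$$\mathbb{E}[Y] = c_{\mathbf{0}}, \qquad \mathrm{Var}(Y) = \sum_{\alpha \neq \mathbf{0}} c_\alpha^2.$$
Letting $\mathcal{A}_i = \{\alpha : \alpha_i > 0,\ \alpha_j = 0\ \forall j \neq i\}$ be the terms that depend on $\xi_i$ alone, we have
$$S_i = \frac{1}{\mathrm{Var}(Y)} \sum_{\alpha \in \mathcal{A}_i} c_\alpha^2, \qquad S_i^T = \frac{1}{\mathrm{Var}(Y)} \sum_{\alpha:\, \alpha_i > 0} c_\alpha^2.$$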
All Sobol indices follow in closed form from the PCE coefficients alone, with no additional simulation required. This property (Sudret 2008) is the central reason PCE has become a standard in UQ.
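The closed-form formulas above translate directly into code. The sketch assumes the coefficients of an orthonormal basis are given as a dictionary keyed by multi-index.

The example uses $Y = \xi_1 + 0.5\,\xi_2^2 + 0.3\,\xi_1\xi_2$ with standard Gaussian inputs. Since $\xi^2 = He_2(\xi) + 1$ and $\Psi_{(0,2)} = He_2(\xi_2)/\sqrt{2}$, its orthonormal-Hermite coefficients are:

- $c_{(0,0)} = 0.5$
- $c_{(1,0)} = 1$
- $c_{(0,2)} = 0.5\sqrt{2}$
- $c_{(1,1)} = 0.3$

```python
def sobol_from_pce(coef):
    """First-order and total Sobol indices from orthonormal-PCE coefficients.
    coef: dict mapping multi-index tuples alpha -> c_alpha ((0, ..., 0) is the mean term)."""
    d = len(next(iter(coef)))
    var = sum(c**2 for a, c in coef.items() if any(a))          # Var(Y) = sum over alpha != 0
    first = [sum(c**2 for a, c in coef.items()
                 if a[i] > 0 and all(a[j] == 0 for j in range(d) if j != i)) / var
             for i in range(d)]                                   # terms in xi_i alone
    total = [sum(c**2 for a, c in coef.items() if a[i] > 0) / var
             for i in range(d)]                                   # every term involving xi_i
    return first, total

coef = {(0, 0): 0.5, (1, 0): 1.0, (0, 2): 0.5 * 2**0.5, (1, 1): 0.3}
S, ST = sobol_from_pce(coef)
print([round(s, 3) for s in S], [round(s, 3) for s in ST])
# [0.629, 0.314] [0.686, 0.371]
```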
Appendix B. Orthogonal Polynomials
B.1 Definition
Given an interval $[a, b]$ and a non-negative weight $w(x)$, consider a sequence $\{\phi_0, \phi_1, \phi_2, \ldots\}$ of polynomials in which $\phi_n$ has exact degree $n$. It is called an orthogonal polynomial sequence with respect to $w(x)$ if
$$\int_a^b \phi_i(x)\,\phi_j(x)\,w(x)\,dx = h_i\,\delta_{ij},$$
where $h_i > 0$ are normalization constants and $\delta_{ij}$ is the Kronecker delta. If $h_i = 1$ for all $i$, the system is orthonormal.
B.2 Key Properties
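(1) Three-term recurrence. Every orthogonal polynomial system satisfies a recurrence of the form
$$\phi_{n+1}(x) = (A_n x + B_n)\,\phi_n(x) - C_n\,\phi_{n-1}(x), \qquad \phi_{-1} \equiv 0,$$
with family-specific constants $A_n, B_n, C_n$. For Legendre, for example, $(n+1)P_{n+1}(x) = (2n+1)\,x\,P_n(x) - n\,P_{n-1}(x)$. The recurrence makes evaluation numerically stable and efficient.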
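The recurrence is how such polynomials are usually evaluated in practice. A sketch for Legendre polynomials, checked against NumPy's `numpy.polynomial.legendre.legval`:

```python
import numpy as np
from numpy.polynomial import legendre as L

def legendre_recurrence(n_max, x):
    """P_0..P_{n_max} at x via (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}, P_0 = 1, P_1 = x."""
    P = np.zeros((n_max + 1,) + np.shape(x))
    P[0] = 1.0
    if n_max >= 1:
        P[1] = x
    for n in range(1, n_max):
        P[n + 1] = ((2 * n + 1) * x * P[n] - n * P[n - 1]) / (n + 1)
    return P

x = np.linspace(-1, 1, 7)
P = legendre_recurrence(10, x)
print(np.allclose(P[10], L.legval(x, np.eye(11)[10])))   # True
```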
(2) Distribution of zeros. $\phi_n(x)$ has exactly $n$ simple real zeros in the open interval $(a, b)$, which serve as the nodes of Gaussian quadrature.
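(3) Best approximation. In the $L^2_w$ norm, the best polynomial approximation of $f$ of degree at most $n$ is
$$f_n^*(x) = \sum_{k=0}^{n} \frac{\langle f, \phi_k \rangle_w}{h_k}\,\phi_k(x), \qquad \langle f, \phi_k \rangle_w = \int_a^b f(x)\,\phi_k(x)\,w(x)\,dx.$$
Each coefficient is determined independently of the others; this is the orthogonal projection of $f$ onto the span of $\phi_0, \ldots, \phi_n$.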
B.3 Representative Polynomials
| Name | Interval $[a,b]$ | Weight $w(x)$ | Initial Terms |
|---|---|---|---|
| Legendre $P_n$ | $[-1, 1]$ | $1$ | $P_0=1, P_1=x$ |
| Chebyshev (1st kind) $T_n$ | $[-1, 1]$ | $(1-x^2)^{-1/2}$ | $T_0=1, T_1=x$ |
| Hermite $H_n$ (probabilists’) | $(-\infty, \infty)$ | $e^{-x^2/2}$ | $H_0=1, H_1=x$ |
| Laguerre $L_n$ | $[0, \infty)$ | $e^{-x}$ | $L_0=1, L_1=1-x$ |
| Jacobi $P_n^{(\alpha,\beta)}$ | $[-1, 1]$ | $(1-x)^\alpha(1+x)^\beta$ | $P_0=1, P_1=\tfrac{1}{2}\left[(\alpha+\beta+2)x + \alpha - \beta\right]$ |
| Zernike $Z_n^m(r,\theta)$ | Unit disk | $1$ | 2D, separable in radius and angle |
B.4 Wiener-Askey Mapping (Distribution ↔ Orthogonal Polynomial)
| Distribution | Weight = PDF | Orthogonal Polynomial |
|---|---|---|
| Gaussian | $\propto e^{-x^2/2}$ | Hermite |
| Uniform on $[-1,1]$ | $1/2$ | Legendre |
| Gamma / Exponential | $\propto x^{\alpha} e^{-x}$ | (Generalized) Laguerre |
| Beta | $\propto (1-x)^\alpha(1+x)^\beta$ | Jacobi |
| Poisson | discrete | Charlier |
| Binomial | discrete | Krawtchouk |
This mapping (Xiu & Karniadakis 2002) is the basis on which PCE automatically chooses the orthogonal polynomial family that matches the input distribution.
B.5 Why Orthogonal Polynomials Are Powerful in PML
- Numerical stability: On $[0,1]$ the high-order monomials $x^k$ of the basis $\{1, x, x^2, \ldots\}$ become nearly collinear, producing an ill-conditioned Vandermonde regression matrix. Orthogonal bases avoid this.
- Coefficient interpretability: Each coefficient directly represents the strength of its corresponding polynomial mode.
- Modularity in adding terms: Adding higher-order terms does not alter existing coefficients, enabling adaptive modeling.
- Variance decomposition: ANOVA-style decomposition follows automatically — Sobol indices and sensitivity analysis become direct.
- Compatibility with Gaussian quadrature: The zeros of the polynomials serve as quadrature nodes, making integrals and expectations efficient.
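The conditioning gap is easy to measure. The sketch compares the condition number of a degree-10 monomial design matrix on 200 equispaced points in $[0, 1]$ with a Legendre design matrix on the same points mapped to $[-1, 1]$. The degree and grid are arbitrary choices, and the exact numbers vary with them.

```python
import numpy as np
from numpy.polynomial import legendre as L

x = np.linspace(0.0, 1.0, 200)
deg = 10
V_mono = np.vander(x, deg + 1, increasing=True)   # columns 1, x, ..., x^10 on [0, 1]
V_leg = L.legvander(2 * x - 1, deg)               # Legendre basis after mapping [0, 1] -> [-1, 1]
cond_mono = np.linalg.cond(V_mono)
cond_leg = np.linalg.cond(V_leg)
print(f"monomial: {cond_mono:.1e}, Legendre: {cond_leg:.1e}")
```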
References
- Blatman, G., & Sudret, B. (2011). Adaptive sparse polynomial chaos expansion based on least angle regression. Journal of Computational Physics, 230(6), 2345–2367.
- Blondel, M., Fujino, A., Ueda, N., & Ishihata, M. (2016). Higher-order factorization machines. Advances in Neural Information Processing Systems, 29.
- Brunton, S. L., Proctor, J. L., & Kutz, J. N. (2016). Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113(15), 3932–3937.
- Chrysos, G. G., Moschoglou, S., Bouritsas, G., Panagakis, Y., Deng, J., & Zafeiriou, S. (2020). Π-nets: Deep polynomial neural networks. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
- Cranmer, M. (2023). Interpretable machine learning for science with PySR and SymbolicRegression.jl. arXiv preprint arXiv:2305.01582.
- Ghanem, R. G., & Spanos, P. D. (1991). Stochastic Finite Elements: A Spectral Approach. Springer.
- Ivakhnenko, A. G. (1971). Polynomial theory of complex systems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-1(4), 364–378.
- Kennedy, M. C., & O’Hagan, A. (2000). Predicting the output from a complex computer code when fast approximations are available. Biometrika, 87(1), 1–13.
- Konakli, K., & Sudret, B. (2016). Polynomial meta-models with canonical low-rank approximations: Numerical insights and comparison to sparse polynomial chaos expansions. Journal of Computational Physics, 321, 1144–1169.
- Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljačić, M., Hou, T. Y., & Tegmark, M. (2024). KAN: Kolmogorov-Arnold networks. arXiv preprint arXiv:2404.19756.
- Loève, M. (1978). Probability Theory II (4th ed.). Springer.
- Myers, R. H., Montgomery, D. C., & Anderson-Cook, C. M. (2016). Response Surface Methodology: Process and Product Optimization Using Designed Experiments (4th ed.). Wiley.
- Oh, S. K., & Pedrycz, W. (2002). Self-organizing polynomial neural networks based on polynomial and fuzzy polynomial neurons: Analysis and design. Fuzzy Sets and Systems, 142(2), 163–198.
- Oladyshkin, S., & Nowak, W. (2012). Data-driven uncertainty quantification using the arbitrary polynomial chaos expansion. Reliability Engineering & System Safety, 106, 179–190.
- Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686–707.
- Rendle, S. (2010). Factorization machines. IEEE International Conference on Data Mining, 995–1000.
- Schöbi, R., Sudret, B., & Wiart, J. (2015). Polynomial-chaos-based Kriging. International Journal for Uncertainty Quantification, 5(2), 171–193.
- Sobol, I. M. (1993). Sensitivity estimates for nonlinear mathematical models. Mathematical Modelling and Computational Experiments, 1(4), 407–414.
- Sudret, B. (2008). Global sensitivity analysis using polynomial chaos expansions. Reliability Engineering & System Safety, 93(7), 964–979.
- Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1), 267–288.
- Xiu, D., & Karniadakis, G. E. (2002). The Wiener-Askey polynomial chaos for stochastic differential equations. SIAM Journal on Scientific Computing, 24(2), 619–644.
- Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B, 67(2), 301–320.
