
Physics-Informed Machine Learning: Integrating Physical Laws and Domain Knowledge into Numeric AI/ML

The Core Principle of PIML: Loss Function

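$$\mathcal{L}_{total} = \mathcal{L}_{data} + \lambda \cdot \mathcal{L}_{phys}$$

Experience (observations) + Reason (first principles): the data term learns from observations, while the physics term, weighted by $\lambda$, enforces first principles.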

Numeric data-driven AI/ML models are powerful when abundant training data is available, but they suffer from fundamental limitations: poor performance with scarce data, physically implausible predictions, and rapid degradation outside the training distribution (extrapolation regime). To address these issues, the Physics-Informed Machine Learning (PIML) paradigm has rapidly evolved, injecting physical laws and domain knowledge at various points of the ML pipeline [1][2]. This report classifies methodologies by injection point, then summarizes the principles, strengths, weaknesses, and latest advances of key techniques.

High-Level Taxonomy

A useful axis for classifying PIML is the stage of the ML pipeline at which physical knowledge is injected. Karniadakis et al. (2021) [1] proposed three categories of bias that serve as the backbone of this taxonomy:

  • Observational Bias — Inject physics through data (physics-based simulation, physically meaningful feature engineering)
  • Inductive Bias — Inject physics through architecture (equivariant networks, conservation-preserving structures)
  • Learning Bias — Inject physics through the loss function (PINNs, soft constraints)

To these we add two practical categories:

  • Hybrid Modeling — Couple physics-based solvers with ML models [3]
  • Constrained Optimization — Enforce physical constraints on outputs (hard constraints)

Taxonomy Hierarchy

Physics-Informed Machine Learning (PIML)
│
├── Observational Bias — inject physics through data
│   ├── Physics-based Feature Engineering
│   │   ├── Dimensionless numbers (Re, Pr, Nu, ...)
│   │   ├── Derived variables from governing equations
│   │   └── Symbolic feature generation
│   └── Physics-based Data Augmentation
│       ├── Simulation-augmented training
│       └── Synthetic data from reduced-order models
│
├── Inductive Bias — inject physics through architecture
│   ├── Equivariant / Invariant Networks
│   │   ├── E(3)-equivariant NN
│   │   └── Hamiltonian / Lagrangian NN
│   ├── Conservation-preserving architectures
│   │   └── Symplectic networks
│   └── Kolmogorov-Arnold Networks (KAN, 2024)
│       └── PIKAN / KINN / SPIKAN (2024-2025)
│
├── Learning Bias — inject physics through loss
│   ├── PINNs (Physics-Informed Neural Networks, 2019)
│   │   ├── Vanilla PINN
│   │   ├── XPINN / cPINN (domain decomposition)
│   │   └── Self-adaptive PINN
│   └── Neural Operators
│       ├── DeepONet (2021)
│       ├── Fourier Neural Operator — FNO (2020)
│       ├── PI-DeepONet / PINO (physics-informed variants)
│       ├── Graph Neural Operator (GNO)
│       ├── Mamba Neural Operator (NeurIPS 2024)
│       └── PI-Latent-NO (2025)
│
├── Hybrid Modeling — couple physics + ML
│   ├── Residual Modeling (physics predicts; ML corrects)
│   ├── Parameter Estimation (ML calibrates physics params)
│   └── Physics as Surrogate (fast physics + ML refinement)
│
└── Constrained Optimization — enforce hard constraints
    ├── Projection Layers (KKT projection)
    ├── Lagrangian / Augmented Lagrangian
    └── Differentiable Convex Optimization Layers (cvxpylayers)

Method Summaries

1) Physics-based Feature Engineering

Principle. Add features to the input vector that are derived directly from governing physics. Rather than feeding raw variables $(v, L, \nu)$ into the model, feed the Reynolds number $Re = \frac{vL}{\nu}$ that the underlying physics actually depends on. The general form is:

$$\mathbf{x}_{aug} = [\ \mathbf{x}_{raw},\ f_{1}(\mathbf{x}),\ f_{2}(\mathbf{x}),\ \ldots,\ f_{k}(\mathbf{x})\ ]$$

where each $f_{i}$ is a physically meaningful transformation (dimensionless groups, conservation-law derived quantities, characteristic time/length scales, etc.) [4].
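A minimal sketch of building $\mathbf{x}_{aug}$ for the Reynolds-number case, assuming samples arrive as dictionaries with hypothetical keys `v`, `L` and `nu` in SI units. Any regressor can consume the resulting vector; the point is that two flows with different raw inputs but the same $Re$ end up sharing the feature the physics depends on.

```python
from typing import Dict, List


def reynolds_number(v: float, length: float, nu: float) -> float:
    """Re = v * L / nu: velocity [m/s], characteristic length [m], kinematic viscosity [m^2/s]."""
    return v * length / nu


def augment_features(sample: Dict[str, float]) -> List[float]:
    """x_aug = [x_raw, f_1(x), ...]; here the single derived feature is Re."""
    raw = [sample["v"], sample["L"], sample["nu"]]
    derived = [reynolds_number(sample["v"], sample["L"], sample["nu"])]
    return raw + derived


# Two hypothetical flows with different raw inputs but the same Reynolds number,
# so a model fed x_aug can see that they are dynamically similar.
pipe = {"v": 2.0, "L": 0.05, "nu": 1.0e-6}      # water-like kinematic viscosity
scaled = {"v": 4.0, "L": 0.025, "nu": 1.0e-6}   # half the length, twice the speed
print(augment_features(pipe))
print(augment_features(scaled))
```

Further derived features (other dimensionless groups, characteristic scales) would simply be appended to `derived`.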

Strengths. Very low implementation cost; model-agnostic (works with any regressor); improves interpretability and extrapolation; reduces required training data.

Weaknesses. Requires strong domain expertise; cannot overcome fundamental lack of information; does not guarantee output plausibility.

Example. In a MOSFET drain-current prediction task, adding $V_{GS} - V_{TH}$ and $\frac{W}{L}$ (both taken from the square-law transistor model) together with $\sqrt{V_{DS}}$ as features can reduce MAPE by 30-50% compared with feeding $V_{GS}, V_{DS}, W, L$ raw.

2) Equivariant / Invariant Networks

Principle. Embed symmetries of the physical system (rotation, translation, Galilean invariance) directly into network architecture such that:

$$f(g \cdot \mathbf{x}) = g \cdot f(\mathbf{x}) \quad \forall g \in G$$

for a symmetry group $G$. Hamiltonian Neural Networks (HNN) [5] go further by parameterizing the Hamiltonian $H_{\theta}$ and using Hamilton’s equations to predict dynamics:

$$\frac{d\mathbf{q}}{dt} = \frac{\partial H_{\theta}}{\partial \mathbf{p}}, \quad \frac{d\mathbf{p}}{dt} = -\frac{\partial H_{\theta}}{\partial \mathbf{q}}$$

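This guarantees conservation of the learned energy $H_{\theta}$ by construction, because along any predicted trajectory:

$$\frac{dH_{\theta}}{dt} = \frac{\partial H_{\theta}}{\partial \mathbf{q}} \cdot \frac{d\mathbf{q}}{dt} + \frac{\partial H_{\theta}}{\partial \mathbf{p}} \cdot \frac{d\mathbf{p}}{dt} = \frac{\partial H_{\theta}}{\partial \mathbf{q}} \cdot \frac{\partial H_{\theta}}{\partial \mathbf{p}} - \frac{\partial H_{\theta}}{\partial \mathbf{p}} \cdot \frac{\partial H_{\theta}}{\partial \mathbf{q}} = 0$$

In practice a numerical integrator still adds discretization error, which symplectic integrators keep bounded over long horizons.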
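A minimal sketch of the HNN construction for a one-degree-of-freedom system. Assumptions: `make_h_theta` is a hypothetical hand-written stand-in for a trained network $H_{\theta}(q, p)$, partial derivatives use central differences instead of autodiff, and `baseline` is a hypothetical unconstrained model that predicts $(dq/dt, dp/dt)$ directly (with a small damping term). Deriving the dynamics from $H_{\theta}$ via Hamilton's equations makes $dH_{\theta}/dt$ vanish; the baseline has no such guarantee.

```python
from typing import Callable, Tuple

Hamiltonian = Callable[[float, float], float]
VectorField = Callable[[float, float], Tuple[float, float]]


def grad_h(h: Hamiltonian, q: float, p: float, eps: float = 1e-6) -> Tuple[float, float]:
    """(dH/dq, dH/dp) by central differences; an actual HNN would use autodiff."""
    dh_dq = (h(q + eps, p) - h(q - eps, p)) / (2 * eps)
    dh_dp = (h(q, p + eps) - h(q, p - eps)) / (2 * eps)
    return dh_dq, dh_dp


def hnn_field(h: Hamiltonian) -> VectorField:
    """Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq."""
    def field(q: float, p: float) -> Tuple[float, float]:
        dh_dq, dh_dp = grad_h(h, q, p)
        return dh_dp, -dh_dq
    return field


def energy_rate(h: Hamiltonian, field: VectorField, q: float, p: float) -> float:
    """dH/dt = dH/dq * dq/dt + dH/dp * dp/dt along the given vector field."""
    dh_dq, dh_dp = grad_h(h, q, p)
    dq_dt, dp_dt = field(q, p)
    return dh_dq * dq_dt + dh_dp * dp_dt


def make_h_theta(a: float, b: float, c: float) -> Hamiltonian:
    """Hypothetical stand-in for a trained network H_theta(q, p)."""
    return lambda q, p: a * p * p + b * q * q + c * q ** 4


def baseline(q: float, p: float) -> Tuple[float, float]:
    """Hypothetical unconstrained model predicting (dq/dt, dp/dt) directly."""
    return p, -q - 0.1 * p


h_theta = make_h_theta(0.5, 0.5, 0.1)
hnn = hnn_field(h_theta)
print(energy_rate(h_theta, hnn, 0.3, 1.0))       # zero up to rounding: conserved by construction
print(energy_rate(h_theta, baseline, 0.3, 1.0))  # negative: this baseline drains energy
```

The guarantee holds for any parameters of $H_{\theta}$, which is why it survives imperfect training; what training controls is how closely $H_{\theta}$ matches the true energy.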

Strengths. Conservation laws are enforced by construction rather than as soft penalties; excellent generalization from very small data; predictions stay physically consistent over long rollouts.

Weaknesses. Architecture design requires deep physics knowledge; implementation is complex; applicable only when the symmetries or conserved quantities are known in advance.

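Example. In orbital-dynamics tasks such as the two- and three-body problems studied in [5], a standard MLP trained on trajectories drifts and gains or loses energy within a few orbital periods, whereas an HNN trained on the same data keeps its energy nearly constant over much longer rollouts.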

3) Kolmogorov-Arnold Networks — PIKAN (2024–2025)

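Principle. KAN [6] replaces the fixed node activations and learnable linear weights of an MLP with learnable univariate activation functions placed on the edges. It is inspired by the Kolmogorov-Arnold representation theorem, which states that any continuous function of $n$ variables on the unit hypercube can be written as: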

$$f(x_{1},\ \ldots,\ x_{n}) = \sum_{q=1}^{2n+1} \Phi_{q} \left( \sum_{p=1}^{n} \phi_{q,p}(x_{p}) \right)$$
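A minimal forward-pass sketch of a KAN with the two-layer shape of this representation ($n$ inputs, $2n+1$ inner units, one output). Assumption: each edge function is piecewise-linear on uniform knots, a simpler stand-in for the B-spline parameterization (plus base activation) used in [6]; the names `EdgeFunction`, `KANLayer` and `identity_edge` are hypothetical, and no training loop is shown.

```python
from typing import List


class EdgeFunction:
    """Learnable 1-D function phi(x): values at uniform knots on [lo, hi], linearly interpolated."""

    def __init__(self, values: List[float], lo: float = -1.0, hi: float = 1.0):
        self.values, self.lo, self.hi = values, lo, hi

    def __call__(self, x: float) -> float:
        n = len(self.values) - 1                       # number of intervals
        x = min(max(x, self.lo), self.hi)              # clamp to the grid
        t = (x - self.lo) / (self.hi - self.lo) * n    # position in knot units
        i = min(int(t), n - 1)
        w = t - i
        return (1.0 - w) * self.values[i] + w * self.values[i + 1]


class KANLayer:
    """out_q = sum_p phi_{q,p}(x_p): one learnable function per edge, no weight matrix."""

    def __init__(self, phis: List[List[EdgeFunction]]):
        self.phis = phis                               # phis[q][p]

    def __call__(self, x: List[float]) -> List[float]:
        return [sum(phi(xp) for phi, xp in zip(row, x)) for row in self.phis]


def identity_edge(n_knots: int = 9, lo: float = -2.0, hi: float = 2.0) -> EdgeFunction:
    """Edge initialized to phi(x) = x on [lo, hi]; training would then adjust .values."""
    return EdgeFunction([lo + (hi - lo) * k / (n_knots - 1) for k in range(n_knots)], lo, hi)


# Kolmogorov-Arnold shape for n = 2: inner layer n -> 2n+1 (phi_{q,p}), outer 2n+1 -> 1 (Phi_q).
n = 2
inner = KANLayer([[identity_edge() for _ in range(n)] for _ in range(2 * n + 1)])
outer = KANLayer([[identity_edge() for _ in range(2 * n + 1)]])
print(outer(inner([0.1, 0.2])))  # each inner unit gives 0.3; the outer layer sums five copies
```

Because every learnable parameter is a knot value of a 1-D function, each trained edge can be plotted directly, which is the source of the interpretability claims discussed next.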

PIKAN [7] replaces the MLP in a PINN with a KAN, and KINN [8] and SPIKAN [9] extend this idea to computational mechanics and to higher-dimensional problems, respectively. KAN 2.0: Kolmogorov-Arnold Networks Meet Science (Phys. Rev. X, Dec 2025) [10] positions KANs as tools for scientific discovery: identifying important features, revealing modular structures, and extracting symbolic formulas.

Strengths. Reported to match or exceed the accuracy of MLP-based PINNs on many benchmark PDEs, often with fewer parameters; superior interpretability (the learned $\phi$ functions can be inspected and fitted with symbolic expressions); better convergence on multi-scale and singular problems.

Weaknesses. Slower per-iteration training than MLPs; the basis (e.g., B-spline grid size and order) requires tuning; not yet mature for industrial deployment.

Example. On a 2D Poisson-type problem with a stress concentration, a PIKAN with a Chebyshev basis can reach accuracy comparable to an MLP-based PINN with roughly 50% fewer parameters and faster convergence.

4) PINNs (Physics-Informed Neural Networks, Raissi et al. 2019)

Principle. The network $u_{\theta}(\mathbf{x}, t)$ approximates the solution of a PDE $\mathcal{N}[u] = 0$, and the physics residual is added as a soft penalty to the loss [11]:

$$\mathcal{L}(\theta) = \frac{1}{N_{d}} \sum_{i=1}^{N_{d}} \left(u_{\theta}(\mathbf{x}_{i}, t_{i}) - u_{i}\right)^{2} + \lambda \frac{1}{N_{r}} \sum_{j=1}^{N_{r}} \left(\mathcal{N}[u_{\theta}](\mathbf{x}_{j}, t_{j})\right)^{2} + \mu \mathcal{L}_{BC/IC}$$

Here the first term is the data loss, the second is the physics residual, and the third enforces boundary / initial conditions. Derivatives $\frac{\partial u}{\partial t}$ and $\nabla^{2} u$ appearing in $\mathcal{N}[u]$ are computed via automatic differentiation. Recent variants include XPINN (domain decomposition, 2021), Self-Adaptive PINN (per-point $\lambda$, 2021), and data-free finite-volume / finite-element PINNs (2024–2025).
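A dependency-free sketch of this composite loss under deliberately simplified assumptions: the PDE is replaced by the hypothetical 1D decay ODE $\mathcal{N}[u] = \frac{du}{dt} + k u = 0$ with $u(0) = u_0$, the network is a one-hidden-layer tanh MLP with random (untrained) weights, and the reverse-mode autodiff of PyTorch or JAX is replaced by hand-rolled forward-mode dual numbers, which still give exact derivatives $\frac{du}{dt}$. The training loop is omitted; all names are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


class Dual:
    """Forward-mode AD number val + der*eps with eps**2 = 0; der carries du/dt."""

    def __init__(self, val: float, der: float = 0.0):
        self.val, self.der = val, der

    @staticmethod
    def lift(x) -> "Dual":
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__


def d_tanh(x: Dual) -> Dual:
    t = math.tanh(x.val)
    return Dual(t, (1.0 - t * t) * x.der)


def d_exp(x: Dual) -> Dual:
    e = math.exp(x.val)
    return Dual(e, e * x.der)


Model = Callable[[Dual], Dual]


def make_mlp(hidden: Sequence[Tuple[float, float, float]], bias: float) -> Model:
    """u_theta(t) = bias + sum_i w2_i * tanh(w1_i * t + b1_i)."""
    def u(t: Dual) -> Dual:
        out = Dual(bias)
        for w1, b1, w2 in hidden:
            out = out + w2 * d_tanh(w1 * t + b1)
        return out
    return u


def pinn_loss(u: Model, k: float, data: Sequence[Tuple[float, float]], colloc: List[float],
              u0: float, lam: float = 1.0, mu: float = 1.0) -> float:
    """Data misfit + lam * mean residual^2 + mu * initial-condition penalty (N[u] = du/dt + k*u)."""
    l_data = sum((u(Dual(t)).val - y) ** 2 for t, y in data) / len(data)
    residuals = []
    for t in colloc:
        out = u(Dual(t, 1.0))                  # seed dt/dt = 1, so out.der = du/dt
        residuals.append(out.der + k * out.val)
    l_phys = sum(r * r for r in residuals) / len(residuals)
    l_ic = (u(Dual(0.0)).val - u0) ** 2
    return l_data + lam * l_phys + mu * l_ic


k_true = 1.5
data = [(t, math.exp(-k_true * t)) for t in (0.2, 0.5, 0.9)]   # sparse "measurements"
colloc = [i / 19 for i in range(20)]                           # collocation points on [0, 1]


def exact(t: Dual) -> Dual:
    return d_exp(-k_true * t)


rng = random.Random(0)
untrained = make_mlp([(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)], 0.0)
print(pinn_loss(exact, k_true, data, colloc, u0=1.0))      # zero up to rounding
print(pinn_loss(untrained, k_true, data, colloc, u0=1.0))  # positive: what training minimizes
```

In an inverse problem, `k` would become an additional trainable parameter optimized jointly with the network weights, as in the heat-diffusivity example below.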

Strengths. Works with very little or even zero labeled data (when physics is fully known); unified framework for forward and inverse problems; naturally handles irregular domains.

Weaknesses. Notoriously hard to train (loss balancing is fragile); spectral bias toward smooth solutions (struggles with high-frequency / shock problems); scaling to high dimensions remains challenging.

Example. Inverse problem — given sparse noisy temperature sensors, a PINN can simultaneously reconstruct the full 3D temperature field and identify the unknown thermal diffusivity $\alpha$, all while respecting the heat equation $\frac{\partial T}{\partial t} = \alpha \nabla^{2} T$.

5) Neural Operators — DeepONet / FNO and Beyond (2020–2025)

Principle. Rather than learning a single function, neural operators learn mappings between function spaces $\mathcal{G}: f \mapsto u$. DeepONet [12] factorizes this as:

$$\mathcal{G}_{\theta}(f)(\mathbf{y}) = \sum_{k=1}^{p} b_{k}\left(f(\mathbf{x}_{1}),\ \ldots,\ f(\mathbf{x}_{m})\right) \cdot t_{k}(\mathbf{y})$$

where $b_{k}$ is the branch network and $t_{k}$ is the trunk network. FNO [13] parameterizes the integral kernel in Fourier space:

$$(\mathcal{K} v)(\mathbf{x}) = \mathcal{F}^{-1} \left( R_{\theta} \cdot \mathcal{F}(v) \right)(\mathbf{x})$$
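A minimal 1D sketch of this spectral convolution, assuming a single channel, hand-set (untrained) complex weights $R_{\theta}$, and a plain $O(n^2)$ DFT in place of an FFT. Only the lowest `len(weights)` Fourier modes are kept and scaled; each mirrored mode receives the conjugate weight so the output stays real, and all higher modes are truncated. A full FNO layer would add multiple channels, a pointwise linear term and a nonlinearity.

```python
import cmath
import math
from typing import List


def dft(x: List[complex]) -> List[complex]:
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)) for k in range(n)]


def idft(spectrum: List[complex]) -> List[complex]:
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]


def spectral_conv_1d(v: List[float], weights: List[complex]) -> List[float]:
    """(K v)(x) = F^{-1}(R_theta . F(v)), keeping only the lowest len(weights) modes."""
    n, modes = len(v), len(weights)
    if modes > n // 2:
        raise ValueError("keep at most n // 2 modes so a mode and its mirror never coincide")
    spectrum = dft([complex(x) for x in v])
    out = [0j] * n                      # modes >= len(weights) stay zero (truncated)
    for k in range(modes):
        out[k] = weights[k] * spectrum[k]
        if k > 0:                       # mirrored mode n - k gets the conjugate weight
            out[n - k] = weights[k].conjugate() * spectrum[n - k]
    return [z.real for z in idft(out)]


n = 32
xs = [j / n for j in range(n)]
low = [math.sin(2 * math.pi * x) for x in xs]          # Fourier mode 1
high = [math.sin(2 * math.pi * 10 * x) for x in xs]    # Fourier mode 10
weights = [1 + 0j, 1j, 1 + 0j, 1 + 0j]                  # R_theta for modes 0..3 (hand-set)
out_low = spectral_conv_1d(low, weights)    # mode 1 scaled by i: sin(2 pi x) becomes cos(2 pi x)
out_high = spectral_conv_1d(high, weights)  # mode 10 is truncated, so the output is ~0
```

The complex weight on mode 1 acts as a learned phase shift, while the truncation explains both FNO's resolution independence and its tendency to smooth out high-frequency content.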

Adding a physics residual yields PI-DeepONet and PINO (Physics-Informed Neural Operator). Notable recent advances include Mamba Neural Operator (NeurIPS 2024) [14] applying state-space models to PDEs at lower cost than transformers, PI-GANO (CMAME 2025) [15] achieving less than 3% error across diverse geometries, PI-Latent-NO (2025) [16] reducing training time by 15–67% via coupled latent DeepONets, and Temporal Neural Operator (TNO, Sep 2025) [17] enabling reliable temporal extrapolation.
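For the DeepONet factorization $\mathcal{G}_{\theta}(f)(\mathbf{y}) = \sum_{k} b_{k} \cdot t_{k}(\mathbf{y})$, a forward pass can be sketched as follows. Assumptions: one hidden layer each for the branch and trunk networks, random untrained weights (training on pairs of input functions and solutions would fit them), a scalar query location $y$, and hypothetical sizes (20 sensors, $p = 16$).

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def dense(x: Vector, weights: List[Vector], act: Callable[[float], float] = math.tanh) -> Vector:
    """Fully connected layer without bias: act(W @ x)."""
    return [act(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def init(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class DeepONet:
    """G(f)(y) = sum_k b_k(f(x_1), ..., f(x_m)) * t_k(y), with untrained random weights."""

    def __init__(self, m_sensors: int, p: int = 16, hidden: int = 32, seed: int = 0):
        rng = random.Random(seed)
        self.branch_w = [init(hidden, m_sensors, rng), init(p, hidden, rng)]
        self.trunk_w = [init(hidden, 2, rng), init(p, hidden, rng)]  # input (y, 1): 1 acts as a bias

    def branch(self, f_sensors: Vector) -> Vector:
        """b_1..b_p from the input function sampled at fixed sensor points."""
        h = dense(f_sensors, self.branch_w[0])
        return dense(h, self.branch_w[1], act=lambda z: z)

    def trunk(self, y: float) -> Vector:
        """t_1..t_p: learned basis functions evaluated at the query location y."""
        h = dense([y, 1.0], self.trunk_w[0])
        return dense(h, self.trunk_w[1])

    def __call__(self, f_sensors: Vector, y: float) -> float:
        return sum(b * t for b, t in zip(self.branch(f_sensors), self.trunk(y)))


m = 20
sensors = [j / (m - 1) for j in range(m)]
f_values = [math.sin(math.pi * x) for x in sensors]   # input function f(x) = sin(pi x)
net = DeepONet(m_sensors=m)
coeffs = net.branch(f_values)                         # computed once per input function
outputs = [sum(b * t for b, t in zip(coeffs, net.trunk(y))) for y in (0.0, 0.25, 0.5)]
```

Because the branch coefficients depend only on $f$, they can be computed once and reused for any number of query points, which is part of why inference is cheap.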

Strengths. Once trained, inference is orders of magnitude faster than numerical solvers; generalizes across entire families of PDEs (different initial/boundary conditions, parameters); handles operator learning on sparse inputs.

Weaknesses. Vanilla versions require large training datasets from expensive simulations; FNO struggles on complex geometries; temporal extrapolation remains a research frontier.

Example. After training on 1,000 Darcy-flow solutions, an FNO can predict the flow field for a new permeability field in well under a second, instead of running a full numerical (e.g., FEM) solve for each new field.

6) Hybrid Modeling

Principle. Combine a physics-based model $\hat{y}_{phys}$ with an ML model $\hat{y}_{ML}$ [3]. The most common patterns are:

Residual Modeling:

$$\hat{y} = \hat{y}_{phys}(\mathbf{x}) + f_{\theta}(\mathbf{x})$$

The physics model provides the backbone (validated, explainable); the ML model learns the residual (unmodeled effects, manufacturing variations).
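A minimal sketch of residual modeling on a hypothetical device. Assumptions: the physics backbone is Ohm's law $V = I R_0$ with $R_0 = 2\,\Omega$, the synthetic "measurements" contain an extra nonlinear term the backbone misses, and a two-term least-squares model on the features $I^2, I^3$ stands in for $f_{\theta}$ (any regressor could take its place). The essential step is that the ML part is trained only on the residual $y - \hat{y}_{phys}$.

```python
from typing import List, Tuple


def physics_model(current: float, r0: float = 2.0) -> float:
    """Backbone: Ohm's law V = I * R0 (hypothetical device, R0 in ohms)."""
    return current * r0


def residual_features(current: float) -> Tuple[float, float]:
    """Hypothetical basis for the unmodeled effect (e.g. a self-heating nonlinearity)."""
    return current ** 2, current ** 3


def fit_residual(currents: List[float], voltages: List[float]) -> Tuple[float, float]:
    """Least-squares fit of r = V - V_phys on the two residual features (2x2 normal equations)."""
    feats = [residual_features(i) for i in currents]
    r = [v - physics_model(i) for i, v in zip(currents, voltages)]
    a11 = sum(f[0] * f[0] for f in feats)
    a12 = sum(f[0] * f[1] for f in feats)
    a22 = sum(f[1] * f[1] for f in feats)
    b1 = sum(f[0] * ri for f, ri in zip(feats, r))
    b2 = sum(f[1] * ri for f, ri in zip(feats, r))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


def hybrid_predict(current: float, coeffs: Tuple[float, float]) -> float:
    """y_hat = y_phys(x) + f_theta(x)."""
    f = residual_features(current)
    return physics_model(current) + coeffs[0] * f[0] + coeffs[1] * f[1]


# Synthetic data from a hypothetical true device: V = 2.0*I + 0.3*I^2 + 0.05*I^3, I in [0.25, 5] A.
currents = [0.25 * k for k in range(1, 21)]
voltages = [2.0 * i + 0.3 * i ** 2 + 0.05 * i ** 3 for i in currents]
coeffs = fit_residual(currents, voltages)
print(coeffs)
print(hybrid_predict(6.0, coeffs))  # outside the training range
```

Here the prediction at 6 A stays accurate only because the residual basis happens to match the hypothetical truth; in general the physics backbone carries the extrapolation and the residual term should be kept small.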

Parameter Estimation:

$$\hat{y} = \hat{y}_{phys}(\mathbf{x};\ \boldsymbol{\psi}_{\theta}(\mathbf{x}))$$

The physics structure is fixed; ML predicts the physics parameters conditional on the operating point.
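A minimal sketch of the parameter-estimation pattern, assuming a hypothetical resistor whose resistance drifts linearly with temperature. The physics structure $V = I R$ is fixed, and a tiny linear model `psi` (a stand-in for a neural network $\boldsymbol{\psi}_{\theta}$) predicts $R$ from the operating temperature. It is trained by gradient descent on the voltage error, so gradients flow through the physics equation.

```python
from itertools import product
from typing import List, Sequence, Tuple


def psi(theta: Sequence[float], temp_c: float) -> float:
    """Hypothetical ML parameter model: resistance [ohm] as a function of temperature [C]."""
    s = (temp_c - 25.0) / 50.0          # normalized operating point
    return theta[0] + theta[1] * s


def physics(current: float, resistance: float) -> float:
    """Fixed physics structure: Ohm's law V = I * R."""
    return current * resistance


def train(samples: List[Tuple[float, float, float]], lr: float = 0.2,
          steps: int = 500) -> List[float]:
    """Gradient descent on mean (V_pred - V)^2, differentiating through the physics model."""
    theta = [1.0, 0.0]
    n = len(samples)
    for _ in range(steps):
        g0 = g1 = 0.0
        for current, temp_c, v in samples:
            err = physics(current, psi(theta, temp_c)) - v
            s = (temp_c - 25.0) / 50.0
            g0 += 2.0 * err * current / n       # dV/dtheta0 = I
            g1 += 2.0 * err * current * s / n   # dV/dtheta1 = I * s
        theta = [theta[0] - lr * g0, theta[1] - lr * g1]
    return theta


# Synthetic data from a hypothetical true device: R(T) = 2.0 + 0.4 * (T - 25) / 50.
samples = [(i, t, i * (2.0 + 0.4 * (t - 25.0) / 50.0))
           for i, t in product([0.5, 1.0, 1.5, 2.0], [0.0, 25.0, 50.0, 75.0, 100.0])]
theta = train(samples)
print(theta, psi(theta, 60.0))
```

The learned quantity is a physically meaningful resistance rather than an opaque output, which is what keeps this pattern easy to validate.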

Strengths. Retains the physics model’s reliability and interpretability; ML component is small and easy to validate; works well when partial physics knowledge is available.

Weaknesses. Requires an existing physics model; performance is capped by the quality of the physics model’s structural assumptions.

Example. Lithium-ion battery State-of-Charge estimation — an electrochemical model (P2D) computes a baseline; a neural residual learns temperature- and aging-dependent corrections that the P2D model does not capture.

7) Constrained Optimization

Principle. Enforce hard constraints on the model output so predictions physically cannot violate them. For a prediction $\hat{y}$ that must satisfy $g(\hat{y}) \leq 0$ and $h(\hat{y}) = 0$:

Projection approach:

$$\hat{y}_{final} = \arg\min_{z}\ (z - \hat{y})^{\top} (z - \hat{y}) \quad \text{s.t.} \quad g(z) \leq 0,\ h(z) = 0$$

Augmented Lagrangian approach:

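$$\mathcal{L}(\theta,\ \boldsymbol{\lambda}) = \mathcal{L}_{data}(\theta) + \boldsymbol{\lambda}^{\top} h(\hat{y}_{\theta}) + \frac{\rho}{2}\ h(\hat{y}_{\theta})^{\top} h(\hat{y}_{\theta})$$

This form applies to the equality constraints $h$; inequality constraints $g$ are usually converted to equalities with slack variables or handled with $\max(0, \cdot)$ terms. Between training phases the multipliers are updated as $\boldsymbol{\lambda} \leftarrow \boldsymbol{\lambda} + \rho\, h(\hat{y}_{\theta})$.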
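A minimal sketch of the augmented Lagrangian (method of multipliers) for an equality constraint $h(\hat{y}) = 0$. Simplifying assumptions: the "model" is just its output vector $z$, so $\mathcal{L}_{data}(z) = \lVert z - \hat{y} \rVert^2$ for a fixed raw prediction $\hat{y}$, and the constraint is $h(z) = \sum_i z_i - D$ for a hypothetical demand $D$. In actual training the same gradients would flow into the network weights $\theta$.

```python
from typing import List, Tuple


def augmented_lagrangian_fit(y_hat: List[float], demand: float, rho: float = 10.0,
                             outer: int = 20, inner: int = 500,
                             lr: float = 0.04) -> Tuple[List[float], float]:
    """Method of multipliers for min ||z - y_hat||^2 s.t. h(z) = sum(z) - demand = 0.

    Inner loop: gradient descent on ||z - y_hat||^2 + lam*h + (rho/2)*h^2.
    Outer loop: multiplier update lam <- lam + rho*h(z).
    lr must stay below 2 / (2 + len(z) * rho) for the inner loop to be stable.
    """
    z, lam = list(y_hat), 0.0
    for _ in range(outer):
        for _ in range(inner):
            h = sum(z) - demand
            z = [zi - lr * (2.0 * (zi - yi) + lam + rho * h) for zi, yi in zip(z, y_hat)]
        lam += rho * (sum(z) - demand)
    return z, lam


z, lam = augmented_lagrangian_fit([3.0, 1.0], demand=2.0)
print(z, lam)
```

For this toy problem the exact solution is $z = (2, 0)$ with multiplier $\lambda = 2$, and each outer update shrinks the multiplier error by a factor $1 + \rho$. Note that the constraint is satisfied only in the limit; a projection layer instead enforces it at every forward pass.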

Modern implementations use differentiable convex-optimization layers (cvxpylayers) [18], allowing constraint-respecting projections to be embedded directly in neural networks with gradients flowing through them.

Strengths. Guarantees physical feasibility (unlike soft penalties in PINNs); essential for safety-critical applications; simplifies downstream validation.

Weaknesses. Limited expressivity of constraints (must be convex or tractable); projection solve adds inference cost; difficult for equality constraints involving nonlinear dynamics.

Example. In a power-grid dispatch ML model, enforcing $P_{min} \leq P_{gen} \leq P_{max}$ and $\sum P_{gen} = P_{load}$ via a projection layer guarantees feasible dispatch no matter how the upstream network is trained.
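A minimal sketch of such a projection layer's forward pass for exactly these constraints (per-generator bounds plus one balance equation), written without cvxpylayers. The KKT conditions reduce the Euclidean projection to finding a single scalar multiplier $\nu$ with $z_i = \mathrm{clip}(\hat{y}_i - \nu,\ P_{min,i},\ P_{max,i})$, and bisection locates it. The generator data are hypothetical; a trainable layer would also need gradients through the projection, which is what differentiable optimization layers such as cvxpylayers [18] provide.

```python
from typing import List


def project_dispatch(y_hat: List[float], p_min: List[float], p_max: List[float],
                     load: float, tol: float = 1e-10) -> List[float]:
    """Euclidean projection of y_hat onto {z : p_min <= z <= p_max, sum(z) == load}."""
    if not sum(p_min) <= load <= sum(p_max):
        raise ValueError("infeasible: load outside [sum(p_min), sum(p_max)]")

    def dispatch(nu: float) -> List[float]:
        return [min(max(y - nu, lo), hi) for y, lo, hi in zip(y_hat, p_min, p_max)]

    # sum(dispatch(nu)) is nonincreasing in nu; these brackets put every unit at p_max / p_min.
    nu_lo = min(y - hi for y, hi in zip(y_hat, p_max))
    nu_hi = max(y - lo for y, lo in zip(y_hat, p_min))
    while nu_hi - nu_lo > tol:
        nu = 0.5 * (nu_lo + nu_hi)
        if sum(dispatch(nu)) > load:
            nu_lo = nu
        else:
            nu_hi = nu
    return dispatch(0.5 * (nu_lo + nu_hi))


# Hypothetical raw network output that over-produces and exceeds unit 2's limit.
z = project_dispatch([50.0, 80.0, 30.0], p_min=[10.0, 20.0, 0.0],
                     p_max=[60.0, 70.0, 40.0], load=150.0)
print(z)
```

Whatever the upstream network outputs, the returned dispatch satisfies the bounds and balances the load to within the bisection tolerance.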

8) Method Comparison Table

| Method | Physics Injection | Data Requirement | Extrapolation | Reliability Guarantee | Maturity |
|---|---|---|---|---|---|
| Feature Engineering | Data | Low | Fair | None | Very High |
| Equivariant / HNN | Architecture | Very Low | Excellent | Conservation laws | High |
| PINN (2019) | Loss | Very Low | Good | Soft | High |
| PIKAN (2024) | Architecture + Loss | Low | Good | Soft | Medium |
| DeepONet / FNO (2020–21) | Data (+ loss) | High | Medium | Soft | High |
| PI-Latent-NO (2025) | Architecture + Loss | Medium | Good | Soft | Low |
| Mamba Neural Operator (2024) | Architecture | Medium | Good | Soft | Low |
| Hybrid (Residual) | External | Medium | Good | Inherited from physics | Very High |
| Constrained Optimization | Output | Flexible | Excellent | Hard | Medium |

9) Practical Considerations

These approaches are particularly effective in the following situations:

Data Scarcity: When experiments are expensive and training data is limited. PINNs and hybrid models can often operate with 10-100x less data than pure ML, because the physics regularizes the solution space. In semiconductor device characterization, for instance, generating 10,000 labeled TCAD samples may take weeks, whereas a PINN that leverages the device's drift-diffusion equations may work with around 100 samples.

Reliability-Critical Applications: When physically implausible outputs are unacceptable, as in semiconductor device characterization or process optimization. A purely data-driven model might predict a negative threshold voltage for an enhancement-mode n-channel MOSFET or a gate leakage below the thermal noise floor. Output constraints (bounds or projection layers) rule out such predictions by construction, while equivariant architectures and hybrid residual models make them much less likely.

Extrapolation: When predictions must be made outside the training data range. Pure ML models often fail catastrophically in the extrapolation regime. Methods that embed conservation laws or governing equations directly into the architecture (HNN, equivariant networks) or into output constraints tend to remain valid in unexplored regions, because the embedded laws hold regardless of the training data distribution, even where the learned components themselves are uncertain.

Incomplete Physics Models: When first-principles models exist but are incomplete (unmodeled dynamics, calibration drift, manufacturing variability). Hybrid residual modeling is ideal here: the physics handles the bulk of the behavior, and the ML model learns the unknown delta.

10) Practical Recommendations

  • Start with the cheapest intervention. Physics-based feature engineering almost always adds value at little cost, so try it before moving to PINNs or neural operators.
  • Match the method to the problem. A known PDE with small data calls for a PINN; operator learning (solving many similar PDEs) calls for DeepONet/FNO; known conservation laws call for equivariant or Hamiltonian architectures; reliability-critical outputs call for constrained-optimization layers.
  • Stay current with 2024–2025 advances. KANs and PIKANs offer real interpretability advantages, Mamba-based neural operators are reported to scale more efficiently than transformers, and latent neural operators cut training time substantially.
  • Combine approaches. The strongest real systems usually combine categories, e.g., a hybrid model with physics-based features feeding a PINN whose output is projected onto a feasibility set.

References

[1] Karniadakis, G.E., Kevrekidis, I.G., Lu, L., Perdikaris, P., Wang, S., Yang, L. (2021). Physics-informed machine learning. Nature Reviews Physics, 3, 422–440.
[2] Hao, Z., Liu, S., Zhang, Y., Ying, C., Feng, Y., Su, H., Zhu, J. (2023). Physics-Informed Machine Learning: A Survey on Problems, Methods and Applications. arXiv:2211.08064.
[3] Willard, J., Jia, X., Xu, S., Steinbach, M., Kumar, V. (2023). Integrating Scientific Knowledge with Machine Learning for Engineering and Environmental Systems. ACM Computing Surveys, 55(4).
[4] von Rueden, L., Mayer, S., Beckh, K., et al. (2023). Informed Machine Learning — A Taxonomy and Survey of Integrating Prior Knowledge into Learning Systems. IEEE TKDE, 35(1), 614–633.
[5] Greydanus, S., Dzamba, M., Yosinski, J. (2019). Hamiltonian Neural Networks. NeurIPS 2019.
[6] Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljačić, M., Hou, T.Y., Tegmark, M. (2024). KAN: Kolmogorov-Arnold Networks. arXiv:2404.19756.
[7] Patra, S., Panda, S., Parida, B.K., Arya, M., Jacobs, K., Bondar, D.I., Sen, A. (2024). Physics Informed Kolmogorov-Arnold Neural Networks for Dynamical Analysis via Efficient-KAN and WAV-KAN. arXiv:2407.18373.
[8] Wang, Y., Sun, J., Bai, J., Anitescu, C., Eshaghi, M.S., Zhuang, X., Rabczuk, T., Liu, Y. (2024). Kolmogorov-Arnold-Informed Neural Network (KINN): A Physics-Informed Deep Learning Framework for Solving PDEs Based on KAN. arXiv:2406.11045.
[9] Jacob, B., Howard, A.A., Stinis, P. (2024). SPIKANs: Separable Physics-Informed Kolmogorov-Arnold Networks. arXiv:2411.06286.
[10] Liu, Z., Ma, P., Wang, Y., Matusik, W., Tegmark, M. (2025). KAN 2.0: Kolmogorov-Arnold Networks Meet Science. Physical Review X, 15, 041051.
[11] Raissi, M., Perdikaris, P., Karniadakis, G.E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686–707.
[12] Lu, L., Jin, P., Pang, G., Zhang, Z., Karniadakis, G.E. (2021). Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3, 218–229.
[13] Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., Anandkumar, A. (2021). Fourier Neural Operator for Parametric Partial Differential Equations. ICLR 2021.
[14] Zheng, C., Wang, Z., Ji, S. (2024). Mamba Neural Operator: Who Wins? Transformers vs. State-Space Models for PDEs. NeurIPS 2024.
[15] Zhong, Y., Meidani, H. (2025). Physics-Informed Geometry-Aware Neural Operator (PI-GANO). Computer Methods in Applied Mechanics and Engineering, 436.
[16] Kontolati, K., Goswami, S., Shields, M.D., Karniadakis, G.E. (2025). Physics-Informed Latent Neural Operator for Real-Time Predictions of Time-Dependent Parametric PDEs. Computer Methods in Applied Mechanics and Engineering.
[17] Michałowska, K., Goswami, S., Karniadakis, G.E., Riemer-Sørensen, S. (2025). Temporal Neural Operator for Modeling Time-Dependent Physical Phenomena. Scientific Reports, 15.
[18] Agrawal, A., Amos, B., Barratt, S., Boyd, S., Diamond, S., Kolter, J.Z. (2019). Differentiable Convex Optimization Layers. NeurIPS 2019.
