
Mean, Variance, and Agreement Metrics for Regression in AI/ML

Regression metrics
├── Mean-based
│   ├── Scale-dependent   → MAE, MSE, RMSE, Huber
│   └── Scale-independent → MPE, MAPE, SMAPE, CV(RMSE)
├── Variance-based
│   └── Scale-independent → R², Adj. R²
└── Mean+Variance-based (Hybrid)
    └── Scale-independent → CCC, KGE

1. Executive Summary

In advanced engineering domains, such as semiconductor manufacturing, virtual metrology, and multi-sensor time-series analysis, the validation of predictive models requires more than a single performance score. We must distinguish between how well a model tracks a trend (Precision) and how close the prediction is to the physical truth (Accuracy).

This report establishes a rigorous taxonomy for evaluation metrics, categorized into Variance-based, Mean-based, and Agreement-based indices. This hierarchy provides a systematic framework for interpreting model performance relative to the ideal $y=x$ (1:1) line, with a specific focus on the risks associated with the Low Variance Effect.

2. Metric Hierarchy and Detailed Analysis

I. Variance Index (Trend & Correlation Focus)

These metrics assess the linearity and the strength of the relationship between observed and predicted values. They focus on whether the model captures the “shape” of the data, regardless of absolute magnitude.

1. Pearson Correlation Coefficient ($r$)

  • Formula: $$r = \frac{\sum (y_i - \mu_y)(\hat{y}_i - \mu_{\hat{y}})}{\sqrt{\sum (y_i - \mu_y)^2 \sum (\hat{y}_i - \mu_{\hat{y}})^2}}$$
    where $y_i$: Observed ground truth value,
       $\hat{y}_i$: Predicted value from the model,
       $\mu_y, \mu_{\hat{y}}$: Means of observed and predicted values respectively
  • Relation to 1:1 Line: $r$ measures how tightly data clusters around any straight line. A perfect $r=1$ does not guarantee the data is on the $1:1$ line; it could be on $y = 2x + 10$.
  • Limitation (Low Variance Effect): If the observed data have very low variance (e.g., a sensor outputting a nearly constant value), both the covariance in the numerator and the denominator shrink toward zero, so their ratio is dominated by noise. This makes $r$ extremely sensitive to tiny perturbations, often resulting in a low, negative, or (for an exactly constant series) undefined correlation despite the prediction being physically close to the truth.
  • Application: Initial feature selection and identifying sensors with similar behavioral patterns.
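A minimal sketch in plain Python illustrating both points above; the helper name `pearson_r` and all data values are hypothetical. A prediction lying on $\hat{y} = 2y + 10$ still scores $r = 1$, while a nearly constant signal whose errors never exceed 0.01 comes out negatively correlated.

```python
import math


def pearson_r(y, y_hat):
    """Pearson correlation between observed y and predicted y_hat."""
    n = len(y)
    mu_y = sum(y) / n
    mu_p = sum(y_hat) / n
    cov = sum((a - mu_y) * (b - mu_p) for a, b in zip(y, y_hat))
    ss_y = sum((a - mu_y) ** 2 for a in y)
    ss_p = sum((b - mu_p) ** 2 for b in y_hat)
    denom = math.sqrt(ss_y * ss_p)
    if denom == 0.0:
        return float("nan")  # undefined when either series is constant
    return cov / denom


# Perfect correlation, yet far from the 1:1 line: y_hat = 2*y + 10
y = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson_r(y, [2 * v + 10 for v in y]))  # 1.0

# Nearly constant target with deterministic errors of at most 0.01
y_flat = [100.00, 100.01, 100.00, 100.01, 100.00]
errors = [0.01, -0.01, 0.00, -0.01, 0.01]
y_hat_flat = [a + e for a, e in zip(y_flat, errors)]
print(round(pearson_r(y_flat, y_hat_flat), 3))  # -0.667
```

The negative value arises because the errors are of the same order as the true fluctuations, not because the predictions are far from the truth.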

2. Coefficient of Determination ($R^2$)

  • Formula: $$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \mu_y)^2}$$
    where $SS_{res}$: Residual sum of squares (unexplained variance),
       $SS_{tot}$: Total sum of squares (total variance in data)
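  • Relation to 1:1 Line: Represents the proportion of variance in $y$ explained by the predictions. Because $SS_{res}$ is measured directly against the 1:1 line, this form of $R^2$ penalizes both scatter and systematic bias. It can still be misleading when confused with $r^2$ (the squared Pearson correlation, often reported for a fitted trend line), which ignores offset and slope errors entirely.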
  • Limitation (Low Variance Effect): $R^2$ is notoriously deceptive when target variance is low. Because the denominator ($SS_{tot}$) is small, even a tiny prediction error can result in a negative or near-zero $R^2$, suggesting a “bad” model even when the absolute error (RMSE) is within acceptable engineering tolerances.
  • Application: Standard benchmark for regression model explanatory power in manufacturing yield analysis.
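A minimal sketch of the Low Variance Effect on $R^2$, assuming two hypothetical targets (one wide-range, one nearly stable) that receive exactly the same residuals. RMSE is identical in both cases, while $R^2$ swings from almost 1 to strongly negative.

```python
import math


def r2_score(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot, with residuals measured against the 1:1 line."""
    mu_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mu_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rmse(y, y_hat):
    """Root mean squared vertical distance from the 1:1 line."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


errors = [0.1, -0.1, 0.1, -0.1]            # identical residuals in both cases
targets = {
    "wide": [10.0, 20.0, 30.0, 40.0],        # target spans a wide range
    "narrow": [10.0, 10.05, 10.10, 10.15],   # nearly stable process
}

for name, y in targets.items():
    y_hat = [a + e for a, e in zip(y, errors)]
    print(name, round(rmse(y, y_hat), 3), round(r2_score(y, y_hat), 3))
# wide:   RMSE 0.1, R^2 rounds to 1.0
# narrow: RMSE 0.1, R^2 = -2.2
```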

3. Explained Variance Score

  • Formula: $$ExpVar = 1 - \frac{Var(y - \hat{y})}{Var(y)}$$
    where $Var(y - \hat{y})$: Variance of the residuals,
       $Var(y)$: Variance of the ground truth
  • Relation to 1:1 Line: Similar to $R^2$, but it ignores the mean of the residuals, so a prediction offset from the 1:1 line by a constant bias still scores 1.0. It focuses purely on whether the fluctuations in the prediction match the fluctuations in the truth.
  • Limitation (Low Variance Effect): Like $R^2$, this metric collapses when $Var(y)$ is small. It fails to provide a meaningful score for stable processes where the goal is to maintain a constant setpoint.
  • Application: Signal processing where the relative change is more important than the absolute baseline.
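A small sketch contrasting Explained Variance with $R^2$ under a pure constant offset; the data are hypothetical. The same definitions underlie scikit-learn's `explained_variance_score` and `r2_score`, but the sketch uses plain Python so it runs without dependencies.

```python
def explained_variance(y, y_hat):
    """1 - Var(y - y_hat) / Var(y), using population variances."""
    n = len(y)
    res = [a - b for a, b in zip(y, y_hat)]
    mu_res = sum(res) / n
    mu_y = sum(y) / n
    var_res = sum((r - mu_res) ** 2 for r in res) / n
    var_y = sum((a - mu_y) ** 2 for a in y) / n
    return 1.0 - var_res / var_y


def r2_score(y, y_hat):
    """1 - SS_res / SS_tot; a constant bias inflates SS_res."""
    mu_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mu_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [v + 3.0 for v in y]  # perfect shape, constant offset of +3

print(explained_variance(y, y_hat))  # 1.0  (the offset is invisible)
print(r2_score(y, y_hat))            # -3.5 (the offset is penalized)
```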

II. Mean Index (Magnitude & Distance Focus)

These metrics measure the physical distance between the predicted vector and the ground truth. They are essential for understanding the actual “cost” of an error.

1. Mean Absolute Error (MAE)

  • Formula: $$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
    where $n$: Number of samples
  • Relation to 1:1 Line: The average vertical distance to the 1:1 line.
  • Limitation: Does not highlight large, infrequent errors; it treats all deviations linearly. It is unaffected by the Low Variance Effect, making it more reliable for stable processes.
  • Application: Situations where the error cost is strictly proportional to error magnitude.
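A minimal sketch showing why MAE is unaffected by the Low Variance Effect: it depends only on the residuals, not on the spread of the target. The wide-range and nearly constant targets below are hypothetical and receive identical residuals.

```python
def mae(y, y_hat):
    """Mean absolute vertical distance from the 1:1 line."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


errors = [0.1, -0.1, 0.1, -0.1]
wide = [10.0, 20.0, 30.0, 40.0]
narrow = [10.0, 10.05, 10.10, 10.15]

maes = []
for y in (wide, narrow):
    y_hat = [a + e for a, e in zip(y, errors)]
    maes.append(mae(y, y_hat))

print([round(m, 6) for m in maes])  # [0.1, 0.1]
```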

2. Mean Squared Error (MSE) / RMSE

  • Formula: $$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad RMSE = \sqrt{MSE}$$
  • Relation to 1:1 Line: The average of the squared distances to the 1:1 line. RMSE represents the “typical” distance in original units.
  • Limitation: Heavily influenced by outliers. While robust to low variance in the target, these metrics do not tell you if the model is capturing the trend of the data.
  • Application: Standard loss function for training; critical for thickness prediction where large deviations lead to wafer scrap.
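A small sketch of the outlier sensitivity described above, using hypothetical thickness-like values. Both predictions have the same MAE, but concentrating the error in a single large deviation roughly triples RMSE.

```python
import math


def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


y = [100.0] * 10
spread = [101.0] * 10              # every point off by 1.0
one_big = [100.0] * 9 + [110.0]    # nine perfect points, one off by 10.0

print(mae(y, spread), rmse(y, spread))    # 1.0 1.0
print(mae(y, one_big), rmse(y, one_big))  # 1.0 and about 3.162 (sqrt(10))
```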

3. Mean Percentage Error (MPE) / Mean Absolute Percentage Error (MAPE)

  • Formula: $$MPE = \frac{100\%}{n} \sum_{i=1}^{n} \frac{y_i - \hat{y}_i}{y_i}, \quad MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
  • Relation to 1:1 Line: These metrics evaluate the relative deviation from the 1:1 line. MPE measures the average percentage bias (whether the model consistently overestimates or underestimates; with the sign convention above, a positive MPE means the model underestimates on average), while MAPE represents the average magnitude of percentage error relative to the identity line.
  • Limitation: The most significant weakness is the dependence on a variable denominator ($y_i$): if a target value is zero or near zero, the metrics explode toward infinity, and errors on small targets are heavily amplified. MAPE is also asymmetric: for non-negative predictions, an underestimate can contribute at most 100%, while an overestimate is unbounded, so over-prediction is penalized more heavily. Conversely, if the data scale remains consistent across all points, the denominator acts effectively as a constant; MAPE then becomes a stable metric that scales linearly with absolute error, behaving like a rescaled MAE (approximately $100\% \times MAE / \mu_y$). Unlike CCC, these metrics do not distinguish between scale shifts and location shifts.
  • Application: Essential for communicating model performance to non-technical stakeholders in business terms; widely used in financial forecasting and yield management to understand the “percentage of accuracy” relative to the target thickness or price.
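A minimal sketch of MPE and MAPE as defined above, with hypothetical values chosen to show three behaviors: signed errors cancelling in MPE, the near-zero denominator problem, and MAPE's asymmetry between under- and over-prediction.

```python
def mpe(y, y_hat):
    """Signed mean percentage error in %; positive means under-prediction on average."""
    return 100.0 / len(y) * sum((a - b) / a for a, b in zip(y, y_hat))


def mape(y, y_hat):
    """Mean absolute percentage error in %."""
    return 100.0 / len(y) * sum(abs((a - b) / a) for a, b in zip(y, y_hat))


# 1) Errors of +5 and -5 cancel in MPE but not in MAPE
print(mpe([100.0, 100.0], [95.0, 105.0]), mape([100.0, 100.0], [95.0, 105.0]))  # 0.0 and about 5.0

# 2) Same absolute error (0.5) on a large and a near-zero target
print(mape([50.0, 0.01], [50.5, 0.51]))  # about 2500.5

# 3) Under-prediction is capped at 100%, over-prediction is not
print(mape([100.0], [0.0]), mape([100.0], [250.0]))  # 100.0 150.0
```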

4. CV(RMSE) (Coefficient of Variation of RMSE)

  • Formula: $$CV(RMSE) = \frac{RMSE}{\mu_y}$$
  • Relation to 1:1 Line: Normalizes the typical (RMSE) distance from the 1:1 line by the mean observed value, making the error unitless.
  • Limitation (Low Variance Effect): If the mean ($\mu_y$) is near zero, this metric explodes. However, it is generally more stable than $R^2$ for low-variance, non-zero datasets.
  • Application: Comparing model performance across different sensor types with varying scales.
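A minimal sketch of cross-scale comparison with CV(RMSE), assuming two hypothetical sensors (labelled pressure and temperature purely for illustration) whose RMSE values differ by a factor of 40 in raw units but represent the same relative error. The zero-mean guard reflects the limitation above.

```python
import math


def cv_rmse(y, y_hat):
    """RMSE divided by the mean of the observed values (unitless)."""
    n = len(y)
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n)
    mu_y = sum(y) / n
    if mu_y == 0.0:
        raise ZeroDivisionError("CV(RMSE) is undefined for a zero-mean target")
    return rmse / mu_y


# Hypothetical sensors on very different scales
pressure = [1000.0, 1010.0, 990.0, 1000.0]
pressure_hat = [p + 10.0 for p in pressure]        # RMSE = 10 units
temperature = [25.0, 25.5, 24.5, 25.0]
temperature_hat = [t + 0.25 for t in temperature]  # RMSE = 0.25 units

print(cv_rmse(pressure, pressure_hat), cv_rmse(temperature, temperature_hat))  # 0.01 0.01
```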

III. Agreement Index (Fidelity & Calibration Focus)

These evaluate Fidelity: the requirement that the model must follow the trend and match the absolute values simultaneously.

1. Lin’s Concordance Correlation Coefficient (CCC)

  • Formula: $$\rho_c = \frac{2 \rho \sigma_y \sigma_{\hat{y}}}{\sigma_y^2 + \sigma_{\hat{y}}^2 + (\mu_y - \mu_{\hat{y}})^2}$$
    where $\rho$: Pearson correlation coefficient,
       $\sigma_y, \sigma_{\hat{y}}$: Standard deviations of observed and predicted values
  • Relation to 1:1 Line: Directly measures how far the data deviates from the 45-degree line. It combines $r$ (precision) with a bias penalty (accuracy).
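  • Limitation (Low Variance Effect): Since $\rho$ (Pearson) is a component, CCC inherits its instability when the variance of the data is extremely low. In addition, as $\sigma_y$ and $\sigma_{\hat{y}}$ shrink, the squared mean difference $(\mu_y - \mu_{\hat{y}})^2$ dominates the denominator, so even a small constant bias can pull $\rho_c$ toward zero, potentially masking a model that is actually performing well in terms of absolute distance.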
  • Application: Validating new metrology sensors against gold-standard lab measurements.
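A minimal sketch of Lin's CCC using population moments; the data are hypothetical. Since $\rho \sigma_y \sigma_{\hat{y}}$ equals the covariance, the numerator is computed as twice the covariance. The examples show that CCC, unlike $r$, drops for shifted or rescaled predictions, and that a small bias on a low-variance target pulls it well below 1.

```python
def ccc(y, y_hat):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(y)
    mu_y = sum(y) / n
    mu_p = sum(y_hat) / n
    var_y = sum((a - mu_y) ** 2 for a in y) / n
    var_p = sum((b - mu_p) ** 2 for b in y_hat) / n
    cov = sum((a - mu_y) * (b - mu_p) for a, b in zip(y, y_hat)) / n
    # rho * sigma_y * sigma_yhat equals the covariance, so rho is not computed separately
    return 2.0 * cov / (var_y + var_p + (mu_y - mu_p) ** 2)


y = [1.0, 2.0, 3.0, 4.0, 5.0]
print(ccc(y, y))                                  # 1.0   (on the 1:1 line)
print(ccc(y, [v + 1.0 for v in y]))               # 0.8   (r = 1, shifted by +1)
print(round(ccc(y, [2 * v + 10 for v in y]), 3))  # 0.045 (r = 1, far from the line)

# Low-variance case: r = 1 and every error is 0.01, yet CCC is only about 0.32
y_flat = [100.00, 100.01, 100.00, 100.01, 100.00]
print(round(ccc(y_flat, [v + 0.01 for v in y_flat]), 2))  # 0.32
```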

2. Kling-Gupta Efficiency (KGE)

  • Formula: $$KGE = 1 – \sqrt{(r-1)^2 + (\alpha-1)^2 + (\beta-1)^2}$$
    where $r$: Pearson correlation,
       $\alpha = \sigma_{\hat{y}}/\sigma_y$ (Variability ratio),
       $\beta = \mu_{\hat{y}}/\mu_y$ (Bias ratio)
  • Relation to 1:1 Line: A holistic “Agreement” metric. Reaches 1.0 only if $r, \alpha, \beta$ are all 1.
  • Limitation (Low Variance Effect): Extremely sensitive to the variability ratio ($\alpha$). If the ground truth has nearly zero variance ($\sigma_y \approx 0$), $\alpha$ becomes undefined or unstable, causing the KGE to fail. Likewise, $\beta$ becomes unstable when $\mu_y$ is near zero.
  • Application: Complex industrial process control and high-fidelity time-series simulation.
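A minimal sketch of KGE as defined above, using population moments; the function name and data are hypothetical. Raising an error on zero spread or zero observed mean is one possible way (an assumption, not a standard convention) to surface the instabilities listed under Limitation instead of returning a meaningless number.

```python
import math


def kge(y, y_hat):
    """Kling-Gupta Efficiency using population moments."""
    n = len(y)
    mu_y = sum(y) / n
    mu_p = sum(y_hat) / n
    sd_y = math.sqrt(sum((a - mu_y) ** 2 for a in y) / n)
    sd_p = math.sqrt(sum((b - mu_p) ** 2 for b in y_hat) / n)
    if sd_y == 0.0 or sd_p == 0.0 or mu_y == 0.0:
        # r and alpha need nonzero spreads; beta needs a nonzero observed mean
        raise ValueError("KGE is undefined for this data")
    cov = sum((a - mu_y) * (b - mu_p) for a, b in zip(y, y_hat)) / n
    r = cov / (sd_y * sd_p)   # correlation
    alpha = sd_p / sd_y       # variability ratio
    beta = mu_p / mu_y        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


y = [1.0, 2.0, 3.0, 4.0, 5.0]
print(round(kge(y, y), 6))                     # 1.0
print(round(kge(y, [v + 1.0 for v in y]), 3))  # 0.667  (only beta is off: 4/3)
print(round(kge(y, [2 * v for v in y]), 3))    # -0.414 (alpha = beta = 2)
```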

3. Comparative Summary Table

| Category | Primary Focus | Best Use Case | Relation to $y=x$ | Low Variance Effect |
|---|---|---|---|---|
| Variance-based | Trend/Pattern | Feature selection | $r$ and ExpVar can score high if linear, even if biased. | High Risk: Scores collapse or become noisy even if the error is small. |
| Mean-based | Absolute Error | Model Training (Loss) | 0 only if exactly on the line (signed MPE excepted). | Robust: Remains stable and interpretable regardless of variance. |
| Agreement-based | Fidelity/Calibration | System Validation | 1 only if exactly on the line. | Moderate Risk: Sensitivity inherited from the correlation component. |

4. Professional Recommendation for Engineering Teams

When deploying AI for semiconductor or sensor infrastructures, never rely on Variance Indices alone. In high-precision manufacturing, sensors often operate within a very tight, stable range (Low Variance). In these cases, Pearson and $R^2$ can suggest the model is failing when in fact it may be predicting with sub-micron accuracy.

Standard Protocol:

  1. Use RMSE/MAE as the primary source of truth in low-variance environments.
  2. Use Agreement Indices (CCC/KGE) for system-wide validation only when the data range is sufficient.
  3. Always check the Low Variance Effect before interpreting a drop in $R^2$—it is often a mathematical artifact rather than a loss of predictive power.
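The protocol above can be sketched as a small report function. It relies on the identity $R^2 = 1 - (RMSE/\sigma_y)^2$ (with the population standard deviation $\sigma_y$), which follows directly from the $R^2$ formula: any model whose RMSE reaches $\sigma_y$ scores $R^2 \le 0$. The function name `evaluate`, the `tolerance` parameter (an engineering spec in the units of $y$), the flagging rule, and the data are illustrative assumptions, not an established standard.

```python
import math


def evaluate(y, y_hat, tolerance):
    """Report MAE/RMSE first, then R^2 with a low-variance warning flag.

    `tolerance` is a hypothetical engineering spec, in the units of y.
    """
    n = len(y)
    res = [a - b for a, b in zip(y, y_hat)]
    mae = sum(abs(e) for e in res) / n
    rmse = math.sqrt(sum(e * e for e in res) / n)
    mu_y = sum(y) / n
    sigma_y = math.sqrt(sum((a - mu_y) ** 2 for a in y) / n)
    # Same value as 1 - SS_res / SS_tot, written in terms of RMSE and sigma_y
    r2 = 1.0 - (rmse / sigma_y) ** 2 if sigma_y > 0 else float("nan")
    return {
        "mae": mae,
        "rmse": rmse,
        "within_tolerance": rmse <= tolerance,
        "r2": r2,
        # If sigma_y <= tolerance, an in-spec model (RMSE up to tolerance)
        # can legitimately score R^2 <= 0, so a low R^2 is not evidence of failure.
        "r2_low_variance_warning": sigma_y <= tolerance,
    }


report = evaluate([10.0, 10.05, 10.10, 10.15], [10.1, 9.95, 10.2, 10.05], tolerance=0.2)
for key, value in report.items():
    print(key, value)
```

In this hypothetical case the model is within tolerance (RMSE about 0.1) while $R^2$ is about -2.2, and the warning flag marks that negative score as a low-variance artifact.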