Impact of Target Variable Ranges on Model Performance

1. Analysis of Small Positive Ranges (0 to 0.2)
While a target range of 0 to 0.2 is mathematically valid, it presents several practical challenges in model training and optimization.
1.1 Training Speed and Convergence Issues
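- Small Loss: Since predictions and targets both lie within a narrow band, the discrepancy between them is small, and so is the resulting output of the Loss Function.
- Small Gradient: With a loss such as MSE, the gradient with respect to the weights ($W$) is proportional to the prediction error, so small errors yield significantly diminished gradients. This resembles the vanishing-gradient problem, although that term usually refers to gradients shrinking across layers.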
- Slow Weight Update: Weights are updated according to the formula $W = W - (\eta \times \text{Gradient})$. If the gradient is too small, the weight updates become negligible.
- Slow Convergence: The speed at which weights approach the minimum (optimal point) slows down drastically, leading to excessively long training times or the model failing to reach an optimized state.
- Solution: Apply Min-Max Scaling to expand the range to [0, 1] during training, then perform an Inverse Transform for the final prediction (with a minimum of 0 and a maximum of 0.2, this means multiplying by 0.2).
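The scaling step and its effect on gradient size can be sketched as follows. This is a minimal illustration rather than a reference implementation: the helper names (`minmax_fit`, `minmax_transform`, `minmax_inverse`, `mse_grad_wrt_pred`) and the sample targets are assumptions made for this example, and the "model" is simply a constant initial prediction of zero.

```python
import numpy as np

def minmax_fit(y):
    """Return the (min, max) of the training targets."""
    y = np.asarray(y, dtype=np.float64)
    return float(y.min()), float(y.max())

def minmax_transform(y, y_min, y_max):
    """Map targets from [y_min, y_max] to [0, 1]."""
    return (np.asarray(y, dtype=np.float64) - y_min) / (y_max - y_min)

def minmax_inverse(y_scaled, y_min, y_max):
    """Map scaled predictions back to the original units."""
    return np.asarray(y_scaled, dtype=np.float64) * (y_max - y_min) + y_min

def mse_grad_wrt_pred(y_true, y_pred):
    """Gradient of mean squared error with respect to each prediction."""
    return 2.0 * (y_pred - y_true) / y_true.size

# Illustrative targets in the 0 to 0.2 range (made-up values).
y_train = np.array([0.0, 0.05, 0.1, 0.2])
y_min, y_max = minmax_fit(y_train)
y_scaled = minmax_transform(y_train, y_min, y_max)

# Compare gradient magnitudes for a model that initially predicts 0 everywhere.
init_pred = np.zeros_like(y_train)
grad_raw = mse_grad_wrt_pred(y_train, init_pred)
grad_scaled = mse_grad_wrt_pred(y_scaled, init_pred)
grad_ratio = np.linalg.norm(grad_scaled) / np.linalg.norm(grad_raw)

# Predictions made in the scaled space are mapped back before reporting.
y_restored = minmax_inverse(y_scaled, y_min, y_max)

print("scaled targets:", y_scaled)
print("gradient ratio (scaled / raw):", grad_ratio)
```

With these values the gradient in the scaled space is about 5 times larger than in the raw space, which matches the factor 1 / 0.2 introduced by the scaling.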
1.2 Loss Function Sensitivity
- MSE (Mean Squared Error): Since errors are squared, an error of 0.1 becomes 0.01. When early stopping uses an absolute improvement threshold, such small loss values and loss changes might trigger it prematurely, as the model “perceives” it has already converged.
- MAE (Mean Absolute Error): In small-scale data, MAE often provides a more intuitive representation of the physical error than MSE.
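The difference between the two losses on small-scale targets can be checked numerically. The sketch below is illustrative only: the target values, the constant error sizes, and the early-stopping threshold `MIN_DELTA` are all assumptions chosen for the example, not values from any particular framework or dataset.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Illustrative targets; every prediction is off by the same constant error.
y_true = np.full(4, 0.1)

# An error of 0.1 is reported as about 0.01 by MSE but as about 0.1 by MAE.
mse_010 = mse(y_true, y_true + 0.10)
mae_010 = mae(y_true, y_true + 0.10)

# Hypothetical absolute early-stopping threshold (assumption for this sketch).
MIN_DELTA = 1e-3

# Halving the error from 0.02 to 0.01 changes MSE by only about 0.0003.
improvement = mse(y_true, y_true + 0.02) - mse(y_true, y_true + 0.01)
counts_as_improvement = improvement > MIN_DELTA

print("MSE:", mse_010, "MAE:", mae_010)
print("MSE improvement:", improvement, "counted:", counts_as_improvement)
```

Under these assumptions, halving the error would not register as an improvement against the threshold, which is the premature-stopping risk described above.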
1.3 Compatibility with Activation Functions
- Sigmoid: The output range (0, 1) covers 0.2, but targets between 0 and 0.2 correspond to pre-activations below about -1.39, so the model operates in a narrow, nearly saturated band of the function and may fail to use its full non-linear range.
- ReLU/Linear: In regression, a Linear output is standard. However, logic to prevent negative outputs may be necessary if the target is strictly positive.
2. Analysis of Negative Ranges (-1 to 0)
Targeting a range of -1 to 0 introduces unique constraints, particularly for regression models.
2.1 Constraints on Loss Functions
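- Limited applicability of MAPE: MAPE divides each error by the target, so it is undefined when a target equals zero and becomes unstably large for targets near zero. Negative targets require taking the absolute value of the denominator, and the result is still easily distorted.
- Limited applicability of log-based loss: Losses built on $\log(y)$ are undefined for non-positive targets. RMSLE, which uses $\log(1 + y)$, is undefined at $y = -1$ and is generally intended for non-negative targets.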
- Solution: Utilize MSE or MAE as the primary loss functions.
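The failure modes listed above are easy to reproduce. The following sketch uses made-up targets in the -1 to 0 range and assumes the usual textbook definitions of each metric (MAPE as the mean of |error / target|, log-based terms as log(y) or log(1 + y)); it is meant as a demonstration, not as a library implementation.

```python
import numpy as np

# Illustrative targets in the -1 to 0 range and some nearby predictions.
y_true = np.array([-1.0, -0.5, 0.0])
y_pred = np.array([-0.9, -0.4, -0.1])

# Silence NumPy's floating-point warnings so the raw results can be inspected.
with np.errstate(divide="ignore", invalid="ignore"):
    # Per-sample MAPE terms: the target of 0 causes a division by zero.
    mape_terms = np.abs((y_true - y_pred) / y_true)
    # Plain log is NaN for negative inputs and -inf at zero.
    log_terms = np.log(y_true)
    # log(1 + y), as used by RMSLE, is -inf at y = -1.
    log1p_terms = np.log1p(y_true)

# MSE and MAE remain well defined on the same data.
mse = float(np.mean((y_true - y_pred) ** 2))
mae = float(np.mean(np.abs(y_true - y_pred)))

print("MAPE terms:", mape_terms)
print("log terms:", log_terms)
print("log1p terms:", log1p_terms)
print("MSE:", mse, "MAE:", mae)
```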
2.2 Activation Function Mismatch
- Sigmoid/Softmax: Cannot be used directly in the output layer, as their outputs lie between 0 and 1 and can never be negative.
- ReLU: Cannot be used in the output layer as it zeros out all negative values.
- Solution: Use a Linear output layer or Tanh (range: -1 to 1) if boundary constraints are required.
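The output ranges behind this mismatch can be verified on a few sample pre-activation values. This is a small sketch with arbitrary inputs chosen for illustration; the activation functions are written out directly in NumPy rather than taken from a deep learning framework.

```python
import numpy as np

# Arbitrary pre-activation values for the output neuron (illustrative only).
z = np.array([-2.0, -0.5, 0.0, 1.5])

sigmoid_out = 1.0 / (1.0 + np.exp(-z))  # always strictly between 0 and 1
relu_out = np.maximum(z, 0.0)           # negative inputs are clipped to 0
tanh_out = np.tanh(z)                   # strictly between -1 and 1
linear_out = z                          # unbounded, passes negatives through

print("sigmoid:", sigmoid_out)
print("relu:   ", relu_out)
print("tanh:   ", tanh_out)
print("linear: ", linear_out)
```

Only the Tanh and Linear outputs contain negative values, so only they can represent targets between -1 and 0.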
2.3 Physical Interpretation and Challenges
In semiconductor FDC data, negative targets often represent deltas from a baseline or log-transformed values. These require careful handling to maintain physical meaning.
3. Practical Recommendations (Semiconductor VM/FDC Context)
For high-precision tasks such as thickness prediction or fault detection:
- Mandatory Scaling: Always use scaling (Min-Max or Standardization) internally. Learning the difference between 0.1 and 0.2 is numerically more stable for a model than learning the difference between 0.001 and 0.002.
- Precision Verification: Ensure the use of float32 or higher. Minimal value fluctuations can be lost due to precision limitations in lower-bit formats.
- Monitor Relative Error: Alongside absolute loss, track MAPE (for positive ranges) or relative percentage errors to ensure predictions meet actual process specifications.
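The precision point can be illustrated with NumPy's floating-point types. The base value 0.1 and the fluctuation of 1e-5 are assumptions picked for this sketch; real process data would need its own check against the smallest change that matters.

```python
import numpy as np

base = 0.1     # illustrative target value
delta = 1e-5   # illustrative small fluctuation

# In float16 the fluctuation is smaller than half the spacing between
# representable numbers near 0.1, so the sum rounds back to the base value.
lost_in_fp16 = bool(np.float16(base) + np.float16(delta) == np.float16(base))

# In float32 the same fluctuation is preserved.
kept_in_fp32 = bool(np.float32(base) + np.float32(delta) != np.float32(base))

# Machine epsilon: relative spacing between 1.0 and the next larger number.
eps16 = float(np.finfo(np.float16).eps)
eps32 = float(np.finfo(np.float32).eps)

print("lost in float16:", lost_in_fp16)
print("kept in float32:", kept_in_fp32)
print("eps float16:", eps16, "eps float32:", eps32)
```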
4. Summary: Does the Target Range Matter?
- Theoretically: No. (The computer treats them as mere numerical values.)
- Convergence Speed: Yes. (Small gradients may result in sluggish learning.)
- Evaluation Metrics: Yes. (Calculating relative errors becomes difficult or skewed.)
