
Impact of Target Variable Ranges on Model Performance

1. Analysis of Small Positive Ranges (0 to 0.2)

While a target range of 0 to 0.2 is mathematically valid, it presents several practical challenges in model training and optimization.

1.1 Training Speed and Convergence Issues

  • Small Loss: Since the discrepancy between the predicted and actual values is minimal, the resulting output of the Loss Function is also very small.
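  • Small Gradient: With a loss such as MSE, the gradient with respect to the weights ($W$) is proportional to the prediction error, so errors confined to a 0.2-wide range yield correspondingly small gradients. This is a scale effect, distinct from the depth-related vanishing gradient problem.
  • Slow Weight Update: Weights are updated according to the formula $W \leftarrow W - (\eta \times \text{Gradient})$, where $\eta$ is the learning rate. If the gradient is too small, the weight updates become negligible.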
  • Slow Convergence: The speed at which weights approach the minimum (optimal point) slows down drastically, leading to excessively long training times or the model failing to reach an optimized state.
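  • Solution: Apply Min-Max Scaling to map the targets to [0, 1] during training, then apply the Inverse Transform, $y = y_{\text{scaled}} \times (y_{\max} - y_{\min}) + y_{\min}$, to the final prediction. When the targets span exactly 0 to 0.2, this reduces to multiplying by 0.2.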
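
The scaling step and its effect on gradient size can be sketched as follows. This is a minimal NumPy illustration, not a full training loop: the helper names (`minmax_fit`, `minmax_transform`, `minmax_inverse`, `mse_grad_wrt_pred`) and the sample targets are assumptions made for the example, and the gradient is taken with respect to the predictions rather than the weights.

```python
import numpy as np

def minmax_fit(y):
    """Learn the target minimum and maximum from the training set."""
    return float(np.min(y)), float(np.max(y))

def minmax_transform(y, y_min, y_max):
    """Map targets from [y_min, y_max] to [0, 1]."""
    return (np.asarray(y, dtype=float) - y_min) / (y_max - y_min)

def minmax_inverse(y_scaled, y_min, y_max):
    """Map scaled values back to the original target units."""
    return np.asarray(y_scaled, dtype=float) * (y_max - y_min) + y_min

def mse_grad_wrt_pred(pred, target):
    """Gradient of the mean squared error with respect to each prediction."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return 2.0 * (pred - target) / pred.size

# Hypothetical training targets in the 0 to 0.2 range.
y_train = np.array([0.0, 0.05, 0.12, 0.2])
y_min, y_max = minmax_fit(y_train)
y_scaled = minmax_transform(y_train, y_min, y_max)

# Compare gradients for a constant prediction at the midpoint of each range.
grad_raw = mse_grad_wrt_pred(np.full(4, 0.1), y_train)
grad_scaled = mse_grad_wrt_pred(np.full(4, 0.5), y_scaled)

# The scaled gradients are larger by 1 / (y_max - y_min), about 5 here.
ratio = np.abs(grad_scaled).max() / np.abs(grad_raw).max()

# Predictions made in the scaled space are converted back before reporting.
y_restored = minmax_inverse(y_scaled, y_min, y_max)
```

Fitting `y_min` and `y_max` on the training set only, and reusing them at inference time, keeps the inverse transform consistent with what the model saw during training.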

1.2 Loss Function Sensitivity

  • MSE (Mean Squared Error): Since errors are squared, an error of 0.1 (half of the target range) becomes 0.01. With loss values this small, an early-stopping rule based on an absolute improvement threshold may trigger prematurely, as the model “perceives” it has already converged.
  • MAE (Mean Absolute Error): In small-scale data, MAE often provides a more intuitive representation of the physical error than MSE because it stays in the same units as the target.
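
The difference between the two losses, and the early-stopping risk, can be seen in a small sketch. The `should_stop` helper and its `min_delta` threshold are hypothetical and only mimic the general shape of an early-stopping check; they are not taken from any particular framework.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error."""
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))

def mae(pred, target):
    """Mean absolute error, in the same units as the target."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(target))))

def should_stop(prev_loss, curr_loss, min_delta, relative=False):
    """Return True when the improvement falls below min_delta."""
    improvement = prev_loss - curr_loss
    if relative:
        improvement = improvement / prev_loss
    return improvement < min_delta

target = np.array([0.05, 0.10, 0.15])
pred_early = target + 0.10   # every prediction off by 0.1
pred_later = target + 0.05   # error halved after more training

loss_early = mse(pred_early, target)   # about 0.01
loss_later = mse(pred_later, target)   # about 0.0025
mae_early = mae(pred_early, target)    # about 0.1, directly readable

# The MSE fell by 75%, yet an absolute threshold of 0.01 reports "no progress".
stop_absolute = should_stop(loss_early, loss_later, min_delta=0.01)
stop_relative = should_stop(loss_early, loss_later, min_delta=0.01, relative=True)
```

With these numbers the absolute rule stops training while the relative rule does not, which is why a threshold tuned for larger targets may need rescaling.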

1.3 Compatibility with Activation Functions

  • Sigmoid: While the output range (0, 1) covers 0.2, the targets occupy only the lower fifth of that range, so the model may fail to utilize the non-linear characteristics of the function when values are concentrated in such a narrow band.
  • ReLU/Linear: In regression, a Linear output is standard. However, because a Linear output is unbounded, logic to prevent negative outputs may be necessary when the target cannot be negative.
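
Two common ways to keep a Linear output from going negative are sketched below. Both are assumptions about what the "logic" might look like: clipping is applied after prediction, while softplus would replace the output activation. The raw values are made up for illustration.

```python
import numpy as np

def clip_to_range(pred, low=0.0, high=0.2):
    """Post-process predictions so they stay inside the valid target range."""
    return np.clip(pred, low, high)

def softplus(z):
    """Smooth, strictly positive activation: log(1 + exp(z))."""
    return np.logaddexp(0.0, z)

# Hypothetical raw outputs of a Linear layer, two of them outside 0 to 0.2.
raw = np.array([-0.03, 0.0, 0.08, 0.25])

clipped = clip_to_range(raw)
positive = softplus(raw)
```

Clipping is simple but has zero gradient outside the range if used during training, so it is usually applied at inference only; softplus remains differentiable everywhere.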

2. Analysis of Negative Ranges (-1 to 0)

Targeting a range of -1 to 0 introduces unique constraints, particularly for regression models.

2.1 Constraints on Loss Functions

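  • Inapplicability of MAPE: MAPE divides by the absolute value of the target, so a target of exactly zero causes division by zero, and targets close to zero inflate the percentage error and distort the result.
  • Inapplicability of Log-based Loss: Losses that take the logarithm of the target are undefined for non-positive values. RMSLE, which uses $\log(1 + y)$, is undefined at $y = -1$ and unstable near it.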
  • Solution: Utilize MSE or MAE as the primary loss functions.
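
The failure modes above can be reproduced with a few lines of NumPy. The functions are written out by hand from the standard definitions, and the target and prediction values are invented for the example.

```python
import numpy as np

def mape(pred, target):
    """Mean absolute percentage error, in percent."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return float(np.mean(np.abs((target - pred) / target)) * 100.0)

def rmsle(pred, target):
    """Root mean squared logarithmic error, using log(1 + y)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        diff = np.log1p(pred) - np.log1p(target)
    return float(np.sqrt(np.mean(diff ** 2)))

def mse(pred, target):
    """Mean squared error."""
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))

# Hypothetical targets covering the whole -1 to 0 range, endpoints included.
target = np.array([-1.0, -0.5, -0.001, 0.0])
pred = np.array([-0.9, -0.45, -0.011, -0.01])

mape_all = mape(pred, target)                 # infinite: one target is 0
mape_nonzero = mape(pred[:3], target[:3])     # finite, dominated by target -0.001
rmsle_all = rmsle(pred, target)               # infinite: log(1 + (-1)) diverges
mse_all = mse(pred, target)                   # finite and well behaved
```

Even after dropping the zero target, the single near-zero sample pushes the MAPE well above 100% although its absolute error is only 0.01.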

2.2 Activation Function Mismatch

  • Sigmoid/Softmax: Cannot be used directly, as their outputs lie strictly between 0 and 1 and never reach negative values.
  • ReLU: Cannot be used in the output layer as it zeros out all negative values.
  • Solution: Use a Linear output layer or Tanh (range: -1 to 1) if boundary constraints are required.
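
A quick comparison of the output activations named above, applied to the same pre-activation values, is sketched here. The values in `z` are arbitrary and the function names are chosen for the example.

```python
import numpy as np

def out_relu(z):
    """ReLU output: every negative value becomes 0."""
    return np.maximum(z, 0.0)

def out_tanh(z):
    """Tanh output: bounded to the open interval (-1, 1)."""
    return np.tanh(z)

def out_linear(z):
    """Linear (identity) output: unbounded, passes negatives through."""
    return z

# Hypothetical pre-activation values from the last hidden layer.
z = np.array([-3.0, -0.5, 0.0, 2.0])

relu_out = out_relu(z)      # negative targets are unreachable
tanh_out = out_tanh(z)      # covers -1 to 0, but also allows positive values
linear_out = out_linear(z)  # covers everything, with no bound enforced
```

Tanh bounds the output below at -1 but still permits values above 0, so a target range of -1 to 0 may additionally need clipping or a loss penalty on positive predictions.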

2.3 Physical Interpretation and Challenges

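In semiconductor FDC (Fault Detection and Classification) data, negative targets often represent deltas from a baseline or log-transformed values. These require careful handling to maintain physical meaning, for example by keeping the baseline or transform parameters so predictions can be converted back to physical units.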

3. Practical Recommendations (Semiconductor VM/FDC Context)

For high-precision tasks such as thickness prediction or fault detection:

  • Mandatory Scaling: Always use scaling (Min-Max or Standardization) internally. Learning the difference between 0.1 and 0.2 is numerically more stable for a model than learning the difference between 0.001 and 0.002.
  • Precision Verification: Ensure the use of float32 or higher. Lower-bit formats such as float16 carry only about three significant decimal digits, so minimal value fluctuations can be lost to rounding.
  • Monitor Relative Error: Alongside absolute loss, track MAPE (for positive ranges) or relative percentage errors to ensure predictions meet actual process specifications.
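
The precision and relative-error points can be checked with a short sketch. The two readings and the `relative_error_pct` helper are assumptions chosen for illustration; the values were picked so that they differ by less than the float16 spacing near 0.1.

```python
import numpy as np

def relative_error_pct(pred, target):
    """Absolute error expressed as a percentage of the target magnitude."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.abs(pred - target) / np.abs(target) * 100.0

# Two hypothetical sensor-scale values that differ by 1e-5.
a, b = 0.10001, 0.10002

diff32 = float(np.float32(b) - np.float32(a))  # difference survives in float32
diff16 = float(np.float16(b) - np.float16(a))  # both round to the same float16

# The same absolute error of 0.001 looks very different in relative terms.
rel_small_target = relative_error_pct(0.003, 0.002)   # about 50%
rel_large_target = relative_error_pct(0.101, 0.100)   # about 1%
```

An absolute loss would treat the last two cases identically, which is why tracking a relative measure alongside it helps when checking predictions against process specifications.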

4. Summary: Does the Target Range Matter?

  • Theoretically: No. (The computer treats them as mere numerical values.)
  • Convergence Speed: Yes. (Small gradients may result in sluggish learning.)
  • Evaluation Metrics: Yes. (Calculating relative errors becomes difficult or skewed.)
