Center Alignment Index (CAI): A Novel Metric for Evaluating Data Center Agreement on the 1:1 Line

The Lorentzian Shape of CAI: Sensitivity Analysis and the Tolerance Parameter k

Why CAI Is a Lorentzian Function

The functional form of CAI, $1/(1 + u^2)$, is a Lorentzian (Cauchy kernel) peaked at the origin and normalized to a maximum of 1. This is not an arbitrary design choice but an algebraic consequence of CCC’s structure. The CCC denominator $\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2$ is quadratic in the mean difference. When we isolate the location component by fixing the scale ratio $v = 1$, the bias correction factor becomes:

$$C_b\big|_{v=1} = \frac{2}{2 + u^2} = \frac{1}{1 + u^2/2}$$

CAI adopts a sharper variant $1/(1 + u^2)$, but the Lorentzian shape itself is inherited directly from the quadratic denominator of CCC. The reciprocal of any positive-definite quadratic in $u$ necessarily produces a Lorentzian profile. In other words, the Lorentzian form is not chosen for CAI; it is derived from CCC.
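The reduction above can be checked numerically. The sketch below assumes the standard form of Lin's bias correction factor, $C_b = 2/(v + 1/v + u^2)$, which is not written out in this section; the function names `lin_cb` and `cai` are illustrative only.

```python
import math


def lin_cb(u: float, v: float = 1.0) -> float:
    """Bias correction factor, assuming the form C_b = 2 / (v + 1/v + u^2)."""
    return 2.0 / (v + 1.0 / v + u * u)


def cai(u: float) -> float:
    """CAI as defined in the text: 1 / (1 + u^2)."""
    return 1.0 / (1.0 + u * u)


for u in (0.0, 0.5, 1.0, 2.0):
    cb = lin_cb(u)  # scale ratio fixed at v = 1
    # With v = 1 the factor collapses to the Lorentzian 1 / (1 + u^2 / 2).
    assert math.isclose(cb, 1.0 / (1.0 + u * u / 2.0))
    print(f"u={u:.1f}  C_b(v=1)={cb:.3f}  CAI={cai(u):.3f}")
```

For every nonzero $u$, CAI falls below $C_b|_{v=1}$, which is the sense in which it is the sharper variant.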

This functional form carries several practical advantages. First, the peak value is exactly 1 at $u = 0$ and the function decays monotonically toward 0 as $|u| \to \infty$, matching the semantic requirement of an agreement index. Second, compared to a Gaussian kernel $e^{-u^2}$, the Lorentzian exhibits heavier tails, maintaining discriminative power over a wider range of bias values:

| $u$ | Lorentzian $1/(1+u^2)$ | Gaussian $e^{-u^2}$ |
|---|---|---|
| 0.0 | 1.000 | 1.000 |
| 1.0 | 0.500 | 0.368 |
| 2.0 | 0.200 | 0.018 |
| 3.0 | 0.100 | 0.0001 |

At $u = 2$, the Gaussian has essentially collapsed to zero (0.018), whereas the Lorentzian still returns 0.200, providing meaningful gradation. Third, the closed-form inverse $|u| = \sqrt{1/c - 1}$ for a given CAI value $c \in (0, 1]$ simplifies error propagation and analytical interpretation.
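The comparison and the inverse can be reproduced with a few lines of Python. This is a minimal sketch; the helper names `lorentzian`, `gaussian`, and `inverse_cai` are illustrative, not part of any published implementation.

```python
import math


def lorentzian(u: float) -> float:
    """CAI kernel: 1 / (1 + u^2)."""
    return 1.0 / (1.0 + u * u)


def gaussian(u: float) -> float:
    """Gaussian kernel used for comparison: exp(-u^2)."""
    return math.exp(-u * u)


def inverse_cai(c: float) -> float:
    """Magnitude of u that yields CAI value c, valid for 0 < c <= 1."""
    if not 0.0 < c <= 1.0:
        raise ValueError("c must lie in (0, 1]")
    return math.sqrt(1.0 / c - 1.0)


for u in (0.0, 1.0, 2.0, 3.0):
    print(f"u={u:.1f}  Lorentzian={lorentzian(u):.3f}  Gaussian={gaussian(u):.4f}")

# Round trip: the inverse recovers |u| from the CAI value.
print(inverse_cai(lorentzian(2.0)))
```

The inverse only returns the magnitude of the shift, since CAI is symmetric in $u$.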

Sensitivity Concern: Is CAI Too Aggressive?

Despite the heavier tails relative to the Gaussian, a closer inspection of CAI’s behavior reveals a potential problem. Since $u \approx \Delta\mu / \sigma$ when $\sigma_x \approx \sigma_y$, the location shift $u$ can be interpreted as the bias measured in units of the standard deviation (more precisely, the geometric mean of the two standard deviations, $\sqrt{\sigma_x \sigma_y}$). The resulting CAI values drop steeply:

| Bias Magnitude | $u$ | CAI | Practical Judgment |
|---|---|---|---|
| $0.5\sigma$ | 0.5 | 0.800 | Acceptable in most domains |
| $1.0\sigma$ | 1.0 | 0.500 | Half-power point; already harsh |
| $1.5\sigma$ | 1.5 | 0.308 | Flagged as poor |
| $2.0\sigma$ | 2.0 | 0.200 | Nearly rejected |

A mean bias of $1\sigma$ is not uncommon in remote sensing retrievals, climate model outputs, or cross-sensor comparisons. Yet CAI assigns it a score of 0.500, which reads as “50% disagreement.” This severity stems from the fact that CAI uses $u^2$ in the denominator, whereas the original $C_b$ uses $u^2/2$, effectively doubling the penalization rate.
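To make the $1\sigma$ case concrete, the sketch below computes $u$ from two small samples and evaluates CAI. It assumes $u = (\mu_x - \mu_y)/\sqrt{\sigma_x \sigma_y}$ with population standard deviations; both the estimator choice and the toy data are assumptions made for illustration.

```python
import math
import statistics


def location_shift(x: list[float], y: list[float]) -> float:
    """Assumed definition: mean difference over the geometric mean of the SDs."""
    mu_x, mu_y = statistics.fmean(x), statistics.fmean(y)
    sd_x, sd_y = statistics.pstdev(x), statistics.pstdev(y)
    return (mu_x - mu_y) / math.sqrt(sd_x * sd_y)


def cai(u: float) -> float:
    """CAI as defined in the text: 1 / (1 + u^2)."""
    return 1.0 / (1.0 + u * u)


# Toy data: the second series is the first one shifted by exactly one SD.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
sigma = statistics.pstdev(reference)
shifted = [value + sigma for value in reference]

u = location_shift(shifted, reference)
print(f"u = {u:.3f}, CAI = {cai(u):.3f}")
```

A series that tracks the reference perfectly in shape but sits one standard deviation too high lands at the half-power point.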

Introducing the Tolerance Parameter k

To address this, we generalize CAI with a tunable tolerance parameter $k > 0$:

$$CAI_k = \frac{1}{1 + u^2 / k}$$

The parameter $k$ controls the half-power point: CAI$_k = 0.5$ occurs at $u = \sqrt{k}$. A larger $k$ makes the metric more tolerant of bias; a smaller $k$ makes it stricter. The following table illustrates the effect:

| $u$ | $k=1$ (strict) | $k=2$ (moderate) | $k=4$ (tolerant) |
|---|---|---|---|
| 0.5 | 0.800 | 0.889 | 0.941 |
| 1.0 | 0.500 | 0.667 | 0.800 |
| 2.0 | 0.200 | 0.333 | 0.500 |
| 3.0 | 0.100 | 0.182 | 0.308 |

Setting $k = 2$ exactly recovers the decay rate of Lin’s $C_b|_{v=1}$, providing a theoretically grounded default. Setting $k = 4$ shifts the half-power point to $2\sigma$, which may be appropriate for domains where moderate systematic bias is operationally acceptable.

The recommended notation is $CAI_k$ or $CAI(u;\,k)$, with the following formal definition:

$CAI_k = 1/(1 + u^2/k)$, where $k > 0$ is the tolerance parameter governing sensitivity to location bias. The half-power point occurs at $u = \sqrt{k}$.
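The definition translates directly into code. The following is a minimal sketch; the function names `cai_k` and `half_power_point` are illustrative.

```python
import math


def cai_k(u: float, k: float) -> float:
    """Generalized CAI: 1 / (1 + u^2 / k), with tolerance parameter k > 0."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 / (1.0 + u * u / k)


def half_power_point(k: float) -> float:
    """Value of |u| at which CAI_k equals 0.5."""
    return math.sqrt(k)


# Reproduce the sensitivity table for k = 1, 2, 4.
for u in (0.5, 1.0, 2.0, 3.0):
    row = "  ".join(f"k={k}: {cai_k(u, k):.3f}" for k in (1, 2, 4))
    print(f"u={u:.1f}  {row}")

# The half-power property holds for any k > 0.
for k in (1.0, 2.0, 4.0):
    assert math.isclose(cai_k(half_power_point(k), k), 0.5)
```

Here $k$ is a required argument rather than a default, so the chosen tolerance has to be stated explicitly wherever the metric is reported.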

Benchmarking: Tunable Parameters in AI/ML Evaluation Metrics

The introduction of a tolerance parameter into an evaluation metric is a well-established practice in AI/ML. Several widely used metrics follow the same design principle: a core formula whose behavior is modulated by one or more parameters that encode domain-specific cost structures.

$F_\beta$ Score is perhaps the most familiar example. The formula $F_\beta = (1+\beta^2) \cdot PR / (\beta^2 P + R)$ uses $\beta^2$ to weight recall relative to precision. At $\beta = 1$, precision and recall contribute equally ($F_1$). In medical screening, $\beta = 2$ is a common choice because missing a true positive (a false negative) is far more costly than a false alarm. In spam filtering, $\beta = 0.5$ prioritizes precision because users strongly dislike legitimate emails being discarded.
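A short sketch shows how $\beta$ shifts the score for a fixed precision and recall. The function name `f_beta` and the example values are illustrative, and returning 0 for an empty denominator is a convention chosen here.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # convention chosen here when both P and R are zero
    return (1.0 + b2) * precision * recall / denom


# A classifier with perfect recall but mediocre precision.
p, r = 0.5, 1.0
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: F={f_beta(p, r, beta):.3f}")
```

With high recall and low precision, the score rises as $\beta$ grows, reflecting the heavier weight placed on recall.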

Focal Loss addresses extreme class imbalance in object detection through $FL = -\alpha_t (1 - p_t)^\gamma \log(p_t)$. The focusing parameter $\gamma$ down-weights easy examples: at $\gamma = 0$ the formula reduces to standard cross-entropy, while at $\gamma = 2$ (the recommended default from the RetinaNet paper) a well-classified sample with $p_t = 0.9$ contributes only $(0.1)^2 = 1\%$ of its original loss. This single parameter transforms the loss landscape from one dominated by easy negatives to one focused on hard examples.
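The 1% figure can be verified with a per-sample sketch. The function name `focal_loss` is illustrative, and $\alpha_t = 1$ is assumed here purely to isolate the effect of $\gamma$.

```python
import math


def focal_loss(p_t: float, gamma: float = 2.0, alpha_t: float = 1.0) -> float:
    """Per-sample focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


p_t = 0.9  # a well-classified sample
cross_entropy = focal_loss(p_t, gamma=0.0)  # gamma = 0 reduces to cross-entropy
focal = focal_loss(p_t, gamma=2.0)

print(f"cross-entropy: {cross_entropy:.5f}")
print(f"focal (gamma=2): {focal:.5f}")
print(f"ratio: {focal / cross_entropy:.4f}")
```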

Tversky Index generalizes the Dice and Jaccard coefficients for image segmentation: $TI = TP / (TP + \alpha \cdot FP + \beta \cdot FN)$. Setting $\alpha = \beta = 0.5$ recovers Dice; $\alpha = \beta = 1$ recovers Jaccard. In medical image segmentation for small lesion detection, practitioners set $\beta > \alpha$ to penalize false negatives more heavily, reducing the risk of missing pathological regions.
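The two special cases can be confirmed on a small confusion count. This is a sketch with made-up counts; the function name `tversky` is illustrative, and returning 0 for an empty denominator is a convention chosen here.

```python
import math


def tversky(tp: float, fp: float, fn: float, alpha: float, beta: float) -> float:
    """Tversky index: TP / (TP + alpha * FP + beta * FN)."""
    denom = tp + alpha * fp + beta * fn
    return tp / denom if denom else 0.0  # convention chosen here for empty masks


tp, fp, fn = 8, 2, 4  # illustrative counts only

dice = 2 * tp / (2 * tp + fp + fn)
jaccard = tp / (tp + fp + fn)

assert math.isclose(tversky(tp, fp, fn, 0.5, 0.5), dice)
assert math.isclose(tversky(tp, fp, fn, 1.0, 1.0), jaccard)

# Weighting false negatives more heavily lowers the score when FN > FP.
print(tversky(tp, fp, fn, alpha=0.3, beta=0.7))
print(tversky(tp, fp, fn, alpha=0.7, beta=0.3))
```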

Huber Loss smoothly interpolates between MSE and MAE via the threshold $\delta$: quadratic for $|r| \leq \delta$, linear beyond. As $\delta \to \infty$, Huber behaves like MSE (up to the conventional factor of $1/2$); as $\delta \to 0$, it behaves like MAE scaled by $\delta$. The parameter $\delta$ encodes how much residual magnitude the practitioner considers “normal” versus “outlier.”
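The two regimes are easy to see in a per-residual sketch. It assumes the common convention $\tfrac{1}{2}r^2$ for $|r| \leq \delta$ and $\delta(|r| - \tfrac{1}{2}\delta)$ otherwise; the function name `huber` is illustrative.

```python
def huber(r: float, delta: float) -> float:
    """Huber loss for one residual, assuming the 1/2 * r^2 convention."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r  # quadratic zone
    return delta * (a - 0.5 * delta)  # linear zone, continuous at |r| = delta


r = 3.0
print(huber(r, delta=1.0))          # linear zone: 1.0 * (3.0 - 0.5)
print(huber(r, delta=100.0))        # quadratic zone: 0.5 * 3.0^2
print(huber(r, delta=1e-3) / 1e-3)  # close to |r| once rescaled by delta
```

For very small $\delta$ the raw loss itself shrinks toward zero, so the MAE-like behavior only shows after dividing by $\delta$.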

Quantile Loss uses $\tau \in (0,1)$ to impose asymmetric penalties on over- and under-prediction. At $\tau = 0.5$ it reduces to MAE up to a constant factor of $1/2$; at $\tau = 0.9$ it penalizes under-prediction nine times more than over-prediction. This is essential for constructing prediction intervals and risk-aware forecasts.
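The nine-to-one asymmetry can be checked with the usual pinball form of the loss. This is a minimal per-sample sketch; the function name `pinball` and the example values are illustrative.

```python
import math


def pinball(y_true: float, y_pred: float, tau: float) -> float:
    """Quantile (pinball) loss for one observation, with 0 < tau < 1."""
    diff = y_true - y_pred
    # diff > 0 means the prediction was too low (under-prediction).
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


under = pinball(10.0, 9.0, tau=0.9)   # prediction one unit too low
over = pinball(10.0, 11.0, tau=0.9)   # prediction one unit too high
print(f"under={under:.2f}  over={over:.2f}  ratio={under / over:.1f}")

# At tau = 0.5 the loss is half the absolute error.
assert math.isclose(pinball(10.0, 8.0, tau=0.5), 0.5 * abs(10.0 - 8.0))
```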

Common Design Pattern

These metrics share a structural template that CAI$_k$ follows precisely:

| Metric | Parameter | Controls | Recovers At |
|---|---|---|---|
| $CAI_k$ | $k$ | Bias tolerance | $C_b\vert_{v=1}$ at $k=2$ |
| $F_\beta$ | $\beta$ | Recall vs. Precision weight | $F_1$ at $\beta=1$ |
| Focal Loss | $\gamma$ | Easy-sample suppression | CE at $\gamma=0$ |
| Tversky Index | $\alpha,\beta$ | FP vs. FN cost | Dice at $\alpha=\beta=0.5$ |
| Huber Loss | $\delta$ | Outlier tolerance | MSE at $\delta\to\infty$ |
| Quantile Loss | $\tau$ | Over/under-prediction asymmetry | MAE at $\tau=0.5$ |

In every case, (1) a specific parameter value recovers a well-known existing metric, (2) the parameter encodes domain-specific cost or tolerance, and (3) the core mathematical structure remains unchanged across parameter values. CAI$_k$ inherits all three properties: $k=2$ recovers $C_b|_{v=1}$, $k$ reflects the domain’s acceptable bias level, and the Lorentzian form is preserved for all $k > 0$.
