{"id":6676,"date":"2026-05-27T11:47:23","date_gmt":"2026-05-27T16:47:23","guid":{"rendered":"https:\/\/ykim.synology.me\/wordpress\/?p=6676"},"modified":"2026-05-27T11:55:13","modified_gmt":"2026-05-27T16:55:13","slug":"optuna-metric-projection","status":"publish","type":"post","link":"https:\/\/ykim.synology.me\/wordpress\/optuna-metric-projection-6676\/","title":{"rendered":"Optuna Metric Projection"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"768\" src=\"https:\/\/ykim.synology.me\/wordpress\/wp-content\/uploads\/2026\/05\/20260523-rainbow-over-the-sunset-1200x900px-1024x768.jpg\" alt=\"\" class=\"wp-image-6679\" style=\"width:800px\" srcset=\"https:\/\/ykim.synology.me\/wordpress\/wp-content\/uploads\/2026\/05\/20260523-rainbow-over-the-sunset-1200x900px-1024x768.jpg 1024w, https:\/\/ykim.synology.me\/wordpress\/wp-content\/uploads\/2026\/05\/20260523-rainbow-over-the-sunset-1200x900px-300x225.jpg 300w, https:\/\/ykim.synology.me\/wordpress\/wp-content\/uploads\/2026\/05\/20260523-rainbow-over-the-sunset-1200x900px-768x576.jpg 768w, https:\/\/ykim.synology.me\/wordpress\/wp-content\/uploads\/2026\/05\/20260523-rainbow-over-the-sunset-1200x900px.jpg 1200w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003A concise report on projecting <strong>Optuna&#8217;s best-so-far trajectory<\/strong> with four saturation curves. The method estimates the expected best metric after $K$ additional trials (forward) or the number of additional trials needed to reach a target $T$ (inverse). It uses no hyperparameter (HP) coordinates and avoids the over-optimism that plagues surrogate-based projection.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. 
Problem Setting and Motivation<\/h2>\n\n\n<style>.kadence-column6676_d09a30-4c > .kt-inside-inner-col,.kadence-column6676_d09a30-4c > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_d09a30-4c > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_d09a30-4c > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_d09a30-4c > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_d09a30-4c > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_d09a30-4c{position:relative;}.kadence-column6676_d09a30-4c, .kt-inside-inner-col > .kadence-column6676_d09a30-4c:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_d09a30-4c > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_d09a30-4c > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_d09a30-4c\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003Optuna (Akiba et al., 2019) is a widely used hyperparameter optimization (HPO) framework. 
After running $N$ trials, two natural questions arise:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>&#8220;If we run $K$ more trials, what is the expected best metric?&#8221;<\/li>\n\n\n\n<li>&#8220;How many additional trials are needed to reach a target $T$?&#8221;<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003This report fits a saturation curve to the running maximum of the metric and uses it to answer both questions.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">1.1 Why Empirical Saturation, Not Surrogate Models<\/h3>\n\n\n<style>.kadence-column6676_336f79-1c > .kt-inside-inner-col,.kadence-column6676_336f79-1c > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_336f79-1c > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_336f79-1c > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_336f79-1c > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_336f79-1c > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_336f79-1c{position:relative;}.kadence-column6676_336f79-1c, .kt-inside-inner-col > .kadence-column6676_336f79-1c:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_336f79-1c > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_336f79-1c > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_336f79-1c\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003Surrogate-based projection methods \u2014 Gaussian Process (GP) or Prior-Fitted Network (PFN) \u2014 learn an abstract HP\u2192metric model and then simulate $K$ future trials. 
Two fundamental problems arise.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003<strong>Issue 1 \u2014 TPE&#8217;s active narrowing is not simulated.<\/strong> Optuna&#8217;s default sampler, the Tree-structured Parzen Estimator (TPE), concentrates new samples in a narrow good region (about 1\u20135% of the HP space) once it has converged. A typical GP projection, however, draws $K$ random HP candidates from the entire search space. The two sampling regimes do not match: the surrogate projection covers vast regions that TPE will never visit, which inflates the predicted max.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003<strong>Issue 2 \u2014 GP epistemic uncertainty inflates max-of-$K$ statistics.<\/strong> A GP posterior is narrow where data are dense but very wide in unexplored regions. The standard Monte Carlo (MC) routine is:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Draw $K$ HP candidates from the search space.<\/li>\n\n\n\n<li>Sample each candidate&#8217;s metric from the GP posterior (mean \u00b1 std).<\/li>\n\n\n\n<li>Take the maximum of the $K$ samples.<\/li>\n\n\n\n<li>Repeat $M$ times; average the maxima.<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003Candidates that fall in unexplored regions yield draws with very large variance. Max-of-$K$ then preferentially selects these high-tail outliers, producing an over-optimistic projection.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003<strong>Empirical saturation<\/strong> fits the observed <em>trial number \u2192 best-so-far<\/em> series directly. TPE&#8217;s active learning is already encoded in the trajectory, and no hypothetical exploration of the HP space is assumed. 
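<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003The inflation mechanism of Issue 2 can be reproduced without fitting any GP. The toy simulation below is a sketch under stated assumptions, not a GP projection: every candidate shares the same posterior mean, most candidates have a small posterior standard deviation (well-explored regions), and a hypothetical 10% have a large one (unexplored regions). It mimics steps 2 to 4 of the MC routine above, with and without the wide candidates. All numbers are illustrative.<\/p>

```python
import numpy as np

def mc_max_of_k(mean, stds, n_rep=2000, seed=0):
    """Monte Carlo estimate of E[max of K draws], one Gaussian draw per candidate.

    mean : common posterior mean shared by all candidates (illustrative assumption)
    stds : K posterior standard deviations, one per candidate
    """
    rng = np.random.default_rng(seed)
    stds = np.asarray(stds, dtype=float)
    # Shape (n_rep, K): each row is one simulated batch of K future trials.
    draws = mean + stds * rng.standard_normal((n_rep, stds.size))
    # Max over the K candidates in each batch, then average over batches.
    return draws.max(axis=1).mean()

K = 100
narrow = np.full(K, 0.005)      # all candidates in well-explored regions
mixed = narrow.copy()
mixed[: K // 10] = 0.05         # hypothetical: 10% fall in unexplored regions

best_narrow = mc_max_of_k(0.30, narrow)
best_mixed = mc_max_of_k(0.30, mixed)
print(f"all narrow : {best_narrow:.4f}")
print(f"10% wide   : {best_mixed:.4f}")   # higher, driven only by the wide tail
```

<p class=\"wp-block-paragraph\">\u2003Because the mean is identical for every candidate, the whole gap between the two estimates comes from the wide-variance tail, which is exactly what max-of-$K$ rewards.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003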
Both surrogate-side problems are avoided automatically.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">1.2 The Best-so-Far Trajectory<\/h3>\n\n\n<style>.kadence-column6676_6d1267-60 > .kt-inside-inner-col,.kadence-column6676_6d1267-60 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_6d1267-60 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_6d1267-60 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_6d1267-60 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_6d1267-60 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_6d1267-60{position:relative;}.kadence-column6676_6d1267-60, .kt-inside-inner-col > .kadence-column6676_6d1267-60:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_6d1267-60 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_6d1267-60 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_6d1267-60\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003Sort completed trials by trial number and track the cumulative maximum of the metric:<\/p>\n\n\n\n<pre style=\"font-family: consolas,monospace; font-size: 1.2rem; white-space: pre; line-height:1.2; background-color: #fff; border: none\">\ntrial_num:   1     2     3     4     5     6   ...  N\nmetric:    0.10  0.05  0.18  0.12  0.21  0.15  ... 0.31\n                                              |\n                              running max (cumulative)\n                                              v\nY_n:       0.10  0.10  0.18  0.18  0.21  0.21  ... 
0.31\n            -------- monotonically non-decreasing ------>\n<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003Formally, $Y_n = \\max(\\mathrm{metric}_1, \\ldots, \\mathrm{metric}_n)$, or recursively $Y_n = \\max(Y_{n-1}, \\mathrm{metric}_n)$. The points $(n, Y_n)$ form the best-so-far trajectory. Throughout, the metric is maximized; a minimized objective can be negated first.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003<strong>Key properties<\/strong> that the saturation models are designed to capture:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Monotonically non-decreasing.<\/strong> The running max satisfies $Y_n \\le Y_{n+1}$ for every $n$. This is distinct from <em>strictly increasing<\/em> ($Y_n \\lt Y_{n+1}$); the curve stays flat whenever a new trial fails to update the best.<\/li>\n\n\n\n<li><strong>Diminishing returns.<\/strong> The probability that a new trial improves the running max typically decreases as $n$ grows; for i.i.d. random trials with a continuous metric it is exactly $1\/n$.<\/li>\n\n\n\n<li><strong>Asymptotic ceiling $Y_\\infty$.<\/strong> The value the running max approaches as $n \\to \\infty$, i.e., the best metric TPE can reach in the given HP space. Estimating $Y_\\infty$ is the core target of this report.<\/li>\n<\/ul>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">2. 
Four Saturation Models<\/h2>\n\n\n<style>.kadence-column6676_810559-c0 > .kt-inside-inner-col,.kadence-column6676_810559-c0 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_810559-c0 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_810559-c0 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_810559-c0 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_810559-c0 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_810559-c0{position:relative;}.kadence-column6676_810559-c0, .kt-inside-inner-col > .kadence-column6676_810559-c0:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_810559-c0 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_810559-c0 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_810559-c0\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003<em>Saturation<\/em> denotes a curve that rises toward a ceiling and flattens out. Optuna best-so-far curves typically show three phases: a rapid early rise, a gradual mid-phase slowdown, and an asymptotic plateau near $Y_\\infty$. 
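<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003These phases are read off the running maximum defined in \u00a71.2. A minimal NumPy sketch of that step follows, assuming the trial numbers and metric values have already been collected from completed trials (for example from <code>trial.number<\/code> and <code>trial.value<\/code> in Optuna) and that the metric is maximized. The helper name <code>best_so_far<\/code> is hypothetical.<\/p>

```python
import numpy as np

def best_so_far(trial_numbers, values):
    """Return (n, Y_n): trials sorted by number and the running max of the metric.

    Assumes a maximized metric; negate `values` first for a minimized one.
    """
    order = np.argsort(np.asarray(trial_numbers), kind="stable")
    n = np.asarray(trial_numbers)[order]
    y = np.asarray(values, dtype=float)[order]
    # Y_n = max(Y_{n-1}, metric_n), computed in one vectorized pass.
    return n, np.maximum.accumulate(y)

# First six trials of the diagram in section 1.2, deliberately passed out of order.
n, Y = best_so_far([3, 1, 2, 6, 4, 5], [0.18, 0.10, 0.05, 0.15, 0.12, 0.21])
print(n.tolist())  # [1, 2, 3, 4, 5, 6]
print(Y.tolist())  # [0.1, 0.1, 0.18, 0.18, 0.21, 0.21]
```

<p class=\"wp-block-paragraph\">\u2003Optuna numbers trials from 0, whereas the diagram in \u00a71.2 starts at 1. Shifting to a 1-based index is one simple way to keep the power law, which is undefined at $n = 0$, well defined.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003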
The four models below all carry $Y_\\infty$ as an explicit parameter but differ in <em>how<\/em> they approach the ceiling.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003The power law is the classical Learning Curve Extrapolation (LCE) form (Mosteller &amp; Tukey, 1977); the remaining three are standard members of the LCE family (Domhan et al., 2015).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003<strong>Common features<\/strong>: all have three parameters (rationale in \u00a73.2), include $Y_\\infty$ explicitly, are monotone in $n$, and are fitted with <code>scipy.optimize.curve_fit<\/code>.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">2.1 Exponential Saturation (exp)<\/h3>\n\n\n<style>.kadence-column6676_7ee340-e9 > .kt-inside-inner-col,.kadence-column6676_7ee340-e9 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_7ee340-e9 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_7ee340-e9 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_7ee340-e9 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_7ee340-e9 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_7ee340-e9{position:relative;}.kadence-column6676_7ee340-e9, .kt-inside-inner-col > .kadence-column6676_7ee340-e9:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_7ee340-e9 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_7ee340-e9 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_7ee340-e9\"><div class=\"kt-inside-inner-col\">\n<p style=\"background-color: #fff; border: none\">$$Y_n = Y_\\infty - a \\cdot e^{-n \/ \\tau}$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003Parameters: 
$Y_\\infty$ (ceiling), $a$ (initial gap, i.e., the gap at $n = 0$), $\\tau$ (e-folding time). The gap $Y_\\infty - Y_n = a \\, e^{-n \/ \\tau}$ shrinks by a factor of $e$ every $\\tau$ trials, giving a smooth approach to the ceiling. It tends to fit best when TPE converges quickly and only noise remains in the tail.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">2.2 Power Law (power)<\/h3>\n\n\n<style>.kadence-column6676_51ea74-3a > .kt-inside-inner-col,.kadence-column6676_51ea74-3a > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_51ea74-3a > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_51ea74-3a > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_51ea74-3a > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_51ea74-3a > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_51ea74-3a{position:relative;}.kadence-column6676_51ea74-3a, .kt-inside-inner-col > .kadence-column6676_51ea74-3a:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_51ea74-3a > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_51ea74-3a > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_51ea74-3a\"><div class=\"kt-inside-inner-col\">\n<p style=\"background-color: #fff; border: none\">$$Y_n = Y_\\infty - a \\cdot n^{-c}$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003Parameters: $Y_\\infty$, $a$, $c$ (decay rate, $c \\gt 0$). The gap decays polynomially; larger $c$ means faster convergence. 
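<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003The exponential and power-law models differ mainly in how the gap $Y_\\infty - Y_n$ closes: the exponential gap shrinks by a factor of $e$ every $\\tau$ trials, whereas the power-law gap shrinks by a factor of $2^c$ each time $n$ doubles. A quick numeric check, with parameter values chosen only for illustration:<\/p>

```python
import numpy as np

# Illustrative parameter values, not fitted to any study.
a = 0.04
tau, c = 300.0, 0.5

def gap_exp(n):
    """Gap Y_inf - Y_n of the exponential model: a * exp(-n / tau)."""
    return a * np.exp(-n / tau)

def gap_power(n):
    """Gap Y_inf - Y_n of the power law: a * n^(-c)."""
    return a * np.power(n, -c)

n = 1000.0
ratio_exp = gap_exp(n) / gap_exp(n + tau)        # equals e for any n
ratio_power = gap_power(n) / gap_power(2 * n)    # equals 2**c for any n
print(f"exp gap shrinks x{ratio_exp:.4f} per tau trials")
print(f"power gap shrinks x{ratio_power:.4f} per doubling of n")
```

<p class=\"wp-block-paragraph\">\u2003Over long horizons this is the practical difference between the two: well past $\\tau$ the exponential curve is essentially flat, while the power law keeps creeping upward.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003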
The power law is the classical learning-curve form, widely used in machine learning.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">2.3 Hill Function (hill)<\/h3>\n\n\n<style>.kadence-column6676_3f3669-b5 > .kt-inside-inner-col,.kadence-column6676_3f3669-b5 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_3f3669-b5 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_3f3669-b5 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_3f3669-b5 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_3f3669-b5 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_3f3669-b5{position:relative;}.kadence-column6676_3f3669-b5, .kt-inside-inner-col > .kadence-column6676_3f3669-b5:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_3f3669-b5 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_3f3669-b5 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_3f3669-b5\"><div class=\"kt-inside-inner-col\">\n<p style=\"background-color: #fff; border: none\">$$Y_n = Y_\\infty \\cdot \\frac{n^c}{k^c + n^c}$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003Parameters: $Y_\\infty$, $k$ (half-max point, i.e., $Y_n = Y_\\infty \/ 2$ at $n = k$), $c$ (steepness). For $c \\gt 1$ the curve is S-shaped, with a slow start, a rapid mid-section, and a plateau; for $c \\le 1$ it is concave and has no slow start. 
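<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003Both shape claims can be checked numerically: $Y_k = Y_\\infty \/ 2$ for every $c$, and a convex (slow) start appears only when $c \\gt 1$. The sketch below uses illustrative parameters and takes the discrete second difference of the curve as a stand-in for its curvature:<\/p>

```python
import numpy as np

def hill(n, Y_inf, k, c):
    """Hill model: Y_n = Y_inf * n^c / (k^c + n^c)."""
    return Y_inf * n**c / (k**c + n**c)

Y_inf, k = 0.32, 50.0                  # illustrative values, not fitted
n = np.arange(1, 301, dtype=float)

results = {}
for c in (0.5, 1.0, 3.0):
    y = hill(n, Y_inf, k, c)
    half = hill(k, Y_inf, k, c)        # value at n = k, expected Y_inf / 2
    curv = np.diff(y, 2)               # discrete second difference ~ curvature
    slow_start = bool(curv[0] > 0)     # convex start means an S-shape
    results[c] = (half, slow_start)
    print(f"c={c}: Y_k={half:.3f}, slow start={slow_start}")
```

<p class=\"wp-block-paragraph\">\u2003The same test explains why only Hill (with $c \\gt 1$) and logistic can represent a delayed start: for positive $a$, $\\tau$, and $c$, the exponential and power-law curves are concave in $n$ everywhere.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003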
The Hill form suits runs in which TPE discovers a productive region only after a delay.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">2.4 Logistic (logistic)<\/h3>\n\n\n<style>.kadence-column6676_c6601d-04 > .kt-inside-inner-col,.kadence-column6676_c6601d-04 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_c6601d-04 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_c6601d-04 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_c6601d-04 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_c6601d-04 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_c6601d-04{position:relative;}.kadence-column6676_c6601d-04, .kt-inside-inner-col > .kadence-column6676_c6601d-04:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_c6601d-04 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_c6601d-04 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_c6601d-04\"><div class=\"kt-inside-inner-col\">\n<p style=\"background-color: #fff; border: none\">$$Y_n = \\frac{Y_\\infty}{1 + e^{-a (n - n_0)}}$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003Parameters: $Y_\\infty$, $a$ (steepness), $n_0$ (midpoint, where $Y_{n_0} = Y_\\infty \/ 2$). A symmetric S-curve with its inflection at $n_0$. It suits trajectories with a clear phase transition.<\/p>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">3. 
Auto Mode \u2014 Model Selection<\/h2>\n\n\n<style>.kadence-column6676_e9c9e9-6f > .kt-inside-inner-col,.kadence-column6676_e9c9e9-6f > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_e9c9e9-6f > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_e9c9e9-6f > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_e9c9e9-6f > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_e9c9e9-6f > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_e9c9e9-6f{position:relative;}.kadence-column6676_e9c9e9-6f, .kt-inside-inner-col > .kadence-column6676_e9c9e9-6f:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_e9c9e9-6f > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_e9c9e9-6f > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_e9c9e9-6f\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003No single model is best a priori; the right choice depends on the shape of the trajectory. 
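<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003A compact sketch of the fit-all-four, pick-the-lowest-AIC loop follows. It is not the reference implementation from \u00a75: the helper name <code>select_model<\/code>, the starting values <code>p0<\/code>, and the bounds are assumptions made for this example. Models whose fit raises an error are skipped, and the trajectory is synthetic.<\/p>

```python
import numpy as np
from scipy.optimize import curve_fit

MODELS = {
    "exp":      lambda n, Y, a, tau: Y - a * np.exp(-n / tau),
    "power":    lambda n, Y, a, c: Y - a * np.power(n, -c),
    "hill":     lambda n, Y, k, c: Y * n**c / (k**c + n**c),
    "logistic": lambda n, Y, a, n0: Y / (1 + np.exp(np.clip(-a * (n - n0), -50, 50))),
}

def select_model(n, y):
    """Fit every model, score with AIC = N*log(MSE) + 2k (k = 3), return (best, fits)."""
    N, ymax, span = len(n), y.max(), y.max() - y.min()
    # Hypothetical starting values and bounds: (p0, (lower, upper)).
    setup = {
        "exp":      ([ymax, span, N / 3],   ([0, 0, 1e-3], [np.inf, np.inf, 1e6])),
        "power":    ([ymax, span, 0.5],     ([0, 0, 1e-3], [np.inf, np.inf, 10])),
        "hill":     ([ymax, N / 4, 1.0],    ([0, 1e-3, 1e-2], [np.inf, 1e6, 10])),
        "logistic": ([ymax, 10 / N, N / 4], ([0, 1e-6, -1e6], [np.inf, 10, 1e6])),
    }
    fits = {}
    for name, f in MODELS.items():
        p0, bounds = setup[name]
        try:
            params, _ = curve_fit(f, n, y, p0=p0, bounds=bounds, maxfev=20000)
        except (RuntimeError, ValueError):
            continue                                  # skip models that fail to fit
        mse = np.mean((y - f(n, *params)) ** 2)
        fits[name] = {"params": params, "mse": mse, "aic": N * np.log(mse) + 2 * 3}
    best = min(fits, key=lambda m: fits[m]["aic"])
    return best, fits

# Synthetic exponential-shaped best-so-far trajectory (illustrative, not real data).
rng = np.random.default_rng(0)
n = np.arange(1, 301, dtype=float)
y = np.maximum.accumulate(0.32 - 0.2 * np.exp(-n / 50) + rng.normal(0, 1e-3, n.size))

best, fits = select_model(n, y)
for name, r in sorted(fits.items(), key=lambda kv: kv[1]["aic"]):
    print(f"{name:9s} MSE={r['mse']:.2e} AIC={r['aic']:.1f}")
print("chosen:", best)
```

<p class=\"wp-block-paragraph\">\u2003With the same $k$ for every model, the printed AIC order always matches the MSE order. In practice the starting values matter more, since a poor <code>p0<\/code> can make an otherwise suitable model fail to converge.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003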
The auto mode fits all four candidates and chooses the one with the smallest Akaike Information Criterion (AIC).<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">3.1 Procedure<\/h3>\n\n\n<style>.kadence-column6676_8c6f49-ba > .kt-inside-inner-col,.kadence-column6676_8c6f49-ba > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_8c6f49-ba > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_8c6f49-ba > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_8c6f49-ba > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_8c6f49-ba > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_8c6f49-ba{position:relative;}.kadence-column6676_8c6f49-ba, .kt-inside-inner-col > .kadence-column6676_8c6f49-ba:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_8c6f49-ba > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_8c6f49-ba > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_8c6f49-ba\"><div class=\"kt-inside-inner-col\">\n<pre style=\"font-family: consolas,monospace; font-size: 1.2rem; white-space: pre; line-height:1.2; background-color: #fff; border: none\">\nFor each model M in {exp, power, hill, logistic}:\n    1. fit via scipy.optimize.curve_fit(M, n_arr, Y_n, p0, bounds)\n    2. y_pred = M(n_arr, *fitted_params)\n    3. MSE_M  = mean((Y_n - y_pred)^2)\n    4. 
AIC_M  = N * log(MSE_M) + 2 * k       (k = 3 for all)\n\nSelected = argmin AIC\n<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003If a model fails to converge, it is skipped, and the lowest-AIC model among the successful fits is chosen.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">3.2 Selection Criterion \u2014 AIC<\/h3>\n\n\n<style>.kadence-column6676_cb0c8d-80 > .kt-inside-inner-col,.kadence-column6676_cb0c8d-80 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_cb0c8d-80 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_cb0c8d-80 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_cb0c8d-80 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_cb0c8d-80 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_cb0c8d-80{position:relative;}.kadence-column6676_cb0c8d-80, .kt-inside-inner-col > .kadence-column6676_cb0c8d-80:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_cb0c8d-80 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_cb0c8d-80 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_cb0c8d-80\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003The AIC (Akaike, 1974) balances fit quality against model complexity. Under a Gaussian residual assumption, and up to an additive constant that is the same for every model fitted to the same data (derivation in Appendix B):<\/p>\n\n\n\n<p style=\"background-color: #fff; border: none\">$$AIC = N \\cdot \\log(\\mathrm{MSE}) + 2 k$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003where $N$ is the number of data points (completed trials), MSE is the Mean Squared Error of the fitted curve, $\\log$ is the natural logarithm, and $k$ is the number of model parameters. The first term rewards low residual error; the second penalizes complexity. 
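<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003A worked example of the formula, and of why the $2k$ term matters once candidates differ in $k$. The values of $N$ and MSE below are hypothetical, chosen so that a 4-parameter model has a 1% lower MSE than a 3-parameter one:<\/p>

```python
import math

def aic(n_points, mse, k):
    """AIC = N*log(MSE) + 2k under Gaussian residuals (additive constants dropped)."""
    return n_points * math.log(mse) + 2 * k

N = 100                          # hypothetical number of trials
aic3 = aic(N, 1.00e-5, k=3)      # hypothetical 3-parameter fit
aic4 = aic(N, 0.99e-5, k=4)      # hypothetical 4-parameter fit, 1% lower MSE
print(f"3-param AIC = {aic3:.2f}")   # -1145.29
print(f"4-param AIC = {aic4:.2f}")   # -1144.30
print("winner:", "3-param" if aic3 < aic4 else "4-param")   # 3-param
```

<p class=\"wp-block-paragraph\">\u2003The lower MSE is worth only $N \\log(0.99) \\approx -1.0$ in AIC, less than the extra penalty of $2$, so the 3-parameter model wins even though an MSE or R\u00b2 ranking would pick the 4-parameter one.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003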
A smaller AIC indicates a better model.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003<strong>Why $k = 3$ for every model.<\/strong> All four saturation models are deliberately constructed with three parameters. This choice (i) gives a fair comparison between non-nested models, (ii) limits overfitting (three parameters suffice to express the ceiling, the curvature, and the timescale), and (iii) yields stable nonlinear fits.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003Because $2k = 6$ is identical for all four models, the AIC ranking reduces to an MSE ranking for this specific case (and, since every model is fitted to the same data, equivalently to an R\u00b2 ranking):<\/p>\n\n\n\n<p style=\"background-color: #fff; border: none\">$$\\arg\\min_i AIC_i = \\arg\\min_i N \\log(\\mathrm{MSE}_i) = \\arg\\min_i \\mathrm{MSE}_i$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003Even so, AIC is reported because it is the standard criterion, and its $2k$ penalty becomes active as soon as a 4-parameter model (e.g., Janoschek) is added. Appendix A shows why R\u00b2\/MSE is unreliable as soon as $k$ varies between candidates.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">3.3 Sample Output<\/h3>\n\n\n<style>.kadence-column6676_abf4d3-c5 > .kt-inside-inner-col,.kadence-column6676_abf4d3-c5 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_abf4d3-c5 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_abf4d3-c5 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_abf4d3-c5 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_abf4d3-c5 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_abf4d3-c5{position:relative;}.kadence-column6676_abf4d3-c5, .kt-inside-inner-col > .kadence-column6676_abf4d3-c5:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_abf4d3-c5 > 
.kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_abf4d3-c5 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_abf4d3-c5\"><div class=\"kt-inside-inner-col\">\n<pre style=\"font-family: consolas,monospace; font-size: 1.2rem; white-space: pre; line-height:1.2; background-color: #fff; border: none\">\nModel selection (AIC = N*log(MSE) + 2k, k=3):\n  exp       : MSE=7.74e-06, AIC=-7361.33   <- CHOSEN\n  logistic  : MSE=1.22e-05, AIC=-7079.04\n  hill      : MSE=1.99e-05, AIC=-6771.67\n  power     : MSE=2.03e-05, AIC=-6756.43\n\nChosen model: exp (lowest AIC)\nFitted params: Y_inf=0.3218, a=0.0368, tau=338.96\n<\/pre>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">4. Forward and Inverse Projection<\/h2>\n\n\n<style>.kadence-column6676_3cb847-fa > .kt-inside-inner-col,.kadence-column6676_3cb847-fa > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_3cb847-fa > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_3cb847-fa > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_3cb847-fa > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_3cb847-fa > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_3cb847-fa{position:relative;}.kadence-column6676_3cb847-fa, .kt-inside-inner-col > .kadence-column6676_3cb847-fa:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_3cb847-fa > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_3cb847-fa > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_3cb847-fa\"><div 
class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003Once a saturation curve $Y = f(n; \\hat{\\theta})$ is fitted, two questions can be answered in opposite directions:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Direction<\/th><th>Input<\/th><th>Output<\/th><th>Plain-English form<\/th><\/tr><\/thead><tbody><tr><td><strong>Forward<\/strong><\/td><td>$K$ (additional trials)<\/td><td>$Y_{N+K}$<\/td><td>\"If we run $K$ more trials, what will the best metric be?\"<\/td><\/tr><tr><td><strong>Inverse<\/strong><\/td><td>$T$ (target metric)<\/td><td>$K_{required}$<\/td><td>\"How many more trials are needed to reach $T$?\"<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003Forward is a direct evaluation of $f$; inverse is a root-finding problem on the same curve. Three cases bracket the inverse, where $Y_{current}$ is the observed best-so-far $Y_N$:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>$T \\le Y_{current}$ \u2014 already reached, $K = 0$.<\/li>\n\n\n\n<li>$Y_{current} \\lt T \\lt Y_\\infty$ \u2014 reachable; the inverse returns a finite $K$.<\/li>\n\n\n\n<li>$T \\ge Y_\\infty$ \u2014 unreachable; the inverse diverges.<\/li>\n<\/ul>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">4.1 Forward<\/h3>\n\n\n<style>.kadence-column6676_a641e1-1d > .kt-inside-inner-col,.kadence-column6676_a641e1-1d > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_a641e1-1d > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_a641e1-1d > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_a641e1-1d > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_a641e1-1d > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_a641e1-1d{position:relative;}.kadence-column6676_a641e1-1d, .kt-inside-inner-col > .kadence-column6676_a641e1-1d:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and 
(max-width: 1024px){.kadence-column6676_a641e1-1d > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_a641e1-1d > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_a641e1-1d\"><div class=\"kt-inside-inner-col\">\n<p style=\"background-color: #fff; border: none\">$$Y_{proj} = f(N + K; \\hat{\\theta})$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003This is evaluated for each $K$ in a user-supplied list (for example 50, 100, 500, 1000, 5000, 10000).<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">4.2 Inverse<\/h3>\n\n\n<style>.kadence-column6676_ea5497-2d > .kt-inside-inner-col,.kadence-column6676_ea5497-2d > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_ea5497-2d > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_ea5497-2d > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_ea5497-2d > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_ea5497-2d > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_ea5497-2d{position:relative;}.kadence-column6676_ea5497-2d, .kt-inside-inner-col > .kadence-column6676_ea5497-2d:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_ea5497-2d > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_ea5497-2d > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_ea5497-2d\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003Solve $f(N + K; \\hat{\\theta}) = T$ for $K$. 
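<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003A minimal sketch of the inverse for the exponential model follows. It is solved both with the closed form tabulated below and with <code>scipy.optimize.brentq<\/code>, and it covers the three cases listed above. The parameters are the fitted exp values from the sample output in \u00a73.3. $N = 600$ and the targets are hypothetical, since the sample does not report them, and the helper names are illustrative.<\/p>

```python
import math
from scipy.optimize import brentq

Y_inf, a, tau = 0.3218, 0.0368, 338.96    # exp fit from the sample output
N = 600                                   # hypothetical: trials run so far

def f_exp(n):
    """Fitted exponential saturation curve: Y_n = Y_inf - a * exp(-n / tau)."""
    return Y_inf - a * math.exp(-n / tau)

def k_closed_form(T):
    """Closed-form inverse: K = -tau * log((Y_inf - T) / a) - N."""
    return -tau * math.log((Y_inf - T) / a) - N

def k_required(T, y_current, k_max=1e7):
    """Additional trials needed to reach T, by root-finding, with the three cases."""
    if T <= y_current:
        return 0.0                        # already reached
    if T >= Y_inf:
        return math.inf                   # unreachable: at or above the ceiling
    g = lambda K: f_exp(N + K) - T
    if g(0.0) >= 0:
        return 0.0                        # fitted curve already at or above T at N
    return brentq(g, 0.0, k_max)          # g(0) < 0 < g(k_max) brackets the root

y_current = f_exp(N)                      # stand-in for the observed best-so-far
for T in (0.318, 0.320, 0.3218):
    print(T, k_required(T, y_current))
print("closed form, T = 0.320:", k_closed_form(0.320))
```

<p class=\"wp-block-paragraph\">\u2003In a real run <code>y_current<\/code> would be the observed best $Y_N$, not the fitted value. The guard on <code>g(0.0)<\/code> covers the case in which the fitted curve at $N$ already exceeds $T$ although the observed best does not; the closed form would then return a negative $K$.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003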
Each model admits a closed-form inverse, defined for $T \\lt Y_\\infty$ (and, for Hill and logistic, $T \\gt 0$):<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Model<\/th><th>Inverse<\/th><\/tr><\/thead><tbody><tr><td>Exponential<\/td><td>$K = -\\tau \\log\\!\\left(\\frac{Y_\\infty - T}{a}\\right) - N$<\/td><\/tr><tr><td>Power law<\/td><td>$K = \\left(\\frac{a}{Y_\\infty - T}\\right)^{1\/c} - N$<\/td><\/tr><tr><td>Hill<\/td><td>$K = \\left(\\frac{k^c \\, T}{Y_\\infty - T}\\right)^{1\/c} - N$<\/td><\/tr><tr><td>Logistic<\/td><td>$K = n_0 - \\frac{1}{a} \\log\\!\\left(\\frac{Y_\\infty - T}{T}\\right) - N$<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003In practice, a single <code>scipy.optimize.brentq<\/code> root-finding call, routed through the same model dispatcher, covers all four models.<\/p>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">5. Python Code<\/h2>\n\n\n<style>.kadence-column6676_601acd-88 > .kt-inside-inner-col,.kadence-column6676_601acd-88 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_601acd-88 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_601acd-88 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_601acd-88 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_601acd-88 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_601acd-88{position:relative;}.kadence-column6676_601acd-88, .kt-inside-inner-col > .kadence-column6676_601acd-88:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_601acd-88 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_601acd-88 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_601acd-88\"><div class=\"kt-inside-inner-col\">\n<p 
class=\"wp-block-paragraph\">\u2003The full reference implementation. Python 3.8+, depends only on <code>numpy<\/code>, <code>scipy<\/code>, and <code>optuna<\/code>.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">5.1 Full Implementation<\/h3>\n\n\n<style>.kadence-column6676_2046ab-26 > .kt-inside-inner-col,.kadence-column6676_2046ab-26 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_2046ab-26 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_2046ab-26 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_2046ab-26 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_2046ab-26 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_2046ab-26{position:relative;}.kadence-column6676_2046ab-26, .kt-inside-inner-col > .kadence-column6676_2046ab-26:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_2046ab-26 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_2046ab-26 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_2046ab-26\"><div class=\"kt-inside-inner-col\"><div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n&quot;&quot;&quot;Optuna best-so-far saturation projection.\n\nImportable module + standalone CLI for forward projection\n(expected best after K more trials) and inverse projection\n(trials needed to reach a target T).\n&quot;&quot;&quot;\nimport argparse\nimport numpy as np\nfrom scipy.optimize import curve_fit, brentq\n\n\n# --- Four saturation models (all 3-parameter, asymptotic Y_inf) ---\n\ndef _model_exp(n, Y_inf, a, tau):\n    &quot;&quot;&quot;Y_n = Y_inf - a * exp(-n 
\/ tau).&quot;&quot;&quot;\n    return Y_inf - a * np.exp(-n \/ tau)\n\n\ndef _model_power(n, Y_inf, a, c):\n    &quot;&quot;&quot;Y_n = Y_inf - a * n^(-c).&quot;&quot;&quot;\n    return Y_inf - a * np.power(n, -c)\n\n\ndef _model_hill(n, Y_inf, k, c):\n    &quot;&quot;&quot;Y_n = Y_inf * n^c \/ (k^c + n^c).&quot;&quot;&quot;\n    return Y_inf * np.power(n, c) \/ (np.power(k, c) + np.power(n, c))\n\n\ndef _model_logistic(n, Y_inf, a, n0):\n    &quot;&quot;&quot;Y_n = Y_inf \/ (1 + exp(-a * (n - n0))).&quot;&quot;&quot;\n    z = np.clip(-a * (n - n0), -50, 50)\n    return Y_inf \/ (1 + np.exp(z))\n\n\nSATURATION_MODELS = {\n    &#039;exp&#039;:      {&#039;f&#039;: _model_exp,      &#039;pnames&#039;: (&#039;Y_inf&#039;, &#039;a&#039;, &#039;tau&#039;)},\n    &#039;power&#039;:    {&#039;f&#039;: _model_power,    &#039;pnames&#039;: (&#039;Y_inf&#039;, &#039;a&#039;, &#039;c&#039;)},\n    &#039;hill&#039;:     {&#039;f&#039;: _model_hill,     &#039;pnames&#039;: (&#039;Y_inf&#039;, &#039;k&#039;, &#039;c&#039;)},\n    &#039;logistic&#039;: {&#039;f&#039;: _model_logistic, &#039;pnames&#039;: (&#039;Y_inf&#039;, &#039;a&#039;, &#039;n0&#039;)},\n}\n\n\n# --- Fit and auto-selection ---\n\ndef fit_saturation_model(name, n_arr, Y_n):\n    &quot;&quot;&quot;Fit one saturation model. 
Returns a dict with f, popt, mse.&quot;&quot;&quot;\n    model = SATURATION_MODELS&#x5B;name]\n    f = model&#x5B;&#039;f&#039;]\n    Y_curr = float(Y_n&#x5B;-1])\n    gap = max(0.01, Y_curr - float(Y_n&#x5B;0]))\n    N = len(n_arr)\n    if name == &#039;exp&#039;:\n        p0 = &#x5B;Y_curr + 0.01, gap, max(50.0, N \/ 3.0)]\n        bounds = (&#x5B;Y_curr, 0.0, 1.0], &#x5B;np.inf, np.inf, 1e6])\n    elif name == &#039;power&#039;:\n        p0 = &#x5B;Y_curr + 0.01, gap, 0.5]\n        bounds = (&#x5B;Y_curr, 0.0, 0.001], &#x5B;np.inf, np.inf, 5.0])\n    elif name == &#039;hill&#039;:\n        p0 = &#x5B;Y_curr + 0.01, max(1.0, N \/ 2.0), 1.0]\n        bounds = (&#x5B;Y_curr, 1.0, 0.001], &#x5B;np.inf, 1e6, 10.0])\n    else:  # logistic\n        p0 = &#x5B;Y_curr + 0.01, 0.01, max(1.0, N \/ 2.0)]\n        bounds = (&#x5B;Y_curr, 0.0001, -1e3], &#x5B;np.inf, 10.0, 1e6])\n    popt, pcov = curve_fit(f, n_arr, Y_n, p0=p0, bounds=bounds,\n                           maxfev=10_000)\n    mse = float(np.mean((Y_n - f(n_arr, *popt)) ** 2))\n    return {&#039;f&#039;: f, &#039;popt&#039;: popt, &#039;pcov&#039;: pcov, &#039;mse&#039;: mse,\n            &#039;pnames&#039;: model&#x5B;&#039;pnames&#039;], &#039;name&#039;: name}\n\n\ndef select_best_saturation_model(n_arr, Y_n):\n    &quot;&quot;&quot;Fit all four models; return (fits_dict, chosen_name)\n    with the lowest AIC. 
Failed fits are skipped.&quot;&quot;&quot;\n    fits = {}\n    for name in SATURATION_MODELS:\n        try:\n            fits&#x5B;name] = fit_saturation_model(name, n_arr, Y_n)\n        except Exception as e:\n            print(f&#039;  &#x5B;auto] {name:&gt;9s} fit FAILED: {e}&#039;)\n    if not fits:\n        raise RuntimeError(&#039;All saturation model fits failed&#039;)\n    N, k = len(n_arr), 3\n    for name in fits:\n        fits&#x5B;name]&#x5B;&#039;aic&#039;] = (\n            N * np.log(fits&#x5B;name]&#x5B;&#039;mse&#039;] + 1e-12) + 2 * k)\n    chosen = min(fits, key=lambda nm: fits&#x5B;nm]&#x5B;&#039;aic&#039;])\n    return fits, chosen\n\n\n# --- Forward and inverse projection ---\n\ndef project_forward(fit, N, K):\n    &quot;&quot;&quot;Y_(N+K) from the fitted curve.&quot;&quot;&quot;\n    return float(fit&#x5B;&#039;f&#039;](N + K, *fit&#x5B;&#039;popt&#039;]))\n\n\ndef project_inverse(fit, N, T, K_max=1e8):\n    &quot;&quot;&quot;K such that Y_(N+K) = T. Returns 0 if already reached,\n    inf if unreachable, otherwise the brentq solution.&quot;&quot;&quot;\n    f, popt = fit&#x5B;&#039;f&#039;], fit&#x5B;&#039;popt&#039;]\n    Y_curr = float(f(N, *popt))\n    Y_inf = float(popt&#x5B;0])\n    if T &lt;= Y_curr:\n        return 0.0\n    if T &gt;= Y_inf:\n        return float(&#039;inf&#039;)\n    def g(K, _T=T):\n        return f(N + K, *popt) - _T\n    if g(K_max) &lt; 0:\n        return float(&#039;inf&#039;)\n    return brentq(g, 0, K_max, xtol=1.0)\n\n\n# --- End-to-end pipeline ---\n\ndef print_extrapolation(study, *,\n                        model=&#039;auto&#039;,\n                        K_values=None,\n                        target_values=None,\n                        exclude_value_below=None):\n    &quot;&quot;&quot;Fit a saturation curve to study.trials and print forward\n    and inverse projection tables.&quot;&quot;&quot;\n    import optuna\n    if K_values is None:\n        K_values = &#x5B;50, 100, 500, 1000, 5000, 10000]\n    
trials = &#x5B;t for t in study.trials\n              if t.state == optuna.trial.TrialState.COMPLETE\n              and t.value is not None]\n    if exclude_value_below is not None:\n        trials = &#x5B;t for t in trials if t.value &gt; exclude_value_below]\n    trials.sort(key=lambda t: t.number)\n    if len(trials) &lt; 10:\n        raise RuntimeError(\n            f&#039;need &gt;= 10 COMPLETE trials, got {len(trials)}&#039;)\n\n    values = np.array(&#x5B;t.value for t in trials], dtype=float)\n    Y_n = np.maximum.accumulate(values)\n    n_arr = np.arange(1, len(Y_n) + 1)\n    N, Y_curr = len(n_arr), float(Y_n&#x5B;-1])\n\n    if model == &#039;auto&#039;:\n        fits, chosen = select_best_saturation_model(n_arr, Y_n)\n        for name in sorted(fits, key=lambda nm: fits&#x5B;nm]&#x5B;&#039;aic&#039;]):\n            tag = &#039;  &lt;- CHOSEN&#039; if name == chosen else &#039;&#039;\n            print(f&#039;    {name:&gt;9s}: MSE={fits&#x5B;name]&#x5B;&quot;mse&quot;]:.6e}, &#039;\n                  f&#039;AIC={fits&#x5B;name]&#x5B;&quot;aic&quot;]:+.2f}{tag}&#039;)\n        fit = fits&#x5B;chosen]\n    else:\n        fit = fit_saturation_model(model, n_arr, Y_n)\n        fit&#x5B;&#039;aic&#039;] = N * np.log(fit&#x5B;&#039;mse&#039;] + 1e-12) + 2 * 3\n        chosen = model\n\n    print(f&#039;\\n  Forward -- K -&gt; Y_(N+K), model={chosen}&#039;)\n    for K in K_values:\n        Y = project_forward(fit, N, K)\n        print(f&#039;    K={K:&gt;6d}  -&gt;  Y={Y:+.4f}  (gain {Y - Y_curr:+.4f})&#039;)\n\n    Y_inf = float(fit&#x5B;&#039;popt&#039;]&#x5B;0])\n    if target_values is None:\n        target_values = list(np.linspace(Y_curr + 0.001,\n                                         Y_inf - 0.001, 5))\n    print(f&#039;\\n  Inverse -- T -&gt; required K, model={chosen}&#039;)\n    for T in target_values:\n        K = project_inverse(fit, N, T)\n        K_str = &#039;inf&#039; if K == float(&#039;inf&#039;) else f&#039;{K:.0f}&#039;\n        
print(f&#039;    T={T:+.4f}  -&gt;  K={K_str}&#039;)\n\n\n# --- CLI ---\n\ndef parse_args():\n    ap = argparse.ArgumentParser(\n        description=&#039;Optuna best-so-far saturation projection.&#039;)\n    ap.add_argument(&#039;--storage&#039;, required=True, type=str,\n                    help=&#039;Optuna storage URL (e.g., sqlite:\/\/\/optuna.db)&#039;)\n    ap.add_argument(&#039;--study_name&#039;, required=True, type=str)\n    ap.add_argument(&#039;--model&#039;, default=&#039;auto&#039;, type=str.lower,\n                    choices=&#x5B;&#039;auto&#039;] + list(SATURATION_MODELS.keys()))\n    ap.add_argument(&#039;--K_values&#039;,\n                    default=&#039;50,100,500,1000,5000,10000&#039;,\n                    type=lambda s: &#x5B;int(t.strip()) for t in s.split(&#039;,&#039;)\n                                    if t.strip()])\n    ap.add_argument(&#039;--target_values&#039;, default=None,\n                    type=lambda s: (&#x5B;float(t.strip()) for t in s.split(&#039;,&#039;)\n                                     if t.strip()] if s else None))\n    ap.add_argument(&#039;--exclude_value_below&#039;, default=None, type=float)\n    return ap.parse_args()\n\n\nif __name__ == &#039;__main__&#039;:\n    import optuna\n    args = parse_args()\n    study = optuna.load_study(\n        study_name=args.study_name, storage=args.storage)\n    print_extrapolation(\n        study=study,\n        model=args.model,\n        K_values=args.K_values,\n        target_values=args.target_values,\n        exclude_value_below=args.exclude_value_below)\n\n<\/pre><\/div><\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">5.2 Function Reference<\/h3>\n\n\n<style>.kadence-column6676_abf7ab-ed > .kt-inside-inner-col,.kadence-column6676_abf7ab-ed > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_abf7ab-ed > 
.kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_abf7ab-ed > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_abf7ab-ed > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_abf7ab-ed > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_abf7ab-ed{position:relative;}.kadence-column6676_abf7ab-ed, .kt-inside-inner-col > .kadence-column6676_abf7ab-ed:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_abf7ab-ed > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_abf7ab-ed > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_abf7ab-ed\"><div class=\"kt-inside-inner-col\">\n<figure class=\"wp-block-table\"><table><thead><tr><th>Symbol<\/th><th>Role<\/th><\/tr><\/thead><tbody><tr><td><code>_model_exp \/ _model_power \/ _model_hill \/ _model_logistic<\/code><\/td><td>The four saturation forms from \u00a72.<\/td><\/tr><tr><td><code>SATURATION_MODELS<\/code><\/td><td>Registry mapping each model name to its function and parameter names.<\/td><\/tr><tr><td><code>fit_saturation_model(name, n_arr, Y_n)<\/code><\/td><td>Fit a single model with per-model starting values and bounds; the lower bound on $Y_\\infty$ enforces $Y_\\infty \\ge Y_{current}$.<\/td><\/tr><tr><td><code>select_best_saturation_model(n_arr, Y_n)<\/code><\/td><td>Fit all four models (skipping any that fail), compute AIC, and return the fits dictionary together with the name of the lowest-AIC model.<\/td><\/tr><tr><td><code>project_forward(fit, N, K)<\/code><\/td><td>Direct evaluation of $Y_{N+K}$ on the fitted curve.<\/td><\/tr><tr><td><code>project_inverse(fit, N, T)<\/code><\/td><td>Root-finding for the required $K$; returns 0 if $T$ is already reached and <code>inf<\/code> if it is unreachable.<\/td><\/tr><tr><td><code>print_extrapolation(study, ...)<\/code><\/td><td>End-to-end: filter COMPLETE trials \u2192 fit \u2192 print both tables.<\/td><\/tr><tr><td><code>parse_args()<\/code> \/ <code>__main__<\/code><\/td><td>Command-line entry 
point.<\/td><\/tr><\/tbody><\/table><\/figure>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">5.3 CLI Usage<\/h3>\n\n\n<style>.kadence-column6676_a1c1d0-d5 > .kt-inside-inner-col,.kadence-column6676_a1c1d0-d5 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_a1c1d0-d5 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_a1c1d0-d5 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_a1c1d0-d5 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_a1c1d0-d5 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_a1c1d0-d5{position:relative;}.kadence-column6676_a1c1d0-d5, .kt-inside-inner-col > .kadence-column6676_a1c1d0-d5:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_a1c1d0-d5 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_a1c1d0-d5 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_a1c1d0-d5\"><div class=\"kt-inside-inner-col\"><div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: bash; title: ; notranslate\" title=\"\">\n# Auto mode (default \u2014 fit all four, pick lowest AIC)\npython optuna_projection.py \\\n    --storage sqlite:\/\/\/path\/to\/optuna.db \\\n    --study_name &lt;study_name&gt; \\\n    --model auto\n# Force the exponential model\npython optuna_projection.py \\\n    --storage sqlite:\/\/\/path\/to\/optuna.db \\\n    --study_name &lt;study_name&gt; \\\n    --model exp \\\n    --target_values 0.32,0.35,0.40\n# Cross-check all four models\nfor m in exp power hill logistic; do\n    python optuna_projection.py \\\n        --storage 
sqlite:\/\/\/path\/to\/optuna.db \\\n        --study_name &lt;study_name&gt; \\\n        --model $m\ndone\n<\/pre><\/div><\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">5.4 Example Output<\/h3>\n\n\n<style>.kadence-column6676_4c1f73-33 > .kt-inside-inner-col,.kadence-column6676_4c1f73-33 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_4c1f73-33 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_4c1f73-33 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_4c1f73-33 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_4c1f73-33 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_4c1f73-33{position:relative;}.kadence-column6676_4c1f73-33, .kt-inside-inner-col > .kadence-column6676_4c1f73-33:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_4c1f73-33 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_4c1f73-33 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_4c1f73-33\"><div class=\"kt-inside-inner-col\">\n<pre style=\"font-family: consolas,monospace; font-size: 1.2rem; white-space: pre; line-height:1.2; background-color: #fff; border: none\">\n[saturation] N_trials=626, Y_current=+0.3136\n  Model selection (AIC = N*log(MSE) + 2k, k=3):\n          exp: MSE=7.74e-06, AIC=-7361.33   &lt;- CHOSEN\n     logistic: MSE=1.22e-05, AIC=-7079.04\n         hill: MSE=1.99e-05, AIC=-6771.67\n        power: MSE=2.03e-05, AIC=-6756.43\n  Fitted params: Y_inf=0.3218, a=0.0368, tau=338.96\n  Y_inf (asymptotic ceiling) = +0.3218, gap = +0.0082\n\n  Forward -- K -> Y_(N+K)        Inverse -- T -> required K\n    K=    50  ->  +0.3168          
T=+0.3200  ->  K = 400\n    K=   100  ->  +0.3175          T=+0.3500  ->  unreachable\n    K=  1000  ->  +0.3215          (T &gt;= Y_inf)\n    K= 10000  ->  +0.3218\n<\/pre>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">6. Limitations and Caveats<\/h2>\n\n\n<style>.kadence-column6676_d857ef-b8 > .kt-inside-inner-col,.kadence-column6676_d857ef-b8 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_d857ef-b8 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_d857ef-b8 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_d857ef-b8 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_d857ef-b8 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_d857ef-b8{position:relative;}.kadence-column6676_d857ef-b8, .kt-inside-inner-col > .kadence-column6676_d857ef-b8:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_d857ef-b8 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_d857ef-b8 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_d857ef-b8\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003Saturation-curve projection is conservative and assumption-light, but it has well-defined failure modes. 
The tables below separate <em>what the method captures faithfully<\/em> from <em>what it misses<\/em>.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">6.1 Underlying Assumptions<\/h3>\n\n\n<style>.kadence-column6676_2f9850-ab > .kt-inside-inner-col,.kadence-column6676_2f9850-ab > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_2f9850-ab > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_2f9850-ab > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_2f9850-ab > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_2f9850-ab > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_2f9850-ab{position:relative;}.kadence-column6676_2f9850-ab, .kt-inside-inner-col > .kadence-column6676_2f9850-ab:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_2f9850-ab > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_2f9850-ab > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_2f9850-ab\"><div class=\"kt-inside-inner-col\">\n<figure class=\"wp-block-table\"><table><thead><tr><th>Assumption<\/th><th>Meaning<\/th><th>Effect when violated<\/th><\/tr><\/thead><tbody><tr><td><strong>Saturation form<\/strong><\/td><td>The best-so-far trajectory follows one of the four families.<\/td><td>Other shapes, such as multi-stage staircases or late acceleration, yield poor fits across all four. (A running maximum can never decrease.)<\/td><\/tr><tr><td><strong>Existence of asymptote<\/strong><\/td><td>The best-so-far value under TPE eventually plateaus.<\/td><td>For studies still climbing, $Y_\\infty$ estimates are unstable.<\/td><\/tr><tr><td><strong>Fixed HP space<\/strong><\/td><td>The search space does not change during the run.<\/td><td>Adding or removing HPs requires refitting 
from scratch.<\/td><\/tr><tr><td><strong>Stationary noise<\/strong><\/td><td>Trial-to-trial variance is homoscedastic.<\/td><td>Heteroscedasticity calls for weighted least squares.<\/td><\/tr><\/tbody><\/table><\/figure>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">6.2 Good Projection vs Bad Projection<\/h3>\n\n\n<style>.kadence-column6676_46164f-cc > .kt-inside-inner-col,.kadence-column6676_46164f-cc > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_46164f-cc > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_46164f-cc > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_46164f-cc > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_46164f-cc > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_46164f-cc{position:relative;}.kadence-column6676_46164f-cc, .kt-inside-inner-col > .kadence-column6676_46164f-cc:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_46164f-cc > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_46164f-cc > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_46164f-cc\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003<strong>Good Projection \u2014 what is captured well<\/strong>:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Aspect<\/th><th>Why it works<\/th><\/tr><\/thead><tbody><tr><td>Result of TPE's active narrowing<\/td><td>The best-so-far curve <em>is<\/em> the product of narrowing; fitting it learns the narrowing directly.<\/td><\/tr><tr><td>Diminishing returns<\/td><td>All four forms are inherently diminishing-returns shapes.<\/td><\/tr><tr><td>Asymptotic ceiling 
$Y_\\infty$<\/td><td>The fitted asymptote is the natural estimate of TPE's reachable maximum.<\/td><\/tr><tr><td>HP-space dimensionality<\/td><td>HP coordinates are unused, so 5-D and 50-D HP spaces share the same framework.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003<strong>Bad Projection \u2014 what is missed<\/strong>:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Limitation<\/th><th>Why it is missed<\/th><\/tr><\/thead><tbody><tr><td>Sudden discovery jumps<\/td><td>A newly found productive HP region creates a step in the best-so-far curve; a smooth saturation curve smears past steps and cannot anticipate future ones, so it underestimates the upside.<\/td><\/tr><tr><td>Effect of HP-space expansion<\/td><td>The fit uses historical data only; adding HPs starts a new trajectory that the old fit cannot describe.<\/td><\/tr><tr><td>Shapes outside the four families<\/td><td>Multi-stage saturation (a renewed climb after a plateau) or positive curvature (late acceleration) cannot be expressed by any of the four forms.<\/td><\/tr><tr><td>Non-Gaussian residual tails<\/td><td>The Gaussian likelihood behind the compact AIC no longer holds, so absolute AIC values lose meaning; relative rankings are usually still informative.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003<strong>How to act on this<\/strong>: trust the projection for \"expected $Y_{N+K}$ with $K \\le 10000$\" but treat the report as silent about sudden jumps or HP-space expansions; those questions require complementary diagnostics.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">6.3 Recommended Usage<\/h3>\n\n\n<style>.kadence-column6676_99b169-22 > .kt-inside-inner-col,.kadence-column6676_99b169-22 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_99b169-22 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_99b169-22 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_99b169-22 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_99b169-22 > 
.kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_99b169-22{position:relative;}.kadence-column6676_99b169-22, .kt-inside-inner-col > .kadence-column6676_99b169-22:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_99b169-22 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_99b169-22 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_99b169-22\"><div class=\"kt-inside-inner-col\">\n<ul class=\"wp-block-list\">\n<li>Refit every ~100 trials to track drift in $\\hat{Y}_\\infty$.<\/li>\n\n\n\n<li>Even in auto mode, sanity-check the automatic choice by running each of the four models explicitly (e.g., <code>--model exp<\/code>) and comparing the projections.<\/li>\n\n\n\n<li>If $\\hat{Y}_\\infty - Y_{current}$ falls below a decision threshold (say 0.05), additional trials are unlikely to help; consider structural changes (HP-space redesign, data augmentation, model change).<\/li>\n\n\n\n<li>Optionally cross-check with a surrogate-based projection (GP) and a PFN-based projection (Adriaensen et al., 2023). The saturation curve is typically the most conservative of the three and the surrogate the most optimistic; wide disagreement suggests that one of the assumptions in \u00a76.1 is broken.<\/li>\n<\/ul>\n<\/div><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" style=\"margin-top:var(--wp--preset--spacing--60);margin-bottom:var(--wp--preset--spacing--60)\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Appendix A. 
Why R\u00b2 \/ MSE Fails When k Varies<\/h2>\n\n\n<style>.kadence-column6676_885038-61 > .kt-inside-inner-col,.kadence-column6676_885038-61 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_885038-61 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_885038-61 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_885038-61 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_885038-61 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_885038-61{position:relative;}.kadence-column6676_885038-61, .kt-inside-inner-col > .kadence-column6676_885038-61:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_885038-61 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_885038-61 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_885038-61\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003This appendix explains why R\u00b2 and MSE alone are unreliable selection criteria as soon as candidate models have different numbers of parameters.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">A.1 The Monotone Improvement Property<\/h3>\n\n\n<style>.kadence-column6676_bf6cdb-5d > .kt-inside-inner-col,.kadence-column6676_bf6cdb-5d > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_bf6cdb-5d > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_bf6cdb-5d > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_bf6cdb-5d > .kt-inside-inner-col > 
.aligncenter{width:100%;}.kadence-column6676_bf6cdb-5d > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_bf6cdb-5d{position:relative;}.kadence-column6676_bf6cdb-5d, .kt-inside-inner-col > .kadence-column6676_bf6cdb-5d:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_bf6cdb-5d > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_bf6cdb-5d > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_bf6cdb-5d\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003Suppose $M_3$ has 3 parameters and $M_4$ has 4, with the function space of $M_4$ a superset of $M_3$'s (nested models). For least-squares fits on the same data, assuming each fit reaches its global minimum,<\/p>\n\n\n\n<p style=\"background-color: #fff; border: none\">$$\\mathrm{SSres}(M_4) \\le \\mathrm{SSres}(M_3) \\;\\Rightarrow\\; \\mathrm{MSE}(M_4) \\le \\mathrm{MSE}(M_3) \\;\\Rightarrow\\; R^2(M_4) \\ge R^2(M_3).$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003Adding parameters to a nested model therefore never worsens R\u00b2\/MSE; equality holds only when the best $M_4$ fit already lies inside $M_3$ (e.g., the extra parameter fits to exactly zero). 
Picking by R\u00b2 alone always favors the more complex model.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">A.2 Polynomial Regression Illustration<\/h3>\n\n\n<style>.kadence-column6676_bb9493-87 > .kt-inside-inner-col,.kadence-column6676_bb9493-87 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_bb9493-87 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_bb9493-87 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_bb9493-87 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_bb9493-87 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_bb9493-87{position:relative;}.kadence-column6676_bb9493-87, .kt-inside-inner-col > .kadence-column6676_bb9493-87:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_bb9493-87 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_bb9493-87 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_bb9493-87\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003Fit polynomials of degree 1, 3, and 8 to $N = 10$ noisy samples whose true generating function is quadratic. 
Compare AIC against R\u00b2:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Model<\/th><th>$k$<\/th><th>MSE<\/th><th>R\u00b2<\/th><th>$AIC = N \\log(\\mathrm{MSE}) + 2k$<\/th><th>R\u00b2 rank<\/th><th>AIC rank<\/th><\/tr><\/thead><tbody><tr><td>Linear<\/td><td>2<\/td><td>0.030<\/td><td>0.70<\/td><td>$10 \\ln(0.030)+4 = -31.1$<\/td><td>worst<\/td><td>worst<\/td><\/tr><tr><td>Cubic<\/td><td>4<\/td><td>0.005<\/td><td>0.95<\/td><td>$10 \\ln(0.005)+8 = -45.0$<\/td><td>middle<\/td><td><strong>best<\/strong><\/td><\/tr><tr><td>Degree 8<\/td><td>9<\/td><td>0.003<\/td><td><strong>0.97<\/strong><\/td><td>$10 \\ln(0.003)+18 = -40.1$<\/td><td><strong>best<\/strong><\/td><td>middle<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003R\u00b2\/MSE picks the degree-8 polynomial, an overfit that chases the noise. AIC picks the cubic, the smallest candidate that contains the quadratic generator. The trade-off is quantitative:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Comparison<\/th><th>$N \\log$ improvement<\/th><th>$2k$ penalty change<\/th><th>$\\Delta$AIC<\/th><\/tr><\/thead><tbody><tr><td>Cubic \u2192 Degree 8<\/td><td>$10 \\ln(0.003\/0.005) = -5.1$<\/td><td>$+10$<\/td><td>$+4.9$ (worse)<\/td><\/tr><tr><td>Linear \u2192 Cubic<\/td><td>$10 \\ln(0.005\/0.030) = -17.9$<\/td><td>$+4$<\/td><td>$-13.9$ (better)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003Each extra parameter must improve $\\log(\\mathrm{MSE})$ by at least $2\/N$ for AIC to accept it. 
R\u00b2\/MSE has no such rule, which is why it slides toward overfit.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">A.3 Same-k Comparisons Are Safe<\/h3>\n\n\n<style>.kadence-column6676_68a961-09 > .kt-inside-inner-col,.kadence-column6676_68a961-09 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_68a961-09 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_68a961-09 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_68a961-09 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_68a961-09 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_68a961-09{position:relative;}.kadence-column6676_68a961-09, .kt-inside-inner-col > .kadence-column6676_68a961-09:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_68a961-09 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_68a961-09 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_68a961-09\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003The monotone improvement property requires nested models with different $k$. When all candidates have the same $k$ and belong to different function families \u2014 as in our four saturation models \u2014 R\u00b2, MSE, and AIC produce identical rankings. The penalty matters only when a 4-parameter model (e.g., Janoschek) joins the candidate set.<\/p>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Appendix B. 
AIC Under Gaussian Residuals \u2014 Derivation<\/h2>\n\n\n<style>.kadence-column6676_45b19c-3f > .kt-inside-inner-col,.kadence-column6676_45b19c-3f > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_45b19c-3f > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_45b19c-3f > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_45b19c-3f > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_45b19c-3f > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_45b19c-3f{position:relative;}.kadence-column6676_45b19c-3f, .kt-inside-inner-col > .kadence-column6676_45b19c-3f:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_45b19c-3f > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_45b19c-3f > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_45b19c-3f\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003The general AIC (Akaike, 1974) is<\/p>\n\n\n\n<p style=\"background-color: #fff; border: none\">$$AIC = -2 \\log \\hat{L} + 2 k,$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003where $\\hat{L}$ is the maximum value of the likelihood. 
The compact form $AIC = N \\log(\\mathrm{MSE}) + 2k$ used in \u00a73.2 follows once Gaussian residuals are assumed and a model-independent additive constant is dropped.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">B.1 What Is the Likelihood $L$?<\/h3>\n\n\n<style>.kadence-column6676_57745e-f2 > .kt-inside-inner-col,.kadence-column6676_57745e-f2 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_57745e-f2 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_57745e-f2 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_57745e-f2 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_57745e-f2 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_57745e-f2{position:relative;}.kadence-column6676_57745e-f2, .kt-inside-inner-col > .kadence-column6676_57745e-f2:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_57745e-f2 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_57745e-f2 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_57745e-f2\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003The <strong>likelihood<\/strong> $L$ is the joint probability density of the observed data $\\{(x_i, y_i)\\}_{i=1}^N$, viewed as a function of the model parameters $(\\theta, \\sigma^2)$ with the data held fixed.
Informally: \"If these parameters are correct, how plausible is the data we actually observed?\"<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">B.2 Setup and Notation<\/h3>\n\n\n<style>.kadence-column6676_997170-28 > .kt-inside-inner-col,.kadence-column6676_997170-28 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_997170-28 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_997170-28 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_997170-28 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_997170-28 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_997170-28{position:relative;}.kadence-column6676_997170-28, .kt-inside-inner-col > .kadence-column6676_997170-28:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_997170-28 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_997170-28 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_997170-28\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003Regression model: $y_i = f(x_i; \\theta) + \\varepsilon_i$ with $\\varepsilon_i \\sim \\mathcal{N}(0, \\sigma^2)$ independent and identically distributed (i.i.d.). 
Let $\\varphi(x; \\mu, \\sigma^2) = (2\\pi\\sigma^2)^{-1\/2} \\exp\\!\\big(-(x-\\mu)^2 \/ (2\\sigma^2)\\big)$ denote the Gaussian density.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">B.3 Step 1 \u2014 Density of a Single Observation<\/h3>\n\n\n<style>.kadence-column6676_ec36bd-0d > .kt-inside-inner-col,.kadence-column6676_ec36bd-0d > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_ec36bd-0d > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_ec36bd-0d > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_ec36bd-0d > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_ec36bd-0d > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_ec36bd-0d{position:relative;}.kadence-column6676_ec36bd-0d, .kt-inside-inner-col > .kadence-column6676_ec36bd-0d:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_ec36bd-0d > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_ec36bd-0d > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_ec36bd-0d\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003Conditional on $x_i, \\theta, \\sigma^2$, the response $y_i$ is normally distributed with mean $f(x_i; \\theta)$ and variance $\\sigma^2$. 
Its density is<\/p>\n\n\n\n<p style=\"background-color: #fff; border: none\">$$p(y_i \\mid x_i, \\theta, \\sigma^2) = \\varphi\\!\\big(y_i - f(x_i; \\theta);\\, 0,\\, \\sigma^2\\big),$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003i.e., the Gaussian density evaluated at the residual $r_i = y_i - f(x_i; \\theta)$.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">B.4 Step 2 \u2014 Joint Likelihood<\/h3>\n\n\n<style>.kadence-column6676_b4d18b-95 > .kt-inside-inner-col,.kadence-column6676_b4d18b-95 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_b4d18b-95 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_b4d18b-95 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_b4d18b-95 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_b4d18b-95 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_b4d18b-95{position:relative;}.kadence-column6676_b4d18b-95, .kt-inside-inner-col > .kadence-column6676_b4d18b-95:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_b4d18b-95 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_b4d18b-95 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_b4d18b-95\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003Because the observations are independent, their joint density factorizes into a product of single-observation densities:<\/p>\n\n\n\n<p style=\"background-color: #fff; border: none\">$$L(\\theta, \\sigma^2) = \\prod_{i=1}^N \\varphi\\!\\big(y_i - f(x_i; \\theta);\\, 0,\\, \\sigma^2\\big).$$<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">B.5 Step 3 \u2014 Take the
Log<\/h3>\n\n\n<style>.kadence-column6676_16c18d-f5 > .kt-inside-inner-col,.kadence-column6676_16c18d-f5 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_16c18d-f5 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_16c18d-f5 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_16c18d-f5 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_16c18d-f5 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_16c18d-f5{position:relative;}.kadence-column6676_16c18d-f5, .kt-inside-inner-col > .kadence-column6676_16c18d-f5:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_16c18d-f5 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_16c18d-f5 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_16c18d-f5\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003A product of $N$ density values underflows to zero in floating-point arithmetic when $N$ is large and the factors are mostly below 1. Taking the logarithm turns the product into a sum, which is numerically stable and easier to differentiate. Because $\\log$ is strictly increasing, $\\arg\\max L = \\arg\\max \\log L$.
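<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003For a sense of scale, consider an illustrative case not taken from any Optuna run. If each of $N = 1000$ factors were about $0.05$, then<\/p>\n\n\n\n<p style=\"background-color: #fff; border: none\">$$\\prod_{i=1}^{1000} 0.05 = 10^{1000 \\log_{10} 0.05} \\approx 10^{-1301}, \\qquad \\sum_{i=1}^{1000} \\ln 0.05 \\approx -2995.7.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003The product lies far below the smallest positive double-precision value (about $4.9 \\times 10^{-324}$) and evaluates to $0$, whereas the log-sum is an ordinary finite number.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003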
Hence<\/p>\n\n\n\n<p style=\"background-color: #fff; border: none\">$$\\log L(\\theta, \\sigma^2) = \\sum_{i=1}^N \\log \\varphi\\!\\big(y_i - f(x_i; \\theta);\\, 0,\\, \\sigma^2\\big).$$<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">B.6 Step 4 \u2014 Expand the Gaussian<\/h3>\n\n\n<style>.kadence-column6676_b6595f-6f > .kt-inside-inner-col,.kadence-column6676_b6595f-6f > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_b6595f-6f > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_b6595f-6f > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_b6595f-6f > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_b6595f-6f > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_b6595f-6f{position:relative;}.kadence-column6676_b6595f-6f, .kt-inside-inner-col > .kadence-column6676_b6595f-6f:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_b6595f-6f > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_b6595f-6f > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_b6595f-6f\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003Write $r_i = y_i - f(x_i; \\theta)$. Substituting the Gaussian density, each summand is $\\log \\varphi(r_i; 0, \\sigma^2) = -\\tfrac{1}{2}\\log(2\\pi) - \\tfrac{1}{2}\\log(\\sigma^2) - r_i^2\/(2\\sigma^2)$, and summing over $i$ gives<\/p>\n\n\n\n<p style=\"background-color: #fff; border: none\">$$\\log L(\\theta, \\sigma^2) = -\\frac{N}{2}\\log(2\\pi) - \\frac{N}{2}\\log(\\sigma^2) - \\frac{1}{2\\sigma^2}\\sum_{i=1}^N r_i^2.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003The three terms are, respectively, a constant that does not depend on the model, a noise-scale term that depends only on $\\sigma^2$, and a goodness-of-fit term equal to $-1\/(2\\sigma^2)$ times the sum of squared residuals.<\/p>\n<\/div><\/div>\n\n\n\n<h3
class=\"wp-block-heading\">B.7 MLE and AIC Formula<\/h3>\n\n\n<style>.kadence-column6676_b0057b-80 > .kt-inside-inner-col,.kadence-column6676_b0057b-80 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_b0057b-80 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_b0057b-80 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_b0057b-80 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_b0057b-80 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_b0057b-80{position:relative;}.kadence-column6676_b0057b-80, .kt-inside-inner-col > .kadence-column6676_b0057b-80:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_b0057b-80 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_b0057b-80 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_b0057b-80\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">\u2003Maximum Likelihood Estimation (MLE) maximizes $\\log L$. Only the last term depends on $\\theta$, so for any fixed $\\sigma^2$ maximizing over $\\theta$ is equivalent to minimizing $\\sum_i r_i^2$. The MLE for $\\theta$ therefore coincides with the Least Squares Estimate (LSE). The MLE for $\\sigma^2$ is<\/p>\n\n\n\n<p style=\"background-color: #fff; border: none\">$$\\hat{\\sigma}^2_{MLE} = \\frac{1}{N} \\sum_{i=1}^N (y_i - f(x_i; \\hat{\\theta}))^2 = \\mathrm{MSE}.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003Plugging the MLEs into $\\log L$ and then into $AIC = -2 \\log \\hat{L} + 2 k$:<\/p>\n\n\n\n<p style=\"background-color: #fff; border: none\">$$AIC = N \\log(\\mathrm{MSE}) + N (\\log(2\\pi) + 1) + 2k.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003The term $N(\\log(2\\pi) + 1)$ is the same for every candidate model fitted to the same $N$ points, so it drops out of any ranking.
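<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003The two steps compressed above can be written out explicitly. Setting the derivative of the B.6 expression with respect to $\\sigma^2$ to zero gives the variance MLE. Substituting it back, with $r_i$ evaluated at $\\hat{\\theta}$, gives the maximized log-likelihood:<\/p>\n\n\n\n<p style=\"background-color: #fff; border: none\">$$\\frac{\\partial \\log L}{\\partial \\sigma^2} = -\\frac{N}{2\\sigma^2} + \\frac{1}{2\\sigma^4}\\sum_{i=1}^N r_i^2 = 0 \\;\\Rightarrow\\; \\hat{\\sigma}^2 = \\frac{1}{N}\\sum_{i=1}^N r_i^2 = \\mathrm{MSE},$$<\/p>\n\n\n\n<p style=\"background-color: #fff; border: none\">$$\\log \\hat{L} = -\\frac{N}{2}\\log(2\\pi) - \\frac{N}{2}\\log(\\mathrm{MSE}) - \\frac{N}{2} \\;\\Rightarrow\\; -2\\log \\hat{L} = N \\log(\\mathrm{MSE}) + N(\\log(2\\pi) + 1).$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003The $-N\/2$ term arises because $\\sum_i r_i^2 = N \\cdot \\mathrm{MSE}$ cancels the $1\/(2\\hat{\\sigma}^2)$ factor. Adding $2k$ to $-2\\log\\hat{L}$ reproduces the full AIC expression above.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003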
Hence the operational form<\/p>\n\n\n\n<p style=\"background-color: #fff; border: none\">$$AIC \\simeq N \\log(\\mathrm{MSE}) + 2k \\quad (\\text{up to an additive constant})$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u2003recovers the \u00a73.2 formula.<\/p>\n<\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">B.8 When the Gaussian Assumption Breaks<\/h3>\n\n\n<style>.kadence-column6676_93c163-ef > .kt-inside-inner-col,.kadence-column6676_93c163-ef > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_93c163-ef > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_93c163-ef > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_93c163-ef > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_93c163-ef > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_93c163-ef{position:relative;}.kadence-column6676_93c163-ef, .kt-inside-inner-col > .kadence-column6676_93c163-ef:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_93c163-ef > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_93c163-ef > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_93c163-ef\"><div class=\"kt-inside-inner-col\">\n<ul class=\"wp-block-list\">\n<li>Absolute AIC values lose their likelihood interpretation for heavy-tailed or skewed residuals. Rankings between candidates are often still usable but should be treated as approximate.<\/li>\n\n\n\n<li>The small-sample-corrected variant AICc adds $2k(k+1)\/(N - k - 1)$ to AIC. It requires $N \\gt k + 1$ and is commonly recommended when $N \\lt 40k$.<\/li>\n\n\n\n<li>The Bayesian Information Criterion (BIC) uses the penalty $k \\log(N)$. This exceeds AIC's $2k$ whenever $\\log N \\gt 2$, i.e., for $N \\ge 8$.<\/li>\n\n\n\n<li>Cross-validation makes no distributional assumption but requires refitting each model several times, so it is costlier than AIC.<\/li>\n<\/ul>\n\n\n\n<p
class=\"wp-block-paragraph\">\u2003Our best-so-far series is monotone by construction, so its residuals around a fitted curve are not strictly Gaussian (one tail is truncated). Empirically, this does not change the coarse AIC ranking of the candidate curves.<\/p>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">References<\/h2>\n\n\n<style>.kadence-column6676_cd889d-f1 > .kt-inside-inner-col,.kadence-column6676_cd889d-f1 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6676_cd889d-f1 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6676_cd889d-f1 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6676_cd889d-f1 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6676_cd889d-f1 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6676_cd889d-f1{position:relative;}.kadence-column6676_cd889d-f1, .kt-inside-inner-col > .kadence-column6676_cd889d-f1:not(.specificity){margin-left:var(--global-kb-spacing-sm, 1.5rem);}@media all and (max-width: 1024px){.kadence-column6676_cd889d-f1 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6676_cd889d-f1 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6676_cd889d-f1\"><div class=\"kt-inside-inner-col\">\n<ul class=\"wp-block-list\">\n<li>Adriaensen, S., Rakotoarison, H., M\u00fcller, S., &amp; Hutter, F. (2023). Efficient Bayesian Learning Curve Extrapolation using Prior-Data Fitted Networks. <em>NeurIPS 2023<\/em>.<\/li>\n\n\n\n<li>Akaike, H. (1974). A New Look at the Statistical Model Identification. <em>IEEE Transactions on Automatic Control<\/em>, 19(6), 716\u2013723.<\/li>\n\n\n\n<li>Akiba, T., Sano, S., Yanase, T., Ohta, T., &amp; Koyama, M. (2019). Optuna: A Next-generation Hyperparameter Optimization Framework.
<em>Proceedings of KDD 2019<\/em>.<\/li>\n\n\n\n<li>Domhan, T., Springenberg, J. T., &amp; Hutter, F. (2015). Speeding up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves. <em>IJCAI 2015<\/em>.<\/li>\n\n\n\n<li>Mosteller, F., &amp; Tukey, J. W. (1977). <em>Data Analysis and Regression<\/em>. Addison-Wesley.<\/li>\n<\/ul>\n<\/div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>\u2003A concise report on projecting Optuna&#8217;s best-so-far trajectory with four saturation curves. The method estimates the expected best metric after $K$ additional trials (forward) or the trials needed to reach a target $T$ (inverse) without using hyperparameter (HP) coordinates and without the over-optimism that plagues surrogate-based projection. 1. Problem Setting and Motivation \u2003Optuna (Akiba et&#8230;<\/p>\n","protected":false},"author":4,"featured_media":6679,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"_kadence_starter_templates_imported_post":false,"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","yasr_overall_rating":0,"yasr_post_is_review":"","yasr_auto_insert_disabled":"","yasr_review_type":"","fifu_image_url":"","fifu_image_alt":"","iawp_total_views":9,"footnotes":""},"categories":[374,56,377,375],"tags":[],"class_list":["post-6676","post","type-post","
status-publish","format-standard","has-post-thumbnail","hentry","category-label-engineering-slug","category-data-science-slug","category-hyperparameter-slug","category-tree-based-model-slug"],"yasr_visitor_votes":{"stars_attributes":{"read_only":false,"span_bottom":false},"number_of_votes":1,"sum_votes":4},"jetpack_featured_media_url":"https:\/\/ykim.synology.me\/wordpress\/wp-content\/uploads\/2026\/05\/20260523-rainbow-over-the-sunset-1200x900px.jpg","_links":{"self":[{"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/posts\/6676","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/comments?post=6676"}],"version-history":[{"count":2,"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/posts\/6676\/revisions"}],"predecessor-version":[{"id":6681,"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/posts\/6676\/revisions\/6681"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/media\/6679"}],"wp:attachment":[{"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/media?parent=6676"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/categories?post=6676"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/tags?post=6676"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}