{"id":6460,"date":"2026-04-29T15:57:55","date_gmt":"2026-04-29T20:57:55","guid":{"rendered":"https:\/\/ykim.synology.me\/wordpress\/?p=6460"},"modified":"2026-04-29T18:08:13","modified_gmt":"2026-04-29T23:08:13","slug":"noise-induced-instability-in-tree-based-feature-selection-root-causes-and-robust-countermeasures","status":"publish","type":"post","link":"https:\/\/ykim.synology.me\/wordpress\/noise-induced-instability-in-tree-based-feature-selection-root-causes-and-robust-countermeasures-6460\/","title":{"rendered":"Noise-Induced Instability in Tree-based Feature Selection: Root Causes and Robust Countermeasures"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"600\" src=\"https:\/\/ykim.synology.me\/wordpress\/wp-content\/uploads\/2026\/04\/Backlit-Tree-Skeleton-at-Sunset-800x600px.png\" alt=\"\" class=\"wp-image-6467\" style=\"width:600px\" srcset=\"https:\/\/ykim.synology.me\/wordpress\/wp-content\/uploads\/2026\/04\/Backlit-Tree-Skeleton-at-Sunset-800x600px.png 800w, https:\/\/ykim.synology.me\/wordpress\/wp-content\/uploads\/2026\/04\/Backlit-Tree-Skeleton-at-Sunset-800x600px-300x225.png 300w, https:\/\/ykim.synology.me\/wordpress\/wp-content\/uploads\/2026\/04\/Backlit-Tree-Skeleton-at-Sunset-800x600px-768x576.png 768w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/figure>\n\n\n<style>.kadence-column6460_a78d53-0b > .kt-inside-inner-col,.kadence-column6460_a78d53-0b > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6460_a78d53-0b > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6460_a78d53-0b > .kt-inside-inner-col{flex-direction:column;}.kadence-column6460_a78d53-0b > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6460_a78d53-0b > 
.kt-inside-inner-col:before{opacity:0.3;}.kadence-column6460_a78d53-0b{position:relative;}.kadence-column6460_a78d53-0b, .kt-inside-inner-col > .kadence-column6460_a78d53-0b:not(.specificity){margin-left:var(--global-kb-spacing-md, 2rem);}@media all and (max-width: 1024px){.kadence-column6460_a78d53-0b > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6460_a78d53-0b > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6460_a78d53-0b\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">When performing feature selection with tree-based models such as LightGBM (LGBM) or CatBoost, adding noise features to the existing set often causes <strong>truly important primary features to drop out of the importance ranking<\/strong>. This is not a data problem but a structural issue rooted in how tree models compete during training and how importance is computed. Relying on default feature importance alone makes selection fragile against noise. This post breaks down the five main causes and explains why each robust countermeasure works.<\/p>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">1. 
What Is a Noise Feature?<\/h2>\n\n\n<style>.kadence-column6460_ab7a61-a4 > .kt-inside-inner-col,.kadence-column6460_ab7a61-a4 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6460_ab7a61-a4 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6460_ab7a61-a4 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6460_ab7a61-a4 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6460_ab7a61-a4 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6460_ab7a61-a4{position:relative;}.kadence-column6460_ab7a61-a4, .kt-inside-inner-col > .kadence-column6460_ab7a61-a4:not(.specificity){margin-left:var(--global-kb-spacing-md, 2rem);}@media all and (max-width: 1024px){.kadence-column6460_ab7a61-a4 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6460_ab7a61-a4 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6460_ab7a61-a4\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">A <strong>noise feature<\/strong> is a feature with no (or negligible) statistical relationship to the target. The term is standard in both academia and industry, and is typically split into three categories.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-theme-palette-13-color\">Irrelevant feature<\/mark><\/strong>: independent of the target, with mutual information close to zero.<\/li>\n\n\n\n<li><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-theme-palette-13-color\">Random noise feature<\/mark><\/strong>: deliberately generated from a random distribution (uniform, Gaussian, etc.) 
with no causal or statistical link to the target.<\/li>\n\n\n\n<li><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-theme-palette-13-color\">Redundant feature<\/mark><\/strong>: related to the target but provides no additional information beyond what other features already carry. Distinct from noise.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Note: in the literature, terms like <em>irrelevant features<\/em>, <em>uninformative features<\/em>, and <em>spurious predictors<\/em> are used interchangeably. In the Boruta algorithm, artificially shuffled features are called <em>shadow features<\/em>.<\/p>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">2. Tree-Native Features<\/h2>\n\n\n<style>.kadence-column6460_dae9ff-5d > .kt-inside-inner-col,.kadence-column6460_dae9ff-5d > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6460_dae9ff-5d > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6460_dae9ff-5d > .kt-inside-inner-col{flex-direction:column;}.kadence-column6460_dae9ff-5d > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6460_dae9ff-5d > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6460_dae9ff-5d{position:relative;}.kadence-column6460_dae9ff-5d, .kt-inside-inner-col > .kadence-column6460_dae9ff-5d:not(.specificity){margin-left:var(--global-kb-spacing-md, 2rem);}@media all and (max-width: 1024px){.kadence-column6460_dae9ff-5d > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6460_dae9ff-5d > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6460_dae9ff-5d\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">Ideal Tree-Native features let the model reach the correct answer through the 
<strong>shortest possible path<\/strong>, without having to grow deep, complex trees. They form the first line of defense against noise vulnerability before any selection technique is applied.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-theme-palette-12-color\">1) High-Fidelity Signal<\/mark><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Denoised continuous variables<\/strong>: clean sensor jitter and outliers so values reflect the true physical state.<\/li>\n\n\n\n<li><strong>Monotonicity<\/strong>: relationships where the target consistently increases or decreases with the feature (e.g., process temperature vs. yield). Gradient Boosted Decision Trees (GBDT) learn far more robustly under monotonic constraints.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-theme-palette-12-color\">2) Structural Determinants<\/mark><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>High-gain splitters<\/strong>: variables that drastically reduce impurity in a single split (e.g., equipment ID, process step number).<\/li>\n\n\n\n<li><strong>Interaction-rich features<\/strong>: explicitly precomputed combinations of two or more variables, so the tree does not have to discover them on its own.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-theme-palette-12-color\">3) Clean Dimensionality<\/mark><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Zero redundancy<\/strong>: remove highly correlated variables (multicollinearity) so the model does not have to &#8220;choose&#8221; between equivalent options.<\/li>\n\n\n\n<li><strong>Optimized cardinality (<mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-theme-palette-12-color\">low cardinality<\/mark>)<\/strong>: prefer meaningful grouped categories 
over high-cardinality ID-like fields. High cardinality is the main channel through which noise features inflate importance (see Appendix B).<\/li>\n<\/ul>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">3. Why Noise Features Push Out Primary Features<\/h2>\n\n\n<style>.kadence-column6460_50993f-94 > .kt-inside-inner-col,.kadence-column6460_50993f-94 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6460_50993f-94 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6460_50993f-94 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6460_50993f-94 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6460_50993f-94 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6460_50993f-94{position:relative;}.kadence-column6460_50993f-94, .kt-inside-inner-col > .kadence-column6460_50993f-94:not(.specificity){margin-left:var(--global-kb-spacing-md, 2rem);}@media all and (max-width: 1024px){.kadence-column6460_50993f-94 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6460_50993f-94 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6460_50993f-94\"><div class=\"kt-inside-inner-col\">\n<h3 class=\"wp-block-heading\">3.1 How Other Model Families Behave<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Model Family<\/th><th>Noise Sensitivity<\/th><th>Notes<\/th><\/tr><\/thead><tbody><tr><td>Linear \/ Logistic Regression<\/td><td>Medium<\/td><td>Sensitive to multicollinearity but stable under L1\/L2 regularization. 
Noise feature coefficients shrink toward zero.<\/td><\/tr><tr><td>Tree-based (LGBM, XGBoost, CatBoost, Random Forest)<\/td><td><strong><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-theme-palette-14-color\">High<\/mark><\/strong><\/td><td>Structurally vulnerable due to <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-theme-palette-14-color\">greedy splitting and cardinality bias<\/mark>. Main focus of this post.<\/td><\/tr><tr><td>Neural Networks (NN)<\/td><td>Low\u2013Medium<\/td><td>Weight decay and dropout dilute noise impact, but interpretable importance is hard to obtain.<\/td><\/tr><tr><td>k-Nearest Neighbors (kNN), Support Vector Machine (SVM, RBF kernel)<\/td><td><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-theme-palette-14-color\">Very High<\/mark><\/td><td><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-theme-palette-14-color\">Distance-based<\/mark>, so noise features directly distort distance computation. Exposed to the curse of dimensionality.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">What makes tree-based models distinct is that <strong>features compete at every split<\/strong>, where a &#8220;split&#8221; means partitioning a node&#8217;s data by a single feature threshold. This competitive structure is the root cause of noise vulnerability.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3.2 Five Root Causes (Tree-based Models)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>1) Lucky wins in split competition (limits of greedy splitting).<\/strong> At each node, the tree picks the split with the highest immediate gain. A noise feature can occasionally beat a primary feature on a specific subset of samples by sheer chance. 
Once the primary loses that node, its chance to contribute downstream collapses, and since importance is cumulative gain, losing early splits causes a sharp drop in score.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>2) High cardinality bias (selection bias toward high-cardinality features).<\/strong> The most common cause. A continuous or high-cardinality noise feature (e.g., time-series-like values) offers many more split candidates, making it structurally more likely to win. A feature with 1000 unique values has ~999 split candidates; one with 5 unique values has only 4. More candidates means a higher chance of finding a split that fits the training data by accident. This was formally reported by Strobl et al. (2007) for Random Forest and applies equally to Gradient Boosting Machine (GBM) variants (see Appendix B).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>3) Masking effect (feature correlation and redundancy).<\/strong> If a primary feature has even weak correlation with a noise feature, their importance gets distributed across both. If the noise accidentally explains some variation of the primary, the primary&#8217;s marginal contribution looks smaller. This mirrors multicollinearity in linear models.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>4) Interaction with regularization and column subsampling.<\/strong> Parameters like LGBM&#8217;s <code>feature_fraction<\/code> or CatBoost&#8217;s <code>rsm<\/code> sample only a subset of features per tree or node. When a primary feature is not sampled in some iterations, a noise feature gets picked instead and accumulates importance. 
Adding more noise features dilutes the sampling probability of primaries (k of N becomes k of N+M).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>5) Limitations of the importance metric itself.<\/strong> Default importance (split count, gain) measures <strong>&#8220;how much the model used a feature,&#8221;<\/strong> not <strong>&#8220;how genuinely related it is to the target.&#8221;<\/strong> All the biases above feed directly into the score.<\/p>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">4. Countermeasures<\/h2>\n\n\n<style>.kadence-column6460_d7cb02-70 > .kt-inside-inner-col,.kadence-column6460_d7cb02-70 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6460_d7cb02-70 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6460_d7cb02-70 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6460_d7cb02-70 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6460_d7cb02-70 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6460_d7cb02-70{position:relative;}.kadence-column6460_d7cb02-70, .kt-inside-inner-col > .kadence-column6460_d7cb02-70:not(.specificity){margin-left:var(--global-kb-spacing-md, 2rem);}@media all and (max-width: 1024px){.kadence-column6460_d7cb02-70 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6460_d7cb02-70 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6460_d7cb02-70\"><div class=\"kt-inside-inner-col\">\n<h3 class=\"wp-block-heading\">4.1 Permutation Importance<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">It measures how much model performance drops when a feature&#8217;s values are randomly shuffled.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>What is cardinality bias?<\/strong> The number of split 
candidates in a tree model is proportional to the number of unique values in a feature. As a result, <strong>high-cardinality features can find a split that fits the training data by chance, inflating their importance even when unrelated to the target<\/strong> (see Appendix B for details).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Why permutation importance is immune.<\/strong> Shuffling only changes the order of values; the value set itself (uniques, range, frequency, distribution) is preserved. What gets broken is the row-level pairing between feature and target \u2014 the <strong>joint distribution<\/strong>.<\/p>\n\n\n\n<div style=\"background-color: #fff; border: none\">\r\n$$\r\n\\text{Before shuffle: } P(\\text{feature}, \\text{target}) \\quad\\longrightarrow\\quad \\text{After shuffle: } P(\\text{feature}) \\times P(\\text{target})\r\n$$\r\n<\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Marginal distributions are preserved; only the joint distribution is destroyed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Interpreting the Drop<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Permutation importance = (performance before) \u2212 (performance after).<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Feature Type<\/th><th>Before<\/th><th>After<\/th><th>Drop<\/th><th>Interpretation<\/th><\/tr><\/thead><tbody><tr><td>Truly important<\/td><td>Area Under the Curve (AUC) 0.85<\/td><td>AUC 0.70<\/td><td><strong>0.15 (large)<\/strong><\/td><td>Model breaks without it \u2192 genuine signal<\/td><\/tr><tr><td>High-cardinality noise<\/td><td>AUC 0.85<\/td><td>AUC 0.849<\/td><td><strong>\u2248 0 (tiny)<\/strong><\/td><td>No effect when broken \u2192 fake importance<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Why cardinality bias cancels out.<\/strong> Since cardinality is preserved after shuffling, the inflated gain from &#8220;many split candidates&#8221; exists both before and after. 
Subtracting the two scores therefore cancels that bias. On held-out data, splits that a noise feature won by chance carry no predictive power, so breaking the pairing costs almost nothing; only a true feature-target relationship produces a measurable drop.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">&#8220;Isn&#8217;t shuffled data garbage?&#8221;<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">This is not validation; it is a <strong>controlled experiment<\/strong>, like comparing a drug group with a placebo group. The original data is the drug, the shuffled data is the placebo, and the difference is the feature&#8217;s pure &#8220;drug effect.&#8221; Garbage is the point: it gives us a clean baseline. We are not asking &#8220;how good is the shuffled data?&#8221; but &#8220;how much did the trained model rely on this feature?&#8221;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Applicability to Time Series<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Standard permutation assumes independent samples, an assumption that breaks for time series. Naive row-wise shuffling destroys autocorrelation and temporal order, producing unrealistic sequences. Recommended adaptations:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Block permutation<\/strong>: shuffle in fixed-length time blocks to partially preserve autocorrelation.<\/li>\n\n\n\n<li><strong>Time-series Cross-Validation (CV)-based permutation<\/strong>: split with walk-forward validation, then shuffle only within each validation fold.<\/li>\n\n\n\n<li><strong>Conditional permutation<\/strong>: swap values only within local time windows.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">For lag or rolling features, SHapley Additive exPlanations (SHAP) or Boruta is often more stable. Summary: <strong>not directly applicable, but works with adaptations.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Known limits<\/strong>: shuffling one feature independently of the others can create unrealistic input combinations, and scores are distorted when features are correlated. 
Conditional permutation importance and SHAP (see Appendix A) address these.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4.2 SHAP Values<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Based on the Shapley value from cooperative game theory, SHAP computes how much each feature contributes on average across all possible feature subsets (see Appendix A for the mathematical background).<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Consistency axiom<\/strong>: if the model becomes more dependent on a feature, that feature&#8217;s SHAP value never decreases. By contrast, <strong>gain importance<\/strong> (the cumulative loss reduction across all splits using that feature; XGBoost&#8217;s scikit-learn wrapper reports it as <code>feature_importances_<\/code> by default for tree boosters, while LGBM&#8217;s reports split counts unless <code>importance_type=\"gain\"<\/code> is set) does not satisfy consistency, so importance rankings can flip even after small model changes. SHAP guarantees this stability axiomatically.<\/li>\n\n\n\n<li><strong>Local accuracy<\/strong>: the baseline value plus the SHAP values of all features for a sample exactly equals the model&#8217;s prediction for that sample (for tree classifiers, typically in raw log-odds space). This means &#8220;feature contribution&#8221; is not just an analogy but a mathematically closed, decomposable definition.<\/li>\n\n\n\n<li><strong>Less sensitive to cardinality bias<\/strong>: SHAP measures <strong>marginal contribution<\/strong>, the average change in prediction when the feature is added to a coalition of other features, rather than counting splits or summing gain. 
The split-counting mechanism that inflates importance with high cardinality does not enter the calculation directly, although SHAP still explains whatever the fitted model learned, including any chance splits on noise.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4.3 Null Importance \/ Boruta (Most Direct Solution)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">These methods directly attack the problem of &#8220;noise features accidentally accumulating importance.&#8221;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Boruta<\/strong>: duplicate each feature, shuffle the copy to create a &#8220;shadow feature,&#8221; and train on real and shadow features together. Over repeated runs, a real feature is accepted only if its importance is statistically significantly higher than the maximum among shadow features.<\/li>\n\n\n\n<li><strong>Null importance<\/strong>: shuffle the target many times to estimate the <strong>null distribution<\/strong> of importance achievable by noise alone. A real feature counts as signal only if it exceeds a high percentile (e.g., 99th) of that distribution.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Why it works: cardinality bias, greedy randomness, and sampling dilution all act equally on shadow and real features, so the bias cancels. The criterion becomes <strong>&#8220;how much more important is this feature than noise?&#8221;<\/strong>, which is the right question.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4.4 Target Encoding \/ CatBoost Ordered Boosting<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Cardinality bias stems from differences in split candidate counts. Target encoding replaces each category with a single numeric statistic of the target, so every categorical is split by threshold like any numeric feature, which brings those counts closer together. CatBoost&#8217;s ordered boosting adds target leakage protection on top.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4.5 Multi-Seed Averaging<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Greedy splitting is sensitive to data order and subsampling seed. 
Averaging or taking the median of importance across multiple seeds damps the effect of &#8220;lucky wins&#8221;: if the runs are roughly independent, the variance of the averaged importance shrinks by a factor of $n$ (its standard deviation by $\\sqrt{n}$), where $n$ is the number of seeds.<\/p>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">5. Cause\u2013Countermeasure Mapping<\/h2>\n\n\n<style>.kadence-column6460_f67d14-5b > .kt-inside-inner-col,.kadence-column6460_f67d14-5b > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6460_f67d14-5b > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6460_f67d14-5b > .kt-inside-inner-col{flex-direction:column;}.kadence-column6460_f67d14-5b > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6460_f67d14-5b > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6460_f67d14-5b{position:relative;}.kadence-column6460_f67d14-5b, .kt-inside-inner-col > .kadence-column6460_f67d14-5b:not(.specificity){margin-left:var(--global-kb-spacing-md, 2rem);}@media all and (max-width: 1024px){.kadence-column6460_f67d14-5b > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6460_f67d14-5b > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6460_f67d14-5b\"><div class=\"kt-inside-inner-col\">\n<figure class=\"wp-block-table\"><table><thead><tr><th>Cause<\/th><th>Permutation<\/th><th>SHAP<\/th><th>Boruta\/Null<\/th><th>Target Encoding<\/th><th>Multi-seed<\/th><\/tr><\/thead><tbody><tr><td>1) Lucky greedy wins<\/td><td>\u25a1<\/td><td>\u25a1<\/td><td>\u25c9<\/td><td>&#8211;<\/td><td>\u25c9<\/td><\/tr><tr><td>2) Cardinality bias<\/td><td>\u25c9<\/td><td>\u25ce<\/td><td>\u25c9<\/td><td>\u25c9<\/td><td>\u25a1<\/td><\/tr><tr><td>3) Masking \/ correlation<\/td><td>\u25a1 (Conditional 
\u25c9)<\/td><td>\u25c9<\/td><td>\u25ce<\/td><td>&#8211;<\/td><td>&#8211;<\/td><\/tr><tr><td>4) Subsampling dilution<\/td><td>\u25ce<\/td><td>\u25ce<\/td><td>\u25c9<\/td><td>&#8211;<\/td><td>\u25c9<\/td><\/tr><tr><td>5) Importance metric limits<\/td><td>\u25c9<\/td><td>\u25c9<\/td><td>\u25c9<\/td><td>&#8211;<\/td><td>\u25ce<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">\u25c9 Highly effective \/ \u25ce Effective \/ \u25a1 Partially effective \/ &#8211; Not relevant<\/p>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">6. Recommended Practical Pipeline<\/h2>\n\n\n<style>.kadence-column6460_6f9c29-fe > .kt-inside-inner-col,.kadence-column6460_6f9c29-fe > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6460_6f9c29-fe > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6460_6f9c29-fe > .kt-inside-inner-col{flex-direction:column;}.kadence-column6460_6f9c29-fe > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6460_6f9c29-fe > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6460_6f9c29-fe{position:relative;}.kadence-column6460_6f9c29-fe, .kt-inside-inner-col > .kadence-column6460_6f9c29-fe:not(.specificity){margin-left:var(--global-kb-spacing-md, 2rem);}@media all and (max-width: 1024px){.kadence-column6460_6f9c29-fe > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6460_6f9c29-fe > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6460_6f9c29-fe\"><div class=\"kt-inside-inner-col\">\n<p class=\"wp-block-paragraph\">Do not rely on a single method. 
Combine them in layers.<\/p>\n\n\n\n<pre style=\"font-family: Consolas; white-space: pre; line-height:1.2; background-color: #fff; border: none; font-size: 17px\">\n[Stage 1: Filter]     Boruta or Null Importance\n                        \u2193   (keep features significant vs. noise)\n[Stage 2: Validate]   SHAP value analysis\n                        \u2193   (check direction and consistency)\n[Stage 3: Stabilize]  Multi-seed repetition (\u2265 5 runs)\n                        \u2193   (check importance variance)\n[Final selection]     Features with stable, proven signal\n<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Supporting practices: unify categoricals via target encoding (equalizes cardinality), use conditional permutation importance when correlations are strong, and avoid setting <code>feature_fraction<\/code> too low in LGBM (mitigates dilution).<\/p>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">7. Key Takeaways<\/h2>\n\n\n<style>.kadence-column6460_0010b5-da > .kt-inside-inner-col,.kadence-column6460_0010b5-da > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6460_0010b5-da > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6460_0010b5-da > .kt-inside-inner-col{flex-direction:column;}.kadence-column6460_0010b5-da > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6460_0010b5-da > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6460_0010b5-da{position:relative;}.kadence-column6460_0010b5-da, .kt-inside-inner-col > .kadence-column6460_0010b5-da:not(.specificity){margin-left:var(--global-kb-spacing-md, 2rem);}@media all and (max-width: 1024px){.kadence-column6460_0010b5-da > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6460_0010b5-da > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div 
class=\"wp-block-kadence-column kadence-column6460_0010b5-da\"><div class=\"kt-inside-inner-col\">\n<ul class=\"wp-block-list\">\n<li>Noise pushing out primary features is <strong>not a data problem<\/strong> \u2014 it is a structural property of tree models, born from greedy splitting, cardinality bias, and sampling dilution.<\/li>\n\n\n\n<li>Default feature importance (gain, split count) measures <strong>&#8220;how much the model used a feature,&#8221;<\/strong> not its true relationship with the target.<\/li>\n\n\n\n<li><strong>Boruta and null importance<\/strong> are the most direct fix: they use noise itself as the baseline, canceling all biases at once.<\/li>\n\n\n\n<li>SHAP and permutation are <strong>complementary<\/strong>: permutation handles cardinality bias, SHAP guarantees axiomatic consistency.<\/li>\n\n\n\n<li>Robust feature selection requires a <strong>multi-layer verification pipeline<\/strong>, not a single metric.<\/li>\n<\/ul>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Appendix A. 
Shapley Value and SHAP<\/h2>\n\n\n<style>.kadence-column6460_7f2512-a4 > .kt-inside-inner-col,.kadence-column6460_7f2512-a4 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6460_7f2512-a4 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6460_7f2512-a4 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6460_7f2512-a4 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6460_7f2512-a4 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6460_7f2512-a4{position:relative;}.kadence-column6460_7f2512-a4, .kt-inside-inner-col > .kadence-column6460_7f2512-a4:not(.specificity){margin-left:var(--global-kb-spacing-md, 2rem);}@media all and (max-width: 1024px){.kadence-column6460_7f2512-a4 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6460_7f2512-a4 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6460_7f2512-a4\"><div class=\"kt-inside-inner-col\">\n<h3 class=\"wp-block-heading\">A.1 Origin<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The <strong>Shapley value<\/strong> was introduced by Lloyd Shapley in 1953 as a solution concept in cooperative game theory, answering: &#8220;When several players cooperate to earn a total reward, how do we fairly distribute it according to each player&#8217;s contribution?&#8221;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Example: A, B, and C work together and earn 1M USD. A alone earns 100K USD, A+B earn 500K USD, A+B+C earn 1M USD, and so on for every coalition. 
Shapley defined each player&#8217;s fair share as <strong>the average marginal contribution they bring across all possible joining orders<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">A.2 Mathematical Definition<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Let $N = \\{1, 2, \\dots, n\\}$ be the feature set, $S \\subseteq N \\setminus \\{i\\}$ a subset that excludes feature $i$, and $v(S)$ the model output when only the features in $S$ are known. The Shapley value of feature $i$ is:<\/p>\n\n\n\n<div style=\"background-color: #fff; border: none\">\r\n$$\r\n\\phi_i = \\sum_{S \\subseteq N \\setminus \\{i\\}} \\frac{|S|! \\, (n - |S| - 1)!}{n!} \\left[ v(S \\cup \\{i\\}) - v(S) \\right]\r\n$$\r\n<\/div>\n\n\n\n<ul class=\"wp-block-list\">\n<li>$v(S \\cup \\{i\\}) - v(S)$ is the <strong>marginal contribution<\/strong> of feature $i$ when added to $S$.<\/li>\n\n\n\n<li>The fraction is the probability that, in a uniformly random ordering of the $n$ features, exactly the members of $S$ come before $i$.<\/li>\n\n\n\n<li>The result is the weighted average of $i$&#8217;s marginal contribution over all coalitions, which equals its plain average over all $n!$ orderings.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">A.3 The Four Axioms<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The Shapley value is provably the <strong>unique<\/strong> distribution rule satisfying these four axioms.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Axiom<\/th><th>Meaning<\/th><\/tr><\/thead><tbody><tr><td><strong>Efficiency<\/strong><\/td><td>The Shapley values sum to $v(N) - v(\\emptyset)$, the model output minus the baseline. 
Nothing is left undistributed.<\/td><\/tr><tr><td><strong>Symmetry<\/strong><\/td><td>Two features with identical contributions across all subsets receive identical Shapley values.<\/td><\/tr><tr><td><strong>Dummy<\/strong><\/td><td>A feature whose marginal contribution is zero in every coalition gets a Shapley value of zero.<\/td><\/tr><tr><td><strong>Additivity<\/strong><\/td><td>For a model that is the sum of two models, each feature&#8217;s Shapley value is the sum of its Shapley values in the two individual models.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">A.4 SHAP (SHapley Additive exPlanations)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Lundberg &amp; Lee (2017) applied the Shapley value to explaining machine learning (ML) predictions. The mapping is: players \u2192 features, total reward \u2192 model prediction, fair share \u2192 contribution to that prediction. For a sample $x$:<\/p>\n\n\n\n<div style=\"background-color: #fff; border: none\">\r\n$$\r\nf(x) = \\phi_0 + \\sum_{i=1}^{n} \\phi_i\r\n$$\r\n<\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Here $\\phi_0$ is the baseline (the average model prediction, i.e. the output when no feature values are known) and $\\phi_i$ is feature $i$&#8217;s SHAP value for this sample.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">A.5 Why SHAP Resists Noise<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The <strong>Dummy<\/strong> axiom assigns exactly zero to any feature the model never uses. A noise feature that was picked up by only a few chance splits usually moves predictions very little, so its mean $|\\phi_i|$ stays close to zero.<\/li>\n\n\n\n<li>The <strong>Symmetry<\/strong> axiom means the attribution step adds no cardinality bonus: equal contributions to the prediction yield equal values regardless of split candidate counts.<\/li>\n\n\n\n<li><strong>Consistency<\/strong> (a monotonicity property that Lundberg &amp; Lee state separately; it follows from the Shapley formula itself, not from Efficiency and Symmetry alone) ensures that if the model changes so that a feature&#8217;s marginal contribution never decreases, its SHAP value does not decrease either. Default gain importance offers no such guarantee.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">A.6 Limitations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Compute cost<\/strong>: TreeSHAP runs in $O(TLD^2)$ 
where $T$ is the number of trees, $L$ the maximum number of leaves per tree, and $D$ the maximum depth, which gets heavy for large, deep ensembles.<\/li>\n\n\n\n<li><strong>Independence assumption<\/strong>: the common estimators fill in &#8220;missing&#8221; features from their marginal distribution, which amounts to assuming feature independence; under strong correlations this evaluates the model on unrealistic feature combinations. <em>Conditional SHAP<\/em> samples from the conditional distribution instead, whereas <em>Interventional SHAP<\/em> keeps the marginal sampling on purpose to stay faithful to the model, so the two answer different questions rather than being interchangeable fixes.<\/li>\n\n\n\n<li><strong>Local vs. global<\/strong>: SHAP is fundamentally local; average $|\\phi_i|$ over samples for a global ranking.<\/li>\n<\/ul>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Appendix B. Cardinality Bias in Detail<\/h2>\n\n\n<style>.kadence-column6460_578edb-b2 > .kt-inside-inner-col,.kadence-column6460_578edb-b2 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6460_578edb-b2 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6460_578edb-b2 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6460_578edb-b2 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6460_578edb-b2 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6460_578edb-b2{position:relative;}.kadence-column6460_578edb-b2, .kt-inside-inner-col > .kadence-column6460_578edb-b2:not(.specificity){margin-left:var(--global-kb-spacing-md, 2rem);}@media all and (max-width: 1024px){.kadence-column6460_578edb-b2 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6460_578edb-b2 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6460_578edb-b2\"><div class=\"kt-inside-inner-col\">\n<h3 class=\"wp-block-heading\">B.1 Definition<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Cardinality bias (selection bias toward high-cardinality features)<\/strong>: because the number of split candidates in a tree model is proportional to the number of 
unique values in a feature, <strong>high-cardinality features can find a split that fits the training data by chance, inflating their importance even when they are unrelated to the target<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">B.2 What Cardinality Means<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Sex: 2 values \u2192 low cardinality.<\/li>\n\n\n\n<li>Postal code: thousands of values \u2192 high cardinality.<\/li>\n\n\n\n<li>Continuous variables: nearly every value unique \u2192 very high cardinality.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">B.3 The Mechanism<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">At each node, the tree tries every possible split point and picks the one with the highest gain.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>2 unique values: 1 candidate.<\/li>\n\n\n\n<li>1000 unique values: 999 candidates.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">More candidates means a higher chance that <strong>at least one accidentally fits the training data well<\/strong> \u2014 like buying 999 lottery tickets versus one. Even if that split captures pure noise, the gain is still measured as high.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">B.4 Symptoms<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>High-cardinality features get inflated importance even when unrelated to the target.<\/li>\n\n\n\n<li>Genuinely important low-cardinality features are systematically underrated.<\/li>\n\n\n\n<li>This is called <strong>cardinality bias<\/strong> or <strong>selection bias toward high-cardinality features<\/strong>. Strobl et al. 
(2007) formally reported it for Random Forest, and the same applies to GBM-family models.<\/li>\n<\/ul>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">References<\/h2>\n\n\n<style>.kadence-column6460_4079c3-72 > .kt-inside-inner-col,.kadence-column6460_4079c3-72 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column6460_4079c3-72 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column6460_4079c3-72 > .kt-inside-inner-col{flex-direction:column;}.kadence-column6460_4079c3-72 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column6460_4079c3-72 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column6460_4079c3-72{position:relative;}.kadence-column6460_4079c3-72, .kt-inside-inner-col > .kadence-column6460_4079c3-72:not(.specificity){margin-left:var(--global-kb-spacing-md, 2rem);}@media all and (max-width: 1024px){.kadence-column6460_4079c3-72 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column6460_4079c3-72 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column6460_4079c3-72\"><div class=\"kt-inside-inner-col\">\n<ul class=\"wp-block-list\">\n<li>Strobl, C., Boulesteix, A. L., Zeileis, A., &amp; Hothorn, T. (2007). <em>Bias in random forest variable importance measures: Illustrations, sources and a solution<\/em>. BMC Bioinformatics.<\/li>\n\n\n\n<li>Lundberg, S. M., &amp; Lee, S. I. (2017). <em>A Unified Approach to Interpreting Model Predictions<\/em> (SHAP). NeurIPS.<\/li>\n\n\n\n<li>Kursa, M. B., &amp; Rudnicki, W. R. (2010). <em>Feature Selection with the Boruta Package<\/em>. Journal of Statistical Software.<\/li>\n\n\n\n<li>Altmann, A., et al. (2010). <em>Permutation importance: a corrected feature importance measure<\/em>. 
Bioinformatics.<\/li>\n<\/ul>\n<\/div><\/div>\n<div style='text-align:center' class='yasr-auto-insert-overall'><\/div><div style='text-align:center' class='yasr-auto-insert-visitor'><\/div>","protected":false},"excerpt":{"rendered":"<p>When performing feature selection with tree-based models such as LightGBM (LGBM) or CatBoost, adding noise features to the existing set often causes truly important primary features to drop out of the importance ranking. This is not a data problem but a structural issue rooted in how tree models compete during training and how importance is&#8230;<\/p>\n","protected":false},"author":4,"featured_media":6467,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"_kadence_starter_templates_imported_post":false,"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","yasr_overall_rating":0,"yasr_post_is_review":"","yasr_auto_insert_disabled":"","yasr_review_type":"","fifu_image_url":"","fifu_image_alt":"","iawp_total_views":0,"footnotes":""},"categories":[56],"tags":[],"class_list":["post-6460","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science-slug"],"yasr_visitor_votes":{"stars_attributes":{"read_only":false,"span_bottom":false},"number_of_votes":1,"sum_votes":4},"jetpack_featured_media_url":"https:\/\/ykim.synology.me\/wordpress\/wp-content\/uploads\/2026\/04\/Backlit-Tree-Skeleton-at-Sunset-800x600px.png","_links":{"self":[{"href":"https:\/\/ykim.synology.me\/wordpress\/wp-jso
n\/wp\/v2\/posts\/6460","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/comments?post=6460"}],"version-history":[{"count":10,"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/posts\/6460\/revisions"}],"predecessor-version":[{"id":6482,"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/posts\/6460\/revisions\/6482"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/media\/6467"}],"wp:attachment":[{"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/media?parent=6460"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/categories?post=6460"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ykim.synology.me\/wordpress\/wp-json\/wp\/v2\/tags?post=6460"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}