My work applies experimental, AI‑assisted, data‑driven methodologies, integrated into engineering platforms and supported by semiconductor, statistical, machine‑learning, and deep‑learning technologies, to optimize semiconductor manufacturing across process, device, and yield development. The key components of my work on AI‑Driven Engineering Platforms are:

- AI-assisted software: AI agents
- AI-assisted data analysis: yield analysis that enables yield-aware design and yieldable processes and devices
  - Machine Learning: PCA, SVM, Bayesian Optimization
  - Deep Learning: time-series data
- Statistical data analysis: Gaussian and Poisson distributions, order statistics, extreme value distributions
- Semiconductor technology-based analysis: device physics, small-circuit simulation, error propagation, Monte Carlo simulation, DOE/RSM, Split-CV, dielectric conduction, variability, BKM (best known method) management, soft/hard yield
- Full-stack web platforms built with WordPress, Flask, or Next.js

The motivation for converging semiconductor technology with data science is that this convergence is essential for building technology-aware software that contributes to:

- Advancing semiconductor technology
- Improving engineers’ productivity
- Creating a more fulfilling work environment

A technology‑aware software engineer is well placed to deliver this integration, because domain‑aware development fits naturally with Agile and DevOps practices.

A technology‑aware software tool provides several key benefits:

- Helps engineers quickly learn legacy knowledge from previous technologies
- Enables engineers to absorb leading‑edge technology more effectively
- Speeds up computational workflows
- Ensures work is performed in a standardized manner
- Standardizes data by serving as a de facto specification

Such a tool also needs continuous improvement as the technology evolves, which brings both pros and cons.

Applied Statistics

AI-assisted Semiconductor Development ….

Related Posts below (or view All Articles)

Categories: Data Science, AI-powered, Applied Statistics

Label Engineering | Data Science | Hyperparameter | Tree Based Model

Optuna Metric Projection

By Wolf
Created: 2026.05.27 | Modified: 2026.05.27

A concise report on projecting Optuna’s best-so-far trajectory with four saturation curves. The method estimates the expected best metric after $K$ additional trials (forward) or the trials needed to reach…
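
The following is a minimal sketch of the forward and inverse questions, assuming Python with NumPy. It computes the best-so-far trajectory from completed-trial values and then fits a single hyperbolic saturation curve, best(n) ≈ a + b/n. That curve form is a hypothetical illustration. It is not necessarily one of the four curves the report uses. The function names (`best_so_far`, `fit_hyperbolic`, `project_forward`, `trials_to_reach`) are likewise illustrative.

```python
import numpy as np


def best_so_far(values, direction="maximize"):
    """Running best of completed-trial objective values, in trial order."""
    v = np.asarray(values, dtype=float)
    if direction == "maximize":
        return np.maximum.accumulate(v)
    return np.minimum.accumulate(v)


def fit_hyperbolic(best):
    """Least-squares fit of best(n) ~ a + b / n for n = 1..N (assumed saturation form).

    a is the asymptote; b < 0 for maximization, b > 0 for minimization.
    """
    best = np.asarray(best, dtype=float)
    n = np.arange(1, len(best) + 1, dtype=float)
    X = np.column_stack([np.ones_like(n), 1.0 / n])
    (a, b), *_ = np.linalg.lstsq(X, best, rcond=None)
    return float(a), float(b)


def project_forward(a, b, n_done, k_more):
    """Forward: expected best metric after k_more additional trials under the fitted curve."""
    return a + b / (n_done + k_more)


def trials_to_reach(a, b, target):
    """Inverse: total trial count at which a + b / n first reaches target, or None if never."""
    if target == a:
        return None  # the asymptote itself is only approached as n -> infinity
    n = b / (target - a)
    if n <= 0:
        return None  # target lies beyond the asymptote
    # Small guard against floating-point round-off before rounding up to a whole trial.
    return max(1, int(np.ceil(n - 1e-9)))
```

With Optuna, the input values could be collected as `[t.value for t in study.trials if t.state == optuna.trial.TrialState.COMPLETE]`. For minimization studies, pass `direction="minimize"`. The hyperbolic form keeps the model linear in (a, b), so ordinary least squares suffices. Other saturation forms generally need a nonlinear fit.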

Data Science | Feature Engineering | Label Engineering

Wafer Level Zernike Polynomials

By Wolf
Created: 2026.05.12 | Modified: 2026.05.16

wlzpoly is a Python package that decomposes N-point wafer thickness measurements into M Zernike polynomial coefficients using LSQ or Ridge regression with LOOCV-tuned regularization. It ships with a reproducible three-stage…
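
The general idea can be sketched in Python with NumPy, assuming each measurement site is given by wafer coordinates (x, y) and a thickness t. This is not the wlzpoly API. The following are all hypothetical choices for illustration:

- the six-term unnormalized basis (piston, two tilts, defocus, two astigmatisms)
- leaving piston unpenalized
- the names `zernike_design` and `ridge_loocv`

LOOCV uses the closed-form shortcut for penalized least squares, e_(-i) = e_i / (1 - H_ii), so no refitting is needed. Including alpha = 0 in the grid lets plain LSQ compete with Ridge.

```python
import numpy as np


def zernike_design(x, y, radius):
    """Design matrix of six low-order (unnormalized) Zernike terms on the unit disk."""
    u = np.asarray(x, dtype=float) / radius
    v = np.asarray(y, dtype=float) / radius
    rho2 = u**2 + v**2
    return np.column_stack([
        np.ones_like(u),  # piston
        u,                # tilt x: rho cos(theta)
        v,                # tilt y: rho sin(theta)
        2 * rho2 - 1,     # defocus: 2 rho^2 - 1
        u**2 - v**2,      # astigmatism 0/90: rho^2 cos(2 theta)
        2 * u * v,        # astigmatism 45: rho^2 sin(2 theta)
    ])


def ridge_loocv(X, t, alphas):
    """Ridge fit with alpha chosen by closed-form LOOCV; the piston term is not penalized."""
    t = np.asarray(t, dtype=float)
    P = np.eye(X.shape[1])
    P[0, 0] = 0.0
    best = None
    for alpha in alphas:
        A = X.T @ X + alpha * P
        coef = np.linalg.solve(A, X.T @ t)
        # Diagonal of the hat matrix H = X A^{-1} X^T without forming H.
        h = np.einsum("ij,ji->i", X, np.linalg.solve(A, X.T))
        loo_resid = (t - X @ coef) / (1.0 - h)
        score = np.mean(loo_resid**2)
        if best is None or score < best[0]:
            best = (score, alpha, coef)
    return best[1], best[2]
```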

Data Science | Pipeline

A Taxonomy of Manufacturing Big Data: Integrating Machine and Human Data

By Wolf
Created: 2026.05.09 | Modified: 2026.05.10

1. Introduction: The Missing Link in Smart Manufacturing
Investment in smart manufacturing and big data analytics has expanded rapidly, yet the focus has remained almost exclusively on Machine Data: the data…

Data Science | Pipeline

Python ML Pipeline Reproducibility — Field Notes

By Wolf
Created: 2026.05.08 | Modified: 2026.05.08

Introduction
This document classifies reproducibility problems in Python Machine Learning (ML) pipelines into three chapters, plus a fourth chapter on diagnostic techniques. This classification aligns well with the Six Sigma…

Semiconductor | AI-powered | Data Science | Label Engineering

An Introductory Survey on Polynomial Machine Learning: Taxonomic Axes and Hierarchical Levels

By Wolf
Created: 2026.05.06 | Modified: 2026.05.06

This report surveys Polynomial Machine Learning (PML) at an introductory level. PML refers to the family of techniques that exploit higher-order and interaction terms of input variables to learn nonlinear…

Data Science | AI-powered | Label Engineering | Semiconductor | Tree Based Model

Modeling Thickness Variation in Semiconductor Thin-Film Processes — A Spatial Decomposition Approach to Machine Learning (ML)

By Wolf
Created: 2026.05.04 | Modified: 2026.05.06

Thickness uniformity in thin-film deposition determines downstream yield and device performance. Variation arises along two distinct axes — within a single wafer (Within-Wafer, WiW) and across wafers over time (Wafer-to-Wafer,…
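
Here is a minimal sketch of the two axes in Python with NumPy. It assumes every wafer is measured at the same N sites, with the data stored in an (n_wafers, N) array. Wafer means carry the Wafer-to-Wafer (WtW) component. Residuals about each wafer's mean carry the Within-Wafer (WiW) component. This plain variance split (the law of total variance with population variances) and the name `decompose_thickness` are illustrative assumptions. It is not the article's spatial decomposition.

```python
import numpy as np


def decompose_thickness(T):
    """Split an (n_wafers, n_sites) thickness array into WtW and WiW parts.

    Returns wafer means (WtW signal), per-wafer residual maps (WiW signal),
    and the two variance components. With population variances (ddof=0),
    wtw_var + wiw_var equals the total variance of T.
    """
    T = np.asarray(T, dtype=float)
    wafer_means = T.mean(axis=1)              # one value per wafer
    residual_maps = T - wafer_means[:, None]  # within-wafer deviations from each wafer's mean
    wtw_var = wafer_means.var()               # spread of wafer means across wafers
    wiw_var = (residual_maps**2).mean()       # average within-wafer variance
    return wafer_means, residual_maps, wtw_var, wiw_var
```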

Data Science

Are Missing-Path Samples in Tree-Based Models OOD?

By Wolf
Created: 2026.05.03 | Modified: 2026.05.04

Bottom line
Strictly speaking, no; but in practice, treat them as Out-of-Distribution (OOD). Missing-path samples in tree-based boosting models such as LightGBM, CatBoost, and XGBoost do not match the…

Data Science

A Taxonomy of ML Model Failures in the Training-Testing Gap

By Wolf
Created: 2026.05.03 | Modified: 2026.05.03

Machine learning (ML) models are designed under the assumption that the training distribution P_train equals the deployment distribution P_test. In reality, this assumption breaks frequently, causing sharp accuracy drops in…

Data Science | Feature Engineering | Time Series

Why Raw Vectorization Is the Right Choice for Ultra-Short Time Series (T ≤ 10)

By Wolf
Created: 2026.05.02 | Modified: 2026.05.02

This report analyzes why standard vectorization methods — statistical summary (mean/var/AUC), automatic feature extraction (tsfresh, catch22), convolutional representations (MiniRocket), and self-supervised embeddings (TS2Vec) — fail when the time series length…

Data Science

Missing Values and Unknown Categories in Gradient Boosting Libraries

By Wolf
Created: 2026.04.29 | Modified: 2026.04.29

1. Introduction
This article summarizes how three popular gradient boosting libraries, LightGBM (Light Gradient Boosting Machine), XGBoost (Extreme Gradient Boosting), and CatBoost (Categorical Boosting), handle missing values and…

Data Science

Noise-Induced Instability in Tree-based Feature Selection: Root Causes and Robust Countermeasures

By Wolf
Created: 2026.04.29 | Modified: 2026.04.29

When performing feature selection with tree-based models such as LightGBM (LGBM) or CatBoost, adding noise features to the existing set often causes truly important primary features to drop out of…

Data Science | Evaluation Metric

Centered R² vs Uncentered R²

By Wolf
Created: 2026.04.25 | Modified: 2026.04.26

1. Introduction: R² and Its Relation to RSQ
The coefficient of determination, denoted as R² (R-squared), is one of the most widely used validation metrics in statistics…
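
For reference, the two variants differ only in the total sum of squares. Centered R² compares residuals against the mean, $R^2_c = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$. Uncentered R² compares them against zero, $R^2_u = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i y_i^2$. Below is a small NumPy sketch; the function names are illustrative.

```python
import numpy as np


def r2_centered(y, y_hat):
    """1 - SSR / sum((y - mean(y))^2): baseline is the constant mean predictor."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ssr = np.sum((y - y_hat) ** 2)
    return 1.0 - ssr / np.sum((y - y.mean()) ** 2)


def r2_uncentered(y, y_hat):
    """1 - SSR / sum(y^2): baseline is the all-zero predictor."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ssr = np.sum((y - y_hat) ** 2)
    return 1.0 - ssr / np.sum(y**2)
```

Take y = [1, 2, 3, 4] with predictions [1.1, 1.9, 3.2, 3.8]. These give about 0.98 (centered) and about 0.997 (uncentered). Because $\sum_i y_i^2 = \sum_i (y_i - \bar{y})^2 + n\bar{y}^2$, the uncentered value is never lower than the centered one for the same predictions. The gap grows as the data sit farther from zero.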


