My work applies experimental, AI‑assisted, data‑driven methodologies that are integrated into engineering platforms and supported by semiconductor, statistical, machine‑learning, and deep‑learning technologies. The goal is to optimize semiconductor manufacturing across process, device, and yield development. The key components of my work on AI‑Driven Engineering Platforms are:
- AI-assisted software: AI-agent
- AI-assisted data analysis: yield analysis enabling yield-aware design and yieldable process/device
- Machine Learning: PCA, SVM, Bayesian Optimization
- Deep Learning: time-series data
- Statistical data analysis: Gaussian, Poisson, Order statistics, Extreme Value Distribution
- (Semiconductor) Technology-based analysis: Device physics, Small circuit simulation, Error propagation, Monte Carlo Simulation, DOE/RSM, Split-CV, Dielectric Conduction, Variability, BKM management, Soft/hard yield
- Full-stack web platform using WordPress, Flask, or Next.js
The motivation for converging semiconductor technology with data science is that this convergence is essential for technology-aware software, which enables:
- Advancing semiconductor technology
- Improving engineers’ productivity
- Creating a more fulfilling work environment
A technology‑aware software engineer can deliver this integration effectively, since domain‑aware development fits well with Agile and DevOps practices.
A technology‑aware software tool provides several key benefits:
- Helps engineers quickly learn the legacy knowledge from previous technologies
- Enables engineers to absorb leading‑edge technology more effectively
- Speeds up computational workflows
- Ensures work is performed in a standardized manner
- Standardizes data by serving as a de facto specification
- Requires continuous improvement as the technology evolves, which has both pros and cons
Related Posts below (or view All Articles)
Categories = “Data Science, AI-powered, Applied Statistics”
By Wolf
Created: 2026.05.27 | Modified: 2026.05.27
A concise report on projecting Optuna’s best-so-far trajectory with four saturation curves. The method estimates the expected best metric after $K$ additional trials (forward) or the trials needed to reach…
By Wolf
Created: 2026.05.12 | Modified: 2026.05.16
wlzpoly is a Python package that decomposes N-point wafer thickness measurements into M Zernike polynomial coefficients using LSQ or Ridge regression with LOOCV-tuned regularization. It ships with a reproducible three-stage…
By Wolf
Created: 2026.05.09 | Modified: 2026.05.10
1. Introduction: The Missing Link in Smart Manufacturing
Investment in smart manufacturing and big data analytics has expanded rapidly, yet the focus has remained almost exclusively on Machine Data, the data…
By Wolf
Created: 2026.05.08 | Modified: 2026.05.08
Introduction
This document classifies reproducibility problems in Python Machine Learning (ML) pipelines into three chapters, plus a fourth chapter on diagnostic techniques: This classification aligns well with the Six Sigma…
By Wolf
Created: 2026.05.06 | Modified: 2026.05.06
This report surveys Polynomial Machine Learning (PML) at an introductory level. PML refers to the family of techniques that exploit higher-order and interaction terms of input variables to learn nonlinear…
By Wolf
Created: 2026.05.04 | Modified: 2026.05.06
Thickness uniformity in thin-film deposition determines downstream yield and device performance. Variation arises along two distinct axes — within a single wafer (Within-Wafer, WiW) and across wafers over time (Wafer-to-Wafer,…
By Wolf
Created: 2026.05.03 | Modified: 2026.05.04
Bottom Line: Strictly speaking, no, but in practice, treat them as Out-of-Distribution (OOD). Missing-path samples in tree-based boosting models such as LightGBM, CatBoost, and XGBoost do not match the…
By Wolf
Created: 2026.05.03 | Modified: 2026.05.03
Machine learning (ML) models are designed under the assumption that the training distribution P_train equals the deployment distribution P_test. In reality, this assumption breaks frequently, causing sharp accuracy drops in…
By Wolf
Created: 2026.05.02 | Modified: 2026.05.02
This report analyzes why standard vectorization methods — statistical summary (mean/var/AUC), automatic feature extraction (tsfresh, catch22), convolutional representations (MiniRocket), and self-supervised embeddings (TS2Vec) — fail when the time series length…
By Wolf
Created: 2026.04.29 | Modified: 2026.04.29
1. Introduction
This article summarizes how three popular gradient boosting libraries, namely LightGBM (Light Gradient Boosting Machine), XGBoost (Extreme Gradient Boosting), and CatBoost (Categorical Boosting), handle missing values and…
By Wolf
Created: 2026.04.29 | Modified: 2026.04.29
When performing feature selection with tree-based models such as LightGBM (LGBM) or CatBoost, adding noise features to the existing set often causes truly important primary features to drop out of…
By Wolf
Created: 2026.04.25 | Modified: 2026.04.26
1. Introduction: R² and Its Relation to RSQ
The coefficient of determination, denoted as R² (R-squared), is one of the most widely used validation metrics in statistics…