1. Introduction: The Missing Link in Smart Manufacturing

Investment in smart manufacturing and big data analytics has expanded rapidly, yet the focus has remained almost exclusively on Machine Data—the data automatically generated by equipment and systems. Despite dramatic improvements in data volume, granularity, and infrastructure, gains in defect analysis and yield improvement have fallen short of expectations. The reason is clear.

The root cause of defects often lies not in machines but in human judgment and design intent.

What hypothesis drove this experiment?
Why was the recipe changed at this point in time?
On what basis was this lot held or released?
What correction was applied to this chamber after Preventive Maintenance (PM)?

Such information remains scattered across engineers’ minds, emails, slide decks, and personal notes—never reaching the big data pipeline. No matter how rich the machine data, the absence of context for why a value turned out the way it did imposes a hard ceiling on causal inference and model learning.

This report defines manufacturing big data as two equal branches—Machine Data and Human Data—and proposes a taxonomy that classifies each branch by the purpose and role of the data. The taxonomy is industry-agnostic and uses semiconductor manufacturing as the representative example.

2. Machine Data Taxonomy: The Physical Reality

Machine Data is the output that systems and equipment generate automatically according to predefined logic. Machines are not decision-makers; they are executors following control algorithms, and they record the physical reality that results. Machine Data answers “What happened.”

2.1 Classification

MACHINE DATA
├─ Static     : Asset & Metadata
├─ Dynamic    : Operational & Trace Data
└─ Quality    : Metrology, Inspection

Category	Definition	Characteristics
Static	Equipment and asset identity, configuration, and specifications—information that does not change or changes rarely	Quasi-static, reference
Dynamic	Time-dependent data generated during operation (sensors, events, state logs, Fault Detection and Classification (FDC) trace)	High-frequency time-series
Quality	Inspection and metrology results—post-hoc verification of process output	Discrete measurement, outcome-oriented

The three categories operate on different time axes and serve different roles. Static answers “what exists,” Dynamic answers “how it ran,” and Quality answers “what the result was.” Together they form a complete description of the physical state of the process.
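The three categories can be made concrete as separate record types. The minimal Python sketch below assumes hypothetical class and field names (`AssetRecord`, `TraceSample`, `MetrologyResult`, `equipment_id`, `lot_id`); a real fab would map them onto its own equipment and metrology schemas. It shows that each category has its own time axis: none for Static, a timestamp per sample for Dynamic, and one value per measured wafer for Quality. It also shows that shared keys let the three be assembled into one view of a lot.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class MachineCategory(Enum):
    STATIC = "static"    # what exists: identity, configuration, specifications
    DYNAMIC = "dynamic"  # how it ran: sensors, events, state logs, FDC trace
    QUALITY = "quality"  # what the result was: metrology, inspection


@dataclass(frozen=True)
class AssetRecord:
    """Static: quasi-static reference data for one chamber (hypothetical fields)."""
    equipment_id: str
    chamber_id: str
    model: str
    category: MachineCategory = MachineCategory.STATIC


@dataclass(frozen=True)
class TraceSample:
    """Dynamic: one time-stamped sensor reading taken while a lot was processed."""
    equipment_id: str
    lot_id: str
    timestamp: datetime
    sensor: str
    value: float
    category: MachineCategory = MachineCategory.DYNAMIC


@dataclass(frozen=True)
class MetrologyResult:
    """Quality: one discrete post-process measurement."""
    lot_id: str
    wafer_id: str
    parameter: str
    value: float
    category: MachineCategory = MachineCategory.QUALITY


def records_for_lot(lot_id, assets, traces, results):
    """Collect the Static, Dynamic and Quality records that describe one lot."""
    run = [t for t in traces if t.lot_id == lot_id]
    used = {t.equipment_id for t in run}  # equipment that actually processed the lot
    return {
        MachineCategory.STATIC: [a for a in assets if a.equipment_id in used],
        MachineCategory.DYNAMIC: run,
        MachineCategory.QUALITY: [r for r in results if r.lot_id == lot_id],
    }
```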

3. Human Data Taxonomy: The Engineering Intelligence

Human Data is the data produced by the judgment, design intent, and experience of engineers. It captures “why” and “how” and provides the context that gives machine data its meaning.

In a Human-in-the-Loop (HITL) view, engineers infuse data with meaning by:

Interpreting sensor readings and judging normal versus anomalous
Forming hypotheses from process results and designing experiments
Making judgments and deciding actions when standards are violated
Learning equipment behavior through experience and applying corrections

The artifacts of these activities form Human Data, classified by purpose into Baseline / Knob / Excursion / Maintenance.

3.1 Classification

HUMAN DATA
├─ Baseline
├─ Knob
│    ├─ Narrow DOE
│    └─ Wide DOE
├─ Excursion
└─ Maintenance

Category	Definition	Representative Assets
Baseline	Data that defines the standard state	Recipe, Spec/Limit, Standard Operating Procedure (SOP), Best Known Method (BKM)
Knob	Intentional manipulation and exploration data — Design of Experiments (DOE)	DOE plan, split table
├ Narrow DOE	Fine tuning and optimization within a narrow range	Single step or parameter adjustment
└ Wide DOE	Exploration and screening across a broad range	Multiple steps or parameters varied simultaneously
Excursion	Response data for abnormal situations	Disposition, troubleshooting, Non-Conformance Report / Corrective and Preventive Action (NCR/CAPA), Engineering Change Notice (ECN) / Engineering Change Request (ECR) / Engineering Information Notice (EIN)
Maintenance	Maintenance and management activity data	PM records, parts replacement history, heuristic offsets
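The Human Data categories can be sketched the same way. In this minimal example the names `HumanRecord`, `DoeScope` and `classify_doe` are hypothetical. The check in `__post_init__` encodes the one structural rule the table implies: Narrow/Wide is a sub-classification of Knob data only. The Narrow/Wide rule in `classify_doe` is also an assumption. It counts how many (step, parameter) pairs were varied and ignores range width, which a real rule would also consider.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional, Tuple


class HumanCategory(Enum):
    BASELINE = "baseline"        # defines the standard state
    KNOB = "knob"                # intentional variation (DOE)
    EXCURSION = "excursion"      # response to abnormal situations
    MAINTENANCE = "maintenance"  # maintenance and management activity


class DoeScope(Enum):
    NARROW = "narrow"  # fine tuning within a narrow range
    WIDE = "wide"      # broad exploration and screening


@dataclass(frozen=True)
class HumanRecord:
    category: HumanCategory
    asset_type: str   # e.g. "Recipe", "split table", "NCR/CAPA", "PM record"
    author: str
    rationale: str    # the engineer's "why", which Machine Data cannot carry
    doe_scope: Optional[DoeScope] = None

    def __post_init__(self):
        # Narrow/Wide applies to Knob records only, and every Knob record needs one.
        if (self.category is HumanCategory.KNOB) != (self.doe_scope is not None):
            raise ValueError("doe_scope is required for Knob records and forbidden otherwise")


def classify_doe(varied: Iterable[Tuple[str, str]]) -> DoeScope:
    """Hypothetical rule: varying a single (step, parameter) pair is Narrow;
    varying several at once is Wide. Range width is ignored here."""
    return DoeScope.NARROW if len(set(varied)) <= 1 else DoeScope.WIDE
```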

4. Integration Strategy: Synergy between Machine and Human

Machine Data and Human Data are limited on their own; value emerges when they are combined. The practical significance of this taxonomy lies in how the two branches are paired.

4.1 Machine ↔ Human Correspondence

Role	Machine Data	Human Data
Standard / Fixed	Static	Baseline
Intentional Variation	—	Knob
Routine Operation	Dynamic	—
Abnormal Response	—	Excursion
Maintenance	—	Maintenance
Outcome Verification	Quality	—
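The correspondence table can be encoded as a plain lookup, which makes one of its properties easy to check. Every role is covered by at least one branch, but only Standard / Fixed has a counterpart on both sides. In every other role a single branch sees only half the picture, which is why the scenarios in 4.2 pair categories across branches. The dictionary mirrors the table, and `coverage_gaps` is an illustrative helper name.

```python
from typing import Dict, Optional, Tuple

# (Machine branch, Human branch) per role; None marks an empty cell in the table.
CORRESPONDENCE: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "Standard / Fixed": ("Static", "Baseline"),
    "Intentional Variation": (None, "Knob"),
    "Routine Operation": ("Dynamic", None),
    "Abnormal Response": (None, "Excursion"),
    "Maintenance": (None, "Maintenance"),
    "Outcome Verification": ("Quality", None),
}


def coverage_gaps(table):
    """Return the roles covered by only one branch, mapped to that branch."""
    gaps = {}
    for role, (machine, human) in table.items():
        if machine is None and human is None:
            raise ValueError(f"role {role!r} is covered by neither branch")
        if machine is None:
            gaps[role] = "human only"
        elif human is None:
            gaps[role] = "machine only"
    return gaps
```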

4.2 Key Integration Scenarios

(1) Maintenance × Asset → Predictive Maintenance (PdM)

Combine asset/parts information with maintenance history (replacement cycles, correction history)
Match against degradation patterns in Dynamic trace to predict remaining useful life
Result: shift from scheduled maintenance to Condition-Based Maintenance (CBM) (Lee 2014)
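The sketch below shows why the maintenance record matters for remaining-useful-life estimation. It uses the last part replacement (Human Data) to decide where the degradation clock restarts, then fits a linear trend to a health indicator derived from Dynamic trace. The linear model is an assumption made for illustration. So are the indicator itself (assumed to rise as the part wears), the names `PartReplacement` and `estimate_rul`, and the fixed failure threshold. Real prognostics typically uses richer degradation models.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PartReplacement:
    """Maintenance (Human Data): when a part was last replaced, in operating hours."""
    chamber_id: str
    part: str
    replaced_at_hour: float


def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def estimate_rul(event: PartReplacement, hours: Sequence[float],
                 health: Sequence[float], failure_threshold: float) -> Optional[float]:
    """Remaining useful life (hours) from a linear degradation trend.

    Only samples taken after the last replacement are used, because the
    maintenance record marks where the degradation clock restarted."""
    pts = [(t - event.replaced_at_hour, h)
           for t, h in zip(hours, health) if t >= event.replaced_at_hour]
    if len({t for t, _ in pts}) < 2:
        return None  # not enough post-replacement history to fit a trend
    intercept, slope = fit_line([t for t, _ in pts], [h for _, h in pts])
    if slope <= 0:
        return None  # no upward trend, so no failure predicted from this data
    age_at_failure = (failure_threshold - intercept) / slope
    age_now = max(t for t, _ in pts)
    return max(age_at_failure - age_now, 0.0)
```

Fitting every sample without the replacement record would mix the worn part's readings with the new part's and distort the trend.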

(2) Knob (DOE) × Trace → Process Optimization and Virtual Metrology (VM)

Combine DOE intent (which variables were perturbed and how) with the corresponding Trace data
Enables input-output relationship modeling, the basis for VM, Advanced Process Control (APC), and soft sensors (Moyne 2017)
Result: improved experimental efficiency, reduced metrology load, automated process-window discovery
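A minimal illustration of the Knob × Trace pairing follows. The DOE split table (wafer → knob setting) and a per-wafer trace summary are joined with metrology on the measured wafers. A linear model is then fitted and used to predict wafers that were never measured. Several things here are assumptions for illustration: the plain least-squares model, the single knob, the single trace feature, and the dictionaries keyed by wafer id. Production VM models are usually richer. The DOE matters because the intentional variation spreads the inputs far enough for the coefficients to be identifiable.

```python
from typing import Dict, List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0:
            raise ValueError("singular system: inputs are collinear")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_vm(split_table: Dict[str, float], trace_mean: Dict[str, float],
           metrology: Dict[str, float]) -> List[float]:
    """Fit metrology ~ b0 + b1 * knob + b2 * trace on wafers that have all three."""
    wafers = sorted(set(split_table) & set(trace_mean) & set(metrology))
    rows = [[1.0, split_table[w], trace_mean[w]] for w in wafers]
    ys = [metrology[w] for w in wafers]
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta: Sequence[float], knob: float, trace: float) -> float:
    """Virtual metrology estimate for a wafer that was not measured."""
    return beta[0] + beta[1] * knob + beta[2] * trace
```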

(3) Baseline × Dynamic/Quality → Drift Detection

Compare Baseline standards and control limits against real-time Dynamic/Quality data
Goes beyond classical Statistical Process Control (SPC) to detect changes in the distribution the model itself learned (Gama 2014)
Result: early detection of silent degradation and quiet distribution shifts; trigger for model retraining
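The difference between a spec-limit check and drift detection can be shown in a few lines. The sketch below checks each sample against Baseline limits, as SPC would. It also runs the Page-Hinkley test, a classic sequential change detector, to catch a shift in the mean while every value is still inside spec. Page-Hinkley is swapped in here as one possible detector; the text does not prescribe a specific method. The parameter values `delta` and `threshold` and the names `BaselineLimits` and `monitor` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class BaselineLimits:
    """Baseline (Human Data): spec limits for one monitored parameter."""
    lsl: float
    usl: float


class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream.

    delta tolerates small fluctuations; threshold sets the alarm sensitivity."""

    def __init__(self, delta: float = 0.005, threshold: float = 5.0):
        self.delta, self.threshold = delta, threshold
        self.n, self.mean, self.cum, self.min_cum = 0, 0.0, 0.0, 0.0

    def update(self, x: float) -> bool:
        self.n += 1
        self.mean += (x - self.mean) / self.n       # running mean of all samples so far
        self.cum += x - self.mean - self.delta      # cumulative deviation m_T
        self.min_cum = min(self.min_cum, self.cum)  # M_T, the minimum of m_t so far
        return self.cum - self.min_cum > self.threshold


def monitor(stream: Iterable[float], limits: BaselineLimits,
            detector: PageHinkley) -> List[Tuple[int, bool, bool]]:
    """Per sample: (index, out_of_spec, drift_alarm)."""
    events = []
    for i, x in enumerate(stream):
        out_of_spec = not (limits.lsl <= x <= limits.usl)
        events.append((i, out_of_spec, detector.update(x)))
    return events
```

For a stream that steps from 0.0 to 1.0 inside limits of ±3.0, the spec check never fires but the drift alarm does, a few samples after the step. A production monitor would also reset the detector after an alarm and trigger retraining.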

(4) Excursion × Quality → Root Cause Analysis (RCA)

Link engineer judgments and corrective actions during excursions to Quality outcomes via lineage
Forms a learning corpus of “which action led to which result”
Result: automated troubleshooting recommendation, foundation for domain Large Language Model (LLM) training (Shintani 2021)
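The “which action led to which result” corpus can be illustrated with a deliberately simple stand-in. Each excursion action is linked to its lot's Quality outcome, success rates are counted per (symptom, action), and the best-performing action is suggested. The record fields, the pass/fail outcome, and the `min_trials` cutoff are hypothetical. A real recommender or LLM training pipeline would build on such linked records rather than on this counter.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ExcursionAction:
    """Excursion (Human Data): what an engineer did about an abnormal lot."""
    lot_id: str
    symptom: str  # e.g. an FDC alarm code or defect signature
    action: str   # disposition or corrective action taken


def build_corpus(actions: Iterable[ExcursionAction],
                 passed: Dict[str, bool]) -> Dict[Tuple[str, str], List[int]]:
    """Link each action to its lot's Quality outcome: (symptom, action) -> [successes, trials]."""
    corpus: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
    for a in actions:
        if a.lot_id not in passed:
            continue  # outcome not yet linked through lineage; skip rather than guess
        stats = corpus[(a.symptom, a.action)]
        stats[0] += int(passed[a.lot_id])
        stats[1] += 1
    return dict(corpus)


def recommend(corpus, symptom: str, min_trials: int = 2) -> Optional[str]:
    """Suggest the action with the highest observed success rate for a symptom."""
    candidates = [(succ / n, n, action)
                  for (sym, action), (succ, n) in corpus.items()
                  if sym == symptom and n >= min_trials]
    return max(candidates)[2] if candidates else None
```

The `min_trials` cutoff keeps an action tried once with a lucky outcome from outranking actions with a longer track record.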

These integrations only work when both branches are managed as equal-tier assets. If Human Data remains scattered across one-off documents, none of the combinations will function.

5. Conclusion: Toward Autonomous Process Control

A truly autonomous factory becomes possible only when data is fully classified and integrated.

Machine Data alone reveals phenomena but not causes.
Human Data alone carries intent but cannot be verified.
When both branches are managed as equal-tier assets, causal inference, learning, and autonomous control begin to operate.

The proposed taxonomy satisfies three criteria:

Consistent classification axis — both branches are organized by the purpose of the data.
Completeness — covers standards, intentional variation, abnormal response, maintenance, and verification.
Industry generality — applies to all manufacturing domains, including semiconductor.

Defining and integrating Machine Data and Human Data as equal assets is the starting point for data-driven autonomous manufacturing.

References

Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., & Bouchachia, A. (2014). A survey on concept drift adaptation. ACM Computing Surveys, 46(4), 1–37.
Lee, J., Wu, F., Zhao, W., Ghaffari, M., Liao, L., & Siegel, D. (2014). Prognostics and health management design for rotary machinery systems—Reviews, methodology and applications. Mechanical Systems and Signal Processing, 42(1–2), 314–334.
Moyne, J., & Iskandar, J. (2017). Big data analytics for smart manufacturing: Case studies in semiconductor manufacturing. Processes, 5(3), 39.
Shintani, K., et al. (2021). Knowledge management and AI-driven assistants for semiconductor process engineering. IEEE Transactions on Semiconductor Manufacturing, 34(3), 312–321.



