
A Taxonomy of Manufacturing Big Data: Integrating Machine and Human Data

1. Introduction: The Missing Link in Smart Manufacturing

 Investment in smart manufacturing and big data analytics has expanded rapidly, yet the focus has remained almost exclusively on Machine Data—the data automatically generated by equipment and systems. Despite dramatic improvements in data volume, granularity, and infrastructure, gains in defect analysis and yield improvement have fallen short of expectations. The reason is clear.

The root cause of defects often lies not in machines but in human judgment and design intent.

  • What hypothesis drove this experiment?
  • Why was the recipe changed at this point in time?
  • On what basis was this lot held or released?
  • What correction was applied to this chamber after Preventive Maintenance (PM)?

 Such information remains scattered across engineers’ minds, emails, slide decks, and personal notes—never reaching the big data pipeline. No matter how rich the machine data, the absence of context for why a value turned out the way it did imposes a hard ceiling on causal inference and model learning.

 This report defines manufacturing big data as two equal branches—Machine Data and Human Data—and proposes a taxonomy that classifies each branch by the purpose and role of the data. The taxonomy is industry-agnostic and uses semiconductor manufacturing as the representative example.

2. Machine Data Taxonomy: The Physical Reality

 Machine Data is the output that systems and equipment generate automatically according to predefined logic. Machines are not decision-makers; they are executors following control algorithms, and they record the physical reality that results. Machine Data answers “What happened.”

2.1 Classification

MACHINE DATA
├─ Static     : Asset & Metadata
├─ Dynamic    : Operational & Trace Data
└─ Quality    : Metrology, Inspection

Category | Definition                                                                                                                      | Characteristics
---------+---------------------------------------------------------------------------------------------------------------------------------+---------------------------------------
Static   | Equipment and asset identity, configuration, and specifications; information that does not change or changes rarely              | Quasi-static, reference
Dynamic  | Time-dependent data generated during operation (sensors, events, state logs, Fault Detection and Classification (FDC) trace)     | High-frequency time-series
Quality  | Inspection and metrology results; post-hoc verification of process output                                                        | Discrete measurement, outcome-oriented

 The three categories operate on different time axes and serve different roles. Static answers “what exists,” Dynamic answers “how it ran,” and Quality answers “what the result was.” Together they form a complete description of the physical state of the process.
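
 As a rough illustration of how the three categories could sit side by side, the sketch below models each one as a small record type and assembles the records for a single lot. The field names (`equipment_id`, `lot_id`, `sensor`, `metric`), the sample values, and the `describe_lot` helper are all hypothetical; the taxonomy itself does not prescribe a schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StaticRecord:
    """Static: "what exists" (asset identity and configuration)."""
    equipment_id: str
    chamber: str
    model: str


@dataclass(frozen=True)
class DynamicRecord:
    """Dynamic: "how it ran" (time-stamped trace samples)."""
    equipment_id: str
    lot_id: str
    timestamp: datetime
    sensor: str
    value: float


@dataclass(frozen=True)
class QualityRecord:
    """Quality: "what the result was" (metrology and inspection)."""
    lot_id: str
    metric: str
    value: float


def describe_lot(lot_id, static, dynamic, quality):
    """Collect the Static, Dynamic, and Quality views of one lot."""
    trace = [d for d in dynamic if d.lot_id == lot_id]
    equipment_ids = {d.equipment_id for d in trace}
    return {
        "lot_id": lot_id,
        "equipment": [s for s in static if s.equipment_id in equipment_ids],
        "trace": sorted(trace, key=lambda d: d.timestamp),
        "quality": [q for q in quality if q.lot_id == lot_id],
    }


# Hypothetical sample data, for illustration only.
static = [
    StaticRecord("ETCH01", "A", "hypothetical-etcher"),
    StaticRecord("ETCH02", "B", "hypothetical-etcher"),
]
dynamic = [
    DynamicRecord("ETCH01", "LOT01", datetime(2024, 1, 1, 8, 0, 5), "pressure", 12.1),
    DynamicRecord("ETCH01", "LOT01", datetime(2024, 1, 1, 8, 0, 0), "pressure", 12.0),
    DynamicRecord("ETCH02", "LOT02", datetime(2024, 1, 1, 9, 0, 0), "pressure", 11.8),
]
quality = [
    QualityRecord("LOT01", "cd_nm", 45.2),
    QualityRecord("LOT02", "cd_nm", 44.8),
]

lot_view = describe_lot("LOT01", static, dynamic, quality)
```

 The join keys are the point of the sketch: Static links to Dynamic through the equipment, and Dynamic links to Quality through the lot.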

3. Human Data Taxonomy: The Engineering Intelligence

 Human Data is the data produced by the judgment, design intent, and experience of engineers. It captures “Why and how” and provides the context that gives machine data its meaning.

 In a Human-in-the-Loop (HITL) view, engineers infuse data with meaning by:

  • Interpreting sensor readings and judging normal versus anomalous
  • Forming hypotheses from process results and designing experiments
  • Making judgments and deciding actions when standards are violated
  • Learning equipment behavior through experience and applying corrections

 The artifacts of these activities form Human Data, which is classified by purpose into four categories: Baseline, Knob, Excursion, and Maintenance.

3.1 Classification

HUMAN DATA
├─ Baseline
├─ Knob
│    ├─ Narrow DOE
│    └─ Wide DOE
├─ Excursion
└─ Maintenance

Category       | Definition                                                                  | Representative Assets
---------------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------------
Baseline       | Data that defines the standard state                                        | Recipe, Spec/Limit, Standard Operating Procedure (SOP), Best Known Method (BKM)
Knob           | Intentional manipulation and exploration data (Design of Experiments, DOE)  | DOE plan, split table
 ├ Narrow DOE  | Fine tuning and optimization within a narrow range                          | Single step or parameter adjustment
 └ Wide DOE    | Exploration and screening across a broad range                              | Multiple steps or parameters varied simultaneously
Excursion      | Response data for abnormal situations                                       | Disposition, troubleshooting, Non-Conformance Report / Corrective and Preventive Action (NCR/CAPA), Engineering Change Notice (ECN) / Engineering Change Request (ECR) / Engineering Information Notice (EIN)
Maintenance    | Maintenance and management activity data                                    | PM records, parts replacement history, heuristic offsets
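
 One way to make this classification machine-readable is to tag every Human Data artifact with its category at the moment it is recorded. The sketch below is a minimal example under that assumption. The `HumanRecord` fields and the `doe_scope_for` rule are hypothetical, and the rule reduces Narrow versus Wide DOE to the number of steps or parameters varied, ignoring the width of the range.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class HumanCategory(Enum):
    BASELINE = "baseline"
    KNOB = "knob"
    EXCURSION = "excursion"
    MAINTENANCE = "maintenance"


class DoeScope(Enum):
    NARROW = "narrow"  # single step or parameter adjustment
    WIDE = "wide"      # multiple steps or parameters varied simultaneously


def doe_scope_for(n_varied: int) -> DoeScope:
    """Simplified rule (an assumption): one varied factor is Narrow, more is Wide."""
    if n_varied < 1:
        raise ValueError("a DOE must vary at least one step or parameter")
    return DoeScope.NARROW if n_varied == 1 else DoeScope.WIDE


@dataclass(frozen=True)
class HumanRecord:
    category: HumanCategory
    author: str
    created_at: datetime
    rationale: str  # the "why" that machine data cannot supply
    doe_scope: Optional[DoeScope] = None

    def __post_init__(self):
        # Only Knob records carry a DOE scope, and they must carry one.
        is_knob = self.category is HumanCategory.KNOB
        if is_knob != (self.doe_scope is not None):
            raise ValueError("doe_scope must be set for Knob records and only for them")


# Hypothetical example: a two-factor screening experiment.
record = HumanRecord(
    category=HumanCategory.KNOB,
    author="engineer_a",
    created_at=datetime(2024, 1, 1, 9, 0),
    rationale="Screen etch rate against pressure and RF power",
    doe_scope=doe_scope_for(2),
)
```

 Validating the category at creation time keeps free-form notes from entering the pipeline unclassified.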

4. Integration Strategy: Synergy between Machine and Human

 Machine Data and Human Data are limited on their own; value emerges when they are combined. The practical significance of this taxonomy lies in how the two branches are paired.

4.1 Machine ↔ Human Correspondence

Role                  | Machine Data | Human Data
----------------------+--------------+-------------
Standard / Fixed      | Static       | Baseline
Intentional Variation | -            | Knob
Routine Operation     | Dynamic      | -
Abnormal Response     | -            | Excursion
Maintenance           | -            | Maintenance
Outcome Verification  | Quality      | -

4.2 Key Integration Scenarios

(1) Maintenance × Asset → Predictive Maintenance (PdM)

  • Combine asset/parts information with maintenance history (replacement cycles, correction history)
  • Match against degradation patterns in Dynamic trace to predict remaining useful life
  • Result: shift from scheduled maintenance to Condition-Based Maintenance (CBM) (Lee 2014)
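
 The pairing above can be sketched in a few lines. The example assumes a single degradation indicator that rises roughly linearly after each parts replacement and a known failure threshold. Both are simplifying assumptions, and the function name, arguments, and sample values are hypothetical. The replacement time comes from Maintenance records; the trace comes from Dynamic data.

```python
def remaining_useful_life(trace, last_replacement_t, failure_threshold):
    """Estimate hours until a linear trend reaches the failure threshold.

    trace: list of (t_hours, value) samples from the Dynamic trace.
    last_replacement_t: time of the latest parts replacement (Maintenance data).
    Returns None when no upward trend can be fitted.
    """
    # Only samples taken after the replacement describe the current part.
    pts = [(t, v) for t, v in trace if t >= last_replacement_t]
    if len(pts) < 2:
        return None

    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    if sxx == 0:
        return None

    # Ordinary least-squares line through the post-replacement samples.
    slope = sum((t - mean_t) * (v - mean_v) for t, v in pts) / sxx
    if slope <= 0:
        return None  # indicator is not degrading upward
    intercept = mean_v - slope * mean_t

    t_fail = (failure_threshold - intercept) / slope
    latest_t = max(t for t, _ in pts)
    return max(0.0, t_fail - latest_t)


# Hypothetical data: the first sample belongs to the previous part.
trace = [(0, 5.0), (100, 1.0), (200, 2.0), (300, 3.0), (400, 4.0)]
rul_hours = remaining_useful_life(trace, last_replacement_t=100, failure_threshold=10.0)
```

 Without the replacement time, the sample at t = 0 would be fitted together with the current part and distort the trend, which is why the Maintenance record matters here.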

(2) Knob (DOE) × Trace → Process Optimization and Virtual Metrology (VM)

  • Combine DOE intent (which variables were perturbed and how) with the corresponding Trace data
  • Enables input-output relationship modeling — the basis for VM, Advanced Process Control (APC), and soft sensors (Moyne 2012)
  • Result: improved experimental efficiency, reduced metrology load, automated process-window discovery
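
 A minimal sketch of this pairing, assuming a single perturbed variable and a linear response: the split table supplies the intended setpoint per lot, the Trace supplies the observed sensor samples, and a least-squares fit estimates the sensitivity. The lot IDs, values, and the `fit_sensitivity` helper are hypothetical, and a real VM or APC model would normally involve many more inputs.

```python
from statistics import mean

# Human Data (Knob): split table, lot -> intended setpoint (hypothetical units).
split_table = {"LOT01": 100.0, "LOT02": 110.0, "LOT03": 120.0}

# Machine Data (Dynamic): lot -> sensor samples recorded during the run.
trace = {
    "LOT01": [1.9, 2.1],
    "LOT02": [2.4, 2.6],
    "LOT03": [2.9, 3.1],
    "LOT99": [5.0, 5.0],  # not part of the DOE, so it is ignored below
}


def fit_sensitivity(split_table, trace):
    """Least-squares fit of the mean trace value against the DOE setpoint."""
    lots = sorted(set(split_table) & set(trace))  # join on lot ID
    if len(lots) < 2:
        raise ValueError("need at least two DOE lots with trace data")
    x = [split_table[lot] for lot in lots]
    y = [mean(trace[lot]) for lot in lots]

    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("the DOE did not vary the setpoint")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_sensitivity(split_table, trace)
```

 The join on lot ID is what the split table contributes: it separates lots that were deliberately perturbed from lots that merely happened to run differently.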

(3) Baseline × Dynamic/Quality → Drift Detection

  • Compare Baseline standards and control limits against real-time Dynamic/Quality data
  • Goes beyond classical Statistical Process Control (SPC) limit checks by also detecting shifts in the data distribution that downstream models were trained on, i.e., concept drift (Gama 2014)
  • Result: early detection of silent degradation and quiet distribution shifts; trigger for model retraining
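
 The difference between a limit check and a distribution check can be shown with a small sketch. It uses a two-sample Kolmogorov-Smirnov statistic as one possible drift test (chosen here for illustration, not prescribed by the taxonomy) and compares it with the usual large-sample critical value at the 5% level. The spec limits, sample values, and function names are hypothetical.

```python
from bisect import bisect_right
from math import sqrt


def within_limits(values, lower, upper):
    """Classical check: every value lies inside the Baseline spec limits."""
    return all(lower <= v <= upper for v in values)


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the maximum gap is attained at an observed value
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def drift_detected(reference, recent, c_alpha=1.358):
    """Compare the KS statistic with the asymptotic 5% critical value.

    c_alpha = 1.358 corresponds to alpha = 0.05; the approximation is
    intended for reasonably large samples.
    """
    n, m = len(reference), len(recent)
    critical = c_alpha * sqrt((n + m) / (n * m))
    return ks_statistic(reference, recent) > critical


# Hypothetical film thickness readings with spec limits of 990 to 1010.
reference = [1000 + (i % 5 - 2) for i in range(50)]  # 998 .. 1002
recent = [1003 + (i % 5 - 2) for i in range(50)]     # 1001 .. 1005

in_spec = within_limits(recent, 990, 1010)
drifted = drift_detected(reference, recent)
```

 In this example every recent reading is inside the limits, so the limit check raises no alarm, while the distribution test flags the shift.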

(4) Excursion × Quality → Root Cause Analysis (RCA)

  • Link engineer judgments and corrective actions during excursions to Quality outcomes via lineage
  • Forms a learning corpus of “which action led to which result”
  • Result: automated troubleshooting recommendation, foundation for domain Large Language Model (LLM) training (Shintani 2021)
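
 A toy version of such a corpus is sketched below, assuming each excursion record carries a lot ID that links it to a pass/fail Quality outcome. The record fields, symptom and action names, and both helper functions are hypothetical. Ranking actions by observed pass rate is a correlational shortcut, not a causal analysis.

```python
from collections import defaultdict

# Human Data (Excursion): what the engineer saw and did, keyed by lot.
excursions = [
    {"lot_id": "L1", "symptom": "particle_spike", "action": "chamber_clean"},
    {"lot_id": "L2", "symptom": "particle_spike", "action": "chamber_clean"},
    {"lot_id": "L3", "symptom": "particle_spike", "action": "recipe_revert"},
    {"lot_id": "L4", "symptom": "cd_shift", "action": "recipe_revert"},
    {"lot_id": "L5", "symptom": "cd_shift", "action": "chamber_clean"},
]

# Machine Data (Quality): lot -> passed re-inspection. L5 has no result yet.
quality = {"L1": True, "L2": True, "L3": False, "L4": True}


def build_corpus(excursions, quality):
    """Count passes and totals per (symptom, action) pair via lot lineage."""
    stats = defaultdict(lambda: [0, 0])  # [passes, total]
    for record in excursions:
        outcome = quality.get(record["lot_id"])
        if outcome is None:
            continue  # no lineage link to a Quality result
        key = (record["symptom"], record["action"])
        stats[key][0] += int(outcome)
        stats[key][1] += 1
    return dict(stats)


def recommend(corpus, symptom):
    """Return the action with the highest observed pass rate, or None."""
    rates = {
        action: passes / total
        for (seen_symptom, action), (passes, total) in corpus.items()
        if seen_symptom == symptom
    }
    return max(rates, key=rates.get) if rates else None


corpus = build_corpus(excursions, quality)
suggestion = recommend(corpus, "particle_spike")
```

 Records without a lineage link, such as lot L5 here, contribute nothing to the corpus, which illustrates why the linkage has to be captured when the action is taken.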

 These integrations only work when both branches are managed as equal-tier assets. If Human Data remains scattered across one-off documents, none of the combinations will function.

5. Conclusion: Toward Autonomous Process Control

 A truly autonomous factory becomes possible only when data is fully classified and integrated.

  • Machine Data alone reveals phenomena but not causes.
  • Human Data alone carries intent but cannot be verified.
  • When both branches are managed as equal-tier assets, causal inference, learning, and autonomous control begin to operate.

 The proposed taxonomy satisfies three criteria:

  • Consistent classification axis: both branches are organized by the purpose of the data.
  • Completeness: covers standards, intentional variation, routine operation, abnormal response, maintenance, and verification.
  • Industry generality: intended to apply across manufacturing domains, with semiconductor manufacturing as the representative example.

 Defining and integrating Machine Data and Human Data as equal assets is the starting point for data-driven autonomous manufacturing.


References

  • Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., & Bouchachia, A. (2014). A survey on concept drift adaptation. ACM Computing Surveys, 46(4), 1–37.
  • Lee, J., Wu, F., Zhao, W., Ghaffari, M., Liao, L., & Siegel, D. (2014). Prognostics and health management design for rotary machinery systems—Reviews, methodology and applications. Mechanical Systems and Signal Processing, 42(1–2), 314–334.
  • Moyne, J., & Iskandar, J. (2012). Big data analytics for smart manufacturing: Case studies in semiconductor manufacturing. Processes, 5(3), 39.
  • Shintani, K., et al. (2021). Knowledge management and AI-driven assistants for semiconductor process engineering. IEEE Transactions on Semiconductor Manufacturing, 34(3), 312–321.