Building a Unified Machine Data Layer Without Ripping SCADA
- michaelsedique
- Sep 14
- 7 min read
Executive Summary
A machine data layer turns raw signals from PLCs, robots, sensors, and SCADA into consistent, trustworthy production information that any application can reuse. Instead of replacing what already works, you insert a semantic layer—aligned to ISA-95—that standardizes names, units, states, and lineage. With a data-ops backbone such as Artisan Edge, plants can publish clean, well-modeled metrics to OEE, alarm triage, CMMS, and traceability systems without rip-and-replace projects. This guide defines the data layer, explains the problems it solves, lays out design principles, and provides an 8–10 week rollout plan, acceptance criteria, and a governance blueprint to keep the model accurate as you scale.
Table of Contents
What is a Machine Data Layer?
Problems It Solves (Business & Technical)
Design Principles (Canonical Naming, States, Units, Lineage)
Reference Architecture (Brownfield-First)
KPI Lineage in Practice (OEE as an Example)
Implementation Plan (8–10 Weeks)
Data Quality Gates & Governance
Integration Patterns (OEE, Alarms, CMMS, Traceability)
Storage & Retention Tiers
Security & Access Control
Validation Metrics & Acceptance Criteria
Common Pitfalls & Mitigations
Implementation Checklist
FAQs
1) What is a Machine Data Layer?
Definition. A machine data layer is a semantic and operational layer that normalizes tags into modeled assets, states, and metrics, then exposes them via stable contracts to downstream apps. It sits between connectivity (OPC UA, MQTT Sparkplug, proprietary drivers) and applications (OEE dashboards, alarm triage, CMMS, traceability, analytics).
Key attributes
Canonical semantics: Consistent names, units, and state enumerations across lines and sites.
Lineage: Transparent mapping from source tags → engineered metrics → KPIs.
Reusability: Multiple apps consume the same trustworthy metric definitions.
Change control: Versioned schemas and topic contracts to prevent breakage.
Where Artisan Edge fits: it acts as the data-ops backbone, enforcing naming conventions, modeling signals into an ISA-95-aligned schema, and routing clean, versioned metrics to OEE/CMMS/analytics—while SCADA and HMIs continue their roles for visualization and control.
2) Problems It Solves (Business & Technical)
Business
Multiple truths:
Plants report different KPI values for the same shift, undermining credibility.
Pilot purgatory:
One-off integrations do not scale because each app re-plumbs raw tags.
Slow changeovers:
Recipe parameter drift and unclear data definitions cause scrap and delays.
Technical
Shadow data:
CSV exports and spreadsheets introduce latency and errors.
Inconsistent semantics:
Tags vary in names, units, and state meanings across cells.
Opaque lineage:
KPIs lack traceability to source signals, making audits difficult.
Tight coupling:
Downstream apps depend directly on device specifics, resulting in brittle point-to-point integrations.
3) Design Principles (Canonical Naming, States, Units, Lineage)
3.1 Canonical naming
Adopt a disciplined, human-readable pattern:
site/area/line/cell/asset/metric
Rules
Lower-case; underscores for readability; SI units; avoid vendor abbreviations.
Use nouns for assets and noun or verb-noun compounds for metrics (e.g., runtime_seconds, good_count, scrap_reason).
Keep names stable; version schemas rather than silently renaming signals.
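To keep the convention from eroding over time, it helps to enforce the pattern programmatically at ingestion and in CI linting. The snippet below is a minimal sketch in Python; the regex and the example topics are illustrative assumptions, not part of any specific product API.

```python
import re

# Hypothetical validator for the site/area/line/cell/asset/metric pattern:
# lower-case segments, underscores allowed, exactly six levels.
TOPIC_PATTERN = re.compile(r"^([a-z0-9_]+/){5}[a-z0-9_]+$")

def is_canonical(topic: str) -> bool:
    """Return True if a topic follows the canonical naming convention."""
    return bool(TOPIC_PATTERN.match(topic))

assert is_canonical("dallas/packaging/line_03/cell_2/filler_01/runtime_seconds")
assert not is_canonical("Dallas/Packaging/Line-03/Filler01.RunTime")  # vendor-style name fails
```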
3.2 State modeling
Define explicit enumerations for machine and line states (e.g., run, idle, fault, changeover, maintenance, setup). Map vendor-specific codes to these canonical states and record reason codes for transitions (e.g., fault class, changeover type).
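As an illustration, the mapping can live in a small, versioned lookup that translates vendor codes once at the edge. The vendor names and codes below are hypothetical; only the canonical enumeration mirrors the states listed above.

```python
from enum import Enum

class MachineState(Enum):
    RUN = "run"
    IDLE = "idle"
    FAULT = "fault"
    CHANGEOVER = "changeover"
    MAINTENANCE = "maintenance"
    SETUP = "setup"

# Hypothetical vendor status codes, mapped once at the edge to canonical states.
VENDOR_STATE_MAP = {
    ("acme_plc", 1): MachineState.RUN,
    ("acme_plc", 2): MachineState.IDLE,
    ("acme_plc", 7): MachineState.FAULT,
    ("beta_robot", "E-STOP"): MachineState.FAULT,
}

def to_canonical(vendor: str, code) -> MachineState:
    """Translate a vendor-specific status code into the canonical enumeration."""
    return VENDOR_STATE_MAP[(vendor, code)]
```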
3.3 Units & calibration
Maintain a unit registry (SI preferred) and apply conversions at ingestion once. Record calibration metadata for sensors so KPIs can reflect metrological confidence when needed.
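A minimal sketch of such a registry, assuming conversions are expressed as simple callables keyed by quantity and source unit (the units and factors below are ordinary SI conversions, not tied to any particular plant):

```python
# Hypothetical unit registry: convert to SI once, at ingestion, and never again downstream.
UNIT_REGISTRY = {
    ("temperature", "deg_f"): lambda v: (v - 32.0) * 5.0 / 9.0 + 273.15,  # to kelvin
    ("temperature", "deg_c"): lambda v: v + 273.15,                        # to kelvin
    ("pressure", "psi"): lambda v: v * 6894.757,                           # to pascal
    ("flow", "gal_per_min"): lambda v: v * 6.309e-5,                       # to m^3/s
}

def to_si(quantity: str, unit: str, value: float) -> float:
    """Apply the registered conversion for (quantity, unit); raises if unregistered."""
    return UNIT_REGISTRY[(quantity, unit)](value)
```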
3.4 Data lineage
Store lineage alongside metrics: source tag/topic → transformation → version → consumer KPI. Lineage enables audits, reconciles differences, and builds trust with leadership.
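One lightweight way to carry lineage is a small record attached to each published metric. The field names and the example below are hypothetical; the point is that source, transformation, schema version, and consumers travel together.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical lineage entry stored alongside each published metric."""
    source: str            # raw OPC UA node or MQTT topic
    transformation: str    # human-readable description of the applied transform
    schema_version: str    # version of the model that produced the metric
    consumers: tuple = ()  # KPIs that depend on this metric

oee_runtime_lineage = LineageRecord(
    source="dallas/packaging/line_03/cell_2/filler_01/state",
    transformation="sum of seconds in canonical state 'run' per shift",
    schema_version="1.0.0",
    consumers=("oee.availability",),
)
```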
4) Reference Architecture (Brownfield-First)
| Layer | Purpose | Examples | Notes |
| --- | --- | --- | --- |
| Connectivity | Access raw signals | OPC UA browsing; MQTT Sparkplug topics; serial/fieldbus via gateway | Keep control-adjacent actions local to the cell. |
| Data-Ops Backbone | Ingest, buffer, normalize | Artisan Edge; store-and-forward; schema enforcement | Applies naming, units, states, and validation. |
| Semantic Model | ISA-95 equipment/material/process | Site/Area/Line/Cell/Asset hierarchies; material genealogy | The source of truth; versioned and governed. |
| Applications | OEE, alarms, CMMS, traceability, analytics | Dashboards, APIs, work-order creation, genealogy services | Consume modeled metrics with lineage. |
| Security & Governance | Segmentation, identity, review | Mutual TLS, ACLs, RBAC; schema/topic change control | Treat topics and schemas as productized APIs. |
Protocol stance: keep OPC UA near equipment for structured access; publish normalized metrics over MQTT Sparkplug for decoupled distribution across apps and sites.
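To make the decoupling concrete, here is a sketch of a normalized metric payload. Real Sparkplug B payloads are protobuf-encoded and published under the spBv1.0 topic namespace; plain JSON is shown here only to illustrate what "normalized and versioned" means for a consumer.

```python
import json
import time

def build_metric_payload(name: str, value, unit: str, schema_version: str) -> bytes:
    """Assemble a normalized metric payload (illustrative JSON, not the Sparkplug wire format)."""
    payload = {
        "name": name,                      # canonical metric name
        "value": value,
        "unit": unit,                      # SI unit applied once at ingestion
        "event_time": time.time(),         # when it happened at the machine
        "schema_version": schema_version,  # contract the consumer validates against
    }
    return json.dumps(payload).encode("utf-8")

# Any MQTT client (e.g., paho-mqtt) could publish this on a namespaced topic such as
# "dallas/packaging/line_03/cell_2/filler_01/good_count".
```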
5) KPI Lineage in Practice (OEE as an Example)
Availability
runtime / planned_time
With planned_time excluding canonical planned-downtime states (e.g., planned maintenance, meetings, breaks).
Lineage: runtime derived from canonical states; planned_time from schedule.
Performance
actual_output / ideal_output
With ideal_output computed as runtime ÷ the ideal cycle time for the active SKU.
Lineage: actual counts from modeled metrics; ideal from product catalog.
Quality
good_output / total_output
With scrap reasons captured in canonical codes.
Lineage: good/total counts from modeled counters; scrap reason from event stream.
Governance tip:
Publish OEE as derived metrics with versioned formulas so downstream tools cannot redefine KPIs ad hoc.
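As a sketch, the derived-metric approach can be as simple as publishing the formula version alongside the computed values. The field names below are assumptions; the formulas follow the standard OEE definitions given above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OeeInputs:
    runtime_s: float           # seconds spent in the canonical 'run' state
    planned_time_s: float      # scheduled time minus planned-downtime states
    ideal_cycle_time_s: float  # from the product catalog for the active SKU
    total_count: int
    good_count: int

OEE_FORMULA_VERSION = "1.0.0"  # published with every derived metric

def compute_oee(m: OeeInputs) -> dict:
    """Derive availability, performance, quality, and OEE from modeled metrics."""
    availability = m.runtime_s / m.planned_time_s
    ideal_output = m.runtime_s / m.ideal_cycle_time_s
    performance = m.total_count / ideal_output
    quality = m.good_count / m.total_count
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
        "formula_version": OEE_FORMULA_VERSION,
    }
```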
6) Implementation Plan (8–10 Weeks)
Phase 1 (Weeks 1–2) — Discovery & Standards
Inventory lines, controllers, robots, sensors, protocols, and tag counts.
Draft naming conventions, state enumerations, unit registry, and topic patterns.
Choose priority metrics (e.g., counts, states, downtime reasons, defects) per line.
Establish owners for schemas and topics; set review cadence.
Phase 2 (Weeks 3–4) — Minimal Viable Connectivity
Stand up an edge gateway/broker with OPC UA and MQTT Sparkplug.
Publish 20–40 high-value metrics per line under disciplined namespaces.
Enable store-and-forward; verify lossless behavior during planned disconnects.
Enforce time synchronization (NTP/PTP) and record event time vs processing time.
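A minimal sketch of the store-and-forward and dual-timestamp idea, assuming an in-memory buffer and an abstract publish callable (error handling omitted):

```python
import time
from collections import deque

# Samples are stamped with event_time at capture; processing_time is added when the
# sample is actually forwarded, so consumers can reconcile late or reordered data.
buffer = deque()

def capture(metric: str, value) -> None:
    """Buffer a sample locally, stamped with the time it occurred at the machine."""
    buffer.append({"metric": metric, "value": value, "event_time": time.time()})

def flush(publish) -> None:
    """Drain the buffer once connectivity returns; `publish` is any transport callable."""
    while buffer:
        sample = buffer.popleft()
        sample["processing_time"] = time.time()
        publish(sample)
```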
Phase 3 (Weeks 5–6) — Semantic Modeling & First Consumers
Map tags into an ISA-95-aligned model (equipment/material/process).
Implement OEE with visible data lineage.
Wire alarm triage with hysteresis/deadbands; target nuisance reduction ≥50%.
Integrate CMMS so alerts can auto-create work orders with asset context.
Phase 4 (Weeks 7–8) — Data Quality Gates & Acceptance
Run soak tests (latency, freshness, loss) and a failover drill (broker restart, WAN loss).
Audit KPI variance vs manual checks; target ≤1%.
Remediate gaps; lock v1.0 of the model and topic contracts.
Phase 5 (Weeks 9–10) — Replicate & Harden
Templatize models for similar lines; add CI linting for topics and schemas.
Document ownership, versioning, and change control.
Publish a consumer guide describing contracts and lineage.
7) Data Quality Gates & Governance
Gates before production
Freshness: ≥ 98% of priority topics within SLO.
Completeness: No required metrics missing for operational KPIs.
Validity: Units, ranges, and enumerations pass validation checks.
Lineage: Every KPI links to versioned formulas and source metrics.
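These gates are straightforward to automate. The sketch below assumes hypothetical data structures (last-seen timestamps per topic, a set of required metrics, and registered engineering ranges); the thresholds mirror the gates listed above.

```python
import time

def freshness_ok(last_seen: dict, slo_s: float, threshold: float = 0.98) -> bool:
    """At least `threshold` of priority topics must have updated within the SLO window."""
    now = time.time()
    fresh = sum(1 for ts in last_seen.values() if now - ts <= slo_s)
    return fresh / len(last_seen) >= threshold

def completeness_ok(required_metrics: set, received_metrics: set) -> bool:
    """No metric required by an operational KPI may be missing."""
    return required_metrics <= received_metrics

def validity_ok(metric: str, value: float, unit_ranges: dict) -> bool:
    """Values must fall inside the registered engineering range for the metric."""
    lo, hi = unit_ranges[metric]
    return lo <= value <= hi
```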
Governance
Treat topics/schemas as APIs with owners and versioning.
Require pull-request reviews for schema changes; maintain a changelog.
Quarterly model drift reviews with OT/IT, reliability, and operations.
8) Integration Patterns (OEE, Alarms, CMMS, Traceability)
OEE dashboards: Subscribe to modeled counts/states; compute OEE with published formulas to avoid divergence.
Alarm triage: Consume canonical state transitions and event streams; apply hysteresis and deadbands centrally rather than duplicating rules in each app (a minimal sketch follows this list).
CMMS integration: An event pipeline raises work orders with device context, recent metrics, and recommended priority.
Traceability: Serialize materials and maintain genealogy edges from modeled events (e.g., station start/stop, parameter snapshots, inspection results).
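For the alarm-triage pattern, a central hysteresis/deadband filter might look like the sketch below; the threshold, deadband, and example values are hypothetical.

```python
class DeadbandAlarm:
    """Raise above `high`, clear only below `high - deadband`, suppressing chatter
    near the threshold so the same rule does not get re-implemented in every app."""

    def __init__(self, high: float, deadband: float):
        self.high = high
        self.deadband = deadband
        self.active = False

    def update(self, value: float) -> bool:
        """Return True only on the transition into the alarm state."""
        if not self.active and value >= self.high:
            self.active = True
            return True
        if self.active and value <= self.high - self.deadband:
            self.active = False
        return False

# Example: a hypothetical bearing-temperature alarm at 85 with a 5-degree deadband.
alarm = DeadbandAlarm(high=85.0, deadband=5.0)
events = [alarm.update(v) for v in (82, 86, 84, 83, 79, 86)]
# events -> [False, True, False, False, False, True]
```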
9) Storage & Retention Tiers
Hot (edge): 7–30 days of priority topics for survivability and quick investigations.
Warm (cloud/site): 3–12 months for KPI trends and reliability analysis.
Cold (object storage): 1–7 years for compliance and long-horizon forensics.
Downsampling: Maintain 1 s → 5 s → 60 s rollups with provenance so analytics can choose appropriate granularity.
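A minimal rollup sketch using pandas (an assumption; any time-series engine works), tagging each rollup with its source resolution so provenance travels with the downsampled data:

```python
import pandas as pd

# Keep the 1 s base series and derive 5 s and 60 s means for analytics that do not
# need full resolution; the example values and timestamps are illustrative.
base = pd.DataFrame(
    {"value": [10.0, 10.2, 9.9, 10.1]},
    index=pd.date_range("2025-01-01 08:00:00", periods=4, freq="1s"),
)

rollup_5s = base.resample("5s").mean()
rollup_5s["source_resolution"] = "1s"

rollup_60s = base.resample("60s").mean()
rollup_60s["source_resolution"] = "1s"
```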
10) Security & Access Control
Segmentation: Isolate OT zones from IT; restrict conduits and east–west pathways.
Identity: Mutual TLS, certificate rotation, and per-topic ACLs with least privilege.
Secrets: Centralize secrets; avoid credentials embedded in PLC logic.
Audit: Log schema/topic changes and access; implement quarterly access reviews.
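As an illustration of least-privilege, per-topic access, read-only ACL entries can be generated from the canonical namespace rather than written by hand. The entry format below is a generic assumption; real broker ACL syntax varies by product.

```python
def least_privilege_acl(app: str, namespaces: list) -> list:
    """Generate read-only, per-namespace ACL entries for a consuming application."""
    return [{"client": app, "topic": f"{ns}/#", "access": "read"} for ns in namespaces]

# Example: the OEE dashboard may only read its own line's namespace.
acl = least_privilege_acl("oee_dashboard", ["dallas/packaging/line_03"])
```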
11) Validation Metrics & Acceptance Criteria
Performance & Reliability
Latency: p95 end-to-end < 500 ms for priority metrics.
Freshness: ≥ 98% within SLO; loss < 0.01% during soak and failover tests.
Recovery: Subscribers regain correct context within seconds via retained descriptors.
Business Outcomes
OEE trust: Dashboard vs manual audit variance ≤ 1%.
Alarm hygiene: Nuisance alarms ↓ ≥ 50% post-tuning.
Workflow closure: Alert → CMMS work order < 5 minutes with correct asset context.
Scale-out speed: Time to onboard a new line/app ↓ ≥ 50% vs baseline.
12) Common Pitfalls & Mitigations
Namespace sprawl (different teams inventing names).
Mitigation: Central stewardship, linting, and mandatory reviews for new topics.
Duplicate truths (apps redefining KPIs).
Mitigation: Publish KPIs in the model with versioned formulas; downstream apps consume rather than recompute.
Over-modeling early (blocking progress).
Mitigation: Model the vital few metrics first (e.g., counts, states, downtime reasons); iterate.
Ignoring time bases (event vs processing time).
Mitigation: Enforce NTP/PTP; carry event_time in schemas so consumers can reconcile late or reordered data.
Security drift (stale certs and wide ACLs).
Mitigation: Rotation schedules, least privilege, and quarterly access audits.
13) Implementation Checklist
Naming convention, unit registry, and state enumerations published
Edge gateway/broker live with OPC UA + MQTT Sparkplug
20–40 high-value metrics per line streaming with store-and-forward
ISA-95-aligned semantic model v1.0 approved; ownership assigned
OEE with lineage live; alarm policies tuned; nuisance ↓ ≥ 50%
CMMS integration live (alert → work order < 5 minutes)
Soak test and failover drill results documented; gaps remediated
Consumer guide and change-control workflow published
14) FAQs
Q1. Do we need to replace SCADA or historians?
No. Keep SCADA/HMI for visualization and control. The data layer coexists, normalizes signals, and routes them to downstream apps.
Q2. How do we handle legacy protocols?
Use gateways to translate to OPC UA and MQTT Sparkplug. Start with high-value signals and expand incrementally.
Q3. Where do we compute KPIs—edge or cloud?
Both. Compute near the edge for operational KPIs and low-latency views; aggregate in cloud for cross-site analytics and governance.
Q4. How do we keep models from drifting?
Version schemas, enforce reviews, and schedule quarterly drift checks with OT/IT and operations leaders.
Q5. What is the smallest viable scope?
One bottleneck line with 20–40 metrics (e.g., counts, states, downtime reasons, alarms), which is sufficient for OEE, alarm triage, and CMMS.
Book a Data-Model Workshop with Artisan Technologies.
We will map one line, publish a schema, and deliver a first OEE with lineage in 30 days—no rip-and-replace required.