Building a Unified Machine Data Layer Without Ripping SCADA
- michaelsedique
- Sep 14
- 7 min read
Executive Summary
A machine data layer turns raw signals from PLCs, robots, sensors, and SCADA into consistent, trustworthy production information that any application can reuse. Instead of replacing what already works, you insert a semantic layer—aligned to ISA-95—that standardizes names, units, states, and lineage. With a data-ops backbone such as Artisan Edge, plants can publish clean, well-modeled metrics to OEE, alarm triage, CMMS, and traceability systems without rip-and-replace projects. This guide defines the data layer, explains the problems it solves, lays out design principles, and provides an 8–10 week rollout plan, acceptance criteria, and a governance blueprint to keep the model accurate as you scale.
Table of Contents
What is a Machine Data Layer?
Problems It Solves (Business & Technical)
Design Principles (Canonical Naming, States, Units, Lineage)
Reference Architecture (Brownfield-First)
KPI Lineage in Practice (OEE as an Example)
Implementation Plan (8–10 Weeks)
Data Quality Gates & Governance
Integration Patterns (OEE, Alarms, CMMS, Traceability)
Storage & Retention Tiers
Security & Access Control
Validation Metrics & Acceptance Criteria
Common Pitfalls & Mitigations
Implementation Checklist
FAQs
1) What is a Machine Data Layer?
Definition. A machine data layer is a semantic and operational layer that normalizes tags into modeled assets, states, and metrics, then exposes them via stable contracts to downstream apps. It sits between connectivity (OPC UA, MQTT Sparkplug, proprietary drivers) and applications (OEE dashboards, alarm triage, CMMS, traceability, analytics).
Key attributes
Canonical semantics: Consistent names, units, and state enumerations across lines and sites.
Lineage: Transparent mapping from source tags → engineered metrics → KPIs.
Reusability: Multiple apps consume the same trustworthy metric definitions.
Change control: Versioned schemas and topic contracts to prevent breakage.
Where Artisan Edge fits: it acts as the data-ops backbone, enforcing naming conventions, modeling signals into an ISA-95-aligned schema, and routing clean, versioned metrics to OEE/CMMS/analytics—while SCADA and HMIs continue their roles for visualization and control.
2) Problems It Solves (Business & Technical)
Business
Multiple truths:
Plants report different KPI values for the same shift, undermining credibility.
Pilot purgatory:
One-off integrations do not scale because each app re-plumbs raw tags.
Slow changeovers:
Recipe parameter drift and unclear data definitions cause scrap and delays.
Technical
Shadow data:
CSV exports and spreadsheets introduce latency and errors.
Inconsistent semantics:
Tags vary in names, units, and state meanings across cells.
Opaque lineage:
KPIs lack traceability to source signals, making audits difficult.
Tight coupling:
Downstream apps depend directly on device specifics, resulting in brittle point-to-point integrations.
3) Design Principles (Canonical Naming, States, Units, Lineage)
3.1 Canonical naming
Adopt a disciplined, human-readable pattern:
site/area/line/cell/asset/metric
Rules
Lower-case; underscores for readability; SI units; avoid vendor abbreviations.
Use nouns for assets and noun or verb-noun compounds for metrics (e.g., runtime_seconds, good_count, scrap_reason).
Keep names stable; version schemas rather than silently renaming signals.
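To keep the convention from eroding over time, it helps to enforce the pattern programmatically at ingestion and in CI linting. The snippet below is a minimal sketch in Python; the regex and the example topics are illustrative assumptions, not part of any specific product API.

```python
import re

# Hypothetical validator for the site/area/line/cell/asset/metric pattern:
# lower-case segments, underscores allowed, exactly six levels.
TOPIC_PATTERN = re.compile(r"^([a-z0-9_]+/){5}[a-z0-9_]+$")

def is_canonical(topic: str) -> bool:
    """Return True if a topic follows the canonical naming convention."""
    return bool(TOPIC_PATTERN.match(topic))

assert is_canonical("dallas/packaging/line_03/cell_2/filler_01/runtime_seconds")
assert not is_canonical("Dallas/Packaging/Line-03/Filler01.RunTime")  # vendor-style name fails
```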
3.2 State modeling
Define explicit enumerations for machine and line states (e.g., run, idle, fault, changeover, maintenance, setup). Map vendor-specific codes to these canonical states and record reason codes for transitions (e.g., fault class, changeover type).
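As an illustration, the mapping can live in a small, versioned lookup that translates vendor codes once at the edge. The vendor names and codes below are hypothetical; only the canonical enumeration mirrors the states listed above.

```python
from enum import Enum

class MachineState(Enum):
    RUN = "run"
    IDLE = "idle"
    FAULT = "fault"
    CHANGEOVER = "changeover"
    MAINTENANCE = "maintenance"
    SETUP = "setup"

# Hypothetical vendor status codes, mapped once at the edge to canonical states.
VENDOR_STATE_MAP = {
    ("acme_plc", 1): MachineState.RUN,
    ("acme_plc", 2): MachineState.IDLE,
    ("acme_plc", 7): MachineState.FAULT,
    ("beta_robot", "E-STOP"): MachineState.FAULT,
}

def to_canonical(vendor: str, code) -> MachineState:
    """Translate a vendor-specific status code into the canonical enumeration."""
    return VENDOR_STATE_MAP[(vendor, code)]
```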
3.3 Units & calibration
Maintain a unit registry (SI preferred) and apply conversions at ingestion once. Record calibration metadata for sensors so KPIs can reflect metrological confidence when needed.
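A minimal sketch of such a registry, assuming conversions are expressed as simple callables keyed by quantity and source unit (the units and factors below are ordinary SI conversions, not tied to any particular plant):

```python
# Hypothetical unit registry: convert to SI once, at ingestion, and never again downstream.
UNIT_REGISTRY = {
    ("temperature", "deg_f"): lambda v: (v - 32.0) * 5.0 / 9.0 + 273.15,  # to kelvin
    ("temperature", "deg_c"): lambda v: v + 273.15,                        # to kelvin
    ("pressure", "psi"): lambda v: v * 6894.757,                           # to pascal
    ("flow", "gal_per_min"): lambda v: v * 6.309e-5,                       # to m^3/s
}

def to_si(quantity: str, unit: str, value: float) -> float:
    """Apply the registered conversion for (quantity, unit); raises if unregistered."""
    return UNIT_REGISTRY[(quantity, unit)](value)
```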
3.4 Data lineage
Store lineage alongside metrics: source tag/topic → transformation → version → consumer KPI. Lineage enables audits, reconciles differences, and builds trust with leadership.
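One lightweight way to carry lineage is a small record attached to each published metric. The field names and the example below are hypothetical; the point is that source, transformation, schema version, and consumers travel together.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical lineage entry stored alongside each published metric."""
    source: str            # raw OPC UA node or MQTT topic
    transformation: str    # human-readable description of the applied transform
    schema_version: str    # version of the model that produced the metric
    consumers: tuple = ()  # KPIs that depend on this metric

oee_runtime_lineage = LineageRecord(
    source="dallas/packaging/line_03/cell_2/filler_01/state",
    transformation="sum of seconds in canonical state 'run' per shift",
    schema_version="1.0.0",
    consumers=("oee.availability",),
)
```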
4) Reference Architecture (Brownfield-First)
| Layer | Purpose | Examples | Notes |
| --- | --- | --- | --- |
| Connectivity | Access raw signals | OPC UA browsing; MQTT Sparkplug topics; serial/fieldbus via gateway | Keep control-adjacent actions local to the cell. |
| Data-Ops Backbone | Ingest, buffer, normalize | Artisan Edge; store-and-forward; schema enforcement | Applies naming, units, states, and validation. |
| Semantic Model | ISA-95 equipment/material/process | Site/Area/Line/Cell/Asset hierarchies; material genealogy | The source of truth; versioned and governed. |
| Applications | OEE, alarms, CMMS, traceability, analytics | Dashboards, APIs, work-order creation, genealogy services | Consume modeled metrics with lineage. |
| Security & Governance | Segmentation, identity, review | Mutual TLS, ACLs, RBAC; schema/topic change control | Treat topics and schemas as productized APIs. |
Protocol stance: keep OPC UA near equipment for structured access; publish normalized metrics over MQTT Sparkplug for decoupled distribution across apps and sites.
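To make the decoupling concrete, here is a sketch of a normalized metric payload. Real Sparkplug B payloads are protobuf-encoded and published under the spBv1.0 topic namespace; plain JSON is shown here only to illustrate what "normalized and versioned" means for a consumer.

```python
import json
import time

def build_metric_payload(name: str, value, unit: str, schema_version: str) -> bytes:
    """Assemble a normalized metric payload (illustrative JSON, not the Sparkplug wire format)."""
    payload = {
        "name": name,                      # canonical metric name
        "value": value,
        "unit": unit,                      # SI unit applied once at ingestion
        "event_time": time.time(),         # when it happened at the machine
        "schema_version": schema_version,  # contract the consumer validates against
    }
    return json.dumps(payload).encode("utf-8")

# Any MQTT client (e.g., paho-mqtt) could publish this on a namespaced topic such as
# "dallas/packaging/line_03/cell_2/filler_01/good_count".
```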
5) KPI Lineage in Practice (OEE as an Example)
Availability
runtime / planned_time
With planned_time excluding canonical planned-downtime states (e.g., planned maintenance, meetings, breaks).
Lineage: runtime derived from canonical states; planned_time from schedule.
Performance
actual_output / ideal_output
With ideal_output computed as runtime ÷ the ideal cycle time for the active SKU.
Lineage: actual counts from modeled metrics; ideal from product catalog.
Quality
good_output / total_output
With scrap reasons captured in canonical codes.
Lineage: good/total counts from modeled counters; scrap reason from event stream.
Governance tip:
Publish OEE as derived metrics with versioned formulas so downstream tools cannot redefine KPIs ad hoc.
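As a sketch, the derived-metric approach can be as simple as publishing the formula version alongside the computed values. The field names below are assumptions; the formulas follow the standard OEE definitions given above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OeeInputs:
    runtime_s: float           # seconds spent in the canonical 'run' state
    planned_time_s: float      # scheduled time minus planned-downtime states
    ideal_cycle_time_s: float  # from the product catalog for the active SKU
    total_count: int
    good_count: int

OEE_FORMULA_VERSION = "1.0.0"  # published with every derived metric

def compute_oee(m: OeeInputs) -> dict:
    """Derive availability, performance, quality, and OEE from modeled metrics."""
    availability = m.runtime_s / m.planned_time_s
    ideal_output = m.runtime_s / m.ideal_cycle_time_s
    performance = m.total_count / ideal_output
    quality = m.good_count / m.total_count
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
        "formula_version": OEE_FORMULA_VERSION,
    }
```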
6) Implementation Plan (8–10 Weeks)
Phase 1 (Weeks 1–2) — Discovery & Standards
Inventory lines, controllers, robots, sensors, protocols, and tag counts.
Draft naming conventions, state enumerations, unit registry, and topic patterns.
Choose priority metrics (e.g., counts, states, downtime reasons, defects) per line.
Establish owners for schemas and topics; set review cadence.
Phase 2 (Weeks 3–4) — Minimal Viable Connectivity
Stand up an edge gateway/broker with OPC UA and MQTT Sparkplug.
Publish 20–40 high-value metrics per line under disciplined namespaces.
Enable store-and-forward; verify lossless behavior during planned disconnects.
Enforce time synchronization (NTP/PTP) and record event time vs processing time.
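A minimal sketch of the store-and-forward and dual-timestamp idea, assuming an in-memory buffer and an abstract publish callable (error handling omitted):

```python
import time
from collections import deque

# Samples are stamped with event_time at capture; processing_time is added when the
# sample is actually forwarded, so consumers can reconcile late or reordered data.
buffer = deque()

def capture(metric: str, value) -> None:
    """Buffer a sample locally, stamped with the time it occurred at the machine."""
    buffer.append({"metric": metric, "value": value, "event_time": time.time()})

def flush(publish) -> None:
    """Drain the buffer once connectivity returns; `publish` is any transport callable."""
    while buffer:
        sample = buffer.popleft()
        sample["processing_time"] = time.time()
        publish(sample)
```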
Phase 3 (Weeks 5–6) — Semantic Modeling & First Consumers
Map tags into an ISA-95-aligned model (equipment/material/process).
Implement OEE with visible data lineage.
Wire alarm triage with hysteresis/deadbands; target nuisance reduction ≥50%.
Integrate CMMS so alerts can auto-create work orders with asset context.
Phase 4 (Weeks 7–8) — Data Quality Gates & Acceptance
Run soak tests (latency, freshness, loss) and a failover drill (broker restart, WAN loss).
Audit KPI variance vs manual checks; target ≤1%.
Remediate gaps; lock v1.0 of the model and topic contracts.
Phase 5 (Weeks 9–10) — Replicate & Harden
Templatize models for similar lines; add CI linting for topics and schemas.
Document ownership, versioning, and change control.
Publish a consumer guide describing contracts and lineage.
7) Data Quality Gates & Governance
Gates before production
Freshness: ≥ 98% of priority topics within SLO.
Completeness: No required metrics missing for operational KPIs.
Validity: Units, ranges, and enumerations pass validation checks.
Lineage: Every KPI links to versioned formulas and source metrics.
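These gates are straightforward to automate. The sketch below assumes hypothetical data structures (last-seen timestamps per topic, a set of required metrics, and registered engineering ranges); the thresholds mirror the gates listed above.

```python
import time

def freshness_ok(last_seen: dict, slo_s: float, threshold: float = 0.98) -> bool:
    """At least `threshold` of priority topics must have updated within the SLO window."""
    now = time.time()
    fresh = sum(1 for ts in last_seen.values() if now - ts <= slo_s)
    return fresh / len(last_seen) >= threshold

def completeness_ok(required_metrics: set, received_metrics: set) -> bool:
    """No metric required by an operational KPI may be missing."""
    return required_metrics <= received_metrics

def validity_ok(metric: str, value: float, unit_ranges: dict) -> bool:
    """Values must fall inside the registered engineering range for the metric."""
    lo, hi = unit_ranges[metric]
    return lo <= value <= hi
```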
Governance
Treat topics/schemas as APIs with owners and versioning.
Require pull-request reviews for schema changes; maintain a changelog.
Quarterly model drift reviews with OT/IT, reliability, and operations.
8) Integration Patterns (OEE, Alarms, CMMS, Traceability)
OEE dashboards: Subscribe to modeled counts/states; compute OEE with published formulas to avoid divergence.
Alarm triage: Consume canonical state transitions and event streams; apply hysteresis and deadbands centrally rather than duplicating rules in each app (a minimal sketch follows this list).
CMMS integration: An event pipeline raises work orders with device context, recent metrics, and recommended priority.
Traceability: Serialize materials and maintain genealogy edges from modeled events (e.g., station start/stop, parameter snapshots, inspection results).
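For the alarm-triage pattern, a central hysteresis/deadband filter might look like the sketch below; the threshold, deadband, and example values are hypothetical.

```python
class DeadbandAlarm:
    """Raise above `high`, clear only below `high - deadband`, suppressing chatter
    near the threshold so the same rule does not get re-implemented in every app."""

    def __init__(self, high: float, deadband: float):
        self.high = high
        self.deadband = deadband
        self.active = False

    def update(self, value: float) -> bool:
        """Return True only on the transition into the alarm state."""
        if not self.active and value >= self.high:
            self.active = True
            return True
        if self.active and value <= self.high - self.deadband:
            self.active = False
        return False

# Example: a hypothetical bearing-temperature alarm at 85 with a 5-degree deadband.
alarm = DeadbandAlarm(high=85.0, deadband=5.0)
events = [alarm.update(v) for v in (82, 86, 84, 83, 79, 86)]
# events -> [False, True, False, False, False, True]
```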
9) Storage & Retention Tiers
Hot (edge): 7–30 days of priority topics for survivability and quick investigations.
Warm (cloud/site): 3–12 months for KPI trends and reliability analysis.
Cold (object storage): 1–7 years for compliance and long-horizon forensics.
Downsampling: Maintain 1 s → 5 s → 60 s rollups with provenance so analytics can choose appropriate granularity.
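A minimal rollup sketch using pandas (an assumption; any time-series engine works), tagging each rollup with its source resolution so provenance travels with the downsampled data:

```python
import pandas as pd

# Keep the 1 s base series and derive 5 s and 60 s means for analytics that do not
# need full resolution; the example values and timestamps are illustrative.
base = pd.DataFrame(
    {"value": [10.0, 10.2, 9.9, 10.1]},
    index=pd.date_range("2025-01-01 08:00:00", periods=4, freq="1s"),
)

rollup_5s = base.resample("5s").mean()
rollup_5s["source_resolution"] = "1s"

rollup_60s = base.resample("60s").mean()
rollup_60s["source_resolution"] = "1s"
```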
10) Security & Access Control
Segmentation: Isolate OT zones from IT; restrict conduits and east–west pathways.
Identity: Mutual TLS, certificate rotation, and per-topic ACLs with least privilege.
Secrets: Centralize secrets; avoid credentials embedded in PLC logic.
Audit: Log schema/topic changes and access; implement quarterly access reviews.
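As an illustration of least-privilege, per-topic access, read-only ACL entries can be generated from the canonical namespace rather than written by hand. The entry format below is a generic assumption; real broker ACL syntax varies by product.

```python
def least_privilege_acl(app: str, namespaces: list) -> list:
    """Generate read-only, per-namespace ACL entries for a consuming application."""
    return [{"client": app, "topic": f"{ns}/#", "access": "read"} for ns in namespaces]

# Example: the OEE dashboard may only read its own line's namespace.
acl = least_privilege_acl("oee_dashboard", ["dallas/packaging/line_03"])
```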
11) Validation Metrics & Acceptance Criteria
Performance & Reliability
Latency: p95 end-to-end < 500 ms for priority metrics.
Freshness: ≥ 98% within SLO; loss < 0.01% during soak and failover tests.
Recovery: Subscribers regain correct context within seconds via retained descriptors.
Business Outcomes
OEE trust: Dashboard vs manual audit variance ≤ 1%.
Alarm hygiene: Nuisance alarms ↓ ≥ 50% post-tuning.
Workflow closure: Alert → CMMS work order < 5 minutes with correct asset context.
Scale-out speed: Time to onboard a new line/app ↓ ≥ 50% vs baseline.
12) Common Pitfalls & Mitigations
Namespace sprawl (different teams inventing names).
Mitigation: Central stewardship, linting, and mandatory reviews for new topics.
Duplicate truths (apps redefining KPIs).
Mitigation: Publish KPIs in the model with versioned formulas; downstream apps consume rather than recompute.
Over-modeling early (blocking progress).
Mitigation: Model the vital few metrics first (e.g., counts, states, downtime reasons); iterate.
Ignoring time bases (event vs processing time).
Mitigation: Enforce NTP/PTP; carry event_time in schemas so consumers can reconcile late or reordered data.
Security drift (stale certs and wide ACLs).
Mitigation: Rotation schedules, least privilege, and quarterly access audits.
13) Implementation Checklist
Naming convention, unit registry, and state enumerations published
Edge gateway/broker live with OPC UA + MQTT Sparkplug
20–40 high-value metrics per line streaming with store-and-forward
ISA-95-aligned semantic model v1.0 approved; ownership assigned
OEE with lineage live; alarm policies tuned; nuisance ↓ ≥ 50%
CMMS integration live (alert → work order < 5 minutes)
Soak test and failover drill results documented; gaps remediated
Consumer guide and change-control workflow published
14) FAQs
Q1. Do we need to replace SCADA or historians?
No. Keep SCADA/HMI for visualization and control. The data layer coexists, normalizes signals, and routes them to downstream apps.
Q2. How do we handle legacy protocols?
Use gateways to translate to OPC UA and MQTT Sparkplug. Start with high-value signals and expand incrementally.
Q3. Where do we compute KPIs—edge or cloud?
Both. Compute near the edge for operational KPIs and low-latency views; aggregate in cloud for cross-site analytics and governance.
Q4. How do we keep models from drifting?
Version schemas, enforce reviews, and schedule quarterly drift checks with OT/IT and operations leaders.
Q5. What is the smallest viable scope?
One bottleneck line with 20–40 metrics (e.g., counts, states, downtime reasons, alarms), which is sufficient for OEE, alarm triage, and CMMS.
Book a Data-Model Workshop with Artisan Technologies.
We will map one line, publish a schema, and deliver a first OEE with lineage in 30 days—no rip-and-replace required.