
Building a Unified Machine Data Layer Without Ripping SCADA

  • michaelsedique
  • Sep 14
  • 7 min read

Executive Summary

A machine data layer turns raw signals from PLCs, robots, sensors, and SCADA into consistent, trustworthy production information that any application can reuse. Instead of replacing what already works, you insert a semantic layer—aligned to ISA-95—that standardizes names, units, states, and lineage. With a data-ops backbone such as Artisan Edge, plants can publish clean, well-modeled metrics to OEE, alarm triage, CMMS, and traceability systems without rip-and-replace projects. This guide defines the data layer, explains the problems it solves, lays out design principles, and provides an 8–10 week rollout plan, acceptance criteria, and a governance blueprint to keep the model accurate as you scale.


Table of Contents

  1. What is a Machine Data Layer?

  2. Problems It Solves (Business & Technical)

  3. Design Principles (Canonical Naming, States, Units, Lineage)

  4. Reference Architecture (Brownfield-First)

  5. KPI Lineage in Practice (OEE as an Example)

  6. Implementation Plan (8–10 Weeks)

  7. Data Quality Gates & Governance

  8. Integration Patterns (OEE, Alarms, CMMS, Traceability)

  9. Storage & Retention Tiers

  10. Security & Access Control

  11. Validation Metrics & Acceptance Criteria

  12. Common Pitfalls & Mitigations

  13. Implementation Checklist

  14. FAQs


1) What is a Machine Data Layer?

Definition. A machine data layer is a semantic and operational layer that normalizes tags into modeled assets, states, and metrics, then exposes them via stable contracts to downstream apps. It sits between connectivity (OPC UA, MQTT Sparkplug, proprietary drivers) and applications (OEE dashboards, alarm triage, CMMS, traceability, analytics).

Key attributes

  • Canonical semantics: Consistent names, units, and state enumerations across lines and sites.

  • Lineage: Transparent mapping from source tags → engineered metrics → KPIs.

  • Reusability: Multiple apps consume the same trustworthy metric definitions.

  • Change control: Versioned schemas and topic contracts to prevent breakage.

Where Artisan Edge fits: it acts as the data-ops backbone, enforcing naming conventions, modeling signals into an ISA-95-aligned schema, and routing clean, versioned metrics to OEE/CMMS/analytics—while SCADA and HMIs continue their roles for visualization and control.
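
To make the "stable contracts" idea concrete, the sketch below shows one possible shape for a published metric record, written as a Python dictionary. The field names, topic path, and values are illustrative assumptions, not a fixed Artisan Edge schema; the point is that every consumer sees the same canonical name, unit, state, and lineage pointer.

# Illustrative shape of one published metric record (field names are assumptions).
metric_record = {
    "name": "dallas/assembly/line_2/cell_4/press_1/good_count",  # canonical path
    "value": 412,
    "unit": "count",
    "state": "run",                                   # canonical state enumeration
    "event_time": "2025-09-14T13:05:22Z",
    "schema_version": "1.0.0",                        # versioned topic contract
    "lineage": {
        "source_tag": "PLC5_N7:12",                   # raw source signal
        "transform": "rename_and_unit_check@1.1.0",   # versioned transformation
        "kpi_consumers": ["oee.quality", "traceability.genealogy"],
    },
}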


2) Problems It Solves (Business & Technical)

Business

  • Multiple truths:

    • Plants report different KPI values for the same shift, undermining credibility.

  • Pilot purgatory:

    • One-off integrations do not scale because each app re-plumbs raw tags.

  • Slow changeovers:

    • Recipe parameter drift and unclear data definitions cause scrap and delays.

Technical

  • Shadow data:

    • CSV exports and spreadsheets introduce latency and errors.

  • Inconsistent semantics:

    • Tags vary in names, units, and state meanings across cells.

  • Opaque lineage:

    • KPIs lack traceability to source signals, making audits difficult.

  • Tight coupling:

    • Downstream apps depend directly on device specifics, resulting in brittle point-to-point integrations.


3) Design Principles (Canonical Naming, States, Units, Lineage)

  • 3.1 Canonical naming

  • Adopt a disciplined, human-readable pattern:

site/area/line/cell/asset/metric
  • Rules

    • Lower-case; underscores for readability; SI units; avoid vendor abbreviations.

    • Use nouns for assets and noun or verb-noun compounds for metrics (e.g., runtime_seconds, good_count, scrap_reason).

    • Keep names stable; version schemas rather than silently renaming signals.
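
As a small illustration, the lint check below encodes the naming rules above as a regular expression. The pattern reflects this guide's conventions (six lower-case segments), not an industry standard, so adjust it to your own policy.

import re
# Canonical pattern: site/area/line/cell/asset/metric, lower-case with underscores.
SEGMENT = r"[a-z][a-z0-9_]*"
CANONICAL = re.compile(rf"^{SEGMENT}(/{SEGMENT}){{5}}$")  # exactly six segments
def is_canonical(name: str) -> bool:
    """Return True if a metric path follows the canonical naming pattern."""
    return CANONICAL.match(name) is not None
assert is_canonical("dallas/assembly/line_2/cell_4/press_1/runtime_seconds")
assert not is_canonical("Dallas/Assembly/Line2/Press/RUNTIME")  # case and depth violations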

  • 3.2 State modeling

    • Define explicit enumerations for machine and line states, e.g., run, idle, fault, changeover, maintenance, setup. Map vendor-specific codes to these canonical states and record reason codes for changes (e.g., fault class, changeover type).
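
A minimal sketch of such a mapping, assuming hypothetical integer status codes from one controller (real code values vary by vendor):

# Hypothetical vendor status codes mapped to the canonical state enumeration.
VENDOR_STATE_MAP = {0: "idle", 1: "run", 2: "fault", 3: "changeover", 4: "maintenance", 5: "setup"}
def to_canonical_state(vendor_code: int) -> str:
    # Unknown codes are surfaced explicitly instead of being silently dropped,
    # so the enumeration can be extended under change control.
    return VENDOR_STATE_MAP.get(vendor_code, "unknown")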

  • 3.3 Units & calibration

    • Maintain a unit registry (SI preferred) and apply conversions at ingestion once. Record calibration metadata for sensors so KPIs can reflect metrological confidence when needed.
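
For illustration, a tiny unit registry applied once at ingestion might look like the sketch below; the factors and unit names are examples, not a complete registry.

# Minimal unit registry: linear factors plus special cases, applied once at ingestion.
LINEAR_FACTORS = {("psi", "pa"): 6894.757, ("min", "s"): 60.0, ("mm", "m"): 0.001}
def convert(value: float, src: str, dst: str) -> float:
    if src == dst:
        return value
    if (src, dst) == ("degf", "degc"):            # temperature needs an offset, not a factor
        return (value - 32.0) * 5.0 / 9.0
    try:
        return value * LINEAR_FACTORS[(src, dst)]
    except KeyError:
        raise ValueError(f"no conversion registered for {src} -> {dst}")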

  • 3.4 Data lineage

    • Store lineage alongside metrics: source tag/topic → transformation → version → consumer KPI. Lineage enables audits, reconciles differences, and builds trust with leadership.
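
One possible shape for a lineage record stored next to each engineered metric is sketched below; the field names are hypothetical. An auditor can walk from a KPI back to the raw tag and the exact transformation version that produced it.

from dataclasses import dataclass, field
from typing import List
@dataclass
class LineageRecord:
    source_topic: str                              # raw tag or MQTT topic
    transform: str                                 # normalization/derivation step
    transform_version: str                         # versioned, never silently changed
    consumer_kpis: List[str] = field(default_factory=list)
runtime_lineage = LineageRecord(
    source_topic="plc/press_1/status_word",
    transform="state_duration(run)",
    transform_version="1.2.0",
    consumer_kpis=["oee.availability"],
)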


4) Reference Architecture (Brownfield-First)

Each layer of the architecture is listed below with its purpose, representative examples, and notes.

  • Connectivity: access raw signals. Examples: OPC UA browse, MQTT Sparkplug topics, serial/fieldbus via gateway. Note: keep control-adjacent actions local to the cell.

  • Data-Ops Backbone: ingest, buffer, and normalize. Examples: Artisan Edge, store-and-forward, schema enforcement. Note: applies naming, units, states, and validation.

  • Semantic Model: ISA-95 equipment/material/process. Examples: Site/Area/Line/Cell/Asset hierarchies, material genealogy. Note: the source of truth; versioned and governed.

  • Applications: OEE, alarms, CMMS, traceability, analytics. Examples: dashboards, APIs, work-order creation, genealogy services. Note: consume modeled metrics with lineage.

  • Security & Governance: segmentation, identity, review. Examples: mutual TLS, ACLs, RBAC, schema/topic change control. Note: treat topics and schemas as productized APIs.

Protocol stance: keep OPC UA near equipment for structured access; publish normalized metrics over MQTT Sparkplug for decoupled distribution across apps and sites.


5) KPI Lineage in Practice (OEE as an Example)

  • Availability

runtime / planned_time

    • Excluding canonical downtime states (e.g., planned maintenance, meetings, breaks).

    • Lineage: runtime derived from canonical states; planned_time from the schedule.

  • Performance

actual_output / ideal_output

    • With ideal_output computed as planned_time ÷ ideal cycle time for the active SKU.

    • Lineage: actual counts from modeled metrics; ideal from the product catalog.

  • Quality

good_output / total_output

    • With scrap reasons captured as canonical codes.

    • Lineage: good/total counts from modeled counters; scrap reasons from the event stream.

  • Governance tip: 

    • Publish OEE as derived metrics with versioned formulas so downstream tools cannot redefine KPIs ad hoc.
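
To illustrate the governance tip, the sketch below computes the three OEE factors from hypothetical modeled inputs and stamps a formula version on the result, so downstream tools consume the derived metrics rather than recomputing them.

FORMULA_VERSION = "oee-1.0.0"   # bump under change control, never silently
def oee(runtime_s, planned_s, actual_count, ideal_count, good_count, total_count) -> dict:
    availability = runtime_s / planned_s if planned_s else 0.0
    performance = actual_count / ideal_count if ideal_count else 0.0
    quality = good_count / total_count if total_count else 0.0
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
        "formula_version": FORMULA_VERSION,   # consumers cite, not redefine, the formula
    }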


6) Implementation Plan (8–10 Weeks)

Phase 1 (Weeks 1–2) — Discovery & Standards

  • Inventory lines, controllers, robots, sensors, protocols, and tag counts.

  • Draft naming conventions, state enumerations, unit registry, and topic patterns.

  • Choose priority metrics (e.g., counts, states, downtime reasons, defects) per line.

  • Establish owners for schemas and topics; set review cadence.

Phase 2 (Weeks 3–4) — Minimal Viable Connectivity

  • Stand up an edge gateway/broker with OPC UA and MQTT Sparkplug.

  • Publish 20–40 high-value metrics per line under disciplined namespaces.

  • Enable store-and-forward; verify lossless behavior during planned disconnects.

  • Enforce time synchronization (NTP/PTP) and record event time vs processing time.
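
To make the publishing step concrete, here is a minimal store-and-forward sketch. It uses a JSON payload for readability (a Sparkplug B deployment would carry the same fields as protobuf-encoded metrics), and the topic, field names, and publish callable are assumptions rather than a specific client API.

import json, queue, time
# Metrics are queued locally and drained when the broker connection is available.
buffer: "queue.Queue[tuple[str, bytes]]" = queue.Queue()
def enqueue_metric(name: str, value, unit: str) -> None:
    payload = json.dumps({
        "value": value,
        "unit": unit,
        "event_time": time.time(),     # when it happened at the machine
        "schema_version": "1.0.0",
    }).encode()
    buffer.put((name, payload))
def drain(publish) -> None:
    # publish(topic, payload) is whatever MQTT client call your gateway uses.
    while not buffer.empty():
        topic, payload = buffer.get()
        publish(topic, payload)
enqueue_metric("dallas/assembly/line_2/cell_4/press_1/good_count", 412, "count")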

Phase 3 (Weeks 5–6) — Semantic Modeling & First Consumers

  • Map tags into an ISA-95-aligned model (equipment/material/process).

  • Implement OEE with visible data lineage.

  • Wire alarm triage with hysteresis/deadbands; target nuisance reduction ≥50%.

  • Integrate CMMS so alerts can auto-create work orders with asset context.

Phase 4 (Weeks 7–8) — Data Quality Gates & Acceptance

  • Run soak tests (latency, freshness, loss) and a failover drill (broker restart, WAN loss).

  • Audit KPI variance vs manual checks; target ≤1%.

  • Remediate gaps; lock v1.0 of the model and topic contracts.

Phase 5 (Weeks 9–10) — Replicate & Harden

  • Templatize models for similar lines; add CI linting for topics and schemas.

  • Document ownership, versioning, and change control.

  • Publish a consumer guide describing contracts and lineage.


7) Data Quality Gates & Governance

Gates before production

  • Freshness: ≥ 98% of priority topics within SLO.

  • Completeness: No required metrics missing for operational KPIs.

  • Validity: Units, ranges, and enumerations pass validation checks.

  • Lineage: Every KPI links to versioned formulas and source metrics.
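
As one concrete example, the freshness gate could be automated roughly as below, assuming a hypothetical 5-second SLO per priority topic and a map of last-seen timestamps.

import time
def freshness_ratio(last_seen: dict, slo_seconds: float = 5.0) -> float:
    # last_seen maps topic name -> Unix timestamp of its most recent sample.
    now = time.time()
    fresh = sum(1 for t in last_seen.values() if now - t <= slo_seconds)
    return fresh / len(last_seen) if last_seen else 0.0
def freshness_gate_passes(last_seen: dict) -> bool:
    return freshness_ratio(last_seen) >= 0.98   # the 98% gate from above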

Governance

  • Treat topics/schemas as APIs with owners and versioning.

  • Require pull-request reviews for schema changes; maintain a changelog.

  • Quarterly model drift reviews with OT/IT, reliability, and operations.


8) Integration Patterns (OEE, Alarms, CMMS, Traceability)

  • OEE dashboards: Subscribe to modeled counts/states; compute OEE with published formulas to avoid divergence.

  • Alarm triage: Consume canonical state transitions and event streams; apply hysteresis and deadbands centrally rather than duplicating rules in each app (see the sketch after this list).

  • CMMS integration: An event pipeline raises work orders with device context, recent metrics, and recommended priority.

  • Traceability: Serialize materials and maintain genealogy edges from modeled events, e.g., station start/stop, parameter snapshots, inspection results.
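
As referenced in the alarm-triage item, the sketch below shows one way to centralize hysteresis: an alarm asserts above a high threshold and clears only below a lower one, which suppresses chatter near the limit. The class and thresholds are illustrative.

class HysteresisAlarm:
    def __init__(self, assert_above: float, clear_below: float):
        self.assert_above = assert_above
        self.clear_below = clear_below
        self.active = False
    def update(self, value: float) -> bool:
        """Return True only on the transition into the alarm state."""
        if not self.active and value > self.assert_above:
            self.active = True
            return True
        if self.active and value < self.clear_below:
            self.active = False
        return False
temp_alarm = HysteresisAlarm(assert_above=85.0, clear_below=80.0)  # units per your model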


9) Storage & Retention Tiers

  • Hot (edge): 7–30 days of priority topics for survivability and quick investigations.

  • Warm (cloud/site): 3–12 months for KPI trends and reliability analysis.

  • Cold (object storage): 1–7 years for compliance and long-horizon forensics.

  • Downsampling: Maintain 1 s → 5 s → 60 s rollups with provenance so analytics can choose appropriate granularity.
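
For illustration, a small rollup helper that tags each aggregate with its provenance might look like the sketch below; it assumes a pandas DataFrame indexed by timestamp with a value column, and the column and attribute names are assumptions.

import pandas as pd
def rollup(df: pd.DataFrame, rule: str) -> pd.DataFrame:
    # df must be indexed by timestamp and contain a "value" column.
    out = df["value"].resample(rule).agg(["mean", "min", "max", "count"])
    out["source_granularity"] = df.attrs.get("granularity", "raw")  # provenance
    out["rollup_rule"] = rule
    return out
# Chain the tiers, e.g. rollup(df_1s, "5s") and rollup(df_1s, "60s").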


10) Security & Access Control

  • Segmentation: Isolate OT zones from IT; restrict conduits and east–west pathways.

  • Identity: Mutual TLS, certificate rotation, and per-topic ACLs with least privilege.

  • Secrets: Centralize secrets; avoid credentials embedded in PLC logic.

  • Audit: Log schema/topic changes and access; implement quarterly access reviews.
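
As a small illustration of the identity controls, the sketch below builds a client-side mutual-TLS context in Python; the file paths are placeholders, and the same context can back an MQTT or other broker client.

import ssl
def build_mtls_context(ca_path: str, cert_path: str, key_path: str) -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=ca_path)                   # trust the plant CA only
    ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)   # present the client identity
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx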


11) Validation Metrics & Acceptance Criteria

Performance & Reliability

  • Latency: p95 end-to-end < 500 ms for priority metrics.

  • Freshness: ≥ 98% within SLO; loss < 0.01% during soak and failover tests.

  • Recovery: Subscribers regain correct context within seconds via retained descriptors.

Business Outcomes

  • OEE trust: Dashboard vs manual audit variance ≤ 1%.

  • Alarm hygiene: Nuisance alarms ↓ ≥ 50% post-tuning.

  • Workflow closure: Alert → CMMS work order < 5 minutes with correct asset context.

  • Scale-out speed: Time to onboard a new line/app ↓ ≥ 50% vs baseline.


12) Common Pitfalls & Mitigations

  • Namespace sprawl (different teams inventing names).

    • Mitigation: Central stewardship, linting, and mandatory reviews for new topics.

  • Duplicate truths (apps redefining KPIs).

    • Mitigation: Publish KPIs in the model with versioned formulas; downstream apps consume rather than recompute.

  • Over-modeling early (blocking progress).

    • Mitigation: Model the vital few metrics first (e.g., counts, states, downtime reasons); iterate.

  • Ignoring time bases (event vs processing time).

    • Mitigation: Enforce NTP/PTP; carry event_time in schemas so consumers can reconcile out-of-order arrivals.

  • Security drift (stale certs and wide ACLs).

    • Mitigation: Rotation schedules, least privilege, and quarterly access audits.


13) Implementation Checklist

  •  Naming convention, unit registry, and state enumerations published

  •  Edge gateway/broker live with OPC UA + MQTT Sparkplug

  •  20–40 high-value metrics per line streaming with store-and-forward

  •  ISA-95-aligned semantic model v1.0 approved; ownership assigned

  •  OEE with lineage live; alarm policies tuned; nuisance ↓ ≥ 50%

  •  CMMS integration live (alert → work order < 5 minutes)

  •  Soak test and failover drill results documented; gaps remediated

  •  Consumer guide and change-control workflow published


14) FAQs

  • Q1. Do we need to replace SCADA or historians?

    • No. Keep SCADA/HMI for visualization and control. The data layer coexists, normalizes signals, and routes them to downstream apps.

  • Q2. How do we handle legacy protocols?

    • Use gateways to translate to OPC UA and MQTT Sparkplug. Start with high-value signals and expand incrementally.

  • Q3. Where do we compute KPIs—edge or cloud?

    • Both. Compute near the edge for operational KPIs and low-latency views; aggregate in cloud for cross-site analytics and governance.

  • Q4. How do we keep models from drifting?

    • Version schemas, enforce reviews, and schedule quarterly drift checks with OT/IT and operations leaders.

  • Q5. What is the smallest viable scope?

    • One bottleneck line with 20–40 metrics (e.g., counts, states, downtime reasons, alarms), which is sufficient for OEE, alarm triage, and CMMS.


Book a Data-Model Workshop with Artisan Technologies. 

We will map one line, publish a schema, and deliver a first OEE with lineage in 30 days—no rip-and-replace required.

