mqtt_bus/addapters.md at main · bogdan/mqtt_bus

# Adapter

## Definition

An **adapter** is a software component that translates data between an external system or device and the internal MQTT semantic bus used in the infrastructure.

The primary role of the adapter is to **normalize heterogeneous protocols, topic structures, and payload formats** into the canonical structure used by the system.

Adapters do not implement business logic, automation rules, aggregation, or storage. Their responsibility is strictly limited to protocol translation and semantic normalization.

Shared payload, metadata, quality, and operational namespace rules are defined in `mqtt_contract.md` and `sys_bus.md`.

Practical Node-RED conventions and worked examples are documented in `adapter_implementation_examples.md`.

---

## Purpose

In a heterogeneous environment (Zigbee, Tasmota, SNMP, Modbus, MikroTik APIs, custom firmware, etc.), devices publish data using incompatible conventions:

- different topic hierarchies
- different payload formats (numeric, text, JSON)
- inconsistent naming
- missing timestamps
- device‑specific semantics

Adapters isolate these differences and expose a **stable internal MQTT bus contract**.

---

## Architectural Position

Adapters operate at the ingress boundary of the system.

Pipeline:

Device / External System
    ↓
Vendor / Protocol Topics
    ↓
Adapter
    ↓
Canonical MQTT Bus
    ↓
Consumers (HomeKit, historian, automation, analytics)

---

## Responsibilities

An adapter may perform the following transformations:

1. Topic translation to bus contracts

Example:

zigbee2mqtt/bedroom_sensor/temperature
        ↓
vad/home/bedroom/temperature/bedroom-sensor/value

2. Payload normalization

Examples:

"23.4" → 23.4

{"temperature":23.4}
        ↓
23.4

3. Timestamp handling

If the device provides an observation timestamp, the adapter must preserve it.

If the device does not provide timestamps, the adapter may attach ingestion time.

4. Unit normalization

Example:

°F → °C

5. Identity normalization

Adapters map vendor identifiers to canonical IDs used by bus contracts.

6. Stream mapping

Adapters must route data to valid streams (`value`, `last`, `set`, `meta`, `availability`) according to bus rules. Legacy `state` and `event` topics remain compatibility-only during migration and SHOULD NOT be introduced by new adapters.

7. Historian policy projection

Adapters should publish enough retained `meta` for historian workers to ingest canonical topics without knowing vendor semantics.
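
Responsibilities 2 and 4 (payload and unit normalization) can be sketched in plain JavaScript, as it might appear inside a Node-RED `function` node. This is a minimal illustration under stated assumptions: the helper names `toNumber`, `extractField`, and `fahrenheitToCelsius` are hypothetical and not part of the bus contract, and rounding to one decimal place is an arbitrary choice.

```javascript
// Hypothetical helpers; names and rounding policy are illustrative only.

// Convert numeric strings such as "23.4" to numbers; return null when not numeric.
function toNumber(raw) {
  if (typeof raw === "number") return Number.isFinite(raw) ? raw : null;
  if (typeof raw !== "string" || raw.trim() === "") return null;
  const n = Number(raw);
  return Number.isFinite(n) ? n : null;
}

// Pull one numeric field out of a JSON payload (JSON string or parsed object).
function extractField(payload, field) {
  let obj = payload;
  if (typeof payload === "string") {
    try {
      obj = JSON.parse(payload);
    } catch (e) {
      return null; // malformed input; the caller decides how to route it
    }
  }
  if (obj === null || typeof obj !== "object") return null;
  return toNumber(obj[field]);
}

// Convert degrees Fahrenheit to degrees Celsius, rounded to one decimal place.
function fahrenheitToCelsius(f) {
  return Math.round(((f - 32) * 5 / 9) * 10) / 10;
}
```

Returning `null` rather than throwing keeps the decision about malformed input with the caller, which can route it to the operational topics instead of the semantic bus.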

---

## Internal Models vs MQTT Output

Adapters MAY construct richer internal objects during normalization.

Example internal normalization shape:

```json
{
  "sourcePayload": {
    "temperature": 23.6,
    "battery": 91
  },
  "normalizedLocation": "bedroom",
  "capability": "temperature",
  "deviceId": "bedroom-sensor",
  "stream": "value"
}
```

Rule:

- these structures MUST be ephemeral and MUST remain local to the normalization stage
- they MUST NOT be attached to the `msg` object that continues through the hot-path pipeline
- once normalization is complete, the adapter MUST reduce the message to the canonical MQTT-ready form and publish it as early as possible
- consumers on the semantic bus MUST NOT need to understand adapter-specific fields

At the publish boundary, the message SHOULD contain only:

- `msg.topic`
- `msg.payload`
- `msg.qos` when QoS is not configured statically on the MQTT node
- `msg.retain` when retain is not configured statically on the MQTT node

Adapters MUST NOT publish rich internal envelopes such as:

```json
{
  "topic": "vad/home/bedroom/temperature/bedroom-sensor/value",
  "payload": 23.6,
  "mapping": {
    "source_field": "temperature"
  },
  "normalizedBus": {
    "bus": "home",
    "stream": "value"
  },
  "sourcePayload": {
    "temperature": 23.6,
    "battery": 91
  },
  "internalContext": {
    "location": "bedroom"
  }
}
```

Those fields may exist inside adapter logic, but MUST be removed before publish.

---

## Fan-Out Pattern

Many ingress protocols emit a single inbound payload containing multiple metrics.

Examples:

- Zigbee2MQTT
- Modbus pollers
- SNMP collectors

Adapters SHOULD normalize those payloads using fan-out:

1 inbound message
        ↓
multiple canonical MQTT messages

Example inbound payload:

```json
{
  "temperature": 23.4,
  "humidity": 41
}
```

Canonical adapter output:

- `vad/home/bedroom/temperature/bedroom-sensor/value` -> `23.4`
- `vad/home/bedroom/humidity/bedroom-sensor/value` -> `41`

Rule:

- each emitted message MUST be independent and minimal
- fan-out SHOULD produce canonical MQTT-ready outputs directly rather than forwarding one rich `msg` object through multiple downstream stages
- metadata for each emitted metric belongs on the corresponding retained `meta` topic
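
The fan-out rule can be sketched as a small helper. This is a minimal illustration, not the project's implementation: `fanOut` and its parameters are hypothetical, and it assumes each source field name is already a valid capability name, which a real adapter would resolve through its mapping data.

```javascript
// Hypothetical helper: split one inbound payload into independent, minimal messages.
// `prefix` is "<site>/home/<location>" and `deviceId` is the canonical device id.
function fanOut(prefix, deviceId, payload) {
  const out = [];
  for (const [field, value] of Object.entries(payload)) {
    // Only scalar values are emitted on `value` streams in this sketch.
    const isScalar =
      typeof value === "number" || typeof value === "boolean" || typeof value === "string";
    if (!isScalar) continue;
    // Each message carries only topic and payload, nothing from the source envelope.
    out.push({ topic: `${prefix}/${field}/${deviceId}/value`, payload: value });
  }
  return out;
}

const messages = fanOut("vad/home/bedroom", "bedroom-sensor", {
  temperature: 23.4,
  humidity: 41,
});
```

In a Node-RED `function` node, returning `[messages]` (the array wrapped in an outer array) sends each element as its own message on the first output.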

---

## Multi‑Bus Capability Projection

Some physical devices expose multiple capabilities that belong to **different semantic domains**. In such cases an adapter may project different aspects of the same device onto different buses.

A common example is a **smart socket (smart plug)** which provides both:

- a controllable power switch
- energy measurement (power or accumulated energy)

These capabilities belong to different semantic models:

- switching is part of the **home automation model**
- energy measurement is part of the **energy telemetry model**

An adapter may therefore publish different streams derived from the same device to different buses.

Example:

Home bus (control semantics):

vad/home/living-room/power/tv/value
vad/home/living-room/power/tv/set

Energy bus (load telemetry):

vad/energy/load/living-room-entertainment/active_power/value
vad/energy/load/living-room-entertainment/energy_total/value

In this situation the adapter performs a **capability projection**, exposing each capability in the semantic domain where it belongs.

This approach prevents the spatial model of the home (`home` bus) from becoming coupled to the electrical topology represented by the `energy` bus, while still allowing a single physical device to participate in both domains.

Rule:

- projection across buses is allowed
- semantic duplication inside one bus should be avoided
- the same source field must not be published to multiple semantic meanings without an explicit reason
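
A capability projection for the smart plug example might look like the sketch below. Several details are assumptions made for illustration: the function name `projectSmartPlug`, the `ENERGY_METRICS` table (derived from the example topics above), the Zigbee2MQTT-style source fields `state`, `power`, and `energy`, and the choice to publish the switch state as a boolean.

```javascript
// Assumed source-field to energy-metric names, taken from the example topics above.
const ENERGY_METRICS = { power: "active_power", energy: "energy_total" };

// Hypothetical projection of one smart-plug payload onto two buses.
// `home` and `loadId` carry identifiers a real adapter would take from its mapping data.
function projectSmartPlug(site, home, loadId, payload) {
  const out = [];
  // Control semantics go to the home bus.
  if (payload.state !== undefined) {
    out.push({
      topic: `${site}/home/${home.location}/power/${home.deviceId}/value`,
      payload: payload.state === "ON", // assumption: boolean switch state
    });
  }
  // Load telemetry goes to the energy bus; each source field is used exactly once.
  for (const [field, metric] of Object.entries(ENERGY_METRICS)) {
    if (typeof payload[field] === "number") {
      out.push({
        topic: `${site}/energy/load/${loadId}/${metric}/value`,
        payload: payload[field],
      });
    }
  }
  return out;
}

const projected = projectSmartPlug(
  "vad",
  { location: "living-room", deviceId: "tv" },
  "living-room-entertainment",
  { state: "ON", power: 126.8, energy: 12.5 }
);
```

Each source field lands on exactly one bus, which follows the rule against publishing the same field under multiple semantic meanings.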

---

## Explicit Non‑Responsibilities

Adapters must NOT:

- implement HomeKit logic
- implement automation rules
- aggregate multiple sensors
- perform anomaly detection
- store historical data

These functions belong to other components in the system.

---

## Node-RED Execution Guidelines

Because adapters are implemented in Node-RED, the following constraints apply:

- Node-RED hot paths are not optimized for large per-message object graphs
- prefer deterministic `change`/`switch` mapping before custom `function` logic
- centralize topic and metric mapping in reusable subflows
- minimize per-message allocations on hot paths
- avoid large nested objects and rich per-message envelopes
- avoid heavy per-message JSON transforms on high-rate telemetry
- use scalar payloads on hot `value` streams and publish metadata on retained `meta`
- use retained `last` for cold-start samples that need timestamp/freshness evaluation
- keep live `value` streams lightweight and deduplicated where appropriate

---

## Consumer Adapters and Dynamic Subscriptions

Not all adapters are ingress adapters.

In practice, the system also includes consumer adapters that subscribe to canonical bus topics and project them into a downstream model such as HomeKit.

Examples:

- Device -> Protocol Adapter -> MQTT Bus
- MQTT Bus -> HomeKit Adapter -> HomeKit

Consumer adapters SHOULD follow these rules:

- consume only canonical bus topics
- keep consumer-specific logic out of the ingress adapter
- use retained `last` for bootstrap when startup state matters
- continue on live `value` after bootstrap
- unsubscribe from `last` only after bootstrap completeness is explicitly satisfied

Practical recommendation:

- if a Node-RED adapter node has a dedicated output for controlling a dynamic `mqtt in`, keep that control output as the last output when possible

This keeps semantic outputs stable and reduces rewiring churn.
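
The bootstrap rules above can be sketched as a small tracker that emits an `unsubscribe` control message only once every expected capability has arrived on `last`. This is an illustration under assumptions: `createBootstrap` is a hypothetical name, "bootstrap completeness" is defined here as "all expected capabilities seen", and topics are assumed to follow the home bus grammar `<site>/home/<location>/<capability>/<device_id>/<stream>`.

```javascript
// Hypothetical bootstrap tracker for a consumer adapter.
// `expected` lists the capabilities that must arrive on `last` before bootstrap is complete.
// `lastFilter` is the topic filter that was used to subscribe to `last`.
function createBootstrap(expected, lastFilter) {
  const pending = new Set(expected);
  let done = false;
  return function onMessage(topic) {
    const parts = topic.split("/");
    const stream = parts[parts.length - 1];
    const capability = parts[3]; // position in the home bus grammar
    // Live `value` traffic and anything after completion needs no control action.
    if (done || stream !== "last") return null;
    pending.delete(capability);
    if (pending.size > 0) return null;
    done = true;
    // Control message for the dynamic `mqtt in` node.
    return { action: "unsubscribe", topic: lastFilter };
  };
}
```

The returned control message would be sent on the adapter node's dedicated control output, while semantic outputs continue to carry live `value` updates.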

---

## Dedicated MQTT Session Rule for Retained Bootstrap

If a consumer adapter depends on retained `last` for deterministic cold start, it SHOULD use a dedicated MQTT client session.

In Node-RED this means:

- create a dedicated `mqtt-broker` config node for that dynamic `mqtt in`
- do not share the same broker config with unrelated static or dynamic subscribers when retained bootstrap must be isolated

Reason:

- broker config nodes represent shared MQTT client sessions
- retained replay behavior is coupled to subscribe operations inside that session
- shared sessions make lifecycle-sensitive bootstrap behavior difficult to reason about

Observed failure mode:

- a static subscriber receives retained `last`
- a dynamic consumer using the same broker config subscribes later
- the consumer does not observe retained bootstrap deterministically
- later live updates republish `last`, masking the real issue

Rule:

- live-only telemetry consumers MAY share a broker config
- consumers that rely on retained bootstrap SHOULD NOT

---

## Dynamic `mqtt in` Control Recommendations

For Node-RED dynamic `mqtt in`, adapter control messages SHOULD remain simple and explicit.

Preferred pattern:

- one `subscribe` message per topic
- one `unsubscribe` message per topic

Example:

```json
{
  "action": "subscribe",
  "topic": "vad/home/balcon/+/south/last",
  "qos": 2,
  "rh": 0,
  "rap": true
}
```

```json
{
  "action": "subscribe",
  "topic": "vad/home/balcon/+/south/value",
  "qos": 2,
  "rh": 0,
  "rap": true
}
```

Recommendation:

- prefer separate control messages over multi-topic control payloads unless the exact runtime behavior has been verified on the target Node-RED version
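
A helper that emits one control message per topic might look like the following sketch. The function name `buildControlMessages` is hypothetical, and the subscribe options simply mirror the JSON examples above; whether the target Node-RED version honors them in this shape should be verified as recommended.

```javascript
// Hypothetical helper: build one control message per topic filter.
// Option values (qos, rh, rap) are copied from the JSON examples above.
function buildControlMessages(action, topics) {
  return topics.map((topic) =>
    action === "subscribe"
      ? { action, topic, qos: 2, rh: 0, rap: true }
      : { action, topic }
  );
}

const controls = buildControlMessages("subscribe", [
  "vad/home/balcon/+/south/last",
  "vad/home/balcon/+/south/value",
]);
```
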

See `adapter_implementation_examples.md` for the full flow pattern and debugging guidance.

Additional hot-path guidelines for Node-RED adapters:

- allow `last` to update whenever the latest observation timestamp changes, even if the scalar value is unchanged
- publish MQTT-ready messages as early as possible once normalization is complete
- keep temporary normalization structures in local variables or node-local context, not on forwarded `msg` objects
- delete temporary normalization fields before the MQTT publish node
- keep MQTT subscriptions narrow (avoid global `#` on hot pipelines)
- include error routing for malformed input and unknown mapping cases

Hot-path rule:

- the semantic bus is not a debugging channel
- adapters should emit MQTT-ready messages as the semantic boundary
- adapter internals belong in transient Node-RED state or under operational `sys` topics, not on bus payloads

Recommended final publish stage:

```javascript
return {
  topic: normalizedTopic,
  payload: normalizedValue
};
```

If an existing `msg` object must be reused:

```javascript
msg.topic = normalizedTopic;
msg.payload = normalizedValue;

delete msg.internal;
delete msg.internalContext;
delete msg.mapping;
delete msg.normalizedBus;
delete msg.sourcePayload;

return msg;
```

Operational requirement:

- malformed or unmappable messages SHOULD be published to `<site>/sys/adapter/<adapter_id>/dlq`
- adapter faults SHOULD be published to `<site>/sys/adapter/<adapter_id>/error`
- adapter liveness SHOULD be exposed on `<site>/sys/adapter/<adapter_id>/availability`
- adapter diagnostics and counters SHOULD be published to `<site>/sys/adapter/<adapter_id>/stats`

See `sys_bus.md` for the shared contract of these operational topics.

Operational recommendation:

- one ingress flow per protocol
- one shared normalization subflow per bus contract
- one egress flow for publish, with QoS/retain set per stream policy


## Mapping Registry

Adapters should be driven by declarative mapping data wherever possible.

Recommended mapping fields:

- `source_system`
- `source_topic_match`
- `source_field`
- `target_bus`
- `target_location` or `target_entity_id`
- `target_capability` or `target_metric`
- `target_device_id`
- `stream`
- `payload_profile`
- `unit`
- `historian_enabled`
- `historian_mode`

Example mapping entry for a Zigbee room sensor:

```json
{
  "source_system": "zigbee2mqtt",
  "source_topic_match": "zigbee2mqtt/bedroom_sensor",
  "source_field": "temperature",
  "target_bus": "home",
  "target_location": "bedroom",
  "target_capability": "temperature",
  "target_device_id": "bedroom-sensor",
  "stream": "value",
  "payload_profile": "scalar",
  "unit": "C",
  "historian_enabled": true,
  "historian_mode": "sample"
}
```

This keeps normalization logic deterministic and reviewable.
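
To show how such an entry could drive translation, here is a minimal sketch. It assumes `source_topic_match` is compared by exact equality and that the inbound payload is an already-parsed object; `applyMapping` is a hypothetical name, and the registry format in the project may support richer matching.

```javascript
// Registry entry copied from the example above (historian fields omitted for brevity).
const mapping = {
  source_system: "zigbee2mqtt",
  source_topic_match: "zigbee2mqtt/bedroom_sensor",
  source_field: "temperature",
  target_bus: "home",
  target_location: "bedroom",
  target_capability: "temperature",
  target_device_id: "bedroom-sensor",
  stream: "value",
};

// Hypothetical: apply one mapping entry to an inbound message.
// Returns null when the entry does not apply, so the caller can try another entry
// or route the message to the operational topics.
function applyMapping(site, entry, msg) {
  if (msg.topic !== entry.source_topic_match) return null;
  if (msg.payload === null || typeof msg.payload !== "object") return null;
  const value = msg.payload[entry.source_field];
  if (value === undefined) return null;
  const topic = [
    site,
    entry.target_bus,
    entry.target_location,
    entry.target_capability,
    entry.target_device_id,
    entry.stream,
  ].join("/");
  return { topic, payload: value };
}

const mapped = applyMapping("vad", mapping, {
  topic: "zigbee2mqtt/bedroom_sensor",
  payload: { temperature: 23.4, humidity: 41 },
});
```
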

---

## Types of Adapters

Adapters are typically organized by protocol or subsystem.

Examples:

zigbee_adapter

Transforms Zigbee2MQTT topics into the canonical bus structure.

network_adapter

Transforms SNMP / router telemetry into the network bus.

energy_adapter

Normalizes inverter, meter, and battery telemetry.

vehicle_adapter

Normalizes EV charger and vehicle telemetry.


## Zigbee2MQTT Adapter Guidance

Zigbee2MQTT is expected to be one of the first major ingress sources for the new broker, so its projection rules should be explicit.

Source shape:

- base telemetry topic: `zigbee2mqtt/<friendly_name>`
- availability topic: `zigbee2mqtt/<friendly_name>/availability`
- payload: JSON object with one or more capability fields

Normalization rules:

- one inbound Z2M JSON payload may fan out into multiple canonical MQTT publications
- friendly names SHOULD follow the deterministic naming convention defined below so canonical IDs and locations can be derived directly from the topic path
- vendor names and IEEE addresses MUST be preserved in `meta.source_ref`, not exposed in canonical topic paths
- measurements, booleans, enums, and transition-like signals map to live `value`
- adapters SHOULD use `meta.historian.mode` to label whether a `value` stream is semantically a `sample`, `state`, or `event`
- adapters SHOULD deduplicate hot `value` publications when the semantic value did not change
- adapters SHOULD update retained `last` whenever the latest observed timestamp changes, even if the scalar value is unchanged
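
The last two rules (deduplicate `value`, refresh `last` on timestamp change) can be sketched together. This is an illustration only: `publishPlan` is a hypothetical name, the in-memory `Map` stands in for Node-RED node context, timestamps are assumed to be epoch milliseconds, and the `{ value, ts }` shape of the `last` payload is an assumption; the real shape is defined in `mqtt_contract.md`.

```javascript
// Hypothetical per-topic cache; in Node-RED this could live in node context.
const cache = new Map();

// Return the messages to publish for one observation.
// `baseTopic` is the canonical topic without the trailing stream segment.
function publishPlan(baseTopic, value, ts) {
  const prev = cache.get(baseTopic);
  const out = [];
  // Live `value` is published only when the semantic value changed.
  if (!prev || prev.value !== value) {
    out.push({ topic: `${baseTopic}/value`, payload: value });
  }
  // Retained `last` is refreshed whenever the observation timestamp changed.
  if (!prev || prev.ts !== ts) {
    out.push({ topic: `${baseTopic}/last`, payload: { value, ts }, retain: true });
  }
  cache.set(baseTopic, { value, ts });
  return out;
}
```

This is one of the few places where an adapter holds state, and the state is limited to the most recent value and timestamp per topic.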

Recommended initial Z2M field mapping:

- `temperature` -> `home/.../temperature/.../value`
- `humidity` -> `home/.../humidity/.../value`
- `pressure` -> `home/.../pressure/.../value`
- `illuminance` -> `home/.../illuminance/.../value`
- `contact` -> `home/.../contact/.../value`
- `occupancy` from PIR devices -> `home/.../motion/.../value`
- `presence` from mmWave devices with `fading_time=0` -> `home/.../motion/.../value`
- `battery` -> `home/.../battery/.../value`
- `state` on smart plugs or switches -> `home/.../power/.../value`
- `power`, `energy`, `voltage`, `current` on smart plugs -> `energy/.../.../.../value`
- `action` -> `home/.../button/.../value`

mmWave presence handling:

- if a mmWave device emits `presence` while `fading_time=0`, adapters SHOULD treat it as raw motion detection and publish `motion`, not `presence`
- device-level `presence` SHOULD usually be derived above the adapter layer through fusion or higher-level logic
- adapters MAY publish `presence/.../value` directly only when the device is intentionally configured to expose held presence semantics, for example with non-zero `fading_time`

Fields that SHOULD NOT go to `home` by default:

- `linkquality`
- transport diagnostics
- adapter-local counters

Those belong on `sys` or a future `network` bus.
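
The field mapping and the mmWave rule can be expressed as a small classifier. The sketch below is an assumption-laden illustration: `classifyField` is a hypothetical name, `fadingTime` is assumed to be known to the adapter from device configuration, and the table ignores the device-class conditions (for example, `state` only maps to `power` on smart plugs or switches).

```javascript
// Source field -> home capability, following the recommended initial mapping above.
const HOME_CAPABILITY = new Map([
  ["temperature", "temperature"],
  ["humidity", "humidity"],
  ["pressure", "pressure"],
  ["illuminance", "illuminance"],
  ["contact", "contact"],
  ["occupancy", "motion"],
  ["battery", "battery"],
  ["state", "power"],
  ["action", "button"],
]);
const ENERGY_FIELDS = new Set(["power", "energy", "voltage", "current"]);
const NON_HOME_FIELDS = new Set(["linkquality"]);

// Hypothetical classifier. `fadingTime` is the device's configured fading_time, if known.
// Returns null for fields that have no mapping.
function classifyField(field, fadingTime) {
  if (NON_HOME_FIELDS.has(field)) return { bus: "sys" };
  if (ENERGY_FIELDS.has(field)) return { bus: "energy" };
  if (field === "presence") {
    if (fadingTime === 0) return { bus: "home", capability: "motion" };
    if (typeof fadingTime === "number" && fadingTime > 0) {
      return { bus: "home", capability: "presence" };
    }
    return null; // fading_time unknown: leave unmapped rather than guess
  }
  const capability = HOME_CAPABILITY.get(field);
  return capability ? { bus: "home", capability } : null;
}
```
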

## Zigbee2MQTT Topic Structure for Deterministic Adapter Translation

In Zigbee2MQTT the primary telemetry topic structure is:

`zigbee2mqtt/<friendly_name>`

The MQTT prefix is controlled by the Zigbee2MQTT `base_topic` configuration parameter.

Default:

`zigbee2mqtt`

Zigbee2MQTT allows the `/` character inside `friendly_name`, which means the MQTT topic hierarchy can be controlled by the device name itself.

Example:

- friendly name: `kitchen/floor_light`
- resulting topic: `zigbee2mqtt/kitchen/floor_light`

This project uses that feature intentionally to encode semantic information directly into the Zigbee2MQTT topic path.

For Zigbee devices in this repository, the `friendly_name` MUST use the following structure:

`<device_type>/<site>/<location>/<device_id>`

Example:

- friendly name: `ZG-204ZV/vad/balcon/south`
- resulting topic: `zigbee2mqtt/ZG-204ZV/vad/balcon/south`

Segment meaning:

- `device_type`: hardware model or device class
- `site`: canonical site identifier
- `location`: canonical room/location identifier
- `device_id`: logical endpoint identifier within the location

### Adapter Translation Rationale

Structuring Zigbee2MQTT topics this way allows adapters to perform deterministic translation without lookup tables.

Example inbound topic:

`zigbee2mqtt/ZG-204ZV/vad/balcon/south`

Example payload:

```json
{ "illuminance": 704 }
```

Adapter output:

`vad/home/balcon/illuminance/south/value`

The adapter only needs to split the topic path and map payload fields to capabilities.

This aligns directly with the home bus contract defined in `home_bus.md`:

`<site>/home/<location>/<capability>/<device_id>/<stream>`

Example:

`vad/home/balcon/illuminance/south/value`

The Zigbee2MQTT topic structure intentionally mirrors the dimensions required by the semantic home bus grammar.
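
A minimal sketch of the split-and-map step, assuming the five-segment topic form described above. The function names `parseZ2mTopic` and `homeTopic` are hypothetical, and the payload field name is assumed to equal the capability name.

```javascript
// Split "zigbee2mqtt/<device_type>/<site>/<location>/<device_id>" into named dimensions.
// Returns null when the topic does not have exactly five non-empty segments.
function parseZ2mTopic(topic, baseTopic = "zigbee2mqtt") {
  const parts = topic.split("/");
  if (parts.length !== 5 || parts[0] !== baseTopic || parts.includes("")) return null;
  const [source, deviceType, site, location, deviceId] = parts;
  return { source, deviceType, site, location, deviceId };
}

// Build "<site>/home/<location>/<capability>/<device_id>/<stream>".
function homeTopic(dims, capability, stream = "value") {
  return `${dims.site}/home/${dims.location}/${capability}/${dims.deviceId}/${stream}`;
}

const dims = parseZ2mTopic("zigbee2mqtt/ZG-204ZV/vad/balcon/south");
const outTopic = homeTopic(dims, "illuminance");
```

Because availability topics carry a sixth segment, the strict length check keeps them out of the telemetry path in this sketch.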

### Naming Rules

- `device_type` should correspond to the hardware model when possible
- `location` must match canonical location identifiers used by the home bus
- `device_id` must be unique within the location
- identifiers must follow MQTT naming rules defined in `mqtt_contract.md`
- identifiers should use lowercase kebab-case where possible

This approach removes location mapping tables from adapters, enables deterministic topic parsing, simplifies Node-RED flows, and allows wildcard subscriptions by device type.

Useful subscriptions:

- `zigbee2mqtt/+/+/+/+`
- `zigbee2mqtt/ZG-204ZV/#`
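
To check which topics these filters select, the standard MQTT wildcard rules (`+` matches exactly one level, `#` matches all remaining levels including the parent level) can be sketched as follows. `matchesFilter` is a hypothetical helper for reasoning and tests; it is simplified and ignores the special handling of topics that begin with `$`.

```javascript
// Simplified MQTT topic-filter matcher (does not handle `$`-prefixed topics).
function matchesFilter(filter, topic) {
  const f = filter.split("/");
  const t = topic.split("/");
  for (let i = 0; i < f.length; i++) {
    if (f[i] === "#") return true; // matches the parent level and everything below it
    if (i >= t.length) return false; // filter is longer than the topic
    if (f[i] !== "+" && f[i] !== t[i]) return false;
  }
  return f.length === t.length; // no trailing unmatched topic levels
}

const telemetry = "zigbee2mqtt/ZG-204ZV/vad/balcon/south";
const availability = "zigbee2mqtt/ZG-204ZV/vad/balcon/south/availability";
const fourPlus = [
  matchesFilter("zigbee2mqtt/+/+/+/+", telemetry),
  matchesFilter("zigbee2mqtt/+/+/+/+", availability),
];
const byType = [
  matchesFilter("zigbee2mqtt/ZG-204ZV/#", telemetry),
  matchesFilter("zigbee2mqtt/ZG-204ZV/#", availability),
];
```

Under these rules the four-`+` filter selects telemetry topics but not the per-device availability topics, while the `#` filter selects both.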

## Adapter Auto-Provisioning from Source Topics

This section defines how adapters automatically derive canonical MQTT bus topics from source topics without requiring a static mapping table.

The design goal is to keep adapters deterministic and lightweight while allowing configuration overrides for exceptional cases.

### Principle

Adapters SHOULD attempt to derive semantic dimensions directly from the inbound topic structure.

For Zigbee2MQTT the expected inbound topic format is:

`zigbee2mqtt/<device_type>/<site>/<location>/<device_id>`

Adapters MUST parse the topic segments and attempt to infer:

- source system
- device type
- site
- location
- device identifier

These values are then used to construct canonical bus topics.

Example inbound topic:

`zigbee2mqtt/ZG-204ZV/vad/balcon/south`

Parsed dimensions:

- `source = zigbee2mqtt`
- `device_type = ZG-204ZV`
- `site = vad`
- `location = balcon`
- `device_id = south`

Example inbound payload:

```json
{ "illuminance": 704 }
```

Adapter output:

`vad/home/balcon/illuminance/south/value`

### Provisioning Algorithm

Adapters SHOULD apply the following processing order:

1. Parse topic segments.
2. Attempt direct semantic mapping.
3. Apply configuration overrides.
4. Apply default values if required fields are missing.

This ensures deterministic behavior while allowing controlled deviations.

### Configuration Overrides

Adapters MUST support a configuration object allowing explicit overrides.

Overrides allow correcting cases where the inbound topic does not match the canonical semantic model.

Example override configuration:

```json
{
  "location_map": {
    "balcony": "balcon"
  },
  "bus_override": {
    "power": "energy"
  },
  "device_id_map": {
    "south": "radar-south"
  }
}
```

Override categories:

- `location_map`
  Maps inbound location identifiers to canonical location identifiers.
- `bus_override`
  Overrides which semantic bus a capability should be published to.
- `device_id_map`
  Renames device identifiers when canonical IDs differ from source IDs.

### Default Values

If a semantic dimension cannot be derived from the topic and no override exists, adapters MUST fall back to defaults.

Recommended defaults:

- `site = configured adapter site id`
- `bus = home`
- `location = unknown`
- `device_id = device_type`
- `stream = value`

Example fallback result:

`vad/home/unknown/temperature/ZG-204ZV/value`

These defaults ensure the adapter continues operating even when topic information is incomplete.
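
The processing order, overrides, and defaults can be combined in one sketch. It is a minimal illustration under assumptions: `provision`, `pick`, and `toHomeTopic` are hypothetical names, the capability is assumed to be known from the payload field, and a canonical topic string is only built for the `home` bus because other buses use their own grammars.

```javascript
// Override object copied from the example configuration above.
const overrides = {
  location_map: { balcony: "balcon" },
  bus_override: { power: "energy" },
  device_id_map: { south: "radar-south" },
};

// Look up `key` in an optional map; undefined when there is no own entry.
function pick(map, key) {
  return map && Object.prototype.hasOwnProperty.call(map, key) ? map[key] : undefined;
}

// Hypothetical provisioning step. `adapterSite` is the configured adapter site id.
function provision(topic, capability, cfg, adapterSite) {
  // Steps 1 and 2: parse topic segments and map them directly (missing ones are undefined).
  const [, deviceType, site, location, deviceId] = topic.split("/");
  // Step 3: apply configuration overrides.
  const o = cfg || {};
  const bus = pick(o.bus_override, capability);
  const mappedLocation = pick(o.location_map, location) ?? location;
  const mappedDeviceId = pick(o.device_id_map, deviceId) ?? deviceId;
  // Step 4: apply defaults for anything still missing.
  return {
    site: site || adapterSite,
    bus: bus || "home",
    location: mappedLocation || "unknown",
    deviceId: mappedDeviceId || deviceType,
    capability,
    stream: "value",
  };
}

// Build a topic for the home bus grammar only.
function toHomeTopic(d) {
  return `${d.site}/home/${d.location}/${d.capability}/${d.deviceId}/${d.stream}`;
}

const full = provision("zigbee2mqtt/ZG-204ZV/vad/balcon/south", "illuminance", overrides, "vad");
const partial = provision("zigbee2mqtt/ZG-204ZV", "temperature", {}, "vad");
```

With the example overrides, the first call resolves `device_id` to `radar-south`; the second call has no site, location, or device id in the topic and falls back to the recommended defaults.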

### Node-RED Implementation Guidance

Adapters implemented in Node-RED SHOULD:

- parse topic segments using deterministic split logic
- apply overrides via a centralized configuration object
- avoid dynamic lookup tables in hot paths
- emit canonical MQTT topics as early as possible
- publish malformed or unmappable inputs to:

`<site>/sys/adapter/<adapter_id>/dlq`

Adapter errors SHOULD be published to:

`<site>/sys/adapter/<adapter_id>/error`

This keeps semantic bus traffic deterministic while still exposing operational diagnostics.

### Architectural Rationale

Automatic provisioning from topic structure provides several advantages:

- eliminates large static mapping tables
- allows new devices to appear without manual configuration
- keeps Node-RED adapter logic simple
- keeps canonical bus structure predictable
- allows targeted overrides only when necessary

This approach maintains the principle that adapters perform normalization rather than interpretation.


## Historian Testability

Adapters should support a replay-friendly execution model so historian testing does not depend on live device timing.

Recommended modes:

- live: consume from vendor topics
- replay: replay recorded vendor fixtures
- synthetic: generate canonical samples

For historian bootstrap, adapters SHOULD be able to emit:

- retained `last` for streams that must support cold start from the bus
- retained `meta`
- retained `availability`
- realistic live `value` traffic together with retained `last`
- dual projection for devices such as smart plugs

---

## Relationship with HomeKit

HomeKit integration is implemented through a **separate adapter layer** that consumes the canonical MQTT bus.

Pipeline:

Device → Protocol Adapter → MQTT Bus → HomeKit Adapter → HomeKit

This separation prevents HomeKit constraints from leaking into protocol translation logic.

---

## Design Principles

Adapters should follow several key principles:

1. Deterministic behavior

The same input must always produce the same output.

2. Stateless processing

Adapters should avoid maintaining internal state unless strictly required.

3. Minimal transformation

Adapters should only normalize data, not reinterpret it.

4. Idempotent operation

Repeated messages should not produce inconsistent system state.

5. Contract-first mapping

Adapters must validate outgoing topics against the target bus grammar.

6. Low-overhead runtime

Adapter behavior should minimize CPU and memory cost in Node-RED hot paths.

7. Operational transparency

Adapter failures must be observable without inspecting raw vendor topics.
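
Principle 5 (contract-first mapping) can be illustrated with a validator for the home bus grammar `<site>/home/<location>/<capability>/<device_id>/<stream>`. This is a sketch under assumptions: `isValidHomeTopic` is a hypothetical name, and the segment rule used here (non-empty, no wildcards, no whitespace) is a placeholder for the identifier rules defined in `mqtt_contract.md`.

```javascript
// Valid streams, as listed under stream mapping in the responsibilities.
const STREAMS = new Set(["value", "last", "set", "meta", "availability"]);

// Assumption: a segment is any non-empty string without "+", "#", or whitespace.
const SEGMENT = /^[^+#\s]+$/;

// Check an outgoing topic against the home bus grammar before publishing.
function isValidHomeTopic(topic) {
  const parts = topic.split("/");
  if (parts.length !== 6) return false;
  const [site, bus, location, capability, deviceId, stream] = parts;
  return (
    bus === "home" &&
    STREAMS.has(stream) &&
    [site, location, capability, deviceId].every((s) => SEGMENT.test(s))
  );
}
```

A topic that fails validation would be routed to the adapter's operational topics rather than published on the semantic bus.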

---

## Example

Vendor message:

Topic:

zigbee2mqtt/bedroom_sensor

Payload:

`{ "temperature": 23.4, "humidity": 41 }`

Adapter output:

Topic:

vad/home/bedroom/temperature/bedroom-sensor/value

Payload:

23.4

and

Topic:

vad/home/bedroom/humidity/bedroom-sensor/value

Payload:

41

The final published bus message should be MQTT-ready and minimal, for example:

Topic:

vad/home/balcon/illuminance/radar-south/value

Payload:

1315

It should NOT include diagnostic wrappers, mapping objects, or source payload copies.

If the source device is a smart plug, additional output may also be emitted on the `energy` bus:

Topic:

vad/energy/load/bedroom-heater/active_power/value

Payload:

126.8

---

This abstraction allows the rest of the system to operate independently of the device ecosystem and vendor protocols.