# Home MQTT Semantic Bus

## Overview

This project defines the architecture and conventions used to build a semantic MQTT bus for a heterogeneous home infrastructure.

The environment includes multiple device ecosystems and protocols such as Zigbee, custom ESP firmware, network telemetry, energy systems, and HomeKit integrations. These systems publish data using incompatible topic structures and payload formats.

The purpose of this repository is to define a canonical internal structure that allows all telemetry, events, and states to be normalized and consumed by multiple systems.

The MQTT bus acts as the central integration layer between devices and higher-level services.

Primary documents:

- `consolidated_spec.md`: consolidated reference linking all specs, decisions, and end-to-end traces
- `mqtt_contract.md`: shared transport, payload, metadata, and historian rules
- `sys_bus.md`: operational namespace for adapters, workers, and infrastructure components
- `home_bus.md`: room-centric semantic bus contract
- `energy_bus.md`: electrical telemetry bus contract
- `adapters.md`: adapter responsibilities and normalization rules
- `adapter_implementation_examples.md`: practical Node-RED adapter patterns, flow integration guidance, and known failure modes
- `historian_worker.md`: historian worker responsibilities for consuming buses and writing to PostgreSQL
- `tdb_ingestion/mqtt_ingestion_api.md`: PostgreSQL historian ingestion API contract for numeric and boolean measurements
- `tdb_ingestion/counter_ingestion_api.md`: counter ingestion API contract (stabilized, not yet implemented)

---

## Architectural Model

The architecture separates five fundamental layers:

Device Layer

Devices publish telemetry using vendor-specific protocols and topic structures.

Examples:

- Zigbee2MQTT
- Tasmota
- ESP firmware
- SNMP
- MikroTik APIs
- Modbus energy meters

Protocol Adapter Layer

Adapters translate vendor-specific topics and payloads into canonical MQTT bus contracts.

Adapters perform only normalization and protocol translation.

They must not implement automation logic, aggregation, or persistence.
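
As a rough illustration of what "normalization only" means, the sketch below translates a Zigbee2MQTT-style JSON payload into canonical scalar messages. The site name `site1`, the `DEVICE_ROOMS` mapping table, and the topic grammar `<site>/home/<room>/<capability>/<device>/value` are assumptions made for this example; the authoritative grammar lives in `home_bus.md`. Production adapters run in Node-RED, so this Python version only shows the shape of the logic.

```python
import json
from typing import Iterator, NamedTuple


class BusMessage(NamedTuple):
    topic: str
    payload: str


SITE = "site1"  # hypothetical site identifier
DEVICE_ROOMS = {"kitchen_sensor": "kitchen"}  # hypothetical mapping table
CAPABILITIES = ("temperature", "humidity")  # metrics this adapter forwards


def normalize_zigbee2mqtt(vendor_topic: str, raw_payload: str) -> Iterator[BusMessage]:
    """Translate one vendor message into zero or more canonical messages."""
    prefix, _, device = vendor_topic.partition("/")
    room = DEVICE_ROOMS.get(device)
    if prefix != "zigbee2mqtt" or room is None:
        return  # unknown source: nothing is published on the semantic bus
    data = json.loads(raw_payload)
    if not isinstance(data, dict):
        return
    for capability in CAPABILITIES:
        value = data.get(capability)
        # Forward numeric readings only; no aggregation, automation, or storage.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            topic = f"{SITE}/home/{room}/{capability}/{device}/value"
            yield BusMessage(topic, json.dumps(value))
```

Vendor fields outside the mapping (for example `linkquality`) are simply dropped, so nothing vendor-specific reaches the bus.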

MQTT Semantic Bus

The canonical model is implemented as multiple semantic buses (for example `home`, `energy`, `network`), each with a strict domain contract.

All higher-level services consume data from this layer.

The bus is intentionally lightweight: canonical publications must remain minimal, MQTT-ready messages rather than rich adapter envelopes.

Historian Worker Layer

Historian persistence is handled by a worker that subscribes to canonical bus topics and writes them into PostgreSQL using the historian ingestion API.

Consumer Layer

Multiple systems consume the bus simultaneously:

- HomeKit integration
- Historian (time-series storage)
- Aggregators
- Automation logic
- Dashboards and monitoring

Pipeline:

Device -> Protocol Adapter -> MQTT Bus -> Historian Worker / Other Consumers

---

## The Standardization Problem

IoT ecosystems lack a common telemetry model.

Different devices publish data using incompatible conventions:

- inconsistent topic hierarchies
- different payload formats (numeric, text, JSON)
- different naming schemes
- missing timestamps
- device-specific semantics

This lack of standardization creates several problems:

- difficult automation
- complex integrations
- duplicated parsing logic
- unreliable historical analysis

The semantic MQTT bus solves this by enforcing strict internal addressing contracts per bus.

Adapters isolate vendor inconsistencies and expose normalized data to the rest of the system.

---

## Shared Contract Baseline (v1)

Each bus defines its own topic grammar, but all buses inherit the same shared contract from `mqtt_contract.md`.

The shared contract defines:

- the common stream taxonomy (`value`, `last`, `set`, `meta`, `availability`)
- payload profiles (`scalar` and `envelope`)
- retained metadata structure
- time semantics (`observed_at`, `published_at`, `ingested_at`)
- quality states
- operational topics under `<site>/sys/...` with detailed rules in `sys_bus.md`
- historian defaults

Semantic categories such as `sample`, `state`, and `event` are carried by `meta.historian.mode`, not by introducing separate live stream names in v1.

This keeps ingestion simple and predictable while allowing low-overhead Node-RED flows.
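
The difference between the two payload profiles can be sketched as follows. The envelope field names (`value`, `observed_at`, `quality`) and the quality state `"good"` are assumptions chosen for illustration; the normative definitions are in `mqtt_contract.md`.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def encode_scalar(value: Any) -> str:
    """Scalar profile: the payload is just the JSON-encoded value."""
    return json.dumps(value)


def encode_envelope(value: Any, observed_at: datetime, quality: str = "good") -> str:
    """Envelope profile (assumed shape): value plus time and quality fields."""
    return json.dumps({
        "value": value,
        # observed_at is expected to be timezone-aware; it is normalized to UTC.
        "observed_at": observed_at.astimezone(timezone.utc).isoformat(),
        "quality": quality,
    })


def decode(payload: str) -> Dict[str, Any]:
    """Return a dict with at least a 'value' key for either profile."""
    data = json.loads(payload)
    if isinstance(data, dict) and "value" in data:
        return data
    return {"value": data}
```

A consumer written this way can handle both profiles through one code path. A scalar payload carries no `observed_at`, so the consumer has to supply a timestamp from elsewhere.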

---

## Node-RED Translation Constraints

Protocol translation is implemented in Node-RED.

To keep flow cost low and determinism high:

- keep topic shapes stable and predictable
- avoid expensive JSON transforms in high-rate paths
- publish repeated metadata on retained `meta` topics
- publish canonical MQTT-ready messages as early as possible after normalization
- keep hot-path messages minimal at publish time: `topic`, `payload`, and stream-policy QoS/retain only
- do not carry adapter-internal normalization structures on forwarded `msg` objects
- delete temporary adapter fields before MQTT publish
- do not use semantic bus topics as a debugging channel
- use reusable normalization subflows and centralized mapping tables
- avoid broad `#` subscriptions on high-volume paths

These constraints are reflected in each bus specification.
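
The "minimal hot-path message" rule can be illustrated with a small allow-list filter. Here a Node-RED `msg` object is modelled as a Python dict, and the `_adapter` field name is hypothetical. In a real flow the same logic would sit in a JavaScript function node placed just before the MQTT out node.

```python
from typing import Any, Dict

# The only fields that should survive to publish time.
PUBLISH_FIELDS = ("topic", "payload", "qos", "retain")


def to_publishable(msg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new message containing only the fields used for MQTT publish."""
    return {key: msg[key] for key in PUBLISH_FIELDS if key in msg}


raw = {
    "topic": "site1/home/kitchen/temperature/kitchen_sensor/value",
    "payload": "21.5",
    "qos": 0,
    "retain": False,
    "_adapter": {"vendor_topic": "zigbee2mqtt/kitchen_sensor"},  # temporary field
}
clean = to_publishable(raw)
```

An allow-list is used instead of deleting known temporary fields, so a newly added adapter field cannot reach the bus by accident.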

---

## Operational Separation

The new broker is treated as a clean semantic boundary.

Production-facing legacy topics may continue to exist temporarily, but adapters should normalize data into the new broker namespace without leaking old topic structures into the canonical contract.

The target split is:

- legacy broker and vendor topics remain compatibility surfaces
- the new broker hosts the semantic buses and adapter operational topics
- historian and future consumers subscribe only to canonical topics

---

## Historian Integration

One of the primary consumers of the bus is the historian.

The historian records time-series measurements for long-term analysis.

Typical use cases include:

- temperature history
- energy production and consumption
- network traffic metrics
- device performance monitoring

The historian does not communicate directly with devices.

Instead, it subscribes to normalized bus topics.

Current ingestion modeling is split in two:

- numeric and boolean measurements or states go through `tdb_ingestion/mqtt_ingestion_api.md`
- cumulative counters such as `energy_total` follow the separate contract in `tdb_ingestion/counter_ingestion_api.md` (stabilized, not yet implemented)

Example subscriptions:

- `+/home/+/+/+/value`
- `+/energy/+/+/+/value`

This architecture ensures that historical data remains consistent even when devices or protocols change.
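
To make the example subscriptions concrete, the sketch below implements standard MQTT topic-filter matching (`+` for one level, `#` for the remaining levels). It shows that the filters select only `value` streams with exactly six topic levels. The concrete topics in the usage example are hypothetical.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches the MQTT `topic_filter`."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    # Wildcards at the first level never match topics starting with `$`.
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard is only valid as the last filter level.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


HOME_VALUES = "+/home/+/+/+/value"
print(topic_matches(HOME_VALUES, "site1/home/kitchen/temperature/kitchen_sensor/value"))
print(topic_matches(HOME_VALUES, "site1/home/kitchen/temperature/kitchen_sensor/meta"))
```

The first call prints `True` and the second prints `False`: retained `meta` topics and any topic with a different depth fall outside these subscriptions.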

---

## Project Goals

The project aims to achieve the following objectives:

1. Define a stable MQTT semantic architecture for home infrastructure.
2. Decouple device protocols from automation and monitoring systems.
3. Enable multiple independent consumers of telemetry data.
4. Provide consistent topic contracts across heterogeneous systems.
5. Support scalable integration of additional device ecosystems.
6. Enable long-term historical analysis of telemetry.
7. Simplify integration with HomeKit and other user interfaces.
8. Make historian ingestion generic enough to reuse across buses.
9. Keep room for future buses without reworking existing consumers.

---

## Core Concepts

Adapters

Components that translate between systems and canonical bus contracts.

In practice there are two useful classes:

- ingress adapters: vendor/protocol topics -> canonical bus topics
- consumer adapters: canonical bus topics -> downstream consumer models such as HomeKit

Buses

Domain-specific normalized telemetry spaces (for example `home`, `energy`, `network`).

Streams

Named data flows associated with a capability or metric (`value`, `last`, `set`, `meta`, `availability`).

Semantic interpretation such as `sample`, `state`, or `event` is carried by retained `meta`, especially `meta.historian.mode`.
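
One way a consumer might resolve the semantic category of a live stream is sketched below. It assumes that the stream name is the last topic segment, that the retained `meta` topic is a sibling of the `value` topic, and that the meta payload is a JSON object with a nested `historian.mode` field. These assumptions are consistent with the conventions above but should be checked against `mqtt_contract.md`.

```python
from typing import Any, Dict, Optional, Tuple

STREAMS = {"value", "last", "set", "meta", "availability"}


def split_stream(topic: str) -> Tuple[str, str]:
    """Split a canonical topic into (base, stream)."""
    base, _, stream = topic.rpartition("/")
    if not base or stream not in STREAMS:
        raise ValueError(f"not a canonical stream topic: {topic!r}")
    return base, stream


def meta_topic_for(topic: str) -> str:
    """Derive the sibling retained meta topic (assumed layout)."""
    base, _ = split_stream(topic)
    return f"{base}/meta"


def historian_mode(meta_cache: Dict[str, Dict[str, Any]], value_topic: str) -> Optional[str]:
    """Look up meta.historian.mode from a cache of decoded retained meta payloads."""
    meta = meta_cache.get(meta_topic_for(value_topic), {})
    return meta.get("historian", {}).get("mode")
```

With a cache filled from retained `meta` messages, `historian_mode(cache, topic)` returns `"sample"`, `"state"`, or `"event"` when metadata is available, and `None` when no metadata has been seen for that topic.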

Consumers

Systems that subscribe to the bus and process the data.

---

## Design Principles

Protocol isolation

Device ecosystems must not leak their internal topic structure into the system.

Contract-driven addressing

All normalized telemetry must follow explicit per-bus topic contracts.

Loose coupling

Consumers must not depend on specific device implementations.

Extensibility

New buses, locations, devices, and metrics must be easy to integrate.

Observability

All telemetry should be recordable by a historian.

Node-RED efficiency

Topic and payload design should minimize transformation overhead in Node-RED.

The MQTT semantic bus is therefore optimized as a low-memory, low-CPU event bus for constrained accessories, SBCs, thin VMs, and high-rate Node-RED flows.

---

## Status

The system is currently being deployed with a new MQTT broker running on:

`192.168.2.101`

The legacy broker at:

`192.168.2.133`

will be progressively phased out while Node-RED adapters migrate traffic into the canonical bus.