Bogdan Timofte authored 2 weeks ago

# Sys Operational Namespace

## Purpose

This document defines the shared operational MQTT namespace under `<site>/sys/...`.

It is used for adapter, worker, bridge, and infrastructure observability.

`sys` is not a semantic bus. It exists to expose component health, counters, errors, and rejected messages without polluting domain buses such as `home` or `energy`.

---

## Namespace Model

Canonical shape:

`<site>/sys/<producer_kind>/<instance_id>/<stream>`

Examples:

- `vad/sys/adapter/z2m-zg-204zv/availability`
- `vad/sys/adapter/z2m-zg-204zv/stats`
- `vad/sys/historian/main/error`
- `vad/sys/historian/main/dlq`

Rules:

- `<site>` MUST be stable lowercase kebab-case
- `<producer_kind>` identifies the emitting software component, not a device capability
- `<instance_id>` MUST identify a stable logical instance of that component
- `sys` MUST NOT be used for room, device, or capability telemetry
- semantic topics such as `<site>/home/...` and `<site>/energy/...` MUST remain separate from `sys`

Common `producer_kind` values in v1:

- `adapter`
- `historian`

Additional producer kinds MAY be introduced later, but they SHOULD follow the same operational topic model.

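
The canonical shape and rules above can be sketched as a small builder and parser. This is an illustrative sketch, not part of the contract: the names `sys_topic` and `parse_sys_topic` are hypothetical, and applying the kebab-case check to `<producer_kind>` and `<instance_id>` as well as `<site>` is an assumption (the rules only require it for `<site>`). Rejecting streams outside the v1 list is likewise a choice made for this example.

```python
import re

# Lowercase kebab-case, e.g. "vad" or "z2m-zg-204zv".
# Assumption: the same pattern is applied to every identifier segment.
_KEBAB = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

# Streams listed as supported in v1.
V1_STREAMS = frozenset({"availability", "stats", "error", "dlq"})


def sys_topic(site: str, producer_kind: str, instance_id: str, stream: str) -> str:
    """Build `<site>/sys/<producer_kind>/<instance_id>/<stream>`."""
    for name, value in (
        ("site", site),
        ("producer_kind", producer_kind),
        ("instance_id", instance_id),
    ):
        if not _KEBAB.fullmatch(value):
            raise ValueError(f"{name} must be lowercase kebab-case: {value!r}")
    if stream not in V1_STREAMS:
        raise ValueError(f"unsupported v1 stream: {stream!r}")
    return f"{site}/sys/{producer_kind}/{instance_id}/{stream}"


def parse_sys_topic(topic: str):
    """Return (site, producer_kind, instance_id, stream), or None when the
    topic does not have the canonical five-level sys shape."""
    parts = topic.split("/")
    if len(parts) != 5 or parts[1] != "sys":
        return None
    site, _, producer_kind, instance_id, stream = parts
    return site, producer_kind, instance_id, stream
```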
---

## Streams

The `sys` namespace uses operational streams, not the semantic stream taxonomy from domain buses.

Supported v1 streams:

- `availability`
- `stats`
- `error`
- `dlq`

### `availability`

Meaning:

- liveness of the emitting component instance
- whether the adapter or worker itself is online

Typical payloads:

- `online`
- `offline`
- optionally `degraded` for components that expose a degraded mode

Policy:

- QoS 1
- retain true
- SHOULD use LWT when supported by the MQTT client

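
One way to picture the `availability` policy is as a pair of messages on the same topic: a birth message the component publishes after connecting, and a will (LWT) message the broker publishes if the connection drops ungracefully. The sketch below only computes those two messages; the `Message` type and the `availability_messages` helper are hypothetical names, and no MQTT client library is involved.

```python
from typing import NamedTuple


class Message(NamedTuple):
    topic: str
    payload: str
    qos: int
    retain: bool


def availability_messages(site: str, producer_kind: str, instance_id: str):
    """Return (birth, will) messages for one component instance.

    Both use QoS 1 and retain=True, following the policy above, so a late
    subscriber always sees the last known state.
    """
    topic = f"{site}/sys/{producer_kind}/{instance_id}/availability"
    birth = Message(topic, "online", qos=1, retain=True)
    will = Message(topic, "offline", qos=1, retain=True)
    return birth, will
```

With a client library that supports LWT, the will would typically be registered before connecting and the birth message published once the connection is established. In paho-mqtt, for example, `Client.will_set` serves the first purpose; treat the exact wiring as an assumption to check against the client in use.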
### `stats`

Meaning:

- low-rate counters or snapshots describing the current operational state of the component

Typical payload shape:

```json
{
  "processed_inputs": 1824,
  "translated_messages": 9241,
  "errors": 2,
  "dlq": 1
}
```

Policy:

- QoS 1
- retain true when the message represents the latest snapshot
- publish at a low rate, not on every hot-path sample

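
The "low rate, not on every hot-path sample" policy can be sketched as counters that are updated freely while a snapshot is emitted only when an interval has elapsed. The field names follow the payload shape above; the `StatsReporter` class, its method names, and the 30-second default interval are assumptions made for this example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Stats:
    # Field names follow the example payload shape.
    processed_inputs: int = 0
    translated_messages: int = 0
    errors: int = 0
    dlq: int = 0


class StatsReporter:
    """Accumulates counters on the hot path and emits a snapshot at most
    once per `interval_s` seconds (the default value is an assumption)."""

    def __init__(self, interval_s: float = 30.0):
        self.stats = Stats()
        self.interval_s = interval_s
        self._last_emit = None  # time of the last emitted snapshot, if any

    def maybe_snapshot(self, now: float):
        """Return a JSON snapshot if one is due at time `now`, else None."""
        if self._last_emit is not None and now - self._last_emit < self.interval_s:
            return None
        self._last_emit = now
        return json.dumps(asdict(self.stats))
```

A caller would increment `reporter.stats` fields while processing messages, call `maybe_snapshot(time.monotonic())` periodically, and publish any non-`None` result to the `stats` topic with QoS 1 and retain true.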
### `error`

Meaning:

- structured operator-visible errors
- faults that deserve attention but do not require embedding diagnostics into semantic bus payloads

Typical payload shape:

```json
{
  "code": "invalid_topic",
  "reason": "Topic must start with zigbee2mqtt/ZG-204ZV",
  "source_topic": "zigbee2mqtt/bad/topic",
  "adapter_id": "z2m-zg-204zv"
}
```

Policy:

- QoS 1
- retain false by default
- SHOULD remain structured and compact

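
As an illustration of how such an error record might be produced, the sketch below checks a source topic against an expected prefix and returns either `None` or a record with the four fields from the example payload. The function name `check_source_topic` and its parameters are hypothetical; only the field names and the `invalid_topic` code come from the example above.

```python
import json


def check_source_topic(source_topic: str, expected_prefix: str, adapter_id: str):
    """Return None when the topic is acceptable, else a structured error record."""
    if source_topic.startswith(expected_prefix):
        return None
    return {
        "code": "invalid_topic",
        "reason": f"Topic must start with {expected_prefix}",
        "source_topic": source_topic,
        "adapter_id": adapter_id,
    }


def encode_compact(record: dict) -> str:
    """Serialize without optional whitespace to keep the message compact."""
    return json.dumps(record, separators=(",", ":"))
```

The encoded record would then be published to the component's `error` topic with QoS 1 and retain false.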
### `dlq`

Meaning:

- dead-letter payloads for messages that could not be normalized or safely processed

Typical payload shape:

```json
{
  "code": "payload_not_object",
  "source_topic": "zigbee2mqtt/ZG-204ZV/vad/balcon/south",
  "payload": "offline"
}
```

Policy:

- QoS 1
- retain false
- SHOULD carry enough context to reproduce or inspect the rejected message

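
A minimal sketch of the dead-letter path, assuming the component expects JSON-object payloads: anything that does not decode to an object is wrapped in a record shaped like the example above, with the original payload text preserved for inspection. The function name `normalize_or_dlq` and the `("ok" | "dlq", value)` return convention are hypothetical.

```python
import json


def normalize_or_dlq(source_topic: str, raw: bytes):
    """Return ("ok", obj) for a JSON-object payload, else ("dlq", record)."""
    text = raw.decode("utf-8", errors="replace")
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        value = None
    if isinstance(value, dict):
        return "ok", value
    record = {
        "code": "payload_not_object",
        "source_topic": source_topic,
        # Keep the original payload so the rejection can be inspected later.
        "payload": text,
    }
    return "dlq", record
```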
---

## Availability Semantics

Operational availability and semantic availability are different signals.

Examples:

- `vad/sys/adapter/z2m-zg-204zv/availability` means the adapter is running
- `vad/home/balcon/motion/south/availability` means the canonical endpoint is available to consumers

These topics are complementary, not duplicates.

Rules:

- `sys/.../availability` describes component health
- semantic `.../availability` describes endpoint availability on a domain bus

---

## Consumer Guidance

- historians SHOULD NOT treat `sys` topics as normal semantic measurements by default
- dashboards, alerting, and operator tooling SHOULD subscribe to `sys`
- malformed input, normalization failures, and adapter faults SHOULD be routed to `sys`, not embedded into semantic payloads
- consumers SHOULD expect `stats` to be snapshots and `error`/`dlq` to be transient diagnostics

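
On the consumer side, the guidance above amounts to separating `sys` traffic from semantic traffic before processing it. The sketch below shows one possible split; the sink names (`semantic`, `ops-state`, `ops-events`) and the function names are hypothetical, and grouping `availability` with `stats` as retained state is an assumption based on both being retained. An operator tool for a single site could subscribe with an MQTT filter such as `vad/sys/#` instead of filtering in code.

```python
def is_sys_topic(topic: str) -> bool:
    """True when the second topic level is `sys`, i.e. `<site>/sys/...`."""
    parts = topic.split("/")
    return len(parts) >= 2 and parts[1] == "sys"


def route(topic: str) -> str:
    """Pick a (hypothetical) sink name for a message based on its topic."""
    if not is_sys_topic(topic):
        return "semantic"      # domain-bus traffic, e.g. historian ingestion
    stream = topic.rsplit("/", 1)[-1]
    if stream in ("availability", "stats"):
        return "ops-state"     # retained state: show the latest value
    return "ops-events"        # error/dlq: transient diagnostics
```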
---

## Relationship to Other Documents

- shared transport and payload rules: `mqtt_contract.md`
- adapter operational responsibilities: `addapters.md`
- historian worker operational topics: `historian_worker.md`
- semantic endpoint contracts: `home_bus.md`, `energy_bus.md`