# Sys Operational Namespace

## Purpose

This document defines the shared operational MQTT namespace under `<site>/sys/...`.

It is used for adapter, worker, bridge, and infrastructure observability.

`sys` is not a semantic bus. It exists to expose component health, counters, errors, and rejected messages without polluting domain buses such as `home` or `energy`.

---

## Namespace Model

Canonical shape:

`<site>/sys/<producer_kind>/<instance_id>/<stream>`

Examples:

- `vad/sys/adapter/z2m-zg-204zv/availability`
- `vad/sys/adapter/z2m-zg-204zv/stats`
- `vad/sys/historian/main/error`
- `vad/sys/historian/main/dlq`

Rules:

- `<site>` MUST be a stable, lowercase kebab-case identifier
- `<producer_kind>` identifies the emitting software component, not a device capability
- `<instance_id>` MUST identify a stable logical instance of that component
- `sys` MUST NOT be used for room, device, or capability telemetry
- semantic topics such as `<site>/home/...` and `<site>/energy/...` MUST remain separate from `sys`

Common `producer_kind` values in v1:

- `adapter`
- `historian`

Additional producer kinds MAY be introduced later, but they SHOULD follow the same operational topic model.
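
The canonical shape can be enforced with a small builder and parser. The following is a minimal sketch, not part of the contract: the names `SysTopic`, `build_sys_topic`, and `parse_sys_topic` are hypothetical, and only `<site>` is checked for kebab-case because that is the only level the rules above constrain that way. The other levels are only checked for being non-empty and free of the MQTT separator and wildcard characters.

```python
import re
from typing import NamedTuple, Optional

# Lowercase kebab-case, e.g. "vad" or "north-site".
_KEBAB = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
# "/" separates topic levels; "+" and "#" are MQTT wildcards and are not
# valid inside a topic name used for publishing.
_FORBIDDEN = set("/+#")


class SysTopic(NamedTuple):
    site: str
    producer_kind: str
    instance_id: str
    stream: str


def _check_level(name: str, value: str) -> None:
    if not value or _FORBIDDEN & set(value):
        raise ValueError(f"{name} must be a non-empty single topic level: {value!r}")


def build_sys_topic(site: str, producer_kind: str, instance_id: str, stream: str) -> str:
    """Return `<site>/sys/<producer_kind>/<instance_id>/<stream>`."""
    if not _KEBAB.fullmatch(site):
        raise ValueError(f"site must be lowercase kebab-case: {site!r}")
    for name, value in (
        ("producer_kind", producer_kind),
        ("instance_id", instance_id),
        ("stream", stream),
    ):
        _check_level(name, value)
    return f"{site}/sys/{producer_kind}/{instance_id}/{stream}"


def parse_sys_topic(topic: str) -> Optional[SysTopic]:
    """Return the parsed levels, or None if the topic is not a `sys` topic."""
    parts = topic.split("/")
    if len(parts) != 5 or parts[1] != "sys" or not all(parts):
        return None
    return SysTopic(parts[0], parts[2], parts[3], parts[4])
```

For example, `build_sys_topic("vad", "adapter", "z2m-zg-204zv", "stats")` yields `vad/sys/adapter/z2m-zg-204zv/stats`, while `parse_sys_topic` returns `None` for a semantic topic such as `vad/home/balcon/motion/south/availability`.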

---

## Streams

The `sys` namespace uses operational streams, not the semantic stream taxonomy from domain buses.

Supported v1 streams:

- `availability`
- `stats`
- `error`
- `dlq`

### `availability`

Meaning:

- liveness of the emitting component instance
- whether the adapter or worker itself is online

Typical payloads:

- `online`
- `offline`
- optionally `degraded` for components that expose a degraded mode

Policy:

- QoS 1
- retain true
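- SHOULD register an MQTT Last Will and Testament (LWT) on this topic when the MQTT client supports it, so the broker reports the component as `offline` after an unexpected disconnect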
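
The policy above could be wired up roughly as follows. This is a sketch that assumes a client object exposing `will_set` and `publish` with the keyword arguments used by the Eclipse paho-mqtt Python client; the helper names `configure_availability` and `announce_online` are hypothetical.

```python
def configure_availability(client, availability_topic: str) -> None:
    """Register the will message. Call this before connecting, because the
    will is sent to the broker as part of the MQTT CONNECT packet."""
    client.will_set(availability_topic, payload="offline", qos=1, retain=True)


def announce_online(client, availability_topic: str) -> None:
    """Publish a retained `online` once the connection is established."""
    client.publish(availability_topic, payload="online", qos=1, retain=True)


def announce_offline(client, availability_topic: str) -> None:
    """Publish a retained `offline` before a clean shutdown. The broker does
    not send the will on a graceful disconnect, so this is done explicitly."""
    client.publish(availability_topic, payload="offline", qos=1, retain=True)
```

Because both the will and the explicit publishes are retained, a consumer that subscribes later still receives the most recent availability state.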

### `stats`

Meaning:

- low-rate counters or snapshots describing the current operational state of the component

Typical payload shape:

```json
{
  "processed_inputs": 1824,
  "translated_messages": 9241,
  "errors": 2,
  "dlq": 1
}
```

Policy:

- QoS 1
- retain true when the message represents the latest snapshot
- publish at a low rate, not on every hot-path sample
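
One way to keep `stats` off the hot path is to count in memory and emit a snapshot only when an interval has elapsed. The sketch below illustrates that idea; the class name `StatsReporter` and the 60-second default interval are assumptions, not values defined by this document.

```python
import json
import time
from collections import Counter
from typing import Optional


class StatsReporter:
    """Accumulates counters and yields a JSON snapshot at most once per interval."""

    def __init__(self, interval_s: float = 60.0) -> None:
        self.counters: Counter = Counter()
        self.interval_s = interval_s
        self._last_emit: Optional[float] = None

    def incr(self, name: str, n: int = 1) -> None:
        # Cheap enough to call on every processed message.
        self.counters[name] += n

    def snapshot_if_due(self, now: Optional[float] = None) -> Optional[str]:
        """Return a JSON snapshot if the interval has elapsed, else None."""
        now = time.monotonic() if now is None else now
        if self._last_emit is not None and now - self._last_emit < self.interval_s:
            return None
        self._last_emit = now
        return json.dumps(dict(self.counters), sort_keys=True)
```

The caller would publish any non-`None` result to the `stats` topic with QoS 1 and retain true, so that the retained message is always the latest snapshot.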

### `error`

Meaning:

- structured operator-visible errors
- faults that deserve operator attention but whose diagnostics should not be embedded in semantic bus payloads

Typical payload shape:

```json
{
  "code": "invalid_topic",
  "reason": "Topic must start with zigbee2mqtt/ZG-204ZV",
  "source_topic": "zigbee2mqtt/bad/topic",
  "adapter_id": "z2m-zg-204zv"
}
```

Policy:

- QoS 1
- retain false by default
- SHOULD remain structured and compact
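
A helper such as the following can keep `error` payloads structured and compact. It is a sketch only: the function name `error_payload` and the 200-character cap on `reason` are assumptions made for illustration; this document does not define a size limit.

```python
import json


def error_payload(
    code: str,
    reason: str,
    source_topic: str,
    adapter_id: str,
    max_reason_len: int = 200,  # hypothetical cap, not defined by this document
) -> str:
    """Serialize an operator-visible error using the field names shown above."""
    body = {
        "code": code,
        # Truncate free text so one noisy failure cannot produce a huge message.
        "reason": reason[:max_reason_len],
        "source_topic": source_topic,
        "adapter_id": adapter_id,
    }
    # Compact separators avoid whitespace between keys and values.
    return json.dumps(body, separators=(",", ":"))
```

The result would be published to the `error` topic with QoS 1 and retain false.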

### `dlq`

Meaning:

- dead-letter payloads for messages that could not be normalized or safely processed

Typical payload shape:

```json
{
  "code": "payload_not_object",
  "source_topic": "zigbee2mqtt/ZG-204ZV/vad/balcon/south",
  "payload": "offline"
}
```

Policy:

- QoS 1
- retain false
- SHOULD carry enough context to reproduce or inspect the rejected message
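
Rejected messages arrive as raw bytes and are not guaranteed to be valid UTF-8, so a dead-letter builder needs a fallback encoding. The sketch below shows one option. The function name `dlq_payload` and the `payload_b64` field are hypothetical additions for illustration; only `code`, `source_topic`, and `payload` appear in the payload shape above.

```python
import base64
import json


def dlq_payload(code: str, source_topic: str, raw: bytes) -> str:
    """Wrap a rejected message with enough context to inspect it later."""
    body = {"code": code, "source_topic": source_topic}
    try:
        # Text payloads are kept readable, matching the example shape.
        body["payload"] = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: binary payloads are base64-encoded under a separate key
        # so that the original bytes can be recovered exactly.
        body["payload_b64"] = base64.b64encode(raw).decode("ascii")
    return json.dumps(body)
```

Calling `dlq_payload("payload_not_object", "zigbee2mqtt/ZG-204ZV/vad/balcon/south", b"offline")` produces a JSON object with the same fields and values as the example above.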

---

## Availability Semantics

Operational availability and semantic availability are different signals.

Examples:

- `vad/sys/adapter/z2m-zg-204zv/availability` means the adapter is running
- `vad/home/balcon/motion/south/availability` means the canonical endpoint is available to consumers

These topics are complementary, not duplicates.

Rules:

- `sys/.../availability` describes component health
- semantic `.../availability` describes endpoint availability on a domain bus

---

## Consumer Guidance

- historians SHOULD NOT treat `sys` topics as normal semantic measurements by default
- dashboards, alerting, and operator tooling SHOULD subscribe to `sys`
- malformed input, normalization failures, and adapter faults SHOULD be routed to `sys`, not embedded into semantic payloads
- consumers SHOULD expect `stats` to be snapshots and `error`/`dlq` to be transient diagnostics
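
On the consumer side, the guidance above amounts to subscribing to `sys` explicitly and keeping it out of the measurement path. The sketch below builds subscription filters with the MQTT single-level wildcard `+` and shows a simple check a historian could apply. The names `sys_filter`, `is_sys_topic`, and `should_record_as_measurement` are hypothetical.

```python
def sys_filter(
    site: str,
    producer_kind: str = "+",
    instance_id: str = "+",
    stream: str = "+",
) -> str:
    """Build a subscription filter; `+` matches exactly one topic level."""
    return f"{site}/sys/{producer_kind}/{instance_id}/{stream}"


def is_sys_topic(topic: str) -> bool:
    """True when the second topic level is `sys`."""
    parts = topic.split("/")
    return len(parts) >= 2 and parts[1] == "sys"


def should_record_as_measurement(topic: str) -> bool:
    """Default historian policy in this sketch: skip operational topics."""
    return not is_sys_topic(topic)
```

An alerting tool could subscribe to `sys_filter("vad", stream="error")`, which evaluates to `vad/sys/+/+/error`, while a dashboard interested in everything operational could use `sys_filter("vad")`.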

---

## Relationship to Other Documents

- shared transport and payload rules: `mqtt_contract.md`
- adapter operational responsibilities: `adapters.md`
- historian worker operational topics: `historian_worker.md`
- semantic endpoint contracts: `home_bus.md`, `energy_bus.md`
