Showing 1 changed files with 1 additions and 0 deletions
+1 -0
projects/thunderbolts/CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Fixed
+- `peer_ip_for_iface()` in `tb-recover.sh` used a static `host:interface → peer-IP` table that assumed kernel Thunderbolt interface numbers follow a fixed port order. The kernel assigns interface numbers dynamically: after a baobab reboot, `thunderbolt0` was bound to the tapia connection (domain1, `1-1.0`) while the table expected it to face ebony. As a result, `assess_peer_health` pinged the wrong peer on every 30-second cycle, accumulated two failures, and called `recover_iface_cycle` (`ifdown`/`ifup`) roughly every 5 minutes, which continuously aborted active ThunderboltIP sessions and kept all cluster nodes isolated on `thunderbridge`. Fixed by replacing the static table with a dynamic sysfs lookup: `readlink /sys/class/net/<iface>/device` resolves to the XDomain service path, whose parent directory exposes `device_name` (e.g. `tapia`), which is then mapped to the correct peer IP. The static table is retained as a fallback. [ISSUE-2026-003]
 - Invalid `ExecStop` syntax in `tb-enlist@.service` caused failed unit teardown on Thunderbolt device removal [ISSUE-2026-001]
 - Tapia-Baobab Thunderbolt recovery path hardened after reboot-time disconnect/reconnect events [ISSUE-2026-001]
 - `tb-enlist@.service` now stays active until `network.target` stops, so NFS storages routed over `thunderbridge` can unmount cleanly before Thunderbolt ports are detached; this is the Thunderbolt-side fix for the cluster-wide maintenance shutdown incident [ISSUE-2026-002]
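
The dynamic lookup described in the ISSUE-2026-003 entry could look roughly like the sketch below. It is not the code from `tb-recover.sh`. The function names (`peer_ip_for_iface_sketch`, `peer_ip_for_name`, `static_peer_ip`), the `SYSFS_ROOT` override, and the IP addresses (RFC 5737 documentation range) are all assumptions made for illustration. Only the sysfs layout (the interface's `device` link resolves to the XDomain service directory, whose parent holds `device_name`) is taken from the entry.

```shell
#!/bin/sh
# Hypothetical sketch of a sysfs-based peer lookup with a static fallback.
# All names and addresses here are illustrative assumptions.

# Overridable so the lookup can be exercised against a fake tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Map a peer host name (as read from device_name) to its IP.
# Placeholder addresses, not the real cluster addresses.
peer_ip_for_name() {
    case "$1" in
        tapia) echo "192.0.2.11" ;;
        ebony) echo "192.0.2.12" ;;
        *)     return 1 ;;
    esac
}

# Static interface -> peer-IP table, kept only as a fallback.
static_peer_ip() {
    case "$1" in
        thunderbolt0) echo "192.0.2.12" ;;
        thunderbolt1) echo "192.0.2.11" ;;
        *)            return 1 ;;
    esac
}

peer_ip_for_iface_sketch() {
    iface="$1"
    dev_link="$SYSFS_ROOT/class/net/$iface/device"

    # Resolve the device link to the XDomain service directory (e.g. .../1-1/1-1.0).
    if svc_path=$(readlink -f "$dev_link" 2>/dev/null) && [ -n "$svc_path" ]; then
        # The parent directory (the XDomain itself) exposes device_name.
        name_file="$(dirname "$svc_path")/device_name"
        if [ -r "$name_file" ]; then
            peer_name=$(cat "$name_file")
            # Use the dynamic result only if the name is a known peer.
            peer_ip_for_name "$peer_name" && return 0
        fi
    fi

    # Lookup failed or peer unknown: fall back to the static table.
    static_peer_ip "$iface"
}
```

Calling `peer_ip_for_iface_sketch thunderbolt0` prints the IP of whichever peer is currently attached to that interface, regardless of the order in which the kernel numbered the interfaces. The fallback keeps the old behaviour when the sysfs path is missing or the peer name is unrecognised.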