Madagascar's Thunderbolts

Thunderbolt networking toolkit for three Proxmox hosts (baobab, ebony, tapia).
The goal is to bring up a high-MTU Thunderbolt bridge (thunderbridge) early in boot, enlist hot-plugged Thunderbolt NICs into it as they appear, and keep the management network configuration consistent across the cluster.

Repository layout

deploy/attempt1/
├── common/                    # Shared bits copied to every host
│   ├── systemd/system/
│   │   ├── tb-bridge.service  # Ensures the bridge device exists and is up
│   │   └── tb-enlist@.service # Enlists hotplugged NICs into the bridge
│   └── udev/rules.d/
│       └── 90-…systemd.rules  # Starts tb-enlist@ for thunderbolt* devices
├── baobab/…                   # Node-specific /etc/network config
├── ebony/…
├── tapia/…
└── deploy_tb.sh               # Main deployment script

The repo currently holds a single deployment attempt (deploy/attempt1). If you iterate on the design, prefer adding a new attempt directory so older snapshots stay reproducible.

Standardized lifecycle

This project now has two distinct operational paths:

  • Full bootstrap: deploy/attempt1/deploy_tb.sh
    • overwrites the host-specific network configuration (/etc/network/interfaces and /etc/network/interfaces.d/10-thunderbolt)
    • use for initial deployment or deliberate network template rollout
  • Shared runtime reinstall: ./setup.sh
    • standardizes the shared runtime artifacts only
    • installs/removes tb-recover.sh, the shared systemd units, and the udev rule
    • intentionally leaves /etc/network/interfaces and /etc/network/interfaces.d/10-thunderbolt untouched

Standardized host paths for the shared runtime:

  • canonical uninstall: /usr/local/lib/xdev/thunderbolts/uninstall.sh
  • canonical shared script: /usr/local/lib/xdev/thunderbolts/tb-recover.sh
  • operator wrapper: /usr/local/sbin/tb-recover.sh
  • installed docs: /usr/local/share/doc/xdev/thunderbolts

Use:

./setup.sh                 # reinstall shared runtime on baobab ebony tapia
./setup.sh baobab          # single host
./setup.sh --uninstall baobab
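The three invocations above imply a simple argument shape: an optional --uninstall flag followed by zero or more hosts, with no hosts meaning the whole cluster. The sketch below shows one way such parsing could work. It is not the repository's setup.sh. parse_setup_args and print_plan are hypothetical names, and print_plan only prints a dry-run plan that uses the canonical paths listed above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "[--uninstall] [host...]" parsing for a setup.sh-style
# wrapper; NOT the repository's actual setup.sh.

DEFAULT_HOSTS=(baobab ebony tapia)
LIB_DIR=/usr/local/lib/xdev/thunderbolts   # canonical shared runtime path

# Sets MODE (install|uninstall) and the HOSTS array from the arguments.
parse_setup_args() {
  MODE=install
  HOSTS=()
  while (($#)); do
    case $1 in
      --uninstall) MODE=uninstall ;;
      -*) printf 'unknown option: %s\n' "$1" >&2; return 2 ;;
      *) HOSTS+=("$1") ;;
    esac
    shift
  done
  # No hosts given: fall back to the whole cluster.
  if ((${#HOSTS[@]} == 0)); then
    HOSTS=("${DEFAULT_HOSTS[@]}")
  fi
}

# Dry run: prints what would happen on each selected host.
print_plan() {
  local host
  for host in "${HOSTS[@]}"; do
    if [[ $MODE == uninstall ]]; then
      printf '%s: run %s/uninstall.sh\n' "$host" "$LIB_DIR"
    else
      printf '%s: install %s/tb-recover.sh, shared units, udev rule\n' "$host" "$LIB_DIR"
    fi
  done
}
```

For example, `parse_setup_args --uninstall baobab && print_plan` prints a single line for baobab that points at the canonical uninstall script.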

Prerequisites

  • A control machine (the one you run the scripts from) with Bash ≥3, ssh, and scp available.
  • Root SSH access to the target hosts (root is the default username) over the management or Thunderbolt network; passwordless (for example, key-based) SSH is assumed.
  • Target hosts run Proxmox (or any Debian-like system with ifupdown2 and systemd).
  • ip, systemctl, and udevadm available on the remote hosts.
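The prerequisites can be checked before a first rollout. The following is a minimal preflight sketch with hypothetical function names. It assumes the root@host login described above. BatchMode=yes makes ssh fail instead of prompting, which also confirms that passwordless login works.

```bash
#!/usr/bin/env bash
# Hypothetical preflight sketch for the prerequisites; not part of the repo.

# Prints each named command that is not on PATH (one per line).
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Control machine: Bash >= 3 plus ssh and scp.
check_local() {
  local missing
  ((BASH_VERSINFO[0] >= 3)) || { echo "bash too old: $BASH_VERSION" >&2; return 1; }
  missing=$(missing_commands ssh scp)
  [[ -z $missing ]] || { echo "missing locally: $missing" >&2; return 1; }
}

# Target host: passwordless root SSH and the ip/systemctl/udevadm tools.
check_remote() {
  local host=$1
  ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" \
    'rc=0; for c in ip systemctl udevadm; do command -v "$c" >/dev/null || { echo "missing: $c"; rc=1; }; done; exit $rc'
}
```

Typical use is `check_local && for h in baobab ebony tapia; do check_remote "$h" || echo "$h not ready"; done`.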

How deployment works

deploy_tb.sh is idempotent. For each target host it:

  • Chooses an IP by trying management first, then Thunderbolt (get_mgmt_ip/get_tb_ip).
  • Uploads the shared udev rule and systemd units that prepare the thunderbridge device and attach Thunderbolt NICs when they are hot-plugged.
  • Replaces /etc/network/interfaces with the host-specific template and places the Thunderbolt overlay in /etc/network/interfaces.d/10-thunderbolt.
  • Reloads udev and systemd, triggers network reloads, enables the services, and prints a short status report (bridge state, enlisted NICs).
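The address choice in the first step amounts to "use the first candidate that answers." The sketch below is a minimal version of that idea. choose_ip and reachable are hypothetical names, not functions from deploy_tb.sh, and the reachability probe is an assumed non-interactive ssh. The real script may test reachability differently.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "management first, then Thunderbolt" selection;
# choose_ip and reachable are NOT functions from deploy_tb.sh.

# Assumed reachability probe: a non-interactive ssh that just runs `true`.
reachable() {
  ssh -o BatchMode=yes -o ConnectTimeout=3 "root@$1" true >/dev/null 2>&1
}

# Prints the first candidate address that answers; returns 1 if none does.
choose_ip() {
  local addr
  for addr in "$@"; do
    [[ -n $addr ]] || continue   # a helper may print nothing for unknown hosts
    echo "trying $addr" >&2      # report each attempt on stderr
    if reachable "$addr"; then
      echo "$addr"
      return 0
    fi
  done
  return 1
}
```

Passing the candidates in priority order keeps the policy in the caller, for example `addr=$(choose_ip "$(get_mgmt_ip baobab)" "$(get_tb_ip baobab)") || echo "baobab unreachable"`.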

Run it from inside the attempt directory so relative paths resolve correctly.

cd deploy/attempt1
./deploy_tb.sh            # deploys to baobab, ebony, tapia
./deploy_tb.sh baobab     # deploys to a single host
./deploy_tb.sh tapia ebony

Customising host lists and addresses

Edit the get_mgmt_ip() and get_tb_ip() helpers near the top of deploy/attempt1/deploy_tb.sh to match your environment. Each host that you want to target must:

  1. Have a subdirectory named after the host inside deploy/attempt1.
  2. Provide the full /etc/network/interfaces template.
  3. Provide etc/network/interfaces.d/10-thunderbolt with the bridge definition and hotplug rules for Thunderbolt interfaces.

To add a new host, copy one of the existing directories, adjust the static IPs and interface names, then extend both helper functions so the script can resolve the new host's addresses.
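The helpers are described above only by name. A common shape for them is a case statement per host. The sketch below shows that shape together with a preflight check for a new host. Everything in it is an assumption. The host "newhost" is hypothetical, and every address is a placeholder from the RFC 5737 documentation ranges rather than a real cluster value. The template paths inside a host directory (etc/network/...) are also assumed to mirror their /etc targets.

```bash
#!/usr/bin/env bash
# Hypothetical helper shape plus a host-directory preflight. "newhost" and all
# addresses are placeholders (RFC 5737), not values from deploy_tb.sh.

get_mgmt_ip() {
  case $1 in
    baobab)  echo 192.0.2.11 ;;
    ebony)   echo 192.0.2.12 ;;
    tapia)   echo 192.0.2.13 ;;
    newhost) echo 192.0.2.14 ;;   # added together with the new directory
    *) return 1 ;;
  esac
}

get_tb_ip() {
  case $1 in
    baobab)  echo 198.51.100.11 ;;
    ebony)   echo 198.51.100.12 ;;
    tapia)   echo 198.51.100.13 ;;
    newhost) echo 198.51.100.14 ;;
    *) return 1 ;;
  esac
}

# Checks <attempt_dir>/<host> for both templates and both helper entries.
# The relative template paths are an assumption about the repo layout.
check_host_dir() {
  local dir=$1/$2 f missing=0
  for f in etc/network/interfaces etc/network/interfaces.d/10-thunderbolt; do
    if [[ ! -f $dir/$f ]]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  get_mgmt_ip "$2" >/dev/null || { echo "no management IP for $2" >&2; missing=1; }
  get_tb_ip "$2" >/dev/null || { echo "no Thunderbolt IP for $2" >&2; missing=1; }
  return "$missing"
}
```

Running `check_host_dir . newhost` from inside deploy/attempt1 before the first deploy catches a missing template or a forgotten helper entry.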

What the systemd/udev pieces do

  • tb-bridge.service (oneshot) makes sure the thunderbridge device exists as a Linux bridge, sets MTU 65520, and brings it up during early boot.
  • tb-enlist@.service attaches each Thunderbolt NIC instance to the bridge, aligns its MTU with the bridge, and keeps it hotplug friendly; systemd stops the unit cleanly when the device is removed. The unit is ordered Before=network.target, and because systemd reverses ordering at shutdown, it is stopped only after network.target. This lets remote filesystems mounted over thunderbridge unmount before the ports detach.
  • 90-thunderbolt-net-systemd.rules tags thunderbolt* NICs for systemd so that a matching tb-enlist@ instance (for example tb-enlist@thunderbolt0) starts automatically when a NIC appears.

These files live under deploy/attempt1/common/ and are copied verbatim to the remote host’s /etc/systemd/system and /etc/udev/rules.d.
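The unit descriptions above correspond to a few iproute2 commands. The sketch below shows what the units' ExecStart= lines could run. It is not taken from the repository's units, which may differ. The function names and the DRY_RUN switch, which prints commands instead of running them, are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the commands behind tb-bridge.service and
# tb-enlist@.service; the repo's units may differ. DRY_RUN=1 only prints.

BRIDGE=${BRIDGE:-thunderbridge}
BRIDGE_MTU=${BRIDGE_MTU:-65520}

# Runs a command, or prints it when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# True if a network device with this name exists.
link_exists() { [[ -e /sys/class/net/$1 ]]; }

# tb-bridge.service: create the bridge if needed, set its MTU, bring it up.
ensure_bridge() {
  link_exists "$BRIDGE" || run ip link add name "$BRIDGE" type bridge
  run ip link set dev "$BRIDGE" mtu "$BRIDGE_MTU"
  run ip link set dev "$BRIDGE" up
}

# tb-enlist@<nic>.service: match the bridge MTU, join the bridge, bring the port up.
enlist_nic() {
  local nic=$1
  run ip link set dev "$nic" mtu "$BRIDGE_MTU"
  run ip link set dev "$nic" master "$BRIDGE"
  run ip link set dev "$nic" up
}
```

`DRY_RUN=1 enlist_nic thunderbolt0` prints the three commands an enlist instance would issue for that NIC. The MTU is set before the NIC joins the bridge so the port matches the bridge from the start.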

Validation checklist

After running the deploy script on a host:

  • systemctl status tb-bridge.service should show an active oneshot unit.
  • systemctl list-units 'tb-enlist@*' should list one unit per detected Thunderbolt NIC, each loaded and active.
  • ip -d link show thunderbridge should display MTU 65520 and state UP.
  • bridge link should list your Thunderbolt interfaces as ports of thunderbridge once cables are connected.
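Part of this checklist can be scripted from the control machine. The sketch below uses the one-line `ip -o link show` format instead of `ip -d`, because a single line is easier to match, and it counts `bridge link` lines that name thunderbridge as master. The function names are hypothetical, and the root@ login follows the Prerequisites.

```bash
#!/usr/bin/env bash
# Hypothetical sketch automating part of the validation checklist.

BRIDGE=thunderbridge

# True if an `ip -o link show <dev>` line reports MTU 65520 and state UP.
bridge_line_ok() {
  [[ $1 == *" mtu 65520 "* && $1 == *" state UP "* ]]
}

# Counts `bridge link` lines (from stdin) whose port belongs to $BRIDGE.
count_ports() {
  local n=0 line
  while IFS= read -r line; do
    if [[ $line == *" master $BRIDGE "* ]]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Runs the checks against one host over SSH and prints a short report.
validate_host() {
  local host=$1 link ports
  echo "tb-bridge.service: $(ssh "root@$host" systemctl is-active tb-bridge.service)"
  link=$(ssh "root@$host" ip -o link show "$BRIDGE") || return 1
  if bridge_line_ok "$link"; then
    echo "$BRIDGE: mtu 65520, state UP"
  else
    echo "$BRIDGE: unexpected: $link"
  fi
  ports=$(ssh "root@$host" bridge link | count_ports)
  echo "$BRIDGE ports: $ports"
}
```

Comparing the port count from `validate_host baobab` with the number of connected cables is a quick way to spot a NIC that did not enlist.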

If you change the network definitions, re-run ./deploy_tb.sh <host> to push the updates. The script re-applies permissions, reloads systemd, retriggers udev, and refreshes the interfaces.

Troubleshooting tips

  • SSH unreachable: Confirm that the management and Thunderbolt IPs in the helper functions are correct and that firewalls allow SSH. The script prints which IP it tried.
  • Bridge missing after reboot: Ensure tb-bridge.service is enabled; run systemctl enable --now tb-bridge.service on the host.
  • NICs not joining: Check journalctl -u tb-enlist@thunderbolt0 for logs and make sure the udev rule is present under /etc/udev/rules.d.
  • Slow shutdown with NFS on thunderbridge: Verify the host has the updated tb-enlist@.service with Before=network.target; otherwise thunderbridge can disappear before Proxmox unmounts NFS storages and shutdown waits on NFS timeouts.
  • MTU mismatch complaints: The units force MTU 65520 on thunderbridge and its enlisted NICs; verify that the devices at the other end of each link also support that MTU.
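Two of these checks can be run directly on a host. The sketch below reads the unit's Before= list with `systemctl show -p Before --value` and compares each thunderbolt* NIC's MTU from sysfs against the bridge. To cover both sides of a link, run it on each end. The function names and the SYSFS_NET override, which exists only to make the MTU check testable, are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical on-host diagnostics for the shutdown-ordering and MTU items.

# True if a space-separated unit list contains network.target.
lists_network_target() {
  local unit
  for unit in $1; do            # intentional word splitting
    [[ $unit == network.target ]] && return 0
  done
  return 1
}

# Reports whether tb-enlist@<nic>.service carries Before=network.target.
check_enlist_order() {
  local nic=${1:-thunderbolt0} before
  before=$(systemctl show -p Before --value "tb-enlist@$nic.service")
  if lists_network_target "$before"; then
    echo "tb-enlist@$nic: Before=network.target present"
  else
    echo "tb-enlist@$nic: Before=network.target missing; redeploy the unit"
  fi
}

# Prints "<nic> <mtu>" for each thunderbolt* NIC whose MTU differs from the bridge.
mtu_mismatches() {
  local sys=${SYSFS_NET:-/sys/class/net} bridge_mtu dev
  bridge_mtu=$(<"$sys/thunderbridge/mtu")
  for dev in "$sys"/thunderbolt*; do
    [[ -e $dev/mtu ]] || continue
    [[ $(<"$dev/mtu") == "$bridge_mtu" ]] || echo "${dev##*/} $(<"$dev/mtu")"
  done
}
```

An empty result from `mtu_mismatches` on both hosts rules out a local MTU mismatch, which leaves the peer device's own MTU limit as the next thing to check.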

Extending beyond attempt1

Prefer copying deploy/attempt1 into a new versioned folder (for example, attempt2) when you experiment with alternate topologies or addresses. This keeps previous rollouts reproducible and eases diffing of changes.
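Copying by hand (`cp -a deploy/attempt1 deploy/attempt2`) is enough. The small sketch below picks the next number automatically and copies the newest attempt. The function names are hypothetical. `cp -a` preserves modes, so deploy_tb.sh stays executable in the copy.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy the newest deploy/attemptN to attempt(N+1).

# Prints the highest N among <deploy_dir>/attemptN directories (0 if none).
latest_attempt() {
  local max=0 d n
  for d in "$1"/attempt*/; do
    n=${d%/}
    n=${n##*/attempt}
    if [[ $n =~ ^[1-9][0-9]*$ ]] && ((n > max)); then
      max=$n
    fi
  done
  echo "$max"
}

# Copies the newest attempt to the next free number and prints the new path.
new_attempt() {
  local deploy_dir=${1:-deploy} n
  n=$(latest_attempt "$deploy_dir")
  ((n > 0)) || { echo "no attemptN directory under $deploy_dir" >&2; return 1; }
  cp -a "$deploy_dir/attempt$n" "$deploy_dir/attempt$((n + 1))" &&
    echo "$deploy_dir/attempt$((n + 1))"
}
```

From the repository root, `new_attempt deploy` creates deploy/attempt2 from deploy/attempt1 and prints its path, and `diff -r deploy/attempt1 deploy/attempt2` later shows what the experiment changed.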