# Issue ISSUE-2026-002: Planned reboot stalls on NFS storages over thunderbridge before network shutdown

## Issue ID: ISSUE-2026-002

**Status:** investigating  
**Priority:** high  
**Created:** 2026-03-07  
**Updated:** 2026-03-07  
**Assigned to:** unassigned

---

## Summary

A planned node reboot on `baobab` spent ~106 seconds in shutdown because the Proxmox NFS storages were still mounted after the Thunderbolt transport had already been detached from `thunderbridge`.

---

## Description

During a controlled reboot validation on `baobab`, guest suspend worked correctly, but the host remained reachable over ICMP for almost two minutes after `systemctl reboot`. Journal analysis showed that the Thunderbolt bridge ports were detached early in shutdown, while Proxmox only attempted to unmount NFS storages later. Because `AutoNAS-1` and `AutoNAS-2` are mounted over `192.168.10.x` through `thunderbridge`, the NFS unmount path lost transport and waited for timeout.

The same investigation exposed a second maintenance risk in `pgs`: preflight cleanup could block in kernel I/O wait when it touched remote NFS-backed storages that were stale or temporarily unavailable. This does not cause the slow reboot itself, but it can block the maintenance preparation step.

Follow-up validation on `ebony` showed a different but related cluster behavior: `AutoNAS-1` is currently exported by `ebony` itself. During reboot, `autonas.service` stops early, which leaves the node's own Proxmox NFS client mount for `AutoNAS-1` stale; the unmount of that mount then waits for its timeout. In the same window, VM `301 is-anjohibe` (PBS `anjothibe`) is intentionally suspended by `pgs`, so PBS availability loss is expected during the maintenance window.

Validation on `tapia` initially showed the same class of topology problem for `AutoNAS-2`, which is locally exported there and mounted back as a Proxmox NFS storage. The first AutoNAS shutdown-ordering patch remained active, but reboot timing still stayed near the pre-fix range because `mnt-pve-AutoNAS-2.mount` waited for timeout during shutdown while PBS `andrafiabe-AutoNAS` had already become unreachable.

Follow-up work in the `autoNAS` project added an explicit `nfs-server.service` drop-in for self-hosted Proxmox NFS mounts discovered from `storage.cfg`. After that second patch, `tapia` reboot timing dropped into the same range as `ebony`, confirming that the remaining blocker was provider ordering on `nfs-server.service`, not on `autonas.service`.
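
The drop-in generation described above can be sketched roughly as follows. This is a minimal illustration, not the `autoNAS` implementation: the function names, the trimmed sample `storage.cfg`, and the way local addresses are passed in are all assumptions. Unit names are written the way the journal excerpts in this issue show them; a real generator would more likely derive them with `systemd-escape --path --suffix=mount`.

```python
def parse_nfs_storages(cfg_text):
    """Return {storage_id: {key: value}} for every `nfs:` section of a storage.cfg text."""
    storages, current = {}, None
    for line in cfg_text.splitlines():
        if not line.strip():
            current = None                      # blank line ends a section
            continue
        if not line[0].isspace():               # section header, e.g. "nfs: AutoNAS-1"
            kind, _, name = line.partition(":")
            is_nfs = kind.strip() == "nfs"
            current = storages.setdefault(name.strip(), {}) if is_nfs else None
        elif current is not None:               # indented "key value" property line
            fields = line.split(None, 1)
            if len(fields) == 2:
                current[fields[0]] = fields[1].strip()
    return storages


def self_hosted_dropin(cfg_text, local_addresses):
    """Build drop-in text ordering nfs-server.service before self-hosted mounts.

    A storage counts as self-hosted when its `server` is one of this node's
    own addresses (hypothetical criterion for this sketch).
    """
    units = [
        f"mnt-pve-{name}.mount"                 # naming as seen in the journal; assumption
        for name, props in parse_nfs_storages(cfg_text).items()
        if props.get("server") in local_addresses
    ]
    if not units:
        return ""
    return "[Unit]\n" + "".join(f"Before={unit}\n" for unit in sorted(units))


# Trimmed, hypothetical storage.cfg content (only the keys this sketch reads).
SAMPLE_CFG = """
nfs: AutoNAS-1
    server 192.168.10.21
    path /mnt/pve/AutoNAS-1

nfs: AutoNAS-2
    server 192.168.10.22
    path /mnt/pve/AutoNAS-2

dir: local
    path /var/lib/vz
"""

# On a node that owns 192.168.10.22, only AutoNAS-2 is self-hosted.
print(self_hosted_dropin(SAMPLE_CFG, {"192.168.10.22"}))
```

Because systemd stops units in the reverse of their start order, `Before=<mount unit>` on `nfs-server.service` means the mount is unmounted first and the NFS server stops afterwards.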

---

## Environment

- **Affected nodes:** `baobab` confirmed (transport ordering); `ebony` and `tapia` confirmed for the related self-hosted export case; likely all nodes using Proxmox NFS storages over `thunderbridge`
- **Component:** network + storage + maintenance workflow
- **Version/software:** Proxmox VE 9.1 / kernel `6.17.13-1-pve`, `tb-enlist@.service`, `pgs`

---

## Steps to Reproduce

1. On a node with Proxmox NFS storages routed over `thunderbridge`, run `/usr/local/sbin/pgs suspend -v`.
2. Trigger `systemctl reboot`.
3. Measure ICMP availability during shutdown and boot.
4. Inspect `journalctl -b -1` around the reboot window.
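
Step 3 can be sketched as a small probe loop plus a summary function. The tool that produced the `TIME_TO_*` values in this issue is not shown here, so the metric definitions below are assumptions inferred from the reported numbers (`DOWNTIME_SECONDS` matches `TIME_TO_FIRST_REPLY_SECONDS` minus `TIME_TO_STOP_SECONDS` within rounding). The `probe` helper assumes an iputils-style `ping` with `-c` and `-W`.

```python
import subprocess


def probe(host):
    """Return True if one ICMP echo to `host` is answered within 1 second."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def summarize(samples):
    """Summarize (seconds_since_reboot_request, replied) samples given in time order."""
    stop = first_reply = None
    for elapsed, replied in samples:
        if stop is None:
            if not replied:
                stop = elapsed              # first missed reply: host went down
        elif replied:
            first_reply = elapsed           # first reply after the outage
            break
    if stop is None or first_reply is None:
        raise ValueError("host never went down or never came back in the sample window")
    return {
        "TIME_TO_STOP_SECONDS": stop,
        "TIME_TO_FIRST_REPLY_SECONDS": first_reply,
        "DOWNTIME_SECONDS": round(first_reply - stop, 3),
    }


# Synthetic samples, not measurements from this issue.
example = [(0.0, True), (1.0, True), (2.0, False), (3.0, False), (4.5, True)]
print(summarize(example))
```

In practice the samples would come from calling `probe` in a loop from another machine and recording the elapsed time since `systemctl reboot` was issued.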

---

## Expected Behavior

- NFS storages should unmount while Thunderbolt transport is still available.
- Host should stop replying to ICMP shortly after reboot is requested.
- `pgs suspend` should not hang because a remote NFS mount is stale.

---

## Actual Behavior

- First validation on `baobab`:
  - `TIME_TO_STOP_SECONDS 105.852`
  - `TIME_TO_FIRST_REPLY_SECONDS 130.230`
  - `DOWNTIME_SECONDS 24.377`
- Follow-up validation on `ebony`:
  - `TIME_TO_STOP_SECONDS 120.275`
  - `TIME_TO_FIRST_REPLY_SECONDS 145.840`
  - `DOWNTIME_SECONDS 25.565`
- Follow-up validation on `tapia` after cluster-wide AutoNAS rollout:
  - `TIME_TO_STOP_SECONDS 123.285`
  - `TIME_TO_FIRST_REPLY_SECONDS 149.420`
  - `DOWNTIME_SECONDS 26.135`
- Revalidation on `tapia` after explicit `nfs-server.service` self-hosted ordering fix:
  - `TIME_TO_STOP_SECONDS 28.305`
  - `TIME_TO_FIRST_REPLY_SECONDS 53.588`
  - `DOWNTIME_SECONDS 25.283`
- `journalctl -b -1` on `baobab` showed:
  - Thunderbolt bridge ports detached at `08:48:17.989`
  - NFS unmount only started at `08:48:30.540`
  - `mnt-pve-AutoNAS-2.mount` and `mnt-pve-AutoNAS-1.mount` timed out at `08:50:00.604` and `08:50:00.605`, about 90 seconds after the unmount started
- `journalctl -b -1` on `ebony` showed:
  - `autonas.service` stopped at `11:04:22.326`
  - `mnt-pve-AutoNAS-2.mount` unmounted successfully by `11:04:38.693`
  - `mnt-pve-AutoNAS-1.mount` timed out at `11:06:08.679`
  - only after that did `network.target` stop and `tb-enlist@thunderbolt0.service` detach from `thunderbridge`
- A later maintenance attempt also showed `pgs suspend` blocked in `nfs4_proc_getattr` while scanning storage paths.

---

## Logs/Evidence

```text
Mar 07 08:48:17.989246 baobab NetworkManager[1096]: device (thunderbridge): bridge port thunderbolt0 was detached
Mar 07 08:48:17.993120 baobab NetworkManager[1096]: device (thunderbridge): bridge port thunderbolt1 was detached
Mar 07 08:48:30.540186 baobab systemd[1]: Unmounting mnt-pve-AutoNAS-1.mount - /mnt/pve/AutoNAS-1...
Mar 07 08:48:30.541335 baobab systemd[1]: Unmounting mnt-pve-AutoNAS-2.mount - /mnt/pve/AutoNAS-2...
Mar 07 08:50:00.604036 baobab systemd[1]: mnt-pve-AutoNAS-2.mount: Unmounting timed out. Terminating.
Mar 07 08:50:00.605215 baobab systemd[1]: mnt-pve-AutoNAS-1.mount: Unmounting timed out. Terminating.
```
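
The two intervals that matter in this excerpt can be checked with a few lines of Python. This is only a convenience sketch for reading the log: the year is not present in the journal lines and is assumed to be 2026 from the issue dates, and the helper name is illustrative.

```python
from datetime import datetime

LOG = """\
Mar 07 08:48:17.989246 baobab NetworkManager[1096]: device (thunderbridge): bridge port thunderbolt0 was detached
Mar 07 08:48:17.993120 baobab NetworkManager[1096]: device (thunderbridge): bridge port thunderbolt1 was detached
Mar 07 08:48:30.540186 baobab systemd[1]: Unmounting mnt-pve-AutoNAS-1.mount - /mnt/pve/AutoNAS-1...
Mar 07 08:48:30.541335 baobab systemd[1]: Unmounting mnt-pve-AutoNAS-2.mount - /mnt/pve/AutoNAS-2...
Mar 07 08:50:00.604036 baobab systemd[1]: mnt-pve-AutoNAS-2.mount: Unmounting timed out. Terminating.
Mar 07 08:50:00.605215 baobab systemd[1]: mnt-pve-AutoNAS-1.mount: Unmounting timed out. Terminating.
"""


def parse_ts(line):
    """Parse the leading 'Mon DD HH:MM:SS.ffffff' of a journal line (year assumed)."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"2026 {month} {day} {clock}", "%Y %b %d %H:%M:%S.%f")


lines = LOG.splitlines()
detach = parse_ts(lines[0])           # first bridge port detached
unmount_start = parse_ts(lines[2])    # AutoNAS-1 unmount requested
unmount_timeout = parse_ts(lines[5])  # AutoNAS-1 unmount timed out

gap = (unmount_start - detach).total_seconds()
wait = (unmount_timeout - unmount_start).total_seconds()
print(f"transport gone {gap:.3f}s before unmount; unmount waited {wait:.3f}s")
```

The transport was gone roughly 12.5 seconds before the unmount was attempted, and the unmount then waited roughly 90 seconds, which accounts for most of the ~106 second shutdown.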

Blocked `pgs` stack during stale-NFS preflight:

```text
[<0>] rpc_wait_bit_killable+0x11/0x80 [sunrpc]
[<0>] nfs4_do_call_sync+0x6a/0xc0 [nfsv4]
[<0>] __nfs_revalidate_inode+0xd4/0x320 [nfs]
[<0>] __do_sys_newfstatat+0x43/0x90
```

Validated timing after fixes on `baobab`:

```text
TIME_TO_STOP_SECONDS 14.599
TIME_TO_FIRST_REPLY_SECONDS 35.651
DOWNTIME_SECONDS 21.053
```

---

## Investigation Notes

- 2026-03-07: Confirmed `AutoNAS-1` and `AutoNAS-2` on `baobab` are Proxmox NFS storages mounted from `192.168.10.21` and `192.168.10.22` over `thunderbridge`.
- 2026-03-07: First reboot validation on `baobab` showed shutdown delay dominated by NFS unmount timeout, not by boot.
- 2026-03-07: `tb-enlist@.service` had no ordering against `network.target`; systemd stopped Thunderbolt bridge membership before Proxmox unmounted remote storages.
- 2026-03-07: Patched shared `tb-enlist@.service` with `Before=network.target` and deployed to `baobab`, then cluster-wide.
- 2026-03-07: Separate maintenance attempt showed `pgs suspend` can block in `nfs4_proc_getattr` while scanning storage paths on stale remote NFS mounts.
- 2026-03-07: Patched `pgs` cleanup to scan only local `dir` storages; remote storages such as NFS are skipped intentionally.
- 2026-03-07: Revalidated on `baobab` after both fixes:
  - NFS unmount started at `10:48:12.354/10:48:12.356`
  - both NFS mounts unmounted successfully by `10:48:12.460`
  - `network.target` stopped later at `10:48:16.152`
  - time from the reboot command to ICMP loss dropped from ~106s to ~15s
- 2026-03-07: `pgs resume` completed successfully after reboot on `baobab`; state file survived boot and all 4 VMs + 1 CT were restored.
- 2026-03-07: Validated `ebony` with current `pgs` and cluster-wide `thunderbolts` rollout. `pgs suspend` / `resume` succeeded for VMs `101`, `102`, `301`; state file survived reboot and restore completed.
- 2026-03-07: `ebony` still showed long shutdown because `AutoNAS-1` is currently provided by `ebony` itself through `autonas`. Stopping `autonas.service` made the node's own NFS client mount stale and `mnt-pve-AutoNAS-1.mount` waited for timeout.
- 2026-03-07: On `ebony`, PBS `anjothibe` availability loss during maintenance is expected because VM `301 is-anjohibe` is intentionally suspended by `pgs`, and its datastore dependency is also on `AutoNAS-1`.
- 2026-03-07: Implemented AutoNAS shutdown-ordering experiment on `ebony`: `autonas.service` and `autonas-boot-scan.service` now declare `Before=remote-fs.target` and `Before=umount.target`.
- 2026-03-07: Revalidated `ebony` after AutoNAS patch:
  - previous timing: `TIME_TO_STOP_SECONDS 120.275`, `TIME_TO_FIRST_REPLY_SECONDS 145.840`
  - new timing: `TIME_TO_STOP_SECONDS 27.573`, `TIME_TO_FIRST_REPLY_SECONDS 53.288`
  - `mnt-pve-AutoNAS-2.mount` still unmounted cleanly
  - `AutoNAS-1` no longer waited for the old 90s timeout, though a brief `Stale file handle` was still observed before the provider side stopped
- 2026-03-07: Residual issue on `ebony`: even with later provider shutdown, `pvestatd` briefly logged `storage 'AutoNAS-1' is not online` / `Stale file handle` during the maintenance window, so the self-hosted NFS topology remains fragile but no longer dominates shutdown time.
- 2026-03-07: Deployed the same AutoNAS ordering patch cluster-wide and revalidated `tapia`.
- 2026-03-07: `pgs suspend` / reboot / `pgs resume` succeeded on `tapia` for VMs `104`, `107`, `113`, `302`; state file survived reboot and all four guests were restored.
- 2026-03-07: `tapia` still showed slow shutdown after the AutoNAS patch:
  - `TIME_TO_STOP_SECONDS 123.285`, `TIME_TO_FIRST_REPLY_SECONDS 149.420`
  - `mnt-pve-AutoNAS-1.mount` unmounted immediately at `11:45:01.827`
  - `autonas.service` and `nfs-server.service` stopped around `11:45:01.689/11:45:01.900`
  - `mnt-pve-AutoNAS-2.mount` then waited until timeout at `11:46:31.778`
  - `network.target` stopped only after that, at `11:46:31.781`
- 2026-03-07: On `tapia`, the remaining delay is concentrated on self-hosted `AutoNAS-2` (`server 192.168.10.22`) plus expected maintenance-window loss of PBS `andrafiabe-AutoNAS` (`192.168.10.96`).
- 2026-03-07: Implemented a second-generation AutoNAS fix that generates `/etc/systemd/system/nfs-server.service.d/50-autonas-self-hosted-proxmox.conf` from `storage.cfg`, adding `Before=` ordering from `nfs-server.service` to the matching self-hosted Proxmox mount units.
- 2026-03-07: Revalidated `tapia` after the `nfs-server.service` ordering fix:
  - previous timing after first AutoNAS patch: `TIME_TO_STOP_SECONDS 123.285`, `TIME_TO_FIRST_REPLY_SECONDS 149.420`
  - new timing: `TIME_TO_STOP_SECONDS 28.305`, `TIME_TO_FIRST_REPLY_SECONDS 53.588`
  - `nfs-server.service` stopped at `12:07:42.157`, `network.target` stopped later at `12:07:47.230`
  - the old ~90s timeout on `mnt-pve-AutoNAS-2.mount` no longer dominated shutdown
  - `pgs suspend` / reboot / `pgs resume` completed successfully for VMs `104`, `107`, `113`, `302`
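
The "scan only local `dir` storages" rule from the `pgs` cleanup patch can be illustrated with a short filter over `storage.cfg`. This is a sketch under assumptions, not the `pgs` code: the function name and the trimmed sample configuration are hypothetical.

```python
import re

# Section headers look like "<type>: <storage-id>" and start in column 0.
HEADER = re.compile(r"^(\w+):\s*(\S+)\s*$")


def local_dir_paths(cfg_text):
    """Return the `path` of every `dir` storage; NFS and other types are skipped."""
    paths = []
    current_type = None
    for line in cfg_text.splitlines():
        header = HEADER.match(line)
        if header:
            current_type = header.group(1)
        elif current_type == "dir":
            fields = line.split(None, 1)
            if len(fields) == 2 and fields[0] == "path":
                paths.append(fields[1].strip())
    return paths


# Trimmed, hypothetical storage.cfg content.
SAMPLE = """
dir: local
    path /var/lib/vz
    content iso,backup

nfs: AutoNAS-1
    server 192.168.10.21
    path /mnt/pve/AutoNAS-1
"""

print(local_dir_paths(SAMPLE))
```

Selecting paths by storage type, rather than probing each mount point, means the cleanup never issues a `stat` against an NFS path and so cannot block on a stale remote mount.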

---

## Proposed Solution

1. Keep Thunderbolt enlist units ordered before `network.target` so storage traffic over `thunderbridge` remains alive until remote filesystems are unmounted.
2. Keep `pgs` cleanup path limited to local directory-backed storages; do not let remote NFS availability gate planned maintenance.
3. Do not mount a node's own AutoNAS export back onto the same node as a Proxmox NFS storage; on `ebony`, exclude `AutoNAS-1` from local use or replace that local dependency with a direct/local storage path.
4. Review colocated service dependencies before planned reboot, especially when the node provides the storage it also consumes (for example `autonas` and PBS on `ebony`).
5. Keep the generated `nfs-server.service` self-hosted ordering drop-in as the cluster fix for nodes that export AutoNAS locally and also consume the same export back through Proxmox NFS.
6. Validate the same shutdown path on the remaining nodes after storage-role cleanup and after the `nfs-server.service` ordering fix is deployed.
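
The reasoning behind items 1 and 5 is that systemd stops units in the reverse of their start order, so a unit that declares `Before=X` is stopped after `X`. The toy model below illustrates this with a topological sort. It is a simplification: real systemd runs unordered jobs in parallel, and the `network.target` to mount edges are an assumption about the default ordering of network mounts that is consistent with the `ebony` journal, where `network.target` stopped only after the mounts.

```python
from graphlib import TopologicalSorter


def stop_order(before_edges):
    """Return one valid stop sequence for (a, b) pairs meaning `a` declares Before=b."""
    sorter = TopologicalSorter()
    for first, then in before_edges:
        sorter.add(then, first)          # `then` starts only after `first`
    start_sequence = list(sorter.static_order())
    return start_sequence[::-1]          # shutdown is the reverse of startup


EDGES = [
    # Patch from this issue: enlist units ordered before network.target.
    ("tb-enlist@thunderbolt0.service", "network.target"),
    # Assumed default ordering of network mounts after network.target.
    ("network.target", "mnt-pve-AutoNAS-1.mount"),
    ("network.target", "mnt-pve-AutoNAS-2.mount"),
    # Generated drop-in: NFS server ordered before the self-hosted mount.
    ("nfs-server.service", "mnt-pve-AutoNAS-2.mount"),
]

order = stop_order(EDGES)
for position, unit in enumerate(order, start=1):
    print(position, unit)
```

In every sequence the model can produce, both mounts stop before `network.target`, `network.target` stops before the Thunderbolt enlist unit, and `mnt-pve-AutoNAS-2.mount` stops before `nfs-server.service`.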

---

## Related Issues

- ISSUE-2026-001

---

## Changelog References

CHANGELOG.md entries that reference this issue:
- `projects/thunderbolts/CHANGELOG.md`: [Unreleased] - `tb-enlist@.service` now stays active until `network.target` stops... [ISSUE-2026-002]
- `projects/pve-guests-state/CHANGELOG.md`: [1.5] - Suspend-artifact cleanup now scans only local `dir` storages... [ISSUE-2026-002]
