
Issue ISSUE-2026-002: Planned reboot stalls on NFS storages over thunderbridge before network shutdown

Issue ID: ISSUE-2026-002

Status: investigating
Priority: high
Created: 2026-03-07
Updated: 2026-03-07
Assigned to: unassigned


Summary

A planned reboot of baobab spent ~106 seconds in shutdown because the Proxmox NFS storages were still mounted after the Thunderbolt ports had already been detached from thunderbridge.


Description

During a controlled reboot validation on baobab, guest suspend worked correctly, but the host remained reachable over ICMP for almost two minutes after systemctl reboot. Journal analysis showed that the Thunderbolt bridge ports were detached early in shutdown, while Proxmox only attempted to unmount NFS storages later. Because AutoNAS-1 and AutoNAS-2 are mounted over 192.168.10.x through thunderbridge, the NFS unmount path lost transport and waited for timeout.

The same investigation exposed a second maintenance risk in pgs: preflight cleanup could block in kernel I/O wait when it touched remote NFS-backed storages that were stale or temporarily unavailable. That does not create the slow reboot itself, but it can block the maintenance preparation step.

Follow-up validation on ebony showed a different but related cluster behavior: AutoNAS-1 is currently exported by ebony itself. During reboot, autonas.service stops early, which makes the node's own Proxmox NFS client mount for AutoNAS-1 stale; the unmount then waits for its timeout. In the same window, VM 301 is-anjohibe (PBS anjothibe) is intentionally suspended by pgs, so PBS availability loss is expected during the maintenance window.

Validation on tapia showed the same class of topology problem for AutoNAS-2, which is locally exported there and mounted back as a Proxmox NFS storage. The AutoNAS shutdown-ordering patch remained active, but reboot timing still stayed near the pre-fix range because mnt-pve-AutoNAS-2.mount waited for timeout during shutdown while PBS andrafiabe-AutoNAS had already become unreachable.


Environment

  • Affected nodes: baobab (thunderbridge shutdown ordering), ebony and tapia (self-hosted AutoNAS exports) confirmed; likely all nodes using Proxmox NFS storages over thunderbridge
  • Component: network + storage + maintenance workflow
  • Version/software: Proxmox VE 9.1 / kernel 6.17.13-1-pve, tb-enlist@.service, pgs

Steps to Reproduce

  1. On a node with Proxmox NFS storages routed over thunderbridge, run /usr/local/sbin/pgs suspend -v.
  2. Trigger systemctl reboot.
  3. Measure ICMP availability during shutdown and boot.
  4. Inspect journalctl -b -1 around the reboot window.
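
The timing figures in this issue (TIME_TO_STOP_SECONDS, TIME_TO_FIRST_REPLY_SECONDS, DOWNTIME_SECONDS) can be derived from timestamped ICMP probe results. The following is a minimal sketch, not the script used during this investigation. The function name reboot_timings and the sample format are assumptions, as is the reading that TIME_TO_STOP is the first unanswered probe and that DOWNTIME is the difference between the other two values (which matches the reported numbers to within rounding).

```python
from typing import Iterable, Optional, Tuple


def reboot_timings(samples: Iterable[Tuple[float, bool]]) -> dict:
    """Derive reboot timings from (seconds_since_reboot_command, replied) probes.

    Assumes samples are sorted by time and start while the host still replies.
    """
    time_to_stop: Optional[float] = None
    time_to_first_reply: Optional[float] = None
    for t, replied in samples:
        if time_to_stop is None:
            if not replied:
                time_to_stop = t  # first probe that went unanswered
        elif replied:
            time_to_first_reply = t  # first answered probe after the outage
            break
    downtime = None
    if time_to_stop is not None and time_to_first_reply is not None:
        downtime = time_to_first_reply - time_to_stop
    return {
        "TIME_TO_STOP_SECONDS": time_to_stop,
        "TIME_TO_FIRST_REPLY_SECONDS": time_to_first_reply,
        "DOWNTIME_SECONDS": downtime,
    }


# Hypothetical probe results: host replies, goes silent at t=2.0, returns at t=4.0.
example = reboot_timings([(0.0, True), (1.0, True), (2.0, False), (3.0, False), (4.0, True)])
```

In practice the samples would come from a probe loop started immediately before systemctl reboot; the probe interval bounds the precision of all three values.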

Expected Behavior

  • NFS storages should unmount while Thunderbolt transport is still available.
  • Host should stop replying to ICMP shortly after reboot is requested.
  • pgs suspend should not hang because a remote NFS mount is stale.

Actual Behavior

  • First validation on baobab:
    • TIME_TO_STOP_SECONDS 105.852
    • TIME_TO_FIRST_REPLY_SECONDS 130.230
    • DOWNTIME_SECONDS 24.377
  • Follow-up validation on ebony:
    • TIME_TO_STOP_SECONDS 120.275
    • TIME_TO_FIRST_REPLY_SECONDS 145.840
    • DOWNTIME_SECONDS 25.565
  • Follow-up validation on tapia after cluster-wide AutoNAS rollout:
    • TIME_TO_STOP_SECONDS 123.285
    • TIME_TO_FIRST_REPLY_SECONDS 149.420
    • DOWNTIME_SECONDS 26.135
  • journalctl -b -1 showed:
    • Thunderbolt bridge ports detached at 08:48:17.989
    • NFS unmount only started at 08:48:30.540
    • mnt-pve-AutoNAS-2.mount and mnt-pve-AutoNAS-1.mount timed out at 08:50:00.604 and 08:50:00.605, about 90 seconds after the unmount started
  • journalctl -b -1 on ebony showed:
    • autonas.service stopped at 11:04:22.326
    • mnt-pve-AutoNAS-2.mount unmounted successfully by 11:04:38.693
    • mnt-pve-AutoNAS-1.mount timed out at 11:06:08.679
    • only after that did network.target stop and tb-enlist@thunderbolt0.service detach from thunderbridge
  • A later maintenance attempt also showed pgs suspend blocked in nfs4_proc_getattr while scanning storage paths.

Logs/Evidence

Mar 07 08:48:17.989246 baobab NetworkManager[1096]: device (thunderbridge): bridge port thunderbolt0 was detached
Mar 07 08:48:17.993120 baobab NetworkManager[1096]: device (thunderbridge): bridge port thunderbolt1 was detached
Mar 07 08:48:30.540186 baobab systemd[1]: Unmounting mnt-pve-AutoNAS-1.mount - /mnt/pve/AutoNAS-1...
Mar 07 08:48:30.541335 baobab systemd[1]: Unmounting mnt-pve-AutoNAS-2.mount - /mnt/pve/AutoNAS-2...
Mar 07 08:50:00.604036 baobab systemd[1]: mnt-pve-AutoNAS-2.mount: Unmounting timed out. Terminating.
Mar 07 08:50:00.605215 baobab systemd[1]: mnt-pve-AutoNAS-1.mount: Unmounting timed out. Terminating.
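
The two intervals that matter in the baobab log above are the gap between bridge-port detach and unmount start, and the wait between unmount start and timeout. A small sketch for computing them from the timestamp prefix of the log lines follows. The helper name journal_time is an assumption, and the year is taken from the issue's Created date because the log lines carry none.

```python
from datetime import datetime


def journal_time(line: str) -> datetime:
    """Parse the 'Mon DD HH:MM:SS.ffffff' prefix of a journal line."""
    stamp = " ".join(line.split()[:3])
    # Assumption: the log lines carry no year, so 2026 is supplied explicitly.
    return datetime.strptime(f"2026 {stamp}", "%Y %b %d %H:%M:%S.%f")


detach = journal_time(
    "Mar 07 08:48:17.989246 baobab NetworkManager[1096]: "
    "device (thunderbridge): bridge port thunderbolt0 was detached"
)
unmount_start = journal_time(
    "Mar 07 08:48:30.540186 baobab systemd[1]: "
    "Unmounting mnt-pve-AutoNAS-1.mount - /mnt/pve/AutoNAS-1..."
)
timed_out = journal_time(
    "Mar 07 08:50:00.604036 baobab systemd[1]: "
    "mnt-pve-AutoNAS-2.mount: Unmounting timed out. Terminating."
)

gap = (unmount_start - detach).total_seconds()
wait = (timed_out - unmount_start).total_seconds()

print(f"detach -> unmount start: {gap:.3f}s")    # 12.551s
print(f"unmount start -> timeout: {wait:.3f}s")  # 90.064s
```

The roughly 12.5-second gap shows that the transport was already gone when the unmount began, and the roughly 90-second wait accounts for most of the ~106-second shutdown.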

Blocked pgs stack during stale-NFS preflight:

[<0>] rpc_wait_bit_killable+0x11/0x80 [sunrpc]
[<0>] nfs4_do_call_sync+0x6a/0xc0 [nfsv4]
[<0>] __nfs_revalidate_inode+0xd4/0x320 [nfs]
[<0>] __do_sys_newfstatat+0x43/0x90

Validated timing after fixes on baobab:

TIME_TO_STOP_SECONDS 14.599
TIME_TO_FIRST_REPLY_SECONDS 35.651
DOWNTIME_SECONDS 21.053

Investigation Notes

  • 2026-03-07: Confirmed AutoNAS-1 and AutoNAS-2 on baobab are Proxmox NFS storages mounted from 192.168.10.21 and 192.168.10.22 over thunderbridge.
  • 2026-03-07: First reboot validation on baobab showed shutdown delay dominated by NFS unmount timeout, not by boot.
  • 2026-03-07: tb-enlist@.service had no ordering against network.target; systemd stopped Thunderbolt bridge membership before Proxmox unmounted remote storages.
  • 2026-03-07: Patched shared tb-enlist@.service with Before=network.target and deployed to baobab, then cluster-wide.
  • 2026-03-07: Separate maintenance attempt showed pgs suspend can block in nfs4_proc_getattr while scanning storage paths on stale remote NFS mounts.
  • 2026-03-07: Patched pgs cleanup to scan only local dir storages; remote storages such as NFS are skipped intentionally.
  • 2026-03-07: Revalidated on baobab after both fixes:
    • NFS unmount started at 10:48:12.354/10:48:12.356
    • both NFS mounts unmounted successfully by 10:48:12.460
    • network.target stopped later at 10:48:16.152
    • time from the reboot command to ICMP loss dropped from ~106s to ~15s
  • 2026-03-07: pgs resume completed successfully after reboot on baobab; state file survived boot and all 4 VMs + 1 CT were restored.
  • 2026-03-07: Validated ebony with current pgs and cluster-wide thunderbolts rollout. pgs suspend / resume succeeded for VMs 101, 102, 301; state file survived reboot and restore completed.
  • 2026-03-07: ebony still showed long shutdown because AutoNAS-1 is currently provided by ebony itself through autonas. Stopping autonas.service made the node's own NFS client mount stale and mnt-pve-AutoNAS-1.mount waited for timeout.
  • 2026-03-07: On ebony, PBS anjothibe availability loss during maintenance is expected because VM 301 is-anjohibe is intentionally suspended by pgs, and its datastore dependency is also on AutoNAS-1.
  • 2026-03-07: Implemented AutoNAS shutdown-ordering experiment on ebony: autonas.service and autonas-boot-scan.service now declare Before=remote-fs.target and Before=umount.target.
  • 2026-03-07: Revalidated ebony after AutoNAS patch:
    • previous timing: TIME_TO_STOP_SECONDS 120.275, TIME_TO_FIRST_REPLY_SECONDS 145.840
    • new timing: TIME_TO_STOP_SECONDS 27.573, TIME_TO_FIRST_REPLY_SECONDS 53.288
    • mnt-pve-AutoNAS-2.mount still unmounted cleanly
    • AutoNAS-1 no longer waited for the old 90s timeout, though a brief Stale file handle was still observed before the provider side stopped
  • 2026-03-07: Residual issue on ebony: even with later provider shutdown, pvestatd briefly logged storage 'AutoNAS-1' is not online / Stale file handle during the maintenance window, so the self-hosted NFS topology remains fragile but no longer dominates shutdown time.
  • 2026-03-07: Deployed the same AutoNAS ordering patch cluster-wide and revalidated tapia.
  • 2026-03-07: pgs suspend / reboot / pgs resume succeeded on tapia for VMs 104, 107, 113, 302; state file survived reboot and all four guests were restored.
  • 2026-03-07: tapia still showed slow shutdown after the AutoNAS patch:
    • TIME_TO_STOP_SECONDS 123.285, TIME_TO_FIRST_REPLY_SECONDS 149.420
    • mnt-pve-AutoNAS-1.mount unmounted immediately at 11:45:01.827
    • autonas.service and nfs-server.service stopped around 11:45:01.689/11:45:01.900
    • mnt-pve-AutoNAS-2.mount then waited until timeout at 11:46:31.778
    • network.target stopped only after that, at 11:46:31.781
  • 2026-03-07: On tapia, the remaining delay is concentrated on self-hosted AutoNAS-2 (server 192.168.10.22) plus expected maintenance-window loss of PBS andrafiabe-AutoNAS (192.168.10.96).
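
The tb-enlist@.service ordering note above relies on systemd stopping ordered units in the inverse of their start order: a unit declared Before=network.target starts before it and therefore stops after it. The sketch below models only that rule with a topological sort; it is a simplified illustration, not systemd's job engine. The function name shutdown_order is an assumption, and so is the edge that places the NFS mount after network.target, although that edge is consistent with the revalidated baobab log, where both mounts were gone before network.target stopped.

```python
from graphlib import TopologicalSorter


def shutdown_order(before_edges):
    """Return a stop order for units given (earlier, later) start-ordering pairs."""
    ts = TopologicalSorter()
    for earlier, later in before_edges:
        ts.add(later, earlier)  # 'later' may only start once 'earlier' has started
    start_order = list(ts.static_order())
    return list(reversed(start_order))  # shutdown applies the inverse order


edges = [
    # From the patch: tb-enlist@.service declares Before=network.target.
    ("tb-enlist@thunderbolt0.service", "network.target"),
    # Assumption: the network mount is ordered after network.target.
    ("network.target", "mnt-pve-AutoNAS-1.mount"),
]

order = shutdown_order(edges)
print(order)
```

With both edges present the mount is stopped first and the Thunderbolt enlist unit last. Without the first edge, tb-enlist@thunderbolt0.service has no ordering relative to the mount at all, which matches the original failure where the bridge ports were detached before the unmount began.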

Proposed Solution

  1. Keep Thunderbolt enlist units ordered before network.target so storage traffic over thunderbridge remains alive until remote filesystems are unmounted.
  2. Keep pgs cleanup path limited to local directory-backed storages; do not let remote NFS availability gate planned maintenance.
  3. Do not mount a node's own AutoNAS export back onto the same node as a Proxmox NFS storage; on ebony, exclude AutoNAS-1 from local use or replace that local dependency with a direct/local storage path.
  4. Review colocated service dependencies before planned reboot, especially when the node provides the storage it also consumes (for example autonas and PBS on ebony).
  5. Apply the same self-hosted-storage review on tapia, where AutoNAS-2 remains the dominant shutdown delay even after the AutoNAS ordering patch.
  6. Validate the same shutdown path on the remaining nodes after storage-role cleanup.
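
Item 2 above amounts to selecting storages by type before touching any path. A minimal sketch of that selection follows, assuming the storage list is read from text in the storage.cfg layout (unindented "type: id" headers followed by indented properties). This is not the pgs implementation, which is not shown in this issue and may obtain the storage list differently. The function name local_dir_storages is hypothetical and the sample entries are abbreviated.

```python
def local_dir_storages(storage_cfg: str) -> list[str]:
    """Return the IDs of 'dir' storages from storage.cfg-style text."""
    ids = []
    for line in storage_cfg.splitlines():
        # Skip blank lines, indented property lines, and comments.
        if not line or line[0].isspace() or line.startswith("#"):
            continue
        storage_type, sep, storage_id = line.partition(":")
        if sep and storage_type.strip() == "dir":
            ids.append(storage_id.strip())
    return ids


# Abbreviated, illustrative sample; only the type and ID lines matter here.
SAMPLE = """\
dir: local
        path /var/lib/vz

nfs: AutoNAS-1
        path /mnt/pve/AutoNAS-1
        server 192.168.10.21

nfs: AutoNAS-2
        path /mnt/pve/AutoNAS-2
        server 192.168.10.22
"""

safe_to_scan = local_dir_storages(SAMPLE)
print(safe_to_scan)
```

Filtering on the declared type avoids calling stat on /mnt/pve/AutoNAS-1 or /mnt/pve/AutoNAS-2 at all, so a stale NFS mount cannot block the cleanup step in kernel I/O wait.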

Related Issues

  • ISSUE-2026-001

Changelog References

List CHANGELOG.md entries that reference this issue:

  • projects/thunderbolts/CHANGELOG.md: [Unreleased] - tb-enlist@.service now stays active until network.target stops... [ISSUE-2026-002]
  • projects/pve-guests-state/CHANGELOG.md: [1.5] - Suspend-artifact cleanup now scans only local dir storages... [ISSUE-2026-002]