Bogdan Timofte authored 3 months ago

# Issue ISSUE-2026-002: Planned reboot stalls on NFS storages over thunderbridge before network shutdown

## Issue ID: ISSUE-2026-002

**Status:** investigating
**Priority:** high
**Created:** 2026-03-07
**Updated:** 2026-03-07
**Assigned to:** unassigned

---

## Summary

A planned node reboot on `baobab` spent ~106 seconds in shutdown because the Proxmox NFS storages were still mounted after the Thunderbolt ports had already been detached from `thunderbridge`.

---

## Description

During a controlled reboot validation on `baobab`, guest suspend worked correctly, but the host remained reachable over ICMP for almost two minutes after `systemctl reboot`. Journal analysis showed that the Thunderbolt bridge ports were detached early in shutdown, while the unmount of the Proxmox NFS storages only started about 12.5 seconds later. Because `AutoNAS-1` and `AutoNAS-2` are mounted over `192.168.10.x` through `thunderbridge`, the NFS unmount had already lost its transport and waited for the timeout.

The same investigation exposed a second maintenance risk in `pgs`: preflight cleanup could block in kernel I/O wait when it touched remote NFS-backed storages that were stale or temporarily unavailable. This does not cause the slow reboot itself, but it can block the maintenance preparation step.

Follow-up validation on `ebony` showed a different but related cluster behavior: `AutoNAS-1` is currently exported by `ebony` itself. During reboot, `autonas.service` stops early, which leaves the node's own Proxmox NFS client mount of `AutoNAS-1` stale; that mount then waits for the timeout during unmount. In the same window, VM `301 is-anjohibe` (PBS `anjothibe`) is intentionally suspended by `pgs`, so PBS availability loss is expected during the maintenance window.

Validation on `tapia` showed the same class of topology problem for `AutoNAS-2`, which is exported locally there and mounted back as a Proxmox NFS storage. The AutoNAS shutdown-ordering patch remained active, but reboot timing stayed near the pre-fix range because `mnt-pve-AutoNAS-2.mount` waited for the timeout during shutdown while PBS `andrafiabe-AutoNAS` had already become unreachable.

---

## Environment

- **Affected nodes:** `baobab` confirmed, likely all nodes using Proxmox NFS storages over `thunderbridge`
- **Component:** network + storage + maintenance workflow
- **Version/software:** Proxmox VE 9.1 / kernel `6.17.13-1-pve`, `tb-enlist@.service`, `pgs`

---

## Steps to Reproduce

1. On a node with Proxmox NFS storages routed over `thunderbridge`, run `/usr/local/sbin/pgs suspend -v`.
2. Trigger `systemctl reboot`.
3. Measure ICMP availability during shutdown and boot.
4. Inspect `journalctl -b -1` around the reboot window.
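
The measurement in step 3 can be reduced to three numbers. The sketch below is one possible way to derive them from a list of ICMP probe results; it is not the tool used for this issue. The function name `summarize_probe` and the sample format are hypothetical, and it assumes `DOWNTIME_SECONDS` is the first-reply time minus the stop time, which matches the reported values to within rounding.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical sample format: (seconds since reboot was requested, host replied?)
Sample = Tuple[float, bool]


def summarize_probe(samples: Iterable[Sample]) -> dict:
    """Derive the three timing metrics from time-ordered ICMP probe results."""
    time_to_stop: Optional[float] = None
    time_to_first_reply: Optional[float] = None
    for elapsed, replied in samples:
        if time_to_stop is None:
            if not replied:
                time_to_stop = elapsed  # first lost probe
        elif replied:
            time_to_first_reply = elapsed  # first reply after the outage
            break
    if time_to_stop is None or time_to_first_reply is None:
        raise ValueError("probe log does not contain a full stop/reply cycle")
    return {
        "TIME_TO_STOP_SECONDS": round(time_to_stop, 3),
        "TIME_TO_FIRST_REPLY_SECONDS": round(time_to_first_reply, 3),
        # Assumption: downtime is measured between the two events above.
        "DOWNTIME_SECONDS": round(time_to_first_reply - time_to_stop, 3),
    }
```

A long `TIME_TO_STOP_SECONDS` with a short `DOWNTIME_SECONDS` points at the shutdown path rather than at boot, which is the pattern this issue describes.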

---

## Expected Behavior

- NFS storages should unmount while Thunderbolt transport is still available.
- Host should stop replying to ICMP shortly after reboot is requested.
- `pgs suspend` should not hang because a remote NFS mount is stale.

---

## Actual Behavior

- First validation on `baobab`:
  - `TIME_TO_STOP_SECONDS 105.852`
  - `TIME_TO_FIRST_REPLY_SECONDS 130.230`
  - `DOWNTIME_SECONDS 24.377`
- Follow-up validation on `ebony`:
  - `TIME_TO_STOP_SECONDS 120.275`
  - `TIME_TO_FIRST_REPLY_SECONDS 145.840`
  - `DOWNTIME_SECONDS 25.565`
- Follow-up validation on `tapia` after cluster-wide AutoNAS rollout:
  - `TIME_TO_STOP_SECONDS 123.285`
  - `TIME_TO_FIRST_REPLY_SECONDS 149.420`
  - `DOWNTIME_SECONDS 26.135`
- `journalctl -b -1` on `baobab` showed:
  - Thunderbolt bridge ports detached at `08:48:17.989`
  - NFS unmount only started at `08:48:30.540`
  - `mnt-pve-AutoNAS-2.mount` and `mnt-pve-AutoNAS-1.mount` timed out at `08:50:00.604` and `08:50:00.605`
- `journalctl -b -1` on `ebony` showed:
  - `autonas.service` stopped at `11:04:22.326`
  - `mnt-pve-AutoNAS-2.mount` unmounted successfully by `11:04:38.693`
  - `mnt-pve-AutoNAS-1.mount` timed out at `11:06:08.679`
  - only after that did `network.target` stop and `tb-enlist@thunderbolt0.service` detach from `thunderbridge`
- A later maintenance attempt also showed `pgs suspend` blocked in `nfs4_proc_getattr` while scanning storage paths.

---

## Logs/Evidence

```text
Mar 07 08:48:17.989246 baobab NetworkManager[1096]: device (thunderbridge): bridge port thunderbolt0 was detached
Mar 07 08:48:17.993120 baobab NetworkManager[1096]: device (thunderbridge): bridge port thunderbolt1 was detached
Mar 07 08:48:30.540186 baobab systemd[1]: Unmounting mnt-pve-AutoNAS-1.mount - /mnt/pve/AutoNAS-1...
Mar 07 08:48:30.541335 baobab systemd[1]: Unmounting mnt-pve-AutoNAS-2.mount - /mnt/pve/AutoNAS-2...
Mar 07 08:50:00.604036 baobab systemd[1]: mnt-pve-AutoNAS-2.mount: Unmounting timed out. Terminating.
Mar 07 08:50:00.605215 baobab systemd[1]: mnt-pve-AutoNAS-1.mount: Unmounting timed out. Terminating.
```
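
The gaps between the events in the excerpt above can be computed directly from the journal timestamps. This is a small illustrative sketch, not part of the original tooling; the helper names are hypothetical, and the year is taken from the issue's creation date because the journal lines do not carry one.

```python
from datetime import datetime


def parse(stamp: str) -> datetime:
    # The journal lines omit the year; 2026 is assumed from the issue's Created date.
    return datetime.strptime(f"2026 {stamp}", "%Y %b %d %H:%M:%S.%f")


def seconds_between(start: str, end: str) -> float:
    return (parse(end) - parse(start)).total_seconds()


# Timestamps copied from the baobab journal excerpt.
detach = "Mar 07 08:48:17.989246"          # thunderbolt0 detached from thunderbridge
unmount_start = "Mar 07 08:48:30.540186"   # systemd starts unmounting AutoNAS-1
timed_out = "Mar 07 08:50:00.605215"       # AutoNAS-1 unmount times out

transport_gap = seconds_between(detach, unmount_start)
unmount_wait = seconds_between(unmount_start, timed_out)
print(f"transport gone before unmount: {transport_gap:.3f} s")
print(f"unmount wait until timeout:    {unmount_wait:.3f} s")
```

The transport was gone roughly 12.55 seconds before the unmount began, and the unmount then waited about 90 seconds, which accounts for most of the ~106-second shutdown.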

Blocked `pgs` stack during stale-NFS preflight:

```text
[<0>] rpc_wait_bit_killable+0x11/0x80 [sunrpc]
[<0>] nfs4_do_call_sync+0x6a/0xc0 [nfsv4]
[<0>] __nfs_revalidate_inode+0xd4/0x320 [nfs]
[<0>] __do_sys_newfstatat+0x43/0x90
```

Validated timing after fixes on `baobab`:

```text
TIME_TO_STOP_SECONDS 14.599
TIME_TO_FIRST_REPLY_SECONDS 35.651
DOWNTIME_SECONDS 21.053
```

---

## Investigation Notes

- 2026-03-07: Confirmed `AutoNAS-1` and `AutoNAS-2` on `baobab` are Proxmox NFS storages mounted from `192.168.10.21` and `192.168.10.22` over `thunderbridge`.
- 2026-03-07: First reboot validation on `baobab` showed shutdown delay dominated by NFS unmount timeout, not by boot.
- 2026-03-07: `tb-enlist@.service` had no ordering against `network.target`; systemd stopped Thunderbolt bridge membership before the remote storages were unmounted.
- 2026-03-07: Patched shared `tb-enlist@.service` with `Before=network.target` and deployed to `baobab`, then cluster-wide. Because systemd stops units in the reverse of their start order, the enlist unit now stops only after `network.target` has stopped.
- 2026-03-07: Separate maintenance attempt showed `pgs suspend` can block in `nfs4_proc_getattr` while scanning storage paths on stale remote NFS mounts.
- 2026-03-07: Patched `pgs` cleanup to scan only local `dir` storages; remote storages such as NFS are skipped intentionally.
- 2026-03-07: Revalidated on `baobab` after both fixes:
  - NFS unmount started at `10:48:12.354` / `10:48:12.356`
  - both NFS mounts unmounted successfully by `10:48:12.460`
  - `network.target` stopped later at `10:48:16.152`
  - time from reboot command to ICMP loss dropped from ~106 s to ~15 s
- 2026-03-07: `pgs resume` completed successfully after reboot on `baobab`; state file survived boot and all 4 VMs + 1 CT were restored.
- 2026-03-07: Validated `ebony` with current `pgs` and cluster-wide `thunderbolts` rollout. `pgs suspend` / `resume` succeeded for VMs `101`, `102`, `301`; state file survived reboot and restore completed.
- 2026-03-07: `ebony` still showed long shutdown because `AutoNAS-1` is currently provided by `ebony` itself through `autonas`. Stopping `autonas.service` made the node's own NFS client mount stale and `mnt-pve-AutoNAS-1.mount` waited for timeout.
- 2026-03-07: On `ebony`, PBS `anjothibe` availability loss during maintenance is expected because VM `301 is-anjohibe` is intentionally suspended by `pgs`, and its datastore dependency is also on `AutoNAS-1`.
- 2026-03-07: Implemented AutoNAS shutdown-ordering experiment on `ebony`: `autonas.service` and `autonas-boot-scan.service` now declare `Before=remote-fs.target` and `Before=umount.target`.
- 2026-03-07: Revalidated `ebony` after AutoNAS patch:
  - previous timing: `TIME_TO_STOP_SECONDS 120.275`, `TIME_TO_FIRST_REPLY_SECONDS 145.840`
  - new timing: `TIME_TO_STOP_SECONDS 27.573`, `TIME_TO_FIRST_REPLY_SECONDS 53.288`
  - `mnt-pve-AutoNAS-2.mount` still unmounted cleanly
  - `AutoNAS-1` no longer waited for the old 90s timeout, though a brief `Stale file handle` was still observed before the provider side stopped
- 2026-03-07: Residual issue on `ebony`: even with later provider shutdown, `pvestatd` briefly logged `storage 'AutoNAS-1' is not online` / `Stale file handle` during the maintenance window, so the self-hosted NFS topology remains fragile but no longer dominates shutdown time.
- 2026-03-07: Deployed the same AutoNAS ordering patch cluster-wide and revalidated `tapia`.
- 2026-03-07: `pgs suspend` / reboot / `pgs resume` succeeded on `tapia` for VMs `104`, `107`, `113`, `302`; state file survived reboot and all four guests were restored.
- 2026-03-07: `tapia` still showed slow shutdown after the AutoNAS patch:
  - `TIME_TO_STOP_SECONDS 123.285`, `TIME_TO_FIRST_REPLY_SECONDS 149.420`
  - `mnt-pve-AutoNAS-1.mount` unmounted immediately at `11:45:01.827`
  - `autonas.service` and `nfs-server.service` stopped around `11:45:01.689/11:45:01.900`
  - `mnt-pve-AutoNAS-2.mount` then waited until timeout at `11:46:31.778`
  - `network.target` stopped only after that, at `11:46:31.781`
- 2026-03-07: On `tapia`, the remaining delay is concentrated on self-hosted `AutoNAS-2` (`server 192.168.10.22`) plus expected maintenance-window loss of PBS `andrafiabe-AutoNAS` (`192.168.10.96`).
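
The `pgs` cleanup change noted above (scan only local `dir` storages) can be illustrated with a short filter over a `storage.cfg`-style definition. This is a minimal sketch and not the actual `pgs` implementation, which is not shown in this issue. The function names are hypothetical, the `local` entry is an illustrative placeholder, and pairing `AutoNAS-1` with `192.168.10.21` is an assumption based on the order in which the notes list the two servers.

```python
# Illustrative storage definition in the Proxmox storage.cfg layout:
# a "type: id" header line followed by indented "key value" option lines.
SAMPLE = """\
dir: local
\tpath /var/lib/vz

nfs: AutoNAS-1
\tpath /mnt/pve/AutoNAS-1
\tserver 192.168.10.21
"""


def parse_storage_cfg(text: str) -> list:
    """Parse storage.cfg-style text into a list of dicts (type, id, options)."""
    storages = []
    current = None
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if not raw[0].isspace():
            # Section header such as "nfs: AutoNAS-1"
            stype, _, sid = raw.partition(":")
            current = {"type": stype.strip(), "id": sid.strip()}
            storages.append(current)
        elif current is not None:
            # Indented option line such as "path /mnt/pve/AutoNAS-1"
            key, _, value = raw.strip().partition(" ")
            current[key] = value.strip()
    return storages


def local_dir_paths(storages: list) -> list:
    """Return paths of `dir` storages only; NFS and other types are skipped."""
    return [s["path"] for s in storages if s["type"] == "dir" and "path" in s]


print(local_dir_paths(parse_storage_cfg(SAMPLE)))
```

Filtering on the storage type means the cleanup never calls `stat` on a path under `/mnt/pve/AutoNAS-*`, so a stale NFS mount cannot block it in `nfs4_proc_getattr`.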

---

## Proposed Solution

1. Keep Thunderbolt enlist units ordered before `network.target` so that, on shutdown, they stop after it and storage traffic over `thunderbridge` remains alive until remote filesystems are unmounted.
2. Keep the `pgs` cleanup path limited to local directory-backed storages; do not let remote NFS availability gate planned maintenance.
3. Do not mount a node's own AutoNAS export back onto the same node as a Proxmox NFS storage; on `ebony`, exclude `AutoNAS-1` from local use or replace that local dependency with a direct/local storage path.
4. Review colocated service dependencies before a planned reboot, especially when the node provides the storage it also consumes (for example `autonas` and PBS on `ebony`).
5. Apply the same self-hosted-storage review on `tapia`, where `AutoNAS-2` remains the dominant shutdown delay even after the AutoNAS ordering patch.
6. Validate the same shutdown path on the remaining nodes after storage-role cleanup.
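
The self-hosted-storage review in items 3 to 5 amounts to checking whether an NFS storage's server address belongs to the node itself. The following sketch shows that check under stated assumptions: the function name is hypothetical, the caller supplies the node's own addresses, and the `AutoNAS-1` address is inferred from the order of the servers in the investigation notes (only `AutoNAS-2` at `192.168.10.22` is stated explicitly).

```python
import ipaddress


def self_hosted_storages(nfs_servers: dict, local_addresses: list) -> list:
    """Return the IDs of NFS storages whose server is one of this node's addresses."""
    local = {ipaddress.ip_address(addr) for addr in local_addresses}
    return sorted(
        storage_id
        for storage_id, server in nfs_servers.items()
        if ipaddress.ip_address(server) in local
    )


# Storage ID -> NFS server address; the AutoNAS-1 pairing is an assumption.
nfs_servers = {"AutoNAS-1": "192.168.10.21", "AutoNAS-2": "192.168.10.22"}

# Example for a node that owns 192.168.10.22, as reported for tapia.
print(self_hosted_storages(nfs_servers, ["127.0.0.1", "192.168.10.22"]))
```

Any storage this returns is a candidate for exclusion on that node or for replacement with a direct local path, as proposed in item 3.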

---

## Related Issues

- ISSUE-2026-001

---

## Changelog References

List CHANGELOG.md entries that reference this issue:
- `projects/thunderbolts/CHANGELOG.md`: [Unreleased] - `tb-enlist@.service` now stays active until `network.target` stops... [ISSUE-2026-002]
- `projects/pve-guests-state/CHANGELOG.md`: [1.5] - Suspend-artifact cleanup now scans only local `dir` storages... [ISSUE-2026-002]