Bogdan Timofte authored 3 months ago

# Issue ISSUE-2026-002: Planned reboot stalls on shared NFS storages during maintenance shutdown

## Issue ID: ISSUE-2026-002

**Status:** resolved
**Priority:** high
**Created:** 2026-03-07
**Updated:** 2026-03-07
**Assigned to:** unassigned

---
## Summary

A planned node reboot could spend 90 to 120 seconds in shutdown because shared Proxmox NFS storages were not consistently unmounted before their transport or provider was torn down.

---
## Description

This incident had two independent cluster-level contributors that happened to surface in the same maintenance workflow.

The first was transport-related on `baobab`: `AutoNAS-1` and `AutoNAS-2` are mounted over `192.168.10.x` through `thunderbridge`, but Thunderbolt bridge membership was being torn down before Proxmox attempted to unmount those remote NFS storages.

The second was provider-related on `ebony` and `tapia`: local AutoNAS exports were mounted back on the same node as Proxmox NFS storages. In that self-hosted topology, shutdown became sensitive to the ordering between `umount.nfs4` and `nfs-server.service`.

The same investigation also exposed a separate maintenance-preflight issue in `pgs`: cleanup could block in kernel I/O wait when it touched stale remote NFS-backed storages.

The final fix therefore spans cluster maintenance, `thunderbolts`, `autoNAS`, and `pve-guests-state`, and should be tracked as a cluster issue rather than a project-local one.

---
## Environment

- **Affected nodes:** `baobab`, `ebony`, `tapia`
- **Component:** cluster storage + maintenance workflow
- **Version/software:** Proxmox VE 9.1 / kernel `6.17.13-1-pve`, `tb-enlist@.service`, `autoNAS`, `pgs`

---
## Steps to Reproduce

1. On a node with shared Proxmox NFS storages, run `/usr/local/sbin/pgs suspend -v`.
2. Trigger `systemctl reboot`.
3. Measure ICMP availability during shutdown and boot.
4. Inspect `journalctl -b -1` around the reboot window.
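
Step 3 can be approximated with a small probe loop run from another machine right after the reboot is triggered. This is a minimal sketch, not the tool used for the measurements in this issue: `ping_once` and `measure_reboot` are hypothetical names, and the probe assumes the Linux iputils `ping` with its `-c` and `-W` options. The result keys mirror the metric names reported under Actual Behavior; resolution is limited by the probe interval and the one-second ping timeout.

```python
import subprocess
import time


def ping_once(host: str) -> bool:
    """Return True if a single ICMP echo is answered (assumes iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def measure_reboot(host, probe=ping_once, clock=time.monotonic,
                   sleep=time.sleep, interval=0.2):
    """Measure seconds until ICMP stops and until it answers again."""
    start = clock()
    while probe(host):        # phase 1: host still answers during shutdown
        sleep(interval)
    stopped = clock() - start
    while not probe(host):    # phase 2: host is silent until boot completes
        sleep(interval)
    first_reply = clock() - start
    return {
        "TIME_TO_STOP_SECONDS": round(stopped, 3),
        "TIME_TO_FIRST_REPLY_SECONDS": round(first_reply, 3),
        "DOWNTIME_SECONDS": round(first_reply - stopped, 3),
    }
```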

---

## Expected Behavior

- NFS storages should unmount before either their transport or provider disappears.
- The host should stop replying to ICMP shortly after reboot is requested.
- `pgs suspend` should not block on a stale remote NFS mount.
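
The first expectation relies on systemd applying the inverse of the start-up order when units are stopped. The sketch below models that rule with a topological sort. The ordering pairs are illustrative assumptions: the `tb-enlist@thunderbolt0.service` instance name is hypothetical, and the edge from `network.target` to the mount unit stands in for the default `After=network.target` dependency that systemd gives network mounts.

```python
from graphlib import TopologicalSorter

# Each pair (a, b) means "a is ordered before b" at start-up.
# Unit instance names are illustrative, not copied from the cluster.
ORDERING = [
    ("tb-enlist@thunderbolt0.service", "network.target"),
    ("network.target", "mnt-pve-AutoNAS-1.mount"),
]


def start_order(pairs):
    """Return one valid start-up order for the given ordering pairs."""
    sorter = TopologicalSorter()
    for before, after in pairs:
        sorter.add(after, before)  # 'after' must wait for 'before'
    return list(sorter.static_order())


def stop_order(pairs):
    """Shutdown uses the reverse of the start-up order."""
    return list(reversed(start_order(pairs)))


# The mount is stopped first and the Thunderbolt enlist unit last.
print(stop_order(ORDERING))
```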

---

## Actual Behavior

- First validation on `baobab`:
  - `TIME_TO_STOP_SECONDS 105.852`
  - `TIME_TO_FIRST_REPLY_SECONDS 130.230`
  - `DOWNTIME_SECONDS 24.377`
- Follow-up validation on `ebony` before self-hosted fix:
  - `TIME_TO_STOP_SECONDS 120.275`
  - `TIME_TO_FIRST_REPLY_SECONDS 145.840`
  - `DOWNTIME_SECONDS 25.565`
- Follow-up validation on `tapia` before provider-ordering fix:
  - `TIME_TO_STOP_SECONDS 123.285`
  - `TIME_TO_FIRST_REPLY_SECONDS 149.420`
  - `DOWNTIME_SECONDS 26.135`
- Revalidation after fixes:
  - `baobab`: `TIME_TO_STOP_SECONDS 14.599`, `TIME_TO_FIRST_REPLY_SECONDS 35.651`
  - `ebony`: `TIME_TO_STOP_SECONDS 27.573`, `TIME_TO_FIRST_REPLY_SECONDS 53.288`
  - `tapia`: `TIME_TO_STOP_SECONDS 28.305`, `TIME_TO_FIRST_REPLY_SECONDS 53.588`
  - repeated `tapia` validation: `TIME_TO_STOP_SECONDS 28.990`, `TIME_TO_FIRST_REPLY_SECONDS 53.384`
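
The three metrics are related by `DOWNTIME_SECONDS = TIME_TO_FIRST_REPLY_SECONDS - TIME_TO_STOP_SECONDS`, which also yields the downtime for the revalidation runs where it is not listed. The sketch below recomputes it from the figures above; the run labels are only for illustration. Recomputed values can differ from a reported one by a millisecond (for example the first `baobab` run), presumably because the reported figures were derived from unrounded timestamps. Under that reading, the fixes shorten the shutdown phase while the ICMP-silent window stays at roughly 21 to 26 seconds.

```python
# (TIME_TO_STOP_SECONDS, TIME_TO_FIRST_REPLY_SECONDS) copied from the list above.
RUNS = {
    "baobab before": (105.852, 130.230),
    "ebony before": (120.275, 145.840),
    "tapia before": (123.285, 149.420),
    "baobab after": (14.599, 35.651),
    "ebony after": (27.573, 53.288),
    "tapia after": (28.305, 53.588),
    "tapia after (repeat)": (28.990, 53.384),
}


def downtime(stop: float, first_reply: float) -> float:
    """Seconds without ICMP replies, rounded to milliseconds."""
    return round(first_reply - stop, 3)


for name, (stop, first_reply) in RUNS.items():
    print(f"{name:22s} stop={stop:8.3f}s  downtime={downtime(stop, first_reply):7.3f}s")
```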

---

## Logs/Evidence

Transport ordering failure on `baobab`:

```text
Mar 07 08:48:17.989246 baobab NetworkManager[1096]: device (thunderbridge): bridge port thunderbolt0 was detached
Mar 07 08:48:30.540186 baobab systemd[1]: Unmounting mnt-pve-AutoNAS-1.mount - /mnt/pve/AutoNAS-1...
Mar 07 08:50:00.604036 baobab systemd[1]: mnt-pve-AutoNAS-2.mount: Unmounting timed out. Terminating.
```
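
The gaps between these three journal lines can be computed directly from their timestamps. The sketch below does only that arithmetic; the event labels are illustrative. The bridge port is detached about 12.6 seconds before unmounting starts, and the timeout arrives about 90.1 seconds after that, which would be consistent with a 90-second unit stop timeout. Note that the excerpt's start line names `AutoNAS-1` while the timeout line names `AutoNAS-2`, so this reading assumes both unmounts began at about the same time.

```python
from datetime import datetime

# Timestamps copied from the journal excerpt above (all on Mar 07).
EVENTS = [
    ("bridge port detached", "08:48:17.989246"),
    ("unmount started", "08:48:30.540186"),
    ("unmount timed out", "08:50:00.604036"),
]


def seconds_between(earlier: str, later: str) -> float:
    """Difference between two same-day HH:MM:SS.ffffff timestamps."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return delta.total_seconds()


for (name_a, ts_a), (name_b, ts_b) in zip(EVENTS, EVENTS[1:]):
    print(f"{name_a} -> {name_b}: {seconds_between(ts_a, ts_b):.3f}s")
```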

Preflight stale-NFS block in `pgs`:

```text
[<0>] rpc_wait_bit_killable+0x11/0x80 [sunrpc]
[<0>] nfs4_do_call_sync+0x6a/0xc0 [nfsv4]
[<0>] __nfs_revalidate_inode+0xd4/0x320 [nfs]
```

Provider-ordering fix validated on `tapia`:

```text
TIME_TO_STOP_SECONDS 28.990
TIME_TO_FIRST_REPLY_SECONDS 53.384
DOWNTIME_SECONDS 24.394
```

---
## Investigation Notes

- 2026-03-07: Confirmed `baobab` delay was dominated by NFS unmount timeout after Thunderbolt transport disappeared too early.
- 2026-03-07: Patched `tb-enlist@.service` with `Before=network.target`; time to ICMP loss on `baobab` dropped from ~106s to ~15s.
- 2026-03-07: Confirmed `pgs` preflight could block on stale remote NFS during storage cleanup.
- 2026-03-07: Patched `pgs` cleanup to scan only local `dir` storages; remote NFS is skipped intentionally.
- 2026-03-07: Confirmed `ebony` delay was caused by self-hosted `AutoNAS-1`: the node exported local storage and mounted it back as Proxmox NFS.
- 2026-03-07: First AutoNAS patch kept `autonas.service` and `autonas-boot-scan.service` ordered before `remote-fs.target` and `umount.target`; `ebony` improved to ~28s shutdown-to-ICMP-loss.
- 2026-03-07: `tapia` still showed ~123s shutdown with that first AutoNAS patch because `nfs-server.service` still stopped too early for self-hosted `AutoNAS-2`.
- 2026-03-07: Implemented second-generation AutoNAS fix that generates `/etc/systemd/system/nfs-server.service.d/50-autonas-self-hosted-proxmox.conf` from `storage.cfg`, adding explicit `Before=` ordering from `nfs-server.service` to matching self-hosted Proxmox mount units.
- 2026-03-07: Revalidated `tapia` twice after the `nfs-server.service` ordering fix; both runs reached ICMP loss after about `28s` to `29s` and the first ICMP reply after about `53s` to `54s`.
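
The `pgs` cleanup change noted above amounts to selecting storages by their configured type instead of probing their paths, since even a metadata lookup on a stale NFS mount can block (as the `__nfs_revalidate_inode` frame in the evidence suggests). The following is a sketch of that selection only, not the actual `pgs` code: the inventory tuples, `cleanup_targets`, and `skipped_storages` are hypothetical.

```python
# Storage inventory as (type, id, path); entries are illustrative assumptions
# modelled on the storages named in this issue.
STORAGES = [
    ("dir", "local", "/var/lib/vz"),
    ("nfs", "AutoNAS-1", "/mnt/pve/AutoNAS-1"),
    ("nfs", "AutoNAS-2", "/mnt/pve/AutoNAS-2"),
]

# Only these storage types are considered safe to scan during preflight.
LOCAL_TYPES = {"dir"}


def cleanup_targets(storages):
    """Return paths that cleanup may scan; other storages are never touched."""
    return [path for kind, _name, path in storages if kind in LOCAL_TYPES]


def skipped_storages(storages):
    """Return ids skipped on purpose, e.g. for a verbose log line."""
    return [name for kind, name, _path in storages if kind not in LOCAL_TYPES]


print("scan:", cleanup_targets(STORAGES))
print("skip:", skipped_storages(STORAGES))
```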

---

## Proposed Solution

1. Keep Thunderbolt enlist units ordered before `network.target` so transport-backed NFS over `thunderbridge` stays alive until remote filesystems unmount.
2. Keep `pgs` cleanup limited to local directory-backed storages; do not let remote NFS availability gate planned maintenance.
3. For self-hosted AutoNAS exports, generate explicit `nfs-server.service` ordering against the matching Proxmox `mnt-pve-*.mount` units discovered from `storage.cfg`.
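
Item 3 can be sketched as a small generator that reads `storage.cfg` text and renders the drop-in body. This is a minimal illustration under stated assumptions, not the AutoNAS implementation: it treats an NFS storage as self-hosted when its `server` value is one of the node's own addresses, the sample addresses and export paths are invented, and `mount_unit` uses the simplified unit naming seen in the journal excerpt (a real generator would more safely obtain names from `systemd-escape --path --suffix=mount`). Placing `Before=` on `nfs-server.service` means the server starts before those mounts and, at shutdown, stops only after they are unmounted.

```python
def parse_sections(text):
    """Parse storage.cfg text into one dict per 'type: id' section."""
    sections = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():              # header line, e.g. "nfs: AutoNAS-1"
            kind, _, name = line.partition(":")
            sections.append({"type": kind.strip(), "id": name.strip()})
        elif sections:                         # indented "key value" property
            key, *rest = line.split(None, 1)
            sections[-1][key] = rest[0].strip() if rest else ""
    return sections


def mount_unit(path):
    """Simplified naming: '/mnt/pve/X' -> 'mnt-pve-X.mount' (no escaping)."""
    return path.strip("/").replace("/", "-") + ".mount"


def render_dropin(storage_cfg, local_addresses):
    """Render a drop-in ordering nfs-server.service before self-hosted mounts."""
    units = [
        mount_unit(s["path"])
        for s in parse_sections(storage_cfg)
        if s["type"] == "nfs" and s.get("server") in local_addresses and "path" in s
    ]
    if not units:
        return ""
    return "[Unit]\nBefore=" + " ".join(units) + "\n"


# Sample input; server addresses and export paths are invented for illustration.
SAMPLE = """\
dir: local
        path /var/lib/vz

nfs: AutoNAS-1
        export /srv/export-1
        path /mnt/pve/AutoNAS-1
        server 192.168.10.11

nfs: AutoNAS-2
        export /srv/export-2
        path /mnt/pve/AutoNAS-2
        server 192.168.10.12
"""

print(render_dropin(SAMPLE, {"192.168.10.12"}))
```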

---

## Related Issues

- ISSUE-2026-001

---

## Changelog References

CHANGELOG.md entries that reference this issue:

- `projects/thunderbolts/CHANGELOG.md`: `tb-enlist@.service` now stays active until `network.target` stops... [ISSUE-2026-002]
- `projects/autoNAS/CHANGELOG.md`: self-hosted AutoNAS shutdown now adds explicit `nfs-server.service` ordering... [ISSUE-2026-002]
- `projects/pve-guests-state/CHANGELOG.md`: Suspend-artifact cleanup now scans only local `dir` storages... [ISSUE-2026-002]