Showing 7 changed files with 184 additions and 3 deletions
+1 -1
projects/autoNAS
@@ -1 +1 @@
-Subproject commit d426b0effcb2e2195b7c6742718037862bd15767
+Subproject commit 9443424f399d74cacc9a7f8888c038aa23bd9713
+2 -0
projects/pve-guests-state/CHANGELOG.md
@@ -12,6 +12,7 @@
 - Cleanup explicitly ignores `vm-*-state-cp*.raw` checkpoint files and only targets `vm-*-state-suspend-YYYY-MM-DD.raw`
 - Repeated `pgs suspend` runs now merge with the existing state file instead of discarding prior `to_resume` intent
 - State now records `vm_details.suspend_volume` and `vm_details.suspend_file_date`, and `resume` skips auto-restore when a VM's suspend artifact changed after the state was saved
+- Suspend-artifact cleanup now scans only local `dir` storages; remote storages such as NFS are skipped so planned maintenance cannot block in kernel I/O wait on a stale mount [ISSUE-2026-002]
 
 ## [1.4] - 2026-03-06
 
@@ -92,6 +93,7 @@ Tested on:
 - Mixed VM configurations (4GB-16GB RAM)
 - LXC containers with running services
 - Storage: local-dir, NFS mount points
+- Planned reboot validation on `baobab`: time from reboot command to ICMP loss dropped from ~106s to ~15s once NFS over Thunderbolt no longer lost transport before unmount
 
 ## Future Enhancements
 
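The two state-file changes in this entry (merging repeated `pgs suspend` runs, and skipping auto-restore when a suspend artifact changed) can be sketched in Python. This is a hypothetical model, not the `pgs` code: the layout of `to_resume` as a list of VMIDs and of `vm_details` keyed by VMID is assumed from the field names only, and `merge_state` / `should_auto_restore` are illustrative names.

```python
from typing import Any, Dict, List

# Hypothetical state shape, inferred only from the field names in the changelog:
# {"to_resume": [vmid, ...],
#  "vm_details": {vmid: {"suspend_volume": str, "suspend_file_date": str}}}
State = Dict[str, Any]


def merge_state(existing: State, new: State) -> State:
    """Fold a new suspend run into an existing state without dropping earlier to_resume entries."""
    to_resume: List[int] = list(existing.get("to_resume", []))
    for vmid in new.get("to_resume", []):
        if vmid not in to_resume:
            to_resume.append(vmid)
    vm_details = dict(existing.get("vm_details", {}))
    vm_details.update(new.get("vm_details", {}))  # assumption: the newer run wins per VMID
    return {"to_resume": to_resume, "vm_details": vm_details}


def should_auto_restore(saved: Dict[str, str], current: Dict[str, str]) -> bool:
    """Auto-restore only if the suspend artifact is unchanged since the state was saved."""
    return (
        saved.get("suspend_volume") == current.get("suspend_volume")
        and saved.get("suspend_file_date") == current.get("suspend_file_date")
    )


if __name__ == "__main__":
    first = {"to_resume": [101, 102],
             "vm_details": {101: {"suspend_volume": "vol-a", "suspend_file_date": "2026-03-07"}}}
    second = {"to_resume": [102, 301],
              "vm_details": {301: {"suspend_volume": "vol-b", "suspend_file_date": "2026-03-07"}}}
    print(merge_state(first, second)["to_resume"])  # [101, 102, 301]
```

Letting the newer run win for a VMID present in both runs is a design assumption of this sketch; the entry itself only states that prior `to_resume` intent is preserved.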
+2 -0
projects/pve-guests-state/README.md
@@ -19,6 +19,7 @@ Systemd-based shutdown and boot automation was intentionally abandoned
 - cleanup of orphaned `vm-*-state-suspend-YYYY-MM-DD.raw` volumes
 - retry for certain quorum-related errors
 - dry-run for verification without side effects
+- preflight cleanup limited to local `dir` storages, so that a stale remote NFS mount cannot block `pgs suspend`
 
 ## Project layout
 
@@ -90,6 +91,7 @@ sudo /usr/local/lib/xdev/pve-guests-state/uninstall.sh
 - after a fully successful `resume`, the state file is deleted
 - if `resume` hits errors, the state file is kept for a retry
 - `cleanup` and the `suspend` preflight touch only `vm-*-state-suspend-YYYY-MM-DD.raw` files; `vm-*-state-cp*.raw` files and other variants are left untouched
+- `cleanup` and the `suspend` preflight scan only local `dir`-type storages; remote storages (for example NFS) are skipped intentionally to avoid blocking maintenance when a remote mount is stale
 - a new `suspend` on top of an existing state file merges into it instead of resetting the list of guests to restore
 - the state file also records `suspend_volume`/`suspend_file_date` per VM to detect guests altered after the state was saved
 
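The file-name rule above (touch only `vm-*-state-suspend-YYYY-MM-DD.raw`, leave `vm-*-state-cp*.raw` alone) is easiest to keep strict as a full-string pattern match. A minimal Python sketch, assuming the `*` is always a numeric VMID and that matching runs on the bare file name; `is_suspend_artifact` is a hypothetical helper, not part of `pgs`.

```python
import re
from datetime import date

# Assumption: "*" in vm-*-state-suspend-... is always a numeric VMID.
SUSPEND_RE = re.compile(r"vm-(\d+)-state-suspend-(\d{4})-(\d{2})-(\d{2})\.raw", re.ASCII)


def is_suspend_artifact(filename: str) -> bool:
    """True only for vm-<vmid>-state-suspend-YYYY-MM-DD.raw with a real calendar date."""
    match = SUSPEND_RE.fullmatch(filename)
    if match is None:
        # Checkpoint files such as vm-101-state-cp1.raw, and any other variant, end up here.
        return False
    year, month, day = (int(part) for part in match.groups()[1:])
    try:
        date(year, month, day)
    except ValueError:
        return False  # shaped like a date but not a valid one, e.g. 2026-13-40
    return True


if __name__ == "__main__":
    for name in (
        "vm-101-state-suspend-2026-03-07.raw",      # cleanup candidate
        "vm-101-state-cp1.raw",                     # checkpoint, never touched
        "vm-101-state-suspend-2026-03-07.raw.bak",  # other variant, never touched
    ):
        print(name, is_suspend_artifact(name))
```

Using `fullmatch` rather than a prefix match keeps suffixed variants such as `.raw.bak` out of scope, which mirrors the "other variants are left untouched" rule.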
+2 -1
projects/pve-guests-state/docs/TECHNICAL.md
@@ -52,9 +52,10 @@ The state file contains:
 
 ### Cleanup
 
-- scans the storages with `content images` defined in `/etc/pve/storage.cfg`
+- scans only local `dir`-type storages with `content images` defined in `/etc/pve/storage.cfg`
 - looks exclusively for `vm-*-state-suspend-YYYY-MM-DD.raw` files
 - ignores files of the form `vm-*-state-cp*.raw`
+- remote storages such as NFS are skipped intentionally, because a stale mount can block the process in kernel I/O wait even before maintenance begins
 - if a `state-suspend` volume is referenced by a valid suspended VM, it is kept
 - if a `state-suspend` volume is referenced but the VM no longer has a valid suspend state, it clears `lock`, `vmstate`, and the volume
 - if a `state-suspend` volume is no longer referenced by any VM, it is treated as an orphan and deleted
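The cleanup rules above can be modelled in two steps: pick storage paths from `/etc/pve/storage.cfg` without ever touching a remote mount, then classify each `state-suspend` volume. The Python sketch below assumes the usual `storage.cfg` layout (a `type: id` header followed by indented `key value` lines) and reduces VM state to two inputs; `parse_storage_cfg`, `local_image_dirs` and `cleanup_action` are hypothetical names, and the sample config is invented for illustration.

```python
from typing import Dict, List, Optional

# Invented sample in storage.cfg layout; not copied from a real node.
SAMPLE_CFG = """\
dir: local
\tpath /var/lib/vz
\tcontent iso,vztmpl,backup,images

nfs: AutoNAS-1
\texport /srv/autonas1
\tpath /mnt/pve/AutoNAS-1
\tserver 192.168.10.21
\tcontent images
"""


def parse_storage_cfg(text: str) -> List[Dict[str, str]]:
    """Parse storage.cfg-style text: 'type: id' headers followed by indented 'key value' lines."""
    storages: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if not line[0].isspace():
            stype, _, sid = line.partition(":")
            current = {"type": stype.strip(), "id": sid.strip()}
            storages.append(current)
        elif current is not None:
            key, _, value = line.strip().partition(" ")
            current[key] = value.strip()
    return storages


def local_image_dirs(storages: List[Dict[str, str]]) -> List[str]:
    """Paths of local 'dir' storages with 'images' content. Remote types are filtered
    from the config text alone, so no stat() ever reaches a possibly stale NFS mount."""
    paths: List[str] = []
    for storage in storages:
        content = {item.strip() for item in storage.get("content", "").split(",")}
        if storage["type"] == "dir" and "images" in content and "path" in storage:
            paths.append(storage["path"])
    return paths


def cleanup_action(referenced_by_vmid: Optional[int], vm_validly_suspended: bool) -> str:
    """Apply the three documented rules to one vm-*-state-suspend-*.raw volume."""
    if referenced_by_vmid is None:
        return "delete-orphan"
    if vm_validly_suspended:
        return "keep"
    return "clear-lock-vmstate-and-delete-volume"


if __name__ == "__main__":
    print(local_image_dirs(parse_storage_cfg(SAMPLE_CFG)))  # ['/var/lib/vz']
```

The key design point is ordering: the storage type filter runs before any filesystem access, which is what keeps a stale NFS mount from being stat'ed at all.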
+1 -0
projects/thunderbolts/CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Fixed
 - Invalid `ExecStop` syntax in `tb-enlist@.service` caused failed unit teardown on Thunderbolt device removal [ISSUE-2026-001]
 - Tapia-Baobab Thunderbolt recovery path hardened after reboot-time disconnect/reconnect events [ISSUE-2026-001]
+- `tb-enlist@.service` now stays active until `network.target` stops, so NFS storages routed over `thunderbridge` can unmount cleanly before Thunderbolt ports are detached [ISSUE-2026-002]
 
 ### Added
 - Automatic Thunderbolt recovery worker (`tb-recover.service`) and periodic timer (`tb-recover.timer`) for flap resilience [ISSUE-2026-001]
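Why `Before=network.target` keeps the bridge up long enough: systemd stops units in the reverse of their start ordering, and network mount units get an implicit `After=network.target` (systemd.mount(5), default dependencies). The toy Python model below (3.9+ for `graphlib`) only illustrates that reversal on a three-unit chain; it is a simplification, not how systemd actually schedules shutdown jobs.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set


def stop_order(starts_after: Dict[str, Set[str]]) -> List[str]:
    """Return one valid shutdown order: the reverse of a valid start order."""
    start = list(TopologicalSorter(starts_after).static_order())
    return start[::-1]


# Simplified unit graph (assumption). Each key lists the units it starts after:
# - tb-enlist declares Before=network.target, i.e. network.target starts after it;
# - the NFS mount unit starts after network.target (default network-mount dependency).
UNITS: Dict[str, Set[str]] = {
    "tb-enlist@thunderbolt0.service": set(),
    "network.target": {"tb-enlist@thunderbolt0.service"},
    "mnt-pve-AutoNAS-1.mount": {"network.target"},
}

if __name__ == "__main__":
    print(stop_order(UNITS))
    # ['mnt-pve-AutoNAS-1.mount', 'network.target', 'tb-enlist@thunderbolt0.service']
```

Without the `Before=network.target` edge, `tb-enlist@thunderbolt0.service` has no ordering relative to the mount, so stopping it first is an equally valid order, which matches the early port detach recorded in ISSUE-2026-002.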
+6 -1
projects/thunderbolts/README.md
@@ -101,7 +101,8 @@ interface names, then extend both helper functions so the script can locate it.
   Linux bridge, sets MTU 65520, and brings it up during early boot.
 - `tb-enlist@.service` attaches Thunderbolt NIC instances to the bridge, aligning
   their MTU and keeping them hotplug friendly; systemd stops the unit cleanly on
-  device removal.
+  device removal and, through `Before=network.target`, stops it only after
+  `network.target` at shutdown, so remote filesystems over `thunderbridge` can unmount before the ports detach.
 - `90-thunderbolt-net-systemd.rules` tags `thunderbolt*` NICs so udev starts the
   enlist service automatically.
 
@@ -131,6 +132,10 @@ refreshes the interfaces.
   `systemctl enable --now tb-bridge.service` on the host.
 - *NICs not joining*: Check `journalctl -u tb-enlist@thunderbolt0` for logs and make
   sure the udev rule is present under `/etc/udev/rules.d`.
+- *Slow shutdown with NFS on thunderbridge*: Verify the host has the updated
+  `tb-enlist@.service` with `Before=network.target`; otherwise the Thunderbolt
+  ports can leave `thunderbridge` before Proxmox unmounts NFS storages, and shutdown
+  waits on NFS timeouts.
 - *MTU mismatch complaints*: The service forces MTU 65520 on both sides; verify the
   connected devices also support it.
 
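A hedged way to run the check from the troubleshooting entry: `systemctl show -p Before <unit>` prints the unit's effective `Before=` list on one line. The Python sketch parses that line; the instance name `tb-enlist@thunderbolt0.service` follows the README's own example, and `enlist_ordered_before_network` (a hypothetical helper) only works on a host where that unit is loaded.

```python
import subprocess
from typing import Set


def parse_before(show_output: str) -> Set[str]:
    """Extract unit names from a 'Before=a b c' line as printed by `systemctl show -p Before`."""
    for line in show_output.splitlines():
        if line.startswith("Before="):
            return set(line[len("Before="):].split())
    return set()


def enlist_ordered_before_network(unit: str = "tb-enlist@thunderbolt0.service") -> bool:
    """Ask the running systemd whether the unit carries Before=network.target."""
    result = subprocess.run(
        ["systemctl", "show", "-p", "Before", unit],
        capture_output=True,
        text=True,
        check=True,
    )
    return "network.target" in parse_before(result.stdout)


if __name__ == "__main__":
    sample = "Before=network.target shutdown.target\n"
    print("network.target" in parse_before(sample))  # True
```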
+170 -0
projects/thunderbolts/issues/ISSUE-2026-002.md
@@ -0,0 +1,170 @@
+# Issue ISSUE-2026-002: Planned reboot stalls on NFS storages over thunderbridge before network shutdown
+
+## Issue ID: ISSUE-2026-002
+
+**Status:** investigating  
+**Priority:** high  
+**Created:** 2026-03-07  
+**Updated:** 2026-03-07  
+**Assigned to:** unassigned
+
+---
+
+## Summary
+
+A planned node reboot on `baobab` spent ~106 seconds in shutdown because Proxmox NFS storages were still mounted after Thunderbolt transport had already been detached from `thunderbridge`.
+
+---
+
+## Description
+
+During a controlled reboot validation on `baobab`, guest suspend worked correctly, but the host remained reachable over ICMP for almost two minutes after `systemctl reboot`. Journal analysis showed that the Thunderbolt bridge ports were detached early in shutdown, while Proxmox only attempted to unmount NFS storages later. Because `AutoNAS-1` and `AutoNAS-2` are mounted over `192.168.10.x` through `thunderbridge`, the NFS unmount path lost transport and waited for the unmount timeout.
+
+The same investigation exposed a second maintenance risk in `pgs`: preflight cleanup could block in kernel I/O wait when it touched remote NFS-backed storages that were stale or temporarily unavailable. This does not cause the slow reboot itself, but it can block the maintenance preparation step.
+
+Follow-up validation on `ebony` showed a different but related cluster behavior: `AutoNAS-1` is currently exported by `ebony` itself. During reboot, `autonas.service` stops early, which leaves the node's own Proxmox NFS client mount of `AutoNAS-1` stale; that mount then waits for the unmount timeout. In the same window, VM `301 is-anjohibe` (PBS `anjothibe`) is intentionally suspended by `pgs`, so loss of PBS availability is expected during the maintenance window.
+
+Validation on `tapia` showed the same class of topology problem for `AutoNAS-2`, which is exported locally there and mounted back as a Proxmox NFS storage. The AutoNAS shutdown-ordering patch remained active, but reboot timing still stayed near the pre-fix range because `mnt-pve-AutoNAS-2.mount` waited for the unmount timeout during shutdown, while PBS `andrafiabe-AutoNAS` had already become unreachable.
+
+---
+
+## Environment
+
+- **Affected nodes:** `baobab` confirmed, likely all nodes using Proxmox NFS storages over `thunderbridge`; `ebony` and `tapia` confirmed for the related self-hosted AutoNAS case
+- **Component:** network + storage + maintenance workflow
+- **Version/software:** Proxmox VE 9.1 / kernel `6.17.13-1-pve`, `tb-enlist@.service`, `pgs`
+
+---
+
+## Steps to Reproduce
+
+1. On a node with Proxmox NFS storages routed over `thunderbridge`, run `/usr/local/sbin/pgs suspend -v`.
+2. Trigger `systemctl reboot`.
+3. Measure ICMP availability during shutdown and boot.
+4. Inspect `journalctl -b -1` around the reboot window.
+
+---
+
+## Expected Behavior
+
+- NFS storages should unmount while Thunderbolt transport is still available.
+- Host should stop replying to ICMP shortly after reboot is requested.
+- `pgs suspend` should not hang because a remote NFS mount is stale.
+
+---
+
+## Actual Behavior
+
+- First validation on `baobab`:
+  - `TIME_TO_STOP_SECONDS 105.852`
+  - `TIME_TO_FIRST_REPLY_SECONDS 130.230`
+  - `DOWNTIME_SECONDS 24.377`
+- Follow-up validation on `ebony`:
+  - `TIME_TO_STOP_SECONDS 120.275`
+  - `TIME_TO_FIRST_REPLY_SECONDS 145.840`
+  - `DOWNTIME_SECONDS 25.565`
+- Follow-up validation on `tapia` after cluster-wide AutoNAS rollout:
+  - `TIME_TO_STOP_SECONDS 123.285`
+  - `TIME_TO_FIRST_REPLY_SECONDS 149.420`
+  - `DOWNTIME_SECONDS 26.135`
+- `journalctl -b -1` on `baobab` showed:
+  - Thunderbolt bridge ports detached at `08:48:17.989`
+  - NFS unmount only started at `08:48:30.540`
+  - `mnt-pve-AutoNAS-2.mount` and `mnt-pve-AutoNAS-1.mount` timed out at `08:50:00.604` and `08:50:00.605`
+- `journalctl -b -1` on `ebony` showed:
+  - `autonas.service` stopped at `11:04:22.326`
+  - `mnt-pve-AutoNAS-2.mount` unmounted successfully by `11:04:38.693`
+  - `mnt-pve-AutoNAS-1.mount` timed out at `11:06:08.679`
+  - only after that did `network.target` stop and `tb-enlist@thunderbolt0.service` detach from `thunderbridge`
+- A later maintenance attempt also showed `pgs suspend` blocked in `nfs4_proc_getattr` while scanning storage paths.
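The reported figures are consistent with `DOWNTIME_SECONDS = TIME_TO_FIRST_REPLY_SECONDS - TIME_TO_STOP_SECONDS` up to rounding (for example 145.840 - 120.275 = 25.565 on `ebony`). The measurement script itself is not part of this issue, so the Python sketch below is an assumed reconstruction: it takes ICMP probe results as `(seconds since systemctl reboot, replied)` pairs and defines stop as the first lost reply and recovery as the first reply after that. `reboot_metrics` is a hypothetical name.

```python
from typing import List, Optional, Tuple

# One ICMP probe: (seconds since `systemctl reboot` was issued, whether a reply came back).
Probe = Tuple[float, bool]


def reboot_metrics(probes: List[Probe]) -> Optional[Tuple[float, float, float]]:
    """Return (time_to_stop, time_to_first_reply, downtime), or None if no full outage was seen."""
    stop: Optional[float] = None
    for t, replied in sorted(probes):
        if stop is None:
            if not replied:
                stop = t  # first lost reply after the reboot command
        elif replied:
            return stop, t, t - stop  # first reply after the outage began
    return None


if __name__ == "__main__":
    samples = [(0.0, True), (10.0, True), (14.5, False), (20.0, False), (35.5, True)]
    print(reboot_metrics(samples))  # (14.5, 35.5, 21.0)
```

This sketch treats a single dropped probe as the stop; a real measurement might require several consecutive losses to avoid counting a transient drop before shutdown.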
+
+---
+
+## Logs/Evidence
+
+```text
+Mar 07 08:48:17.989246 baobab NetworkManager[1096]: device (thunderbridge): bridge port thunderbolt0 was detached
+Mar 07 08:48:17.993120 baobab NetworkManager[1096]: device (thunderbridge): bridge port thunderbolt1 was detached
+Mar 07 08:48:30.540186 baobab systemd[1]: Unmounting mnt-pve-AutoNAS-1.mount - /mnt/pve/AutoNAS-1...
+Mar 07 08:48:30.541335 baobab systemd[1]: Unmounting mnt-pve-AutoNAS-2.mount - /mnt/pve/AutoNAS-2...
+Mar 07 08:50:00.604036 baobab systemd[1]: mnt-pve-AutoNAS-2.mount: Unmounting timed out. Terminating.
+Mar 07 08:50:00.605215 baobab systemd[1]: mnt-pve-AutoNAS-1.mount: Unmounting timed out. Terminating.
+```
+
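The gaps in this excerpt can be computed directly from the `journalctl` timestamps. In the small Python sketch below, the event labels are hypothetical and the year 2026 is added because the log prefix has none. Transport is lost about 12.55 s before the unmount starts, and the unmount then runs about 90.06 s until systemd terminates it, in line with the 90s timeout noted in the investigation.

```python
from datetime import datetime

# Timestamps copied from the journal excerpt above; the labels are hypothetical.
EVENTS = {
    "ports_detached": "Mar 07 08:48:17.989246",
    "unmount_started": "Mar 07 08:48:30.540186",
    "unmount_timed_out": "Mar 07 08:50:00.604036",
}


def parse_journal_ts(stamp: str, year: int = 2026) -> datetime:
    """Parse a journalctl short-precise prefix; the year is supplied because the log omits it."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")


def gap_seconds(start: str, end: str) -> float:
    """Seconds between two labelled events."""
    return (parse_journal_ts(EVENTS[end]) - parse_journal_ts(EVENTS[start])).total_seconds()


if __name__ == "__main__":
    print(f"{gap_seconds('ports_detached', 'unmount_started'):.2f}")    # 12.55
    print(f"{gap_seconds('unmount_started', 'unmount_timed_out'):.2f}")  # 90.06
```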
+Blocked `pgs` stack during stale-NFS preflight:
+
+```text
+[<0>] rpc_wait_bit_killable+0x11/0x80 [sunrpc]
+[<0>] nfs4_do_call_sync+0x6a/0xc0 [nfsv4]
+[<0>] __nfs_revalidate_inode+0xd4/0x320 [nfs]
+[<0>] __do_sys_newfstatat+0x43/0x90
+```
+
+Validated timing after fixes on `baobab`:
+
+```text
+TIME_TO_STOP_SECONDS 14.599
+TIME_TO_FIRST_REPLY_SECONDS 35.651
+DOWNTIME_SECONDS 21.053
+```
+
+---
+
+## Investigation Notes
+
+- 2026-03-07: Confirmed `AutoNAS-1` and `AutoNAS-2` on `baobab` are Proxmox NFS storages mounted from `192.168.10.21` and `192.168.10.22` over `thunderbridge`.
+- 2026-03-07: First reboot validation on `baobab` showed shutdown delay dominated by NFS unmount timeout, not by boot.
+- 2026-03-07: `tb-enlist@.service` had no ordering against `network.target`; systemd stopped Thunderbolt bridge membership before Proxmox unmounted remote storages.
+- 2026-03-07: Patched shared `tb-enlist@.service` with `Before=network.target` and deployed to `baobab`, then cluster-wide.
+- 2026-03-07: Separate maintenance attempt showed `pgs suspend` can block in `nfs4_proc_getattr` while scanning storage paths on stale remote NFS mounts.
+- 2026-03-07: Patched `pgs` cleanup to scan only local `dir` storages; remote storages such as NFS are skipped intentionally.
+- 2026-03-07: Revalidated on `baobab` after both fixes:
+  - NFS unmount started at `10:48:12.354/10:48:12.356`
+  - both NFS mounts unmounted successfully by `10:48:12.460`
+  - `network.target` stopped later at `10:48:16.152`
+  - time from reboot command to ICMP loss dropped from ~106s to ~15s
+- 2026-03-07: `pgs resume` completed successfully after reboot on `baobab`; state file survived boot and all 4 VMs + 1 CT were restored.
+- 2026-03-07: Validated `ebony` with current `pgs` and cluster-wide `thunderbolts` rollout. `pgs suspend` / `resume` succeeded for VMs `101`, `102`, `301`; state file survived reboot and restore completed.
+- 2026-03-07: `ebony` still showed a long shutdown because `AutoNAS-1` is currently provided by `ebony` itself through `autonas`. Stopping `autonas.service` made the node's own NFS client mount stale, and `mnt-pve-AutoNAS-1.mount` waited for the unmount timeout.
+- 2026-03-07: On `ebony`, PBS `anjothibe` availability loss during maintenance is expected because VM `301 is-anjohibe` is intentionally suspended by `pgs`, and its datastore also depends on `AutoNAS-1`.
+- 2026-03-07: Implemented AutoNAS shutdown-ordering experiment on `ebony`: `autonas.service` and `autonas-boot-scan.service` now declare `Before=remote-fs.target` and `Before=umount.target`.
+- 2026-03-07: Revalidated `ebony` after AutoNAS patch:
+  - previous timing: `TIME_TO_STOP_SECONDS 120.275`, `TIME_TO_FIRST_REPLY_SECONDS 145.840`
+  - new timing: `TIME_TO_STOP_SECONDS 27.573`, `TIME_TO_FIRST_REPLY_SECONDS 53.288`
+  - `mnt-pve-AutoNAS-2.mount` still unmounted cleanly
+  - `AutoNAS-1` no longer waited for the old 90s timeout, though a brief `Stale file handle` was still observed before the provider side stopped
+- 2026-03-07: Residual issue on `ebony`: even with later provider shutdown, `pvestatd` briefly logged `storage 'AutoNAS-1' is not online` / `Stale file handle` during the maintenance window, so the self-hosted NFS topology remains fragile but no longer dominates shutdown time.
+- 2026-03-07: Deployed the same AutoNAS ordering patch cluster-wide and revalidated `tapia`.
+- 2026-03-07: `pgs suspend` / reboot / `pgs resume` succeeded on `tapia` for VMs `104`, `107`, `113`, `302`; state file survived reboot and all four guests were restored.
+- 2026-03-07: `tapia` still showed a slow shutdown after the AutoNAS patch:
+  - `TIME_TO_STOP_SECONDS 123.285`, `TIME_TO_FIRST_REPLY_SECONDS 149.420`
+  - `mnt-pve-AutoNAS-1.mount` unmounted immediately at `11:45:01.827`
+  - `autonas.service` and `nfs-server.service` stopped around `11:45:01.689/11:45:01.900`
+  - `mnt-pve-AutoNAS-2.mount` then waited until it timed out at `11:46:31.778`
+  - `network.target` stopped only after that, at `11:46:31.781`
+- 2026-03-07: On `tapia`, the remaining delay is concentrated on self-hosted `AutoNAS-2` (`server 192.168.10.22`) plus expected maintenance-window loss of PBS `andrafiabe-AutoNAS` (`192.168.10.96`).
+
+---
+
+## Proposed Solution
+
+1. Keep `Before=network.target` on the Thunderbolt enlist units so storage traffic over `thunderbridge` remains alive until remote filesystems are unmounted.
+2. Keep the `pgs` cleanup path limited to local directory-backed storages; do not let remote NFS availability gate planned maintenance.
+3. Do not mount a node's own AutoNAS export back onto the same node as a Proxmox NFS storage; on `ebony`, exclude `AutoNAS-1` from local use or replace that local dependency with a direct/local storage path.
+4. Review colocated service dependencies before a planned reboot, especially when the node provides the storage it also consumes (for example `autonas` and PBS on `ebony`).
+5. Apply the same self-hosted-storage review on `tapia`, where `AutoNAS-2` remains the dominant shutdown delay even after the AutoNAS ordering patch.
+6. Validate the same shutdown path on the remaining nodes after storage-role cleanup.
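For items 3 to 5, a self-hosted mount can be flagged mechanically by comparing each NFS storage's `server` with the node's own addresses. A Python sketch under stated assumptions: storages are already parsed into dicts (for example from `/etc/pve/storage.cfg`), the node's addresses are passed in by the caller, and hostname-based servers are skipped rather than resolved. `self_mounted_nfs` is a hypothetical helper, and treating `192.168.10.22` as a `tapia` address is an assumption drawn from the `server 192.168.10.22` note above.

```python
import ipaddress
from typing import Dict, Iterable, List


def self_mounted_nfs(storages: Iterable[Dict[str, str]], local_addrs: Iterable[str]) -> List[str]:
    """IDs of NFS storages whose server is one of this node's own IP addresses."""
    mine = {ipaddress.ip_address(addr) for addr in local_addrs}
    hits: List[str] = []
    for storage in storages:
        if storage.get("type") != "nfs" or "server" not in storage:
            continue
        try:
            server = ipaddress.ip_address(storage["server"])
        except ValueError:
            continue  # hostname, not a literal IP: would need DNS resolution, skipped here
        if server in mine:
            hits.append(storage["id"])
    return hits


if __name__ == "__main__":
    storages = [
        {"type": "nfs", "id": "AutoNAS-1", "server": "192.168.10.21"},
        {"type": "nfs", "id": "AutoNAS-2", "server": "192.168.10.22"},
        {"type": "dir", "id": "local", "path": "/var/lib/vz"},
    ]
    # Assumption: this node (tapia) owns 192.168.10.22.
    print(self_mounted_nfs(storages, ["192.168.10.22"]))  # ['AutoNAS-2']
```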
+
+---
+
+## Related Issues
+
+- ISSUE-2026-001
+
+---
+
+## Changelog References
+
+CHANGELOG.md entries that reference this issue:
+- `projects/thunderbolts/CHANGELOG.md`: [Unreleased] - `tb-enlist@.service` now stays active until `network.target` stops... [ISSUE-2026-002]
+- `projects/pve-guests-state/CHANGELOG.md`: [1.5] - Suspend-artifact cleanup now scans only local `dir` storages... [ISSUE-2026-002]