Showing 2 changed files with 17 additions and 4 deletions
+1 -1
projects/autoNAS
@@ -1 +1 @@
-Subproject commit 9443424f399d74cacc9a7f8888c038aa23bd9713
+Subproject commit 5bf8614cfafe29b3ec048bdcbf89ce09a65990dc
+16 -3
projects/thunderbolts/issues/ISSUE-2026-002.md
@@ -24,7 +24,9 @@ The same investigation exposed a second maintenance risk in `pgs`: preflight cle
 
 Follow-up validation on `ebony` showed a different but related cluster behavior: `AutoNAS-1` is currently exported by `ebony` itself. During reboot, `autonas.service` stops early, which makes the node's own Proxmox NFS client mount for `AutoNAS-1` stale and it then waits for timeout during unmount. In the same window, VM `301 is-anjohibe` (PBS `anjothibe`) is intentionally suspended by `pgs`, so PBS availability loss is expected during the maintenance window.
 
-Validation on `tapia` showed the same class of topology problem for `AutoNAS-2`, which is locally exported there and mounted back as a Proxmox NFS storage. The AutoNAS shutdown-ordering patch remained active, but reboot timing still stayed near the pre-fix range because `mnt-pve-AutoNAS-2.mount` waited for timeout during shutdown while PBS `andrafiabe-AutoNAS` had already become unreachable.
+Validation on `tapia` initially showed the same class of topology problem for `AutoNAS-2`, which is locally exported there and mounted back as a Proxmox NFS storage. The first AutoNAS shutdown-ordering patch remained active, but reboot timing still stayed near the pre-fix range because `mnt-pve-AutoNAS-2.mount` waited for timeout during shutdown while PBS `andrafiabe-AutoNAS` had already become unreachable.
+
+Follow-up work in the `autoNAS` project added an explicit `nfs-server.service` drop-in for self-hosted Proxmox NFS mounts discovered from `storage.cfg`. After that second patch, `tapia` reboot timing dropped into the same range as `ebony`, confirming that the remaining blocker was provider ordering on `nfs-server.service`, not on `autonas.service`.
 
 ---
 
@@ -67,6 +69,10 @@ Validation on `tapia` showed the same class of topology problem for `AutoNAS-2`,
   - `TIME_TO_STOP_SECONDS 123.285`
   - `TIME_TO_FIRST_REPLY_SECONDS 149.420`
   - `DOWNTIME_SECONDS 26.135`
+- Revalidation on `tapia` after explicit `nfs-server.service` self-hosted ordering fix:
+  - `TIME_TO_STOP_SECONDS 28.305`
+  - `TIME_TO_FIRST_REPLY_SECONDS 53.588`
+  - `DOWNTIME_SECONDS 25.283`
 - `journalctl -b -1` showed:
   - Thunderbolt bridge ports detached at `08:48:17.989`
   - NFS unmount only started at `08:48:30.540`
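
Review note on the timing figures in this hunk: the reported values are consistent with `DOWNTIME_SECONDS` being `TIME_TO_FIRST_REPLY_SECONDS` minus `TIME_TO_STOP_SECONDS`. That relationship is inferred from the numbers, not stated in the issue, so treat it as an assumption. A minimal Python sketch that checks it and computes how much each phase improved between the two `tapia` runs (the dictionary and function names are illustrative only):

```python
from decimal import Decimal

# Values copied from the `tapia` measurements quoted in this diff.
FIRST_PATCH = {"stop": Decimal("123.285"), "first_reply": Decimal("149.420")}
SECOND_PATCH = {"stop": Decimal("28.305"), "first_reply": Decimal("53.588")}


def downtime(run):
    """Assumed relationship: downtime = time to first reply - time to stop."""
    return run["first_reply"] - run["stop"]


def improvement(before, after, key):
    """Seconds saved on one metric between two runs."""
    return before[key] - after[key]


if __name__ == "__main__":
    print("downtime after first patch: ", downtime(FIRST_PATCH))
    print("downtime after second patch:", downtime(SECOND_PATCH))
    print("stop time saved:            ", improvement(FIRST_PATCH, SECOND_PATCH, "stop"))
    print("first reply time saved:     ", improvement(FIRST_PATCH, SECOND_PATCH, "first_reply"))
```

Under that assumption the downtime itself barely moves between the two runs, while the time to stop shrinks by roughly the length of the mount timeout, which matches the issue's conclusion that the delay sat in the shutdown phase.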
@@ -143,6 +149,13 @@ DOWNTIME_SECONDS 21.053
   - `mnt-pve-AutoNAS-2.mount` then waited until timeout at `11:46:31.778`
   - `network.target` stopped only after that, at `11:46:31.781`
 - 2026-03-07: On `tapia`, the remaining delay is concentrated on self-hosted `AutoNAS-2` (`server 192.168.10.22`) plus expected maintenance-window loss of PBS `andrafiabe-AutoNAS` (`192.168.10.96`).
+- 2026-03-07: Implemented a second-generation AutoNAS fix that generates `/etc/systemd/system/nfs-server.service.d/50-autonas-self-hosted-proxmox.conf` from `storage.cfg`, adding `Before=` ordering from `nfs-server.service` to the matching self-hosted Proxmox mount units.
+- 2026-03-07: Revalidated `tapia` after the `nfs-server.service` ordering fix:
+  - previous timing after first AutoNAS patch: `TIME_TO_STOP_SECONDS 123.285`, `TIME_TO_FIRST_REPLY_SECONDS 149.420`
+  - new timing: `TIME_TO_STOP_SECONDS 28.305`, `TIME_TO_FIRST_REPLY_SECONDS 53.588`
+  - `nfs-server.service` stopped at `12:07:42.157`, `network.target` stopped later at `12:07:47.230`
+  - the old ~90s timeout on `mnt-pve-AutoNAS-2.mount` no longer dominated shutdown
+  - `pgs suspend` / reboot / `pgs resume` completed successfully for VMs `104`, `107`, `113`, `302`
 
 ---
 
@@ -152,8 +165,8 @@ DOWNTIME_SECONDS 21.053
 2. Keep `pgs` cleanup path limited to local directory-backed storages; do not let remote NFS availability gate planned maintenance.
 3. Do not mount a node's own AutoNAS export back onto the same node as a Proxmox NFS storage; on `ebony`, exclude `AutoNAS-1` from local use or replace that local dependency with a direct/local storage path.
 4. Review colocated service dependencies before planned reboot, especially when the node provides the storage it also consumes (for example `autonas` and PBS on `ebony`).
-5. Apply the same self-hosted-storage review on `tapia`, where `AutoNAS-2` remains the dominant shutdown delay even after the AutoNAS ordering patch.
-6. Validate the same shutdown path on the remaining nodes after storage-role cleanup.
+5. Keep the generated `nfs-server.service` self-hosted ordering drop-in as the cluster fix for nodes that export AutoNAS locally and also consume the same export back through Proxmox NFS.
+6. Validate the same shutdown path on the remaining nodes after storage-role cleanup and after the `nfs-server.service` ordering fix is deployed.
 
 ---
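
Review note on the drop-in generation described in the issue text: the actual generator lives in the `autoNAS` submodule and is not shown in this diff, so the following is only a sketch of the idea in Python, not the project's implementation. It assumes the usual `storage.cfg` layout (a `type: id` header followed by indented `key value` lines), takes the node's own addresses as an input set, and builds mount unit names in the form the issue uses (`mnt-pve-<storage id>.mount`). All function names, the sample export path, and the `AutoNAS-1` server address are hypothetical. Real systemd path escaping encodes a `-` inside a path component as `\x2d`, so a production version should derive unit names with `systemd-escape --path` rather than string formatting.

```python
def parse_nfs_storages(cfg_text):
    """Return {storage_id: {key: value}} for every `nfs:` section."""
    storages = {}
    current = None
    for raw in cfg_text.splitlines():
        if not raw.strip():
            current = None          # blank line ends the current section
            continue
        if not raw[0].isspace():    # unindented line is a section header
            stype, _, sid = raw.partition(":")
            current = None
            if stype.strip() == "nfs" and sid.strip():
                current = storages.setdefault(sid.strip(), {})
        elif current is not None:   # indented `key value` line
            parts = raw.split(None, 1)
            current[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return storages


def self_hosted_mount_units(cfg_text, local_addresses):
    """Mount units for NFS storages whose server is this node itself."""
    units = [
        f"mnt-pve-{sid}.mount"  # naming as written in the issue (assumption)
        for sid, opts in parse_nfs_storages(cfg_text).items()
        if opts.get("server") in local_addresses
    ]
    return sorted(units)


def render_dropin(units):
    """Drop-in body ordering nfs-server.service before the given units."""
    if not units:
        return ""
    return "[Unit]\nBefore=" + " ".join(units) + "\n"


# Hypothetical sample; only the AutoNAS-2 server address comes from the issue.
SAMPLE = """\
dir: local
\tpath /var/lib/vz
\tcontent iso,backup

nfs: AutoNAS-2
\texport /srv/export
\tpath /mnt/pve/AutoNAS-2
\tserver 192.168.10.22
\tcontent images

nfs: AutoNAS-1
\texport /srv/export
\tpath /mnt/pve/AutoNAS-1
\tserver 192.168.10.21
\tcontent images
"""

if __name__ == "__main__":
    print(render_dropin(self_hosted_mount_units(SAMPLE, {"192.168.10.22"})))
```

The reason `Before=` helps at shutdown is that systemd stops units in the reverse of their start order: with `nfs-server.service` ordered before the mount unit, the mount is unmounted while the local NFS server is still running, instead of waiting on a server that has already stopped.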