Follow-up validation on `ebony` showed a different but related cluster behavior: `AutoNAS-1` is currently exported by `ebony` itself. During reboot, `autonas.service` stops early, which makes the node's own Proxmox NFS client mount for `AutoNAS-1` stale and it then waits for timeout during unmount. In the same window, VM `301 is-anjohibe` (PBS `anjothibe`) is intentionally suspended by `pgs`, so PBS availability loss is expected during the maintenance window.

-Validation on `tapia` showed the same class of topology problem for `AutoNAS-2`, which is locally exported there and mounted back as a Proxmox NFS storage. The AutoNAS shutdown-ordering patch remained active, but reboot timing still stayed near the pre-fix range because `mnt-pve-AutoNAS-2.mount` waited for timeout during shutdown while PBS `andrafiabe-AutoNAS` had already become unreachable.
+Validation on `tapia` initially showed the same class of topology problem for `AutoNAS-2`, which is locally exported there and mounted back as a Proxmox NFS storage. The first AutoNAS shutdown-ordering patch remained active, but reboot timing still stayed near the pre-fix range because `mnt-pve-AutoNAS-2.mount` waited for timeout during shutdown while PBS `andrafiabe-AutoNAS` had already become unreachable.
+
+Follow-up work in the `autoNAS` project added an explicit `nfs-server.service` drop-in for self-hosted Proxmox NFS mounts discovered from `storage.cfg`. After that second patch, `tapia` reboot timing dropped into the same range as `ebony`, confirming that the remaining blocker was provider ordering on `nfs-server.service`, not on `autonas.service`.
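
As a rough illustration of the drop-in generation described above, the Python sketch below parses `storage.cfg`-style text, selects NFS storages whose `server` is one of the node's own addresses, and renders `Before=` lines for the matching mount units. It is not the autoNAS implementation: the function names, the parsing rules, the fallback mount path `/mnt/pve/<storage-id>`, and the literal address comparison are all assumptions made for this example. Note also that systemd escapes a `-` inside a path component as `\x2d`, so a generated unit name can differ from the shorthand `mnt-pve-AutoNAS-2.mount` used in this report.

```python
def parse_storage_cfg(text):
    """Parse storage.cfg-style text into {storage_id: {"type": ..., key: value}}.

    Assumed layout: unindented "type: id" headers followed by indented
    "key value" property lines.
    """
    storages = {}
    current = None
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if not raw[0].isspace():
            # Section header, e.g. "nfs: AutoNAS-2"
            storage_type, _, storage_id = stripped.partition(":")
            current = {"type": storage_type.strip()}
            storages[storage_id.strip()] = current
        elif current is not None:
            # Indented property line, e.g. "server 192.168.10.22"
            parts = stripped.split(None, 1)
            current[parts[0]] = parts[1] if len(parts) > 1 else ""
    return storages


def systemd_path_escape(path):
    """Simplified version of systemd's path escaping for unit names.

    "/" becomes "-", ASCII alphanumerics plus ":" and "_" are kept, "." is
    kept unless it is the first character, everything else becomes \\xNN.
    Repeated slashes are not collapsed in this sketch.
    """
    trimmed = path.strip("/")
    if not trimmed:
        return "-"
    out = []
    for index, ch in enumerate(trimmed):
        if ch == "/":
            out.append("-")
        elif (ch.isascii() and ch.isalnum()) or ch in ":_" or (ch == "." and index > 0):
            out.append(ch)
        else:
            out.append("".join("\\x%02x" % byte for byte in ch.encode("utf-8")))
    return "".join(out)


def render_dropin(cfg_text, local_addresses):
    """Return drop-in text ordering nfs-server.service before self-hosted mounts.

    Returns an empty string when no NFS storage points back at this node.
    """
    units = []
    for storage_id, props in parse_storage_cfg(cfg_text).items():
        if props["type"] == "nfs" and props.get("server") in local_addresses:
            # Assumption: fall back to /mnt/pve/<storage-id> when no path is set.
            path = props.get("path", "/mnt/pve/" + storage_id)
            units.append(systemd_path_escape(path) + ".mount")
    if not units:
        return ""
    return "[Unit]\n" + "".join("Before=%s\n" % unit for unit in sorted(units))
```

Because systemd stops units in the reverse of their start order, a `Before=` entry on `nfs-server.service` means the listed mount is unmounted before the NFS server goes down. Calling `render_dropin(cfg_text, {"192.168.10.22"})` with the contents of `storage.cfg` would produce text suitable for a file such as `50-autonas-self-hosted-proxmox.conf`.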
---

- `TIME_TO_STOP_SECONDS 123.285`
- `TIME_TO_FIRST_REPLY_SECONDS 149.420`
- `DOWNTIME_SECONDS 26.135`
+- Revalidation on `tapia` after explicit `nfs-server.service` self-hosted ordering fix:
+ - `TIME_TO_STOP_SECONDS 28.305`
+ - `TIME_TO_FIRST_REPLY_SECONDS 53.588`
+ - `DOWNTIME_SECONDS 25.283`
- `journalctl -b -1` showed:
- Thunderbolt bridge ports detached at `08:48:17.989`
- NFS unmount only started at `08:48:30.540`
- `mnt-pve-AutoNAS-2.mount` then waited until timeout at `11:46:31.778`
- `network.target` stopped only after that, at `11:46:31.781`
- 2026-03-07: On `tapia`, the remaining delay is concentrated on self-hosted `AutoNAS-2` (`server 192.168.10.22`) plus expected maintenance-window loss of PBS `andrafiabe-AutoNAS` (`192.168.10.96`).
+- 2026-03-07: Implemented a second-generation AutoNAS fix that generates `/etc/systemd/system/nfs-server.service.d/50-autonas-self-hosted-proxmox.conf` from `storage.cfg`, adding `Before=` ordering from `nfs-server.service` to the matching self-hosted Proxmox mount units.
+- 2026-03-07: Revalidated `tapia` after the `nfs-server.service` ordering fix:
+ - previous timing after first AutoNAS patch: `TIME_TO_STOP_SECONDS 123.285`, `TIME_TO_FIRST_REPLY_SECONDS 149.420`
+ - new timing: `TIME_TO_STOP_SECONDS 28.305`, `TIME_TO_FIRST_REPLY_SECONDS 53.588`
+ - `nfs-server.service` stopped at `12:07:42.157`, `network.target` stopped later at `12:07:47.230`
+ - the old ~90s timeout on `mnt-pve-AutoNAS-2.mount` no longer dominated shutdown
+ - `pgs suspend` / reboot / `pgs resume` completed successfully for VMs `104`, `107`, `113`, `302`
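
The timings above can be cross-checked with a few lines of arithmetic. The sketch below uses only values reported in this document. The relation `DOWNTIME_SECONDS = TIME_TO_FIRST_REPLY_SECONDS - TIME_TO_STOP_SECONDS` is an inference from the numbers (it holds for both runs), not a documented definition, and the dictionary keys and function names are illustrative.

```python
from decimal import Decimal

# Reported tapia timings in seconds, copied from the validation notes.
RUNS = {
    "first_patch": {"stop": Decimal("123.285"), "first_reply": Decimal("149.420")},
    "nfs_server_fix": {"stop": Decimal("28.305"), "first_reply": Decimal("53.588")},
}


def downtime(run):
    """Seconds between the node stopping and its first reply (assumed relation)."""
    return run["first_reply"] - run["stop"]


def stop_time_saved(before, after):
    """Reduction in time-to-stop between two runs."""
    return before["stop"] - after["stop"]


for name, run in RUNS.items():
    print(name, "downtime:", downtime(run))
print("stop time saved:", stop_time_saved(RUNS["first_patch"], RUNS["nfs_server_fix"]))
```

Under that assumed relation, both runs reproduce the reported `DOWNTIME_SECONDS` values (26.135 and 25.283), while time-to-stop falls by 94.980 seconds. The improvement therefore sits almost entirely in the shutdown phase, which is consistent with the ~90s mount timeout no longer dominating.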
---

2. Keep `pgs` cleanup path limited to local directory-backed storages; do not let remote NFS availability gate planned maintenance.
3. Do not mount a node's own AutoNAS export back onto the same node as a Proxmox NFS storage; on `ebony`, exclude `AutoNAS-1` from local use or replace that local dependency with a direct/local storage path.
4. Review colocated service dependencies before planned reboot, especially when the node provides the storage it also consumes (for example `autonas` and PBS on `ebony`).
-5. Apply the same self-hosted-storage review on `tapia`, where `AutoNAS-2` remains the dominant shutdown delay even after the AutoNAS ordering patch.
-6. Validate the same shutdown path on the remaining nodes after storage-role cleanup.
+5. Keep the generated `nfs-server.service` self-hosted ordering drop-in as the cluster fix for nodes that export AutoNAS locally and also consume the same export back through Proxmox NFS.
+6. Validate the same shutdown path on the remaining nodes after storage-role cleanup and after the `nfs-server.service` ordering fix is deployed.

---