Bogdan Timofte authored 3 months ago

# PGS - Changelog

## [1.5] - 2026-03-07

### Added
- `pgs cleanup` command that scans image storages for orphaned or stale `vm-*-state-suspend-YYYY-MM-DD.raw` volumes and removes them safely
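
The naming rule that cleanup relies on can be illustrated with a small matcher. This is a minimal sketch in Python, not the code in `bin/pgs`: the function name `parse_suspend_volume` and the regular expression are assumptions, and only the `vm-*-state-suspend-YYYY-MM-DD.raw` shape comes from this changelog.

```python
import re
from datetime import date

# Hypothetical pattern: only the name shape
# vm-<vmid>-state-suspend-YYYY-MM-DD.raw is taken from the changelog.
SUSPEND_VOLUME_RE = re.compile(
    r"^vm-(?P<vmid>\d+)-state-suspend-(?P<day>\d{4}-\d{2}-\d{2})\.raw$"
)


def parse_suspend_volume(filename):
    """Return (vmid, date) for a suspend-state volume name, else None."""
    match = SUSPEND_VOLUME_RE.match(filename)
    if match is None:
        return None
    try:
        day = date.fromisoformat(match.group("day"))
    except ValueError:
        # Digits in the right shape, but not a real calendar date.
        return None
    return int(match.group("vmid")), day
```

Because the pattern is anchored on `state-suspend`, checkpoint files such as `vm-101-state-cp1.raw` and ordinary disks such as `vm-101-disk-0.raw` never match and are left alone.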

### Fixed
- Stopped VMs are no longer classified as "already suspended to disk" from config flags alone; `bin/pgs` now requires `lock: suspended`, `vmstate:`, and a resolvable backing saved-state volume
- Added cleanup for inconsistent suspend artifacts on stopped VMs, including stale suspend locks, stale `vmstate:` metadata, and orphaned saved-state volumes on storage
- `pgs suspend` now runs suspend-artifact cleanup as a preflight, reducing same-day collisions with stale `state-suspend` volumes
- Cleanup explicitly ignores `vm-*-state-cp*.raw` checkpoint files and only targets `vm-*-state-suspend-YYYY-MM-DD.raw`
- Repeated `pgs suspend` runs now merge with the existing state file instead of discarding prior `to_resume` intent
- State now records `vm_details.suspend_volume` and `vm_details.suspend_file_date`, and `resume` skips auto-restore when a VM's suspend artifact changed after the state was saved
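
The stricter "already suspended to disk" rule from the first item can be sketched as a three-part check. This Python sketch is illustrative only and is not the implementation in `bin/pgs`: `parse_vm_config`, `is_suspended_to_disk`, and the `volume_exists` callback are hypothetical names. It also assumes that snapshot sections in a VM config start with a `[name]` line and must not be consulted.

```python
def parse_vm_config(text):
    """Parse top-level 'key: value' lines, stopping at the first [section]."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # Assumption: snapshot sections follow; their keys are ignored.
            break
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        config[key.strip()] = value.strip()
    return config


def is_suspended_to_disk(config_text, volume_exists):
    """True only if lock, vmstate, and the backing volume all check out.

    volume_exists is a caller-supplied callable (hypothetical) that reports
    whether the volume named by 'vmstate:' can actually be resolved.
    """
    config = parse_vm_config(config_text)
    if config.get("lock") != "suspended":
        return False
    vmstate = config.get("vmstate")
    if not vmstate:
        return False
    return bool(volume_exists(vmstate))
```

Passing the storage lookup in as a callback keeps the rule itself testable without touching real storage.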

## [1.4] - 2026-03-06

### Changed
- Standardized install layout around `xdev` paths for uninstall, documentation, and runtime state
- Added dedicated `scripts/install.sh` and `scripts/uninstall.sh` and reduced `setup.sh` to a local/remote wrapper
- Updated `bin/pgs` to migrate legacy state from `/var/lib/pve-manager/pgs-state.json` to `/var/lib/xdev/pve-guests-state/pgs-state.json`
- Promoted `bin/pgs` as the canonical executable and removed the duplicate top-level `pgs` file
- Marked `systemd/` artifacts as legacy reference material instead of active install targets
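
A one-time state migration between the two paths named above might look like the following. This is a hedged Python sketch rather than the logic in `bin/pgs`: the function name `migrate_legacy_state` and the rule "never overwrite an existing new-layout file" are assumptions, and only the two paths come from this changelog.

```python
import shutil
from pathlib import Path

# Both paths are quoted from the changelog entry above.
LEGACY_STATE = Path("/var/lib/pve-manager/pgs-state.json")
CURRENT_STATE = Path("/var/lib/xdev/pve-guests-state/pgs-state.json")


def migrate_legacy_state(legacy=LEGACY_STATE, current=CURRENT_STATE):
    """Move the legacy state file to the new layout; return True if moved."""
    legacy, current = Path(legacy), Path(current)
    # Assumed policy: an existing new-layout file always wins.
    if current.exists() or not legacy.exists():
        return False
    current.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(legacy), str(current))
    return True
```

Calling it again after a successful move is a no-op, so it can run on every start.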

### Fixed
- Fixed documentation to reflect the current manual workflow and the standardized host layout

## [1.2] - 2026-03-05

### Added
- LXC container (CT) support: graceful shutdown before maintenance, auto-start after maintenance
- New `ct_to_start` array in state JSON for CT restoration
- `load_ct_info()` function that uses a single `pct list` call
- `shutdown_ct()` function with a 120s timeout for graceful shutdown
- `start_ct()` function for post-maintenance startup
- TODO placeholder for critical VM/CT migration support
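
A graceful shutdown with a 120-second limit could be driven as sketched below. This is an assumption-laden Python illustration, not the `shutdown_ct()` that ships in `bin/pgs`: the helper `shutdown_ct_command` and the injectable `run` parameter are hypothetical, while `pct shutdown <vmid> --timeout <seconds>` is the standard Proxmox CLI form.

```python
import subprocess


def shutdown_ct_command(vmid, timeout=120):
    """Build the argv for a graceful container shutdown."""
    return ["pct", "shutdown", str(vmid), "--timeout", str(timeout)]


def shutdown_ct(vmid, timeout=120, run=subprocess.run):
    """Run the shutdown and report success via the exit status.

    'run' is injectable (hypothetical design choice) so the function can be
    exercised without a Proxmox host.
    """
    result = run(shutdown_ct_command(vmid, timeout), capture_output=True, text=True)
    return result.returncode == 0
```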

### Changed
- State file now includes the `ct_to_start` array
- Suspend operation processes VMs first, then CTs
- Resume operation resumes VMs first, then starts CTs

### Fixed
- Fixed `pct list` column parsing (Status/Lock/Name column order)
- Fixed handling of an empty Lock column in `pct list` output
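
Whitespace splitting is what usually breaks on an empty Lock column, because the Name value slides into the Lock position. One way around it is to slice each row at the header's column offsets. The sketch below is Python and purely illustrative: `parse_pct_list` is a hypothetical name, and it assumes `pct list` prints a header row with `VMID`, `Status`, `Lock`, and `Name` in fixed-width, header-aligned columns.

```python
def parse_pct_list(output):
    """Parse header-aligned 'pct list' text into a list of dicts."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0]
    # Column start offsets are taken from the header, not from splitting rows.
    starts = [header.index(name) for name in ("VMID", "Status", "Lock", "Name")]
    ends = starts[1:] + [None]
    rows = []
    for line in lines[1:]:
        vmid, status, lock, name = (line[s:e].strip() for s, e in zip(starts, ends))
        rows.append(
            {"vmid": int(vmid), "status": status, "lock": lock or None, "name": name}
        )
    return rows
```

An empty Lock cell comes back as `None`. Values wider than their column would still break this approach, so it is a sketch under that layout assumption.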

## [1.1] - 2026-03-05

### Fixed
- Fixed `load_state()` writing log messages to stdout, which corrupted JSON parsing
- Fixed empty arrays in the JSON state file (`[""]` was generated instead of `[]`)
- Fixed paused VMs being treated as "running"; the `paused` status is now detected
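
The first two fixes describe common pitfalls: diagnostics mixed into a JSON stream, and an empty ID list serialized as `[""]`. The Python sketch below shows one way to avoid both. It is not the code in `bin/pgs`; the names `log`, `split_ids`, and `render_state` are hypothetical, and `to_resume` is the only field name taken from this changelog.

```python
import json
import sys


def log(message):
    # Diagnostics go to stderr so stdout carries nothing but JSON.
    print(message, file=sys.stderr)


def split_ids(raw):
    # str.split() with no argument drops empty fields, so "" yields [],
    # whereas "".split(",") would yield [""].
    return raw.split()


def render_state(to_resume_raw):
    """Serialize a whitespace-separated ID string as a JSON state document."""
    ids = split_ids(to_resume_raw)
    log(f"INFO: {len(ids)} VM(s) marked for resume")
    return json.dumps({"to_resume": ids})
```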

### Changed
- Optimized VM info loading: single `qm list` call instead of per-VM calls
- Optimized suspend lock detection: read config files directly, no extra `qm` calls
- Optimized status checking: only VMs reported as "running" have their actual status verified; for all other VMs the `qm list` status is trusted
- Reduced scan time from ~180 seconds to ~2.5 seconds for 30+ VMs
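
The single-call approach amounts to parsing one `qm list` listing and following up only on the VMs that need it. The Python sketch below illustrates the idea and is not the code in `bin/pgs`: `parse_qm_list` and `needs_status_check` are hypothetical names, and the parser assumes that VMID, NAME, and STATUS are the first three whitespace-separated fields and that VM names contain no spaces.

```python
def parse_qm_list(output):
    """Map VMID -> status from the text of a single 'qm list' call."""
    statuses = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # header row or blank line
        statuses[int(fields[0])] = fields[2]
    return statuses


def needs_status_check(statuses):
    """Only VMs listed as 'running' get a follow-up status query."""
    return sorted(vmid for vmid, status in statuses.items() if status == "running")
```

Every VM not returned by `needs_status_check` keeps the status from the listing, so no further `qm` call is made for it.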

### Added
- Proper systemd service setup for manual suspend before maintenance
- Proper systemd service setup for manual resume after maintenance
- Better handling of paused VMs: they are suspended to disk but not auto-resumed
- Comprehensive journal logging with severity levels (INFO, WARNING, ERROR, SUCCESS)
- Dry-run mode for testing without side effects

## [1.0] - 2026-03-05

### Initial Release
- Basic suspend/resume functionality
- State file preservation
- Manual testing scripts

---

## Performance Improvements

| Operation | v1.0 | v1.1 | Improvement |
|-----------|------|------|-------------|
| Scan 30 VMs | ~180s | ~2.5s | **72x faster** |
| System calls | Per-VM `qm` calls | Single `qm list` + file I/O | **Drastically reduced** |

## Known Limitations

- Requires passwordless SSH for cluster-wide operations
- No critical VM/CT migration support yet (TODO)

## Testing

Tested on:
- Proxmox VE 8.x with 30+ VMs and CTs
- Mixed VM configurations (4GB-16GB RAM)
- LXC containers with running services
- Storage: local-dir, NFS mount points

## Future Enhancements

- [x] Support for LXC container shutdown (added in 1.2)
- [ ] Configurable exclusion list for VMs
- [ ] Metrics/performance monitoring
- [ ] Multi-node coordination for cluster-wide operations
- [ ] Backup integration: take backup snapshots before suspend