PGS - Changelog

[1.5] - 2026-03-07

Added

  • Added pgs cleanup, which scans image storages for orphaned or stale vm-*-state-suspend-YYYY-MM-DD.raw volumes and removes them safely
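
The naming rule for suspend volumes can be written as a shell pattern. The following is a minimal sketch, not the code in bin/pgs: the helper name `is_suspend_volume` and the sample filenames are assumptions made for illustration. It checks only the shape of the filename, not whether a volume is actually orphaned or stale.

```shell
#!/bin/sh
# Hypothetical helper (not taken from bin/pgs): succeed only for filenames
# shaped like vm-<vmid>-state-suspend-YYYY-MM-DD.raw.
is_suspend_volume() {
    case "$(basename "$1")" in
        vm-*-state-suspend-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].raw)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Illustrative filenames only.
for f in vm-101-state-suspend-2026-03-07.raw vm-101-state-cp1.raw vm-101-disk-0.raw; do
    if is_suspend_volume "$f"; then
        echo "candidate: $f"
    else
        echo "ignored:   $f"
    fi
done
```

A strict pattern like this means checkpoint and disk volumes never become cleanup candidates by accident.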

Fixed

  • Stopped VMs are no longer classified as "already suspended to disk" from config flags alone; bin/pgs now requires lock: suspended, vmstate:, and a resolvable backing saved-state volume
  • Added cleanup for inconsistent suspend artifacts on stopped VMs, including stale suspend locks, stale vmstate: metadata, and orphaned saved-state volumes on storage
  • pgs suspend now runs suspend-artifact cleanup as a preflight, reducing same-day collisions with stale state-suspend volumes
  • Cleanup explicitly ignores vm-*-state-cp*.raw checkpoint files and only targets vm-*-state-suspend-YYYY-MM-DD.raw
  • Repeated pgs suspend runs now merge with the existing state file instead of discarding prior to_resume intent
  • State now records vm_details.suspend_volume and vm_details.suspend_file_date, and resume skips auto-restore when a VM's suspend artifact changed after the state was saved
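
The stricter "already suspended to disk" rule can be sketched as three checks that must all pass. This is an illustration under assumptions, not the logic in bin/pgs: the function names are hypothetical, and the caller is assumed to have already resolved the saved-state volume to a filesystem path (for example with a storage tool such as `pvesm path`). Only the top part of the config is inspected, on the assumption that bracketed sections below it describe snapshots rather than current state.

```shell
#!/bin/sh
# Print only the current-config part of a VM config file, i.e. everything
# before the first "[...]" section header.
current_section() {
    awk '/^\[/ { exit } { print }' "$1"
}

# Hypothetical check (names are illustrative, not from bin/pgs).
# $1 = path to the VM config file
# $2 = filesystem path of the saved-state volume, resolved by the caller
is_suspended_to_disk() {
    conf=$1
    volume_path=$2
    # 1. The config must carry the suspend lock.
    current_section "$conf" | grep -q '^lock: suspended$' || return 1
    # 2. The config must reference a saved-state volume.
    current_section "$conf" | grep -q '^vmstate: ' || return 1
    # 3. The referenced volume must actually exist on storage.
    [ -n "$volume_path" ] && [ -e "$volume_path" ]
}
```

A VM that fails any one of the three checks would be treated as plain stopped, which is also the case where stale locks or metadata become cleanup candidates.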

[1.4] - 2026-03-06

Changed

  • Standardized install layout around xdev paths for uninstall, documentation, and runtime state
  • Added dedicated scripts/install.sh and scripts/uninstall.sh and reduced setup.sh to a local/remote wrapper
  • Updated bin/pgs to migrate legacy state from /var/lib/pve-manager/pgs-state.json to /var/lib/xdev/pve-guests-state/pgs-state.json
  • Promoted bin/pgs as the canonical executable and removed the duplicate top-level pgs file
  • Marked systemd/ artifacts as legacy reference material instead of active install targets
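
A one-time migration of the legacy state file might look like the sketch below. The two paths come from the entry above; the function name `migrate_legacy_state` and the environment-variable overrides are assumptions added so the sketch can be tried against temporary directories. It is not the code in bin/pgs.

```shell
#!/bin/sh
# Paths as listed in the changelog; overridable for experiments.
OLD_STATE=${OLD_STATE:-/var/lib/pve-manager/pgs-state.json}
NEW_STATE=${NEW_STATE:-/var/lib/xdev/pve-guests-state/pgs-state.json}

# Hypothetical helper: move the legacy state file to the new location once.
migrate_legacy_state() {
    # Nothing to do if the new file already exists or there is no legacy file.
    [ -e "$NEW_STATE" ] && return 0
    [ -e "$OLD_STATE" ] || return 0
    mkdir -p "$(dirname "$NEW_STATE")" || return 1
    mv "$OLD_STATE" "$NEW_STATE"
}
```

Checking for the new file first makes the helper safe to call on every run, since a second call is a no-op.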

Fixed

  • Fixed documentation to reflect the current manual workflow and the standardized host layout

[1.2] - 2026-03-05

Added

  • LXC container (CT) support: graceful shutdown before maintenance, auto-start after maintenance
  • New ct_to_start array in state JSON for CT restoration
  • load_ct_info() function using single pct list call
  • shutdown_ct() function with 120s timeout for graceful shutdown
  • start_ct() function for post-maintenance startup
  • TODO placeholder for critical VM/CT migration support

Changed

  • State file now includes ct_to_start array
  • Suspend operation processes VMs then CTs
  • Resume operation resumes VMs then starts CTs

Fixed

  • Fixed pct list column parsing (Status/Lock/Name column order)
  • Handle empty Lock column in pct list output
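
The empty Lock column matters because whitespace-splitting a row then yields three fields instead of four. A minimal sketch of one way to handle this, assuming container names contain no whitespace; the function name `parse_pct_list` and the sample rows are illustrative, not taken from bin/pgs.

```shell
#!/bin/sh
# Hypothetical parser: read `pct list`-style text on stdin and print
# "vmid|status|lock|name", one line per container.
parse_pct_list() {
    awk 'NR > 1 && NF >= 3 {
        if (NF == 3) { lock = ""; name = $3 }   # empty Lock column
        else         { lock = $3; name = $4 }
        print $1 "|" $2 "|" lock "|" name
    }'
}

# Canned sample standing in for real `pct list` output.
parse_pct_list <<'EOF'
VMID       Status     Lock         Name
101        running                 web
102        stopped    backup       db
EOF
```

With the sample input this prints `101|running||web` and `102|stopped|backup|db`, so the name is never mistaken for a lock value.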

[1.1] - 2026-03-05

Fixed

  • Fixed load_state() outputting log messages to stdout, corrupting JSON parsing
  • Fixed empty arrays in JSON state file (was generating [""] instead of [])
  • Fixed paused VMs being treated as "running"; paused status is now detected correctly
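
The first two fixes correspond to common shell pitfalls: log output leaking into captured stdout, and an empty list being serialized as one empty string. The sketch below shows one way to avoid both. The helper names are hypothetical, and values are assumed to need no JSON escaping (for example numeric VM IDs); it is not the code in bin/pgs.

```shell
#!/bin/sh
# Log to stderr so that stdout carries only data.
log() {
    printf '[%s] %s\n' "$1" "$2" >&2
}

# Hypothetical helper: print its arguments as a JSON array of strings.
# With no arguments it prints [] rather than [""].
json_array() {
    if [ "$#" -eq 0 ]; then
        printf '[]\n'
        return 0
    fi
    out=
    for item in "$@"; do
        out="$out${out:+,}\"$item\""
    done
    printf '[%s]\n' "$out"
}

log INFO "building state"   # written to stderr, not captured by $(...)
json_array                  # prints []
json_array 101 102          # prints ["101","102"]
```

Because `log` writes to stderr, a caller can capture a function's JSON with command substitution without log lines ending up inside the captured text.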

Changed

  • Optimized VM info loading: single qm list call instead of per-VM calls
  • Optimized suspend lock detection: read config files directly, no extra qm calls
  • Optimized status checking: only VMs reported as "running" have their actual status verified; all others use the status from qm list
  • Reduced scan time from ~180 seconds to ~2.5 seconds for 30+ VMs
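
The single-call approach can be sketched as parsing one listing and following up only on VMs reported as running. This is an illustration under assumptions: the function name `parse_qm_list` is hypothetical, the sample text stands in for real `qm list` output, and VM names are assumed to contain no whitespace.

```shell
#!/bin/sh
# Hypothetical helper: turn `qm list`-style text on stdin into "vmid status" lines.
parse_qm_list() {
    awk 'NR > 1 && NF >= 3 { print $1, $3 }'
}

# Canned sample standing in for one real `qm list` call.
sample='      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       100 web                  running    4096              32.00 1234
       101 db                   stopped    8192              64.00 0'

printf '%s\n' "$sample" | parse_qm_list | while read -r vmid status; do
    if [ "$status" = "running" ]; then
        # Only these VMs would need a follow-up check of their actual status.
        echo "$vmid: verify actual status"
    else
        echo "$vmid: trust listed status ($status)"
    fi
done
```

The cost of the scan then grows with the number of running VMs rather than with the total number of guests.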

Added

  • Proper systemd service setup for manual suspend before maintenance
  • Proper systemd service setup for manual resume after maintenance
  • Better handling of paused VMs: they are suspended to disk but not auto-resumed
  • Comprehensive journal logging with severity levels (INFO, WARNING, ERROR, SUCCESS)
  • Dry-run mode for testing without effects

[1.0] - 2026-03-05

Initial Release

  • Basic suspend/resume functionality
  • State file preservation
  • Manual testing scripts

Performance Improvements

| Operation | v1.0 | v1.1 | Improvement |
|---|---|---|---|
| Scan 30 VMs | ~180s | ~2.5s | 72x faster |
| System calls | Per-VM qm calls | Single qm list + file I/O | Drastically reduced |

Known Limitations

  • Requires passwordless SSH for cluster-wide operations
  • No critical VM/CT migration support yet (TODO)

Testing

Tested on:

  • Proxmox VE 8.x with 30+ VMs and CTs
  • Mixed VM configurations (4GB-16GB RAM)
  • LXC containers with running services
  • Storage: local-dir, NFS mount points

Future Enhancements

  • [x] Support for LXC container shutdown (added in 1.2)
  • [ ] Configurable exclusion list for VMs
  • [ ] Metrics/performance monitoring
  • [ ] Multi-node coordination for cluster-wide operations
  • [ ] Backup integration for backup snapshots before suspend