Showing 1 changed file with 15 additions and 8 deletions
+15 -8
HealthProbe/Doc/04-project/Import-Optimization-Log.md
@@ -583,7 +583,7 @@ rows exist".
583 583
 | 2026-06-03 | pending | Clarify capture mode in record-import diagnostics. | Two full-profile repeated snapshots after SQLite capture-state persistence completed successfully with stable checksum and identical record count (`2,646,613`). The first ran in `6.3s` with `SummedProcessingElapsed: 0.0s`, `SummedInsertElapsed: 0.0s`, `SummedFinalizeElapsed: 2.6s`; the second ran in `6.4s` with `0.0s` processing/insert and `2.7s` finalize. Heart Rate `922,526` and Active Energy `346,478` each completed around `0.2s`, proving the heavy full reimport path was avoided. The report wording still said "`N` samples in 1 anchored segment", which is ambiguous for inherited unchanged summaries; diagnostics now label unchanged empty-delta, delta-apply, and full-import modes explicitly. |
584 584
 | 2026-06-03 | pending | Add capture-mode summary to diagnostics. | Repeated full-profile captures rarely produce a perfect no-delta report because at least one metric can change between manual runs. Diagnostic reports now include aggregate `CaptureModes` counts plus per-metric `captureMode`, so comparisons can separate unchanged empty-delta metrics from delta-applied metrics and full imports without manually reading every `record_import` line. Expected signal: stable checksum plus high `unchangedDelta` count and zero summed processing/insert confirms the fast path even when a few metrics changed. |
585 585
 | 2026-06-03 | pending | Add delta-event counts to diagnostics. | A full-profile follow-up completed in `47.4s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=115, delta=12, initialImport=0`, `SummedProcessingElapsed: 25.9s`, `SummedInsertElapsed: 0.2s`, and `SummedFinalizeElapsed: 16.0s`. This confirms anchors work and no full import ran. Remaining cost is delta application for large metrics: Heart Rate `23.5s` total (`14.1s` processing, `8.8s` finalize), Active Energy `7.1s`, and Basal Energy `6.0s`. Diagnostics now report aggregate/per-metric `DeltaEvents` so future logs can separate true HealthKit delta size from the final visible record count. |
586
-| 2026-06-04 | pending | Rebuild delta compact archives without large intermediate record arrays. | Delta captures still need the legacy per-type content hash and compact record archive until the SwiftData bridge is fully retired. The delta path now streams the previous compact archive into the UUID map without first decoding a `[HealthRecordValue]`, and rebuilds the new compact archive/hash directly from the map instead of sorting and materializing a second large record array. Expected signal: lower `processingElapsed` for high-volume delta metrics such as Heart Rate, Active Energy, and Basal Energy while `snapshotChecksum` remains stable for equivalent content. |
586
+| 2026-06-04 | `a676df1` | Rebuild delta compact archives without large intermediate record arrays. | Follow-up full-profile delta report completed in `42.1s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=108, delta=19, initialImport=0`, and `DeltaEvents: 582`. Compared with the prior `47.4s` run, `SummedProcessingElapsed` dropped `25.9s -> 12.8s`; Heart Rate processing dropped `14.1s -> 6.4s` despite `187` delta events; Active Energy processing dropped `4.9s -> 2.5s` with `100` delta events; Basal Energy processing dropped `4.1s -> 2.0s` with `82` delta events. `SummedFinalizeElapsed` rose `16.0s -> 19.6s`, so the remaining bottleneck is now archive finalization / aggregate rebuild for changed high-volume types, not Swift archive reconstruction. |
587 587
 
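The run-to-run comparison in the last two table rows can be reproduced with a few lines of arithmetic. This is a minimal sketch with the numbers copied from the `47.4s` and `42.1s` rows; the dictionary keys mirror the diagnostic field names, while `compare` and the `Total` key are hypothetical names introduced for illustration, not part of the project.

```python
# Timings in seconds, copied from the 2026-06-03 (47.4s) and 2026-06-04 (42.1s) rows.
before = {"Total": 47.4, "SummedProcessingElapsed": 25.9, "SummedFinalizeElapsed": 16.0}
after = {"Total": 42.1, "SummedProcessingElapsed": 12.8, "SummedFinalizeElapsed": 19.6}


def compare(before, after):
    """Return {field: (absolute change in seconds, percent change)} for shared fields."""
    return {
        key: (
            round(after[key] - before[key], 1),
            round(100 * (after[key] - before[key]) / before[key], 1),
        )
        for key in before
        if key in after
    }


for field, (delta, pct) in compare(before, after).items():
    print(f"{field}: {delta:+.1f}s ({pct:+.1f}%)")
```

Processing fell by `13.1s` while finalize rose by `3.6s`, which is why the next target moves from archive reconstruction to finalization.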
588 588
 ## Current Diagnosis
589 589
 
@@ -644,6 +644,11 @@ The likely bottleneck is per-row SQLite work:
644 644
   removes avoidable Swift allocations. A larger future optimization would need a
645 645
   deliberate replacement for the legacy per-type fingerprint hash, not an ad hoc
646 646
   switch to SQLite aggregate hashes.
647
+- The `a676df1` streaming archive rebuild achieved the intended processing
648
+  reduction. Do not repeat this experiment unless a regression appears. The next
649
+  high-value target is `markVerification` / daily aggregate rebuild for changed
650
+  high-volume delta types, where Heart Rate still spent `8.9s` finalizing and
651
+  Active Energy spent `4.8s`.
647 652
 
648 653
 ## Open Issues / Observations
649 654
 
@@ -682,13 +687,15 @@ Prioritize experiments in this order:
682 687
    `CaptureModes`, and high-volume type timings. Treat stable checksum, a high
683 688
    `unchangedDelta` count, and zero processing/insert as the main unchanged-path
684 689
    signal.
685
-4. Use `DeltaEvents` to quantify changed high-volume metrics, especially Heart
686
-   Rate, Active Energy, and Basal Energy. If delta events are small while
687
-   processing/finalize remain large, optimize legacy compact archive/hash
688
-   reconstruction rather than HealthKit fetch or SQLite insert.
689
-5. Compare the next full-profile delta report against the 2026-06-03 `47.4s`
690
-   run: Heart Rate processing `14.1s`, Active Energy processing `4.9s`, Basal
691
-   Energy processing `4.1s`, and `SummedProcessingElapsed: 25.9s`.
690
+4. Optimize finalization for changed high-volume delta types. The latest
691
+   full-profile report showed `SummedFinalizeElapsed: 19.6s`, with Heart Rate
692
+   `8.9s`, Active Energy `4.8s`, and Basal Energy `1.7s` (`15.4s` combined). Investigate whether
693
+   daily aggregates can be updated incrementally from `DeltaEvents` instead of
694
+   rebuilding all visible aggregates for the type.
695
+5. Keep using `DeltaEvents` to quantify changed high-volume metrics, especially
696
+   Heart Rate, Active Energy, and Basal Energy. If delta events are small while
697
+   finalize remains large, optimize aggregate rebuild/finalization rather than
698
+   HealthKit fetch, SQLite insert, or legacy compact archive reconstruction.
692 699
 6. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
693 700
 7. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
694 701
 8. Profile whether index maintenance dominates first-import insert cost.
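
Item 4 asks whether daily aggregates can be updated incrementally from `DeltaEvents` instead of being rebuilt per type. The following is a minimal, language-neutral sketch of that idea in Python, assuming each delta event carries a sample UUID, a day key, a value, and an added/removed flag. `DeltaEvent`, `DailyAggregate`, and `apply_delta` are hypothetical names for illustration, not the project's types, and only count/sum aggregates are covered; a min or max aggregate would still need a rescan of the affected day when the extreme sample is removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeltaEvent:
    # Hypothetical shape of one delta event; field names are assumptions.
    uuid: str
    day: str  # day bucket key, e.g. "2026-06-04"
    value: float
    removed: bool = False


@dataclass
class DailyAggregate:
    count: int = 0
    total: float = 0.0


def apply_delta(aggregates, events):
    """Update only the days touched by the events and return that set of days."""
    touched = set()
    for event in events:
        agg = aggregates.setdefault(event.day, DailyAggregate())
        sign = -1 if event.removed else 1
        agg.count += sign
        agg.total += sign * event.value
        touched.add(event.day)
    # Drop day buckets whose last sample was removed.
    for day in touched:
        if aggregates[day].count <= 0:
            del aggregates[day]
    return touched


aggregates = {"2026-06-03": DailyAggregate(count=2, total=150.0)}
touched = apply_delta(
    aggregates,
    [
        DeltaEvent("a", "2026-06-04", 72.0),
        DeltaEvent("b", "2026-06-03", 70.0, removed=True),
    ],
)
```

If the assumption about event contents holds, the work scales with the number of delta events (for example `187` for Heart Rate) rather than with the visible record count (`922,526`), which is the property the finalize step currently lacks.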