Showing 1 changed file with 23 additions and 10 deletions
HealthProbe/Doc/04-project/Import-Optimization-Log.md
@@ -586,7 +586,7 @@ rows exist".
 | 2026-06-04 | `a676df1` | Rebuild delta compact archives without large intermediate record arrays. | Follow-up full-profile delta report completed in `42.1s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=108, delta=19, initialImport=0`, and `DeltaEvents: 582`. Compared with the prior `47.4s` run, `SummedProcessingElapsed` dropped `25.9s -> 12.8s`; Heart Rate processing dropped `14.1s -> 6.4s` despite `187` delta events; Active Energy processing dropped `4.9s -> 2.5s` with `100` delta events; Basal Energy processing dropped `4.1s -> 2.0s` with `82` delta events. `SummedFinalizeElapsed` rose `16.0s -> 19.6s`, so the remaining bottleneck is now archive finalization / aggregate rebuild for changed high-volume types, not Swift archive reconstruction. |
 | 2026-06-04 | `457fd80` | Incrementally replace changed daily aggregate buckets. | Follow-up full-profile delta report completed in `31.2s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=119, delta=8, initialImport=0`, and `DeltaEvents: 22`. Compared with the prior `42.1s` run, `SummedFinalizeElapsed` dropped `19.6s -> 15.5s`, Active Energy finalize dropped `4.8s -> 1.8s`, and total wall clock dropped `42.1s -> 31.2s`. Heart Rate finalize barely moved (`8.9s -> 8.7s`) despite only `5` delta events, proving that daily aggregate replacement helped some changed metrics but Heart Rate is still dominated by type summary `visibleAggregate` full scans. |
 | 2026-06-04 | `2ebfab3` | Incrementally update changed type summaries. | Follow-up full-profile delta report completed in `27.5s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=118, delta=9, initialImport=0`, and `DeltaEvents: 46`. Compared with the prior `31.2s` run, `SummedFinalizeElapsed` dropped `15.5s -> 11.7s`; Heart Rate finalize dropped `8.7s -> 4.8s`; Active Energy finalize stayed bounded at `1.7s`. Remaining cost moved back to delta archive processing: `SummedProcessingElapsed` was `11.6s`, with Heart Rate processing `6.1s`, Active Energy `2.3s`, and Basal Energy `1.9s` for small deltas. |
-| 2026-06-04 | pending | Patch compact archives from delta without full record maps. | The previous delta path decoded the whole compact archive into a UUID dictionary before applying a small HealthKit delta, so Heart Rate still paid high allocation/hash-map cost for only `9` delta events. Delta processing now keeps only changed samples/deletions in memory, scans the previous compact archive sequentially, replaces/deletes matching UUIDs, appends new records, and preserves existing per-type hash semantics. Expected signal: lower `SummedProcessingElapsed` and lower Heart Rate processing than the `6.1s` baseline when `DeltaEvents` stays small. |
+| 2026-06-04 | `4894b77` | Patch compact archives from delta without full record maps. | Follow-up full-profile delta report completed in `52.1s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=106, delta=21, initialImport=0`, and `DeltaEvents: 11,093`. This is not comparable to the previous `46`-event baseline: Active Energy had `2,377` events, Basal Energy `2,347`, Cycling Distance `6,052`, and Heart Rate `231`. Processing remained bounded relative to delta size (`SummedProcessingElapsed: 16.0s`; Heart Rate `5.9s`, Active Energy `2.2s`, Basal Energy `1.9s`, Cycling Distance `4.2s`), but wall clock rose because fetch `16.1s`, insert `2.3s`, and finalize `14.8s` all had real work. Conclusion: removing the full UUID record dictionary from compact archive patching did not regress and looks healthy for large deltas, but a small-delta repeat is still needed to validate the target of Heart Rate processing below the `6.1s` baseline. |
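
The delta patching described for `4894b77` (keep only the changed samples and deletions in memory, scan the previous archive once, replace or drop matching UUIDs, append the rest) can be sketched roughly as below. This is a minimal illustration and not the project's code: `Record`, `Delta`, and `patch` are hypothetical names, the real compact archive layout and hash maintenance are not shown in this log, and the ordering of appended records is an assumption.

```swift
import Foundation

// Hypothetical record shape; the real compact archive layout is not shown in this log.
struct Record: Equatable {
    let id: UUID
    let value: Double
}

// Hypothetical delta: changed or new samples keyed by UUID, plus deleted UUIDs.
struct Delta {
    var upserts: [UUID: Record]
    var deletions: Set<UUID>
}

/// Streams `previous` once while holding only the delta in memory.
func patch<S: Sequence>(
    _ previous: S,
    with delta: Delta,
    emit: (Record) -> Void
) where S.Element == Record {
    var pending = delta.upserts  // upserts not yet written out
    for record in previous {
        if delta.deletions.contains(record.id) { continue }  // deleted: drop
        // Replace a changed record in place; otherwise pass it through.
        emit(pending.removeValue(forKey: record.id) ?? record)
    }
    // Leftovers never appeared in the previous archive, so append them.
    // Sorting by UUID string is an assumption made here for deterministic output.
    let appended = pending.values
        .filter { !delta.deletions.contains($0.id) }
        .sorted { $0.id.uuidString < $1.id.uuidString }
    for record in appended { emit(record) }
}

let a = Record(id: UUID(), value: 60)
let b = Record(id: UUID(), value: 61)
let c = Record(id: UUID(), value: 62)
let d = Record(id: UUID(), value: 63)

let delta = Delta(
    upserts: [b.id: Record(id: b.id, value: 70), d.id: d],
    deletions: [c.id]
)
var patched: [Record] = []
patch([a, b, c], with: delta) { patched.append($0) }
print(patched.map(\.value))  // [60.0, 70.0, 63.0]
```

In this sketch, memory use scales with the delta rather than with the archive, which is the property the log entry relies on when it expects lower processing time for small deltas.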
 
 ## Current Diagnosis
 
@@ -658,6 +658,14 @@ The likely bottleneck is per-row SQLite work:
   high after this patch, the next larger step would be a deliberate redesign of
   the legacy per-type compact archive/hash maintenance, not more HealthKit page
   tuning.
+- The latest post-`4894b77` report was a large-delta capture (`11,093` events),
+  not a small-delta repeat. It showed no full import fallback and no degraded
+  metrics, but it cannot prove the Heart Rate small-delta target because Heart
+  Rate had `231` events and other high-volume metrics had thousands. It does
+  reveal a second full-profile cost: some zero-count types can spend several
+  seconds in an empty anchored HealthKit query (`Wheelchair Pushes`, `Wheezing`,
+  `Zinc`). Treat this as a HealthKit fetch / profile scheduling issue, separate
+  from compact archive rebuild.
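
One way to check whether the slow empty queries are persistent rather than one-off noise is to intersect the slow zero-count types across several reports. The sketch below assumes a hypothetical `TypeReport` shape and a hypothetical threshold; neither comes from the app's report schema, and the timings are made up for illustration.

```swift
import Foundation

// Hypothetical per-type report entry; field names are illustrative only.
struct TypeReport {
    let typeName: String
    let sampleCount: Int
    let fetchElapsed: TimeInterval  // seconds spent in the anchored query
}

/// Names of zero-count types whose fetch met `threshold` in every report.
func persistentlySlowEmptyTypes(
    in reports: [[TypeReport]],
    threshold: TimeInterval
) -> [String] {
    func slowEmpty(_ report: [TypeReport]) -> Set<String> {
        Set(report
            .filter { $0.sampleCount == 0 && $0.fetchElapsed >= threshold }
            .map(\.typeName))
    }
    guard let first = reports.first else { return [] }
    return reports.dropFirst()
        .reduce(slowEmpty(first)) { $0.intersection(slowEmpty($1)) }
        .sorted()
}

// Timings below are made up for illustration; they are not from any report.
let runA = [
    TypeReport(typeName: "Wheezing", sampleCount: 0, fetchElapsed: 3.0),
    TypeReport(typeName: "Zinc", sampleCount: 0, fetchElapsed: 2.5),
]
let runB = [
    TypeReport(typeName: "Wheezing", sampleCount: 0, fetchElapsed: 2.8),
    TypeReport(typeName: "Zinc", sampleCount: 0, fetchElapsed: 0.1),
]
let suspects = persistentlySlowEmptyTypes(in: [runA, runB], threshold: 2.0)
print(suspects)  // ["Wheezing"]
```

A type that is slow in only one report drops out of the result, which helps separate scheduling noise from a type that is consistently expensive to query while empty.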
 
 ## Open Issues / Observations
 
@@ -699,22 +707,27 @@ Prioritize experiments in this order:
 4. Run a full-profile repeated capture after compact-delta archive patching.
    Compare `SummedProcessingElapsed`, Heart Rate processing time, and
    `DeltaEvents`. Expected success is Heart Rate processing below the previous
-   `6.1s` baseline when its delta remains small.
+   `6.1s` baseline when its delta remains small. The `52.1s` / `11,093`-event
+   report is useful stress evidence, but not the small-delta validation run.
 5. Keep using `DeltaEvents` to quantify changed high-volume metrics, especially
    Heart Rate, Active Energy, and Basal Energy. If delta events are small while
    finalize remains large, optimize aggregate rebuild/finalization rather than
    HealthKit fetch, SQLite insert, or legacy compact archive reconstruction.
-6. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
-7. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
-8. Profile whether index maintenance dominates first-import insert cost.
-9. Consider a guarded bulk-import mode for first observations:
+6. Investigate full-profile empty anchored-query cost for zero-count types.
+   Compare slow empty types across reports before changing behavior; any skip or
+   lower-frequency strategy must preserve the promise that full authorized
+   backup can notice newly appearing data.
+7. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
+8. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
+9. Profile whether index maintenance dominates first-import insert cost.
+10. Consider a guarded bulk-import mode for first observations:
    - keep archive semantics unchanged;
    - only relax work that can be safely reconstructed or validated;
    - re-enable normal idempotent paths for incremental observations.
-10. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
-11. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
-12. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
-13. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
+11. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
+12. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
+13. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
+14. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
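
The success criterion in step 4 (Heart Rate processing below the `6.1s` baseline, but only when the delta is small) can be written down as a small check, so a large-delta run is not mistaken for a pass or a failure. This is a sketch under assumptions: `MetricRun`, `Verdict`, and `heartRateVerdict` are hypothetical names, and `smallDeltaLimit` is an assumed cutoff because the log does not define what counts as "small".

```swift
// Hypothetical summary of one metric in one report.
struct MetricRun {
    let deltaEvents: Int
    let processingElapsed: Double  // seconds
}

enum Verdict: Equatable {
    case notComparable   // delta too large to test the small-delta target
    case targetMet
    case targetMissed
}

/// `smallDeltaLimit` is an assumed cutoff; the log does not define "small".
func heartRateVerdict(
    _ run: MetricRun,
    smallDeltaLimit: Int,
    baseline: Double = 6.1
) -> Verdict {
    guard run.deltaEvents <= smallDeltaLimit else { return .notComparable }
    return run.processingElapsed < baseline ? .targetMet : .targetMissed
}

// Heart Rate figures from the `4894b77` report: 231 events, 5.9s processing.
let largeDeltaRun = MetricRun(deltaEvents: 231, processingElapsed: 5.9)
let verdict = heartRateVerdict(largeDeltaRun, smallDeltaLimit: 50)
print(verdict)  // notComparable
```

With any cutoff below `231`, the `52.1s` report is classified as not comparable, which matches the log's reading of it as stress evidence rather than a validation run.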
 
 ## Verification Checklist For Each Optimization