Record large-delta import benchmark · a2af382

+23 -10

HealthProbe/Doc/04-project/Import-Optimization-Log.md

@@ -586,7 +586,7 @@ rows exist".
 | 2026-06-04 | `a676df1` | Rebuild delta compact archives without large intermediate record arrays. | Follow-up full-profile delta report completed in `42.1s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=108, delta=19, initialImport=0`, and `DeltaEvents: 582`. Compared with the prior `47.4s` run, `SummedProcessingElapsed` dropped `25.9s -> 12.8s`; Heart Rate processing dropped `14.1s -> 6.4s` despite `187` delta events; Active Energy processing dropped `4.9s -> 2.5s` with `100` delta events; Basal Energy processing dropped `4.1s -> 2.0s` with `82` delta events. `SummedFinalizeElapsed` rose `16.0s -> 19.6s`, so the remaining bottleneck is now archive finalization / aggregate rebuild for changed high-volume types, not Swift archive reconstruction. |
 | 2026-06-04 | `457fd80` | Incrementally replace changed daily aggregate buckets. | Follow-up full-profile delta report completed in `31.2s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=119, delta=8, initialImport=0`, and `DeltaEvents: 22`. Compared with the prior `42.1s` run, `SummedFinalizeElapsed` dropped `19.6s -> 15.5s`, Active Energy finalize dropped `4.8s -> 1.8s`, and total wall clock dropped `42.1s -> 31.2s`. Heart Rate finalize barely moved (`8.9s -> 8.7s`) despite only `5` delta events, proving that daily aggregate replacement helped some changed metrics but Heart Rate is still dominated by type summary `visibleAggregate` full scans. |
 | 2026-06-04 | `2ebfab3` | Incrementally update changed type summaries. | Follow-up full-profile delta report completed in `27.5s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=118, delta=9, initialImport=0`, and `DeltaEvents: 46`. Compared with the prior `31.2s` run, `SummedFinalizeElapsed` dropped `15.5s -> 11.7s`; Heart Rate finalize dropped `8.7s -> 4.8s`; Active Energy finalize stayed bounded at `1.7s`. Remaining cost moved back to delta archive processing: `SummedProcessingElapsed` was `11.6s`, with Heart Rate processing `6.1s`, Active Energy `2.3s`, and Basal Energy `1.9s` for small deltas. |
-| 2026-06-04 | pending | Patch compact archives from delta without full record maps. | The previous delta path decoded the whole compact archive into a UUID dictionary before applying a small HealthKit delta, so Heart Rate still paid high allocation/hash-map cost for only `9` delta events. Delta processing now keeps only changed samples/deletions in memory, scans the previous compact archive sequentially, replaces/deletes matching UUIDs, appends new records, and preserves existing per-type hash semantics. Expected signal: lower `SummedProcessingElapsed` and lower Heart Rate processing than the `6.1s` baseline when `DeltaEvents` stays small. |
+| 2026-06-04 | `4894b77` | Patch compact archives from delta without full record maps. | Follow-up full-profile delta report completed in `52.1s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=106, delta=21, initialImport=0`, and `DeltaEvents: 11,093`. This is not comparable to the previous `46`-event baseline: Active Energy had `2,377` events, Basal Energy `2,347`, Cycling Distance `6,052`, and Heart Rate `231`. Processing remained bounded relative to delta size (`SummedProcessingElapsed: 16.0s`; Heart Rate `5.9s`, Active Energy `2.2s`, Basal Energy `1.9s`, Cycling Distance `4.2s`), but wall clock rose because fetch `16.1s`, insert `2.3s`, and finalize `14.8s` all had real work. Conclusion: compact dictionary removal did not regress and looks healthy for large deltas, but a small-delta repeat is still needed to validate the original `6.1s` Heart Rate target. |
 
 ## Current Diagnosis
 
@@ -658,6 +658,14 @@ The likely bottleneck is per-row SQLite work:
   high after this patch, the next larger step would be a deliberate redesign of
   the legacy per-type compact archive/hash maintenance, not more HealthKit page
   tuning.
+- The latest post-`4894b77` report was a large-delta capture (`11,093` events),
+  not a small-delta repeat. It showed no full import fallback and no degraded
+  metrics, but it cannot prove the Heart Rate small-delta target because Heart
+  Rate had `231` events and other high-volume metrics had thousands. It does
+  reveal a second full-profile cost: some zero-count types can spend several
+  seconds in an empty anchored HealthKit query (`Wheelchair Pushes`, `Wheezing`,
+  `Zinc`). Treat this as a HealthKit fetch / profile scheduling issue, separate
+  from compact archive rebuild.
 
 ## Open Issues / Observations
 
@@ -699,22 +707,27 @@ Prioritize experiments in this order:
 4. Run a full-profile repeated capture after compact-delta archive patching.
    Compare `SummedProcessingElapsed`, Heart Rate processing time, and
    `DeltaEvents`. Expected success is Heart Rate processing below the previous
-   `6.1s` baseline when its delta remains small.
+   `6.1s` baseline when its delta remains small. The `52.1s` / `11,093`-event
+   report is useful stress evidence, but not the small-delta validation run.
 5. Keep using `DeltaEvents` to quantify changed high-volume metrics, especially
    Heart Rate, Active Energy, and Basal Energy. If delta events are small while
    finalize remains large, optimize aggregate rebuild/finalization rather than
    HealthKit fetch, SQLite insert, or legacy compact archive reconstruction.
-6. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
-7. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
-8. Profile whether index maintenance dominates first-import insert cost.
-9. Consider a guarded bulk-import mode for first observations:
+6. Investigate full-profile empty anchored-query cost for zero-count types.
+   Compare slow empty types across reports before changing behavior; any skip or
+   lower-frequency strategy must preserve the promise that full authorized
+   backup can notice newly appearing data.
+7. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
+8. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
+9. Profile whether index maintenance dominates first-import insert cost.
+10. Consider a guarded bulk-import mode for first observations:
    - keep archive semantics unchanged;
    - only relax work that can be safely reconstructed or validated;
    - re-enable normal idempotent paths for incremental observations.
-10. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
-11. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
-12. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
-13. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
+11. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
+12. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
+13. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
+14. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
 
 ## Verification Checklist For Each Optimization
 
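Note on the `4894b77` row: the replaced "pending" entry describes the patching approach as keeping only changed samples and deletions in memory, scanning the previous compact archive sequentially, replacing or deleting matching UUIDs, and appending new records. A minimal sketch of that idea follows. It is written in Python for illustration only and is not the project's Swift implementation. The record shape `(uuid, payload)`, the function name `patch_archive`, and the rule that a deletion wins over a change for the same UUID are all assumptions, and per-type hash maintenance is left out.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

# Hypothetical record shape: (sample UUID, encoded payload).
Record = Tuple[str, bytes]


def patch_archive(
    previous: Iterable[Record],
    changed: Dict[str, bytes],
    deleted: Set[str],
) -> Iterator[Record]:
    """Stream a patched archive without building a map of every old record.

    Only the delta (changed and deleted UUIDs) is held in memory; the
    previous archive is consumed one record at a time.
    """
    pending = dict(changed)  # copy so the caller's delta is not mutated
    for uuid, payload in previous:
        if uuid in deleted:
            continue  # drop deleted samples (assumed to win over changes)
        if uuid in pending:
            yield uuid, pending.pop(uuid)  # replace in place, keep position
        else:
            yield uuid, payload  # unchanged record passes through
    # Whatever is left in the delta was not in the old archive: append it.
    for uuid, payload in pending.items():
        if uuid not in deleted:
            yield uuid, payload


previous = [("a", b"1"), ("b", b"2"), ("c", b"3")]
patched = list(patch_archive(previous, {"b": b"20", "d": b"4"}, {"c"}))
```

In this sketch memory use grows with the size of the delta rather than the size of the archive, which is why a small-delta repeat is the run that would show the benefit most clearly.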

