Showing 2 changed files with 56 additions and 21 deletions
+23 -13
HealthProbe/Doc/04-project/Import-Optimization-Log.md
@@ -587,7 +587,8 @@ rows exist".
 | 2026-06-04 | `457fd80` | Incrementally replace changed daily aggregate buckets. | Follow-up full-profile delta report completed in `31.2s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=119, delta=8, initialImport=0`, and `DeltaEvents: 22`. Compared with the prior `42.1s` run, `SummedFinalizeElapsed` dropped `19.6s -> 15.5s`, Active Energy finalize dropped `4.8s -> 1.8s`, and total wall clock dropped `42.1s -> 31.2s`. Heart Rate finalize barely moved (`8.9s -> 8.7s`) despite only `5` delta events, proving that daily aggregate replacement helped some changed metrics but Heart Rate is still dominated by type summary `visibleAggregate` full scans. |
 | 2026-06-04 | `2ebfab3` | Incrementally update changed type summaries. | Follow-up full-profile delta report completed in `27.5s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=118, delta=9, initialImport=0`, and `DeltaEvents: 46`. Compared with the prior `31.2s` run, `SummedFinalizeElapsed` dropped `15.5s -> 11.7s`; Heart Rate finalize dropped `8.7s -> 4.8s`; Active Energy finalize stayed bounded at `1.7s`. Remaining cost moved back to delta archive processing: `SummedProcessingElapsed` was `11.6s`, with Heart Rate processing `6.1s`, Active Energy `2.3s`, and Basal Energy `1.9s` for small deltas. |
 | 2026-06-04 | `4894b77` | Patch compact archives from delta without full record maps. | Follow-up full-profile delta report completed in `52.1s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=106, delta=21, initialImport=0`, and `DeltaEvents: 11,093`. This is not comparable to the previous `46`-event baseline: Active Energy had `2,377` events, Basal Energy `2,347`, Cycling Distance `6,052`, and Heart Rate `231`. Processing remained bounded relative to delta size (`SummedProcessingElapsed: 16.0s`; Heart Rate `5.9s`, Active Energy `2.2s`, Basal Energy `1.9s`, Cycling Distance `4.2s`), but wall clock rose because fetch `16.1s`, insert `2.3s`, and finalize `14.8s` all had real work. Conclusion: compact dictionary removal did not regress and looks healthy for large deltas, but a small-delta repeat is still needed to validate the original `6.1s` Heart Rate target. |
-| 2026-06-04 | pending | Hash delta compact archives in recorded order. | Small-delta follow-up completed in `24.0s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=124, delta=3, initialImport=0`, and `DeltaEvents: 13`. This finally validated the remaining bottleneck: Heart Rate still spent `5.7s` processing only `6` events, Active Energy spent `2.1s` for `5` events, and Basal Energy spent `1.8s` for `2` events. The delta rebuild no longer builds a full UUID record map, but it still collected every fingerprint into a large array and sorted it for the per-type hash. Delta rebuild now uses the same recorded-order `TypeHashBuilder` strategy as initial import, avoiding the all-record fingerprint array and sort. Expected signal: lower Heart Rate processing than `5.7s` on the next small-delta run. |
+| 2026-06-04 | `1ba6c38` | Hash delta compact archives in recorded order. | Small-delta follow-up completed in `24.0s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=124, delta=3, initialImport=0`, and `DeltaEvents: 13`. This finally validated the remaining bottleneck: Heart Rate still spent `5.7s` processing only `6` events, Active Energy spent `2.1s` for `5` events, and Basal Energy spent `1.8s` for `2` events. The delta rebuild no longer builds a full UUID record map, but it still collected every fingerprint into a large array and sorted it for the per-type hash. Delta rebuild now uses the same recorded-order `TypeHashBuilder` strategy as initial import, avoiding the all-record fingerprint array and sort. |
+| 2026-06-04 | pending | Copy unchanged daily aggregates inside SQLite. | Follow-up small-delta run after `1ba6c38` completed in `23.6s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=120, delta=7, initialImport=0`, and `DeltaEvents: 11`. The hash change helped processing: `SummedProcessingElapsed` dropped `9.6s -> 8.1s`; Heart Rate processing dropped `5.7s -> 4.2s`, Active Energy `2.1s -> 1.7s`, and Basal Energy `1.8s -> 1.5s`. The bottleneck shifted to finalization: `SummedFinalizeElapsed` rose `10.2s -> 11.4s`, with Heart Rate still at `4.8s`. Changed daily aggregates were copying all previous daily rows through Swift before replacing affected buckets. Copying those unchanged rows now happens with `INSERT ... SELECT` in SQLite, while affected buckets remain recalculated. Expected signal: lower Heart Rate finalize than `4.8s` on the next small-delta run. |
 
 ## Current Diagnosis
 
@@ -673,6 +674,13 @@ The likely bottleneck is per-row SQLite work:
   is that per-type hash sorting over all record fingerprints is a large part of
   that residual cost. Delta hashing now uses recorded order, matching the
   initial-import hash builder path and removing the full fingerprint array/sort.
+- The first report after recorded-order delta hashing confirmed the direction:
+  processing improved, but finalization became the dominant cost. `Heart Rate`
+  still spent `4.8s` in finalize with only `3` delta events. The current
+  hypothesis is that changed daily aggregate replacement spends avoidable Swift
+  time copying unchanged materialized daily rows before replacing affected
+  buckets. Those copied rows now use SQLite `INSERT ... SELECT`; changed buckets
+  are still rebuilt normally.
 
 ## Open Issues / Observations
 
@@ -714,28 +722,30 @@ Prioritize experiments in this order:
 4. Run a full-profile repeated capture after compact-delta archive patching.
    Compare `SummedProcessingElapsed`, Heart Rate processing time, and
    `DeltaEvents`. Expected success is Heart Rate processing below the previous
-   `5.7s` baseline when its delta remains small. The `52.1s` / `11,093`-event
-   report is useful stress evidence; the later `24.0s` / `13`-event report is
-   the current small-delta baseline before recorded-order delta hashing.
+   `5.7s` baseline when its delta remains small. This target was met in the
+   `23.6s` / `11`-event report, where Heart Rate processing dropped to `4.2s`.
 5. Keep using `DeltaEvents` to quantify changed high-volume metrics, especially
    Heart Rate, Active Energy, and Basal Energy. If delta events are small while
    finalize remains large, optimize aggregate rebuild/finalization rather than
    HealthKit fetch, SQLite insert, or legacy compact archive reconstruction.
-6. Investigate full-profile empty anchored-query cost for zero-count types.
+6. Run a full-profile repeated capture after SQL-side daily aggregate copying.
+   Compare `SummedFinalizeElapsed` and Heart Rate finalize against the current
+   `11.4s` total finalize / `4.8s` Heart Rate finalize baseline.
+7. Investigate full-profile empty anchored-query cost for zero-count types.
    Compare slow empty types across reports before changing behavior; any skip or
    lower-frequency strategy must preserve the promise that full authorized
    backup can notice newly appearing data.
-7. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
-8. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
-9. Profile whether index maintenance dominates first-import insert cost.
-10. Consider a guarded bulk-import mode for first observations:
+8. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
+9. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
+10. Profile whether index maintenance dominates first-import insert cost.
+11. Consider a guarded bulk-import mode for first observations:
    - keep archive semantics unchanged;
    - only relax work that can be safely reconstructed or validated;
    - re-enable normal idempotent paths for incremental observations.
-11. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
-12. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
-13. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
-14. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
+12. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
+13. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
+14. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
+15. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
 
 ## Verification Checklist For Each Optimization
 
+33 -8
HealthProbe/Services/SQLiteHealthArchiveStore.swift
@@ -3464,14 +3464,9 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
             }
         }
 
-        let rows = try dailyAggregateRows(
-            observationID: fromObservationID,
-            sampleTypeID: sampleTypeID,
-            db: db
-        )
-        try insertDailyAggregateRows(
-            rows,
-            observationID: toObservationID,
+        try copyDailyAggregateRows(
+            fromObservationID: fromObservationID,
+            toObservationID: toObservationID,
             sampleTypeID: sampleTypeID,
             db: db
         )
@@ -3520,6 +3515,36 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
         return true
     }
 
+    private func copyDailyAggregateRows(
+        fromObservationID: Int64,
+        toObservationID: Int64,
+        sampleTypeID: Int64,
+        db: OpaquePointer?
+    ) throws {
+        try withStatement(
+            """
+            INSERT INTO daily_type_aggregates (
+                observation_id, sample_type_id, bucket_start, bucket_end,
+                visible_record_count, value_sum, value_max, source_revision_id, aggregate_hash
+            )
+            SELECT
+                ?, sample_type_id, bucket_start, bucket_end,
+                visible_record_count, value_sum, value_max, source_revision_id, aggregate_hash
+            FROM daily_type_aggregates
+            WHERE observation_id = ?
+              AND sample_type_id = ?
+            """,
+            db: db
+        ) { statement in
+            bindInt64(toObservationID, to: 1, in: statement)
+            bindInt64(fromObservationID, to: 2, in: statement)
+            bindInt64(sampleTypeID, to: 3, in: statement)
+            guard sqlite3_step(statement) == SQLITE_DONE else {
+                throw SQLiteHealthArchiveStoreError.stepFailed(lastErrorMessage(db))
+            }
+        }
+    }
+
     private func deleteDailyAggregateRows(
         observationID: Int64,
         sampleTypeID: Int64,
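
The `INSERT ... SELECT` copy in `copyDailyAggregateRows` can be tried in isolation with a sketch like the one below. It is not the store's implementation. It assumes an in-memory database and a reduced, hypothetical `daily_type_aggregates` schema (four columns instead of the nine in the diff), plus the SQLite3 module that Apple platforms ship. The helpers `exec`, `scalar`, and `copyAggregates` are illustrative names that do not exist in `SQLiteHealthArchiveStore`, and the `UPDATE` only stands in for whatever recalculation the real code applies to affected buckets.

```swift
import SQLite3

// Hypothetical helper: runs SQL without bound parameters and traps on failure.
func exec(_ db: OpaquePointer?, _ sql: String) {
    guard sqlite3_exec(db, sql, nil, nil, nil) == SQLITE_OK else {
        fatalError(String(cString: sqlite3_errmsg(db)))
    }
}

// Hypothetical helper: returns the first column of the first result row.
func scalar(_ db: OpaquePointer?, _ sql: String) -> Int64 {
    var statement: OpaquePointer?
    guard sqlite3_prepare_v2(db, sql, -1, &statement, nil) == SQLITE_OK else {
        fatalError(String(cString: sqlite3_errmsg(db)))
    }
    defer { sqlite3_finalize(statement) }
    guard sqlite3_step(statement) == SQLITE_ROW else {
        fatalError("query returned no row")
    }
    return sqlite3_column_int64(statement, 0)
}

// Copies all aggregate rows of one observation/type pair to another observation
// in a single statement, so no row is materialized in Swift.
// Returns the number of rows SQLite reports as inserted.
func copyAggregates(_ db: OpaquePointer?, from: Int64, to: Int64, sampleTypeID: Int64) -> Int32 {
    let sql = """
        INSERT INTO daily_type_aggregates (observation_id, sample_type_id, bucket_start, value_sum)
        SELECT ?, sample_type_id, bucket_start, value_sum
        FROM daily_type_aggregates
        WHERE observation_id = ? AND sample_type_id = ?
        """
    var statement: OpaquePointer?
    guard sqlite3_prepare_v2(db, sql, -1, &statement, nil) == SQLITE_OK else {
        fatalError(String(cString: sqlite3_errmsg(db)))
    }
    defer { sqlite3_finalize(statement) }
    sqlite3_bind_int64(statement, 1, to)
    sqlite3_bind_int64(statement, 2, from)
    sqlite3_bind_int64(statement, 3, sampleTypeID)
    guard sqlite3_step(statement) == SQLITE_DONE else {
        fatalError(String(cString: sqlite3_errmsg(db)))
    }
    return sqlite3_changes(db)
}

var db: OpaquePointer?
let openResult = sqlite3_open(":memory:", &db)
precondition(openResult == SQLITE_OK, "could not open in-memory database")

// Reduced schema (assumption): the real table carries more columns.
exec(db, """
    CREATE TABLE daily_type_aggregates (
        observation_id INTEGER NOT NULL,
        sample_type_id INTEGER NOT NULL,
        bucket_start INTEGER NOT NULL,
        value_sum REAL NOT NULL,
        PRIMARY KEY (observation_id, sample_type_id, bucket_start)
    );
    INSERT INTO daily_type_aggregates VALUES (1, 7, 0, 10.0), (1, 7, 1, 20.0), (1, 7, 2, 30.0);
    """)

let copied = copyAggregates(db, from: 1, to: 2, sampleTypeID: 7)

// Stand-in for recalculating the one bucket the delta touched.
exec(db, "UPDATE daily_type_aggregates SET value_sum = 35.0 WHERE observation_id = 2 AND bucket_start = 2")

let rowCount = scalar(db, "SELECT COUNT(*) FROM daily_type_aggregates WHERE observation_id = 2")
let total = scalar(db, "SELECT CAST(SUM(value_sum) AS INTEGER) FROM daily_type_aggregates WHERE observation_id = 2")
print(copied, rowCount, total)  // prints "3 3 65"
sqlite3_close(db)
```

The copy reads from and writes to the same table, which SQLite permits. Returning `sqlite3_changes` gives a cheap check that the number of copied rows matches the previous observation's bucket count.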