Showing 2 changed files with 192 additions and 12 deletions
+10 -11
HealthProbe/Doc/04-project/Import-Optimization-Log.md
@@ -584,7 +584,8 @@ rows exist".
 | 2026-06-03 | pending | Add capture-mode summary to diagnostics. | Repeated full-profile captures rarely produce a perfect no-delta report because at least one metric can change between manual runs. Diagnostic reports now include aggregate `CaptureModes` counts plus per-metric `captureMode`, so comparisons can separate unchanged empty-delta metrics from delta-applied metrics and full imports without manually reading every `record_import` line. Expected signal: stable checksum plus high `unchangedDelta` count and zero summed processing/insert confirms the fast path even when a few metrics changed. |
 | 2026-06-03 | pending | Add delta-event counts to diagnostics. | A full-profile follow-up completed in `47.4s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=115, delta=12, initialImport=0`, `SummedProcessingElapsed: 25.9s`, `SummedInsertElapsed: 0.2s`, and `SummedFinalizeElapsed: 16.0s`. This confirms anchors work and no full import ran. Remaining cost is delta application for large metrics: Heart Rate `23.5s` total (`14.1s` processing, `8.8s` finalize), Active Energy `7.1s`, and Basal Energy `6.0s`. Diagnostics now report aggregate/per-metric `DeltaEvents` so future logs can separate true HealthKit delta size from the final visible record count. |
 | 2026-06-04 | `a676df1` | Rebuild delta compact archives without large intermediate record arrays. | Follow-up full-profile delta report completed in `42.1s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=108, delta=19, initialImport=0`, and `DeltaEvents: 582`. Compared with the prior `47.4s` run, `SummedProcessingElapsed` dropped `25.9s -> 12.8s`; Heart Rate processing dropped `14.1s -> 6.4s` despite `187` delta events; Active Energy processing dropped `4.9s -> 2.5s` with `100` delta events; Basal Energy processing dropped `4.1s -> 2.0s` with `82` delta events. `SummedFinalizeElapsed` rose `16.0s -> 19.6s`, so the remaining bottleneck is now archive finalization / aggregate rebuild for changed high-volume types, not Swift archive reconstruction. |
-| 2026-06-04 | pending | Incrementally replace changed daily aggregate buckets. | The latest measured bottleneck is finalize work for changed high-volume delta types. Changed metrics now copy the previous observation's daily aggregates and recompute only the days touched by appeared, disappeared, or representation-changed events, with a bounded start-date window for the replacement query. Expected signal: lower `SummedFinalizeElapsed`, especially Heart Rate / Active Energy / Basal Energy finalize times, when `DeltaEvents` is small relative to the full visible type count. |
+| 2026-06-04 | `457fd80` | Incrementally replace changed daily aggregate buckets. | Follow-up full-profile delta report completed in `31.2s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=119, delta=8, initialImport=0`, and `DeltaEvents: 22`. Compared with the prior `42.1s` run, `SummedFinalizeElapsed` dropped `19.6s -> 15.5s` and Active Energy finalize dropped `4.8s -> 1.8s`. Heart Rate finalize barely moved (`8.9s -> 8.7s`) despite only `5` delta events, which indicates that daily aggregate replacement helped some changed metrics, but Heart Rate finalize is still dominated by the full `visibleAggregate` scan behind its type summary. |
+| 2026-06-04 | pending | Incrementally update changed type summaries. | Heart Rate still spent `8.7s` in finalize with only `5` delta events because verification for changed metrics computed `visibleRecordCount`, range, value sum, and max by scanning all visible Heart Rate rows. Changed metrics now attempt to derive the new type summary from the previous materialized summary plus appeared/removed version deltas, falling back to the full visible scan if a removed row may have been the previous earliest, latest, or max value. Expected signal: Heart Rate finalize time drops sharply when a small recent delta does not remove historical extrema. |
 
 ## Current Diagnosis
 
@@ -650,11 +651,10 @@ The likely bottleneck is per-row SQLite work:
   high-value target is `markVerification` / daily aggregate rebuild for changed
   high-volume delta types, where Heart Rate still spent `8.9s` finalizing and
   Active Energy spent `4.8s`.
-- Changed high-volume metrics should not rebuild daily aggregates for the whole
-  type when the current observation touched only a few days. The active
-  experiment is to copy previous daily aggregates and replace only affected
-  buckets; compare `DeltaEvents` and per-type finalize time before doing deeper
-  schema/index work.
+- Incremental daily aggregate replacement helped Active Energy finalize
+  (`4.8s -> 1.8s`) but barely moved Heart Rate (`8.9s -> 8.7s`). The next
+  bottleneck is the changed-type `visibleAggregate` summary query, which still
+  scans all visible rows for the type before daily aggregate replacement runs.
 
 ## Open Issues / Observations
 
@@ -693,11 +693,10 @@ Prioritize experiments in this order:
    `CaptureModes`, and high-volume type timings. Treat stable checksum, a high
    `unchangedDelta` count, and zero processing/insert as the main unchanged-path
    signal.
-4. Run a full-profile repeated capture after incremental changed-bucket
-   aggregate replacement. Compare `SummedFinalizeElapsed`, Heart Rate / Active
-   Energy / Basal Energy finalize times, and `DeltaEvents`. If finalize remains
-   high with small `DeltaEvents`, inspect the bounded replacement SQL plan before
-   adding new write-time indexes.
+4. Run a full-profile repeated capture after incremental type-summary updates.
+   Compare `SummedFinalizeElapsed`, Heart Rate finalize time, and `DeltaEvents`.
+   Success means Heart Rate finalize drops well below the previous `8.7s` when
+   its delta does not remove the previous earliest, latest, or max row.
 5. Keep using `DeltaEvents` to quantify changed high-volume metrics, especially
    Heart Rate, Active Energy, and Basal Energy. If delta events are small while
    finalize remains large, optimize aggregate rebuild/finalization rather than
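The rule of thumb in items 4 and 5 (small `DeltaEvents` with a large finalize time points at aggregate finalization) can be sketched as a filter over per-metric diagnostics. This is only an illustration: `MetricDiagnostics`, `finalizeOutliers`, and both thresholds are hypothetical names and values, not part of the project's diagnostics format. The Heart Rate figures are taken from the `457fd80` log row; the second entry is made up to exercise the filter.

```swift
import Foundation

// Hypothetical shape for one per-metric diagnostics entry;
// the real report format may differ.
struct MetricDiagnostics {
    let name: String
    let deltaEvents: Int
    let finalizeElapsed: TimeInterval  // seconds
}

// Returns the names of metrics whose finalize time looks out of proportion
// to their delta size. Both thresholds are illustrative assumptions.
func finalizeOutliers(
    in metrics: [MetricDiagnostics],
    maxDeltaEvents: Int = 50,
    minFinalizeSeconds: TimeInterval = 5.0
) -> [String] {
    metrics
        .filter { $0.deltaEvents <= maxDeltaEvents && $0.finalizeElapsed >= minFinalizeSeconds }
        .map { $0.name }
}

let report = [
    // Figures from the 457fd80 log row: 5 delta events, 8.7s finalize.
    MetricDiagnostics(name: "Heart Rate", deltaEvents: 5, finalizeElapsed: 8.7),
    // Hypothetical entry, included only to show a metric that is not flagged.
    MetricDiagnostics(name: "Example Metric", deltaEvents: 3, finalizeElapsed: 0.4),
]

print(finalizeOutliers(in: report))  // prints ["Heart Rate"]
```

A metric flagged this way is a candidate for the type-summary work described in the log, since its cost cannot be explained by the size of its delta.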
+182 -1
HealthProbe/Services/SQLiteHealthArchiveStore.swift
@@ -2047,7 +2047,12 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
             return
         }
 
-        let aggregate = try visibleAggregate(sampleTypeID: sampleTypeID, db: db)
+        let aggregate = try incrementalVisibleAggregate(
+            previous: previousSummary?.aggregate,
+            observationID: observationID,
+            sampleTypeID: sampleTypeID,
+            db: db
+        ) ?? visibleAggregate(sampleTypeID: sampleTypeID, db: db)
         let visibleCount = aggregate.visibleRecordCount
         try insertObservationTypeRun(
             observationID: observationID,
@@ -3876,6 +3881,156 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
         }
     }
 
+    private func incrementalVisibleAggregate(
+        previous: ArchiveV2VisibleAggregate?,
+        observationID: Int64,
+        sampleTypeID: Int64,
+        db: OpaquePointer?
+    ) throws -> ArchiveV2VisibleAggregate? {
+        guard let previous else { return nil }
+        let delta = try visibleAggregateDelta(
+            observationID: observationID,
+            sampleTypeID: sampleTypeID,
+            db: db
+        )
+        guard delta.addedCount > 0 || delta.removedCount > 0 else {
+            return previous
+        }
+
+        let visibleRecordCount = previous.visibleRecordCount + delta.addedCount - delta.removedCount
+        guard visibleRecordCount >= 0 else { return nil }
+        guard visibleRecordCount > 0 else {
+            return ArchiveV2VisibleAggregate(
+                visibleRecordCount: 0,
+                earliestStartDate: nil,
+                latestEndDate: nil,
+                valueSum: nil,
+                valueMax: nil
+            )
+        }
+
+        if let previousEarliest = previous.earliestStartDate,
+           let removedEarliest = delta.removedEarliestStartDate,
+           sameSQLiteDouble(previousEarliest, removedEarliest) {
+            return nil
+        }
+        if let previousLatest = previous.latestEndDate,
+           let removedLatest = delta.removedLatestEndDate,
+           sameSQLiteDouble(previousLatest, removedLatest) {
+            return nil
+        }
+        if let previousMax = previous.valueMax,
+           let removedMax = delta.removedValueMax,
+           sameSQLiteDouble(previousMax, removedMax) {
+            return nil
+        }
+
+        let earliestStartDate = minOptional(previous.earliestStartDate, delta.addedEarliestStartDate)
+        let latestEndDate = maxOptional(previous.latestEndDate, delta.addedLatestEndDate)
+        let hasValueSum = previous.valueSum != nil || delta.addedValueSum != nil || delta.removedValueSum != nil
+        let valueSum = hasValueSum
+            ? (previous.valueSum ?? 0) + (delta.addedValueSum ?? 0) - (delta.removedValueSum ?? 0)
+            : nil
+        let valueMax = maxOptional(previous.valueMax, delta.addedValueMax)
+
+        return ArchiveV2VisibleAggregate(
+            visibleRecordCount: visibleRecordCount,
+            earliestStartDate: earliestStartDate,
+            latestEndDate: latestEndDate,
+            valueSum: valueSum,
+            valueMax: valueMax
+        )
+    }
+
+    private func visibleAggregateDelta(
+        observationID: Int64,
+        sampleTypeID: Int64,
+        db: OpaquePointer?
+    ) throws -> ArchiveV2VisibleAggregateDelta {
+        let sql = """
+        WITH added_versions AS (
+            SELECT v.start_date, v.end_date, v.numeric_value
+            FROM sample_observation_events e
+            JOIN samples s ON s.id = e.sample_id
+            JOIN sample_versions v ON v.id = e.version_id
+            WHERE e.observation_id = ?
+              AND s.sample_type_id = ?
+              AND e.version_id IS NOT NULL
+              AND e.event_kind IN ('appeared', 'representationChanged')
+        ),
+        removed_versions AS (
+            SELECT v.start_date, v.end_date, v.numeric_value
+            FROM sample_visibility_ranges r
+            JOIN samples s ON s.id = r.sample_id
+            JOIN sample_versions v ON v.id = r.version_id
+            WHERE r.last_observation_id = ?
+              AND s.sample_type_id = ?
+        )
+        SELECT
+            (SELECT COUNT(*) FROM added_versions),
+            (SELECT MIN(start_date) FROM added_versions),
+            (SELECT MAX(end_date) FROM added_versions),
+            (SELECT SUM(numeric_value) FROM added_versions),
+            (SELECT MAX(numeric_value) FROM added_versions),
+            (SELECT COUNT(*) FROM removed_versions),
+            (SELECT MIN(start_date) FROM removed_versions),
+            (SELECT MAX(end_date) FROM removed_versions),
+            (SELECT SUM(numeric_value) FROM removed_versions),
+            (SELECT MAX(numeric_value) FROM removed_versions)
+        """
+        return try withStatement(sql, db: db) { statement in
+            bindInt64(observationID, to: 1, in: statement)
+            bindInt64(sampleTypeID, to: 2, in: statement)
+            bindInt64(observationID, to: 3, in: statement)
+            bindInt64(sampleTypeID, to: 4, in: statement)
+            guard sqlite3_step(statement) == SQLITE_ROW else {
+                return ArchiveV2VisibleAggregateDelta.empty
+            }
+            return ArchiveV2VisibleAggregateDelta(
+                addedCount: columnInt(statement, 0) ?? 0,
+                addedEarliestStartDate: columnDouble(statement, 1),
+                addedLatestEndDate: columnDouble(statement, 2),
+                addedValueSum: columnDouble(statement, 3),
+                addedValueMax: columnDouble(statement, 4),
+                removedCount: columnInt(statement, 5) ?? 0,
+                removedEarliestStartDate: columnDouble(statement, 6),
+                removedLatestEndDate: columnDouble(statement, 7),
+                removedValueSum: columnDouble(statement, 8),
+                removedValueMax: columnDouble(statement, 9)
+            )
+        }
+    }
+
+    private func minOptional(_ lhs: Double?, _ rhs: Double?) -> Double? {
+        switch (lhs, rhs) {
+        case let (lhs?, rhs?):
+            return min(lhs, rhs)
+        case let (lhs?, nil):
+            return lhs
+        case let (nil, rhs?):
+            return rhs
+        case (nil, nil):
+            return nil
+        }
+    }
+
+    private func maxOptional(_ lhs: Double?, _ rhs: Double?) -> Double? {
+        switch (lhs, rhs) {
+        case let (lhs?, rhs?):
+            return max(lhs, rhs)
+        case let (lhs?, nil):
+            return lhs
+        case let (nil, rhs?):
+            return rhs
+        case (nil, nil):
+            return nil
+        }
+    }
+
+    private func sameSQLiteDouble(_ lhs: Double, _ rhs: Double) -> Bool {
+        abs(lhs - rhs) < 0.000_001
+    }
+
     private func bindDiffObservationIDs(
         _ fromObservationID: Int64,
         _ toObservationID: Int64,
@@ -4384,6 +4539,32 @@ private struct ArchiveV2VisibleAggregate {
     let valueMax: Double?
 }
 
+private struct ArchiveV2VisibleAggregateDelta {
+    let addedCount: Int
+    let addedEarliestStartDate: Double?
+    let addedLatestEndDate: Double?
+    let addedValueSum: Double?
+    let addedValueMax: Double?
+    let removedCount: Int
+    let removedEarliestStartDate: Double?
+    let removedLatestEndDate: Double?
+    let removedValueSum: Double?
+    let removedValueMax: Double?
+
+    static let empty = ArchiveV2VisibleAggregateDelta(
+        addedCount: 0,
+        addedEarliestStartDate: nil,
+        addedLatestEndDate: nil,
+        addedValueSum: nil,
+        addedValueMax: nil,
+        removedCount: 0,
+        removedEarliestStartDate: nil,
+        removedLatestEndDate: nil,
+        removedValueSum: nil,
+        removedValueMax: nil
+    )
+}
+
 private struct ArchiveV2DailyAggregateRow {
     let bucketStart: Double
     let bucketEnd: Double
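One way to gain confidence in the incremental summary path is to compare it against a full scan on small in-memory data. The sketch below is a standalone approximation of the idea in `incrementalVisibleAggregate`, not the store's code: `Row`, `Summary`, `fullScan`, and `incremental` are hypothetical names, it works on arrays instead of SQLite rows, and it assumes every row has a numeric value. It mirrors the diff's rule of returning `nil` (meaning "fall back to the full scan") when a removed row matches the previous earliest start, latest end, or max value.

```swift
// Hypothetical stand-ins for archive rows and the materialized type summary.
struct Row { let start: Double; let end: Double; let value: Double }

struct Summary: Equatable {
    let count: Int
    let earliest: Double?
    let latest: Double?
    let sum: Double?
    let maxValue: Double?
}

// Reference result: recompute everything from the visible rows.
func fullScan(_ rows: [Row]) -> Summary {
    let values = rows.map { $0.value }
    return Summary(
        count: rows.count,
        earliest: rows.map { $0.start }.min(),
        latest: rows.map { $0.end }.max(),
        sum: rows.isEmpty ? nil : values.reduce(0, +),
        maxValue: values.max()
    )
}

// Incremental path: derive the new summary from the previous one plus deltas.
// Returns nil when a removed row may have been a previous extreme.
func incremental(previous: Summary, added: [Row], removed: [Row]) -> Summary? {
    let count = previous.count + added.count - removed.count
    guard count >= 0 else { return nil }
    guard count > 0 else { return fullScan([]) }

    // Same tolerance as sameSQLiteDouble in the diff.
    func same(_ a: Double?, _ b: Double?) -> Bool {
        guard let a = a, let b = b else { return false }
        return abs(a - b) < 0.000_001
    }
    if same(previous.earliest, removed.map { $0.start }.min())
        || same(previous.latest, removed.map { $0.end }.max())
        || same(previous.maxValue, removed.map { $0.value }.max()) {
        return nil
    }

    let addedSum: Double = added.map { $0.value }.reduce(0, +)
    let removedSum: Double = removed.map { $0.value }.reduce(0, +)
    return Summary(
        count: count,
        earliest: (added.map { $0.start } + [previous.earliest].compactMap { $0 }).min(),
        latest: (added.map { $0.end } + [previous.latest].compactMap { $0 }).max(),
        sum: (previous.sum ?? 0) + addedSum - removedSum,
        maxValue: (added.map { $0.value } + [previous.maxValue].compactMap { $0 }).max()
    )
}

// Made-up sample data: the first row holds both the earliest start and the max.
let base = [
    Row(start: 0, end: 1, value: 90),
    Row(start: 10, end: 11, value: 60),
    Row(start: 20, end: 21, value: 70),
]
let previous = fullScan(base)
let added = [Row(start: 30, end: 31, value: 80)]
let after = [base[0], base[2]] + added  // base[1] removed, one row added

print(incremental(previous: previous, added: added, removed: [base[1]]) == fullScan(after))
print(incremental(previous: previous, added: [], removed: [base[0]]) == nil)
```

The first comparison removes a row that is not an extreme, so the incremental result should equal the full scan. The second removes the row holding the earliest start and the max, so the sketch declines to answer and the caller would fall back to the full scan.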