Showing 3 changed files with 213 additions and 25 deletions
+23 -10
HealthProbe/Doc/04-project/Import-Optimization-Log.md
@@ -589,7 +589,8 @@ rows exist".
589 589
 | 2026-06-04 | `4894b77` | Patch compact archives from delta without full record maps. | Follow-up full-profile delta report completed in `52.1s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=106, delta=21, initialImport=0`, and `DeltaEvents: 11,093`. This is not comparable to the previous `46`-event baseline: Active Energy had `2,377` events, Basal Energy `2,347`, Cycling Distance `6,052`, and Heart Rate `231`. Processing remained bounded relative to delta size (`SummedProcessingElapsed: 16.0s`; Heart Rate `5.9s`, Active Energy `2.2s`, Basal Energy `1.9s`, Cycling Distance `4.2s`), but wall clock rose because fetch `16.1s`, insert `2.3s`, and finalize `14.8s` all had real work. Conclusion: compact dictionary removal did not regress and looks healthy for large deltas, but a small-delta repeat is still needed to validate the original `6.1s` Heart Rate target. |
590 590
 | 2026-06-04 | `1ba6c38` | Hash delta compact archives in recorded order. | Small-delta follow-up completed in `24.0s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=124, delta=3, initialImport=0`, and `DeltaEvents: 13`. This finally validated the remaining bottleneck: Heart Rate still spent `5.7s` processing only `6` events, Active Energy spent `2.1s` for `5` events, and Basal Energy spent `1.8s` for `2` events. The delta rebuild no longer builds a full UUID record map, but it still collected every fingerprint into a large array and sorted it for the per-type hash. Delta rebuild now uses the same recorded-order `TypeHashBuilder` strategy as initial import, avoiding the all-record fingerprint array and sort. |
591 591
 | 2026-06-04 | `d4de48c` | Copy unchanged daily aggregates inside SQLite. | First small-delta run before this commit completed in `23.6s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=120, delta=7`, and `DeltaEvents: 11`. The hash change helped processing (`9.6s -> 8.1s`; Heart Rate `5.7s -> 4.2s`), but finalization stayed high (`11.4s`, Heart Rate `4.8s`). Changed daily aggregates were copying all previous daily rows through Swift before replacing affected buckets. This commit moved unchanged daily aggregate copying to SQLite `INSERT ... SELECT`, while affected buckets remain recalculated. Follow-up report completed in `21.1s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=123, delta=4`, and `DeltaEvents: 50`. `SummedFinalizeElapsed` improved `11.4s -> 9.3s` and wall clock improved `23.6s -> 21.1s`; however Heart Rate finalize was still `5.0s` with `4` events, so this helped overall finalize cost but did not remove the high-volume changed-type floor. |
592
-| 2026-06-04 | pending | Include build identity in diagnostic reports. | The latest diagnostic report only included `App Version: 1.0(1)` near the end and did not include a commit/build source identifier. This makes it too easy to compare reports from the wrong installed binary. Diagnostics now emit `appVersion`, `buildFingerprint`, `sourceCommit`, and `sourceDirty` in `OPERATION METADATA`. `buildFingerprint` is derived from the installed executable and should change when a different binary is installed; `sourceCommit/sourceDirty` remain available for builds that inject those Info.plist keys. Expected signal: future pasted reports have enough build identity to detect wrong-version reports immediately. |
592
+| 2026-06-04 | `c9091de` | Include build identity in diagnostic reports. | The latest diagnostic report only included `App Version: 1.0(1)` near the end and did not include a commit/build source identifier. This makes it too easy to compare reports from the wrong installed binary. Diagnostics now emit `appVersion`, `buildFingerprint`, `sourceCommit`, and `sourceDirty` in `OPERATION METADATA`. `buildFingerprint` is derived from the installed executable and should change when a different binary is installed; `sourceCommit/sourceDirty` remain available for builds that inject those Info.plist keys. Expected signal: future pasted reports have enough build identity to detect wrong-version reports immediately. |
593
+| 2026-06-04 | pending | Avoid summary full scans when delta safely replaces extremes. | Follow-up after `d4de48c` still showed `SummedFinalizeElapsed: 9.3s`, with Heart Rate finalize at `5.0s` despite only `4` Heart Rate delta events. `markVerification` already tries to update type summaries incrementally, but it fell back to a full visible-row aggregate scan whenever a removed row matched the previous earliest/latest/max, even if the same delta added a row that safely preserved or extended that extreme. The fallback is now narrower: the full scan is still used when an extreme becomes unknown, but not when added rows replace the removed earliest/latest/max with an equivalent or stronger value. Expected signal: on repeated full-profile captures with small recent deltas, `SummedFinalizeElapsed` and high-volume type finalize times, especially Heart Rate, should drop when deltas replace recent/latest rows. Correctness is covered by synthetic SQLite archive tests for the replaced-latest/max and removed-latest-without-replacement cases. |
593 594
 
594 595
 ## Current Diagnosis
595 596
 
@@ -688,6 +689,11 @@ The likely bottleneck is per-row SQLite work:
688 689
   from conversation/order unless backed by an external build note. New reports
689 690
   should include `buildFingerprint`; `sourceCommit` and `sourceDirty` may still
690 691
   be `unknown` unless the build pipeline injects those Info.plist keys.
692
+- Incremental type-summary finalization was still too conservative around
693
+  removed extremes. A delta that removes the previous latest/max but adds a
694
+  newer/larger replacement no longer needs a full visible aggregate scan. The
695
+  archive still falls back to the full scan when a removed earliest/latest/max
696
+  leaves the new extreme genuinely unknown.
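
The narrowed fallback rule above can be illustrated with a minimal standalone sketch. It mirrors the shape of `incrementallyUpdatedMaximum` from this change, but it is not the store's code: `updatedMaximum`, `ExtremeUpdate`, and `approximatelyEqual` are hypothetical names, and the `1e-6` tolerance is an assumption standing in for whatever `sameSQLiteDouble` actually does.

```swift
import Foundation

/// `isKnown == false` means the caller should fall back to a full aggregate scan.
typealias ExtremeUpdate = (isKnown: Bool, value: Double?)

/// Hypothetical stand-in for `sameSQLiteDouble`; the tolerance is an assumption.
func approximatelyEqual(_ lhs: Double, _ rhs: Double, tolerance: Double = 1e-6) -> Bool {
    abs(lhs - rhs) <= tolerance
}

func updatedMaximum(previous: Double?, removed: Double?, added: Double?) -> ExtremeUpdate {
    // No previous extreme: the added extreme (possibly nil) becomes the new one.
    guard let previous else { return (true, added) }
    // The previous extreme was not removed, so it can only be preserved or extended.
    guard let removed, approximatelyEqual(previous, removed) else {
        return (true, max(previous, added ?? previous))
    }
    // The previous extreme was removed: the result is known only if an added
    // row is at least as large as the value that disappeared.
    guard let added, added >= previous || approximatelyEqual(added, previous) else {
        return (false, nil)
    }
    return (true, added)
}

// Values taken from the synthetic tests in this change (end dates in seconds).
let replaced = updatedMaximum(previous: 4_300, removed: 4_300, added: 5_300)
let removedOnly = updatedMaximum(previous: 4_300, removed: 4_300, added: nil)
let untouched = updatedMaximum(previous: 4_300, removed: 1_300, added: nil)
```

Here `replaced` is known with value `5_300`, `removedOnly` is unknown and would trigger the full scan, and `untouched` keeps `4_300`. The minimum case is symmetric, with `<=` and `min` in place of `>=` and `max`.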
691 697
 
692 698
 ## Open Issues / Observations
693 699
 
@@ -746,21 +752,28 @@ Prioritize experiments in this order:
746 752
    identity unless the build provenance is otherwise certain. `sourceCommit`
747 753
    and `sourceDirty` are useful when present, but may be `unknown` for normal
748 754
    Xcode test installs.
749
-8. Investigate full-profile empty anchored-query cost for zero-count types.
755
+8. Run a repeated full-profile capture after narrowing incremental summary
756
+   fallback. Compare `SummedFinalizeElapsed` and Heart Rate finalize against the
757
+   `9.3s` total finalize / `5.0s` Heart Rate finalize report after `d4de48c`.
758
+   This optimization should help only when changed high-volume metrics replace
759
+   extremes safely; if Heart Rate finalize remains around `5s`, the remaining
760
+   cost is likely daily-bucket replacement or compact archive processing rather
761
+   than type summary full-scan fallback.
762
+9. Investigate full-profile empty anchored-query cost for zero-count types.
750 763
    Compare slow empty types across reports before changing behavior; any skip or
751 764
    lower-frequency strategy must preserve the promise that full authorized
752 765
    backup can notice newly appearing data.
753
-9. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
754
-10. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
755
-11. Profile whether index maintenance dominates first-import insert cost.
756
-12. Consider a guarded bulk-import mode for first observations:
766
+10. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
767
+11. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
768
+12. Profile whether index maintenance dominates first-import insert cost.
769
+13. Consider a guarded bulk-import mode for first observations:
757 770
    - keep archive semantics unchanged;
758 771
    - only relax work that can be safely reconstructed or validated;
759 772
    - re-enable normal idempotent paths for incremental observations.
760
-13. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
761
-14. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
762
-15. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
763
-16. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
773
+14. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
774
+15. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
775
+16. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
776
+17. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
764 777
 
765 778
 ## Verification Checklist For Each Optimization
766 779
 
+47 -15
HealthProbe/Services/SQLiteHealthArchiveStore.swift
@@ -3934,36 +3934,42 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
3934 3934
             )
3935 3935
         }
3936 3936
 
3937
-        if let previousEarliest = previous.earliestStartDate,
3938
-           let removedEarliest = delta.removedEarliestStartDate,
3939
-           sameSQLiteDouble(previousEarliest, removedEarliest) {
3937
+        let earliestUpdate = incrementallyUpdatedMinimum(
3938
+            previous: previous.earliestStartDate,
3939
+            removed: delta.removedEarliestStartDate,
3940
+            added: delta.addedEarliestStartDate
3941
+        )
3942
+        guard earliestUpdate.isKnown else {
3940 3943
             return nil
3941 3944
         }
3942
-        if let previousLatest = previous.latestEndDate,
3943
-           let removedLatest = delta.removedLatestEndDate,
3944
-           sameSQLiteDouble(previousLatest, removedLatest) {
3945
+        let latestUpdate = incrementallyUpdatedMaximum(
3946
+            previous: previous.latestEndDate,
3947
+            removed: delta.removedLatestEndDate,
3948
+            added: delta.addedLatestEndDate
3949
+        )
3950
+        guard latestUpdate.isKnown else {
3945 3951
             return nil
3946 3952
         }
3947
-        if let previousMax = previous.valueMax,
3948
-           let removedMax = delta.removedValueMax,
3949
-           sameSQLiteDouble(previousMax, removedMax) {
3953
+        let maxUpdate = incrementallyUpdatedMaximum(
3954
+            previous: previous.valueMax,
3955
+            removed: delta.removedValueMax,
3956
+            added: delta.addedValueMax
3957
+        )
3958
+        guard maxUpdate.isKnown else {
3950 3959
             return nil
3951 3960
         }
3952 3961
 
3953
-        let earliestStartDate = minOptional(previous.earliestStartDate, delta.addedEarliestStartDate)
3954
-        let latestEndDate = maxOptional(previous.latestEndDate, delta.addedLatestEndDate)
3955 3962
         let hasValueSum = previous.valueSum != nil || delta.addedValueSum != nil || delta.removedValueSum != nil
3956 3963
         let valueSum = hasValueSum
3957 3964
             ? (previous.valueSum ?? 0) + (delta.addedValueSum ?? 0) - (delta.removedValueSum ?? 0)
3958 3965
             : nil
3959
-        let valueMax = maxOptional(previous.valueMax, delta.addedValueMax)
3960 3966
 
3961 3967
         return ArchiveV2VisibleAggregate(
3962 3968
             visibleRecordCount: visibleRecordCount,
3963
-            earliestStartDate: earliestStartDate,
3964
-            latestEndDate: latestEndDate,
3969
+            earliestStartDate: earliestUpdate.value,
3970
+            latestEndDate: latestUpdate.value,
3965 3971
             valueSum: valueSum,
3966
-            valueMax: valueMax
3972
+            valueMax: maxUpdate.value
3967 3973
         )
3968 3974
     }
3969 3975
 
@@ -4026,6 +4032,32 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
4026 4032
         }
4027 4033
     }
4028 4034
 
4035
+    private func incrementallyUpdatedMinimum(previous: Double?, removed: Double?, added: Double?) -> (isKnown: Bool, value: Double?) {
4036
+        guard let previous else {
4037
+            return (true, added)
4038
+        }
4039
+        guard let removed, sameSQLiteDouble(previous, removed) else {
4040
+            return (true, minOptional(previous, added))
4041
+        }
4042
+        guard let added, added <= previous || sameSQLiteDouble(added, previous) else {
4043
+            return (false, nil)
4044
+        }
4045
+        return (true, added)
4046
+    }
4047
+
4048
+    private func incrementallyUpdatedMaximum(previous: Double?, removed: Double?, added: Double?) -> (isKnown: Bool, value: Double?) {
4049
+        guard let previous else {
4050
+            return (true, added)
4051
+        }
4052
+        guard let removed, sameSQLiteDouble(previous, removed) else {
4053
+            return (true, maxOptional(previous, added))
4054
+        }
4055
+        guard let added, added >= previous || sameSQLiteDouble(added, previous) else {
4056
+            return (false, nil)
4057
+        }
4058
+        return (true, added)
4059
+    }
4060
+
4029 4061
     private func minOptional(_ lhs: Double?, _ rhs: Double?) -> Double? {
4030 4062
         switch (lhs, rhs) {
4031 4063
         case let (lhs?, rhs?):
+143 -0
HealthProbeTests/SQLiteHealthArchiveStoreTests.swift
@@ -224,6 +224,100 @@ final class SQLiteHealthArchiveStoreTests: XCTestCase {
224 224
         )
225 225
     }
226 226
 
227
+    func testChangedVerificationKeepsSummaryCorrectWhenLatestAndMaxAreReplaced() async throws {
228
+        let url = databaseURL()
229
+        let store = SQLiteHealthArchiveStore(databaseURL: url)
230
+        let oldSample = makeStepCountSample(value: 10, start: 1_000, end: 1_300)
231
+        let oldLatestMaxSample = makeStepCountSample(value: 20, start: 4_000, end: 4_300)
232
+        let newLatestMaxSample = makeStepCountSample(value: 25, start: 5_000, end: 5_300)
233
+        let typeIdentifier = HKQuantityTypeIdentifier.stepCount.rawValue
234
+
235
+        _ = try await store.upsertSamples(
236
+            [oldSample, oldLatestMaxSample],
237
+            observedAt: Date(timeIntervalSince1970: 6_000)
238
+        )
239
+        try await store.markVerification(
240
+            sampleType: oldSample.sampleType,
241
+            verifiedAt: Date(timeIntervalSince1970: 6_060)
242
+        )
243
+        let changedObservationID = try await store.beginObservation(
244
+            observedAt: Date(timeIntervalSince1970: 6_120),
245
+            triggerReason: "manual",
246
+            selectedTypeSetHash: "selected-types"
247
+        )
248
+
249
+        _ = try await store.upsertSamples(
250
+            [newLatestMaxSample],
251
+            observedAt: Date(timeIntervalSince1970: 6_120),
252
+            observationID: changedObservationID
253
+        )
254
+        try await store.recordDisappearance(
255
+            sampleUUIDHash: HashService.sampleUUIDHash(oldLatestMaxSample.uuid.uuidString),
256
+            sampleTypeIdentifier: typeIdentifier,
257
+            observedMissingAt: Date(timeIntervalSince1970: 6_120),
258
+            observationID: changedObservationID
259
+        )
260
+        try await store.markVerification(
261
+            sampleType: oldSample.sampleType,
262
+            verifiedAt: Date(timeIntervalSince1970: 6_120),
263
+            observationID: changedObservationID
264
+        )
265
+
266
+        let summary = try typeSummary(observationID: changedObservationID, at: url)
267
+
268
+        XCTAssertEqual(summary.visibleRecordCount, 2)
269
+        XCTAssertEqual(summary.appearedCount, 1)
270
+        XCTAssertEqual(summary.disappearedCount, 1)
271
+        XCTAssertEqual(summary.earliestStartDate, oldSample.startDate.timeIntervalSince1970, accuracy: 0.000_001)
272
+        XCTAssertEqual(summary.latestEndDate, newLatestMaxSample.endDate.timeIntervalSince1970, accuracy: 0.000_001)
273
+        XCTAssertEqual(summary.valueSum, 35, accuracy: 0.000_001)
274
+        XCTAssertEqual(summary.valueMax, 25, accuracy: 0.000_001)
275
+    }
276
+
277
+    func testChangedVerificationKeepsSummaryCorrectWhenLatestIsRemovedWithoutReplacement() async throws {
278
+        let url = databaseURL()
279
+        let store = SQLiteHealthArchiveStore(databaseURL: url)
280
+        let remainingSample = makeStepCountSample(value: 10, start: 1_000, end: 1_300)
281
+        let removedLatestSample = makeStepCountSample(value: 20, start: 4_000, end: 4_300)
282
+        let typeIdentifier = HKQuantityTypeIdentifier.stepCount.rawValue
283
+
284
+        _ = try await store.upsertSamples(
285
+            [remainingSample, removedLatestSample],
286
+            observedAt: Date(timeIntervalSince1970: 6_000)
287
+        )
288
+        try await store.markVerification(
289
+            sampleType: remainingSample.sampleType,
290
+            verifiedAt: Date(timeIntervalSince1970: 6_060)
291
+        )
292
+        let changedObservationID = try await store.beginObservation(
293
+            observedAt: Date(timeIntervalSince1970: 6_120),
294
+            triggerReason: "manual",
295
+            selectedTypeSetHash: "selected-types"
296
+        )
297
+
298
+        try await store.recordDisappearance(
299
+            sampleUUIDHash: HashService.sampleUUIDHash(removedLatestSample.uuid.uuidString),
300
+            sampleTypeIdentifier: typeIdentifier,
301
+            observedMissingAt: Date(timeIntervalSince1970: 6_120),
302
+            observationID: changedObservationID
303
+        )
304
+        try await store.markVerification(
305
+            sampleType: remainingSample.sampleType,
306
+            verifiedAt: Date(timeIntervalSince1970: 6_120),
307
+            observationID: changedObservationID
308
+        )
309
+
310
+        let summary = try typeSummary(observationID: changedObservationID, at: url)
311
+
312
+        XCTAssertEqual(summary.visibleRecordCount, 1)
313
+        XCTAssertEqual(summary.appearedCount, 0)
314
+        XCTAssertEqual(summary.disappearedCount, 1)
315
+        XCTAssertEqual(summary.earliestStartDate, remainingSample.startDate.timeIntervalSince1970, accuracy: 0.000_001)
316
+        XCTAssertEqual(summary.latestEndDate, remainingSample.endDate.timeIntervalSince1970, accuracy: 0.000_001)
317
+        XCTAssertEqual(summary.valueSum, 10, accuracy: 0.000_001)
318
+        XCTAssertEqual(summary.valueMax, 10, accuracy: 0.000_001)
319
+    }
320
+
227 321
     func testDiffSummaryAndRecordsBetweenObservationsUseSQLVisibility() async throws {
228 322
         let url = databaseURL()
229 323
         let store = SQLiteHealthArchiveStore(databaseURL: url)
@@ -780,6 +874,55 @@ final class SQLiteHealthArchiveStoreTests: XCTestCase {
780 874
         return ids
781 875
     }
782 876
 
877
+    private func typeSummary(observationID: Int64, at url: URL) throws -> (
878
+        visibleRecordCount: Int,
879
+        appearedCount: Int,
880
+        disappearedCount: Int,
881
+        earliestStartDate: Double,
882
+        latestEndDate: Double,
883
+        valueSum: Double,
884
+        valueMax: Double
885
+    ) {
886
+        var db: OpaquePointer?
887
+        guard sqlite3_open_v2(url.path, &db, SQLITE_OPEN_READONLY | SQLITE_OPEN_FULLMUTEX, nil) == SQLITE_OK else {
888
+            sqlite3_close(db)
889
+            XCTFail("Could not open test database")
890
+            return (0, 0, 0, 0, 0, 0, 0)
891
+        }
892
+        defer { sqlite3_close(db) }
893
+
894
+        let sql = """
895
+        SELECT visible_record_count, appeared_count, disappeared_count,
896
+               earliest_start_date, latest_end_date, value_sum, value_max
897
+        FROM observation_type_summaries
898
+        WHERE observation_id = ?
899
+        LIMIT 1
900
+        """
901
+        var statement: OpaquePointer?
902
+        guard sqlite3_prepare_v2(db, sql, -1, &statement, nil) == SQLITE_OK else {
903
+            sqlite3_finalize(statement)
904
+            XCTFail("Could not prepare type summary query")
905
+            return (0, 0, 0, 0, 0, 0, 0)
906
+        }
907
+        defer { sqlite3_finalize(statement) }
908
+
909
+        sqlite3_bind_int64(statement, 1, observationID)
910
+        guard sqlite3_step(statement) == SQLITE_ROW else {
911
+            XCTFail("Missing type summary row")
912
+            return (0, 0, 0, 0, 0, 0, 0)
913
+        }
914
+
915
+        return (
916
+            visibleRecordCount: Int(sqlite3_column_int(statement, 0)),
917
+            appearedCount: Int(sqlite3_column_int(statement, 1)),
918
+            disappearedCount: Int(sqlite3_column_int(statement, 2)),
919
+            earliestStartDate: sqlite3_column_double(statement, 3),
920
+            latestEndDate: sqlite3_column_double(statement, 4),
921
+            valueSum: sqlite3_column_double(statement, 5),
922
+            valueMax: sqlite3_column_double(statement, 6)
923
+        )
924
+    }
925
+
783 926
     private func sampleVersionDebugRows(at url: URL) throws -> String {
784 927
         var db: OpaquePointer?
785 928
         guard sqlite3_open_v2(url.path, &db, SQLITE_OPEN_READONLY | SQLITE_OPEN_FULLMUTEX, nil) == SQLITE_OK else {