Showing 2 changed files with 50 additions and 103 deletions
+14 -17
HealthProbe/Doc/04-project/Import-Optimization-Log.md
@@ -597,7 +597,8 @@ rows exist".
 | 2026-06-05 | `6041bac` | Split daily aggregate finalization timings. | The first finalization phase report identified daily aggregate work as the remaining changed-type bottleneck, but `finalizeDailyAggregateElapsed` still mixed affected-bucket lookup, previous aggregate copy, destination delete, affected-bucket rebuild, replacement insert, and residual SQL/transaction overhead. Diagnostics now emit aggregate and per-type daily subphase fields: bucket lookup, copy, delete, rebuild, insert, and other. Follow-up report with `buildFingerprint: 1.0(1)-1780618540-92064` completed in `23.5s`, with `127/127` complete, `CaptureModes: unchangedDelta=118, delta=9`, and `DeltaEvents: 97`. Finalization was `10.5s`, daily aggregate work was `7.4s`, and daily rebuild alone was `6.9s`; daily copy was only `0.5s`. Heart Rate had `40` delta events and spent `4.8s` finalizing, of which `3.8s` was daily aggregate rebuild. Conclusion: copying previous materialized daily rows is not the bottleneck; affected-bucket rebuild scans are. |
 | 2026-06-05 | `bf5a861` | Rebuild changed daily aggregate buckets from time-ranged versions. | Confirmed on two full-profile repeated captures with `buildFingerprint: 1.0(1)-1780640325-92064`. The overnight-data run completed in `21.8s` with `127/127` complete, `CaptureModes: unchangedDelta=111, delta=16`, and `DeltaEvents: 322`; daily aggregate rebuild dropped from `6.9s` to `0.0s`, with daily aggregate work now only `0.6s` copy. The low-delta run completed in `6.6s` with `CaptureModes: unchangedDelta=125, delta=2`; daily aggregate rebuild again stayed `0.0s`. Conclusion: the time-ranged `sample_versions(start_date, sample_id)` query solved the affected-bucket rebuild bottleneck. Continue monitoring first-import insert timing because the new index is a write-path tradeoff. |
 | 2026-06-05 | `e7d45a2` | Count observation events from the event table first. | Confirmed on a full-profile repeated capture with `buildFingerprint: 1.0(1)-1780646299-92064`: wall clock `15.6s`, `127/127` complete, `CaptureModes: unchangedDelta=114, delta=13`, and `DeltaEvents: 99`. `SummedFinalizeEventCountElapsed` dropped from the post-`bf5a861` overnight baseline `2.5s` to `0.0s`; total finalize dropped to `1.7s`, while daily aggregate rebuild stayed `0.0s`. Remaining cost is processing: `9.1s` total, led by Heart Rate `4.6s` for `18` events, Active Energy `1.8s` for `11` events, and Basal Energy `1.5s` for `5` events. |
-| 2026-06-05 | pending | Split processing timing diagnostics. | With finalization reduced to `1.7s`, `SummedProcessingElapsed` is the active repeated-capture bottleneck. Diagnostics now split processing into delta apply, compact record archive rebuild, initial record processing, compact archive/hash finalization, and other. Expected signal: quantify how much of Heart Rate's `4.6s` processing is the legacy compact archive rebuild versus other processing. |
+| 2026-06-05 | `cfd9de8` | Split processing timing diagnostics. | Confirmed on a full-profile repeated capture with `buildFingerprint: 1.0(1)-1780683224-92064`: wall clock `14.7s`, `127/127` complete, `CaptureModes: unchangedDelta=120, delta=7`, and `DeltaEvents: 11`. `SummedProcessingElapsed` was `8.5s` and `SummedProcessingRecordArchiveRebuildElapsed` was also `8.5s`; delta apply, initial record processing, record archive finalization, and processing other all rounded to `0.0s`. Per-type rebuild cost dominated changed high-volume metrics: Heart Rate `4.4s`, Active Energy `1.7s`, Basal Energy `1.5s`, Steps `0.4s`, and Walking + Running Distance `0.4s`. Conclusion: the repeated-capture bottleneck is no longer SQLite finalization; it is the legacy compact `recordArchiveData` rebuild for changed types. |
+| 2026-06-05 | pending | Skip legacy compact archive rebuild for SQLite-backed deltas. | Changed delta captures now derive the saved count/date range/hash state from the previous SQLite-backed capture state plus the current HealthKit delta events, without decoding and rewriting the legacy compact record archive. TypeCount no longer writes an empty compact archive when no legacy records are present. Expected signal: `SummedProcessingRecordArchiveRebuildElapsed` should drop from the `cfd9de8` baseline `8.5s` toward `0.0s`, with Heart Rate no longer spending about `4.4s` to process tiny deltas. |
 
 ## Current Diagnosis
 
@@ -734,11 +735,11 @@ The likely bottleneck is per-row SQLite work:
 - The `e7d45a2` follow-up report validated the event-count query shape:
   `SummedFinalizeEventCountElapsed` dropped to `0.0s` and finalization is no
   longer the active repeated-capture bottleneck.
-- The active repeated-capture bottleneck is now processing, specifically the
-  legacy compact record archive/hash path for changed high-volume metrics. The
-  latest report spent `9.1s` in processing, including Heart Rate `4.6s` for
-  only `18` delta events. The next diagnostic split must quantify compact
-  archive rebuild directly before replacing it with a SQLite-derived state path.
+- The `cfd9de8` diagnostic split confirmed the active repeated-capture
+  bottleneck is the legacy compact record archive/hash path for changed
+  high-volume metrics. The latest report spent `8.5s` in processing and all of
+  it was `processingRecordArchiveRebuildElapsed`, including Heart Rate `4.4s`
+  for a tiny delta.
 - A large older-build first import on an `8.4M`-record database completed but
   took `166m10s`, with `137m31s` summed insert time. This confirms that full
   authorized backup volume can be much larger than the original 15-type test
@@ -802,20 +803,16 @@ Prioritize experiments in this order:
    identity unless the build provenance is otherwise certain. `sourceCommit`
    and `sourceDirty` are useful when present, but may be `unknown` for normal
    Xcode test installs.
-8. Run a repeated full-profile capture after processing subphase diagnostics.
-   Compare `SummedProcessingRecordArchiveRebuildElapsed`,
-   `SummedProcessingDeltaApplyElapsed`,
-   `SummedProcessingRecordArchiveFinalizeElapsed`, and per-type values against
-   the `e7d45a2` processing baseline: `9.1s` total, Heart Rate `4.6s`, Active
-   Energy `1.8s`, Basal Energy `1.5s`.
+8. Run a repeated full-profile capture after skipping legacy compact archive
+   rebuild for SQLite-backed deltas. Compare
+   `SummedProcessingRecordArchiveRebuildElapsed` against the `cfd9de8` baseline:
+   `8.5s` total, Heart Rate `4.4s`, Active Energy `1.7s`, Basal Energy `1.5s`.
 9. Keep watching first-import insert timing on the next clean large-database
    import because the new `sample_versions(start_date, sample_id)` index from
    `bf5a861` is a write-path tradeoff.
-10. Investigate replacing legacy compact `recordArchiveData` delta rebuild with
-   a SQLite-derived capture-state/hash path. The current repeated full-profile
-   reports still spend about `4s` processing Heart Rate for tiny deltas because
-   the Swift compact archive is decoded and rewritten for the whole 900k-row
-   type.
+10. After the compact archive rebuild skip is validated, decide whether the
+   legacy `recordArchiveData` field can be retired for new SwiftData TypeCount
+   rows or must remain populated only for first-import compatibility.
 11. Investigate full-profile empty anchored-query cost for zero-count types.
    Compare slow empty types across reports before changing behavior; any skip or
    lower-frequency strategy must preserve the promise that full authorized
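
Reviewer note: the `cfd9de8` baseline quoted in the log row and in step 8 can be sanity-checked with a little arithmetic. The sketch below only re-adds the per-type figures quoted in this diff; the dictionary and variable names are illustrative and are not part of the repository.

```swift
import Foundation

// Per-type archive rebuild times quoted in the `cfd9de8` log row, in seconds.
// The dictionary is an illustration, not a structure from the codebase.
let rebuildSecondsByType: [String: Double] = [
    "Heart Rate": 4.4,
    "Active Energy": 1.7,
    "Basal Energy": 1.5,
    "Steps": 0.4,
    "Walking + Running Distance": 0.4
]

// `SummedProcessingElapsed` from the same report, in seconds.
let summedProcessingSeconds = 8.5

// Sum of the five named types and their share of total processing time.
let namedTypesSeconds = rebuildSecondsByType.values.reduce(0, +)
let namedTypesShare = namedTypesSeconds / summedProcessingSeconds

// Heart Rate alone, as a share of total processing time.
let heartRateShare = (rebuildSecondsByType["Heart Rate"] ?? 0) / summedProcessingSeconds

print(String(format: "named types: %.1fs of %.1fs", namedTypesSeconds, summedProcessingSeconds))
print(String(format: "Heart Rate share: %.0f%%", heartRateShare * 100))
```

The five named types account for nearly all of the reported processing time, and Heart Rate alone for roughly half, which is consistent with the log's conclusion that the rebuild for changed high-volume types is the remaining bottleneck.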
+36 -86
HealthProbe/Services/HealthKitService.swift
@@ -1296,10 +1296,12 @@ final class HealthKitService {
         }
 
         let archiveRebuildStartedAt = Date()
-        let rebuiltArchive = try Self.rebuildRecordArchive(
+        let rebuiltArchive = Self.makeDeltaDistributionState(
             typeIdentifier: typeIdentifier,
             previousDistribution: previousDistribution,
             sampleType: sampleType,
+            earliestDate: earliestDate,
+            latestDate: latestDate,
             deltaPages: deltaPages
         )
         let archiveRebuildElapsed = Date().timeIntervalSince(archiveRebuildStartedAt)
@@ -1354,94 +1356,61 @@ final class HealthKitService {
             records: [],
             contentHash: rebuiltArchive.contentHash,
             yearlyCounts: nil,
-            recordArchiveData: rebuiltArchive.recordArchiveData,
+            recordArchiveData: nil,
             captureMode: .delta,
             deltaEventCount: processedEventCount,
             timingBreakdown: captureTimings.importBreakdown
         )
     }
 
-    private static func rebuildRecordArchive(
+    private static func makeDeltaDistributionState(
         typeIdentifier: String,
         previousDistribution: PreviousDistributionState,
         sampleType: HKSampleType,
+        earliestDate: Date?,
+        latestDate: Date?,
         deltaPages: [SampleDistributionPage]
-    ) throws -> RebuiltRecordArchive {
+    ) -> RebuiltRecordArchive {
         let deltaSampleCount = deltaPages.reduce(0) { $0 + $1.samples.count }
         let deltaDeletedCount = deltaPages.reduce(0) { $0 + $1.deletedObjects.count }
-        var patches: [String: RecordArchivePatch] = [:]
-        patches.reserveCapacity(deltaSampleCount + deltaDeletedCount)
-        var replacementOrder: [String] = []
-        replacementOrder.reserveCapacity(deltaSampleCount)
+        var replacementFingerprints: [String] = []
+        replacementFingerprints.reserveCapacity(deltaSampleCount)
+        var deletedUUIDHashes: [String] = []
+        deletedUUIDHashes.reserveCapacity(deltaDeletedCount)
 
         for page in deltaPages {
             for deletedObject in page.deletedObjects {
-                patches[HashService.sampleUUIDHash(deletedObject.uuid.uuidString)] = .deleted
+                deletedUUIDHashes.append(HashService.sampleUUIDHash(deletedObject.uuid.uuidString))
             }
 
             for sample in page.samples {
                 let value = recordValue(for: sample, sampleType: sampleType, typeIdentifier: typeIdentifier)
-                if case .replacement = patches[value.sampleUUIDHash] {
-                    // Keep the first replacement order for deterministic archive output.
-                } else {
-                    replacementOrder.append(value.sampleUUIDHash)
-                }
-                patches[value.sampleUUIDHash] = .replacement(value)
+                replacementFingerprints.append(value.recordFingerprint)
             }
         }
 
-        var writer = HealthRecordArchive.makeCompactWriter(
-            typeIdentifier: typeIdentifier,
-            estimatedRecordCount: max(0, previousDistribution.count + deltaSampleCount - deltaDeletedCount)
+        let recordCount = max(0, previousDistribution.count + deltaSampleCount - deltaDeletedCount)
+        let effectiveEarliestDate = earliestDate ?? previousDistribution.earliestRecordDate
+        let effectiveLatestDate = latestDate ?? previousDistribution.latestRecordDate
+        let contentHash = HashService.archiveContentHash(
+            domain: "hp:sqlite_delta_type_state:v1",
+            parts: [
+                typeIdentifier,
+                previousDistribution.contentHash,
+                String(previousDistribution.count),
+                String(recordCount),
+                effectiveEarliestDate.map { String(format: "%.6f", $0.timeIntervalSince1970) },
+                effectiveLatestDate.map { String(format: "%.6f", $0.timeIntervalSince1970) },
+                replacementFingerprints.sorted().joined(separator: "|"),
+                deletedUUIDHashes.sorted().joined(separator: "|")
+            ]
         )
-        var hashBuilder = HashService.TypeHashBuilder(typeIdentifier: typeIdentifier)
-        var earliestDate: Date?
-        var latestDate: Date?
-        var recordCount = 0
-
-        func append(_ value: HealthRecordValue) {
-            hashBuilder.append(recordFingerprint: value.recordFingerprint)
-            earliestDate = min(earliestDate ?? value.startDate, value.startDate)
-            latestDate = max(latestDate ?? value.endDate, value.endDate)
-            writer.append(value)
-            recordCount += 1
-        }
-
-        if let recordArchiveData = previousDistribution.recordArchiveData {
-            let didRead = HealthRecordArchive.forEachRecord(in: recordArchiveData) { record in
-                if let patch = patches.removeValue(forKey: record.sampleUUIDHash) {
-                    switch patch {
-                    case .replacement(let replacement):
-                        append(replacement)
-                    case .deleted:
-                        break
-                    }
-                } else {
-                    append(record)
-                }
-            }
-            guard didRead else {
-                throw HealthRecordArchiveReadError.decodeFailed(typeIdentifier: previousDistribution.typeIdentifier)
-            }
-        } else if previousDistribution.count > 0 {
-            throw HealthRecordArchiveReadError.missingArchive(
-                typeIdentifier: previousDistribution.typeIdentifier,
-                count: previousDistribution.count
-            )
-        }
-
-        for sampleUUIDHash in replacementOrder {
-            if case .replacement(let value) = patches.removeValue(forKey: sampleUUIDHash) {
-                append(value)
-            }
-        }
-
         return RebuiltRecordArchive(
             count: recordCount,
-            contentHash: hashBuilder.finalize(),
-            earliestDate: earliestDate,
-            latestDate: latestDate,
-            recordArchiveData: writer.finalize()
+            contentHash: contentHash,
+            earliestDate: effectiveEarliestDate,
+            latestDate: effectiveLatestDate,
+            recordArchiveData: nil
         )
     }
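
Reviewer note: the added lines derive the new content hash by chaining the previous hash with the sorted delta fingerprints and sorted deleted-UUID hashes, so the result should not depend on the order in which delta pages arrive. The sketch below illustrates that property under stated assumptions: `sketchContentHash`, `SketchPreviousState`, and `sketchDeltaState` are hypothetical stand-ins, and FNV-1a replaces the repository's `HashService.archiveContentHash` (whose implementation is not shown in this diff) only so the example runs on its own.

```swift
import Foundation

/// Hypothetical stand-in for `HashService.archiveContentHash`.
/// FNV-1a (64-bit) is used purely for illustration; it is not cryptographic.
func sketchContentHash(domain: String, parts: [String?]) -> String {
    var hash: UInt64 = 0xcbf29ce484222325
    func mix(_ text: String) {
        for byte in text.utf8 {
            hash ^= UInt64(byte)
            hash = hash &* 0x100000001b3
        }
        // Field terminator so ["ab", "c"] and ["a", "bc"] hash differently.
        hash ^= 0xff
        hash = hash &* 0x100000001b3
    }
    mix(domain)
    for part in parts {
        mix(part ?? "<nil>")
    }
    return String(hash, radix: 16)
}

/// Hypothetical reduced form of the previous capture state.
struct SketchPreviousState {
    let contentHash: String
    let count: Int
}

/// Derives the new count and chained hash from the previous state plus a delta,
/// without touching any per-record archive.
func sketchDeltaState(
    typeIdentifier: String,
    previous: SketchPreviousState,
    addedFingerprints: [String],
    deletedUUIDHashes: [String]
) -> (count: Int, contentHash: String) {
    let count = max(0, previous.count + addedFingerprints.count - deletedUUIDHashes.count)
    let hash = sketchContentHash(
        domain: "sketch:delta_type_state",
        parts: [
            typeIdentifier,
            previous.contentHash,
            String(previous.count),
            String(count),
            // Sorting makes the hash independent of delta page order.
            addedFingerprints.sorted().joined(separator: "|"),
            deletedUUIDHashes.sorted().joined(separator: "|")
        ]
    )
    return (count, hash)
}

let previous = SketchPreviousState(contentHash: "abc123", count: 900_000)
let inOrder = sketchDeltaState(
    typeIdentifier: "HeartRate", previous: previous,
    addedFingerprints: ["f1", "f2"], deletedUUIDHashes: ["d1"]
)
let shuffled = sketchDeltaState(
    typeIdentifier: "HeartRate", previous: previous,
    addedFingerprints: ["f2", "f1"], deletedUUIDHashes: ["d1"]
)
let differentDelta = sketchDeltaState(
    typeIdentifier: "HeartRate", previous: previous,
    addedFingerprints: ["f1", "f3"], deletedUUIDHashes: ["d1"]
)
print(inOrder.contentHash == shuffled.contentHash)
```

The cost of this approach scales with the delta size rather than with the total record count of the type, which is the point of the change. One design consequence worth confirming in review: a chained hash identifies the capture history, so two devices that reach the same record set through different delta sequences would presumably produce different hashes.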
 
@@ -2557,26 +2526,7 @@ private struct RebuiltRecordArchive: Sendable {
     let contentHash: String
     let earliestDate: Date?
     let latestDate: Date?
-    let recordArchiveData: Data
-}
-
-private enum RecordArchivePatch: Sendable {
-    case replacement(HealthRecordValue)
-    case deleted
-}
-
-private enum HealthRecordArchiveReadError: LocalizedError {
-    case missingArchive(typeIdentifier: String, count: Int)
-    case decodeFailed(typeIdentifier: String)
-
-    var errorDescription: String? {
-        switch self {
-        case .missingArchive(let typeIdentifier, let count):
-            return "Missing record archive for \(typeIdentifier) with \(count) records."
-        case .decodeFailed(let typeIdentifier):
-            return "Could not decode previous record archive for \(typeIdentifier)."
-        }
-    }
+    let recordArchiveData: Data?
 }
 
 private final class HealthKitQueryContinuationBox<Value: Sendable>: @unchecked Sendable {
@@ -2682,7 +2632,7 @@ private struct PreviousDistributionState: Sendable {
     }
 
     var canApplyDeltaChanges: Bool {
-        count == 0 || recordArchiveData != nil
+        count == 0 || !contentHash.isEmpty
     }
 
     init(typeCount: TypeCount?, archiveState: HealthArchiveTypeCaptureState?) {
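
Reviewer note: the one-line change to `canApplyDeltaChanges` moves delta eligibility from "legacy archive present" to "content hash present". The sketch below compares both rules side by side. `SketchGateState` is a hypothetical type that mirrors only the three fields the predicate reads; it is not the repository's `PreviousDistributionState`.

```swift
import Foundation

/// Hypothetical reduced state holding only the fields the predicate reads.
struct SketchGateState {
    let count: Int
    let contentHash: String
    let recordArchiveData: Data?

    /// Rule from the removed line: a non-empty type needs the legacy archive.
    var couldApplyDeltaBefore: Bool {
        count == 0 || recordArchiveData != nil
    }

    /// Rule from the added line: a non-empty type needs only a content hash.
    var canApplyDeltaNow: Bool {
        count == 0 || !contentHash.isEmpty
    }
}

// SQLite-backed state written by the new delta path: hash present, no archive.
let sqliteBacked = SketchGateState(count: 900_000, contentHash: "abc123", recordArchiveData: nil)
// Legacy-style state with an archive but no stored hash.
let archiveWithoutHash = SketchGateState(count: 10, contentHash: "", recordArchiveData: Data([0x01]))
// Type with no records yet.
let emptyType = SketchGateState(count: 0, contentHash: "", recordArchiveData: nil)

for state in [sqliteBacked, archiveWithoutHash, emptyType] {
    print(state.count, state.couldApplyDeltaBefore, state.canApplyDeltaNow)
}
```

The first case is the one this change needs: because the delta path now stores `recordArchiveData: nil`, the old rule would have rejected every subsequent delta for that type. The second case flips from eligible to ineligible; the diff does not say whether rows with an archive but an empty hash exist in practice, so that is worth checking against real data.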
@@ -2861,7 +2811,7 @@ private struct TypeCountFetchResult: Sendable {
 
         if let recordArchiveData {
             typeCount.recordArchiveData = recordArchiveData
-        } else {
+        } else if !records.isEmpty {
             typeCount.setRecordValues(records.map { recordData in
                 HealthRecordValue(
                     typeIdentifier: recordData.typeIdentifier,