Showing 2 changed files with 108 additions and 101 deletions
+17 -14
HealthProbe/Doc/04-project/Import-Optimization-Log.md
@@ -585,7 +585,8 @@ rows exist".
 | 2026-06-03 | pending | Add delta-event counts to diagnostics. | A full-profile follow-up completed in `47.4s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=115, delta=12, initialImport=0`, `SummedProcessingElapsed: 25.9s`, `SummedInsertElapsed: 0.2s`, and `SummedFinalizeElapsed: 16.0s`. This confirms anchors work and no full import ran. Remaining cost is delta application for large metrics: Heart Rate `23.5s` total (`14.1s` processing, `8.8s` finalize), Active Energy `7.1s`, and Basal Energy `6.0s`. Diagnostics now report aggregate/per-metric `DeltaEvents` so future logs can separate true HealthKit delta size from the final visible record count. |
 | 2026-06-04 | `a676df1` | Rebuild delta compact archives without large intermediate record arrays. | Follow-up full-profile delta report completed in `42.1s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=108, delta=19, initialImport=0`, and `DeltaEvents: 582`. Compared with the prior `47.4s` run, `SummedProcessingElapsed` dropped `25.9s -> 12.8s`; Heart Rate processing dropped `14.1s -> 6.4s` despite `187` delta events; Active Energy processing dropped `4.9s -> 2.5s` with `100` delta events; Basal Energy processing dropped `4.1s -> 2.0s` with `82` delta events. `SummedFinalizeElapsed` rose `16.0s -> 19.6s`, so the remaining bottleneck is now archive finalization / aggregate rebuild for changed high-volume types, not Swift archive reconstruction. |
 | 2026-06-04 | `457fd80` | Incrementally replace changed daily aggregate buckets. | Follow-up full-profile delta report completed in `31.2s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=119, delta=8, initialImport=0`, and `DeltaEvents: 22`. Compared with the prior `42.1s` run, `SummedFinalizeElapsed` dropped `19.6s -> 15.5s`, Active Energy finalize dropped `4.8s -> 1.8s`, and total wall clock dropped `42.1s -> 31.2s`. Heart Rate finalize barely moved (`8.9s -> 8.7s`) despite only `5` delta events, proving that daily aggregate replacement helped some changed metrics but Heart Rate is still dominated by type summary `visibleAggregate` full scans. |
-| 2026-06-04 | pending | Incrementally update changed type summaries. | Heart Rate still spent `8.7s` in finalize with only `5` delta events because changed verification computed `visibleRecordCount`, range, value sum, and max by scanning all visible Heart Rate rows. Changed metrics now attempt to derive the new type summary from the previous materialized summary plus appeared/removed version deltas, falling back to the full visible scan if a removed row may have been the previous earliest, latest, or max value. Expected signal: Heart Rate finalize time drops sharply when a small recent delta does not remove historical extrema. |
+| 2026-06-04 | `2ebfab3` | Incrementally update changed type summaries. | Follow-up full-profile delta report completed in `27.5s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=118, delta=9, initialImport=0`, and `DeltaEvents: 46`. Compared with the prior `31.2s` run, `SummedFinalizeElapsed` dropped `15.5s -> 11.7s`; Heart Rate finalize dropped `8.7s -> 4.8s`; Active Energy finalize stayed bounded at `1.7s`. Remaining cost moved back to delta archive processing: `SummedProcessingElapsed` was `11.6s`, with Heart Rate processing `6.1s`, Active Energy `2.3s`, and Basal Energy `1.9s` for small deltas. |
+| 2026-06-04 | pending | Patch compact archives from delta without full record maps. | The previous delta path decoded the whole compact archive into a UUID dictionary before applying a small HealthKit delta, so Heart Rate still paid high allocation/hash-map cost for only `9` delta events. Delta processing now keeps only changed samples/deletions in memory, scans the previous compact archive sequentially, replaces/deletes matching UUIDs, appends new records, and preserves existing per-type hash semantics. Expected signal: lower `SummedProcessingElapsed` and lower Heart Rate processing than the `6.1s` baseline when `DeltaEvents` stays small. |
 
 ## Current Diagnosis
 
@@ -646,15 +647,17 @@ The likely bottleneck is per-row SQLite work:
   removes avoidable Swift allocations. A larger future optimization would need a
   deliberate replacement for the legacy per-type fingerprint hash, not an ad hoc
   switch to SQLite aggregate hashes.
-- The `a676df1` streaming archive rebuild achieved the intended processing
-  reduction. Do not repeat this experiment unless a regression appears. The next
-  high-value target is `markVerification` / daily aggregate rebuild for changed
-  high-volume delta types, where Heart Rate still spent `8.9s` finalizing and
-  Active Energy spent `4.8s`.
-- Incremental daily aggregate replacement helped Active Energy but did not move
-  Heart Rate enough. The next bottleneck is the changed-type `visibleAggregate`
-  summary query, which still scans all visible rows for the type before daily
-  aggregate replacement runs.
+- Incremental daily aggregate replacement and type-summary updates achieved the
+  intended finalization reduction. Do not repeat those experiments unless a
+  regression appears. The latest measured bottleneck is again delta archive
+  processing: Heart Rate spent `6.1s` processing only `9` delta events because
+  the compact archive was still materialized into a large UUID dictionary before
+  being rebuilt.
+- The current compact-archive patch keeps checksum/hash semantics unchanged and
+  only removes avoidable Swift dictionary materialization. If processing remains
+  high after this patch, the next larger step would be a deliberate redesign of
+  the legacy per-type compact archive/hash maintenance, not more HealthKit page
+  tuning.
 
 ## Open Issues / Observations
 
@@ -693,10 +696,10 @@ Prioritize experiments in this order:
    `CaptureModes`, and high-volume type timings. Treat stable checksum, a high
    `unchangedDelta` count, and zero processing/insert as the main unchanged-path
    signal.
-4. Run a full-profile repeated capture after incremental type-summary updates.
-   Compare `SummedFinalizeElapsed`, Heart Rate finalize time, and `DeltaEvents`.
-   Expected success is Heart Rate finalize below the previous `8.7s` when its
-   delta does not remove the previous earliest/latest/max row.
+4. Run a full-profile repeated capture after compact-delta archive patching.
+   Compare `SummedProcessingElapsed`, Heart Rate processing time, and
+   `DeltaEvents`. Expected success is Heart Rate processing below the previous
+   `6.1s` baseline when its delta remains small.
 5. Keep using `DeltaEvents` to quantify changed high-volume metrics, especially
    Heart Rate, Active Energy, and Basal Energy. If delta events are small while
    finalize remains large, optimize aggregate rebuild/finalization rather than
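
Note on the `2ebfab3` log row above: the incremental type-summary update it describes (previous materialized summary plus appeared/removed version deltas, with a full-scan fallback when a removed row may have been an extremum) can be sketched roughly as follows. This is a minimal illustration, not the project's code. `TypeSummary`, `RowVersion`, and `incrementalSummary` are hypothetical names, and the field set is assumed from the log text (count, range, value sum, max).

```swift
import Foundation

// Hypothetical summary shape; field names are illustrative, not the project's schema.
struct TypeSummary: Equatable {
    var count: Int
    var valueSum: Double
    var maxValue: Double
    var earliest: Date
    var latest: Date
}

// Hypothetical shape for one appeared or removed record version.
struct RowVersion {
    let value: Double
    let start: Date
    let end: Date
}

/// Derives the next summary from the previous one plus a small delta.
/// Returns nil when a removed row may have defined the previous max,
/// earliest, or latest value, so the caller should run the full visible scan.
func incrementalSummary(
    previous: TypeSummary,
    appeared: [RowVersion],
    removed: [RowVersion]
) -> TypeSummary? {
    for row in removed {
        let touchesExtremum = row.value >= previous.maxValue
            || row.start <= previous.earliest
            || row.end >= previous.latest
        if touchesExtremum { return nil }
    }

    var next = previous
    next.count += appeared.count - removed.count
    guard next.count >= 0 else { return nil }

    for row in removed {
        next.valueSum -= row.value
    }
    for row in appeared {
        next.valueSum += row.value
        next.maxValue = max(next.maxValue, row.value)
        next.earliest = min(next.earliest, row.start)
        next.latest = max(next.latest, row.end)
    }
    return next
}
```

The conservative `>=` / `<=` comparisons trade a few unnecessary full scans for never reporting a stale extremum, which matches the fallback behavior the log row describes.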
+91 -87
HealthProbe/Services/HealthKitService.swift
@@ -1198,14 +1198,15 @@ final class HealthKitService {
             )
         }
 
-        let recordMapStartedAt = Date()
-        var recordMap = startedFromAnchor ? try previousDistribution.makeRecordMap() : [:]
-        captureTimings.processingElapsedSeconds += Date().timeIntervalSince(recordMapStartedAt)
         var shouldFetchNextPage = true
+        var deltaPages: [SampleDistributionPage] = []
+        deltaPages.reserveCapacity(1)
+        var estimatedRecordCount = previousDistribution.count
 
         if let firstDeltaPage {
             let firstPageApplyStartedAt = Date()
-            applyDistributionPage(firstDeltaPage, sampleType: sampleType, to: &recordMap)
+            deltaPages.append(firstDeltaPage)
+            estimatedRecordCount += firstDeltaPage.samples.count - firstDeltaPage.deletedObjects.count
             captureTimings.processingElapsedSeconds += Date().timeIntervalSince(firstPageApplyStartedAt)
             processedEventCount += pageEventCount(firstDeltaPage)
             shouldFetchNextPage = firstDeltaPage.samples.count + firstDeltaPage.deletedObjects.count >= incrementalStrategy.queryPageLimit
@@ -1216,7 +1217,7 @@ final class HealthKitService {
                     pageNumber: pageNumber,
                     estimatedPageCount: estimatedPageCount
                 ),
-                recordCount: recordMap.count,
+                recordCount: max(0, estimatedRecordCount),
                 elapsedSeconds: Date().timeIntervalSince(progressStarted),
                 samplesPerSecond: Self.samplesPerSecond(
                     processedCount: processedEventCount,
@@ -1234,7 +1235,7 @@ final class HealthKitService {
                     pageNumber: pageNumber,
                     estimatedPageCount: anchor == nil ? nil : estimatedPageCount
                 ),
-                recordCount: recordMap.count,
+                recordCount: max(0, estimatedRecordCount),
                 elapsedSeconds: Date().timeIntervalSince(progressStarted),
                 samplesPerSecond: Self.samplesPerSecond(
                     processedCount: processedEventCount,
@@ -1259,7 +1260,7 @@ final class HealthKitService {
                 progress: progress,
                 typeIdentifier: typeIdentifier,
                 progressDetail: "Persisting delta page \(pageNumber)",
-                recordCount: recordMap.count,
+                recordCount: max(0, estimatedRecordCount),
                 progressStarted: progressStarted,
                 processedEventCount: processedEventCount,
                 persistenceState: persistenceState
@@ -1269,7 +1270,8 @@ final class HealthKitService {
             anchor = page.anchor
 
             let applyStartedAt = Date()
-            applyDistributionPage(page, sampleType: sampleType, to: &recordMap)
+            deltaPages.append(page)
+            estimatedRecordCount += page.samples.count - page.deletedObjects.count
             captureTimings.processingElapsedSeconds += Date().timeIntervalSince(applyStartedAt)
             processedEventCount += pageEventCount(page)
             shouldFetchNextPage = page.samples.count + page.deletedObjects.count >= incrementalStrategy.queryPageLimit
@@ -1280,7 +1282,7 @@ final class HealthKitService {
                     pageNumber: pageNumber,
                     estimatedPageCount: anchor == nil ? nil : estimatedPageCount
                 ),
-                recordCount: recordMap.count,
+                recordCount: max(0, estimatedRecordCount),
                 elapsedSeconds: Date().timeIntervalSince(progressStarted),
                 samplesPerSecond: Self.samplesPerSecond(
                     processedCount: processedEventCount,
@@ -1289,23 +1291,25 @@ final class HealthKitService {
             )
         }
 
+        let archiveRebuildStartedAt = Date()
+        let rebuiltArchive = try Self.rebuildRecordArchive(
+            typeIdentifier: typeIdentifier,
+            previousDistribution: previousDistribution,
+            sampleType: sampleType,
+            deltaPages: deltaPages
+        )
+        captureTimings.processingElapsedSeconds += Date().timeIntervalSince(archiveRebuildStartedAt)
+
         captureTimings.finalizeElapsedSeconds += try await finalizeArchiveVerification(
             sampleType: sampleType,
             typeIdentifier: typeIdentifier,
-            recordCount: recordMap.count,
+            recordCount: rebuiltArchive.count,
             progressStarted: progressStarted,
             processedEventCount: processedEventCount,
             archiveObservationID: archiveObservationID,
             progress: progress
         )
 
-        let archiveRebuildStartedAt = Date()
-        let rebuiltArchive = Self.rebuildRecordArchive(
-            typeIdentifier: typeIdentifier,
-            recordMap: recordMap
-        )
-        captureTimings.processingElapsedSeconds += Date().timeIntervalSince(archiveRebuildStartedAt)
-
         progress?.updateBlockProgress(
             typeIdentifier,
             detail: pageNumber == 1 ? "Imported 1 page" : "Imported \(pageNumber) pages",
@@ -1353,35 +1357,82 @@ final class HealthKitService {
 
     private static func rebuildRecordArchive(
         typeIdentifier: String,
-        recordMap: [String: SampleRecordPayload]
-    ) -> RebuiltRecordArchive {
+        previousDistribution: PreviousDistributionState,
+        sampleType: HKSampleType,
+        deltaPages: [SampleDistributionPage]
+    ) throws -> RebuiltRecordArchive {
+        let deltaSampleCount = deltaPages.reduce(0) { $0 + $1.samples.count }
+        let deltaDeletedCount = deltaPages.reduce(0) { $0 + $1.deletedObjects.count }
+        var patches: [String: RecordArchivePatch] = [:]
+        patches.reserveCapacity(deltaSampleCount + deltaDeletedCount)
+        var replacementOrder: [String] = []
+        replacementOrder.reserveCapacity(deltaSampleCount)
+
+        for page in deltaPages {
+            for deletedObject in page.deletedObjects {
+                patches[HashService.sampleUUIDHash(deletedObject.uuid.uuidString)] = .deleted
+            }
+
+            for sample in page.samples {
+                let value = recordValue(for: sample, sampleType: sampleType, typeIdentifier: typeIdentifier)
+                if case .replacement = patches[value.sampleUUIDHash] {
+                    // Keep the first replacement order for deterministic archive output.
+                } else {
+                    replacementOrder.append(value.sampleUUIDHash)
+                }
+                patches[value.sampleUUIDHash] = .replacement(value)
+            }
+        }
+
         var writer = HealthRecordArchive.makeCompactWriter(
             typeIdentifier: typeIdentifier,
-            estimatedRecordCount: recordMap.count
+            estimatedRecordCount: max(0, previousDistribution.count + deltaSampleCount - deltaDeletedCount)
         )
         var recordFingerprints: [String] = []
-        recordFingerprints.reserveCapacity(recordMap.count)
+        recordFingerprints.reserveCapacity(max(0, previousDistribution.count + deltaSampleCount - deltaDeletedCount))
         var earliestDate: Date?
         var latestDate: Date?
+        var recordCount = 0
 
-        for (sampleUUIDHash, record) in recordMap {
-            recordFingerprints.append(record.recordFingerprint)
-            earliestDate = min(earliestDate ?? record.startDate, record.startDate)
-            latestDate = max(latestDate ?? record.endDate, record.endDate)
-            writer.append(
-                HealthRecordValue(
-                    typeIdentifier: typeIdentifier,
-                    sampleUUIDHash: sampleUUIDHash,
-                    recordFingerprint: record.recordFingerprint,
-                    startDate: record.startDate,
-                    endDate: record.endDate,
-                    displayValue: record.displayValue
-                )
+        func append(_ value: HealthRecordValue) {
+            recordFingerprints.append(value.recordFingerprint)
+            earliestDate = min(earliestDate ?? value.startDate, value.startDate)
+            latestDate = max(latestDate ?? value.endDate, value.endDate)
+            writer.append(value)
+            recordCount += 1
+        }
+
+        if let recordArchiveData = previousDistribution.recordArchiveData {
+            let didRead = HealthRecordArchive.forEachRecord(in: recordArchiveData) { record in
+                if let patch = patches.removeValue(forKey: record.sampleUUIDHash) {
+                    switch patch {
+                    case .replacement(let replacement):
+                        append(replacement)
+                    case .deleted:
+                        break
+                    }
+                } else {
+                    append(record)
+                }
+            }
+            guard didRead else {
+                throw HealthRecordArchiveReadError.decodeFailed(typeIdentifier: previousDistribution.typeIdentifier)
+            }
+        } else if previousDistribution.count > 0 {
+            throw HealthRecordArchiveReadError.missingArchive(
+                typeIdentifier: previousDistribution.typeIdentifier,
+                count: previousDistribution.count
             )
         }
 
+        for sampleUUIDHash in replacementOrder {
+            if case .replacement(let value) = patches.removeValue(forKey: sampleUUIDHash) {
+                append(value)
+            }
+        }
+
         return RebuiltRecordArchive(
-            count: recordMap.count,
+            count: recordCount,
             contentHash: HashService.typeHash(
                 typeIdentifier: typeIdentifier,
                 recordFingerprints: recordFingerprints
@@ -1589,27 +1640,6 @@ final class HealthKitService {
         )
     }
 
-    private func applyDistributionPage(
-        _ page: SampleDistributionPage,
-        sampleType: HKSampleType,
-        to recordMap: inout [String: SampleRecordPayload]
-    ) {
-        for deletedObject in page.deletedObjects {
-            recordMap.removeValue(forKey: HashService.sampleUUIDHash(deletedObject.uuid.uuidString))
-        }
-
-        for sample in page.samples {
-            let value = Self.recordValue(for: sample, sampleType: sampleType, typeIdentifier: sampleType.identifier)
-            let uuidHash = value.sampleUUIDHash
-            recordMap[uuidHash] = SampleRecordPayload(
-                recordFingerprint: value.recordFingerprint,
-                startDate: value.startDate,
-                endDate: value.endDate,
-                displayValue: value.displayValue
-            )
-        }
-    }
-
     private static func recordValue(
         for sample: HKSample,
         sampleType: HKSampleType,
@@ -2511,13 +2541,6 @@ private struct SampleDistributionPage: Sendable {
     let anchor: HKQueryAnchor?
 }
 
-private struct SampleRecordPayload: Sendable {
-    let recordFingerprint: String
-    let startDate: Date
-    let endDate: Date
-    let displayValue: String?
-}
-
 private struct RebuiltRecordArchive: Sendable {
     let count: Int
     let contentHash: String
@@ -2526,6 +2549,11 @@ private struct RebuiltRecordArchive: Sendable {
     let recordArchiveData: Data
 }
 
+private enum RecordArchivePatch: Sendable {
+    case replacement(HealthRecordValue)
+    case deleted
+}
+
 private enum HealthRecordArchiveReadError: LocalizedError {
     case missingArchive(typeIdentifier: String, count: Int)
     case decodeFailed(typeIdentifier: String)
@@ -2723,30 +2751,6 @@ private struct PreviousDistributionState: Sendable {
         )
     }
 
-    func makeRecordMap() throws -> [String: SampleRecordPayload] {
-        guard let recordArchiveData else {
-            if count > 0 {
-                throw HealthRecordArchiveReadError.missingArchive(typeIdentifier: typeIdentifier, count: count)
-            }
-            return [:]
-        }
-
-        var recordMap: [String: SampleRecordPayload] = [:]
-        recordMap.reserveCapacity(count)
-        let didRead = HealthRecordArchive.forEachRecord(in: recordArchiveData) { record in
-            recordMap[record.sampleUUIDHash] = SampleRecordPayload(
-                recordFingerprint: record.recordFingerprint,
-                startDate: record.startDate,
-                endDate: record.endDate,
-                displayValue: record.displayValue
-            )
-        }
-        guard didRead else {
-            throw HealthRecordArchiveReadError.decodeFailed(typeIdentifier: typeIdentifier)
-        }
-        return recordMap
-    }
-
     private nonisolated static func archiveAnchor(_ anchor: HKQueryAnchor) -> Data? {
         try? NSKeyedArchiver.archivedData(withRootObject: anchor, requiringSecureCoding: true)
     }
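
Note on the change as a whole: the patching strategy in `rebuildRecordArchive` (hold only the delta in a dictionary, stream the previous archive once, replace or drop matching records, then append the records that are new) can be reduced to the standalone sketch below. It is an illustration under simplifying assumptions, not the project's implementation: `Record`, `Patch`, and `patchArchive` are hypothetical names, a plain array stands in for the compact archive that the real code reads through `HealthRecordArchive.forEachRecord`, and the delta is flattened into one list of upserts and one list of deletions instead of pages.

```swift
// Hypothetical stand-in for one compact-archive record.
struct Record: Equatable {
    let id: String
    let value: String
}

// Mirrors the shape of the diff's RecordArchivePatch, with a simplified payload.
enum Patch {
    case replacement(Record)
    case deleted
}

/// Applies a small delta to a previous record sequence in one sequential pass.
/// Only the delta is held in a dictionary; the previous records are never
/// materialized into a map keyed by ID.
func patchArchive(previous: [Record], upserts: [Record], deletedIDs: [String]) -> [Record] {
    var patches: [String: Patch] = [:]
    patches.reserveCapacity(upserts.count + deletedIDs.count)
    var appendOrder: [String] = []   // first-seen order of upserted IDs
    var queued = Set<String>()

    for id in deletedIDs {
        patches[id] = .deleted
    }
    for record in upserts {
        if queued.insert(record.id).inserted {
            appendOrder.append(record.id)
        }
        patches[record.id] = .replacement(record)   // last write wins
    }

    var output: [Record] = []
    output.reserveCapacity(max(0, previous.count + upserts.count - deletedIDs.count))

    // Sequential scan: replace or drop matches, pass everything else through.
    for record in previous {
        switch patches.removeValue(forKey: record.id) {
        case .replacement(let replacement)?:
            output.append(replacement)
        case .deleted?:
            break
        case nil:
            output.append(record)
        }
    }

    // Whatever is still in `patches` never matched a previous record.
    for id in appendOrder {
        if case .replacement(let record)? = patches.removeValue(forKey: id) {
            output.append(record)
        }
    }
    return output
}
```

Memory for the lookup table scales with the delta size rather than the archive size, which is the effect the log entry expects to see as lower Heart Rate processing time when `DeltaEvents` stays small. Consuming each patch with `removeValue(forKey:)` during the scan is what lets the final loop append only genuinely new records.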