@@ -583,6 +583,7 @@ rows exist".
 | 2026-06-03 | pending | Clarify capture mode in record-import diagnostics. | Two full-profile repeated snapshots after SQLite capture-state persistence completed successfully with stable checksum and identical record count (`2,646,613`). The first ran in `6.3s` with `SummedProcessingElapsed: 0.0s`, `SummedInsertElapsed: 0.0s`, `SummedFinalizeElapsed: 2.6s`; the second ran in `6.4s` with `0.0s` processing/insert and `2.7s` finalize. Heart Rate `922,526` and Active Energy `346,478` each completed around `0.2s`, proving the heavy full reimport path was avoided. The report wording still said "`N` samples in 1 anchored segment", which is ambiguous for inherited unchanged summaries; diagnostics now label unchanged empty-delta, delta-apply, and full-import modes explicitly. |
 | 2026-06-03 | pending | Add capture-mode summary to diagnostics. | Repeated full-profile captures rarely produce a perfect no-delta report because at least one metric can change between manual runs. Diagnostic reports now include aggregate `CaptureModes` counts plus per-metric `captureMode`, so comparisons can separate unchanged empty-delta metrics from delta-applied metrics and full imports without manually reading every `record_import` line. Expected signal: stable checksum plus high `unchangedDelta` count and zero summed processing/insert confirms the fast path even when a few metrics changed. |
 | 2026-06-03 | pending | Add delta-event counts to diagnostics. | A full-profile follow-up completed in `47.4s` with `127/127` complete, `0` degraded, `CaptureModes: unchangedDelta=115, delta=12, initialImport=0`, `SummedProcessingElapsed: 25.9s`, `SummedInsertElapsed: 0.2s`, and `SummedFinalizeElapsed: 16.0s`. This confirms anchors work and no full import ran. Remaining cost is delta application for large metrics: Heart Rate `23.5s` total (`14.1s` processing, `8.8s` finalize), Active Energy `7.1s`, and Basal Energy `6.0s`. Diagnostics now report aggregate/per-metric `DeltaEvents` so future logs can separate true HealthKit delta size from the final visible record count. |
+| 2026-06-04 | pending | Rebuild delta compact archives without large intermediate record arrays. | Delta captures still need the legacy per-type content hash and compact record archive until the SwiftData bridge is fully retired. The delta path now streams the previous compact archive into the UUID map without first decoding a `[HealthRecordValue]`, and rebuilds the new compact archive/hash directly from the map instead of sorting and materializing a second large record array. Expected signal: lower `processingElapsed` for high-volume delta metrics such as Heart Rate, Active Energy, and Basal Energy while `snapshotChecksum` remains stable for equivalent content. |
 
 ## Current Diagnosis
 
@@ -639,6 +640,10 @@ The likely bottleneck is per-row SQLite work:
   reconstruction of legacy compact record archives and type hashes. Future logs
   should compare `DeltaEvents` against processing/finalize time before changing
   page sizes or HealthKit query strategy.
+- The current delta optimization keeps checksum semantics unchanged and only
+  removes avoidable Swift allocations. A larger future optimization would need a
+  deliberate replacement for the legacy per-type fingerprint hash, not an ad hoc
+  switch to SQLite aggregate hashes.
 
 ## Open Issues / Observations
 
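Review note on the "checksum semantics unchanged" claim above: the new `rebuildRecordArchive` collects fingerprints while iterating a `Dictionary`, and Swift does not guarantee any particular iteration order for dictionaries (hashing is seeded per process). This diff does not show whether `HashService.typeHash` or the compact writer normalize order internally, so that is an open assumption. The sketch below shows one way to check order independence in a test. `sketchTypeHash`, `Payload`, and the FNV-1a hash are hypothetical stand-ins, not the project's code; the sort key mirrors the removed `startDate`, then `recordFingerprint` ordering.

```swift
import Foundation

// Hypothetical stand-in for HashService.typeHash, whose implementation is not
// shown in this diff. FNV-1a is used only to keep the sketch dependency-free.
func sketchTypeHash(typeIdentifier: String, recordFingerprints: [String]) -> String {
    var hash: UInt64 = 0xcbf2_9ce4_8422_2325
    let input = ([typeIdentifier] + recordFingerprints).joined(separator: "\n")
    for byte in input.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x0000_0100_0000_01b3
    }
    return String(hash, radix: 16)
}

// Hypothetical, reduced version of the per-sample payload kept in the UUID map.
struct Payload {
    let recordFingerprint: String
    let startDate: Date
}

// Orders fingerprints by (startDate, recordFingerprint) before hashing, so the
// result does not depend on the order in which the dictionary yields entries.
func orderIndependentHash(typeIdentifier: String, recordMap: [String: Payload]) -> String {
    let ordered = recordMap.values.sorted { left, right in
        if left.startDate != right.startDate {
            return left.startDate < right.startDate
        }
        return left.recordFingerprint < right.recordFingerprint
    }
    return sketchTypeHash(
        typeIdentifier: typeIdentifier,
        recordFingerprints: ordered.map(\.recordFingerprint)
    )
}

// Same content, inserted in opposite orders.
let forward: [String: Payload] = [
    "uuid-a": Payload(recordFingerprint: "fp-1", startDate: Date(timeIntervalSince1970: 100)),
    "uuid-b": Payload(recordFingerprint: "fp-2", startDate: Date(timeIntervalSince1970: 200)),
    "uuid-c": Payload(recordFingerprint: "fp-3", startDate: Date(timeIntervalSince1970: 200)),
]
var reversed: [String: Payload] = [:]
for key in ["uuid-c", "uuid-b", "uuid-a"] {
    reversed[key] = forward[key]
}

let firstHash = orderIndependentHash(typeIdentifier: "HeartRate", recordMap: forward)
let secondHash = orderIndependentHash(typeIdentifier: "HeartRate", recordMap: reversed)
print(firstHash == secondHash)
```

If the real `typeHash` already sorts its input, no change is needed; if it does not, sorting only the fingerprint strings (rather than materializing full records) would likely keep most of the allocation savings.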
@@ -681,17 +686,20 @@ Prioritize experiments in this order:
    Rate, Active Energy, and Basal Energy. If delta events are small while
    processing/finalize remain large, optimize legacy compact archive/hash
    reconstruction rather than HealthKit fetch or SQLite insert.
-5. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
-6. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
-7. Profile whether index maintenance dominates first-import insert cost.
-8. Consider a guarded bulk-import mode for first observations:
+5. Compare the next full-profile delta report against the 2026-06-03 `47.4s`
+   run: Heart Rate processing `14.1s`, Active Energy processing `4.9s`, Basal
+   Energy processing `4.1s`, and `SummedProcessingElapsed: 25.9s`.
+6. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
+7. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
+8. Profile whether index maintenance dominates first-import insert cost.
+9. Consider a guarded bulk-import mode for first observations:
    - keep archive semantics unchanged;
    - only relax work that can be safely reconstructed or validated;
    - re-enable normal idempotent paths for incremental observations.
-9. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
-10. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
-11. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
-12. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
+10. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
+11. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
+12. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
+13. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
 
 ## Verification Checklist For Each Optimization
 
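For the comparison against the 2026-06-03 `47.4s` run, a small helper can turn the next report's processing times into relative changes. This is only a sketch: `baselineProcessing` and `relativeChange` are hypothetical names, the baseline values are the ones quoted in the plan, and the new timings must come from an actual follow-up report.

```swift
import Foundation

// Baseline processing times in seconds, taken from the 2026-06-03 `47.4s` run.
let baselineProcessing: [String: Double] = [
    "Heart Rate": 14.1,
    "Active Energy": 4.9,
    "Basal Energy": 4.1,
    "SummedProcessingElapsed": 25.9,
]

// Returns the relative change versus the baseline (negative means faster),
// or nil when the metric has no recorded baseline.
func relativeChange(metric: String, newSeconds: Double) -> Double? {
    guard let baseline = baselineProcessing[metric], baseline > 0 else {
        return nil
    }
    return (newSeconds - baseline) / baseline
}

// Formats one comparison line for a report; `newSeconds` is supplied by the caller.
func comparisonLine(metric: String, newSeconds: Double) -> String {
    guard let change = relativeChange(metric: metric, newSeconds: newSeconds) else {
        return "\(metric): no baseline"
    }
    return String(format: "%@: %.1fs (%+.0f%% vs baseline)", metric, newSeconds, change * 100)
}
```

A stable `snapshotChecksum` should be confirmed alongside any reported speedup, since a faster run with a changed checksum would not be an equivalent result.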
@@ -1299,45 +1299,20 @@ final class HealthKitService {
             progress: progress
         )
 
-        let sortedRecordsStartedAt = Date()
-        let sortedKeys = recordMap.keys.sorted {
-            guard let left = recordMap[$0],
-                  let right = recordMap[$1] else {
-                return $0 < $1
-            }
-            if left.startDate != right.startDate {
-                return left.startDate < right.startDate
-            }
-            return left.recordFingerprint < right.recordFingerprint
-        }
-        var sortedRecords: [HealthRecordValue] = []
-        sortedRecords.reserveCapacity(sortedKeys.count)
-        for sampleUUIDHash in sortedKeys {
-            guard let record = recordMap[sampleUUIDHash] else { continue }
-            sortedRecords.append(
-                HealthRecordValue(
-                    typeIdentifier: typeIdentifier,
-                    sampleUUIDHash: sampleUUIDHash,
-                    recordFingerprint: record.recordFingerprint,
-                    startDate: record.startDate,
-                    endDate: record.endDate,
-                    displayValue: record.displayValue
-                )
-            )
-        }
-        let contentHash = HashService.typeHash(
+        let archiveRebuildStartedAt = Date()
+        let rebuiltArchive = Self.rebuildRecordArchive(
             typeIdentifier: typeIdentifier,
-            recordFingerprints: sortedRecords.map(\.recordFingerprint)
+            recordMap: recordMap
         )
-        captureTimings.processingElapsedSeconds += Date().timeIntervalSince(sortedRecordsStartedAt)
+        captureTimings.processingElapsedSeconds += Date().timeIntervalSince(archiveRebuildStartedAt)
 
         progress?.updateBlockProgress(
             typeIdentifier,
             detail: pageNumber == 1 ? "Imported 1 page" : "Imported \(pageNumber) pages",
-            recordCount: sortedRecords.count
+            recordCount: rebuiltArchive.count
         )
 
-        guard !sortedRecords.isEmpty || anchor != nil else {
+        guard rebuiltArchive.count > 0 || anchor != nil else {
             return SampleDistribution(
                 totalCount: 0,
                 bins: [],
@@ -1351,31 +1326,72 @@ final class HealthKitService {
             )
         }
 
-        let binStart = earliestDate ?? sortedRecords.first?.startDate ?? previousDistribution.earliestRecordDate ?? Date()
-        let rawBinEnd = latestDate ?? sortedRecords.last?.endDate ?? previousDistribution.latestRecordDate ?? binStart
+        let binStart = earliestDate ?? rebuiltArchive.earliestDate ?? previousDistribution.earliestRecordDate ?? Date()
+        let rawBinEnd = latestDate ?? rebuiltArchive.latestDate ?? previousDistribution.latestRecordDate ?? binStart
         let binEnd = rawBinEnd > binStart ? rawBinEnd : binStart.addingTimeInterval(1)
 
         return SampleDistribution(
-            totalCount: sortedRecords.count,
+            totalCount: rebuiltArchive.count,
             bins: [
                 SampleDistribution.Bin(
                     start: binStart,
                     end: binEnd,
-                    count: sortedRecords.count,
-                    contentHash: contentHash,
+                    count: rebuiltArchive.count,
+                    contentHash: rebuiltArchive.contentHash,
                     anchorData: anchor.flatMap(Self.archiveAnchor(_:))
                 )
             ],
-            records: sortedRecords,
-            contentHash: contentHash,
+            records: [],
+            contentHash: rebuiltArchive.contentHash,
             yearlyCounts: nil,
-            recordArchiveData: nil,
+            recordArchiveData: rebuiltArchive.recordArchiveData,
             captureMode: .delta,
             deltaEventCount: processedEventCount,
             timingBreakdown: captureTimings.importBreakdown
         )
     }
 
+    private static func rebuildRecordArchive(
+        typeIdentifier: String,
+        recordMap: [String: SampleRecordPayload]
+    ) -> RebuiltRecordArchive {
+        var writer = HealthRecordArchive.makeCompactWriter(
+            typeIdentifier: typeIdentifier,
+            estimatedRecordCount: recordMap.count
+        )
+        var recordFingerprints: [String] = []
+        recordFingerprints.reserveCapacity(recordMap.count)
+        var earliestDate: Date?
+        var latestDate: Date?
+
+        for (sampleUUIDHash, record) in recordMap {
+            recordFingerprints.append(record.recordFingerprint)
+            earliestDate = min(earliestDate ?? record.startDate, record.startDate)
+            latestDate = max(latestDate ?? record.endDate, record.endDate)
+            writer.append(
+                HealthRecordValue(
+                    typeIdentifier: typeIdentifier,
+                    sampleUUIDHash: sampleUUIDHash,
+                    recordFingerprint: record.recordFingerprint,
+                    startDate: record.startDate,
+                    endDate: record.endDate,
+                    displayValue: record.displayValue
+                )
+            )
+        }
+
+        return RebuiltRecordArchive(
+            count: recordMap.count,
+            contentHash: HashService.typeHash(
+                typeIdentifier: typeIdentifier,
+                recordFingerprints: recordFingerprints
+            ),
+            earliestDate: earliestDate,
+            latestDate: latestDate,
+            recordArchiveData: writer.finalize()
+        )
+    }
+
     private func fetchInitialDistributionStreaming(
         for sampleType: HKSampleType,
         typeIdentifier: String,
@@ -2502,6 +2518,14 @@ private struct SampleRecordPayload: Sendable {
     let displayValue: String?
 }
 
+private struct RebuiltRecordArchive: Sendable {
+    let count: Int
+    let contentHash: String
+    let earliestDate: Date?
+    let latestDate: Date?
+    let recordArchiveData: Data
+}
+
 private enum HealthRecordArchiveReadError: LocalizedError {
     case missingArchive(typeIdentifier: String, count: Int)
     case decodeFailed(typeIdentifier: String)
@@ -2707,13 +2731,9 @@ private struct PreviousDistributionState: Sendable {
             return [:]
         }
 
-        guard let records = HealthRecordArchive.decode(recordArchiveData) else {
-            throw HealthRecordArchiveReadError.decodeFailed(typeIdentifier: typeIdentifier)
-        }
-
         var recordMap: [String: SampleRecordPayload] = [:]
-        recordMap.reserveCapacity(records.count)
-        for record in records {
+        recordMap.reserveCapacity(count)
+        let didRead = HealthRecordArchive.forEachRecord(in: recordArchiveData) { record in
             recordMap[record.sampleUUIDHash] = SampleRecordPayload(
                 recordFingerprint: record.recordFingerprint,
                 startDate: record.startDate,
@@ -2721,6 +2741,9 @@ private struct PreviousDistributionState: Sendable {
                 displayValue: record.displayValue
             )
         }
+        guard didRead else {
+            throw HealthRecordArchiveReadError.decodeFailed(typeIdentifier: typeIdentifier)
+        }
         return recordMap
     }
 