@@ -142,6 +142,41 @@ Conclusion: direct inserts for brand-new dependent rows produced a valid but
 modest first-import gain. The large reimport improvement was not representative
 of a clean first snapshot. SQLite insert remains the dominant bottleneck.
 
+### 2026-06-02 Non-Chain-Start Full Scan After Index Removal
+
+Commit context: after `ff59257` (`Drop unused sample import indexes`).
+Source: user-provided diagnostic report with `previousSnapshotID` present and
+`isChainStart: false`.
+
+This is not a comparable first-import benchmark for the unused-index removal,
+but it matters: this full-history scan took 22m 33s wall clock versus 17m 13s
+for the `44d9ebd` first import, so non-initial captures can be slower than first imports.
+
+| Metric | Value |
+|--------|-------|
+| Wall clock | 22m 33s |
+| Summed metric total | 22m 14s |
+| Summed fetch | 52.0s |
+| Summed processing | 2m 23s |
+| Summed insert | 18m 44s |
+| Summed finalize | 11.5s |
+| Heart Rate count | 922,440 |
+| Heart Rate total | 13m 30s |
+| Heart Rate fetch | 24.3s |
+| Heart Rate processing | 1m 29s |
+| Heart Rate insert | 11m 25s |
+| Active Energy count | 348,698 |
+| Active Energy insert | 4m 44s |
+| Steps insert | 40.4s |
+| Walking + Running Distance insert | 36.0s |
+
+Conclusion: this run should not be used to judge the first-import index removal.
+However, it points to a separate bottleneck: subsequent full scans still spend
+most of their time in SQLite writes (18m 44s of the 22m 14s summed total), likely
+because unchanged samples still go through the archive write path. The next
+implementation target should reduce per-sample work for unchanged existing
+samples during verification/full-scan captures.
+
 ## Optimization Iterations
 
 | Date | Commit | Change | Result / Status |
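The report table above can be used to check the claim that this full scan is write-bound by computing the insert share. The sketch below is a minimal Swift version. It assumes that durations are always written as space-separated `Nm` / `N.Ns` tokens, as they are in this report. `seconds(_:)` and `sharePercent(_:of:)` are illustrative helpers and are not part of the app.

```swift
/// Illustrative helper (not app code): parses report durations such as
/// "22m 33s", "40.4s", or "11.5s" into seconds. Returns nil for any token
/// whose unit is not "m" or "s" or whose number does not parse.
func seconds(_ text: String) -> Double? {
    var total = 0.0
    for token in text.split(separator: " ") {
        guard let unit = token.last,
              let value = Double(String(token.dropLast())) else {
            return nil
        }
        switch unit {
        case "m": total += value * 60
        case "s": total += value
        default: return nil
        }
    }
    return total
}

/// Share of `part` in `whole` as a percentage, rounded to one decimal place.
func sharePercent(_ part: String, of whole: String) -> Double? {
    guard let p = seconds(part), let w = seconds(whole), w > 0 else { return nil }
    return (p / w * 1000).rounded() / 10
}

// Values copied from the non-chain-start full-scan table.
print(sharePercent("18m 44s", of: "22m 14s")!)  // summed insert / summed total: 84.3
print(sharePercent("11m 25s", of: "13m 30s")!)  // Heart Rate insert / Heart Rate total: 84.6
print(sharePercent("11m 25s", of: "18m 44s")!)  // Heart Rate insert / summed insert: 60.9
```

About 84% of both the summed total and the Heart Rate total goes to insert. Heart Rate alone accounts for about 61% of summed insert time, which supports targeting per-sample write work first.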
@@ -159,6 +194,8 @@ of a clean first snapshot. SQLite insert remains the dominant bottleneck.
 | 2026-06-02 | `c138b7b` | Increased initial import write chunk sizes. | Marginal improvement: summed insert from 15m44s to 15m24s on the next comparable run. |
 | 2026-06-02 | `44d9ebd` | Used direct inserts for dependent rows when `samples` creates a new sample. | Confirmed modest first-import gain: wall clock 18m30s -> 17m13s, summed insert 15m24s -> 14m38s, Heart Rate insert 9m58s -> 8m59s. |
 | 2026-06-02 | `ff59257` | Removed unused `samples` indexes on global UUID hash and semantic fingerprint. | Awaiting comparable first-import report. Expected signal is lower `SummedInsertElapsed`; deleted-object lookup remains covered by `(sample_type_id, sample_uuid_hash)`. |
+| 2026-06-02 | pending | Captured non-chain-start full-scan report after index removal. | Not comparable for first-import performance; reveals a separate full-scan/unchanged-sample write bottleneck. |
+| 2026-06-02 | pending | Stopped writing `verified` observation events for unchanged existing samples. | Awaiting comparable non-chain-start/full-scan report. Expected signal is lower `SummedInsertElapsed` and especially lower Heart Rate insert time when most rows are unchanged. |
 
 ## Current Diagnosis
|
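The Result column packs several before/after durations into each row. Expressing them as relative change makes rows easier to compare with each other and with the pending non-chain-start report. The Swift sketch below uses the `44d9ebd` row. `MetricComparison` and `minutesSeconds(_:_:)` are hypothetical names, and the labels only echo the report fields named in this document.

```swift
/// Hypothetical record: one metric from two comparable reports, in seconds.
struct MetricComparison {
    let name: String
    let before: Double
    let after: Double

    /// Relative change in percent, rounded to one decimal place.
    /// Negative values mean the metric got faster.
    var percentChange: Double {
        ((after - before) / before * 1000).rounded() / 10
    }
}

/// Converts a duration written as "XmYs" in the table into seconds.
func minutesSeconds(_ minutes: Int, _ seconds: Int) -> Double {
    Double(minutes * 60 + seconds)
}

// Before/after values from the `44d9ebd` row of the iteration table.
let comparisons = [
    MetricComparison(name: "Wall clock",
                     before: minutesSeconds(18, 30), after: minutesSeconds(17, 13)),
    MetricComparison(name: "SummedInsertElapsed",
                     before: minutesSeconds(15, 24), after: minutesSeconds(14, 38)),
    MetricComparison(name: "Heart Rate insertElapsed",
                     before: minutesSeconds(9, 58), after: minutesSeconds(8, 59)),
]

for comparison in comparisons {
    print("\(comparison.name): \(comparison.percentChange)%")
}
// Wall clock: -6.9%
// SummedInsertElapsed: -5.0%
// Heart Rate insertElapsed: -9.9%
```

A summed-insert change of about 5% fits the table's "modest" wording. The same calculation applies once the pending reports arrive.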
@@ -183,21 +220,24 @@ The likely bottleneck is per-row SQLite work:
 - A previous Heart Rate import appeared to stall for long periods around roughly 900k records, but later progress resumed; avoid classifying this as a hard timeout without report evidence.
 - After a completed import, the app may remain unresponsive for more than one minute. This needs separate timing around post-import cache rebuild, UI refresh, report generation, and main-thread work.
 - Partial / old imported observations can pollute comparisons. Fresh first-snapshot performance comparisons should use a confirmed reset database.
+- Non-chain-start full scans can be slower than first imports if unchanged existing samples still write per-sample archive evidence.
 
 ## Next Experiments
 
 Prioritize experiments in this order:
 
 1. Add explicit post-import timings if the app is still unresponsive after the operation reports success.
-2. Profile whether index maintenance dominates first-import insert cost.
-3. Consider a guarded bulk-import mode for first observations:
+2. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
+3. Reduce remaining per-sample SQLite work for unchanged existing samples during non-chain-start full scans, especially open visibility-range existence checks.
+4. Profile whether index maintenance dominates first-import insert cost.
+5. Consider a guarded bulk-import mode for first observations:
 - keep archive semantics unchanged;
 - only relax work that can be safely reconstructed or validated;
 - re-enable normal idempotent paths for incremental observations.
-4. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
-5. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
-6. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
-7. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
+6. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
+7. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
+8. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
+9. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
 
 ## Verification Checklist For Each Optimization
 
@@ -2139,26 +2139,19 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
 statementCache: statementCache
 )
 
- let writeKind: ArchiveV2SampleWriteKind
- let eventKind: String
 if versionResult.inserted {
- writeKind = .updated
- eventKind = "representationChanged"
- } else {
- writeKind = .unchanged
- eventKind = "verified"
+ try insertObservationEvent(
+ observationID: observationID,
+ sampleID: sampleResult.id,
+ versionID: versionResult.id,
+ eventKind: "representationChanged",
+ evidenceKind: "healthkit_sample",
+ observedAt: row.observedAt,
+ db: db,
+ statementCache: statementCache
+ )
 }
 
- try insertObservationEvent(
- observationID: observationID,
- sampleID: sampleResult.id,
- versionID: versionResult.id,
- eventKind: eventKind,
- evidenceKind: "healthkit_sample",
- observedAt: row.observedAt,
- db: db,
- statementCache: statementCache
- )
 if versionResult.inserted {
 try closeOpenVisibilityRanges(
 sampleID: sampleResult.id,
@@ -2195,7 +2188,10 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
 )
 }
 
- return ArchiveV2SampleWriteResult(sampleTypeID: sampleTypeID, kind: writeKind)
+ return ArchiveV2SampleWriteResult(
+ sampleTypeID: sampleTypeID,
+ kind: versionResult.inserted ? .updated : .unchanged
+ )
 }
 
 private func createObservation(
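After this change, both the observation-event write and the returned write kind depend only on `versionResult.inserted`. The standalone Swift model below illustrates that decision; it is not the store's implementation. `SampleWriteKind`, `EventLog`, and `recordExistingSample(versionInserted:events:)` are hypothetical. The model assumes this branch runs only for samples already in the archive, since the test change expects a first write to produce a single `appeared` event.

```swift
/// Hypothetical stand-in for `ArchiveV2SampleWriteKind`.
enum SampleWriteKind {
    case updated
    case unchanged
}

/// Hypothetical in-memory stand-in for the `sample_observation_events` table.
struct EventLog {
    private(set) var eventKinds: [String] = []

    mutating func append(_ kind: String) {
        eventKinds.append(kind)
    }
}

/// Models the post-change branch for an already-archived sample: an event is
/// written only when a new version row was inserted; unchanged samples write
/// no "verified" event and are reported as `.unchanged`.
func recordExistingSample(versionInserted: Bool, events: inout EventLog) -> SampleWriteKind {
    if versionInserted {
        events.append("representationChanged")
    }
    return versionInserted ? .updated : .unchanged
}

// A full scan in which most existing samples are unchanged.
var eventLog = EventLog()
var kinds: [SampleWriteKind] = []
for versionInserted in [false, false, true, false] {
    kinds.append(recordExistingSample(versionInserted: versionInserted, events: &eventLog))
}

print(eventLog.eventKinds)                      // ["representationChanged"]
print(kinds.filter { $0 == .unchanged }.count)  // 3
```

Before this change, the same loop would also have written three `verified` events, one per unchanged sample. That per-sample write is what the pending non-chain-start benchmark is meant to measure.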
@@ -72,6 +72,9 @@ final class SQLiteHealthArchiveStoreTests: XCTestCase {
 XCTAssertEqual(firstWrite.unchangedCount, 0)
 XCTAssertEqual(try countRows(in: "samples", at: url), 1)
 XCTAssertEqual(try countRows(in: "sample_versions", at: url), 1, versionDebugRows)
+ XCTAssertEqual(try countRows(in: "sample_observation_events", at: url), 1)
+ XCTAssertEqual(try countRows(in: "sample_observation_events WHERE event_kind = 'appeared'", at: url), 1)
+ XCTAssertEqual(try countRows(in: "sample_observation_events WHERE event_kind = 'verified'", at: url), 0)
 XCTAssertEqual(try countRows(in: "sample_visibility_ranges", at: url), 1, visibilityDebugRows)
 XCTAssertEqual(try countRows(in: "source_revisions", at: url), 1)
 XCTAssertFalse(try tableExists("archive_samples", at: url))
|