Showing 3 changed files with 63 additions and 24 deletions
+46 -6
HealthProbe/Doc/04-project/Import-Optimization-Log.md
@@ -142,6 +142,41 @@ Conclusion: direct inserts for brand-new dependent rows produced a valid but
 modest first-import gain. The large reimport improvement was not representative
 of a clean first snapshot. SQLite insert remains the dominant bottleneck.
 
+### 2026-06-02 Non-Chain-Start Full Scan After Index Removal
+
+Commit context: after `ff59257` (`Drop unused sample import indexes`)
+Source: user-provided diagnostic report with `previousSnapshotID` present and
+`isChainStart: false`.
+
+This is not a comparable first-import benchmark for the unused-index removal,
+but it is important because it shows that non-initial captures can be slower
+than first imports when the app performs a full-history scan.
+
+| Metric | Value |
+|--------|-------|
+| Wall clock | 22m 33s |
+| Summed metric total | 22m 14s |
+| Summed fetch | 52.0s |
+| Summed processing | 2m 23s |
+| Summed insert | 18m 44s |
+| Summed finalize | 11.5s |
+| Heart Rate count | 922,440 |
+| Heart Rate total | 13m 30s |
+| Heart Rate fetch | 24.3s |
+| Heart Rate processing | 1m 29s |
+| Heart Rate insert | 11m 25s |
+| Active Energy count | 348,698 |
+| Active Energy insert | 4m 44s |
+| Steps insert | 40.4s |
+| Walking + Running Distance insert | 36.0s |
+
+Conclusion: this run should not be used to judge first-import index removal.
+However, it indicates a separate bottleneck: subsequent full scans still spend
+most of their time in SQLite writes, likely because unchanged samples are still
+touching the archive write path. The next implementation target should reduce
+per-sample work for unchanged existing samples during verification/full-scan
+captures.
+
 ## Optimization Iterations
 
 | Date | Commit | Change | Result / Status |
@@ -159,6 +194,8 @@ of a clean first snapshot. SQLite insert remains the dominant bottleneck.
 | 2026-06-02 | `c138b7b` | Increased initial import write chunk sizes. | Marginal improvement: summed insert from 15m44s to 15m24s on the next comparable run. |
 | 2026-06-02 | `44d9ebd` | Used direct inserts for dependent rows when `samples` creates a new sample. | Confirmed modest first-import gain: wall clock 18m30s -> 17m13s, summed insert 15m24s -> 14m38s, Heart Rate insert 9m58s -> 8m59s. |
 | 2026-06-02 | `ff59257` | Removed unused `samples` indexes on global UUID hash and semantic fingerprint. | Awaiting comparable first-import report. Expected signal is lower `SummedInsertElapsed`; deleted-object lookup remains covered by `(sample_type_id, sample_uuid_hash)`. |
+| 2026-06-02 | pending | Captured non-chain-start full-scan report after index removal. | Not comparable for first-import performance; reveals a separate full-scan/unchanged-sample write bottleneck. |
+| 2026-06-02 | pending | Stopped writing `verified` observation events for unchanged existing samples. | Awaiting comparable non-chain-start/full-scan report. Expected signal is lower `SummedInsertElapsed` and especially lower Heart Rate insert time when most rows are unchanged. |
 
 ## Current Diagnosis
 
@@ -183,21 +220,24 @@ The likely bottleneck is per-row SQLite work:
 - A previous Heart Rate import appeared to stall for long periods around roughly 900k records, but later progress resumed; avoid classifying this as a hard timeout without report evidence.
 - After a completed import, the app may remain unresponsive for more than one minute. This needs separate timing around post-import cache rebuild, UI refresh, report generation, and main-thread work.
 - Partial / old imported observations can pollute comparisons. Fresh first-snapshot performance comparisons should use a confirmed reset database.
+- Non-chain-start full scans can be slower than first imports if unchanged existing samples still write per-sample archive evidence.
 
 ## Next Experiments
 
 Prioritize experiments in this order:
 
 1. Add explicit post-import timings if the app is still unresponsive after the operation reports success.
-2. Profile whether index maintenance dominates first-import insert cost.
-3. Consider a guarded bulk-import mode for first observations:
+2. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
+3. Reduce remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans, especially open visibility-range existence checks.
+4. Profile whether index maintenance dominates first-import insert cost.
+5. Consider a guarded bulk-import mode for first observations:
    - keep archive semantics unchanged;
    - only relax work that can be safely reconstructed or validated;
    - re-enable normal idempotent paths for incremental observations.
-4. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
-5. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
-6. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
-7. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
+6. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
+7. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
+8. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
+9. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
 
 ## Verification Checklist For Each Optimization
 
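The figures in the added 2026-06-02 report table can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the helper `seconds(_:_:)` and all variable names are hypothetical and not part of the HealthProbe code base, and the inputs are copied by hand from the table.

```swift
import Foundation

/// Hypothetical helper: converts a "minutes, seconds" reading into seconds.
func seconds(_ minutes: Double, _ secs: Double) -> Double {
    minutes * 60 + secs
}

// Values copied from the 2026-06-02 non-chain-start report table.
let summedTotal = seconds(22, 14)        // Summed metric total
let summedInsert = seconds(18, 44)       // Summed insert
let heartRateInsert = seconds(11, 25)    // Heart Rate insert
let heartRateCount = 922_440.0           // Heart Rate count
let activeEnergyInsert = seconds(4, 44)  // Active Energy insert
let activeEnergyCount = 348_698.0        // Active Energy count

// Fraction of the summed metric time spent in the insert phase.
let insertShare = summedInsert / summedTotal

// Average insert-phase cost per sample, in milliseconds.
let heartRateMsPerSample = heartRateInsert / heartRateCount * 1000
let activeEnergyMsPerSample = activeEnergyInsert / activeEnergyCount * 1000

print(String(format: "Insert share of summed total: %.1f%%", insertShare * 100))
print(String(format: "Heart Rate insert: %.3f ms per sample", heartRateMsPerSample))
print(String(format: "Active Energy insert: %.3f ms per sample", activeEnergyMsPerSample))
```

On these inputs the insert phase accounts for roughly 84% of the summed metric time, and the per-sample insert cost is about 0.74 ms for Heart Rate and 0.81 ms for Active Energy, which is consistent with the conclusion that per-sample write work dominates this run.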
+14 -18
HealthProbe/Services/SQLiteHealthArchiveStore.swift
@@ -2139,26 +2139,19 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
             statementCache: statementCache
         )
 
-        let writeKind: ArchiveV2SampleWriteKind
-        let eventKind: String
         if versionResult.inserted {
-            writeKind = .updated
-            eventKind = "representationChanged"
-        } else {
-            writeKind = .unchanged
-            eventKind = "verified"
+            try insertObservationEvent(
+                observationID: observationID,
+                sampleID: sampleResult.id,
+                versionID: versionResult.id,
+                eventKind: "representationChanged",
+                evidenceKind: "healthkit_sample",
+                observedAt: row.observedAt,
+                db: db,
+                statementCache: statementCache
+            )
         }
 
-        try insertObservationEvent(
-            observationID: observationID,
-            sampleID: sampleResult.id,
-            versionID: versionResult.id,
-            eventKind: eventKind,
-            evidenceKind: "healthkit_sample",
-            observedAt: row.observedAt,
-            db: db,
-            statementCache: statementCache
-        )
         if versionResult.inserted {
             try closeOpenVisibilityRanges(
                 sampleID: sampleResult.id,
@@ -2195,7 +2188,10 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
             )
         }
 
-        return ArchiveV2SampleWriteResult(sampleTypeID: sampleTypeID, kind: writeKind)
+        return ArchiveV2SampleWriteResult(
+            sampleTypeID: sampleTypeID,
+            kind: versionResult.inserted ? .updated : .unchanged
+        )
     }
 
     private func createObservation(
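The behavioural change in the two hunks above can be modelled without SQLite. The following is a simplified sketch, not the store's actual implementation: every type and function name in it (`SampleWriteKind`, `VersionUpsertResult`, `ObservationEvent`, `EventLog`, `recordExistingSample`) is a hypothetical stand-in, and an in-memory array stands in for the `sample_observation_events` table. It covers only the existing-sample path touched by this diff.

```swift
// Hypothetical stand-in for ArchiveV2SampleWriteKind.
enum SampleWriteKind: Equatable {
    case updated
    case unchanged
}

// Hypothetical stand-in for the result of the sample_versions upsert.
struct VersionUpsertResult {
    let id: Int64
    let inserted: Bool  // true when a new version row was written
}

struct ObservationEvent: Equatable {
    let sampleID: Int64
    let versionID: Int64
    let eventKind: String
}

/// In-memory stand-in for the sample_observation_events table.
struct EventLog {
    private(set) var events: [ObservationEvent] = []

    mutating func insert(_ event: ObservationEvent) {
        events.append(event)
    }
}

/// Mirrors the new control flow: an event row is written only when the
/// version changed; an unchanged sample produces no event write at all.
func recordExistingSample(
    sampleID: Int64,
    version: VersionUpsertResult,
    log: inout EventLog
) -> SampleWriteKind {
    guard version.inserted else {
        return .unchanged  // previously this path wrote a "verified" event
    }
    log.insert(
        ObservationEvent(
            sampleID: sampleID,
            versionID: version.id,
            eventKind: "representationChanged"
        )
    )
    return .updated
}

// Usage: one unchanged re-observation followed by one changed representation.
var log = EventLog()
let unchanged = recordExistingSample(
    sampleID: 1,
    version: VersionUpsertResult(id: 10, inserted: false),
    log: &log
)
let changed = recordExistingSample(
    sampleID: 1,
    version: VersionUpsertResult(id: 11, inserted: true),
    log: &log
)
print(unchanged, changed, log.events.count)
```

In this model the unchanged observation still reports `.unchanged` to the caller, so per-type unchanged counts can be kept, while the event log only grows when a representation actually changes.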
+3 -0
HealthProbeTests/SQLiteHealthArchiveStoreTests.swift
@@ -72,6 +72,9 @@ final class SQLiteHealthArchiveStoreTests: XCTestCase {
         XCTAssertEqual(firstWrite.unchangedCount, 0)
         XCTAssertEqual(try countRows(in: "samples", at: url), 1)
         XCTAssertEqual(try countRows(in: "sample_versions", at: url), 1, versionDebugRows)
+        XCTAssertEqual(try countRows(in: "sample_observation_events", at: url), 1)
+        XCTAssertEqual(try countRows(in: "sample_observation_events WHERE event_kind = 'appeared'", at: url), 1)
+        XCTAssertEqual(try countRows(in: "sample_observation_events WHERE event_kind = 'verified'", at: url), 0)
         XCTAssertEqual(try countRows(in: "sample_visibility_ranges", at: url), 1, visibilityDebugRows)
         XCTAssertEqual(try countRows(in: "source_revisions", at: url), 1)
         XCTAssertFalse(try tableExists("archive_samples", at: url))