Showing 1 changed file with 45 additions and 17 deletions
HealthProbe/Doc/04-project/Import-Optimization-Log.md
@@ -103,16 +103,44 @@ Conclusion: larger chunks gave only marginal gains. Further optimization should
 ### 2026-06-02 After Direct Inserts For New Archive Samples
 
 Commit: `44d9ebd` (`Use direct inserts for new archive samples`)  
-Source: no comparable first-import real-device report yet.
+Source: user-provided first-import diagnostic report after database reset.
 
-Note: a later user-provided report looked significantly faster, but it was a
-reimport rather than a fresh first snapshot. Do not use it as a direct comparison
-against first-import runs.
+Note: an earlier user-provided report looked significantly faster, but it was a
+reimport rather than a fresh first snapshot and is not used as a direct
+comparison against first-import runs.
 
-Expected signal:
-- `Heart Rate insertElapsed` should drop first;
-- `SummedInsertElapsed` should drop if most first-import rows are new;
-- no semantic change should appear in diff counts or repeated-page idempotency.
+| Metric | Value |
+|--------|-------|
+| Wall clock | 17m 13s |
+| Summed metric total | 17m 13s |
+| Summed fetch | 43.0s |
+| Summed processing | 1m 40s |
+| Summed insert | 14m 38s |
+| Summed finalize | 9.5s |
+| Heart Rate count | 922,431 |
+| Heart Rate total | 10m 25s |
+| Heart Rate fetch | 20.8s |
+| Heart Rate processing | 57.0s |
+| Heart Rate insert | 8m 59s |
+| Active Energy count | 348,669 |
+| Active Energy insert | 3m 54s |
+| Steps insert | 24.2s |
+| Walking + Running Distance insert | 20.7s |
+
+Comparison against the previous comparable first-import run (`c138b7b`):
+
+| Metric | Previous | Current | Change |
+|--------|----------|---------|--------|
+| Wall clock | 18m 30s | 17m 13s | -1m 17s / -7% |
+| Summed insert | 15m 24s | 14m 38s | -46s / -5% |
+| Heart Rate insert | 9m 58s | 8m 59s | -59s / -10% |
+| Active Energy insert | 3m 48s | 3m 54s | +6s / +3% |
+| Steps insert | 24.2s | 24.2s | flat |
+| Walking + Running Distance insert | 20.0s | 20.7s | +0.7s / +4% |
+
+Conclusion: direct inserts for brand-new dependent rows produced a valid but
+modest first-import gain. The large reimport improvement was not representative
+of a clean first snapshot. SQLite insert remains the dominant bottleneck.
 
 ## Optimization Iterations
 
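The `44d9ebd` change recorded in this hunk (plain inserts for dependent rows only when `samples` has just created the parent row) can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not HealthProbe's implementation: only the table names `samples`, `sample_versions`, and `sample_observation_events` come from the log, while every column and the `archive_sample` helper are hypothetical. It shows the branching only and does not model where the app's idempotent path actually spends its time.

```python
import sqlite3

# Hypothetical schema: the table names come from the import log; every column is invented.
SCHEMA = """
CREATE TABLE samples (
    id   INTEGER PRIMARY KEY,
    uuid TEXT NOT NULL UNIQUE
);
CREATE TABLE sample_versions (
    sample_id INTEGER NOT NULL REFERENCES samples (id),
    version   INTEGER NOT NULL,
    UNIQUE (sample_id, version)
);
CREATE TABLE sample_observation_events (
    sample_id INTEGER NOT NULL REFERENCES samples (id),
    event     TEXT NOT NULL,
    UNIQUE (sample_id, event)
);
"""


def archive_sample(conn, uuid, version, event):
    """Archive one observed sample (hypothetical helper). Returns True if the `samples` row is new."""
    cur = conn.execute("INSERT OR IGNORE INTO samples (uuid) VALUES (?)", (uuid,))
    if cur.rowcount == 1:
        # Parent row was just created, so no dependent row can exist yet:
        # plain INSERTs are enough, and any conflict would raise IntegrityError.
        sample_id = cur.lastrowid
        conn.execute(
            "INSERT INTO sample_versions (sample_id, version) VALUES (?, ?)",
            (sample_id, version),
        )
        conn.execute(
            "INSERT INTO sample_observation_events (sample_id, event) VALUES (?, ?)",
            (sample_id, event),
        )
        return True
    # Parent already existed (repeated page or reimport): keep the idempotent path.
    (sample_id,) = conn.execute(
        "SELECT id FROM samples WHERE uuid = ?", (uuid,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO sample_versions (sample_id, version) VALUES (?, ?)",
        (sample_id, version),
    )
    conn.execute(
        "INSERT OR IGNORE INTO sample_observation_events (sample_id, event) VALUES (?, ?)",
        (sample_id, event),
    )
    return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    with conn:  # one transaction for the demo writes
        print(archive_sample(conn, "uuid-1", 1, "first-seen"))  # new sample
        print(archive_sample(conn, "uuid-1", 1, "first-seen"))  # repeated page
```

In this sketch the plain-insert branch also has a correctness side effect: a conflict on a brand-new parent raises `sqlite3.IntegrityError` instead of being silently ignored, so a violated assumption surfaces immediately. Repeated pages and reimports still take the `INSERT OR IGNORE` branch, which keeps the sketch idempotent.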
@@ -129,16 +157,16 @@ Expected signal:
 | 2026-06-02 | `bcbf9a5` | Cleaned up import diagnostic timings. | Corrected date-fetch wall-clock measurement and report text. |
 | 2026-06-02 | `a026566` | Batched initial import archive writes across several fetched pages. | Wall clock improved from about 20m25s to 18m21s on the measured first import. |
 | 2026-06-02 | `c138b7b` | Increased initial import write chunk sizes. | Marginal improvement: summed insert from 15m44s to 15m24s on the next comparable run. |
-| 2026-06-02 | `44d9ebd` | Used direct inserts for dependent rows when `samples` creates a new sample. | Awaiting comparable first-import real-device report. Tests passed. |
+| 2026-06-02 | `44d9ebd` | Used direct inserts for dependent rows when `samples` creates a new sample. | Confirmed modest first-import gain: wall clock 18m30s -> 17m13s, summed insert 15m24s -> 14m38s, Heart Rate insert 9m58s -> 8m59s. |
 
 ## Current Diagnosis
 
 The import is no longer primarily a HealthKit fetch problem. On the latest comparable first-import measured run:
 
-- total wall clock was 18m30s;
-- summed fetch was only 46.8s;
-- summed insert was 15m24s;
-- Heart Rate alone spent 9m58s inserting.
+- total wall clock was 17m13s after the latest direct-insert optimization;
+- summed fetch was only 43.0s;
+- summed insert was 14m38s;
+- Heart Rate alone spent 8m59s inserting.
 
 The likely bottleneck is per-row SQLite work:
 - uniqueness checks on hot tables;
@@ -159,13 +187,13 @@ The likely bottleneck is per-row SQLite work:
 
 Prioritize experiments in this order:
 
-1. Run a fresh first-snapshot import after `44d9ebd` and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
-2. Add explicit post-import timings if the app is still unresponsive after the operation reports success.
-3. Profile whether index maintenance dominates first-import insert cost.
-4. Consider a guarded bulk-import mode for first observations:
+1. Add explicit post-import timings if the app is still unresponsive after the operation reports success.
+2. Profile whether index maintenance dominates first-import insert cost.
+3. Consider a guarded bulk-import mode for first observations:
    - keep archive semantics unchanged;
    - only relax work that can be safely reconstructed or validated;
    - re-enable normal idempotent paths for incremental observations.
+4. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
 5. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
 6. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
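Experiments 2 and 4 in the new priority list (index maintenance cost and first-import-only deferred index creation) could start from a comparison like the following Python `sqlite3` sketch. It is illustrative only: the two-column `samples` table, the `samples_uuid` index, and the `load_samples` helper are hypothetical, and timings from an in-memory database on a desktop will not predict on-device cost. The sketch loads the same rows either with a `UNIQUE` index already present, so every insert maintains it, or with the index built once after the load, inside the same transaction.

```python
import sqlite3
import time

# Hypothetical table and index; HealthProbe's real `samples` schema is not shown in the log.
CREATE_TABLE = "CREATE TABLE samples (uuid TEXT NOT NULL, value REAL NOT NULL)"
CREATE_INDEX = "CREATE UNIQUE INDEX samples_uuid ON samples (uuid)"


def load_samples(rows, defer_index):
    """Load (uuid, value) rows into a fresh in-memory `samples` table.

    Returns (row_count, elapsed_seconds). Raises sqlite3.IntegrityError on a
    duplicate uuid: per row when the index exists up front, or once when the
    deferred index is built.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(CREATE_TABLE)
        if not defer_index:
            # Index exists before the load, so every INSERT also updates it.
            conn.execute(CREATE_INDEX)
        start = time.perf_counter()
        with conn:  # single transaction; rolled back if anything below raises
            conn.executemany("INSERT INTO samples (uuid, value) VALUES (?, ?)", rows)
            if defer_index:
                # Build the index once, after all rows are in place.
                conn.execute(CREATE_INDEX)
        elapsed = time.perf_counter() - start
        (count,) = conn.execute("SELECT COUNT(*) FROM samples").fetchone()
        return count, elapsed
    finally:
        conn.close()


if __name__ == "__main__":
    rows = [(f"uuid-{i}", float(i)) for i in range(100_000)]
    for defer in (False, True):
        count, secs = load_samples(rows, defer)
        print(f"defer_index={defer}: {count} rows in {secs:.3f}s")
```

Because the deferred `CREATE UNIQUE INDEX` fails on duplicate keys and the surrounding transaction is rolled back, uniqueness is still enforced in this sketch, just once at the end of the load instead of per row. Whether that trade is safe for the real archive depends on whether the import needs the index for lookups while rows are still being written, which this sketch does not cover.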