Record reset import benchmark · c695281

+45 -17

HealthProbe/Doc/04-project/Import-Optimization-Log.md

@@ -103,16 +103,44 @@ Conclusion: larger chunks gave only marginal gains. Further optimization should
 ### 2026-06-02 After Direct Inserts For New Archive Samples
 
 Commit: `44d9ebd` (`Use direct inserts for new archive samples`)  
-Source: no comparable first-import real-device report yet.
+Source: user-provided first-import diagnostic report after database reset.
 
-Note: a later user-provided report looked significantly faster, but it was a
-reimport rather than a fresh first snapshot. Do not use it as a direct comparison
-against first-import runs.
+Note: an earlier user-provided report looked significantly faster, but it was a
+reimport rather than a fresh first snapshot and is not used as a direct
+comparison against first-import runs.
 
-Expected signal:
-- `Heart Rate insertElapsed` should drop first;
-- `SummedInsertElapsed` should drop if most first-import rows are new;
-- no semantic change should appear in diff counts or repeated-page idempotency.
+| Metric | Value |
+|--------|-------|
+| Wall clock | 17m 13s |
+| Summed metric total | 17m 13s |
+| Summed fetch | 43.0s |
+| Summed processing | 1m 40s |
+| Summed insert | 14m 38s |
+| Summed finalize | 9.5s |
+| Heart Rate count | 922,431 |
+| Heart Rate total | 10m 25s |
+| Heart Rate fetch | 20.8s |
+| Heart Rate processing | 57.0s |
+| Heart Rate insert | 8m 59s |
+| Active Energy count | 348,669 |
+| Active Energy insert | 3m 54s |
+| Steps insert | 24.2s |
+| Walking + Running Distance insert | 20.7s |
+
+Comparison against the previous comparable first-import run (`c138b7b`):
+
+| Metric | Previous | Current | Change |
+|--------|----------|---------|--------|
+| Wall clock | 18m 30s | 17m 13s | -1m 17s / -7% |
+| Summed insert | 15m 24s | 14m 38s | -46s / -5% |
+| Heart Rate insert | 9m 58s | 8m 59s | -59s / -10% |
+| Active Energy insert | 3m 48s | 3m 54s | +6s / +3% |
+| Steps insert | 24.2s | 24.2s | flat |
+| Walking + Running Distance insert | 20.0s | 20.7s | +0.7s / +4% |
+
+Conclusion: direct inserts for brand-new dependent rows produced a valid but
+modest first-import gain. The large reimport improvement was not representative
+of a clean first snapshot. SQLite insert remains the dominant bottleneck.
 
 ## Optimization Iterations
 
@@ -129,16 +157,16 @@ Expected signal:
 | 2026-06-02 | `bcbf9a5` | Cleaned up import diagnostic timings. | Corrected date-fetch wall-clock measurement and report text. |
 | 2026-06-02 | `a026566` | Batched initial import archive writes across several fetched pages. | Wall clock improved from about 20m25s to 18m21s on the measured first import. |
 | 2026-06-02 | `c138b7b` | Increased initial import write chunk sizes. | Marginal improvement: summed insert from 15m44s to 15m24s on the next comparable run. |
-| 2026-06-02 | `44d9ebd` | Used direct inserts for dependent rows when `samples` creates a new sample. | Awaiting comparable first-import real-device report. Tests passed. |
+| 2026-06-02 | `44d9ebd` | Used direct inserts for dependent rows when `samples` creates a new sample. | Confirmed modest first-import gain: wall clock 18m30s -> 17m13s, summed insert 15m24s -> 14m38s, Heart Rate insert 9m58s -> 8m59s. |
 
 ## Current Diagnosis
 
 The import is no longer primarily a HealthKit fetch problem. On the latest comparable first-import measured run:
 
-- total wall clock was 18m30s;
-- summed fetch was only 46.8s;
-- summed insert was 15m24s;
-- Heart Rate alone spent 9m58s inserting.
+- total wall clock was 17m13s after the latest direct-insert optimization;
+- summed fetch was only 43.0s;
+- summed insert was 14m38s;
+- Heart Rate alone spent 8m59s inserting.
 
 The likely bottleneck is per-row SQLite work:
 - uniqueness checks on hot tables;
@@ -159,13 +187,13 @@ The likely bottleneck is per-row SQLite work:
 
 Prioritize experiments in this order:
 
-1. Run a fresh first-snapshot import after `44d9ebd` and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
-2. Add explicit post-import timings if the app is still unresponsive after the operation reports success.
-3. Profile whether index maintenance dominates first-import insert cost.
-4. Consider a guarded bulk-import mode for first observations:
+1. Add explicit post-import timings if the app is still unresponsive after the operation reports success.
+2. Profile whether index maintenance dominates first-import insert cost.
+3. Consider a guarded bulk-import mode for first observations:
    - keep archive semantics unchanged;
    - only relax work that can be safely reconstructed or validated;
    - re-enable normal idempotent paths for incremental observations.
+4. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
 5. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
 6. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
 

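
Priority item 4 (deferred index creation for first import) can be illustrated with Python's standard `sqlite3` module. This is only a sketch under stated assumptions: the `demo_samples` table, its columns, and the function name are hypothetical, because the real `samples` schema is not shown in the log, and the app's actual storage layer may differ. The idea shown is to bulk-insert into a table with no unique index, then build the unique index once at the end, so uniqueness is still enforced on the final data.

```python
import sqlite3


def bulk_import_with_deferred_index(rows):
    """Insert rows into an unindexed table, then build the unique index once.

    `rows` is an iterable of (uuid, value) tuples. Table and column names are
    hypothetical stand-ins for the real archive schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE demo_samples (uuid TEXT NOT NULL, value REAL NOT NULL)"
    )

    # One transaction for the whole batch: no per-row uniqueness check and
    # no index maintenance happen during these inserts.
    with conn:
        conn.executemany(
            "INSERT INTO demo_samples (uuid, value) VALUES (?, ?)", rows
        )

    # Building the index afterwards still validates the data: SQLite refuses
    # to create a UNIQUE index if the table already contains duplicates.
    with conn:
        conn.execute(
            "CREATE UNIQUE INDEX demo_samples_uuid ON demo_samples (uuid)"
        )
    return conn


if __name__ == "__main__":
    conn = bulk_import_with_deferred_index([("a", 72.0), ("b", 75.5)])
    print(conn.execute("SELECT COUNT(*) FROM demo_samples").fetchone()[0])
```

Whether this actually reduces insert time on the real device is exactly what the proposed experiment would need to measure; the sketch only shows that final uniqueness can still be checked after the bulk write.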
