Showing 3 changed files with 181 additions and 1 deletion
+2 -1
HealthProbe/Doc/04-project/IMPLEMENTATION_STATUS.md
@@ -1,6 +1,6 @@
 # HealthProbe - Implementation Status
 
-**Last Updated:** 2026-05-31
+**Last Updated:** 2026-06-02
 
 ## Current Reality
 
@@ -37,6 +37,7 @@ There are no real deployments, only test installations. Existing prototype datab
 ## Refactoring Priorities
 
 Detailed checkable milestones live in [`Refactoring-Plan.md`](Refactoring-Plan.md).
+Import performance iterations and measured reports live in [`Import-Optimization-Log.md`](Import-Optimization-Log.md).
 
 1. Stop writing prototype `HealthSnapshot` bridge rows during capture/review.
 2. Add targeted cache invalidation for affected observation/type ranges.
+175 -0
HealthProbe/Doc/04-project/Import-Optimization-Log.md
@@ -0,0 +1,175 @@
+# HealthProbe Import Optimization Log
+
+**Canonical path:** `HealthProbe/Doc/04-project/Import-Optimization-Log.md`  
+**Created:** 2026-06-02  
+**Purpose:** Track import performance work, measured results, regressions, and next experiments.
+
+This is a living project log. Update it after each import optimization commit and after each real-device import report.
+
+## Scope
+
+The current optimization target is the initial / full-history HealthKit import into the SQLite archive.
+
+Primary goals:
+- complete large first-run imports without app freeze or watchdog-like stalls;
+- keep memory bounded for low-end devices;
+- reduce wall-clock duration enough to make future background / scheduled collection realistic;
+- keep archive writes idempotent and differential;
+- preserve SQLite as the source of truth.
+
+Non-goals for this log:
+- redesigning HealthKit background collection strategy;
+- changing archive semantics;
+- optimizing UI rendering after import, except when post-import work blocks the app.
+
+## Measurement Fields
+
+Use the HealthProbe diagnostic report fields below for comparisons:
+
+| Field | Meaning |
+|-------|---------|
+| `WallClockDuration` / `Duration` | User-visible total operation time. |
+| `SummedFetchElapsed` | Time spent fetching HealthKit samples, summed per metric. Fetches for different types overlap, so this sum can exceed the wall-clock fetch time. |
+| `SummedProcessingElapsed` | Time spent converting HealthKit samples into archive rows. |
+| `SummedInsertElapsed` | Time spent writing rows to SQLite. Current main bottleneck. |
+| `SummedFinalizeElapsed` | Type verification, aggregate rebuild, and finalization cost. |
+| Per-type `insertElapsed` | Most useful field for high-volume types such as Heart Rate and Active Energy. |
+
+Important interpretation:
+- per-metric timing sums can exceed wall-clock time because type fetches overlap;
+- progress rates shown during import may overestimate throughput if overhead is not included;
+- compare first snapshots only against first snapshots after a database reset.
+
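To illustrate why summed per-metric timings can exceed wall-clock time, here is a minimal Python sketch. The interval values and function names are invented for illustration; they are not taken from a HealthProbe report or from the app's code.

```python
def summed_elapsed(intervals):
    """Add up each (start, end) interval's duration independently."""
    return sum(end - start for start, end in intervals)


def wall_clock(intervals):
    """Length of the union of the intervals, counting overlapping time once."""
    total = 0.0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # This interval does not touch the merged span: close the span.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # This interval overlaps the merged span: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Hypothetical fetch windows, in seconds, for three types fetched concurrently.
fetches = [(0.0, 20.0), (5.0, 30.0), (25.0, 40.0)]
print(summed_elapsed(fetches))  # 60.0
print(wall_clock(fetches))      # 40.0
```

The summed figure stays useful for comparing phases against each other, but only the wall-clock figure reflects what the user waits for.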
+## Real-Device Results
+
+### 2026-06-02 Baseline Before Latest Batch/Chunk Work
+
+Source: user-provided diagnostic report.
+
+| Metric | Value |
+|--------|-------|
+| Wall clock | 20m 25s |
+| Summed fetch | 1m 03s |
+| Summed processing | 1m 31s |
+| Summed insert | 17m 02s |
+| Summed finalize | 9.2s |
+| Heart Rate count | 923,466 |
+| Heart Rate insert | 10m 45s |
+| Active Energy insert | 4m 29s |
+| Steps insert | 27.1s |
+| Walking + Running Distance insert | 21.8s |
+
+Conclusion: SQLite insert dominated the run. HealthKit fetch was not the limiting factor.
+
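The baseline conclusion can be checked with a little arithmetic. The Python sketch below takes the four summed phase values from the baseline table above and prints each phase's share of their total. `parse_duration` is a hypothetical helper written for this example, not part of HealthProbe.

```python
import re


def parse_duration(text):
    """Convert report strings such as '17m 02s' or '9.2s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m\s*)?(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# Summed phase timings copied from the baseline report table.
baseline = {
    "fetch": "1m 03s",
    "processing": "1m 31s",
    "insert": "17m 02s",
    "finalize": "9.2s",
}

seconds = {phase: parse_duration(value) for phase, value in baseline.items()}
total = sum(seconds.values())

for phase, value in seconds.items():
    print(f"{phase:<10} {value:8.1f}s  {value / total:6.1%}")
```

Insert accounts for roughly 86% of the summed phase time in this run, and fetch for about 5%.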
+### 2026-06-02 After Batched Initial Archive Writes
+
+Source: user-provided diagnostic report after commit `a026566`.
+
+| Metric | Value |
+|--------|-------|
+| Wall clock | 18m 21s |
+| Summed insert | 15m 44s |
+| Heart Rate insert | 10m 03s |
+| Active Energy insert | 3m 51s |
+| Steps insert | 28.1s |
+| Walking + Running Distance insert | 25.4s |
+
+Conclusion: batching cut wall clock from 20m 25s to 18m 21s (about 10%) and summed insert from 17m 02s to 15m 44s (about 8%), but insert remained dominant.
+
+### 2026-06-02 After Larger Initial Write Chunks
+
+Source: user-provided diagnostic report after commit `c138b7b`.
+
+| Metric | Value |
+|--------|-------|
+| Wall clock | 18m 30s |
+| Summed metric total | 18m 02s |
+| Summed fetch | 46.8s |
+| Summed processing | 1m 37s |
+| Summed insert | 15m 24s |
+| Summed finalize | 10.5s |
+| Heart Rate count | 922,404 |
+| Heart Rate total | 11m 23s |
+| Heart Rate fetch | 21.2s |
+| Heart Rate processing | 56.1s |
+| Heart Rate insert | 9m 58s |
+| Active Energy count | 348,635 |
+| Active Energy insert | 3m 48s |
+| Steps insert | 24.2s |
+| Walking + Running Distance insert | 20.0s |
+
+Conclusion: larger chunks gave only marginal gains. Summed insert fell by 20s (15m 44s to 15m 24s), and wall clock did not improve (18m 30s versus 18m 21s on the previous run). Further optimization should reduce per-sample SQLite work rather than only increasing page/chunk size.
+
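Dividing insert time by row count makes the per-sample cost concrete. The Python sketch below uses only the Heart Rate and Active Energy figures from the report table above; the variable names are chosen for this example.

```python
# Figures copied from the report table above: (row count, insert seconds).
runs = {
    "Heart Rate": (922_404, 9 * 60 + 58),
    "Active Energy": (348_635, 3 * 60 + 48),
}

per_row_us = {}
for name, (rows, insert_seconds) in runs.items():
    # Average insert cost per archived sample, in microseconds.
    per_row_us[name] = insert_seconds / rows * 1_000_000
    rows_per_second = rows / insert_seconds
    print(f"{name:<14} {per_row_us[name]:7.1f} us/row  {rows_per_second:8.0f} rows/s")
```

Both types come out at roughly 0.65 ms per row, or about 1,500 rows per second. That near-identical per-row cost across two very different volumes is consistent with a fixed per-sample write cost rather than a chunk-size effect.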
+### 2026-06-02 After Direct Inserts For New Archive Samples
+
+Commit: `44d9ebd` (`Use direct inserts for new archive samples`)  
+Source: no real-device import report yet.
+
+Expected signal:
+- `Heart Rate insertElapsed` should drop first;
+- `SummedInsertElapsed` should drop if most first-import rows are new;
+- no semantic change should appear in diff counts or repeated-page idempotency.
+
+## Optimization Iterations
+
+| Date | Commit | Change | Result / Status |
+|------|--------|--------|-----------------|
+| 2026-06-02 | `fd08ded` | Added explicit fetch / processing / insert / finalize timings to reports. | Made phase comparisons possible without inferring from UI progress. |
+| 2026-06-02 | `87f1a85` | Cached repeated SQLite write-path lookups within grouped imports. | Reduced repeated id lookup pressure in hot path. |
+| 2026-06-02 | `7294a01` | Fast-pathed visibility writes for new archive samples. | Removed redundant visibility close/existence work for brand-new samples. |
+| 2026-06-02 | `585d77f` | Tightened archive verification aggregate queries. | Reduced finalization / verification rescans. |
+| 2026-06-02 | `2dd279c` | Used rowid fast path for new archive sample rows. | Avoided follow-up id lookup queries when SQLite confirmed new inserts. |
+| 2026-06-02 | `f569b6c` | Fixed scheduled test database reset. | Restored ability to compare fresh first-snapshot imports. |
+| 2026-06-02 | `986f343` | Increased Heart Rate import write chunks. | Early attempt to reduce paging/write overhead for the largest metric. |
+| 2026-06-02 | `c1ebd37` | Sped up archive verification finalization. | Reduced finalize pressure; insert remained dominant. |
+| 2026-06-02 | `bcbf9a5` | Cleaned up import diagnostic timings. | Corrected date-fetch wall-clock measurement and report text. |
+| 2026-06-02 | `a026566` | Batched initial import archive writes across several fetched pages. | Wall clock improved from 20m 25s to 18m 21s on the measured first import. |
+| 2026-06-02 | `c138b7b` | Increased initial import write chunk sizes. | Marginal improvement: summed insert from 15m 44s to 15m 24s on the next comparable run. |
+| 2026-06-02 | `44d9ebd` | Used direct inserts for dependent rows when `samples` creates a new sample. | Awaiting real-device report. Tests passed. |
+
+## Current Diagnosis
+
+The import is no longer primarily a HealthKit fetch problem. On the latest measured run:
+
+- total wall clock was 18m 30s;
+- summed fetch was only 46.8s;
+- summed insert was 15m 24s;
+- Heart Rate alone spent 9m 58s inserting.
+
+The likely bottleneck is per-row SQLite work:
+- uniqueness checks on hot tables;
+- index maintenance while importing high-volume rows;
+- multiple dependent writes per sample;
+- commit / transaction shape.
+
+Separately from insert cost, Core Data or UI refresh work after SQLite completes is a suspect if the app remains unresponsive after import.
+
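The "commit / transaction shape" item in the list above can be explored off-device with a small benchmark. The Python sketch below compares one commit per row against a single transaction. It assumes a hypothetical one-table schema (`samples` with a unique `uuid`); it is not the HealthProbe archive schema or write path.

```python
import sqlite3
import time


def make_db():
    # isolation_level=None puts the connection in autocommit mode so the
    # explicit BEGIN / COMMIT statements below control transactions.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE samples ("
        "id INTEGER PRIMARY KEY, uuid TEXT NOT NULL UNIQUE, value REAL NOT NULL)"
    )
    return conn


def make_rows(count):
    return [(f"sample-{i}", float(i)) for i in range(count)]


def insert_per_row_commit(conn, rows):
    # One transaction per row: every insert pays the commit cost.
    for row in rows:
        conn.execute("BEGIN")
        conn.execute("INSERT INTO samples (uuid, value) VALUES (?, ?)", row)
        conn.execute("COMMIT")


def insert_single_transaction(conn, rows):
    # One transaction for the whole chunk: the commit cost is paid once.
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO samples (uuid, value) VALUES (?, ?)", rows)
    conn.execute("COMMIT")


def timed(insert_fn, count=5_000):
    conn = make_db()
    start = time.perf_counter()
    insert_fn(conn, make_rows(count))
    elapsed = time.perf_counter() - start
    stored = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
    conn.close()
    return elapsed, stored


for fn in (insert_per_row_commit, insert_single_transaction):
    elapsed, stored = timed(fn)
    print(f"{fn.__name__:<28} {stored} rows in {elapsed:.3f}s")
```

An in-memory database understates commit cost because nothing is synced to storage, so treat the numbers as directional only. A file-backed run on the target device is the measurement that counts.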
+## Open Issues / Observations
+
+- Very small pages reduced freeze risk but introduced visible overhead.
+- Some progress timing displayed in the UI did not include overhead, so elapsed time and rates looked better than the real operation.
+- A previous Heart Rate import appeared to stall for long periods at roughly 900k records, but progress later resumed; avoid classifying this as a hard timeout without report evidence.
+- After a completed import, the app may remain unresponsive for more than one minute. This needs separate timing around post-import cache rebuild, UI refresh, report generation, and main-thread work.
+- Partial / old imported observations can pollute comparisons. Fresh first-snapshot performance comparisons should use a confirmed reset database.
+
+## Next Experiments
+
+Prioritize experiments in this order:
+
+1. Run a fresh first-snapshot import after `44d9ebd` and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
+2. Add explicit post-import timings if the app is still unresponsive after the operation reports success.
+3. Profile whether index maintenance dominates first-import insert cost.
+4. Consider a guarded bulk-import mode for first observations:
+   - keep archive semantics unchanged;
+   - only relax work that can be safely reconstructed or validated;
+   - re-enable normal idempotent paths for incremental observations.
+5. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
+6. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
+
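One possible shape for the guarded bulk-import mode in item 4 is sketched below in Python. Everything here is an assumption made for illustration: the `samples` table, the `samples_uuid` index, and both function names are hypothetical and do not describe the HealthProbe archive. The idea is to skip unique-index maintenance during the first import, rebuild the index afterwards so SQLite validates uniqueness, and keep `INSERT OR IGNORE` for incremental pages.

```python
import sqlite3

UNIQUE_INDEX = "CREATE UNIQUE INDEX samples_uuid ON samples (uuid)"


def open_archive():
    # Autocommit mode; transactions are opened explicitly with BEGIN.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE samples (uuid TEXT NOT NULL, value REAL NOT NULL)")
    conn.execute(UNIQUE_INDEX)
    return conn


def bulk_first_import(conn, rows):
    """Guarded fast path: only valid while the archive table is empty."""
    if conn.execute("SELECT EXISTS(SELECT 1 FROM samples)").fetchone()[0]:
        raise RuntimeError("bulk mode is only allowed for the first import")
    conn.execute("BEGIN")
    try:
        # Relax index maintenance during the load, then rebuild the index.
        # Rebuilding fails on duplicates, which rolls everything back.
        conn.execute("DROP INDEX IF EXISTS samples_uuid")
        conn.executemany("INSERT INTO samples (uuid, value) VALUES (?, ?)", rows)
        conn.execute(UNIQUE_INDEX)
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise


def incremental_import(conn, rows):
    """Normal idempotent path: repeated pages do not create duplicate rows."""
    conn.execute("BEGIN")
    conn.executemany(
        "INSERT OR IGNORE INTO samples (uuid, value) VALUES (?, ?)", rows
    )
    conn.execute("COMMIT")


conn = open_archive()
bulk_first_import(conn, [("a", 1.0), ("b", 2.0)])
incremental_import(conn, [("b", 2.0), ("c", 3.0)])  # "b" is ignored
print(conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0])  # 3
```

Whether dropping and rebuilding the index actually pays off is exactly what experiment 3 should establish before this mode is adopted.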
+## Verification Checklist For Each Optimization
+
+- [ ] `git diff --check` passes.
+- [ ] SQLite archive store tests pass.
+- [ ] Import configuration tests pass if capture strategy changed.
+- [ ] Repeated-page/idempotency behavior remains covered.
+- [ ] A real-device report is attached or summarized in this log.
+- [ ] The next experiment is recorded before moving on.
+4 -0
HealthProbe/Doc/README.md
@@ -36,6 +36,7 @@ Use the chapter map below. Send agents to the narrowest document that matches th
 | General agent ownership and handoff rules | [`00-agent-guides/AGENTS.md`](00-agent-guides/AGENTS.md) |
 | SwiftUI/UI work | [`00-agent-guides/CLAUDE.md`](00-agent-guides/CLAUDE.md) |
 | Refactoring milestones and sequencing | [`04-project/Refactoring-Plan.md`](04-project/Refactoring-Plan.md) |
+| Import performance iterations and real-device timing comparisons | [`04-project/Import-Optimization-Log.md`](04-project/Import-Optimization-Log.md) |
 | Project status and refactoring priorities | [`04-project/IMPLEMENTATION_STATUS.md`](04-project/IMPLEMENTATION_STATUS.md) |
 | SwiftData retirement inventory | [`04-project/SwiftData-Retirement-Inventory.md`](04-project/SwiftData-Retirement-Inventory.md) |
 | Historical UI notes only | [`99-archive/`](99-archive/) |
@@ -85,6 +86,9 @@ Use the chapter map below. Send agents to the narrowest document that matches th
 - [`04-project/Refactoring-Plan.md`](04-project/Refactoring-Plan.md)
   Checkable milestone plan for the database-led refactor from prototype architecture to SQLite archive v2 + Core Data cache.
 
+- [`04-project/Import-Optimization-Log.md`](04-project/Import-Optimization-Log.md)
+  Living log for first-import optimization work, measured real-device reports, bottleneck diagnosis, and next experiments.
+
 - [`04-project/IMPLEMENTATION_STATUS.md`](04-project/IMPLEMENTATION_STATUS.md)
   Current implementation status and refactoring priorities.