Showing 6 changed files with 97 additions and 27 deletions
+23 -2
HealthProbe/Doc/01-product/Product-Specification.md
@@ -38,12 +38,30 @@ Because of this, record-by-record cross-device comparison is out of scope and co
38 38
 ### 2.3 Current Objective
39 39
 
40 40
 HealthProbe is now a single-device local Health DB Time Machine:
41
-- capture selected HealthKit-accessible data as it exists at observation time
41
+- capture HealthKit-accessible data as it exists at observation time
42 42
 - reconstruct how the local Health database looked at a chosen date
43 43
 - show additions, removals, representation changes, and aggregate changes between observations
44 44
 - preserve local evidence that HealthKit may later aggregate or no longer export
45 45
 - export scoped historical views for personal backup, support, research, or external analysis
46 46
 
47
+### 2.3.1 Full Authorized Archive Direction
48
+
49
+The 15-type capture profile used during the first archive-v2 refactor is a v1
50
+test/profile constraint, not the v2 product objective.
51
+
52
+For v2, HealthProbe should aim to archive every HealthKit sample type for which
53
+all of the following hold:
54
+- HealthKit exposes the type through public read APIs on the current OS/device;
55
+- the user grants read permission;
56
+- the current archive schema can preserve the type without losing essential
57
+  value, date, source, metadata, or relationship information.
58
+
59
+The app may offer exclusions for privacy, performance, and device constraints,
60
+but the architectural default is "backup all authorized HealthKit-accessible
61
+data", not "backup only a small monitored subset". Unsupported, unauthorized,
62
+excluded, or schema-limited types must be reported explicitly in coverage and
63
+export manifests.
64
+
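A minimal sketch of how the three conditions and the user exclusions above could map to one explicit coverage status per type. Every name here (`CoverageStatus`, `TypeFacts`, `classify`, `coverage_manifest`) is hypothetical and not taken from the HealthProbe code base. The order in which the checks are applied is also an assumption, and how the app learns each boolean, read authorization in particular, is outside the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class CoverageStatus(Enum):
    """Hypothetical per-type outcome reported in coverage and export manifests."""
    ARCHIVED = "archived"
    UNSUPPORTED = "unsupported"
    UNAUTHORIZED = "unauthorized"
    EXCLUDED = "excluded"
    SCHEMA_LIMITED = "schema_limited"


@dataclass(frozen=True)
class TypeFacts:
    """Inputs the app is assumed to know about one HealthKit sample type."""
    identifier: str
    exposed_by_os: bool        # public read API exists on this OS/device
    read_authorized: bool      # user granted read permission (assumed known)
    user_excluded: bool        # user opted this type out of the archive
    schema_can_preserve: bool  # archive schema keeps value/date/source/metadata


def classify(facts: TypeFacts) -> CoverageStatus:
    """Return exactly one status per type so nothing silently disappears."""
    if not facts.exposed_by_os:
        return CoverageStatus.UNSUPPORTED
    if facts.user_excluded:
        return CoverageStatus.EXCLUDED
    if not facts.read_authorized:
        return CoverageStatus.UNAUTHORIZED
    if not facts.schema_can_preserve:
        return CoverageStatus.SCHEMA_LIMITED
    return CoverageStatus.ARCHIVED


def coverage_manifest(all_facts):
    """Build the identifier -> status mapping an export manifest could embed."""
    return {facts.identifier: classify(facts).value for facts in all_facts}
```

With this shape, the architectural default is that a type is archived unless one of the named reasons applies, and each reason stays visible in the manifest.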
47 65
 ### 2.4 Interpretation Model
48 66
 
49 67
 HealthProbe describes changes neutrally:
@@ -309,7 +327,10 @@ The archive must preserve as much HealthKit information as the API exposes:
309 327
 - first-seen / last-seen / last-verified observations
310 328
 - fingerprints suitable for matching against Apple Health XML exports and extracted backup databases
311 329
 
312
-The archive is selected by data type for performance and privacy, but it is stored in **one schema** so later analysis can follow relationships between types.
330
+Individual data types may be excluded for performance and privacy, but the archive is
331
+stored in **one schema** so later analysis can follow relationships between
332
+types. Full authorized backup is the v2 direction; scoped type selection is a
333
+control surface and test profile, not the final archive boundary.
313 334
 
314 335
 ### 6.2 Reports and Point Exports
315 336
 
+23 -1
HealthProbe/Doc/02-architecture/Database-Design.md
@@ -86,7 +86,7 @@ An observation records:
86 86
 - when capture started/ended;
87 87
 - app/schema/OS context;
88 88
 - timezone context at observation time;
89
-- selected type registry;
89
+- requested/authorized/excluded type registry and coverage status;
90 90
 - per-type capture quality;
91 91
 - HealthKit anchors;
92 92
 - events and aggregate changes observed during the capture.
@@ -147,6 +147,28 @@ SQLite stores materialized aggregates because many reports and screens need expe
147 147
 
148 148
 Aggregates are archive-derived evidence, not the source of truth. They must be rebuildable from sample/version/event tables.
149 149
 
150
+### Full Dataset Discovery
151
+
152
+Archive-v2 quality cannot be judged from the original 15-type test profile. The
153
+storage design must be validated against the complete set of HealthKit sample
154
+types exposed by the current OS/device and authorized by the user.
155
+
156
+HealthProbe therefore needs a full dataset discovery pass before declaring the
157
+import/store mechanism complete:
158
+- enumerate all known HealthKit quantity/category/workout types the app can ask
159
+  to read;
160
+- record whether each type is unsupported, unauthorized, excluded, schema-limited,
161
+  empty, or archived;
162
+- collect count, earliest/latest date, fetch timing, and import timing per type;
163
+- identify sample classes or relationships that the current schema cannot yet
164
+  preserve without loss;
165
+- keep the v1 15-type profile available only as a benchmark/debug subset.
166
+
167
+The v2 archive target is "all authorized HealthKit-accessible data that the
168
+schema can faithfully preserve", with user exclusions and coverage reporting.
169
+Unsupported or inaccessible data must be visible in diagnostics and export
170
+manifests instead of silently disappearing from the backup boundary.
171
+
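The per-type discovery record described above could be kept in SQLite roughly as follows. This is a sketch under stated assumptions: the `type_coverage` table, its columns, and the helper functions are hypothetical and not part of the actual archive-v2 schema. An in-memory database is used only so the example is self-contained.

```python
import sqlite3

# Hypothetical table: one row per (observation, type) with status and metrics.
SCHEMA = """
CREATE TABLE type_coverage (
    observation_id  INTEGER NOT NULL,
    type_identifier TEXT    NOT NULL,
    status          TEXT    NOT NULL CHECK (status IN (
        'archived', 'empty', 'unsupported', 'unauthorized',
        'excluded', 'schema_limited')),
    sample_count    INTEGER NOT NULL DEFAULT 0,
    earliest_date   TEXT,
    latest_date     TEXT,
    fetch_seconds   REAL,
    import_seconds  REAL,
    PRIMARY KEY (observation_id, type_identifier)
);
"""


def open_archive():
    """Open an in-memory database and create the sketch table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record_coverage(conn, observation_id, type_identifier, status,
                    sample_count=0, earliest_date=None, latest_date=None,
                    fetch_seconds=None, import_seconds=None):
    """Insert one coverage row; the CHECK constraint rejects unknown statuses."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO type_coverage (observation_id, type_identifier, status,"
            " sample_count, earliest_date, latest_date, fetch_seconds,"
            " import_seconds) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (observation_id, type_identifier, status, sample_count,
             earliest_date, latest_date, fetch_seconds, import_seconds),
        )


def coverage_summary(conn, observation_id):
    """Count types and samples per status for diagnostics or a manifest."""
    rows = conn.execute(
        "SELECT status, COUNT(*), SUM(sample_count) FROM type_coverage"
        " WHERE observation_id = ? GROUP BY status",
        (observation_id,),
    ).fetchall()
    return {status: {"types": n, "samples": total} for status, n, total in rows}
```

Because every requested type gets a row, including types that yielded no samples, a diagnostics screen or export manifest can list what is missing and why.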
150 172
 ## 5. Target SQLite Schema
151 173
 
152 174
 Exact names may evolve, but the shape and constraints should remain.
+17 -6
HealthProbe/Doc/02-architecture/Implementation-Guide.md
@@ -29,7 +29,7 @@ HealthProbe is a single-device local archive and time-machine app for HealthKit-
29 29
 The implementation must prioritize:
30 30
 - point-in-time reconstruction of local HealthKit observations
31 31
 - neutral change explanation between observations
32
-- preservation of selected details before HealthKit aggregation/consolidation makes them unavailable
32
+- preservation of authorized HealthKit-accessible details before HealthKit aggregation/consolidation makes them unavailable
33 33
 - scoped user exports
34 34
 - no HealthProbe CloudKit/iCloud sync
35 35
 - no cross-device record-by-record comparison
@@ -69,11 +69,22 @@ Use:
69 69
 Capture flow:
70 70
 1. Resolve the current local device chain ID.
71 71
 2. Start one archive observation record for the user-visible capture and keep its id.
72
-3. For each selected sample type, run anchored queries.
73
-4. Write HealthKit samples, deleted-object evidence, and final per-type verification to the local archive first, all under that same observation id.
74
-5. Update materialized aggregate tables in SQLite.
75
-6. Save/rebuild derived Core Data cache rows only after archive writes succeed.
76
-7. Compute summary/diff caches for UI and reports.
72
+3. Resolve the capture profile. The v1 profile uses the original tested core
73
+   types; the v2/full-backup direction uses every HealthKit sample type that is
74
+   supported, authorized, not user-excluded, and representable by the archive
75
+   schema.
76
+4. For each requested sample type, run anchored queries or mark an explicit
77
+   coverage status when unsupported, unauthorized, excluded, empty, timed out, or
78
+   schema-limited.
79
+5. Write HealthKit samples, deleted-object evidence, and final per-type verification to the local archive first, all under that same observation id.
80
+6. Update materialized aggregate tables in SQLite.
81
+7. Save/rebuild derived Core Data cache rows only after archive writes succeed.
82
+8. Compute summary/diff caches for UI and reports.
83
+
84
+The import/store mechanism is not considered complete until it has been tested
85
+against the full HealthKit-accessible dataset on real devices. The original
86
+15-type profile is useful for iteration speed, but it is not representative
87
+enough to validate archive completeness or worst-case performance.
77 88
 
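The per-type part of the capture flow above (steps 3 to 5) can be sketched as a loop that either archives samples under the single observation id or records an explicit coverage status. The function and parameter names are hypothetical. `fetch_samples` stands in for the anchored HealthKit query, and the returned list stands in for the archive write, so this illustrates the control flow only.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_capture(
    observation_id: int,
    requested_types: Iterable[str],
    skip_reasons: Dict[str, str],
    fetch_samples: Callable[[str], List[dict]],
) -> Tuple[List[tuple], Dict[str, str]]:
    """Return (rows to archive, coverage status per requested type).

    skip_reasons maps a type to 'unsupported', 'unauthorized', 'excluded',
    or 'schema_limited' when it must not be queried at all.
    """
    archived_rows: List[tuple] = []
    coverage: Dict[str, str] = {}

    for type_id in requested_types:
        # Types known to be unavailable are reported, never silently dropped.
        if type_id in skip_reasons:
            coverage[type_id] = skip_reasons[type_id]
            continue

        try:
            samples = fetch_samples(type_id)
        except TimeoutError:
            # A page-level timeout becomes an explicit status for this type.
            coverage[type_id] = "timed_out"
            continue

        if not samples:
            coverage[type_id] = "empty"
            continue

        # Every row is attached to the same observation id.
        archived_rows.extend((observation_id, type_id, s) for s in samples)
        coverage[type_id] = "archived"

    return archived_rows, coverage
```

The invariant worth noting is that `coverage` ends up with one entry per requested type, whatever happened during the query.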
78 89
 Anchors belong to the local device timeline and selected type registry. They are implementation state, not forensic truth.
79 90
 
+13 -6
HealthProbe/Doc/04-project/IMPLEMENTATION_STATUS.md
@@ -13,6 +13,8 @@ The product direction has changed. The target architecture is now:
13 13
 - Core Data UI/report cache;
14 14
 - Time Machine UI and scoped exports;
15 15
 - recovery-compatible archive/export format;
16
+- v2 full authorized HealthKit backup direction, with explicit user exclusions
17
+  and coverage reporting;
16 18
 - no in-app restore, backup patching, or HealthKit re-publication.
17 19
 
18 20
 Current SwiftData models and anomaly-oriented naming are legacy/prototype implementation details.
@@ -24,7 +26,7 @@ There are no real deployments, only test installations. Existing prototype datab
24 26
 | Area | Current Status | Target / Next Work |
25 27
 |------|----------------|--------------------|
26 28
 | Product docs | Updated | Keep `HealthProbe/Doc/README.md` as canonical index |
27
-| HealthKit capture | Capture now opens one archive observation per user-visible snapshot, attaches HealthKit pages, deleted-object evidence, and type verification to that observation id before finishing it, no longer aborts initial full-history imports after a fixed 30-minute wall-clock cap while page-level HealthKit timeouts remain in place, defers grouped observation summary/daily aggregate rebuilds until per-type verification instead of rebuilding after every imported page, and persists large HealthKit pages in smaller archive chunks while using type-specific import strategies: conservative paging for the heaviest metrics, more aggressive pages/chunks for ordinary metrics, adaptive write chunk sizing, batched deleted-object persistence, explicit task yields, and lower-allocation streaming loops to avoid long monolithic SQLite stalls | Continue moving UI/cache reads to archive-backed observation ids and revisit full checkpoint/resume and background collection separately |
29
+| HealthKit capture | Capture now opens one archive observation per user-visible snapshot, attaches HealthKit pages, deleted-object evidence, and type verification to that observation id before finishing it, no longer aborts initial full-history imports after a fixed 30-minute wall-clock cap while page-level HealthKit timeouts remain in place, defers grouped observation summary/daily aggregate rebuilds until per-type verification instead of rebuilding after every imported page, persists large HealthKit pages in smaller archive chunks while using type-specific import strategies, and has an expanded HealthKit type registry for full-dataset discovery while keeping the original 15-type profile as the tested default | Run full dataset discovery/coverage on real devices before declaring import/storage complete; then revisit full checkpoint/resume and background collection |
28 30
 | SQLite archive | Archive v2 schema, snapshot-level observation grouping, differential write path, v2 verification/delete bookkeeping, daily aggregate rebuilds, integrity report, v2 record reads, SQL diff/count/aggregate/provenance/consolidation-evidence APIs, large synthetic diff pagination coverage, formal timing/memory metrics, and XCTest coverage are in place; the legacy `archive_samples` mirror has been removed, the hot write path now reuses prepared SQLite statements within grouped page writes instead of reparsing the same SQL for every sample, caches repeated sample-type/source/source-revision/device/metadata id lookups within grouped writes, skips redundant visibility close/existence checks when grouped imports create a brand-new sample or payload version, skips follow-up id lookup queries when SQLite confirms new sample/sample-version inserts, reuses verification aggregates instead of rescanning them twice, drives per-type finalize queries from sample-type-filtered sample ids, processes sample rows in a lower-allocation streaming loop, batches same-page deleted-object evidence in one transaction, adds composite indexes for visibility-range and sample-uuid hot lookups, and opens SQLite connections with import-friendly busy timeout / synchronous / temp-store pragmas | Continue moving capture/Dashboard actions to archive/cache DTOs |
29 31
 | Core Data cache | Initial programmatic Core Data model, full-cache rebuild service, read DTOs for observation/type/diff/health rows, and Dashboard archive-cache status wiring are in place | Move remaining export/report paths to cache DTOs and add targeted partial invalidation |
30 32
 | SwiftData cache | Exists; test builds now reset legacy prototype UI/archive/cache stores once for archive v2 so old SwiftData-only snapshots are not treated as backed-up observations. Metric timeout calibration, local device profile settings, operation logging, ContentView preview, Settings data maintenance, legacy detail/PDF views, unused legacy repair/observer services, Dashboard view/view-model access, and legacy anomaly/count-drop review have moved outside SwiftData or been removed. Remaining SwiftData imports are inventoried in [`SwiftData-Retirement-Inventory.md`](SwiftData-Retirement-Inventory.md) | Treat as disposable prototype data; stop returning/storing `HealthSnapshot` bridge handles before removing `ModelContainer` |
@@ -39,11 +41,13 @@ There are no real deployments, only test installations. Existing prototype datab
39 41
 Detailed checkable milestones live in [`Refactoring-Plan.md`](Refactoring-Plan.md).
40 42
 Import performance iterations and measured reports live in [`Import-Optimization-Log.md`](Import-Optimization-Log.md).
41 43
 
42
-1. Stop writing prototype `HealthSnapshot` bridge rows during capture/review.
43
-2. Add targeted cache invalidation for affected observation/type ranges.
44
-3. Finish remaining UI language cleanup from anomaly/status to observation/diff/export where legacy model names still leak into active flows.
45
-4. Complete recovery-compatible export metadata, CSV output, and reproducibility checks.
46
-5. Remove SwiftData dependency and validate lower deployment targets.
44
+1. Run full dataset discovery/coverage with the expanded HealthKit registry and
45
+   document unsupported/unauthorized/schema-limited types.
46
+2. Stop writing prototype `HealthSnapshot` bridge rows during capture/review.
47
+3. Add targeted cache invalidation for affected observation/type ranges.
48
+4. Finish remaining UI language cleanup from anomaly/status to observation/diff/export where legacy model names still leak into active flows.
49
+5. Complete recovery-compatible export metadata, CSV output, and reproducibility checks.
50
+6. Remove SwiftData dependency and validate lower deployment targets.
47 51
 
48 52
 ## Known Prototype Mismatches
49 53
 
@@ -54,6 +58,9 @@ Import performance iterations and measured reports live in [`Import-Optimization
54 58
 - Legacy SwiftData-only snapshots are reset for archive v2 test installs rather than migrated.
55 59
 - Capture strategy and some legacy SwiftData transition paths may still decode or cache too much data for low-end devices.
56 60
 - Very large first-run HealthKit imports may still require adaptive paging, retryable partial progress, and background-friendly collection beyond the current smaller pages, chunked persistence, and prepared-statement reuse. Diagnostic import reports now also expose explicit per-metric and aggregate fetch / processing / insert / finalize timings so large import runs can be compared without inferring phases from progress counters.
61
+- The currently validated performance data comes from the original 15-type v1
62
+  profile. It is not enough to prove that archive-v2 import/storage works for
63
+  the full HealthKit-accessible dataset.
57 64
 - Old prototype database compatibility is no longer required.
58 65
 - Initial SQLite archive tests cover open/init/reset/idempotency, snapshot-level observation grouping, legacy mirror removal, small observation diffs, large synthetic diff pagination, formal timing/memory metrics, materialized aggregate comparison, source/provenance breakdowns, consolidation-evidence labels, export preview, paged JSON output, and manifest row persistence.
59 66
 - Initial Core Data cache tests cover full rebuild from SQLite and delete-cache-then-rebuild without losing archive data.
+19 -11
HealthProbe/Doc/04-project/Import-Optimization-Log.md
@@ -573,6 +573,7 @@ rows exist".
573 573
 | 2026-06-03 | `2a82f67` | Load snapshot/type detail UI from SQLite materialized summaries instead of Core Data cache. | Triggered by successful snapshots whose detail screens showed no data types after automatic cache rebuild was disabled. Expected signal: Snapshot detail, Data Types, per-type drilldown, and evolution chart show current archive details without rebuilding Core Data cache. |
574 574
 | 2026-06-03 | `ec7ee29` | Add explicit loading states for Dashboard, Snapshots, and Data Types archive rows. | Triggered by false "no observations/no snapshots/not enough data" states during the first few seconds after app launch. Expected signal: startup shows loading state until SQLite rows are available, then shows real archive data without flicker. |
575 575
 | 2026-06-03 | `e231eaf` | Use the HealthKit registry as SQLite sample type display-name fallback. | Triggered by Snapshot detail showing raw identifiers such as `HKCategoryTypeIdentifierAppleStandHour` after UI moved from Core Data cache to SQLite summaries. Expected signal: existing and new archive rows show human-readable names such as `Stand Hours` without requiring reset/reimport. |
576
+| 2026-06-03 | `5fafcdd` | Expand the HealthKit type registry for full-dataset discovery while keeping the original 15-type profile as the tested default. | Triggered by the decision that import/storage cannot be considered complete based only on the restricted v1 dataset. Expected signal: Settings/authorization can expose a much broader quantity/category/workout catalog, unsupported types are explicit, and real-device coverage reports can measure full authorized backup volume. |
576 577
 
577 578
 ## Current Diagnosis
578 579
 
@@ -604,6 +605,9 @@ The likely bottleneck is per-row SQLite work:
604 605
   archive finalization and should be the primary UI source for fresh snapshots.
605 606
 - UI state should distinguish loading from empty results. A nil or empty in-memory
606 607
   row list during app launch is not evidence that the archive is empty.
608
+- The validated import metrics are based on the original 15-type profile. The
609
+  next correctness/performance question is full-dataset coverage and volume, not
610
+  further confidence from the restricted sample alone.
607 611
 
608 612
 ## Open Issues / Observations
609 613
 
@@ -624,24 +628,28 @@ The likely bottleneck is per-row SQLite work:
624 628
 
625 629
 Prioritize experiments in this order:
626 630
 
627
-1. Run an incremental snapshot after removing automatic Core Data cache rebuild.
631
+1. Run full-dataset discovery with the expanded registry:
632
+   request/refresh permissions, inspect supported vs unsupported types, and run
633
+   a capture with all desired supported types enabled on a real device. Record
634
+   type count, total records, failed/unsupported/empty types, and phase timings.
635
+2. Run an incremental snapshot after removing automatic Core Data cache rebuild.
628 636
    Confirm there are no `healthKit.detailCache.buildBegin` logs, copying the
629 637
    diagnostic report does not freeze the app, and Dashboard/Snapshots show the
630 638
    latest observation from SQLite. Also verify Snapshot detail and Data Types
631 639
    show per-type summaries without a manual cache rebuild.
632
-2. Run a repeated no-delta benchmark after copying unchanged metric summaries and daily aggregates. Compare `SummedFinalizeElapsed`, `Heart Rate finalizeElapsed`, `Active Energy finalizeElapsed`, and wall clock.
633
-3. Add or inspect timing around per-record processing for changed high-volume metrics, especially Heart Rate, to separate sample DTO/fingerprint work from SQLite idempotency checks.
634
-4. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
635
-5. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
636
-6. Profile whether index maintenance dominates first-import insert cost.
637
-7. Consider a guarded bulk-import mode for first observations:
640
+3. Run a repeated no-delta benchmark after copying unchanged metric summaries and daily aggregates. Compare `SummedFinalizeElapsed`, `Heart Rate finalizeElapsed`, `Active Energy finalizeElapsed`, and wall clock.
641
+4. Add or inspect timing around per-record processing for changed high-volume metrics, especially Heart Rate, to separate sample DTO/fingerprint work from SQLite idempotency checks.
642
+5. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
643
+6. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
644
+7. Profile whether index maintenance dominates first-import insert cost.
645
+8. Consider a guarded bulk-import mode for first observations:
638 646
    - keep archive semantics unchanged;
639 647
    - only relax work that can be safely reconstructed or validated;
640 648
    - re-enable normal idempotent paths for incremental observations.
641
-8. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
642
-9. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
643
-10. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
644
-11. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
649
+9. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
650
+10. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
651
+11. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
652
+12. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
645 653
 
646 654
 ## Verification Checklist For Each Optimization
647 655
 
+2 -1
HealthProbe/Doc/README.md
@@ -7,7 +7,7 @@ This directory is the only place for substantive HealthProbe documentation. Root
7 7
 ## Current Product Direction
8 8
 
9 9
 HealthProbe is a single-device, local Health DB Time Machine:
10
-- capture selected HealthKit-accessible observations over time;
10
+- capture HealthKit-accessible observations over time, aiming for full authorized backup in v2;
11 11
 - reconstruct how the local Health database looked at a chosen observation date;
12 12
 - explain local changes with consolidation-aware labels;
13 13
 - preserve recovery-compatible archives and exports;
@@ -19,6 +19,7 @@ Target storage architecture:
19 19
 - Core Data is the rebuildable UI/report cache for expensive counts and summaries;
20 20
 - SwiftData is legacy/prototype only and should not be expanded;
21 21
 - existing prototype/test databases are disposable and may be reset for archive v2.
22
+- the original 15-type capture profile is a v1/test subset, not proof that v2 import/storage is complete.
22 23
 
23 24
 ## How To Point Agents
24 25