Document full dataset discovery direction · 12fba2d

bogdan committed 4 days ago

main

1 parent 5fafcdd

commit 12fba2d

Showing 6 changed files with 97 additions and 27 deletions

+23 -2

HealthProbe/Doc/01-product/Product-Specification.md

@@ -38,12 +38,30 @@ Because of this, record-by-record cross-device comparison is out of scope and co
 ### 2.3 Current Objective
 
 HealthProbe is now a single-device local Health DB Time Machine:
-- capture selected HealthKit-accessible data as it exists at observation time
+- capture HealthKit-accessible data as it exists at observation time
 - reconstruct how the local Health database looked at a chosen date
 - show additions, removals, representation changes, and aggregate changes between observations
 - preserve local evidence that HealthKit may later aggregate or no longer export
 - export scoped historical views for personal backup, support, research, or external analysis
 
+### 2.3.1 Full Authorized Archive Direction
+
+The 15-type capture profile used during the first archive-v2 refactor is a v1
+test/profile constraint, not the v2 product objective.
+
+For v2, HealthProbe should aim to archive every HealthKit sample type for which
+all of the following hold:
+- HealthKit exposes the type through public read APIs on the current OS/device;
+- the user grants read permission;
+- the current archive schema can preserve the type without losing essential
+  value, date, source, metadata, or relationship information.
+
+The app may offer exclusions for privacy, performance, and device constraints,
+but the architectural default is "back up all authorized HealthKit-accessible
+data", not "back up only a small monitored subset". Unsupported, unauthorized,
+excluded, or schema-limited types must be reported explicitly in coverage
+reports and export manifests.
+
 ### 2.4 Interpretation Model
 
 HealthProbe describes changes neutrally:
@@ -309,7 +327,10 @@ The archive must preserve as much HealthKit information as the API exposes:
 - first-seen / last-seen / last-verified observations
 - fingerprints suitable for matching against Apple Health XML exports and extracted backup databases
 
-The archive is selected by data type for performance and privacy, but it is stored in **one schema** so later analysis can follow relationships between types.
+Data types may be excluded from the archive for performance and privacy, but
+all archived data is stored in **one schema** so later analysis can follow
+relationships between types. Full authorized backup is the v2 direction; scoped
+type selection is a control surface and test profile, not the final archive boundary.
 
 ### 6.2 Reports and Point Exports
 


+23 -1

HealthProbe/Doc/02-architecture/Database-Design.md

@@ -86,7 +86,7 @@ An observation records:
 - when capture started/ended;
 - app/schema/OS context;
 - timezone context at observation time;
-- selected type registry;
+- requested/authorized/excluded type registry and coverage status;
 - per-type capture quality;
 - HealthKit anchors;
 - events and aggregate changes observed during the capture.
@@ -147,6 +147,28 @@ SQLite stores materialized aggregates because many reports and screens need expe
 
 Aggregates are archive-derived evidence, not the source of truth. They must be rebuildable from sample/version/event tables.
 
+### Full Dataset Discovery
+
+Archive-v2 quality cannot be judged from the original 15-type test profile. The
+storage design must be validated against the complete set of HealthKit sample
+types exposed by the current OS/device and authorized by the user.
+
+HealthProbe therefore needs a full dataset discovery pass before declaring the
+import/store mechanism complete:
+- enumerate all known HealthKit quantity/category/workout types the app can ask
+  to read;
+- record whether each type is unsupported, unauthorized, excluded,
+  schema-limited, empty, or archived;
+- collect count, earliest/latest date, fetch timing, and import timing per type;
+- identify sample classes or relationships that the current schema cannot yet
+  preserve without loss;
+- keep the v1 15-type profile available only as a benchmark/debug subset.
+
+The v2 archive target is "all authorized HealthKit-accessible data that the
+schema can faithfully preserve", with user exclusions and coverage reporting.
+Unsupported or inaccessible data must be visible in diagnostics and export
+manifests instead of silently disappearing from the backup boundary.
+
 ## 5. Target SQLite Schema
 
 Exact names may evolve, but the shape and constraints should remain.
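
The per-type record described under Full Dataset Discovery in this hunk could be sketched as follows. This is a minimal illustration, not the app's actual model: `CoverageStatus`, `TypeCoverage`, and `manifestSummary` are hypothetical names, the status cases are taken from the wording of the added section, and the sample values are made up.

```swift
import Foundation

// Hypothetical status values, named after the statuses listed in the doc text.
enum CoverageStatus: String, CaseIterable {
    case unsupported, unauthorized, excluded, schemaLimited, empty, archived
}

// Hypothetical per-type discovery record: count, date range, and timings.
struct TypeCoverage {
    let typeIdentifier: String
    let status: CoverageStatus
    var sampleCount: Int = 0
    var earliest: Date? = nil
    var latest: Date? = nil
    var fetchSeconds: Double = 0
    var importSeconds: Double = 0
}

// Groups type identifiers by status. Every status gets an entry, even when
// empty, so a manifest built from this never omits a category silently.
func manifestSummary(_ records: [TypeCoverage]) -> [CoverageStatus: [String]] {
    var summary: [CoverageStatus: [String]] = [:]
    for status in CoverageStatus.allCases {
        summary[status] = []
    }
    for record in records {
        summary[record.status, default: []].append(record.typeIdentifier)
    }
    return summary
}

// Made-up example values, for illustration only.
let records = [
    TypeCoverage(typeIdentifier: "HKQuantityTypeIdentifierHeartRate",
                 status: .archived, sampleCount: 3,
                 fetchSeconds: 0.1, importSeconds: 0.2),
    TypeCoverage(typeIdentifier: "HKCategoryTypeIdentifierAppleStandHour",
                 status: .empty),
    TypeCoverage(typeIdentifier: "HKQuantityTypeIdentifierStepCount",
                 status: .unauthorized),
]
let summary = manifestSummary(records)
```

Keeping one record per requested type, including the types that produced no rows, is what lets diagnostics and export manifests show unsupported or inaccessible data instead of dropping it.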


+17 -6

HealthProbe/Doc/02-architecture/Implementation-Guide.md

@@ -29,7 +29,7 @@ HealthProbe is a single-device local archive and time-machine app for HealthKit-
 The implementation must prioritize:
 - point-in-time reconstruction of local HealthKit observations
 - neutral change explanation between observations
-- preservation of selected details before HealthKit aggregation/consolidation makes them unavailable
+- preservation of authorized HealthKit-accessible details before HealthKit aggregation/consolidation makes them unavailable
 - scoped user exports
 - no HealthProbe CloudKit/iCloud sync
 - no cross-device record-by-record comparison
@@ -69,11 +69,22 @@ Use:
 Capture flow:
 1. Resolve the current local device chain ID.
 2. Start one archive observation record for the user-visible capture and keep its id.
-3. For each selected sample type, run anchored queries.
-4. Write HealthKit samples, deleted-object evidence, and final per-type verification to the local archive first, all under that same observation id.
-5. Update materialized aggregate tables in SQLite.
-6. Save/rebuild derived Core Data cache rows only after archive writes succeed.
-7. Compute summary/diff caches for UI and reports.
+3. Resolve the capture profile. The v1 profile uses the original tested core
+   types; the v2/full-backup direction uses every HealthKit sample type that is
+   supported, authorized, not user-excluded, and representable by the archive
+   schema.
+4. For each requested sample type, run anchored queries or mark an explicit
+   coverage status when unsupported, unauthorized, excluded, empty, timed out, or
+   schema-limited.
+5. Write HealthKit samples, deleted-object evidence, and final per-type verification to the local archive first, all under that same observation id.
+6. Update materialized aggregate tables in SQLite.
+7. Save/rebuild derived Core Data cache rows only after archive writes succeed.
+8. Compute summary/diff caches for UI and reports.
+
+The import/store mechanism is not considered complete until it has been tested
+against the full HealthKit-accessible dataset on real devices. The original
+15-type profile is useful for iteration speed, but it is not representative
+enough to validate archive completeness or worst-case performance.
 
 Anchors belong to the local device timeline and selected type registry. They are implementation state, not forensic truth.
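
Steps 3 and 4 of the revised capture flow can be sketched as a pure decision function. This is an assumption-laden illustration: `CandidateType`, `CaptureDecision`, `decide`, and `resolveProfile` are hypothetical names, the four Boolean flags are assumed to be supplied by the registry, and the order in which the checks are applied is a choice made here, not something the guide specifies. Note also that HealthKit does not tell an app whether read access was granted, so how `readAuthorized` is derived is left open.

```swift
// Hypothetical registry entry; all four flags are assumed inputs.
struct CandidateType {
    let identifier: String
    let supportedOnDevice: Bool
    let readAuthorized: Bool
    let userExcluded: Bool
    let schemaRepresentable: Bool
}

// Either capture the type or skip it with an explicit coverage status.
enum CaptureDecision: Equatable {
    case capture
    case skip(reason: String)
}

// Check order is an assumption: the first failing condition wins.
func decide(_ candidate: CandidateType) -> CaptureDecision {
    if !candidate.supportedOnDevice { return .skip(reason: "unsupported") }
    if !candidate.readAuthorized { return .skip(reason: "unauthorized") }
    if candidate.userExcluded { return .skip(reason: "excluded") }
    if !candidate.schemaRepresentable { return .skip(reason: "schema-limited") }
    return .capture
}

// Splits the registry into types to query and types with a recorded skip reason.
func resolveProfile(_ registry: [CandidateType])
    -> (capture: [String], skipped: [String: String]) {
    var capture: [String] = []
    var skipped: [String: String] = [:]
    for candidate in registry {
        switch decide(candidate) {
        case .capture:
            capture.append(candidate.identifier)
        case .skip(let reason):
            skipped[candidate.identifier] = reason
        }
    }
    return (capture, skipped)
}

let registry = [
    CandidateType(identifier: "HKQuantityTypeIdentifierHeartRate",
                  supportedOnDevice: true, readAuthorized: true,
                  userExcluded: false, schemaRepresentable: true),
    CandidateType(identifier: "HKQuantityTypeIdentifierStepCount",
                  supportedOnDevice: true, readAuthorized: true,
                  userExcluded: true, schemaRepresentable: true),
]
let profile = resolveProfile(registry)
```

Statuses that are only known after a query runs, such as empty or timed out, would be recorded later in the flow and are not modeled here.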
 


+13 -6

HealthProbe/Doc/04-project/IMPLEMENTATION_STATUS.md

@@ -13,6 +13,8 @@ The product direction has changed. The target architecture is now:
 - Core Data UI/report cache;
 - Time Machine UI and scoped exports;
 - recovery-compatible archive/export format;
+- v2 full authorized HealthKit backup direction, with explicit user exclusions
+  and coverage reporting;
 - no in-app restore, backup patching, or HealthKit re-publication.
 
 Current SwiftData models and anomaly-oriented naming are legacy/prototype implementation details.
@@ -24,7 +26,7 @@ There are no real deployments, only test installations. Existing prototype datab
 | Area | Current Status | Target / Next Work |
 |------|----------------|--------------------|
 | Product docs | Updated | Keep `HealthProbe/Doc/README.md` as canonical index |
-| HealthKit capture | Capture now opens one archive observation per user-visible snapshot, attaches HealthKit pages, deleted-object evidence, and type verification to that observation id before finishing it, no longer aborts initial full-history imports after a fixed 30-minute wall-clock cap while page-level HealthKit timeouts remain in place, defers grouped observation summary/daily aggregate rebuilds until per-type verification instead of rebuilding after every imported page, and persists large HealthKit pages in smaller archive chunks while using type-specific import strategies: conservative paging for the heaviest metrics, more aggressive pages/chunks for ordinary metrics, adaptive write chunk sizing, batched deleted-object persistence, explicit task yields, and lower-allocation streaming loops to avoid long monolithic SQLite stalls | Continue moving UI/cache reads to archive-backed observation ids and revisit full checkpoint/resume and background collection separately |
+| HealthKit capture | Capture now opens one archive observation per user-visible snapshot, attaches HealthKit pages, deleted-object evidence, and type verification to that observation id before finishing it, no longer aborts initial full-history imports after a fixed 30-minute wall-clock cap while page-level HealthKit timeouts remain in place, defers grouped observation summary/daily aggregate rebuilds until per-type verification instead of rebuilding after every imported page, persists large HealthKit pages in smaller archive chunks while using type-specific import strategies, and has an expanded HealthKit type registry for full-dataset discovery while keeping the original 15-type profile as the tested default | Run full dataset discovery/coverage on real devices before declaring import/storage complete; then revisit full checkpoint/resume and background collection |
 | SQLite archive | Archive v2 schema, snapshot-level observation grouping, differential write path, v2 verification/delete bookkeeping, daily aggregate rebuilds, integrity report, v2 record reads, SQL diff/count/aggregate/provenance/consolidation-evidence APIs, large synthetic diff pagination coverage, formal timing/memory metrics, and XCTest coverage are in place; the legacy `archive_samples` mirror has been removed, the hot write path now reuses prepared SQLite statements within grouped page writes instead of reparsing the same SQL for every sample, caches repeated sample-type/source/source-revision/device/metadata id lookups within grouped writes, skips redundant visibility close/existence checks when grouped imports create a brand-new sample or payload version, skips follow-up id lookup queries when SQLite confirms new sample/sample-version inserts, reuses verification aggregates instead of rescanning them twice, drives per-type finalize queries from sample-type-filtered sample ids, processes sample rows in a lower-allocation streaming loop, batches same-page deleted-object evidence in one transaction, adds composite indexes for visibility-range and sample-uuid hot lookups, and opens SQLite connections with import-friendly busy timeout / synchronous / temp-store pragmas | Continue moving capture/Dashboard actions to archive/cache DTOs |
 | Core Data cache | Initial programmatic Core Data model, full-cache rebuild service, read DTOs for observation/type/diff/health rows, and Dashboard archive-cache status wiring are in place | Move remaining export/report paths to cache DTOs and add targeted partial invalidation |
 | SwiftData cache | Exists; test builds now reset legacy prototype UI/archive/cache stores once for archive v2 so old SwiftData-only snapshots are not treated as backed-up observations. Metric timeout calibration, local device profile settings, operation logging, ContentView preview, Settings data maintenance, legacy detail/PDF views, unused legacy repair/observer services, Dashboard view/view-model access, and legacy anomaly/count-drop review have moved outside SwiftData or been removed. Remaining SwiftData imports are inventoried in [`SwiftData-Retirement-Inventory.md`](SwiftData-Retirement-Inventory.md) | Treat as disposable prototype data; stop returning/storing `HealthSnapshot` bridge handles before removing `ModelContainer` |
@@ -39,11 +41,13 @@ There are no real deployments, only test installations. Existing prototype datab
 Detailed checkable milestones live in [`Refactoring-Plan.md`](Refactoring-Plan.md).
 Import performance iterations and measured reports live in [`Import-Optimization-Log.md`](Import-Optimization-Log.md).
 
-1. Stop writing prototype `HealthSnapshot` bridge rows during capture/review.
-2. Add targeted cache invalidation for affected observation/type ranges.
-3. Finish remaining UI language cleanup from anomaly/status to observation/diff/export where legacy model names still leak into active flows.
-4. Complete recovery-compatible export metadata, CSV output, and reproducibility checks.
-5. Remove SwiftData dependency and validate lower deployment targets.
+1. Run full dataset discovery/coverage with the expanded HealthKit registry and
+   document unsupported/unauthorized/schema-limited types.
+2. Stop writing prototype `HealthSnapshot` bridge rows during capture/review.
+3. Add targeted cache invalidation for affected observation/type ranges.
+4. Finish remaining UI language cleanup from anomaly/status to observation/diff/export where legacy model names still leak into active flows.
+5. Complete recovery-compatible export metadata, CSV output, and reproducibility checks.
+6. Remove SwiftData dependency and validate lower deployment targets.
 
 ## Known Prototype Mismatches
 
@@ -54,6 +58,9 @@ Import performance iterations and measured reports live in [`Import-Optimization
 - Legacy SwiftData-only snapshots are reset for archive v2 test installs rather than migrated.
 - Capture strategy and some legacy SwiftData transition paths may still decode or cache too much data for low-end devices.
 - Very large first-run HealthKit imports may still require adaptive paging, retryable partial progress, and background-friendly collection beyond the current smaller pages, chunked persistence, and prepared-statement reuse. Diagnostic import reports now also expose explicit per-metric and aggregate fetch / processing / insert / finalize timings so large import runs can be compared without inferring phases from progress counters.
+- The currently validated performance data comes from the original 15-type v1
+  profile. It is not enough to prove that archive-v2 import/storage works for
+  the full HealthKit-accessible dataset.
 - Old prototype database compatibility is no longer required.
 - Initial SQLite archive tests cover open/init/reset/idempotency, snapshot-level observation grouping, legacy mirror removal, small observation diffs, large synthetic diff pagination, formal timing/memory metrics, materialized aggregate comparison, source/provenance breakdowns, consolidation-evidence labels, export preview, paged JSON output, and manifest row persistence.
 - Initial Core Data cache tests cover full rebuild from SQLite and delete-cache-then-rebuild without losing archive data.


+19 -11

HealthProbe/Doc/04-project/Import-Optimization-Log.md

@@ -573,6 +573,7 @@ rows exist".
 | 2026-06-03 | `2a82f67` | Load snapshot/type detail UI from SQLite materialized summaries instead of Core Data cache. | Triggered by successful snapshots whose detail screens showed no data types after automatic cache rebuild was disabled. Expected signal: Snapshot detail, Data Types, per-type drilldown, and evolution chart show current archive details without rebuilding Core Data cache. |
 | 2026-06-03 | `ec7ee29` | Add explicit loading states for Dashboard, Snapshots, and Data Types archive rows. | Triggered by false "no observations/no snapshots/not enough data" states during the first few seconds after app launch. Expected signal: startup shows loading state until SQLite rows are available, then shows real archive data without flicker. |
 | 2026-06-03 | `e231eaf` | Use the HealthKit registry as SQLite sample type display-name fallback. | Triggered by Snapshot detail showing raw identifiers such as `HKCategoryTypeIdentifierAppleStandHour` after UI moved from Core Data cache to SQLite summaries. Expected signal: existing and new archive rows show human-readable names such as `Stand Hours` without requiring reset/reimport. |
+| 2026-06-03 | `5fafcdd` | Expand the HealthKit type registry for full-dataset discovery while keeping the original 15-type profile as the tested default. | Triggered by the decision that import/storage cannot be considered complete based only on the restricted v1 dataset. Expected signal: Settings/authorization can expose a much broader quantity/category/workout catalog, unsupported types are explicit, and real-device coverage reports can measure full authorized backup volume. |
 
 ## Current Diagnosis
 
@@ -604,6 +605,9 @@ The likely bottleneck is per-row SQLite work:
   archive finalization and should be the primary UI source for fresh snapshots.
 - UI state should distinguish loading from empty results. A nil or empty in-memory
   row list during app launch is not evidence that the archive is empty.
+- The validated import metrics are based on the original 15-type profile. The
+  next correctness/performance question is full-dataset coverage and volume,
+  which further runs on the restricted profile alone cannot answer.
 
 ## Open Issues / Observations
 
@@ -624,24 +628,28 @@ The likely bottleneck is per-row SQLite work:
 
 Prioritize experiments in this order:
 
-1. Run an incremental snapshot after removing automatic Core Data cache rebuild.
+1. Run full-dataset discovery with the expanded registry:
+   request/refresh permissions, inspect supported vs unsupported types, and run
+   a capture with all desired supported types enabled on a real device. Record
+   type count, total records, failed/unsupported/empty types, and phase timings.
+2. Run an incremental snapshot after removing automatic Core Data cache rebuild.
    Confirm there are no `healthKit.detailCache.buildBegin` logs, copying the
    diagnostic report does not freeze the app, and Dashboard/Snapshots show the
    latest observation from SQLite. Also verify Snapshot detail and Data Types
    show per-type summaries without a manual cache rebuild.
-2. Run a repeated no-delta benchmark after copying unchanged metric summaries and daily aggregates. Compare `SummedFinalizeElapsed`, `Heart Rate finalizeElapsed`, `Active Energy finalizeElapsed`, and wall clock.
-3. Add or inspect timing around per-record processing for changed high-volume metrics, especially Heart Rate, to separate sample DTO/fingerprint work from SQLite idempotency checks.
-4. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
-5. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
-6. Profile whether index maintenance dominates first-import insert cost.
-7. Consider a guarded bulk-import mode for first observations:
+3. Run a repeated no-delta benchmark after copying unchanged metric summaries and daily aggregates. Compare `SummedFinalizeElapsed`, `Heart Rate finalizeElapsed`, `Active Energy finalizeElapsed`, and wall clock.
+4. Add or inspect timing around per-record processing for changed high-volume metrics, especially Heart Rate, to separate sample DTO/fingerprint work from SQLite idempotency checks.
+5. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
+6. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
+7. Profile whether index maintenance dominates first-import insert cost.
+8. Consider a guarded bulk-import mode for first observations:
    - keep archive semantics unchanged;
    - only relax work that can be safely reconstructed or validated;
    - re-enable normal idempotent paths for incremental observations.
-8. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
-9. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
-10. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
-11. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
+9. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
+10. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
+11. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
+12. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
 
 ## Verification Checklist For Each Optimization
 


+2 -1

HealthProbe/Doc/README.md

@@ -7,7 +7,7 @@ This directory is the only place for substantive HealthProbe documentation. Root
 ## Current Product Direction
 
 HealthProbe is a single-device, local Health DB Time Machine:
-- capture selected HealthKit-accessible observations over time;
+- capture HealthKit-accessible observations over time, aiming for full authorized backup in v2;
 - reconstruct how the local Health database looked at a chosen observation date;
 - explain local changes with consolidation-aware labels;
 - preserve recovery-compatible archives and exports;
@@ -19,6 +19,7 @@ Target storage architecture:
 - Core Data is the rebuildable UI/report cache for expensive counts and summaries;
 - SwiftData is legacy/prototype only and should not be expanded;
 - existing prototype/test databases are disposable and may be reset for archive v2.
+- the original 15-type capture profile is a v1/test subset, not proof that v2 import/storage is complete.
 
 ## How To Point Agents
 


	@@ -86,7 +86,7 @@ An observation records:
86	86	- when capture started/ended;
87	87	- app/schema/OS context;
88	88	- timezone context at observation time;
89		-- selected type registry;
	89	+- requested/authorized/excluded type registry and coverage status;
90	90	- per-type capture quality;
91	91	- HealthKit anchors;
92	92	- events and aggregate changes observed during the capture.
	@@ -147,6 +147,28 @@ SQLite stores materialized aggregates because many reports and screens need expe
147	147
148	148	Aggregates are archive-derived evidence, not the source of truth. They must be rebuildable from sample/version/event tables.
149	149
	150	+### Full Dataset Discovery
	151	+
	152	+Archive-v2 quality cannot be judged from the original 15-type test profile. The
	153	+storage design must be validated against the complete set of HealthKit sample
	154	+types exposed by the current OS/device and authorized by the user.
	155	+
	156	+HealthProbe therefore needs a full dataset discovery pass before declaring the
	157	+import/store mechanism complete:
	158	+- enumerate all known HealthKit quantity/category/workout types the app can ask
	159	+ to read;
	160	+- record whether each type is supported, unauthorized, excluded, schema-limited,
	161	+ empty, or archived;
	162	+- collect count, earliest/latest date, fetch timing, and import timing per type;
	163	+- identify sample classes or relationships that the current schema cannot yet
	164	+ preserve without loss;
	165	+- keep the v1 15-type profile available only as a benchmark/debug subset.
	166	+
	167	+The v2 archive target is "all authorized HealthKit-accessible data that the
	168	+schema can faithfully preserve", with user exclusions and coverage reporting.
	169	+Unsupported or inaccessible data must be visible in diagnostics and export
	170	+manifests instead of silently disappearing from the backup boundary.
	171	+
150	172	## 5. Target SQLite Schema
151	173
152	174	Exact names may evolve, but the shape and constraints should remain.

	@@ -573,6 +573,7 @@ rows exist".
573	573	\| 2026-06-03 \| `2a82f67` \| Load snapshot/type detail UI from SQLite materialized summaries instead of Core Data cache. \| Triggered by successful snapshots whose detail screens showed no data types after automatic cache rebuild was disabled. Expected signal: Snapshot detail, Data Types, per-type drilldown, and evolution chart show current archive details without rebuilding Core Data cache. \|
574	574	\| 2026-06-03 \| `ec7ee29` \| Add explicit loading states for Dashboard, Snapshots, and Data Types archive rows. \| Triggered by false "no observations/no snapshots/not enough data" states during the first few seconds after app launch. Expected signal: startup shows loading state until SQLite rows are available, then shows real archive data without flicker. \|
575	575	\| 2026-06-03 \| `e231eaf` \| Use the HealthKit registry as SQLite sample type display-name fallback. \| Triggered by Snapshot detail showing raw identifiers such as `HKCategoryTypeIdentifierAppleStandHour` after UI moved from Core Data cache to SQLite summaries. Expected signal: existing and new archive rows show human-readable names such as `Stand Hours` without requiring reset/reimport. \|
	576	+\| 2026-06-03 \| `5fafcdd` \| Expand the HealthKit type registry for full-dataset discovery while keeping the original 15-type profile as the tested default. \| Triggered by the decision that import/storage cannot be considered complete based only on the restricted v1 dataset. Expected signal: Settings/authorization can expose a much broader quantity/category/workout catalog, unsupported types are explicit, and real-device coverage reports can measure full authorized backup volume. \|
576	577
577	578	## Current Diagnosis
578	579
	@@ -604,6 +605,9 @@ The likely bottleneck is per-row SQLite work:
604	605	archive finalization and should be the primary UI source for fresh snapshots.
605	606	- UI state should distinguish loading from empty results. A nil or empty in-memory
606	607	row list during app launch is not evidence that the archive is empty.
	608	+- The validated import metrics are based on the original 15-type profile. The
	609	+ next correctness/performance question is full-dataset coverage and volume, not
	610	+ further confidence from the restricted sample alone.
607	611
608	612	## Open Issues / Observations
609	613
	@@ -624,24 +628,28 @@ The likely bottleneck is per-row SQLite work:
624	628
625	629	Prioritize experiments in this order:
626	630
627		-1. Run an incremental snapshot after removing automatic Core Data cache rebuild.
	631	+1. Run full-dataset discovery with the expanded registry:
	632	+ request/refresh permissions, inspect supported vs unsupported types, and run
	633	+ a capture with all desired supported types enabled on a real device. Record
	634	+ type count, total records, failed/unsupported/empty types, and phase timings.
	635	+2. Run an incremental snapshot after removing automatic Core Data cache rebuild.
628	636	Confirm there are no `healthKit.detailCache.buildBegin` logs, copying the
629	637	diagnostic report does not freeze the app, and Dashboard/Snapshots show the
630	638	latest observation from SQLite. Also verify Snapshot detail and Data Types
631	639	show per-type summaries without a manual cache rebuild.
632		-2. Run a repeated no-delta benchmark after copying unchanged metric summaries and daily aggregates. Compare `SummedFinalizeElapsed`, `Heart Rate finalizeElapsed`, `Active Energy finalizeElapsed`, and wall clock.
633		-3. Add or inspect timing around per-record processing for changed high-volume metrics, especially Heart Rate, to separate sample DTO/fingerprint work from SQLite idempotency checks.
634		-4. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
635		-5. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
636		-6. Profile whether index maintenance dominates first-import insert cost.
637		-7. Consider a guarded bulk-import mode for first observations:
	640	+3. Run a repeated no-delta benchmark after copying unchanged metric summaries and daily aggregates. Compare `SummedFinalizeElapsed`, `Heart Rate finalizeElapsed`, `Active Energy finalizeElapsed`, and wall clock.
	641	+4. Add or inspect timing around per-record processing for changed high-volume metrics, especially Heart Rate, to separate sample DTO/fingerprint work from SQLite idempotency checks.
	642	+5. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
	643	+6. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
	644	+7. Profile whether index maintenance dominates first-import insert cost.
	645	+8. Consider a guarded bulk-import mode for first observations:
638	646	- keep archive semantics unchanged;
639	647	- only relax work that can be safely reconstructed or validated;
640	648	- re-enable normal idempotent paths for incremental observations.
641		-8. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
642		-9. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
643		-10. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
644		-11. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
	649	+9. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
	650	+10. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
	651	+11. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
	652	+12. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
645	653
646	654	## Verification Checklist For Each Optimization
647	655

	@@ -38,12 +38,30 @@ Because of this, record-by-record cross-device comparison is out of scope and co
38	38	### 2.3 Current Objective
39	39
40	40	HealthProbe is now a single-device local Health DB Time Machine:
41		-- capture selected HealthKit-accessible data as it exists at observation time
	41	+- capture HealthKit-accessible data as it exists at observation time
42	42	- reconstruct how the local Health database looked at a chosen date
43	43	- show additions, removals, representation changes, and aggregate changes between observations
44	44	- preserve local evidence that HealthKit may later aggregate or no longer export
45	45	- export scoped historical views for personal backup, support, research, or external analysis
46	46
	47	+### 2.3.1 Full Authorized Archive Direction
	48	+
	49	+The 15-type capture profile used during the first archive-v2 refactor is a v1
	50	+test/profile constraint, not the v2 product objective.
	51	+
	52	+For v2, HealthProbe should aim to archive every HealthKit sample type for which
	53	+all of the following hold:
	54	+- HealthKit exposes the type through public read APIs on the current OS/device;
	55	+- the user grants read permission;
	56	+- the current archive schema can preserve the type without losing essential
	57	+ value, date, source, metadata, or relationship information.
	58	+
	59	+The app may offer exclusions for privacy, performance, and device constraints,
	60	+but the architectural default is "back up all authorized HealthKit-accessible
	61	+data", not "back up only a small monitored subset". Unsupported, unauthorized,
	62	+excluded, or schema-limited types must be reported explicitly in coverage and
	63	+export manifests.
	64	+
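The eligibility rules in 2.3.1 amount to a small decision function per type. The sketch below is illustrative only, written in Python as a language-neutral notation: `TypeFacts`, `coverage_status`, and the status strings are hypothetical names, and how each fact is obtained on a device is left out. It shows that every type in the registry receives an explicit status, so nothing is dropped silently from the coverage report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeFacts:
    """Hypothetical per-type inputs; how they are determined is out of scope."""
    supported: bool        # exposed through public read APIs on this OS/device
    authorized: bool       # user granted read permission
    representable: bool    # archive schema can store it without essential loss
    excluded: bool = False # user opted this type out


def coverage_status(facts):
    """Map one type's facts to a single explicit coverage status."""
    if not facts.supported:
        return "unsupported"
    if not facts.authorized:
        return "unauthorized"
    if facts.excluded:
        return "excluded"
    if not facts.representable:
        return "schema_limited"
    return "archived"


def build_coverage(registry):
    """Return {type name: status} for every type in the registry."""
    return {name: coverage_status(facts) for name, facts in registry.items()}
```

The order of the checks is a design choice in this sketch, not something the specification fixes: it reports the most fundamental blocker first, so a type that is both unsupported and excluded shows up as `unsupported`.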
47	65	### 2.4 Interpretation Model
48	66
49	67	HealthProbe describes changes neutrally:
	@@ -309,7 +327,10 @@ The archive must preserve as much HealthKit information as the API exposes:
309	327	- first-seen / last-seen / last-verified observations
310	328	- fingerprints suitable for matching against Apple Health XML exports and extracted backup databases
311	329
312		-The archive is selected by data type for performance and privacy, but it is stored in one schema so later analysis can follow relationships between types.
	330	+Data types may be excluded from the archive for performance and privacy, but
	331	+the archive is stored in one schema so later analysis can follow relationships
	332	+between types. Full authorized backup is the v2 direction; scoped type selection
	333	+is a control surface and test profile, not the final archive boundary.
313	334
314	335	### 6.2 Reports and Point Exports
315	336

	@@ -29,7 +29,7 @@ HealthProbe is a single-device local archive and time-machine app for HealthKit-
29	29	The implementation must prioritize:
30	30	- point-in-time reconstruction of local HealthKit observations
31	31	- neutral change explanation between observations
32		-- preservation of selected details before HealthKit aggregation/consolidation makes them unavailable
	32	+- preservation of authorized HealthKit-accessible details before HealthKit aggregation/consolidation makes them unavailable
33	33	- scoped user exports
34	34	- no HealthProbe CloudKit/iCloud sync
35	35	- no cross-device record-by-record comparison
	@@ -69,11 +69,22 @@ Use:
69	69	Capture flow:
70	70	1. Resolve the current local device chain ID.
71	71	2. Start one archive observation record for the user-visible capture and keep its id.
72		-3. For each selected sample type, run anchored queries.
73		-4. Write HealthKit samples, deleted-object evidence, and final per-type verification to the local archive first, all under that same observation id.
74		-5. Update materialized aggregate tables in SQLite.
75		-6. Save/rebuild derived Core Data cache rows only after archive writes succeed.
76		-7. Compute summary/diff caches for UI and reports.
	72	+3. Resolve the capture profile. The v1 profile uses the original tested core
	73	+ types; the v2/full-backup direction uses every HealthKit sample type that is
	74	+ supported, authorized, not user-excluded, and representable by the archive
	75	+ schema.
	76	+4. For each requested sample type, run anchored queries or mark an explicit
	77	+ coverage status when unsupported, unauthorized, excluded, empty, timed out, or
	78	+ schema-limited.
	79	+5. Write HealthKit samples, deleted-object evidence, and final per-type verification to the local archive first, all under that same observation id.
	80	+6. Update materialized aggregate tables in SQLite.
	81	+7. Save/rebuild derived Core Data cache rows only after archive writes succeed.
	82	+8. Compute summary/diff caches for UI and reports.
	83	+
	84	+The import/store mechanism is not considered complete until it has been tested
	85	+against the full HealthKit-accessible dataset on real devices. The original
	86	+15-type profile is useful for iteration speed, but it is not representative
	87	+enough to validate archive completeness or worst-case performance.
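The ordering in the capture flow above can be sketched as follows. This is a simplified illustration in Python, not the app's implementation: `InMemoryArchive`, `run_capture`, the `fetch_samples` callback, and the status strings are all hypothetical, and the aggregate-table step is omitted. It shows three properties from the flow: one observation id covers the whole capture, every requested type ends with an explicit coverage status, and the cache rebuild runs only after the archive calls have returned.

```python
class InMemoryArchive:
    """Hypothetical stand-in for the SQLite archive."""

    def __init__(self):
        self.rows = []       # (observation_id, type_name, sample)
        self.coverage = {}   # observation_id -> {type_name: status}
        self._next_id = 1

    def start_observation(self):
        observation_id = self._next_id
        self._next_id += 1
        return observation_id

    def write(self, observation_id, type_name, samples):
        self.rows.extend((observation_id, type_name, s) for s in samples)

    def finish(self, observation_id, coverage):
        self.coverage[observation_id] = dict(coverage)


def run_capture(profile, fetch_samples, archive, rebuild_cache):
    """profile maps type name -> None (requested) or a skip status string."""
    observation_id = archive.start_observation()
    coverage = {}
    for type_name, skip_status in profile.items():
        if skip_status is not None:
            coverage[type_name] = skip_status  # e.g. unsupported, excluded
            continue
        try:
            samples = fetch_samples(type_name)
        except TimeoutError:
            coverage[type_name] = "timed_out"
            continue
        archive.write(observation_id, type_name, samples)
        coverage[type_name] = "captured" if samples else "empty"
    archive.finish(observation_id, coverage)
    # Reached only if every archive call above returned without raising.
    rebuild_cache(observation_id)
    return observation_id, coverage
```

If `archive.write` or `archive.finish` raises, the exception propagates and `rebuild_cache` is never called, which mirrors the rule that derived cache rows are rebuilt only after archive writes succeed.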
77	88
78	89	Anchors belong to the local device timeline and selected type registry. They are implementation state, not forensic truth.
79	90

	@@ -13,6 +13,8 @@ The product direction has changed. The target architecture is now:
13	13	- Core Data UI/report cache;
14	14	- Time Machine UI and scoped exports;
15	15	- recovery-compatible archive/export format;
	16	+- v2 full authorized HealthKit backup direction, with explicit user exclusions
	17	+ and coverage reporting;
16	18	- no in-app restore, backup patching, or HealthKit re-publication.
17	19
18	20	Current SwiftData models and anomaly-oriented naming are legacy/prototype implementation details.
	@@ -24,7 +26,7 @@ There are no real deployments, only test installations. Existing prototype datab
24	26	\| Area \| Current Status \| Target / Next Work \|
25	27	\|------\|----------------\|--------------------\|
26	28	\| Product docs \| Updated \| Keep `HealthProbe/Doc/README.md` as canonical index \|
27		-\| HealthKit capture \| Capture now opens one archive observation per user-visible snapshot, attaches HealthKit pages, deleted-object evidence, and type verification to that observation id before finishing it, no longer aborts initial full-history imports after a fixed 30-minute wall-clock cap while page-level HealthKit timeouts remain in place, defers grouped observation summary/daily aggregate rebuilds until per-type verification instead of rebuilding after every imported page, and persists large HealthKit pages in smaller archive chunks while using type-specific import strategies: conservative paging for the heaviest metrics, more aggressive pages/chunks for ordinary metrics, adaptive write chunk sizing, batched deleted-object persistence, explicit task yields, and lower-allocation streaming loops to avoid long monolithic SQLite stalls \| Continue moving UI/cache reads to archive-backed observation ids and revisit full checkpoint/resume and background collection separately \|
	29	+\| HealthKit capture \| Capture now opens one archive observation per user-visible snapshot, attaches HealthKit pages, deleted-object evidence, and type verification to that observation id before finishing it, no longer aborts initial full-history imports after a fixed 30-minute wall-clock cap while page-level HealthKit timeouts remain in place, defers grouped observation summary/daily aggregate rebuilds until per-type verification instead of rebuilding after every imported page, persists large HealthKit pages in smaller archive chunks while using type-specific import strategies, and has an expanded HealthKit type registry for full-dataset discovery while keeping the original 15-type profile as the tested default \| Run full dataset discovery/coverage on real devices before declaring import/storage complete; then revisit full checkpoint/resume and background collection \|
28	30	\| SQLite archive \| Archive v2 schema, snapshot-level observation grouping, differential write path, v2 verification/delete bookkeeping, daily aggregate rebuilds, integrity report, v2 record reads, SQL diff/count/aggregate/provenance/consolidation-evidence APIs, large synthetic diff pagination coverage, formal timing/memory metrics, and XCTest coverage are in place; the legacy `archive_samples` mirror has been removed, the hot write path now reuses prepared SQLite statements within grouped page writes instead of reparsing the same SQL for every sample, caches repeated sample-type/source/source-revision/device/metadata id lookups within grouped writes, skips redundant visibility close/existence checks when grouped imports create a brand-new sample or payload version, skips follow-up id lookup queries when SQLite confirms new sample/sample-version inserts, reuses verification aggregates instead of rescanning them twice, drives per-type finalize queries from sample-type-filtered sample ids, processes sample rows in a lower-allocation streaming loop, batches same-page deleted-object evidence in one transaction, adds composite indexes for visibility-range and sample-uuid hot lookups, and opens SQLite connections with import-friendly busy timeout / synchronous / temp-store pragmas \| Continue moving capture/Dashboard actions to archive/cache DTOs \|
29	31	\| Core Data cache \| Initial programmatic Core Data model, full-cache rebuild service, read DTOs for observation/type/diff/health rows, and Dashboard archive-cache status wiring are in place \| Move remaining export/report paths to cache DTOs and add targeted partial invalidation \|
30	32	\| SwiftData cache \| Exists; test builds now reset legacy prototype UI/archive/cache stores once for archive v2 so old SwiftData-only snapshots are not treated as backed-up observations. Metric timeout calibration, local device profile settings, operation logging, ContentView preview, Settings data maintenance, legacy detail/PDF views, unused legacy repair/observer services, Dashboard view/view-model access, and legacy anomaly/count-drop review have moved outside SwiftData or been removed. Remaining SwiftData imports are inventoried in [`SwiftData-Retirement-Inventory.md`](SwiftData-Retirement-Inventory.md) \| Treat as disposable prototype data; stop returning/storing `HealthSnapshot` bridge handles before removing `ModelContainer` \|
	@@ -39,11 +41,13 @@ There are no real deployments, only test installations. Existing prototype datab
39	41	Detailed checkable milestones live in [`Refactoring-Plan.md`](Refactoring-Plan.md).
40	42	Import performance iterations and measured reports live in [`Import-Optimization-Log.md`](Import-Optimization-Log.md).
41	43
42		-1. Stop writing prototype `HealthSnapshot` bridge rows during capture/review.
43		-2. Add targeted cache invalidation for affected observation/type ranges.
44		-3. Finish remaining UI language cleanup from anomaly/status to observation/diff/export where legacy model names still leak into active flows.
45		-4. Complete recovery-compatible export metadata, CSV output, and reproducibility checks.
46		-5. Remove SwiftData dependency and validate lower deployment targets.
	44	+1. Run full dataset discovery/coverage with the expanded HealthKit registry and
	45	+ document unsupported/unauthorized/schema-limited types.
	46	+2. Stop writing prototype `HealthSnapshot` bridge rows during capture/review.
	47	+3. Add targeted cache invalidation for affected observation/type ranges.
	48	+4. Finish remaining UI language cleanup from anomaly/status to observation/diff/export where legacy model names still leak into active flows.
	49	+5. Complete recovery-compatible export metadata, CSV output, and reproducibility checks.
	50	+6. Remove SwiftData dependency and validate lower deployment targets.
47	51
48	52	## Known Prototype Mismatches
49	53
	@@ -54,6 +58,9 @@ Import performance iterations and measured reports live in [`Import-Optimization
54	58	- Legacy SwiftData-only snapshots are reset for archive v2 test installs rather than migrated.
55	59	- Capture strategy and some legacy SwiftData transition paths may still decode or cache too much data for low-end devices.
56	60	- Very large first-run HealthKit imports may still require adaptive paging, retryable partial progress, and background-friendly collection beyond the current smaller pages, chunked persistence, and prepared-statement reuse. Diagnostic import reports now also expose explicit per-metric and aggregate fetch / processing / insert / finalize timings so large import runs can be compared without inferring phases from progress counters.
	61	+- The currently validated performance data comes from the original 15-type v1
	62	+ profile. It is not enough to prove that archive-v2 import/storage works for
	63	+ the full HealthKit-accessible dataset.
57	64	- Old prototype database compatibility is no longer required.
58	65	- Initial SQLite archive tests cover open/init/reset/idempotency, snapshot-level observation grouping, legacy mirror removal, small observation diffs, large synthetic diff pagination, formal timing/memory metrics, materialized aggregate comparison, source/provenance breakdowns, consolidation-evidence labels, export preview, paged JSON output, and manifest row persistence.
59	66	- Initial Core Data cache tests cover full rebuild from SQLite and delete-cache-then-rebuild without losing archive data.

	@@ -7,7 +7,7 @@ This directory is the only place for substantive HealthProbe documentation. Root
7	7	## Current Product Direction
8	8
9	9	HealthProbe is a single-device, local Health DB Time Machine:
10		-- capture selected HealthKit-accessible observations over time;
	10	+- capture HealthKit-accessible observations over time, aiming for full authorized backup in v2;
11	11	- reconstruct how the local Health database looked at a chosen observation date;
12	12	- explain local changes with consolidation-aware labels;
13	13	- preserve recovery-compatible archives and exports;
	@@ -19,6 +19,7 @@ Target storage architecture:
19	19	- Core Data is the rebuildable UI/report cache for expensive counts and summaries;
20	20	- SwiftData is legacy/prototype only and should not be expanded;
21	21	- existing prototype/test databases are disposable and may be reset for archive v2.
	22	+- the original 15-type capture profile is a v1/test subset, not proof that v2 import/storage is complete.
22	23
23	24	## How To Point Agents
24	25