@@ -38,12 +38,30 @@ Because of this, record-by-record cross-device comparison is out of scope and co |
||
| 38 | 38 |
### 2.3 Current Objective |
| 39 | 39 |
|
| 40 | 40 |
HealthProbe is now a single-device local Health DB Time Machine: |
| 41 |
-- capture selected HealthKit-accessible data as it exists at observation time |
|
| 41 |
+- capture HealthKit-accessible data as it exists at observation time |
|
| 42 | 42 |
- reconstruct how the local Health database looked at a chosen date |
| 43 | 43 |
- show additions, removals, representation changes, and aggregate changes between observations |
| 44 | 44 |
- preserve local evidence that HealthKit may later aggregate or no longer export |
| 45 | 45 |
- export scoped historical views for personal backup, support, research, or external analysis |
| 46 | 46 |
|
| 47 |
+### 2.3.1 Full Authorized Archive Direction |
|
| 48 |
+ |
|
| 49 |
+The 15-type capture profile used during the first archive-v2 refactor is a v1
|
| 50 |
+test-profile constraint, not the v2 product objective.
|
| 51 |
+ |
|
| 52 |
+For v2, HealthProbe should aim to archive every HealthKit sample type for which
|
| 53 |
+all of the following hold:
|
| 54 |
+- HealthKit exposes the type through public read APIs on the current OS/device; |
|
| 55 |
+- the user grants read permission; |
|
| 56 |
+- the current archive schema can preserve the type without losing essential |
|
| 57 |
+ value, date, source, metadata, or relationship information. |
|
| 58 |
+ |
|
| 59 |
+The app may offer exclusions for privacy, performance, and device constraints, |
|
| 60 |
+but the architectural default is "backup all authorized HealthKit-accessible |
|
| 61 |
+data", not "backup only a small monitored subset". Unsupported, unauthorized, |
|
| 62 |
+excluded, or schema-limited types must be reported explicitly in coverage and |
|
| 63 |
+export manifests. |
|
| 64 |
+ |
|
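The archive rule above can be read as a short-circuiting check that returns the first failing condition as an explicit coverage status. The following is a minimal Swift sketch under stated assumptions: `TypeCapability`, `CoverageStatus`, and `coverageStatus(for:)` are hypothetical names, not HealthProbe's actual types, and the `readAuthorized` flag is assumed to be known to the app even though HealthKit does not report read authorization to apps directly.

```swift
// Hypothetical model of the v2 archive rule; names are illustrative only.
enum CoverageStatus: String {
    case unsupported    // no public read API on this OS/device
    case unauthorized   // user has not granted read permission
    case excluded       // user opted the type out (privacy/performance/device)
    case schemaLimited  // archive schema would lose essential information
    case eligible       // all conditions hold; capture may still find it empty
}

struct TypeCapability {
    let identifier: String
    let exposedByOS: Bool
    let readAuthorized: Bool      // assumption: HealthKit does not expose this directly
    let userExcluded: Bool
    let schemaRepresentable: Bool
}

/// Returns `.eligible` only when every condition holds; otherwise returns the
/// first failing reason so it can be reported in coverage and export manifests.
func coverageStatus(for type: TypeCapability) -> CoverageStatus {
    guard type.exposedByOS else { return .unsupported }
    guard type.readAuthorized else { return .unauthorized }
    guard !type.userExcluded else { return .excluded }
    guard type.schemaRepresentable else { return .schemaLimited }
    return .eligible
}

// Example: one eligible type and one the user has excluded.
let steps = TypeCapability(identifier: "HKQuantityTypeIdentifierStepCount",
                           exposedByOS: true, readAuthorized: true,
                           userExcluded: false, schemaRepresentable: true)
let heartRate = TypeCapability(identifier: "HKQuantityTypeIdentifierHeartRate",
                               exposedByOS: true, readAuthorized: true,
                               userExcluded: true, schemaRepresentable: true)
for type in [steps, heartRate] {
    print(type.identifier, coverageStatus(for: type).rawValue)
}
```

The guards follow the order in which the coverage statuses are listed above (unsupported, unauthorized, excluded, schema-limited), so a type missing from the current OS is reported as unsupported rather than unauthorized.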
| 47 | 65 |
### 2.4 Interpretation Model |
| 48 | 66 |
|
| 49 | 67 |
HealthProbe describes changes neutrally: |
@@ -309,7 +327,10 @@ The archive must preserve as much HealthKit information as the API exposes: |
||
| 309 | 327 |
- first-seen / last-seen / last-verified observations |
| 310 | 328 |
- fingerprints suitable for matching against Apple Health XML exports and extracted backup databases |
| 311 | 329 |
|
| 312 |
-The archive is selected by data type for performance and privacy, but it is stored in **one schema** so later analysis can follow relationships between types. |
|
| 330 |
+Data types may be excluded from the archive for performance and privacy, but
|
| 331 |
+the archive is stored in **one schema** so later analysis can follow
|
| 332 |
+relationships between types. Full authorized backup is the v2 direction; scoped
|
| 333 |
+type selection is a control surface and test profile, not the final archive boundary.
|
| 313 | 334 |
|
| 314 | 335 |
### 6.2 Reports and Point Exports |
| 315 | 336 |
|
@@ -86,7 +86,7 @@ An observation records: |
||
| 86 | 86 |
- when capture started/ended; |
| 87 | 87 |
- app/schema/OS context; |
| 88 | 88 |
- timezone context at observation time; |
| 89 |
-- selected type registry; |
|
| 89 |
+- requested/authorized/excluded type registry and coverage status; |
|
| 90 | 90 |
- per-type capture quality; |
| 91 | 91 |
- HealthKit anchors; |
| 92 | 92 |
- events and aggregate changes observed during the capture. |
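One possible way to persist the requested/authorized/excluded registry with each observation, so the same data can feed coverage reports and export manifests, is a small `Codable` record encoded as JSON. This is a sketch only: `TypeRegistryEntry`, `ObservationRegistry`, and `manifestJSON(_:)` are hypothetical names, and `authorized` is optional because HealthKit deliberately does not tell apps whether read permission was granted.

```swift
import Foundation

// Hypothetical registry entry recorded with each observation.
struct TypeRegistryEntry: Codable, Equatable {
    let identifier: String
    let requested: Bool
    let authorized: Bool?   // nil when the app cannot determine read authorization
    let excluded: Bool
    let coverage: String    // e.g. "archived", "empty", "unsupported", "schemaLimited"
}

struct ObservationRegistry: Codable, Equatable {
    let observationID: Int64
    let types: [TypeRegistryEntry]
}

/// Encodes the registry with sorted keys so manifests are stable across runs.
func manifestJSON(_ registry: ObservationRegistry) throws -> Data {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    return try encoder.encode(registry)
}

let registry = ObservationRegistry(observationID: 42, types: [
    TypeRegistryEntry(identifier: "HKQuantityTypeIdentifierStepCount",
                      requested: true, authorized: nil, excluded: false, coverage: "archived"),
    TypeRegistryEntry(identifier: "HKCategoryTypeIdentifierAppleStandHour",
                      requested: true, authorized: nil, excluded: true, coverage: "excluded"),
])
let data = try manifestJSON(registry)
print(String(decoding: data, as: UTF8.self))
```

With synthesized `Codable`, a `nil` `authorized` value is omitted from the JSON rather than written as `null`; a custom `encode(to:)` would be needed if manifests should state "unknown" explicitly.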
@@ -147,6 +147,28 @@ SQLite stores materialized aggregates because many reports and screens need expe |
||
| 147 | 147 |
|
| 148 | 148 |
Aggregates are archive-derived evidence, not the source of truth. They must be rebuildable from sample/version/event tables. |
| 149 | 149 |
|
| 150 |
+### Full Dataset Discovery |
|
| 151 |
+ |
|
| 152 |
+Archive-v2 quality cannot be judged from the original 15-type test profile. The |
|
| 153 |
+storage design must be validated against the complete set of HealthKit sample |
|
| 154 |
+types exposed by the current OS/device and authorized by the user. |
|
| 155 |
+ |
|
| 156 |
+HealthProbe therefore needs a full dataset discovery pass before declaring the |
|
| 157 |
+import/store mechanism complete: |
|
| 158 |
+- enumerate all known HealthKit quantity/category/workout types the app can ask |
|
| 159 |
+ to read; |
|
| 160 |
+- record whether each type is unsupported, unauthorized, excluded, schema-limited,
|
| 161 |
+ empty, or archived; |
|
| 162 |
+- collect count, earliest/latest date, fetch timing, and import timing per type; |
|
| 163 |
+- identify sample classes or relationships that the current schema cannot yet |
|
| 164 |
+ preserve without loss; |
|
| 165 |
+- keep the v1 15-type profile available only as a benchmark/debug subset. |
|
| 166 |
+ |
|
| 167 |
+The v2 archive target is "all authorized HealthKit-accessible data that the |
|
| 168 |
+schema can faithfully preserve", with user exclusions and coverage reporting. |
|
| 169 |
+Unsupported or inaccessible data must be visible in diagnostics and export |
|
| 170 |
+manifests instead of silently disappearing from the backup boundary. |
|
| 171 |
+ |
|
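The per-type discovery data listed above can be rolled up into a single coverage summary for diagnostics and export manifests. A minimal Swift sketch, assuming hypothetical `TypeDiscovery`, `DiscoverySummary`, and `summarize(_:)` names (not HealthProbe's actual types) and timings already measured in seconds:

```swift
import Foundation

// Hypothetical per-type discovery row; field names are illustrative.
enum DiscoveryStatus: String {
    case unsupported, unauthorized, excluded, schemaLimited, empty, archived
}

struct TypeDiscovery {
    let identifier: String
    let status: DiscoveryStatus
    let recordCount: Int
    let earliest: Date?
    let latest: Date?
    let fetchSeconds: TimeInterval
    let importSeconds: TimeInterval
}

struct DiscoverySummary {
    var typesByStatus: [DiscoveryStatus: Int] = [:]
    var totalRecords = 0
    var earliest: Date?
    var latest: Date?
    var fetchSeconds: TimeInterval = 0
    var importSeconds: TimeInterval = 0
    var notArchived: [String] = []   // must stay visible in diagnostics and manifests
}

/// Aggregates per-type rows; every non-archived type is listed by identifier.
func summarize(_ rows: [TypeDiscovery]) -> DiscoverySummary {
    var summary = DiscoverySummary()
    for row in rows {
        summary.typesByStatus[row.status, default: 0] += 1
        summary.totalRecords += row.recordCount
        summary.fetchSeconds += row.fetchSeconds
        summary.importSeconds += row.importSeconds
        if row.status != .archived { summary.notArchived.append(row.identifier) }
    }
    summary.earliest = rows.compactMap { $0.earliest }.min()
    summary.latest = rows.compactMap { $0.latest }.max()
    return summary
}

let day: TimeInterval = 86_400
let rows = [
    TypeDiscovery(identifier: "HKQuantityTypeIdentifierHeartRate", status: .archived,
                  recordCount: 1_000, earliest: Date(timeIntervalSince1970: 0),
                  latest: Date(timeIntervalSince1970: 10 * day),
                  fetchSeconds: 1.5, importSeconds: 2.5),
    TypeDiscovery(identifier: "HKQuantityTypeIdentifierStepCount", status: .archived,
                  recordCount: 200, earliest: Date(timeIntervalSince1970: 2 * day),
                  latest: Date(timeIntervalSince1970: 12 * day),
                  fetchSeconds: 0.5, importSeconds: 0.5),
    TypeDiscovery(identifier: "HKQuantityTypeIdentifierBloodGlucose", status: .empty,
                  recordCount: 0, earliest: nil, latest: nil,
                  fetchSeconds: 0.25, importSeconds: 0),
]
let summary = summarize(rows)
print(summary.totalRecords, summary.notArchived)
```

Keeping `notArchived` as an explicit list, rather than only a count, is what lets empty or inaccessible types show up by name instead of silently disappearing from the backup boundary.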
| 150 | 172 |
## 5. Target SQLite Schema |
| 151 | 173 |
|
| 152 | 174 |
Exact names may evolve, but the shape and constraints should remain. |
@@ -29,7 +29,7 @@ HealthProbe is a single-device local archive and time-machine app for HealthKit- |
||
| 29 | 29 |
The implementation must prioritize: |
| 30 | 30 |
- point-in-time reconstruction of local HealthKit observations |
| 31 | 31 |
- neutral change explanation between observations |
| 32 |
-- preservation of selected details before HealthKit aggregation/consolidation makes them unavailable |
|
| 32 |
+- preservation of authorized HealthKit-accessible details before HealthKit aggregation/consolidation makes them unavailable |
|
| 33 | 33 |
- scoped user exports |
| 34 | 34 |
- no HealthProbe CloudKit/iCloud sync |
| 35 | 35 |
- no cross-device record-by-record comparison |
@@ -69,11 +69,22 @@ Use: |
||
| 69 | 69 |
Capture flow: |
| 70 | 70 |
1. Resolve the current local device chain ID. |
| 71 | 71 |
2. Start one archive observation record for the user-visible capture and keep its id. |
| 72 |
-3. For each selected sample type, run anchored queries. |
|
| 73 |
-4. Write HealthKit samples, deleted-object evidence, and final per-type verification to the local archive first, all under that same observation id. |
|
| 74 |
-5. Update materialized aggregate tables in SQLite. |
|
| 75 |
-6. Save/rebuild derived Core Data cache rows only after archive writes succeed. |
|
| 76 |
-7. Compute summary/diff caches for UI and reports. |
|
| 72 |
+3. Resolve the capture profile. The v1 profile uses the original tested core |
|
| 73 |
+ types; the v2/full-backup direction uses every HealthKit sample type that is |
|
| 74 |
+ supported, authorized, not user-excluded, and representable by the archive |
|
| 75 |
+ schema. |
|
| 76 |
+4. For each requested sample type, run anchored queries or mark an explicit |
|
| 77 |
+ coverage status when unsupported, unauthorized, excluded, empty, timed out, or |
|
| 78 |
+ schema-limited. |
|
| 79 |
+5. Write HealthKit samples, deleted-object evidence, and final per-type verification to the local archive first, all under that same observation id. |
|
| 80 |
+6. Update materialized aggregate tables in SQLite. |
|
| 81 |
+7. Save/rebuild derived Core Data cache rows only after archive writes succeed. |
|
| 82 |
+8. Compute summary/diff caches for UI and reports. |
|
| 83 |
+ |
|
| 84 |
+The import/store mechanism is not considered complete until it has been tested |
|
| 85 |
+against the full HealthKit-accessible dataset on real devices. The original |
|
| 86 |
+15-type profile is useful for iteration speed, but it is not representative |
|
| 87 |
+enough to validate archive completeness or worst-case performance. |
|
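Steps 3 and 4 of the capture flow can be sketched as follows, assuming a hypothetical `CatalogEntry` that already knows each type's support, authorization, exclusion, and schema status, and a stand-in `fetch` closure in place of the real HealthKit anchored query. None of these names are HealthProbe's actual API; the point is that every requested type leaves the loop with an explicit outcome instead of being dropped.

```swift
// Hypothetical sketch of capture-flow steps 3 and 4; names are illustrative.
enum CaptureProfile { case v1Core, fullAuthorized }

enum TypeOutcome: Equatable {
    case archived(records: Int)
    case empty, timedOut, failed
    case unsupported, unauthorized, excluded, schemaLimited
}

struct CatalogEntry {
    let identifier: String
    let inV1Core: Bool
    let supported: Bool
    let authorized: Bool
    let excluded: Bool
    let representable: Bool
}

enum FetchError: Error { case timedOut }

/// Step 3: resolve which types the profile requests.
func requestedTypes(_ profile: CaptureProfile, from catalog: [CatalogEntry]) -> [CatalogEntry] {
    switch profile {
    case .v1Core: return catalog.filter { $0.inV1Core }
    case .fullAuthorized: return catalog
    }
}

/// Step 4: run the stand-in fetch for eligible types and give every other
/// requested type an explicit coverage outcome.
func capture(_ types: [CatalogEntry],
             fetch: (String) throws -> Int) -> [String: TypeOutcome] {
    var outcomes: [String: TypeOutcome] = [:]
    for entry in types {
        let id = entry.identifier
        if !entry.supported { outcomes[id] = .unsupported; continue }
        if !entry.authorized { outcomes[id] = .unauthorized; continue }
        if entry.excluded { outcomes[id] = .excluded; continue }
        if !entry.representable { outcomes[id] = .schemaLimited; continue }
        do {
            let count = try fetch(id)
            outcomes[id] = count == 0 ? .empty : .archived(records: count)
        } catch FetchError.timedOut {
            outcomes[id] = .timedOut
        } catch {
            outcomes[id] = .failed
        }
    }
    return outcomes
}

// Placeholder identifiers; a real catalog would use HealthKit type identifiers.
let catalog = [
    CatalogEntry(identifier: "steps", inV1Core: true, supported: true,
                 authorized: true, excluded: false, representable: true),
    CatalogEntry(identifier: "heartRate", inV1Core: true, supported: true,
                 authorized: true, excluded: false, representable: true),
    CatalogEntry(identifier: "newType", inV1Core: false, supported: false,
                 authorized: false, excluded: false, representable: true),
]
let outcomes = capture(requestedTypes(.fullAuthorized, from: catalog)) { id in
    if id == "heartRate" { throw FetchError.timedOut }
    return id == "steps" ? 120 : 0
}
print(outcomes)
```

Because `.v1Core` only narrows the requested set, the same loop can serve both the tested 15-type profile and the full-backup direction.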
| 77 | 88 |
|
| 78 | 89 |
Anchors belong to the local device timeline and selected type registry. They are implementation state, not forensic truth. |
| 79 | 90 |
|
@@ -13,6 +13,8 @@ The product direction has changed. The target architecture is now: |
||
| 13 | 13 |
- Core Data UI/report cache; |
| 14 | 14 |
- Time Machine UI and scoped exports; |
| 15 | 15 |
- recovery-compatible archive/export format; |
| 16 |
+- v2 full authorized HealthKit backup direction, with explicit user exclusions |
|
| 17 |
+ and coverage reporting; |
|
| 16 | 18 |
- no in-app restore, backup patching, or HealthKit re-publication. |
| 17 | 19 |
|
| 18 | 20 |
Current SwiftData models and anomaly-oriented naming are legacy/prototype implementation details. |
@@ -24,7 +26,7 @@ There are no real deployments, only test installations. Existing prototype datab |
||
| 24 | 26 |
| Area | Current Status | Target / Next Work | |
| 25 | 27 |
|------|----------------|--------------------| |
| 26 | 28 |
| Product docs | Updated | Keep `HealthProbe/Doc/README.md` as canonical index | |
| 27 |
-| HealthKit capture | Capture now opens one archive observation per user-visible snapshot, attaches HealthKit pages, deleted-object evidence, and type verification to that observation id before finishing it, no longer aborts initial full-history imports after a fixed 30-minute wall-clock cap while page-level HealthKit timeouts remain in place, defers grouped observation summary/daily aggregate rebuilds until per-type verification instead of rebuilding after every imported page, and persists large HealthKit pages in smaller archive chunks while using type-specific import strategies: conservative paging for the heaviest metrics, more aggressive pages/chunks for ordinary metrics, adaptive write chunk sizing, batched deleted-object persistence, explicit task yields, and lower-allocation streaming loops to avoid long monolithic SQLite stalls | Continue moving UI/cache reads to archive-backed observation ids and revisit full checkpoint/resume and background collection separately | |
|
| 29 |
+| HealthKit capture | Capture now opens one archive observation per user-visible snapshot, attaches HealthKit pages, deleted-object evidence, and type verification to that observation id before finishing it, no longer aborts initial full-history imports after a fixed 30-minute wall-clock cap while page-level HealthKit timeouts remain in place, defers grouped observation summary/daily aggregate rebuilds until per-type verification instead of rebuilding after every imported page, persists large HealthKit pages in smaller archive chunks while using type-specific import strategies, and has an expanded HealthKit type registry for full-dataset discovery while keeping the original 15-type profile as the tested default | Run full dataset discovery/coverage on real devices before declaring import/storage complete; then revisit full checkpoint/resume and background collection | |
|
| 28 | 30 |
| SQLite archive | Archive v2 schema, snapshot-level observation grouping, differential write path, v2 verification/delete bookkeeping, daily aggregate rebuilds, integrity report, v2 record reads, SQL diff/count/aggregate/provenance/consolidation-evidence APIs, large synthetic diff pagination coverage, formal timing/memory metrics, and XCTest coverage are in place; the legacy `archive_samples` mirror has been removed, the hot write path now reuses prepared SQLite statements within grouped page writes instead of reparsing the same SQL for every sample, caches repeated sample-type/source/source-revision/device/metadata id lookups within grouped writes, skips redundant visibility close/existence checks when grouped imports create a brand-new sample or payload version, skips follow-up id lookup queries when SQLite confirms new sample/sample-version inserts, reuses verification aggregates instead of rescanning them twice, drives per-type finalize queries from sample-type-filtered sample ids, processes sample rows in a lower-allocation streaming loop, batches same-page deleted-object evidence in one transaction, adds composite indexes for visibility-range and sample-uuid hot lookups, and opens SQLite connections with import-friendly busy timeout / synchronous / temp-store pragmas | Continue moving capture/Dashboard actions to archive/cache DTOs | |
| 29 | 31 |
| Core Data cache | Initial programmatic Core Data model, full-cache rebuild service, read DTOs for observation/type/diff/health rows, and Dashboard archive-cache status wiring are in place | Move remaining export/report paths to cache DTOs and add targeted partial invalidation | |
| 30 | 32 |
| SwiftData cache | Exists; test builds now reset legacy prototype UI/archive/cache stores once for archive v2 so old SwiftData-only snapshots are not treated as backed-up observations. Metric timeout calibration, local device profile settings, operation logging, ContentView preview, Settings data maintenance, legacy detail/PDF views, unused legacy repair/observer services, Dashboard view/view-model access, and legacy anomaly/count-drop review have moved outside SwiftData or been removed. Remaining SwiftData imports are inventoried in [`SwiftData-Retirement-Inventory.md`](SwiftData-Retirement-Inventory.md) | Treat as disposable prototype data; stop returning/storing `HealthSnapshot` bridge handles before removing `ModelContainer` | |
@@ -39,11 +41,13 @@ There are no real deployments, only test installations. Existing prototype datab |
||
| 39 | 41 |
Detailed checkable milestones live in [`Refactoring-Plan.md`](Refactoring-Plan.md). |
| 40 | 42 |
Import performance iterations and measured reports live in [`Import-Optimization-Log.md`](Import-Optimization-Log.md). |
| 41 | 43 |
|
| 42 |
-1. Stop writing prototype `HealthSnapshot` bridge rows during capture/review. |
|
| 43 |
-2. Add targeted cache invalidation for affected observation/type ranges. |
|
| 44 |
-3. Finish remaining UI language cleanup from anomaly/status to observation/diff/export where legacy model names still leak into active flows. |
|
| 45 |
-4. Complete recovery-compatible export metadata, CSV output, and reproducibility checks. |
|
| 46 |
-5. Remove SwiftData dependency and validate lower deployment targets. |
|
| 44 |
+1. Run full dataset discovery/coverage with the expanded HealthKit registry and |
|
| 45 |
+ document unsupported/unauthorized/schema-limited types. |
|
| 46 |
+2. Stop writing prototype `HealthSnapshot` bridge rows during capture/review. |
|
| 47 |
+3. Add targeted cache invalidation for affected observation/type ranges. |
|
| 48 |
+4. Finish remaining UI language cleanup from anomaly/status to observation/diff/export where legacy model names still leak into active flows. |
|
| 49 |
+5. Complete recovery-compatible export metadata, CSV output, and reproducibility checks. |
|
| 50 |
+6. Remove SwiftData dependency and validate lower deployment targets. |
|
| 47 | 51 |
|
| 48 | 52 |
## Known Prototype Mismatches |
| 49 | 53 |
|
@@ -54,6 +58,9 @@ Import performance iterations and measured reports live in [`Import-Optimization |
||
| 54 | 58 |
- Legacy SwiftData-only snapshots are reset for archive v2 test installs rather than migrated. |
| 55 | 59 |
- Capture strategy and some legacy SwiftData transition paths may still decode or cache too much data for low-end devices. |
| 56 | 60 |
- Very large first-run HealthKit imports may still require adaptive paging, retryable partial progress, and background-friendly collection beyond the current smaller pages, chunked persistence, and prepared-statement reuse. Diagnostic import reports now also expose explicit per-metric and aggregate fetch / processing / insert / finalize timings so large import runs can be compared without inferring phases from progress counters. |
| 61 |
+- The currently validated performance data comes from the original 15-type v1 |
|
| 62 |
+ profile. It is not enough to prove that archive-v2 import/storage works for |
|
| 63 |
+ the full HealthKit-accessible dataset. |
|
| 57 | 64 |
- Old prototype database compatibility is no longer required. |
| 58 | 65 |
- Initial SQLite archive tests cover open/init/reset/idempotency, snapshot-level observation grouping, legacy mirror removal, small observation diffs, large synthetic diff pagination, formal timing/memory metrics, materialized aggregate comparison, source/provenance breakdowns, consolidation-evidence labels, export preview, paged JSON output, and manifest row persistence. |
| 59 | 66 |
- Initial Core Data cache tests cover full rebuild from SQLite and delete-cache-then-rebuild without losing archive data. |
@@ -573,6 +573,7 @@ rows exist". |
||
| 573 | 573 |
| 2026-06-03 | `2a82f67` | Load snapshot/type detail UI from SQLite materialized summaries instead of Core Data cache. | Triggered by successful snapshots whose detail screens showed no data types after automatic cache rebuild was disabled. Expected signal: Snapshot detail, Data Types, per-type drilldown, and evolution chart show current archive details without rebuilding Core Data cache. | |
| 574 | 574 |
| 2026-06-03 | `ec7ee29` | Add explicit loading states for Dashboard, Snapshots, and Data Types archive rows. | Triggered by false "no observations/no snapshots/not enough data" states during the first few seconds after app launch. Expected signal: startup shows loading state until SQLite rows are available, then shows real archive data without flicker. | |
| 575 | 575 |
| 2026-06-03 | `e231eaf` | Use the HealthKit registry as SQLite sample type display-name fallback. | Triggered by Snapshot detail showing raw identifiers such as `HKCategoryTypeIdentifierAppleStandHour` after UI moved from Core Data cache to SQLite summaries. Expected signal: existing and new archive rows show human-readable names such as `Stand Hours` without requiring reset/reimport. | |
| 576 |
+| 2026-06-03 | `5fafcdd` | Expand the HealthKit type registry for full-dataset discovery while keeping the original 15-type profile as the tested default. | Triggered by the decision that import/storage cannot be considered complete based only on the restricted v1 dataset. Expected signal: Settings/authorization can expose a much broader quantity/category/workout catalog, unsupported types are explicit, and real-device coverage reports can measure full authorized backup volume. | |
|
| 576 | 577 |
|
| 577 | 578 |
## Current Diagnosis |
| 578 | 579 |
|
@@ -604,6 +605,9 @@ The likely bottleneck is per-row SQLite work: |
||
| 604 | 605 |
archive finalization and should be the primary UI source for fresh snapshots. |
| 605 | 606 |
- UI state should distinguish loading from empty results. A nil or empty in-memory |
| 606 | 607 |
row list during app launch is not evidence that the archive is empty. |
| 608 |
+- The validated import metrics are based on the original 15-type profile. The |
|
| 609 |
+ next correctness/performance question is full-dataset coverage and volume;
|
| 610 |
+ more runs on the restricted sample alone add little confidence.
|
| 607 | 611 |
|
| 608 | 612 |
## Open Issues / Observations |
| 609 | 613 |
|
@@ -624,24 +628,28 @@ The likely bottleneck is per-row SQLite work: |
||
| 624 | 628 |
|
| 625 | 629 |
Prioritize experiments in this order: |
| 626 | 630 |
|
| 627 |
-1. Run an incremental snapshot after removing automatic Core Data cache rebuild. |
|
| 631 |
+1. Run full-dataset discovery with the expanded registry: |
|
| 632 |
+ request/refresh permissions, inspect supported vs unsupported types, and run |
|
| 633 |
+ a capture with all desired supported types enabled on a real device. Record |
|
| 634 |
+ type count, total records, failed/unsupported/empty types, and phase timings. |
|
| 635 |
+2. Run an incremental snapshot after removing automatic Core Data cache rebuild. |
|
| 628 | 636 |
Confirm there are no `healthKit.detailCache.buildBegin` logs, copying the |
| 629 | 637 |
diagnostic report does not freeze the app, and Dashboard/Snapshots show the |
| 630 | 638 |
latest observation from SQLite. Also verify Snapshot detail and Data Types |
| 631 | 639 |
show per-type summaries without a manual cache rebuild. |
| 632 |
-2. Run a repeated no-delta benchmark after copying unchanged metric summaries and daily aggregates. Compare `SummedFinalizeElapsed`, `Heart Rate finalizeElapsed`, `Active Energy finalizeElapsed`, and wall clock. |
|
| 633 |
-3. Add or inspect timing around per-record processing for changed high-volume metrics, especially Heart Rate, to separate sample DTO/fingerprint work from SQLite idempotency checks. |
|
| 634 |
-4. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`. |
|
| 635 |
-5. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans. |
|
| 636 |
-6. Profile whether index maintenance dominates first-import insert cost. |
|
| 637 |
-7. Consider a guarded bulk-import mode for first observations: |
|
| 640 |
+3. Run a repeated no-delta benchmark after copying unchanged metric summaries and daily aggregates. Compare `SummedFinalizeElapsed`, `Heart Rate finalizeElapsed`, `Active Energy finalizeElapsed`, and wall clock. |
|
| 641 |
+4. Add or inspect timing around per-record processing for changed high-volume metrics, especially Heart Rate, to separate sample DTO/fingerprint work from SQLite idempotency checks. |
|
| 642 |
+5. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`. |
|
| 643 |
+6. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans. |
|
| 644 |
+7. Profile whether index maintenance dominates first-import insert cost. |
|
| 645 |
+8. Consider a guarded bulk-import mode for first observations: |
|
| 638 | 646 |
- keep archive semantics unchanged; |
| 639 | 647 |
- only relax work that can be safely reconstructed or validated; |
| 640 | 648 |
- re-enable normal idempotent paths for incremental observations. |
| 641 |
-8. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`. |
|
| 642 |
-9. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity. |
|
| 643 |
-10. Revisit adaptive page sizes only after SQLite write-path costs are reduced. |
|
| 644 |
-11. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded. |
|
| 649 |
+9. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`. |
|
| 650 |
+10. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity. |
|
| 651 |
+11. Revisit adaptive page sizes only after SQLite write-path costs are reduced. |
|
| 652 |
+12. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded. |
|
| 645 | 653 |
|
| 646 | 654 |
## Verification Checklist For Each Optimization |
| 647 | 655 |
|
@@ -7,7 +7,7 @@ This directory is the only place for substantive HealthProbe documentation. Root |
||
| 7 | 7 |
## Current Product Direction |
| 8 | 8 |
|
| 9 | 9 |
HealthProbe is a single-device, local Health DB Time Machine: |
| 10 |
-- capture selected HealthKit-accessible observations over time; |
|
| 10 |
+- capture HealthKit-accessible observations over time, aiming for full authorized backup in v2; |
|
| 11 | 11 |
- reconstruct how the local Health database looked at a chosen observation date; |
| 12 | 12 |
- explain local changes with consolidation-aware labels; |
| 13 | 13 |
- preserve recovery-compatible archives and exports; |
@@ -19,6 +19,7 @@ Target storage architecture: |
||
| 19 | 19 |
- Core Data is the rebuildable UI/report cache for expensive counts and summaries; |
| 20 | 20 |
- SwiftData is legacy/prototype only and should not be expanded; |
| 21 | 21 |
- existing prototype/test databases are disposable and may be reset for archive v2. |
| 22 |
+- the original 15-type capture profile is a v1/test subset, not proof that v2 import/storage is complete. |
|
| 22 | 23 |
|
| 23 | 24 |
## How To Point Agents |
| 24 | 25 |
|