HealthProbe - Database-Led Refactoring Plan

Last Updated: 2026-05-23

Status: Active planning document

Goal

Move HealthProbe from the current SwiftData/snapshot/anomaly prototype toward the target architecture:

  • SQLite archive/analysis database as source of truth;
  • differential observation storage;
  • SQL-first analysis for large datasets;
  • Core Data UI/report cache;
  • recovery-compatible exports;
  • iOS 15-era legacy-device support;
  • Time Machine UI over local observations;
  • destructive reset/reinitialization of prototype/test stores (old database compatibility is not required).

UI refactoring happens after the storage and query foundations exist.

Milestone 0 - Freeze Legacy Direction

Purpose: Stop work from deepening the old architecture.

Checklist:
- [ ] Mark SwiftData as legacy/prototype in active implementation tickets.
- [ ] Stop adding new SwiftData entities.
- [ ] Stop adding features that require recurring complete snapshots.
- [ ] Mark existing prototype/test installation data as disposable for archive v2.
- [ ] Point all storage agents to ../02-architecture/Database-Design.md.
- [ ] Confirm root docs only bootstrap into HealthProbe/Doc/.

Acceptance:
- [ ] No active task describes SwiftData as target persistence.
- [ ] No active task proposes full periodic snapshot storage.
- [ ] No active task requires old prototype-store compatibility.
- [ ] HealthProbe/Doc/README.md points DB work to Database-Design.md.

Milestone 1 - Lock Database Decisions

Purpose: Resolve irreversible archive choices before coding schema v2.

Checklist:
- [x] Decide timestamp storage convention.
- [x] Decide hash/salt/key strategy for source/device identifiers.
- [x] Define strict fingerprint foundation.
- [x] Define semantic/fuzzy fingerprint policy.
- [x] Define timezone policy for daily/monthly aggregate buckets.
- [x] Decide whether visibility ranges are maintained eagerly or rebuilt from events.
- [x] Define relationship preservation policy for workouts/samples/events.
- [x] Record prototype data policy: discard/reset old SwiftData and prototype SQLite stores; no compatibility migration.
- [x] Define export manifest canonicalization and hash algorithm.

Acceptance:
- [x] Database-Design.md open questions are answered or explicitly deferred.
- [x] Schema v2 can be implemented without guessing.
- [x] Test-install reset/reinitialization policy is documented.
- [x] Privacy implications of identifiers/provenance are documented.

Milestone 2 - Synthetic Large-Data Test Harness

Purpose: Prove the new design can be tested before real HealthKit data is involved.

Checklist:
- [ ] Create synthetic observation generator.
- [ ] Generate low, medium, and high-volume sample sets.
- [ ] Include appeared/disappeared/representationChanged scenarios.
- [ ] Include consolidation-like high-frequency thinning scenarios.
- [ ] Include source/device/metadata variation.
- [ ] Include relationship fixtures.
- [ ] Add memory/performance measurement for large diff/export operations.

Acceptance:
- [ ] Tests can create a large synthetic archive without real health data.
- [ ] Large diff test does not require loading all records into Swift arrays.
- [ ] Export test streams/pages output.
- [ ] Fixtures contain no personal, device, location, or real health data.
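
As an illustration of the generator item above, here is a minimal sketch of a lazy, seeded generator. It is written in Python only to stay self-contained; the real harness would presumably live in the app's Swift test target. The names `SyntheticSample` and `synthetic_observation`, the field layout, and the `drop_rate` parameter are all hypothetical and not taken from Database-Design.md.

```python
import random
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class SyntheticSample:
    """Hypothetical fixture record; every field is synthetic."""
    sample_id: str   # made-up identifier, never a real HealthKit UUID
    type_code: str   # placeholder type name
    start_ts: int    # seconds from an arbitrary synthetic epoch
    value: float
    source: str      # placeholder source label


def synthetic_observation(count: int, seed: int,
                          drop_rate: float = 0.0) -> Iterator[SyntheticSample]:
    """Lazily yield the samples visible in one synthetic observation.

    The same seed always produces the same base population. A non-zero
    drop_rate hides a subset of it, which models a later observation in
    which some samples have disappeared.
    """
    rng = random.Random(seed)
    for i in range(count):
        # Draw both numbers on every iteration so the value sequence
        # stays identical regardless of drop_rate.
        value = round(rng.uniform(50.0, 150.0), 1)
        dropped = rng.random() < drop_rate
        if dropped:
            continue
        yield SyntheticSample(
            sample_id=f"synthetic-{seed}-{i}",
            type_code="synthetic.heart_rate",
            start_ts=i * 60,
            value=value,
            source=f"synthetic-source-{i % 3}",
        )
```

Because the function is a generator, a test can feed it straight into a paged writer without building the full list in memory. Calling it twice with the same seed but different `drop_rate` values gives a before/after pair for disappearance scenarios.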

Milestone 3 - SQLite Archive V2 Schema

Purpose: Create the new archive foundation.

Checklist:
- [x] Implement schema_migrations.
- [x] Implement archive_metadata.
- [x] Implement device_chains.
- [x] Implement observations.
- [x] Implement sample_types.
- [x] Implement observation_type_runs.
- [x] Implement sources.
- [x] Implement source_revisions.
- [x] Implement hk_devices.
- [x] Implement metadata_blobs.
- [x] Implement samples.
- [x] Implement sample_versions.
- [x] Implement sample_observation_events.
- [x] Implement sample_visibility_ranges.
- [x] Implement sample_relationships.
- [x] Implement observation_type_summaries.
- [x] Implement daily_type_aggregates.
- [x] Implement export_manifests.
- [x] Implement export_items.
- [x] Add required indexes.
- [x] Add archive integrity report for schema version, required tables, PRAGMA integrity_check, and foreign keys.
- [x] Add SQLite integrity/open/schema-version tests.

Acceptance:
- [x] Fresh archive initializes successfully.
- [x] Schema version is recorded.
- [x] Archive v2 can initialize after old prototype stores are removed or ignored.
- [x] PRAGMA integrity_check passes.
- [x] Required indexes exist.
- [x] Empty archive queries return valid empty results.
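
The integrity report described above could look roughly like the following sketch. It uses Python's standard `sqlite3` module for brevity; the PRAGMA statements are standard SQLite and carry over to any binding. The table names come from the checklist, but the column layouts, the value of `SCHEMA_VERSION`, and the function names `open_archive` and `integrity_report` are assumptions made for illustration.

```python
import sqlite3

SCHEMA_VERSION = 2  # assumption: "archive v2" is recorded as version 2


def open_archive(path: str) -> sqlite3.Connection:
    """Open (or create) an archive and record the schema version once."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS schema_migrations ("
            " version INTEGER PRIMARY KEY, applied_at TEXT NOT NULL)"
        )
        # Hypothetical, heavily reduced column layouts.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS samples ("
            " id INTEGER PRIMARY KEY, uuid_hash TEXT NOT NULL UNIQUE)"
        )
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sample_versions ("
            " id INTEGER PRIMARY KEY,"
            " sample_id INTEGER NOT NULL REFERENCES samples(id),"
            " payload_hash TEXT NOT NULL)"
        )
        conn.execute(
            "INSERT OR IGNORE INTO schema_migrations (version, applied_at)"
            " VALUES (?, datetime('now'))",
            (SCHEMA_VERSION,),
        )
    return conn


def integrity_report(conn: sqlite3.Connection, required_tables: list) -> dict:
    """Collect schema version, missing tables, and SQLite's own checks."""
    present = {
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    version = conn.execute(
        "SELECT MAX(version) FROM schema_migrations"
    ).fetchone()[0]
    # integrity_check returns the single row 'ok' when nothing is wrong.
    integrity = [row[0] for row in conn.execute("PRAGMA integrity_check")]
    # foreign_key_check returns one row per violating row; empty means clean.
    fk_rows = conn.execute("PRAGMA foreign_key_check").fetchall()
    return {
        "schema_version": version,
        "missing_tables": sorted(set(required_tables) - present),
        "integrity_ok": integrity == ["ok"],
        "foreign_key_violations": len(fk_rows),
    }
```

Running `integrity_report(open_archive(":memory:"), [...])` against a fresh in-memory archive is one way to cover the "fresh archive initializes" and "schema version is recorded" acceptance items in a single test.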

Milestone 4 - Differential Write Path

Purpose: Write observations without storing full recurring snapshots.

Checklist:
- [x] Create observation transaction wrapper.
- [x] Upsert sample types.
- [x] Upsert source/source revision/device/metadata rows.
- [x] Upsert sample identity.
- [x] Upsert sample payload version only when payload changes.
- [x] Insert appeared/verified/representationChanged events.
- [x] Record HKDeletedObject evidence by UUID hash.
- [x] Close visibility ranges for disappeared/deleted samples.
- [x] Maintain open visibility ranges for visible samples.
- [x] Rebuild/update affected type summaries and daily aggregates after capture/delete observations.
- [x] Commit SQLite before Core Data/cache work.
- [x] Make repeated capture page writes idempotent.
- [x] Stop writing the legacy archive_samples mirror during capture.
- [x] Move verification/delete bookkeeping to archive v2 tables.
- [x] Remove remaining archive_samples schema/update remnants.

Acceptance:
- [x] Initial import stores identities and versions once.
- [x] Re-running same page does not duplicate sample identities or payload versions.
- [x] Representation change creates a new version, not a new logical sample.
- [x] Disappearance closes visibility range.
- [x] No full observation copy table is created or written.
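
The idempotency and "new version only when the payload changes" rules above can be sketched as follows. This is not the project's write path: the column layouts are reduced to the minimum, the function names `create_tables` and `write_page` are hypothetical, and a page is simplified to a list of `(uuid_hash, payload_hash)` pairs. Python's `sqlite3` is used only to keep the example runnable.

```python
import sqlite3


def create_tables(conn: sqlite3.Connection) -> None:
    """Create reduced, hypothetical versions of three archive tables."""
    conn.executescript("""
        CREATE TABLE samples (
            id INTEGER PRIMARY KEY,
            uuid_hash TEXT NOT NULL UNIQUE
        );
        CREATE TABLE sample_versions (
            id INTEGER PRIMARY KEY,
            sample_id INTEGER NOT NULL REFERENCES samples(id),
            payload_hash TEXT NOT NULL
        );
        CREATE TABLE sample_observation_events (
            observation_id INTEGER NOT NULL,
            sample_id INTEGER NOT NULL REFERENCES samples(id),
            kind TEXT NOT NULL,
            PRIMARY KEY (observation_id, sample_id)
        );
    """)


def write_page(conn: sqlite3.Connection, observation_id: int, page: list) -> None:
    """Write one capture page; calling it again with the same page is a no-op."""
    with conn:  # the whole page commits or rolls back together
        for uuid_hash, payload_hash in page:
            # Identity: insert once, then look up the stable row id.
            conn.execute(
                "INSERT OR IGNORE INTO samples (uuid_hash) VALUES (?)",
                (uuid_hash,),
            )
            sample_id = conn.execute(
                "SELECT id FROM samples WHERE uuid_hash = ?", (uuid_hash,)
            ).fetchone()[0]
            latest = conn.execute(
                "SELECT payload_hash FROM sample_versions"
                " WHERE sample_id = ? ORDER BY id DESC LIMIT 1",
                (sample_id,),
            ).fetchone()
            if latest is None:
                kind = "appeared"
            elif latest[0] != payload_hash:
                kind = "representationChanged"
            else:
                kind = "verified"
            # A payload version is stored only when the payload is new.
            if kind != "verified":
                conn.execute(
                    "INSERT INTO sample_versions (sample_id, payload_hash)"
                    " VALUES (?, ?)",
                    (sample_id, payload_hash),
                )
            # One event per (observation, sample): a replayed page resolves
            # to 'verified' and is ignored by the primary key.
            conn.execute(
                "INSERT OR IGNORE INTO sample_observation_events"
                " (observation_id, sample_id, kind) VALUES (?, ?, ?)",
                (observation_id, sample_id, kind),
            )
```

The design choice worth noting is that idempotency comes from the constraints (`UNIQUE` on the identity hash, a composite primary key on events) rather than from application-side bookkeeping, so a crashed and retried page cannot leave duplicates behind.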

Milestone 5 - SQL Analysis Layer

Purpose: Make the archive useful without RAM-heavy processing.

Checklist:
- [x] Implement point-in-time visible-record query.
- [x] Implement paged record table query.
- [x] Implement appeared query between observations.
- [x] Implement disappeared query between observations.
- [x] Implement representationChanged query between observations.
- [x] Implement diff counts using temp tables or equivalent SQL-first strategy.
- [x] Implement aggregate comparison query.
- [x] Implement consolidation-likely evidence query.
- [x] Implement source/provenance breakdown query.
- [x] Add large synthetic diff/pagination regression.
- [x] Add formal query timing/memory metrics on synthetic large datasets.

Acceptance:
- [x] Observation T can be reconstructed from ranges/events.
- [x] Large diff returns counts and first page without loading all rows.
- [x] Query results are deterministic and ordered.
- [x] Consolidation evidence includes count, aggregate, coverage, density, and uncertainty data.
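
To make the point-in-time and diff-count items concrete, here is a sketch of how they might be expressed in SQL over visibility ranges. It assumes a hypothetical column layout in which `closed_at_observation_id` holds the first observation where the sample was found missing (NULL while the range is still open); the actual semantics are defined in Database-Design.md and may differ. The function names are illustrative, and Python's `sqlite3` is used only as a runner for the SQL.

```python
import sqlite3


def create_ranges_table(conn: sqlite3.Connection) -> None:
    """Hypothetical, reduced layout of sample_visibility_ranges."""
    conn.execute(
        "CREATE TABLE sample_visibility_ranges ("
        " sample_id INTEGER NOT NULL,"
        " first_observation_id INTEGER NOT NULL,"
        " closed_at_observation_id INTEGER)"  # NULL = still visible
    )


def _visible(param: str) -> str:
    """SQL selecting the sample ids visible at observation :param."""
    return (
        "SELECT sample_id FROM sample_visibility_ranges"
        f" WHERE first_observation_id <= :{param}"
        f" AND (closed_at_observation_id IS NULL"
        f" OR closed_at_observation_id > :{param})"
    )


def visible_page(conn, observation_id, after_sample_id=0, limit=100):
    """Keyset-paged reconstruction of one observation, ordered by sample id."""
    sql = _visible("t") + " AND sample_id > :after ORDER BY sample_id LIMIT :limit"
    rows = conn.execute(
        sql, {"t": observation_id, "after": after_sample_id, "limit": limit}
    )
    return [row[0] for row in rows]


def diff_counts(conn, obs_a, obs_b):
    """Count appeared/disappeared samples between two observations in SQL."""
    params = {"a": obs_a, "b": obs_b}

    def count(left, right):
        # Set difference is computed inside SQLite; only one integer
        # crosses into application memory.
        sql = f"SELECT COUNT(*) FROM ({_visible(left)} EXCEPT {_visible(right)})"
        return conn.execute(sql, params).fetchone()[0]

    return {"disappeared": count("a", "b"), "appeared": count("b", "a")}
```

Passing the last `sample_id` of one page as `after_sample_id` for the next gives stable, ordered paging without `OFFSET`, which matters once record tables grow large. `EXCEPT` is used here as one "equivalent SQL-first strategy"; temp tables would be an alternative when the same visible set is reused across several queries.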

Milestone 6 - Core Data UI/Report Cache

Purpose: Cache expensive presentation/report values while keeping SQLite authoritative.

Checklist:
- [ ] Define Core Data model for observation rows.
- [ ] Define type summary cache entity.
- [ ] Define daily/monthly aggregate cache entity.
- [ ] Define diff summary cache entity.
- [ ] Define export manifest/status cache entity.
- [ ] Define archive health/status cache entity.
- [ ] Implement cache rebuild from SQLite.
- [ ] Implement cache invalidation by archive schema/cache schema/version/hash.
- [ ] Implement delete-cache-and-rebuild flow.
- [ ] Add cache schema/version and rebuild tests.

Acceptance:
- [ ] Deleting Core Data cache does not lose forensic data.
- [ ] Cache rebuild restores dashboard/timeline/report summaries.
- [ ] Cache rows include source observation ids and archive/cache schema versions.
- [ ] SQLite wins on disagreement.
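
The "SQLite wins on disagreement" rule can be illustrated with a small stamp comparison. In this sketch a plain dictionary stands in for the Core Data store, and the stamp fields, `CACHE_SCHEMA_VERSION`, the `type_code` column, and all function names are hypothetical. The last observation id is used as a cheap stand-in for the archive hash mentioned in the checklist.

```python
import sqlite3

CACHE_SCHEMA_VERSION = 1  # hypothetical cache model version


def archive_stamp(conn: sqlite3.Connection) -> dict:
    """Describe the archive state a cache was (or would be) built from."""
    schema = conn.execute(
        "SELECT MAX(version) FROM schema_migrations"
    ).fetchone()[0]
    last_obs = conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM observations"
    ).fetchone()[0]
    return {
        "archive_schema": schema,
        "cache_schema": CACHE_SCHEMA_VERSION,
        "last_observation_id": last_obs,  # stand-in for a content hash
    }


def rebuild_cache(conn: sqlite3.Connection) -> dict:
    """Recompute every cached summary from SQLite alone."""
    rows = conn.execute(
        "SELECT type_code, COUNT(*) FROM samples"
        " GROUP BY type_code ORDER BY type_code"
    ).fetchall()
    return {"stamp": archive_stamp(conn), "type_counts": dict(rows)}


def load_cache(conn: sqlite3.Connection, cache):
    """Return the cache if it matches the archive, otherwise rebuild it.

    A missing cache (None) and a stale cache take the same path, so
    deleting the cache is always a safe recovery step.
    """
    if cache is None or cache.get("stamp") != archive_stamp(conn):
        return rebuild_cache(conn)
    return cache
```

Because `rebuild_cache` reads only from SQLite, the delete-cache-and-rebuild flow and the stale-cache flow share one code path, which keeps the rebuild tests small.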

Milestone 7 - Export Layer

Purpose: Produce scoped, recovery-compatible exports.

Checklist:
- [ ] Define JSON export envelope.
- [ ] Define CSV record-table export.
- [ ] Define manifest hash algorithm.
- [ ] Include archive/app/schema/observation metadata.
- [ ] Include sample identity and payload version hashes.
- [ ] Include values/dates/units/type fields.
- [ ] Include source/provenance metadata where available and allowed.
- [ ] Include relationships where available.
- [ ] Include provenance-loss warning for external HealthKit re-publication.
- [ ] Stream/page export from SQLite.
- [ ] Store export manifest rows.
- [ ] Add reproducibility test for export manifests.

Acceptance:
- [ ] Large export does not materialize full record set in RAM.
- [ ] Export can be verified against archive hashes.
- [ ] Export contains enough structure for external recovery/salvage tooling.
- [ ] App still does not perform restore, backup patching, or HealthKit re-publication.
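
A streamed export with a reproducible manifest hash might be sketched as below. The choices here are assumptions, not the project's specification: records are written as JSON Lines, canonicalization means sorted keys with compact separators, and the digest is SHA-256. The column names and the functions `canonical_json`, `iter_export_lines`, and `export_with_manifest` are hypothetical; the locked format belongs to Export-Specification.md.

```python
import hashlib
import json
import sqlite3
from typing import BinaryIO, Iterator


def canonical_json(obj: dict) -> bytes:
    """Serialize with sorted keys and no optional whitespace."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def iter_export_lines(conn: sqlite3.Connection,
                      page_size: int = 500) -> Iterator[bytes]:
    """Yield one canonical JSON line per sample, reading a page at a time."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, uuid_hash, payload_hash FROM samples"
            " WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            return
        for _row_id, uuid_hash, payload_hash in rows:
            yield canonical_json(
                {"sample": uuid_hash, "payload": payload_hash}
            ) + b"\n"
        last_id = rows[-1][0]  # keyset cursor for the next page


def export_with_manifest(conn: sqlite3.Connection, out: BinaryIO,
                         page_size: int = 500) -> dict:
    """Stream the export to `out` and hash exactly the bytes written."""
    digest = hashlib.sha256()
    count = 0
    for line in iter_export_lines(conn, page_size):
        out.write(line)
        digest.update(line)
        count += 1
    return {
        "algorithm": "sha256",
        "item_count": count,
        "content_hash": digest.hexdigest(),
    }
```

Only one page of rows is held in memory at a time, and the hash is updated incrementally, so memory use does not grow with the size of the export. Since the output is ordered by id and canonicalized, two runs over the same archive produce the same `content_hash` regardless of page size, which is the property a manifest reproducibility test would check.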

Milestone 8 - UI/Data Flow Migration

Purpose: Move UI from prototype storage to target cache/query flow.

Checklist:
- [ ] Replace direct SwiftData @Query dependencies for target screens.
- [ ] Dashboard reads Core Data cache.
- [ ] Observation timeline reads Core Data cache.
- [ ] Observation detail uses cached summaries plus paged SQLite DTOs.
- [ ] Diff detail uses cached summary plus paged SQLite DTOs.
- [ ] Data type screens use target change labels.
- [ ] Export preview uses export query/manifest APIs.
- [ ] Archive status reflects SQLite/Core Data cache health.
- [ ] Legacy/small-device UI mode simplifies heavy visualizations.

Acceptance:
- [ ] Core Time Machine flows work without SwiftData as target persistence.
- [ ] UI copy uses observation/diff/export language.
- [ ] No count-only critical data loss messaging.
- [ ] Large record tables are paged.
- [ ] Legacy mode preserves capture/report/export.

Milestone 9 - Legacy SwiftData Retirement

Purpose: Remove prototype persistence from the target architecture.

Checklist:
- [ ] Identify all remaining SwiftData imports.
- [ ] Replace SwiftData models used by active flows.
- [ ] Remove/disable ModelContainer as required for target builds.
- [ ] Add prototype-store ignore/delete/reset path for test installs.
- [ ] Verify no old-store compatibility layer remains in active flows.
- [ ] Lower deployment target as far as dependencies allow.
- [ ] Verify build for iOS 15-era target constraints.

Acceptance:
- [ ] SwiftData is not required for normal app launch.
- [ ] Active flows use SQLite + Core Data cache.
- [ ] Prototype data handling is explicit: old stores are ignored/deleted/reset for test installs.

Milestone 10 - Acceptance Gate

Purpose: Decide whether the refactor is complete enough to build product features on top.

Checklist:
- [ ] Point-in-time reconstruction works.
- [ ] Large diff works SQL-first.
- [ ] Materialized aggregates can be rebuilt and verified.
- [ ] Core Data cache can be deleted and rebuilt.
- [ ] Large export streams/pages.
- [ ] Recovery-compatible manifest is present.
- [ ] SQLite integrity checks pass.
- [ ] Low-memory synthetic tests pass.
- [ ] UI no longer depends on SwiftData as foundation.
- [ ] Docs match implementation.

Acceptance:
- [ ] Product can safely proceed to UI polish and higher-level workflows.
- [ ] Database is no longer the main unresolved architectural risk.

Parallelization Guide

Can run in parallel after Milestone 1:
- synthetic data harness;
- schema implementation;
- Core Data cache model drafting;
- export format drafting;
- UI DTO contract design.

Must not run before dependencies:
- UI migration before SQL query layer and Core Data cache exist;
- export implementation before manifest design is locked;
- legacy SwiftData removal before replacement flows exist;
- archive v2 initialization before reset/reinitialization policy is documented.

Agent Assignment Hints

| Workstream | Primary Doc |
|---|---|
| SQLite schema/write path/query layer | ../02-architecture/Database-Design.md |
| HealthKit capture integration | ../02-architecture/Implementation-Guide.md |
| Core Data cache | ../02-architecture/Core-Data-Cache-Design.md |
| Export formats/manifests | ../02-architecture/Export-Specification.md |
| UI migration | ../00-agent-guides/CLAUDE.md |
| Product language/non-goals | ../01-product/MVP-Specification.md |
| Status updates | IMPLEMENTATION_STATUS.md |