Version: 1.6
Last Updated: 2026-05-23
Purpose: Implementation guide for the iOS local Health DB Time Machine
For database schema, archive invariants, SQL analysis patterns, Core Data cache
boundaries, reset policy, and future migration rules, read
Database-Design.md first. This guide describes
implementation workflow and cross-module behavior.
The following rules apply to all code, logs, examples, tests, and documentation:
The app may store a user's HealthKit samples locally on-device when the user grants HealthKit access. Those samples must never be committed to source control or written to diagnostic logs.
HealthProbe is a single-device local archive and time-machine app for HealthKit-accessible data.
The implementation must prioritize:
- point-in-time reconstruction of local HealthKit observations
- neutral change explanation between observations
- preservation of authorized HealthKit-accessible details before HealthKit aggregation/consolidation makes them unavailable
- scoped user exports
- no HealthProbe CloudKit/iCloud sync
- no cross-device record-by-record comparison
- iOS 15-era legacy device support; SwiftData is not a target dependency
Record-count drops are not inherently critical. They are evidence to explain with record-level and aggregate context.
HealthProbe has no real deployments at this stage. Existing SwiftData stores and prototype SQLite archives are disposable.
Archive v2 startup behavior:
1. Open the archive path.
2. Read archive_metadata.schema_version when present.
3. If no archive exists, create archive v2.
4. If a prototype/unknown schema exists, close the database, move it to a timestamped *.prototype-backup file for developer inspection, and create a fresh archive v2.
5. Rebuild/delete Core Data cache rows after archive reset.
6. Log reset reason without raw health values.
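The startup steps above reduce to a small decision function. The sketch below is minimal. It assumes the caller has already opened the file and read `archive_metadata.schema_version`, passing nil when the table or row is missing. `ArchiveStartupAction`, `decideStartupAction`, and the backup naming pattern are hypothetical, not existing HealthProbe APIs.

```swift
import Foundation

/// Hypothetical constant for the current archive schema version.
let currentArchiveSchemaVersion = 2

/// What startup does with the archive file under the current reset policy.
enum ArchiveStartupAction: Equatable {
    case createFresh                         // no archive on disk
    case openExisting                        // schema_version matches archive v2
    case resetPrototype(backupName: String)  // prototype/unknown schema: back up, then recreate
}

/// - Parameters:
///   - archiveExists: whether a file exists at the archive path.
///   - schemaVersion: archive_metadata.schema_version, or nil when absent.
///   - archiveFileName: e.g. "HealthProbeArchive.sqlite".
///   - now: timestamp used to make the backup name unique.
func decideStartupAction(archiveExists: Bool,
                         schemaVersion: Int?,
                         archiveFileName: String,
                         now: Date) -> ArchiveStartupAction {
    guard archiveExists else { return .createFresh }
    if schemaVersion == currentArchiveSchemaVersion { return .openExisting }
    // No one-way migration: anything else is treated as a disposable prototype.
    let stamp = Int(now.timeIntervalSince1970)
    return .resetPrototype(backupName: "\(archiveFileName).\(stamp).prototype-backup")
}

/// Reset reason for logs: schema numbers only, never health values.
func resetReason(schemaVersion: Int?) -> String {
    let found = schemaVersion.map { String($0) } ?? "none"
    return "archive reset: found schema_version=\(found), expected \(currentArchiveSchemaVersion)"
}
```

This function treats every non-v2 file as disposable, so it only fits the current no-deployment stage. A future production build would route mismatches to a migration path instead.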
Do not implement one-way migration from the old prototype schema unless a later dated product decision reverses this policy.
During the SwiftData-to-archive-v2 transition, legacy SwiftData snapshots must not be mixed with new SQLite archive observations. A SwiftData snapshot that predates archive v2 can still produce a UI diff, but its historical records may not exist in the archive and therefore cannot be exported or used as backup evidence. Test builds may destructively reset HealthProbeRecords.store, HealthProbeArchive.sqlite, and HealthProbeCache.sqlite once for this transition. Local settings stores can be preserved.
The manual test reset is scheduled with both a UserDefaults flag and a small
marker file in Application Support. Startup reset must treat either signal as
authoritative, delete the archive/cache/prototype stores and SQLite sidecars,
then clear both signals. This keeps benchmark resets reliable even if a test
device is force-closed immediately after scheduling.
Use:
- HKAnchoredObjectQuery for incremental capture
- HKObserverQuery as a wake-up hint
- manual capture from the app UI
Capture flow:
1. Resolve the current local device chain ID.
2. Start one archive observation record for the user-visible capture and keep its id.
3. Resolve the capture profile. The v1 profile uses the original tested core types; the v2/full-backup direction uses every HealthKit sample type that is supported, authorized, not user-excluded, and representable by the archive schema.
4. For each requested sample type, run anchored queries or mark an explicit coverage status when unsupported, unauthorized, excluded, empty, timed out, or schema-limited.
5. Write HealthKit samples, deleted-object evidence, and final per-type verification to the local archive first, all under that same observation id.
6. Update materialized aggregate tables in SQLite.
7. Save/rebuild derived Core Data cache rows only after archive writes succeed.
8. Compute summary/diff caches for UI and reports.
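Step 3 and the skip statuses in step 4 can be resolved before any query runs. Every type then ends up either requested or explicitly explained. The sketch below uses hypothetical names (`TypeFacts`, `resolveFullBackupProfile`), and the order of the checks is an assumption. One caveat: HealthKit deliberately does not reveal whether the user denied read access, so `authorized` here can only reflect what the app is actually able to determine.

```swift
/// Pre-query coverage outcome for one requested type (step 4). "empty" and
/// "timed_out" are only known after the anchored query runs, so they are not here.
enum PreQueryCoverage: String {
    case requested, unsupported, unauthorized, excluded, schemaLimited = "schema_limited"
}

/// Hypothetical per-type facts gathered before capture.
struct TypeFacts {
    let identifier: String
    let supportedOnDevice: Bool
    let authorized: Bool
    let userExcluded: Bool
    let representableInArchive: Bool
}

/// Full-backup (v2) profile: every known type gets a status, so nothing is
/// silently dropped from coverage reporting.
func resolveFullBackupProfile(_ types: [TypeFacts]) -> [String: PreQueryCoverage] {
    var result: [String: PreQueryCoverage] = [:]
    for t in types {
        let status: PreQueryCoverage
        if !t.supportedOnDevice {
            status = .unsupported
        } else if t.userExcluded {
            status = .excluded
        } else if !t.authorized {
            status = .unauthorized
        } else if !t.representableInArchive {
            status = .schemaLimited
        } else {
            status = .requested
        }
        result[t.identifier] = status
    }
    return result
}
```

Types marked `requested` go on to anchored queries. The others get their explicit status recorded under the same observation id.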
The import/store mechanism is not considered complete until it has been tested against the full HealthKit-accessible dataset on real devices. The original 15-type profile is useful for iteration speed, but it is not representative enough to validate archive completeness or worst-case performance.
Anchors belong to the local device timeline and selected type registry. They are implementation state, not forensic truth.
Observation statuses:
- started;
- partial;
- completed;
- failed;
- cancelled.
Type-run statuses:
- started;
- completed;
- failed;
- unauthorized;
- timed_out.
Rules:
- one failed type does not invalidate successfully committed type runs;
- incomplete observations are visible as partial evidence, not as proof of disappearance;
- anchors are saved only after the corresponding SQLite transaction commits;
- UI change labels must include uncertainty when either side of a diff has partial/failed type evidence.
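These rules can be expressed as two small functions over the status vocabularies. The sketch rests on an assumed roll-up rule that this guide does not specify:
- an observation is `completed` only when every type run completed;
- it is `partial` when some runs completed and others did not;
- it is `failed` when none did.

The function names are hypothetical.

```swift
enum TypeRunStatus: String {
    case started, completed, failed, unauthorized, timedOut = "timed_out"
}

enum ObservationStatus: String {
    case started, partial, completed, failed, cancelled
}

/// Assumed roll-up rule (not specified by the guide): completed only when every
/// type run completed; partial when some did; failed when none did.
func rollUp(_ runs: [TypeRunStatus], cancelled: Bool = false) -> ObservationStatus {
    if cancelled { return .cancelled }
    if runs.contains(.started) { return .started }          // capture still in progress
    let completedCount = runs.filter { $0 == .completed }.count
    if completedCount == runs.count { return .completed }
    return completedCount > 0 ? .partial : .failed
}

/// For one type, a change label needs an uncertainty qualifier unless both
/// observations have a completed run for it (nil = no run recorded).
func labelNeedsUncertainty(previous: TypeRunStatus?, current: TypeRunStatus?) -> Bool {
    !(previous == .completed && current == .completed)
}
```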
If an anchor is missing, corrupt, or rejected by HealthKit:
- mark the type run with anchor failure context;
- run a full scan for the affected type when permissions allow;
- rebuild current visibility for that type from observed samples and deleted-object evidence;
- continue storing future anchors after the full scan succeeds.
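The sketch below covers both anchor rules: choosing between an incremental and a full scan, and persisting a new anchor only after the page's transaction commits. `HKQueryAnchor` conforms to `NSSecureCoding`, so it is typically stored as archived `Data`. Here, decoding is injected as a closure and the anchor store is a plain dictionary; both stand in for the real archive code. The reason strings and function names are hypothetical.

```swift
import Foundation

/// How the next capture for one type should run.
enum ScanPlan: Equatable {
    case incremental(anchor: Data)
    case fullScan(reason: String)   // reason goes into the type run's failure context
}

/// Chooses a scan plan. `anchorIsDecodable` stands in for unarchiving an
/// HKQueryAnchor; `previouslyRejected` marks an anchor HealthKit refused.
func planScan(storedAnchor: Data?,
              previouslyRejected: Bool,
              anchorIsDecodable: (Data) -> Bool) -> ScanPlan {
    guard let anchor = storedAnchor else { return .fullScan(reason: "anchor_missing") }
    if previouslyRejected { return .fullScan(reason: "anchor_rejected") }
    guard anchorIsDecodable(anchor) else { return .fullScan(reason: "anchor_corrupt") }
    return .incremental(anchor: anchor)
}

/// Persists the new anchor only if the page's SQLite transaction committed.
/// If `commitPage` throws, the previous anchor is left untouched.
func commitPageThenSaveAnchor(_ newAnchor: Data,
                              for typeIdentifier: String,
                              anchors: inout [String: Data],
                              commitPage: () throws -> Void) rethrows {
    try commitPage()
    anchors[typeIdentifier] = newAnchor
}
```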
The archive store is the source of truth. It should be a robust local SQLite database designed for both storage and analysis.
The canonical database design is Database-Design.md. The summary below is intentionally high-level; do not treat it as a competing schema source.
The archive should support:
- one schema for all selected sample types
- differential observation storage; do not store complete recurring snapshots
- HealthKit UUID hash and internal fingerprints
- sample payload versions deduplicated across observations
- type identifier, start/end date, value, unit, and category/workout fields
- source/source revision metadata
- HealthKit metadata dictionaries
- device provenance exposed by HealthKit, redacted or hashed as required
- first-seen, last-seen, last-verified, and disappeared-at observations
- visibility ranges/events for point-in-time reconstruction
- observation history sufficient for point-in-time reconstruction
- relationship records where HealthKit exposes links between workouts, samples, events, or related records
- materialized aggregate tables for expensive counts used by reports/UI
- schema versioning, current test-store reset policy, and future migrations
- integrity hashes/manifests for exports
- indexes, temporary tables, joins, CTEs, and paged result sets for large diffs
- recovery-compatible exports for external tooling, preserving record identity, payload versions, provenance metadata where available, relationships, observation history, and manifest hashes
The archive must be able to answer:
- records visible at observation T
- records that appeared/disappeared between T1 and T2
- records whose representation changed while semantic/aggregate meaning may be preserved
- selected records for streaming export
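The first two questions reduce to one predicate over visibility ranges. The in-memory sketch below makes two assumptions: observation ids increase with capture time, and a nil `lastObservation` means the range is still open. In the archive, the same test would be a SQL filter such as `first_observation_id <= :t AND (last_observation_id IS NULL OR last_observation_id >= :t)`. The type and function names are hypothetical.

```swift
/// One visibility range in the spirit of sample_visibility_ranges: the sample
/// (at a given version) was visible from firstObservation through
/// lastObservation inclusive; nil lastObservation means the range is still open.
struct VisibilityRange {
    let sampleID: Int
    let versionID: Int?
    let firstObservation: Int
    let lastObservation: Int?

    func isVisible(at observation: Int) -> Bool {
        guard observation >= firstObservation else { return false }
        guard let last = lastObservation else { return true }
        return observation <= last
    }
}

/// Records visible at observation T.
func visibleSampleIDs(at t: Int, in ranges: [VisibilityRange]) -> Set<Int> {
    Set(ranges.filter { $0.isVisible(at: t) }.map { $0.sampleID })
}

/// Records that appeared/disappeared between T1 and T2.
func visibilityChanges(from t1: Int, to t2: Int,
                       in ranges: [VisibilityRange]) -> (appeared: Set<Int>, disappeared: Set<Int>) {
    let before = visibleSampleIDs(at: t1, in: ranges)
    let after = visibleSampleIDs(at: t2, in: ranges)
    return (appeared: after.subtracting(before), disappeared: before.subtracting(after))
}
```

A sample whose version changed between T1 and T2 stays visible on both sides. Detecting representation changes therefore compares version ids separately rather than sample ids.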
Minimum target schema shape is defined in Database-Design.md. The archive must at least preserve the concepts in this historical sketch, retained for orientation:
```sql
-- One row per local capture attempt/result.
CREATE TABLE observations (
    id INTEGER PRIMARY KEY,
    observed_at REAL NOT NULL,
    status TEXT NOT NULL,
    app_version TEXT,
    os_version TEXT,
    device_chain_id TEXT NOT NULL,
    schema_version INTEGER NOT NULL
);

-- Stable identity for a HealthKit-accessible record or semantic record.
CREATE TABLE samples (
    id INTEGER PRIMARY KEY,
    type_identifier TEXT NOT NULL,
    sample_uuid_hash TEXT,
    strict_fingerprint TEXT NOT NULL,
    semantic_fingerprint TEXT,
    first_seen_observation_id INTEGER NOT NULL
);

-- Deduplicated payload representation. New row only when representation changes.
CREATE TABLE sample_versions (
    id INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL,
    payload_hash TEXT NOT NULL,
    start_date REAL NOT NULL,
    end_date REAL NOT NULL,
    value REAL,
    unit TEXT,
    source_id INTEGER,
    metadata_hash TEXT
);

-- Visibility/event history, not a full snapshot copy.
CREATE TABLE sample_observation_events (
    id INTEGER PRIMARY KEY,
    observation_id INTEGER NOT NULL,
    sample_id INTEGER NOT NULL,
    version_id INTEGER,
    event_kind TEXT NOT NULL
);

-- Optional compressed visibility ranges for point-in-time reconstruction.
CREATE TABLE sample_visibility_ranges (
    sample_id INTEGER NOT NULL,
    version_id INTEGER,
    first_observation_id INTEGER NOT NULL,
    last_observation_id INTEGER,
    PRIMARY KEY (sample_id, version_id, first_observation_id)
);

-- Materialized aggregates feeding reports and Core Data cache.
CREATE TABLE daily_type_aggregates (
    observation_id INTEGER NOT NULL,
    type_identifier TEXT NOT NULL,
    bucket_start REAL NOT NULL,
    record_count INTEGER NOT NULL,
    value_sum REAL,
    value_max REAL,
    PRIMARY KEY (observation_id, type_identifier, bucket_start)
);
```
Exact naming can evolve, but the constraints must hold: payloads are deduplicated, observations are differential, and aggregates are materialized.
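The dedup and differential constraints can be illustrated with an in-memory model. The sketch assumes each anchored-query page yields two things:
- added samples, keyed by an identity such as the UUID hash, each with a payload hash;
- deleted identities from `HKDeletedObject`.

`ArchiveModel` and `Event` are hypothetical stand-ins for `sample_versions` and `sample_observation_events`. A sample that arrives with an unchanged payload writes nothing.

```swift
enum EventKind: Hashable { case appeared, representationChanged, disappeared }

struct Event {
    let observation: Int
    let sample: String
    let kind: EventKind
}

struct ArchiveModel {
    /// "sampleKey|payloadHash" -> version id; one entry per distinct payload.
    private(set) var versionIDs: [String: Int] = [:]
    /// sampleKey -> currently visible version id.
    private(set) var currentVersion: [String: Int] = [:]
    private(set) var events: [Event] = []
    private var nextVersionID = 1

    mutating func record(observation: Int, added: [String: String], deleted: Set<String>) {
        for (sample, payloadHash) in added {
            let key = sample + "|" + payloadHash
            let versionID: Int
            if let existing = versionIDs[key] {
                versionID = existing                 // payload already stored: reuse it
            } else {
                versionID = nextVersionID
                nextVersionID += 1
                versionIDs[key] = versionID
            }
            switch currentVersion[sample] {
            case nil:
                events.append(Event(observation: observation, sample: sample, kind: .appeared))
            case let old? where old != versionID:
                events.append(Event(observation: observation, sample: sample, kind: .representationChanged))
            default:
                break                                 // unchanged: no event row
            }
            currentVersion[sample] = versionID
        }
        for sample in deleted where currentVersion[sample] != nil {
            events.append(Event(observation: observation, sample: sample, kind: .disappeared))
            currentVersion[sample] = nil
        }
    }
}
```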
Detailed entity contracts live in Core-Data-Cache-Design.md.
Core Data is the target derived/cache layer because it supports older iOS versions than SwiftData and is suitable for UI/report state. It may store:
- selected data types and app settings
- observation list and capture status
- precomputed summaries, temporal bins, and diff previews
- operation logs and export indexes
- change labels and links into the archive
- expensive count results used by reports and presentation
Core Data must not be the only forensic copy. If Core Data and the archive disagree, the SQLite archive wins. The cache must be safe to delete and rebuild from SQLite.
Current SwiftData models are legacy/prototype implementation details. New storage work should target Core Data for cache and SQLite for archive/analysis.
Change logic should be evidence-first and consolidation-aware.
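The basic diff semantics are plain set operations over fingerprints. In production, however, the diff should execute in SQLite, not by loading full datasets into Swift arrays:

```swift
// Conceptual semantics only; do not materialize full fingerprint sets in production.
let appeared = currentFingerprints.subtracting(previousFingerprints)
let disappeared = previousFingerprints.subtracting(currentFingerprints)
let retained = currentFingerprints.intersection(previousFingerprints)
```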
Conceptual SQL shape:

```sql
-- visible_samples is conceptual: one row per sample/version visible at an observation.
CREATE TEMP TABLE prev_visible AS SELECT sample_id, version_id FROM visible_samples WHERE observation_id = :previous;
CREATE TEMP TABLE curr_visible AS SELECT sample_id, version_id FROM visible_samples WHERE observation_id = :current;
-- Disappeared: visible at :previous but not at :current.
SELECT p.sample_id FROM prev_visible p LEFT JOIN curr_visible c ON c.sample_id = p.sample_id WHERE c.sample_id IS NULL;
```
Semantic grouping should compare:
- type identifier
- start/end coverage
- value sum and value max where meaningful
- source/source revision
- metadata keys relevant to HealthKit interpretation
- interval length and sample density
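The sketch below compares one type and source within a fixed coverage window. It assumes the group summaries are computed in SQL, so only the small summary rows reach Swift. Two parts are assumptions rather than policy:
- the cumulative/discrete split (compare sums for cumulative types, maxima for discrete ones) mirrors HealthKit's quantity aggregation styles, but it is only one reading of "where meaningful";
- the tolerance is an untested placeholder.

All names are hypothetical.

```swift
import Foundation

enum AggregationKind { case cumulative, discrete }

/// Summary of one group: same type, same source/revision, same coverage window.
struct SemanticSummary {
    let typeIdentifier: String
    let sourceKey: String        // hypothetical: hashed source + revision
    let coverageStart: Date
    let coverageEnd: Date
    let recordCount: Int
    let valueSum: Double?
    let valueMax: Double?

    /// Records per hour of coverage; drops when samples are merged.
    var density: Double {
        let hours = coverageEnd.timeIntervalSince(coverageStart) / 3600
        return hours > 0 ? Double(recordCount) / hours : 0
    }
}

/// Relative comparison that treats two missing values as equal.
func approxEqual(_ x: Double?, _ y: Double?, tolerance: Double) -> Bool {
    switch (x, y) {
    case (nil, nil): return true
    case let (p?, q?): return abs(p - q) <= tolerance * max(abs(p), abs(q), 1)
    default: return false
    }
}

/// Same type, source, and coverage, and the meaningful aggregate agrees.
func semanticallyEquivalent(_ a: SemanticSummary, _ b: SemanticSummary,
                            aggregation: AggregationKind,
                            tolerance: Double = 0.001) -> Bool {
    guard a.typeIdentifier == b.typeIdentifier, a.sourceKey == b.sourceKey,
          a.coverageStart == b.coverageStart, a.coverageEnd == b.coverageEnd else { return false }
    switch aggregation {
    case .cumulative: return approxEqual(a.valueSum, b.valueSum, tolerance: tolerance)
    case .discrete: return approxEqual(a.valueMax, b.valueMax, tolerance: tolerance)
    }
}

/// Fewer, coarser records carrying the same meaning.
func looksConsolidated(before: SemanticSummary, after: SemanticSummary,
                       aggregation: AggregationKind) -> Bool {
    semanticallyEquivalent(before, after, aggregation: aggregation) && after.density < before.density
}
```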
Suggested labels:
- appeared
- disappeared
- representationChanged
- consolidationLikely
- aggregateChanged
- uncertain
Severity should be reserved for user-facing workflow urgency, not treated as proof of corruption. In particular, a high disappeared count with stable aggregate totals should usually be shown as consolidationLikely or representationChanged, not as critical loss.
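One way to apply this policy is to choose a type-level headline label from SQL-computed counts and aggregate totals. Record-level `appeared`/`disappeared` labels still attach to individual rows. The precedence order and tolerance below are assumptions for illustration, not tuned heuristics, and `TypeDiffEvidence` and `headlineLabel` are hypothetical.

```swift
enum ChangeLabel: String {
    case appeared, disappeared, representationChanged, consolidationLikely, aggregateChanged, uncertain
}

/// Per-type evidence between two observations; counts come from SQL.
struct TypeDiffEvidence {
    let appearedCount: Int
    let disappearedCount: Int
    let changedCount: Int
    let previousTotal: Double?
    let currentTotal: Double?
    let bothSidesComplete: Bool   // false if either side has partial/failed type evidence
}

/// Returns nil when nothing changed. Record counts alone never yield a loss label.
func headlineLabel(_ e: TypeDiffEvidence, stableTolerance: Double = 0.001) -> ChangeLabel? {
    guard e.bothSidesComplete else { return .uncertain }
    let totalsStable: Bool
    if let p = e.previousTotal, let c = e.currentTotal {
        totalsStable = abs(c - p) <= stableTolerance * max(abs(p), 1)
    } else {
        // Stable only if both totals are missing.
        totalsStable = (e.previousTotal == nil) == (e.currentTotal == nil)
    }
    if !totalsStable { return .aggregateChanged }
    if e.disappearedCount > 0 && e.appearedCount > 0 { return .consolidationLikely }
    if e.disappearedCount > 0 || e.changedCount > 0 { return .representationChanged }
    if e.appearedCount > 0 { return .appeared }
    return nil
}
```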
Detailed export formats and manifest rules live in Export-Specification.md.
Exports are scoped to what the user is inspecting.
Supported MVP exports:
- point-in-time record table
- observation manifest with hashes
- diff report between two observations
- selected appeared/disappeared/changed record set
Export rules:
- Include observation timestamps and app/build/schema versions.
- Include hashes so exported evidence can be re-identified within HealthProbe.
- Do not automatically upload exports.
- Keep examples synthetic.
- Allow CSV for spreadsheet inspection and JSON for structured analysis.
- Stream/page from SQLite. Do not build a full large export in RAM.
- Preserve enough structure for external recovery/salvage tools to reason about records without making HealthProbe itself a restore tool.
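The CSV and streaming rules combine naturally: quote each field, fetch one page at a time with keyset pagination, and hand each line to a writer. In this sketch, `fetchPage` stands in for a keyset-paginated query (`WHERE id > ? ORDER BY id LIMIT ?`) and `write` for appending to a file. Both are assumptions about how the export layer is wired, and the function names are hypothetical.

```swift
import Foundation

/// RFC 4180-style field: quote when it contains a comma, quote, or line break,
/// and double any embedded quotes.
func csvField(_ s: String) -> String {
    let needsQuoting = s.contains { $0 == "," || $0 == "\"" || $0.isNewline }
    guard needsQuoting else { return s }
    return "\"" + s.replacingOccurrences(of: "\"", with: "\"\"") + "\""
}

struct ExportRow {
    let rowID: Int
    let fields: [String]
}

/// Streams an export page by page and returns the number of data rows written.
/// Only one page is held in memory at a time.
func streamCSV(header: [String],
               pageSize: Int,
               fetchPage: (_ afterRowID: Int, _ limit: Int) throws -> [ExportRow],
               write: (String) throws -> Void) throws -> Int {
    try write(header.map(csvField).joined(separator: ",") + "\r\n")
    var lastRowID = Int.min
    var total = 0
    while true {
        let page = try fetchPage(lastRowID, pageSize)
        for row in page {
            try write(row.fields.map(csvField).joined(separator: ",") + "\r\n")
        }
        total += page.count
        // A short page means the result set is exhausted.
        guard page.count == pageSize, let last = page.last else { break }
        lastRowID = last.rowID
    }
    return total
}
```

For manifest hashes, the same lines could be fed incrementally to CryptoKit's `SHA256` (`update(data:)`, then `finalize()`). The hash is then computed without holding the export in memory.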
Context logs help interpret changes but must not claim causality.
Log:
- capture start/end/failure
- HealthKit permission changes
- selected type registry changes
- app version and iOS version
- coarse iCloud sign-in state if available
- archive reset/schema-version changes and integrity-check results
Do not log raw health values or personal identifiers.
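One way to make this rule hard to break is to log only through a closed set of event cases. Their payloads are limited to ids, counts, codes, and type identifiers, so no case can carry a sample value or metadata. The sketch below uses hypothetical case names. On iOS, the rendered `message` would then be passed to the app's logger.

```swift
/// Closed set of context events. Payloads are ids, counts, codes, and type
/// identifiers only; no case can hold a sample value, unit value, or metadata.
enum ContextLogEvent {
    case captureStarted(observationID: Int)
    case captureFinished(observationID: Int, status: String, typeRunCount: Int)
    case captureFailed(observationID: Int, typeIdentifier: String, errorCode: Int)
    case permissionChanged(typeIdentifier: String)
    case typeRegistryChanged(added: Int, removed: Int)
    case archiveReset(fromSchema: Int?, toSchema: Int)
    case integrityCheck(passed: Bool)

    var message: String {
        switch self {
        case .captureStarted(let id):
            return "capture started obs=\(id)"
        case let .captureFinished(id, status, runs):
            return "capture finished obs=\(id) status=\(status) typeRuns=\(runs)"
        case let .captureFailed(id, typeID, code):
            return "capture failed obs=\(id) type=\(typeID) code=\(code)"
        case .permissionChanged(let typeID):
            return "permission changed type=\(typeID)"
        case let .typeRegistryChanged(added, removed):
            return "type registry changed added=\(added) removed=\(removed)"
        case let .archiveReset(from, to):
            let found = from.map { String($0) } ?? "none"
            return "archive reset schema \(found) -> \(to)"
        case .integrityCheck(let passed):
            let result = passed ? "ok" : "failed"
            return "integrity_check \(result)"
        }
    }
}
```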
Archive health checks:
- open database;
- verify schema version;
- run PRAGMA integrity_check;
- verify required tables/indexes;
- spot-check aggregate rebuilds;
- verify manifest hashes for completed exports.
If integrity fails:
- stop write operations;
- show archive health as degraded;
- allow export only if the specific query path can be verified safe;
- offer developer/test reset for current prototype builds;
- do not silently delete a real archive in future production builds.
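The sketch below turns the check results into a health state and the resulting write/export policy. The result fields are hypothetical and would be filled in by the checks listed above. The integrity test relies on SQLite's documented behavior: `PRAGMA integrity_check` returns a single row containing `ok` when it finds no problems.

```swift
enum ArchiveHealth: Equatable {
    case healthy
    case degraded(reasons: [String])
}

/// Hypothetical results of the checks listed above.
struct HealthCheckResults {
    let schemaVersionMatches: Bool
    let integrityCheckRows: [String]        // rows returned by PRAGMA integrity_check
    let missingTablesOrIndexes: [String]
    let aggregateSpotCheckPassed: Bool
}

func evaluateHealth(_ r: HealthCheckResults) -> ArchiveHealth {
    var reasons: [String] = []
    if !r.schemaVersionMatches { reasons.append("schema_version") }
    // A clean database yields exactly one row whose text is "ok".
    if r.integrityCheckRows != ["ok"] { reasons.append("integrity_check") }
    if !r.missingTablesOrIndexes.isEmpty { reasons.append("missing_schema_objects") }
    if !r.aggregateSpotCheckPassed { reasons.append("aggregate_spot_check") }
    return reasons.isEmpty ? .healthy : .degraded(reasons: reasons)
}

/// Degraded archives stop writes; exports go only through verified query paths.
struct ArchivePolicy {
    let health: ArchiveHealth
    var allowsWrites: Bool { health == .healthy }
    func allowsExport(queryPathVerified: Bool) -> Bool {
        health == .healthy || queryPathVerified
    }
}
```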
Core Data cache corruption is lower severity: delete and rebuild cache from SQLite.
Primary surfaces:
- observation timeline
- point-in-time observation detail
- per-type record table
- diff detail between observations
- export preview and export history
- archive health/status
Legacy devices may disable or simplify heavy visualizations. They must still support capture, cached summaries, report generation, and export.
Avoid alarm-first wording. Prefer:
- "84 records no longer visible in current observation"
- "Daily aggregate changed by 0.1%"
- "Consolidation likely"
- "Cause not inferred"

Avoid:
- "Apple lost your data"
- "Critical loss" based only on count
- "iCloud broke sync"
Unit tests:
- point-in-time reconstruction
- appeared/disappeared diff sets
- consolidation heuristic with stable aggregates
- changed aggregate with uncertain label
- empty observations
- permission/type-registry changes
- clock skew/context timestamp handling
- Core Data cache deletion and rebuild from SQLite
- SQL diff queries on large synthetic datasets without high RAM use
Integration tests:
- archive persistence and recovery
- archive reset/reinitialization for current test installs
- future archive schema migrations once real archive versions exist
- Core Data cache rebuild from archive
- export generation with manifest hashes
- high-frequency capture memory/performance
- deletion evidence via HKDeletedObject
- opt-in large synthetic full-import benchmark for SQLite archive write/finalize cost
Synthetic fixtures only. No real health values or identifiable metadata.
Large-import benchmark policy:
- keep one opt-in XCTest benchmark for a synthetic full import into SQLite;
- measure at least XCTClockMetric and XCTMemoryMetric;
- enable it only with explicit environment variables or launch arguments so
normal test runs stay fast;
- use it to compare archive write/finalize regressions between commits, not to
prove end-to-end HealthKit device performance by itself;
- combine it with real-device diagnostic reports before declaring background
import safe on large live datasets.
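The benchmark needs two pieces: an opt-in gate, and a deterministic synthetic fixture generator so runs on different commits import identical data without any real health values. The sketch below provides both. The environment variable name and the `SyntheticSample` shape are hypothetical. The generator is a standard 64-bit linear congruential generator (Knuth's MMIX constants), chosen only for reproducibility.

```swift
import Foundation

/// Hypothetical opt-in switch name.
let benchmarkKey = "HEALTHPROBE_RUN_LARGE_IMPORT_BENCHMARK"

/// True only when explicitly requested via environment variable or launch argument.
func largeImportBenchmarkEnabled(environment: [String: String] = ProcessInfo.processInfo.environment,
                                 arguments: [String] = ProcessInfo.processInfo.arguments) -> Bool {
    environment[benchmarkKey] == "1" || arguments.contains("-" + benchmarkKey)
}

/// Synthetic record: no real health values, units, or metadata.
struct SyntheticSample {
    let id: Int
    let typeIndex: Int
    let startSeconds: Double
    let value: Double
}

/// Lazily yields `count` samples from a seed, so fixtures can be paged
/// into the archive without building one huge array.
struct SyntheticSampleGenerator: Sequence, IteratorProtocol {
    private let count: Int
    private let typeCount: Int
    private var state: UInt64
    private var index = 0

    init(count: Int, typeCount: Int, seed: UInt64) {
        precondition(typeCount > 0)
        self.count = count
        self.typeCount = typeCount
        self.state = seed
    }

    mutating func next() -> SyntheticSample? {
        guard index < count else { return nil }
        // 64-bit LCG step (Knuth MMIX constants); wrapping arithmetic is intended.
        state = state &* 6364136223846793005 &+ 1442695040888963407
        defer { index += 1 }
        return SyntheticSample(id: index,
                               typeIndex: Int(state >> 33) % typeCount,
                               startSeconds: Double(index) * 60,
                               value: Double(state >> 11) / 9_007_199_254_740_992.0 * 100)
    }
}
```

Inside an XCTest case, the gate could be applied with `try XCTSkipUnless(largeImportBenchmarkEnabled())`. The import could then be measured with `measure(metrics: [XCTClockMetric(), XCTMemoryMetric()])`, iterating the generator in pages rather than collecting it into one array.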
| Operation | Target | Notes |
|---|---|---|
| Anchored capture | Background | Stream pages; avoid building huge arrays |
| Archive write | Background | Commit before Core Data cache update |
| UI cache update | Short main-thread work | Use precomputed summaries |
| Diff preview | SQL-first, bounded | Use temp tables/indexes; cap record previews and page full tables |
| Export | User-initiated | Stream/page from SQLite; support filters for large high-frequency types |
HealthProbe Implementation Guide v1.6 - 2026-05-23