
HealthProbe - Technical Implementation Guide

Version: 1.6
Last Updated: 2026-05-23
Purpose: Implementation guide for the iOS local Health DB Time Machine

For database schema, archive invariants, SQL analysis patterns, Core Data cache boundaries, reset policy, and future migration rules, read Database-Design.md first. This guide describes implementation workflow and cross-module behavior.

Privacy Directives - Mandatory

The following rules apply to all code, logs, examples, tests, and documentation:

  • No credentials, API keys, tokens, passwords, or signing certificates
  • No personal data: names, emails, phone numbers, dates of birth
  • No account identifiers: Apple IDs, iCloud account info, CloudKit record IDs
  • No raw real health values in the repository, tests, fixtures, logs, examples, or documentation
  • No location data or patterns that could identify a user
  • Device/source identifiers must be redacted, hashed, or stored only as local provenance according to the privacy policy

The app may store a user's HealthKit samples locally on-device when the user grants HealthKit access. Those samples must never be committed to source control or written to diagnostic logs.

1. Product Objective

HealthProbe is a single-device local archive and time-machine app for HealthKit-accessible data.

The implementation must prioritize:

  • point-in-time reconstruction of local HealthKit observations
  • neutral change explanation between observations
  • preservation of selected details before HealthKit aggregation/consolidation makes them unavailable
  • scoped user exports
  • no HealthProbe CloudKit/iCloud sync
  • no cross-device record-by-record comparison
  • iOS 15-era legacy device support; SwiftData is not a target dependency

Record-count drops are not inherently critical. They are evidence to explain with record-level and aggregate context.

2. Test Installation Reset Lifecycle

HealthProbe has no real deployments at this stage. Existing SwiftData stores and prototype SQLite archives are disposable.

Archive v2 startup behavior:

  1. Open the archive path.
  2. Read archive_metadata.schema_version when present.
  3. If no archive exists, create archive v2.
  4. If a prototype/unknown schema exists, close the database, move it to a timestamped *.prototype-backup file for developer inspection, and create a fresh archive v2.
  5. Rebuild/delete Core Data cache rows after archive reset.
  6. Log reset reason without raw health values.
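The startup steps above reduce to a small decision function. The following is a minimal sketch, not the project's implementation: `ArchiveState`, `StartupAction`, `startupAction(for:)`, and the mapping of "archive v2" to `schema_version` 2 are all assumptions made for illustration.

```swift
// Hypothetical types for illustration; the real archive layer may differ.
enum ArchiveState: Equatable {
    case missing                       // no file at the archive path
    case present(schemaVersion: Int?)  // nil when archive_metadata is absent or unreadable
}

enum StartupAction: Equatable {
    case createFresh
    case openExisting
    case backupThenCreateFresh(reason: String)
}

// Assumption: "archive v2" corresponds to schema_version 2.
let targetSchemaVersion = 2

func startupAction(for state: ArchiveState) -> StartupAction {
    switch state {
    case .missing:
        return .createFresh
    case .present(let version):
        if version == targetSchemaVersion {
            return .openExisting
        }
        // Prototype or unknown schema: keep a backup for developer inspection.
        // The reason string carries schema facts only, never health values.
        let found = version.map { String($0) } ?? "unknown"
        return .backupThenCreateFresh(
            reason: "schema_version=\(found), expected=\(targetSchemaVersion)")
    }
}
```

Keeping the decision pure makes it unit-testable without touching the file system; the caller would then perform the file move, the archive creation, and the Core Data cache rebuild.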

Do not implement one-way migration from the old prototype schema unless a later dated product decision reverses this policy.

3. HealthKit Capture

Use:

  • HKAnchoredObjectQuery for incremental capture
  • HKObserverQuery as a wake-up hint
  • manual capture from the app UI

Capture flow:

  1. Resolve the current local device chain ID.
  2. Start an observation record.
  3. For each selected sample type, run anchored queries.
  4. Write HealthKit samples and deleted-object evidence to the local archive first.
  5. Update materialized aggregate tables in SQLite.
  6. Save/rebuild derived Core Data cache rows only after archive writes succeed.
  7. Compute summary/diff caches for UI and reports.

Anchors belong to the local device timeline and selected type registry. They are implementation state, not forensic truth.

3.1 Capture State Machine

Observation statuses:

  • started;
  • partial;
  • completed;
  • failed;
  • cancelled.

Type-run statuses:

  • started;
  • completed;
  • failed;
  • unauthorized;
  • timed_out.

Rules:

  • one failed type does not invalidate successfully committed type runs;
  • incomplete observations are visible as partial evidence, not as proof of disappearance;
  • anchors are saved only after the corresponding SQLite transaction commits;
  • UI change labels must include uncertainty when either side of a diff has partial/failed type evidence.
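The statuses and rules above can be sketched as two enums plus a derivation function. The status names come from this guide; the rule that maps type-run results to an observation status is an assumption (the guide does not define it), as are the function names.

```swift
enum TypeRunStatus: String {
    case started, completed, failed, unauthorized
    case timedOut = "timed_out"
}

enum ObservationStatus: String {
    case started, partial, completed, failed, cancelled
}

// Assumed derivation rule, not stated in this guide:
// all runs completed -> completed, some completed -> partial, none completed -> failed.
func observationStatus(for runs: [TypeRunStatus],
                       wasCancelled: Bool = false) -> ObservationStatus {
    if wasCancelled { return .cancelled }
    if runs.isEmpty || runs.contains(.started) { return .started }
    let completedCount = runs.filter { $0 == .completed }.count
    if completedCount == runs.count { return .completed }
    // One failed type does not invalidate the committed type runs.
    return completedCount > 0 ? .partial : .failed
}

// A diff needs an uncertainty marker when either side has partial/failed evidence.
func diffNeedsUncertaintyLabel(previous: ObservationStatus,
                               current: ObservationStatus) -> Bool {
    let incomplete: Set<ObservationStatus> = [.partial, .failed]
    return incomplete.contains(previous) || incomplete.contains(current)
}
```

For example, `observationStatus(for: [.completed, .failed])` yields `.partial`, which in turn makes `diffNeedsUncertaintyLabel` return `true` for any diff involving that observation.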

3.2 Anchor Recovery

If an anchor is missing, corrupt, or rejected by HealthKit:

  • mark the type run with anchor failure context;
  • run a full scan for the affected type when permissions allow;
  • rebuild current visibility for that type from observed samples and deleted-object evidence;
  • continue storing future anchors after the full scan succeeds.
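The recovery rules above amount to choosing a scan plan per type. A minimal sketch follows, assuming hypothetical names (`AnchorState`, `ScanPlan`, `scanPlan`) and hypothetical context strings; how authorization is determined is left to the caller.

```swift
// Hypothetical names for illustration only.
enum AnchorState {
    case valid, missing, corrupt, rejectedByHealthKit
}

enum ScanPlan: Equatable {
    case incremental
    case fullScan(anchorFailureContext: String)
    case skipped(reason: String)
}

func scanPlan(anchor: AnchorState, isAuthorized: Bool) -> ScanPlan {
    // A full scan is attempted only when permissions allow.
    guard isAuthorized else { return .skipped(reason: "unauthorized") }
    switch anchor {
    case .valid:
        return .incremental
    case .missing:
        return .fullScan(anchorFailureContext: "anchor_missing")
    case .corrupt:
        return .fullScan(anchorFailureContext: "anchor_corrupt")
    case .rejectedByHealthKit:
        return .fullScan(anchorFailureContext: "anchor_rejected")
    }
}
```

With `HKAnchoredObjectQuery`, a full scan corresponds to passing a `nil` anchor. The anchor returned by that query would be stored only after the SQLite transaction for the scan commits.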

4. Storage Layers

4.1 Local Archive Store

The archive store is the source of truth. It should be a robust local SQLite database designed for both storage and analysis.

The canonical database design is Database-Design.md. The summary below is intentionally high-level; do not treat it as a competing schema source.

The archive should support:

  • one schema for all selected sample types
  • differential observation storage; do not store complete recurring snapshots
  • HealthKit UUID hash and internal fingerprints
  • sample payload versions deduplicated across observations
  • type identifier, start/end date, value, unit, and category/workout fields
  • source/source revision metadata
  • HealthKit metadata dictionaries
  • device provenance exposed by HealthKit, redacted or hashed as required
  • first-seen, last-seen, last-verified, and disappeared-at observations
  • visibility ranges/events for point-in-time reconstruction
  • observation history sufficient for point-in-time reconstruction
  • relationship records where HealthKit exposes links between workouts, samples, events, or related records
  • materialized aggregate tables for expensive counts used by reports/UI
  • schema versioning, current test-store reset policy, and future migrations
  • integrity hashes/manifests for exports
  • indexes, temporary tables, joins, CTEs, and paged result sets for large diffs
  • recovery-compatible exports for external tooling, preserving record identity, payload versions, provenance metadata where available, relationships, observation history, and manifest hashes

The archive must be able to answer:

  • records visible at observation T
  • records that appeared/disappeared between T1 and T2
  • records whose representation changed while semantic/aggregate meaning may be preserved
  • selected records for streaming export
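To make the first two questions concrete, here is an in-memory model of visibility ranges. It illustrates the semantics only; production queries should run in SQLite. The sketch assumes that observation IDs increase over time, that ranges are inclusive at both ends, and that a missing last observation means "still visible". None of these assumptions is confirmed by Database-Design.md, and all data is synthetic.

```swift
// In-memory stand-in for one visibility-range row (assumed semantics, see above).
struct VisibilityRange {
    let sampleID: Int
    let versionID: Int?
    let firstObservationID: Int
    let lastObservationID: Int?   // nil: still visible at the latest observation
}

// Sample IDs visible at a given observation.
func visibleSampleIDs(at observationID: Int, in ranges: [VisibilityRange]) -> Set<Int> {
    var result = Set<Int>()
    for range in ranges where range.firstObservationID <= observationID {
        if let last = range.lastObservationID, last < observationID { continue }
        result.insert(range.sampleID)
    }
    return result
}

// Synthetic rows only.
let ranges = [
    VisibilityRange(sampleID: 1, versionID: 10, firstObservationID: 1, lastObservationID: nil),
    VisibilityRange(sampleID: 2, versionID: 20, firstObservationID: 1, lastObservationID: 2),
    VisibilityRange(sampleID: 3, versionID: 30, firstObservationID: 3, lastObservationID: nil),
]

let atT1 = visibleSampleIDs(at: 2, in: ranges)   // samples 1 and 2
let atT2 = visibleSampleIDs(at: 3, in: ranges)   // samples 1 and 3
let appeared = atT2.subtracting(atT1)            // sample 3
let disappeared = atT1.subtracting(atT2)         // sample 2
```

The same predicate translates directly into a SQL `WHERE` clause over the range columns, which is where it belongs for large archives.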

Minimum target schema shape is defined in Database-Design.md. The archive must at least preserve these concepts:

  • observations;
  • sample identities;
  • sample payload versions;
  • observation events;
  • visibility ranges;
  • sources, source revisions, devices, metadata, and relationships;
  • materialized aggregates;
  • export manifests.

Historical sketch retained for orientation:

-- One row per local capture attempt/result.
CREATE TABLE observations (
    id INTEGER PRIMARY KEY,
    observed_at REAL NOT NULL,
    status TEXT NOT NULL,
    app_version TEXT,
    os_version TEXT,
    device_chain_id TEXT NOT NULL,
    schema_version INTEGER NOT NULL
);

-- Stable identity for a HealthKit-accessible record or semantic record.
CREATE TABLE samples (
    id INTEGER PRIMARY KEY,
    type_identifier TEXT NOT NULL,
    sample_uuid_hash TEXT,
    strict_fingerprint TEXT NOT NULL,
    semantic_fingerprint TEXT,
    first_seen_observation_id INTEGER NOT NULL
);

-- Deduplicated payload representation. New row only when representation changes.
CREATE TABLE sample_versions (
    id INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL,
    payload_hash TEXT NOT NULL,
    start_date REAL NOT NULL,
    end_date REAL NOT NULL,
    value REAL,
    unit TEXT,
    source_id INTEGER,
    metadata_hash TEXT
);

-- Visibility/event history, not a full snapshot copy.
CREATE TABLE sample_observation_events (
    id INTEGER PRIMARY KEY,
    observation_id INTEGER NOT NULL,
    sample_id INTEGER NOT NULL,
    version_id INTEGER,
    event_kind TEXT NOT NULL
);

-- Optional compressed visibility ranges for point-in-time reconstruction.
CREATE TABLE sample_visibility_ranges (
    sample_id INTEGER NOT NULL,
    version_id INTEGER,
    first_observation_id INTEGER NOT NULL,
    last_observation_id INTEGER,
    PRIMARY KEY (sample_id, version_id, first_observation_id)
);

-- Materialized aggregates feeding reports and Core Data cache.
CREATE TABLE daily_type_aggregates (
    observation_id INTEGER NOT NULL,
    type_identifier TEXT NOT NULL,
    bucket_start REAL NOT NULL,
    record_count INTEGER NOT NULL,
    value_sum REAL,
    value_max REAL,
    PRIMARY KEY (observation_id, type_identifier, bucket_start)
);

Exact naming can evolve, but the constraints must hold: payloads are deduplicated, observations are differential, and aggregates are materialized.

4.2 Core Data UI/Report Cache

Detailed entity contracts live in Core-Data-Cache-Design.md.

Core Data is the target derived/cache layer because it supports older iOS versions than SwiftData and is suitable for UI/report state. It may store:

  • selected data types and app settings
  • observation list and capture status
  • precomputed summaries, temporal bins, and diff previews
  • operation logs and export indexes
  • change labels and links into the archive
  • expensive count results used by reports and presentation

Core Data must not be the only forensic copy. If Core Data and the archive disagree, the SQLite archive wins. The cache must be safe to delete and rebuild from SQLite.

Current SwiftData models are legacy/prototype implementation details. New storage work should target Core Data for cache and SQLite for archive/analysis.

5. Change Explanation

Change logic should be evidence-first and consolidation-aware.

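Basic diff should execute in SQLite, not by loading full datasets into Swift arrays. The set semantics, shown in Swift for illustration only:

```swift
// Synthetic fingerprints; production diffs run in SQLite, not in memory.
let previousFingerprints: Set<String> = ["fp-a", "fp-b"]
let currentFingerprints: Set<String> = ["fp-b", "fp-c"]
let appeared = currentFingerprints.subtracting(previousFingerprints)
let disappeared = previousFingerprints.subtracting(currentFingerprints)
let retained = currentFingerprints.intersection(previousFingerprints)
```

Conceptual SQL shape:

```sql
-- Samples visible at the previous observation.
CREATE TEMP TABLE prev_visible AS
SELECT sample_id, version_id
FROM visible_samples
WHERE observation_id = :previous;

-- Samples visible at the current observation.
CREATE TEMP TABLE curr_visible AS
SELECT sample_id, version_id
FROM visible_samples
WHERE observation_id = :current;

-- Disappeared: visible previously, absent now.
SELECT p.sample_id
FROM prev_visible p
LEFT JOIN curr_visible c ON c.sample_id = p.sample_id
WHERE c.sample_id IS NULL;
```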

Semantic grouping should compare:

  • type identifier
  • start/end coverage
  • value sum and value max where meaningful
  • source/source revision
  • metadata keys relevant to HealthKit interpretation
  • interval length and sample density

Suggested labels:

  • appeared
  • disappeared
  • representationChanged
  • consolidationLikely
  • aggregateChanged
  • uncertain

Severity should be reserved for user-facing workflow urgency, not treated as proof of corruption. In particular, a high disappeared count with stable aggregate totals should usually be shown as consolidationLikely or representationChanged, not as critical loss.
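One way to encode this guidance is a labeling function that checks aggregate stability before reporting disappearance. This is a sketch under stated assumptions: `DiffSummary`, `changeLabel(for:)`, the precedence of the rules, and the 0.5% tolerance are hypothetical and would need tuning per type. The label names are the ones suggested above.

```swift
enum ChangeLabel: String {
    case appeared, disappeared, representationChanged
    case consolidationLikely, aggregateChanged, uncertain
}

// Hypothetical summary of one diff for one type; values are synthetic in tests.
struct DiffSummary {
    let appearedCount: Int
    let disappearedCount: Int
    let previousValueSum: Double
    let currentValueSum: Double
    let hasPartialEvidence: Bool
}

// Hypothetical tolerance: relative aggregate drift still treated as "stable".
let aggregateTolerance = 0.005

// Returns nil when nothing changed. Rule precedence is an assumption.
func changeLabel(for diff: DiffSummary) -> ChangeLabel? {
    if diff.hasPartialEvidence { return .uncertain }
    let baseline = max(abs(diff.previousValueSum), Double.leastNonzeroMagnitude)
    let drift = abs(diff.currentValueSum - diff.previousValueSum) / baseline
    let aggregateStable = drift <= aggregateTolerance
    if diff.disappearedCount > 0 {
        // Many records gone but totals stable: likely consolidation, not loss.
        return aggregateStable ? .consolidationLikely : .disappeared
    }
    if diff.appearedCount > 0 { return .appeared }
    return aggregateStable ? nil : .aggregateChanged
}
```

The sketch never produces `representationChanged`; detecting that label needs payload-version comparison, which is outside what a count-and-sum summary can express.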

6. Exports

Detailed export formats and manifest rules live in Export-Specification.md.

Exports are scoped to what the user is inspecting.

Supported MVP exports:

  • point-in-time record table
  • observation manifest with hashes
  • diff report between two observations
  • selected appeared/disappeared/changed record set

Export rules:

  • Include observation timestamps and app/build/schema versions.
  • Include hashes so exported evidence can be re-identified within HealthProbe.
  • Do not automatically upload exports.
  • Keep examples synthetic.
  • Allow CSV for spreadsheet inspection and JSON for structured analysis.
  • Stream/page from SQLite. Do not build a full large export in RAM.
  • Preserve enough structure for external recovery/salvage tools to reason about records without making HealthProbe itself a restore tool.
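The stream/page rule can be sketched with keyset pagination: fetch a bounded page, write it, and move the cursor. In this sketch `ExportRow`, `csvField`, `streamCSV`, and the column set are hypothetical; `fetchPage` stands in for a SQLite query of the form `WHERE sample_id > ? ORDER BY sample_id LIMIT ?`, and the rows are synthetic.

```swift
// Hypothetical export row; real exports follow Export-Specification.md.
struct ExportRow {
    let sampleID: Int
    let typeIdentifier: String
    let payloadHash: String
}

// RFC 4180-style quoting: quote when needed and double embedded quotes.
func csvField(_ value: String) -> String {
    let needsQuoting = value.contains(where: { $0 == "," || $0 == "\"" || $0.isNewline })
    guard needsQuoting else { return value }
    var quoted = "\""
    for character in value {
        if character == "\"" { quoted += "\"\"" } else { quoted.append(character) }
    }
    return quoted + "\""
}

// Writes one page at a time so memory use stays bounded by pageSize.
func streamCSV(pageSize: Int,
               fetchPage: (_ afterSampleID: Int, _ limit: Int) -> [ExportRow],
               write: (String) -> Void) {
    write("sample_id,type_identifier,payload_hash\n")
    var cursor = 0   // assumes sample IDs are positive
    while true {
        let page = fetchPage(cursor, pageSize)
        for row in page {
            write("\(row.sampleID),\(csvField(row.typeIdentifier)),\(csvField(row.payloadHash))\n")
        }
        // A short or empty page means the result set is exhausted.
        guard let last = page.last, page.count == pageSize else { break }
        cursor = last.sampleID
    }
}

// Usage with synthetic in-memory rows standing in for SQLite.
let allRows = (1...5).map {
    ExportRow(sampleID: $0, typeIdentifier: "SyntheticType", payloadHash: "hash\($0)")
}
var output = ""
var pagesFetched = 0
streamCSV(pageSize: 2,
          fetchPage: { after, limit in
              pagesFetched += 1
              return Array(allRows.filter { $0.sampleID > after }.prefix(limit))
          },
          write: { output += $0 })
```

In the app, `write` would append to a file handle rather than a string, so a large high-frequency type never has to fit in RAM.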

7. Context Logging

Context logs help interpret changes but must not claim causality.

Log:

  • capture start/end/failure
  • HealthKit permission changes
  • selected type registry changes
  • app version and iOS version
  • coarse iCloud sign-in state if available
  • archive reset/schema-version changes and integrity-check results

Do not log raw health values or personal identifiers.

8. Archive Health And Integrity Failure

Archive health checks:

  • open database;
  • verify schema version;
  • run PRAGMA integrity_check;
  • verify required tables/indexes;
  • spot-check aggregate rebuilds;
  • verify manifest hashes for completed exports.

If integrity fails:

  • stop write operations;
  • show archive health as degraded;
  • allow export only if the specific query path can be verified safe;
  • offer developer/test reset for current prototype builds;
  • do not silently delete a real archive in future production builds.
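The checks and the failure policy above can be combined into a single evaluation step. The sketch below assumes hypothetical names (`HealthCheckResults`, `ArchiveHealth`, `archiveHealth(from:)`, `writesAllowed`) and hypothetical reason strings. It relies on one documented SQLite behavior: `PRAGMA integrity_check` returns a single row containing `ok` when no problems are found.

```swift
// Hypothetical container for the outcome of each health check.
struct HealthCheckResults {
    let opened: Bool
    let schemaVersionMatches: Bool
    let integrityCheckRows: [String]        // rows returned by PRAGMA integrity_check
    let missingTablesOrIndexes: [String]
}

enum ArchiveHealth: Equatable {
    case healthy
    case degraded(reasons: [String])
}

func archiveHealth(from results: HealthCheckResults) -> ArchiveHealth {
    var reasons: [String] = []
    if !results.opened { reasons.append("open_failed") }
    if !results.schemaVersionMatches { reasons.append("schema_version_mismatch") }
    // A clean database yields exactly one row with the text "ok".
    if results.integrityCheckRows != ["ok"] { reasons.append("integrity_check_failed") }
    if !results.missingTablesOrIndexes.isEmpty { reasons.append("missing_schema_objects") }
    return reasons.isEmpty ? .healthy : .degraded(reasons: reasons)
}

// Writes stop as soon as the archive is not fully healthy.
func writesAllowed(_ health: ArchiveHealth) -> Bool {
    return health == .healthy
}
```

The reason strings describe the archive only, so they are safe to show in the UI and to write to context logs.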

Core Data cache corruption is lower severity: delete and rebuild cache from SQLite.

9. UI Implementation Guidance

Primary surfaces:

  • observation timeline
  • point-in-time observation detail
  • per-type record table
  • diff detail between observations
  • export preview and export history
  • archive health/status

Legacy devices may disable or simplify heavy visualizations. They must still support capture, cached summaries, report generation, and export.

Avoid alarm-first wording. Prefer:

  • "84 records no longer visible in current observation"
  • "Daily aggregate changed by 0.1%"
  • "Consolidation likely"
  • "Cause not inferred"

Avoid:

  • "Apple lost your data"
  • "Critical loss" based only on count
  • "iCloud broke sync"

10. Testing Strategy

Unit tests:

  • point-in-time reconstruction
  • appeared/disappeared diff sets
  • consolidation heuristic with stable aggregates
  • changed aggregate with uncertain label
  • empty observations
  • permission/type-registry changes
  • clock skew/context timestamp handling
  • Core Data cache deletion and rebuild from SQLite
  • SQL diff queries on large synthetic datasets without high RAM use

Integration tests:

  • archive persistence and recovery
  • archive reset/reinitialization for current test installs
  • future archive schema migrations once real archive versions exist
  • Core Data cache rebuild from archive
  • export generation with manifest hashes
  • high-frequency capture memory/performance
  • deletion evidence via HKDeletedObject

Synthetic fixtures only. No real health values or identifiable metadata.

11. Performance Considerations

| Operation | Target | Notes |
|---|---|---|
| Anchored capture | Background | Stream pages; avoid building huge arrays |
| Archive write | Background | Commit before Core Data cache update |
| UI cache update | Short main-thread work | Use precomputed summaries |
| Diff preview | SQL-first, bounded | Use temp tables/indexes; cap record previews and page full tables |
| Export | User-initiated | Stream/page from SQLite; support filters for large high-frequency types |

12. Deployment Checklist

  • [ ] HealthKit read permissions declared in Info.plist
  • [ ] Background Modes enabled if used
  • [ ] Core Data cache schema/rebuild tested
  • [ ] Archive reset/reinitialization and schema versioning tested
  • [ ] Archive integrity/manifests tested
  • [ ] Export files verified with synthetic data
  • [ ] Privacy policy matches local archive behavior
  • [ ] UI copy reviewed for neutral, consolidation-aware language
  • [ ] Legacy-device mode reviewed for simplified UI/report/export behavior
