Showing 27 changed files with 3937 additions and 2908 deletions
+9 -257
AGENTS.md
@@ -1,271 +1,14 @@
-# HealthProbe – Multi-Model Development Guide
+# HealthProbe Agent Bootstrap
 
-## Overview
+Canonical agent instructions live in:
 
-HealthProbe is built by multiple AI models, each owning a distinct domain.  
-This document defines boundaries, interfaces, and handoff contracts.
+- `HealthProbe/Doc/00-agent-guides/AGENTS.md`
 
-**Agentic reality:** The repo is developed largely via agents (Codex CLI, Claude, and dedicated model sessions). When updating product scope, update docs first, then implement behind flags, and add tests for the new behavior.
+Before working, read:
 
+1. `HealthProbe/Doc/README.md`
+2. the specific chapter linked there for your task
+3. `HealthProbe/Doc/00-agent-guides/AGENTS.md`
 
-## Model Allocation
-
-| Domain | Owner | Tools |
-|--------|-------|-------|
-| **UI / SwiftUI Views** | Claude Code | Xcode, SwiftUI, CLAUDE.md |
-| **Archive Store** | Dedicated model session | SQLite/local archive format, HealthKit metadata mapping |
-| **Data Models (SwiftData)** | Dedicated model session | Xcode, Swift; derived UI/cache/settings/log models only |
-| **HealthKit Integration** | Dedicated model session | Xcode, HealthKit docs |
-| **Anomaly Detection Algorithms** | Dedicated model session | Swift, statistical references |
-| **Context Monitoring** | Dedicated model session | Xcode; logs Health/iCloud state as context only |
-| **Documentation** | Claude Code + dedicated session | Markdown |
-| **Tests** | Dedicated model session | XCTest, Swift Testing |
-
-
-## Directory Ownership
-
-```
-HealthProbe/
-├── Views/           ← Claude Code (UI)
-├── ViewModels/      ← Claude Code (UI)
-├── Utilities/       ← Claude Code (shared helpers, mocks)
-├── Models/          ← Models agent (SwiftData UI/cache schemas)
-├── Services/        ← Services agent (HealthKit, archive store, anomaly, context)
-└── Tests/           ← Tests agent
-```
-
-**Rule:** Each agent writes only within its owned directories.  
-Cross-boundary changes require an explicit interface contract (protocol) first.
-
-**Documentation scope:** `HealthProbe/Doc/` is shared. Keep it consistent with shipped behavior, and add a dated entry when objectives change.
-
-
-## Interface Contracts
-
-All service boundaries are defined as Swift protocols.  
-Claude Code (UI) consumes protocols — never concrete implementations.
-
-### HealthMonitorProtocol
-
-```swift
-/// Owned by: Services agent
-/// Consumed by: UI (DashboardViewModel)
-protocol HealthMonitorProtocol {
-    var currentStatus: HealthStatus { get }
-    var lastChecked: Date? { get }
-    func runCheck() async throws
-}
-```
-
-### AnomalyStoreProtocol
-
-```swift
-/// Owned by: Services agent
-/// Consumed by: UI (AnomalyListViewModel)
-protocol AnomalyStoreProtocol {
-    var anomalies: [DetectedAnomaly] { get }
-    func markResolved(_ anomaly: DetectedAnomaly) async throws
-}
-```
-
-### AuditTrailProtocol
-
-```swift
-/// Owned by: Services agent
-/// Consumed by: UI (AuditTrailView)
-protocol AuditTrailProtocol {
-    var entries: [AuditTrailEntry] { get }
-    func export() async throws -> Data  // JSON
-}
-```
-
-### ContextMonitorProtocol
-
-```swift
-/// Owned by: Services agent
-/// Consumed by: UI (ContextViewModel)
-protocol ContextMonitorProtocol {
-    var iCloudEnabled: Bool { get }
-    var lastObservedAt: Date? { get }
-    var stateChanges: [ContextStateChange] { get }
-}
-```
-
-
-## Shared Types (Models Agent)
-
-These types are defined once in `Models/` and shared across all agents:
-
-```swift
-// Models/TypeDistributionBin.swift
-@Model
-final class TypeDistributionBin {
-    var bucketStart: Date
-    var bucketEnd: Date
-    var count: Int
-}
-
-// Models/TypeCount.swift
-// TypeCount owns zero or more TypeDistributionBin records.
-// These bins store sample counts and import anchors, not raw health values.
-
-// Interface updated 2026-05-12 — see AGENTS.md
-// Models/HealthRecord.swift
-// HealthRecord stores one anonymized HealthKit record fingerprint plus its start/end dates.
-// It intentionally does not store raw health values, device identifiers, or source metadata.
-// UI may compare HealthRecord fingerprints between adjacent snapshots to expose losses
-// that are masked by newly-added records with the same total count.
-// High-volume snapshots store these records in TypeCount.recordArchiveData instead of
-// creating one SwiftData model per record, avoiding main-thread stalls after import.
-
-// Interface updated 2026-05-13 — see AGENTS.md
-// TypeDistributionBin also stores content hashes and HealthKit query anchors.
-// Import uses a global anchored query per data type so follow-up snapshots fetch only
-// HealthKit deltas instead of scanning calendar blocks with fixed per-query latency.
-
-// Interface updated 2026-05-18 — see AGENTS.md
-// SwiftData is not the forensic source of truth. TypeCount and related rows store
-// precomputed UI/index data only. Complete HealthKit samples and metadata belong
-// in the local archive store, in one schema that can preserve relationships across
-// data types, sources, devices, workouts, and metadata.
-
-// Interface updated 2026-05-18 — see AGENTS.md
-// Services/Protocols/HealthArchiveStore.swift defines the local archive boundary.
-// SQLiteHealthArchiveStore is the current implementation. HealthKit anchored-query
-// pages must be written to this archive before SwiftData UI/cache rows are saved.
-// Deletions are recorded by sampleUUIDHash because HKDeletedObject exposes UUIDs,
-// not complete sample payloads.
-
-// Interface updated 2026-05-17 — see AGENTS.md
-// Models/TypeCount.detailCacheData stores precomputed detail data for the current
-// TypeCount compared with the immediately previous snapshot on the same device.
-// The cache contains aggregate added/disappeared counts, capped preview records for
-// UI drill-down, and daily change bins for temporal charts. It must be computed when
-// snapshots are saved and refreshed for neighboring snapshots when snapshot deletion
-// changes chain links. Existing stores are backfilled incrementally with a strict
-// per-launch TypeCount cap to avoid decoding many large archives in one run.
-
-// Interface updated 2026-05-17 — see AGENTS.md
-// Models/HealthSnapshot.contentEquivalentSnapshotID marks snapshots whose TypeCount
-// content is identical to a previous snapshot on the same device. These snapshots are
-// retained as temporal labels but behave as aliases to the representative content
-// snapshot for expensive detail cache/diff work.
-
-// Interface updated 2026-05-17 — see AGENTS.md
-// Models/TypeCount.contentEquivalentTypeCountID marks individual data types whose
-// content is identical to the previous snapshot's same TypeCount. This allows a
-// snapshot to contain real changes for some metrics while long-stable metrics behave
-// as temporal aliases and skip per-type detail cache/diff work.
-
-// Interface updated 2026-05-17 — see AGENTS.md
-// Models/HealthSnapshot stores cached overview scalars for UI consumption:
-// tracked type count, aggregate record count, and overall oldest/newest record dates.
-// These values must be computed during snapshot save while TypeCount data is already
-// in memory, so snapshot list/detail screens never recompute them by traversing
-// snapshot.typeCounts on the UI thread.
-
-// Interface updated 2026-05-17 — see AGENTS.md
-// Models/SnapshotDelta stores cached list/detail summary scalars derived from TypeDelta.
-// Overview screens consume these scalars and type-delta summaries directly instead of
-// recalculating per-snapshot changes from HealthSnapshot.typeCounts.
-
-// Models/DetectedAnomaly.swift
-enum AnomalyType: String, Codable {
-    case historicalInsertion = "historical_insertion"
-    case silentDeletion      = "silent_deletion"
-    case duplicate           = "duplicate"
-    case divergence          = "divergence"
-}
-
-enum Severity: String, Codable, Comparable {
-    case info, warning, critical
-}
-
-enum HealthStatus: String {
-    case healthy, warning, critical, unknown
-}
-```
-
-Any model changes must be announced in this file before other agents consume them.
-
-
-## Handoff Process
-
-When a module is ready to be consumed by another agent:
-
-1. **Define the protocol** in `Services/Protocols/` (services agent)
-2. **Implement a mock** in `Utilities/Mocks.swift` (Claude Code)
-3. **Build UI against the mock** (Claude Code)
-4. **Replace mock with real implementation** (services agent)
-5. **Integration test** (tests agent)
-
-This allows UI development and service development to proceed in parallel.
-
-
-## Algorithms & Detection Logic
-
-The following modules involve non-trivial logic and should be reviewed carefully:
-
-| Module | File | Description |
-|--------|------|-------------|
-| **Anomaly Detector** | `Services/AnomalyDetector.swift` | Statistical detection: insertions, deletions, duplicates, divergence |
-| **Divergence Engine** | `Services/DivergenceEngine.swift` | Time-series trend analysis, σ comparison |
-| **Fingerprinter** | `Services/SampleFingerprinter.swift` | Duplicate detection via sample hashing |
-| **Snapshot Comparator** | `Services/SnapshotComparator.swift` | Diff between two HealthKit snapshots |
-| **Distribution Comparator** | `Services/SnapshotDiffService.swift` | Daily per-type distribution diff to reveal old-data disappearance masked by new data |
-
-**Guidelines for algorithm modules:**
-- Document assumptions explicitly (e.g., "assumes continuous monitoring since install")
-- All thresholds (e.g., `age > 7 days`) must be configurable constants, not magic numbers
-- Include unit tests for edge cases (empty snapshots, partial data, clock skew)
-- No UI code; return plain Swift types only
-
-
-## Privacy Directives — All Agents
-
-**Mandatory across all modules:**
-- No credentials, API keys, tokens, or certificates in any file
-- No personal data: names, emails, phone numbers, dates of birth
-- No device identifiers: UDID, serial number, advertising ID, device name
-- No account identifiers: Apple ID, iCloud account info, CloudKit record IDs
-- No raw health values in code, tests, previews, logs, or comments
-- No location data or patterns enabling re-identification
-- Synthetic data only in tests and previews
-
-**Clarification:** “No raw health values” applies to this repository’s contents. The app may optionally store a user's raw HealthKit samples *locally on-device* for forensic backup purposes, but such samples must never appear in source control, logs, or docs.
-
-
-## Communication Between Agents
-
-When one agent needs to communicate a decision or change to another:
-
-1. **Update this file** (`AGENTS.md`) with the protocol/interface change
-2. **Update the relevant protocol** in `Services/Protocols/`
-3. **Add a comment** in the affected file: `// Interface updated YYYY-MM-DD — see AGENTS.md`
-
-
-## Current Status
-
-| Module | Status | Owner |
-|--------|--------|-------|
-| SwiftData Models | ✅ Done | Models agent |
-| HealthKit Integration | ✅ Done | Services agent |
-| Snapshot Diff Service | ✅ Done | Services agent |
-| Service Protocols | ⏳ Not started | Services agent |
-| Anomaly Detection | ⏳ Not started | Services agent |
-| Sync Monitor | ⏳ Not started | Services agent |
-| UI – App entry + TabView | ✅ Done | Claude Code |
-| UI – Dashboard | ✅ Done (functional, minimal) | Claude Code |
-| UI – Snapshots + Detail | ✅ Done | Claude Code |
-| UI – Data Types | ✅ Done | Claude Code |
-| UI – Settings | ✅ Done | Claude Code |
-| Unit Tests | ⏳ Not started | Tests agent |
+The root repository must not contain substantive project documentation. Keep
+canonical docs under `HealthProbe/Doc/` so agents do not discover stale copies.
+4 -227
CLAUDE.md
@@ -1,239 +1,7 @@
-# HealthProbe – Claude Code Instructions
+# HealthProbe Claude Bootstrap
 
-## Project Context
+Canonical Claude/UI instructions live in:
 
-**HealthProbe** is an iOS app that audits Apple HealthKit data integrity.  
-It detects anomalies: data loss, historical insertions, duplicates, divergence trends.  
-Full specification: `HealthProbe/Doc/HealthProbe – Complete Specification & Motivations.md`
+- `HealthProbe/Doc/00-agent-guides/CLAUDE.md`
 
-**Current state:** SwiftUI + SwiftData app is active. Product direction changed on 2026-05-18: HealthProbe is a local audit/capture agent. Do not add HealthProbe CloudKit/iCloud sync.
-
-
-## Claude Code Scope: UI Layer
-
-Claude Code is responsible for:
-- All **SwiftUI Views** (`Views/` directory)
-- All **ViewModels** (`ViewModels/` directory)
-- **Navigation structure** and tab/split layout
-- **Design system** (colors, typography, spacing)
-- **Preview providers** for all views
-- **Accessibility** (VoiceOver, Dynamic Type)
-
-Claude Code does NOT own:
-- `Services/` — HealthKit queries, anomaly detection, archive store, context monitoring (see AGENTS.md)
-- `Models/` — SwiftData models (see AGENTS.md)
-- Entitlements, Info.plist, project configuration
-
-When services are not yet implemented, **consume their protocols and use mock implementations** for UI development.
-
-
-## Target Screen Structure
-
-```
-App (TabView)
-├── Tab 1: Dashboard          → DashboardView
-├── Tab 2: Anomalies          → AnomalyListView → AnomalyDetailView
-├── Tab 3: Audit Trail        → AuditTrailView
-├── Tab 4: Archive Status     → ArchiveStatusView
-└── Tab 5: Settings           → SettingsView
-```
-
-### DashboardView
-- Large status indicator: ✅ Healthy / ⚠️ Check / 🚨 Critical
-- Last check timestamp
-- Summary cards: samples tracked, anomalies found (all-time)
-- Up to 3 recent active alerts (tappable → AnomalyDetailView)
-- "Check Now" button (calls monitoring service)
-
-### AnomalyListView
-- List of `DetectedAnomaly` sorted by date (most recent first)
-- Filter: All / Critical / Warning / Info
-- Filter: by type (deletion, insertion, duplicate, divergence)
-- Each row: severity badge, type, sample type, date
-- Swipe to mark resolved
-
-### AnomalyDetailView
-- Full anomaly details
-- Evidence dictionary displayed as key-value rows
-- Severity badge
-- Share button → exports as Markdown (for bug reports)
-- "Mark Resolved" action
-
-### AuditTrailView
-- Chronological list of `AuditTrailEntry`
-- Each row: timestamp, event type chip, message
-- Search/filter by event type
-- Export button → JSON
-
-### ArchiveStatusView
-- Current local archive health
-- Last archive verification timestamp
-- Selected data types covered by forensic capture
-- Recent Health/iCloud context events (for correlation only; no HealthProbe sync)
-
-### SettingsView
-- Check frequency: 2h / 6h / 12h / 24h (Picker)
-- Sample types to monitor (MultiSelect toggle list)
-- Alert thresholds (severity level for push notifications)
-- Point export/report actions for selected findings
-- Delete all audit data (destructive, confirm alert)
-
-
-## Design Guidelines
-
-**Tone:** Professional, calm, medical-adjacent. Not alarming unless critical.
-
-**Color System:**
-```swift
-// Status colors
-.healthyGreen   // SF: green  — all clear
-.warningAmber   // SF: yellow — attention needed  
-.criticalRed    // SF: red    — action required
-.neutralGray    // SF: gray   — informational / resolved
-```
-
-**Typography:** SF Pro (system font). No custom fonts.
-
-**Spacing:** 8pt grid. Use `VStack(spacing: 12)` as baseline.
-
-**Icons:** SF Symbols only. No third-party icon sets.
-
-**Key SF Symbols:**
-- `checkmark.shield.fill` — healthy status
-- `exclamationmark.triangle.fill` — warning
-- `xmark.shield.fill` — critical
-- `clock.arrow.circlepath` — audit trail
-- `externaldrive.fill.badge.checkmark` — archive status
-- `waveform.path.ecg` — health data
-- `doc.text.magnifyingglass` — anomaly detail
-
-**Dark mode:** Required. Test in both modes.
-
-**Privacy-first UI:**
-- Health metric values are **never shown in plain text** in list rows
-- Values visible only in `AnomalyDetailView` after tap
-- Evidence dictionary values shown as monospace text, not highlighted
-
-
-## SwiftData Integration
-
-Models are defined in `Models/`. Reference them read-only from views:
-
-```swift
-// In views, use @Query — never write directly from a View
-@Query(sort: \DetectedAnomaly.detectedAt, order: .reverse)
-private var anomalies: [DetectedAnomaly]
-
-// Mutations go through ViewModels or services only
-```
-
-Until `Models/` are implemented, use mock data via `PreviewProvider`.
-
-
-## ViewModel Pattern
-
-```swift
-// Pattern for all ViewModels
-@MainActor
-@Observable
-final class DashboardViewModel {
-    private let monitor: HealthMonitorProtocol  // protocol, not concrete type
-    
-    var status: HealthStatus = .unknown
-    var recentAnomalies: [DetectedAnomaly] = []
-    var lastChecked: Date?
-    
-    init(monitor: HealthMonitorProtocol = HealthMonitorService.shared) {
-        self.monitor = monitor
-    }
-    
-    func refresh() async {
-        await monitor.runCheck()
-    }
-}
-```
-
-Always inject dependencies via protocols — makes previews and tests possible without real HealthKit.
-
-
-## Mock Data Protocol
-
-Until services are ready, define preview mocks in `Utilities/Mocks.swift`:
-
-```swift
-struct MockHealthMonitor: HealthMonitorProtocol {
-    func runCheck() async { }
-    var status: HealthStatus { .warning }
-}
-
-extension DetectedAnomaly {
-    static var preview: DetectedAnomaly {
-        DetectedAnomaly(
-            detectedAt: .now,
-            type: "silent_deletion",
-            severity: "warning",
-            sampleType: "Steps",
-            summary: "72 samples missing without deletion event",
-            evidence: ["loss_count": "72", "loss_percent": "23.4"]
-        )
-    }
-}
-```
-
-
-## File Organization
-
-```
-HealthProbe/
-├── Views/
-│   ├── Dashboard/
-│   │   ├── DashboardView.swift
-│   │   └── StatusCardView.swift
-│   ├── Anomalies/
-│   │   ├── AnomalyListView.swift
-│   │   └── AnomalyDetailView.swift
-│   ├── AuditTrail/
-│   │   └── AuditTrailView.swift
-│   ├── Archive/
-│   │   └── ArchiveStatusView.swift
-│   └── Settings/
-│       └── SettingsView.swift
-├── ViewModels/
-│   ├── DashboardViewModel.swift
-│   ├── AnomalyListViewModel.swift
-│   └── ArchiveStatusViewModel.swift
-├── Models/           ← NOT owned by Claude Code
-├── Services/         ← NOT owned by Claude Code
-└── Utilities/
-    ├── Mocks.swift
-    ├── DateFormatters.swift
-    └── DesignSystem.swift
-```
-
-
-## Privacy Directives
-
-**Mandatory — no exceptions:**
-- No credentials, tokens, or API keys in any file
-- No personal data, device identifiers, or account identifiers
-- No real health values in code, comments, previews, or tests
-- Synthetic preview data only (see Mocks.swift above)
-
-
-## Before Marking a Task Complete
-
-- [ ] View renders in both Light and Dark mode (use Preview)
-- [ ] VoiceOver labels set on interactive elements
-- [ ] Dynamic Type tested (at least xSmall and AX3)
-- [ ] Works with mock data (no real HealthKit dependency in View layer)
-- [ ] No health values displayed without explicit user tap
-- [ ] Compiles without warnings
+Read `HealthProbe/Doc/README.md` first, then the UI agent guide above.
+0 -45
CONTRIBUTING.md
@@ -1,46 +0,0 @@
-# Contributing to HealthProbe
-
-## ⚠️ Privacy Rules — Non-Negotiable
-
-Before submitting any code, issue, PR, or documentation:
-
-**Never include:**
-- Credentials, API keys, tokens, or certificates
-- Personal data: names, emails, phone numbers, dates of birth
-- Device identifiers: UDID, serial number, advertising ID, device name
-- Account identifiers: Apple ID, iCloud account, CloudKit record IDs
-- Raw health data: actual measurements, records, or workout details
-- Location data: GPS coordinates, location history
-- Any combination of fields that could identify a person or device
-
-**For examples and tests, use synthetic data only:**
-```
-Device:  "iPhone-TESTDEVICE-001"
-User:    "Test User"
-Date:    2000-01-01
-Value:   0 (or clearly fictional)
-```
-
-Submissions containing real credentials or personal data will be closed without review.
-
-
-## Contribution Standards
-
-- **Observations ≠ conclusions:** Label theories and speculation explicitly
-- **Read-only HealthKit:** No code that modifies or deletes health data
-- **Evidence-based:** Bug reports require reproduction steps, device model, and iOS version
-- **No raw health exports:** Aggregated counts only; never raw sample values
-
-## Bug Reports
-
-Include:
-- Device model (e.g., iPhone 15 Pro) — no serial/UDID
-- iOS version
-- HealthProbe version
-- Observed vs. expected behavior
-- Anonymized screenshot or export (values redacted)
-
-## License
-
-By contributing you agree your code is released under the project license.
+18 -1
HealthProbe.xcodeproj/project.pbxproj
@@ -14,6 +14,21 @@
 		439832862FA4933F003C0182 /* Exceptions for "HealthProbe" folder in "HealthProbe" target */ = {
 			isa = PBXFileSystemSynchronizedBuildFileExceptionSet;
 			membershipExceptions = (
+				"Doc/00-agent-guides/AGENTS.md",
+				"Doc/00-agent-guides/CLAUDE.md",
+				"Doc/01-product/Forensics-Limitations.md",
+				"Doc/01-product/MVP-Specification.md",
+				"Doc/01-product/Product-Specification.md",
+				"Doc/02-architecture/Core-Data-Cache-Design.md",
+				"Doc/02-architecture/Database-Design.md",
+				"Doc/02-architecture/Export-Specification.md",
+				"Doc/02-architecture/Implementation-Guide.md",
+				"Doc/03-ui/README.md",
+				"Doc/04-project/IMPLEMENTATION_STATUS.md",
+				"Doc/04-project/Refactoring-Plan.md",
+				"Doc/99-archive/DATA_TYPE_VIEWS_OPTIMIZATION.md",
+				"Doc/99-archive/REFACTORING_DATA_TYPE_VIEWS.md",
+				Doc/README.md,
 				Info.plist,
 			);
 			target = 439832782FA4933E003C0182 /* HealthProbe */;
@@ -91,7 +106,7 @@
 			attributes = {
 				BuildIndependentTargetsInParallel = 1;
 				LastSwiftUpdateCheck = 2640;
-				LastUpgradeCheck = 2640;
+				LastUpgradeCheck = 2650;
 				TargetAttributes = {
 					439832782FA4933E003C0182 = {
 						CreatedOnToolsVersion = 26.4.1;
@@ -278,6 +293,7 @@
 				MTL_FAST_MATH = YES;
 				ONLY_ACTIVE_ARCH = YES;
 				SDKROOT = iphoneos;
+				STRING_CATALOG_GENERATE_SYMBOLS = YES;
 				SWIFT_ACTIVE_COMPILATION_CONDITIONS = "DEBUG $(inherited)";
 				SWIFT_OPTIMIZATION_LEVEL = "-Onone";
 			};
@@ -335,6 +351,7 @@
 				MTL_ENABLE_DEBUG_INFO = NO;
 				MTL_FAST_MATH = YES;
 				SDKROOT = iphoneos;
+				STRING_CATALOG_GENERATE_SYMBOLS = YES;
 				SWIFT_COMPILATION_MODE = wholemodule;
 				VALIDATE_PRODUCT = YES;
 			};
+341 -0
HealthProbe/Doc/00-agent-guides/AGENTS.md
@@ -0,0 +1,341 @@
+# HealthProbe - Multi-Model Development Guide
+
+Canonical path: `HealthProbe/Doc/00-agent-guides/AGENTS.md`
+
+Start every documentation lookup from [`../README.md`](../README.md).
+
+## Overview
+
+HealthProbe is built by multiple AI models, each owning a distinct domain.
+This document defines boundaries, interfaces, and handoff contracts.
+
+**Agentic reality:** The repo is developed largely via agents (Codex CLI, Claude, and dedicated model sessions). When updating product scope, update docs first, then implement behind flags, and add tests for the new behavior.
+
+**Objective updated 2026-05-23:** HealthProbe is a single-device local
+Health database time machine. It captures the local HealthKit database over time,
+lets the user inspect how accessible health data looked at a chosen observation
+date, explains what changed between local observations, and preserves exportable
+evidence that may no longer be available after Apple Health consolidates,
+aggregates, or prunes historical records. The app no longer treats raw
+record-count drops as inherently alarming, no longer studies differences between
+snapshots from different devices, and does not sync HealthProbe data through
+CloudKit/iCloud. Device metadata may still be stored as local provenance for the
+current device's chain, but UI and algorithms should compare snapshots only
+within one local device timeline.
+
+**Storage decision updated 2026-05-23:** HealthProbe targets legacy devices still
+used for Health collection, including iPhone 6s / Apple Watch Series 3 class
+setups. SwiftData is therefore not an acceptable long-term foundation: it
+requires iOS 17 or later, while iPhone 6s-class devices cannot update past iOS 15.
+The target architecture is:
+1. a direct SQLite archive/analysis database as source of truth;
+2. differential observation storage, never recurring complete snapshot copies;
+3. SQL-first analysis using indexes, joins, CTEs, temporary tables, and paged
+   result sets without loading large archives into RAM;
+4. a rebuildable Core Data UI/reporting cache for expensive counts, summaries,
+   timeline rows, report metadata, and display state.
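Item 2 above rules out full snapshot copies. The following minimal sketch shows what differential observation storage could look like. It assumes each observation of one data type has been reduced to a set of record-identity hashes. Each observation then stores only the hashes that appeared or disappeared since the previous one, and the state at any observation is rebuilt by replaying deltas in order. `ObservationDelta`, `makeDelta`, and `replay` are hypothetical names. In the real archive these deltas would presumably be SQLite rows queried with SQL, not in-memory Swift sets.

```swift
// Hypothetical sketch: differential observations over record-identity hashes.
// A real archive would keep these as SQLite rows, not in-memory Swift sets.

/// What changed between two consecutive observations of one data type.
struct ObservationDelta: Equatable {
    let added: Set<String>    // hashes present now but not in the previous observation
    let removed: Set<String>  // hashes present previously but not now
}

/// Computes the delta from the previous observation to the current one.
func makeDelta(previous: Set<String>, current: Set<String>) -> ObservationDelta {
    ObservationDelta(added: current.subtracting(previous),
                     removed: previous.subtracting(current))
}

/// Rebuilds the record set as of observation `index` (0-based) by replaying
/// deltas from an empty archive; no observation stores a full copy.
func replay(_ deltas: [ObservationDelta], through index: Int) -> Set<String> {
    var state = Set<String>()
    for delta in deltas.prefix(index + 1) {
        state.subtract(delta.removed)
        state.formUnion(delta.added)
    }
    return state
}

// Usage with synthetic hashes only.
let obs0: Set<String> = ["h1", "h2", "h3"]
let obs1: Set<String> = ["h2", "h3", "h4"]  // h1 disappeared, h4 appeared
let deltas = [makeDelta(previous: [], current: obs0),
              makeDelta(previous: obs0, current: obs1)]
print(replay(deltas, through: 1).sorted())  // ["h2", "h3", "h4"]
```

In the synthetic usage, both observations hold three records. A count-only comparison would therefore report no change, even though `h1` disappeared and `h4` appeared; the stored delta keeps both facts.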
+
+Current SwiftData models are legacy/prototype implementation details until the
+Core Data cache replacement is implemented. New product/storage work should not
+expand SwiftData dependency.
+
+**Deployment/reset note updated 2026-05-23:** HealthProbe has no real
+deployments, only test installations. Existing SwiftData stores and prototype
+SQLite archives are disposable for the archive v2 refactor: agents should reset,
+ignore, or reinitialize them rather than building one-way migrations or backward
+compatibility layers. Future real archive versions may require migrations, but
+the current prototype schema does not.
+
+**Recovery compatibility note updated 2026-05-23:** HealthProbe will not perform
+disaster-recovery workflows such as transplanting Health database files into
+encrypted iOS backups or re-publishing archived values into HealthKit. However,
+the local archive and user exports should preserve enough structure to support
+external recovery/salvage procedures: stable record identity, values, dates,
+units, source/provenance metadata where available, observation history,
+relationships, hashes, and manifests. Recovery compatibility is an archive/export
+requirement, not an in-app restore feature.
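The sketch below roughly illustrates the export side of this requirement. It shapes one exported record so that it carries identity, dates, value, unit, provenance, and observation history, then derives a per-record content hash for a manifest. Everything here is an assumption:

- `ExportRecord`, `manifestEntry`, and the field names are hypothetical.
- All values are synthetic.
- FNV-1a is only a dependency-free stand-in for whatever digest the archive adopts. A cryptographic hash such as SHA-256 is likely a better fit for tamper evidence.

Encoding with `.sortedKeys` and ISO-8601 dates keeps the JSON bytes stable, so the same record always yields the same hash.

```swift
import Foundation

// Hypothetical export row; field names are illustrative and all values are synthetic.
struct ExportRecord: Codable, Equatable {
    let recordID: String        // stable record identity (for example, a UUID hash)
    let typeIdentifier: String  // HealthKit type identifier string
    let start: Date
    let end: Date
    let value: Double
    let unit: String
    let source: String?         // provenance, when available
    let firstObservedAt: Date   // first local observation that contained the record
}

/// 64-bit FNV-1a; a dependency-free stand-in for the archive's real digest.
func fnv1a64(_ bytes: Data) -> UInt64 {
    var hash: UInt64 = 0xcbf2_9ce4_8422_2325   // FNV offset basis
    for byte in bytes {
        hash ^= UInt64(byte)
        hash = hash &* 0x0000_0100_0000_01b3  // FNV prime, wrapping multiply
    }
    return hash
}

/// Encodes one record deterministically and returns its manifest line (id, hex hash).
func manifestEntry(for record: ExportRecord) throws -> (id: String, hash: String) {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    encoder.dateEncodingStrategy = .iso8601
    let data = try encoder.encode(record)
    return (record.recordID, String(fnv1a64(data), radix: 16))
}

let sample = ExportRecord(recordID: "TEST-0001",
                          typeIdentifier: "HKQuantityTypeIdentifierStepCount",
                          start: Date(timeIntervalSince1970: 946_684_800),  // 2000-01-01
                          end: Date(timeIntervalSince1970: 946_688_400),
                          value: 0,
                          unit: "count",
                          source: "TEST-SOURCE",
                          firstObservedAt: Date(timeIntervalSince1970: 946_692_000))
let entry = try manifestEntry(for: sample)
print(entry.id, entry.hash)
```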
+
+---
+
+## Model Allocation
+
+| Domain | Owner | Tools |
+|--------|-------|-------|
+| **UI / SwiftUI Views** | Claude Code | Xcode, SwiftUI, `HealthProbe/Doc/00-agent-guides/CLAUDE.md` |
+| **Archive Store** | Dedicated model session | SQLite/local archive format, HealthKit metadata mapping |
+| **Data Models (Core Data cache)** | Dedicated model session | Xcode, Swift; derived UI/cache/settings/log/report models only |
+| **HealthKit Integration** | Dedicated model session | Xcode, HealthKit docs |
+| **Change Explanation Algorithms** | Dedicated model session | Swift, archive diffing, consolidation heuristics |
+| **Context Monitoring** | Dedicated model session | Xcode; logs Health/iCloud state as context only |
+| **Documentation** | Claude Code + dedicated session | Markdown |
+| **Tests** | Dedicated model session | XCTest, Swift Testing |
+
+---
+
+## Directory Ownership
+
+```
+HealthProbe/
+├── Views/           ← Claude Code (UI)
+├── ViewModels/      ← Claude Code (UI)
+├── Utilities/       ← Claude Code (shared helpers, mocks)
+├── Models/          ← Models agent (legacy SwiftData now; target Core Data UI/cache schemas)
+├── Services/        ← Services agent (HealthKit, archive store, change explanation, context)
+└── Tests/           ← Tests agent
+```
+
+**Rule:** Each agent writes only within its owned directories.
+Cross-boundary changes require an explicit interface contract (protocol) first.
+
+**Documentation scope:** `HealthProbe/Doc/` is shared. Keep it consistent with shipped behavior, and add a dated entry when objectives change.
+
+**Database work starts here:** `HealthProbe/Doc/02-architecture/Database-Design.md`.
+The database is the central project artifact. Agents changing archive schema,
+capture persistence, diff logic, aggregate caches, exports, reset behavior, or
+future migrations must read and update that document before code changes.
+
+---
+
+## Interface Contracts
+
+All service boundaries are defined as Swift protocols.
+Claude Code (UI) consumes protocols — never concrete implementations.
+
+### CaptureMonitorProtocol
+
+```swift
+/// Owned by: Services agent
+/// Consumed by: UI (DashboardViewModel)
+protocol CaptureMonitorProtocol {
+    var archiveStatus: ArchiveStatus { get }
+    var lastObservedAt: Date? { get }
+    func captureNow() async throws
+}
+```
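A UI agent can build against `CaptureMonitorProtocol` before the real service exists by using a mock along the lines sketched below. This sketch rests on several assumptions. The `ArchiveStatus` cases, `MockCaptureMonitor`, and the injected `now` clock are hypothetical. The protocol is repeated only so the example compiles on its own. The mock touches no HealthKit API, which keeps previews and tests deterministic.

```swift
import Foundation

// Hypothetical status cases; the real ArchiveStatus is defined elsewhere in the app.
enum ArchiveStatus: Equatable {
    case empty, capturing, ready, failed
}

// Repeated from the contract above so this sketch compiles on its own.
protocol CaptureMonitorProtocol {
    var archiveStatus: ArchiveStatus { get }
    var lastObservedAt: Date? { get }
    func captureNow() async throws
}

/// Preview/test mock: no HealthKit access; timestamps come from an injected clock.
final class MockCaptureMonitor: CaptureMonitorProtocol {
    struct SimulatedFailure: Error {}

    private(set) var archiveStatus: ArchiveStatus = .empty
    private(set) var lastObservedAt: Date?
    /// Set to true to exercise the UI's error path.
    var shouldFail = false
    private let now: () -> Date

    init(now: @escaping () -> Date = { Date() }) {
        self.now = now
    }

    func captureNow() async throws {
        archiveStatus = .capturing
        guard !shouldFail else {
            archiveStatus = .failed
            throw SimulatedFailure()
        }
        lastObservedAt = now()
        archiveStatus = .ready
    }
}
```

A mock created with `MockCaptureMonitor(now: { fixedDate })` behaves as follows: after `captureNow()`, `archiveStatus` becomes `.ready` and `lastObservedAt` becomes `fixedDate`. Setting `shouldFail = true` exercises the failure path instead.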
+
+### ChangeSummaryStoreProtocol
+
+```swift
+/// Owned by: Services agent
+/// Consumed by: UI (change timeline/detail views)
+protocol ChangeSummaryStoreProtocol {
+    var changes: [DetectedChange] { get }
+    func markReviewed(_ change: DetectedChange) async throws
+}
+```
+
+### AuditTrailProtocol
+
+```swift
+/// Owned by: Services agent
+/// Consumed by: UI (AuditTrailView)
+protocol AuditTrailProtocol {
+    var entries: [AuditTrailEntry] { get }
+    func export() async throws -> Data  // JSON
+}
+```
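The `export()` contract only states that the result is JSON. The sketch below shows one possible in-memory implementation, useful for previews and tests. The `AuditTrailEntry` fields and `InMemoryAuditTrail` are hypothetical. Chronological ordering and ISO-8601 timestamps are design assumptions, not documented requirements.

```swift
import Foundation

// Hypothetical entry shape; the real AuditTrailEntry lives in Models/.
struct AuditTrailEntry: Codable, Equatable {
    let timestamp: Date
    let eventType: String
    let message: String
}

// Repeated from the contract above so this sketch compiles on its own.
protocol AuditTrailProtocol {
    var entries: [AuditTrailEntry] { get }
    func export() async throws -> Data  // JSON
}

/// In-memory implementation for previews and tests.
struct InMemoryAuditTrail: AuditTrailProtocol {
    let entries: [AuditTrailEntry]

    /// Encodes entries oldest-first as a JSON array with ISO-8601 timestamps.
    /// Sorted keys make repeated exports of the same entries byte-identical.
    func export() async throws -> Data {
        let encoder = JSONEncoder()
        encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
        encoder.dateEncodingStrategy = .iso8601
        return try encoder.encode(entries.sorted { $0.timestamp < $1.timestamp })
    }
}
```

Decoding the result with a `JSONDecoder` whose `dateDecodingStrategy` is `.iso8601` round-trips the entries. That round trip is a cheap check for the tests agent to keep.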
136
+
137
+### ContextMonitorProtocol
138
+
139
+```swift
140
+/// Owned by: Services agent
141
+/// Consumed by: UI (ContextViewModel)
142
+protocol ContextMonitorProtocol {
143
+    var iCloudEnabled: Bool { get }
144
+    var lastObservedAt: Date? { get }
145
+    var stateChanges: [ContextStateChange] { get }
146
+}
147
+```
148
+
149
+---
150
+
151
+## Shared Types (Models Agent)
152
+
153
+These types are defined once in `Models/` and shared across all agents:
154
+
155
+```swift
156
+// Models/TypeDistributionBin.swift
157
+@Model
158
+final class TypeDistributionBin {
159
+    var bucketStart: Date
160
+    var bucketEnd: Date
161
+    var count: Int
162
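+
+    // @Model classes still need an explicit initializer for non-optional stored properties.
+    init(bucketStart: Date, bucketEnd: Date, count: Int) {
+        self.bucketStart = bucketStart
+        self.bucketEnd = bucketEnd
+        self.count = count
+    }
+}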
163
+
164
+// Models/TypeCount.swift
165
+// TypeCount owns zero or more TypeDistributionBin records.
166
+// These bins store sample counts and import anchors, not raw health values.
167
+
168
+// Interface updated 2026-05-12 — see AGENTS.md
169
+// Models/HealthRecord.swift
170
+// HealthRecord stores one anonymized HealthKit record fingerprint plus its start/end dates.
171
+// It intentionally does not store raw health values, device identifiers, or source metadata.
172
+// UI may compare HealthRecord fingerprints between adjacent snapshots to expose local
173
+// record-level changes that are masked by newly-added records with the same total count.
174
+// High-volume snapshots store these records in TypeCount.recordArchiveData instead of
175
+// creating one SwiftData model per record, avoiding main-thread stalls after import.
176
+
177
+// Interface updated 2026-05-13 — see AGENTS.md
178
+// TypeDistributionBin also stores content hashes and HealthKit query anchors.
179
+// Import uses a global anchored query per data type so follow-up snapshots fetch only
180
+// HealthKit deltas instead of scanning calendar blocks with fixed per-query latency.
181
+
182
+// Interface updated 2026-05-18 — see AGENTS.md
183
+// SwiftData is not the forensic source of truth and is legacy/prototype storage
184
+// for the current app. Target architecture uses Core Data as a rebuildable
185
+// UI/reporting cache only. Complete HealthKit samples and metadata belong in the
186
+// SQLite archive store, in one schema that can preserve relationships across data
187
+// types, sources, devices, workouts, and metadata.
188
+
189
+// Interface updated 2026-05-18 — see AGENTS.md
190
+// Services/Protocols/HealthArchiveStore.swift defines the local archive boundary.
191
+// SQLiteHealthArchiveStore is the current implementation. HealthKit anchored-query
192
+// pages must be written to this archive before UI/cache rows are saved.
193
+// Deletions are recorded by sampleUUIDHash because HKDeletedObject exposes UUIDs,
194
+// not complete sample payloads.
195
+
196
+// Storage objective updated 2026-05-23 — see AGENTS.md
197
+// Recurring complete snapshots are out of scope for the target architecture.
198
+// Store differential observations, versioned sample payloads, observation ranges,
199
+// and materialized aggregates. Expensive counts used by reports/UI should be
200
+// cached in Core Data and be rebuildable from SQLite.
201
+
202
+// Objective updated 2026-05-23 — see AGENTS.md
203
+// HealthProbe is a local Health DB Time Machine. Snapshot/device identifiers are
204
+// retained only to preserve local provenance and keep comparisons within one
205
+// device chain. Record-count drops must be explained with aggregate and
206
+// representation context, not treated as inherently alarming.
207
+
208
+// Interface updated 2026-05-17 — see AGENTS.md
209
+// Models/TypeCount.detailCacheData stores precomputed detail data for the current
210
+// TypeCount compared with the immediately previous snapshot on the same device.
211
+// The cache contains aggregate added/disappeared counts, capped preview records for
212
+// UI drill-down, and daily change bins for temporal charts. It must be computed when
213
+// snapshots are saved and refreshed for neighboring snapshots when snapshot deletion
214
+// changes chain links. Existing stores are backfilled incrementally with a strict
215
+// per-launch TypeCount cap to avoid decoding many large archives in one run.
216
+
217
+// Interface updated 2026-05-17 — see AGENTS.md
218
+// Models/HealthSnapshot.contentEquivalentSnapshotID marks snapshots whose TypeCount
219
+// content is identical to a previous snapshot on the same device. These snapshots are
220
+// retained as temporal labels but behave as aliases to the representative content
221
+// snapshot for expensive detail cache/diff work.
222
+
223
+// Interface updated 2026-05-17 — see AGENTS.md
224
+// Models/TypeCount.contentEquivalentTypeCountID marks individual data types whose
225
+// content is identical to the previous snapshot's same TypeCount. This allows a
226
+// snapshot to contain real changes for some metrics while long-stable metrics behave
227
+// as temporal aliases and skip per-type detail cache/diff work.
228
+
229
+// Interface updated 2026-05-17 — see AGENTS.md
230
+// Models/HealthSnapshot stores cached overview scalars for UI consumption:
231
+// tracked type count, aggregate record count, and overall oldest/newest record dates.
232
+// These values must be computed during snapshot save while TypeCount data is already
233
+// in memory, so snapshot list/detail screens never recompute them by traversing
234
+// snapshot.typeCounts on the UI thread.
235
+
236
+// Interface updated 2026-05-17 — see AGENTS.md
237
+// Models/SnapshotDelta stores cached list/detail summary scalars derived from TypeDelta.
238
+// Overview screens consume these scalars and type-delta summaries directly instead of
239
+// recalculating per-snapshot changes from HealthSnapshot.typeCounts.
240
+
241
+// Interface updated 2026-05-23 — see AGENTS.md
242
+// Future UI/domain naming should prefer "change" or "observation diff" over
243
+// "anomaly". Existing AnomalyRecord/AnomalyType code is legacy naming until the
244
+// model replacement/refactor is implemented.
245
+enum ChangeClassification: String, Codable {
246
+    case appeared
247
+    case disappeared
248
+    case representationChanged = "representation_changed"
249
+    case consolidationLikely = "consolidation_likely"
250
+    case aggregateChanged = "aggregate_changed"
251
+    case uncertain
252
+}
253
+
254
+enum ReviewPriority: String, Codable, Comparable {
255
+    case info, review, important
256
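+
+    // Raw-value enums do not get a synthesized Comparable conformance, so order explicitly.
+    private var rank: Int {
+        switch self {
+        case .info: return 0
+        case .review: return 1
+        case .important: return 2
+        }
+    }
+
+    static func < (lhs: ReviewPriority, rhs: ReviewPriority) -> Bool {
+        lhs.rank < rhs.rank
+    }
+}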
257
+
258
+enum ArchiveStatus: String {
259
+    case ready, needsCapture, degraded, unknown
260
+}
261
+```
262
+
263
+Any model changes must be announced in this file before other agents consume them.
264
+
265
+---
266
+
267
+## Handoff Process
268
+
269
+When a module is ready to be consumed by another agent:
270
+
271
+1. **Define the protocol** in `Services/Protocols/` (services agent)
272
+2. **Implement a mock** in `Utilities/Mocks.swift` (Claude Code)
273
+3. **Build UI against the mock** (Claude Code)
274
+4. **Replace mock with real implementation** (services agent)
275
+5. **Integration test** (tests agent)
276
+
277
+This allows UI development and service development to proceed in parallel.
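As a sketch of step 2, a mock for `CaptureMonitorProtocol` could look like the following. The names `MockCaptureMonitor` and `MockCaptureError`, the `recordCapture(at:)` helper, and the status transitions are hypothetical illustrations, not existing code in `Utilities/Mocks.swift`. The protocol and `ArchiveStatus` are repeated so the block compiles on its own.

```swift
import Foundation

// Repeated from the interface contracts so this sketch compiles standalone.
enum ArchiveStatus: String {
    case ready, needsCapture, degraded, unknown
}

protocol CaptureMonitorProtocol {
    var archiveStatus: ArchiveStatus { get }
    var lastObservedAt: Date? { get }
    func captureNow() async throws
}

// Hypothetical error used to preview failure states.
enum MockCaptureError: Error {
    case simulatedFailure
}

/// Hypothetical mock for previews and UI development (handoff step 2).
/// It never touches HealthKit or the SQLite archive.
final class MockCaptureMonitor: CaptureMonitorProtocol {
    private(set) var archiveStatus: ArchiveStatus
    private(set) var lastObservedAt: Date?
    /// When true, `captureNow()` fails so error UI can be previewed.
    var shouldFail: Bool

    init(status: ArchiveStatus = .needsCapture,
         lastObservedAt: Date? = nil,
         shouldFail: Bool = false) {
        self.archiveStatus = status
        self.lastObservedAt = lastObservedAt
        self.shouldFail = shouldFail
    }

    func captureNow() async throws {
        if shouldFail {
            archiveStatus = .degraded
            throw MockCaptureError.simulatedFailure
        }
        recordCapture(at: Date())
    }

    /// Synchronous helper so previews can pin a fixed, synthetic capture time.
    func recordCapture(at date: Date) {
        lastObservedAt = date
        archiveStatus = .ready
    }
}
```

Because views hold the protocol type rather than the mock, replacing the mock with the real service in step 4 should not require view changes.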
278
+
279
+---
280
+
281
+## Algorithms & Change Explanation Logic
282
+
283
+The following modules involve non-trivial logic and should be reviewed carefully:
284
+
285
+| Module | File | Description |
286
+|--------|------|-------------|
287
+| **Change Explainer** | `Services/AnomalyDetector.swift` *(legacy name)* | Classify appeared/disappeared/representation-changed records without assuming loss |
288
+| **Consolidation Heuristics** | `Services/DivergenceEngine.swift` *(legacy name)* | Compare aggregates, intervals, and density to identify likely HealthKit consolidation |
289
+| **Fingerprinter** | `Services/SampleFingerprinter.swift` | Record matching via sample and semantic hashes |
290
+| **Snapshot Comparator** | `Services/SnapshotComparator.swift` | Diff between observations in one local device timeline |
291
+| **Distribution Comparator** | `Services/SnapshotDiffService.swift` | Daily per-type distribution diff to distinguish detail thinning from aggregate change |
292
+
293
+**Guidelines for algorithm modules:**
294
+- Document assumptions explicitly (e.g., "HealthProbe can only preserve detail it observed")
295
+- All thresholds (e.g., `age > 7 days`) must be configurable constants, not magic numbers
296
+- Include unit tests for edge cases (empty observations, partial data, clock skew, consolidation-like rewrites)
297
+- No UI code; return plain Swift types only
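A minimal sketch of how these guidelines might combine in a consolidation heuristic: all cutoffs live in one configurable struct, the function returns plain Swift values, and a missing observation yields `.uncertain` instead of a guess. `TypeObservationSummary`, `ConsolidationThresholds`, `classifyTypeChange`, and the default values (2% aggregate tolerance, 10% record-count drop) are hypothetical and not taken from `DivergenceEngine.swift`; `ChangeClassification` is copied from the shared types.

```swift
import Foundation

// Copied from the shared types so this sketch compiles standalone.
enum ChangeClassification: String, Codable {
    case appeared
    case disappeared
    case representationChanged = "representation_changed"
    case consolidationLikely = "consolidation_likely"
    case aggregateChanged = "aggregate_changed"
    case uncertain
}

/// Hypothetical per-type summary of one observation: counts and sums only.
struct TypeObservationSummary {
    var recordCount: Int
    var valueSum: Double
}

/// Hypothetical thresholds; every cutoff is a named, configurable value.
struct ConsolidationThresholds {
    /// Largest relative aggregate change still read as "meaning preserved".
    var maxAggregateDeltaFraction = 0.02
    /// Smallest relative record-count drop treated as detail thinning.
    var minCountDropFraction = 0.10
}

/// Returns nil when the two summaries are identical (nothing to explain).
func classifyTypeChange(
    from old: TypeObservationSummary?,
    to new: TypeObservationSummary?,
    thresholds: ConsolidationThresholds = ConsolidationThresholds()
) -> ChangeClassification? {
    // A missing observation cannot support a confident label.
    guard let old = old, let new = new else { return .uncertain }
    if old.recordCount == new.recordCount && old.valueSum == new.valueSum { return nil }
    if old.recordCount == 0 { return .appeared }
    if new.recordCount == 0 { return .disappeared }
    // Relative aggregate change is undefined for a zero baseline.
    guard old.valueSum != 0 else { return .uncertain }

    let aggregateDelta = abs(new.valueSum - old.valueSum) / abs(old.valueSum)
    let countDrop = Double(old.recordCount - new.recordCount) / Double(old.recordCount)

    if aggregateDelta > thresholds.maxAggregateDeltaFraction { return .aggregateChanged }
    // Fewer records with a preserved aggregate suggests thinning or intervalization.
    if countDrop >= thresholds.minCountDropFraction { return .consolidationLikely }
    return .representationChanged
}
```

Edge cases named in the guidelines (empty observations, partial data) map to explicit branches here, which keeps them easy to cover with unit tests.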
298
+
299
+---
300
+
301
+## Privacy Directives — All Agents
302
+
303
+**Mandatory across all modules:**
304
+- No credentials, API keys, tokens, or certificates in any file
305
+- No personal data: names, emails, phone numbers, dates of birth
306
+- No device identifiers: UDID, serial number, advertising ID, device name
307
+- No account identifiers: Apple ID, iCloud account info, CloudKit record IDs
308
+- No raw health values in code, tests, previews, logs, or comments
309
+- No location data or patterns enabling re-identification
310
+- Synthetic data only in tests and previews
311
+
312
+**Clarification:** "No raw health values" applies to this repository's contents. The app may optionally store a user's raw HealthKit samples *locally on-device* for forensic backup purposes, but such samples must never appear in source control, logs, or docs.
313
+
314
+---
315
+
316
+## Communication Between Agents
317
+
318
+When one agent needs to communicate a decision or change to another:
319
+
320
+1. **Update this file** (`HealthProbe/Doc/00-agent-guides/AGENTS.md`) with the protocol/interface change
321
+2. **Update the relevant protocol** in `Services/Protocols/`
322
+3. **Add a comment** in the affected file: `// Interface updated YYYY-MM-DD — see AGENTS.md`
323
+
324
+---
325
+
326
+## Current Status
327
+
328
+| Module | Status | Owner |
329
+|--------|--------|-------|
330
+| Core Data UI/Report Cache | ⏳ Planned replacement of SwiftData | Models agent |
331
+| HealthKit Integration | ✅ Done | Services agent |
332
+| Snapshot Diff Service | ✅ Done | Services agent |
333
+| Service Protocols | ⏳ Not started | Services agent |
334
+| Change Explanation / Consolidation Heuristics | ⏳ Needs refocus | Services agent |
335
+| Context Monitor | ⏳ Not started | Services agent |
336
+| UI – App entry + TabView | ✅ Done | Claude Code |
337
+| UI – Dashboard | ✅ Done (functional, minimal) | Claude Code |
338
+| UI – Snapshots + Detail | ✅ Done | Claude Code |
339
+| UI – Data Types | ✅ Done | Claude Code |
340
+| UI – Settings | ✅ Done | Claude Code |
341
+| Unit Tests | ⏳ Not started | Tests agent |
+157 -0
HealthProbe/Doc/00-agent-guides/CLAUDE.md
@@ -0,0 +1,157 @@
1
+# HealthProbe - Claude/UI Agent Guide
2
+
3
+## Read First
4
+
5
+Before UI work, read:
6
+
7
+1. [`../README.md`](../README.md)
8
+2. [`../01-product/MVP-Specification.md`](../01-product/MVP-Specification.md)
9
+3. [`../02-architecture/Implementation-Guide.md`](../02-architecture/Implementation-Guide.md)
10
+4. this file
11
+
12
+## Current UI Objective
13
+
14
+HealthProbe is not an alert-first anomaly dashboard. It is a local Health DB Time Machine.
15
+
16
+The UI should help the user:
17
+- browse local observations over time;
18
+- inspect how HealthKit-accessible data looked at a selected observation date;
19
+- compare two local observations;
20
+- understand appeared/disappeared/representation-changed records without assuming data loss;
21
+- export selected point-in-time views and diffs;
22
+- see archive/cache health and capture status.
23
+
24
+## UI Ownership
25
+
26
+Claude/UI owns:
27
+- SwiftUI views in `HealthProbe/Views/`;
28
+- view models in `HealthProbe/ViewModels/`;
29
+- navigation and screen composition;
30
+- visual design, accessibility, Dynamic Type, and legacy-device UI simplification;
31
+- mock data for previews.
32
+
33
+Claude/UI does not own:
34
+- SQLite archive schema or analysis queries;
35
+- Core Data cache model design or storage replacement strategy;
36
+- HealthKit capture internals;
37
+- recovery/salvage tooling;
38
+- project entitlements or signing.
39
+
40
+Cross-boundary needs should be expressed as protocols or view-ready DTOs.
41
+
42
+## Target Screens
43
+
44
+Current target surfaces:
45
+
46
+```text
47
+App
48
+├── Dashboard / Capture Status
49
+├── Observation Timeline
50
+├── Observation Detail
51
+├── Diff Detail
52
+├── Data Types
53
+├── Export Preview / Export History
54
+├── Archive Status
55
+└── Settings
56
+```
57
+
58
+Legacy/low-memory devices may get simplified tables and summaries instead of heavy charts. Capture, reporting, and export remain more important than visual richness.
59
+
60
+## Legacy Device UI Mode
61
+
62
+Use simplified UI when the app runs on iOS 15-era devices, very small screens, or when memory/performance instrumentation shows repeated pressure during chart/detail screens.
63
+
64
+Simplify by:
65
+- preferring tables and summary rows over dense charts;
66
+- limiting default record previews;
67
+- loading one detail surface at a time;
68
+- using paged SQLite DTOs for drill-down;
69
+- hiding non-essential visual comparisons behind explicit taps.
70
+
71
+Do not remove:
72
+- capture controls;
73
+- archive health/status;
74
+- cached report summaries;
75
+- export flows;
76
+- paged record inspection.
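The selection rule above could be kept as a pure function so it is testable without a device. `DeviceUIProfile`, `LegacyUIThresholds`, `UIPresentationMode`, and the cutoffs (iOS major version 15 or lower, width below 375 points, three memory warnings per session) are hypothetical placeholders; real values should come from instrumentation. The function only chooses a presentation mode: the surfaces listed under "Do not remove" stay available in both modes.

```swift
/// Hypothetical snapshot of runtime facts the UI layer can observe.
struct DeviceUIProfile {
    var osMajorVersion: Int
    var screenWidthPoints: Double
    var memoryWarningsThisSession: Int
}

/// Hypothetical, configurable cutoffs (no magic numbers inside the rule).
struct LegacyUIThresholds {
    var maxLegacyOSMajorVersion = 15
    var minFullUIScreenWidth = 375.0
    var memoryWarningLimit = 3
}

enum UIPresentationMode: Equatable {
    case full
    case simplified
}

/// Any single legacy signal is enough to switch to simplified tables and summaries.
func presentationMode(
    for profile: DeviceUIProfile,
    thresholds: LegacyUIThresholds = LegacyUIThresholds()
) -> UIPresentationMode {
    let legacyOS = profile.osMajorVersion <= thresholds.maxLegacyOSMajorVersion
    let smallScreen = profile.screenWidthPoints < thresholds.minFullUIScreenWidth
    let underPressure = profile.memoryWarningsThisSession >= thresholds.memoryWarningLimit
    return (legacyOS || smallScreen || underPressure) ? .simplified : .full
}
```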
77
+
78
+## Language Rules
79
+
80
+Prefer:
81
+- "records no longer visible in this observation"
82
+- "representation changed"
83
+- "consolidation likely"
84
+- "aggregate changed"
85
+- "cause not inferred"
86
+- "export selected evidence"
87
+
88
+Avoid:
89
+- "Apple lost your data"
90
+- "critical data loss" from counts alone
91
+- "sync bug"
92
+- "cross-device truth"
93
+- "restore from HealthProbe"
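One way to keep copy consistent with these rules is to route every classification through a single mapping, so individual views never compose their own wording. `userFacingLabel(for:)` is a hypothetical helper; its strings come from the Prefer list above, except the `.appeared` wording, which is an assumption that mirrors the "no longer visible" phrasing. `ChangeClassification` is copied from the shared types in AGENTS.md.

```swift
import Foundation

// Copied from the shared types in AGENTS.md so this compiles standalone.
enum ChangeClassification: String, Codable {
    case appeared
    case disappeared
    case representationChanged = "representation_changed"
    case consolidationLikely = "consolidation_likely"
    case aggregateChanged = "aggregate_changed"
    case uncertain
}

/// Hypothetical single source of user-facing copy for change labels.
func userFacingLabel(for classification: ChangeClassification) -> String {
    switch classification {
    case .appeared:
        // Assumed wording, mirroring the preferred "no longer visible" phrasing.
        return "records newly visible in this observation"
    case .disappeared:
        return "records no longer visible in this observation"
    case .representationChanged:
        return "representation changed"
    case .consolidationLikely:
        return "consolidation likely"
    case .aggregateChanged:
        return "aggregate changed"
    case .uncertain:
        return "cause not inferred"
    }
}
```

Because the `switch` is exhaustive, adding a new classification case forces a wording decision at compile time.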
94
+
95
+## Design Guidelines
96
+
97
+Tone:
98
+- calm, technical, evidence-oriented;
99
+- no emergency language unless the user explicitly chooses an interpretation;
100
+- make uncertainty visible.
101
+
102
+Visual hierarchy:
103
+- summary first, details on demand;
104
+- paged record tables for large datasets;
105
+- chart only when it is cheap and helpful;
106
+- no UI that implies all data is loaded in memory.
107
+
108
+Controls:
109
+- segmented controls for observation/diff modes;
110
+- filters for type/date/source/change classification;
111
+- export buttons only for explicit user-triggered actions;
112
+- destructive actions require confirmation.
113
+
114
+Accessibility:
115
+- support Dynamic Type;
116
+- keep tables readable on small devices;
117
+- provide VoiceOver labels for charts and summary cards;
118
+- avoid color-only meaning.
119
+
120
+## Data Access Pattern
121
+
122
+Views should consume:
123
+- view models;
124
+- protocols;
125
+- paged DTOs;
126
+- Core Data cached summaries once implemented.
127
+
128
+Views should not:
129
+- query the large SQLite archive directly;
130
+- decode full record archives into arrays;
131
+- mutate HealthKit;
132
+- treat SwiftData as target architecture.
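A minimal sketch of the paged-DTO pattern: the view model requests one page at a time and holds only a cursor, never the full record set. `RecordRowDTO`, `RecordPage`, `PagedRecordSource`, and `InMemoryPagedRecordSource` are hypothetical names. This mock uses an array offset as the cursor; a real source would run a paginated SQLite query instead. Row values follow the Preview Data rules below.

```swift
import Foundation

/// Hypothetical view-ready row; synthetic values only in previews.
struct RecordRowDTO: Equatable {
    let typeIdentifier: String
    let startDate: Date
    let value: Double
    let recordHash: String
}

/// One page plus an opaque cursor for the next request (nil when exhausted).
struct RecordPage {
    let rows: [RecordRowDTO]
    let nextCursor: Int?
}

/// Hypothetical boundary the UI consumes instead of querying SQLite directly.
protocol PagedRecordSource {
    func page(after cursor: Int?, limit: Int) -> RecordPage
}

/// Preview/mock source backed by a small synthetic array.
struct InMemoryPagedRecordSource: PagedRecordSource {
    let rows: [RecordRowDTO]

    func page(after cursor: Int?, limit: Int) -> RecordPage {
        // Clamp the cursor and require at least one row per page.
        let start = max(0, min(cursor ?? 0, rows.count))
        let end = min(start + max(limit, 1), rows.count)
        return RecordPage(rows: Array(rows[start..<end]),
                          nextCursor: end < rows.count ? end : nil)
    }
}
```

A production cursor would more likely encode the last row's sort key so that rows inserted between requests do not shift pages; that choice belongs to the archive owners.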
133
+
134
+## Preview Data
135
+
136
+Preview/mock data must be synthetic:
137
+- no real health values;
138
+- no real source names;
139
+- no device identifiers;
140
+- no real dates tied to a person.
141
+
142
+Use examples such as:
143
+- type: `step_count`
144
+- value: `42`
145
+- source hash: `source_hash_example`
146
+- record hash: `record_hash_example`
147
+
148
+## Completion Checklist
149
+
150
+- [ ] UI copy follows Time Machine / observation language.
151
+- [ ] No count-only critical loss messaging.
152
+- [ ] Large record lists are paged or mocked as paged.
153
+- [ ] Works on narrow/small-device layouts.
154
+- [ ] Dark mode and Dynamic Type reviewed.
155
+- [ ] VoiceOver labels exist for interactive elements.
156
+- [ ] No real health values in previews/tests.
157
+- [ ] No new dependency on SwiftData as target storage.
+225 -0
HealthProbe/Doc/01-product/Forensics-Limitations.md
@@ -0,0 +1,225 @@
1
+# HealthProbe - Risks, Limitations & Forensic Capabilities
2
+
3
+**Version:** 1.5
4
+**Last Updated:** 2026-05-23
5
+
6
+## 1. Product Boundary
7
+
8
+HealthProbe is a local Health DB Time Machine. It preserves selected HealthKit-accessible observations on one device, explains how those observations differ over time, and exports scoped evidence.
9
+
10
+HealthProbe is not:
11
+- a proof engine for Apple Health bugs
12
+- a cross-device HealthKit database comparator
13
+- a CloudKit/iCloud sync product
14
+- a guarantee that HealthKit currently exposes all historical detail
15
+- a replacement for Apple Health or Apple exports
16
+- a disaster-recovery tool that mutates HealthKit or patches iOS backups
17
+
18
+## 2. Known Limitations
19
+
20
+### 2.1 HealthKit Framework Constraints
21
+
22
+| Gap | Why | Mitigation |
23
+|-----|-----|------------|
24
+| **No stable raw database contract** | HealthKit exposes API objects, not a forensic SQLite dump | Store what the API exposes at each observation |
25
+| **Representation changes** | Older high-frequency samples can become intervalized, thinned, or aggregated | Compare records and aggregates; label consolidation separately from loss |
26
+| **Modifications without explicit change events** | HealthKit primarily reports added/deleted objects | Use observation diffs and fingerprints |
27
+| **Missed background windows** | iOS can delay or skip background work | Manual capture, app-launch capture, observation timestamps |
28
+| **Private Health types** | Some data is not available to third-party apps | Document inaccessible gaps |
29
+| **Cross-device divergence** | Apple devices may expose different local HealthKit states | Compare only within the current device timeline |
30
+
31
+### 2.2 Interpretation Limits
32
+
33
+A disappeared fingerprint does not automatically mean permanent user data loss. It may mean:
34
+- Apple Health consolidated older records
35
+- a permission changed
36
+- a source app rewrote or re-imported data
37
+- HealthKit query timing changed
38
+- the user deleted or edited data
39
+- the app missed one or more background windows
40
+
41
+HealthProbe should present evidence and uncertainty. Count-only drops must not be shown as critical alerts without supporting detail.
42
+
43
+### 2.3 Data Retention Constraints
44
+
45
+HealthProbe can only preserve detail after it has observed it. It cannot reconstruct records that were already aggregated or unavailable before installation.
46
+
47
+The local archive can preserve selected details beyond what future HealthKit queries or Apple exports expose, but only for data types the user has enabled and granted permission to read.
48
+
49
+## 3. Privacy & Security Risks
50
+
51
+| Risk | Impact | Mitigation |
52
+|------|--------|------------|
53
+| **Raw health data exposure** | Critical | Local-only archive; explicit user exports only |
54
+| **Device/source re-identification** | High | Hash or redact identifiers where possible; avoid personal data in logs/docs/tests |
55
+| **Behavior inference from timestamps** | Medium | No automatic cloud sync; scoped exports |
56
+| **Lost archive on uninstall/device loss** | High | Encourage explicit exports for important evidence |
57
+| **Local device compromise** | High | Rely on iOS data protection; no network copy by default |
58
+
59
+## 4. Questions HealthProbe Can Answer
60
+
61
+### Q1: "What did my Health data look like at this observation date?"
62
+
63
+Method:
64
+1. Select an observation.
65
+2. Load archived records visible at that observation.
66
+3. Show per-type counts, date ranges, aggregates, source summaries, and record tables.
67
+4. Export the selected view if needed.
68
+
69
+### Q2: "What changed between these two observations?"
70
+
71
+Method:
72
+1. Compare adjacent or selected observations from the same local device timeline.
73
+2. Group records as appeared, disappeared, retained, or representation-changed.
74
+3. Compare aggregate totals and coverage windows.
75
+4. Label likely consolidation when record detail changed but aggregate meaning is preserved.
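Step 2 reduces to set operations on fingerprints. The sketch below assumes each observation can be represented as a set of record fingerprint strings, a simplification: the archive would page these from SQLite rather than hold them in memory. `FingerprintDiff` and `diffFingerprints(from:to:)` are hypothetical, and representation-change labeling is left to the aggregate comparison in steps 3 and 4.

```swift
/// Hypothetical result of comparing two observations' fingerprint sets.
struct FingerprintDiff {
    let appeared: Set<String>
    let disappeared: Set<String>
    let retained: Set<String>
}

/// Groups fingerprints from two observations on the same device timeline.
func diffFingerprints(from old: Set<String>, to new: Set<String>) -> FingerprintDiff {
    FingerprintDiff(
        appeared: new.subtracting(old),     // present now, absent before
        disappeared: old.subtracting(new),  // present before, absent now
        retained: old.intersection(new)     // present in both
    )
}
```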
76
+
77
+### Q3: "Can I export details that are no longer available from current HealthKit?"
78
+
79
+Method:
80
+1. Query the local archive by observation date and sample type.
81
+2. Select the historical record set.
82
+3. Export JSON/CSV plus manifest hashes and observation metadata.
83
+
84
+### Q4: "When did this record first become visible to HealthProbe?"
85
+
86
+Method:
87
+1. Search the archive observation history for the record fingerprint.
88
+2. Report first-seen, last-seen, last-verified, and disappeared-at timestamps.
89
+3. Include context events without claiming causality.
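Given an ordered observation history, first-seen, last-seen, and disappeared-at can be derived in one pass. This sketch assumes each observation is reduced to a timestamp plus the set of fingerprints it contained (the hypothetical `Observation` type); last-verified depends on how the archive records verification passes and is left out. `RecordVisibility` and `visibility(of:in:)` are also hypothetical.

```swift
import Foundation

/// Hypothetical observation: when it ran and which fingerprints it saw.
struct Observation {
    let observedAt: Date
    let fingerprints: Set<String>
}

struct RecordVisibility: Equatable {
    let firstSeen: Date
    let lastSeen: Date
    /// First observation after `lastSeen`, which no longer contained the record;
    /// nil if the record is still visible in the latest observation.
    let disappearedAt: Date?
}

/// Walks observations chronologically; returns nil if the record was never seen.
func visibility(of fingerprint: String, in history: [Observation]) -> RecordVisibility? {
    let ordered = history.sorted { $0.observedAt < $1.observedAt }
    var firstSeen: Date?
    var lastSeenIndex: Int?
    for (index, observation) in ordered.enumerated()
    where observation.fingerprints.contains(fingerprint) {
        if firstSeen == nil { firstSeen = observation.observedAt }
        lastSeenIndex = index
    }
    guard let first = firstSeen, let lastIndex = lastSeenIndex else { return nil }
    let nextIndex = lastIndex + 1
    let disappearedAt = nextIndex < ordered.count ? ordered[nextIndex].observedAt : nil
    return RecordVisibility(firstSeen: first,
                            lastSeen: ordered[lastIndex].observedAt,
                            disappearedAt: disappearedAt)
}
```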
90
+
91
+### Q5: "Is a change probably consolidation rather than loss?"
92
+
93
+Method:
94
+1. Compare missing old records with newer interval or aggregate records over the same time window.
95
+2. Check value sums, date coverage, and source metadata.
96
+3. Report "consolidation likely" only when aggregate evidence supports it; otherwise report "uncertain."
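Step 1 needs a concrete notion of "the same time window". One way to quantify it, sketched here as an assumption, is the fraction of the disappeared records' total duration still covered by newer interval records; replacement intervals are merged first so overlaps are not counted twice. `mergedIntervals(_:)` and `coveredFraction(of:by:)` are hypothetical helpers, and a high fraction is only one input alongside the value-sum and source checks in step 2.

```swift
import Foundation

/// Merges overlapping or touching intervals into a sorted, disjoint list.
func mergedIntervals(_ intervals: [DateInterval]) -> [DateInterval] {
    let sorted = intervals.sorted { $0.start < $1.start }
    var merged: [DateInterval] = []
    for interval in sorted {
        if let last = merged.last, interval.start <= last.end {
            merged[merged.count - 1] = DateInterval(start: last.start,
                                                    end: max(last.end, interval.end))
        } else {
            merged.append(interval)
        }
    }
    return merged
}

/// Fraction (0...1) of the missing records' total duration covered by the
/// replacement intervals; nil when there is no duration to cover.
func coveredFraction(of missing: [DateInterval], by replacements: [DateInterval]) -> Double? {
    let total = missing.reduce(0.0) { $0 + $1.duration }
    guard total > 0 else { return nil }
    let cover = mergedIntervals(replacements)
    var covered: TimeInterval = 0
    for gap in missing {
        for block in cover {
            if let overlap = gap.intersection(with: block) {
                covered += overlap.duration
            }
        }
    }
    return covered / total
}
```

Instantaneous samples have zero duration, so a type made only of point samples would need a count- or sum-based check instead of this coverage measure.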
97
+
98
+## 5. Export Formats
99
+
100
+### JSON Diff Report
101
+
102
+```json
103
+{
104
+  "report_id": "DIFF_SYNTHETIC_001",
105
+  "exported_at": "2026-05-23T12:00:00Z",
106
+  "from_observation": "2026-04-23T12:00:00Z",
107
+  "to_observation": "2026-05-23T12:00:00Z",
108
+  "sample_type": "HKQuantityTypeIdentifierStepCount",
109
+  "summary": {
110
+    "appeared": 12,
111
+    "disappeared": 84,
112
+    "representation_changed": 6,
113
+    "aggregate_delta_percent": 0.1,
114
+    "label": "consolidation_likely"
115
+  },
116
+  "manifest_hash": "synthetic-example-hash"
117
+}
118
+```
119
+
120
+### CSV Point-In-Time Table
121
+
122
+```csv
123
+observation_at,type,start_date,end_date,value,unit,source_hash,record_hash
124
+2026-05-23T12:00:00Z,step_count,2026-01-01T00:00:00Z,2026-01-01T00:10:00Z,42,count,source_hash_example,record_hash_example
125
+```
126
+
127
+### Markdown Summary
128
+
129
+```markdown
130
+# HealthProbe Observation Diff
131
+
132
+Observation A: 2026-04-23 12:00 UTC
133
+Observation B: 2026-05-23 12:00 UTC
134
+
135
+## Summary
136
+- Appeared records: 12
137
+- Disappeared records: 84
138
+- Representation changes: 6
139
+- Aggregate delta: 0.1%
140
+- Interpretation: consolidation likely
141
+
142
+## Notes
143
+This report describes local HealthKit observations from one device.
144
+It does not infer cause or compare against another device as a source of truth.
145
+```
146
+
147
+## 6. Forensic Techniques Enabled
148
+
149
+**Timeline reconstruction:** Walk observation history and rebuild visible records for a selected date.
150
+
151
+**Representation-change analysis:** Compare record counts, intervals, value sums, maximum values, and coverage windows to distinguish detail thinning from meaningful aggregate drift.
152
+
153
+**Archive-backed export:** Export data from HealthProbe's local archive even when a current HealthKit query no longer exposes the same record-level detail.
154
+
155
+**Context correlation:** Place local changes near iOS/app/permission/iCloud-state events while avoiding causal claims.
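Timeline reconstruction can be sketched over per-record visibility spans. A record counts as visible at a selected date if HealthProbe had first seen it at or before that date and had not yet recorded it as disappeared. `RecordSpan` and `visibleRecords(at:in:)` are hypothetical; treating `disappearedAt` as an exclusive bound is an assumption, and a real implementation would filter in SQLite rather than in memory.

```swift
import Foundation

/// Hypothetical per-record visibility span derived from observation history.
struct RecordSpan {
    let recordHash: String
    let firstSeen: Date
    /// Exclusive bound: the first observation that no longer contained the record.
    let disappearedAt: Date?
}

/// Record hashes HealthProbe had observed as visible at `date`, sorted for stable output.
func visibleRecords(at date: Date, in spans: [RecordSpan]) -> [String] {
    spans
        .filter { span in
            span.firstSeen <= date && (span.disappearedAt.map { date < $0 } ?? true)
        }
        .map { $0.recordHash }
        .sorted()
}
```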
156
+
157
+## 7. Recovery-Compatible Export Boundary
158
+
159
+External tools may use HealthProbe exports to:
160
+- inspect what HealthProbe observed at a point in time;
161
+- compare exported records with backup/XML/database evidence;
162
+- build salvage workflows outside the app;
163
+- re-publish values into another system with explicit user consent.
164
+
165
+HealthProbe exports should therefore preserve:
166
+- values, dates, units, and type identifiers;
167
+- stable hashes and payload version hashes;
168
+- source/provenance hashes where available;
169
+- relationships where available and in scope;
170
+- observation history and manifest/item hashes.
171
+
172
+HealthProbe exports cannot guarantee:
173
+- original Apple/private database primary keys;
174
+- original HealthKit metadata that was never exposed to the app;
175
+- proof that a disappeared record was deleted by Apple;
176
+- lossless re-publication into HealthKit with original provenance.
177
+
178
+The iOS app remains read-only. Any backup transplant, HealthKit re-publication, or disaster-recovery procedure is external tooling, not an in-app feature.
179
+
180
+## 8. Recommended Usage
181
+
182
+### Individual Users
183
+
184
+1. Enable the data types that matter most.
185
+2. Let HealthProbe build an initial archive.
186
+3. Capture manually before/after OS updates, restores, migrations, or major app changes.
187
+4. Use the timeline to inspect old observations.
188
+5. Export selected views when a detail set matters.
189
+
190
+### Researchers And Support
191
+
192
+1. Work from scoped exports, not raw device databases.
193
+2. Treat HealthProbe evidence as local observation history.
194
+3. Avoid claiming root cause without additional Apple/system evidence.
195
+4. Prefer aggregate-neutral language: changed, appeared, disappeared, consolidated, uncertain.
196
+
197
+## 9. Future Enhancements
198
+
199
+- richer point-in-time query tools
200
+- improved consolidation heuristics
201
+- archive integrity audits and repair workflows
202
+- optional encrypted archive copy chosen by the user
203
+- macOS analysis of explicit HealthProbe exports
204
+- recovery-compatible archive/export manifests that external tools can use as input, without adding restore or re-publication features to the app
205
+
206
+## 10. Troubleshooting
207
+
208
+| Issue | Cause | Fix |
209
+|-------|-------|-----|
210
+| **No recent observations** | Background refresh disabled or app not opened | Open app and run manual capture |
211
+| **Some types missing** | HealthKit permission not granted or type unavailable | Review permissions and selected types |
212
+| **Large count drop shown** | Possible consolidation or query/permission change | Inspect diff and aggregate evidence before interpreting |
213
+| **Old detail unavailable** | HealthProbe did not observe it before aggregation | Only future observations can be preserved |
214
+| **Export too large** | High-frequency type selected over long interval | Narrow date/type filter |
215
+
216
+## 11. References
217
+
218
+- Apple HealthKit Framework: https://developer.apple.com/documentation/healthkit/
219
+- HKAnchoredObjectQuery: https://developer.apple.com/documentation/healthkit/hkanchoredobjectquery
220
+- Core Data: https://developer.apple.com/documentation/coredata
221
+- DearApple Issue #001: historical context for reported Apple Health data anomalies
222
+
223
+---
224
+
225
+*HealthProbe - A local time machine for HealthKit-accessible data.*
+154 -0
HealthProbe/Doc/01-product/MVP-Specification.md
@@ -0,0 +1,154 @@
1
+# HealthProbe iOS - Specification (MVP)
2
+
3
+**Version:** 1.5
4
+**Last Updated:** 2026-05-23
5
+**Status:** MVP iOS local Health DB Time Machine
6
+
7
+## Overview
8
+
9
+HealthProbe is a read-only iOS app that captures selected HealthKit data into a local archive so the user can revisit how their Health database looked at earlier observation dates.
10
+
11
+The MVP is not a cloud sync product and not a cross-device comparator. It is a single-device local timeline:
12
+- capture what HealthKit exposes today
13
+- preserve selected details before they are aggregated, consolidated, or no longer queryable
14
+- show what changed between local observations
15
+- export selected historical views and record tables
16
+
17
+The original motivation was data-loss detection. The current objective is broader and calmer: help the user understand the evolution of their local Health database over time. A record-count drop is a change to explain, not automatically an emergency.
18
+
19
+## Core Principles
20
+
21
+1. **Read-only with respect to HealthKit**
22
+   - Never modify or delete HealthKit data
23
+   - Only observe, archive, compare, and export
24
+
25
+2. **Single-device local timeline**
26
+   - Compare only observations captured on this device
27
+   - Do not infer correctness by comparing HealthKit databases from different devices
28
+
29
+3. **Local-first storage**
30
+   - HealthProbe data works without network access
31
+   - No HealthProbe CloudKit/iCloud sync for raw samples, digests, reports, or caches
32
+
33
+4. **Archive-first design**
34
+   - The local archive is the source of truth
35
+   - SQLite stores differential observations and performs large analyses
36
+   - Core Data stores UI/report/cache/settings/history data that can be rebuilt from the archive
37
+
38
+5. **Legacy-device support**
39
+   - Keep iOS 15-era Health collection devices in scope
40
+   - Do not require SwiftData for target architecture
41
+   - Simplify heavy visualizations on legacy devices while preserving capture, reporting, and export
42
+
43
+6. **Consolidation-aware interpretation**
44
+   - Apple Health may aggregate or rewrite older high-frequency records
45
+   - Count-only alerts are not reliable evidence of loss
46
+   - UI language should describe additions, removals, consolidation, and uncertainty
47
+
48
+7. **Differential storage**
49
+   - Do not append complete periodic snapshots for large datasets
50
+   - Store sample identities, payload versions, observation events/ranges, and aggregates
51
+
52
+## MVP Features
53
+
54
+### 1. Local HealthKit Capture
55
+
56
+Use:
57
+- `HKAnchoredObjectQuery` for incremental capture
58
+- `HKObserverQuery` as a prompt to refresh when iOS wakes the app
59
+- manual capture from the app UI
60
+
61
+Track initially:
62
+- workouts
63
+- steps and activity quantities
64
+- heart rate and other high-frequency quantities selected by the user
65
+- sleep and relevant category samples
66
+- additional types through a selected-type registry
67
+
68
+Persist differentially in the local SQLite archive where HealthKit exposes it:
69
+- sample UUID hash, type, start/end date, value, and unit
70
+- source and source revision metadata
71
+- HealthKit metadata dictionaries
72
+- device provenance exposed by HealthKit, subject to privacy redaction/hash policy
73
+- first-seen, last-seen, last-verified, disappeared-at observations
74
+- fingerprints for internal matching and explicit user exports
75
+- materialized per-type/per-day aggregates needed by reports and presentation
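Differential persistence can be sketched as applying one anchored-query page to a table of per-record lifespans: added samples create a row (or refresh `lastSeen` if the hash is already archived), and deleted objects, which carry only an identifier, set `disappearedAt`. `LifespanRow`, `AnchoredPage`, and `applyPage(_:at:to:)` are hypothetical; the dictionary stands in for SQLite rows keyed by sample UUID hash, and payload versioning and last-verified tracking are omitted.

```swift
import Foundation

/// Hypothetical lifespan row keyed by sample UUID hash.
struct LifespanRow: Equatable {
    var firstSeen: Date
    var lastSeen: Date
    var disappearedAt: Date?
}

/// Hypothetical, already-hashed content of one anchored-query result page.
struct AnchoredPage {
    var addedUUIDHashes: [String]
    var deletedUUIDHashes: [String]
}

/// Applies one page observed at `observedAt`. Rows not in the page are untouched,
/// so unchanged samples cost nothing and no full snapshot is appended.
func applyPage(_ page: AnchoredPage,
               at observedAt: Date,
               to archive: inout [String: LifespanRow]) {
    for hash in page.addedUUIDHashes {
        if archive[hash] != nil {
            archive[hash]?.lastSeen = observedAt
        } else {
            archive[hash] = LifespanRow(firstSeen: observedAt,
                                        lastSeen: observedAt,
                                        disappearedAt: nil)
        }
    }
    for hash in page.deletedUUIDHashes {
        // Deletions carry no payload, so only the timestamp is recorded.
        // Hashes never archived (deleted before first capture) are ignored.
        archive[hash]?.disappearedAt = observedAt
    }
}
```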
76
+
77
+### 2. Health DB Time Machine
78
+
79
+The app must let the user answer:
80
+- "What did my accessible HealthKit data look like on this observation date?"
81
+- "What changed between these two observations?"
82
+- "Which older details can HealthProbe still export even if HealthKit later consolidates them?"
83
+
84
+Core views:
85
+- observation timeline
86
+- point-in-time summary
87
+- per-type detail table
88
+- adjacent-observation diff
89
+- selected-record export preview
90
+
91
+On legacy/low-memory devices, heavy charts may be replaced with tables, summaries, and generated reports. Export/report correctness is more important than rich visualization.
92
+
93
+### 3. Change Explanation
94
+
95
+Changes are classified as observations, not accusations:
96
+- **Appeared:** record/fingerprint was absent before and present now
97
+- **Disappeared:** record/fingerprint was present before and absent now
98
+- **Changed representation:** aggregates remain similar but record granularity, intervals, timestamps, or values changed
99
+- **Consolidation likely:** old high-frequency detail appears thinned or intervalized while aggregate totals remain explainable
100
+- **Uncertain:** HealthKit/API/background constraints prevent confident classification
101
+
102
+Severity should be used sparingly. The MVP should prefer neutral labels and evidence summaries over alarm language.
103
+
104
+### 4. Exports
105
+
106
+Exports are a primary product feature. They preserve selected historical evidence and support external analysis.
107
+
108
+MVP exports:
109
+- selected point-in-time table as JSON/CSV
110
+- diff report between two observations
111
+- manifest with hashes and observation metadata
112
+- selected disappeared/appeared/changed records
113
+
114
+Routine full-database export is not an MVP goal. The archive itself is the local backup; user-facing exports are scoped to what the user is inspecting.
115
+
116
+Exports must stream or page from SQLite. The app must not materialize large high-frequency record sets in RAM before writing an export.
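A minimal sketch of the streaming requirement: rows are pulled page by page, and each page is written before the next one is fetched, so peak memory is bounded by the page size rather than the export size. `CSVRow`, `csvField(_:)`, `pointInTimeCSVHeader`, and `streamCSV(pageSize:fetchPage:write:)` are hypothetical. The header mirrors the point-in-time CSV example in `Forensics-Limitations.md`, quoting follows RFC 4180, and the `write` closure stands in for a file handle.

```swift
import Foundation

/// Hypothetical point-in-time row, already formatted as strings.
struct CSVRow {
    var fields: [String]
}

/// RFC 4180 quoting: wrap fields containing a comma, quote, CR, or LF; double inner quotes.
func csvField(_ raw: String) -> String {
    let needsQuoting = raw.unicodeScalars.contains { $0 == "," || $0 == "\"" || $0 == "\n" || $0 == "\r" }
    guard needsQuoting else { return raw }
    return "\"" + raw.replacingOccurrences(of: "\"", with: "\"\"") + "\""
}

let pointInTimeCSVHeader = ["observation_at", "type", "start_date", "end_date",
                            "value", "unit", "source_hash", "record_hash"]

/// Pulls pages until an empty page is returned, writing each line immediately.
/// Returns the number of data rows written.
@discardableResult
func streamCSV(pageSize: Int,
               fetchPage: (_ offset: Int, _ limit: Int) -> [CSVRow],
               write: (String) -> Void) -> Int {
    precondition(pageSize > 0, "pageSize must be positive")
    write(pointInTimeCSVHeader.map(csvField).joined(separator: ",") + "\r\n")
    var offset = 0
    while true {
        let page = fetchPage(offset, pageSize)
        if page.isEmpty { break }
        for row in page {
            write(row.fields.map(csvField).joined(separator: ",") + "\r\n")
        }
        offset += page.count
    }
    return offset
}
```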
117
+
118
+### 5. Context Logging
119
+
120
+Health/iCloud state may be logged as context only:
121
+- HealthKit permission changes
122
+- app version / iOS version
123
+- local capture start/end/failure
124
+- iCloud sign-in state if available
125
+
126
+Context logging must not become HealthProbe cloud sync and must not imply that iCloud state proves why a change happened.
127
+
128
+## Out Of Scope
129
+
130
+- HealthProbe CloudKit/iCloud sync
131
+- comparing snapshots from different devices
132
+- claiming Apple lost data solely because sample counts changed
133
+- predicting future loss
134
+- modifying HealthKit data
135
+- restoring by patching/transplanting Health database files into iOS backups
136
+- re-publishing archived samples into HealthKit as HealthProbe-owned replacement records
137
+- automatic upload or community reporting
138
+
139
+The archive/export format should still preserve enough structure for external recovery tools to use it as input.
140
+
141
+## Success Criteria
142
+
143
+| Objective | MVP Target |
144
+|-----------|------------|
145
+| Historical inspection | User can select an observation and inspect per-type counts, ranges, and records |
146
+| Change explanation | User can compare adjacent observations with neutral, consolidation-aware labels |
147
+| Preservation | User can export selected records/details that may no longer be available from HealthKit later |
148
+| Legacy support | Capture/report/export works on iOS 15-era devices with simplified visuals where needed |
149
+| Privacy | All HealthProbe data remains local unless the user explicitly exports a file |
150
+| Performance | High-frequency capture streams to the archive without blocking UI |
151
+
152
+---
153
+
154
+*HealthProbe iOS: a local time machine for HealthKit-accessible data.*
+510 -0
HealthProbe/Doc/01-product/Product-Specification.md
@@ -0,0 +1,510 @@
1
+# HealthProbe – Complete Specification & Motivations
2
+
3
+**Version:** 1.5
4
+**Status:** MVP (iOS local Health DB Time Machine)
5
+**Last Updated:** 2026-05-23
6
+
7
+---
8
+
9
+## 1. Executive Summary
10
+
11
+HealthProbe is a **local time machine for Apple HealthKit-accessible data**. It captures selected local HealthKit observations over time so the user can inspect how their Health database looked at a chosen date, understand what changed, and export details that may later be unavailable after aggregation, consolidation, pruning, or API limitations.
12
+
13
+**Core Problem:** Apple Health is not a stable forensic record store. The same user-visible health history may be represented differently over time: high-frequency samples can be aggregated, old records can become intervalized, exports can lose detail, and different Apple devices may expose different local HealthKit database states. A count drop is therefore not reliable proof of loss.
14
+
15
+**Solution:** HealthProbe incrementally captures selected HealthKit data into a robust local SQLite archive and presents a single-device observation timeline. The archive is differential and analysis-capable: it stores observations, sample identities, payload versions, visibility ranges/events, and materialized aggregates rather than recurring complete snapshot copies. Core Data is the target UI/reporting cache for expensive counts, summaries, report metadata, and display state; it is rebuildable and not the source of truth.
16
+
17
+---
18
+
19
+## 2. Motivation & Product History
20
+
21
+### 2.1 Original Trigger
22
+
23
+HealthProbe started after a user-observed mass disappearance of Apple Health detail. The first idea was simple: count records, compare snapshots, and warn the user when Apple Health appeared to "lose" data.
24
+
25
+That framing was useful for discovery but incomplete.
26
+
27
+### 2.2 What Changed
28
+
29
+Further observation showed that older HealthKit data can change representation without necessarily representing user-meaningful loss:
30
+- high-frequency samples may be thinned
31
+- old point samples may become interval samples
32
+- per-record values may change while daily/monthly aggregates remain explainable
33
+- HealthKit exports taken months apart may not contain the same record-level detail
34
+- different devices can expose different local HealthKit database states
35
+
36
+Because of this, record-by-record cross-device comparison is out of scope and count-only alerts create false alarms.
37
+
38
+### 2.3 Current Objective
39
+
40
+HealthProbe is now a single-device local Health DB Time Machine:
41
+- capture selected HealthKit-accessible data as it exists at observation time
42
+- reconstruct how the local Health database looked at a chosen date
43
+- show additions, removals, representation changes, and aggregate changes between observations
44
+- preserve local evidence that HealthKit may later aggregate or no longer export
45
+- export scoped historical views for personal backup, support, research, or external analysis
46
+
47
+### 2.4 Interpretation Model
48
+
49
+HealthProbe describes changes neutrally:
50
+- **Appeared:** a record/fingerprint is newly visible
51
+- **Disappeared:** a record/fingerprint is no longer visible
52
+- **Representation changed:** timestamps, intervals, values, or granularity changed
53
+- **Consolidation likely:** record detail decreased while aggregates remain explainable
54
+- **Uncertain:** available evidence cannot distinguish user action, HealthKit behavior, app permissions, background timing, or system state
55
+
56
+These classifications are evidence labels, not claims about Apple's intent or definitive proof of corruption.
57
+
58
+### 2.5 Why This Matters
59
+
60
+| Concern | Impact | HealthProbe Role |
61
+|---------|--------|-----------------|
62
+| **Historical detail is unstable** | Older HealthKit detail may later be aggregated or unavailable | Preserve selected observations locally |
63
+| **Counts mislead** | A sample-count drop may be consolidation, not loss | Explain changes with record and aggregate evidence |
64
+| **No built-in time travel** | Health.app shows current state, not prior database states | Provide local point-in-time views |
65
+| **Exports are time-sensitive** | Future exports may not contain old detail | Export selected evidence while it exists locally |
66
+| **Privacy of monitoring** | Health data is sensitive | Local-only archive, explicit user exports |
67
+
68
+---
69
+
70
+## 3. Core Architecture
71
+
72
+### 3.1 Design Principles
73
+
74
+1. **Read-only operations** (never modify HealthKit data)
75
+2. **Local-first** (full functionality without network)
76
+3. **Incremental queries** (efficient, avoid repeating work)
77
+4. **Single archive store** (do not split the forensic store per data type; cross-type relationships and shared metadata matter)
78
+5. **Auditability** (every observation logged, timestamped, reproducible)
79
+6. **Privacy by default** (no HealthProbe cloud sync; local storage remains under user control)
80
+7. **Time-machine capture** (selected data types are archived locally so prior HealthKit-accessible states can be revisited)
81
+8. **Single-device timeline** (snapshot comparisons stay within the local device chain; cross-device record comparison is not a product goal)
82
+9. **Consolidation-aware explanations** (record-count changes are described with uncertainty and aggregate context)
83
+10. **Legacy device support** (target iOS 15-era Health collection devices; avoid SwiftData as a required foundation)
84
+11. **SQL-first analysis** (large diffs and reports run inside SQLite using indexes, temporary tables, joins, and paged results)
85
+
86
+### 3.2 Threading Model
87
+
88
+```
89
+┌─────────────────────────────────────────┐
90
+│  Main Thread (UI)                       │
91
+│  - Display archive and capture status   │
92
+│  - Show timeline, diffs, exports        │
93
+│  - User interaction                     │
94
+└──────────────┬──────────────────────────┘
95
+               │
96
+               ├─ Delegate query results
97
+               │
98
+┌──────────────▼──────────────────────────┐
99
+│  Background Queue (HealthKit Queries)   │
100
+│  - HKAnchoredObjectQuery (efficient)    │
101
+│  - HKObserverQuery (reactive)           │
102
+│  - Observation comparisons              │
103
+│  - Change explanation logic             │
104
+└──────────────┬──────────────────────────┘
105
+               │
106
+               ├─ Write observations and change summaries
107
+               │
108
+┌──────────────▼──────────────────────────┐
109
+│  Local Archive Store                    │
110
+│  - Canonical HealthKit samples          │
111
+│  - Sources, devices, metadata           │
112
+│  - Cross-type relationships             │
113
+│  - Fingerprints and verification hashes │
114
+└──────────────┬──────────────────────────┘
115
+               │
116
+┌──────────────▼──────────────────────────┐
117
+│  Core Data UI/Report Cache              │
118
+│  - Precomputed counts/statistics        │
119
+│  - Visualization state and settings     │
120
+│  - Logs, history, report indexes        │
121
+└─────────────────────────────────────────┘
122
+```
123
+
124
+### 3.3 Storage Model
125
+
126
+**SQLite Archive Store (source of truth):**
127
+- one robust local database for all archived samples and analysis, not one archive per data type
128
+- differential observation storage, not recurring complete snapshot copies
129
+- normalized entities for samples, sample payload versions, workouts, sources, source revisions, devices, metadata, relationships, and observations
130
+- multiple fingerprints per sample: HealthKit UUID hash, strict fingerprint, semantic fingerprint, and fuzzy matching keys for export/backup reconciliation
131
+- append-only observation history (`firstSeen`, `lastSeen`, `lastVerified`, disappearance evidence)
132
+- visibility ranges/events so point-in-time reconstruction can be queried without duplicating every record per observation
133
+- materialized aggregates for expensive counts and report inputs
134
+- snapshot/observation-level and table-level hashes for integrity checks
135
+- SQL-first analysis using indexes, temporary tables, joins, CTEs, and streaming/paged result sets
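The spec names strict and semantic fingerprints without fixing their fields. The sketch below shows one plausible split, under stated assumptions: the strict key covers every captured identity and payload field, while the semantic key drops the UUID and source and rounds times to whole seconds and values to three decimals. The `ArchivedSample` type and the rounding choices are hypothetical; a real archive would hash these canonical strings (for example with SHA-256) rather than store them raw.

```swift
// Hypothetical, simplified view of an archived sample.
struct ArchivedSample {
    let uuid: String
    let typeIdentifier: String
    let sourceBundleID: String
    let startEpochMillis: Int64
    let endEpochMillis: Int64
    let value: Double
    let unit: String
}

/// Strict key: changes if any captured identity or payload field changes.
func strictKey(_ s: ArchivedSample) -> String {
    [s.uuid, s.typeIdentifier, s.sourceBundleID,
     String(s.startEpochMillis), String(s.endEpochMillis),
     String(s.value), s.unit].joined(separator: "|")
}

/// Semantic key: ignores UUID and source, truncates times to whole seconds and
/// rounds values to three decimals, so a re-synced copy of the same measurement can match.
func semanticKey(_ s: ArchivedSample) -> String {
    let startSec = s.startEpochMillis / 1000
    let endSec = s.endEpochMillis / 1000
    let roundedValue = (s.value * 1000).rounded() / 1000
    return [s.typeIdentifier, String(startSec), String(endSec),
            String(roundedValue), s.unit].joined(separator: "|")
}

// Same heart-rate reading seen with a different UUID, source, and sub-second timestamp.
let sampleA = ArchivedSample(uuid: "A", typeIdentifier: "HKQuantityTypeIdentifierHeartRate",
                             sourceBundleID: "com.example.watch", startEpochMillis: 1_700_000_000_100,
                             endEpochMillis: 1_700_000_000_100, value: 72.0, unit: "count/min")
let sampleB = ArchivedSample(uuid: "B", typeIdentifier: "HKQuantityTypeIdentifierHeartRate",
                             sourceBundleID: "com.example.phone", startEpochMillis: 1_700_000_000_300,
                             endEpochMillis: 1_700_000_000_300, value: 72.0001, unit: "count/min")
```

Keeping both keys lets reconciliation try exact identity first and fall back to semantic matching, with the fuzzy keys mentioned above as a further fallback.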
136
+
137
+**Core Data UI/Report Cache (derived/cache layer):**
138
+- settings and selected data types
139
+- import job state and progress
140
+- precomputed counts, temporal bins, display ranges, and summary statistics
141
+- audit log entries and report indexes
142
+- change summaries and links into the archive store
143
+
144
+Core Data cache rows must be rebuildable from the local archive store. If the two disagree, the SQLite archive wins. Current SwiftData models are legacy/prototype implementation details until this cache layer replaces them.
145
+
146
+There are no real deployments, only test installations. Existing prototype stores are disposable during the archive v2 refactor and do not require backward-compatible migration.
147
+
148
+### 3.4 Storage Architecture Decision
149
+
150
+HealthProbe keeps the database on-device. The current objectives do not require a server or external analytical engine for the iOS app. SQLite is the durable archive and analysis engine because it supports the operations needed for large local Health datasets: indexes, temporary tables, joins, CTEs, transactions, and paged/streaming reads.
151
+
152
+Core Data is the right destination for cached counts and UI/reporting state because:
153
+- it supports legacy iOS versions that SwiftData does not;
154
+- it is well-suited for bounded object graphs and presentation-ready summaries;
155
+- cached rows can be deleted and rebuilt from SQLite;
156
+- expensive counts can be persisted without forcing every screen/report to recalculate them.
157
+
158
+Core Data is not the right place for heavy archive analysis. Diffs across large observations, export selection, consolidation heuristics over high-frequency records, and record table pagination should run against SQLite and return bounded result pages or materialized summary rows.
159
+
160
+The target storage split is therefore:
161
+- `HealthProbeArchive.sqlite`: source of truth, differential observation storage, SQL analysis, export source;
162
+- Core Data cache store: rebuildable UI/report summaries, expensive counts, timeline rows, progress, settings, and export metadata.
163
+
164
+This resolves the product tension: the app remains usable on older Health collection devices while still allowing large-dataset analysis without loading entire snapshots into RAM.
165
+
166
+---
167
+
168
+## 4. Time Machine Features (MVP)
169
+
170
+### 4.1 Incremental Capture
171
+
172
+**Using `HKAnchoredObjectQuery`:**
173
+```
174
+Query pattern:
175
+├─ Initial query: anchor = nil (no anchor) → captures all existing data
176
+├─ Store anchor locally
177
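+├─ Periodic queries: anchor = stored → captures samples added and objects deleted since the anchor (samples are immutable, so an edit appears as a deletion plus an addition)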
178
+└─ Update anchor → efficient incremental updates
179
+```
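The anchor pattern can be modeled without HealthKit to make the contract concrete. In this sketch the anchor is a plain sequence number over an in-memory change log; real code would use `HKAnchoredObjectQuery`, which returns added samples, `HKDeletedObject` values, and a new opaque `HKQueryAnchor`. The `ChangeFeed` type is a hypothetical stand-in that mirrors only that shape.

```swift
// In-memory stand-in for HealthKit's change feed. The anchor here is a sequence
// number; HealthKit's HKQueryAnchor is opaque and must simply be stored and reused.
enum Change: Equatable {
    case added(uuid: String)
    case deleted(uuid: String)
}

struct ChangeFeed {
    private(set) var log: [(seq: Int, change: Change)] = []

    mutating func append(_ change: Change) {
        log.append((seq: log.count + 1, change: change))
    }

    /// Returns changes after `anchor` (nil = no anchor, i.e. everything) plus the new anchor.
    func changes(since anchor: Int?) -> (changes: [Change], newAnchor: Int) {
        let after = anchor ?? 0
        let newer = log.filter { $0.seq > after }
        return (newer.map { $0.change }, newer.last?.seq ?? after)
    }
}

var feed = ChangeFeed()
feed.append(.added(uuid: "s1"))
feed.append(.added(uuid: "s2"))

// Initial capture: no anchor, so every existing sample is returned.
let initialCapture = feed.changes(since: nil)
var storedAnchor = initialCapture.newAnchor

// Later, one sample is added and one is deleted.
feed.append(.added(uuid: "s3"))
feed.append(.deleted(uuid: "s1"))

// Incremental capture: only changes after the stored anchor are returned.
let incrementalCapture = feed.changes(since: storedAnchor)
storedAnchor = incrementalCapture.newAnchor
```

The anchor should be persisted only after the returned changes are committed to the archive, so a crash mid-write replays the same batch instead of skipping it.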
180
+
181
+**What triggers capture:**
182
+- App launch
183
+- Background refresh (HealthKit background delivery or app refresh; iOS decides when it actually runs)
184
+- User manually triggers capture
185
+- Every 12-24 hours (configurable)
186
+
187
+### 4.2 Tracked Sample Types (Extensible)
188
+
189
+| Type | Why Captured | Change Signal |
190
+|------|---------------|----------------|
191
+| **Workouts** | High-value user records | Appeared/disappeared records, metadata changes |
192
+| **Heart Rate** | High-frequency detail likely to be consolidated | Granularity changes, intervalization, aggregate drift |
193
+| **Activity Summary** | Auto-computed, depends on other types | Recalculation between observations |
194
+| **Steps** | Cumulative and often consolidated | Aggregate preservation vs record thinning |
195
+| **Sleep** | Frequently edited and reclassified | Stage/category representation changes |
196
+| **Blood Pressure** | Manual/clinical-style records | Point-in-time history and export preservation |
197
+| **Audio Exposure** | Often high-frequency/device-specific | Detail retention and later aggregation |
198
+
199
+### 4.3 Change Explanation Logic
200
+
201
+#### A. Point-In-Time Reconstruction
202
+```
203
+Input: observation timestamp T and selected sample type
204
+Use: archived records whose observation history makes them visible at T
205
+Output: table, aggregates, source breakdown, and manifest hash
206
+```
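A sketch of the visibility rule, assuming a hypothetical range representation: a record is visible from observation time `visibleFrom` up to, but not including, `visibleUntil`, with `nil` meaning still visible. Integer observation times stand in for timestamps.

```swift
// Hypothetical visibility range for one archived record.
struct VisibilityRange {
    let recordID: String
    let visibleFrom: Int
    let visibleUntil: Int?  // exclusive end; nil = still visible

    func contains(_ t: Int) -> Bool {
        visibleFrom <= t && (visibleUntil.map { t < $0 } ?? true)
    }
}

/// Records visible at observation time `t`. A record can own several ranges if it
/// disappeared and reappeared; it is counted once if any range covers `t`.
/// In SQLite this becomes an indexed predicate such as
///   visible_from <= :t AND (visible_until IS NULL OR visible_until > :t)
/// rather than a Swift loop.
func visibleRecords(at t: Int, in ranges: [VisibilityRange]) -> Set<String> {
    Set(ranges.filter { $0.contains(t) }.map { $0.recordID })
}

let ranges = [
    VisibilityRange(recordID: "hr-1", visibleFrom: 100, visibleUntil: 300),   // later consolidated away
    VisibilityRange(recordID: "hr-2", visibleFrom: 100, visibleUntil: nil),
    VisibilityRange(recordID: "hr-agg", visibleFrom: 300, visibleUntil: nil), // interval record that replaced hr-1
]
```

Storing ranges instead of per-observation copies is what keeps the archive differential: an unchanged record costs one row no matter how many observations see it.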
207
+
208
+#### B. Adjacent Observation Diff
209
+```
210
+Previous observation: S_prev
211
+Current observation:  S_now
212
+
213
+Appeared    = S_now - S_prev
214
+Disappeared = S_prev - S_now
215
+Retained    = S_now ∩ S_prev
216
+
217
+For retained semantic groups:
218
+  compare record count, interval length, value sum, value max, and source metadata
219
+```
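The set definitions above map directly onto Swift's `Set` operations. This small sketch treats each observation as a set of record fingerprints; for large observations the same logic would run in SQLite (for example as an anti-join, `LEFT JOIN ... WHERE other.key IS NULL`) instead of in memory.

```swift
/// Result of comparing two observations by record fingerprint.
struct ObservationDiff: Equatable {
    let appeared: Set<String>
    let disappeared: Set<String>
    let retained: Set<String>
}

/// Set-based diff mirroring Appeared / Disappeared / Retained.
func diff(previous: Set<String>, current: Set<String>) -> ObservationDiff {
    ObservationDiff(
        appeared: current.subtracting(previous),      // S_now - S_prev
        disappeared: previous.subtracting(current),   // S_prev - S_now
        retained: current.intersection(previous)      // S_now ∩ S_prev
    )
}

let sPrev: Set<String> = ["a", "b", "c"]
let sNow: Set<String> = ["b", "c", "d"]
let d = diff(previous: sPrev, current: sNow)
```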
220
+
221
+#### C. Consolidation Heuristic
222
+```
223
+IF old high-frequency records disappear
224
+AND newer interval records cover similar time ranges
225
+AND aggregate sums remain within tolerance
226
+THEN classify as "consolidation likely"
227
+ELSE classify as "changed/uncertain" with evidence
228
+```
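The heuristic can be sketched as a pure function over per-group evidence. `GroupEvidence`, `coverageThreshold`, and `sumTolerance` are hypothetical names, and the defaults are illustrative rather than calibrated; the spec leaves thresholds to settings. Sums fit cumulative types such as Steps; discrete types such as heart rate would need a different aggregate.

```swift
// Evidence labels from the change-explanation model.
enum ChangeLabel: Equatable {
    case consolidationLikely
    case changedUncertain(reason: String)
}

/// Hypothetical summary of one semantic group (e.g. one day of step samples).
struct GroupEvidence {
    let disappearedHighFrequencyCount: Int
    let disappearedSpanSeconds: Double     // time covered by the records that disappeared
    let newIntervalCoverageSeconds: Double // part of that span covered by new interval records
    let previousValueSum: Double
    let currentValueSum: Double
}

/// Applies the three IF conditions in order; any failure yields "changed/uncertain" with a reason.
func classify(_ e: GroupEvidence,
              coverageThreshold: Double = 0.9,
              sumTolerance: Double = 0.01) -> ChangeLabel {
    guard e.disappearedHighFrequencyCount > 0 else {
        return .changedUncertain(reason: "no high-frequency records disappeared")
    }
    guard e.disappearedSpanSeconds > 0,
          e.newIntervalCoverageSeconds / e.disappearedSpanSeconds >= coverageThreshold else {
        return .changedUncertain(reason: "new interval records do not cover the old range")
    }
    // Relative change of the aggregate; the floor avoids dividing by zero.
    let baseline = max(abs(e.previousValueSum), 1e-9)
    let relativeDelta = abs(e.currentValueSum - e.previousValueSum) / baseline
    guard relativeDelta <= sumTolerance else {
        return .changedUncertain(reason: "aggregate sum moved beyond tolerance")
    }
    return .consolidationLikely
}

let steps = GroupEvidence(disappearedHighFrequencyCount: 84, disappearedSpanSeconds: 86_400,
                          newIntervalCoverageSeconds: 86_400, previousValueSum: 10_000, currentValueSum: 9_990)
let drifted = GroupEvidence(disappearedHighFrequencyCount: 84, disappearedSpanSeconds: 86_400,
                            newIntervalCoverageSeconds: 86_400, previousValueSum: 10_000, currentValueSum: 8_000)
```

Returning the failing reason, rather than a bare "uncertain", keeps the output an evidence summary in line with the neutral-label policy.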
229
+
230
+#### D. Export Preservation
231
+```
232
+For selected records or diffs:
233
+  export archived details, observation metadata, hashes, and explanatory labels
234
+  never require a cloud round trip
235
+```
236
+
237
+---
238
+
239
+## 5. Context Logging
240
+
241
+HealthProbe does **not** sync its own archive through iCloud or CloudKit. Observed HealthKit databases can diverge between devices, and HealthProbe no longer attempts to compare snapshots from different devices. The product scope is one local observation timeline on the current device.
242
+
243
+Health/iCloud state is still useful as **context** for interpreting local changes, but it is not treated as proof of cause.
244
+
245
+### 5.1 Context Tracking
246
+
247
+**Observe HealthKit permission & coarse system context:**
248
+```swift
249
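+import Foundation
+import HealthKit
+let healthStore = HKHealthStore()
+let readTypes: Set<HKObjectType> = [HKObjectType.quantityType(forIdentifier: .stepCount)!]
+// Read-only request (toShare: nil). `success` only means the request completed; HealthKit
+// never reveals whether read access was granted or later revoked, so log it as context only.
+healthStore.requestAuthorization(toShare: nil, read: readTypes) { success, error in
+    // Record `success` and `error` in the audit trail; do not infer granted access.
+}
+
+// Monitor iCloud sign-in state as context only (nil token = no signed-in iCloud account).
+// Re-check when NSUbiquityIdentityDidChange fires and log changes for later correlation.
+let iCloudSignedIn = FileManager.default.ubiquityIdentityToken != nil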
256
+```
257
+
258
+**Capture lifecycle events:**
259
+- iCloud sign-in detected → log context and schedule a local archive verification pass
260
+- iCloud sign-out detected → note local-only mode
261
+- Device backup → not observable: iOS exposes no backup-start event to apps, so rely on scheduled captures instead of a pre-backup capture
262
+- App backgrounded/foregrounded → capture if needed
263
+
264
+### 5.2 Context Documentation
265
+
266
+**Audit trail entries:**
267
+```
268
+[2026-05-01 14:23:15] SYNC_STATE_CHANGE: iCloud enabled
269
+  - Previous: local-only
270
+  - Action: archive verification scheduled
271
+  - Result: no HealthProbe cloud sync performed
272
+
273
+[2026-05-01 14:24:02] CAPTURE_COMPLETED: local HealthKit observation
274
+  - Samples observed: 87 new, 3 no longer visible
275
+  - Representation changes: 2 groups
276
+  - Cause: not inferred
277
+
278
+[2026-05-01 16:15:00] CHANGE_SUMMARY: Historical record appeared
279
+  - Type: Workout
280
+  - Record date: 2024-03-15
281
+  - First observed by HealthProbe: 2026-05-01
282
+  - Label: appeared; cause unknown
283
+```
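A sketch of an audit entry type that renders the layout above. The field names are hypothetical, and timestamps are passed preformatted to keep the example free of date-formatting concerns. The capture helper fixes `Cause` to "not inferred" so the log cannot assert a reason.

```swift
/// Hypothetical audit entry matching the text layout above.
struct AuditEntry {
    let timestamp: String  // preformatted local time, e.g. "2026-05-01 14:24:02"
    let kind: String       // e.g. "CAPTURE_COMPLETED"
    let title: String
    let details: [(label: String, value: String)]

    func rendered() -> String {
        var lines = ["[\(timestamp)] \(kind): \(title)"]
        for detail in details {
            lines.append("  - \(detail.label): \(detail.value)")
        }
        return lines.joined(separator: "\n")
    }
}

/// Builds a capture entry; cause is never inferred from counts.
func captureCompletedEntry(timestamp: String, newCount: Int, goneCount: Int,
                           representationGroups: Int) -> AuditEntry {
    AuditEntry(timestamp: timestamp, kind: "CAPTURE_COMPLETED", title: "local HealthKit observation", details: [
        (label: "Samples observed", value: "\(newCount) new, \(goneCount) no longer visible"),
        (label: "Representation changes", value: "\(representationGroups) groups"),
        (label: "Cause", value: "not inferred"),
    ])
}
```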
284
+
285
+### 5.3 Background Monitoring
286
+
287
+**iOS Background Modes enabled:**
288
+- `fetch` (Background fetch) — periodic archive and context checks
289
+- `remote-notification` — not required for HealthProbe archive sync
290
+
291
+**Requested check frequency** (iOS decides when background work actually runs):
292
+- Min: 2 hours
293
+- Max: 24 hours
294
+- Adapts based on archive cost and user preference
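A minimal sketch of the adaptive interval, assuming a hypothetical boolean cost signal from the last capture: start from the user's preference, double after an expensive capture, and always stay inside the 2-24 hour window. The result is only the interval HealthProbe requests (for example as a basis for `BGAppRefreshTaskRequest.earliestBeginDate`); iOS may run background work later.

```swift
/// Documented bounds for requested background checks, in hours.
enum CheckWindow {
    static let minHours = 2.0
    static let maxHours = 24.0

    static func clamp(_ hours: Double) -> Double {
        min(max(hours, minHours), maxHours)
    }
}

/// Next requested interval in hours. `lastCaptureWasExpensive` is a hypothetical
/// cost signal (e.g. capture duration or rows written above some threshold).
func nextCheckIntervalHours(userPreferenceHours: Double,
                            previousHours: Double,
                            lastCaptureWasExpensive: Bool) -> Double {
    let base = CheckWindow.clamp(userPreferenceHours)
    guard lastCaptureWasExpensive else { return base }
    // Back off, but never below the user's preference or above the ceiling.
    return CheckWindow.clamp(max(previousHours, base) * 2)
}
```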
295
+
296
+---
297
+
298
+## 6. Local Archive, Reports & Forensics
299
+
300
+### 6.1 Local Archive Store
301
+
302
+The main backup artifact is the on-device archive store. It is populated incrementally from HealthKit and is not dependent on Apple Health ZIP exports or full encrypted iPhone backups.
303
+
304
+The archive must preserve as much HealthKit information as the API exposes:
305
+- sample UUID, type, start/end date, value, unit, and metadata
306
+- source, source revision, bundle identifier, product type, version/build if available
307
+- device fields exposed by `HKDevice`
308
+- relationships between workouts, samples, events, and other linked records where available
309
+- first-seen / last-seen / last-verified observations
310
+- fingerprints suitable for matching against Apple Health XML exports and extracted backup databases
311
+
312
+Captured data types are selected individually for performance and privacy, but all archived data is stored in **one schema** so later analysis can follow relationships between types.
313
+
314
+### 6.2 Reports and Point Exports
315
+
316
+HealthProbe does not need to optimize for routine complete exports. The local archive is the backup; point exports are the user-facing way to preserve or share a historical view.
317
+
318
+Export is scoped to what the user is inspecting:
319
+- point-in-time record tables
320
+- diff reports between two observations
321
+- point-in-time manifests and hashes
322
+- selected record sets needed for external analysis
323
+
324
+### 6.3 Forensic Query Examples
325
+
326
+**"What did my step data look like on March 1?"**
327
+```
328
+1. Select the March 1 observation
329
+2. Load archived visible records and aggregates for Steps
330
+3. Show counts, time range, daily totals, and source breakdown
331
+4. Offer JSON/CSV export for the selected view
332
+```
333
+
334
+**"What changed since the previous observation?"**
335
+```
336
+1. Compare adjacent local observations
337
+2. Group appeared, disappeared, retained, and representation-changed records
338
+3. Explain likely consolidation when aggregates remain stable
339
+```
340
+
341
+**"Can I still export detail that HealthKit no longer shows?"**
342
+```
343
+1. Search the local archive for the earlier observation
344
+2. Select the preserved records or diff set
345
+3. Export archived details with manifest hashes and observation metadata
346
+```
347
+
348
+---
349
+
350
+## 7. User-Facing Features
351
+
352
+### 7.1 Dashboard (iOS App)
353
+
354
+**Home Screen:**
355
+- **Latest Observation** — timestamp and capture quality
356
+- **Archive Coverage** — selected data types, date range, storage use
357
+- **Recent Changes** — neutral summary of appeared/disappeared/changed records
358
+- **Export Shortcuts** — selected observation or diff report
359
+
360
+**Detail Views:**
361
+- **Timeline** — local observations over time
362
+- **Observation Detail** — point-in-time tables and aggregates
363
+- **Diff Detail** — changes between two observations
364
+- **Audit Trail** — complete immutable log
365
+- **Archive Status** — current local archive health, last verification, selected data types
366
+
367
+**Settings:**
368
+- Capture frequency
369
+- Sample types to track
370
+- Change-label thresholds/tolerances
371
+- Local archive retention and report export options
372
+
373
+### 7.2 Notifications
374
+
375
+Notification-led alerting is not a current product objective. The app may later add reminders for scheduled capture or completed exports, but alerts about presumed data loss are explicitly out of scope.
376
+
377
+---
378
+
379
+## 8. Future Work Parking Lot
380
+
381
+Items in this section are not active product objectives. They require a separate scope decision before implementation.
382
+
383
+### 8.1 Better Reconstruction
384
+- richer point-in-time query language
385
+- improved consolidation heuristics
386
+- archive compaction without losing observation history
387
+
388
+### 8.2 External Analysis Tools
389
+- Analyze explicit HealthProbe exports outside the iOS app if needed for the DearApple investigation/article
390
+- Do not treat a macOS companion, community sharing, or open-source publication as committed product scope
391
+
392
+### 8.3 Recovery-Compatible Archives
393
+
394
+HealthProbe will not perform recovery workflows. It will not patch iOS backups, transplant Health database files, or re-publish archived values into HealthKit.
395
+
396
+However, HealthProbe archives and exports should be suitable input for external recovery/salvage procedures, including:
397
+
398
+1. **Backup transplant restoration outside the app**
399
+   - External tooling may use HealthProbe evidence alongside the reverse of the DearApple scratchpad HealthDB extraction/reinsertion workflow.
400
+   - Archive/export requirements: preserve source database identity where available, record identity, payload versions, dates, values, units, metadata, relationships, observation timestamps, and manifest hashes.
401
+
402
+2. **HealthKit re-publication outside the app**
403
+   - External tooling may choose to write missing values back through HealthKit as new app-owned samples.
404
+   - Archive/export requirements: preserve enough value/date/unit/type detail to recreate user-visible values, plus explicit provenance warnings because original source/device/sync metadata may be lost.
405
+
406
+Recovery compatibility is therefore an archive/export design requirement, not an in-app restore feature. The app remains read-only.
407
+
408
+---
409
+
410
+## 9. Technical Specifications
411
+
412
+### 9.1 Platform
413
+- **iOS 15.0+** (iOS 15 is the last major release for iPhone 6s-class devices, so this floor keeps those Health collection devices in scope)
414
+Legacy Apple Watch devices remain relevant as Health data sources paired to the target iPhone, but HealthProbe itself is scoped as an iOS app.
415
+
416
+### 9.2 Permissions Required
417
+- `HealthKit` — read-only access to specified types
418
+- `Background Modes` — "Background Fetch"
419
+
420
+### 9.3 Data Storage
421
+- **SQLite Archive Store:** canonical differential HealthKit observation archive and analysis engine (source of truth)
422
+- **Core Data:** derived UI/report/cache/settings/log/history store, rebuildable from SQLite
423
+- **No CloudKit sync:** HealthProbe data remains local unless the user exports a report or selected record table
424
+
425
+### 9.4 Performance
426
+- Query time: < 5 seconds per incremental anchored query (the initial full capture of high-frequency types may take longer)
427
+- UI/report cache size: bounded, rebuildable, and safe to purge
428
+- Archive storage: differential; depends on selected high-frequency data types and number of representation changes, not number of full periodic snapshots
429
+- Large analysis: runs in SQLite with paged results; Swift must not load full high-frequency datasets into RAM
430
+
431
+---
432
+
433
+## 10. Privacy & Security
434
+
435
+### 10.1 What HealthProbe Never Does
436
+- ❌ Uploads raw health samples to any cloud service
437
+- ❌ Identifies users by name/account
438
+- ❌ Shares device location or personal context
439
+- ❌ Modifies any HealthKit data
440
+- ❌ Patches, transplants, or rewrites iOS backup databases
441
+- ❌ Sells or shares data with third parties
442
+
443
+### 10.2 What HealthProbe Collects (Local Only)
444
+- ✅ Aggregated counts and per-sample archive data for user-selected types
445
+- ✅ Observation and change timestamps
446
+- ✅ Device model & iOS version (for context)
447
+- ✅ Change labels and evidence summaries
448
+
449
+**Local archive:**
450
+- ✅ Per-sample archive for user-selected types, stored on-device and exportable by user
451
+- ✅ Metadata needed for recognition in Apple Health XML exports, backup database extracts, and future datasets
452
+
453
+### 10.3 Cloud Policy
454
+- No HealthProbe CloudKit/iCloud sync
455
+- No cross-device HealthProbe snapshot comparison
456
+- No automatic upload of raw samples, digests, reports, or device fingerprints
457
+- User-triggered exports are explicit, scoped, and local-file based
458
+
459
+---
460
+
461
+## 11. Success Criteria
462
+
463
+| Objective | Metric | Target |
464
+|-----------|--------|--------|
465
+| **Time-machine inspection** | User can inspect a selected observation | All captured types |
466
+| **Change explanation** | Diffs include neutral labels and evidence | > 95% of visible changes classified or marked uncertain |
467
+| **Export preservation** | Selected historical records can be exported | JSON/CSV with manifest hashes |
468
+| **False alarms** | Count-only drops framed as critical loss | 0 by design |
469
+| **Privacy** | % of users comfortable with data practices | > 90% |
470
+| **Performance** | Background capture battery impact | < 2% drain/day |
471
+| **Reproducibility** | Users can preserve scoped evidence | High relevance for personal analysis and the DearApple investigation context |
472
+
473
+---
474
+
475
+## 12. References & Related Work
476
+
477
+- [DearApple Issue #001](https://github.com/overbog/dear-apple/issues/0001-apple-health-mass-data-loss.md) — historical context for reported Apple Health data anomalies
478
+- [Apple HealthKit Documentation](https://developer.apple.com/documentation/healthkit/)
479
+- [HKAnchoredObjectQuery](https://developer.apple.com/documentation/healthkit/hkanchoredobjectquery) — Efficient incremental queries
480
+
481
+---
482
+
483
+## Appendix A: Example Diff Export
484
+
485
+```json
486
+{
487
+  "report_id": "DIFF_20260501_001",
488
+  "type": "observation_diff",
489
+  "exported_at": "2026-05-01T14:35:22Z",
490
+  "from_observation": "2026-04-01T08:00:00Z",
491
+  "to_observation": "2026-05-01T08:00:00Z",
492
+  "evidence": {
493
+    "sample_type": "HKQuantityTypeIdentifierStepCount",
494
+    "appeared": 12,
495
+    "disappeared": 84,
496
+    "representation_changed": 6,
497
+    "aggregate_delta": {
498
+      "value_sum_percent": 0.1,
499
+      "covered_days": 31
500
+    },
501
+    "label": "consolidation_likely",
502
+    "cause": "not inferred"
503
+  },
504
+  "manifest_hash": "synthetic-example-hash"
505
+}
506
+```
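The example export maps onto `Codable` types with snake_case key conversion and ISO 8601 dates. This sketch infers field types from the sample JSON; they are assumptions, and `label` stays a `String` here although an enum of the documented labels would be stricter.

```swift
import Foundation

// Codable mirror of the example diff export; types are inferred from the sample JSON.
struct AggregateDelta: Codable, Equatable {
    let valueSumPercent: Double
    let coveredDays: Int
}

struct DiffEvidence: Codable, Equatable {
    let sampleType: String
    let appeared: Int
    let disappeared: Int
    let representationChanged: Int
    let aggregateDelta: AggregateDelta
    let label: String
    let cause: String
}

struct DiffExport: Codable, Equatable {
    let reportId: String   // "report_id" via snake_case conversion
    let type: String
    let exportedAt: Date
    let fromObservation: Date
    let toObservation: Date
    let evidence: DiffEvidence
    let manifestHash: String
}

func makeDecoder() -> JSONDecoder {
    let decoder = JSONDecoder()
    decoder.keyDecodingStrategy = .convertFromSnakeCase
    decoder.dateDecodingStrategy = .iso8601
    return decoder
}

func makeEncoder() -> JSONEncoder {
    let encoder = JSONEncoder()
    encoder.keyEncodingStrategy = .convertToSnakeCase
    encoder.dateEncodingStrategy = .iso8601
    // Sorted keys give byte-stable output, which matters when the file is hashed for a manifest.
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    return encoder
}
```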
507
+
508
+---
509
+
510
+*HealthProbe — A local time machine for your Health database.*
+188 -0
HealthProbe/Doc/02-architecture/Core-Data-Cache-Design.md
@@ -0,0 +1,188 @@
1
+# HealthProbe - Core Data Cache Design
2
+
3
+**Last Updated:** 2026-05-23
4
+**Status:** Target design for UI/report cache
5
+
6
+## 1. Purpose
7
+
8
+Core Data is not the forensic archive. It is the bounded, UI-friendly store for values already derived from the SQLite archive.
9
+
10
+Use Core Data for:
11
+- observation rows shown in timelines;
12
+- type summaries and expensive counts;
13
+- daily/monthly aggregate display rows;
14
+- diff summary rows;
15
+- export history/status rows;
16
+- archive health/status rows;
17
+- local app state and settings that are not forensic evidence.
18
+
19
+Do not use Core Data for:
20
+- the only copy of HealthKit samples;
21
+- raw record payload history;
22
+- relationship evidence;
23
+- point-in-time reconstruction truth;
24
+- large record tables.
25
+
26
+SQLite wins on disagreement.
27
+
28
+## 2. Store Categories
29
+
30
+Core Data may contain two categories of entities.
31
+
32
+**Rebuildable cache entities.**
33
+Can be deleted and rebuilt from SQLite:
34
+- `CachedObservationRow`;
35
+- `CachedTypeSummary`;
36
+- `CachedDailyAggregate`;
37
+- `CachedDiffSummary`;
38
+- `CachedExportManifest`;
39
+- `CachedArchiveHealth`.
40
+
41
+**Local app state/settings.**
42
+Not forensic, not necessarily rebuildable:
43
+- selected type preferences;
44
+- UI display preferences;
45
+- last opened screen/state;
46
+- feature flags for legacy-device UI simplification.
47
+
48
+Cache rebuild must not delete settings unless the user explicitly resets the app.
49
+
50
+## 3. Entity Contracts
51
+
52
+### CachedObservationRow
53
+
54
+Purpose: timeline/list display.
55
+
56
+Required fields:
57
+- `observationID`;
58
+- `observedAt`;
59
+- `status`;
60
+- `triggerReason`;
61
+- `timeZoneIdentifier`;
62
+- `trackedTypeCount`;
63
+- `visibleRecordCount`;
64
+- `appearedCount`;
65
+- `disappearedCount`;
66
+- `representationChangedCount`;
67
+- `archiveSchemaVersion`;
68
+- `cacheSchemaVersion`;
69
+- `sourceAggregateHash`;
70
+- `computedAt`.
71
+
72
+### CachedTypeSummary
73
+
74
+Purpose: per-observation/per-type summary cards and reports.
75
+
76
+Required fields:
77
+- `observationID`;
78
+- `sampleTypeIdentifier`;
79
+- `displayName`;
80
+- `visibleRecordCount`;
81
+- `appearedCount`;
82
+- `disappearedCount`;
83
+- `representationChangedCount`;
84
+- `earliestStartDate`;
85
+- `latestEndDate`;
86
+- `valueSum`;
87
+- `valueMax`;
88
+- `aggregateHash`;
89
+- `computedAt`.
90
+
91
+### CachedDailyAggregate
92
+
93
+Purpose: charts and report tables.
94
+
95
+Required fields:
96
+- `observationID`;
97
+- `sampleTypeIdentifier`;
98
+- `bucketStart`;
99
+- `bucketEnd`;
100
+- `timeZoneIdentifier`;
101
+- `visibleRecordCount`;
102
+- `valueSum`;
103
+- `valueMax`;
104
+- `sourceRevisionDisplayHash`;
105
+- `aggregateHash`;
106
+- `computedAt`.
107
+
108
+### CachedDiffSummary
109
+
110
+Purpose: observation comparison list/detail.
111
+
112
+Required fields:
113
+- `fromObservationID`;
114
+- `toObservationID`;
115
+- `sampleTypeIdentifier`;
116
+- `appearedCount`;
117
+- `disappearedCount`;
118
+- `representationChangedCount`;
119
+- `consolidationLikely`;
120
+- `uncertaintyReason`;
121
+- `sourceAggregateHash`;
122
+- `computedAt`.
123
+
124
+### CachedExportManifest
125
+
126
+Purpose: export history/status display.
127
+
128
+Required fields:
129
+- `exportID`;
130
+- `exportKind`;
131
+- `createdAt`;
132
+- `fromObservationID`;
133
+- `toObservationID`;
134
+- `filterSummary`;
135
+- `recordCount`;
136
+- `manifestHash`;
137
+- `fileURLBookmarkData`;
138
+- `status`;
139
+- `computedAt`.
140
+
141
+### CachedArchiveHealth
142
+
143
+Purpose: archive status screen.
144
+
145
+Required fields:
146
+- `archiveSchemaVersion`;
147
+- `cacheSchemaVersion`;
148
+- `lastIntegrityCheckAt`;
149
+- `lastIntegrityStatus`;
150
+- `lastErrorKind`;
151
+- `lastErrorMessageHash`;
152
+- `cacheBuildID`;
153
+- `computedAt`.
154
+
155
+## 4. Invalidation
156
+
157
+Invalidate/rebuild cache rows when:
158
+- archive schema version changes;
159
+- archive reset/reinitialization occurs;
160
+- selected type registry changes;
161
+- a new observation commits in SQLite;
162
+- aggregate hashes change;
163
+- cache schema version changes.
164
+
165
+Rebuild order:
166
+1. archive health/status;
167
+2. observation rows;
168
+3. type summaries;
169
+4. daily/monthly aggregates;
170
+5. diff summaries;
171
+6. export status rows.
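Partial rebuild needs a mapping from invalidation triggers to affected cache tables, applied in the documented order. The trigger-to-table mapping below is a hypothetical example: schema changes, resets, and type-registry changes rebuild everything, while narrower triggers skip tables they cannot affect.

```swift
/// Cache tables in the documented rebuild order; raw values encode the order.
enum CacheTable: Int, CaseIterable, Comparable {
    case archiveHealth = 1, observationRows, typeSummaries, aggregates, diffSummaries, exportStatus

    // Raw-value enums do not get synthesized Comparable, so compare by order explicitly.
    static func < (lhs: CacheTable, rhs: CacheTable) -> Bool { lhs.rawValue < rhs.rawValue }
}

/// Invalidation triggers from the list above.
enum InvalidationTrigger {
    case archiveSchemaChanged, archiveReset, cacheSchemaChanged, selectedTypesChanged
    case observationCommitted
    case aggregateHashChanged
}

/// Returns the tables to rebuild, deduplicated and sorted into rebuild order.
func rebuildPlan(for triggers: [InvalidationTrigger]) -> [CacheTable] {
    var affected = Set<CacheTable>()
    for trigger in triggers {
        switch trigger {
        case .archiveSchemaChanged, .archiveReset, .cacheSchemaChanged, .selectedTypesChanged:
            affected.formUnion(CacheTable.allCases)
        case .observationCommitted:
            affected.formUnion([CacheTable.archiveHealth, .observationRows, .typeSummaries, .aggregates, .diffSummaries])
        case .aggregateHashChanged:
            affected.formUnion([CacheTable.typeSummaries, .aggregates, .diffSummaries])
        }
    }
    return affected.sorted()
}
```

Sorting the union, rather than rebuilding per trigger, means several triggers arriving together still rebuild each table once and in dependency order.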
172
+
173
+Partial rebuild is allowed when SQLite can identify affected observations/types. Full rebuild must remain available for repair and tests.
174
+
175
+## 5. Legacy Device Mode
176
+
177
+Legacy or low-memory UI should still use the same Core Data cache. It may reduce:
178
+- chart density;
179
+- default date range;
180
+- preview row count;
181
+- simultaneous loaded detail panes.
182
+
183
+It must preserve:
184
+- capture;
185
+- cached summaries;
186
+- report generation;
187
+- paged SQLite detail/export access.
188
+
+777 -0
HealthProbe/Doc/02-architecture/Database-Design.md
@@ -0,0 +1,777 @@
1
+# HealthProbe - Database Design
2
+
3
+**Version:** 1.1
4
+**Last Updated:** 2026-05-23
5
+**Status:** Canonical database/storage design
6
+
7
+## 1. Purpose
8
+
9
+The database is the central piece of HealthProbe. The app can only reconstruct, analyze, export, and explain HealthKit history if the archive is complete, correct, queryable, and stable across product changes.
10
+
11
+UI can be refactored cheaply. A wrong archive design can permanently lose evidence, make large analyses impossible on low-end devices, or prevent future recovery-compatible exports. All storage work must start from this document.
12
+
13
+## 2. Non-Negotiable Requirements
14
+
15
+1. **SQLite archive is the source of truth.**
16
+   Core Data is a rebuildable cache. SwiftData is legacy/prototype only.
17
+
18
+2. **Store differentially.**
19
+   Do not append recurring complete snapshots of large HealthKit datasets. Store identities, payload versions, observation events/ranges, and aggregates.
20
+
21
+3. **Analyze in SQL, not RAM.**
22
+   Diffs, counts, point-in-time reconstruction, export selection, and consolidation heuristics must use SQLite indexes, joins, CTEs, temporary tables, and paged/streaming results.
23
+
24
+4. **Support legacy devices.**
25
+   The target includes iOS 15-era devices, such as iPhone 6s-class hardware used for Health collection. Do not require SwiftData, which needs iOS 17 or later.
26
+
27
+5. **Preserve recovery-compatible structure.**
28
+   The app will not restore or re-publish data, but archives/exports must preserve enough identity, payload, provenance, relationships, hashes, and observation history for external recovery/salvage tooling.
29
+
30
+6. **Never treat counts as sufficient truth.**
31
+   Counts are cached for reports/UI, but record identity, payload versions, visibility history, and aggregate context are required for interpretation.
32
+
33
+7. **No real personal data in repository artifacts.**
34
+   Database fixtures, docs, tests, and examples must use synthetic values only.
35
+
36
+## 3. Storage Layers
37
+
38
+### 3.1 SQLite Archive / Analysis Database
39
+
40
+`HealthProbeArchive.sqlite`
41
+
42
+Responsibilities:
43
+- canonical HealthKit observation history;
44
+- sample identity and payload versioning;
45
+- source/device/metadata/relationship preservation;
46
+- point-in-time reconstruction;
47
+- adjacent and selected-observation diffs;
48
+- consolidation heuristics;
49
+- materialized aggregates;
50
+- streaming/paged exports;
51
+- integrity manifests and future schema migrations.
52
+
53
+This database must be queryable without loading high-frequency datasets into Swift arrays.
54
+
55
+### 3.2 Core Data UI / Report Cache
56
+
57
+Core Data cache store.
58
+
59
+Responsibilities:
60
+- expensive counts already computed from SQLite;
61
+- observation list rows;
62
+- dashboard/timeline summaries;
63
+- per-type summary rows;
64
+- report/export metadata;
65
+- app settings and lightweight UI state.
66
+
67
+Rules:
68
+- cache rows are disposable;
69
+- cache rows must be rebuildable from SQLite;
70
+- if Core Data and SQLite disagree, SQLite wins;
71
+- Core Data must not contain the only copy of any record-level evidence.
72
+
73
+### 3.3 SwiftData Legacy Store
74
+
75
+Current SwiftData models are a prototype implementation detail. New storage work should not expand them.
76
+
77
+There are no real deployments, only test installs. During the archive v2 refactor, old SwiftData stores and prototype SQLite archives may be ignored, deleted, or reinitialized. Do not build backward compatibility or one-way import for the old prototype schema unless a later product decision explicitly changes this policy.
78
+
79
+## 4. Conceptual Model
80
+
81
+### Observation
82
+
83
+An observation is one local capture attempt/result at a specific time on the current device chain. It is not a full copy of all visible records.
84
+
85
+An observation records:
86
+- when capture started/ended;
87
+- app/schema/OS context;
88
+- timezone context at observation time;
89
+- selected type registry;
90
+- per-type capture quality;
91
+- HealthKit anchors;
92
+- events and aggregate changes observed during the capture.
93
+
94
+### Terminology
95
+
96
+- **Capture**: the act of querying HealthKit and writing results to the archive.
97
+- **Observation**: the durable archive record created by a capture attempt.
98
+- **Snapshot**: a reconstructed view of records visible at a selected observation. Do not store snapshot copies for high-volume data.
99
+- **Diff**: SQL-derived comparison between two observations on the same local device chain.
100
+
101
+### Sample Identity
102
+
103
+A sample identity is the stable record or semantic record HealthProbe tracks over time.
104
+
105
+Identity inputs may include:
106
+- HealthKit UUID hash when available;
107
+- strict fingerprint;
108
+- semantic fingerprint;
109
+- sample type;
110
+- date range;
111
+- value/unit/category/workout fields;
112
+- source revision where relevant.
113
+
114
+HealthKit UUID hash is important but not enough for every future use case. Apple exports and backup database extracts may require semantic/fuzzy matching.
115
+
116
+### Sample Version
117
+
118
+A sample version is the payload representation observed for a sample identity.
119
+
120
+A new version is created only when the representation changes:
121
+- start/end dates;
122
+- value/unit/category/workout fields;
123
+- source revision;
124
+- metadata hash;
125
+- related sample/workout/event links.
126
+
127
+### Visibility/Event History
128
+
129
+HealthProbe stores visibility as events and/or compressed ranges:
130
+- appeared;
131
+- verified/seen;
132
+- disappeared/no longer visible;
133
+- representation changed;
134
+- deleted-object evidence where HealthKit exposes `HKDeletedObject`.
135
+
136
+This allows point-in-time reconstruction without duplicating every visible record into every observation.
137
+
138
+### Aggregate Cache In SQLite
139
+
140
+SQLite stores materialized aggregates because many reports and screens need expensive counts/sums repeatedly.
141
+
142
+Aggregates are archive-derived evidence, not the source of truth. They must be rebuildable from sample/version/event tables.
143
+
144
+## 5. Target SQLite Schema
145
+
146
+Exact names may evolve, but the shape and constraints should remain.
147
+
148
+### 5.1 Schema And Metadata
149
+
150
+```sql
151
+CREATE TABLE schema_migrations (
152
+    version INTEGER PRIMARY KEY,
153
+    applied_at REAL NOT NULL,
154
+    description TEXT NOT NULL
155
+);
156
+
157
+CREATE TABLE archive_metadata (
158
+    key TEXT PRIMARY KEY,
159
+    value TEXT NOT NULL
160
+);
161
+```
162
+
163
+### 5.2 Device Chain And Observations
164
+
165
+```sql
166
+CREATE TABLE device_chains (
167
+    id INTEGER PRIMARY KEY,
168
+    device_chain_hash TEXT NOT NULL UNIQUE,
169
+    created_at REAL NOT NULL,
170
+    recovered_from_keychain INTEGER NOT NULL DEFAULT 0
171
+);
172
+
173
+CREATE TABLE observations (
174
+    id INTEGER PRIMARY KEY,
175
+    device_chain_id INTEGER NOT NULL REFERENCES device_chains(id),
176
+    observed_at REAL NOT NULL,
177
+    started_at REAL,
178
+    ended_at REAL,
179
+    status TEXT NOT NULL,
180
+    trigger_reason TEXT NOT NULL,
181
+    app_version TEXT,
182
+    os_version TEXT,
183
+    time_zone_identifier TEXT,
184
+    time_zone_seconds_from_gmt INTEGER,
185
+    schema_version INTEGER NOT NULL,
186
+    selected_type_set_hash TEXT,
187
+    notes TEXT
188
+);
189
+
190
+CREATE INDEX idx_observations_device_time
191
+ON observations(device_chain_id, observed_at);
192
+```
193
+
194
+### 5.3 Per-Type Capture Runs And Anchors
195
+
196
+```sql
197
+CREATE TABLE sample_types (
198
+    id INTEGER PRIMARY KEY,
199
+    type_identifier TEXT NOT NULL UNIQUE,
200
+    display_name TEXT,
201
+    category TEXT
202
+);
203
+
204
+CREATE TABLE observation_type_runs (
205
+    id INTEGER PRIMARY KEY,
206
+    observation_id INTEGER NOT NULL REFERENCES observations(id),
207
+    sample_type_id INTEGER NOT NULL REFERENCES sample_types(id),
208
+    status TEXT NOT NULL,
209
+    started_at REAL,
210
+    ended_at REAL,
211
+    anchor_before BLOB,
212
+    anchor_after BLOB,
213
+    inserted_event_count INTEGER NOT NULL DEFAULT 0,
214
+    deleted_event_count INTEGER NOT NULL DEFAULT 0,
215
+    verified_visible_count INTEGER,
216
+    error_kind TEXT,
217
+    error_message_hash TEXT,
218
+    UNIQUE(observation_id, sample_type_id)
219
+);
220
+
221
+CREATE INDEX idx_type_runs_type_observation
222
+ON observation_type_runs(sample_type_id, observation_id);
223
+```
224
+
225
+### 5.4 Sources, Devices, Metadata
226
+
227
+```sql
228
+CREATE TABLE sources (
229
+    id INTEGER PRIMARY KEY,
230
+    source_name_hash TEXT,
231
+    bundle_identifier TEXT
232
+);
233
+
234
+CREATE TABLE source_revisions (
235
+    id INTEGER PRIMARY KEY,
236
+    source_id INTEGER NOT NULL REFERENCES sources(id),
237
+    product_type TEXT,
238
+    version TEXT,
239
+    operating_system_version TEXT,
240
+    UNIQUE(source_id, product_type, version, operating_system_version)
241
+);
242
+
243
+CREATE TABLE hk_devices (
244
+    id INTEGER PRIMARY KEY,
245
+    device_hash TEXT,
246
+    manufacturer_hash TEXT,
247
+    model TEXT,
248
+    hardware_version TEXT,
249
+    firmware_version TEXT,
250
+    software_version TEXT,
251
+    local_identifier_hash TEXT,
252
+    udi_hash TEXT
253
+);
254
+
255
+CREATE TABLE metadata_blobs (
256
+    id INTEGER PRIMARY KEY,
257
+    metadata_hash TEXT NOT NULL UNIQUE,
258
+    metadata_json TEXT NOT NULL
259
+);
260
+```
261
+
262
+Privacy note: raw personal/device identifiers should be hashed or omitted according to policy. Store enough provenance for local analysis and recovery-compatible exports without leaking identifiers into logs or repository fixtures.
263
+
264
+### 5.5 Samples And Payload Versions
265
+
266
+```sql
267
+CREATE TABLE samples (
268
+    id INTEGER PRIMARY KEY,
269
+    sample_type_id INTEGER NOT NULL REFERENCES sample_types(id),
270
+    sample_uuid_hash TEXT,
271
+    strict_fingerprint TEXT NOT NULL,
272
+    semantic_fingerprint TEXT,
273
+    fuzzy_key TEXT,
274
+    first_seen_observation_id INTEGER NOT NULL REFERENCES observations(id),
275
+    first_seen_at REAL NOT NULL,
276
+    UNIQUE(sample_type_id, strict_fingerprint)
277
+);
278
+
279
+CREATE INDEX idx_samples_uuid_hash
280
+ON samples(sample_uuid_hash);
281
+
282
+CREATE INDEX idx_samples_type_semantic
283
+ON samples(sample_type_id, semantic_fingerprint);
284
+
285
+CREATE TABLE sample_versions (
286
+    id INTEGER PRIMARY KEY,
287
+    sample_id INTEGER NOT NULL REFERENCES samples(id),
288
+    payload_hash TEXT NOT NULL,
289
+    start_date REAL NOT NULL,
290
+    end_date REAL NOT NULL,
291
+    value_kind TEXT,
292
+    numeric_value REAL,
293
+    unit TEXT,
294
+    category_value INTEGER,
295
+    workout_activity_type INTEGER,
296
+    duration_seconds REAL,
297
+    source_revision_id INTEGER REFERENCES source_revisions(id),
298
+    hk_device_id INTEGER REFERENCES hk_devices(id),
299
+    metadata_id INTEGER REFERENCES metadata_blobs(id),
300
+    created_observation_id INTEGER NOT NULL REFERENCES observations(id),
301
+    UNIQUE(sample_id, payload_hash)
302
+);
303
+
304
+CREATE INDEX idx_sample_versions_sample
305
+ON sample_versions(sample_id);
306
+
307
+CREATE INDEX idx_sample_versions_time
308
+ON sample_versions(start_date, end_date);
309
+```
310
+
311
+### 5.6 Observation Events And Visibility Ranges
312
+
313
+```sql
314
+CREATE TABLE sample_observation_events (
315
+    id INTEGER PRIMARY KEY,
316
+    observation_id INTEGER NOT NULL REFERENCES observations(id),
317
+    sample_id INTEGER NOT NULL REFERENCES samples(id),
318
+    version_id INTEGER REFERENCES sample_versions(id),
319
+    event_kind TEXT NOT NULL,
320
+    observed_at REAL NOT NULL,
321
+    evidence_kind TEXT,
322
+    UNIQUE(observation_id, sample_id, event_kind)
323
+);
324
+
325
+CREATE INDEX idx_events_observation_kind
326
+ON sample_observation_events(observation_id, event_kind);
327
+
328
+CREATE INDEX idx_events_sample
329
+ON sample_observation_events(sample_id, observation_id);
330
+
331
+CREATE TABLE sample_visibility_ranges (
332
+    sample_id INTEGER NOT NULL REFERENCES samples(id),
333
+    version_id INTEGER NOT NULL REFERENCES sample_versions(id),
334
+    first_observation_id INTEGER NOT NULL REFERENCES observations(id),
335
+    last_observation_id INTEGER REFERENCES observations(id),
336
+    first_seen_at REAL NOT NULL,
337
+    last_seen_at REAL,
338
+    PRIMARY KEY (sample_id, version_id, first_observation_id)
339
+);
340
+
341
+CREATE INDEX idx_visibility_open_ranges
342
+ON sample_visibility_ranges(last_observation_id);
343
+
344
+CREATE INDEX idx_visibility_point_lookup
345
+ON sample_visibility_ranges(first_observation_id, last_observation_id);
346
+```
347
+
348
+Range convention:
349
+- `last_observation_id IS NULL` means still visible at the latest verified observation for that type;
350
+- closed ranges represent observations where the sample/version was visible;
351
+- deleted-object evidence should create an event even when full payload is not available.
352
+
353
+### 5.7 Relationships
354
+
355
+```sql
356
+CREATE TABLE sample_relationships (
357
+    id INTEGER PRIMARY KEY,
358
+    observation_id INTEGER REFERENCES observations(id),
359
+    source_sample_id INTEGER NOT NULL REFERENCES samples(id),
360
+    target_sample_id INTEGER NOT NULL REFERENCES samples(id),
361
+    relationship_kind TEXT NOT NULL,
362
+    metadata_id INTEGER REFERENCES metadata_blobs(id),
363
+    UNIQUE(observation_id, source_sample_id, target_sample_id, relationship_kind)
364
+);
365
+
366
+CREATE INDEX idx_relationship_source
367
+ON sample_relationships(source_sample_id, relationship_kind);
368
+
369
+CREATE INDEX idx_relationship_target
370
+ON sample_relationships(target_sample_id, relationship_kind);
371
+```
372
+
373
+Relationships are required for recovery-compatible archives. Even if iOS HealthKit exposes limited relationships, the schema must not prevent future preservation.
374
+
375
+### 5.8 Materialized Aggregates
376
+
377
+```sql
378
+CREATE TABLE observation_type_summaries (
379
+    observation_id INTEGER NOT NULL REFERENCES observations(id),
380
+    sample_type_id INTEGER NOT NULL REFERENCES sample_types(id),
381
+    visible_record_count INTEGER NOT NULL,
382
+    appeared_count INTEGER NOT NULL DEFAULT 0,
383
+    disappeared_count INTEGER NOT NULL DEFAULT 0,
384
+    representation_changed_count INTEGER NOT NULL DEFAULT 0,
385
+    earliest_start_date REAL,
386
+    latest_end_date REAL,
387
+    value_sum REAL,
388
+    value_max REAL,
389
+    aggregate_hash TEXT,
390
+    PRIMARY KEY (observation_id, sample_type_id)
391
+);
392
+
393
+CREATE TABLE daily_type_aggregates (
394
+    observation_id INTEGER NOT NULL REFERENCES observations(id),
395
+    sample_type_id INTEGER NOT NULL REFERENCES sample_types(id),
396
+    bucket_start REAL NOT NULL,
397
+    bucket_end REAL NOT NULL,
398
+    visible_record_count INTEGER NOT NULL,
399
+    value_sum REAL,
400
+    value_max REAL,
401
+    source_revision_id INTEGER,
402
+    aggregate_hash TEXT,
403
+    PRIMARY KEY (observation_id, sample_type_id, bucket_start, source_revision_id)
404
+);
405
+
406
+CREATE INDEX idx_daily_type_bucket
407
+ON daily_type_aggregates(sample_type_id, bucket_start);
408
+```
409
+
410
+Aggregates feed reports and the Core Data cache. They are also important for consolidation heuristics because a count drop with stable aggregate value may be representation change, not meaningful loss.
411
+
412
+### 5.9 Exports And Manifests
413
+
414
+```sql
415
+CREATE TABLE export_manifests (
416
+    id INTEGER PRIMARY KEY,
417
+    export_id TEXT NOT NULL UNIQUE,
418
+    created_at REAL NOT NULL,
419
+    export_kind TEXT NOT NULL,
420
+    from_observation_id INTEGER REFERENCES observations(id),
421
+    to_observation_id INTEGER REFERENCES observations(id),
422
+    filter_json TEXT,
423
+    manifest_hash TEXT NOT NULL,
424
+    record_count INTEGER NOT NULL
425
+);
426
+
427
+CREATE TABLE export_items (
428
+    export_manifest_id INTEGER NOT NULL REFERENCES export_manifests(id),
429
+    sample_id INTEGER NOT NULL REFERENCES samples(id),
430
+    version_id INTEGER REFERENCES sample_versions(id),
431
+    item_hash TEXT NOT NULL,
432
+    PRIMARY KEY (export_manifest_id, sample_id, version_id)
433
+);
434
+```
435
+
436
+Exports should be reproducible from the archive when possible. Manifest hashes let external tools verify that a recovery-compatible export matches the archived evidence.
437
+
438
+## 6. Archive V2 Implementation Decisions
439
+
440
+These decisions close Milestone 1 for archive v2 unless a later dated entry changes them.
441
+
442
+### 6.1 Timestamps
443
+
444
+Archive timestamps are stored as Unix seconds in UTC using SQLite `REAL`.
445
+
446
+Rules:
447
+- write dates with `Date.timeIntervalSince1970`;
448
+- read dates with `Date(timeIntervalSince1970:)`;
449
+- never store local-time interpreted timestamps in archive date columns;
450
+- canonical hash/export text uses ISO 8601 UTC with fractional seconds;
451
+- aggregate bucket rows store UTC bucket boundaries plus the observation timezone context that produced them.
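A minimal Python sketch of these rules follows. The app is Swift; Python's standard library is used here only so the example runs as written. The sketch rejects naive datetimes, which is the Python analogue of accidentally storing a local-time reading. Millisecond precision for the canonical text is an assumption, chosen to line up with the three fractional digits Foundation's `ISO8601DateFormatter` emits under `.withFractionalSeconds`. If HealthKit timestamps carry sub-millisecond detail, the canonical text would need more digits to keep distinct samples distinct.

```python
from datetime import datetime, timezone


def to_archive_seconds(dt: datetime) -> float:
    """Convert an aware datetime to Unix seconds (UTC) for a REAL column."""
    if dt.tzinfo is None:
        # A naive datetime would be silently read as local time; refuse it.
        raise ValueError("archive timestamps must be timezone-aware")
    return dt.timestamp()


def from_archive_seconds(seconds: float) -> datetime:
    """Read a REAL archive column back as an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def canonical_utc_text(seconds: float) -> str:
    """Canonical hash/export text: ISO 8601 UTC, millisecond precision, 'Z' suffix."""
    text = from_archive_seconds(seconds).isoformat(timespec="milliseconds")
    return text.replace("+00:00", "Z")
```

The same instant written from any timezone maps to the same stored value, which keeps fingerprints and hashes stable across timezone changes.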
452
+
453
+### 6.2 Hashing And Privacy
454
+
455
+Use two hash classes:
456
+- **Integrity/content hashes:** plain `SHA-256`, lower-case hex. Use for payload hashes, metadata hashes, aggregate hashes, export item hashes, and manifest hashes.
457
+- **Privacy-sensitive identifiers:** `HMAC-SHA256`, lower-case hex, using a locally stored archive secret. Use for HealthKit UUIDs, device identifiers, source names, local identifiers, UDI-like values, and device-chain identifiers.
458
+
459
+All hash inputs must include a domain/version prefix such as `hp:v2:sample_uuid:` or `hp:v2:payload:`. Do not hash raw strings without a domain prefix.
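The two hash classes can be sketched in Python with `hashlib` and `hmac`. The domain check enforces only the `hp:v2:<name>:` shape taken from the examples above. The full list of domains is still to be defined, and the storage location of the archive secret is out of scope here.

```python
import hashlib
import hmac


def _checked_domain(domain: str) -> bytes:
    # Every hash input starts with a versioned domain prefix such as "hp:v2:payload:".
    if not (domain.startswith("hp:v2:") and domain.endswith(":")):
        raise ValueError(f"hash domain must look like 'hp:v2:<name>:', got {domain!r}")
    return domain.encode("utf-8")


def content_hash(domain: str, canonical_bytes: bytes) -> str:
    """Integrity/content hash: plain SHA-256, lower-case hex."""
    return hashlib.sha256(_checked_domain(domain) + canonical_bytes).hexdigest()


def privacy_hash(archive_secret: bytes, domain: str, raw_identifier: str) -> str:
    """Privacy-sensitive identifier: HMAC-SHA256 keyed by the local archive secret."""
    message = _checked_domain(domain) + raw_identifier.encode("utf-8")
    return hmac.new(archive_secret, message, hashlib.sha256).hexdigest()
```

Keying with the archive secret means an exported `sample_uuid_hash` cannot be reversed by hashing candidate UUIDs without that secret. Plain content hashes stay verifiable by anyone holding the exported bytes.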
460
+
461
+The local archive secret is device-local application state. If the secret is lost, future captures may start a new device chain. Export files include already-computed hashes and manifest/item hashes, not the local archive secret.
462
+
463
+### 6.3 Device Chain Identity
464
+
465
+`device_chain_hash` identifies one local capture chain, not a globally unique device.
466
+
467
+Initial implementation:
468
+- create or recover a random chain seed from Keychain;
469
+- compute `device_chain_hash = HMAC-SHA256(archiveSecret, "hp:v2:device_chain:" + chainSeed)`;
470
+- set `recovered_from_keychain = 1` when the seed survived app reinstall and was reused;
471
+- start a new chain if the seed is missing or explicitly reset.
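Device-chain resolution is sketched below in Python, with a plain dict standing in for the Keychain. The item name `hp.device_chain_seed` is hypothetical. The returned flag is only meaningful when the app creates a new `device_chains` row. The sketch also assumes the archive secret is recoverable after reinstall. Because the hash is keyed by that secret, a surviving seed combined with a lost secret still yields a different `device_chain_hash`.

```python
import hashlib
import hmac
import secrets

SEED_KEY = "hp.device_chain_seed"  # hypothetical Keychain item name


def device_chain_hash(archive_secret: bytes, chain_seed: str) -> str:
    # device_chain_hash = HMAC-SHA256(archiveSecret, "hp:v2:device_chain:" + chainSeed)
    message = ("hp:v2:device_chain:" + chain_seed).encode("utf-8")
    return hmac.new(archive_secret, message, hashlib.sha256).hexdigest()


def resolve_device_chain(keychain: dict, archive_secret: bytes, reset: bool = False):
    """Return (device_chain_hash, recovered_from_keychain) for a device_chains row.

    `keychain` is a plain dict standing in for Keychain storage.
    """
    seed = None if reset else keychain.get(SEED_KEY)
    recovered = seed is not None
    if seed is None:
        # Missing or explicitly reset: start a new chain with a fresh random seed.
        seed = secrets.token_hex(32)
        keychain[SEED_KEY] = seed
    return device_chain_hash(archive_secret, seed), 1 if recovered else 0
```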
472
+
473
+HealthKit device metadata may be stored as hashed provenance in `hk_devices`, but it must not be used as a cross-device comparison key.
474
+
475
+### 6.4 Fingerprints And Payload Versions
476
+
477
+`sample_uuid_hash` is the preferred stable HealthKit identity when HealthKit exposes a UUID.
478
+
479
+`strict_fingerprint` is a deterministic exact fallback/verification key built from canonical fields:
480
+- type identifier;
481
+- start/end timestamps in canonical UTC text;
482
+- value kind and canonical value representation;
483
+- unit/category/workout fields where applicable;
484
+- source bundle identifier hash when available.
485
+
486
+Do not include SQLite row ids in fingerprints. If HealthKit UUID is available, a payload change creates a new `sample_versions` row for the same sample identity. If HealthKit UUID is not available, an exact strict fingerprint change may create appeared/disappeared evidence, with `semantic_fingerprint` used only as weak consolidation context.
487
+
488
+`payload_hash` is `SHA-256` over the canonical sample payload representation, including dates, value/unit/category/workout fields, source revision fields, device provenance hashes, metadata hash, and relationship payload when available. A new `sample_versions` row is created when `payload_hash` changes.
489
+
490
+`semantic_fingerprint` is type-specific and optional. It supports consolidation heuristics and fuzzy backup/export reconciliation, but it is never sufficient by itself to prove record identity.
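The following Python sketch computes the strict fingerprint and the payload hash. Several details are assumptions: the canonical layout (field names, sorted-key compact JSON, `repr()` for floats) and the `hp:v2:strict_fingerprint:` prefix. For brevity, the payload hash here covers only the `SamplePayload` fields plus the metadata hash; source revision, device provenance, and relationship fields would be added the same way. The sketch shows the key property: a metadata-only change keeps the strict fingerprint but produces a new `payload_hash`, and that new hash is what creates a new `sample_versions` row.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SamplePayload:
    type_identifier: str
    start: float                          # Unix seconds, UTC
    end: float
    value_kind: str                       # e.g. "quantity" / "category" (labels assumed)
    numeric_value: Optional[float] = None
    unit: Optional[str] = None
    category_value: Optional[int] = None
    workout_activity_type: Optional[int] = None
    source_bundle_hash: Optional[str] = None
    metadata_hash: Optional[str] = None   # payload-only field, not part of the fingerprint


def _utc_text(seconds: float) -> str:
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def _canonical(fields: dict) -> bytes:
    # Sorted keys + compact separators give byte-stable JSON for hashing.
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def _strict_fields(p: SamplePayload) -> dict:
    # No SQLite row ids here: only canonical sample fields.
    return {
        "type": p.type_identifier,
        "start": _utc_text(p.start),
        "end": _utc_text(p.end),
        "kind": p.value_kind,
        # repr() is Python's shortest round-trip float text.
        "value": None if p.numeric_value is None else repr(p.numeric_value),
        "unit": p.unit,
        "category": p.category_value,
        "workout": p.workout_activity_type,
        "source": p.source_bundle_hash,
    }


def strict_fingerprint(p: SamplePayload) -> str:
    return hashlib.sha256(b"hp:v2:strict_fingerprint:" + _canonical(_strict_fields(p))).hexdigest()


def payload_hash(p: SamplePayload) -> str:
    fields = dict(_strict_fields(p), metadata=p.metadata_hash)
    return hashlib.sha256(b"hp:v2:payload:" + _canonical(fields)).hexdigest()
```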
491
+
492
+### 6.5 Timezone And Aggregate Buckets
493
+
494
+Raw sample timestamps remain UTC. Daily/monthly aggregate buckets are computed using the device timezone active at observation time because user-facing health summaries are local-day concepts.
495
+
496
+Rules:
497
+- store `time_zone_identifier` and `time_zone_seconds_from_gmt` on `observations`;
498
+- bucket boundaries are midnight-to-midnight in that observation timezone, stored as UTC seconds;
499
+- old observations are not retroactively re-bucketed when the device timezone changes;
500
+- exports include timezone metadata so external tools can reinterpret buckets if needed.
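Computing one local-day bucket can be sketched in Python as shown. In practice, pass `zoneinfo.ZoneInfo(time_zone_identifier)` from the observation row, so that daylight-saving transition days come out as 23- or 25-hour buckets. A fixed offset built from `time_zone_seconds_from_gmt` always yields 24 hours and is wrong on those days. The function name is hypothetical.

```python
from datetime import datetime, time, timedelta, tzinfo
from typing import Tuple


def local_day_bucket(sample_seconds: float, observation_tz: tzinfo) -> Tuple[float, float]:
    """(bucket_start, bucket_end) in UTC seconds for the local day containing the sample.

    Boundaries are midnight-to-midnight in the timezone active at observation time.
    """
    local_day = datetime.fromtimestamp(sample_seconds, tz=observation_tz).date()
    start = datetime.combine(local_day, time(0), tzinfo=observation_tz)
    end = datetime.combine(local_day + timedelta(days=1), time(0), tzinfo=observation_tz)
    return start.timestamp(), end.timestamp()
```

For example, a sample at 23:30 UTC falls into the next local day for an observation taken at UTC+2, and that bucket's start is stored as 22:00 UTC.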
501
+
502
+### 6.6 Visibility Range Maintenance
503
+
504
+Maintain `sample_visibility_ranges` eagerly in the same transaction that writes observation events. Point-in-time queries should read ranges, not rebuild them from events on every request.
505
+
506
+Integrity tools may rebuild ranges from `sample_observation_events` into temporary tables and compare the result with stored ranges. Validation rebuild is for tests/repair checks, not the normal query path.
507
+
508
+### 6.7 Relationship Preservation
509
+
510
+Store every relationship the capture/import surface can observe. Relationship rows are append-only evidence for what was known at an observation; do not rewrite `relationship_kind` to encode later disappearance.
511
+
512
+If a related sample disappears, endpoint visibility ranges and events explain that disappearance. Relationship exports should include relationships when both endpoints are included, and may include unresolved endpoint hashes when allowed by the export scope.
513
+
514
+### 6.8 Aggregates
515
+
516
+Aggregates are computed in SQLite after each successful observation/type run and materialized in `observation_type_summaries` and `daily_type_aggregates`.
517
+
518
+SQL-first does not mean recompute every count live for the UI. It means heavy computation is done by SQLite using indexes, joins, CTEs, temporary tables, and paged result sets. Repeated UI/report reads consume materialized SQLite aggregates and the Core Data cache.
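Recomputing one `observation_type_summaries` row can be sketched with Python's `sqlite3`. The counts, min/max values, and sums run in SQL over the visibility ranges. Only the finished row is read back to stamp `aggregate_hash`, because SQLite has no built-in SHA-256. The `hp:v2:aggregate:` prefix and the JSON layout of the hashed row are assumptions. The appeared/disappeared/changed counts, which come from events, are left out. The upsert needs SQLite 3.24 or later.

```python
import hashlib
import json
import sqlite3

SCHEMA = """
CREATE TABLE samples (id INTEGER PRIMARY KEY, sample_type_id INTEGER NOT NULL);
CREATE TABLE sample_versions (
    id INTEGER PRIMARY KEY, sample_id INTEGER NOT NULL,
    start_date REAL NOT NULL, end_date REAL NOT NULL, numeric_value REAL);
CREATE TABLE sample_visibility_ranges (
    sample_id INTEGER NOT NULL, version_id INTEGER NOT NULL,
    first_observation_id INTEGER NOT NULL, last_observation_id INTEGER,
    PRIMARY KEY (sample_id, version_id, first_observation_id));
CREATE TABLE observation_type_summaries (
    observation_id INTEGER NOT NULL, sample_type_id INTEGER NOT NULL,
    visible_record_count INTEGER NOT NULL,
    earliest_start_date REAL, latest_end_date REAL,
    value_sum REAL, value_max REAL, aggregate_hash TEXT,
    PRIMARY KEY (observation_id, sample_type_id));
"""

RECOMPUTE = """
INSERT INTO observation_type_summaries
    (observation_id, sample_type_id, visible_record_count,
     earliest_start_date, latest_end_date, value_sum, value_max)
SELECT :obs, :type, COUNT(*), MIN(sv.start_date), MAX(sv.end_date),
       SUM(sv.numeric_value), MAX(sv.numeric_value)
FROM sample_visibility_ranges r
JOIN samples s ON s.id = r.sample_id
JOIN sample_versions sv ON sv.id = r.version_id
WHERE s.sample_type_id = :type
  AND r.first_observation_id <= :obs
  AND (r.last_observation_id IS NULL OR r.last_observation_id >= :obs)
ON CONFLICT (observation_id, sample_type_id) DO UPDATE SET
    visible_record_count = excluded.visible_record_count,
    earliest_start_date = excluded.earliest_start_date,
    latest_end_date = excluded.latest_end_date,
    value_sum = excluded.value_sum,
    value_max = excluded.value_max
"""


def recompute_type_summary(conn: sqlite3.Connection, obs: int, type_id: int) -> str:
    """Recompute one summary row in SQL, then stamp it with an aggregate hash.

    Runs inside the caller's transaction; the caller commits.
    """
    conn.execute(RECOMPUTE, {"obs": obs, "type": type_id})
    row = conn.execute(
        "SELECT visible_record_count, earliest_start_date, latest_end_date, value_sum, value_max "
        "FROM observation_type_summaries WHERE observation_id = ? AND sample_type_id = ?",
        (obs, type_id)).fetchone()
    digest = hashlib.sha256(
        b"hp:v2:aggregate:" + json.dumps(list(row), separators=(",", ":")).encode("utf-8")
    ).hexdigest()
    conn.execute(
        "UPDATE observation_type_summaries SET aggregate_hash = ? "
        "WHERE observation_id = ? AND sample_type_id = ?", (digest, obs, type_id))
    return digest
```

Because the row is fully recomputed from ranges, running it twice for the same observation and type gives the same hash, which makes it safe to call during retries or repair.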
519
+
520
+### 6.9 Export Manifest Canonicalization
521
+
522
+Structured exports use a versioned canonical envelope with deterministic ordering.
523
+
524
+Rules:
525
+- export format version starts at `1`;
526
+- JSON object keys are sorted for canonical bytes;
527
+- record/item order is deterministic: sample type, start date, end date, sample identity hash, version hash;
528
+- each exported item has an `item_hash = SHA-256(canonical item JSON)`;
529
+- `manifest_hash = SHA-256(canonical export metadata + ordered item_hash list + counts + filter description)`;
530
+- large exports compute hashes incrementally while streaming/paging rows from SQLite.
531
+
532
+Manifest hashes must cover exported content through item hashes, not only counts or first/last dates.
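An incremental manifest hash is sketched below in Python. It streams the metadata, each `item_hash` in canonical order, and finally the count and filter description into one SHA-256 state, so no item list is held in memory. The `hp:v2:export_item:` and `hp:v2:manifest:` prefixes and the newline-separated layout are assumptions, added to follow the domain-prefix rule in 6.2. Python's JSON float formatting is not guaranteed to match Swift's `JSONEncoder`, so a cross-tool canonical number format would need to be pinned down separately.

```python
import hashlib
import json

ORDER_FIELDS = ("sample_type", "start_date", "end_date", "sample_identity_hash", "version_hash")


def canonical_json(value) -> bytes:
    # Sorted keys and fixed separators give the same bytes for the same content.
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def item_hash(item: dict) -> str:
    return hashlib.sha256(b"hp:v2:export_item:" + canonical_json(item)).hexdigest()


class ManifestHasher:
    """Feeds metadata, then each item hash in canonical order, then counts + filter."""

    def __init__(self, export_metadata: dict):
        self._digest = hashlib.sha256(b"hp:v2:manifest:" + canonical_json(export_metadata) + b"\n")
        self._count = 0
        self._last_key = None

    def add(self, item: dict) -> str:
        key = tuple(item[field] for field in ORDER_FIELDS)
        if self._last_key is not None and key < self._last_key:
            raise ValueError("items must arrive in canonical export order")
        self._last_key = key
        h = item_hash(item)
        self._digest.update(h.encode("ascii") + b"\n")
        self._count += 1
        return h

    def finalize(self, filter_description: str) -> str:
        # Work on a copy so finalize() can be called more than once.
        final = self._digest.copy()
        final.update(canonical_json({"filter": filter_description, "record_count": self._count}))
        return final.hexdigest()
```

Rejecting out-of-order items surfaces a paging or ordering bug at export time, rather than leaving it to show up later as an unreproducible manifest.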
533
+
534
+## 7. Write Path
535
+
536
+### 7.1 Capture Transaction Shape
537
+
538
+For each observation/type run:
539
+
540
+1. Open SQLite transaction.
541
+2. Insert/update `observations` and `observation_type_runs`.
542
+3. For each added/visible sample page:
543
+   - upsert source/source revision/device/metadata;
544
+   - upsert `samples`;
545
+   - upsert `sample_versions`;
546
+   - insert observation event;
547
+   - update visibility ranges.
548
+4. For each `HKDeletedObject`:
549
+   - find sample by UUID hash;
550
+   - insert deleted/disappeared event;
551
+   - close open visibility ranges.
552
+5. Recompute affected materialized aggregates.
553
+6. Commit SQLite.
554
+7. Update/rebuild Core Data cache after SQLite commit.
555
+
556
+SQLite commit must happen before Core Data cache update. Cache rebuild failures must not corrupt archive truth.
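The commit ordering can be shown with a small Python `sqlite3` sketch. Archive steps 1 to 6 run inside one transaction, and the cache update runs only after that transaction commits. `write_archive` and `update_cache` are hypothetical callables standing in for the page writes and the Core Data rebuild.

```python
import sqlite3
from typing import Callable


def commit_type_run(conn: sqlite3.Connection,
                    write_archive: Callable[[sqlite3.Connection], None],
                    update_cache: Callable[[], None]) -> bool:
    """Archive first, cache second. Returns False when only the cache update failed."""
    # Steps 1-6: one SQLite transaction. `with conn` commits on success and
    # rolls back (then re-raises) if any archive write fails.
    with conn:
        write_archive(conn)
    # Step 7 runs only after the commit, so a cache failure cannot undo archive truth.
    try:
        update_cache()
    except Exception:
        # The cache is disposable: leave it stale and schedule a rebuild from SQLite.
        return False
    return True
```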
557
+
558
+### 7.2 Idempotency
559
+
560
+Capture pages may be retried. Writes must be idempotent using uniqueness constraints:
561
+- `(sample_type_id, strict_fingerprint)` for samples;
562
+- `(sample_id, payload_hash)` for versions;
563
+- `(observation_id, sample_id, event_kind)` for events;
564
+- range primary key for visibility.
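Idempotent page writes are sketched below in Python against a trimmed subset of the schema. The sketch uses `ON CONFLICT (...) DO NOTHING` rather than `INSERT OR IGNORE`. The former absorbs only the named uniqueness conflict, while `OR IGNORE` would also silently skip rows that violate other constraints such as `NOT NULL`. The `'seen'` event kind is an assumed label for the verified/seen event.

```python
import sqlite3

SCHEMA = """
CREATE TABLE samples (
    id INTEGER PRIMARY KEY,
    sample_type_id INTEGER NOT NULL,
    strict_fingerprint TEXT NOT NULL,
    first_seen_observation_id INTEGER NOT NULL,
    UNIQUE(sample_type_id, strict_fingerprint));
CREATE TABLE sample_versions (
    id INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES samples(id),
    payload_hash TEXT NOT NULL,
    created_observation_id INTEGER NOT NULL,
    UNIQUE(sample_id, payload_hash));
CREATE TABLE sample_observation_events (
    id INTEGER PRIMARY KEY,
    observation_id INTEGER NOT NULL,
    sample_id INTEGER NOT NULL REFERENCES samples(id),
    version_id INTEGER REFERENCES sample_versions(id),
    event_kind TEXT NOT NULL,
    UNIQUE(observation_id, sample_id, event_kind));
"""


def write_page(conn, observation_id, sample_type_id, page):
    """Write one page of (strict_fingerprint, payload_hash) pairs; safe to retry."""
    with conn:
        for fingerprint, payload in page:
            conn.execute(
                "INSERT INTO samples (sample_type_id, strict_fingerprint, first_seen_observation_id) "
                "VALUES (?, ?, ?) ON CONFLICT (sample_type_id, strict_fingerprint) DO NOTHING",
                (sample_type_id, fingerprint, observation_id))
            (sample_id,) = conn.execute(
                "SELECT id FROM samples WHERE sample_type_id = ? AND strict_fingerprint = ?",
                (sample_type_id, fingerprint)).fetchone()
            conn.execute(
                "INSERT INTO sample_versions (sample_id, payload_hash, created_observation_id) "
                "VALUES (?, ?, ?) ON CONFLICT (sample_id, payload_hash) DO NOTHING",
                (sample_id, payload, observation_id))
            (version_id,) = conn.execute(
                "SELECT id FROM sample_versions WHERE sample_id = ? AND payload_hash = ?",
                (sample_id, payload)).fetchone()
            conn.execute(
                "INSERT INTO sample_observation_events "
                "(observation_id, sample_id, version_id, event_kind) VALUES (?, ?, ?, 'seen') "
                "ON CONFLICT (observation_id, sample_id, event_kind) DO NOTHING",
                (observation_id, sample_id, version_id))
```

Replaying the same page leaves every table unchanged. A new payload for an existing sample in a later observation adds only a version row and an event.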
565
+
566
+### 7.3 Anchor Handling
567
+
568
+HealthKit anchors are capture implementation state. Store them per type run, but do not treat anchors as forensic truth. If an anchor is unusable, the archive must still support rebuilding the current visible state from a full type scan.
569
+
570
+## 8. Point-In-Time Reconstruction
571
+
572
+Point-in-time reconstruction should use ranges, not full snapshot tables.
573
+
574
+Conceptual query:
575
+
576
+```sql
577
+SELECT s.id AS sample_id, sv.id AS version_id, sv.start_date, sv.end_date,
578
+       sv.value_kind, sv.numeric_value, sv.unit
579
+FROM sample_visibility_ranges r
580
+JOIN samples s ON s.id = r.sample_id
581
+JOIN sample_versions sv ON sv.id = r.version_id
582
+JOIN observations target ON target.id = :observation_id
583
+WHERE s.sample_type_id = :sample_type_id
584
+  AND r.first_observation_id <= target.id
585
+  AND (r.last_observation_id IS NULL OR r.last_observation_id >= target.id)
586
+ORDER BY sv.start_date, s.strict_fingerprint
587
+LIMIT :limit OFFSET :offset;
588
+```
589
+
590
+Implementation may optimize this with temporary tables or materialized visible sets for selected observations.
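Running the conceptual query against a few synthetic rows in Python's `sqlite3` shows the range convention at work. The query is the one above; the schema is trimmed to the columns it touches. Note that comparing observation ids assumes the ids increase in capture order.

```python
import sqlite3

SCHEMA = """
CREATE TABLE observations (id INTEGER PRIMARY KEY);
CREATE TABLE samples (id INTEGER PRIMARY KEY, sample_type_id INTEGER NOT NULL,
                      strict_fingerprint TEXT NOT NULL);
CREATE TABLE sample_versions (id INTEGER PRIMARY KEY, sample_id INTEGER NOT NULL,
                              start_date REAL NOT NULL, end_date REAL NOT NULL,
                              value_kind TEXT, numeric_value REAL, unit TEXT);
CREATE TABLE sample_visibility_ranges (sample_id INTEGER NOT NULL, version_id INTEGER NOT NULL,
                                       first_observation_id INTEGER NOT NULL,
                                       last_observation_id INTEGER);
"""

POINT_IN_TIME = """
SELECT s.id AS sample_id, sv.id AS version_id, sv.start_date, sv.end_date,
       sv.value_kind, sv.numeric_value, sv.unit
FROM sample_visibility_ranges r
JOIN samples s ON s.id = r.sample_id
JOIN sample_versions sv ON sv.id = r.version_id
JOIN observations target ON target.id = :observation_id
WHERE s.sample_type_id = :sample_type_id
  AND r.first_observation_id <= target.id
  AND (r.last_observation_id IS NULL OR r.last_observation_id >= target.id)
ORDER BY sv.start_date, s.strict_fingerprint
LIMIT :limit OFFSET :offset
"""


def visible_at(conn, observation_id, sample_type_id, limit=500, offset=0):
    """One page of the records visible at `observation_id` for one sample type."""
    params = {"observation_id": observation_id, "sample_type_id": sample_type_id,
              "limit": limit, "offset": offset}
    return conn.execute(POINT_IN_TIME, params).fetchall()
```

The join on `observations` also makes an unknown observation id return no rows, not every open range. `LIMIT ... OFFSET` rescans skipped rows, so deep pages get slower; keyset pagination on the `ORDER BY` columns is a common alternative.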
591
+
592
+## 9. Diff Between Observations
593
+
594
+Diffs must run in SQLite.
595
+
596
+```sql
597
+CREATE TEMP TABLE prev_visible AS
598
+SELECT r.sample_id, r.version_id
599
+FROM sample_visibility_ranges r
600
+WHERE r.first_observation_id <= :previous
601
+  AND (r.last_observation_id IS NULL OR r.last_observation_id >= :previous);
602
+
603
+CREATE TEMP TABLE curr_visible AS
604
+SELECT r.sample_id, r.version_id
605
+FROM sample_visibility_ranges r
606
+WHERE r.first_observation_id <= :current
607
+  AND (r.last_observation_id IS NULL OR r.last_observation_id >= :current);
608
+
609
+CREATE INDEX temp_prev_sample ON prev_visible(sample_id);
610
+CREATE INDEX temp_curr_sample ON curr_visible(sample_id);
611
+
612
+-- Disappeared.
613
+SELECT p.sample_id
614
+FROM prev_visible p
615
+LEFT JOIN curr_visible c ON c.sample_id = p.sample_id
616
+WHERE c.sample_id IS NULL;
617
+
618
+-- Appeared.
619
+SELECT c.sample_id
620
+FROM curr_visible c
621
+LEFT JOIN prev_visible p ON p.sample_id = c.sample_id
622
+WHERE p.sample_id IS NULL;
623
+
624
+-- Representation changed.
625
+SELECT c.sample_id, p.version_id AS before_version_id, c.version_id AS after_version_id
626
+FROM curr_visible c
627
+JOIN prev_visible p ON p.sample_id = c.sample_id
628
+WHERE p.version_id != c.version_id;
629
+```
630
+
631
+Result sets must be paged. Counts can be materialized into `observation_type_summaries` and the Core Data cache.
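The following Python `sqlite3` sketch runs the temp-table diff above. Here the temp tables are created with an explicit schema and filled with `INSERT ... SELECT`, so the observation ids can be bound as parameters. They are dropped and rebuilt on each call. The demo returns full lists because the data is tiny; real callers would page the three result queries.

```python
import sqlite3

RANGES = """
CREATE TABLE sample_visibility_ranges (
    sample_id INTEGER NOT NULL, version_id INTEGER NOT NULL,
    first_observation_id INTEGER NOT NULL, last_observation_id INTEGER);
"""

FILL = """
INSERT INTO {table} (sample_id, version_id)
SELECT r.sample_id, r.version_id
FROM sample_visibility_ranges r
WHERE r.first_observation_id <= :obs
  AND (r.last_observation_id IS NULL OR r.last_observation_id >= :obs)
"""


def diff_observations(conn: sqlite3.Connection, previous: int, current: int) -> dict:
    for table in ("prev_visible", "curr_visible"):
        conn.execute(f"DROP TABLE IF EXISTS temp.{table}")
        conn.execute(f"CREATE TEMP TABLE {table} (sample_id INTEGER, version_id INTEGER)")
        conn.execute(f"CREATE INDEX temp.{table}_sample ON {table}(sample_id)")
    conn.execute(FILL.format(table="prev_visible"), {"obs": previous})
    conn.execute(FILL.format(table="curr_visible"), {"obs": current})
    disappeared = [r[0] for r in conn.execute(
        "SELECT p.sample_id FROM prev_visible p "
        "LEFT JOIN curr_visible c ON c.sample_id = p.sample_id "
        "WHERE c.sample_id IS NULL ORDER BY p.sample_id")]
    appeared = [r[0] for r in conn.execute(
        "SELECT c.sample_id FROM curr_visible c "
        "LEFT JOIN prev_visible p ON p.sample_id = c.sample_id "
        "WHERE p.sample_id IS NULL ORDER BY c.sample_id")]
    changed = conn.execute(
        "SELECT c.sample_id, p.version_id, c.version_id FROM curr_visible c "
        "JOIN prev_visible p ON p.sample_id = c.sample_id "
        "WHERE p.version_id != c.version_id ORDER BY c.sample_id").fetchall()
    return {"disappeared": disappeared, "appeared": appeared, "changed": changed}
```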
632
+
633
+## 10. Consolidation Heuristics
634
+
635
+Consolidation is likely when:
636
+- many old high-frequency records disappear;
637
+- newer records cover similar date windows;
638
+- aggregate sums remain within tolerance;
639
+- sample density decreases while duration/interval length increases;
640
+- source/provenance context is compatible.
641
+
642
+Required evidence:
643
+- record-count delta;
644
+- value-sum delta;
645
+- coverage-window overlap;
646
+- interval-length/density comparison;
647
+- source/source-revision breakdown;
648
+- uncertainty label if evidence is incomplete.
649
+
650
+Never classify count drops alone as loss.
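A Python sketch can show how the required evidence might be combined once it has been computed in SQL. The labels (`consolidation_likely`, `unexplained_drop`, `uncertain`), the 2% sum tolerance, and the 90% coverage threshold are hypothetical. Any missing evidence short-circuits to `uncertain`, and no path turns a count drop alone into a loss label.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class WindowEvidence:
    """SQL-computed evidence for one type/date window between two observations."""
    count_before: int
    count_after: int
    sum_before: Optional[float]
    sum_after: Optional[float]
    mean_interval_before: Optional[float]   # mean record duration, seconds
    mean_interval_after: Optional[float]
    coverage_overlap: Optional[float]       # fraction (0..1) of the old window covered by newer records
    sources_compatible: Optional[bool]


def classify_count_drop(e: WindowEvidence, sum_tolerance: float = 0.02,
                        min_overlap: float = 0.9) -> Tuple[str, List[str]]:
    """Return (label, reasons). A count drop alone never yields a loss label."""
    if e.count_after >= e.count_before:
        return "no_count_drop", []
    missing = [f.name for f in fields(e) if getattr(e, f.name) is None]
    if missing:
        return "uncertain", missing
    failed = []
    if abs(e.sum_after - e.sum_before) > sum_tolerance * abs(e.sum_before):
        failed.append("value_sum_delta")
    if e.mean_interval_after <= e.mean_interval_before:
        failed.append("density_unchanged")
    if e.coverage_overlap < min_overlap:
        failed.append("coverage_gap")
    if not e.sources_compatible:
        failed.append("source_mismatch")
    return ("consolidation_likely", []) if not failed else ("unexplained_drop", failed)
```

Returning the failed checks alongside the label gives the report its uncertainty explanation directly.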
651
+
652
+## 11. Core Data Cache Contract
653
+
654
+Core Data entities should mirror presentation needs, not archive internals.
655
+
656
+Core Data may contain two categories:
657
+- rebuildable UI/report cache derived from SQLite;
658
+- non-forensic local app state/settings.
659
+
660
+Deleting rebuildable cache rows must not delete the SQLite archive. User settings may be preserved across cache rebuilds.
661
+
662
+Candidate cache entities:
663
+- `CachedObservationRow`;
664
+- `CachedTypeSummary`;
665
+- `CachedDailyAggregate`;
666
+- `CachedDiffSummary`;
667
+- `CachedExportManifest`;
668
+- `CachedArchiveHealth`;
669
+- `AppSetting`.
670
+
671
+Every cache row should include:
672
+- archive schema version;
673
+- cache schema version;
674
+- source observation id(s);
675
+- source aggregate/hash where applicable;
676
+- computed_at timestamp.
677
+
678
+Invalidation rules:
679
+- archive reset or future archive migration invalidates cache;
680
+- selected type registry change invalidates affected summaries;
681
+- aggregate rebuild invalidates corresponding Core Data rows;
682
+- app can delete all cache rows and rebuild from SQLite.
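A staleness check over the provenance fields every cache row carries is sketched below in Python, for aggregate-backed rows. The caller is assumed to look up `current_aggregate_hash` in SQLite for the row's source observation and type. A value of `None` means that source row no longer exists, for example after an archive reset. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CacheStamp:
    """Provenance stored on each cache row."""
    archive_schema_version: int
    cache_schema_version: int
    source_observation_id: int              # used by the caller to look up the current hash
    source_aggregate_hash: Optional[str]


def is_stale(row: CacheStamp, archive_schema_version: int, cache_schema_version: int,
             current_aggregate_hash: Optional[str]) -> bool:
    """True when a cache row must be rebuilt from SQLite."""
    if row.archive_schema_version != archive_schema_version:
        return True
    if row.cache_schema_version != cache_schema_version:
        return True
    if current_aggregate_hash is None:
        # Source row is gone (reset, or type removed from the registry).
        return True
    return row.source_aggregate_hash != current_aggregate_hash
```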
683
+
684
+## 12. Export Requirements
685
+
686
+Exports are scoped and recovery-compatible.
687
+
688
+Every structured export should include:
689
+- export id;
690
+- archive schema version;
691
+- app version;
692
+- observation id(s);
693
+- selected type filters;
694
+- record count;
695
+- manifest hash;
696
+- per-record sample identity/fingerprint;
697
+- payload version hash;
698
+- dates, values, units, category/workout fields;
699
+- source/provenance metadata where available and allowed;
700
+- relationships where available;
701
+- provenance-loss warnings for external re-publication workflows.
702
+
703
+Exports must stream/page from SQLite. Do not build large JSON/CSV exports entirely in RAM.
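Streaming an export in canonical order can be sketched in Python `sqlite3`. The sketch replaces the `LIMIT/OFFSET` paging used in section 8 with keyset pagination: each page resumes strictly after the last emitted sort key. Memory therefore stays at one page, and deep pages do not rescan skipped rows. Using `strict_fingerprint` and `payload_hash` as the identity and version hashes in the sort key is an assumption. The point-in-time filter is omitted for brevity, and the row-value comparison requires SQLite 3.15 or later.

```python
import sqlite3
from typing import Iterator, Tuple

BASE = """
SELECT sv.start_date, sv.end_date, s.strict_fingerprint, sv.payload_hash, sv.numeric_value
FROM sample_versions sv
JOIN samples s ON s.id = sv.sample_id
WHERE s.sample_type_id = ? {after}
ORDER BY sv.start_date, sv.end_date, s.strict_fingerprint, sv.payload_hash
LIMIT ?
"""
AFTER = "AND (sv.start_date, sv.end_date, s.strict_fingerprint, sv.payload_hash) > (?, ?, ?, ?)"


def stream_export_rows(conn: sqlite3.Connection, sample_type_id: int,
                       page_size: int = 500) -> Iterator[Tuple]:
    """Yield rows in canonical order, holding at most one page in memory."""
    last_key = None
    while True:
        if last_key is None:
            sql, params = BASE.format(after=""), (sample_type_id, page_size)
        else:
            sql, params = BASE.format(after=AFTER), (sample_type_id, *last_key, page_size)
        page = conn.execute(sql, params).fetchall()
        yield from page
        if len(page) < page_size:
            return
        last_key = page[-1][:4]  # resume strictly after the last emitted sort key
```

A writer can consume this generator row by row, for example feeding each row to `csv.writer` and to the incremental manifest hasher, so the full export never exists in RAM.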
704
+
705
+## 13. Reset And Future Migration Policy
706
+
707
+Current status: HealthProbe has no real deployments, only test installations. The archive v2 refactor does not need backward compatibility with the old SwiftData/prototype SQLite schema.
708
+
709
+For the current refactor:
710
+- old SwiftData stores and prototype SQLite archives may be deleted, ignored, or reinitialized;
711
+- no one-way migration from old prototype stores is required;
712
+- test users/developers must expect prototype data loss when moving to archive v2;
713
+- reset behavior must be documented in test/release notes and must not be presented as data preservation;
714
+- Core Data cache stores remain disposable and rebuildable.
715
+
716
+For future real archives:
717
+- migrations must be versioned in `schema_migrations`;
718
+- migrations must be tested on synthetic large archives;
719
+- migration failure must not silently delete the archive;
720
+- cache stores may be deleted and rebuilt;
721
+- archive reset must require explicit user confirmation.
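A versioned migration runner is sketched below in Python `sqlite3`, with `isolation_level=None` so that transactions are explicit. Each migration runs in its own `BEGIN IMMEDIATE` transaction together with its `schema_migrations` row. On failure, the runner rolls back that migration and re-raises; it does not try to repair, reset, or delete anything. The two migrations listed are placeholders.

```python
import sqlite3
import time

# Placeholder migrations: (version, description, statements).
MIGRATIONS = [
    (1, "create archive_metadata",
     ["CREATE TABLE archive_metadata (key TEXT PRIMARY KEY, value TEXT NOT NULL)"]),
    (2, "add archive_metadata.updated_at",
     ["ALTER TABLE archive_metadata ADD COLUMN updated_at REAL"]),
]


def migrate(conn: sqlite3.Connection, migrations=MIGRATIONS) -> list:
    """Apply pending migrations, one transaction each. Expects isolation_level=None."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "version INTEGER PRIMARY KEY, applied_at REAL NOT NULL, description TEXT NOT NULL)")
    applied = {v for (v,) in conn.execute("SELECT version FROM schema_migrations")}
    newly_applied = []
    for version, description, statements in sorted(migrations):
        if version in applied:
            continue
        conn.execute("BEGIN IMMEDIATE")
        try:
            for sql in statements:
                conn.execute(sql)
            conn.execute(
                "INSERT INTO schema_migrations (version, applied_at, description) VALUES (?, ?, ?)",
                (version, time.time(), description))
            conn.execute("COMMIT")
        except Exception:
            # Roll back this migration only; never delete or recreate the archive here.
            conn.execute("ROLLBACK")
            raise
        newly_applied.append(version)
    return newly_applied
```

SQLite DDL is transactional, so a migration that fails halfway leaves neither its new tables nor its `schema_migrations` row behind.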
722
+
723
+## 14. Integrity And Verification
724
+
725
+Archive integrity checks:
726
+- SQLite `PRAGMA integrity_check`;
727
+- schema version check;
728
+- missing FK/reference checks;
729
+- aggregate rebuild spot checks;
730
+- manifest hash verification;
731
+- open visibility range sanity checks;
732
+- duplicate identity checks;
733
+- export reproducibility checks.
734
+
735
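+Use WAL mode for normal operation. Run periodic checkpoints when no long-running read transaction is active; a reader holding an old snapshot prevents a checkpoint from completing and lets the WAL file keep growing.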
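Several of these checks, plus the WAL setup, are sketched below in Python `sqlite3`. By default, SQLite does not enforce `REFERENCES` clauses unless `PRAGMA foreign_keys = ON` is set on each connection, while `PRAGMA foreign_key_check` reports violations regardless of that setting. One check is an assumption: it treats more than one open range per sample as a sanity failure, on the basis that a sample should have at most one currently visible version. The `PASSIVE` checkpoint mode is one choice among several.

```python
import sqlite3


def open_archive(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # REFERENCES clauses are not enforced unless this is set on every connection.
    conn.execute("PRAGMA foreign_keys = ON")
    mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
    if mode != "wal":
        raise RuntimeError(f"could not enable WAL, journal_mode is {mode!r}")
    return conn


CHECKS = {
    "foreign_key_violations": "PRAGMA foreign_key_check",
    "duplicate_uuid_hashes":
        "SELECT sample_uuid_hash, COUNT(*) FROM samples WHERE sample_uuid_hash IS NOT NULL "
        "GROUP BY sample_uuid_hash HAVING COUNT(*) > 1",
    "multiple_open_ranges":
        "SELECT sample_id, COUNT(*) FROM sample_visibility_ranges WHERE last_observation_id IS NULL "
        "GROUP BY sample_id HAVING COUNT(*) > 1",
    "inverted_ranges":
        "SELECT sample_id, version_id FROM sample_visibility_ranges "
        "WHERE last_observation_id < first_observation_id",
}


def run_integrity_checks(conn: sqlite3.Connection) -> dict:
    """Findings per check: ['ok'] for integrity_check, empty lists elsewhere, means pass."""
    results = {"integrity_check": [row[0] for row in conn.execute("PRAGMA integrity_check")]}
    for name, sql in CHECKS.items():
        results[name] = conn.execute(sql).fetchall()
    return results


def is_healthy(results: dict) -> bool:
    return results["integrity_check"] == ["ok"] and all(
        not findings for name, findings in results.items() if name != "integrity_check")


def checkpoint(conn: sqlite3.Connection) -> tuple:
    # PASSIVE copies what it can without waiting on readers or writers.
    # Returns (busy, wal_frames, checkpointed_frames).
    return conn.execute("PRAGMA wal_checkpoint(PASSIVE)").fetchone()
```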
736
+
737
+## 15. Performance Rules
738
+
739
+- Use prepared statements for bulk writes.
740
+- Batch within transactions.
741
+- Use indexes listed in the schema and add query-specific indexes only with evidence.
742
+- Prefer integer primary keys internally.
743
+- Store archive dates as Unix seconds UTC.
744
+- Page record tables and exports.
745
+- Use temporary tables for large selected-observation diffs.
746
+- Do not decode large archived payloads into Swift collections.
747
+- Profile on low-memory/legacy-class devices.
748
+
749
+## 16. Testing Requirements
750
+
751
+Unit/integration tests must cover:
752
+- idempotent repeated capture page writes;
753
+- first observation for a type;
754
+- appeared/disappeared/representation-changed events;
755
+- visibility range open/close behavior;
756
+- point-in-time reconstruction;
757
+- diff queries on large synthetic datasets;
758
+- aggregate rebuild;
759
+- Core Data cache rebuild after deletion;
760
+- export manifest reproducibility;
761
+- recovery-compatible export fields;
762
+- prototype-store reset/reinitialization behavior for current test installs;
763
+- future archive migrations once real archive versions exist;
764
+- memory ceiling during export and diff.
765
+
766
+No real HealthKit data in fixtures.
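The idempotency requirement can be expressed as a small property test. This sketch assumes writes are keyed on natural identities (sample identity hash plus payload hash for versions, and observation plus version for events) and uses an in-memory `FakeArchive` as a hypothetical stand-in; a real test would run the same assertions against a temporary SQLite file with UNIQUE constraints and `INSERT ... ON CONFLICT DO NOTHING`.

```swift
/// Natural identity of one payload version (hypothetical field names).
struct VersionKey: Hashable {
    let identityHash: String
    let payloadHash: String
}

/// Natural key of one observation event.
struct EventKey: Hashable {
    let observationID: Int
    let version: VersionKey
}

/// In-memory stand-in for the version/event tables. Set insertion plays the role of
/// `INSERT ... ON CONFLICT DO NOTHING` against UNIQUE constraints on the natural keys.
struct FakeArchive {
    private(set) var versions: Set<VersionKey> = []
    private(set) var events: Set<EventKey> = []

    mutating func write(page: [VersionKey], observationID: Int) {
        for version in page {
            versions.insert(version)
            events.insert(EventKey(observationID: observationID, version: version))
        }
    }
}
```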
767
+
768
+## 17. Deferred Design Questions
769
+
770
+These do not block the archive v2 foundation, but they should be revisited before advanced import/export tooling:
771
+
772
+1. Exact semantic/fuzzy fingerprint fields per HealthKit sample family.
773
+2. Relationship extraction surface available from HealthKit vs backup/XML exports.
774
+3. Optional user-controlled export profiles for more/less provenance disclosure.
775
+4. Repair tooling for rebuilding visibility ranges after future real archive migrations.
776
+
777
+Record any changed decision here with a date before implementing schema changes.
+151 -0
HealthProbe/Doc/02-architecture/Export-Specification.md
@@ -0,0 +1,151 @@
1
+# HealthProbe - Export Specification
2
+
3
+**Last Updated:** 2026-05-23
4
+**Status:** Target design for recovery-compatible exports
5
+
6
+## 1. Purpose
7
+
8
+Exports let the user preserve selected point-in-time views, diffs, record tables, and evidence summaries from the local SQLite archive.
9
+
10
+HealthProbe does not restore, patch backups, or re-publish HealthKit data. Exports should still preserve enough structure for external recovery/salvage tools to reason about what was observed.
11
+
12
+## 2. Export Kinds
13
+
14
+Supported target kinds:
15
+- `observation_records_json`;
16
+- `observation_records_csv`;
17
+- `observation_diff_json`;
18
+- `type_summary_json`;
19
+- `archive_manifest_json`.
20
+
21
+Large exports must stream/page from SQLite. Do not materialize all records into Swift arrays.
22
+
23
+## 3. JSON Envelope
24
+
25
+Every JSON export uses a versioned envelope:
26
+
27
+```json
28
+{
29
+  "export_format_version": 1,
30
+  "export_id": "UUID",
31
+  "export_kind": "observation_records_json",
32
+  "created_at": "2026-05-23T12:00:00.000Z",
33
+  "app": {
34
+    "name": "HealthProbe",
35
+    "version": "local-build"
36
+  },
37
+  "archive": {
38
+    "schema_version": 2,
39
+    "device_chain_hash": "hex",
40
+    "from_observation_id": 1,
41
+    "to_observation_id": null
42
+  },
43
+  "filters": {
44
+    "sample_type_identifiers": [],
45
+    "date_range": null,
46
+    "include_relationships": true
47
+  },
48
+  "manifest": {
49
+    "record_count": 0,
50
+    "item_hash_algorithm": "sha256",
51
+    "manifest_hash_algorithm": "sha256",
52
+    "manifest_hash": "hex"
53
+  },
54
+  "items": []
55
+}
56
+```
57
+
58
+JSON keys are emitted in deterministic sorted order for canonical hashing.
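A minimal sketch of canonical envelope encoding with Foundation's `JSONEncoder`, using `.sortedKeys` and compact (non-pretty-printed) output. `ExportEnvelopeHeader` covers only a subset of the envelope and is a hypothetical type. Foundation's key collation and string escaping are implementation details, so a format that must hash identically across platforms may need a stricter canonicalization such as RFC 8785 (JSON Canonicalization Scheme); that choice is left open here.

```swift
import Foundation

struct ExportAppInfo: Codable {
    let name: String
    let version: String
}

/// Subset of the envelope header; CodingKeys map Swift names to the JSON keys in the example above.
struct ExportEnvelopeHeader: Codable {
    let exportFormatVersion: Int
    let exportKind: String
    let app: ExportAppInfo

    enum CodingKeys: String, CodingKey {
        case exportFormatVersion = "export_format_version"
        case exportKind = "export_kind"
        case app
    }
}

/// Encodes `value` compactly with sorted keys, so equal values produce identical bytes for hashing.
func canonicalJSONData<T: Encodable>(_ value: T) throws -> Data {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]   // no .prettyPrinted: no insignificant whitespace
    return try encoder.encode(value)
}
```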
59
+
60
+## 4. Export Item Contract
61
+
62
+Record items should include:
63
+- sample identity hash;
64
+- HealthKit UUID hash when available;
65
+- strict fingerprint;
66
+- semantic fingerprint when available;
67
+- payload version hash;
68
+- sample type identifier;
69
+- start/end timestamps as ISO 8601 UTC;
70
+- value kind, value, unit, category, workout fields;
71
+- source/provenance hashes or redacted fields allowed by the export scope;
72
+- metadata hash and optional metadata object when allowed;
73
+- relationships when both endpoints are in scope, or unresolved endpoint hashes when explicitly allowed;
74
+- observation visibility fields: first seen, last verified, disappeared evidence when available.
75
+
76
+Every item has:
77
+- `item_hash = SHA-256(canonical item JSON)`.
78
+
79
+## 5. Manifest Hash
80
+
81
+`manifest_hash` is calculated incrementally:
82
+
83
+```text
84
+SHA-256(
85
+  canonical_export_metadata_json
86
+  + ordered_item_hashes
87
+  + canonical_counts_json
88
+  + canonical_filter_json
89
+)
90
+```
91
+
92
+The manifest hash must cover exported content through item hashes. Counts, first dates, or last dates alone are not sufficient.
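The incremental manifest hash can be sketched with CryptoKit's `SHA256` (or the API-compatible `Crypto` module from swift-crypto off Apple platforms). The spec does not say whether item hashes are fed in as hex text or raw 32-byte digests, or whether separators are used; this sketch assumes UTF-8 hex strings with no separators, and `ManifestHasher` and `sha256Hex` are hypothetical names. The hasher state is updated once per streamed item, so no item list has to be held in memory.

```swift
#if canImport(CryptoKit)
import CryptoKit
#else
import Crypto  // swift-crypto provides the same SHA256 API off Apple platforms
#endif
import Foundation

/// Lowercase hex encoding of a byte sequence (e.g. a SHA-256 digest).
func hexString<S: Sequence>(_ bytes: S) -> String where S.Element == UInt8 {
    bytes.map { byte -> String in
        let digits = String(byte, radix: 16)
        return digits.count == 1 ? "0" + digits : digits
    }.joined()
}

/// item_hash = SHA-256(canonical item JSON), hex-encoded.
func sha256Hex(_ data: Data) -> String {
    hexString(SHA256.hash(data: data))
}

/// Feeds manifest inputs into one SHA-256 state in the order the spec lists them.
struct ManifestHasher {
    private var hasher = SHA256()

    init(canonicalMetadataJSON: Data) {
        hasher.update(data: canonicalMetadataJSON)
    }

    /// Call once per item, in export item order, as rows stream out of SQLite.
    mutating func addItemHash(_ itemHashHex: String) {
        hasher.update(data: Data(itemHashHex.utf8))
    }

    /// Appends counts and filters, then returns the hex manifest hash.
    mutating func finalize(canonicalCountsJSON: Data, canonicalFilterJSON: Data) -> String {
        hasher.update(data: canonicalCountsJSON)
        hasher.update(data: canonicalFilterJSON)
        return hexString(hasher.finalize())
    }
}
```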
93
+
94
+Item order:
95
+1. sample type identifier;
96
+2. start date;
97
+3. end date;
98
+4. sample identity hash;
99
+5. payload version hash.
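The item order maps directly onto a five-column `ORDER BY` in SQLite; the comparator below is a hypothetical helper for asserting that order in tests. It assumes timestamps are stored as Unix seconds and that identifiers and hashes are ASCII, where Swift's `String` ordering and SQLite's default `BINARY` collation agree.

```swift
/// Sort key for export items, following the five ordering fields above.
struct ExportItemKey {
    let sampleTypeIdentifier: String
    let startDate: Double   // Unix seconds UTC
    let endDate: Double
    let sampleIdentityHash: String
    let payloadVersionHash: String
}

/// Lexicographic comparison over the five fields, in order (Swift tuples compare element by element).
func exportOrder(_ a: ExportItemKey, _ b: ExportItemKey) -> Bool {
    (a.sampleTypeIdentifier, a.startDate, a.endDate, a.sampleIdentityHash, a.payloadVersionHash)
        < (b.sampleTypeIdentifier, b.startDate, b.endDate, b.sampleIdentityHash, b.payloadVersionHash)
}
```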
100
+
101
+## 6. CSV Contract
102
+
103
+CSV exports are flat record tables for spreadsheet and external tooling.
104
+
105
+Required column order:
106
+1. `export_id`
107
+2. `observation_id`
108
+3. `sample_type_identifier`
109
+4. `sample_identity_hash`
110
+5. `sample_uuid_hash`
111
+6. `strict_fingerprint`
112
+7. `semantic_fingerprint`
113
+8. `payload_hash`
114
+9. `start_date_utc`
115
+10. `end_date_utc`
116
+11. `value_kind`
117
+12. `numeric_value`
118
+13. `unit`
119
+14. `category_value`
120
+15. `workout_activity_type`
121
+16. `duration_seconds`
122
+17. `source_hash`
123
+18. `device_hash`
124
+19. `metadata_hash`
125
+20. `first_seen_observation_id`
126
+21. `last_verified_observation_id`
127
+22. `disappeared_observation_id`
128
+23. `item_hash`
129
+
130
+CSV uses RFC 4180 quoting rules and UTF-8.
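A minimal RFC 4180 field encoder. Per the RFC, a field containing a comma, double quote, CR, or LF is enclosed in double quotes, embedded double quotes are doubled, and records end with CRLF. `csvField` and `csvRow` are hypothetical helpers; the caller would write `Data(row.utf8)` to the output stream. The check scans Unicode scalars rather than `Character`s because Swift treats `\r\n` as a single `Character`.

```swift
import Foundation

/// Quotes one field per RFC 4180 when it contains a comma, double quote, CR, or LF.
func csvField(_ value: String) -> String {
    let needsQuoting = value.unicodeScalars.contains { $0 == "," || $0 == "\"" || $0 == "\r" || $0 == "\n" }
    guard needsQuoting else { return value }
    return "\"" + value.replacingOccurrences(of: "\"", with: "\"\"") + "\""
}

/// Joins fields with commas and terminates the record with CRLF.
func csvRow(_ fields: [String]) -> String {
    fields.map(csvField).joined(separator: ",") + "\r\n"
}
```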
131
+
132
+Relationships are not flattened into the main CSV row. If needed, export a companion relationships CSV with source/target sample hashes.
133
+
134
+## 7. Streaming And Cancellation
135
+
136
+Implementation contract:
137
+- page records from SQLite with deterministic cursors;
138
+- write output incrementally;
139
+- update item/manifest hash state as rows stream;
140
+- if the user cancels, mark export status as `cancelled` and do not record a completed manifest;
141
+- failed exports should leave no completed manifest row unless the output is verifiable.
142
+
143
+Resume support is optional for v1 exports.
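The streaming and cancellation contract can be sketched as a driver loop over keyset pages: each page is fetched strictly after the last written key (for example with a row-value predicate such as `WHERE (type, id) > (?, ?) ORDER BY type, id LIMIT ?`), cancellation is checked between pages, and the completed-manifest callback runs only after the final page. All names and columns are hypothetical, and rows are reduced to their cursor keys to keep the sketch short.

```swift
/// Keyset cursor: the sort key of the last row written (hypothetical columns).
struct ExportCursor: Equatable {
    let typeIdentifier: String
    let sampleID: Int
}

enum ExportRunStatus: Equatable {
    case completed(itemCount: Int)
    case cancelled
    case failed
}

/// Pages rows strictly after the last written key, writes them incrementally, and records a
/// completed manifest only after the final page has been written.
func runStreamingExport(
    fetchPage: (ExportCursor?) throws -> [ExportCursor],
    write: (ExportCursor) throws -> Void,
    isCancelled: () -> Bool,
    recordCompletedManifest: (Int) -> Void
) -> ExportRunStatus {
    var cursor: ExportCursor? = nil
    var written = 0
    do {
        while true {
            // Checked between pages; a cancelled export never records a completed manifest.
            if isCancelled() { return .cancelled }
            let page = try fetchPage(cursor)
            guard let last = page.last else { break }
            for row in page {
                try write(row)   // item/manifest hash state would also be updated here
                written += 1
            }
            cursor = last
        }
    } catch {
        return .failed
    }
    recordCompletedManifest(written)
    return .completed(itemCount: written)
}
```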
144
+
145
+## 8. Provenance Warning
146
+
147
+Every user-facing export flow must communicate that:
148
+- exported data is observed evidence from HealthKit-accessible surfaces;
149
+- external re-publication to HealthKit may lose original metadata/provenance;
150
+- HealthProbe itself does not restore or modify HealthKit/iOS backups.
151
+
+403 -0
HealthProbe/Doc/02-architecture/Implementation-Guide.md
@@ -0,0 +1,403 @@
1
+# HealthProbe - Technical Implementation Guide
2
+
3
+**Version:** 1.6
4
+**Last Updated:** 2026-05-23
5
+**Purpose:** Implementation guide for the iOS local Health DB Time Machine
6
+
7
+For database schema, archive invariants, SQL analysis patterns, Core Data cache
8
+boundaries, reset policy, and future migration rules, read
9
+[`Database-Design.md`](Database-Design.md) first. This guide describes
10
+implementation workflow and cross-module behavior.
11
+
12
+## Privacy Directives - Mandatory
13
+
14
+The following rules apply to all code, logs, examples, tests, and documentation:
15
+
16
+- No credentials, API keys, tokens, passwords, or signing certificates
17
+- No personal data: names, emails, phone numbers, dates of birth
18
+- No account identifiers: Apple IDs, iCloud account info, CloudKit record IDs
19
+- No raw real health values in the repository, tests, fixtures, logs, examples, or documentation
20
+- No location data or patterns that could identify a user
21
+- Device/source identifiers must be redacted, hashed, or stored only as local provenance according to the privacy policy
22
+
23
+The app may store a user's HealthKit samples locally on-device when the user grants HealthKit access. Those samples must never be committed to source control or written to diagnostic logs.
24
+
25
+## 1. Product Objective
26
+
27
+HealthProbe is a single-device local archive and time-machine app for HealthKit-accessible data.
28
+
29
+The implementation must prioritize:
30
+- point-in-time reconstruction of local HealthKit observations
31
+- neutral change explanation between observations
32
+- preservation of selected details before HealthKit aggregation/consolidation makes them unavailable
33
+- scoped user exports
34
+- no HealthProbe CloudKit/iCloud sync
35
+- no cross-device record-by-record comparison
36
+- iOS 15-era legacy device support; SwiftData is not a target dependency
37
+
38
+Record-count drops are not inherently critical. They are evidence to explain with record-level and aggregate context.
39
+
40
+## 2. Test Installation Reset Lifecycle
41
+
42
+HealthProbe has no real deployments at this stage. Existing SwiftData stores and prototype SQLite archives are disposable.
43
+
44
+Archive v2 startup behavior:
45
+1. Open the archive path.
46
+2. Read `archive_metadata.schema_version` when present.
47
+3. If no archive exists, create archive v2.
48
+4. If a prototype/unknown schema exists, close the database, move it (together with any `-wal`/`-shm` sidecar files) to a timestamped `*.prototype-backup` file for developer inspection, and create a fresh archive v2.
49
+5. Rebuild/delete Core Data cache rows after archive reset.
50
+6. Log reset reason without raw health values.
51
+
52
+Do not implement one-way migration from the old prototype schema unless a later dated product decision reverses this policy.
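The startup steps can be sketched as a pure decision function, assuming the archive file name and the stored `archive_metadata.schema_version` have already been read. `ArchiveStartupAction`, `startupAction`, and the backup naming scheme (`<name>.<unix-seconds>.prototype-backup`) are hypothetical. Because this policy only holds while there are no real deployments, a future production build would need to refuse or migrate an unknown version rather than reset it.

```swift
import Foundation

/// Current archive schema version (matches the `schema_version` used in export envelopes).
let currentArchiveSchemaVersion = 2

enum ArchiveStartupAction: Equatable {
    case createFresh
    case openExisting
    /// Close, move the file (and any -wal/-shm sidecars) to `backupFileName`, then create archive v2.
    case resetPrototype(backupFileName: String)
}

/// Test-build startup decision: a missing or unrecognized schema version is treated as a
/// disposable prototype store.
func startupAction(archiveFileName: String, archiveExists: Bool,
                   storedSchemaVersion: Int?, now: Date) -> ArchiveStartupAction {
    guard archiveExists else { return .createFresh }
    if storedSchemaVersion == currentArchiveSchemaVersion { return .openExisting }
    let stamp = Int(now.timeIntervalSince1970)
    return .resetPrototype(backupFileName: "\(archiveFileName).\(stamp).prototype-backup")
}
```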
53
+
54
+## 3. HealthKit Capture
55
+
56
+Use:
57
+- `HKAnchoredObjectQuery` for incremental capture
58
+- `HKObserverQuery` as a wake-up hint
59
+- manual capture from the app UI
60
+
61
+Capture flow:
62
+1. Resolve the current local device chain ID.
63
+2. Start an observation record.
64
+3. For each selected sample type, run anchored queries.
65
+4. Write HealthKit samples and deleted-object evidence to the local archive first.
66
+5. Update materialized aggregate tables in SQLite.
67
+6. Save/rebuild derived Core Data cache rows only after archive writes succeed.
68
+7. Compute summary/diff caches for UI and reports.
69
+
70
+Anchors belong to the local device timeline and selected type registry. They are implementation state, not forensic truth.
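The ordering constraints of the capture flow (archive commit first, anchor second, cache last) can be sketched as follows. `TypeCaptureStep` is hypothetical, and the anchor is modeled as opaque `Data`; `HKQueryAnchor` conforms to `NSSecureCoding`, so one plausible encoding is `NSKeyedArchiver.archivedData(withRootObject:requiringSecureCoding:)`. If the archive write throws, the previous anchor is kept, so the next run re-reads the same page instead of skipping it.

```swift
import Foundation

/// Hypothetical per-type capture state: the last persisted anchor and a count of cache refreshes.
struct TypeCaptureStep {
    var savedAnchor: Data?
    var cacheRefreshes = 0

    /// Processes one anchored-query page in the required order. If `commitToArchive` throws,
    /// neither the anchor nor the cache is touched.
    mutating func process(pageAnchor: Data, commitToArchive: () throws -> Void) rethrows {
        try commitToArchive()      // 1. SQLite transaction for samples + deleted objects commits
        savedAnchor = pageAnchor   // 2. only then persist the new anchor
        cacheRefreshes += 1        // 3. derived Core Data cache rows may now be refreshed
    }
}
```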
71
+
72
+### 3.1 Capture State Machine
73
+
74
+Observation statuses:
75
+- `started`;
76
+- `partial`;
77
+- `completed`;
78
+- `failed`;
79
+- `cancelled`.
80
+
81
+Type-run statuses:
82
+- `started`;
83
+- `completed`;
84
+- `failed`;
85
+- `unauthorized`;
86
+- `timed_out`.
87
+
88
+Rules:
89
+- one failed type does not invalidate successfully committed type runs;
90
+- incomplete observations are visible as partial evidence, not as proof of disappearance;
91
+- anchors are saved only after the corresponding SQLite transaction commits;
92
+- UI change labels must include uncertainty when either side of a diff has partial/failed type evidence.
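One possible mapping from type-run statuses to the observation status, as a sketch. The guide fixes the status names but not the aggregation rule, so the rule here is an assumption: any run still `started` keeps the observation `started`, all `completed` yields `completed`, no completed runs yields `failed`, and any mix yields `partial`. The raw values match the stored status strings; `observationStatus` and `diffNeedsUncertainty` are hypothetical helpers, the latter encoding the last rule above for a single type.

```swift
enum TypeRunStatus: String {
    case started, completed, failed, unauthorized
    case timedOut = "timed_out"
}

enum ObservationStatus: String {
    case started, partial, completed, failed, cancelled
}

/// Assumed aggregation rule; committed type runs stay valid whatever the observation status.
func observationStatus(for runs: [TypeRunStatus], cancelled: Bool = false) -> ObservationStatus {
    if cancelled { return .cancelled }
    if runs.contains(.started) { return .started }
    let completed = runs.filter { $0 == .completed }.count
    if completed == runs.count { return .completed }
    return completed == 0 ? .failed : .partial
}

/// A diff label for one type needs an uncertainty marker unless both sides completed.
func diffNeedsUncertainty(previous: TypeRunStatus, current: TypeRunStatus) -> Bool {
    previous != .completed || current != .completed
}
```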
93
+
94
+### 3.2 Anchor Recovery
95
+
96
+If an anchor is missing, corrupt, or rejected by HealthKit:
97
+- mark the type run with anchor failure context;
98
+- run a full scan for the affected type when permissions allow;
99
+- rebuild current visibility for that type from observed samples and deleted-object evidence;
100
+- continue storing future anchors after the full scan succeeds.
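Anchor recovery reduces to choosing between an incremental anchored query and a full scan (an `HKAnchoredObjectQuery` with a `nil` anchor returns all matching samples). `AnchorState`, `TypeRunPlan`, `planTypeRun`, and the failure-context strings are hypothetical. The `authorized` flag is an input because HealthKit deliberately does not reveal whether read access was granted, so the app can only act on what it can infer.

```swift
import Foundation

enum AnchorState {
    case valid(Data)
    case missing
    case corrupt
    case rejectedByHealthKit
}

enum TypeRunPlan: Equatable {
    case incremental(anchor: Data)
    /// Full scan for this type, then rebuild its visibility from observed samples and deleted-object evidence.
    case fullScanAndRebuild(failureContext: String)
    case skipUnauthorized
}

func planTypeRun(anchor: AnchorState, authorized: Bool) -> TypeRunPlan {
    guard authorized else { return .skipUnauthorized }
    switch anchor {
    case .valid(let data): return .incremental(anchor: data)
    case .missing: return .fullScanAndRebuild(failureContext: "anchor_missing")
    case .corrupt: return .fullScanAndRebuild(failureContext: "anchor_corrupt")
    case .rejectedByHealthKit: return .fullScanAndRebuild(failureContext: "anchor_rejected")
    }
}
```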
101
+
102
+## 4. Storage Layers
103
+
104
+### 4.1 Local Archive Store
105
+
106
+The archive store is the source of truth. It should be a robust local SQLite database designed for both storage and analysis.
107
+
108
+The canonical database design is [`Database-Design.md`](Database-Design.md). The summary below is intentionally high-level; do not treat it as a competing schema source.
109
+
110
+The archive should support:
111
+- one schema for all selected sample types
112
+- differential observation storage; do not store complete recurring snapshots
113
+- HealthKit UUID hash and internal fingerprints
114
+- sample payload versions deduplicated across observations
115
+- type identifier, start/end date, value, unit, and category/workout fields
116
+- source/source revision metadata
117
+- HealthKit metadata dictionaries
118
+- device provenance exposed by HealthKit, redacted or hashed as required
119
+- first-seen, last-seen, last-verified, and disappeared-at observations
120
+- visibility ranges/events for point-in-time reconstruction
121
+- observation history sufficient for point-in-time reconstruction
122
+- relationship records where HealthKit exposes links between workouts, samples, events, or related records
123
+- materialized aggregate tables for expensive counts used by reports/UI
124
+- schema versioning, current test-store reset policy, and future migrations
125
+- integrity hashes/manifests for exports
126
+- indexes, temporary tables, joins, CTEs, and paged result sets for large diffs
127
+- recovery-compatible exports for external tooling, preserving record identity, payload versions, provenance metadata where available, relationships, observation history, and manifest hashes
128
+
129
+The archive must be able to answer:
130
+- records visible at observation T
131
+- records that appeared/disappeared between T1 and T2
132
+- records whose representation changed while semantic/aggregate meaning may be preserved
133
+- selected records for streaming export
134
+
135
+Minimum target schema shape is defined in [`Database-Design.md`](Database-Design.md). The archive must at least preserve these concepts:
136
+
137
+- observations;
138
+- sample identities;
139
+- sample payload versions;
140
+- observation events;
141
+- visibility ranges;
142
+- sources, source revisions, devices, metadata, and relationships;
143
+- materialized aggregates;
144
+- export manifests.
145
+
146
+Historical sketch retained for orientation:
147
+
148
+```sql
149
+-- One row per local capture attempt/result.
150
+CREATE TABLE observations (
151
+    id INTEGER PRIMARY KEY,
152
+    observed_at REAL NOT NULL,
153
+    status TEXT NOT NULL,
154
+    app_version TEXT,
155
+    os_version TEXT,
156
+    device_chain_id TEXT NOT NULL,
157
+    schema_version INTEGER NOT NULL
158
+);
159
+
160
+-- Stable identity for a HealthKit-accessible record or semantic record.
161
+CREATE TABLE samples (
162
+    id INTEGER PRIMARY KEY,
163
+    type_identifier TEXT NOT NULL,
164
+    sample_uuid_hash TEXT,
165
+    strict_fingerprint TEXT NOT NULL,
166
+    semantic_fingerprint TEXT,
167
+    first_seen_observation_id INTEGER NOT NULL
168
+);
169
+
170
+-- Deduplicated payload representation. New row only when representation changes.
171
+CREATE TABLE sample_versions (
172
+    id INTEGER PRIMARY KEY,
173
+    sample_id INTEGER NOT NULL,
174
+    payload_hash TEXT NOT NULL,
175
+    start_date REAL NOT NULL,
176
+    end_date REAL NOT NULL,
177
+    value REAL,
178
+    unit TEXT,
179
+    source_id INTEGER,
180
+    metadata_hash TEXT
181
+);
182
+
183
+-- Visibility/event history, not a full snapshot copy.
184
+CREATE TABLE sample_observation_events (
185
+    id INTEGER PRIMARY KEY,
186
+    observation_id INTEGER NOT NULL,
187
+    sample_id INTEGER NOT NULL,
188
+    version_id INTEGER,
189
+    event_kind TEXT NOT NULL
190
+);
191
+
192
+-- Optional compressed visibility ranges for point-in-time reconstruction.
193
+CREATE TABLE sample_visibility_ranges (
194
+    sample_id INTEGER NOT NULL,
195
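+    version_id INTEGER, -- SQLite allows NULL in a non-INTEGER PRIMARY KEY column and treats NULLs as distinct; declare NOT NULL if uniqueness must hold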
196
+    first_observation_id INTEGER NOT NULL,
197
+    last_observation_id INTEGER,
198
+    PRIMARY KEY (sample_id, version_id, first_observation_id)
199
+);
200
+
201
+-- Materialized aggregates feeding reports and Core Data cache.
202
+CREATE TABLE daily_type_aggregates (
203
+    observation_id INTEGER NOT NULL,
204
+    type_identifier TEXT NOT NULL,
205
+    bucket_start REAL NOT NULL,
206
+    record_count INTEGER NOT NULL,
207
+    value_sum REAL,
208
+    value_max REAL,
209
+    PRIMARY KEY (observation_id, type_identifier, bucket_start)
210
+);
211
+```
212
+
213
+Exact naming can evolve, but the constraints must hold: payloads are deduplicated, observations are differential, and aggregates are materialized.
214
+
215
+### 4.2 Core Data UI/Report Cache
216
+
217
+Detailed entity contracts live in [`Core-Data-Cache-Design.md`](Core-Data-Cache-Design.md).
218
+
219
+Core Data is the target derived/cache layer because it supports older iOS versions than SwiftData and is suitable for UI/report state. It may store:
220
+- selected data types and app settings
221
+- observation list and capture status
222
+- precomputed summaries, temporal bins, and diff previews
223
+- operation logs and export indexes
224
+- change labels and links into the archive
225
+- expensive count results used by reports and presentation
226
+
227
+Core Data must not be the only forensic copy. If Core Data and the archive disagree, the SQLite archive wins. The cache must be safe to delete and rebuild from SQLite.
228
+
229
+Current SwiftData models are legacy/prototype implementation details. New storage work should target Core Data for cache and SQLite for archive/analysis.
230
+
231
+## 5. Change Explanation
232
+
233
+Change logic should be evidence-first and consolidation-aware.
234
+
235
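+Basic diff semantics are plain set operations over fingerprints. The Swift form below uses synthetic values for illustration only; the real diff must execute in SQLite rather than loading full datasets into Swift arrays:
236
+```swift
237
+let previousFingerprints: Set<String> = ["fp-a", "fp-b", "fp-c"] // synthetic
+let currentFingerprints: Set<String> = ["fp-b", "fp-c", "fp-d"]  // synthetic
+let appeared = currentFingerprints.subtracting(previousFingerprints)    // {"fp-d"}
238
+let disappeared = previousFingerprints.subtracting(currentFingerprints) // {"fp-a"}
239
+let retained = currentFingerprints.intersection(previousFingerprints)   // {"fp-b", "fp-c"}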
240
+```
241
+
242
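+Conceptual SQL shape, where `visible_samples` stands for whatever view or CTE resolves the visible sample/version pairs at an observation (for example, from `sample_visibility_ranges`):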
243
+```sql
244
+CREATE TEMP TABLE prev_visible AS
245
+SELECT sample_id, version_id
246
+FROM visible_samples
247
+WHERE observation_id = :previous;
248
+
249
+CREATE TEMP TABLE curr_visible AS
250
+SELECT sample_id, version_id
251
+FROM visible_samples
252
+WHERE observation_id = :current;
253
+
254
+-- Disappeared: visible at :previous but not at :current.
+SELECT p.sample_id
255
+FROM prev_visible p
256
+LEFT JOIN curr_visible c ON c.sample_id = p.sample_id
257
+WHERE c.sample_id IS NULL;
258
+```
259
+
260
+Semantic grouping should compare:
261
+- type identifier
262
+- start/end coverage
263
+- value sum and value max where meaningful
264
+- source/source revision
265
+- metadata keys relevant to HealthKit interpretation
266
+- interval length and sample density
267
+
268
+Suggested labels:
269
+- `appeared`
270
+- `disappeared`
271
+- `representationChanged`
272
+- `consolidationLikely`
273
+- `aggregateChanged`
274
+- `uncertain`
275
+
276
+Severity should be reserved for user-facing workflow urgency, not treated as proof of corruption. In particular, a high disappeared count with stable aggregate totals should usually be shown as `consolidationLikely` or `representationChanged`, not as critical loss.
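The labeling guidance can be sketched as a function over per-type evidence that SQLite has already reduced to counts and totals. `TypeDiffEvidence`, `label(for:)`, and the 0.5% stability tolerance are hypothetical; the rules taken directly from the guide are that disappeared records with stable aggregate totals map to `consolidationLikely`, and incomplete evidence on either side maps to `uncertain`.

```swift
enum ChangeLabel: Equatable {
    case appeared, disappeared, representationChanged, consolidationLikely, aggregateChanged, uncertain
}

/// Per-type diff evidence already reduced by SQLite to counts and totals (hypothetical shape).
struct TypeDiffEvidence {
    let appearedCount: Int
    let disappearedCount: Int
    let representationChangedCount: Int
    let previousAggregate: Double
    let currentAggregate: Double
    let bothSidesComplete: Bool
}

/// Returns nil when nothing changed. `stableTolerance` is the relative aggregate change still
/// treated as stable; 0.5% is an arbitrary placeholder, not a validated threshold.
func label(for e: TypeDiffEvidence, stableTolerance: Double = 0.005) -> ChangeLabel? {
    guard e.bothSidesComplete else { return .uncertain }
    let base = max(abs(e.previousAggregate), Double.ulpOfOne)
    let aggregateStable = abs(e.currentAggregate - e.previousAggregate) / base <= stableTolerance
    switch (e.appearedCount > 0, e.disappearedCount > 0) {
    case (_, true) where aggregateStable:
        return .consolidationLikely   // records vanished but totals held
    case (true, false):
        return .appeared
    case (false, true):
        return .disappeared
    case (true, true):
        return .aggregateChanged
    case (false, false):
        if !aggregateStable { return .aggregateChanged }
        return e.representationChangedCount > 0 ? .representationChanged : nil
    }
}
```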
277
+
278
+## 6. Exports
279
+
280
+Detailed export formats and manifest rules live in [`Export-Specification.md`](Export-Specification.md).
281
+
282
+Exports are scoped to what the user is inspecting.
283
+
284
+Supported MVP exports:
285
+- point-in-time record table
286
+- observation manifest with hashes
287
+- diff report between two observations
288
+- selected appeared/disappeared/changed record set
289
+
290
+Export rules:
291
+- Include observation timestamps and app/build/schema versions.
292
+- Include hashes so exported evidence can be re-identified within HealthProbe.
293
+- Do not automatically upload exports.
294
+- Keep examples synthetic.
295
+- Allow CSV for spreadsheet inspection and JSON for structured analysis.
296
+- Stream/page from SQLite. Do not build a full large export in RAM.
297
+- Preserve enough structure for external recovery/salvage tools to reason about records without making HealthProbe itself a restore tool.
298
+
299
+## 7. Context Logging
300
+
301
+Context logs help interpret changes but must not claim causality.
302
+
303
+Log:
304
+- capture start/end/failure
305
+- HealthKit permission changes
306
+- selected type registry changes
307
+- app version and iOS version
308
+- coarse iCloud sign-in state if available
309
+- archive reset/schema-version changes and integrity-check results
310
+
311
+Do not log raw health values or personal identifiers.
312
+
313
+## 8. Archive Health And Integrity Failure
314
+
315
+Archive health checks:
316
+- open database;
317
+- verify schema version;
318
+- run `PRAGMA integrity_check`;
319
+- verify required tables/indexes;
320
+- spot-check aggregate rebuilds;
321
+- verify manifest hashes for completed exports.
322
+
323
+If integrity fails:
324
+- stop write operations;
325
+- show archive health as degraded;
326
+- allow export only if the specific query path can be verified safe;
327
+- offer developer/test reset for current prototype builds;
328
+- do not silently delete a real archive in future production builds.
329
+
330
+Core Data cache corruption is lower severity: delete and rebuild cache from SQLite.
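The integrity-failure rules can be sketched as a policy function. `PRAGMA integrity_check` returns a single row containing `ok` when it finds no problems, which `archiveHealth(integrityCheckRows:)` relies on; the other names are hypothetical. A degraded archive stops writes, restricts export to verified query paths, and offers reset only in prototype/test builds.

```swift
enum ArchiveHealth: Equatable { case healthy, degraded }
enum BuildKind { case prototypeTest, production }

/// `PRAGMA integrity_check` yields exactly one row, "ok", when it finds no problems.
func archiveHealth(integrityCheckRows: [String]) -> ArchiveHealth {
    integrityCheckRows == ["ok"] ? .healthy : .degraded
}

struct AllowedArchiveOperations {
    let writes: Bool
    let exportOnlyVerifiedQueryPaths: Bool
    let offerDeveloperReset: Bool
}

func allowedOperations(health: ArchiveHealth, build: BuildKind) -> AllowedArchiveOperations {
    switch health {
    case .healthy:
        return AllowedArchiveOperations(writes: true, exportOnlyVerifiedQueryPaths: false,
                                        offerDeveloperReset: false)
    case .degraded:
        // Stop writes and restrict export; never reset automatically. Reset is offered only
        // in prototype/test builds and still requires explicit confirmation.
        return AllowedArchiveOperations(writes: false, exportOnlyVerifiedQueryPaths: true,
                                        offerDeveloperReset: build == .prototypeTest)
    }
}
```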
331
+
332
+## 9. UI Implementation Guidance
333
+
334
+Primary surfaces:
335
+- observation timeline
336
+- point-in-time observation detail
337
+- per-type record table
338
+- diff detail between observations
339
+- export preview and export history
340
+- archive health/status
341
+
342
+Legacy devices may disable or simplify heavy visualizations. They must still support capture, cached summaries, report generation, and export.
343
+
344
+Avoid alarm-first wording. Prefer:
345
+- "84 records no longer visible in current observation"
346
+- "Daily aggregate changed by 0.1%"
347
+- "Consolidation likely"
348
+- "Cause not inferred"
349
+
350
+Avoid:
351
+- "Apple lost your data"
352
+- "Critical loss" based only on count
353
+- "iCloud broke sync"
354
+
355
+## 10. Testing Strategy
356
+
357
+Unit tests:
358
+- point-in-time reconstruction
359
+- appeared/disappeared diff sets
360
+- consolidation heuristic with stable aggregates
361
+- changed aggregate with uncertain label
362
+- empty observations
363
+- permission/type-registry changes
364
+- clock skew/context timestamp handling
365
+- Core Data cache deletion and rebuild from SQLite
366
+- SQL diff queries on large synthetic datasets without high RAM use
367
+
368
+Integration tests:
369
+- archive persistence and recovery
370
+- archive reset/reinitialization for current test installs
371
+- future archive schema migrations once real archive versions exist
372
+- Core Data cache rebuild from archive
373
+- export generation with manifest hashes
374
+- high-frequency capture memory/performance
375
+- deletion evidence via `HKDeletedObject`
376
+
377
+Synthetic fixtures only. No real health values or identifiable metadata.
378
+
379
+## 11. Performance Considerations
380
+
381
+| Operation | Target | Notes |
382
+|-----------|--------|-------|
383
+| Anchored capture | Background | Stream pages; avoid building huge arrays |
384
+| Archive write | Background | Commit before Core Data cache update |
385
+| UI cache update | Short main-thread work | Use precomputed summaries |
386
+| Diff preview | SQL-first, bounded | Use temp tables/indexes; cap record previews and page full tables |
387
+| Export | User-initiated | Stream/page from SQLite; support filters for large high-frequency types |
388
+
389
+## 12. Deployment Checklist
390
+
391
+- [ ] HealthKit read permissions declared in Info.plist
392
+- [ ] Background Modes enabled if used
393
+- [ ] Core Data cache schema/rebuild tested
394
+- [ ] Archive reset/reinitialization and schema versioning tested
395
+- [ ] Archive integrity/manifests tested
396
+- [ ] Export files verified with synthetic data
397
+- [ ] Privacy policy matches local archive behavior
398
+- [ ] UI copy reviewed for neutral, consolidation-aware language
399
+- [ ] Legacy-device mode reviewed for simplified UI/report/export behavior
400
+
401
+---
402
+
403
+*HealthProbe Implementation Guide v1.6 - 2026-05-23*
+12 -0
HealthProbe/Doc/03-ui/README.md
@@ -0,0 +1,12 @@
1
+# HealthProbe UI Chapter
2
+
3
+Active UI agent guidance is currently centralized in:
4
+
5
+- [`../00-agent-guides/CLAUDE.md`](../00-agent-guides/CLAUDE.md)
6
+
7
+Historical UI redesign notes are archived in:
8
+
9
+- [`../99-archive/DATA_TYPE_VIEWS_OPTIMIZATION.md`](../99-archive/DATA_TYPE_VIEWS_OPTIMIZATION.md)
10
+- [`../99-archive/REFACTORING_DATA_TYPE_VIEWS.md`](../99-archive/REFACTORING_DATA_TYPE_VIEWS.md)
11
+
12
+Do not treat archived UI notes as current product scope. Current UI work should follow the Time Machine, observation, diff, and export language in the Claude guide and MVP specification.
+74 -0
HealthProbe/Doc/04-project/IMPLEMENTATION_STATUS.md
@@ -0,0 +1,74 @@
1
+# HealthProbe - Implementation Status
2
+
3
+**Last Updated:** 2026-05-23
4
+
5
+## Current Reality
6
+
7
+The app currently contains a working SwiftUI + SwiftData prototype with HealthKit capture, snapshot/delta screens, and an initial SQLite archive store.
8
+
9
+The product direction has changed. The target architecture is now:
10
+- iOS 15-era compatible;
11
+- direct SQLite archive/analysis database as source of truth;
12
+- differential observation storage;
13
+- Core Data UI/report cache;
14
+- Time Machine UI and scoped exports;
15
+- recovery-compatible archive/export format;
16
+- no in-app restore, backup patching, or HealthKit re-publication.
17
+
18
+Current SwiftData models and anomaly-oriented naming are legacy/prototype implementation details.
19
+
20
+There are no real deployments, only test installations. Existing prototype databases are disposable: the archive v2 refactor should reset, ignore, or reinitialize old SwiftData/prototype SQLite stores instead of preserving backward compatibility with them.
21
+
22
+## Status By Area
23
+
24
+| Area | Current Status | Target / Next Work |
25
+|------|----------------|--------------------|
26
+| Product docs | Updated | Keep `HealthProbe/Doc/README.md` as canonical index |
27
+| HealthKit capture | Prototype exists | Adapt capture to write differential SQLite observations first |
28
+| SQLite archive | Archive v2 schema bootstrap exists; legacy write table still active | Move write path from `archive_samples` to observations/samples/versions/events/ranges |
29
+| Core Data cache | Not implemented | Add rebuildable cache for expensive counts, summaries, report metadata, UI state |
30
+| SwiftData cache | Exists | Treat as disposable prototype data; reset/ignore during v2 transition |
31
+| UI | Prototype exists | Reframe screens around observations, diffs, export, archive status |
32
+| Diff/change explanation | Prototype/legacy anomaly logic exists | Move heavy diffing into SQLite and use neutral change classifications |
33
+| Export | Prototype scoped JSON export exists | Add recovery-compatible manifests and streaming/paged export |
34
+| Legacy device support | Not implemented | Remove SwiftData dependency and simplify heavy views for low-memory devices |
35
+| Recovery workflows | Not supported | Preserve export/archive structure for external recovery tools only |
36
+
37
+## Refactoring Priorities
38
+
39
+Detailed checkable milestones live in [`Refactoring-Plan.md`](Refactoring-Plan.md).
40
+
41
+1. Implement differential write path: observations, samples, payload versions, events/ranges, aggregates.
42
+2. Add SQLite integrity/open/schema-version tests.
43
+3. Move large diffs/counts into SQL queries with indexes/temp tables/paged results.
44
+4. Add Core Data UI/report cache and rebuild pipeline.
45
+5. Replace SwiftData UI dependencies with Core Data/cache DTOs.
46
+6. Update UI language from anomaly/status to observation/diff/export.
47
+7. Add streaming exports with manifests.
48
+8. Validate on low-memory/legacy-class devices.
49
+
50
+## Known Prototype Mismatches
51
+
52
+- SwiftData currently blocks iOS 15-era device support.
53
+- Existing `Anomaly*` model/service names are legacy language.
54
+- Some screens still imply snapshot-count monitoring rather than Time Machine inspection.
55
+- Current archive schema is not sufficient as the long-term source of truth.
56
+- Existing implementation may decode or cache too much data for low-end devices.
57
+- Old prototype database compatibility is no longer required.
58
+
59
+## Verification Checklist
60
+
61
+- [ ] SQLite archive v2 can reconstruct records visible at observation T.
62
+- [ ] No recurring complete snapshot copies are written for high-volume types.
63
+- [ ] SQL diff between two observations runs without loading full datasets into Swift arrays.
64
+- [ ] Expensive counts used by reports/UI are cached and rebuildable.
65
+- [ ] Deleting Core Data cache and rebuilding from SQLite restores UI/report summaries.
66
+- [ ] Export can stream large selected record sets.
67
+- [ ] Export manifests include hashes and observation metadata.
68
+- [ ] iOS app remains read-only with respect to HealthKit.
69
+- [ ] Docs and UI do not claim in-app restore/re-publication support.
70
+- [ ] Legacy/small-device UI mode preserves capture/report/export.
71
+
72
+## Historical Notes
73
+
74
+Older status docs described a completed snapshot/anomaly/SwiftData system. That was true for the prototype direction, but it is no longer the target architecture.
+293 -0
HealthProbe/Doc/04-project/Refactoring-Plan.md
@@ -0,0 +1,293 @@
1
+# HealthProbe - Database-Led Refactoring Plan
2
+
3
+**Last Updated:** 2026-05-23
4
+**Status:** Active planning document
5
+
6
+## Goal
7
+
8
+Move HealthProbe from the current SwiftData/snapshot/anomaly prototype toward the target architecture:
9
+
10
+- SQLite archive/analysis database as source of truth;
11
+- differential observation storage;
12
+- SQL-first analysis for large datasets;
13
+- Core Data UI/report cache;
14
+- recovery-compatible exports;
15
+- iOS 15-era legacy-device support;
16
+- Time Machine UI over local observations;
17
+- destructive reset/reinitialization of prototype/test stores; old database
18
+  compatibility is not required.
19
+
20
+UI refactoring happens after the storage and query foundations exist.
21
+
22
+## Milestone 0 - Freeze Legacy Direction
23
+
24
+**Purpose:** Stop work from deepening the old architecture.
25
+
26
+Checklist:
27
+- [ ] Mark SwiftData as legacy/prototype in active implementation tickets.
28
+- [ ] Stop adding new SwiftData entities.
29
+- [ ] Stop adding features that require recurring complete snapshots.
30
+- [ ] Mark existing prototype/test installation data as disposable for archive v2.
31
+- [ ] Point all storage agents to [`../02-architecture/Database-Design.md`](../02-architecture/Database-Design.md).
32
+- [ ] Confirm root docs only bootstrap into `HealthProbe/Doc/`.
33
+
34
+Acceptance:
35
+- [ ] No active task describes SwiftData as target persistence.
36
+- [ ] No active task proposes full periodic snapshot storage.
37
+- [ ] No active task requires old prototype-store compatibility.
38
+- [ ] `HealthProbe/Doc/README.md` points DB work to `Database-Design.md`.
39
+
40
+## Milestone 1 - Lock Database Decisions
41
+
42
+**Purpose:** Resolve irreversible archive choices before coding schema v2.
43
+
44
+Checklist:
45
+- [x] Decide timestamp storage convention.
46
+- [x] Decide hash/salt/key strategy for source/device identifiers.
47
+- [x] Define strict fingerprint foundation.
48
+- [x] Define semantic/fuzzy fingerprint policy.
49
+- [x] Define timezone policy for daily/monthly aggregate buckets.
50
+- [x] Decide whether visibility ranges are maintained eagerly or rebuilt from events.
51
+- [x] Define relationship preservation policy for workouts/samples/events.
52
+- [x] Record prototype data policy: discard/reset old SwiftData and prototype SQLite stores; no compatibility migration.
53
+- [x] Define export manifest canonicalization and hash algorithm.
54
+
55
+Acceptance:
56
+- [x] `Database-Design.md` open questions are answered or explicitly deferred.
57
+- [x] Schema v2 can be implemented without guessing.
58
+- [x] Test-install reset/reinitialization policy is documented.
59
+- [x] Privacy implications of identifiers/provenance are documented.
60
+
61
+## Milestone 2 - Synthetic Large-Data Test Harness
62
+
63
+**Purpose:** Prove the new design can be tested before real HealthKit data is involved.
64
+
65
+Checklist:
66
+- [ ] Create synthetic observation generator.
67
+- [ ] Generate low, medium, and high-volume sample sets.
68
+- [ ] Include appeared/disappeared/representationChanged scenarios.
69
+- [ ] Include consolidation-like high-frequency thinning scenarios.
70
+- [ ] Include source/device/metadata variation.
71
+- [ ] Include relationship fixtures.
72
+- [ ] Add memory/performance measurement for large diff/export operations.
73
+
74
+Acceptance:
75
+- [ ] Tests can create a large synthetic archive without real health data.
76
+- [ ] Large diff test does not require loading all records into Swift arrays.
77
+- [ ] Export test streams/pages output.
78
+- [ ] Fixtures contain no personal, device, location, or real health data.
79
+
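A minimal Swift sketch of the synthetic generator. It assumes a hypothetical `SyntheticSample` shape (not the schema v2 columns) and uses SplitMix64 as the seeded PRNG. Fixed seeds keep fixtures reproducible, every identifier and source label is synthetic, and samples are produced lazily, so high-volume sets need not be held in one Swift array.

```swift
/// SplitMix64: a small deterministic PRNG, so the same seed yields the same fixture.
struct SplitMix64: RandomNumberGenerator {
    private var state: UInt64
    init(seed: UInt64) { state = seed }
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

/// Hypothetical synthetic sample; field names are illustrative, not archive columns.
struct SyntheticSample: Equatable {
    let uuid: String            // synthetic identifier, never a real HealthKit UUID
    let typeIdentifier: String
    let start: Int64            // seconds since a fixed synthetic epoch
    let value: Double
    let sourceName: String      // placeholder source label
}

/// Hypothetical volume tiers for the low/medium/high sample sets.
enum Volume: Int { case low = 100, medium = 10_000, high = 1_000_000 }

/// Yields samples one at a time; each iteration restarts from the same seed.
func syntheticSamples(type: String, volume: Volume, seed: UInt64) -> AnySequence<SyntheticSample> {
    AnySequence { () -> AnyIterator<SyntheticSample> in
        var rng = SplitMix64(seed: seed)
        var index = 0
        let sources = ["synthetic-phone", "synthetic-watch"]
        return AnyIterator {
            guard index < volume.rawValue else { return nil }
            defer { index += 1 }
            let jitter = Int64(rng.next() % 30)   // < 60 s, so start times stay increasing
            return SyntheticSample(
                uuid: "SYN-\(type)-\(index)",
                typeIdentifier: type,
                start: Int64(index) * 60 + jitter,
                value: Double(rng.next() % 200) + 40,
                sourceName: sources[Int(rng.next() % UInt64(sources.count))]
            )
        }
    }
}
```

Scenario variants such as disappearance or consolidation-like thinning could then be derived by filtering or rewriting the same seeded stream.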
80
+## Milestone 3 - SQLite Archive V2 Schema
81
+
82
+**Purpose:** Create the new archive foundation.
83
+
84
+Checklist:
85
+- [x] Implement `schema_migrations`.
86
+- [x] Implement `archive_metadata`.
87
+- [x] Implement `device_chains`.
88
+- [x] Implement `observations`.
89
+- [x] Implement `sample_types`.
90
+- [x] Implement `observation_type_runs`.
91
+- [x] Implement `sources`.
92
+- [x] Implement `source_revisions`.
93
+- [x] Implement `hk_devices`.
94
+- [x] Implement `metadata_blobs`.
95
+- [x] Implement `samples`.
96
+- [x] Implement `sample_versions`.
97
+- [x] Implement `sample_observation_events`.
98
+- [x] Implement `sample_visibility_ranges`.
99
+- [x] Implement `sample_relationships`.
100
+- [x] Implement `observation_type_summaries`.
101
+- [x] Implement `daily_type_aggregates`.
102
+- [x] Implement `export_manifests`.
103
+- [x] Implement `export_items`.
104
+- [x] Add required indexes.
105
+- [ ] Add SQLite integrity/open/schema-version tests.
106
+
107
+Acceptance:
108
+- [ ] Fresh archive initializes successfully.
109
+- [x] Schema version is recorded.
110
+- [x] Archive v2 can initialize after old prototype stores are removed or ignored.
111
+- [ ] `PRAGMA integrity_check` passes.
112
+- [x] Required indexes exist.
113
+- [ ] Empty archive queries return valid empty results.
114
+
115
+## Milestone 4 - Differential Write Path
116
+
117
+**Purpose:** Write observations without storing full recurring snapshots.
118
+
119
+Checklist:
120
+- [ ] Create observation transaction wrapper.
121
+- [ ] Upsert sample types.
122
+- [ ] Upsert source/source revision/device/metadata rows.
123
+- [ ] Upsert sample identity.
124
+- [ ] Upsert sample payload version only when payload changes.
125
+- [ ] Insert appeared/verified/representationChanged events.
126
+- [ ] Record `HKDeletedObject` evidence by UUID hash.
127
+- [ ] Close visibility ranges for disappeared/deleted samples.
128
+- [ ] Maintain open visibility ranges for visible samples.
129
+- [ ] Rebuild/update affected aggregates after capture.
130
+- [ ] Commit the SQLite transaction before any Core Data cache work.
131
+- [ ] Make repeated capture page writes idempotent.
132
+
133
+Acceptance:
134
+- [ ] Initial import stores identities and versions once.
135
+- [ ] Re-running same page does not duplicate records.
136
+- [ ] Representation change creates a new version, not a new logical sample.
137
+- [ ] Disappearance closes visibility range.
138
+- [ ] No full observation copy table is written.
139
+
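The write-path rules can be sketched in memory before touching SQLite. This is an illustration only: `ArchiveSketch`, `EventKey`, and the string payload hashes are hypothetical stand-ins for `samples`, `sample_versions`, `sample_observation_events`, and `sample_visibility_ranges`. It stores a new version only when the payload hash changes, records at most one event per sample per observation (so re-running a page is a no-op), and closes the visibility range on disappearance.

```swift
/// Event kinds named in the roadmap checklist.
enum EventKind: Equatable { case appeared, verified, representationChanged, disappeared }

/// One event per (observation, sample); hypothetical key, not the real table's columns.
struct EventKey: Hashable {
    let observation: Int
    let sampleKey: String
}

/// In-memory stand-in for the archive tables, used only to illustrate the write rules.
final class ArchiveSketch {
    private(set) var versions: [String: [String]] = [:]   // sample -> payload hashes, oldest first
    private(set) var events: [EventKey: EventKind] = [:]
    private(set) var openRanges: [String: Int] = [:]       // sample -> observation that opened its range
    private(set) var closedRanges: [(key: String, opened: Int, closed: Int)] = []

    /// Applies one captured page; each item pairs a sample key with its payload hash.
    func applyPage(observation: Int, items: [(key: String, payloadHash: String)]) {
        for item in items {
            let eventKey = EventKey(observation: observation, sampleKey: item.key)
            // Idempotency: a sample gets at most one event per observation.
            guard events[eventKey] == nil else { continue }

            // Store a new payload version only when the hash differs from the latest one.
            var history = versions[item.key, default: []]
            let payloadChanged = history.last != item.payloadHash
            if payloadChanged { history.append(item.payloadHash) }
            versions[item.key] = history

            if openRanges[item.key] == nil {
                openRanges[item.key] = observation   // first sighting or reappearance
                events[eventKey] = .appeared
            } else {
                events[eventKey] = payloadChanged ? .representationChanged : .verified
            }
        }
    }

    /// Closes the open range for a sample that disappeared or was reported deleted.
    func markDisappeared(observation: Int, key: String) {
        let eventKey = EventKey(observation: observation, sampleKey: key)
        guard events[eventKey] == nil, let opened = openRanges.removeValue(forKey: key) else { return }
        closedRanges.append((key: key, opened: opened, closed: observation))
        events[eventKey] = .disappeared
    }
}
```

A representation change appends a version to the same logical sample instead of creating a new one, which matches the acceptance items above.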
140
+## Milestone 5 - SQL Analysis Layer
141
+
142
+**Purpose:** Make the archive useful without RAM-heavy processing.
143
+
144
+Checklist:
145
+- [ ] Implement point-in-time visible-record query.
146
+- [ ] Implement paged record table query.
147
+- [ ] Implement appeared query between observations.
148
+- [ ] Implement disappeared query between observations.
149
+- [ ] Implement representationChanged query between observations.
150
+- [ ] Implement diff counts using temp tables or equivalent SQL-first strategy.
151
+- [ ] Implement aggregate comparison query.
152
+- [ ] Implement consolidation-likely evidence query.
153
+- [ ] Implement source/provenance breakdown query.
154
+- [ ] Add query timing/memory tests on synthetic large datasets.
155
+
156
+Acceptance:
157
+- [ ] Observation T can be reconstructed from ranges/events.
158
+- [ ] Large diff returns counts and first page without loading all rows.
159
+- [ ] Query results are deterministic and ordered.
160
+- [ ] Consolidation evidence includes count, aggregate, coverage, density, and uncertainty data.
161
+
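Point-in-time reconstruction reduces to a half-open interval test on visibility ranges. The sketch assumes ranges are keyed by integer observation sequence numbers and that `closed` is the first observation at which the sample was no longer visible; the SQL string shows the same predicate with assumed column names, not the schema v2 definitions. The Swift filtering exists only to make the predicate testable, since the roadmap calls for the real diff to run SQL-first.

```swift
/// Hypothetical range row; `closed == nil` means the range is still open.
struct VisibilityRange {
    let sampleKey: String
    let opened: Int   // observation that opened the range
    let closed: Int?  // first observation at which the sample was no longer visible
}

/// Half-open test: visible at T when opened <= T < closed.
func isVisible(_ r: VisibilityRange, at t: Int) -> Bool {
    r.opened <= t && (r.closed.map { t < $0 } ?? true)
}

/// Illustrative SQL shape of the same predicate (column names are assumptions).
let visibleAtSQL = """
    SELECT sample_id FROM sample_visibility_ranges
    WHERE opened_observation_seq <= :t
      AND (closed_observation_seq IS NULL OR closed_observation_seq > :t)
    ORDER BY sample_id
    """

/// Sorted so that results are deterministic and pageable.
func visibleKeys(_ ranges: [VisibilityRange], at t: Int) -> [String] {
    Set(ranges.filter { isVisible($0, at: t) }.map(\.sampleKey)).sorted()
}

/// Net appeared/disappeared sets between observations a and b.
func diff(_ ranges: [VisibilityRange], from a: Int, to b: Int) -> (appeared: [String], disappeared: [String]) {
    let before = Set(visibleKeys(ranges, at: a))
    let after = Set(visibleKeys(ranges, at: b))
    return (after.subtracting(before).sorted(), before.subtracting(after).sorted())
}
```

Comparing visibility at two observations reports net changes only; a sample that appeared and then disappeared between them would need the events table to surface.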
162
+## Milestone 6 - Core Data UI/Report Cache
163
+
164
+**Purpose:** Cache expensive presentation/report values while keeping SQLite authoritative.
165
+
166
+Checklist:
167
+- [ ] Define Core Data model for observation rows.
168
+- [ ] Define type summary cache entity.
169
+- [ ] Define daily/monthly aggregate cache entity.
170
+- [ ] Define diff summary cache entity.
171
+- [ ] Define export manifest/status cache entity.
172
+- [ ] Define archive health/status cache entity.
173
+- [ ] Implement cache rebuild from SQLite.
174
+- [ ] Implement cache invalidation keyed on archive schema version, cache schema version, and archive version/hash.
175
+- [ ] Implement delete-cache-and-rebuild flow.
176
+- [ ] Add cache schema/version and rebuild tests.
177
+
178
+Acceptance:
179
+- [ ] Deleting Core Data cache does not lose forensic data.
180
+- [ ] Cache rebuild restores dashboard/timeline/report summaries.
181
+- [ ] Cache rows include source observation ids and archive/cache schema versions.
182
+- [ ] SQLite wins on disagreement.
183
+
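A sketch of the invalidation rule, assuming a hypothetical `CacheStamp` that carries the fields listed in the checklist (archive schema version, cache schema version, source observation id, archive hash). A missing or mismatched stamp triggers a rebuild from the authoritative archive, which is one way to honor "SQLite wins on disagreement"; the `rebuild` closure stands in for the real SQLite query.

```swift
/// Hypothetical stamp recorded with the cache; fields mirror the checklist items.
struct CacheStamp: Equatable {
    let archiveSchemaVersion: Int
    let cacheSchemaVersion: Int
    let sourceObservationID: Int64  // latest SQLite observation the cache was built from
    let archiveHash: String         // placeholder for whatever hash the archive records
}

/// Derived per-type counts; discarding this object loses nothing authoritative.
final class SummaryCache {
    private(set) var stamp: CacheStamp?
    private(set) var counts: [String: Int] = [:]
    private(set) var rebuildCount = 0

    /// Returns cached counts, rebuilding from the archive when the stamp is missing or differs.
    func summaries(for archiveStamp: CacheStamp, rebuild: () -> [String: Int]) -> [String: Int] {
        if stamp != archiveStamp {  // any disagreement: SQLite wins
            counts = rebuild()
            stamp = archiveStamp
            rebuildCount += 1
        }
        return counts
    }

    /// Models the delete-cache-and-rebuild flow.
    func deleteCache() {
        stamp = nil
        counts = [:]
    }
}
```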
184
+## Milestone 7 - Export Layer
185
+
186
+**Purpose:** Produce scoped, recovery-compatible exports.
187
+
188
+Checklist:
189
+- [ ] Define JSON export envelope.
190
+- [ ] Define CSV record-table export.
191
+- [ ] Implement the manifest canonicalization and hash algorithm locked in Milestone 1.
192
+- [ ] Include archive/app/schema/observation metadata.
193
+- [ ] Include sample identity and payload version hashes.
194
+- [ ] Include values/dates/units/type fields.
195
+- [ ] Include source/provenance metadata where available and allowed.
196
+- [ ] Include relationships where available.
197
+- [ ] Include provenance-loss warning for external HealthKit re-publication.
198
+- [ ] Stream/page export from SQLite.
199
+- [ ] Store export manifest rows.
200
+- [ ] Add reproducibility test for export manifests.
201
+
202
+Acceptance:
203
+- [ ] Large export does not materialize full record set in RAM.
204
+- [ ] Export can be verified against archive hashes.
205
+- [ ] Export contains enough structure for external recovery/salvage tooling.
206
+- [ ] App still does not perform restore, backup patching, or HealthKit re-publication.
207
+
208
+## Milestone 8 - UI/Data Flow Migration
209
+
210
+**Purpose:** Move UI from prototype storage to target cache/query flow.
211
+
212
+Checklist:
213
+- [ ] Replace direct SwiftData `@Query` dependencies for target screens.
214
+- [ ] Dashboard reads Core Data cache.
215
+- [ ] Observation timeline reads Core Data cache.
216
+- [ ] Observation detail uses cached summaries plus paged SQLite DTOs.
217
+- [ ] Diff detail uses cached summary plus paged SQLite DTOs.
218
+- [ ] Data type screens use target change labels.
219
+- [ ] Export preview uses export query/manifest APIs.
220
+- [ ] Archive status reflects SQLite/Core Data cache health.
221
+- [ ] Legacy/small-device UI mode simplifies heavy visualizations.
222
+
223
+Acceptance:
224
+- [ ] Core Time Machine flows work without SwiftData as target persistence.
225
+- [ ] UI copy uses observation/diff/export language.
226
+- [ ] No critical data-loss message relies on sample counts alone.
227
+- [ ] Large record tables are paged.
228
+- [ ] Legacy mode preserves capture/report/export.
229
+
230
+## Milestone 9 - Legacy SwiftData Retirement
231
+
232
+**Purpose:** Remove prototype persistence from the target architecture.
233
+
234
+Checklist:
235
+- [ ] Identify all remaining SwiftData imports.
236
+- [ ] Replace SwiftData models used by active flows.
237
+- [ ] Remove or disable `ModelContainer` so target builds no longer require it.
238
+- [ ] Add prototype-store ignore/delete/reset path for test installs.
239
+- [ ] Verify no old-store compatibility layer remains in active flows.
240
+- [ ] Lower deployment target as far as dependencies allow.
241
+- [ ] Verify build for iOS 15-era target constraints.
242
+
243
+Acceptance:
244
+- [ ] SwiftData is not required for normal app launch.
245
+- [ ] Active flows use SQLite + Core Data cache.
246
+- [ ] Prototype data handling is explicit: old stores are ignored/deleted/reset for test installs.
247
+
248
+## Milestone 10 - Acceptance Gate
249
+
250
+**Purpose:** Decide whether the refactor is complete enough to build product features on top.
251
+
252
+Checklist:
253
+- [ ] Point-in-time reconstruction works.
254
+- [ ] Large diff works SQL-first.
255
+- [ ] Materialized aggregates can be rebuilt and verified.
256
+- [ ] Core Data cache can be deleted and rebuilt.
257
+- [ ] Large export streams/pages.
258
+- [ ] Recovery-compatible manifest is present.
259
+- [ ] SQLite integrity checks pass.
260
+- [ ] Low-memory synthetic tests pass.
261
+- [ ] UI no longer depends on SwiftData as foundation.
262
+- [ ] Docs match implementation.
263
+
264
+Acceptance:
265
+- [ ] Product can safely proceed to UI polish and higher-level workflows.
266
+- [ ] Database is no longer the main unresolved architectural risk.
267
+
268
+## Parallelization Guide
269
+
270
+Can run in parallel after Milestone 1:
271
+- synthetic data harness;
272
+- schema implementation;
273
+- Core Data cache model drafting;
274
+- export format drafting;
275
+- UI DTO contract design.
276
+
277
+Must not run before dependencies:
278
+- UI migration before SQL query layer and Core Data cache exist;
279
+- export implementation before manifest design is locked;
280
+- legacy SwiftData removal before replacement flows exist;
281
+- archive v2 initialization before reset/reinitialization policy is documented.
282
+
283
+## Agent Assignment Hints
284
+
285
+| Workstream | Primary Doc |
286
+|------------|-------------|
287
+| SQLite schema/write path/query layer | [`../02-architecture/Database-Design.md`](../02-architecture/Database-Design.md) |
288
+| HealthKit capture integration | [`../02-architecture/Implementation-Guide.md`](../02-architecture/Implementation-Guide.md) |
289
+| Core Data cache | [`../02-architecture/Core-Data-Cache-Design.md`](../02-architecture/Core-Data-Cache-Design.md) |
290
+| Export formats/manifests | [`../02-architecture/Export-Specification.md`](../02-architecture/Export-Specification.md) |
291
+| UI migration | [`../00-agent-guides/CLAUDE.md`](../00-agent-guides/CLAUDE.md) |
292
+| Product language/non-goals | [`../01-product/MVP-Specification.md`](../01-product/MVP-Specification.md) |
293
+| Status updates | [`IMPLEMENTATION_STATUS.md`](IMPLEMENTATION_STATUS.md) |
+12 -1
DATA_TYPE_VIEWS_OPTIMIZATION.md → HealthProbe/Doc/99-archive/DATA_TYPE_VIEWS_OPTIMIZATION.md
@@ -1,4 +1,15 @@
1
-# Data Type Views Optimization – Visual Guide
1
+# Archived Note - Data Type Views Optimization Visual Guide
2
+
3
+**Status:** Historical implementation note. Do not treat this file as current product scope or UI requirements.
4
+
5
+For active UI direction, read:
6
+- [`../README.md`](../README.md)
7
+- [`../00-agent-guides/CLAUDE.md`](../00-agent-guides/CLAUDE.md)
8
+- [`../01-product/MVP-Specification.md`](../01-product/MVP-Specification.md)
9
+
10
+---
11
+
12
+# Data Type Views Optimization - Visual Guide
2 13
 
3 14
 ## Overview
4 15
 
+12 -1
REFACTORING_DATA_TYPE_VIEWS.md → HealthProbe/Doc/99-archive/REFACTORING_DATA_TYPE_VIEWS.md
@@ -1,4 +1,15 @@
1
-# Data Type Views Optimization – Visual Redesign
1
+# Archived Note - Data Type Views Optimization Visual Redesign
2
+
3
+**Status:** Historical implementation note. Do not treat this file as current product scope or UI requirements.
4
+
5
+For active UI direction, read:
6
+- [`../README.md`](../README.md)
7
+- [`../00-agent-guides/CLAUDE.md`](../00-agent-guides/CLAUDE.md)
8
+- [`../01-product/MVP-Specification.md`](../01-product/MVP-Specification.md)
9
+
10
+---
11
+
12
+# Data Type Views Optimization - Visual Redesign
2 13
 
3 14
 ## Summary
4 15
 
+0 -400
HealthProbe/Doc/Forensics & Limitations.md
@@ -1,409 +0,0 @@
1
-# HealthProbe – Risks, Limitations & Forensic Capabilities
2
-
3
-
4
-## 1. Known Limitations
5
-
6
-### 1.1 HealthKit Framework Constraints
7
-
8
-**What HealthProbe Cannot Detect:**
9
-
10
-| Gap | Why | Mitigation |
11
-|-----|-----|-----------|
12
-| **Modifications without deletion** | HealthKit has no "modified" event, only "added" and "deleted" | Use snapshot comparison to detect value changes |
13
-| **Lost deletions** | If deletion notification arrives while app backgrounded, we may miss it | Monitor both anchored queries AND deleted objects |
14
-| **Timing precision** | Anchored queries may batch multiple changes, lose granular timestamps | Store both sample timestamp AND observation timestamp |
15
-| **Private HealthKit types** | Some data types not accessible to third-party apps | Accept data available only to Health.app |
16
-| **Cross-device sync delays** | Watch-to-phone-to-cloud can take 24+ hours | Extend observation window, don't flag immediate after device sync |
17
-| **Consolidation / downsampling** | Apple Health/iCloud can rewrite high-frequency historical samples in-place (count decreases; values/intervals change) | Store fingerprints + optionally archive raw samples for selected types (local-only forensic backup mode) |
18
-
19
-### 1.2 iOS Background Mode Limitations
20
-
21
-**Background Fetch Reality:**
22
-- iOS may delay or skip background fetch requests (battery, network state, user activity)
23
-- No guarantee that HealthProbe will run at specified interval
24
-- User can disable background refresh in Settings → HealthProbe
25
-- System may suspend app if device is low on storage
26
-
27
-**Impact:** Anomalies may not be detected for 24-72 hours after occurrence
28
-
29
-**Mitigation:**
30
-- Encourage users to open app regularly (at least weekly)
31
-- Provide manual "Check Now" button
32
-- Use `HKObserverQuery` for real-time detection when app is running
33
-
34
-### 1.3 Data Retention Constraints
35
-
36
-**HealthKit Sample Retention:**
37
-- HealthKit automatically deletes some transient data (e.g., minute-level HR after 90 days)
38
-- User can manually delete samples (HealthProbe cannot prevent this)
39
-- Backups may not restore all data (lossy compression, sync state)
40
-
41
-**HealthProbe Impact:**
42
-- Cannot reconstruct data that was never observed
43
-- Snapshot from 6 months ago may have samples no longer in HealthKit
44
-- Gap detection assumes continuous observation (may be false positive if app uninstalled then reinstalled)
45
-- If Apple consolidates history, **counts alone can be misleading** (a month can “lose” samples but keep the same totals via aggregation); value-level forensics are required for proof
46
-
47
-
48
-## 2. Risk Assessment
49
-
50
-### 2.1 Privacy Risks (Mitigation: Excellent ✅)
51
-
52
-| Risk | Impact | Mitigation |
53
-|------|--------|-----------|
54
-| **Raw health data exfiltration** | CRITICAL: user's personal health history exposed | ✅ Local-only storage, never sends raw samples |
55
-| **Device fingerprinting** | HIGH: tracking user across services | ✅ Salted hash of device ID, stored locally only |
56
-| **Timing attacks** (inferring behavior) | MEDIUM: archive/check patterns reveal habits | ✅ No automatic cloud sync; reports are explicit local exports |
57
-| **App crashes leaking data** | LOW: crash logs may contain HealthKit info | ✅ All logging is aggregated (counts, not values) |
58
-
59
-### 2.2 Data Integrity Risks (Mitigation: Good ✅)
60
-
61
-| Risk | Impact | Mitigation |
62
-|------|--------|-----------|
63
-| **Snapshot corruption** | HIGH: audit trail becomes unreliable | ✅ Use MD5 checksum of snapshots, detect corruption |
64
-| **Lost audit trail on uninstall** | HIGH: forensic data disappears | ⚠️ PARTIAL: Encourage export before uninstall; future: iCloud backup option |
65
-| **Silent rewrites (no deletion events)** | HIGH: history can change without HKDeletedObject evidence | ✅ Detect via fingerprint diffs; **Forensic Backup Mode** can preserve per-sample evidence locally for selected types |
66
-| **Clock skew** (device time wrong) | MEDIUM: timestamps inaccurate, anomaly detection confused | ⚠️ Log both device time + time since boot, detect skew |
67
-| **Concurrent modification** (app + Health.app) | LOW: race conditions during query | ✅ Anchored queries are atomic |
68
-
69
-### 2.3 Security Risks (Mitigation: Good ✅)
70
-
71
-| Risk | Impact | Mitigation |
72
-|------|--------|-----------|
73
-| **Malicious apps accessing HealthKit** | MEDIUM: third-party apps can read our data | ✅ iOS sandboxing; ask user for HealthKit permission per app |
74
-| **Jailbroken device** | CRITICAL: all bets off | ⚠️ Not defendable; document assumption: standard iOS device |
75
-| **iCloud account compromise** | MEDIUM: Apple Health/iCloud state may still affect source data | ✅ HealthProbe archive remains local; no HealthProbe CloudKit sync |
76
-| **Local device theft** | MEDIUM: thief can see audit trail | ✅ Data encrypted by iOS, requires device unlock |
77
-
78
-
79
-## 3. Forensic Capabilities
80
-
81
-### 3.1 Questions HealthProbe Can Answer
82
-
83
-**Q1: "Was my data lost?"**
84
-```
85
-Answer method:
86
-  1. Load all snapshots for "Steps" type
87
-  2. Create timeline: date → sample count
88
-  3. Detect gap where count drops significantly
89
-  4. Report: when, how much, which source
90
-  
91
-Example output:
92
-  ✅ Yes — On 2026-03-15, step data dropped from 8,234 to 2,100
93
-     Loss: 6,134 samples (74.3%)
94
-     Source: "iPhone Health App"
95
-     Severity: CRITICAL
96
-```
97
-
98
-**Q2: "Why did my data diverge?"**
99
-```
100
-Answer method:
101
-  1. Load historical aggregates (daily sums)
102
-  2. Fit trend line across 30/60/90 day periods
103
-  3. Calculate deviation from baseline
104
-  4. Correlate with sync state changes & OS updates
105
-  
106
-Example output:
107
-  📊 Step count trending down 15% per month
108
-     Baseline (2025): avg 9,200 steps/day (σ = 1,200)
109
-     Recent (2026): avg 7,800 steps/day (σ = 1,800)
110
-     Correlation: Matches iPhone → Apple Watch priority shift
111
-```
112
-
113
-**Q3: "When did this data first appear?"**
114
-```
115
-Answer method:
116
-  1. Search anomaly trail for "historical_insertion"
117
-  2. Find sample in audit trail with matching ID
118
-  3. Report exact timestamp (±1 sync cycle)
119
-  
120
-Example output:
121
-  🔍 Workout "Morning Run" (2025-01-15)
122
-     First observed: 2026-05-01 at 14:35:22 UTC
123
-     Age: 471 days
124
-     Context: iCloud sync completed 2 minutes prior
125
-```
126
-
127
-**Q4: "Is my device syncing correctly?"**
128
-```
129
-Answer method:
130
-  1. Load sync state changes
131
-  2. Check for state = "icloud_sync_active" frequency
132
-  3. Measure time between sync completions
133
-  4. Compare to baseline (typical: every 2-4 hours)
134
-  
135
-Example output:
136
-  📡 Sync frequency: ABNORMAL
137
-     Expected: sync every 2-4 hours
138
-     Observed: Last sync 6 days ago
139
-     Status: ⚠️ iCloud sync may be stuck
140
-```
141
-
142
-**Q5: "Which devices are contributing data?"**
143
-```
144
-Answer method:
145
-  1. Analyze source distribution in snapshots
146
-  2. Track which source each sample came from
147
-  3. Report composition over time
148
-  
149
-Example output:
150
-  📱 Data source composition:
151
-     iPhone Health: 45% (4,532 samples)
152
-     Apple Watch: 50% (5,041 samples)
153
-     Manual entry: 5% (504 samples)
154
-  
155
-  Trend: Watch contribution increased from 30% (Jan) to 50% (Now)
156
-```
157
-
158
-### 3.2 Export Formats for Analysis
159
-
160
-**Format 1: JSON Forensic Report**
161
-```json
162
-{
163
-  "export_date": "2026-05-01T14:35:22Z",
164
-  "device": "iPhone 15 Pro",
165
-  "ios_version": "18.4.1",
166
-  "app_version": "1.0.0",
167
-  "observation_period": {
168
-    "start": "2026-01-01T00:00:00Z",
169
-    "end": "2026-05-01T14:35:22Z",
170
-    "days": 120
171
-  },
172
-  "anomalies_summary": {
173
-    "total": 12,
174
-    "critical": 2,
175
-    "warning": 3,
176
-    "info": 7,
177
-    "by_type": {
178
-      "historical_insertion": 5,
179
-      "silent_deletion": 2,
180
-      "duplicate": 3,
181
-      "divergence": 2
182
-    }
183
-  },
184
-  "anomalies": [
185
-    {
186
-      "id": "ANML_20260501_001",
187
-      "type": "silent_deletion",
188
-      "severity": "critical",
189
-      "timestamp": "2026-04-15T08:30:00Z",
190
-      "description": "72 step samples lost without deletion notification",
191
-      "evidence": {
192
-        "sample_type": "Steps",
193
-        "loss_count": 72,
194
-        "loss_percent": 23.4,
195
-        "affected_dates": ["2026-04-13", "2026-04-14", "2026-04-15"]
196
-      }
197
-    }
198
-  ],
199
-  "snapshots": [
200
-    {
201
-      "timestamp": "2026-05-01T14:35:00Z",
202
-      "type": "Steps",
203
-      "count": 305_432,
204
-      "sources": {
205
-        "iPhone Health": 152_716,
206
-        "Apple Watch": 152_716
207
-      }
208
-    }
209
-  ]
210
-}
211
-```
212
-
213
-**Format 2: CSV Timeline (for Excel/Sheets)**
214
-```csv
215
-Date,Time,Event,Type,Severity,Description,Details
216
-2026-01-01,08:15,Initial Snapshot,snapshot,info,Baseline established,305432 steps
217
-2026-02-15,14:20,Historical Insertion,anomaly,medium,Data appeared retroactively,Workout from 2025-01-15
218
-2026-03-15,09:00,Silent Deletion,anomaly,critical,Data gap detected,72 steps lost
219
-2026-04-20,16:45,Sync State Change,sync,info,iCloud sync completed,Samples added: 87
220
-```
221
-
222
-**Format 3: Markdown Report (for Bug Submissions)**
223
-```markdown
224
-# Apple Health Data Integrity Issue Report
225
-
226
-## Timeline
227
-- **Start observation:** 2026-01-01
228
-- **Last check:** 2026-05-01
229
-- **Total observations:** 120 days
230
-
231
-## Issue Summary
232
-3 critical anomalies detected involving 280+ data samples and 15 days of missing data.
233
-
234
-### Critical Finding #1: Silent Deletion (2026-03-15)
235
-- **Type:** 72 step samples disappeared without notification
236
-- **Date affected:** 2026-04-13 to 2026-04-15
237
-- **Detection:** Snapshot comparison (305,432 → 305,360)
238
-- **Severity:** Critical
239
-- **Context:** Occurred 3 days after iOS 18.4 update
240
-
241
-### Critical Finding #2: Historical Insertion (2026-02-15)
242
-- **Type:** Workout appears 471 days after original date
243
-- **Sample:** "Morning Run" from 2025-01-15
244
-- **First observed:** 2026-02-15 14:20 UTC
245
-- **Context:** 2 minutes after iCloud sync completed
246
-- **Severity:** Medium (likely restore from backup)
247
-
248
-## Recommendations
249
-1. Verify data integrity on other devices
250
-2. Compare with iCloud.com Health export
251
-3. Review iOS 18.4 release notes for HealthKit changes
252
-4. Check if backup restore was interrupted
253
-```
254
-
255
-### 3.3 Forensic Techniques Enabled by HealthProbe
256
-
257
-**Technique 1: Timeline Reconstruction**
258
-```
259
-Given: Snapshots at [T0, T1, T2, ...]
260
-Compute: Δ_count = snapshot[T_i] - snapshot[T_i-1]
261
-Result: Visual timeline of when data appeared/disappeared
262
-Use: Correlate with sync events, OS updates, app launches
263
-```
264
-
265
-**Technique 2: Source Attribution**
266
-```
267
-Given: Source field in each snapshot
268
-Track: iPhone vs. Watch vs. Manual contributions
269
-Result: Identify which device is unreliable
270
-Use: Isolate whether issue is device, OS, or iCloud
271
-```
272
-
273
-**Technique 3: Anomaly Clustering**
274
-```
275
-Given: All anomalies with timestamps
276
-Cluster: Group nearby anomalies (e.g., within 24 hours)
277
-Result: Pattern detection — is this systemic or isolated?
278
-Use: Determine if it's device-specific or iOS version issue
279
-```
280
-
281
-**Technique 4: Cross-Device Correlation** (Future: macOS)
282
-```
283
-Given: Multiple device's HealthProbe exports
284
-Compare: Are anomalies synchronized across devices?
285
-Result: Distinguish local bug from iCloud sync issue
286
-Use: Report "this affects all devices" vs. "only this device"
287
-```
288
-
289
-
290
-## 4. Comparison: HealthProbe vs. Alternatives
291
-
292
-| Feature | HealthProbe | Health.app | Third-party apps |
293
-|---------|------------|-----------|-----------------|
294
-| **Real-time monitoring** | ✅ Yes | ❌ No | ⚠️ Partial |
295
-| **Audit trail** | ✅ Yes | ❌ No | ❌ No |
296
-| **Detects data loss** | ✅ Yes | ❌ No (silent) | ❌ No |
297
-| **Privacy (no exfiltration)** | ✅ Yes | ✅ Yes | ❌ Often sells data |
298
-| **Local-only** | ✅ Yes | ✅ Yes | ❌ Often cloud-based |
299
-| **Open source** | 🔄 Future | ❌ No | ⚠️ Some are |
300
-| **Forensic export** | ✅ Yes | ❌ No | ⚠️ Limited |
301
-
302
-
303
-## 5. Recommended Usage Patterns
304
-
305
-### 5.1 For Individual Users (Personal Monitoring)
306
-
307
-```
308
-Baseline:
309
-  1. Install HealthProbe
310
-  2. Let it run for 30 days to establish baseline
311
-  3. Export first "clean" snapshot
312
-  
313
-Ongoing:
314
-  1. Check app weekly (or enable background notifications)
315
-  2. If anomaly alert → Screenshot it
316
-  3. If critical → Export full report immediately
317
-  4. Keep exported reports in Notes/Files app as backup
318
-  
319
-Post-incident:
320
-  1. Export complete forensic report
321
-  2. Attach to Apple Feedback Assistant ticket
322
-  3. Include link to DearApple issue #001
323
-```
324
-
325
-### 5.2 For Researchers (Data Collection)
326
-
327
-```
328
-Setup:
329
-  1. Export anonymized anomaly summaries manually
330
-  2. Keep raw archive local
331
-  
332
-Analysis:
333
-  1. Correlate own data loss with iOS release dates
334
-  2. Compare patterns with other HealthProbe users
335
-  3. Contribute findings to DearApple repository
336
-```
337
-
338
-### 5.3 For Apple Support / Developers
339
-
340
-```
341
-When submitting feedback:
342
-  1. Include HealthProbe forensic export
343
-  2. Specify device model, iOS version, exact reproduction steps
344
-  3. Include timeline showing when anomalies appeared
345
-  4. Mention if pattern repeats across multiple devices
346
-```
347
-
348
-
349
-## 6. Future Enhancements (Post-MVP)
350
-
351
-### 6.1 Machine Learning Anomaly Scoring
352
-```
353
-Current: Binary detection (anomaly or not)
354
-Future: Confidence scoring (0-100%)
355
-  - Low risk: Temporary duplicates, minor drifts
356
-  - High risk: Permanent loss, systematic divergence
357
-  - Enable severity-based alerting
358
-```
359
-
360
-### 6.2 Community Pattern Database
361
-```
362
-Current: Single-device observation
363
-Future: Anonymized multi-device dataset
364
-  - iOS 18.4 affected 23% of users
365
-  - "Morning Run" workout loses 15% post-sync (systematic)
366
-  - Identify if issue is iOS, device model, or iCloud region specific
367
-```
368
-
369
-### 6.3 Predictive Detection
370
-```
371
-Current: Detect after anomaly occurs
372
-Future: Alert before data is lost
373
-  - Watch for sync stall patterns
374
-  - Pre-loss indicators (e.g., rapid duplicates → deletion)
375
-```
376
-
377
-
378
-## 7. Troubleshooting HealthProbe Itself
379
-
380
-### Common Issues
381
-
382
-| Issue | Cause | Fix |
383
-|-------|-------|-----|
384
-| **No anomalies detected for weeks** | Background fetch disabled | Settings → HealthProbe → Background Refresh |
385
-| **Snapshots not being saved** | Insufficient storage or archive write failure | Free up space; verify local archive health and SwiftData cache rebuild |
386
-| **Sync state not updating** | iCloud token check failing | Sign out/in to iCloud; restart device |
387
-| **Old audit trail entries missing** | SwiftData cache/log retention policy or migration issue | Rebuild derived views from archive where possible; export reports before uninstall |
388
-
389
-
390
-## 8. References
391
-
392
-- **iOS HealthKit Framework:** https://developer.apple.com/documentation/healthkit/
393
-- **HKAnchoredObjectQuery:** https://developer.apple.com/documentation/healthkit/hkanchoredrobjectquery
394
-- **SwiftData Persistence:** https://developer.apple.com/documentation/swiftdata
395
-- **DearApple Issue #001:** Apple Health mass data loss investigation
396
-- **Apple Privacy:** https://www.apple.com/privacy/
397
-
398
-
399
-*HealthProbe — Forensics for Your Health Data*  
400
-*Document v1.0 — 2026-05-01*
+0 -72
HealthProbe/Doc/HealthProbe iOS – Specification (MVP).md
@@ -1,75 +0,0 @@
1
-# HealthProbe iOS – Specification (MVP)
2
-
3
-## Overview
4
-HealthProbe is an iOS application designed to **audit and monitor the integrity of HealthKit data**.  
5
-It detects anomalies such as:
6
-- unexpected historical insertions
7
-- silent deletions of past data
8
-- duplicate records
9
-- divergence trends over time
10
-
11
-The application operates as a **local audit and capture agent**. It does not sync HealthProbe data via CloudKit/iCloud; HealthKit databases can evolve differently per device, so the MVP keeps each device archive local and explicit.
12
-
13
-⚠️ This document describes ONLY the iOS application (MVP phase).  
14
-A future macOS application will act as a visualization/analysis layer.
15
-
16
-
17
-## Core Principles
18
-
19
-1. **Read-only with respect to HealthKit**
20
-   - Never modify or delete HealthKit data
21
-   - Only observe and audit
22
-
23
-2. **Local-first architecture**
24
-   - All detection must work without network access
25
-
26
-3. **Incremental observation**
27
-   - Use anchored queries to track changes
28
-
29
-4. **No app cloud sync**
30
-   - HealthProbe does not sync raw samples, digests, or reports through CloudKit/iCloud
31
-
32
-5. **Robust local archive**
33
-   - Store captured HealthKit data in one local archive store, not per-data-type silos
34
-   - SwiftData is used for derived UI data, settings, logs, and history only
35
-
36
-
37
-## Features (MVP)
38
-
39
-### 1. HealthKit Monitoring
40
-Use:
41
-- `HKAnchoredObjectQuery`
42
-- `HKObserverQuery`
43
-
44
-Track:
45
-- Workouts (`HKWorkoutType`)
46
-- Heart Rate (`HKQuantityTypeIdentifierHeartRate`)
47
-- High Heart Rate Events
48
-- Other relevant samples (extensible)
49
-
50
-Persist:
51
-- sample values and units
52
-- source and source revision metadata
53
-- device metadata exposed by HealthKit
54
-- HealthKit metadata dictionaries
55
-- first-seen / last-seen / last-verified timestamps
56
-- fingerprints for matching against Apple Health XML exports and backup database extracts
57
-
58
-
59
-### 2. Anomaly Detection
60
-
61
-#### A. Historical Insertions
62
-Detect samples where:
63
-- `startDate << now`
64
-- AND `firstSeenAt ≈ now`
65
-
66
-#### B. Deletions
67
-Detect via:
68
-- `HKDeletedObject`
69
-- Compare with previously stored snapshot
70
-
71
-#### C. Duplicate Detection
72
-Fingerprint:
+0 -539
HealthProbe/Doc/HealthProbe – Complete Specification & Motivations.md
@@ -1,558 +0,0 @@
1
-# HealthProbe – Complete Specification & Motivations
2
-
3
-**Version:** 1.2  
4
-**Status:** MVP (iOS monitoring agent)  
5
-**Last Updated:** 2026-05-18  
6
-
7
-
8
-## 1. Executive Summary
9
-
10
-HealthProbe is an **audit and integrity monitoring tool for Apple HealthKit**, designed to detect and document anomalies in health data that would otherwise go unnoticed. It serves as a **local sentinel** — read-only observation with forensic-grade logging for later analysis.
11
-
12
-**Core Problem:** Apple Health data loss events (confirmed Sept 2025 incident, ongoing sporadic reports) lack detective mechanisms. Users do not know when data has been lost, corrupted, or silently modified.
13
-
14
-**Solution:** HealthProbe incrementally captures HealthKit data into a robust local archive and maintains an audit trail for post-incident forensic analysis. SwiftData is used for UI assistance, settings, logs, history, and precomputed values; it is not the source of truth. The local archive exists because Apple Health/iCloud can **rewrite / downsample / consolidate** historical samples in-place (not only add/delete), which cannot be proven by counts alone.
15
-
16
-
17
-## 2. Motivations & Concrete Observed Cases
18
-
19
-### 2.1 The September 2025 Mass Data Loss Event
20
-
21
-**What happened:**
22
-- Large-scale loss of Apple Health records reported across multiple devices and iOS versions
23
-- Timeframe: September 2025 (correlated with iOS 26 release)
24
-- Suspected triggers:
25
-  - Device migration (iCloud sync state transitions)
26
-  - OS upgrade/downgrade cycles
27
-  - Backup restore operations
28
-  - HealthKit database re-indexing
29
-  - iCloud sync divergence
30
-
31
-**Why undetected:**
32
-- No notification from Apple Health
33
-- Users discover loss retrospectively (weeks/months later)
34
-- No audit trail to identify exactly *when* or *what* was lost
35
-- No differentiation between: user deletion, sync loss, corruption, or app bug
36
-
37
-**HealthProbe's answer:**
38
-- Continuous monitoring detects *first occurrence* of loss
39
-- Timestamped snapshots enable forensic reconstruction
40
-- Pattern detection identifies trends (gradual loss vs. sudden wipe)
41
-- Allows users to file reproducible bug reports with evidence
42
-
43
-### 2.2 Observed Anomalies
44
-
45
-#### A. Historical Insertions (Backdated Data)
46
-**Pattern:** HealthKit receives samples with `startDate` far in the past, but `firstSeen` ≈ now
47
-- **Examples:**
48
-  - Workout from Jan 2023 suddenly appears in Feb 2026
49
-  - Step count "corrected" retroactively without user action
50
-  - Heart rate baseline recalibration affecting past months
51
-  
52
-**Root cause theories:**
53
-  - iCloud sync restoring from outdated backup
54
-  - Third-party fitness app injecting historical reconstructions
55
-  - HealthKit recovery logic applying retroactive corrections
56
-  - Cross-device sync desynchronization
57
-
58
-**Detection method:** Anchored queries + timestamp comparison
59
-
60
-
61
-#### B. Silent Deletions
62
-**Pattern:** Samples present in previous snapshot, absent in current, no `HKDeletedObject` notification
63
-- **Examples:**
64
-  - 2-week gap of step data (no deletion events logged)
65
-  - Entire workout history from old iPhone missing post-restore
66
-  - Selective loss (e.g., only workouts, heart rate preserved)
67
-
68
-**Root cause theories:**
69
-  - Incomplete restore from backup
70
-  - Selective iCloud sync pruning based on storage limits
71
-  - Corrupted local database during indexing
72
-  - Race condition during multi-device sync
73
-
74
-**Detection method:** Snapshot comparison + gap detection
75
-
76
-
77
-#### C. Duplicate Records
78
-**Pattern:** Identical samples (same type, time, value) appearing multiple times
79
-- **Examples:**
80
-  - Duplicate step counts from watch syncing (15 min apart)
81
-  - Duplicate workouts after iCloud re-sync
82
-  - Conflicting HR readings within 30-second window
83
-
84
-**Root cause theories:**
85
-  - Multi-device sync collision (watch + phone + iPad)
86
-  - Retry logic without deduplication
87
-  - Backup restore merging with live data
88
-
89
-**Detection method:** Fingerprinting (type + start date + value + unit + source)
90
-
91
-
92
-#### D. Divergence Trends
93
-**Pattern:** Measurable drift in aggregated metrics over time
94
-- **Examples:**
95
-  - Active energy expenditure trending down 30% without behavior change
96
-  - Sleep records shifting systematically earlier/later
97
-  - Heart rate variability calculation method changing unexpectedly
98
-
99
-**Root cause theories:**
100
-  - Algorithm updates not backfilled uniformly
101
-  - Device calibration drift
102
-  - Source priority shifting (watch → phone)
103
-  - Health app recalculation without user visibility
104
-
105
-**Detection method:** Time-series aggregation + statistical outlier detection
106
-
107
-
108
-#### E. Consolidation / Downsampling (Historical Rewrites)
109
-**Pattern:** For high-frequency data types, Apple Health/iCloud may rewrite historical samples such that:
110
-- total sample `count` decreases for older months
111
-- timestamps in a later snapshot/export become a **subset** of an earlier snapshot/export (thinning)
112
-- some samples become **interval-based** (`endDate > startDate`) and/or values become fractional
113
-- for cumulative quantities, `value_sum` can remain stable while per-sample `value_max` increases (consolidation)
114
-
115
-**Why it matters:** A “record counter” cannot distinguish discard vs consolidation. HealthProbe must support value-level forensics and (optionally) preserve complete evidence locally.
116
-
117
-**Detection method:** Snapshot comparison of fingerprints *plus* optional per-sample archives for selected data types.
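A minimal Swift sketch of the consolidation/thinning comparison, assuming each snapshot or export has already been reduced to a hypothetical per-(type, month) `MonthSnapshot` holding sample start times, `value_sum`, and `value_max`. It covers only the timestamp-subset and sum/max signals listed above (not interval-based or fractional samples), and the 0.1% sum tolerance is an assumption.

```swift
import Foundation

/// Hypothetical per-(type, month) summary of one snapshot or export.
struct MonthSnapshot {
    let startTimes: Set<Date>
    let valueSum: Double
    let valueMax: Double
}

enum RewriteVerdict: Equatable {
    case unchanged
    case consolidated  // fewer timestamps, stable sum, larger per-sample max
    case thinned       // fewer timestamps without the consolidation signature
    case other         // timestamps changed otherwise, or values rewritten in place
}

/// Compares an earlier and a later snapshot of the same type and month.
/// `sumTolerance` is a relative tolerance; 0.1% is an assumed default.
func classifyRewrite(earlier: MonthSnapshot, later: MonthSnapshot,
                     sumTolerance: Double = 0.001) -> RewriteVerdict {
    if later.startTimes == earlier.startTimes {
        let sameValues = later.valueSum == earlier.valueSum && later.valueMax == earlier.valueMax
        return sameValues ? .unchanged : .other
    }
    // Thinning: the later timestamps must be a strict subset of the earlier ones.
    guard later.startTimes.isStrictSubset(of: earlier.startTimes) else { return .other }
    let scale = max(abs(earlier.valueSum), Double.ulpOfOne)
    let sumStable = abs(later.valueSum - earlier.valueSum) / scale <= sumTolerance
    return (sumStable && later.valueMax > earlier.valueMax) ? .consolidated : .thinned
}
```

A record counter would report both `.consolidated` and `.thinned` as "fewer samples"; the sum/max check is what separates a rewrite that preserves totals from one that discards data.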
118
-
119
-
120
-### 2.3 Why This Matters
121
-
122
-| Concern | Impact | HealthProbe Role |
123
-|---------|--------|-----------------|
124
-| **Data loss undetected** | Users lose personal health history with no notification | Immediate detection & alert |
125
-| **No forensic trail** | Impossible to reproduce for bug reports | Audit trail enables Apple debugging |
126
-| **Blame uncertainty** | "Is it sync? Backup? A bug?" | Precise classification of anomaly type |
127
-| **Third-party apps** | Apps assume data is trustworthy, may make wrong decisions | Detect corruption before downstream use |
128
-| **Privacy of monitoring** | Users fear data exfiltration by health apps | Local-only observation, no cloud upload |
129
-
130
-
131
-## 3. Core Architecture
132
-
133
-### 3.1 Design Principles
134
-
135
-1. **Read-only operations** (never modify HealthKit data)
136
-2. **Local-first** (full functionality without network)
137
-3. **Incremental queries** (efficient, avoid repeating work)
138
-4. **Single archive store** (do not split the forensic store per data type; cross-type relationships and shared metadata matter)
139
-5. **Auditability** (every observation logged, timestamped, reproducible)
140
-6. **Privacy by default** (no HealthProbe cloud sync; local storage remains under user control)
141
-7. **Forensic capture** (selected data types are archived locally as complete per-sample records with metadata to preserve evidence against silent rewrites)
142
-
143
-### 3.2 Threading Model
144
-
145
-```
146
-┌─────────────────────────────────────────┐
147
-│  Main Thread (UI)                       │
148
-│  - Display current health status        │
149
-│  - Show alerts & anomalies              │
150
-│  - User interaction                     │
151
-└──────────────┬──────────────────────────┘
152
-               │
153
-               ├─ Delegate query results
154
-               │
155
-┌──────────────▼──────────────────────────┐
156
-│  Background Queue (HealthKit Queries)   │
157
-│  - HKAnchoredObjectQuery (efficient)    │
158
-│  - HKObserverQuery (reactive)           │
159
-│  - Snapshot comparisons                 │
160
-│  - Anomaly detection logic              │
161
-└──────────────┬──────────────────────────┘
162
-               │
163
-               ├─ Write detected anomalies
164
-               │
165
-┌──────────────▼──────────────────────────┐
166
-│  Local Archive Store                    │
167
-│  - Canonical HealthKit samples          │
168
-│  - Sources, devices, metadata           │
169
-│  - Cross-type relationships             │
170
-│  - Fingerprints and verification hashes │
171
-└──────────────┬──────────────────────────┘
172
-               │
173
-┌──────────────▼──────────────────────────┐
174
-│  SwiftData UI Store                     │
175
-│  - Precomputed counts/statistics        │
176
-│  - Visualization state and settings     │
177
-│  - Logs, history, report indexes        │
178
-└─────────────────────────────────────────┘
179
-```
180
-
181
-### 3.3 Storage Model
182
-
183
-**Local Archive Store (source of truth):**
184
-- one robust local database for all archived samples, not one archive per data type
185
-- normalized entities for samples, workouts, sources, source revisions, devices, metadata, relationships, and observations
186
-- multiple fingerprints per sample: HealthKit UUID, strict fingerprint, semantic fingerprint, and fuzzy matching keys for export/backup reconciliation
187
-- append-only observation history (`firstSeen`, `lastSeen`, `lastVerified`, disappearance evidence)
188
-- snapshot-level and table-level hashes for integrity checks
189
-
190
-**SwiftData UI Store (derived/cache layer):**
191
-- settings and selected data types
192
-- import job state and progress
193
-- precomputed counts, temporal bins, display ranges, and summary statistics
194
-- audit log entries and report indexes
195
-- anomaly summaries and links into the archive store
196
-
197
-SwiftData rows must be rebuildable from the local archive store. If the two disagree, the archive store wins.
198
-
199
-
200
-## 4. Monitoring Features (MVP)
201
-
202
-### 4.1 Incremental Change Detection
203
-
204
-**Using `HKAnchoredObjectQuery`:**
205
-```
206
-Query pattern:
207
-├─ Initial query: anchor = nil → captures all existing data
208
-├─ Store anchor locally
209
-├─ Periodic queries: anchor = stored → captures only samples added or deleted since that anchor
210
-└─ Update anchor → efficient incremental updates
211
-```
212
-
213
-**What triggers a query:**
214
-- App launch
215
-- Background refresh (iOS allows periodic background queries)
216
-- User manually triggers "Check Now"
217
-- Every 12-24 hours (configurable)
218
-
219
-### 4.2 Tracked Sample Types (Extensible)
220
-
221
-| Type | Why Monitored | Anomaly Signal |
222
-|------|---------------|----------------|
223
-| **Workouts** | High-value data, often synced from watch | Historical insertions, duplicates |
224
-| **Heart Rate** | Continuous stream, high modification risk | Gaps, divergence |
225
-| **Activity Summary** | Auto-computed, depends on other types | Recalculation without notice |
226
-| **Steps** | Cross-device (watch/phone), sync-heavy | Duplicates from retries |
227
-| **Sleep** | Frequently "corrected" post-recording | Backdated entries, loss |
228
-| **Blood Pressure** | Manual entry, sync state-dependent | Divergence trends |
229
-| **Audio Exposure** | Often device-specific | Selective loss |
230
-
231
-### 4.3 Anomaly Detection Logic
232
-
233
-#### A. Historical Insertion Detection
234
-```
235
-For each sample:
236
-  Δt = now - startDate  (age of sample)
237
-  Δt_observed = now - firstSeen  (how long we've known about it)
238
-  
239
-  IF Δt >> Δt_observed (e.g., Δt ≈ 6 months, Δt_observed ≈ 5 minutes):
240
-    → Flag as "historical insertion"
241
-    → Severity: MEDIUM (might be legitimate correction)
242
-```
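The Δt ≫ Δt_observed rule can be sketched in Swift as two explicit thresholds rather than a ratio, which avoids flagging samples that simply arrived with normal sync latency. This is a minimal sketch; the 30-day and 1-day defaults are assumptions, not values from the spec. Samples it flags would be recorded with severity MEDIUM.

```swift
import Foundation

/// Returns true when a sample is old (`startDate` far in the past) but was
/// first observed only recently, i.e. Δt ≫ Δt_observed.
/// Both thresholds are illustrative defaults, not values from the spec.
func isHistoricalInsertion(
    startDate: Date,
    firstSeen: Date,
    now: Date,
    minSampleAge: TimeInterval = 30 * 86_400,   // Δt of at least 30 days
    maxObservedAge: TimeInterval = 86_400       // Δt_observed of at most 1 day
) -> Bool {
    let sampleAge = now.timeIntervalSince(startDate)      // Δt
    let observedAge = now.timeIntervalSince(firstSeen)    // Δt_observed
    return sampleAge >= minSampleAge && observedAge <= maxObservedAge
}
```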
243
-
244
-#### B. Deletion Detection
245
-```
246
-Current snapshot: S_now
247
-Previous snapshot: S_prev
248
-
249
-Missing = S_prev - S_now
250
-  (samples present before, absent now)
251
-
252
-IF |Missing| > 0 AND no HKDeletedObject notification:
253
-  → Flag as "silent deletion"
254
-  → Severity: CRITICAL
255
-  → Record gap_duration = time between last observation and absence
256
-```
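A minimal Swift sketch of the set difference above, assuming snapshots are keyed by HealthKit sample UUID and that the `uuid` values of `HKDeletedObject`s returned by anchored queries have been collected into a set. The `DeletionFindings` type is hypothetical. One caveat the pseudocode leaves implicit: `S_now` has to come from a full read of the type, because an anchored delta only lists changes and cannot show that a sample is absent.

```swift
import Foundation

struct DeletionFindings {
    let reported: Set<UUID>  // missing, and announced via HKDeletedObject
    let silent: Set<UUID>    // missing with no deletion notification (CRITICAL)
}

/// `previous` and `current` must both come from full reads of the type.
func findDeletions(previous: Set<UUID>, current: Set<UUID>,
                   deletedObjectUUIDs: Set<UUID>) -> DeletionFindings {
    let missing = previous.subtracting(current)               // S_prev − S_now
    let reported = missing.intersection(deletedObjectUUIDs)
    return DeletionFindings(reported: reported, silent: missing.subtracting(reported))
}
```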
257
-
258
-#### C. Duplicate Detection
259
-```
260
-Fingerprint = (sampleType, startDate, value, unit, source)
261
-
262
-IF count(fingerprint) > 1:
263
-  → Flag as "duplicate record"
264
-  → Severity: LOW (data integrity risk)
265
-  → Calculate time between duplicates
266
-```
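The fingerprint grouping can be sketched in Swift with `Dictionary(grouping:by:)`. This is a minimal sketch, assuming samples carry a hypothetical `firstSeen` timestamp so that "time between duplicates" is measured as the spread between the first and last observed copy. Values are compared exactly here; matching real exports would likely need rounding.

```swift
import Foundation

/// Fingerprint fields from the spec: (sampleType, startDate, value, unit, source).
struct Fingerprint: Hashable {
    let sampleType: String
    let startDate: Date
    let value: Double
    let unit: String
    let source: String
}

struct ObservedSample {
    let uuid: UUID
    let fingerprint: Fingerprint
    let firstSeen: Date
}

struct DuplicateGroup {
    let fingerprint: Fingerprint
    let count: Int
    let observedSpread: TimeInterval  // first to last observed copy
}

/// Returns every fingerprint shared by more than one sample.
func findDuplicates(_ samples: [ObservedSample]) -> [DuplicateGroup] {
    Dictionary(grouping: samples, by: { $0.fingerprint })
        .compactMap { fingerprint, group -> DuplicateGroup? in
            guard group.count > 1 else { return nil }
            let seen = group.map { $0.firstSeen }
            guard let first = seen.min(), let last = seen.max() else { return nil }
            return DuplicateGroup(fingerprint: fingerprint, count: group.count,
                                  observedSpread: last.timeIntervalSince(first))
        }
}
```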
267
-
268
-#### D. Divergence Detection
269
-```
270
-Track aggregated metrics:
271
-  total_steps_per_day[date]
272
-  active_energy_per_day[date]
273
-  hr_average_per_day[date]
274
-
275
-For each metric over time:
276
-  σ_expected = standard deviation (normal range)
277
-  σ_observed = recent variance
278
-  
279
-  IF σ_observed > 2 * σ_expected:
280
-    → Flag as "divergence trend"
281
-    → Severity: MEDIUM
282
-    → Record trend direction & magnitude
283
-```
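A minimal Swift sketch of the σ rule above, assuming each metric (for example `total_steps_per_day`) is already reduced to daily values and split into a baseline window and a recent window; the window sizes are left to the caller and the factor of 2 is exposed as a parameter. Comparing spreads detects instability, not a steady shift in level; the trend-line query in 6.3 addresses that case.

```swift
import Foundation

/// Sample standard deviation (n − 1 denominator); nil for fewer than two values.
func standardDeviation(_ xs: [Double]) -> Double? {
    guard xs.count > 1 else { return nil }
    let mean = xs.reduce(0, +) / Double(xs.count)
    let sumOfSquares = xs.reduce(0) { $0 + ($1 - mean) * ($1 - mean) }
    return (sumOfSquares / Double(xs.count - 1)).squareRoot()
}

/// Flags divergence when σ_observed (recent window) exceeds
/// `factor` × σ_expected (baseline window).
func isDivergent(baseline: [Double], recent: [Double], factor: Double = 2) -> Bool {
    guard let expected = standardDeviation(baseline),
          let observed = standardDeviation(recent) else { return false }
    return observed > factor * expected
}
```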
284
-
285
-
286
-## 5. Sync Context Logging
287
-
288
-HealthProbe does **not** sync its own archive through iCloud or CloudKit. Observed HealthKit databases can diverge between devices, so cross-device HealthProbe sync would increase complexity without providing a reliable forensic source of truth.
289
-
290
-Health/iCloud state is still useful as **context** for anomalies.
291
-
292
-### 5.1 Context Tracking
293
-
294
-**Observe HealthKit permission & sync state:**
295
-```swift
296
-HKHealthStore().requestAuthorization(...)
297
-// → Log authorization requests (HealthKit does not reveal whether read access was granted or later revoked)
298
-
299
-// Monitor iCloud state
300
-FileManager.default.ubiquityIdentityToken
301
-// → Detects iCloud sign-in/sign-out
302
-// → Logs context for later correlation
303
-```
304
-
305
-**Capture lifecycle events:**
306
-- iCloud sign-in detected → log context and schedule a local archive verification pass
307
-- iCloud sign-out detected → note local-only mode
308
-- Device backup initiated → pre-backup snapshot
309
-- App backgrounded/foregrounded → check for sync activity
310
-
311
-### 5.2 Context Documentation
312
-
313
-**Audit trail entries:**
314
-```
315
-[2026-05-01 14:23:15] SYNC_STATE_CHANGE: iCloud enabled
316
-  - Previous: local-only
317
-  - Action: archive verification scheduled
318
-  - Result: no HealthProbe cloud sync performed
319
-
320
-[2026-05-01 14:24:02] SYNC_COMPLETE: iCloud data merged
321
-  - Samples added: 87
322
-  - Samples deleted: 3
323
-  - Duplicates found: 2
324
-  - Divergence detected: NO
325
-
326
-[2026-05-01 16:15:00] ANOMALY_DETECTED: Historical insertion
327
-  - Sample: Workout "Running" 
328
-  - Original date: 2024-03-15
329
-  - First observed: 2026-05-01
330
-  - Age: 777 days
331
-  - Severity: MEDIUM
332
-```
333
-
334
-### 5.3 Background Monitoring
335
-
336
-**iOS Background Modes enabled:**
337
-- `fetch` (Background Fetch): periodic archive and context checks
338
-- `remote-notification`: not required for HealthProbe archive sync
339
-
340
-**Check frequency:**
341
-- Min: 2 hours
342
-- Max: 24 hours
343
-- Adapts based on anomaly detection frequency
344
-
345
-
346
-## 6. Local Archive, Reports & Forensics
347
-
348
-### 6.1 Local Archive Store
349
-
350
-The main backup artifact is the on-device archive store. It is populated incrementally from HealthKit and is not dependent on Apple Health ZIP exports or full encrypted iPhone backups.
351
-
352
-The archive must preserve as much HealthKit information as the API exposes:
353
-- sample UUID, type, start/end date, value, unit, and metadata
354
-- source, source revision, bundle identifier, product type, version/build if available
355
-- device fields exposed by `HKDevice`
356
-- relationships between workouts, samples, events, and other linked records where available
357
-- first-seen / last-seen / last-verified observations
358
-- fingerprints suitable for matching against Apple Health XML exports and extracted backup databases
359
-
360
-The archive is selected by data type for performance and privacy, but it is stored in **one schema** so later analysis can follow relationships between types.
361
-
362
-### 6.2 Reports and Point Exports
363
-
364
-HealthProbe does not need to optimize for routine complete exports. The local archive is the backup.
365
-
366
-Export is scoped to what the user is inspecting:
367
-- anomaly reports
368
-- tables of records shown in UI (e.g., “these 1,000 HK records disappeared”)
369
-- point-in-time manifests and hashes
370
-- selected record sets needed for external analysis
371
-
372
-### 6.3 Forensic Query Examples
373
-
374
-**"Has my step data been compromised?"**
375
-```
376
-1. Load all snapshots for "Steps" type
377
-2. Plot sample count over time
378
-3. Identify gaps > 6 hours
379
-4. Report: when, how many missing, context
380
-```
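Step 3 ("identify gaps > 6 hours") can be sketched in Swift over the start dates of archived samples. This is a minimal sketch with a hypothetical `Gap` type; it treats every long interval as a gap, so a real check would likely need to discount normal overnight pauses in step data.

```swift
import Foundation

struct Gap: Equatable {
    let start: Date
    let end: Date
    var duration: TimeInterval { end.timeIntervalSince(start) }
}

/// Returns each interval between consecutive sample start times longer than
/// `threshold` (6 hours by default, as in the query above).
func findGaps(in startDates: [Date], threshold: TimeInterval = 6 * 3600) -> [Gap] {
    let sorted = startDates.sorted()
    return zip(sorted, sorted.dropFirst()).compactMap { earlier, later in
        later.timeIntervalSince(earlier) > threshold ? Gap(start: earlier, end: later) : nil
    }
}
```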
381
-
382
-**"Did iCloud sync break my data?"**
383
-```
384
-1. Correlate anomalies with observed Health/iCloud state changes
385
-2. Show timeline: before state change, during reconciliation, after
386
-3. Calculate: samples lost, duplicates introduced
387
-```
388
-
389
-**"Is my health data drifting?"**
390
-```
391
-1. Compute daily aggregates (steps, energy, HR)
392
-2. Fit trend line over 30-90 days
393
-3. Report: slope (drift direction), R² (goodness of fit)
394
-4. Compare to device baseline
395
-```
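Steps 2 and 3 amount to an ordinary least-squares line over daily aggregates. A minimal Swift sketch, assuming the aggregates are already computed and indexed by day (0, 1, 2, …); the `TrendFit` type is hypothetical. The slope is in metric units per day.

```swift
import Foundation

struct TrendFit {
    let slope: Double      // change per day (drift direction and magnitude)
    let intercept: Double
    let rSquared: Double   // share of variance explained by the line
}

/// Ordinary least-squares fit of daily aggregates against day index.
func fitTrend(_ daily: [Double]) -> TrendFit? {
    guard daily.count >= 2 else { return nil }
    let n = Double(daily.count)
    let xs = daily.indices.map { Double($0) }
    let meanX = xs.reduce(0, +) / n
    let meanY = daily.reduce(0, +) / n
    var sxy = 0.0, sxx = 0.0, syy = 0.0
    for (x, y) in zip(xs, daily) {
        sxy += (x - meanX) * (y - meanY)
        sxx += (x - meanX) * (x - meanX)
        syy += (y - meanY) * (y - meanY)
    }
    let slope = sxy / sxx
    // For a one-variable least-squares line, R² = sxy² / (sxx · syy).
    // A constant series (syy == 0) is fit exactly, so report 1.
    let rSquared = syy == 0 ? 1 : (sxy * sxy) / (sxx * syy)
    return TrendFit(slope: slope, intercept: meanY - slope * meanX, rSquared: rSquared)
}
```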
396
-
397
-
398
-## 7. User-Facing Features
399
-
400
-### 7.1 Dashboard (iOS App)
401
-
402
-**Home Screen:**
403
-- **Health Status** — "✅ Healthy" / "⚠️ Check" / "🚨 Critical"
404
-- **Last Check** — timestamp of last monitoring run
405
-- **Quick Stats** — samples tracked, anomalies found (all-time)
406
-- **Active Alerts** — up to 3 most recent anomalies
407
-
408
-**Detail Views:**
409
-- **Anomalies** — sortable list by date/severity
410
-- **Snapshots** — historical timeline of known-good snapshots
411
-- **Audit Trail** — complete immutable log
412
-- **Archive Status** — current local archive health, last verification, selected data types
413
-
414
-**Settings:**
415
-- Check frequency
416
-- Sample types to track
417
-- Alert thresholds
418
-- Local archive retention and report export options
419
-
420
-### 7.2 Alerts
421
-
422
-**Push Notifications (opt-in):**
423
-- 🚨 "Critical data loss detected" (> 10% samples missing)
424
-- ⚠️ "Unexpected historical data inserted" (> 100 samples)
425
-- ℹ️ "Archive check completed, 2 duplicates found"
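The thresholds in this list can be sketched as a small mapping from one check's findings to notification kinds. This is a minimal Swift sketch; the `AlertKind` cases and the function name are hypothetical, and only the > 10% and > 100 thresholds come from the list above.

```swift
import Foundation

enum AlertKind: Equatable { case criticalLoss, historicalInsertion, checkSummary }

/// Maps one check's findings to the opt-in notifications listed above.
func notifications(previousSampleCount: Int, missingSamples: Int,
                   historicalInsertions: Int, duplicates: Int) -> [AlertKind] {
    var kinds: [AlertKind] = []
    // Critical: more than 10% of previously known samples are missing.
    if previousSampleCount > 0,
       Double(missingSamples) / Double(previousSampleCount) > 0.10 {
        kinds.append(.criticalLoss)
    }
    // Warning: more than 100 backdated samples appeared.
    if historicalInsertions > 100 { kinds.append(.historicalInsertion) }
    // Info: check finished and found duplicates.
    if duplicates > 0 { kinds.append(.checkSummary) }
    return kinds
}
```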
426
-
427
-
428
-## 8. Future Enhancements (Beyond MVP)
429
-
430
-### 8.1 macOS Companion (Visualization Layer)
431
-- Open and analyze exported HealthProbe reports or archive copies
432
-- Long-term trend visualization (6-12 month history)
433
-- Cross-device anomaly correlation
434
-- Export to reproducible bug reports
435
-
436
-### 8.2 Machine Learning
437
-- Personalized baseline generation
438
-- Anomaly confidence scoring
439
-- Predictive detection (flag drift before threshold hit)
440
-
441
-### 8.3 Community Patterns
442
-- Anonymized digest sharing → identify systemic issues
443
-- Detect if data loss correlates with: iOS version, device model, iCloud region, etc.
444
-- Contribute to DearApple bug reports with statistical evidence
445
-
446
-
447
-## 9. Technical Specifications
448
-
449
-### 9.1 Platform
450
-- **iOS 17.0+** (HealthKit; SwiftData requires iOS 17)
451
-- **watchOS 8.0+** (future sync awareness)
452
-- **macOS 12.0+** (visualization, analysis)
453
-
454
-### 9.2 Permissions Required
455
-- `HealthKit` — read-only access to specified types
456
-- `Background Modes` — "Background Fetch"
457
-
458
-### 9.3 Data Storage
459
-- **Local Archive Store:** canonical HealthKit sample archive (source of truth)
460
-- **SwiftData:** derived UI/cache/settings/log/history store
461
-- **No CloudKit sync:** HealthProbe data remains local unless the user exports a report or selected record table
462
-
463
-### 9.4 Performance
464
-- Query time: < 5 seconds (anchored queries)
465
-- Snapshot/index size: ≈ 5-10 KB per type per snapshot in SwiftData
466
-- Archive storage: depends on selected high-frequency data types; report per-type storage costs in settings
467
-
468
-
469
-## 10. Privacy & Security
470
-
471
-### 10.1 What HealthProbe Never Does
472
-- ❌ Exports raw health samples to cloud
473
-- ❌ Identifies users by name/account
474
-- ❌ Shares device location or personal context
475
-- ❌ Modifies any HealthKit data
476
-- ❌ Sells or shares data with third parties
477
-
478
-### 10.2 What HealthProbe Collects (Local Only)
479
-- ✅ Aggregated counts in the SwiftData UI store (per-sample data lives only in the local archive)
480
-- ✅ Timestamps of anomalies
481
-- ✅ Device model & iOS version (for context)
482
-- ✅ Anomaly types & severity
483
-
484
-**Local archive:**
485
-- ✅ Per-sample archive for user-selected types, stored on-device and exportable by user
486
-- ✅ Metadata needed for recognition in Apple Health XML exports, backup database extracts, and future datasets
487
-
488
-### 10.3 Cloud Policy
489
-- No HealthProbe CloudKit/iCloud sync
490
-- No automatic upload of raw samples, digests, reports, or device fingerprints
491
-- User-triggered exports are explicit, scoped, and local-file based
492
-
493
-
494
-## 11. Success Criteria
495
-
496
-| Objective | Metric | Target |
497
-|-----------|--------|--------|
498
-| **Detect loss** | Time to detection after loss occurs | < 24 hours |
499
-| **Forensic completeness** | % of anomalies with sufficient evidence | > 95% |
500
-| **False positives** | Alerts user shouldn't worry about | < 5% of total |
501
-| **Privacy** | % of users comfortable with data practices | > 90% |
502
-| **Performance** | Background capture battery impact | < 2% drain/day |
503
-| **Adoption** | Users can reproduce bugs with HealthProbe data | High relevance in Apple feedback |
504
-
505
-
506
-## 12. References & Related Work
507
-
508
-- [DearApple Issue #001](https://github.com/overbog/dear-apple/issues/0001-apple-health-mass-data-loss.md) — Sept 2025 mass data loss
509
-- [Apple HealthKit Documentation](https://developer.apple.com/documentation/healthkit/)
510
-- [HKAnchoredObjectQuery](https://developer.apple.com/documentation/healthkit/hkanchoredobjectquery) — Efficient incremental queries
511
-
512
-
513
-## Appendix A: Example Anomaly Report
514
-
515
-```json
516
-{
517
-  "anomaly_id": "ANML_20260501_001",
518
-  "type": "historical_insertion",
519
-  "timestamp_detected": "2026-05-01T14:35:22Z",
520
-  "severity": "MEDIUM",
521
-  "evidence": {
522
-    "sample_type": "HKWorkout",
523
-    "workout_type": "Running",
524
-    "start_date": "2025-01-15T07:30:00Z",
525
-    "end_date": "2025-01-15T08:15:00Z",
526
-    "duration_minutes": 45,
527
-    "calories": 420,
528
-    "first_observed": "2026-05-01T14:35:00Z",
529
-    "age_days": 471,
530
-    "source": "Health.app",
531
-    "context": "iCloud sync completed 2 hours prior"
532
-  },
533
-  "classification": "Likely data recovery from cloud",
534
-  "recommended_action": "Monitor for similar patterns"
535
-}
536
-```
537
-
538
-
539
-*HealthProbe — Guarding the integrity of your health data.*
+0 -629
HealthProbe/Doc/Implementation Guide.md
@@ -1,639 +0,0 @@
1
-# HealthProbe – Technical Implementation Guide
2
-
3
-**Document Purpose:** Step-by-step guide for iOS app implementation  
4
-**Target Audience:** iOS developers  
5
-**Prerequisite Reading:** "Complete Specification & Motivations"
6
-
7
-
8
-## ⚠️ Privacy Directives — Mandatory
9
-
10
-The following rules apply to **all code, logs, examples, tests, and documentation** in this project:
11
-
12
-- **No credentials** — no API keys, tokens, passwords, or signing certificates
13
-- **No personal data** — no names, email addresses, phone numbers, or dates of birth
14
-- **No device identifiers** — no UDIDs, serial numbers, advertising IDs, or device names
15
-- **No account identifiers** — no Apple IDs, iCloud account info, or CloudKit record IDs
16
-- **No raw health values in the repository** — do not include real health records, measurements, or workouts in code, tests, logs, examples, or documentation. The app may optionally store a user's raw samples **locally on-device** for forensic backup, but nothing real belongs in this repo.
17
-- **No location data** — no GPS coordinates or location history
18
-- **No recognizable patterns** — no logs or exports where combining fields could identify a person or device
19
-
20
-If adding examples, use clearly synthetic data: `"Device: iPhone-TESTDEVICE"`, `"User: Test User"`, `"2000-01-01"`.
21
-
22
-
23
-## 1. HealthKit Integration
24
-
25
-### 1.1 Permission Model
26
-
27
-```swift
28
-import HealthKit
29
-
30
-class HealthKitManager {
31
-    static let shared = HealthKitManager()
32
-    let healthStore = HKHealthStore()
33
-    
34
-    let typesToRead: Set<HKObjectType> = [  // HKActivitySummaryType is not an HKSampleType
35
-        HKWorkoutType.workoutType(),
36
-        HKQuantityType.quantityType(forIdentifier: .heartRate)!,
37
-        HKQuantityType.quantityType(forIdentifier: .stepCount)!,
38
-        HKQuantityType.quantityType(forIdentifier: .activeEnergyBurned)!,
39
-        HKCategoryType.categoryType(forIdentifier: .sleepAnalysis)!,
40
-        HKActivitySummaryType.activitySummaryType(),
41
-    ]
42
-    
43
-    func requestAuthorization(completion: @escaping (Bool, Error?) -> Void) {
44
-        healthStore.requestAuthorization(toShare: [], read: typesToRead) { success, error in
45
-            completion(success, error)
46
-        }
47
-    }
48
-}
49
-```
50
-
51
-### 1.2 Anchored Query Pattern
52
-
53
-**Purpose:** Efficient incremental queries that only fetch changes since last check
54
-
55
-```swift
56
-class AnchoredQueryManager {
57
-    let healthStore = HealthKitManager.shared.healthStore
-    let defaults = UserDefaults(suiteName: "group.com.healthprobe.data")
58
-    
59
-    func loadAnchor(for sampleType: HKSampleType) -> HKQueryAnchor? {
60
-        guard let data = defaults?.data(forKey: "anchor_\(sampleType.identifier)") else {
61
-            return nil
62
-        }
63
-        return try? NSKeyedUnarchiver.unarchivedObject(ofClass: HKQueryAnchor.self, from: data)
64
-    }
65
-    
66
-    func saveAnchor(_ anchor: HKQueryAnchor, for sampleType: HKSampleType) {
67
-        let data = try? NSKeyedArchiver.archivedData(withRootObject: anchor, requiringSecureCoding: true)
68
-        defaults?.set(data, forKey: "anchor_\(sampleType.identifier)")
69
-    }
70
-    
71
-    func executeAnchoredQuery(
72
-        sampleType: HKSampleType,
73
-        completion: @escaping ([HKSample], [HKDeletedObject], HKQueryAnchor) -> Void
74
-    ) {
75
-        let anchor = loadAnchor(for: sampleType)  // nil on first run: HealthKit returns all existing samples
76
-        let query = HKAnchoredObjectQuery(
77
-            type: sampleType,
78
-            predicate: nil,
79
-            anchor: anchor,
80
-            limit: HKObjectQueryNoLimit
81
-        ) { _, samples, deletedObjects, newAnchor, error in
82
-            guard let newAnchor = newAnchor else { return }
83
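-            // Caution: saving before the caller has written these samples to the archive
-            // can skip data permanently if that write fails.
-            self.saveAnchor(newAnchor, for: sampleType)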
84
-            completion(samples ?? [], deletedObjects ?? [], newAnchor)
85
-        }
86
-        
87
-        healthStore.execute(query)
88
-    }
89
-}
90
-```
91
-
92
-### 1.3 Observer Query (Real-time Changes)
93
-
94
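-```swift
-import HealthKit
-import UIKit  // for UIBackgroundFetchResult
-
95
-class HealthKitObserver {
-    let healthStore = HealthKitManager.shared.healthStore
-    private var activeQueries: [HKObserverQuery] = []  // strong references keep queries alive
-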
96
-    func setupObserverQueries(for types: [HKSampleType], handler: @escaping (HKSampleType) -> Void) {
97
-        for sampleType in types {
98
-            let query = HKObserverQuery(sampleType: sampleType, predicate: nil) { _, completionHandler, error in
99
-                if error == nil {
100
-                    handler(sampleType)
101
-                }
102
-                completionHandler()
103
-            }
104
-            
105
-            healthStore.execute(query)
106
-            
107
-            // Important: Keep strong reference to prevent query from being deallocated
108
-            activeQueries.append(query)
109
-        }
110
-    }
111
-    
112
-    // Call this when background notification arrives
113
-    func backgroundFetch(completionHandler: @escaping (UIBackgroundFetchResult) -> Void) {
114
-        // Re-run anchored queries to detect changes
115
-        // Update snapshots and detect anomalies
116
-        // Persist any findings
117
-        completionHandler(.newData)
118
-    }
119
-}
120
-```
121
-
122
-
123
-## 2. Storage Implementation
124
-
125
-HealthProbe uses two storage layers:
126
-
127
-1. **Local Archive Store (source of truth)**
128
-   - Stores canonical HealthKit samples and all metadata exposed by the API
129
-   - Uses one schema for all selected data types, so workouts, samples, sources, devices, and metadata can be related later
130
-   - Maintains `firstSeen`, `lastSeen`, `lastVerified`, strict/semantic/fuzzy fingerprints, and integrity hashes
131
-   - Should be implemented with an explicit local database/archive format (not SwiftData model graphs for millions of samples)
132
-
133
-2. **SwiftData UI Store (derived/cache layer)**
134
-   - Stores settings, logs, import/check history, anomaly summaries, and precomputed values used by charts
135
-   - Can be rebuilt from the archive store
136
-   - Must not be treated as the only forensic copy
137
-
138
-### 2.1 SwiftData UI Models
139
-
140
-```swift
141
-import SwiftData
142
-import Foundation
143
-
144
-// MARK: - Core Models
145
-
146
-@Model
147
-final class HealthSnapshot {
148
-    /// Unique identifier
149
-    @Attribute(.unique) var id: String = UUID().uuidString
150
-    
151
-    /// When this snapshot was captured
152
-    var capturedAt: Date
153
-    
154
-    /// Sample type (e.g., "HKWorkout", "HKQuantity:HeartRate")
155
-    var sampleType: String
156
-    
157
-    /// Source device (e.g., "iPhone 15 Pro", "Apple Watch")
158
-    var sourceDevice: String
159
-    
160
-    /// Total samples of this type at capture time
161
-    var recordCount: Int
162
-    
163
-    /// MD5 of aggregated sample IDs (for integrity checking)
164
-    var integrityChecksum: String
165
-    
166
-    /// Aggregated counts by source: { "iPhone Health": 1200, "Apple Watch": 450 }
167
-    var sourceDistribution: [String: Int]
168
-    
169
-    /// Metadata
170
-    var iosVersion: String
171
-    var appVersion: String
172
-    
173
-    init(
174
-        capturedAt: Date,
175
-        sampleType: String,
176
-        sourceDevice: String,
177
-        recordCount: Int,
178
-        integrityChecksum: String,
179
-        sourceDistribution: [String: Int],
180
-        iosVersion: String,
181
-        appVersion: String
182
-    ) {
183
-        self.capturedAt = capturedAt
184
-        self.sampleType = sampleType
185
-        self.sourceDevice = sourceDevice
186
-        self.recordCount = recordCount
187
-        self.integrityChecksum = integrityChecksum
188
-        self.sourceDistribution = sourceDistribution
189
-        self.iosVersion = iosVersion
190
-        self.appVersion = appVersion
191
-    }
192
-}
193
-
194
-@Model
195
-final class AuditTrailEntry {
196
-    @Attribute(.unique) var id: String = UUID().uuidString
197
-    var timestamp: Date
198
-    var eventType: String  // "snapshot", "sync_event", "anomaly_detected", etc.
199
-    var message: String
200
-    var context: [String: String]  // JSON-serializable context
201
-    
202
-    init(timestamp: Date, eventType: String, message: String, context: [String: String] = [:]) {
203
-        self.timestamp = timestamp
204
-        self.eventType = eventType
205
-        self.message = message
206
-        self.context = context
207
-    }
208
-}
209
-
210
-@Model
211
-final class DetectedAnomaly {
212
-    @Attribute(.unique) var id: String = UUID().uuidString
213
-    var detectedAt: Date
214
-    var type: String  // "historical_insertion", "silent_deletion", "duplicate", "divergence"
215
-    var severity: String  // "info", "warning", "critical"
216
-    var sampleType: String
217
-    var summary: String
218
-    var evidence: [String: String]  // Forensic data
219
-    var resolved: Bool = false
220
-    var resolvedAt: Date?
221
-    
222
-    init(
223
-        detectedAt: Date,
224
-        type: String,
225
-        severity: String,
226
-        sampleType: String,
227
-        summary: String,
228
-        evidence: [String: String] = [:]
229
-    ) {
230
-        self.detectedAt = detectedAt
231
-        self.type = type
232
-        self.severity = severity
233
-        self.sampleType = sampleType
234
-        self.summary = summary
235
-        self.evidence = evidence
236
-    }
237
-}
238
-
239
-@Model
240
-final class ContextStateChange {
241
-    @Attribute(.unique) var id: String = UUID().uuidString
242
-    var timestamp: Date
243
-    var previousState: String  // "local_only", "icloud_enabled", "icloud_sync_active"
244
-    var newState: String
245
-    var details: String
246
-    
247
-    init(timestamp: Date, previousState: String, newState: String, details: String = "") {
248
-        self.timestamp = timestamp
249
-        self.previousState = previousState
250
-        self.newState = newState
251
-        self.details = details
252
-    }
253
-}
254
-
255
-// MARK: - Model Container Setup
256
-
257
-func createModelContainer() throws -> ModelContainer {
258
-    let schema = Schema([
259
-        HealthSnapshot.self,
260
-        AuditTrailEntry.self,
261
-        DetectedAnomaly.self,
262
-        ContextStateChange.self,
263
-    ])
264
-    
265
-    let modelConfiguration = ModelConfiguration(
266
-        schema: schema,
267
-        isStoredInMemoryOnly: false,
268
-        cloudKitDatabase: .none  // Local only in MVP
269
-    )
270
-    
271
-    return try ModelContainer(for: schema, configurations: [modelConfiguration])
272
-}
273
-```
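The `type` and `severity` fields above are free-form strings, so call sites can drift away from the values listed in the model comments. The sketch below shows typed wrappers that callers could use instead. It assumes the persisted columns stay String-based. `AnomalyType`, `AnomalySeverity`, and `parseSeverity` are hypothetical names and are not part of the HealthProbe codebase.

```swift
// Hypothetical typed wrappers for the String fields stored on DetectedAnomaly.
// Raw values match the model comments, so the stored schema is unchanged.
enum AnomalyType: String, CaseIterable {
    case historicalInsertion = "historical_insertion"
    case silentDeletion = "silent_deletion"
    case duplicate = "duplicate"
    case divergence = "divergence"
}

enum AnomalySeverity: String, CaseIterable, Comparable {
    case info, warning, critical

    // Explicit ordering: info < warning < critical.
    private var rank: Int {
        switch self {
        case .info: return 0
        case .warning: return 1
        case .critical: return 2
        }
    }

    static func < (lhs: AnomalySeverity, rhs: AnomalySeverity) -> Bool {
        lhs.rank < rhs.rank
    }
}

// Unknown stored strings (for example a stray "medium") yield nil
// instead of being silently accepted as a valid severity.
func parseSeverity(_ raw: String) -> AnomalySeverity? {
    AnomalySeverity(rawValue: raw)
}
```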
274
-
275
-### 2.2 Local Archive Store Contract
276
-
277
-The archive store should expose a small service interface rather than leaking SQL/archive details into UI code:
278
-
279
-```swift
280
-protocol HealthArchiveStore {
281
-    func upsertSamples(_ samples: [HKSample], observedAt: Date) async throws -> HealthArchiveWriteSummary
282
-    func markVerification(sampleType: HKSampleType, verifiedAt: Date) async throws
283
-    func recordDisappearance(sampleUUIDHash: String, sampleTypeIdentifier: String, observedMissingAt: Date) async throws
284
-    func records(for request: HealthArchiveRecordRequest) async throws -> [ArchivedHealthRecord]
285
-    func exportReport(_ request: HealthArchiveReportRequest) async throws -> URL
286
-}
287
-```
288
-
289
-Archive rows should preserve:
290
-- HealthKit UUID where exposed
291
-- type identifier, start/end date, value, unit
292
-- source, source revision, bundle identifier, version/build/product type where available
293
-- `HKDevice` fields exposed by HealthKit
294
-- full metadata dictionary as structured data
295
-- relationship keys for workouts, events, and related samples where available
296
-- fingerprints for matching records across HealthProbe, Apple Health XML exports, and backup database extracts
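Of the fields listed above, the fingerprint is the one that has to agree across sources. The sketch below builds a fingerprint only from fields that also appear on an Apple Health XML export record. The field choice and the normalization rules are assumptions, and `RecordFingerprintInput` and `fingerprint(_:)` are hypothetical names.

```swift
import Foundation

// Hypothetical, HealthKit-free view of the fields used for cross-source matching.
// UUIDs are excluded because they may not be present in every source.
struct RecordFingerprintInput {
    let typeIdentifier: String
    let start: Date
    let end: Date
    let value: Double
    let unit: String
    let sourceName: String
}

// Deterministic key: normalized fields joined with "|".
// Dates are truncated to whole seconds (export timestamps carry no sub-second part),
// values are printed with fixed precision, and source names are lowercased.
func fingerprint(_ r: RecordFingerprintInput) -> String {
    let startSeconds = Int64(r.start.timeIntervalSince1970.rounded(.down))
    let endSeconds = Int64(r.end.timeIntervalSince1970.rounded(.down))
    let value = String(format: "%.4f", r.value)
    return [r.typeIdentifier, String(startSeconds), String(endSeconds),
            value, r.unit, r.sourceName.lowercased()]
        .joined(separator: "|")
}
```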
297
-
298
-The MVP implementation is `SQLiteHealthArchiveStore`, an actor-isolated SQLite archive in Application Support. It is populated from HealthKit anchored-query pages before SwiftData receives derived snapshot/index rows.
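The upsert path returns a `HealthArchiveWriteSummary`, but this guide does not define that type's fields. The sketch below shows what such a summary could count. It uses a hypothetical `WriteSummary` and a hypothetical in-memory `InMemoryArchive`, keyed by record UUID, in place of the SQLite tables. The real store is described as an actor; a plain struct keeps the sketch synchronous.

```swift
// Hypothetical summary shape; the real HealthArchiveWriteSummary is not specified here.
struct WriteSummary: Equatable {
    var inserted = 0
    var updated = 0
    var unchanged = 0
}

// In-memory stand-in for the archive table: record UUID -> fingerprint.
struct InMemoryArchive {
    private(set) var rows: [String: String] = [:]

    mutating func upsert(_ batch: [(uuid: String, fingerprint: String)]) -> WriteSummary {
        var summary = WriteSummary()
        for item in batch {
            if let existing = rows[item.uuid] {
                if existing == item.fingerprint {
                    summary.unchanged += 1   // same record seen again; nothing to write
                    continue
                }
                summary.updated += 1         // same UUID, different content
            } else {
                summary.inserted += 1        // first time this UUID is observed
            }
            rows[item.uuid] = item.fingerprint
        }
        return summary
    }
}
```

The sketch overwrites the row on update. Whether an update should instead append a new version row to preserve forensic history is a design choice it does not settle.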
299
-
300
-
301
-## 3. Anomaly Detection Implementation
302
-
303
-```swift
304
-class AnomalyDetector {
305
-    private let modelContext: ModelContext
306
-    private let healthKitManager: HealthKitManager
307
-    
308
-    // MARK: - Historical Insertion Detection
309
-    
310
-    func detectHistoricalInsertions(
311
-        newSamples: [HKSample],
312
-        completion: @escaping ([DetectedAnomaly]) -> Void
313
-    ) {
314
-        var anomalies: [DetectedAnomaly] = []
315
-        let now = Date()
316
-        
317
-        for sample in newSamples {
318
-            let ageInDays = Calendar.current.dateComponents([.day], from: sample.startDate, to: now).day ?? 0
319
-            
320
-            // Check if sample is older than 7 days but was just added
321
-            if ageInDays > 7 {
322
-                let anomaly = DetectedAnomaly(
323
-                    detectedAt: now,
324
-                    type: "historical_insertion",
325
-                    severity: "warning",
326
-                    sampleType: sample.sampleType.identifier,
327
-                    summary: "Sample from \(ageInDays) days ago appeared in HealthKit",
328
-                    evidence: [
329
-                        "original_date": ISO8601DateFormatter().string(from: sample.startDate),
330
-                        "age_days": String(ageInDays),
331
-                        "sample_id": sample.uuid.uuidString,
332
-                    ]
333
-                )
334
-                anomalies.append(anomaly)
335
-            }
336
-        }
337
-        
338
-        completion(anomalies)
339
-    }
340
-    
341
-    // MARK: - Silent Deletion Detection
342
-    
343
-    func detectSilentDeletions(
344
-        previousSnapshot: HealthSnapshot,
345
-        currentSnapshot: HealthSnapshot,
346
-        completion: @escaping ([DetectedAnomaly]) -> Void
347
-    ) {
348
-        var anomalies: [DetectedAnomaly] = []
349
-        
350
-        let previousCount = previousSnapshot.recordCount
351
-        let currentCount = currentSnapshot.recordCount
352
-        let loss = previousCount - currentCount
353
-        
354
-        if loss > 0 {
355
-            let lossPercent = Double(loss) / Double(previousCount) * 100
356
-            let severity = lossPercent > 10 ? "critical" : lossPercent > 5 ? "warning" : "info"
357
-            
358
-            let anomaly = DetectedAnomaly(
359
-                detectedAt: Date(),
360
-                type: "silent_deletion",
361
-                severity: severity,
362
-                sampleType: previousSnapshot.sampleType,
363
-                summary: "\(loss) samples missing (\(String(format: "%.1f", lossPercent))%)",
364
-                evidence: [
365
-                    "previous_count": String(previousCount),
366
-                    "current_count": String(currentCount),
367
-                    "loss_count": String(loss),
368
-                    "loss_percent": String(format: "%.1f", lossPercent),
369
-                    "time_gap": String(describing: Date().timeIntervalSince(previousSnapshot.capturedAt)),
370
-                ]
371
-            )
372
-            anomalies.append(anomaly)
373
-        }
374
-        
375
-        completion(anomalies)
376
-    }
377
-    
378
-    // MARK: - Duplicate Detection
379
-    
380
-    func detectDuplicates(
381
-        samples: [HKSample],
382
-        completion: @escaping ([DetectedAnomaly]) -> Void
383
-    ) {
384
-        var anomalies: [DetectedAnomaly] = []
385
-        var fingerprints: [String: [HKSample]] = [:]
386
-        
387
-        // Group by fingerprint
388
-        for sample in samples {
389
-            let fingerprint = createFingerprint(for: sample)
390
-            fingerprints[fingerprint, default: []].append(sample)
391
-        }
392
-        
393
-        // Find duplicates
394
-        for (fingerprint, dupes) in fingerprints where dupes.count > 1 {
395
-            let anomaly = DetectedAnomaly(
396
-                detectedAt: Date(),
397
-                type: "duplicate",
398
-                severity: "info",
399
-                sampleType: dupes[0].sampleType.identifier,
400
-                summary: "\(dupes.count) duplicate records found",
401
-                evidence: [
402
-                    "fingerprint": fingerprint,
403
-                    "count": String(dupes.count),
404
-                ]
405
-            )
406
-            anomalies.append(anomaly)
407
-        }
408
-        
409
-        completion(anomalies)
410
-    }
411
-    
412
-    // MARK: - Divergence Detection
413
-    
414
-    func detectDivergence(
415
-        currentTrend: [Date: Double],
416
-        historicalBaseline: [Date: Double],
417
-        completion: @escaping ([DetectedAnomaly]) -> Void
418
-    ) {
419
-        // Calculate standard deviations
420
-        let baselineStdDev = standardDeviation(values: Array(historicalBaseline.values))
421
-        let currentStdDev = standardDeviation(values: Array(currentTrend.values))
422
-        
423
-        if currentStdDev > baselineStdDev * 2.0 {
424
-            let anomaly = DetectedAnomaly(
425
-                detectedAt: Date(),
426
-                type: "divergence",
427
-                severity: "warning",
428
-                sampleType: "aggregated_metric",
429
-                summary: "Unusual trend detected (σ increased \(String(format: "%.2f", currentStdDev / baselineStdDev))x)",
430
-                evidence: [
431
-                    "baseline_stddev": String(format: "%.2f", baselineStdDev),
432
-                    "current_stddev": String(format: "%.2f", currentStdDev),
433
-                    "ratio": String(format: "%.2f", currentStdDev / baselineStdDev),
434
-                ]
435
-            )
436
-            completion([anomaly])
437
-        } else {
438
-            completion([])
439
-        }
440
-    }
441
-    
442
-    // MARK: - Helpers
443
-    
444
-    private func createFingerprint(for sample: HKSample) -> String {
445
-        let formatter = ISO8601DateFormatter()
446
-        let startStr = formatter.string(from: sample.startDate)
447
-        let endStr = formatter.string(from: sample.endDate)
448
-        let type = sample.sampleType.identifier
449
-        let source = sample.sourceRevision.source.name
450
-        
451
-        return "\(type)|\(startStr)|\(endStr)|\(source)".addingPercentEncoding(withAllowedCharacters: .alphanumerics) ?? ""
452
-    }
453
-    
454
-    private func standardDeviation(values: [Double]) -> Double {
455
-        let mean = values.reduce(0, +) / Double(values.count)
456
-        let squaredDiffs = values.map { pow($0 - mean, 2) }
457
-        let variance = squaredDiffs.reduce(0, +) / Double(values.count)
458
-        return sqrt(variance)
459
-    }
460
-}
461
-```
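The decision rules above can be exercised without HealthKit or SwiftData. The sketch below restates the silent-deletion thresholds and the standard-deviation helper as pure functions. It adds an empty-input guard that the helper above lacks, so an empty baseline returns 0 instead of NaN. `silentDeletionSeverity` and `populationStandardDeviation` are hypothetical names.

```swift
// Same thresholds as detectSilentDeletions: >10% loss is critical,
// >5% is a warning, any smaller loss is info.
// Returns nil when nothing was lost or there is no baseline to compare against.
func silentDeletionSeverity(previousCount: Int, currentCount: Int) -> String? {
    guard previousCount > 0, currentCount < previousCount else { return nil }
    let lossPercent = Double(previousCount - currentCount) / Double(previousCount) * 100
    if lossPercent > 10 { return "critical" }
    if lossPercent > 5 { return "warning" }
    return "info"
}

// Population standard deviation; an empty input returns 0 rather than
// dividing 0 by 0.
func populationStandardDeviation(_ values: [Double]) -> Double {
    guard !values.isEmpty else { return 0 }
    let mean = values.reduce(0, +) / Double(values.count)
    let variance = values.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / Double(values.count)
    return variance.squareRoot()
}
```

Because the baseline deviation can still be 0 for a perfectly flat series, any ratio computed from it, as in `detectDivergence`, should be checked before dividing.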
462
-
463
-
464
-## 4. Context Monitoring (Background Thread)
465
-
466
-HealthProbe does not sync its own database through iCloud/CloudKit. This service only logs Health/iCloud state as context for later forensic correlation.
467
-
468
-```swift
469
-class ContextMonitor: ObservableObject {
470
-    private let modelContext: ModelContext
471
-    private let queue = DispatchQueue(label: "com.healthprobe.sync-monitor", qos: .background)
472
-    
473
-    private var previousHealthCloudState: String = "unknown"
474
-    
475
-    func startMonitoring() {
476
-        queue.async {
477
-            self.monitorContext()
478
-        }
479
-    }
480
-    
481
-    private func monitorContext() {
482
-        // Coarse proxy: the ubiquity identity token reflects the iCloud Drive account, not Health's own iCloud sync setting
483
-        let iCloudToken = FileManager.default.ubiquityIdentityToken
484
-        let currentState = iCloudToken != nil ? "icloud_enabled" : "local_only"
485
-        
486
-        if currentState != previousHealthCloudState {
487
-            logContextChange(from: previousHealthCloudState, to: currentState)
488
-            previousHealthCloudState = currentState
489
-            
490
-            // Schedule archive verification on state change
491
-            DispatchQueue.main.async {
492
-                NotificationCenter.default.post(name: NSNotification.Name("HealthContextChanged"), object: nil)
493
-            }
494
-        }
495
-    }
496
-    
497
-    private func logContextChange(from: String, to: String) {
498
-        let change = ContextStateChange(
499
-            timestamp: Date(),
500
-            previousState: from,
501
-            newState: to,
502
-            details: "iCloud state changed"
503
-        )
504
-        
505
-        do {
506
-            modelContext.insert(change)
507
-            try modelContext.save()
508
-            
509
-            let auditEntry = AuditTrailEntry(
510
-                timestamp: Date(),
511
-                eventType: "health_context_change",
512
-                message: "Health cloud context: \(from) → \(to)",
513
-                context: ["previous": from, "current": to]
514
-            )
515
-            modelContext.insert(auditEntry)
516
-            try modelContext.save()
517
-        } catch {
518
-            print("Error logging context change: \(error)")
519
-        }
520
-    }
521
-}
522
-```
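The monitor's core rule is to log only when the state actually changes. The sketch below expresses that rule as a value type, so it can be tested without `FileManager`, dispatch queues, or SwiftData. Keeping the last-seen state inside one value also avoids mutating `previousHealthCloudState` from a background queue. `ContextStateTracker` and `ContextTransition` are hypothetical names.

```swift
// One logged change from a previous context state to a new one.
struct ContextTransition: Equatable {
    let from: String
    let to: String
}

// Remembers the last observed state and reports a transition only on change,
// the same "log on change" rule monitorContext() applies to the iCloud check.
struct ContextStateTracker {
    private(set) var current: String = "unknown"

    // Returns the transition to log, or nil when the state is unchanged.
    mutating func observe(_ newState: String) -> ContextTransition? {
        guard newState != current else { return nil }
        let transition = ContextTransition(from: current, to: newState)
        current = newState
        return transition
    }
}
```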
523
-
524
-
525
-## 5. Integration into App Lifecycle
526
-
527
-```swift
528
-@main
529
-struct HealthProbeApp: App {
530
-    @StateObject private var healthKitManager = HealthKitManager.shared
531
-    @StateObject private var contextMonitor: ContextMonitor
532
-    let modelContainer: ModelContainer
533
-    
534
-    init() {
535
-        do {
536
-            modelContainer = try createModelContainer()
537
-            let context = ModelContext(modelContainer)
538
-            _contextMonitor = StateObject(wrappedValue: ContextMonitor(modelContext: context))
539
-        } catch {
540
-            fatalError("Could not initialize model container: \(error)")
541
-        }
542
-    }
543
-    
544
-    var body: some Scene {
545
-        WindowGroup {
546
-            ContentView()
547
-                .modelContainer(modelContainer)
548
-                .onAppear {
549
-                    // Request HealthKit permissions
550
-                    healthKitManager.requestAuthorization { success, error in
551
-                        if success {
552
-                            // Start context monitoring and archive capture
553
-                            contextMonitor.startMonitoring()
554
-                            captureInitialSnapshot()
555
-                        }
556
-                    }
557
-                }
558
-                .onReceive(Timer.publish(every: 3600, on: .main, in: .common).autoconnect()) { _ in
559
-                    // Periodic check every hour
560
-                    refreshHealthData()
561
-                }
562
-        }
563
-    }
564
-    
565
-    private func captureInitialSnapshot() {
566
-        // Implement snapshot capture
567
-    }
568
-    
569
-    private func refreshHealthData() {
570
-        // Implement periodic refresh
571
-    }
572
-}
573
-```
574
-
575
-
576
-## 6. Testing Strategy
577
-
578
-### Unit Tests
579
-```swift
580
-class AnomalyDetectorTests: XCTestCase {
581
-    var detector: AnomalyDetector!
582
-    
583
-    override func setUp() {
584
-        super.setUp()
585
-        detector = AnomalyDetector(...)
586
-    }
587
-    
588
-    func testDetectsHistoricalInsertion() {
589
-        // Create sample from 30 days ago
590
-        // Assert: anomaly detected
591
-    }
592
-    
593
-    func testDetectsSilentDeletion() {
594
-        // Create two snapshots, second has fewer records
595
-        // Assert: anomaly detected with correct loss percentage
596
-    }
597
-}
598
-```
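The `testDetectsHistoricalInsertion` outline needs a deterministic clock. The sketch below assumes a hypothetical refactor that moves the age rule out of `detectHistoricalInsertions` into a pure function, `isHistoricalInsertion`. The first-seen date is passed in explicitly instead of reading `Date()`. A fixed UTC calendar keeps day counts stable across daylight-saving changes.

```swift
import Foundation

// Fixed Gregorian/UTC calendar so whole-day differences do not depend on
// the machine's time zone or daylight-saving transitions.
func utcGregorianCalendar() -> Calendar {
    var calendar = Calendar(identifier: .gregorian)
    calendar.timeZone = TimeZone(identifier: "UTC")!
    return calendar
}

// A sample counts as a historical insertion when it is first seen more than
// `thresholdDays` whole days after its start date (same ">7 days" rule as above).
func isHistoricalInsertion(start: Date, firstSeen: Date, thresholdDays: Int = 7,
                           calendar: Calendar = utcGregorianCalendar()) -> Bool {
    let days = calendar.dateComponents([.day], from: start, to: firstSeen).day ?? 0
    return days > thresholdDays
}
```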
599
-
600
-### Integration Tests
601
-- ✅ HealthKit query performance (anchor efficiency)
602
-- ✅ Local archive persistence and recovery
603
-- ✅ SwiftData cache rebuild from archive
604
-- ✅ Background context monitoring accuracy
605
-- ✅ Anomaly detection on real HealthKit data
606
-
607
-
608
-## 7. Performance Considerations
609
-
610
-| Operation | Target | Notes |
611
-|-----------|--------|-------|
612
-| Anchored query | < 5 sec | Runs in background; delays over ~2 s are noticeable if the user is waiting on the result |
613
-| Anomaly detection | < 2 sec | Should not block UI |
614
-| SwiftData cache update | < 1 sec | Can run on main thread only after archive work completes |
615
-| Archive write | Background | Stream large imports; never build full high-frequency datasets in memory |
616
-| Background check | < 30 sec | iOS grants roughly 30 seconds of execution per background app refresh |
617
-
618
-
619
-## 8. Deployment Checklist
620
-
621
-- [ ] HealthKit capability enabled and `NSHealthShareUsageDescription` declared in Info.plist
622
-- [ ] Background Modes enabled ("Background Fetch")
623
-- [ ] SwiftData model migrations tested
624
-- [ ] Local archive schema migrations tested
625
-- [ ] Privacy Policy updated (what data is collected)
626
-- [ ] Accessibility review (VoiceOver, Dynamic Type)
627
-
628
-
629
-*HealthProbe Implementation Guide v1.0 — 2026-05-01*
+0 -419
HealthProbe/Doc/Open Source Publication Guidelines.md
@@ -1,438 +0,0 @@
1
-# HealthProbe – Open Source Publication Guidelines
2
-
3
-**Purpose:** Ensure documentation is accurate, responsible, and suitable for public release  
4
-**Date:** 2026-05-01  
5
-**Status:** Pre-publication review
6
-
7
-
8
-## 1. Key Principles for Open Source
9
-
10
-1. **Neutrality:** Describe *observed behavior*, not conspiracy
11
-2. **Precision:** Distinguish between *facts*, *patterns*, and *theories*
12
-3. **Humility:** Acknowledge unknowns and limitations
13
-4. **Responsibility:** Don't speculate about Apple's intentions
14
-5. **Reproducibility:** All claims must be testable
15
-
16
-
17
-## 2. Content Review – Flagged Items
18
-
19
-### 🔴 HIGH PRIORITY: Reframe Tone
20
-
21
-**Issue 1: Section 2.1 "The September 2025 Mass Data Loss Event"**
22
-
23
-**Current language:**
24
-```
25
-Suspected triggers:
26
-  - Device migration (iCloud sync state transitions)
27
-  - OS upgrade/downgrade cycles
28
-  - Backup restore operations
29
-  - HealthKit database re-indexing
30
-  - iCloud sync divergence
31
-```
32
-
33
-**Problem:** Lists "suspected triggers" without evidence; reads like accusations.
34
-
35
-**Revision for open source:**
36
-```
37
-**Preliminary observations from user reports suggest correlation with:**
38
-  - Device migration or iCloud sync state changes
39
-  - OS updates (particularly iOS 26.x)
40
-  - Backup restore operations
41
-  - Data re-indexing
42
-
43
-**NOTE:** These are patterns observed in reports, not confirmed causal links.
44
-Actual root causes require access to Apple system logs.
45
-```
46
-
47
-
48
-**Issue 2: "Root cause theories" sections**
49
-
50
-**Current language:**
51
-```
52
-**Root cause theories:**
53
-  - iCloud sync restoring from outdated backup
54
-  - Third-party fitness app injecting historical reconstructions
55
-  - HealthKit recovery logic applying retroactive corrections
56
-  - Cross-device sync desynchronization
57
-```
58
-
59
-**Problem:** "Theories" is vague. Some are highly speculative; "third-party apps injecting" sounds accusatory.
60
-
61
-**Revision for open source:**
62
-```
63
-**Possible mechanisms** (listed for documentation, not as conclusions):
64
-  - iCloud sync merging data from outdated backup
65
-  - Legitimate algorithmic recalculation (e.g., HR baseline updates)
66
-  - Data misalignment across multiple devices during sync
67
-  - Timestamp reconciliation during restore operations
68
-
69
-**These possibilities are inferred from observed patterns, not system internals.**
70
-Apple has not confirmed mechanisms.
71
-```
72
-
73
-
74
-### 🟡 MEDIUM PRIORITY: Add Disclaimers
75
-
76
-**Issue 3: "Concrete observed cases" section 2.2**
77
-
78
-**Current:** Lists examples without caveats.
79
-
80
-**Add disclaimer:**
81
-```
82
-## 2.2 Observed Anomalies – Data Note
83
-
84
-⚠️ **IMPORTANT:** These patterns have been observed in user reports and 
85
-HealthProbe testing, but represent a limited dataset. They are NOT 
86
-confirmed bugs, and may have benign explanations:
87
-
88
-- Historical insertions could be legitimate corrections/backfills
89
-- Silent deletions could be user actions or incomplete HealthKit queries
90
-- Duplicates could be transient sync artifacts (self-healing within 24h)
91
-- Divergence could reflect algorithm updates or device recalibration
92
-
93
-HealthProbe documents *observations*, not diagnoses.
94
-```
95
-
96
-
97
-**Issue 4: "Why undetected" section**
98
-
99
-**Current language:**
100
-```
101
-**Why undetected:**
102
-- No notification from Apple Health
103
-- Users discover loss retrospectively (weeks/months later)
104
-- No audit trail to identify exactly *when* or *what* was lost
105
-```
106
-
107
-**Problem:** Reads like Apple is hiding data loss intentionally.
108
-
109
-**Revision:**
110
-```
111
-**Why current mechanisms may not catch this:**
112
-- Health.app provides no built-in audit trail for historical changes
113
-- Data loss is often not immediately obvious (daily view may not change much)
114
-- Users cannot easily compare snapshots over time
115
-- Some anomalies resolve automatically within 24-72 hours (self-healing sync)
116
-```
117
-
118
-
119
-### 🟡 MEDIUM PRIORITY: Soften Certainty Language
120
-
121
-**Issue 5: Executive summary opening**
122
-
123
-**Current:**
124
-```
125
-Apple Health data loss events (confirmed Sept 2025 incident, ongoing sporadic reports)
126
-```
127
-
128
-**Problem:** "Confirmed incident" is too strong without official Apple acknowledgment.
129
-
130
-**Revision:**
131
-```
132
-Reports of Apple Health data loss (September 2025 timeframe, ongoing user reports)
133
-```
134
-
135
-
136
-**Issue 6: Throughout documentation**
137
-
138
-**Replace** these phrases:
139
-| Current | Replace with |
140
-|---------|--------------|
141
-| "Apple Health data loss" | "Reported Apple Health data anomalies" or "User-observed data gaps" |
142
-| "confirmed bug" | "potential issue" or "reported anomaly" |
143
-| "undetected" | "not immediately visible to users" |
144
-| "corrupted" | "inconsistent" or "unexpected state" |
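Applying this table by hand across several documents is error-prone. The sketch below is a hypothetical review helper, `flagDiscouragedPhrases`, and is not an existing tool. It reports which discouraged phrases appear in a text and uses only the first suggested replacement from each row. Matching is a case-insensitive substring search, so quoted examples are flagged too. Treat hits as prompts for review, not automatic rewrites.

```swift
import Foundation

// Phrase list mirrors the replacement table (first suggestion per row).
let discouragedPhrases: [String: String] = [
    "apple health data loss": "reported Apple Health data anomalies",
    "confirmed bug": "potential issue",
    "undetected": "not immediately visible to users",
    "corrupted": "inconsistent",
]

// Returns every discouraged phrase found in `text`, sorted alphabetically.
func flagDiscouragedPhrases(in text: String) -> [(phrase: String, suggestion: String)] {
    let lowered = text.lowercased()
    var hits: [(phrase: String, suggestion: String)] = []
    for (phrase, suggestion) in discouragedPhrases where lowered.contains(phrase) {
        hits.append((phrase: phrase, suggestion: suggestion))
    }
    return hits.sorted { $0.phrase < $1.phrase }
}
```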
145
-
146
-
147
-### 🟡 MEDIUM PRIORITY: Privacy/Security Section Expansion
148
-
149
-**Current Limitation:** Section 10 exists but is brief.
150
-
151
-**Add to "Risks & Limitations" document:**
152
-
153
-```markdown
154
-## Important Caveats for Open Source Users
155
-
156
-### What HealthProbe Cannot Know
157
-- Whether data loss is a bug, user action, or legitimate system operation
158
-- Exact root cause (only observations, not system internals)
159
-- Cross-device behavior (requires manual export from multiple devices)
160
-- iCloud backend state (only observes local HealthKit)
161
-
162
-### What Users Should Understand
163
-- **False positives expected:** Some "anomalies" may resolve automatically
164
-- **Incomplete record:** Uninstalling HealthProbe loses all audit history
165
-- **No guarantees:** HealthProbe itself could have bugs; don't rely solely on it
166
-- **Comparison not validation:** Snapshot comparison detects differences, not errors
167
-
168
-### Recommended Usage
169
-- Use as **documentation tool**, not as truth source
170
-- Export data regularly as backup
171
-- Compare findings with an export from the Health app (Export All Health Data) when possible
172
-- Report patterns, not individual anomalies, to Apple
173
-```
174
-
175
-
176
-## 3. Content Audit Checklist
177
-
178
-Before release, verify:
179
-
180
-### Documentation Quality
181
-- [ ] Every claim is either observable fact OR clearly labeled as theory/speculation
182
-- [ ] "Suspected," "possible," "may" used where causality unclear
183
-- [ ] Root causes described as inferences, not conclusions
184
-- [ ] No language implying Apple intentionally hides issues
185
-- [ ] Disclaimers present before speculative sections
186
-
187
-### Technical Accuracy
188
-- [ ] HealthKit API descriptions verified against Apple docs
189
-- [ ] Code examples tested/executable
190
-- [ ] Performance claims have measurement basis
191
-- [ ] Known limitations documented explicitly
192
-
193
-### Privacy Compliance
194
-- [ ] No raw health sample data in examples
195
-- [ ] HealthProbe CloudKit/iCloud sync is not described as a product goal
196
-- [ ] User consent documented
197
-- [ ] Data retention policy clear
198
-- [ ] No tracking/analytics hidden in code
199
-
200
-### Responsible Disclosure
201
-- [ ] References to Apple issues are neutral, not accusatory
202
-- [ ] Links to DearApple properly contextualized
203
-- [ ] No suggestion of intentional misconduct by Apple
204
-- [ ] Recommendations for bug reporting included
205
-
206
-
207
-## 4. Specific Revisions Needed
208
-
209
-### File: "Complete Specification & Motivations.md"
210
-
211
-**Location:** Section 2.1 (3 major edits)
212
-```
213
-CHANGE: "Large-scale loss of Apple Health records reported"
214
-TO: "Reports of large-scale Apple Health data anomalies"
215
-
216
-CHANGE: Entire "Why undetected" subsection
217
-TO: [See Issue #4 above]
218
-
219
-CHANGE: All "suspected triggers" with confidence qualifier
220
-TO: [See Issue #1 above]
221
-```
222
-
223
-**Location:** Section 2.2 (add disclaimer at top)
224
-```
225
-ADD: [See Issue #3 above - the full disclaimer block]
226
-```
227
-
228
-
229
-### File: "Forensics & Limitations.md"
230
-
231
-**Location:** Section 1 "Known Limitations" (add)
232
-```
233
-ADD: Section 1.4 "Data Interpretation"
234
-
235
-1.4 Data Interpretation Risks
236
-
237
-HealthProbe documents observations, not diagnoses:
238
-
239
-| Finding | What it means | What it does NOT mean |
240
-|---------|---------------|----------------------|
241
-| **Silent deletion detected** | Samples in snapshot A absent in B | Data is corrupted or lost forever |
242
-| **Historical insertion** | Sample has old date, recent first-seen | Apple maliciously backdated data |
243
-| **Duplicates found** | Multiple identical samples present | System is broken; may auto-deduplicate |
244
-| **Divergence trend** | Metric value changing over time | Algorithm bug; could be calibration or update |
245
-
246
-Always validate findings before drawing conclusions.
247
-```
248
-
249
-**Location:** Section 2.1 "Privacy Risks"
250
-```
251
-CHANGE: "CRITICAL: user's personal health history exposed"
252
-TO: "CRITICAL RISK IF: raw health data were exfiltrated"
253
-(Emphasis: HealthProbe doesn't do this)
254
-```
255
-
256
-
257
-### File: "Implementation Guide.md"
258
-
259
-**Add Section:** 0.5 "Ethical Implementation Notes"
260
-
261
-```markdown
262
-## 0.5 Ethical Implementation Notes
263
-
264
-As an open-source health monitoring tool, HealthProbe should:
265
-
266
-1. **Never store/transmit raw health data**
267
-   - Code review required before adding any health sample export
268
-   
269
-2. **Always ask before background operations**
270
-   - Background fetch enabled only with user consent
271
-   - Notify user of sync frequency in settings
272
-   
273
-3. **Respect user autonomy**
274
-   - Easy to disable all monitoring
275
-   - Easy to export/delete all data
276
-   - Audit trail visible to user (not hidden)
277
-
278
-4. **Accept limitations gracefully**
279
-   - Don't claim certainty you don't have
280
-   - Document where you're guessing
281
-   - Encourage validation via Apple's tools
282
-```
283
-
284
-
285
-## 5. External References & Linking
286
-
287
-### How to Reference DearApple Issues
288
-
289
-**Current:** Direct references to issues as "bugs"
290
-
291
-**Revision:** Neutral framing
292
-
293
-```markdown
294
-**BAD:**
295
-"See DearApple Issue #001 – mass data loss bug"
296
-
297
-**GOOD:**
298
-"See DearApple Issue #001 – documented reports of Apple Health data loss"
299
-
300
-**BETTER:**
301
-"For additional context on the observed anomalies, see [DearApple Issue #001]
302
-(https://github.com/overbog/dear-apple/issues/...) which collects user reports
303
-of similar patterns."
304
-```
305
-
306
-
307
-## 6. README & Contributing Guidelines
308
-
309
-**Add file:** `CONTRIBUTING.md` (for open source)
310
-
311
-```markdown
312
-# Contributing to HealthProbe
313
-
314
-## Data Integrity First
315
-
316
-When reporting anomalies or contributing code:
317
-
318
-1. **Distinguish facts from theories**
319
-   - Observed: "On 2026-03-15, step count dropped from 5000 to 2500"
320
-   - Theory: "This might be due to iCloud sync"
321
-   - Avoid: "iCloud sync corrupted my data"
322
-
323
-2. **Include evidence**
324
-   - Screenshots of HealthProbe audit trail
325
-   - Export from Health.app for comparison
326
-   - Device model, iOS version, app version
327
-
328
-3. **Respect privacy**
329
-   - Redact dates if identifying
330
-   - Remove specific health values if sensitive
331
-   - Describe scope instead of exact values (e.g., "10 days of step data")
332
-
333
-4. **Acknowledge unknowns**
334
-   - "I observed X, but I don't know if it's a bug or expected behavior"
335
-
336
-## Code Standards
337
-
338
-- Read-only HealthKit operations only
339
-- No exfiltration of raw health data
340
-- User consent required before new data collection
341
-- Audit trail for all operations
342
-```
343
-
344
-
345
-## 7. Release Checklist
346
-
347
-Before tagging v1.0.0:
348
-
349
-- [ ] All flagged content revised (Issues #1-6)
350
-- [ ] Added disclaimers in 3 places (Issues #3, #4, #7)
351
-- [ ] Softened certainty language throughout (Issues #5 and #6)
352
-- [ ] Privacy/Security section expanded ("Important Caveats for Open Source Users")
353
-- [ ] Added "Ethical Implementation" section to code guide
354
-- [ ] New CONTRIBUTING.md with data integrity guidelines
355
-- [ ] License file present (recommend: MIT or Apache 2.0)
356
-- [ ] README includes clear link to DearApple context
357
-- [ ] Code examples tested and run-verified
358
-- [ ] No hardcoded debugging/logging left in
359
-- [ ] Legal review of liability disclaimers
360
-
361
-
362
-## 8. Statement of Purpose (For README)
363
-
364
-```markdown
365
-## Purpose
366
-
367
-HealthProbe is a **documentation and monitoring tool** designed to help users 
368
-understand their Apple HealthKit data state over time.
369
-
370
-**It is NOT:**
371
-- A diagnostic tool (cannot confirm bugs)
372
-- A data recovery tool
373
-- A security auditing tool
374
-- A replacement for Apple's Health app
375
-
376
-**It IS:**
377
-- A local audit trail (what changed, when)
378
-- An anomaly detector (unusual patterns documented)
379
-- A forensic aid (exportable evidence for bug reports)
380
-- Privacy-respecting (all local, no exfiltration)
381
-
382
-**Appropriate uses:**
383
-- Personal monitoring of your own health data
384
-- Documenting anomalies to report to Apple
385
-- Researching HealthKit behavior (with proper ethics)
386
-- Contributing data to DearApple investigation (with consent)
387
-
388
-**Inappropriate uses:**
389
-- Claiming definitive proof of bugs without Apple confirmation
390
-- Identifying or tracking other users
391
-- Replacing professional medical advice
392
-- Distributing unvalidated health data claims
393
-```
394
-
395
-
396
-## 9. Review Before Publish
397
-
398
-Suggested external reviewers:
399
-1. **Apple developer relations** — verify no confidential info disclosed
400
-2. **Privacy researcher** — check data handling assumptions
401
-3. **Legal counsel** — health data liability disclaimers
402
-4. **DearApple maintainers** — coordinate messaging
403
-
404
-
405
-## Summary: Key Changes for v1.0.0 Public Release
406
-
407
-| Issue | Severity | Action | Effort |
408
-|-------|----------|--------|--------|
409
-| Tone: "confirmed bug" → "reported anomaly" | 🔴 HIGH | Search and replace in 3 docs | 30 min |
410
-| Add data interpretation disclaimers | 🟡 MED | New section in Forensics | 45 min |
411
-| Soften causality language | 🟡 MED | Search and replace throughout | 20 min |
412
-| Add ethics section to Implementation | 🟡 MED | New section | 30 min |
413
-| Create CONTRIBUTING.md | 🟡 MED | New file | 30 min |
414
-| Final legal/privacy review | 🟡 MED | External | 2-4 hours |
415
-
416
-**Total estimated effort:** roughly 4.5-6.5 hours to make publication-ready (about 2.5 hours of internal edits plus 2-4 hours of external review)
417
-
418
-
419
-*HealthProbe – Open Source Governance v1.0*
+90 -73
HealthProbe/Doc/README.md
@@ -1,106 +1,118 @@
1 1
 # HealthProbe Documentation Index
2 2
 
3
-## Quick Navigation
3
+**Canonical documentation root:** `HealthProbe/Doc/`
4 4
 
5
-### 📋 Core Documentation
5
+This directory is the only place for substantive HealthProbe documentation. Root-level `AGENTS.md` and `CLAUDE.md` are bootstrap pointers only, kept so agent tools can find this index.
6 6
 
7
-1. **[Complete Specification & Motivations](HealthProbe%20–%20Complete%20Specification%20&%20Motivations.md)**
8
-   - Complete system design
9
-   - Concrete observed cases (Sept 2025 data loss + ongoing issues)
10
-   - Motivations for each feature
11
-   - Technical architecture & threading model
12
-   - Privacy & security guarantees
7
+## Current Product Direction
13 8
 
14
-2. **[MVP Specification](HealthProbe%20iOS%20–%20Specification%20(MVP).md)** *(original)*
15
-   - Feature scope for iOS 1.0
16
-   - Core HealthKit monitoring approach
9
+HealthProbe is a single-device, local Health DB Time Machine:
10
+- capture selected HealthKit-accessible observations over time;
11
+- reconstruct how the local Health database looked at a chosen observation date;
12
+- explain local changes with consolidation-aware labels;
13
+- preserve recovery-compatible archives and exports;
14
+- keep the iOS app read-only with respect to HealthKit and iOS backups.
17 15
 
16
+Target storage architecture:
17
+- SQLite archive/analysis database is the source of truth;
18
+- observations are stored differentially, not as recurring complete snapshots;
19
+- Core Data is the rebuildable UI/report cache for expensive counts and summaries;
20
+- SwiftData is legacy/prototype only and should not be expanded;
21
+- existing prototype/test databases are disposable and may be reset for archive v2.
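To make "stored differentially" concrete, the sketch below assumes an observation is reduced to the set of sample UUIDs visible at capture time. The actual archive v2 schema belongs to the Database Design chapter. `ObservationDelta`, `delta`, and `reconstruct` are hypothetical names.

```swift
// What changed between two consecutive observations of the same sample type.
struct ObservationDelta: Equatable {
    let appeared: Set<String>
    let disappeared: Set<String>
}

// Store only the difference instead of the complete UUID set.
func delta(previous: Set<String>, current: Set<String>) -> ObservationDelta {
    ObservationDelta(appeared: current.subtracting(previous),
                     disappeared: previous.subtracting(current))
}

// Replaying deltas in capture order rebuilds the set visible at that observation,
// the simplest form of a "Time Machine" query.
func reconstruct(from deltas: [ObservationDelta]) -> Set<String> {
    deltas.reduce(into: Set<String>()) { state, d in
        state.subtract(d.disappeared)
        state.formUnion(d.appeared)
    }
}
```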
18 22
 
19
-## Project Status
23
+## How To Point Agents
20 24
 
21
-| Component | Status | Notes |
22
-|-----------|--------|-------|
23
-| **iOS App Foundation** | ✅ Started | SwiftUI + SwiftData scaffolding in place |
24
-| **Core Architecture** | 📋 Designed | See "Complete Specification" |
25
-| **HealthKit Integration** | ⏳ Pending | Implement anchored queries, observer queries |
26
-| **Anomaly Detection** | 📋 Designed | Logic documented, pending implementation |
27
-| **Sync Context Logging** | 📋 Designed | Log Health/iCloud state as forensic context; do not sync HealthProbe data via iCloud |
28
-| **UI Dashboard** | ⏳ Pending | Wireframes in Complete Specification |
29
-| **Local Archive Store** | 📋 Designed | Robust on-device archive is the source of truth |
30
-| **Reports & Point Exports** | 📋 Designed | Export only selected reports/record tables, not a complete routine dump |
31
-| **macOS Companion** | 🔄 Future | Post-MVP enhancement |
25
+Use the chapter map below. Send agents to the narrowest document that matches their task.
32 26
 
27
+| If the task is about... | Send the agent to... |
28
+|-------------------------|----------------------|
29
+| Overall product scope, non-goals, future parking lot | [`01-product/Product-Specification.md`](01-product/Product-Specification.md) |
30
+| MVP behavior and out-of-scope boundaries | [`01-product/MVP-Specification.md`](01-product/MVP-Specification.md) |
31
+| Database design, archive schema, differential storage, SQL analysis | [`02-architecture/Database-Design.md`](02-architecture/Database-Design.md) |
32
+| Core Data cache schema and invalidation | [`02-architecture/Core-Data-Cache-Design.md`](02-architecture/Core-Data-Cache-Design.md) |
33
+| Export formats, manifests, streaming contract | [`02-architecture/Export-Specification.md`](02-architecture/Export-Specification.md) |
34
+| Implementation workflow, HealthKit capture, exports, tests | [`02-architecture/Implementation-Guide.md`](02-architecture/Implementation-Guide.md) |
35
+| Forensic limits, export meaning, recovery compatibility | [`01-product/Forensics-Limitations.md`](01-product/Forensics-Limitations.md) |
36
+| General agent ownership and handoff rules | [`00-agent-guides/AGENTS.md`](00-agent-guides/AGENTS.md) |
37
+| SwiftUI/UI work | [`00-agent-guides/CLAUDE.md`](00-agent-guides/CLAUDE.md) |
38
+| Refactoring milestones and sequencing | [`04-project/Refactoring-Plan.md`](04-project/Refactoring-Plan.md) |
39
+| Project status and refactoring priorities | [`04-project/IMPLEMENTATION_STATUS.md`](04-project/IMPLEMENTATION_STATUS.md) |
40
+| Historical UI notes only | [`99-archive/`](99-archive/) |
33 41
 
34
-## Motivation: Why HealthProbe Exists
42
+## Chapters
35 43
 
36
-**The Problem:** Apple Health data loss events (confirmed September 2025, ongoing sporadic reports) lack any detection mechanism. Users don't know their data has been lost, corrupted, or silently modified.
44
+### 00 Agent Guides
37 45
 
38
-**Concrete Examples:**
39
-- **Historical insertions:** Workouts from 6+ months ago suddenly appearing
40
-- **Silent deletions:** Multi-week gaps with no deletion notification
41
-- **Duplicates:** Same workout syncing multiple times across devices
42
-- **Divergence:** Metrics (steps, energy, HR) drifting without user action
46
+- [`00-agent-guides/AGENTS.md`](00-agent-guides/AGENTS.md)
47
+  Multi-agent development guide, ownership boundaries, protocol contracts, and current architecture decisions.
43 48
 
44
-See **Complete Specification § 2** for detailed observed cases and forensic implications.
49
+- [`00-agent-guides/CLAUDE.md`](00-agent-guides/CLAUDE.md)
50
+  UI/SwiftUI-specific instructions aligned with the current Time Machine objective.
45 51
 
52
+### 01 Product
46 53
 
47
-## Next Steps
54
+- [`01-product/Product-Specification.md`](01-product/Product-Specification.md)
55
+  Full current product specification, motivations, architecture decision record, and non-goals.
48 56
 
49
-1. **Implement HealthKit Integration** (`Sources/HealthKitMonitor.swift`)
50
-   - `HKAnchoredObjectQuery` for efficient incremental queries
51
-   - `HKObserverQuery` for real-time change notifications
52
-   - Track: Workouts, Heart Rate, Steps, Sleep, Activity Summary
57
+- [`01-product/MVP-Specification.md`](01-product/MVP-Specification.md)
58
+  MVP scope for the iOS app, including read-only behavior, differential storage expectations, exports, and out-of-scope restore/republication.
53 59
 
54
-2. **Build Anomaly Detection** (`Sources/AnomalyDetector.swift`)
55
-   - Historical insertion detection
56
-   - Silent deletion detection
57
-   - Duplicate fingerprinting
58
-   - Divergence trend analysis
60
+- [`01-product/Forensics-Limitations.md`](01-product/Forensics-Limitations.md)
61
+  What the app can and cannot prove, how to interpret changes, and what recovery-compatible exports should preserve.
59 62
 
60
-3. **Implement Local Archive Store** (`Sources/ArchiveStore.swift`)
61
-   - Single robust local database for all archived samples
62
-   - Preserve cross-type relationships, sources, devices, metadata, and fingerprints
63
-   - Keep SwiftData as UI/cache/settings/history only
63
+### 02 Architecture
64 64
 
65
-4. **Create UI Dashboard** (`Views/HealthStatusView.swift`)
66
-   - Show current health status
67
-   - Display active alerts
68
-   - Timeline of anomalies
69
-   - Audit trail viewer
65
+- [`02-architecture/Database-Design.md`](02-architecture/Database-Design.md)
66
+  Canonical database design. Start here for SQLite archive v2, Core Data cache boundaries, differential storage, point-in-time reconstruction, SQL diffs, recovery-compatible exports, reset policy, future migrations, and DB tests.
70 67
 
68
+- [`02-architecture/Core-Data-Cache-Design.md`](02-architecture/Core-Data-Cache-Design.md)
69
+  Target Core Data cache schema, invalidation rules, rebuild order, and legacy-device cache behavior.
71 70
 
72
-## Key Design Decisions
71
+- [`02-architecture/Export-Specification.md`](02-architecture/Export-Specification.md)
72
+  JSON/CSV export envelope, canonical manifest hashing, item hashing, streaming/cancellation behavior, and provenance warnings.
73 73
 
74
-| Decision | Rationale |
75
-|----------|-----------|
76
-| **Read-only + HealthKit** | Never modify health data; pure observation only |
77
-| **Local-first storage** | Full functionality without internet; privacy-first |
78
-| **Archive DB as truth** | Store HealthKit samples and metadata in a robust local database, not split per data type |
79
-| **SwiftData as UI cache** | Keep precomputed values, settings, logs, and history for visualization only |
80
-| **Anchored queries** | Minimize HealthKit load, reduce battery impact |
81
-| **No HealthProbe iCloud sync** | Device HealthKit databases evolve independently; CloudKit sync adds complexity without proven forensic benefit |
82
-| **Selective forensic capture** | When Health/iCloud rewrites or downsamples historical data, counts alone are insufficient; HealthProbe archives complete samples for selected types in one local store |
74
+- [`02-architecture/Implementation-Guide.md`](02-architecture/Implementation-Guide.md)
75
+  Technical implementation guide for HealthKit capture flow, change explanation, exports, context logging, UI integration, and testing. It references the canonical database design instead of redefining it.
83 76
 
77
+### 03 UI
78
+
79
+- [`03-ui/README.md`](03-ui/README.md)
80
+  Entry point for active UI guidance and links to archived UI notes.
81
+
82
+### 04 Project
83
+
84
+- [`04-project/Refactoring-Plan.md`](04-project/Refactoring-Plan.md)
85
+  Checkable milestone plan for the database-led refactor from prototype architecture to SQLite archive v2 + Core Data cache.
86
+
87
+- [`04-project/IMPLEMENTATION_STATUS.md`](04-project/IMPLEMENTATION_STATUS.md)
88
+  Current implementation status and refactoring priorities.
89
+
90
+### 99 Archive
91
+
92
+Files in [`99-archive/`](99-archive/) are historical implementation notes. They are kept for context only and are not product requirements.
93
+
94
+## Removed / Replaced Objectives
95
+
96
+These are not current objectives:
97
+- HealthProbe CloudKit/iCloud sync;
98
+- cross-device record-by-record comparison;
99
+- count-drop-as-data-loss alerting;
100
+- notification-led monitoring;
101
+- community reporting or open-source publication commitments;
102
+- macOS companion as committed product scope;
103
+- in-app backup transplant, restore, or HealthKit re-publication;
104
+- SwiftData as target persistence foundation;
105
+- backward compatibility with prototype/test databases;
106
+- recurring complete snapshots for large HealthKit datasets.
84 107
 
85
-## Document Versions
108
+## Document Maintenance Rules
86 109
 
87
-- **v1.0** — 2026-05-01 — Initial comprehensive specification
88
-  - Concrete cases from DearApple
89
-  - Full technical architecture
90
-  - MVP feature scope + future roadmap
91
-- **v1.1** — 2026-05-17 — Objective extension (post-findings)
92
-  - New observed behavior: Apple-side consolidation/downsampling can rewrite historical samples (not just add/delete)
93
-  - HealthProbe scope extended: from “counter of records” → “forensic backup agent” (local-only archives for selected types)
94
-- **v1.2** — 2026-05-18 — Storage direction update
95
-  - Robust single local archive store becomes the source of truth
96
-  - SwiftData is limited to precomputed UI data, settings, logs, and history
97
-  - CloudKit/iCloud sync removed from product goals; reports and point exports replace routine complete export ambitions
110
+1. Add new substantive docs only under `HealthProbe/Doc/`.
111
+2. Update this index whenever a document is added, renamed, archived, or removed.
112
+3. If a document becomes stale but may still be useful, move it to `99-archive/` and add a warning header.
113
+4. Do not leave old copies in the repository root.
114
+5. Product scope changes must update product docs first, then implementation docs, then code.
98 115
 
99 116
 ---
100 117
 
101
-*HealthProbe: Guarding the integrity of your health data.*
118
+*HealthProbe: local HealthKit observation history, recovery-compatible archives, no in-app restore.*
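The storage direction above (differential observations, point-in-time reconstruction) can be illustrated with a small in-memory sketch. It makes three assumptions:

- each observation has an increasing integer ID;
- a sample's visibility is kept as a range of observation IDs, roughly as the `sample_visibility_ranges` table in `SQLiteHealthArchiveStore.swift` suggests;
- an open range (`lastObservationID == nil`) means the sample was still visible at the latest observation.

`VisibilityRange`, `VisibilityLog`, and their methods are hypothetical names used for illustration. They are not HealthProbe code.

```swift
// Hypothetical model: a sample (identified by fingerprint) was visible in every
// observation from firstObservationID through lastObservationID (nil = still open).
struct VisibilityRange: Equatable {
    let fingerprint: String
    let firstObservationID: Int
    var lastObservationID: Int?
}

struct VisibilityLog {
    private(set) var ranges: [VisibilityRange] = []
    private var latestObservationID: Int?

    // Differential capture: only appearances and disappearances touch `ranges`.
    // Samples that stay visible are not rewritten.
    mutating func record(observationID: Int, visible: Set<String>) {
        precondition(latestObservationID.map { observationID > $0 } ?? true,
                     "observation IDs must increase")
        var stillOpen = Set<String>()
        for index in ranges.indices where ranges[index].lastObservationID == nil {
            let fingerprint = ranges[index].fingerprint
            if visible.contains(fingerprint) {
                stillOpen.insert(fingerprint)
            } else {
                // Disappeared: close the range at the last observation that saw it.
                ranges[index].lastObservationID = latestObservationID
            }
        }
        // Appeared or reappeared: open a new range starting at this observation.
        for fingerprint in visible.subtracting(stillOpen).sorted() {
            ranges.append(VisibilityRange(fingerprint: fingerprint,
                                          firstObservationID: observationID,
                                          lastObservationID: nil))
        }
        latestObservationID = observationID
    }

    // Point-in-time reconstruction: fingerprints visible at a chosen observation.
    func visible(at observationID: Int) -> Set<String> {
        var result = Set<String>()
        for range in ranges where range.firstObservationID <= observationID {
            if let last = range.lastObservationID, last < observationID { continue }
            result.insert(range.fingerprint)
        }
        return result
    }
}
```

If a third observation only brings back one sample, the log adds one row for that reappearance. It does not write a third complete snapshot, and the differential design relies on exactly that property.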
+76 -0
HealthProbe/Services/Protocols/HealthArchiveStore.swift
@@ -19,13 +19,62 @@ struct HealthArchiveWriteSummary: Equatable, Sendable {
19 19
 struct HealthArchiveRecordRequest: Equatable, Sendable {
20 20
     let sampleTypeIdentifier: String?
21 21
     let fingerprints: Set<String>
22
+    let disappearedOnly: Bool
23
+    let firstSeenAfter: Date?
24
+    let firstSeenBefore: Date?
25
+    let afterCursor: RecordCursor?
22 26
     let limit: Int?
27
+
28
+    init(
29
+        sampleTypeIdentifier: String? = nil,
30
+        fingerprints: Set<String> = [],
31
+        disappearedOnly: Bool = false,
32
+        firstSeenAfter: Date? = nil,
33
+        firstSeenBefore: Date? = nil,
34
+        afterCursor: RecordCursor? = nil,
35
+        limit: Int? = nil
36
+    ) {
37
+        self.sampleTypeIdentifier = sampleTypeIdentifier
38
+        self.fingerprints = fingerprints
39
+        self.disappearedOnly = disappearedOnly
40
+        self.firstSeenAfter = firstSeenAfter
41
+        self.firstSeenBefore = firstSeenBefore
42
+        self.afterCursor = afterCursor
43
+        self.limit = limit
44
+    }
45
+}
46
+
47
+struct RecordCursor: Equatable, Sendable {
48
+    let startDate: Date
49
+    let strictFingerprint: String
23 50
 }
24 51
 
25 52
 struct HealthArchiveReportRequest: Equatable, Sendable {
26 53
     let reportID: UUID
27 54
     let title: String
28 55
     let includedFingerprints: Set<String>
56
+    let typeIdentifierFilter: String?
57
+    let disappearedOnly: Bool
58
+    let firstSeenAfter: Date?
59
+    let firstSeenBefore: Date?
60
+
61
+    init(
62
+        reportID: UUID,
63
+        title: String,
64
+        includedFingerprints: Set<String> = [],
65
+        typeIdentifierFilter: String? = nil,
66
+        disappearedOnly: Bool = false,
67
+        firstSeenAfter: Date? = nil,
68
+        firstSeenBefore: Date? = nil
69
+    ) {
70
+        self.reportID = reportID
71
+        self.title = title
72
+        self.includedFingerprints = includedFingerprints
73
+        self.typeIdentifierFilter = typeIdentifierFilter
74
+        self.disappearedOnly = disappearedOnly
75
+        self.firstSeenAfter = firstSeenAfter
76
+        self.firstSeenBefore = firstSeenBefore
77
+    }
29 78
 }
30 79
 
31 80
 struct ArchivedHealthRecord: Identifiable, Equatable, Sendable, Encodable {
@@ -40,4 +89,31 @@ struct ArchivedHealthRecord: Identifiable, Equatable, Sendable, Encodable {
40 89
     let lastSeenAt: Date?
41 90
     let lastVerifiedAt: Date?
42 91
     let disappearedAt: Date?
92
+
93
+    // Value fields
94
+    let valueKind: String?  // "quantity", "category", "workout", nil
95
+    let value: Double?
96
+    let unit: String?
97
+    let categoryValue: Int?
98
+    let workoutActivityType: Int?
99
+    let durationSeconds: Double?
100
+
101
+    // Source/device metadata
102
+    let sourceName: String?
103
+    let sourceBundleIdentifier: String?
104
+    let deviceName: String?
105
+
106
+    // Display helper
107
+    var displayValue: String? {
108
+        if let value, let unit {
109
+            return "\(String(format: "%.1f", value)) \(unit)"
110
+        }
111
+        if let categoryValue {
112
+            return "Category: \(categoryValue)"
113
+        }
114
+        if let durationSeconds {
115
+            return String(format: "%.1f min", durationSeconds / 60.0)
116
+        }
117
+        return nil
118
+    }
43 119
 }
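`RecordCursor` works together with the `(start_date > ? OR (start_date = ? AND strict_fingerprint > ?))` clause. `SQLiteHealthArchiveStore.records(for:)` adds that clause when `afterCursor` is set. Rows are ordered by `start_date, strict_fingerprint`, so the last row of one page is enough to resume, even when several rows share a `start_date` across a page boundary. This is keyset (seek) pagination.

The sketch below replays that seek predicate over an in-memory array. `Row`, `isAfter`, `page`, and `allPages` are illustrative names. The sketch assumes the `(start_date, strict_fingerprint)` pair identifies a row: if two rows shared both values, one of them could be skipped at a page boundary.

```swift
import Foundation

// Illustrative row holding the two ordering columns used by the archive query.
struct Row: Equatable {
    let startDate: Date
    let strictFingerprint: String
}

// Seek predicate equivalent to:
// start_date > ? OR (start_date = ? AND strict_fingerprint > ?)
func isAfter(_ row: Row, cursor: Row) -> Bool {
    row.startDate > cursor.startDate
        || (row.startDate == cursor.startDate && row.strictFingerprint > cursor.strictFingerprint)
}

// One page: filter by the cursor, order by (start_date, strict_fingerprint), then limit.
func page(_ rows: [Row], after cursor: Row?, limit: Int) -> [Row] {
    let sorted = rows.sorted {
        ($0.startDate, $0.strictFingerprint) < ($1.startDate, $1.strictFingerprint)
    }
    let remaining = cursor.map { c in sorted.filter { isAfter($0, cursor: c) } } ?? sorted
    return Array(remaining.prefix(max(limit, 0)))
}

// Pages through everything, feeding the last row of each page back as the cursor.
func allPages(_ rows: [Row], limit: Int) -> [[Row]] {
    var pages: [[Row]] = []
    var cursor: Row? = nil
    while true {
        let next = page(rows, after: cursor, limit: limit)
        if next.isEmpty { break }
        pages.append(next)
        cursor = next.last
    }
    return pages
}
```

Offset pagination can skip or repeat rows when rows are inserted between page requests. A seek cursor avoids that, and it lets SQLite use an index on `(start_date, strict_fingerprint)` instead of counting past skipped rows.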
+431 -6
HealthProbe/Services/SQLiteHealthArchiveStore.swift
@@ -6,12 +6,14 @@ private enum SQLiteHealthArchiveStoreError: Error {
6 6
     case openFailed(String)
7 7
     case prepareFailed(String)
8 8
     case stepFailed(String)
9
+    case incompatibleSchema(Int)
9 10
     case exportEncodingFailed
10 11
 }
11 12
 
12 13
 // Interface updated 2026-05-18 — see AGENTS.md
13 14
 actor SQLiteHealthArchiveStore: HealthArchiveStore {
14 15
     static let shared = SQLiteHealthArchiveStore()
16
+    nonisolated private static let archiveSchemaVersion = 2
15 17
 
16 18
     private let databaseURL: URL
17 19
     private var didPrepareSchema = false
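The `prepareSchema` hunk further down uses `archiveSchemaVersion` and `incompatibleSchema` to gate startup. It handles three situations:

- A newer on-disk version throws `incompatibleSchema`.
- An older version, or an unversioned file that already has tables, is treated as a disposable prototype and reset before the v2 schema is created.
- An empty file gets the v2 schema directly.

A pure-Swift sketch of that decision makes the four outcomes explicit and easy to unit-test. `SchemaAction` and `schemaAction` are hypothetical names, not code from this change.

```swift
// Hypothetical summary of what prepareSchema decides before creating v2 tables.
enum SchemaAction: Equatable {
    case createFresh          // empty database: create the v2 schema
    case resetThenCreate      // prototype/test data: drop everything, then create v2
    case alreadyCurrent       // already v2: no reset; CREATE ... IF NOT EXISTS is idempotent
    case refuse(found: Int)   // written by a newer schema: leave the file untouched
}

func schemaAction(existingVersion: Int?, hasUserTables: Bool, currentVersion: Int) -> SchemaAction {
    if let existingVersion, existingVersion > currentVersion {
        return .refuse(found: existingVersion)
    }
    if existingVersion == currentVersion {
        return .alreadyCurrent
    }
    // Older versioned archive, or unversioned file that already has tables.
    if existingVersion != nil || hasUserTables {
        return .resetThenCreate
    }
    return .createFresh
}
```

The source comment says that once real archives exist, future archives must use explicit migrations instead of a destructive reset. The `resetThenCreate` branch is the one that would have to change.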
@@ -93,11 +95,25 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
93 95
         if !request.fingerprints.isEmpty {
94 96
             clauses.append("strict_fingerprint IN (\(Array(repeating: "?", count: request.fingerprints.count).joined(separator: ",")))")
95 97
         }
98
+        if request.disappearedOnly {
99
+            clauses.append("disappeared_at IS NOT NULL")
100
+        }
101
+        if request.firstSeenAfter != nil {
102
+            clauses.append("first_seen_at >= ?")
103
+        }
104
+        if request.firstSeenBefore != nil {
105
+            clauses.append("first_seen_at <= ?")
106
+        }
107
+        if request.afterCursor != nil {
108
+            clauses.append("(start_date > ? OR (start_date = ? AND strict_fingerprint > ?))")
109
+        }
96 110
         let whereClause = clauses.isEmpty ? "" : "WHERE \(clauses.joined(separator: " AND "))"
97 111
         let limitClause = request.limit.map { "LIMIT \(max($0, 0))" } ?? ""
98 112
         let sql = """
99 113
         SELECT sample_uuid_hash, type_identifier, strict_fingerprint, semantic_fingerprint,
100
-               start_date, end_date, first_seen_at, last_seen_at, last_verified_at, disappeared_at
114
+               start_date, end_date, first_seen_at, last_seen_at, last_verified_at, disappeared_at,
115
+               value_kind, value, unit, category_value, workout_activity_type, duration_seconds,
116
+               source_name, source_bundle_identifier, device_name
101 117
         FROM archive_samples
102 118
         \(whereClause)
103 119
         ORDER BY start_date ASC, strict_fingerprint ASC
@@ -114,6 +130,22 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
114 130
                 bindText(fingerprint, to: index, in: statement)
115 131
                 index += 1
116 132
             }
133
+            if let firstSeenAfter = request.firstSeenAfter {
134
+                sqlite3_bind_double(statement, index, firstSeenAfter.timeIntervalSinceReferenceDate)
135
+                index += 1
136
+            }
137
+            if let firstSeenBefore = request.firstSeenBefore {
138
+                sqlite3_bind_double(statement, index, firstSeenBefore.timeIntervalSinceReferenceDate)
139
+                index += 1
140
+            }
141
+            if let cursor = request.afterCursor {
142
+                sqlite3_bind_double(statement, index, cursor.startDate.timeIntervalSinceReferenceDate)
143
+                index += 1
144
+                sqlite3_bind_double(statement, index, cursor.startDate.timeIntervalSinceReferenceDate)
145
+                index += 1
146
+                bindText(cursor.strictFingerprint, to: index, in: statement)
147
+                index += 1
148
+            }
117 149
 
118 150
             var records: [ArchivedHealthRecord] = []
119 151
             while sqlite3_step(statement) == SQLITE_ROW {
@@ -128,7 +160,16 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
128 160
                     firstSeenAt: columnDate(statement, 6) ?? Date(timeIntervalSinceReferenceDate: 0),
129 161
                     lastSeenAt: columnDate(statement, 7),
130 162
                     lastVerifiedAt: columnDate(statement, 8),
131
-                    disappearedAt: columnDate(statement, 9)
163
+                    disappearedAt: columnDate(statement, 9),
164
+                    valueKind: columnText(statement, 10),
165
+                    value: columnDouble(statement, 11),
166
+                    unit: columnText(statement, 12),
167
+                    categoryValue: columnInt(statement, 13),
168
+                    workoutActivityType: columnInt(statement, 14),
169
+                    durationSeconds: columnDouble(statement, 15),
170
+                    sourceName: columnText(statement, 16),
171
+                    sourceBundleIdentifier: columnText(statement, 17),
172
+                    deviceName: columnText(statement, 18)
132 173
                 ))
133 174
             }
134 175
             return records
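`records(for:)` assembles the WHERE clauses in one pass and binds their values in a second pass. The two passes must therefore stay in the same order: fingerprints, `firstSeenAfter`, `firstSeenBefore`, then the three cursor values. These are presumably preceded by a type-identifier binding, which falls outside the shown hunks.

One alternative that cannot drift is to record each placeholder's value at the moment its clause is added. The sketch below does that, with these assumptions:

- `SQLValue`, `RecordFilter`, and `buildFilter` are hypothetical names.
- The `type_identifier = ?` clause is assumed, because the real one is not in this diff.
- Dates are stored as `timeIntervalSinceReferenceDate`, as in the binding code above.

```swift
import Foundation

// Hypothetical bind value; a real store would map each case to a sqlite3_bind_* call.
enum SQLValue: Equatable {
    case text(String)
    case double(Double)
}

// Minimal stand-in for the request fields used by the filter.
struct RecordFilter {
    var sampleTypeIdentifier: String? = nil
    var fingerprints: Set<String> = []
    var disappearedOnly = false
    var firstSeenAfter: Date? = nil
    var firstSeenBefore: Date? = nil
    var cursor: (startDate: Date, strictFingerprint: String)? = nil
}

// Each clause is appended together with the values for its placeholders,
// so clause order and bind order cannot diverge.
func buildFilter(_ filter: RecordFilter) -> (whereClause: String, bindings: [SQLValue]) {
    var clauses: [String] = []
    var bindings: [SQLValue] = []
    if let type = filter.sampleTypeIdentifier {
        clauses.append("type_identifier = ?")  // assumed clause text
        bindings.append(.text(type))
    }
    if !filter.fingerprints.isEmpty {
        let sorted = filter.fingerprints.sorted()  // deterministic order for tests
        clauses.append("strict_fingerprint IN (\(Array(repeating: "?", count: sorted.count).joined(separator: ",")))")
        bindings.append(contentsOf: sorted.map { SQLValue.text($0) })
    }
    if filter.disappearedOnly {
        clauses.append("disappeared_at IS NOT NULL")  // no placeholder, no binding
    }
    if let after = filter.firstSeenAfter {
        clauses.append("first_seen_at >= ?")
        bindings.append(.double(after.timeIntervalSinceReferenceDate))
    }
    if let before = filter.firstSeenBefore {
        clauses.append("first_seen_at <= ?")
        bindings.append(.double(before.timeIntervalSinceReferenceDate))
    }
    if let cursor = filter.cursor {
        clauses.append("(start_date > ? OR (start_date = ? AND strict_fingerprint > ?))")
        let start = cursor.startDate.timeIntervalSinceReferenceDate
        bindings.append(contentsOf: [SQLValue.double(start), .double(start), .text(cursor.strictFingerprint)])
    }
    let whereClause = clauses.isEmpty ? "" : "WHERE \(clauses.joined(separator: " AND "))"
    return (whereClause, bindings)
}
```

Binding then becomes a single loop over `bindings` with a running index. Each `.text` value goes to `bindText` and each `.double` value goes to `sqlite3_bind_double`.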
@@ -136,11 +177,15 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
136 177
     }
137 178
 
138 179
     func exportReport(_ request: HealthArchiveReportRequest) async throws -> URL {
139
-        let records = try await records(for: HealthArchiveRecordRequest(
140
-            sampleTypeIdentifier: nil,
180
+        let recordRequest = HealthArchiveRecordRequest(
181
+            sampleTypeIdentifier: request.typeIdentifierFilter,
141 182
             fingerprints: request.includedFingerprints,
183
+            disappearedOnly: request.disappearedOnly,
184
+            firstSeenAfter: request.firstSeenAfter,
185
+            firstSeenBefore: request.firstSeenBefore,
142 186
             limit: nil
143
-        ))
187
+        )
188
+        let records = try await records(for: recordRequest)
144 189
         let payload = HealthArchiveReportPayload(
145 190
             reportID: request.reportID,
146 191
             title: request.title,
@@ -173,6 +218,333 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
173 218
         guard !didPrepareSchema else { return }
174 219
         try execute("PRAGMA journal_mode = WAL", db: db)
175 220
         try execute("PRAGMA foreign_keys = ON", db: db)
221
+
222
+        let existingVersion = try archiveSchemaVersionIfPresent(db)
223
+        if let existingVersion, existingVersion > Self.archiveSchemaVersion {
224
+            throw SQLiteHealthArchiveStoreError.incompatibleSchema(existingVersion)
225
+        }
226
+        if existingVersion != Self.archiveSchemaVersion {
227
+            let needsReset = existingVersion != nil ? true : try hasUserTables(db)
228
+            if needsReset {
229
+                try resetPrototypeSchema(db)
230
+            }
231
+        }
232
+
233
+        try createArchiveV2Schema(db)
234
+        try seedArchiveMetadata(db)
235
+        didPrepareSchema = true
236
+    }
237
+
238
+    private func archiveSchemaVersionIfPresent(_ db: OpaquePointer?) throws -> Int? {
239
+        guard try tableExists("archive_metadata", db: db) else { return nil }
240
+        let sql = "SELECT value FROM archive_metadata WHERE key = 'schema_version' LIMIT 1"
241
+        return try withStatement(sql, db: db) { statement in
242
+            guard sqlite3_step(statement) == SQLITE_ROW,
243
+                  let value = columnText(statement, 0) else {
244
+                return nil
245
+            }
246
+            return Int(value)
247
+        }
248
+    }
249
+
250
+    private func hasUserTables(_ db: OpaquePointer?) throws -> Bool {
251
+        let sql = """
252
+        SELECT name
253
+        FROM sqlite_master
254
+        WHERE type = 'table' AND name NOT LIKE 'sqlite_%'
255
+        LIMIT 1
256
+        """
257
+        return try withStatement(sql, db: db) { statement in
258
+            sqlite3_step(statement) == SQLITE_ROW
259
+        }
260
+    }
261
+
262
+    private func tableExists(_ tableName: String, db: OpaquePointer?) throws -> Bool {
263
+        let sql = """
264
+        SELECT 1
265
+        FROM sqlite_master
266
+        WHERE type = 'table' AND name = ?
267
+        LIMIT 1
268
+        """
269
+        return try withStatement(sql, db: db) { statement in
270
+            bindText(tableName, to: 1, in: statement)
271
+            return sqlite3_step(statement) == SQLITE_ROW
272
+        }
273
+    }
274
+
275
+    private func resetPrototypeSchema(_ db: OpaquePointer?) throws {
276
+        // Prototype/test installs are disposable for archive v2. Future real archives
277
+        // must use explicit migrations instead of destructive reset.
278
+        try execute("PRAGMA foreign_keys = OFF", db: db)
279
+        for objectName in try schemaObjectNames(types: ["view", "trigger"], db: db) {
280
+            try execute("DROP \(objectName.kind.uppercased()) IF EXISTS \(quotedIdentifier(objectName.name))", db: db)
281
+        }
282
+        for tableName in try schemaObjectNames(types: ["table"], db: db) {
283
+            try execute("DROP TABLE IF EXISTS \(quotedIdentifier(tableName.name))", db: db)
284
+        }
285
+        try execute("PRAGMA foreign_keys = ON", db: db)
286
+    }
287
+
288
+    private func schemaObjectNames(types: [String], db: OpaquePointer?) throws -> [(kind: String, name: String)] {
289
+        let typeList = types.map { "'\($0)'" }.joined(separator: ",")
290
+        let sql = """
291
+        SELECT type, name
292
+        FROM sqlite_master
293
+        WHERE type IN (\(typeList)) AND name NOT LIKE 'sqlite_%'
294
+        ORDER BY type, name
295
+        """
296
+        return try withStatement(sql, db: db) { statement in
297
+            var names: [(kind: String, name: String)] = []
298
+            while sqlite3_step(statement) == SQLITE_ROW {
299
+                guard let kind = columnText(statement, 0),
300
+                      let name = columnText(statement, 1) else {
301
+                    continue
302
+                }
303
+                names.append((kind, name))
304
+            }
305
+            return names
306
+        }
307
+    }
308
+
309
+    private func createArchiveV2Schema(_ db: OpaquePointer?) throws {
310
+        try execute("""
311
+        CREATE TABLE IF NOT EXISTS schema_migrations (
312
+            version INTEGER PRIMARY KEY,
313
+            applied_at REAL NOT NULL,
314
+            description TEXT NOT NULL
315
+        )
316
+        """, db: db)
317
+        try execute("""
318
+        CREATE TABLE IF NOT EXISTS archive_metadata (
319
+            key TEXT PRIMARY KEY,
320
+            value TEXT NOT NULL
321
+        )
322
+        """, db: db)
323
+        try execute("""
324
+        CREATE TABLE IF NOT EXISTS device_chains (
325
+            id INTEGER PRIMARY KEY,
326
+            device_chain_hash TEXT NOT NULL UNIQUE,
327
+            created_at REAL NOT NULL,
328
+            recovered_from_keychain INTEGER NOT NULL DEFAULT 0
329
+        )
330
+        """, db: db)
331
+        try execute("""
332
+        CREATE TABLE IF NOT EXISTS observations (
333
+            id INTEGER PRIMARY KEY,
334
+            device_chain_id INTEGER NOT NULL REFERENCES device_chains(id),
335
+            observed_at REAL NOT NULL,
336
+            started_at REAL,
337
+            ended_at REAL,
338
+            status TEXT NOT NULL,
339
+            trigger_reason TEXT NOT NULL,
340
+            app_version TEXT,
341
+            os_version TEXT,
342
+            time_zone_identifier TEXT,
343
+            time_zone_seconds_from_gmt INTEGER,
344
+            schema_version INTEGER NOT NULL,
345
+            selected_type_set_hash TEXT,
346
+            notes TEXT
347
+        )
348
+        """, db: db)
349
+        try execute("CREATE INDEX IF NOT EXISTS idx_observations_device_time ON observations(device_chain_id, observed_at)", db: db)
350
+        try execute("""
351
+        CREATE TABLE IF NOT EXISTS sample_types (
352
+            id INTEGER PRIMARY KEY,
353
+            type_identifier TEXT NOT NULL UNIQUE,
354
+            display_name TEXT,
355
+            category TEXT
356
+        )
357
+        """, db: db)
358
+        try execute("""
359
+        CREATE TABLE IF NOT EXISTS observation_type_runs (
360
+            id INTEGER PRIMARY KEY,
361
+            observation_id INTEGER NOT NULL REFERENCES observations(id),
362
+            sample_type_id INTEGER NOT NULL REFERENCES sample_types(id),
363
+            status TEXT NOT NULL,
364
+            started_at REAL,
365
+            ended_at REAL,
366
+            anchor_before BLOB,
367
+            anchor_after BLOB,
368
+            inserted_event_count INTEGER NOT NULL DEFAULT 0,
369
+            deleted_event_count INTEGER NOT NULL DEFAULT 0,
370
+            verified_visible_count INTEGER,
371
+            error_kind TEXT,
372
+            error_message_hash TEXT,
373
+            UNIQUE(observation_id, sample_type_id)
374
+        )
375
+        """, db: db)
376
+        try execute("CREATE INDEX IF NOT EXISTS idx_type_runs_type_observation ON observation_type_runs(sample_type_id, observation_id)", db: db)
377
+        try execute("""
378
+        CREATE TABLE IF NOT EXISTS sources (
379
+            id INTEGER PRIMARY KEY,
380
+            source_name_hash TEXT,
381
+            bundle_identifier TEXT
382
+        )
383
+        """, db: db)
384
+        try execute("""
385
+        CREATE TABLE IF NOT EXISTS source_revisions (
386
+            id INTEGER PRIMARY KEY,
387
+            source_id INTEGER NOT NULL REFERENCES sources(id),
388
+            product_type TEXT,
389
+            version TEXT,
390
+            operating_system_version TEXT,
391
+            UNIQUE(source_id, product_type, version, operating_system_version)
392
+        )
393
+        """, db: db)
394
+        try execute("""
395
+        CREATE TABLE IF NOT EXISTS hk_devices (
396
+            id INTEGER PRIMARY KEY,
397
+            device_hash TEXT,
398
+            manufacturer_hash TEXT,
399
+            model TEXT,
400
+            hardware_version TEXT,
401
+            firmware_version TEXT,
402
+            software_version TEXT,
403
+            local_identifier_hash TEXT,
404
+            udi_hash TEXT
405
+        )
406
+        """, db: db)
407
+        try execute("""
408
+        CREATE TABLE IF NOT EXISTS metadata_blobs (
409
+            id INTEGER PRIMARY KEY,
410
+            metadata_hash TEXT NOT NULL UNIQUE,
411
+            metadata_json TEXT NOT NULL
412
+        )
413
+        """, db: db)
414
+        try execute("""
415
+        CREATE TABLE IF NOT EXISTS samples (
416
+            id INTEGER PRIMARY KEY,
417
+            sample_type_id INTEGER NOT NULL REFERENCES sample_types(id),
418
+            sample_uuid_hash TEXT,
419
+            strict_fingerprint TEXT NOT NULL,
420
+            semantic_fingerprint TEXT,
421
+            fuzzy_key TEXT,
422
+            first_seen_observation_id INTEGER NOT NULL REFERENCES observations(id),
423
+            first_seen_at REAL NOT NULL,
424
+            UNIQUE(sample_type_id, strict_fingerprint)
425
+        )
426
+        """, db: db)
427
+        try execute("CREATE INDEX IF NOT EXISTS idx_samples_uuid_hash ON samples(sample_uuid_hash)", db: db)
428
+        try execute("CREATE INDEX IF NOT EXISTS idx_samples_type_semantic ON samples(sample_type_id, semantic_fingerprint)", db: db)
429
+        try execute("""
430
+        CREATE TABLE IF NOT EXISTS sample_versions (
431
+            id INTEGER PRIMARY KEY,
432
+            sample_id INTEGER NOT NULL REFERENCES samples(id),
433
+            payload_hash TEXT NOT NULL,
434
+            start_date REAL NOT NULL,
435
+            end_date REAL NOT NULL,
436
+            value_kind TEXT,
437
+            numeric_value REAL,
438
+            unit TEXT,
439
+            category_value INTEGER,
440
+            workout_activity_type INTEGER,
441
+            duration_seconds REAL,
442
+            source_revision_id INTEGER REFERENCES source_revisions(id),
443
+            hk_device_id INTEGER REFERENCES hk_devices(id),
444
+            metadata_id INTEGER REFERENCES metadata_blobs(id),
445
+            created_observation_id INTEGER NOT NULL REFERENCES observations(id),
446
+            UNIQUE(sample_id, payload_hash)
447
+        )
448
+        """, db: db)
449
+        try execute("CREATE INDEX IF NOT EXISTS idx_sample_versions_sample ON sample_versions(sample_id)", db: db)
450
+        try execute("CREATE INDEX IF NOT EXISTS idx_sample_versions_time ON sample_versions(start_date, end_date)", db: db)
451
+        try execute("""
452
+        CREATE TABLE IF NOT EXISTS sample_observation_events (
453
+            id INTEGER PRIMARY KEY,
454
+            observation_id INTEGER NOT NULL REFERENCES observations(id),
455
+            sample_id INTEGER NOT NULL REFERENCES samples(id),
456
+            version_id INTEGER REFERENCES sample_versions(id),
457
+            event_kind TEXT NOT NULL,
458
+            observed_at REAL NOT NULL,
459
+            evidence_kind TEXT,
460
+            UNIQUE(observation_id, sample_id, event_kind)
+        )
+        """, db: db)
+        try execute("CREATE INDEX IF NOT EXISTS idx_events_observation_kind ON sample_observation_events(observation_id, event_kind)", db: db)
+        try execute("CREATE INDEX IF NOT EXISTS idx_events_sample ON sample_observation_events(sample_id, observation_id)", db: db)
+        try execute("""
+        CREATE TABLE IF NOT EXISTS sample_visibility_ranges (
+            sample_id INTEGER NOT NULL REFERENCES samples(id),
+            version_id INTEGER REFERENCES sample_versions(id),
+            first_observation_id INTEGER NOT NULL REFERENCES observations(id),
+            last_observation_id INTEGER REFERENCES observations(id),
+            first_seen_at REAL NOT NULL,
+            last_seen_at REAL,
+            PRIMARY KEY (sample_id, version_id, first_observation_id)
+        )
+        """, db: db)
+        try execute("CREATE INDEX IF NOT EXISTS idx_visibility_open_ranges ON sample_visibility_ranges(last_observation_id)", db: db)
+        try execute("CREATE INDEX IF NOT EXISTS idx_visibility_point_lookup ON sample_visibility_ranges(first_observation_id, last_observation_id)", db: db)
+        try execute("""
+        CREATE TABLE IF NOT EXISTS sample_relationships (
+            id INTEGER PRIMARY KEY,
+            observation_id INTEGER REFERENCES observations(id),
+            source_sample_id INTEGER NOT NULL REFERENCES samples(id),
+            target_sample_id INTEGER NOT NULL REFERENCES samples(id),
+            relationship_kind TEXT NOT NULL,
+            metadata_id INTEGER REFERENCES metadata_blobs(id),
+            UNIQUE(observation_id, source_sample_id, target_sample_id, relationship_kind)
+        )
+        """, db: db)
+        try execute("CREATE INDEX IF NOT EXISTS idx_relationship_source ON sample_relationships(source_sample_id, relationship_kind)", db: db)
+        try execute("CREATE INDEX IF NOT EXISTS idx_relationship_target ON sample_relationships(target_sample_id, relationship_kind)", db: db)
+        try execute("""
+        CREATE TABLE IF NOT EXISTS observation_type_summaries (
+            observation_id INTEGER NOT NULL REFERENCES observations(id),
+            sample_type_id INTEGER NOT NULL REFERENCES sample_types(id),
+            visible_record_count INTEGER NOT NULL,
+            appeared_count INTEGER NOT NULL DEFAULT 0,
+            disappeared_count INTEGER NOT NULL DEFAULT 0,
+            representation_changed_count INTEGER NOT NULL DEFAULT 0,
+            earliest_start_date REAL,
+            latest_end_date REAL,
+            value_sum REAL,
+            value_max REAL,
+            aggregate_hash TEXT,
+            PRIMARY KEY (observation_id, sample_type_id)
+        )
+        """, db: db)
+        try execute("""
+        CREATE TABLE IF NOT EXISTS daily_type_aggregates (
+            observation_id INTEGER NOT NULL REFERENCES observations(id),
+            sample_type_id INTEGER NOT NULL REFERENCES sample_types(id),
+            bucket_start REAL NOT NULL,
+            bucket_end REAL NOT NULL,
+            visible_record_count INTEGER NOT NULL,
+            value_sum REAL,
+            value_max REAL,
+            source_revision_id INTEGER,
+            aggregate_hash TEXT,
+            PRIMARY KEY (observation_id, sample_type_id, bucket_start, source_revision_id)
+        )
+        """, db: db)
+        try execute("CREATE INDEX IF NOT EXISTS idx_daily_type_bucket ON daily_type_aggregates(sample_type_id, bucket_start)", db: db)
+        try execute("""
+        CREATE TABLE IF NOT EXISTS export_manifests (
+            id INTEGER PRIMARY KEY,
+            export_id TEXT NOT NULL UNIQUE,
+            created_at REAL NOT NULL,
+            export_kind TEXT NOT NULL,
+            from_observation_id INTEGER REFERENCES observations(id),
+            to_observation_id INTEGER REFERENCES observations(id),
+            filter_json TEXT,
+            manifest_hash TEXT NOT NULL,
+            record_count INTEGER NOT NULL
+        )
+        """, db: db)
+        try execute("""
+        CREATE TABLE IF NOT EXISTS export_items (
+            export_manifest_id INTEGER NOT NULL REFERENCES export_manifests(id),
+            sample_id INTEGER NOT NULL REFERENCES samples(id),
+            version_id INTEGER REFERENCES sample_versions(id),
+            item_hash TEXT NOT NULL,
+            PRIMARY KEY (export_manifest_id, sample_id, version_id)
+        )
+        """, db: db)
+        try createLegacyArchiveSamplesTable(db)
+    }
+
+    private func createLegacyArchiveSamplesTable(_ db: OpaquePointer?) throws {
         try execute("""
         CREATE TABLE IF NOT EXISTS archive_samples (
             sample_uuid_hash TEXT PRIMARY KEY NOT NULL,
@@ -211,7 +583,46 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
         """, db: db)
         try execute("CREATE INDEX IF NOT EXISTS idx_archive_samples_type_date ON archive_samples(type_identifier, start_date)", db: db)
         try execute("CREATE INDEX IF NOT EXISTS idx_archive_samples_strict_fingerprint ON archive_samples(strict_fingerprint)", db: db)
-        didPrepareSchema = true
+    }
+
+    private func seedArchiveMetadata(_ db: OpaquePointer?) throws {
+        try upsertMetadata(key: "schema_version", value: "\(Self.archiveSchemaVersion)", db: db)
+        try insertMetadataIfMissing(key: "created_at_unix", value: "\(Date().timeIntervalSince1970)", db: db)
+        try upsertMetadata(key: "timestamp_convention", value: "unix_seconds_utc_real", db: db)
+        try upsertMetadata(key: "identifier_hash_algorithm", value: "hmac-sha256-local-secret", db: db)
+        try upsertMetadata(key: "content_hash_algorithm", value: "sha256", db: db)
+        try upsertMetadata(key: "prototype_reset_policy", value: "reset_or_reinitialize_test_installs", db: db)
+        try withStatement(
+            "INSERT OR IGNORE INTO schema_migrations (version, applied_at, description) VALUES (?, ?, ?)",
+            db: db
+        ) { statement in
+            sqlite3_bind_int64(statement, 1, sqlite3_int64(Self.archiveSchemaVersion))
+            sqlite3_bind_double(statement, 2, Date().timeIntervalSince1970)
+            bindText("Initialize archive v2 schema", to: 3, in: statement)
+            guard sqlite3_step(statement) == SQLITE_DONE else {
+                throw SQLiteHealthArchiveStoreError.stepFailed(lastErrorMessage(db))
+            }
+        }
+    }
+
+    private func upsertMetadata(key: String, value: String, db: OpaquePointer?) throws {
+        try withStatement("INSERT OR REPLACE INTO archive_metadata (key, value) VALUES (?, ?)", db: db) { statement in
+            bindText(key, to: 1, in: statement)
+            bindText(value, to: 2, in: statement)
+            guard sqlite3_step(statement) == SQLITE_DONE else {
+                throw SQLiteHealthArchiveStoreError.stepFailed(lastErrorMessage(db))
+            }
+        }
+    }
+
+    private func insertMetadataIfMissing(key: String, value: String, db: OpaquePointer?) throws {
+        try withStatement("INSERT OR IGNORE INTO archive_metadata (key, value) VALUES (?, ?)", db: db) { statement in
+            bindText(key, to: 1, in: statement)
+            bindText(value, to: 2, in: statement)
+            guard sqlite3_step(statement) == SQLITE_DONE else {
+                throw SQLiteHealthArchiveStoreError.stepFailed(lastErrorMessage(db))
+            }
+        }
     }
 
     private func upsertSamples(_ samples: [HKSample], observedAt: Date, db: OpaquePointer?) throws -> HealthArchiveWriteSummary {
@@ -532,6 +943,20 @@ nonisolated private func columnDate(_ statement: OpaquePointer?, _ index: Int32)
     return Date(timeIntervalSinceReferenceDate: sqlite3_column_double(statement, index))
 }
 
+nonisolated private func columnDouble(_ statement: OpaquePointer?, _ index: Int32) -> Double? {
+    guard sqlite3_column_type(statement, index) != SQLITE_NULL else { return nil }
+    return sqlite3_column_double(statement, index)
+}
+
+nonisolated private func columnInt(_ statement: OpaquePointer?, _ index: Int32) -> Int? {
+    guard sqlite3_column_type(statement, index) != SQLITE_NULL else { return nil }
+    return Int(sqlite3_column_int64(statement, index))
+}
+
+nonisolated private func quotedIdentifier(_ value: String) -> String {
+    "\"\(value.replacingOccurrences(of: "\"", with: "\"\""))\""
+}
+
 nonisolated private func lastErrorMessage(_ db: OpaquePointer?) -> String {
     guard let message = sqlite3_errmsg(db) else { return "unknown SQLite error" }
     return String(cString: message)
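The new `sample_visibility_ranges` table stores visibility as ranges of observation IDs instead of one row per sample per observation. The plain-Swift sketch below does not use SQLite. It models the point lookup that `idx_visibility_point_lookup` appears to be designed for. The sketch rests on three assumptions: observation IDs increase monotonically, both ends of a range are inclusive, and a NULL `last_observation_id` (here `nil`) means the sample is still visible. `VisibilityRange`, `VisibilityIndex`, and their methods are hypothetical names and are not part of `SQLiteHealthArchiveStore`.

```swift
/// Hypothetical in-memory mirror of one `sample_visibility_ranges` row.
struct VisibilityRange {
    let sampleID: Int
    let versionID: Int?
    let firstObservationID: Int
    var lastObservationID: Int?   // nil plays the role of SQL NULL: still visible

    /// Assumes both ends are inclusive and observation IDs grow monotonically.
    func isVisible(at observationID: Int) -> Bool {
        guard firstObservationID <= observationID else { return false }
        guard let last = lastObservationID else { return true }
        return observationID <= last
    }
}

/// Hypothetical helper that opens and closes ranges as observations arrive.
struct VisibilityIndex {
    var ranges: [VisibilityRange] = []

    /// Opens a range the first time a sample (version) is seen; no-op if one is already open.
    mutating func markSeen(sampleID: Int, versionID: Int?, at observationID: Int) {
        let alreadyOpen = ranges.contains {
            $0.sampleID == sampleID && $0.versionID == versionID && $0.lastObservationID == nil
        }
        guard !alreadyOpen else { return }
        ranges.append(VisibilityRange(sampleID: sampleID, versionID: versionID,
                                      firstObservationID: observationID,
                                      lastObservationID: nil))
    }

    /// Closes every open range for the sample at its last visible observation.
    mutating func markDisappeared(sampleID: Int, lastVisibleObservationID: Int) {
        for i in ranges.indices
        where ranges[i].sampleID == sampleID && ranges[i].lastObservationID == nil {
            ranges[i].lastObservationID = lastVisibleObservationID
        }
    }

    /// Same predicate as: first_observation_id <= ?1
    ///   AND (last_observation_id IS NULL OR last_observation_id >= ?1)
    func visibleSampleIDs(at observationID: Int) -> Set<Int> {
        Set(ranges.filter { $0.isVisible(at: observationID) }.map(\.sampleID))
    }
}

var index = VisibilityIndex()
index.markSeen(sampleID: 1, versionID: nil, at: 10)
index.markSeen(sampleID: 2, versionID: nil, at: 10)
index.markDisappeared(sampleID: 2, lastVisibleObservationID: 11)
index.markSeen(sampleID: 3, versionID: nil, at: 12)
print(index.visibleSampleIDs(at: 11).sorted()) // [1, 2]
print(index.visibleSampleIDs(at: 12).sorted()) // [1, 3]
```

The schema above has one SQLite detail worth checking. `version_id` is nullable but part of the composite PRIMARY KEY, and SQLite treats NULLs as distinct in uniqueness checks. As a result, the key alone will not stop duplicate rows for the same unversioned sample and first observation. The same applies to `source_revision_id` in `daily_type_aggregates`, `version_id` in `export_items`, and the nullable `observation_id` in the `sample_relationships` UNIQUE constraint.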
+0 -238
IMPLEMENTATION_STATUS.md
@@ -1,239 +0,0 @@
-# HealthProbe Implementation Status
-
-## Overview
-
-HealthProbe's comprehensive snapshot + delta system has been implemented according to the detailed plan. The project builds successfully with no compilation errors.
-
-## Completed Components (100%)
-
-### Models (Steps 1-3)
-✅ **SnapshotQuality.swift** — All quality states (complete, partial, unauthorized, loading, failed)
-✅ **AnomalyType.swift** — All anomaly types + Severity + TypeTransition + TypeDeltaReason enums
-✅ **HealthSnapshot.swift** — Chain metadata, quality, trigger context, registry fingerprinting, timezone context
-✅ **TypeCount.swift** — Count with hash, date range, quality, yearly counts with cascade relationship
-✅ **SnapshotDelta.swift** — Local delta with checksums and cascade relationship to TypeDeltas
-✅ **TypeDelta.swift** — Per-type local delta with transition, reason, quality before/after, yearly count note
-✅ **AnomalyRecord.swift** — Anomaly record with deltaID set structurally by detector, never by caller
-✅ **OperationLog.swift** — Audit trail for destructive operations with JSON-encoded affected snapshot IDs
-✅ **YearlyCount.swift** — Per-year sample counts with approximation flag
-
-### Services (Steps 4-12)
-
-#### Step 5: HashService ✅
-- `typeHash()` — SHA256 of typeIdentifier|count|earliest|latest (ISO8601 with fractional seconds)
-- `snapshotChecksum()` — Filters on quality==.complete (not hash!=""), concatenates type hashes
-- `typeSetHash()` — SHA256 of sorted active typeIdentifiers (covers full intended registry)
-
-#### Step 11 & 11b: HealthKitService & ObserverService ✅
-- Per-type fetch with **15-second combined timeout** (distribution + earliestDate + latestDate)
-- Concurrency capped at 6 simultaneous type fetches (prevents HealthKit resource exhaustion)
-- Per-type quality detection (unauthorized, failed, complete)
-- Real earliestDate/latestDate from separate HKSampleQuery (NOT from bin boundaries)
-- YearlyCount population from distribution bins with isApproximate flag
-- Snapshot quality aggregation (loading > unauthorized > partial > complete)
-- Chain metadata set before save (previousSnapshotID, isChainStart, monitoredTypeSetHash)
-- Auto-detect post-restore (full deny → complete transition, or chain start > 1000 records)
-- **Post-save pipeline**: DeltaService → AnomalyDetector → OperationLog
-- **ObserverService**: debounce (10 min), manual overlap suppression, all-monitored-types snapshot
-- **Background delivery**: .immediate for heart rate/steps, .daily for others
-
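The ObserverService bullet above describes a 10-minute debounce with manual-overlap suppression, and checklist items 13 and 22 test it. The sketch below models one way this could behave. It assumes the window opens on the first callback of a burst and is checked when a timer fires. `ObserverDebouncer` and its methods are hypothetical, and times are plain seconds rather than `Date` values.

```swift
/// Hypothetical model of the observer debounce; times are seconds from an arbitrary origin.
struct ObserverDebouncer {
    var window: Double = 600          // 10 minutes
    var windowStart: Double? = nil    // set by the first callback of a burst
    var lastManualSnapshot: Double? = nil

    /// Coalesces callbacks: only the first one in a burst opens the window.
    mutating func observerCallback(at time: Double) {
        if windowStart == nil { windowStart = time }
    }

    mutating func manualSnapshot(at time: Double) {
        lastManualSnapshot = time
    }

    /// Returns true when an observer-triggered snapshot should be taken: the window
    /// has elapsed and no manual snapshot happened inside it. An elapsed window is
    /// cleared either way, so each burst is handled exactly once.
    mutating func timerFired(at time: Double) -> Bool {
        guard let start = windowStart, time >= start + window else { return false }
        windowStart = nil
        if let manual = lastManualSnapshot, manual >= start { return false }
        return true
    }
}
```

The manual-overlap check compares against the window start, not the callback times. In this model, a manual snapshot taken anywhere inside the burst therefore cancels the observer snapshot.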
-#### Step 7: DeltaService ✅
-- Computes and saves SnapshotDelta with TypeDeltas
-- **Reason assignment with priority**: authorizationChanged > unsupported > registryChanged > unknown > normal
-- **Unavailable count guard**: if either quality != .complete, countDelta = 0 (never from -1)
-- **YearlyCount timezone guard**: if timezone changes, set countDelta = 0 and yearlyCountNote
-- **Delta merge** (for intermediate deletion):
-  - Recomputes checksums from surrounding snapshots (never carries old checksums)
-  - Handles disappeared→appeared transition (remove from merged delta if type existed only in deleted snapshot)
-  - Applies unavailable count guard and reason priority to merged result
-  - Sets timezone note if either source had it
-
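Two of the DeltaService rules above, reason priority and the unavailable-count guard, are small enough to show directly. The sketch assumes the reasons are modeled as an `Int`-backed enum ordered by the stated priority, and that per-type quality uses the same cases as `SnapshotQuality`. The enum layouts and function names here are illustrative, not the project's actual declarations.

```swift
/// Illustrative mirror of TypeDeltaReason, ordered so that a higher raw value wins.
enum TypeDeltaReason: Int, Comparable {
    case normal, unknown, registryChanged, unsupported, authorizationChanged

    static func < (lhs: TypeDeltaReason, rhs: TypeDeltaReason) -> Bool {
        lhs.rawValue < rhs.rawValue
    }
}

/// Illustrative per-type quality states.
enum TypeQuality { case complete, partial, unauthorized, loading, failed }

/// Priority: authorizationChanged > unsupported > registryChanged > unknown > normal.
func resolveReason(_ candidates: [TypeDeltaReason]) -> TypeDeltaReason {
    candidates.max() ?? .normal
}

/// Unavailable-count guard: only compute a delta when both sides are complete,
/// so the -1 "unavailable" sentinel never produces a spurious countDelta.
func guardedCountDelta(before: Int, qualityBefore: TypeQuality,
                       after: Int, qualityAfter: TypeQuality) -> Int {
    guard qualityBefore == .complete, qualityAfter == .complete else { return 0 }
    return after - before
}
```

Encoding the priority in the raw values keeps the ordering in one place. `max()` then gives the same answer regardless of the order in which candidate reasons are collected.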
-#### Step 8: AnomalyDetector ✅
-- **Pure function**: no context mutation, receives TypeCount maps, returns DetectionResult
-- **Quality gate**: both snapshots must be .complete (suppresses ALL detection including first auth after full deny)
-- **Registry gate**: skips appeared/disappeared anomalies if reason != .normal
-- **count = -1 guard**: skips any TypeDelta with qualityBefore or qualityAfter != .complete
-- **Anomaly detection rules**:
-  - `historicalInsertion` — countDelta > 0 AND (earlier earliest date OR recent latest with increased count)
-  - `deletion` — countDelta < 0 (severity based on % loss)
-  - `duplication` — countDelta > 50% AND date ranges within 1 day
-  - `silentReplacement` — countDelta == 0 AND hash differs (best-effort, MVP limitation)
-  - `syncAnomaly` — ≥4 types with |delta| > 10% (critical severity)
-- **isPostRestore suppression**:
-  - Suppresses syncAnomaly if previous.isPostRestore && previous.isPostRestoreSuppressedDeltaID == nil
-  - Suppression token consumed via DetectionResult, persisted by HealthKitService
-  - Forwarded past low-quality successors (quality gate prevents consumption on incomplete snapshots)
-- **AnomalyRecord.deltaID**: set internally, structural guarantee (impossible to return record without deltaID)
-
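The `syncAnomaly` rule reduces to counting the types whose relative change crosses a threshold. The sketch makes three assumptions: the 10% is measured against the previous count, callers pass only types that already passed the quality gate (both sides `.complete`), and types with a previous count of zero are skipped to avoid dividing by zero. `CountPair` and `isSyncAnomaly` are hypothetical names.

```swift
/// Hypothetical per-type counts from two consecutive complete snapshots.
struct CountPair {
    let before: Int
    let after: Int
}

/// Sketch of the syncAnomaly rule: at least `minTypes` types whose count moved
/// by more than `threshold`, expressed as a fraction of the previous count.
func isSyncAnomaly(_ counts: [String: CountPair],
                   threshold: Double = 0.10,
                   minTypes: Int = 4) -> Bool {
    let moved = counts.values.filter { pair in
        // Assumption: types with no previous samples are skipped rather than divided by zero.
        guard pair.before > 0 else { return false }
        let change = abs(Double(pair.after - pair.before)) / Double(pair.before)
        return change > threshold
    }
    return moved.count >= minTypes
}
```

The threshold and minimum type count are parameters, so the boundary is easy to unit-test. A change of exactly 10% does not count, because the comparison is strict.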
-#### Step 4: KeychainService ✅
-- Stable device ID persisted in Keychain (service: "ro.xdev.healthprobe.deviceid", account: "stable_device_id")
-- Detects DB reset: swiftDataStoreIsEmpty + existing keychain ID → recoveredDeviceID = true
-- In-process cache for repeated lookups
-
-#### Step 6 & 9: IntegrityService & Quality Aggregation ✅
-- `validate()` — strict mode:
-  - Recomputes checksum from TypeCounts
-  - Compares with delta.checksumAfter
-  - Returns .valid or .checksumMismatch / .missingDelta / .corrupted
-- `validateChain()` — walk backwards from latest via previousSnapshotID:
-  - **Fork detection**: asserts no duplicate previousSnapshotID (returns .corrupted immediately)
-  - Stops at first mismatch (no auto-repair, no skips)
-- **Quality aggregation**: loading > unauthorized (only if ALL) > partial (any failed/unauthorized) > complete
-
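The quality aggregation order above can be written as a short cascade of checks. The sketch assumes per-type qualities use the same cases as `SnapshotQuality` and that an empty type list counts as `.complete`. `aggregateQuality` is a hypothetical name, and the project's actual implementation may differ.

```swift
/// Illustrative copy of the quality states listed for SnapshotQuality.swift.
enum SnapshotQuality { case complete, partial, unauthorized, loading, failed }

/// loading > unauthorized (only if ALL types) > partial (anything not complete) > complete
func aggregateQuality(_ typeQualities: [SnapshotQuality]) -> SnapshotQuality {
    if typeQualities.contains(.loading) { return .loading }
    // allSatisfy returns true for an empty array, so guard against that case explicitly.
    if !typeQualities.isEmpty && typeQualities.allSatisfy({ $0 == .unauthorized }) {
        return .unauthorized
    }
    // Any failed, unauthorized, or partial type degrades the snapshot to partial.
    if typeQualities.contains(where: { $0 != .complete }) { return .partial }
    return .complete
}
```

With these rules, revoking one permission yields `.partial` and revoking all of them yields `.unauthorized`. This matches checklist items 4 and 5.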
-#### Step 10: SnapshotLifecycleService ✅
-- `previewDeletion()` — advisory integrity check, surfaces willBreakChain warning to UI
-- `delete()` — handles all position cases (oldest, latest, intermediate):
-  - **Oldest**: set next as chain start
-  - **Latest**: just delete
-  - **Intermediate**: merge deltas, recompute checksums, update nextSnapshot.previousSnapshotID
-- **OperationLog**: always written atomically with destructive changes
-- **Post-save verification**: re-fetches log by ID, recovery re-insert if missing, logs critical error
-
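The intermediate-deletion case in `delete()` and the backward walk in `validateChain()` both depend only on `previousSnapshotID`. The sketch below shows that link structure alone and leaves out delta merging, checksums, and `OperationLog`. `ChainSnapshot`, `deleteSnapshot(_:from:)`, and `validateLinks(_:latestID:)` are hypothetical names. A `nil` predecessor stands in for `isChainStart`, and snapshot IDs are assumed unique. The "chain fork detected" reason matches checklist item 19; the other failure reasons are illustrative.

```swift
/// Hypothetical, stripped-down snapshot: only the chain link matters here.
struct ChainSnapshot {
    let id: Int
    var previousSnapshotID: Int?   // nil marks the chain start in this sketch
}

enum ChainCheck: Equatable {
    case valid
    case corrupted(reason: String)
}

/// Removes `id` and points its successor at the removed snapshot's predecessor.
func deleteSnapshot(_ id: Int, from chain: [ChainSnapshot]) -> [ChainSnapshot] {
    guard let victim = chain.first(where: { $0.id == id }) else { return chain }
    return chain.compactMap { snapshot in
        if snapshot.id == id { return nil }
        var relinked = snapshot
        if relinked.previousSnapshotID == id {
            relinked.previousSnapshotID = victim.previousSnapshotID
        }
        return relinked
    }
}

/// Checks link structure only: no forks, and an unbroken walk back to a chain start.
func validateLinks(_ chain: [ChainSnapshot], latestID: Int) -> ChainCheck {
    // Fork detection: two snapshots must never claim the same predecessor.
    var seenPredecessors = Set<Int>()
    for snapshot in chain {
        guard let prev = snapshot.previousSnapshotID else { continue }
        if !seenPredecessors.insert(prev).inserted {
            return .corrupted(reason: "chain fork detected")
        }
    }
    // Assumes unique IDs; Dictionary(uniqueKeysWithValues:) traps on duplicates.
    let byID = Dictionary(uniqueKeysWithValues: chain.map { ($0.id, $0) })
    var cursor = byID[latestID]
    var visited = Set<Int>()
    while let current = cursor {
        guard visited.insert(current.id).inserted else {
            return .corrupted(reason: "cycle detected")
        }
        guard let prev = current.previousSnapshotID else { return .valid }
        guard let predecessor = byID[prev] else {
            return .corrupted(reason: "dangling previousSnapshotID")
        }
        cursor = predecessor
    }
    return .corrupted(reason: "latest snapshot not found")
}
```

Deleting the oldest snapshot uses the same code path. Its successor inherits a `nil` predecessor and becomes the chain start.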
-#### Step 12: Local-only storage refactor ✅
-- Removed CloudKitSyncService and CloudKit-pending chain states
-- **ModelContainer split**:
-  - uiCacheConfig: HealthSnapshot, TypeCount, YearlyCount, SnapshotDelta, TypeDelta, AnomalyRecord (derived local UI/index data)
-  - localConfig: OperationLog, DeviceProfile, MetricTimeoutProfile (local-only settings and operation metadata)
-- Added `HealthArchiveStore` protocol for the single local archive store source of truth
-- Added `SQLiteHealthArchiveStore`: actor-isolated SQLite archive with WAL, per-sample upsert, disappearance marking, verification timestamps, semantic fingerprints, metadata JSON, and scoped JSON report export
-- HealthKit anchored-query pages now archive samples/deletions before SwiftData snapshot/index rows are built
-- Schema migration recovery: removes legacy SwiftData stores and retries once on failure
-
-### UI (Step 13)
-
-✅ **SnapshotRow** — Shows:
-  - Chain indicators: "Chain start" / "DB reset / recovered device ID" / "Post-restore baseline" / "Observer-triggered snapshot"
-  - Anomaly warning badge (exclamationmark.triangle) if anomalyFlags non-empty
-  - Incomplete snapshot warning if quality != .complete
-
-✅ **SnapshotTypeCountRow** — Shows:
-  - "Unsupported" for isUnsupported = true (read directly, no delta needed)
-  - "Unavailable" for count = -1
-  - Numeric count with warning color if quality != .complete
-  - Delta badge vs. baseline (green/amber)
-
-✅ **DashboardView** — Anomaly summary section:
-  - Counts unresolved anomalies by severity (critical/warning)
-  - Shows only if unresolved anomalies exist
-
-✅ **Full feature coverage**:
-  - Snapshot creation with observer triggers
-  - Chain visualization and deletion with integrity warnings
-  - Quality badges and anomaly indicators
-  - Timezone/registry change awareness
-  - Baseline comparison across multiple devices
-
-## Build Status
-
-```
-✅ BUILD SUCCEEDED
-  Target: HealthProbe (iOS 26.4)
-  No compilation errors or warnings
-  App signs successfully
-```
-
-## Verification Checklist (32 items from plan)
-
-These tests should be run to ensure all backend functionality is correct:
-
-### Basic Snapshot & Chain (1-3)
-- [ ] 1. Build succeeds with no errors
-- [ ] 2. First snapshot: isChainStart=true, previousSnapshotID=nil, no delta created
-- [ ] 3. Second snapshot: SnapshotDelta created with correct checksumBefore/After
-
-### Quality & Anomalies (4-7)
-- [ ] 4. Revoke permission → type quality=.unauthorized, snapshot=.partial, no anomalies
-- [ ] 5. All permissions revoked → snapshot=.unauthorized, no anomalies  
-- [ ] 6. Timeout simulation (1ms) → count=-1, quality=.failed, "Unavailable" in UI
-- [ ] 7. Post-authorize after full deny → first delta suppressed, snapshot marked post-restore
-
-### Chain Operations (8-10)
-- [ ] 8. 3 snapshots A→B→C, delete B → single merged delta A→C, C.previousSnapshotID==A.id
-- [ ] 9. Hash stability → no changes between snapshots = identical hashes/checksums
-- [ ] 10. Integrity strict mode → corrupted checksum = validation stops, no auto-repair
-
-### Advanced Features (11-20)
-- [ ] 11. DB reset with Keychain survival → same deviceID, isChainStart=true, recoveredDeviceID=true
-- [ ] 12. Local-only launch → app functions without iCloud/CloudKit entitlements
-- [ ] 13. Observer debounce → 10 rapid callbacks = exactly 1 snapshot (triggerReason=observerCallback)
-- [ ] 14. Unsupported type → TypeCount(count=-1, quality=.failed, isUnsupported=true), "Unsupported" UI
-- [ ] 15. YearlyCount timezone → Calendar.current used, isApproximate=true if bucket > day
-- [ ] 16. Delta merge with unavailable counts → merged countDelta=0, impaired reason preserved
-- [ ] 17. Missing local delta/typeDeltas → integrity validation surfaces the fault, never hides it as sync latency
-- [ ] 18. First auth after full deny (quality gate) → no anomalies, current.isPostRestore=true, isPostRestoreInferred=true
-- [ ] 19. Chain fork → validateChain() returns .corrupted(reason: "chain fork detected"), stops
-- [ ] 20. disappeared→appeared merge with -1 source → merged countDelta=0, reason != .normal
-
-### Reason Priority & Suppression (21-26)
-- [ ] 21. TypeDelta reason priority → .unauthorized wins over .registryChanged simultaneously
-- [ ] 22. Debounce + manual overlap → no observer snapshot if manual created during debounce
-- [ ] 23. completionHandler unconditional → called via defer, never gated on scheduling success
-- [ ] 24. isPostRestore forwarding → suppression forwarded past low-quality, consumed on next .complete
-- [ ] 25. Missing delta → validateChain() returns .missingDelta and stops
-- [ ] 26. OperationLog verification → recovery re-insert if missing after save, log critical error
-
-### Coherence & Edge Cases (27-32)
-- [ ] 27. Per-type query concurrency → max 6 simultaneous HK queries (not 3N at N=20)
-- [ ] 28. YearlyCount timezone drift → countDelta=0, yearlyCountNote set, no anomalies
-- [ ] 29. isUnsupported on TypeCount → UI shows "Unsupported" without delta context
-- [ ] 30. count/quality coherence assert → debug assert fires, release corrects to -1
-- [ ] 31. snapshotChecksum filter → uses quality==.complete, not hash!="" (determinism)
-- [ ] 32. AnomalyRecord.deltaID structural → every record has deltaID==delta.id (no external setter)
-
-## Architectural Highlights
-
-### Purity & Immutability
-- **AnomalyDetector** is pure: no SwiftData mutations, explicit TypeCount maps, DetectionResult metadata
-- **DeltaService** never carries old checksums during merge (recomputes from surrounding snapshots)
-- **OperationLog** atomicity: log + destructive changes in single context.save()
-
-### Quality Gates
-- **Snapshot quality** aggregation prevents false positives:
-  - All detection requires both snapshots .complete
-  - Covers first authorization after full deny (the quality gate alone fully suppresses it)
-  - isPostRestore suppression forwarded past low-quality successors
-
-### Chain Integrity
-- **previousSnapshotID** is the sole source of chain truth (not localSequenceNumber)
-- **Fork detection** prevents chain divergence (asserts no duplicate previousSnapshotID)
-- **Checksum validation** detects stored TypeCounts that no longer match the checksum recorded in the delta
-
-### Local Archive Direction
-- CloudKit/iCloud sync is not a product goal
-- SwiftData rows are derived UI/index data and must be rebuildable from the local archive store
-- Missing deltas or type deltas are treated as local integrity faults, not remote sync latency
-
-### Observability
-- **Reason priority** makes anomaly suppression deterministic
-  - authorizationChanged > unsupported > registryChanged > unknown > normal
-  - Prevents .registryChanged from masking .authorizationChanged
-- **YearlyCount timezone guard** prevents false loss attribution across DST
-- **TypeDelta.yearlyCountNote** signals unreliable year-level attribution
-
-## Known Limitations (MVP)
-
-1. **Hash** covers only count + date range, not distribution (silentReplacement is best-effort)
-2. **YearlyCount** precision requires daily bucket granularity (noted if isApproximate)
-3. **Archive query/report UI is still pending** (store exists, UI still mostly reads SwiftData cache)
-4. **No automatic cross-device reconstruction**; cross-device analysis is future macOS/report work
-
-## Next Steps
-
-### Immediate (Testing)
-1. Run all 32 verification checks against real HealthKit data
-2. Create unit tests for delta merge, reason priority, anomaly detection
-3. Test observer callback debounce with real HKObserverQuery
-4. Add archive status/report UI backed by `HealthArchiveStore`
-
-### Post-MVP
-1. Integrate actual BGTask expiration guard for observer snapshots (capture partial results)
-2. Add delta comparison view showing TypeDelta reason and suppression explanations
-3. Implement OperationLog viewer in UI (audit trail dashboard)
-4. Add historical trend analysis (divergence detection, anomaly patterns)
-
-
-**Built with:** SwiftUI, SwiftData, HealthKit, CryptoKit  
-**Minimum iOS:** 17.0  
-**Target iOS:** 26.4  
-**Swift Version:** 5.9+