Showing 5 changed files with 62 additions and 135 deletions
+2 -0
HealthProbe/Doc/02-architecture/Database-Design.md
@@ -488,6 +488,8 @@ Do not include SQLite row ids in fingerprints. If HealthKit UUID is available, a
488 488
 
489 489
 `payload_hash` is `SHA-256` over the canonical sample payload representation, including dates, value/unit/category/workout fields, source revision fields, device provenance hashes, metadata hash, and relationship payload when available. A new `sample_versions` row is created when `payload_hash` changes.
490 490
 
491
+Implementation note, 2026-05-24: archive v2 capture must derive `payload_hash` from the same normalized row values that are persisted. Unknown HealthKit OS versions, including `0.0.0`, are stored as absent. Capture must not depend on the legacy `archive_samples` mirror, and capture pages must leave that transitional table empty while the remaining v2 schema cleanup is in progress.
492
+
491 493
 `semantic_fingerprint` is type-specific and optional. It supports consolidation heuristics and fuzzy backup/export reconciliation, but it is never sufficient by itself to prove record identity.
492 494
 
493 495
 ### 6.5 Timezone And Aggregate Buckets
+4 -4
HealthProbe/Doc/04-project/IMPLEMENTATION_STATUS.md
@@ -25,7 +25,7 @@ There are no real deployments, only test installations. Existing prototype datab
25 25
 |------|----------------|--------------------|
26 26
 | Product docs | Updated | Keep `HealthProbe/Doc/README.md` as canonical index |
27 27
 | HealthKit capture | Prototype exists | Adapt capture to write differential SQLite observations first |
28
-| SQLite archive | Archive v2 schema, differential write path, daily aggregate rebuilds, integrity report, v2 record reads, SQL diff/count/aggregate/provenance/consolidation-evidence APIs, large synthetic diff pagination coverage, formal timing/memory metrics, and XCTest coverage are in place; legacy write mirror still exists | Investigate and retire `archive_samples`, then start Core Data cache work |
28
+| SQLite archive | Archive v2 schema, differential write path, daily aggregate rebuilds, integrity report, v2 record reads, SQL diff/count/aggregate/provenance/consolidation-evidence APIs, large synthetic diff pagination coverage, formal timing/memory metrics, and XCTest coverage are in place; capture no longer writes the legacy `archive_samples` mirror | Remove the now-empty legacy schema/update remnants after v2 verification/delete flows are complete, then start Core Data cache work |
29 29
 | Core Data cache | Not implemented | Add rebuildable cache for expensive counts, summaries, report metadata, UI state |
30 30
 | SwiftData cache | Exists | Treat as disposable prototype data; reset/ignore during v2 transition |
31 31
 | UI | Prototype exists | Reframe screens around observations, diffs, export, archive status |
@@ -38,7 +38,7 @@ There are no real deployments, only test installations. Existing prototype datab
38 38
 
39 39
 Detailed checkable milestones live in [`Refactoring-Plan.md`](Refactoring-Plan.md).
40 40
 
41
-1. Investigate why removing the legacy `archive_samples` write mirror changes v2 idempotency in tests, then retire it safely.
41
+1. Remove the remaining empty legacy `archive_samples` schema/update remnants once v2 verification/delete paths no longer reference them.
42 42
 2. Add Core Data UI/report cache and rebuild pipeline.
43 43
 3. Replace SwiftData UI dependencies with Core Data/cache DTOs.
44 44
 4. Update UI language from anomaly/status to observation/diff/export.
@@ -50,10 +50,10 @@ Detailed checkable milestones live in [`Refactoring-Plan.md`](Refactoring-Plan.m
50 50
 - SwiftData currently blocks iOS 15-era device support.
51 51
 - Existing `Anomaly*` model/service names are legacy language.
52 52
 - Some screens still imply snapshot-count monitoring rather than Time Machine inspection.
53
-- Current archive schema is not sufficient as the long-term source of truth.
53
+- Remaining `archive_samples` table/update statements are transitional leftovers; capture writes only archive v2 identity/version/visibility tables.
54 54
 - Existing implementation may decode or cache too much data for low-end devices.
55 55
 - Old prototype database compatibility is no longer required.
56
-- Initial SQLite archive tests cover open/init/reset/idempotency, small observation diffs, large synthetic diff pagination, formal timing/memory metrics, materialized aggregate comparison, source/provenance breakdowns, and consolidation-evidence labels, but not yet export behavior.
56
+- Initial SQLite archive tests cover open/init/reset/idempotency, legacy mirror non-use, small observation diffs, large synthetic diff pagination, formal timing/memory metrics, materialized aggregate comparison, source/provenance breakdowns, and consolidation-evidence labels, but not yet export behavior.
57 57
 
58 58
 ## Verification Checklist
59 59
 
+3 -1
HealthProbe/Doc/04-project/Refactoring-Plan.md
@@ -130,13 +130,15 @@ Checklist:
130 130
 - [x] Rebuild/update affected type summaries and daily aggregates after capture/delete observations.
131 131
 - [x] Commit SQLite before Core Data/cache work.
132 132
 - [x] Make repeated capture page writes idempotent.
133
+- [x] Stop writing the legacy `archive_samples` mirror during capture.
134
+- [ ] Remove remaining empty `archive_samples` schema/update remnants after v2 verification/delete paths are complete.
133 135
 
134 136
 Acceptance:
135 137
 - [x] Initial import stores identities and versions once.
136 138
 - [x] Re-running same page does not duplicate sample identities or payload versions.
137 139
 - [x] Representation change creates a new version, not a new logical sample.
138 140
 - [x] Disappearance closes visibility range.
139
-- [ ] No full observation copy table is written.
141
+- [x] No full observation copy table is written during capture.
140 142
 
141 143
 ## Milestone 5 - SQL Analysis Layer
142 144
 
+52 -130
HealthProbe/Services/SQLiteHealthArchiveStore.swift
@@ -1374,25 +1374,16 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
1374 1374
         var unchanged = 0
1375 1375
         var touchedTypeIDs = Set<Int64>()
1376 1376
 
1377
-        try withLegacyArchiveSampleStatement(db: db) { legacyStatement in
1378
-            for row in rows {
1379
-                let result = try upsertArchiveV2Sample(row, observationID: observationID, db: db)
1380
-                touchedTypeIDs.insert(result.sampleTypeID)
1381
-                switch result.kind {
1382
-                case .inserted:
1383
-                    inserted += 1
1384
-                case .updated:
1385
-                    updated += 1
1386
-                case .unchanged:
1387
-                    unchanged += 1
1388
-                }
1389
-
1390
-                sqlite3_reset(legacyStatement)
1391
-                sqlite3_clear_bindings(legacyStatement)
1392
-                bind(row, to: legacyStatement)
1393
-                guard sqlite3_step(legacyStatement) == SQLITE_DONE else {
1394
-                    throw SQLiteHealthArchiveStoreError.stepFailed(lastErrorMessage(db))
1395
-                }
1377
+        for row in rows {
1378
+            let result = try upsertArchiveV2Sample(row, observationID: observationID, db: db)
1379
+            touchedTypeIDs.insert(result.sampleTypeID)
1380
+            switch result.kind {
1381
+            case .inserted:
1382
+                inserted += 1
1383
+            case .updated:
1384
+                updated += 1
1385
+            case .unchanged:
1386
+                unchanged += 1
1396 1387
             }
1397 1388
         }
1398 1389
 
@@ -1413,57 +1404,6 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
1413 1404
         )
1414 1405
     }
1415 1406
 
1416
-    private func withLegacyArchiveSampleStatement<T>(
1417
-        db: OpaquePointer?,
1418
-        body: (OpaquePointer?) throws -> T
1419
-    ) throws -> T {
1420
-        let sql = """
1421
-        INSERT INTO archive_samples (
1422
-            sample_uuid_hash, type_identifier, strict_fingerprint, semantic_fingerprint,
1423
-            start_date, end_date, first_seen_at, last_seen_at, last_verified_at,
1424
-            disappeared_at, observed_count, value_kind, value, unit, category_value,
1425
-            workout_activity_type, duration_seconds, source_name, source_bundle_identifier,
1426
-            source_product_type, source_version, source_operating_system_version,
1427
-            device_name, device_manufacturer, device_model, device_hardware_version,
1428
-            device_firmware_version, device_software_version, device_local_identifier,
1429
-            device_udi_device_identifier, metadata_json, archived_at
1430
-        ) VALUES (
1431
-            ?, ?, ?, ?, ?, ?, ?, ?, ?, NULL, 1, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?
1432
-        )
1433
-        ON CONFLICT(sample_uuid_hash) DO UPDATE SET
1434
-            strict_fingerprint = excluded.strict_fingerprint,
1435
-            semantic_fingerprint = excluded.semantic_fingerprint,
1436
-            start_date = excluded.start_date,
1437
-            end_date = excluded.end_date,
1438
-            last_seen_at = excluded.last_seen_at,
1439
-            last_verified_at = excluded.last_verified_at,
1440
-            disappeared_at = NULL,
1441
-            observed_count = archive_samples.observed_count + 1,
1442
-            value_kind = excluded.value_kind,
1443
-            value = excluded.value,
1444
-            unit = excluded.unit,
1445
-            category_value = excluded.category_value,
1446
-            workout_activity_type = excluded.workout_activity_type,
1447
-            duration_seconds = excluded.duration_seconds,
1448
-            source_name = excluded.source_name,
1449
-            source_bundle_identifier = excluded.source_bundle_identifier,
1450
-            source_product_type = excluded.source_product_type,
1451
-            source_version = excluded.source_version,
1452
-            source_operating_system_version = excluded.source_operating_system_version,
1453
-            device_name = excluded.device_name,
1454
-            device_manufacturer = excluded.device_manufacturer,
1455
-            device_model = excluded.device_model,
1456
-            device_hardware_version = excluded.device_hardware_version,
1457
-            device_firmware_version = excluded.device_firmware_version,
1458
-            device_software_version = excluded.device_software_version,
1459
-            device_local_identifier = excluded.device_local_identifier,
1460
-            device_udi_device_identifier = excluded.device_udi_device_identifier,
1461
-            metadata_json = excluded.metadata_json,
1462
-            archived_at = excluded.archived_at
1463
-        """
1464
-        return try withStatement(sql, db: db, body: body)
1465
-    }
1466
-
1467 1407
     private func upsertArchiveV2Sample(
1468 1408
         _ row: ArchiveSampleRow,
1469 1409
         observationID: Int64,
@@ -2261,38 +2201,6 @@ actor SQLiteHealthArchiveStore: HealthArchiveStore {
2261 2201
         return try body(statement)
2262 2202
     }
2263 2203
 
2264
-    private func bind(_ row: ArchiveSampleRow, to statement: OpaquePointer?) {
2265
-        bindText(row.sampleUUIDHash, to: 1, in: statement)
2266
-        bindText(row.typeIdentifier, to: 2, in: statement)
2267
-        bindText(row.strictFingerprint, to: 3, in: statement)
2268
-        bindText(row.semanticFingerprint, to: 4, in: statement)
2269
-        sqlite3_bind_double(statement, 5, row.startDate.timeIntervalSinceReferenceDate)
2270
-        sqlite3_bind_double(statement, 6, row.endDate.timeIntervalSinceReferenceDate)
2271
-        sqlite3_bind_double(statement, 7, row.observedAt.timeIntervalSinceReferenceDate)
2272
-        sqlite3_bind_double(statement, 8, row.observedAt.timeIntervalSinceReferenceDate)
2273
-        sqlite3_bind_double(statement, 9, row.observedAt.timeIntervalSinceReferenceDate)
2274
-        bindText(row.valueKind, to: 10, in: statement)
2275
-        bindDouble(row.value, to: 11, in: statement)
2276
-        bindText(row.unit, to: 12, in: statement)
2277
-        bindInt(row.categoryValue, to: 13, in: statement)
2278
-        bindInt(row.workoutActivityType, to: 14, in: statement)
2279
-        bindDouble(row.durationSeconds, to: 15, in: statement)
2280
-        bindText(row.sourceName, to: 16, in: statement)
2281
-        bindText(row.sourceBundleIdentifier, to: 17, in: statement)
2282
-        bindText(row.sourceProductType, to: 18, in: statement)
2283
-        bindText(row.sourceVersion, to: 19, in: statement)
2284
-        bindText(row.sourceOperatingSystemVersion, to: 20, in: statement)
2285
-        bindText(row.deviceName, to: 21, in: statement)
2286
-        bindText(row.deviceManufacturer, to: 22, in: statement)
2287
-        bindText(row.deviceModel, to: 23, in: statement)
2288
-        bindText(row.deviceHardwareVersion, to: 24, in: statement)
2289
-        bindText(row.deviceFirmwareVersion, to: 25, in: statement)
2290
-        bindText(row.deviceSoftwareVersion, to: 26, in: statement)
2291
-        bindText(row.deviceLocalIdentifier, to: 27, in: statement)
2292
-        bindText(row.deviceUDI, to: 28, in: statement)
2293
-        bindText(row.metadataJSON, to: 29, in: statement)
2294
-        sqlite3_bind_double(statement, 30, row.observedAt.timeIntervalSinceReferenceDate)
2295
-    }
2296 2204
 }
2297 2205
 
2298 2206
 private struct ArchiveSampleRow {
@@ -2340,6 +2248,8 @@ private struct ArchiveSampleRow {
2340 2248
     nonisolated init(sample: HKSample, observedAt: Date) {
2341 2249
         let sampleUUID = sample.uuid.uuidString
2342 2250
         let typeIdentifier = sample.sampleType.identifier
2251
+        let startDate = sample.startDate
2252
+        let endDate = sample.endDate
2343 2253
         let quantity = ArchiveSampleRow.quantityPayload(sample)
2344 2254
         let category = sample as? HKCategorySample
2345 2255
         let workout = sample as? HKWorkout
@@ -2351,7 +2261,16 @@ private struct ArchiveSampleRow {
2351 2261
         let categoryValue = category?.value
2352 2262
         let workoutActivityType = workout.map { Int($0.workoutActivityType.rawValue) }
2353 2263
         let durationSeconds = workout?.duration
2264
+        let sourceBundleIdentifier = sourceRevision.source.bundleIdentifier
2265
+        let sourceProductType = sourceRevision.productType
2266
+        let sourceVersion = sourceRevision.version
2354 2267
         let sourceOperatingSystemVersion = ArchiveSampleRow.operatingSystemVersionString(sourceRevision.operatingSystemVersion)
2268
+        let deviceModel = device?.model
2269
+        let deviceHardwareVersion = device?.hardwareVersion
2270
+        let deviceFirmwareVersion = device?.firmwareVersion
2271
+        let deviceSoftwareVersion = device?.softwareVersion
2272
+        let deviceLocalIdentifier = device?.localIdentifier
2273
+        let deviceUDI = device?.udiDeviceIdentifier
2355 2274
         let metadataJSON = ArchiveSampleRow.metadataJSONString(sample.metadata)
2356 2275
         let metadataHash = metadataJSON.map { HashService.archiveContentHash(domain: "hp:v2:metadata", parts: [$0]) }
2357 2276
 
@@ -2360,18 +2279,18 @@ private struct ArchiveSampleRow {
2360 2279
         self.strictFingerprint = HashService.sampleFingerprint(
2361 2280
             typeIdentifier: typeIdentifier,
2362 2281
             sampleUUID: sampleUUID,
2363
-            startDate: sample.startDate,
2364
-            endDate: sample.endDate
2282
+            startDate: startDate,
2283
+            endDate: endDate
2365 2284
         )
2366 2285
         self.semanticFingerprint = HashService.archiveSemanticFingerprint(
2367 2286
             typeIdentifier: typeIdentifier,
2368
-            startDate: sample.startDate,
2369
-            endDate: sample.endDate,
2287
+            startDate: startDate,
2288
+            endDate: endDate,
2370 2289
             value: numericValue,
2371 2290
             unit: unit,
2372 2291
             categoryValue: categoryValue,
2373 2292
             workoutActivityType: workout?.workoutActivityType.rawValue,
2374
-            sourceBundleIdentifier: sourceRevision.source.bundleIdentifier
2293
+            sourceBundleIdentifier: sourceBundleIdentifier
2375 2294
         )
2376 2295
         self.valueKind = valueKind
2377 2296
         self.value = numericValue
@@ -2380,47 +2299,47 @@ private struct ArchiveSampleRow {
2380 2299
         self.workoutActivityType = workoutActivityType
2381 2300
         self.durationSeconds = durationSeconds
2382 2301
         self.sourceName = sourceRevision.source.name
2383
-        self.sourceBundleIdentifier = sourceRevision.source.bundleIdentifier
2384
-        self.sourceProductType = sourceRevision.productType
2385
-        self.sourceVersion = sourceRevision.version
2302
+        self.sourceBundleIdentifier = sourceBundleIdentifier
2303
+        self.sourceProductType = sourceProductType
2304
+        self.sourceVersion = sourceVersion
2386 2305
         self.sourceOperatingSystemVersion = sourceOperatingSystemVersion
2387 2306
         self.deviceName = device?.name
2388 2307
         self.deviceManufacturer = device?.manufacturer
2389
-        self.deviceModel = device?.model
2390
-        self.deviceHardwareVersion = device?.hardwareVersion
2391
-        self.deviceFirmwareVersion = device?.firmwareVersion
2392
-        self.deviceSoftwareVersion = device?.softwareVersion
2393
-        self.deviceLocalIdentifier = device?.localIdentifier
2394
-        self.deviceUDI = device?.udiDeviceIdentifier
2308
+        self.deviceModel = deviceModel
2309
+        self.deviceHardwareVersion = deviceHardwareVersion
2310
+        self.deviceFirmwareVersion = deviceFirmwareVersion
2311
+        self.deviceSoftwareVersion = deviceSoftwareVersion
2312
+        self.deviceLocalIdentifier = deviceLocalIdentifier
2313
+        self.deviceUDI = deviceUDI
2395 2314
         self.metadataJSON = metadataJSON
2396 2315
         self.metadataHash = metadataHash
2397 2316
         self.payloadHash = HashService.archiveContentHash(
2398 2317
             domain: "hp:v2:payload",
2399 2318
             parts: [
2400 2319
                 typeIdentifier,
2401
-                ArchiveSampleRow.timestampString(sample.startDate),
2402
-                ArchiveSampleRow.timestampString(sample.endDate),
2320
+                ArchiveSampleRow.timestampString(startDate),
2321
+                ArchiveSampleRow.timestampString(endDate),
2403 2322
                 valueKind,
2404 2323
                 numericValue.map { String(format: "%.17g", $0) },
2405 2324
                 unit,
2406 2325
                 categoryValue.map(String.init),
2407 2326
                 workoutActivityType.map(String.init),
2408 2327
                 durationSeconds.map { String(format: "%.17g", $0) },
2409
-                sourceRevision.source.bundleIdentifier,
2410
-                sourceRevision.productType,
2411
-                sourceRevision.version,
2412
-                ArchiveSampleRow.operatingSystemVersionString(sourceRevision.operatingSystemVersion),
2413
-                device?.model,
2414
-                device?.hardwareVersion,
2415
-                device?.firmwareVersion,
2416
-                device?.softwareVersion,
2417
-                device?.localIdentifier,
2418
-                device?.udiDeviceIdentifier,
2328
+                sourceBundleIdentifier,
2329
+                sourceProductType,
2330
+                sourceVersion,
2331
+                sourceOperatingSystemVersion,
2332
+                deviceModel,
2333
+                deviceHardwareVersion,
2334
+                deviceFirmwareVersion,
2335
+                deviceSoftwareVersion,
2336
+                deviceLocalIdentifier,
2337
+                deviceUDI,
2419 2338
                 metadataHash
2420 2339
             ]
2421 2340
         )
2422
-        self.startDate = sample.startDate
2423
-        self.endDate = sample.endDate
2341
+        self.startDate = startDate
2342
+        self.endDate = endDate
2424 2343
         self.observedAt = observedAt
2425 2344
     }
2426 2345
 
@@ -2488,6 +2407,9 @@ private struct ArchiveSampleRow {
2488 2407
     }
2489 2408
 
2490 2409
     nonisolated private static func operatingSystemVersionString(_ version: OperatingSystemVersion) -> String? {
2410
+        guard version.majorVersion != 0 || version.minorVersion != 0 || version.patchVersion != 0 else {
2411
+            return nil
2412
+        }
2491 2413
         guard (0...100).contains(version.majorVersion),
2492 2414
               (0...1_000).contains(version.minorVersion),
2493 2415
               (0...1_000).contains(version.patchVersion) else {
+1 -0
HealthProbeTests/SQLiteHealthArchiveStoreTests.swift
@@ -73,6 +73,7 @@ final class SQLiteHealthArchiveStoreTests: XCTestCase {
73 73
         XCTAssertEqual(try countRows(in: "sample_versions", at: url), 1, versionDebugRows)
74 74
         XCTAssertEqual(try countRows(in: "sample_visibility_ranges", at: url), 1)
75 75
         XCTAssertEqual(try countRows(in: "source_revisions", at: url), 1)
76
+        XCTAssertEqual(try countRows(in: "archive_samples", at: url), 0)
76 77
         XCTAssertEqual(secondWrite.insertedCount, 0)
77 78
         XCTAssertEqual(secondWrite.updatedCount, 0)
78 79
         XCTAssertEqual(secondWrite.unchangedCount, 1)
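
Review note: the rule this change documents in `Database-Design.md` (hash the same normalized values that are persisted, and store an unknown `0.0.0` OS version as absent) can be illustrated in isolation. The following is a minimal sketch, not the project's implementation. `normalizedOSVersion`, `SketchRow`, and `payloadParts` are hypothetical names, the dotted version format is an assumption because the diff does not show the formatting code, and the real code passes its parts to `HashService.archiveContentHash`, which is omitted here.

```swift
import Foundation

/// Hypothetical normalizer: treats 0.0.0 as "unknown" and returns nil.
/// The dotted output format is an assumption; the diff does not show it.
func normalizedOSVersion(_ version: OperatingSystemVersion) -> String? {
    guard version.majorVersion != 0 || version.minorVersion != 0 || version.patchVersion != 0 else {
        return nil
    }
    return "\(version.majorVersion).\(version.minorVersion).\(version.patchVersion)"
}

/// Hypothetical, reduced row with only two provenance fields.
struct SketchRow {
    let sourceVersion: String?
    let sourceOperatingSystemVersion: String?

    init(sourceVersion: String?, osVersion: OperatingSystemVersion) {
        // Normalize once, then store the normalized value.
        let normalizedOS = normalizedOSVersion(osVersion)
        self.sourceVersion = sourceVersion
        self.sourceOperatingSystemVersion = normalizedOS
    }

    /// Canonical parts built only from the stored properties, so the
    /// hash input cannot drift from what is persisted.
    var payloadParts: [String?] {
        [sourceVersion, sourceOperatingSystemVersion]
    }
}

let unknown = SketchRow(
    sourceVersion: "1.2",
    osVersion: OperatingSystemVersion(majorVersion: 0, minorVersion: 0, patchVersion: 0)
)
let known = SketchRow(
    sourceVersion: "1.2",
    osVersion: OperatingSystemVersion(majorVersion: 17, minorVersion: 4, patchVersion: 1)
)
```

Building the hash input from the stored properties, rather than re-reading the HealthKit sample, means a normalization change such as the `0.0.0` guard affects the persisted row and the hash together.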