| 2026-06-04 | `f73f076` | Add archive finalization phase timings to diagnostics. | The post-`7e3b997` report proved the top-level `finalizeElapsed` bucket was too opaque for the next optimization. Diagnostics now split finalization into event-count/previous-summary lookup, type-summary work, daily-aggregate work, observation-type-run update, and residual other time. Follow-up report with `reportSchemaVersion: 3` and `buildFingerprint: 1.0(1)-1780606903-92064` completed in `22.6s`, with `127/127` complete, `CaptureModes: unchangedDelta=119, delta=8`, and `DeltaEvents: 27`. Finalization was `10.3s`: event-count/previous-summary lookup `1.8s`, type-summary `0.0s`, daily aggregates `7.3s`, run update `0.0s`, and other `1.2s`. Heart Rate had `9` delta events and spent `4.8s` finalizing, of which `3.8s` was daily aggregate work and `0.9s` was event-count/previous-summary lookup. Conclusion: the remaining finalize bottleneck is not type-summary fallback; it is changed-type daily aggregate maintenance, especially Heart Rate. |
| 2026-06-04 | older build / schema v2 | Captured large first-import baseline on a bigger device database. | Initial full-profile snapshot on an older build completed with `127/127` metrics and `8,421,978` records, but it used `reportSchemaVersion: 2` and has no build fingerprint. Treat it as a volume/shape baseline, not a precise current-build comparison. Wall clock was `166m10s`; summed fetch `5m19s`, processing `20m29s`, insert `137m31s`, finalize `1m53s`. The high-volume types dominated: Heart Rate `2,225,738` records and `46m57s` total (`39m16s` insert), Active Energy `1,914,449` records and `41m35s` total (`35m21s` insert), another high-volume type around `2,007,920` records and `41m20s` total (`34m29s` insert), and Basal Energy `1,116,074` records and `21m37s` total (`17m48s` insert). Conclusion: for clean first imports on very large databases, SQLite insert/index/write-path cost remains the central risk; incremental daily-aggregate optimization should not add first-import indexes without measurement. |
| 2026-06-05 | `6041bac` | Split daily aggregate finalization timings. | The first finalization phase report identified daily aggregate work as the remaining changed-type bottleneck, but `finalizeDailyAggregateElapsed` still mixed affected-bucket lookup, previous aggregate copy, destination delete, affected-bucket rebuild, replacement insert, and residual SQL/transaction overhead. Diagnostics now emit aggregate and per-type daily subphase fields: bucket lookup, copy, delete, rebuild, insert, and other. Follow-up report with `buildFingerprint: 1.0(1)-1780618540-92064` completed in `23.5s`, with `127/127` complete, `CaptureModes: unchangedDelta=118, delta=9`, and `DeltaEvents: 97`. Finalization was `10.5s`, daily aggregate work was `7.4s`, and daily rebuild alone was `6.9s`; daily copy was only `0.5s`. Heart Rate had `40` delta events and spent `4.8s` finalizing, of which `3.8s` was daily aggregate rebuild. Conclusion: copying previous materialized daily rows is not the bottleneck; affected-bucket rebuild scans are. |
-| 2026-06-05 | pending | Rebuild changed daily aggregate buckets from time-ranged versions. | The changed-bucket rebuild query previously started from all samples for the type and only then filtered version `start_date`; for Heart Rate this can traverse roughly `900k` visible rows to rebuild a few affected days. The query now starts from `sample_versions(start_date, sample_id)` for the affected date window, joins to `samples` for type filtering, and joins open visibility ranges by `(sample_id, version_id, last_observation_id)`. Expected signal: repeated full-profile captures should reduce `SummedFinalizeDailyAggregateRebuildElapsed`, especially Heart Rate's `3.8s` daily rebuild. Risk to monitor: the new `sample_versions(start_date, sample_id)` index adds first-import write/index cost, so keep checking large first-import insert timing before accepting this as a permanent schema tradeoff. |
+| 2026-06-05 | `bf5a861` | Rebuild changed daily aggregate buckets from time-ranged versions. | Confirmed on two full-profile repeated captures with `buildFingerprint: 1.0(1)-1780640325-92064`. The overnight-data run completed in `21.8s` with `127/127` complete, `CaptureModes: unchangedDelta=111, delta=16`, and `DeltaEvents: 322`; daily aggregate rebuild dropped from `6.9s` to `0.0s`, with daily aggregate work now only `0.6s` copy. The low-delta run completed in `6.6s` with `CaptureModes: unchangedDelta=125, delta=2`; daily aggregate rebuild again stayed `0.0s`. Conclusion: the time-ranged `sample_versions(start_date, sample_id)` query solved the affected-bucket rebuild bottleneck. Continue monitoring first-import insert timing because the new index is a write-path tradeoff. |
+| 2026-06-05 | pending | Count observation events from the event table first. | After `bf5a861`, remaining finalization cost moved to event-count / previous-summary lookup: `2.5s` on the overnight-data run and `0.2s` on low-delta. `eventCounts` still started from `samples` filtered by type, which can scan a high-volume type to count a few events. The query now starts from `sample_observation_events(observation_id, event_kind)` and joins to `samples` only to filter type. Expected signal: lower `SummedFinalizeEventCountElapsed`, especially Heart Rate's `0.9s` and Cycling Distance's `0.7s` in the overnight report. |
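
The event-count change described in the pending row can be illustrated with a small, self-contained sketch. The schema here is an assumption for illustration only: the table names `samples` and `sample_observation_events`, the `sample_type_id` column, and the `(observation_id, event_kind)` index columns come from this log, while the `sample_id` join column on the event table, the index names, and the `added` / `deleted` event kinds are hypothetical. SQLite normally chooses join order itself, so the sketch uses `CROSS JOIN`, which SQLite does not reorder, to pin each query shape.

```python
import sqlite3

# Old shape: drive from every sample of the type, then look for its events.
SAMPLES_FIRST = """
    SELECT e.event_kind, COUNT(*)
    FROM samples AS s
    CROSS JOIN sample_observation_events AS e
    WHERE e.sample_id = s.sample_id
      AND s.sample_type_id = :type_id
      AND e.observation_id = :obs_id
    GROUP BY e.event_kind
"""

# New shape: drive from the few events of one observation, then check the type.
EVENTS_FIRST = """
    SELECT e.event_kind, COUNT(*)
    FROM sample_observation_events AS e
    CROSS JOIN samples AS s
    WHERE s.sample_id = e.sample_id
      AND e.observation_id = :obs_id
      AND s.sample_type_id = :type_id
    GROUP BY e.event_kind
"""


def build_demo_db():
    """Build a tiny in-memory stand-in for the archive (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE samples (
            sample_id INTEGER PRIMARY KEY,
            sample_type_id INTEGER NOT NULL
        );
        CREATE INDEX samples_by_type ON samples(sample_type_id);
        CREATE TABLE sample_observation_events (
            observation_id INTEGER NOT NULL,
            event_kind TEXT NOT NULL,
            sample_id INTEGER NOT NULL REFERENCES samples(sample_id)
        );
        CREATE INDEX events_by_observation
            ON sample_observation_events(observation_id, event_kind);
    """)
    # Type 1 plays the high-volume type (5000 samples); type 2 has 50 samples.
    con.executemany(
        "INSERT INTO samples VALUES (?, ?)",
        [(i, 1 if i <= 5000 else 2) for i in range(1, 5051)],
    )
    # Observation 1 is the first import: one event per sample.
    con.executemany(
        "INSERT INTO sample_observation_events VALUES (1, 'added', ?)",
        [(i,) for i in range(1, 5051)],
    )
    # Observation 2 is a small delta: four events in total.
    con.executemany(
        "INSERT INTO sample_observation_events VALUES (2, ?, ?)",
        [("verified", 10), ("deleted", 11), ("verified", 12), ("verified", 5010)],
    )
    con.commit()
    return con


def event_counts(con, sql, type_id, observation_id):
    """Return {event_kind: count} for one type in one observation."""
    rows = con.execute(sql, {"type_id": type_id, "obs_id": observation_id})
    return dict(rows.fetchall())


if __name__ == "__main__":
    con = build_demo_db()
    print(event_counts(con, SAMPLES_FIRST, 1, 2))
    print(event_counts(con, EVENTS_FIRST, 1, 2))
```

Both shapes return the same counts; they differ only in which table drives the join. Whether the events-first shape actually lowers `SummedFinalizeEventCountElapsed` on the real archive still has to come from a repeated full-profile report, and `EXPLAIN QUERY PLAN` is the way to confirm which table SQLite drives from when the join order is not pinned.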
## Current Diagnosis
`3.8s` rebuilding affected daily buckets. The next experiment changes the
affected-bucket rebuild query shape so it starts from time-ranged
`sample_versions` instead of all samples of the type.
+- The `bf5a861` follow-up reports validated that time-ranged daily aggregate
+ rebuilds worked: `SummedFinalizeDailyAggregateRebuildElapsed` was `0.0s` on
+ both an overnight-data run (`322` delta events, `21.8s` wall clock) and a
+ low-delta run (`2` delta events, `6.6s` wall clock). Daily aggregate cost is
+ no longer the active repeated-capture bottleneck.
+- After daily rebuild was removed, the next measured finalization floor is
+ event-count / previous-summary lookup. The old event-count query started from
+ `samples(sample_type_id)` and could traverse a high-volume type to count a
+ handful of events. The next query shape starts from
+ `sample_observation_events(observation_id, event_kind)` and joins to samples
+ for type filtering.
- A large older-build first import on an `8.4M`-record database completed but
took `166m10s`, with `137m31s` summed insert time. This confirms that full
authorized backup volume can be much larger than the original 15-type test
identity unless the build provenance is otherwise certain. `sourceCommit`
and `sourceDirty` are useful when present, but may be `unknown` for normal
Xcode test installs.
-8. Run a repeated full-profile capture after the time-ranged daily aggregate
- rebuild query. Compare `SummedFinalizeDailyAggregateRebuildElapsed` and Heart
- Rate `finalizeDailyAggregateRebuildElapsed` against the `6.9s` total /
- `3.8s` Heart Rate baseline from `6041bac`. Also watch first-import insert
- timing on the next clean large-database import because the new
- `sample_versions(start_date, sample_id)` index is a write-path tradeoff.
-9. Investigate replacing legacy compact `recordArchiveData` delta rebuild with
+8. Run a repeated full-profile capture after counting observation events from
+ the event table first. Compare `SummedFinalizeEventCountElapsed` and per-type
+ event-count time against the post-`bf5a861` overnight baseline: `2.5s` total,
+ Heart Rate `0.9s`, and Cycling Distance `0.7s`.
+9. Keep watching first-import insert timing on the next clean large-database
+ import because the new `sample_versions(start_date, sample_id)` index from
+ `bf5a861` is a write-path tradeoff.
+10. Investigate replacing legacy compact `recordArchiveData` delta rebuild with
a SQLite-derived capture-state/hash path. The current repeated full-profile
reports still spend about `4s` processing Heart Rate for tiny deltas because
the Swift compact archive is decoded and rewritten for the whole 900k-row
type.
-10. Investigate full-profile empty anchored-query cost for zero-count types.
+11. Investigate full-profile empty anchored-query cost for zero-count types.
Compare slow empty types across reports before changing behavior; any skip or
lower-frequency strategy must preserve the promise that full authorized
backup can notice newly appearing data.
-11. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
-12. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
-13. Profile whether index maintenance dominates first-import insert cost.
-14. Consider a guarded bulk-import mode for first observations:
+12. Run a non-chain-start/full-scan benchmark after skipping unchanged `verified` events and fast-pathing already-open visibility ranges. Compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, `Steps insertElapsed`, and `Walking + Running Distance insertElapsed`.
+13. Reduce any remaining per-sample SQLite writes for unchanged existing samples during non-chain-start full scans.
+14. Profile whether index maintenance dominates first-import insert cost.
+15. Consider a guarded bulk-import mode for first observations:
- keep archive semantics unchanged;
- only relax work that can be safely reconstructed or validated;
- re-enable normal idempotent paths for incremental observations.
-15. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
-16. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
-17. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
-18. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
+16. Run a fresh first-import benchmark after the unused-index removal and compare `SummedInsertElapsed`, `Heart Rate insertElapsed`, and `Active Energy insertElapsed`.
+17. Investigate whether first-import-only deferred index creation or temporary staging tables can reduce `samples` / `sample_versions` / `sample_observation_events` write cost without weakening final archive integrity.
+18. Revisit adaptive page sizes only after SQLite write-path costs are reduced.
+19. Revisit background / scheduled collection once initial import can finish reliably and post-import UI recovery is bounded.
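
The first-import items above (guarded bulk-import mode and deferred index creation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the archive's actual write path: only the table name `sample_versions` and the `(start_date, sample_id)` index columns come from this log, while `version_id`, the index name, and the helper functions `first_import` and `archive_fingerprint` are hypothetical. The sketch loads the same rows with the index created before or after the bulk insert, then checks that the final state is identical.

```python
import sqlite3

# Index columns are from the log; the index name is an assumption.
INDEX_SQL = (
    "CREATE INDEX sample_versions_by_start "
    "ON sample_versions(start_date, sample_id)"
)


def first_import(rows, defer_index):
    """Load (sample_id, start_date) rows, creating the index before or after."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE sample_versions ("
        " version_id INTEGER PRIMARY KEY,"
        " sample_id INTEGER NOT NULL,"
        " start_date REAL NOT NULL)"
    )
    if not defer_index:
        con.execute(INDEX_SQL)  # normal path: index maintained on every insert
    with con:  # commit the whole bulk load as one transaction
        con.executemany(
            "INSERT INTO sample_versions (sample_id, start_date) VALUES (?, ?)",
            rows,
        )
    if defer_index:
        con.execute(INDEX_SQL)  # deferred path: build the index once at the end
    return con


def archive_fingerprint(con):
    """Summarize final state so the two import modes can be compared."""
    count = con.execute("SELECT COUNT(*) FROM sample_versions").fetchone()[0]
    indexes = [
        row[0]
        for row in con.execute(
            "SELECT name FROM sqlite_master"
            " WHERE type = 'index' AND tbl_name = 'sample_versions'"
            " ORDER BY name"
        )
    ]
    window = con.execute(
        "SELECT sample_id, start_date FROM sample_versions"
        " WHERE start_date BETWEEN ? AND ?"
        " ORDER BY start_date, sample_id",
        (1010.0, 1012.0),
    ).fetchall()
    return count, indexes, window


if __name__ == "__main__":
    rows = [(i, 1000.0 + i % 97) for i in range(1, 2001)]
    eager = first_import(rows, defer_index=False)
    deferred = first_import(rows, defer_index=True)
    print(archive_fingerprint(eager) == archive_fingerprint(deferred))
    print(deferred.execute("PRAGMA integrity_check").fetchone())
```

The sketch only demonstrates that deferring the index leaves the same rows, the same index, and the same time-window query results. It makes no claim about speed; any real change should be judged by `SummedInsertElapsed` on a clean large-database import.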
## Verification Checklist For Each Optimization