Roadmap
A multi-device biometric awareness app for iPhone — training interoceptive awareness through contemplative AI reflection.
The working product
Notice is in closed beta via TestFlight with 8+ testers from the Jhourney contemplative community. The core loop works end-to-end: tap your Apple Watch, Garmin, or phone when you notice a shift → the app captures biometric context from whichever device delivered the data → you debrief with a felt-sense emotion picker → Claude generates a contemplative reflection grounded in your patterns. Oura Ring feeds overnight baseline context automatically — enriching the reflection without requiring a tap.
Fifteen development sessions have produced:
The core snap-debrief-reflection loop. A Watch tap or the phone's floating action button captures a Frame Snap with heart rate and HRV from HealthKit. The phone FAB (iOS 26 liquid glass) means you don't need a Watch to use Notice — it's a full-loop experience on iPhone alone. The debrief screen features a description-first layout and a two-layer felt-sense picker organized by somatic texture (Alive, Settled, Open, Heavy, Stirred, Tight — six groups × three emotion labels each, grounded in Gendlin's Focusing). Claude streams a contemplative reflection using a system prompt built on Jhourney pedagogy — orienting toward how you're relating to experience, never prescribing what to feel.
Three tiers of AI reflection. Brief reflections at snap time (one sentence, oriented toward conductivity). Exploratory reflections during debrief (a paragraph, oriented toward curiosity). Daily and weekly synthesis reflections that surface longitudinal patterns across snaps — built hierarchically so weekly reflections consume daily syntheses rather than re-aggregating raw data.
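The hierarchical build can be sketched as follows (a minimal TypeScript illustration; `DailySynthesis` and `buildWeeklyPrompt` are hypothetical names, not the app's actual API). The point is that weekly token cost stays flat regardless of how many raw snaps the week contained.

```typescript
// Hypothetical sketch: weekly reflections consume the daily syntheses,
// never the raw snaps, so prompt size scales with days, not snaps.
interface DailySynthesis {
  date: string;      // ISO date, e.g. "2024-06-03"
  snapCount: number;
  summary: string;   // the daily synthesis text
}

function buildWeeklyPrompt(days: DailySynthesis[]): string {
  const lines = [...days]
    .sort((a, b) => a.date.localeCompare(b.date))
    .map((d) => `${d.date} (${d.snapCount} snaps): ${d.summary}`);
  return `Reflect on the week's arc:\n${lines.join("\n")}`;
}
```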
On-device intelligence via Apple Foundation Models. A two-tier AI architecture: Tier 1 (Apple Foundation Models, on-device) handles context assembly — reading HealthKit trends, calendar, location, recent snaps, and producing a structured summary that strips all absolute values and identifying details. Tier 2 (Claude via cloud) sees only those summaries. The on-device interpreter runs in under a second. Context assembly completes in under three seconds. Nothing raw leaves the phone.
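The Tier 1 → Tier 2 boundary can be made concrete with a sketch (field names and thresholds here are assumptions for illustration; the app's actual types are Swift): the on-device interpreter emits only relative descriptors, never raw values or identifying details.

```typescript
// Illustrative de-identification boundary: raw context in, descriptors out.
interface RawContext {
  heartRate: number;         // bpm; never leaves the device
  baselineHeartRate: number;
  locationName: string;      // identifying; never leaves the device
}

interface DeidentifiedSummary {
  heartRateDescriptor: "elevated" | "typical" | "low";
  settingDescriptor: "familiar place" | "new place";
}

function assembleSummary(raw: RawContext, knownPlaces: Set<string>): DeidentifiedSummary {
  const ratio = raw.heartRate / raw.baselineHeartRate;
  return {
    heartRateDescriptor: ratio > 1.15 ? "elevated" : ratio < 0.85 ? "low" : "typical",
    settingDescriptor: knownPlaces.has(raw.locationName) ? "familiar place" : "new place",
  };
}
```

Only the `DeidentifiedSummary` shape ever reaches Tier 2.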
Voice-initiated snaps. Siri and AirPods integration for hands-free capture. “Hey Siri, I noticed something” triggers a Frame Snap without looking at a screen — critical for capturing shifts during activities where pulling out a phone breaks the moment.
Stateless API proxy. A Cloudflare Worker sits between the app and the Claude API, holding the API key server-side and validating device identity via Apple's App Attest before forwarding any request. No on-device key storage, rotation without app updates, per-device rate limiting.
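A structural sketch of the proxy (not the production Worker): the attestation check here is stubbed, since real App Attest verification validates Apple's certificate chain and the signed assertion server-side. `ANTHROPIC_API_KEY` lives in Worker secrets, never in the app bundle.

```typescript
type Env = { ANTHROPIC_API_KEY: string };

// Stub: a real implementation verifies the App Attest assertion.
async function verifyAttestation(header: string | null): Promise<boolean> {
  return header !== null && header.length > 0;
}

async function handleRequest(req: Request, env: Env): Promise<Response> {
  const attested = await verifyAttestation(req.headers.get("X-App-Attest"));
  if (!attested) return new Response("unattested device", { status: 401 });

  // Forward to the Claude API, injecting the server-held key.
  return fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "x-api-key": env.ANTHROPIC_API_KEY,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: await req.text(),
  });
}
```

Unattested requests are rejected before the key is ever used, which is what makes server-side rotation and the kill switch cheap.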
Privacy architecture as compliance strategy. Privacy in a multi-device system is about what path data traveled, not a binary on/off. Apple Watch and Garmin data stays entirely on-device — raw biometrics never leave the phone. Oura data transits Oura’s cloud (ring → Oura app → Oura Cloud → REST API → iPhone) but never touches Notice infrastructure — the fetch is client-side. All paths converge to the same de-identified format before reaching Claude. Only structured summaries transit through the proxy. This isn’t just a privacy choice — it’s a regulatory strategy that keeps Notice within the FDA’s General Wellness Guidance and limits FTC Health Breach Notification Rule exposure.
Multi-device biometric integration. Three hardware ecosystems integrated through a single BiometricSnapshot protocol. Apple Watch captures snap-time HR and HRV (SDNN) via HealthKit and WatchConnectivity. Garmin Enduro 3 captures snap-time HR, HRV (RMSSD), stress, and Body Battery via the Connect IQ Companion SDK over BLE — topologically identical to Apple Watch, mechanically different. Oura Ring 3 provides overnight baseline context (nighttime RMSSD, sleep stages, readiness scores) via cloud REST API with OAuth authentication — a different trust topology that feeds the baseline context path rather than the snap-time biometric path.
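The normalization idea can be sketched like this (the app's actual type is a Swift value type; this TypeScript mirror is illustrative only). Each device adapter maps its payload into the one shared shape, so downstream consumers never branch on source.

```typescript
type Source = "appleWatch" | "garmin" | "oura";

// Shared shape; fields absent from a given device stay undefined.
interface BiometricSnapshot {
  source: Source;
  timestamp: number;    // Unix seconds: the user's button press, not packet receipt
  heartRate?: number;   // snap-time HR (Watch, Garmin)
  hrvSDNN?: number;     // Apple Watch HRV metric
  hrvRMSSD?: number;    // Garmin snap-time / Oura overnight metric
  stressScore?: number; // Garmin only
  bodyBattery?: number; // Garmin only
}

// One adapter as an example; the payload field names are assumptions.
function fromGarmin(p: { ts: number; hr: number; rmssd: number; stress: number; battery: number }): BiometricSnapshot {
  return {
    source: "garmin",
    timestamp: p.ts,
    heartRate: p.hr,
    hrvRMSSD: p.rmssd,
    stressScore: p.stress,
    bodyBattery: p.battery,
  };
}
```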
Key architectural decisions. BiometricSnapshot normalizes all three sources into a single value type — downstream consumers never know which device delivered the data. SDNN and RMSSD are tracked as separate fields with separate relative descriptor functions, preventing the category error of comparing metrics that measure different physiological signals. Garmin button-press timestamps are preserved through the pipeline (converted from Garmin epoch) — the snap happened when the user pressed the button, not when the phone received the BLE packet. Relative descriptor functions (relativeHRV, relativeHrvRMSSD, relativeStressScore, relativeBodyBattery) strip absolute values before data reaches Claude. The full architectural analysis is in Trust Topologies.
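Two of these decisions, sketched (TypeScript for illustration; the shipped implementations are Swift, and the descriptor thresholds below are assumptions). The epoch conversion assumes Garmin's FIT-style epoch of 1989-12-31T00:00:00Z, which sits 631,065,600 seconds after the Unix epoch.

```typescript
const GARMIN_EPOCH_OFFSET = 631_065_600; // seconds between Unix and FIT epochs

// Preserve the moment the user pressed the button, not BLE receipt time.
function garminToUnix(fitSeconds: number): number {
  return fitSeconds + GARMIN_EPOCH_OFFSET;
}

// Descriptor, not number: absolute values never reach Claude.
function relativeHRV(sdnn: number, baselineSDNN: number): string {
  const ratio = sdnn / baselineSDNN;
  if (ratio > 1.2) return "higher than your recent baseline";
  if (ratio < 0.8) return "lower than your recent baseline";
  return "near your recent baseline";
}
```

A parallel `relativeHrvRMSSD` would compare RMSSD only against an RMSSD baseline, never against SDNN, which is the category error the separate fields prevent.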
Immediate priorities
On-Device Reflection Model
The highest-leverage technical milestone. Moving .brief reflections on-device eliminates the largest cost center (~80% of API calls), makes the Core pricing tier viable at zero marginal cost, and delivers the privacy promise in its fullest form.
Runtime reality. MLX is currently blocked for 3B models on iPhone due to memory overhead (~15 GB for Llama 3.2 3B 4-bit vs. llama.cpp's ~3.67 GB). Two viable paths: llama.cpp for 3B models (4–8 second generation, higher quality) or MLX for 1B–1.7B models (under 2 seconds, lower ceiling).
Model candidates. SmolLM3-3B is the leading candidate — purpose-built for on-device, strong instruction-following, Apache 2.0. Llama 3.2 3B is the safe default. Qwen3 1.7B for the MLX/small-model path. Recommendation: benchmark SmolLM3-3B via llama.cpp against Qwen3 1.7B via MLX on target hardware.
Training pipeline. Teacher-student distillation from Claude API. Target: 1,200 reflection examples plus 150 correction examples that demonstrate constraint boundaries. The correction examples are critical — LoRA fine-tuning on domain-specific output degrades general instruction-following without them.
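One record of that training set might look like the sketch below (the schema is an assumption; the pipeline's actual format isn't specified here). Correction examples pair a constraint-violating draft with a corrected output, so fine-tuning learns the boundary itself rather than just the domain style.

```typescript
// Hypothetical distillation record shape.
interface DistillationExample {
  kind: "reflection" | "correction";
  context: string;   // de-identified summary fed to the teacher
  rejected?: string; // constraint-violating draft (corrections only)
  output: string;    // teacher (Claude) output to imitate
}

const correction: DistillationExample = {
  kind: "correction",
  context: "HRV lower than baseline; user labeled the moment 'Tight'.",
  rejected: "Your HRV of 32 ms suggests elevated stress.", // raw value + diagnosis
  output: "Something in you registered the tightness before you named it.",
};
```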
Hybrid routing. .brief reflections: on-device primary, cloud fallback — covering snaps from all three device sources (Apple Watch, Garmin, Oura baseline). .exploratory: cloud primary for now. .daily and .weekly synthesis: always cloud. On-device .brief covers ~80% of API calls.
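The routing policy above reduces to a small decision table (a sketch; names are illustrative):

```typescript
type Tier = "brief" | "exploratory" | "daily" | "weekly";
type Route = "onDevice" | "cloud";

function routeReflection(tier: Tier, onDeviceModelReady: boolean): Route {
  switch (tier) {
    case "brief":
      // On-device primary, cloud fallback: the ~80% case.
      return onDeviceModelReady ? "onDevice" : "cloud";
    case "exploratory": // cloud primary for now
    case "daily":       // synthesis is always cloud
    case "weekly":
      return "cloud";
  }
}
```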
Three-tier evaluation. Tier 1: automated constraint gate (no diagnostic language, no raw biometric values, no prescriptive framing). Tier 2: LLM-as-judge scoring relational orientation, phenomenological precision, novelty, tone. Tier 3: blind A/B with experienced practitioners. Ship threshold: Tier 1 >99% pass, Tier 2 within 15% of Claude baseline, Tier 3 preference >40%.
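Tier 1 is the cheapest gate and can be sketched as pattern checks (real pattern lists would be far larger; these three are illustrative):

```typescript
// Reject before ship if any constraint pattern matches.
const DIAGNOSTIC = /\b(anxiety disorder|depression|diagnos\w*|symptom)\b/i;
const RAW_VALUE = /\b\d+(\.\d+)?\s?(bpm|ms)\b/i;  // leaked biometric numbers
const PRESCRIPTIVE = /\byou (should|must|need to)\b/i;

function passesConstraintGate(reflection: string): boolean {
  return ![DIAGNOSTIC, RAW_VALUE, PRESCRIPTIVE].some((p) => p.test(reflection));
}
```

Anything the gate rejects falls back to a retry or a cloud regeneration; only gate-passing text reaches Tiers 2 and 3.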
Foundation Models Integration
Apple's on-device Foundation Models need hardware validation. The interpreter should complete in under one second, context assembly in under three seconds — on physical devices, not simulators. If latency exceeds budget, falling back to direct framework calls is straightforward. Adapted Tool infrastructure exists for HealthKit, Calendar, Location, and recent snap retrieval. The key unknown: how the interpreter performs with complex multi-source assembly instructions on A17/A18 silicon under real memory pressure from HealthKit background delivery. Multi-device support makes this more relevant — the context window now includes data from three devices with different temporal characteristics (snap-time from Watch/Garmin, overnight baseline from Oura).
Beta Support and Feedback
Structured feedback capture. TestFlight's built-in mechanism loses context. A lightweight in-app mechanism (shake to report, or a prompt after the 5th snap) captures context-rich feedback at the moment of use. The taxonomy resonance question — “which words do you actually reach for?” — is both a research question and an explicit feedback prompt.
Tester segmentation. Not all testers are the same. Experienced meditators push the felt-sense vocabulary hard; newer practitioners surface onboarding friction. A simple tracking table (practice background, device/Watch pairing, Apple Intelligence availability) lets you interpret feedback correctly.
Lapsed tester outreach. The most valuable beta data isn't what active users do — it's why people stop. A simple email (not push notification) to lapsed testers surfaces the mundane friction that kills adoption. One lapsed-tester interview is worth twenty active-user feature requests.
Bug reproduction context. A lightweight diagnostic log (stored locally, shared only on user-initiated report) capturing app state transitions and error codes — never snap content, emotion labels, or biometric data.
Engineering Resilience
Degraded and offline behavior. What happens when the Claude API is unreachable? When HealthKit returns no recent samples? When the Watch disconnects mid-snap? Snaps must be captured and persisted regardless. Each failure mode needs an explicit design: no-network (queue reflections), no-HealthKit-data (snap without biometrics), Watch-disconnect (phone-side snap still works), API-error (retry with backoff), Garmin-BLE-disconnect (snap persists on the watch and retransmits on reconnection), Oura-API-unavailable (baseline context is stale but snaps still work — the system degrades gracefully because Oura feeds context, not the core loop).
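The API-error case made concrete (a sketch; attempt counts and delays are assumptions): retry with exponential backoff, and let the caller queue the reflection if retries are exhausted. The snap itself is persisted before any network call, so failure never loses data.

```typescript
async function withBackoff<T>(
  call: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await call();
    } catch (e) {
      lastError = e;
      // 500 ms, 1 s, 2 s, ... between attempts; no sleep after the last one.
      if (i < attempts - 1) await sleep(baseDelayMs * 2 ** i);
    }
  }
  throw lastError; // caller queues the reflection for later delivery
}
```

The injectable `sleep` keeps the policy testable without real delays.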
API cost control. Per-user usage budgets, a soft daily cap on exploratory reflections, batching daily synthesis to a single API call, and monitoring token usage per user during beta to establish the cost curve before setting prices.
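Two of those controls, sketched (the cap value and names are assumptions): a soft daily cap that degrades rather than refuses, plus a per-user token tally to establish the cost curve during beta.

```typescript
const DAILY_EXPLORATORY_CAP = 10; // soft cap: past it, degrade to brief, don't refuse

function exploratoryAllowed(usedToday: number): boolean {
  return usedToday < DAILY_EXPLORATORY_CAP;
}

// Per-user token accounting for the beta cost curve.
class TokenLedger {
  private totals = new Map<string, number>();

  record(userId: string, inputTokens: number, outputTokens: number): void {
    this.totals.set(userId, (this.totals.get(userId) ?? 0) + inputTokens + outputTokens);
  }

  totalFor(userId: string): number {
    return this.totals.get(userId) ?? 0;
  }
}
```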
Crash reporting under privacy constraints. Most third-party crash reporting SDKs are risky under the FTC Health Breach Notification Rule. Apple's built-in crash reports via Xcode Organizer are the path of least resistance — no third-party SDK, data stays within Apple's ecosystem. MetricKit for performance diagnostics. No third-party analytics or crash reporting SDKs unless they can be proven to never exfiltrate health-adjacent data.
SwiftData schema migration planning. Document the expected schema evolution now — future features will require new model fields and relationships. SwiftData lightweight migrations handle additive changes, but anything more complex needs explicit migration plans.
API proxy and key management (resolved). Cloudflare Worker proxy with App Attest attestation. Eliminates all on-device key storage, enables rotation without app updates, provides per-device rate limiting and abuse detection, adds a server-side kill switch.