Roadmap
A multi-device biometric awareness app for iPhone — training interoceptive awareness through contemplative AI reflection.
The working product
Notice is in closed beta via TestFlight with 8+ testers from the Jhourney contemplative community. The core loop works end-to-end: tap your Apple Watch, Garmin, or phone when you notice a shift → the app captures biometric context from whichever device delivered the data → you debrief with a felt-sense emotion picker → Claude generates a contemplative reflection grounded in your patterns. Oura Ring feeds overnight baseline context automatically — enriching the reflection without requiring a tap.
Fifteen development sessions have produced:
The core snap-debrief-reflection loop. A Watch tap or the phone's floating action button captures a Frame Snap with heart rate and HRV from HealthKit. The phone FAB (iOS 26 liquid glass) means you don't need a Watch to use Notice — it's a full-loop experience on iPhone alone. The debrief screen features a description-first layout and a two-layer felt-sense picker organized by somatic texture (Alive, Settled, Open, Heavy, Stirred, Tight — six groups × three emotion labels each, grounded in Gendlin's Focusing). Claude streams a contemplative reflection using a system prompt built on Jhourney pedagogy — orienting toward how you're relating to experience, never prescribing what to feel.
Three tiers of AI reflection. Brief reflections at snap time (one sentence, oriented toward conductivity). Exploratory reflections during debrief (a paragraph, oriented toward curiosity). Daily and weekly synthesis reflections that surface longitudinal patterns across snaps — built hierarchically so weekly reflections consume daily syntheses rather than re-aggregating raw data.
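The hierarchical build can be sketched as follows (a minimal TypeScript illustration; `DailySynthesis` and `buildWeeklyPrompt` are hypothetical names, not the app's actual API). The point is that weekly token cost stays flat regardless of how many raw snaps the week contained.

```typescript
// Hypothetical sketch: weekly reflections consume the daily syntheses,
// never the raw snaps, so prompt size scales with days, not snaps.
interface DailySynthesis {
  date: string;      // ISO date, e.g. "2024-06-03"
  snapCount: number;
  summary: string;   // the daily synthesis text
}

function buildWeeklyPrompt(days: DailySynthesis[]): string {
  const lines = [...days]
    .sort((a, b) => a.date.localeCompare(b.date))
    .map((d) => `${d.date} (${d.snapCount} snaps): ${d.summary}`);
  return `Reflect on the week's arc:\n${lines.join("\n")}`;
}
```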
On-device intelligence via Apple Foundation Models. A two-tier AI architecture: Tier 1 (Apple Foundation Models, on-device) handles context assembly — reading HealthKit trends, calendar, location, recent snaps, and producing a structured summary that strips all absolute values and identifying details. Tier 2 (Claude via cloud) sees only those summaries. The on-device interpreter runs in under a second. Context assembly completes in under three seconds. Nothing raw leaves the phone.
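The Tier 1 → Tier 2 boundary can be made concrete with a sketch (field names and thresholds here are assumptions for illustration; the app's actual types are Swift): the on-device interpreter emits only relative descriptors, never raw values or identifying details.

```typescript
// Illustrative de-identification boundary: raw context in, descriptors out.
interface RawContext {
  heartRate: number;         // bpm; never leaves the device
  baselineHeartRate: number;
  locationName: string;      // identifying; never leaves the device
}

interface DeidentifiedSummary {
  heartRateDescriptor: "elevated" | "typical" | "low";
  settingDescriptor: "familiar place" | "new place";
}

function assembleSummary(raw: RawContext, knownPlaces: Set<string>): DeidentifiedSummary {
  const ratio = raw.heartRate / raw.baselineHeartRate;
  return {
    heartRateDescriptor: ratio > 1.15 ? "elevated" : ratio < 0.85 ? "low" : "typical",
    settingDescriptor: knownPlaces.has(raw.locationName) ? "familiar place" : "new place",
  };
}
```

Only the `DeidentifiedSummary` shape ever reaches Tier 2.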
Voice-initiated snaps. Siri and AirPods integration for hands-free capture. “Hey Siri, I noticed something” triggers a Frame Snap without looking at a screen — critical for capturing shifts during activities where pulling out a phone breaks the moment.
Stateless API proxy. A Cloudflare Worker sits between the app and the Claude API, holding the API key server-side and validating device identity via Apple's App Attest before forwarding any request. No on-device key storage, rotation without app updates, per-device rate limiting.
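A structural sketch of the proxy (not the production Worker): the attestation check here is stubbed, since real App Attest verification validates Apple's certificate chain and the signed assertion server-side. `ANTHROPIC_API_KEY` lives in Worker secrets, never in the app bundle.

```typescript
type Env = { ANTHROPIC_API_KEY: string };

// Stub: a real implementation verifies the App Attest assertion.
async function verifyAttestation(header: string | null): Promise<boolean> {
  return header !== null && header.length > 0;
}

async function handleRequest(req: Request, env: Env): Promise<Response> {
  const attested = await verifyAttestation(req.headers.get("X-App-Attest"));
  if (!attested) return new Response("unattested device", { status: 401 });

  // Forward to the Claude API, injecting the server-held key.
  return fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "x-api-key": env.ANTHROPIC_API_KEY,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: await req.text(),
  });
}
```

Unattested requests are rejected before the key is ever used, which is what makes server-side rotation and the kill switch cheap.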
Privacy architecture as compliance strategy. Privacy in a multi-device system is about what path data traveled, not a binary on/off. Apple Watch and Garmin data stays entirely on-device — raw biometrics never leave the phone. Oura data transits Oura’s cloud (ring → Oura app → Oura Cloud → REST API → iPhone) but never touches Notice infrastructure — the fetch is client-side. All paths converge to the same de-identified format before reaching Claude. Only structured summaries transit through the proxy. This isn’t just a privacy choice — it’s a regulatory strategy that keeps Notice within the FDA’s General Wellness Guidance and limits FTC Health Breach Notification Rule exposure.
Multi-device biometric integration. Three hardware ecosystems integrated through a single BiometricSnapshot protocol. Apple Watch captures snap-time HR and HRV (SDNN) via HealthKit and WatchConnectivity. Garmin Enduro 3 captures snap-time HR, HRV (RMSSD), stress, and Body Battery via the Connect IQ Companion SDK over BLE — topologically identical to Apple Watch, mechanically different. Oura Ring 3 provides overnight baseline context (nighttime RMSSD, sleep stages, readiness scores) via cloud REST API with OAuth authentication — a different trust topology that feeds the baseline context path rather than the snap-time biometric path.
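The normalization idea can be sketched like this (the app's actual type is a Swift value type; this TypeScript mirror is illustrative only). Each device adapter maps its payload into the one shared shape, so downstream consumers never branch on source.

```typescript
type Source = "appleWatch" | "garmin" | "oura";

// Shared shape; fields absent from a given device stay undefined.
interface BiometricSnapshot {
  source: Source;
  timestamp: number;    // Unix seconds: the user's button press, not packet receipt
  heartRate?: number;   // snap-time HR (Watch, Garmin)
  hrvSDNN?: number;     // Apple Watch HRV metric
  hrvRMSSD?: number;    // Garmin snap-time / Oura overnight metric
  stressScore?: number; // Garmin only
  bodyBattery?: number; // Garmin only
}

// One adapter as an example; the payload field names are assumptions.
function fromGarmin(p: { ts: number; hr: number; rmssd: number; stress: number; battery: number }): BiometricSnapshot {
  return {
    source: "garmin",
    timestamp: p.ts,
    heartRate: p.hr,
    hrvRMSSD: p.rmssd,
    stressScore: p.stress,
    bodyBattery: p.battery,
  };
}
```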
Key architectural decisions. BiometricSnapshot normalizes all three sources into a single value type — downstream consumers never know which device delivered the data. SDNN and RMSSD are tracked as separate fields with separate relative descriptor functions, preventing the category error of comparing metrics that measure different physiological signals. Garmin button-press timestamps are preserved through the pipeline (converted from Garmin epoch) — the snap happened when the user pressed the button, not when the phone received the BLE packet. Relative descriptor functions (relativeHRV, relativeHrvRMSSD, relativeStressScore, relativeBodyBattery) strip absolute values before data reaches Claude. The full architectural analysis is in Trust Topologies.
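Two of these decisions, sketched (TypeScript for illustration; the shipped implementations are Swift, and the descriptor thresholds below are assumptions). The epoch conversion assumes Garmin's FIT-style epoch of 1989-12-31T00:00:00Z, which sits 631,065,600 seconds after the Unix epoch.

```typescript
const GARMIN_EPOCH_OFFSET = 631_065_600; // seconds between Unix and FIT epochs

// Preserve the moment the user pressed the button, not BLE receipt time.
function garminToUnix(fitSeconds: number): number {
  return fitSeconds + GARMIN_EPOCH_OFFSET;
}

// Descriptor, not number: absolute values never reach Claude.
function relativeHRV(sdnn: number, baselineSDNN: number): string {
  const ratio = sdnn / baselineSDNN;
  if (ratio > 1.2) return "higher than your recent baseline";
  if (ratio < 0.8) return "lower than your recent baseline";
  return "near your recent baseline";
}
```

A parallel `relativeHrvRMSSD` would compare RMSSD only against an RMSSD baseline, never against SDNN, which is the category error the separate fields prevent.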
Immediate priorities
On-Device Reflection Model
The highest-leverage technical milestone. Moving .brief reflections on-device eliminates the largest cost center (~80% of API calls), makes the Core pricing tier viable at zero marginal cost, and delivers the privacy promise in its fullest form.
Runtime reality. MLX is currently blocked for 3B models on iPhone due to memory overhead (~15 GB for Llama 3.2 3B 4-bit vs. llama.cpp's ~3.67 GB). Two viable paths: llama.cpp for 3B models (4–8 second generation, higher quality) or MLX for 1B–1.7B models (under 2 seconds, lower ceiling).
Model candidates. SmolLM3-3B is the leading candidate — purpose-built for on-device, strong instruction-following, Apache 2.0. Llama 3.2 3B is the safe default. Qwen3 1.7B for the MLX/small-model path. Recommendation: benchmark SmolLM3-3B via llama.cpp against Qwen3 1.7B via MLX on target hardware.
Training pipeline. Teacher-student distillation from Claude API. Target: 1,200 reflection examples plus 150 correction examples that demonstrate constraint boundaries. The correction examples are critical — LoRA fine-tuning on domain-specific output degrades general instruction-following without them.
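One record of that training set might look like the sketch below (the schema is an assumption; the pipeline's actual format isn't specified here). Correction examples pair a constraint-violating draft with a corrected output, so fine-tuning learns the boundary itself rather than just the domain style.

```typescript
// Hypothetical distillation record shape.
interface DistillationExample {
  kind: "reflection" | "correction";
  context: string;   // de-identified summary fed to the teacher
  rejected?: string; // constraint-violating draft (corrections only)
  output: string;    // teacher (Claude) output to imitate
}

const correction: DistillationExample = {
  kind: "correction",
  context: "HRV lower than baseline; user labeled the moment 'Tight'.",
  rejected: "Your HRV of 32 ms suggests elevated stress.", // raw value + diagnosis
  output: "Something in you registered the tightness before you named it.",
};
```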
Hybrid routing. .brief reflections: on-device primary, cloud fallback — covering snaps from all three device sources (Apple Watch, Garmin, Oura baseline). .exploratory: cloud primary for now. .daily and .weekly synthesis: always cloud. On-device .brief covers ~80% of API calls.
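The routing policy above reduces to a small decision table (a sketch; names are illustrative):

```typescript
type Tier = "brief" | "exploratory" | "daily" | "weekly";
type Route = "onDevice" | "cloud";

function routeReflection(tier: Tier, onDeviceModelReady: boolean): Route {
  switch (tier) {
    case "brief":
      // On-device primary, cloud fallback: the ~80% case.
      return onDeviceModelReady ? "onDevice" : "cloud";
    case "exploratory": // cloud primary for now
    case "daily":       // synthesis is always cloud
    case "weekly":
      return "cloud";
  }
}
```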
Three-tier evaluation. Tier 1: automated constraint gate (no diagnostic language, no raw biometric values, no prescriptive framing). Tier 2: LLM-as-judge scoring relational orientation, phenomenological precision, novelty, tone. Tier 3: blind A/B with experienced practitioners. Ship threshold: Tier 1 >99% pass, Tier 2 within 15% of Claude baseline, Tier 3 preference >40%.
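Tier 1 is the cheapest gate and can be sketched as pattern checks (real pattern lists would be far larger; these three are illustrative):

```typescript
// Reject before ship if any constraint pattern matches.
const DIAGNOSTIC = /\b(anxiety disorder|depression|diagnos\w*|symptom)\b/i;
const RAW_VALUE = /\b\d+(\.\d+)?\s?(bpm|ms)\b/i;  // leaked biometric numbers
const PRESCRIPTIVE = /\byou (should|must|need to)\b/i;

function passesConstraintGate(reflection: string): boolean {
  return ![DIAGNOSTIC, RAW_VALUE, PRESCRIPTIVE].some((p) => p.test(reflection));
}
```

Anything the gate rejects falls back to a retry or a cloud regeneration; only gate-passing text reaches Tiers 2 and 3.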
Foundation Models Integration
Apple's on-device Foundation Models need hardware validation. The interpreter should complete in under one second, context assembly in under three seconds — on physical devices, not simulators. If latency exceeds budget, falling back to direct framework calls is straightforward. Adapted Tool infrastructure exists for HealthKit, Calendar, Location, and recent snap retrieval. The key unknown: how the interpreter performs with complex multi-source assembly instructions on A17/A18 silicon under real memory pressure from HealthKit background delivery. Multi-device support makes this more relevant — the context window now includes data from three devices with different temporal characteristics (snap-time from Watch/Garmin, overnight baseline from Oura).
Beta Support and Feedback
Structured feedback capture. TestFlight's built-in mechanism loses context. A lightweight in-app mechanism (shake to report, or a prompt after the 5th snap) captures context-rich feedback at the moment of use. The taxonomy resonance question — “which words do you actually reach for?” — is both a research question and an explicit feedback prompt.
Tester segmentation. Not all testers are the same. Experienced meditators push the felt-sense vocabulary hard; newer practitioners surface onboarding friction. A simple tracking table (practice background, device/Watch pairing, Apple Intelligence availability) lets you interpret feedback correctly.
Lapsed tester outreach. The most valuable beta data isn't what active users do — it's why people stop. A simple email (not push notification) to lapsed testers surfaces the mundane friction that kills adoption. One lapsed-tester interview is worth twenty active-user feature requests.
Bug reproduction context. A lightweight diagnostic log (stored locally, shared only on user-initiated report) capturing app state transitions and error codes — never snap content, emotion labels, or biometric data.
Engineering Resilience
Degraded and offline behavior. What happens when the Claude API is unreachable? When HealthKit returns no recent samples? When the Watch disconnects mid-snap? Snaps must be captured and persisted regardless. Each failure mode needs an explicit design: no-network (queue reflections), no-HealthKit-data (snap without biometrics), Watch-disconnect (phone-side snap still works), API-error (retry with backoff), Garmin-BLE-disconnect (snap persists on the watch and retransmits on reconnection), Oura-API-unavailable (baseline context is stale but snaps still work — the system degrades gracefully because Oura feeds context, not the core loop).
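The API-error case made concrete (a sketch; attempt counts and delays are assumptions): retry with exponential backoff, and let the caller queue the reflection if retries are exhausted. The snap itself is persisted before any network call, so failure never loses data.

```typescript
async function withBackoff<T>(
  call: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await call();
    } catch (e) {
      lastError = e;
      // 500 ms, 1 s, 2 s, ... between attempts; no sleep after the last one.
      if (i < attempts - 1) await sleep(baseDelayMs * 2 ** i);
    }
  }
  throw lastError; // caller queues the reflection for later delivery
}
```

The injectable `sleep` keeps the policy testable without real delays.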
API cost control. Per-user usage budgets, a soft daily cap on exploratory reflections, batching daily synthesis to a single API call, and monitoring token usage per user during beta to establish the cost curve before setting prices.
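Two of those controls, sketched (the cap value and names are assumptions): a soft daily cap that degrades rather than refuses, plus a per-user token tally to establish the cost curve during beta.

```typescript
const DAILY_EXPLORATORY_CAP = 10; // soft cap: past it, degrade to brief, don't refuse

function exploratoryAllowed(usedToday: number): boolean {
  return usedToday < DAILY_EXPLORATORY_CAP;
}

// Per-user token accounting for the beta cost curve.
class TokenLedger {
  private totals = new Map<string, number>();

  record(userId: string, inputTokens: number, outputTokens: number): void {
    this.totals.set(userId, (this.totals.get(userId) ?? 0) + inputTokens + outputTokens);
  }

  totalFor(userId: string): number {
    return this.totals.get(userId) ?? 0;
  }
}
```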
Crash reporting under privacy constraints. Most third-party crash reporting SDKs are risky under the FTC Health Breach Notification Rule. Apple's built-in crash reports via Xcode Organizer are the path of least resistance — no third-party SDK, data stays within Apple's ecosystem. MetricKit for performance diagnostics. No third-party analytics or crash reporting SDKs unless they can be proven to never exfiltrate health-adjacent data.
SwiftData schema migration planning. Document the expected schema evolution now — future features will require new model fields and relationships. SwiftData lightweight migrations handle additive changes, but anything more complex needs explicit migration plans.
API proxy and key management (resolved). Cloudflare Worker proxy with App Attest attestation. Eliminates all on-device key storage, enables rotation without app updates, provides per-device rate limiting and abuse detection, adds a server-side kill switch.