Case studies

Proof-of-Work Writeups.

Short technical writeups for the Modus product systems: what was hard, what was built, what ran, and where each system points next.

Public product: Modus Verify

Modus Verify: verifier-guided RL for proof generation

Problem: Reasoning models need reward signals that track correctness, not surface fluency.
System: A Lean-backed training loop using verifier feedback as the reward signal for proof generation.
Technical Core: GRPO/RL over theorem tasks with Lean reward oracles, trajectory analysis, tactic monitoring, and curriculum design.
Evidence: A controlled run improved the proof success rate from 2% to 72% using a binary Lean verification reward.
Next Direction: Generalize verifier-guided learning from proofs into code-generation and agent-task environments.
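
Sketch: the reward step named in the Technical Core above can be illustrated with a group-relative advantage calculation, which is one common formulation of GRPO. This is a minimal sketch under stated assumptions, not the Modus Verify implementation: the Lean check is stubbed as a boolean, and the names binary_reward and group_relative_advantages are hypothetical.

```python
from statistics import mean, pstdev


def binary_reward(verified: bool) -> float:
    """Map a verifier verdict to a reward: 1.0 if the proof checked, else 0.0."""
    return 1.0 if verified else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples drawn for the same theorem.

    Each sample is scored against the group mean and scaled by the group
    standard deviation. eps avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed verifier verdicts for four sampled proofs of one theorem.
# A real loop would obtain these by checking each proof with Lean.
outcomes = [True, False, False, False]
rewards = [binary_reward(o) for o in outcomes]
advantages = group_relative_advantages(rewards)

# A group where every sample fails carries no learning signal.
flat = group_relative_advantages([0.0, 0.0])
```

With a binary reward, only groups that mix verified and failed proofs produce non-zero advantages, which is one reason curriculum design matters: tasks that are always solved or never solved contribute nothing to the update.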

Public product: Modus Memory

Problem: Agents lose context when work spans files, terminal output, screenshots, documents, and build state.
System: A local workstate layer that indexes artifacts and emits structured events from screens, OCR, terminals, builds, files, and processes.
Technical Core: Append-only event storage, artifact indexes, rollback-safe ledgers, OCR caches, local memory search, and bounded capture.
Evidence: The substrate already supports organizer scans, safe moves, OCR, memory search, screen events, and workstate ingestion paths.
Next Direction: Connect workstate timelines directly to agent evaluation, debugging, and task-generation loops.
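
Sketch: one way to read "append-only event storage" and "rollback-safe ledgers" together is a log that only ever grows, where undoing a file move means appending the inverse move rather than deleting history. The code below is an illustrative, in-memory sketch under that assumption. EventLog, its methods, and the event kinds are hypothetical names, not the Modus Memory API, and no files are touched.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """In-memory append-only log. Records are added, never edited or removed."""
    records: list = field(default_factory=list)

    def append(self, kind: str, payload: dict) -> dict:
        """Add one record with a monotonically increasing sequence number."""
        record = {"seq": len(self.records), "kind": kind, "payload": dict(payload)}
        self.records.append(record)
        return record

    def rollback_plan(self) -> list:
        """Return inverse moves, newest first, that would undo every logged move."""
        moves = [r["payload"] for r in self.records if r["kind"] == "move"]
        return [{"src": m["dst"], "dst": m["src"]} for m in reversed(moves)]


log = EventLog()
log.append("move", {"src": "inbox/a.txt", "dst": "notes/a.txt"})
log.append("ocr", {"artifact": "screen-001.png", "text": "build failed"})
log.append("move", {"src": "notes/a.txt", "dst": "archive/a.txt"})

plan = log.rollback_plan()
```

Undoing moves in reverse order matters when one file is moved more than once: the latest move has to be reversed first so that each inverse move finds the file where the log says it is.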

Public product: Modus Sentinel

Problem: Long-running agents need observation, routing, failure capture, and governance around tool use and session state.
System: A harness that records events, routes them through modes, and supports review, coaching, and loop-worker missions.
Technical Core: Project-scoped watch hooks, event routing, mode queues, harness separation, failure logs, and conservative injection rules.
Evidence: The harness has been wired across Claude and Codex with verified project-scoped delivery and explicit watch semantics.
Next Direction: Productize the capability as agent evaluation infrastructure rather than exposing harness internals.
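
Sketch: the routing ideas in the Technical Core above (project-scoped hooks, mode queues, failure logs, conservative delivery) can be illustrated with a small in-memory router. This is a sketch under stated assumptions rather than the Sentinel harness: Event, Router, the event kinds, and the routing table are hypothetical, and "conservative" is interpreted here as "deliver nothing unless a rule explicitly allows it".

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    project: str
    kind: str
    detail: str


class Router:
    """Route events from watched projects into per-mode queues."""

    def __init__(self, watched_projects, routes):
        self.watched = set(watched_projects)  # project scope
        self.routes = dict(routes)            # event kind -> mode name
        self.queues = defaultdict(deque)      # mode name -> pending events
        self.failures = []                    # failure log

    def route(self, event: Event):
        """Return the mode the event was queued for, or None if it was dropped."""
        if event.project not in self.watched:
            return None  # outside the watched scope
        if event.kind == "failure":
            self.failures.append(event)
        mode = self.routes.get(event.kind)
        if mode is None:
            return None  # conservative default: unknown kinds are not delivered
        self.queues[mode].append(event)
        return mode


router = Router(
    watched_projects={"proj-a"},
    routes={"failure": "review", "tool_call": "coaching"},
)
first = router.route(Event("proj-a", "failure", "test suite exited non-zero"))
second = router.route(Event("proj-b", "failure", "not a watched project"))
third = router.route(Event("proj-a", "heartbeat", "no rule for this kind"))
```

Keeping the scope check ahead of everything else means an event from an unwatched project never reaches the failure log or any queue, which keeps delivery project-scoped by construction.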

Product line: Modus Workbench

Problem: Local agents are powerful but hard to monitor and steer when you are away from the machine running them.
System: A human-facing iPhone and Mac companion runtime for pairing, monitoring, and controlling local agent sessions.
Technical Core: Bonjour pairing, Mac runtime, iOS app, Watch and widget surfaces, live status, and local control flows.
Evidence: The product has app, Mac helper, release infrastructure, TestFlight work, and on-device validation history.
Next Direction: Position Pairling as the approachable app inside Modus Workbench, powered by the deeper evaluation and workstate platform.

Public product: Modus Capture

Problem: Important product and research intent is often trapped in meetings, voice notes, and transcripts.
System: Local-first capture that turns audio into transcripts, structured events, reports, and agent handoffs.
Technical Core: On-device recording, Whisper/MLX transcription paths, event storage, bridge receivers, and dispatcher workflows.
Evidence: The existing systems cover iOS capture, Mac bridge flows, local transcription, and high-speed remote transcription paths.
Next Direction: Fold meeting capture into the broader agent workstate and task-generation layer.