
Design Decisions

This document explains the key architectural choices behind Mirach.

Local-first STT and TTS, cloud LLM

Mirach keeps speech-to-text and text-to-speech local (Whisper and Piper) but uses a cloud LLM via OpenCode. This balances:

  • Privacy: Voice recordings never leave your machine. Only the transcribed text is sent to the LLM.
  • Speed: Local STT and TTS have ~0.5s latency each, independent of network conditions.
  • Intelligence: Cloud LLMs provide capabilities that local models can't match (yet).

For fully local operation, you can point OpenCode to a local Ollama instance.

Daemon architecture

Mirach runs as a long-lived daemon so that the models stay loaded in RAM for as long as it is running. The alternative, loading Whisper and Piper on every invocation, would add 3-7 seconds of cold-start latency per interaction.
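The load-once pattern can be sketched as follows. This is a simplified illustration, not Mirach's actual code: `ModelHost` and the two stand-in loaders are hypothetical names, and the LLM step between transcription and speech is left out.

```python
from typing import Callable, Dict


class ModelHost:
    """Loads each model once at startup and reuses it for every request."""

    def __init__(self, loaders: Dict[str, Callable[[], Callable[[str], str]]]) -> None:
        self.load_calls = 0
        self.models: Dict[str, Callable[[str], str]] = {}
        for name, load in loaders.items():
            self.models[name] = load()  # the slow step, paid once per daemon start
            self.load_calls += 1

    def handle(self, audio: str) -> str:
        """Serve one interaction using the already-loaded models."""
        text = self.models["stt"](audio)
        return self.models["tts"](text)


# Hypothetical stand-in loaders; real ones would load Whisper and Piper weights.
def load_fake_stt() -> Callable[[str], str]:
    return lambda audio: f"text({audio})"


def load_fake_tts() -> Callable[[str], str]:
    return lambda text: f"speech({text})"
```

Constructing `ModelHost({"stt": load_fake_stt, "tts": load_fake_tts})` once and calling `handle` repeatedly pays the load cost a single time, which is the property the daemon design relies on.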

Single hotkey, three states

One key handles recording, processing, and interrupting. This matches how people actually use voice assistants: you want to interrupt a wrong response and immediately say something different, without reaching for a second button.
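A minimal sketch of a single-key state machine follows. The state names, action strings, and exact transitions are assumptions made for illustration; the source only states that one key covers record, process, and interrupt.

```python
from enum import Enum, auto
from typing import Tuple


class State(Enum):
    IDLE = auto()        # waiting for the hotkey
    RECORDING = auto()   # capturing microphone audio
    RESPONDING = auto()  # transcribing, calling the LLM, or speaking


def on_hotkey(state: State) -> Tuple[State, str]:
    """Return the next state and the action that one key press triggers."""
    if state is State.IDLE:
        return State.RECORDING, "start_recording"
    if state is State.RECORDING:
        return State.RESPONDING, "stop_recording_and_process"
    # Pressing the key mid-response cancels it and starts recording again.
    return State.RECORDING, "interrupt_and_start_recording"


def on_response_finished(state: State) -> State:
    """Return to IDLE when a response completes without being interrupted."""
    return State.IDLE if state is State.RESPONDING else state
```

In this sketch an interrupt goes straight back to `RECORDING` rather than `IDLE`, so correcting a wrong response takes one key press instead of two.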

Whisper medium + int8

The original setup used large-v3-turbo with float16 (~2.3 GB VRAM). The switch to medium with int8 (~700 MB VRAM) costs about 0.2s of extra transcription time but saves 1.6 GB of VRAM, a worthwhile trade since the LLM call takes seconds anyway.

User scripts over hardcoded patterns

Custom voice commands live in a gitignored user_scripts/ directory with metadata comments. This makes triggers personal (not in the repo), extensible (just add a file), and self-documenting.

Progressive feedback

During long LLM calls, escalating feedback (fillers → notifications → "still working" messages) prevents users from thinking the daemon has crashed. Fillers are pre-baked as WAV files for zero-latency playback.
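The escalation can be sketched as a lookup from elapsed time to feedback stage. The thresholds and stage names below are placeholders chosen for illustration, not Mirach's real timings.

```python
from typing import Optional

# Hypothetical thresholds in seconds; the actual timings are not stated here.
STAGES = [
    (2.0, "filler"),          # pre-baked WAV filler
    (8.0, "notification"),    # desktop notification
    (20.0, "still_working"),  # spoken "still working" message
]


def feedback_for(elapsed: float) -> Optional[str]:
    """Return the highest feedback stage reached after `elapsed` seconds."""
    reached = None
    for threshold, stage in STAGES:
        if elapsed >= threshold:
            reached = stage
    return reached
```

A caller could poll `feedback_for` while waiting on the LLM and emit feedback whenever the returned stage changes.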

Markdown conversation logs

Conversations are saved as Markdown, not JSON or a database. This makes them human-readable, git-friendly, and compatible with any Markdown tool. The latest.md symlink provides quick access to the most recent conversation.

Session persistence

The OpenCode session ID is saved to disk so conversations survive daemon restarts and system reboots. A 1-hour idle timeout ensures stale sessions eventually expire.