Your computer,
on voice command.
JARVIS is a local-first AI desktop assistant for macOS. Speak naturally — it opens apps, runs commands, browses the web, controls your mouse and keyboard, and remembers what matters. No cloud. No subscription. Your data stays on your machine.
How it works
One voice loop, one local LLM, twenty-nine tools. Every component runs on your machine — the network is optional.
You speak
A continuous VAD listener captures your voice. faster-whisper transcribes it locally — no audio leaves your machine.
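To make the listening step concrete, here is a minimal sketch of voice activity detection using a simple energy threshold. It is an illustration only and not JARVIS's actual listener: the function names `rms` and `segment_speech`, the 16-bit mono PCM frame format, and the `threshold` and `max_silence` defaults are all assumptions. Each yielded utterance is what would then be handed to a local transcriber such as faster-whisper.

```python
import math
import struct
from typing import Iterable, Iterator, List


def rms(frame: bytes) -> float:
    """Root-mean-square level of a frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def segment_speech(
    frames: Iterable[bytes],
    threshold: float = 500.0,  # assumed loudness cut-off, tune per microphone
    max_silence: int = 3,      # assumed number of quiet frames that ends an utterance
) -> Iterator[bytes]:
    """Yield utterances: runs of loud frames closed by `max_silence` quiet frames."""
    buffer: List[bytes] = []
    quiet = 0
    for frame in frames:
        if rms(frame) >= threshold:
            buffer.append(frame)
            quiet = 0
        elif buffer:
            # Keep short pauses inside an utterance, but count them.
            buffer.append(frame)
            quiet += 1
            if quiet >= max_silence:
                # Drop the trailing silence before emitting the utterance.
                yield b"".join(buffer[: len(buffer) - quiet])
                buffer, quiet = [], 0
    if buffer:
        # Flush whatever speech is left when the stream ends.
        yield b"".join(buffer[: len(buffer) - quiet])
```

A production listener would more likely use a trained VAD model than a raw energy threshold, which can misfire in noisy rooms; the segmentation logic around it would look much the same.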
Ollama thinks
A local LLM (qwen3.5, llama3.1, etc.) reads your request through LangChain, with the full set of tools bound for tool-calling.
Tools run
The model picks tools — open Safari, run a shell command, search the web, click your mouse — and chains them as needed.
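The tool-calling step can be sketched as a small registry plus a dispatcher. This is a hypothetical illustration, not the project's code: the `tool` decorator, the `open_app` and `web_search` stubs, and `run_tool_calls` are made-up names, and the `{"name": ..., "args": ...}` call shape is an assumption loosely modelled on how tool-calling frameworks usually report a model's requests.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry mapping tool names to plain Python callables.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function under its own name so the model can request it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def open_app(name: str) -> str:
    # A real tool might launch the app on macOS; this stub only reports.
    return f"opened {name}"


@tool
def web_search(query: str) -> str:
    # Stub: a real tool would query a search backend.
    return f"results for {query!r}"


def run_tool_calls(calls: List[Dict[str, Any]]) -> List[str]:
    """Execute the requested tools in order and collect one result per call."""
    results: List[str] = []
    for call in calls:
        fn = TOOLS.get(call["name"])
        if fn is None:
            results.append(f"error: unknown tool {call['name']}")
            continue
        try:
            results.append(fn(**call.get("args", {})))
        except TypeError as exc:
            # Wrong or missing arguments are returned as text so the model can retry.
            results.append(f"error: {exc}")
    return results
```

Returning errors as strings instead of raising keeps the loop alive: the results go back to the model, which can correct its arguments or pick another tool on the next turn.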
It speaks back
The response streams into the UI character by character, and macOS say speaks each sentence as it arrives. Self-mute prevents echo.
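Speaking sentence by sentence requires cutting the streamed text at sentence boundaries. The sketch below shows one way to do that, assuming the model's output arrives as arbitrary string chunks; the `sentences` helper and its punctuation-based splitting rule are illustrative assumptions, not the project's implementation.

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ., ! or ? followed by whitespace (a deliberately simple rule).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Buffer streamed text chunks and yield each sentence once it is complete."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is followed by a boundary, so it is complete.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush any trailing text when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

On macOS, each yielded sentence could then be passed to the say command, for example with `subprocess.run(["say", sentence])`, while the microphone is muted. The simple rule also splits after abbreviations such as "e.g.", which a real implementation may want to handle.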
mic ──▶ VAD ──▶ faster-whisper ──▶ LangChain ──▶ Ollama
                                                   │
                                                   ▼
                                         tool-calls (29 tools)
                                                   │
                                                   ▼
                                            streaming text ──▶ macOS say ──▶ you
Twenty-nine tools. One voice.
See it in action
A real session. Voice in, tool calls out, streamed reply, spoken response.
Built with the good stuff
Modular by design. Swap the model, swap the STT, swap the UI — the agent core stays the same.
Up and running in four steps
macOS only for now. Linux and Windows are on the roadmap.
git clone https://github.com/sumitkumarraju/jarvis
cd jarvis

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

brew install ollama && ollama serve &
ollama pull qwen3.5

./jarvis # or: python main.py