v1.0 · LOCAL · PRIVATE

Your computer,
on voice command.

JARVIS is a local-first AI desktop assistant for macOS. Speak naturally — it opens apps, runs commands, browses the web, controls your mouse and keyboard, and remembers what matters. No cloud. No subscription. Your data stays on your machine.

29
tools
100%
local
<1s
first token
SCROLL ↓
ARCHITECTURE

How it works

One voice loop, one local LLM, twenty-nine tools. Every component runs on your machine — the network is optional.

01

You speak

A continuous VAD listener captures your voice. faster-whisper transcribes it locally — no audio leaves your machine.

02

Ollama thinks

A local LLM (qwen3.5, llama3.1, etc.) reads your request through LangChain, with the full set of tools bound for tool-calling.

03

Tools run

The model picks tools — open Safari, run a shell command, search the web, click your mouse — and chains them as needed.

04

It speaks back

Streaming response renders character-by-character in the UI; macOS say speaks each sentence as it arrives. Self-mute prevents echo.

DATA · FLOW
mic ──▶ VAD ──▶ faster-whisper ──▶ LangChain ──▶ Ollama
                                                
                                                
                                            tool-calls (29 tools)
                                                
                                                
streaming text ──▶ macOS say ──▶ you
CAPABILITIES

Twenty-nine tools. One voice.

Tools are exposed to the LLM via LangChain's tool-calling. The model picks and chains them as needed.
Web search & browsing
DuckDuckGo, page fetch, content extraction.
App control
Open, quit, switch, list — via AppleScript.
Shell execution
Run any command. Confirms before destructive ops.
File system
List, read, write, find — anywhere on disk.
Keyboard & mouse
Type text, press hotkeys, click, drag — full automation.
Screenshots
Full screen or interactive selection, saved to disk.
System control
Volume, mute, brightness, lock, sleep, battery.
Notifications
Native macOS banners from inside any task.
Clipboard
Read and write the pasteboard at will.
Persistent memory
Remember facts across sessions — by key.
100% local
Audio, prompts, and data never leave your Mac.
Streaming TTS
Speaks each sentence the instant the LLM finishes it.
DEMO

See it in action

A real session. Voice in, tool calls out, streamed reply, spoken response.

JARVIS · localhost
Hey jarvis, what's my battery and disk space looking like?
→ get_battery()
→ run_shell(command="df -h /")
Battery is at 82%, charging. Your disk has 312GB free of 500GB.
Open Safari and search for the weather in Mumbai
→ open_app(app_name="Safari")
→ web_search(query="weather mumbai today")
Mumbai is 31°C, partly cloudy. Safari is open.
STACK

Built with the good stuff

Modular by design. Swap the model, swap the STT, swap the UI — the agent core stays the same.

01Python 3.14
Runtime
02LangChain
Agent + tool routing
03Ollama
Local LLM serving
04qwen3.5
Tool-calling model
05faster-whisper
Speech-to-text
06macOS say
Text-to-speech
07sounddevice
Audio + VAD
08pywebview
Native desktop UI
09AppleScript
App automation
10pyautogui
Keyboard / mouse
11DuckDuckGo
Web search
12BeautifulSoup
Page parsing
QUICKSTART

Up and running in four steps

macOS only for now. Linux and Windows are on the roadmap.

01Clone & enter
git clone https://github.com/sumitkumarraju/jarvis
cd jarvis
02Install Python deps
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
03Pull a tool-calling model
brew install ollama && ollama serve &
ollama pull qwen3.5
04Launch the app
./jarvis     # or: python main.py
Then just say it.
“jarvis, take a screenshot and put it on the desktop” — that's the whole interface.