v1.0 · LOCAL · PRIVATE

Your computer,
on voice command.

JARVIS is a local-first AI desktop assistant for macOS. Speak naturally — it opens apps, runs commands, browses the web, controls your mouse and keyboard, and remembers what matters. No cloud. No subscription. Your data stays on your machine.


29 tools · 100% local · <1s first token

● LISTENING

jarvis, open notes…

⟶ open_app

app_name="Notes"


ARCHITECTURE

How it works

One voice loop, one local LLM, twenty-nine tools. Every component runs on your machine — the network is optional.

You speak

A continuous VAD listener captures your voice. faster-whisper transcribes it locally — no audio leaves your machine.
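
The listening step can be pictured with a small energy-based voice activity detector. This is only a sketch under assumptions: the threshold, the frame layout, and the names `rms` and `utterances` are illustrative and not taken from JARVIS, which may use a different VAD entirely.

```python
import math
from typing import Iterable, Iterator, List

Frame = List[float]  # one short chunk of mono samples in [-1.0, 1.0]


def rms(frame: Frame) -> float:
    """Root-mean-square energy of one frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def utterances(
    frames: Iterable[Frame],
    threshold: float = 0.02,  # assumed energy level that counts as speech
    max_silence: int = 3,     # assumed count of quiet frames that ends an utterance
) -> Iterator[List[float]]:
    """Group loud frames into utterances, closing each one on sustained silence."""
    buffer: List[float] = []
    quiet = 0
    for frame in frames:
        if rms(frame) >= threshold:
            buffer.extend(frame)  # speech: keep the samples
            quiet = 0
        elif buffer:
            quiet += 1            # pause inside an utterance: samples are dropped
            if quiet >= max_silence:
                yield buffer
                buffer, quiet = [], 0
    if buffer:
        yield buffer              # flush whatever was captured at end of input
```

Each yielded utterance would then be handed to the local transcriber, for example faster-whisper's `WhisperModel.transcribe`, which accepts an audio array or a file path.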

Ollama thinks

A local LLM (qwen3.5, llama3.1, etc.) reads your request through LangChain, with the full set of tools bound for tool-calling.

Tools run

The model picks tools — open Safari, run a shell command, search the web, click your mouse — and chains them as needed.
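
Picking and chaining tools comes down to a lookup from tool name to function. The sketch below is a plain-Python stand-in, not the project's code: the registry, the `register` decorator, and the stub tool bodies are hypothetical. The call shape (a dict with `name` and `args`) mirrors how LangChain reports tool calls on a model response.

```python
from typing import Any, Callable, Dict, List

TOOLS: Dict[str, Callable[..., str]] = {}


def register(fn: Callable[..., str]) -> Callable[..., str]:
    """Add a function to the registry under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@register
def open_app(app_name: str) -> str:
    # Hypothetical stand-in: a real tool would launch the app here.
    return f"opened {app_name}"


@register
def web_search(query: str) -> str:
    # Hypothetical stand-in: a real tool would query a search engine here.
    return f"results for {query!r}"


def run_tool_calls(calls: List[Dict[str, Any]]) -> List[str]:
    """Execute each requested call in order; unknown names become error strings."""
    results = []
    for call in calls:
        fn = TOOLS.get(call["name"])
        if fn is None:
            results.append(f"error: unknown tool {call['name']}")
            continue
        results.append(fn(**call.get("args", {})))
    return results
```

Returning an error string for an unknown tool, rather than raising, lets the result go back to the model so it can correct itself on the next turn.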

It speaks back

The streaming response renders character by character in the UI, and macOS say speaks each sentence as it arrives. The listener mutes itself while JARVIS is speaking, which prevents echo.
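
Speaking sentence by sentence means buffering streamed text until a sentence boundary appears. A minimal sketch, assuming a simple punctuation rule: the function names are illustrative, and the splitter will also break after abbreviations such as "e.g.".

```python
import re
import subprocess
from typing import Callable, Iterable, Iterator

# A sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed text and yield each sentence once it is complete."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Everything but the last part is a finished sentence.
        for done in parts[:-1]:
            if done:
                yield done
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush the tail when the stream ends


def say(text: str) -> None:
    """Speak text with the macOS say command (blocks until finished)."""
    subprocess.run(["say", text], check=True)


def speak_stream(tokens: Iterable[str], speak: Callable[[str], None] = say) -> None:
    """Speak each sentence as soon as the stream completes it."""
    for sentence in sentences(tokens):
        speak(sentence)
```

Passing `speak` as a parameter keeps the chunking logic testable on any platform; on a Mac the default calls `say` directly.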

DATA · FLOW

mic ──▶ VAD ──▶ faster-whisper ──▶ LangChain ──▶ Ollama
                                                │
                                                ▼
                                            tool-calls (29 tools)
                                                │
                                                ▼
streaming text ──▶ macOS say ──▶ you

CAPABILITIES

Twenty-nine tools. One voice.

Tools are exposed to the LLM via LangChain's tool-calling. The model picks and chains them as needed.

Web search & browsing

DuckDuckGo, page fetch, content extraction.

App control

Open, quit, switch, list — via AppleScript.

Shell execution

Run any command. Confirms before destructive ops.

File system

List, read, write, find — anywhere on disk.

Keyboard & mouse

Type text, press hotkeys, click, drag — full automation.

Screenshots

Full screen or interactive selection, saved to disk.

System control

Volume, mute, brightness, lock, sleep, battery.

Notifications

Native macOS banners from inside any task.

Clipboard

Read and write the pasteboard at will.

Persistent memory

Remember facts across sessions — by key.

100% local

Audio, prompts, and memory stay on your Mac. Only web tools touch the network.

Streaming TTS

Speaks each sentence the instant the LLM finishes it.
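
The "confirms before destructive ops" behaviour of the shell tool could look something like the guard below. This is a sketch, not the project's implementation: the command list is an assumed, deliberately incomplete example, and the check is conservative enough to flag harmless commands that merely mention one of those words.

```python
import re
import subprocess
from typing import Callable

# Assumed, deliberately incomplete list of programs worth confirming.
DESTRUCTIVE = {"rm", "rmdir", "mv", "dd", "mkfs", "shutdown", "reboot", "kill", "killall"}

# Word-like runs, so "ls;/bin/rm x" still exposes "/bin/rm".
_WORD = re.compile(r"[A-Za-z0-9_./-]+")


def is_destructive(command: str) -> bool:
    """True if any word in the command names a program on the DESTRUCTIVE list."""
    words = _WORD.findall(command)
    return any(word.rsplit("/", 1)[-1] in DESTRUCTIVE for word in words)


def run_shell(command: str, confirm: Callable[[str], bool]) -> str:
    """Run a command, asking `confirm` first when it looks destructive."""
    if is_destructive(command) and not confirm(command):
        return "cancelled"
    done = subprocess.run(command, shell=True, capture_output=True, text=True)
    return done.stdout if done.returncode == 0 else done.stderr
```

In a voice assistant, `confirm` would typically read the command back and wait for a spoken yes or no.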

DEMO

See it in action

A real session. Voice in, tool calls out, streamed reply, spoken response.

JARVIS · localhost

LISTENING

Hey jarvis, what's my battery and disk space looking like?

→ get_battery()

→ run_shell(command="df -h /")

Battery is at 82%, charging. Your disk has 312GB free of 500GB.

Open Safari and search for the weather in Mumbai

→ open_app(app_name="Safari")

→ web_search(query="weather mumbai today")

Mumbai is 31°C, partly cloudy. Safari is open.
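
A tool like get_battery in the session above could be built on the macOS `pmset -g batt` command. The sketch below assumes that approach and the usual shape of its output; the function names and the reply wording are illustrative, not the project's code.

```python
import re
import subprocess
from typing import Optional, Tuple

# Matches the percentage and state in a line such as:
#   -InternalBattery-0 (id=1234567)  82%; charging; 1:05 remaining present: true
_BATTERY = re.compile(r"(\d+)%;\s*([^;]+);")


def parse_battery(pmset_output: str) -> Optional[Tuple[int, str]]:
    """Return (percent, state) from `pmset -g batt` text, or None if absent."""
    match = _BATTERY.search(pmset_output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()


def get_battery() -> str:
    """Hypothetical tool body: read the battery via pmset and describe it."""
    out = subprocess.run(["pmset", "-g", "batt"], capture_output=True, text=True).stdout
    parsed = parse_battery(out)
    if parsed is None:
        return "No battery found."
    percent, state = parsed
    return f"Battery is at {percent}%, {state}."
```

Keeping the parsing separate from the subprocess call means the text handling can be checked against a saved sample without a Mac.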

STACK

Built with the good stuff

Modular by design. Swap the model, swap the STT, swap the UI — the agent core stays the same.

01 Python 3.14: Runtime
02 LangChain: Agent + tool routing
03 Ollama: Local LLM serving
04 qwen3.5: Tool-calling model
05 faster-whisper: Speech-to-text
06 macOS say: Text-to-speech
07 sounddevice: Audio + VAD
08 pywebview: Native desktop UI
09 AppleScript: App automation
10 pyautogui: Keyboard / mouse
11 DuckDuckGo: Web search
12 BeautifulSoup: Page parsing

QUICKSTART

Up and running in four steps

macOS only for now. Linux and Windows are on the roadmap.

01 Clone & enter

git clone https://github.com/sumitkumarraju/jarvis
cd jarvis

02 Install Python deps

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

03 Pull a tool-calling model

brew install ollama
ollama serve &       # start the server in the background
ollama pull qwen3.5

04 Launch the app

./jarvis     # or: python main.py
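
If the app starts but the model never answers, it can help to check that Ollama is running and the model was pulled. A small sketch, assuming Ollama's default local address and its /api/tags endpoint, which lists installed models; the helper names are illustrative.

```python
import json
import urllib.request
from typing import List

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # Ollama's default local address


def installed_models(tags_json: str) -> List[str]:
    """Extract model names from the JSON body returned by /api/tags."""
    payload = json.loads(tags_json)
    return [m["name"] for m in payload.get("models", [])]


def has_model(tags_json: str, wanted: str) -> bool:
    """True if `wanted` matches a pulled model, with or without a :tag suffix."""
    for name in installed_models(tags_json):
        if name == wanted or name.split(":", 1)[0] == wanted:
            return True
    return False


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 2.0) -> str:
    """Fetch the tag list from a running Ollama server."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server up, `has_model(fetch_tags(), "qwen3.5")` should be true once the pull in step 03 has finished; a connection error from `fetch_tags` suggests `ollama serve` is not running.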

⚡

Then just say it.

“jarvis, take a screenshot and put it on the desktop” — that's the whole interface.
