Local AI agent that sees and controls your desktop
A fully local AI system that uses Ollama (Mistral 7B for reasoning, LLaVA 13B for vision) to understand what's on your screen and take actions via keyboard and mouse automation. It calls no cloud APIs and runs entirely on your GPU, with a hardware kill switch for safety.
Dual-model architecture: Mistral 7B for reasoning/planning, LLaVA 13B for visual understanding
Screen capture via mss + Pillow, with captured frames fed to LLaVA for scene description
Keyboard/mouse automation through pyautogui with safety-filtered action execution
Hardware kill switch: a physical key combo that instantly halts all automation
Command audit trail logging every action taken for accountability
Fully local: zero cloud dependencies, runs on a consumer GPU (RTX 3090 Ti)
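The safety filter and audit trail listed above can be illustrated with a minimal sketch. It is not the project's actual code: the action schema (a dict with a "kind" key), the entries in ALLOWED_ACTIONS, the function names, and the JSON Lines log format are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone

# Hypothetical allowlist: each permitted action kind maps to the parameters
# it must carry. The project's real allowlist is not shown in the source.
ALLOWED_ACTIONS = {
    "click": {"x", "y"},
    "type_text": {"text"},
    "press_key": {"key"},
}


def validate_action(action: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for one planned action."""
    kind = action.get("kind")
    if not isinstance(kind, str) or kind not in ALLOWED_ACTIONS:
        return False, f"action kind {kind!r} is not on the allowlist"
    missing = ALLOWED_ACTIONS[kind] - set(action)
    if missing:
        return False, f"missing parameters: {sorted(missing)}"
    return True, "ok"


def format_audit_record(command: str, action: dict, allowed: bool, reason: str) -> str:
    """Serialize one audit entry as a single JSON line."""
    return json.dumps({
        "time": datetime.now(timezone.utc).isoformat(),
        "command": command,
        "action": action,
        "allowed": allowed,
        "reason": reason,
    })


def append_audit(path: str, line: str) -> None:
    """Append one record to the audit file (JSON Lines is an assumed format)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
```

In this sketch rejected actions are logged as well as executed ones, so the audit file also records what the planner attempted.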
The agent runs an event loop behind a prompt_toolkit CLI, where the user issues natural-language commands. For each command, the orchestrator captures the screen via mss and sends the image to LLaVA 13B for visual context, then combines that description with the user's intent and sends both to Mistral 7B for action planning. A safety filter validates the planned actions against an allowlist before execution, and approved actions are executed via pyautogui. All commands and actions are logged to an audit file. The kill switch is monitored by a keyboard listener on a separate thread.
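The control flow described here might look roughly like the sketch below. The Orchestrator class, its method names, and the injected callables are hypothetical. The real screen capture, model calls, and input automation are passed in as plain functions so that the sketch stays runnable without a GPU or display, and the kill switch is modeled as a threading.Event that a listener thread would set.

```python
import threading
from typing import Any, Callable, Iterable


class Orchestrator:
    """Capture, describe, plan, filter, execute, log (hypothetical structure)."""

    def __init__(
        self,
        capture: Callable[[], Any],                        # e.g. a wrapper around mss
        describe: Callable[[Any], str],                    # e.g. a call to LLaVA 13B
        plan: Callable[[str, str], Iterable[dict]],        # e.g. a call to Mistral 7B
        is_allowed: Callable[[dict], bool],                # the safety filter
        execute: Callable[[dict], None],                   # e.g. a pyautogui dispatcher
        log: Callable[[str, dict, str], None],             # the audit trail
    ) -> None:
        self.capture = capture
        self.describe = describe
        self.plan = plan
        self.is_allowed = is_allowed
        self.execute = execute
        self.log = log
        # Set from the keyboard-listener thread to halt all automation.
        self.kill_switch = threading.Event()

    def handle(self, command: str) -> int:
        """Process one user command; return the number of actions executed."""
        if self.kill_switch.is_set():
            return 0
        scene = self.describe(self.capture())
        executed = 0
        for action in self.plan(command, scene):
            # Re-check before every action so a halt takes effect mid-plan.
            if self.kill_switch.is_set():
                self.log(command, action, "halted")
                break
            if not self.is_allowed(action):
                self.log(command, action, "rejected")
                continue
            self.execute(action)
            self.log(command, action, "executed")
            executed += 1
        return executed
```

Checking the event before each action, rather than once per command, is a design choice of this sketch: it bounds how much automation can still run after the key combo is pressed to a single action.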