AI / Python

Local AI agent that sees and controls your desktop

A fully local AI system that uses Ollama (Mistral 7B for reasoning + LLaVA 13B for vision) to understand what's on your screen and take actions via keyboard and mouse automation. No cloud APIs — runs entirely on your GPU with a hardware kill switch for safety.

Tech Stack
Python, Ollama, Mistral 7B, LLaVA 13B, pyautogui, mss, Pillow, prompt_toolkit
Key Features

01 Dual-model architecture: Mistral 7B for reasoning/planning, LLaVA 13B for visual understanding

02 Screen capture and analysis via mss + Pillow, fed to LLaVA for scene description

03 Keyboard/mouse automation through pyautogui with safety-filtered action execution

04 Hardware kill switch: a physical key combo that instantly halts all automation

05 Command audit trail that logs every action taken, for accountability

06 Fully local: zero cloud dependencies, runs on a consumer GPU (RTX 3090 Ti)
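
The safety filter and audit trail from the feature list could look roughly like the sketch below. It is illustrative only, not the project's actual code: the `Action` type, the `ALLOWED_ACTIONS` table, and the JSON-lines record format are all assumptions made for this example.

```python
import json
import time
from dataclasses import dataclass, field, asdict

# Hypothetical allowlist: each permitted action kind maps to the
# argument names it may carry. Anything else is rejected.
ALLOWED_ACTIONS = {
    "click": {"x", "y"},
    "type_text": {"text"},
    "press_key": {"key"},
}


@dataclass
class Action:
    """A planned action (assumed shape, for illustration only)."""
    kind: str
    args: dict = field(default_factory=dict)


def is_allowed(action: Action) -> bool:
    """Accept an action only if its kind is allowlisted and it
    carries no arguments outside the set allowed for that kind."""
    allowed_args = ALLOWED_ACTIONS.get(action.kind)
    if allowed_args is None:
        return False
    return set(action.args) <= allowed_args


def audit_record(command: str, action: Action, executed: bool) -> str:
    """Build one JSON line describing a command/action pair,
    suitable for appending to an audit file."""
    return json.dumps({
        "ts": time.time(),
        "command": command,
        "action": asdict(action),
        "executed": executed,
    })
```

Writing rejected actions to the log as well (with `executed=False`) keeps the audit trail complete even when the filter blocks a plan.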

Architecture

The agent runs an event loop behind a prompt_toolkit CLI, where the user issues natural-language commands. For each command, the orchestrator captures the screen with mss and sends the image to LLaVA 13B for visual context. It combines that description with the user's intent and sends both to Mistral 7B for action planning. A safety filter validates the planned actions against an allowlist before pyautogui executes them. All commands and actions are logged to an audit file, and a keyboard listener on a separate thread monitors the kill switch.
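
As a rough sketch of that flow, one pass of the orchestrator might be structured as follows. Every stage is passed in as a plain callable so the example runs without a GPU or any of the real libraries. The function name `run_step`, its parameters, and the dict-shaped actions are hypothetical, not taken from the project. The kill switch is modelled as a `threading.Event` that a listener thread would set.

```python
import threading
from typing import Callable, Iterable


def run_step(
    command: str,
    capture: Callable[[], bytes],                 # e.g. a screen grab
    describe: Callable[[bytes], str],             # vision model call
    plan: Callable[[str, str], Iterable[dict]],   # reasoning model call
    is_allowed: Callable[[dict], bool],           # allowlist check
    execute: Callable[[dict], None],              # input automation
    log: Callable[[str], None],                   # audit sink
    kill: threading.Event,                        # set by the listener thread
) -> int:
    """Run one user command through the pipeline and return the
    number of actions that were actually executed."""
    log(f"command: {command}")
    screenshot = capture()
    scene = describe(screenshot)
    executed = 0
    for action in plan(command, scene):
        # Check the kill switch before every action, not once per command.
        if kill.is_set():
            log("halted by kill switch")
            break
        if not is_allowed(action):
            log(f"rejected: {action}")
            continue
        execute(action)
        log(f"executed: {action}")
        executed += 1
    return executed
```

In the real system the callables would presumably wrap mss, the two Ollama-served models, and pyautogui. Checking the event between actions means a multi-step plan stops at the next action boundary once the key combo is pressed.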
