Dev Portfolio | SproulTech

AI / Python

Local AI agent that sees and controls your desktop

A fully local AI system that uses Ollama (Mistral 7B for reasoning, LLaVA 13B for vision) to understand what's on your screen and act on it through keyboard and mouse automation. It calls no cloud APIs and runs entirely on your GPU, with a hardware kill switch for safety.


Tech Stack

Python, Ollama, Mistral 7B, LLaVA 13B, pyautogui, mss, Pillow, prompt_toolkit

Key Features

Dual-model architecture: Mistral 7B for reasoning/planning, LLaVA 13B for visual understanding

Screen capture and analysis via mss + Pillow, fed to LLaVA for scene description

Keyboard/mouse automation through pyautogui with safety-filtered action execution

Hardware kill switch: a physical key combo that instantly halts all automation

Command audit trail logging every action taken for accountability

Fully local: zero cloud dependencies, runs on a consumer GPU (RTX 3090 Ti)
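
To make the safety-filter and audit-trail features above more concrete, here is a minimal sketch of how planned actions might be checked against an allowlist and logged before anything is executed. The action names, the `Action` type, and the JSON log format are assumptions made for illustration; the project's actual allowlist and audit file format are not shown on this page.

```python
import json
import time
from dataclasses import dataclass, field

# Hypothetical allowlist; these action names are illustrative only.
ALLOWED_ACTIONS = {"click", "type_text", "press_key", "move_mouse"}


@dataclass
class Action:
    """A single planned action, e.g. as proposed by the reasoning model."""
    name: str
    args: dict = field(default_factory=dict)


def is_allowed(action: Action) -> bool:
    """Return True only if the action name is on the allowlist."""
    return action.name in ALLOWED_ACTIONS


def audit_record(action: Action, allowed: bool, timestamp=None) -> str:
    """Serialize one audit entry as a JSON line (assumed format)."""
    record = {
        "ts": time.time() if timestamp is None else timestamp,
        "action": action.name,
        "args": action.args,
        "allowed": allowed,
    }
    return json.dumps(record, sort_keys=True)


def filter_and_log(actions, log_lines):
    """Log every planned action, then return only the allowlisted ones."""
    approved = []
    for action in actions:
        allowed = is_allowed(action)
        log_lines.append(audit_record(action, allowed))
        if allowed:
            approved.append(action)
    return approved
```

In this sketch, rejected actions are logged as well as approved ones, so the audit trail records what the planner attempted and not only what ran. In a real run, `log_lines` would more likely be an append-only file than an in-memory list.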

Architecture

The system runs as an event loop behind a prompt_toolkit CLI, where the user issues natural-language commands. For each command, the orchestrator captures the screen via mss and sends the image to LLaVA 13B for visual context. It combines that description with the user's intent and sends both to Mistral 7B for action planning. A safety filter validates the planned actions against an allowlist before execution, and approved actions are executed via pyautogui. All commands and actions are logged to an audit file. The kill switch is monitored by a keyboard listener on a separate thread.
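
The flow described above could be sketched roughly as follows. This is not the project's code: `run_step` and its parameter names are hypothetical, and each stage is passed in as a plain callable so the control flow can be shown without a GPU or a display. In the real system, `capture` would wrap mss, `describe` would call LLaVA 13B, `plan` would call Mistral 7B, and `execute` would call pyautogui. The kill switch is modeled here as a `threading.Event` that a listener thread would set.

```python
import threading
from typing import Callable, Iterable


def run_step(
    command: str,
    capture: Callable[[], bytes],
    describe: Callable[[bytes], str],
    plan: Callable[[str, str], Iterable[dict]],
    execute: Callable[[dict], None],
    allowed: set,
    kill_switch: threading.Event,
    audit: list,
) -> int:
    """Run one user command through the pipeline.

    Returns the number of actions that were actually executed.
    """
    screenshot = capture()        # grab the current screen
    scene = describe(screenshot)  # vision model: image -> text description
    executed = 0
    for action in plan(command, scene):  # reasoning model: intent + scene -> actions
        # Check the kill switch before every action, not once per command.
        if kill_switch.is_set():
            audit.append(("halted", action))
            break
        # Safety filter: skip anything not on the allowlist.
        if action.get("name") not in allowed:
            audit.append(("blocked", action))
            continue
        execute(action)
        audit.append(("executed", action))
        executed += 1
    return executed
```

Checking the event inside the loop means a kill request takes effect between actions. An action that is already in flight would still complete in this sketch.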

