Full-Stack AI

AI video editor with multimodal scene analysis

Web-based AI video editor that analyzes footage at scale. It uploads a video, generates a proxy, splits the proxy into segments, runs a two-pass AI analysis (Pass 1: multimodal scene extraction; Pass 2: text-only clip selection), and produces an EDL timeline. It also includes a manual timeline editor with source relinking and speech-aware clip selection.

Links Coming Soon
Tech Stack
FastAPI, React 19, TypeScript, Vite, Tailwind CSS, Google Gemini, ffmpeg, SSE
Key Features

01

Two-pass analysis: Pass 1 multimodal (expensive) extracts scenes, Pass 2 text-only (cheap) selects clips

02

Proxy generation and chunking for handling large video files

03

EDL (Edit Decision List) timeline with manual override capability

04

Speech enforcement: demotes clips with poor transcript overlap

05

Streaming SSE progress feedback during analysis

06

Source file relinking for missing media
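
The speech enforcement feature above could be sketched roughly as follows. This is only one reading of "poor transcript overlap": it measures the fraction of a clip covered by transcript spans and scales the clip's score down when that fraction is low. The names `Span`, `speech_overlap`, and `demote`, along with the `threshold` and `penalty` values, are hypothetical and do not come from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A time range in seconds (hypothetical helper type)."""
    start: float
    end: float


def speech_overlap(clip: Span, speech: list[Span]) -> float:
    """Fraction of the clip's duration covered by transcript spans.

    Assumes the transcript spans do not overlap one another.
    """
    duration = clip.end - clip.start
    if duration <= 0:
        return 0.0
    covered = 0.0
    for s in speech:
        # Length of the intersection between the clip and this span.
        covered += max(0.0, min(clip.end, s.end) - max(clip.start, s.start))
    return min(1.0, covered / duration)


def demote(score: float, overlap: float,
           threshold: float = 0.5, penalty: float = 0.5) -> float:
    """Scale the score down when overlap is below the threshold (assumed rule)."""
    return score * penalty if overlap < threshold else score
```

A clip from 0 s to 10 s with speech only between 8 s and 12 s has an overlap of 0.2, so under these assumed defaults its score would be halved.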

Architecture

A FastAPI backend receives the video upload, generates a downscaled proxy via ffmpeg, and splits it into segments of roughly 2 minutes. Pass 1 sends each video chunk to Gemini's multimodal model for scene, audio, and transcript extraction. Pass 2 merges all extraction results and runs a text-only Gemini call for clip selection, which is much cheaper than the multimodal pass. The results are returned as an EDL. A React frontend renders an interactive timeline editor with drag and trim controls. SSE streams progress updates during processing.
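
As an illustration of the chunking step, here is a minimal sketch that plans roughly 2-minute segments and builds an ffmpeg command for each one. The function names, the 120-second default, and the choice of stream copy (`-c copy`) are assumptions made for the example; the project's actual ffmpeg invocation is not described on this page.

```python
def plan_chunks(duration: float, chunk_len: float = 120.0) -> list[tuple[float, float]]:
    """Return (start, length) pairs in seconds that cover the whole duration."""
    chunks: list[tuple[float, float]] = []
    start = 0.0
    while start < duration:
        # The last chunk is shortened so it does not run past the end.
        chunks.append((start, min(chunk_len, duration - start)))
        start += chunk_len
    return chunks


def ffmpeg_chunk_cmd(proxy_path: str, start: float, length: float,
                     out_path: str) -> list[str]:
    """Build an ffmpeg argument list that cuts one segment from the proxy.

    With stream copy, cuts snap to keyframes, so segment boundaries are
    approximate. Re-encoding would be needed for frame-accurate cuts.
    """
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",   # seek to the segment start
        "-i", proxy_path,
        "-t", f"{length:.3f}",   # segment duration
        "-c", "copy",            # copy streams without re-encoding
        out_path,
    ]


if __name__ == "__main__":
    # Example: a 5-minute proxy yields two full chunks and one 60 s remainder.
    for i, (start, length) in enumerate(plan_chunks(300.0)):
        print(" ".join(ffmpeg_chunk_cmd("proxy.mp4", start, length, f"chunk_{i:03d}.mp4")))
```

Each command could then be run with `subprocess.run(cmd, check=True)`, and each resulting file sent to Pass 1 independently.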

Screenshot / Demo Coming Soon