AI video editor with multimodal scene analysis
Web-based AI video editor built to analyze large video files. It takes an uploaded video, generates a proxy, chunks it into segments, runs a two-pass AI analysis (Pass 1: multimodal scene extraction; Pass 2: text-only clip selection), and produces an EDL timeline. It also includes a manual timeline editor with source relinking and speech-aware clip selection.
Two-pass analysis: Pass 1 (multimodal, expensive) extracts scenes; Pass 2 (text-only, cheap) selects clips
Proxy generation and chunking for handling large video files
EDL (Edit Decision List) timeline with manual override capability
Speech enforcement: demotes clips with poor transcript overlap
Progress feedback streamed over Server-Sent Events (SSE) during analysis
Source file relinking for missing media
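The speech enforcement feature above could be sketched roughly as follows. This is only an illustration, not the project's actual logic: the Clip type, the coverage metric (fraction of a clip's duration covered by transcript segments), the 0.5 threshold, and the 0.5 score penalty are all assumptions made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Clip:
    start: float  # seconds
    end: float    # seconds
    score: float  # hypothetical ranking score from clip selection


def speech_coverage(clip, speech):
    """Fraction of the clip's duration covered by (start, end) transcript segments."""
    duration = clip.end - clip.start
    if duration <= 0:
        return 0.0
    # Clamp each overlapping segment to the clip bounds and sort by start time.
    spans = sorted(
        (max(s, clip.start), min(e, clip.end))
        for s, e in speech
        if e > clip.start and s < clip.end
    )
    covered = 0.0
    cursor = clip.start
    for s, e in spans:
        s = max(s, cursor)  # skip time already counted by an earlier segment
        if e > s:
            covered += e - s
            cursor = e
    return covered / duration


def enforce_speech(clips, speech, min_coverage=0.5, penalty=0.5):
    """Demote (not drop) clips whose speech coverage falls below min_coverage."""
    adjusted = [
        replace(c, score=c.score * penalty)
        if speech_coverage(c, speech) < min_coverage
        else c
        for c in clips
    ]
    return sorted(adjusted, key=lambda c: c.score, reverse=True)
```

Demoting rather than removing keeps low-speech clips available for manual override in the timeline editor; a stricter variant could filter them out instead.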
The FastAPI backend receives the video upload, generates a downscaled proxy via ffmpeg, and chunks the video into ~2-minute segments. Pass 1 sends the video chunks to Gemini's multimodal model for scene, audio, and transcript extraction. Pass 2 merges all extraction results and runs a text-only Gemini call for clip selection, which is much cheaper than the multimodal pass. The results are returned as an EDL. The React frontend renders an interactive timeline editor with drag and trim, and SSE streams progress updates during processing.
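The proxy and chunking steps might look something like the sketch below. The function names, the 720p proxy height, the encoder settings, and the rule that folds a short final chunk into the previous one are assumptions for illustration. The sketch also assumes chunks are cut from the proxy rather than the original file. It only builds ffmpeg argument lists and does not run them.

```python
def plan_chunks(duration, target=120.0, min_tail=15.0):
    """Return (start, end) pairs in seconds covering [0, duration]."""
    chunks = []
    start = 0.0
    while start < duration:
        end = min(start + target, duration)
        chunks.append((start, end))
        start = end
    # Assumed policy: merge a very short final chunk into its predecessor
    # so the analysis pass never receives a sliver of footage.
    if len(chunks) > 1 and chunks[-1][1] - chunks[-1][0] < min_tail:
        last = chunks.pop()
        chunks[-1] = (chunks[-1][0], last[1])
    return chunks


def proxy_command(src, dst, height=720):
    """ffmpeg arguments for a downscaled H.264/AAC proxy (settings are illustrative)."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale=-2:{height}",  # -2 keeps aspect ratio with an even width
        "-c:v", "libx264", "-preset", "veryfast", "-crf", "28",
        "-c:a", "aac",
        dst,
    ]


def chunk_command(proxy, start, end, dst):
    """ffmpeg arguments to cut one segment from the proxy without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",       # seek before -i for fast input seeking
        "-i", proxy,
        "-t", f"{end - start:.3f}",  # segment duration
        "-c", "copy",                # stream copy: cuts snap to keyframes
        dst,
    ]
```

Each list can be passed to subprocess.run(cmd, check=True). Because stream copy cuts on keyframes, segment boundaries may drift slightly; re-encoding the chunks would give exact cuts at a higher processing cost.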