Thursday April 9, 2026 3:00pm - 5:00pm GMT+07

Authors - Aung Nyein Chan Paing, Sudhir Kumar Sharma
Abstract - This paper presents a semantic video search system that supports natural language querying over video content using vision–language models and vector similarity search. The proposed system processes videos offline by extracting representative frames through similarity-based filtering, generating textual descriptions using a pre-trained BLIP (Bootstrapping Language–Image Pre-training) image captioning model, and encoding the captions into dense vector embeddings. These embeddings are indexed in a vector database to enable efficient retrieval of relevant video segments based on textual queries. The system architecture comprises a Python-based backend with GPU acceleration for video processing and a web-based interface for query interaction. Experimental observations indicate that similarity-based frame filtering reduces redundant frames by approximately 50–70% while preserving semantic information. Qualitative evaluation demonstrates that the system effectively retrieves semantically relevant video timestamps in response to natural language queries. The proposed framework serves as a modular prototype for content-based video retrieval and semantic video analysis applications.
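The similarity-based frame filtering described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes frames arrive as NumPy pixel arrays and that a frame is kept only when its cosine similarity to the most recently kept frame drops below a threshold (the function name and threshold value are hypothetical).

```python
import numpy as np

def filter_redundant_frames(frames, threshold=0.95):
    """Return indices of representative frames.

    A frame is kept only if its cosine similarity to the last *kept*
    frame is below `threshold`, i.e. the scene content has changed
    enough to warrant a new caption. Identical or near-identical
    consecutive frames are dropped as redundant.
    """
    kept_indices = []
    last_kept = None  # flattened vector of the last kept frame
    for idx, frame in enumerate(frames):
        vec = frame.astype(np.float64).ravel()
        if last_kept is None:
            kept_indices.append(idx)
            last_kept = vec
            continue
        denom = np.linalg.norm(last_kept) * np.linalg.norm(vec) + 1e-12
        similarity = float(np.dot(last_kept, vec) / denom)
        if similarity < threshold:  # content changed: keep this frame
            kept_indices.append(idx)
            last_kept = vec
    return kept_indices
```

In a full pipeline, each kept frame would then be captioned (e.g. with a BLIP model) and the caption embedded and indexed; comparing against only the last kept frame, rather than all previous frames, keeps the filter linear in the number of frames.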
Paper Presenter
Virtual Room E Bangkok, Thailand
