Authors - Aung Nyein Chan Paing, Sudhir Kumar Sharma

Abstract - This paper presents a semantic video search system that supports natural language querying over video content using vision–language models and vector similarity search. The proposed system processes videos offline by extracting representative frames through similarity-based filtering, generating textual descriptions using a pre-trained BLIP (Bootstrapping Language–Image Pre-training) image captioning model, and encoding the captions into dense vector embeddings. These embeddings are indexed in a vector database to enable efficient retrieval of relevant video segments based on textual queries. The system architecture comprises a Python-based backend with GPU acceleration for video processing and a web-based interface for query interaction. Experimental observations indicate that similarity-based frame filtering reduces redundant frames by approximately 50–70% while preserving semantic information. Qualitative evaluation demonstrates that the system effectively retrieves semantically relevant video timestamps in response to natural language queries. The proposed framework serves as a modular prototype for content-based video retrieval and semantic video analysis applications.
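The similarity-based frame filtering described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature representation (a flattened per-frame vector), the cosine-similarity criterion, and the `0.95` threshold are all assumptions. The idea is simply that a frame is kept as a representative only when it differs enough from the last kept frame, which is how redundant near-duplicate frames get dropped.

```python
# Hypothetical sketch of similarity-based frame filtering.
# Each "frame" is assumed to be a flattened feature vector (e.g. pixel
# values or embeddings); the real system's representation may differ.
import numpy as np

def filter_similar_frames(frames, threshold=0.95):
    """Keep a frame only if its cosine similarity to the last kept frame
    falls below `threshold` (i.e. it is dissimilar enough to be new)."""
    kept = []
    last = None
    for frame in frames:
        vec = np.asarray(frame, dtype=float).ravel()
        if last is None:
            kept.append(frame)  # always keep the first frame
            last = vec
            continue
        sim = float(np.dot(last, vec) /
                    (np.linalg.norm(last) * np.linalg.norm(vec) + 1e-12))
        if sim < threshold:  # dissimilar enough -> new representative frame
            kept.append(frame)
            last = vec
    return kept

frames = [
    [1.0, 0.0],    # first frame: always kept
    [0.99, 0.01],  # near-duplicate of the previous frame: filtered out
    [0.0, 1.0],    # visually new content: kept
]
print(len(filter_similar_frames(frames)))  # 2
```

In a full pipeline, each kept frame would then be captioned (e.g. with BLIP), the captions embedded, and the embeddings indexed in a vector database alongside the frame's timestamp.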