Saturday April 11, 2026 12:15pm - 2:15pm GMT+07

Authors - Hemamalini Siranjeevi, Swaminathan Venkatraman, Dharshini V, Gayathri A, Sushma Sri R
Abstract - Urban environments generate massive volumes of video data from surveillance and mobile sensors, necessitating efficient and intelligent summarization for smart city and transportation systems. This paper proposes a multimodal video summarization framework that moves beyond object-centric analysis toward high-level urban scene understanding. Unlike traditional methods that rely on low-level visual features or isolated object detection, the proposed approach captures contextual relationships and temporal continuity through a multi-stage pipeline. The system integrates multimodal perception, combining deep learning-based object detection, multi-object tracking, and acoustic analysis to preserve entity identities and environmental context. Relational inference and motion heuristics model spatial and semantic interactions, which are then structured into a Dynamic Knowledge Graph (DKG) representing entities, interactions, and temporal events. A semantic synthesis module, powered by a transformer-based language model, generates concise, coherent, and semantically meaningful summaries from the graph. This architecture enables scalable, context-aware video summarization adaptable to real-world urban applications.
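As a rough illustration of the Dynamic Knowledge Graph stage described in the abstract, the sketch below stores tracked entities and time-stamped interactions, merges repeated observations into temporal events, and serializes them as text that a language model could summarize. It is not the authors' implementation: the class and method names, the relation labels, and the `max_gap` merging rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One temporal event: subject --relation--> obj over [start, end] seconds."""
    subject: str   # track ID from the tracker, e.g. "car_3" (hypothetical format)
    relation: str  # hypothetical relation label, e.g. "approaches"
    obj: str
    start: float
    end: float


class DynamicKnowledgeGraph:
    """Illustrative graph of entities and their time-stamped interactions."""

    def __init__(self):
        self.entities = {}  # track ID -> class label
        self.edges = []     # list of Interaction, in insertion order

    def add_entity(self, track_id, label):
        self.entities[track_id] = label

    def add_interaction(self, subject, relation, obj, t, max_gap=1.0):
        """Record an observation at time t.

        If the most recent identical interaction ended at most max_gap
        seconds earlier, extend it; otherwise start a new event.
        (max_gap is an assumed heuristic, not taken from the paper.)
        """
        for i in range(len(self.edges) - 1, -1, -1):
            e = self.edges[i]
            if (e.subject, e.relation, e.obj) == (subject, relation, obj):
                if t - e.end <= max_gap:
                    self.edges[i] = Interaction(
                        subject, relation, obj, e.start, max(e.end, t)
                    )
                    return
                break  # most recent match is too old; open a new event
        self.edges.append(Interaction(subject, relation, obj, t, t))

    def to_prompt(self):
        """Serialize events in chronological order, one line per event."""
        lines = []
        for e in sorted(self.edges, key=lambda e: e.start):
            s = self.entities.get(e.subject, e.subject)
            o = self.entities.get(e.obj, e.obj)
            lines.append(
                f"[{e.start:.1f}s-{e.end:.1f}s] "
                f"{s} ({e.subject}) {e.relation} {o} ({e.obj})"
            )
        return "\n".join(lines)


if __name__ == "__main__":
    g = DynamicKnowledgeGraph()
    g.add_entity("car_3", "car")
    g.add_entity("ped_1", "pedestrian")
    g.add_interaction("car_3", "approaches", "ped_1", 2.0)
    g.add_interaction("car_3", "approaches", "ped_1", 2.5)  # merged into the first event
    g.add_interaction("car_3", "approaches", "ped_1", 6.0)  # gap too large: new event
    print(g.to_prompt())
```

Keeping entity identities as track IDs and interactions as intervals lets the text passed to a language model stay short even for long videos, since per-frame detections collapse into a handful of events.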
Virtual Room G Bangkok, Thailand
