Authors - Poorna Pragnya H, Neha V Malage, Pranav Muppuru, Sanya Vashist, Surabhi Narayan

Abstract - This work introduces a novel Sequence-to-Sequence (Seq2Seq) framework that converts Electroencephalography (EEG) signals and related metadata into coherent natural-language descriptions. The key innovation is a spatio-temporal EEG encoder built from Dense Graph Convolutional Networks (GCNs), which model both the spatial relationships among electrodes and the temporal dynamics of multi-channel EEG data. This encoder is coupled with an attention-driven Gated Recurrent Unit (GRU) decoder that generates the textual sequences. To strengthen learning, the model adopts a multi-task objective that predicts scene-level attributes, such as colors and objects, alongside caption generation, promoting tighter alignment between EEG features and language outputs. Experiments on a large-scale dataset demonstrate competitive results: a BLEU score of 0.21, ROUGE-1 of 0.4519, and ROUGE-L of 0.4447. The generated captions are further used as inputs to a text-to-video generation module. While precise pixel-level matching remains difficult, evaluation shows strong semantic alignment between generated and reference videos, with an SSIM of 0.19 and a CLIP-based semantic similarity score of 0.746. Overall, these results highlight the promise of GCN-based EEG representations for complex language decoding and downstream video generation tasks.
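To make the architecture described above concrete, the following is a minimal PyTorch sketch of a densely connected GCN encoder feeding an attention-driven GRU decoder with an auxiliary attribute head. All specifics here are illustrative assumptions rather than the authors' implementation: the layer count and growth size, the additive attention form, the mean-pooled attribute head, the `GRUCell`-based decoding loop, and the identity-matrix placeholder for the normalized electrode adjacency `a_hat`.

```python
# Minimal sketch (assumed hyperparameters, not the paper's exact code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseGCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, a_hat):
        # h: (batch, channels, in_dim); a_hat: (channels, channels)
        return F.relu(torch.einsum("ij,bjd->bid", a_hat, self.lin(h)))


class DenseGCNEncoder(nn.Module):
    """DenseNet-style stacking: each layer sees all earlier outputs."""
    def __init__(self, in_dim, growth=64, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        dim = in_dim
        for _ in range(num_layers):
            self.layers.append(DenseGCNLayer(dim, growth))
            dim += growth  # dense concatenation grows the feature dim
        self.out_dim = dim

    def forward(self, h, a_hat):
        for layer in self.layers:
            h = torch.cat([h, layer(h, a_hat)], dim=-1)
        return h  # (batch, channels, out_dim)


class AttnGRUDecoder(nn.Module):
    """GRU decoder with additive attention over electrode embeddings."""
    def __init__(self, enc_dim, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.attn = nn.Linear(enc_dim + hid_dim, 1)
        self.gru = nn.GRUCell(emb_dim + enc_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, enc, tokens, hidden):
        # enc: (B, C, enc_dim); tokens: (B, T); hidden: (B, hid_dim)
        logits = []
        for t in range(tokens.size(1)):
            # attention weights over the C electrode nodes
            q = hidden.unsqueeze(1).expand(-1, enc.size(1), -1)
            alpha = torch.softmax(self.attn(torch.cat([enc, q], -1)), dim=1)
            ctx = (alpha * enc).sum(dim=1)  # context vector, (B, enc_dim)
            emb = self.embed(tokens[:, t])
            hidden = self.gru(torch.cat([emb, ctx], -1), hidden)
            logits.append(self.out(hidden))
        return torch.stack(logits, dim=1), hidden


class EEG2Text(nn.Module):
    """Encoder + caption decoder + multi-task scene-attribute head."""
    def __init__(self, in_dim, vocab_size, num_attrs):
        super().__init__()
        self.encoder = DenseGCNEncoder(in_dim)
        self.decoder = AttnGRUDecoder(self.encoder.out_dim, vocab_size)
        self.attr_head = nn.Linear(self.encoder.out_dim, num_attrs)
        self.init_h = nn.Linear(self.encoder.out_dim, 256)

    def forward(self, eeg, a_hat, tokens):
        enc = self.encoder(eeg, a_hat)
        pooled = enc.mean(dim=1)                 # graph-level summary
        attr_logits = self.attr_head(pooled)     # colors/objects head
        h0 = torch.tanh(self.init_h(pooled))     # decoder initial state
        cap_logits, _ = self.decoder(enc, tokens, h0)
        return cap_logits, attr_logits


# Toy forward pass: 4 trials, 32 electrodes, 200 features per channel.
B, C, F_in, V = 4, 32, 200, 5000
model = EEG2Text(F_in, V, num_attrs=20)
a_hat = torch.eye(C)  # placeholder for a normalized adjacency matrix
caps, attrs = model(torch.randn(B, C, F_in), a_hat,
                    torch.randint(0, V, (B, 12)))
```

Under this sketch, the multi-task objective would combine a cross-entropy loss over caption tokens with a binary cross-entropy loss on the attribute logits, weighted by a tunable coefficient.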