Authors - Pranay Kavthankar, Rutuj Koli, Ronit Ghadi, Yug Mora, Abhijit Joshi

Abstract - Speech-to-Speech Translation (S2ST) has evolved from cascaded pipelines into end-to-end neural architectures. However, preserving emotion, prosody, and speaker identity across languages remains challenging. This survey examines state-of-the-art emotion- and identity-preserving S2ST and neural TTS systems, covering discrete-representation models, end-to-end systems, and cascaded pipelines. We analyze architectures including Translatotron, VQ-Translatotron, SeamlessM4T, VALL-E, VALL-E X, VITS, YourTTS, StyleTTS2, and XTTSv2. The survey discusses speaker identity preservation (x-vectors, d-vectors, codec representations), prosody modeling (pitch, duration, energy), emotion retention (categorical, dimensional, embeddings), datasets, evaluation metrics, and challenges including data scarcity, cross-lingual emotion transfer, and computational costs. We propose future directions toward large-scale expressive datasets, improved cross-lingual modeling, and responsible AI practices.