Authors - Nidhi Pruthi, Rajiv Singh, Swati Nigam

Abstract - Automatic Speech Recognition (ASR) systems have achieved remarkable progress through deep learning and Transformer-based architectures, demonstrating near-human accuracy on clean audio. However, their performance degrades significantly under challenging conditions and in specialized domains. This comprehensive study evaluates leading commercial ASR APIs—Google Cloud Speech-to-Text, Microsoft Azure Speech Service, AssemblyAI, Deepgram, OpenAI Whisper, Speechmatics, and others—across multiple dimensions: general speech recognition, low-quality forensic-like audio, domain-specific mathematical notation, and personalized speaker adaptation. Results demonstrate 100% accuracy on clean audio for leading systems (Deepgram, Speechmatics, WebKit SpeechRecognition), but dramatic performance degradation to 10–81% word error rates on forensic-like audio. Analysis of domain-specific challenges reveals that none of the tested commercial ASR systems natively supports direct transcription of mathematical symbols and Greek letters into structured symbolic output (e.g., LaTeX). The study identifies critical limitations in robustness, modularity, and domain adaptation, while highlighting promising customization mechanisms, including custom vocabularies, language models, and post-processing integration. Performance improvements through speaker personalization ranged from 3% for natural voices to 10% for synthetic voices. Despite notable advances in end-to-end and Transformer-based approaches, ASR systems remain unsuitable for forensic applications and specialized domains without substantial customization and post-processing. Future research must address low-resource performance, linguistic diversity, robustness in extreme noise, and the integration of Large Language Models for semantic understanding.
This paper synthesizes recent advances and critical gaps, providing a roadmap for advancing ASR technology in specialized and challenging acoustic environments.