Authors - Taslima Ferdous Supty, Fahima Hossain, Era Aich, Ananna Datta, MD Sahadat Hossen Tanim

Abstract - Newborns use crying as their primary form of communication, and it conveys a wide variety of physiological and emotional conditions. Despite the high potential of automated infant cry analysis for early diagnosis and caregiver support, real-world adoption remains low because of environmental noise, class imbalance, limited interpretability, and high computational cost. This paper presents an efficient, interpretable, real-time infant cry classification system built on a two-stage hierarchical methodology. The first stage distinguishes cry from non-cry sounds to reduce false alarms caused by background noise; the second stage categorizes detected cries by intent. An adaptive feature fusion strategy based on reinforcement learning assigns dynamic weights to cepstral, prosodic, and qualitative acoustic features, while SHAP-based explainability provides explicit feature-level interpretations. Data augmentation, SMOTE-Audio, and model pruning address class imbalance, noise robustness, and deployment constraints. Experimental results show that the proposed approach outperforms single-feature baselines, remains stable in noisy environments, and achieves substantial parameter reduction without significant loss in performance, enabling real-time operation on resource-constrained devices. The system is evaluated on a publicly available infant cry dataset containing 889 cry and non-cry audio samples across five cry-intent categories, recorded in realistic conditions.
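The two-stage hierarchy described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the energy-threshold gate, the nearest-centroid intent rule, and the intent label names are all hypothetical placeholders standing in for the paper's trained models and feature pipeline.

```python
# Minimal sketch of a two-stage hierarchical cry classifier.
# Stage 1 filters cry vs. non-cry; stage 2 assigns a cry intent.
# All features, thresholds, models, and labels here are illustrative placeholders.

from dataclasses import dataclass
from typing import Callable, List

# Hypothetical intent labels; the dataset's actual five categories may differ.
CRY_INTENTS = ["hungry", "tired", "discomfort", "belly_pain", "burping"]

@dataclass
class TwoStageCryClassifier:
    is_cry: Callable[[List[float]], bool]    # stage 1: cry / non-cry detector
    intent_of: Callable[[List[float]], str]  # stage 2: cry-intent classifier

    def classify(self, features: List[float]) -> str:
        # Rejecting non-cry audio first keeps background noise from
        # ever reaching the intent classifier, reducing false alarms.
        if not self.is_cry(features):
            return "non_cry"
        return self.intent_of(features)

# Placeholder stage 1: a simple mean-energy threshold on the feature vector.
def energy_gate(features: List[float]) -> bool:
    return sum(f * f for f in features) / len(features) > 0.25

# Placeholder stage 2: nearest-centroid rule over fabricated centroids.
def nearest_centroid(features: List[float]) -> str:
    centroids = {intent: [0.2 * i] * len(features)
                 for i, intent in enumerate(CRY_INTENTS)}
    def dist(c: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(centroids, key=lambda k: dist(centroids[k]))

clf = TwoStageCryClassifier(is_cry=energy_gate, intent_of=nearest_centroid)
print(clf.classify([0.01, 0.02, 0.01]))  # quiet input → "non_cry"
print(clf.classify([0.9, 0.8, 0.85]))    # loud input passes the gate to stage 2
```

In the paper both stages would be learned models operating on the fused acoustic features; the point of the sketch is only the control flow, i.e. that stage 2 is never consulted unless stage 1 accepts the input as a cry.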