Authors - Shruti Thakur, Shilpa Nikhil Bhosale, Priti Prakash Jorvekar, Sandeep Muktinath Chitalkar, Harshala Shingne, Rupali Vairagade

Abstract - This study examines the effectiveness of ensemble learning models for detecting fraud in e-wallet transactions under extreme class imbalance and temporal dependence. Using the PaySim benchmark dataset, we develop a time-aware experimental framework that combines forward-chaining evaluation, imbalance-aware resampling, hyperparameter optimisation, probability calibration, and cost-sensitive threshold tuning to reflect real-world deployment conditions. Random Forest (RF) and XGBoost are systematically compared across multiple dataset scales and train–test splitting strategies. Empirically, XGBoost consistently outperforms RF: it achieves the highest F1-score, maintains a PR-AUC above 0.88, and attains a near-perfect ROC-AUC, indicating strong discriminative capability. After isotonic calibration, XGBoost also yields the lowest Brier score, indicating more reliable probability estimates for risk-based decisions. Performance gains plateau beyond a 75% training share, and XGBoost, unlike RF, remains stable as the test window expands. Overall, the results support prioritising gradient boosting models, adopting time-aware validation, and integrating calibrated risk scoring in operational e-wallet fraud detection systems.
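
Three of the ingredients named in the abstract (forward-chaining evaluation, the Brier score used to assess calibration, and cost-sensitive threshold tuning) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names, the equal-sized fold layout, the threshold grid, and the misclassification costs are all assumptions chosen for illustration, and the model that produces the fraud probabilities is left out entirely.

```python
from typing import List, Sequence, Tuple


def forward_chaining_splits(n_samples: int, n_folds: int) -> List[Tuple[range, range]]:
    """Expanding-window splits over rows that are ALREADY sorted by time.

    Fold k trains on every row before its test block, so no future
    transaction leaks into training. (Hypothetical helper, assumed layout.)
    """
    block = n_samples // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough samples for the requested number of folds")
    splits = []
    for k in range(1, n_folds + 1):
        train = range(0, k * block)
        # The last fold absorbs any leftover rows.
        stop = n_samples if k == n_folds else (k + 1) * block
        splits.append((train, range(k * block, stop)))
    return splits


def brier_score(y_true: Sequence[int], p_fraud: Sequence[float]) -> float:
    """Mean squared gap between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_fraud)) / len(y_true)


def total_cost(y_true: Sequence[int], p_fraud: Sequence[float],
               threshold: float, cost_fn: float, cost_fp: float) -> float:
    """Cost of flagging every transaction with p_fraud >= threshold."""
    cost = 0.0
    for y, p in zip(y_true, p_fraud):
        flagged = p >= threshold
        if y == 1 and not flagged:
            cost += cost_fn  # missed fraud
        elif y == 0 and flagged:
            cost += cost_fp  # legitimate payment blocked
    return cost


def best_threshold(y_true: Sequence[int], p_fraud: Sequence[float],
                   cost_fn: float, cost_fp: float,
                   grid: Sequence[float]) -> float:
    """Grid value with the lowest total cost (first one wins on ties)."""
    return min(grid, key=lambda t: total_cost(y_true, p_fraud, t, cost_fn, cost_fp))


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration only.
    labels = [0, 0, 1, 0, 1, 0, 0, 1]
    scores = [0.05, 0.20, 0.35, 0.40, 0.90, 0.10, 0.60, 0.80]
    grid = [i / 10 for i in range(1, 10)]
    for train, test in forward_chaining_splits(len(labels), n_folds=3):
        print("train rows", list(train), "test rows", list(test))
    print("Brier score:", brier_score(labels, scores))
    print("threshold (FN cost 10, FP cost 1):",
          best_threshold(labels, scores, 10.0, 1.0, grid))
```

In a fuller pipeline the threshold would be chosen on calibrated validation-fold probabilities and then applied unchanged to the later test window; raising the assumed cost of a missed fraud relative to a false alarm pushes the selected threshold down.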