Authors - Busrat Jahan, Kevin Osei-Onomah, Mansi Bhavsar, Hermela Dessie, Apu Chandra Bhowmik

Abstract - Diabetes is a major concern in the global health sector, and its early prediction requires accurate and effective models. In this quantitative study, we used the CDC Diabetes Health Indicators dataset and a Light Gradient Boosting Machine (LightGBM) model to predict diabetes. Because the task is binary classification, in the data preprocessing stage we applied the Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance and a Chi-square test for feature selection to improve model performance. The proposed LightGBM model was able to capture complex correlations among diabetes-related health indicators, reaching a training accuracy of 92% and a ROC-AUC score of 0.97 on the test dataset. Overall, the findings indicate that predictive accuracy improves significantly when both class-imbalance handling and selection of the most correlated features are applied.
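
To illustrate the oversampling idea behind SMOTE named in the abstract, the following is a minimal standard-library sketch, not the authors' implementation. The function name `smote_oversample`, its parameters, and the toy data are hypothetical and chosen only for illustration; a real study would more likely rely on an established library such as imbalanced-learn. The sketch assumes numeric features and creates each synthetic sample by interpolating between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    (Hypothetical helper for illustration only.)
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up data: 3 minority rows against 8 majority rows, two features each.
minority_rows = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
majority_count = 8
new_rows = smote_oversample(minority_rows, majority_count - len(minority_rows), k=2)
balanced_minority = minority_rows + new_rows
print(len(balanced_minority))  # 8, matching the majority class size
```

Because every synthetic row is a convex combination of two real minority rows, it stays within the range of the observed minority data. Oversampling of this kind is normally applied to the training split only, so that the test set still reflects the original class distribution.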