Friday April 10, 2026 9:30am - 11:30am GMT+07

Authors - Mutiara Ayu Mawaddah, Norhalina Senan, Mohd Norasri Ismail, Larisang, Muchlis Almubaraq
Abstract - With the growing use of smart meters, massive amounts of electricity consumption data are generated every day, and managing and analyzing this data efficiently is a major challenge. In this study, we generated a smart meter dataset of 10 million records and added realistic anomalies such as missing values, noise, and unusual spikes to reflect real-world conditions. The data was stored for distributed processing in the Hadoop Distributed File System (HDFS), deployed on a single-node virtual machine running Kali Linux. Using PySpark, we cleaned the data, filled in missing values, identified outliers, and normalized features. To predict electricity consumption, we trained a linear regression model that achieved a Root Mean Squared Error (RMSE) of 0.0141 and an R² score of 0.9891, indicating that the model predicts consumption very accurately on this dataset. Overall, this study demonstrates a practical end-to-end approach that combines big data tools and machine learning for smart meter analytics. In the future, this workflow could be extended to multi-node clusters to improve fault tolerance and handle even larger datasets.
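The steps named in the abstract (inject anomalies, fill missing values, remove outliers, normalize, fit a linear regression, report RMSE and R²) can be sketched at small scale as follows. This is a minimal illustration in standard-library Python, not the authors' PySpark code: the synthetic data generator, the 2% missing rate and 1% spike rate, median imputation, the IQR outlier rule, min-max scaling, and the 80/20 split are all assumptions chosen for the example, since the abstract does not state which methods were used.

```python
import math
import random
import statistics

def make_readings(n=2000, seed=42):
    """Synthetic (hour, kWh) readings with injected anomalies (hypothetical data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hour = rng.uniform(0, 24)
        kwh = 0.5 + 0.08 * hour + rng.gauss(0, 0.05)  # linear trend plus noise
        roll = rng.random()
        if roll < 0.02:
            kwh = None        # assumed 2% missing readings
        elif roll < 0.03:
            kwh *= 10         # assumed 1% unusual spikes
        rows.append((hour, kwh))
    return rows

def clean(rows):
    """Fill missing values with the median, then drop points outside the IQR fences."""
    median = statistics.median(k for _, k in rows if k is not None)
    filled = [(h, median if k is None else k) for h, k in rows]
    q1, _, q3 = statistics.quantiles([k for _, k in filled], n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return [(h, k) for h, k in filled if low <= k <= high]

def min_max(values):
    """Scale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def evaluate(xs, ys, slope, intercept):
    """Return (RMSE, R2) of the fitted line on the given points."""
    preds = [slope * x + intercept for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    my = statistics.fmean(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    return math.sqrt(ss_res / len(ys)), 1 - ss_res / ss_tot

def run_pipeline():
    rows = clean(make_readings())
    # For brevity, scaling uses all rows; a stricter setup would fit it on the training part only.
    xs = min_max([h for h, _ in rows])
    ys = min_max([k for _, k in rows])
    split = int(0.8 * len(xs))  # first 80% for training, last 20% held out
    slope, intercept = fit_line(xs[:split], ys[:split])
    return evaluate(xs[split:], ys[split:], slope, intercept)

rmse, r2 = run_pipeline()
print(f"RMSE={rmse:.4f}  R2={r2:.4f}")
```

Because the target is min-max scaled, the RMSE printed here is on a 0 to 1 scale; whether the paper's RMSE of 0.0141 was computed on normalized or raw consumption values is not stated in the abstract.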
Paper Presenter
Virtual Room B Bangkok, Thailand
