Optimizing the Data Cleaning Process in Big Data Analysis Using an AI-Based Pipeline
DOI: https://doi.org/10.51903/elkom.v17i2.2311
Keywords: Machine Learning, Deep Learning, Data Preprocessing
Abstract
This study develops an automated data cleaning pipeline using Pandas and Scikit-learn. Data cleaning is often performed manually, which takes a long time and is prone to errors. The study uses a quantitative experimental method on a dataset of 100,000 rows of e-commerce transaction data. The results show that the automated pipeline reduces missing values by 95.7% and outliers by 91.7%, and shortens processing time by 35% compared to manual methods. The data distribution after cleaning is more stable, which allows for more accurate analysis. This study contributes to the development of a more efficient and accurate automated data cleaning approach.
Keywords: Systematic Literature Review, Artificial Intelligence and Marketing Strategy.
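As an illustration of the kind of pipeline the abstract describes, the sketch below chains a missing-value step and an outlier step using Pandas and Scikit-learn. The abstract does not say which imputation strategy or outlier rule the authors used, so median imputation and clipping at 1.5 times the interquartile range are assumptions made here, and the function name `clean_transactions` and the columns `price` and `quantity` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def clean_transactions(df, numeric_cols, k=1.5):
    """Impute missing numeric values, then clip outliers column by column.

    Assumptions (not taken from the paper): median imputation and an
    IQR-based clipping rule with multiplier k.
    """
    out = df.copy()  # leave the caller's DataFrame untouched

    # Step 1: replace missing values with each column's median.
    imputer = SimpleImputer(strategy="median")
    out[numeric_cols] = imputer.fit_transform(out[numeric_cols])

    # Step 2: clip values outside [Q1 - k*IQR, Q3 + k*IQR] per column.
    q1 = out[numeric_cols].quantile(0.25)
    q3 = out[numeric_cols].quantile(0.75)
    iqr = q3 - q1
    out[numeric_cols] = out[numeric_cols].clip(
        lower=q1 - k * iqr, upper=q3 + k * iqr, axis=1
    )
    return out


# Tiny hypothetical sample standing in for e-commerce transaction rows.
raw = pd.DataFrame({
    "price": [10.0, 12.0, np.nan, 11.0, 13.0, 500.0],
    "quantity": [1.0, 2.0, 2.0, np.nan, 3.0, 1.0],
})
cleaned = clean_transactions(raw, ["price", "quantity"])
print(cleaned)
```

Clipping keeps every row, which may suit transaction data where dropping records would distort totals. Removing the offending rows instead is an equally plausible reading of "reduces outliers", and the source does not settle which was used.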