A Comparative Study on Customer Churn Analysis Using Machine Learning and Data Enrichment Techniques





Churn Analysis, Data Mining, Customer Relationship Management, Machine Learning Algorithms, RFM Analysis


With the growth of online shopping, companies can collect more customer data, which they use to get to know their customers better and provide customized services. Churn analysis is one of the most essential analyses derived from this data, because it indicates when a customer is likely to stop shopping with the company. In this study, we perform a churn analysis using machine learning (ML) algorithms on the customer behavior data of a fashion retail company, following a four-stage methodology. First, we carried out data preparation and visualization, and then we created models using various ML algorithms. After examining the baseline data, we enriched the data with the RFM (Recency, Frequency, Monetary) score and performed the analysis again. We used the Synthetic Minority Oversampling Technique (SMOTE) to address the class imbalance in the data and performed parameter optimization on the algorithms using the SMOTE data. We compared the accuracy and F1 score values obtained after this four-stage process and examined how the algorithms performed. In the last stage, we divided the whole dataset into clusters using the k-means technique and applied the ML algorithms to the clustered data. Then, we compared all these results and examined the effect of segmentation on them. The analysis shows that the extreme gradient boosting algorithm provides better accuracy and F1 score values than the other algorithms examined. Using these results, the company can identify customers likely to churn and begin funding Customer Relationship Management (CRM) efforts. Additionally, experts can determine the company's development directions by organizing campaigns for these customers and analyzing their reasons for churn in more detail.
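To make the RFM enrichment step concrete, the following is a minimal sketch of how Recency, Frequency, and Monetary values might be derived from raw transactions and turned into scores. It is not the scoring scheme used in the study, which the abstract does not specify. The function names (`rfm_table`, `rank_scores`, `rfm_scores`), the three-bin rank-based scoring, the snapshot date, and the sample transactions are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, snapshot):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    `transactions` is an iterable of (customer_id, purchase_date, amount).
    Recency is measured in days between the last purchase and `snapshot`.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((snapshot - last_purchase[c]).days, frequency[c], monetary[c])
        for c in last_purchase
    }


def rank_scores(values, bins=3, higher_is_better=True):
    """Map each customer to a score in 1..bins by rank (assumed scheme).

    Ties are broken by insertion order because sorted() is stable.
    """
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ordered)
    return {c: 1 + (i * bins) // n for i, c in enumerate(ordered)}


def rfm_scores(transactions, snapshot, bins=3):
    """Return {customer_id: (r_score, f_score, m_score)}."""
    table = rfm_table(transactions, snapshot)
    # A small recency (recent purchase) is good, so lower values score higher.
    r = rank_scores({c: v[0] for c, v in table.items()}, bins, higher_is_better=False)
    f = rank_scores({c: v[1] for c, v in table.items()}, bins)
    m = rank_scores({c: v[2] for c, v in table.items()}, bins)
    return {c: (r[c], f[c], m[c]) for c in table}


# Hypothetical sample data, not taken from the study.
transactions = [
    ("A", date(2024, 1, 5), 40.0),
    ("A", date(2024, 3, 1), 60.0),
    ("B", date(2023, 11, 20), 25.0),
    ("C", date(2024, 2, 10), 200.0),
    ("C", date(2024, 2, 25), 50.0),
    ("C", date(2024, 3, 10), 30.0),
]
snapshot = date(2024, 3, 31)

table = rfm_table(transactions, snapshot)
scores = rfm_scores(transactions, snapshot)
```

In this sketch, the three scores per customer could be appended as extra feature columns before model training, which is one plausible reading of "data enrichment". With real data, quantile-based binning over many customers would likely replace the simple rank split used here.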





How to Cite

Karaarslan, H., Baştuğ, M., Güngör Şen, C., & Işık, E. E. (2024). A Comparative Study on Customer Churn Analysis Using Machine Learning and Data Enrichment Techniques. Journal of Soft Computing and Decision Analytics, 2(1), 225–235. https://doi.org/10.31181/jscda21202441