Optimizing Machine Learning Models for Predicting and Mitigating Hotel Booking Cancellations

Authors

  • Andy Hermawan, Universitas Indraprasta PGRI
  • Iwana Amalia, Purwadhika Digital Technology School
  • Muhammad Rafif, Purwadhika Digital Technology School
  • Nabila Avicenna Azzahra, Purwadhika Digital Technology School
  • Reinaldi Ragasa, Purwadhika Digital Technology School

DOI:

https://doi.org/10.55606/jupti.v4i2.4055

Keywords:

Feature Importance, Hotel Booking Cancellations, Machine Learning, Predictive Models, XGBoost

Abstract

Hotel booking cancellations pose substantial challenges to the hospitality industry, significantly impacting revenue management and operational planning. This study explores the application of machine learning models to predict cancellations, emphasizing model selection, feature importance, and resampling techniques. Among the six classification models evaluated, the combination of XGBoost and SMOTE demonstrated the highest predictive accuracy and consistency. Feature importance analysis and SHAP interpretation identified key predictors, including deposit type (non-refundable), required parking spaces, previous cancellations, and market segment (OTA). Additionally, threshold tuning was examined to balance the trade-off between false positives and false negatives based on business priorities. The results underscore the critical role of resampling methods in addressing class imbalance and the necessity of optimizing classification thresholds for practical deployment. Future research will focus on advanced hyperparameter tuning, alternative resampling strategies, feature selection methods, and ensemble learning approaches to enhance model robustness and interpretability. These findings provide a data-driven foundation for improving cancellation prediction and guiding strategic decision-making in hotel management.
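
The threshold tuning mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the labels, predicted probabilities, and the cost weights `cost_fp` and `cost_fn` below are made-up values chosen only for illustration, and the helper names `confusion_counts` and `best_threshold` are hypothetical. The idea is to scan candidate thresholds and keep the one that minimizes a business-defined cost, where a false positive is a booking wrongly flagged as a cancellation and a false negative is a cancellation the model missed.

```python
from typing import List, Optional, Tuple


def confusion_counts(
    y_true: List[int], y_prob: List[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), predicting 'cancel' when prob >= threshold."""
    tp = fp = fn = tn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def best_threshold(
    y_true: List[int],
    y_prob: List[float],
    cost_fp: float,
    cost_fn: float,
    candidates: Optional[List[float]] = None,
) -> Tuple[float, float]:
    """Scan thresholds and return (threshold, cost) with the lowest total cost.

    Ties are resolved in favor of the smallest threshold scanned first.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # 0.01 .. 0.99
    best_t, best_cost = candidates[0], float("inf")
    for t in candidates:
        _, fp, fn, _ = confusion_counts(y_true, y_prob, t)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


# Illustrative data only (1 = cancelled); not taken from the study.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.55, 0.60, 0.70, 0.90]

# Assumption: a missed cancellation is five times as costly as a false alarm.
t_missed_costly, cost_missed_costly = best_threshold(y_true, y_prob, cost_fp=1, cost_fn=5)

# Assumption: the reverse priority, where false alarms are the expensive error.
t_alarm_costly, cost_alarm_costly = best_threshold(y_true, y_prob, cost_fp=5, cost_fn=1)

print("missed cancellations costly:", t_missed_costly, cost_missed_costly)
print("false alarms costly:", t_alarm_costly, cost_alarm_costly)
```

On this toy data, weighting missed cancellations more heavily pushes the chosen threshold down (more bookings flagged), while weighting false alarms more heavily pushes it up. In practice the probabilities would come from a fitted classifier evaluated on held-out data, and the cost weights would be set from the hotel's own revenue and overbooking considerations.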

Published

2025-05-15

How to Cite

Andy Hermawan, Iwana Amalia, Muhammad Rafif, Nabila Avicenna Azzahra, & Reinaldi Ragasa. (2025). Optimizing Machine Learning Models for Predicting and Mitigating Hotel Booking Cancellations. Jurnal Publikasi Teknik Informatika, 4(2), 21–41. https://doi.org/10.55606/jupti.v4i2.4055