Classifying Thyroid Disease through Machine Learning Approach

Authors

  • Teddy Al Fatah, Department of Information Studies, University College London
  • Mila Desi Anasanti, Bart and London Genome Center, Queen Mary University of London

DOI:

https://doi.org/10.55606/jeei.v5i3.5742

Keywords:

Classification, Early detection, Gradient Boosting, Healthcare analytics, Machine learning

Abstract

Thyroid disease is one of the most prevalent medical conditions and directly affects a person's physical and emotional well-being. This study applies machine learning (ML) to the 2017–2020 NHANES data, an extensive dataset covering 6,992 people and XX characteristics, with the goal of improving the early identification and classification of vulnerable people. The ML techniques evaluated are K-Nearest Neighbor (KNN), Random Forest (RF), Decision Tree (DT), Logistic Regression (LR), Extreme Gradient Boosting (EGB), LightGBM (LGBM), Multi-Layer Perceptron (MLP), and Gradient Boosting. RF, EGB, and LGBM achieved the highest accuracy, each reaching 0.90. Among them, RF had the highest precision at 0.98, meaning that the individuals it flagged as at risk were rarely false positives. KNN had the highest recall at 0.73, capturing a substantial proportion of true positive cases. EGB achieved the highest F1-score, indicating the best balance between precision and recall, and also the highest Area Under the Curve (AUC) at 0.82, underscoring its robust predictive capabilities. These results underscore the role of ML algorithms in predicting and classifying thyroid disease risk and offer insights for early intervention and personalized healthcare strategies. The accuracy, precision, and recall values observed with RF, EGB, and LGBM suggest their potential as tools for improving thyroid disease diagnosis and for contributing to more effective and timely patient care. As machine learning advances, integrating these techniques into healthcare frameworks holds promise for improving the understanding and management of thyroid disorders.
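To make the reported metrics concrete, the following is a minimal sketch of how accuracy, precision, recall, and F1-score are derived from a binary confusion matrix. It is not the authors' code: the names `Confusion`, `confusion_counts`, and `metrics` are hypothetical, and the counts in the usage example are invented for illustration rather than taken from the NHANES data. AUC is omitted because it requires predicted scores, not just predicted labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts for a binary classifier (1 = disease, 0 = no disease)."""
    tp: int  # predicted 1, truly 1
    fp: int  # predicted 1, truly 0
    fn: int  # predicted 0, truly 1
    tn: int  # predicted 0, truly 0


def confusion_counts(y_true, y_pred):
    """Tally the four confusion-matrix cells from paired 0/1 labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 1:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def metrics(c):
    """Return accuracy, precision, recall and F1; undefined ratios become 0.0."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Invented, imbalanced example: 10 positives among 100 people.
    example = Confusion(tp=5, fp=1, fn=5, tn=89)
    for name, value in metrics(example).items():
        print(f"{name}: {value:.3f}")
```

With these invented counts, accuracy is 0.94 while recall is only 0.50 and F1 is 0.625. On imbalanced data, accuracy alone can therefore look strong even when half of the true cases are missed, which is why precision, recall, and F1 are reported alongside it.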

Published

2025-10-30

How to Cite

Teddy Al Fatah, & Mila Desi Anasanti. (2025). Classifying Thyroid Disease through Machine Learning Approach. Journal of Engineering, Electrical and Informatics, 5(3), 71–82. https://doi.org/10.55606/jeei.v5i3.5742