Improvement Performance of the Random Forest Method on Unbalanced Diabetes Data Classification Using Smote-Tomek Link

Hairani Hairani, HH and Anthony Anggrawan, AA and Dadang Priyanto, DP (2023) Improvement Performance of the Random Forest Method on Unbalanced Diabetes Data Classification Using Smote-Tomek Link. Improvement Performance of the Random Forest Method on Unbalanced Diabetes Data Classification Using Smote-Tomek Link, 7 (1). pp. 258-264. ISSN 2549-9610

Abstract

Health datasets are often imbalanced, which degrades the performance of classification methods: the classifier tends to favor the majority class and neglect the minority class. One such dataset is the Pima Indian Diabetes dataset. Diabetes is a deadly disease caused by the body's inability to produce enough insulin, and its complications can lead to heart attacks and strokes, so early diagnosis is needed to reduce the risk of more severe complications. In the diabetes dataset used here, the negative class (500 records) outnumbers the positive class (268 records), which can affect classification performance. This study therefore applies SMOTE-Tomek Link and Random Forest to diabetes classification. The methodology consists of four stages: collecting the diabetes data from Kaggle (768 records with eight input attributes and one output attribute as the class), pre-processing to balance the dataset with SMOTE-Tomek Link, classification with the Random Forest method, and performance evaluation based on accuracy, sensitivity, precision, and F1-score. In tests using 10-fold cross-validation, Random Forest with SMOTE-Tomek Link achieved higher accuracy, sensitivity, precision, and F1-score than Random Forest with SMOTE alone, reaching 86.4% accuracy, 88.2% sensitivity, 82.3% precision, and an 85.1% F1-score. Thus, SMOTE-Tomek Link can improve the performance of the Random Forest method on all four measures.
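
The balancing step described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation, and the record does not state which software they used; in practice a library such as imbalanced-learn, which provides a combined SMOTE and Tomek-link resampler, would be the usual choice. The function names (`smote`, `remove_tomek_links`, `smote_tomek`), the parameter `k`, and the toy two-dimensional data are all assumptions made for illustration, and the variant shown removes both members of each Tomek link.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link: a pair of opposite-class
    points that are each other's nearest neighbour."""
    n = len(X)
    nn = [min((j for j in range(n) if j != i),
              key=lambda j: math.dist(X[i], X[j])) for i in range(n)]
    drop = {i for i in range(n) if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in range(n) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class to parity, then clean Tomek links."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    new = smote(minority, n_majority - len(minority), k, random.Random(seed))
    X_bal = list(X) + new
    y_bal = list(y) + [minority_label] * len(new)
    return remove_tomek_links(X_bal, y_bal)


# Toy data (made up, NOT the Pima dataset): 50 negatives, 20 positives.
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)] + \
    [(data_rng.gauss(1.5, 1), data_rng.gauss(1.5, 1)) for _ in range(20)]
y = [0] * 50 + [1] * 20

X_bal, y_bal = smote_tomek(X, y)
print("class counts after resampling:", y_bal.count(0), y_bal.count(1))
```

Because every Tomek link pairs one point from each class, and a point can belong to at most one link, this variant leaves the two classes equal in size after cleaning. The resampled data would then be passed to the classifier, here Random Forest, for evaluation.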

Item Type: Article
Subjects: T Technology > T Technology (General)
Divisions: Fakultas Teknik dan Desain > Ilmu Komputer
Depositing User: Dr. Dadang Priyanto, S.Kom.,M.Kom
Date Deposited: 09 Oct 2023 00:31
Last Modified: 09 Oct 2023 00:31
URI: http://repository.universitasbumigora.ac.id/id/eprint/3373
