Chronic Kidney Disease Prediction Using Machine Learning

The healthcare industry is one of the most important domains for data mining. Every day, the healthcare business creates extended data regarding patients, diseases, hospitals, medical equipment, treatment costs, etc. Data mining assists clinicians in making proper treatment decisions and disease prediction in the early stages, which helps avoid or lessen the effects of diseases like heart disease, cancer, and chronic kidney disease, among others.
Chronic Kidney Disease (CKD) is characterized by kidney damage or decreased function, as indicated by a glomerular filtration rate (GFR) of less than 60 ml/min per 1.73 m2 or both for at least three months. The kidneys are unable to filter blood properly.
Approximately 80 million Americans are at risk of CKD. So, predicting chronic kidney disease is critical for clinicians to make an informed judgment about whether the patient is infected and to give treatment in the early stages to prevent the patient from developing the disease.
In 2016, CKD was the tenth most significant cause of death in the US. Over 500000 individuals received dialysis, and 200000 underwent kidney transplants. It affects an estimated 37 million people in the United States, or around 15% of adults. It is more common in women (15%) than in males (12%). Approximately 80 million Americans are at risk of CKD. And almost 90% of those people are unaware they have CKD.
The dataset collects, analyzes, and distributes information regarding chronic kidney and stage renal disease in the United States. CKD is a condition in which kidneys are damaged and cannot filter blood as well as they should. Because of this, additional fluid and wastes accumulate from the blood remains in the body and may cause other health problems such as heart disease and stroke.
So, this dataset will help to predict the machine learning techniques. SVM in classificaction problems SVM classifies the output into Two classes with CKD and without C KD main objective of this study is to predict the patient with CKD using a smaller number of attributes while maintaining the access accuracy
Our main parameter will be the GLOMERULAR FILTRATION RATE called vital parameters. Another parameter will be blood circulation rate, age, gender, and other characteristics can be used to calculate this.

Recommended Reading

Background Study

The majority of CKD is increasing worldwide. In the United States, over 37 million people have CKD, with the majority of cases going untreated. CKD is also a prominent cause of death, especially among elderly persons. Diabetes, high blood pressure, obesity, and smoking are all risk factors for chronic kidney disease (CKD). Despite efforts to improve the early detection and management of CKD, many patients do not receive adequate care, and better techniques for diagnosing and managing CKD are required.

Objective of Study

The primary goal of this study is to create and test machine learning models for predicting the risk of CKD using patient data. We want to develop models that accurately predict the existence of CKD using demographic, clinical, and laboratory data.
➢ Evaluate the effectiveness of various feature extraction strategies for finding key predictors of CKD.
➢ Evaluate the effectiveness of several machine learning methods for predicting CKD risk.
➢ Evaluate the effect of sample size and data imbalance on model performance.
➢ Identify critical factors linked with CKD risk and improve diagnosis and management.

Classification Algorithms

The classification techniques used in this research:

Logistic Regression: a statistical model used to predict a dependent variable based on a given set of independent variables; it uses a logistic function to build a model to predict binary values.
Naive Bayes: a classifier calculated the probability of a given dataset to perform classification. Each attribute in data is independent of others. The highest likelihood of class is the output class.
Decision Tree: This technique is one of the decision support techniques that apply a graph model and its likely values; it consists of nodes, branches and leaves; each node represents a test of variables, branches represent the test results, and the leaves represent the class label. Also, it is a way of presenting a conditional algorithm.

K-Nearest Neighbor (KNN) is among the simplest machine learning algorithms. It is the non-parametric method used for classification and prediction. It can be used to give weight to the contributions of the neighbours, so the nearer neighbours contribute more to the average than more distant ones.
Support Vector Machine (SVM): is machine learning algorithm that is very useful i n solving classification problems. It is used to classify data in an imaginary line, providing that a barrier separates the points from each other.

Data mining Tools

Python
WEKA tool
Orange tool

Recommended Reading

Data mining technique in chronic kidney disease

▪ Many researchers use data mining techniques to predict kidney disease (Kunwar et al., 2016). The authors used the classification techniques like Naive Bayes and Artificia l Neural Network (ANN); their experiment was in the Rapid Miner tool; the tool showed that Naive Bayes is more accurate. It obtained 100% accuracy compared to ANN, which has 72.73% accuracy (Vijayarani & Dhayanand, 2015). They used the Naive Bayes and Support Vector Machine (SVM) to predict four types of kidney disease, and the result showed that the SVM it the best performance and accuracy. It was 76.32 when compared to Naive Bayes, which has 70.96%.
▪ They used Probabilistic Neural Networks (PNN), Multilayers Perceptron (MLP), Support Vector Machine (SVM), and Radial Basis Function (RBF) techniques for the prediction n stages of kidney disease; the result showed that the PNN is the highest accuracy 96 .7%, comparing with others, the SVM 60.7%, RBF 87%, MlP 51.5%, (Subas et al., 20 17) they used ANN, SVM, C4.5 decision tree, KNN, and Random Forest, the result showed that the KNN 95.75%, C4.5 decision tree 99%, SVM 98.5%, ANN 98%, and finally the random forest it was highest accuracy 100%.
▪ Similarly, (Webster, 2017) used C4.5, SVM, KNN, NB, and MLP. The result showed that C4.5 63% has the highest accuracy, SVM 60.25, KNN 58.25%, NB 57.5%, and MLP 62.25% (Kriplani, 2019). They used a Deep Neural Network to predict kidney disease and obtain an accuracy of 97%. The study (Otunaiya, 2019) focuses on using Naive Bayes, Multilayer Perceptron, and J48 Decision Tree; the result showed that J4 8 had the highest accuracy at 87.3%.

INTERNATIONAL JOURNAL OF SPECIAL EDUCATION Vol.37, No.3, 2 022

Multilayer Perceptron 85.4%, and Naive Bayes 86%. (Devika, 2019) they present a com parative study of a different classifier to predict kidney disease Naive Bayes, KNN, Random Forest, the result showed that the NB 99.635%, KNN 87.78%, and Random Forest t hat obtain the highest accuracy 99.844%.(Zeynu,2018) they used feature selection and ense mble model to improve the accuracy of machine learning classifiers, the techniques are used KNN, J48, ANN, NB, and SVM, the result showed that the KNN 99%, J48 98.75%, ANN 9 9.5%, NB 99%, SVM 98.25%. (Abdelaziz et al, 2018) they proposed a hybrid model for pre diction the kidney disease based on cloud-

IoT by using two techniques, which are Linear Regression and Neural Network, the result sh owed that their model obtains on 97.8% of accuracy. In our research we used python tool t o preprocessing the data, and used five data mining techniques to predict CKD it include s KNN, NB, SVM, Decision Tree, and Logistic Regression. Our experiment was on P ython, Weka, and Orange.

Diagrams and Tables

The area/domain of CKD can be illustrated using a table that outlines the stages of CKD bas ed on the glomerular filtration rate (GFR), as mentioned earlier.

In addition, a diagram can be used to show the common causes of CKD, including diabetes, high blood pressure, glomerulonephritis, and polycystic kidney disease.

Here is the methodology diagram related to chronic kidney disease.

Here is a diagram related to the prediction of kidney disease stages using data mining algorithms:

Methodology Block Diagram of Chronic Kidney Disease (CKD)

Machine learning is helping predict chronic kidney disease in healthcare. By using different algorithms, we can detect the disease early, provide personalized care, and improve patient outcomes. Ongoing research and innovation will continue to enhance these models for better effectiveness in healthcare.

Chronic Kidney Disease Prediction Source Code

To Learn More

Are machine learning models biased in CKD prediction?

Machine learning models can inherit biases present in training data. It is crucial to address and mitigate bias by incorporating diverse datasets and employing fairness-aware algorithms. Continuous efforts are being made to enhance the fairness and equity of these models.

How can patients benefit from CKD prediction using machine learning?

Early detection through machine learning models allows for timely intervention and personalized treatment plans. Patients can benefit from improved outcomes, reduced complications, and a better quality of life.