Chronic Kidney Disease Prediction Using Machine Learning
Chronic Kidney Disease Prediction Using Machine Learning The healthcare industry is one of the most important domains for data mining. Every day, the healthcare business creates extended data regarding patients, diseases, hospitals, medical equipment, treatment costs, etc. Data mining assists clinicians in making proper treatment decisions and disease prediction in the early stages, which helps avoid or lessen the effects of diseases like heart disease, cancer, and chronic kidney disease, among others. Chronic Kidney Disease (CKD) is characterized by kidney damage or decreased function, as indicated by a glomerular filtration rate (GFR) of less than 60 ml/min per 1.73 m2 or both for at least three months. The kidneys are unable to filter blood properly. Approximately 80 million Americans are at risk of CKD. So, predicting chronic kidney disease is critical for clinicians to make an informed judgment about whether the patient is infected and to give treatment in the early stages to prevent the patient from developing the disease.In 2016, CKD was the tenth most significant cause of death in the US. Over 500000 individuals received dialysis, and 200000 underwent kidney transplants. It affects an estimated 37 million people in the United States, or around 15% of adults. It is more common in women (15%) than in males (12%). Approximately 80 million Americans are at risk of CKD. And almost 90% of those people are unaware they have CKD. The dataset collects, analyzes, and distributes information regarding chronic kidney and stage renal disease in the United States. CKD is a condition in which kidneys are damaged and cannot filter blood as well as they should. Because of this, additional fluid and wastes accumulate from the blood remains in the body and may cause other health problems such as heart disease and stroke. So, this dataset will help to predict the machine learning techniques. SVM in classificaction problems SVM classifies the output into Two classes with CKD and without C KD main objective of this study is to predict the patient with CKD using a smaller number of attributes while maintaining the access accuracy Our main parameter will be the GLOMERULAR FILTRATION RATE called vital parameters. Another parameter will be blood circulation rate, age, gender, and other characteristics can be used to calculate this. Recommended Reading AI Music Composer using Machine Learning Real-Time Object Detection Using Machine Learning 30 Creative Final Year Projects with Source Code Background Study The majority of CKD is increasing worldwide. In the United States, over 37 million people have CKD, with the majority of cases going untreated. CKD is also a prominent cause of death, especially among elderly persons. Diabetes, high blood pressure, obesity, and smoking are all risk factors for chronic kidney disease (CKD). Despite efforts to improve the early detection and management of CKD, many patients do not receive adequate care, and better techniques for diagnosing and managing CKD are required. Objective of Study The primary goal of this study is to create and test machine learning models for predicting the risk of CKD using patient data. We want to develop models that accurately predict the existence of CKD using demographic, clinical, and laboratory data. ➢ Evaluate the effectiveness of various feature extraction strategies for finding key predictors of CKD. ➢ Evaluate the effectiveness of several machine learning methods for predicting CKD risk. ➢ Evaluate the effect of sample size and data imbalance on model performance. ➢ Identify critical factors linked with CKD risk and improve diagnosis and management. Classification Algorithms The classification techniques used in this research: Logistic Regression: a statistical model used to predict a dependent variable based on a given set of independent variables; it uses a logistic function to build a model to predict binary values. Naive Bayes: a classifier calculated the probability of a given dataset to perform classification. Each attribute in data is independent of others. The highest likelihood of class is the output class. Decision Tree: This technique is one of the decision support techniques that apply a graph model and its likely values; it consists of nodes, branches and leaves; each node represents a test of variables, branches represent the test results, and the leaves represent the class label. Also, it is a way of presenting a conditional algorithm. K-Nearest Neighbor (KNN) is among the simplest machine learning algorithms. It is the non-parametric method used for classification and prediction. It can be used to give weight to the contributions of the neighbours, so the nearer neighbours contribute more to the average than more distant ones. Support Vector Machine (SVM): is machine learning algorithm that is very useful i n solving classification problems. It is used to classify data in an imaginary line, providing that a barrier separates the points from each other. Data mining Tools Python WEKA tool Orange tool Recommended Reading Stock Price Prediction system using Machine Learning Real-Time Object Detection Using Machine Learning Ecommerce Sales Prediction using Machine Learning Data mining technique in chronic kidney disease ▪ Many researchers use data mining techniques to predict kidney disease (Kunwar et al., 2016). The authors used the classification techniques like Naive Bayes and Artificia l Neural Network (ANN); their experiment was in the Rapid Miner tool; the tool showed that Naive Bayes is more accurate. It obtained 100% accuracy compared to ANN, which has 72.73% accuracy (Vijayarani & Dhayanand, 2015). They used the Naive Bayes and Support Vector Machine (SVM) to predict four types of kidney disease, and the result showed that the SVM it the best performance and accuracy. It was 76.32 when compared to Naive Bayes, which has 70.96%. ▪ They used Probabilistic Neural Networks (PNN), Multilayers Perceptron (MLP), Support Vector Machine (SVM), and Radial Basis Function (RBF) techniques for the prediction n stages of kidney disease; the result showed that the PNN is the highest accuracy 96 .7%, comparing with others, the SVM 60.7%, RBF 87%, MlP 51.5%, (Subas et al., 20 17) they used ANN, SVM, C4.5 decision tree, KNN, and Random Forest, the result showed that the KNN 95.75%, C4.5 decision tree 99%,