Enriching Gum Disease Prediction using Machine Learning


IJSTE - International Journal of Science Technology & Engineering | Volume 3 | Issue 11 | May 2017 ISSN (online): 2349-784X
Enriching Gum Disease Prediction using Machine Learning

Kirti Nagane Student
Department of Computer Engineering JSPM’S Bhivarabai Sawant Institute of Technology &Research (BSIOTR) Savitribai Phule Pune University India

Nikita Dongre Student
Department of Computer Engineering JSPM’S Bhivarabai Sawant Institute of Technology &Research (BSIOTR) Savitribai Phule Pune University India

Anshita Dhar Student
Department of Computer Engineering JSPM’S Bhivarabai Sawant Institute of Technology &Research (BSIOTR) Savitribai Phule Pune University India

Divya Jadhav Student
Department of Computer Engineering JSPM’S Bhivarabai Sawant Institute of Technology &Research (BSIOTR) Savitribai Phule Pune University India

Prof. Bharat Burghate Assistant Professor
Department of Computer Engineering JSPM’S Bhivarabai Sawant Institute of Technology &Research (BSIOTR) Savitribai Phule Pune University India

Abstract

Gum disease is a serious condition that affects many people today, and it calls for prompt prediction with as few errors as possible. The system designed in this paper assists in detecting the disease by performing the prognosis at a very early stage, taking all possible symptoms into account. The most common forms of gum disease, gingivitis and periodontitis, can be detected using machine learning techniques. A Hidden Markov Model, applied together with Dempster-Shafer reasoning, helps in diagnosing symptoms that might otherwise remain hidden. Many existing gum disease detection systems suffer from performance problems and fail to produce the desired output. This paper concentrates on overcoming these performance issues by proposing a novel approach to gum disease detection. The software developed focuses on disease detection and on predicting whether a patient is suffering from gum disease. It takes symptoms as input along with images of the gums, which are processed using clustering methods. The desktop application then reports its results in the form of ranges, computed from the provided symptoms and images.

Keywords: Gingivitis, Periodontitis, K-Means, Shannon Information Gain, Baum-Welch Algorithm, Dempster-Shafer Reasoning

I. INTRODUCTION
Gum diseases, if left untreated, may lead to severe consequences. The approach described in this paper helps patients detect the disease early, which may avoid further complications. The patients' symptoms, along with images of the affected areas of the gums, are recorded and analyzed using algorithms such as K-means clustering, and the results are further processed using a Hidden Markov Model (HMM) and Dempster-Shafer reasoning. The main types of gum disease are gingivitis, the initial stage, and periodontitis, to which untreated gingivitis can progress.
The k-means algorithm is an unsupervised learning algorithm. The system uses k-means for labeling and clustering patients according to the information they provide. The main aim of k-means clustering is to split "n" observations into "k" clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. The objective function can be given as

$$J(V) = \sum_{j=1}^{c} \sum_{i=1}^{c_j} \left( \lVert x_i - v_j \rVert \right)^2$$

where $\lVert x_i - v_j \rVert$ is the Euclidean distance between $x_i$ and $v_j$, $c_j$ is the number of data points in the jth cluster (the inner sum runs over those points), and $c$ is the number of cluster centers. Here $X = \{x_1, x_2, x_3, \dots, x_n\}$ is the set of data points and $V = \{v_1, v_2, \dots, v_c\}$ is the set of centers. The k-means clustering is illustrated in Fig. 1.

All rights reserved by www.ijste.org 273

Enriching Gum Disease Prediction using Machine Learning (IJSTE/ Volume 3 / Issue 11 / 052)

Fig. 1: Clustering of elements
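As an illustration of the k-means procedure described above, the following is a minimal sketch in plain Python. It is not the implementation used in this paper: the function names, the random seeding of the centers and the toy data points are assumptions made only for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` into `k` groups; returns (centers, labels, objective)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: centers start at random data points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move again
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    # Objective J(V): sum of squared distances of points to their own center.
    objective = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, objective


# Hypothetical two-dimensional data with two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels, objective = kmeans(points, k=2)
```

In the proposed system the points would be the integer-labeled patient attributes rather than these toy coordinates.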
Another important algorithm used is the Hidden Markov Model (HMM), which is concerned with finding hidden states. In an HMM the state is not directly visible, but the output that depends on the state is visible. Two important algorithms are used for calculating the probabilities: the forward algorithm and the backward algorithm. The forward algorithm calculates the belief state, that is, the probability of a particular state at a certain time; this is known as filtering.
The forward algorithm is mostly used in applications that need to determine the probability of a specific state when a sequence of observations is known. The forward-backward algorithm is an inference algorithm for HMMs. For all hidden state variables $X_k \in \{X_1, \dots, X_t\}$ it computes the distribution of the state given the observations, a task also known as smoothing. In the first phase the forward probabilities are computed from the first k observations. In the second phase the backward probabilities are computed from the remaining observations.
The images taken as input are processed along with the remaining parameters, and the outcomes are generated in the form of ranges. A patient with the value "0.2" is considered least affected by the disease, that is, to have healthy gums, whereas values between "0.8-1.0" reflect a very high chance of gum disease.
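The two phases of the forward-backward algorithm can be sketched as follows. This is a minimal illustration under stated assumptions, not the model of this paper: the two states, the two observation symbols and all probability values are hypothetical and chosen only to make the example runnable.

```python
def forward_backward(observations, states, start_p, trans_p, emit_p):
    """Return the smoothed state distribution per time step and the sequence likelihood."""
    # Forward phase: alpha_t(s) = P(o_1..o_t, X_t = s).
    alphas = []
    for t, obs in enumerate(observations):
        alpha = {}
        for s in states:
            if t == 0:
                prior = start_p[s]
            else:
                prior = sum(alphas[-1][r] * trans_p[r][s] for r in states)
            alpha[s] = prior * emit_p[s][obs]
        alphas.append(alpha)

    # Backward phase: beta_t(s) = P(o_{t+1}..o_T | X_t = s), with beta_T(s) = 1.
    betas = [{s: 1.0 for s in states}]
    for obs in reversed(observations[1:]):
        nxt = betas[0]
        beta = {
            s: sum(trans_p[s][r] * emit_p[r][obs] * nxt[r] for r in states)
            for s in states
        }
        betas.insert(0, beta)

    # Smoothing: combine both phases and normalize by the sequence likelihood.
    likelihood = sum(alphas[-1].values())
    posteriors = [
        {s: a[s] * b[s] / likelihood for s in states}
        for a, b in zip(alphas, betas)
    ]
    return posteriors, likelihood


# Hypothetical model: every name and number below is an assumption for illustration.
states = ("healthy", "diseased")
start_p = {"healthy": 0.6, "diseased": 0.4}
trans_p = {
    "healthy": {"healthy": 0.8, "diseased": 0.2},
    "diseased": {"healthy": 0.3, "diseased": 0.7},
}
emit_p = {
    "healthy": {"no_bleeding": 0.9, "bleeding": 0.1},
    "diseased": {"no_bleeding": 0.3, "bleeding": 0.7},
}
posteriors, likelihood = forward_backward(
    ["no_bleeding", "bleeding", "bleeding"], states, start_p, trans_p, emit_p
)
```

Each entry of `posteriors` is a distribution over the hidden states at one time step, so its values sum to 1.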
II. LITERATURE SURVEY
The paper [1] notes that the history of neural networks can be traced back to early attempts to model the neuron, and that a large body of work has also been produced on modelling biological systems with artificial neural networks. One of the latest defense technologies is the Network Intrusion Detection System, which is among the most active areas in network security. Many techniques for intrusion detection have become available in recent years. With the extensive use of computer networks, the number of attacks has grown widely, and numerous hacking tools and intrusive methods have appeared; the paper therefore presents a detailed survey of important intrusion detection techniques. It also covers the combination of techniques based on neural networks, k-means, hybrid techniques, support vector machines and others. Detection rate and false alarm rate are considered for the analysis.
The paper [2] recalls that the first model of a neuron was proposed by the physiologists McCulloch and Pitts. Their model had two inputs and a single output, and they recognized that the neuron would not activate if only one of the inputs was active. The paper then turns to breast cancer, one of the most common and deadly diseases in the world. Breast cancer is a malignant tumor that grows from cells of the breast, and it is a major cause of death among women in developed countries. The most effective way to reduce breast cancer deaths is to detect the disease early, which requires an accurate and dependable diagnostic procedure that allows physicians to distinguish benign breast tumors from malignant ones. The automatic diagnosis of breast cancer is an important, real-world medical problem, so finding an accurate and effective diagnosis method is essential. The authors show how a neural network based on the ART model is used in the actual clinical diagnosis of breast cancer, and they establish a diagnostic system for which an accuracy level is reported.
The paper [3] describes the development of two fuzzy logic based systems that monitor anesthesia to detect hypovolaemia, and compares the usability and acceptability of these two systems: a real-time alarm system (RT-SAAM) and the fuzzy logic monitoring system-2 (FLMS-2). Many earlier expert systems operated offline, whereas this work uses online monitoring techniques such as fuzzy logic based diagnosis, artificial neural network based diagnosis, and statistical or probability based techniques for detection. The systems assist the anesthesiologist in rapidly processing the large amount of information available from the equipment and in presenting it in a meaningful way for fast intervention, which helps to identify changes in the patient's status. They can be used in a real-time environment to detect absolute hypovolaemia and generate alerts. FLMS-2 is able to detect the differences between the levels of hypovolaemia more accurately. The drawback is that the systems need more refinement and additional features before they are suitable for routine clinical use.
Fuzzy logic is a method of reasoning that resembles human reasoning. The fuzzy logic approach imitates a style of decision making that includes all intermediate possibilities between the digital values 0 and 1, or YES and NO. The conventional logic block that a computer understands takes precise input and produces an output of true or false, that is, 1 or 0, which is equivalent to a human's YES or NO. Fuzzy logic includes the range of possibilities between YES and NO, such as certainly yes, possibly yes, cannot say, possibly no and certainly no. It works on the levels of possibility of the input to arrive at a definite output.
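The idea of intermediate values between 0 and 1 can be made concrete with a membership function. The sketch below uses a triangular membership function, which is one common choice and an assumption here; the surveyed papers do not state which functions they use, and the breakpoints are hypothetical.

```python
def triangular(x, left, peak, right):
    """Degree (0.0 to 1.0) to which x belongs to a fuzzy set shaped as a triangle."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy set "cannot say", centered on 0.5 over the range 0 to 1.
cannot_say_at_half = triangular(0.5, 0.0, 0.5, 1.0)      # full membership
cannot_say_at_quarter = triangular(0.25, 0.0, 0.5, 1.0)  # partial membership
```

An input of 0.25 is thus neither a plain YES nor a plain NO for this set; it belongs to it to a degree of one half.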

In the paper [4] a fuzzy-based approach is used for deciding on the most effective treatment method for acute periodontitis. The most effective treatment is determined using a fuzzy ranking method. The methodology allows an overall fuzzy value to be estimated for a given fuzzy alternative under conditions of imprecise probabilities. It is based on a fuzzy value function described as a fuzzy number-valued Choquet integral with a fuzzy number-valued integrand and a fuzzy number-valued fuzzy measure. The paper considers the case in which the dentist assigns probabilities linguistically (imprecisely) to the two most pronounced stages of the disease but finds it difficult to evaluate the occurrence probability of the remaining, least pronounced stage. To determine the optimal treatment method, the unknown imprecise (fuzzy) probability is therefore first obtained from the given imprecise (fuzzy) probabilities of the most pronounced stages. The drawback is that the approach does not provide generalizable results: it needs a lot of data to develop the fuzzy system, and the program has to be executed for each particular patient.
The paper [5] addresses the early detection of disease, which plays a crucial role in successful therapy. In most cases early diagnosis may lead to prevention of the disease. If a disease is not diagnosed until severe symptoms become visible in a late phase, it may have serious consequences. For this purpose, with the help of ANNs, researchers are looking for molecular disease biomarkers that give an early warning before the disease causes complications. ANNs are used to provide accurate, real-time assessments of periodontal diseases and to predict periodontal treatment outcomes.
The paper [6] focuses on efficiency. Today's applications process vast amounts of data, which takes a long time and can make algorithms impractical. To make the algorithms efficient, a feature transformation is applied. In this common approach, complex objects are translated into multidimensional vectors so that objects with features similar to those of a query object can be retrieved. The transformation has the property that similarity between objects corresponds to a small distance between their vectors, calculated as the Euclidean distance. The similarity search used in the paper relies on an index structure: the feature-transformed data from the database is indexed, and a feature-based similarity search is carried out on the index.
The paper [7] presents the idea of designing a system to detect various diseases using the K-means algorithm. The main goal of the system is to increase the quality of service and reduce medical expenses, thus helping patients. The data is taken as input and clustered using the k-means algorithm, and the possibility that the patient is suffering from a disease is predicted from these algorithmic steps. The system thus discovers various patterns using data mining concepts for further prediction of diseases.
A biomedical system is developed in the paper [8]. It uses a Learning Vector Quantization Neural Network (LVQ NN) for categorizing internal carotid artery Doppler signals. The method consists of two stages, feature extraction and classification. In the feature extraction stage, the power spectral density (PSD) of the internal carotid artery Doppler signals is estimated using the Burg autoregressive (AR) spectrum analysis technique. In the classification stage, the LVQ NN is used to classify the features obtained from the Burg AR analysis. The classification performance is also compared with various other methods, such as decision trees, Support Vector Machines (SVM), K-nearest neighbor and Naive Bayes. The results indicate that the LVQ NN is the best of these methods for classifying internal carotid artery Doppler signals.
The paper [9] proposes a system for diagnosing Parkinson's disease (PD) effectively and efficiently. The main technique used for detecting the disease is the fuzzy k-nearest neighbor (FKNN) method. The FKNN-based system is compared with a support vector machine (SVM) based approach. The results show the FKNN system to perform better than the SVM in terms of classification accuracy, sensitivity, specificity and AUC, providing a precise and convenient way to diagnose PD.
The paper [10] helps in detecting lung cancer effectively by applying clustering algorithms. The system uses the Foggy K-means approach and the K-means approach and compares the two. Processing is performed on a real-time dataset, and the outcomes show the cluster validity parameters to be more satisfactory for the Foggy K-means approach.
III. PROPOSED METHODOLOGY

Fig. 2: System Overview of the proposed model

The proposed methodology for predicting gum disease is explained in the steps below, and an overview of the system is shown in Fig. 2.
1) Step 1: This is the initial step of the proposed approach, where the user inputs parameters such as age, sex, symptoms and risk factors to obtain the probability of gum disease. As soon as the system receives these parameters, it reads the data set, which is in workbook format, and stores it in two-dimensional vector lists for further processing.
2) Step 2: Preprocessing and labeling - The data set contains 10 attributes: name, age, sex, symptoms, risk factors, genetics, obesity, stress, medications and image name. The preprocessing step filters the data held in the two-dimensional vector list by selecting the needed attributes. The system first selects the attributes used for labeling, namely age, sex, symptoms and risk factors. These attributes are then labeled with integer values according to their risk levels to ease the clustering process.
3) Step 3: K-means clustering - The integer-labeled data is used for clustering, where the distances between rows are estimated using the mean Euclidean distance. Random data points are chosen as centroids, which help to define the cluster ranges. These cluster ranges validate the cluster formation.
4) Step 4: Information gain - In this step the distribution factor of the input parameters is analyzed using Shannon information gain, which yields a value between 0 and 1. A value close to 1 indicates a highly distributed factor, which is considered the most important. The step applies the information gain method to the input data of the user who wants to check the probability of gum disease. Shannon information gain is given by equation 1:

$$IG(E) = -\frac{P}{T}\log\frac{P}{T} - \frac{N}{T}\log\frac{N}{T} \qquad (1)$$

where P is the number of clusters in which the entity is present, N is the number of clusters in which it is not present, T is the total number of clusters, and IG(E) is the information gain for the given entity. With logarithms taken to base 2, this equation yields a value between 0 and 1. Data whose value is close to 1 have the highest priority in the list.
5) Step 5: Baum-Welch algorithm - Here a forward probability is identified based on the complete value of information gain, that is, 1. Only those rows are selected in which every attribute satisfies the perfect information gain value; these rows yield the forward probability. The forward probability data are then used to extract the backward probability of the data set rows for attributes such as genetics, stress, medications and obesity. The average ratios of the backward and forward probabilities make up the transition matrix, which is used to evaluate the HMM probability using the Baum-Welch method.
6) Step 6: Gum image ratio estimation - For all rows identified as forward probability data, the respective gum image is extracted. A ratio of the number of white pixels to the total number of pixels in the image is then estimated. The gum image ratio can be computed using Algorithm 1.

Algorithm 1: Gum image ratio
Input: Image
Output: Gum image ratio (gir)
Step 0: Start
Step 1: Get the image path.
Step 2: Get the height L and width W of the image F (L*W).
Step 3: Initialize count = 0.
Step 4: FOR i = 0 to W - 1
Step 5: FOR j = 0 to L - 1
Step 6: Get the pixel at (i, j) as a signed integer.
Step 7: Convert the pixel integer value to hexadecimal to get R, G and B.
Step 8: IF (R = 255 and G = 255 and B = 255)
Step 9: count++
Step 10: End of inner FOR
Step 11: End of outer FOR
Step 12: gir = count / (L*W) (gir: gum image ratio)
Step 13: Return gir
Step 14: Stop

7) Step 7: Dempster-Shafer rules - In this step, on analyzing the labeled data for backward probability, a rule is set for entities such as genetics, stress, medication and obesity.

Finally, Dempster-Shafer reasoning identifies a suitable evaluation factor for gum disease in order to predict the probability.
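The Shannon information gain of equation 1 and the gum image ratio of Algorithm 1 can be sketched as below. This is an illustration under assumptions rather than the authors' Java implementation: the logarithm is taken to base 2 so that the result lies between 0 and 1, and the image is represented as rows of packed RGB integers instead of being read from an image file.

```python
import math


def information_gain(present, total):
    """Binary Shannon information gain from a presence count out of `total` clusters."""
    absent = total - present
    gain = 0.0
    for count in (present, absent):
        if count > 0:  # a zero count contributes nothing (0 * log 0 is taken as 0)
            p = count / total
            gain -= p * math.log2(p)
    return gain


def gum_image_ratio(pixels):
    """Fraction of pure white pixels in an image given as rows of packed RGB ints."""
    height = len(pixels)
    width = len(pixels[0])
    count = 0
    for row in pixels:
        for pixel in row:
            # Unpack the channels; masking also handles signed ARGB values.
            r = (pixel >> 16) & 0xFF
            g = (pixel >> 8) & 0xFF
            b = pixel & 0xFF
            if r == 255 and g == 255 and b == 255:
                count += 1
    return count / (height * width)


# Hypothetical inputs: an entity present in 2 of 4 clusters, and a 2x2 image.
gain = information_gain(2, 4)
ratio = gum_image_ratio([[0xFFFFFF, 0x000000], [0xFFFFFF, 0xFF0000]])
```

An entity present in exactly half of the clusters gives the maximum gain of 1, while an entity present in all or none of them gives 0.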

IV. RESULTS AND DISCUSSIONS

The proposed gum disease prediction system is implemented in Java, using NetBeans 6.9.1 as the IDE. The model uses MAE as its performance measure, where the performance of the system is assessed against human-predicted values. MAE refers to the mean absolute error, evaluated by comparing the system's predictions with the human-estimated prediction values. It is given by equation 2:

$$\mathrm{MAE} = \frac{\sum_{i,j} \left| r_{i,j} - r'_{i,j} \right|}{N} \qquad (2)$$

where $r_{i,j}$ denotes the evaluation of gum disease prediction by the human for the set of attributes i by the user j, $r'_{i,j}$ denotes the evaluation of gum disease prediction by the system for the set of attributes i in the jth run, and N is the number of evaluations compared. A mean MAE is measured to assess the quality of the system using the Normalized Mean Absolute Error (NMAE) metric. NMAE is defined as the standard MAE normalized by the mean of the expected grade values:

$$\mathrm{NMAE} = \frac{\mathrm{MAE}}{\sum_{i,j} r_{i,j} / N} \qquad (3)$$

A smaller NMAE value means a higher quality of the proposed system.

Different trials of the experiment yield different NMAE values, as listed in table 1.

Fig. 3: MAE evaluation table

Fig. 4: MAE performance evaluation
The plot in Fig. 4 indicates that the machine learning method for gum disease prediction has a lower NMAE when compared with the standard human prediction of gum disease.
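The MAE and NMAE metrics of equations 2 and 3 can be illustrated with a short sketch. The scores below are hypothetical values chosen for the example; they are not results from the experiments reported here.

```python
def mae(human, system):
    """Mean absolute error between human and system scores (equation 2)."""
    return sum(abs(h - s) for h, s in zip(human, system)) / len(human)


def nmae(human, system):
    """MAE normalized by the mean of the human (expected) scores (equation 3)."""
    return mae(human, system) / (sum(human) / len(human))


# Hypothetical scores in the 0 to 1 range used by the system.
human_scores = [0.2, 0.5, 0.8]
system_scores = [0.3, 0.5, 0.6]
error = mae(human_scores, system_scores)
normalized_error = nmae(human_scores, system_scores)
```

Normalizing by the mean expected score makes the error comparable across trials whose score levels differ.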
V. CONCLUSION AND FUTURE SCOPE
Out of negligence, patients often keep skipping their appointments with the dentist, which may lead to severe effects on the gums and teeth. The proposed methodology therefore develops a system to predict gum disease based on the parameters given by the user. The system uses a machine learning approach in which a rigorous analysis is carried out using HMM and Dempster-Shafer theory to yield the best results for gum disease detection.
When the system's performance is compared with that of a standard dentist, the developed methodology shows only a very small error.
The proposed system can be enhanced in the future by using additional parameters, and it can be developed as a mobile app so that users can download it and check for gum disease on the go.

REFERENCES
[1] "A Literature Survey and Comprehensive Study of Intrusion Detection", Sravan Kumar Jonnalagadda, Ravi Prakash Reddy I, Volume 81, November 16, 2013.
[2] "A Review of Breast Cancer Detection using ART Model of Neural Networks", Sonia Narang, Harsh K Verma, Uday Sachdev, Volume 2, Issue 10, October 2012.
[3] "Fuzzy logic based anaesthesia monitoring systems for detection of absolute hypovolaemia", Mirza Mansoor Baig, Hamid GholamHosseini, Michael J. Harrison, ISSN 0010-4825, 2013 Elsevier Ltd.
[4] "Selection of an Optimal Treatment Method for Acute Periodontitis Disease", Rafik A. Aliev, B. F. Aliyev, Latafat A. Gardashova, Oleg H. Huseynov, 6 October 2009 / Accepted: 7 May 2010.
[5] "Salivary biomarkers in the diagnosis of periodontal diseases", Jeffrey J. Kim, Christine J. Kim, and Paulo M. Camargo, J Calif Dent Assoc. 2013 Feb; 41(2): 119–124.
[6] "Similarity Search and Data Mining: Database Techniques Supporting Next Decade's Applications", Christian Boehm, 1998.
[7] "Comparative Analysis of K-Means Algorithm in Disease Prediction", K. Rajalakshmi, Dr. S. S. Dhenakaran, N. Roobini, International Journal of Science, Engineering and Technology Research (IJSETR), Volume 4, Issue 7, July 2015.
[8] "Detection of Carotid Artery Disease by Using Learning Vector Quantization Neural Network", Harun Uğuz, Springer Science+Business Media, LLC 2010.
[9] "An efficient diagnosis system for detection of Parkinson's disease using fuzzy k-nearest neighbor approach", Hui-Ling Chen, Chang-Cheng Huang, Xin-Gang Yu, Xin Xu, Xin Sun, Gang Wang, Su-Jing Wang, 2012 Elsevier Ltd.
[10] "Clustering of Lung Cancer Data Using Foggy K-Means", Akhilesh Kumar Yadav, Divya Tomar, Sonali Agarwal, International Conference on Recent Trends in Information Technology (ICRTIT), 2013.
