Visuospatial disorders detection in dementia using a computer-generated environment based on voting approach of machine learning algorithms
11250723 · 2022-02-15
Assignee
Inventors
- Areej Yahya Omar Bayahya (Jeddah, SA)
- Wadee Saleh Alhalabi (Jeddah, SA)
- Sultan Hassan Alamri (Jeddah, SA)
CPC classification
G16H20/30
PHYSICS
A61B3/032
HUMAN NECESSITIES
A61B5/4088
HUMAN NECESSITIES
G06F3/04815
PHYSICS
Y02A90/10
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
A61B3/0025
HUMAN NECESSITIES
G16H50/20
PHYSICS
A61B5/7275
HUMAN NECESSITIES
A61B5/744
HUMAN NECESSITIES
International classification
G09B19/00
PHYSICS
G16H20/30
PHYSICS
G16H50/20
PHYSICS
G06F3/0481
PHYSICS
A61B5/00
HUMAN NECESSITIES
Abstract
A system and methodology combine virtual reality (VR) with a plurality of machine learning analyses and use majority voting to detect dementia and the diseases classified under dementia. Classification accuracy in the Medical Visuospatial Dementia test is very high.
Claims
1. A visuospatial disorders detection method, comprising: presenting to a subject a three dimensional (3D) virtual reality environment in which the subject utilizes a multidirectional input device to input answers to questions and to guide an avatar on a path through the 3D virtual reality environment, wherein said multidirectional input device at least moves front, back, left and right; receiving input data for the subject generated by the subject's use of the multidirectional input device which comprises answers to questions input by the subject, coordinates and direction of the avatar relative to the path through the 3D virtual reality environment, and a time period used by the subject to guide the avatar along the path through the 3D virtual reality environment; supplying the received input data into a plurality of machine learning algorithms which utilizes correct and incorrect answers input by the subject, number of changes in direction of the avatar as the subject moves the avatar relative to the path through the 3D virtual reality environment, and the time period used by the subject to guide the avatar along the path through the 3D virtual reality environment, wherein the plurality of machine learning algorithms comprise Decision Tree Classifier, Extra Tree Classifier, AdaBoost Classifier, XGB Classifier, Gradient Boosting Classifier, Support Vector Classifier, Random Forest Classifier, Multinomial Naive Bayes, K-Neighbors Classifier, and Multilayer Perceptron; using machine learning with each of the plurality of machine learning algorithms to classify the subject into one of three classification labels selected from the group consisting of normal, demented, and mild cognitive impairment; and feeding results obtained with each of the plurality of machine learning algorithms into a system of voting to produce a final classification, wherein the system of voting comprises hard voting which predicts the final classification based on a most frequently used 
classification label produced by the machine learning using the plurality of machine learning algorithms.
2. The method of claim 1 wherein the input data received from the subject answering questions and guiding the avatar relative to the path through the 3D virtual reality environment represent testing of both memory and visuospatial function.
3. The method of claim 1 wherein the input data received from the subject answering questions and guiding the avatar relative to the path through the 3D virtual reality environment represent testing of each of navigation, visual memory, and memory function.
4. A visuospatial disorders detection method, comprising: presenting to a subject a three dimensional (3D) virtual reality environment in which the subject utilizes a multidirectional input device to input answers to questions and to guide an avatar on a path through the 3D virtual reality environment; receiving input data for the subject generated by the subject's use of the multidirectional input device which comprises answers to questions input by the subject, coordinates and direction of the avatar relative to the path through the 3D virtual reality environment, and a time period used by the subject to guide the avatar along the path through the 3D virtual reality environment; supplying the received input data into a plurality of machine learning algorithms which utilizes correct and incorrect answers input by the subject, number of changes in direction of the avatar as the subject moves the avatar relative to the path through the 3D virtual reality environment, and the time period used by the subject to guide the avatar along the path through the 3D virtual reality environment, wherein the plurality of machine learning algorithms comprise Decision Tree Classifier, Extra Tree Classifier, AdaBoost Classifier, XGB Classifier, Gradient Boosting Classifier, Support Vector Classifier, Random Forest Classifier, Multinomial Naive Bayes, K-Neighbors Classifier, and Multilayer Perceptron; using machine learning with each of the plurality of machine learning algorithms to classify the subject into one of three classification labels selected from the group consisting of normal, demented, and mild cognitive impairment; and feeding results obtained with each of the plurality of machine learning algorithms into a system of voting to produce a final classification, wherein the system of voting comprises soft voting which predicts the final classification based on averaging classification labels produced by the machine learning using the plurality of machine learning 
algorithms.
5. The method of claim 4 wherein the input data received from the subject answering questions and guiding the avatar relative to the path through the 3D virtual reality environment represent testing of both memory and visuospatial function.
6. The method of claim 4 wherein the input data received from the subject answering questions and guiding the avatar relative to the path through the 3D virtual reality environment represent testing of each of navigation, visual memory, and memory function.
Description
DESCRIPTION OF THE DRAWINGS
(1) The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with a general description of the invention given above, and the detailed description given below, serve to explain the invention.
(9) Vote Algorithm.
DETAILED DESCRIPTION
(22) In an embodiment of the invention, there is a system combining a model for patient information storage and retrieval, a cognitive test-based VR system, and MLAs for classifying the patients' condition. Results obtained using this system are described below. The designed model contains different cognitive methods for measuring the impairment of patients' cognitive abilities and uses classification tools to determine whether the patient has cognitive impairment based on the data extracted from the system. This work combines three main parts: VR testing, multiple MLAs, and a voting approach, and it is this combination that makes the work unique. The performance of participants was compared with individual diagnoses based on traditional neuropsychological tests used for the same cognitive domains: i) early and moderately severe dementia; ii) MCI; and iii) older adults with normal cognition. The system was designed specifically for Arabs in the Middle East, from both the educated and non-educated classes, but its general operation is not limited to individuals from specific geographic areas. The system was built with off-the-shelf consumer components and is designed to work in any neurology or clinical facility as a quantitative assessment of patients alongside other cognitive or non-cognitive approaches. The system was tested on 115 real patients from Dr. Soliman Fakeeh Hospital, King Abdul-Aziz Hospital, the International Medical Centre, and the Association of Elderly People Friends. Thirty of those individuals have a cognitive impairment (dementia), sixty-five are cognitively healthy, and ten have a mild cognitive impairment. All patients were older than 50 years, with either educated or non-educated backgrounds. The collected data are discrete, non-parametric, non-normalized, and labeled. Consequently, supervised classification algorithms were used to classify the patients.
The data is used as input for the MLAs, which perform classification at the output end. In addition, a series of statistical indicators was computed for the MLAs: accuracy, sensitivity, precision, and specificity. The architecture model used to classify dementia patients is shown in
(23) Clinical and Demographic Information
(24) Enrollment is the first stage performed when using the system. It provides a record that can be used for all of the patient's subsequent outcome measures. In the process, the Assistant/Nurse enrolls the patient by providing the following patient information to the system: personal information; patient history and medical history; vision impairment problems; depression, past head injuries, or exposure to solvents; and clinical diagnosis.
Patients Process & Mechanism of Data Collection
(25) Firstly, the patient arrives at the clinic; then he or she proceeds to the first phase
(26) Secondly, the patient receives a clinical test. In the second phase, the doctor tests the patient with cognitive tests (pen-and-paper tests), such as the Mini-Cog test, and evaluates the patient's performance of daily tasks. The doctor registers the total scores on paper or records them in a computer. Alternatively, the doctor gives the patient's scores to the assistant, who inserts these data into the system, and the data are stored directly in the database. The database preferably has three tables: one for patients' history, another for patients' scores, and the last for patients' path coordinates. The data are then pre-processed and analyzed by statistical methods to compare VR results against clinical results, and MLAs with a voting approach classify the patients and detect the disease in any new patient.
Visuospatial Function
(27) Visuospatial function is commonly conceptualized in three components: visual perception, construction, and visual memory [29]. The task involves detecting and localizing a point in space, detecting and judging direction and distance, and detecting topographical orientation.
(28) Navigational Task
(29) As depicted in
Visual Memory Task
(30) There are two main concepts under visual memory: recall (or recognition) of visual information and topographical memory; where topographical memory includes encoding and perception of spatial orientation to walk in the surrounding environment [11]. With reference to
Memory Function
(31) Memory is the most predominant cognitive dysfunction domain preceding the diagnosis of dementia. Elements addressed by this invention are focused on memory delay recall and visual memory.
(32) Memory Registration and Delayed Recall Task
(33) To measure memory deficits in patients with dementia, we used the three-word recall algorithm [30]. It allows patients to proceed in a natural and intuitive way while the level of memory deficit is measured. With reference to
Outcomes Measurements
(34) A number of factors are calculated to detect cognitive impairment in the patients: time to completion, accomplishment VR score, patients' history, and neuropsychological assessment. The number of times the subjects changed direction, the total time they took to arrive at their destination, and the total time they took to finish the visual memory task were recorded. VR scores include: navigational ability, spatial orientation, memory recall, visual memory correct, and visual memory incorrect.
Machine Learning Algorithms
(35) Machine Learning Algorithms (MLAs) learn the relationships between different input data for patients (test scores, time spent, etc.). The classification of patients depends on outcome data from each patient. In this work, these algorithms are used to classify patients into three classes: i) cognitive impairment (dementia); ii) cognitively healthy older adult; and iii) mild cognitive impairment. Further, the system uses more than one MLA and combines them by a majority voting approach, which yields reliable information and high accuracy in the diagnosis and classification of patients.
(36) As shown in
(37) Classification Process
(38) Classification is the task of identifying and sorting objects from certain groups into their appropriate categories by building a model based on one or more numerical and/or categorical variables (predictors or attributes) [26]. The goal of classification is to predict the category of each data point accurately and correctly [31].
(39) In various aspects of the invention, classification methods were performed where classifying patients depends on multiple algorithms, i.e., Decision Tree [32], Extra Trees [33], AdaBoost [34], XGB [35], Gradient Boosting [36], SVC [37], RANDOM FOREST® [38], Multinomial NB [39, 40], K-Neighbors [41], and MLP [42]. These algorithms are used for disease diagnosis as they lead to good accuracy. After that, a voting approach, namely Ensemble Vote [43, 44], is used to vote for the most frequent classification among the latter MLAs. The next paragraphs discuss the classification methods that were applied.
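A minimal sketch of how such a multi-algorithm setup might look in scikit-learn, on synthetic stand-in data. This is illustrative only: the feature matrix, all hyperparameters, and the data sizes are assumptions, not the patent's actual configuration, and XGB is omitted because it requires the separate xgboost package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the patients' feature vectors and the three labels
# (0 = healthy, 1 = dementia, 2 = MCI); the real features are test scores,
# direction changes, and completion times.
X, y = make_classification(n_samples=200, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
X = X - X.min()  # MultinomialNB requires non-negative features

estimators = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("et", ExtraTreesClassifier(random_state=0)),
    ("ada", AdaBoostClassifier(random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("svc", SVC(random_state=0)),
    ("rf", RandomForestClassifier(random_state=0)),
    ("mnb", MultinomialNB()),
    ("knn", KNeighborsClassifier()),
    ("mlp", MLPClassifier(max_iter=1000, random_state=0)),
]

# Hard (majority) voting over all estimators, as in the claims.
vote = VotingClassifier(estimators=estimators, voting="hard")
vote.fit(X, y)
print(vote.score(X, y))
```

Switching `voting="hard"` to `voting="soft"` (with `SVC(probability=True)`) gives the probability-averaging variant described in claim 4.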
(40) Decision Trees Classifier
(41) A Decision Tree Classifier is defined as a multistage classification strategy: a classifier expressed as a recursive partition of the instance space [45]. It operates on attribute vectors; each internal node of the tree contains a test on an attribute, and each leaf node is labeled with a class. The classification process is completed by performing the tests on the attributes until one leaf or another is reached. The Decision Tree Classifier builds hyperplanes/partitions to divide the space between the classes [45].
(42) Extra Trees Classifier
(43) The Extremely Randomized Trees Classifier is an extremely randomized version of the Decision Tree Classifier, and is a type of ensemble supervised learning technique that fits a number of randomized decision trees [33]. It is used for improving the predictive accuracy by using the average of the data within a dataset. It is very similar to a RANDOM FOREST® Classifier but differs in the construction of the decision trees.
(44) AdaBoost Classifier
(45) Introduced in 1995 by Freund and Schapire [34], the principle of AdaBoost is to fit a sequence of weak learners where the predictions are combined through a weighted majority vote to produce the final prediction [36]. AdaBoost can be used for multi-class classification.
(46) Gradient Boosting Classifier
(47) This classifier is used for classification tasks and supports both binary and multi-class classification. It creates a strong predictive model by combining many weak learning models [36], and is used to reduce the loss between the actual training class and the predicted class value.
(48) XGB Classifier
(49) This classifier is a system-optimized, customized version of the Gradient Boosting Decision Tree. It is a tool that pushes the computational limits of gradient boosting algorithms to provide a portable, scalable, and accurate library [35].
(50) RANDOM FOREST® Classifier
(51) RANDOM FOREST® Classifier is an ensemble algorithm that consists of a large number of relatively uncorrelated models (trees) [46]. A class prediction comes from each individual tree in the random forest, and the class with the most votes becomes the model's prediction [46].
(52) Multinomial Naive Bayes (NB)
(53) Multinomial Naive Bayes is a uni-gram language model with integer word counts [39]. It is an appropriate distribution when the data consists of counts [39]. It should be used for the features with discrete values like 1,2,3. This approach has also been used for text classification.
(54) Support Vector Classifier (SVC)
(55) The Support Vector Classifier is one of the most widely applied machine learning algorithms. It builds an optimal hyperplane, which is used for linearly separable patterns [37]. The optimal hyperplane is selected after fitting the data and returning the best-fit hyperplane for classifying patterns [47].
(56) K-Neighbors Classifier
(57) This is a non-parametric method used for either classification or regression. A data point is classified by a vote of the K closest training points in the feature space [41]. To find the closest points, the distance between points can be determined using distance measures such as Manhattan distance, Euclidean distance, etc. [41]. The prediction comes from the majority vote of the neighbors for their class. No explicit model is built from the training data points; the training data themselves are used at prediction time.
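The K-neighbors vote can be sketched in a few lines. This is a generic illustration with hypothetical toy data, not the patent's implementation; it uses Euclidean distance, one of the measures named above.

```python
import numpy as np

# Minimal K-nearest-neighbours prediction: find the k training points closest
# to x by Euclidean distance, then return the majority label among them.
def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(np.asarray(X_train) - np.asarray(x), axis=1)
    nearest = np.asarray(y_train)[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[int(np.argmax(counts))]

# Two hypothetical clusters of training points with labels 0 and 1
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, [0.5, 0.5]))  # → 0
print(knn_predict(X_train, y_train, [5.5, 5.5]))  # → 1
```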
(58) Multilayer Perceptron
(59) Multilayer Perceptron is a classical type of neural network. MLPs are suitable for classification prediction because they are capable of mapping highly non-linear relations between inputs and outputs and provide a good performance [42].
(60) Generalize MLA Results Using Voting Approach
(61) The VR machine learning system aims to combine multiple pieces of evidence to arrive at one prediction by a voting approach such as majority voting. An embodiment of this invention applies Ensemble Vote [43, 44] using all of the MLA methods mentioned above to obtain an accurate classification. This approach gives better predictive performance than a single model, which is why ensemble methods have placed first in many prestigious machine learning competitions. Ensemble methods are meta-algorithms that combine several machine learning techniques into one predictive model in order to decrease variance or bias, or to improve predictions. When patients' data increase, a single algorithm may give an opposite result or lower accuracy, and the result is inconsistent. Consequently, it is necessary to use the majority voting approach instead of choosing a single algorithm for the final result. Ensemble Vote is a list of classifiers that combines similar or different ML classifiers into a single model for classification via majority voting as shown in
(62) ŷ = arg max_i Σ_j w_j I(h_j(x) = i),
where h_j are the given different classification rules, w_j are their weights, and I(·) is an indicator function [44].
(63) ŷ = arg max_i Σ_j w_j p_ij,
where p_ij is the probability estimate from the jth classification rule for the ith class [44].
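The two voting rules above can be sketched directly in NumPy. Hypothetical classifier outputs are used for illustration; `hard_vote` implements the weighted indicator sum and `soft_vote` the weighted probability average.

```python
import numpy as np

# Hard voting: each classifier j emits a label h_j(x); the winner is the
# class with the largest (weighted) indicator sum.
def hard_vote(labels, weights=None):
    labels = np.asarray(labels)
    weights = np.ones(len(labels)) if weights is None else np.asarray(weights)
    classes = np.unique(labels)
    scores = [(weights * (labels == c)).sum() for c in classes]
    return classes[int(np.argmax(scores))]

# Soft voting: weight and sum the probability estimates p_ij over
# classifiers j, then pick the class i with the largest total.
def soft_vote(probas, weights=None):
    probas = np.asarray(probas)            # shape (n_classifiers, n_classes)
    weights = np.ones(len(probas)) if weights is None else np.asarray(weights)
    return int(np.argmax(weights @ probas))

# Three hypothetical classifiers labelling one subject
print(hard_vote([1, 1, 0]))                                    # → 1
print(soft_vote([[0.6, 0.4], [0.4, 0.6], [0.3, 0.7]]))         # → 1
```

With equal weights, as used in the embodiment, hard voting reduces to a plain majority vote.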
Performance Evaluation
(64) As discussed above, we have used different MLAs that measured how accurately the ML algorithms classified patients into three classes: cognitive impairment (dementia), cognitively healthy and mild cognitive impairment.
(65) Evaluation metrics such as sensitivity, specificity, accuracy, F1, precision, Mean Squared Error (MSE), ROC curve, micro-average and macro-average were used to determine the performance of the ML models. The different MLAs used to measure the classification of patients included: Extra Trees, SVC, AdaBoost, K-Neighbors, XGB, Decision Tree, MLP, Multinomial NB, and RANDOM FOREST®. Below, the training and testing phases are explained, and the performance results of the MLAs and the learning curve are discussed in detail. After that, the results are generalized by a voting approach, namely Ensemble Vote [43]. Visualization data is also described.
(66) Training and Testing Phase
(67) In the training phase, ten classifiers were used to train the data. The procedure starts from splitting the data into 70% training dataset and 30% testing dataset. Then, each approach builds its model (with a specific structure). After that, the models are tested to check their effectiveness using measures such as accuracy, sensitivity, specificity, MSE, F1, Micro-avg, Macro-avg and ROC curve.
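The split-train-evaluate procedure might be sketched as follows in scikit-learn. The data here are a synthetic stand-in (the sample count of 115 mirrors the study, but features and labels are generated), and the single Decision Tree stands in for the ten classifiers.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 115 patients' outcome data with three labels.
X, y = make_classification(n_samples=115, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

# 70% training / 30% testing split, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y)

# Build one model and check its effectiveness on the held-out 30%.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))
print("macro F1:", f1_score(y_test, pred, average="macro"))
print("MSE:", mean_squared_error(y_test, pred))
```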
(68) Evaluation Performance of ML Model
(69) After the testing phase, different metrics were used to evaluate the performance of the ML models: sensitivity, specificity, accuracy, F1, precision, ROC curve, MSE, micro-average and macro-average. The evaluation metrics were extracted from the Confusion Matrix (CM), which gives a summary of prediction results on a classification problem (see
(70) As shown in
(71) TP for Dementia class in SVC=CM [1][1]=4,
(72) FN for Dementia class in SVC=CM [1][0]+CM [1][2]=0,
(73) TN for Dementia class in SVC=CM [0][0]+CM [2][2]+CM [0][2]+CM [2][0]=32,
(74) FP for Dementia class in SVC=CM [0][1]+CM [2][1]=0.
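The one-vs-rest extraction above generalizes to any class of a 3×3 confusion matrix. The sketch below uses a hypothetical matrix whose dementia-class counts reproduce the SVC example (TP=4, FN=0, FP=0, TN=32); the other cell values are assumptions chosen only to be consistent with those counts.

```python
import numpy as np

# Per-class TP/FN/FP/TN from a confusion matrix, one-vs-rest, with the same
# indexing as the text (rows = actual class, columns = predicted class).
def ovr_counts(cm, k):
    cm = np.asarray(cm)
    tp = cm[k, k]
    fn = cm[k, :].sum() - tp      # actual class k, predicted something else
    fp = cm[:, k].sum() - tp      # predicted class k, actually something else
    tn = cm.sum() - tp - fn - fp  # everything not in row k or column k
    return tp, fn, fp, tn

# Hypothetical 3x3 matrix (healthy, dementia, MCI) matching the SVC/dementia
# example in the text.
cm = [[22, 0, 2],
      [0, 4, 0],
      [1, 0, 7]]
print(ovr_counts(cm, 1))  # → (4, 0, 0, 32)
```

Sensitivity, specificity and precision then follow as TP/(TP+FN), TN/(TN+FP) and TP/(TP+FP) respectively.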
(75) The data below demonstrate that many models are satisfactory and that some models are better than others at predicting which patients have dementia.
(76) As shown in Table II, most of the algorithms have a high accuracy of 97.22%, which means that most of the participants are assigned to the right class. Actual error rates are suggested as performance measures for the classification procedure. As shown in Table II, the actual error rate was 0.11≤AER≤0.33, which is an acceptable misclassification rate. In the development of this invention, a 10-fold cross-validation procedure was used to train the data and to validate the model's effectiveness. Cross-validation splits the whole dataset into 10 folds, builds the model on 9 of the folds, and then tests its effectiveness on the held-out fold. This procedure is repeated, recording the accuracy and errors, until each of the ten folds has served as the test dataset; performance metrics of the model are extracted from the average of the k records. The 10-fold cross-validation procedure showed high values for all models. The highest percentage was 99.14%, for RANDOM FOREST® as well as for Extra Trees, as shown in Table II.
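The 10-fold procedure just described might be sketched as follows with scikit-learn's cross-validation utilities; the data are again a synthetic stand-in and the Random Forest stands in for any of the ten classifiers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the patients' outcome data with three labels.
X, y = make_classification(n_samples=115, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

# 10-fold cross-validation: train on 9 folds, test on the held-out fold,
# repeat so every fold serves once as the test set, then average the scores.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)
print(scores.mean())
```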
(77) TABLE II. Evaluation Metrics to Determine the Performance of the Machine Learning Models

Machine Learning Algorithms | Accuracy | Actual Error Rate (AER) | Cross Validation Accuracy
Extra Trees | 97.22% | 0.22 | 99.14%
AdaBoost | 97.22% | 0.11 | 97.43%
MLP | 97.22% | 0.22 | 96.58%
XGB | 94.44% | 0.22 | 97.43%
Decision Tree | 97.22% | 0.11 | 97.43%
Gradient Boosting | 97.22% | 0.11 | 98.29%
SVC | 97.22% | 0.11 | 98.29%
K-Neighbors | 94.44% | 0.22 | 89.74%
Random Forest | 94.44% | 0.22 | 99.14%
Multinomial NB | 91.66% | 0.33 | 85.47%
(78) Other metrics used in the development of this invention were sensitivity, specificity and precision. Sensitivity [49] is the proportion of true positives that are correctly identified by the test.
(79) The sensitivity for cognitively healthy participants in all methods was between 0.96 and 1 (i.e., up to 100%), which means that most of the cognitively healthy participants are predicted to be cognitively healthy. The sensitivity for dementia patients (as shown in Table III) is 100% for all models; in other words, the proportion of participants suffering from the disease who were correctly identified as such is 100% for each of the models. Similarly, the sensitivity for MCI patients is 100% in Extra Trees, AdaBoost and Gradient Boosting, whereas the sensitivity for MCI was between 71% and 86% in the rest of the models.
(80) Specificity [49] is the proportion of true negatives that are correctly identified by the test. The specificity for cognitively healthy participants is 100% in Extra Trees, AdaBoost and Gradient Boosting; in the rest of the models (as shown in Table III) it is between 0.82≤specificity≤0.91. A higher value of specificity corresponds to a lower proportion of participants who are unhealthy but were predicted to be cognitively healthy [49]. Demented participants have a specificity equal to 1 in all models except Extra Trees, which means that no patients from other classes were labeled as the dementia class. Similarly, the specificity of the MCI class was equal to 1 in most of the models, revealing that no cognitively healthy or demented patients were classified as MCI.
(81) Precision [50] is the proportion of correctly predicted positive values among all positive predictions; the higher the precision, the better. In contrast, low precision indicates that false positives are high, i.e., that there is misdiagnosis. As can be observed from Table III, the precision for cognitively healthy participants reached the perfect percentage of 100% in AdaBoost, Extra Trees and Gradient Boosting, whereas the precision for cognitively healthy participants in the rest of the MLAs ranged from 92% to 96%. Similarly, cognitively impaired patients showed a perfect 100% precision in all models except Extra Trees (as shown in Table III). In the same way, precision was high in most of the models for the MCI class.
(82) TABLE III. Validation Metrics to Determine the Performance of the Machine Learning Models

Machine Learning Algorithms | Precision (Cog. Healthy / MCI / Dement.) | Sensitivity (Cog. Healthy / Dement. / MCI) | Specificity (Cog. Healthy / Dement. / MCI)
Extra Trees | 1.00 / 0.80 / 1.00 | 0.96 / 1.00 / 1.00 | 1.00 / 0.97 / 1.00
AdaBoost | 1.00 / 1.00 / 0.88 | 0.96 / 1.00 / 1.00 | 1.00 / 1.00 / 0.97
MLP | 0.96 / 1.00 / 1.00 | 1.00 / 1.00 / 0.86 | 0.91 / 1.00 / 1.00
XGB | 0.93 / 1.00 / 1.00 | 1.00 / 1.00 / 0.71 | 0.82 / 1.00 / 1.00
Decision Tree | 0.96 / 1.00 / 1.00 | 1.00 / 1.00 / 0.86 | 0.91 / 1.00 / 1.00
Gradient Boosting | 1.00 / 1.00 / 0.88 | 0.96 / 1.00 / 1.00 | 1.00 / 1.00 / 0.97
SVC | 0.96 / 1.00 / 1.00 | 1.00 / 1.00 / 0.86 | 0.91 / 1.00 / 1.00
K-Neighbors | 0.96 / 1.00 / 0.86 | 0.96 / 1.00 / 0.86 | 0.91 / 1.00 / 0.97
Random Forest | 0.93 / 1.00 / 1.00 | 1.00 / 1.00 / 0.71 | 0.82 / 1.00 / 1.00
Multinomial NB | 0.92 / 1.00 / 0.83 | 0.96 / 1.00 / 0.71 | 0.82 / 1.00 / 0.97
(83) The F1-score is the equally weighted harmonic mean of recall and precision. F1-scores in all MLAs ranged from 94% to 98%. Demented patients showed a perfect 100% for all models except Extra Trees. Similar to the precision and recall results for the MCI class, F1-scores were high in most of the models, i.e., most models give values between 83% and 100%.
(84) When the system classifies multiple class labels, averaging evaluation measures are needed to generalize the results. To cover the range of the various metrics, micro-averages and macro-averages were used to view the averaged evaluation measures of the overall results. The micro-average is a useful measurement when class sizes vary. As shown in Table IV, the micro-averages of recall, precision and F1-score revealed an accuracy of 97% or 94% in all models except the Multinomial NB model. In a multi-class setting, micro-averaged precision and recall are always the same; therefore, each model has the same micro-averaged recall, precision and F1-score. Macro-average metrics assess system performance across varied datasets, so macro-averaged F1-scores ranging from 91% to 97% indicate that the models perform well in classifying multiple class labels, based on averaged evaluation measures.
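The micro/macro distinction can be illustrated in a few lines with scikit-learn; the label vectors below are hypothetical, chosen only to show the behavior described above.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical three-class true labels and predictions.
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 0]

# Micro-averaging pools every individual decision; in single-label multi-class
# problems micro precision, micro recall and micro F1 all equal the accuracy,
# as noted in the text.
micro_f1 = f1_score(y_true, y_pred, average="micro")

# Macro-averaging scores each class separately and then averages the class
# scores, so a small class counts as much as a large one.
macro_f1 = f1_score(y_true, y_pred, average="macro")

print(micro_f1, macro_f1)
```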
(85) TABLE IV. Micro-avg and Macro-avg Metrics to Determine the Performance of the Machine Learning Models

MLA | Recall (Micro avg / Macro avg) | Precision (Micro avg / Macro avg) | F1-score (Healthy / Dementia / MCI) | F1-score (Micro avg / Macro avg)
Extra Trees | 0.97 / 0.99 | 0.93 / 0.99 | 0.98 / 0.89 / 1.00 | 0.97 / 0.96
AdaBoost | 0.97 / 0.95 | 0.97 / 0.99 | 0.98 / 1.00 / 0.92 | 0.97 / 0.97
MLP | 0.97 / 0.95 | 0.97 / 0.99 | 0.98 / 1.00 / 0.92 | 0.97 / 0.97
XGB | 0.94 / 0.90 | 0.94 / 0.98 | 0.96 / 1.00 / 0.83 | 0.94 / 0.93
Decision Tree | 0.97 / 0.95 | 0.97 / 0.99 | 0.98 / 1.00 / 0.92 | 0.97 / 0.97
Gradient Boosting | 0.97 / 0.95 | 0.97 / 0.99 | 0.98 / 1.00 / 0.92 | 0.97 / 0.97
SVC | 0.97 / 0.95 | 0.97 / 0.99 | 0.98 / 1.00 / 0.92 | 0.97 / 0.97
K-Neighbors | 0.94 / 0.94 | 0.94 / 0.94 | 0.96 / 1.00 / 0.86 | 0.94 / 0.94
Random Forest | 0.94 / 0.90 | 0.94 / 0.98 | 0.96 / 1.00 / — | 0.94 / 0.93
(86) The Receiver Operating Characteristic (ROC) curve for multiclass data measures the accuracy of rating and diagnostic test results, and is used to determine the optimal cut-off value, generating a curve in the unit square. It is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings [51]. The minimum acceptable value of the area under the curve is 0.5 [51]. As can be observed from
(87) In conclusion, all evaluation metrics used to determine the performance of the ML models revealed a high level of classification accuracy, sensitivity, specificity, precision, F1, ROC curve, micro-average and macro-average. Furthermore, the highest performance rates, ranked across SVC, MLP, AdaBoost, Extra Trees, Gradient Boosting, and Decision Tree, showed that there is no distinction between them.
(88) Generalize MLA Results Using Voting Approach
(89) Table V shows the performance results after the voting approach using the hard vote method with all classifiers given equal weight: the accuracy of classification is 97.22%; the sensitivity for dementia patients and cognitively healthy participants is 100%, whereas the sensitivity for MCI is 86%; and the specificity and precision for dementia patients and MCI patients are 100%. As shown in
(90) TABLE V. Evaluation Metrics to Determine the Performance by Majority Voting (Ensemble Vote)

Metrics | Cog. Healthy | Dement. | MCI
Precision | 0.96 | 1.00 | 1.00
Sensitivity | 1.00 | 1.00 | 0.86
Specificity | 0.91 | 1.00 | 1.00
F1-Score | 0.98 | 1.00 | 0.92
ROC-Curve | 0.95 | 1.00 | 0.93
Micro-avg ROC curve: 0.98
Macro-avg ROC curve: 0.96
Accuracy: 97.22%
AER: 0.11
(91) As can be seen from the above, a comparison was made of the classification accuracy for all participants using the traditional clinical diagnosis method versus the VR + machine learning system. The dementia diagnosis at the clinic (expert diagnosis) depends on a functional evaluation plus a cognitive test, such as the Mini-Cog test, at early stages of the disease. As noted above, the patients' classification at the clinic, which depended on the Mini-Cog test with functional evaluation, showed 94% accuracy, whereas the VR system combined with navigational ability showed 97.22% using the majority voting approach described herein.
(92) Overall, the highest performance was derived from Ensemble Vote, with a value equal to 97.22%, which confirms the reliability of the system. In addition, the demented patients' class has a greater discriminative capacity than the other classes, where all performance results equal 1. This leads to the conclusion that there is no overlap between the classes.
(93) Today, disease diagnosis is an important task, and computers play a vital role as decision support systems in diagnostic testing. This invention combines a model for patient information storage and retrieval, a cognitive test-based VR system, and ML for classifying the patients' cognitive impairment. The system contains four basic tests with specific tasks; each task assesses a human cognitive field. The system evaluates two human cognitive domains: memory and visuospatial function. Machine learning algorithms were used to classify patients into three classes: cognitive impairment (dementia), cognitively healthy and mild cognitive impairment. The system relies on a plurality of algorithms, and a vote is made between them to choose the correct classification using the Ensemble Vote approach. The example above describes the use of ten algorithms; however, smaller and larger numbers can be employed (e.g., 3-15 algorithms; 3-8 algorithms; 5-10 algorithms; 5-12 algorithms; 8-11 algorithms, etc.).
(94) This virtual reality machine learning platform offers many advantages compared to other, more traditional cognitive assessment systems: it is easier and more ecologically valid, and it requires less time and fewer resources. Moreover, it evaluates more than one cognitive field, giving greater accuracy of evaluation. The system is unique in that it relies on machine learning algorithms (MLAs) to classify patients. The findings presented herein reveal that many, and perhaps all, machine learning algorithms have a high level of prediction accuracy, specificity, precision, and sensitivity. In addition, a few of the models show lower percentages for the MCI class because the number of MCI samples was very small compared to the other groups.
(95) All evaluation metrics used to determine the performance of the machine learning models revealed a high level of accuracy in the classification of patients. Furthermore, the highest-ranked performance was 97.22% accuracy, achieved alike by SVC, MLP, AdaBoost, Extra Trees, Gradient Boosting, and Decision Tree, showing that there is no distinction between them. Accordingly, after majority voting, the highest performance was derived from the Ensemble Vote, equal to 97.22%, which confirms the reliability of the system test. Moreover, the ROC curve of the demented patients' class has a greater discriminative capacity than the other classes, with no overlap between them.
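The evaluation metrics named above (accuracy, sensitivity, specificity) can be derived from a multi-class confusion matrix in one-vs-rest fashion. The sketch below uses hypothetical label vectors for illustration only; the counts are not the study's data.

```python
# Per-class sensitivity and specificity from a 3-class confusion matrix.
from sklearn.metrics import confusion_matrix, accuracy_score

y_true = [0, 0, 1, 1, 2, 2, 2, 0, 1, 2]   # 0=normal, 1=MCI, 2=demented
y_pred = [0, 0, 1, 0, 2, 2, 2, 0, 1, 2]   # hypothetical predictions

cm = confusion_matrix(y_true, y_pred)
print("accuracy:", accuracy_score(y_true, y_pred))

# One-vs-rest: for each class c, treat c as positive, all others as negative.
for c in range(cm.shape[0]):
    tp = cm[c, c]                  # true positives for class c
    fn = cm[c, :].sum() - tp       # class-c samples predicted as other classes
    fp = cm[:, c].sum() - tp       # other classes predicted as class c
    tn = cm.sum() - tp - fn - fp   # everything else
    print(f"class {c}: sensitivity={tp/(tp+fn):.2f}, "
          f"specificity={tn/(tn+fp):.2f}")
```

A class whose one-vs-rest sensitivity and specificity both equal 1, as reported above for the demented class, is one the models never confuse with the other two.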
ACKNOWLEDGEMENTS
(96) This project was funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah, Saudi Arabia, under grant no. KEP-Msc-7-611-38. The authors therefore acknowledge with thanks the DSR's technical and financial support. IRB approval was obtained from the Unit of Biomedical Ethics Research Committee under Reference No. 535-18-KAU; from Dr. Soliman Fakeeh Hospital (DSFH), Policy No. GLD-025, under (48/IRB/2019); and from the research center of the International Medical Centre (IMC) under IMC-IRB #2019-03-104. The authors of this article would like to thank Mr. Abdulrahman Ali and Eng. Mazen Alquliti for their valuable suggestions and helpful comments.