Decipherable deep belief network method of feature importance analysis for road safety status prediction
11556800 · 2023-01-17
CPC classification
G06F17/18
PHYSICS
International classification
G06F17/18
PHYSICS
Abstract
A method for visualizing and analyzing contributions of various input features for traffic safety status prediction is provided. The method includes initializing a deep belief network (DBN) with input features; performing unsupervised learning/training by observing changes of weights of the input features during the unsupervised learning/training; when the unsupervised learning/training process is complete, performing a supervised learning/training process by generating a reconstructed input layer based on results of each hidden layer; and continually running the supervised learning/training and generating a weight diagram based on both visualization and numerical analysis that calculates contributions of the input features. The input features may include one or more of annual average daily commercial traffic (AADCT), median width, left shoulder width, right shoulder width, curve deflection, and exposure for traffic safety status prediction.
Claims
1. A method for visualizing and analyzing contributions of various input features for traffic safety status prediction, comprising steps of: 1) initializing a deep belief network (DBN) with input features; 2) performing unsupervised learning/training by observing changes of weights of the input features during the unsupervised learning/training; 3) when the unsupervised learning/training is complete, performing supervised learning/training by generating a reconstructed input layer based on results of each hidden layer; and 4) continually running the supervised learning/training and generating a weight diagram based on both visualization and numerical analysis that calculates contributions of the input features, wherein the contribution of each of the input features is defined by a linear function shown in the following Equation:
$$FI_i = FI_i^{unsup} + FI_i^{sup}$$
wherein $FI_i$ is importance of an input feature $i$, $FI_i^{unsup}$ is importance of the input feature $i$ obtained from the unsupervised learning/training, and $FI_i^{sup}$ is importance of the input feature $i$ obtained from the supervised learning/training.
2. The method of claim 1, wherein the initializing a deep belief network (DBN) comprises pre-setting the DBN with a V-H1-H2-O structure, wherein V represents input neurons, H1 and H2 represent hidden neurons in two hidden layers, respectively, and O represents output for prediction, and wherein weights are randomly pre-set.
3. The method of claim 1, wherein the observing changes of weights during unsupervised learning/training is performed with a focus on magnitude of each input feature.
4. The method of claim 2, wherein the performing unsupervised learning/training comprises training a first restricted Boltzmann machine (RBM) comprising the input neurons V and the hidden neurons in the first hidden layer H1 based on greedy unsupervised learning/training.
5. The method of claim 1, wherein the generating a reconstructed input layer based on each hidden layer is performed by differentiating an activation area and a non-activation area.
6. The method of claim 1, wherein the supervised learning/training comprises generating a diagram of the weights such that it is determined whether a secondary consideration of the method exists after the method is taught by a teacher.
7. The method of claim 6, wherein if the secondary consideration is determined to exist, a resulting image different from a resulting image of the unsupervised learning/training is generated.
8. The method of claim 1, wherein the numerical analysis that calculates contributions of input features determines whether the input feature is accepted or rejected.
9. The method of claim 1, wherein when the performing unsupervised learning/training or the performing supervised learning/training is complete, results of the learning/training are evaluated based on values of mean absolute error (MAE) and values of root mean square error (RMSE).
10. The method of claim 1, wherein the input features comprise one or more of annual average daily commercial traffic (AADCT), median width, left shoulder width, right shoulder width, curve deflection, and exposure for traffic safety status prediction.
11. A non-transitory computer-readable medium comprising program instructions stored thereon that, when executed, cause a processor to perform a method for visualizing and analyzing contributions of various input features for traffic safety status prediction, the method comprising steps of: 1) initializing a deep belief network (DBN) with input features; 2) performing unsupervised learning/training by observing changes of weights of the input features during the unsupervised learning/training; 3) when the unsupervised learning/training is complete, performing supervised learning/training by generating a reconstructed input layer based on results of each hidden layer; and 4) continually running the supervised learning/training and generating a weight diagram based on both visualization and numerical analysis that calculates contributions of the input features, wherein the contribution of each of the input features is defined by a linear function shown in the following Equation:
$$FI_i = FI_i^{unsup} + FI_i^{sup}$$
wherein $FI_i$ is importance of an input feature $i$, $FI_i^{unsup}$ is importance of the input feature $i$ obtained from the unsupervised learning/training, and $FI_i^{sup}$ is importance of the input feature $i$ obtained from the supervised learning/training.
12. The non-transitory computer-readable medium of claim 11, wherein the initializing a deep belief network (DBN) comprises pre-setting the DBN with a V-H1-H2-O structure, wherein V represents input neurons, H1 and H2 represent hidden neurons in two hidden layers, respectively, and O represents output for prediction, and wherein weights are randomly pre-set.
13. The non-transitory computer-readable medium of claim 11, wherein the observing changes of weights during unsupervised learning/training is performed with a focus on magnitude of each input feature.
14. The non-transitory computer-readable medium of claim 12, wherein the performing unsupervised training comprises training a first restricted Boltzmann machine (RBM) comprising the input neurons V and the hidden neurons in the first hidden layer H1 based on greedy unsupervised learning/training.
15. The non-transitory computer-readable medium of claim 11, wherein the generating a reconstructed input layer based on each hidden layer is performed by differentiating an activation area and a non-activation area.
16. The non-transitory computer-readable medium of claim 11, wherein the supervised learning/training comprises generating a diagram of the weights such that it is determined whether a secondary consideration of the method exists after the method is taught by a teacher.
17. The non-transitory computer-readable medium of claim 16, wherein if the secondary consideration is determined to exist, a resulting image different from a resulting image of the unsupervised learning/training is generated.
18. The non-transitory computer-readable medium of claim 11, wherein the numerical analysis that calculates contributions of input features determines whether the input feature is accepted or rejected.
19. The non-transitory computer-readable medium of claim 11, wherein when the performing unsupervised learning/training or the performing supervised learning/training is complete, results of the training are evaluated based on values of mean absolute error (MAE) and values of root mean square error (RMSE).
20. The non-transitory computer-readable medium of claim 11, wherein the input features comprise one or more of annual average daily commercial traffic (AADCT), median width, left shoulder width, right shoulder width, curve deflection, and exposure for traffic safety status prediction.
Description
DETAILED DISCLOSURE OF THE INVENTION
(9) The embodiments of the subject invention pertain to a visual feature importance (ViFI) method, built on feature importance evaluation and visualization, for performing sensitivity analysis and presenting diagrams based on the results of the analysis. As a result, the learning/training process is better understood, and the contributions of various input features to the “black box” of the learning/training process can be evaluated effectively, allowing improved output decisions for road safety status prediction.
(10) The ViFI method can be based on a decipherable deep belief network (DBN) that performs both an unsupervised learning/training process and a supervised (“fine-tuning”) learning/training process to assess the importance of each input feature.
(11) In some embodiments of the subject invention, the ViFI method can be divided into four steps as described below:
(12) 1) initializing a deep belief network (DBN) structure with learning/training parameters;
(13) 2) observing changes of weights during the unsupervised learning/training process, focusing primarily on the magnitude of each input feature;
(14) 3) when the unsupervised learning/training process is complete, generating a reconstructed input layer from each hidden layer and, by showing the activation and non-activation areas, obtaining a better understanding of the knowledge learned; and
(15) 4) continually performing the supervised (“fine-tuning”) learning/training step and generating the weights diagram based on both the visualization and the numerical analysis that calculates the contribution of each input feature, which is then either accepted or rejected.
(16) In the first step, the weights are randomly pre-set for a given deep belief network with a V-H1-H2-O structure, wherein V represents input neurons, H1 and H2 represent hidden neurons in two hidden layers, respectively, and O represents output for prediction.
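The first step can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes plain Python lists, uses the 6-10-10-1 layer sizes from Experiment 1 below as defaults, draws small Gaussian initial weights, and sets all biases to zero. The function `init_dbn`, its `scale` and `seed` parameters, and the bias names are hypothetical.

```python
import random

def init_dbn(n_visible=6, n_hidden1=10, n_hidden2=10, n_output=1, scale=0.01, seed=0):
    """Randomly pre-set the weights of a V-H1-H2-O network.

    Returns a dict of weight matrices (lists of rows) and bias vectors;
    W1[i][j] connects input neuron i to hidden neuron j in H1.
    """
    rng = random.Random(seed)

    def matrix(rows, cols):
        # Small Gaussian values; the scale is an illustrative assumption.
        return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": matrix(n_visible, n_hidden1),  # V  -> H1 (first RBM)
        "W2": matrix(n_hidden1, n_hidden2),  # H1 -> H2 (second RBM)
        "W3": matrix(n_hidden2, n_output),   # H2 -> O
        "b_v": [0.0] * n_visible,            # visible biases
        "c_h1": [0.0] * n_hidden1,           # H1 biases
        "c_h2": [0.0] * n_hidden2,           # H2 biases
        "b_o": [0.0] * n_output,             # output bias
    }

net = init_dbn()
print(len(net["W1"]), len(net["W1"][0]))  # 6 10
```

Seeding a private `random.Random` keeps the random pre-set reproducible, which makes it easier to compare weight changes across runs.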
(17) In the second step, a first restricted Boltzmann machine (RBM) including V and H1 is trained based on a greedy unsupervised learning/training process. The feature learning/training process and the weights updating process can be described by Equations (1)-(6) below.
(19) Equation (1) represents the starting state of the input data (values between 0 and 1), with the weights $W$ between the two layers randomly given (all zeros are preferred for easier calculation in the following steps). $v_i$ and $h_j$ are neurons in V and H1. Equations (2)-(4) are the feature learning/training equations of the RBM. In particular, $V^0$, $H^0$, $V^1$, and $H^1$ are the four states recorded during the transformation; $p(\cdot)$ is the probability of a neuron being activated; $w_{ij}$ is the weight between neuron $i$ in V and neuron $j$ in H1; and $b$ and $c$ are the biases.
(20) Moreover, the weights are updated by applying Equations (5)-(6). As the weights are all initialized to zero at the beginning, in the unsupervised learning/training process, if a feature is determined to be important, the weights between that feature's neuron and hidden layer 1 will be strengthened, leading to a negative $\alpha\Delta W$ in Equation (5), since more neurons will have a value of 1 in $V^1$ and $H^1$. If a feature is determined to be unimportant by the learning/training process, $\Delta W$ will be positive, and since $V^1$ and $H^1$ are mostly 0, the values of $W^{t+1}$ will keep increasing.
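Because Equations (1)-(6) are not reproduced in this text, the sketch below substitutes the conventional one-step contrastive divergence (CD-1) rule for training the first RBM, and records after each epoch the summed weights of every input feature, which is the quantity the second step observes. The sign convention of the patent's Equation (5) may differ from the conventional rule used here, the data are random stand-ins, and all function and variable names are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cd1_epoch(data, W, b, c, lr=0.1, rng=random):
    """One epoch of CD-1 on a binary RBM (V -> H1); updates W, b and c in place.

    data: input vectors with values in [0, 1]
    W[i][j]: weight between visible unit i and hidden unit j
    b: visible biases; c: hidden biases
    """
    n_v, n_h = len(W), len(W[0])
    for v0 in data:
        # Positive phase: p(h_j = 1 | V0), then sample a binary H0.
        ph0 = [sigmoid(c[j] + sum(v0[i] * W[i][j] for i in range(n_v))) for j in range(n_h)]
        h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]
        # Negative phase: reconstruct V1 from H0, then compute p(H1 | V1).
        v1 = [sigmoid(b[i] + sum(h0[j] * W[i][j] for j in range(n_h))) for i in range(n_v)]
        ph1 = [sigmoid(c[j] + sum(v1[i] * W[i][j] for i in range(n_v))) for j in range(n_h)]
        # Conventional CD-1 rule: W += lr * (<v0 h0> - <v1 h1>), using probabilities.
        for i in range(n_v):
            for j in range(n_h):
                W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            b[i] += lr * (v0[i] - v1[i])
        for j in range(n_h):
            c[j] += lr * (ph0[j] - ph1[j])

def feature_weight_sums(W):
    """Sum of the weights leaving each input neuron: one value per input feature."""
    return [sum(row) for row in W]

# Usage on random stand-in data (not the Highway 401 dataset).
rng = random.Random(0)
data = [[rng.random() for _ in range(6)] for _ in range(20)]
W = [[rng.gauss(0.0, 0.01) for _ in range(10)] for _ in range(6)]
b, c = [0.0] * 6, [0.0] * 10
history = [feature_weight_sums(W)]
for epoch in range(5):
    cd1_epoch(data, W, b, c, rng=rng)
    history.append(feature_weight_sums(W))
```

Each entry of `history` is a six-element vector, so the trajectory of every input feature's weights can be plotted or compared across epochs.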
(23) In the third step, the supervised (“fine-tuning”) learning/training process is performed, in which the same method as in step 2 is applied to produce a diagram of the weights. This process allows a determination of any secondary considerations the method uses after it is taught by a teacher. If these considerations exist, the resulting image will differ from the image obtained from the unsupervised learning/training process. The supervised (“fine-tuning”) learning/training process is described by Equations (7)-(10) below,
(25) where $E_W$ is the objective function used in a back-propagation network, which calculates the error between the target output $O_{target}$ and the observed output $O_{observed}$; $T$ is the testing set; $n$ is the number of layers, so that $O_n$ is the output of the whole network, $O_n = O_{observed}$; and $H_k$ is the vector value of layer $k$.
(26) When the supervised (“fine-tuning”) learning/training process is complete, the changes of weights between hidden layer 1 and the input are exported along with the diagram. By applying Equation (10), an increase of the weights is indicated as a negative ΔW.
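A minimal sketch of fine-tuning, assuming a squared-error objective $E = \tfrac{1}{2}(O_{target} - O_{observed})^2$, sigmoid hidden layers, a linear output neuron, and no biases. Equations (7)-(10) are not reproduced in this text, so this is a stand-in rather than the patented formulation, and all names are hypothetical. Because each weight moves against its gradient, a negative gradient term corresponds to a weight increase, matching the convention described above that an increase of the weights appears as a negative $\Delta W$.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, W1, W2, W3):
    """Sigmoid hidden layers H1 and H2, linear output O (biases omitted for brevity)."""
    h1 = [sigmoid(sum(x[i] * W1[i][j] for i in range(len(x)))) for j in range(len(W1[0]))]
    h2 = [sigmoid(sum(h1[i] * W2[i][j] for i in range(len(h1)))) for j in range(len(W2[0]))]
    o = sum(h2[k] * W3[k][0] for k in range(len(h2)))
    return h1, h2, o

def finetune_step(x, target, W1, W2, W3, lr=0.05):
    """One back-propagation step on E = 0.5 * (target - o)^2; updates weights in place."""
    h1, h2, o = forward(x, W1, W2, W3)
    d_o = o - target  # dE/do for a linear output
    # Error terms are computed with the pre-update weights.
    d_h2 = [d_o * W3[k][0] * h2[k] * (1 - h2[k]) for k in range(len(h2))]
    d_h1 = [sum(d_h2[k] * W2[j][k] for k in range(len(h2))) * h1[j] * (1 - h1[j])
            for j in range(len(h1))]
    for k in range(len(h2)):
        W3[k][0] -= lr * d_o * h2[k]
    for j in range(len(h1)):
        for k in range(len(h2)):
            W2[j][k] -= lr * d_h2[k] * h1[j]
    for i in range(len(x)):
        for j in range(len(h1)):
            W1[i][j] -= lr * d_h1[j] * x[i]  # a negative gradient term raises W1[i][j]
    return 0.5 * d_o ** 2

# Usage on random stand-in data with a 6-10-10-1 network.
rng = random.Random(1)
W1 = [[rng.gauss(0, 0.1) for _ in range(10)] for _ in range(6)]
W2 = [[rng.gauss(0, 0.1) for _ in range(10)] for _ in range(10)]
W3 = [[rng.gauss(0, 0.1)] for _ in range(10)]
data = [([rng.random() for _ in range(6)], rng.random()) for _ in range(30)]
before = [sum(row) for row in W1]
losses = []
for epoch in range(50):
    losses.append(sum(finetune_step(x, y, W1, W2, W3) for x, y in data) / len(data))
delta_w1 = [sum(row) - s for row, s in zip(W1, before)]  # per-feature change of W1
```

`delta_w1` holds, for each input feature, the net change of its weights into hidden layer 1 over fine-tuning, which is what the text exports alongside the weight diagram.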
(27) In the forward process, a sigmoid transfer function,
$$p(\cdot) = \sigma(x) = \frac{1}{1 + e^{-x}},$$
is applied to the weighted input $x$ of each hidden unit to represent the data flow from the input to hidden layer 1. This function is monotonically increasing, and all values of the input and hidden-layer units are above 0. As a result, larger weights lead to larger $p(\cdot)$ values. Assuming that the ViFI method is not over-fitted, larger $p(\cdot)$ values are preferred, as they suggest that the connections to the input feature are significant. Conversely, if the weights decrease, the feature they are connected to may be unimportant. These features may differ from the ones that a user of the ViFI method would identify as important, because an artificial intelligence (AI) method may learn differently from human beings.
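A quick numeric check of the monotonicity argument: with positive inputs, raising the weight on one feature raises the sigmoid activation probability of the hidden unit. The inputs, weights, and the function name `activation_probability` are hypothetical.

```python
import math

def activation_probability(inputs, weights, bias=0.0):
    """p(h = 1) for one hidden unit: sigmoid of the weighted sum of its inputs."""
    z = bias + sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-z))

x = [0.8, 0.3, 0.5]            # positive input values
base = [0.2, 0.1, -0.4]
stronger = [0.9, 0.1, -0.4]    # only the weight on feature 0 is increased
print(activation_probability(x, base) < activation_probability(x, stronger))  # True
```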
(29) The mean value of the weights on each feature is then calculated based on the results of the unsupervised learning/training process and the results of the supervised (“fine-tuning”) learning/training process. Since Equation (6), the weight-updating equation, is linear, the contributions of the feature learning/training process are defined by the linear function shown in Equations (11)-(13) below, in which $FI_i$ is the importance of feature $i$, $FI_i^{unsup}$ is the importance of $i$ in the unsupervised learning/training, and $FI_i^{sup}$ is the importance after fine-tuning; $w_i^n$ represents the weights that connect to $i$ in epoch $n$, $V$ represents the number of features, and $H$ represents the number of hidden units.
$$FI_i = FI_i^{unsup} + FI_i^{sup} \quad (11)$$
$$FI_i^{unsup} = \frac{\sum w_i^{0} - \sum w_i^{n}}{H} \quad (12)$$
$$FI_i^{sup} = \frac{1}{1 + e^{-\left(\sum w_i^{n} - \sum w_i^{0}\right)/H}} \quad (13)$$
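Equations (11)-(13) can be sketched as a small function. Two readings are assumptions: the division by $H$ in Equation (12) is taken to apply to the whole difference, matching the parenthesized form of Equation (13); and the "epoch 0" weights of the supervised stage are taken to be the weights at the start of fine-tuning. The function name and the toy matrices are hypothetical.

```python
import math

def feature_importance(w_unsup_start, w_unsup_end, w_sup_start, w_sup_end):
    """Importance FI_i of each input feature from V x H weight matrices (lists of rows).

    Row i holds the weights between input feature i and the H units of hidden layer 1.
    """
    H = len(w_unsup_start[0])
    scores = []
    for i in range(len(w_unsup_start)):
        # Eq. (12): mean decrease of the feature's weights during unsupervised training.
        fi_unsup = (sum(w_unsup_start[i]) - sum(w_unsup_end[i])) / H
        # Eq. (13): sigmoid of the mean increase of the weights during fine-tuning.
        fi_sup = 1.0 / (1.0 + math.exp(-(sum(w_sup_end[i]) - sum(w_sup_start[i])) / H))
        scores.append(fi_unsup + fi_sup)  # Eq. (11)
    return scores

# Toy example with two features and two hidden units.
w0 = [[0.0, 0.0], [0.0, 0.0]]        # before unsupervised training
w_pre = [[-0.4, -0.2], [0.1, 0.1]]   # after unsupervised training = start of fine-tuning
w_fine = [[-0.2, -0.2], [0.1, 0.1]]  # after fine-tuning
print([round(s, 3) for s in feature_importance(w0, w_pre, w_pre, w_fine)])  # [0.825, 0.4]
```

Note that a feature whose weights do not change during fine-tuning still receives $FI_i^{sup} = 0.5$, the sigmoid of zero.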
(30) In the fourth step, the supervised (“fine-tuning”) learning/training step is continually performed and the weights diagram is generated based on the visualization and the numerical analysis that calculates the contribution of input features, the input feature being determined to be either accepted or rejected.
(31) The ViFI method allows effective deciphering of the method's inner workings and allows the important/significant features to be identified and the unimportant/bad features to be eliminated. Subsequently, the revised dataset can be applied to the ViFI method in crash and vehicle collision prediction for improving road traffic safety.
(32) Experiment 1:
(33) In this exemplary experiment, historical data from Highway 401, a multilane, access-controlled highway in Ontario, Canada, are used. The highway is one of the busiest highways in North America and connects Quebec in the east with the Windsor-Detroit international border in the west. Approximately 800 km of the highway's total length of 817.9 km was selected for the experiment. According to the 2008 traffic volume data, the annual average daily traffic (AADT) ranges from 14,500 to 442,900 vehicles per day, indicating a relatively busy road corridor.
(34) The processed crash and traffic data of this experiment are integrated into a single dataset of homogeneous sections with a total of 3,762 records, with the year used as the mapping field. The six input features of the dataset are annual average daily commercial traffic (AADCT), median width, left shoulder width, right shoulder width, curve deflection, and exposure.
(35) Descriptive statistics of the continuous input features, including the sample sizes for learning/training and testing, are summarized in Table 1. After the learning/training process is complete, the performance of each method is evaluated based on the mean absolute error (MAE) and the root mean square error (RMSE), as defined by Equations (14)-(15) below.
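(36)
$$MAE = \frac{1}{N}\sum_{k=1}^{N}\left|y_k - \hat{y}_k\right| \quad (14)$$
$$RMSE = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(y_k - \hat{y}_k\right)^2} \quad (15)$$
where $N$ is the number of records evaluated, and $y_k$ and $\hat{y}_k$ are the observed and predicted collision counts of record $k$.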
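The two evaluation metrics can be computed as below, a plain-Python sketch of the standard MAE and RMSE definitions. The collision counts are made-up illustration values, not data from the experiment, and the function names are hypothetical.

```python
import math

def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

obs = [3, 0, 12, 5]   # hypothetical observed collisions
pred = [2, 1, 9, 5]   # hypothetical predictions
print(mae(obs, pred), round(rmse(obs, pred), 3))  # 1.25 1.658
```

Because RMSE squares each error, the single large miss (12 vs. 9) weighs more heavily in RMSE than in MAE, which is why the two are reported together.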
(37) TABLE 1 Summary of the dataset (Highway 401, Ontario)

| Variable | Mean | Max | Min | St. dev. |
|---|---|---|---|---|
| Collisions (per year) | 23.81 | 468 | 0 | 50.02 |
| AADT (veh/day) | 76633 | 442900 | 12000 | 91476 |
| Segment length (km) | 1.95 | 12.7 | 0.2 | 2.06 |
| AADCT (veh/day) | 13993 | 42076 | 0 | 6719 |
| Median width (m) | 11.11 | 30.5 | 0.6 | 6.14 |
| Shoulder width, right (m) | 3.14 | 4 | 2.6 | 0.28 |
| Curve deflection (per km) | 0.19 | 1.86 | 0 | 0.35 |
| Shoulder width, left (m) | 1.6 | 5.19 | 0 | 1.19 |

Sample size: total 3762 (years 2000-2008); training 2926 (years 2000-2006); testing 836 (years 2007-2008).
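Table 1 splits the records by year rather than at random: 2000-2006 for learning/training and 2007-2008 for testing. A minimal sketch of that split, assuming each record is a dict with a `year` key (a hypothetical layout; the function name is also hypothetical):

```python
def split_by_year(records, train_years=range(2000, 2007), test_years=range(2007, 2009)):
    """Split records into training (2000-2006) and testing (2007-2008) sets by year."""
    train = [r for r in records if r["year"] in train_years]
    test = [r for r in records if r["year"] in test_years]
    return train, test

# Hypothetical records: one per homogeneous section per year.
records = [{"section": s, "year": y} for s in range(3) for y in range(2000, 2009)]
train, test = split_by_year(records)
print(len(train), len(test))  # 21 6
```

A year-based split tests the model on later years than it was trained on, which is closer to how a prediction system would be used.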
(38) In Equations (16)-(20):
(40) $x_i$ and $y_j$ are the continuous values of units $i$ and $j$ in two layers; $w_{ij}$ is the weight between them; $N(0,1)$ is a Gaussian random variable with mean 0 and variance 1; $\sigma$ is a constant; $\varphi(X)$ denotes a sigmoid-like function with asymptotes $\theta_H$ and $\theta_L$; $\alpha$ is a variable that controls noise; $F_W$ is the new optimization function in fine-tuning; $R_W$ is the Bayesian regularization term for inhibiting over-fitting by controlling the values of the weights; and $\alpha$ and $\beta$ are performance parameters that can be calculated during the iteration.
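Equations (16)-(20) are not reproduced in this text. The sketch below assumes they follow the usual continuous-RBM form, in which a unit's value is $\varphi$ applied to its weighted input plus Gaussian noise $\sigma N(0,1)$, with $\varphi(X) = \theta_L + (\theta_H - \theta_L)/(1 + e^{-aX})$, and that the fine-tuning objective has the Bayesian-regularization form $F_W = \beta E_W + \alpha R_W$ with $R_W$ taken as half the sum of squared weights. The slope parameter is named `a` in the code to keep it apart from the regularization weight `alpha`, since the text uses $\alpha$ for both; every name here is hypothetical.

```python
import math
import random

def crbm_unit(x, w, sigma=0.2, a=1.0, theta_low=0.0, theta_high=1.0, rng=random):
    """Continuous stochastic unit y = phi(sum_i w_i * x_i + sigma * N(0, 1)).

    phi is a sigmoid with lower and upper asymptotes theta_low and theta_high;
    `a` sets its slope (the noise-control parameter in this assumed form).
    """
    z = sum(xi * wi for xi, wi in zip(x, w)) + sigma * rng.gauss(0.0, 1.0)
    return theta_low + (theta_high - theta_low) / (1.0 + math.exp(-a * z))

def regularized_objective(errors, weights, alpha, beta):
    """F_W = beta * E_W + alpha * R_W, with E_W the sum of squared errors and
    R_W half the sum of squared weights (assumed Bayesian-regularization form)."""
    e_w = sum(e * e for e in errors)
    r_w = 0.5 * sum(w * w for w in weights)
    return beta * e_w + alpha * r_w

rng = random.Random(0)
y = crbm_unit([0.5, 0.2], [0.3, -0.1], rng=rng)  # lies strictly between 0 and 1
f = regularized_objective([1.0, -1.0], [2.0], alpha=0.1, beta=1.0)
print(round(f, 6))  # 2.2
```

Raising `alpha` relative to `beta` penalizes large weights more strongly, which is the over-fitting control the text attributes to $R_W$.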
(41) An embodiment of the ViFI method of the subject invention was first performed based on the unsupervised learning/training process. The method was initialized with six input neurons, one for each feature, namely, exposure, AADCT, left shoulder width, median width, right shoulder width, and curve deflection; two hidden layers with ten neurons in each layer, and one output layer that contains only one neuron for vehicle collision prediction.
(42) The weights between the input and hidden layer 1 can be written as
$$W_1 = (w_{1,1}, w_{1,2}, \ldots, w_{1,10}, w_{2,1}, \ldots, w_{ij}, \ldots, w_{6,1}, \ldots, w_{6,10}),$$
(43) where i (from 1 to 6) and j (from 1 to 10) are neurons in the two layers. A visualization of the structure that highlights how the weights form the different connections between layers and how they are updated is illustrated in
(45) According to the analysis described above, the darker the color, the more important the feature. The more important a feature is, the more knowledge the hidden layer needs to learn, and thus the larger the difference will be. Therefore, all features may be regarded as important in the unsupervised learning/training process, especially exposure and curve deflection, represented by input neurons 1 and 6, respectively.
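Writing $W_1$ as a flat vector, as in paragraph (42), orders the 6 x 10 weights row by row: all weights of input neuron 1, then input neuron 2, and so on. A minimal sketch of that ordering and of locating $w_{ij}$ in it; the helper names are hypothetical.

```python
def flatten_row_major(W):
    """Flatten a V x H weight matrix into (w_11, ..., w_1H, w_21, ..., w_VH)."""
    return [w for row in W for w in row]

def flat_index(i, j, n_hidden=10):
    """0-based position of w_ij (1-based i and j, as in the text) in the flattened vector."""
    return (i - 1) * n_hidden + (j - 1)

# Each entry stores 10 * i + j so that its position is easy to check.
W1 = [[10 * i + j for j in range(1, 11)] for i in range(1, 7)]
flat = flatten_row_major(W1)
print(len(flat), flat[flat_index(6, 10)])  # 60 70
```

With this ordering, each consecutive block of ten entries belongs to one input feature, which is convenient when summing or shading the weights per feature.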
(46) Subsequently, Equation (14) was applied to each hidden layer to reconstruct the input data. By comparison, the patterns of the features reconstructed from the two hidden layers are similar, suggesting similar or equal feature learning/training ability.
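The reconstruction step can be sketched with the standard RBM down-pass. This is an assumption, since the reconstruction formula is not reproduced in this text: each lower layer is recovered as the sigmoid of the transposed weighted sum plus that layer's bias, and reconstructed input units above a hypothetical 0.5 threshold are marked as activated, one simple way to separate the activation and non-activation areas. All names and the toy weights are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def down_pass(h, W, bias):
    """Reconstruct the layer below from activations h, using W[i][j] (lower unit i, upper unit j)."""
    return [sigmoid(bias[i] + sum(h[j] * W[i][j] for j in range(len(h)))) for i in range(len(W))]

def reconstruct_input(h, weights, biases):
    """Propagate a hidden-layer state back down to the input layer.

    weights: matrices ordered from the input side, e.g. [W1] for H1 or [W1, W2] for H2
    biases: bias vector of the lower layer of each matrix, in the same order
    """
    for W, b in zip(reversed(weights), reversed(biases)):
        h = down_pass(h, W, b)
    return h

def activation_mask(v, threshold=0.5):
    """1 for reconstructed input units treated as activated, 0 otherwise."""
    return [1 if x >= threshold else 0 for x in v]

# Toy 2-2-2 example: reconstruct the input from a state of H2.
W1 = [[2.0, -2.0], [-2.0, 2.0]]
W2 = [[3.0, 0.0], [0.0, 3.0]]
v = reconstruct_input([1.0, 0.0], [W1, W2], [[0.0, 0.0], [0.0, 0.0]])
print([round(x, 3) for x in v], activation_mask(v))  # [0.712, 0.288] [1, 0]
```

Reconstructing from H1 uses `[W1]` alone, while reconstructing from H2 passes through both matrices, which allows the patterns from the two hidden layers to be compared.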
(47) Then the supervised (“fine-tuning”) learning/training process is performed for about 5,000 iterations of learning/training. The changes of weights between the input layer and the hidden layer 1 are visualized in
(50) Further, when the supervised (“fine-tuning”) learning/training process is complete, the weights that join each feature and the black box are shown in
(51) The results of the above steps were determined to be [0.428, 0.117, 0.143, 0.084, 0.087, 0.393], for the six features, namely, exposure, AADCT, left shoulder width, median width, right shoulder width, curve deflection, respectively. After the supervised (“fine-tuning”) learning/training process is complete, as the weights updating is based on a nonlinear function, the changes of the contributions can be defined based on a sigmoid function. The results are then determined to be [0.928, −0.321, 0.688, 0.589, 0.635, 1.015] which are shown in
(53) Referring to Table 2, four methods, namely negative binomial (NB), kernel regression (KR), back propagation neural network (BPNN), and regularized deep belief network (R-DBN), are compared. NB is one of the most popular methods in real-world applications; KR and BPNN are two popular traditional machine learning methods; and R-DBN is an improved version of the DBN, which is one of the most significant methods in deep learning.
(54) Referring to the results in Table 2, the decoded R-DBN method of the subject invention performs better than the conventional methods and outperforms the original R-DBN, achieving a minimum MAE of 7.58 and a minimum RMSE of 15.03 (obtained when Feature 2 is removed). Table 2 also compares the feature importance calculated with a traditional numerical method with that obtained from the deep neural network. The two show similar trends, indicating that the deep neural network not only correctly identifies the unimportant features but also makes better use of the important features.
(55) TABLE 2

| Method | Testing: Min MAE | Testing: Min RMSE | Numerical: Calculated FI | DNN: Unsupervised learning FI | DNN: Final FI |
|---|---|---|---|---|---|
| NB | 11.80 | 26.60 | / | / | / |
| KR | 8.85 | 17.85 | / | / | / |
| BPNN | 8.60 | 16.51 | / | / | / |
| R-DBN | 8.00 | 15.24 | 0.000 | / | / |
| R-DBN without Feature 1 | 11.83 | 26.02 | 0.228 | 0.428 | 0.928 |
| R-DBN without Feature 2 | 7.58 | 15.03 | −0.053 | 0.117 | −0.321 |
| R-DBN without Feature 3 | 9.02 | 19.03 | 0.128 | 0.143 | 0.688 |
| R-DBN without Feature 4 | 8.82 | 19.20 | 0.101 | 0.084 | 0.589 |
| R-DBN without Feature 5 | 8.34 | 15.95 | 0.043 | 0.087 | 0.635 |
| R-DBN without Feature 6 | 9.24 | 17.86 | 0.155 | 0.393 | 1.015 |
| R-DBN without Hid-layer2 | 9.54 | 17.21 | / | / | / |
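Note: NB represents negative binomial, KR represents kernel regression, BPNN represents back propagation neural network, R-DBN represents regularized deep belief network, and FI represents feature importance. Features 1-6 correspond to exposure, AADCT, left shoulder width, median width, right shoulder width, and curve deflection, respectively.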
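The accept/reject decision of the fourth step can be illustrated on the final feature-importance values reported in paragraph (51). The text does not state the decision rule, so the threshold of 0 used here is a hypothetical choice; it is consistent with Table 2, where removing Feature 2 (AADCT, the only feature with a negative final FI) gives the lowest MAE. The function name is also hypothetical.

```python
features = ["exposure", "AADCT", "left shoulder width", "median width",
            "right shoulder width", "curve deflection"]
final_fi = [0.928, -0.321, 0.688, 0.589, 0.635, 1.015]  # values reported in paragraph (51)

def accept_reject(names, scores, threshold=0.0):
    """Hypothetical rule: accept a feature whose final importance exceeds the threshold."""
    return {name: ("accept" if s > threshold else "reject") for name, s in zip(names, scores)}

decisions = accept_reject(features, final_fi)
ranking = sorted(zip(features, final_fi), key=lambda pair: pair[1], reverse=True)
print([name for name, _ in ranking])  # curve deflection first, AADCT last
print(decisions["AADCT"])  # reject
```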
(57) The method of the subject invention is built on visualization, feature importance, and sensitivity analysis, allowing the contributions of input variables to the “black box” of the learning/training process and to the output decision to be evaluated effectively. Moreover, the method can intuitively highlight the areas of a deep neural network that respond positively or negatively to the inputs. Accordingly, the method enables users to understand the black box of the learning/training process and to analyze the contributions of the various input features. Furthermore, it demonstrates how a deep neural network, especially in the unsupervised learning/training process, learns differently from other methods, allowing the network's inner workings to be deciphered, the important features to be identified, and the unimportant features to be removed, so that road safety conditions can be predicted more accurately.
(58) Thus, embodiments of the subject invention could be used in the development of road safety management and alarm systems. In practice, the input datasets on which such systems are based are collected from different geographical regions, resulting in varying feature importance. The visualization, analysis, and evaluation provided by the embodiments of the subject invention help users develop more accurate road safety status prediction. Potential exemplary applications include, but are not limited to, safety performance function (SPF) analysis, signal process filtering, and structure design filtering.
(59) All patents, patent applications, provisional applications, and publications referred to or cited herein are incorporated by reference in their entirety, including all figures and tables, to the extent they are not inconsistent with the explicit teachings of this specification.
(60) It should be understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and the scope of the appended claims. In addition, any elements or limitations of any invention or embodiment thereof disclosed herein can be combined with any and/or all other elements or limitations (individually or in any combination) or any other invention or embodiment thereof disclosed herein, and all such combinations are contemplated within the scope of the invention without limitation thereto.