Unsupervised classification of encountering scenarios using connected vehicle datasets

11194331 · 2021-12-07

Abstract

The present disclosure provides a method in a data processing system that includes at least one processor and at least one memory. The at least one memory includes instructions executed by the at least one processor to implement a driving encounter recognition system. The method includes receiving information from one or more sensors coupled to a first vehicle; determining first trajectory information associated with the first vehicle and second trajectory information associated with a second vehicle; extracting a feature vector; providing the feature vector to a trained classifier, the classifier trained using unsupervised learning based on a plurality of feature vectors; and receiving, from the trained classifier, a classification of the current driving encounter in order to facilitate the first vehicle to perform a maneuver based on the current driving encounter.

Claims

1. A method in a data processing system comprising at least one processor and at least one memory, the at least one memory comprising instructions executed by the at least one processor to implement a driving encounter recognition system, the method comprising: receiving information from one or more sensors coupled to a first vehicle; determining, based on the information received from the one or more sensors coupled to the first vehicle, first trajectory information associated with the first vehicle and second trajectory information associated with a second vehicle; normalizing the first trajectory information and the second trajectory information to a predetermined number of data points; extracting, based on a current driving encounter comprising the first trajectory information and the second trajectory information, a feature vector; providing the feature vector to a trained classifier, wherein the classifier was trained using unsupervised learning based on a plurality of feature vectors corresponding to driving encounters; and receiving, from the trained classifier, a classification of the current driving encounter in order to facilitate the first vehicle to perform a maneuver based on the current driving encounter.

2. The method of claim 1, wherein the method further comprises determining that the first vehicle and the second vehicle are within a predetermined distance of each other in order to start receiving the information.

3. The method of claim 1, wherein the method further comprises receiving the information for a predetermined time.

4. The method of claim 3, wherein the predetermined time is 5 to 20 seconds.

5. The method of claim 1, wherein the classifier was trained using a clustering technique.

6. The method of claim 5, wherein the classifier was trained using k-means clustering.

7. The method of claim 5, wherein the clustering technique has a number of clusters selected in order to maximize between-cluster distance and minimize within-cluster distance.

8. The method of claim 1, wherein the predetermined number of data points is 50 to 400 data points.

9. The method of claim 1, wherein the feature vector is extracted using an autoencoder, and the plurality of feature vectors were extracted using the autoencoder.

10. The method of claim 9, wherein the autoencoder is one of: a long-short term memory autoencoder, a normalized Euclidean distance long-short term memory autoencoder, or a dynamic time warping convolutional autoencoder.

11. The method of claim 1, wherein the feature vector is extracted using a distance-based formula, and the plurality of feature vectors were extracted using the distance-based formula.

12. The method of claim 11, wherein the distance-based formula uses dynamic time warping or normalized Euclidean distance.

13. A method for training an autonomous vehicle, comprising: obtaining trajectories of a plurality of vehicles; identifying each pair of vehicles from the plurality of vehicles with a between-vehicle-distance that is less than a threshold; normalizing the trajectories of each pair of vehicles to a predetermined number of data points; learning features of each normalized trajectory; clustering each trajectory based on the learned features; and training the autonomous vehicle based on the learned features.

14. The method of claim 13, wherein the clustering further comprises k-means clustering.

15. The method of claim 13, wherein the clustering further comprises clustering each trajectory into one of a plurality of clusters, the plurality of clusters comprising a number of clusters selected in order to maximize between-cluster distance and minimize within-cluster distance.

16. The method of claim 13, wherein the learning further comprises learning the features of each normalized trajectory using an autoencoder selected from: a long-short term memory autoencoder, a normalized Euclidean distance long-short term memory autoencoder, or a dynamic time warping convolutional autoencoder.

17. The method of claim 16, wherein the learning further comprises extracting features from a hidden layer of the autoencoder.

18. The method of claim 13, wherein the learning further comprises unsupervised learning based on a plurality of feature vectors corresponding to driving encounters, the plurality of feature vectors being extracted by a driving encounter feature extractor.

19. The method of claim 13, wherein the threshold is 100 meters.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 shows various exemplary driving encounters.

(2) FIG. 2 shows various exemplary steps of unsupervised classification of driving encounters.

(3) FIG. 3 illustrates an exemplary driving encounter.

(4) FIG. 4 illustrates an exemplary autoencoder.

(5) FIG. 5 shows an exemplary geographical selection range of driving encounters in an urban area.

(6) FIG. 6 displays the distribution of duration lengths of the original driving encounters.

(7) FIGS. 7A-7D show examples of representations learned by different approaches. FIG. 7A shows raw driving encounter GPS trajectories with start points. FIG. 7B shows extracted representations using dynamic time warping (DTW). FIG. 7C shows reconstructed representations of DTW using a convolutional autoencoder (ConvAE). FIG. 7D shows extracted representations using the normalized Euclidean distance (NED) of two trajectories and their reconstructed representations from a long-short term memory autoencoder (LSTMAE).

(8) FIGS. 8A-8C show statistical results of the extracted hidden features for DTW-ConvAE, NED-LSTMAE, and LSTMAE, respectively.

(9) FIGS. 9A-9D show the between-cluster and within-cluster distance performance of the five representation extraction approaches: the between- and within-cluster metrics (BC and WC) are shown in FIGS. 9A and 9C, respectively, and their change rates over the number of clusters k in FIGS. 9B and 9D, respectively.

(10) FIG. 10 shows visual GPS data of clustering results using DTW-kMC with k=10.

(11) FIG. 11 shows clustering results of using DTW-kMC with k=10.

(12) FIG. 12 shows an exemplary embodiment of a vehicle 1210 with one or more sensors.

(13) FIG. 13 shows an example of a process for facilitating a vehicle to perform a maneuver.

DETAILED DESCRIPTION OF THE INVENTION

(14) Driving encounter classification and analysis can help autonomous vehicles make smarter decisions. This disclosure presents an unsupervised learning framework to classify driving encounters, each composed of two vehicles' GPS trajectories ordered by time. We developed two kinds of approaches, deep autoencoders and distance-based approaches, to learn underlying representations of driving encounters in terms of both shape and distance. Five specific approaches were developed and evaluated by combining the deep autoencoders and the distance-based approaches. We then applied k-means clustering to categorize driving encounters into several distinguishable groups based on the learned representations. The effectiveness of our unsupervised learning framework for driving encounter classification was demonstrated on 2,568 naturalistic driving encounters and compared with state-of-the-art deep learning methods.

Driving Encounter Classification Framework

(15) As shown in FIG. 2, unsupervised classification of driving encounters can include three steps: driving encounter unification, representation extraction, and clustering. Before detailing the framework, we first introduce some mathematical preliminaries to define the driving encounter.

Driving Encounters

(16) A driving encounter can be composed of two vehicle trajectories with the same length of time, where the vehicle trajectories could be continuous or discrete. In our case we consider only discrete trajectories, defined as follows. Given a discrete observed driving encounter
x={(p.sub.1.sup.(1),p.sub.1.sup.(2),t.sub.1), . . . ,(p.sub.k.sup.(1),p.sub.k.sup.(2),t.sub.k), . . . ,(p.sub.K.sup.(1),p.sub.K.sup.(2),t.sub.K)}  (1)
where p.sub.k.sup.(1)∈custom character.sup.2 and p.sub.k.sup.(2)∈custom character.sup.2 are the positions of the two vehicles in the driving encounter at discrete time t.sub.k, and K∈custom character is the length of the driving encounter x, as shown in FIG. 3. In particular, we define X=(x.sub.1, x.sub.2, . . . , x.sub.N) as a set of N driving encounters.
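The definition in Eq. (1) can be made concrete with a few lines of code. The sketch below stores a discrete driving encounter as a sequence of (position, position, time) samples; the helper name and the toy coordinates are illustrative only, not taken from the disclosure.

```python
# Sketch of the discrete driving encounter x in Eq. (1): K synchronized
# samples of two vehicle positions p_k^(1), p_k^(2) with timestamps t_k.

def make_encounter(traj1, traj2, times):
    """Pair two equal-length 2-D trajectories into one driving encounter."""
    assert len(traj1) == len(traj2) == len(times)
    return list(zip(traj1, traj2, times))

# Two toy trajectories sampled at 10 Hz (the SPMD sampling rate).
traj1 = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
traj2 = [(0.0, 5.0), (1.0, 4.0), (2.0, 3.0)]
times = [0.0, 0.1, 0.2]

x = make_encounter(traj1, traj2, times)
K = len(x)  # K in Eq. (1): the length of the encounter
```

Each element of `x` is one time slice (p.sub.k.sup.(1), p.sub.k.sup.(2), t.sub.k), matching the tuple structure of Eq. (1).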

Driving Encounter Unification

(17) For some clustering algorithms, driving encounters must be set to a unified length so that they can be compared consistently. For two arbitrary driving encounters, their lengths may differ greatly in real traffic. In order to make clustering algorithms tractable for driving encounters, we need to normalize the driving encounters to the same length with little loss of information. Many mature approaches for normalizing trajectories have been developed; see the review literature (Ref. 22). Some exemplary approaches are listed below.

(18) Trajectory transformation algorithms, which project the trajectories into another structured space with fixed parameters. Popular methods include linear transformation, curve fitting (e.g., uniform cubic B-spline curves), discrete Fourier transformation, etc.

(19) Re-sampling methods, which choose trajectory points according to a sampling rule to normalize trajectory lengths with minimal information loss. Re-sampling with an interpolation process is very flexible to operate.

(20) Trajectory substitution, which uses the basic components of trajectories, termed sub-trajectories, to describe hidden information of the original trajectory data.

(21) Points of interest, which are flexible and preferable when research focuses on specific regions of the surveillance area, termed points of interest, rather than the points outside those regions.

(22) Scale-invariant features, which have been widely used to extract more robust and representative features (e.g., histograms of oriented gradients and histograms of optical flow) from image frames rather than only the positions of trajectories.

(23) Trajectory transformation algorithms and re-sampling methods may be implemented with less computational cost than other methods.
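The re-sampling approach mentioned above can be sketched in a few lines. This is a minimal linear-interpolation re-sampler, one of the low-cost options; the function name and sample data are illustrative.

```python
def resample(traj, n):
    """Re-sample a 2-D trajectory to exactly n >= 2 points by linear
    interpolation between neighboring samples."""
    K = len(traj)
    out = []
    for i in range(n):
        pos = i * (K - 1) / (n - 1)   # fractional index into traj
        lo = int(pos)
        hi = min(lo + 1, K - 1)
        w = pos - lo                  # interpolation weight
        px = (1 - w) * traj[lo][0] + w * traj[hi][0]
        py = (1 - w) * traj[lo][1] + w * traj[hi][1]
        out.append((px, py))
    return out

traj = [(0.0, 0.0), (2.0, 0.0), (4.0, 4.0)]
unified = resample(traj, 5)  # fixed length, endpoints preserved
```

Applying this to both trajectories of every encounter yields the unified lengths (e.g., 200 sample points) used later in the Example section.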

Representation Learning

(24) Since a driving encounter is represented as time-series data sequences, it is nontrivial to feed the data directly into clustering algorithms. In order to classify driving encounters into distinguishable groups, we need to select representations capable of characterizing these driving encounters, allowing the difference within clusters to be minimized and the difference between clusters to be maximized in terms of the learned representations, instead of operating directly on vehicle trajectories. There are many different approaches to extract representations of driving encounters with respect to their shape or distance, for example, by measuring the distance between two trajectories (Refs. 3, 10) or by using neural networks to learn representations of trajectories (Refs. 11, 23).

(25) In order to distinguish driving encounters and categorize them into groups, features capable of capturing the underlying characteristics of the driving encounters should be carefully extracted. Improper representations can lead to unsatisfactory results; therefore, extracting representations from driving encounters is important for good classification performance. Unlike extracting features of images with specific and explicit labels, we have little knowledge about the feature space of driving encounters. Fortunately, many approaches to learning representations of time-series trajectories have been developed, which can be taken as references since a driving encounter is, by nature, also a time-series data sequence. In this section, three ways to learn the representations of vehicle trajectories are presented: deep autoencoders, distance-based measures, and shape-based measures.

Deep Autoencoder

(26) Our goal is to find a representation that can characterize a given driving encounter. The autoencoder has been widely used to uncover underlying information in time-series data. An autoencoder is a neural network (NN), consisting of an encoder and a decoder (as shown in FIG. 4), that attempts to copy its input x to its output z, where the hidden layer h describes the code used to represent the input x. By training the model to minimize the difference between the input x and the output z, the hidden layer h can be treated as a representation characterizing the driving encounter. Autoencoders vary in how the encoder and the decoder are connected; for example, a long-short term memory (LSTM) structure yields an LSTM autoencoder and a convolutional NN (CNN) structure yields a convolutional autoencoder. In general, the encoder and the decoder can be mathematically detailed as follows.

Encoder

(27) As shown in FIG. 4, the autoencoder 400 can have an encoder 410. The encoder can include multilayer neural networks, mapping an input x.sub.i∈custom character.sup.r×s at the i.sup.th layer into x.sub.i+1 at the (i+1).sup.th layer by
x.sub.i+1=w.sub.i,i+1.sup.Tx.sub.i+b.sub.i  (2)
where w.sub.i,i+1∈custom character.sup.r×s is the weight matrix, b.sub.i∈custom character.sup.s is the bias vector, and x.sub.i+1∈custom character.sup.s×d. The autoencoder 400 can include a middle layer 420, which can be the representation h of a driving encounter.

Decoder

(28) The hidden representation h from the encoder is then mapped back to z through a symmetric multilayer network.

(29) An autoencoder based on a simple multilayer perceptron can be easily derived using (2) without considering past observations, as investigated in (Ref. 20). For driving behavior, however, previous research demonstrates that it is by nature a dynamic process related to past information (Refs. 24, 25). Hence, in this disclosure, we take the past information of observations into consideration by employing two advanced autoencoders: the LSTM autoencoder (LSTMAE) and the convolutional autoencoder (ConvAE).

LSTMAE

(30) In contrast to the simplest autoencoder with a multilayer perceptron, the LSTMAE (Ref. 26) takes advantage of previous information by adding an information selection function, usually a sigmoid neural net layer, to decide what information is useful and to remember this helpful information, thus transferring the observation x.sub.i at the i.sup.th layer to the representation at the (i+1).sup.th layer. Instead of using (2), the LSTMAE introduces four basic equations to achieve this, formulated by
z.sub.i=σ(w.sub.z.sup.Tl.sub.i+u.sub.zx.sub.i−1+b.sub.z)  (3a)
y.sub.i=σ(w.sub.y.sup.Tl.sub.i+u.sub.yx.sub.i−1+b.sub.y)  (3b)
{circumflex over (x)}.sub.i=tan h(w.sub.l.sup.Tl.sub.i+u.sub.ly.sub.ix.sub.i−1+b.sub.l)  (3c)
x.sub.i=(1−z.sub.i)x.sub.i−1+z.sub.i{circumflex over (x)}.sub.i  (3d)
where σ(⋅) is the sigmoid function, w.sub.z, w.sub.y, w.sub.l are the weights, u.sub.z, u.sub.y, u.sub.l are the recurrent weights, b.sub.z, b.sub.y, b.sub.l are the biases, and x.sub.i is the output vector of the LSTM unit. Therefore, given the observation x.sub.i−1 for each LSTM unit, we can obtain the output x.sub.i to the next layer.
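The gate equations (3a)-(3d) can be exercised on scalars to see how the unit blends the previous output x.sub.i−1 with a candidate value. This is a hedged scalar sketch; the function name and all weight values are arbitrary illustrations, not parameters from the disclosure.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def lstm_unit(l_i, x_prev, w_z, u_z, b_z, w_y, u_y, b_y, w_l, u_l, b_l):
    """Scalar form of Eqs. (3a)-(3d): two gates, a candidate, and a blend."""
    z = sigmoid(w_z * l_i + u_z * x_prev + b_z)              # (3a) update gate
    y = sigmoid(w_y * l_i + u_y * x_prev + b_y)              # (3b) selection gate
    x_hat = math.tanh(w_l * l_i + u_l * y * x_prev + b_l)    # (3c) candidate
    return (1.0 - z) * x_prev + z * x_hat                    # (3d) blended output

# With x_prev = 0 the output reduces to sigmoid(0.5) * tanh(0.5).
x_out = lstm_unit(0.5, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0)
```

Note how (3d) is a convex combination: the gate z.sub.i decides how much of the previous output survives versus how much of the new candidate is written in.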

Convolutional Autoencoder (ConvAE)

(31) Sharing the same encoder-decoder structure as the LSTMAE, the ConvAE describes the layer relationship using a convolution operator instead of a basic multiplication operator. In fact, the encoding and decoding processes can be treated as a procedure of reconstructing observations, and hence we can learn the hidden representation h by solving the following optimization problem of minimizing the error between the reconstructed observations {tilde over (x)} and the original inputs x with the cost function

(32) θ*=arg min.sub.θE(θ)  (4)
where E(θ)=(x−{tilde over (x)}).sup.T(x−{tilde over (x)}) and θ={w.sub.i, b.sub.i}.sub.i=1.sup.I collects the parameters of all layers, where I is the number of layers. From the aforementioned discussion, the representation learning process can be transformed into the problem of training autoencoders and then recording the middle layer h (i.e., the layer of the hidden representation) in the feature space H∈custom character.sup.r×1 for each driving encounter x, where r is the dimension of the hidden layer.
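The cost in Eq. (4) can be minimized with plain gradient descent. Below is a deliberately tiny linear autoencoder (2-D input, 1-D hidden code) trained to reconstruct points that lie on a line; it is a toy stand-in for the LSTMAE/ConvAE, and all names, dimensions, learning-rate values, and data are illustrative assumptions.

```python
import random

# Toy data: 2-D points on a line, so a 1-D code can reconstruct them.
random.seed(0)
data = [(0.1 * i, 0.2 * i) for i in range(10)]

w1 = [random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5)]  # encoder weights
w2 = [random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5)]  # decoder weights
lr = 0.01

for _ in range(5000):
    for x in data:
        h = w1[0] * x[0] + w1[1] * x[1]            # encode: hidden code h
        xr = (w2[0] * h, w2[1] * h)                # decode: reconstruction x~
        e = (xr[0] - x[0], xr[1] - x[1])           # error (x~ - x)
        g_h = 2.0 * (e[0] * w2[0] + e[1] * w2[1])  # dE/dh
        w2[0] -= lr * 2.0 * e[0] * h               # gradient step on decoder
        w2[1] -= lr * 2.0 * e[1] * h
        w1[0] -= lr * g_h * x[0]                   # gradient step on encoder
        w1[1] -= lr * g_h * x[1]

# Total reconstruction error E(θ) of Eq. (4) after training.
final_err = sum(
    (w2[0] * (w1[0] * x[0] + w1[1] * x[1]) - x[0]) ** 2
    + (w2[1] * (w1[0] * x[0] + w1[1] * x[1]) - x[1]) ** 2
    for x in data)
```

After training, the scalar `h` plays the role of the middle-layer representation: it compresses each 2-D input to one number from which the input can be reconstructed.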

Distance-Based Measure

(33) In addition to treating the representation learning process as a black box, we can also use a mathematically rigorous way to capture distance characteristics. In order to describe the distance relationship between the two trajectories in a driving encounter, we need to define a measure of their geometrical distance. Generally, there are two typical formulas to achieve this, i.e., dynamic time warping and Euclidean distance, discussed as follows.

Dynamic Time Warping (DTW)

(34) Given a driving encounter observation x, the objective of DTW (Ref. 27) is to measure the geometric relationship of the two vehicle trajectories
p.sup.(1)=(p.sub.1.sup.(1), . . . ,p.sub.k.sup.(1), . . . ,p.sub.K.sup.(1))
p.sup.(2)=(p.sub.1.sup.(2), . . . ,p.sub.k.sup.(2), . . . ,p.sub.K.sup.(2))
with a length of K. In what follows, we introduce the trajectory measure space denoted by P∈custom character.sup.K×K, with p.sub.m.sup.(1), p.sub.n.sup.(2)∈P for m, n∈[1, . . . , K]. To evaluate the relationship between trajectories p.sup.(1) and p.sup.(2) in a driving encounter, we introduce a local distance measure, defined by
f:P×P→custom character.sub.≥0  (5)
Typically f(p.sub.m.sup.(1), p.sub.n.sup.(2)) is small if p.sub.m.sup.(1) and p.sub.n.sup.(2) are close to each other, and otherwise f(p.sub.m.sup.(1), p.sub.n.sup.(2)) is large. Thus by evaluating each pair of elements in the vehicle trajectories p.sup.(1) and p.sup.(2), we can obtain the corresponding feature matrix F∈custom character.sup.K×K to represent the geometry of driving encounter x, where
F(m,n):=f(p.sub.m.sup.(1),p.sub.n.sup.(2))  (6)

(35) Here the Euclidean distance is selected to compute the local distance and computed by
f(p.sub.m.sup.(1),p.sub.n.sup.(2))=∥p.sub.m.sup.(1)−p.sub.n.sup.(2)∥.sub.2  (7)
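The feature matrix F of Eqs. (6)-(7) is simply the table of pairwise Euclidean distances between every point of one trajectory and every point of the other. A minimal sketch (the function name and toy trajectories are illustrative):

```python
import math

def feature_matrix(p1, p2):
    """Eqs. (6)-(7): F[m][n] is the Euclidean distance between
    point m of trajectory p1 and point n of trajectory p2."""
    return [[math.hypot(a[0] - b[0], a[1] - b[1]) for b in p2] for a in p1]

p1 = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]   # toy trajectory 1
p2 = [(0.0, 3.0), (1.0, 3.0), (2.0, 3.0)]   # toy trajectory 2
F = feature_matrix(p1, p2)                  # the K x K feature matrix
```

For two parallel trajectories like these, the diagonal of F is constant (here 3.0), while off-diagonal entries grow with the time offset between the compared points, which is the dynamic information DTW-style features retain.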

(36) Besides the DTW distance, other distance measures can also be selected, such as the longest common subsequence (LCSS) and edit distance on real sequences (EDR). These discretize the trajectories of a driving encounter and count matches between segments according to whether their Euclidean distance falls within a predefined spatial threshold. Compared to DTW, both LCSS and EDR are sensitive to the spatial threshold (Ref. 10); that is, a large threshold value indicates high acceptance of differences in trajectories, and a small value indicates low tolerance of differences.

Normalized Euclidean Distance (NED)

(37) Another simple and efficient way to measure the distance within driving encounters is to apply a normalized Euclidean distance directly, which is computed by

(38) F=∥p.sup.(1)−p.sup.(2)∥/max(∥p.sup.(1)−p.sup.(2)∥)  (8)

(39) Thus we can obtain the feature F of this driving encounter. Other kinds of distance measures could also be flexibly used.
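Eq. (8) can be sketched directly: compute the distance between the two vehicles at each paired sample, then divide by the maximum so the feature lies in [0, 1]. Function name and toy data are illustrative.

```python
import math

def ned(p1, p2):
    """Eq. (8): per-sample distance between paired points, scaled by
    its maximum so the feature lies in [0, 1]."""
    d = [math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in zip(p1, p2)]
    m = max(d)
    return [v / m for v in d]

p1 = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
p2 = [(0.0, 1.0), (1.0, 2.0), (2.0, 4.0)]
F = ned(p1, p2)  # K x 1 feature; the largest separation maps to 1.0
```

Unlike the K×K DTW matrix, this K×1 profile compares only time-aligned points, which is why NED misses the dynamic information that DTW captures.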

Shape-Based Measure

(40) Different from the distance-based measures, shape-based measures have also been used to capture the geometric features of single trajectories (Refs. 3, 10). The most well-known shape-based measures are the Hausdorff distance (Ref. 28), the Fréchet distance (Ref. 29), and the symmetrized segment-path distance (SSPD) developed in (Ref. 10). These methods have registered a high level of performance for describing the geometric features of a single trajectory; however, they are not suitable for driving encounters consisting of a pair of vehicle trajectories, since their output is a scalar metric that readily measures the shape-similarity level of two individual trajectories but does not work for pairs of trajectories: even driving encounters with the same metric value can differ greatly from each other in shape. Therefore, the shape-based measure is not applied in this disclosure.

Summary of Approaches

(41) According to the above discussion, we select the neural network-based autoencoders and distance-based measures to learn features representative of driving encounters. In addition, since the autoencoder is able to uncover underlying information with dimension reduction, it can potentially extract meaningful representations from the results of DTW or NED. There are five combinations for learning representations from driving encounters, listed as follows:

(42) DTW: directly using the output of DTW as the representation, i.e.,

(43) DTW: x→F by (6)

(44) NED: directly using the output of NED as the representation, i.e.,

(45) NED: x→F by (8)

(46) LSTMAE: applying the LSTMAE to extract underlying representations of driving encounters, i.e.,

(47) LSTMAE: x→h by the LSTMAE

(48) DTW-ConvAE: combining the ConvAE with DTW to extract underlying representations, i.e.,

(49) DTW-ConvAE: x→F by (6), then F→h by the ConvAE

(50) NED-LSTMAE: combining the LSTMAE with NED to extract underlying representations, i.e.,

(51) NED-LSTMAE: x→F by (8), then F→h by the LSTMAE

(52) For clarity, we display and compare the input and output of each representation extraction method in Table 1. We denote custom character.sub.x.sub.i as the extracted representation (F or h) of driving encounter x.sub.i for all approaches in Table 1, where K is the dimension of the inputs and r is the dimension of the learned representations.

(53) TABLE 1. INPUT AND OUTPUT OF REPRESENTATION LEARNING APPROACHES

Method         Input Observations                 Output Features
DTW            x ∈ custom character.sup.K×d       F ∈ custom character.sup.K×K
NED            x ∈ custom character.sup.K×1       F ∈ custom character.sup.K×1
LSTMAE         x ∈ custom character.sup.K×d       h ∈ custom character.sup.r×1
DTW-ConvAE     F ∈ custom character.sup.K×K       h ∈ custom character.sup.r×1
NED-LSTMAE     F ∈ custom character.sup.K×1      h ∈ custom character.sup.r×1

Clustering

(54) This section will introduce a clustering approach for the learned representations. We first detail the clustering method and then introduce the qualification criteria of clustering results.

Methods

(55) Time-series trajectory clustering is ubiquitous (Ref. 30), and a wealth of clustering algorithms has been developed to solve related problems (Ref. 31). Clustering method selection is restricted by the characteristics of driving encounters. Given the extracted representations of driving encounters in Table 1, we can classify them into groups. In this disclosure, we use k-means clustering (k-MC) (Ref. 32).

Performance Metric

(56) For the clustering task, we do not have exact prior knowledge of the cluster number. Therefore, we need to define a criterion to evaluate clustering performance. Our aim is to gather driving encounters with similar characteristics into one group that is far from the other groups; at the same time, the variance of the elements within the same group should be as small as possible. Hence, the quality of clustering results can be assessed by checking the between-cluster and within-cluster variances of the obtained clusters. Here (following Refs. 10, 30, 31), we define the within-cluster (WC) variance and between-cluster (BC) variance, which require computing a mean value of the extracted features.

(57) However, in our case, directly computing the mean of driving encounters is not tractable, since it is inconceivable and meaningless to calculate the mean of the trajectories in driving encounters. In this disclosure, instead of using the vehicle trajectories directly to evaluate the BC and WC variances, we take advantage of the extracted representations and introduce the mean of the extracted representations, denoted by custom character.sub.χ, which is computed by averaging all extracted representations within the cluster χ. Let χ.sub.1, χ.sub.2, . . . , χ.sub.J be the set of clusters of driving encounters x. Then the BC variance and WC variance can be derived by

(58) BC=(1/(J−1))Σ.sub.j=1.sup.JD(custom character.sub.χ,custom character.sub.χ.sub.j)  (9a)
WC=(1/(K−J))Σ.sub.j=1.sup.J(1/|χ.sub.j|)Σ.sub.x.sub.i.sub.∈χ.sub.jD(custom character.sub.x.sub.i,custom character.sub.χ.sub.j)  (9b)
where D(⋅,⋅) is a distance between representations, custom character.sub.χ is the mean of the extracted representations over all driving encounters, and custom character.sub.χ.sub.j is the mean of the extracted representations within cluster χ.sub.j.

(59) In this way, the BC and WC variances become comparable across approaches by scaling their values into [0, 1] with respect to each approach. A large value of λ.sub.BC indicates a large distance between clusters and, otherwise, a close distance.
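The clustering-plus-evaluation pipeline can be sketched end to end on 1-D representations. Below, a plain k-means pass is followed by BC/WC scores in the spirit of Eq. (9), with the absolute difference standing in for the distance D; all names and the toy feature values are illustrative assumptions.

```python
import random

def kmeans(feats, k, iters=50, seed=1):
    """Plain 1-D k-means: assign to the nearest center, recompute means."""
    random.seed(seed)
    centers = random.sample(feats, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in feats:
            j = min(range(k), key=lambda c: abs(f - centers[c]))
            clusters[j].append(f)
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers, clusters

def bc_wc(centers, clusters):
    """BC/WC in the spirit of Eq. (9), using |a - b| as the distance D."""
    J = len(centers)
    K = sum(len(c) for c in clusters)
    mean_all = sum(sum(c) for c in clusters) / K   # mean over all encounters
    bc = sum(abs(mean_all - m) for m in centers) / (J - 1)
    wc = sum(sum(abs(f - m) for f in c) / len(c)
             for m, c in zip(centers, clusters)) / (K - J)
    return bc, wc

feats = [0.1, 0.2, 0.15, 5.0, 5.1, 4.9]   # two well-separated toy groups
centers, clusters = kmeans(feats, 2)
bc, wc = bc_wc(centers, clusters)          # bc large, wc near zero
```

A good clustering of these two obvious groups yields a large BC (clusters far from the global mean) and a tiny WC (members close to their own cluster mean), matching the criterion stated above.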

Example

(60) The following Example is provided to demonstrate and further illustrate certain embodiments and aspects of the present invention and is not to be construed as limiting the scope of the invention.

Data Collection and Analysis

(61) All the driving data were collected in naturalistic settings supported by the University of Michigan Safety Pilot Model Development (SPMD) program (Ref. 33). This database was compiled by the University of Michigan Transportation Research Institute (UMTRI) and provides driving data logged over the last five years in the Ann Arbor area. It includes approximately 3,500 equipped vehicles and 6 million trips in total. The latitude and longitude of each vehicle were collected by the on-board GPS. The collection process starts at ignition for each equipped vehicle. The data were collected at a sampling frequency of 10 Hz.

B. Experiment Procedure and Settings

(62) 1) Autoencoders:

(63) We searched the dataset of 100,000 trips, collected from 1,900 vehicles over 12-day runs. The trajectory information we extracted includes the latitude, longitude, and speed of the vehicles. The selection range was restricted to an urban area with longitude in (−83.82, −83.64) and latitude in (42.22, 42.34), as shown in FIG. 5. A vehicle encounter was defined as a scenario where two vehicles are close to each other and the relative Euclidean distance between them is smaller than 100 m. The dots indicate the position of the vehicle at every sample time. After querying the SPMD database, we obtained 49,998 vehicle encounters. In some cases the distance between two vehicles fell below 100 m for only a short time before the vehicles were again out of range to communicate with each other, which results in very short trajectories. In this disclosure, we mainly focus on trajectories longer than 10 s, which are most meaningful for behavior analysis and applications. Finally, we obtained 2,568 qualifying driving encounters for experimental validation. FIG. 6 displays the distribution of duration lengths of all original driving encounters. In order to make feature extraction tractable, we used the re-sampling method with an interpolation process to normalize the driving encounters to an identical length.

(64) From FIG. 6, it can be seen that the lengths of the driving encounters varied. In order to facilitate the training procedure, we empirically unified each driving encounter to a fixed length of 200 sample points using linear interpolation. We selected the unified length of 200 because a small number of unified samples reduces model accuracy, while a high number of unified samples increases the computational burden without significant model performance improvement.

(65) Setting an optimal number of layers in an autoencoder is a challenging and open-ended problem, because there is no general design rule that can be universally applied. We set all autoencoders with a symmetric structure based on our experience, and more specifically:

(66) LSTMAE: A five-layer (I=5) LSTMAE was designed for the deep representation, where the fourth layer (i.e., the middle layer) is the hidden representation of the driving encounter. The weights of each layer were initialized with random numbers between −1 and 1 following a uniform distribution. The dimension of the middle layer (i.e., the hidden layer) was set to 10, i.e., h∈custom character.sup.r×1 with r=10.

(67) ConvAE: A five-layer ConvAE was designed to extract underlying features from the outputs of DTW. The dimension of its hidden layer was also set to 10, i.e., h∈custom character.sup.r×1 with r=10. After learning the hidden feature custom character.sub.x.sub.i of driving encounter x.sub.i using the autoencoders, we applied the k-means clustering (k-MC) approach to these extracted hidden representations to cluster the driving encounters.

(68) 2) DTW:

(69) Similar to the autoencoders, the length of driving encounters was unified to 100 for DTW in order to balance model accuracy and the computational burden of representation learning. Thus we can compute the representation custom character.sub.x∈custom character.sup.100×100 using (6) and (7) for each driving encounter.

(70) 3) NED:

(71) For the normalized Euclidean distance approach, the representation can be directly computed through (8) based on the unified driving encounters. For all learned representations, we do not have any prior knowledge of what value of k∈custom character.sup.+ to select. In order to determine k, we ran the clustering algorithms with different k and computed their performance metrics λ.sub.BC and λ.sub.WC.

Result Discussion and Analysis

(72) This section will present and analyze the experiment results, covering representation extraction results and clustering performance evaluation and comparison.

Representation Learning and Analysis

(73) FIG. 7 shows examples of learned representations using DTW and NED as well as the reconstruction results of LSTMAE and ConvAE. FIG. 7A shows raw driving encounter GPS trajectories with start points. FIG. 7B shows extracted representations using DTW. FIG. 7C shows reconstructed representations of DTW using ConvAE. FIG. 7D shows extracted representations using the NED of two trajectories and their reconstructed representations from LSTMAE. It can be seen that both LSTMAE and ConvAE successfully reconstruct the representations of DTW and NED, which implies that the learned hidden layers of the autoencoders can represent the associated driving encounters.

Learned Representations Using DTW and NED

(74) FIG. 7 shows examples of the learned representations using DTW and NED. It can be seen that both the DTW and NED methods capture the distance information of the two trajectories in an individual driving encounter. In the extracted representations using DTW, deep red indicates a large value and deep blue a small value. DTW captures the spatial information of all positions over the whole trajectories; in addition, it also captures dynamic information, since it computes the distance between the trajectories over time, while NED does not capture dynamic information. FIG. 7 also presents the associated reconstructed outputs of DTW and NED. The experimental results demonstrate that the ConvAE and LSTMAE can capture the underlying information of the representations extracted using DTW and NED.

Learned Representations Using Autoencoders

(75) Each driving encounter yields a ten-dimensional representation in a vector h∈ℝ.sup.10×1 via the autoencoders. FIG. 8 shows box plots of the hidden layers in different autoencoders for all driving encounters. For each box, the central mark indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. As shown in FIG. 8A, it can be found that DTW-ConvAE obtains hidden representations that are not readily recognizable; that is, the distributions of the elements of the representation h do not differ significantly, compared to NED-LSTMAE and LSTMAE, as shown in FIGS. 8B and 8C, respectively. All elements of the representations extracted using DTW-ConvAE have similar median values, almost around zero, and similar [25th, 75th] percentile ranges. For the NED-LSTMAE and LSTMAE methods, both of their hidden representations are recognizable, significantly differing from those of DTW-ConvAE: their median values and [25th, 75th] percentile ranges are scattered and distinctive.

Clustering Results and Analysis

(76) Based on the extracted representations, FIG. 9 compares all approaches to clustering driving encounters by showing the between- and within-cluster metrics (BC and WC), as shown in FIGS. 9A and 9C respectively, and their change rates over the number of clusters k, as shown in FIGS. 9B and 9D respectively. We found that the within-cluster distance decreases and the between-cluster distance increases as the number of clusters increases. As discussed above, the goal of clustering is to put driving encounters into groups while maximizing the between-cluster distance and minimizing the within-cluster distance. According to the change rates of BC and WC, it can be concluded that at k=10 the performance metrics converge and change only slightly, implying that k=10 is the preferred selection.
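The k-selection procedure above can be sketched with a minimal k-means and simple stand-ins for the two criteria. The exact definitions of λ.sub.BC and λ.sub.WC are not reproduced in this section, so the metrics below (mean pairwise centroid distance for between-cluster, mean point-to-own-centroid distance for within-cluster) are illustrative assumptions, as are the function names and the deterministic initialization.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Minimal Lloyd's k-means with deterministic initialization
    (the first k samples), sufficient for illustration."""
    C = X[:k].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each point to its nearest centroid
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                C[j] = pts.mean(axis=0)  # move centroid to cluster mean
    return C, labels

def cluster_metrics(X, C, labels):
    """Hypothetical stand-ins for the BC/WC criteria."""
    wc = float(np.mean(np.linalg.norm(X - C[labels], axis=1)))
    k = len(C)
    pairs = [np.linalg.norm(C[i] - C[j])
             for i in range(k) for j in range(i + 1, k)]
    bc = float(np.mean(pairs)) if pairs else 0.0
    return bc, wc
```

Sweeping k, computing (bc, wc) at each value, and looking for the k at which their change rates plateau mirrors the selection of k=10 described above.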

(77) According to FIG. 9, it can be concluded that DTW-kMC outperforms the other counterparts, with BC=0.923 at k=10. For all autoencoders at k=10, NED-LSTMAE-kMC obtains the best performance with BC=0.803, compared to LSTMAE-kMC with BC=0.574 and DTW-ConvAE-kMC with BC=0.559. This can be explained according to FIG. 8: DTW-ConvAE-kMC (FIG. 8A) obtains the worst performance because its extracted representations of driving encounters are not as recognizable and distinctive. For the distance-based approaches, DTW-kMC outperforms NED-kMC, which has BC=0.648. This is because DTW captures the dynamic features over time as well as the distance features of the two trajectories, whereas NED does not.

(78) FIG. 10 displays the visual results of all clustered driving encounters with their GPS data. Here we only show the results with the optimal number of clusters k=10 because, as seen in FIG. 9, a plateau can be observed in the change rate of the between-cluster criterion λ.sub.BC with respect to the number of clusters, starting at a cluster size of 10 for all approaches. FIG. 11 lists the number of driving encounters in each cluster. It can be found that some clusters cover very common driving encounter behaviors (e.g., cluster #2, cluster #7, and cluster #9), while clusters representing rare driving encounter behaviors (e.g., cluster #4 and cluster #10) can also be detected.

(79) TABLE 2: Efficiency Comparison of Representation Extraction Approaches

    Method     Computing Time    Unit of Time
    NED        101.41            Second
    DTW        120.77            Second
    LSTMAE     ≈10               Hour
    ConvAE     ≈5                Hour

Computing Efficiency Analysis

(80) In order to evaluate the computational efficiency of the algorithms and their applicability to real-time applications, their computational costs when implemented on a standard laptop computer are presented and a comparison among the counterparts is given. All autoencoders were built using Python™ in Keras (see the resource identified as /building-autoencoders-in-keras.html at the domain name blog.keras.io), and all distance-based representation extraction algorithms were also programmed using Python™. Table 2 presents the average computation time for extracting representations on a standard laptop computer with an Intel® Core™ i7 running at 2.5 GHz and 16 GB of RAM. It can be seen that the NED and DTW algorithms can extract features much faster than the other two, requiring only about 120 s for 2568 driving encounters, whereas the autoencoders with neural nets require a few hours to extract representations.

Further Discussion

(81) This disclosure presents a comprehensive investigation of driving encounter classification through five approaches, which can be categorized into two types: deep learning-based and distance-based. The distance-based method outperforms all other counterparts. However, some challenges still exist in the developed approaches and are discussed as follows.

Layers of Autoencoders

(82) For the deep learning-based approaches, we set their hyperparameters empirically, such as the number of encoder/decoder layers and the number of nodes between layers, because there is no general approach for all application cases to determine the optimal number of layers and nodes. The hyperparameter selection in this disclosure mainly concerns two aspects: computational cost and model performance. Autoencoders with large numbers of layers could somewhat enhance model performance, but at a significantly greater computational cost. Therefore, in this disclosure we selected a moderate number of layers to learn the hidden representations of driving encounters.

Contextual Road Information

(83) This disclosure mainly utilized the vehicle trajectories of GPS data, without any other information, to group driving encounters, since GPS data are very easy to obtain at low cost, such as via mobile phones and equipped localization sensors. Our experimental validation did not consider other information such as contextual road information and vehicle speed, but it is straightforward to extend the developed approaches to other driving cases that include high-dimensional time-series data. For instance, if more road context information could be obtained, such as highways and intersections (T-shape, Y-shape, cross-shape, etc.), the developed approaches could help provide insight into the diversity of driving encounters. Therefore, considering road contextual information will be part of our future work.

(84) This disclosure investigated the clustering problem of naturalistic driving encounters consisting of pairs of vehicle trajectories. Two kinds of representation learning, deep autoencoder-based and distance-based, comprising five specific approaches, were developed and evaluated. Clustering approaches were considered, and k-means clustering was selected and evaluated. The clustering results can provide insight into driver interactive behaviors in different scenarios and help to design a friendly and efficient decision-making policy for intelligent vehicles, such as autonomous vehicles, so that they can perform maneuvers based on driving encounters.

(85) FIG. 12 shows an exemplary embodiment of a vehicle 1210 with one or more sensors. The sensors can sense trajectory information associated with one or more vehicles. The one or more sensors may be able to sense a location of a second vehicle 1240. Each sensor may be a LiDAR sensor, a camera, or any other sensor suitable for sensing a location of a vehicle. The location of the second vehicle 1240 may be sensed relative to the first vehicle 1210. The one or more sensors may include a first sensor 1220, a second sensor 1225, and a third sensor 1230. The first sensor 1220 and the second sensor 1225 can be LiDAR sensors. The third sensor 1230 can be a camera. The first vehicle 1210 can have a GPS sensor (not shown). The first vehicle 1210 can use the GPS sensor to sense the trajectory 1250, which may also be referred to as the first trajectory information 1250, of the first vehicle 1210. The GPS sensor can also sense the absolute location of the first vehicle 1210. The first vehicle 1210 can sense the relative location of the second vehicle 1240 with respect to the first vehicle 1210 using one or more of the sensors 1220, 1225, 1230. The trajectory 1260, which may also be referred to as the second trajectory information 1260, of the second vehicle 1240 can then be calculated by comparing the relative location of the second vehicle 1240 to the absolute location of the first vehicle 1210 at a given time. In some embodiments, the second vehicle 1240 may have a GPS sensor that can measure the second trajectory information 1260.
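The calculation of the second vehicle's trajectory described above can be sketched as a simple composition of the first vehicle's absolute positions and the sensor-measured relative offsets, assuming both are expressed in the same local Cartesian (e.g., ENU) frame; the function name and frame choice are illustrative assumptions.

```python
import numpy as np

def second_vehicle_trajectory(ego_positions, relative_offsets):
    """Sketch: the second vehicle's absolute track is the first
    vehicle's GPS-derived absolute positions plus the LiDAR/camera
    relative offsets, both (T, 2) arrays in meters in a shared local
    Cartesian frame sampled at the same time instants."""
    ego = np.asarray(ego_positions, float)
    rel = np.asarray(relative_offsets, float)
    return ego + rel
```

In practice the relative offsets would first be rotated from the sensor frame into the shared frame using the ego vehicle's heading; that transform is omitted here.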

(86) FIG. 13 shows an example 1300 of a process for facilitating a vehicle to perform a maneuver in accordance with some embodiments of the disclosed subject matter. As shown in FIG. 13, at 1301, the process 1300 can receive information from one or more sensors coupled to a first vehicle. The one or more sensors can include any suitable sensor as described above. The information can include parameters such as GPS coordinates, the location of the second vehicle in relation to the first vehicle, time, a velocity of the first vehicle, or other applicable parameters as described above. The process 1300 can then proceed to 1302.

(87) At step 1302, the process 1300 can determine whether the first vehicle is within a predetermined distance of the second vehicle. The predetermined distance can be a distance that implies a driving encounter is occurring, such as 100 meters as described above. The process 1300 can determine that the vehicles are within the predetermined distance of each other using the information received from the one or more sensors coupled to the first vehicle. If the first vehicle and the second vehicle are within the predetermined distance of each other (“YES” at step 1302), the process 1300 can proceed to step 1303. If the first vehicle and the second vehicle are not within the predetermined distance of each other (“NO” at step 1302), the process 1300 can return to 1301, forming an iterative loop.

(88) At 1303, process 1300 can start a sampling timer. The sampling timer may be set to count down from a predetermined sampling time. The sampling time can be a minimum threshold time over which a trajectory length is meaningful, such as about 10 seconds as described above. The process 1300 may need to continue to receive information from the one or more sensors until sufficient information has been obtained to determine vehicle trajectory information and to classify the vehicle trajectory information, as will be discussed below. The process can then proceed to 1304.

(89) At 1304, the process 1300 can determine whether the sampling timer has expired. If the process 1300 determines that the sampling timer has not expired (“NO” at 1304), the process 1300 may return to 1304 and continue waiting. If the process 1300 determines that the sampling timer has expired (“YES” at 1304), the process 1300 may proceed to 1306.

(90) At 1306, process 1300 can determine, using the information received from the one or more sensors coupled to the first vehicle, the trajectory, also known as first trajectory information, of the first vehicle. The one or more sensors can include any suitable sensor as described above. The process 1300 can also determine the trajectory, also known as second trajectory information, of the second vehicle as described above. The process 1300 can then proceed to 1308.

(91) At 1308, the process 1300 can normalize the first trajectory information and the second trajectory information to a predetermined number of data points. The number of data points can be the length that a driving encounter feature extractor, which will be described below, uses. In some embodiments, the number of data points can be 100-200 samples. The process 1300 can use trajectory transformation, re-sampling methods, trajectory substitution, points of interest, scale-invariant features, or any other suitable technique for normalizing the first trajectory information and the second trajectory information to the predetermined number of data points. The process 1300 can then proceed to 1310.
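Of the normalization techniques listed above, re-sampling is the simplest to illustrate: linear interpolation over a normalized time axis brings any trajectory to a fixed length. The choice of 150 points is an arbitrary value inside the 100-200 sample range mentioned above, and the function name is illustrative.

```python
import numpy as np

def resample_trajectory(traj, n_points=150):
    """Re-sample a (T, d) trajectory to exactly n_points samples by
    linear interpolation of each coordinate over a normalized [0, 1]
    time axis.  Endpoints are preserved exactly."""
    traj = np.asarray(traj, float)
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, n_points)
    return np.column_stack([np.interp(t_new, t_old, traj[:, d])
                            for d in range(traj.shape[1])])
```

Applying this to both trajectories of an encounter yields the unified inputs the feature extractor expects.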

(92) At 1310, the process 1300 can extract a feature vector using the driving encounter feature extractor. A current driving encounter, which can include the first trajectory information and the second trajectory information, can be input to the driving encounter feature extractor, which can then extract the feature vector. The driving encounter feature extractor can be a deep-learning approach including a trained autoencoder such as the LSTMAE, DTW-ConvAE, NED-LSTMAE, or another suitable autoencoder trained using unsupervised learning. The autoencoder can be trained using a plurality of suitable driving encounters unified to the predetermined number of data points, as described above. The feature vector can be the middle layer h of the autoencoder as described above. The driving encounter feature extractor can also use a distance-based approach such as DTW or NED, where the current driving encounter is input and the feature vector is directly output. The process 1300 can then proceed to 1312.
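The disclosure's autoencoders are LSTM- and convolution-based Keras models; as a dependency-free illustration of the same idea (train a reconstruction objective, then read the middle layer h as the feature vector), the following is a toy linear autoencoder with a 10-unit bottleneck trained by gradient descent. It is a stand-in sketch, not the disclosed architecture; all names and hyperparameters are assumptions.

```python
import numpy as np

def train_linear_autoencoder(X, hidden=10, lr=0.02, epochs=300, seed=0):
    """Toy linear autoencoder: X -> h = X @ W1 -> X_hat = h @ W2,
    trained by gradient descent on mean-squared reconstruction error.
    Returns the weights and the loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, hidden))   # encoder weights
    W2 = rng.normal(0, 0.1, (hidden, d))   # decoder weights
    losses = []
    for _ in range(epochs):
        H = X @ W1                         # middle-layer representations
        Xhat = H @ W2                      # reconstruction
        err = Xhat - X
        losses.append(float((err ** 2).mean()))
        gW1 = X.T @ (err @ W2.T) / n       # gradient w.r.t. encoder
        gW2 = H.T @ err / n                # gradient w.r.t. decoder
        W1 -= lr * gW1
        W2 -= lr * gW2
    return W1, W2, losses

def encode(X, W1):
    """Each row of the result is the feature vector h for one encounter."""
    return X @ W1
```

After training, `encode` plays the role of step 1310: flattened, unified encounter trajectories go in, ten-dimensional feature vectors come out.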

(93) At 1312, the process 1300 can provide the feature vector to a trained classifier. The trained classifier can take an input feature vector and output a classification of the feature vector. The classifier can be trained using unsupervised learning including clustering techniques such as k-means clustering, DBSCAN, OPTICS, or another suitable unsupervised learning method for grouping feature vectors. If clustering is used, the number of clusters may be selected by maximizing the between-cluster distance and minimizing the within-cluster distance using an optimization technique known in the art. If k-means clustering is used, the number of clusters can be about 10, and the classification output can be the cluster that the input feature vector belongs to. The classifier can be trained using a plurality of feature vectors extracted from a plurality of driving encounters. If the driving encounter feature extractor is an autoencoder, the classifier can be trained using a plurality of feature vectors extracted from the plurality of driving encounters used to train the autoencoder. The process 1300 can then proceed to 1314.
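When k-means is the trained classifier, classifying a new feature vector at run time reduces to a nearest-centroid lookup over the centroids learned offline, as sketched below; the function name is illustrative.

```python
import numpy as np

def classify_encounter(feature_vector, centroids):
    """Return the index of the nearest k-means centroid, i.e. the
    cluster (driving-encounter class) the new feature vector belongs to."""
    f = np.asarray(feature_vector, float)
    C = np.asarray(centroids, float)
    return int(np.argmin(np.linalg.norm(C - f, axis=1)))
```

The returned cluster index is the classification passed on at 1314 to inform the vehicle's maneuver.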

(94) At 1314, the process 1300 can receive the classification output at 1312 in order to facilitate the first vehicle to perform a maneuver based on the current driving encounter.

(95) Although the invention has been described in considerable detail with reference to certain embodiments, one skilled in the art will appreciate that the present invention can be practiced by other than the described embodiments, which have been presented for purposes of illustration and not of limitation. Therefore, the scope of the appended claims should not be limited to the description of the embodiments contained herein.


(97) The citation of any document is not to be construed as an admission that it is prior art with respect to the present invention.