Vision based target tracking using tracklets

10860863 · 2020-12-08

Abstract

A non-hierarchical and iteratively updated tracking system includes a first module for creating an initial trajectory model for multiple targets from a set of received image detections. A second module is connected to the first module to provide identification of multiple targets using a target model, and a third module is connected to the second module to solve a joint object function and maximal conditional probability for the target model. A tracklet module can update the trajectory model of the first module and, after convergence, output a trajectory model for multiple targets.

Claims

1. A non-hierarchical and iteratively updated tracking system, comprising: a processor; and system memory coupled to the processor and storing instructions configured to cause the processor to: derive trajectory models for multiple targets from a set of received image detections; identify targets, from among the multiple targets, using a target model including solving a joint object function and maximal condition probability for the target model dependent on a hyper-parameter set, including: formulate the hyper-parameter set from the trajectory models; solve a unary term describing how a hidden state value fits an observation dependent on the hyper-parameter set; solve a pairwise term defining a probability that adjacent nodes in a graph correspond to the same label depending on the hyper-parameter set; and merge the solution of the unary term, the solution of the pairwise term, and a normalization factor into a solution of the joint object function and maximal conditional probability; update the trajectory models, including linking tracklets to one another based on the solution of the joint object function and maximal conditional probability; detect trajectory convergence for the multiple targets within the updated trajectory models based on equations associated with the linked tracklets satisfying a belief threshold; and output the updated trajectory models.

2. The iteratively updated tracking system of claim 1, wherein instructions configured to derive trajectory models comprise instructions configured to access sliding windows initializable from at least one of a first frame and a previous sliding window.

3. The iteratively updated tracking system of claim 1, wherein instructions configured to identify targets using a target model comprise instructions configured to identify targets using a Markov random field model.

4. The iteratively updated tracking system of claim 1, wherein instructions configured to solve a joint object function and maximal condition probability comprise instructions configured to use a loopy belief propagation algorithm.

5. The iteratively updated tracking system of claim 1, further comprising instructions configured to reset the number of trajectory models.

6. The iteratively updated tracking system of claim 1, further comprising instructions configured to initialize the trajectory models.

7. An iteratively updated tracking system, comprising: a processor; and system memory coupled to the processor and storing instructions configured to cause the processor to derive trajectory models for multiple targets from a set of received image detections; identify targets, from among the multiple targets, using a Markov random field model including solving a joint object function and maximal condition probability of the Markov random field model using a loopy belief propagation algorithm depending on a hyper-parameter set, including: formulate the hyper-parameter set from the trajectory models; solve a unary term describing how a hidden state value fits an observation dependent on the hyper-parameter set; solve a pairwise term defining a probability that adjacent nodes in a graph correspond to the same label depending on the hyper-parameter set; and merge the solution of the unary term, the solution of the pairwise term, and a normalization factor into a solution of the joint object function and maximal conditional probability; update the trajectory models, including linking tracklets to one another based on the solution of the joint object function and maximal condition probability; detect trajectory convergence for the multiple targets within the updated trajectory models; and output the updated trajectory models.

8. The iteratively updated tracking system of claim 7, wherein instructions configured to derive trajectory models comprise instructions configured to access sliding windows initializable from at least one of a first frame and a previous sliding window.

9. The iteratively updated tracking system of claim 7, wherein instructions configured to update the trajectory models comprising instructions configured to update the trajectory models based on model metric data.

10. The iteratively updated tracking system of claim 7, wherein instructions configured to detect trajectory convergence for the multiple targets within the updated trajectory models comprise instructions configured to detect trajectory convergence for the multiple targets within the updated trajectory models based on equations associated with the linked tracklets satisfying a belief threshold.

11. The iteratively updated tracking system of claim 7, further comprising instructions configured to reset the number of trajectory models.

12. The iteratively updated tracking system of claim 7, further comprising instructions configured to initialize the trajectory models.

13. The iteratively updated tracking system of claim 8, wherein instructions configured to identify targets comprises instructions configured to infer target identification for detections in a sliding window from among the sliding windows.

14. A non-hierarchical and iteratively updated tracking method, comprising: deriving trajectory models for multiple targets from a set of received image detections; identifying targets, from among the multiple targets using a target model including solving a joint object function and maximal condition probability for the target model dependent on a hyper-parameter set, including: formulating the hyper-parameter set from the trajectory models; solving a unary term describing how a hidden state value fits an observation dependent on the hyper-parameter set; solving a pairwise term defining a probability that adjacent nodes in a graph correspond to the same label depending on the hyper-parameter set; and merging the solution of the unary term, the solution of the pairwise term, and a normalization factor into a solution of the joint object function and maximal conditional probability; updating the trajectory models, including linking tracklets to one another based on the solution of the joint object function and maximal conditional probability; detecting trajectory convergence for the multiple targets within the updated trajectory models based on equations associated with the linked tracklets satisfying a belief threshold; and outputting the updated trajectory models.

15. The method of claim 14, wherein deriving trajectory models comprises accessing sliding windows initializable from at least one of a first frame and a previous sliding window.

16. The method of claim 14, wherein identifying targets using a target model comprise identifying targets using a Markov random field model.

17. The method of claim 14, wherein solving a joint object function and maximal condition probability comprises using a loopy belief propagation algorithm.

18. The method of claim 14, further comprising resetting the number of trajectory models.

19. The method of claim 14, further comprising initializing the trajectory models.

20. The method of claim 15, wherein identify targets comprises inferring target identification for detections in a sliding window from among the sliding windows.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 illustrates a system and method for providing trajectory data based on multiple tracklets;

(2) FIG. 2 illustrates a system that provides an online unified framework to determine local-to-global trajectory models as a joint optimal assignment;

(3) FIG. 3 illustrates use of a Markov Random Field (MRF) model;

(4) FIG. 4 illustrates an iterative algorithm that automatically learns trajectory models from the local-to-global information; and

(5) FIG. 5 illustrates a system that provides an online unified framework to determine local-to-global trajectory models using a regularized pairwise constrained component analysis (PCCA) algorithm.

DETAILED DESCRIPTION

(6) FIG. 1 is a cartoon illustrating a system and method 100 for creating a trajectory model. A first module 110 creates an initial trajectory model for multiple targets from a set of received image detections. A second module 112 is connected to the first module to produce a Markov random field model able to provide identification of multiple targets. A third module 114 is connected to the second module and includes a loopy belief propagation algorithm to solve a joint object function and maximal conditional probability of the Markov random field model. A tracklet module 116 is connected to the third module 114 and is able to update the trajectory model of the first module. After convergence, a trajectory model 118 for multiple targets is output.

(7) In contrast to conventional hierarchical methods that heuristically formulate a data association method as two separate optimization schemes (e.g., local, then global), FIG. 2 shows a system 200 that provides an online unified framework to determine local-to-global trajectory models as a joint optimal assignment. The system 200 uses an iterative algorithm to alternately update the trajectory models and link detections or tracklets into longer fragments. As the iterative process continues, the trajectory models become more accurate, and the broken tracklets are connected to form longer trajectories. Data association is treated as inference of target IDs for all the detections using a pairwise Markov Random Field (MRF). In one embodiment, a loopy belief propagation (LBP) algorithm is used to optimize the MRF model so as to generate separated tracklets.

(8) As seen in FIG. 2, system 200 is a multi-target tracking system and method that is able to automatically link the tracklets or detections 210 into trajectories based on a local-to-global trajectory model. At the beginning of each sliding window, trajectory models 220 are initialized using local information from a first frame, from the previous sliding window, or from both. A pairwise Markov Random Field model 230 infers target identification for all detections in the sliding window and can employ a loopy belief propagation algorithm 240 to solve the joint object function and maximal conditional probability of the MRF model. Detections with the same label in adjacent frames are linked to form reliable tracklets 250. Finally, trajectory models 260 are updated using the reliable tracklets 250. Meanwhile, the number of trajectory models can be reset to eliminate false models caused by false alarms and to add new models for newly emerging targets. Updating the trajectory models for all targets and maximizing the conditional probability of the MRF model can be alternated until the result converges.
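The linking step described above (detections with the same label in adjacent frames form a tracklet) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `link_tracklets`, the `(frame, label)` input format, and the use of `None` for false detections are all assumptions made for the example.

```python
from collections import defaultdict


def link_tracklets(detections):
    """Group labeled detections into tracklets.

    `detections` is a list of (frame, label) pairs; label None marks a
    false detection (hypothetical convention). Detections that share a
    label are linked while their frames are adjacent (difference of 1);
    a larger gap starts a new tracklet for that label.
    Returns {label: [[frame, ...], ...]}.
    """
    frames_by_label = defaultdict(list)
    for frame, label in detections:
        if label is not None:
            frames_by_label[label].append(frame)

    tracklets = {}
    for label, frames in frames_by_label.items():
        frames.sort()
        runs = [[frames[0]]]
        for frame in frames[1:]:
            if frame - runs[-1][-1] == 1:
                runs[-1].append(frame)   # adjacent frame: extend the tracklet
            else:
                runs.append([frame])     # gap: start a new tracklet
        tracklets[label] = runs
    return tracklets


dets = [(1, 1), (2, 1), (3, 1), (5, 1), (1, 2), (2, 2), (3, None)]
tracklets = link_tracklets(dets)
```

In this toy input, label 1 is split into two tracklets because frame 4 is missing, which mirrors the broken tracklets that later iterations are meant to reconnect.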

(9) In more detail, let $Y=\{y_1,y_2,\dots,y_N\}$ be a set of detections and $L=\{l_1,l_2,\dots,l_N\}$ be their labels (target IDs). The overall goal is to find the optimal assignment of target identities given the detection set, which is equivalent to maximizing the conditional probability $P(L \mid Y)$ of the MRF model 300 shown in FIG. 3, where for each node $i$, $y_i$ and $l_i$ correspond to its observation and its state to be estimated, respectively. Assuming there are $K$ targets in the scene, $l_i \in \{1,\dots,K\} \cup \{\varnothing\}$, where $\varnothing$ denotes false detections. Using this model, $P(L \mid Y)$ is defined as:

(10) $$P(L \mid Y; \Theta) = \frac{1}{Z_p} \prod_{i} \Phi(l_i, Y_i; \Theta) \prod_{\langle i,j \rangle} \Psi(l_i, l_j, Y_i, Y_j; \Theta)$$
where $Z_p$ is the normalization factor and $\Theta$ is a hyper-parameter set. The unary term $\Phi(l_i,Y_i;\Theta)$ describes how the hidden state value $l_i$ fits the observation $Y_i$. The pairwise term $\Psi(l_i,l_j,Y_i,Y_j;\Theta)$ defines the probability that two adjacent nodes possess the same label. The neighborhood of node $i$ in the proposed MRF model consists of all the nodes from the previous frame $t_i-1$, all the nodes from the next frame $t_i+1$, and all other nodes within the same frame $t_i$.
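To make the factorization concrete, the sketch below evaluates the product of unary and pairwise terms for every labeling of a three-node graph and normalizes by the sum, which plays the role of $Z_p$. It is an illustration under stated assumptions only: the potential values, the helper names `joint_probability` and `same_label_bonus`, and the simplification that the pairwise term depends only on the two labels are not taken from the disclosure.

```python
from itertools import product


def joint_probability(unary, pairwise, edges, labels):
    """Return {labeling: probability} by brute-force enumeration.

    unary[i][l]      : unary potential of node i taking label l
    pairwise(li, lj) : pairwise potential for one edge (assumed to depend
                       only on the labels in this toy example)
    edges            : list of (i, j) neighbor pairs
    """
    scores = {}
    for labeling in product(labels, repeat=len(unary)):
        score = 1.0
        for i, label in enumerate(labeling):
            score *= unary[i][label]
        for i, j in edges:
            score *= pairwise(labeling[i], labeling[j])
        scores[labeling] = score
    z = sum(scores.values())  # normalization factor
    return {labeling: s / z for labeling, s in scores.items()}


def same_label_bonus(li, lj):
    """Hypothetical pairwise potential favoring equal labels on an edge."""
    return 2.0 if li == lj else 1.0


labels = [1, 2]
unary = [{1: 0.9, 2: 0.1}, {1: 0.6, 2: 0.4}, {1: 0.2, 2: 0.8}]
edges = [(0, 1), (1, 2)]
posterior = joint_probability(unary, same_label_bonus, edges, labels)
best = max(posterior, key=posterior.get)
```

Enumeration is exponential in the number of nodes, so it only serves to check small cases; it is not a substitute for an approximate inference method on realistic graphs.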

(11) In $P(L \mid Y;\Theta)$, the probability depends on a hyper-parameter set $\Theta$. It is composed of the trajectory models for all targets, $\Theta=\{\theta_1,\dots,\theta_K\}$. Each $\theta_k$ is defined as $\theta_k=[\theta_k^p,\theta_k^v,\theta_k^a,\theta_k^s]$, where $\theta_k^p=\{p_k^0,T_k,O_k,\sigma_k^p\}$ denotes the position parameters of target $k$, which include the initial position $p_k^0$, the Kalman filter parameters with transition matrix $T_k$ and observation matrix $O_k$, and the variance $\sigma_k^p$; $\theta_k^v=\{\mu_k^v,\sigma_k^v\}$ denotes the mean and variance of its $(d^x,d^y)$ velocity; $\theta_k^s=\{\mu_k^s,\sigma_k^s\}$ denotes the scalar mean and variance of its size; and $\theta_k^a$ represents a target-specific classifier that is trained using the previous detections and subsequently used to classify new detections. Each detection is likewise represented as $Y_i=[Y_i^p,Y_i^v,Y_i^a,Y_i^s]$, comprising its $(d^x,d^y)$ position $Y_i^p$, velocity $Y_i^v$, appearance $Y_i^a$, and size $Y_i^s$.

(12) Model inference can be based on an iterative algorithm that automatically learns trajectory models from the local-to-global information, shown in cartoon form 400 in FIG. 4. In effect, the iterative algorithm alternately optimizes the trajectory models for all targets and maximizes the conditional probability of the MRF model. Specifically, local-to-global trajectory model learning includes the following steps: (A) detections are output from the object detector; (B) the path estimated by the initial local trajectory models is determined; (C) detections are linked into tracklets by the local trajectory models; (D) the trajectory models are re-learned from the reliable tracklets TL 1 and TL 2; (E) tracklets are relinked by the global trajectory models; and (F) the final trajectory for multiple objects is provided.

(13) In some embodiments, initialization of the trajectory models handles two tasks: (1) initializing the trajectory models at the beginning of the tracking task, i.e., in the first iteration of the first sliding window; and (2) re-initializing them each time the analysis window slides, i.e., in the first iteration of every analysis window except the first one.

(14) Maximization of the MRF conditional probability $P(L \mid Y;\Theta)$ can use an MRF model whose generative and link probabilities are calculated from the established trajectory models $\Theta$. A sum-product loopy belief propagation algorithm, which computes the marginal distribution by iteratively passing messages between neighbors, can be used. The BP message-update equations are iterated until they converge. In order to select confident nodes, a threshold $T_b$ is set for the belief $b(l_i)$ of node $i$, i.e., node $i$ is assigned label $k$ when $b(l_i=k)>T_b$. The nodes with the same label $k$ in adjacent frames are then linked to form tracklet $TL_k$, which is a relatively reliable segment of the final target trajectory.
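A minimal sketch of sum-product belief propagation with a belief threshold follows. It assumes fixed numeric potentials rather than potentials derived from trajectory models, and the names `loopy_bp`, `confident_labels`, and `same_label_bonus` are hypothetical. The routine accepts graphs with loops; the demonstration uses a three-node chain because message passing is exact there and the beliefs can be checked by hand.

```python
def loopy_bp(unary, edges, pairwise, labels, max_iters=50, tol=1e-6):
    """Sum-product message passing; returns one belief dict per node."""
    neighbors = {i: set() for i in range(len(unary))}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    # msgs[(i, j)][l]: message from node i to node j about label l of node j
    msgs = {(i, j): {l: 1.0 / len(labels) for l in labels}
            for i in neighbors for j in neighbors[i]}
    for _ in range(max_iters):
        new_msgs = {}
        for (i, j) in msgs:
            out = {}
            for lj in labels:
                total = 0.0
                for li in labels:
                    term = unary[i][li] * pairwise(li, lj)
                    for k in neighbors[i] - {j}:
                        term *= msgs[(k, i)][li]
                    total += term
                out[lj] = total
            z = sum(out.values())
            new_msgs[(i, j)] = {l: v / z for l, v in out.items()}
        delta = max(abs(new_msgs[e][l] - msgs[e][l]) for e in msgs for l in labels)
        msgs = new_msgs
        if delta < tol:  # message updates have converged
            break
    beliefs = []
    for i in range(len(unary)):
        b = {l: unary[i][l] for l in labels}
        for k in neighbors[i]:
            for l in labels:
                b[l] *= msgs[(k, i)][l]
        z = sum(b.values())
        beliefs.append({l: v / z for l, v in b.items()})
    return beliefs


def confident_labels(beliefs, threshold):
    """Assign the best label only when its belief exceeds the threshold."""
    assigned = []
    for b in beliefs:
        best = max(b, key=b.get)
        assigned.append(best if b[best] > threshold else None)
    return assigned


def same_label_bonus(li, lj):
    """Hypothetical pairwise potential favoring equal labels."""
    return 2.0 if li == lj else 1.0


unary = [{1: 0.9, 2: 0.1}, {1: 0.6, 2: 0.4}, {1: 0.2, 2: 0.8}]
beliefs = loopy_bp(unary, [(0, 1), (1, 2)], same_label_bonus, [1, 2])
assigned = confident_labels(beliefs, threshold=0.7)
```

With a threshold of 0.7, the middle node stays unassigned because its strongest belief is only about 0.63, which illustrates how low-confidence nodes are kept out of the reliable tracklets.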

(15) After maximizing the MRF conditional probability by the LBP algorithm and generating a set of confident and separated tracklets, the trajectory model learning handles the following two tasks: (1) updating the number of trajectory models $K$ to accommodate false positive detections and newly emerging targets; and (2) updating the trajectory models $\Theta$ using the reliable tracklets.

(16) Alternative tracking system embodiments are also contemplated. For example, another embodiment that uses tracklet association to form long trajectories for robust multi-target tracking in a single camera can involve improvement of learned appearance features to handle the dynamics of visual targets that exhibit a large amount of variability. Unlike other online-learning methods, effective similarity metrics can be learned in an iterative fashion. Reliable tracklets can be segmented into multiple non-overlapping sliding windows, and for each temporal window, positive and negative training samples are collected to learn the metrics in an online fashion. Up-to-date metrics associate tracklets in adjacent windows to update the training samples. This allows better metrics to be re-learned through such iterative processes, and long trajectories can be formed window-by-window. In effect, providing both a framework for collecting samples online to learn the appearance model during tracking, and an iterative process to obtain more training samples that are less sensitive to the variation of targets' visual appearance, allows better handling of inter-object occlusions and interactions, while improving overall tracking ability.

(17) This embodiment of a tracking system and method 500 is seen with respect to FIG. 5. Initial detection 510, based on a whole video segmented into non-overlapping short windows, is followed by development of a Markov random field model and grouping into reliable tracklets 520. In sliding window module 530, initial training samples are generated only inside each individual sliding window, before correspondence between the triangle-labeled and circle-labeled tracklets is developed. For each sliding window, training samples are used to construct an initial metric with a regularized pairwise constrained component analysis (PCCA) algorithm, where spatial-temporal constraints are essential to guarantee the accuracy of positive/negative samples (module 540). Once this online metric learning is finished, short tracklets in adjacent windows can be associated to generate an extended training sample set at the initial appearance module 550. As tracklets are linked into longer trajectories, more samples can be collected to update training of more discriminative target appearances (module 560). Using the expanded training set, a new appearance model 570 can be obtained, such that a more effective metric function can be re-learned in an iterative fashion. This new metric can be used to link all the target tracklets window by window (module 580) to form longer trajectories in trajectories module 590 and/or to pass selected information back to sliding window module 530.

(18) In more detail, the previous system and method are based on a learning algorithm that associates tracklets. The algorithm can be divided into three major steps: sample collection, online metric learning, and tracklet association.

(19) Initial training samples are collected from the tracklet set in each generated non-overlapping temporal window. For the $t$-th window, the tracklet set is denoted $W_t=\{F_i\}$, and $x \in \mathbb{R}^d$ is the feature vector representing the appearance of a detection response in a tracklet. Positive samples are generated from the same tracklet and negative samples from different tracklets, so the positive training set $S_t^P$ and the negative training set $S_t^N$ can be defined as:
$$S_t^P=\{R_k:(x_m^k,x_n^k)\mid x_m^k,x_n^k\in F_i\}$$
$$S_t^N=\{R_k:(x_m^k,x_n^k)\mid x_m^k\in F_i,\ x_n^k\in F_j,\ i\neq j\}$$
where $R_k$ is the $k$-th sample pair, and $F_i,F_j\in W_t$.
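The two set definitions can be sketched as a small pair-collection routine. This is only an illustration: the name `collect_pairs`, the dictionary layout, and the string placeholders standing in for feature vectors are assumptions, and the sketch pairs distinct samples only and applies no further filtering of tracklet pairs.

```python
from itertools import combinations, product


def collect_pairs(tracklets):
    """Build positive and negative sample pairs for one temporal window.

    `tracklets` maps a tracklet id to its list of appearance features.
    Positive pairs come from within one tracklet; negative pairs take one
    feature from each of two different tracklets.
    """
    positives, negatives = [], []
    for feats in tracklets.values():
        positives.extend(combinations(feats, 2))
    for (_, feats_i), (_, feats_j) in combinations(tracklets.items(), 2):
        negatives.extend(product(feats_i, feats_j))
    return positives, negatives


# Placeholder strings stand in for feature vectors in this toy window.
window = {"F1": ["a1", "a2", "a3"], "F2": ["b1", "b2"]}
pos, neg = collect_pairs(window)
```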

(20) Only the first $N$ frames of each tracklet are used to generate sample pairs, to ensure the accuracy of the collected samples. For negative samples, spatial-temporal constraints are adopted to guarantee that two tracklets belong to different targets. A limiting function based on these constraints can be used:
$$C_{ij}=C_t(F_i,F_j)\cdot C_v(F_i,F_j)$$

(21) The first constraint $C_t$ is based on the observation that one object cannot belong to two tracklets at the same time. It is defined as

(22) $$C_t(F_i,F_j)=\begin{cases}0, & \text{if } t_i^e>t_j^s \text{ and } t_j^e>t_i^s\\ 1, & \text{otherwise}\end{cases}$$
where $t_i^e$, $t_j^e$ are the end frames of tracklets $F_i$, $F_j$, and $t_i^s$, $t_j^s$ are their start frames. This function expresses that if two tracklets overlap in time, they are treated as different persons.

(23) The second constraint $C_v$ is based on the fact that targets should not change their velocity abruptly; otherwise, the tracklets cannot belong to the same person. For two non-overlapping tracklets, the function can be defined as

(24) $$C_v(F_i,F_j)=\begin{cases}0, & \text{if } \lVert P_i^s-(P_j^e+\bar{V}_j\cdot\Delta t)\rVert>\delta\\ 1, & \text{otherwise}\end{cases}$$
where $P_i^s$ is the position at the start frame of $F_i$, $P_j^e$ is the position at the end frame of $F_j$, $\bar{V}_j$ is the average velocity of $F_j$, $\Delta t$ is the time gap between $F_i$ and $F_j$, and $\delta$ is a threshold. This function expresses that if there is a significant velocity difference between two tracklets, they are treated as different targets.

(25) According to the limiting function, negative samples are collected from tracklet pairs whose C.sub.ij is equal to 0. Once the process of collecting training samples is finished, an online learning algorithm is used to build a discriminative appearance-based model.
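The limiting function can be sketched as below. The `Tracklet` fields, the function names, and the threshold parameter name `delta` are assumptions for illustration; the combined function treats the two constraints as a product and short-circuits on temporal overlap, since the velocity constraint is only described for non-overlapping tracklets.

```python
import math
from dataclasses import dataclass


@dataclass
class Tracklet:
    start_frame: int
    end_frame: int
    start_pos: tuple      # (x, y) at the start frame
    end_pos: tuple        # (x, y) at the end frame
    mean_velocity: tuple  # average (vx, vy) per frame


def c_t(fi, fj):
    """Temporal constraint: 0 when the two tracklets overlap in time."""
    overlap = fi.end_frame > fj.start_frame and fj.end_frame > fi.start_frame
    return 0 if overlap else 1


def c_v(fi, fj, delta):
    """Velocity constraint: 0 when fi starts too far from where fj's
    average velocity predicts. Assumes fj ends before fi starts."""
    gap = fi.start_frame - fj.end_frame
    pred_x = fj.end_pos[0] + fj.mean_velocity[0] * gap
    pred_y = fj.end_pos[1] + fj.mean_velocity[1] * gap
    error = math.hypot(fi.start_pos[0] - pred_x, fi.start_pos[1] - pred_y)
    return 0 if error > delta else 1


def c_ij(fi, fj, delta):
    """Limiting function; 0 marks the pair as belonging to different targets."""
    if c_t(fi, fj) == 0:
        return 0
    return c_v(fi, fj, delta)


a = Tracklet(0, 10, (0.0, 0.0), (10.0, 0.0), (1.0, 0.0))
b = Tracklet(15, 25, (15.0, 0.0), (25.0, 0.0), (1.0, 0.0))  # continues a
c = Tracklet(5, 20, (0.0, 5.0), (15.0, 5.0), (1.0, 0.0))    # overlaps a
d = Tracklet(15, 25, (40.0, 0.0), (50.0, 0.0), (1.0, 0.0))  # too far away
results = [c_ij(b, a, 2.0), c_ij(c, a, 2.0), c_ij(d, a, 2.0)]
```

Only pairs such as `(c, a)` and `(d, a)`, whose limiting function is 0, would be used as sources of negative samples.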

(26) The online learned appearance model can be based, for example, on an Online Algorithm for Scalable Image Similarity (OASIS), Probabilistic Relative Distance Comparison (PRDC), Information Theoretic Metric Learning (ITML), or Logistic Discriminant Metric Learning. In one embodiment, regularized pairwise constrained component analysis (PCCA) is used as a supervised on-line metric learning algorithm.

(27) Using regularized PCCA, projected distances between samples from the same class should be smaller than a given threshold $T$, while distances between inter-class samples should be larger than $T$ (to ensure the generality of the algorithm, $T$ is set to 1). With the obtained $k$-th sample pair $(x_m^k,x_n^k)$, a projected Mahalanobis-like distance
$$D_M(x_m^k,x_n^k)=(x_m^k-x_n^k)^{T}M(x_m^k-x_n^k)$$
is used to measure the distance between samples of the same or different persons, where $M=P^{T}P$ and $P$ is the projection matrix to learn, which maps data points into a low-dimensional space of dimension $d' \ll d$.
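As a quick numerical check of this definition, the sketch below computes the distance two ways: by projecting the difference vector with the projection matrix and taking its squared norm, and by forming the matrix product of the transposed projection matrix with itself explicitly. The function names and the example matrix are assumptions chosen so the result is easy to verify by hand.

```python
def distance_via_projection(P, xm, xn):
    """Squared norm of P applied to (xm - xn)."""
    diff = [a - b for a, b in zip(xm, xn)]
    projected = [sum(p * c for p, c in zip(row, diff)) for row in P]
    return sum(v * v for v in projected)


def distance_via_m(P, xm, xn):
    """(xm - xn)^T M (xm - xn) with M = P^T P built explicitly."""
    d = len(xm)
    M = [[sum(row[a] * row[b] for row in P) for b in range(d)] for a in range(d)]
    diff = [a - b for a, b in zip(xm, xn)]
    return sum(diff[a] * M[a][b] * diff[b] for a in range(d) for b in range(d))


# A 2 x 3 projection: maps 3-dimensional features into 2 dimensions.
P = [[1, 0, 0],
     [0, 2, 0]]
xm, xn = [1, 2, 3], [0, 0, 0]
d_proj = distance_via_projection(P, xm, xn)
d_m = distance_via_m(P, xm, xn)
```

Both routes give the same value, and the third feature dimension has no effect because the example projection discards it.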

(28) The overall objective function to learn the projection matrix is based on a generalized logistic loss function

(29) $$\ell_\beta(x)=\frac{1}{\beta}\log\left(1+e^{\beta x}\right).$$
The objective function is defined as:

(30) $$\min_P E(P)=\sum_{k=1}^{N}\ell_\beta\left(y_k\left(D_P^2(x_m^k,x_n^k)-1\right)\right)+\lambda\lVert P\rVert^2$$
where $N$ is the number of sample pairs; $y_k$ is a class label, with $y_k=1$ for positive samples and $y_k=-1$ otherwise; and $D_P^2(x_m^k,x_n^k)=\lVert P(x_m^k-x_n^k)\rVert^2$ is the squared projected distance, which equals $D_M$ when $M=P^{T}P$. The regularization parameter $\lambda$ is used to maximize the inter-class margin. Then, a $d' \times d$ matrix $P$ can be found using a gradient descent-based method.
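A minimal gradient-descent sketch of this objective follows, assuming the generalized logistic loss has a sharpness parameter (called `beta` here) and a regularization weight (called `lam`). The step size, iteration count, function names, and toy data are all assumptions, and plain fixed-step descent stands in for whatever gradient-based solver an implementation might actually use.

```python
import math


def dist_sq(P, xm, xn):
    """Squared projected distance between two feature vectors."""
    diff = [a - b for a, b in zip(xm, xn)]
    return sum(sum(p * c for p, c in zip(row, diff)) ** 2 for row in P)


def logistic_loss(x, beta):
    """(1 / beta) * log(1 + exp(beta * x)), computed in a stable form."""
    z = beta * x
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def objective(P, pairs, beta, lam):
    data = sum(logistic_loss(y * (dist_sq(P, xm, xn) - 1.0), beta)
               for xm, xn, y in pairs)
    return data + lam * sum(v * v for row in P for v in row)


def gradient(P, pairs, beta, lam):
    grad = [[2.0 * lam * v for v in row] for row in P]  # regularizer term
    for xm, xn, y in pairs:
        c = [a - b for a, b in zip(xm, xn)]
        pc = [sum(p * ci for p, ci in zip(row, c)) for row in P]
        # Derivative of the loss w.r.t. its argument is a sigmoid.
        w = y * sigmoid(beta * y * (sum(v * v for v in pc) - 1.0))
        for r, pc_r in enumerate(pc):
            for k, c_k in enumerate(c):
                grad[r][k] += 2.0 * w * pc_r * c_k
    return grad


def fit(P, pairs, beta=3.0, lam=0.01, lr=0.1, steps=200):
    for _ in range(steps):
        g = gradient(P, pairs, beta, lam)
        P = [[v - lr * gv for v, gv in zip(row, g_row)]
             for row, g_row in zip(P, g)]
    return P


# One positive pair (label 1) and one negative pair (label -1).
pairs = [([0.0, 0.0], [0.2, 0.0], 1), ([0.0, 0.0], [0.5, 0.0], -1)]
P0 = [[1.0, 0.0]]
P1 = fit(P0, pairs)
before = objective(P0, pairs, 3.0, 0.01)
after = objective(P1, pairs, 3.0, 0.01)
```

In this toy case the learned projection stretches the feature axis until the negative pair's squared distance exceeds 1 while the positive pair's stays below 1, which is the separation the threshold is meant to enforce.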

(31) Next, the training samples and the appearance model are updated. Given two tracklets from adjacent sliding windows $(F_i\in W_t, F_j\in W_{t+1})$, the first $N$ frames are extracted from both $F_i$ and $F_j$ as test samples. The similarity of the appearance descriptors of $(F_i,F_j)$ can then be defined as:

(32) $$S_a(F_i,F_j)=\left(\sum_{m=1}^{N}\sum_{n=1}^{N}D_P(x_m^i,x_n^j)\Big/N^2\right)^{-1}$$
where $x_m^i$ is a feature vector of a detection response from $F_i$, and similarly $x_n^j$ is from $F_j$. This function uses the inverse of the mean relative distance between any two samples from $F_i$ and $F_j$ as the appearance similarity of the two tracklets.
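The similarity can be sketched as the inverse of the mean pairwise projected distance. The sketch assumes that the distance in this formula is the non-squared projected distance, and the names `projected_distance` and `appearance_similarity` as well as the identity projection in the example are illustrative choices.

```python
import math


def projected_distance(P, xm, xn):
    """Norm of the projected difference between two feature vectors."""
    diff = [a - b for a, b in zip(xm, xn)]
    return math.sqrt(sum(sum(p * c for p, c in zip(row, diff)) ** 2 for row in P))


def appearance_similarity(P, feats_i, feats_j):
    """Inverse of the mean projected distance over all cross pairs."""
    total = sum(projected_distance(P, xm, xn) for xm in feats_i for xn in feats_j)
    mean = total / (len(feats_i) * len(feats_j))
    return math.inf if mean == 0 else 1.0 / mean


P = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, for illustration only
f_i = [[0.0, 0.0], [0.0, 0.0]]
f_near = [[0.5, 0.0], [0.0, 0.5]]
f_far = [[2.0, 0.0], [0.0, 2.0]]
s_near = appearance_similarity(P, f_i, f_near)
s_far = appearance_similarity(P, f_i, f_far)
```

With a distance threshold of 1, a similarity above 1 (as for `f_near`) indicates a likely match and a similarity below 1 (as for `f_far`) indicates a likely mismatch.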

(33) Since the threshold is set to $T=1$, $S_a$ should be larger than 1 for positive samples and smaller than 1 for negative samples. To ensure the accuracy of the training data, stricter thresholds are adopted: a threshold greater than 1 for accepting positive samples and a threshold less than 1 for accepting negative samples. The regenerated samples are then used to extend the training data, and a more discriminative appearance-based model can be relearned.

(34) Once an updated metric function and a new tracklet set $W_i=\{F_i\}$ are available, the tracklets can be linked to form longer trajectories. To calculate the affinity score between $F_i$ and $F_j$, the objective function is defined as:
$$S_{ij}(F_i,F_j)=C_a(F_i,F_j)\cdot C_v(F_i,F_j)$$
where $C_a$ and $C_v$ represent the similarity of the appearance and velocity descriptors, respectively.

(35) Specifically, to calculate $C_a$, instead of taking only the first $N$ frames as in the training process, test samples can be randomly drawn from the tracklets to calculate the appearance similarity, in order to handle significant changes in environmental factors and people's postures over time. Therefore $C_a$ is defined by:

(36) $$C_a(F_i,F_j)=\begin{cases}1, & \text{if } S_a(F_i,F_j)>1\\ 0, & \text{otherwise}\end{cases}$$

(37) The test pair $(F_i,F_j)$ can be regarded as the same person when $S_{ij}=1$. The sliding window then moves to the next position and the previous steps are repeated to generate longer trajectories window-by-window. Since some gaps may remain between adjacent tracklets in each trajectory, possibly due to missed detections and occlusions, the trajectory over the gaps can be estimated according to a linear motion model.
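Gap filling under a linear motion model can be sketched as simple interpolation between the last position of one tracklet and the first position of the next. The function name `fill_gap` and its argument layout are assumptions for illustration.

```python
def fill_gap(end_frame, end_pos, start_frame, start_pos):
    """Linearly interpolate (x, y) positions for the frames strictly
    between the end of one tracklet and the start of the next."""
    span = start_frame - end_frame
    filled = []
    for frame in range(end_frame + 1, start_frame):
        alpha = (frame - end_frame) / span
        x = end_pos[0] + alpha * (start_pos[0] - end_pos[0])
        y = end_pos[1] + alpha * (start_pos[1] - end_pos[1])
        filled.append((frame, (x, y)))
    return filled


# Tracklet A ends at frame 10, tracklet B starts at frame 14.
gap = fill_gap(10, (0.0, 0.0), 14, (8.0, 4.0))
```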

(38) In practice, implementations of the described algorithms provide for accurate tracking of multiple targets from a camera. Experimental results on two widely used public datasets, PETS09 and TUD-Stadtmitte, are discussed as follows, with comparison to other state-of-the-art methods. The evaluation metrics used are listed below in Table I:

(39) TABLE I

| Name | Definition |
| --- | --- |
| Recall | (Frame-based) Correctly matched objects / total ground-truth objects. |
| Precision | (Frame-based) Correctly matched objects / total output objects. |
| FA/Frm | (Frame-based) No. of false alarms per frame. |
| GT | No. of ground-truth (GT) trajectories. |
| MT % | Mostly tracked: percentage of GT trajectories that are covered by tracked output for more than 80% in length. |
| ML % | Mostly lost: percentage of GT trajectories that are covered by tracked output for less than 20%. The smaller the better. |
| PT % | Partially tracked: 1.0 - MT - ML. The smaller the better. |
| Frag | Fragments: the total number of times that a ground-truth trajectory is interrupted in the tracking result. The smaller the better. |
| IDs | ID switches: the total number of times that a tracked trajectory changes its matched GT identity. The smaller the better. |
| MOTA | The Multiple Object Tracking Accuracy takes into account false positives, missed targets, and identity switches. |
| MOTP | The Multiple Object Tracking Precision is simply the average distance between true and estimated targets. |
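The MT, ML, and PT definitions can be illustrated with a short sketch that classifies each ground-truth trajectory by the fraction of its length covered by tracked output. The function name `coverage_summary` and the input format (one coverage ratio per ground-truth trajectory) are assumptions for the example.

```python
def coverage_summary(coverages):
    """Return MT, ML, and PT as fractions of ground-truth trajectories.

    `coverages` holds, for each ground-truth trajectory, the fraction of
    its length covered by tracked output (a value between 0 and 1).
    """
    n = len(coverages)
    mt = sum(c > 0.8 for c in coverages) / n   # mostly tracked
    ml = sum(c < 0.2 for c in coverages) / n   # mostly lost
    return {"MT": mt, "ML": ml, "PT": 1.0 - mt - ml}


summary = coverage_summary([0.95, 0.9, 0.5, 0.1])
```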

(40) To fairly compare with existing works, original annotations of the datasets were used to ensure the consistency of offline learned human detector responses. An F1-measure was adopted to measure the two metrics comprehensively:

(41) $$F_1=\frac{2\cdot \text{precision}\cdot \text{recall}}{\text{precision}+\text{recall}}$$
which represents the harmonic mean of precision and recall.
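The F1-measure is straightforward to compute; the sketch below uses a hypothetical helper name `f1_score` and, as an example, a precision of 98.7% and a recall of 97.1%.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


f1 = round(f1_score(98.7, 97.1), 1)
```

Because the harmonic mean is dominated by the smaller of the two values, a method cannot obtain a high F1 by trading away recall for precision or the reverse.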

(42) Three videos (S2.L1, S2.L2, S2.L3) from the PETS09 dataset contain different scenarios with a progressive increase in person density. The videos are 795, 436, and 240 frames long, respectively. The results are listed in Table II. Since tracklet generation is based on the described MRF model, it is used as a baseline for comparison. "Online" and "Offline" indicate that the training samples for metric learning are collected from all three videos in an online or offline fashion, respectively. A comparison with other state-of-the-art methods is also included.

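(43) TABLE II. Comparison of tracking results with state-of-the-art methods on the PETS09 dataset.

| Sequence | Method | Recall (%) | Precision (%) | F1 (%) | FAF | GT | MT (%) | PT (%) | ML (%) | Frag | IDs | MOTA (%) | MOTP (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PETS-S2.L1 | B. Yang | 97.8 | 94.8 | 96.28 | 0.31 | 19 | 95 | 5 | 0 | 2 | 0 | | |
| PETS-S2.L1 | S. Zhang | 97.0 | 98.6 | 97.79 | 0.08 | 19 | 95 | 5 | 0 | 4 | 0 | 95.6 | 91.6 |
| PETS-S2.L1 | V. Chari | 92.4 | 94.3 | 93.3 | | 19 | 95 | 5 | 0 | 74 | 56 | 85.5 | 76.2 |
| PETS-S2.L1 | Offline | 95.3 | 96.4 | 95.8 | 0.13 | 19 | 95 | 5 | 0 | 2 | 3 | 94.2 | 91.3 |
| PETS-S2.L1 | Online | 97.1 | 98.7 | 97.9 | 0.10 | 19 | 95 | 5 | 0 | 3 | 0 | 95.6 | 91.7 |
| PETS-S2.L2 | A. Milan | 65.5 | 89.8 | 75.8 | 1.43 | 74 | 37.8 | 45.9 | 16.3 | 99 | 73 | 56.9 | 59.4 |
| PETS-S2.L2 | S. Zhang | 62.1 | 92.0 | 74.2 | 1.03 | 74 | 28.4 | 55.4 | 16.2 | 91 | 112 | 55.6 | 67.1 |
| PETS-S2.L2 | V. Chari | 60.6 | 88.6 | 72.0 | | 43 | 14.0 | 79.1 | 6.9 | 379 | 244 | 50.4 | 60.6 |
| PETS-S2.L2 | Offline | 64.9 | 91.0 | 75.7 | 1.05 | 74 | 32.4 | 55.4 | 12.2 | 102 | 94 | 56.9 | 66.9 |
| PETS-S2.L2 | Online | 65.8 | 90.8 | 76.3 | 1.22 | 74 | 35.1 | 51.4 | 13.5 | 95 | 71 | 57.8 | 66.9 |
| PETS-S2.L3 | S. Zhang | 43.4 | 96.4 | 59.8 | 0.21 | 44 | 13.6 | 34.1 | 52.3 | 8 | 13 | 42.6 | 63.9 |
| PETS-S2.L3 | V. Chari | 45.4 | 91.2 | 60.7 | | 44 | 27.3 | 34.1 | 38.6 | 50 | 44 | 40.3 | 61.2 |
| PETS-S2.L3 | Offline | 40.1 | 96.7 | 56.7 | 0.25 | 44 | 18.2 | 33.4 | 48.4 | 13 | 21 | 41.7 | 62.5 |
| PETS-S2.L3 | Online | 43.0 | 98.7 | 59.9 | 0.19 | 44 | 18.2 | 29.6 | 52.3 | 9 | 10 | 43.6 | 64.8 |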

(44) Experiments were also conducted using the TUD-Stadtmitte dataset, which is 179 frames long. It was captured on a street with frequent occlusions and interactions among the pedestrians. Some of the tracking results are shown in Table III below:

(45) TABLE III. Comparison of tracking results with state-of-the-art methods on the TUD-Stadtmitte dataset.

| Method | Recall (%) | Precision (%) | F1 (%) | FAF | GT | MT (%) | PT (%) | ML (%) | Frag | IDs | MOTA (%) | MOTP (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| B. Yang | 87.0 | 96.7 | 91.59 | 0.18 | 10 | 70 | 30 | 0 | 1 | 0 | | |
| S. Zhang | 85.8 | 98.1 | 91.54 | 0.10 | 10 | 80 | 20 | 0 | 2 | 1 | 84.2 | 86.5 |
| V. Chari | 59.6 | 89.9 | 72.0 | | 10 | 20 | 80 | 0 | 22 | 15 | 51.6 | 61.6 |
| Ours (offline) | 84.2 | 98.4 | 90.7 | 0.17 | 10 | 70 | 30 | 0 | 3 | 5 | 81.9 | 85.9 |
| Ours (online) | 87.5 | 97.6 | 92.3 | 0.13 | 10 | 90 | 10 | 0 | 2 | 0 | 85.4 | 86.7 |

(46) In the above disclosure, reference has been made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific implementations in which the disclosure may be practiced. It is understood that other implementations may be utilized and structural changes may be made without departing from the scope of the present disclosure. References in the specification to "one embodiment," "an embodiment," "an example embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to affect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

(47) Implementations of the systems, devices, and methods disclosed herein may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed herein. Implementations within the scope of the present disclosure may also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media that store computer-executable instructions are computer storage media (devices). Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, implementations of the disclosure can comprise at least two distinctly different kinds of computer-readable media: computer storage media (devices) and transmission media.

(48) Computer storage media (devices) includes RAM, ROM, EEPROM, CD-ROM, solid state drives (SSDs) (e.g., based on RAM), Flash memory, phase-change memory (PCM), other types of memory, other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.

(49) An implementation of the devices, systems, and methods disclosed herein may communicate over a computer network. A network is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired and wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links, which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.

(50) Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.

(51) Those skilled in the art will appreciate that the disclosure may be practiced in network computing environments with many types of computer system configurations, including an in-dash vehicle computer, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, various storage devices, and the like. The disclosure may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.

(52) Further, where appropriate, functions described herein can be performed in one or more of: hardware, software, firmware, digital components, or analog components. For example, one or more application specific integrated circuits (ASICs) can be programmed to carry out one or more of the systems and procedures described herein. Certain terms are used throughout the description and claims to refer to particular system components. As one skilled in the art will appreciate, components may be referred to by different names. This document does not intend to distinguish between components that differ in name, but not function.

(53) It should be noted that the sensor embodiments discussed above may comprise computer hardware, software, firmware, or any combination thereof to perform at least a portion of their functions. For example, a sensor may include computer code configured to be executed in one or more processors, and may include hardware logic/electrical circuitry controlled by the computer code. These example devices are provided herein for purposes of illustration, and are not intended to be limiting. Embodiments of the present disclosure may be implemented in further types of devices, as would be known to persons skilled in the relevant art(s).

(54) At least some embodiments of the disclosure have been directed to computer program products comprising such logic (e.g., in the form of software) stored on any computer-usable medium. Such software, when executed in one or more data processing devices, causes a device to operate as described herein.

(55) While various embodiments of the present disclosure have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in form and detail can be made therein without departing from the spirit and scope of the disclosure. Thus, the breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents. The foregoing description has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. Further, it should be noted that any or all of the aforementioned alternate implementations may be used in any combination desired to form additional hybrid implementations of the disclosure.