COMPUTATIONALLY EFFICIENT TRAJECTORY REPRESENTATION FOR TRAFFIC PARTICIPANTS

Abstract

The present disclosure relates generally to autonomous vehicles, and more specifically to techniques for representing trajectories of objects such as traffic participants (e.g., vehicles, pedestrians, cyclists) in a computationally efficient manner (e.g., for multi-object tracking by autonomous vehicles). An exemplary method for generating a control signal for controlling a vehicle includes: obtaining a parametric representation of a trajectory of a single object in the same environment as the vehicle; updating the parametric representation of the single-object trajectory based on data received by one or more sensors of the vehicle within a framework of multi-object and multi-hypothesis tracker; and generating the control signal for controlling the vehicle based on the updated trajectory of the object.

Claims

1. A method for generating a control signal for controlling a vehicle, comprising: obtaining a parametric representation of a trajectory of a single object in the same environment as the vehicle; updating the parametric representation of the single-object trajectory based on data received by one or more sensors of the vehicle within a framework of multi-obj ect and multi-hypothesis tracker; and generating the control signal for controlling the vehicle based on the updated trajectory of the object.

2. The method of claim 1, wherein the control signal is generated based on the updated trajectory of the object and at least one other object in the same environment as the vehicle.

3. The method of claim 1, further comprising: providing the control signal to the vehicle for controlling motion of the vehicle.

4. The method of claim 1, further comprising: determining an intent associated with the object based on the updated trajectory, wherein the control signal is determined based on the intent.

5. The method of claim 4, wherein the intent comprises exiting a road, entering a road, changing lanes, crossing street, making a turn, or any combination thereof.

6. The method of claim 1, further comprising: inputting the updated trajectory into a trained machine-learning model to obtain an output, wherein the control signal is determined based on the output of the trained machine-learning model.

7. The method of claim 6, wherein the machine-learning model is a neural network.

8. The method of claim 1, wherein obtaining the parametric representation of the trajectory comprises retrieving, from a memory, a plurality of control points.

9. The method of claim 8, further comprising: transforming the obtained parametric representation to a new coordinate system based on movement of the vehicle.

10. The method of claim 9, wherein transforming the obtained parametric representation comprises transforming the plurality of control points of the parametric representation to the new coordinate system.

11. The method of claim 1, wherein updating the parametric representation comprises: predicting an expected parametric representation based on the obtained parametric representation and a motion model; comparing the expected parametric representation with the data received by the one or more sensors of the vehicle; and updating the parametric representation based on the comparison.

12. The method of claim 11, wherein predicting the expected parametric representation comprises determining a plurality of control points of the expected parametric representation.

13. The method of claim 12, wherein determining the plurality of control points of the expected parametric representation comprises obtaining a mean and/or a covariance of the plurality of control points of the expected parametric representation.

14. The method of claim 11, wherein the motion model is a linear model configured to shift the obtained parametric representation forward by a time period.

15. The method of claim 11, wherein the parametric representation is updated based on a Kalman filter algorithm.

16. The method of claim 11, further comprising: determining whether the object is abnormal based on the comparison.

17. The method of claim 1, wherein the data is a first data and the updated parametric representation is a first parametric curve representation, and the method further comprises: updating the obtained parametric representation of the trajectory based on a second data received by the one or more sensors of the vehicle to obtain a second updated parametric representation; and storing the first updated parametric representation and the second updated parametric representation as hypotheses associated with the object.

18. The method of claim 1, wherein the object is a traffic participant.

19. A vehicle, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: obtaining a parametric representation of a trajectory of a single object in the same environment as the vehicle; updating the parametric representation of the single-object trajectory based on data received by one or more sensors of the vehicle within a framework of multi-object and multi-hypothesis tracker; and generating the control signal for controlling the vehicle based on the updated trajectory of the object.

20. A system for generating a control signal for controlling a vehicle, comprising: one or more programs, wherein the one or more programs are stored in memory and configured to be executed by one or more processors, the one or more programs including instructions for: obtaining a parametric representation of a trajectory of a single object in the same environment as the vehicle; updating the parametric representation of the single-object trajectory based on data received by one or more sensors of the vehicle within a framework of multi-object and multi-hypothesis tracker; and generating the control signal for controlling the vehicle based on the updated trajectory of the object.

Description

DESCRIPTION OF THE FIGURES

[0036] For a better understanding of the various described embodiments, reference should be made to the Description of Embodiments below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.

[0037] FIG. 1A illustrates a scenario in which only the current states of two vehicles are known, in accordance with some embodiments.

[0038] FIG. 1B illustrates a scenario in which only the current states of two vehicles are known, in accordance with some embodiments.

[0039] FIG. 1C illustrates a scenario in which both the current states and historical trajectories of objects are known, in accordance with some embodiments.

[0040] FIG. 1D illustrates a scenario in which both the current states and historical trajectories of objects are known, in accordance with some embodiments.

[0041] FIG. 2 illustrates an exemplary process for generating a control signal for a vehicle, in accordance with some embodiments.

[0042] FIG. 3A illustrates an exemplary trajectory representation, in accordance with some embodiments.

[0043] FIG. 3B illustrates an exemplary trajectory representation, in accordance with some embodiments.

[0044] FIG. 4 illustrates an example of a computing device, in accordance with some embodiments.

DETAILED DESCRIPTION

[0045] The following description is presented to enable a person of ordinary skill in the art to make and use the various embodiments. Descriptions of specific devices, techniques, and applications are provided only as examples. Various modifications to the examples described herein will be readily apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other examples and applications without departing from the spirit and scope of the various embodiments. Thus, the various embodiments are not intended to be limited to the examples described herein and shown, but are to be accorded the scope consistent with the claims.

[0046] The present disclosure is directed to methods, electronic devices, systems, apparatuses, and non-transitory storage media for generating control signals for autonomous vehicles and operating autonomous agents. Embodiments of the present disclosure can represent trajectories of traffic participants (e.g., the ego vehicle, other vehicles, pedestrians, cyclists, etc.) in a memory efficient and computationally efficient manner. In some embodiments, a trajectory of an object is represented as a parametric representation. The parametric representation can be a generalization of a Bezier curve, i.e., a linear combination of basis functions (e.g., first basis functions, second basis functions, third basis functions, etc.) that are learnt from data but maintains essential properties of Bezier curves that guarantee low memory footprint and computational efficiency of coordinate transformations.

[0047] Using parametric representations such as Bezier curves to represent traffic trajectories provides a number of technical advantages. For example, Bezier curves are well-suited for representing typical traffic trajectories (e.g., smooth driven trajectories) observed from autonomous agents. Further, historical trajectories together with the current state of the environment can provide rich information about other agents’ intent and can significantly reduce uncertainties and simplify the planning process.

[0048] Further, the parametric representations require significantly less memory than naive trajectory representations. A naive trajectory representation includes a time sequence of kinematic states each defined by a plurality of kinematic state parameters (i.e., position, velocity, acceleration). Thus, in a multi-hypotheses approach, the computational cost scales both in memory and in computation as M*L, the product of the number of hypotheses maintained (M) and the length of the trajectory tracked (L) (e.g., number of time steps tracked) for an object. In contrast, a Bezier curve or its generalization is parameterized by a plurality of control points and thus can summarize the observed kinematic states over a time period of several seconds (e.g., tens to hundreds of timestamps/cycles) using a constant number of parameters (e.g., 8 for cubic Bezier curves, 12 for Quintic Bezier curves). Parametric representations bring down the computational cost of any trajectory tracking algorithm from scaling with the length of the tracked trajectory (L) to a constant. In other words, embodiments of the present disclosure can keep the scaling of tracking algorithm comparable to that of only tracking the current state while at the same time providing rich temporal context. With this, reductions in memory footprint >95% can be achieved.

[0049] Further, the parametric representations of trajectories require significantly less computational resources than naive trajectory representations. For example, to obtain the Bezier curve representation seen from a different coordinate system (e.g., due to ego-vehicle movement), the system can translate and rotate the control points, i.e., the parameters of the curve, to obtain the exact representation of the curve in a computationally efficient manner. In some embodiments, the control points can be transformed using an affine transformation. In other words, they are transformed in the same way a static position of an environmental feature, such as a traffic sign, is transformed. This is in contrast to, for example, a polynomial representation in some coordinate frame which does not allow direct transformation of its parameters. If the trajectory of an object is fitted using a polynomial representation, ego motion compensation would be difficult because a polynomial in the coordinate frame of the ego vehicle does not remain a polynomial under rotation. Thus, the system would be limited to keeping a list of all measured points, compensate those for ego motion, and refit the points with a polynomial, which can be computationally expensive.

[0050] Further, the parametric representations can be updated in a computationally efficient manner. The motion model needed for the prediction step in a tracking algorithms is a time invariant linear transformation that only depends on the cycle time. When observing position, velocity, and acceleration of an object, the observation model is linear time invariant and thus the Kalman update equations for the parameters of the trajectory are exact. The parameters are fully interpretable. The standard kinematic state vector of an object along the trajectory (position, velocity, acceleration) can be recovered using a linear transform. There is no fitting of the data required, rather, the update step directly incorporates new observations into the parameters of the curve. Thus, the sequence of observations does not need to be stored in order to calculate a Bezier curve representation or its generalization.

[0051] As described herein, some embodiments of the present disclosure obtain multivariate Gaussian distributions over the control points of a Bezier Curve or its generalization together with an adapted motion model and measurement model as a direct drop-in in the Kalman update equations for the Gaussian distribution over kinematic state vectors used in state trackers. This would transform any state tracking algorithm into a trajectory tracking algorithm without the computational and memory cost of maintaining the sequence of states that form the trajectory. At the same time, no information is lost when compared to ordinary state tracking as the last point of the trajectory always corresponds to the current state of the object. Since Bezier curves represent comfortable and smooth trajectories of limited jerk, deviations for actually measured object states can be used to detect anomalies in the behavior of other traffic participants. The compact Bezier representation is also uniquely suited as an input for AI algorithms for situation understanding in autonomous agents as they summarize the past behavior of agents in the context of the traffic scene.

[0052] The following description sets forth exemplary methods, parameters, and the like. It should be recognized, however, that such description is not intended as a limitation on the scope of the present disclosure but is instead provided as a description of exemplary embodiments.

[0053] Although the following description uses terms “first,” “second,” etc. to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another. For example, a first graphical representation could be termed a second graphical representation, and, similarly, a second graphical representation could be termed a first graphical representation, without departing from the scope of the various described embodiments. The first graphical representation and the second graphical representation are both graphical representations, but they are not the same graphical representation.

[0054] The terminology used in the description of the various described embodiments herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in the description of the various described embodiments and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “includes,” “including,” “comprises,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

[0055] The term “if” is, optionally, construed to mean “when” or “upon” or “in response to determining” or “in response to detecting,” depending on the context. Similarly, the phrase “if it is determined” or “if [a stated condition or event] is detected” is, optionally, construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event]” or “in response to detecting [the stated condition or event],” depending on the context.

[0056] Safe and comfortable navigation of an autonomous vehicle necessitates anticipatory planning, i.e., the ability to form expectations and make predictions about the future behavior of dynamic objects in the environment. The basis for such predictions is an accurate estimate of the present state of dynamic objects based on past observations. Naturally, such state estimates are probabilistic due to uncertainties in the measurement process or unobservable quantities such as driver intent. State space models are uniquely suited for this task as they provide a solid probabilistic framework to sequentially absorb observations into estimates of the current state of dynamic objects and track their motion over time. The standard technique for this is the Kalman filter or any of its variants and extensions. Below the main operational steps of the Kalman filter, some of the weaknesses of this approach, and how it can be improved are described.

Kalman Filtering Algorithms

[0057] The probability distributions over the current state vector of a single object xt given a sequence of all past observations of the object up to time t can be considered as P(x.sub.t|o.sub.t,..., o.sub.0} ≡ P.sub.t|t. For example, an autonomous vehicle tracked in two dimensions is described by a typical state vector X.sub.t comprising 6 entries: x-position, y-position, velocity v, acceleration a, yaw angle ψ and turn rate ψ̇. In the simplest case, the probability density for this vector is a Gaussian distribution P.sub.t|t = N(x.sub.t; .Math..sub.t|t, Σ.sub.t|t ) with mean .Math..sub.t|t and convariance Σ.sub.t|t.

[0058] Of interest are further distributions over future state vectors at time t + δt given current state vectors x.sub.t : P(x.sub.t+δt|x.sub.t). Again, the simplest case is a Gaussian distribution P(x.sub.t+δt|x.sub.t) = N(x.sub.t+δt; x̂.sub.t+δt, Q(δt)) with the mean being a linear function x̂.sub.t+δt = F(δt)x.sub.t and covariance matrix Q(δt) and matrices F(δt) and Q(δt) only depending on δt.

[0059] Additionally, the likelihood of observations o.sub.t for a given state vector P(o.sub.t|x.sub.t) is considered. Again, the simplest case is a Gaussian distribution P(o.sub.t|x.sub.t) = N(ot; ôt, R) with the mean being a linear function of the given state vector ô.sub.t =Hx.sub.t and covariance R. As an example, for the state vector described above, and a sensor that only returns position and velocity estimates, the matrix H ∈ ℝ.sup.3×6 may be:

[00001] $H= (\begin{matrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{matrix})$

and the observation vector o.sub.t would comprise of three entries: x-position, y-position and velocity v.

[0060] With these distributions at hand, the system can assimilate new observations sequentially from time t + δt into refined estimates of the state vector x.sub.t+δt. This happens on the basis of all previously acquired information via the iteration of a prediction step and subsequent application of Bayes’ Rule in an update step for each observation in a sequence:

[00002] $P_{t + (δ t| t} \equiv P ((x_{t + δ t}| o_{t}, . ., o_{0}) = \int d x_{t} P ((x_{t + δ t}| x_{t}) P_{(t| t}$

[00003] $P_{t + δ t |t + δ t)} \equiv P (x_{t + δ t} |o_{t + δ t}, . ., o_{0})) \propto P (o_{t + δ_{t}} |x_{t + δ t})) P_{t + δ t |t)}$

[0061] Assuming Gaussian distributions for the parameter vectors and Gaussian likelihoods for observations together with linear motion and observation models, these equations can be solved in closed form and one obtains the standard Kalman filter equations for the prediction step P.sub.t+δt|t = N(x.sub.t+δt; .Math..sub.t+δt|t, Σ.sub.t+δt|t) with:

[00004] $μ_{t + δ t |t)} = F (δ t) μ_{t |t)}$

[00005] $Σ_{t + δ t |t)} = F (δ t) Σ_{t |t)} F {(δ t)}^{T} + Q (δ t)$

[0062] The updated P.sub.t+δt|t+δt = N (x.sub.t+δt; .Math..sub.t+δt|t+δt, Σ.sub.t+δt|t+δt) is then calculated as:

[00006] $S = H Σ_{t + δ t |t)} H^{T} + R$

[00007] $K = Σ_{t + δ t |t)} H^{T} S^{- 1}$

[00008] $μ_{t + δ t |t + δ t)} = μ_{t + δ t |t)} + K (o_{t + δ t} - H μ_{t + δ t |t)})$

[00009] $Σ_{t + δ t |t + δ t)} = (I - KH) Σ_{t + δ t |t)}$

[0063] Using these equations, the current state of a single object can be tracked efficiently from sequential observations.

[0064] If tracking is performed from a moving sensor as is the case for autonomous vehicles then the system has to compensate for sensor movement in a process called ego-compensation. The system can track an object in a fixed world coordinate system which requires transforming the observations from the coordinate system of the moving vehicle into a fixed coordinate system where prediction and update step are performed. Situation interpretation and planning, however, generally happen in the coordinate frame of the vehicle and so the system needs to transform the updated state estimates back to the vehicle coordinate system.

[0065] Preferably, tracking happens directly in the coordinate frame of the vehicle and thus the system needs to transform the state estimates to the current coordinate system of the vehicle in which measurements are taken. Updates can then be performed directly in the coordinate frame of the vehicle.

Multi-Object Tracking

[0066] The environment of autonomous vehicles can contain more than one dynamic object to be tracked with multi-object tracking algorithm (MOT). Consequently, sensors will return a set of detections for different objects. For example, radar sensors will return multiple detected objects based on radar reflections or multiple objects may be detected in camera images. Typical object detectors for radar reflections, camera images or LiDAR sensors work on a frame by frame basis, i.e., there is no established one-to-one correspondence of an object detection i in a sensor reading at time t to the state vector of object i estimated on the basis of previous readings. Such direct correspondence in general cannot be established as objects enter and leave the range of sensors, and object detectors may produce false positive object detections or miss object detections due to occlusions or simple detector insufficiencies.

[0067] Let {o.sub.t} be the set of k.sub.t object detections at time t with o.sub.t,i representing detection i at time t. The state estimate of a single object depends on the precise association of object detections to tracked objects in the update step of the Kalman equations. For example, considering the state of object j at time

[00010] $t = 3, P (x_{3}^{j} |o_{3, 2}, o_{2, 0}, o_{1, 1}, o_{0, 0}))$

is different from

[00011] $P (x_{3}^{j} |o_{3, 1}, o_{2, 0}, o_{1, 2}, o_{0, 1}))$

because of different data association sequences.

[0068] Further, the data association sequences of different tracked objects must be consistent. If an object detection can only be associated with a single tracked object, then the data association sequences of different objects must be disjoint - a specific detection o.sub.t,i cannot occur in the association sequence of two different objects.

[0069] This consistency of data association is ensured in multi-object tracking algorithms prior to the update step. The ability to calculate the likelihood of every single object detection i to arise from a predicted state of any object j as

[00012] $P (o_{t + δ t, i} |x_{t + δ t}^{j}))$

allows to select the most likely consistent associations of the set of k.sub.t+δt observations {o.sub.t+δt} to the currently tracked objects and proceed to the update step with those.

Multi Hypothesis Tracking

[0070] However, it may be possible that at any time step there are several consistent data association possibilities of comparable likelihood. This ambiguity arises in particular in cluttered scenes with many occlusions as is typical for example in inner city traffic.

[0071] It is understood that the sequential association of observations into state estimates generally does not allow for the correction of errors in data association and thus choosing the wrong data associations poses the risk of potentially misestimating the current state of the environment and consequently not being able to plan a safe motion.

[0072] This risk can be mitigated by lifting the restriction of only working with a single consistent data association. Multi-hypothesis tracking (MHT) algorithms maintain a list of the top M most likely consistent data associations at each point in time and proceed with each of them independently in a branching process. In this way, data associations of similarly high likelihood and resulting estimates of the states of the dynamic environment are maintained until further observations have been gathered that resolve ambiguities.

[0073] A consistent data association for all tracked objects maintained in an MHT algorithm is called “global hypothesis”. A data association sequence for a single tracked object and the resulting state estimate is called “local hypothesis”. A global hypothesis thus includes a set of local hypotheses that are consistent. A single local hypothesis may appear in several global hypotheses as long as consistency is not violated. The number of global hypotheses maintained is typically in the hundreds and the number of local hypotheses is of the same order.

[0074] To reduce memory footprint, local hypotheses that are not part of any global hypotheses are pruned from memory.

Difficulties With This Approach

[0075] Though theoretically appealing, the approach outlined above has a few shortcomings that the present disclosure aims to address:

[0076] The choice of state vector x is generally motivated from prior knowledge about the kinematics of the tracked objects, e.g. for a vehicle one typically chooses as state variables position, velocity, acceleration, heading angle, and turn rate as already outlined. While intuitive, this state definition implies a non-linear motion model that necessitates to introduce approximations in the update equations. For example, the x-position at time t + δt would have to be calculated as

[00013] $x_{t + δ t} = x_{t} + \int_{t}^{t + δ t} \cos (ψ_{t^{'}}) v_{t^{'}} d t^{'}$

which is non-linear in the state variables. In fact, the most common extensions of the classical Kalman filter, so-called extended or unscented filters aim at allowing non-linear motion and observation models. Further, motion models often entail parameters that are not generally available to an observer. For example, in the vehicle model described, velocity v and turn rate ψ̇ are connected via steering angle α and wheelbase

[00014] $L via ψ_{t} = \frac{v_{t}}{L} \tan α_{t} .$

The deviations of actual object motion and motion model used are generally accounted for by the noise term Q, the so-called process noise. It is clear that model misspecification need to be accounted for with larger noise terms and thus makes estimates more uncertain.

[0077] Further, such a definition of state vector is primarily aimed at exactly representing the current kinematic state of the vehicle and not at representing information from past observations that is relevant for predicting the future. For example, in the scenario in FIG. 1A, two vehicles are observed with their full kinematic state vector at the present state on a highway on-off ramp. However, this kinematic state does not contain information that reduces uncertainty about the future. For both vehicles, entering or exiting the highway appears equally likely and thus an observer travelling behind these two vehicles would have to contend with 4 equally probable future evolutions of the scene in front of it as shown in FIG. 1B.

[0078] Since driver intent is generally unobservable, having the entire or at least part of the past trajectory of traffic participants available would make predictions about the future behavior of traffic participants less uncertain and thus facilitate planning. FIGS. 1C and 1D below illustrate this. With the past trajectories given, the uncertainty about the probable future evolution of the scene practically vanishes.

[0079] A naive approach to trajectory tracking would be to simply maintain a list of the last L estimates of the kinematic state of a dynamic object. If tracking is run at a frequency of 10 Hz, then maintaining such a list for ΔT seconds would need hold L = 10ΔT state vectors. The memory footprint thus scales linearly with L for every tracked object in multi-object tracking or for every tracked local hypothesis in multi-hypothesis tracking. See Granström et al (https://arxiv.org/abs/1912.08718, Table I) for a discussion of computational requirements in state of the art implementations of MHT trackers.

[0080] Further, if tracking is to be performed in the coordinate frame of the vehicle, then a naive list of L historic states as trajectory would require ego-compensation of every single time point which is computationally expensive.

[0081] The present disclosure includes embodiments directed to using an object’s past trajectory over L time steps as its state. But instead of using a list, the system uses a parametric representation for this trajectory with small memory footprint independent of L and a corresponding linear motion and observation models that allow estimation of the parameters of this representation using Kalman update equations without the need for approximations. The small memory footprint makes trajectory tracking possible to be used in conjunction with multi-object tracking algorithms in real-time capable systems run on embedded hardware.

[0082] Further, the ego compensation in this representation is computationally very cheap as only the parameters need to be transformed and thus the computational effort does not depend on the length of the trajectory tracked.

[0083] Irrespective of whether one is tracking a single or multiple objects and whether one is tracking a single or multiple data association hypotheses, the core of every tracking algorithm that follows the paradigm of sequential Bayesian data assimilation is similar. One needs a state representation, a motion model that propagates the state forward in time and an observation model that allows to assess the likelihood of observations for a given state estimate. Described below are the mathematical details of the proposed representation for trajectories in d dimensions based on n+1 basis functions, a corresponding motion model and observation model.

[0084] The system considers n+1 trajectory basis functions of time Φ.sub.i(t) arranged in an n+1 dimensional vector Φ(t) = [Φ.sub.0(t), Φ.sub.1(t), Φ.sub.2(t), ..., Φ.sub.n(t)].

[0085] The system now defines an (n+1) × d dimensional matrix P of control points. Each row in P corresponds to one control point in d dimensions. The control points are the parameters of a trajectory and we find the position along the trajectory parameterized through P at time t as c(t) = Φ.sup.T(t)P where c(t) is a d dimensional vector. The superscript T denotes transposition as usual. It is clear that P fully specifies c(t) for all t and we thus only need to estimate and track P if we want to track c(t).

[0086] Ego motion compensation of this trajectory is obtained by only transforming the control points that trivially transform under sensor translation and rotation as any fixed point in space.

[0087] The motion model for P is obtained as a time invariant linear transform that can be calculated directly from the basis functions.

[0088] The observation model is obtained in the following way: sensors for dynamic objects are generally able to obtain positional information as well as time derivative information such as velocities. Due to our chosen representation, the j.sup.th time-derivative of the trajectory is given as a linear transform of our control points d.sup.j/dt.sup.j c(t) = H.sub.jP with H.sub.j=d.sup.j/dt.sup.i Φ.sup.T(t).

[0089] FIG. 2 illustrates an exemplary process for generating a control signal for a vehicle, in accordance with some embodiments. Process 200 is performed, for example, using one or more electronic devices implementing a software platform. In some examples, process 200 is performed using one or more electronic devices on an autonomous vehicle (e.g., the ego vehicle). In some embodiments, process 200 is performed using a client-server system, and the blocks of process 200 are divided up in any manner between the server and one or more client devices. Thus, while portions of process 200 are described herein as being performed by particular devices, it will be appreciated that process 200 is not so limited. In process 200, some blocks are, optionally, combined, the order of some blocks is, optionally, changed, and some blocks are, optionally, omitted. In some examples, additional steps may be performed in combination with the process 200. Accordingly, the operations as illustrated (and described in greater detail below) are exemplary by nature and, as such, should not be viewed as limiting.

[0090] At block 202, (e.g., one or more electronic devices) obtains a set of estimates of parametric trajectory representations of objects. The objects can be traffic participants in the same environment as the vehicle and can be a vehicle, a pedestrian, a cyclist, a drone, an animal, etc.

[0091] A parametric trajectory representation in d dimensions comprises of n + 1 basis functions of time and n + 1 control points in d dimensions. The n + 1 basis functions of time τ arranged in an n + 1 dimensional vector:

[00015] $Φ (τ) = [ϕ_{0} (τ) ϕ_{1} (τ) ϕ_{2} (τ) .Math. ϕ_{n} (τ)]$

[0092] The system aims to represent trajectories of length ΔT by the intervalτ ∈ [0,1], which is always possible for any ΔT by rescaling dr = dt / ΔT.

[0093] At any moment in time t, a point along the past trajectory of an object in d dimensions over a timespan of ΔT is then given by a linear combination of the n + 1 basis functions and the n + 1 current control points in d dimensions arranged in an (n + 1) × d matrix P.sub.t where each row corresponds to one control point:

[00016] $c (τ) = Φ^{T} (τ) P_{t}$

[0094] Here c(τ) ∈ ℝ.sup.d is a point along the tracked trajectory with τ = 0 corresponding t - ΔT and τ = 1 corresponding to the current time t. The control points of P.sub.t are the parameters of the curve that change in time as the object moves. They transform under movement of the coordinate system in the same way as points in space and hence control points is a fitting name.

[0095] An example choice for such basis functions are the Bernstein Polynomials

[00017] $ϕ_{i} (τ) = (\begin{matrix} n \\ i \end{matrix}) τ^{i} {(1 - τ)}^{(n - i)}$

[0096] In this case, the curves c(τ) are known as Bezier curves and the control points P.sub.t have a particularly intuitive interpretation. We will make reference to this special case as a running example. However, it should be understood by one of ordinary skill in the art that all statements are applicable for general basis functions. In particular, it is possible to optimize basis functions for the accurate representation of empirically measured trajectories.

[0097] In order to be able to use this trajectory representation in a Kalman filter and to be able to specify distributions over the parameters of the trajectory, the vector x.sub.t ∈ ℝ.sup.(n+1)d is introduced as state representation at time t. x.sub.t is simply formed by concatenating the entries of the (n + 1) × d matrix of control points P.sub.t, the first 2 entries corresponding to p.sub.o, the second two entries corresponding to p.sub.1 and so forth. It can be assumed that distributions over this state are modelled via a multivariate Gaussian distribution in ℝ.sup.(n+1)d with mean .Math..sub.t and covariance Σ.sub.t. Due to the equivalence of P.sub.t and x.sub.t the following description uses whichever notation is most convenient at the time.

[0098] FIG. 3A illustrates an example using a cubic Bezier curve in 2 dimensions (i.e., n = 3 and d = 2). A cubic Bezier curve is parameterized by 4 control points: p.sub.0, p.sub.1, p.sub.2, and p.sub.3. Each control point is a 2-dimensional point represented by 2 values. The state vector x.sub.t of such a trajectory is of size (n + 1 )d= 8. In the figure, the probabilistic aspects are also indicated. The annotated points and corresponding ellipses denote the mean .Math..sub.t and 95% confidence intervals of the entries in x.sub.t corresponding to the control points. The solid line denotes the mean estimate of the trajectory ĉ(τ) = (Φ.sup.T (r) .Math. I.sub.2).Math..sub.t and the dashed lines correspond to sample trajectories c(τ) = (Φ.sup.T (τ) .Math. I.sub.2)x.sub.t with x.sub.t sampled from a multivariate Gaussian distribution with mean .Math..sub.t and covariance Σ.sub.t.

[0099] Block 206 then performs the data assimilation step by updating the set of local hypotheses based on data received by one or more sensors of the autonomous system. At block 206a ego compensation is performed. For the parametric trajectory representation, only the control points, i.e., the state vector, needs to be transformed. This requires a much smaller computational effort than transforming a list of kinematic state vectors. Since this is a linear transformation, we can directly apply it to the parameters of the state density. Assuming the frame of reference is translated by a d-dimensional vector Δo and rotated through the action of an d × d matrix R the system can first transform the covariance matrix of our state density. The covariance matrix is only affected by the rotation R:

[00018] $Σ_{t} \leftarrow (I_{n} .Math. R) Σ_{t} {(I_{n} .Math. R)}^{T}$

[0100] The mean of the state density is first converted to homogeneous coordinates, i.e., we introduce an additional dimension to the control points that is constant and equal to one. The homogenized mean vector is calculated as

[00019] $μ^{h} = [μ_{1}, .Math. μ_{d}, 1, μ_{d + 1}, .Math. μ_{2 d}, 1, .Math., μ_{n d}, .Math. μ_{(n + 1) d}, 1]$

[0101] Then, we introduce the d × (d + 1) matrix

[00020] $T = [R - R Δ o]$

[0102] and can transform the mean vector as As before, I.sub.n is an n × n unit matrix.

[0104] At block 206b, trajectory extensions are predicted using a specific motion model for the control points of the trajectory.

[0105] Assume at time t, the system has m > n + 1 samples of a trajectory c(τ.sub.i) along a trajectory at different times τ.sub.i, i ∈ {1,..,m} between τ.sub.0 = 0 and T.sub.m = 1. These samples can be arranged as the rows of an m × d matrix C. An m × (n + 1) matrix B can be formed so that row i of B corresponds to Φ(τ.sub.i). Then, the control points can be estimated as least square fit to the samples of the trajectory samples:

[00022] $P_{t} = {(B^{T} B)}^{- 1} B^{T} C$

[0106] We are now interested in the motion model of the control points, i.e. under a move along the trajectory. For this, another m × (n + 1) matrix B′ can be formed so that row i of B′ corresponds to Φ(τ.sub.i + δ.sub.t/ΔT). The system can then obtain the transformed control points:

[00023] $P_{t + δ t} = \underset{F (δ t)}{\underset{︸}{{(B^{T} B)}^{- 1} B^{T} B^{'}}} P_{t}$

[0107] and thus have a linear motion model F(δt)for the control points that only depends on the choice of basis functions. Correspondingly for the state vector, the system can find: [0108] where I.sub.d is the identity matrix in d dimensions.

[0109] In particular, the end point of the moved trajectory will be c(1) = Φ.sup.T (1)P.sub.t+bt.FIG. 3B illustrates this for the running example of a cubic Bezier curve.

[0110] Matrix P.sub.i, i.e. control points P.sub.0,P.sub.1,P.sub.2 and P.sub.3parameterizing the trajectory at time t are propagated to the matrix P.sub.t+bt.Math.,i.e. control points P′.sub.0, P′.sub.1, P′.sub.3, and P′.sub.3parameterizing the trajectory at time t + δ.sub.t. By construction, the new trajectory estimate follows the old estimate exactly up to time t.

[0111] The prediction step in the Kalman filter equations is then written as

[00025] $\begin{array}{l} μ_{t + δ t |t)} = (F (δ t) .Math. I_{d}) μ_{t |t)} \\ Σ_{t + δ t |t)} = (F (δ t) .Math. I_{d}) Σ_{t |t)} {(F (δ t) .Math. I_{d})}^{T} + Q (δ t) \end{array}$

[0112] At block 206d, the likelihood of current sensor readings based on the set of predicted trajectory extensions for the local hypotheses. Typical object detectors provide measurements of object positions, velocities and possibly accelerations. These kinds of kinematic measurements are easily obtained. The i-th derivative is simply:

[00026] ${(\frac{d^{i}}{d t^{i}} c (t)|}_{t^{'}} = \frac{1}{Δ T^{i}} {(\frac{d^{i}}{d τ^{i}} Φ (τ)|}_{τ = t^{'} / Δ T} P$

[0113] For trajectory tracking, the natural choice is to consider the most recent, i.e. last, point of the trajectory at τ = 1. Depending on the sensory information available for each object, we can then form an observation matrix H.

[0114] For example, if the system is tracking the trajectory of an object in 2D and the sensors provide with position and respective velocities, form an observation vector o.sub.t = [x,y,v.sub.x,v.sub.y]. We then form the 2 × (n+1) matrix H.sub.o with rows Φ(1) and

[00027] $\frac{1}{Δ T} {(\frac{d}{d τ} Φ (τ)|}_{τ = 1}$

and obtain the matrix H necessary for the update step of the Kalman equations from above:

[00028] ${H = H}_{σ} .Math. I_{2}$

[0115] With this, the system has all necessary parts in order to be able to track the trajectory of an objects over time horizon ΔT by always updating the trajectory with the most recent observation.

[0116] At block 206d, the M most likely global hypotheses are formed based on the likelihood of the current sensor readings calculated in the previous step.

[0117] At block 206e, all local hypotheses not used in the M most likely global hypotheses are pruned from memory.

[0118] Process 206d then returns the set of the M most likely global hypotheses and the corresponding local hypotheses for further processing in block 208.

[0119] At block 208, the system determines a control signal for the vehicle based on the updated trajectory of the object. In some embodiments, the system can determine an intent associated with the object based on the updated trajectory of the object and determine a control signal for the vehicle accordingly. For example, as discussed with reference to FIGS. 1A-D, historical trajectories can be used to determine intent of a traffic participant (e.g., exiting highway, entering highway, crossing the street, making a turn). Based on the intent associated with the object, the system can determine a control signal for the vehicle to avoid collision with the object. The control signal can be provided or transmitted to the vehicle to control the vehicle (e.g., maintaining speed, accelerating, decelerating, changing direction, etc.).

[0120] In some embodiments, the system can use the updated trajectory for various downstream analyses. For example, the system can input the updated trajectory into a machine-learning model for situation understanding. For example, the machine-learning model can be configured to receive a trajectory of an object and identify an intent of the object, identify abnormal behaviors, predict the future trajectory, etc. The compact parametric representations are uniquely suited for AI algorithms for situation understanding in autonomous vehicles as they summarize the past behavior of traffic participants in the context of the traffic scene. Due to the compactness of the Bezier representations, a compact machine-learning model can be trained in a computationally efficient manner and the trained model can provide fast analyses. The machine-learning models described herein include any computer algorithms that improve automatically through experience and by the use of data. The machine-learning models can include supervised models, unsupervised models, semi-supervised models, self-supervised models, etc. Exemplary machine-learning models include but are not limited to: linear regression, logistic regression, decision tree, SVM, naive Bayes, neural networks, K-Means, random forest, dimensionality reduction algorithms, gradient boosting algorithms, etc.

[0121] In some embodiments, the system stores multiple hypotheses for the object as discussed above. Each hypothesis for the object includes a Bezier curve representation (e.g., the control points parameterizing the curve). Thus, the updated Bezier curve representation can be stored as one of many hypotheses associated with the object.

[0122] In some embodiments, the system evaluates the updated trajectory of the object to determine if the object is behaving in an abnormal manner. Because Bezier curves are smooth curves and represent typical driving trajectories (e.g., comfortable and smooth trajectories with limited jerk) well, deviations for actually measured object states can be used to detect anomalies in the behavior of other traffic participants. In some embodiments, the system determines a deviation between the expected curve (e.g., as obtained in block 206b) and the actual observed trajectory and compares the deviation to a predefined threshold. If the threshold is exceeded, the system can determine that the object is exhibiting abnormal behavior (e.g., reckless driving). Based on the detected anomalies, the system can generate a control signal (e.g., to stay away from an abnormal object).

[0123] While the techniques described with reference to process 200 are directed to using Bezier curves to represent trajectories of traffic participants other than the ego vehicle, the techniques can be applied to track the trajectory of the ego vehicle itself using a Bezier curve representation. Further, while the techniques described with reference to process 200 involve the use of Bezier curves, it should be understood that the Bezier curve representations can be replaced by any linear combination of basis functions (e.g., first basis functions, second basis functions, third basis functions, etc.). In some embodiments, machine learning models such as neural networks or Gaussian Processes may be used to calculate the basis functions.

[0124] The operations described herein are optionally implemented by components depicted in FIG. 4. FIG. 4 illustrates an example of a computing device in accordance with one embodiment. Device 400 can be a host computer connected to a network. Device 400 can be a client computer or a server. As shown in FIG. 4, device 400 can be any suitable type of microprocessor-based device, such as a personal computer, workstation, server or handheld computing device (portable electronic device) such as a phone or tablet. The device can include, for example, one or more of processor 410, input device 420, output device 430, storage 440, and communication device 460. Input device 420 and output device 430 can generally correspond to those described above, and can either be connectable or integrated with the computer.

[0125] Input device 420 can be any suitable device that provides input, such as a touch screen, keyboard or keypad, mouse, or voice-recognition device. Output device 430 can be any suitable device that provides output, such as a touch screen, haptics device, or speaker.

[0126] Storage 440 can be any suitable device that provides storage, such as an electrical, magnetic or optical memory including a RAM, cache, hard drive, or removable storage disk. Communication device 460 can include any suitable device capable of transmitting and receiving signals over a network, such as a network interface chip or device. The components of the computer can be connected in any suitable manner, such as via a physical bus or wirelessly.

[0127] Software 450, which can be stored in storage 440 and executed by processor 410, can include, for example, the programming that embodies the functionality of the present disclosure (e.g., as embodied in the devices as described above).

[0128] Software 450 can also be stored and/or transported within any non-transitory computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch instructions associated with the software from the instruction execution system, apparatus, or device and execute the instructions. In the context of this disclosure, a computer-readable storage medium can be any medium, such as storage 440, that can contain or store programming for use by or in connection with an instruction execution system, apparatus, or device.

[0129] Software 450 can also be propagated within any transport medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch instructions associated with the software from the instruction execution system, apparatus, or device and execute the instructions. In the context of this disclosure, a transport medium can be any medium that can communicate, propagate or transport programming for use by or in connection with an instruction execution system, apparatus, or device. The transport readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic or infrared wired or wireless propagation medium.

[0130] Device 400 may be connected to a network, which can be any suitable type of interconnected communication system. The network can implement any suitable communications protocol and can be secured by any suitable security protocol. The network can comprise network links of any suitable arrangement that can implement the transmission and reception of network signals, such as wireless network connections, T1 or T3 lines, cable networks, DSL, or telephone lines.

[0131] Device 400 can implement any operating system suitable for operating on the network. Software 450 can be written in any suitable programming language, such as C, C++, Java or Python. In various embodiments, application software embodying the functionality of the present disclosure can be deployed in different configurations, such as in a client/server arrangement or through a Web browser as a Web-based application or Web service, for example.

[0132] Although the disclosure and examples have been fully described with reference to the accompanying figures, it is to be noted that various changes and modifications will become apparent to those skilled in the art. Such changes and modifications are to be understood as being included within the scope of the disclosure and examples as defined by the claims.

[0133] The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the techniques and their practical applications. Others skilled in the art are thereby enabled to best utilize the techniques and various embodiments with various modifications as are suited to the particular use contemplated.

COMPUTATIONALLY EFFICIENT TRAJECTORY REPRESENTATION FOR TRAFFIC PARTICIPANTS

Assignee

Inventors

Cpc classification

Classification Explorer

B60W60/00276

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

G06N7/01

PHYSICS

Classification Explorer

B60W30/0956

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

G06N20/00

PHYSICS

Classification Explorer

B60W2554/4045

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

G08G1/167

PHYSICS

Classification Explorer

G08G1/166

PHYSICS

Classification Explorer

B60W60/0011

PERFORMING OPERATIONS; TRANSPORTING

International classification

Classification Explorer

B60W30/095

PERFORMING OPERATIONS; TRANSPORTING

Abstract

Claims

Description