Efficient teleoperation of mobile robots via online adaptation

11281208 · 2022-03-22

Abstract

Described herein is a framework for efficient, task-agnostic, user-independent adaptive teleoperation of mobile robots and remotely operated vehicles (ROVs), including ground vehicles (among them legged systems), aircraft, watercraft, and spacecraft. The efficiency of a human operator is improved by minimizing the entropy of the control inputs: by concurrently estimating the user intent online and adaptively updating the action set available to the operator, the framework reduces operator effort and achieves higher performance in the form of smoother trajectories.

Claims

1. A method comprising iterating, for each of a plurality of discrete time steps during a time period of operation of a robot, the steps of: receiving a current state of the robot; selecting a set of motion primitives that are dynamically feasible and safe for the robot to execute during the current time step, given the current state of the robot, wherein each motion primitive is a parameterized trajectory based on system dynamics of the robot; predicting a user intent of an operator of the robot based on one or more motion primitives executed by the robot during one or more previous time steps; selecting a subset of motion primitives comprising one or more motion primitives from the set of motion primitives based on the predicted user intent; receiving a current input from the operator; and selecting, as the motion primitive to be executed by the robot during the current time step, the motion primitive from the selected subset of motion primitives having a closest match to the current input from the operator.

2. The method of claim 1 wherein selecting the set of motion primitives comprises: determining a range of motion which is dynamically feasible given the current state of the robot; and discretizing the range of motion into the selected set of motion primitives.

3. The method of claim 2 wherein selecting a set of motion primitives further comprises: receiving one or more sensed environmental conditions; and limiting the range of motion to a safe range given the sensed environmental conditions.

4. The method of claim 1 wherein one or more sets of motion primitives are pre-computed and associated with a state of the robot, further comprising: selecting the set of motion primitives from the one or more pre-computed sets of motion primitives, the selected set of motion primitives being the pre-computed set of motion primitives having a closest match to the current state of the robot.

5. The method of claim 4 further comprising: receiving one or more sensed environmental conditions; and eliminating unsafe motion primitives from the selected set of pre-computed motion primitives, wherein the unsafe motion primitives are determined based on the sensed environmental conditions.

6. The method of claim 1 wherein the user intent is a probabilistic distribution, further comprising: selecting the subset of motion primitives by selecting motion primitives from the set of motion primitives which are closest to a center of the probabilistic distribution.

7. The method of claim 6 wherein the granularity of the selected subset of motion primitives is broader based on the user input having a greater range of change over a predetermined number of iterations of the discrete steps of the method.

8. The method of claim 6 wherein the granularity of the selected subset of motion primitives is narrower based on the user input having a lesser range of change over a predetermined number of iterations of the discrete steps of the method.

9. The method of claim 6 wherein the probabilistic distribution represents a likelihood of selection of a particular motion primitive from the selected subset of motion primitives given one or more motion primitives executed by the robot during one or more previous time steps.

10. The method of claim 6 further comprising: estimating a reward function based on a trajectory generated from motion primitives executed by the robot during a predetermined number of previous time steps; assigning a weight to each motion primitive in the set of motion primitives based on the reward function; and selecting a predetermined number of the weighted motion primitives having the highest weights for inclusion in the selected subset of motion primitives.

11. A system comprising: a mobile robot; an interface to a control for providing operator inputs; a processor and memory onboard the robot, the memory containing software executing on the processor and iterating, during each of a plurality of discrete time steps during a time period of operation of the robot, the functions of: receiving a current state of the robot; selecting a set of motion primitives that are dynamically feasible and safe for the robot to execute for the current time step, given the current state of the robot, wherein each motion primitive is a parameterized trajectory based on system dynamics of the robot; predicting a user intent of an operator of the robot based on one or more motion primitives executed by the robot during one or more previous time steps; selecting a subset of motion primitives comprising one or more motion primitives from the set of motion primitives based on the predicted user intent; receiving a current input from the operator; and selecting, as the motion primitive to be executed by the robot for the current time step, the motion primitive from the selected subset of motion primitives having a closest match to the current input from the operator.

12. The system of claim 11 wherein, to select a set of motion primitives, the software performs the further functions of: determining a range of motion which is dynamically feasible given the current state of the robot; receiving one or more sensed environmental conditions; limiting the range of motions to a safe range given the sensed environmental conditions; and discretizing the limited range of motion into the selected set of motion primitives.

13. The system of claim 11 wherein one or more sets of motion primitives are pre-computed and associated with a state of the robot, the software performing the further functions of: selecting the set of motion primitives from the one or more pre-computed sets of motion primitives, the selected set of motion primitives being the pre-computed set of motion primitives having a closest match to the current state of the robot; receiving one or more sensed environmental conditions; and eliminating unsafe motion primitives from the selected set of motion primitives, the unsafe motion primitives determined based on the sensed environmental conditions.

14. The system of claim 11 wherein the user intent is a probabilistic distribution, further comprising: creating the selected subset of motion primitives by selecting motion primitives from the set of motion primitives which are closest to a center of the probabilistic distribution.

15. The system of claim 14 wherein the granularity of the selected subset of motion primitives is broader based on the user input having a greater range of change over a predetermined number of iterations of the discrete steps of the method.

16. The system of claim 14 wherein the granularity of the selected subset of motion primitives is narrower based on the user input having a lesser range of change over a predetermined number of iterations of the discrete steps of the method.

17. The system of claim 14 wherein the probabilistic distribution represents a likelihood of selection of a particular motion primitive from the selected subset of motion primitives given one or more motion primitives executed by the robot during one or more previous time steps.

18. The system of claim 14, the software performing the further functions of: estimating a reward function based on a trajectory generated from motion primitives executed by the robot during a predetermined number of previous time steps; assigning a weight to each motion primitive in the set of motion primitives based on the reward function; and selecting a predetermined number of the weighted motion primitives having the highest weights for inclusion in the selected subset of motion primitives.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 is a depiction of the subset of the motion primitive library available to the user for selection changing over time based on dynamically feasible motions given the current state of the robot and previously selected motion primitives.

(2) FIG. 2 is a high-level block diagram of one embodiment of the invention.

(3) FIGS. 3A and 3B show examples of the MPLs for ground and air vehicles, respectively.

(4) FIG. 3C depicts the MPL selected at each time t with the initial condition matching that of the current robot state, thereby forming a smooth trajectory.

(5) FIG. 4 shows the algorithm for selecting the allowed subset of motion primitives.

(6) FIG. 5 shows motion primitive distributions over time, illustrating that the prediction becomes more peaked at the mean of the predicted motion primitive.

DETAILED DESCRIPTION OF THE INVENTION

(7) Traditional control methodologies of mobile robot teleoperation compute the desired motion reference with respect to a velocity reference provided by an external input device (e.g. a joystick, a steering wheel, etc.). An alternate strategy is to represent the action space as a set of pre-computed motion primitives, called a motion primitive library (MPL). Instead of employing the external inputs as velocity references, the external input becomes a selector function that corresponds to a specific motion primitive at each instance in time. Motion primitives can be computed in advance and designed to ensure dynamic feasibility and vehicle stability assuming nominal operating conditions.

(8) FIG. 2 shows a high-level block diagram of one embodiment of the invention. A user 200 provides input to the robot using a controller, for example, a joystick input 202. A collection of allowed motion primitives 204 is supplied to input selector 206. Input selector 206 selects a motion primitive from motion primitive collection 204 based on the closest match between the parameters of a motion primitive and the input provided by user 200 via controller 202. Selected motion primitive 208 is sent to mobile robot 210 to be executed as an actual motion. User 200 may be assisted in providing input via controller 202 by visual feedback 212. The current state of the robot 214 is used to select a motion primitive library 216 containing a collection of motion primitives that are both dynamically feasible and safe for robot 210 to execute. The motion library is sampled at 218, with the sampling centered around a determined user intent. An estimated reward function 220, based upon a predetermined number of previously selected motion primitives, is used to weight the motion primitives in motion primitive library 216. Based on the weighting, the subset of allowed motion primitives 204 is selected and sent to input selector 206 for selection in the next time period. In an alternative embodiment, the motion primitive library 216 may be calculated dynamically based upon the state of the robot 214 and the environment, instead of being selected from a pre-calculated collection of libraries based on the state of the robot 214.

(9) An action can be defined as $a = \{a_1, \dots, a_q\}$ for $q$ input dimensions. A corresponding motion primitive generated from an action is denoted by $\gamma(a)$. An MPL, $\Gamma = \{\gamma_i(a_i)\}$, $i = 1, \dots, N$, is generated from an action set $\{a_i\}$, $i = 1, \dots, N$, of size $N$. A set of MPLs can further be defined as a motion primitive library collection, denoted $\{\Gamma_j\}$, $j = 1, \dots, M$.

(10) A parameterized action formulation for ground and air vehicles is chosen based on forward-arc motion primitives that propagate the dynamics of a unicycle model with a constant linear, angular, and vertical velocity for a specified amount of time, T. The motion primitives are given by the solutions to the unicycle model:

(11)
$$
\mathbf{x}_{t+T} = \mathbf{x}_t + \begin{bmatrix} \frac{v_{xt}}{\omega_t}\left(\sin(\omega_t T + \theta_t) - \sin(\theta_t)\right) \\ \frac{v_{xt}}{\omega_t}\left(\cos(\theta_t) - \cos(\omega_t T + \theta_t)\right) \\ v_{zt} T \\ \omega_t T \end{bmatrix} \qquad (1)
$$
where $\mathbf{x}_t = [x_t, y_t, z_t, \theta_t]^T$ represents the pose of the vehicle at time $t$, and $v_{xt}$, $v_{zt}$, and $\omega_t$ are the linear, vertical, and angular velocities of the vehicle at time $t$ in the body frame, respectively. Hence, the action space of the vehicle is given by the uniformly dense sets $\mathcal{V}_x = \{v_{xi}\}$, $i = 1, \dots, N_{vx}$, $\Omega = \{\omega_j\}$, $j = 1, \dots, N_{\omega}$, and $\mathcal{V}_z = \{v_{zk}\}$, $k = 1, \dots, N_{vz}$.

(12) A forward-arc motion primitive at each time $t$ is represented by $\gamma_t = \{a_t, T\}$, with $a_t = \{v_{xt}, v_{zt}, \omega_t\}$. For ground vehicles, the heading of the vehicle is fixed to the yaw of the vehicle. Although aerial platforms such as quadrotors can control heading independently of yaw, the use of a unicycle model is maintained by ensuring that the heading is equivalent to the yaw of the vehicle, as humans naturally optimize for curved trajectories in robot control.
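As a concrete sketch, the forward-arc propagation of Eq. (1) and the construction of a uniform MPL can be written in a few lines of Python. Function and variable names are illustrative, not from the patent, and the $\omega \to 0$ straight-line branch is added here because Eq. (1) as written divides by $\omega_t$:

```python
import math

def propagate_unicycle(pose, v_x, v_z, omega, T):
    """Closed-form unicycle propagation over duration T (Eq. 1).

    pose = (x, y, z, theta); v_x and v_z are the body-frame linear and
    vertical velocities, omega is the yaw rate.
    """
    x, y, z, theta = pose
    if abs(omega) < 1e-9:  # straight-line limit as omega -> 0
        return (x + v_x * T * math.cos(theta),
                y + v_x * T * math.sin(theta),
                z + v_z * T,
                theta)
    return (x + (v_x / omega) * (math.sin(omega * T + theta) - math.sin(theta)),
            y + (v_x / omega) * (math.cos(theta) - math.cos(omega * T + theta)),
            z + v_z * T,
            theta + omega * T)

def build_mpl(pose, speeds, yaw_rates, T):
    """A forward-arc MPL: one primitive per action (v_x, v_z=0, omega)
    on a uniform grid of speeds and yaw rates."""
    return [((v, 0.0, w), propagate_unicycle(pose, v, 0.0, w, T))
            for v in speeds for w in yaw_rates]
```

Rebuilding the MPL from the current pose at each time step is what keeps every offered primitive dynamically consistent with the robot's state.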

(13) To ensure continuity in the selection of motion primitives such that the trajectory followed by the vehicle is smooth, an MPL collection is generated based on a set of finely discretized initial states in the higher derivatives. The MPL in the collection to which the initial condition is sufficiently close is chosen to be the motion library at each time t. An example of the forward-arc MPL is shown in FIGS. 3A and 3B for ground and air vehicles, respectively.

(14) FIG. 3C depicts the MPL selected at each time t with the initial condition matching that of the current robot state. The selected motion primitive at each time t forms a smooth trajectory.

(15) Operator behavior can be represented by modeling the underlying intent of the operator as an optimal controller. It may be assumed that the operator is a rational, single intent agent with smooth transitions in movements. Specifically, the operator inherently optimizes a reward function, but the choices of actions at each time step poorly reflect this function. Because the operator is assumed to achieve single intent tasks in this scenario, the reward function is assumed to not depend on time. At each input time t, the operator issues action a which is in some neighborhood of the optimal action a*:

(16)
$$
a^* = \operatorname*{argmax}_{a} R_t(\gamma(a)) = \operatorname*{argmax}_{a} \sum_{i}^{P} \alpha_t^i \phi_t^i(\gamma(a)) \qquad (2)
$$

(17) where the $\phi^i$ are basis functions that describe intrinsic natural human or robot behavior that operators may seek to optimize, and the $\alpha^i$ are their corresponding weights for the $P$ basis functions. The reward function is described here with linear basis terms; however, other representations can also be used. The underlying reward function $\hat{R}_t = \sum_{i}^{P} \hat{\alpha}^i \phi_t^i$ can be inferred from a series of noisy user inputs $\{a_{t-m}, a_{t-m+1}, \dots, a_{t-1}\}$.

(18) Using this model, the inference over the user behavior becomes the solution to the problem:

(19)
$$
\hat{\gamma}_{t+1} = \operatorname*{argmax}_{\gamma_{t+1}(a_{t+1})} R_t\left(\gamma_{t-m:t}(a_{t-m:t}), \gamma_{t+1}(a_{t+1})\right) = \operatorname*{argmax}_{\gamma_{t+1}} \sum_{i}^{P} \alpha_t^i \phi_t^i(\gamma_{t-m:t}, \gamma_{t+1}) \qquad (3)
$$
where $\gamma_{t-m:t} = \{\gamma_{t-m}, \gamma_{t-m+1}, \dots, \gamma_t\}$ represents the trajectory formed from the past $m$ motion primitives, and $\gamma_{t+1} \in \Gamma_{t+1}$.

(20) As user inputs are directly mapped to motion primitives, actions and motion primitives γ.sub.t are equivalent for some fixed duration T. This formulation removes the dependence of trajectories on the continuous input space, thus allowing inference to be made over motion primitives.

(21) The behavior recognition and prediction update for a user is computed based on a distribution over the set of all motion primitives, given the traveled trajectory and an estimate of the reward function. For generality, it is assumed that the prediction window can be selected to accommodate temporal basis functions if needed. The probability of a motion primitive being selected given the past m motion primitives is given by:

(22)
$$
p(\gamma_{t+1} \mid \gamma_{t-m:t}, \hat{R}_t) = \frac{p(\hat{R}_t \mid \gamma_{t-m:t}, \gamma_{t+1})\, p(\gamma_{t+1} \mid \gamma_{t-m:t})}{p(\hat{R}_t \mid \gamma_{t-m:t})} = \eta\, p(\hat{R}_t \mid \gamma_{t-m:t}, \gamma_{t+1})\, p(\gamma_{t+1} \mid \gamma_{t-m:t}) \qquad (4)
$$
where $p(\hat{R}_t \mid \gamma_{t-m:t}, \gamma_{t+1})$ is a distribution over the estimated reward function of the user and $\eta$ is a normalization weight.

(23) To infer the true underlying reward function from a past window of $m$ motion primitives, a belief distribution of the reward, $p(\hat{R}_t \mid \gamma_{t-m:t}, \gamma_{t+1})$, given the set of motion primitives, is constructed via online function approximation of the reward function $\hat{R}_t(\gamma_{t-m:t}, \gamma_{t+1}) = \sum_{i}^{P} \hat{\alpha}^i \phi_t^i(\gamma_{t-m:t}, \gamma_{t+1})$. This function is estimated using Locally Weighted Projection Regression (LWPR), a computationally efficient online method for local approximation of high-dimensional nonlinear functions. The incremental algorithm performs global function approximation by taking a weighted sum of the local regressors that influence the region. Note that this formulation of operator intent is task independent unless the basis functions incorporate environment or task information.

(24) The regression over the reward bases is defined with respect to a linear global reward function that is estimated using LWPR:

(25)
$$
\hat{R}_t = \sum_{i}^{P} \hat{\alpha}^i \phi_t^i = \frac{1}{\sum_{j}^{Q} d_j} \sum_{j}^{Q} d_j \hat{y}_j, \qquad \hat{\alpha}^T \Phi_t = \frac{1}{\sum_{j}^{Q} d_j} \sum_{j}^{Q} d_j \left(\beta_j^T \Phi_j\right) \qquad (5)
$$

(26) where $\hat{y}_j = \beta_j^T \Phi_j$ are the individual receptive fields used in LWPR, and $d_j$ is the measure of locality of the $j$th receptive field, out of a total of $Q$ receptive fields.
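The structure of Eq. (5), a locality-weighted average of local linear models, can be illustrated with a toy stand-in for LWPR. This is not the actual LWPR library; the Gaussian form of the locality weights $d_j$ is an assumption made for the sketch:

```python
import math

def lwpr_style_predict(phi, receptive_fields):
    """Locality-weighted average of local linear models (structure of Eq. 5).

    Each receptive field is a (center, width, beta) tuple; d_j is a Gaussian
    locality weight and beta_j^T phi is the local linear prediction y_hat_j.
    """
    num = den = 0.0
    for center, width, beta in receptive_fields:
        # d_j: locality of the j-th receptive field around its center
        d = math.exp(-sum((p - c) ** 2 for p, c in zip(phi, center))
                     / (2.0 * width ** 2))
        y_hat = sum(b * p for b, p in zip(beta, phi))  # local model beta_j^T phi
        num += d * y_hat
        den += d
    return num / den if den > 0.0 else 0.0
```

With a single receptive field, the locality weight cancels and the prediction reduces to the local linear model itself, which is a quick sanity check on the weighting.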

(27) The online estimation of the user behavior provides insight into the prediction of the user behavior based on hindsight, which can be constructively utilized to aid the choice of selection in the control of the robot. Using the update Eq. (4), the framework infers an empirical distribution over the uniform, dense set of all motion primitives. The model prior is iteratively adapted following Eq. 4. At each time step, the prior reflects the distribution of the likelihood of the user selecting the next motion primitive that maximizes their intent function based on the estimate of the reward function at the previous time step.
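One iteration of the update in Eq. (4) is a standard Bayes step over the discrete set of motion primitives, sketched here with plain Python lists (names are illustrative):

```python
def update_intent_distribution(prior, reward_likelihood):
    """Posterior over next primitives per Eq. (4): eta * likelihood * prior.

    prior[n]             ~ p(gamma_n | gamma_{t-m:t})      (model prior)
    reward_likelihood[n] ~ p(R_hat | gamma_{t-m:t}, gamma_n)
    """
    unnorm = [l * p for l, p in zip(reward_likelihood, prior)]
    eta = 1.0 / sum(unnorm)  # normalization weight eta from Eq. (4)
    return [eta * u for u in unnorm]
```

Feeding the returned posterior back in as the next prior gives the iterative adaptation described above: primitives whose reward likelihood stays high accumulate probability mass over successive time steps.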

(28) The subset of available motion primitives can be adaptively modified from an underlying, dense uniform discretization, such that the density of the subsampling reflects the reward function distribution, $p(\hat{R} \mid \gamma_{t-m:t}, \gamma_{t+1})$. Adapting the allowable set of motion primitives provides fine-grained control of the action to be taken by the robot and filters out inputs misaligned with the user's underlying intent. By construction, motions selected from the subsampled set closely advance the user's underlying intention and avoid motions contrary to the user's interest. The key assumption here is that the human operator follows a satisficing property, i.e., the user tends to converge to and operate near a small set of actions within the region of interest, unless the region of interest changes. Sampling from a dense, underlying set of motion primitives with respect to a belief distribution that reflects the operator's predicted intent provides the user with full control, offering finer precision near the region of interest while eliminating noisy inputs. The algorithm for selecting the subset of available motion primitives is shown in FIG. 4.

(29) Let the weight of the $n$th motion primitive be $w_n = p(\hat{R} \mid \gamma_{t-m:t}, \gamma_{t+1})$. Given a motion primitive library $\Gamma$ of size $N$, we sample $K$ motion primitives with replacement using the weights $\{w_n\}$, $n = 1, \dots, N$, such that we obtain a subsampled set $\bar{\Gamma} = \{\gamma^k\} \subset \Gamma$, $k = 1, \dots, K$.

(30) The choice of motion primitive is limited to this set. Using the selector function shown in Eq. (6), the motion primitive whose parameterization is closest to the actual joystick value $a_{\text{joy}}$ is selected.

(31)
$$
\gamma_{\text{selected}} = \gamma\left\{\operatorname*{argmin}_{a} \lVert a - a_{\text{joy}} \rVert\right\} \in \bar{\Gamma} \qquad (6)
$$
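For scalar actions, the subsampling step of FIG. 4 and the selector of Eq. (6) reduce to weighted sampling with replacement followed by a nearest-action lookup. A minimal sketch, with illustrative names:

```python
import random

def subsample_mpl(actions, weights, k, rng=None):
    """Draw K actions with replacement, weighted by w_n (FIG. 4)."""
    rng = rng or random.Random(0)  # seeded for repeatability in this sketch
    return rng.choices(actions, weights=weights, k=k)

def select_primitive(allowed_actions, a_joy):
    """Selector of Eq. (6): the allowed action closest to the joystick value."""
    return min(allowed_actions, key=lambda a: abs(a - a_joy))
```

With weights peaked near the predicted intent, the subsampled set densifies around the region of interest, so the joystick resolves to finer-grained actions there while outlying actions are rarely offered.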

(32) A visualization of this algorithm is provided in FIG. 5. Motion primitives and the distribution over them are shown at selected times ($t = 0$, $t = 5$, and $t = 150$) for an aerial robot along a racetrack. The prediction becomes more peaked near the mean of the predicted motion primitive.

(33) Joystick steering entropy is a direct, online method to efficiently evaluate the workload of an operator given continuous inputs, without asking the operator to divert their focus. Steering entropy quantifies the smoothness of the operator's actions directly from past inputs, such that only hindsight information is used. Steering entropy is used herein to measure the efficiency of the operator for each trial. Given a continuous input $u \in \mathbb{R}$, the input error is the difference between the actual input provided by the user and a second-order Taylor approximation of the input at time $t$:
$$
e_t = u_t - \hat{u}_t \qquad (7)
$$
where
$$
\hat{u}_t = u_{t-1} + (u_{t-1} - u_{t-2}) + \tfrac{1}{2}\left((u_{t-1} - u_{t-2}) - (u_{t-2} - u_{t-3})\right) \qquad (8)
$$

(34) A frequency distribution of the error is then constructed and divided into 9 bins. The total steering entropy, $H_p$, for each run is given by:
$$
H_p = \sum_i -P_i \log_9 P_i \qquad (9)
$$

(35) A slight modification to the algorithm is made by padding the proportion of each bin by a small constant $e$ to avoid asymptotes:

(36)
$$
P_i = \frac{n_i}{\sum_i n_i} + e, \qquad i = 1, \dots, 9 \qquad (10)
$$

(37) As efficiency increases, the steering entropy decreases accordingly.
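Putting Eqs. (7)-(10) together, steering entropy for a one-dimensional input trace can be computed as follows. Placing the 9 bin edges uniformly over the observed error range is an assumption of this sketch (the text does not specify bin placement), and `pad` plays the role of the constant $e$ in Eq. (10):

```python
import math

def steering_entropy(inputs, num_bins=9, pad=1e-3):
    """Steering entropy of a 1-D input sequence (Eqs. 7-10)."""
    errors = []
    for t in range(3, len(inputs)):
        u1, u2, u3 = inputs[t - 1], inputs[t - 2], inputs[t - 3]
        # Eq. (8): second-order Taylor extrapolation of the input
        u_hat = u1 + (u1 - u2) + 0.5 * ((u1 - u2) - (u2 - u3))
        errors.append(inputs[t] - u_hat)            # Eq. (7)
    lo, hi = min(errors), max(errors)
    width = (hi - lo) / num_bins or 1.0             # guard: all-zero errors
    counts = [0] * num_bins
    for e in errors:
        counts[min(int((e - lo) / width), num_bins - 1)] += 1
    total = sum(counts)
    props = [c / total + pad for c in counts]       # Eq. (10): pad by e
    return -sum(p * math.log(p, 9) for p in props)  # Eq. (9)
```

A smooth (e.g., linear) input trace concentrates the errors in a single bin and yields entropy near zero; erratic inputs spread the errors across bins and raise the entropy, matching the statement that efficiency and steering entropy move in opposite directions.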

(38) A task-independent adaptive teleoperation methodology that seeks to improve operator performance and efficiency by concurrently modeling user intent and adapting the set of available actions according to the predicted intent has been set forth herein. The framework may be used by any human-controlled mobile robot and the invention is not meant to be limited by specific uses, but instead, the scope of the invention is set forth in the following claims.

(39) The invention has certain advantages in situations with high latency between the operator and the robot. In one embodiment, the underlying mathematical form from which high-frequency control references are generated locally is transmitted to the robot, rather than sending the high-frequency control references directly. Sensitivity to latency is thereby greatly decreased, as the high-frequency control references typically susceptible to round-trip latency performance and stability degradation are replaced with primitive objects that are less affected by latency. In another embodiment, recommendations of how the robot should move in response to remote operator guidance are sent to the robot in a mathematical form that the robot can transform into local high-frequency references. Should a reference be delayed or lost, a prior motion plan is already operational and being executed in a stable manner. The lower-level references that would be more susceptible to latency are not sent; instead, the system on the robot is trusted to make its own local decisions to ensure stability.

(40) In preferred embodiments, the system implementing the method of the invention will be present on board the mobile robot and will consist of, for example, a processor running software to perform the steps of the method. The method may share memory and processor with, and may be implemented as part of, the control system for the robot. The method will preferably include a communications channel to the control software of the robot enabling the method to receive information regarding the current state of the robot as well as input from environmental sensors. In preferred embodiments, the robot may be teleoperated via a wireless connection by a user having a means for providing input controls used for determining the selected motion primitive, for example, a joystick or steering wheel.