Method and system for robot action imitation learning in three-dimensional space
11529733 · 2022-12-20
CPC classification
B25J9/1607, B25J9/161, B25J9/1653, B25J9/1664, B25J13/088 (Performing operations; Transporting); G05B19/42 (Physics)
Abstract
The present invention provides a method and system for robot action imitation learning in a three-dimensional space, relating to the technical fields of artificial intelligence and robotics. A method based on a series-parallel multi-layer backpropagation (BP) neural network is designed for robot action imitation learning in a three-dimensional space. The method applies an imitation learning mechanism to a robot learning system: within this framework, demonstrative information generated by a mechanical arm is transmitted to the series-parallel multi-layer BP neural network representing a motion strategy for training and learning. The network learns the correspondence between a state characteristic matrix set of the motion and an action characteristic matrix set of the motion, so that it can reproduce the demonstrative action and generalize the learned actions and behaviors. When facing different tasks, the method therefore does not need to carry out action planning separately, thereby achieving high intelligence.
Claims
1. A method for robot action imitation learning in a three-dimensional space, comprising: dividing a three-dimensional space based on a workspace requirement of a robot mechanical arm; obtaining mechanical arm joint angle state information, end position state information and characteristic information in the three-dimensional space during demonstration; setting up a mechanical arm system state space model based on the mechanical arm joint angle state information and end position state information; obtaining a state characteristic matrix set and an action characteristic matrix set of mechanical arm motion based on the mechanical arm system state space model and characteristic information; obtaining a series-parallel multi-layer backpropagation (BP) neural network model for action imitation learning based on the state characteristic matrix set of mechanical arm motion, the action characteristic matrix set of mechanical arm motion and a pre-built series-parallel multi-layer BP neural network model; obtaining a state characteristic matrix set of mechanical arm motion for a real-time target task, and obtaining an action characteristic matrix set of mechanical arm motion for the real-time target task based on the series-parallel multi-layer BP neural network model and the state characteristic matrix set of mechanical arm motion for the real-time target task, so that the mechanical arm executes an action defined by the action characteristic matrix set of mechanical arm motion for the real-time target task through joint rotation.
2. The method for robot action imitation learning in a three-dimensional space according to claim 1, wherein the characteristic information comprises: mechanical arm system state characteristic information, and task-related environment information; the mechanical arm system state characteristic information comprises: angle state information of each current joint angle of the mechanical arm, and position state information of an end point of the mechanical arm; the task-related environment information comprises: position state information of a demonstrative-task target point in a spatial coordinate system.
3. The method for robot action imitation learning in a three-dimensional space according to claim 1, wherein the method further comprises, before the obtaining a state characteristic matrix set and an action characteristic matrix set of mechanical arm motion: constructing a characteristic matrix set with selected characteristic parameters from the task-related characteristic information, and preprocessing the characteristic matrix set.
4. The method for robot action imitation learning in a three-dimensional space according to claim 3, wherein the task-related characteristic information comprises: mechanical arm system state characteristic information and environment state characteristic information; the mechanical arm system state characteristic information comprises: angle state characteristic θ.sub.ij of each current joint angle of the mechanical arm, and position state characteristic S.sub.k of the end point of the mechanical arm; the environment state characteristic information comprises: position state characteristic G.sub.k of a demonstrative-task target point, where G.sub.k denotes the coordinate of the demonstrative-task target point G in direction k.
5. The method for robot action imitation learning in a three-dimensional space according to claim 4, wherein the preprocessing includes: calculating an action characteristic parameter of the mechanical arm according to the equation below:
α.sub.ijt=θ.sub.ijt+1−θ.sub.ijt where θ.sub.ijt+1 is the angle of a joint angle i in the direction j at time t+1, θ.sub.ijt is the angle of the joint angle i in the direction j at time t, α.sub.ijt is the angle that the joint angle i rotates in the direction j at time t; calculating a state characteristic parameter of the mechanical arm relative to a target point according to the equation below:
D.sub.kt=S.sub.kt−G.sub.kt where S.sub.kt is the coordinate of the end point S of the mechanical arm in the direction k at time t, G.sub.kt is the coordinate of the target point G in the direction k at time t, D.sub.kt is the relative displacement of the end point S of the mechanical arm with respect to the target point G in the direction k at time t.
6. The method for robot action imitation learning in a three-dimensional space according to claim 1, wherein the pre-built series-parallel multi-layer BP neural network model comprises an input layer, a hidden layer and an output layer; the input layer contains S nodes, the output layer contains T nodes and the hidden layer contains C nodes; the number of hidden-layer nodes is calculated according to the equation below:
C=⌊√{square root over (S+T)}+a⌋ where a is an adjustment constant, 0≤a≤10; and ⌊·⌋ is a round-down (floor) function.
7. The method for robot action imitation learning in a three-dimensional space according to claim 1, wherein the obtaining a series-parallel multi-layer BP neural network model for action imitation learning comprises: S401. obtaining input state samples X=[X.sub.1, X.sub.2, . . . , X.sub.n, . . . , X.sub.N] and expected output action samples Y=[Y.sub.1, Y.sub.2, . . . , Y.sub.n, . . . , Y.sub.N] at a corresponding time from the state characteristic matrix set and the action characteristic matrix set, where X.sub.n is an nth input sample, and the nth input sample X.sub.n matrix contains P joint angle state parameters and Q relative displacement state parameters, totaling L parameters, denoted as X.sub.n=[x.sub.1.sup.n, x.sub.2.sup.n, . . . , x.sub.l.sup.n, . . . , x.sub.L.sup.n].sup.T, where x.sub.l.sup.n is an lth input parameter of the nth input sample; Y.sub.n is the expected output action result sample for the nth input sample X.sub.n, Y.sub.n contains P joint angle of rotation action result parameters, denoted as Y.sub.n=[y.sub.1.sup.n, y.sub.2.sup.n, . . . , y.sub.p.sup.n, . . . , y.sub.P.sup.n].sup.T, where y.sub.p.sup.n is a pth expected output result parameter for the nth sample, 1≤n≤N; S402. randomly selecting k input samples (X.sub.1, X.sub.2, X.sub.3, . . . , X.sub.k) from the samples, as the input samples for an mth training, where m is the number of trainings and is initialized as m=1; S403. obtaining an output value u.sub.p.sup.k of a pth node in the hidden layer of a kth training sample in the mth training according to the equation below, to obtain a hidden-layer node output value matrix u.sub.k=[u.sub.1.sup.k, u.sub.2.sup.k, . . . , u.sub.p.sup.k, . . . , u.sub.C.sup.k] of the kth training sample in the mth training:
u.sub.p.sup.k=ω.sub.p.sup.m×X.sub.k.sup.T where ω.sub.p.sup.m is a link weight coefficient matrix in the mth training between the pth node in the hidden layer and S starting nodes in the input layer, ω.sub.p.sup.m=[ω.sub.p1.sup.m, ω.sub.p2.sup.m, . . . , ω.sub.pj.sup.m, . . . , ω.sub.pS.sup.m]; ω.sub.pj.sup.m denotes a link weight coefficient in the mth training between a jth input-layer node and the pth hidden-layer node; X.sub.k.sup.T is the transpose of the kth training sample; S404. calculating an output value y.sub.q.sup.k of a qth node in the output layer of the kth training sample in the mth training according to the equation below, to obtain an output-layer node output value matrix y.sub.k=[y.sub.1.sup.k, y.sub.2.sup.k, . . . , y.sub.q.sup.k, . . . , y.sub.T.sup.k] of the kth training sample in the mth training:
y.sub.q.sup.k=a.sup.m+λ.sub.q.sup.m×u.sub.k.sup.T+φ.sub.q.sup.m×X.sub.k.sup.T where a.sup.m is a bias term matrix of q nodes in the output layer in the mth training, a.sup.m=[a.sub.1.sup.m, a.sub.2.sup.m, . . . , a.sub.q.sup.m, . . . , a.sub.T.sup.m].sup.T; λ.sub.q.sup.m is a link weight coefficient matrix in the mth training between the qth node in the output layer and C nodes in the hidden layer, λ.sub.q.sup.m=[λ.sub.q1.sup.m, λ.sub.q2.sup.m, . . . , λ.sub.qp.sup.m, . . . , λ.sub.qC.sup.m]; λ.sub.qp.sup.m denotes a link weight coefficient in the mth training between a pth hidden-layer node and the qth output-layer node; φ.sub.q.sup.m is a link weight coefficient matrix in the mth training between the qth node in the output layer and S nodes in the input layer, φ.sub.q.sup.m=[φ.sub.q1.sup.m, φ.sub.q2.sup.m, . . . , φ.sub.qj.sup.m, . . . , φ.sub.qS.sup.m]; φ.sub.qj.sup.m denotes a link weight coefficient in the mth training between a jth input-layer node and the qth output-layer node; X.sub.k.sup.T is the transpose of the kth training sample; S405. calculating a mean-square error (MSE) between output results of the output-layer nodes in the mth training and expected output results of an mth group of input samples (X.sub.1, X.sub.2, X.sub.3, . . . , X.sub.k) according to the equation below:
C=⌊√{square root over (S+T)}+a⌋
u.sub.p.sup.k=ω.sub.p*×X.sub.k.sup.T
y.sub.q.sup.k=a.sub.q*+λ.sub.q*×u.sub.k.sup.T+φ.sub.q*×X.sub.k.sup.T where C is the number of hidden-layer nodes, T is the number of output-layer nodes, S is the number of input-layer nodes; a is an adjustment constant, 0≤a≤10; ⌊·⌋ is a round-down (floor) function; u.sub.p.sup.k is an output value of a pth node in the hidden layer; y.sub.q.sup.k is an output value of a qth node in the output layer; ω.sub.p*, λ.sub.q* and φ.sub.q* are the optimal link weight coefficient matrices; a.sub.q* is the bias term; X.sub.k.sup.T is the transpose of a kth training sample.
8. The method for robot action imitation learning in a three-dimensional space according to claim 1, wherein the obtaining a state characteristic matrix set of mechanical arm motion for a real-time target task comprises: obtaining in real time mechanical arm current state information and environment state information, extracting a mechanical arm angle characteristic from the mechanical arm state information, and distance characteristic information of the end point of the mechanical arm to the target point in the environmental state information, to obtain the state characteristic matrix set of mechanical arm motion for a real-time target task.
9. The method for robot action imitation learning in a three-dimensional space according to claim 1, wherein the method further comprises, after the executing an action defined by the action characteristic matrix set: determining whether the mechanical arm completes the target task; the determination comprises: calculating a displacement between the end point of the mechanical arm and the target point in the current state, according to the equation below:
DT=√{square root over (D.sub.x.sup.2+D.sub.y.sup.2+D.sub.z.sup.2)} where D.sub.x, D.sub.y, D.sub.z are relative displacements of the end point of the mechanical arm with respect to the target point in the directions x, y and z, respectively; DT is the displacement between the end point of the mechanical arm and the target point in the current state; comparing DT with a predetermined error accuracy ψ; if DT≤ψ, determining the target task is completed; if DT>ψ, determining the task is not completed; when the task is not completed, re-obtaining a state characteristic matrix set of mechanical arm motion for the real-time target task.
10. A system for robot action imitation learning in a three-dimensional space, comprising a computer, the computer comprising: at least one storage unit; at least one processing unit; wherein the at least one storage unit is stored with at least one instruction, the at least one instruction is loadable and executable by the at least one processing unit to implement the steps of: dividing a three-dimensional space based on a workspace requirement of a robot mechanical arm; obtaining mechanical arm joint angle state information, end position state information and characteristic information in the three-dimensional space during demonstration; setting up a mechanical arm system state space model based on the mechanical arm joint angle state information and end position state information; obtaining a state characteristic matrix set and an action characteristic matrix set of mechanical arm motion based on the mechanical arm system state space model and characteristic information; obtaining a series-parallel multi-layer backpropagation (BP) neural network model for action imitation learning based on the state characteristic matrix set of mechanical arm motion, the action characteristic matrix set of mechanical arm motion and a pre-built series-parallel multi-layer BP neural network model; obtaining a state characteristic matrix set of mechanical arm motion for a real-time target task, and obtaining an action characteristic matrix set of mechanical arm motion for the real-time target task based on the series-parallel multi-layer BP neural network model and the state characteristic matrix set of mechanical arm motion for the real-time target task.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) In order to illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the accompanying drawings needed in the description of the embodiments and the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from these drawings without inventive effort.
DETAILED DESCRIPTION OF PARTICULAR EMBODIMENTS
(8) In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below. The described embodiments are merely some, rather than all, embodiments of the present invention; any other embodiment obtained by those skilled in the art based on the embodiments of the present disclosure without inventive effort shall fall within the scope of protection of the present invention.
(9) The embodiments of the present disclosure provide a method for robot action imitation learning in a three-dimensional space and a system thereof, solving the technical problem in the prior art that robots are insufficiently intelligent in action imitation learning in three-dimensional space, and improving the intelligence of robots in action imitation learning in three-dimensional space.
(10) In order to solve the above technical problem, the technical solutions in the embodiments of the present disclosure have a general idea as follows:
(11) A method based on a series-parallel multi-layer backpropagation (BP) neural network is designed for robot action imitation learning in a three-dimensional space. The method applies an imitation learning mechanism to a robot learning system: within this framework, demonstrative information generated by a mechanical arm is transmitted to the series-parallel multi-layer BP neural network representing a motion strategy for training and learning. The correspondence between a state characteristic matrix set of the motion and an action characteristic matrix set of the motion is learned, so that the demonstrative action can be reproduced and the learned actions and behaviors can be generalized. When facing different tasks, the method therefore does not need to carry out action planning separately, thereby achieving high intelligence.
(12) In order to better understand the above technical solutions, the technical solutions will be described in detail in conjunction with the accompanying drawings and specific implementations.
(13) The present disclosure provides a method for robot action imitation learning in a three-dimensional space as shown in
(14) S1. dividing a three-dimensional space based on a workspace requirement of a robot mechanical arm; obtaining mechanical arm joint angle state information, end position state information and characteristic information in the three-dimensional space during demonstration;
(15) S2, setting up a mechanical arm system state space model based on the mechanical arm joint angle state information and end position state information;
(16) S3. obtaining a state characteristic matrix set and an action characteristic matrix set of mechanical arm motion based on the mechanical arm system state space model and characteristic information;
(17) S4, obtaining a series-parallel multi-layer backpropagation (BP) neural network model for action imitation learning based on the state characteristic matrix set of mechanical arm motion, the action characteristic matrix set of mechanical arm motion and a pre-built series-parallel multi-layer BP neural network model;
(18) S5. obtaining a state characteristic matrix set of mechanical arm motion for a real-time target task, and obtaining an action characteristic matrix set of mechanical arm motion for the real-time target task based on the series-parallel multi-layer BP neural network model and the state characteristic matrix set of mechanical arm motion for the real-time target task, so that the mechanical arm executes an action defined by the action characteristic matrix set of mechanical arm motion for the real-time target task through joint rotation.
(19) The embodiment of the present disclosure applies an imitation learning mechanism to a robot learning system: within this framework, demonstrative information generated by the mechanical arm is transmitted to a series-parallel multi-layer BP neural network representing a motion strategy for training and learning. The correspondence between a state characteristic matrix set of the motion and an action characteristic matrix set of the motion is learned, so that the demonstrative action can be reproduced and the learned actions and behaviors can be generalized. When facing different tasks, the embodiment therefore does not need to carry out action planning separately, thereby achieving high intelligence.
(20) The steps are described in detail below:
(21) It should be noted that, in this embodiment of the present disclosure, information is collected by sensors; e.g., task-related information during demonstration can be collected by lidar sensors and infrared sensors mounted on the mechanical arm.
(22) Step S1, dividing a three-dimensional space based on a workspace requirement of a robot mechanical arm and obtaining mechanical arm joint angle state information, end position state information and characteristic information in the three-dimensional space during demonstration, may include, specifically:
(23) S101. based on a specified reachable range of the mechanical arm, dividing an actual three-dimensional space into M regions in the direction that the robot faces, with the first motion joint, the shoulder, as the coordinate origin, such that an end point S of the mechanical arm is draggable within the M regions to demonstrate reaching a target point. It should be noted that, in this embodiment of the present disclosure, the three-dimensional space is divided into eight regions, namely I, II, III, IV, V, VI, VII and VIII, and the division is shown in
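For illustration only (not part of the claimed method), the eight-region division around the shoulder origin can be sketched as an octant lookup. The mapping from sign pattern to region label is an assumption here; the embodiment does not specify which octant carries which numeral:

```python
def octant(x, y, z):
    """Return a region label I-VIII for a point relative to the shoulder origin.

    Assumption: regions are numbered by the sign pattern of (x, y, z);
    this mapping is illustrative only.
    """
    labels = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII"]
    index = (x < 0) * 4 + (y < 0) * 2 + (z < 0) * 1
    return labels[index]

# A point with all-positive coordinates falls in the first region
# under this illustrative numbering.
print(octant(0.3, 0.2, 0.5))  # I
```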
(24) S102. demonstrating by dragging the mechanical arm in the three-dimensional space, and obtaining mechanical arm joint angle state information, end position state information and characteristic information in the three-dimensional space during demonstration.
(25) In the embodiment of the present disclosure, the mechanical arm includes three joints and three connecting rods. The three joints include: shoulder (joint 1), elbow (joint 2) and wrist (joint 3); the three connecting rods include: connecting rod 1, connecting rod 2 and connecting rod 3.
(26) The characteristic information includes mechanical arm system state characteristic information and environment state characteristic information. Specifically, the mechanical arm system state characteristic information includes: angle state characteristic θ.sub.ij of each current joint angle of the mechanical arm, and position state characteristic S.sub.k of the end point of the mechanical arm. The environment state characteristic information includes: position state characteristic G.sub.k of a demonstrative-task target point, where G.sub.k denotes the coordinate of the point G in direction k.
(27) It should be noted that, in the embodiment of the present disclosure, assuming the above three characteristics (θ.sub.ij, S.sub.k, G.sub.k) have P, Q and R characteristic parameters, respectively, the characteristic matrix set v=[v.sub.1, v.sub.2, . . . , v.sub.s, . . . , v.sub.S] has P+Q+R characteristic parameters, where v.sub.s denotes an sth characteristic parameter, and the first P characteristic parameters in the characteristic matrix set v are the angle state characteristic parameters of the joint angles of the mechanical arm; the (P+1)th to (P+Q)th characteristic parameters are the mechanical arm end point position state characteristic parameters; the (P+Q+1)th to (P+Q+R)th characteristic parameters are the position state characteristic parameters of the target point, s=1, 2, . . . , S, S=P+Q+R. In the embodiment of the present disclosure, the angle state characteristic parameters of the joint angles of the mechanical arm include: the angles of the first joint angle in the directions x, y and z; the angles of the second joint angle in the directions x, y and z; and the angles of the third joint angle in the directions x, y and z, that is, P=3+3+3=9. The mechanical arm end point position state characteristic parameters include: the coordinates of the end point S of the mechanical arm in the directions x, y and z, that is, Q=3. The position state characteristic parameters of the target point include: the coordinates of the demonstrative-task target point G in the directions x, y and z, that is, R=3.
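The layout of the characteristic matrix set v (P=9 joint-angle parameters, then Q=3 end-point coordinates, then R=3 target coordinates) can be sketched as a simple concatenation. This is an illustrative sketch; the function name and array types are not from the disclosure:

```python
import numpy as np

def characteristic_matrix(theta, s_end, g):
    """Build v = [v_1, ..., v_15] for this embodiment.

    theta: 3x3 joint angles (3 joints x directions x, y, z) -> P = 9
    s_end: end-point coordinates (x, y, z)                  -> Q = 3
    g:     target-point coordinates (x, y, z)               -> R = 3
    """
    theta = np.asarray(theta, dtype=float).reshape(9)
    s_end = np.asarray(s_end, dtype=float).reshape(3)
    g = np.asarray(g, dtype=float).reshape(3)
    return np.concatenate([theta, s_end, g])  # S = P + Q + R = 15 parameters

v = characteristic_matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                          [0.1, 0.2, 0.3], [1.0, 1.0, 1.0])
print(v.shape)  # (15,)
```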
(28) Step S2, setting up a mechanical arm system state space model based on the mechanical arm joint angle state information and end position state information, may include, specifically:
(29) Because tasks performed by a robot mechanical arm can be described directly and conveniently with the mechanical arm end position, and because the spatial position and motion state of a fixed-length mechanical arm are determined by the angle of rotation of the joint, the mechanical arm system state space model can be expressed with the mechanical arm joint angle state variables and the mechanical arm end point position state variables, in the following expression:
ξ.sub.t=(θ.sub.ij,S.sub.k),
(30) where
(31) θ.sub.ij denotes the angle of a joint angle i in the direction j;
(32) S.sub.k denotes the coordinate of a point S in the direction k.
(33) Step S3, obtaining a state characteristic matrix set and an action characteristic matrix set of mechanical arm motion based on the mechanical arm system state space model and characteristic information, may include, specifically:
(34) S301. constructing a characteristic matrix set with selected characteristic parameters from the task-related characteristic information, and preprocessing the characteristic matrix set. The preprocessing includes:
(35) calculating an action characteristic parameter of the mechanical arm according to Equation (1):
α.sub.ijt=θ.sub.ijt+1−θ.sub.ijt (1)
(36) where
(37) θ.sub.ijt+1 is the angle of a joint angle i in the direction j at time t+1, θ.sub.ijt is the angle of the joint angle i in the direction j at time t, α.sub.ijt is the angle that the joint angle i rotates in the direction j at time t;
(38) calculating a state characteristic parameter of the mechanical arm relative to a target point according to Equation (2):
D.sub.kt=S.sub.kt−G.sub.kt (2)
(39) where
(40) S.sub.kt is the coordinate of the end point S of the mechanical arm in the direction k at time t, G.sub.kt is the coordinate of the target point G in the direction k at time t, D.sub.kt is the relative displacement of the end point S of the mechanical arm with respect to the target point G in the direction k at time t.
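Equations (1) and (2) amount to a first difference of the joint angles between consecutive time steps and a relative displacement of the end point to the target. A minimal sketch (function and array names are illustrative, not from the disclosure):

```python
import numpy as np

def action_params(theta_t, theta_t1):
    """Eq. (1): alpha_ij(t) = theta_ij(t+1) - theta_ij(t), per joint i, direction j."""
    return np.asarray(theta_t1, dtype=float) - np.asarray(theta_t, dtype=float)

def relative_displacement(s_t, g_t):
    """Eq. (2): D_k(t) = S_k(t) - G_k(t), per direction k in {x, y, z}."""
    return np.asarray(s_t, dtype=float) - np.asarray(g_t, dtype=float)

# Joint 1 rotates by 1 degree in the x direction; the other angles are unchanged.
alpha = action_params([[0, 0, 0], [0, 0, 0], [0, 0, 0]],
                      [[1, 0, 0], [0, 0, 0], [0, 0, 0]])
d = relative_displacement([1.0, 2.0, 3.0], [0.0, 2.0, 3.0])
print(alpha[0, 0], list(d))  # 1.0 [1.0, 0.0, 0.0]
```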
(41) S302. obtaining a state characteristic matrix set and an action characteristic matrix set of mechanical arm motion based on the preprocessed characteristic matrix set. Specifically,
(42) Task-related action characteristics of mechanical arm motion are the angle-of-rotation characteristics of the mechanical arm joint angles, and hence the action characteristic matrix set of mechanical arm motion is α.sub.ijt=[α.sub.ij].sub.t.
(43) Task-related state characteristics of mechanical arm motion include: mechanical arm joint angle state characteristics, and mechanical arm end-point-to-target-point displacement state characteristics, and hence the state characteristic matrix set of mechanical arm motion is ζ.sub.t=[θ.sub.ij, D.sub.k].sub.t.
(44) In the embodiment of the present disclosure, task-related action characteristic parameters of mechanical arm motion include: the angles that the first joint angle rotates in the directions x, y and z at time t; the angles that the second joint angle rotates in the directions x, y and z at time t; the angles that the third joint angle rotates in the directions x, y and z at time t. The action characteristic matrix set is expressed as:
α.sub.ijt=[α.sub.11,α.sub.12,α.sub.13,α.sub.21,α.sub.22,α.sub.23,α.sub.31,α.sub.32,α.sub.33].sub.t
(45) where
(46) α.sub.11, α.sub.12, α.sub.13 are action characteristic parameters of the first joint angle; α.sub.21, α.sub.22, α.sub.23 are action characteristic parameters of the second joint angle; α.sub.31, α.sub.32, α.sub.33 are action characteristic parameters of the third joint angle.
(47) Task-related state characteristic parameters of mechanical arm motion include: the angles of the first joint angle of the mechanical arm in the directions x, y and z at time t; the angles of the second joint angle of the mechanical arm in the directions x, y and z at time t; the angles of the third joint angle of the mechanical arm in the directions x, y and z at time t; the displacements of the end point of the mechanical arm with respect to the target point in the directions x, y and z at time t. The state characteristic matrix set is expressed as:
ζ.sub.t=[θ.sub.11,θ.sub.12,θ.sub.13,θ.sub.21,θ.sub.22,θ.sub.23,θ.sub.31,θ.sub.32,θ.sub.33,D.sub.x,D.sub.y,D.sub.z].sub.t
(48) where
(49) θ.sub.11, θ.sub.12, θ.sub.13 are angle state characteristic parameters of the first joint angle; θ.sub.21, θ.sub.22, θ.sub.23 are angle state characteristic parameters of the second joint angle; θ.sub.31, θ.sub.32, θ.sub.33 are angle state characteristic parameters of the third joint angle; D.sub.x, D.sub.y, D.sub.z are relative displacement state characteristic parameters of the end point of the mechanical arm with respect to the target point.
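The state and action characteristic matrix sets above can be assembled as flat vectors (12 state parameters, 9 action parameters for this three-joint embodiment). An illustrative sketch, with names not taken from the disclosure:

```python
import numpy as np

def state_vector(theta, d):
    """zeta_t = [theta_11 .. theta_33, D_x, D_y, D_z]: 9 + 3 = 12 parameters."""
    return np.concatenate([np.ravel(np.asarray(theta, dtype=float)),
                           np.ravel(np.asarray(d, dtype=float))])

def action_vector(alpha):
    """alpha_t = [alpha_11 .. alpha_33]: 9 joint-rotation parameters."""
    return np.ravel(np.asarray(alpha, dtype=float))

zeta = state_vector([[0, 0, 0], [0, 0, 0], [0, 0, 0]], [1.0, 2.0, 3.0])
act = action_vector([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(zeta.shape, act.shape)  # (12,) (9,)
```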
(50) Step S4, obtaining a series-parallel multi-layer BP neural network model for action imitation learning by training a pre-built series-parallel multi-layer BP neural network model based on the state characteristic matrix set and action characteristic matrix set of mechanical arm motion, may include, specifically:
(51) In the embodiment of the present disclosure, the pre-built series-parallel multi-layer BP neural network model includes an input layer, a hidden layer and an output layer. Specifically, the input layer contains S nodes, the output layer contains T nodes and the hidden layer contains C nodes. The number C of hidden-layer nodes is calculated according to Equation (3):
C=⌊√{square root over (S+T)}+a⌋ (3)
(52) where
(53) T is the number of output-layer nodes, S is the number of input-layer nodes, a is an adjustment constant, 0≤a≤10, and ⌊·⌋ is a round-down (floor) function.
(54) In the embodiment of the present disclosure, the mechanical arm has three joint angles and moves in a three-dimensional space, and there are nine (3×3) joint angle state characteristic parameters and three displacement state characteristic parameters of the end point of the mechanical arm with respect to the target point; therefore, the input layer contains S=12 nodes. The three joint angles of the mechanical arm move in a three-dimensional space, and there are nine (3×3) action characteristic parameters of the mechanical arm; therefore, the output layer contains T=9 nodes.
(55) In practice, the adjustment constant may be determined so that the number of hidden-layer nodes is between the number of input-layer nodes and the number of output-layer nodes. In an embodiment of the present disclosure, a=6. Because the number of input-layer nodes is 12 and the number of output-layer nodes is 9, the number of hidden-layer nodes is calculated to be 10.
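With the values of this embodiment (S = 12, T = 9, a = 6), Equation (3) can be checked directly; a one-line sketch:

```python
import math

def hidden_nodes(s, t, a):
    """Eq. (3): C = floor(sqrt(S + T) + a), with adjustment constant 0 <= a <= 10."""
    return math.floor(math.sqrt(s + t) + a)

# sqrt(12 + 9) + 6 = 4.58... + 6 = 10.58..., rounded down to 10,
# matching the embodiment's 10 hidden-layer nodes.
print(hidden_nodes(12, 9, 6))  # 10
```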
(56) S401. obtaining input state samples X=[X.sub.1, X.sub.2, . . . , X.sub.n, . . . , X.sub.N] and expected output action samples Y=[Y.sub.1, Y.sub.2, . . . , Y.sub.n, . . . , Y.sub.N] at a corresponding time from the state characteristic matrix set and the action characteristic matrix set, where X.sub.n is an nth input sample, and the nth input sample X.sub.n matrix contains P joint angle state parameters and Q relative displacement state parameters, totaling L parameters, denoted as X.sub.n=[x.sub.1.sup.n, x.sub.2.sup.n, . . . , x.sub.l.sup.n, . . . , x.sub.L.sup.n].sup.T, where x.sub.l.sup.n is an lth input parameter of the nth input sample; Y.sub.n is the expected output action result sample for the nth input sample X.sub.n, Y.sub.n contains P joint angle of rotation action result parameters, denoted as Y.sub.n=[y.sub.1.sup.n, y.sub.2.sup.n, . . . , y.sub.p.sup.n, . . . , y.sub.P.sup.n].sup.T, where y.sub.p.sup.n is a pth expected output result parameter for the nth sample, 1≤n≤N.
(57) In practice, a standard number of samples may be selected from each of the eight spatial regions (I, II, III, IV, V, VI, VII and VIII); the sample size for each region should ensure effective training while being kept as small as possible.
(58) S402. randomly selecting k input samples (X.sub.1, X.sub.2, X.sub.3, . . . , X.sub.k) from the samples, as the input samples for an mth training, where m is the number of trainings and is initialized as m=1;
(59) S403. obtaining an output value u.sub.p.sup.k of a pth node in the hidden layer of a kth training sample in the mth training according to Equation (4), to obtain a hidden-layer node output value matrix u.sub.k=[u.sub.1.sup.k, u.sub.2.sup.k, . . . , u.sub.p.sup.k, . . . , u.sub.C.sup.k] of the kth training sample in the mth training:
u.sub.p.sup.k=ω.sub.p.sup.m×X.sub.k.sup.T (4)
(60) where
(61) ω.sub.p.sup.m is a link weight coefficient matrix in the mth training between the pth node in the hidden layer and S starting nodes in the input layer, ω.sub.p.sup.m=[ω.sub.p1.sup.m, ω.sub.p2.sup.m, . . . , ω.sub.pj.sup.m, . . . , ω.sub.pS.sup.m]; ω.sub.pj.sup.m denotes a link weight coefficient in the mth training between a jth input-layer node and the pth hidden-layer node; X.sub.k.sup.T is the transpose of the kth training sample.
(62) S404. calculating an output value y.sub.q.sup.k of a qth node in the output layer of the kth training sample in the mth training according to Equation (5), to obtain an output-layer node output value matrix y.sub.k=[y.sub.1.sup.k, y.sub.2.sup.k, . . . , y.sub.q.sup.k, . . . , y.sub.T.sup.k] of the kth training sample in the mth training:
y.sub.q.sup.k=a.sup.m+λ.sub.q.sup.m×u.sub.k.sup.T+φ.sub.q.sup.m×X.sub.k.sup.T (5)
(63) where
(64) a.sup.m is a bias term matrix of the T nodes in the output layer in the mth training, a.sup.m=[a.sub.1.sup.m, a.sub.2.sup.m, . . . , a.sub.q.sup.m, . . . , a.sub.T.sup.m].sup.T; λ.sub.q.sup.m is a link weight coefficient matrix in the mth training between the qth node in the output layer and C nodes in the hidden layer, λ.sub.q.sup.m=[λ.sub.q1.sup.m, λ.sub.q2.sup.m, . . . , λ.sub.qp.sup.m, . . . , λ.sub.qC.sup.m]; λ.sub.qp.sup.m denotes a link weight coefficient in the mth training between a pth hidden-layer node and the qth output-layer node; φ.sub.q.sup.m is a link weight coefficient matrix in the mth training between the qth node in the output layer and S nodes in the input layer, φ.sub.q.sup.m=[φ.sub.q1.sup.m, φ.sub.q2.sup.m, . . . , φ.sub.qj.sup.m, . . . , φ.sub.qS.sup.m]; φ.sub.qj.sup.m denotes a link weight coefficient in the mth training between a jth input-layer node and the qth output-layer node; X.sub.k.sup.T is the transpose of the kth training sample.
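Equations (4) and (5) together define the forward pass of the series-parallel network: each output node receives both the hidden-layer outputs and direct links from the input layer. A minimal sketch of this forward pass, with illustrative sizes S, C and T and randomly chosen weights:

```python
import numpy as np

def forward(x, W, Lam, Phi, a):
    """Series-parallel forward pass.

    Equation (4): u = W @ x             (hidden outputs via weights ω)
    Equation (5): y = a + Lam @ u + Phi @ x
    The direct input-to-output links Phi (φ) are what distinguish the
    series-parallel network from a plain two-layer BP network.
    """
    u = W @ x                    # hidden-layer node outputs, shape (C,)
    y = a + Lam @ u + Phi @ x    # output-layer node outputs, shape (T,)
    return u, y

# Illustrative dimensions: S input, C hidden, T output nodes.
S, C, T = 6, 4, 3
rng = np.random.default_rng(1)
W = rng.normal(size=(C, S))      # ω: input → hidden
Lam = rng.normal(size=(T, C))    # λ: hidden → output
Phi = rng.normal(size=(T, S))    # φ: input → output (direct links)
a = np.zeros(T)                  # bias terms
x = rng.normal(size=S)
u, y = forward(x, W, Lam, Phi, a)
```

Because the nodes here have no nonlinear activation, doubling the input doubles the output when the bias is zero, which makes the forward pass easy to sanity-check.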
(65) S405. calculating a mean-square error (MSE) between output results of the output-layer nodes in the mth training and the expected output results of the mth group of input samples (X.sub.1, X.sub.2, X.sub.3, . . . , X.sub.k) according to Equation (6):
(66) MSE.sub.m=(1/(T×k))Σ.sub.i=1.sup.TΣ.sub.k(Y′.sub.i.sup.k−Y.sub.i.sup.k).sup.2 (6)
(67) where
(68) T denotes the number of output-layer nodes, k denotes the number of training samples, Y′.sub.i.sup.k denotes the actual output value of an ith output-layer node for a kth training sample, and Y.sub.i.sup.k denotes the corresponding expected output value.
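Under the definitions above, the MSE of Equation (6) averages the squared output errors over all output-layer nodes and all k samples of the training group; a one-function sketch:

```python
import numpy as np

def mse(Y_actual, Y_expected):
    """Mean-square error of Equation (6): average squared error over the
    T output-layer nodes (rows) and k training samples (columns)."""
    T, k = Y_expected.shape
    return float(np.sum((Y_actual - Y_expected) ** 2) / (T * k))
```

For example, with T=2 output nodes and k=2 samples, errors of (1, 0, 0, 2) give MSE = (1+0+0+4)/4 = 1.25.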
(69) S406. if MSE.sub.m>ε, executing step S407 to correct the link weight coefficients and bias terms; if MSE.sub.m≤ε, maintaining the current link weight coefficients and bias terms and executing step S408, where ε is a specified error accuracy, and ε=0.02 in the embodiment of the present disclosure.
(70) S407. correcting according to Equation (7), to obtain a corrected link weight coefficient ω′.sub.pj.sup.m between a pth hidden-layer node and a jth input-layer node in the mth training, a corrected link weight coefficient λ′.sub.qp.sup.m between a qth output-layer node and a pth hidden-layer node in the mth training, a corrected link weight coefficient φ′.sub.qj.sup.m between a qth output-layer node and a jth input-layer node in the mth training, and a corrected bias term a′.sub.q.sup.m of a qth output-layer node in the mth training:
(71) ω′.sub.pj.sup.m=ω.sub.pj.sup.m−α·ρ·(∂MSE.sub.m/∂ω.sub.pj.sup.m); λ′.sub.qp.sup.m=λ.sub.qp.sup.m−β·ρ·(∂MSE.sub.m/∂λ.sub.qp.sup.m); φ′.sub.qj.sup.m=φ.sub.qj.sup.m−γ·ρ·(∂MSE.sub.m/∂φ.sub.qj.sup.m); a′.sub.q.sup.m=a.sub.q.sup.m−δ·ρ·(∂MSE.sub.m/∂a.sub.q.sup.m) (7)
(72) where
(73) α, β, γ and δ are learning strides, and ρ is a stride coefficient. In an embodiment of the present disclosure, the learning strides may be optimized so that the pre-built series-parallel multi-layer BP neural network model changes the iteration rate according to a change in the error when samples are given, so as to meet error requirements.
(74) The stride coefficient can be calculated according to Equation (8):
(75)
(76) It should be noted that the embodiment of the present disclosure adopts a modified series-parallel multi-layer neural network, in which the fixed learning coefficients of the traditional network are replaced with learning strides and a stride coefficient, and the stride coefficient changes as the error changes: the larger the error, the larger the stride coefficient; the smaller the error, the smaller the stride coefficient. As a result, the farther the iteration is from the optimization point, the faster the rate; the closer the iteration is to the optimization point, the slower the rate.
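Equation (8) itself is not reproduced in this text, but the qualitative rule stated here (larger error, larger stride coefficient) can be illustrated with a hypothetical clamped, error-proportional rule; the function name, the proportionality to MSE.sub.m/ε and the clamp bounds are all assumptions for illustration, not the patent's actual equation:

```python
def stride_coefficient(mse_m, eps=0.02, rho_min=0.1, rho_max=2.0):
    """Hypothetical stride coefficient ρ: proportional to the ratio of the
    current error MSE_m to the target accuracy ε, clamped to a safe range.
    This is an illustrative stand-in, not the patent's Equation (8)."""
    return max(rho_min, min(rho_max, mse_m / eps))
```

With such a rule the iteration takes large steps while far from the optimization point (MSE.sub.m much larger than ε) and small steps near it, matching the behavior described above.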
(77) S408. assigning ω′.sub.pj.sup.m to ω.sub.pj.sup.m; assigning λ′.sub.qp.sup.m to λ.sub.qp.sup.m; assigning φ′.sub.qj.sup.m to φ.sub.qj.sup.m; assigning a′.sub.q.sup.m to a.sub.q.sup.m;
(78) S409. assigning m+1 to m, and judging whether m>M, where M is a specified number of training iterations; if so, determining that a link weight coefficient ω.sub.pj.sup.M between a jth input-layer node and a pth hidden-layer node in the Mth training, a bias term a.sub.q.sup.M of a qth output-layer node in the Mth training, a link weight coefficient λ.sub.qp.sup.M between a pth hidden-layer node and a qth output-layer node in the Mth training and a link weight coefficient φ.sub.qj.sup.M between a jth input-layer node and a qth output-layer node in the Mth training are obtained, and determining them as the optimal link weight coefficients ω.sub.pj*, λ.sub.qp* and φ.sub.qj* and bias term a.sub.q*; otherwise, returning to step S404.
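Steps S403 to S409 amount to iterating the forward pass, error check and correction until the iteration budget M is reached or the error falls within ε. A compact sketch, which assumes gradient-style corrections and, for brevity, collapses the four learning strides α, β, γ, δ and the stride coefficient ρ into a single rate `lr`:

```python
import numpy as np

def train(X, Y, C, M=500, lr=0.1, eps=0.02, seed=0):
    """Sketch of the training loop S403-S409.

    X: (S, k) input samples; Y: (T, k) expected outputs; C hidden nodes.
    Stops after M iterations or once the MSE falls within eps.
    """
    S, k = X.shape
    T = Y.shape[0]
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(C, S))    # ω: input → hidden
    Lam = rng.normal(scale=0.1, size=(T, C))  # λ: hidden → output
    Phi = rng.normal(scale=0.1, size=(T, S))  # φ: input → output (direct)
    a = np.zeros((T, 1))                      # bias terms

    for _ in range(M):
        U = W @ X                             # Equation (4)
        Yhat = a + Lam @ U + Phi @ X          # Equation (5)
        E = Yhat - Y
        mse = float(np.mean(E ** 2))          # Equation (6)
        if mse <= eps:                        # S406: accuracy reached
            break
        # S407: gradient-style corrections, all computed from the
        # pre-correction parameter values
        W -= lr * (Lam.T @ E) @ X.T / k
        Lam -= lr * (E @ U.T) / k
        Phi -= lr * (E @ X.T) / k
        a -= lr * E.mean(axis=1, keepdims=True)
    return W, Lam, Phi, a, mse
```

On a toy demonstration set whose actions are a fixed linear function of the states, the direct links φ alone can represent the mapping, so the loop drives the MSE within ε well before the iteration budget M.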
(79) S410. substituting the optimal link weight coefficients and bias term into Equation (4) and Equation (5), and combining Equation (3), to obtain the series-parallel multi-layer BP neural network model for action learning:
C=┌√{square root over (S+T)}+a┐
u.sub.p.sup.k=ω.sub.pj*×X.sub.k.sup.T
y.sub.q.sup.k=a.sub.q*+λ.sub.qp*×u.sub.k.sup.T+φ.sub.qj*×X.sub.k.sup.T
(80) where
(81) C is the number of hidden-layer nodes, T is the number of output-layer nodes, S is the number of input-layer nodes; a is an adjustment constant, 0≤a≤10; ⌈·⌉ is a round-up (ceiling) function; u.sub.p.sup.k is an output value of a pth node in the hidden layer; y.sub.q.sup.k is an output value of a qth node in the output layer; ω.sub.pj*, λ.sub.qp* and φ.sub.qj* are the optimal link weight coefficients; a.sub.q* is the bias term; X.sub.k.sup.T is the transpose of a kth training sample.
(82) Step S5: obtaining a state characteristic matrix set of mechanical arm motion for a real-time target task, and obtaining an action characteristic matrix set of mechanical arm motion for the real-time target task based on the series-parallel multi-layer BP neural network model and the state characteristic matrix set of mechanical arm motion for the real-time target task, so that the mechanical arm executes an action defined by the action characteristic matrix set of mechanical arm motion for the real-time target task through joint rotation.
(83) It should be noted that in practice, in order to determine whether the mechanical arm completes a target task, a completion degree parameter may be used. In an embodiment of the present disclosure, when the coordinates of a specified task target point O in the mechanical arm workspace are O(O.sub.x, O.sub.y, O.sub.z), the completion degree parameter may be defined by the displacement between the end point of the mechanical arm and the target point in the current state. The completion degree parameter may be calculated according to the equation below:
DT=√{square root over (D.sub.x.sup.2+D.sub.y.sup.2+D.sub.z.sup.2)}
(84) where D.sub.x, D.sub.y, D.sub.z are relative displacements of the end point of the mechanical arm with respect to the target point in the directions x, y and z, respectively; DT is the completion degree parameter.
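The completion degree parameter defined above is simply the Euclidean distance between the end point and the target point; a direct translation:

```python
import math

def completion_degree(end_point, target_point):
    """Completion degree DT: Euclidean displacement between the mechanical
    arm end point and the target point O(Ox, Oy, Oz)."""
    dx, dy, dz = (e - t for e, t in zip(end_point, target_point))
    return math.sqrt(dx * dx + dy * dy + dz * dz)
```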
(85) S501. obtaining in real time mechanical arm current state information and environment state information; extracting mechanical arm joint angle characteristics from the mechanical arm state information, and distance characteristic information of the end point of the mechanical arm to the target point from the environment state information; inputting the extracted characteristic matrix sets into the trained series-parallel multi-layer BP neural network model to obtain a mechanical arm action for the current state, that is, the joint angle of rotation for the mechanical arm to execute the action defined by the action characteristic matrix set. The robot executes the action, and collects state information and environment information when the action is completed for extraction of characteristic information.
(86) It should be noted that, in an embodiment of the present disclosure, a sensor may collect joint angles of the mechanical arm and end coordinates of the actuator and transmit them to a robot control system controlled by a robot servo component. Target point coordinate information collected by a sensor may also be transmitted to the same computer processor, which performs data processing to extract joint angle state characteristics and mechanical arm end-point-to-target-point displacement state characteristics, forms a characteristic matrix set, and inputs the characteristic matrix set into the trained network, to obtain a mechanical arm joint angle of rotation action output for the current state. The mechanical arm executes the action, and collects current state information in real time and feeds it back to the robot control system. A specific process is shown in the accompanying drawing.
(87) S502. determining whether the task is completed: if DT≤ψ, determining the task is completed; if DT>ψ, determining the task is not completed; when the task is not completed, returning to step S501, where ψ is a specified error accuracy, and in an embodiment of the present disclosure, ψ=0.1.
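The execution loop of S501-S502 can be sketched as a simple closed loop; here `predict_action`, `execute` and `observe` are hypothetical callables standing in for the trained network, the robot controller and the sensor pipeline respectively:

```python
def run_task(predict_action, execute, observe, psi=0.1, max_steps=100):
    """Closed-loop sketch of S501-S502: observe the state and completion
    degree DT, query the motion strategy for an action, execute it, and
    stop once DT falls within the specified accuracy psi."""
    for step in range(max_steps):
        state, dt = observe()           # S501: current state + DT
        if dt <= psi:                   # S502: task completed
            return step, dt
        execute(predict_action(state))  # S501: act, then re-observe
    return max_steps, dt
```

For example, if each executed action halves DT from an initial value of 1.0, the loop terminates after four executed actions, when DT = 0.0625 ≤ ψ = 0.1.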
(88) It should be noted that, in the embodiment of the present disclosure, a series-parallel multi-layer BP neural network model is obtained through steps S1 to S4. Once the series-parallel multi-layer BP neural network model is obtained, steps S1 to S4 are no longer necessary; in the subsequent application process, only step S5 is needed.
(89) The present disclosure also provides a system for robot action imitation learning in a three-dimensional space, the system including a computer, and the computer including:
(90) at least one storage unit;
(91) at least one processing unit;
(92) wherein the at least one storage unit is stored with at least one instruction, the at least one instruction is loadable and executable by the at least one processing unit to implement the steps of:
(93) S1. dividing a three-dimensional space based on a workspace requirement of a robot mechanical arm; obtaining mechanical arm joint angle state information, end position state information and characteristic information in the three-dimensional space during demonstration;
(94) S2. setting up a mechanical arm system state space model based on the mechanical arm joint angle state information and end position state information;
(95) S3. obtaining a state characteristic matrix set and an action characteristic matrix set of mechanical arm motion based on the mechanical arm system state space model and characteristic information;
(96) S4. obtaining a series-parallel multi-layer backpropagation (BP) neural network model for action imitation learning based on the state characteristic matrix set of mechanical arm motion, the action characteristic matrix set of mechanical arm motion and a pre-built series-parallel multi-layer BP neural network model;
(97) S5. obtaining a state characteristic matrix set of mechanical arm motion for a real-time target task, and obtaining an action characteristic matrix set of mechanical arm motion for the real-time target task based on the series-parallel multi-layer BP neural network model and the state characteristic matrix set of mechanical arm motion for the real-time target task.
(98) Those skilled in the art understand that the system for robot action imitation learning in a three-dimensional space in this embodiment of the present disclosure corresponds to the above method for robot action imitation learning in a three-dimensional space; therefore, for explanations, examples and beneficial effects of the system, reference can be made to the corresponding contents in the above method, which are omitted here.
(99) In summary, compared with the prior art, the embodiments of the present disclosure have the following beneficial effects:
(100) 1. The embodiments of the present disclosure apply an imitation learning mechanism to a robot learning system, under the framework of the imitation learning mechanism, to train and learn by transmitting demonstrative information generated from a mechanical arm to a series-parallel multi-layer BP neural network representing a motion strategy. The correspondence between a state characteristic matrix set of the motion and an action characteristic matrix set of the motion is learned, to reproduce the demonstrative action, and generalize the actions and behaviors, so that when facing different tasks, the embodiments do not need to carry out action planning separately, thereby achieving high intelligence.
(101) 2. In an embodiment of the present disclosure, demonstration is done by dragging a humanoid three-joint multi-degree-of-freedom mechanical arm in the three-dimensional space; the spatially redundant multi-degree-of-freedom mechanical arm is more flexible and enables tasks in complex or highly difficult scenarios, and more importantly, its joint motion agrees with human habits. Additionally, in an embodiment of the present disclosure, sensors are used to collect motion information data directly, which avoids the low identification accuracy of using a camera to collect the motion information, and improves the accuracy of data collection.
(102) 3. In the method for robot action imitation learning in a three-dimensional space according to an embodiment of the present disclosure, various characteristics of the demonstrative information are collected and used to train a modified series-parallel multi-layer BP neural network, which optimizes the training process of the neural network and improves the generalization and speed while retaining the training accuracy compared with the traditional series-parallel multi-layer BP neural network. The embodiment can obtain mechanical arm motion information more quickly using sample information, and improve the autonomy of the motion.
(103) It should be noted that, in the specification, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between them. Moreover, the terms “include”, “contain” or any other variation thereof are intended to indicate a non-exclusive inclusion such that a process, method, object or device that “includes” a set of elements includes not only those elements, but also other elements that are not explicitly listed, or also includes an element that is inherent to such process, method, object or device. Without further limitation, an element that is defined with the phrase “including an . . . ” does not preclude the presence of other identical elements in the process, method, object or device including said element.
(104) The present invention has been described in detail with reference to the embodiments; however, the embodiments are for illustrative purposes only and shall not be construed as limiting the scope of the invention. Those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions described herein without departing from the spirit and scope of the present invention.