METHOD AND DEVICE FOR COLLABORATIVE SERVO CONTROL OF MOTION VISION OF ROBOT IN UNCALIBRATED AGRICULTURAL SCENE

Abstract

A device and method for collaborative servo control of motion vision of a robot in an uncalibrated agricultural scene is provided. The device includes a robot arm, a to-be-gripped target object, an image sensor and a control module. An end of a robot arm is provided with a mechanical gripper, and a to-be-gripped target object is within a grip range of the robot arm. A control module drives the mechanical gripper to grip the to-be-gripped target object, and controls an image sensor to perform image sampling on a process of gripping the to-be-gripped target object by the robot arm. The image sensor sends sampled image data to the control module. The device does not need to perform precise spatial calibration on the to-be-gripped target object and the related environment in the scene. The robot arm is guided to complete the gripping task according to trained networks.

Claims

1. A device for collaborative servo control of motion vision of a robot in an uncalibrated agricultural scene, comprising: a robot arm, a to-be-gripped target object, an image sensor and a control module, wherein an end of the robot arm is provided with a mechanical gripper; the to-be-gripped target object is within a grip range of the robot arm; the control module is electrically connected to the robot arm and the image sensor, respectively; the control module drives the mechanical gripper to grip the to-be-gripped target object, and controls the image sensor to perform image sampling on a process of gripping the to-be-gripped target object by the robot arm; and the image sensor sends sampled image data to the control module.

2. The device according to claim 1, wherein the robot arm is a six-degree-of-freedom robot arm.

3. A method for collaborative servo control of motion vision of a robot in an uncalibrated agricultural scene, comprising: constructing a scene space feature vector acquisition network, and acquiring a scene space feature vector; acquiring a demonstrated action sample; constructing an inverse reinforcement reward policy network; subjecting the inverse reinforcement reward policy network to a transfer training; and acquiring, based on a visual feature extraction network and the inverse reinforcement reward policy network, a forward-guided programming result by using a guided policy search (GPS) algorithm.

4. The method according to claim 3, wherein the scene space feature vector acquisition network is a vision-based convolutional neural network.

5. The method according to claim 3, wherein the step of acquiring the scene space feature vector comprises: performing, by an image sensor, image sampling on a process of gripping a to-be-gripped target object by a robot arm, and extracting red, green and blue (RGB) image information; and inputting the RGB image information into the scene space feature vector acquisition network to output the scene space feature vector.

6. The method according to claim 3, wherein the step of acquiring the demonstrated action sample comprises: pulling a robot arm to complete gripping a to-be-gripped target object, and acquiring demonstrated gripping action data of a single demonstrated gripping; driving the robot arm to simulate the demonstrated gripping action data and autonomously complete an action of gripping the to-be-gripped target object, and acquiring image feature data of a demonstrated gripping scene through shooting; and integrating the demonstrated gripping action data and the image feature data of the demonstrated gripping scene to obtain the demonstrated action sample.

7. The method according to claim 3, wherein the step of constructing the inverse reinforcement reward policy network comprises: constructing the inverse reinforcement reward policy network for fitting and representing a reward; generating a simulation parameter through a simulation domain randomization algorithm; programming and simulating a virtual gripping action by using a robot operating system (ROS) programming library, and obtaining a simulated gripping path through sampling; and subjecting the inverse reinforcement reward policy network to a simulation pre-training.

8. The method according to claim 3, wherein the transfer training of the inverse reinforcement reward policy network comprises: performing optimization training on the inverse reinforcement reward policy network by using the demonstrated action sample.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0034] In order to describe the technical solutions in the embodiments of the present invention more clearly, the drawings required to describe the embodiments are briefly described below. Apparently, the drawings described below are only some embodiments of the present invention. Those skilled in the art may further obtain other drawings based on these drawings without creative efforts. In the drawings:

[0035] FIG. 1 is a view illustrating a structure of a device for collaborative servo control of motion vision of a robot in an uncalibrated agricultural scene according to an embodiment of the present invention.

[0036] FIG. 2 is a view illustrating hardware connection of the device for collaborative servo control of motion vision of a robot in an uncalibrated agricultural scene according to an embodiment of the present invention.

[0037] FIG. 3 is a view illustrating a software hierarchy of the device for collaborative servo control of motion vision of a robot in an uncalibrated agricultural scene according to an embodiment of the present invention.

[0038] FIG. 4 is a flowchart of a method for collaborative servo control of motion vision of a robot in an uncalibrated agricultural scene according to an embodiment of the present invention.

[0039] FIG. 5 is a view illustrating a structure of a scene space feature vector acquisition network according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0040] The method and device for collaborative servo control of motion vision of a robot in an uncalibrated agricultural scene provided by the present invention are described in detail below with reference to the drawings. The embodiments are implemented based on the technical solutions of the present invention. Although the detailed implementations and specific operation procedures are described, the protection scope of the present invention is not limited to the embodiments. Those skilled in the art can modify and polish the present invention without changing the spirit and content of the present invention.

[0041] In the embodiments of the present invention, a scene space feature vector acquisition network, that is, a vision-based convolutional neural network, is constructed to extract spatial features of a scene and a to-be-gripped target object. An inverse reinforcement reward policy network is constructed to indirectly describe a possible driven gripping policy. Meanwhile, gripping is simulated through a domain randomization algorithm in a simulation environment, and simulation data is used to pre-train the inverse reinforcement reward policy network. The scene space feature vector acquisition network and the inverse reinforcement reward policy network are respectively pre-trained, which decouples the traditional complex servo control of vision and motion and reduces the complexity of network training. The domain randomization algorithm can be used to quickly generate a large amount of training data, which reduces the number of manual demonstrations, and improves the training effect of the networks within limited time and resources. Finally, through the integration of a real scene and demonstration data, the system network is modified to adapt to the real scene and task. After the network trainings are completed, a programming result is given through a guided policy search (GPS) algorithm. In the final application process, there is no need for precise spatial calibration on the to-be-gripped target object and the related environment in the scene. It is only necessary to guide the robot arm to complete the gripping task according to the trained networks, which have low requirements for space perception equipment, high environmental adaptability, and can be applied to a variety of tasks.

Embodiment 1

[0042] This embodiment of the present invention provides a device for collaborative servo control of motion vision of a robot in an uncalibrated agricultural scene. The device includes: a robot arm, a to-be-gripped target object, an image sensor and a control module, as shown in FIG. 1.

[0043] The robot arm is a UR5 robot arm 6, and the UR5 robot arm 6 is a six-degree-of-freedom robot arm. A mechanical gripper 7 is provided at an end of the robot arm. The mechanical gripper 7 is able to complete the gripping of a to-be-gripped target object 3 through clamping and releasing motions. The UR5 robot arm 6 is fixed through a base frame 5 in a scene 8.

[0044] The to-be-gripped target object 3 is preferably a fruit or a vegetable, for example, a tomato, which is placed on a working platform 4. The working platform 4 is a stable working plane with a certain height, such as a desk. The working platform 4 is placed in the scene 8. The to-be-gripped target object 3 is within a grip range of the UR5 robot arm 6.

[0045] The image sensor is a kinect image sensor 1, specifically a Kinect 2.0 image sensor. The Kinect image sensor 1 is fixed on a kinect mounting bracket 2. The Kinect mounting bracket 2 is a device that is able to fix the kinect image sensor 1 at a certain height, and is preferably constructed using an aluminum profile. The kinect mounting bracket 2 is placed beside the UR5 robot arm 6 and the to-be-gripped target object 3. The kinect image sensor 1 is able to shoot the UR5 robot arm 6, the to-be-gripped target object 3 and the scene 8.

[0046] The control module is a Jetson TX1 control board 9. The Jetson TX1 control board 9 is electrically connected to the UR5 robot arm 6 and the kinect image sensor 1, respectively. The Jetson TX1 control board 9 drives the UR5 robot arm 6 to grip the to-be-gripped target object 3 through the mechanical gripper 7, and controls the kinect image sensor 1 to perform image sampling on a process of the UR5 robot arm 6 gripping the to-be-gripped target object 3. The kinect image sensor 1 sends sampled image data to the Jetson TX1 control board 9.

[0047] Specifically, referring to FIG. 2, the Kinect image sensor 1 converts an interface to a USB3.0 interface through a Kinect adapter 10. The Kinect adapter 10 is connected to the Jetson TX1 control board 9 via USB3.0. The UR5 robot arm 6 obtains power by connecting a robot arm control box 12. The robot arm control box 12 and the Jetson TX1 control board 9 are connected by a network cable. The Jetson TX1 control board 9 inputs a robot arm control signal to the robot arm control box 12 through a network cable interface.

[0048] Preferably, the Jetson TX1 control board 9 is connected to a display screen 11 through a high-definition multimedia interface (HDMI) interface.

[0049] Further, referring to FIG. 3, an Ubuntu operating system and drive components are installed in the Jetson TX1 control board 9. Other software is installed in the Jetson TX1 control board 9 through a Jetpack developer kit. A Kinect support library is installed to enable the control module, i.e., the Jetson TX1 control board 9 to drive the Kinect image sensor, and use relevant image processing tools and algorithms. By installing a python dependency library and MongoDB database software, an embedded database is built in the Jetson TX1 control board 9 to save relevant data for subsequent trainings. Docker container engine is installed to create an independent software operating environment, and the robot operating system (ROS) and Tensorflow framework are installed in the Docker container. In this way, the Jetson TX1 control board 9 is provided with a container engine with a complete development environment, which can be quickly transferred to other hardware systems.

[0050] The ROS includes algorithm nodes for red, green and blue-depth (RGB-D) sampling processing and sampling control nodes for the UR5 robot arm. The Tensorflow framework includes a GPS algorithm control program, as well as trained visual space feature extraction and reinforcement reward policy networks.

Embodiment 2

[0051] This embodiment of the present invention provides a method for collaborative servo control of motion vision of a robot in an uncalibrated agricultural scene, which is based on the device for collaborative servo control of motion vision of a robot in an uncalibrated agricultural scene according to Embodiment 1. Referring to FIG. 4, the method includes the following steps:

[0052] S101: Construct a scene space feature vector acquisition network, and acquire a scene space feature vector.

[0053] In this embodiment, the scene space feature vector acquisition network is a vision-based convolutional neural network. Referring to FIG. 5, the transfer of the scene space feature vector acquisition network uses the first five layers of CIFAR-1000 VGG16 as an image feature extraction network, which is a convolutional neural network.

[0054] A convolutional layer of the convolutional neural network is calculated as follows:

[00001] $x_{j}^{l} = f (\underset{i \in M_{j}}{.Math.} x_{i}^{l - 1} k_{ij}^{l} + b_{j}^{l})$

[0055] where, x.sub.j.sup.l represents a j-th feature map of an l-th layer;

[00002] $\underset{i \in M_{j}}{.Math.} x_{i}^{l - 1} k_{ij}^{l}$

represents a convolution operation and summation for a related feature map x.sub.i.sup.l−1 of l−1 layers and a j-th convolution kernel k.sub.ij.sup.l of the l-th layer; b.sub.j.sup.l is an offset parameter supplementing the j-th feature map of the l-th layer; and f( . . . ) is an excitation function, which is used to generate an output from a convolution result on a right side of the equation.

[0056] A pooling layer of the convolutional neural network is calculated as follows:

x.sub.j.sup.l=f(β.sub.j.sup.ldown(x.sub.j.sup.l−1))

[0057] where, x.sub.j.sup.l represents the j-th feature map of the l-th layer; β.sub.j.sup.l represents a weight; in a 2×2 pooling process in this embodiment, β.sub.j.sup.l is taken as ¼; down( . . . ) is a down-sampling function; f( . . . ) is an excitation function, which is used to generate an output from a pooling result on a right side of the equation.

[0058] In this embodiment, the structure of the scene space feature vector acquisition network used is as follows:

TABLE-US-00001 TABLE 1 No. Name Parameter 1 Input 240 × 240 × 3 2 conv1 + pool1 Convolution: 7 × 7 × 64, sliding step 1, pooling: 2 × 2, sliding step 1 3 conv2 + pool2 Convolution: 5 × 5 × 32, sliding step 1, pooling: 2 × 2, sliding step 1 4 conv3 + pool3 Convolution: 5 × 5 × 64, sliding step 1, pooling: 2 × 2, sliding step 1 5 Softmax 32 6 Fully connected 5 64 7 Fully connected 6 64 8 Fully connected 7 40

[0059] In this embodiment, the Jetson TX1 control board 9 controls the kinect image sensor 1 to shoot and gripping scene and extract RGB image information to acquire a 240×240×3 3-channel RGB image. The image data is input into the scene space feature vector acquisition network, and the scene space feature vector acquisition network finally outputs a 40-dimensional sparse vector F, which is used to represent a scene image feature.

[0060] S102: Acquire a demonstrated action sample.

[0061] In this embodiment, the acquiring the demonstrated action sample includes:

[0062] S1021: Pull the robot arm to complete gripping of a to-be-gripped target object, and acquire demonstrated gripping action data of a single demonstrated gripping.

[0063] The UR5 robot arm 6 is manually pulled to run through a demonstrated gripping path of the UR5 robot arm 6, such that the mechanical gripper 7 at the end of the UR5 robot arm 6 reaches a position where the to-be-gripped target object 3 can be directly gripped. In the gripping process, the Jetson TX1 control board 9 continuously samples state information of joints in motion at a frequency f and acquires the demonstrated gripping action data of a single demonstrated gripping.

[0064] In this embodiment, the UR5 robot arm 6 includes six-degree-of-freedom joints. The state information of each joint is denoted as S.sub.robot(θ.sub.i, ω.sub.i, a.sub.i, v.sub.i, a.sub.i′, x.sub.i), including: a rotation θ.sub.i, a velocity ω.sub.i, an angular acceleration a.sub.i, a space motion velocity v.sub.i of a joint node center, a space motion acceleration a.sub.i′ of the joint node center and a displacement relative to an initial position x.sub.i. S.sub.robot.sup.direct(θ.sub.i, ω.sub.i) can be directly acquired, including the rotation θ.sub.i and the velocity ω.sub.i, and an initial zero point of the joint is θ.sub.i=0, ω.sub.i=0. S.sub.robot.sup.indirect(a.sub.i, v.sub.i, a.sub.i′, x.sub.i) is acquired indirectly. The joint state is calculated based on S.sub.robot.sup.direct(θ.sub.i, ω.sub.i) and a sampling step T=1/f, f being the sampling frequency.

[0065] During the gripping process, the UR5 robot arm drive node program continuously samples the state information of the joints in motion at the frequency f. In each sample, the UR5 robot arm drive node program acquires the joint state information S.sub.robot.sup.direct(θ.sub.i, ω.sub.i) that can be directly acquired, and synchronously calculates the joint state information S.sub.robot.sup.indirect(a.sub.i, v.sub.i, a.sub.i′, x.sub.i) that can be indirectly acquired. The S.sub.robot.sup.direct(θ.sub.i, ω.sub.i) and S.sub.robot.sup.indirect(a.sub.i, v.sub.i, a.sub.i′, x.sub.i) in a single sampling are combined into a single joint state information sampling result S.sub.robot(θ.sub.i, ω.sub.i, a.sub.i, v.sub.i, a.sub.i′, x.sub.i).

[0066] Then, all joint state information sampling results S.sub.robot(θ.sub.i, ω.sub.i, a.sub.i, v.sub.i, a.sub.i′, x.sub.i) acquired through multiple samplings during the gripping process are arranged in the order of sampling time to form a continuous joint state information data sequence. This sequence serves as the demonstrated gripping action data of a single demonstrated gripping.

[0067] S1022: Drive the robot arm to simulate the demonstrated gripping action data and autonomously complete an action of gripping the to-be-gripped target object, and acquire image feature data of a demonstrated gripping scene through shooting.

[0068] After a single demonstrated action is completed, the personnel leaves the scene. Based on the state information of the six-degree-of-freedom joints of the UR5 robot arm 6 included in the demonstrated gripping action data, the Jetson TX1 control board 9 drives the UR5 robot arm 6 to simulate the demonstrated process to complete a single action of gripping the to-be-gripped target object 3. Meanwhile, the Jetson TX1 control board 9 drives the kinect image sensor 1 to perform image sampling on the gripping process at the frequency of f, so as to acquire the image feature data of a single gripping in the demonstrated gripping scene.

[0069] S1023: Integrate the demonstrated gripping action data and the image feature data of the demonstrated gripping scene to obtain the demonstrated action sample.

[0070] The demonstrated gripping action data, the image feature data of the demonstrated gripping scene, inherent condition parameters of the robot arm and the task are synchronously recorded in the MongoDB database, and are integrated to acquire a demonstrated action sample D.sub.t({γ.sub.t}, g, d), {γ.sub.t}={S.sub.t, P.sub.t}. {S.sub.t} is the state information data of the six-degree-of-freedom joints; {P.sub.t} is an image feature data sequence; g is the state information of the to-be-gripped target object (including the size and distance of the to-be-gripped target object); d is kinetic information of the robot arm (including the mass of robot arm model components, an initial posture of the robot arm model joints) and control parameters.

[0071] S103: Construct an inverse reinforcement reward policy network.

[0072] In this embodiment, the constructing the inverse reinforcement reward policy network includes:

[0073] S1031: Construct the inverse reinforcement reward policy network for fitting and representing a reward.

[0074] In this embodiment, the inverse reinforcement reward policy network is a deep neural network (DNN), which is used to fit and represent a reward function in the GPS algorithm, so as to avoid manual selection of feature parameters for modeling.

[0075] In this embodiment, the structure of the inverse reinforcement reward policy network used is as follows:

TABLE-US-00002 TABLE 2 No. Name Parameter 1 Input 40-dimensional feature vector 2 Fully connected 1 50 3 Fully connected 2 30 4 Fully connected 3 12

[0076] Then, an initial value θ.sub.0 of the weight parameter of the inverse reinforcement reward policy network is generated by a uniform random means. At this time, the DNN can be used to represent a reward function that is not optimized by learning and training.

[0077] S1032: Generate a simulation parameter through a simulation domain randomization algorithm.

[0078] First, a feasible parameter domain C is set to indicate a possible range of a parameter of the domain randomization algorithm. The parameter domain C includes a feasible parameter domain C.sub.g of a relevant parameter of the to-be-gripped target object 3 and a feasible parameter domain C.sub.d of a relevant kinetic parameter of the UR5 robot arm 6.

[0079] Specifically, the Ubuntu system is installed on a training machine with a GTX1080 graphics card, and the Docker container built in the Jetson TX1 control board 9 is transplanted. Meanwhile, a real model of the UR5 robot arm 6 and an abstract model of the to-be-gripped target object 3 are imported into the ROS in the training machine. Through the domain randomization algorithm, the initial state of the UR5 robot arm 6 and the size and spatial position of the to-be-gripped target object 3 are randomly generated, and a shooting and observation angle of view in the simulation environment is determined.

[0080] In this embodiment, the parameters used in the domain randomization algorithm are as follows:

TABLE-US-00003 TABLE 3 No. Name Parameter Parameter domain 1 Mass of robot arm model component M.sup.i.sub.Link (i = 1, 2, . . . , 6) [M.sub.min, M.sub.max].sub.kg 2 Initial posture of robot arm model β.sub.i (i = 1, 2, . . . , 6) [β.sub.min, β.sub.max].sub.rad joint 3 Initial damping coefficient D.sup.i.sub.Joint (i = 1, 2, . . . , 6) [D.sub.min, D.sub.max] 4 Size of to-be-gripped target object ObjectSize(1 × 1 × 1) [L.sub.min, L.sub.max].sub.m 5 Main window angle Vpangle [α.sub.min, α.sub.max] 6 Distance of to-be-gripped target object Location [X.sub.min, X.sub.max].sub.m 7 Gain factor of controller Gain [G.sub.min, G.sub.max] 8 Time step T [T.sub.min, T.sub.max].sub.s−1

[0081] Then, a set of parameters (M.sub.link.sup.i, β.sub.i, D.sub.Joint.sup.i, ObjectSize,Vpangle,Location,Gain,T) of the domain randomization algorithm are randomly generated in the parameter domain C, where g(ObjectSize,Vpangle,Location), d(M.sub.link.sup.i, D.sub.Joint.sup.i, Gain,T), and state S(β.sub.i).

[0082] S1033: Program and simulate a virtual gripping action by using an ROS programming library, and obtain a simulated gripping path through sampling.

[0083] Based on the parameters of the domain randomization algorithm, the task object, initial state and execution conditions in the simulation environment are set. The ROS programming library is used to program and simulate the gripping action in the simulation environment, and the simulated gripping action path is sampled to acquire simulated gripping path state data. Meanwhile, according to the main window angle parameter Vpangle, the observation angle in the simulation is adjusted, and continuous image sampling is performed to acquire image data of the simulated gripping scene.

[0084] The simulated gripping path state data, the image data of the simulated gripping scene and the parameters of the domain randomization algorithm are combined to generate single action sample data Z.sub.t({γ.sub.t′}, g′, d′), which is to be saved in the MongoDB database. {γ.sub.t′}={S.sub.t′, P.sub.t′}, where {S.sub.t′} is the state information data of the six-degree-of-freedom joints; {P.sub.t′} is an image feature data sequence; g′ is the state information of the to-be-gripped target object (including the size and distance of the to-be-gripped target object); d′ is kinetic information of the robot arm (including the mass of the robot arm model component, the initial posture of the robot arm model joint) and control parameters.

[0085] S1034: Subject the inverse reinforcement reward policy network to a simulation pre-training.

[0086] The inverse reinforcement reward policy network is pre-trained by using the simulated action sample data Z.sub.t({γ.sub.t′}, g′, d′).

[0087] First, the initial value θ of the weight parameter of the randomly generated inverse reinforcement reward policy network is taken as an initial value for iteration, that is, θ.sup.1=initial_weights( )=θ.sub.0.

[0088] An iterative loop is started, and a loop feature quantity n is executed from 1 to an upper limit of iteration n max:

[0089] The current network weight parameter θ.sup.n of an n-th loop and the spatial image feature F are input to calculate a current reward distribution as follows:

γ.sup.n=nn_forward(F, θ.sup.n)

[0090] Then, according to the current reward distribution, a Markov decision process (MDP)-based optimal policy π.sup.n is calculated:

π.sup.n=solve_mdp(γ.sup.n)

[0091] An expected state frequency IE[μ.sup.n] and an expert demonstration loss L.sub.D are calculated, where D indicates the demonstration data as an expert action;

IE[μ.sup.n]=propagrate_policy(π.sup.n)

L.sub.D.sup.n=log(π.sup.n)×μ.sub.D.sup.a

[0092] A derivative

[00003] $\frac{\partial L_{D}^{n}}{\partial γ^{n}}$

of the expert demonstration loss function to the reward r and a derivative

[00004] $\frac{\partial L_{D}^{n}}{\partial θ_{D}^{n}}$

of the expert demonstration loss function to the network model parameter are calculated, where μ.sub.D is an expert state action frequency:

[00005] $\frac{\partial L_{D}^{n}}{\partial r^{n}} = μ_{D} - IE [μ^{n}]$ $\frac{\partial L_{D}^{n}}{\partial θ_{D}^{n}} = nn_backprop (\frac{\partial L_{D}^{n}}{\partial γ^{n}})$

[0093] The network model parameter is corrected according to the

[00006] $\frac{\partial L_{D}^{n}}{\partial θ_{D}^{n}}$

gradient, and a single iterative optimization is completed:

[00007] $θ^{n + 1} = update_weights (θ^{n}, \frac{\partial L_{D}^{n}}{\partial θ_{D}^{n}})$

[0094] The algorithm is iterated to a maximum number of iterations or until the expert demonstration loss L.sub.D is less than a tolerable limit, and the network converges to obtain θ.sub.end. Using this parameter as the network weight parameter, the reward policy network guides the robot arm model to execute an execution policy similar to an expected policy programmed by the ROS programming library in the simulation environment.

[0095] S104: Subject the inverse reinforcement reward policy network to a transfer training.

[0096] First, the weight parameter θ.sub.end of the inverse reinforcement reward policy network pre-trained in S103 is used as an initial condition, and the demonstrated action sample D.sub.t({γ.sub.t}, g, d) acquired in S102 is used to replace the programmed and simulated action sample data Z.sub.t({γ.sub.t′}, g′, d′) in S103. In this way, the inverse reinforcement reward policy network is trained, and a network training correction is performed, so as to realize the transfer from a simulation environment model to a real policy model.

[0097] Specifically, let θ=θ.sub.end. Based on the demonstrated sample data as the expert action, the image feature vector of the real scene calculated by the visual feature extraction network is used as a feature input to perform the transfer training optimization on the inverse reinforcement reward policy network. The specific algorithm execution step is the same as the optimization process in S1034, and an optimized network weight θ.sub.end* is acquired.

[0098] Then, the inverse reinforcement reward policy network acquired through the network weight parameter θ.sub.end*, which is a GPS-based reward network with human policy perception, is used to evaluate the reward of the robot arm policy, and guide the robot arm to make a decision similar to human perception in a task in the complex agricultural environment.

[0099] S105: Acquire, based on a visual feature extraction network and the inverse reinforcement reward policy network, a forward-guided programming result by using the GPS algorithm.

[0100] Based on the visual feature extraction network and the inverse reinforcement reward policy network trained by the learning algorithm, the GPS algorithm is used for forward-guided programming.

[0101] The specific guided programming process is as follows:

[0102] First, multiple differentiated dynamic programming (DDP) policies (DDP)π.sub.g.sub.1, . . . , π.sub.g.sub.n are generated.

[0103] Then, policy path data ζ.sub.1, . . . , ζ.sub.m is acquired by sampling the multiple DDP policies, and average policies are calculated by

[00008] $q (ϛ) = \frac{1}{n} \underset{i}{.Math.} π_{g_{i}} (ϛ)$

and combined for simultaneous guiding, so as to improve efficiency.

[0104] Maximum likelihood estimation is performed on the parameters of these policies, θ*←arg max.sub.θΣ.sub.i log π.sub.θ*(ζ.sub.i).

[0105] π.sub.g.sub.1, . . . , π.sub.g.sub.n and π.sub.θ* are combined to acquire an initial sample set S.

[0106] Based on the vector state parameters after scene feature extraction, the inverse reinforcement reward policy network is used to evaluate the sample set S, and rewards of π.sub.g.sub.1, . . . , π.sub.g.sub.n and π.sub.θ* are evaluated.

[0107] If the reward of π.sub.g.sub.i is greater than that of π.sub.θ*, then π.sub.g.sub.i is assigned to π.sub.θ*, and the regular parameter in the policy function is reduced correspondingly.

[0108] If the reward of π.sub.g.sub.i is less than that of π.sub.θ*, then the regular parameter in the policy function is increased.

[0109] The evaluation of π.sub.g.sub.1, . . . , π.sub.g.sub.n is repeated to finally acquire a guided optimal policy, which is the forward-guided programming result.

[0110] The method of the present invention is based on double servo drives of motion vision to train the robot to obtain intelligent spatial perception and task programming capabilities through an adaptive learning algorithm. In the final drive process, there is no need for precise spatial calibration on the to-be-gripped target object and the related environment in the scene. The robot arm is guided to complete the gripping task according to the trained networks, which have low requirements for space perception equipment, high environmental adaptability, and can be applied to a variety of tasks.

[0111] The above disclosed are merely two specific embodiments of the present invention, and the embodiments of the present invention are not limited thereto. Any changes that can be conceived by those skilled in the art should fall within the protection scope of the present invention.

[0112] Those of ordinary skill in the art may understand that all or some of the procedures in the methods of the above embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer readable storage medium. When the program is executed, the procedures in the embodiments of the above methods may be performed. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a random access memory (RANI), etc.

METHOD AND DEVICE FOR COLLABORATIVE SERVO CONTROL OF MOTION VISION OF ROBOT IN UNCALIBRATED AGRICULTURAL SCENE

Assignee

Inventors

Cpc classification

Classification Explorer

B25J9/1612

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

B25J9/1661

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

A01B63/002

HUMAN NECESSITIES

Classification Explorer

B25J9/161

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

A01M7/0089

HUMAN NECESSITIES

Classification Explorer

G05B2219/40269

PHYSICS

Classification Explorer

G05B19/4155

PHYSICS

Classification Explorer

B25J9/1605

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

G06T2207/20084

PHYSICS

Classification Explorer

G06T7/70

PHYSICS

Classification Explorer

G06T7/73

PHYSICS

Classification Explorer

G06T2207/20081

PHYSICS

Classification Explorer

B25J9/1697

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

B25J9/163

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

B25J13/08

PERFORMING OPERATIONS; TRANSPORTING

International classification

Classification Explorer

B25J9/16

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

A01B63/00

HUMAN NECESSITIES

Classification Explorer

B25J13/08

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

G05B19/4155

PHYSICS

Classification Explorer

G06T7/70

PHYSICS

Abstract

Claims

Description