IMAGE STITCHING METHOD, APPARATUS AND DEVICE BASED ON REINFORCEMENT LEARNING AND STORAGE MEDIUM
20240378839 · 2024-11-14
Inventors
- Jian Gao (Guangzhou, CN)
- Junlang Liang (Guangzhou, CN)
- Lanyu Zhang (Guangzhou, CN)
- Yuheng LUO (Guangzhou, CN)
- Zhuojun ZHENG (Guangzhou, CN)
- Xin CHEN (Guangzhou, CN)
CPC classification
G06T7/80
PHYSICS
International classification
G06T7/80
PHYSICS
G06V10/74
PHYSICS
Abstract
The present application provides an image stitching method, apparatus and device based on reinforcement learning and a storage medium. The method includes: acquiring initial calibration parameters, collecting a sample image and position information of a motion platform; setting a negative reward function; acquiring a state set and a negative reward value set according to a randomly generated action set, the initial calibration parameters, the position information of the motion platform and the negative reward function to construct a probability kinematics model; constructing a state value function based on an occurrence probability of the state, and acquiring an optimal action by optimizing the state value function; and acquiring optimized calibration parameters through the optimal action and the initial calibration parameters, and carrying out image stitching on corresponding sample images through the optimized calibration parameters. The application solves the technical problem of low image stitching quality in the prior art.
Claims
1. An image stitching method based on reinforcement learning, comprising the following steps of: acquiring initial calibration parameters through a calibration board arranged on a motion platform, and collecting a sample image of a detected sample on the motion platform at each moment and position information of the motion platform at each moment in a movement process of the motion platform; setting a negative reward function based on image stitching quality at each moment by taking the image stitching quality at each moment and the position information of the motion platform at each moment as a state at each moment and a calibration parameter adjustment amount at each moment as an action at each moment; randomly generating an action set, and acquiring a state set and a negative reward value set according to the action set, the initial calibration parameters, the position information of the motion platform at each moment and the negative reward function; constructing a Markov experience sequence according to the action set, the state set and the negative reward value set, and constructing a probability kinematics model through the Markov experience sequence, wherein the probability kinematics model is used for predicting an occurrence probability of a state at the next moment according to a state and action at a current moment; constructing a state value function based on an occurrence probability of the state at each moment and a negative reward value at each moment, and acquiring an optimal action at each moment by optimizing the state value function; and acquiring optimized calibration parameters at each moment through the optimal action and the initial calibration parameters, and carrying out image stitching on corresponding sample images through the optimized calibration parameters at each moment.
2. The image stitching method based on reinforcement learning according to claim 1, wherein a calculation process of the image stitching quality comprises: after carrying out image stitching on sample images at two adjacent moments, capturing an overlapping region of a stitched image to obtain a first overlapping image and a second overlapping image; and calculating a similarity degree between the first overlapping image and the second overlapping image to obtain the image stitching quality.
3. The image stitching method based on reinforcement learning according to claim 1, wherein the state value function is:
4. The image stitching method based on reinforcement learning according to claim 1, wherein the carrying out image stitching on the corresponding sample images through the optimized calibration parameters at each moment, comprises: calculating a platform movement distance of the motion platform at each two adjacent moments according to the position information of the motion platform at each moment; calculating an image translation distance of sample images at each two adjacent moments according to the optimized calibration parameters at each moment, the platform movement distance at each two adjacent moments and the position information of the motion platform at each moment; and carrying out image stitching on the sample images at each two adjacent moments based on the image translation distance of the sample images at each two adjacent moments.
5. An image stitching device based on reinforcement learning, wherein the device comprises a processor and a storage; the storage is used for storing a program code and transmitting the program code to the processor; and the processor is used for executing the image stitching method based on reinforcement learning according to claim 1 based on an instruction in the program code.
6. A computer-readable storage medium, wherein the computer-readable storage medium is used for storing a program code, and the program code, when executed by a processor, realizes the image stitching method based on reinforcement learning according to claim 1.
Description
DESCRIPTION OF THE DRAWINGS
[0046] In order to illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present application, and those of ordinary skill in the art may obtain other drawings from these drawings without creative work.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0054] In order to help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely with reference to the drawings in the embodiments of the present application. The described embodiments are merely some but not all of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the scope of protection of the present application.
[0055] Traditional image stitching methods are generally divided into software stitching methods and hardware stitching methods. In the software stitching method, feature point information in the overlapping parts of two pictures is detected, the feature points of the two pictures are compared to calculate the position and shape transformation between the two pictures, and the overlapping parts are fused to realize image stitching. In the hardware stitching method, it is generally necessary to calibrate external parameters and acquire physical information of a platform carrying an object to be detected; by converting an actual physical position into an image position, the two pictures are stitched according to image position information. In general, the software stitching method has better stitching quality but takes a long time, so it cannot achieve real-time stitching. The hardware stitching method needs only an affine transformation matrix after calibration and can therefore stitch in real time, but it places high requirements on the hardware and on the motion control precision of the platform, and its stitching quality is lower than that of the software stitching method. Therefore, how to design a stitching method that is both real-time and high-precision is an urgent problem to be solved in the industry.
[0056] To address the above problem, the present application improves the hardware stitching method: the initial calibration parameters are used as prior information, and reinforcement learning is used to optimize the calibration parameters, so as to improve the stitching quality of the hardware stitching method while preserving its stitching speed. The reinforcement learning includes acquiring state information, setting a reward function, and outputting an optimal action according to a state and the reward function. For easy understanding, with reference to
[0057] In step 101, initial calibration parameters are acquired through a calibration board arranged on a motion platform, and a sample image of a detected sample on the motion platform at each moment and position information of the motion platform at each moment are collected in a movement process of the motion platform.
[0058] When the sample image of the detected sample is collected and image stitching is carried out, a camera, the motion platform, the detected sample, the calibration board and an industrial personal computer are needed, referring to
[0059] Because the high-resolution camera with the telecentric lens has an extremely low distortion coefficient, it is unnecessary to calibrate internal parameters, and the image stitching is stitching in a two-dimensional movement plane. Therefore, in the embodiment of the present application, parameter calibration mainly calibrates the external parameters, namely a scale and an angle, which may be acquired through the checkerboard calibration board as shown in
[0060] The position information of the motion platform is acquired through a precision plane detection device of a precision motion platform, and the motion platform feeds back the position information during movement. Position information fed back by the motion platform at a moment t is set to be (P.sub.x.sub.
[0061] After the camera is calibrated, the motion platform is moved to a starting point position at which the detected sample can be measured by the camera. In a time period from the moment t-1 to the moment t, a platform movement distance (P.sub.x.sub.
[0062] In step 102, a negative reward function is set based on image stitching quality at each moment by taking the image stitching quality at each moment and the position information of the motion platform at each moment as a state at each moment and a calibration parameter adjustment amount at each moment as an action at each moment.
[0063] In order to optimize the global image stitching quality, in the embodiment of the present application, the calibration parameters are automatically optimized by constructing a state, an action and a reward function of an agent, and image stitching is carried out according to the optimized calibration parameters.
[0064] In the embodiment of the present application, the image stitching quality at each moment and the position information (P.sub.x.sub.
[0065] In the embodiment of the present application, the initial scale and the initial angle acquired by the calibration board are used as prior information for the subsequent optimization of the scale and the angle, which is intended to improve the convergence speed of the calibration parameter optimization process. Without the initial scale and the initial angle acquired by the calibration board, no reasonable action range can be provided for the calibration parameter optimization, so the strategy optimization easily fails to converge or converges slowly, which causes the calibration parameter optimization to fail. Taking the initial calibration parameters acquired by the calibration board as the prior information and then optimizing the calibration parameters from that prior works better than using reinforcement learning alone to output the calibration parameters, and overcomes the instability of using reinforcement learning alone.
[0066] Further, in the embodiment of the present application, a calculation process of the image stitching quality includes:
[0067] after carrying out image stitching on sample images at two adjacent moments, capturing an overlapping region of a stitched image to obtain a first overlapping image and a second overlapping image; and
[0068] calculating a similarity degree between the first overlapping image and the second overlapping image to obtain the image stitching quality.
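As an illustration of the similarity calculation above, the following is a minimal sketch only. The embodiment does not name the similarity measure or the exact form of the negative reward function; zero-mean normalized cross-correlation and the cost `1 - quality` are assumptions made here, and the function names `overlap_similarity` and `negative_reward` are hypothetical.

```python
import numpy as np


def overlap_similarity(first_overlap: np.ndarray, second_overlap: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two equally sized grayscale patches.

    Assumption: the similarity measure is not specified in the text; NCC is one
    common choice and returns a value in [-1, 1], with 1 meaning identical texture.
    """
    if first_overlap.shape != second_overlap.shape:
        raise ValueError("overlap images must have the same shape")
    a = first_overlap.astype(np.float64).ravel()
    b = second_overlap.astype(np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # a flat patch carries no texture to compare
    return float(np.dot(a, b) / denom)


def negative_reward(quality: float) -> float:
    """Hypothetical cost: 0 for a perfect match, larger as the quality drops."""
    return 1.0 - quality
```

With this choice, a perfectly aligned overlap gives a quality of 1 and a cost of 0, so minimizing the cost corresponds to maximizing the stitching quality.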
[0069] With reference to
[0071] In step 103, an action set is randomly generated, and a state set and a negative reward value set are acquired according to the action set, the initial calibration parameters, the position information of the motion platform at each moment and the negative reward function.
[0072] After the state, the action and the negative reward function are defined, the action set $\{a_1, a_2, \ldots, a_w\}$ may be randomly generated according to the value ranges of the scale adjustment amount and the angle adjustment amount, wherein $w$ is the total number of randomly generated actions. According to the generated action set, the initial calibration parameters, the position information of the motion platform at each moment and the negative reward function, a corresponding state set $\{s_1, s_2, \ldots, s_w\}$ and a corresponding negative reward value set $\{c_1, c_2, \ldots, c_w\}$ are acquired. The value ranges of the scale adjustment amount and the angle adjustment amount may be determined according to the setting parameters of the camera. Specifically, they may be set according to the ratio of the resolution of the camera to the physical size of the measured object, the precision required by the stitching operation and the temperature drift stability of the camera. Because different stitching operations require different precision and camera performance varies, it is necessary to select appropriate value ranges according to the actual stitching operation.
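The random generation of the action set can be sketched as follows. Uniform sampling inside the value ranges is an assumption (the text only says the actions are generated randomly within the ranges), and the function name `generate_action_set` and its parameters are hypothetical.

```python
import numpy as np


def generate_action_set(w, scale_range, angle_range, seed=None):
    """Return a (w, 2) array; each row is (scale adjustment, angle adjustment).

    Assumption: actions are drawn uniformly and independently from the two
    value ranges, which are given as (low, high) pairs.
    """
    rng = np.random.default_rng(seed)
    scale_adj = rng.uniform(scale_range[0], scale_range[1], size=w)
    angle_adj = rng.uniform(angle_range[0], angle_range[1], size=w)
    return np.column_stack([scale_adj, angle_adj])
```

Each generated action would then be applied to the initial calibration parameters to stitch the corresponding images and record the resulting state and negative reward value.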
[0073] In step 104, a Markov experience sequence is constructed according to the action set, the state set and the negative reward value set, and a probability kinematics model is constructed through the Markov experience sequence, wherein the probability kinematics model is used for predicting an occurrence probability of a state at the next moment according to a state and action at a current moment.
[0074] A Markov experience sequence $\{(s_1, a_1, s_1', c_1), (s_2, a_2, s_2', c_2), \ldots, (s_w, a_w, s_w', c_w)\}$ may be constructed according to the action set, the state set and the negative reward value set, wherein $s_1'$ is the state at the next moment following the moment of the state $s_1$, $s_2'$ is the state at the next moment following the moment of the state $s_2$, and $s_w'$ is the state at the next moment following the moment of the state $s_w$.
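The construction of the experience sequence can be sketched as below. This is an illustration under stated assumptions: the state is represented as a tuple (stitching quality, platform x, platform y), consecutive samples are chained so that each next state becomes the following current state, and the negative reward value is evaluated on the state that is reached. The names `Transition`, `build_experience`, `step` and `cost_fn` are hypothetical.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple

State = Tuple[float, float, float]  # (stitching quality, platform x, platform y)
Action = Tuple[float, float]        # (scale adjustment, angle adjustment)


class Transition(NamedTuple):
    state: State
    action: Action
    next_state: State
    cost: float


def build_experience(
    initial_state: State,
    actions: Sequence[Action],
    step: Callable[[State, Action], State],
    cost_fn: Callable[[State], float],
) -> List[Transition]:
    """Collect (s, a, s', c) tuples by applying each action in turn.

    `step` stands in for the real system: it stitches with the adjusted
    calibration parameters and returns the state observed at the next moment.
    """
    experience = []
    state = initial_state
    for action in actions:
        next_state = step(state, action)
        experience.append(Transition(state, action, next_state, cost_fn(next_state)))
        state = next_state
    return experience
```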
[0075] The probability kinematics model $s_i' = f(s_i, a_i)$ may be constructed through the Markov experience sequence above, and the probability kinematics model may predict the state at the next moment according to the state and action at the current moment. The probability kinematics model may be fitted by a deep learning method: a neural network is trained by taking the state-action set $\{(s_1, a_1), (s_2, a_2), \ldots, (s_i, a_i), \ldots, (s_w, a_w)\}$ as the input data of the neural network and the states $\{s_1', s_2', \ldots, s_i', \ldots, s_w'\}$ at the next moment as the labels of the neural network, so as to acquire the probability kinematics model. When the state $s_{t-1}$ and the action $a_{t-1}$ at a moment t-1 are input to the probability kinematics model, the occurrence probability $p(s_t)$ of the state $s_t$ at the next moment is output.
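To make the fitting step concrete, the sketch below swaps the neural network of the embodiment for a much simpler stand-in: a linear model fitted by least squares with an independent Gaussian residual per state dimension. It is not the network described above; it only illustrates how state-action pairs serve as inputs, next states as labels, and how an occurrence probability density of a next state can be read from the fitted model. The class name `LinearGaussianModel` and its methods are hypothetical.

```python
import numpy as np


class LinearGaussianModel:
    """Stand-in dynamics model: s' ~ Normal(W^T [s, a, 1], diag(var))."""

    def fit(self, states, actions, next_states):
        # Inputs are state-action pairs plus a bias column; labels are next states.
        x = np.hstack([states, actions, np.ones((len(states), 1))])
        self.weights, *_ = np.linalg.lstsq(x, next_states, rcond=None)
        residual = next_states - x @ self.weights
        self.var = residual.var(axis=0) + 1e-9  # small floor avoids division by zero
        return self

    def predict_mean(self, state, action):
        x = np.concatenate([state, action, [1.0]])
        return x @ self.weights

    def probability_density(self, state, action, next_state):
        """Gaussian density of `next_state` given the current state and action."""
        mean = self.predict_mean(state, action)
        z = (np.asarray(next_state, dtype=float) - mean) ** 2 / self.var
        norm = np.sqrt((2.0 * np.pi) ** len(mean) * np.prod(self.var))
        return float(np.exp(-0.5 * z.sum()) / norm)
```

A neural network that outputs a predictive mean and variance would play the same role with more capacity; the linear form is used here only to keep the example short and self-contained.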
[0076] In step 105, a state value function is constructed based on an occurrence probability of the state at each moment and a negative reward value at each moment, and an optimal action at each moment is acquired by optimizing the state value function.
[0077] The occurrence probability $p(s_t)$ of the state at each moment may be predicted through the probability kinematics model, and the state value function may be obtained by multiplying the occurrence probability of the state by the negative reward value, which is:
[0079] When the state value function is optimized, the value of the state value function is minimized to achieve a maximum reward. The probability $p(s_t)$ output by the probability kinematics model is substituted into the state value function, and the minimum value is found by gradient calculation on the state value function, so that the optimal action strategy at the moment t is obtained, which gives the optimal action $a_t^{*}$ at the moment t, consisting of the optimal scale adjustment amount and the optimal angle adjustment amount. In the embodiment of the present application, model fitting is carried out according to data, the state value function is acquired through strategy evaluation according to the fitted probability kinematics model, the strategy is then optimized by minimizing the state value function, the current optimal action is output, and the calibration parameters are compensated online to maximize the image stitching quality, so as to improve the global image stitching quality. In the process of image stitching by hardware, the initial calibration parameters are acquired through calibration by the hardware system to ensure the convergence of the calibration parameter optimization, and a local error of the initial calibration parameters obtained by the hardware system is compensated through the calibration parameter optimization, so that software and hardware are combined and the image stitching quality is improved while the real-time performance of image stitching is ensured. In view of the temperature drift characteristic of the camera under long-term operation, the probability kinematics model is constructed first, and model-based reinforcement learning is used to avoid the low data efficiency of model-free reinforcement learning.
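The minimization described above can be sketched as follows. The sketch assumes that the state value is a sum, over a discrete set of candidate next states, of occurrence probability times negative reward value, and it uses projected gradient descent with central finite differences as one possible form of "gradient calculation". The names `state_value`, `optimal_action` and `transition_prob`, the step size and the iteration count are all hypothetical choices.

```python
import numpy as np


def state_value(action, candidate_states, costs, transition_prob):
    """Sum over candidate next states of occurrence probability times cost.

    `transition_prob(action, state)` stands in for the fitted probability
    kinematics model evaluated at the current state.
    """
    probs = np.array([transition_prob(action, s) for s in candidate_states])
    return float(np.dot(probs, costs))


def optimal_action(value_fn, start, bounds, lr=0.05, steps=200, eps=1e-4):
    """Minimize `value_fn` over the action, keeping it inside `bounds`.

    `bounds` is a list of (low, high) pairs, one per action component, which
    plays the role of the value ranges of the adjustment amounts.
    """
    action = np.asarray(start, dtype=float).copy()
    low, high = np.asarray(bounds, dtype=float).T
    for _ in range(steps):
        grad = np.zeros_like(action)
        for k in range(action.size):
            delta = np.zeros_like(action)
            delta[k] = eps
            grad[k] = (value_fn(action + delta) - value_fn(action - delta)) / (2.0 * eps)
        action = np.clip(action - lr * grad, low, high)  # project back into the range
    return action
```

Clipping to the bounds reflects the role of the prior: the search stays within a reasonable range around the initial calibration parameters.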
[0080] In step 106, optimized calibration parameters at each moment are acquired through the optimal action at each moment and the initial calibration parameters, and image stitching is carried out on corresponding sample images through the optimized calibration parameters at each moment to obtain an optimized stitched image.
[0081] After the optimal action at the moment t is acquired, the optimized calibration parameters at the moment t are acquired by adding the optimal action at the moment t to the initial calibration parameters; that is, the optimized scale is the initial scale plus the optimal scale adjustment amount, and the optimized angle is the initial angle plus the optimal angle adjustment amount. Image stitching may then be carried out on a sample image at the moment t and a sample image at a moment t+1 through the optimized calibration parameters at the moment t. A platform movement distance of the motion platform at each two adjacent moments is calculated according to the position information of the motion platform at each moment; an image translation distance of the sample images at each two adjacent moments is calculated according to the optimized calibration parameters at each moment, the platform movement distance at each two adjacent moments and the position information of the motion platform at each moment; and image stitching is carried out on the sample images at each two adjacent moments based on the image translation distance of the sample images at each two adjacent moments.
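The two calculations above can be sketched as follows. The additive update of the calibration parameters follows the text; the conversion from platform movement to image translation is an assumption, since its formula is not reproduced here. The sketch assumes that the scale is the physical length per pixel and that the angle is the rotation between the platform axes and the image axes, in radians. The function names are hypothetical.

```python
import numpy as np


def optimized_parameters(initial_scale, initial_angle, optimal_action):
    """Add the optimal adjustment amounts to the initial calibration parameters."""
    d_scale, d_angle = optimal_action
    return initial_scale + d_scale, initial_angle + d_angle


def image_translation(prev_pos, curr_pos, scale, angle):
    """Convert a platform movement into an image translation in pixels.

    Assumption: the platform movement is rotated by `angle` into the image
    frame and divided by `scale` (physical length per pixel).
    """
    move = np.asarray(curr_pos, dtype=float) - np.asarray(prev_pos, dtype=float)
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    return rotation @ move / scale
```

For example, under these assumptions a platform movement of 1 mm along x with a scale of 0.01 mm per pixel and an angle of 0 corresponds to an image translation of 100 pixels along x.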
[0082] Taking the detection of a flexible printed circuit board as an example, the optimized calibration parameters acquired by the calibration parameter optimization above are used for image stitching, so as to obtain the stitched image as shown in
[0083] In the embodiment of the present application, after the initial calibration parameters are acquired through the calibration board, the negative reward function is set based on the image stitching quality by taking the image stitching quality and the position information of the motion platform as the state and the calibration parameter adjustment amount as the action, so that the state value function is constructed; calibration parameters in different positions are optimized by optimizing the state value function, so that a local error of the calibration parameters is corrected online; and a hardware stitching coefficient is optimized by a reinforcement learning method to improve the image stitching quality, so that the technical problem of low image stitching quality in the prior art is solved.
[0084] The above is one embodiment of the image stitching method based on reinforcement learning provided by the present application, and the following is one embodiment of an image stitching apparatus based on reinforcement learning provided by the present application.
[0085] With reference to
[0092] As a further improvement, a calculation process of the image stitching quality includes:
[0093] after carrying out image stitching on sample images at two adjacent moments, capturing an overlapping region of a stitched image to obtain a first overlapping image and a second overlapping image; and
[0094] calculating a similarity degree between the first overlapping image and the second overlapping image to obtain the image stitching quality.
[0095] As a further improvement, the state value function is:
[0097] As a further improvement, the stitching unit is specifically configured for:
[0098] acquiring optimized calibration parameters at each moment through the optimal action at each moment and the initial calibration parameters;
[0099] calculating a platform movement distance of the motion platform at each two adjacent moments according to the position information of the motion platform at each moment;
[0100] calculating an image translation distance of sample images at each two adjacent moments according to the optimized calibration parameters at each moment, the platform movement distance at each two adjacent moments and the position information of the motion platform at each moment; and
[0101] carrying out image stitching on the sample images at each two adjacent moments based on the image translation distance of the sample images at each two adjacent moments.
[0102] In the embodiment of the present application, after the initial calibration parameters are acquired through the calibration board, the negative reward function is set based on the image stitching quality by taking the image stitching quality and the position information of the motion platform as the state and the calibration parameter adjustment amount as the action, so that the state value function is constructed; calibration parameters in different positions are optimized by optimizing the state value function, so that a local error of the calibration parameters is corrected online; and a hardware stitching coefficient is optimized by a reinforcement learning method to improve the image stitching quality, so that the technical problem of low image stitching quality in the prior art is solved.
[0103] The embodiment of the present application further provides an image stitching device based on reinforcement learning, wherein the device includes a processor and a storage;
[0104] the storage is used for storing a program code and transmitting the program code to the processor; and
[0105] the processor is used for executing the image stitching method based on reinforcement learning in the method embodiment above based on an instruction in the program code.
[0106] The embodiment of the present application further provides a computer-readable storage medium, wherein the computer-readable storage medium is used for storing a program code, and the program code, when executed by a processor, realizes the image stitching method based on reinforcement learning in the method embodiment above.
[0107] It can be clearly understood by those skilled in the art that, for the sake of convenience and brevity in description, a detailed working process of the foregoing apparatus and unit may refer to a corresponding process in the foregoing method embodiments, and will not be elaborated herein.
[0108] The terms "first", "second", "third", "fourth", and the like (if any) in the specification and the drawings of the present application above are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way may be interchanged under appropriate circumstances, so that the embodiments of the present application described herein can be implemented in a sequence other than those illustrated or described herein. In addition, the terms "comprising", "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to those steps or units clearly listed, but may include other steps or units not clearly listed in or inherent to the process, method, product or device.
[0109] It should be understood that, in the present application, "at least one (item)" refers to one or more, and "multiple" refers to two or more. "And/or" is used for describing the relationship between related objects, and indicates that there may be three relationships. For example, "A and/or B" may indicate that: A exists alone, B exists alone, or A and B exist at the same time, wherein A and B may be singular or plural. The symbol "/" generally indicates an "or" relationship between the related objects. "At least one (item) of the following" or a similar expression refers to any combination of these items, including a single (item) or any combination of plural (items). For example, "at least one (item) of a, b or c" may indicate: a, b, c, a and b, a and c, b and c, or a and b and c, wherein a, b and c may be singular or plural.
[0110] In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The foregoing apparatus embodiments are only illustrative. For example, the division of the units is only one logical function division, and in practice there may be other division methods: multiple units or assemblies may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the illustrated or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
[0111] The units illustrated as separate parts may or may not be physically separated, and the parts displayed as units may or may not be physical units; that is, the parts may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objects of the solutions of the embodiments.
[0112] In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units above may be implemented in a form of hardware, or may be implemented in a form of software functional unit.
[0113] The integrated units, if implemented in the form of a software functional unit and sold or used as an independent product, may also be stored in a computer-readable storage medium. Based on such understanding, the essence of the technical solution of the present application, or the part that contributes over the prior art, or all or a part of the technical solution may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or a part of the steps of the method in the embodiments of the present application. The foregoing storage medium includes various media capable of storing the program code, such as a USB disk, a mobile hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
[0114] As described above, the embodiments above are only used to illustrate the technical solutions of the present application, and are not intended to limit the present application. Although the present application has been described in detail with reference to the above-mentioned embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the above-mentioned embodiments can still be modified, or equivalent substitutions can be made to a part of the technical features in the embodiments. However, these modifications or substitutions should not depart from the spirit and scope of the technical solutions of the embodiments of the present application.