Method and device for joining a plurality of individual digital images to form a total image

Abstract

In a device and a corresponding method for joining a plurality of individual digital images to form a total image, a plurality of features is determined in a first individual image by means of a selection unit using a feature-based algorithm and then tracked in a second individual image by means of a tracking unit. A transformation matrix, with which the individual images are joined in an output unit to form the total image, is calculated from the determined feature correspondences in a transformation unit. The individual images can be joined in real time and with a high degree of accuracy by means of the feature-based algorithm in combination with a robust algorithm to calculate the transformation matrix.

Claims

1. A method for joining a plurality of individual digital images to form a total image, the method comprising the steps: providing a first individual digital image and a second individual digital image overlapping the first individual digital image; determining a plurality of features of the first individual image, wherein the features are each allocated first feature coordinates; determining the features in the second individual image, wherein the features are each allocated second feature coordinates; setting up highlight masks for the first individual image and the second individual image, wherein pixel values are detected, which exceed a highlight threshold value and no features are permitted for a calculation of a transformation matrix in at least one of the highlight masks; determining the transformation matrix having at least six degrees of freedom between the individual images, wherein: a first part quantity of the features is selected; a test transformation matrix is calculated from the first feature coordinates and the second feature coordinates of this part quantity; a quality value is determined for the test transformation matrix with a second part quantity of the features; and the transformation matrix is determined proceeding from the test transformation matrix, when the quality value fulfils a quality criterion; and joining the individual images to form a total image by means of the transformation matrix.

2. A method according to claim 1, wherein the feature coordinates are determined iteratively in different resolution stages of the individual images.

3. A method according to claim 1, wherein the features of the first individual image are determined, wherein in each case: a pixel window is formed from a plurality of adjacent pixels; a matrix is formed from the gradients for the plurality of adjacent pixels; the eigenvalues are calculated from the matrix; the pixel window is characterized as the feature, if the eigenvalues exceed a threshold value; and the feature is allocated first feature coordinates; and the features in the second individual image are determined, wherein in each case: an error vector is formed for the feature as a function of the gradients and the differences in the pixel values of the individual images in the pixel window; a displacement vector is calculated from the error vector and the matrix; and the feature is allocated second feature coordinates.

4. A method according to claim 3, wherein the calculation of the respective displacement vector takes place iteratively, wherein: the individual images are displaced relative to one another by a first displacement vector; and a second displacement vector is calculated by means of the displaced individual images.

5. A method according to claim 3, wherein the calculation of the respective displacement vector takes place in different resolution stages of the individual images, wherein: a first displacement vector is calculated in a first, low resolution stage of the individual images; the individual images are displaced relative to one another in a second, higher resolution stage by the first displacement vector; and a second displacement vector is calculated by means of the displaced individual images in the second resolution stage.

6. A method according to claim 1, wherein a pixel window is only characterized as feature if a minimum distance from the already determined features is exceeded.

7. A method according to claim 1, wherein the transformation matrix has precisely eight degrees of freedom and characterizes a projective transformation.

8. A method according to claim 1, wherein the feature coordinates are normalized before the calculation of the transformation matrix.

9. A method according to claim 8, wherein the normalization takes place in such a way that the feature coordinates belonging to an individual image have an average distance of √{square root over (2)} from a focal point.

10. A method according to claim 1, wherein the calculation of the transformation matrix takes place in such a way that for a repetition count: a first part quantity of the features is in each case selected; a test transformation matrix belonging to said first part quantity is calculated; the number of features of a second part quantity is determined for the respective test transformation matrix as the quality value and the features of said second part quantity are imaged by the test transformation matrix with a defined accuracy; the test transformation matrix with the highest number is selected; and the transformation matrix is calculated from the associated features, which is imageable with the defined accuracy.

11. A method according to claim 1, wherein the individual images are joined in a planar projection face.

12. A method according to claim 1, wherein the individual image which is later in terms of time during joining is superimposed on the individual image that is earlier in terms of time.

13. A method according to claim 1, wherein, for joining, the x-coordinate and the y-coordinate of the pixels of the displaced individual image are rounded to the nearest integral value.

14. A method according to claim 1, wherein the individual images are recorded by means of an endoscope system.

15. A method according to claim 1, wherein the individual images are rectified before the determination of the features.

16. A device for joining a plurality of digital images to form a total image, the device comprising: an input unit for providing a first individual digital image and a second individual digital image overlapping the first individual digital image; a highlight unit to determine highlight masks of the individual images; a selection unit to determine a plurality of features of the first individual image, which is configured in such a way that the features are each allocated first feature coordinates; a tracking unit to determine the features in the second individual image, which is configured in such a way that the features are each allocated second feature coordinates and no features are permitted within at least one of the highlight masks; a transformation unit to determine a transformation matrix having at least six degrees of freedom between the individual images, which is configured in such a way that: a first part quantity of the features is selected; a test transformation matrix is calculated from the first feature coordinates and the second feature coordinates of the first part quantity; a quality value is determined for the test transformation matrix with a second part quantity of the features; and the transformation matrix is determined proceeding from the test transformation matrix if the quality value satisfies a quality criterion; and an output unit to join the individual images to form a total image by means of the transformation matrix.

17. A device according to claim 16, wherein a rectifying unit is provided to rectify the individual images.

18. A device according to claim 16, wherein the transformation unit is configured such that the feature coordinates are normalized before calculation of the transformation matrix.

19. An endoscope system, comprising: an imaging sensor arranged on a shaft to record individual digital images; a device for joining a plurality of individual digital images to form a total image with an input unit for providing a first individual digital image and a second individual digital image overlapping the first individual digital image; a highlight unit to determine highlight masks of the individual images; a selection unit to determine a plurality of features of the first individual image, which is configured in such a way that the features are each allocated first feature coordinates; a tracking unit to determine the features in the second individual image, which is configured in such a way that the features are each allocated second feature coordinates and no features are permitted within at least one of the highlight masks; a transformation unit to determine a transformation matrix having at least six degrees of freedom between the individual images, which is configured such that: a first part quantity of the features is selected; a test transformation matrix is calculated from the first feature coordinates and the second feature coordinates of the first part quantity; a quality value is determined for the test transformation matrix with a second part quantity of the features; and the transformation matrix is determined proceeding from the test transformation matrix if the quality value satisfies a quality criterion; and an output unit to join the individual images to form a total image by means of the transformation matrix.

20. An endoscope system according to claim 19, wherein the transformation unit is configured such that the feature coordinates are normalized before calculation of the transformation matrix.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) In the drawings:

(2) FIG. 1 is a schematic view of an endoscope system with a device for joining individual digital images to form a total image;

(3) FIG. 2 is a schematic view of two consecutively recorded individual images;

(4) FIG. 3 is a schematic view of a total image joined from the individual images in FIG. 2;

(5) FIG. 4 is a schematic flowchart of the method for joining the individual images to form the total image; and

(6) FIG. 5 is a schematic flowchart of the feature-based algorithm for determining and tracking features in the individual images.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

(7) An endoscope system 1 has a shaft 2, on which a handle 4 is arranged on a first end 3 and an imaging system 6 is arranged on a second end 5. The shaft 2 is flexible. Alternatively, the latter may also be rigid.

(8) The optical system 6 comprises a radiation source 7, a wide angle lens 8 and an imaging camera sensor 9. The radiation source 7 and the camera sensor 9 can be controlled by the handle 4. The imaging sensor 9 is, for example, configured as a miniaturized video sensor in Tip-Chip technology.

(9) The individual images I.sub.a, I.sub.b recorded by the sensor 9 are transmitted by a signal line 10 from the sensor 9 to a device 11, which is used to join the individual digital images I.sub.a, I.sub.b to form a total image G. The signal line 10 is configured, for example, as an electric image transmission line.

(10) The device 11 is configured as a personal computer, on which an input unit 12, a rectifying unit 13, a highlight unit 14, a selection unit 15, a tracking unit 16, a transformation unit 17 and an output unit 18 are implemented. To store data, the device additionally has a storage unit 19. To visualize the total image G, the device 11 is connected to a monitor 20.

(11) Using the camera sensor 9, during a time interval Δt, a first individual image I.sub.a and a second individual image I.sub.b that is later with respect to time are initially recorded. The camera sensor 9 was displaced between the recording of the individual images I.sub.a and I.sub.b, so that the scene located in front of the camera sensor 9 was recorded from different positions.

(12) The individual image I.sub.a is firstly read in step S.sub.1 into the input unit 12 and then rectified in the rectifying unit 13 in step S.sub.2 by means of an inverse distortion model. The rectified individual image I.sub.a is stored in the storage unit 19 in step S.sub.3. The rectified individual image I.sub.a is then passed to the highlight unit 14, which produces, in step S.sub.4, a highlight mask L.sub.a, which is stored in the storage unit 19 in step S.sub.5. To produce the highlight mask L.sub.a, the pixel values p.sub.a(x,y) are compared with a highlight threshold t.sub.1 and those pixels are allocated to the highlight mask L.sub.a, the pixel values p.sub.a(x,y) of which exceed the highlight threshold value t.sub.1. This pixel range or these pixel ranges is or are then extended by morphological dilation with a round structure element and the highlight mask L.sub.a is thus formed.

(13) The individual image I.sub.a is then fed in step S.sub.6 to the selection unit 15 and the tracking unit 16, in which the feature-based algorithm, for example the KLT algorithm, is implemented. The precise mode of functioning of the KLT algorithm will be described in more detail below.

(14) The selection unit 15 in step S.sub.7 outputs n features M.sub.1 to M.sub.n, which were determined in the individual image I.sub.a. The features M.sub.1 to M.sub.n are characterized by associated feature coordinates p.sub.1.sup.a to p.sub.1.sup.a which are stored in the storage unit 19. The individual image I.sub.a is fed in step S.sub.8 to the output unit 18, with which this is shown in step S.sub.9 in a planar projection face. No joining takes place yet for the first individual image I.sub.a, as no further individual image is present. The origin of the first individual image I.sub.a can be re-determined in step S.sub.9. For this purpose, an origin transformation matrix H.sub.a,0 is read out from the storage unit 19 in step S.sub.10. In step S.sub.11, a return is then made to step S.sub.1, where the following second individual image I.sub.b is read in in step S.sub.1. The steps S.sub.2 to S.sub.7 are then repeated for the individual image I.sub.b.

(15) The features M.sub.1 to M.sub.n determined in the first individual image I.sub.a are then determined in the second individual image I.sub.b. This process is also called tracking or following. In step S.sub.12, the features M.sub.1 to M.sub.n and parameters for the KLT algorithm, such as, for example, a search radius r, are read from the storage unit 19 into the tracking unit 16. The mode of functioning of the tracking unit 16 will be described below in detail.

(16) The result of the KLT algorithm, which is output by the tracking unit 16, is feature correspondences, which for each of the features M.sub.1 to M.sub.n consist of the first feature coordinates p.sub.1.sup.a to p.sub.n.sup.a and associated second feature coordinates p.sub.1.sup.b to p.sub.n.sup.b. The feature correspondences are accordingly (p.sub.1.sup.a, p.sub.1.sup.b) to ((p.sub.n.sup.a,p.sub.n.sup.b). The feature correspondences are formed in step S.sub.13, the highlight masks L.sub.a and L.sub.b entering in step S.sub.14, so that no features M are permitted, which lie within one of the highlight masks L.sub.a or L.sub.b.

(17) In step S.sub.15, the transformation matrix H.sub.b,a is calculated in the transformation unit 17. The calculation takes place in accordance with equations (1) to (24) by means of the RANSAC algorithm, for which purpose, in step S.sub.16, parameters for the RANSAC algorithm, such as, for example, the size of the first part quantity T.sub.1 and the second part quantity T.sub.2 of features M.sub.i and the number N of iterations are read in. For subsequent individual images, a feature history can additionally be read in in step S.sub.16. The feature history contains all the coordinates and values of a feature over the time and may be useful when features become lost. Using the feature history, additional feature correspondences between past individual images and the current individual image can be generated in order to stabilize the total image or the production process thereof. The feature coordinates of the lost feature are further transformed by means of the transformation matrix or over a plurality of consecutive individual images by means of a plurality of transformation matrices. As a result, these features can be continued in the list of feature correspondences.

(18) To calculate the transformation matrix H.sub.b,a, the feature coordinates are firstly normalized and the transformation matrix H.sub.b,a is then calculated by means of the RANSAC algorithm. For this purpose, a first part quantity T.sub.1 is selected from the total quantity of features M.sub.1 to M.sub.n. In the case of a projective transformation matrix, eight unknowns are to be determined, so at least four feature coordinates have to be selected in order to clearly calculate the transformation matrix H.sub.b,a. The RANSAC algorithm carries out N-iteration steps, a first part quantity T.sub.1 being randomly selected in each iteration step and a test transformation matrix T being calculated. The calculated test transformation matrix T is then checked with a second part quantity T.sub.2 of features M.sub.i, the part quantities T.sub.1 and T.sub.2 not overlapping. The number N.sub.I of the so-called inliers is used as the quality value for the respective test transformation matrix T. A feature M.sub.i of the second part quantity T.sub.2 is an inlier when it can be imaged within a predefined accuracy by the calculated test transformation model. The Euclidean distance is calculated for this purpose with the test transformation matrix T. The test transformation matrix T with the highest number N.sub.I of inliers is used as the starting point to calculate the transformation matrix H.sub.b,a with all the features M.sub.1 characterized as inliers of the second part quantity T.sub.2.

(19) If no transformation matrix is found, the individual image I.sub.b is discarded in step S.sub.17 and a further individual image is read in in step S.sub.11. If no further individual image is present, the method ends in step S.sub.18. If a transformation matrix H.sub.b,a is found, this is passed in step S.sub.19 to the output unit 18, in which the individual images I.sub.a and I.sub.b are joined to form the total image G. The total image G is visualized by the monitor 20, the current individual image I.sub.b preferably being superimposed on the preceding individual image(s) I.sub.a. The most current individual image I.sub.b is preferably shown centrally on the monitor 20.

(20) The transformation matrix H.sub.b, a is stored in the storage unit 19 in step S.sub.20. If no further individual image is present, the method ends in step S.sub.18.

(21) The mode of functioning of the selection unit 15 and the tracking unit 16 will be described in more detail below with reference to FIG. 5. In step S.sub.6, the individual image I.sub.a is firstly read in. The individual image I.sub.a is then smoothed in step S.sub.61 with a smoothing filter and then a resolution pyramid is set up in step S.sub.62 from a plurality of resolution stages j. The number of resolution stages j is calculated for example, from a predetermined search radius r.

(22) Features M.sub.1 to M.sub.n are then determined or extracted in accordance with equations (16) to (19) from the individual image I.sub.a. The features M.sub.1 to M.sub.n are calculated all the resolution stages j, so that they can also be tracked in the case of large displacements in the subsequent individual image I.sub.b. The displacement vector is firstly determined in the lowest resolution stage and then projected into the next higher resolution stage. The displacement in pixels can therefore be limited in each resolution stage to a desired amount.

(23) A pixel window W with a predefined width b in the x-direction and a predefined height h in the y-direction is displaced over the individual image I.sub.a in step S.sub.63, wherein in each or a plurality of positions of the pixel window W, a matrix G is calculated in accordance with equation (17). The matrix G is produced from the sum of the products gg.sup.T over all the pixels of the pixel window W, g being the gradient vector for a specific pixel of the pixel window W. The pixel window W of the feature M.sub.1 with a gradient vector g is drawn by way of example in FIG. 2. In addition, a weighting function w can be taken into account. If this is not desired, w=1 can be selected. For each pixel window W, the eigenvalues λ.sub.1 and λ.sub.2 of the matrix G are calculated and a decision is made with the aid of equation (19) whether the pixel window W characterizes a feature M. In addition, a feature M being newly added has to maintain a minimum distance ΔM from the already determined features M. In step S.sub.64, storage of the determined features M.sub.1 to M.sub.n is initiated, so that the latter are transmitted to the storage unit 19 in step S.sub.7. A return is then made in step S.sub.65 to step S.sub.61, where the second individual image I.sub.b is read in and smoothed and a resolution pyramid is set up for this in step S.sub.62.

(24) For the following or tracking of the features M.sub.1 to M.sub.n in the tracking unit 16, these are selected one after the other in step S.sub.66. Firstly, for the first feature M.sub.1, the individual images I.sub.a and I.sub.b in the lowest resolution stage j=1 are selected in S.sub.67. The equation system (16) is then solved for this resolution stage. The solution of the equation system (16) is the displacement vector d for the first feature M.sub.1. The matrix G is already known for step S.sub.68. The error vector e is calculated according to equation (18) in step S.sub.68, the pixel window W being assumed in a first iteration step i=1 in the second individual image I.sub.b at a location corresponding to the first individual image I.sub.a. The displacement vector d(i,j) determined in the first iteration step still has errors. For a second iteration step i=2, the pixel window P in the second individual image I.sub.b is assumed at a location displaced by the displacement vector d determined in the first iteration step. The equation system (16) is solved again with a newly calculated error vector e, the second displacement vector d determined in the second iteration step being an improved solution of the equation system (16). The described iteration step S.sub.69 is repeated until the solution for the displacement vector d converges and the feature M.sub.1 in the second individual image I.sub.b is determined in step S.sub.70. The solving of the equation system (16) can take place by means of the Newton-Raphson algorithm. If no converging solution for the displacement vector d is found after a maximum number of iterations, the associated feature M.sub.1 is discarded in step S.sub.71 and a return is made in step S.sub.72 to step S.sub.66, where the next feature M.sub.2 is selected.

(25) If the feature M.sub.1 is determined in step S.sub.70, a return is made in step S.sub.73 to step S.sub.67, where the next highest resolution stage j=2 of the individual images I.sub.a and I.sub.b is selected. The steps S.sub.68, S.sub.69 and S.sub.70 are then repeated for this resolution stage taking into account the displacement vector d determined in the preceding resolution stage. If this loop has been run through for all the resolution stages j, a check is made in step S.sub.74 whether further features M to be determined are present. If this is the case, step S.sub.72 is repeated. If all the features M were determined in the individual image I.sub.b, lost features M are replaced in step S.sub.75. A feature M can be lost if the latter is no longer contained in the individual image I.sub.b because of the displacement of the sensor 9, or it was discarded as unusable in step S.sub.71. In step S.sub.75, lost features M are replaced in such a way that the number of features M is again n. The replacement of the features M in step S.sub.75 takes place in accordance with step S.sub.63. A storage process is initiated for the features M to be newly added in step S.sub.67. If further individual images are present, step S.sub.65 is repeated. If no further individual images I are present, the KLT algorithm is left in step S.sub.76 and a transfer is made to step S.sub.13.

(26) To join the individual images I.sub.a and I.sub.b, the x- and y-coordinates of the pixels of the displaced individual image I.sub.b are rounded to the nearest integral value. This method is also called the “nearest neighbor” method. The following individual images can be calculated by multiplication of the current transformation matrix H.sub.i,i-1 by the preceding transformation matrix H.sub.i-1,0 as follows:
H.sub.i,0=H.sub.i-1,0.Math.H.sub.i,i-1  (25)

(27) Using the method according to the invention and the corresponding device 11, 6 to 25 images per second can be achieved on a PC of the type Intel Core 2 6420 with 2.13 GHz and 2 GB RAM.

(28) While specific embodiments of the invention have been shown and described in detail to illustrate the application of the principles of the invention, it will be understood that the invention may be embodied otherwise without departing from such principles.