Method and device for determining movement between successive video images
09838572 · 2017-12-05
Assignee
Inventors
CPC classification
G06T7/246
PHYSICS
H04N23/6845
ELECTRICITY
G06T7/277
PHYSICS
H04N23/6812
ELECTRICITY
International classification
G06T7/246
PHYSICS
G06T7/277
PHYSICS
Abstract
The method includes, for each current pair of first and second successive video images, determining movement between the two images. The determining includes a phase of testing homography model hypotheses on the movement by a RANSAC type algorithm operating on a set of points in the first image and first assumed corresponding points in the second image, so as to deliver the one of the homography model hypotheses that defines the movement. The test phase includes a test of first homography model hypotheses of the movement obtained from a set of second points in the first image and second assumed corresponding points in the second image. At least one second homography model hypothesis is obtained from auxiliary information supplied by an inertial sensor and representative of a movement of the image sensor between the captures of the two successive images of the pair.
Claims
1. A method of determining movement between successive video images captured by an image sensor, the method comprising: determining movement between each current pair of first and second successive video images, the determining comprising testing a plurality of homography model hypotheses on the movement using a RANSAC type algorithm operating on a set of first points in the first video image and on first assumed corresponding points in the second video image so as to determine a selected homography model hypothesis that defines the movement, the testing comprising testing a plurality of first homography model hypotheses on the movement, each of the plurality of first homography model hypotheses being a matrix indicative of a global movement between the pair of first and second successive video images, each of the plurality of first homography model hypotheses being obtained from a set of second points in the first video image and second assumed corresponding points in the second video image; and testing at least one second homography model hypothesis obtained from auxiliary information supplied by at least one inertial sensor, the at least one second homography model hypothesis being a matrix representative of movement of the image sensor between the pair of first and second successive video images; and determining a corrective element comprising a first coefficient for each of the plurality of first homography model hypotheses, the corrective element of a respective first homography model hypothesis being dependent on a distance between the respective first homography model hypothesis and the at least one second homography model hypothesis.
2. The method according to claim 1, wherein the auxiliary information is supplied by at least one gyroscope.
3. The method according to claim 1, wherein the auxiliary information is supplied by a gyroscope and at least one other sensor, with the at least one other sensor comprising at least one of an accelerometer and a magnetometer.
4. The method according to claim 1, wherein the RANSAC type algorithm comprises a Pre-emptive RANSAC type algorithm.
5. The method according to claim 1, wherein the test for each homography model hypothesis includes, for each first point of at least one block of the set of first points in the first image, the following: determining a first estimated point in the second image from the tested homography model hypothesis; determining a position difference between the first estimated point and a first assumed corresponding point in the second image; and determining a first piece of score information from the position differences obtained and an error tolerance, and a correction of the first piece of score information with the corrective element, so as to obtain a second piece of score information, the second piece of score information being used for determining the selected homography model hypothesis.
6. The method according to claim 5, wherein determining the corrective element comprises determining a weighting of the first coefficient by a weighting coefficient representative of a weight of the first piece of score information associated with the at least one second homography model hypothesis with respect to the first piece of score information associated with the tested homography model hypothesis.
7. The method according to claim 6, wherein the weighting coefficient has a fixed and identical value for all the tested homography model hypotheses of all the video image pairs.
8. The method according to claim 6, wherein the weighting coefficient has a fixed and identical value for all the tested homography model hypotheses of the current pair of video images, with the value being calculated from all the values of respective distances between the tested homography model hypotheses of the current pair of images and the second homography model hypothesis, with the value being recalculated at each new current pair of video images.
9. The method according to claim 5, wherein determining the corrective element also takes into account the number of second points.
10. A device for determining movement between successive video images comprising: a processor configured to receive image signals relating to video images successively captured by an image sensor and, for each current pair of first and second successive video images, to determine movement therebetween, said processor comprising a test module executing a RANSAC type algorithm and configured to test a plurality of homography model hypotheses of the movement using said RANSAC type algorithm operating on a set of first points in the first video image and first assumed corresponding points in the second video image so as to deliver a selected homography model hypothesis defining the movement; and an auxiliary input configured to receive auxiliary information from at least one inertial sensor, the auxiliary information being representative of a movement of said image sensor between the pair of first and second successive video images; wherein said test module is configured to test a plurality of first homography model hypotheses of the movement, each of the plurality of first homography model hypotheses being a matrix indicative of a global movement between the first video image and the second video image, each of the plurality of first homography model hypotheses being obtained from a set of second points in the first video image and second assumed corresponding points in the second video image, and to test at least one second homography model hypothesis indicative of the auxiliary information, the test module being further configured to determine a corrective element comprising a first coefficient for each of the plurality of first homography model hypotheses, the corrective element of a respective first homography model hypothesis being dependent on a distance between the respective first homography model hypothesis and the at least one second homography model hypothesis.
11. The device according to claim 10, wherein the at least one inertial sensor comprises a gyroscope.
12. The device according to claim 10, wherein the at least one inertial sensor comprises a plurality of inertial sensors comprising at least one of a gyroscope, an accelerometer and a magnetometer.
13. The device according to claim 10, wherein said RANSAC type algorithm comprises a Pre-emptive RANSAC type algorithm.
14. The device according to claim 10, wherein said test module comprises: a first determination unit configured to determine, for each first point of at least one block of the set of first points in the first video image, a first estimated point in the second video image from the tested homography model hypothesis; a second determination unit configured to determine a position difference between the first estimated point and the first assumed corresponding point in the second video image; a third determination unit configured to determine a first piece of score information from the position differences obtained and an error tolerance; a calculation unit configured to calculate the corrective element; and a correction unit configured to correct the first piece of score information with the corrective element so as to obtain a second piece of score information, with the second piece of score information being used for determining the selected homography model hypothesis.
15. The device according to claim 14, wherein said calculation unit is further configured to determine a weighting of the first coefficient by a weighting coefficient representative of a weight of the first piece of score information associated with the at least one second homography model hypothesis with respect to the first piece of score information associated with the tested homography model hypothesis.
16. The device according to claim 15, wherein the weighting coefficient has a fixed and identical value for all the tested homography model hypotheses of all the video image pairs.
17. The device according to claim 15, wherein the weighting coefficient has a fixed and identical value for all the tested homography model hypotheses of the current pair of video images, with said calculation unit being further configured to calculate the value from all the values of respective distances between the tested homography model hypotheses of the current pair of video images and the second homography model hypothesis, and for recalculating the value at each new current pair of video images.
18. The device according to claim 14, wherein said calculation unit is further configured to take into account the number of second points.
19. An apparatus comprising: an image sensor configured to generate image signals based on successively captured video images; at least one inertial sensor; a device for determining movement between successive video images comprising a processor configured to receive the image signals and, for each current pair of first and second successive video images, to determine movement therebetween, said processor comprising a test module comprising a RANSAC type algorithm and configured to test a plurality of homography model hypotheses of the movement using said RANSAC type algorithm operating on a set of first points in the first video image and first assumed corresponding points in the second video image so as to deliver a selected homography model hypothesis defining the movement, an auxiliary input configured to receive auxiliary information from said at least one inertial sensor, the auxiliary information being representative of a movement of said image sensor between the pair of first and second successive video images, and said test module configured for testing a plurality of first homography model hypotheses of the movement obtained from a set of second points in the first video image and second assumed corresponding points in the second video image and at least one second homography model hypothesis obtained from the auxiliary information, wherein said test module comprises: a first determination unit configured to determine, for each first point of at least one block of the set of first points in the first video image, a first estimated point in the second video image from the tested homography model hypothesis; a second determination unit configured to determine a position difference between the first estimated point and the first assumed corresponding point in the second video image; a third determination unit configured to determine a first piece of score information from the position differences obtained and an error tolerance; a calculation unit configured to
calculate a corrective element comprising a first coefficient taking into account a distance between the homography model hypothesis and the at least one second homography model hypothesis; and a correction unit configured to correct the first piece of score information with the corrective element so as to obtain a second piece of score information, with the second piece of score information being used for determining the selected homography model hypothesis.
20. The apparatus according to claim 19, wherein said at least one inertial sensor comprises a gyroscope.
21. The apparatus according to claim 19, wherein said at least one inertial sensor comprises a plurality of inertial sensors comprising at least one of a gyroscope, an accelerometer and a magnetometer.
22. The apparatus according to claim 19, wherein said RANSAC type algorithm comprises a Pre-emptive RANSAC type algorithm.
23. The apparatus according to claim 19, wherein said image sensor, said at least one inertial sensor, and said device are configured so that the apparatus is at least one of a mobile cellular telephone and a digital tablet.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Other advantages and features of the invention will become apparent from the detailed description of implementations and embodiments, which are in no way restrictive, and from the attached drawings, in which:
(2)
DETAILED DESCRIPTION
(3) In
(4) The apparatus APP also comprises a device 1 for determining movement between successive video images captured by the image sensor 2. This device 1 may, for example, be incorporated within a microprocessor.
(5) The device 1 comprises input means or an input 9 for receiving image signals relating to the video images of the scene SC successively captured by the image sensor 2, and auxiliary input means or an auxiliary input 11 for receiving auxiliary information originating, for example, from a gyroscope 30, and optionally one or more accelerometers 31, and/or a magnetometer 32.
(6) The inertial sensors 30, 31 and 32 are, for example, rigidly connected to the apparatus APP in the same way as the image sensor. The inertial sensors therefore follow any movement in space of the image sensor 2. Accordingly, this auxiliary information is representative of a movement of the image sensor between the captures of two successive video images.
(7) The device 1 comprises processing means or a processor 10 configured, as will be seen in more detail below, for performing for each current pair of first and second successive video images a determination of movement between these two images. In this respect, the processing means comprise test means or a test module 100 configured for testing a plurality of first homography model hypotheses of this movement, obtained from a set of points in the first image and of assumed corresponding points in the second image, and at least one second homography model hypothesis obtained from the auxiliary information.
(8) The test module 100 comprises in this respect various means or units referenced 1001-1005, whose functions are described in greater detail below. The processing means 10 and the means composing them may be implemented in software within the microprocessor.
(9) The various homography model hypotheses of the movement will be processed by a RANSAC type algorithm. Although the conventional RANSAC type algorithm may be used, an implementation will now be described using the Pre-emptive RANSAC type algorithm which is better suited for embedded applications, as is the case described here with reference to
(10) Generally speaking, the Pre-emptive RANSAC algorithm operates on successive blocks of a set of first points in the first image and first assumed corresponding points in the second image of a pair. The Pre-emptive RANSAC algorithm notably tests visual homography model hypotheses obtained from a set of second points in the first image and of second assumed corresponding points in the second image. However, in general, these second points are interest points of the image and the set of first (test) points may or may not intersect with the set of second (hypothesis generator) points.
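The pre-emptive scheme just outlined can be sketched in Python as follows. This is an illustrative sketch only, not the patent's implementation: the function and parameter names (`preemptive_ransac`, `count_inliers`, `block_size`) are assumptions, and the scoring callback stands in for the inlier-counting test described later in the text.

```python
import random

def preemptive_ransac(hypotheses, count_inliers, point_pairs, block_size=20, seed=0):
    """Pre-emptive RANSAC sketch: score every surviving hypothesis on one
    random block of test points, keep the better-scoring half (dichotomy),
    and repeat until a single hypothesis remains or points are exhausted."""
    rng = random.Random(seed)
    pool = list(point_pairs)
    rng.shuffle(pool)
    scores = [0] * len(hypotheses)
    alive = list(range(len(hypotheses)))
    start = 0
    while len(alive) > 1 and start < len(pool):
        block = pool[start:start + block_size]
        start += block_size
        for i in alive:
            scores[i] += count_inliers(hypotheses[i], block)
        alive.sort(key=lambda i: scores[i], reverse=True)
        alive = alive[:max(1, len(alive) // 2)]  # keep the best half
    # deliver the highest-scoring surviving hypothesis
    return hypotheses[max(alive, key=lambda i: scores[i])]
```

Because every hypothesis is scored only on small blocks and the field is halved after each block, the cost stays bounded, which is why this variant suits embedded applications.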
(11) More particular reference is now made to
(12) The first image IM1 is typically the previous image and the second image IM2 the current image. This is followed by an extraction from the first image IM1 of N points or pixels P1_j, j=1 to N, and an extraction of N assumed corresponding points or pixels P2_j from the second image IM2.
(13) This extraction of interest points from an image and of assumed corresponding points from the next video image may be performed with algorithms known to the person skilled in the art. For example, one algorithm is known under the acronym FAST and described, for example, in the article by Edward Rosten and Tom Drummond titled “Machine learning for high-speed corner detection,” ECCV 2006 Proceedings of the 9th European Conference on Computer Vision, Volume 1, Part 1, pages 430-443. Another algorithm is known under the acronym BRIEF and described, for example, in the article by Michael Calonder et al. titled “BRIEF: Binary Robust Independent Elementary Features,” ECCV 2010 Proceedings of the 11th European Conference on Computer Vision: Part IV, pages 778-792.
(14) Triplets of points in the first image and triplets of assumed corresponding points in the second image are formed from these points P1_j and P2_j. From these triplets (step 25), K first homography model hypotheses H1_k, k=1 to K, of the global movement between the two images IM1 and IM2 are prepared.
(15) These first homography model hypotheses are 3×3 homography matrices obtained, for example, using the DLT (Direct Linear Transform) algorithm described, for example, in the aforementioned essay by Elan Dubrofsky. These first model hypotheses H1_k may be considered as visual model hypotheses since they are obtained from the pixels of the two successive images IM1 and IM2. As a guide, the number K of first model hypotheses H1_k may be between 300 and 500.
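A hypothesis built from one triplet of correspondences can be sketched as below. Note the hedge: three correspondences exactly determine an affine transform (6 unknowns), not a full 8-degree-of-freedom homography, so this affine solve embedded in a 3×3 matrix is a simplified stand-in for the patent's DLT-based preparation; the function name is an assumption.

```python
import numpy as np

def affine_hypothesis_from_triplet(src, dst):
    """Build one 3x3 movement-model hypothesis from a triplet of point
    correspondences (src[i] in image 1 maps to dst[i] in image 2).
    Solves the affine case a triplet determines exactly."""
    src = np.asarray(src, dtype=float)           # shape (3, 2)
    dst = np.asarray(dst, dtype=float)           # shape (3, 2)
    A = np.hstack([src, np.ones((3, 1))])        # rows [x, y, 1]
    # Solve A @ X = dst for both target coordinates at once
    X = np.linalg.solve(A, dst)                  # shape (3, 2)
    M = X.T                                      # 2x3 affine part
    return np.vstack([M, [0.0, 0.0, 1.0]])       # embed in a 3x3 matrix
```

For a pure translation triplet, the returned matrix carries the shift in its last column, mirroring how the a3 and a6 coefficients are interpreted later in the text.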
(16) Furthermore, the processing means 10 will prepare from the auxiliary information supplied by the gyroscope 30, and optionally the accelerometer or accelerometers 31 and/or magnetometers 32, a second homography model hypothesis H2 that may be designated as an inertial model hypothesis in that it is obtained directly from the auxiliary information delivered by the inertial sensor or sensors.
(17) The types of cellular mobile telephones known as smartphones may be equipped with a gyroscope, an accelerometer and a magnetometer. The same applies to current digital tablets. It is assumed here that only a gyroscope is present.
(18) The gyroscope integrates rotational speeds over the three axes between the capture of the two images and supplies the auxiliary information θ_x, θ_y and θ_z, which are the corresponding angles of rotation about the axes x, y and z, representing yaw, pitch and roll, respectively. For preparing the inertial 3×3 homography matrix, the processing means must determine the horizontal translation ΔT_x, the vertical translation ΔT_y and the angle of rotation in the plane resulting from the movement of the sensors between the two captured images IM1 and IM2.
(19) In this regard, ΔT_x is given by the formula (1) below:
ΔT_x = θ_x·ρ_x   (1)
(20) in which ρ_x is a scaling factor defined by the formula (2) below:
ρ_x = L_x / (2·tan⁻¹(L_x/(2·f_x)))   (2)
(21) Similarly, ΔT_y is defined by the formula (3) below:
ΔT_y = θ_y·ρ_y   (3)
(22) in which ρ_y is a scaling factor defined by the formula (4) below:
ρ_y = L_y / (2·tan⁻¹(L_y/(2·f_y)))   (4)
(23) In formulae (2) and (4), L_x and L_y represent the resolution of the image, f_x and f_y the focal length, and x and y refer respectively to the horizontal and vertical directions of the image.
(24) The use of such scaling factors is well known to the person skilled in the art and for all useful purposes the latter may refer to the article by Suya You et al. titled “Hybrid inertial and vision tracking for augmented reality registration,” Virtual Reality, 1999, Proceedings, IEEE 13-17 Mar. 1999, pages 260-267.
(25) The roll angle θ_z directly supplies the planar rotation angle without needing a scaling factor.
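Formulas (1) through (4) and the roll angle can be combined into the inertial model hypothesis as sketched below. Since the exact matrix of formula (5) is not reproduced in this text, the rigid-transform layout used here (rotation by θ_z plus the two translations) is an assumption, as are the function and parameter names.

```python
import math
import numpy as np

def inertial_homography(theta_x, theta_y, theta_z, Lx, Ly, fx, fy):
    """Turn gyroscope angles (radians) into an inertial 3x3 hypothesis H2.
    Yaw/pitch are scaled to pixels via field-of-view factors, formulas
    (1)-(4); roll needs no scaling factor."""
    rho_x = Lx / (2.0 * math.atan(Lx / (2.0 * fx)))   # formula (2)
    rho_y = Ly / (2.0 * math.atan(Ly / (2.0 * fy)))   # formula (4)
    dTx = theta_x * rho_x                             # formula (1)
    dTy = theta_y * rho_y                             # formula (3)
    s, c = math.sin(theta_z), math.cos(theta_z)
    # assumed layout: in-plane rotation plus pixel translations
    return np.array([[c, -s, dTx],
                     [s,  c, dTy],
                     [0.0, 0.0, 1.0]])
```

With zero angles the matrix reduces to the identity, i.e., no sensed movement between the two captures.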
(26) The second (inertial) homography model hypothesis may then be represented by the 3×3 matrix H2 defined by the formula (5) below:
(27)
(28) If the telephone is also equipped with accelerometer(s) and/or a magnetometer, the information supplied by the gyroscope is corrected, e.g., by filtering, in a known manner for supplying the auxiliary information.
(29) Based on this, the calculation means or a calculation unit 1004 (
(30) The test module 100 will then proceed with testing various homography model hypotheses, in this case the first hypotheses H1_k and the second hypothesis H2. For this, given that the Pre-emptive RANSAC type algorithm is used, the test module randomly extracts from the set of points P1_j a block of test points BL1A_i and extracts from the set of points P2_j the block of assumed corresponding points BL2A_i, with i varying from 1 to I. As a guide, I may be equal to 20.
(31) In this first iteration, the testing of the various homography model hypotheses takes place on a block of 20 points taken at random from the image IM1 and on the block of assumed corresponding points in the IM2 image. At least some of these test points may or may not be taken from the points used in preparing the various model hypotheses.
(32) For performing this test, first determination means or a first determination unit 1001 (
(33) Then, second determination means or a second determination unit 1002 (
(34) As a guide, this position difference e_i,k, corresponding to the number of pixels between the two points, may be normalized according to the formula (6) below:
e_i,k = ∥BL1AS_i − BL2A_i∥   (6)
(35) in which the notation ∥ ∥ represents the norm function.
(36) Furthermore, third determination means or a third determination unit 1003 (
(37) Furthermore, whenever the position difference e_i,k (for i=1 to I) associated with a hypothesis H1_k is greater than a predefined error ERR, the corresponding piece of score information SCV1_k remains unchanged, whereas it is updated by the formula (7):
SCV1_k = SCV1_k + 1   (7)
(38) if the position difference e_i,k is less than or equal to said error ERR.
(39) Updating the score information SCV2 associated with the model hypothesis H2 is performed in the same way.
(40) Once the I points BL1A_i have been processed, for each first homography model hypothesis H1_k and for the second homography model hypothesis H2, the first updated pieces of score information SCV1_k and SCV2 are therefore obtained. These can be described as visual pieces of score information, since they have been obtained using the points contained in the two images IM1 and IM2. Then, the correction means or correction unit 1005 (
SCV1C_k = SCV1_k − CORR_k   (8)
(41) The second piece of score information SCV2C associated with the inertial model hypothesis H2 is simply equal to the corresponding visual piece of score information SCV2 since the correction coefficient applied thereto is zero.
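The scoring of formulas (6) through (8) can be sketched as follows. This is an illustrative Python sketch under assumed names (`corrected_score`, `err_tol`, `corr`): each test point is projected by the tested hypothesis, counted as an inlier when the position difference is within the error tolerance, and the corrective element is then subtracted from the visual score.

```python
import numpy as np

def corrected_score(H, pts1, pts2, err_tol, corr):
    """Count inliers of hypothesis H over the test block (formula (7)),
    then apply the corrective element (formula (8))."""
    score = 0
    for p1, p2 in zip(pts1, pts2):
        q = H @ np.array([p1[0], p1[1], 1.0])
        est = q[:2] / q[2]                     # first estimated point in image 2
        # position difference, formula (6)
        if np.linalg.norm(est - np.asarray(p2, dtype=float)) <= err_tol:
            score += 1                         # formula (7)
    return score - corr                        # formula (8)
```

For the inertial hypothesis H2 the corrective element is zero, so its corrected score equals its visual score, as paragraph (41) states.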
(42) In a next step 28, the test module performs, for example, a dichotomy on the model hypotheses H1_k and H2 which have just been tested. More precisely, the test module only keeps half of the tested model hypotheses which have had the highest second pieces of score information.
(43) Then, the test module again performs a test 29 on these remaining model hypotheses using a new block of points BL1B_i from the first image, drawn at random from the points not already tested, and the assumed corresponding block of points BL2B_i from the second image IM2. The operations that have just been performed are repeated either until a single remaining model hypothesis HF is obtained, or until the tested points are exhausted.
(44) In the first case, the remaining model hypothesis HF then represents the model of global movement between the two images IM1 and IM2. In the second case, the hypothesis HF that will be adopted is that which displays the highest second piece of score information.
(45) Reference is now made more particularly to
(46) The first homography model hypothesis H1_k is a 3×3 matrix as defined by the expression (9) below:
(47)
(48) The matrix H2 is that illustrated by the formula (5) above. Since the two matrices have the same structure, the coefficients a3 and a6 of the matrix H1_k respectively represent translations in x and y, while the coefficient a2 represents the sine of the angle of rotation in the plane.
(49) As a result, a particularly simple way of determining the distance d_k(H1_k, H2) between the two model hypotheses is to use the formula (10) below:
d_k(H1_k, H2) = [(a3 − ΔT_x)² + (a6 − ΔT_y)² + (arcsin(a2) − θ_z)²]^1/2   (10)
(50) Note that the distance d(H2,H2) is zero. The calculation means then determine (step 241) a first coefficient c1_k defined by the formula (11) below:
c1_k = 1 − e^−d_k   (11)
(51) in which e denotes the exponential function. The first coefficient associated with the second model hypothesis H2 is accordingly zero, since d(H2,H2)=0.
(52) The calculation means then determine (step 242) a weighting coefficient λ representative of a weight of the score information associated with the second homography model hypothesis H2 with respect to the score information associated with the tested homography model hypothesis H1_k or H2. The manner of determining this weighting coefficient will be returned to in more detail below.
(53) The calculation means then determine (step 243) the corrective element CORR_k via the formula (12) below:
CORR_k = N·λ·c1_k   (12)
(54) in which N denotes the number of points tested, i.e., the number of points P1_j and the number of points P2_j (j=1 to N), (
(55) The lower the weighting coefficient λ, the greater the weight of the visual score of the first model hypotheses H1_k will be with respect to the inertial score of the inertial model hypothesis H2. Conversely, the higher the weighting coefficient λ, the less the weight of the visual score of the first model hypotheses H1_k will be with respect to the inertial score of the inertial model hypothesis H2.
(56) The person skilled in the art will be able to determine the weighting coefficient λ according to the envisaged application. However, a fixed and constant value λ equal to 1 for all the homography model hypotheses is a good compromise.
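Formulas (10) through (12) combine into the corrective element as sketched below. The exponential form of formula (11) is reconstructed here as 1 − e^(−d_k), which is zero when the two hypotheses coincide, as the text requires; the function and parameter names are assumptions.

```python
import math

def corrective_element(a2, a3, a6, dTx, dTy, theta_z, n_points, lam=1.0):
    """Distance between a visual hypothesis H1_k (coefficients a2, a3, a6)
    and the inertial hypothesis H2 (dTx, dTy, theta_z), turned into the
    corrective element CORR_k = N * lambda * c1_k. lambda defaults to the
    fixed value 1 suggested in the text."""
    d_k = math.sqrt((a3 - dTx) ** 2 + (a6 - dTy) ** 2
                    + (math.asin(a2) - theta_z) ** 2)   # formula (10)
    c1_k = 1.0 - math.exp(-d_k)                         # formula (11)
    return n_points * lam * c1_k                        # formula (12)
```

A hypothesis that matches the inertial measurement is thus left unpenalized, while one far from it loses up to N·λ score points.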
(57) It is quite possible to keep this fixed and constant value λ equally for all the successive image pairs. However, as a variation, in order to further improve the quality of the filmed video sequence, it is possible, as illustrated schematically in
(58) More precisely, for each current pair of images PP_p, the calculation means determine (step 242) the value of the weighting coefficient λ_p which will, however, remain the same for all the tested model hypotheses associated with these two images of the current pair PP_p.
(59) An example of calculating the weighting coefficient λ_p is illustrated in
(60) Then, the calculation means in step S20 calculate the first corresponding coefficients c1_k (see step 241 in
(61) The weighting coefficient λ_p is then defined via the formula (13) below:
λ_p = 2·e^−c_m   (13)
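The per-pair weighting can be sketched as below. Caveat: formula (13) is partly garbled in this text, so the interpretation of c_m as the mean of the current pair's first coefficients c1_k is an assumption, as is the function name.

```python
import math

def adaptive_lambda(c1_values):
    """Per-pair weighting coefficient, formula (13) as reconstructed:
    lambda_p = 2 * exp(-c_m), with c_m assumed to be the mean of the
    first coefficients c1_k computed for the current image pair."""
    c_m = sum(c1_values) / len(c1_values)  # assumed definition of c_m
    return 2.0 * math.exp(-c_m)
```

When the visual hypotheses all agree with the inertial one (all c1_k near zero), λ_p approaches 2, giving the inertial score more weight; when they diverge, λ_p shrinks toward the visual scores.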
(62) Such a coefficient λ_p, variable from one image pair to the next, makes it possible to prevent the internal movement of an object within the image from becoming too dominant with respect to the background. Thus, for example, when a truck passes through the field of the camera and occupies almost all of this field, the stabilization of the image on the truck is minimized, thus minimizing or reducing the apparent movement of the background.
(63) Overall, the invention also makes it possible, for example, when a black dot is filmed in the center of a white wall, to have only a slight oscillation of the black dot due to the imprecision of the inertial sensor or sensors.