Method for processing a stream of video images

10867211 · 2020-12-15

Abstract

A method for processing a stream of video images to search for information therein, in particular to detect predefined objects and/or a motion, comprising the steps of: a) supplying at least one attention map in at least one space of the positions and of the scales of at least one image of the video stream, b) selecting, in this space, points to be analyzed by making the selection depend at least on the values of the coefficients of the attention map at these points, at least some of the points to be analyzed being selected by random draw with a probability of selection in the draw at a point depending on the value of the attention map at that point, a bias being introduced into the map to give a non-zero probability of selection at any point, c) analyzing the selected points to search therein for said information, d) updating the attention map at least for the processing of the subsequent image, from at least the result of the analysis performed in c), e) reiterating the steps a) to d) for each new image of the video stream and/or for the current image on at least one different scale.

Claims

1. A method for processing a stream of video images to search for information therein to detect predefined objects and/or a motion, the method comprising the steps of: a) supplying an attention map in at least one space of positions and of scales of at least one image of the stream of video images; b) selecting, in this at least one space, points to be analyzed by making the selection depend at least on values of coefficients of the attention map at the points, at least one or more of the points to be analyzed being selected by a random draw with a probability of selection in the random draw at a point depending on a value of a coefficient of the attention map at the point, a bias being introduced into the attention map to give a non-zero probability of selection at any point; c) analyzing the selected points to search therein for said information; d) updating the attention map at least for processing of a subsequent image, from at least a result of the analysis performed in c); e) reiterating the steps a) to d) for each new image of the video stream and/or for a current image on at least one different scale, wherein a computation of the coefficients of the attention map depends on a result of at least one preceding detection and/or on values of coefficients of at least one preceding attention map, and wherein, in the updating of the attention map, a coefficient of the attention map at a point is given a value that becomes higher as the point approaches, in the at least one space of the positions and of the scales, a positive detection.

2. The method according to claim 1, wherein the attention map is initialized by giving a same value for all the points, for a given detection scale, the same value being equal to the bias.

3. The method according to claim 1, wherein the steps a) through e) are applied to a detection of pedestrians.

4. The method according to claim 3, wherein the coefficients of the attention map take one of (i) an extreme value which forces the selection in each region of interest, and (ii) the value of the bias.

5. The method according to claim 3, wherein a binary mask per detection scale is generated in the step b) from the attention map for this detection scale, the binary mask being applied to at least one image of the stream of video images to be analyzed in the step c), the analysis being performed on only pixels that are not masked, all pixels of the binary mask being initialized with a same value corresponding to an absence of masking.

6. The method according to claim 3, wherein, as a function of the result of the detection for a given image on a given scale, at least one region of interest in the image on the scale is defined, and, for the processing of the subsequent image, the attention map on the scale is updated on the basis of the region of interest by adjusting all pixels of the region of interest on the scale to a value greater than the bias.

7. The method according to claim 6, wherein the detection is positive in at least two nearby regions of interest, the method further comprising merging of the regions of interest and a corresponding updating of the coefficients of the attention map for the processing of the subsequent image.

8. The method according to claim 3, wherein the detection is positive at at least one point in the at least one space of the positions and of the scales for a given image, the method further comprising generation of a wider region of interest relative to dimensions of an analysis window given by the scale on which the analysis in step c) is performed, and a corresponding updating of the coefficients of the attention map for the processing of the subsequent image.

9. The method according to claim 8, wherein the wider region of interest is determined by morphological expansion.

10. The method according to claim 9, wherein parameters of the morphological expansion are fixed.

11. The method according to claim 9, wherein parameters of the morphological expansion are dynamic depending on a size of the wider region of interest or on a speed of motion of an object.

12. The method according to claim 1, wherein steps a) through e) are applied to motion detection.

13. The method according to claim 1, wherein the attention map is calculated according to the following formula:
attention_map(t+1) = max(probability_bias, temporal_filter((proximity_function(algo_output(i)))_{i&lt;t})).

14. A non-transitory computer readable medium having instructions stored therein, which when executed by a computer cause the computer to execute the method according to claim 1.

Description

BRIEF DESCRIPTION OF THE FIGURES

(1) The invention will be better understood on reading the following detailed description of nonlimiting exemplary implementations thereof, and on studying the attached drawings, in which:

(2) FIG. 1, previously described, corresponds to the prior art,

(3) FIG. 2 illustrates the concept of space of positions and of scales on an image,

(4) FIG. 3 is a block diagram illustrating an example of a method according to the invention,

(5) FIGS. 4A and 4B are two examples of images extracted from a video stream, in the context of the application of the invention to the detection of objects, on which the outlines of the detected objects and of the areas of interest have been traced,

(6) FIG. 5 is another example of an image in the case of the application of the invention to motion detection, and

(7) FIG. 6 represents the attention map corresponding to the image of FIG. 5.

(8) An example of a processing method according to the invention, intended to process a video stream V, will be described with reference to FIG. 3.

(9) It concerns, for example, a video stream originating from video-surveillance cameras, and the aim is to search in the images of this stream for a given item of information, for example find an object having predefined characteristics, such as a pedestrian. As a variant, it concerns motion detection.

(10) The method comprises a detection engine 10 which supplies a detection result 11. The detection engine may use different detection techniques, on different detection scales, depending on whether the aim is to detect an object such as a pedestrian for example or to perform a motion detection. The detection engine 10 corresponds to an algorithm implemented in a microcomputer or a dedicated processor.

(11) Among the detection techniques that may be used in the context of the detection of an object in particular, ACF (Aggregated Channel Features), DPM (Deformable Part Models), deep learning and others may be cited.

(12) The article "Fast Feature Pyramids for Object Detection" by Piotr Dollár et al. (IEEE Transactions on Pattern Analysis and Machine Intelligence, September 2014) describes examples of techniques that may be used.

(13) The article Fast Human Detection for Intelligent Monitoring Using Surveillance Visible Sensors by Byoung Chul Ko et al, published in Sensors 2014, 14, 21247-21257, discloses performing a detection of pedestrians by determining an optimal scale factor through the use of adaptive regions of interest.

(14) The result of the detection, namely the presence of predefined objects in the images or the presence of a motion, may be, according to the applications, sent at point 12 to a higher level system, in order for example to process these objects with a view to identifying them.

(15) Whether for the detection of objects or of motion, the method according to the invention relies on the use of at least one attention map in a given space of detection positions and scales. "Attention map" denotes a matrix whose coefficients are associated with points of the space of the positions and of the scales. The value of each coefficient represents the attention that the detection algorithm should pay to that point: a higher attention where the information is likely to be located given the result of the analysis of the preceding images, and a lower one where, in light of that result, there is a low probability that the information sought is located. This higher attention is reflected by a greater frequency of analysis of the pixels concerned.

(16) The method comprises a step of updating each attention map for a given detection scale in light of the result 11 of the detection on that scale, this updating being able to be performed also if necessary by taking into account values previously taken by the map in the processing of the preceding images.

(17) All the coefficients of the attention map may have been initialized with one and the same value, for example a non-zero bias b between 0 and 1, bounds excluded.

(18) The updating of the attention map in the step 14 in FIG. 3 is performed according to learned data 15. These data may be learned in various ways. The learning involves teaching the system where the probability of finding the information sought is greatest, given the nature and/or the location of the detected objects and, as appropriate, their motion.
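By way of illustration only, the overall scheme of steps a) to e) may be sketched as follows. This is a minimal sketch, not an implementation prescribed by the patent; the callbacks `analyze_point` and `update_map` are hypothetical placeholders for the detection engine 10 and the updating step 14:

```python
import random

def process_stream(frames, analyze_point, update_map, bias=0.2):
    """Sketch of steps a)-e): keep an attention map initialized to the
    bias, draw points with a probability given by the map (never below
    the bias), analyze the drawn points, then update the map from the
    detections before moving to the next image."""
    h, w = len(frames[0]), len(frames[0][0])
    attention = [[bias] * w for _ in range(h)]  # a) map set to the bias b
    results = []
    for frame in frames:                        # e) reiterate per image
        detections = []
        for y in range(h):
            for x in range(w):
                # b) random draw: the point is analyzed when the draw
                # falls below its attention coefficient
                if random.random() < attention[y][x]:
                    if analyze_point(frame, x, y):   # c) analysis
                        detections.append((x, y))
        attention = update_map(attention, detections, bias)  # d) update
        results.append(detections)
    return results
```

With a bias of 1 every pixel is analyzed at every image; with a smaller bias, pixels outside the regions of interest are still drawn occasionally, which is what guarantees that newly appearing objects are eventually detected.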

(19) Referring to the example of FIG. 4A, which concerns the detection of objects, in this instance pedestrians, the detected objects have been marked on the image. These objects are delimited by rectangles 16, the long sides of which are vertical.

(20) The updating of the attention map comprises updating the value of the coefficients of the map which, in this example, correspond to the analyzed pixels encompassed by these rectangles.

(21) Advantageously, in the example of the detection of pedestrians, wider regions of interest around the detected objects are defined to take account of the motion of these objects on the image, and thus ensure that, on the subsequent image, the analysis is focused preferentially on these regions.

(22) The form of the wider regions of interest may result from a learning, and take account of the nature of the objects and/or of their motion.

(23) The wider regions of interest may be determined by subjecting the detected objects to a mathematical transformation, such as a morphological expansion for example.

(24) FIG. 4A shows the outline 17 of the wider regions of interest. If, in computing the wider regions of interest associated with the different detected objects, area overlaps or nearby areas are obtained, these areas may be merged into a single area, which is the case with the area situated on the right in FIG. 4A. It may be seen that each wider region of interest occupies a surface area equal to several, for example at least three, times that of the object or objects contained within.
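The merging of overlapping or nearby wider regions can be sketched as follows; this is an illustrative helper, not taken from the patent, assuming axis-aligned rectangles (x0, y0, x1, y1) that are merged into their bounding box when they intersect after a margin is added:

```python
def overlaps(a, b, margin=0):
    """True if the two rectangles intersect once grown by the margin."""
    return (a[0] - margin <= b[2] and b[0] - margin <= a[2]
            and a[1] - margin <= b[3] and b[1] - margin <= a[3])

def merge_regions(rects, margin=0):
    """Merge overlapping or nearby rectangles into single bounding
    boxes, as in FIG. 4A where two nearby regions become a single one."""
    out = []
    for r in rects:
        r = tuple(r)
        changed = True
        while changed:          # keep absorbing until r is isolated
            changed = False
            for i, q in enumerate(out):
                if overlaps(r, q, margin):
                    out.pop(i)  # absorb q into r's bounding box
                    r = (min(r[0], q[0]), min(r[1], q[1]),
                         max(r[2], q[2]), max(r[3], q[3]))
                    changed = True
                    break
        out.append(r)
    return out
```

The inner loop re-tests the grown rectangle against the remaining ones, so chains of nearby regions collapse transitively into a single area.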

(25) FIG. 4B represents an image originating from the same camera at a different instant. It may be seen that the wider regions of interest remain centered on the detected pedestrians.

(26) The attention map has its coefficients in the wider regions of interest updated. A higher value is given to a coefficient to reflect a higher probability that the pixel associated with this coefficient contains the information sought. The coefficients of the attention map corresponding to the regions of interest may, for example, take an extreme value c, for example a maximum value equal to 1, to force the detection at these points.
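The widening by morphological expansion and the corresponding update of the map coefficients can be sketched as follows; this is an illustrative NumPy version with a square structuring element, a choice the patent does not impose:

```python
import numpy as np

def dilate(mask, radius):
    """Naive binary morphological expansion with a (2*radius+1)-wide
    square structuring element, done by OR-ing shifted copies of the mask."""
    h, w = mask.shape
    out = mask.copy()
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            src = mask[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
            out[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)] |= src
    return out

def update_attention(attention, detection_mask, radius=5, c=1.0):
    """Force the coefficients to the extreme value c inside the widened
    regions of interest; elsewhere the previous values (at least the
    bias) are kept."""
    out = attention.copy()
    out[dilate(detection_mask.astype(bool), radius)] = c
    return out
```

In practice a library routine such as a morphological dilation from an image-processing package would replace the naive `dilate`; the shifted-copy version is only meant to make the operation explicit.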

(27) Several attention maps are thus updated after the processing of each image of the stream, given that there is, in the example considered, a map for each detection scale.

(28) The next step is to ensure that the pixels situated in the areas of interest are analyzed more often than those outside of these areas.

(29) However, areas outside of the regions of interest are regularly observed, to detect new objects which might just have appeared therein.

(30) For that, a random draw 20 is performed for each detection scale as illustrated in FIG. 3, and, on the basis of this draw and of the attention map, a binary mask 21 is generated which will determine the areas where the detection will be performed, all the pixels of this mask being, in this example, initially at 1 to ensure that the initial detection 10 is applied to all the pixels of the image.

(31) A random draw between 0 and 1 is for example conducted for each pixel, and the value of this draw is compared to the value of the attention map at that point. Assume for example that the bias b is 0.2 and that the coefficients of the attention map in the regions of interest have the maximum value c=1. The binary mask takes the value 1 when the draw is less than the value of the coefficient of the attention map, which means that the corresponding pixel of the image is analyzed in the step 10. For example, for a coefficient equal to 0.2, corresponding to a pixel situated outside any area of interest, a draw equal to 0.5 gives a mask value of 0, because the value of the coefficient is lower than the draw, and the corresponding pixel of the image is not analyzed in the step 10; for a draw equal to 0.1, the pixel is analyzed because the value of the coefficient is greater than the draw. For a coefficient of the attention map corresponding to a pixel situated in an area of interest, the draw is always less than 1 and the pixel is always analyzed in the step 10. A pixel situated outside of an area of interest therefore leads to a binary mask which statistically takes the value 0 more often than a pixel situated in an area of interest; thus, the pixels situated in the regions of interest are analyzed on average more frequently than the others. The draw may be performed for all the pixels, but the decision depends on the attention map, and the bias guarantees that there is no loss of detection. The value of the bias b conditions the latency time, that is to say the number of images which elapse on average without a given pixel situated outside of an area of interest being analyzed. For example, this latency time is approximately 5 in the context of the detection of pedestrians for a video supplying 25 images/s; that means that, in the area of the image corresponding to the lawn at bottom left in FIGS. 4A and 4B, a pixel is analyzed on average only every 5 images. A gain in efficiency in the processing is thus obtained, since a pointless analysis is avoided in the detection step 10 in areas where there is a low probability of a pedestrian moving around, the analysis being concentrated automatically on the regions where the probability of detecting pedestrians is highest from one image to the next.
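This per-pixel draw can be sketched as follows; an illustrative NumPy version, the patent prescribing no particular implementation. A coefficient of b=0.2 keeps a pixel on average once every 1/0.2 = 5 images, while a coefficient of c=1 keeps it at every image:

```python
import numpy as np

def binary_mask(attention, rng):
    """Keep (value 1) the pixels whose uniform draw in [0, 1) falls
    below the attention coefficient: a coefficient of 1 is always kept,
    a coefficient b is kept with probability b."""
    return (rng.random(attention.shape) < attention).astype(np.uint8)
```

Used with `rng = np.random.default_rng()`, the mask is then applied to the image so that only the unmasked pixels enter the detection step.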

(32) When the method is applied to the detection of motion on the image, the computation of the coefficients of the attention map takes account of a motion probability map, as illustrated in FIG. 5. In this figure, the objects 16 appearing on the image are two moving vehicles. FIG. 6 represents the motion probability map, computed from several preceding images of the video stream, from the response of each of the pixels. It may be seen that the detected motion probability is high at the level of the vehicles, and zero elsewhere.

(33) The attention map may be computed from this motion probability map and from a transfer function, for example as follows:
map_attention = max(probability_bias, expansion(map_motion))

(34) The expansion concerned is for example morphological expansion.

(35) Where the expansion is zero, the point being too far from the object, the value of the bias b is taken for the coefficient of the attention map. Where the value resulting from the expansion is greater than the bias b, this greater value is taken instead.
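As a self-contained sketch of this formula, map_attention = max(probability_bias, expansion(map_motion)), the morphological expansion being approximated here by a 3x3 maximum filter, an assumption made purely for illustration:

```python
import numpy as np

def attention_from_motion(motion_map, bias=0.2):
    """Where the expanded motion probability exceeds the bias it is
    kept; everywhere else the coefficient falls back to the bias b."""
    h, w = motion_map.shape
    padded = np.pad(motion_map, 1, mode="constant")
    # grey-level dilation: maximum over the 3x3 neighbourhood of each pixel
    expanded = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    return np.maximum(bias, expanded)
```

The `np.maximum` against the bias implements the fallback described in paragraph (35): far from any motion the coefficient is exactly b, never zero, so every pixel keeps a non-zero probability of being analyzed.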

(36) Obviously, the invention is not limited to the examples which have just been described.

(37) The invention may in particular be applied to video streams other than those originating from surveillance cameras, for example a camera equipping a vehicle for the purpose of pedestrian avoidance.