Method, device, and system for pre-processing a video stream for subsequent motion detection processing

09628751 · 2017-04-18

Abstract

There is provided a method for pre-processing a video stream for subsequent motion detection processing. The method comprises receiving a video stream of images, wherein each image in the video stream is represented by a first plurality of bits; enhancing the video stream of images by, for each image in the video stream: comparing the image to at least one previous image in the video stream so as to identify pixels where the image differs from the at least one previous image in the video stream, enhancing the image in those pixels where the image differs from the at least one previous image in the video stream; and converting the enhanced video stream of images so as to produce a converted video stream of images for subsequent motion detection processing, wherein each image in the converted video stream is represented by a second plurality of bits being lower than the first plurality of bits.

Claims

1. A method for motion detection processing of a video stream, comprising: receiving a video stream of images, wherein each image in the video stream is represented by a first plurality of bits, enhancing the video stream of images by, for each image in the video stream: comparing the image to at least one previous image in the video stream so as to identify pixels where the image differs from the at least one previous image in the video stream, enhancing the image in those pixels where the image differs from the at least one previous image in the video stream, converting the enhanced video stream of images so as to produce a converted video stream of images, wherein each image in the converted video stream is represented by a second plurality of bits being lower than the first plurality of bits, and applying motion detection processing to the converted video stream of images.

2. The method of claim 1, wherein each pixel of the images in the video stream is represented by a first number of bits, and the step of converting the enhanced video stream of images comprises converting the enhanced video stream of images such that each pixel of the images in the converted video stream is represented by a second number of bits being lower than the first number of bits.

3. The method of claim 1, wherein, in the step of comparing the image to at least one previous image in the video stream, the image is compared to an image formed from the at least one previous image in the video stream.

4. The method of claim 1, wherein, in the step of enhancing the image, an offset is added to a pixel value in those pixels where the image differs from the at least one previous image in the video stream.

5. The method of claim 1, wherein, in the step of enhancing the image, a pixel value in those pixels where the image differs from the at least one previous image in the video stream is multiplied by a gain factor.

6. The method of claim 1, wherein in the step of enhancing the image, the image is further enhanced in a surrounding of those pixels where the image differs from the at least one previous image in the video stream.

7. The method of claim 1, wherein the step of enhancing the video stream of images further comprises, for each image in the video stream: noise filtering the image in those pixels where the image does not differ from the at least one previous image in the video stream.

8. The method of claim 7, wherein the noise filtering comprises temporally averaging the image using the at least one previous image in the video stream.

9. The method of claim 1, further comprising providing the converted video stream of images to a motion detection processing device.

10. A computer program product comprising a non-transitory computer-readable medium with computer code instructions for carrying out the method according to claim 1.

11. A device, comprising: a receiver configured to receive a video stream of images, wherein each image in the video stream is represented by a first plurality of bits, a video stream enhancing component configured to enhance the video stream of images by, for each image in the video stream: comparing the image to at least one previous image in the video stream so as to identify pixels where the image differs from the at least one previous image in the video stream, enhancing the image in those pixels where the image differs from the at least one previous image in the video stream, a converting component configured to convert the enhanced video stream of images so as to produce a converted video stream of images, wherein each image in the converted video stream of images is represented by a second plurality of bits being lower than the first plurality of bits, and one of: a transmitting component configured to transmit the converted video stream of images via a network to a motion detection device for subsequent motion detection processing; and a motion detection processing component configured to perform motion detection processing on the converted video stream.

12. A system for motion detection processing of a video stream, comprising: a camera configured to capture a video stream of images, a device according to claim 11, the device being configured to receive the video stream of images captured by the camera.

13. The system of claim 12, further comprising: a motion detection processing device configured to receive a converted video stream from the device, and apply motion detection processing to the converted video stream.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) The above, as well as additional objects, features and advantages of the present invention, will be better understood through the following illustrative and non-limiting detailed description of preferred embodiments of the present invention, with reference to the appended drawings, where the same reference numerals will be used for similar elements, wherein:

(2) FIG. 1 illustrates a system for pre-processing of a video stream of images for subsequent motion detection processing according to embodiments.

(3) FIG. 2 illustrates a system for pre-processing of a video stream of images for subsequent motion detection processing according to other embodiments.

(4) FIG. 3 is a flowchart of a method for pre-processing of a video stream of images for subsequent motion detection processing according to embodiments.

(5) FIG. 4 schematically illustrates enhancement of a video stream of images according to embodiments.

(6) FIG. 5 schematically illustrates combined noise filtering and enhancement of a video stream of images according to embodiments.

DETAILED DESCRIPTION OF EMBODIMENTS

(7) The present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. The systems and devices disclosed herein will be described during operation.

(8) FIG. 1 illustrates a system 100 for pre-processing a video stream, i.e. a sequence of images, for subsequent motion detection processing. The system 100 comprises a camera 120 and a device 140 for pre-processing a video stream of images captured by the camera 120. The system 100 may further comprise a motion detection processing device 160. The motion detection device 160 may for example be connected to the pre-processing device 140 via a network 180.

(9) The camera 120 may generally be any type of digital camera which is capable of capturing a video stream of images of a scene. The camera 120 may work according to different principles. For example, the camera 120 may be a visible light camera or a thermal camera.

(10) The camera 120 is operatively connected, via wire or wirelessly, to the pre-processing device 140. The pre-processing device 140 may be physically separate from the camera 120 as illustrated in FIG. 1, or may be integrated in the camera 120.

(11) The pre-processing device 140 may comprise a receiving component 142, an enhancement component 144, and a converting component 146. The pre-processing device 140 is, via the receiving component 142, configured to receive a video stream of images 130 from the camera 120. The enhancement component 144 is generally configured to enhance the received video stream of images 130 to generate an enhanced video stream 135. Further, the converting component 146 is configured to convert, such as compress or reduce bit depth of, the enhanced video stream 135 of images so as to output a converted video stream of images 150. In particular, the images in the received video stream 130 may be represented by a first plurality of bits, and the images of the converted video stream 150 may be represented by a second, lower, plurality of bits due to the conversion. For example, the pixels in the images of the video stream 130 may be represented by 16 bits, while the pixels in the images of the converted video stream 150 may be represented by 8 bits. The conversion may thus be made from a first bit environment (a 16-bit environment) to a second bit environment (an 8-bit environment). The enhancement component 144 operates on images in the first, higher, bit environment, whereas the converted video stream 150 which is subsequently input to the motion detection processing device 160 is represented in the second, lower, bit environment. The conversion of the images from the first to the second bit environment enables streaming of the converted video stream 150 over the network 180.
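
To make the split between bit environments concrete, the following is a minimal sketch in Python/NumPy, assuming 16-bit gray-scale frames; the function names, the threshold and the offset are illustrative assumptions rather than the patented implementation, and the comparison here uses only the immediately preceding frame:

    import numpy as np

    def enhance_frame(frame, previous, threshold=2000, offset=4000):
        # Boost pixels that differ from the previous frame, working
        # entirely in the first (16-bit) environment.
        diff = np.abs(frame.astype(np.int32) - previous.astype(np.int32))
        boosted = frame.astype(np.int32)
        boosted[diff > threshold] += offset
        return np.clip(boosted, 0, 65535).astype(np.uint16)

    def convert_frame(frame):
        # Conversion to the second (8-bit) environment by bit-depth reduction.
        return (frame >> 8).astype(np.uint8)

    def preprocess_stream(frames):
        # Yield 8-bit frames suitable for subsequent motion detection processing.
        previous = None
        for frame in frames:  # each frame: a np.uint16 array
            enhanced = enhance_frame(frame, previous) if previous is not None else frame
            previous = frame
            yield convert_frame(enhanced)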

(12) The converted video stream of images 150 may subsequently be transmitted to a motion detection device 160. For that purpose the pre-processing device 140 may comprise a transmitting component (not shown). The transmitting component may e.g. be configured to transmit the converted video stream of images via a network 180. The network 180 may be any type of network being suitable for the purpose, such as a local area network or a wide area network.

(13) The network 180 may typically have a limited bandwidth, and hence the available bit rate is limited. For example, the available bit rate may not be high enough to enable transmission of the video sequence 130. However, thanks to the conversion carried out by the converting component 146, the bit rate of the video sequence 130 is reduced to fall within the available bit rate.

(14) The pre-processing device 140, and the components thereof, may be implemented in software or in hardware, or a combination thereof. In particular, the pre-processing device 140 may comprise a processor and a memory. The memory may act as a (non-transitory) computer-readable storage medium or device for storing computer code instructions which, when executed by the processor, are adapted to carry out any method disclosed herein.

(15) The motion detection processing device 160 may generally be any device which is adapted to receive a video stream of images and to perform motion detection processing on the received video stream according to any known method. For example, the motion detection processing device 160 may implement any commercially available motion detection engine.

(16) FIG. 2 illustrates an alternative system 200 for pre-processing a video stream for subsequent motion detection processing. The system 200 comprises a camera 220 and a pre-processing device 240. The difference between the system 200 and the system 100 of FIG. 1 is that the pre-processing device 240 includes a motion detection processing component 248 replacing the motion detection device 160. In other words, in system 200 the motion detection processing is carried out by the pre-processing device 240 itself, whereas in system 100 it is carried out by a separate device.

(17) The operation of the systems 100 and 200, and in particular of the pre-processing device 140, 240, will now be described with reference to FIG. 1, FIG. 2, FIG. 4 and the flow chart of FIG. 3.

(18) In step S02, the pre-processing device 140, 240 receives, via the receiver 142, 242, a video stream of images 130 from the camera 120, 220. FIG. 4 illustrates a part {I.sub.t-3, I.sub.t-2, I.sub.t-1, I.sub.t} of such a video stream of images 130. Here, I.sub.t represents a current image in the video stream, and I.sub.t-3, I.sub.t-2, I.sub.t-1 are previous images in the received video stream 130. Three previous images are illustrated here; in general, however, the number of previous images may take other values.

(19) The received images {I.sub.t-3, I.sub.t-2, I.sub.t-1, I.sub.t} may include stationary objects 402 and moving objects 404 (as indicated by the arrow). In the illustrated video stream, a moving object 404 is moving to the right. There may further be noise in the received images {I.sub.t-3, I.sub.t-2, I.sub.t-1, I.sub.t}, here indicated by a dotted background pattern.

(20) In step S04, the enhancement component 144, 244 enhances the video stream of images 130. In more detail, for each image I.sub.t-3, I.sub.t-2, I.sub.t-1, I.sub.t in the video stream 130, the enhancement component 144, 244, may carry out a number of substeps S04a, S04b, S04c. This procedure is described in the following with respect to image I.sub.t.

(21) In step S04a, the enhancement component 144, 244, compares the image I.sub.t to at least one previous image I.sub.t-3, I.sub.t-2, I.sub.t-1 in the video stream 130. In the embodiment of FIG. 4, the image I.sub.t is, for reasons of illustration, compared to the previous image I.sub.t-1. However, more generally, the image I.sub.t may be compared to an image formed from the at least one previous image I.sub.t-3, I.sub.t-2, I.sub.t-1 such as a mean value or a temporal filtration of the at least one previous image I.sub.t-3, I.sub.t-2, I.sub.t-1. This will be explained in more detail later on with reference to FIG. 5. In more detail, the enhancement component 144, 244 may compare the image I.sub.t to the previous image I.sub.t-1 pixel-wise, e.g. by calculating differences between intensity values of individual pixels. If the difference (or the absolute value of the difference) in intensity values between the image I.sub.t and the at least one previous image I.sub.t-1 exceeds a threshold in a pixel, then the enhancement component 144, 244 may determine that the image differs from the previous image in that pixel. In this way, the enhancement component may identify pixels where the image I.sub.t differs from the previous image I.sub.t-1. This is further illustrated by image D.sub.t in FIG. 4, which in black shows pixels 406 where image I.sub.t differs from image I.sub.t-1 by more than a certain threshold. The differences between image I.sub.t and the previous image I.sub.t-1 may be found where there is motion in the images I.sub.t-1, I.sub.t, and in particular around the edges of the moving object 404. Since the moving object 404 in this case moves to the right, the differing pixels will show up at the front, right, edge and the back, left, edge of the moving object 404.
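
A minimal sketch of this comparison, assuming the frames are 16-bit gray-scale NumPy arrays; the threshold value is an illustrative assumption:

    import numpy as np

    def difference_mask(current, previous, threshold=2000):
        # D_t: True in pixels where |I_t - I_{t-1}| exceeds the threshold.
        diff = np.abs(current.astype(np.int32) - previous.astype(np.int32))
        return diff > threshold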

(22) The enhancement component 144, 244 then proceeds to enhance the image I.sub.t in those pixels 406 where image I.sub.t differs from the at least one previous image I.sub.t-3, I.sub.t-2, I.sub.t-1 (here represented by I.sub.t-1) in the video stream 130. The enhancement typically involves applying a gain to the intensity values of pixels 406 in image I.sub.t that differ from the at least one previous image I.sub.t-3, I.sub.t-2, I.sub.t-1. This may e.g. include adding an offset value to the pixel value of image I.sub.t, and/or multiplying the pixel value of image I.sub.t by a gain factor. This is further illustrated by the enhanced image I.sub.t.sup.e of FIG. 4 where the image I.sub.t is enhanced in pixels 408 corresponding to the pixels 406 where the image I.sub.t differs from the at least one previous image I.sub.t-3, I.sub.t-2, I.sub.t-1. As may be seen in FIG. 4, the moving object 404 appears to be larger in the enhanced image in comparison to the original image I.sub.t, thus making the object easier to detect during subsequent motion detection processing.
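
The offset and gain variants of the enhancement might be sketched as follows, with the result clipped to stay within the 16-bit range; the particular values passed in are illustrative assumptions:

    import numpy as np

    def enhance(image, mask, offset=0, gain=1.0):
        # Enhance only the pixels flagged in the difference mask.
        out = image.astype(np.float64)
        out[mask] = out[mask] * gain + offset
        return np.clip(out, 0, 65535).astype(np.uint16)

For instance, enhance(I_t, D_t, offset=4000) adds a pure offset, enhance(I_t, D_t, gain=1.5) applies a pure gain, and the two may be combined.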

(23) According to further examples, the enhancement component 144, 244 may further enhance the image I.sub.t in a surrounding of those pixels 406 where the image I.sub.t differs from the at least one previous image I.sub.t-3, I.sub.t-2, I.sub.t-1. For example, the enhancement component 144, 244 may extend the pixel region 406 in image D.sub.t to also include surrounding pixels, such as adding an n×n neighbourhood, with n=1, 3, 5 etc., around each pixel in the region 406. In that way, a frame of pixels will be added to the pixel region 406.
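
One way to realise this extension is a binary dilation of the difference mask with an n×n structuring element; the use of scipy.ndimage below is an implementation choice, not something mandated by the text:

    import numpy as np
    from scipy.ndimage import binary_dilation

    def extend_mask(mask, n=3):
        # Grow the differing-pixel region 406 by an n×n neighbourhood.
        return binary_dilation(mask, structure=np.ones((n, n), dtype=bool))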

(24) In some embodiments, the enhancement component 144, 244 may further perform noise filtering of the image I.sub.t. In particular, the enhancement component 144, 244 may apply a noise filter to those pixels in the image I.sub.t which do not differ from the at least one previous image I.sub.t-3, I.sub.t-2, I.sub.t-1. Such pixels correspond to the white portions 410 of the image D.sub.t. The noise filter may generally be any type of noise filter used in the art. By way of example, it may be a temporal filter which filters noise based on the image I.sub.t and the at least one previous image I.sub.t-3, I.sub.t-2, I.sub.t-1, for instance by temporally averaging (or forming a weighted average of) the image I.sub.t and the at least one previous image I.sub.t-3, I.sub.t-2, I.sub.t-1. In FIG. 4, I.sub.t.sup.e,noise represents an image which is enhanced in pixel regions 406 where the image I.sub.t differs from the at least one previous image I.sub.t-3, I.sub.t-2, I.sub.t-1 and which is noise filtered in pixel regions 410 where the image I.sub.t does not differ from the at least one previous image I.sub.t-3, I.sub.t-2, I.sub.t-1, as represented by the less dense dotted background pattern in I.sub.t.sup.e,noise.
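
A sketch of this selective filtering, using a plain temporal average over the current and previous frames as one example of the filters mentioned above:

    import numpy as np

    def noise_filter(current, previous_frames, mask):
        # Temporally average the static pixels (region 410, mask False);
        # leave the moving pixels (region 406, mask True) untouched.
        stack = np.stack([current, *previous_frames]).astype(np.float64)
        out = current.astype(np.float64)
        out[~mask] = stack.mean(axis=0)[~mask]
        return out.astype(np.uint16)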

(25) The enhanced image I.sub.t.sup.e or I.sub.t.sup.e,noise is then subject to conversion by the converting component 146, 246. The purpose of the conversion is to convert the video stream of enhanced images 135 to a format which is suitable for transmittal (streaming) over the network 180 and/or which is suitable for a standard, commercially available, motion detection processing engine. In more detail, the images I.sub.t.sup.e, I.sub.t.sup.e,noise of the enhanced video stream are represented by a first plurality of bits. The converting component 146, 246 converts the images I.sub.t.sup.e, I.sub.t.sup.e,noise of the enhanced video stream to a video stream in which each image is represented by a lower number of bits.

(26) For example, the converting component 146, 246 may convert the images I.sub.t.sup.e, I.sub.t.sup.e,noise of the enhanced video stream by performing video compression according to any known method, thereby decreasing the number of bits required to represent each image in the video stream.

(27) According to other examples, the converting component 146, 246 converts the enhanced video stream by reducing the bit depth of the images I.sub.t.sup.e, I.sub.t.sup.e,noise in the enhanced video stream. More specifically, each pixel of the images of the video stream 130 and thereby also the images of the enhanced video stream 135 may be represented by a first number of bits, such as 16 bits, referred to herein as the bit depth. The converted images may have a lower bit depth, such as 8 bits. The converting component 146, 246 may reduce the bit depth in any known manner. For example, consider the situation where a pixel before conversion is represented by 16 bits. This means that the value in the pixel before conversion may take 2.sup.16 different values. The conversion may then proceed to map the first 2.sup.8 of these values to a first converted value, the following 2.sup.8 values to a second converted value etc. In this way, the pixels in the converted image may take 2.sup.8 different values, and may hence be represented by 8 bits. This approach may of course be generalized to a situation where the pixels before and after the conversion are represented by arbitrary numbers of bits.
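
The mapping just described — the first 2.sup.8 input values to the first converted value, the next 2.sup.8 to the second, and so on — amounts to integer division by 2.sup.8, i.e. discarding the eight least significant bits:

    import numpy as np

    def reduce_bit_depth(frame):
        # Map each run of 2**8 consecutive 16-bit values to one 8-bit value.
        return (frame >> 8).astype(np.uint8)  # equivalent to frame // 2**8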

(28) It may also be the case that at least the converted video stream has a plurality of color channels. The converting component 146, 246 may then use the plurality of color channels such that the pixels 406 where the image I.sub.t differs from the at least one previous image I.sub.t-3, I.sub.t-2, I.sub.t-1 are given different weights in the different color channels in comparison to pixels 410 where the image I.sub.t does not differ from the at least one previous image I.sub.t-3, I.sub.t-2, I.sub.t-1. In this way, pixels corresponding to a detected motion may be given a different color, i.e. be color-coded, in comparison to pixels where no motion is detected.

(29) Consider the case where the images I.sub.t-3, I.sub.t-2, I.sub.t-1, I.sub.t in the video stream 130 are gray-scale images, i.e. where each pixel of the images in the video stream 130 is represented by a first number of bits in a single color channel, and the images in the converted video stream are color images, i.e. where each pixel of the images in the converted video stream is represented by a second number of bits divided between a plurality of color channels. For example, the images before conversion may be represented by 16 bits in a single color channel, and the images after conversion may be represented by 8 bits in each of three color channels, such as a red, a green, and a blue color channel.

(30) For pixels 410 for which no difference was identified, the converting component 146, 246 may assign the same value in all color channels. In that way, those pixels 410 will look gray in the images of the converted video sequence.

(31) For pixels 406 for which a difference was identified, the converting component 146, 246 may assign different values to some of the plurality of color channels in the converted image. For example, those pixels 406 may only be coded in one of the color channels, such as the green channel, or in two of the color channels. Alternatively, different weights may be applied to the different color channels so as to obtain a distribution between the color channels.

(32) It is to be noted that the opposite is also possible, such that pixels 406 for which a difference was identified are coded in gray-scale, i.e. are assigned the same value in all color channels, and that pixels 410 for which no difference was identified are coded in color, i.e. are assigned different values in at least some color channels.
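
A sketch of the gray-to-color variant, coding moving pixels in the green channel only; which channels to use is an illustrative assumption:

    import numpy as np

    def convert_color_coded(frame, mask):
        # 16-bit gray in, 8-bit RGB out; moving pixels coded in green only.
        gray = (frame >> 8).astype(np.uint8)
        rgb = np.stack([gray, gray, gray], axis=-1)  # equal channels: gray
        rgb[mask, 0] = 0  # zero the red channel where motion was found
        rgb[mask, 2] = 0  # zero the blue channel where motion was found
        return rgb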

(33) It may be the case that the images I.sub.t-3, I.sub.t-2, I.sub.t-1, I.sub.t in the video stream 130 are also color images, i.e. that each pixel of the images in the video stream 130 is represented by a first number of bits divided between a plurality of color channels. For example, the images in the video stream 130 may be represented by 16 bits in each of three color channels, and the images after conversion may be represented by 8 bits in each of three color channels.

(34) For pixels 410 for which no difference was identified, the converting component 146, 246 may keep the balance between the color channels, i.e. for such pixels 410 the distribution between the color channels is the same in the images of the video stream 130 and the converted video stream 150.

(35) For pixels 406 for which a difference is identified, the converting component 146, 246 may modify the balance between the color channels, i.e. for such pixels 406 the distribution between the color channels in the converted video stream 150 is modified in comparison to the distribution between the color channels in the video stream 130. For example, the converting component 146, 246 may convert the video stream 130 such that moving objects in the images of the converted video stream 150 appear slightly redder than the remaining parts of the images.
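
A sketch of such a balance modification, assuming an RGB input stream; the red gain factor is an illustrative assumption:

    import numpy as np

    def convert_with_red_tint(frame_rgb16, mask, red_gain=1.25):
        # Reduce a 16-bit RGB frame to 8 bits per channel and tint the
        # moving pixels slightly red by boosting their red channel.
        rgb = (frame_rgb16 >> 8).astype(np.float64)
        rgb[mask, 0] = np.clip(rgb[mask, 0] * red_gain, 0, 255)
        return rgb.astype(np.uint8)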

(36) In step S08, the motion detection processing device 160, or the motion detection processing component 248, may perform motion detection on the converted video stream 150. As further discussed above, the motion detection processing may be carried out according to any known method in the art.

(37) In the embodiment of FIG. 4, the enhancement component 144, 244, in step S04a, compared the image I.sub.t to the previous image I.sub.t-1 in order to identify pixels where the image I.sub.t differs from the previous image I.sub.t-1. An embodiment where the enhancement component 144, 244 compares the image I.sub.t to an image formed from the at least one previous image I.sub.t-3, I.sub.t-2, I.sub.t-1 in the video stream 130 will now be explained with reference to FIG. 5.

(38) The enhancement component 144, 244 may work in an iterative manner, where each iteration corresponds to a point in time t. In each iteration, the enhancement component 144, 244 may form an image M.sub.t from images I.sub.t-n, . . . , I.sub.t-1, I.sub.t, where n is a predefined number. The iteration may be initiated by setting M.sub.0=I.sub.0.

(39) In the t-th iteration, the enhancement component 144, 244 may form the image M.sub.t by first identifying the differences between image I.sub.t and an image M.sub.t-1 determined in the previous iteration on the basis of images I.sub.t-1-n, . . . , I.sub.t-2, I.sub.t-1, cf. step S04a. This identification may be done in accordance with what was described in conjunction with FIG. 4, e.g. by calculating differences and comparing the differences to a threshold. The result of such identification of differences is illustrated by image D.sub.t of FIG. 5. D.sub.t shows pixels 506 where differences are identified in black, and pixels 510 where no differences are identified in white.

(40) The enhancement component 144, 244 may then determine image M.sub.t in pixel p according to:

(41) M.sub.t(p) = f(I.sub.t-n(p), . . . , I.sub.t-1(p), I.sub.t(p)), if |I.sub.t(p) − M.sub.t-1(p)| < T;
     M.sub.t(p) = I.sub.t(p), if |I.sub.t(p) − M.sub.t-1(p)| ≥ T,
where f is a function, such as a filtration or a (weighted) mean value, of the value of the corresponding pixel p in images I.sub.t-n, . . . , I.sub.t-1, I.sub.t, and T is a threshold value. Expressed in words, M.sub.t is hence formed by temporally filtering images I.sub.t-n, . . . , I.sub.t-1, I.sub.t in pixels 510 where no difference is found between the image I.sub.t and image M.sub.t-1 from the previous iteration (cf. step S04c), and by keeping the value of image I.sub.t in pixels 506 where a difference is found. Since the filtering typically is an averaging operation, the image M.sub.t is thus a noise-filtered version of I.sub.t in pixels 510 where no differences are identified (i.e. no motion is detected) and equal to I.sub.t in pixels 506 where differences are identified (i.e. where motion is detected).
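
A sketch of one iteration of this recursion, assuming f is realised as an exponentially weighted running average; alpha and the threshold T are illustrative assumptions:

    import numpy as np

    def update_M(M_prev, I_t, T=2000, alpha=0.9):
        # One iteration: temporally filter static pixels, keep I_t in moving ones.
        I = I_t.astype(np.float64)
        mask = np.abs(I - M_prev) >= T            # pixels 506: difference found
        M_t = alpha * M_prev + (1.0 - alpha) * I  # running average standing in for f
        M_t[mask] = I[mask]                       # keep I_t where motion is detected
        return M_t

The recursion is started with M_0 = I_0 (as a float array), after which update_M is called once per new frame.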

(42) The enhancement component 144, 244 may further base the enhancement performed in step S04b on the image M.sub.t. In more detail, the enhancement component 144, 244 may enhance the image M.sub.t in pixels where a difference was detected so as to generate an enhanced image I.sub.t.sup.e,noise according to:

(43) I.sub.t.sup.e,noise(p) = M.sub.t(p), if |I.sub.t(p) − M.sub.t-1(p)| < T;
     I.sub.t.sup.e,noise(p) = Enhancement(M.sub.t(p)), if |I.sub.t(p) − M.sub.t-1(p)| ≥ T,
which, inserting the definition of M.sub.t, equals
     I.sub.t.sup.e,noise(p) = f(I.sub.t-n(p), . . . , I.sub.t-1(p), I.sub.t(p)), if |I.sub.t(p) − M.sub.t-1(p)| < T;
     I.sub.t.sup.e,noise(p) = Enhancement(I.sub.t(p)), if |I.sub.t(p) − M.sub.t-1(p)| ≥ T.
For pixels where no difference is detected between I.sub.t and M.sub.t-1, the enhanced image I.sub.t.sup.e,noise is thus a temporal filtration, such as a noise filtration, of images I.sub.t-n, . . . , I.sub.t-1, I.sub.t, and in pixels where a difference is detected between I.sub.t and M.sub.t-1, the enhanced image I.sub.t.sup.e,noise is an enhancement of the image I.sub.t. As described with reference to FIG. 4, the enhancement may e.g. include adding a (pre-defined) offset to the pixel value, and/or multiplying the pixel value by a gain factor.
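
Building on the update_M sketch above, the combined operation might look as follows; the offset and gain again stand in for the Enhancement operation of step S04b:

    import numpy as np

    def enhance_and_filter(M_prev, I_t, T=2000, alpha=0.9, offset=4000, gain=1.0):
        # Produce I_t^{e,noise} and the updated M_t in a single pass; in the
        # masked pixels M_t equals I_t, so enhancing M_t there is the same as
        # enhancing I_t, as in the equation above.
        M_t = update_M(M_prev, I_t, T=T, alpha=alpha)
        mask = np.abs(I_t.astype(np.float64) - M_prev) >= T
        out = M_t.copy()
        out[mask] = np.clip(out[mask] * gain + offset, 0, 65535)
        return out, M_t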

(44) With this approach, the enhancement may thus be conveniently combined with the noise filtering, i.e. performed in the same step.

(45) It will be appreciated that a person skilled in the art can modify the above-described embodiments in many ways and still use the advantages of the invention as shown in the embodiments above. Thus, the invention should not be limited to the shown embodiments but should only be defined by the appended claims. Additionally, as the skilled person understands, the shown embodiments may be combined.