IR or thermal image enhancement method based on background information for video analysis

10452922 · 2019-10-22


Abstract

An image enhancement method for video analysis or automatic video surveillance systems that have at least one image acquisition device, through which an IR or thermal spectrum image of an area of space is captured, a scene calibration system, and a detection system through which at least one object type is detected. The method includes at least one processing stage in which the contrast of the image captured by the image acquisition device is enhanced through the image's depth or scene information, obtained directly or indirectly by the scene calibration system or entered manually by the user.

Claims

1. An image enhancement method for a video analysis or automatic surveillance system, comprising: at least one image acquisition device through which an IR or thermal spectrum input image of an area of space is captured, the input image being digitized by the image acquisition device or an image digitizing system, a scene calibration system, and a detection system through which at least one type of object is detected; the method comprising: at least one processing stage in which a contrast of the input image is improved through the input image's depth or scene information obtained directly or indirectly by the scene calibration system or entered manually by a user, the at least one processing stage comprising at least one scene equalization stage defined from the input image's depth or scene information; the at least one scene equalization stage comprises at least the following steps: define a region of interest (r); calculate a histogram of the input image pixels contained in the region of interest, p_Ir(i), and use this information to obtain a corresponding transformation function T_r; wherein p_Ir(i) = p(I_r = i) = n_i/n_r is the histogram of the region of interest formed by the pixels of the input image contained in region r; wherein T_r is a transformation function whose calculation is based on the histogram of the pixels contained in the region of interest, T_r = f(p_Ir(i)); and apply the transformation function T_r to the entire input image I or to a sub-region r_o of the input image and obtain an equalized image O = T_r(r_o), wherein r ⊆ r_o ⊆ D_I, wherein D_I refers to a domain of the input image I and is defined as D_I = [0, I_w−1] × [0, I_h−1], wherein I_w is a total width of the input image and I_h is a total height of the input image, wherein the region of interest is defined with a rectangle
r = [x, y, w, h] = [αI_w, βy_hor, γI_w, δy_hor], where x and y correspond to coordinates of an upper corner of r, w and h are width and height values of the region in pixels, α, β, γ, δ ∈ (0, 1) and (β + δ) < 1; I_w is the total width of the input image; and y_hor is a vertical coordinate that defines a detection limit from which an expected size of a target object is smaller than a minimum size that the detection system can detect.

2. The image enhancement method for video analysis or automatic surveillance system according to claim 1, wherein the region of interest (r) comprises smaller target object types that are difficult for the detection system to detect.

3. The image enhancement method for video analysis or automatic surveillance system according to claim 2, wherein the region of interest (r) comprises at least all of the pixels for which an expected size of a target object is in the range (T_min, T_min + λ(T_max − T_min)); with T_min being a minimum size of the target object that the detection system can detect; T_max being a maximum possible size of the target object; and λ being a number between 0 and 1.

4. The image enhancement method for video analysis or automatic surveillance system according to claim 1, wherein the equalization stage comprises at least one step that estimates a range of the histogram of the region of interest through calculation of an entropy in the region of interest in the following way: H_r(t) = −Σ_{i∈r} p_Ir(i)·log(p_Ir(i)), and another step that fixes at least two threshold values (H_HL and H_LH) for which the equalization stage is activated or deactivated, respectively.

5. The image enhancement method for video analysis or automatic surveillance system according to claim 4, wherein the entropy is calculated using the following moving average:
H̄_r(t) = H̄_r(t−1)·(1−ρ) + H_r(t)·ρ, with ρ being a number between 0 and 1.

6. The image enhancement method for video analysis or automatic surveillance system according to claim 1, further comprising an equalization smoothing stage in which a new image is obtained from a weighted sum of the equalized image and the input image in the following way:
I_F(x,y) = g(x,y)·O(x,y) + (1 − g(x,y))·I(x,y), wherein g(x,y) is any function whose value is maximum and equal to 1 in the center of the region of interest (r); I(x,y) is the input image, and O(x,y) is the equalized image.

7. The image enhancement method for video analysis or automatic surveillance system according to claim 6, wherein g(x,y) = e^(−½(((x−μ_x)/σ_x)² + ((y−μ_y)/σ_y)²)); μ = (μ_x, μ_y) = (αI_w + γI_w/2, βy_hor − δy_hor/2); σ = (σ_x, σ_y) = (γI_w, δy_hor).

8. The image enhancement method for video analysis or automatic surveillance system according to claim 1, wherein the depth or scene information is obtained from a scene calibration system applying a calibration procedure consisting of at least the following phases: a sample acquisition phase divided into the following sub-phases: an image acquisition sub-phase; an image processing sub-phase, which determines whether there is a moving object in the input image; and a person pre-classification sub-phase, which determines whether or not the moving object is a person and stores its size and position information in case it is; and a calibration phase which obtains the size of a person for each position of the image based on the size and position data obtained for each object identified as a person in the sample acquisition phase.

9. The image enhancement method for video analysis or automatic surveillance system according to claim 8, wherein the region of interest (r) is defined as the transit area during the calibration procedure, this transit area being where samples are obtained during the sample acquisition phase.

10. The image enhancement method for video analysis or automatic surveillance system according to claim 8, wherein the region of interest (r) is obtained according to the procedure: Divide the image into N resizable cells; Mark the cells in which the scene calibration system has obtained at least one sample in the sample acquisition phase; Define the region of interest (r) as a convex area that surrounds the marked cells.

11. The image enhancement system for video analysis or automatic surveillance system of claim 1, comprising functional elements suitable for carrying out a plurality of image enhancement procedures.

12. The image enhancement system for video analysis or automatic surveillance system according to claim 1, wherein the detection system, through which the at least one object type is detected, comprises at least: a static scene segmentation system that classifies pixels into at least two types: moving objects and objects belonging to a background of the input image; a candidate generation system which groups associated moving pixels into objects; a classification system which classifies the objects according to whether or not they are target object types; and a tracking system that maintains temporal coherence of the objects so that, depending on detection rules entered by the user, respective intrusion alarms can be generated.

13. The image enhancement system for video analysis or automatic surveillance system according to claim 12, wherein the image acquisition device is analog and the system consists of a digitizing stage for the image captured by the image acquisition device prior to the processing stage.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) The above and other advantages and features are easier to understand from the following detailed description of certain embodiment examples with reference to the accompanying drawings, which should be considered as illustrative and not limiting, in which:

(2) FIG. 1 illustrates a block diagram of a video analysis or video surveillance system according to the invention;

(3) FIG. 2 shows a block diagram of the detection system;

(4) FIG. 3 shows a block diagram of a scene calibration system based on a strong calibration procedure;

(5) FIG. 4 illustrates a block diagram of a scene calibration system based on a weak calibration procedure;

(6) FIG. 5 shows an image to which a scene calibration system has been applied;

(7) FIG. 6 shows an equalization procedure.

(8) FIG. 7 illustrates the operation of a switch based on hysteresis.

DETAILED DESCRIPTION

(9) FIG. 1 is a block diagram of a video analysis or automatic surveillance system (1) according to the invention, consisting of at least one image acquisition device (2) to obtain images from one area of the space; a scanning system (3) to provide the digital image obtained from the image acquisition device (2); an image processing system (4); and two alternative operating subsystems: a scene calibration system (5) and a detection system (6).

(10) The image acquisition devices (2) enable IR or thermal spectrum images to be obtained. These should preferably be fixed cameras with this type of image capture. Also included are image acquisition devices (2) which obtain images in the near IR spectrum, such as day/night cameras which operate with this section of the electromagnetic spectrum for night-time surveillance.

(11) Note that certain image acquisition devices (2) already feature an image scanning system (3) to provide the video analysis or automatic surveillance system (1) with digital images. The image scanning system (3) thus may not be included in the video analysis or automatic surveillance system (1).

(12) If the image acquisition device (2) already enables a digital image to be obtained or features an image scanning system (3), it can be set up to transmit images by any transmission means (cable, fiber, wireless, etc.).

(13) The image processing system (4) applies at least one image enhancement procedure so that, on output, the image has sufficient quality to enable detection of a particular type of object, preferably a person.

(14) As mentioned, the video analysis or automatic surveillance system (1) has two alternative ways to operate, a detection system (6) and a scene calibration system (5).

(15) The detection system (6) is applied regularly during the operation of the video analysis or automatic surveillance system (1) as it is the system that detects particular objects, preferably people.

(16) The scene calibration system (5) should preferably be applied only once at the beginning of the start-up of the video analysis or automatic surveillance system (1) and has to provide the image with a spatial reference so that the detection system (6) can reference all of the calculations that it performs during the detection process: calculation of distance traveled, speed, size of objects, etc., as well as providing direct or indirect information on the depth of the real scene captured in the image for the image processing system (4).

(17) The scene calibration system (5) should preferably be any type of system that obtains the depth of the real scene captured in the image either directly or indirectly. In a preferred embodiment, the scene calibration system (5) is a system that obtains the variation of the approximate size of the target object for each of the coordinates of the pixels of the image as it is an indirect way of measuring the depth of the real scene captured in the image.

(18) FIG. 2 is a block diagram of the detection system (6), consisting of a static scene segmentation system (7), a candidate generation system (8), a classification system (9), a tracking system (10) and a decision system (20).

(19) The static scene segmentation system (7) classifies the pixels into at least two types: moving objects and objects belonging to the background of the image.

(20) The candidate generation system (8) groups the pixels that relate to moving objects and assigns a unique identifier to each moving object in the image.

(21) Both the static segmentation system (7) and the candidate generation system (8) require a sufficiently contrasted image.

(22) The classification system (9) classifies moving objects according to whether or not they are target objects, preferably people, and/or vehicles. As just mentioned, this system needs the scene information obtained during the calibration phase to perform the necessary calculations (measurement of speed, size, distance traveled, etc.) in order to classify objects and hence the FIG. 2 block diagram features a calibration block to reference this information need.

(23) The tracking system (10) maintains temporal coherence of the objects so that, depending on the detection rules entered by the user, the respective intrusion alarms can be generated.

(24) The decision system (20) is responsible for determining, according to the rules (hence the rules block in the diagram for referencing this information need), whether the objects classified by the classification system (9) should be considered as intruders, and if so, generating the appropriate alarm.

(25) FIG. 3 is a block diagram of the scene calibration system (5) based on a strong calibration procedure, such as that described by Hartley, R. and Zisserman, A. in Multiple View Geometry in Computer Vision, Cambridge University Press, 2003.

(26) This calibration system (5) can consist of at least one parameter insertion system in the image acquisition device (14) and a scene parameter calculation system (15).

(27) The image acquisition device's parameter insertion system (14) obtains, either directly or through the user, the intrinsic parameters of the image acquisition device (2), such as focal length, pixel size, radial lens distortion, and the extrinsic parameters, such as height and angular orientation.

(28) The scene parameter calculation system (15) obtains the size of the target objects for each pixel.

(29) FIG. 4 is a block diagram of the scene calibration system (5) based on a weak calibration procedure, such as that described in patent ES2452790, Procedimiento y sistema de análisis de imágenes.

(30) As shown, the scene calibration system (5) consists of at least one static scene segmentation system (7), a candidate generation system (8), a tracking system (10), a mapping system for observed size/position (11) and a scene parameter estimation system (12).

(31) The static scene segmentation (7), candidate generation (8) and tracking (10) systems should preferably perform the same functions as those described in the detection system (6), and can even be the same.

(32) The size/position mapping system (11) obtains the variation of the approximate size of the target object for each of the coordinates of the pixels in the image.

(33) The scene parameter estimation system (12) allows other parameters necessary for the detection system (6) to be obtained, such as measurement of speed, size and distance traveled.

(34) In a preferred embodiment, the scene calibration system (5) of the video analysis or automatic surveillance system (1), according to the invention, uses the calibration procedure described in Spanish patent ES2452790, Procedimiento y sistema de análisis de imágenes, to obtain depth or scene information.

(35) Independent of the calibration procedure used by the scene calibration system (5), FIG. 5 shows an image to which the calibration system has been applied and in which rectangles (16) indicate the approximate size of the target object, preferably people, at the point where a rectangle (16) is drawn.

(36) The image processing system (4), according to the invention, performs an image enhancement procedure consisting of a processing stage in which, through the depth or scene information entered by the user or obtained through any scene calibration system (5), but preferably those using the above-described procedures, the contrast of the images captured by the image acquisition device (2) are improved.

(37) In a preferred embodiment, the image processing system (4) consists of at least one filter that adjusts its size variably in each position of the image based on the image's depth or scene information, preferably a percentage of the size of the target object which has been estimated by the scene calibration system (5). This spatial filtering can be applied to the entire image or just to a region of interest (r). The criteria for defining the region of interest (r) are preferably: a region defined manually by the user; or a region in which the target objects are smaller; or the transit areas during the scene calibration process (5), defined as those areas where samples were obtained during the acquisition phase described in Spanish patent ES2452790, Procedimiento y sistema de análisis de imágenes.

(38) For the criterion in which target objects are smaller, two sizes of person should preferably be defined in pixels: T_min (minimum object size capable of being detected by the detection system (6) and provided by the detection system (6)) and T_max, which corresponds to the maximum possible size of a target object in the image (which is given by the calibration system (5)). So in this case, the criterion for defining the region of interest (r) will be all of the pixels for which the expected object size given by the scene calibration system (5) is situated in the range (T_min, T_min + λ(T_max − T_min)), with λ being a number between 0 and 1.

(39) As previously explained, there are calibration procedures comprising associating observed object sizes with the position in which they were observed in order to estimate the geometric model describing the scene, among which, preferably, the scene calibration system (5) uses that described in Spanish patent ES2452790, Procedimiento y sistema de análisis de imágenes. In these cases, a criterion for defining the region of interest (r) should preferably consist of at least the following steps: divide the image into N resizable cells; mark the cells in which the calibration system (5) has obtained at least one sample; define the region of interest (r) as the convex area that surrounds the marked cells.
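
The cell-marking criterion above can be illustrated with a short sketch (a hypothetical illustration only; the function name and the use of an axis-aligned bounding box as the convex surrounding area are assumptions, since the text allows any convex area):

```python
def roi_from_marked_cells(grid, cell_w, cell_h):
    """Given a 2-D boolean grid marking the cells in which the scene
    calibration system obtained at least one sample, return a convex
    area surrounding them as a rectangle (x, y, w, h) in pixels.
    Minimal sketch: the axis-aligned bounding box of the marked cells,
    which is one valid convex surrounding area."""
    rows = [i for i, row in enumerate(grid) for c in row if c]
    cols = [j for row in grid for j, c in enumerate(row) if c]
    if not rows:          # no samples were collected
        return None
    x = min(cols) * cell_w
    y = min(rows) * cell_h
    w = (max(cols) - min(cols) + 1) * cell_w
    h = (max(rows) - min(rows) + 1) * cell_h
    return x, y, w, h
```

A convex hull of the marked cell corners would give a tighter convex region; the bounding box is simply the easiest valid choice.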

(40) The regions of interest (r) are not restricted to any shape or size.

(41) The region of interest (r) should preferably be defined as a rectangle (17)
r = [x, y, w_r, h_r]

(42) where x and y correspond to the coordinates of the upper corner of r, while w_r and h_r correspond to the width and height values of the region in pixels.

(43) In video analysis or video surveillance scenes, the relevant content at object detection level is usually centered in the image, so preferably
r = [x, y, w_r, h_r] = [αI_w, βy_hor, γI_w, δy_hor]

(44) is defined, where α, β, γ, δ ∈ (0, 1) and (β + δ) < 1; I_w is the total width of the input image; and y_hor is the vertical coordinate delimiting the detection limit (preferably, the vertical coordinate from which the expected size of the target object, preferably a person, is smaller than the minimum size that the system needs to detect it), which the user can enter or which can be obtained from the calibration system (5).

(45) FIG. 5 contains a rectangle (17) that defines a region of interest for that image.

(46) It is noted that this type of rectangular region is useful for any scene calibration system (5) as the final calibration result is a person size map per pixel.
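
The rectangular region of interest defined above can be sketched as follows (a hedged illustration; the function name and the sample coefficient values used in testing are hypothetical, while the formula r = [αI_w, βy_hor, γI_w, δy_hor] comes from the text):

```python
def rectangular_roi(img_width, y_hor, alpha, beta, gamma, delta):
    """Build the centered rectangular region of interest
    r = [x, y, w, h] = [alpha*I_w, beta*y_hor, gamma*I_w, delta*y_hor].
    All four coefficients lie in (0, 1) and beta + delta < 1, so the
    region stays above the detection-limit row y_hor."""
    assert 0 < alpha < 1 and 0 < beta < 1 and 0 < gamma < 1 and 0 < delta < 1
    assert beta + delta < 1
    x = int(alpha * img_width)
    y = int(beta * y_hor)
    w = int(gamma * img_width)
    h = int(delta * y_hor)
    return x, y, w, h
```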

(47) Applying spatial filtering can produce unwanted effects in certain areas of the image. To minimize these effects, it is possible to soften the filtering result by combining the filtered and unfiltered images in the following way:
I_F(x,y) = g(x,y)·O(x,y) + (1 − g(x,y))·I(x,y)

(48) Wherein weighting function g(x,y), in the most general case, is a function that takes values between 0 and 1, O(x,y) is the filtered image and I(x,y) is the input image.

(49) Whether spatial filtering is applied to the entire image or to the region of interest (r), this region of interest can be used to define the weighting function g(x,y), which is preferably a two-dimensional Gaussian function centered on the region of interest (r) and with standard deviations based on the width and height dimensions of the region of interest (r), leaving the center of the Gaussian as:

(50) μ = (μ_x, μ_y) = (αI_w + γI_w/2, βy_hor − δy_hor/2)
and its standard deviation vector as
σ = (σ_x, σ_y) = (γI_w, δy_hor)

(51) Consequently, the function is

(52) g(x,y) = e^(−½(((x−μ_x)/σ_x)² + ((y−μ_y)/σ_y)²))

(53) As can be observed, the value of g(x,y) in the center of the region of interest (r) is maximum (equal to 1) and, as the values of x or y move away from the center, the value of g(x,y) decreases and the unfiltered image therefore starts to become relevant, since the function 1 − g(x,y) increases.

(54) Consequently, this stage has a smoothing effect on the filtering and can be viewed as the introduction of an artificial spotlight that illuminates the area of the region of interest (r).
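
The weighted combination I_F = g·O + (1 − g)·I described above can be sketched as follows (a minimal illustration assuming a grayscale image stored as a NumPy array; the function name is hypothetical):

```python
import numpy as np

def spotlight_blend(processed, original, mu, sigma):
    """Blend a processed image O and the input image I with a 2-D
    Gaussian weight g(x, y) centred on the region of interest:
    I_F = g * O + (1 - g) * I, where mu = (mu_x, mu_y) is the centre
    and sigma = (sigma_x, sigma_y) the standard deviation vector."""
    h, w = original.shape
    xs = np.arange(w)[None, :]          # column coordinates
    ys = np.arange(h)[:, None]          # row coordinates
    g = np.exp(-0.5 * (((xs - mu[0]) / sigma[0]) ** 2
                       + ((ys - mu[1]) / sigma[1]) ** 2))
    return g * processed + (1.0 - g) * original
```

At the centre of the region g equals 1, so the output there is the processed image; far from the centre g decays toward 0 and the original image dominates, which is the "artificial spotlight" effect.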

(55) In another preferred embodiment, the image processing system (4) consists of at least one equalization process which uses the depth or scene information obtained through the scene calibration system (5).

(56) This equalization process should preferably be based on a simple equalization procedure focused on a region that is considered to be of interest (r). The criteria for defining this region of interest are preferably: a region defined manually by the user; or a region in which the target objects are smaller; or the transit areas during the scene calibration process (5), defined as those areas where samples were obtained during the acquisition phase described in Spanish patent ES2452790, Procedimiento y sistema de análisis de imágenes.

(57) For the criterion in which target objects are smaller, two sizes of person should preferably be defined in pixels: T_min (minimum object size capable of being detected by the detection system (6) and provided by the detection system (6)) and T_max, which corresponds to the maximum possible size of a target object in the image (which is given by the calibration system (5)). So in this case, the criterion for defining the region of interest (r) will be all of the pixels for which the expected object size given by the scene calibration system (5) is situated in the range (T_min, T_min + λ(T_max − T_min)), with λ being a number between 0 and 1 that allows the equalization level to be adjusted.

(58) As previously explained, calibration procedures exist that consist of associating observed object sizes with the position in which they were observed in order to estimate the geometric model describing the scene, among which, preferably, the scene calibration system (5) uses that described in Spanish patent ES2452790, Procedimiento y sistema de análisis de imágenes. In these cases, a criterion for defining the region of interest (r) should preferably consist of at least the following steps: divide the image into N resizable cells; mark the cells in which the calibration system (5) has obtained at least one sample; define the region of interest as the convex area that surrounds the marked cells.

(59) The regions of interest (r) are not restricted to any shape or size.

(60) The region of interest (r) should preferably be defined as a rectangle (17)
r=[x,y,w,h]
where x and y correspond to the coordinates of the upper corner of r, while w and h correspond to the width and height values of the region in pixels.

(61) In video analysis or video surveillance scenes, the relevant content at object detection level is usually centered in the image, so preferably
r = [x, y, w, h] = [αI_w, βy_hor, γI_w, δy_hor]
is defined, where α, β, γ, δ ∈ (0, 1) and (β + δ) < 1; I_w is the total width of the input image; and y_hor is the vertical coordinate delimiting the detection limit (preferably, the vertical coordinate from which the expected size of the target object, preferably a person, is smaller than the minimum size that the system needs to detect it), which the user can enter or which can be obtained from the calibration system (5).

(62) FIG. 5 contains a rectangle (17) that defines a region of interest for that image.

(63) This type of rectangular region is useful for any scene calibration system (5) as the final calibration result is a person size map per pixel.

(64) In a preferred embodiment, the equalization process defines the sub-image formed by the pixels of input image I contained in region r as I_r, and the histogram of this sub-image as

(65) p_Ir(i) = p(I_r = i) = n_i/n_r

(66) In addition, a new transformation T_r is defined as the transformation function whose calculation is based on the histogram of the pixels in the region of interest, T_r = f(p_Ir(i)).

(67) In short, a preferred embodiment of the equalization process consists of at least the following steps: 1. Calculate the histogram of the input image pixels contained in the region of interest, p_Ir(i), and use this information to obtain the corresponding transformation function T_r; 2. Apply this transformation T_r to the entire input image I or to a sub-region r_o of the input image to obtain the equalized image O = T_r(r_o), where r ⊆ r_o ⊆ D_I.
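
These two steps can be sketched as follows, assuming 8-bit grayscale input and the classical cumulative-histogram transformation as one possible choice of T_r (the text leaves f unspecified, and the function name is hypothetical):

```python
import numpy as np

def equalize_from_roi(image, roi):
    """Histogram-equalize using only the statistics of the region of
    interest: build p_Ir(i) from the ROI pixels, derive a cumulative
    transformation T_r = f(p_Ir), then apply T_r to the whole image
    (one choice of the sub-region r_o)."""
    x, y, w, h = roi
    roi_pixels = image[y:y + h, x:x + w]
    hist = np.bincount(roi_pixels.ravel(), minlength=256)
    p = hist / roi_pixels.size                          # p_Ir(i) = n_i / n_r
    T = np.round(255.0 * np.cumsum(p)).astype(np.uint8)  # T_r from the CDF
    return T[image]                                      # O = T_r(r_o)
```

Because T_r is built only from ROI statistics, pixels outside the region are remapped by a histogram that is not their own, which motivates the smoothing stage discussed next in the text.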

(68) As mentioned previously, equalizing certain areas of the image with a histogram that does not correspond to those areas can harm the contrast in these areas with the appearance of unwanted effects such as noise, gray level saturation, etc. Thus, in a preferred embodiment, an equalization smoothing stage is proposed using a weighted sum of the image equalized with the above method and the unequalized image in the following way:
I_F(x,y) = g(x,y)·O(x,y) + (1 − g(x,y))·I(x,y)
where the weighting function g(x,y) can be any type of function whose value at the center of the region of interest is maximum, but is preferably a two-dimensional Gaussian function centered on the region of interest (r) and with standard deviations based on the width and height dimensions of the region of interest (r), leaving the center of the Gaussian as:

(69) μ = (μ_x, μ_y) = (αI_w + γI_w/2, βy_hor − δy_hor/2)
and its standard deviation vector as
σ = (σ_x, σ_y) = (γI_w, δy_hor)

(70) Consequently, the function is

(71) g(x,y) = e^(−½(((x−μ_x)/σ_x)² + ((y−μ_y)/σ_y)²))

(72) As can be observed, the value of g(x,y) in the center of the region of interest (r) is maximum (equal to 1) and, as the values of x or y move away from the center, the value of g(x,y) decreases and the unequalized image therefore starts to become relevant, since the function 1 − g(x,y) increases.

(73) Consequently, this stage has a smoothing effect on the equalization and can be viewed as the introduction of an artificial spotlight that illuminates the area of the region of interest (r).

(74) So far, a basic equalization procedure has been described in which the depth or scene information obtained through the scene calibration system (5) is used. However, it can be considered that there are two types of equalization process depending on the nature of the input image, local equalization process and remote equalization process.

(75) The local equalization process is the simplest and is that shown in FIG. 6. As can be observed, in this type of equalization process, the image from the image acquisition device (2) or the image resulting from applying the image scanning system (3) is equalized using depth information from the scene calibration system (5) of the video analysis or automatic surveillance system (1).

(76) Since the dynamic range of the image, i.e. the range of values taken by most of the pixels, is occasionally too small, it can cause excessive noise to be introduced when expanding the histogram during equalization. This noise is not desirable in the image because it can lead to false alarms generated by the detection system (6). Because of this, in a preferred embodiment, the equalization process, according to the invention, includes a step that studies the histogram range through calculation of the entropy in the region of interest, a measurement that, although indirect, is much more robust than simply studying the width of the histogram.

(77) In this regard, the entropy of the image in the region of interest is defined as:

(78) H_r(t) = −Σ_{i∈r} p_Ir(i)·log(p_Ir(i))

(79) This metric will be greater the more the histogram has a uniform probability distribution and the wider the dynamic range of the image.
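
A minimal sketch of this entropy measurement (the function name is hypothetical, and the natural logarithm is assumed since the text does not fix the base):

```python
import numpy as np

def roi_entropy(image, roi):
    """H_r = -sum_i p_Ir(i) * log(p_Ir(i)) over the region of interest.
    Higher values indicate a flatter histogram and therefore a wider
    dynamic range in the region."""
    x, y, w, h = roi
    pixels = image[y:y + h, x:x + w].ravel()
    p = np.bincount(pixels, minlength=256) / pixels.size
    p = p[p > 0]                  # skip empty bins: 0 * log(0) -> 0
    return float(-np.sum(p * np.log(p)))
```

A flat (constant) region gives entropy 0, while a histogram spread evenly over two gray levels gives log 2, matching the statement that the metric grows with histogram uniformity.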

(80) Thus, two threshold values are set for deactivating (off) and activating (on) the equalization: H_HL and H_LH, respectively.

(81) The operation of this switch based on hysteresis is illustrated in FIG. 7. Specifically, if during the equalization process, the equalization mode is off and the entropy calculated rises above H.sub.LH, the equalization process is activated. If the equalization mode is on and the entropy falls below H.sub.HL, the equalization process is deactivated.

(82) Note that this hysteresis cycle is implemented to prevent jumping between modes, which can affect the detection system (6). However, to further smooth the transition between modes, entropies calculated using a moving average are used instead of instantaneous entropies:
H̄_r(t) = H̄_r(t−1)·(1−ρ) + H_r(t)·ρ
with ρ being a number between 0 and 1.
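
The hysteresis switch and the moving average can be sketched together (the class name, the threshold values in the usage below, and the symbol rho for the averaging coefficient are hypothetical):

```python
class EqualizationSwitch:
    """Hysteresis gate on the smoothed ROI entropy: switch on when the
    moving average rises above H_LH, off when it falls below H_HL
    (with H_HL < H_LH), the average being updated as
    avg(t) = avg(t-1) * (1 - rho) + H_r(t) * rho."""

    def __init__(self, h_hl, h_lh, rho):
        assert h_hl < h_lh and 0 < rho < 1
        self.h_hl, self.h_lh, self.rho = h_hl, h_lh, rho
        self.avg = 0.0       # moving-average entropy
        self.on = False      # equalization mode, initially off

    def update(self, entropy):
        self.avg = self.avg * (1.0 - self.rho) + entropy * self.rho
        if not self.on and self.avg > self.h_lh:
            self.on = True       # rose above H_LH: activate
        elif self.on and self.avg < self.h_hl:
            self.on = False      # fell below H_HL: deactivate
        return self.on
```

The gap between H_HL and H_LH is what prevents rapid mode flapping when the entropy hovers near a single threshold, and the moving average further damps short spikes.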

(83) Furthermore, the remote equalization process is based on remotely defining the region of interest from the depth or scene information obtained by the scene calibration system (5) for image acquisition devices (2) or image scanning systems (3) that are equipped with software that executes equalization processes. That is to say, the equalization process is performed by the image acquisition device (2) or the image scanning system (3), but on the region of interest defined from the depth or scene information obtained by the scene calibration system (5).