Image processing apparatus and method and monitoring system
10916016 · 2021-02-09
CPC classification: G06V20/41 (PHYSICS), G06V10/28 (PHYSICS)
Abstract
Acquiring a current image from an inputted video and a background model which comprises a background image and foreground/background classification information of visual elements; classifying the visual elements in the current image as foreground or background; determining similarity measures between groups of visual elements in the current image and groups in the background model, wherein the visual elements in the groups in the current image are those which are classified as the foreground, wherein the visual elements in the groups in the background model are those whose classification information is the foreground, and wherein the visual elements in the groups in the background model neighbour the corresponding portions of the visual elements in the groups in the current image; and identifying, according to the determined similarity measures, whether the visual elements in the current image which are classified as the foreground are falsely classified or not.
Claims
1. An image processing apparatus, comprising: an acquisition unit configured to acquire a current image from an inputted video and a model which comprises classification information, wherein the classification information comprises visual elements as foreground and visual elements as background; a classification unit configured to classify the visual elements in the current image as the foreground or the background according to the classification information in the model; a similarity measure determination unit configured to determine a similarity measure between a visual element in a first group, in which the visual element is determined as the foreground by the classification unit, in the current image and a visual element as the foreground in a second group in the model, the second group being a spatial neighbour of a group in the model which corresponds to the first group in the current image; and an identification unit configured to identify that the visual elements in the current image which are classified as the foreground by the classification unit are falsely classified in a case where the determined similarity measure is less than a threshold.
2. The image processing apparatus according to claim 1, wherein, visual elements as foreground and visual elements as background are obtained according to the visual elements which are identified as the foreground or the background in at least one previous image of the current image.
3. The image processing apparatus according to claim 1, wherein, as for any one of the visual elements in the first group in the current image, a position of the corresponding group in the model is the same as a position of the first group in the current image.
4. The image processing apparatus according to claim 1, wherein, in a case where a group comprises more than one visual element, the similarity measure determination unit determines the similarity measure corresponding to the group according to a ratio of a number of visual elements whose similarity measures are larger than a predefined threshold to a total number of visual elements in the group.
5. An image processing method, comprising: an acquisition step of acquiring a current image from an inputted video and a model which comprises classification information, wherein the classification information comprises visual elements as foreground and visual elements as background; a classification step of classifying the visual elements in the current image as the foreground or the background according to the classification information in the model; a similarity measure determination step of determining a similarity measure between a visual element in a first group, in which the visual element is determined as the foreground in the classification step, in the current image and a visual element as the foreground in a second group in the model, the second group being a spatial neighbour of a group in the model which corresponds to the first group in the current image; and an identification step of identifying that the visual elements in the current image which are classified as the foreground in the classification step are falsely classified in a case where the determined similarity measure is less than a threshold.
6. A monitoring system, comprising: an acquiring device configured to acquire a video; an image processing apparatus configured to identify visual elements in images of the acquired video as foreground or background, the image processing apparatus comprising: an acquisition unit configured to acquire a current image from an inputted video and a model which comprises classification information, wherein the classification information comprises visual elements as foreground and visual elements as background; a classification unit configured to classify the visual elements in the current image as the foreground or the background according to the classification information in the model; a similarity measure determination unit configured to determine a similarity measure between a visual element in a first group, in which the visual element is determined as the foreground by the classification unit, in the current image and a visual element as the foreground in a second group in the model, the second group being a spatial neighbour of a group in the model which corresponds to the first group in the current image; and an identification unit configured to identify that the visual elements in the current image which are classified as the foreground by the classification unit are falsely classified in a case where the determined similarity measure is less than a threshold; and a storage device configured to store the acquired video and processing results determined by the image processing apparatus.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
DESCRIPTION OF THE EMBODIMENTS
(10) Exemplary embodiments of the present invention will be described in detail with reference to the drawings below. It shall be noted that the following description is merely illustrative and exemplary in nature, and is in no way intended to limit the present invention and its applications or uses. The relative arrangement of components and steps, numerical expressions and numerical values set forth in the embodiments do not limit the scope of the present invention unless it is otherwise specifically stated. In addition, techniques, methods and devices known by persons skilled in the art may not be discussed in detail, but are intended to be a part of the specification where appropriate.
(11) Please note that similar reference numerals and letters refer to similar items in the figures, and thus once an item is defined in one figure, it need not be discussed for following figures.
(12) Generally, in a scene captured on a video, it is impossible that a real object (i.e. the foreground) suddenly appears or disappears in the video. That is to say, a real object will have a moving trajectory in the video. Therefore, in a case where a group of visual elements (e.g. one visual element, or more than one visual element) in a current image of the video is the foreground, the visual elements in at least one previous image of the current image which neighbour the corresponding portions of the visual elements in this group will generally also be the foreground. The inventor therefore found that, in foreground detection, as for a group of visual elements in a current image of a video which comprises at least one visual element that is classified as the foreground, the visual elements which are identified as the foreground in the previous images of the current image and which neighbour the corresponding portions of the visual elements in this group could be regarded as a reference to identify whether the visual elements in this group are falsely classified or not.
(13) Therefore, according to the present disclosure, after the visual elements in a current image of a video are classified as the foreground or the background, for the visual elements which are classified as the foreground, the similarities between these visual elements and the visual elements which are identified as the foreground in the previous images and which neighbour the corresponding portions of these visual elements are taken into consideration to identify whether these visual elements are falsely classified or not. For example, as for a group of visual elements in the current image which comprises at least one visual element that is classified as the foreground, the more of the visual elements in this group that are similar (e.g. in texture, color or luminance) to the neighbouring visual elements identified as the foreground in the previous images, the higher the probability that the visual elements in this group are correctly classified, that is, the higher the probability that the visual elements in this group are a real object. Conversely, in a case where fewer visual elements in this group are similar to the neighbouring visual elements identified as the foreground in the previous images, the visual elements in this group will be identified as falsely classified.
(14) Therefore, according to the present disclosure, even if the background includes movements (e.g. water ripples or leaves moving in the wind) in certain images of the video, or even if a graph segmentation algorithm with low accuracy is used to obtain the visual elements which are used for foreground detection, since the identified foreground/background classification results obtained in the previous processing are used as a reference for the subsequent processing, false foreground detections can be eliminated effectively. Thus, the accuracy of the foreground detection will be improved.
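For illustration only, the identification idea described above could be sketched as follows. All names, feature values and thresholds in this sketch are hypothetical assumptions and are not part of the disclosure: each visual element is reduced to a single scalar feature value, and "similar" is taken to mean a small absolute feature difference.

```python
# Sketch of identifying false foreground by comparing a group of visual
# elements (classified as foreground in the current image) against the
# foreground elements recorded in the background model that neighbour the
# group's corresponding portions. Names and thresholds are illustrative.

def identify_false_foreground(group_features, neighbour_feature_lists,
                              sim_threshold=10.0, ratio_threshold=0.5):
    # Count the elements that are similar to at least one neighbouring
    # foreground element from the previous images (small feature difference).
    similar = 0
    for feat, neighbour_feats in zip(group_features, neighbour_feature_lists):
        if neighbour_feats and min(abs(feat - n) for n in neighbour_feats) <= sim_threshold:
            similar += 1
    # The group is identified as falsely classified when too few of its
    # elements are supported by previously identified foreground.
    return similar / len(group_features) < ratio_threshold

# A group well supported by neighbouring foreground in the model:
print(identify_false_foreground([100, 102, 98], [[101], [100], [99]]))   # False
# A group with no support (e.g. a water ripple misclassified as foreground):
print(identify_false_foreground([100, 102, 98], [[200], [210], [205]]))  # True
```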
(15) (Hardware Configuration)
(16) The hardware configuration that can implement the techniques described hereinafter will be described first with reference to
(17) The hardware configuration 100, for example, includes Central Processing Unit (CPU) 110, Random Access Memory (RAM) 120, Read Only Memory (ROM) 130, Hard Disk 140, Input Device 150, Output Device 160, Network Interface 170 and System Bus 180. Further, in one implementation, the hardware configuration 100 could be implemented by a computer, such as a tablet computer, a laptop, a desktop or another suitable electronic device. In another implementation, the hardware configuration 100 could be implemented by a monitor, such as a digital camera, a video camera, a network camera or another suitable electronic device. In a case where the hardware configuration 100 is implemented by the monitor, the hardware configuration 100 further includes, for example, Optical System 190.
(18) In one implementation, the image processing according to the present invention is configured by hardware or firmware and acts as a module or component of the hardware configuration 100. For example, the image processing apparatus 200 which will be described in detail hereinafter with reference to
(19) The CPU 110 is any suitable programmable control device (such as a processor) and could execute a variety of functions, to be described hereinafter, by executing a variety of application programs that are stored in the ROM 130 or the Hard Disk 140 (such as memories). The RAM 120 is used to temporarily store the program or the data that are loaded from the ROM 130 or the Hard Disk 140, and is also used as a space in which the CPU 110 executes the variety of procedures, such as carrying out the techniques which will be described in detail hereinafter with reference to
(20) In one implementation, the Input Device 150 is used to allow the user to interact with the hardware configuration 100. In one instance, the user could input images/videos/data through the Input Device 150. In another instance, the user could trigger the corresponding processing of the present invention through the Input Device 150. Furthermore, the Input Device 150 can take a variety of forms, such as a button, a keypad or a touch screen. In another implementation, the Input Device 150 is used to receive images/videos which are outputted from special electronic devices, such as the digital cameras, the video cameras and/or the network cameras. In addition, in case the hardware configuration 100 is implemented by the monitor, the optical system 190 in the hardware configuration 100 will capture images/videos of a monitoring place directly.
(21) In one implementation, the Output Device 160 is used to display the processing results (such as the foreground) to the user. The Output Device 160 can take a variety of forms, such as a Cathode Ray Tube (CRT) or a liquid crystal display. In another implementation, the Output Device 160 is used to output the processing results to the subsequent processing, such as a monitoring analysis of whether or not to give an alarm to the user, and so on.
(22) The Network Interface 170 provides an interface for connecting the hardware configuration 100 to the network. For example, the hardware configuration 100 could perform, via the Network Interface 170, data communication with another electronic device connected via the network. Alternatively, a wireless interface may be provided for the hardware configuration 100 to perform wireless data communication. The System Bus 180 may provide a data transfer path for transferring data to, from, or between the CPU 110, the RAM 120, the ROM 130, the Hard Disk 140, the Input Device 150, the Output Device 160, the Network Interface 170 and the like. Although referred to as a bus, the System Bus 180 is not limited to any specific data transfer technology.
(23) The above described hardware configuration 100 is merely illustrative and is in no way intended to limit the invention, its application, or uses. For the sake of simplicity, only one hardware configuration is shown in
(24) (Image Processing)
(25) The image processing according to the present disclosure will be described next with reference to
(26)
(27) In addition, a storage device 250 shown in
(28) First, in one implementation, for example, in case the hardware configuration 100 shown in
(29) And then, as shown in
(30) The background image in the background model is obtained according to at least one previous image of the t.sup.th image. That is, the background image is obtained according to at least one image of the video within a certain duration previous to the t.sup.th image, and the certain duration is not limited and is set based on experimental statistics and/or experience. In one instance, the background image is an average image of the previous images of the t.sup.th image. In another instance, the background image is any one of the previous images of the t.sup.th image. In a further instance, the background image is obtained dynamically according to models which are generated for each pixel based on, for example, Gaussian Models. However, it is readily apparent that it is not necessarily limited thereto.
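As one concrete possibility for the first instance above (an average image of the previous images), a running average over the previous frames could be sketched as follows; the blending factor alpha and the flat pixel-list representation are assumptions of this sketch, not part of the disclosure.

```python
def update_background(background, frame, alpha=0.05):
    # Running-average background image: blend each new frame into the
    # current background; alpha controls how quickly old content fades out.
    return [(1.0 - alpha) * b + alpha * f for b, f in zip(background, frame)]

# Two pixels, three frames of a slowly brightening scene.
bg = [100.0, 100.0]
for frame in ([110, 110], [120, 120]):
    bg = update_background(bg, frame)
# bg now lies between the first and last frame intensities.
```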
(31) The foreground/background classification information of the visual elements in the background model is obtained according to the visual elements which are identified as the foreground or the background in at least one previous image of the t.sup.th image. In one instance, the foreground/background classification information of the visual elements is obtained by averaging the identified foreground/background classification results of the visual elements in the previous images of the t.sup.th image. In another instance, the foreground/background classification information of the visual elements is the identified foreground/background classification results of the visual elements in any one of the previous images of the t.sup.th image. In a further instance, the foreground/background classification information of the visual elements is obtained dynamically according to models which are generated for each visual element based on, for example, Gaussian Models. However, it is readily apparent that it is not necessarily limited thereto.
(32) For example, assuming that the visual elements are super-pixels, and assuming that the foreground/background classification information of the visual elements in the background model is obtained according to the identified foreground/background classification results of the visual elements in three previous images of the t.sup.th image, wherein the three previous images of the t.sup.th image for example are the (t-3).sup.th image shown in
(33) In addition, in a case where one of the previous images of the t.sup.th image is selected as the background image in the background model, and the identified foreground/background classification results of the visual elements in one of the previous images of the t.sup.th image are selected as the foreground/background classification information of the visual elements in the background model, these two previous images could be the same image or different images.
(34) Referring back to
(35) Then, the similarity measure determination unit 230 determines similarity measures between groups (i.e. visual element groups) in the t.sup.th image and groups (i.e. visual element groups) in the background model. The visual elements in the groups in the background model are the visual elements whose classification information is the foreground, and they are the visual elements which neighbour the corresponding portions of the visual elements in the groups in the image. The groups in the t.sup.th image could be determined in any manner, such as being set by the user, or being determined by clustering the visual elements in the t.sup.th image which are classified as the foreground by the classification unit 220, etc. Each of the groups in the t.sup.th image comprises at least one visual element, and the visual elements in each of the groups are the visual elements in the t.sup.th image which are classified as the foreground by the classification unit 220. As for any one of the groups in the t.sup.th image, each of the visual elements in this group corresponds to one corresponding portion in the background model, and the corresponding portion of a visual element is a portion whose position in the background model is the same as the position of this visual element in the t.sup.th image. As for any one of the groups in the t.sup.th image, the larger the similarity measure corresponding to this group is, the higher the probability that the visual elements in this group are correctly classified, that is, the higher the probability that the visual elements in this group are a real object.
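The notion of a visual element's corresponding portion and its neighbouring elements could be sketched on a regular grid of visual elements as follows; the 8-connectivity and the grid layout are assumptions of this illustration.

```python
def neighbours(pos, width, height):
    # 8-connected neighbouring positions of the corresponding portion of a
    # visual element located at pos on a width x height grid; positions in
    # the background model share coordinates with the current image.
    x, y = pos
    return [(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)
            and 0 <= x + dx < width and 0 <= y + dy < height]

print(len(neighbours((1, 1), 3, 3)))  # 8 (interior element)
print(len(neighbours((0, 0), 3, 3)))  # 3 (corner element)
```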
(36) And then, the identification unit 240 identifies whether the visual elements in the t.sup.th image which are classified as the foreground by the classification unit 220 are falsely classified or not according to the similarity measures determined by the similarity measure determination unit 230.
(37) Finally, after the visual elements in the t.sup.th image are identified by the identification unit 240, in one aspect, the identification unit 240 transfers the identified foreground/background classification results of the visual elements in the t.sup.th image to the storage device 250, so that the corresponding information stored in the storage device 250 could be updated and the background model which will be used for the next image (e.g. (t+1).sup.th image) could be acquired according to the updated information. In another aspect, the identification unit 240 transfers the identified foreground/background classification results of the visual elements in the t.sup.th image to the Output Device 160 shown in
(38) In addition, generally, in foreground detection, the visual elements in the 1.sup.st image of the inputted video will be regarded as the background by default.
(39) The flowchart 400 shown in
(40) As shown in
(41) In classification step S420, the classification unit 220 classifies the visual elements in the t.sup.th image as the foreground or the background according to the t.sup.th image and the background image in the background model. In one implementation, the classification unit 220 classifies the visual elements in the t.sup.th image as the foreground or the background with reference to
(42) As shown in
(43) In one implementation, as for each of the visual elements in the t.sup.th image, the visual distance between this visual element and the corresponding visual element in the background image is calculated according to feature values of these two visual elements. For example, an absolute difference between the feature values of these two visual elements is regarded as the corresponding visual distance. It is readily apparent that it is not necessarily limited thereto. The feature value of one visual element in one image could be determined according to channel features of this visual element in the image. For example, in a case where the image is in the YCbCr color space, one visual element includes a Y (luminance) channel feature, a Cb (blue-difference) channel feature and a Cr (red-difference) channel feature. In a case where the image is in the RGB color space, one visual element includes a Red channel feature, a Green channel feature and a Blue channel feature.
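A minimal sketch of this classification step is given below, assuming YCbCr channel features, an illustrative channel weighting, and an assumed comparison direction against a threshold TH1 (none of which are fixed by the text).

```python
def feature_value(channels, weights=(0.6, 0.2, 0.2)):
    # Scalar feature value of a visual element from its channel features
    # (e.g. Y, Cb, Cr); the weighting here is an illustrative assumption.
    return sum(w * c for w, c in zip(weights, channels))

def classify(element, background_element, th1=15.0):
    # Visual distance as the absolute difference of feature values, as in
    # the text; the element is classified as foreground when its distance
    # from the background image exceeds the threshold TH1 (assumed rule).
    distance = abs(feature_value(element) - feature_value(background_element))
    return "foreground" if distance > th1 else "background"

print(classify((200, 128, 128), (100, 128, 128)))  # foreground
print(classify((104, 128, 128), (100, 128, 128)))  # background
```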
(44) Referring back to
(45) Referring back to
(46) As shown in
(47) Taking the t.sup.th image shown in
(48) Thus, firstly, as for each of the visual elements 760-780, taking the visual element 760 for example, the similarity measure determination unit 230 determines a similarity measure between the visual element 710 and the visual element 760 according to the feature values of these two visual elements. For example, an absolute difference between the feature values of these two visual elements is regarded as the corresponding similarity measure. It is readily apparent that it is not necessarily limited thereto. As described above, the feature value of one visual element in one image could be determined according to the channel features of this visual element in the image. Therefore, the feature value of the visual element 710 is determined according to its channel features in the t.sup.th image. The feature value of the visual element 760 is determined according to feature values of visual elements in the previous images of the t.sup.th image, wherein positions of these visual elements in the previous images are same as the position of the visual element 760 and the foreground/background classification results of these visual elements are used to determine the foreground/background classification information of the visual element 760.
(49) And then, after the similarity measure determination unit 230 determines the similarity measure between the visual element 710 and the visual element 760 (e.g. regarded as Sim1), the similarity measure between the visual element 710 and the visual element 770 (e.g. regarded as Sim2) and the similarity measure between the visual element 710 and the visual element 780 (e.g. regarded as Sim3), the similarity measure determination unit 230 determines the similarity measure corresponding to the visual element 710 according to the determined similarity measures (i.e. Sim1, Sim2 and Sim3). In one instance, the average value of Sim1 to Sim3 is determined as the similarity measure corresponding to the visual element 710. In another instance, one similarity measure among Sim1 to Sim3 whose value is maximal is determined as the similarity measure corresponding to the visual element 710. However, it is readily apparent that it is not necessarily limited thereto.
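The two instances above (the average or the maximum of Sim1 to Sim3) could be sketched as follows; the numeric values are illustrative only.

```python
def element_similarity(sims, mode="max"):
    # Combine the similarity measures between one visual element in the
    # current image and the neighbouring foreground elements in the model,
    # either by their average or by their maximum, as described above.
    if mode == "avg":
        return sum(sims) / len(sims)
    return max(sims)

sim1, sim2, sim3 = 0.9, 0.4, 0.7
print(element_similarity([sim1, sim2, sim3]))        # 0.9
avg = element_similarity([sim1, sim2, sim3], "avg")  # about 0.667
```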
(50) Referring back to
(51) In step S434, the similarity measure determination unit 230 determines the similarity measure corresponding to this group in the t.sup.th image according to the similarity measures determined in the step S431. In one implementation, the similarity measure determination unit 230 determines the similarity measure corresponding to this group as follows. Firstly, the similarity measure determination unit 230 obtains a counting number by calculating the number of the visual elements in this group whose corresponding similarity measures are larger than a predefined threshold (e.g. TH2). Then, the similarity measure determination unit 230 determines the similarity measure corresponding to this group by calculating a ratio of the counting number to the total number of the visual elements in this group. For example, the ratio is calculated by using the following formula (1):
(52) Ratio=(the counting number of the visual elements in this group whose similarity measures are larger than TH2)/(the total number of the visual elements in this group)  (1)
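Formula (1) could be sketched directly, with the element-wise similarity measures and TH2 given as plain numbers (illustrative values):

```python
def group_similarity(element_sims, th2):
    # Formula (1): the ratio of the counting number (elements whose
    # similarity measures are larger than TH2) to the total number of
    # visual elements in the group.
    count = sum(1 for s in element_sims if s > th2)
    return count / len(element_sims)

print(group_similarity([0.9, 0.8, 0.2, 0.1], th2=0.5))  # 0.5
```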
(53) In addition, in case this group comprises only one visual element, the similarity measure determination unit 230 regards the similarity measure corresponding to this visual element as the similarity measure corresponding to this group directly.
(54) Referring back to
(55) In one implementation, as for any one of the groups in the t.sup.th image, the identification unit 240 identifies whether the visual elements in this group are falsely classified or not according to a predefined threshold (e.g. TH3) and the similarity measure corresponding to this group. As described above, as for any one of the groups in the t.sup.th image, the larger the similarity measure corresponding to this group is, the higher the probability that the visual elements in this group are a real object. Therefore, for example, in a case where the similarity measure corresponding to this group (i.e. the ratio) is calculated by using the above-mentioned formula (1) and is less than TH3, which means the probability that the visual elements in this group are a real object is low, the identification unit 240 identifies that the visual elements in this group are falsely classified. That is, the visual elements in this group which are classified as the foreground by the classification unit 220 are false foreground. Otherwise, in a case where the similarity measure corresponding to this group (i.e. the ratio) is not less than TH3, the identification unit 240 identifies that the visual elements in this group are real foreground (i.e. a real object). In other words, as for the visual elements in any one of the groups, the identification unit 240 identifies the visual elements as follows:
(56) the visual elements in this group are identified as real foreground in a case where the Ratio is not less than TH3, and are identified as false foreground (i.e. falsely classified) in a case where the Ratio is less than TH3.
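The identification rule of the identification unit 240 could be sketched as follows; the return labels are illustrative only.

```python
def identify_group(group_sim, th3):
    # A group whose similarity measure (formula (1)) is not less than TH3
    # is identified as real foreground; otherwise its visual elements are
    # identified as falsely classified (false foreground).
    return "real foreground" if group_sim >= th3 else "false foreground"

print(identify_group(0.8, th3=0.5))  # real foreground
print(identify_group(0.2, th3=0.5))  # false foreground
```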
(57) As shown in
(58) According to the present disclosure, since the identified foreground/background classification results which are obtained in the previous processing will be used as a reference for the subsequent processing, false foreground detections can be eliminated effectively. Thus, the accuracy of the foreground detection will be improved.
(59) (A Monitoring System)
(60) As described above, the present disclosure could be implemented by a computer (e.g. a tablet computer, a laptop or a desktop) or could be implemented by a monitor (e.g. a digital camera, a video camera or a network camera). Taking the case where the present disclosure is implemented by a network camera as an example, after the corresponding processing of the present disclosure is triggered, the network camera could output the corresponding processing results (i.e. the foreground) to the subsequent processing, such as a monitoring analysis of whether or not to give an alarm to the user. Therefore, as an exemplary application of the present disclosure, an exemplary monitor (e.g. a network camera) will be described next with reference to
(61) In addition, a storage device 820 shown in
(62) As shown in
(63) And then, the image processing apparatus 200 identifies visual elements in images of the captured video as foreground or background with reference to
(64) The monitor 800 outputs the detected foreground to a processor which is used to execute a monitoring analysis, for example. Assuming that the monitoring place is an illegal parking area and the pre-defined alarming rule is to give an alarm to the user in a case where cars or other objects are parked in the illegal parking area, the illegal parking area is the background and the cars or other objects that appear in the illegal parking area are the foreground. Thereby, the monitor 800 will continuously capture the video of the illegal parking area and execute the foreground detection on the captured video with reference to
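The pre-defined alarming rule of this example could be sketched as follows; treating the rule as "alarm when any group of visual elements is identified as real foreground" is an assumption of this sketch.

```python
def should_alarm(real_foreground_groups):
    # Give an alarm to the user when at least one group of visual elements
    # in the illegal parking area is identified as real foreground
    # (e.g. a parked car); otherwise no alarm is given.
    return len(real_foreground_groups) > 0

print(should_alarm([]))              # False
print(should_alarm(["car region"]))  # True
```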
(65) All of the units described above are exemplary and/or preferable modules for implementing the processes described in the present disclosure. These units can be hardware units (such as a Field Programmable Gate Array (FPGA), a digital signal processor, an application specific integrated circuit or the like) and/or software modules (such as computer-readable programs). The units for implementing the various steps are not described exhaustively above. However, where there is a step of performing a certain process, there may be a corresponding functional module or unit (implemented by hardware and/or software) for implementing the same process. Technical solutions formed by all combinations of the steps described and the units corresponding to these steps are included in the disclosure of the present application, as long as the technical solutions they constitute are complete and applicable.
(66) It is possible to carry out the method and apparatus of the present invention in many ways. For example, it is possible to carry out the method and apparatus of the present invention through software, hardware, firmware or any combination thereof. The above described order of the steps for the method is only intended to be illustrative, and the steps of the method of the present invention are not limited to the above specifically described order unless otherwise specifically stated. Besides, in some embodiments, the present invention may also be embodied as programs recorded in a recording medium, including machine-readable instructions for implementing the method according to the present invention. Thus, the present invention also covers the recording medium which stores the program for implementing the method according to the present invention.
(67) Although some specific embodiments of the present invention have been demonstrated in detail with examples, it should be understood by a person skilled in the art that the above examples are only intended to be illustrative but not to limit the scope of the present invention. It should be understood by a person skilled in the art that the above embodiments can be modified without departing from the scope and spirit of the present invention. The scope of the present invention is defined by the attached claims.