OBJECT TRACKING SYSTEM, INTELLIGENT IMAGING DEVICE, OBJECT FEATURE EXTRACTION DEVICE, AND OBJECT FEATURE EXTRACTION METHOD
20200034649 · 2020-01-30
CPC classification: G06V10/771 · G06V10/762 · G06V10/454 · G06F18/217 · G06V20/52 · G06V10/507 · G06V40/10 (all PHYSICS)
Abstract
An object feature extraction device according to an aspect of the present invention includes: at least one memory storing instructions; and at least one processor configured to execute the instructions to: detect an object from an image, and generate area information indicating an area where the object is present, and resolution information pertaining to resolution of the object; and extract, from the image within an area defined by the area information, a feature indicating a feature of the object in consideration of the resolution information.
Claims
1. An object feature extraction device comprising: at least one memory storing instructions; and at least one processor configured to execute the instructions to: detect an object from an image, and generate area information indicating an area where the object is present, and resolution information pertaining to resolution of the object; and extract, from the image within an area defined by the area information, a feature indicating a feature of the object in consideration of the resolution information.
2. The object feature extraction device according to claim 1, wherein the at least one processor is further configured to extract from the image within an area defined by the area information, a primary feature, and generate the feature indicating the feature of the object by separably adding the resolution information to the primary feature.
3. The object feature extraction device according to claim 1, wherein the at least one processor is further configured to generate the feature indicating the feature of the object by converting, based on the resolution information, a feature extracted from the image within the area defined by the area information.
4. The object feature extraction device according to claim 3, wherein the at least one processor is further configured to acquire a likelihood, based on the resolution information, with respect to the feature extracted from the image within the area defined by the area information, and generate the feature indicating the feature of the object, based on the acquired likelihood.
5. The object feature extraction device according to claim 1, wherein the at least one processor is further configured to make the feature consist of likelihoods output by a discriminator for a plurality of subareas included in the image within the area defined by the area information, the discriminator being learned for each resolution indicated by the resolution information.
6. The object feature extraction device according to claim 2, wherein the at least one processor is further configured to: determine, by comparing features from a time series of images within areas defined by the area information, an identical object between images of different points of time, and generate and output a tracking identifier identifying the identical object; and group the primary feature, calculated in the feature extraction, based on the area information, the resolution information, and the tracking identifier, estimate an original feature based on the primary feature acquired from an area having a higher resolution in a group, learn how a value of the estimated original feature varies with resolution, and feed back a learning result to the feature extraction.
7. An object tracking system including a first object feature extraction device and a second object feature extraction device each of which is the object feature extraction device according to claim 1, comprising: at least one second memory storing instructions and a first feature in an area of an object detected from a first image by the first object feature extraction device, the first feature including first resolution information; and at least one second processor configured to execute the instructions to: perform matching between a second feature including second resolution information and a first feature including the first resolution information, the first feature read from the at least one second memory, the second feature being a feature in an area of an object detected from a second image by the second object feature extraction device, the second image being different from the first image, and determine if objects are identical to each other in consideration of the first resolution information and the second resolution information.
8. An intelligent imaging device comprising: at least an imaging device; at least one memory storing instructions; and at least one processor configured to execute the instructions to: detect an object from an image captured by the imaging device, and generate area information and resolution information, the area information indicating an area where the object is present, the resolution information pertaining to resolution of the object; and extract, from the image within an area defined by the area information, a feature indicating a feature of the object in consideration of the resolution information.
9. An object feature extraction method comprising: detecting an object from an image, and generating area information indicating an area where the object is present, and resolution information pertaining to resolution of the object; and extracting, from the image within an area defined by the area information, a feature indicating a feature of the object in consideration of the resolution information.
10. The object feature extraction method according to claim 9, wherein the extracting includes extracting, from the image within an area defined by the area information, a primary feature, and generating the feature indicating the feature of the object by separably adding the resolution information to the primary feature.
11. The object feature extraction method according to claim 9, wherein the extracting includes generating the feature indicating the feature of the object by converting, based on the resolution information, a feature extracted from the image within the area defined by the area information.
12. The object feature extraction method according to claim 11, wherein the extracting includes acquiring a likelihood, based on the resolution information, with respect to the feature extracted from the image within the area defined by the area information, and generating the feature indicating the feature of the object, based on the acquired likelihood.
13. The object feature extraction method according to claim 9, wherein the extracting includes making the feature consist of likelihoods output by a discriminator for a plurality of subareas included in the image within the area defined by the area information, the discriminator being learned for each resolution indicated by the resolution information.
14. The object feature extraction method according to claim 10, further comprising: determining, by comparing features from a time series of images within areas defined by the area information, an identical object between images of different points of time, and generating and outputting a tracking identifier identifying the identical object; and grouping the primary feature, calculated in the extracting, based on the area information, the resolution information, and the tracking identifier, estimating an original feature based on the primary feature acquired from an area having a higher resolution in a group, learning how a value of the estimated original feature varies with resolution, and feeding back a learning result to the extracting.
15. An object tracking method performing matching between a first feature and a second feature each of which is extracted by the object feature extraction method according to claim 9, the object tracking method comprising: performing matching between a second feature including second resolution information and a first feature including first resolution information, the first feature read from feature storage, and determining if objects are identical to each other in consideration of the first resolution information and the second resolution information, wherein the first feature is a feature in an area of an object detected from a first image, includes the first resolution information, and is stored in the feature storage, and the second feature is a feature in an area of an object detected from a second image different from the first image, and includes the second resolution information.
16-24. (canceled)
Description
BRIEF DESCRIPTION OF DRAWINGS
EXAMPLE EMBODIMENT
[0030] In the following, example embodiments of the present invention are exemplarily described in detail with reference to the drawings. However, the constituent elements described in the following example embodiments are merely examples, and the technical scope of the present invention is not limited to these constituent elements.
First Example Embodiment
[0031] An object feature extraction device as a first example embodiment of the present invention is described by using
[0032] <<Object Feature Extraction Device>>
[0033] As illustrated in
[0034] (Configuration and Operation of Object Detection Unit)
[0035] The object detection unit 101 detects an object from the input image 110, and outputs the result as an object detection result. When the object is a person, a person area is detected by using a detector which has learned an image feature of a person. For example, a detector based on histograms of oriented gradients (HOG) features, or a detector that detects directly from an image by using a convolutional neural network (CNN), may be employed. Alternatively, a person may be detected not from the whole body but by using a detector which has learned an area of a part of a person (e.g., a head portion). Similarly, when the object is a car, it is possible to detect the car by using a detector which has learned an image feature of a vehicle. When the object is some other specific physical object, a detector which has learned an image feature of that physical object may be configured and used.
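As a concrete illustration of the HOG features mentioned above, the following sketch computes a gradient-orientation histogram for a single cell, which is the building block of a HOG descriptor; the cell size and bin count are illustrative choices, not taken from the source.

```python
import numpy as np

def hog_cell_histogram(patch, n_bins=9):
    """Gradient-orientation histogram for one cell, the building block of HOG.

    Each pixel votes for its (unsigned) gradient-orientation bin, weighted by
    the gradient magnitude. Parameters are illustrative.
    """
    gy, gx = np.gradient(patch.astype(float))      # row and column gradients
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    bins = (ang / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())     # magnitude-weighted votes
    return hist
```

A full HOG descriptor concatenates such per-cell histograms over a grid of cells with block normalization; only the per-cell step is shown here.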
[0036] The area information 111 and the resolution information 112 are acquired with respect to individual objects detected as described above.
[0037] The area information 111 is information on the area where the object is present within an image. Specifically, the area information 111 may be information on a circumscribed rectangle of the object area on the image, or silhouette information indicating the shape of the object. The silhouette information is information for distinguishing between pixels inside the object area and pixels outside it; for example, it is image information in which the pixel value of a pixel inside the object area is set to 255 and the pixel value of a pixel outside the object area is set to 0. The silhouette information can be acquired by an existing method such as a background subtraction method.
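A minimal sketch of the background-subtraction silhouette described above, using the 255/0 pixel-value convention from the paragraph; the threshold value is an assumption.

```python
import numpy as np

def silhouette(frame, background, thresh=25):
    """Silhouette by background subtraction: 255 for pixels inside the object
    area, 0 outside, following the convention described above. The threshold
    is an illustrative value."""
    diff = np.abs(frame.astype(int) - background.astype(int))
    return np.where(diff > thresh, 255, 0).astype(np.uint8)
```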
[0038] Meanwhile, the resolution information 112 is information indicating the size of an object on an image, or the distance from a camera serving as an imaging unit to the object. For example, the resolution information 112 may be the number of pixels in the horizontal and vertical directions of the object area on the image, or the distance from the camera to the object. The distance from the camera to the object can be acquired by converting two-dimensional coordinates on the camera into coordinates in the real space by using information on the position and direction of the camera. Information on the position and direction of the camera can be acquired by performing calibration processing when the camera is installed. The resolution information may include not only one type of information but a plurality of types. The area information 111 and the resolution information 112 calculated for each detected object are output to the feature extraction unit 102, which extracts a feature such as a pattern or a texture.
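The resolution information can be sketched as a small tuple of the quantities named above. The distance estimate below uses a simple pinhole model with an assumed focal length and object height — a stand-in for the calibration-based conversion the text describes.

```python
def resolution_info(bbox, focal_px=1000.0, object_height_m=1.7):
    """Resolution information as (width_px, height_px, distance_m).

    bbox is (x, y, w, h) in pixels. The distance is a rough pinhole-model
    estimate from an assumed real-world object height; the text instead
    derives it from the camera's calibrated position and direction.
    """
    x, y, w, h = bbox
    distance_m = focal_px * object_height_m / h
    return (w, h, distance_m)
```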
[0039] (Feature Extraction Unit)
[0040] The feature extraction unit 102 extracts, from the input image 110, the object feature 121 representing a pattern, a texture, and the like, based on the area information 111 and the resolution information 112 output from the object detection unit 101 for each object. When the object is a person, a feature of the pattern and texture of the person's garment is extracted. At this occasion, the object feature 121 into which the resolution information 112 is also incorporated is generated and output, taking into consideration that a feature of a pattern or texture may vary depending on the resolution of the area. When the resolution information 112 is incorporated, either information in which the resolution information 112 is appended as it is to a feature of a pattern and a texture may be output as a whole as the object feature 121, or the object feature 121 may be acquired by applying a certain conversion to a feature of a pattern and a texture by using the resolution information 112. In the following description, in the latter case, the feature before the conversion is applied is referred to as a primary feature.
[0041] According to the present example embodiment, a feature is extracted in consideration of a change in feature depending on a resolution, and therefore, the present example embodiment is capable of generating an object feature, while suppressing tracking miss and search miss due to a matching error.
Second Example Embodiment
[0042] Next, an object feature extraction device, and an object tracking system including the object feature extraction device, according to a second example embodiment of the present invention are described. The object feature extraction device according to the present example embodiment calculates an object feature while taking into consideration, at the time of feature extraction, a change in the feature of a pattern depending on the distance from the camera and the resolution. Further, since resolution is reflected in the object feature, the object tracking system including the object feature extraction device according to the present example embodiment is able to suppress tracking misses and search misses by maximally utilizing the identification accuracy of the feature. For example, when the resolution is lowered, a feature of a fine pattern becomes unidentifiable, and it appears as if no pattern is present. Even in such a case, tracking misses and search misses can be suppressed, since the resolution reflected in the feature makes it possible to distinguish the case where a fine pattern has become unidentifiable from the case where no pattern was originally present.
[0043] <<Object Tracking System>>
[0044] A configuration and an operation of the object tracking system are described with reference to
[0045] (System Configuration)
[0046]
[0047] Referring to
[0048] The object feature extraction unit 220A detects an object from an image captured by a camera 210A, extracts a first feature such as a pattern of the object, and stores the first feature in the feature storage unit 230. The object feature extraction unit 220B detects an object from an image captured by a camera 210B, extracts a second feature 220b such as a pattern of the object, and outputs the second feature 220b to the object matching unit 240. The object matching unit 240 performs matching between the second feature 220b, such as a pattern of an object, output from the object feature extraction unit 220B, and the first feature 230a, such as a pattern of the object, stored in the feature storage unit 230, and outputs the matching result.
[0049] Although not illustrated in
[0050] (System Operation)
[0051]
[0052] A video acquired by the camera 210A is input to the object feature extraction unit 220A (S301), an object is detected, and extraction of a feature such as a pattern of the object is performed (S303). This processing is as described above in the first example embodiment. The feature such as a pattern reflecting resolution information is output for the detected object, and is stored in the feature storage unit 230 (S305). The feature storage unit 230 stores the acquired object feature together with information on a camera for which extraction of the object feature is performed, a point of time when the extraction is performed, a position in the camera, and so on. When a certain condition is given from an outside, the feature storage unit 230 outputs an object feature that meets the condition.
[0053] Meanwhile, a video acquired by the camera 210B is input to the object feature extraction unit 220B (S307), an object is detected, and extraction of a feature such as a pattern of the object is performed (S309). This processing is also similar to the processing by the object feature extraction unit 220A, and the acquired feature of the object is output to the object matching unit 240.
[0054] When the object feature which is extracted by the object feature extraction unit 220B and treated as a query is input, the object matching unit 240 reads, from the feature storage unit 230, the object feature to be used for matching (S311), and performs matching, in which resolution information is reflected, between the object features (S313). Specifically, the object matching unit 240 calculates a degree of similarity between the object features, and determines whether the objects are identical to each other. At this occasion, the point of time when the corresponding object appears on the other camera (in this case, the camera 210A) may be predicted, and object features acquired at around the predicted point of time may be read and used for matching. Alternatively, a point of time when a corresponding object appears on another camera (in this case, the camera 210B) may be predicted, and object features acquired at around the predicted point of time may be selected and used for matching. The acquired result is output as an object matching result (S315 to S317).
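The time-predicted read in step S311 can be sketched as a simple window filter over stored records; the storage layout (a list of dicts with "time" and "feature" keys) and the window width are hypothetical.

```python
def candidate_features(storage, predicted_time, window_s=30.0):
    """Select stored object features whose timestamp falls near the predicted
    appearance time on the other camera.

    storage is a hypothetical list of records like
    {"time": <seconds>, "feature": <object feature>}.
    """
    return [rec["feature"] for rec in storage
            if abs(rec["time"] - predicted_time) <= window_s]
```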
[0055] This procedure is repeated until an instruction to finish is received from an operator (S319).
[0056] <<Functional Configuration of Object Feature Extraction Device (Unit)>>
[0057]
[0058] The object feature extraction device (unit) 220 includes an object detection unit 401 and a feature extraction unit 402. The object detection unit 401 is a functional element similar to the object detection unit 101 in
[0059] The feature extraction unit 402 according to the present example embodiment includes a primary feature extraction unit 421 and a feature generation unit 422. The primary feature extraction unit 421 receives image information and area information output from the object detection unit 401 as an input, and outputs a primary feature to the feature generation unit 422. The feature generation unit 422 generates a feature such as a pattern and a texture from the primary feature output from the primary feature extraction unit 421, and resolution information output from the object detection unit 401, and outputs the feature as an object feature.
[0060] (Primary Feature Extraction Unit)
[0061] The primary feature extraction unit 421 extracts a feature that serves as a base for a texture and a pattern. For example, the primary feature extraction unit 421 extracts a local feature reflecting a local characteristic of a pattern. Various extraction methods may be employed. In one method, a point serving as a key point is first extracted, and a feature in the periphery of the point is extracted. Alternatively, regularly arranged grids are placed on the area, and a feature at each grid point is extracted. At this occasion, the interval between the grids may be normalized according to the size of the object area. As the feature extracted at this occasion, it is possible to employ various features such as scale-invariant feature transform (SIFT), speeded-up robust features (SURF), and oriented FAST and rotated BRIEF (ORB). Further, a feature such as a Haar-like feature, a Gabor wavelet, or histograms of oriented gradients (HOG) may be employed.
[0062] Further, the area of an object may be divided into a plurality of subareas, and a feature may be extracted for each of the subareas. When the object is a person, for example, a feature point may be acquired for each of the oblong areas acquired by dividing the area of a garment along horizontal lines, and a feature may be extracted. Alternatively, the area may be divided into a fixed number of subareas, N divisions in the vertical direction and M divisions in the horizontal direction; the above-described features may be extracted for each of the divided areas, and the features may be joined together into a primary feature. For example, when the feature of one area has L dimensions and the area is divided into N divisions in the vertical direction and M divisions in the horizontal direction, the joined feature becomes a vector of L×M×N dimensions. The division into subareas does not need to be regular. For example, when the object is a person, subareas may be set so as to fit body parts, such as the upper body and the lower body (or pieces into which each of them is further divided).
[0063] A primary feature generated as described above is output to the feature generation unit 422.
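The subarea division and joining described above can be sketched as follows; the per-subarea descriptor is a plain gray-level histogram standing in for SIFT, HOG, or the like, and the division counts are illustrative.

```python
import numpy as np

def primary_feature(patch, n_vert=4, n_horiz=2, feat_fn=None):
    """Divide an object area into n_vert x n_horiz subareas and join one
    feature per subarea into a single primary-feature vector.

    feat_fn can be any local descriptor; an 8-bin gray-level histogram is
    used as a stand-in here.
    """
    if feat_fn is None:
        feat_fn = lambda a: np.histogram(a, bins=8, range=(0, 256))[0].astype(float)
    h, w = patch.shape[:2]
    feats = []
    for i in range(n_vert):
        for j in range(n_horiz):
            sub = patch[i * h // n_vert:(i + 1) * h // n_vert,
                        j * w // n_horiz:(j + 1) * w // n_horiz]
            feats.append(feat_fn(sub))
    return np.concatenate(feats)  # L x M x N dimensions, as in the text
```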
[0064] (Feature Generation Unit)
[0065] When the object is, for example, a person, the feature generation unit 422 generates a feature, which is to be used in matching, of a garment or the like on the basis of a feature of a garment output from the primary feature extraction unit 421 and resolution information output from the object detection unit 401; and outputs, as an object feature, the feature.
[0066] (First Generation Method)
[0067] For example, visual keywords acquired by clustering primary features are generated by learning in advance; the visual keyword to which each primary feature corresponds is determined, and a histogram thereof is generated and set as the feature. At this occasion, the histogram, together with the resolution information appended to it in a separable form, is set as the object feature. When a primary feature is acquired for each of the subareas, a histogram of visual keywords may be generated for each of the subareas, the histograms may be joined together, and the resolution information may be appended to the entirety in a separable form.
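The first generation method can be sketched as a bag-of-visual-words histogram with the resolution information concatenated at the end, kept separable so it can be split off again at matching time; the tiny codebook used in testing is purely illustrative.

```python
import numpy as np

def bovw_with_resolution(primaries, codebook, resolution_info):
    """Bag-of-visual-words histogram with resolution information appended in
    a separable form (first generation method).

    codebook rows are the visual keywords learned in advance by clustering.
    """
    hist = np.zeros(len(codebook))
    for y in primaries:
        n = np.argmin(np.linalg.norm(codebook - y, axis=1))  # nearest keyword
        hist[n] += 1
    # Concatenating (rather than mixing in) the resolution info keeps it
    # separable, so a matching stage can split it off again.
    return np.concatenate([hist, np.asarray(resolution_info, dtype=float)])
```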
[0068] (Second Generation Method)
[0069] Alternatively, when the histogram of visual keywords is generated, an occurrence probability of each visual keyword may be acquired from the acquired primary features by using the resolution information, and the histogram may be calculated by weighting with the value of the probability. Here, the number of visual keywords is referred to as N, and an individual visual keyword is referred to as x_n (n = 1, . . . , N). Further, the number of acquired primary features is referred to as J, and an acquired individual primary feature is referred to as y_j (j = 1, . . . , J). Further, levels of resolution are classified into K stages by the resolution information, and are distinguished by a resolution index k (k = 1, . . . , K). When the resolution index is k, the occurrence probability of the visual keyword x_n in a case where y_j is acquired is described as p_k(x_n|y_j). When the primary feature y_j is acquired, the histogram may be generated by adding the value of the occurrence probability p_k(x_n|y_j) to the bin associated with the visual keyword x_n.
[0070] Therefore, when the value of the bin of the histogram associated with the visual keyword x_n is h_n, h_n is described as follows:

h_n = Σ_j p_k(x_n|y_j), summed over j = 1, . . . , J  (Math. 1)
[0071] The value of the occurrence probability p_k(x_n|y_j) can be written, by Bayes' rule, as:

p_k(x_n|y_j) = p_k(y_j|x_n) p(x_n) / Σ_m p_k(y_j|x_m) p(x_m), summed over m = 1, . . . , N  (Math. 2)
[0072] Here, p_k(y_j|x_n) is the probability that the feature of a texture pattern of the visual keyword x_n is y_j at the resolution indicated by the resolution index k, and p(x_n) is the prior probability of the visual keyword x_n (which represents the frequency of occurrence of the visual keyword x_n, and does not depend on resolution).
[0073] It is possible to acquire p_k(y_j|x_n) in advance by examining (i.e., learning from data) how the features of the visual keyword x_n are distributed at the resolution associated with the resolution index k. Regarding p(x_n), it is likewise possible to acquire a distribution of which patterns occur frequently by examining in advance the texture patterns of various objects (e.g., in the case of a person, texture patterns of garments, patterns generated by layered garments, and the like). Alternatively, when such prior knowledge is not available, the distribution may be set to a uniform distribution. The value of (Math. 2) can be calculated by using these values.
[0074] In this way, p_k(x_n|y_j) is acquired and stored for each resolution index k and for each value of the primary feature y_j. When the value actually occurs, the feature is calculated according to (Math. 1).
[0075] It is possible to calculate an object feature of a pattern and a texture as described above. Also in this case, resolution information may be appended thereto. When primary features are acquired separately for subareas, features may be acquired separately for the subareas, and may be joined into an object feature of a pattern and a texture.
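The second generation method can be sketched as a soft histogram: each primary feature adds its posterior p_k(x_n|y_j), obtained by Bayes' rule from the per-resolution likelihoods, to every bin, per (Math. 1) and (Math. 2). The likelihood callable and the prior are assumed inputs, learned in advance as the text describes.

```python
import numpy as np

def soft_histogram(primaries, likelihood_k, prior):
    """Soft visual-keyword histogram for one resolution index k.

    likelihood_k(y, n) is an assumed callable returning p_k(y | x_n), learned
    in advance for resolution k; prior[n] is p(x_n). Each primary feature
    contributes its posterior p_k(x_n | y) to every bin (Math. 1).
    """
    hist = np.zeros(len(prior))
    for y in primaries:
        lik = np.array([likelihood_k(y, n) for n in range(len(prior))])
        post = lik * prior          # numerator of Bayes' rule (Math. 2)
        hist += post / post.sum()   # normalized posterior p_k(x_n | y)
    return hist
```

Because each contribution is a normalized posterior, the histogram always sums to the number of primary features, just as a hard-assignment histogram would.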
[0076] <<Functional Configuration of Object Matching Unit>>
[0077]
[0078] (Configuration)
[0079] Referring to
[0080] (Operation)
[0081] The first feature 230a read from the feature storage unit 230 is input to the resolution information separation unit 501. The resolution information separation unit 501 extracts, from the input first feature 230a, the information corresponding to the resolution, outputs the extracted information as the first resolution information, and outputs, as data on the first feature, the data indicating the feature of the pattern other than the resolution. The second feature 220b from the object feature extraction device (unit) 220B is input to the resolution information separation unit 502. Similarly to the resolution information separation unit 501, the resolution information separation unit 502 also separates the resolution information, and outputs second resolution information and data on the second feature.
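When the resolution information was appended in a separable form, the separation units can recover it by simple slicing; the number of trailing resolution components is an assumption.

```python
def separate_resolution(object_feature, n_res=3):
    """Split a feature stored with resolution information appended in a
    separable form back into (pattern feature, resolution information).
    n_res, the count of trailing resolution components, is an assumption."""
    return object_feature[:-n_res], object_feature[-n_res:]
```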
[0082] The separated first and second resolution information are input to the reliability calculation unit 503. The reliability calculation unit 503 calculates, from the resolution information, a degree of reliability indicating the degree to which a matching result between the features can be relied on, and outputs the degree of reliability.
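One plausible reliability rule — not specified by the source — treats the smaller of the two object-area resolutions as the limiting factor and scales it to [0, 1]:

```python
def reliability(res1, res2, full_res=128.0):
    """Hypothetical reliability of a match between two features.

    Matching is most trustworthy when both object areas are large; the
    smaller resolution is the limiting factor. full_res is an assumed
    'fully reliable' object height in pixels.
    """
    return min(min(res1, res2) / full_res, 1.0)
```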
[0083] Meanwhile, the separated data on the first and second features are input to the feature matching unit 504. The feature matching unit 504 performs comparison between object features of patterns and the like. In the simplest case, a degree of similarity or a distance between the features is calculated, the objects are determined to be identical to each other when the degree of similarity is equal to or larger than a predetermined threshold value (i.e., when the similarity is high), and a matching result is output. Alternatively, whether the objects are identical to each other may be determined by employing a determiner generated by a neural network or the like, and by inputting the data on the first feature and the data on the second feature to the determiner. At this occasion, the criterion of matching may be adjusted according to the degree of reliability calculated by the reliability calculation unit 503 before the identity determination is performed. The matching result need not be a binary determination as to whether the objects are identical; a numerical value indicating a degree of identity may be output instead. Further, the degree of reliability output from the reliability calculation unit 503 may be appended to the matching result.
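A sketch of the similarity-plus-threshold matching, with the acceptance criterion tightened when reliability is low; the cosine similarity, the threshold values, and the adjustment rule are all one plausible choice rather than anything the source specifies.

```python
import numpy as np

def match(feat1, feat2, reliability, base_thresh=0.8, strict_thresh=0.95):
    """Cosine-similarity matching with a reliability-adjusted criterion.

    When reliability is 1 the base threshold applies; as reliability drops,
    the threshold is raised toward strict_thresh (hypothetical rule).
    Returns (is_identical, similarity).
    """
    sim = float(np.dot(feat1, feat2) /
                (np.linalg.norm(feat1) * np.linalg.norm(feat2)))
    thresh = reliability * base_thresh + (1.0 - reliability) * strict_thresh
    return sim >= thresh, sim
```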
[0084] <<Hardware Configuration of Object Feature Extraction Device (Unit)>>
[0085]
[0086] In
[0087] A RAM 640 is a random access memory used by the CPU 610 as a temporary work area. In the RAM 640, an area for storing data necessary for achieving the present example embodiment is secured. Captured image data 641 are image data acquired from the camera 210. An object detection result 642 is a detection result of an object detected based on the captured image data 641. The object detection result 642 stores sets of (object, area information/resolution information 643), from (first object, area information/resolution information) to (n-th object, area information/resolution information). A feature extraction table 644 is a table for extracting an object feature on the basis of the captured image data 641 and the area information/resolution information 643. Tables 645, from a first object table to an n-th object table, are stored in the feature extraction table 644. An object feature 646 is a feature of an object, extracted from the object by using the feature extraction table 644.
[0088] A storage 650 stores a database, various parameters, and the following data and program necessary for achieving the present example embodiment. Data and parameters 651 for object detection are data and parameters used for detecting an object on the basis of the captured image data 641. Data and parameters 652 for feature extraction are data and parameters used for extracting an object feature on the basis of the captured image data 641 and the area information/resolution information 643. The data and parameters 652 for feature extraction include those for primary feature extraction 653, and those for feature generation 654.
[0089] The following programs are stored in the storage 650. An object feature extraction program 655 is a program for controlling the entirety of the object feature extraction device 220. An object detection module 656 is a module for detecting an object on the basis of the captured image data 641 by using the data and parameters 651 for object detection. A primary feature extraction module 657 is a module for extracting a primary feature on the basis of the captured image data 641 and area information by using data and parameters for primary feature extraction 653. A feature generation module 658 is a module for generating an object feature on the basis of a primary feature and resolution information by using data and parameters for feature generation 654.
[0090] When the object feature extraction device (unit) 220 is provided as an intelligent camera 250 in which the object feature extraction device (unit) 220 is integrally implemented together with the camera 210, the object feature extraction device (unit) 220 further includes an input-output interface 660, the camera 210 connected with the input-output interface 660, and a camera control unit 661 for controlling the camera 210.
[0091] In the RAM 640 and the storage 650 illustrated in
[0092] (Feature Extraction Table)
[0093]
[0094] In the feature extraction table 644, image data 702 captured by the camera are stored in association with a camera ID 701. The image data 702 includes an image ID, and a timestamp of time at which an image having the image ID is captured. The image also includes a still image and a moving image. Object detection information 703 and feature information 704 are stored in association with each piece of the image data 702. The object detection information 703 includes an object ID, area information, and resolution information. The feature information 704 includes a primary feature and an object feature.
[0095] <<Processing Procedure of Object Feature Extraction Device (Unit)>>
[0096]
[0097] In Step S801, the feature extraction device 220 acquires image data of an image captured by a camera. In Step S803, based on the image data, the feature extraction device 220 detects an object from the image, and generates area information and resolution information. In Step S805, based on the image data, the feature extraction device 220 extracts, from the image, a primary feature of the object by using the area information. In Step S807, the feature extraction device 220 generates an object feature from the primary feature by using the resolution information. In Step S809, the feature extraction device 220 outputs the object feature, for example, a feature representing a pattern and a texture of a garment. In Step S811, the feature extraction device 220 determines whether an instruction from an operator to finish the processing has been received. When there is no instruction, the feature extraction device 220 repeats extraction and output of an object feature of an image from the camera.
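The loop of Steps S801 through S811 can be sketched as follows. This is a minimal illustration only; the function names and the concrete choices (a bounding box as area information, object height in pixels as resolution information, mean intensity as a primary feature) are assumptions, not part of the disclosed device.

```python
# Hypothetical sketch of the S801-S811 loop: detect an object, derive
# area/resolution information, extract a primary feature, and generate
# the final object feature in consideration of the resolution.

def detect_object(image):
    # Placeholder detector: returns (area, resolution) for one object.
    # "area" is a bounding box (x, y, w, h); "resolution" is taken to be
    # the object height in pixels, one plausible resolution measure.
    x, y, w, h = 10, 20, 32, 64
    return (x, y, w, h), h

def extract_primary_feature(image, area):
    # Placeholder primary feature: mean intensity of the cropped area.
    x, y, w, h = area
    crop = [row[x:x + w] for row in image[y:y + h]]
    values = [v for row in crop for v in row]
    return sum(values) / len(values)

def generate_object_feature(primary, resolution):
    # Separably append the resolution so a matcher can split it off later
    # (the approach of Supplementary Note 2).
    return (primary, resolution)

def process_frame(image):
    area, resolution = detect_object(image)               # S803
    primary = extract_primary_feature(image, area)        # S805
    return generate_object_feature(primary, resolution)   # S807

image = [[128] * 100 for _ in range(100)]  # dummy 100x100 grayscale frame
feature = process_frame(image)
print(feature)  # (primary feature, resolution) pair
```

The object feature keeps the resolution information separable, which is what allows the later matching stage to recover it before comparing features.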
[0098] (Matching Table)
[0099]
[0100] In the matching table 900, first object information 901 and second object information 902 for matching are stored. The first object information 901 and the second object information 902 include a camera ID, a timestamp, an object ID, and a feature. First object resolution information 903, which is separated from a first object feature, and second object resolution information 904, which is separated from a second object feature, are stored in the matching table 900. Reliability information 905, which is determined from the first object resolution information 903 and the second object resolution information 904, and a matching result 906, which is acquired by matching between the first object feature and the second object feature with reference to the reliability information 905, are further stored in the matching table 900.
[0101] <<Processing Procedure of Object Matching Unit>>
[0102]
[0103] In Step S1001, the object matching unit 240 acquires a feature of a first object. In Step S1003, the object matching unit 240 separates the first resolution information 903 from the first object feature. In Step S1005, the object matching unit 240 acquires a feature of a second object. In Step S1007, the object matching unit 240 separates the second resolution information 904 from the second object feature.
[0104] In Step S1009, the object matching unit 240 calculates the reliability information 905 from the first resolution information 903 and the second resolution information 904. In Step S1011, the object matching unit 240 performs matching between the first object feature and the second object feature with reference to the reliability information. In Step S1013, the object matching unit 240 determines whether the first object feature and the second object feature match each other. When they match, in Step S1015, the object matching unit 240 outputs information on the matching first object and second object. In Step S1017, the object matching unit 240 determines whether an instruction from an operator to finish the processing has been received. When there is no instruction, the object matching unit 240 repeats object matching and output of a matching result.
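Steps S1001 through S1015 can be sketched as follows. The concrete reliability function, the distance-based score, and the threshold are assumptions chosen for illustration; the disclosure only requires that matching refer to a reliability derived from the two pieces of resolution information.

```python
# Hypothetical sketch of S1001-S1015: separate resolution information from
# each feature, derive a reliability weight, and compare the features with
# reference to that weight.
import math

def separate(feature_with_resolution):
    # A feature stored with separably added resolution info (Note 2):
    # the last element is the resolution, the rest is the feature proper.
    *feature, resolution = feature_with_resolution
    return feature, resolution

def reliability(res1, res2):
    # Assumed model: matching is less reliable when either object was
    # captured at low resolution; normalized by a nominal 64-pixel height.
    return min(res1, res2, 64) / 64.0

def match(f1_with_res, f2_with_res, threshold=0.5):
    f1, r1 = separate(f1_with_res)   # S1003
    f2, r2 = separate(f2_with_res)   # S1007
    w = reliability(r1, r2)          # S1009
    dist = math.dist(f1, f2)
    # Low reliability pulls the score toward an uninformative 0.5, one
    # plausible way of matching "with reference to" the reliability.
    score = math.exp(-dist) * w + (1.0 - w) * 0.5
    return score >= threshold, score  # S1011-S1013

same, score = match((0.2, 0.4, 64), (0.2, 0.4, 64))
print(same)  # identical high-resolution features are judged as matching
```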
[0105] According to the present example embodiment, an object feature is extracted in consideration of a change in feature depending on a resolution, and matching between the object features is performed in consideration of a degree of reliability based on resolution. Therefore, the present example embodiment is able to suppress tracking miss and search miss due to a matching error.
Third Example Embodiment
[0106] Next, an object feature extraction device, and an object tracking system including the object feature extraction device according to a third example embodiment of the present invention are described. As compared with the above-described second example embodiment, the object feature extraction device, and the object tracking system including the object feature extraction device according to the present example embodiment differ in that the feature extraction unit of the object feature extraction device, and the object matching unit of the object tracking system, are each achieved by one functional configuration unit. Since other configurations and operations are similar to those of the second example embodiment, the same signs are assigned to the same configurations and the same operations, and detailed description thereof is omitted.
[0107] <<Functional Configuration of Object Feature Extraction Device (Unit)>>
[0108]
[0109] (Configuration)
[0110] The object feature extraction device (unit) 1120 includes an object detection unit 401 and a feature extraction unit 1102 including one feature discriminating unit 1121. The feature discriminating unit 1121 receives area information and resolution information generated by the object detection unit 401 and image data as an input, generates a feature, and outputs the feature as an object feature.
[0111] (Operation)
[0112] The area information, the resolution information, and the image data are input to the feature discriminating unit 1121. The feature discriminating unit 1121 is, for example, a classifier which has been trained to classify features of various patterns captured at various resolutions. When a pattern is used as a feature, the input is the pixel values and resolution information of a subarea within a garment area, and the output is a likelihood for each of the pattern features (a value from 0 to 1; the closer the likelihood of a pattern feature is to 1, the higher the possibility that the input represents that pattern feature). When features of N patterns are classified, the likelihoods of the N pattern features constitute the output, and this output is set as a feature indicating a pattern and a texture.
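A classifier of this kind can be sketched as follows. The toy one-dimensional "prototype" patterns, the softmax-style scoring, and the way resolution flattens the likelihoods are all assumptions standing in for a trained classifier; only the input/output shape (subarea pixels plus resolution in, one likelihood in [0, 1] per pattern class out) follows the text.

```python
# Hypothetical sketch of the feature discriminating unit: pixel values of a
# subarea plus resolution information in, a likelihood per pattern class out.
import math

PROTOTYPES = {            # N = 3 pattern classes with toy mean intensities
    "plain": 0.1,
    "stripe": 0.5,
    "check": 0.9,
}

def classify(subarea_pixels, resolution):
    mean = sum(subarea_pixels) / len(subarea_pixels)
    # Lower resolution flattens the scores: distinctions become less
    # sharp, so every likelihood moves toward the middle of [0, 1].
    sharpness = min(resolution, 64) / 64.0 * 10.0
    scores = {k: math.exp(-sharpness * abs(mean - p))
              for k, p in PROTOTYPES.items()}
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}

likelihoods = classify([0.5, 0.52, 0.48], 64)
best = max(likelihoods, key=likelihoods.get)
print(best)  # the most likely pattern class for this subarea
```

The full vector of N likelihoods, not only the winning class, is what the text sets as the feature indicating a pattern and a texture.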
[0113] When a feature is acquired from a plurality of subareas, a feature obtained by uniting the likelihoods derived for the individual subareas may be set as the feature indicating a pattern and a texture. The classifier may be implemented using a neural network, for example. In that case, the classifier may have been trained by inputting the pixel values and the resolution together. Alternatively, classifiers may have been trained individually for a plurality of resolutions, and a classifier may be selected and used on the basis of the resolution information. A plurality of subareas may be input. In this case, the subareas may or may not overlap one another. All the subareas may have the same size, or subareas of different sizes may be included. The sizes of the subareas may be normalized on the basis of the size of the garment area.
[0114] The description above uses a pattern of a garment as an example. When the object is another tracking target such as a car, for example, a feature capable of suppressing tracking miss and search miss due to a matching error is selected, and a feature of that kind is extracted.
[0115] <<Functional Configuration of Object Matching Unit>>
[0116]
[0117] The object matching unit 1240 in
[0118] The present example embodiment is able to extract an object feature in consideration of a change in feature depending on resolution with a more simplified configuration, perform matching between the object features in consideration of a degree of reliability based on the resolution, and suppress tracking miss and search miss due to a matching error.
Fourth Example Embodiment
[0119] Next, an object matching unit of an object tracking system according to a fourth example embodiment of the present invention is described. As compared with the second example embodiment, the object matching unit according to the present example embodiment differs in that a reliability calculation unit is not provided, and the separated first and second resolution information are directly input to a feature matching unit. Since other configurations and operations are similar to those of the second example embodiment, the same signs are assigned to the same configurations and operations, and detailed description thereof is omitted.
[0120] <<Functional Configuration of Object Matching Unit>>
[0121]
[0122] (Configuration)
[0123] Referring to
[0124] (Operation)
[0125] The feature matching unit 1304 compares the data on the first feature with the data on the second feature, and determines whether the objects are identical to each other. At this occasion, the first resolution information and the second resolution information are also input to the feature matching unit 1304 and used for matching. For example, the feature matching unit 1304 determines a degree to which the data on the first feature and the data on the second feature are identical by using a discriminator which has learned a probability of matching for each resolution, and outputs the determination result as a matching result. Also in this case, the matching result that is output need not be a binary value indicating whether the objects are identical, but may be a numerical value indicating a degree of matching.
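The operation above can be sketched as follows. The logistic model with per-resolution-bucket parameters is an assumption standing in for the discriminator that "has learned a probability of matching for each resolution"; the bucket boundary and the parameter values are invented for illustration.

```python
# Hypothetical sketch of the fourth embodiment's feature matching unit: the
# two pieces of resolution information are fed directly into the matcher,
# with no separate reliability calculation unit.
import math

# Assumed learned parameters: (scale, bias) of a logistic model, indexed
# by a coarse resolution bucket ("low" below 32 pixels, else "high").
LEARNED = {("high", "high"): (4.0, 1.0),
           ("high", "low"): (2.0, 0.0),
           ("low", "low"): (1.0, -0.5)}

def bucket(resolution):
    return "high" if resolution >= 32 else "low"

def match_probability(f1, f2, res1, res2):
    dist = math.dist(f1, f2)
    key = tuple(sorted((bucket(res1), bucket(res2)), reverse=True))
    scale, bias = LEARNED[key]
    # Degree of matching as a value in (0, 1), not a binary decision.
    return 1.0 / (1.0 + math.exp(scale * dist - bias))

p = match_probability([0.2, 0.4], [0.2, 0.4], 64, 64)
print(p > 0.5)  # identical features at high resolution match strongly
```

Because the resolution selects the model parameters, low-resolution pairs are judged under a flatter curve, which plays the role the reliability calculation unit played in the second example embodiment.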
[0126] The present example embodiment is capable of suppressing tracking miss and search miss due to a matching error without a reliability calculation unit, i.e., with an object matching unit having a simpler configuration.
Fifth Example Embodiment
[0127] Next, an object feature extraction device according to a fifth example embodiment of the present invention is described. As compared with the above-described second and third example embodiments, the object feature extraction device according to the present example embodiment differs in that a change in feature is learned by object tracking, and the learning result is reflected in extraction of an object feature. Since other configurations and operations are similar to those of the second or third example embodiment, the same signs are assigned to the same configurations and operations, and detailed description thereof is omitted.
[0128] <<Functional Configuration of Object Feature Extraction Device (Unit)>>
[0129]
[0130] (Configuration)
[0131] Referring to
[0132] The object tracking unit 1403 performs tracking of an object between frames on the basis of area information output from the object detection unit 401 and input image data of an image, and outputs a tracking identifier (hereinafter, referred to as a tracking ID) of the object. The feature learning unit 1404 learns a change in feature caused by a change of resolution by using the resolution information and area information output from the object detection unit 401, a tracking result output from the object tracking unit 1403, and a primary feature output from a primary feature extraction unit 421 of the feature extraction unit 1402, and outputs a learning result to a feature generation unit 1422 of the feature extraction unit 1402. The feature generation unit 1422 extracts, from the image data, a feature such as a pattern and a texture of the object, by using the area information and the resolution information output from the object detection unit 401 and the learning result on the feature output from the feature learning unit 1404, and outputs the feature as an object feature.
[0133] (Operation)
[0134] An operation of the object detection unit 401 is similar to the operation illustrated in
[0135] The object tracking unit 1403 associates an input result of object detection with the object tracking result acquired so far, and thereby calculates a tracking result with respect to a current frame. At this occasion, various existing methods may be employed for tracking; for example, tracking by a Kalman filter may be employed, and a tracking method by a particle filter may also be employed. Consequently, a tracking ID is calculated for each of the detected objects. The calculated tracking ID is output to the feature learning unit 1404.
[0136] The feature learning unit 1404 learns an influence of resolution on a feature on the basis of the resolution information and area information output from the object detection unit 401 for each of the objects, the tracking ID information output from the object tracking unit 1403 for each of the objects, and the primary feature output from the primary feature extraction unit 421 of the feature extraction unit 1402 for each of the objects, and acquires posterior probability information with respect to each resolution.
[0137] First, primary features associated with a same tracking ID are grouped. At this occasion, grouping may be performed further in consideration of a position in an object area. For example, when a feature belongs to an m-th subarea of a person having a same tracking ID, features located at the same m-th subarea are collected and grouped. Here, association is maintained in such a way that the associated resolution information is easily acquired from the individual features that have been grouped. Next, the visual keyword, among visual keywords x_n (n=1, . . . , N), to which the original feature pattern belongs is determined based on a feature, among the group of features, having a resolution equal to or higher than a predetermined resolution. Then, by using the features of this group, how x_n varies with resolution is confirmed. By repeating this learning for a plurality of persons for which tracking has been securely performed, how each x_n varies with resolution is learned. The learning result is output to the feature generation unit 1422 of the feature extraction unit 1402, and is used in succeeding feature generation.
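The grouping and learning procedure above can be sketched as follows. The nearest-centroid keyword assignment, the one-dimensional toy keywords, the 16-pixel resolution bins, and the 48-pixel "predetermined resolution" are all assumptions; the flow (group by tracking ID and subarea, decide x_n from high-resolution samples, accumulate per-resolution statistics of x_n) follows the text.

```python
# Hypothetical sketch of the feature learning unit: group primary features
# by tracking ID and subarea, decide each group's visual keyword x_n from
# its high-resolution samples, then record how x_n looks per resolution.
from collections import defaultdict

KEYWORDS = [0.1, 0.5, 0.9]   # toy 1-D visual keywords x_1..x_3
HIGH_RES = 48                # assumed "predetermined resolution" in pixels

def nearest_keyword(feature):
    return min(range(len(KEYWORDS)),
               key=lambda n: abs(KEYWORDS[n] - feature))

def learn(samples):
    # samples: list of (tracking_id, subarea, primary_feature, resolution)
    groups = defaultdict(list)
    for tid, sub, feat, res in samples:
        groups[(tid, sub)].append((feat, res))
    # stats[n][res_bin] -> observed feature values of keyword x_n
    stats = defaultdict(lambda: defaultdict(list))
    for obs in groups.values():
        high = [f for f, r in obs if r >= HIGH_RES]
        if not high:
            continue  # cannot decide the original keyword for this group
        n = nearest_keyword(sum(high) / len(high))
        for feat, res in obs:
            stats[n][res // 16].append(feat)  # 16-pixel resolution bins
    return stats

samples = [(1, 0, 0.50, 64), (1, 0, 0.52, 64), (1, 0, 0.70, 16)]
stats = learn(samples)
print(dict(stats[1]))  # how keyword x_2 (index 1) looks per resolution bin
```

The accumulated statistics are what a feature generation unit could consult to compensate a low-resolution observation back toward its keyword.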
[0138] Thus, since the influence of resolution-dependent variation on a feature is automatically learned for each camera, it becomes possible to acquire a feature more appropriate for identification of a feature pattern. When the object is a person, online learning may be performed during actual operation by using data, on the premise that the number of persons is small and a tracking error does not occur. Alternatively, when a pattern of a garment is used as a feature, learning may be performed at the time of installation by having a person walk while wearing garments of various patterns, after which the system may be used. At this occasion, learning may be performed by having the person wear garments on which various features are depicted.
[0139] (Feature Extraction Table)
[0140]
[0141] The feature extraction table 1500 stores object tracking information 1502 and training information 1503 in association with each of object tracking IDs 1501. Feature learning information 1504 is generated from the object tracking information 1502 and the training information 1503.
[0142] The object tracking information 1502 includes an image ID, a timestamp, and area information. The training information 1503 includes a primary feature and resolution information.
[0143] <<Processing Procedure of Object Feature Extraction Device (Unit)>>
[0144]
[0145] In Step S1606, the feature extraction device 1420 tracks an object in image data by using area information. In Step S1607, the feature extraction device 1420 generates feature learning information from a primary feature, area information, and resolution information, for each object. In Step S1608, the feature extraction device 1420 generates an object feature from the primary feature by using the resolution information and the feature learning information.
[0146] According to the present example embodiment, since a change in feature is learned by object tracking and an object feature reflecting the learning result is extracted, the present example embodiment is capable of generating an object feature while further suppressing tracking miss and search miss due to a matching error.
Sixth Example Embodiment
[0147] Next, an object feature extraction unit of an object tracking system according to a sixth example embodiment of the present invention is described. As compared with the above-described second to fifth example embodiments, the object feature extraction unit according to the present example embodiment differs in that an object tracking device serving as a server which performs object tracking processing extracts an object feature. Since other configurations and operations are similar to those in the second to fifth example embodiments, the same reference signs are assigned to the same configurations and operations, and detailed description thereof is omitted.
[0148] <<Functional Configuration of Object Feature Extraction Device (Unit)>>
[0149]
[0150] An object tracking unit 1703 performs tracking of an object on the basis of image data from at least two cameras, as illustrated in
[0151] According to the present example embodiment, a server which performs object tracking processing performs object feature extraction and object tracking at the same time, unlike the second example embodiment with its separate object feature extraction device and its intelligent imaging device in which a camera and an object feature extraction unit are integrated. Therefore, it is possible to speedily perform efficient object tracking by using information in a wider range.
Other Example Embodiments
[0152] In the foregoing, the invention of the present application is described with reference to the example embodiments. The invention of the present application, however, is not limited to the above-described example embodiments. A configuration and details of the invention of the present application may be modified in various ways comprehensible to a person skilled in the art within the scope of the invention of the present application. A system or a device including any combination of individual features included in each of the example embodiments is also included within the scope of the present invention. For example, the configuration of a set of an object feature extraction device (unit) and an object matching unit is not limited to that in the above-described example embodiments, and configurations of different example embodiments may be synthesized.
[0153] The present invention is able to track a specific object (such as a person or a car) by using, for example, cameras at two locations away from each other. For example, when an incident occurs, the present invention may be used for the purpose of tracking a suspect by using a plurality of cameras. When there is a stray child, it is possible to use the present invention for the purpose of finding the stray child by searching among a plurality of cameras.
[0154] The present invention may be applied to a system including a plurality of devices, or may be applied to a single device. Further, the present invention is also applicable to a case where an information processing program that achieves functions of the example embodiments is directly or remotely supplied to a system or a device. Therefore, a program to be installed in a computer in order to achieve the functions of the present invention by the computer, a medium storing the program, and a world wide web (WWW) server which causes a computer to download the program are included within the scope of the present invention. In particular, a non-transitory computer readable medium storing a program causing a computer to execute at least processing steps included in the above-described example embodiments is included within the scope of the present invention.
Other Expression of Example Embodiments
[0155] A part or the entirety of the above-described example embodiments may be described as the following supplementary notes, but are not limited to the following.
[0156] (Supplementary Note 1)
[0157] An object feature extraction device including:
[0158] object detection means for detecting an object from an image, and generating area information indicating an area where the object is present, and resolution information pertaining to resolution of the object; and
[0159] feature extraction means for extracting, from the image within an area defined by the area information, a feature indicating a feature of the object in consideration of the resolution information.
[0160] (Supplementary Note 2)
[0161] The object feature extraction device according to Supplementary Note 1, wherein
[0162] the feature extraction means extracts, from the image within an area defined by the area information, a primary feature, and generates the feature indicating the feature of the object by separably adding the resolution information to the primary feature.
[0163] (Supplementary Note 3)
[0164] The object feature extraction device according to Supplementary Note 1, wherein
[0165] the feature extraction means generates the feature indicating the feature of the object by converting, based on the resolution information, a feature extracted from the image within the area defined by the area information.
[0166] (Supplementary Note 4)
[0167] The object feature extraction device according to Supplementary Note 3, wherein
[0168] the feature extraction means acquires a likelihood, based on the resolution information, with respect to the feature extracted from the image within the area defined by the area information, and generates the feature indicating the feature of the object, based on the acquired likelihood.
[0169] (Supplementary Note 5)
[0170] The object feature extraction device according to any one of Supplementary Notes 1 to 4, wherein
[0171] the feature extraction means makes the feature consist of likelihoods output by a discriminator for a plurality of subareas included in the image within the area defined by the area information, the discriminator being learned for each resolution indicated by the resolution information.
[0172] (Supplementary Note 6)
[0173] The object feature extraction device according to Supplementary Note 2, further including:
[0174] object tracking means for determining, by comparing features from time series of images within areas defined by the area information, an identical object between images of different points of time, and generating and outputting a tracking identifier identifying the identical object; and
[0175] feature learning means for grouping the primary feature calculated by the feature extraction means based on the area information, the resolution information, and the tracking identifier, estimating an original feature based on the primary feature acquired from an area having a higher resolution in a group, learning how a value of the estimated original feature varies with resolution, and feeding back a learning result to the feature extraction means.
[0176] (Supplementary Note 7)
[0177] An object tracking system including a first object feature extraction device and a second object feature extraction device each of which is the object feature extraction device according to any one of Supplementary Notes 1 to 6, including:
[0178] feature storage means for storing a first feature in an area of an object detected from a first image by the first object feature extraction device, the first feature including first resolution information; and
[0179] object matching means for performing matching between a second feature including second resolution information and a first feature including the first resolution information, the first feature read from the feature storage means, the second feature being a feature in an area of an object detected from a second image by the second object feature extraction device, the second image being different from the first image, and determining if objects are identical to each other in consideration of the first resolution information and the second resolution information.
[0180] (Supplementary Note 8)
[0181] An intelligent imaging device including:
[0182] at least an imaging unit; and an object feature extraction unit, wherein
[0183] the object feature extraction unit includes: [0184] object detection means for detecting an object from an image captured by the imaging unit, and generating area information and resolution information, the area information indicating an area where the object is present, the resolution information pertaining to resolution of the object; and [0185] feature extraction means for extracting, from the image within an area defined by the area information, a feature indicating a feature of the object in consideration of the resolution information.
[0186] (Supplementary Note 9)
[0187] An object feature extraction method including:
[0188] detecting an object from an image, and generating area information indicating an area where the object is present, and resolution information pertaining to resolution of the object; and
[0189] extracting, from the image within an area defined by the area information, a feature indicating a feature of the object in consideration of the resolution information.
[0190] (Supplementary Note 10)
[0191] The object feature extraction method according to Supplementary Note 9, wherein
[0192] the extracting includes extracting, from the image within an area defined by the area information, a primary feature, and generating the feature indicating the feature of the object by separably adding the resolution information to the primary feature.
[0193] (Supplementary Note 11)
[0194] The object feature extraction method according to Supplementary Note 9, wherein
[0195] the extracting includes generating the feature indicating the feature of the object by converting, based on the resolution information, a feature extracted from the image within the area defined by the area information.
[0196] (Supplementary Note 12)
[0197] The object feature extraction method according to Supplementary Note 11, wherein
[0198] the extracting includes acquiring a likelihood, based on the resolution information, with respect to the feature extracted from the image within the area defined by the area information, and generating the feature indicating the feature of the object, based on the acquired likelihood.
[0199] (Supplementary Note 13)
[0200] The object feature extraction method according to any one of Supplementary Notes 9 to 12, wherein
[0201] the extracting includes setting the feature to likelihoods output by a discriminator for a plurality of subareas included in the image within the area defined by the area information, the discriminator being learned for each resolution indicated by the resolution information.
[0202] (Supplementary Note 14)
[0203] The object feature extraction method according to Supplementary Note 10, further including:
[0204] determining, by comparing features from time series of images within areas defined by the area information, an identical object between images of different points of time, and generating and outputting a tracking identifier identifying the identical object; and
[0205] grouping the primary feature calculated by the extracting the feature based on the area information, the resolution information, and the tracking identifier, estimating an original feature based on the primary feature acquired from an area having a higher resolution in a group, learning how a value of the estimated original feature varies with resolution, and feeding back a learning result to the extracting the feature.
[0206] (Supplementary Note 15)
[0207] An object tracking method performing matching between a first feature and a second feature each of which is extracted by the object feature extraction method according to any one of Supplementary Notes 9 to 14, the object tracking method including:
[0208] performing matching between a second feature and a first feature including first resolution information, the first feature read from feature storage means, and determining if objects are identical to each other in consideration of the first resolution information and the second resolution information, wherein
[0209] the first feature is a feature in an area of an object detected from a first image, includes the first resolution information, and is stored in the feature storage means, and
[0210] the second feature is a feature in an area of an object detected from a second image different from the first image, and includes second resolution information.
[0211] (Supplementary Note 16)
[0212] An intelligent imaging method including:
[0213] detecting an object from an image captured by an imaging unit, and generating area information and resolution information, the area information indicating an area where the object is present, the resolution information pertaining to resolution of the object; and
[0214] extracting, from the image within an area defined by the area information, a feature indicating a feature of the object in consideration of the resolution information.
[0215] (Supplementary Note 17)
[0216] A storage medium storing an object feature extraction program causing a computer to execute:
[0217] object detection processing of detecting an object from an image, and generating area information indicating an area where the object is present, and resolution information pertaining to resolution of the object; and
[0218] feature extraction processing of extracting, from the image within an area defined by the area information, a feature indicating a feature of the object in consideration of the resolution information.
[0219] (Supplementary Note 18)
[0220] The storage medium according to Supplementary Note 17, wherein
[0221] the feature extraction processing extracts, from the image within an area defined by the area information, a primary feature, and generates the feature indicating the feature of the object by separably adding the resolution information to the primary feature.
[0222] (Supplementary Note 19)
[0223] The storage medium according to Supplementary Note 17, wherein
[0224] the feature extraction processing generates the feature indicating the feature of the object by converting, based on the resolution information, a feature extracted from the image within the area defined by the area information.
[0225] (Supplementary Note 20)
[0226] The storage medium according to Supplementary Note 19, wherein
[0227] the feature extraction processing acquires a likelihood, based on the resolution information, with respect to the feature extracted from the image within the area defined by the area information, and generates the feature indicating the feature of the object, based on the acquired likelihood.
[0228] (Supplementary Note 21)
[0229] The storage medium according to any one of Supplementary Notes 17 to 20, wherein
[0230] the feature extraction processing makes the feature consist of likelihoods output by a discriminator for a plurality of subareas included in the image within the area defined by the area information, the discriminator being learned for each resolution indicated by the resolution information.
[0231] (Supplementary Note 22)
[0232] The storage medium according to Supplementary Note 18, the program further causing a computer to execute:
[0233] object tracking processing of determining, by comparing features from time series of images within areas defined by the area information, an identical object between images of different points of time, and generating and outputting a tracking identifier identifying the identical object; and
[0234] feature learning processing of grouping the primary feature calculated by the feature extraction processing based on the area information, the resolution information, and the tracking identifier, estimating an original feature based on the primary feature acquired from an area having a higher resolution in a group, learning how a value of the estimated original feature varies with resolution, and feeding back a learning result to the feature extraction processing.
[0235] (Supplementary Note 23)
[0236] A storage medium storing an object tracking program causing a third computer, the third computer being connected with a feature storage means and a second computer, the feature storage means being connected with a first computer, each of the first computer and the second computer executing the object feature extraction program stored in the storage medium according to any one of Supplementary Notes 17 to 22, to execute
[0237] object matching processing of performing matching between a second feature including second resolution information and a first feature including first resolution information, the first feature being read from the feature storage means, and determining whether objects are identical to each other in consideration of the first resolution information and the second resolution information, wherein
[0238] the first feature is a feature in an area of an object detected from a first image by the first computer and includes the first resolution information, and
[0239] the second feature is a feature in an area of an object detected by the second computer from a second image that is different from the first image, and includes the second resolution information.
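A non-limiting sketch of the resolution-aware matching of Supplementary Note 23 follows. The cosine-similarity metric and the minimum-resolution reliability rule are assumptions chosen for illustration, not the claimed determination method.

```python
# Illustrative sketch: match a stored first feature against an incoming
# second feature, discounting the score by a reliability derived from the
# two resolutions.
import math
from typing import List, Tuple

def reliability(res: int, full_res: int = 128) -> float:
    # Assumption: lower-resolution detections yield less reliable features.
    return min(res, full_res) / full_res

def match(first: Tuple[List[float], int],
          second: Tuple[List[float], int],
          threshold: float = 0.5) -> bool:
    (f1, r1), (f2, r2) = first, second
    dot = sum(a * b for a, b in zip(f1, f2))
    n1 = math.sqrt(sum(a * a for a in f1))
    n2 = math.sqrt(sum(b * b for b in f2))
    if n1 == 0.0 or n2 == 0.0:
        return False
    # Weight the cosine similarity by the weaker of the two reliabilities.
    score = (dot / (n1 * n2)) * min(reliability(r1), reliability(r2))
    return score >= threshold
```

Under this rule, two well-aligned features captured at low resolution are matched less confidently than the same pair captured at full resolution.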
[0240] (Supplementary Note 24)
[0241] A storage medium storing an intelligent imaging program causing a computer connected with an imaging unit to execute:
[0242] object detection processing of detecting an object from an image captured by the imaging unit, and generating area information and resolution information, the area information indicating an area where the object is present, the resolution information pertaining to resolution of the object; and
[0243] feature extraction processing of extracting, from the image within an area defined by the area information, a feature indicating a feature of the object in consideration of the resolution information.
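The detection-then-extraction pipeline of Supplementary Note 24 might be sketched, purely for illustration, as below. The toy detector, the choice of the smaller box side as the resolution measure, and the mean-intensity feature are all assumptions of this sketch.

```python
# Illustrative sketch: detect an object, report its area (bounding box)
# and a resolution figure, then extract a feature from that area.
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)

def detect(image: List[List[float]]) -> Tuple[Box, int]:
    """Toy detector: bounding box of all nonzero pixels."""
    ys = [y for y, row in enumerate(image) if any(row)]
    xs = [x for row in image for x, v in enumerate(row) if v]
    box = (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
    resolution = min(box[2], box[3])  # assumed resolution measure
    return box, resolution

def extract(image: List[List[float]], box: Box, resolution: int) -> List[float]:
    """Extract a feature from the image within the detected area,
    carrying the resolution information alongside it."""
    x, y, w, h = box
    patch = [row[x:x + w] for row in image[y:y + h]]
    mean = sum(sum(r) for r in patch) / (w * h)
    return [mean, float(resolution)]
```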
[0244] In the foregoing, the present invention is described with reference to the example embodiments. The invention of the present application, however, is not limited to the above-described example embodiments. The configuration and details of the present invention may be modified in various ways comprehensible to a person skilled in the art, within the scope of the invention of the present application.
[0245] This application claims priority based on Japanese Patent Application No. 2017-055913 filed on Mar. 22, 2017, the disclosure of which is incorporated herein in its entirety.
REFERENCE SIGNS LIST
[0246] 100 Object feature extraction device
[0247] 101 Object detection unit
[0248] 102 Feature extraction unit
[0249] 110 Image
[0250] 111 Area information
[0251] 112 Resolution information
[0252] 121 Object feature
[0253] 200 Object tracking system
[0254] 210 Camera
[0255] 210A Camera
[0256] 210B Camera
[0257] 220 Feature extraction device
[0258] 220 Object feature extraction device (unit)
[0259] 220A Object feature extraction unit
[0260] 220b Second feature
[0261] 220B Object feature extraction device (unit)
[0262] 230 Feature storage unit
[0263] 230a First feature
[0264] 240 Object matching unit
[0265] 250 Intelligent camera
[0266] 250A Intelligent camera
[0267] 401 Object detection unit
[0268] 402 Feature extraction unit
[0269] 421 Primary feature extraction unit
[0270] 422 Feature generation unit
[0271] 501 Resolution information separation unit
[0272] 502 Resolution information separation unit
[0273] 503 Reliability calculation unit
[0274] 504 Feature matching unit
[0275] 630 Network interface
[0276] 641 Captured image data
[0277] 642 Object detection result
[0278] 643 Resolution information
[0279] 644 Feature extraction table
[0280] 645 Table
[0281] 646 Object feature
[0282] 650 Storage
[0283] 651 Parameter
[0284] 652 Parameter
[0285] 653 For primary feature extraction
[0286] 654 For feature generation
[0287] 655 Object feature extraction program
[0288] 656 Object detection module
[0289] 657 Primary feature extraction module
[0290] 658 Feature generation module
[0291] 660 Input-output interface
[0292] 661 Camera control unit
[0293] 702 Image data
[0294] 703 Object detection information
[0295] 704 Feature information
[0296] 900 Matching table
[0297] 901 First object information
[0298] 902 Second object information
[0299] 903 Resolution information
[0300] 903 First resolution information
[0301] 904 Resolution information
[0302] 904 Second resolution information
[0303] 905 Reliability information
[0304] 906 Matching result
[0305] 1102 Feature extraction unit
[0306] 1120 Object feature extraction device (unit)
[0307] 1121 Feature discrimination unit
[0308] 1201 Feature matching unit
[0309] 1220 Object feature extraction device (unit)
[0310] 1240 Object matching unit
[0311] 1304 Feature matching unit
[0312] 1340 Object matching unit
[0313] 1402 Feature extraction unit
[0314] 1403 Object tracking unit
[0315] 1404 Feature learning unit
[0316] 1420 Feature extraction device
[0317] 1420 Object feature extraction device (unit)
[0318] 1422 Feature generation unit
[0319] 1500 Feature extraction table
[0320] 1502 Object tracking information
[0321] 1503 Training information
[0322] 1504 Feature learning information
[0323] 1703 Object tracking unit
[0324] 1704 Feature learning unit