HEIGHT ESTIMATION APPARATUS, HEIGHT ESTIMATION METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM STORING PROGRAM
20220395193 · 2022-12-15
Cpc classification
A61B5/107
HUMAN NECESSITIES
G06V40/103
PHYSICS
G06V40/23
PHYSICS
A61B5/1072
HUMAN NECESSITIES
G06V10/34
PHYSICS
International classification
A61B5/107
HUMAN NECESSITIES
Abstract
A height estimation apparatus (10) according to the present disclosure includes an acquisition unit (11) for acquiring a two-dimensional image obtained by capturing an animal, a detection unit (12) for detecting a two-dimensional skeletal structure of the animal based on the two-dimensional image acquired by the acquisition unit (11), and an estimation unit (13) for estimating a height of the animal in a three-dimensional real world based on the two-dimensional skeletal structure detected by the detection unit (12) and an imaging parameter of the two-dimensional image acquired by the acquisition unit (11).
Claims
1. A height estimation apparatus comprising: at least one memory storing instructions, and at least one processor configured to execute the instructions stored in the at least one memory to: acquire a two-dimensional image obtained by capturing an animal; detect a two-dimensional skeletal structure of the animal based on the acquired two-dimensional image; and estimate a height of the animal in a three-dimensional real world based on the detected two-dimensional skeletal structure and an imaging parameter of the two-dimensional image.
2. The height estimation apparatus according to claim 1, wherein the at least one processor is further configured to execute the instructions stored in the at least one memory to estimate the height based on a length of a bone in a two-dimensional image space included in the two-dimensional skeletal structure.
3. The height estimation apparatus according to claim 2, wherein the at least one processor is further configured to execute the instructions stored in the at least one memory to estimate the height based on a sum of the lengths of the bones from a foot to a head included in the two-dimensional skeletal structure.
4. The height estimation apparatus according to claim 2, wherein the at least one processor is further configured to execute the instructions stored in the at least one memory to estimate the height based on a two-dimensional skeleton model showing a relationship between the length of the bone and a length of a whole body of the animal in the two-dimensional image space.
5. The height estimation apparatus according to claim 4, wherein the at least one processor is further configured to execute the instructions stored in the at least one memory to estimate the height based on the two-dimensional skeleton model corresponding to an attribute of the animal.
6. The height estimation apparatus according to claim 1, wherein the at least one processor is further configured to execute the instructions stored in the at least one memory to estimate the height based on a tallest height from among a plurality of the heights obtained based on the plurality of bones in the two-dimensional skeletal structure.
7. The height estimation apparatus according to claim 1, wherein the at least one processor is further configured to execute the instructions stored in the at least one memory to estimate the height based on a three-dimensional skeleton model fitted to the two-dimensional skeletal structure based on the imaging parameter.
8. The height estimation apparatus according to claim 7, wherein the at least one processor is further configured to execute the instructions stored in the at least one memory to use a height of the fitted three-dimensional skeleton model as the estimated height.
9. A height estimation method comprising: acquiring a two-dimensional image obtained by capturing an animal; detecting a two-dimensional skeletal structure of the animal based on the acquired two-dimensional image; and estimating a height of the animal in a three-dimensional real world based on the detected two-dimensional skeletal structure and an imaging parameter of the two-dimensional image.
10. A non-transitory computer readable medium storing a program for causing a computer to execute processing of: acquiring a two-dimensional image obtained by capturing an animal; detecting a two-dimensional skeletal structure of the animal based on the acquired two-dimensional image; and estimating a height of the animal in a three-dimensional real world based on the detected two-dimensional skeletal structure and an imaging parameter of the two-dimensional image.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0013]
[0014]
[0015]
[0016]
[0017]
[0018]
[0019]
[0020]
[0021]
[0022]
[0023]
[0024]
[0025]
[0026]
[0027]
[0028]
[0029]
[0030]
[0031]
[0032]
DESCRIPTION OF EMBODIMENTS
[0033] Example embodiments will be described below with reference to the drawings. In each drawing, the same elements are denoted by the same reference signs, and the repeated description is omitted if necessary.
(Study Leading to Example Embodiments)
[0034] Recently, image recognition technology utilizing machine learning has been applied to various systems. As an example, a monitoring system for performing monitoring using images captured by a monitoring camera will be discussed.
[0035]
[0036] As shown in this example, there is a growing demand for easily obtaining attribute information such as the age, gender, and height of a person from images or videos of a monitoring camera. Among these attributes, the height is useful information for identifying individuals and distinguishing adults from children. For example, the attribute information is used in investigations as the characteristics of a criminal (e.g., in their 30s, male, 170 cm), in marketing as customer information, and in searches for a lost child as the characteristics of that child.
[0037] The inventors studied methods for recognizing the height of a person from an image and found that the related techniques cannot always recognize or estimate the height accurately. For example, when the whole body of a person appears in the image, the height can be estimated to some extent. However, the person in the image is not always upright, and the top of the head and the feet do not always appear in the image. In the case of a lost child in particular, there is a high possibility that the child is crouching down. In such cases, it is difficult to estimate the height.
[0038] Therefore, the inventors studied a method using a skeleton estimation technique by means of machine learning for estimating a height of a person. For example, in a skeleton estimation technique according to related art such as OpenPose disclosed in Non Patent Literature 1, a skeleton of a person is estimated by learning various patterns of annotated image data. In the following example embodiments, a height of a person can be accurately estimated by utilizing such a skeleton estimation technique.
[0039] The skeletal structure estimated by the skeleton estimation technique such as OpenPose is composed of “key points” which are characteristic points such as joints, and “bones, i.e., bone links” indicating links between the key points. Therefore, in the following example embodiments, the skeletal structure is described using the terms “key point” and “bone”, but unless otherwise specified, the “key point” corresponds to the “joint” of a person, and a “bone” corresponds to the “bone” of the person.
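The terminology above can be illustrated with a minimal Python sketch of a two-dimensional skeletal structure. The class names `KeyPoint` and `Bone`, the joint names, and the coordinates are hypothetical and chosen only for illustration; they are not the output format of OpenPose or of any other library.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyPoint:
    """A characteristic point such as a joint, in pixel coordinates."""
    name: str
    x: float
    y: float


@dataclass(frozen=True)
class Bone:
    """A link between two key points."""
    start: KeyPoint
    end: KeyPoint

    def length_px(self) -> float:
        # Length of the bone in the two-dimensional image space.
        return math.hypot(self.end.x - self.start.x, self.end.y - self.start.y)


# Hypothetical key points, for illustration only.
neck = KeyPoint("neck", 100.0, 50.0)
hip = KeyPoint("hip", 103.0, 54.0)
torso = Bone(neck, hip)
torso_length = torso.length_px()
```

Keeping the bone length as a method of the link makes it explicit that every length used in the following embodiments is measured in pixels, not in real-world units.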
Overview of Example Embodiments
[0040]
[0041] The acquisition unit 11 acquires a two-dimensional image obtained by capturing an animal such as a person. The detection unit 12 detects a two-dimensional skeletal structure of the animal based on the two-dimensional image acquired by the acquisition unit 11. The estimation unit 13 estimates the height of the animal in a three-dimensional real world based on the two-dimensional skeletal structure detected by the detection unit 12 and an imaging parameter of the two-dimensional image.
[0042] Thus, in the example embodiments, a two-dimensional skeletal structure of an animal such as a person is detected from a two-dimensional image, and a height of the animal in a real world is estimated based on the two-dimensional skeletal structure, whereby the height of the animal can be accurately estimated regardless of a posture of the animal.
First Example Embodiment
[0043] A first example embodiment will be described below with reference to the drawings.
[0044] As shown in
[0045] The storage unit 106 stores information and data necessary for the operation and processing of the height estimation apparatus 100. For example, the storage unit 106 may be a non-volatile memory such as a flash memory or a hard disk apparatus. The storage unit 106 stores images acquired by the image acquisition unit 101, images processed by the skeletal structure detection unit 102, data for machine learning, and so on. The storage unit 106 may be an external storage apparatus, including an external storage apparatus on a network. That is, the height estimation apparatus 100 may acquire necessary images, data for machine learning, and so on from the external storage apparatus.
[0046] The image acquisition unit 101 acquires a two-dimensional image captured by the camera 200 from the camera 200 which is connected to the height estimation apparatus 100 in a communicable manner. The camera 200 is an imaging unit such as a monitoring camera for capturing a person, and the image acquisition unit 101 acquires, from the camera 200, an image obtained by capturing the person.
[0047] The skeletal structure detection unit 102 detects a two-dimensional skeletal structure of the person in the image based on the acquired two-dimensional image. The skeletal structure detection unit 102 detects the skeletal structure of the person based on the characteristics such as joints of the person to be recognized using a skeleton estimation technique by means of machine learning. The skeletal structure detection unit 102 uses, for example, the skeleton estimation technique such as OpenPose of Non Patent Literature 1.
[0048] The height pixel count calculation unit 103 calculates, based on the detected two-dimensional skeletal structure, the height that the person would have in the two-dimensional image when standing upright. This height is referred to as a height pixel count. The height pixel count can be said to be the height of the person in the two-dimensional image, i.e., the length of the whole body of the person in a two-dimensional image space. The height pixel count calculation unit 103 obtains the height pixel count, i.e., a pixel count, from the length of each bone of the detected skeletal structure in the two-dimensional image space. In this example embodiment, the height pixel count is obtained by summing up the lengths of the respective bones from the head to the foot of the skeletal structure. When the skeleton estimation technique used by the skeletal structure detection unit 102 does not output the top of the head and the foot, the height pixel count may be corrected by multiplying it by a constant as necessary.
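The summation described above can be sketched as follows. This is a minimal illustration, not the implementation of the height pixel count calculation unit 103: the function name `height_pixel_count`, the `correction` parameter, and the key point coordinates are assumptions introduced here.

```python
import math


def height_pixel_count(chain, correction=1.0):
    """Sum the 2D lengths of consecutive bones along a head-to-foot chain.

    chain:      list of (x, y) key points ordered from head to foot (pixels).
    correction: optional constant applied when the detector does not output
                the top of the head or the foot (value is an assumption).
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(chain, chain[1:]):
        total += math.hypot(x1 - x0, y1 - y0)
    return total * correction


# Hypothetical key points: head, neck, hip, knee, foot (knee bent forward).
chain = [(0, 0), (0, 30), (0, 90), (30, 130), (30, 190)]
pixel_count = height_pixel_count(chain)  # 30 + 60 + 50 + 60 = 200.0
```

In this example the head-to-foot vertical extent is only 190 pixels because the knee is bent, whereas the sum of the bone lengths is 200 pixels, which is closer to the pixel height the person would have when standing upright.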
[0049] The camera parameter calculation unit 104 calculates camera parameters, which are imaging conditions of the camera 200, based on the image captured by the camera 200. The camera parameters are imaging parameters of the image and are parameters for converting the length in the two-dimensional image into the length in a three-dimensional real world. For example, the camera parameters include a posture, a position, an imaging angle, a focal length, and the like of the camera 200. An image of an object whose length is known in advance is captured by the camera 200, and then the camera parameters can be obtained from the image.
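As a simplified illustration of using an object of known length, the sketch below derives a single scale factor (real-world length per pixel) from the two end points of a reference object in the image. It assumes the object lies roughly parallel to the image plane and at the location of interest; it does not recover the full set of camera parameters (posture, position, imaging angle, focal length) mentioned above. The function name and the numbers are hypothetical.

```python
import math


def metres_per_pixel(known_length_m, end_a, end_b):
    """Real-world length represented by one pixel near a reference object.

    known_length_m: length of the reference object in metres.
    end_a, end_b:   (x, y) pixel coordinates of the object's two ends.
    """
    length_px = math.hypot(end_b[0] - end_a[0], end_b[1] - end_a[1])
    if length_px == 0:
        raise ValueError("reference object has zero length in the image")
    return known_length_m / length_px


# Hypothetical 2.0 m pole spanning 250 pixels in the image.
scale = metres_per_pixel(2.0, (400, 100), (400, 350))
```

A full calibration would normally use several reference measurements so that the position and orientation of the camera can be estimated as well; this sketch only shows the basic relationship between a known length and its pixel length.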
[0050] The height estimation unit 105 estimates the height of the person in the three-dimensional real world based on the calculated camera parameters and the height pixel count in the two-dimensional image. The height estimation unit 105 obtains, from the camera parameters, a relationship between the length of a pixel in the image and the corresponding length in the real world, and converts the height pixel count into the height of the person in the real world.
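The conversion can be sketched under a simple pinhole camera assumption, which is not stated in the text: for a person roughly parallel to the image plane at depth Z (metres) and a focal length f expressed in pixels, one pixel corresponds to about Z / f metres. The function names and all numeric values below are hypothetical.

```python
def metres_per_pixel_at_depth(depth_m, focal_length_px):
    """Approximate real-world length of one pixel at a given depth (pinhole model)."""
    if focal_length_px <= 0:
        raise ValueError("focal length must be positive")
    return depth_m / focal_length_px


def estimate_height_m(height_px, depth_m, focal_length_px):
    """Convert a height pixel count into a real-world height in metres."""
    return height_px * metres_per_pixel_at_depth(depth_m, focal_length_px)


# The same person appears half as tall in pixels when twice as far away,
# so the per-pixel length must depend on where the person is in the scene.
near = estimate_height_m(200.0, depth_m=4.0, focal_length_px=500.0)
far = estimate_height_m(100.0, depth_m=8.0, focal_length_px=500.0)
```

Both calls yield approximately the same real-world height, which illustrates why the scale has to be evaluated for the area where the person is present rather than once for the whole image.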
[0051]
[0052] As shown in
[0053] Next, the height estimation apparatus 100 detects the skeletal structure of the person based on the acquired image of the person (S202).
[0054] The skeletal structure detection unit 102 extracts, for example, characteristic points that can be the key points from the image, and detects each key point of the person by referring to information obtained by machine learning the image of the key point. In the example of
[0055]
[0056] Next, the height estimation apparatus 100 performs the height pixel count calculation processing based on the detected skeletal structure (S203). In the height pixel count calculation processing, as shown in
[0057] In the example of
[0058] In the example of
[0059] In the example of
[0060] In the meantime, as shown in
[0061] Next, the height estimation apparatus 100 estimates the height of the person based on the height pixel count and the camera parameters (S204). The height estimation unit 105 obtains, from the camera parameters, the length in the three-dimensional real world with respect to one pixel in an area where the person is present in the two-dimensional image, namely, the actual length of the pixel unit. In particular, since the length in the real world with respect to one pixel in the image varies depending on the location in the image, the “length in the real world per pixel in the area where the person is present” in the image is obtained. The height pixel count is converted into the height from the obtained actual length of the pixel unit. For example, in
[0062] As described above, in this example embodiment, the skeletal structure of the person is detected from the two-dimensional image, and the height pixel count is obtained by summing up the lengths of the bones of the detected skeletal structure in the two-dimensional image. Further, the height of the person in the real world is estimated in consideration of the camera parameters. Because the height can be obtained by summing the lengths of the bones from the head to the foot, the height can be estimated in a simple way. In addition, since it is sufficient for the skeleton estimation technique by means of machine learning to detect at least the skeleton from the head to the foot, the height can be estimated with high accuracy even when the whole body of the person does not necessarily appear in the image, such as when the person is crouching down.
Second Example Embodiment
[0063] Next, a second example embodiment will be described. In this example embodiment, in the height pixel count calculation processing according to the first example embodiment, the height pixel count is calculated using a human body model showing a relationship between a length of each bone and a length of a whole body, i.e., a height in the two-dimensional image space. The processing other than the height pixel count calculation processing is the same as that of the first example embodiment.
[0064]
[0065]
[0066] Next, the height pixel count calculation unit 103 calculates the height pixel count from the length of each bone based on the human body model (S302). The height pixel count calculation unit 103 obtains the height pixel count from the length of each bone with reference to the human body model 301 showing the relationship between each bone and the length of the whole body as shown in
[0067] The human body model to be referred to here is, for example, a human body model of an average person, but the human body model may be selected according to the attributes of the person such as age, gender, nationality, etc. For example, when a face of a person appears in the captured image, an attribute of the person is identified based on the face, and a human body model corresponding to the identified attribute is referred to. By referring to the information obtained by machine learning the face for each attribute, the attribute of the person can be recognized from the characteristics of the face of the image. When the attribute of the person cannot be identified from the image, a human body model of an average person may be used.
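The selection of a human body model by attribute, with a fallback to the average model, can be sketched as a simple lookup. The attribute keys and the ratio values below are placeholders invented for illustration; they are not anthropometric data and do not come from the disclosure.

```python
# Hypothetical models: ratio of a bone's length to the whole-body length.
# All numbers are placeholders, not measured values.
HUMAN_BODY_MODELS = {
    "average": {"lower_leg": 0.25, "upper_leg": 0.25},
    "child": {"lower_leg": 0.22, "upper_leg": 0.23},
}


def select_model(attribute):
    """Return the model for the identified attribute, or the average model.

    attribute: a key such as "child", or None when no attribute could be
               identified from the image (for example, no visible face).
    """
    return HUMAN_BODY_MODELS.get(attribute, HUMAN_BODY_MODELS["average"])


model_for_child = select_model("child")
model_when_unknown = select_model(None)
```

Falling back to the average model keeps the later steps unchanged regardless of whether attribute recognition succeeded.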
[0068] Next, the height pixel count calculation unit 103 calculates an optimum value of the height pixel count (S303). The height pixel count calculation unit 103 calculates the optimum value of the height pixel count from the height pixel count obtained for each bone. For example, as shown in
[0069] As described above, in this example embodiment, the height of the person in the real world is estimated by obtaining the height pixel count based on the bones of the detected skeletal structure using the human body model showing the relationship between the bones in the two-dimensional image space and the length of the whole body. In this way, even when the entire skeleton from the head to the foot cannot be acquired, the height can be estimated from some of the bones. In particular, by employing the larger value of the height, i.e., the larger height pixel count, from among the values obtained from a plurality of bones, the height can be accurately estimated.
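The per-bone estimation and the choice of the larger value can be sketched as follows. The ratio values in `model` are placeholders, and the function name `optimum_height_px` is an assumption. One plausible reading of why the larger value is preferred is that a bone viewed at an angle appears shorter in the image than it is, so it tends to underestimate the whole-body length.

```python
def optimum_height_px(bone_lengths_px, model):
    """Estimate the height pixel count from each bone and keep the largest.

    bone_lengths_px: bone name -> measured length in pixels.
    model:           bone name -> ratio of bone length to whole-body length.
    """
    estimates = [
        length / model[name]
        for name, length in bone_lengths_px.items()
        if name in model
    ]
    if not estimates:
        raise ValueError("no detected bone is covered by the model")
    return max(estimates)


# Placeholder ratios, not anthropometric data.
model = {"lower_leg": 0.25, "upper_leg": 0.25}
# Hypothetical detection: the upper leg is foreshortened in the image.
bones = {"lower_leg": 50.0, "upper_leg": 30.0}
best = optimum_height_px(bones, model)  # max(200.0, 120.0) = 200.0
```

Only the bones that were actually detected contribute, so the estimate remains available when part of the body is outside the image.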
Third Example Embodiment
[0070] Next, a third example embodiment will be described. In this example embodiment, instead of the height pixel count calculation processing and the height estimation processing according to the first example embodiment, a height in the real world is estimated by fitting a three-dimensional human body model to a two-dimensional skeletal structure. Other aspects are the same as those of the first example embodiment.
[0071]
[0072]
[0073] The three-dimensional human body model 402 prepared here may be a model in a state close to the posture of the two-dimensional skeletal structure 401 as shown in
[0074] Next, the height estimation unit 105 fits the three-dimensional human body model to the two-dimensional skeletal structure (S402). As shown in
[0075] Next, the height estimation unit 105 calculates the height of the fitted three-dimensional human body model (S403). As shown in
[0076] As described above, in this example embodiment, the three-dimensional human body model is fitted to the two-dimensional skeletal structure based on the camera parameters, and the height of the person in the real world is estimated based on the three-dimensional human body model. Specifically, the height of the fitted three-dimensional human body model is used as it is as the estimated height. In this manner, even when none of the bones faces the front in the image, that is, even when all bones are viewed diagonally and their lengths in the image differ greatly from their actual lengths, the height can be accurately estimated. When more than one of the methods according to the first to third example embodiments is applicable, all of the methods or a combination of them may be used to obtain the height. In this case, the value closest to the average height of a person may be used as the optimum value.
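The fitting summarized above can be sketched in a strongly simplified form. The sketch assumes a pinhole camera looking horizontally from a known height, a three-dimensional model whose posture and ground position are already fixed, and a grid search over the model's scale that minimizes the reprojection error. These assumptions, the template coordinates, and all function names are introduced here for illustration; the actual fitting procedure of the height estimation unit 105 may differ.

```python
# Unit-height template: joint -> (x, y, z) offset from the foot, y pointing up.
# Placeholder posture with the knee and hip shifted toward the camera.
TEMPLATE = {
    "foot": (0.0, 0.0, 0.0),
    "knee": (0.0, 0.25, -0.15),
    "hip": (0.0, 0.5, -0.05),
    "head": (0.0, 1.0, 0.0),
}


def project(point, focal_px, centre, cam_height_m):
    """Pinhole projection of a 3D point (camera looks along +z, no tilt)."""
    x, y, z = point
    u = centre[0] + focal_px * x / z
    v = centre[1] + focal_px * (cam_height_m - y) / z
    return u, v


def place(template, scale, foot_xz):
    """Scale the template and put its foot at ground position (x, z)."""
    return {
        name: (foot_xz[0] + scale * x, scale * y, foot_xz[1] + scale * z)
        for name, (x, y, z) in template.items()
    }


def reprojection_error(scale, template, foot_xz, observed, cam):
    """Sum of squared pixel distances between projected and observed joints."""
    placed = place(template, scale, foot_xz)
    error = 0.0
    for name, (u_obs, v_obs) in observed.items():
        u, v = project(placed[name], *cam)
        error += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return error


def fit_height(template, foot_xz, observed, cam, lo=0.5, hi=2.5, step=0.001):
    """Grid-search the scale; with a unit-height template, scale = height (m)."""
    count = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(count + 1)]
    return min(
        candidates,
        key=lambda s: reprojection_error(s, template, foot_xz, observed, cam),
    )


# Synthetic check: project a 1.7 m model, then recover its height.
cam = (800.0, (640.0, 360.0), 2.5)  # focal length (px), image centre, camera height (m)
foot_xz = (0.5, 6.0)
observed = {
    name: project(p, *cam) for name, p in place(TEMPLATE, 1.7, foot_xz).items()
}
estimated = fit_height(TEMPLATE, foot_xz, observed, cam)  # close to 1.7
```

Because the model is compared with the observations after projection, bones that point toward or away from the camera are handled by the projection itself rather than by their shortened lengths in the image.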
[0077] Note that each of the configurations in the above-described example embodiments is constituted by hardware and/or software, and may be constituted by one piece of hardware or software, or may be constituted by a plurality of pieces of hardware or software. The functions and processing of the height estimation apparatuses 10 and 100 may be implemented by a computer 20 including a processor 21 such as a Central Processing Unit (CPU) and a memory 22 which is a storage device, as shown in
[0078] These programs can be stored and provided to a computer using any type of non-transitory computer readable media. Non-transitory computer readable media include any type of tangible storage media. Examples of non-transitory computer readable media include magnetic storage media (such as floppy disks, magnetic tapes, hard disk drives, etc.), magneto-optical storage media (e.g., magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memories (such as mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (Random Access Memory), etc.). The program may also be provided to a computer using any type of transitory computer readable media. Examples of transitory computer readable media include electric signals, optical signals, and electromagnetic waves. Transitory computer readable media can provide the program to a computer via a wired communication line (e.g., electric wires and optical fibers) or a wireless communication line.
[0079] Further, the present disclosure is not limited to the above-described example embodiments and may be modified as appropriate without departing from the purpose thereof. For example, although a height of a person is estimated in the above description, a height of an animal other than a person having a skeletal structure such as mammals, reptiles, birds, amphibians, fish, etc. may be estimated.
[0080] Although the present disclosure has been described above with reference to the example embodiments, the present disclosure is not limited to the example embodiments described above. The configurations and details of the present disclosure may be modified in various ways that would be understood by those skilled in the art within the scope of the present disclosure.
[0081] The whole or part of the example embodiments disclosed above can be described as, but not limited to, the following supplementary notes.
(Supplementary Note 1)
[0082] A height estimation apparatus comprising:
[0083] acquisition means for acquiring a two-dimensional image obtained by capturing an animal;
[0084] detection means for detecting a two-dimensional skeletal structure of the animal based on the acquired two-dimensional image; and
[0085] estimation means for estimating a height of the animal in a three-dimensional real world based on the detected two-dimensional skeletal structure and an imaging parameter of the two-dimensional image.
(Supplementary Note 2)
[0086] The height estimation apparatus according to Supplementary note 1, wherein
[0087] the estimation means estimates the height based on a length of a bone in a two-dimensional image space included in the two-dimensional skeletal structure.
(Supplementary Note 3)
[0088] The height estimation apparatus according to Supplementary note 2, wherein
[0089] the estimation means estimates the height based on a sum of the lengths of the bones from a foot to a head included in the two-dimensional skeletal structure.
(Supplementary Note 4)
[0090] The height estimation apparatus according to Supplementary note 2, wherein
[0091] the estimation means estimates the height based on a two-dimensional skeleton model showing a relationship between the length of the bone and a length of a whole body of the animal in the two-dimensional image space.
(Supplementary Note 5)
[0092] The height estimation apparatus according to Supplementary note 4, wherein
[0093] the estimation means estimates the height based on the two-dimensional skeleton model corresponding to an attribute of the animal.
(Supplementary Note 6)
[0094] The height estimation apparatus according to Supplementary note 4 or 5, wherein
[0095] the estimation means estimates the height based on a tallest height from among a plurality of the heights obtained based on the plurality of bones in the two-dimensional skeletal structure.
(Supplementary Note 7)
[0096] The height estimation apparatus according to Supplementary note 1, wherein
[0097] the estimation means estimates the height based on a three-dimensional skeleton model fitted to the two-dimensional skeletal structure based on the imaging parameter.
(Supplementary Note 8)
[0098] The height estimation apparatus according to Supplementary note 7, wherein
[0099] the estimation means uses a height of the fitted three-dimensional skeleton model as the estimated height.
(Supplementary Note 9)
[0100] A height estimation method comprising:
[0101] acquiring a two-dimensional image obtained by capturing an animal;
[0102] detecting a two-dimensional skeletal structure of the animal based on the acquired two-dimensional image; and
[0103] estimating a height of the animal in a three-dimensional real world based on the detected two-dimensional skeletal structure and an imaging parameter of the two-dimensional image.
(Supplementary Note 10)
[0104] The height estimation method according to Supplementary note 9, wherein
[0105] in the estimation of the height, the height is estimated based on a length of a bone in a two-dimensional image space included in the two-dimensional skeletal structure.
(Supplementary Note 11)
[0106] A height estimation program for causing a computer to execute processing of:
[0107] acquiring a two-dimensional image obtained by capturing an animal;
[0108] detecting a two-dimensional skeletal structure of the animal based on the acquired two-dimensional image; and
[0109] estimating a height of the animal in a three-dimensional real world based on the detected two-dimensional skeletal structure and an imaging parameter of the two-dimensional image.
(Supplementary Note 12)
[0110] The height estimation program according to Supplementary note 11, wherein
[0111] in the estimation of the height, the height is estimated based on a length of a bone in a two-dimensional image space included in the two-dimensional skeletal structure.
(Supplementary Note 13)
[0112] A height estimation system comprising:
[0113] a camera; and
[0114] a height estimation apparatus, wherein the height estimation apparatus comprises:
[0115] acquisition means for acquiring, from the camera, a two-dimensional image obtained by capturing an animal;
[0116] detection means for detecting a two-dimensional skeletal structure of the animal based on the acquired two-dimensional image; and
[0117] estimation means for estimating a height of the animal in a three-dimensional real world based on the detected two-dimensional skeletal structure and an imaging parameter of the two-dimensional image.
(Supplementary Note 14)
[0118] The height estimation system according to Supplementary note 13, wherein
[0119] the estimation means estimates the height based on a length of a bone in a two-dimensional image space included in the two-dimensional skeletal structure.
REFERENCE SIGNS LIST
[0120] 1 HEIGHT ESTIMATION SYSTEM
[0121] 10 HEIGHT ESTIMATION APPARATUS
[0122] 11 ACQUISITION UNIT
[0123] 12 DETECTION UNIT
[0124] 13 ESTIMATION UNIT
[0125] 20 COMPUTER
[0126] 21 PROCESSOR
[0127] 22 MEMORY
[0128] 100 HEIGHT ESTIMATION APPARATUS
[0129] 101 IMAGE ACQUISITION UNIT
[0130] 102 SKELETAL STRUCTURE DETECTION UNIT
[0131] 103 HEIGHT PIXEL COUNT CALCULATION UNIT
[0132] 104 CAMERA PARAMETER CALCULATION UNIT
[0133] 105 HEIGHT ESTIMATION UNIT
[0134] 106 STORAGE UNIT
[0135] 200 CAMERA
[0136] 300, 301 HUMAN BODY MODEL
[0137] 401 TWO-DIMENSIONAL SKELETAL STRUCTURE
[0138] 402 THREE-DIMENSIONAL HUMAN BODY MODEL