METHOD AND SYSTEM FOR HEAD POSE ESTIMATION
20210165999 · 2021-06-03
Inventors
- Bruno Mirbach (Konz, DE)
- Frederic Garcia Becerro (Belvaux, LU)
- Jilliam Maria DIAZ BARROS (Kaiserslautern, DE)
CPC classification
G06V20/647 (PHYSICS)
G06V40/171 (PHYSICS)
Abstract
A method for head pose estimation using a monocular camera. The method includes: providing an initial image frame recorded by the camera showing a head; and performing at least one pose updating loop with the following steps: identifying and selecting a plurality of salient points of the head having 2D coordinates in the initial image frame within a region of interest; using a geometric head model of the head, determining 3D coordinates for the selected salient points corresponding to a head pose of the geometric head model; providing an updated image frame recorded by the camera showing the head; identifying within the updated image frame at least some previously selected salient points having updated 2D coordinates; updating the head pose by determining updated 3D coordinates corresponding to the updated 2D coordinates using a perspective-n-point method; and using the updated image frame as the initial image frame for the next pose updating loop.
Claims
1. A method for head pose estimation using a monocular camera, the method comprising: providing an initial image frame recorded by the camera showing a head; and performing at least one pose updating loop with the following steps: identifying and selecting a plurality of salient points of the head having 2D coordinates in the initial image frame within a region of interest; using a geometric head model of the head, determining 3D coordinates for the selected salient points corresponding to a head pose of the geometric head model; providing an updated image frame recorded by the camera showing the head; identifying within the updated image frame at least some previously selected salient points having updated 2D coordinates; updating the head pose by determining updated 3D coordinates corresponding to the updated 2D coordinates using a perspective-n-point method; and using the updated image frame as the initial image frame for the next pose updating loop.
2. The method of claim 1, wherein before performing the at least one pose updating loop, a distance between the camera and the head is determined.
3. The method of claim 1, wherein before performing the at least one pose updating loop, dimensions of the head model are determined.
4. The method of claim 1, wherein the head model is a cylindrical head model.
5. The method of claim 1, wherein a plurality of consecutive pose updating loops are performed.
6. The method of claim 1, wherein previously selected salient points are identified using optical flow.
7. The method of claim 1, wherein the 3D coordinates are determined by projecting 2D coordinates from an image plane of the camera onto a visible head surface.
8. The method of claim 7, wherein the visible head surface is determined by determining the intersection of a boundary plane with a model head surface.
9. The method of claim 8, wherein the boundary plane is parallel to an X-axis of the camera and a center axis of the cylindrical head model.
10. The method of claim 7, wherein the region of interest is defined by projecting the visible head surface onto the image plane.
11. The method of claim 1, wherein the salient points are selected based on an associated weight which depends on the distance to a border of the region of interest.
12. The method of claim 11, wherein the perspective-n-point method is performed based on the weight of the salient points.
13. The method of claim 1, wherein in each pose updating loop, the region of interest is updated.
14. A system for head pose estimation, comprising a monocular camera and a processing device, which is configured to: receive an initial image frame recorded by the camera showing a head; and perform at least one pose updating loop with the following steps: identifying and selecting a plurality of salient points of the head having 2D coordinates in the initial image frame within a region of interest; using a geometric head model of the head, determining 3D coordinates for the selected salient points corresponding to a head pose of the geometric head model; receiving an updated image frame recorded by the camera showing the head; identifying within the updated image frame at least some previously selected salient points having updated 2D coordinates; updating the head pose by determining updated 3D coordinates corresponding to the updated 2D coordinates using a perspective-n-point method; and using the updated image frame as the initial image frame for the next pose updating loop.
15. The system of claim 14, wherein the system is adapted to determine a distance between the camera and the head before performing the at least one pose updating loop.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0038] Further details and advantages of the present invention will be apparent from the following detailed description of non-limiting embodiments with reference to the attached drawings, wherein:
DESCRIPTION OF THE ILLUSTRATED EMBODIMENTS
[0045] Z.sub.eyes=f·δ.sub.mm/δ.sub.px,
with f being the focal length of the camera in pixels, δ.sub.px the estimated distance between the eyes' centers on the image frame I.sub.0, and δ.sub.mm the mean interpupillary distance, which corresponds to 64.7 mm for males and 62.3 mm for females according to anthropometric databases. As shown in
[0046] Z.sub.cam denotes the distance between the center of the CHM 20 and the camera 2 and is equal to the sum of Z.sub.eyes and the distance Z.sub.head from the center of the head 10 to the midpoint of the eyes' baseline. Z.sub.head is related to the radius r of the CHM by Z.sub.head=√{square root over (r.sup.2−(δ.sub.mm/2).sup.2)}. As shown in
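The initialization described above reduces to two pinhole-model relations: Z.sub.eyes=f·δ.sub.mm/δ.sub.px and Z.sub.head=√(r²−(δ.sub.mm/2)²), with Z.sub.cam their sum. A minimal numeric sketch follows; the focal length, pixel eye distance and model radius used here are illustrative assumptions, not values from the disclosure:

```python
import math

def estimate_camera_distance(f_px, delta_px, delta_mm, r_mm):
    """Estimate the distance between the camera and the CHM center.

    f_px     : focal length of the camera in pixels
    delta_px : distance between the eye centers in the image (pixels)
    delta_mm : mean interpupillary distance (mm)
    r_mm     : radius of the cylindrical head model (mm)
    """
    # Pinhole relation: distance from the camera to the eyes' midpoint.
    z_eyes = f_px * delta_mm / delta_px
    # Offset from the eyes' baseline midpoint to the cylinder center.
    z_head = math.sqrt(r_mm ** 2 - (delta_mm / 2.0) ** 2)
    return z_eyes + z_head

# Illustrative values: 800 px focal length, 60 px eye distance,
# male mean interpupillary distance, 90 mm model radius.
z_cam = estimate_camera_distance(800.0, 60.0, 64.7, 90.0)  # ~947 mm
```

The same two relations are used once per session; only the pose updating loop below runs per frame.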
in order to obtain the actual quantities in the 3D space. Given the 2D coordinates {p.sub.TL, p.sub.TR, p.sub.BL, p.sub.BR} of the top left, top right, bottom left and bottom right corners of the bounding box, the processing device 3 calculates the width w of the CHM 20 as
w=Z.sub.cam·∥p.sub.TR−p.sub.TL∥/f.
Similarly, the height h of the CHM 20 is calculated by
h=Z.sub.cam·∥p.sub.BL−p.sub.TL∥/f.
[0047] With Z.sub.cam determined (or estimated), the corners of the face bounding box in 3D space, i.e., {P.sub.TL, P.sub.TR, P.sub.BL, P.sub.BR} and the centers C.sub.T, C.sub.B of the top and bottom bases of the CHM 20 can be determined by projecting the corresponding 2D coordinates into 3D space and combining this with the information about Z.sub.cam.
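Back-projecting a 2D point into 3D space at the known depth Z.sub.cam follows the standard pinhole model: scale the normalized ray through the pixel so that its z-component equals the depth. A sketch, where the intrinsic matrix K and the corner coordinates are illustrative assumptions:

```python
import numpy as np

def backproject(pt_2d, z, K):
    """Back-project a pixel onto the plane at depth z (camera frame).

    pt_2d : (u, v) pixel coordinates
    z     : depth of the plane in front of the camera
    K     : 3x3 pinhole intrinsic matrix
    """
    uv1 = np.array([pt_2d[0], pt_2d[1], 1.0])
    ray = np.linalg.inv(K) @ uv1   # ray direction with unit z-component
    return z * ray                 # 3D point [X, Y, z]

# Illustrative intrinsics and one bounding-box corner at depth 950 mm.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
P_TL = backproject((280.0, 180.0), 950.0, K)  # -> [-47.5, -71.25, 950.0]
```

Applying this to the four corners and the top/bottom base centers yields the 3D quantities named in the paragraph above.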
[0048] The steps described so far can be regarded as part of an initialization process. Once this is done, the method continues with the steps referring to the actual head pose estimation, which will now be described with reference to
[0049] While
[0050] With the 2D coordinates p.sub.i of the selected salient points S known, corresponding 3D coordinates P.sub.i are determined (indicated by the white-on-black numeral 3 in
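Determining P.sub.i from p.sub.i amounts to intersecting the camera ray through the pixel with the model head surface (cf. claim 7). A sketch for an infinite cylinder with a vertical axis; the intrinsics and cylinder geometry below are illustrative assumptions:

```python
import numpy as np

def ray_cylinder_intersection(pixel, K, center, axis, radius):
    """Intersect the camera ray through `pixel` with a cylinder.

    The camera is at the origin; `center` is a point on the cylinder
    axis, `axis` a unit vector along it. Returns the nearer 3D hit
    point, or None if the ray misses the cylinder.
    """
    d = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    d = d / np.linalg.norm(d)                  # unit ray direction
    # Components of the ray direction and of (origin - center)
    # orthogonal to the cylinder axis.
    d_perp = d - np.dot(d, axis) * axis
    m = -center - np.dot(-center, axis) * axis
    a = np.dot(d_perp, d_perp)
    b = 2.0 * np.dot(d_perp, m)
    c = np.dot(m, m) - radius ** 2
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None                            # ray misses the cylinder
    t = (-b - np.sqrt(disc)) / (2.0 * a)       # nearer intersection
    return t * d

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
center = np.array([0.0, 0.0, 950.0])   # cylinder center on the optical axis
axis = np.array([0.0, 1.0, 0.0])       # vertical center axis
P = ray_cylinder_intersection((320.0, 240.0), K, center, axis, 90.0)
```

For the principal point the ray runs along the optical axis and hits the front of the cylinder at depth 950 − 90 = 860.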
[0051] In another step, an updated image frame I.sub.n+1, which has been recorded by the camera 2, is provided to the processing device 3 and at least some of the previously selected salient points S are identified within this updated image frame I.sub.n+1 (indicated by the white-on-black numeral 2 in
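Claim 6 identifies the previously selected salient points via optical flow. The following is a didactic single-window, single-iteration Lucas-Kanade step, not the pyramidal tracker a production implementation would likely use (e.g. OpenCV's calcOpticalFlowPyrLK); the test pattern is a synthetic assumption:

```python
import numpy as np

def lucas_kanade_point(I0, I1, pt, win=9):
    """Estimate the flow of one point from frame I0 to frame I1.

    Solves Ix*u + Iy*v = -It in the least-squares sense over a
    (win x win) window centered on the point.
    """
    x, y = pt
    r = win // 2
    # Central-difference spatial gradients and temporal difference.
    Ix = (np.roll(I0, -1, axis=1) - np.roll(I0, 1, axis=1)) / 2.0
    Iy = (np.roll(I0, -1, axis=0) - np.roll(I0, 1, axis=0)) / 2.0
    It = I1 - I0
    sl = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic frames: a smooth blob shifted one pixel to the right.
ys, xs = np.mgrid[0:64, 0:64]
I0 = np.exp(-((xs - 32.0) ** 2 + (ys - 32.0) ** 2) / 200.0)
I1 = np.roll(I0, 1, axis=1)
u, v = lucas_kanade_point(I0, I1, (22, 32), win=9)  # u near 1, v near 0
```

Points whose flow cannot be solved reliably (flat texture, occlusion) drop out of the tracked set, which is why the loop reselects salient points each iteration.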
[0052] In another step (indicated by the white-on-black numeral 4 in
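The disclosure updates the pose with a perspective-n-point method (cf. claim 1); a full PnP solver is beyond a short sketch (an implementation might call OpenCV's solvePnP). What follows instead is only the rigid-alignment core, the Kabsch algorithm, recovering the pose change between the previous 3D points and the updated 3D points — a plainly labeled stand-in, not the claimed PnP step itself:

```python
import numpy as np

def kabsch(P_prev, P_new):
    """Best-fit rotation R and translation t with P_new ~ P_prev @ R.T + t.

    P_prev, P_new : (N, 3) arrays of corresponding 3D points.
    """
    c_prev = P_prev.mean(axis=0)
    c_new = P_new.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (P_prev - c_prev).T @ (P_new - c_new)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection in the optimal orthogonal matrix.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_new - R @ c_prev
    return R, t

# Recover a known motion from synthetic correspondences.
rng = np.random.default_rng(0)
P_prev = rng.normal(size=(6, 3))
ang = 0.3
Rz = np.array([[np.cos(ang), -np.sin(ang), 0.0],
               [np.sin(ang),  np.cos(ang), 0.0],
               [0.0,          0.0,         1.0]])
t_true = np.array([1.0, -2.0, 3.0])
P_new = P_prev @ Rz.T + t_true
R_est, t_est = kabsch(P_prev, P_new)
```

Weighting the correspondences, as in claim 12, would replace the plain means and covariance with their weighted counterparts.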
[0053] In another step, the region of interest 30 is updated. In this embodiment, the region of interest 30 is defined by the projection of the visible head surface 22 of the CHM 20 onto the image. The visible head surface 22 in turn is defined by the intersection of the head surface 21 with a boundary plane 24. The boundary plane 24 has a normal vector resulting from the cross product of a vector parallel to the X-axis of the camera 2 and a vector parallel to the center axis 23 of the CHM 20. In other words, the boundary plane 24 is parallel to the X-axis and to the center axis 23 (see the white-on-black numeral 6 in
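The visibility test implied by the boundary plane can be sketched as follows: the plane's normal is the cross product of the camera X-axis with the cylinder's center-axis direction, and a surface point belongs to the visible head surface when it lies on the camera side of that plane. Axis orientation and positions below are illustrative assumptions:

```python
import numpy as np

def is_visible(point, center, axis):
    """True if `point` on the cylinder surface lies on the camera side
    of the boundary plane through `center`.

    The camera sits at the origin looking along +Z; `axis` is the unit
    center-axis direction of the cylindrical head model.
    """
    x_cam = np.array([1.0, 0.0, 0.0])
    n = np.cross(x_cam, axis)        # boundary-plane normal
    # Orient the normal toward the camera (the origin).
    if np.dot(n, -center) < 0.0:
        n = -n
    return bool(np.dot(point - center, n) > 0.0)

center = np.array([0.0, 0.0, 950.0])
axis = np.array([0.0, 1.0, 0.0])
front = center + np.array([0.0, 0.0, -90.0])  # faces the camera
back = center + np.array([0.0, 0.0, 90.0])    # faces away
```

Projecting only the points that pass this test back into the image yields the updated region of interest 30.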
[0054] The updated region of interest 30 again comprises non-facial regions such as the neck region 33, the head top region 34, the head side region 35, etc. In the next loop, salient points from at least one of these non-facial regions 33-35 may be selected. For example, the head side region 35 is now closer to the center of the region of interest 30, making it likely that a salient point from this region, e.g. a feature of an ear, will be selected.