A CALIBRATION METHOD FOR A RECORDING DEVICE AND A METHOD FOR AN AUTOMATIC SETUP OF A MULTI-CAMERA SYSTEM
20210385426 · 2021-12-09
Inventors
- Simon EBNER (St. Gallen, CH)
- Nebojsa ANDELKOVIC (St. Gallen, CH)
- Fang-Lin HE (St. Gallen, CH)
- Martin AFFOLTER (St. Gallen, CH)
- Vuk ILIC (St. Gallen, CH)
- Mohammad Seyed ALIVI, I (Zürich, CH)
Cpc classification
G06V20/30
PHYSICS
G06V20/52
PHYSICS
Abstract
A calibration method (1) for a recording device (2). The method includes receiving, with a data interface (11) of a data processing system (3), a first data set (21) with an image and three dimensional information generated by the recording device (2). At least one person (4), within a field of view (8) of the recording device (2), is detected with an object detection component (12). Two or more attributes (5), of the person (4), are determined by an attribute assignment component (13). The attributes include an interest factor for an object and a location of the person (4). A descriptor (9) is generated based on the determined attributes (5). The data processing system calculates an attention model with a discretized space within the field of view based on the descriptor(s). The attention model is configured to predict a probability of a person showing interest for the object.
Claims
1. A calibration method for a recording device (2), said method including the steps of:
Receiving a first data set (21) with a data interface (11) of a data processing system (3), the data set comprising an image (6) and three dimensional information (7) generated by the recording device (2) at a first point in time, the recording device (2) having a field of view (8);
Detecting with an object detection component (12) of the data processing system (3) in the image at least one person (4) within the field of view (8);
Determining with an attribute assignment component (13) of the data processing system (3) two or more attributes (5) of the at least one person (4) from the first data set (21), wherein the attributes (5) include an interest factor for an object, in particular for a monitor (9), and a three dimensional location (14) of the at least one person (4);
Generating with the data processing system (3) a descriptor (9) for each person based on at least the determined attributes (5) of the at least one person (4);
Calculating an attention model (19) with a discretized space within the field of view (8) based on the descriptor(s) (9) with the data processing system (3), wherein the attention model (19) is configured to predict a probability of a person showing interest for the object.
2-25. (Canceled)
Description
[0067] Non-limiting embodiments of the invention are described, by way of example only, with respect to the accompanying drawings.
[0085] The recording device 2 is realized as a stereo camera. The stereo camera is adapted to record two RGB images. The three dimensional information is reconstructed with a multiple view geometry algorithm on a processing unit from the two images. The camera may have an integrated processing unit for the reconstruction. The data processing hardware may alternatively reconstruct the three dimensional information. Thereby, three dimensional information realized as three dimensional cloud points of the field of view is obtained. The recorded data is sent as data sets including image data and the recorded three dimensional cloud points to the data interface 11. The data processing hardware system 3 further includes an object detection component 12, an attribute assignment component 13, a movement tracker component 15, an object matching component 16 and an electronic memory 26 for storing an attention model 19 and a motion model 18. With these components, the recording device 2 calculates instructions for the monitor 14 (explained in detail with reference to
[0086] The instructions cause the monitor 14 to play a specific content. The content may be selected from a content library. The content is usually a video, which is displayed on the monitor, together with the audio belonging to the video.
[0088] The body skeleton detection allows tracking of an orientation of the person. In particular, a head pose and an orientation of the body and the face indicate an interest of the person in a particular object. The head pose might point entirely or partly in the direction of the object. Based on the head pose and the three dimensional location of the person, the head frontal direction (yaw axis) is projected. If the head frontal direction points towards the screen, the person is counted as having an interest in the screen at that particular location and moment. Subsequently, it is calculated for how long the person shows interest. Depending on this duration, a factor is calculated which expresses the interest of the person in the object. Additionally, the object detection component 12 may detect the eyes and track pupils.
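The interest determination described above may be illustrated by the following minimal sketch. It is not taken from the disclosure: the function names, the ground-plane convention (yaw measured from the +z axis towards the +x axis), the angular tolerance and the normalization by `max_seconds` are all assumptions made for illustration only.

```python
import math


def faces_screen(person_xyz, yaw_deg, screen_xyz, tolerance_deg=30.0):
    """Return True if the head frontal direction points towards the screen.

    Assumption: only the ground plane (x, z) is considered, and yaw_deg is
    measured from the +z axis towards the +x axis.
    """
    dx = screen_xyz[0] - person_xyz[0]
    dz = screen_xyz[2] - person_xyz[2]
    bearing = math.degrees(math.atan2(dx, dz))  # direction person -> screen
    # Wrap the angular difference into the range [-180, 180).
    diff = (yaw_deg - bearing + 180.0) % 360.0 - 180.0
    return abs(diff) <= tolerance_deg


def interest_factor(observations, screen_xyz, frame_dt, max_seconds=5.0):
    """Map the time a person faces the screen to a factor in [0, 1].

    observations: one (xyz, yaw_deg) tuple per frame for a single person.
    frame_dt: time between frames in seconds (hypothetical parameter).
    """
    attentive = sum(
        frame_dt
        for xyz, yaw in observations
        if faces_screen(xyz, yaw, screen_xyz)
    )
    return min(attentive / max_seconds, 1.0)
```

In this sketch the factor saturates at 1 once the person has faced the screen for `max_seconds`; any other monotonic mapping of the duration would serve equally well.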
[0089] The head pose may be determined by the attribute assignment component. Additionally, body skeleton tracking may determine the head pose as well. The combination of both can improve the accuracy of the determination of the head pose.
[0090] The detected person is forwarded to the attribute assignment component 13. The attribute assignment component 13 assigns the current location 20 (see
[0091] As a result, the data processing hardware system 3 can calculate 40 an attention model 19. The attention model 19 is based on a discretized space of the field of view 8. The data processing hardware system 3 calculates 40 the discretized space 3 and the interest factor assigned to the location 20 (see
[0092] This process is repeated for each person 4 detected in the field of view 8. Thereby, the discretized space of the attention model is filled with interest factors allowing a prediction over the entire field of view. The attention model may be used to calculate areas of different levels of interest.
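By way of illustration only, filling a discretized space with interest factors could be sketched as follows. The class name, the square ground-plane cells, the cell size and the use of the mean interest factor per cell as the predicted value are assumptions of this sketch and are not specified in the disclosure.

```python
import math
from collections import defaultdict


class AttentionModel:
    """Ground-plane grid that accumulates interest factors per cell."""

    def __init__(self, cell_size=0.5):
        self.cell_size = cell_size          # edge length of a cell (assumed metres)
        self._sum = defaultdict(float)      # summed interest factors per cell
        self._count = defaultdict(int)      # number of observations per cell

    def _cell(self, location):
        x, _, z = location                  # height (y) is ignored in this sketch
        return (math.floor(x / self.cell_size), math.floor(z / self.cell_size))

    def add(self, location, interest):
        """Record the interest factor observed at a 3D location."""
        cell = self._cell(location)
        self._sum[cell] += interest
        self._count[cell] += 1

    def predict(self, location, default=0.0):
        """Return the mean interest factor of the cell containing location."""
        cell = self._cell(location)
        if self._count[cell] == 0:
            return default                  # no person observed in this cell yet
        return self._sum[cell] / self._count[cell]
```

Calling `add` once per detected person and frame fills the grid; `predict` can then be evaluated for any location within the field of view, including cells with differing levels of interest.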
[0093] Such levels of interest are shown in a top view in
[0095] The attention model 19 thus includes discretized spaces 37 with a higher interest factor and discretized spaces 36, 37 with lower interest factors. This is indicated in
[0097] Then, the object matching component 16 compares the attributes between the persons. If sufficient attributes match, the object matching component 16 matches the person and identifies them as a matched person 17 in two different frames. Typically, the matched person 17 will have moved in between the frames. With the different positions provided by the three dimensional cloud points the movement tracker component 15 can determine a trajectory 24 of the person (see
[0098] The trajectories 24 of two persons passing through the field of view are shown in
[0104] Thus, in the overlapping region 138 the person 104 may be detected in the data generated by both recording devices 131 and 132. The recording devices 131, 132 generate RGB image data 106 (see
[0105] Each camera 131, 132 has its own coordinate system. The respective recording device 131 or 132 is at the origin of its coordinate system.
[0106] Since each camera has an aperture angle 135, 136 and three dimensional information, each camera can determine the coordinates of all cloud points in its coordinate space.
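As a rough illustration of how an aperture angle and depth information yield coordinates in the camera coordinate system, the following sketch back-projects a pixel with known depth. It assumes an ideal pinhole model with square pixels, a principal point at the image centre and a horizontal aperture angle; none of these assumptions, nor the function names, are stated in the disclosure.

```python
import math


def focal_length_px(image_width_px, aperture_angle_deg):
    """Focal length in pixels derived from the horizontal aperture angle."""
    half_angle = math.radians(aperture_angle_deg) / 2.0
    return (image_width_px / 2.0) / math.tan(half_angle)


def pixel_to_camera(u, v, depth, width, height, aperture_angle_deg):
    """Back-project pixel (u, v) with known depth into camera coordinates.

    The camera sits at the origin and looks along the +z axis.
    """
    f = focal_length_px(width, aperture_angle_deg)
    cx, cy = width / 2.0, height / 2.0      # assumed principal point
    x = (u - cx) * depth / f
    y = (v - cy) * depth / f
    return (x, y, depth)
```

A real stereo camera would normally supply calibrated intrinsics instead of values derived from the aperture angle; the sketch only shows the geometric relationship.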
[0107] The flowchart shown in
[0108] However, it is preferred that the data processing hardware system 103 is realized as a computing module that is installed on-site.
[0109] The RGB image data 106 and the three dimensional cloud points 107 are sent as data sets from the recording devices 131, 132 to the data processing hardware system 103. The data processing hardware system 103 receives the data sets 121, 122 at an interface 111 and forwards the RGB image data 106 to an object detection component 112. Optionally, the three dimensional cloud points 107 may also be forwarded to the object detection component 112. The object detection component 112 detects a person 104 in the image data 106 based on attributes. In particular, the object detection component 112 may identify attributes characteristic for persons, e.g. legs, arms, a torso, a head or similar. Further, the object detection component identifies attributes that are characteristic for a person.
[0110] The object and the attributes are then sent to an attribute assignment component 113, where the attributes as well as the current position identified by the three dimensional cloud points 107 belonging to the identified object are assigned to each person. This information is then aggregated in a descriptor 109.
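The aggregation of attributes and position into a descriptor may, for example, be sketched as follows. The `Descriptor` fields, the attribute names in the usage note and the choice of the centroid of the associated cloud points as the current position are assumptions for illustration and are not prescribed by the description.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    """Aggregated information about one detected person (hypothetical layout)."""
    person_id: int
    attributes: dict      # e.g. {"torso_color": "red"}; names are illustrative
    position: tuple       # (x, y, z) in the coordinate system of the camera


def centroid(cloud_points):
    """Mean of the 3D cloud points belonging to the detected person."""
    n = len(cloud_points)
    return tuple(sum(p[i] for p in cloud_points) / n for i in range(3))


def build_descriptor(person_id, attributes, cloud_points):
    """Assign attributes and current position to a person."""
    return Descriptor(person_id, dict(attributes), centroid(cloud_points))
```

For instance, `build_descriptor(1, {"torso_color": "red"}, [(0, 0, 2), (2, 0, 4)])` yields a descriptor whose position is the midpoint of the two cloud points.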
[0111] The data processing hardware system 103 receives a data set 121 with RGB image data and three dimensional cloud points from the first recording device 131 and a second data set 122 with RGB image data and three dimensional cloud points from the second recording device 132. Both data sets 121, 122 are analyzed in the way outlined above. The data sets 121 of the first recording device 131 and the data sets 122 of the second recording device 132 are analyzed independently and in each data set objects are detected and persons are identified.
[0112] Persons 104 that are located in the overlapping region 138 will be identified in both data sets 121, 122. An object matching component 116 compares the attributes in the descriptors 109 and thereby identifies identical persons in the overlapping region 138. The identification of a person 104 in the overlapping region 138 allows the calculation 119 of a coordinate transformation matrix 127. A plurality (in particular at least 4) of three dimensional cloud points is associated with the person 104. The three dimensional cloud points are determined by the first and the second recording devices 131, 132 independently.
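The comparison of attributes in the descriptors can be illustrated by a simple greedy matching. The scoring rule (fraction of shared attributes that are equal), the threshold of 0.75 and the function names are assumptions of this sketch; the description does not specify how many matching attributes are sufficient.

```python
def match_score(attrs_a, attrs_b):
    """Fraction of attributes present in both descriptors that are equal."""
    shared = set(attrs_a) & set(attrs_b)
    if not shared:
        return 0.0
    return sum(attrs_a[k] == attrs_b[k] for k in shared) / len(shared)


def match_persons(descriptors_a, descriptors_b, threshold=0.75):
    """Greedily pair persons seen by two recording devices.

    descriptors_a, descriptors_b: {person_id: attribute dict} per device.
    Returns a list of (id_in_a, id_in_b) pairs, best matches first.
    """
    candidates = sorted(
        (
            (match_score(a, b), id_a, id_b)
            for id_a, a in descriptors_a.items()
            for id_b, b in descriptors_b.items()
        ),
        reverse=True,
    )
    used_a, used_b, pairs = set(), set(), []
    for score, id_a, id_b in candidates:
        if score < threshold:
            break                       # remaining candidates score even lower
        if id_a in used_a or id_b in used_b:
            continue                    # each person is matched at most once
        used_a.add(id_a)
        used_b.add(id_b)
        pairs.append((id_a, id_b))
    return pairs
```

The greedy strategy is chosen here for brevity; an optimal assignment method could be substituted when many persons are present in the overlapping region.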
[0113] The data processing hardware system 103 determines the position for the detected person in the coordinate system of the first camera 131 and in the coordinate system of the second camera 132.
[0114] In a variant, the data processing hardware system 103 may obtain the position of one or more body parts in the data sets 121, 122 and use the positions to calculate a coordinate transformation matrix for transforming the coordinates of the first camera coordinate system into the second camera coordinate system.
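One possible way to obtain such a matrix from corresponding points is a least-squares fit, sketched below in plain Python. The sketch estimates a general 3x4 affine matrix from at least four non-coplanar point pairs and does not constrain the result to a pure rotation and translation; this choice, like the function names, is an assumption and not necessarily the calculation used in the described system.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration (e.g. coplanar)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_transform(src, dst):
    """Least-squares 3x4 matrix T such that T @ [x, y, z, 1] approximates dst."""
    if len(src) != len(dst) or len(src) < 4:
        raise ValueError("need at least 4 corresponding points")
    rows = [list(p) + [1.0] for p in src]           # homogeneous source points
    # Normal equations (A^T A) t = A^T b, solved once per output coordinate.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    transform = []
    for k in range(3):
        atb = [sum(r[i] * q[k] for r, q in zip(rows, dst)) for i in range(4)]
        transform.append(solve_linear(ata, atb))
    return transform


def apply_transform(transform, point):
    """Map a point from the first into the second camera coordinate system."""
    h = list(point) + [1.0]
    return tuple(sum(c * v for c, v in zip(row, h)) for row in transform)
```

Here `src` would hold the positions of the matched person (or body parts) in the first camera coordinate system and `dst` the corresponding positions in the second; with noisy measurements, more than four pairs improve the estimate.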
[0116] This results in the reconstruction of the trajectory through the overlapping area 138 as can be seen from
[0117] Though any body part might be suitable, the three dimensional neck pose 142 is a particularly preferred tracking point for the persons 104 and 140 (see
[0118] This plane allows transforming the coordinates further into a coordinate system that allows a bird view. Such a coordinate system and its transformation are shown in