A CALIBRATION METHOD FOR A RECORDING DEVICE AND A METHOD FOR AN AUTOMATIC SETUP OF A MULTI-CAMERA SYSTEM

20210385426 · 2021-12-09


    Abstract

    A calibration method (1) for a recording device (2). The method includes receiving, with a data interface (11) of a data processing system (3), a first data set (21) with an image and three dimensional information generated by the recording device (2). At least one person (4), within a field of view (8) of the recording device (2), is detected with an object detection component (12). Two or more attributes (5), of the person (4), are determined by an attribute assignment component (13). The attributes include an interest factor for an object and a location of the person (4). A descriptor (9) is generated based on the determined attributes (5). The data processing system calculates an attention model with a discretized space within the field of view based on the descriptor(s). The attention model is configured to predict a probability of a person showing interest for the object.

    Claims

    1. A calibration method for a recording device (2), said method including the steps of: Receiving a first data set (21) with a data interface (11) of a data processing system (3), the data set comprising an image (6) and three dimensional information (7) generated by the recording device (2) at a first point in time, the recording device (2) having a field of view (8); Detecting with an object detection component (12) of the data processing system (3) in the image at least one person (4) within the field of view (8); Determining with an attribute assignment component (13) of the data processing system (3) two or more attributes (5) of the at least one person (4) from the first data set (21), wherein the attributes (5) include an interest factor for an object, in particular for a monitor (9), and a three dimensional location (14) of the at least one person (4); Generating with the data processing system (3) a descriptor (9) for each person based on at least the determined attributes (5) of the at least one person (4); Calculating an attention model (19) with a discretized space within the field of view (8) based on the descriptor(s) (9) with the data processing system (3), wherein the attention model (19) is configured to predict a probability of a person showing interest for the object.

    2-25. (Canceled)

    Description

    [0067] Non-limiting embodiments of the invention are described, by way of example only, with respect to the accompanying drawings, in which:

    [0068] FIG. 1: a schematic drawing of a data processing hardware system according to the invention,

    [0069] FIG. 2A: a flowchart of a part of a method according to the invention,

    [0070] FIG. 2B: a flowchart of the method according to the invention,

    [0071] FIG. 3: a top view of a recording device with persons in its field of view and their interest factor,

    [0072] FIG. 4: a top view of a recording device with two persons moving through the field of view,

    [0073] FIGS. 5A and 5B: a further top view of the recording device, wherein an object blocks a trajectory of persons moving through the field of view,

    [0074] FIG. 6: a series of top views of the recording device through time,

    [0075] FIG. 7: a top view of two recording devices with their fields of view,

    [0076] FIG. 8: a side view of the recording devices of FIG. 7,

    [0077] FIG. 9A: a flowchart of a method to determine a transformation matrix,

    [0078] FIG. 9B: a schematic drawing of another data processing hardware system according to the invention,

    [0079] FIG. 10: a top view of the recording devices of FIG. 7 in individualized form,

    [0080] FIG. 11: a second aspect of the recording devices as shown in FIG. 10,

    [0081] FIG. 12: another top view of the recording devices of FIG. 7, with details regarding a tracking of trajectories,

    [0082] FIGS. 13A and 13B: a side view of a recording device with multiple persons whose neck pose is detected and

    [0083] FIG. 14: a coordinate transformation.

    [0084] FIG. 1 shows a data processing hardware system 3 according to the invention. The data processing hardware system 3 comprises a data interface 11. Via the data interface 11, the data processing hardware system 3 can receive and send information. The data interface 11 is connected wirelessly or by wire to a recording device 2 and receives data sets from the recording device 2. Further, the data interface 11 is connected to a monitor 14. The data processing hardware system 3 sends instructions to the monitor 14 via the data interface 11. The instructions are based on data of the recording device 2.

    [0085] The recording device 2 is realized as a stereo camera. The stereo camera is adapted to record two RGB images. The three dimensional information is reconstructed from the two images with a multiple view geometry algorithm on a processing unit. The camera may have an integrated processing unit for the reconstruction. Alternatively, the data processing hardware system 3 may reconstruct the three dimensional information. Thereby, three dimensional information realized as three dimensional cloud points of the field of view is obtained. The recorded data is sent as data sets including image data and the recorded three dimensional cloud points to the data interface 11. The data processing hardware system 3 further includes an object detection component 12, an attribute assignment component 13, a movement tracker component 15, an object matching component 16 and an electronic memory 26 for storing an attention model 19 and a motion model 18. With these components, the data processing hardware system 3 calculates instructions for the monitor 14 (explained in detail with reference to FIGS. 2A and 2B).

    [0086] The instructions cause the monitor 14 to play a specific content. The content may be selected from a content library. The content is usually a video which is displayed on the monitor together with audio belonging to the video.

    [0087] FIG. 2A shows a flowchart of a part of the method according to the invention. First, the recording device 2 records an RGB image 6 and three dimensional information 7 realized as three dimensional cloud points. The camera has a field of view 8. The image 6 and the corresponding three dimensional cloud points 7 form a first data set 21 that is forwarded to the data processing hardware system 3. The object detection component 12 of the data processing hardware system 3 detects and identifies an object in the RGB image. For example, if the object includes certain characteristics such as arms, a head and legs, it may be identified as a person. If an object is identified as a person, further attributes are assigned to the person. One of these further attributes is obtained by a body skeleton detection. Optionally, the object detection component 12 may also use the three dimensional information (dashed line).

    [0088] The body skeleton detection allows tracking of an orientation of the person. In particular, a head pose and an orientation of the body and the face indicate an interest of the person in a particular object. The head pose might point entirely or partly in the direction of the object. Based on the head pose and the three dimensional location of the person, it is determined whether the head frontal direction (yaw axis) points towards the screen; if it does, the person is counted as showing interest in the screen at that particular location and moment. Subsequently, it is calculated for how long the person has been showing interest. Based on this duration, a factor is calculated which expresses the interest of the person in the object. Additionally, the object detection component 12 may detect the eyes and track the pupils.
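
    The yaw-based interest determination described above can be sketched in simplified form as follows. This is an illustrative example only: the function names, the tolerance cone of 30 degrees and the two dimensional ground-plane representation are assumptions for the sketch, not part of the claimed method.

```python
import math

def interest_factor(head_yaw_deg, person_pos, screen_pos, half_angle_deg=30.0):
    """Return 1.0 if the head frontal direction (yaw axis) points toward the
    screen within a tolerance cone, else 0.0.  Positions are (x, z)
    ground-plane coordinates in the camera coordinate system."""
    # Bearing from the person to the screen in the ground plane.
    dx = screen_pos[0] - person_pos[0]
    dz = screen_pos[1] - person_pos[1]
    bearing_deg = math.degrees(math.atan2(dx, dz))
    # Smallest signed angular difference between head yaw and bearing.
    diff = (head_yaw_deg - bearing_deg + 180.0) % 360.0 - 180.0
    return 1.0 if abs(diff) <= half_angle_deg else 0.0

def accumulate_interest(samples, dt):
    """Sum per-frame interest over time (frame interval dt in seconds) to
    express for how long the person has been showing interest."""
    return sum(interest_factor(*s) for s in samples) * dt
```

    Accumulating the per-frame factor over the frames in which the person is visible yields the duration-weighted interest value mentioned above.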

    [0089] The head pose may be determined by the attribute assignment component. The body skeleton tracking may determine the head pose as well. Combining both can improve the accuracy of the head pose determination.

    [0090] The detected person is forwarded to the attribute assignment component 13. The attribute assignment component 13 assigns the current location 20 (see FIG. 3) to the detected person by using the three dimensional cloud points of the detected person 4. Then, the attribute assignment component 13 assigns the determined interest factor for the monitor to the person.

    [0091] As a result, the data processing hardware system 3 can calculate 40 an attention model 19. The attention model 19 is based on a discretized space of the field of view 8. The data processing hardware system 3 calculates 40 the discretized space and assigns the determined interest factor to the location 20 (see FIG. 3) in the discretized space. Thereby, an interest factor is assigned to a particular location 20 of the discretized space. Based on this interest factor, it is predicted whether a future person standing at or walking through the location 20 will show interest in the monitor or not.

    [0092] This process is repeated for each person 4 detected in the field of view 8. Thereby, the discretized space of the attention model is filled with interest factors, allowing a prediction over the entire field of view. The attention model may be used to calculate areas of different levels of interest.
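
    A minimal sketch of such a discretized attention model follows, assuming square ground-plane cells and a simple per-cell average of observed interest factors; the class layout, cell size and method names are hypothetical.

```python
class AttentionModel:
    """Discretizes the ground plane of the field of view into square cells
    and averages the observed interest factors per cell."""

    def __init__(self, cell_size=0.5):
        self.cell_size = cell_size
        self.sums = {}    # cell -> accumulated interest
        self.counts = {}  # cell -> number of observations

    def _cell(self, location):
        x, z = location
        return (int(x // self.cell_size), int(z // self.cell_size))

    def observe(self, location, interest):
        """Record one interest observation at a ground-plane location."""
        c = self._cell(location)
        self.sums[c] = self.sums.get(c, 0.0) + interest
        self.counts[c] = self.counts.get(c, 0) + 1

    def predict(self, location):
        """Predicted probability that a person at this location shows
        interest in the object; None if the cell was never observed."""
        c = self._cell(location)
        if c not in self.counts:
            return None
        return self.sums[c] / self.counts[c]
```

    Cells with many high observations then form the areas of higher interest, cells with mostly zero observations the areas of lower interest.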

    [0093] Such levels of interest are shown in a top view in FIG. 3.

    [0094] FIG. 3 shows a top view of the recording device 2 (camera) and its field of view 8. Within the field of view, two persons 4a and 4b are located. The person 4a has a head pose directed towards a monitor 14 located below the recording device 2 (not shown). Thus, the discretized space of the person 4a (and the discretized spaces around it) is assigned a high interest factor in the attention model. The head pose of person 4b does not point directly at the monitor 14, but the field of view of the person 4b includes the monitor 14. Person 4b might thus be able to observe the monitor. However, their interest is lower than the interest of person 4a. Thus, based on the head pose of person 4b, the corresponding space is assigned a lower interest factor.

    [0095] The attention model 19 thus includes discretized spaces 37 with a higher interest factor and discretized spaces 36 with lower interest factors. This is indicated in FIG. 3 by the intensity of the black shading over the different areas.

    [0096] FIGS. 2B and 4 show an advanced determination of the attention model 19. The determination and assignment of attributes is identical to the process shown in FIG. 2A. However, since the recording device 2 delivers a continuous stream of three dimensional cloud points and corresponding RGB images, the attention model 19 may be defined more precisely. In different frames of the received video, the same person may reoccur. This is detected with the object matching component 16. In each frame, the attribute assignment component 13 deduces attributes of the detected objects (i.e. persons). The attribute assignment component 13 assigns current positions as well as the found attributes to the detected persons.

    [0097] Then, the object matching component 16 compares the attributes between the persons. If sufficient attributes match, the object matching component 16 matches the persons and identifies them as a matched person 17 in two different frames. Typically, the matched person 17 will have moved between the frames. With the different positions provided by the three dimensional cloud points, the movement tracker component 15 can determine a trajectory 24 of the person (see FIG. 4).
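
    The attribute comparison can be sketched as a simple greedy matching over per-person descriptors. Representing a descriptor as a dictionary of attribute names to values, as well as the threshold and function names, are assumptions of this sketch.

```python
def match_score(desc_a, desc_b):
    """Fraction of shared attribute keys whose values are equal."""
    keys = set(desc_a) & set(desc_b)
    if not keys:
        return 0.0
    return sum(desc_a[k] == desc_b[k] for k in keys) / len(keys)

def match_persons(frame_a, frame_b, threshold=0.8):
    """Greedily pair descriptors from two frames whose score reaches the
    threshold; returns a list of (index_in_a, index_in_b) pairs."""
    pairs, used = [], set()
    for i, da in enumerate(frame_a):
        best, best_j = threshold, None
        for j, db in enumerate(frame_b):
            if j in used:
                continue
            s = match_score(da, db)
            if s >= best:
                best, best_j = s, j
        if best_j is not None:
            pairs.append((i, best_j))
            used.add(best_j)
    return pairs
```

    Each matched pair corresponds to one matched person 17 whose positions in the two frames feed the movement tracker component.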

    [0098] The trajectories 24 of two persons passing through the field of view are shown in FIG. 4. Person 4a and person 4b are identified and matched at different positions. Thereby, the data processing hardware system 3 can detect the trajectories 24.

    [0099] FIG. 5A shows a plurality of detected trajectories 24. The trajectories 24 are the result of an obstacle 27 in the movement path of the persons. As a result, most trajectories cross the field of view instead of leading directly towards the recording device 2. These recorded past trajectories can be utilized by the movement tracker component 15 to develop the motion model 18. The motion model 18 predicts the movement of persons within the field of view. For example, if 80% of the trajectories 24 take a certain direction, while 20% turn in another direction, the motion model can provide a probabilistic estimation of the future trajectories 24 of the detected persons. This allows an estimation of where the detected persons are going to be in the future.
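
    The 80%/20% example above amounts to counting, per discretized cell, which step the recorded trajectories took next. A minimal sketch of such a motion model, with integer grid cells and unit steps as simplifying assumptions:

```python
from collections import Counter

def motion_model(trajectories):
    """Estimate, for each discretized cell, the probability of each outgoing
    step direction from recorded past trajectories.  Each trajectory is a
    list of (x, z) grid cells visited in order."""
    counts = {}
    for traj in trajectories:
        for (x0, z0), (x1, z1) in zip(traj, traj[1:]):
            step = (x1 - x0, z1 - z0)
            counts.setdefault((x0, z0), Counter())[step] += 1
    # Normalize the counts per cell into probabilities.
    model = {}
    for cell, c in counts.items():
        total = sum(c.values())
        model[cell] = {step: n / total for step, n in c.items()}
    return model
```

    Rolling this one-step distribution forward from a person's current cell yields the multitude of possible future paths with their likelihoods shown in FIG. 5B.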

    [0100] FIG. 5B shows a prediction of the walking path of the person 4 walking through the field of view 8. As can be seen in FIG. 5B, the estimation is probabilistic and calculates a multitude of possible paths as well as their likelihood.

    [0101] FIG. 6 also shows a top view of the recording device 2 and the corresponding field of view 8. The recording device is shown at three different time stages. At a point in time in the past 42, two persons 4a and 4b (labeled as “P1” and “P2” in the drawing) enter the field of view. The data processing hardware system 3 detects the two persons 4a and 4b and tracks their trajectories 24 until the present 43. At this point, the data processing hardware system 3 calculates a probabilistic estimation of the trajectories 28 in the future 44.

    [0102] FIGS. 7 to 14 relate to the second aspect of the invention and to the calculation of a coordinate transformation matrix.

    [0103] FIG. 7 shows a top view of a first recording device 131 and a second recording device 132. The recording devices each have a field of view. The first recording device has a first field of view 108 and the second recording device has a second field of view 110. The fields of view overlap in an overlapping region 138. This can also be seen in FIG. 8, which shows a side view of the arrangement of FIG. 7. A person 104 who enters the first field of view 108 is detected by a data processing hardware system 103 (see FIG. 9B) and tracked through the first field of view 108. As soon as the person 104 enters the second field of view 110, the person 104 is also detected in the data generated by the second recording device 132.

    [0104] Thus, in the overlapping region 138 the person 104 may be detected in the data generated by both recording devices 131 and 132. The recording devices 131, 132 generate RGB image data 106 (see FIG. 9A). Further, the recording devices are realized as stereo cameras, which enables them to generate three dimensional information realized as three dimensional cloud points 107 of the respective fields of view 108, 110.

    [0105] Each camera 131, 132 has its own coordinate system. The recording device 131 or 132 is at the origin of the coordinate system.

    [0106] Since each camera has an aperture angle 135, 136 and three dimensional information, each camera can determine the coordinates of all cloud points in its coordinate space.

    [0107] The flowchart shown in FIG. 9A and the data processing hardware system shown in FIG. 9B show how this data is processed in the data processing hardware system 103. The data processing hardware system may be realized as a server. In one embodiment, the data of the recording devices 131, 132 is transferred via a network, such as the Internet, to the server where the calculations according to FIG. 9A are made.

    [0108] However, it is preferred that the data processing hardware system 103 is realized as a computing module that is installed on-site.

    [0109] The RGB image data 106 and the three dimensional cloud points 107 are sent as data sets from the recording devices 131, 132 to the data processing hardware system 103. The data processing hardware system 103 receives the data sets 121, 122 at an interface 111 and forwards the RGB image data 106 to an object detection component 112. Optionally, the three dimensional cloud points 107 may also be forwarded to the object detection component 112. The object detection component 112 detects a person 104 in the image data 106 based on attributes. In particular, the object detection component 112 may identify attributes characteristic for persons, e.g. legs, arms, a torso, a head or similar. Further, the object detection component identifies attributes that are characteristic for an individual person.

    [0110] The object and the attributes are then sent to an attribute assignment component 113, where the attributes as well as the current position identified by the three dimensional cloud points 107 belonging to the identified object are assigned to each person. This information is then aggregated in a descriptor 109.

    [0111] The data processing hardware system 103 receives a data set 121 with RGB image data and three dimensional cloud points from the first recording device 131 and a second data set 122 with RGB image data and three dimensional cloud points from the second recording device 132. Both data sets 121, 122 are analyzed in the way outlined above. The data sets 121 of the first recording device 131 and the data sets 122 of the second recording device 132 are analyzed independently and in each data set objects are detected and persons are identified.

    [0112] Persons 104 that are located in the overlapping region 138 will be identified in both data sets 121, 122. An object matching component 116 compares the attributes in the descriptors 109 and thereby identifies identical persons in the overlapping region 138. The identification of a person 104 in the overlapping region 138 allows the calculation 119 of a coordinate transformation matrix 127. A plurality (in particular at least 4) of three dimensional cloud points is associated to the person 104. The three dimensional cloud points are determined by the first and the second recording devices 131, 132 independently.

    [0113] The data processing hardware system 103 determines the position for the detected person in the coordinate system of the first camera 131 and in the coordinate system of the second camera 132.

    [0114] In a variant, the data processing hardware system 103 may obtain the position of one or more body parts in the data sets 121, 122 and use the positions to calculate a coordinate transformation matrix for transposing the coordinates of the first camera coordinate system into the second camera coordinate system.
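
    Given at least four corresponding three dimensional points of the matched person in both camera coordinate systems, such a transformation can be estimated with a standard least-squares rigid alignment. The following sketch uses the Kabsch algorithm as one possible realization; the document does not prescribe this particular algorithm, and all names are illustrative.

```python
import numpy as np

def rigid_transform(points_a, points_b):
    """Least-squares rigid transform mapping points given in the first
    camera's coordinate system onto the corresponding points in the second
    camera's system.  Returns a 3x3 rotation R and translation t such that
    points_b ≈ points_a @ R.T + t."""
    A = np.asarray(points_a, dtype=float)
    B = np.asarray(points_b, dtype=float)
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)                # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = cb - R @ ca
    return R, t
```

    With more than four correspondences, e.g. the matched trajectory points of FIG. 12, the same least-squares fit simply becomes more robust.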

    [0115] FIG. 10 shows the same person 104 passing through the first field of view 108 and the second field of view 110 separately. As can be seen in FIG. 10, the recording devices 131, 132 record points 129, 128 along a trajectory 124 of the person. Thereby, the trajectory 124 can be reconstructed for each camera 131, 132 independently. Since the object matching component 116 has determined the person to be identical in both data sets, the trajectories 124 of the at least one person 104 can be matched. This is shown in FIG. 11 for the person 104 and a second person 140. Then, as can be seen from FIG. 12, the trajectories are matched and provide a plurality of points which can be used for the calculation of the transformation matrix 127.

    [0116] This results in the reconstruction of the trajectory through the overlapping area 138 as can be seen from FIG. 12.

    [0117] Though any body part might be suitable, the three dimensional neck pose 142 is a particularly preferred tracking point for the persons 104 and 140 (see FIGS. 13A and 13B). The neck pose provides the advantage that it stays at a relatively constant height. Thus, if the neck pose is tracked along its way, a plane might be reconstructed from the trajectory that is parallel to the ground 141.

    [0118] This plane allows transforming the coordinates further into a coordinate system that provides a bird's eye view. Such a coordinate system and its transformation are shown in FIG. 14.
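
    The plane reconstruction and bird's eye projection can be sketched as follows: a plane is fitted to the tracked neck positions by singular value decomposition, and points are then expressed in a two dimensional basis spanning that plane. The function name and the least-squares plane fit are assumptions of this sketch.

```python
import numpy as np

def birds_eye_transform(neck_points):
    """Fit a plane to tracked neck positions (assumed parallel to the
    ground) and return a function projecting camera coordinates into a
    2-D top-down ("bird's eye") view.  neck_points: Nx3 array in camera
    coordinates."""
    P = np.asarray(neck_points, dtype=float)
    centroid = P.mean(axis=0)
    # The plane normal is the right singular vector belonging to the
    # smallest singular value of the centered point cloud.
    _, _, Vt = np.linalg.svd(P - centroid)
    normal = Vt[-1]
    # Build an orthonormal basis (u, v) spanning the fitted plane.
    u = Vt[0]
    v = np.cross(normal, u)

    def project(point):
        d = np.asarray(point, dtype=float) - centroid
        return np.array([d @ u, d @ v])

    return project
```

    Distances within the plane are preserved by the orthonormal basis, so trajectories projected this way keep their ground-plane geometry in the top view.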