HUMAN ACTION RECOGNITION AND ASSISTANCE TO AR DEVICE
20230105603 · 2023-04-06
CPC classification
Y02P90/02
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
G05B2219/32005
PHYSICS
G05B2219/31046
PHYSICS
G05B2219/31027
PHYSICS
G05B19/4183
PHYSICS
Abstract
A method for operating a monitoring unit (100), configured to monitor a manipulation of at least one object (40, 45) by a user, the method comprising: —receiving, via a cellular network (60), actual user position data comprising an actual position of at least one portion of the user used to manipulate the at least one object, and actual object position data comprising an actual position of the at least one object, —matching the actual user position data to predefined user position data provided at the monitoring unit (100), the predefined user position data indicating a correct position of the at least one portion of the user for manipulating the at least one object, and matching the actual object position data to predefined object position data provided at the monitoring unit, the predefined object position data indicating a correct position of the at least one object, —determining, based on the matching, whether the manipulation of the at least one object by the at least one portion of the user is correct or not.
Claims
1. A method for operating a monitoring unit configured to monitor a manipulation of at least one object by a user, the method comprising: receiving, via a cellular network, actual user position data comprising an actual position of at least one portion of the user used to manipulate the at least one object, and actual object position data comprising an actual position of the at least one object; matching the actual user position data to predefined user position data provided at the monitoring unit, the predefined user position data indicating a correct position of the at least one portion of the user for manipulating the at least one object, and matching the actual object position data to predefined object position data provided at the monitoring unit, the predefined object position data indicating a correct position of the at least one object; and determining, based on the matching, whether the manipulation of the at least one object by the at least one portion of the user is correct or not.
2. The method of claim 1, wherein determining whether the manipulation is correct or not comprises determining that the manipulation of the at least one object is not correct when the actual user position data differ from the predefined user position data by more than a threshold, wherein a notification is transmitted to the user over the network that the manipulation is not correct.
3. The method of claim 1, further comprising determining, in a sequence of steps to be carried out by the user, a next step to be carried out by the user and a next position of the at least one portion of the user in the next step, wherein the matching comprises determining whether the actual user position data are in agreement with the next position.
4. The method of claim 1, further comprising determining, based on the matching, an instruction for the user on how to manipulate the at least one object, wherein the instruction is sent to the user.
5. The method of claim 4, further comprising dividing an environment reachable by the user into different sections, the different sections comprising a preferred section in which the correct position of the at least one portion of the user is located, wherein information about the preferred section is transmitted to the user such that the information about the preferred section can be displayed to the user as augmented reality.
6. The method of claim 1, further comprising determining more detailed information about the at least one object describing the at least one object in more detail, wherein the more detailed information is sent to the user.
7. The method of claim 1, wherein matching the actual user position data to the predefined user position data comprises: determining an actual object position from the received actual object position data and an actual user position from the actual user position data; comparing the actual object position and the actual user position to a plurality of position sets, each position set comprising a desired object position and a corresponding desired user position, in order to find the position set best matching the actual user position and the actual object position, the best matching position set comprising a best matching position of the portion of the user and a best matching position of the object; determining, in a sequence of steps to be carried out by the user, a next step to be carried out by the user and a next position of the at least one portion of the user in the next step; and determining whether the next position is within a threshold distance to the best matching position of the portion of the user.
8. The method of claim 7, further comprising indicating to the user the next position of the at least one portion of the user.
9. The method of claim 1, wherein matching the actual user position data to predefined user position data comprises at least one of a nearest neighbor clustering and a k-means clustering.
10. The method of claim 1, further comprising collecting the predefined object position data and the predefined user position data including monitoring a plurality of users manipulating the at least one object in a sequence of steps.
11. The method of claim 1, further comprising: receiving first image data generated by a first image sensor, the first image data comprising 2-dimensional images of the user and the user's direct environment; receiving second image data generated by a second image sensor different from the first image sensor, the second image data comprising further images comprising additional depth information; and generating fused image data based on the first image data and the second image data in a common coordinate system, the fused image data comprising the actual position of the at least one portion of the user and the actual position of the at least one object in the common coordinate system, wherein receiving the actual user position data and the actual object position data comprises receiving the fused image data.
12. The method of claim 11, wherein the actual position of the at least one portion is determined based on the second image data, wherein the actual position of the at least one object is determined based on the first image data.
13. The method of claim 11, wherein the fused image data are 2-dimensional image data.
14. (canceled)
15. A monitoring unit configured to monitor a manipulation of at least one object by a user, the monitoring unit comprising: a memory; and at least one processing unit, the memory containing instructions executable by said at least one processing unit, wherein the monitoring unit is operative to: receive, via a cellular network, actual user position data comprising an actual position of at least one portion of the user used to manipulate the at least one object, and actual object position data comprising an actual position of the at least one object; match the actual user position data to predefined user position data provided at the monitoring unit, the predefined user position data indicating a correct position of the at least one portion of the user for manipulating the at least one object, and match the actual object position data to predefined object position data provided at the monitoring unit, the predefined object position data indicating a correct position of the at least one object; and determine, based on the matching, whether the manipulation of the at least one object by the at least one portion of the user is correct or not.
16. The monitoring unit of claim 15, further being operative, for determining whether the manipulation is correct or not, to determine that the manipulation of the at least one object is not correct when the actual user position data differ from the predefined user position data by more than a threshold, and to transmit a notification to the user over the network that the manipulation is not correct.
17. The monitoring unit of claim 15, further being operative to determine, in a sequence of steps to be carried out by the user, a next step to be carried out by the user and a next position of the at least one portion of the user in the next step, and to determine, in the matching, whether the actual user position data are in agreement with the next position.
18. The monitoring unit of claim 15, further being operative to determine, based on the matching, an instruction for the user on how to manipulate the at least one object, and to send the instruction to the user.
19-26. (canceled)
27. The monitoring unit of claim 15, comprising a first monitoring module and a second monitoring module, wherein the first monitoring module is operative to receive the actual user position data and the actual object position data, and to carry out the matching, wherein the second monitoring module is operative to receive the first and second image data and to generate the fused image data.
28. A viewing apparatus for a user comprising: at least one lens through which the user visually perceives a field of view in which at least one object is located; a projecting unit configured to project information onto the lens so that the user wearing the viewing apparatus perceives the field of view to which the projected information is added; and a receiver configured to receive an instruction via a cellular network from a monitoring unit, the instruction indicating how to manipulate the at least one object, wherein the projecting unit is configured to translate the received instruction into operating information by which the user is informed whether the manipulation is correct or not.
29. (canceled)
30. A non-transitory computer readable medium storing a computer program comprising program code to be executed by at least one processing unit of a monitoring unit, wherein execution of the program code causes the monitoring unit to perform the method of claim 1.
31. (canceled)
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] The foregoing and additional features and effects of the application will become apparent from the following detailed description when read in conjunction with the accompanying drawings in which like reference numerals refer to like elements.
DETAILED DESCRIPTION OF DRAWINGS
[0029] In the following, embodiments of the invention will be described in detail with reference to the accompanying drawings. It is to be understood that the following description of embodiments is not to be taken in a limiting sense. The scope of the invention is not intended to be limited by the embodiments described hereinafter or by the drawings, which are to be illustrative only.
[0030] The drawings are to be regarded as being schematic representations, and elements illustrated in the drawings are not necessarily shown to scale. Rather, the various elements are represented such that their function and general purpose become apparent to a person skilled in the art. Any connection or coupling between functional blocks, devices, components, or physical or functional units shown in the drawings and described hereinafter may also be implemented by an indirect connection or coupling. The coupling between the components may be established over a wired or wireless connection. Functional blocks may be implemented in hardware, software, firmware, or a combination thereof.
[0031] As will be discussed below, a method is provided in which human behavior is detected in an environment where a user is manipulating an object. The environment may be a smart factory or any other location. The behavior is detected by combining human action recognition, such as hand tracking, with object recognition algorithms in order to provide support for the user. The movement of at least a portion of the user, such as the hand or any other part of the body, is correlated with the identified objects and with predefined user position data and predefined object position data in order to determine a correct or erroneous behavior. Furthermore, it is possible that instructions are provided to the user, who is wearing a viewing apparatus such as a headset with an augmented reality feature, in the following called AR headset.
[0033] Furthermore, another image sensor 30 is provided, wherein this further image sensor is configured to generate images with 2D information, such as RGB images. In the embodiment shown, the 3D image sensor 70 is provided at the user, whereas the 2D image sensor is fixedly installed. However, it should be understood that the 2D image sensor may also be provided at the user, whereas the 3D image sensor is then located in the neighborhood of the user. The 3D image sensor may be located at the headset or may be connected to gloves that the user is wearing for manipulating the object.
[0034] Furthermore, a cellular network 60 is provided via which the image data generated by the two image sensors are transmitted to a monitoring unit 100. The cellular network can be a mobile communication network, such as an LTE or 5G network. The monitoring unit 100 may be provided in the cloud as illustrated by reference numeral 50. Preferably, the monitoring unit is located at the edge of the cloud or at the edge of the mobile communications network or cellular network 60. The monitoring unit can include a frontend and a backend as shown later. The frontend may be located at the edge of the mobile communications or cellular network, wherein the backend can be located at an application outside the network 60. In another example, both the frontend and the backend are located at the edge of the cellular network. Furthermore, it is possible that the monitoring unit 100 is not divided into different parts, but is provided as a one-piece unit. The transmission of the image data from image sensor 30 is illustrated by arrow 1. At the monitoring unit 100 the object detection is carried out, wherein artificial intelligence may be used for the object detection and recognition; any other known method for object detection may be used as well. This is shown by reference numeral 2. Furthermore, the image data from the second sensor 70 are also transmitted to the monitoring unit via the cellular network 60. As the image sensor provided at the user provides the 3D data including the depth information, it is possible to detect in step 4 the interaction of the user with the identified object, especially by tracking the hand or any other part of the user. A database 80 may be provided which stores a manipulation plan indicating which manipulation of the object should be carried out in which situation and which contains the correct positions of the user and the object during manipulation. For example, the database can correspond to a digital twin where physical objects and their associated digital data, such as the manipulation plan, are stored. In step 5, it is possible to detect whether the manipulation of the user is in line with a desired manipulation, as will be discussed in further detail below. This can include a matching process in which the actual user position is matched to a correct user position deduced from the database 80.
[0035] Summarizing, as shown in
[0036] As will be explained below, the 2D image data and the image data comprising the depth information are fused in order to generate fused image data, wherein these fused image data comprise the actual position of the user and the actual position of the object in a common coordinate system. The fused image data are then compared to the position sets in the database 80, and the position set which best matches the fused image data is identified. Then it is checked whether the position of the user relative to the object to be manipulated is correct or not. The position set that best matches the fused image data comprises an object position and a user position. When it is determined that the position of the object and of the user, or of the part of the user (e.g., the hand), is in line with the next step to be carried out by the user, the manipulation is correct. Otherwise, it is not correct, as the user may manipulate the wrong object or the right object in a wrong way.
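The following is a minimal sketch of how such a fusion could be implemented, assuming the pose of the depth sensor relative to the 2D camera and the camera intrinsics are known; all function names, parameters and coordinate values are illustrative assumptions and are not taken from the application.

```python
import numpy as np

def project_to_common_frame(point_3d, R, t, K):
    """Project a 3D point from the depth-sensor frame into the 2D camera frame.

    point_3d : (3,) point from the depth sensor (e.g., a tracked hand joint), in meters
    R, t     : rotation (3x3) and translation (3,) from the depth-sensor frame to the camera frame
    K        : 3x3 intrinsic matrix of the 2D (RGB) image sensor
    Returns the pixel coordinates (u, v) in the common 2D image coordinate system.
    """
    p_cam = R @ np.asarray(point_3d, dtype=float) + t   # transform into the camera frame
    uvw = K @ p_cam                                      # pinhole projection
    return uvw[:2] / uvw[2]                              # normalize by depth

# Example: fuse a detected object position (2D) with a tracked hand position (3D).
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
fused_image_data = {
    "object_xy": np.array([412.0, 230.0]),                           # from 2D object detection
    "hand_xy": project_to_common_frame([0.10, -0.05, 0.60],
                                       np.eye(3), np.zeros(3), K),   # from 3D hand tracking
}
```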
[0037] The scenario shown in
[0038] In a further scenario, the user may simply use the image data for consulting the database about an object to be manipulated. Here, the database comprises additional information such as the exact definition of the object that is identified, additional parameters such as the temperature of the object, which part of a complete assembly the object belongs to, etc. Accordingly, in this scenario more detailed information is provided which describes the object in more detail, and this object information is then sent to the user and displayed to the user. This scenario can be part of a quality inspection or quality management where assistance is provided to the user in inspecting a site comprising several objects.
[0039] In a further scenario, the database 80 may be consulted for providing a state information about the identified object to the user. This can be state information at an assembly line or a diagnosis of the identified objects including any errors which are identified in the assembled object. This kind of state information is provided to the user on the AR headset. Accordingly the database can include quality information of parts and/or can include vehicle information at assembly lines. The database can thus be used for a quality inspection and/or for diagnosis of the object the user is looking at.
[0042] In most of the examples given, the hand of the user is used as the portion of the user manipulating the object. It should be understood that any other part of the user may be used.
[0048] The database 80 can be manually or automatically updated to include such behavior and can furthermore include additional information such as the diagnosis information or the quality information for certain objects, as mentioned above. The object identification and the human action recognition can be enhanced with location information coming from the image sensors when determining the object at a specific location. The present application is not limited to hand tracking; for instance, pointer tracking from the AR device or gesture recognition might also be used. The same principle can be used to detect the actual parts or objects the user is inspecting so that the corresponding information can be provided from a database for the quality inspection and for the diagnosis.
[0050] After the matching, the resulting 2D points for the objects and the user part are provided as input to the back end in step S15. The input object and hand (user) coordinates are compared to the correct positions in the database in step S16, so that a correlation is carried out between the hand and the objects in the database. This step S16 can include, as input, 2D points for objects and hands or user parts which are correlated to the actual positions in the database. Here, a nearest neighbor or k-means clustering algorithm may be used. Based on the correlation, the matching hand and object locations in the database are determined, and a correct or erroneous behavior is retrieved from the database in step S17. As explained above, the correct positions are determined for the current task and were added to the database before runtime. The result of the behavior is then transmitted to the front end part in step S18. If the behavior is correct, no information may be provided to the user in the display. However, it is also possible that the correct behavior is displayed in a kind of heat map as discussed above, or, if the behavior is wrong, the correct behavior may be displayed as augmented reality to the user so that the user is informed how to carry out the next manipulation step.
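As an illustration of the correlation carried out in step S16, the following sketch matches the observed 2D object and hand coordinates to the stored position sets by a nearest neighbor search; the database content, names and coordinates are hypothetical examples, not data from the application.

```python
import numpy as np

# Hypothetical database of position sets added before runtime: each row holds a desired
# object position and the corresponding desired hand position in the common 2D frame.
POSITION_SETS = np.array([
    # obj_x, obj_y, hand_x, hand_y
    [410.0, 228.0, 395.0, 300.0],   # e.g., step "grip valve"
    [120.0,  80.0, 140.0, 150.0],   # e.g., step "press button"
])

def best_matching_set(object_xy, hand_xy, position_sets=POSITION_SETS):
    """Nearest neighbor correlation of the observed (object, hand) pair against the
    stored position sets; returns the index of the best match and its distance."""
    query = np.concatenate([np.asarray(object_xy), np.asarray(hand_xy)])
    dists = np.linalg.norm(position_sets - query, axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])
```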
[0051] The processing at the back end may use as an input the 2D object and hand positions and can use a k-means or nearest neighbor algorithm, or any other correlation algorithm, to find the correct hand and object positions in the database. The correct or erroneous behavior is then determined based on the hand and object locations and based on the position sets comprising the desired object positions and the desired user positions. The notification transmitted in step S19 to the user can, in case of a correct behavior, comprise the information that the user has completed the task successfully and can provide instructions for the next task; alternatively, no notification is sent at all. In case of an erroneous behavior, on the other hand, the type of error is detected, the user is notified about the error, and instructions may be repeated until the task is completed.
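One possible form of the notification sent to the user in step S19 is sketched below; the message texts and field names are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Notification:
    correct: bool
    message: str

def build_notification(correct: bool, next_task: Optional[str], error_type: Optional[str]) -> Notification:
    """Illustrative construction of the step-S19 notification for the AR headset."""
    if correct:
        message = "Task completed successfully."
        if next_task:
            message += f" Next task: {next_task}."
        return Notification(True, message)
    return Notification(False, f"Manipulation not correct ({error_type}); please repeat the step.")
```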
[0058] From the above, some general conclusions can be drawn.
[0059] For determining whether the manipulation is correct or not, it is possible to determine that the manipulation of the at least one object is not correct when the actual user position data differ from the predefined user position data by more than a threshold. In this case, a notification can be transmitted to the user over the cellular network that the manipulation is not correct. Applied to the example shown in
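A minimal sketch of this threshold check, assuming positions in the common 2D coordinate system and an illustrative threshold value (not specified in the application):

```python
import numpy as np

POSITION_THRESHOLD = 25.0   # illustrative threshold in the common 2D frame (pixels)

def manipulation_is_correct(actual_hand_xy, predefined_hand_xy, threshold=POSITION_THRESHOLD) -> bool:
    """Not correct when the actual user position differs from the predefined
    position by more than the threshold."""
    deviation = np.linalg.norm(np.asarray(actual_hand_xy) - np.asarray(predefined_hand_xy))
    return deviation <= threshold
```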
[0060] Furthermore, the monitoring unit may determine, in a sequence of steps to be carried out by the user, the next step to be carried out by the user and the next position of the at least one portion of the user in that step. The matching then comprises the step of determining whether the actual user position data are in agreement with the next position. The monitoring unit may have monitored the different actions by the user and knows which of the manipulation steps has to be carried out next. Accordingly, the next position the user should take is known, and the matching comprises determining whether the user is actually moving close to the position which was determined as the next position and is carrying out the right manipulation.
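The sequence of steps and the expected position in the next step could, for example, be represented as sketched below; the step names and coordinates are purely illustrative assumptions.

```python
# Illustrative sequence of manipulation steps with the expected hand position per step.
STEP_SEQUENCE = [
    {"name": "grip valve",   "expected_hand_xy": (395.0, 300.0)},
    {"name": "turn valve",   "expected_hand_xy": (400.0, 310.0)},
    {"name": "press button", "expected_hand_xy": (140.0, 150.0)},
]

def next_expected_step(completed_steps: int):
    """Return the next step to be carried out and the position the user's hand
    should take in that step."""
    step = STEP_SEQUENCE[completed_steps]
    return step["name"], step["expected_hand_xy"]
```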
[0061] Furthermore, it is possible that, based on the matching step, an instruction is generated for the user on how to manipulate the at least one object, and this instruction is then sent to the user.
[0062] It is possible to divide the environment which is reachable by the user into different sections, such as the four sections shown in
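A simple sketch of such a division into sections follows, assuming four rectangular sections in the common 2D coordinate system and returning the preferred section that contains the correct position of the portion of the user (which can then be highlighted to the user, e.g., as a heat map); all names and coordinates are assumptions.

```python
# Illustrative division of the reachable environment into four rectangular sections
# (x_min, y_min, x_max, y_max) in the common 2D frame.
SECTIONS = {
    "upper_left":  (0.0,   0.0, 320.0, 240.0),
    "upper_right": (320.0, 0.0, 640.0, 240.0),
    "lower_left":  (0.0, 240.0, 320.0, 480.0),
    "lower_right": (320.0, 240.0, 640.0, 480.0),
}

def preferred_section(correct_hand_xy):
    """Return the section containing the correct position of the portion of the user;
    this section can be highlighted (e.g., as a heat map) on the AR headset."""
    x, y = correct_hand_xy
    for name, (x0, y0, x1, y1) in SECTIONS.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None
```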
[0063] Furthermore, it is possible that more detailed information about the at least one object is determined, which describes the at least one object in more detail, and this more detailed information is also transmitted to the user.
[0064] For matching the actual user position data to the predefined user position data, the following steps may be carried out:
[0065] An actual object position is determined from the received actual object position data and an actual user position is determined from the actual user position data. The actual object position and the actual user position are then compared to a plurality of position sets. Each position set can comprise a desired object position and a corresponding desired user position. The comparing is used in order to find the position set best matching the actual user position and the actual object position, and the best matching position set comprises a best matching position of the portion of the user and a best matching position of the object. Furthermore, it is determined, in the sequence of steps to be carried out by the user, the next step to be carried out by the user and a next position of the user in this next step. Furthermore, it is determined whether the next position is within a threshold distance to the best matching position of the portion of the user. As discussed in connection with
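Putting these steps together, an end-to-end sketch of the matching could look as follows; the layout of the position sets, the expected next hand position and the threshold are illustrative inputs, not values from the application.

```python
import numpy as np

def check_manipulation(object_xy, hand_xy, position_sets, next_hand_xy, threshold=25.0) -> bool:
    """Sketch of the matching described above: find the position set best matching the
    observed (object, hand) pair, then check whether the next expected hand position
    lies within the threshold distance of the best matching hand position."""
    query = np.concatenate([np.asarray(object_xy), np.asarray(hand_xy)])
    sets = np.asarray(position_sets)                       # rows: obj_x, obj_y, hand_x, hand_y
    dists = np.linalg.norm(sets - query, axis=1)           # compare to all position sets
    best_hand_xy = sets[int(np.argmin(dists)), 2:4]        # hand position of the best match
    return np.linalg.norm(np.asarray(next_hand_xy) - best_hand_xy) <= threshold
```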
[0066] Furthermore, it is possible to indicate to the user the next position the user should take for the manipulation of the object.
[0067] The matching of the actual user position data to the predefined user position data can comprise methods such as nearest neighbor clustering or k-means clustering.
[0068] Furthermore, it is possible to collect the predefined object position data and the predefined user position data, which includes monitoring a plurality of user manipulations in which the user manipulates the at least one object in a sequence of steps. The populating of the database was discussed above inter alia in
[0069] The above-mentioned steps are mainly carried out in the back end part of the monitoring unit. If it is considered that the monitoring unit also comprises the front end, the monitoring unit also receives first image data generated by the first image sensor, comprising the 2D images of the user and the user's direct environment. The monitoring unit further receives the second image data generated by a second image sensor which is different from the first image sensor and which provides further images comprising additional depth information. The monitoring unit generates fused image data based on the first image data and the second image data in a common coordinate system, wherein the fused image data comprise the actual position of the user, or at least of the portion of the user, and the actual position of the at least one object in the common coordinate system. Receiving the actual user position data and the actual object position data thus comprises receiving the fused image data.
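The fused image data passed from the front end to the back end could, for instance, be represented by a simple record such as the following; the field names and values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class FusedImageData:
    """Illustrative payload passed from the front end to the back end; both positions
    are expressed in the common 2D coordinate system."""
    hand_xy: Tuple[float, float]     # actual position of the portion of the user (from the 3D data)
    object_xy: Tuple[float, float]   # actual position of the object (from the 2D data)
    object_label: str                # identified object, e.g. "valve"

sample = FusedImageData(hand_xy=(395.0, 300.0), object_xy=(412.0, 230.0), object_label="valve")
```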
[0070] The actual position of the user, or of the portion of the user, is determined based on the second image data, namely the 3D image data, whereas the actual position of the at least one object may be determined based on the first image data, the 2D image data. The fused image data may also be implemented as 2D image data.
[0071] The receiving of the image data and the generation of the fused image may be carried out by a first part of the monitoring unit wherein the matching and the determining whether the manipulation of the user is correct or not may be carried out by a second part of the monitoring unit which can be located at another location.
[0072] Summarizing, a method is provided for determining a behavior of a user and especially for determining whether the user is correctly or erroneously manipulating an object. Furthermore, support is provided for the user wearing the headset. The method can use algorithms for object recognition and human action recognition, such as hand tracking, to build and consult a back end system, here the monitoring unit, in real time in order to track the user's behavior. Furthermore, feedback can be provided to the user as to whether the behavior or the manipulation is correct or not.