Minimized Bandwidth Requirements for Transmitting Mobile HMD Gaze Data
20220404916 · 2022-12-22
Inventors
- Arnd Rose (Stahnsdorf, DE)
- Tom Sengelaub (Berlin, DE)
- Julia Benndorf (Berlin, DE)
- Marvin Vogel (Berlin, DE)
CPC classification
- G06F3/017 (PHYSICS)
- G02B27/0093 (PHYSICS)
International classification
- G02B27/00 (PHYSICS)
Abstract
A method is performed at a first device with a non-transitory memory, one or more processors, and a network interface. The method includes storing, in the non-transitory memory, reference data describing at least one reference object. The method includes receiving user behavior data via the network interface at a time after completion of a user session of a user of a second device. The user behavior data includes a user behavior characteristic of the user of the second device at a plurality of times during the user session and respective time stamps indicative of the plurality of times during the user session. The method includes combining, using the one or more processors, the user behavior data and the reference data based on the respective time stamps to generate data regarding user behavior during the user session with respect to the at least one reference object.
Claims
1. A method comprising: at a first device with one or more processors, non-transitory memory, and a network interface: storing, in the non-transitory memory, reference data describing at least one reference object; receiving, via the network interface, user behavior data at a time after completion of a user session of a user of a second device, wherein the user behavior data includes a user behavior characteristic of the user of the second device at a plurality of times during the user session and respective time stamps indicative of the plurality of times during the user session; and combining, using the one or more processors, the user behavior data and the reference data based on the respective time stamps to generate data regarding user behavior during the user session with respect to the at least one reference object.
2. The method of claim 1, further comprising storing, in the non-transitory memory for access at a time after the user session has completed, the data regarding user behavior during the user session with respect to the at least one reference object.
3. The method of claim 1, wherein storing the reference data is independent of receiving the user behavior data.
4. The method of claim 1, further comprising receiving reference data from a data source different from the second device.
5. The method of claim 1, further comprising, after receiving the user behavior data from the second device, forgoing receiving any additional data from the second device.
6. The method of claim 1, wherein the user behavior characteristic is captured by the second device at the plurality of times during the user session.
7. The method of claim 1, wherein the at least one reference object is one of a reference coordinate system, a virtual object, or a video sequence.
8. The method of claim 1, wherein the reference data describes a scene model of a virtual scene including the at least one reference object.
9. The method of claim 8, wherein the reference data further indicates how the virtual scene changes during the user session.
10. The method of claim 9, wherein the reference data further indicates how the virtual scene changes in response to an input of the user of the second device.
11. The method of claim 1, further comprising providing, at a time after the user session has completed, a visual representation of the generated data regarding user behavior during the user session with respect to the at least one reference object.
12. The method of claim 1, wherein the user behavior characteristic corresponds to user measurement information regarding the user of the second device.
13. The method of claim 12, wherein the user measurement information includes a gaze point and/or gaze direction of the user of the second device at the plurality of times during the user session.
14. The method of claim 12, wherein the user measurement information includes a position of the user, a pose of the user of the second device, or an orientation of the user of the second device at the plurality of times during the user session.
15. The method of claim 1, wherein the user behavior data indicates a plurality of interactions between the user of the second device and the at least one reference object during the user session.
16. The method of claim 15, wherein each of the respective time stamps is associated with a corresponding interaction of the plurality of interactions.
17. The method of claim 15, wherein the plurality of interactions together comprise a gesture performed by the user of the second device.
18. The method of claim 1, wherein the user behavior data includes synchronization data indicative of a temporal relationship between the user behavior characteristic and a virtual scene at the plurality of times during the user session.
19. A first device comprising: a non-transitory memory to store reference data describing at least one reference object; a network interface to receive user behavior data at a time after completion of a user session of a user of a second device, wherein the user behavior data includes a user behavior characteristic of the user of the second device at a plurality of times during the user session and respective time stamps indicative of the plurality of times during the user session; and one or more processors to combine the user behavior data and the reference data based on the respective time stamps to generate data regarding user behavior during the user session with respect to the at least one reference object.
20. A non-transitory computer-readable medium storing instructions which, when executed by a first device including one or more processors and a network interface, cause the first device to perform operations comprising: storing, in the non-transitory computer-readable medium, reference data describing at least one reference object; receiving, via the network interface, user behavior data at a time after completion of a user session of a user of a second device, wherein the user behavior data includes a user behavior characteristic of the user of the second device at a plurality of times during the user session and respective time stamps indicative of the plurality of times during the user session; and combining, using the one or more processors, the user behavior data and the reference data based on the respective time stamps to generate data regarding user behavior during the user session with respect to the at least one reference object.
Description
[0047] In the following, preferred embodiments of the invention are described with regard to the figures.
[0052] In the figures, elements that provide the same function are marked with identical reference signs.
[0054] The invention especially applies in the field of virtual reality systems. Virtual reality can advantageously be used for a great variety of different applications. For example, a virtual scene can be presented to a user by means of a display device, and the user can virtually walk around in this virtual scene and, e.g., change the perspective of the view on the virtual scene by a head movement. Also, there are many situations in which it would be desirable to be able to share such a virtual reality user experience, which in this example is provided to a user by means of the first device 14, with third parties, like an observer, an instructor or a supervisor associated with the second device 16.
[0055] However, large amounts of data are associated with such virtual reality scenes, so that prior art systems are not capable of sharing such a virtual reality experience with third parties in a satisfactory manner. In particular, a present barrier to field tests based on mobile augmented reality and virtual reality users is the resource overload of the mobile client when processing the 3D scene and transmitting large amounts of data (gaze and referencing content data). The limited processing power of the mobile client restricts or even prevents sharing a virtual reality scene with a third party. Additionally, the available bandwidth of wireless networks limits high resolution transfer of scene data.
[0056] The invention and/or its embodiments, however, advantageously make it possible to reduce the necessary bandwidth to a minimum while allowing a complete recreation of the user experience with respect to the virtual reality. The recreation can be realized in real time or near real time to observe the user, or the data can be stored/transmitted for an offline (temporally decoupled) recreation.
[0057] According to an embodiment as presented in the figures, the system 10a comprises the first device 14, for example a head mounted display with displaying means 18 for displaying the virtual scene VRS to the user, and the second device 16, which are coupled via a network 12 by means of respective network interfaces 17a, 17b.
[0058] Moreover, for capturing the user behavior with respect to the displayed virtual scene VRS, the first device 14 also comprises capturing means, which in this case comprise an eye tracking device 20a, 20b configured to determine the gaze direction and/or gaze point of the user with respect to the displayed virtual scene VRS and optionally further eye features or eye related features. In this case the eye tracking device 20a, 20b comprises two eye cameras 20b for continuously capturing images of the eyes of the user as well as an eye tracking module 20a, which in this case is part of the processing unit 21 of the head mounted display 14. The eye tracking module 20a is configured to process and analyze the images captured by the eye cameras 20b and, on the basis of the captured images, to determine the gaze direction and/or the gaze point of the user and/or further eye properties or eye features, like the pupil size, the frequency of eye lid closure, etc. Moreover, the first device 14 may also comprise further capturing means 22 different from an eye tracking device for capturing different or additional user behavior characteristics, like for example a gyroscope or a scene camera for capturing images of the environment of the user, on the basis of which e.g. a head orientation or a head movement of the user can be determined. The capturing means 22 may also comprise a microphone for capturing speech of the user. The first device 14 may also comprise a controller (not shown), like a handheld controller, to receive a user input. Such a controller can be configured as a separate physical entity and be communicatively coupled to the head mounted part of the first device 14. The first device 14 may also comprise not-head-worn capturing means, like a camera for capturing gestures or a pose of the user. So generally, the captured user data, namely the captured user behavior characteristic, among others may include any subset of:
[0059] a pose of the user;
[0060] eye tracking data, like a point of regard, a gaze direction, a visual focus, a focal point;
[0061] eye tracking events, like an eye attention, an eye fixation;
[0062] a facial expression, like a blink, a smile;
[0063] user emotions, like joy, hate, anger;
[0064] user interactions, like speech, user events, a controller input;
[0065] a position, like a position of the user, a position of one or both eyes of the user.
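By way of illustration, such a time stamped user behavior characteristic can be thought of as one compact, self-contained record. The following is a minimal sketch in Python; all field names and types are illustrative assumptions and are not prescribed by this description.

```python
# Minimal sketch of a timestamped user behavior sample, covering a subset of
# the characteristics listed above. Field names are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class UserBehaviorSample:
    timestamp_ms: int                                             # time stamp within the user session
    head_pose: Optional[Tuple[float, ...]] = None                 # position + orientation quaternion (7 floats)
    gaze_direction: Optional[Tuple[float, float, float]] = None   # unit vector in scene coordinates
    gaze_point: Optional[Tuple[float, float, float]] = None       # point of regard in the virtual scene VRS
    pupil_size_mm: Optional[float] = None
    event: Optional[str] = None                                   # e.g. "fixation", "blink", "controller_input"
```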
[0066] On the basis of the captured user behavior characteristics it can be determined, for example, where a user is looking with respect to the displayed virtual scene VRS, or from which virtual point of view or perspective a user is currently looking at the displayed virtual scene VRS. These user behavior characteristics can now advantageously be transmitted in the form of user behavior data UD to the second device 16 and be combined with the reference data that are, e.g. a priori, present on the second device 16. Therefore, the data relating to the virtual scene VRS, namely the reference data, do not have to be transmitted from the first device 14 to the second device 16 together with the user behavior data UD via the network 12; the data to be transmitted can thus be reduced to a minimum while still allowing for a full recreation of the user behavior with respect to the virtual scene VRS.
[0067] So, for example, when the user associated with the first device 14 moves and interacts with a known virtual environment, which is displayed in the form of the virtual scene VRS, e.g. when playing a game or walking through a virtual supermarket, it is only necessary to make information about the user's current state available on the second device 16 to recreate the user experience there. The recreation may also be intentionally altered, e.g. by upscaling or downscaling the resolution, for example in the region of the virtual scene VRS that comprises the user's current gaze point. In both a static and an interactive virtual environment, the unknown component is how the user moves and interacts with it, whereas the known component is the virtual environment itself. So advantageously only the user behavior characteristics with regard to the virtual environment, e.g. defined with respect to a defined coordinate system that is associated with and fixed with respect to the virtual scene VRS, need to be captured and transmitted from the first device 14 to the second device 16; the second device 16 is already provided with the data describing the virtual scene VRS, namely the reference data VRD, and can therefore combine these reference data VRD with the transmitted user behavior data UD to reconstruct the user behavior with regard to the virtual scene VRS. For this purpose, namely for the combination and recreation of the user behavior, the second device 16 can comprise a processing unit 24 with a data storage, in which the reference data VRD can be stored. Furthermore, the second device 16 can also comprise a display device 26, like a monitor, to display the result of the recreation of the user behavior with regard to the virtual scene VRS. For example, the virtual scene VRS can be displayed on the display device 26 from the same perspective from which the user associated with the first device 14 sees the virtual scene VRS displayed by the first device 14.
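The combining step on the second device 16 can be pictured as replaying the time stamped samples against the locally stored scene model. A hedged sketch under assumed interfaces; `scene_model.render`, `view.draw_marker` and the `display` callback are hypothetical stand-ins for whatever rendering facility the second device uses:

```python
def recreate_session(scene_model, samples, display):
    """Replay user behavior against the locally stored reference data VRD.

    `scene_model` is the a-priori known virtual scene VRS; only the compact
    samples had to be transmitted over the network."""
    for sample in sorted(samples, key=lambda s: s.timestamp_ms):
        # Render the already-known virtual scene from the recorded pose ...
        view = scene_model.render(pose=sample.head_pose)
        # ... and overlay the gaze point captured at the same time stamp.
        if sample.gaze_point is not None:
            view.draw_marker(sample.gaze_point)
        display(view)
```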
[0068] Moreover, the reaction of the environment can be either deterministic or non-deterministic. In case of a deterministic virtual scene VRS, for the purpose of recreating the user experience, only user data, namely the user behavior characteristics as described above, are captured and made available to a third party or its technical device, like the second device 16, especially to at least one computer, host, or server of the third party. The third party or its technical device, like the second device 16, has access to the virtual scene VRS, especially through the provision of the reference data VRD on the second device 16, and uses the temporally and/or regionally marked captured user data, transmitted in the form of the user behavior data UD, to recreate the user experience and make it available.
[0069] In case of a non-deterministic scene, e.g. when the virtual scene VRS, especially the scene content, changes in response to a certain user action, it may be useful to capture not only the user state in the form of the user behavior characteristic, but also the scene state in a temporally or regionally marked fashion. The captured scene data, which are provided in the form of the reference data VRD, among others may then include a subset of:
[0070] scene events and state changes,
[0071] dynamic scene data,
[0072] random scene content.
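For this non-deterministic case, a replay has to interleave two time stamped streams: the user behavior samples and the recorded scene events. A minimal sketch, assuming both record types expose a `timestamp_ms` attribute:

```python
import heapq

def merge_streams(user_samples, scene_events):
    """Yield user samples and scene events interleaved in time stamp order,
    so the replay applies each scene state change at the right moment.

    Assumes each input stream is already sorted by time stamp, which holds
    for sequentially recorded session data."""
    yield from heapq.merge(user_samples, scene_events,
                           key=lambda record: record.timestamp_ms)
```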
[0073] This process or procedure also reduces the data needed to replay the session on the second device 16 to the minimum of data necessary to be transmitted via the network 12, because, e.g., only the information about a certain event or change of the scene state, but not the scene content itself, needs to be transmitted. Also, the data can be streamed in real time or stored for later usage. Moreover, the state of the virtual scene VRS may not only change in response to a certain user action, but such a change can also be controlled or initiated by the second user, like a supervisor or observer, associated with the second device 16. For example, a second user associated with the second device 16 can, by means of the second device 16, initiate a calibration of the eye tracker 20a, 20b of the first device 14, which causes the displays 18 to show a virtual scene VRS with calibration points. Such control commands can also be transmitted via the network 12 in the form of control data CD from the second device 16 to the first device 14. This advantageously allows for real time interaction between the users of the first device 14 and the second device 16, respectively.
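A control command such as the remotely triggered calibration could be serialized as a small message. The JSON layout below is purely an illustrative assumption; the description does not prescribe a message format:

```python
import json

def make_control_command(command, **params):
    """Encode control data CD sent from the second device 16 to the first device 14."""
    return json.dumps({"type": "control", "command": command, "params": params})

# e.g. remotely start an eye tracker calibration with nine calibration points
message = make_control_command("start_calibration", points=9)
```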
[0074] Further, the invention is beneficial with current CPU/GPU architectures, where a transmission of the scene by the CPU would require a GPU memory access.
[0075] This system 10a allows for many advantageous applications, like a live streaming of one participant, like the user associated with the first device 14, to one client PC, like the second device 16; a live streaming to let other users watch what the user associated with the first device 14 is doing; or a recording on a mobile device, like the first device 14, with a later import by the second device 16.
[0076] For a live streaming of one participant to one client PC, the method and system according to the invention or its embodiments allow for reducing the bandwidth requirements for transmitting eye tracking data of a mobile user, like the user associated with the first device 14, or also of a mobile user group, each user of the group associated with a respective first device 14, sharing the same augmented reality/virtual reality application. For this purpose, a user is wearing a virtual reality head mounted display, like the first device 14, and is interacting with virtual content, while the eye tracker 20a, 20b tracks the user's gaze. The information on position, orientation, user action and gaze is transmitted to an observer station, like the second device 16, which uses the same virtual reality model, provided by the reference data VRD, to re-render or newly render the scene including the user's gaze behavior in it. Thus the observer can see the user's interactions, perceptions and performances in order to control, guide and/or monitor the user's behaviors.
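On the wire, each sample could be packed into a fixed size binary record, so that the stream carries only time stamp, pose and gaze rather than rendered frames. The following is a sketch under assumed field choices (the description does not prescribe a wire format); `capture_sample` is a hypothetical callback into the eye tracker and pose sensors:

```python
import struct
import time

# One sample: time stamp (double) + head position (3f) + gaze direction (3f)
# + orientation quaternion (4f) + gaze point (3f) = 60 bytes in total.
PACKET = struct.Struct("<d3f3f4f3f")

def stream_session(sock, capture_sample, rate_hz=90):
    """Continuously send compact user behavior samples to the observer station."""
    while True:
        t, pos, gaze_dir, quat, gaze_pt = capture_sample()
        sock.send(PACKET.pack(t, *pos, *gaze_dir, *quat, *gaze_pt))
        time.sleep(1.0 / rate_hz)  # e.g. a 90 Hz HMD sample rate
```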
[0077] According to one possible implementation, a setup can be used where the same application is compiled for an HMD (head mounted display) device, like the first device 14, as well as for a PC, like the second device 16. Both applications know about the scene which will be rendered. Moreover, the application, especially the virtual scene VRS provided by the application, is rendered live on the user system, namely the first device 14. This system, namely the first device 14, may include a mobile device to run the application, a network connection, like the network interface 17a, to transfer the data, or a local memory to store them, a head mounted display to generate a virtual reality experience, and a controller to interact with the application. The session can then be replayed on a desktop PC, like the second device 16, using the generated data. For this, the observing application on the second device 16 re-renders or renders anew the scene and generates the same view as shown on the HMD of the first device 14. This can be used to guide and observe the user associated with the first device 14, and to analyze and/or aggregate the gaze perception data with other user data. A live connection between the user system, namely the first device 14, and the observing system, namely the second device 16, can also be used to remotely trigger events on the user system, e.g. by the above described control data CD.
[0078] Both applications, the virtual reality application on the first device 14 as well as the observation application on the second device 16, know about the data describing the shown scene, namely the reference data VRD. These may include the 3D virtual reality model, reactions to input events, and animations or visualizations. Therefore, a system and method are provided for streaming a user's pose, eye tracking data and events of one participant to one client PC, like the second device 16, and events of the client PC to one participant, like the first device 14, comprising a controller client, like the second device 16, and a group of mobile client devices, like the first device 14. The user's system, like the first device 14, will connect to the client PC, like the second device 16, and continuously stream pose data, eye tracking data and triggered events. The client PC will send triggered events, e.g. starting a calibration, to the user associated with the first device 14. The network 12 in this example may be a local area network or a peer-to-peer network, wireless or cabled.
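The controller-client side could then accept connections from the group of mobile devices and consume their fixed size sample stream. A sketch only; the port number and framing are assumptions, and `process_sample` stands in for whatever downstream handling (rendering, aggregation, recording) is wanted:

```python
import asyncio

SAMPLE_SIZE = 60  # matches the fixed 60-byte packet sketched above

def process_sample(packet: bytes) -> None:
    """Hypothetical downstream handler; a real observer would decode and render."""

async def handle_mobile_client(reader, writer):
    """Receive the continuous pose/gaze/event stream of one mobile client."""
    try:
        while True:
            packet = await reader.readexactly(SAMPLE_SIZE)
            process_sample(packet)
    except asyncio.IncompleteReadError:
        pass  # mobile client disconnected

async def main():
    server = await asyncio.start_server(handle_mobile_client, "0.0.0.0", 7878)
    async with server:
        await server.serve_forever()

# asyncio.run(main())  # run on the controller client
```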
[0079] For an application like a live streaming to let other users watch what the user associated with the first device 14 is doing, a similar implementation of the system 10a can be used as described above, but now the user data, namely the user behavior data UD, are transmitted via the internet (or an intranet) as the network 12, and either a cloud service or the recipient's processing unit, like the second device 16, recreates the user's view.
[0080] According to another example, for recording on a mobile device and a later import, the system 10a can be configured to save the user's pose, eye tracking data and events locally on the device, namely the first device 14, itself, and a system (PC), like the second device 16, is capable of importing the recorded file and running the scene. Using the recorded data, the view will move in the same way the user did, and the recorded events will be triggered as well.
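This record-and-import variant only changes where the samples go: to a local file instead of a socket. A sketch reusing the `UserBehaviorSample` type from above; the JSON-lines file layout is an assumption, and any serialization that preserves the time stamps would do:

```python
import json
from dataclasses import asdict

def record_session(samples, path):
    """Save the captured samples locally on the first device 14."""
    with open(path, "w") as f:
        for sample in samples:
            f.write(json.dumps(asdict(sample)) + "\n")

def import_session(path):
    """Later import on the second device 16 for replay against the known scene."""
    with open(path) as f:
        return [UserBehaviorSample(**json.loads(line)) for line in f]
```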
[0081] According to another example of the invention, the user's pose, eye tracking data and events can also be streamed into a cloud and be collected and rendered there, which is illustrated schematically in the figures.
[0083] In this example, the displaying of the virtual scene VRS, the capturing of the corresponding user behavior characteristics, the transmitting of the user behavior data UD, as well as the reconstruction and displaying of the user behavior on the second device 16, are performed continuously in the form of live streaming in real time.
[0085] To conclude, the invention and its embodiments allow for a plurality of advantageous applications, especially in the fields of market research, scientific research, training of user behavior with mobile participants, and game/experience streaming for online broadcast, or in an arrangement in which an SDK (software development kit) user provides a configured app to a server, a supervisor controls the app, interacts with the participants' clients and especially monitors the collective behavior of the participants, and a group of mobile eye tracked participants runs the configured application.
[0086] Great advantages can be achieved by the invention or its embodiments, because the data necessary to be transmitted during a user session can be reduced to the user's pose, the user's actions, and the user's current state, including (but not limited to) eye tracking, emotional states and facial expression data, for the purpose of recording, analyzing, streaming or sharing the user session.
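As a back-of-the-envelope check of this bandwidth claim (the figures are illustrative assumptions, not measurements): a 60-byte sample at 90 Hz amounts to a few kilobytes per second, orders of magnitude below streaming the rendered scene as video:

```python
sample_rate_hz = 90           # assumed HMD sample rate
bytes_per_sample = 60         # fixed packet size from the sketch above
behavior_stream = sample_rate_hz * bytes_per_sample    # 5,400 B/s, about 5.3 KB/s

video_stream = 8_000_000 / 8  # a modest 8 Mbit/s video stream, in bytes per second
print(f"{100 * behavior_stream / video_stream:.2f}% of the video bandwidth")  # ~0.54%
```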
[0087] The invention or its embodiments make it possible to transmit, stream and record user behavior in a virtual reality environment, like a mobile virtual environment, with minimal processing and bandwidth overhead. User behavior is encoded and transmitted in parallel to the user's interaction with the virtual environment. The encoded data can be interpreted by an independent processing unit to recreate the user's behavior.
[0088] Therefore, the invention or its embodiments allow for field tests with concurrent HMD users in real time, for reducing the bandwidth required to transmit the user scene, for recording a user session independently of the user's display or interaction device, and for reducing the bandwidth demand needed for transmission, consequently enabling the analysis of user perception at a central data location.
LIST OF REFERENCE SIGNS
[0089] 10a, 10b system
[0090] 12 network
[0091] 14 first device
[0092] 16 second device
[0093] 17a, 17b network interface
[0094] 18 displaying means
[0095] 20a eye tracking module
[0096] 20b eye camera
[0097] 21 processing unit of the first device
[0098] 22 capturing means
[0099] 24 processing unit of the second device
[0100] 26 display device
[0101] 28 third device
[0102] CD control data
[0103] UD user behavior data
[0104] VRD reference data
[0105] VRS virtual scene