G06V40/176

METHOD AND SYSTEM FOR PROCESSING CONFERENCE USING AVATAR

A non-transitory computer-readable recording medium may store instructions that, when executed by a processor, cause the processor to set a communication session for a conference in which a plurality of users participates through a server, transmit, to the server, an identifier of an avatar to be represented on a virtual space for the conference and coordinate information of the avatar in the virtual space, receive, from the server, resources of neighboring avatars selected based on the coordinate information, transmit, to the server, motion data of the avatar through the communication session, receive, from the server, motion data of the neighboring avatars through the communication session, and represent the neighboring avatars on the virtual space based on the resources of the neighboring avatars and the motion data of the neighboring avatars.
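The client-side rendering step described above can be sketched as follows. This is a hypothetical illustration, not the patent's API: the data shapes and the function name `renderable_neighbors` are assumptions. A neighbor is renderable only once both its resource (fetched via the coordinate information) and its streamed motion data are available.

```python
def renderable_neighbors(resources, motion_updates):
    """Illustrative sketch: resources and motion_updates map avatar_id -> payload.

    Only neighbors for which both a resource and motion data have arrived
    can be represented on the virtual space, so return the complete pairs.
    """
    return {aid: {"resource": resources[aid], "motion": motion_updates[aid]}
            for aid in resources.keys() & motion_updates.keys()}
```

For example, if a neighbor's resource has been received but its motion data has not yet arrived over the session, it is simply omitted from the current render pass.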

System and method for tracking physical characteristics of multifunction peripheral users to determine device user satisfaction

A system and method for tracking multifunction peripheral device user satisfaction captures user images and audio during device operation. User characteristics such as gestures, posture, spoken words, or facial expressions are used in conjunction with device status information to determine whether the user is satisfied with the device. If not, remedial action is initiated, such as launching a virtual assistant on the multifunction peripheral touchscreen or summoning a human assistant.
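The decision logic above can be sketched in a few lines. The specific cue names and the escalation rule are assumptions for illustration only; the abstract says only that user characteristics and device status jointly drive the remedial action.

```python
# Hypothetical negative cues; the patent does not enumerate them.
NEGATIVE_CUES = {"frown", "slumped_posture", "raised_voice", "complaint_word"}

def needs_remediation(observed_cues, device_status):
    """Return a remedial action when the user appears dissatisfied, else None."""
    dissatisfied = bool(NEGATIVE_CUES & set(observed_cues))
    if not dissatisfied:
        return None
    # Illustrative escalation rule: device errors summon a human,
    # otherwise try the on-screen virtual assistant first.
    return "summon_human" if device_status == "error" else "launch_virtual_assistant"
```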

METHOD AND SYSTEM FOR DETECTING AND RECOGNIZING TARGET IN REAL-TIME VIDEO, STORAGE MEDIUM, AND DEVICE

This disclosure provides a method and a system for detecting and recognizing a target object in a real-time video. The method includes: determining whether a target object recognition result R.sub.X-1 of a previous frame of image of a current frame of image is the same as a target object recognition result R.sub.X-2 of a previous frame of image of the previous frame of image; performing target object position detection in the current frame of image by using a first-stage neural network to obtain a position range C.sub.X of a target object in the current frame of image when the two recognition results R.sub.X-1 and R.sub.X-2 are different; or determining a position range C.sub.X of a target object in the current frame of image according to a position range C.sub.X-1 of the target object in the previous frame of image when the two recognition results R.sub.X-1 and R.sub.X-2 are the same; and performing target object recognition in the current frame of image according to the position range C.sub.X by using a second-stage neural network. Therefore, the operating frequency of the first-stage neural network used for position detection is reduced, the recognition speed is accelerated, and the usage of CPU and internal memory resources is reduced.
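The gating scheme above reduces to a small control-flow decision: run the first-stage detector only when the last two recognition results disagree, otherwise reuse the previous position range. A minimal sketch, with `detect` and `recognize` standing in for the two neural networks:

```python
def process_frame(frame, prev_range, r_prev, r_prev2, detect, recognize):
    """Illustrative sketch; detect/recognize stand in for the two-stage networks.

    r_prev and r_prev2 are the recognition results R_{X-1} and R_{X-2}.
    """
    if r_prev != r_prev2:
        # Results changed: re-run first-stage position detection on this frame.
        c_x = detect(frame)
    else:
        # Results stable: reuse the previous frame's position range C_{X-1},
        # skipping the expensive first-stage network.
        c_x = prev_range
    # Second-stage recognition always runs, within the position range C_X.
    r_x = recognize(frame, c_x)
    return c_x, r_x
```

Because the detector is skipped whenever recognition is stable across frames, its operating frequency (and the associated CPU and memory usage) drops in steady scenes.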

Automated sign language translation and communication using multiple input and output modalities
11557152 · 2023-01-17

Methods, apparatus and systems for recognizing sign language movements using multiple input and output modalities. One example method includes capturing a movement associated with the sign language using a set of visual sensing devices, the set of visual sensing devices comprising multiple apertures oriented with respect to the subject to receive optical signals corresponding to the movement from multiple angles, generating digital information corresponding to the movement based on the optical signals from the multiple angles, collecting depth information corresponding to the movement in one or more planes perpendicular to an image plane captured by the set of visual sensing devices, producing a reduced set of digital information by removing at least some of the digital information based on the depth information, generating a composite digital representation by aligning at least a portion of the reduced set of digital information, and recognizing the movement based on the composite digital representation.
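The depth-based reduction and alignment steps can be sketched as below. The sample structure, depth threshold, and angle-based alignment are assumptions for illustration; the patent describes only removing digital information based on depth and aligning what remains into a composite.

```python
def reduce_and_compose(samples, max_depth):
    """Illustrative sketch: samples is a list of dicts with 'angle', 'depth',
    and 'data' keys, one per optical sample from a given aperture.
    """
    # Remove samples whose depth places them outside the signing space
    # (depth is measured in planes perpendicular to the image plane).
    reduced = [s for s in samples if s["depth"] <= max_depth]
    # Align the remaining samples by viewing angle to form a composite
    # digital representation for recognition.
    aligned = sorted(reduced, key=lambda s: s["angle"])
    return [s["data"] for s in aligned]
```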

AVATAR GENERATION IN A VIDEO COMMUNICATIONS PLATFORM

Methods, systems, and apparatus, including computer programs encoded on computer storage media, relate to a method for generating an avatar within a video communication platform. The system may receive a selection of an avatar model from a group of one or more avatar models. The system receives a first video stream and audio data of a first video conference participant. The system analyzes image frames of the first video stream to determine a group of pixels representing the first video conference participant. The system determines a plurality of facial expression parameter values associated with the determined group of pixels. Based on the determined plurality of facial expression parameter values, the system generates a first modified video stream depicting a digital representation of the first video conference participant in an avatar form.
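The per-frame pipeline above composes naturally as three stages: segmentation, expression estimation, avatar rendering. A hypothetical sketch in which all three callables are stand-ins, not the platform's actual components:

```python
def avatar_frame(frame, segment, estimate_params, render_avatar):
    """Illustrative per-frame pipeline; every callable is a stand-in."""
    pixels = segment(frame)            # group of pixels depicting the participant
    params = estimate_params(pixels)   # facial expression parameter values
    return render_avatar(params)       # avatar depiction driven by those values

def modified_stream(frames, segment, estimate_params, render_avatar):
    """Apply the pipeline to each image frame of the incoming video stream."""
    return [avatar_frame(f, segment, estimate_params, render_avatar)
            for f in frames]
```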

Multimodal inputs for computer-generated reality
11698674 · 2023-07-11

Implementations of the subject technology provide determining an operating mode of an electronic device based at least in part on whether the electronic device is communicatively coupled to an associated base device. Based on the determined operating mode, the subject technology identifies a set of input modalities for initiating a recording of content within a field of view of the electronic device. The subject technology monitors sensor information generated by at least one sensor included in, or communicatively coupled to, the electronic device. Further, the subject technology initiates the recording of content within the field of view of the electronic device when the monitored sensor information indicates that at least one of the identified set of input modalities has been triggered.
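The mode-dependent gating described above can be sketched as follows. The specific modality names are assumptions; the abstract states only that the set of input modalities is identified from the operating mode, which in turn depends on whether the device is coupled to its base device.

```python
def input_modalities(coupled_to_base):
    """Identify which input modalities may initiate recording in this mode."""
    modalities = {"hardware_button"}          # illustrative always-available input
    if coupled_to_base:
        # Hypothetical assumption: coupling to the base device enables
        # richer sensor-driven inputs.
        modalities |= {"voice_command", "hand_gesture"}
    return modalities

def maybe_start_recording(coupled_to_base, triggered_modality):
    """Initiate recording only if the triggering modality is in the
    identified set for the current operating mode."""
    return triggered_modality in input_modalities(coupled_to_base)
```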

ANIMATED CHAT PRESENCE
20230216901 · 2023-07-06

The present invention relates to a method for generating and causing display of a communication interface that facilitates the sharing of emotions through the creation of 3D avatars, and more particularly with the creation of such interfaces for displaying 3D avatars for use with mobile devices, cloud based systems and the like.

Apparatus and method for associating images from two image streams

An apparatus configured to, based on first imagery (301) of at least part of a body of a user (204), and contemporaneously captured second imagery (302) of a scene, the second imagery comprising at least a plurality of images taken over time, and based on expression-time information indicative of when a user expression of the user (204) occurs, provide a time window (303) temporally extending from a first time (t−1) prior to the time (t) of the expression-time information, to a second time (t−5) comprising a time equal to or prior to the first time (t−1), the time window (303) provided to identify at least one expression-causing image (305) from the plurality of images of the second imagery (302) that was captured in said time window, and provide for recordal of the at least one expression-causing image (305) with at least one expression-time image (306) comprising at least one image from the first imagery (301).
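The time-window lookup can be sketched directly from the abstract's example, where the window runs from t−1 back to t−5. The stream representation and function name are illustrative assumptions.

```python
def associate_images(user_stream, scene_stream, expression_time, window=4):
    """Illustrative sketch: streams are dicts mapping capture time -> image.

    The window extends from a first time (t - 1), just before the expression,
    back to a second time (t - 1 - window), matching the abstract's
    t-1 .. t-5 example when window=4.
    """
    first = expression_time - 1
    second = first - window
    # Candidate expression-causing images: scene images captured in the window.
    causing = [img for ts, img in sorted(scene_stream.items())
               if second <= ts <= first]
    # Expression-time image from the user-facing first imagery.
    expression_img = user_stream.get(expression_time)
    # Record the candidate cause(s) together with the reaction image.
    return {"expression_image": expression_img, "causing_images": causing}
```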

SYSTEMS AND METHODS FOR AUTOMATED REAL-TIME GENERATION OF AN INTERACTIVE AVATAR UTILIZING SHORT-TERM AND LONG-TERM COMPUTER MEMORY STRUCTURES

Systems and methods enable rendering of an avatar attuned to a user. The systems and methods include receiving audio-visual data of user communications of a user. Using the audio-visual data, the systems and methods may determine vocal characteristics of the user, facial action units representative of facial features of the user, and speech of the user based on a speech recognition model and/or natural language understanding model. Based on the vocal characteristics, an acoustic emotion metric can be determined. Based on the recognized speech, a speech emotion metric may be determined. Based on the facial action units, a facial emotion metric may be determined. An emotional complex signature may be determined to represent an emotional state of the user for rendering the avatar attuned to the emotional state based on a combination of the acoustic emotion metric, the speech emotion metric and the facial emotion metric.
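The final combination step can be sketched as a per-label fusion of the three modality metrics. The weighted-average fusion rule is an assumption for illustration; the abstract says only that the signature is based on a combination of the three metrics.

```python
def emotional_signature(acoustic, speech, facial, weights=(1.0, 1.0, 1.0)):
    """Illustrative fusion: each metric maps emotion label -> score in [0, 1].

    Returns a hypothetical 'emotional complex signature' as a weighted
    average of the acoustic, speech, and facial emotion metrics per label.
    """
    wa, ws, wf = weights
    labels = set(acoustic) | set(speech) | set(facial)
    total = wa + ws + wf
    return {label: (wa * acoustic.get(label, 0.0)
                    + ws * speech.get(label, 0.0)
                    + wf * facial.get(label, 0.0)) / total
            for label in labels}
```

A label missing from one modality simply contributes zero from that modality, so a strong facial cue can still dominate the signature when the audio channel is silent.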

Second factor authentication of electronic devices

A method for multi-factor authentication receives results of an initial authentication of a user. Responsive to confirming the initial authentication, an image of a secondary set of authentication options is presented. An option selection is received from the user, wherein the selection is determined by tracking eye movement of the user over the image that includes the secondary set of authentication options. User facial activity is tracked corresponding to the selection made from the secondary set of authentication options. The monitored facial activity is compared to a pre-established authentication condition to determine whether a match exists for the selected option, and responsive to the monitored facial activity matching the authentication condition pre-established by the user and corresponding to the selection, authentication of the user is confirmed.
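The second-factor check reduces to a short flow: first factor must have succeeded, an option is chosen by gaze, and the tracked facial activity must match the condition the user pre-established for that option. A minimal sketch, with all names and the string-matching comparison as illustrative assumptions:

```python
def second_factor(initial_ok, gaze_selection, facial_activity, enrolled_conditions):
    """Illustrative sketch.

    enrolled_conditions: option -> facial activity the user pre-established
    at enrollment (e.g. a gesture label); gaze_selection is the option the
    eye-tracking step resolved to.
    """
    if not initial_ok:
        return False  # the initial authentication was not confirmed; stop here
    expected = enrolled_conditions.get(gaze_selection)
    # Authentication is confirmed only when the monitored facial activity
    # matches the pre-established condition for the selected option.
    return expected is not None and facial_activity == expected
```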