G06V40/175

Method and Apparatus for Generating Landmark

Provided is a method of transforming a landmark, the method including: receiving an input image that includes a facial image of a first person and a landmark corresponding to the facial image; estimating a transformation matrix corresponding to the landmark; and calculating, by using the transformation matrix, an expression landmark and an identity landmark corresponding to the input image.
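The abstract does not say how the transformation matrix is estimated or how the expression and identity landmarks are separated. A minimal sketch, assuming a 2-D similarity transform fitted by least squares (Umeyama's method) against a mean-shape `template`, and assuming a hypothetical neutral-expression landmark of the same person serves as the identity reference, could look like this with NumPy:

```python
import numpy as np


def estimate_similarity(src, dst):
    """Least-squares 2-D similarity transform (Umeyama, 1991) mapping src onto dst.

    Returns a 3x3 homogeneous matrix [[s*R, t], [0, 1]].
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                  # 2x2 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(2)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[1, 1] = -1.0                          # forbid a reflection
    R = U @ D @ Vt
    scale = np.trace(np.diag(S) @ D) / (xs ** 2).sum(axis=1).mean()
    t = mu_d - scale * R @ mu_s
    M = np.eye(3)
    M[:2, :2] = scale * R
    M[:2, 2] = t
    return M


def apply_transform(M, pts):
    """Apply a 3x3 homogeneous 2-D transform to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ M[:2, :2].T + M[:2, 2]


def split_landmark(landmark, neutral_landmark, template):
    """HYPOTHETICAL split: identity = aligned neutral landmark,
    expression = aligned input landmark minus identity."""
    M = estimate_similarity(landmark, template)            # transformation matrix
    M_neutral = estimate_similarity(neutral_landmark, template)
    identity = apply_transform(M_neutral, neutral_landmark)
    expression = apply_transform(M, landmark) - identity
    return identity, expression, M
```

In this sketch the identity landmark is whatever person-specific shape survives alignment and the expression landmark is the residual; the patented method may separate them quite differently.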

NORMALIZED THREE-DIMENSIONAL AVATAR SYNTHESIS AND PERCEPTUAL REFINEMENT
20220222892 · 2022-07-14

A system, method, and apparatus generate a normalized three-dimensional model of a human face from a single unconstrained two-dimensional image of that face. The system includes a processor that executes instructions to: receive the single unconstrained two-dimensional image; use an inference network to infer a normalized three-dimensional model of the face from that image; and use a refinement network to iteratively refine the inferred model into a normalized three-dimensional model of the face with a neutral expression and unshaded albedo textures under diffuse lighting conditions.
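The abstract does not describe the inference or refinement networks. As a rough illustration of what a "neutral expression" normalization means, the sketch below assumes a linear 3-D morphable model (a mean shape plus identity and expression bases, all hypothetical inputs) and neutralizes a face by zeroing its expression coefficients; albedo and lighting normalization are not covered.

```python
import numpy as np


def reconstruct_face(mean_shape, id_basis, exp_basis, alpha, beta):
    """ASSUMED linear morphable model: mean + identity offsets + expression offsets.

    mean_shape: (V, 3) vertices; id_basis: (3V, K_id); exp_basis: (3V, K_exp);
    alpha: (K_id,) identity coefficients; beta: (K_exp,) expression coefficients.
    """
    flat = mean_shape.reshape(-1) + id_basis @ alpha + exp_basis @ beta
    return flat.reshape(-1, 3)


def neutralize_expression(mean_shape, id_basis, exp_basis, alpha, beta):
    """Keep the identity coefficients and zero the expression coefficients."""
    return reconstruct_face(mean_shape, id_basis, exp_basis, alpha, np.zeros_like(beta))
```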

MULTIVIEW NEURAL HUMAN PREDICTION USING IMPLICIT DIFFERENTIABLE RENDERER FOR FACIAL EXPRESSION, BODY POSE SHAPE AND CLOTHES PERFORMANCE CAPTURE
20220319055 · 2022-10-06

A neural human performance capture framework (MVS-PERF) captures the skeleton, body shape, clothes displacement, and appearance of a person from a set of calibrated multiview images. It addresses the ambiguity of predicting absolute position in monocular human mesh recovery, and bridges the gap between the volumetric representation of NeRF and animation-friendly performance capture. MVS-PERF includes three modules, which respectively: extract feature maps from the multiview images and fuse them into a feature volume; regress the feature volume to a naked-human parameter vector, generating an SMPL-X skin-tight body mesh with a skeletal pose, body shape, and expression; and leverage a neural radiance field and a deformation field to infer the clothes as a displacement on the naked body using differentiable rendering. The clothed body mesh is obtained by adding the interpolated displacement vectors to the vertices of the SMPL-X skin-tight body mesh. The resulting radiance field is used for free-view volumetric rendering of the input subject.
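The final step, adding interpolated displacement vectors to the SMPL-X vertices, can be sketched as follows. The abstract does not name the interpolation scheme, so this sketch assumes inverse-distance weighting over the `k` nearest points where a displacement is known; the function names are hypothetical.

```python
import numpy as np


def interpolate_displacements(vertices, sample_points, sample_disp, k=4, power=2.0, eps=1e-8):
    """ASSUMED inverse-distance-weighted interpolation of displacement vectors.

    vertices: (V, 3) skin-tight body-mesh vertices.
    sample_points: (S, 3) points where a displacement is known.
    sample_disp: (S, 3) displacement vector at each sample point.
    """
    d = np.linalg.norm(vertices[:, None, :] - sample_points[None, :, :], axis=-1)  # (V, S)
    k = min(k, len(sample_points))
    idx = np.argsort(d, axis=1)[:, :k]              # k nearest samples per vertex
    dk = np.take_along_axis(d, idx, axis=1)         # (V, k) distances to them
    w = 1.0 / (dk + eps) ** power
    w /= w.sum(axis=1, keepdims=True)               # weights sum to 1 per vertex
    return np.einsum("vk,vkc->vc", w, sample_disp[idx])


def clothe_mesh(vertices, sample_points, sample_disp, **kwargs):
    """Clothed vertices = skin-tight vertices + interpolated displacements."""
    return vertices + interpolate_displacements(vertices, sample_points, sample_disp, **kwargs)
```

Because the weights for each vertex sum to one, a uniform displacement field is reproduced exactly, and a vertex that coincides with a sample point receives (almost exactly) that point's displacement.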

Generating an automatic virtual photo album

The present disclosure relates to systems and methods for generating an automatic virtual photo album. The system receives a signal configured to enable a rear camera and a front camera of a device to capture a set of images. The set of images comprises a subset of front images and a subset of rear images. The system analyses the subset of front images and, based on the analysis, extracts the photographer's mood. It then links the photographer's mood with the subset of rear images and, upon linking, generates an automatic photo album.
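The linking step can be sketched as grouping rear images by the mood read from the front image captured alongside them. The `Capture` record and the `classify_mood` callable are hypothetical stand-ins; the abstract does not say how the mood is extracted or how the album is organized.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Capture:
    """HYPOTHETICAL record of one trigger: a front (photographer) and a rear (scene) image."""
    front_image: str
    rear_image: str


def build_album(captures, classify_mood):
    """Group rear images under the mood extracted from their paired front image.

    classify_mood is a hypothetical callable: front-image id -> mood label.
    """
    album = defaultdict(list)
    for cap in captures:
        mood = classify_mood(cap.front_image)   # analyse the front image
        album[mood].append(cap.rear_image)      # link that mood to the rear image
    return dict(album)
```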

LEARNING APPARATUS, LEARNING SYSTEM, AND NONVERBAL INFORMATION LEARNING METHOD

A learning apparatus includes circuitry. The circuitry receives an input of first label information to be given to a facial expression image indicating the face of a person. The circuitry estimates second label information to be given to the facial expression image based on an interpolated image generated using the facial expression image and line-of-sight information indicating the direction of an annotator's line of sight, the direction being detected when the input is received. The circuitry calculates a difference between the received first label information and the estimated second label information. The circuitry updates a parameter used for estimating the second label information based on the calculated difference.
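The estimate, compare, and update cycle reads like ordinary supervised training. A minimal sketch, assuming a linear estimator over concatenated (hypothetical) interpolated-image features and the annotator's line-of-sight vector, with a squared-error difference, is:

```python
import numpy as np


def training_step(W, image_features, gaze, first_label, lr=0.1):
    """One update: estimate the second label, compare it with the first, adjust W.

    image_features: (F,) features of the interpolated image (hypothetical).
    gaze: (G,) line-of-sight direction of the annotator.
    first_label: (L,) label vector entered by the annotator.
    W: (F + G, L) parameters of an ASSUMED linear estimator.
    """
    x = np.concatenate([image_features, gaze])
    second_label = x @ W                     # estimated second label information
    diff = first_label - second_label        # difference between the two labels
    W = W + lr * np.outer(x, diff)           # gradient step on 0.5 * ||diff||^2
    return W, float(diff @ diff)
```

Repeated calls shrink the difference as long as `lr * ||x||^2` stays below 2; a real system would use a neural network and many images rather than one linear map.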

DETERMINING A MOOD FOR A GROUP
20220189201 · 2022-06-16

A system and method for determining a mood for a crowd is disclosed. In example embodiments, a method includes identifying an event that includes two or more attendees, receiving at least one indicator representing emotions of attendees, determining a numerical value for each of the indicators, and aggregating the numerical values to determine an aggregate mood of the attendees of the event.
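The indicator-to-number-to-aggregate pipeline can be sketched in a few lines. The label-to-value table and the use of the arithmetic mean are assumptions; the abstract only says the numerical values are aggregated.

```python
from statistics import mean

# HYPOTHETICAL mapping from emotion indicators to numerical values.
MOOD_VALUES = {"happy": 1.0, "excited": 0.8, "neutral": 0.0, "bored": -0.5, "sad": -1.0}


def aggregate_mood(indicators, values=MOOD_VALUES):
    """Convert each attendee indicator to a number and average the numbers."""
    scores = [values[ind] for ind in indicators if ind in values]
    if not scores:
        raise ValueError("no recognised indicators")
    return mean(scores)
```

For example, two "happy" indicators and one "sad" indicator give an aggregate of about 0.33 on this scale.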

Detecting Facial Expressions in Digital Images

A method and system for detecting facial expressions in digital images, and applications therefor, are disclosed. Analysis of a digital image determines whether or not a smile and/or blink is present on a person's face. Face recognition, and/or a pose or illumination condition determination, permits application of a specific, relatively small classifier cascade.
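Choosing a small cascade per pose or illumination condition can be sketched with a toy boosted cascade: each stage sums weighted features and rejects as soon as a stage score falls below its threshold. The stage format, the `(pose, illumination)` keys, and the fallback to a frontal cascade are hypothetical, not taken from the patent.

```python
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL stage format: ([(feature_index, weight), ...], stage_threshold)
Stage = Tuple[List[Tuple[int, float]], float]


def run_cascade(features: Sequence[float], stages: List[Stage]) -> bool:
    """Accept only if every stage's weighted score reaches its threshold."""
    for weak_learners, threshold in stages:
        score = sum(weight * features[i] for i, weight in weak_learners)
        if score < threshold:
            return False  # early rejection: later stages are never evaluated
    return True


def detect_smile(features: Sequence[float], pose: str, illumination: str,
                 cascades: Dict[Tuple[str, str], List[Stage]]) -> bool:
    """Select the cascade for this pose/illumination (fallback: frontal, normal) and run it."""
    stages = cascades.get((pose, illumination), cascades[("frontal", "normal")])
    return run_cascade(features, stages)
```

Keeping one short cascade per condition, rather than one large cascade for all conditions, is what lets each detector stay small.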

Control of a computer via distortions of facial geometry
11275435 · 2022-03-15

A system which, with data provided by one or more sensors, detects a user's alteration of the geometries of parts of his face, head, neck, and/or shoulders. It determines the extent of each alteration and normalizes it with respect to the maximum possible range of each alteration so as to assign to each part-specific alteration a numeric score indicative of its extent. The normalized part-specific scores are combined so as to produce a composite numeric code representative of the complete set of simultaneously-executed geometric alterations. Each composite code is translated, or interpreted, relative to an appropriate context defined by an embodiment, an application executing on an embodiment, or by the user. For example, each composite code might be interpreted as, or assigned to, a specific alphanumeric letter, a color, a musical note, etc. Through the use of this system, a user may communicate data and/or commands to a computerized device, while retaining full use of his hands and his voice for other tasks, and while being free to focus his visual attention on something other than the system.
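The normalize, score, and combine steps can be sketched by quantizing each part's normalized alteration and treating the scores as digits of a base-`levels` number. The part names, ranges, number of levels, and the letter table used as the "context" are illustrative assumptions, not taken from the patent.

```python
from typing import Dict, List


def part_score(extent: float, max_range: float, levels: int) -> int:
    """Normalize an alteration by its maximum range and quantize to 0..levels-1."""
    ratio = min(max(extent / max_range, 0.0), 1.0)
    return int(ratio * (levels - 1) + 0.5)     # round half up


def composite_code(extents: Dict[str, float], max_ranges: Dict[str, float],
                   parts: List[str], levels: int = 3) -> int:
    """Combine part scores into one integer, one base-`levels` digit per part."""
    code = 0
    for i, part in enumerate(parts):
        code += part_score(extents.get(part, 0.0), max_ranges[part], levels) * levels ** i
    return code


def interpret(code: int, context: str) -> str:
    """Translate a composite code relative to a context, here a character table."""
    return context[code]
```

With three parts at three levels there are 27 possible codes, which an alphabet-plus-space context of 27 characters covers exactly; a different context (colors, musical notes) would simply use a different table.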

Determining a mood for a group
11301671 · 2022-04-12

A system and method for determining a mood for a crowd is disclosed. In example embodiments, a method includes identifying an event that includes two or more attendees, receiving at least one indicator representing emotions of attendees, determining a numerical value for each of the indicators, and aggregating the numerical values to determine an aggregate mood of the attendees of the event.

Human-machine interaction processing method and apparatus thereof

Embodiments of the present disclosure provide a human-machine interaction processing method and apparatus, a user terminal, a processing server, and a system. On the user terminal side, the method includes: receiving an interaction request voice input by a user, and collecting video data of the user while the user inputs the interaction request voice; obtaining an interaction response voice corresponding to the interaction request voice, where the interaction response voice is obtained according to expression information of the user that is contained in the video data captured while the user inputs the request; and outputting the interaction response voice to the user. The method gives the interaction response voice an emotional tone that matches the user's current emotion, so that human-machine interaction is less monotonous and the user experience is improved.
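One way to read "obtained according to expression information of the user" is to label each video frame, take the dominant expression, and choose a tone for the response voice. The sketch below assumes per-frame expression labels are already available and uses a hypothetical expression-to-tone table; speech synthesis itself is out of scope.

```python
from collections import Counter
from typing import List

# HYPOTHETICAL mapping from a detected facial expression to a response-voice tone.
TONE_FOR_EXPRESSION = {"happy": "cheerful", "sad": "gentle", "angry": "calm", "neutral": "neutral"}


def dominant_expression(frame_expressions: List[str]) -> str:
    """Majority vote over per-frame labels from video collected while the user spoke."""
    if not frame_expressions:
        return "neutral"
    return Counter(frame_expressions).most_common(1)[0][0]


def response_tone(frame_expressions: List[str]) -> str:
    """Pick the tone for the response voice, defaulting to neutral for unknown labels."""
    return TONE_FOR_EXPRESSION.get(dominant_expression(frame_expressions), "neutral")
```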