G06V40/165

ACTIVE SPEAKER DETECTION USING IMAGE DATA
20230068798 · 2023-03-02

A system can operate a speech-controlled device to perform active speaker detection to detect an utterance using image data showing a user speaking the utterance. This enables the device to perform utterance detection using the image data and/or determine which user is speaking the utterance. To perform active speaker detection, the device processes the image data to determine expression parameters associated with the user's face and generates facial measurements based on the expression parameters. For example, the device can use the expression parameters to generate a 3D model including an agnostic facial representation and determine a mouth aspect ratio by measuring a mouth height and a mouth width of the agnostic facial representation. As the mouth aspect ratio changes when the user is speaking, the device can determine that the user is speaking and/or detect an utterance based on an amount of variation of the mouth aspect ratio.
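The mouth-aspect-ratio idea in this abstract can be illustrated with a minimal sketch. The abstract does not define the ratio or the variation measure, so this example assumes the ratio is mouth height divided by mouth width and that variation is the population standard deviation over a window of frames. The function names and the threshold value are hypothetical.

```python
from statistics import pstdev


def mouth_aspect_ratio(mouth_height: float, mouth_width: float) -> float:
    """Assumed definition: mouth height divided by mouth width."""
    if mouth_width <= 0:
        raise ValueError("mouth_width must be positive")
    return mouth_height / mouth_width


def is_speaking(ratios, variation_threshold: float = 0.05) -> bool:
    """Flag speech when the ratio varies enough across a window of frames.

    The standard deviation and the default threshold are illustrative
    assumptions, not values taken from the abstract.
    """
    if len(ratios) < 2:
        return False
    return pstdev(ratios) > variation_threshold


# A closed mouth gives a nearly constant ratio; talking makes it fluctuate.
silent = [mouth_aspect_ratio(2.0, 40.0) for _ in range(10)]
talking = [mouth_aspect_ratio(h, 40.0) for h in (2, 14, 4, 18, 3, 16)]
```

Measuring variation rather than the raw ratio means a mouth that is held open but still would not be flagged as speech.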

Automated avatar generation

Systems, devices, media, and methods are presented for generating facial representations using image segmentation with a client device. The systems and methods receive an image depicting a face, detect at least a portion of the face within the image, and identify a set of facial landmarks within the portion of the face. The systems and methods determine one or more characteristics representing the portion of the face, in response to detecting the portion of the face. Based on the one or more characteristics and the set of facial landmarks, the systems and methods generate a representation of a face.

AGE AND GENDER ESTIMATION

A method of age and gender estimation, comprising receiving an input image, detecting a facial image within the input image, estimating a head pose based on a set of facial image intensities of the facial image, wherein the head pose is expressed as a yaw, a pitch and a roll, determining whether the yaw, the pitch and the roll of the head pose are less than a predetermined threshold, aligning the facial image if the yaw, the pitch and the roll of the head pose are less than the predetermined threshold, and predicting an age and a gender of the aligned facial image.
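The head-pose gate described in this abstract might look like the following sketch. It assumes the angles are in degrees, that each angle's magnitude is compared against a single shared threshold, and that 30 degrees is a reasonable default; none of these details are given in the abstract. `HeadPose` and `pose_within_threshold` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class HeadPose:
    yaw: float    # degrees, assumed unit
    pitch: float  # degrees, assumed unit
    roll: float   # degrees, assumed unit


def pose_within_threshold(pose: HeadPose, threshold_deg: float = 30.0) -> bool:
    """Return True only if all three angle magnitudes are below the threshold."""
    return all(abs(angle) < threshold_deg
               for angle in (pose.yaw, pose.pitch, pose.roll))


# Only faces that pass the gate would go on to alignment and prediction.
frontal = HeadPose(yaw=10.0, pitch=-5.0, roll=3.0)
profile = HeadPose(yaw=45.0, pitch=0.0, roll=0.0)
```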

TAGGING OBJECTS WITH FAULT OVERRIDE PATTERNS DURING CALIBRATION OF VEHICLE SENSING SYSTEMS
20220327319 · 2022-10-13

A vision sensing system of a vehicle comprises a camera, an object detection module, and a calibration module. The object detection module is configured to detect a first object in data received from the camera. The calibration module is configured to calibrate the object detection module to detect the first object in the presence of a second object that obstructs a view of the camera and that includes a predetermined pattern sensed by the camera. A driver monitoring system for a vehicle comprises a camera and a driver monitoring module. The camera is arranged proximate to a steering wheel of the vehicle to monitor a face of a driver of the vehicle. The driver monitoring module is configured to detect an obstruction between the camera and the face of the driver and to ignore the obstruction in response to the obstruction including a predetermined pattern sensed by the camera.

Method, apparatus, device and storage medium for transforming hairstyle

A method, apparatus, device, and storage medium for transforming a hairstyle are provided. The method may include: determining a face bounding box according to information on face key points of an acquired face image; constructing grids according to the face bounding box; deforming, by using an acquired target hairstyle function, edge lines of at least a part of the constructed grids, which comprises the hairstyle, to obtain a deformed grid curve; and determining a deformed hairstyle in the face image according to the deformed grid curve.
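The first two steps of this method (bounding box from key points, then grid construction) can be sketched as follows. The abstract does not specify the grid layout or the target hairstyle function, so this example assumes an axis-aligned box and a uniform grid, and it omits the deformation step. All names are hypothetical.

```python
def face_bounding_box(key_points):
    """Axis-aligned box (left, top, right, bottom) around (x, y) key points."""
    xs = [x for x, _ in key_points]
    ys = [y for _, y in key_points]
    return min(xs), min(ys), max(xs), max(ys)


def build_grid(box, rows: int = 4, cols: int = 4):
    """Uniform grid of vertices over the box (an assumed layout).

    Returns (rows + 1) lists, each holding (cols + 1) vertices as (x, y).
    """
    left, top, right, bottom = box
    xs = [left + (right - left) * c / cols for c in range(cols + 1)]
    ys = [top + (bottom - top) * r / rows for r in range(rows + 1)]
    return [[(x, y) for x in xs] for y in ys]


box = face_bounding_box([(10, 20), (50, 80), (30, 40)])
grid = build_grid(box, rows=2, cols=2)
```

A deformation step would then move the vertices along the grid edges that cover the hair region, which is where the target hairstyle function would apply.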

Live data viewing security

The techniques utilize an authentication process to authenticate the user to view protected data and an image monitoring process to monitor the field of view of the image detection component. When a user requests access to the protected data, the authentication process is activated. After a user is authenticated, the data may be displayed and an image monitoring process is activated and may use the image detection component to monitor the field of view to determine whether the user is actively viewing the data or whether an additional person is in the field of view. When either event is detected, the protected data is concealed at the display of the user device.

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, LEARNING METHOD, AND STORAGE MEDIUM
20220326768 · 2022-10-13

The present invention provides an information processing apparatus that estimates a line of sight of a person. At least one processor of the apparatus executes a first calculation of estimating a direction of a face of the person using a first model that outputs a calculation result of the direction of the face of the person when an image of the face of the person is input, and executes a second calculation of estimating the line of sight of the person using a second model that outputs a calculation result of the line of sight of the person when an image of at least one eye of the person is input. The at least one processor is configured to change coefficients of the second model to be used in the second calculation, according to the direction of the face estimated in the first calculation.

System, method and storage medium for 2D on-screen user gaze estimation

A system and a method are provided for performing 2D on-screen user gaze estimation using an input facial image of a user captured using a camera associated with a processing device having a display. The method and system allow automated user calibration through automatic recording of calibration samples, each including a calibration facial image of the user and an interaction point corresponding to a point on the display where an occurrence of a user interaction was detected when the corresponding calibration image was captured. The system and method also optimize user-specific parameters using the calibration samples by iteratively minimizing a total difference between the interaction points of a plurality of the calibration samples and corresponding 2D gaze estimation results, and convert an estimated 3D gaze direction into a 2D gaze estimation result corresponding to a point on the display by applying the user-specific parameters.
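The iterative calibration step can be illustrated with a deliberately simplified sketch. It assumes the user-specific parameters are just a 2D offset added to the raw on-screen gaze estimate, and that the total difference is a mean squared error minimized by plain gradient descent. The abstract does not state the parameterization or the optimizer, so `fit_offset`, its step count, and its learning rate are all hypothetical.

```python
def fit_offset(samples, steps: int = 200, learning_rate: float = 0.1):
    """Fit a 2D offset (bx, by) by gradient descent.

    samples: list of ((gaze_x, gaze_y), (click_x, click_y)) pairs, where the
    first point is the uncalibrated gaze estimate and the second is the
    recorded interaction point, both in display pixels.
    """
    if not samples:
        raise ValueError("at least one calibration sample is required")
    bx, by = 0.0, 0.0
    n = len(samples)
    for _ in range(steps):
        grad_x = grad_y = 0.0
        for (gx, gy), (cx, cy) in samples:
            # Derivative of the squared error with respect to each offset.
            grad_x += 2 * (gx + bx - cx)
            grad_y += 2 * (gy + by - cy)
        bx -= learning_rate * grad_x / n
        by -= learning_rate * grad_y / n
    return bx, by


# Every click lands 10 px right of and 5 px above the raw gaze estimate.
samples = [((100, 100), (110, 95)), ((200, 50), (210, 45)), ((300, 400), (310, 395))]
bx, by = fit_offset(samples)
```

A real system would likely fit richer parameters than a single offset, but the loop has the same shape: predict, compare against the interaction points, and update.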

Image segmentation and modification of a video stream
11663706 · 2023-05-30

Systems, devices, media, and methods are presented for segmenting an image of a video stream with a client device, identifying an area of interest, generating a modified area of interest within one or more images, identifying a first set of pixels and a second set of pixels, and modifying a color value for the first set of pixels.
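The final step, modifying a color value for one set of pixels while leaving the other untouched, might be sketched as below. The image representation (rows of RGB tuples), the additive color shift, and the name `tint_pixels` are assumptions made for illustration; the abstract does not say how the color is modified.

```python
def tint_pixels(image, first_set, delta):
    """Return a copy of the image with `delta` added to pixels in `first_set`.

    image: list of rows, each a list of (r, g, b) tuples in the 0..255 range.
    first_set: iterable of (row, col) coordinates to modify.
    delta: (dr, dg, db) shift, clamped so each channel stays in 0..255.
    """
    out = [list(row) for row in image]  # copy so the input is not mutated
    for row, col in first_set:
        out[row][col] = tuple(
            max(0, min(255, value + shift))
            for value, shift in zip(out[row][col], delta)
        )
    return out


image = [[(100, 100, 100), (200, 200, 200)]]
result = tint_pixels(image, {(0, 0)}, (100, -150, 0))
```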

Inward/outward vehicle monitoring for remote reporting and in-cab warning enhancements

Systems and methods are provided for intelligent driving monitoring systems, advanced driver assistance systems and autonomous driving systems, and for providing alerts to the driver of a vehicle based on anomalies detected between driver behavior and the environment captured by the outward-facing camera. Various aspects of the driver, which may include direction of sight, point of focus, posture, and gaze, are determined by image processing of the upper visible body of the driver as captured by a driver-facing camera in the vehicle. Other aspects of the environment around the vehicle, captured by the multitude of cameras in the vehicle, are used to correlate driver behavior and actions with what is happening outside in order to detect and warn of anomalies, prevent accidents, provide feedback to the driver, and in general provide a safer driving experience.