G10L21/10

Systems and methods for communicating with vision and hearing impaired vehicle occupants

Methods and systems for controlling an occupant output system associated with a vehicle are provided. The methods and systems receive vehicle or occupant context data from a source of vehicle context data, generate occupant message data based on the vehicle or occupant context data and determine if an occupant associated with the occupant output system is vision or hearing impaired. When the occupant is determined to be vision or hearing impaired, the methods and systems decide on an output modality to assist the occupant, and generate an output for the occupant on the output device, and in the output modality, based on the occupant message data.
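
The modality decision described in this abstract can be sketched as below. This is a minimal illustration, not the claimed method: the `Occupant` fields, the modality names, and the choice of a haptic fallback for an occupant with both impairments are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Occupant:
    # Hypothetical flags; the abstract does not say how impairment is determined.
    vision_impaired: bool = False
    hearing_impaired: bool = False


def choose_modality(occupant: Occupant) -> str:
    """Pick an output modality (names are illustrative assumptions)."""
    if occupant.vision_impaired and occupant.hearing_impaired:
        return "haptic"       # assumed fallback, not stated in the abstract
    if occupant.vision_impaired:
        return "audio"        # spoken message instead of on-screen text
    if occupant.hearing_impaired:
        return "visual_text"  # on-screen text instead of speech
    return "default"


def generate_output(context: str, occupant: Occupant) -> tuple[str, str]:
    """Build occupant message data from context and pair it with a modality."""
    message = f"Vehicle status: {context}"
    return choose_modality(occupant), message
```

For example, `generate_output("arriving at destination", Occupant(hearing_impaired=True))` pairs the message with the `"visual_text"` modality.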

DYNAMIC ADAPTATION OF GRAPHICAL USER INTERFACE ELEMENTS BY AN AUTOMATED ASSISTANT AS A USER ITERATIVELY PROVIDES A SPOKEN UTTERANCE, OR SEQUENCE OF SPOKEN UTTERANCES
20230035713 · 2023-02-02

Implementations described herein relate to an automated assistant that iteratively renders various GUI elements as a user iteratively provides a spoken utterance, or sequence of spoken utterances, corresponding to a request directed to the automated assistant. These various GUI elements can be dynamically adapted as the user iteratively provides the spoken utterance to assist the user with efficiently completing the request. In some implementations, a generic container graphical element associated with candidate intent(s) can be initially rendered at a display interface of a computing device and dynamically adapted with tailored container graphical elements as a particular intent is determined while the user iteratively provides the spoken utterance. In additional or alternative implementations, the tailored container graphical elements can include a current status of one or more settings associated with the computing device or additional computing device(s) such that the user can view the current status while completing the spoken utterance.

SYSTEM, METHOD, AND COMPUTER PROGRAM FOR TRANSMITTING FACE MODELS BASED ON FACE DATA POINTS
20230093132 · 2023-03-23

A system, method, and computer program are provided for receiving face models based on face nodal points. In use, a real-time face model is received, wherein the real-time face model includes one or more face nodal points. Real-time face nodal points are received, including additional one or more face nodal points. The real-time face model is manipulated based on the real-time face nodal points.

METHOD AND DEVICE FOR GENERATING SPEECH VIDEO BY USING TEXT
20220351439 · 2022-11-03

A device for generating a speech video according to an embodiment has one or more processors and a memory storing one or more programs executable by the one or more processors. The device includes a video part generator configured to receive a person background image of a person and generate a video part of a speech video of the person, and an audio part generator configured to receive text, generate an audio part of the speech video of the person, and provide speech-related information occurring during the generation of the audio part to the video part generator.

LEARNING DEVICE AND METHOD FOR GENERATING IMAGE
20220351348 · 2022-11-03

A learning device for generating an image according to an embodiment disclosed is a computing device including one or more processors and a memory storing one or more programs executed by the one or more processors. The learning device includes a first machine learning model that generates a mask for masking a portion related to speech in a person basic image with the person basic image as an input, and generates a person background image by synthesizing the person basic image and the mask.

AUDIO REACTIVE AUGMENTED REALITY

Methods, systems, and storage media for augmenting a video are disclosed. Exemplary implementations may: receive a selection of an effect; receive user-generated content comprising video data and audio data; detect a characteristic of the audio data comprising at least a volume and/or a pitch of the audio data during a period of time; determine a series of numeric values based on the characteristic of the audio data during the period of time, individual numeric values of the series of numeric values being correlated with an amplitude of the volume and/or pitch at a discrete point within the period of time; and augment at least one of the video data and/or the audio data to include the effect based on the series of numeric values at discrete points in time within the period of time.
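
The step of turning an audio characteristic into a series of numeric values can be sketched as follows. This is an illustration under stated assumptions: volume is approximated by root-mean-square amplitude over non-overlapping windows, and the function names `volume_series` and `effect_scale`, along with the `base` and `gain` parameters, are hypothetical and do not come from the abstract.

```python
import math


def volume_series(samples: list[float], window: int) -> list[float]:
    """Return one RMS amplitude value per non-overlapping window of samples."""
    values = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        values.append(math.sqrt(sum(s * s for s in chunk) / window))
    return values


def effect_scale(values: list[float], base: float = 1.0, gain: float = 0.5) -> list[float]:
    """Map each value to an effect strength, normalized by the peak value."""
    peak = max(values, default=0.0)
    if peak == 0.0:
        # Silent input: leave the effect at its base strength.
        return [base for _ in values]
    return [base + gain * v / peak for v in values]
```

Each entry of the resulting list could then drive the selected effect (for instance, the scale of an overlay) at the corresponding point in time.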

OPTIMIZATION OF LIP SYNCING IN NATURAL LANGUAGE TRANSLATED VIDEO

An approach is provided for generating an optimized video of a speaker, translated from a source language into a target language, with the speaker's lips synchronized to the translated speech while balancing optimization of the translation into the target language. A source video may be fed into a neural machine translation model. The model may synthesize a plurality of potential translations. The translations may be received by a generative adversarial network, which generates a video for each translation and classifies each translation as in-sync or out-of-sync. A lip-syncing score may be determined for each of the generated videos that are classified as in-sync.
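
One way to picture the balance between translation quality and lip synchronization is the selection sketch below. It is not the method of the application: the `Candidate` fields, the weighted-sum scoring, and the `weight` parameter are assumptions made for illustration, and the scores would in practice come from the translation model and the generative adversarial network.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    text: str                 # candidate translation
    translation_score: float  # assumed quality score in [0, 1]
    in_sync: bool             # classification of the generated video
    lip_sync_score: float     # assumed score in [0, 1]; meaningful only if in_sync


def pick_candidate(candidates: list[Candidate], weight: float = 0.5) -> Optional[Candidate]:
    """Choose the in-sync candidate with the best weighted combined score."""
    eligible = [c for c in candidates if c.in_sync]
    if not eligible:
        return None  # no generated video was classified as in-sync
    return max(
        eligible,
        key=lambda c: weight * c.translation_score + (1 - weight) * c.lip_sync_score,
    )
```

Raising `weight` favors translation quality, while lowering it favors lip synchronization.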

SYSTEMS AND METHODS FOR GENERATING SYNTHETIC VIDEOS BASED ON AUDIO CONTENTS
20220345796 · 2022-10-27

Systems and methods for generating a synthetic video based on an audio are provided. An exemplary system may include a memory storing computer-readable instructions and at least one processor. The processor may execute the computer-readable instructions to perform operations. The operations may include receiving a reference video including a motion picture of a human face and receiving the audio including a speech. The operations may also include generating a synthetic motion picture of the human face based on the reference video and the audio. The synthetic motion picture of the human face may include a motion of a mouth of the human face presenting the speech. The motion of the mouth may match a content of the speech. The operations may further include generating the synthetic video based on the synthetic motion picture of the human face.