G10L21/10

METHOD AND DEVICE FOR GENERATING SPEECH VIDEO ON BASIS OF MACHINE LEARNING
20220358703 · 2022-11-10

A device for generating a speech video may include a first encoder to receive a person background image corresponding to a video part of a speech video of a person and extract an image feature vector from the person background image, a second encoder to receive a speech audio signal corresponding to an audio part of the speech video and extract a voice feature vector from the speech audio signal, a combiner to generate a combined vector by combining the image feature vector output from the first encoder and the voice feature vector output from the second encoder, and a decoder to reconstruct the speech video of the person using the combined vector as an input. The person background image input to the first encoder includes a face and an upper body of the person, with a portion related to speech of the person covered with a mask.
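The data flow in this abstract (mask the speech-related region, encode image and audio separately, then combine the two vectors) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the function names, the rectangular mask, the averaging "encoders", and the use of concatenation as the combiner are not taken from the application, and the learned decoder is omitted.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def mask_speech_region(image: Image, top: int, bottom: int,
                       left: int, right: int) -> Image:
    """Return a copy with rows top..bottom-1 and columns left..right-1 zeroed.

    Stands in for covering the speech-related portion with a mask.
    """
    return [
        [0.0 if top <= r < bottom and left <= c < right else px
         for c, px in enumerate(row)]
        for r, row in enumerate(image)
    ]


def encode_image(image: Image) -> List[float]:
    """Toy stand-in for the first encoder: mean pixel value of each row."""
    return [sum(row) / len(row) for row in image]


def encode_audio(samples: List[float], frame: int) -> List[float]:
    """Toy stand-in for the second encoder: mean absolute amplitude per frame."""
    return [
        sum(abs(s) for s in samples[i:i + frame]) / frame
        for i in range(0, len(samples) - frame + 1, frame)
    ]


def combine(image_vec: List[float], voice_vec: List[float]) -> List[float]:
    """Combine by concatenation (an assumption; the abstract only says 'combining')."""
    return image_vec + voice_vec


if __name__ == "__main__":
    frame_image = [[1.0, 1.0], [1.0, 1.0]]
    masked = mask_speech_region(frame_image, top=1, bottom=2, left=0, right=2)
    combined = combine(encode_image(masked),
                       encode_audio([1.0, -1.0, 0.5, 0.5], frame=2))
    print(combined)
```

In a real system the combined vector would be passed to a trained decoder that reconstructs the video frame, including the masked region.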

Accessibility Enhanced Content Rendering

A user system for rendering accessibility enhanced content includes processing hardware, a display, and a memory storing software code. The processing hardware executes the software code to receive primary content from a content distributor and determine whether the primary content is accessibility enhanced content including an accessibility track. When the primary content omits the accessibility track, the processing hardware executes the software code to perform a visual analysis, an audio analysis, or both, of the primary content, generate, based on the visual analysis and/or the audio analysis, the accessibility track to include at least one of a sign language performance or one or more video tokens configured to be played back during playback of the primary content, and synchronize the accessibility track to the primary content. The processing hardware also executes the software code to render, using the display, the primary content or the accessibility enhanced content.
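The branching described here (use the accessibility track if present, otherwise generate and synchronize one) might be sketched as follows. The `Content` class, the `captions` field standing in for the result of an audio analysis, and the `sign:` token naming are all hypothetical; the abstract does not specify any of them.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Cue = Tuple[float, str]  # (start time in seconds, token identifier)


@dataclass
class Content:
    duration: float
    # Hypothetical output of an audio analysis: (time in seconds, recognized text).
    captions: List[Tuple[float, str]]
    accessibility_track: Optional[List[Cue]] = None


def generate_track(content: Content) -> List[Cue]:
    """Build one cue per caption, drop cues outside the content's duration,
    and sort by time so the track is synchronized with playback order."""
    cues = [(t, f"sign:{text}") for t, text in content.captions
            if 0 <= t <= content.duration]
    return sorted(cues)


def ensure_accessibility(content: Content) -> Content:
    """Generate a track only when the primary content omits one."""
    if content.accessibility_track is None:
        content.accessibility_track = generate_track(content)
    return content


if __name__ == "__main__":
    primary = Content(duration=10.0,
                      captions=[(5.0, "hi"), (1.0, "yo"), (12.0, "late")])
    print(ensure_accessibility(primary).accessibility_track)
```

Content that already carries a track passes through unchanged, which mirrors the abstract's distinction between primary content and accessibility enhanced content.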

Systems, methods, devices and apparatuses for detecting facial expression

A system, method and apparatus for detecting facial expressions according to EMG signals.

SPEECH RECOGNITION SYSTEM FOR TEACHING ASSISTANCE

The present invention provides a speech recognition system for teaching assistance that provides a caption service for the hearing impaired. The system includes a speaker and an automatic speech recognition (ASR) classroom server, a listener-typist and a computer, and a hearing-impaired person and a live screen, all in the same classroom. The ASR classroom server, the computer, and the live screen are connected by a local area network. The speaker's audio is sent through a microphone to the ASR classroom server, where it is converted into a text caption. The text caption is then sent, together with the speaker's audio, to the live screen of the hearing-impaired person, who can read what the speaker says. The listener-typist can correct the text caption so that it is fully accurate.

Sound Boundaries for a Virtual Collaboration Space

An illustrative collaboration space provider system defines, within a virtual collaboration space, a sound boundary associated with a particular avatar located within the virtual collaboration space. The collaboration space provider system then prevents, based on the sound boundary, at least one direction of audio communication for a user represented by the particular avatar. Corresponding methods and systems are also disclosed.
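One way to picture a directional sound boundary is the minimal sketch below. The circular shape, the `block_outgoing` and `block_incoming` flags, and the `audio_allowed` function are assumptions chosen for illustration; the abstract does not say how a boundary is shaped or represented.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class SoundBoundary:
    center: Point                 # position of the avatar the boundary belongs to
    radius: float
    block_outgoing: bool = True   # sound from inside does not leave
    block_incoming: bool = False  # sound from outside does not enter

    def contains(self, pos: Point) -> bool:
        return math.dist(self.center, pos) <= self.radius


def audio_allowed(speaker: Point, listener: Point,
                  boundary: SoundBoundary) -> bool:
    """Return True if audio may travel from speaker to listener."""
    speaker_inside = boundary.contains(speaker)
    listener_inside = boundary.contains(listener)
    if speaker_inside == listener_inside:
        return True  # both on the same side, so the boundary is not crossed
    if speaker_inside and boundary.block_outgoing:
        return False
    if listener_inside and boundary.block_incoming:
        return False
    return True


if __name__ == "__main__":
    boundary = SoundBoundary(center=(0.0, 0.0), radius=2.0)
    print(audio_allowed((1.0, 0.0), (5.0, 0.0), boundary))  # inside to outside
    print(audio_allowed((5.0, 0.0), (1.0, 0.0), boundary))  # outside to inside
```

With the default flags, a user inside the boundary can still hear the rest of the space but cannot be heard outside it, which is one reading of "at least one direction of audio communication".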

SYNTHESIZING VIDEO FROM AUDIO USING ONE OR MORE NEURAL NETWORKS

Apparatuses, systems, and techniques are presented to generate media content. In at least one embodiment, a first neural network is used to generate first video information based, at least in part, upon voice information corresponding to one or more users, and a second neural network is used to generate second video information corresponding to the one or more users based, at least in part, upon the first video information and one or more images corresponding to the one or more users.

METHODS AND SYSTEMS FOR MANIPULATING AUDIO PROPERTIES OF OBJECTS
20230029775 · 2023-02-02

In one implementation, a method of changing an audio property of an object is performed at a device including one or more processors coupled to non-transitory memory. The method includes displaying, using a display, a representation of a scene including a representation of an object associated with an audio property. The method includes displaying, using the display, in association with the representation of the object, a manipulator indicating a value of the audio property. The method includes receiving, using one or more input devices, a user input interacting with the manipulator. The method includes, in response to receiving the user input, changing the value of the audio property based on the user input and displaying, using the display, the manipulator indicating the changed value of the audio property.

Systems and methods for communicating with vision and hearing impaired vehicle occupants

Methods and systems for controlling an occupant output system associated with a vehicle are provided. The methods and systems receive vehicle or occupant context data from a source of vehicle context data, generate occupant message data based on the vehicle or occupant context data and determine if an occupant associated with the occupant output system is vision or hearing impaired. When the occupant is determined to be vision or hearing impaired, the methods and systems decide on an output modality to assist the occupant, and generate an output for the occupant on the output device, and in the output modality, based on the occupant message data.
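The decision step in this abstract could look roughly like the sketch below. The `Impairment` enum, the modality names, and the mapping from impairment to modality (speech for vision-impaired occupants, on-screen text for hearing-impaired occupants) are illustrative assumptions; the abstract only says that an assistive output modality is chosen.

```python
from enum import Enum


class Impairment(Enum):
    NONE = "none"
    VISION = "vision"
    HEARING = "hearing"


def choose_modality(impairment: Impairment) -> str:
    """Pick an output modality for the occupant (mapping is an assumption)."""
    if impairment is Impairment.VISION:
        return "speech"
    if impairment is Impairment.HEARING:
        return "text"
    return "default"


def generate_output(message: str, impairment: Impairment) -> dict:
    """Package the occupant message with the modality chosen for the occupant."""
    return {"modality": choose_modality(impairment), "message": message}


if __name__ == "__main__":
    # Hypothetical occupant message derived from vehicle context data.
    print(generate_output("Arriving at destination", Impairment.HEARING))
```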