Patent classifications
G10L25/81
ADJUSTING AUDIO AND NON-AUDIO FEATURES BASED ON NOISE METRICS AND SPEECH INTELLIGIBILITY METRICS
Some implementations involve determining a noise metric and/or a speech intelligibility metric and determining a compensation process corresponding to the noise metric and/or the speech intelligibility metric. The compensation process may involve altering a processing of audio data and/or applying a non-audio-based compensation method. In some examples, altering the processing of the audio data does not involve applying a broadband gain increase to the audio signals. Some examples involve applying the compensation process in an audio environment. Other examples involve determining compensation metadata corresponding to the compensation process and transmitting an encoded content stream that includes encoded compensation metadata, encoded video data and encoded audio data from a first device to one or more other devices.
Enhanced graphical user interface for voice communications
Enhanced graphical user interfaces for transcription of audio and video messages are disclosed. Audio data may be transcribed, and the transcription may include emphasized words and/or punctuation corresponding to emphasis of user speech. Additionally, the transcription may be translated into a second language. A message spoken by a user depicted in one or more images of video data may also be transcribed and provided to one or more devices.
Headset sound leakage mitigation
An audio system for a headset includes a plurality of speakers and an audio controller. The plurality of speakers may be in a dipole configuration that cancels sound leakage into a local area of the headset. The controller filters audio content presented by the plurality of speakers to further mitigate leakage of audio content into the local area. The audio controller determines sound filters based on environmental conditions, such as ambient noise levels, as well as on the audio content being presented.
Volume leveler controller and controlling method
A volume leveler controller and a controlling method are disclosed. In one embodiment, a volume leveler controller includes an audio content classifier for identifying the content type of an audio signal in real time, and an adjusting unit for adjusting a volume leveler in a continuous manner based on the content type as identified. The adjusting unit may be configured to positively correlate the dynamic gain of the volume leveler with informative content types of the audio signal, and negatively correlate the dynamic gain of the volume leveler with interfering content types of the audio signal.
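The correlation described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the content-type labels, the multiplicative weighting rule, and the one-pole smoothing used to make the adjustment continuous are all assumptions chosen for illustration.

```python
from typing import Mapping

# Hypothetical content-type labels; the abstract does not enumerate them.
INFORMATIVE_TYPES = ("speech", "music")
INTERFERING_TYPES = ("noise", "background")


def leveler_gain_weight(confidences: Mapping[str, float]) -> float:
    """Map classifier confidences (each in 0..1) to a weight in 0..1.

    The weight rises with the strongest informative confidence and falls
    with the strongest interfering confidence (an assumed rule).
    """
    informative = max(confidences.get(t, 0.0) for t in INFORMATIVE_TYPES)
    interfering = max(confidences.get(t, 0.0) for t in INTERFERING_TYPES)
    weight = informative * (1.0 - interfering)
    return min(1.0, max(0.0, weight))


def smooth(previous: float, target: float, alpha: float = 0.1) -> float:
    """Move a fraction alpha of the way toward the target, so the leveler
    is adjusted gradually rather than switched abruptly."""
    return previous + alpha * (target - previous)
```

In this sketch, a frame classified mostly as speech yields a weight near 1, so the leveler's dynamic gain acts almost fully, while a noise-dominated frame yields a weight near 0, so the noise is not boosted.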
Viseme data generation for presentation while content is output
Systems and methods for viseme data generation are disclosed. Uncompressed audio data is generated and/or utilized to determine the beats per minute of the audio data. Visemes are associated with the audio data utilizing a Viterbi algorithm and the beats per minute. A time-stamped list of viseme data is generated that associates the visemes with the portions of the audio data that they correspond to. An animatronic toy and/or an animation is caused to lip sync using the viseme data while audio corresponding to the audio data is output.
Machine learning-based audio codec switching
Described herein are techniques, devices, and systems for selectively using a music-capable audio codec on-demand during a communication session. A user equipment (UE) may adaptively transition between using a first audio codec that provides a first audio bandwidth and a second audio codec (e.g., the EVS-FB codec) that provides a second audio bandwidth that is greater than the first audio bandwidth. The transition to the second audio codec may occur in response to determining that sound in the environment of the UE includes frequencies outside of a range of frequencies associated with a human voice, such as by determining that music is being played in the environment of the UE, which allows for selectively using a music-capable audio codec when it would be beneficial to do so.
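The switching condition in this abstract can be sketched in Python as follows. The sketch is illustrative only: the voice band limits, the energy-ratio threshold, and the codec labels are assumptions, and a plain spectral energy ratio stands in for the machine-learning classifier named in the title.

```python
import cmath
import math
from typing import Sequence, Tuple

# Assumed frequency range associated with a human voice, in Hz.
VOICE_BAND_HZ = (80.0, 4000.0)


def out_of_band_ratio(
    frame: Sequence[float],
    sample_rate: float,
    band: Tuple[float, float] = VOICE_BAND_HZ,
) -> float:
    """Return the fraction of spectral energy outside the voice band.

    Uses a naive DFT over the positive-frequency bins (DC skipped), which
    is adequate for a short illustrative frame.
    """
    n = len(frame)
    total = 0.0
    outside = 0.0
    for k in range(1, n // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)
        )
        power = abs(coeff) ** 2
        freq = k * sample_rate / n
        total += power
        if not (band[0] <= freq <= band[1]):
            outside += power
    return outside / total if total > 0.0 else 0.0


def select_codec(ratio: float, threshold: float = 0.2) -> str:
    """Pick the wider-bandwidth second codec when enough energy lies
    outside the voice band; otherwise keep the first codec."""
    return "second_codec" if ratio > threshold else "first_codec"
```

With a 32 kHz sample rate, a 1 kHz tone falls inside the assumed voice band and keeps the first codec, whereas a 10 kHz tone falls outside it and selects the second codec.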
METHOD AND DEVICE FOR SPEECH/MUSIC CLASSIFICATION AND CORE ENCODER SELECTION IN A SOUND CODEC
Two-stage speech/music classification device and method classify an input sound signal and select a core encoder for encoding the sound signal. A first stage classifies the input sound signal into one of a number of final classes. A second stage extracts high-level features of the input sound signal and selects the core encoder for encoding the input sound signal in response to the extracted high-level features and the final class selected in the first stage.
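The two-stage structure can be sketched in Python as below. Every feature name, threshold, class label, and core encoder name here is hypothetical; the abstract does not specify them, so the sketch shows only the flow of a first-stage class feeding a second-stage encoder selection.

```python
from dataclasses import dataclass


@dataclass
class FrameFeatures:
    """Hypothetical per-frame features, each normalized to 0..1."""
    tonality: float   # low-level feature used by the first stage
    stability: float  # high-level feature used by the second stage


def first_stage(features: FrameFeatures) -> str:
    """Classify the frame into one of a small set of final classes."""
    if features.tonality > 0.6:
        return "music"
    if features.tonality < 0.3:
        return "speech"
    return "unclear"


def second_stage(features: FrameFeatures, final_class: str) -> str:
    """Select a core encoder from the high-level feature and the
    first-stage class (encoder names are placeholders)."""
    if final_class == "speech":
        return "speech_core"
    if final_class == "music":
        return "transform_core" if features.stability >= 0.5 else "generic_core"
    # Unclear frames fall back on the high-level feature alone.
    return "transform_core" if features.stability >= 0.5 else "speech_core"


def select_core_encoder(features: FrameFeatures) -> str:
    """Run both stages in sequence."""
    return second_stage(features, first_stage(features))
```

The point of the split in this sketch is that the second stage can overrule a naive class-to-encoder mapping: a frame classed as music but with low stability is routed to a different encoder than a stable one.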
System and method for real-time synchronization of media content via multiple devices and speaker systems
A method and system are disclosed for customizing and synchronizing media in real time by a client device in communication with a server device. The client device customizes stock media content based on user preferences and synchronizes the customized content for playback with a server-side playback of the stock media content.