Patent classifications
G10L21/055
METHODS, SYSTEMS, AND DEVICES FOR ASSEMBLY OF LIVE AND RECORDED AUDIO CONTENT
Aspects of the subject disclosure may include, for example, receiving first audio content from a first communication device and receiving second audio content from a second communication device, and adjusting the first audio content and the second audio content. The adjusting of the first audio content and the second audio content can comprise: detecting a gap in the first audio content; analyzing the first audio content resulting in an audio analysis; generating filler audio content based on the audio analysis; and inserting the filler audio content into the gap of the first audio content. Further embodiments can include aggregating the first adjusted audio content with the second adjusted audio content resulting in aggregated audio content, and providing the aggregated audio content to a third communication device for playback. The third communication device plays the aggregated audio content. Other embodiments are disclosed.
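The gap-filling steps in this abstract (detect a gap, analyze the audio, generate filler, insert it) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the method claimed in the patent: a gap is taken to be a run of near-silent samples, the "audio analysis" is reduced to the RMS level of the samples just before the gap, and the filler is low-level uniform noise scaled to that level. The names `find_gaps`, `rms`, and `fill_gaps` and their parameters are hypothetical.

```python
import math
import random


def find_gaps(samples, threshold=1e-4, min_len=4):
    """Return (start, end) index pairs for runs of near-silent samples.

    Assumption: a "gap" is at least `min_len` consecutive samples whose
    absolute value does not exceed `threshold`. `end` is exclusive.
    """
    gaps, start = [], None
    for i, s in enumerate(samples):
        if abs(s) <= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                gaps.append((start, i))
            start = None
    # A gap may run to the very end of the signal.
    if start is not None and len(samples) - start >= min_len:
        gaps.append((start, len(samples)))
    return gaps


def rms(samples):
    """Root-mean-square level of a list of samples (0.0 for an empty list)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def fill_gaps(samples, context=8, level=0.1, seed=0):
    """Return a copy of `samples` with each detected gap filled by quiet noise.

    The noise amplitude is `level` times the RMS of up to `context`
    samples preceding the gap (a stand-in for the "audio analysis").
    """
    rng = random.Random(seed)
    out = list(samples)
    for start, end in find_gaps(samples):
        amp = level * rms(samples[max(0, start - context):start])
        for i in range(start, end):
            out[i] = rng.uniform(-amp, amp)
    return out


audio = [0.5, -0.5, 0.5, -0.5] + [0.0] * 6 + [0.5, -0.5]
filled = fill_gaps(audio)
```

In this sketch the samples outside the gap are left untouched, and the filler never exceeds one tenth of the preceding signal level. A production system would more plausibly synthesize filler that matches the spectral character of the surrounding audio.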
EXTRACTION OF AN AUDIO OBJECT
A method for extracting at least one audio object from at least two audio input signals, each of which contains the audio object. The second audio input signal is synchronized with the first audio input signal to obtain a synchronized second audio input signal. The audio object is extracted by applying at least one trained model to the first audio input signal and to the synchronized second audio input signal. The audio object is then output. Further, the step of synchronizing the second audio input signal with the first audio input signal includes the steps of: generating audio signals; analytically calculating a correlation between the audio signals; optimizing the correlation vector; and determining the synchronized second audio input signal using the optimized correlation vector.
Techniques for modifying audiovisual media titles to improve audio transitions
A playback application is configured to analyze audio frames associated with transitions between segments within a media title to identify one or more portions of extraneous audio. The playback application is configured to analyze the one or more portions of extraneous audio and then determine which of the one or more corresponding audio frames should be dropped. In doing so, the playback application can analyze a topology associated with the media title to determine whether any specific portions of extraneous audio are to be played outside of a logical ordering of audio samples set forth in the topology. These specific portions of extraneous audio are preferentially removed.
Distribution of Sign Language Enhanced Content
A system for distributing sign language enhanced content includes a computing platform having processing hardware and a system memory storing a software code. The processing hardware is configured to execute the software code to receive content including at least one of a sequence of audio frames or a sequence of video frames, perform an analysis of the content, and identify, based on the analysis, a message conveyed by the content. The processing hardware is further configured to execute the software code to generate a sign language translation of the content, the sign language translation including one or more of a gesture, body language, or a facial expression communicating the message conveyed by the content.
Systems, Devices, and Methods for Synchronizing Audio
Disclosed herein are new techniques carried out by a computing system for determining delays of various components of an audio system to allow for accurate correction of these delays, which may improve the audio quality of live performances for listeners who hear audio reproduced by loudspeakers at live performance venues. In one implementation the computing system, which may comprise a transmitter device and one or more receiver devices, may be configured to perform functions, including receiving a first audio signal, receiving, via an audio input interface of the receiver, a second audio signal, and determining, based on the first audio signal and the second audio signal, an audio delay that is associated with the second audio signal. The computing system may be configured to perform further functions, including based on a determined cross-correlation between a downsampled audio signal and a filtered second audio signal, determining the audio signal delay.
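The delay determination described in this abstract relies on a cross-correlation between two audio signals. The sketch below shows the general idea under simplifying assumptions that are not taken from the patent: both signals are downsampled by plain decimation with no anti-alias or other filtering, only non-negative delays are searched, and the correlation is computed by brute force. The function names and the `factor` and `max_delay_s` parameters are hypothetical.

```python
def downsample(samples, factor):
    """Keep every `factor`-th sample.

    Assumption: no anti-alias filter is applied; a real system would
    low-pass filter before decimating.
    """
    return samples[::factor]


def estimate_lag(reference, delayed, max_lag):
    """Return the lag in [0, max_lag] that maximizes the cross-correlation.

    The score at each lag is the sum of reference[i] * delayed[i + lag]
    over the overlapping samples. Ties keep the smallest lag.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(delayed) - lag)
        if n <= 0:
            break  # no overlap left at this lag or any larger one
        score = sum(reference[i] * delayed[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_seconds(reference, delayed, sample_rate, factor=4, max_delay_s=0.5):
    """Estimate how far `delayed` lags `reference`, in seconds."""
    ref_ds = downsample(reference, factor)
    del_ds = downsample(delayed, factor)
    ds_rate = sample_rate / factor          # sample rate after decimation
    max_lag = int(max_delay_s * ds_rate)    # search window in decimated samples
    return estimate_lag(ref_ds, del_ds, max_lag) / ds_rate


# A single click in the reference, and the same click 40 samples later.
reference = [0.0] * 100
reference[8] = 1.0
delayed = [0.0] * 40 + reference
estimated = delay_seconds(reference, delayed, sample_rate=1000)
```

Downsampling before correlating reduces the number of multiplications at the cost of time resolution: with `factor=4` at 1000 Hz, the estimate is quantized to steps of 4 ms.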
METHOD AND DEVICE FOR GENERATING SPEECH IMAGE
A device for generating a speech image according to an embodiment disclosed herein is a speech image generation device including one or more processors and a memory storing one or more programs executed by the one or more processors. The device includes a first machine learning model that extracts an image feature with a speech image of a person as an input to reconstruct the speech image from the extracted image feature and a second machine learning model that predicts the image feature with a speech audio signal of the person as an input.