Patent classifications
G10L25/81
Spatially informed audio signal processing for user speech
A device implementing a system for processing speech in an audio signal includes at least one processor configured to receive an audio signal corresponding to at least one microphone of a device, and to determine, using a first model, a first probability that a speech source is present in the audio signal. The at least one processor is further configured to determine, using a second model, a second probability that an estimated location of a source of the audio signal corresponds to an expected position of a user of the device, and to determine a likelihood that the audio signal corresponds to the user of the device based on the first and second probabilities.
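The abstract says the two probabilities are combined into a likelihood but does not say how. The sketch below assumes the simplest fusion rule, a plain product that treats the two model outputs as independent. The function name `user_speech_likelihood` and the product rule are illustrative assumptions, not details from the patent.

```python
def user_speech_likelihood(p_speech: float, p_location: float) -> float:
    """Fuse the two model outputs into one score.

    p_speech:   first model's probability that a speech source is present.
    p_location: second model's probability that the estimated source location
                matches the expected position of the user.
    ASSUMPTION: the fusion rule is a simple product; the abstract does not
    specify the actual combination.
    """
    for name, p in (("p_speech", p_speech), ("p_location", p_location)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {p}")
    return p_speech * p_location
```

With this rule the score is high only when both models agree: speech from an unexpected location, or non-speech from the expected location, both score low.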
Method and apparatus for adaptive control of decorrelation filters
An audio signal processing method and apparatus for adaptively adjusting a decorrelator are disclosed. The method comprises obtaining a control parameter and calculating the mean and variation of the control parameter. The ratio of the variation to the mean of the control parameter is calculated, and a decorrelation parameter is calculated based on that ratio. The decorrelation parameter is then provided to a decorrelator.
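The ratio step can be sketched as follows. This assumes that "variation" means the population standard deviation and that the decorrelation parameter is the ratio clamped to [0, 1]. Both choices, along with the names `decorrelation_parameter` and `max_ratio`, are illustrative assumptions; the abstract does not give the actual mapping.

```python
from statistics import fmean, pstdev


def decorrelation_parameter(control_values, max_ratio=1.0):
    """Map the variation-to-mean ratio of a control parameter to [0, 1].

    ASSUMPTIONS: "variation" is the population standard deviation, and the
    decorrelation parameter is the ratio scaled by max_ratio and clamped.
    """
    if not control_values:
        raise ValueError("control_values must not be empty")
    mean = fmean(control_values)
    variation = pstdev(control_values)
    if mean == 0:
        # Ratio is undefined; treat any variation as maximal (assumption).
        return 1.0 if variation > 0 else 0.0
    ratio = variation / abs(mean)
    return min(ratio / max_ratio, 1.0)
```

A steady control parameter gives a ratio near zero, while a strongly fluctuating one pushes the result toward 1.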
Context aware hearing optimization engine
One or more context aware processing parameters and an ambient audio stream are received. One or more sound characteristics associated with the ambient audio stream are identified using a machine learning model. One or more actions to perform are determined using the machine learning model and based on the one or more context aware processing parameters and the identified one or more sound characteristics. The one or more actions are performed.
Audio onset detection method and apparatus
An audio onset detection method and apparatus, an electronic device, and a computer-readable storage medium are disclosed. The audio onset detection method comprises: determining, from the frequency-domain signal corresponding to an audio signal, a first voice frequency spectrum parameter for each frequency band; for each frequency band, determining a second voice frequency spectrum parameter of the current frequency band from the first voice frequency spectrum parameter of the current frequency band and the first voice frequency spectrum parameters of the frequency bands that precede the current frequency band in time sequence; and determining one or more onset positions of notes and syllables in the audio from the second voice frequency spectrum parameters of the frequency bands.
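The two-stage structure can be illustrated with a toy example. The abstract does not say how the second parameter is derived or how onsets are picked. This sketch assumes the second parameter is the average of the current first parameter and a few preceding ones, and that an onset is any position where the second parameter rises by more than a threshold. The helper names, the `history` window, and the threshold rule are all hypothetical.

```python
def second_params(first_params, history=2):
    """Combine each first parameter with the ones preceding it in time.

    ASSUMPTION: the combination is a plain average over the current value
    and up to `history` earlier values.
    """
    out = []
    for i in range(len(first_params)):
        window = first_params[max(0, i - history): i + 1]
        out.append(sum(window) / len(window))
    return out


def onset_positions(params, threshold):
    """Return indices where the parameter rises by more than `threshold`.

    ASSUMPTION: a simple rise-over-threshold rule stands in for the
    unspecified onset decision.
    """
    return [
        i for i in range(1, len(params))
        if params[i] - params[i - 1] > threshold
    ]
```

Averaging over earlier values suppresses isolated spikes, so a position is flagged only when the energy rise is sustained.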
Method and device for adding lyrics to short video
Methods and devices are provided for adding lyrics to a short video. The device obtains a music material required by the short video and obtains a first playback duration of the short video. The device obtains a target music material having a playback duration matching the first playback duration. The device obtains a lyric sticker corresponding to the target music material based on the lyrics extracted from the target music material, and displays the processed short video with the lyric sticker added.
Audio Source Separation Systems and Methods
Systems and methods for audio source separation include receiving an audio input stream including a mixture of audio signals generated from a plurality of audio sources; processing, through a trained audio source separation model, the audio input stream to generate a plurality of audio stems corresponding to one or more of the plurality of audio sources; updating, using a self-iterative processing and training system, the audio source separation model based at least in part on the plurality of audio stems; and re-processing, using the updated trained audio source separation model, the audio input stream to generate a plurality of enhanced audio stems.
Enhancing musical sound during a networked conference
Dynamic adjustment of audio characteristics for enhancing musical sound during a networked conference is disclosed. In an embodiment, a method is provided for sound enhancement performed by a device coupled to a network. The method includes receiving an audio signal to be transmitted over the network, detecting when musical content is present in the audio signal, processing the audio signal to enhance voice characteristics and generate an enhanced audio signal when musical content is not detected, processing the audio signal to enhance music characteristics and generate the enhanced audio signal when musical content is detected, and transmitting the enhanced audio signal over the network.
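The branching described in the method can be sketched as a small dispatcher. The detector and the two enhancement stages are passed in as callables because the abstract does not describe them. `enhance_for_network` and its parameter names are hypothetical.

```python
def enhance_for_network(samples, detect_music, enhance_voice, enhance_music):
    """Choose the enhancement path based on whether music is detected.

    detect_music:  callable returning True when musical content is present.
    enhance_voice: callable applied when no music is detected.
    enhance_music: callable applied when music is detected.
    ASSUMPTION: all three callables are placeholders supplied by the caller.
    """
    if detect_music(samples):
        return enhance_music(samples)
    return enhance_voice(samples)
```

The returned signal is what would then be transmitted over the network. Transmission itself is outside this sketch.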