Patent classifications
G10L21/0316
AI BASED REMIXING OF MUSIC: TIMBRE TRANSFORMATION AND MATCHING OF MIXED AUDIO DATA
The present invention provides a method for processing audio data, comprising the steps of providing input audio data containing a mixture of audio data including first audio data of a first musical timbre and second audio data of a second musical timbre different from said first musical timbre, decomposing the input audio data to provide decomposed data representative of the first audio data, and transforming the decomposed data to obtain third audio data.
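The claimed pipeline (decompose a mixture into one timbre component, then transform it) can be illustrated with a deliberately simple stand-in: here the "decomposition" is a frequency-domain mask and the "transformation" is a gain change. All parameters and function names are hypothetical, not taken from the patent.

```python
import numpy as np

def decompose_and_transform(mixture, sr=8000, cutoff_hz=500.0, gain=0.5):
    """Toy sketch of the claimed steps:
    1. decompose: isolate the first-timbre component with a crude
       frequency mask (a stand-in for real timbre separation),
    2. transform: scale the decomposed data to obtain "third audio data".
    """
    spec = np.fft.rfft(mixture)
    freqs = np.fft.rfftfreq(len(mixture), d=1.0 / sr)
    mask = freqs < cutoff_hz                                 # keep low band only
    decomposed = np.fft.irfft(spec * mask, n=len(mixture))   # first audio data
    third = gain * decomposed                                # trivial transform
    return decomposed, third

# Mixture of a low sinusoid (first timbre) and a high sinusoid (second timbre).
t = np.arange(8000) / 8000.0
mix = np.sin(2 * np.pi * 220 * t) + np.sin(2 * np.pi * 2000 * t)
low, third = decompose_and_transform(mix)
```

A real implementation would replace the mask with a learned or model-based source-separation stage; the point here is only the two-step shape of the claim.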
VOICE COMMUNICATION METHOD AND SYSTEM UNDER A BROADBAND AND NARROW-BAND INTERCOMMUNICATION ENVIRONMENT
Provided are a voice communication method and system under a broadband and narrow-band intercommunication environment. The method comprises: when a broadband terminal calls a narrow-band terminal, the broadband terminal executing a reducing operation on an energy amplitude of voice data to obtain a reduced voice data packet, and sending the reduced voice data packet to the narrow-band terminal such that the narrow-band terminal plays the reduced voice data packet; and when the narrow-band terminal calls the broadband terminal, the broadband terminal receiving a voice data packet, executing an amplification operation on an energy amplitude of the voice data packet to obtain an enlarged voice data packet, and playing the enlarged voice data packet. The present application can solve the problem of inconsistent voice volume between a broadband terminal and a narrow-band terminal, thereby improving the user experience.
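The core operation in both call directions is scaling the energy amplitude of a voice packet by some factor (reduction when calling narrowband, amplification when receiving from it). A minimal sketch, assuming 16-bit PCM samples and an illustrative function name:

```python
def scale_voice_packet(samples, factor):
    """Scale the energy amplitude of a voice data packet.
    factor < 1 reduces (broadband -> narrow-band direction);
    factor > 1 amplifies (narrow-band -> broadband direction).
    Results are clipped to the signed 16-bit PCM range."""
    return [max(-32768, min(32767, int(s * factor))) for s in samples]

reduced = scale_voice_packet([1000, -2000, 30000], 0.5)   # before sending
enlarged = scale_voice_packet([500, -1000, 15000], 2.0)   # after receiving
```

How the scaling factor is chosen (fixed, negotiated, or adaptive) is not specified by the abstract; clipping is included only to keep the sketch safe for integer PCM.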
Apparatus and method for reducing noise in an audio signal
An apparatus for processing an audio signal includes an audio signal analyzer and a filter. The audio signal analyzer is configured to analyze an audio signal to determine a plurality of noise suppression filter values for a plurality of bands of the audio signal, wherein the analyzer is configured to determine each noise suppression filter value so that it is greater than or equal to a minimum noise suppression filter value, and so that the minimum noise suppression filter value depends on a characteristic of the audio signal. The filter is configured for filtering the audio signal, wherein the filter is adjusted based on the noise suppression filter values.
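The per-band value with a floor can be sketched as a Wiener-style gain clamped at a minimum. In the patent the minimum depends on a characteristic of the signal; in this hypothetical sketch it is simply passed in as a parameter:

```python
import numpy as np

def suppression_gains(band_snr_db, min_gain):
    """Compute one noise suppression filter value (gain) per band.
    gains = snr / (1 + snr) is a standard Wiener-like rule; each gain
    is then floored at min_gain, the minimum noise suppression filter
    value (signal-dependent in the patent, a plain argument here)."""
    snr = 10.0 ** (np.asarray(band_snr_db) / 10.0)   # dB -> linear SNR
    gains = snr / (1.0 + snr)                        # per-band Wiener gain
    return np.maximum(gains, min_gain)               # enforce the minimum

g = suppression_gains([-20.0, 0.0, 20.0], min_gain=0.1)
```

The floor prevents any band from being attenuated below the minimum, which is what avoids the "musical noise" artifacts of over-aggressive suppression.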
Trigger word detection with multiple digital assistants
Disclosed herein are system, apparatus, article of manufacture, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for selecting a digital assistant from among multiple digital assistants. An embodiment operates by receiving a voice input containing a trigger word at a first voice adapter associated with a digital assistant, which generates a first confidence score for the trigger word. The embodiment further receives the voice input at a second voice adapter that generates a second confidence score for the trigger word. The embodiment determines that the first confidence score is higher than the second confidence score and selects the digital assistant based on that determination.
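The selection step reduces to an argmax over the confidence scores the voice adapters report. A minimal sketch with hypothetical assistant names:

```python
def select_assistant(scores):
    """Pick the digital assistant whose voice adapter reported the
    highest trigger-word confidence score. `scores` maps an assistant
    identifier to its adapter's confidence (names are illustrative)."""
    return max(scores, key=scores.get)

choice = select_assistant({"assistant_a": 0.91, "assistant_b": 0.78})
```

A production system would also handle ties and a rejection threshold below which no assistant is woken at all, neither of which the abstract specifies.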
MODULATION OF PACKETIZED AUDIO SIGNALS
Modulating packetized audio signals in a voice activated data packet based computer network environment is provided. A system can receive audio signals detected by a microphone of a device. The system can parse the audio signal to identify a trigger keyword and a request, and generate a first action data structure. The system can identify a content item object based on the trigger keyword, and generate an output signal comprising a first portion corresponding to the first action data structure and a second portion corresponding to the content item object. The system can apply a modulation to the first or second portion of the output signal, and transmit the modulated output signal to the device.
SEPARATING SPEECH BY SOURCE IN AUDIO RECORDINGS BY PREDICTING ISOLATED AUDIO SIGNALS CONDITIONED ON SPEAKER REPRESENTATIONS
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing speech separation. One of the methods includes obtaining a recording comprising speech from a plurality of speakers; processing the recording using a speaker neural network having speaker parameter values and configured to process the recording in accordance with the speaker parameter values to generate a plurality of per-recording speaker representations, each speaker representation representing features of a respective identified speaker in the recording; and processing the per-recording speaker representations and the recording using a separation neural network having separation parameter values and configured to process the recording and the speaker representations in accordance with the separation parameter values to generate, for each speaker representation, a respective predicted isolated audio signal that corresponds to speech of one of the speakers in the recording.
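The two-network structure (a speaker network producing per-recording speaker representations, then a separation network conditioned on them) can be sketched with tiny numpy stand-ins. The "networks" here are random placeholders for the learned models, so only the data flow and output shapes match the description:

```python
import numpy as np

rng = np.random.default_rng(0)

def speaker_network(recording, n_speakers=2, dim=4):
    """Stand-in for the speaker neural network: emits one fixed-size
    representation per identified speaker (random here, learned in a
    real system)."""
    return rng.standard_normal((n_speakers, dim))

def separation_network(recording, speaker_reps):
    """Stand-in for the separation network: conditioned on the speaker
    representations, it predicts one isolated signal per representation,
    each the same length as the recording. Soft masks that sum to one
    per sample make the isolated signals add back up to the mixture."""
    masks = rng.random((len(speaker_reps), len(recording)))
    masks /= masks.sum(axis=0, keepdims=True)
    return masks * recording

recording = rng.standard_normal(16)
reps = speaker_network(recording)
isolated = separation_network(recording, reps)
```

The real method trains both networks end to end; the mask-sum-to-one choice above is only one convenient way to make the toy outputs consistent with the input mixture.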