Patent classifications
G10L19/04
Stereo signal encoding method and encoding apparatus
A stereo signal encoding method includes determining a window length of an attenuation window based on an inter-channel time difference; determining a modified linear prediction analysis window based on the window length of the attenuation window, where values of at least some points from a point (L−sub_window_len) to a point (L−1) in the modified linear prediction analysis window are less than values of corresponding points from a point (L−sub_window_len) to a point (L−1) in an initial linear prediction analysis window, and the window length of the modified linear prediction analysis window is equal to a window length of the initial linear prediction analysis window; and performing linear prediction analysis on a to-be-processed sound channel signal based on the modified linear prediction analysis window.
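A minimal sketch of the windowing and analysis steps follows. The abstract does not say how the inter-channel time difference (ITD) maps to the attenuation window length or what shape the attenuation has. The clamp of |ITD| samples (`attenuation_window_len`), the half-cosine decay, and the Hann-shaped initial window are therefore hypothetical choices. Linear prediction uses the standard autocorrelation method with Levinson-Durbin recursion.

```python
import math
import random

def attenuation_window_len(itd_samples: int, max_len: int) -> int:
    """Hypothetical mapping: attenuate as many tail points as |ITD| samples, capped at max_len."""
    return min(abs(itd_samples), max_len)

def initial_lp_window(L: int) -> list:
    """Assumed initial LP analysis window: a Hann window sampled at n + 0.5 (all values > 0)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / L) for n in range(L)]

def modified_lp_window(initial: list, sub_window_len: int) -> list:
    """Scale points L - sub_window_len .. L - 1 by gains strictly between 0 and 1.

    Points before the tail are unchanged, so the window length stays L."""
    L = len(initial)
    window = list(initial)
    for k in range(sub_window_len):
        n = L - sub_window_len + k
        # assumed attenuation shape: half-cosine decaying from ~1 toward 0
        gain = 0.5 + 0.5 * math.cos(math.pi * (k + 1) / (sub_window_len + 1))
        window[n] *= gain
    return window

def autocorrelation(x: list, order: int) -> list:
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]

def levinson_durbin(r: list, order: int):
    """Solve for A(z) = 1 + a1 z^-1 + ... + ap z^-p; returns (coefficients, prediction error)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) frame
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lp_analysis(frame: list, window: list, order: int = 10):
    windowed = [s * w for s, w in zip(frame, window)]
    return levinson_durbin(autocorrelation(windowed, order), order)

# Usage with a synthetic channel frame (toy data)
L = 320
rng = random.Random(0)
frame = [math.sin(0.07 * n) + 0.3 * math.sin(0.31 * n) + rng.gauss(0.0, 0.1) for n in range(L)]
init = initial_lp_window(L)
sub_len = attenuation_window_len(itd_samples=-40, max_len=L // 4)
mod = modified_lp_window(init, sub_len)
a, err = lp_analysis(frame, mod, order=10)
```

Because every gain lies strictly between 0 and 1 and the initial window is positive, each of the last `sub_len` points ends up smaller than in the initial window, while the window length is unchanged.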
SIGNAL PROCESSING APPARATUS, SIGNAL PROCESSING METHOD, AND PROGRAM
The present technology relates to a signal processing apparatus, a signal processing method, and a program that enable acquisition of a signal with higher sound quality.
A signal processing apparatus includes: a difference-signal generation unit configured to generate a difference signal corresponding to an input signal on the basis of the input signal and a prediction coefficient, the prediction coefficient being acquired by learning that uses, as training data, a difference signal based on an original sound signal and a re-quantized signal for learning obtained by re-quantizing that original sound signal; and a combining unit configured to combine the generated difference signal with the input signal. The present technology is applicable to signal processing apparatuses.
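The abstract does not say what form the learned predictor takes. As a stand-in, this sketch fits a short linear FIR predictor by regularized least squares. Training pairs are a coarsely re-quantized copy of an original signal (the model input) and the difference between original and re-quantized signal (the target). At inference, the predicted difference is added back to the input, which plays the role of the combining unit. The quantizer step, the number of taps, the ridge term, and all function names are hypothetical.

```python
import math

def requantize(x, step):
    """Coarse uniform re-quantization (assumed quantizer)."""
    return [step * round(v / step) for v in x]

def frames(x, taps):
    """Feature vectors: the current and previous taps - 1 samples (zero-padded)."""
    return [[x[n - j] if n - j >= 0 else 0.0 for j in range(taps)] for n in range(len(x))]

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def learn_coefficients(original, step, taps=4, ridge=1e-6):
    """Least-squares fit predicting (original - requantized) from requantized samples."""
    requant = requantize(original, step)
    diff = [o - q for o, q in zip(original, requant)]
    X = frames(requant, taps)
    A = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
          for j in range(taps)] for i in range(taps)]
    b = [sum(row[i] * d for row, d in zip(X, diff)) for i in range(taps)]
    return solve(A, b)

def enhance(input_signal, coeffs):
    """Generate the difference signal from the input, then combine it with the input."""
    X = frames(input_signal, len(coeffs))
    predicted_diff = [sum(c * v for c, v in zip(coeffs, row)) for row in X]
    return [s + d for s, d in zip(input_signal, predicted_diff)]

def mse(reference, y):
    return sum((r - v) ** 2 for r, v in zip(reference, y)) / len(reference)

# Usage on toy data
original = [0.9 * math.sin(2 * math.pi * 3 * n / 200) for n in range(200)]
step = 0.25
coeffs = learn_coefficients(original, step)
requant = requantize(original, step)
enhanced = enhance(requant, coeffs)
```

On the training signal, the fitted coefficients can never do worse than predicting a zero difference, so `mse(original, enhanced)` is at most `mse(original, requant)`; how well this generalizes depends on the signal and the predictor, which the abstract leaves open.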
Contextual biasing for speech recognition
A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.
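The abstract states that grapheme and phoneme data from the biasing phrases are fed to the recognition model together with acoustic features, but not how. One common realization, assumed here, is an attention step: each phrase's grapheme and phoneme embeddings are combined, a "no-bias" entry is added, and an acoustic-side query attends over them to produce a bias context vector for the model. The embedding values, the concatenation, and the function names are toy assumptions; a real model learns its embeddings.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def phrase_embedding(grapheme_vec, phoneme_vec):
    """Combine grapheme and phoneme embeddings by concatenation (assumed)."""
    return grapheme_vec + phoneme_vec

def bias_attention(query, phrase_embeddings, no_bias_embedding):
    """Scaled dot-product attention of an acoustic query over biasing phrases.

    Index 0 of the returned weights is the no-bias option. Returns (weights, context)."""
    keys = [no_bias_embedding] + phrase_embeddings
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(len(query))]
    return weights, context

# Usage with toy 2-d grapheme and 2-d phoneme embeddings
phrases = {
    "call mom": ([1.0, 0.0], [0.9, 0.1]),
    "play jazz": ([0.0, 1.0], [0.1, 0.9]),
}
embeddings = [phrase_embedding(g, p) for g, p in phrases.values()]
no_bias = [0.0, 0.0, 0.0, 0.0]
query = [2.0, 0.0, 2.0, 0.0]  # toy acoustic query that resembles "call mom"
weights, context = bias_attention(query, embeddings, no_bias)
```

The context vector would then be passed to the decoder alongside the acoustic encoding; that part of the model, and the final transcription step, are not shown.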
LOUDSPEAKER ARRAY PASSIVE ACOUSTIC CONFIGURATION PROCEDURE
An example method of operation includes identifying a loudspeaker array profile, stored in memory, that defines characteristics of a loudspeaker array; identifying a three-dimensional venue geometry value stored in the memory; defining virtual receivers to simulate acoustic characteristics within the venue geometry; defining a number of passive acoustic filter permutations to perform within a range of passive acoustic filter settings, where each passive acoustic filter setting is unique and has one or more passive acoustic filters to apply to one or more loudspeakers in the loudspeaker array; selecting performance criteria that represent the sound coverage uniformity of the loudspeaker array at given locations throughout the venue geometry; calculating the performance criteria of the loudspeaker array for a passive acoustic filter setting selected from one or more of the passive acoustic filter permutations by performing a simulation with that setting; identifying, from a specific permutation, an optimized passive acoustic filter setting with which the loudspeaker array achieves optimal uniform sound coverage in the venue geometry; and applying the optimized passive acoustic filter setting to the loudspeaker array.
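A brute-force sketch of the search loop: each loudspeaker gets one of a few candidate passive attenuation pads (in dB), every permutation is simulated at a grid of virtual receivers, and the setting with the most uniform coverage is kept. The free-field point-source model with inverse-square spreading, the incoherent power summation, the pad values, the geometry, and the uniformity metric (standard deviation of received level in dB, lower is better) are all assumptions for illustration; the abstract does not specify them.

```python
import itertools
import math
import statistics

def receiver_level_db(speakers, pads_db, receiver):
    """Incoherent power sum of point sources with inverse-square spreading."""
    power = 0.0
    for pos, pad in zip(speakers, pads_db):
        r2 = sum((a - b) ** 2 for a, b in zip(pos, receiver))
        power += 10 ** (pad / 10) / r2
    return 10 * math.log10(power)

def uniformity(speakers, pads_db, receivers):
    """Standard deviation of levels across virtual receivers, in dB (lower = more uniform)."""
    return statistics.pstdev([receiver_level_db(speakers, pads_db, r) for r in receivers])

def optimize_pads(speakers, receivers, candidate_pads_db):
    """Simulate every permutation of per-loudspeaker pads and keep the most uniform one."""
    best = None
    for pads in itertools.product(candidate_pads_db, repeat=len(speakers)):
        score = uniformity(speakers, pads, receivers)
        if best is None or score < best[1]:
            best = (pads, score)
    return best

# Toy 3-D geometry: three loudspeakers at 6 m height, receivers at ear height (1.2 m)
speakers = [(0.0, 0.0, 6.0), (10.0, 0.0, 6.0), (20.0, 0.0, 6.0)]
receivers = [(float(x), y, 1.2) for x in range(0, 31, 5) for y in (-5.0, 0.0, 5.0)]
candidate_pads_db = (0.0, -3.0, -6.0)
best_pads, best_score = optimize_pads(speakers, receivers, candidate_pads_db)
```

Exhaustive enumeration grows as (number of pad values) to the power (number of loudspeakers), which is why the method bounds the number of permutations within a range of settings.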
SPEECH SYNTHESIS METHOD AND APPARATUS, AND READABLE STORAGE MEDIUM
A speech synthesis method includes: converting a text input sequence into a text feature representation sequence; inputting the text feature representation sequence into an encoder including N encoding layers, the N encoding layers including an encoding layer E_i and an encoding layer E_(i+1), and the encoding layer E_(i+1) including a first multi-head self-attention network; acquiring a first attention matrix and a historical text encoded sequence output by the encoding layer E_i, and generating a second attention matrix of the encoding layer E_(i+1) according to the historical text encoded sequence and a residual connection between the first attention matrix and the first multi-head self-attention network; generating a target text encoded sequence of the encoding layer E_(i+1) according to the second attention matrix and the historical text encoded sequence; and generating synthesized speech data matched with the text input sequence based on the target text encoded sequence.
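One reading of the residual connection between attention matrices, assumed here, is a residual-attention scheme: layer E_(i+1) adds the attention scores passed up from layer E_i to its own scaled dot-product scores before the softmax, then forwards the summed scores to the next layer. The single-head, pure-Python sketch below shows only that step. The identity projection matrices are toy values; multi-head splitting, feed-forward blocks, and normalization are omitted, and the residual add on the sequence is also an assumption.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def softmax_rows(S):
    out = []
    for row in S:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out

def residual_attention_layer(H, prev_scores, Wq, Wk, Wv):
    """One single-head encoder layer E_(i+1).

    H: historical text encoded sequence from E_i (T x d).
    prev_scores: attention score matrix from E_i (T x T), or None for the first layer.
    Returns (target text encoded sequence, score matrix for the next layer)."""
    Q, K, V = matmul(H, Wq), matmul(H, Wk), matmul(H, Wv)
    d = len(Wq[0])
    scores = [[v / math.sqrt(d) for v in row] for row in matmul(Q, transpose(K))]
    if prev_scores is not None:
        # residual connection on the attention scores (assumed form)
        scores = [[s + p for s, p in zip(r, pr)] for r, pr in zip(scores, prev_scores)]
    context = matmul(softmax_rows(scores), V)
    # residual connection on the encoded sequence (assumed)
    out = [[h + c for h, c in zip(hr, cr)] for hr, cr in zip(H, context)]
    return out, scores

# Usage: a toy sequence of T = 3 text features with d = 2
H0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I = [[1.0, 0.0], [0.0, 1.0]]
H1, S1 = residual_attention_layer(H0, None, I, I, I)
H2, S2 = residual_attention_layer(H1, S1, I, I, I)
```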
CONTROLLING SLEW RATE
This application relates to methods and apparatus for controlling the slew rate of components that output an analogue output signal. Described is a signal processing circuit having a forward signal path for receiving an input signal and outputting an analogue output signal. The signal processing circuit has a first component, located in the forward signal path, for outputting the analogue output signal. A predictor is configured to predict a required slew rate for the first component based on the input signal, and a controller is configured to controllably vary an output slew-rate limit of the first component based on the predicted required slew rate.
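A minimal discrete-time sketch of the predictor and controller: the predictor inspects each upcoming block of input samples and estimates the required slew rate as the largest sample-to-sample step times the sample rate. The controller sets the output stage's slew-rate limit to that estimate plus headroom, clamped to a hypothetical adjustable range, and the first component is modeled as an ideal slew-rate limiter. The block-based look-ahead, the 1.2 margin, the limit range, and the function names are assumptions, not details from the application.

```python
import math

def predict_required_slew_rate(samples, sample_rate):
    """Predictor: largest sample-to-sample change, converted to units per second."""
    if len(samples) < 2:
        return 0.0
    return max(abs(b - a) for a, b in zip(samples, samples[1:])) * sample_rate

def choose_slew_limit(required, margin=1.2, min_limit=1e3, max_limit=1e6):
    """Controller: add headroom, then clamp to the component's adjustable range."""
    return min(max(required * margin, min_limit), max_limit)

def slew_limited_stage(block, limit, sample_rate, start):
    """Model of the first component: its output moves at most limit / sample_rate per sample."""
    max_step = limit / sample_rate
    out, y = [], start
    for x in block:
        y += max(-max_step, min(max_step, x - y))
        out.append(y)
    return out

def process(signal, sample_rate, block_size=64):
    """Look ahead one block at a time, set the limit, then drive the output stage."""
    output, limits, y = [], [], 0.0
    for i in range(0, len(signal), block_size):
        block = signal[i:i + block_size]
        # include the last output so a jump at the block boundary is predicted too
        limit = choose_slew_limit(predict_required_slew_rate([y] + block, sample_rate))
        out = slew_limited_stage(block, limit, sample_rate, y)
        output.extend(out)
        limits.append(limit)
        y = out[-1]
    return output, limits

# Usage: a 1 kHz tone at 48 kHz
fs = 48_000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(480)]
output, limits = process(tone, fs)
```

Because each limit includes headroom over the predicted requirement, the modeled output follows the input without slew distortion, while quieter or slower blocks would get a lower limit (here floored at the assumed minimum).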