Patent classifications
G10L15/20
ELECTRONIC APPARATUS AND CONTROLLING METHOD THEREOF
An electronic apparatus includes: a memory storing a first threshold value and a second threshold value corresponding to a receiving direction of a wake-up word; a sound receiver comprising sound receiving circuitry; and a processor configured to: identify a receiving direction of a sound received through the sound receiver; based on a similarity between sound data obtained in response to the received sound and the wake-up word being greater than or equal to the first threshold value corresponding to the identified receiving direction, perform voice recognition for a subsequent sound received through the sound receiver; and, based on the similarity being less than the first threshold value and greater than or equal to the second threshold value, change the first threshold value.
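The two-threshold decision described in this abstract can be sketched as follows. This is a minimal illustration, not the claimed implementation: the names `DirectionThresholds` and `handle_sound` and the `step` parameter are hypothetical, and the abstract only says the first threshold is "changed", so lowering it toward the second threshold is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DirectionThresholds:
    first: float   # similarity needed to start voice recognition
    second: float  # lower bound of the "near miss" band


def handle_sound(similarity, direction, thresholds, step=0.05):
    """Decide what to do with a sound received from `direction`.

    `thresholds` maps a receiving direction to its threshold pair.
    """
    t = thresholds[direction]
    if similarity >= t.first:
        return "recognize"
    if similarity >= t.second:
        # Assumption: a near miss lowers the first threshold, but never
        # below the second one. The abstract does not specify the rule.
        t.first = max(t.second, t.first - step)
        return "threshold_changed"
    return "ignore"
```

For example, with `thresholds = {"front": DirectionThresholds(0.8, 0.6)}`, a similarity of 0.9 from `"front"` triggers recognition, while 0.7 falls in the near-miss band and adjusts the first threshold for that direction only.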
Recognition or synthesis of human-uttered harmonic sounds
Within each harmonic spectrum of a sequence of spectra derived from analysis of a waveform representing human speech are identified two or more fundamental or harmonic components that have frequencies that are separated by integer multiples of a fundamental acoustic frequency. The highest harmonic frequency that is also greater than 410 Hz is a primary cap frequency, which is used to select a primary phonetic note that corresponds to a subset of phonetic chords from a set of phonetic chords for which acoustic spectral data is available. The spectral data can also include frequencies for primary band, secondary band (or secondary note), basal band, or reduced basal band acoustic components, which can be used to select a phonetic chord from the subset of phonetic chords corresponding to the selected primary note.
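The selection of the primary cap frequency can be illustrated with a short sketch. It assumes the spectral peaks and the fundamental frequency have already been estimated; the function name `primary_cap_frequency` and the tolerance `tol` are hypothetical and not taken from the abstract. Only the 410 Hz floor comes from the text.

```python
def primary_cap_frequency(peaks_hz, f0_hz, tol=0.03, floor_hz=410.0):
    """Return the highest harmonic above `floor_hz`, or None if there is none.

    A peak counts as a harmonic when it lies within `tol * f0_hz` of an
    integer multiple of the fundamental `f0_hz` (an assumed criterion).
    """
    harmonics = []
    for f in peaks_hz:
        n = round(f / f0_hz)
        if n >= 1 and abs(f - n * f0_hz) <= tol * f0_hz:
            harmonics.append(f)
    candidates = [f for f in harmonics if f > floor_hz]
    return max(candidates) if candidates else None
```

With peaks at 120, 240, 365, 480, 600, and 655 Hz and a 120 Hz fundamental, the peaks at 365 Hz and 655 Hz are rejected as non-harmonic, and 600 Hz is the highest remaining harmonic above 410 Hz.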
ACOUSTIC CROSSTALK SUPPRESSION DEVICE AND ACOUSTIC CROSSTALK SUPPRESSION METHOD
An acoustic crosstalk suppression device includes: a speaker estimation unit configured to estimate a main speaker based on voice signals collected by n microphones corresponding to n persons (n being an integer equal to or larger than 3); n filter update units, each configured to update a parameter of a filter that generates a suppression signal for a crosstalk component included in a voice signal of the main speaker; and a crosstalk suppression unit configured to suppress the crosstalk component by using a synthesized suppression signal generated by at most (n-1) filter update units corresponding to reference signals collected by at most (n-1) microphones.
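A rough sketch of the structure described above: estimate the main speaker, run one adaptive filter per reference microphone, and subtract the summed suppression signal. The abstract does not name an estimation rule or an adaptation algorithm, so picking the highest-energy microphone and updating with normalized LMS (NLMS) are assumptions swapped in for illustration. All names and parameters (`taps`, `mu`, `eps`) are hypothetical.

```python
def estimate_main_speaker(frames):
    """Assumption: the main speaker is the microphone with the most energy."""
    energies = [sum(x * x for x in ch) for ch in frames]
    return max(range(len(frames)), key=energies.__getitem__)


def suppress_crosstalk(frames, taps=4, mu=0.5, eps=1e-8):
    """Return (main_index, cleaned_signal) for equal-length channels in `frames`."""
    main = estimate_main_speaker(frames)
    refs = [i for i in range(len(frames)) if i != main]  # at most n-1 references
    weights = {i: [0.0] * taps for i in refs}            # one filter per reference
    cleaned = []
    for t in range(len(frames[main])):
        # Most recent `taps` samples of each reference, zero-padded at the start.
        windows = {
            i: [frames[i][t - k] if t - k >= 0 else 0.0 for k in range(taps)]
            for i in refs
        }
        # Synthesized suppression signal: sum of all reference filter outputs.
        suppression = sum(
            w * x for i in refs for w, x in zip(weights[i], windows[i])
        )
        error = frames[main][t] - suppression
        cleaned.append(error)
        # NLMS update shared across the reference filters.
        norm = eps + sum(x * x for i in refs for x in windows[i])
        for i in refs:
            weights[i] = [
                w + mu * error * x / norm
                for w, x in zip(weights[i], windows[i])
            ]
    return main, cleaned
```

When the reference microphones are silent, the filters stay at zero and the main speaker's signal passes through unchanged.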
ELECTRONIC DEVICE AND METHOD FOR PROCESSING USER INPUT
An electronic device and method are disclosed herein. The electronic device includes a communication circuit, a processor, and a memory. The processor implements the method, including: receiving, via the communication circuit, from each of one or more external devices that receive a voice signal of a user, a first probability value based on usage frequency and a second probability value based on signal-to-noise ratio (SNR) magnitude; calculating a final probability value for each of the one or more external devices based on that device's first and second probability values; and selecting, from among the one or more external devices, the external device having the highest of the calculated final probability values.
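The selection step can be sketched in a few lines. The abstract does not say how the two probability values are combined, so multiplying them is an assumption made here for illustration. The function name `select_device` and the device names in the example are hypothetical.

```python
def select_device(reports):
    """Pick the device with the highest final probability.

    `reports` maps a device name to (usage_probability, snr_probability).
    Assumption: the final probability is the product of the two values.
    """
    finals = {
        device: p_usage * p_snr
        for device, (p_usage, p_snr) in reports.items()
    }
    best = max(finals, key=finals.get)
    return best, finals
```

For instance, with reports of `(0.6, 0.5)` for a speaker, `(0.3, 0.9)` for a TV, and `(0.1, 0.4)` for a phone, the speaker is selected even though the TV reports the better SNR, because the speaker's usage-based value outweighs the difference.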
Method and apparatus for audio data processing
Embodiments of the disclosure provide methods and apparatuses for processing audio data. The method can include: acquiring audio data by an audio capturing device, determining feature information of an enclosure in which the audio capturing device is located, and reverberating the feature information into the audio data.
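One common way to apply room characteristics to a recording is to convolve the audio with an impulse response of the room. The sketch below assumes the "feature information" of the enclosure can be represented as such an impulse response. The abstract does not confirm this representation, and the function name `reverberate` is hypothetical.

```python
def reverberate(audio, impulse_response):
    """Convolve `audio` with `impulse_response` (both lists of samples).

    Assumption: the enclosure's feature information is an impulse response.
    """
    out = [0.0] * max(len(audio) + len(impulse_response) - 1, 0)
    for i, x in enumerate(audio):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out
```

A single click `[1.0, 0.0, 0.0]` passed through the impulse response `[1.0, 0.5]` comes out as the click followed by a half-amplitude echo one sample later.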
MULTI-ENCODER END-TO-END AUTOMATIC SPEECH RECOGNITION (ASR) FOR JOINT MODELING OF MULTIPLE INPUT DEVICES
An end-to-end automatic speech recognition (ASR) system includes: a first encoder configured for close-talk input captured by a close-talk input mechanism; a second encoder configured for far-talk input captured by a far-talk input mechanism; and an encoder selection layer configured to select at least one of the first and second encoders for use in producing ASR output. The selection is made based on at least one of short-time Fourier transform (STFT), Mel-frequency cepstral coefficient (MFCC), and filter bank features derived from at least one of the close-talk input and the far-talk input. If signals from both the close-talk input mechanism and the far-talk input mechanism are present for a speech segment, the encoder selection layer dynamically selects whichever of the close-talk encoder and the far-talk encoder better recognizes the speech segment. An encoder-decoder model is used to produce the ASR output.
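The encoder selection layer can be illustrated with a toy scoring function. In this sketch, each input's per-frame feature vectors (standing in for STFT, MFCC, or filter bank features) are mean-pooled, scored with a weight vector, and compared through a softmax. The pooling, the linear scoring, and the names `mean_pool` and `select_encoder` are all assumptions. In a real system the weights would be learned jointly with the encoders.

```python
import math


def mean_pool(frames):
    """Average a list of equal-length feature vectors into one vector."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def select_encoder(close_feats, far_feats, weights):
    """Return (encoder_name, probabilities) for one speech segment.

    An input that is None or empty is treated as absent for the segment.
    """
    inputs = {"close_talk": close_feats, "far_talk": far_feats}
    present = {name: feats for name, feats in inputs.items() if feats}
    if not present:
        raise ValueError("no input signal present")
    scores = {
        name: sum(w * x for w, x in zip(weights, mean_pool(feats)))
        for name, feats in present.items()
    }
    # Numerically stable softmax over the encoders that have input.
    peak = max(scores.values())
    exps = {name: math.exp(s - peak) for name, s in scores.items()}
    total = sum(exps.values())
    probs = {name: e / total for name, e in exps.items()}
    return max(probs, key=probs.get), probs
```

When only one input mechanism produces a signal, the sketch falls back to that input's encoder. When both are present, the higher-scoring encoder is chosen for the segment.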