Patent classifications
G10L21/0208
METHOD AND APPARATUS FOR PROCESSING SPEECH, ELECTRONIC DEVICE AND STORAGE MEDIUM
A method for processing a speech includes: acquiring an original speech; extracting a spectrogram from the original speech; acquiring a speech synthesis model, where the speech synthesis model comprises a first generation sub-model and a second generation sub-model; generating a harmonic structure of the spectrogram, by invoking the first generation sub-model to process the spectrogram; and generating a target speech, by invoking the second generation sub-model to process the harmonic structure and the spectrogram.
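The data flow in this abstract (spectrogram, then harmonic structure, then target speech) can be sketched as below. This is only an illustrative outline, not the patented implementation: the naive DFT spectrogram, the `process_speech` name, and the two callables standing in for the generation sub-models are all assumptions introduced here.

```python
import cmath
import math
from typing import Callable, List

Spectrogram = List[List[float]]  # one list of bin magnitudes per frame


def magnitude_spectrogram(samples: List[float], frame_len: int = 8, hop: int = 4) -> Spectrogram:
    """Naive DFT magnitude spectrogram (assumed front end; the abstract does not specify one)."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def process_speech(samples: List[float],
                   first_sub_model: Callable[[Spectrogram], Spectrogram],
                   second_sub_model: Callable[[Spectrogram, Spectrogram], List[float]]) -> List[float]:
    """Hypothetical wiring of the two sub-models described in the abstract."""
    spec = magnitude_spectrogram(samples)
    # Stage 1: the first generation sub-model produces a harmonic structure.
    harmonic = first_sub_model(spec)
    # Stage 2: the second generation sub-model sees both the harmonic
    # structure and the original spectrogram.
    return second_sub_model(harmonic, spec)
```

In practice both callables would be trained networks; any function with a matching signature can be passed in to exercise the pipeline.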
Acoustic output apparatus
The present disclosure provides an acoustic output apparatus including one or more status sensors, at least one low-frequency acoustic driver, at least one high-frequency acoustic driver, at least two first sound guiding holes, and at least two second sound guiding holes. The status sensors may detect status information of a user. The low-frequency acoustic driver may generate at least one first sound, a frequency of which is within a first frequency range. The high-frequency acoustic driver may generate at least one second sound, a frequency of which is within a second frequency range including at least one frequency exceeding the first frequency range. The first and second sound guiding holes may output the first and second spatial sound, respectively. The first and second sound may be generated based on the status information, and may simulate a target sound coming from at least one virtual direction with respect to the user.
ELECTRONIC DEVICE AND SPEAKER VERIFICATION METHOD OF ELECTRONIC DEVICE
An electronic device is provided. The electronic device includes a microphone configured to receive an audio signal including a voice of a user, a sensor configured to detect a vibration signal generated by the user, at least one processor, and a memory configured to store an instruction executable by the processor. The at least one processor may be configured to determine a noise level included in the audio signal, calculate a verification score based on the noise level, the audio signal, and the vibration signal, and perform speaker verification for the user based on the verification score.
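One way to picture a verification score that depends on the noise level, the audio signal, and the vibration signal is a noise-weighted blend of two per-modality scores. The abstract does not give a formula, so everything below is an assumption: the logistic weighting, the `midpoint_db` and `slope` parameters, the 0-to-1 score range, and the decision threshold are hypothetical choices for illustration.

```python
import math


def fusion_weight(noise_db: float, midpoint_db: float = 60.0, slope: float = 0.1) -> float:
    """Weight given to the audio score; it shrinks as the noise level rises (assumed logistic form)."""
    return 1.0 / (1.0 + math.exp(slope * (noise_db - midpoint_db)))


def verification_score(audio_score: float, vibration_score: float, noise_db: float) -> float:
    """Blend the two modality scores, trusting vibration more in noisy conditions."""
    w = fusion_weight(noise_db)
    return w * audio_score + (1.0 - w) * vibration_score


def verify_speaker(audio_score: float, vibration_score: float, noise_db: float,
                   threshold: float = 0.5) -> bool:
    """Accept the speaker when the fused score reaches a hypothetical threshold."""
    return verification_score(audio_score, vibration_score, noise_db) >= threshold
```

With these assumed parameters, the microphone score dominates in quiet conditions and the vibration sensor score dominates once the noise level passes the midpoint.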
GENERATING AUDIO WAVEFORMS USING ENCODER AND DECODER NEURAL NETWORKS
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing an input audio waveform using a generator neural network to generate an output audio waveform. In one aspect, a method comprises: receiving an input audio waveform; processing the input audio waveform using an encoder neural network to generate a set of feature vectors representing the input audio waveform; and processing the set of feature vectors representing the input audio waveform using a decoder neural network to generate an output audio waveform that comprises a respective output audio sample for each of a plurality of output time steps.
Smart Noise Reduction Device and the Method Thereof
The present invention discloses a smart noise reduction device including a control device; an audio waveform pattern recognizer coupled to the control device for identifying an audio mixed signal including a regularity signal and a non-regularity signal; an audio waveform pattern database coupled to the control device, including at least one audio type, each having a plurality of preset second regularity signals; and an audio filter coupled to the control device to obtain the regularity signal.
Method of Noise Reduction for Intelligent Network Communication
The present invention discloses a method of noise reduction for intelligent network communication, which includes the following steps. First, a local sound message is received through a sound receiver of a communication device at the transmitting end. Next, a voice recognizer identifies the voice characteristics of the speaker; it is then determined from a voice database whether there is a voice characteristic corresponding or similar to the one recognized by the voice recognizer. Finally, signals other than the speaker's voice characteristic signal are filtered out by a sound filter to obtain the original sound emitted by the speaker.
SYSTEM AND METHOD FOR IDENTIFYING ACTIVITY IN AN AREA USING A VIDEO CAMERA AND AN AUDIO SENSOR
Techniques for identifying activity in an area, even during periods of poor visibility, using a video camera and an audio sensor are disclosed. The video camera is used to identify visible events of interest and the audio sensor is used to capture audio occurring temporally with the identified visible events of interest. A sound profile is determined for each of the identified visible events of interest based on sounds captured by the audio sensor during the corresponding identified visible event of interest. Then, during a time of poor visibility, a subsequent sound event is identified in a subsequent audio stream captured by the audio sensor. One or more sound characteristics of the subsequent sound event are compared with the sound profiles associated with each of the identified visible events of interest, and if there is a match, one or more matching sound profiles are filtered out from the subsequent audio stream.
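The comparison step in this abstract, matching a later sound event against stored sound profiles, might look like the following. This is a minimal sketch under stated assumptions: representing sound characteristics as fixed-length feature vectors, using cosine similarity, and the 0.9 threshold are all hypothetical choices, and the subsequent filtering of the audio stream is not shown.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def matching_profiles(event_features: List[float],
                      profiles: Dict[str, List[float]],
                      threshold: float = 0.9) -> List[str]:
    """Return names of stored profiles that the sound event resembles (assumed criterion)."""
    return [name for name, features in profiles.items()
            if cosine_similarity(event_features, features) >= threshold]
```

The profile names returned here would identify which sound profiles to filter out of the later audio stream.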
SYSTEM AND METHOD FOR AUTOMATIC SETUP OF AUDIO COVERAGE AREA
Embodiments include an audio system comprising a plurality of microphones disposed in an environment, wherein the plurality of microphones is configured to detect one or more audio sources, and generate location data indicating a location of each of the one or more audio sources relative to the plurality of microphones; and at least one processor communicatively coupled to the plurality of microphones, wherein the at least one processor is configured to receive the location data from the plurality of microphones, and define a plurality of audio pick-up regions in the environment based on the location data, the plurality of audio pick-up regions comprising a first audio pick-up region and a second audio pick-up region, wherein the plurality of microphones are configured to deploy a first lobe within the first audio pick-up region and a second lobe within the second audio pick-up region.