Patent classifications
G10L25/78
HEARING SYSTEM INCLUDING A HEARING INSTRUMENT AND METHOD FOR OPERATING THE HEARING INSTRUMENT
A hearing system includes a hearing instrument for capturing a sound signal from an environment of the hearing instrument. The captured sound signal is processed, and the processed sound signal is output to a user of the hearing instrument. In a speech recognition step, the captured sound signal is analyzed to recognize speech intervals, in which the captured sound signal contains speech. In a speech enhancement procedure performed during recognized speech intervals, the amplitude of the processed sound signal is periodically varied according to a temporal pattern that is consistent with a stress rhythmic pattern of the user. A method for operating the hearing instrument is also provided.
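The abstract does not disclose an implementation, but the two steps it names (speech-interval recognition, then periodic amplitude variation) can be illustrated with a minimal sketch. The energy-based speech detector, the sinusoidal gain pattern, and all parameter names (`rhythm_hz`, `depth`, `vad_thresh`) are assumptions for illustration, not the patented method.

```python
import numpy as np

def enhance(signal, sr, rhythm_hz=4.0, depth=0.2, frame=512, vad_thresh=0.01):
    """Sketch: periodically vary output amplitude during detected speech
    intervals, at a rate assumed to match the user's stress rhythm."""
    out = signal.astype(float).copy()
    t = np.arange(len(signal)) / sr
    # Temporal gain pattern: a slow sinusoid around unity gain.
    gain = 1.0 + depth * np.sin(2 * np.pi * rhythm_hz * t)
    for start in range(0, len(signal), frame):
        seg = out[start:start + frame]
        # Crude stand-in for the speech recognition step: frame RMS energy.
        if np.sqrt(np.mean(seg ** 2)) > vad_thresh:
            out[start:start + frame] = seg * gain[start:start + frame]
    return out
```

In a real hearing instrument the speech detector and the rhythm estimate would come from dedicated signal-processing stages; here both are reduced to their simplest possible form.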
MULTIMODAL SPEECH RECOGNITION METHOD AND SYSTEM, AND COMPUTER-READABLE STORAGE MEDIUM
The disclosure provides a multimodal speech recognition method and system, and a computer-readable storage medium. The method includes calculating a first logarithmic mel-frequency spectral coefficient and a second logarithmic mel-frequency spectral coefficient when a target millimeter-wave signal and a target audio signal both contain speech information corresponding to a target user; inputting the first and the second logarithmic mel-frequency spectral coefficients into a fusion network to determine a target fusion feature, where the fusion network includes at least a calibration module and a mapping module, the calibration module is configured to perform mutual feature calibration on the target audio and millimeter-wave signals, and the mapping module is configured to fuse a calibrated millimeter-wave feature and a calibrated audio feature; and inputting the target fusion feature into a semantic feature network to determine a speech recognition result corresponding to the target user. The disclosure can implement high-accuracy speech recognition.
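The data flow described above (log-mel features from each modality, mutual calibration, then fusion) can be sketched as follows. The band-averaged log-mel stand-in, the sigmoid cross-gating used for "mutual feature calibration", and the concatenate-and-project "mapping module" are all assumptions; the actual network architecture is not given in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_mel(spec_frames, n_mel=40):
    # Placeholder for log mel-frequency spectral coefficients: average
    # magnitude-spectrum bins into n_mel bands and take the log.
    bands = np.array_split(np.arange(spec_frames.shape[1]), n_mel)
    mel = np.stack([spec_frames[:, b].mean(axis=1) for b in bands], axis=1)
    return np.log(mel + 1e-8)

def calibrate(feat_a, feat_b):
    # Assumed calibration module: each modality is re-weighted by a
    # sigmoid gate computed from the other modality's features.
    gate_a = 1.0 / (1.0 + np.exp(-feat_b.mean(axis=1, keepdims=True)))
    gate_b = 1.0 / (1.0 + np.exp(-feat_a.mean(axis=1, keepdims=True)))
    return feat_a * gate_a, feat_b * gate_b

def fuse(feat_a, feat_b, w):
    # Assumed mapping module: concatenate calibrated features and project.
    return np.concatenate([feat_a, feat_b], axis=1) @ w
```

The fused feature would then feed a semantic feature network (not sketched here) to produce the recognition result.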
EMOTION TAG ASSIGNING SYSTEM, METHOD, AND PROGRAM
Provided are an emotion tag assigning system, method, and program for assigning, to a content, an emotion tag indicating an emotion of a user in execution of an event using the content.
An emotion tag assigning method includes a step of detecting, by a voice detector, voice data indicating a voice uttered by a person who participates in an event using a content during execution of the event; a step of recognizing, by an emotion recognizer, an emotion of the person based on the voice data; a step of acquiring, by a processor, emotion information indicating the recognized emotion of the person during the execution of the event using the content; and a step of assigning, by the emotion recognizer, an emotion rank calculated from the acquired emotion information to the content as an emotion tag.
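The rank-assignment step at the end of the pipeline can be sketched as below. The per-utterance emotion scores, the five-level rank scale, and the averaging rule are illustrative assumptions; the abstract only states that an emotion rank is calculated from the acquired emotion information.

```python
from statistics import mean

def emotion_rank(scores, n_ranks=5):
    # Map the mean emotion intensity in [0, 1] to a discrete rank 1..n_ranks.
    return min(n_ranks, int(mean(scores) * n_ranks) + 1)

def assign_tag(content, utterance_scores):
    # Attach the calculated rank to the content as its emotion tag.
    content = dict(content)
    content["emotion_tag"] = emotion_rank(utterance_scores)
    return content
```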
CONTACT AND ACOUSTIC MICROPHONES FOR VOICE WAKE AND VOICE PROCESSING FOR AR/VR APPLICATIONS
A method to combine contact and acoustic microphones in a headset for voice wake and voice processing in immersive reality applications is provided. The method includes receiving, from a contact microphone, a first acoustic signal, determining a fidelity and a quality of the first acoustic signal, receiving, from an acoustic microphone, a second acoustic signal, and when the fidelity and quality of the first acoustic signal exceed a pre-selected threshold, combining the first acoustic signal and the second acoustic signal to provide an enhanced acoustic signal to a smart glass user. A non-transitory, computer-readable medium storing instructions to cause a headset to perform the above method, and the headset, are also provided.
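The thresholded combination logic can be sketched as follows. Using RMS level as the fidelity/quality proxy and equal-weight averaging as the combination rule are assumptions; the abstract does not specify either.

```python
import numpy as np

def combine(contact_sig, acoustic_sig, threshold=0.1):
    # Assumed fidelity proxy: RMS level of the contact-mic signal.
    fidelity = np.sqrt(np.mean(contact_sig ** 2))
    if fidelity > threshold:
        # Assumed combination rule: equal-weight average of both mics.
        return 0.5 * (contact_sig + acoustic_sig)
    # Below threshold: fall back to the acoustic microphone alone.
    return acoustic_sig
```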
Synthetic speech processing
A speech-processing system receives input data representing text. A first encoder processes segments of the text to determine embedding data representing the text, and a second encoder processes corresponding audio data to determine prosodic data corresponding to the text. The embedding and prosodic data is processed to create output data including a representation of speech corresponding to the text and prosody.
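The two-encoder structure can be sketched as below. The toy random-projection encoders and the concatenate-then-project decoder stand in for learned networks; all dimensions and the combination rule are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

class Encoder:
    """Toy stand-in for a learned encoder: a fixed random projection."""
    def __init__(self, d_in, d_out):
        self.w = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)

    def __call__(self, x):
        return np.tanh(x @ self.w)

def synthesize(text_feats, audio_feats, text_enc, prosody_enc, decoder_w):
    emb = text_enc(text_feats)          # first encoder: text embedding data
    prosody = prosody_enc(audio_feats)  # second encoder: prosodic data
    # Process embedding and prosodic data together to create output data.
    return np.concatenate([emb, prosody], axis=1) @ decoder_w
```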
Intelligent voice recognizing method, apparatus, and intelligent computing device
An intelligent voice recognition method, voice recognition apparatus and intelligent computing device are disclosed. An intelligent voice recognition method of a voice recognition apparatus according to an embodiment of the present invention detects a voice of a user, receives an authentication request from the user, and performs authentication for the user based on whether the user has recently been authenticated and on the result of recognizing the user's voice, thereby reducing the time and amount of computation required for user authentication. One or more of the voice recognition apparatus and the intelligent computing device can be associated with artificial intelligence (AI) modules, unmanned aerial vehicle (UAV) robots, augmented reality (AR) devices, virtual reality (VR) devices, 5G service related devices, etc.
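The recency shortcut described above can be sketched as a simple decision rule. The five-minute recency window, the match-score threshold, and the function signature are assumptions; the abstract specifies only that a recent prior authentication is weighed together with the voice recognition result.

```python
import time

AUTH_TTL_S = 300  # assumed recency window (seconds)

def authenticate(user_id, voice_score, last_auth_time, now=None,
                 match_thresh=0.8):
    """Return True if the user is authenticated."""
    now = time.time() if now is None else now
    if last_auth_time is not None and now - last_auth_time < AUTH_TTL_S:
        # Recently authenticated: skip the full voice recognition step,
        # saving time and computation.
        return True
    # Otherwise fall back to the voice recognition result.
    return voice_score >= match_thresh
```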
Hearing system comprising a personalized beamformer
A hearing system configured to be located at or in the head of a user, comprises a) at least two microphones providing at least two electric input signals, b) an own voice detector, c) access to a database (O.sub.l, H.sub.l) comprising c1) relative or absolute own voice transfer function(s), and corresponding c2) absolute or relative acoustic transfer functions for a multitude of test-persons, d) a processor connectable to the at least two microphones, to the own voice detector, and to the database. The processor is configured A) to estimate an own voice relative transfer function for sound from the user's mouth to at least one of the at least two microphones, and B) to estimate personalized relative or absolute head related acoustic transfer functions from at least one spatial location other than the user's mouth to at least one of the microphones of the hearing system in dependence of the estimated own voice relative transfer function(s) and the database (O.sub.l, H.sub.l). The hearing system further comprises e) a beamformer configured to receive the at least two electric input signals, or processed versions thereof, and to determine personalized beamformer weights based on the personalized relative or absolute head related acoustic transfer functions or impulse responses. A method of determining personalized beamformer coefficients (w.sub.k) is further disclosed.
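The final stages of the pipeline above (looking up a personalized transfer function via the own-voice match, then forming beamformer weights from it) can be sketched as follows. The nearest-neighbour database lookup and the MVDR weight formula are assumed concrete choices; the abstract does not commit to a particular estimator or beamformer type.

```python
import numpy as np

def personalized_atf(own_voice_rtf, db_own, db_head):
    # Pick the test person whose stored own-voice transfer function (O_l)
    # is closest to the user's estimate; return their head-related ATF (H_l).
    idx = np.argmin(np.linalg.norm(db_own - own_voice_rtf, axis=1))
    return db_head[idx]

def mvdr_weights(noise_cov, rtf):
    # MVDR beamformer: w = R^{-1} d / (d^H R^{-1} d), with d the
    # personalized relative transfer function as steering vector.
    r_inv_d = np.linalg.solve(noise_cov, rtf)
    return r_inv_d / (rtf.conj() @ r_inv_d)
```

The resulting weights satisfy the distortionless constraint `w^H d = 1`, so the target direction described by the personalized transfer function is passed unattenuated.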