Patent classifications
G10L15/24
METHOD AND APPARATUS FOR ACQUIRING SEMANTIC INFORMATION, ELECTRONIC DEVICE AND STORAGE MEDIUM
A method and an apparatus for acquiring semantic information, an electronic device and a storage medium are provided. The method includes: collecting an echo signal of vibrations of a throat; performing a Fourier transform on a waveform of each period of the echo signal to obtain a spectrogram of each period, wherein the spectrograms of M periods form a spectrogram set, the spectrogram set includes M spectrograms, and the spectrograms are arranged in sequence from first to last according to a return time sequence of the corresponding echo signal; extracting a characteristic waveform of the vibrations of the throat from the spectrogram set; segmenting the characteristic waveform to obtain characteristic segments containing the semantic information; and inputting the characteristic segments into a semantic acquisition model to acquire the semantic information.
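The per-period transform and the ordered spectrogram set described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names, the naive DFT, the fixed period length, and especially the choice of tracking a single frequency bin as the "characteristic waveform" are all assumptions made for the example, since the abstract does not specify how that waveform is extracted.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Naive O(n^2) discrete Fourier transform; returns magnitudes of bins 0..n//2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def spectrogram_set(echo, period_len):
    """Split the echo into M consecutive periods (kept in return-time order)
    and transform each period separately."""
    m = len(echo) // period_len
    return [dft_magnitudes(echo[i * period_len:(i + 1) * period_len])
            for i in range(m)]


def characteristic_waveform(spectra, bin_index):
    """Assumed feature: follow one frequency bin across the M spectra."""
    return [s[bin_index] for s in spectra]


# Synthetic echo (assumption): 4 periods of 16 samples, a tone in bin 2
# whose amplitude changes from period to period.
amps = [1.0, 0.5, 2.0, 1.5]
echo = [a * math.cos(2 * math.pi * 2 * t / 16) for a in amps for t in range(16)]

spectra = spectrogram_set(echo, 16)             # M = 4 spectra, in order
waveform = characteristic_waveform(spectra, 2)  # one value per period
```

In this toy signal the tracked bin rises and falls with the per-period amplitude, which is the kind of slowly varying curve that could then be segmented and passed to a model.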
Detecting potential significant errors in speech recognition results
In some embodiments, recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input are evaluated for indications of potential errors. In some embodiments, the indications of potential errors may include discrepancies between recognition results that are meaningful for a domain, such as medically-meaningful discrepancies. The evaluation of the recognition results may be carried out using any suitable criteria, including one or more criteria that differ from criteria used by an ASR system in determining the top recognition result and the alternative recognition results from the speech input. In some embodiments, a recognition result may additionally or alternatively be processed to determine whether the recognition result includes a word or phrase that is unlikely to appear in the domain to which the speech input relates.
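One way to picture the discrepancy check is to diff the top result against each alternative and flag only the differences that touch domain-significant words. The sketch below is an assumption-laden illustration: the `MEANINGFUL` lexicon and the function name are hypothetical, and word-level diffing with Python's `difflib` stands in for whatever criteria an actual system would use.

```python
import difflib

# Hypothetical lexicon of words whose substitution would matter in a
# medical domain; a real system would use a far richer resource.
MEANINGFUL = {"hypertension", "hypotension", "no", "not", "left", "right"}


def meaningful_discrepancies(top, alternatives, lexicon=MEANINGFUL):
    """Return (top_words, alt_words) pairs where the top result and an
    alternative differ in a span that involves a lexicon word."""
    top_words = top.lower().split()
    flagged = []
    for alt in alternatives:
        alt_words = alt.lower().split()
        matcher = difflib.SequenceMatcher(a=top_words, b=alt_words,
                                          autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                continue
            changed = set(top_words[i1:i2]) | set(alt_words[j1:j2])
            if changed & lexicon:
                flagged.append((top_words[i1:i2], alt_words[j1:j2]))
    return flagged


flags = meaningful_discrepancies(
    "patient has hypertension",
    ["patient has hypotension", "the patient has hypertension"],
)
```

Here the first alternative is flagged because it swaps two clinically opposite terms, while the second differs only by an article and is ignored.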
CONTEXTUAL UTTERANCE RESOLUTION IN MULTIMODAL SYSTEMS
A system and method of responding to a vocal utterance may include capturing and converting the utterance to word(s) using a language processing method, such as natural language processing. The context of the utterance and of the system, which may include multimodal inputs, may be used to determine the meaning and intent of the words.
SPEECH SIGNAL PROCESSING METHOD AND RELATED DEVICE THEREOF
A speech signal processing method and a related device thereof are provided. The method may be applied to the audio field and includes: obtaining a user speech signal captured by a sensor; obtaining a corresponding vibration signal when a user produces speech, where the vibration signal indicates a vibration feature of a body part of the user, and the body part is a part that vibrates in response to sound production when the user is making a sound; and obtaining target speech information based on the vibration signal and the user speech signal captured by the sensor. In this application, the vibration signal is used as a basis for speech recognition.
Voice activation using a laser listener
A voice activation system for a vehicle is provided. The vehicle has at least one sound panel capable of conveying vibrations of a user's voice from outside the vehicle into an interior area of the vehicle. A laser listening device is operably connected to the panel for receiving vibrations from the user's voice. A controller receives a pre-identified command of the user from the laser listener and performs an action in the vehicle in response.
ACCIDENTAL VOICE TRIGGER AVOIDANCE USING THERMAL DATA
Methods and systems for processing voice commands are disclosed. A voice controlled device may receive audio data comprising a voice command. Location information indicative of the source of the audio data may be determined. One or more devices may be caused to emit signals based on the location information. The one or more devices may receive thermal data in response to the signals. The thermal data may be analyzed to determine whether it indicates the presence of a person at the expected location. If a person is detected, the audio data may be processed to cause the voice command to be executed.
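The gating logic described above might look roughly like the sketch below. Everything specific in it is an assumption for illustration: the thermal data is modeled as a grid of Celsius readings, the expected location is a rectangular region of that grid, and the temperature band and minimum cell count are arbitrary thresholds, none of which come from the abstract.

```python
def person_present(thermal_frame, region, low_c=30.0, high_c=40.0, min_cells=2):
    """Assumed heuristic: a person is present if enough cells inside the
    expected region fall within a body-temperature band."""
    r0, r1, c0, c1 = region
    warm = sum(1
               for row in thermal_frame[r0:r1]
               for t in row[c0:c1]
               if low_c <= t <= high_c)
    return warm >= min_cells


def handle_audio(command, thermal_frame, region, execute):
    """Run the command only when the thermal check supports a human source."""
    if person_present(thermal_frame, region):
        execute(command)
        return True
    return False


# Toy 3x4 frame (assumption): a warm blob in the upper-right corner.
frame = [
    [21.0, 21.5, 34.0, 35.5],
    [21.0, 22.0, 33.5, 36.0],
    [20.5, 21.0, 21.5, 22.0],
]
executed = []
accepted = handle_audio("lights on", frame, (0, 2, 2, 4), executed.append)
rejected = handle_audio("lights off", frame, (0, 2, 0, 2), executed.append)
```

The second call is dropped because the region the audio appears to come from shows only room-temperature readings, which is the accidental-trigger case (for example, a television) that the method aims to filter out.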