G10L15/28

MULTIMODAL SPEECH RECOGNITION METHOD AND SYSTEM, AND COMPUTER-READABLE STORAGE MEDIUM

The disclosure provides a multimodal speech recognition method and system, and a computer-readable storage medium. The method includes calculating a first logarithmic mel-frequency spectral coefficient and a second logarithmic mel-frequency spectral coefficient when a target millimeter-wave signal and a target audio signal both contain speech information corresponding to a target user; inputting the first and the second logarithmic mel-frequency spectral coefficient into a fusion network to determine a target fusion feature, where the fusion network includes at least a calibration module and a mapping module, the calibration module is configured to perform mutual feature calibration on the target audio/millimeter-wave signals, and the mapping module is configured to fuse a calibrated millimeter-wave feature and a calibrated audio feature; and inputting the target fusion feature into a semantic feature network to determine a speech recognition result corresponding to the target user. The disclosure can implement high-accuracy speech recognition.

MULTIMODAL SPEECH RECOGNITION METHOD AND SYSTEM, AND COMPUTER-READABLE STORAGE MEDIUM

The disclosure provides a multimodal speech recognition method and system, and a computer-readable storage medium. The method includes calculating a first logarithmic mel-frequency spectral coefficient and a second logarithmic mel-frequency spectral coefficient when a target millimeter-wave signal and a target audio signal both contain speech information corresponding to a target user; inputting the first and the second logarithmic mel-frequency spectral coefficient into a fusion network to determine a target fusion feature, where the fusion network includes at least a calibration module and a mapping module, the calibration module is configured to perform mutual feature calibration on the target audio/millimeter-wave signals, and the mapping module is configured to fuse a calibrated millimeter-wave feature and a calibrated audio feature; and inputting the target fusion feature into a semantic feature network to determine a speech recognition result corresponding to the target user. The disclosure can implement high-accuracy speech recognition.

Systems and methods for voice identification and analysis
11580986 · 2023-02-14 ·

Obtaining configuration audio data including voice information for a plurality of meeting participants. Generating localization information indicating a respective location for each meeting participant. Generating a respective voiceprint for each meeting participant. Obtaining meeting audio data. Identifying a first meeting participant and a second meeting participant. Linking a first meeting participant identifier of the first meeting participant with a first segment of the meeting audio data. Linking a second meeting participant identifier of the second meeting participant with a second segment of the meeting audio data. Generating a GUI indicating the respective locations of the first and second meeting participants, and the GUI indicating a first transcription of the first segment and a second transcription of the second segment. The first transcription is associated with the first meeting participant in the GUI, and the second transcription is associated with the second meeting participant in the GUI.

Discrete Three-Dimensional Processor

A discrete three-dimensional (3-D) processor comprises first and second dice. The first die comprises 3-D memory (3D-M) arrays, whereas the second die comprises logic circuits and at least an off-die peripheral-circuit component of the 3D-M array(s). Typical off-die peripheral-circuit component could be an address decoder, a sense amplifier, a programming circuit, a read-voltage generator, a write-voltage generator, a data buffer, or a portion thereof.

Discrete Three-Dimensional Processor

A discrete three-dimensional (3-D) processor comprises stacked first and second dice. The first die comprises three-dimensional memory (3D-M) arrays, whereas the second die comprises at least a portion of a logic/processing circuit and an off-die peripheral-circuit component of the 3D-M array(s). The preferred 3-D processor can be used to compute non-arithmetic function/model. In other applications, the preferred 3-D processor may also be a 3-D configurable computing array, a 3-D pattern processor, or a 3-D neuro-processor.

INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING METHOD
20230045458 · 2023-02-09 · ·

An information processing apparatus of the present disclosure includes an acquisition unit that acquires respiration information indicating respiration of a user, and a determination unit that determines an operation amount regarding an operation by the user on the basis of the respiration of the user indicated by the respiration information acquired by the acquisition unit.

VOICE RECOGNITION USING ACCELEROMETERS FOR SENSING BONE CONDUCTION

Voice command recognition and natural language recognition are carried out using an accelerometer that senses signals from the vibrations of one or more bones of a user and receives no audio input. Since word recognition is made possible using solely the signal from the accelerometer from a person's bone conduction as they speak, an acoustic microphone is not needed and thus not used to collect data for word recognition. According to one embodiment, a housing contains an accelerometer and a processor, both within the same housing. The accelerometer is preferably a MEMS accelerometer which is capable of sensing the vibrations that are present in the bone of a user as the user is speaking words. A machine learning algorithm is applied to the collected data to correctly recognize words spoken by a person with significant difficulties in creating audible language.

DEVICE INCLUDING SPEECH RECOGNITION FUNCTION AND METHOD OF RECOGNIZING SPEECH
20180005627 · 2018-01-04 ·

A device including a speech recognition function which recognizes speech from a user, includes: a loudspeaker which outputs speech to a space; a microphone which collects speech in the space; a first speech recognition unit which recognizes the speech collected by the microphone; a command control unit which issues a command for controlling the device, based on the speech recognized by the first speech recognition unit; and a control unit which prohibits the command issuance unit from issuing the command, based on the speech to be output from the loudspeaker.

DEVICE INCLUDING SPEECH RECOGNITION FUNCTION AND METHOD OF RECOGNIZING SPEECH
20180005627 · 2018-01-04 ·

A device including a speech recognition function which recognizes speech from a user, includes: a loudspeaker which outputs speech to a space; a microphone which collects speech in the space; a first speech recognition unit which recognizes the speech collected by the microphone; a command control unit which issues a command for controlling the device, based on the speech recognized by the first speech recognition unit; and a control unit which prohibits the command issuance unit from issuing the command, based on the speech to be output from the loudspeaker.

CALL MANAGEMENT SYSTEM AND ITS SPEECH RECOGNITION CONTROL METHOD
20180012600 · 2018-01-11 ·

A speech recognition server has a speech recognition engine, and a mode control table to hold a speech recognition mode for each call. The speech recognition engine has a mode management unit to designate a speech recognition mode for a decoder, and an output analysis unit to analyze recognition result data speech-to-text converted by speech recognition. The output analysis unit designates the speech recognition mode for the mode management unit in accordance with result of analysis of the recognition result data speech-to-text converted by the speech recognition. The mode management unit designates the speech recognition mode for the decoder in accordance with the designation with the output analysis unit. Upon speech recognition of call data, it is possible to suppress hardware resource consumption while improve users' satisfaction.