Patent classifications
G10L25/93
Method for user voice input processing and electronic device supporting same
According to an embodiment, an electronic device is disclosed that includes a speaker, a microphone, a communication interface, a processor operatively connected to the speaker, the microphone, and the communication interface, and a memory operatively connected to the processor. The memory stores instructions that, when executed, cause the processor to receive a first utterance through the microphone, to determine a speaker model by performing speaker recognition on the first utterance, to receive a second utterance through the microphone after the first utterance is received, and to detect an end-point of the second utterance at least partially using the determined speaker model. Various other embodiments understood from the specification are also possible.
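The abstract does not say how the speaker model feeds into end-point detection. As one possible reading, the sketch below assumes each frame of the second utterance has already been scored against the speaker model (a similarity in [0, 1]) and declares the end-point once the enrolled speaker has been absent for several consecutive frames. The function name, `match_threshold`, and `hangover_frames` are hypothetical and do not come from the patent.

```python
def detect_end_point(frame_scores, match_threshold=0.5, hangover_frames=3):
    """Return the index of the first frame after the target speaker stops.

    frame_scores: per-frame similarity to the speaker model (assumed input).
    An end-point is reported only after `hangover_frames` consecutive frames
    score below `match_threshold`; returns None if that never happens.
    """
    last_match = None  # index of the most recent frame matching the speaker
    misses = 0         # consecutive non-matching frames since last_match
    for i, score in enumerate(frame_scores):
        if score >= match_threshold:
            last_match = i
            misses = 0
        elif last_match is not None:
            misses += 1
            if misses >= hangover_frames:
                return last_match + 1
    return None
```

Under these assumptions, speech from a different talker scores low against the model and counts toward the hangover, so it does not extend the detected utterance.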
VOICE ACTIVITY DETECTION METHOD AND APPARATUS, AND STORAGE MEDIUM
Provided are a voice activity detection method and apparatus, an electronic device, and a storage medium, which relate to the technical field of voice processing and, for example, to the technical fields of artificial intelligence and deep learning. In the described implementation, a first audio signal is acquired and a frequency domain feature of the first audio signal is extracted; the frequency domain feature is input into a voice activity detection model, and a voice presence detection result output by the model is obtained, where the model is configured to detect whether voice is present in the first audio signal.
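The two stages named in the abstract (frequency domain feature extraction, then a detection model) can be illustrated with a minimal sketch. Everything specific here is an assumption: the feature is a set of log band energies from a naive DFT, and `toy_vad_model` is a hypothetical threshold rule standing in for the trained model, which the abstract does not describe.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def log_band_energies(frame, num_bands=4):
    """Frequency domain feature: log energy in `num_bands` contiguous bands."""
    spectrum = magnitude_spectrum(frame)
    band_size = max(1, len(spectrum) // num_bands)
    features = []
    for b in range(num_bands):
        start = b * band_size
        # The last band absorbs any leftover bins.
        end = len(spectrum) if b == num_bands - 1 else start + band_size
        energy = sum(m * m for m in spectrum[start:end])
        features.append(math.log(energy + 1e-10))
    return features

def toy_vad_model(features, threshold=0.0):
    """Hypothetical stand-in for the detection model: voice is reported
    present when any band's log energy exceeds `threshold`."""
    return max(features) > threshold
```

A real system would replace `toy_vad_model` with a trained classifier while keeping the same interface: features in, voice presence decision out.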
ACOUSTIC ANALYSIS OF CROWD SOUNDS
A method, computer system, and computer program product for detecting face mask usage based on crowd sound are provided. The present invention may include capturing an audio stream that includes crowd voice data. The present invention may also include analyzing the crowd voice data using a machine learning model to determine the number of people wearing masks. The present invention may further include, in response to determining that the number of people wearing masks does not meet a compliance threshold, displaying content to promote face mask usage.
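The final step, comparing the model's estimate with a compliance threshold, might look like the sketch below. It assumes the machine learning model has already produced an estimated masked count and total count, and that compliance is measured as a fraction; the function name and the default threshold of 0.8 are hypothetical.

```python
def needs_mask_promotion(masked_count, total_count, compliance_threshold=0.8):
    """Return True when promotional content should be displayed.

    masked_count and total_count are assumed outputs of the crowd-sound
    model. With no one detected, there is nothing to promote.
    """
    if total_count <= 0:
        return False
    return masked_count / total_count < compliance_threshold
```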
SYSTEM, METHOD, AND RECORDING MEDIUM FOR CONTROLLING DIALOGUE INTERRUPTIONS BY A SPEECH OUTPUT DEVICE
A computer speech output control method, system, and non-transitory computer readable medium include a computer speech output control system that includes a computer speech output unit configured to output computer speech, a human speech monitoring circuit configured to determine whether a human conversation is occurring, an interruption priority setting circuit configured to set a priority setting for when the human conversation can be interrupted by the computer speech, and an interruption determining circuit configured to determine whether to cause the computer speech output unit to output the computer speech based on the priority setting and a status of the human conversation.
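The decision made by the interruption determining circuit can be sketched as a small function of the priority setting and the conversation status. The three priority levels and the rule "interrupt only at or above a configured minimum priority" are assumptions for illustration; the abstract does not define the priority scheme.

```python
from enum import IntEnum

class Priority(IntEnum):
    """Hypothetical priority levels for a pending computer speech message."""
    LOW = 0
    NORMAL = 1
    URGENT = 2

def should_output_speech(message_priority, conversation_active,
                         min_priority_to_interrupt=Priority.URGENT):
    """Decide whether the speech output unit should speak now.

    If no human conversation is occurring, output is always allowed.
    Otherwise the message must meet the configured interruption priority.
    """
    if not conversation_active:
        return True
    return message_priority >= min_priority_to_interrupt
```

For example, with the default setting a `Priority.NORMAL` message waits while people are talking, whereas a `Priority.URGENT` message is spoken immediately.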
Selective noise suppression during automatic speech recognition
An automatic speech recognition engine and a method of using the engine are described. The method pertains to front-end processing of an audio signal and includes the steps of: identifying a plurality of voiced frames of the audio signal; determining that one or more of the plurality of voiced frames have a signal-to-noise ratio (SNR) value greater than a first predetermined threshold; and, based on the determination, bypassing noise suppression for the one or more of the plurality of voiced frames.
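The three steps in this abstract map onto a simple per-frame decision, sketched below. The sketch assumes voiced/unvoiced flags and a noise power estimate are already available from earlier front-end stages; the SNR estimator, the function names, and the 15 dB default threshold are illustrative assumptions, not values from the patent.

```python
import math

def frame_snr_db(frame, noise_power):
    """Estimate a frame's SNR in dB from its mean power and an assumed,
    separately estimated noise power (must be positive)."""
    power = sum(x * x for x in frame) / len(frame)
    signal_power = max(power - noise_power, 1e-12)  # avoid log of <= 0
    return 10.0 * math.log10(signal_power / noise_power)

def select_frames_for_suppression(frames, voiced_flags, noise_power,
                                  snr_threshold_db=15.0):
    """Return one boolean per frame: True means apply noise suppression.

    Suppression is bypassed (False) only for voiced frames whose estimated
    SNR exceeds the threshold; all other frames are still suppressed.
    """
    decisions = []
    for frame, voiced in zip(frames, voiced_flags):
        bypass = voiced and frame_snr_db(frame, noise_power) > snr_threshold_db
        decisions.append(not bypass)
    return decisions
```

The design intent, as far as the abstract indicates, is to leave clean voiced speech untouched so that suppression artifacts do not degrade recognition.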
Assisted near-distance communication using binaural cues
Techniques are described for assisting near distance communications. A first device comprising a receiver, a sensor and a processor may be configured to perform the assisted near distance communication techniques. The receiver may receive, from a second device located within a conversational distance to the first device, monophonic audio data representative of the near distance communication. The sensor may generate a sensor signal representative of spatial information of the near distance communication. The processor may render, based on the spatial information and the monophonic audio data, multi-dimensional audio data in which the near distance communication originates in a soundfield from a location of the second device relative to the first device. The processor may next output the multi-dimensional audio data to a transducer so as to reproduce the near distance communication in multiple dimensions.
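To make the rendering step concrete, the sketch below turns monophonic samples plus one piece of spatial information (an azimuth angle) into a two-channel signal using two common binaural cues: an interaural level difference from constant-power panning and an interaural time difference from the Woodworth spherical-head approximation. The abstract does not specify which cues or formulas are used, so the function name, the head radius, and the sample rate are all assumptions.

```python
import math

def render_binaural(mono, azimuth_deg, sample_rate=16000,
                    head_radius_m=0.0875, speed_of_sound=343.0):
    """Render mono samples to (left, right) lists for a source at azimuth_deg.

    Azimuth 0 is straight ahead, +90 is fully right, -90 fully left
    (values outside that range are clamped).
    """
    az = math.radians(max(-90.0, min(90.0, azimuth_deg)))
    # Level cue: constant-power pan between the ears.
    pan = (az + math.pi / 2) / 2  # maps [-90, 90] degrees to [0, pi/2]
    left_gain, right_gain = math.cos(pan), math.sin(pan)
    # Time cue: Woodworth approximation of the interaural time difference.
    itd = head_radius_m / speed_of_sound * (abs(az) + math.sin(abs(az)))
    pad = [0.0] * int(round(itd * sample_rate))
    left = [s * left_gain for s in mono]
    right = [s * right_gain for s in mono]
    if az >= 0:   # source on the right: sound reaches the left ear later
        return pad + left, right + pad
    return left + pad, pad + right
```

A production renderer would more likely use measured head-related transfer functions; this sketch only shows how spatial information can shape an otherwise monophonic signal.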