Patent classifications
G10L2021/02082
VOICE CALL CONTROL METHOD AND APPARATUS, COMPUTER-READABLE MEDIUM, AND ELECTRONIC DEVICE
Embodiments of this application provide a real-time voice call control method performed by an electronic device. The method includes: obtaining a mixed call voice in real time during a cloud conference call, where the mixed call voice includes at least one branch voice; determining energy information corresponding to each frequency point of the call voice in a frequency domain; determining an energy proportion of each branch voice at each frequency point in total energy of the frequency point based on the energy information at the frequency point; determining a quantity of branch voices comprised in the call voice based on the energy proportion of each branch voice at each frequency point; and controlling the voice call by setting a call voice control manner based on the quantity of branch voices.
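The counting step can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the claimed implementation. It assumes the individual branch signals for one frame are available before mixing, and it treats each bin of a Hann-windowed FFT as a "frequency point". The names `count_branch_voices`, `dominance`, `min_share`, and `floor`, and their default values, are hypothetical. A branch is counted when it holds more than half the energy at a minimum share of the frequency points that carry meaningful energy.

```python
import numpy as np

def count_branch_voices(branches, dominance=0.5, min_share=0.1, floor=0.01):
    """Estimate how many branch voices are active in one frame.

    branches: (num_branches, frame_len) time-domain samples of each branch
    for the same frame (assumed available before mixing).
    dominance, min_share, floor: hypothetical thresholds, not from the patent.
    """
    branches = np.asarray(branches, dtype=float)
    window = np.hanning(branches.shape[1])
    # Energy of each branch at each frequency point (FFT bin).
    energy = np.abs(np.fft.rfft(branches * window, axis=1)) ** 2
    total = energy.sum(axis=0)  # total energy per frequency point
    # Ignore frequency points far below the strongest one (leakage, silence).
    active = total > floor * total.max()
    if not active.any():
        return 0
    # Energy proportion of each branch in the total energy of each active point.
    proportion = energy[:, active] / total[active]
    # Share of active points at which each branch is dominant.
    dominant_share = (proportion > dominance).mean(axis=1)
    return int((dominant_share >= min_share).sum())
```

The resulting count could then drive the choice of call voice control manner, for example enabling a multi-talker mode when it exceeds one. That policy is likewise an assumption.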
HOWLING SUPPRESSION METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
This application relates to a howling suppression method and apparatus, a computer device, and a storage medium. The method includes: obtaining a current audio signal corresponding to a current time period, and performing a frequency-domain transformation on the current audio signal; dividing the frequency-domain audio signal into subbands and determining a target subband; obtaining a current howling detection result and a current voice detection result that correspond to the current audio signal, and determining a subband gain coefficient; obtaining a past subband gain corresponding to an audio signal within a past time period, and calculating a current subband gain corresponding to the current audio signal based on the subband gain coefficient and the past subband gain; and suppressing howling on the target subband based on the current subband gain to obtain a first target audio signal corresponding to the current time period.
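The following sketch shows the gain update and subband suppression. It assumes the subband gain coefficient is looked up from the two detection results and then multiplied into the past gain. The coefficient table, `update_subband_gain`, `suppress_subband`, and the clipping bounds are hypothetical. A real system would also use windowed, overlapping frames rather than one plain FFT per frame.

```python
import numpy as np

# Hypothetical coefficient table: (howling detected, voice detected) -> coefficient.
GAIN_COEFFICIENTS = {
    (True, False): 0.5,    # howling without speech: attenuate quickly
    (True, True): 0.8,     # howling during speech: attenuate more gently
    (False, True): 1.25,   # no howling: let the gain recover toward 1
    (False, False): 1.25,
}

def update_subband_gain(past_gain, howling, voice, min_gain=0.05):
    """Current subband gain = coefficient * past gain, kept within [min_gain, 1]."""
    coefficient = GAIN_COEFFICIENTS[(howling, voice)]
    return float(np.clip(coefficient * past_gain, min_gain, 1.0))

def suppress_subband(frame, gain, band, fs):
    """Scale the FFT bins of one frame that fall in the target subband [lo, hi) Hz."""
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs < band[1])
    spectrum[in_band] *= gain
    return np.fft.irfft(spectrum, n=len(frame))
```

Because the gain is updated multiplicatively from its past value, it decays geometrically while howling persists and recovers gradually once it stops. This avoids abrupt jumps in the target subband.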
METHOD, APPARATUS, ELECTRONIC DEVICE, COMPUTER-READABLE STORAGE MEDIUM, AND COMPUTER PROGRAM PRODUCT FOR VIDEO COMMUNICATION
A method for video communication includes: selecting a first virtual object on a first video communication interface and obtaining associated first virtual object information; displaying a first virtual reality video image and a second virtual reality video image on the first video communication interface, where the first virtual reality video image corresponds to the first virtual object information and a first user feature, and the second virtual reality video image corresponds to second virtual object information and a second user feature; and playing a target virtual audio, the target virtual audio including one or both of a first virtual audio and a second virtual audio, where the first virtual audio corresponds to first voice data and the first virtual object information, and the second virtual audio corresponds to second voice data and the second virtual object information.
METHOD FOR PROCESSING AUDIO SIGNAL AND ELECTRONIC DEVICE SUPPORTING THE SAME
An electronic device is provided. The electronic device includes a speaker, a microphone, a processor, and a memory. For example, the electronic device may obtain a first audio signal during a first specified time by using the microphone, may identify a reference noise intensity based on the first audio signal, may obtain a second audio signal exceeding the reference noise intensity by using the microphone, may turn off the microphone when a third audio signal at or below the reference noise intensity has been obtained for a second specified time or longer, and may modulate the second audio signal and output it through the speaker while the microphone is turned off.
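The capture-and-turn-off logic could look roughly like the following Python sketch. It makes three assumptions: audio arrives as fixed-length frames, the reference noise intensity is the mean RMS of the calibration frames, and the microphone is turned off only after some louder audio has been captured. `capture_until_quiet` and its parameters are hypothetical, and the modulation step is left out.

```python
import math
from typing import List, Optional, Sequence, Tuple

def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def capture_until_quiet(
    frames: Sequence[Sequence[float]], calib_frames: int, quiet_frames: int
) -> Tuple[List[Sequence[float]], Optional[int]]:
    """Return (captured louder frames, index at which the mic turns off, or None)."""
    if calib_frames <= 0 or len(frames) < calib_frames:
        raise ValueError("need at least one calibration frame")
    # Reference noise intensity: mean RMS over the first specified time (assumption).
    reference = sum(rms(f) for f in frames[:calib_frames]) / calib_frames
    captured: List[Sequence[float]] = []
    quiet_run = 0
    for i in range(calib_frames, len(frames)):
        if rms(frames[i]) > reference:
            captured.append(frames[i])  # second audio signal: exceeds the reference
            quiet_run = 0
        else:
            quiet_run += 1  # third audio signal: at or below the reference
            if captured and quiet_run >= quiet_frames:
                return captured, i  # quiet long enough: turn the mic off here
    return captured, None
```

Here `quiet_frames` plays the role of the second specified time, measured in frames rather than seconds.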
In-vehicle speech processing apparatus
An in-vehicle apparatus is connectable to a device that includes a voice assistant function. The in-vehicle apparatus includes: a voice detector that performs voice recognition of an audio signal input from a microphone and controls functions of the in-vehicle apparatus based on a result of the voice recognition; and an interface that communicates with the device. When informed that the voice detector's recognition of the audio signal has detected a predetermined word, the interface sends the audio signal input from the microphone to the device without passing it through the voice detector. The predetermined word activates the voice assistant function of the device.
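Below is a minimal Python sketch of the routing decision. It assumes chunk-based audio and a recognizer that returns text for each chunk. `InVehicleAudioRouter`, `recognize`, `send_to_device`, and `end_session` are hypothetical names. In this sketch the chunk containing the wake word is consumed locally, and only subsequent microphone audio bypasses the voice detector.

```python
from typing import Callable, List

class InVehicleAudioRouter:
    """Hypothetical sketch: local voice detector vs. direct forwarding to the device."""

    def __init__(self, wake_word: str, recognize: Callable[[bytes], str],
                 send_to_device: Callable[[bytes], None]) -> None:
        self.wake_word = wake_word.lower()
        self.recognize = recognize            # stands in for the voice detector
        self.send_to_device = send_to_device  # stands in for the interface
        self.forwarding = False
        self.local_commands: List[str] = []

    def on_microphone_audio(self, chunk: bytes) -> None:
        if self.forwarding:
            # Bypass: microphone audio goes to the device, not via the voice detector.
            self.send_to_device(chunk)
            return
        text = self.recognize(chunk).lower()
        if self.wake_word in text:
            self.forwarding = True  # detector informs the interface of the wake word
        else:
            self.local_commands.append(text)  # handled by the in-vehicle apparatus

    def end_session(self) -> None:
        """Return to local recognition (how a session ends is not specified)."""
        self.forwarding = False
```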
Method for adjusting sound playback and portable device thereof
A method for adjusting sound playback of a portable device, so that playback remains consistent across different environments, outputs from the portable device detectable audio signals that are inaudible to the user, and the device receives the reflected audio before it is actually commanded to play an audio file. A list of volume weightings for the reflected audio is calculated. Before commencing playback of the audio file, the portable device obtains reference volume weightings from a list according to the current volume setting, and calculates adjustment coefficients for different frequency bands based on the weightings of the reference volume list and of the reflected audio list. The audio signals of the audio file are output after adjustment. A portable device is also disclosed.
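The abstract does not say how the adjustment coefficients are derived from the two weighting lists. One plausible reading is a per-band ratio of the reference weighting to the measured reflected weighting, capped to avoid excessive boost. That reading is sketched below as an assumption. The table values, `max_boost`, and the function names are hypothetical.

```python
# Hypothetical reference table: volume setting -> weighting per frequency band
# (low, mid, high). These numbers are illustrative, not from the patent.
REFERENCE_WEIGHTINGS = {
    5: [1.0, 1.0, 1.0],
    10: [1.0, 0.9, 0.8],
}

def adjustment_coefficients(volume_setting, reflected_weightings, max_boost=4.0):
    """Per-band coefficient = reference / reflected, capped at max_boost (assumption)."""
    reference = REFERENCE_WEIGHTINGS[volume_setting]
    if len(reference) != len(reflected_weightings):
        raise ValueError("band count mismatch")
    coefficients = []
    for ref, measured in zip(reference, reflected_weightings):
        if measured <= 0:
            raise ValueError("reflected weighting must be positive")
        # Boost bands the environment absorbs, cut bands it reinforces.
        coefficients.append(min(ref / measured, max_boost))
    return coefficients

def adjust_band_levels(band_levels, coefficients):
    """Apply the per-band coefficients to the audio file's band levels."""
    return [level * c for level, c in zip(band_levels, coefficients)]
```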
Automated clinical documentation system and method
A method, computer program product, and computing system for proactive encounter scanning is executed on a computing device and includes obtaining encounter information of a patient encounter. The encounter information is proactively processed to determine whether it is indicative of one or more medical conditions and to generate one or more result sets. The one or more result sets are provided to the user.
Joint Acoustic Echo Cancelation, Speech Enhancement, and Voice Separation for Automatic Speech Recognition
A method for automatic speech recognition using joint acoustic echo cancellation, speech enhancement, and voice separation includes receiving, at a contextual frontend processing model, input speech features corresponding to a target utterance. The method also includes receiving, at the contextual frontend processing model, at least one of a reference audio signal, a contextual noise signal including noise prior to the target utterance, or a speaker embedding including voice characteristics of a target speaker that spoke the target utterance. The method further includes processing, using the contextual frontend processing model, the input speech features and the at least one of the reference audio signal, the contextual noise signal, or the speaker embedding to generate enhanced speech features.
Method and System for Dereverberation of Speech Signals
A system and method for reverberation reduction is disclosed. A first Deep Neural Network (DNN) produces a first estimate of a target direct-path signal from a mixture of acoustic signals that include the target direct-path signal and a reverberation of the target direct-path signal. A filter modeling a room impulse response (RIR) for the first estimate is estimated. The filter, when applied to the first estimate of the target direct-path signal, generates a result that is closest, according to a distance function, to the residual between the mixture of the acoustic signals and the first estimate of the target direct-path signal. A mixture with reduced reverberation of the target direct-path signal is obtained by removing the result of applying the filter to the first estimate of the target direct-path signal from the received mixture. A second DNN produces a second estimate of the target direct-path signal from the mixture with reduced reverberation.
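The filter-estimation step can be illustrated with NumPy under three assumptions: a squared-error distance, a finite impulse response filter of fixed length, and placeholder callables standing in for the two DNNs. `dereverberate`, `estimate_reverb_filter`, `convolution_matrix`, and `taps` are hypothetical. The least-squares fit finds the filter whose output on the first estimate best matches the residual, and that filtered estimate is then subtracted from the mixture.

```python
import numpy as np

def convolution_matrix(x, taps):
    """Columns are x delayed by 0..taps-1 samples, so M @ g == np.convolve(x, g)[:len(x)]."""
    n = len(x)
    m = np.zeros((n, taps))
    for k in range(taps):
        m[k:, k] = x[: n - k]
    return m

def estimate_reverb_filter(mixture, direct_estimate, taps):
    """Least-squares FIR filter mapping the direct-path estimate onto the residual."""
    residual = mixture - direct_estimate
    g, *_ = np.linalg.lstsq(convolution_matrix(direct_estimate, taps), residual, rcond=None)
    return g

def dereverberate(mixture, dnn1, dnn2, taps=64):
    """dnn1 and dnn2 are placeholder callables for the first and second DNNs."""
    s1 = dnn1(mixture)                          # first estimate of the direct path
    g = estimate_reverb_filter(mixture, s1, taps)
    reverb = np.convolve(s1, g)[: len(mixture)]  # filter applied to the first estimate
    reduced = mixture - reverb                  # mixture with reduced reverberation
    return dnn2(reduced), g                     # second estimate from the reduced mixture
```

With an oracle first estimate (the clean signal itself) and a synthetic reverberation filter, the fit recovers that filter exactly, and the second stage receives the clean signal.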
AUDIBLE HOWLING CONTROL SYSTEMS AND METHODS
An audio system includes: a speaker; a microphone that generates a microphone signal based on sound output from the speaker; a mixer module configured to generate a mixed signal by mixing the microphone signal with an audio signal; a filter module configured to filter the mixed signal to produce a filtered signal and to apply the filtered signal to the speaker; and a detector module configured to determine a howling frequency in the microphone signal attributable to sound output from the speaker, where the filter module is configured to decrease a magnitude of the filtered signal at the howling frequency.
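The detector and filter modules are sketched below in Python with NumPy. Three choices are assumptions for illustration: the peak-to-mean power test used to flag a howling frequency, the 20 dB threshold, and the second-order notch (coefficients follow the widely used RBJ Audio EQ Cookbook form). `detect_howling_frequency` and `notch_filter` are hypothetical names.

```python
import numpy as np

def detect_howling_frequency(mic, fs, ratio_db=20.0):
    """Return the dominant frequency if its peak-to-mean power ratio exceeds ratio_db, else None."""
    power = np.abs(np.fft.rfft(mic * np.hanning(len(mic)))) ** 2
    if power.max() == 0:
        return None
    peak = int(np.argmax(power))
    if power[peak] < 10 ** (ratio_db / 10) * power.mean():
        return None
    return float(np.fft.rfftfreq(len(mic), d=1.0 / fs)[peak])

def notch_filter(x, f0, fs, q=5.0):
    """Biquad notch (RBJ cookbook form): zero gain at f0, near-unity gain elsewhere."""
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    cos_w0 = np.cos(w0)
    # Coefficients normalized by a0 = 1 + alpha.
    b0, b1, b2 = 1 / (1 + alpha), -2 * cos_w0 / (1 + alpha), 1 / (1 + alpha)
    a1, a2 = -2 * cos_w0 / (1 + alpha), (1 - alpha) / (1 + alpha)
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2  # direct form I
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y
```

A narrow notch decreases the magnitude only around the detected frequency, so program audio elsewhere in the spectrum passes nearly unchanged.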