Patent classifications
G10L2025/937
Voice activity detector for audio signals
According to one aspect, a method for detecting voice activity is disclosed, the method including receiving a frame of an input audio signal, the input audio signal having a sample rate; dividing the frame into a plurality of subbands based on the sample rate, the plurality of subbands including at least a lowest subband and a highest subband; filtering the lowest subband with a moving average filter to reduce an energy of the lowest subband; estimating a noise level for each of the plurality of subbands; calculating a signal-to-noise ratio value for each of the plurality of subbands; and determining a speech activity level of the frame based on an average of the calculated signal-to-noise ratio values and a weighted average of the energies of the plurality of subbands. Other aspects include audio decoders that decode audio that was encoded using the methods described herein.
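The per-frame decision can be sketched in Python as follows, assuming the caller has already split the frame into subband signals (the band edges would depend on the sample rate). The moving-average step is read here as subtracting a causal moving average from the lowest band, which strips slowly varying low-frequency energy; that reading, the recursive noise update, the band weights and both thresholds are assumptions, not details taken from the abstract.

```python
import math

def ma_highpass(x, width=8):
    # Assumed reading of the moving-average step: subtract a causal moving
    # average so slowly varying (near-DC) energy in the band is removed.
    out, acc = [], 0.0
    for n, v in enumerate(x):
        acc += v
        if n >= width:
            acc -= x[n - width]
        out.append(v - acc / min(n + 1, width))
    return out

def band_energy(x):
    return sum(v * v for v in x) / len(x) + 1e-12  # small floor avoids log(0)

class SubbandVAD:
    def __init__(self, weights, alpha=0.95, snr_db=6.0, energy_floor=1e-6):
        self.weights = weights            # hypothetical per-band weights
        self.noise = [None] * len(weights)
        self.alpha = alpha                # noise smoothing factor (assumed)
        self.snr_db = snr_db              # threshold on the mean SNR (assumed)
        self.energy_floor = energy_floor  # threshold on weighted energy (assumed)

    def process(self, subbands):
        # subbands[0] is the lowest band; only it is moving-average filtered.
        bands = [ma_highpass(subbands[0])] + list(subbands[1:])
        energies = [band_energy(b) for b in bands]
        snrs = []
        for k, e in enumerate(energies):
            if self.noise[k] is None:
                self.noise[k] = e         # first frame seeds the noise estimate
            snrs.append(10.0 * math.log10(e / self.noise[k]))
        mean_snr = sum(snrs) / len(snrs)
        weighted = sum(w * e for w, e in zip(self.weights, energies)) / sum(self.weights)
        active = mean_snr > self.snr_db and weighted > self.energy_floor
        if not active:                    # update noise only in inactive frames
            self.noise = [self.alpha * n + (1 - self.alpha) * e
                          for n, e in zip(self.noise, energies)]
        return active, mean_snr, weighted
```

Seeding the noise estimate from the first frame means the detector needs at least one non-speech frame before its SNR values are meaningful.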
VOICE PROCESSING METHOD, APPARATUS, ELECTRONIC DEVICE, AND STORAGE MEDIUM
Provided in the present disclosure are a voice processing method, an apparatus, an electronic device, and a storage medium. The method comprises: detecting the working state of the current call system; when the working state is a two-end speaking (double-talk) state or a remote-end speaking state, performing compression processing on the subsequent remote-end voice signal; acquiring a near-end voice signal by means of a microphone; performing echo processing on the basis of the near-end voice signal and the compression-processed remote-end voice signal to obtain an echo-processed near-end voice signal and a residual echo signal; performing non-linear suppression processing on the near-end voice signal and the residual echo signal; and performing gain control on the suppression-processed near-end voice signal.
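Two of these stages, the state-gated compression of the remote-end signal and the final gain control, are small enough to sketch in Python. The echo processing and non-linear suppression stages are omitted. The `CallState` names, the static compressor curve and the RMS-targeting gain rule are hypothetical choices made for illustration.

```python
import math
from enum import Enum

class CallState(Enum):
    IDLE = 0
    NEAR_END = 1
    REMOTE_END = 2
    DOUBLE_TALK = 3   # both ends speaking

def compress(frame, threshold=0.25, ratio=4.0):
    # Static compressor: the part of each magnitude above the threshold
    # is divided by `ratio` (hypothetical parameters).
    out = []
    for x in frame:
        mag = abs(x)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(math.copysign(mag, x))
    return out

def prepare_remote_end(state, frame):
    # Compress the remote-end signal only in remote-end or double-talk states.
    if state in (CallState.REMOTE_END, CallState.DOUBLE_TALK):
        return compress(frame)
    return list(frame)

def gain_control(frame, target_rms=0.1, max_gain=10.0):
    # Scale the suppressed near-end frame toward a target RMS, capping the gain.
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    if rms == 0.0:
        return list(frame)
    gain = min(target_rms / rms, max_gain)
    return [gain * x for x in frame]
```

Compressing the remote-end signal before it reaches the loudspeaker limits the peak echo level that the later stages must remove.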
METHODS AND DEVICES FOR DETECTING AN ATTACK IN A SOUND SIGNAL TO BE CODED AND FOR CODING THE DETECTED ATTACK
A method and device for detecting an attack in a sound signal to be coded, wherein the sound signal is processed in successive frames each including a number of sub-frames. The device comprises a first-stage attack detector for detecting the attack in the last sub-frame of a current frame, and a second-stage attack detector for detecting the attack in one of the sub-frames of the current frame, including the sub-frames preceding the last sub-frame. No attack is detected when the current frame is not an active frame previously classified to be coded using a generic coding mode. A method and device for coding an attack in a sound signal are also provided. The coding device comprises the above-mentioned attack detecting device and an encoder that codes the sub-frame containing the detected attack in a transition coding mode, using a glottal-shape codebook populated with glottal impulse shapes.
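The two-stage detection logic can be sketched in Python using sub-frame energies. The energy-ratio tests and both thresholds are assumptions. Sub-frame 0 is not tested here because it has no earlier in-frame history; a real detector would likely compare it with the previous frame. The glottal-shape transition coding is not sketched.

```python
def subframe_energies(frame, n_sub):
    size = len(frame) // n_sub
    return [sum(x * x for x in frame[i * size:(i + 1) * size]) for i in range(n_sub)]

def detect_attack(frame, n_sub, is_active_generic, ratio1=8.0, ratio2=8.0):
    """Return the index of the sub-frame holding the attack, or None."""
    # Gate: only active frames classified for the generic coding mode qualify.
    if not is_active_generic:
        return None
    e = subframe_energies(frame, n_sub)
    eps = 1e-12
    # Stage 1: last sub-frame energy versus the mean of the preceding ones.
    prev_mean = sum(e[:-1]) / (n_sub - 1) + eps
    if e[-1] / prev_mean > ratio1:
        return n_sub - 1
    # Stage 2: each earlier sub-frame versus the mean of the sub-frames before it.
    for i in range(1, n_sub - 1):
        before = sum(e[:i]) / i + eps
        if e[i] / before > ratio2:
            return i
    return None
```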
DETECTION OF LIVE SPEECH
A method of detecting live speech comprises: receiving a signal containing speech; forming a framed version of the received signal that comprises a plurality of frames; forming a first subset of the plurality of frames, wherein each frame of the first subset contains voiced speech; forming a second subset of the plurality of frames, wherein each frame of the second subset contains unvoiced speech; forming a first frame that is representative of a sum of a plurality of frames of the first subset; forming a second frame that is representative of a sum of a plurality of frames of the second subset; performing a time-frequency transformation operation on the first frame, to form an average voiced frequency spectrum; performing a time-frequency transformation operation on the second frame, to form an average unvoiced frequency spectrum; obtaining one or more voiced features from the voiced frequency spectrum; and obtaining one or more unvoiced features from the unvoiced frequency spectrum. Based on the one or more voiced features and the one or more unvoiced features, a determination is made as to whether or not the speech is live speech.
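The feature-extraction steps can be sketched in Python. The abstract does not say how frames are labelled voiced or unvoiced, so this sketch assumes a zero-crossing-rate threshold. It also assumes a plain DFT magnitude as the time-frequency transform and uses the spectral centroid as the single example feature. The final live/not-live decision rule is not specified in the abstract and is left out.

```python
import cmath
import math

def zcr(frame):
    # Fraction of adjacent sample pairs whose sign differs.
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)) / (len(frame) - 1)

def sum_frames(frames):
    return [sum(col) for col in zip(*frames)]   # frames must share one length

def magnitude_spectrum(frame):
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def spectral_centroid(mag):
    total = sum(mag)
    return sum(k * m for k, m in enumerate(mag)) / total if total else 0.0

def voiced_unvoiced_features(frames, zcr_threshold=0.3):
    # Assumed voicing rule: a low zero-crossing rate means voiced.
    voiced = [f for f in frames if zcr(f) < zcr_threshold]
    unvoiced = [f for f in frames if zcr(f) >= zcr_threshold]
    if not voiced or not unvoiced:
        return None
    v_spec = magnitude_spectrum(sum_frames(voiced))     # average voiced spectrum
    u_spec = magnitude_spectrum(sum_frames(unvoiced))   # average unvoiced spectrum
    return spectral_centroid(v_spec), spectral_centroid(u_spec)
```

Summing frames before the transform needs only one transform per subset. It also reinforces components that repeat across frames.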
Responding method and device, electronic device and storage medium
Aspects of the disclosure provide a responding method and device, an electronic device and a storage medium. The method is applied to a first electronic device including an audio acquisition component and an audio output component. The method can include acquiring a voice signal through the audio acquisition component, determining whether to respond to the voice signal, and, responsive to determining to respond to the voice signal, outputting a first sound signal by the audio output component, the first sound signal being configured to notify at least one second electronic device that the first electronic device is responding to the voice signal. In this manner, an electronic device that determines to respond to a voice signal outputs a sound signal that prevents other electronic devices from responding to the same voice signal, so that competition between electronic devices is reduced and the user experience is improved.
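The coordination effect can be simulated in Python. The abstract does not say how a device decides when to respond or when to emit its sound. This sketch assumes that each device that chooses to respond waits a back-off which shrinks as its wake-word confidence grows. The first device whose timer fires plays the notification sound, and the others hear it and stay silent. The `Device` type, the confidence scores and the back-off policy are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    confidence: float  # hypothetical wake-word score in [0, 1]

def arbitrate(devices, threshold=0.5, max_backoff_ms=200.0):
    """Return (responding device name or None, names of suppressed devices)."""
    # Devices above the threshold decide to respond and schedule their
    # notification sound after a confidence-dependent back-off (assumed policy).
    pending = sorted(
        (max_backoff_ms * (1.0 - d.confidence), d.name)
        for d in devices if d.confidence >= threshold
    )
    if not pending:
        return None, []
    # The earliest timer fires first: that device responds and plays the sound;
    # every device still waiting hears it and does not respond.
    responder = pending[0][1]
    suppressed = [name for _, name in pending[1:]]
    return responder, suppressed
```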
Voice detection from sub-band time-domain signals
A method for detecting voice, an apparatus for detecting voice, and a chip for processing voice are disclosed. The apparatus includes a sub-band generation module and a voice activity detection module. The sub-band generation module is configured to process a current time-domain signal frame to obtain sub-band time-domain signals, and the voice activity detection module is configured to determine, according to the amplitudes of the sub-band time-domain signals in the current time-domain signal frame, whether the current time-domain signal frame is a valid voice signal. Because the apparatus for detecting voice can operate entirely in the time domain, algorithmic complexity is lowered and power consumption is reduced.
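A minimal time-domain sketch in Python. It assumes a two-band split, with a moving-average low-pass filter giving the low band and the residual giving the high band. It also assumes a per-band mean-absolute-amplitude threshold as the decision rule. The band count, window width and thresholds are illustrative only. The whole path uses additions and comparisons with no transform, which is the source of the low complexity claimed above.

```python
def split_two_bands(frame, width=4):
    # Time-domain split: a moving-average low-pass gives the low band and
    # the residual (input minus low band) gives the high band.
    low = []
    for n in range(len(frame)):
        window = frame[max(0, n - width + 1): n + 1]
        low.append(sum(window) / len(window))
    high = [x - l for x, l in zip(frame, low)]
    return low, high

def mean_abs(x):
    return sum(abs(v) for v in x) / len(x)

def is_voice_frame(frame, thresholds=(0.05, 0.02)):
    # Declare voice when any sub-band amplitude exceeds its (assumed) threshold.
    low, high = split_two_bands(frame)
    amps = (mean_abs(low), mean_abs(high))
    return any(a > t for a, t in zip(amps, thresholds))
```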
DETECTION OF LIVE SPEECH
A method of detecting live speech comprises: receiving a signal containing speech; obtaining a first component of the received signal in a first frequency band, wherein the first frequency band includes audio frequencies; and obtaining a second component of the received signal in a second frequency band higher than the first frequency band. Then, modulation of the first component of the received signal is detected; modulation of the second component of the received signal is detected; and the modulation of the first component of the received signal and the modulation of the second component of the received signal are compared. It may then be determined that the speech may not be live speech, if the modulation of the first component of the received signal differs from the modulation of the second component of the received signal.
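The modulation comparison can be sketched in Python, assuming the two band components have already been extracted. Here, modulation is measured as the coefficient of variation of a rectified, moving-average envelope. That measure, the envelope width and the relative tolerance are assumptions; the abstract only says that the two modulations are detected and compared.

```python
import math

def envelope(x, width=16):
    # Full-wave rectification followed by a causal moving average.
    rect = [abs(v) for v in x]
    out = []
    for n in range(len(rect)):
        w = rect[max(0, n - width + 1): n + 1]
        out.append(sum(w) / len(w))
    return out

def modulation_index(x, width=16):
    env = envelope(x, width)[width:]   # drop the start-up transient
    mean = sum(env) / len(env)
    if mean == 0.0:
        return 0.0
    var = sum((e - mean) ** 2 for e in env) / len(env)
    return math.sqrt(var) / mean       # envelope std relative to its mean

def may_not_be_live(first_band, second_band, tolerance=0.5):
    # Flag the speech when the two modulation depths differ by more than
    # `tolerance` relative to the larger one (assumed rule).
    m1 = modulation_index(first_band)
    m2 = modulation_index(second_band)
    return abs(m1 - m2) > tolerance * max(m1, m2, 1e-12)
```

Live speech modulates both bands in a similar way, while a loudspeaker replay may carry little genuine speech modulation in the higher band, so a large mismatch is treated as suspicious.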
ACOUSTIC VOICE ACTIVITY DETECTION (AVAD) FOR ELECTRONIC SYSTEMS
Acoustic Voice Activity Detection (AVAD) methods and systems are described. The AVAD methods and systems, including corresponding algorithms or programs, use microphones to generate virtual directional microphones which have very similar noise responses and very dissimilar speech responses. The ratio of the energies of the virtual microphones is then calculated over a given window size and the ratio can then be used with a variety of methods to generate a VAD signal. The virtual microphones can be constructed using either an adaptive or a fixed filter.
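A fixed-filter version can be sketched in Python with two microphones and delay-and-subtract virtual microphones. The delay `d`, gain `beta`, window length and ratio threshold are hypothetical. With this construction, a talker on the axis on the first microphone's side falls in the null of the second virtual microphone. A source that reaches both microphones identically produces equal outputs, so the energy ratio stays at 1.

```python
def delayed(x, d):
    return [0.0] * d + list(x[:len(x) - d])

def virtual_mics(m1, m2, d=1, beta=1.0):
    # Fixed delay-and-subtract filters forming two opposite-facing virtual mics.
    m1d, m2d = delayed(m1, d), delayed(m2, d)
    v1 = [a - beta * b for a, b in zip(m1, m2d)]
    v2 = [b - beta * a for a, b in zip(m1d, m2)]
    return v1, v2

def energy_ratio(v1, v2, eps=1e-12):
    return (sum(x * x for x in v1) + eps) / (sum(x * x for x in v2) + eps)

def avad(m1, m2, window=64, d=1, threshold=4.0):
    """Return one VAD flag per non-overlapping window."""
    v1, v2 = virtual_mics(m1, m2, d)
    flags = []
    for start in range(0, len(m1) - window + 1, window):
        r = energy_ratio(v1[start:start + window], v2[start:start + window])
        flags.append(r > threshold)
    return flags
```

An adaptive variant would replace the fixed `beta` and delay with a filter updated online.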
METHOD OF DETECTING SPEECH AND SPEECH DETECTOR FOR LOW SIGNAL-TO-NOISE RATIOS
The present disclosure relates, in a first aspect, to a method of detecting speech in incoming sound at a portable communication device. A microphone signal is divided into a plurality of separate frequency band signals, from which respective power envelope signals are derived. Onsets of voiced speech in a first frequency band signal are determined based on a first stationary noise power signal and a first clean power signal, and onsets of unvoiced speech in a second frequency band signal are determined based on a second stationary noise power signal and a second clean power signal.
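A per-band onset detector can be sketched in Python. The same function would run on the voiced-speech band and on the unvoiced-speech band, possibly with different parameters. It assumes the following:

- The stationary noise power is tracked by following drops in the power envelope immediately and rises only slowly.
- The clean power is the envelope minus that noise estimate.
- An onset is declared when the clean-to-noise ratio crosses a threshold from below.

These rules and all parameter values are assumptions, not details given in the abstract.

```python
def power_envelope(x, alpha=0.9):
    p = x[0] * x[0] if x else 0.0   # start from the first sample's power
    out = []
    for v in x:
        p = alpha * p + (1 - alpha) * v * v   # one-pole smoothing of v^2
        out.append(p)
    return out

def detect_onsets(band, alpha=0.9, rise=1.001, onset_ratio=4.0, floor=1e-9):
    """Return sample indices at which a speech onset is declared in this band."""
    env = power_envelope(band, alpha)
    noise, was_active, onsets = None, False, []
    for n, p in enumerate(env):
        # Stationary noise power: follows drops at once, may rise only slowly.
        noise = p if noise is None else min(p, noise * rise)
        noise = max(noise, floor)
        clean = max(p - noise, 0.0)           # assumed clean-power definition
        active = clean > onset_ratio * noise
        if active and not was_active:
            onsets.append(n)
        was_active = active
    return onsets
```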
Audio classification based on perceptual quality for low or medium bit rates
The quality of encoded signals can be improved by reclassifying AUDIO signals carrying non-speech data as VOICED signals when periodicity parameters of the signal satisfy one or more criteria. In some embodiments, only low or medium bit rate signals are considered for reclassification. The periodicity parameters can include any characteristic or set of characteristics indicative of periodicity. For example, the periodicity parameters may include pitch differences between subframes in the audio signal, a normalized pitch correlation for one or more subframes, an average normalized pitch correlation for the audio signal, or combinations thereof. Audio signals that are reclassified as VOICED signals may be encoded in the time domain, while audio signals that remain classified as AUDIO signals may be encoded in the frequency domain.
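The reclassification rule can be sketched in Python. One helper computes a normalized pitch correlation at a given lag, and a second applies the periodicity criteria. The bit-rate cut-off, the maximum pitch difference and the minimum average correlation are hypothetical values, and combining the criteria with a logical AND is also an assumption.

```python
import math

def normalized_pitch_correlation(x, lag):
    # Normalized correlation between x[n] and x[n - lag].
    a, b = x[lag:], x[:len(x) - lag]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0

def reclassify(signal_class, bit_rate, subframe_pitches, subframe_corrs,
               max_rate=24400, max_pitch_diff=3, min_avg_corr=0.8):
    # Only AUDIO frames at low or medium bit rates are candidates (assumed cut-off).
    if signal_class != "AUDIO" or bit_rate > max_rate:
        return signal_class
    pitch_diffs = [abs(a - b) for a, b in zip(subframe_pitches, subframe_pitches[1:])]
    avg_corr = sum(subframe_corrs) / len(subframe_corrs)
    # Stable pitch across subframes plus high average correlation -> periodic.
    if max(pitch_diffs, default=0) <= max_pitch_diff and avg_corr >= min_avg_corr:
        return "VOICED"   # to be coded in the time domain
    return signal_class   # stays AUDIO, coded in the frequency domain
```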