Patent classifications
G10L25/84
HOWLING SUPPRESSION METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
This application relates to a howling suppression method and apparatus, a computer device, and a storage medium. The method includes obtaining a current audio signal corresponding to a current time period and performing frequency domain transformation on the current audio signal; dividing the frequency-domain audio signal into subbands and determining a target subband; obtaining a current howling detection result and a current voice detection result that correspond to the current audio signal, and determining a subband gain coefficient; obtaining a past subband gain corresponding to an audio signal within a past time period, and calculating a current subband gain corresponding to the current audio signal based on the subband gain coefficient and the past subband gain; and suppressing howling on the target subband based on the current subband gain, to obtain a first target audio signal corresponding to the current time period.
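The abstract leaves the detector internals and the gain-update rule unspecified; a minimal single-frame sketch of the subband-gain flow it describes might look as follows. The subband count, the smoothing factor `alpha`, the gain floor, and the choice of the highest-energy subband as the target are all illustrative assumptions, not details from the patent.

```python
import numpy as np

def suppress_howling(frame, past_gain, howling, voice,
                     n_subbands=8, alpha=0.9, floor=0.1):
    """One-frame subband-gain howling suppression sketch (hypothetical).

    frame     : 1-D float array, current time-domain audio frame
    past_gain : per-subband gains from the previous time period
    howling   : bool, current howling detection result
    voice     : bool, current voice detection result
    """
    # Frequency domain transformation of the current audio signal.
    spec = np.fft.rfft(frame)
    bands = np.array_split(np.arange(len(spec)), n_subbands)

    # Subband gain coefficient from the two detection results:
    # suppress hard only when howling is detected and no voice is present.
    coef = floor if (howling and not voice) else 1.0

    # Current subband gain from the coefficient and the past subband gain
    # (recursive smoothing; the exact rule is an assumption).
    gain = alpha * past_gain + (1.0 - alpha) * coef

    # Target subband: here, the subband with the highest energy.
    energies = [np.sum(np.abs(spec[idx]) ** 2) for idx in bands]
    target = int(np.argmax(energies))

    # Suppress howling on the target subband only.
    spec[bands[target]] *= gain[target]
    return np.fft.irfft(spec, n=len(frame)), gain
```

Successive frames would feed the returned `gain` back in as `past_gain`, so suppression ramps in and out smoothly rather than switching abruptly.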
Altering undesirable communication data for communication sessions
This disclosure describes techniques implemented partly by a communications service for identifying and altering undesirable portions of communication data, such as audio data and video data, from a communication session between computing devices. For example, the communications service may monitor the communications session to alter or remove undesirable audio data, such as a dog barking, a doorbell ringing, etc., and/or video data, such as rude gestures, inappropriate facial expressions, etc. The communications service may stream the communication data for the communication session partly through managed servers and analyze the communication data to detect undesirable portions. The communications service may alter or remove the portions of communication data received from a first user device, such as by filtering, refraining from transmitting, or modifying the undesirable portions. The communications service may send the modified communication data to a second user device engaged in the communication session after removing the undesirable portions.
Hearing system comprising a personalized beamformer
A hearing system configured to be located at or in the head of a user, comprises a) at least two microphones providing at least two electric input signals, b) an own voice detector, c) access to a database (O.sub.l, H.sub.l) comprising c1) relative or absolute own voice transfer function(s), and corresponding c2) absolute or relative acoustic transfer functions for a multitude of test-persons, d) a processor connectable to the at least two microphones, to the own voice detector, and to the database. The processor is configured A) to estimate an own voice relative transfer function for sound from the user's mouth to at least one of the at least two microphones, and B) to estimate personalized relative or absolute head related acoustic transfer functions from at least one spatial location other than the user's mouth to at least one of the microphones of the hearing system in dependence of the estimated own voice relative transfer function(s) and the database (O.sub.l, H.sub.l). The hearing system further comprises e) a beamformer configured to receive the at least two electric input signals, or processed versions thereof, and to determine personalized beamformer weights based on the personalized relative or absolute head related acoustic transfer functions or impulse responses. A method of determining personalized beamformer coefficients (w.sub.k) is further disclosed.
Dynamic voice accentuation and reinforcement
Systems and methods for dynamic voice accentuation and reinforcement are presented herein. One embodiment comprises one or more audio input sources; one or more audio output sources; one or more band pass filters; and a processing control unit that includes an audio processing unit, and which executes a method: differentiating between audio input sources as vocal sound audio input sources and ambient noise audio input sources; increasing the gain of the vocal sound audio input sources; inverting the polarity of an ambient noise signal received by each of the ambient noise audio input sources; and adding the inverted-polarity signal to either an output signal of at least one of the one or more audio output sources, or to an input signal of at least one of the vocal sound audio input sources, to reduce ambient noise.
Speech feature extraction apparatus, speech feature extraction method, and computer-readable storage medium
A speech feature extraction apparatus 100 includes a voice activity detection unit 103 that drops non-voice frames from the frames corresponding to an input speech utterance and calculates a posterior of being voiced for each frame; a voice activity detection process unit 106 that calculates, from a given voice activity detection posterior, function values used as weights when pooling frames to produce an utterance-level feature; and an utterance-level feature extraction unit 112 that extracts an utterance-level feature from the frames on the basis of multiple frame-level features, using the function values.
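The three units chain together as: threshold the VAD posteriors to drop non-voice frames, turn the surviving posteriors into pooling weights, and take a weighted pool over the frame-level features. A sketch under assumed choices (a 0.5 threshold and simple normalization of the posteriors as the weighting function, neither of which the abstract specifies):

```python
import numpy as np

def vad_weighted_pooling(frame_feats, vad_posteriors, threshold=0.5):
    """Pool frame-level features into one utterance-level feature.

    frame_feats    : (n_frames, feat_dim) array of frame-level features
    vad_posteriors : (n_frames,) posterior of being voiced per frame
    """
    # Unit 103: drop non-voice frames via the VAD posteriors.
    keep = vad_posteriors >= threshold
    feats = frame_feats[keep]
    post = vad_posteriors[keep]

    # Unit 106: function values from the posteriors, used as weights.
    weights = post / post.sum()

    # Unit 112: weighted pooling over frames -> utterance-level feature.
    return (weights[:, None] * feats).sum(axis=0)
```

Frames the VAD is more confident about thus contribute more to the utterance-level feature, instead of every retained frame counting equally as in plain average pooling.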