Patent classifications
G10L21/02
AUDIO SIGNAL PROCESSING METHOD, DEVICE AND STORAGE MEDIUM
An audio signal processing method, device, and storage medium are provided. The method includes performing sub-band filtering on a to-be-processed audio signal to obtain a plurality of sub-band signals, wherein the number of sub-band signals is determined according to a lowest frequency of a band-pass filter and a cut-off frequency of an audio apparatus, and the sub-band signals comprise sub-band band-pass signals; and obtaining a target audio signal according to each of the sub-band band-pass signals and a processing algorithm for a virtual bass enhancement signal.
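The abstract does not state how the number of sub-bands follows from the two frequencies. As a rough illustration only, the sketch below assumes octave-wide sub-bands spanning the range from the band-pass filter's lowest frequency up to the audio apparatus's cut-off frequency. The function name `octave_subbands`, the octave spacing, and the example frequencies are all assumptions and are not taken from the patent.

```python
import math


def octave_subbands(lowest_hz, cutoff_hz):
    """Split [lowest_hz, cutoff_hz] into octave-wide sub-bands.

    ASSUMPTION: octave spacing is a hypothetical rule chosen for this
    sketch; the abstract only says the count depends on both frequencies.
    """
    if not 0 < lowest_hz < cutoff_hz:
        raise ValueError("expected 0 < lowest_hz < cutoff_hz")

    # Number of octaves needed to cover the range, rounded up.
    count = math.ceil(math.log2(cutoff_hz / lowest_hz))

    bands = []
    lo = float(lowest_hz)
    for _ in range(count):
        # Each band is one octave wide, clipped at the cut-off frequency.
        hi = min(lo * 2.0, float(cutoff_hz))
        bands.append((lo, hi))
        lo = hi
    return bands


# Hypothetical values: 40 Hz filter floor, 160 Hz loudspeaker cut-off.
bands = octave_subbands(40, 160)
```

Under these assumed values the range covers two octaves, so two sub-band band-pass signals would be handed to the bass enhancement stage.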
METHOD AND SYSTEM FOR AUTOMATIC DETECTION AND CORRECTION OF SOUND DISTORTION
A computer-implemented method for correcting muffled speech caused by facial coverings is disclosed. The computer-implemented method includes monitoring a user's speech for speech distortion. The computer-implemented method further includes determining that the user's speech is distorted. The computer-implemented method further includes determining that a cause of the user's speech distortion is based, at least in part, on a presence of a particular type of facial covering. The computer-implemented method further includes automatically correcting the speech distortion of the user based, at least in part, on the particular type of facial covering causing the speech distortion.
COMMUNICATION SERVER, COMMUNICATION SYSTEM, AND NON-TRANSITORY COMPUTER READABLE MEDIUM
A communication server includes a processor that processes voice data and that is configured to: function as a voice filter that extracts a voice component of a specific person; provide input voice data to the voice filter, the input voice data being received from a first terminal apparatus; and transmit output voice data to a second terminal apparatus different from the first terminal apparatus, the output voice data including a voice component that is output from the voice filter.
INFORMATION PROCESSING METHOD, ELECTRONIC EQUIPMENT, AND STORAGE MEDIUM
Methods, apparatuses, and non-transitory computer-readable storage media are provided for information processing. The method may be applied to electronic equipment. The electronic equipment may collect environmental audio information when the electronic equipment plays multimedia. The electronic equipment may also perform noise detection on the environmental audio information to determine whether the environmental audio information represents a target noise scenario. The electronic equipment may also process a parameter of the multimedia played by the electronic equipment when the environmental audio information represents the target noise scenario.
Anti-causal filter for audio signal processing
An audio signal processor includes a digital filter block configured to receive an audio signal and output a first filtered audio signal, and a phase linearization block configured to receive the first filtered audio signal and output a second filtered audio signal with a more linear phase.
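The abstract does not describe how the phase linearization block works. One well-known way to obtain a more linear phase with an anti-causal stage is forward-backward filtering: the signal is filtered once in the normal direction and once in time-reversed order, so the phase shifts of the two passes cancel. The sketch below illustrates that general technique, not the patented design; the one-pole filter, the coefficient `a`, and all function names are assumptions.

```python
def one_pole_lowpass(x, a=0.5):
    """Causal one-pole IIR low-pass: y[n] = (1 - a) * x[n] + a * y[n-1]."""
    y, prev = [], 0.0
    for sample in x:
        prev = (1.0 - a) * sample + a * prev
        y.append(prev)
    return y


def anti_causal(x, a=0.5):
    """Apply the same filter backwards in time by reversing, filtering, reversing."""
    return one_pole_lowpass(x[::-1], a)[::-1]


def forward_backward(x, a=0.5):
    """Causal pass followed by an anti-causal pass over the same block."""
    return anti_causal(one_pole_lowpass(x, a), a)


# Impulse in the middle of a block so both sides of the response are visible.
impulse = [0.0] * 41
impulse[20] = 1.0

causal_response = one_pole_lowpass(impulse)   # one-sided: energy only after n = 20
combined_response = forward_backward(impulse) # nearly symmetric around n = 20
```

A response that is symmetric around its peak corresponds to linear phase. The anti-causal pass needs future samples, so this form only works on a buffered block, which adds latency in a real-time system.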
NOISE FLOOR ESTIMATION AND NOISE REDUCTION
Embodiments are disclosed for noise floor estimation and noise reduction. In an embodiment, a method comprises: obtaining an audio signal; dividing the audio signal into a plurality of buffers; determining time-frequency samples for each buffer of the audio signal; for each buffer and for each frequency, determining a median (or mean) and a measure of an amount of variation of energy based on the samples in the buffer and samples in neighboring buffers that together span a specified time range of the audio signal; combining the median (or mean) and the measure of the amount of variation of energy into a cost function; for each frequency: determining a signal energy of a particular buffer of the audio signal that corresponds to a minimum value of the cost function; selecting the signal energy as the estimated noise floor of the audio signal; and reducing, using the estimated noise floor, noise in the audio signal.
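The estimation steps in this abstract can be sketched for a single frequency bin as follows. This is a minimal illustration under stated assumptions: the cost is taken to be the windowed median plus a weighted population standard deviation, the window half-width is counted in buffers, and the energy selected for the minimum-cost buffer is that buffer's windowed median. None of these choices, nor the names `estimate_noise_floor`, `half_span`, and `weight`, come from the patent, and the noise reduction step is not shown.

```python
from statistics import median, pstdev


def estimate_noise_floor(energies, half_span=2, weight=1.0):
    """Estimate the noise floor for one frequency bin.

    energies: per-buffer energy values for that bin, in time order.
    ASSUMPTION: cost = median + weight * standard deviation over a window
    of neighboring buffers; the abstract does not give the exact form.
    """
    best_cost, best_energy = None, None
    for i in range(len(energies)):
        # Window of this buffer and its neighbors, clipped at the edges.
        lo = max(0, i - half_span)
        hi = min(len(energies), i + half_span + 1)
        window = energies[lo:hi]

        med = median(window)
        cost = med + weight * pstdev(window)

        # Keep the buffer whose window is both quiet and steady.
        if best_cost is None or cost < best_cost:
            best_cost, best_energy = cost, med
    return best_energy


# Hypothetical energies: loud content, a quiet steady stretch, loud again.
energies = [9.0, 8.5, 9.2, 1.0, 1.1, 0.9, 1.0, 1.05, 7.0, 8.0]
floor = estimate_noise_floor(energies)  # 1.0, from the steady stretch
```

Adding the variation term to the cost favors stretches that are low and stable over isolated dips, which is the apparent motivation for combining the two measures. A full implementation would repeat this per frequency and then apply the estimate in a noise reduction stage.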
Textual echo cancellation
A method includes receiving an overlapped audio signal that includes audio spoken by a speaker that overlaps a segment of synthesized playback audio. The method also includes encoding a sequence of characters that correspond to the synthesized playback audio into a text embedding representation. For each character in the sequence of characters, the method also includes generating a respective cancelation probability using the text embedding representation. The cancelation probability indicates a likelihood that the corresponding character is associated with the segment of the synthesized playback audio overlapped by the audio spoken by the speaker in the overlapped audio signal.