G10L21/0272

LOUDSPEAKER PROTECTION

This application describes methods and apparatus for loudspeaker protection. A loudspeaker protection system (100) is described having a first frequency band-splitter (102) for splitting an input audio signal (Vin) into a plurality of audio signals (v1, v2 . . . , vn) in different respective frequency bands (ω1, ω2 . . . , ωn). A first gain block (103) is configured to apply a respective frequency band gain (g1, g2 . . . , gn) to each of the audio signals in the different respective frequency bands and a gain controller (107, 108, 109) is provided for controlling the respective band gains. A displacement modeller (104, 105) determines a plurality of displacement signals (x1, x2 . . . , xn) based on the input audio signal (Vin) and a displacement model (104a), where each displacement signal corresponds to a modelled cone displacement for the loudspeaker for one of said different respective frequency bands. The gain controller (107, 108, 109) is configured to control the respective frequency band gains based on the plurality of displacement signals.
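The per-band control loop described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a simple peak limiter per band, and the `X_MAX` excursion limit and the gain law are illustrative placeholders for whatever the displacement model and gain controller actually use.

```python
import numpy as np

X_MAX = 1.0  # assumed maximum safe cone excursion (normalised units)

def protect(band_signals, band_displacements):
    """Attenuate each frequency band so its modelled cone displacement
    stays within X_MAX, in the spirit of the gain controller above."""
    out = []
    for v, x in zip(band_signals, band_displacements):
        peak = np.max(np.abs(x))               # modelled excursion for this band
        gain = min(1.0, X_MAX / peak) if peak > 0 else 1.0
        out.append(gain * v)                   # apply the band gain g_i
    return out
```

A band whose modelled displacement stays below the limit passes through unchanged; a band that would over-excurse is scaled down just enough to bring the modelled peak back to `X_MAX`.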

METHOD, APPARATUS FOR ELIMINATING POPPING SOUNDS AT THE BEGINNING OF AUDIO, AND STORAGE MEDIUM
20180012620 · 2018-01-11

A method and apparatus for eliminating popping sounds at the beginning of audio includes: examining audio frames within a pre-set time period at the beginning of the audio to determine a popping residing section; applying popping elimination to audio frames in the popping residing section; calculating an average value of amplitudes of M audio frames preceding the popping residing section and an average value of amplitudes of K audio frames succeeding the popping residing section; setting the amplitudes of the audio frames in the popping residing section to zero in response to a determination that the two average values are both smaller than a pre-set sound reduction threshold; and weakening the amplitudes of the audio frames in the popping residing section in response to a determination that the two average values are not both smaller than the pre-set sound reduction threshold, where M and K are integers larger than one.
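The zero-or-weaken decision can be sketched as below. The threshold, the attenuation factor, and the M/K values are illustrative assumptions; the abstract does not specify them, and the pop-detection step itself is taken as given (the section indices are passed in).

```python
import numpy as np

def suppress_popping(frames, pop_start, pop_end, m=2, k=2,
                     threshold=0.05, weaken=0.25):
    """Zero or attenuate per-frame amplitudes in a detected popping section.

    frames: 1-D array of per-frame amplitudes.
    pop_start..pop_end: popping residing section (end exclusive).
    """
    frames = frames.copy()
    before = frames[max(0, pop_start - m):pop_start]   # M preceding frames
    after = frames[pop_end:pop_end + k]                # K succeeding frames
    avg_before = before.mean() if before.size else 0.0
    avg_after = after.mean() if after.size else 0.0
    if avg_before < threshold and avg_after < threshold:
        frames[pop_start:pop_end] = 0.0        # surroundings quiet: silence the pop
    else:
        frames[pop_start:pop_end] *= weaken    # surroundings loud: attenuate only
    return frames
```

Zeroing is only safe when the neighbouring audio is itself near-silent; otherwise a hard cut would be audible, so the section is merely weakened.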

Voice Filtering Other Speakers From Calls And Audio Messages
20230005480 · 2023-01-05

A method includes receiving a first instance of raw audio data corresponding to a voice-based command and receiving a second instance of the raw audio data corresponding to an utterance of audible contents for an audio-based communication spoken by a user. When a voice filtering recognition routine determines to activate voice filtering for at least the voice of the user, the method also includes obtaining a respective speaker embedding of the user and processing, using the respective speaker embedding, the second instance of the raw audio data to generate enhanced audio data for the audio-based communication that isolates the utterance of the audible contents spoken by the user and excludes at least a portion of one or more additional sounds that are not spoken by the user. The method also includes executing.
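One common way such embedding-conditioned filtering works is to compare a per-chunk voice embedding against the enrolled speaker embedding and suppress chunks that do not match. The sketch below assumes that simplification: cosine similarity with a fixed threshold, and precomputed embeddings (the abstract's actual enhancement model is not specified).

```python
import numpy as np

def filter_to_speaker(chunks, chunk_embeddings, speaker_embedding, threshold=0.7):
    """Keep audio chunks whose voice embedding matches the enrolled user;
    zero out the rest, approximating the 'enhanced audio data' step."""
    ref = speaker_embedding / np.linalg.norm(speaker_embedding)
    kept = []
    for chunk, emb in zip(chunks, chunk_embeddings):
        sim = float(np.dot(emb / np.linalg.norm(emb), ref))  # cosine similarity
        kept.append(chunk if sim >= threshold else np.zeros_like(chunk))
    return kept
```

Real systems typically condition a neural mask estimator on the embedding rather than hard-gating whole chunks, but the gating version shows the role the speaker embedding plays.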

Automatic isolation of multiple instruments from musical mixtures

A system, method and computer product for training a neural network system. The method comprises inputting an audio signal to the system to generate plural outputs f(X, Θ). The audio signal includes one or more of vocal content and/or musical instrument content, and each output f(X, Θ) corresponds to a respective one of the different content types. The method also comprises comparing individual outputs f(X, Θ) of the neural network system to corresponding target signals. For each compared output f(X, Θ), at least one parameter of the system is adjusted to reduce a result of the comparing performed for the output f(X, Θ), to train the system to estimate the different content types. In one example embodiment, the system comprises a U-Net architecture. After training, the system can estimate various different types of vocal and/or instrument components of an audio signal, depending on which type of component(s) the system is trained to estimate.
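The training scheme — one output f(X, Θ) per content type, each compared to its own target and its parameters adjusted to reduce the comparison result — can be illustrated with a deliberately tiny stand-in model. This is not the patent's U-Net; it is one linear layer per source trained by gradient descent on mean-squared error, just to make the per-source loop concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_separator(X, targets, lr=0.1, epochs=200):
    """Toy multi-output training: one linear 'network' Θ per content type,
    each adjusted to reduce the error between f(X, Θ) and its target."""
    n_feat = X.shape[1]
    params = [rng.normal(scale=0.1, size=(n_feat, n_feat)) for _ in targets]
    for _ in range(epochs):
        for theta, Y in zip(params, targets):
            out = X @ theta                        # f(X, Θ) for this content type
            grad = 2 * X.T @ (out - Y) / len(X)    # gradient of mean-squared error
            theta -= lr * grad                     # adjust Θ to reduce the comparison
    return params
```

After training, each parameter set estimates its own content type from the shared input, mirroring the "train to estimate the different content types" claim.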

Utilizing machine learning models to provide cognitive speaker fractionalization with empathy recognition

A device may receive audio data identifying a plurality of speakers and may process the audio data, with a plurality of clustering models, to identify a plurality of speaker segments. The device may determine a plurality of diarization error rates for the plurality of speaker segments and may identify a plurality of errors in the plurality of speaker segments. The device may select rectification models to rectify the plurality of errors and may segment and/or re-segment the audio data with the rectification models to generate re-segmented audio data. The device may determine a plurality of modified diarization error rates for the plurality of speaker segments based on the re-segmented audio data and may select one of the plurality of speaker segments based on the plurality of modified diarization error rates. The device may calculate an empathy score based on the selected speaker segment and may perform actions based on the empathy score.
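The selection step — scoring candidate (re-)segmentations by diarization error rate and keeping the best — can be sketched with a simplified, frame-level DER. The full NIST-style DER also accounts for missed speech, false alarms, and optimal speaker mapping; the label-mismatch version below is an assumption made for brevity.

```python
def diarization_error_rate(reference, hypothesis):
    """Simplified frame-level DER: fraction of frames whose hypothesised
    speaker label differs from the reference label."""
    assert len(reference) == len(hypothesis)
    errors = sum(r != h for r, h in zip(reference, hypothesis))
    return errors / len(reference)

def select_best_segmentation(reference, candidates):
    """Pick the candidate re-segmentation with the lowest (modified) DER."""
    return min(candidates, key=lambda hyp: diarization_error_rate(reference, hyp))
```

In the abstract's pipeline the "reference" would come from the clustering models and the candidates from the rectification models' re-segmented audio data.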

PERCEPTUAL OPTIMIZATION OF MAGNITUDE AND PHASE FOR TIME-FREQUENCY AND SOFTMASK SOURCE SEPARATION SYSTEMS

A method comprises: obtaining softmask values for frequency bins of time-frequency tiles representing an audio signal; reducing, or expanding and limiting, the softmask values; and applying the reduced, or expanded and limited, softmask values to the frequency bins to create a time-frequency representation of an estimated target source. An alternative method comprises, for each time-frequency tile: obtaining softmask values; applying the softmask values to the frequency bins to create a time-frequency domain representation of an estimated target source; obtaining a panning parameter estimate and a source phase concentration estimate for the target source; determining, using the panning parameter estimate and the softmask values, a magnitude for the time-frequency representation of the estimated target source; determining, using the panning parameter estimate and the source phase concentration estimate, a phase for the time-frequency representation of the estimated target source; and combining the magnitude and the phase.
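The first method's reduce-or-expand-and-limit step can be sketched as mask shaping before application. The exponentiation-and-clip rule here is one plausible realisation, assumed for illustration; the patent does not commit to a specific shaping function.

```python
import numpy as np

def apply_softmask(mixture_tf, mask, exponent=2.0, limit=1.0):
    """Shape softmask values (exponent > 1 reduces them, exponent < 1
    expands them, clipped to `limit`), then apply to the T-F tiles."""
    shaped = np.clip(mask ** exponent, 0.0, limit)  # reduced or expanded-and-limited
    return mixture_tf * shaped                      # estimated target source
```

Reducing mask values suppresses low-confidence bins more aggressively (fewer separation artifacts at the cost of more leakage removed); expanding does the opposite, which is why the limiting step is needed.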

Speech recognition method, electronic device, and computer storage medium

A speech recognition method includes segmenting captured voice information to obtain a plurality of voice segments, and extracting voiceprint information of the voice segments; matching the voiceprint information of the voice segments with first stored voiceprint information to determine a set of filtered voice segments having voiceprint information that successfully matches the first stored voiceprint information; combining the set of filtered voice segments to obtain combined voice information, and determining combined semantic information of the combined voice information; and using the combined semantic information as a speech recognition result when the combined semantic information satisfies a preset rule.
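The filter-then-combine step can be sketched as below. Segmentation, voiceprint extraction, and the matching criterion are all taken as given inputs here; the `matches` predicate stands in for whatever similarity test the method actually uses.

```python
import numpy as np

def filter_and_combine(voice_segments, segment_voiceprints,
                       stored_voiceprint, matches):
    """Keep segments whose extracted voiceprint matches the stored one,
    then concatenate them in order into combined voice information."""
    kept = [seg for seg, vp in zip(voice_segments, segment_voiceprints)
            if matches(vp, stored_voiceprint)]
    return np.concatenate(kept) if kept else np.array([])
```

The combined signal would then be passed to semantic analysis, with the result accepted only if it satisfies the preset rule.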