G10L21/0208

HOWLING SUPPRESSION METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
20230046518 · 2023-02-16

This application relates to a howling suppression method and apparatus, a computer device, and a storage medium. The method includes obtaining a current audio signal corresponding to a current time period, and performing frequency domain transformation on the current audio signal; dividing the frequency domain audio signal and determining a target subband; obtaining a current howling detection result and a current voice detection result that correspond to the current audio signal, and determining a subband gain coefficient; obtaining a past subband gain corresponding to an audio signal within a past time period, and calculating a current subband gain corresponding to the current audio signal based on the subband gain coefficient and the past subband gain; and suppressing howling on the target subband based on the current subband gain, to obtain a first target audio signal corresponding to the current time period.
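The gain update described in this abstract can be sketched as follows. This is a minimal illustration, not the claimed implementation: the abstract does not say how the gain coefficient is chosen or combined with the past gain, so the multiplicative update, the coefficient values, the gain limits, and the function names `update_subband_gain` and `suppress_target_subband` are all assumptions.

```python
def update_subband_gain(past_gain, howling_detected, voice_detected,
                        decrease=0.5, increase=1.25,
                        min_gain=0.05, max_gain=1.0):
    """Return the current subband gain from the past gain and detection results.

    Assumption: the gain is lowered only when howling is detected and no
    voice is present; otherwise it recovers towards unity.
    """
    if howling_detected and not voice_detected:
        coefficient = decrease
    else:
        coefficient = increase
    # Assumption: current gain = past gain * coefficient, clamped to a range.
    return min(max_gain, max(min_gain, past_gain * coefficient))


def suppress_target_subband(spectrum, target_bins, gain):
    """Scale the bins in [start, stop) by the gain; leave other bins unchanged."""
    start, stop = target_bins
    return [value * gain if start <= i < stop else value
            for i, value in enumerate(spectrum)]


# Usage: two howling frames in a row, then a frame containing voice.
gain = 1.0
for howling, voice in [(True, False), (True, False), (False, True)]:
    gain = update_subband_gain(gain, howling, voice)
frame = suppress_target_subband([1.0, 2.0, 3.0, 4.0], (1, 3), gain)
```

Carrying the past gain forward means the attenuation deepens gradually over consecutive howling frames rather than switching abruptly, which is one plausible reason for using a past subband gain at all.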

Wideband DOA Improvements for Fixed and Dynamic Beamformers
20230050677 · 2023-02-16

This disclosure describes an apparatus and method of an embodiment of an invention that improves Direction of Arrival (DOA) determinations. This embodiment of the apparatus includes a plurality of microphones coupled together as a microphone array used for beamforming, where the plurality of microphones are positioned at predetermined locations and produce audio signals to be used to form a directional pickup pattern; and a processor, memory, storage, and a power supply operably coupled to the microphone array, the processor configured to execute the following steps: processing an algorithm for a Direction of Arrival (DOA) determination; and supplemental processing that improves the DOA processing.

Creating a Printed Publication, an E-Book, and an Audio Book from a Single File
20230049537 · 2023-02-16

As an example, a server may receive, from a computing device, a submission created by an author. The submission includes book data associated with a book and author data associated with the author. The author data includes incarceration data indicating whether the author was incarcerated. The server may determine, based on the author data and the book data, that the submission is publishable. The server may create, based on the book data, a printable book, an e-book, and an audio book and make one or more of the printable book, the e-book, and the audio book available for acquisition.

CONTACT AND ACOUSTIC MICROPHONES FOR VOICE WAKE AND VOICE PROCESSING FOR AR/VR APPLICATIONS
20230050954 · 2023-02-16

A method to combine contact and acoustic microphones in a headset for voice wake and voice processing in immersive reality applications is provided. The method includes receiving, from a contact microphone, a first acoustic signal, determining a fidelity and a quality of the first acoustic signal, receiving, from an acoustic microphone, a second acoustic signal, and when the fidelity and quality of the first acoustic signal exceed a pre-selected threshold, combining the first acoustic signal and the second acoustic signal to provide an enhanced acoustic signal to a smart glass user. A non-transitory, computer-readable medium storing instructions to cause a headset to perform the above method, and the headset, are also provided.
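The threshold-gated combination in this abstract can be sketched as follows. The abstract does not say how fidelity and quality are measured or how the two signals are mixed, so this sketch takes both scores as precomputed inputs and assumes a fixed-weight sample-wise mix; the function name `combine_microphones`, the threshold, the weight, and the fallback to the acoustic signal alone are all hypothetical.

```python
def combine_microphones(contact, acoustic, fidelity, quality,
                        threshold=0.6, contact_weight=0.5):
    """Mix the contact and acoustic signals when the contact signal is good enough.

    Assumptions: fidelity and quality are scores in [0, 1] computed elsewhere,
    and when either score fails the threshold the acoustic signal is used alone.
    """
    if len(contact) != len(acoustic):
        raise ValueError("signals must have the same number of samples")
    if fidelity > threshold and quality > threshold:
        # Assumption: a fixed-weight sample-wise average of the two signals.
        return [contact_weight * c + (1.0 - contact_weight) * a
                for c, a in zip(contact, acoustic)]
    return list(acoustic)


# Usage: a trusted contact signal is mixed in; an untrusted one is ignored.
mixed = combine_microphones([1.0, 1.0], [0.0, 2.0], fidelity=0.9, quality=0.8)
acoustic_only = combine_microphones([1.0, 1.0], [0.0, 2.0], fidelity=0.3, quality=0.8)
```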

EXTRANEOUS VOICE REMOVAL FROM AUDIO IN A COMMUNICATION SESSION
20230047187 · 2023-02-16

The technology disclosed herein enables removal of extraneous voices from audio in a communication session. In a particular embodiment, a method includes receiving audio captured from an endpoint operated by a user on a communication session. The method further includes identifying an extraneous voice in the audio, wherein the voice is from a person other than the user, and removing the extraneous voice from the audio. After removing the extraneous voice, the method includes transmitting the audio to another endpoint on the communication session.

Methods and systems for improved signal decomposition

A method for improving decomposition of digital signals using training sequences is presented. A method for improving decomposition of digital signals using initialization is also provided. A method for sorting digital signals using frames based upon energy content in the frame is further presented. A method for utilizing user input for combining parts of a decomposed signal is also presented.
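The frame-sorting step mentioned in this abstract can be sketched as follows. The abstract gives no detail, so this sketch assumes non-overlapping frames, energy defined as the sum of squared samples, and descending order; the function name `sort_frames_by_energy` is hypothetical.

```python
def sort_frames_by_energy(signal, frame_len):
    """Split a signal into non-overlapping frames and sort them by energy.

    Assumptions: trailing samples that do not fill a whole frame are dropped,
    and energy is the sum of squared samples in the frame.
    """
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return sorted(frames, key=lambda f: sum(x * x for x in f), reverse=True)


# Usage: three frames with energies 1, 9, and 2.
ordered = sort_frames_by_energy([0, 1, 3, 0, 1, 1], frame_len=2)
```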

In-vehicle speech processing apparatus

An in-vehicle apparatus is connectable to a device that includes a voice assistant function. The in-vehicle apparatus includes: a voice detector that performs voice recognition of an audio signal input from a microphone and that controls functions of the in-vehicle apparatus based on a result of the voice recognition; and an interface that communicates with the device. When being informed of a detection of a predetermined word in the audio signal as the result of the voice recognition of the audio signal performed by the voice detector, the interface sends to the device, not via the voice detector, the audio signal input from the microphone. The predetermined word is for activating the voice assistant function of the device.

Joint Acoustic Echo Cancelation, Speech Enhancement, and Voice Separation for Automatic Speech Recognition

A method for automatic speech recognition using joint acoustic echo cancellation, speech enhancement, and voice separation includes receiving, at a contextual frontend processing model, input speech features corresponding to a target utterance. The method also includes receiving, at the contextual frontend processing model, at least one of a reference audio signal, a contextual noise signal including noise prior to the target utterance, or a speaker embedding including voice characteristics of a target speaker that spoke the target utterance. The method further includes processing, using the contextual frontend processing model, the input speech features and the at least one of the reference audio signal, the contextual noise signal, or the speaker embedding to generate enhanced speech features.

Method and System for Dereverberation of Speech Signals

A system and method for reverberation reduction is disclosed. A first Deep Neural Network (DNN) produces a first estimate of a target direct-path signal from a mixture of acoustic signals that include the target direct-path signal and a reverberation of the target direct-path signal. A filter modeling a room impulse response (RIR) for the first estimate is estimated. The filter when applied to the first estimate of the target direct-path signal generates a result closest to a residual between the mixture of the acoustic signals and the first estimate of the target direct-path signal according to a distance function. A mixture with reduced reverberation of the target direct-path signal is obtained by removing the result of applying the filter to the first estimate of the target direct-path signal from the received mixture. A second DNN produces a second estimate of the target direct-path signal from the mixture with reduced reverberation.
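The filter-estimation and subtraction steps between the two DNNs can be sketched as follows. This is an illustration under stated assumptions, not the disclosed system: the first estimate is passed in as an array in place of a DNN output, the filter is assumed to be a short time-domain FIR filter, the distance function is assumed to be squared error (solved by least squares), and the second DNN is omitted. The function names `estimate_reverb_filter` and `reduce_reverberation` are hypothetical.

```python
import numpy as np


def estimate_reverb_filter(direct_est, mixture, taps):
    """Fit an FIR filter mapping the direct-path estimate to the residual.

    Assumption: squared error is the distance function, so the filter is the
    least-squares solution of X @ h ~= mixture - direct_est.
    """
    direct_est = np.asarray(direct_est, dtype=float)
    mixture = np.asarray(mixture, dtype=float)
    n = len(mixture)
    # Convolution matrix: column k holds direct_est delayed by k samples.
    X = np.zeros((n, taps))
    for k in range(taps):
        X[k:, k] = direct_est[:n - k]
    residual = mixture - direct_est
    h, *_ = np.linalg.lstsq(X, residual, rcond=None)
    return h, X


def reduce_reverberation(direct_est, mixture, taps=4):
    """Subtract the filtered direct-path estimate from the mixture."""
    h, X = estimate_reverb_filter(direct_est, mixture, taps)
    return np.asarray(mixture, dtype=float) - X @ h


# Usage: a synthetic mixture built from a known signal and a known filter.
rng = np.random.default_rng(0)
direct = rng.standard_normal(200)
true_filter = np.array([0.0, 0.5, 0.0, 0.25])
mixture = direct + np.convolve(direct, true_filter)[:200]
dereverbed = reduce_reverberation(direct, mixture, taps=4)
```

In this synthetic case the first estimate is exact, so the subtraction recovers the direct-path signal. With an imperfect first estimate the result would only have reduced reverberation, which is where the second DNN in the abstract would come in.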