Patent classifications
G10L2021/02082
Audible howling control systems and methods
An audio system includes: a speaker; a microphone that generates a microphone signal based on sound output from the speaker; a mixer module configured to generate a mixed signal by mixing the microphone signal with an audio signal; a filter module configured to filter the mixed signal to produce a filtered signal and to apply the filtered signal to the speaker; and a detector module configured to determine a howling frequency in the microphone signal attributable to sound output from the speaker, where the filter module is configured to decrease a magnitude of the filtered signal at the howling frequency.
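The control loop this abstract describes, detect a howling frequency in the microphone signal and reduce the filtered signal's magnitude there, can be sketched as a spectral peak detector feeding a notch filter. This is a minimal illustration under stated assumptions, not the patented system: the single-strongest-peak detector and the RBJ-cookbook biquad notch are my own simplifications.

```python
import numpy as np

def detect_howling_frequency(mic_signal, sample_rate):
    """Return the frequency (Hz) of the strongest spectral peak,
    used here as a crude howling candidate."""
    spectrum = np.abs(np.fft.rfft(mic_signal))
    freqs = np.fft.rfftfreq(len(mic_signal), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

def notch_coefficients(f0, sample_rate, q=30.0):
    """Biquad notch (RBJ cookbook form) that attenuates f0."""
    w0 = 2 * np.pi * f0 / sample_rate
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1.0, -2 * np.cos(w0), 1.0])
    a = np.array([1 + alpha, -2 * np.cos(w0), 1 - alpha])
    return b / a[0], a / a[0]

# Example: a loud 1 kHz tone (simulated howling) mixed with a 200 Hz tone.
fs = 16000
t = np.arange(fs) / fs
mixed = 0.2 * np.sin(2 * np.pi * 200 * t) + 1.0 * np.sin(2 * np.pi * 1000 * t)
f_howl = detect_howling_frequency(mixed, fs)
b, a = notch_coefficients(f_howl, fs)

# Direct-form I biquad filtering, written out to avoid extra dependencies:
filtered = np.zeros_like(mixed)
x1 = x2 = y1 = y2 = 0.0
for n, x0 in enumerate(mixed):
    y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
    filtered[n] = y0
    x2, x1 = x1, x0
    y2, y1 = y1, y0
```

The notch leaves the 200 Hz content essentially untouched while the detected howling frequency is strongly attenuated, which is the behavior the filter module in the abstract is configured for.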
Dynamic Player Selection for Audio Signal Processing
In one aspect, a first playback device is configured to (i) receive a set of voice signals, (ii) process the set of voice signals using a first set of audio processing algorithms, (iii) identify, from the set of voice signals, at least two voice signals that are to be further processed, (iv) determine that the first playback device does not have a threshold amount of computational power available, (v) receive an indication of an available amount of computational power of a second playback device, (vi) send the at least two voice signals to the second playback device, (vii) cause the second playback device to process the at least two voice signals using a second set of audio processing algorithms, (viii) receive, from the second playback device, the processed at least two voice signals, and (ix) combine the processed at least two voice signals into a combined voice signal.
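The nine-step flow above can be sketched as a small offloading routine. Everything concrete here is an illustrative assumption: the `PlaybackDevice` class, the stand-in "coarse" and "fine" algorithm sets, and averaging as the combination step.

```python
class PlaybackDevice:
    def __init__(self, name, available_compute):
        self.name = name
        self.available_compute = available_compute

    def coarse_process(self, signal):
        # Stand-in for the "first set of audio processing algorithms".
        return [s * 0.5 for s in signal]

    def fine_process(self, signal):
        # Stand-in for the "second set of audio processing algorithms".
        return [s + 1.0 for s in signal]


def handle_voice_signals(first, second, voice_signals, threshold=10):
    # (ii) run the first algorithm set locally on every signal
    coarse = [first.coarse_process(v) for v in voice_signals]
    # (iii) select at least two signals that need further processing
    selected = coarse[:2]
    # (iv)-(viii) offload when local compute is short and the peer has headroom
    if first.available_compute < threshold <= second.available_compute:
        worker = second
    else:
        worker = first
    fine = [worker.fine_process(v) for v in selected]
    # (ix) combine the processed signals, here by element-wise averaging
    return [sum(samples) / len(samples) for samples in zip(*fine)]


a = PlaybackDevice("living-room", available_compute=4)
b = PlaybackDevice("kitchen", available_compute=16)
combined = handle_voice_signals(a, b, [[1.0, 2.0], [3.0, 4.0]])
```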
VOICE REINFORCEMENT IN MULTIPLE SOUND ZONE ENVIRONMENTS
A microphone signal is received from at least one microphone. Acoustic echo cancellation (AEC) produces an echo-cancelled microphone signal using first adaptive filters to estimate and cancel feedback that results from the environment. Adaptive feedback cancellation (AFC) produces a processed microphone signal using second adaptive filters to estimate and cancel feedback resulting from application of the reinforced voice signal within the environment. The uttered speech is reinforced in the processed microphone signal to produce the reinforced voice signal. The reinforced voice signal and the audio signal are applied to the loudspeakers. A step size of adjustment of the second adaptive filters may be increased responsive to detection of reverberation in the microphone signal. The reverberation used to control the step size of the second adaptive filters may be added artificially. This may provide multiple benefits, including improving adjustment of the second adaptive filters and improving the sound impression of the voice.
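The adaptive-filter adjustment and the reverberation-controlled step size can be sketched with a normalized LMS (NLMS) update, a common choice for feedback path estimation. This is a toy 2-tap version under my own assumptions (the patent does not specify NLMS or the boost factor):

```python
import numpy as np

def nlms_step(w, x_buf, d, mu, eps=1e-8):
    """One NLMS update: w estimates the feedback path, x_buf holds
    recent loudspeaker samples, d is the microphone sample."""
    y = np.dot(w, x_buf)              # predicted feedback
    e = d - y                         # feedback-cancelled sample
    w = w + mu * e * x_buf / (np.dot(x_buf, x_buf) + eps)
    return w, e

def step_size(base_mu, reverb_detected, boost=2.0):
    """Increase adaptation speed when reverberation is detected,
    as the abstract suggests (boost factor is illustrative)."""
    return base_mu * boost if reverb_detected else base_mu

# Identify a known 2-tap feedback path from a noise-like signal.
rng = np.random.default_rng(0)
h_true = np.array([0.5, -0.3])            # unknown feedback path
x = rng.standard_normal(2000)             # loudspeaker signal
w = np.zeros(2)
mu = step_size(0.1, reverb_detected=True)  # boosted step size
for n in range(1, len(x)):
    x_buf = np.array([x[n], x[n - 1]])
    d = float(np.dot(h_true, x_buf))      # microphone hears feedback only
    w, e = nlms_step(w, x_buf, d, mu)
```

After a couple thousand samples the estimate `w` converges to the true path, and the larger step size reaches that point faster, which is the benefit the abstract attributes to reverberation-controlled adaptation.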
METHOD AND SYSTEM FOR MITIGATING UNWANTED AUDIO NOISE IN A VOICE ASSISTANT-BASED COMMUNICATION ENVIRONMENT
A method for mitigating unwanted audio noise in an internet of things (IoT) based communication environment is provided. The method includes identifying and pairing one or more IoT devices with a voice assistant device, and then dividing the paired IoT devices into a plurality of clusters. The method further includes detecting a user's location with respect to the location of the voice assistant device, determining which cluster among the plurality of clusters corresponds to the detected location, and then, using a recurrent neural network (RNN) model, predicting an optimal sound output of the voice assistant device that is audible at the detected location. The method furthermore includes correcting the predicted optimal sound output using a sound parameter value associated with the determined cluster and a phase shift of the predicted optimal sound output.
Echo detection
A method includes receiving a microphone audio signal and a playout audio signal, and determining a frequency representation of the microphone audio signal and a frequency representation of the playout audio signal. For each frequency representation, the method also includes determining features based on the frequency representation. Each feature corresponds to a pair of frequencies of the frequency representation and a period of time between the pair of frequencies. The method also includes determining that a match occurs between a first feature based on the frequency representation of the microphone audio signal and a second feature based on the frequency representation of the playout audio signal, and determining that a delay value between the first feature and the second feature corresponds to an echo within the microphone audio signal.
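The feature scheme described here, pairs of frequencies separated by a period of time, resembles landmark-style audio fingerprinting. A minimal sketch under stated assumptions: one dominant FFT bin per frame as the "frequency", a fixed frame span as the "period of time", and the most common anchor-time difference as the echo delay. The frame size and tone layout are illustrative, not from the patent.

```python
import numpy as np

def dominant_bins(signal, frame=256):
    """Dominant FFT bin per frame (a crude spectral landmark)."""
    n = len(signal) // frame
    return [int(np.argmax(np.abs(np.fft.rfft(signal[i * frame:(i + 1) * frame]))))
            for i in range(n)]

def features(bins, span=3):
    """Each feature pairs two landmarks `span` frames apart:
    key (bin_a, bin_b, span), value = anchor frame index."""
    return {(bins[i], bins[i + span], span): i for i in range(len(bins) - span)}

def echo_delay_frames(mic, playout, frame=256):
    """Match features between the two signals; a consistent anchor-time
    difference indicates an echo and gives its delay in frames."""
    f_mic = features(dominant_bins(mic, frame))
    f_play = features(dominant_bins(playout, frame))
    deltas = [f_mic[k] - f_play[k] for k in f_play if k in f_mic]
    if not deltas:
        return None
    vals, counts = np.unique(deltas, return_counts=True)
    return int(vals[np.argmax(counts)])

# Playout: a sequence of bin-aligned tones; mic: the same audio 4 frames late.
fs = 8192
tones = [512, 928, 1344, 1728, 608, 1120, 1536, 416]   # multiples of fs/256
playout = np.concatenate(
    [np.sin(2 * np.pi * f * np.arange(512) / fs) for f in tones])
mic = np.concatenate([np.zeros(1024), playout])        # 1024 samples = 4 frames
delay = echo_delay_frames(mic, playout)
```

The matched features all agree on a delay of four frames, which is how a consistent delay value between matched features flags an echo within the microphone signal.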
Sound signal processing system apparatus for avoiding adverse effects on speech recognition
A sound signal processing system includes: a sound signal processing apparatus executing non-linear signal processing on a collected sound signal collected by a microphone, and transmitting, to an information processing apparatus, both a pre-execution sound signal before the non-linear signal processing is executed and a post-execution sound signal after the non-linear signal processing is executed; and the information processing apparatus receiving the pre-execution sound signal and the post-execution sound signal from the sound signal processing apparatus, and executing first processing on the pre-execution sound signal and executing second processing on the post-execution sound signal, the second processing being different from the first processing.
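The dual-path transmission can be sketched as a tiny routing scheme. The dictionary packaging, the clipping non-linearity, and the pass-through processors are illustrative assumptions; the point is only that the receiver applies different processing to the pre-execution and post-execution signals.

```python
def capture_and_send(raw, nonlinear_fn):
    """Device side: transmit both the pre-execution signal and the
    post-execution signal produced by the non-linear processing."""
    return {"pre": list(raw), "post": nonlinear_fn(raw)}

def receive_and_route(package, first_processing, second_processing):
    """Receiver side: first processing (e.g. a speech-recognition
    front-end, which non-linear artifacts can degrade) runs on the
    pre-execution signal; second processing runs on the other."""
    return first_processing(package["pre"]), second_processing(package["post"])

# Illustrative non-linear step: hard clipping at +/-0.5.
clip = lambda sig: [max(-0.5, min(0.5, s)) for s in sig]
pkg = capture_and_send([0.2, 0.9, -0.8], clip)
asr_in, playback_in = receive_and_route(pkg, lambda s: s, lambda s: s)
```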
Microphone array device, conference system including microphone array device and method of controlling a microphone array device
A microphone array device including microphone capsules and at least one processing unit configured to receive output signals of the microphone capsules, dynamically steer an audio beam based on the received output signals, and generate and provide an audio output signal based on the received output signals. The processing unit is configured to operate in a dynamic beam mode, where at least one focused audio beam is formed that points towards a detected audio source, and in a default beam mode, where a broader audio beam is formed that substantially covers a default detection area. The microphone array may be incorporated into a conference system.
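Beam steering of the kind described can be sketched with a delay-and-sum beamformer, the simplest steering technique; the patent does not say which beamforming method is used, so this is an assumption, with whole-sample delays and `np.roll` keeping the sketch short.

```python
import numpy as np

def steering_delays(positions_m, angle_rad, fs, c=343.0):
    """Integer sample delays for a linear array steered to angle_rad
    (0 = broadside, pi/2 = endfire)."""
    return [int(round(p * np.sin(angle_rad) / c * fs)) for p in positions_m]

def delay_and_sum(mic_signals, delays):
    """Advance each capsule signal by its steering delay and average,
    so arrivals from the steered direction add coherently."""
    out = np.zeros(len(mic_signals[0]))
    for sig, d in zip(mic_signals, delays):
        out += np.roll(sig, -d)
    return out / len(mic_signals)

# Two capsules; a 440 Hz source at endfire reaches the far capsule
# 8 samples later.
fs = 16000
s = np.sin(2 * np.pi * 440 * np.arange(1600) / fs)
mics = [s, np.roll(s, 8)]
focused = delay_and_sum(mics, steering_delays([0.0, 8 * 343 / fs], np.pi / 2, fs))
default = delay_and_sum(mics, [0, 0])   # un-steered "default" beam
```

The focused beam reconstructs the source coherently, while the un-steered sum is partially out of phase and weaker, mirroring the focused versus default beam modes in the abstract.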
Hearing device comprising a recurrent neural network and a method of processing an audio signal
A hearing device, e.g. a hearing aid or a headset, configured to be worn by a user, comprises an input unit for providing at least one electric input signal in a time-frequency representation; and a signal processor comprising a target signal estimator for providing an estimate of the target signal; a noise estimator for providing an estimate of the noise; and a gain estimator for providing respective gain values in dependence on said target signal estimate and said noise estimate. The gain estimator comprises a trained neural network, wherein the outputs of the neural network comprise real- or complex-valued gains, or separate real-valued gains and real-valued phases. The signal processor is configured, at a given time instance t, to calculate the changes Δx(i,t) = x(i,t) − x̂(i,t−1) and Δh(j,t−1) = h(j,t−1) − ĥ(j,t−2) to an input vector x(t) and to the hidden state vector h(t−1), respectively, from one time instance, t−1, to the next, t, where x̂(i,t−1) and ĥ(j,t−2) are estimated values of x(i,t−1) and h(j,t−2), respectively. The indices i and j refer to the i-th input neuron and the j-th neuron of the hidden state, respectively, where 1 ≤ i ≤ N_ch,x and 1 ≤ j ≤ N_ch,oh, and N_ch,x and N_ch,oh are the numbers of processing channels of the input vector x and the hidden state vector h, respectively. The signal processor is further configured to limit the number of updated channels among said N_ch,x and N_ch,oh processing channels of the modified gated recurrent unit, for the input vector x(t) and the hidden state vector h(t−1) at the given time instance t, to numbers of peak values N_p,x and N_p,oh, respectively, where N_p,x is smaller than N_ch,x and N_p,oh is smaller than N_ch,oh.
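The core trick, updating only the N_p channels with the largest (peak) changes per time step, can be sketched in isolation. This is a toy illustration of the sparse-update idea, not the patented modified gated recurrent unit; the 8-channel state and the candidate values are invented for the example.

```python
import numpy as np

def sparse_update(prev, candidate, n_peak):
    """Apply only the n_peak largest-magnitude changes between the
    previous and candidate vectors; all other channels keep their
    previous values, saving computation downstream."""
    delta = candidate - prev
    keep = np.argsort(np.abs(delta))[-n_peak:]   # indices of peak changes
    out = prev.copy()
    out[keep] = candidate[keep]
    return out

# Toy hidden state: only 3 of 8 channels are allowed to change.
h_prev = np.zeros(8)
h_cand = np.array([0.1, -0.9, 0.05, 0.7, 0.0, -0.2, 0.3, 0.02])
h_new = sparse_update(h_prev, h_cand, n_peak=3)
```

The same selection would be applied to both the input vector x(t) and the hidden state vector h(t−1), with N_p,x and N_p,oh playing the role of `n_peak`.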
SYSTEM FOR DYNAMICALLY ADJUSTING A SOUNDMASK SIGNAL BASED ON REALTIME AMBIENT NOISE PARAMETERS WHILE MAINTAINING ECHO CANCELLER CALIBRATION PERFORMANCE
A system and method are provided for dynamic sound mask adjustment. A sound mask is used for obtaining an impulse response measurement, and the generated sound mask is adjusted dynamically based on real-time ambient noise parameters while maintaining echo canceller calibration performance. The system includes a dynamic sound mask generator with a noise accumulator and monitor, comprising a processor and memory holding instructions executed by the processor to perform the dynamic sound mask adjustment. If the sound mask is not in the hysteresis range, the current sound mask level and iteration update rate are adjusted; if the sound mask is in the hysteresis range, the current sound mask level and iteration update rate are maintained.
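The hysteresis rule at the end can be sketched as a single decision function. The band width, step size, and halved update interval are illustrative assumptions; only the maintain-versus-adjust logic comes from the abstract.

```python
def adjust_sound_mask(level_db, target_db, band_db=2.0, step_db=1.0,
                      update_s=10.0):
    """If the mask level sits within the hysteresis band around the
    ambient-derived target, maintain both the level and the update
    rate; otherwise step toward the target and update more often."""
    if abs(level_db - target_db) <= band_db:
        return level_db, update_s                 # in band: maintain
    step = step_db if target_db > level_db else -step_db
    return level_db + step, update_s / 2          # out of band: adjust faster
```

Stepping (rather than jumping) toward the target keeps the mask change gradual, which is what lets the echo canceller calibration survive the adjustment.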
Echo estimation and management with adaptation of sparse prediction filter set
Methods for echo estimation or echo management (echo suppression or cancellation) on an input audio signal, with at least one of adaptation of a sparse prediction filter set, modification (for example, truncation) of adapted prediction filter impulse responses, generation of a composite impulse response from adapted prediction filter impulse responses, or use of echo estimation and/or echo management resources in a manner determined at least in part by classification of the input audio signal as being (or not being) echo free. Other aspects are systems configured to perform any embodiment of any of the methods.
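Two of the listed operations, truncating adapted prediction filter impulse responses and generating a composite impulse response from them, can be sketched directly. The −40 dB truncation floor and the example filters are illustrative assumptions, not values from the patent.

```python
import numpy as np

def truncate_ir(ir, floor_db=-40.0):
    """Modify an adapted impulse response by dropping its tail once
    it falls floor_db below the peak (one simple truncation rule)."""
    thresh = np.max(np.abs(ir)) * 10 ** (floor_db / 20)
    above = np.nonzero(np.abs(ir) > thresh)[0]
    return ir[: above[-1] + 1] if above.size else ir[:1]

def composite_ir(sparse_filters, offsets, length):
    """Place each short adapted filter at its delay offset and sum
    them into one composite impulse response covering the full span."""
    out = np.zeros(length)
    for taps, off in zip(sparse_filters, offsets):
        out[off:off + len(taps)] += taps
    return out

short = truncate_ir(np.array([1.0, 0.5, 1e-5, 1e-6]))   # quiet tail dropped
comp = composite_ir([np.array([1.0]), np.array([0.5, 0.25])], [0, 3], 6)
```

The composite response is what a sparse prediction filter set amounts to: a few short filters at known offsets standing in for one long, mostly-zero filter.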