Patent classifications
H04R2430/20
Deep multi-channel acoustic modeling using frequency aligned network
Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-geometry/multi-channel DNN) that includes a frequency aligned network (FAN) architecture. Thus, the first model may perform spatial filtering to generate a first feature vector by processing individual frequency bins separately, such that multiple frequency bins are not combined. The first feature vector may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. The DNN front-end enables improved performance despite a reduction in the number of microphones.
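The key property of the frequency aligned network described above is that spatial filtering combines channels within each frequency bin but never mixes bins. A minimal sketch of that per-bin filtering, with all shapes and names chosen for illustration only (not taken from the patent):

```python
import numpy as np

# Hypothetical per-bin spatial filtering: one small complex weight matrix
# per frequency bin, so channels are combined within a bin but bins are
# never mixed (the "frequency aligned" property).
rng = np.random.default_rng(0)
num_bins, num_channels, num_looks = 128, 4, 2

# One complex filter per bin and per "look direction" (illustrative shapes).
weights = rng.standard_normal((num_bins, num_looks, num_channels)) \
        + 1j * rng.standard_normal((num_bins, num_looks, num_channels))

# One multi-channel STFT frame: (bins, channels) of complex values.
frame = rng.standard_normal((num_bins, num_channels)) \
      + 1j * rng.standard_normal((num_bins, num_channels))

# Apply each bin's filters to that bin only, giving (bins, looks) features
# that play a role similar to beamformed features.
spatial_features = np.einsum('flc,fc->fl', weights, frame)
print(spatial_features.shape)  # (128, 2)
```

The einsum subscript `flc,fc->fl` makes the bin index `f` a shared, uncontracted axis, which is exactly what keeps frequency bins separate.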
ACCIDENTAL VOICE TRIGGER AVOIDANCE USING THERMAL DATA
Methods and systems for processing voice commands are disclosed. A voice controlled device may receive audio data comprising a voice command. Location information indicative of the source of the audio data may be determined. One or more devices may be caused to determine signals based on the location information. The one or more devices may receive thermal data in response to the signals. The thermal data may be analyzed to determine if the thermal data indicates the presence of a person at the expected location. If a person is detected, then the audio data may be processed to cause the voice command to be executed.
Fiber microphone
A microphone, comprising at least two electrodes, spaced apart, configured to have a magnetic field within a space between the at least two electrodes, and a conductive fiber, suspended between the at least two electrodes, in an air or fluid space subject to waves; wherein the conductive fiber has a radius and length such that a movement of at least a central portion of the conductive fiber approximates an oscillating movement of the air or fluid surrounding the conductive fiber along an axis normal to the conductive fiber. An electrical signal is produced between two of the at least two electrodes, due to a movement of the conductive fiber within the magnetic field, driven by viscous drag of the moving air or fluid surrounding the conductive fiber. The microphone may have a noise floor of less than 69 dBA using an amplifier having an input noise of 10 nV/√Hz.
Hearing device with omnidirectional sensitivity
A method performed by a first hearing device comprising microphone(s) configured to generate a first input signal, a communication unit configured to receive a second input signal from a second hearing device, an output unit, and a processor, the method comprising: generating a first intermediate signal including or based on a first weighted combination of the first input signal and the second input signal, wherein the first weighted combination is based on a first gain value and/or a second gain value; and generating an output signal for the output unit based on the first intermediate signal; wherein one or both of the first gain value and the second gain value are determined in accordance with an objective of making a power of the first input signal and a power of the second input signal differ by a preset power level difference greater than 2 dB in the weighted combination.
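The stated objective is to choose the two gains so that the weighted powers of the own-side and contralateral signals differ by a preset level. A minimal sketch of one way to satisfy that objective (the function name, the 6 dB target, and the choice to fix the first gain at unity are all illustrative assumptions, not the patented method):

```python
import numpy as np

# Illustrative sketch: pick gains so the two weighted signal powers
# differ by a preset level difference, here delta_db = 6 dB (> 2 dB).
delta_db = 6.0

def gains_for_power_difference(p_first, p_second, delta_db):
    """Return (g1, g2) so that the weighted powers g1^2*p_first and
    g2^2*p_second differ by delta_db decibels."""
    g1 = 1.0  # fix the first gain (an arbitrary normalization choice)
    # Solve (g1**2 * p_first) / (g2**2 * p_second) = 10**(delta_db / 10)
    g2 = np.sqrt(p_first / (p_second * 10 ** (delta_db / 10.0)))
    return g1, g2

# Example with equal input powers of 1.0 on both sides.
g1, g2 = gains_for_power_difference(1.0, 1.0, delta_db)
ratio_db = 10 * np.log10((g1**2 * 1.0) / (g2**2 * 1.0))
print(round(ratio_db, 3))  # 6.0
```

Keeping a controlled power imbalance between the two ears' signals in the combination is what preserves a dominant own-side contribution while still adding contralateral sensitivity.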
Intelligent audio system using multiple sensor modalities
Embodiments include an audio system comprising an audio device, a speaker, and a processor. The audio system is configured to receive data from one or more sensors corresponding to persons in a room and/or characteristics of a room, and responsively take action to modify one or more characteristics of the audio system, share the information with other systems or devices, and track data over time to determine patterns and trends in the data.
SOUND PRODUCING DEVICE AND METHOD FOR DRIVING THE SAME, DISPLAY PANEL AND DISPLAY APPARATUS
The present disclosure provides a sound producing device, a method for driving the sound producing device, a display panel and a display apparatus. The sound producing device includes a recognition element, a directional sound production element and a control element, where the recognition element is connected with the control element and is configured to acquire information relating to a person in a preset range and transmit the acquired information relating to the person to the control element; the control element is connected with the directional sound production element and is configured to acquire a corresponding audio signal according to the acquired information relating to the person and control the directional sound production element to send out a sound wave according to the acquired audio signal.
HEARING DEVICE COMPRISING AN INPUT TRANSDUCER IN THE EAR
A hearing aid configured to be worn at, and/or in, an ear of a user, comprises a forward path for processing sound from the environment of the user. The forward path comprises a) at least one first microphone providing at least one first electric input signal representing said sound as received at the respective at least one first microphone, said at least one first microphone being located away from a first ear canal of the user, b) an audio signal processor for processing said at least one first electric input signal, or a signal or signals originating therefrom, and for providing a processed signal, c) an output transducer for providing stimuli perceivable as sound to the user in dependence of said processed signal, d) at least one second microphone connected to said audio signal processor, the at least one second microphone being configured to provide at least one second electric input signal representing said sound as received at the at least one second microphone, the at least one second microphone being located at or in said first ear canal of the user, and e) a feature extractor for extracting acoustic characteristics of said ear of the user from said at least one second electric input signal, or a signal originating therefrom. The hearing aid is configured to include said acoustic characteristics in the processed signal. The invention may e.g. provide improved sound localization in hearing aids.
FILTER COEFFICIENT OPTIMIZATION APPARATUS, FILTER COEFFICIENT OPTIMIZATION METHOD, AND PROGRAM
Provided is a filter coefficient optimization technology that makes it possible to design a stable, high-quality beamformer by considering the relationship of filter coefficients between adjacent frequency bins. A filter coefficient optimization apparatus includes an optimization unit that calculates an optimum value of a filter coefficient w = {w_1, …, w_F} (w_f being the filter coefficient of frequency bin f) of a beamformer that emphasizes sound (target sound) from D sound sources, with a_{f,d} being the array manifold vector in frequency bin f corresponding to a sound wave, assumed to be a plane wave, arriving from the angular direction θ_d in which sound source d exists. The optimization unit calculates the optimum value based on an optimization problem, under a predetermined constraint condition, of a cost function defined as the sum over frequency bins of a cost function L_MV_f(w_f) and a predetermined regularization term, the regularization term being defined using the difference in phase between adjacent frequency bins of the beamformer response w_f^H a_{f,d} for the angular direction θ_d.
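One plausible reading of the objective described above, using the abstract's own notation (the regularization weight λ and the squared-difference form of the phase penalty are assumptions, not stated in the abstract), is:

```latex
\min_{w_1,\dots,w_F}\ \sum_{f=1}^{F} L_{\mathrm{MV}\_f}(w_f)
\;+\; \lambda \sum_{f=1}^{F-1} \sum_{d=1}^{D}
\Bigl(\arg\bigl(w_{f+1}^{\mathsf H} a_{f+1,d}\bigr)
      - \arg\bigl(w_{f}^{\mathsf H} a_{f,d}\bigr)\Bigr)^{2}
\quad \text{subject to the predetermined constraint.}
```

Penalizing phase jumps of the response between adjacent bins discourages the erratic inter-bin behavior that makes independently optimized narrowband beamformers unstable.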
Time domain neural networks for spatial audio reproduction
A device for reproducing spatial audio using a machine learning model may include at least one processor configured to receive multiple audio signals corresponding to a sound scene captured by respective microphones of a device. The at least one processor may be further configured to provide the multiple audio signals to a machine learning model, the machine learning model having been trained based at least in part on a target rendering configuration. The at least one processor may be further configured to provide, responsive to providing the multiple audio signals to the machine learning model, multichannel audio signals that comprise a spatial reproduction of the sound scene in accordance with the target rendering configuration.
ADAPTIVE NOISE CANCELLING FOR CONFERENCING COMMUNICATION SYSTEMS
A communication system with a noise cancellation (NC) assembly providing adaptive or dynamic noise cancellation. The NC assembly includes a localizer module determining, during a communication session (active speaking or during idle times), a location of the active talker. The NC assembly includes a beam generator forming a beam in the determined direction of the active talker to enhance the active talker's speech. Once the NC assembly has determined the position of the active talker, the NC assembly assigns a microphone of the microphone array, or a generated beam, in that active direction to be the "active signal" source. The NC assembly assigns a second microphone or beam to be the noise source for NC purposes, and this source may be selected to be in the acoustic shadow of the first microphone used as the active signal source, or may be the microphone positioned farthest from the active talker.
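The selection step described above can be sketched in a few lines: once the talker is localized, the nearest microphone (or a beam steered that way) becomes the active signal source and the farthest microphone becomes the noise reference. The array geometry and variable names below are illustrative assumptions:

```python
import numpy as np

# Illustrative noise-reference selection for adaptive NC: positions are
# in meters, a linear 4-microphone array along the x axis (assumed).
mic_positions = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
talker_position = np.array([0.2, 1.0])  # from the localizer module

distances = np.linalg.norm(mic_positions - talker_position, axis=1)
active_mic = int(np.argmin(distances))  # closest to the talker: active signal
noise_mic = int(np.argmax(distances))   # farthest: noise reference for NC
print(active_mic, noise_mic)  # 0 3
```

The farthest (or acoustically shadowed) microphone is a good noise reference precisely because it picks up mostly the shared room noise and relatively little of the talker's speech.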