Patent classifications
H04R3/005
Methods and systems for pushing audiovisual playlist based on text-attentional convolutional neural network
In some embodiments, methods and systems for pushing audiovisual playlists based on a text-attentional convolutional neural network include a local voice interactive terminal, a dialog system server and a playlist recommendation engine, where the dialog system server and the playlist recommendation engine are respectively connected to the local voice interactive terminal. In some embodiments, the local voice interactive terminal includes a microphone array, a host computer connected to the microphone array, and a voice synthesis chip board connected to the microphone array. In some embodiments, the playlist recommendation engine obtains rating data based on a rating predictor constructed by the neural network; the host computer parses the data into recommended playlist information; and the voice terminal synthesizes the results and pushes them to a user in the form of voice.
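The abstract's final step, turning rating data into a recommended playlist, can be illustrated with a minimal sketch; the function name, the rating dictionary, and the top-k cutoff are illustrative assumptions, not the patent's CNN-based rating predictor, which is only summarized here by its output.

```python
def recommend_playlist(ratings, top_k=3):
    """Sort items by predicted rating (a stand-in for the output of the
    text-attentional CNN rating predictor) and return the top-k item ids
    as the recommended playlist."""
    ranked = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:top_k]]

# Hypothetical predicted ratings for four candidate items.
playlist = recommend_playlist({"songA": 4.2, "songB": 3.1,
                               "songC": 4.8, "songD": 2.0})
```

The host computer would then parse such a ranked list into playlist information for the voice terminal to synthesize.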
MEMS CHIP
Disclosed is a MEMS chip that in certain embodiments includes a substrate with a back cavity, and a plate capacitor bank provided on the substrate; the plate capacitor bank at least includes a first plate capacitor structure and a second plate capacitor structure located below the first plate capacitor structure and arranged in parallel with the first plate capacitor structure; the first plate capacitor structure includes a first diaphragm and a first back electrode; and the second plate capacitor structure includes a second diaphragm and a second back electrode.
MATCHED AND EQUALIZED MICROPHONE OUTPUT OF AUTOMOTIVE MICROPHONE SYSTEMS
A vehicle microphone system may include at least two microphones forming a microphone array, at least one loudspeaker configured to emit audio signals, and a processor coupled to a memory and programmed to receive incoming audio signals from the microphone array, determine at least one parameter for each channel of the microphone array, determine at least one filter to apply to at least one channel based on a difference between the parameters of each channel, and store the at least one filter in the memory.
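The per-channel parameter could be, for example, a level measurement, and the filter a matching gain. The sketch below assumes an RMS level as the channel parameter and a scalar gain as the filter; both choices are illustrative, not taken from the patent.

```python
import math

def channel_rms(samples):
    """Root-mean-square level of one microphone channel (the assumed
    per-channel parameter)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def matching_gain(reference, channel):
    """Scalar gain (the assumed filter) that matches a channel's level
    to the reference channel's level."""
    ref_rms = channel_rms(reference)
    ch_rms = channel_rms(channel)
    return ref_rms / ch_rms if ch_rms > 0 else 1.0

# Hypothetical example: channel 2 is 6 dB quieter than channel 1.
ch1 = [0.5, -0.5, 0.5, -0.5]
ch2 = [0.25, -0.25, 0.25, -0.25]
gain = matching_gain(ch1, ch2)
matched = [s * gain for s in ch2]
```

The gain derived from the difference between the two channel parameters would then be stored and applied to the quieter channel.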
AUDIO DEVICE AUTO-LOCATION
A method for estimating an audio device location in an environment may involve obtaining direction of arrival (DOA) data for each audio device of a plurality of audio devices in the environment and determining interior angles for each of a plurality of triangles based on the DOA data. Each triangle may have vertices that correspond with audio device locations. The method may involve determining a side length for each side of each of the triangles, performing a forward alignment process of aligning each of the plurality of triangles to produce a forward alignment matrix and performing a reverse alignment process of aligning each of the plurality of triangles in a reverse sequence to produce a reverse alignment matrix. A final estimate of each audio device location may be based, at least in part, on values of the forward alignment matrix and values of the reverse alignment matrix.
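One way the side lengths could follow from the DOA-derived interior angles is the law of sines, once one side is fixed as a reference scale. The sketch below assumes exactly that; it is a geometric illustration, not the patent's alignment procedure.

```python
import math

def triangle_sides(angle_a, angle_b, angle_c, side_a):
    """Given the three interior angles (radians) of a triangle and the
    length of the side opposite angle_a, return all three side lengths
    using the law of sines: a / sin(A) = b / sin(B) = c / sin(C)."""
    k = side_a / math.sin(angle_a)  # common ratio shared by all sides
    return side_a, k * math.sin(angle_b), k * math.sin(angle_c)

# A 30-60-90 triangle with the side opposite the 30° angle set to 1.
a, b, c = triangle_sides(math.radians(30), math.radians(60),
                         math.radians(90), 1.0)
```

With vertices corresponding to device locations, repeating this per triangle yields the side lengths the forward and reverse alignment passes would operate on.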
MICROPHONE ARRAY SYSTEM WITH SOUND WIRE INTERFACE AND ELECTRONIC DEVICE
A microphone array system comprises N microphones, including a first microphone . . . an Nth microphone, wherein N is a natural number greater than 2. Each of the N microphones is provided with: an acoustic transducer for picking up a sound signal and converting the sound signal into an electric signal; a voice activation detector, connected to a corresponding acoustic transducer, and configured to perform a voice activation detection on the electric signal and form an activation signal; a buffer memory, connected to the acoustic transducer, and configured to store a 1/N electric signal of a predetermined segment; a sound wire interface, connected to a corresponding acoustic transducer, the buffer memory, and the voice activation detector, wherein the sound wire interface is connected to an external master chip via a sound wire bus for outputting the activation signal to the external master chip.
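The per-microphone pairing of a buffer memory with a voice activation detector can be sketched in software as a ring buffer plus an energy threshold. The class name, buffer length, and threshold value are illustrative assumptions; real voice activation detectors are typically more elaborate than a frame-energy test.

```python
from collections import deque

class MicChannel:
    """One microphone channel: a ring buffer holding the most recent
    samples, plus a simple energy-based voice-activation detector."""

    def __init__(self, buffer_len=1024, threshold=0.01):
        self.buffer = deque(maxlen=buffer_len)  # discards oldest samples
        self.threshold = threshold

    def push(self, frame):
        """Store a frame and return True (an activation signal) when the
        frame's mean energy exceeds the activation threshold."""
        self.buffer.extend(frame)
        energy = sum(s * s for s in frame) / len(frame)
        return energy > self.threshold

mic = MicChannel()
quiet = mic.push([0.0] * 16)        # silence: no activation
loud = mic.push([0.5, -0.5] * 8)    # loud frame: activation raised
```

In the described system the activation signal, rather than raw audio, is what travels over the sound wire bus to wake the external master chip.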
SYSTEM AND METHOD FOR AUDIO TAGGING OF AN OBJECT OF INTEREST
Techniques for audio tagging of an object of interest are provided. An object of interest within a field of view of a first video camera may be identified at a first time. At least one audio tag representing a first sound created by the object of interest may be generated and associated with the object of interest. At a second time later than the first and at a second video camera, a second sound generated by an unidentified object that is not in the field of view of the second video camera may be detected. An audio tag representing the second sound may be generated. It may be determined that the object of interest and the unidentified object are the same when the audio tag representing the first sound and the audio tag representing the second sound are the same.
Auto focus, auto focus within regions, and auto placement of beamformed microphone lobes with inhibition and voice activity detection functionality
Array microphone systems and methods that can automatically focus and/or place beamformed lobes in response to detected sound activity are provided. The automatic focus and/or placement of the beamformed lobes can be inhibited based on a remote far end audio signal. The quality of the coverage of audio sources in an environment may be improved by ensuring that beamformed lobes are optimally picking up the audio sources even if they have moved and changed locations.
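Beamformed lobes of the kind described are commonly formed by delay-and-sum processing, where each channel is delayed so that sound from the steered direction adds coherently. The sketch below assumes integer sample delays and is a generic illustration of beamforming, not the patent's auto-focus or lobe-placement method.

```python
def delay_and_sum(channels, delays):
    """Delay-and-sum beamforming: advance each channel by its integer
    sample delay and average, steering the array's lobe toward the
    direction implied by the delays."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[d + i] for ch, d in zip(channels, delays)) / len(channels)
            for i in range(n)]

# Hypothetical two-mic array: the second mic hears the source one
# sample later, so a one-sample delay realigns the channels.
ch1 = [1, 2, 3, 4, 0]
ch2 = [0, 1, 2, 3, 4]
steered = delay_and_sum([ch1, ch2], [0, 1])
```

Automatically focusing a lobe amounts to updating such delays when detected sound activity indicates the source has moved.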
Dynamically assigning multi-modality circumstantial data to assistant action requests for correlating with subsequent requests
Implementations set forth herein relate to an automated assistant that uses circumstantial condition data, generated based on circumstantial conditions of an input, to determine whether the input should affect an action that has been initialized by a particular user. The automated assistant can allow each user to manipulate their respective ongoing action without necessitating interruptions for soliciting explicit user authentication. For example, when an individual in a group of persons interacts with the automated assistant to initialize or affect a particular ongoing action, the automated assistant can generate data that correlates that individual to the particular ongoing action. The data can be generated using a variety of different input modalities, which can be dynamically selected based on changing circumstances of the individual. Therefore, different sets of input modalities can be processed each time a user provides an input for modifying an ongoing action and/or initializing another action.
Method and system for speech enhancement
A method and a system for speech enhancement including a time synchronization unit configured to synchronize microphone signals sent from at least two microphones; a source separation unit configured to separate the synchronized microphone signals and output a separated speech signal, which corresponds to a speech source; and a noise reduction unit including a feature extraction unit configured to extract a speech feature of the separated speech signal and a neural network configured to receive the speech feature and output a clean speech feature.
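The feature extraction stage feeding the neural network can be sketched minimally as per-frame log energies of the separated speech signal. The frame length and the log-energy feature are illustrative assumptions; practical systems typically use richer features such as log-mel spectra.

```python
import math

def frame_log_energies(signal, frame_len=160):
    """Split a (separated) speech signal into non-overlapping frames and
    return per-frame log energies, a minimal stand-in for the
    feature-extraction stage ahead of the denoising neural network."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        feats.append(math.log(energy + 1e-12))  # floor avoids log(0)
    return feats

# 320 samples at a 160-sample frame length yield two feature values.
feats = frame_log_energies([0.1] * 320)
```

In the described pipeline, such features would be passed to the trained network, which outputs the corresponding clean speech feature.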
VEHICLE AVATAR DEVICES FOR INTERACTIVE VIRTUAL ASSISTANT
A system and method for providing avatar device status indicators for voice assistants in multi-zone vehicles. The method comprises: receiving at least one signal from a plurality of microphones, wherein each microphone is associated with one of a plurality of spatial zones, and one of a plurality of avatar devices; wherein the at least one signal further comprises a speech signal component from a speaker; wherein the speech signal component is a voice command or question; sending zone information associated with the speaker and with one of the plurality of spatial zones to an avatar; and activating one of the plurality of avatar devices in a respective one of the plurality of spatial zones associated with the speaker.