G10L15/28

Shared speech processing network for multiple speech applications

A device to process speech includes a speech processing network that includes an input configured to receive audio data corresponding to audio captured by one or more microphones. The speech processing network also includes one or more network layers configured to process the audio data to generate a network output. The speech processing network includes an output configured to be coupled to multiple speech application modules to enable the network output to be provided as a common input to each of the multiple speech application modules. A first speech application module corresponds to a speaker verifier, and a second speech application module corresponds to a speech recognition network.
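
The shared-output arrangement described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: `shared_network`, `speaker_verifier`, `speech_recognizer`, the two-element feature vector, and all thresholds are hypothetical placeholders chosen only to show one network output being computed once and handed to several application modules.

```python
from typing import Callable, Dict, List, Sequence


def shared_network(audio: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the shared layers: returns [mean, energy]."""
    n = len(audio)
    mean = sum(audio) / n
    energy = sum(x * x for x in audio) / n
    return [mean, energy]


def speaker_verifier(features: List[float]) -> bool:
    # Placeholder decision rule; a real verifier would compare speaker embeddings.
    return features[1] > 0.1


def speech_recognizer(features: List[float]) -> str:
    # Placeholder; a real recognizer would decode text from the features.
    return "speech" if features[1] > 0.01 else ""


def process(audio: Sequence[float],
            modules: Dict[str, Callable[[List[float]], object]]) -> Dict[str, object]:
    # The network output is computed once and reused as the common input.
    features = shared_network(audio)
    return {name: module(features) for name, module in modules.items()}


results = process([0.5, -0.5, 0.5, -0.5],
                  {"verifier": speaker_verifier, "recognizer": speech_recognizer})
```

The point of the design is that the cost of the shared layers is paid once per audio frame, regardless of how many application modules consume the output.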

Systems and methods for associating playback devices with voice assistant services
11696074 · 2023-07-04

Systems and methods for media playback via a media playback system include detecting a first wake word via a first network microphone device of a first playback device, detecting a second wake word via a second network microphone device of a second playback device, and forming a bonded zone that includes the first playback device and the second playback device. In response to detecting the first wake word, a first voice utterance following the first wake word is transmitted to a first voice assistant service. In response to detecting the second wake word, a second voice utterance following the second wake word is transmitted to a second voice assistant service. Requested media content received from the first and/or second voice assistant service is played back via the first playback device and the second playback device in synchrony with one another.
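
As a rough illustration of the routing step in this abstract, the sketch below maps a detected wake word to a voice assistant service and forwards the utterance that follows it. The wake words, service names, and the `route_utterance` helper are hypothetical, and the sketch works on text transcripts rather than audio for brevity.

```python
from typing import Optional, Tuple

# Hypothetical wake words and service names, for illustration only.
WAKE_WORD_TO_SERVICE = {
    "hey alpha": "assistant_a",
    "ok beta": "assistant_b",
}


def route_utterance(transcript: str) -> Optional[Tuple[str, str]]:
    """Return (service, utterance after the wake word), or None if no wake word leads."""
    text = transcript.lower()
    for wake_word, service in WAKE_WORD_TO_SERVICE.items():
        if text.startswith(wake_word):
            # Drop the wake word plus any separating comma or space.
            utterance = text[len(wake_word):].strip(" ,")
            return service, utterance
    return None
```

For example, `route_utterance("Hey Alpha, play jazz")` selects `assistant_a` with the utterance `play jazz`, while a transcript with no leading wake word is not forwarded anywhere.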

Hotphrase triggering based on a sequence of detections
11694685 · 2023-07-04

A method includes receiving audio data corresponding to an utterance spoken by a user and captured by a user device. The utterance includes a command for a digital assistant to perform an operation. The method also includes determining, using a hotphrase detector configured to detect each trigger word in a set of trigger words associated with a hotphrase, whether any of the trigger words in the set of trigger words are detected in the audio data during a corresponding fixed-duration time window. The method also includes identifying, in the audio data corresponding to the utterance, the hotphrase when each other trigger word in the set of trigger words was also detected in the audio data. The method also includes triggering an automated speech recognizer to perform speech recognition on the audio data when the hotphrase is identified in the audio data corresponding to the utterance.
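
One way to picture the sequence check in this abstract is the sketch below. It is an interpretation, not the claimed method: it assumes detections arrive as time-ordered `(word, timestamp)` pairs, that the trigger words must occur in order, and that a single hypothetical `window_s` value bounds the gap between consecutive trigger words.

```python
from typing import Dict, Iterable, Sequence, Tuple


def hotphrase_identified(detections: Iterable[Tuple[str, float]],
                         trigger_words: Sequence[str],
                         window_s: float) -> bool:
    """Return True once every trigger word was detected in order, each within
    window_s seconds of the previous one (assumed interpretation)."""
    last_seen: Dict[str, float] = {}
    for word, timestamp in detections:
        if word not in trigger_words:
            continue
        last_seen[word] = timestamp
        # Only the final trigger word can complete the hotphrase.
        if word != trigger_words[-1]:
            continue
        if any(w not in last_seen for w in trigger_words):
            continue
        times = [last_seen[w] for w in trigger_words]
        if all(0 <= later - earlier <= window_s
               for earlier, later in zip(times, times[1:])):
            return True
    return False
```

With trigger words `("play", "music")` and a 2-second window, detections at 0.0 s and 1.0 s identify the hotphrase, whereas the same words 5 seconds apart or in reverse order do not. A positive result is the point at which the speech recognizer would be triggered.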

Discrete three-dimensional processor

A discrete three-dimensional (3-D) processor comprises first and second dice. The first die comprises 3-D memory (3D-M) arrays, whereas the second die comprises logic circuits and at least an off-die peripheral-circuit component of the 3D-M array(s). A typical off-die peripheral-circuit component could be an address decoder, a sense amplifier, a programming circuit, a read-voltage generator, a write-voltage generator, a data buffer, or a portion thereof.

Electronic device and method for controlling the electronic device

Disclosed are an electronic device capable of efficiently performing speech recognition and natural language understanding and a method for controlling the electronic device. The electronic device includes: a microphone; a non-volatile memory configured to store virtual assistant model data comprising data that is classified according to a plurality of domains and data that is commonly used for the plurality of domains; a volatile memory; and a processor configured to: based on receiving, through the microphone, a trigger input to perform speech recognition for a user speech, initiate loading the virtual assistant model data from the non-volatile memory into the volatile memory, load, into the volatile memory, first data from among the data classified according to the plurality of domains and, while loading the first data into the volatile memory, load at least a part of the data commonly used for the plurality of domains into the volatile memory.
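
The overlapped loading described in this abstract can be sketched with two concurrent tasks. This is only an analogy in software: the dictionaries standing in for non-volatile and volatile memory, the key names, and the `load_on_trigger` helper are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


def load_on_trigger(storage: Dict, first_domain: str) -> Dict[str, bytes]:
    """On a trigger input, load one domain's data and the common data concurrently."""
    volatile: Dict[str, bytes] = {}

    def load_domain() -> None:
        volatile["domain:" + first_domain] = storage["domains"][first_domain]

    def load_common() -> None:
        volatile["common"] = storage["common"]

    # Both loads are submitted together so the common data is fetched
    # while the first domain's data is still loading.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(load_domain), pool.submit(load_common)]
        for future in futures:
            future.result()  # re-raise any error from the worker
    return volatile


# Hypothetical stored model data.
storage = {"common": b"shared", "domains": {"music": b"m", "weather": b"w"}}
memory = load_on_trigger(storage, "music")
```

Only the requested domain and the common data end up in the volatile store; data for other domains stays in non-volatile storage until it is needed.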

Microphone unit
20220415330 · 2022-12-29

A microphone unit includes: an audio data acquisition unit that acquires speech as audio data; an audio data registration unit that registers verification audio data obtained by extracting a feature point from the audio data; an evaluation audio data acquisition unit that acquires speech that is input to a first microphone as evaluation audio data; a verification unit that verifies whether or not a speaker who uttered speech that is based on the evaluation audio data is a speaker who uttered speech that is based on the verification audio data, based on the verification audio data and a feature point extracted from the evaluation audio data; and a verification result output unit that outputs a result of verification performed by the verification unit.
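
The verification step in this abstract compares features from the evaluation audio with the registered verification data. The abstract does not say how the comparison is made; the sketch below assumes, purely for illustration, that features are numeric vectors compared by cosine similarity against a hypothetical threshold.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_speaker(registered: Sequence[float],
                   evaluation: Sequence[float],
                   threshold: float = 0.8) -> bool:
    # threshold is an assumed tuning parameter, not a value from the source.
    return cosine_similarity(registered, evaluation) >= threshold
```

The boolean returned by `verify_speaker` plays the role of the verification result that the output unit would report.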

Digital microphone interface circuit for voice recognition and including the same

Disclosed is an electronic device which includes an audio processing block for voice recognition in a low-power mode. The electronic device includes a digital microphone that receives a voice signal from a user and converts the received voice signal into a PDM signal, and a DMIC interface circuit. The DMIC interface circuit includes a PDM-PCM converting block that converts the PDM signal into a PCM signal, a maxscale gain tuning block that tunes a maxscale gain of the PCM signal received from the PDM-PCM converting block based on distance information, acquired in advance of the converting of the PDM signal, indicating a physical distance between the user and the electronic device, and an anti-aliasing block that performs filtering for acquiring voice data of a target frequency band associated with a PCM signal output from the maxscale gain tuning block.
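
The three-stage signal path in this abstract (PDM-to-PCM conversion, distance-based gain tuning, then filtering) can be sketched in software as follows. The real circuit is hardware and its algorithms are not given here; the block-averaging decimator, the linear distance-to-gain rule with its `reference_m` and `max_gain` parameters, and the moving-average low-pass filter are simple stand-ins chosen for illustration.

```python
from typing import List, Sequence


def pdm_to_pcm(pdm_bits: Sequence[int], decimation: int) -> List[float]:
    """Map PDM bits (0/1) to -1/+1 and average non-overlapping blocks."""
    pcm = []
    for start in range(0, len(pdm_bits) - decimation + 1, decimation):
        block = pdm_bits[start:start + decimation]
        pcm.append(sum(2 * bit - 1 for bit in block) / decimation)
    return pcm


def tune_gain(pcm: Sequence[float], distance_m: float,
              reference_m: float = 1.0, max_gain: float = 4.0) -> List[float]:
    """Scale samples by an assumed gain that grows with user distance, then clip."""
    gain = min(max_gain, max(1.0, distance_m / reference_m))
    return [max(-1.0, min(1.0, sample * gain)) for sample in pcm]


def low_pass(pcm: Sequence[float], taps: int = 3) -> List[float]:
    """Causal moving average, a crude substitute for an anti-aliasing filter."""
    out = []
    for i in range(len(pcm)):
        window = pcm[max(0, i - taps + 1):i + 1]
        out.append(sum(window) / len(window))
    return out


def dmic_pipeline(pdm_bits: Sequence[int], decimation: int,
                  distance_m: float) -> List[float]:
    # Stage order follows the abstract: convert, tune gain, then filter.
    return low_pass(tune_gain(pdm_to_pcm(pdm_bits, decimation), distance_m))
```

Because the distance is acquired before conversion starts, the gain can be fixed up front rather than adapted sample by sample, which suits a low-power path.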
