Patent classifications
G10L15/183
Wakeword detection
Techniques for processing incoming audio using multiple wakeword detectors are described. Audio data representing an utterance may be processed by different wakeword detectors that can detect different wakewords and that are associated with different speech processing components. When one of the wakeword detectors detects its wakeword, the audio data may be processed by the corresponding speech processing component.
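As a rough illustration only (the patent publishes no code), the routing idea might look like the Python sketch below; `WakewordDetector`, `detect`, and `processor` are hypothetical names standing in for the detectors and speech processing components the abstract describes.

```python
# Minimal routing sketch: every detector sees the same audio, and the first
# detector that fires hands the audio to its associated processing component.
# All names here are illustrative assumptions, not components from the patent.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class WakewordDetector:
    wakeword: str
    detect: Callable[[bytes], bool]    # True if this detector's wakeword is present
    processor: Callable[[bytes], str]  # the speech processing component it is tied to

def route_utterance(audio: bytes, detectors: list[WakewordDetector]) -> Optional[str]:
    """Run the audio past every detector; hand it to the matching component."""
    for d in detectors:
        if d.detect(audio):
            return d.processor(audio)
    return None  # no wakeword detected; audio is not processed further

# Toy usage: substring matching stands in for real acoustic detection.
detectors = [
    WakewordDetector("alexa", lambda a: b"alexa" in a, lambda a: "assistant A result"),
    WakewordDetector("computer", lambda a: b"computer" in a, lambda a: "assistant B result"),
]
print(route_utterance(b"... alexa what time is it ...", detectors))  # -> assistant A result
```

Running every detector over the same audio is what lets different wakewords map onto different back-end speech systems.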
Discrete three-dimensional processor
A discrete three-dimensional (3-D) processor comprises first and second dice. The first die comprises 3-D memory (3D-M) arrays, whereas the second die comprises logic circuits and at least one off-die peripheral-circuit component of the 3D-M array(s). A typical off-die peripheral-circuit component could be an address decoder, a sense amplifier, a programming circuit, a read-voltage generator, a write-voltage generator, a data buffer, or a portion thereof.
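This is a hardware partition rather than an algorithm, so code can only model it loosely; the toy Python below merely records which components sit on which die, with every class and field name assumed for illustration.

```python
# Toy data model of the described die partition: the first die carries the
# 3D-M arrays; the second die carries logic circuits plus the peripheral
# components moved off the memory die. Names are illustrative assumptions.
from dataclasses import dataclass

OFF_DIE_PERIPHERAL_COMPONENTS = (
    "address decoder", "sense amplifier", "programming circuit",
    "read-voltage generator", "write-voltage generator", "data buffer",
)

@dataclass
class FirstDie:                      # memory die
    arrays: tuple[str, ...]

@dataclass
class SecondDie:                     # logic die with off-die peripherals
    logic_circuits: tuple[str, ...]
    peripherals: tuple[str, ...]

processor = (
    FirstDie(arrays=("3D-M array 0", "3D-M array 1")),
    SecondDie(logic_circuits=("ALU", "control"),
              peripherals=OFF_DIE_PERIPHERAL_COMPONENTS),
)
```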
Automated context-specific speech-to-text transcriptions
Disclosed are various approaches for generating a text transcript of a soundtrack. The soundtrack can correspond to an event in a conferencing service. Language models can be trained on data that is specific to organizations, to users within an organization, and to metadata associated with an agenda for the event. The metadata can include texts, attachments, and other data associated with the event. The language models can be arranged into a convolutional neural network that outputs a text transcript. The text transcript can then be used to retrain the language models for subsequent use.
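A hedged sketch of the score-then-retrain loop follows, using a trivial vocabulary-overlap stand-in for the real language models; the abstract's convolutional arrangement is not reproduced, and `ContextLM`, `score`, `retrain`, and the weights are illustrative names and values only.

```python
# Context-specific "language models" score candidate transcripts of a
# conference soundtrack; the winning transcript is fed back as training data.
from dataclasses import dataclass, field

@dataclass
class ContextLM:
    """Stand-in for a model trained on organization/user/agenda data."""
    vocab: set[str]
    history: list[str] = field(default_factory=list)

    def score(self, text: str) -> float:
        words = text.lower().split()
        return sum(w in self.vocab for w in words) / max(len(words), 1)

    def retrain(self, transcripts: list[str]) -> None:
        for t in transcripts:
            self.vocab.update(t.lower().split())
            self.history.append(t)

def transcribe(candidates, org_lm, user_lm, agenda_lm):
    def combined(text):  # weights are illustrative, not from the patent
        return 0.4 * org_lm.score(text) + 0.3 * user_lm.score(text) + 0.3 * agenda_lm.score(text)
    transcript = max(candidates, key=combined)
    for lm in (org_lm, user_lm, agenda_lm):
        lm.retrain([transcript])  # output reused to retrain for subsequent events
    return transcript

org, user, agenda = ContextLM({"roadmap", "q3"}), ContextLM({"okr"}), ContextLM({"budget", "review"})
print(transcribe(["q3 budget review", "cue three but jet for you"], org, user, agenda))
```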
Responding to a user query based on captured images and audio
A method for responding to a user query based on captured images and audio is disclosed. An audio signal captured by at least one microphone is analyzed to determine at least one word. At least one image captured by at least one image sensor is analyzed to determine at least one identifier of a person, an object, a location, or an event represented in the image. The words and identifiers are stored in a database. A question received from the user is analyzed to determine at least one term. The database is searched for a correlation between the term and a stored word or identifier. A response to the question is generated based on the correlation and is provided to the user.
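Under similar caveats, the store-then-correlate flow could be sketched as below; the in-memory dictionary stands in for the database, and the recognition steps that would produce the words and identifiers are assumed to have already run.

```python
# Words extracted from audio and identifiers extracted from images sit in a
# toy "database"; a question is answered by correlating its terms with both.
# Contents and the identifier format ("type:name") are illustrative only.
database = {
    "words": {"meeting", "friday", "budget"},              # from microphone audio
    "identifiers": {"person:alice", "location:office"},    # from image sensor
}

def answer(question: str) -> str:
    terms = set(question.lower().replace("?", "").split())
    word_hits = terms & database["words"]
    id_hits = {i for i in database["identifiers"] if i.split(":")[1] in terms}
    if word_hits or id_hits:
        return f"Found correlation with {word_hits | id_hits}"
    return "No correlation found."

print(answer("What did alice say about the budget?"))
# -> correlates the term 'budget' with a stored word and 'alice' with an identifier
```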
Using context information with end-to-end models for speech recognition
A method includes receiving audio data encoding an utterance, processing, using a speech recognition model, the audio data to generate speech recognition scores for speech elements, and determining context scores for the speech elements based on context data indicating a context for the utterance. The method also includes executing, using the speech recognition scores and the context scores, a beam search decoding process to determine one or more candidate transcriptions for the utterance. The method also includes selecting a transcription for the utterance from the one or more candidate transcriptions.
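The combination of recognition scores and context scores resembles shallow fusion; a minimal beam-search sketch under that assumption follows, where the score tables and the context weight `lam` are illustrative values, not the patent's prescribed combination.

```python
import math

# Beam-search decoding over speech elements, adding a weighted context score
# to each element's recognition score before pruning to the beam width.
def beam_search(steps, context_scores, beam_width=2, lam=0.5):
    """steps: list of dicts mapping speech element -> log recognition score."""
    beams = [([], 0.0)]  # (hypothesis, cumulative log score)
    for step in steps:
        expanded = []
        for hyp, score in beams:
            for elem, rec_score in step.items():
                ctx = context_scores.get(elem, 0.0)  # boost for in-context elements
                expanded.append((hyp + [elem], score + rec_score + lam * ctx))
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams  # candidate transcriptions; the top beam is the selection

steps = [
    {"call": math.log(0.6), "cull": math.log(0.4)},
    {"mom": math.log(0.5), "bob": math.log(0.5)},
]
context = {"bob": 1.0}  # e.g. "bob" appears in the user's contacts
print(beam_search(steps, context)[0][0])  # context breaks the tie: ['call', 'bob']
```

The context score only reranks hypotheses; the recognition model's scores still dominate when the acoustics are unambiguous.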