G10L13/02

System answering of user inputs
11556575 · 2023-01-17

Techniques for structuring knowledge bases specific to a user or group of users and techniques for using the knowledge bases to answer user inputs are described. A knowledge base may be populated with information provided by users associated with the knowledge base. Users associated with a knowledge base may be proactive in providing content to the knowledge base and/or a system may solicit an answer to a user input from users associated with a particular knowledge base. When the system receives an answer, the system may populate the knowledge base with the answer and may output the answer to the user that originated the user input. The system may output user inputs to be answered using messages or by establishing two-way communication sessions.
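The answer flow in this abstract (check the group's knowledge base, otherwise solicit an answer from other members, store it, and return it to the asker) can be sketched as follows. This is a minimal sketch under stated assumptions: `GroupKnowledgeBase`, `answer_user_input`, and the `solicit` callback are hypothetical names, questions are matched by exact lowercase text, and delivery by message or two-way session is abstracted into `solicit`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class GroupKnowledgeBase:
    """Hypothetical knowledge base scoped to one user or group of users."""
    members: List[str]
    entries: Dict[str, str] = field(default_factory=dict)

    def contribute(self, question: str, answer: str) -> None:
        # Proactive or solicited contribution; keys are normalized to lowercase.
        self.entries[question.strip().lower()] = answer

    def lookup(self, question: str) -> Optional[str]:
        return self.entries.get(question.strip().lower())


def answer_user_input(
    kb: GroupKnowledgeBase,
    question: str,
    asker: str,
    solicit: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Answer from the knowledge base, or solicit an answer from other members."""
    known = kb.lookup(question)
    if known is not None:
        return known
    for member in kb.members:
        if member == asker:
            continue  # do not ask the user who originated the input
        reply = solicit(member, question)  # stands in for a message or two-way session
        if reply:
            kb.contribute(question, reply)  # populate the knowledge base
            return reply  # output the answer to the originating user
    return None
```

Once an answer is stored, repeat questions are served from the knowledge base without contacting other members again.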

WEARABLE SYSTEMS AND METHODS FOR SELECTIVELY READING TEXT
20230012272 · 2023-01-12

Systems and methods are disclosed for selectively reading text. A system may comprise an image capture device, an audio capture device, and a processor. The processor may be configured to receive images captured by the image capture device and audio signals captured by the audio capture device. The processor may analyze an image to identify text represented in the image; identify, based on the image, a structural element of the text; identify a request to read a first portion of the text associated with the structural element, the request being identified by at least one of analyzing the audio signals to detect a spoken request or detecting a gesture in the captured images; and present the first portion of the text to the user of the wearable device.
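Selecting the portion of text tied to a structural element can be sketched as below. This assumes OCR and layout analysis have already produced labeled blocks in reading order, and that the spoken request or gesture has already been resolved to a heading string; `TextBlock`, `select_portion`, and the `"heading"`/`"paragraph"` labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextBlock:
    kind: str  # hypothetical structural label, e.g. "heading" or "paragraph"
    text: str


def select_portion(blocks: List[TextBlock], heading: str) -> Optional[str]:
    """Return the text that follows the requested heading, up to the next heading."""
    collected: List[str] = []
    inside = False
    for block in blocks:
        if block.kind == "heading":
            if inside:
                break  # the next heading ends the requested section
            inside = block.text.strip().lower() == heading.strip().lower()
        elif inside:
            collected.append(block.text)
    return " ".join(collected) if collected else None
```

The returned string would then be passed to speech output on the wearable device.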

Systems and methods for screenless computerized social-media access
11551680 · 2023-01-10

Systems and methods for screenless computerized social-media access may include (1) producing, via an audio speaker that is communicatively coupled to a computing device, a computer-generated verbal description of a social-media post provided via a social-media application, (2) detecting, via a microphone that is communicatively coupled to the computing device, an audible response to the social-media post from a user of the computing device, and (3) digitally responding to the social-media post in accordance with the detected audible response. Various other methods, systems, and computer-readable media are also disclosed.
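The three steps above (describe the post aloud, capture the spoken response, act on it) can be sketched with the speaker and microphone abstracted as callbacks. This is an illustrative sketch only: `Post`, `describe`, `handle_post`, and the "like"/"reply" command vocabulary are assumptions, and speech synthesis and recognition are assumed to happen inside `speak` and `listen`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Post:
    author: str
    text: str
    actions: List[Tuple[str, str]] = field(default_factory=list)


def describe(post: Post) -> str:
    # (1) Text for the computer-generated verbal description; a TTS engine would speak it.
    return f"Post from {post.author}: {post.text}"


def handle_post(
    post: Post,
    speak: Callable[[str], None],
    listen: Callable[[], str],
) -> Optional[Tuple[str, str]]:
    speak(describe(post))
    heard = listen().strip()  # (2) transcript of the user's audible response
    lowered = heard.lower()
    if lowered == "like":
        action = ("like", "")
    elif lowered.startswith("reply"):
        action = ("reply", heard[len("reply"):].strip())
    else:
        return None  # unrecognized response: leave the post untouched
    post.actions.append(action)  # (3) digitally respond to the post
    return action
```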

Automatic Voiceover Generation

A method includes receiving a voice request to generate synthesized voiceover speech for a target advertisement having one or more advertising campaign attributes. The method also includes generating, based on the one or more advertising campaign attributes, a voiceover script that includes a sequence of text for the synthesized voiceover speech. The method also includes generating, using a text-to-speech (TTS) system, the synthesized voiceover speech. The TTS system is configured to receive, as input, the sequence of text for the voiceover script and generate, as output, the synthesized voiceover speech. Here, the synthesized voiceover speech has speech characteristics specified by a target TTS vertical. The method also includes overlaying the synthesized voiceover speech on the target advertisement.
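The voiceover pipeline (campaign attributes to script, script to speech for a target vertical, speech overlaid on the ad) can be sketched as below. Assumptions: the voice request has already been parsed, `write_script` is a hypothetical fixed template rather than the generation step the abstract describes, the TTS system is injected as a callable, and overlaying is done by additive mixing of audio samples with clipping.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Advertisement:
    campaign: Dict[str, str]  # advertising campaign attributes
    audio: List[float]        # existing soundtrack samples in [-1.0, 1.0]


def write_script(campaign: Dict[str, str]) -> str:
    # Hypothetical template standing in for script generation from campaign attributes.
    return f"{campaign['product']}: {campaign['tagline']} Visit {campaign['url']}."


def overlay(track: List[float], voice: List[float], start: int = 0) -> List[float]:
    """Mix voiceover samples onto the track from `start`, clipping to [-1.0, 1.0]."""
    out = list(track) + [0.0] * max(0, start + len(voice) - len(track))
    for i, sample in enumerate(voice):
        out[start + i] = max(-1.0, min(1.0, out[start + i] + sample))
    return out


def generate_voiceover(
    ad: Advertisement,
    tts: Callable[[str, str], List[float]],
    vertical: str,
) -> List[float]:
    script = write_script(ad.campaign)  # sequence of text for the voiceover
    speech = tts(script, vertical)      # speech with the vertical's characteristics
    return overlay(ad.audio, speech)    # overlay the speech on the advertisement
```

Additive mixing is only one way to overlay; a production system would more likely duck the background track under the voice.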

Spoken language understanding models

Techniques for using a federated learning framework to update machine learning models for a spoken language understanding (SLU) system are described. The system determines which labeled data is needed to update the models based on the models generating an undesired response to an input. The system identifies users from whom to solicit labeled data and sends a request to a user device to speak an input. The device generates labeled data using the spoken input and updates the on-device models using the spoken input and the labeled data. The updated model data is provided to the system to enable the system to update the system-level (global) models.
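The device-side update and system-level aggregation can be sketched as one federated round. The abstract does not say how updates are combined; this sketch swaps in FedAvg-style averaging weighted by each device's example count, and uses a toy linear model trained by SGD on squared error. `local_update` and `federated_average` are hypothetical names. Only weights leave the device, matching the abstract's statement that updated model data is provided to the system.

```python
from typing import List, Tuple

Weights = List[float]


def local_update(
    weights: Weights,
    examples: List[Tuple[List[float], float]],
    lr: float = 0.1,
) -> Weights:
    """On-device step: one SGD pass on squared error for a toy linear model."""
    w = list(weights)
    for features, label in examples:
        pred = sum(wi * xi for wi, xi in zip(w, features))
        err = pred - label
        w = [wi - lr * err * xi for wi, xi in zip(w, features)]
    return w


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """FedAvg-style aggregation: average device weights, weighted by example count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]
```

For example, `federated_average([([1.0, 0.0], 1), ([3.0, 2.0], 3)])` gives `[2.5, 1.5]`, because the device with three examples counts three times as much.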

Synthetic speech processing

A speech-processing system receives input data representing text. An input encoder processes the input data to determine first embedding data representing the text. A local attention encoder processes a subset of the first embedding data in accordance with a predicted size to determine second embedding data. An attention encoder processes the second embedding data to determine third embedding data. A decoder processes the third embedding data to determine audio data corresponding to the text.
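The local attention stage, which processes only a subset of the first embedding data, can be illustrated with scaled dot-product self-attention restricted to a window around each position. This is an assumption-laden sketch: the abstract does not define how the predicted size selects the subset, so here it is taken to be the window half-width, and `softmax` and `local_attention` are hypothetical helpers. The input encoder, the full attention encoder, and the decoder are not shown.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_attention(embeddings: List[Vector], window: int) -> List[Vector]:
    """Self-attention where position i attends only to positions i-window..i+window.

    `window` stands in for the "predicted size" in the abstract (an assumption).
    """
    dim = len(embeddings[0])
    out: List[Vector] = []
    for i, query in enumerate(embeddings):
        lo, hi = max(0, i - window), min(len(embeddings), i + window + 1)
        keys = embeddings[lo:hi]  # the subset this position may attend to
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
        weights = softmax(scores)
        out.append([sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)])
    return out
```

Because each output depends only on its window, changing an embedding far outside the window leaves that output unchanged.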

Outside ordering system
11594223 · 2023-02-28

An ordering system can be positioned partially or completely outside in a retail environment, with an ordering device located outside of a building on a site. The ordering device receives a first audio stream concurrently with a second audio stream from an employee and proceeds to capture the first audio stream with a first port of an on-site computing device while capturing the second audio stream with a second port of the on-site computing device. A customer strategy can be executed with an intelligence module of the on-site computing device connected to the ordering device, with the on-site customer strategy directing automated interactions with a first on-site customer to compile a retail order. The employee may communicate directly with the intelligence module via the second port without interrupting the first audio stream.
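Servicing the two ports concurrently, so that employee speech reaches the intelligence module without entering the customer dialogue, can be sketched with two `asyncio` tasks. Assumptions: both audio streams have already been transcribed into utterances, `customer_port`, `employee_port`, and `run_session` are hypothetical names, and the customer strategy is reduced to appending items and prompting for more.

```python
import asyncio
from typing import List, Tuple


async def customer_port(utterances: List[str], order: List[str], replies: List[str]) -> None:
    # First port: the customer stream drives the automated ordering dialogue.
    for utterance in utterances:
        order.append(utterance)           # stand-in customer strategy: add the item
        replies.append("Anything else?")  # and prompt for more
        await asyncio.sleep(0)            # yield so the other port is serviced too


async def employee_port(utterances: List[str], notes: List[str]) -> None:
    # Second port: employee speech reaches the module but not the customer dialogue.
    for utterance in utterances:
        notes.append(utterance)
        await asyncio.sleep(0)


async def run_session(
    customer: List[str], employee: List[str]
) -> Tuple[List[str], List[str], List[str]]:
    order: List[str] = []
    replies: List[str] = []
    notes: List[str] = []
    await asyncio.gather(
        customer_port(customer, order, replies),
        employee_port(employee, notes),
    )
    return order, replies, notes
```

Running `asyncio.run(run_session(["burger", "fries"], ["add extra napkins"]))` interleaves both ports, and the employee note never appears in the compiled order.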