IPIQ

G10L13/027

SYSTEMS AND METHODS FOR PROVIDING AUDIBLE FLIGHT INFORMATION

20230046264 · 2023-02-16

Dongfang ZHANG

Disclosed are methods and systems for providing audible flight information to an operator of an aircraft. A method, for example, may include receiving flight information detected by one or more sensors positioned on the aircraft, causing an image to be displayed on a display device, the image including a plurality of text items corresponding to the flight information, receiving a first operator selection indicative of one or more of the text items, parsing the one or more text items to generate a set of intermediate data, synthesizing audio data based on the intermediate data, and causing audible content corresponding to the audio data to be emitted by one or more audio emitting devices, wherein the audible content includes speech corresponding to the flight information.
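The "parse the selected text items into intermediate data" step can be pictured as text normalization ahead of speech synthesis. The following is a minimal sketch under stated assumptions. The abbreviation table, the digit-by-digit reading rule (with "niner" for 9), and the function names are all hypothetical illustrations; the abstract does not say how parsing is done. The resulting string is what a TTS engine would then synthesize.

```python
# Hypothetical abbreviation table; the abstract does not define one.
ABBREVIATIONS = {
    "HDG": "heading",
    "ALT": "altitude",
    "SPD": "speed",
    "FT": "feet",
    "KT": "knots",
}
# Digits read individually, using "niner" for 9 (an assumed convention here).
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "niner"]


def spell_digits(token: str) -> str:
    """Read a numeric token digit by digit, e.g. '270' -> 'two seven zero'."""
    return " ".join(DIGIT_WORDS[int(ch)] for ch in token)


def parse_text_item(item: str) -> list[str]:
    """Turn one displayed text item into speakable words (the intermediate data)."""
    words = []
    for token in item.split():
        upper = token.upper()
        if upper in ABBREVIATIONS:
            words.append(ABBREVIATIONS[upper])
        elif token.isdecimal():
            words.append(spell_digits(token))
        else:
            words.append(token.lower())
    return words


def build_utterance(selected_items: list[str]) -> str:
    """Join the parsed items into one string that a TTS engine could synthesize."""
    return ", ".join(" ".join(parse_text_item(item)) for item in selected_items)


print(build_utterance(["HDG 270", "SPD 250 KT"]))
# heading two seven zero, speed two five zero knots
```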

Synthetic speech processing

11580955 · 2023-02-14

Amazon Technologies, Inc.

A speech-processing system receives input data representing text. A first encoder processes segments of the text to determine embedding data representing the text, and a second encoder processes corresponding audio data to determine prosodic data corresponding to the text. The embedding and prosodic data are processed to create output data including a representation of speech corresponding to the text and prosody.
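The two-encoder arrangement can be illustrated with a small sketch. Both "encoders" below are deliberately trivial, hypothetical stand-ins and not the learned networks the abstract refers to. The text encoder folds character codes into a fixed-size vector, and the prosody encoder summarizes each segment's per-frame pitch values as a mean and a range. The assumed combination step is per-segment concatenation, one common way to condition a decoder on both kinds of data.

```python
from statistics import mean


def encode_text(segments: list[str], dim: int = 4) -> list[list[float]]:
    """Toy stand-in for the first (text) encoder: one fixed-size vector per segment."""
    embeddings = []
    for segment in segments:
        vec = [0.0] * dim
        for i, ch in enumerate(segment):
            vec[i % dim] += ord(ch) / 1000.0  # fold character codes into `dim` slots
        embeddings.append(vec)
    return embeddings


def encode_prosody(pitch_frames: list[list[float]]) -> list[list[float]]:
    """Toy stand-in for the second (audio) encoder: summarize each segment's
    per-frame pitch values (Hz) as [mean pitch, pitch range]."""
    return [[mean(frames), max(frames) - min(frames)] for frames in pitch_frames]


def combine(text_emb: list[list[float]],
            prosody_emb: list[list[float]]) -> list[list[float]]:
    """Concatenate text and prosody vectors segment by segment (decoder input)."""
    if len(text_emb) != len(prosody_emb):
        raise ValueError("each text segment needs matching prosodic data")
    return [t + p for t, p in zip(text_emb, prosody_emb)]


features = combine(
    encode_text(["hello", "world"]),
    encode_prosody([[110.0, 120.0, 130.0], [100.0, 90.0]]),
)
print([row[-2:] for row in features])  # [[120.0, 20.0], [95.0, 10.0]]
```

In a real system, a decoder (and vocoder) would consume `features` to produce the speech representation.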

Systems and methods of handling speech audio stream interruptions

11580954 · 2023-02-14

Qualcomm Incorporated

A device for communication includes one or more processors configured to receive, during an online meeting, a speech audio stream representing speech of a first user. The one or more processors are also configured to receive a text stream representing the speech of the first user. The one or more processors are further configured to selectively generate an output based on the text stream in response to an interruption in the speech audio stream.
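Selectively falling back to the text stream when the audio stream is interrupted can be sketched as a small monitor. This is a hypothetical illustration. The gap threshold, the timestamp bookkeeping, and the `StreamMonitor` name are assumptions, and the abstract does not say how an interruption is detected. Here, text received while audio is still arriving is discarded on the next audio packet, because the listener already heard it. Text that arrives after audio stops is returned as fallback output, for example for display as captions or for re-synthesis as speech.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StreamMonitor:
    """Hypothetical monitor: if no audio arrives for `gap_threshold` seconds,
    emit output derived from the buffered text stream instead."""
    gap_threshold: float = 0.5
    last_audio_time: Optional[float] = None
    pending_text: list[str] = field(default_factory=list)

    def on_audio(self, timestamp: float) -> None:
        """Record an audio packet; text buffered so far was already heard."""
        self.last_audio_time = timestamp
        self.pending_text.clear()

    def on_text(self, words: str) -> None:
        """Buffer words from the parallel text stream."""
        self.pending_text.append(words)

    def poll(self, now: float) -> Optional[str]:
        """Return fallback output only when the audio stream appears interrupted."""
        interrupted = (self.last_audio_time is None
                       or now - self.last_audio_time > self.gap_threshold)
        if interrupted and self.pending_text:
            output = " ".join(self.pending_text)
            self.pending_text.clear()
            return output
        return None


monitor = StreamMonitor(gap_threshold=0.5)
monitor.on_audio(0.0)
monitor.on_text("hello")           # heard via audio, dropped at the next packet
monitor.on_audio(0.2)
monitor.on_text("are you there")   # audio stops after this point
print(monitor.poll(1.0))           # are you there
```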

Systems and methods for response selection in multi-party conversations with dynamic topic tracking

11580975 · 2023-02-14

Salesforce.com, Inc.

Embodiments described herein provide a dynamic topic tracking mechanism that tracks how conversation topics change from one utterance to another and uses the tracking information to rank candidate responses. A pre-trained language model may be used for response selection in multi-party conversations, in an approach that consists of two steps: (1) topic-based pre-training that embeds topic information into the language model with self-supervised learning, and (2) multi-task learning on the pre-trained model that jointly trains the response selection, dynamic topic prediction, and disentanglement tasks.
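The core idea of ranking candidates by how well they match a topic that shifts utterance by utterance can be shown with a deliberately simple substitute technique. This sketch does NOT use a pre-trained language model or the two-step training described above. It swaps in bag-of-words vectors, an exponentially decayed topic vector (so recent utterances dominate), and cosine similarity. The decay value and all function names are assumptions.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())


def topic_vector(utterances: list[str], decay: float = 0.5) -> Counter:
    """Track the topic as an exponentially decayed sum of word counts:
    each new utterance scales the weight of everything before it by `decay`."""
    topic: Counter = Counter()
    for utterance in utterances:
        topic = Counter({word: weight * decay for word, weight in topic.items()})
        topic.update(bag_of_words(utterance))
    return topic


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_responses(context: list[str], candidates: list[str],
                   decay: float = 0.5) -> list[str]:
    """Order candidate responses by similarity to the currently tracked topic."""
    topic = topic_vector(context, decay)
    return sorted(candidates, key=lambda c: cosine(topic, bag_of_words(c)),
                  reverse=True)


candidates = ["roll back the driver", "pizza works for me"]
print(rank_responses(["the driver crashes on boot"], candidates)[0])
# roll back the driver
print(rank_responses(["the driver crashes on boot", "lunch at noon",
                      "pizza at noon sounds good"], candidates)[0])
# pizza works for me
```

The same candidates rank differently once the conversation drifts from a driver bug to lunch, which is the behavior topic tracking is meant to capture.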

Method and apparatus for generating speech

11580963 · 2023-02-14

Samsung Electronics Co., Ltd.

A speech generation method and apparatus are disclosed. The speech generation method includes obtaining, by a processor, a linguistic feature and a prosodic feature from an input text, determining, by the processor, a first candidate speech element through a cost calculation and a Viterbi search based on the linguistic feature and the prosodic feature, generating, at a speech element generator implemented at the processor, a second candidate speech element based on the linguistic feature or the prosodic feature and the first candidate speech element, and outputting, by the processor, an output speech by concatenating the second candidate speech element and a speech sequence determined through the Viterbi search.
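The "cost calculation and Viterbi search" step is the classic unit-selection search. The sketch below covers only that step, not the second, generated candidate element. Each target position has a list of candidate units, and the search minimizes the sum of target costs (how well a unit matches the target features) and join costs (how smoothly consecutive units concatenate). The units, represented here as (name, pitch) pairs, and both cost functions are hypothetical placeholders.

```python
from typing import Callable, Sequence, TypeVar

Unit = TypeVar("Unit")


def viterbi_unit_selection(
    candidates: Sequence[Sequence[Unit]],
    target_cost: Callable[[int, Unit], float],
    join_cost: Callable[[Unit, Unit], float],
) -> tuple[float, list[Unit]]:
    """Return (minimum total cost, best unit sequence) over all candidate paths."""
    # best[t][j]: lowest cost of any path ending in candidates[t][j]
    best = [[target_cost(0, u) for u in candidates[0]]]
    back: list[list] = [[None] * len(candidates[0])]
    for t in range(1, len(candidates)):
        row, ptrs = [], []
        for u in candidates[t]:
            costs = [best[t - 1][i] + join_cost(prev, u)
                     for i, prev in enumerate(candidates[t - 1])]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[i_min] + target_cost(t, u))
            ptrs.append(i_min)
        best.append(row)
        back.append(ptrs)
    # Trace back from the cheapest final state.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][j])
        j = back[t][j]
    return total, path[::-1]


# Hypothetical units as (name, pitch in Hz) and target pitches per position.
targets = [100, 120, 110]
units = [[("a1", 100), ("a2", 130)],
         [("b1", 120), ("b2", 90)],
         [("c1", 110), ("c2", 150)]]
total, path = viterbi_unit_selection(
    units,
    target_cost=lambda t, u: abs(u[1] - targets[t]),
    join_cost=lambda prev, u: 0.5 * abs(prev[1] - u[1]),
)
print(total, [name for name, _ in path])  # 15.0 ['a1', 'b1', 'c1']
```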

Wearable speech input-based vision to audio interpreter

11551688 · 2023-01-10

Snap Inc.

Stephen Pomes

An eyewear device with camera-based compensation that improves the user experience for users having partial or complete blindness. The camera-based compensation determines features, such as objects, and then converts the determined objects to audio that is indicative of the objects and that is perceptible to the eyewear user. The camera-based compensation may use a region-based convolutional neural network (RCNN) to generate a feature map including text that is indicative of objects in images captured by a camera. The feature map is then processed through a speech to audio algorithm featuring a natural language processor to generate audio indicative of the objects in the processed images.
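Between object detection and speech output, detections have to become a sentence worth speaking. The sketch below assumes an RCNN-style detector has already produced labels, confidence scores, and normalized horizontal box centers. The `Detection` type, the score threshold, and the left/ahead/right split into image thirds are all hypothetical choices, and the detector and TTS engine themselves are not shown. The returned string is what would be passed to speech synthesis.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    score: float       # detector confidence, 0..1
    x_center: float    # box center, normalized 0..1 across the image width


def region(x_center: float) -> str:
    """Map a horizontal position to a coarse direction (assumed thirds)."""
    if x_center < 1 / 3:
        return "On your left"
    if x_center > 2 / 3:
        return "On your right"
    return "Ahead"


def describe(detections: list[Detection], min_score: float = 0.5) -> str:
    """Build a short spoken description from confident detections."""
    by_region = {"On your left": Counter(), "Ahead": Counter(),
                 "On your right": Counter()}
    for d in detections:
        if d.score >= min_score:
            by_region[region(d.x_center)][d.label] += 1
    sentences = []
    for name, counts in by_region.items():
        if counts:
            items = [label if n == 1 else f"{label} ({n})"
                     for label, n in sorted(counts.items())]
            sentences.append(f"{name}: {', '.join(items)}.")
    return " ".join(sentences) or "No objects detected."


print(describe([Detection("person", 0.9, 0.1), Detection("person", 0.8, 0.2),
                Detection("chair", 0.7, 0.5), Detection("dog", 0.3, 0.9)]))
# On your left: person (2). Ahead: chair.
```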

Automatic Voiceover Generation

20230040015 · 2023-02-09

Google LLC

A method includes receiving a voice request to generate synthesized voiceover speech for a target advertisement having one or more advertising campaign attributes. The method also includes generating, based on the one or more advertising campaign attributes, a voiceover script that includes a sequence of text for the synthesized voiceover speech. The method also includes generating, using a text-to-speech (TTS) system, the synthesized voiceover speech. The TTS system is configured to receive, as input, the sequence of text for the voiceover script and generate, as output, the synthesized voiceover speech. Here, the synthesized voiceover speech has speech characteristics specified by a target TTS vertical. The method also includes overlaying the synthesized voiceover speech on the target advertisement.
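The final step, overlaying the synthesized voiceover on the advertisement, can be sketched as sample-level mixing. This is an assumed approach rather than the method claimed. While the voiceover plays, the advertisement's own audio is attenuated ("ducked") by a fixed gain. The two signals are then summed and clipped to the valid range. The mono float samples in [-1.0, 1.0], the hard on/off ducking without fade ramps, and the `overlay` name are simplifications for illustration.

```python
def overlay(ad: list[float], voice: list[float], start: int,
            duck_gain: float = 0.3) -> list[float]:
    """Mix `voice` into `ad` starting at sample index `start`, ducking the
    ad audio under the voiceover and clipping the result to [-1.0, 1.0]."""
    if start < 0:
        raise ValueError("start must be a non-negative sample index")
    out = list(ad)
    for i, v in enumerate(voice):
        j = start + i
        if j >= len(out):
            break  # truncate voiceover that runs past the end of the ad
        mixed = out[j] * duck_gain + v
        out[j] = max(-1.0, min(1.0, mixed))
    return out


print(overlay([0.5, 0.5, 0.5, 0.5], [0.25, 0.875], start=1, duck_gain=0.5))
# [0.5, 0.5, 1.0, 0.5]
```

In practice the ducking gain would usually ramp in and out over a few milliseconds to avoid audible clicks. The hard switch here keeps the example short.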