IPIQ

G10L15/16

Method and system for speech enhancement

11557306 · 2023-01-17

Harman International Industries, Incorporated

A method and a system for speech enhancement including a time synchronization unit configured to synchronize microphone signals sent from at least two microphones; a source separation unit configured to separate the synchronized microphone signals and output a separated speech signal, which corresponds to a speech source; and a noise reduction unit including a feature extraction unit configured to extract a speech feature of the separated speech signal and a neural network configured to receive the speech feature and output a clean speech feature.
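The abstract does not say how the time synchronization unit aligns the microphone signals. As a rough illustration only, the sketch below assumes a common approach: estimate the delay between two channels by brute-force cross-correlation, then shift one channel to match. The function names `estimate_delay` and `synchronize` and the `max_lag` parameter are hypothetical and do not come from the patent.

```python
def estimate_delay(reference, delayed, max_lag):
    """Return the lag (in samples) at which `delayed` best matches `reference`.

    Assumption: a plain cross-correlation search over [-max_lag, max_lag];
    the patent abstract does not specify the synchronization technique.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(reference):
            m = n + lag
            if 0 <= m < len(delayed):
                score += x * delayed[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def synchronize(reference, delayed, max_lag):
    """Shift `delayed` by the estimated lag so it lines up with `reference`."""
    lag = estimate_delay(reference, delayed, max_lag)
    if lag >= 0:
        # Drop the leading samples and pad the tail with silence.
        aligned = delayed[lag:] + [0.0] * lag
    else:
        # Pad the head with silence and drop the trailing samples.
        aligned = [0.0] * (-lag) + delayed[:lag]
    return aligned[:len(reference)], lag


mic_a = [0, 0, 1, 2, 1, 0, 0, 0]
mic_b = [0, 0, 0, 0, 1, 2, 1, 0]  # same pulse, arriving two samples later
aligned_b, lag = synchronize(mic_a, mic_b, max_lag=3)
```

In a pipeline like the one the abstract outlines, the aligned channels would then be passed to the source separation stage.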

Training Speech Synthesis to Generate Distinct Speech Sounds

20230009613 · 2023-01-12

Google LLC

A method (800) of training a text-to-speech (TTS) model (108) includes obtaining training data (150) including reference input text (104) that includes a sequence of characters, a sequence of reference audio features (402) representative of the sequence of characters, and a sequence of reference phone labels (502) representative of distinct speech sounds of the reference audio features. For each of a plurality of time steps, the method includes generating a corresponding predicted audio feature (120) based on a respective portion of the reference input text for the time step and generating, using a phone label mapping network (510), a corresponding predicted phone label (520) associated with the predicted audio feature. The method also includes aligning the predicted phone label with the reference phone label to determine a corresponding predicted phone label loss (622) and updating the TTS model based on the corresponding predicted phone label loss.

Dental Device With Speech Recognition

20230008250 · 2023-01-12

A dental device with a speech recognition module is provided. The module is connected to a control device that controls at least some of the functions of the dental device. The speech recognition module has at least one microphone and, based on the recognition result, triggers a selected function of the dental device via the control device. An output module outputs information about the triggered function. The speech recognition module listens continuously via the microphone and includes a code word module that, when a code word is recognized, activates speech recognition (or keeps it active) for the words that follow and attempts to recognize them as predetermined control words, each assigned to a function.
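The code-word gating described in this abstract can be sketched on a stream of already-recognized words. This is a minimal sketch under stated assumptions, not the patented implementation: the code word, the control-word table, and the `ACTIVE_WINDOW` length are all hypothetical values chosen for illustration.

```python
CODE_WORD = "assistant"  # hypothetical code word
CONTROL_WORDS = {        # hypothetical control words and the functions they map to
    "light": "toggle_light",
    "water": "toggle_water",
    "up": "raise_chair",
}
ACTIVE_WINDOW = 2  # assumed: recognition stays active for this many following words


def process_words(words):
    """Return the functions triggered by a stream of recognized words."""
    triggered = []
    remaining = 0  # how many upcoming words are still checked as control words
    for word in words:
        if word == CODE_WORD:
            remaining = ACTIVE_WINDOW  # activate, or keep active if already on
            continue
        if remaining > 0:
            remaining -= 1
            function = CONTROL_WORDS.get(word)
            if function is not None:
                triggered.append(function)
    return triggered


result = process_words(["light", "assistant", "light", "please", "water"])
```

Here the first "light" is ignored because no code word preceded it, and "water" falls outside the assumed two-word window.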

Time asynchronous spoken intent detection

11544463 · 2023-01-03

Intel Corporation

An embodiment of a spoken intent detection device includes technology to detect a phrase in an electronic representation of an audio stream based on a pre-defined vocabulary, associate a time stamp with the detected phrase, and classify a spoken intent based on a sequence of detected phrases and the respective associated time stamps. Other embodiments are disclosed and claimed.
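The three steps named in this abstract (detect vocabulary phrases, attach a time stamp to each, classify intent from the sequence) can be illustrated with a small rule-based stand-in. The abstract does not specify the detector or the classifier, so everything below is an assumption: the vocabulary, the `INTENT_RULES` table, the `MAX_GAP_SECONDS` threshold, and the input format of `(word, time)` pairs are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical vocabulary and intent rules, chosen only for illustration.
VOCABULARY = {"turn on", "turn off", "lights", "music"}
INTENT_RULES = {
    ("turn on", "lights"): "lights_on",
    ("turn off", "lights"): "lights_off",
    ("turn on", "music"): "music_on",
}
MAX_GAP_SECONDS = 2.0  # assumed limit on the pause between related phrases


@dataclass
class Detection:
    phrase: str
    timestamp: float  # seconds from the start of the stream


def detect_phrases(timed_words):
    """Scan (word, time) pairs and emit vocabulary phrases with time stamps."""
    detections = []
    i = 0
    while i < len(timed_words):
        matched = False
        for length in (2, 1):  # prefer the longer phrase
            chunk = timed_words[i:i + length]
            if len(chunk) < length:
                continue
            phrase = " ".join(word for word, _ in chunk)
            if phrase in VOCABULARY:
                detections.append(Detection(phrase, chunk[0][1]))
                i += length
                matched = True
                break
        if not matched:
            i += 1
    return detections


def classify_intent(detections):
    """Map consecutive detections to an intent, using phrases and time stamps."""
    for first, second in zip(detections, detections[1:]):
        gap = second.timestamp - first.timestamp
        if 0 <= gap <= MAX_GAP_SECONDS:
            intent = INTENT_RULES.get((first.phrase, second.phrase))
            if intent:
                return intent
    return None


stream = [("please", 0.0), ("turn", 0.4), ("on", 0.6), ("the", 0.9), ("lights", 1.1)]
intent = classify_intent(detect_phrases(stream))
```

Because the classifier sees only detected phrases and their time stamps, filler words such as "please" and "the" do not affect the result, while a long pause between phrases prevents a match.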

Natural language processing routing

11551681 · 2023-01-10

Amazon Technologies, Inc.

Devices and techniques are generally described for a speech processing routing architecture. In various examples, first data comprising a first feature definition is received. The first feature definition may include a first indication of first source data and first instructions for generating feature data using the first source data. In various examples, the feature data may be generated according to the first feature definition. In some examples, a speech processing system may receive a first request to process a first utterance. The feature data may be retrieved from a non-transitory computer-readable memory. The speech processing system may determine a first skill for processing the first utterance based at least in part on the feature data.

Multimodal sentiment classification

11551042 · 2023-01-10

Snap Inc.

Sentiment classification can be implemented by an entity-level multimodal sentiment classification neural network. The neural network can include left, right, and target entity subnetworks. The neural network can further include an image network that generates representation data that is combined and weighted with data output by the left, right, and target entity subnetworks to output a sentiment classification for an entity included in a network post.
