G10L2015/221

Systems and methods for voice identification and analysis
11580986 · 2023-02-14 ·

Obtaining configuration audio data including voice information for a plurality of meeting participants. Generating localization information indicating a respective location for each meeting participant. Generating a respective voiceprint for each meeting participant. Obtaining meeting audio data. Identifying a first meeting participant and a second meeting participant. Linking a first meeting participant identifier of the first meeting participant with a first segment of the meeting audio data. Linking a second meeting participant identifier of the second meeting participant with a second segment of the meeting audio data. Generating a GUI indicating the respective locations of the first and second meeting participants, and the GUI indicating a first transcription of the first segment and a second transcription of the second segment. The first transcription is associated with the first meeting participant in the GUI, and the second transcription is associated with the second meeting participant in the GUI.
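The matching step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: each enrolled participant has a voiceprint embedding, each meeting-audio segment is matched to the closest voiceprint by cosine similarity, and the segment's transcription is linked to that participant's identifier for display. All names, vectors, and transcriptions are invented.

```python
import math

def cosine(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def link_segments(voiceprints, segments):
    """voiceprints: {participant_id: embedding};
    segments: [(segment_embedding, transcription)].
    Returns [(participant_id, transcription)] pairs for a GUI transcript."""
    linked = []
    for emb, text in segments:
        best = max(voiceprints, key=lambda pid: cosine(voiceprints[pid], emb))
        linked.append((best, text))
    return linked

# Toy enrollment and meeting data (illustrative only)
voiceprints = {"alice": [0.9, 0.1, 0.0], "bob": [0.1, 0.9, 0.2]}
segments = [([0.8, 0.2, 0.1], "Let's start."), ([0.0, 1.0, 0.3], "Agreed.")]
print(link_segments(voiceprints, segments))
# [('alice', "Let's start."), ('bob', 'Agreed.')]
```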

Systems and methods for response selection in multi-party conversations with dynamic topic tracking

Embodiments described herein provide a dynamic topic tracking mechanism that tracks how conversation topics change from one utterance to another and uses the tracking information to rank candidate responses. A pre-trained language model may be used for response selection in multi-party conversations, in two steps: (1) topic-based pre-training that embeds topic information into the language model with self-supervised learning, and (2) multi-task learning on the pre-trained model that jointly trains response selection with dynamic topic prediction and disentanglement tasks.
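The ranking idea can be sketched with a deliberately crude stand-in: a token-overlap "topic tracker" assigns each utterance a topic label, and candidates are scored by relevance to the last utterance plus a bonus for staying on the tracked topic. The real embodiment uses a pretrained language model, not word overlap; topics, weights, and data here are invented.

```python
def topic(utterance, topic_words):
    # crude topic tracker: the topic label with the most word overlap wins
    toks = set(utterance.lower().split())
    return max(topic_words, key=lambda t: len(toks & topic_words[t]))

def rank(history, candidates, topic_words):
    current = topic(history[-1], topic_words)
    def score(cand):
        toks = set(cand.lower().split())
        relevance = len(toks & set(history[-1].lower().split()))
        continuity = 1 if topic(cand, topic_words) == current else 0
        return relevance + 2 * continuity   # weight topic continuity higher
    return sorted(candidates, key=score, reverse=True)

topic_words = {"sports": {"game", "score", "team"}, "food": {"pizza", "eat", "lunch"}}
history = ["did you watch the game"]
candidates = ["the team played well", "let's eat pizza"]
print(rank(history, candidates, topic_words))
# ['the team played well', "let's eat pizza"]
```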

System and method for context-enriched attentive memory network with global and local encoding for dialogue breakdown detection

A method, an electronic device and computer readable medium for dialogue breakdown detection are provided. The method includes obtaining a verbal input from an audio sensor. The method also includes generating a reply to the verbal input. The method additionally includes identifying a local context from the verbal input and a global context from the verbal input, additional verbal inputs previously received by the audio sensor, and previous replies generated in response to the additional verbal inputs. The method further includes identifying a dialogue breakdown in response to determining that the reply does not correspond to the local context and the global context. In addition, the method includes generating sound corresponding to the reply through a speaker when the dialogue breakdown is not identified.
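The breakdown test described above can be sketched under stated assumptions: a reply is flagged as a breakdown when it corresponds to neither the local context (the current verbal input) nor the global context (prior inputs and replies). The patent's embodiment uses context-enriched attentive memory encoders; simple token overlap stands in for them here.

```python
def is_breakdown(reply, verbal_input, history):
    """history: prior verbal inputs and generated replies, as strings."""
    local_ctx = set(verbal_input.lower().split())
    global_ctx = set()
    for turn in history:
        global_ctx |= set(turn.lower().split())
    reply_toks = set(reply.lower().split())
    matches_local = bool(reply_toks & local_ctx)
    matches_global = bool(reply_toks & global_ctx)
    # breakdown only when the reply matches neither context
    return not (matches_local or matches_global)

# An off-topic reply triggers a breakdown; an on-topic one does not.
print(is_breakdown("banana spaceship", "turn on the lights", ["hello there"]))
# True
```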

Dental Device With Speech Recognition

A dental device with a speech recognition module is provided, which is connected to a control device that controls at least part of the functions of the dental device. Based on the recognition result, the speech recognition module triggers a selected function of the dental device via the control device and has at least one microphone. An output module outputs information about the triggered function. The speech recognition module continuously listens via the microphone and has a code word module that, when a code word is recognized, activates (or keeps active) speech recognition for the temporally successive words and attempts to recognize them as predetermined control words, each assigned to a function.
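The code-word flow can be sketched as a small state machine: the recognizer listens continuously, a recognized code word arms control-word recognition, and subsequent words are matched against control words mapped to device functions. The code word, control words, and function names below are invented for illustration.

```python
class CodeWordModule:
    def __init__(self, code_word, control_words):
        self.code_word = code_word
        self.control_words = control_words  # word -> device function name
        self.armed = False                  # control-word recognition active?
        self.triggered = []                 # functions triggered so far

    def hear(self, word):
        word = word.lower()
        if word == self.code_word:
            self.armed = True               # activate / keep active recognition
        elif self.armed and word in self.control_words:
            self.triggered.append(self.control_words[word])
        return self.triggered

# Hypothetical code word "dentassist" and two control words
m = CodeWordModule("dentassist", {"rinse": "start_rinse", "light": "toggle_light"})
for w in "please dentassist rinse now".split():
    m.hear(w)
print(m.triggered)
# ['start_rinse']
```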

Natural language processing routing

Devices and techniques are generally described for a speech processing routing architecture. In various examples, first data comprising a first feature definition is received. The first feature definition may include a first indication of first source data and first instructions for generating feature data using the first source data. In various examples, the feature data may be generated according to the first feature definition. In some examples, a speech processing system may receive a first request to process a first utterance. The feature data may be retrieved from a non-transitory computer-readable memory. The speech processing system may determine a first skill for processing the first utterance based at least in part on the feature data.
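The routing flow above can be sketched as follows, with every name invented: a feature definition pairs a pointer to source data with instructions for computing feature data; the computed feature data is stored and later consulted to pick a skill for an utterance.

```python
def generate_feature(definition, sources):
    # definition: {"source": key into sources, "compute": instructions}
    data = sources[definition["source"]]
    return definition["compute"](data)

def route(utterance, feature_data, skills):
    # choose the skill whose keywords best match, weighted by feature data
    toks = set(utterance.lower().split())
    def score(skill):
        keyword_hits = len(toks & skills[skill])
        return keyword_hits * feature_data.get(skill, 1.0)
    return max(skills, key=score)

# Hypothetical source data: per-skill usage counts
sources = {"usage_log": {"music": 10, "weather": 2}}
definition = {
    "source": "usage_log",
    "compute": lambda counts: {k: v / sum(counts.values()) for k, v in counts.items()},
}
feature_data = generate_feature(definition, sources)   # precomputed, then stored
skills = {"music": {"play", "song"}, "weather": {"forecast", "rain"}}
print(route("play a song", feature_data, skills))
# music
```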

SELECTING ALTERNATES IN SPEECH RECOGNITION
20180012592 · 2018-01-11 ·

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting alternates in speech recognition. In some implementations, data is received that indicates multiple speech recognition hypotheses for an utterance. Based on the multiple speech recognition hypotheses, multiple alternates for a particular portion of a transcription of the utterance are identified. For each of the identified alternates, one or more feature scores are determined, the feature scores are input to a trained classifier, and an output is received from the classifier. A subset of the identified alternates is selected, based on the classifier outputs, to provide for display. Data indicating the selected subset of the alternates is provided for display.
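The selection step can be sketched with a hand-set linear scorer standing in for the trained classifier: each alternate's feature scores are combined into a probability, and the top-scoring alternates above a threshold are kept for display. The feature names and weights below are invented.

```python
import math

# Hand-set weights standing in for a trained classifier (illustrative only)
WEIGHTS = {"confidence": 2.0, "edit_distance": -0.5, "frequency": 1.0}

def classify(features):
    # linear score through a sigmoid: probability the alternate is useful
    z = sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def select_alternates(alternates, threshold=0.5, limit=3):
    """alternates: [(word, feature_scores)]. Returns words to display."""
    scored = [(classify(f), alt) for alt, f in alternates]
    kept = [alt for p, alt in sorted(scored, reverse=True) if p >= threshold]
    return kept[:limit]

alternates = [
    ("their", {"confidence": 0.9, "edit_distance": 1, "frequency": 0.5}),
    ("there", {"confidence": 0.2, "edit_distance": 3, "frequency": 0.1}),
]
print(select_alternates(alternates))
# ['their']
```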

DISTRIBUTED SENSOR DATA PROCESSING USING MULTIPLE CLASSIFIERS ON MULTIPLE DEVICES
20230230597 · 2023-07-20 ·

According to an aspect, a method for distributed sound/image recognition using a wearable device includes receiving, via at least one sensor device, sensor data, and detecting, by a classifier of the wearable device, whether or not the sensor data includes an object of interest. The classifier is configured to execute a first machine learning (ML) model. The method includes transmitting, via a wireless connection, the sensor data to a computing device in response to the object of interest being detected within the sensor data, where the sensor data is configured to be used by a second ML model on the computing device or a server computer for further sound/image classification.
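The two-stage cascade can be sketched as follows: a small on-device detector gates transmission, and only gated-through data reaches the larger second-stage model. The energy check and threshold below are toy stand-ins for the first ML model, not anything the abstract specifies.

```python
def on_device_detect(sensor_data, threshold=0.5):
    # first "ML model": a cheap mean-energy check standing in for a real
    # on-device detector (illustrative only)
    energy = sum(x * x for x in sensor_data) / len(sensor_data)
    return energy > threshold

def pipeline(sensor_data, server_classify):
    # transmit to the paired device / server only when something is detected
    if not on_device_detect(sensor_data):
        return None                       # nothing sent over the radio
    return server_classify(sensor_data)   # second ML model, off-device

quiet = [0.01] * 10
loud = [1.0] * 10
print(pipeline(quiet, lambda d: "speech"))  # None
print(pipeline(loud, lambda d: "speech"))   # speech
```

Gating on-device keeps the radio idle for uninteresting data, which is the point of running the cheap model on the wearable.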

METHOD AND APPARATUS FOR CORRECTING VOICE DIALOGUE
20230223015 · 2023-07-13 ·

Disclosed are a method and an apparatus for correcting voice dialogue, including: recognizing first text information of a dialogue speech input by a user, including a first semantic keyword determined from a plurality of candidate terms; feeding back a first result with the first semantic keyword to the user based on the first text information; feeding back the plurality of candidate terms to the user in response to the user's selection of the first semantic keyword from the first result; and receiving a second semantic keyword input by the user, correcting the first text information based on the second semantic keyword, determining corrected second text information, and feeding back a second result with the second semantic keyword to the user based on the second text information. This solves the problem of true ambiguity while improving the dialogue apparatus's fault tolerance and its capability to handle such errors.
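The correction loop can be sketched with invented data: recognition yields a keyword chosen from candidate homophones; selecting that keyword surfaces the candidates, and the user's pick rewrites the text.

```python
class DialogueCorrector:
    def __init__(self, text, keyword, candidates):
        self.text = text            # first text information
        self.keyword = keyword      # first semantic keyword
        self.candidates = candidates

    def candidates_for(self, selected):
        # user taps the keyword in the first result -> show candidate terms
        return self.candidates if selected == self.keyword else []

    def correct(self, replacement):
        # user picks the second semantic keyword -> corrected second text
        if replacement in self.candidates:
            self.text = self.text.replace(self.keyword, replacement)
            self.keyword = replacement
        return self.text

# Hypothetical homophone ambiguity: "flour" vs "flower"
c = DialogueCorrector("buy some flour", "flour", ["flour", "flower"])
print(c.candidates_for("flour"))   # ['flour', 'flower']
print(c.correct("flower"))         # buy some flower
```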

SYSTEMS AND METHODS FOR HANDLING CONTEXTUAL QUERIES
20230214417 · 2023-07-06 ·

Systems and methods for facilitating contextual queries based on media samples automatically captured by a computing device are disclosed herein. A server receives from a computing device over a communication network, a media sample of a media asset automatically captured by the computing device. The server obtains contextual information corresponding to the captured media sample. The server stores the media sample in a memory indexed by the contextual information. The server receives, from the computing device over the communication network, a query that includes a criterion but lacks an identifier of the media asset. The server identifies the media sample in the memory by matching the query criterion to the contextual information. The server generates a reply to the query based on the identifying of the media sample and communicates the reply to the computing device over the communication network.
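The server-side index can be sketched as follows, with invented field names: captured samples are stored keyed by their contextual information, and a query that carries only a criterion, with no identifier of the media asset, is matched against that stored context to build a reply.

```python
class ContextIndex:
    def __init__(self):
        self.entries = []   # list of (context dict, media sample) pairs

    def store(self, sample, context):
        # index the captured media sample by its contextual information
        self.entries.append((context, sample))

    def query(self, criterion):
        # the query has a criterion but no media-asset identifier;
        # match the criterion against stored contextual information
        for context, sample in self.entries:
            if criterion in context.values():
                return f"Matched sample for {context['title']}"
        return "No match"

idx = ContextIndex()
idx.store("clip-001", {"title": "Evening News", "time": "21:00"})
print(idx.query("21:00"))
# Matched sample for Evening News
```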

Systems and Methods for Implementing Smart Assistant Systems

In one embodiment, a system includes an automatic speech recognition (ASR) module, a natural-language understanding (NLU) module, a dialog manager, one or more agents, an arbitrator, a delivery system, one or more processors, and a non-transitory memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to receive a user input, process the user input using the ASR module, the NLU module, the dialog manager, one or more of the agents, the arbitrator, and the delivery system, and provide a response to the user input.
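The pipeline named in the claim can be sketched as a chain of stages; every module below is a trivial stand-in for the real component, with intents, agents, and replies all invented.

```python
def asr(audio):
    return audio  # stand-in ASR: treat the "audio" as already transcribed

def nlu(text):
    return {"intent": "weather" if "weather" in text else "chat"}

def dialog_manager(intent):
    return intent["intent"]  # stand-in: pass the intent through as an action

# Two stand-in agents; each handles one action, else declines with None
agents = [
    lambda a: ("weather_agent", "Sunny today") if a == "weather" else None,
    lambda a: ("chat_agent", "Hello!") if a == "chat" else None,
]

def arbitrator(results):
    return next(r for r in results if r is not None)  # first willing agent

def delivery(best):
    return best[1]  # unwrap the response text for the user

def assistant(user_audio):
    text = asr(user_audio)
    intent = nlu(text)
    action = dialog_manager(intent)
    results = [agent(action) for agent in agents]
    return delivery(arbitrator(results))

print(assistant("what is the weather"))
# Sunny today
```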