G10L2015/225

Server-provided visual output at a voice interface device
11521469 · 2022-12-06

A method at an electronic device with an array of indicator lights includes: obtaining first visual output instructions stored at the electronic device, where the first visual output instructions control operation of the array of indicator lights based on operating state of the electronic device; receiving a voice input; obtaining from a remote system a response to the voice input and second visual output instructions, where the second visual output instructions are provided by the remote system along with the response in accordance with a determination that the voice input satisfies one or more criteria; executing the response; and displaying visual output on the array of indicator lights in accordance with the second visual output instructions, where otherwise in absence of the second visual output instructions the electronic device displays visual output on the array of indicator lights in accordance with the first visual output instructions.
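The core of the claim is a fallback rule: server-provided visual instructions win when present, local state-based instructions otherwise. A minimal sketch of that selection logic, with all names and the dict-based message shape being hypothetical illustrations rather than anything from the patent:

```python
# Hypothetical sketch: choose which visual-output instructions drive the LED array.
def select_visual_instructions(local_instructions, server_response):
    """Prefer server-provided (second) instructions when the remote system
    included them with its response; otherwise fall back to the locally
    stored (first) instructions keyed on device operating state."""
    second = server_response.get("visual_output_instructions")
    return second if second is not None else local_instructions

local = {"listening": "pulse_blue", "thinking": "spin_white"}
resp_with = {"answer": "It's 72 degrees", "visual_output_instructions": {"any": "rainbow_sweep"}}
resp_without = {"answer": "Timer set"}

assert select_visual_instructions(local, resp_with) == {"any": "rainbow_sweep"}
assert select_visual_instructions(local, resp_without) is local
```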

Hybrid learning system for natural language understanding

An agent automation system includes a memory configured to store a natural language understanding (NLU) framework and a processor configured to execute instructions of the NLU framework to cause the agent automation system to perform actions. These actions comprise: generating an annotated utterance tree of an utterance using a combination of rules-based and machine-learning (ML)-based components, wherein a structure of the annotated utterance tree represents a syntactic structure of the utterance, and wherein nodes of the annotated utterance tree include word vectors that represent semantic meanings of words of the utterance; and using the annotated utterance tree as a basis for intent/entity extraction of the utterance.
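The annotated utterance tree pairs syntactic structure (tree shape) with semantic content (word vectors at the nodes). A toy illustration of that data structure, with the class and field names being assumptions for the sketch and the two-element "vectors" standing in for real embeddings:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an annotated utterance tree: structure mirrors syntax,
# nodes carry word vectors standing in for semantic meaning.
@dataclass
class UtteranceNode:
    word: str
    vector: list                                # word embedding (semantic meaning)
    children: list = field(default_factory=list)

def extract_vectors(node):
    """Walk the tree and collect (word, vector) pairs for intent/entity extraction."""
    pairs = [(node.word, node.vector)]
    for child in node.children:
        pairs.extend(extract_vectors(child))
    return pairs

root = UtteranceNode("book", [0.1, 0.9], [
    UtteranceNode("flight", [0.7, 0.2], [UtteranceNode("Paris", [0.3, 0.4])]),
])
assert [w for w, _ in extract_vectors(root)] == ["book", "flight", "Paris"]
```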

CLIENT DEVICE BASED DIGITAL ASSISTANT REQUEST DISAMBIGUATION
20220383872 · 2022-12-01

An example process includes at an electronic device: receiving a natural language input including a user request corresponding to an ambiguous entity; transmitting a representation of the user request to an external electronic device; receiving, from the external electronic device, disambiguation data, the disambiguation data including: a plurality of disambiguation options for the ambiguous entity; and respective semantic information for each of the plurality of disambiguation options; determining, based on comparing the disambiguation data to user information stored on the electronic device, whether to request user input; in accordance with a determination to not request user input: initiating a task based on a selected disambiguation option of the plurality of disambiguation options; and in accordance with a determination to request user input: providing an output indicative of a request for user input to disambiguate the ambiguous entity.
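The on-device decision turns on whether the stored user information narrows the server's disambiguation options to one. A minimal sketch of that branch, with the data shapes and matching rule being illustrative assumptions:

```python
# Hypothetical sketch: decide on-device whether an ambiguous entity can be
# resolved silently from stored user information or needs a follow-up prompt.
def resolve_entity(disambiguation_data, user_info):
    matches = [
        opt for opt in disambiguation_data["options"]
        if disambiguation_data["semantics"][opt] in user_info
    ]
    if len(matches) == 1:
        return ("initiate_task", matches[0])   # no user input needed
    return ("request_user_input", None)        # still ambiguous

data = {"options": ["John A", "John B"],
        "semantics": {"John A": "contact:john.a", "John B": "contact:john.b"}}
assert resolve_entity(data, {"contact:john.a"}) == ("initiate_task", "John A")
assert resolve_entity(data, {"contact:john.a", "contact:john.b"})[0] == "request_user_input"
```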

EXPLAINING ANOMALOUS PHONETIC TRANSLATIONS

A method includes: receiving, by a computing device, a digital voice stream; receiving, by the computing device, converted text that represents the digital voice stream; identifying, by the computing device, an erroneously converted portion of the converted text; selecting, by the computing device, the erroneously converted portion for explainability processing; parsing, by the computing device, the erroneously converted portion into parts based on a predetermined parsing level; collecting, by the computing device, supplementary input data related to the erroneously converted portion; and determining, by the computing device and based on the supplementary input data, a reason why the erroneously converted portion was erroneously converted.
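The pipeline isolates the bad span, parses it at a chosen level, and attributes a cause from supplementary data. A toy sketch of that flow, where the parsing level (words), the supplementary keys, and the rule-based reasons are all hypothetical stand-ins for whatever the real system would use:

```python
# Hypothetical sketch of the explainability pipeline: isolate the bad span,
# parse it at a word level, and attribute a reason from supplementary data.
def explain_error(converted_text, error_span, supplementary):
    part_words = converted_text[error_span[0]:error_span[1]].split()  # word-level parse
    if supplementary.get("background_noise_db", 0) > 60:
        reason = "high background noise"
    elif supplementary.get("speaker_accent") not in supplementary.get("trained_accents", []):
        reason = "accent outside training data"
    else:
        reason = "homophone confusion"
    return {"parts": part_words, "reason": reason}

result = explain_error("meet me at the beach", (15, 20),
                       {"background_noise_db": 72})
assert result == {"parts": ["beach"], "reason": "high background noise"}
```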

IDENTIFICATION OF ANOMALIES IN AIR TRAFFIC CONTROL COMMUNICATIONS

A processor may identify an anomaly in one or more communications. A processor may monitor the one or more communications for an utterance. A processor may perform natural language processing (NLP) on the utterance. A processor may generate an understanding of the utterance using natural language understanding (NLU). A processor may detect the anomaly from the understanding of the utterance. A processor may execute a response, responsive to detecting the anomaly.
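The claim chains monitoring, NLU, anomaly detection, and response. A minimal sketch of one such chain, where the "understanding" dict, the runway-conflict rule, and the alert string are illustrative assumptions, not the patent's actual detection logic:

```python
# Hypothetical sketch: flag an ATC utterance as anomalous when its understood
# instruction conflicts with the current clearance state.
def detect_anomaly(understanding, clearance_state):
    runway = understanding.get("runway")
    if understanding.get("intent") == "cleared_to_land" and clearance_state.get(runway) == "occupied":
        return "runway conflict"
    return None

def respond(utterance, clearance_state, nlu=lambda u: u):
    # nlu stands in for the NLP/NLU stage; here it is an identity placeholder.
    anomaly = detect_anomaly(nlu(utterance), clearance_state)
    return f"ALERT: {anomaly}" if anomaly else "OK"

state = {"27L": "occupied"}
parsed = {"intent": "cleared_to_land", "runway": "27L"}
assert respond(parsed, state) == "ALERT: runway conflict"
```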

Activation of remote devices in a networked system

The present disclosure is generally directed to the generation of voice-activated data flows in an interconnected network. The voice-activated data flows can include input audio signals that include a request and are detected at a client device. The client device can transmit the input audio signal to a data processing system, where the input audio signal can be parsed and passed to the data processing system of a service provider to fulfill the request in the input audio signal. The present solution is configured to conserve network resources by reducing the number of network transmissions needed to fulfill a request.
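The flow described above forwards a parsed request to a service provider in a single hop rather than bouncing it back through the client. A toy sketch of that routing step, where the wake-phrase parsing and the provider registry are purely illustrative assumptions:

```python
# Hypothetical sketch of the data flow: parse the transcribed input audio and
# pass the request directly to the matching service provider in one hop.
def handle_input_audio(audio_transcript, service_providers):
    request = audio_transcript.removeprefix("ok assistant ").strip()  # parse
    service = request.split()[0]                                      # e.g. "taxi"
    return service_providers[service](request)                        # one forward hop

providers = {"taxi": lambda req: f"taxi dispatched for: {req}"}
assert handle_input_audio("ok assistant taxi to the airport", providers) \
    == "taxi dispatched for: taxi to the airport"
```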

Global-to-local memory pointer networks for task-oriented dialogue

A system and corresponding method are provided for generating responses for a dialogue between a user and a computer. The system includes a memory storing information for a dialogue history and a knowledge base. An encoder may receive a new utterance from the user and generate a global memory pointer used for filtering the knowledge base information in the memory. A decoder may generate at least one local memory pointer and a sketch response for the new utterance. The sketch response includes at least one sketch tag to be replaced by knowledge base information from the memory. The system generates the dialogue computer response using the local memory pointer to select a word from the filtered knowledge base information to replace the at least one sketch tag in the sketch response.
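The mechanism can be illustrated without any neural machinery: a global pointer acts as a row filter over the knowledge base, and a local pointer selects the filtered row whose entry fills each sketch tag. In this toy version the pointers are hard-coded lists and indices instead of learned distributions, and all names are assumptions:

```python
# Hypothetical toy version of the global-to-local pointer idea: a global pointer
# filters knowledge-base rows, then a local pointer picks the row whose entry
# fills each sketch tag in the templated response.
def fill_sketch(sketch_response, kb, global_pointer, local_pointer):
    filtered = [row for row, keep in zip(kb, global_pointer) if keep]
    out = []
    for token in sketch_response.split():
        if token.startswith("@"):                       # sketch tag, e.g. @poi
            out.append(filtered[local_pointer][token])  # local pointer selects row
        else:
            out.append(token)
    return " ".join(out)

kb = [{"@poi": "Starbucks"}, {"@poi": "Shell"}, {"@poi": "Chevron"}]
assert fill_sketch("the nearest gas station is @poi", kb,
                   global_pointer=[0, 1, 1], local_pointer=0) \
    == "the nearest gas station is Shell"
```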

WEARABLE SYSTEMS AND METHODS FOR LOCATING AN OBJECT
20220374069 · 2022-11-24

Systems and methods are disclosed for locating an object for a user. A system may comprise an image capture device, an audio capture device, and a processor. The processor may be configured to receive images captured by the image capture device and audio signals received by the audio capture device. The processor may analyze the audio signals to identify a descriptor word describing the object and retrieve a visual characteristic of the object based on the descriptor word. The processor may then determine a location of the object in the images based on the visual characteristic, determine a location of a hand of the user in the images, and determine a direction between the hand and the object. The processor may then determine feedback indicative of the direction and provide the feedback to the user.
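The final steps reduce to geometry: given pixel locations of the object and the user's hand, derive a coarse direction for the feedback. A minimal sketch, with the image-coordinate convention and the phrasing of the feedback being assumptions:

```python
# Hypothetical sketch: derive a coarse hand-to-object direction for feedback.
def direction_feedback(hand_xy, object_xy):
    dx = object_xy[0] - hand_xy[0]
    dy = object_xy[1] - hand_xy[1]
    horiz = "right" if dx > 0 else "left"
    vert = "down" if dy > 0 else "up"          # image y grows downward
    primary = horiz if abs(dx) >= abs(dy) else vert
    return f"move your hand {primary}"

assert direction_feedback((100, 200), (300, 250)) == "move your hand right"
assert direction_feedback((100, 200), (120, 50)) == "move your hand up"
```

In a real system this direction would be rendered as audio or haptic feedback rather than a string.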

INTELLIGENT VOICE RECOGNITION METHOD AND APPARATUS
20220375469 · 2022-11-24

An intelligent voice recognition method and apparatus are disclosed. An intelligent voice recognition apparatus according to one embodiment of the present invention recognizes speech of a user and outputs a response determined on the basis of the speech. When a plurality of candidate responses related to the speech exist, the response is determined from among them on the basis of device state information about the voice recognition apparatus, reducing ambiguity in the conversation between the user and the apparatus and enabling more natural conversation processing. The intelligent voice recognition apparatus and/or an artificial intelligence (AI) apparatus of the present invention can be associated with an AI module, a drone (an unmanned aerial vehicle (UAV)), a robot, an augmented reality (AR) device, a virtual reality (VR) device, a device related to a 5G service, and the like.
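The disambiguation idea is to score candidate responses against the device's current state, so "turn it up" resolves differently while music is playing than during a call. A minimal sketch of that selection, with the candidate/state shapes and the counting score being illustrative assumptions:

```python
# Hypothetical sketch: score candidate responses against device state to
# resolve ambiguity in the user's utterance.
def pick_response(candidates, device_state):
    def score(candidate):
        return sum(1 for k, v in candidate["requires"].items()
                   if device_state.get(k) == v)
    return max(candidates, key=score)["text"]

candidates = [
    {"text": "Raising media volume", "requires": {"music_playing": True}},
    {"text": "Raising call volume", "requires": {"in_call": True}},
]
assert pick_response(candidates, {"music_playing": True, "in_call": False}) \
    == "Raising media volume"
```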

SYSTEM METHOD AND APPARATUS FOR COMBINING WORDS AND BEHAVIORS
20220375468 · 2022-11-24

A system and method for integrating collected data, such as audio data and analytical data, to perform behavioral analysis on the audio data by applying acoustic signal processing and machine learning algorithms, converting the audio data to text data, and performing behavioral analysis on the text data. The behavioral analysis data from the acoustic signal processing is combined with the machine learning algorithms and the speech-to-text data to provide a call agent with feedback that assists with the next best action or gives insight into customer behaviors.
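The fusion step combines an acoustic behavioral signal with a text-derived one into agent guidance. A toy sketch of that combination, where the equal weighting, the score ranges, and the action strings are all illustrative assumptions:

```python
# Hypothetical sketch: merge an acoustic behavioral score with a speech-to-text
# sentiment score into a next-best-action hint for the call agent.
def next_best_action(acoustic_score, text_sentiment):
    combined = 0.5 * acoustic_score + 0.5 * text_sentiment   # simple fusion
    if combined < -0.3:
        return "escalate: customer frustration rising"
    if combined > 0.3:
        return "offer upsell: customer engaged"
    return "continue: neutral"

assert next_best_action(-0.8, -0.4) == "escalate: customer frustration rising"
assert next_best_action(0.6, 0.4) == "offer upsell: customer engaged"
```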