G10L2015/223

Robot and method for recognizing wake-up word thereof
11577379 · 2023-02-14 · ·

Provided is a robot including a microphone configured to acquire a sound signal corresponding to a sound generated near the robot, a camera, an output interface including at least one of a display configured to output a wake-up screen or a speaker configured to output a wake-up sound when the robot wakes up, and a processor configured to recognize whether the acquired sound includes a voice of a person, activate the camera when the sound includes a voice of a person, recognize whether a person is present in an image acquired by the activated camera, set a wake-up word recognition sensitivity based on a recognition result as to whether a person is present, and recognize whether a wake-up word is included in voice data of a user acquired through the microphone based on the set wake-up word recognition sensitivity.
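
The decision flow in this abstract can be illustrated with a minimal sketch. The threshold values, the `Observation` fields, and the choice to make recognition more sensitive when a person is visible are all assumptions for illustration; the abstract states only that sensitivity is set based on whether a person is present.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the abstract gives no numeric values.
THRESHOLD_PERSON_PRESENT = 0.5  # assumed: more sensitive when a person is seen
THRESHOLD_NO_PERSON = 0.8       # assumed: stricter when nobody is seen


@dataclass
class Observation:
    has_voice: bool         # did the acquired sound include a human voice?
    person_in_image: bool   # did the activated camera see a person?
    wake_word_score: float  # keyword-spotter confidence in [0, 1] (assumed)


def should_wake(obs: Observation) -> bool:
    """Return True when the wake-up word is accepted at the set sensitivity."""
    if not obs.has_voice:
        # No voice: the camera is never activated and nothing is recognized.
        return False
    threshold = (
        THRESHOLD_PERSON_PRESENT if obs.person_in_image else THRESHOLD_NO_PERSON
    )
    return obs.wake_word_score >= threshold
```

With these assumed values, a score of 0.6 wakes the robot only when a person is visible in the camera image.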

Machine learning for interpretation of subvocalizations

Provided is an in-ear device and associated computational support system that leverages machine learning to interpret sensor data descriptive of one or more in-ear phenomena during subvocalization by the user. An electronic device can receive sensor data generated by at least one sensor at least partially positioned within an ear of a user, wherein the sensor data was generated by the at least one sensor concurrently with the user subvocalizing a subvocalized utterance. The electronic device can then process the sensor data with a machine-learned subvocalization interpretation model to generate an interpretation of the subvocalized utterance as an output of the machine-learned subvocalization interpretation model.

Robot teaching device
11580972 · 2023-02-14 · ·

A robot teaching device includes: a display device; an operation key formed of a hard key or a soft key and including an input changeover switch; a microphone; a voice recognition section; a correspondence storage section storing each of a plurality of types of commands and a recognition target word in association with each other; a recognition target word determination section configured to determine whether a phrase represented by character information includes the recognition target word; and a command execution signal output section configured to switch, in response to the input changeover switch being operated, between a first operation in which a signal for executing the command corresponding to an operation to the operation key is outputted and a second operation in which a signal for executing the command associated with the recognition target word represented by the character information is outputted.
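
A rough sketch of the changeover behavior follows. The command tables and class name are hypothetical, and treating the two operations as mutually exclusive is an assumption; the abstract says only that the switch selects between key-driven and word-driven command signals.

```python
from typing import Dict, Optional

# Hypothetical command tables; the abstract does not list concrete commands.
KEY_COMMANDS: Dict[str, str] = {"KEY_1": "move_joint", "KEY_2": "open_hand"}
WORD_COMMANDS: Dict[str, str] = {"hand open": "open_hand", "hand close": "close_hand"}


class CommandSignalOutput:
    """Switches between key-driven and voice-driven command output."""

    def __init__(self) -> None:
        self.voice_mode = False  # assumed default: first operation (keys)

    def press_changeover(self) -> None:
        """Operating the input changeover switch toggles the active operation."""
        self.voice_mode = not self.voice_mode

    def on_key(self, key: str) -> Optional[str]:
        """First operation: output the command mapped to the operated key."""
        if self.voice_mode:
            return None
        return KEY_COMMANDS.get(key)

    def on_phrase(self, text: str) -> Optional[str]:
        """Second operation: output the command whose target word is in the phrase."""
        if not self.voice_mode:
            return None
        lowered = text.lower()
        for word, command in WORD_COMMANDS.items():
            if word in lowered:
                return command
        return None
```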

Method for exiting a voice skill, apparatus, device and storage medium

A method for exiting a voice skill, an apparatus, a device, and a storage medium are provided by embodiments of the present disclosure, wherein a user voice instruction is received; a target exit intention corresponding to the user voice instruction is identified according to the user voice instruction and a grammar rule of a preset exit intention; and a corresponding operation is executed on a current voice skill of a device according to the target exit intention. The embodiments of the present disclosure refine and expand the user's exit intention. After the target exit intention to which the user voice instruction belongs is identified, the corresponding operation is executed according to the target exit intention so as to meet the users' different exit requirements for the voice skills, enhance the fluency and convenience of user interaction with the device and improve the user's exit experience when using the voice skills.
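
As an illustration of matching an utterance against grammar rules of preset exit intentions, the sketch below uses regular expressions as a stand-in for the grammar rules. The intention names, patterns, and operations are invented for the example and are not taken from the disclosure.

```python
import re
from typing import Optional

# Hypothetical exit intentions, checked in order; more specific rules come first.
EXIT_RULES = [
    ("shutdown_device", re.compile(r"\b(turn off|shut down|power off)\b")),
    ("pause_skill", re.compile(r"\b(pause|hold on)\b")),
    ("exit_skill", re.compile(r"\b(exit|quit|stop|close)\b")),
]

# Hypothetical operation executed on the current skill for each intention.
OPERATIONS = {
    "shutdown_device": "close skill and power down",
    "pause_skill": "suspend skill, keep state",
    "exit_skill": "close skill, return to home",
}


def identify_exit_intention(utterance: str) -> Optional[str]:
    """Return the first preset exit intention whose rule matches, else None."""
    text = utterance.lower()
    for intention, pattern in EXIT_RULES:
        if pattern.search(text):
            return intention
    return None


def handle(utterance: str) -> str:
    """Map the utterance to the operation for its target exit intention."""
    intention = identify_exit_intention(utterance)
    if intention is None:
        return "continue skill"
    return OPERATIONS[intention]
```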

Generating input alternatives

Exemplary embodiments relate to a system for recovering a conversation between a user and the system when the system is unable to properly respond to a user's input. The system may process the user input and determine an error condition exists. The system may query one or more storage systems to identify candidate text data based on their semantic similarity to the user input. The storage systems may store data related to past frequently entered inputs and/or user-generated inputs. Alternative text data is selected from the candidate text data, and presented to the user for confirmation.
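
The candidate-selection step might look like the sketch below. Token-overlap (Jaccard) similarity is used here only as a simple stand-in for the semantic similarity the abstract mentions; the function names and the `min_score` cutoff are assumptions.

```python
from typing import List, Optional


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a crude substitute for semantic similarity."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def suggest_alternative(
    user_input: str, stored_inputs: List[str], min_score: float = 0.3
) -> Optional[str]:
    """Pick the stored input most similar to the failed user input, if any."""
    scored = [(jaccard(user_input, stored), stored) for stored in stored_inputs]
    candidates = sorted((pair for pair in scored if pair[0] >= min_score), reverse=True)
    return candidates[0][1] if candidates else None
```

The returned text would then be presented to the user for confirmation, for example as "Did you mean ...?".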

Configurable conversation engine for executing customizable chatbots

A conversation engine performs conversations with users using chatbots customized for performing a set of tasks that can be performed using an online system. The conversation engine loads a chatbot configuration that specifies the behavior of a chatbot including the tasks that can be performed by the chatbot, the types of entities relevant to each task, and so on. The conversation may be voice based and use natural language. The conversation engine may load different chatbot configurations to implement different chatbots. The conversation engine receives a conversation engine configuration that specifies the behavior of the conversation engine across chatbots. The system may be a multi-tenant system that allows customization of the chatbots for each tenant.
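
One way to picture a configuration-driven engine is sketched below. The configuration schema (a tenant name plus a mapping from task name to required entity types) and all class and method names are hypothetical; the abstract does not specify a format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ChatbotConfig:
    tenant: str
    tasks: Dict[str, List[str]]  # task name -> entity types relevant to the task


class ConversationEngine:
    """Runs different chatbots depending on which configuration is loaded."""

    def __init__(self, engine_config: Dict[str, object]) -> None:
        # Engine-wide settings that apply across all chatbots (assumed shape).
        self.engine_config = engine_config
        self.chatbots: Dict[str, ChatbotConfig] = {}

    def load_chatbot(self, raw: Dict[str, object]) -> ChatbotConfig:
        """Register one tenant's chatbot configuration."""
        config = ChatbotConfig(tenant=str(raw["tenant"]), tasks=dict(raw["tasks"]))
        self.chatbots[config.tenant] = config
        return config

    def missing_entities(
        self, tenant: str, task: str, provided: Dict[str, str]
    ) -> List[str]:
        """List entity types the chatbot still has to ask the user for."""
        required = self.chatbots[tenant].tasks[task]
        return [entity for entity in required if entity not in provided]
```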

Method and device for recognizing speech in vehicle

The present disclosure relates to a method and a device for recognizing speech in a vehicle. The method for recognizing the speech in the vehicle may include collecting one or more types of information, determining information to be linked with each other for speech recognition based on an information processing priority predefined corresponding to each type of the collected information, analyzing the determined information to perform the speech recognition for a signal input through a microphone, and extracting at least one of a wake up voice or a command voice through the speech recognition to control the vehicle. Therefore, the present disclosure has an advantage of more accurately performing the speech recognition by linking the various information collected in the vehicle.

Methods and systems for pushing audiovisual playlist based on text-attentional convolutional neural network
11580979 · 2023-02-14 · ·

In some embodiments, methods and systems for pushing audiovisual playlists based on a text-attentional convolutional neural network include a local voice interactive terminal, a dialog system server and a playlist recommendation engine, where the dialog system server and the playlist recommendation engine are respectively connected to the local voice interactive terminal. In some embodiments, the local voice interactive terminal includes a microphone array, a host computer connected to the microphone array, and a voice synthesis chip board connected to the microphone array. In some embodiments, the playlist recommendation engine obtains rating data based on a rating predictor constructed by the neural network; the host computer parses the data into recommended playlist information; and the voice terminal synthesizes the results and pushes them to a user in the form of voice.

Multi-user devices in a connected home environment
11580973 · 2023-02-14 · ·

A device implementing a system for providing content in response to a request includes a processor configured to receive a voice request for content associated with a home environment, the voice request corresponding to a user account, and determine, based on the voice request, not to provide the content via the device. The processor is further configured to select, in response to the determining, a second device from among multiple devices associated with the home environment, wherein the selecting is based at least in part on configuration settings associated with the home environment, and provide for the second device to output the content based on a profile of the user account.
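
The device-selection step can be sketched as follows. The `Device` fields, the video-capability check as the reason for not answering locally, and the `preferred_room` setting are assumptions standing in for the unspecified configuration settings of the home environment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    name: str
    room: str
    supports_video: bool


def select_output_device(
    requesting: Device,
    needs_video: bool,
    devices: List[Device],
    preferred_room: str,
) -> Device:
    """Choose which device in the home should output the requested content."""
    if not needs_video or requesting.supports_video:
        return requesting  # the receiving device can provide the content itself
    capable = [d for d in devices if d.supports_video and d is not requesting]
    if not capable:
        return requesting  # nothing better available; fall back
    for device in capable:
        if device.room == preferred_room:  # assumed home configuration setting
            return device
    return capable[0]
```

The selected device would then play the content using the profile of the user account that made the request.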

Selectively activating on-device speech recognition, and using recognized text in selectively activating on-device NLU and/or on-device fulfillment

Implementations can reduce the time required to obtain responses from an automated assistant by, for example, obviating the need to provide an explicit invocation to the automated assistant, such as by saying a hot-word/phrase or performing a specific user input, prior to speaking a command or query. In addition, the automated assistant can optionally receive, understand, and/or respond to the command or query without communicating with a server, thereby further reducing the time in which a response can be provided. Implementations only selectively initiate on-device speech recognition responsive to determining that one or more conditions are satisfied. Further, in some implementations, on-device NLU, on-device fulfillment, and/or resulting execution occur only responsive to determining, based on recognized text from the on-device speech recognition, that such further processing should occur. Thus, through selective activation of on-device speech processing, and/or selective activation of on-device NLU and/or on-device fulfillment, various client device resources are conserved.
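
The two-stage gating described here can be sketched as a small pipeline. Every callable below is a hypothetical placeholder supplied by the caller; the abstract does not say what the activation conditions are or how the recognized text is judged.

```python
from typing import Callable, Optional


def process_audio(
    audio: bytes,
    conditions_met: Callable[[bytes], bool],
    recognize: Callable[[bytes], str],
    should_continue: Callable[[str], bool],
    understand_and_fulfill: Callable[[str], str],
) -> Optional[str]:
    """Run each on-device stage only when the preceding gate allows it."""
    # Gate 1: start on-device speech recognition only if the conditions hold.
    if not conditions_met(audio):
        return None
    text = recognize(audio)
    # Gate 2: run on-device NLU and fulfillment only if the text warrants it.
    if not should_continue(text):
        return None
    return understand_and_fulfill(text)
```

Because each stage is skipped whenever its gate fails, the more expensive recognition and fulfillment steps run only for audio that passes both checks.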