G10L2015/221

Systems and methods to translate a spoken command to a selection sequence

Systems and methods to translate a spoken command to a selection sequence are disclosed. Exemplary implementations may: obtain audio information representing sounds captured by a client computing platform; analyze the sounds to determine spoken terms; determine whether the spoken terms include one or more of the terms that are correlated with the commands; responsive to determining that the spoken terms are terms that are correlated with a particular command stored in the electronic storage, perform a set of operations that correspond to the particular command; responsive to determining that the spoken terms are not correlated with the commands stored in the electronic storage, determine a selection sequence that causes a result subsequent to the analysis of the sounds; correlate the spoken terms with the selection sequence; store the correlation of the spoken terms with the selection sequence; and perform the selection sequence to cause the result.
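
The dispatch between stored commands and learned selection sequences described above can be sketched roughly as follows; all names (`command_store`, `determine_selection_sequence`, the tap/type operations) are illustrative stand-ins, not from the patent, and a real system would sit on top of speech recognition and UI automation.

```python
# Electronic storage: spoken terms correlated with commands (sets of operations).
command_store = {
    "open settings": ["tap_menu", "tap_settings"],
}

# Learned correlations of spoken terms with selection sequences.
learned_sequences = {}

def determine_selection_sequence(spoken_terms):
    """Stand-in for observing the selections that produce the desired result."""
    return ["tap_search", f"type:{spoken_terms}"]

def handle_spoken_terms(spoken_terms):
    if spoken_terms in command_store:
        # Terms correlate with a stored command: perform its set of operations.
        return command_store[spoken_terms]
    # Otherwise determine the selection sequence that causes the result,
    # correlate and store it, then perform it.
    sequence = determine_selection_sequence(spoken_terms)
    learned_sequences[spoken_terms] = sequence
    return sequence

print(handle_spoken_terms("open settings"))  # stored-command path
print(handle_spoken_terms("find recipes"))   # learned-sequence path
```

After the second call, "find recipes" is correlated and stored, so a later utterance of the same terms would follow the fast path.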

Image analysis for results of textual image queries

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for analyzing images for generating query responses. One of the methods includes determining, using a textual query, an image category for images responsive to the textual query, and an output type that identifies a type of requested content; selecting, using data that associates a plurality of images with a corresponding category, a subset of the images that each belong to the image category, each image in the plurality of images belonging to one of two or more categories; analyzing, using the textual query, data for the images in the subset of the images to determine images responsive to the textual query; determining a response to the textual query using the images responsive to the textual query; and providing, using the output type, the response to the textual query for presentation.
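
The pipeline above (category and output type from the query, category-based subset selection, then a response formatted per output type) can be sketched with toy data; `parse_query` is an assumed stand-in for the real query-understanding step.

```python
# Toy image index: each image belongs to one of two or more categories.
images = [
    {"id": 1, "category": "receipt", "text": "Coffee $4.50"},
    {"id": 2, "category": "screenshot", "text": "Error 404"},
    {"id": 3, "category": "receipt", "text": "Lunch $12.00"},
]

def parse_query(query):
    # Stand-in for determining the image category and requested output type.
    category = "receipt" if "receipt" in query else "screenshot"
    output_type = "list" if "show" in query else "count"
    return category, output_type

def answer(query):
    category, output_type = parse_query(query)
    # Select the subset of images belonging to the determined category.
    subset = [img for img in images if img["category"] == category]
    # (A real system would analyze image data here to narrow the subset.)
    if output_type == "count":
        return len(subset)
    return [img["id"] for img in subset]

print(answer("how many receipt photos"))  # count output type -> 2
print(answer("show my receipt photos"))   # list output type -> [1, 3]
```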

ARRANGING AND/OR CLEARING SPEECH-TO-TEXT CONTENT WITHOUT A USER PROVIDING EXPRESS INSTRUCTIONS

Implementations described herein relate to an application and/or automated assistant that can identify arrangement operations to perform for arranging text during speech-to-text operations—without a user having to expressly identify the arrangement operations. In some instances, a user that is dictating a document (e.g., an email, a text message, etc.) can provide a spoken utterance to an application in order to incorporate textual content. However, in some of these instances, certain corresponding arrangements are needed for the textual content in the document. The textual content that is derived from the spoken utterance can be arranged by the application based on an intent, vocalization features, and/or contextual features associated with the spoken utterance and/or a type of the application associated with the document, without the user expressly identifying the corresponding arrangements. In this way, the application can infer content arrangement operations from a spoken utterance that only specifies the textual content.
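
A minimal sketch of inferring arrangement operations from contextual and vocalization features, without an explicit command from the user. The feature names and thresholds here are assumptions for illustration, not the claimed implementation.

```python
def infer_arrangement(utterance, app_type, pause_s=0.0):
    """Return arrangement ops inferred from context, not explicit commands."""
    ops = []
    if pause_s > 1.5:
        ops.append("new_paragraph")        # long vocal pause -> paragraph break
    if app_type == "email" and utterance.lower().startswith(("hi", "dear")):
        ops.append("own_line_greeting")    # email greeting goes on its own line
    return ops

print(infer_arrangement("Dear Alex", "email"))          # ['own_line_greeting']
print(infer_arrangement("thanks again", "email", 2.0))  # ['new_paragraph']
```

Because the arrangement is keyed on the application type and vocalization features, the same dictated words can be arranged differently in, say, an email client versus a text-message app.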

Cooking apparatus and cooking system

Disclosed herein is a cooking system including: a cooking apparatus configured to input and output speech, transmit speech data corresponding to the speech, and cook food in a cooking chamber; a first server configured to perform communication with the cooking apparatus, and, when speech data is received from the cooking apparatus, perform speech recognition based on the received speech data, transmit response information for the speech recognition to the cooking apparatus, obtain a menu requested by a user based on the received speech data, and transmit a cooking time and a cooking temperature for the obtained menu to the cooking apparatus; and a second server configured to store information about at least one recipe for a plurality of menus, perform communication with the first server, and transmit the information about at least one recipe to the first server.
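
The two-server split can be sketched as below; the class and field names are illustrative, and the "speech recognition" step is stubbed out as simple text normalization.

```python
class RecipeServer:
    """Second server: stores recipe information for a plurality of menus."""
    recipes = {"pizza": {"time_min": 12, "temp_c": 220}}

    def get_recipe(self, menu):
        return self.recipes[menu]

class SpeechServer:
    """First server: speech recognition plus recipe lookup via the second server."""
    def __init__(self, recipe_server):
        self.recipe_server = recipe_server

    def handle_speech(self, speech_data):
        menu = speech_data.strip().lower()   # stand-in for speech recognition
        recipe = self.recipe_server.get_recipe(menu)
        # Response information plus cooking time/temperature for the apparatus.
        return {"response": f"Starting {menu}", **recipe}

server = SpeechServer(RecipeServer())
print(server.handle_speech("Pizza"))
# {'response': 'Starting pizza', 'time_min': 12, 'temp_c': 220}
```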

DISPLAY APPARATUS, VOICE ACQUIRING APPARATUS AND VOICE RECOGNITION METHOD THEREOF

Disclosed are a display apparatus, a voice acquiring apparatus and a voice recognition method thereof, the display apparatus including: a display unit which displays an image; a communication unit which communicates with a plurality of external apparatuses; and a controller which includes a voice recognition engine to recognize a user's voice, receives a voice signal from a voice acquiring unit, and controls the communication unit to receive candidate instruction words from at least one of the plurality of external apparatuses to recognize the received voice signal.
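
Constraining recognition to candidate instruction words gathered from external apparatuses can be sketched as follows; the fuzzy matching via `difflib` is an assumed stand-in for the voice recognition engine.

```python
import difflib

def collect_candidates(external_apparatuses):
    """Gather candidate instruction words from the external apparatuses."""
    words = []
    for apparatus in external_apparatuses:
        words.extend(apparatus["candidate_words"])
    return words

def recognize(voice_text, candidates):
    # Pick the candidate instruction word closest to the decoded voice signal.
    matches = difflib.get_close_matches(voice_text, candidates, n=1, cutoff=0.4)
    return matches[0] if matches else None

apparatuses = [
    {"name": "air_conditioner", "candidate_words": ["cool", "warm", "fan"]},
    {"name": "tv", "candidate_words": ["volume up", "channel up"]},
]
print(recognize("volume upp", collect_candidates(apparatuses)))  # 'volume up'
```

Restricting the search space to candidate words supplied by the connected apparatuses is what lets a noisy decoding still resolve to a valid instruction.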

ROBUST USEFUL AND GENERAL TASK-ORIENTED VIRTUAL ASSISTANTS

The disclosure deals with a system and method for improved general task-oriented virtual assistants (VAs). The presently disclosed framework incorporates discovery of knowledge from online sources to accomplish tasks (open world), user-specific knowledge for personalization, and domain-specific knowledge for context adaptation to recommend and assist users with procedural tasks such as cooking and Do-it-Yourself (DIY) tasks. The approach also focuses on content curation for fault-tolerant execution to ensure the end goal is reached despite common failures.

Display apparatus and method for registration of user command

An apparatus including a user input receiver; a user voice input receiver; a display; and a processor. The processor is configured to: (a) based on a user input being received through the user input receiver, perform a function corresponding to a voice input state for receiving a user voice input; (b) receive a user voice input through the user voice input receiver; (c) identify whether or not a text corresponding to the received user voice input is related to a pre-registered voice command or a prohibited expression; and (d) based on the text being related to the pre-registered voice command or the prohibited expression, control the display to display an indicator that the text is related to the pre-registered voice command or the prohibited expression. A method and non-transitory computer-readable medium are also provided.
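
Steps (c) and (d) above amount to a membership check followed by an indicator; a minimal sketch, with assumed example commands and expressions:

```python
# Assumed example data; the patent does not specify particular commands.
registered_commands = {"turn on tv", "mute"}
prohibited = {"format disk"}

def check_registration(text):
    """Classify dictated text for the on-screen indicator of step (d)."""
    text = text.lower().strip()
    if text in registered_commands:
        return "related_to_registered_command"
    if text in prohibited:
        return "prohibited_expression"
    return "ok_to_register"

print(check_registration("Mute"))         # related_to_registered_command
print(check_registration("format disk"))  # prohibited_expression
print(check_registration("movie mode"))   # ok_to_register
```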

Contact resolution for communications systems

Methods and systems for performing contact resolution are described herein. When initiating a communications session using a voice activated electronic device, a contact name may be resolved to determine an appropriate contact to which the communications session may be directed. Contacts from an individual's contact list may be queried to determine a listing of probable contacts associated with the contact name, and contact identifiers associated with each contact may be determined. Using one or more rules for disambiguating between similar contact names, a single contact may be identified, and a communications session with that contact may be initiated.
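
A sketch of the resolution flow: query the contact list for probable matches, then apply a disambiguation rule to pick a single contact. The frequency-based rule below is an assumed example, not the patent's actual rule set.

```python
# Assumed example contact list with call-frequency metadata.
contacts = [
    {"id": "c1", "name": "John Smith", "calls": 12},
    {"id": "c2", "name": "John Doe", "calls": 3},
    {"id": "c3", "name": "Jane Smith", "calls": 7},
]

def resolve_contact(spoken_name):
    """Resolve a spoken name to a single contact identifier."""
    probable = [c for c in contacts
                if spoken_name.lower() in c["name"].lower()]
    if not probable:
        return None
    # Disambiguation rule (assumed): prefer the most frequently called contact.
    return max(probable, key=lambda c: c["calls"])["id"]

print(resolve_contact("John"))  # 'c1' — beats 'John Doe' on call frequency
```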

SERVER, SPECIFYING SYSTEM, SPECIFYING METHOD, AND STORAGE MEDIUM STORING SPECIFYING PROGRAM
20230095266 · 2023-03-30

In order to specify, through a simple operation, members of an organization to which a user belongs, this server comprises: an extraction means which extracts an instruction on the basis of a voice input by the user belonging to the organization; a processing means which processes the instruction on the basis of the attributes of the user in the organization; and a specifying means which specifies at least one among a plurality of the members belonging to the organization on the basis of the processed instruction.

DISPLAY APPARATUS AND METHOD FOR REGISTRATION OF USER COMMAND

A display apparatus includes an input unit configured to receive a user command; an output unit configured to output a registration suitability determination result for the user command; and a processor configured to generate phonetic symbols for the user command, analyze the generated phonetic symbols to determine registration suitability for the user command, and control the output unit to output the registration suitability determination result for the user command. Therefore, the display apparatus may register a user command which is resistant to misrecognition and guarantees a high recognition rate among user commands defined by a user.
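
The suitability check can be sketched as a phonetic-similarity comparison against already-registered commands. The consonant-skeleton "phonetic symbols" and the 0.8 threshold below are crude illustrative assumptions; a real implementation would use grapheme-to-phoneme conversion.

```python
import difflib

def phonetic_symbols(command):
    # Crude stand-in for real grapheme-to-phoneme conversion: keep consonants.
    return "".join(ch for ch in command.lower()
                   if ch.isalpha() and ch not in "aeiou")

registered = {"play music"}

def registration_suitability(command, threshold=0.8):
    """Flag commands too phonetically confusable with registered ones."""
    new = phonetic_symbols(command)
    for existing in registered:
        score = difflib.SequenceMatcher(
            None, new, phonetic_symbols(existing)).ratio()
        if score >= threshold:
            return f"unsuitable: too similar to '{existing}'"
    return "suitable"

print(registration_suitability("play musik"))   # near-identical phonetics
print(registration_suitability("open camera"))  # suitable
```

Rejecting confusable commands at registration time is what lets the apparatus guarantee a high recognition rate for the commands it does accept.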