Patent classifications
G10L2015/223
SECOND TRIGGER PHRASE USE FOR DIGITAL ASSISTANT BASED ON NAME OF PERSON AND/OR TOPIC OF DISCUSSION
In one aspect, a device may include at least one processor and storage accessible to the at least one processor. The storage includes instructions executable by the at least one processor to correlate a first trigger phrase for a digital assistant to the name of a person in proximity to the device and/or to a topic of discussion. Based on the correlation, the instructions are executable to set the digital assistant to decline to monitor for utterance of the first trigger phrase and to instead monitor for utterance of a second trigger phrase different from the first.
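The switching logic in this abstract can be sketched in a few lines. Everything here (the class name, the example phrases, the conflict set, and the simple set-intersection rule) is an illustrative assumption, not taken from the patent itself:

```python
# Hypothetical sketch of trigger-phrase switching: when the first trigger
# phrase is correlated with a nearby person's name or the current topic of
# discussion, the assistant monitors for a second phrase instead.
class TriggerPhraseSelector:
    def __init__(self, default_phrase, alternate_phrase, conflicts):
        self.default_phrase = default_phrase      # first trigger phrase, e.g. "alexa"
        self.alternate_phrase = alternate_phrase  # second trigger phrase, e.g. "computer"
        self.conflicts = conflicts                # names/topics correlated with the first phrase

    def active_phrase(self, nearby_names, discussion_topics):
        # Decline to monitor for the first phrase if the context conflicts with it.
        context = {n.lower() for n in nearby_names} | {t.lower() for t in discussion_topics}
        if context & self.conflicts:
            return self.alternate_phrase
        return self.default_phrase

selector = TriggerPhraseSelector("alexa", "computer",
                                 conflicts={"alexa", "voice assistants"})
print(selector.active_phrase(["Alexa", "Bob"], []))  # person named Alexa nearby
print(selector.active_phrase(["Bob"], ["weather"]))  # no conflict
```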
Voice interaction scripts
This disclosure describes systems and methods that identify activities for which scripts can be built, so that an activity is performed when requested by a user. Each script can be voice-activated by a defined, customized voice command and can include delivery preferences. The user's identity can be verified by analyzing voice biometrics of the customized voice command. After the activity is performed, results can be delivered to the user's device in the format indicated in the script.
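The script flow described here can be outlined as a minimal sketch. The field names, the mocked biometric check, and the example script contents are all assumptions for illustration, not details from the disclosure:

```python
# Hedged sketch: a script is bound to a customized voice command and a
# delivery format; the speaker is verified by voice biometrics before the
# activity runs, and the result carries the script's delivery format.
def run_script(script, spoken_command, voiceprint, verify_biometrics):
    if spoken_command != script["command"]:
        return None  # not this script's voice command
    if not verify_biometrics(voiceprint, script["owner_voiceprint"]):
        return None  # speaker identity not verified
    result = script["activity"]()
    return {"format": script["format"], "result": result}

script = {
    "command": "morning briefing",
    "owner_voiceprint": "vp-alice",   # stored biometric reference (assumed)
    "format": "audio",                # delivery preference
    "activity": lambda: "3 new messages",
}
out = run_script(script, "morning briefing", "vp-alice",
                 verify_biometrics=lambda a, b: a == b)
print(out)
```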
Enabling speech interactions on web-based user interfaces
Web content with a speech interaction user interface capability is provided. Interactable elements of the web content are identified. For each of the interactable elements, one or more associated identifiers are determined and associated with a corresponding interactable element of the identified interactable elements in a data structure. A speech input is received from a user. Using the data structure, one of the interactable elements is matched to the received speech input. An action is automatically performed on the matched interactable element.
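The data structure this abstract describes (identifiers associated with interactable elements, then matched against speech input) can be sketched as a simple lookup table. The substring-matching rule and all names are simplifying assumptions:

```python
# Illustrative sketch: build an identifier -> element index for a page's
# interactable elements, then match recognized speech against it.
def build_identifier_map(elements):
    # elements: list of (element_id, [identifiers]) pairs
    index = {}
    for element_id, identifiers in elements:
        for ident in identifiers:
            index[ident.lower()] = element_id
    return index

def match_speech(index, speech_input):
    # Return the element whose identifier appears in the recognized speech.
    text = speech_input.lower()
    for ident, element_id in index.items():
        if ident in text:
            return element_id
    return None

index = build_identifier_map([
    ("btn-submit", ["submit", "send form"]),
    ("link-home", ["main page"]),
])
print(match_speech(index, "please submit the form"))  # matches btn-submit
```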
Electronic apparatus and controlling method thereof
An electronic apparatus includes a memory, a communication interface, and a processor. The processor is configured to receive, from an external device through the communication interface, information corresponding to a user voice input obtained by the external device, and to perform a function corresponding to trigger recognition on the user voice input based on trigger information stored in the memory. When trigger recognition determines that the user voice input does not include the trigger corresponding to the trigger information, the processor performs a function corresponding to voice recognition on the user voice input, based on the information received from the external device. The information corresponding to the user voice input includes similarity information between the user voice input obtained by the external device and the trigger information.
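The decision flow here hinges on the similarity information sent by the external device. A minimal sketch follows; the threshold value and function names are assumptions, since the abstract does not specify how similarity is evaluated:

```python
# Hedged sketch: if the reported similarity between the voice input and
# the stored trigger information is high enough, run the trigger-
# recognition function; otherwise fall back to general voice recognition.
TRIGGER_THRESHOLD = 0.8  # assumed cutoff, not specified in the abstract

def handle_voice_input(similarity, run_trigger_function, run_voice_function):
    if similarity >= TRIGGER_THRESHOLD:
        return run_trigger_function()  # input likely contains the trigger
    return run_voice_function()        # treat as a general voice command

result = handle_voice_input(
    0.95,
    run_trigger_function=lambda: "trigger recognized",
    run_voice_function=lambda: "voice command processed",
)
print(result)
```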
Customizing search results in a multi-content source environment
Described herein are various embodiments for customizing search results in a multi-content source environment. An embodiment operates by receiving input corresponding to a search from a user and retrieving a content history indicating which content was previously viewed by the user. It is determined that the content of the content history is organized into one or more preconfigured categories. A new category of content is generated based on the content history for the user. The content of the content history for the user is arranged based on both the new category and at least a subset of the one or more preconfigured categories. The arranged content is displayed in a manner customized to the user.
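The category step can be sketched briefly. The rule used to derive the new category (most frequent genre in the history) is a stand-in assumption; the abstract does not say how the new category is generated:

```python
# Illustrative sketch: derive a new user-specific category from the
# viewing history, then arrange it alongside the preconfigured categories.
from collections import Counter

def arrange_categories(content_history, preconfigured):
    # content_history: list of (title, genre) pairs
    genre_counts = Counter(genre for _, genre in content_history)
    new_category = genre_counts.most_common(1)[0][0]  # assumed: dominant genre
    # New category first, then the remaining preconfigured categories.
    return [new_category] + [c for c in preconfigured if c != new_category]

history = [("A", "comedy"), ("B", "comedy"), ("C", "drama")]
print(arrange_categories(history, ["drama", "action", "comedy"]))
```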
Systems and methods for parsing multiple intents in natural language speech
A system for parsing separate intents in natural language speech is configured to: (i) receive, from a user computer device, a verbal statement of the user including a plurality of words; (ii) translate the verbal statement into text; (iii) label each of the plurality of words in the verbal statement; (iv) detect one or more potential splits in the verbal statement; (v) divide the verbal statement into a plurality of intents based upon the one or more potential splits; and (vi) generate a response based upon the plurality of intents.
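Steps (iv) and (v) can be sketched as follows. A real system would use a trained tagger to label words and detect splits; treating coordinating conjunctions as split points is a simplifying assumption for illustration:

```python
# Minimal sketch: detect potential splits (here, conjunctions) and divide
# the statement into separate intents at those points.
SPLIT_WORDS = {"and", "then", "also"}  # assumed split markers

def split_intents(verbal_statement):
    words = verbal_statement.lower().split()
    intents, current = [], []
    for word in words:
        if word in SPLIT_WORDS and current:
            intents.append(" ".join(current))  # close the current intent
            current = []
        else:
            current.append(word)
    if current:
        intents.append(" ".join(current))
    return intents

print(split_intents("check my balance and pay my bill"))
# -> ['check my balance', 'pay my bill']
```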
Voice controlled remote thermometer
A wireless or remote thermometer connected with an artificial intelligence (AI) system. The thermometer may be in communication with a voice-activated AI system, such as a cloud-based AI system implemented on a smart audio interface, to control its operation. User-accessible controls for the thermometer may be activated using the voice-activated AI system on the audio interface. The thermometer may include a wireless transceiver for communication with a user, and collects temperature measurement data to remotely monitor the temperature of food or other materials. The thermometer connects and communicates wirelessly with a receiver unit, such as a user's smartphone, tablet, or other computerized device. The thermometer unit sends data, alerts, or notifications to the user via the designated receiver, smart device, and/or audio interface. Communication between the thermometer and receiver unit may occur through one or more communication pathways, which may be selected to ensure delivery to the user's device.
Personal Voice-Based Information Retrieval System
The present invention relates to a system for retrieving information from a network such as the Internet. A user creates a user-defined record in a database that identifies an information source, such as a web site, containing information of interest to the user. This record identifies the location of the information source and also contains a recognition grammar based upon a speech command assigned by the user. Upon receiving from the user a speech command that matches the recognition grammar, a network interface system accesses the information source and retrieves the information requested by the user.
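The user-defined record can be sketched as a small lookup. The field names are assumptions, and the recognition grammar is simplified here to a set of accepted phrases rather than a full grammar:

```python
# Illustrative sketch: each record pairs an information-source location
# with a recognition grammar (simplified to a set of accepted phrases).
records = [
    {"location": "https://example.com/weather",
     "grammar": {"get weather", "weather report"}},
    {"location": "https://example.com/news",
     "grammar": {"get news", "headlines"}},
]

def lookup_source(speech_command):
    # Return the location of the record whose grammar accepts the command.
    for record in records:
        if speech_command.lower() in record["grammar"]:
            return record["location"]
    return None

print(lookup_source("Get Weather"))  # -> https://example.com/weather
```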
SYSTEM AND METHOD FOR CONTINUOUS MULTIMODAL SPEECH AND GESTURE INTERACTION
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for processing multimodal input. A system configured to practice the method continuously monitors an audio stream associated with a gesture input stream, and detects a speech event in the audio stream. Then the system identifies a temporal window associated with a time of the speech event, and analyzes data from the gesture input stream within the temporal window to identify a gesture event. The system processes the speech event and the gesture event to produce a multimodal command. The gesture in the gesture input stream can be directed to a display, but is remote from the display. The system can analyze the data from the gesture input stream by calculating an average of gesture coordinates within the temporal window.
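The temporal-window analysis, including the coordinate averaging the abstract mentions, can be sketched as below. The window size and the gesture-stream data layout are assumptions:

```python
# Hedged sketch: given a speech event time, gather gesture samples inside
# a temporal window around it and average their coordinates.
def average_gesture(gesture_stream, speech_time, window=0.5):
    # gesture_stream: list of (timestamp, x, y) samples
    in_window = [(x, y) for t, x, y in gesture_stream
                 if abs(t - speech_time) <= window]
    if not in_window:
        return None  # no gesture event within the temporal window
    n = len(in_window)
    return (sum(x for x, _ in in_window) / n,
            sum(y for _, y in in_window) / n)

stream = [(0.9, 10, 20), (1.1, 30, 40), (3.0, 100, 100)]
print(average_gesture(stream, speech_time=1.0))  # -> (20.0, 30.0)
```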
DEVICE INCLUDING SPEECH RECOGNITION FUNCTION AND METHOD OF RECOGNIZING SPEECH
A device including a speech recognition function which recognizes speech from a user includes: a loudspeaker which outputs speech to a space; a microphone which collects speech in the space; a first speech recognition unit which recognizes the speech collected by the microphone; a command control unit which issues a command for controlling the device, based on the speech recognized by the first speech recognition unit; and a control unit which prohibits the command control unit from issuing the command, based on the speech to be output from the loudspeaker.
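The suppression logic (blocking commands triggered by the device's own loudspeaker output) can be sketched as a minimal controller. The exact matching rule used here, a plain text comparison, is a simplifying assumption:

```python
# Hedged sketch: prohibit command issuance when the recognized speech
# matches what the device is currently outputting through its loudspeaker.
class CommandController:
    def __init__(self):
        self.currently_playing = None  # text being output by the loudspeaker

    def start_playback(self, text):
        self.currently_playing = text

    def stop_playback(self):
        self.currently_playing = None

    def issue_command(self, recognized_speech):
        # Suppress if the speech originates from the device's own output.
        if self.currently_playing and recognized_speech in self.currently_playing:
            return None
        return f"COMMAND:{recognized_speech}"

ctrl = CommandController()
ctrl.start_playback("turn off the lights at nine")
print(ctrl.issue_command("turn off the lights"))  # suppressed -> None
ctrl.stop_playback()
print(ctrl.issue_command("turn off the lights"))
```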