Patent classifications
G10L15/1822
Systems and methods to automatically perform actions based on media content
Systems and methods are provided for automatically responding to network connectivity issues in a media stream. One example method includes transmitting, from a first computing device, a media stream to one or more secondary computing devices. A network connectivity issue between the first computing device and one or more of the secondary computing devices is detected. If a network connectivity issue is detected, a notification is transmitted to one or more of the secondary computing devices.
Enabling speech interactions on web-based user interfaces
Web content with a speech interaction user interface capability is provided. Interactable elements of the web content are identified. For each of the interactable elements, one or more associated identifiers are determined and associated with a corresponding interactable element of the identified interactable elements in a data structure. A speech input is received from a user. Using the data structure, one of the interactable elements is matched to the received speech input. An action is automatically performed on the matched interactable element.
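The matching step described above can be sketched as follows. This is a minimal illustration, not the patented method: the `Interactable` class, the `build_index` and `match_speech` helpers, and the substring-based matching rule are all hypothetical, and the speech input is assumed to have already been transcribed to text.

```python
from dataclasses import dataclass, field


@dataclass
class Interactable:
    """Hypothetical stand-in for an interactable element of web content."""
    element_id: str
    action: str  # e.g. "click" or "focus"
    identifiers: list = field(default_factory=list)


def build_index(elements):
    """Associate each lower-cased identifier with its element (the data structure)."""
    index = {}
    for element in elements:
        for identifier in element.identifiers:
            index[identifier.lower()] = element
    return index


def match_speech(index, transcript):
    """Return the element whose identifier occurs in the transcript, or None.

    Longer identifiers are tried first so that "search box" wins over "search".
    Substring matching is an assumption made for this sketch.
    """
    text = transcript.lower()
    for identifier in sorted(index, key=len, reverse=True):
        if identifier in text:
            return index[identifier]
    return None


elements = [
    Interactable("btn-submit", "click", ["submit", "send form"]),
    Interactable("input-search", "focus", ["search", "search box"]),
]
index = build_index(elements)
matched = match_speech(index, "Please click Submit")
if matched is not None:
    # A real page would dispatch the action to the element here.
    print(f"{matched.action} -> {matched.element_id}")
```

A production system would likely use fuzzy or phonetic matching rather than exact substrings, since speech recognition output rarely matches on-page labels word for word.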
Determining topics and action items from conversations
Embodiments are directed to organizing conversation information. Two or more machine learning (ML) models and a plurality of sentences provided from a conversation may be employed to generate an insight score for each sentence such that each insight score correlates to a probability that its sentence includes one or more of an action or a question. In response to one or more sentences having insight scores that exceed a threshold value, an information score and a definiteness score may be determined for the one or more sentences. One or more insights associated with the conversation may then be generated based on the one or more sentences. A report may be generated that associates the one or more insights with the one or more portions of the conversation that include the sentences associated with those insights.
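The scoring-and-threshold step can be sketched as below. Everything here is assumed for illustration: the two toy "models" are keyword heuristics standing in for trained ML models, combining them with `max` is one possible choice, and the threshold of 0.5 is arbitrary. The sketch stops at candidate selection and does not attempt the information or definiteness scores.

```python
def question_model(sentence):
    """Toy stand-in for an ML model: probability the sentence is a question."""
    return 0.9 if sentence.strip().endswith("?") else 0.1


def action_model(sentence):
    """Toy stand-in for an ML model: probability the sentence states an action."""
    cues = ("i will", "we need to", "please")
    return 0.8 if any(cue in sentence.lower() for cue in cues) else 0.1


def insight_scores(sentences, models):
    """Score each sentence with every model and keep the highest probability.

    Taking the maximum is an assumption of this sketch.
    """
    return [max(model(sentence) for model in models) for sentence in sentences]


def select_candidates(sentences, models, threshold=0.5):
    """Return (sentence, score) pairs whose insight score exceeds the threshold."""
    scores = insight_scores(sentences, models)
    return [(s, score) for s, score in zip(sentences, scores) if score > threshold]


sentences = [
    "Nice weather today.",
    "Can you send the report by Friday?",
    "I will schedule the follow-up call.",
]
for sentence, score in select_candidates(sentences, [question_model, action_model]):
    print(f"{score:.1f}  {sentence}")
```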
Systems and methods for parsing multiple intents in natural language speech
A system for parsing separate intents in natural language speech is configured to (i) receive, from a user computer device, a verbal statement of a user including a plurality of words; (ii) translate the verbal statement into text; (iii) label each of the plurality of words in the verbal statement; (iv) detect one or more potential splits in the verbal statement; (v) divide the verbal statement into a plurality of intents based upon the one or more potential splits; and (vi) generate a response based upon the plurality of intents.
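Steps (iii) through (v) can be sketched as follows, starting from text that has already been transcribed. The labeling scheme and the list of split cues (`SPLIT_WORDS`) are hypothetical; the source does not say how words are labeled or how splits are detected.

```python
SPLIT_WORDS = {"and", "then", "also"}  # assumed cues for a potential split


def label_words(text):
    """Label each word as a potential SPLIT point or an ordinary WORD."""
    labeled = []
    for raw in text.split():
        word = raw.strip(",.")
        label = "SPLIT" if word.lower() in SPLIT_WORDS else "WORD"
        labeled.append((word, label))
    return labeled


def split_intents(text):
    """Divide the statement into intent phrases at the labeled split points."""
    intents, current = [], []
    for word, label in label_words(text):
        if label == "SPLIT":
            if current:  # consecutive split words produce no empty intent
                intents.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:
        intents.append(" ".join(current))
    return intents


print(split_intents("Check my balance and then pay my electric bill"))
```

Splitting on every cue word is deliberately naive: a phrase such as "salt and pepper" would be divided incorrectly, which is presumably why the abstract speaks of potential splits that still have to be confirmed.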
State machine based context-sensitive system for managing multi-round dialog
The present invention discloses a state machine based context-sensitive multi-round dialog management system, comprising: an input module, for receiving multi-modal input information from a user; an intention identification engine module, for identifying intention information in the multi-modal input information; an intention module, for bringing multiple intention information identified by the intention identification engine module into one-to-one correspondence with multiple intention sub-modules at back ends; a state machine module, comprising a plurality of state machines for managing a relevant context in the dialog management system and providing support for an output result; an instruction parsing engine module, comprising a plurality of instruction parsing engine sub-modules for parsing corresponding intention information and acquiring the parsed multiple intention information; and an output module, for acquiring policy information according to the results from the parsing engine module and the intention identification module, and transmitting the policy information to the state machine module.
Reducing the need for manual start/end-pointing and trigger phrases
Systems and processes for selectively processing and responding to a spoken user input are provided. In one example, audio input containing a spoken user input can be received at a user device. The spoken user input can be identified from the audio input by identifying start and end-points of the spoken user input. It can be determined whether or not the spoken user input was intended for a virtual assistant based on contextual information. The determination can be made using a rule-based system or a probabilistic system. If it is determined that the spoken user input was intended for the virtual assistant, the spoken user input can be processed and an appropriate response can be generated. If it is instead determined that the spoken user input was not intended for the virtual assistant, the spoken user input can be ignored and/or no response can be generated.
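A rule-based version of this determination might look like the sketch below. The contextual signals in `Context`, the ten-second follow-up window, and the rules themselves are all hypothetical; the abstract says only that contextual information is used, without listing which signals.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical contextual signals; the source does not enumerate them."""
    seconds_since_last_response: float
    user_facing_device: bool


def intended_for_assistant(ctx, follow_up_window=10.0):
    """Rule-based decision on whether the spoken input targets the assistant."""
    # Rule 1 (assumed): speech shortly after a response is treated as a follow-up.
    if ctx.seconds_since_last_response <= follow_up_window:
        return True
    # Rule 2 (assumed): otherwise require that the user is facing the device.
    return ctx.user_facing_device


def handle(spoken_input, ctx):
    """Process the input if it was intended for the assistant, else ignore it."""
    if intended_for_assistant(ctx):
        return f"processing: {spoken_input}"
    return None  # input ignored, no response generated


print(handle("what about tomorrow", Context(4.0, False)))
print(handle("pass the salt", Context(120.0, False)))
```

A probabilistic variant would replace the hard rules with a likelihood computed from the same signals and compared against a threshold.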
Concept-based search and categorization
A system and method for concept-based search and categorization use a lexical database to expand a search term into a set of concepts and related terms, then search stemmed or lemmatized text from a call transcription, email, or chat message to categorize that text based on these concepts.
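The expand-then-match flow can be sketched as below. The `LEXICON` dictionary is a toy stand-in for a real lexical database, and `stem` is a deliberately crude suffix stripper used only for illustration; a real system would use an established stemmer or lemmatizer.

```python
import re

# Toy stand-in for a lexical database: search term -> related terms (assumed data).
LEXICON = {
    "refund": {"refund", "reimbursement", "chargeback"},
    "cancel": {"cancel", "terminate", "discontinue"},
}


def stem(word):
    """Crude suffix stripping; keeps at least four characters of the word."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def categorize(text, search_terms):
    """Return the search terms whose concept set matches the stemmed text."""
    tokens = {stem(word) for word in re.findall(r"[a-z]+", text.lower())}
    categories = []
    for term in search_terms:
        related = LEXICON.get(term, {term})  # fall back to the term itself
        if {stem(r) for r in related} & tokens:
            categories.append(term)
    return categories


message = "The customer requested a reimbursement after canceling the plan"
print(categorize(message, ["refund", "cancel", "upgrade"]))
```

The message never contains the word "refund", yet it is categorized under that term because "reimbursement" belongs to the same concept set.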
Locally distributed keyword detection
In one aspect, a playback device includes at least one microphone configured to detect a voice input and generate sound input data. The playback device detects a first command keyword in the detected sound and, in response, makes a first determination, via a first local natural language unit (NLU), whether the sound input data includes at least one keyword within a first predetermined library of keywords. The playback device receives an indication of a second determination made by a second NLU that the sound input data includes at least one keyword from a second predetermined library of keywords. The playback device compares the results of the first determination and the second determination and, based on the comparison, forgoes further processing of the sound input data.
Electronic device and method for providing conversational service
A method, performed by an electronic device, of providing a conversational service includes: receiving an utterance input; identifying a temporal expression representing a time in a text obtained from the utterance input; determining a time point related to the utterance input based on the temporal expression; selecting a database corresponding to the determined time point from among a plurality of databases storing information about a conversation history of a user using the conversational service; interpreting the text based on information about the conversation history of the user, the conversation history information being acquired from the selected database; generating a response message to the utterance input based on a result of the interpreting; and outputting the generated response message.
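The database-selection step can be sketched as follows. The phrase table `TEMPORAL_OFFSETS`, the split of the conversation history into "recent" and "archive" stores, and the three-day cutoff are assumptions made for this example; the source does not specify how the databases are partitioned.

```python
from datetime import date, timedelta

# Assumed temporal expressions and their offsets in days before "now".
TEMPORAL_OFFSETS = {"today": 0, "yesterday": 1, "last week": 7}


def find_time_point(text, now):
    """Return the date referred to by a known temporal expression, or None."""
    lowered = text.lower()
    for phrase, days in TEMPORAL_OFFSETS.items():
        if phrase in lowered:
            return now - timedelta(days=days)
    return None


def select_database(time_point, now, databases, recent_days=3):
    """Choose the history store that covers the determined time point."""
    if time_point is None or (now - time_point).days <= recent_days:
        return databases["recent"]
    return databases["archive"]


now = date(2024, 5, 10)
databases = {
    "recent": {date(2024, 5, 9): "asked about pizza places"},
    "archive": {date(2024, 5, 3): "booked a dentist appointment"},
}
utterance = "What did I book last week?"
time_point = find_time_point(utterance, now)
history = select_database(time_point, now, databases)
print(history.get(time_point, "no history for that time"))
```

Interpreting the utterance against only the selected store keeps the lookup small, which appears to be the point of selecting a database before interpreting the text.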
Method and apparatus for speech analysis
Disclosed are a method and an apparatus for speech analysis. The speech analysis apparatus and a server are capable of communicating with each other in a 5G communication environment by executing mounted artificial intelligence (AI) algorithms and/or machine learning algorithms. The speech analysis method and apparatus may collect and analyze speech data to build a database of structured speech data.