G10L2015/225

SYSTEMS AND METHOD FOR VISUAL-AUDIO PROCESSING FOR REAL-TIME FEEDBACK
20230080660 · 2023-03-16 ·

Embodiments of the present disclosure provide for using an ensemble of trained machine learning models to perform facial detection, audio analysis, and keyword modeling for video meetings/calls between two or more users. The ensemble of trained machine learning models can process the video to divide it into video, audio, and text components, which can be provided as inputs to the machine learning models. The outputs of the trained machine learning models can be used to generate responsive feedback that is relevant to the topic of the meeting/call and/or to the engagement and emotional state of the user(s).
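As an illustrative sketch only (every model function below is a stub, not the patented algorithms), the pipeline described above splits a recording into modality components, scores each with its own model, and merges the outputs into feedback:

```python
def split_components(recording):
    """Divide a recording into its video, audio, and text (transcript) parts."""
    return recording["video"], recording["audio"], recording["text"]

def face_model(video):
    # Stub for the facial-detection model.
    return {"engagement": 0.8 if video else 0.0}

def audio_model(audio):
    # Stub for the audio-analysis model.
    return {"emotion": "positive" if audio else "unknown"}

def keyword_model(text):
    # Stub keyword model: count topic-word hits in the transcript.
    topic_words = {"budget", "deadline", "launch"}
    hits = [w for w in text.lower().split() if w in topic_words]
    return {"keywords": hits}

def generate_feedback(recording):
    """Run each component through its model and merge outputs into feedback."""
    video, audio, text = split_components(recording)
    scores = {**face_model(video), **audio_model(audio), **keyword_model(text)}
    topic = ", ".join(scores["keywords"]) or "general discussion"
    return f"Engagement {scores['engagement']:.0%}, mood {scores['emotion']}; topics: {topic}"

feedback = generate_feedback({
    "video": b"frames", "audio": b"samples",
    "text": "We must hit the launch deadline",
})
print(feedback)  # Engagement 80%, mood positive; topics: launch, deadline
```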

VEHICLE AND CONTROL METHOD THEREOF
20230081386 · 2023-03-16 ·

Provided is a vehicle, comprising: a voice receiver configured to acquire a user voice input; a speaker provided inside the vehicle; a display provided inside the vehicle; and a controller configured to control the voice receiver, the speaker and the display, wherein the controller is configured to: acquire a control command from the user voice input, wait to receive a subsequent control command for a predetermined waiting time after operating a control target based on the control command, and control at least one of the speaker or the display to provide feedback notifying that the control target is in a subsequent controllable state.
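A minimal sketch of the waiting-window behavior, not the patented implementation: after a command runs, the controller keeps the target in a "subsequent controllable" state for a fixed time, emitting feedback while the window is open. The class name, message text, and five-second window are all assumptions.

```python
WAIT_SECONDS = 5.0  # assumed predetermined waiting time

class VehicleVoiceController:
    def __init__(self, clock):
        self.clock = clock            # injectable time source for testing
        self.window_closes_at = None  # end of the subsequent-control window

    def handle_command(self, command):
        """Operate the control target, then open the waiting window."""
        result = f"executed: {command}"
        self.window_closes_at = self.clock() + WAIT_SECONDS
        return result

    def feedback(self):
        """Speaker/display notification while follow-up commands are accepted."""
        if self.window_closes_at is not None and self.clock() < self.window_closes_at:
            return "You can give a follow-up command."
        return None

now = [0.0]
ctrl = VehicleVoiceController(clock=lambda: now[0])
ctrl.handle_command("open window")
print(ctrl.feedback())   # You can give a follow-up command.
now[0] += 6.0            # the predetermined waiting time elapses
print(ctrl.feedback())   # None
```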

DETERMINING MULTILINGUAL CONTENT IN RESPONSES TO A QUERY
20230084294 · 2023-03-16 ·

Implementations relate to determining multilingual content to render at an interface in response to a user submitted query. Those implementations further relate to determining a first language response and a second language response to a query that is submitted to an automated assistant. Some of those implementations relate to determining multilingual content that includes a response to the query in both the first and second languages. Other implementations relate to determining multilingual content that includes a query suggestion in the first language and a query suggestion in the second language. Some of those implementations relate to pre-fetching results for the query suggestions prior to rendering the multilingual content.
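The suggestion-plus-pre-fetch flow above can be sketched as follows; `translate()` and `fetch_results()` are stand-ins, not a real assistant API:

```python
def translate(text, lang):
    # Stub translation: tag the text with the target language code.
    return f"[{lang}] {text}"

def fetch_results(suggestion):
    # Stub pre-fetch: pretend to retrieve results for a suggestion.
    return {"suggestion": suggestion, "results": ["..."]}

def multilingual_content(query, first_lang, second_lang):
    """Build query suggestions in both languages and pre-fetch their results."""
    suggestions = [translate(query, first_lang), translate(query, second_lang)]
    # Pre-fetch before rendering, as described above.
    prefetched = {s: fetch_results(s) for s in suggestions}
    return {"suggestions": suggestions, "prefetched": prefetched}

content = multilingual_content("weather today", "en", "es")
print(content["suggestions"])  # ['[en] weather today', '[es] weather today']
```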

Information processing apparatus and non-transitory computer readable medium storing program
11606629 · 2023-03-14 ·

An information processing apparatus includes an acquisition unit that acquires voice data and image data, a display control unit that performs control to display the image data acquired by the acquisition unit in synchronization with the voice data, a reception unit that receives a display element to be added for display to a specific character in the image data displayed by the display control unit, and a setting unit that sets a playback period in which the specific character in the voice data is played back, as a display period of the display element received by the reception unit in the image data.
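A sketch of tying a display element's visibility to the playback period of a specific character; the transcript tuple layout is an assumed data model, not the patent's:

```python
def character_periods(transcript):
    """Map each character to the (start, end) spans in which they are heard."""
    periods = {}
    for start, end, character in transcript:
        periods.setdefault(character, []).append((start, end))
    return periods

def display_period(transcript, character):
    """Show the display element whenever the character's voice is played back."""
    return character_periods(transcript).get(character, [])

# transcript entries: (start_sec, end_sec, character)
transcript = [(0.0, 3.5, "narrator"), (3.5, 7.0, "Alice"), (9.0, 12.0, "Alice")]
print(display_period(transcript, "Alice"))  # [(3.5, 7.0), (9.0, 12.0)]
```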

BACKGROUND AUDIO IDENTIFICATION FOR SPEECH DISAMBIGUATION
20230125170 · 2023-04-27 ·

Implementations relate to techniques for providing context-dependent search results. A computer-implemented method includes receiving an audio stream at a computing device during a time interval, the audio stream comprising user speech data and background audio, separating the audio stream into a first substream that includes the user speech data and a second substream that includes the background audio, identifying concepts related to the background audio, generating a set of terms related to the identified concepts, influencing a speech recognizer based on at least one of the terms related to the background audio, and obtaining a recognized version of the user speech data using the speech recognizer.
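A toy sketch of that flow, with an assumed concept-to-terms lookup and a word-overlap "recognizer" standing in for real classifiers and acoustic models:

```python
BACKGROUND_CONCEPTS = {            # assumed lookup: concept -> related terms
    "baseball broadcast": {"pitcher", "inning", "home", "run"},
    "kitchen sounds": {"recipe", "oven", "simmer"},
}

def identify_concept(background_audio):
    # Stub: a real classifier over the background substream would go here.
    return background_audio["label"]

def recognize(speech_hypotheses, bias_terms):
    """Pick the hypothesis sharing the most words with the bias terms."""
    def score(hyp):
        return sum(word in bias_terms for word in hyp.split())
    return max(speech_hypotheses, key=score)

stream = {"background": {"label": "baseball broadcast"},
          "hypotheses": ["who is the picture", "who is the pitcher"]}
concept = identify_concept(stream["background"])
terms = BACKGROUND_CONCEPTS[concept]
print(recognize(stream["hypotheses"], terms))  # who is the pitcher
```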

SYSTEM AND METHOD FOR PROVIDING A VIRTUAL SPEECH AGENT FOR SIMULATED CONVERSATIONS AND CONVERSATIONAL FEEDBACK
20220335940 · 2022-10-20 ·

A system for improving conversational skills using a virtual speech agent is disclosed, including a virtual speech agent to execute a phone call between the virtual agent and a user. The virtual speech agent and user engage in a back-and-forth conversation, wherein the virtual speech agent generates a summary and a feedback report in view of the conversation.

SYSTEMS AND METHODS TO TRANSLATE A SPOKEN COMMAND TO A SELECTION SEQUENCE

Systems and methods to translate a spoken command to a selection sequence are disclosed. Exemplary implementations may: obtain audio information representing sounds captured by a client computing platform; analyze the sounds to determine spoken terms; determine whether the spoken terms include one or more of the terms that are correlated with the commands; responsive to determining that the spoken terms are terms that are correlated with a particular command stored in the electronic storage, perform a set of operations that correspond to the particular command; responsive to determining that the spoken terms are not the terms correlated with the commands stored in the electronic storage, determine a selection sequence that causes a result subsequent to the analysis of the sounds; correlate the spoken terms with the selection sequence; store the correlation of the spoken terms with the selection sequence; and perform the selection sequence to cause the result.
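A minimal sketch of the two branches described above: known spoken terms run their stored command, while unknown terms are resolved to a selection sequence and the correlation is stored so the phrase works directly next time. The `tap:` operations are hypothetical placeholders.

```python
command_store = {"mute": ["press_mute_button"]}   # spoken terms -> operations

def resolve_selection_sequence(terms):
    # Stand-in for determining the selection sequence (e.g. observed UI taps)
    # that achieves the result the user asked for.
    return [f"tap:{t}" for t in terms.split()]

def handle_spoken_terms(terms, log):
    """Run a stored command, or learn and run a new selection sequence."""
    if terms in command_store:
        log.extend(command_store[terms])          # perform stored operations
    else:
        sequence = resolve_selection_sequence(terms)
        command_store[terms] = sequence           # store the correlation
        log.extend(sequence)                      # perform the sequence

log = []
handle_spoken_terms("mute", log)              # known command
handle_spoken_terms("open settings", log)     # learned selection sequence
print(command_store["open settings"])         # ['tap:open', 'tap:settings']
```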

Graph based prediction for next action in conversation flow

One embodiment provides a method for predicting a next action in a conversation system that includes obtaining, by a processor, information from conversation logs and a conversation design. The processor further creates a dialog graph based on the conversation design. Weights and attributes for edges in the dialog graph are determined based on the information from the conversation logs, and user input and external context information are added to an edge attribute set. An unrecognized user input is analyzed, and a next action is predicted based on dialog nodes in the dialog graph and historical paths. A guiding conversation response is generated based on the predicted next action.
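A toy version of the graph-based prediction: a weighted dialog graph (weights standing in for counts mined from conversation logs), where an unrecognized input at a node falls back to the most-travelled outgoing edge:

```python
dialog_graph = {                      # node -> {next_node: edge weight}
    "greeting": {"ask_account": 12, "ask_order": 30},
    "ask_order": {"track_order": 25, "cancel_order": 5},
}

def predict_next_action(current_node):
    """Predict the next action as the heaviest outgoing edge, if any."""
    edges = dialog_graph.get(current_node, {})
    if not edges:
        return None
    return max(edges, key=edges.get)  # historical-path proxy: heaviest edge

print(predict_next_action("greeting"))   # ask_order
```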

Preventing access to potentially hazardous environments

A method, computer system, and a computer program product for managing a plurality of electronic devices controlling access to one or more hazards in a physical environment. The present invention may include detecting a plurality of brain-wave patterns associated with a user. The present invention may then, in response to detecting motion of the user and the plurality of brain-wave patterns associated with the user matching a pattern, include operating at least one electronic device to disable access to a corresponding hazard. The present invention may further include operating the at least one electronic device to enable access to the corresponding hazard based on receiving input from the user to enable the at least one electronic device, wherein the input received from the user is in response to prompting the user to correctly respond to a question previously answered by the user.
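The gating logic can be illustrated as below; the brain-wave "pattern" is just a list of numbers with a toy tolerance match, and the stored answer is a placeholder, since real signal processing is outside the abstract's scope:

```python
STORED_PATTERN = [0.4, 0.7, 0.2]   # assumed pattern that indicates risk
SECURITY_ANSWER = "blue"           # answer to a question previously given by the user

def patterns_match(observed, stored, tol=0.1):
    """Toy matcher: element-wise comparison within a tolerance."""
    return len(observed) == len(stored) and all(
        abs(o - s) <= tol for o, s in zip(observed, stored))

class HazardLock:
    def __init__(self):
        self.access_enabled = True

    def on_sensor_update(self, brain_waves, motion_detected):
        """Disable access when motion coincides with a matching pattern."""
        if motion_detected and patterns_match(brain_waves, STORED_PATTERN):
            self.access_enabled = False

    def request_enable(self, answer):
        """Re-enable only on a correct answer to the stored question."""
        if answer == SECURITY_ANSWER:
            self.access_enabled = True
        return self.access_enabled

lock = HazardLock()
lock.on_sensor_update([0.42, 0.68, 0.25], motion_detected=True)
print(lock.access_enabled)               # False: access to the hazard disabled
lock.request_enable("blue")
print(lock.access_enabled)               # True: access restored
```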

Generating automated assistant responses and/or actions directly from dialog history and resources

Training and/or utilizing a single neural network model to generate, at each of a plurality of assistant turns of a dialog session between a user and an automated assistant, a corresponding automated assistant natural language response and/or a corresponding automated assistant action. For example, at a given assistant turn of a dialog session, both a corresponding natural language response and a corresponding action can be generated jointly and based directly on output generated using the single neural network model. The corresponding response and/or corresponding action can be generated based on processing, using the neural network model, dialog history and a plurality of discrete resources. For example, the neural network model can be used to generate a response and/or action on a token-by-token basis.
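The token-by-token joint decoding can be sketched with a scripted token stream standing in for the single neural network; the marker tokens and function-call syntax are illustrative assumptions:

```python
# Scripted "model output": special tokens route generation into either the
# natural language response or the structured action.
SCRIPT = ["<response>", "It's", "72", "degrees.", "<action>",
          "get_weather(location=here)", "<end>"]

def decode(model_tokens):
    """Accumulate response and action jointly, one token at a time."""
    response, action, target = [], [], None
    for token in model_tokens:        # one model call per token in practice
        if token == "<response>":
            target = response
        elif token == "<action>":
            target = action
        elif token == "<end>":
            break
        elif target is not None:
            target.append(token)
    return " ".join(response), " ".join(action)

response, action = decode(SCRIPT)
print(response)  # It's 72 degrees.
print(action)    # get_weather(location=here)
```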