G10L15/083

Method and apparatus for operating smart terminal

Embodiments of the present disclosure relate to a method and apparatus for operating a smart terminal. The method for operating a smart terminal includes: parsing a currently inputted user voice command into a current text command; querying, from a historical text command record set corresponding to a historical user voice command set, a historical text command record matching the current text command; parsing the queried historical text command record to obtain a user's intention; sending a user request generated based on the user's intention to a resource database; and commanding an action component to take an action based on a response returned from the resource database and the user's intention.
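The lookup and dispatch steps of this abstract can be illustrated with a minimal Python sketch. It is not the claimed method: the abstract does not say how a match is found, so fuzzy string matching with the standard-library `difflib` is an assumption, and the names `HISTORY`, `find_historical_record`, `handle_command` and `resource_db` are hypothetical.

```python
import difflib

# Hypothetical history: earlier text commands mapped to already-parsed intentions.
HISTORY = {
    "play some jazz music": {"intent": "play_music", "genre": "jazz"},
    "what is the weather today": {"intent": "get_weather", "day": "today"},
}


def find_historical_record(current_text, history, cutoff=0.6):
    """Return (historical_text, intention) for the closest record, or None.

    Fuzzy matching is an assumption; the abstract only says "matching".
    """
    matches = difflib.get_close_matches(current_text, list(history), n=1, cutoff=cutoff)
    if not matches:
        return None
    return matches[0], history[matches[0]]


def handle_command(current_text, history, resource_db):
    """Look up the intention, query the resource database, describe the action."""
    record = find_historical_record(current_text, history)
    if record is None:
        return None  # no usable historical record
    _, intention = record
    response = resource_db(intention)  # user request generated from the intention
    return {"action": intention["intent"], "payload": response}
```

Here `resource_db` is any callable that takes the intention and returns a response, for example `handle_command("play some jazz music", HISTORY, lambda intention: ["track"])`.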

Determining whether to automatically resume first automated assistant session upon cessation of interrupting second session

Determining whether, upon cessation of a second automated assistant session that interrupted and supplanted a prior first automated assistant session: (1) to automatically resume the prior first automated assistant session, or (2) to transition to an alternative automated assistant state in which the prior first session is not automatically resumed. Implementations further relate to selectively causing, based on the determining and upon cessation of the second automated assistant session, either the automatic resumption of the prior first automated assistant session that was interrupted, or the transition to the state in which the first session is not automatically resumed.

Audio content recognition method and apparatus, and device and computer-readable medium

Embodiments of the present disclosure provide an audio content recognition method and apparatus, an electronic device and a non-transitory computer-readable medium. A specific implementation of the method includes: obtaining a voice fragment collection and a non-voice fragment collection by segmenting audio; determining a type and language information of each voice fragment in the voice fragment collection; and obtaining, for each voice fragment in the voice fragment collection, a first recognition result by performing voice recognition on the voice fragment based on the type and the language information of the voice fragment. In the implementation, speaking and music fragments in the audio are recognized by different models, so that both kinds of audio content may be recognized more accurately. Moreover, audio in different languages is recognized by different models, thereby further improving the voice recognition effect.
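The routing idea in this abstract, one recognizer per combination of fragment type and language, can be sketched as follows. This is only an illustration under assumptions: segmentation and classification are taken as already done, the labels `"speech"` and `"singing"` are made up for the example, and `Fragment`, `split_fragments` and `recognize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Fragment:
    start: float          # seconds
    end: float            # seconds
    is_voice: bool
    kind: str = ""        # assumed labels: "speech" or "singing"
    language: str = ""    # e.g. "en", "fr"


Recognizer = Callable[[Fragment], str]


def split_fragments(fragments: List[Fragment]) -> Tuple[List[Fragment], List[Fragment]]:
    """Separate the voice fragment collection from the non-voice collection."""
    voice = [f for f in fragments if f.is_voice]
    non_voice = [f for f in fragments if not f.is_voice]
    return voice, non_voice


def recognize(voice_fragments: List[Fragment],
              recognizers: Dict[Tuple[str, str], Recognizer]) -> List[Optional[str]]:
    """Run each voice fragment through the model registered for its (kind, language)."""
    results: List[Optional[str]] = []
    for fragment in voice_fragments:
        model = recognizers.get((fragment.kind, fragment.language))
        # None marks fragments with no registered model.
        results.append(model(fragment) if model is not None else None)
    return results
```

Keying the model table on the pair `(kind, language)` keeps the dispatch a single dictionary lookup; a real system would plug actual recognition models in place of the callables.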

SARCASM-SENSITIVE SPOKEN DIALOG SYSTEM
20210343280 · 2021-11-04

A dialog system and a method of using the dialog system are disclosed. The method may comprise: receiving audible human speech from a user; determining that the audible human speech comprises sarcasm information; providing an input to a neural network, wherein the input comprises speech data input associated with the audible human speech, an embedding vector associated with the sarcasm information, and a one-hot vector; and based on the input, determining an audible response to the human speech.

GENERATING CUSTOMIZED MEETING INSIGHTS BASED ON USER INTERACTIONS AND MEETING MEDIA
20230291595 · 2023-09-14

Methods, systems, and non-transitory computer readable storage media are disclosed for generating meeting insights based on media data and device input data. In one or more embodiments, the system analyzes media data and inputs to client devices associated with a meeting to determine a portion of the meeting that is relevant for a user. In one or more embodiments, the system generates a meeting summary, meeting highlights, or action items related to the media data to provide to the client device. In one or more embodiments, the system also uses the summary, highlights, or action items to train a machine-learning model for use with future meetings.

PERFORMING SUBTASK(S) FOR A PREDICTED ACTION IN RESPONSE TO A SEPARATE USER INTERACTION WITH AN AUTOMATED ASSISTANT PRIOR TO PERFORMANCE OF THE PREDICTED ACTION

Implementations herein relate to pre-caching data, corresponding to predicted interactions between a user and an automated assistant, using data characterizing previous interactions between the user and the automated assistant. An interaction can be predicted based on details of a current interaction between the user and an automated assistant. One or more predicted interactions can be initialized, and/or any corresponding data pre-cached, prior to the user commanding the automated assistant in furtherance of the predicted interaction. Interaction predictions can be generated using a user-parameterized machine learning model, which can be used when processing input(s) that characterize a recent user interaction with the automated assistant. The predicted interaction(s) can include action(s) to be performed by third-party application(s).

Objective training and evaluation

A system and method configured to generate a simulated caller dialog including a caller intended issue for a scenario for testing a customer service representative (CSR). A simulated caller dialog is presented to the CSR, and a CSR response to the simulated caller dialog is received that includes a CSR interpretation of the caller intended issue in the simulated caller dialog. An understanding determination result, based on an intent determination recognition score generated by an intent determination recognition model, is generated in response to a comparison of the CSR interpretation of the caller intended issue with the caller intended issue in the simulated caller dialog. A CSR score is generated for the scenario based on the understanding determination result. The CSR score is recorded to a database.

Intelligence-driven virtual assistant for automated idea documentation

An intelligence-driven virtual assistant for automated documentation of new ideas is provided. During a brainstorming session, one or more user participants may discuss and identify one or more ideas. Such ideas may be tracked, catalogued, analyzed, developed, and further expanded upon through use of an intelligence-driven virtual assistant. Such virtual assistant may capture user input data embodying one or more new ideas and intelligently process the same in accordance with creativity tool workflows. Such workflows may further guide development and expansion upon a given idea, while continuing to document, analyze, and identify further aspects to develop and expand.

Speech control method, terminal device, and storage medium

A speech control method for a terminal device includes: receiving an input speech control instruction and obtaining a recognition result of the speech control instruction; searching, step by step within a preset search range, for an execution object matching the recognition result; and responding to the speech control instruction based on a search result; wherein the preset search range includes at least one of: a current interface of the terminal device when receiving the speech control instruction, at least one application currently running on the terminal device when receiving the speech control instruction, and a system of the terminal device.
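The step-by-step search can be sketched as an ordered walk over scopes that stops at the first match. The scope order (current interface, then running applications, then system), the exact-match rule, and all names and sample entries below are assumptions for illustration; the abstract fixes none of them.

```python
# Hypothetical scopes, ordered from narrowest to widest.
SEARCH_SCOPES = [
    ("current_interface", {"play": "play_button"}),
    ("running_apps", {"next song": "music_app.next"}),
    ("system", {"volume up": "sys.volume_up"}),
]


def find_execution_object(recognized_text, search_scopes):
    """Search each scope in order and return (scope_name, object) for the first hit.

    Returns None when no scope contains the recognized text.
    """
    for scope_name, objects in search_scopes:
        if recognized_text in objects:
            return scope_name, objects[recognized_text]
    return None
```

For instance, `find_execution_object("next song", SEARCH_SCOPES)` skips the current interface, which has no such entry, and resolves the instruction in the running applications.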

Sound identification system for privacy

A “SOUND ID SYSTEM” comprises a Device with a microphone that continuously monitors the surrounding environment. Pre-recorded personalized or unique sound patterns, signatures, and/or key words, and/or their derivatives, are stored in the Device. When such a pre-determined pre-recorded sound pattern, signature, or key word, or a derivative thereof, is recognized by the Device, a corresponding pre-determined action or action sequence will be sent to the Cloud, or other devices, to execute. Such other devices can attach to, cohabit with, and connect by wire or wirelessly to the Device, or share some components with the Device. If the Device does not recognize any particular pre-recorded sound pattern, signature or key word, or a derivative thereof, then the Device will not transmit or disclose any monitored sound information to any outside device or other entity.
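The privacy gate described here can be sketched as a lookup that either forwards a pre-determined action or sends nothing at all. Recognition is reduced to an exact dictionary lookup on a precomputed signature string, which is an assumption; `PATTERNS`, `process_sound` and `transmit` are hypothetical names.

```python
# Hypothetical on-device store: recognized signature -> pre-determined action.
PATTERNS = {
    "sig-lights-on": "turn_on_lights",
    "sig-call-home": "call_home",
}


def process_sound(sound_signature, stored_patterns, transmit):
    """Forward only the mapped action when a stored pattern is recognized.

    Returns True if something was transmitted. On a miss nothing is sent,
    so no monitored sound information leaves the device.
    """
    action = stored_patterns.get(sound_signature)
    if action is None:
        return False
    transmit(action)  # the action only, never the audio itself
    return True
```

Passing `transmit` in as a callable keeps the single outbound path explicit, which makes the "nothing leaves the device on a miss" property easy to check.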