Patent classifications
G10L15/083
Systems and methods for voice topic spotting
A voice topic spotting system includes a learning module and a voice topic classifier module. The learning module receives training audio segments with topic labels, generates a fast keyword filter model based on a set of topic-indicative words, and generates a topic identification model based on a training set of topic keyword-containing lattices. The voice topic classifier module includes an automatic speech recognition engine arranged to identify one or more keywords included in a received audio segment and output the one or more keywords. A fast keyword filter implements the fast keyword filter model to output the received audio segment if a topic-indicative word is detected in the audio segment. A decoder generates a topic keyword-containing lattice associated with the audio segment. A voice topic classifier implements the topic identification model to determine a topic associated with the received audio segment.
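The two-stage flow (a cheap keyword filter gating a costlier topic classifier) can be sketched as follows. This is a minimal illustration, not the patented method: it assumes the lattice is reduced to a flat list of recognized words, uses a hypothetical hand-picked `INDICATIVE` word set, and swaps in simple per-topic keyword counts as the "topic identification model".

```python
from collections import Counter
from typing import Optional

# Hypothetical topic-indicative words; a real system would learn these from labelled audio.
INDICATIVE = {"goal", "striker", "stock", "market"}


def train(segments: list[tuple[list[str], str]]) -> dict[str, Counter]:
    """Count topic-indicative words per topic label (toy stand-in for the topic model)."""
    model: dict[str, Counter] = {}
    for words, topic in segments:
        model.setdefault(topic, Counter()).update(w for w in words if w in INDICATIVE)
    return model


def fast_keyword_filter(words: list[str]) -> bool:
    """Cheap first pass: keep the segment only if any topic-indicative word was recognized."""
    return any(w in INDICATIVE for w in words)


def classify(words: list[str], model: dict[str, Counter]) -> Optional[str]:
    """Return the best-scoring topic, or None when the fast filter rejects the segment."""
    if not fast_keyword_filter(words):
        return None  # skip the costlier lattice decoding and classification
    keywords = [w for w in words if w in INDICATIVE]
    scores = {topic: sum(counts[w] for w in keywords) for topic, counts in model.items()}
    return max(scores, key=scores.get)
```

For example, after training on one "sports" segment containing "goal" and one "finance" segment containing "market", a new segment with "goal" is labelled "sports", while a segment with no indicative words never reaches the classifier.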
Methods and systems for reducing latency in automated assistant interactions
Implementations described herein relate to reducing latency in automated assistant interactions. In some implementations, a client device can receive audio data that captures a spoken utterance of a user. The audio data can be processed to determine an assistant command to be performed by an automated assistant. The assistant command can be processed, using a latency prediction model, to generate a predicted latency to fulfill the assistant command. Further, the client device (or the automated assistant) can determine, based on the predicted latency, whether to audibly render pre-cached content for presentation to the user prior to audibly rendering content that is responsive to the spoken utterance. The pre-cached content can be tailored to the assistant command and audibly rendered for presentation to the user while the content is being obtained, and the content can be audibly rendered for presentation to the user subsequent to the pre-cached content.
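The decision of whether to play pre-cached content can be sketched as a threshold test on the predicted latency. This is a hedged illustration only: the mean of past fulfilment times stands in for the latency prediction model, and the names `choose_pre_cached`, `history`, and `threshold_s` are hypothetical.

```python
from statistics import mean
from typing import Optional


def predict_latency(command: str, history: dict[str, list[float]], default_s: float = 0.5) -> float:
    """Stand-in for the latency prediction model: mean of previously observed fulfilment times."""
    past = history.get(command)
    return mean(past) if past else default_s


def choose_pre_cached(command: str, history: dict[str, list[float]],
                      pre_cached: dict[str, str], threshold_s: float = 1.0) -> Optional[str]:
    """Return command-specific filler to render first, or None to render the response directly."""
    filler = pre_cached.get(command)
    if filler is not None and predict_latency(command, history) > threshold_s:
        return filler  # rendered while the actual responsive content is being fetched
    return None
```

Tailoring the filler per command (a dictionary keyed by command here) mirrors the abstract's point that pre-cached content is specific to the assistant command rather than a generic "please wait".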
Method and apparatus for selecting voice-enabled device and intelligent computing device for controlling the same
A method and apparatus for selecting a voice-enabled device are disclosed. In the voice-enabled device selecting method, when recognition situation information is obtained from one device, voice signals may be obtained from that device and from other devices registered to the same account, and a device that will respond to the wake-up word may be selected based on the voice signals. Thus, even if the closest device, which the user intends to activate, cannot recognize a wake-up word spoken by the user because of its microphone position, that device may still be selected as the voice-enabled device. At least one of a voice-enabled device selecting apparatus, an intelligent computing device, an IoT device, and a server controlling the voice-enabled device selecting apparatus may be associated with an artificial intelligence (AI) module, an unmanned aerial vehicle (UAV) (or drone), a robot, an augmented reality (AR) device, a virtual reality (VR) device, or a device related to a 5G service.
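Selecting a responder from all devices on the same account can be sketched as below. The abstract does not say which property of the voice signals is compared, so this sketch assumes root-mean-square energy of each device's capture as a crude quality measure; `accounts`, `captures`, and `select_device` are hypothetical names.

```python
import math
from typing import Optional


def rms(samples: list[float]) -> float:
    """Root-mean-square energy of a captured audio buffer (assumed quality measure)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def select_device(reporting_device: str, accounts: dict[str, str],
                  captures: dict[str, list[float]]) -> Optional[str]:
    """Pick the device on the reporting device's account whose captured signal is strongest.

    Devices that did not recognize the wake-up word can still be chosen, because every
    device on the account contributes its capture of the utterance.
    """
    account = accounts[reporting_device]
    candidates = [d for d, acct in accounts.items() if acct == account and d in captures]
    return max(candidates, key=lambda d: rms(captures[d]), default=None)
```

Devices registered to a different account are excluded before comparison, even if they captured a louder signal.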
Intelligence-driven virtual assistant for automated idea documentation
An intelligence-driven virtual assistant for automated documentation of new ideas is provided. During a brainstorming session, one or more user participants may discuss and identify one or more ideas. Such ideas may be tracked, catalogued, analyzed, developed, and further expanded upon through use of an intelligence-driven virtual assistant. Such a virtual assistant may capture user input data embodying one or more new ideas and intelligently process that data in accordance with creativity tool workflows. Such workflows may further guide development and expansion of a given idea, while continuing to document, analyze, and identify further aspects to develop and expand.
Speech recognition hypothesis generation according to previous occurrences of hypotheses terms and/or contextual data
Implementations set forth herein relate to speech recognition techniques for handling variations in speech among users (e.g., due to different accents) and processing features of user context in order to expand the number of speech recognition hypotheses when interpreting a spoken utterance from a user. In order to adapt to an accent of the user, terms common to multiple speech recognition hypotheses can be filtered out in order to identify inconsistent terms apparent in a group of hypotheses. Mappings between inconsistent terms can be stored for subsequent users as term correspondence data. In this way, supplemental speech recognition hypotheses can be generated and subjected to probability-based scoring to identify the speech recognition hypothesis that correlates most closely with a spoken utterance provided by a user. In some implementations, prior to scoring, hypotheses can be supplemented based on contextual data, such as on-screen content and/or application capabilities.
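The filter-map-supplement-score loop can be sketched as follows. This is a simplified illustration under stated assumptions: only hypotheses with equal word counts are compared position by position, a toy unigram log-probability table (`logprob`) stands in for probability-based scoring, and the contextual-data step is not modelled. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def learn_correspondences(hypotheses: list[str], store: defaultdict) -> None:
    """Record pairs of inconsistent terms from a group of hypotheses for one utterance."""
    token_lists = [h.split() for h in hypotheses]
    for a, b in combinations(token_lists, 2):
        if len(a) != len(b):
            continue  # simplification: only position-align hypotheses of equal length
        for x, y in zip(a, b):
            if x != y:  # terms common to both hypotheses are filtered out here
                store[x].add(y)
                store[y].add(x)


def supplement(hypothesis: str, store: defaultdict) -> list[str]:
    """Expand a hypothesis with variants that swap in previously mapped terms."""
    tokens = hypothesis.split()
    variants = [hypothesis]
    for i, token in enumerate(tokens):
        for alt in sorted(store.get(token, ())):
            variants.append(" ".join(tokens[:i] + [alt] + tokens[i + 1:]))
    return variants


def best_hypothesis(hypotheses: list[str], logprob: dict[str, float], floor: float = -20.0) -> str:
    """Toy probability-based scoring: sum of per-word log-probabilities."""
    return max(hypotheses, key=lambda h: sum(logprob.get(t, floor) for t in h.split()))
```

For instance, learning from "call ann" versus "call and" stores the mapping ann ↔ and, so a later hypothesis "text and" is supplemented with "text ann", which can then win on score.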
SYSTEMS AND METHODS FOR SCRIPTED AUDIO PRODUCTION
A scripted audio production system decreases production time by improving the computerized processes and technological systems used for pronunciation research and script preparation, narration, editing, proofing, and mastering. The system enables the user to upload their manuscript and recorded audio of the narration of the manuscript to the system. The system then compares the recorded audio against the previously uploaded manuscript, and any mistakes or deviations from the manuscript are highlighted or otherwise indicated to the user. In other embodiments, after uploading the manuscript, the system enables the user to press “record,” and as soon as they start speaking, the scripted audio production technology system tracks the point in the manuscript from which they are reading. Any mistakes or deviations from the script are automatically highlighted. The narrator may then stop and re-record a sentence after a mistake. The system automatically pieces together the last-read audio into a clean file without the need for significant user interaction. The process may also be performed on previously recorded audio, with the narrator first uploading the audio and manuscript to the scripted audio production technology system.
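Comparing a narration against the manuscript and tracking the reading position can be sketched with word-level sequence alignment. This assumes a speech-to-text transcript of the recording is already available and uses Python's standard `difflib.SequenceMatcher` as the aligner; the abstract does not specify the alignment technique, and the function names are hypothetical.

```python
import difflib


def _words(text: str) -> list[str]:
    return text.lower().split()


def find_deviations(manuscript: str, transcript: str) -> list[tuple[str, str, str]]:
    """List (kind, manuscript words, spoken words) for every non-matching stretch."""
    ref, hyp = _words(manuscript), _words(transcript)
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return [
        (tag, " ".join(ref[i1:i2]), " ".join(hyp[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # "replace", "delete" (skipped words), "insert" (added words)
    ]


def reading_position(manuscript: str, transcript: str) -> int:
    """Index in the manuscript just past the last word the narrator has read so far."""
    ref, hyp = _words(manuscript), _words(transcript)
    blocks = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False).get_matching_blocks()
    real = [m for m in blocks if m.size > 0]  # the final block is always a zero-size sentinel
    return real[-1].a + real[-1].size if real else 0
```

Here, reading "the quick red fox jumps" against the manuscript "the quick brown fox jumps" yields a single ("replace", "brown", "red") deviation. Punctuation is not stripped in this sketch, so a real system would normalize text first.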
CONTEXT-SENSITIVE DYNAMIC UPDATE OF VOICE TO TEXT MODEL IN A VOICE-ENABLED ELECTRONIC DEVICE
A voice to text model used by a voice-enabled electronic device is updated dynamically and in a context-sensitive manner to facilitate recognition of entities that a user may speak in a voice input directed to the voice-enabled electronic device. The dynamic update to the voice to text model may be performed, for example, based upon processing of a first portion of a voice input, e.g., based upon detection of a particular type of voice action, and may be targeted to facilitate the recognition of entities that may occur in a later portion of the same voice input, e.g., entities that are particularly relevant to one or more parameters associated with a detected type of voice action.
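The idea of using the first portion of an input to bias recognition of the later portion can be sketched with a toy word-score table. This is only an illustration: the "model" here is a dictionary of word scores rather than a real voice to text model, and `ACTION_TRIGGERS`, `boost`, and the entity lists are hypothetical.

```python
from typing import Optional

# Hypothetical mapping from a leading trigger word to a voice action type.
ACTION_TRIGGERS = {"call": "contacts", "play": "media"}


def detect_action(first_portion: str) -> Optional[str]:
    """Infer the voice action type from the first portion of the voice input."""
    words = first_portion.lower().split()
    return ACTION_TRIGGERS.get(words[0]) if words else None


def update_model(base: dict[str, float], action: Optional[str],
                 entities: dict[str, list[str]], boost: float = 2.5,
                 unseen: float = -10.0) -> dict[str, float]:
    """Return a copy of the word scores with entities relevant to the action boosted."""
    updated = dict(base)
    for entity in entities.get(action, []):
        updated[entity] = updated.get(entity, unseen) + boost
    return updated


def recognize(candidates: list[str], model: dict[str, float]) -> str:
    """Pick the candidate word the (toy) model scores highest."""
    return max(candidates, key=lambda c: model.get(c, float("-inf")))
```

Before the update, "john" outscores the contact name "jon"; after "call" is detected and the contact list is boosted, "jon" is preferred for the remainder of the same input.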
METHOD AND SYSTEM OF AUTOMATIC SPEECH RECOGNITION WITH HIGHLY EFFICIENT DECODING
A system, article, and method of automatic speech recognition in which highly efficient decoding is accomplished by frequent beam width adjustment.
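Per-frame beam width adjustment can be sketched as ordinary beam pruning followed by a feedback rule. The abstract gives no details, so the rule here (tighten when too many hypotheses survive, widen when too few) and the parameters `min_active`, `max_active`, and `step` are assumptions for illustration.

```python
def prune_frame(scores: dict[str, float], beam: float, min_active: int,
                max_active: int, step: float = 0.5) -> tuple[dict[str, float], float]:
    """Beam-prune one frame, then adjust the beam width for the next frame.

    Scores are log-likelihoods (higher is better).
    """
    best = max(scores.values())
    survivors = {state: s for state, s in scores.items() if s >= best - beam}
    if len(survivors) > max_active:
        beam = max(step, beam - step)  # too many hypotheses: tighten to save computation
    elif len(survivors) < min_active:
        beam += step  # too few: widen to reduce the risk of pruning the correct path
    return survivors, beam
```

Adjusting on every frame, rather than fixing the beam once, lets the decoder spend less work on easy frames and more on ambiguous ones.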
COMMAND KEYWORDS WITH INPUT DETECTION WINDOWING
A device, such as a Network Microphone Device or a playback device, receives an indication of a track change associated with a playback queue output by a media playback system. In response, an input detection window is opened for a given time period. During the given time period the device is arranged to receive an input sound data stream representing sound detected by a microphone. The input sound data stream is analyzed for a plurality of command keywords and/or a wake-word for a Voice Assistant Service (VAS) and, based on the analysis, it is determined that the input sound data stream includes voice input data comprising a command keyword or a wake-word for a VAS. In response, the device takes appropriate action, such as causing the media playback system to perform a command corresponding to the command keyword or sending at least part of the input sound data stream to the VAS.
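The windowing logic can be sketched as a timer opened on each track change, with detected words acted on only while it is open. Keyword spotting itself is assumed to happen upstream; the keyword table, wake word, window length, and action names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical command keywords and wake word.
COMMAND_KEYWORDS = {"skip": "next_track", "pause": "pause", "again": "repeat"}
WAKE_WORDS = {"computer"}


@dataclass
class DetectionWindow:
    opened_at: float = float("-inf")  # no window open until the first track change
    duration_s: float = 8.0

    def on_track_change(self, now: float) -> None:
        self.opened_at = now

    def is_open(self, now: float) -> bool:
        return self.opened_at <= now < self.opened_at + self.duration_s


def handle(window: DetectionWindow, now: float, detected_word: str) -> Optional[str]:
    """Map a detected word to an action, but only inside the input detection window."""
    if not window.is_open(now):
        return None
    word = detected_word.lower()
    if word in COMMAND_KEYWORDS:
        return COMMAND_KEYWORDS[word]  # perform locally on the media playback system
    if word in WAKE_WORDS:
        return "send_to_vas"  # forward the sound data stream to the VAS
    return None
```

Gating command keywords to a short window after a track change limits false triggers from ordinary conversation.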
INTERACTIVE GROUP SESSION COMPUTING SYSTEMS AND RELATED METHODS
Assistive technologies are herein provided to assist leaders in engaging one or more group participants using a combination of private data specific to a participant and public data specific to a participant. The system includes: a group bot that has public group data and private group data, a first bot for a first participant that has private data and public data associated with the first participant, and a leader bot for a leader. The leader bot exchanges data with the group bot and the first bot, and can cause the first bot to serve private data on a permissioned private device of the first participant and to serve public data on a permissioned group output device.
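The routing rule (private data to the participant's own device, public data to the shared group device, both subject to permissions) can be sketched with two small classes. The group bot is omitted for brevity, and all class, field, and device names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ParticipantBot:
    name: str
    private_device: str
    private_data: dict[str, str] = field(default_factory=dict)
    public_data: dict[str, str] = field(default_factory=dict)

    def serve(self, key: str, permitted: set[str], group_device: str) -> tuple[str, str]:
        """Return (device, value), routing private items privately and public items to the group."""
        if key in self.private_data:
            device, value = self.private_device, self.private_data[key]
        elif key in self.public_data:
            device, value = group_device, self.public_data[key]
        else:
            raise KeyError(key)
        if device not in permitted:
            raise PermissionError(f"{device} is not a permissioned device")
        return device, value


@dataclass
class LeaderBot:
    permitted: set[str]
    group_device: str

    def request(self, bot: ParticipantBot, key: str) -> tuple[str, str]:
        """Ask a participant's bot to serve an item on the appropriate device."""
        return bot.serve(key, self.permitted, self.group_device)
```

The leader never receives private values directly in this sketch; it only triggers the participant bot, which decides the output device.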