Patent classifications
G10L25/00
Personalized talking detector for electronic device
The present disclosure provides for improved hot word detection in electronic devices, particularly small form factor devices such as wearables. The electronic device includes an accelerometer onboard to pick up voice in noisy conditions, and utilizes the accelerometer to confirm that a particular user intended to activate the hot word detection, thereby reducing false detection of other people's voices.
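The two-signal confirmation described above can be sketched as a simple AND over a microphone hotword score and an accelerometer vibration reading: the wearer's own speech produces both, while a bystander's voice produces only the first. This is an illustrative sketch, not the patent's implementation; the function name and thresholds are assumptions.

```python
# Hypothetical sketch: accept a hotword detection only when the onboard
# accelerometer also shows the wearer's vocal vibrations, reducing false
# triggers from other people's voices. Thresholds are illustrative.

MIC_THRESHOLD = 0.8     # minimum hotword confidence from the microphone model
ACCEL_THRESHOLD = 0.5   # minimum vibration energy expected when the wearer speaks

def confirm_hotword(mic_score: float, accel_energy: float) -> bool:
    """Confirm the hotword only if both signals agree the wearer spoke."""
    return mic_score >= MIC_THRESHOLD and accel_energy >= ACCEL_THRESHOLD
```

A nearby talker would typically yield a high `mic_score` but low `accel_energy`, so the detection is rejected.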
System and method for controlling devices through voice interaction
A method and a system for controlling devices through voice interaction are disclosed. In an embodiment, the method may include identifying at least one feature of a target device and an action to be performed on the at least one feature, based on an intent and an object determined from a voice input received from a user. The method may further include determining a correspondence between the at least one feature and the action to be performed using a trained neural network. The method may further include comparing a current operational state of the at least one feature with an operational threshold of the at least one feature. The method may further include performing the action on the at least one feature based on the determined correspondence, when the current operational state is within limits of the operational threshold.
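The flow above can be sketched with a lookup table standing in for the trained neural network that maps an (intent, object) pair to a feature and action, followed by the operational-threshold check. All names, features, and limits below are illustrative assumptions, not from the patent.

```python
# Minimal sketch: resolve (intent, object) to a feature and action, then
# perform the action only when the resulting state stays within the
# feature's operational threshold. The mapping table is a stand-in for
# the trained neural network described in the abstract.

FEATURE_MAP = {
    ("increase", "volume"): ("speaker_volume", +10),
    ("decrease", "volume"): ("speaker_volume", -10),
}
THRESHOLDS = {"speaker_volume": (0, 100)}  # operational limits per feature

def handle_voice_input(intent: str, obj: str, state: dict) -> bool:
    feature, delta = FEATURE_MAP[(intent, obj)]
    lo, hi = THRESHOLDS[feature]
    new_value = state[feature] + delta
    if lo <= new_value <= hi:       # compare against the operational threshold
        state[feature] = new_value  # perform the action on the feature
        return True
    return False                    # out of limits: action is skipped
```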
Multi-modal conversational agent platform
A method includes receiving data characterizing an utterance of a query associated with a tenant; providing, to an automated speech recognition engine, the received data and a profile selected from a plurality of profiles based on the tenant, the profile configuring the automated speech recognition engine to process the received data; receiving, from the automated speech recognition engine, a text string characterizing the query; and processing, via an ensemble of natural language agents configured based on the tenant, the text string characterizing the query to determine a textual response to the query, the textual response including at least one word from a first lexicon associated with the tenant. Related systems, methods, apparatus, and computer readable mediums are also described.
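The tenant-based routing can be sketched as two lookups keyed by tenant: one selecting the ASR profile, one selecting the agent ensemble that produces the response. The profile contents, the stubbed ASR callable, and the agent interface are all assumptions for illustration.

```python
# Sketch of multi-tenant routing: the tenant selects both the ASR profile
# and the natural-language agent that answers. Profile fields and the
# asr/agent callables are illustrative stand-ins.

PROFILES = {
    "acme":   {"lang": "en-US", "lexicon": ["widget"]},
    "globex": {"lang": "en-GB", "lexicon": ["gadget"]},
}

def answer_query(tenant: str, audio: bytes, asr, agents) -> str:
    profile = PROFILES[tenant]   # ASR profile selected based on the tenant
    text = asr(audio, profile)   # recognition configured by the profile
    return agents[tenant](text)  # tenant-configured agent produces the response
```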
Systems and methods for inferring the language of media content items
An electronic device associated with a media-providing service obtains metadata for a collection of media content items. The metadata specifies an initial value for a language of the audio of a respective media content item. The electronic device obtains a listening history for users of the media-providing service. The listening history specifies which media content items of the collection of media content items a respective user has listened to. The electronic device determines, for a first user, one or more languages corresponding to the first user based on the initial values of the languages of the audio of the media content items that the first user has listened to. The electronic device determines, for the respective media content item, an updated value for the language of the audio based on the one or more languages corresponding to the users that have listened to the respective media content item.
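The two-stage inference above can be sketched with majority votes: first infer each user's language from the initial labels of what they played, then re-label each item by the majority language of its listeners. The data layout (dicts of histories and labels) is an assumption for illustration.

```python
from collections import Counter

# Illustrative sketch of the abstract's two passes: per-user language
# inference from initial track labels, then per-track relabeling from
# listener languages. Data structures are assumptions.

def user_language(history: list, initial_lang: dict) -> str:
    """Most common initial language among the items a user has played."""
    counts = Counter(initial_lang[item] for item in history)
    return counts.most_common(1)[0][0]

def updated_language(listeners: list, histories: dict, initial_lang: dict) -> str:
    """Majority language among the inferred languages of an item's listeners."""
    langs = Counter(user_language(histories[u], initial_lang) for u in listeners)
    return langs.most_common(1)[0][0]
```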
Learning offline voice commands based on usage of online voice commands
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a user device to learn offline voice actions. In one aspect, the method includes actions of detecting, by the user device, an utterance at a first time when the user device is connected to a server by a network, providing, by the user device, the utterance to the server using the network, receiving, by the user device and from the server, an update to the grammar of the user device, detecting, by the user device, a subsequent utterance of the utterance at a second time when the user device is not connected to the server by a network, and in response to detecting, by the user device, the subsequent utterance of the utterance at the second time, identifying, by the user device, an operation to perform based on (i) the subsequent utterance, and (ii) the updated grammar.
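The online-to-offline flow can be sketched as: while connected, forward the utterance to the server and store the grammar update it returns; while offline, match utterances against the stored grammar locally. The grammar representation (a text-to-operation map) and the server callable are assumptions.

```python
# Minimal sketch of learning offline voice commands from online usage.
# The grammar format and server interface are illustrative assumptions.

class UserDevice:
    def __init__(self):
        self.grammar = {}  # learned grammar: utterance text -> operation

    def hear_online(self, utterance: str, server) -> None:
        # While connected, the server returns a grammar update,
        # e.g. {"dim the lights": "lights.dim"}; store it locally.
        self.grammar.update(server(utterance))

    def hear_offline(self, utterance: str):
        # While disconnected, identify the operation from the
        # previously updated grammar; no network is needed.
        return self.grammar.get(utterance)
```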
Systems and methods for manipulating electronic content based on speech recognition
Systems and methods are disclosed for displaying electronic multimedia content to a user. One computer-implemented method for manipulating electronic multimedia content includes generating, using a processor, a speech model and at least one speaker model of an individual speaker. The method further includes receiving electronic media content over a network; extracting an audio track from the electronic media content; and detecting speech segments within the electronic media content based on the speech model. The method further includes detecting a speaker segment within the electronic media content and calculating a probability of the detected speaker segment involving the individual speaker based on the at least one speaker model.
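The speaker-probability step can be sketched by scoring a segment's features under a speaker model and under a generic speech model, then squashing the log-likelihood ratio into a probability. The 1-D Gaussian "models" below are toy stand-ins for real speaker and speech models, not the disclosed method.

```python
import math

# Hedged sketch: probability that a detected segment involves a given
# speaker, via a likelihood ratio between a speaker model and a generic
# speech model. The scalar Gaussians are illustrative placeholders.

def gauss_loglik(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def speaker_probability(features, speaker_model, speech_model) -> float:
    ls = sum(gauss_loglik(f, *speaker_model) for f in features)
    lg = sum(gauss_loglik(f, *speech_model) for f in features)
    return 1.0 / (1.0 + math.exp(lg - ls))  # sigmoid of the log-likelihood ratio
```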
Determining a target device for voice command interaction
Systems, methods, and devices for determining a target device for a voice command are provided. A voice command is detected at a plurality of devices. A weight is determined for the detected voice command at each device of the plurality of devices. The determined weight is exchanged among the plurality of devices. A highest weight among the exchanged weights is determined. The device associated with the highest weight is determined as the target device for the voice command.
Development of voice and other interaction applications
Among other things, a developer of an interaction application for an enterprise can create items of content to be provided to an assistant platform for use in responses to requests of end-users. The developer can deploy the interaction application using defined items of content and an available general interaction model including intents and sample utterances having slots. The developer can deploy the interaction application without requiring the developer to formulate any of the intents, sample utterances, or slots of the general interaction model.
Utilizing a dynamic memory network for state tracking
The present disclosure relates to generating digital responses based on digital dialog states generated by a neural network having a dynamic memory network architecture. For example, in one or more embodiments, the disclosed system provides a digital dialog having one or more segments to a dialog state tracking neural network having a dynamic memory network architecture that includes a set of multiple memory slots. In some embodiments, the dialog state tracking neural network further includes update gates and reset gates used in modifying the values stored in the memory slots. For instance, the disclosed system can utilize cross-slot interaction update/reset gates to accurately generate a digital dialog state for each of the segments of digital dialog. Subsequently, the system generates a digital response for each segment of digital dialog based on the digital dialog state.
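The gated memory-slot update can be sketched in scalar form: each slot is revised GRU-style with an update gate and a reset gate, and a mean over all slots supplies a simple cross-slot interaction term. This is a toy illustration of the gating idea only, not the disclosed architecture, and all weights are illustrative.

```python
import math

# Toy sketch of gated memory slots with cross-slot interaction:
# each scalar slot is updated with GRU-style update/reset gates whose
# inputs include a summary of the other slots. Not the patent's model.

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def update_slots(slots, inp, w_update, w_reset):
    context = sum(slots) / len(slots)  # cross-slot interaction term
    new_slots = []
    for i, m in enumerate(slots):
        z = sigmoid(w_update[i] * (inp + context))  # update gate
        r = sigmoid(w_reset[i] * (inp + context))   # reset gate
        candidate = math.tanh(inp + r * m)          # candidate slot value
        new_slots.append((1 - z) * m + z * candidate)
    return new_slots
```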