Patent classifications
G10L15/34
Voice control method, cloud server and terminal device
A voice control method that includes: a terminal device receiving voice information; the terminal device querying a control instruction corresponding to the voice information from a local voice library; when no control instruction corresponding to the voice information is found in the local voice library, the terminal device uploading the voice information to a cloud server; the cloud server querying the control instruction corresponding to the voice information from a cloud voice library; when the control instruction corresponding to the voice information is found in the cloud voice library, the cloud server sending the control instruction to the terminal device; and the terminal device receiving the control instruction and executing a corresponding operation on the basis of the control instruction. The present disclosure improves the response speed of a terminal device and improves user experience.
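The local-first lookup with cloud fallback described above can be sketched as follows. All names (`resolve_control_instruction`, `local_voice_library`, `query_cloud`) are illustrative placeholders, not identifiers from the patent:

```python
# Sketch of the local-first lookup with cloud fallback described in the
# abstract. Function and variable names are hypothetical.

def resolve_control_instruction(voice_info, local_voice_library, query_cloud):
    """Return a control instruction for voice_info, preferring the local library."""
    # Step 1: the terminal device queries its local voice library first.
    instruction = local_voice_library.get(voice_info)
    if instruction is not None:
        return instruction
    # Step 2: on a local miss, the voice information is uploaded to the
    # cloud server, which queries its larger cloud voice library.
    return query_cloud(voice_info)

local = {"turn on light": "CMD_LIGHT_ON"}
cloud = {"turn on light": "CMD_LIGHT_ON", "open curtains": "CMD_CURTAIN_OPEN"}

print(resolve_control_instruction("turn on light", local, cloud.get))   # local hit
print(resolve_control_instruction("open curtains", local, cloud.get))   # cloud fallback
```

Answering common commands locally avoids a network round trip entirely, which is the claimed source of the improved response speed.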
Speech recognition using multiple voice-enabled devices
Techniques for using multiple voice-enabled devices in a user environment to reduce the latency for obtaining responses to user utterances from a remote system. The voice-enabled devices may each establish connections with the remote system to have the remote system perform supplemental speech processing for utterances the devices are unable to process locally. One voice-enabled device may have a higher-latency connection to the remote system, and another voice-enabled device may have a lower-latency connection to the remote system. The lower-latency device may send an utterance to the remote system before the higher-latency device is able, and the remote system may begin processing the utterance faster than if the higher-latency device had sent the utterance. The remote system may then provide a response for the utterance to the higher-latency device in less time than if the remote system had to wait for the utterance from the higher-latency device.
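The device-selection step above, choosing which device should upload the utterance, reduces to picking the device with the lowest-latency connection. A minimal sketch, with hypothetical names and a simple list-of-dicts device registry:

```python
# Sketch: pick the voice-enabled device with the lowest-latency connection
# to the remote system so it uploads the utterance first. Names and the
# device-record shape are illustrative, not from the patent.

def pick_uploader(devices):
    """Choose the device with the lowest measured latency to the remote system."""
    return min(devices, key=lambda d: d["latency_ms"])

devices = [
    {"name": "kitchen-speaker", "latency_ms": 180},
    {"name": "living-room-speaker", "latency_ms": 40},
]
print(pick_uploader(devices)["name"])  # living-room-speaker
```

The remote system can then return the finished response to whichever device is supposed to answer, even if that device has the slower link.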
CONTEXTUAL ATTENTION ACROSS DIVERSE ARTIFICIAL INTELLIGENCE VOICE ASSISTANCE SYSTEMS
An artificial intelligence system registers users to a common distributed computing platform that provides access to a plurality of voice assistants. A first command issued by a user is received by a first voice assistant of the plurality of voice assistants at a first location. A context database is built by storing at least the first command on the distributed computing platform. A second command issued by the user is received by a second voice assistant of the plurality of voice assistants at a second location. A context assigning engine on the common distributed computing platform compares the second command with at least the first command in the context database. Similarity between the first command and the second command provides context. The second voice assistant receives the context from the context assigning engine and employs it to answer the second command.
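The similarity comparison performed by the context assigning engine can be sketched with a simple token-overlap (Jaccard) measure. This is a stand-in for whatever comparison the patent actually claims; all names and the threshold are hypothetical:

```python
# Sketch of a context assigning engine: compare a new command against
# previously stored commands and return the most similar one as context.
# Jaccard similarity over word tokens is an illustrative stand-in.

def token_similarity(a, b):
    """Jaccard similarity between the word-token sets of two commands."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def assign_context(new_command, context_db, threshold=0.3):
    """Return the stored command most similar to new_command, if similar enough."""
    best = max(context_db, key=lambda c: token_similarity(new_command, c), default=None)
    if best is not None and token_similarity(new_command, best) >= threshold:
        return best
    return None  # no stored command is similar enough to supply context

print(assign_context("play jazz louder", ["play jazz in the kitchen"]))
```

A second assistant that receives the matched earlier command can interpret an otherwise ambiguous follow-up ("louder") against it.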
Electronic device and method for controlling the electronic device
Disclosed are an electronic device capable of efficiently performing speech recognition and natural language understanding and a method of controlling the same. The electronic device includes: a microphone; a non-volatile memory configured to store virtual assistant model data comprising data that is classified according to a plurality of domains and data that is commonly used for the plurality of domains; a volatile memory; and a processor configured to: based on receiving, through the microphone, a trigger input to perform speech recognition for a user speech, initiate loading the virtual assistant model data from the non-volatile memory into the volatile memory, load, into the volatile memory, first data from among the data classified according to the plurality of domains and, while loading the first data into the volatile memory, load at least a part of the data commonly used for the plurality of domains into the volatile memory.
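The overlapped loading described above (domain-specific data and domain-common data moving into volatile memory concurrently) can be sketched with a background thread. The function names and the dict standing in for volatile memory are illustrative assumptions:

```python
# Sketch: on a trigger input, load the first domain's data while the
# data common to all domains loads in parallel. Names are hypothetical.
import threading

def load_assistant_model(load_domain_chunk, load_common_chunk):
    """Load domain data and common data into 'volatile memory' concurrently."""
    volatile = {}  # stands in for the volatile memory
    # Kick off loading of the data shared across all domains in the background.
    common_thread = threading.Thread(
        target=lambda: volatile.update(common=load_common_chunk())
    )
    common_thread.start()
    # Meanwhile, load the first domain-specific data on this thread.
    volatile["domain"] = load_domain_chunk()
    common_thread.join()
    return volatile
```

Overlapping the two loads hides part of the non-volatile-memory read latency behind useful work, which is the efficiency claim of the abstract.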
MULTI-ENCODER END-TO-END AUTOMATIC SPEECH RECOGNITION (ASR) FOR JOINT MODELING OF MULTIPLE INPUT DEVICES
An end-to-end automatic speech recognition (ASR) system includes: a first encoder configured for close-talk input captured by a close-talk input mechanism; a second encoder configured for far-talk input captured by a far-talk input mechanism; and an encoder selection layer configured to select at least one of the first and second encoders for use in producing ASR output. The selection is made based on at least one of a short-time Fourier transform (STFT), Mel-frequency cepstral coefficients (MFCCs), and a filter bank derived from at least one of the close-talk input and the far-talk input. If signals from both the close-talk input mechanism and the far-talk input mechanism are present for a speech segment, the encoder selection layer dynamically selects between the close-talk encoder and the far-talk encoder to select the encoder that better recognizes the speech segment. An encoder-decoder model is used to produce the ASR output.
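The dynamic selection between the two encoders can be sketched as follows. In the patent the selection layer is learned from STFT/MFCC/filter-bank features; here a caller-supplied `score_fn` stands in for that learned scoring, and all names are hypothetical:

```python
# Sketch of the encoder selection layer: per speech segment, pick the
# encoder whose input features score higher, falling back to whichever
# input is present. score_fn is a stand-in for the learned selector.

def select_encoder(close_feats, far_feats, score_fn):
    """Return which encoder ('close_talk' or 'far_talk') should process the segment."""
    if close_feats is None:
        return "far_talk"   # only the far-talk mechanism captured a signal
    if far_feats is None:
        return "close_talk" # only the close-talk mechanism captured a signal
    # Both signals present: dynamically pick the better-scoring input.
    return "close_talk" if score_fn(close_feats) >= score_fn(far_feats) else "far_talk"

print(select_encoder([0.9, 0.8], [0.2, 0.1], sum))  # close_talk
```

The chosen encoder's output then feeds the shared decoder of the encoder-decoder model.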
METHOD FOR HUMAN-COMPUTER INTERACTION, APPARATUS FOR HUMAN-COMPUTER INTERACTION, DEVICE, AND STORAGE MEDIUM
The present disclosure provides a method for a human-computer interaction, an apparatus for a human-computer interaction, a device, and a storage medium, and relates to the field of artificial intelligence, such as deep learning and voice technology. A specific implementation includes: acquiring a voice command; performing voice recognition on the voice command to determine a corresponding voice text; sending, in response to satisfying a preset information sending condition, the voice text to a cloud; receiving a resource for the voice command returned from the cloud; and responding to the voice command based on the resource.
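The step sequence in this implementation can be sketched as a small pipeline. The callables are hypothetical placeholders for the recognizer, the preset sending condition, the cloud request, and the response step:

```python
# Sketch of the claimed flow: recognize the command, send the text to the
# cloud only when a preset condition holds, then respond using the returned
# resource. All function names are illustrative placeholders.

def handle_voice_command(audio, recognize, should_send, fetch_resource, respond):
    """Run the recognize -> (conditionally) query cloud -> respond pipeline."""
    text = recognize(audio)              # voice recognition -> voice text
    if should_send(text):                # preset information-sending condition
        resource = fetch_resource(text)  # resource returned from the cloud
        return respond(text, resource)
    return respond(text, None)           # respond without a cloud resource

result = handle_voice_command(
    b"raw-audio",
    recognize=lambda a: "what is the weather",
    should_send=lambda t: "weather" in t,
    fetch_resource=lambda t: {"forecast": "sunny"},
    respond=lambda t, r: (t, r),
)
print(result)
```

Gating the cloud round trip on a condition lets simple commands be answered entirely on-device.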