G10L15/34

Systems and methods for cloud computing data processing
11354162 · 2022-06-07

Systems and methods allow users to leverage multiple disparate cloud solutions, offered by disparate service providers, in a unified and cohesive manner. A system includes an engine configured to receive performance metrics from two or more disparate cloud services and to select target resources among those services to run tasks, based on the performance metrics, a multiservice load balancing scheme, and task parameters. Resources can be scaled up or down in the two or more disparate cloud services based on task loads.
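
The selection step described above can be illustrated with a minimal sketch. The metric names (latency_ms, utilization, free_cpus), the weighted score standing in for the "multiservice load balancing scheme", and the function name select_target are all assumptions made for illustration; the abstract does not specify any of them.

```python
from dataclasses import dataclass


@dataclass
class ServiceMetrics:
    # All fields are hypothetical examples of "performance metrics".
    name: str
    latency_ms: float
    utilization: float  # fraction of capacity in use, 0.0 to 1.0
    free_cpus: int


def select_target(services, cpus_needed, latency_weight=0.5):
    """Pick the service with the lowest combined latency/utilization score.

    cpus_needed plays the role of a task parameter; services that cannot
    fit the task are excluded before scoring.
    """
    candidates = [s for s in services if s.free_cpus >= cpus_needed]
    if not candidates:
        return None  # a real system might scale resources up here
    max_latency = max(s.latency_ms for s in candidates) or 1.0

    def score(s):
        normalized_latency = s.latency_ms / max_latency
        return latency_weight * normalized_latency + (1 - latency_weight) * s.utilization

    return min(candidates, key=score)


services = [
    ServiceMetrics("cloud_a", latency_ms=100, utilization=0.9, free_cpus=8),
    ServiceMetrics("cloud_b", latency_ms=200, utilization=0.2, free_cpus=4),
    ServiceMetrics("cloud_c", latency_ms=50, utilization=0.1, free_cpus=1),
]
chosen = select_target(services, cpus_needed=2)
```

Here cloud_c is excluded because it cannot fit the task, and cloud_b wins over cloud_a because its low utilization outweighs its higher latency under equal weights.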

Method for processing the output of a speech recognizer

A method for processing speech, comprising semantically parsing a received natural language speech input with respect to a plurality of predetermined command grammars in an automated speech processing system; determining if the parsed speech input unambiguously corresponds to a command and is sufficiently complete for reliable processing, then processing the command; if the speech input ambiguously corresponds to a single command or is not sufficiently complete for reliable processing, then prompting a user for further speech input to reduce ambiguity or increase completeness, in dependence on a relationship of previously received speech input and at least one command grammar of the plurality of predetermined command grammars, reparsing the further speech input in conjunction with previously parsed speech input, and iterating as necessary. The system also monitors abort, fail or cancel conditions in the speech input.
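
The parse, prompt, and reparse loop in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the two toy grammars, the keyword matching used in place of semantic parsing, and the names GRAMMARS, parse, choose_prompt, and run_dialog are all hypothetical.

```python
# Hypothetical command grammars: each command has trigger verbs and one slot.
GRAMMARS = {
    "call": {"verbs": {"call", "dial"}, "slot_values": {"alice", "bob"}},
    "play": {"verbs": {"play"}, "slot_values": {"jazz", "news"}},
}
ABORT_WORDS = {"cancel", "abort"}


def parse(words):
    """Match the accumulated words against every grammar."""
    matches = sorted(name for name, g in GRAMMARS.items() if g["verbs"] & words)
    slot = None
    if len(matches) == 1:
        found = GRAMMARS[matches[0]]["slot_values"] & words
        if len(found) == 1:
            slot = next(iter(found))
    return matches, slot


def choose_prompt(matches):
    """Pick a prompt that depends on how prior input relates to the grammars."""
    if not matches:
        return "What would you like to do?"
    if len(matches) > 1:
        return "Did you mean " + " or ".join(matches) + "?"
    return f"Which one should I {matches[0]}?"


def run_dialog(utterances):
    """Return (command or status, slot value, prompts issued so far)."""
    words, prompts = set(), []
    for utterance in utterances:
        new_words = set(utterance.lower().split())
        if new_words & ABORT_WORDS:  # abort/cancel condition in the input
            return "aborted", None, prompts
        words |= new_words  # reparse new input together with earlier input
        matches, slot = parse(words)
        if len(matches) == 1 and slot is not None:
            return matches[0], slot, prompts  # unambiguous and complete
        prompts.append(choose_prompt(matches))
    return "incomplete", None, prompts


result = run_dialog(["call", "bob"])
```

In this run the first utterance matches one grammar but lacks a slot value, so the sketch prompts once, then completes the command when "bob" arrives.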

ELECTRONIC APPARATUS AND METHOD OF CONTROLLING THE SAME
20220139413 · 2022-05-05

An electronic apparatus includes an interface configured to connect with an external apparatus, and a processor. The processor is configured to, in response to a first user speech received by the electronic apparatus including a trigger word, identify a first noise level corresponding to the first user speech received by the electronic apparatus. The processor is configured to identify a first recognition apparatus among a plurality of recognition apparatuses having a highest first noise level corresponding to the first user speech. The plurality of recognition apparatuses identify the first user speech as the trigger word and include the electronic apparatus and the external apparatus. The processor is configured to perform a control operation to implement a function corresponding to a second user speech in response to identifying a second recognition apparatus as having a highest second noise level corresponding to the second user speech among the plurality of recognition apparatuses.
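
A minimal sketch of the device selection step, assuming each apparatus reports whether it heard the trigger word together with a numeric level. The report format and the name pick_recognizer are hypothetical; the sketch follows the abstract's wording in choosing the highest reported level.

```python
def pick_recognizer(reports):
    """Choose among devices that recognized the trigger word.

    reports maps a device name to (heard_trigger, level). Devices that did
    not recognize the trigger word are ignored.
    """
    eligible = {name: level for name, (heard, level) in reports.items() if heard}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


reports = {
    "tv": (True, 40.0),
    "speaker": (True, 55.0),
    "fridge": (False, 70.0),  # highest level, but it missed the trigger word
}
selected = pick_recognizer(reports)
```

The same function can be called again with the levels reported for the second user speech to decide which apparatus performs the control operation.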

Parallel signal processing system and method
11322171 · 2022-05-03

A system and method for processing a plurality of channels, for example audio channels, in parallel is provided. For example, a plurality of telephony channels are processed in order to detect and respond to call progress tones. The channels may be processed according to a common transform algorithm. Advantageously, a massively parallel architecture is employed, in which operations on many channels are synchronized, to achieve a high efficiency parallel processing environment. The parallel processor may be situated on a data bus, separate from a main general-purpose processor, or integrated with the processor in a common board or integrated device. All or a portion of a speech processing algorithm may also be performed in a massively parallel manner.
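
To make the idea of synchronized operations across channels concrete, here is a small sketch that runs a single-frequency detector over several channels in lockstep. The Goertzel algorithm is swapped in as an example of a "common transform algorithm"; the abstract does not name it. The plain Python loop only imitates the data layout of a parallel processor, and the 440 Hz target, 8000 Hz sample rate, and function name are assumptions.

```python
import math


def goertzel_lockstep(channels, sample_rate, freq):
    """Return the power at freq for every channel.

    At each sample index the same update is applied to all channels, which
    mirrors how a SIMD-style processor would keep the channels synchronized.
    All channels must have the same length.
    """
    n = len(channels[0])
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s1 = [0.0] * len(channels)  # state one step back, per channel
    s2 = [0.0] * len(channels)  # state two steps back, per channel
    for i in range(n):
        for c, samples in enumerate(channels):
            s0 = samples[i] + coeff * s1[c] - s2[c]
            s2[c] = s1[c]
            s1[c] = s0
    return [a * a + b * b - coeff * a * b for a, b in zip(s1, s2)]


RATE, N = 8000, 400


def sine(freq):
    return [math.sin(2.0 * math.pi * freq * i / RATE) for i in range(N)]


channels = [sine(440), [0.0] * N, sine(1000)]
powers = goertzel_lockstep(channels, RATE, 440)
```

Only the first channel carries the target tone, so its power is large while the silent channel and the 1000 Hz channel stay near zero.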

Intermediate data for inter-device speech processing

Some speech processing systems may handle some commands on-device rather than sending the audio data to a second device or system for processing. The first device may have limited speech processing capabilities sufficient for handling common language and/or commands, while the second device (e.g., an edge device and/or a remote system) may call on additional language models, entity libraries, skill components, etc. to perform additional tasks. An intermediate data generator may facilitate dividing speech processing operations between devices by generating a stream of data that includes a first-pass ASR output (e.g., a word or sub-word lattice) and other characteristics of the audio data such as whisper detection, speaker identification, media signatures, etc. The second device can perform the additional processing using the data stream, e.g., without using the audio data. Thus, privacy may be enhanced by processing the audio data locally without sending it to other devices/systems.
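
The division of work between the two devices can be sketched as a small data packet that carries a first-pass lattice and audio characteristics but no audio. Everything here is an assumption for illustration: the field names, the JSON encoding, the lattice represented as one score table per word position, and the entity boost used as a stand-in for the second device's entity library.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class IntermediateData:
    # One dict per word position: candidate word -> first-pass score.
    lattice: list
    whisper_detected: bool
    speaker_id: str


def encode(data):
    """First device: serialize the intermediate data for transmission."""
    return json.dumps(asdict(data))


def second_pass(payload, entity_boost):
    """Second device: rescore the lattice using extra knowledge.

    entity_boost maps words known to the (hypothetical) entity library to a
    score bonus. Only the payload is used; the audio never leaves device one.
    """
    data = json.loads(payload)
    words = []
    for alternatives in data["lattice"]:
        best = max(alternatives, key=lambda w: alternatives[w] + entity_boost.get(w, 0.0))
        words.append(best)
    return " ".join(words)


packet = encode(IntermediateData(
    lattice=[{"play": 0.9, "pray": 0.1}, {"queen": 0.4, "green": 0.5}],
    whisper_detected=False,
    speaker_id="user-1",
))
transcript = second_pass(packet, entity_boost={"queen": 0.3})
```

With the boost applied, the second pass prefers "queen" over the first-pass favorite "green"; without it, the first-pass ranking is kept.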

Orchestrating execution of a series of actions requested to be performed via an automated assistant

Implementations are set forth herein for creating an order of execution for actions that were requested by a user, via a spoken utterance to an automated assistant. The order of execution for the requested actions can be based on how each requested action can, or is predicted to, affect other requested actions. In some implementations, an order of execution for a series of actions can be determined based on an output of a machine learning model, such as a model that has been trained according to supervised learning. A particular order of execution can be selected to mitigate waste of processing, memory, and network resources, at least relative to other possible orders of execution. Using interaction data that characterizes past performances of automated assistants, certain orders of execution can be adapted over time, thereby allowing the automated assistant to learn from past interactions with one or more users.
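
One simple way to picture an order of execution is a dependency sort over the requested actions. The sketch below swaps hand-written "must precede" pairs in for the machine learning model mentioned in the abstract, so it shows only the ordering step, not the learning. The action names and the function order_actions are hypothetical; graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def order_actions(requested, must_precede):
    """Order the requested actions so each (before, after) pair is respected.

    must_precede holds pairs of actions where running the first one later
    would disturb the second, e.g. a spoken weather report interrupting
    music that has already started. Pairs naming unrequested actions are
    ignored. Contradictory pairs raise graphlib.CycleError.
    """
    graph = {action: set() for action in requested}
    for before, after in must_precede:
        if before in graph and after in graph:
            graph[after].add(before)  # graph maps each action to its predecessors
    return list(TopologicalSorter(graph).static_order())


requested = ["play_music", "tell_weather", "set_alarm"]
order = order_actions(requested, [("tell_weather", "play_music")])
```

The result always places tell_weather before play_music; the position of set_alarm is unconstrained because no pair mentions it.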

VOICE CONTROL METHOD, CLOUD SERVER AND TERMINAL DEVICE

A voice control method that includes: a terminal device receiving voice information; the terminal device querying a control instruction corresponding to the voice information from a local voice library; when no control instruction corresponding to the voice information is found in the local voice library, the terminal device uploading the voice information onto a cloud server; the cloud server querying the control instruction corresponding to the voice information from a cloud voice library; when the control instruction corresponding to the voice information is found in the cloud voice library, the cloud server sending the control instruction to the terminal device; the terminal device receiving the control instruction, and executing a corresponding operation on the basis of the control instruction. The present disclosure improves the response speed of a terminal device, and improves user experience.
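
The local-first lookup with a cloud fallback can be sketched in a few lines. The voice information is represented as already-recognized text, and both libraries are plain dictionaries; these choices, along with the name resolve and the example instructions, are assumptions for illustration only.

```python
def resolve(voice_text, local_library, cloud_lookup):
    """Return (control instruction, source) for the given voice information.

    local_library is a dict on the terminal device. cloud_lookup is any
    callable standing in for the upload to the cloud server; it is called
    only when the local library has no match.
    """
    instruction = local_library.get(voice_text)
    if instruction is not None:
        return instruction, "local"
    instruction = cloud_lookup(voice_text)
    if instruction is not None:
        return instruction, "cloud"
    return None, "not_found"


local_library = {"lights on": "LIGHT_ON"}
cloud_library = {"lights on": "LIGHT_ON", "dim the lights": "LIGHT_DIM_50"}

local_hit = resolve("lights on", local_library, cloud_library.get)
cloud_hit = resolve("dim the lights", local_library, cloud_library.get)
miss = resolve("open the pod bay doors", local_library, cloud_library.get)
```

Commands found locally never trigger the cloud call, which is where the response-speed benefit claimed in the abstract would come from.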