Patent classifications
G10L15/04
Speech endpointing
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing are described. In one aspect, a method includes the action of accessing voice query log data that includes voice queries spoken by a particular user. The actions further include determining, based on the voice query log data, a pause threshold for the particular user. The actions further include receiving, from the particular user, an utterance. The actions further include determining that the particular user has stopped speaking for at least a period of time equal to the pause threshold. The actions further include, based on that determination, processing the utterance as a voice query.
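The per-user thresholding described above could be sketched as follows. This is an illustrative reading, not the patent's actual implementation; the function names, the percentile heuristic, and the default value are all assumptions.

```python
# Hypothetical sketch: derive a user-specific pause threshold from that
# user's logged pause durations, then endpoint once silence reaches it.

def pause_threshold(pause_durations_ms, percentile=0.95, default_ms=700):
    """Pick a threshold covering most of the user's historical pauses."""
    if not pause_durations_ms:
        return default_ms  # fall back when no log data exists for the user
    ordered = sorted(pause_durations_ms)
    idx = min(int(percentile * len(ordered)), len(ordered) - 1)
    return ordered[idx]

def should_endpoint(silence_ms, threshold_ms):
    """Treat the utterance as a complete voice query once the user has
    stopped speaking for at least the threshold duration."""
    return silence_ms >= threshold_ms

# A slow, deliberate speaker ends up with a longer threshold than a
# fast speaker, so the endpointer does not cut the slow speaker off.
slow = pause_threshold([300, 400, 900, 1100, 1200])
fast = pause_threshold([100, 120, 150, 180, 200])
```

The design point is simply that the same silence gap can mean "still thinking" for one user and "done speaking" for another, so the threshold is a function of the user's own query logs rather than a global constant.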
SPEECH ENDPOINTING BASED ON WORD COMPARISONS
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing based on word comparisons are described. In one aspect, a method includes the actions of obtaining a transcription of an utterance. The actions further include determining, as a first value, a quantity of text samples in a collection of text samples that (i) include terms that match the transcription, and (ii) do not include any additional terms. The actions further include determining, as a second value, a quantity of text samples in the collection of text samples that (i) include terms that match the transcription, and (ii) include one or more additional terms. The actions further include classifying the utterance as a likely incomplete utterance or not a likely incomplete utterance based at least on comparing the first value and the second value.
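The counting-and-comparison step could look roughly like the sketch below, assuming a term match means the sample begins with the transcription's terms. The function names and the tie-breaking rule (incomplete only when the second value strictly exceeds the first) are illustrative assumptions, not the claimed method.

```python
# Hypothetical sketch of endpointing by word comparison: count text
# samples that match the transcription exactly vs. those that match and
# then continue with additional terms.

def classify_utterance(transcription, text_samples):
    terms = transcription.lower().split()
    exact = 0     # first value: matching samples with no additional terms
    extended = 0  # second value: matching samples with additional terms
    for sample in text_samples:
        sample_terms = sample.lower().split()
        if sample_terms[:len(terms)] == terms:
            if len(sample_terms) == len(terms):
                exact += 1
            else:
                extended += 1
    return "likely incomplete" if extended > exact else "not likely incomplete"

samples = ["what is the weather", "what is the time", "what is"]
label_short = classify_utterance("what is", samples)        # more continuations
label_full = classify_utterance("what is the weather", samples)
```

Intuitively, if users' past queries usually continue past the current transcription, the speaker probably is not finished, so the endpointer should keep listening.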
NETWORKED DEVICES, SYSTEMS, & METHODS FOR INTELLIGENTLY DEACTIVATING WAKE-WORD ENGINES
In one aspect, a playback device is configured to identify in an audio stream, via a second wake-word engine, a false wake word for a first wake-word engine that is configured to receive as input sound data based on sound detected by a microphone. The first and second wake-word engines are configured according to different sensitivity levels for false positives of a particular wake word. Based on identifying the false wake word, the playback device is configured to (i) deactivate the first wake-word engine and (ii) cause at least one network microphone device to deactivate a wake-word engine for a particular amount of time. While the first wake-word engine is deactivated, the playback device is configured to cause at least one speaker to output audio based on the audio stream. After a predetermined amount of time has elapsed, the playback device is configured to reactivate the first wake-word engine.
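The deactivate-then-reactivate behavior can be sketched as a small hold-off timer, shown below. The class name, the fixed hold-off window, and the injectable clock are illustrative assumptions; the patent's actual engines operate on live sound data rather than a toy timestamp.

```python
import time

class WakeWordGuard:
    """Sketch: when a second detector flags a false wake word in a known
    audio stream, suppress the primary wake-word engine for a window."""

    def __init__(self, hold_off_s=5.0, clock=time.monotonic):
        self.hold_off_s = hold_off_s
        self.clock = clock  # injectable for testing
        self.deactivated_until = 0.0

    def on_false_wake_word(self):
        # Deactivate the primary engine for a predetermined amount of time.
        self.deactivated_until = self.clock() + self.hold_off_s

    def primary_engine_active(self):
        # Reactivates automatically once the hold-off window elapses.
        return self.clock() >= self.deactivated_until

# Drive the guard with a fake clock to show the two states.
t = [0.0]
guard = WakeWordGuard(hold_off_s=5.0, clock=lambda: t[0])
guard.on_false_wake_word()
active_during = guard.primary_engine_active()  # within the hold-off window
t[0] = 6.0
active_after = guard.primary_engine_active()   # after the window elapses
```

The point of the scheme is that audio the device is itself about to play (e.g., a TV ad containing the wake word) should not trigger the assistant, so detection is suppressed exactly while that audio is output.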
Method and apparatus for information query and storage medium
The present application discloses a method and an apparatus for information query, and an electronic device, which relates to a field of deep learning (DL), natural language processing (NLP) and artificial intelligence (AI) technology. The method includes: receiving a query sentence, segmenting the query sentence to obtain word segments, and obtaining a dependency relationship between two word segments and part of speech of the word segments; obtaining a coding sequence of the query sentence according to the dependency relationship and the part of speech of the word segments; matching the coding sequence with a generalized template to obtain a core corpus of the query sentence, wherein the generalized template comprises part of speech to be extracted and a dependency relationship to be extracted; and obtaining a query result corresponding to the query sentence based on the core corpus. The application no longer relies on the accumulation of massive business scenario data to enhance a generalization ability, which ensures accurate and efficient information query, and improves the efficiency and reliability of the information query process. At the same time, it may support information query in different business scenarios, with strong expansion capability and high universality.
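The template-matching step described above might be sketched as follows, under the simplifying assumption that each word segment already carries a part-of-speech tag and a dependency relation, and that a generalized template is just the set of (POS, dependency) pairs to extract. All names and tags here are illustrative, not the application's actual scheme.

```python
# Hypothetical sketch: encode a segmented query as (POS, dependency)
# pairs, then keep the words whose pairs appear in a generalized template
# to form the core corpus of the query.

def encode(segments):
    """segments: list of (word, pos, dep_relation) triples."""
    return [(pos, dep) for _, pos, dep in segments]

def extract_core(segments, template):
    """Keep words whose (pos, dep) pair the template asks to extract."""
    return [word for word, pos, dep in segments if (pos, dep) in template]

query = [("query", "v", "root"), ("the", "det", "det"),
         ("salary", "n", "obj"), ("of", "p", "case"),
         ("Alice", "nr", "nmod")]
template = {("n", "obj"), ("nr", "nmod")}  # extract object noun + modifier
core = extract_core(query, template)
```

Because the template is phrased over grammatical structure rather than specific words, the same template generalizes to queries it has never seen, which is the claimed alternative to accumulating massive business-scenario data.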
SYSTEMS AND METHODS FOR AUTOMATED AUDIO TRANSCRIPTION, TRANSLATION, AND TRANSFER FOR ONLINE MEETING
The present invention discloses systems and methods for multimedia processing. For example, the present invention provides systems and methods for receiving spoken audio, converting the spoken audio to text, and transferring the text to a user. As desired, the speech or text can be translated into one or more different languages. Systems and methods for real-time conversion and transmission of speech and text are provided, including systems and methods for large scale processing of multimedia events.
Intelligent Voice Interface for Handling Out-of-Context Dialog
In a method for handling out-of-sequence caller dialog, an intelligent voice interface is configured to lead callers through pathways of an algorithmic dialog that includes available voice prompts for requesting different types of caller information. The method may include, during a voice communication with a caller via a caller device, and without having first provided to the caller device any voice prompt that requests a first type of caller information, receiving from the caller device caller input data indicative of a voice input of the caller, and determining, by processing the caller input data, that the voice input includes caller information of the first type. The method also includes, after determining that the voice input includes the caller information of the first type, bypassing one or more voice prompts, of the available voice prompts, that request the first type of caller information.
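The prompt-bypassing logic could be sketched as below, using simple regular expressions as stand-ins for whatever speech-understanding component actually detects each information type. The extractor patterns, prompt structure, and function names are all illustrative assumptions.

```python
import re

# Hypothetical extractors: each maps an information type to a pattern
# that detects it in the transcribed caller input.
extractors = {
    "account_number": re.compile(r"\b\d{8}\b"),
    "zip_code": re.compile(r"\b\d{5}\b"),
}

def handle_caller_input(text, extractors, collected_types):
    """Record every information type found in the caller's voice input."""
    for info_type, pattern in extractors.items():
        if pattern.search(text):
            collected_types.add(info_type)
    return collected_types

def next_prompts(pending_prompts, collected_types):
    """Bypass prompts whose information type the caller already supplied."""
    return [p for p in pending_prompts if p["asks_for"] not in collected_types]

prompts = [
    {"text": "What is your account number?", "asks_for": "account_number"},
    {"text": "What is your ZIP code?", "asks_for": "zip_code"},
]
collected = set()
# Caller volunteers an account number before being asked for it.
handle_caller_input("hi, my account is 12345678", extractors, collected)
remaining = next_prompts(prompts, collected)
```

The effect is that a caller who volunteers information out of sequence is not forced back through a prompt asking for it again, which is the behavior the method is designed to produce.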
Inverted Projection for Robust Speech Translation
The technology provides an approach to train translation models that are robust to transcription errors and punctuation errors. The approach includes introducing errors from actual automatic speech recognition and automatic punctuation systems into the source side of the machine translation training data. A method for training a machine translation model includes performing automatic speech recognition on input source audio to generate a system transcript. The method aligns a human transcript of the source audio to the system transcript, including projecting system segmentation onto the human transcript. Then the method performs segment robustness training of a machine translation model according to the aligned human and system transcripts, and performs system robustness training of the machine translation model, e.g., by injecting token errors into training data.
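The token-error injection mentioned at the end could be sketched as follows. This is a generic noising routine of my own construction, assuming deletions and stripped punctuation as representative ASR/punctuation errors; the patent's actual approach derives its errors from real ASR and automatic-punctuation system output rather than random simulation.

```python
import random

def inject_token_errors(tokens, error_rate=0.1, rng=None):
    """Noise the source side of MT training data with ASR-style errors:
    randomly drop tokens (recognition errors) or strip trailing
    punctuation (punctuation errors). Illustrative, not the patent's
    exact procedure."""
    rng = rng or random.Random(0)  # seeded for reproducible training data
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < error_rate:
            continue                         # simulate a deleted token
        if r < 2 * error_rate:
            noisy.append(tok.strip(".,?!"))  # simulate lost punctuation
        else:
            noisy.append(tok)
    return noisy

clean = ["hello,", "how", "are", "you?"]
unchanged = inject_token_errors(clean, error_rate=0.0)  # no noise applied
emptied = inject_token_errors(clean, error_rate=1.0)    # everything dropped
```

Training the translation model on such noised source text, paired with the clean target, is what makes it tolerant of the transcription and punctuation errors it will see at inference time.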