G10L2015/081

Methods for hybrid GPU/CPU data processing

The present invention describes methods for performing large-scale graph traversal calculations on parallel processor platforms. The invention describes methods for on-the-fly hypothesis rescoring that utilize graphics processing units (GPUs) in combination with central processing units (CPUs) of computing devices. The invention is described in one embodiment as applied to the task of large vocabulary continuous speech recognition.

Decoding parameters for Viterbi search
09552808 · 2017-01-24

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for decoding parameters for Viterbi search are disclosed. In one aspect, a method includes the actions of receiving lattice data that defines a plurality of lattices. The actions include, for each defined lattice: determining a particular path that traverses the lattice; determining a node cost of a path from the start node to the frame node; determining a beam size for each frame; determining a beam cost width for each frame; determining a maximum beam size from the beam sizes determined for the frames; and determining a maximum beam cost width from the beam cost widths determined for the frames. The actions include selecting a particular beam size and a particular beam cost width. The actions include determining paths for additional lattices using the pruning parameters of the particular beam size and the particular beam cost width.
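
The parameter derivation this abstract describes can be sketched roughly as follows. The lattice representation here (a list of per-frame lists of surviving node costs) is an assumption for illustration, not the patent's actual data structure:

```python
def beam_parameters(lattices):
    """Each lattice is a list of frames; each frame is a list of path
    costs for the nodes active at that frame (assumed format)."""
    beam_sizes = []
    beam_cost_widths = []
    for lattice in lattices:
        max_size = 0
        max_width = 0.0
        for frame_costs in lattice:
            # Beam size for this frame: number of surviving nodes.
            max_size = max(max_size, len(frame_costs))
            # Beam cost width: spread between best and worst node cost.
            max_width = max(max_width, max(frame_costs) - min(frame_costs))
        beam_sizes.append(max_size)
        beam_cost_widths.append(max_width)
    # Select pruning parameters wide enough for every measured lattice.
    return max(beam_sizes), max(beam_cost_widths)
```

Taking the maximum over frames and then over lattices yields the smallest beam settings that would have preserved every reference path, which can then prune the search for additional lattices.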

PERFORMING SPEECH RECOGNITION USING A SET OF WORDS WITH DESCRIPTIONS IN TERMS OF COMPONENTS SMALLER THAN THE WORDS

A system and method are presented for performing dual mode speech recognition, employing a local recognition module on a mobile device and a remote recognition engine on a server device. The system accepts a spoken query from a user, and both the local recognition module and the remote recognition engine perform speech recognition operations on the query, returning a transcription and confidence score, subject to a latency cutoff time. If both sources successfully transcribe the query, then the system accepts the result having the higher confidence score. If only one source succeeds, then that result is accepted. In either case, if the remote recognition engine does succeed in transcribing the query, then a client vocabulary is updated if the remote system result includes information not present in the client vocabulary.
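
The arbitration rule in this abstract is simple enough to sketch directly. The `(transcription, confidence)` tuple format and the use of `None` for a recognizer that failed or missed the latency cutoff are assumptions for illustration:

```python
def arbitrate(local_result, remote_result):
    """Each argument is a (transcription, confidence) tuple, or None if
    that recognizer failed or missed the latency cutoff (assumed format).
    Returns the winning transcription, or None if both failed."""
    if local_result and remote_result:
        # Both succeeded: keep the higher-confidence transcription.
        return max(local_result, remote_result, key=lambda r: r[1])[0]
    # Only one (or neither) succeeded: accept whichever is available.
    chosen = local_result or remote_result
    return chosen[0] if chosen else None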

Electronic device and speech processing method thereof

According to various example embodiments, an electronic device includes a microphone configured to receive an audio signal including speech of a user, a processor, and a memory configured to store instructions executable by the processor and personal information of the user, in which the processor is configured to extract a plurality of speech recognition candidates by analyzing a feature of the speech of the user, extract a keyword based on the plurality of speech recognition candidates, search for replacement data, based on the keyword and the personal information, and generate a recognition result corresponding to the speech of the user, based on the replacement data.
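
The extract-keyword-then-replace flow can be sketched as below. Treating the token where the hypotheses disagree as the keyword, and fuzzy-matching it against stored personal names with `difflib`, are assumed heuristics (the abstract does not specify the matching method), and the sketch assumes hypotheses of equal token length:

```python
import difflib

def replace_with_personal_data(candidates, personal_names):
    """candidates: speech-recognition hypotheses, e.g. ['call jon', 'call john'].
    personal_names: names from the user's stored personal information."""
    token_lists = [c.split() for c in candidates]
    result = []
    for tokens in zip(*token_lists):
        if len(set(tokens)) == 1:
            # All hypotheses agree: keep the word as recognized.
            result.append(tokens[0])
            continue
        # Hypotheses disagree: treat this slot as the keyword and search
        # the personal information for replacement data.
        matches = difflib.get_close_matches(
            tokens[0], personal_names, n=1, cutoff=0.5)
        result.append(matches[0] if matches else tokens[0])
    return " ".join(result)
```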

Specifying preferred information sources to an assistant
12347429 · 2025-07-01

Implementations relate to interactions between a user and an automated assistant during a dialog between the user and the automated assistant. Some implementations relate to processing received user request input to determine that it is of a particular type that is associated with a source parameter rule and, in response, causing one or more sources indicated as preferred by the source parameter rule and one or more additional sources not indicated by the source parameter rule to be searched based on the user request input. Further, those implementations relate to identifying search results of the search(es), and generating, in dependence on the search results, a response to the user request that includes content from search result(s) of the preferred source(s) and/or content from search result(s) of the additional source(s). Generating the response further includes including, in the response, an indication of whether the source parameter rule was followed or violated in generating the response.
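
The routing described here can be sketched as follows. Everything concrete in this sketch (the request-type detection, the `"general_index"` fallback source, the wording of the indication) is hypothetical; only the overall flow mirrors the abstract:

```python
def respond(request, source_rules, search):
    """request: user input text. source_rules: mapping from request type
    to a preferred source. search: callable (source, request) -> list of
    result strings. All names here are hypothetical."""
    # Determine the request type and look up its source parameter rule.
    request_type = "news" if "news" in request else "general"
    preferred = source_rules.get(request_type)
    additional = ["general_index"]  # sources not indicated by the rule
    preferred_results = search(preferred, request) if preferred else []
    additional_results = [r for src in additional for r in search(src, request)]
    # Prefer content from the rule's source; fall back to other sources.
    if preferred_results:
        content, followed = preferred_results[0], True
    else:
        content = additional_results[0] if additional_results else None
        followed = False
    # Attach an indication of whether the rule was followed or violated.
    indication = ("from your preferred source" if followed
                  else "preferred source unavailable")
    return content, indication
```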

Removing bias from automatic speech recognition models using internal language model estimates

Bias may be removed from automatic speech recognition model predictions using internal language model estimates. Audio data may be received for speech recognition. The audio data may be processed through an automatic speech recognition model to produce original word token predictions, and also processed with different portions of the audio data masked to produce other word token predictions for the masked audio. A comparison of the original word token predictions and the other word token predictions may provide an estimate of an internal language model for the automatic speech recognition model. This estimate can be used to modify the original word token predictions to remove the lexical bias and produce a speech prediction.
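
A minimal sketch of the correction step, assuming the masked-audio predictions have already been averaged into per-token log-probabilities and that the bias is removed by weighted log-domain subtraction (the interpolation `weight` is an assumed hyperparameter, not from the abstract):

```python
def remove_internal_lm_bias(original_logprobs, masked_logprobs, weight=0.3):
    """original_logprobs: token -> log-probability from the full audio.
    masked_logprobs: token -> log-probability averaged over runs with
    different portions of the audio masked; with the acoustics hidden,
    these approximate the model's internal language model.
    Subtracting the scaled estimate reduces the lexical bias."""
    return {tok: lp - weight * masked_logprobs.get(tok, 0.0)
            for tok, lp in original_logprobs.items()}
```

After adjustment, a token favored only by the internal language model (high masked score, mediocre acoustic score) loses its advantage relative to acoustically supported alternatives.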

SYSTEMS AND METHODS FOR IMPROVED HANDLING OF OUT-OF-VOCABULARY WORDS IN SPEECH RECOGNITION SYSTEMS
20250298980 · 2025-09-25

Systems and methods applicable, for instance, to improved handling of out-of-vocabulary words in speech recognition systems. A machine learning model can be trained to selectively associate frequency tokens with transcribed words. Once the model has been trained, a system can make a decision to turn on or turn off the use of contextual information for a given transcribed word, based on the frequency token placement decision made by the machine learning model for that transcribed word.
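
The gating decision can be sketched as below. The token string, the tagged-word format, and the prefix-matching heuristic used when contextual information is switched on are all assumptions; in the patent the placement decision comes from a trained machine learning model, which this sketch takes as given input:

```python
RARE_TOKEN = "<rare>"

def apply_contextual_biasing(tagged_words, context_phrases):
    """tagged_words: (word, tag) pairs where the trained model has
    attached RARE_TOKEN to words it judges out-of-vocabulary, e.g.
    [("call", None), ("xiomara", "<rare>")] (assumed format)."""
    corrected = []
    for word, tag in tagged_words:
        if tag == RARE_TOKEN:
            # Contextual info ON: try to snap the word to a context phrase.
            match = next((p for p in context_phrases
                          if p.lower().startswith(word[:2].lower())), word)
            corrected.append(match)
        else:
            # Contextual info OFF for common, in-vocabulary words.
            corrected.append(word)
    return corrected
```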

SPECIFYING PREFERRED INFORMATION SOURCES TO AN ASSISTANT
20250329330 · 2025-10-23

Implementations relate to interactions between a user and an automated assistant during a dialog between the user and the automated assistant. Some implementations relate to processing received user request input to determine that it is of a particular type that is associated with a source parameter rule and, in response, causing one or more sources indicated as preferred by the source parameter rule and one or more additional sources not indicated by the source parameter rule to be searched based on the user request input. Further, those implementations relate to identifying search results of the search(es), and generating, in dependence on the search results, a response to the user request that includes content from search result(s) of the preferred source(s) and/or content from search result(s) of the additional source(s). Generating the response further includes including, in the response, an indication of whether the source parameter rule was followed or violated in generating the response.

Deep learning internal state index-based search and classification
12499875 · 2025-12-16

Systems and methods are disclosed for generating internal state representations of a neural network during processing and using the internal state representations for classification or search. In some embodiments, the internal state representations are generated from the output activation functions of a subset of nodes of the neural network. The internal state representations may be used for classification by training a classification model using internal state representations and corresponding classifications. The internal state representations may be used for search by producing a search feature from a search input and comparing the search feature with one or more feature representations to find the feature representation with the highest degree of similarity.
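
The search path can be sketched as below. Representing the internal state as a plain vector of node activations and using cosine similarity as the comparison are assumptions; the patent does not fix a similarity measure:

```python
import math

def internal_state(activations, node_subset):
    """Build an internal state representation from the output activations
    of a chosen subset of nodes (activations: node name -> value)."""
    return [activations[n] for n in node_subset]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (math.sqrt(sum(x * x for x in a))
            * math.sqrt(sum(y * y for y in b)))
    return dot / norm if norm else 0.0

def search(query_feature, feature_index):
    """Return the key of the stored feature representation with the
    highest degree of similarity to the query feature."""
    return max(feature_index,
               key=lambda k: cosine_similarity(query_feature, feature_index[k]))
```

The classification path would instead feed such vectors, with their labels, into any standard classifier as training data.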

Vehicle and control method thereof
12499888 · 2025-12-16

A vehicle and a control method thereof include: a microphone to which a speech command of a user is input; a communication module configured to receive contact data and contact history data from a mobile device; at least one memory configured to store a first speech recognition database, obtained based on the contact history data received from the mobile device, and a second speech recognition database, obtained based on the contact data received from the mobile device; and at least one processor configured to, when a speech command for calling or texting is input to the microphone, determine a final recipient or generate a recipient candidate list, based on recipient information included in the speech command, the stored first speech recognition database, and the stored second speech recognition database.
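
The two-database lookup can be sketched as below. Consulting the history-derived database before the full contact database, and matching by exact lowercase name, are assumed details; the abstract only says both databases feed the recipient decision:

```python
def resolve_recipient(spoken_name, history_db, contacts_db):
    """history_db: names from the first database (built from contact
    history); contacts_db: names from the second database (built from
    full contact data). Returns a single final recipient, a candidate
    list when the match is ambiguous, or None (assumed behavior)."""
    key = spoken_name.lower()
    # Check the smaller, history-derived database first, then fall back
    # to the full contact database.
    for db in (history_db, contacts_db):
        candidates = [name for name in db if name.lower() == key]
        if len(candidates) == 1:
            return candidates[0]   # final recipient determined
        if candidates:
            return candidates      # ambiguous: return candidate list
    return None
```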