Patent classifications
G10L2015/085
Annotating maps with user-contributed pronunciations
Systems and methods are provided to select the most typical pronunciation of a location name on a map from a plurality of user pronunciations. A server generates a reference speech model based on the user pronunciations, compares the user pronunciations with the speech model, and selects a pronunciation based on the comparison. Alternatively, the server compares the distances between each of the user pronunciations and every other user pronunciation and selects a pronunciation based on that comparison. The server then annotates the map with the selected pronunciation and, upon a user's request, provides audio output of the location name to the user's device.
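The pairwise-distance alternative can be read as picking the medoid: the recording whose total distance to all other recordings is smallest. The sketch below is only an illustration under stated assumptions. The abstract does not name a distance measure, so dynamic time warping (DTW) over 1-D per-frame feature sequences is swapped in here, and the function names `dtw_distance` and `most_typical` are hypothetical.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """DTW distance between two 1-D feature sequences (assumed stand-in metric)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest alignment cost of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def most_typical(pronunciations: list[Sequence[float]]) -> int:
    """Index of the pronunciation with the smallest summed distance to all others."""
    totals = [
        sum(dtw_distance(p, q) for j, q in enumerate(pronunciations) if j != i)
        for i, p in enumerate(pronunciations)
    ]
    return min(range(len(pronunciations)), key=totals.__getitem__)
```

For example, with the toy feature sequences `[1, 2, 3]`, `[1, 2, 2, 3]` and `[5, 6, 7]`, the outlier `[5, 6, 7]` has the largest total distance and index 0 is selected. DTW is used so that a slower utterance of the same name is not penalised for its length.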
Communication-based monitoring of compliance with aviation regulations and operating procedures
A method for detecting noncompliance with aviation regulations and operating procedures is disclosed herein. The method analyzes communication associated with a host aircraft to identify key information, stores the identified key information in a data storage device, and determines whether a particular aviation regulation or operating procedure applies to the host aircraft. After determining that a regulation or operating procedure applies, the stored key information is compared against reference information maintained in a database in association with the regulation or operating procedure. An alert is generated when the comparing detects a discrepancy between the stored key information and the reference information.
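The apply-then-compare logic can be sketched as follows. This is a hypothetical illustration, not the patented method: the `Procedure` record, the dictionary of extracted key information, and the use of plain equality as the discrepancy test are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Procedure:
    name: str
    applies: Callable[[dict], bool]  # hypothetical test: does it apply to the host aircraft?
    reference: dict                  # reference values stored for this procedure


def check_compliance(key_info: dict, procedures: list[Procedure]) -> list[str]:
    """Return one alert string per discrepancy found in applicable procedures."""
    alerts = []
    for proc in procedures:
        if not proc.applies(key_info):
            continue  # regulation or procedure does not apply; nothing to compare
        for field, expected in proc.reference.items():
            actual = key_info.get(field)
            if actual != expected:
                alerts.append(f"{proc.name}: {field} is {actual!r}, expected {expected!r}")
    return alerts
```

A made-up "gear down on approach" procedure that applies when `phase == "approach"` with reference `{"landing_gear": "down"}` would raise one alert for key information `{"phase": "approach", "landing_gear": "up"}` and none during cruise.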
Dynamic pruning in speech recognition
In a dynamic automatic speech recognition (ASR) processing system, ASR processing may be configured to estimate the latency of returning speech results to a user based on the work being done by an ASR processor. The ASR processing system may measure that work by measuring one or more time-independent metrics and comparing the metrics to threshold values. If the metrics exceed the thresholds, the ASR system may take steps to reduce the latency associated with processing the utterance, including adjusting a speech recognition parameter.
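A minimal sketch of the threshold check, assuming the adjusted parameters are a beam width and a maximum number of active hypotheses. The metric names, the `shrink` factor, and the `min_beam` floor are hypothetical choices for illustration; the abstract only says that a speech recognition parameter is adjusted when metrics exceed their thresholds.

```python
from dataclasses import dataclass


@dataclass
class DecoderParams:
    beam_width: float
    max_active: int


def adjust_for_latency(metrics: dict[str, float], thresholds: dict[str, float],
                       params: DecoderParams, shrink: float = 0.8,
                       min_beam: float = 4.0) -> DecoderParams:
    """Tighten pruning when any work metric exceeds its threshold."""
    exceeded = any(metrics.get(name, 0.0) > limit for name, limit in thresholds.items())
    if not exceeded:
        return params  # workload is acceptable; keep current settings
    return DecoderParams(
        beam_width=max(min_beam, params.beam_width * shrink),
        max_active=max(1, int(params.max_active * shrink)),
    )
```

Because the metrics are counts of work (for example, active hypotheses per frame) rather than wall-clock times, the same thresholds behave consistently regardless of how loaded the host machine is.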
DYNAMIC ADAPTATION OF LANGUAGE MODELS AND SEMANTIC TRACKING FOR AUTOMATIC SPEECH RECOGNITION
Generally, this disclosure provides systems, devices, methods and computer readable media for adaptation of language models and semantic tracking to improve automatic speech recognition (ASR). A system for recognizing phrases of speech from a conversation may include an ASR circuit configured to transcribe a user's speech to a first estimated text sequence, based on a generalized language model. The system may also include a language model matching circuit configured to analyze the first estimated text sequence to determine a context and to select a personalized language model (PLM), from a plurality of PLMs, based on that context. The ASR circuit may further be configured to re-transcribe the speech based on the selected PLM to generate a lattice of paths of estimated text sequences, wherein each of the paths comprises one or more words and an acoustic score associated with each of the words.
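The context-matching step can be illustrated with a deliberately simple stand-in. The abstract does not say how context is determined, so this sketch assumes each PLM is summarised by a hypothetical keyword set and picks the PLM with the largest overlap with the first-pass transcript, falling back to a default (generalized) model when nothing matches.

```python
def select_plm(first_pass_text: str, plm_keywords: dict[str, set[str]], default: str) -> str:
    """Pick the PLM whose (assumed) keyword set best overlaps the first-pass text."""
    words = set(first_pass_text.lower().split())
    scores = {name: len(words & keywords) for name, keywords in plm_keywords.items()}
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0:
        return default  # no context detected; keep the generalized model
    return best
```

The selected name would then identify which PLM drives the second transcription pass; a real matcher would more likely use a topic classifier than raw keyword overlap.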
System and Method for Dynamically Adjusting a Number of Emissions in Speech Processing Systems Operating with Large Stride Values
A method, computer program product, and computing system for dynamically adjusting the number of emitted tokens per frame in speech processing systems operating with large stride values. The number of emitted tokens per frame can be dynamically adjusted in speech processing systems operating with large stride values by processing a signal frame according to a time-synchronous beam search technique at a frame rate based on a stride value; determining a hypothesis score for each hypothesis of a set of first information for the signal frame; determining a hypothesis score for each hypothesis of a set of second information for the signal frame; comparing a worst hypothesis score of the set of first information to a sum of a best hypothesis score of the set of second information and a threshold value; and ceasing processing of the signal frame when the worst hypothesis score of the set of first information is greater than the sum of the best hypothesis score of the set of second information and the threshold value.
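The stopping test can be written directly from the abstract's comparison. The sketch assumes higher scores are better (for example, log-probabilities), so the "worst" score is the minimum and the "best" is the maximum; the `step` callback that yields the two score sets for each emission step is a hypothetical stand-in for the beam-search expansion.

```python
from typing import Callable, Sequence

Scores = Sequence[float]


def should_cease(first: Scores, second: Scores, threshold: float) -> bool:
    """True when the worst first-set score beats the best second-set score plus threshold."""
    return min(first) > max(second) + threshold


def emissions_for_frame(step: Callable[[int], tuple[Scores, Scores]],
                        threshold: float, max_emissions: int) -> int:
    """Run emission steps for one frame until the cease test fires; return steps used."""
    for k in range(max_emissions):
        first, second = step(k)
        if should_cease(first, second, threshold):
            return k + 1
    return max_emissions
```

With a large stride each frame covers more audio and may need several emitted tokens, so `max_emissions` acts only as a safety cap while the score comparison decides how many emissions a frame actually gets.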
Decoding parameters for Viterbi search
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for decoding parameters for Viterbi search are disclosed. In one aspect, a method includes the actions of receiving lattice data that defines a plurality of lattices. For each defined lattice, the actions include determining a particular path that traverses the lattice; determining a node cost of a path from the start node to the frame node; determining a beam size for each frame; determining a beam cost width for each frame; determining a maximum beam size from the beam sizes determined for the frames; and determining a maximum beam cost width from the beam cost widths determined for the frames. The actions further include selecting a particular beam size and a particular beam cost width, and determining paths for additional lattices using the particular beam size and the particular beam cost width as pruning parameters.
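One way to read the per-frame quantities: the beam size is how many nodes must survive pruning for the chosen path's node to survive, and the beam cost width is how far that node's cost sits above the frame's best cost. The sketch assumes lower costs are better, models a lattice as a list of frames mapping hypothetical node ids to best-path costs, and selects the final parameters as the maximum over all lattices; the abstract does not state the selection rule, so that choice is an assumption.

```python
# A lattice is modelled as a list of frames; each frame maps a node id to the cost of
# the cheapest path from the start node to that node (lower is better).
Lattice = list[dict[str, float]]


def frame_stats(frame: dict[str, float], path_node: str) -> tuple[int, float]:
    path_cost = frame[path_node]
    best_cost = min(frame.values())
    beam_size = sum(1 for c in frame.values() if c <= path_cost)  # nodes that must be kept
    beam_width = path_cost - best_cost                             # cost margin the path needs
    return beam_size, beam_width


def lattice_max(lattice: Lattice, path: list[str]) -> tuple[int, float]:
    stats = [frame_stats(frame, node) for frame, node in zip(lattice, path)]
    return max(s for s, _ in stats), max(w for _, w in stats)


def select_pruning(lattices: list[Lattice], paths: list[list[str]]) -> tuple[int, float]:
    """Assumed rule: take the largest per-lattice maxima so every training path survives."""
    per_lattice = [lattice_max(lat, p) for lat, p in zip(lattices, paths)]
    return max(s for s, _ in per_lattice), max(w for _, w in per_lattice)
```

Taking the maximum guarantees that none of the given paths would have been pruned; a percentile would trade a little accuracy on outlier lattices for a tighter, faster search.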
Systems and methods for utilizing voice commands onboard an aircraft
A system and method are provided for programming a flight management system in response to a voice input. The voice input is validated by comparing the requested action against the current operational state of the aircraft (for example, climb, level flight, descent, speed, altitude, and heading) and against the validity and availability of operations (for example, which operations are and are not allowed), as derived from the flight management system's planned and predicted lateral and vertical trajectory of the flight route (flight plan) from the origin or present position to the destination. This validation filters, or adapts, the pilot's spoken vocabulary to improve recognition accuracy and reduce false positives.
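The validation idea can be sketched as a phase-dependent vocabulary plus a simple trajectory check. Everything named here is hypothetical: the flight phases, the command names in `ALLOWED_COMMANDS`, and the single altitude rule are invented for illustration and are not taken from any real flight management system.

```python
from dataclasses import dataclass


@dataclass
class AircraftState:
    phase: str          # e.g. "climb", "cruise", "descent" (assumed labels)
    altitude_ft: float


# Hypothetical table of commands accepted in each flight phase.
ALLOWED_COMMANDS = {
    "climb": {"set_cruise_altitude", "set_speed", "direct_to"},
    "cruise": {"set_cruise_altitude", "set_speed", "direct_to", "set_descent"},
    "descent": {"set_speed", "direct_to", "set_approach"},
}


def active_vocabulary(state: AircraftState) -> set[str]:
    """Commands the recogniser should listen for in the current phase."""
    return ALLOWED_COMMANDS.get(state.phase, set())


def validate(command: str, value: float, state: AircraftState) -> bool:
    if command not in active_vocabulary(state):
        return False  # not available in this phase, so treat as a likely misrecognition
    # Example trajectory check: while climbing, a new cruise altitude must be higher.
    if command == "set_cruise_altitude" and state.phase == "climb":
        return value > state.altitude_ft
    return True
```

Restricting the active vocabulary before recognition shrinks the search space, which is where the claimed accuracy and false-positive improvements would come from.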
Speech decoding method and apparatus, computer device, and storage medium
A method for speech decoding is performed by a computer device. The method includes: obtaining audio data corresponding to a speech, the audio data including a first audio frame and a second audio frame; decoding the first audio frame using a first decoding network corresponding to a low-order language model and a second decoding network corresponding to a differential language model to obtain a plurality of first tokens, each first token having a corresponding decoding score according to the first and second decoding networks; determining pruning parameters according to a target token, namely the first token having the smallest decoding score, wherein the pruning parameters are used to restrict the decoding of the second audio frame; and decoding the second audio frame using the first decoding network and the second decoding network according to the first token list (the plurality of first tokens) and the pruning parameters.
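A minimal sketch of deriving pruning parameters from the best token, assuming the decoding score is an accumulated cost (smaller is better) that already combines both decoding networks. The `beam` offset and `max_tokens` cap are assumed forms of "pruning parameters"; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Token:
    state: int
    score: float  # accumulated decoding cost; smaller is better (assumption)


def pruning_threshold(tokens: list[Token], beam: float) -> float:
    """Threshold = score of the target token (smallest score) plus a beam offset."""
    best = min(tokens, key=lambda t: t.score)
    return best.score + beam


def prune(candidates: list[Token], threshold: float, max_tokens: int) -> list[Token]:
    """Keep the next frame's candidates within the threshold, best first, up to a cap."""
    kept = [t for t in candidates if t.score <= threshold]
    kept.sort(key=lambda t: t.score)
    return kept[:max_tokens]
```

Fixing the threshold from the previous frame's best token lets the decoder discard poor expansions of the second frame as soon as they are generated, instead of scoring every candidate first.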
METHOD AND APPARATUS WITH DECODING IN NEURAL NETWORK FOR SPEECH RECOGNITION
A decoding method includes receiving an input sequence corresponding to an input speech at a current time; and in a neural network (NN) for speech recognition, generating an encoded vector sequence by encoding the input sequence, determining reuse tokens from candidate beams of two or more previous times by comparing the candidate beams of the previous times, and decoding one or more tokens subsequent to the reuse tokens based on the reuse tokens and the encoded vector sequence.
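The beam comparison could be realised as a longest-common-prefix test: tokens on which every candidate beam from the previous times agrees are treated as settled and reused, and decoding resumes after them. This is a hypothetical reading, not the claimed method; the nested-list layout of `previous_beams` is an assumption.

```python
def reuse_tokens(previous_beams: list[list[list[str]]]) -> list[str]:
    """Longest token prefix shared by every candidate beam of every previous time step.

    previous_beams[t] holds the candidate beams (token lists) kept at previous time t.
    """
    beams = [beam for step in previous_beams for beam in step]
    if not beams:
        return []
    prefix = []
    for column in zip(*beams):  # zip stops at the shortest beam
        if all(tok == column[0] for tok in column):
            prefix.append(column[0])
        else:
            break  # beams disagree from here on; stop reusing
    return prefix
```

In streaming recognition the same audio prefix is re-decoded at every step, so reusing the agreed-upon prefix avoids repeating decoder work for tokens that are no longer changing.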