G10L15/193

Open smart speaker

Methods to build an open smart speaker to orchestrate voice services from multiple providers, and open smart speakers that orchestrate voice services from multiple providers.

SEMANTIC UNDERSTANDING METHOD AND APPARATUS, AND DEVICE AND STORAGE MEDIUM
20220392440 · 2022-12-08

A semantic understanding method and apparatus, and a device and storage medium, are provided. The method includes: acquiring a recognition character string that matches speech information; acquiring, from an entity vocabulary library, at least one entity vocabulary corresponding to each recognition character in the recognition character string; and determining a matching entity vocabulary as the semantic understanding result of the speech information according to how each entity vocabulary hits the recognition character string. With this method, even when no completely matching entity vocabulary is acquired, a matching entity vocabulary can still be determined from the entity vocabulary library, so the semantic information of the speech is understood accurately. The method is also relatively tolerant of wrong words, added words, and omitted words, which improves the semantic understanding accuracy for speech information.
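The fault tolerance described above can be sketched as a fuzzy lookup against the entity vocabulary library; the function name, similarity measure, and threshold below are illustrative assumptions, not details from the patent:

```python
from difflib import SequenceMatcher

def best_entity_match(recognized, entity_vocab, threshold=0.6):
    """Return the entity vocabulary entry that best matches the recognized
    string, tolerating wrong, added, or omitted characters."""
    best, best_score = None, 0.0
    for entity in entity_vocab:
        score = SequenceMatcher(None, recognized, entity).ratio()
        if score > best_score:
            best, best_score = entity, score
    return best if best_score >= threshold else None
```

Even a misrecognized string such as `"plai musics"` still resolves to the entity `"play music"`, because no exact match is required.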

SPEECH RECOGNITION METHODS AND SYSTEMS WITH CONTEXTUAL KEYWORD MAPPING

Methods and systems are provided for assisting operation of a vehicle using speech recognition. One method involves automatically identifying a parameter value for an operational subject based at least in part on a preceding audio communication with respect to the vehicle. Thereafter, an audio input is recognized as an input command, a second operational subject associated with the input command is determined, and a vehicle system is automatically commanded to implement the parameter value for the operational subject when the second operational subject maps or otherwise corresponds to the operational subject associated with the parameter value. In this regard, the second operational subject may be conveyed by the user enunciating, as part of the input command, placeholder terminology that maps to the operational subject.
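The placeholder-terminology mapping can be sketched as a small context store plus a lookup table; the placeholder phrases, subject names, and class API below are hypothetical, chosen only to illustrate the idea:

```python
# Hypothetical placeholder phrases mapped to operational subjects.
PLACEHOLDER_MAP = {
    "assigned altitude": "altitude",
    "that frequency": "radio_frequency",
}

class CommandResolver:
    def __init__(self):
        self.context = {}  # operational subject -> parameter value

    def observe(self, subject, value):
        """Record a parameter value heard in a preceding communication."""
        self.context[subject] = value

    def resolve(self, placeholder):
        """Map placeholder terminology in a command back to the stored
        operational subject and its parameter value."""
        subject = PLACEHOLDER_MAP.get(placeholder)
        return subject, self.context.get(subject)
```

After observing "maintain 9000" in a preceding communication, a later command containing "assigned altitude" resolves to the stored value 9000.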

Neural network accelerator with compact instruction set
11520561 · 2022-12-06

Described herein is a neural network accelerator with a set of neural processing units and an instruction set for execution on the neural processing units. The instruction set is a compact instruction set including various compute and data move instructions for implementing a neural network. Among the compute instructions are an instruction for performing a fused operation comprising sequential computations, one of which involves matrix multiplication, and an instruction for performing an elementwise vector operation. The instructions in the instruction set are highly configurable and can handle data elements of variable size. The instructions also implement a synchronization mechanism that allows asynchronous execution of data move and compute operations across different components of the neural network accelerator as well as between multiple instances of the neural network accelerator.
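The fused-operation instruction can be sketched in plain Python as a single routine that performs the sequential computations (matrix multiply, bias add, elementwise activation) in one step; the operand shapes and the choice of ReLU as the elementwise operation are illustrative assumptions:

```python
def fused_matmul_relu(a, b, bias):
    """Sketch of one fused compute instruction: matrix multiplication,
    bias add, then an elementwise ReLU, executed as a single operation."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = sum(a[i][k] * b[k][j] for k in range(inner)) + bias[j]
            row.append(max(acc, 0.0))  # elementwise ReLU fused in
        out.append(row)
    return out
```

On real hardware the fusion avoids writing the intermediate matmul result back to memory between the two computations.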

NATURAL LANGUAGE INTERFACES

It is not trivial to implement speech and natural language processing in offline embedded systems. Voice control of devices in various settings and applications can benefit from an embedded speech and natural language processing solution. One feature that helps to correct automatic speech recognition outputs is grammar projection. Another feature addresses situations where there is imperfect information or incomplete information by providing an application programming interface to enable structured queries and responses between an interpreter and an application.
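Grammar projection, as described above, can be sketched as snapping a raw ASR hypothesis onto the closest phrase the offline grammar accepts; the grammar contents and cutoff below are illustrative, not from the source:

```python
import difflib

# Hypothetical fixed command grammar for an offline embedded device.
GRAMMAR = ["turn on the lights", "turn off the lights", "set temperature"]

def project_to_grammar(hypothesis, grammar=GRAMMAR):
    """Project a raw ASR hypothesis onto the nearest in-grammar phrase,
    correcting small recognition errors; None if nothing is close."""
    matches = difflib.get_close_matches(hypothesis, grammar, n=1, cutoff=0.5)
    return matches[0] if matches else None
```

A slightly garbled hypothesis such as `"set temprature"` is corrected to the in-grammar phrase `"set temperature"`.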

SYSTEM AND/OR METHOD FOR SEMANTIC PARSING OF AIR TRAFFIC CONTROL AUDIO
20230059866 · 2023-02-23

The method S200 can include: at an aircraft, receiving an audio utterance from air traffic control S210, converting the audio utterance to text, determining commands from the text using a question-and-answer model S240, and optionally controlling the aircraft based on the commands S250. The method functions to automatically interpret flight commands from the air traffic control (ATC) stream.
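The command-determination step can be sketched with regular expressions standing in for the question-and-answer model; the phraseologies matched below are illustrative, and a real system would use a learned model rather than patterns:

```python
import re

# Regex stand-ins for the question-and-answer model (illustrative only).
HEADING = re.compile(r"turn (left|right) heading (\d{3})")
ALTITUDE = re.compile(r"(climb|descend) and maintain (\d+)")

def parse_atc(text):
    """Extract structured flight commands from transcribed ATC text."""
    commands = {}
    m = HEADING.search(text)
    if m:
        commands["heading"] = int(m.group(2))
    m = ALTITUDE.search(text)
    if m:
        commands["altitude"] = int(m.group(2))
    return commands
```

A transcribed utterance like "turn left heading 270 climb and maintain 5000" yields both a heading and an altitude command.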

SYSTEMS AND METHODS FOR GENERATING A DYNAMIC LIST OF HINT WORDS FOR AUTOMATED SPEECH RECOGNITION
20230030830 · 2023-02-02

Systems and methods are provided for determining hint words that improve the accuracy of automated speech recognition (ASR) systems. Hint words are typically determined in the context of a user issuing voice commands to a voice interface system; however, a voice interface system may also capture terms from overheard content and/or conversations. A system may determine a sliding window of hint words using a set of qualifier rules. The system may capture audio, e.g., from a conversation or played-back content, as a first input and decipher a plurality of words including a qualifying first term, which is added to the hint words. The voice interface system may capture more audio as a second input and decipher a second plurality of words including a qualifying second term. The first term may be removed from the set of hint words, e.g., when the second term is added or after an expiration time.
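The sliding window with size-based eviction and time-based expiration can be sketched as follows; the window size, time-to-live, and class API are illustrative assumptions:

```python
import time
from collections import OrderedDict

class HintWindow:
    """Sliding window of ASR hint words: a term is dropped when newer
    terms push it out of the window or when its time-to-live expires."""

    def __init__(self, max_size=50, ttl=60.0, clock=time.monotonic):
        self.max_size, self.ttl, self.clock = max_size, ttl, clock
        self._hints = OrderedDict()  # term -> time added

    def add(self, term):
        self._hints.pop(term, None)          # re-adding refreshes the term
        self._hints[term] = self.clock()
        while len(self._hints) > self.max_size:
            self._hints.popitem(last=False)  # evict the oldest term

    def current(self):
        now = self.clock()
        for term, added in list(self._hints.items()):
            if now - added > self.ttl:       # drop expired terms
                del self._hints[term]
        return list(self._hints)
```

With a window of size two, adding a third term evicts the oldest, and all terms disappear once the expiration time passes.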

SPEECH RECOGNITION USING ON-THE-FLY-CONSTRAINED LANGUAGE MODEL PER UTTERANCE

Presented herein are techniques for augmenting a speech recognition engine. According to the disclosed techniques, audio data is obtained as part of an automatic speech recognition session. Speech hints are also obtained as part of the automatic speech recognition session. A dynamic language model is generated from the speech hints for use during the automatic speech recognition session. A combined language model is then generated from the dynamic language model and a static language model. Finally, the audio data is converted to text using the combined language model as part of the automatic speech recognition session.
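Combining the dynamic and static language models can be sketched as linear interpolation at the unigram level; the weighting scheme, smoothing floor, and uniform dynamic model are illustrative assumptions, and production systems interpolate full n-gram or neural models:

```python
def combined_score(word, static_lm, hints, hint_weight=0.3):
    """Interpolate a static unigram probability with a dynamic model
    built on the fly from per-utterance speech hints (uniform over hints)."""
    p_static = static_lm.get(word, 1e-6)  # small floor for unseen words
    p_dynamic = 1.0 / len(hints) if word in hints else 0.0
    return (1 - hint_weight) * p_static + hint_weight * p_dynamic
```

Given two acoustically confusable words with equal static probability, the one present in the utterance's speech hints receives the higher combined score, steering recognition toward it for that session only.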

Meeting-adapted language model for speech recognition

A system includes acquisition of meeting data associated with a meeting, determination of a plurality of meeting participants based on the acquired meeting data, acquisition of e-mail data associated with each of the plurality of meeting participants, generation of a meeting language model based on the acquired e-mail data and the meeting data, and transcription of audio associated with the meeting based on the meeting language model.
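The meeting language model generation can be sketched as pooling word statistics from the participants' e-mail data and the meeting data; the unigram form below is an illustrative simplification of whatever model the system actually trains:

```python
from collections import Counter

def meeting_language_model(email_texts, meeting_text):
    """Build unigram probabilities pooled from meeting participants'
    e-mail data and the meeting data itself."""
    counts = Counter()
    for text in email_texts + [meeting_text]:
        counts.update(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}
```

Terms that recur in the participants' e-mail threads, such as project names, end up with higher probability and are therefore favored when transcribing the meeting audio.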