G10L2015/228

MAN-MACHINE DIALOGUE MODE SWITCHING METHOD
20220399020 · 2022-12-15

The present disclosure provides a man-machine dialogue mode switching method, which is applicable to an electronic device. The method includes receiving a current user sentence spoken by a current user; determining whether a dialogue field to which the current user sentence belongs is a preset dialogue field; if yes, switching the current dialogue mode to a full-duplex dialogue mode; and if not, switching the current dialogue mode to a half-duplex dialogue mode. In the present disclosure, the dialogue mode is switched by determining whether the dialogue field to which the current user sentence belongs is the preset dialogue field, and the dialogue mode can be switched and adjusted automatically according to the dialogue field, such that the man-machine dialogue always proceeds in the most suitable dialogue mode and runs smoothly.
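The claimed switching logic can be sketched as a simple classify-then-select step. The field names, the toy keyword classifier, and the function names below are illustrative assumptions, not part of the disclosure:

```python
# Hypothetical sketch: classify the user's sentence into a dialogue field,
# then switch to full-duplex only for preset fields.

PRESET_FIELDS = {"navigation", "music"}  # assumed fields suited to full-duplex

def classify_field(sentence: str) -> str:
    """Toy keyword classifier; a real system would use an NLU model."""
    if "play" in sentence or "song" in sentence:
        return "music"
    if "route" in sentence or "drive" in sentence:
        return "navigation"
    return "chitchat"

def select_dialogue_mode(sentence: str) -> str:
    """Full-duplex for preset dialogue fields, half-duplex otherwise."""
    field = classify_field(sentence)
    return "full-duplex" if field in PRESET_FIELDS else "half-duplex"
```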

PROVIDING HIGH QUALITY SPEECH RECOGNITION
20220399006 · 2022-12-15

A computer-implemented method, system and computer program product for providing high quality speech recognition. A first speech-to-text model is selected to perform speech recognition of a customer's spoken words and a second speech-to-text model is selected to perform speech recognition of the agent's spoken words during a call. The combined results of the speech-to-text models used to process the customer's and agent's spoken words are then analyzed to generate a reference speech-to-text result. The customer speech data that was processed by the first speech-to-text model is reprocessed by multiple other speech-to-text models. A similarity analysis is performed on the results of these speech-to-text models with respect to the reference speech-to-text result resulting in similarity scores being assigned to these speech-to-text models. The speech-to-text model with the highest similarity score is then selected as the new speech-to-text model for performing speech recognition of the customer's spoken words during the call.
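The core of the selection step, scoring candidate models by similarity to a reference transcript and keeping the best, can be sketched as follows. The character-level ratio stands in for whatever similarity metric the method actually uses, and all names are assumptions:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for any text-similarity metric."""
    return SequenceMatcher(None, a, b).ratio()

def select_best_model(reference: str, transcripts: dict) -> str:
    """Return the model whose transcript scores highest against the reference result.

    `transcripts` maps model name -> that model's transcript of the same audio.
    """
    return max(transcripts, key=lambda m: similarity(transcripts[m], reference))
```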

Focus session at a voice interface device
11527246 · 2022-12-13

A first electronic device of a local group of connected electronic devices receives a first voice command including a request for a first operation, assigns a first target device from among a local group of connected electronic devices as an in-focus device for performing the first operation, causes the first operation to be performed by the first target device via operation of a server-implemented common network service, receives a second voice command including a request for a second operation, and based on a determination that the second voice command does not include an explicit designation of a second target device and a determination that the second operation can be performed by the first target device, assigns the first target device as the in-focus device for performing the second operation.
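The focus-session routing rule can be sketched as a small function: an explicit target wins, otherwise an unaddressed command falls through to the in-focus device if it can perform the operation. The data shapes here are illustrative assumptions:

```python
def route_command(command: dict, in_focus, capabilities: dict):
    """Route a voice command to a target device and update the in-focus device.

    `command` has an 'operation' key and an optional 'target' key.
    `capabilities` maps device name -> set of operations it supports.
    Returns (target_device, new_in_focus); target is None if unresolved.
    """
    target = command.get("target")
    if target is None and in_focus and command["operation"] in capabilities.get(in_focus, set()):
        # No explicit designation: keep routing to the in-focus device.
        target = in_focus
    return target, target
```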

Method and apparatus for generating hint words for automated speech recognition
11527234 · 2022-12-13

Systems and methods for determining hint words that improve the accuracy of automated speech recognition (ASR) systems. Hint words are determined in the context of a user issuing voice commands in connection with a voice interface system. Candidate terms are initially drawn from the most frequently occurring terms in operation of the voice interface system, for example, terms that arise most often in electronic search queries or received commands. Certain of these terms are selected as hint words, and the selected hint words are then transmitted to an ASR system to assist in translation of speech to text.
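The frequency-based selection step can be sketched in a few lines. Treating "most frequently occurring" as a plain count over past query terms is an assumption; the patent may weight or filter terms differently:

```python
from collections import Counter

def select_hint_words(query_terms: list, k: int = 3) -> list:
    """Pick the k most frequently occurring past query terms as ASR hint words."""
    return [term for term, _ in Counter(query_terms).most_common(k)]
```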

COMMUNICATION MODE SELECTION BASED UPON USER CONTEXT FOR PRESCRIPTION PROCESSES

Methods and systems may incorporate voice interaction and other audio interaction to facilitate access to prescription related information and processes. Particularly, voice/audio interactions may be utilized to achieve authentication to access prescription-related information and action capabilities. Additionally, voice/audio interactions may be utilized in performance of processes such as obtaining prescription refills and receiving reminders to consume prescription products.

ERROR CORRECTION IN SPEECH RECOGNITION

Systems and methods for speech recognition correction include receiving a voice recognition input from an individual user and using a trained error correction model to add a new alternative result to a results list based on the received voice input processed by a voice recognition system. The error correction model is trained using contextual information corresponding to the individual user. The contextual information comprises a plurality of historical user correction logs, a plurality of personal class definitions, and an application context. A re-ranker re-ranks the results list with the new alternative result and a top result from the re-ranked results list is output.
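The add-alternative-then-re-rank step can be sketched as follows. The `correction_model` callable and the `(text, score)` tuple shape are illustrative assumptions standing in for the trained, user-specific error correction model:

```python
def rerank_with_correction(results, correction_model, context):
    """Augment an n-best results list with the correction model's alternative,
    re-rank by score, and return the top result plus the re-ranked list.

    `results` is a list of (text, score); `correction_model(context)` is
    assumed to return one (alternative_text, score) pair.
    """
    alt_text, alt_score = correction_model(context)
    augmented = results + [(alt_text, alt_score)]
    ranked = sorted(augmented, key=lambda r: r[1], reverse=True)
    return ranked[0][0], ranked
```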

Processing Voice Commands
20220392435 · 2022-12-08

Recorded background noises, and other contextual data, may be used to assist in resolving ambiguity in spoken voice commands. The background noises may comprise sounds from entities in a room other than the user issuing the voice commands. One such entity may be a content item being watched by the user, and the captured background noises may comprise audio of the content item. The content item may be identified based on the captured audio of the content item in the background noises, and the identification may be used to interpret the ambiguous voice command. Additional contextual information associated with the voice commands (e.g., identifications of the users in the room) and/or the content item (e.g., the video quality of the content item, a service outputting the content item, a genre of the content item, etc.) may be used to identify the content item.
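The disambiguation idea, identifying the on-screen content item from captured background audio and substituting it for an ambiguous referent, can be sketched as below. The fingerprint-to-title `catalog` and the naive "this" substitution are illustrative assumptions:

```python
def resolve_command(command: str, background_audio_id, catalog: dict) -> str:
    """Resolve an ambiguous voice command using background audio.

    `background_audio_id` is an identifier (e.g. a fingerprint) derived from
    captured background audio; `catalog` maps such identifiers to content
    titles. If the command contains the ambiguous referent 'this' and the
    background audio matches a known content item, substitute its title.
    """
    if "this" in command and background_audio_id in catalog:
        return command.replace("this", catalog[background_audio_id])
    return command
```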

Collaborative voice controlled devices

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for collaboration between multiple voice controlled devices are disclosed. In one aspect, a method includes the actions of identifying, by a first computing device, a second computing device that is configured to respond to a particular, predefined hotword; receiving audio data that corresponds to an utterance; receiving a transcription of additional audio data outputted by the second computing device in response to the utterance; based on the transcription of the additional audio data and based on the utterance, generating a transcription that corresponds to a response to the additional audio data; and providing, for output, the transcription that corresponds to the response.

System and method for modifying speech recognition result

Provided are a system and method for modifying a speech recognition result. The method includes: receiving, from a device, text output from an automatic speech recognition (ASR) model of the device; identifying at least one domain related to the received text; selecting, from among a plurality of text modification models included in a server, at least one text modification model corresponding to the identified at least one domain; and modifying the received text by using the selected at least one text modification model.
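The identify-domain-then-modify flow can be sketched as follows. The domain identifier and the per-domain models are passed in as plain callables; all names and the sequential application of matching models are illustrative assumptions:

```python
def modify_text(text: str, domain_models: dict, identify_domains) -> str:
    """Identify the domains of ASR output, then apply each matching
    text-modification model in turn.

    `identify_domains(text)` returns a list of domain names;
    `domain_models` maps domain name -> text-modification callable.
    """
    for domain in identify_domains(text):
        model = domain_models.get(domain)
        if model:
            text = model(text)
    return text
```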

ELECTRONIC CONTROL DEVICE FOR AN AVIONICS SYSTEM FOR IMPLEMENTING A CRITICAL AVIONICS FUNCTION, METHOD AND COMPUTER PROGRAM THEREFOR
20220380024 · 2022-12-01

An electronic control device of an avionics system for implementation of a critical avionics function, comprising: a module for receiving a voice instruction signal; a speech recognition module configured to transform the voice signal into a textual transcript; a processing module configured to associate the textual transcript with at least one action to be performed; and a monitoring system comprising: a control module configured to deem the textual transcript and/or the action to be performed consistent if and only if it is consistent with a) the expected syntax, b) the expected lexical field, and c) the current context; and a module for generating an associated command only if no inconsistencies are detected.
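The three-way consistency gate can be sketched as a conjunction of checks; a command is generated only when all pass. The predicates (syntax, lexical field, context) are illustrative stand-ins for the monitoring system's real checks:

```python
def check_consistency(transcript: str, action: str,
                      expected_syntax, lexical_field: set, current_context) -> bool:
    """Gate command generation on three checks: the transcript matches the
    expected syntax, every word lies in the expected lexical field, and the
    action is consistent with the current context. All predicates are
    hypothetical stand-ins."""
    return (expected_syntax(transcript)
            and all(word in lexical_field for word in transcript.split())
            and current_context(action))
```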