G10L2015/228

Guide voice output control system and guide voice output control method
11705119 · 2023-07-18

A guide voice output control system includes a voice output control unit that outputs a guide voice in response to a trigger and executes interaction-related processing, which has a reception stage for receiving voice, a recognition stage for recognizing the voice, and an output stage for outputting voice based on a recognition result. When the trigger is generated while this processing is executing, the voice output control unit controls the output of the guide voice according to the current processing stage, dynamically deciding whether to output the guide voice based on whether the stage is one in which the guide voice would neither affect the accuracy of voice recognition nor make listening difficult for the user.
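A minimal sketch of what such stage-aware gating might look like (not taken from the patent; the stage names, the GuideVoiceController class, and the deferral policy are illustrative assumptions): the guide voice plays immediately in stages where extra audio is harmless and is deferred otherwise.

```python
from enum import Enum, auto


class Stage(Enum):
    IDLE = auto()
    RECEPTION = auto()    # capturing the user's voice
    RECOGNITION = auto()  # running speech recognition
    OUTPUT = auto()       # speaking the recognition result


class GuideVoiceController:
    # Stages where an extra guide voice could hurt recognition accuracy
    # or be hard for the user to hear (assumed set, for illustration only).
    SENSITIVE_STAGES = {Stage.RECEPTION, Stage.OUTPUT}

    def __init__(self, play_audio):
        self.play_audio = play_audio  # callable that actually plays audio
        self.stage = Stage.IDLE
        self.pending = []

    def on_trigger(self, guide_message: str) -> None:
        """Output the guide voice now, or defer it until a safe stage."""
        if self.stage in self.SENSITIVE_STAGES:
            self.pending.append(guide_message)
        else:
            self.play_audio(guide_message)

    def set_stage(self, stage: Stage) -> None:
        self.stage = stage
        if stage not in self.SENSITIVE_STAGES:
            while self.pending:
                self.play_audio(self.pending.pop(0))


if __name__ == "__main__":
    ctrl = GuideVoiceController(play_audio=print)
    ctrl.set_stage(Stage.RECEPTION)
    ctrl.on_trigger("Turn left in 300 meters")  # deferred: user is speaking
    ctrl.set_stage(Stage.RECOGNITION)           # safe stage: deferred guide plays
```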

Language models using domain-specific model components
11557289 · 2023-01-17

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.
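A hedged sketch of the scoring idea: the abstract does not specify the model form, so a simple log-probability interpolation between a domain-independent baseline and a context-selected domain component stands in here; the tables, the domain-selection rule, and the interpolation weight are all assumptions.

```python
import math

# Hypothetical per-candidate log-probabilities for illustration only.
BASELINE_LOGPROB = {"play music": math.log(0.02), "play musical": math.log(0.001)}
DOMAIN_LOGPROB = {
    "media": {"play music": math.log(0.2), "play musical": math.log(0.01)},
    "navigation": {"play music": math.log(0.001), "play musical": math.log(0.0005)},
}


def select_domain(context: dict) -> str:
    """Pick a domain-specific component from non-linguistic context (assumed rule)."""
    return "media" if context.get("app") == "music_player" else "navigation"


def score(candidate: str, context: dict, weight: float = 0.5) -> float:
    """Interpolate the domain-independent baseline with the selected domain component."""
    domain = select_domain(context)
    base = BASELINE_LOGPROB.get(candidate, math.log(1e-6))
    dom = DOMAIN_LOGPROB[domain].get(candidate, math.log(1e-6))
    return (1 - weight) * base + weight * dom


if __name__ == "__main__":
    ctx = {"app": "music_player"}
    candidates = ["play music", "play musical"]
    print(max(candidates, key=lambda c: score(c, ctx)))  # -> "play music"
```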

Electronic apparatus, document displaying method thereof and non-transitory computer readable recording medium

The disclosure relates to an artificial intelligence (AI) system using a machine learning algorithm such as deep learning, and an application thereof. In particular, an electronic apparatus, a document displaying method thereof, and a non-transitory computer readable recording medium are provided. An electronic apparatus according to an embodiment of the disclosure includes a display unit displaying a document, a microphone receiving a user voice, and a processor configured to acquire at least one topic from contents included in a plurality of pages constituting the document, recognize a voice input through the microphone, match the recognized voice with one of the acquired at least one topic, and control the display unit to display a page including the matched topic.
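An illustrative sketch only: the patent does not say how topics are acquired, so crude keyword overlap stands in for the topic model, and recognized text is passed in directly instead of arriving from a microphone.

```python
def acquire_topics(pages: list[str]) -> list[set[str]]:
    """Derive a rough topic (set of salient words) for every page."""
    return [{w.lower().strip(".,") for w in page.split() if len(w) > 4} for page in pages]


def page_for_utterance(recognized_text: str, topics: list[set[str]]) -> int:
    """Return the index of the page whose topic best matches the recognized voice."""
    words = {w.lower() for w in recognized_text.split()}
    overlaps = [len(words & topic) for topic in topics]
    return max(range(len(topics)), key=overlaps.__getitem__)


if __name__ == "__main__":
    pages = [
        "Quarterly revenue grew strongly across every region.",
        "Hiring plans focus on machine learning engineers.",
    ]
    topics = acquire_topics(pages)
    print(page_for_utterance("show me the hiring section", topics))  # -> 1
```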

DEVICE-DIRECTED UTTERANCE DETECTION

A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.
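A sketch of the two-stage flow described above, under stated assumptions: the low-latency interrupt detector and the device-directed classifier are stubbed with trivial heuristics, and the volume and ASR hooks are invented names rather than any real device API.

```python
class SpeechInterface:
    def __init__(self, normal_volume: float = 1.0, ducked_volume: float = 0.2):
        self.volume = normal_volume
        self.normal_volume = normal_volume
        self.ducked_volume = ducked_volume

    def interrupt_detector(self, audio_frame: bytes) -> bool:
        """Low-latency check for possible device-directed speech (stub)."""
        return len(audio_frame) > 0

    def device_directed_classifier(self, utterance: bytes, semantics: dict) -> bool:
        """Slower, high-accuracy check over the whole utterance (stub)."""
        return semantics.get("intent") is not None

    def handle_audio(self, audio_frame, full_utterance, semantics, send_to_asr):
        if self.interrupt_detector(audio_frame):
            self.volume = self.ducked_volume          # react quickly: duck output audio
            if self.device_directed_classifier(full_utterance, semantics):
                self.volume = 0.0                     # accept: end output, send audio for ASR
                send_to_asr(full_utterance)
            else:
                self.volume = self.normal_volume      # reject: restore output volume


if __name__ == "__main__":
    dev = SpeechInterface()
    dev.handle_audio(b"\x01", b"\x01\x02", {"intent": "set_timer"}, send_to_asr=print)
    print(dev.volume)  # -> 0.0 (interrupt accepted)
```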

ENHANCING SIGNATURE WORD DETECTION IN VOICE ASSISTANTS
20230223021 · 2023-07-13

Systems and methods for detecting a spoken sentence in a speech recognition system are disclosed herein. Speech data is buffered based on an audio signal captured at a computing device operating in an active mode. The speech data is buffered irrespective of whether the speech data comprises a signature word. The buffered speech data is processed to detect a presence of the sentence comprising at least one command and a query for the computing device. Processing the buffered speech data includes detecting the signature word in the buffered speech data, and in response to detecting the signature word in the speech data, initiating detection of the sentence in the buffered speech data.
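A minimal sketch of the buffering idea under stated assumptions: text tokens stand in for audio frames, the signature word and buffer size are invented, and the "sentence" detector is a simple heuristic rather than any real recognizer.

```python
from collections import deque

SIGNATURE_WORD = "assistant"   # assumed signature word
BUFFER_TOKENS = 5              # assumed rolling-buffer size (tokens stand in for audio)

buffer = deque(maxlen=BUFFER_TOKENS)


def on_speech_token(token: str):
    """Buffer speech continuously, whether or not it contains the signature word."""
    buffer.append(token.lower())
    if SIGNATURE_WORD in buffer:
        return detect_sentence(list(buffer))
    return None


def detect_sentence(tokens: list[str]):
    """Look for a command plus query in the buffered speech (stub heuristic)."""
    commands = {"play", "show", "set"}
    found = [t for t in tokens if t in commands]
    return {"command": found[0], "query": tokens[-1]} if found else None


if __name__ == "__main__":
    result = None
    for t in "hey assistant play some jazz".split():
        result = on_speech_token(t)
    print(result)  # -> {'command': 'play', 'query': 'jazz'}
```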

Event-based speech interactive media player

Interactive content containing audio or video may be provided in conjunction with non-interactive content containing audio or video to enhance user engagement and interest in the content and to increase the effectiveness of the distributed information. Interactive content may be directly inserted into the existing, non-interactive content. Additionally or alternatively, interactive content may be streamed in parallel to the existing content, with minimal modification to the existing content. For example, the server may monitor content from a content provider; detect an event (e.g., a marker embedded in the content stream, or in a data source external to the content stream); and, upon detecting the event, play interactive content at a designated time while silencing the content stream of the content provider (e.g., by muting, pausing, or playing silence). The marker may be a sub-audible tone or metadata associated with the content stream. The user may respond to the interactive content by voice.
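An illustrative sketch of the monitor-and-switch flow: the marker value, stream chunks, and player hooks are assumed names, and a real implementation would detect a sub-audible tone or metadata rather than the string marker used here.

```python
def monitor_stream(stream, play_interactive, silence, resume):
    """Play non-interactive content until a marker event, then switch to interactive content."""
    for chunk in stream:
        if chunk.get("marker") == "interactive_cue":
            silence()                         # mute/pause the provider's stream
            response = play_interactive(chunk["interactive_id"])
            resume()                          # un-silence once the interaction ends
            yield {"user_response": response}
        else:
            yield chunk


if __name__ == "__main__":
    stream = [
        {"audio": "song part 1"},
        {"marker": "interactive_cue", "interactive_id": "ad-42"},
        {"audio": "song part 2"},
    ]
    out = list(monitor_stream(
        stream,
        play_interactive=lambda i: f"voice reply to {i}",
        silence=lambda: None,
        resume=lambda: None,
    ))
    print(out)
```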

Interactive method and device of robot, and device

Embodiments of the present disclosure provide an interactive method of a robot, an interactive device of a robot and a device. The method includes: obtaining voice information input by an interactive object, and performing semantic recognition on the voice information to obtain a conversation intention; obtaining feedback information corresponding to the conversation intention based on a conversation scenario knowledge base pre-configured by a simulated user; and converting the feedback information into a voice of the simulated user, and playing the voice to the interactive object.
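A sketch of the pipeline described above; the intent rules, conversation scenario knowledge base, and text-to-speech hook are illustrative assumptions, not the patent's own.

```python
SCENARIO_KNOWLEDGE_BASE = {          # pre-configured by the simulated user (assumed content)
    "ask_weather": "It looks sunny today.",
    "greet": "Hello! Nice to meet you.",
}


def recognize_intent(voice_text: str) -> str:
    """Tiny stand-in for semantic recognition of the conversation intention."""
    return "ask_weather" if "weather" in voice_text.lower() else "greet"


def interact(voice_text: str, speak) -> None:
    intent = recognize_intent(voice_text)
    feedback = SCENARIO_KNOWLEDGE_BASE.get(intent, "Sorry, I don't know.")
    speak(feedback)                  # would be converted to the simulated user's voice


if __name__ == "__main__":
    interact("What's the weather like?", speak=print)  # -> "It looks sunny today."
```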

Techniques for dialog processing using contextual data

Techniques are described for using data stored for a user in association with context levels to improve the efficiency and accuracy of dialog processing tasks. A dialog system stores historical dialog data in association with a plurality of configured context levels. The dialog system receives an utterance and identifies a term for disambiguation from the utterance. Based on a determined context level, the dialog system identifies relevant historical data stored to a database. The historical data may be used to perform tasks such as resolving an ambiguity based on user preferences, disambiguating named entities based on a prior dialog, and identifying previously generated answers to queries. Based on the context level, the dialog system can efficiently identify the relevant information and use the identified information to provide a response.
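A sketch under stated assumptions: "context levels" are modeled as integer scopes over stored dialog history, and disambiguation is reduced to a lookup; the record fields and level semantics are invented for illustration.

```python
HISTORICAL_DIALOG = [
    {"level": 1, "key": "my usual coffee", "value": "oat-milk latte"},    # user preference
    {"level": 2, "key": "that restaurant", "value": "Luigi's Trattoria"}, # prior-dialog entity
]


def resolve(term: str, context_level: int):
    """Resolve an ambiguous term using only history at or below the context level."""
    for record in HISTORICAL_DIALOG:
        if record["level"] <= context_level and record["key"] == term:
            return record["value"]
    return None


if __name__ == "__main__":
    print(resolve("my usual coffee", context_level=1))   # -> "oat-milk latte"
    print(resolve("that restaurant", context_level=1))   # -> None (needs a deeper level)
```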

Electronic device and method for speech recognition of the same

An electronic device for recognizing a user's speech and a speech recognition method therefor are provided. The electronic device includes a microphone configured to receive a user's speech, a memory for storing speech recognition models, and at least one processor configured to select a speech recognition model from among the speech recognition models stored in the memory based on an operation state of the electronic device, and recognize the user's speech received by the microphone based on the selected speech recognition model.
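An illustrative sketch: the operation states and model names are assumptions, and the recognizers themselves are stubbed as simple callables rather than real speech models.

```python
RECOGNITION_MODELS = {
    "media_playback": lambda audio: "volume up",       # model tuned for playback commands
    "navigation": lambda audio: "find a gas station",  # model tuned for navigation queries
    "default": lambda audio: "unknown command",
}


def recognize(audio: bytes, operation_state: str) -> str:
    """Pick the stored model that matches the device's current operation state."""
    model = RECOGNITION_MODELS.get(operation_state, RECOGNITION_MODELS["default"])
    return model(audio)


if __name__ == "__main__":
    print(recognize(b"...", operation_state="media_playback"))  # -> "volume up"
```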

CONTEXTUAL EDITABLE SPEECH RECOGNITION METHODS AND SYSTEMS

Methods and systems are provided for assisting operation of a vehicle using speech recognition. One method involves recognizing an audio input as an input voice command including a commanded value for an operational subject, automatically identifying an expected value for the operational subject that is different from the commanded value, providing a graphical representation of the input voice command on a graphical user interface (GUI) display and providing a selectable GUI element associated with the expected value for the operational subject on the GUI display. After selection of the selectable GUI element, a destination system associated with the vehicle is commanded to execute a command corresponding to the input voice command using the expected value for the operational subject.
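A sketch of the correction flow under stated assumptions: the "expected value" comes from a stubbed lookup standing in for avionics or context data, and the selectable GUI element is reduced to a yes/no confirmation callback.

```python
def expected_value_for(subject: str) -> int:
    """Stand-in for identifying the expected value for the operational subject."""
    return {"altitude": 3000}.get(subject, 0)


def handle_voice_command(subject: str, commanded_value: int, confirm, execute) -> None:
    expected = expected_value_for(subject)
    if expected != commanded_value and confirm(
        f"Heard {subject} {commanded_value}; did you mean {expected}?"
    ):
        execute(subject, expected)        # user selected the suggested expected value
    else:
        execute(subject, commanded_value)


if __name__ == "__main__":
    handle_voice_command(
        "altitude", 300,
        confirm=lambda msg: True,         # simulate selecting the GUI element
        execute=lambda s, v: print(f"commanding {s} -> {v}"),
    )
```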