G10L2015/081

INFORMATION PROCESSING APPARATUS AND DESTINATION SEARCH METHOD
20210358485 · 2021-11-18

An information processing apparatus is connected to a voice processing server that analyzes text data transmitted from a voice input/output apparatus that converts an instruction by an utterance of a user to the text data and outputs the text data, and outputs an instruction obtained by analysis and utterance language information indicating a language of the utterance, and the information processing apparatus includes: a communicator that communicates with the voice processing server; a destination searcher that determines on the basis of the utterance language information whether to include a space character in a target of the search, and searches for a name indicated in a search character string from a destination list on the basis of a result of the determination; and a hardware processor that performs control to transmit a search result of a destination by the destination searcher to the voice processing server via the communicator.
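The language-dependent handling of space characters described above can be illustrated with a minimal sketch. The abstract does not say which languages include or exclude spaces, so the language set, the function names, and the substring matching used here are all assumptions made for illustration, not the claimed implementation.

```python
# Hypothetical sketch: every name and the language set below are assumptions.
# Assumed: for these languages, recognized text may lack the spaces stored in the list.
SPACE_INSENSITIVE_LANGUAGES = {"ja", "zh"}


def include_spaces(utterance_language: str) -> bool:
    """Decide from the utterance language whether spaces take part in the search."""
    return utterance_language not in SPACE_INSENSITIVE_LANGUAGES


def normalize(text: str, keep_spaces: bool) -> str:
    """Case-fold the text and, if requested, drop all whitespace characters."""
    text = text.casefold()
    return text if keep_spaces else "".join(text.split())


def search_destinations(query: str, destinations: list[str], utterance_language: str) -> list[str]:
    """Return destination names containing the query under the chosen space policy."""
    keep = include_spaces(utterance_language)
    needle = normalize(query, keep)
    return [name for name in destinations if needle in normalize(name, keep)]


destinations = ["Sato Taro", "Yamada Hanako", "John Smith"]
japanese_hits = search_destinations("satotaro", destinations, "ja")
english_hits = search_destinations("satotaro", destinations, "en")
```

With these assumptions, the same query matches "Sato Taro" when the utterance language is "ja" (spaces ignored) and matches nothing when it is "en" (spaces compared).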

Speech recognition method and apparatus, and storage medium

A speech recognition method is provided. The method includes: obtaining a voice signal; processing the voice signal according to a speech recognition algorithm to obtain n candidate recognition results, the candidate recognition results including text information corresponding to the voice signal; identifying a target result from among the n candidate recognition results according to a selection rule selected from among m selection rules, the selection rule having an execution sequence of j, the target result being a candidate recognition result that has a highest matching degree with the voice signal in the n candidate recognition results, an initial value of j being 1; and identifying the target result from among the n candidate recognition results according to a selection rule having an execution sequence of j+1 based on the target result not being identified according to the selection rule having the execution sequence of j.
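The ordered fallback through selection rules (try rule j, then rule j+1 if no target result is identified) can be sketched as follows. The two rules shown, a command-list match and a first-candidate fallback, are hypothetical stand-ins; the abstract does not specify what the m selection rules are.

```python
from typing import Callable, Optional, Sequence, Tuple

# A selection rule returns the target result, or None if it cannot identify one.
Rule = Callable[[Sequence[str]], Optional[str]]

KNOWN_COMMANDS = {"play music", "stop"}  # hypothetical command list


def command_rule(candidates: Sequence[str]) -> Optional[str]:
    """Hypothetical rule: pick the first candidate that is a known command."""
    return next((c for c in candidates if c in KNOWN_COMMANDS), None)


def first_candidate_rule(candidates: Sequence[str]) -> Optional[str]:
    """Hypothetical fallback rule: pick the top-ranked candidate, if any."""
    return candidates[0] if candidates else None


def select_target(
    candidates: Sequence[str], rules: Sequence[Rule]
) -> Tuple[Optional[str], Optional[int]]:
    """Apply rules in execution order j = 1, 2, ...; stop at the first that succeeds."""
    for j, rule in enumerate(rules, start=1):
        result = rule(candidates)
        if result is not None:
            return result, j
    return None, None


rules = [command_rule, first_candidate_rule]
matched = select_target(["play muse sick", "play music"], rules)
fallback = select_target(["hello there", "hollow there"], rules)
```

In this sketch the first call is resolved by rule 1, while the second falls through to rule 2 because no candidate is a known command.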

EVALUATING RELIABILITY OF AUDIO DATA FOR USE IN SPEAKER IDENTIFICATION

In some examples, a computing system includes a storage device configured to store a machine learning model trained with audio feature values to determine a reliability of an audio segment for performing speech processing; and processing circuitry. The processing circuitry is configured to: receive an audio dataset comprising a sequence of audio segments; extract, for each audio segment of the sequence of audio segments, a set of audio feature values corresponding to a set of audio features; execute the machine learning model to determine, for each audio segment of the sequence of audio segments, a reliability score based on the set of audio feature values corresponding to the respective audio segment, wherein the reliability score indicates a reliability of the audio segment for performing speech processing; and output an indication of the respective reliability scores determined for at least one audio segment of the sequence of audio segments.

DEEP LEARNING INTERNAL STATE INDEX-BASED SEARCH AND CLASSIFICATION
20230317062 · 2023-10-05

Systems and methods are disclosed for generating internal state representations of a neural network during processing and using the internal state representations for classification or search. In some embodiments, the internal state representations are generated from the output activation functions of a subset of nodes of the neural network. The internal state representations may be used for classification by training a classification model using internal state representations and corresponding classifications. The internal state representations may be used for search by producing a search feature from a search input and comparing the search feature with one or more feature representations to find the feature representation with the highest degree of similarity.
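The search step, comparing a search feature against stored feature representations and returning the most similar one, can be sketched as below. The abstract only says "highest degree of similarity"; cosine similarity, the dictionary-based index, and the toy vectors are assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(search_feature: List[float], index: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the key and score of the stored representation closest to the search feature."""
    scored = ((key, cosine_similarity(search_feature, vec)) for key, vec in index.items())
    return max(scored, key=lambda pair: pair[1])


# Toy stand-ins for internal state representations (activations of a node subset).
index = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
best_key, best_score = most_similar([0.9, 0.1], index)
```

Here the search feature points mostly along the first axis, so "doc_a" is returned as the closest representation.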

Information output system and information output method
11657806 · 2023-05-23

An information output system includes a speech acquisition unit configured to acquire a speech of a user, a recognition processing unit configured to recognize the content of the acquired speech of the user, and an output processing unit configured to output a question to the user and to perform processing for outputting a response to the content of the speech of the user who has answered the question. The output processing unit is configured to derive a user's positive degree based on the content of the speech of the user who has answered the question and to determine guidance information to be output to the user based on the derived positive degree.

Improving custom keyword spotting system accuracy with text-to-speech-based data augmentation
20220262352 · 2022-08-18

The present disclosure provides methods and apparatus for optimizing a keyword spotting system. A set of utterance texts including a given keyword may be generated. A set of speech signals corresponding to the set of utterance texts may be synthesized. An acoustic model in the keyword spotting system may be optimized with at least a part of speech signals in the set of speech signals and utterance texts in the set of utterance texts corresponding to the at least a part of speech signals.

Adaptively Modifying Dialog Output by an Artificial Intelligence Engine During a Conversation with a Customer

In some examples, a server may receive an utterance from a customer. The utterance may be included in a conversation between an artificial intelligence engine and the customer. The server may convert the utterance to text and determine a customer intent based on the text and a user history. The server may determine a user model of the customer based on the text and the customer intent. The server may update a conversation state associated with the conversation based on the customer intent and the user model. The server may determine a user state based on the user model and the conversation state. The server may select, using a reinforcement learning based module, a particular action from a set of actions, the particular action including a response, and provide the response to the customer.

SPEECH RECOGNITION METHOD, SYSTEM AND STORAGE MEDIUM
20210249019 · 2021-08-12

Provided are a speech recognition method and system, and a storage medium. The speech recognition method includes: receiving a feature vector and a decoding map sent by a CPU, wherein the feature vector is extracted from a speech signal, and the decoding map is pre-trained; recognizing the feature vector according to a pre-trained acoustic model to obtain a probability matrix; decoding the probability matrix according to the decoding map using a parallel mechanism to obtain text sequence information; and sending the text sequence information to the CPU.

Deliberation Model-Based Two-Pass End-To-End Speech Recognition

A method of performing speech recognition using a two-pass deliberation architecture includes receiving a first-pass hypothesis and an encoded acoustic frame and encoding the first-pass hypothesis at a hypothesis encoder. The first-pass hypothesis is generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame. The method also includes generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector, and generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector. The method also includes decoding the first context vector and the second context vector at a context vector decoder to form a second-pass hypothesis.
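The two attention mechanisms can be illustrated with a small sketch that builds one context vector from the encoded acoustic frames and another from the encoded first-pass hypothesis. The abstract does not state the attention form or how the two context vectors are combined; plain dot-product attention, concatenation, and all names and toy values below are assumptions.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Dot-product attention; the keys also serve as the values being averaged."""
    weights = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    dim = len(keys[0])
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]


def deliberation_contexts(
    decoder_state: Vector, encoded_frames: List[Vector], encoded_hypothesis: List[Vector]
) -> Vector:
    """Build both context vectors and concatenate them as decoder input (assumed)."""
    acoustic_context = attend(decoder_state, encoded_frames)        # first attention
    hypothesis_context = attend(decoder_state, encoded_hypothesis)  # second attention
    return acoustic_context + hypothesis_context


# Toy inputs: a zero query gives uniform attention weights.
combined = deliberation_contexts(
    decoder_state=[0.0, 0.0],
    encoded_frames=[[1.0, 0.0], [0.0, 1.0]],
    encoded_hypothesis=[[2.0, 2.0]],
)
```

A real context vector decoder would consume the combined vector to emit the second-pass hypothesis; that step is omitted here.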

INFORMATION PROCESSING DEVICE AND SETTING DEVICE
20210149938 · 2021-05-20

A control section includes: an identifying section configured to, by referring to one or more search keywords set for one or more control targets, identify at least one search keyword from among the one or more search keywords, the at least one search keyword matching any of one or more main words contained in input data acquired through voice input; and a selecting section configured to select at least one control target from among the one or more control targets based on one or more numeric values obtained through calculation of one or more expressions each of which batch-converts, into numerical form, one or more of the one or more search keywords set for the one or more control targets.