Patent classifications
G10L15/197
SPEECH RECOGNITION APPARATUS, METHOD AND PROGRAM
A score integration unit 7 obtains a new score Score(l_{1:n}^b, c) by integrating the score Score(l_{1:n}^b, c) with the score Score(w_{1:o}^b, c). This new score Score(l_{1:n}^b, c) becomes the score Score(l_{1:n}^b) in a hypothesis selection unit 8, so the score Score(l_{1:n}^b) can be said to take the score Score(w_{1:o}^b, c) into account. In the speech recognition apparatus, first information is extracted on the basis of the score Score(l_{1:n}^b) that takes Score(w_{1:o}^b, c) into account. Speech recognition with higher performance than that of the related art can thus be achieved.
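The score integration and hypothesis selection described above can be sketched in Python. The linear interpolation weight and the hypothesis structure are illustrative assumptions, not details taken from the patent abstract:

```python
# Hedged sketch of score integration followed by hypothesis selection.
# The interpolation weight and hypothesis fields are assumptions.

def integrate_scores(char_score: float, word_score: float,
                     weight: float = 0.5) -> float:
    """Linearly interpolate two log-domain hypothesis scores."""
    return (1.0 - weight) * char_score + weight * word_score

def select_best(hypotheses):
    """Hypothesis selection: keep the hypothesis with the highest
    integrated score."""
    return max(hypotheses, key=lambda h: h["score"])

# Toy usage: two competing hypotheses with character- and word-level scores.
hyps = [
    {"text": "hello word", "score": integrate_scores(-4.2, -6.1)},
    {"text": "hello world", "score": integrate_scores(-4.5, -3.0)},
]
best = select_best(hyps)  # the word-level score flips the ranking
```

In a real decoder the integrated score would be computed per partial hypothesis during beam search rather than over a fixed list.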
Method for training speech recognition model, method and system for speech recognition
Disclosed are a method for training a speech recognition model and a method and system for speech recognition. The disclosure relates to the field of speech recognition and includes: inputting an audio training sample into the acoustic encoder to encode the acoustic features of the audio training sample and determine an acoustic encoded state vector; inputting a preset vocabulary into the language predictor to determine a text prediction vector; inputting the text prediction vector into the text mapping layer to obtain a text output probability distribution; calculating a first loss function according to a target text sequence corresponding to the audio training sample and the text output probability distribution; inputting the text prediction vector and the acoustic encoded state vector into the joint network to calculate a second loss function; and performing iterative optimization according to the first and second loss functions.
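The two-loss training objective described above can be sketched as follows. The joint-network loss is replaced here by a simple mean-squared stand-in, and the weighting factor `alpha` is an assumption; an actual system would typically use a transducer-style loss for the joint network:

```python
import math

# Minimal sketch of the two-loss objective. Vector sizes, the stand-in
# joint loss, and the weighting factor alpha are assumptions.

def cross_entropy(probs, targets):
    """First loss: CE between the text output probability distribution
    (one row per step) and the target token ids."""
    return -sum(math.log(p[t] + 1e-9)
                for p, t in zip(probs, targets)) / len(targets)

def joint_network_loss(acoustic_state, text_pred):
    """Second loss (placeholder): mean squared mismatch between the
    acoustic encoded state vector and the text prediction vector."""
    return sum((a - b) ** 2
               for a, b in zip(acoustic_state, text_pred)) / len(acoustic_state)

def total_loss(probs, targets, acoustic_state, text_pred, alpha=0.3):
    """Weighted sum that the iterative optimization would minimize."""
    return alpha * cross_entropy(probs, targets) + \
        (1.0 - alpha) * joint_network_loss(acoustic_state, text_pred)

# Toy usage: two time steps, vocabulary of size 2, 4-dim state vectors.
probs = [[0.9, 0.1], [0.2, 0.8]]
loss = total_loss(probs, [0, 1], [0.0] * 4, [0.1] * 4)
```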
Real-time anomaly determination using integrated probabilistic system
An audio stream is detected during a communication session with a user. Natural language processing is performed on the audio stream to update a set of attributes, supplementing the set with attributes derived from the audio stream. A set of filter values is updated based on the updated set of attributes, and the updated filter values are used to query a set of databases to obtain datasets. A probabilistic program is executed during the communication session by determining a set of probability parameters characterizing the probability of an anomaly occurring, based on the datasets and the set of attributes. A determination is made as to whether the probability satisfies a threshold. In response to a determination that the probability satisfies the threshold, a record identifying the communication session is updated to indicate that the threshold is satisfied.
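The anomaly check can be illustrated with a minimal sketch in which a logistic score stands in for the probabilistic program; the attribute names, weights, bias, and threshold are all illustrative assumptions:

```python
import math

# Hedged sketch of the real-time anomaly check. A logistic score stands
# in for the probabilistic program; all field names are assumptions.

def anomaly_probability(attributes, weights, bias=-2.0):
    """Determine the probability of an anomaly from derived attributes."""
    z = bias + sum(weights.get(k, 0.0) * v for k, v in attributes.items())
    return 1.0 / (1.0 + math.exp(-z))

def check_session(session_record, attributes, weights, threshold=0.8):
    """Flag the session record when the probability satisfies the threshold."""
    p = anomaly_probability(attributes, weights)
    session_record["anomaly_probability"] = p
    if p >= threshold:
        session_record["anomaly_flagged"] = True
    return session_record

# Toy usage: attributes that NLP might derive from the audio stream.
record = check_session({"session_id": "s-1"},
                       {"mismatch": 1.0, "urgency": 1.0},
                       {"mismatch": 3.0, "urgency": 2.0})
```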
Attention-based joint acoustic and text on-device end-to-end model
A method includes receiving a training example for a listen-attend-spell (LAS) decoder of a two-pass streaming neural network model and determining whether the training example corresponds to a supervised audio-text pair or an unpaired text sequence. When the training example corresponds to an unpaired text sequence, the method also includes determining a cross entropy loss based on a log probability associated with a context vector of the training example. The method also includes updating the LAS decoder and the context vector based on the determined cross entropy loss.
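The branch on training-example type described above can be sketched as follows; the `audio`/`log_probs`/`targets` field names and the simplified cross-entropy computation over per-token log probabilities are assumptions:

```python
import math

# Sketch of dispatching on training-example type. The paired branch is
# stubbed out; the unpaired branch computes a CE loss from log-probs
# that would be conditioned on the example's context vector.

def training_step(example):
    """Dispatch on whether the training example is a supervised
    audio-text pair or an unpaired text sequence."""
    if example.get("audio") is not None:
        return "supervised"  # standard paired update, not shown here
    log_probs = example["log_probs"]   # one row of log-probs per token
    targets = example["targets"]       # target token ids
    return -sum(lp[t] for lp, t in zip(log_probs, targets)) / len(targets)

# Toy unpaired-text example: two tokens, vocabulary of size 2.
unpaired = {
    "audio": None,
    "log_probs": [[math.log(0.5), math.log(0.5)],
                  [math.log(0.25), math.log(0.75)]],
    "targets": [0, 1],
}
ce = training_step(unpaired)
```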
Electronic device and method for providing conversational service
A method, performed by an electronic device, of providing a conversational service includes: receiving an utterance input; identifying a temporal expression representing a time in a text obtained from the utterance input; determining a time point related to the utterance input based on the temporal expression; selecting a database corresponding to the determined time point from among a plurality of databases storing information about a conversation history of a user using the conversational service; interpreting the text based on information about the conversation history of the user, the conversation history information being acquired from the selected database; generating a response message to the utterance input based on a result of the interpreting; and outputting the generated response message.
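The selection of a conversation-history database from a temporal expression can be sketched as follows; the recognized expressions, the reference date, and the database names are illustrative assumptions:

```python
import re
from datetime import date, timedelta

# Illustrative sketch of time-based database selection. The expressions
# and per-period database names below are assumptions for demonstration.

DATABASES = {"today": "db_recent", "yesterday": "db_recent",
             "last week": "db_weekly", "last year": "db_archive"}

def resolve_time_point(text, today=date(2024, 5, 1)):
    """Find a temporal expression in the utterance text and map it to
    the time point it refers to."""
    offsets = {"today": 0, "yesterday": 1, "last week": 7, "last year": 365}
    for expr, days in offsets.items():
        if re.search(expr, text, re.IGNORECASE):
            return expr, today - timedelta(days=days)
    return None, today

def select_database(text):
    """Pick the history database corresponding to the determined time
    point, falling back to the most recent store."""
    expr, _ = resolve_time_point(text)
    return DATABASES.get(expr, "db_recent")
```

A real system would use a proper temporal tagger rather than substring matching, then interpret the text against the history fetched from the selected database.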