Patent classifications
G10L15/144
Audio segmentation method based on attention mechanism
An audio segmentation method based on an attention mechanism is provided. The audio segmentation method according to an embodiment obtains a mapping relationship between an “inputted text” and an “audio spectrum feature vector for generating an audio signal”, the audio spectrum feature vector being automatically synthesized from the inputted text, and segments an inputted audio signal by using the mapping relationship. Accordingly, audio segmentation utilizing the attention mechanism can guarantee high quality while noticeably reducing effort, time, and cost.
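The core idea, an attention alignment between text tokens and spectrum frames yielding segment boundaries, can be sketched as follows. This is an illustrative reconstruction, not the patent's implementation; the attention matrix, function names, and the argmax-alignment rule are assumptions.

```python
# Hypothetical sketch: derive audio segments from a text-to-spectrum
# attention matrix A[t][f] (text token t vs. audio frame f), such as one
# produced while synthesizing the text with a TTS model.

def segment_by_attention(attention):
    """Assign each frame to the token attending to it most strongly,
    then collapse runs of identical owners into (token, start, end)."""
    n_tokens = len(attention)
    n_frames = len(attention[0])
    frame_owner = [max(range(n_tokens), key=lambda t: attention[t][f])
                   for f in range(n_frames)]
    segments = []
    start = 0
    for f in range(1, n_frames + 1):
        if f == n_frames or frame_owner[f] != frame_owner[start]:
            segments.append((frame_owner[start], start, f))
            start = f
    return segments

attn = [
    [0.9, 0.8, 0.1, 0.0],   # token 0 dominates frames 0-1
    [0.1, 0.2, 0.9, 0.0],   # token 1 dominates frame 2
    [0.0, 0.0, 0.0, 1.0],   # token 2 dominates frame 3
]
print(segment_by_attention(attn))  # [(0, 0, 2), (1, 2, 3), (2, 3, 4)]
```

Each tuple maps one text token to a half-open frame range, which is the "mapping relationship" used to cut the input audio.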
Identifying representative conversations using a state model
A plurality of conversations may be processed to obtain one or more representative conversations to allow a better understanding of the plurality of conversations. A representative conversation may be determined by representing each conversation as a sequence of states where a state may represent messages with similar meanings. Distances may be computed between pairs of conversations, and the conversations may be clustered using the distances. To obtain a representative conversation for a cluster of conversations, a representative sequence of states may be obtained for the cluster and a representative message may be obtained for each state of the sequence of states. The representative conversation may then be presented to a user.
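The distance-then-representative step can be sketched with state sequences. This is a minimal illustration under assumptions not stated in the abstract: edit distance between state sequences as the pairwise distance, and the cluster medoid as the representative sequence.

```python
# Illustrative sketch: conversations as sequences of states, compared by
# Levenshtein (edit) distance; the representative of a cluster is the
# medoid, i.e. the sequence closest to all others in total distance.

def edit_distance(a, b):
    """Levenshtein distance between two state sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def medoid(cluster):
    """State sequence with the smallest total distance to the others."""
    return min(cluster, key=lambda s: sum(edit_distance(s, t) for t in cluster))

convs = [
    ["greet", "ask_order", "confirm"],
    ["greet", "ask_order", "confirm", "thanks"],
    ["greet", "complaint", "apologize"],
]
print(medoid(convs[:2]))  # ['greet', 'ask_order', 'confirm'] (ties -> first)
```

A representative message for each state of the medoid sequence would then be chosen, e.g. the most frequent message assigned to that state.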
LEARNING DEVICE, LEARNING METHOD AND LEARNING PROGRAM
A learning device (10) includes: a feature extracting unit (11) that extracts speech features from training speech data; a probability calculating unit (12) that, on the basis of the speech features, performs prefix search using a speech recognition model (typically a neural network) and calculates the posterior probability of a recognized character string to obtain a plurality of hypothesis character strings; an error calculating unit (13) that calculates an error from the word error rates between the plurality of hypothesis character strings and a correct training character string, and obtains parameters for the entire model that minimize the expected value of the summed word-error-rate loss; and an updating unit (14) that updates the model parameters in accordance with the parameters obtained by the error calculating unit (13).
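The objective computed by the error calculating unit, an expectation of word-error-rate loss over the hypothesis list, can be sketched numerically. The hypotheses, posteriors, and renormalization over the N-best list are illustrative assumptions; the real device would differentiate this expectation with respect to model parameters.

```python
# Hedged sketch of the expected-WER objective over an N-best list.

def word_errors(hyp, ref):
    """Word-level Levenshtein distance between hypothesis and reference."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (h != r)))
        prev = cur
    return prev[-1]

def expected_wer(hypotheses, posteriors, reference):
    """E[word errors] with posteriors renormalized over the list."""
    z = sum(posteriors)
    return sum(p / z * word_errors(h.split(), reference.split())
               for h, p in zip(hypotheses, posteriors))

hyps = ["the cat sat", "the cat sang", "a cat sat"]
posts = [0.6, 0.3, 0.1]
print(expected_wer(hyps, posts, "the cat sat"))  # ~0.4 = 0.3*1 + 0.1*1
```

Minimizing this expectation pushes probability mass toward low-error hypotheses, unlike plain cross-entropy on the reference alone.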
Mobile terminal and method of operating the same
A terminal includes a memory configured to store voice data and a processor configured to measure reliability of learnable data stored in the memory, to classify the learnable data into learning data or adaptive data according to the measured reliability, to generate a learning model by performing unsupervised learning with respect to the learning data, to generate an adaptive model using the adaptive data, and to evaluate recognition performance of each of the learning model and the adaptive model.
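The reliability-based split described above can be sketched as a simple threshold rule. The threshold value and data layout are assumptions for illustration; the abstract does not specify how reliability is measured.

```python
# Hypothetical sketch: partition stored voice data into learning data
# (high reliability, for unsupervised learning) and adaptive data
# (lower reliability, for building the adaptive model).

RELIABILITY_THRESHOLD = 0.8  # assumed cutoff, not from the patent

def classify(samples):
    """samples: list of (utterance_id, reliability) pairs."""
    learning = [s for s in samples if s[1] >= RELIABILITY_THRESHOLD]
    adaptive = [s for s in samples if s[1] < RELIABILITY_THRESHOLD]
    return learning, adaptive

data = [("u1", 0.95), ("u2", 0.5), ("u3", 0.85)]
learning, adaptive = classify(data)
print([u for u, _ in learning])  # ['u1', 'u3']
print([u for u, _ in adaptive])  # ['u2']
```

The two partitions then feed the two model-building paths, whose recognition performance is compared afterwards.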
Transforming audio content into images
A technique is described herein for transforming audio content into images. The technique may include: receiving the audio content from a source; converting the audio content into a temporal stream of audio features; and converting the stream of audio features into one or more images using one or more machine-trained models. The technique generates the image(s) based on recognition of: semantic information that conveys one or more semantic topics associated with the audio content; and sentiment information that conveys one or more sentiments associated with the audio content. The technique then generates an output presentation that includes the image(s), which it provides to one or more display devices for display thereat. The output presentation serves as a summary of salient semantic and sentiment-related characteristics of the audio content.
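The pipeline shape (audio in, feature stream, semantic and sentiment recognition, image out) can be sketched with stand-in stages. Every stage here is a trivial placeholder for the machine-trained models the technique actually uses; the thresholds and the string stand-in for an image are assumptions.

```python
# Pipeline sketch with placeholder stages; real implementations would use
# trained models at each step.

def extract_features(audio):
    # Stand-in for converting audio into a temporal feature stream.
    return [abs(x) for x in audio]

def recognize_semantics(features):
    # Placeholder "semantic topic" recognizer.
    return "speech" if max(features) > 0.5 else "silence"

def recognize_sentiment(features):
    # Placeholder "sentiment" recognizer.
    return "energetic" if sum(features) / len(features) > 0.3 else "calm"

def summarize(audio):
    feats = extract_features(audio)
    topic = recognize_semantics(feats)
    mood = recognize_sentiment(feats)
    return f"image[{topic}/{mood}]"   # placeholder for a generated image

print(summarize([0.9, -0.7, 0.2]))  # image[speech/energetic]
```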
Method and apparatus for optimizing model applicable to pattern recognition, and terminal device
Embodiments of the present invention relate to a method and an apparatus for optimizing a model applicable to pattern recognition, and a terminal device. The terminal device receives a universal model delivered by a server, where the universal model includes an original feature parameter; recognizes target information by using the universal model and collects a plurality of local samples; when a model optimization condition is met, corrects the original feature parameter by using a first training algorithm to obtain a new feature parameter; and optimizes the universal model according to a second training algorithm and the new feature parameter to obtain an optimized universal model. That is, the terminal device further optimizes the universal model received from the server according to the collected local samples, obtaining a personalized model applicable to pattern recognition.
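The two-step update (correct the parameter from local samples, then fold the correction back into the universal model) can be sketched on a scalar parameter. Both update rules below are illustrative assumptions; the patent only names them "first" and "second" training algorithms.

```python
# Hedged sketch of on-device personalization of a server-delivered model.

def correct_parameter(original, local_samples, lr=0.5):
    """Assumed 'first training algorithm': move the original feature
    parameter toward the mean of the collected local samples."""
    local_mean = sum(local_samples) / len(local_samples)
    return original + lr * (local_mean - original)

def optimize_model(original, corrected, mix=0.3):
    """Assumed 'second training algorithm': interpolate the universal
    parameter with the locally corrected one."""
    return (1 - mix) * original + mix * corrected

orig = 1.0                                        # from the server
new = correct_parameter(orig, [2.0, 2.0, 2.0])    # -> 1.5
print(optimize_model(orig, new))                  # ~1.15, nudged locally
```

Keeping `mix` small retains most of the universal model while still adapting to the device's own data.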
Methods and apparatus for learning sensor data patterns of physical-training activities
Methods and systems for learning, recognition, classification and analysis of real-world cyclic patterns using a model having n oscillators, each with its own primary frequency. The state of the oscillators is evolved over time using sensor observations, which are also used to determine the sensor characteristics, or the sensor observation functions. Once trained, a set of activity detection filters may be used to classify a sensor data stream as being associated with an activity.
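The oscillator-bank idea can be sketched by scoring a sensor stream against reference oscillators at candidate primary frequencies. This reduces the patent's model to a bare correlation filter for illustration; the frequencies, sampling rate, and scoring rule are assumptions.

```python
# Illustrative sketch: classify a cyclic sensor stream by which primary
# frequency's oscillator correlates with it most strongly.
import math

def oscillator_score(signal, freq, dt=0.01):
    """Correlation of the signal with a unit oscillator at `freq` Hz."""
    return sum(s * math.sin(2 * math.pi * freq * i * dt)
               for i, s in enumerate(signal))

def classify_activity(signal, activity_freqs):
    """activity_freqs: {activity_name: primary frequency in Hz}."""
    return max(activity_freqs,
               key=lambda a: abs(oscillator_score(signal, activity_freqs[a])))

# A 2 Hz "running cadence" signal sampled at 100 Hz for 2 seconds.
sig = [math.sin(2 * math.pi * 2.0 * i * 0.01) for i in range(200)]
print(classify_activity(sig, {"walking": 1.0, "running": 2.0}))  # running
```

The patent evolves oscillator state and learns sensor observation functions jointly; this sketch shows only the detection-filter step.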
MODELING METHOD FOR SPEECH RECOGNITION, APPARATUS AND DEVICE
The present disclosure provides a modeling method for speech recognition and a device. The method includes: determining N types of tags; training a neural network according to speech data of Mandarin to generate a recognition model whose outputs are the N types of tags; inputting speech data of each dialect into the recognition model to obtain an output tag for each frame of the speech data of that dialect; determining, according to the output tags and the tagged true tags, error rates of the N types of tags for each dialect, and generating M types of target tags from the tags with error rates greater than a preset threshold; and training an acoustic model according to third speech data of Mandarin and third speech data of the P dialects, the outputs of the acoustic model being the N types of tags and the M types of target tags corresponding to each dialect.
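The tag-selection step, finding which tags the Mandarin model gets wrong on a dialect often enough to deserve dialect-specific target tags, can be sketched directly. The per-frame tag lists and the 0.3 threshold are illustrative assumptions.

```python
# Sketch of the per-dialect tag error-rate computation and selection of
# target tags whose error rate exceeds a preset threshold.

def select_target_tags(output_tags, ref_tags, threshold=0.3):
    """output_tags / ref_tags: per-frame tags for one dialect."""
    totals, errors = {}, {}
    for out, ref in zip(output_tags, ref_tags):
        totals[ref] = totals.get(ref, 0) + 1
        if out != ref:
            errors[ref] = errors.get(ref, 0) + 1
    # Tags misrecognized more often than the threshold become targets.
    return sorted(t for t in totals
                  if errors.get(t, 0) / totals[t] > threshold)

out_tags = ["a", "b", "b", "c", "c", "c"]
ref_tags = ["a", "b", "a", "c", "c", "b"]
print(select_target_tags(out_tags, ref_tags))  # ['a', 'b'] (each wrong 1/2)
```

The selected tags would each be duplicated per dialect to form the M target tags added to the acoustic model's output layer.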
METHOD AND APPARATUS FOR RECOGNITION OF SOUND EVENTS BASED ON CONVOLUTIONAL NEURAL NETWORK
Provided is a sound event recognition method that may improve sound event recognition performance by using correlations between different sound signal feature parameters based on a neural network. In detail, the method may extract a sound signal feature parameter from a sound signal including a sound event, and recognize the sound event included in the sound signal by applying a convolutional neural network (CNN) trained using the sound signal feature parameter.
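The convolutional step at the heart of the method can be sketched with a single 1-D convolution over a feature sequence; a real recognizer stacks many learned kernels with pooling and a classifier. The feature values and kernel below are illustrative, not from the patent.

```python
# Minimal sketch of one convolutional layer over a sound feature sequence.

def conv1d(features, kernel):
    """Valid-mode 1-D convolution (no padding)."""
    k = len(kernel)
    return [sum(features[i + j] * kernel[j] for j in range(k))
            for i in range(len(features) - k + 1)]

def relu(xs):
    """Standard rectified-linear activation."""
    return [max(0.0, x) for x in xs]

feats = [0.0, 1.0, 3.0, 1.0, 0.0]   # e.g. one energy band over time
edge_kernel = [-1.0, 2.0, -1.0]     # responds to onset-like peaks
print(relu(conv1d(feats, edge_kernel)))  # [0.0, 4.0, 0.0]
```

The kernel fires on the local peak at the third frame, the kind of cross-parameter pattern a trained CNN exploits for sound events.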
SYSTEM AND METHOD FOR KEY PHRASE SPOTTING
A method for key phrase spotting may comprise: obtaining an audio; obtaining a plurality of candidate words corresponding to a plurality of portions of the audio, and obtaining a first probability score for each corresponding relationship between an obtained candidate word and an audio portion; determining whether the plurality of candidate words respectively match a plurality of key words of a key phrase and whether the first probability score of each of the plurality of candidate words exceeds a corresponding first threshold, the plurality of candidate words constituting a candidate phrase; and in response to determining that the plurality of candidate words match the plurality of key words and that each first probability score exceeds the corresponding first threshold, obtaining a second probability score representing a matching relationship between the candidate phrase and the key phrase based on the first probability score of each of the plurality of candidate words.
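The two-stage check can be sketched end to end: word-level matching against per-word thresholds, then a phrase-level score built from the word scores. Combining the first scores by multiplication is an assumption; the abstract leaves the combination rule open.

```python
# Hedged sketch of key phrase spotting: per-word gate, then phrase score.

def spot_key_phrase(candidates, key_words, thresholds):
    """candidates: [(word, first_score)] aligned with audio portions.
    Returns the second (phrase-level) score, or None if the gate fails."""
    words = [w for w, _ in candidates]
    if words != key_words:
        return None                      # candidate words must match keys
    if any(s <= t for (_, s), t in zip(candidates, thresholds)):
        return None                      # every first score must clear
    phrase_score = 1.0                   # assumed rule: product of scores
    for _, s in candidates:
        phrase_score *= s
    return phrase_score

cand = [("hey", 0.9), ("assistant", 0.8)]
print(spot_key_phrase(cand, ["hey", "assistant"], [0.5, 0.5]))  # ~0.72
print(spot_key_phrase(cand, ["ok", "assistant"], [0.5, 0.5]))   # None
```

The second score would typically be compared against its own phrase-level threshold before the phrase is finally reported as spotted.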