Patent classifications
G10L2015/0631
OBFUSCATING TRAINING DATA
Examples disclosed herein involve obfuscating training data. An example method includes computing a sequence of acoustic features from audio data of training data, the training data comprising the audio data and a corresponding text transcript; mapping the acoustic features to acoustic model states to generate annotated feature vectors, the annotated feature vectors comprising the acoustic features and corresponding context from the text transcript; and providing a randomized sequence of the annotated feature vectors as obfuscated training data to an audio analysis system.
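The core of the method above is pairing acoustic features with transcript context and then randomizing their order so the original utterance sequence cannot be reconstructed. A minimal sketch (the function name, feature values, and phone-context labels are illustrative assumptions, not taken from the patent):

```python
import random

def obfuscate_training_data(acoustic_features, transcript_contexts, seed=None):
    """Annotate each acoustic feature vector with its transcript context,
    then emit the annotated vectors in randomized order so the original
    utterance ordering is not recoverable from the sequence."""
    annotated = list(zip(acoustic_features, transcript_contexts))
    rng = random.Random(seed)
    rng.shuffle(annotated)  # in-place shuffle of the annotated vectors
    return annotated

# Hypothetical toy data: three feature frames with phone-level contexts.
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
contexts = ["h-e", "e-l", "l-o"]
obfuscated = obfuscate_training_data(features, contexts, seed=0)
```

The annotated vectors keep enough per-frame context to train an acoustic model, while the shuffled ordering obscures the content of the recording.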
Method and device for user registration, and electronic device
Embodiments of the present application provide a method and apparatus for user registration, and an electronic device. The method includes: after obtaining a wake-up voice of a user each time, extracting and storing a first voiceprint feature corresponding to the wake-up voice; clustering the stored first voiceprint features to divide them into at least one category, wherein each category includes at least one first voiceprint feature belonging to the same user; assigning one category identifier to each category; and storing each category identifier in correspondence with the at least one first voiceprint feature of that category to complete user registration. The embodiments of the present application can simplify user operation and improve the user experience.
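The clustering-then-label step can be illustrated with a greedy threshold scheme over voiceprint embeddings (the similarity measure, threshold, and identifier format are assumptions for the sketch; the patent does not specify a clustering algorithm):

```python
import math

def cosine(a, b):
    """Cosine similarity between two voiceprint feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cluster_voiceprints(voiceprints, threshold=0.9):
    """Greedily assign each stored voiceprint to the first category whose
    representative it resembles; otherwise open a new category and give
    it a fresh category identifier."""
    categories = {}  # category identifier -> list of member voiceprints
    for vp in voiceprints:
        for cid, members in categories.items():
            if cosine(vp, members[0]) >= threshold:
                members.append(vp)
                break
        else:
            categories[f"user-{len(categories)}"] = [vp]
    return categories

# Two similar wake-up voiceprints and one distinct one.
cats = cluster_voiceprints([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]])
```

Each resulting category identifier stands in for one implicitly registered user, completing registration without an explicit enrollment step.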
Methods and systems for predicting non-default actions against unstructured utterances
A method is provided for adaptively predicting non-default actions against unstructured utterances by an automated assistant operating in a computing system. The method includes extracting voice features upon receiving an input utterance from at least one speaker by an automatic speech recognition (ASR) device; identifying the input utterance as an unstructured utterance based on the extracted voice features and a mapping, drawn by the ASR device, between the input utterance and one or more default actions; and obtaining at least one probable action to be performed in response to the unstructured utterance through a dynamic Bayesian network (DBN). The method further includes providing the at least one probable action obtained by the DBN to the speaker in order of the posterior probability of each action.
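The posterior-ordered ranking can be sketched with a single application of Bayes' rule (a stand-in for the full dynamic Bayesian network; the action names, priors, and likelihood tables below are invented for illustration):

```python
def rank_actions(priors, likelihoods, evidence):
    """Score each candidate action by Bayes' rule,
    P(action | evidence) ∝ P(evidence | action) * P(action),
    and return actions sorted by descending posterior probability."""
    unnorm = {a: priors[a] * likelihoods[a].get(evidence, 1e-6)
              for a in priors}
    z = sum(unnorm.values())  # normalizing constant
    posterior = {a: p / z for a, p in unnorm.items()}
    return sorted(posterior.items(), key=lambda kv: -kv[1])

# Hypothetical two-action model with keyword evidence.
priors = {"set_alarm": 0.5, "play_music": 0.5}
likelihoods = {"set_alarm": {"wake me": 0.9, "song": 0.05},
               "play_music": {"wake me": 0.1, "song": 0.9}}
ranked = rank_actions(priors, likelihoods, "wake me")
```

Presenting actions in this posterior order lets the assistant offer its best guess first while still surfacing plausible alternatives for an unstructured utterance.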
Gathering user's speech samples
Disclosed is a method of gathering a user's speech samples. According to an embodiment of the disclosure, a method of gathering learning samples may collect a speaker's speech data obtained while talking on a mobile terminal, together with text data generated from the speech data, as training data for generating a speech synthesis model. According to the disclosure, the method of gathering learning samples may be related to artificial intelligence (AI) modules, unmanned aerial vehicles (UAVs), robots, augmented reality (AR) devices, virtual reality (VR) devices, and 5G service-related devices.
AUTOMATED GENERATION OF FINE-GRAINED CALL REASONS FROM CUSTOMER SERVICE CALL TRANSCRIPTS
Embodiments disclosed are directed to a computing system that performs steps to automatically generate fine-grained call reasons from customer service call transcripts. The computing system extracts, using a natural language processing (NLP) technique, a set of events from a set of text strings of speaker turns. The computing system then identifies a set of clusters of events based on the set of events and labels each cluster of events in the set of clusters of events to generate a set of labeled clusters of events. Subsequently, the computing system assigns each event in the set of events to a respective labeled cluster of events in the set of labeled clusters of events.
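The extract-cluster-label-assign pipeline can be illustrated with a simple keyword-based grouping (a real system would use NLP-derived event representations; the event strings and keyword labels here are invented for the sketch):

```python
from collections import defaultdict

def cluster_and_label(events, keywords):
    """Assign each extracted event to the cluster of the first keyword it
    mentions; the keyword doubles as the cluster's label. Events matching
    no keyword fall into an 'other' cluster."""
    clusters = defaultdict(list)
    for event in events:
        for kw in keywords:
            if kw in event:
                clusters[kw].append(event)
                break
        else:
            clusters["other"].append(event)
    return dict(clusters)

# Hypothetical events extracted from speaker turns of call transcripts.
events = ["customer reports card lost",
          "customer disputes a charge",
          "card was stolen yesterday"]
clusters = cluster_and_label(events, ["card", "charge"])
```

Each labeled cluster then serves as a fine-grained call reason, and every event is assigned to exactly one such cluster.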
Information processing device and information processing method
The present technology relates to an information processing device and an information processing method that make it possible to generate interaction data at lower cost. Provided is the information processing device including a processor that generates, on the basis of interaction history information, a coupling context to be coupled to a context of interest among a plurality of contexts. This makes it possible to generate interaction data at lower cost. The present technology is applicable, for example, as a server-side service of a voice interaction system.
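One plausible reading of "generating a coupling context from interaction history" is choosing the context that most often co-occurs with the context of interest across past sessions. A sketch under that assumption (the history format and context labels are invented; the patent does not specify the selection criterion):

```python
from collections import Counter

def choose_coupling_context(history, context_of_interest):
    """From interaction history (a list of dialogue sessions, each a list
    of context labels), pick the context that most often co-occurs with
    the context of interest; return None if it never appears."""
    cooc = Counter()
    for session in history:
        if context_of_interest in session:
            cooc.update(c for c in session if c != context_of_interest)
    return cooc.most_common(1)[0][0] if cooc else None

# Hypothetical interaction history: "weather" co-occurs most with "clothing".
history = [["weather", "clothing"], ["weather", "travel"],
           ["weather", "clothing"], ["news"]]
coupling = choose_coupling_context(history, "weather")
```

Coupling the most frequently co-occurring context avoids authoring interaction data by hand, which is the cost saving the abstract claims.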
Artificial intelligence apparatus and method for recognizing speech in consideration of utterance style
Disclosed herein is an artificial intelligence apparatus for recognizing speech in consideration of an utterance style, including a microphone and a processor configured to obtain, via the microphone, speech data including speech of a user, extract an utterance feature vector from the obtained speech data, determine an utterance style corresponding to the speech based on the extracted utterance feature vector, and generate a speech recognition result using a speech recognition model corresponding to the determined utterance style.
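The routing logic amounts to selecting one recognition model per classified style. A minimal sketch with stand-in callables (a real apparatus would use trained style classifiers and acoustic models; every name below is a placeholder):

```python
def recognize(speech_data, style_models, classify_style):
    """Classify the utterance style, then route the speech data to the
    recognition model registered for that style, falling back to a
    default model for unknown styles."""
    style = classify_style(speech_data)
    model = style_models.get(style, style_models["default"])
    return model(speech_data)

# Hypothetical stand-ins: strings in place of audio, lambdas in place of models.
style_models = {
    "fast": lambda x: f"fast-model:{x}",
    "default": lambda x: f"default-model:{x}",
}
classify_style = lambda x: "fast" if x.endswith("!") else "neutral"
result = recognize("hurry up!", style_models, classify_style)
```

Keeping one model per style lets each model specialize, at the cost of running a style classifier before recognition.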
AUTOMATIC CLASSIFICATION OF PHONE CALLS USING REPRESENTATION LEARNING BASED ON THE HIERARCHICAL PITMAN-YOR PROCESS
Embodiments of the disclosed technology include a representation learning model for classification of natural language text. In embodiments, a classification model comprises a feature model and a classifier. The feature model may be hierarchical in nature: data may pass through a series of representations, decreasing in specificity and increasing in generality. Intermediate levels of representation may then be used as automatically learned features to train a statistical classifier. Specifically, the feature model may be based on a hierarchical Pitman-Yor process. In embodiments, once the feature model has been expressed as a Bayesian Belief Network and some aspect of the feature model has been selected for prediction, the feature model may be attached to the classifier. In embodiments, after training, potentially using a mix of labeled and unlabeled data, the classification model can be used to classify documents such as call transcripts based on topics of conversation represented in the transcripts.
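The hierarchical Pitman-Yor process is built from the two-parameter Pitman-Yor predictive rule, which for a seating arrangement with per-word customer counts c_w and table counts t_w (totals c and t), discount d, concentration theta, and base distribution G0 gives P(w) = (c_w - d*t_w)/(theta + c) + ((theta + d*t)/(theta + c)) * G0(w). A single-level sketch of that rule (the toy counts and vocabulary are invented; a hierarchical model would recurse into a parent restaurant as the base):

```python
def py_predictive(word, counts, tables, d, theta, base):
    """Two-parameter Pitman-Yor predictive probability:
    P(w) = (c_w - d*t_w)/(theta + c) + (theta + d*t)/(theta + c) * G0(w)."""
    c = sum(counts.values())   # total customers
    t = sum(tables.values())   # total tables
    cw = counts.get(word, 0)
    tw = tables.get(word, 0)
    return (max(cw - d * tw, 0.0) + (theta + d * t) * base(word)) / (theta + c)

# Toy restaurant state after seating four customers at two tables.
counts = {"the": 3, "cat": 1}
tables = {"the": 1, "cat": 1}
vocab = ["the", "cat", "dog"]
base = lambda w: 1.0 / len(vocab)  # uniform base distribution G0
probs = {w: py_predictive(w, counts, tables, 0.5, 1.0, base) for w in vocab}
```

The discount d produces the power-law word frequencies typical of natural language, which is why Pitman-Yor priors suit text representations better than plain Dirichlet processes.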
Onboard device, traveling state estimation method, server device, information processing method, and traveling state estimation system
An onboard device estimates a traveling state of a vehicle that may be influenced by the psychological state of a driver, based on an utterance of the driver without the use of various sensors, and includes: a voice collection unit for collecting a driver's voice; a traveling state collection unit for collecting traveling state information representing a traveling state of a vehicle; a database generation unit for generating a database by associating voice information corresponding to the collected voice with the collected traveling state information; a learning unit for learning an estimation model, with pairs including the voice information and the traveling state information recorded in the generated database being used as learning data; and an estimation unit for estimating the traveling state of the vehicle that may be influenced by a psychological state of the driver by using the estimation model, based on an utterance of the driver.
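The database-plus-estimation pattern above can be sketched as a nearest-neighbour lookup over stored (voice feature, traveling state) pairs (a stand-in for the learned estimation model; the feature vectors and state labels are invented for illustration):

```python
def train(pairs):
    """The 'database': a list of (voice-feature vector, traveling state)
    pairs recorded while driving."""
    return list(pairs)

def estimate_state(model, voice_features):
    """1-nearest-neighbour sketch: return the traveling state whose stored
    voice features are closest (squared Euclidean) to the current
    utterance's features."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda pair: dist(pair[0], voice_features))[1]

# Hypothetical training pairs associating voice features with states.
model = train([([0.9, 0.1], "calm"), ([0.2, 0.8], "agitated")])
state = estimate_state(model, [0.85, 0.2])
```

Because the estimate comes from the driver's voice alone, the onboard device needs no dedicated physiological or behavioral sensors.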
System and method for automating natural language understanding (NLU) in skill development
A method includes receiving, from an electronic device, information defining a user utterance associated with a skill to be performed, where the skill is not recognized by a natural language understanding (NLU) engine. The method also includes receiving, from the electronic device, information defining one or more actions for performing the skill. The method further includes identifying, using at least one processor, one or more known skills having one or more slots that map to at least one word or phrase in the user utterance. The method also includes creating, using the at least one processor, a plurality of additional utterances based on the one or more mapped slots. In addition, the method includes training, using the at least one processor, the NLU engine using the plurality of additional utterances.
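The "creating a plurality of additional utterances based on the mapped slots" step can be sketched as slot-value substitution over the original utterance (the template syntax, slot names, and values below are assumptions for illustration, not the patent's representation):

```python
from itertools import product

def expand_utterance(template, slot_values):
    """Generate additional training utterances by substituting every
    combination of known slot values into the mapped slots of the
    user utterance."""
    slots = sorted(slot_values)  # fix an order for the slot names
    utterances = []
    for combo in product(*(slot_values[s] for s in slots)):
        u = template
        for slot, value in zip(slots, combo):
            u = u.replace("{" + slot + "}", value)
        utterances.append(u)
    return utterances

# Hypothetical utterance with two mapped slots from known skills.
extra = expand_utterance("play {genre} music on {device}",
                         {"genre": ["jazz", "rock"],
                          "device": ["speaker", "tv"]})
```

Training the NLU engine on these generated variants lets it recognize the new skill from phrasings beyond the single utterance the user supplied.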