Patent classifications
G10L15/05
Speech endpointing based on word comparisons
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing based on word comparisons are described. In one aspect, a method includes the actions of obtaining a transcription of an utterance. The actions further include determining, as a first value, a quantity of text samples in a collection of text samples that (i) include terms that match the transcription, and (ii) do not include any additional terms. The actions further include determining, as a second value, a quantity of text samples in the collection of text samples that (i) include terms that match the transcription, and (ii) include one or more additional terms. The actions further include classifying the utterance as a likely incomplete utterance or not a likely incomplete utterance based at least on comparing the first value and the second value.
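The counting-and-comparison step in this abstract can be illustrated with a minimal sketch. It assumes that "terms that match the transcription" means the text sample begins with the transcription's terms, and that the comparison is a simple ratio test. The function names, the whitespace tokenizer, and the `ratio_threshold` parameter are all hypothetical and are not taken from the patent.

```python
from typing import Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase whitespace tokenization stands in for term matching.
    return text.lower().split()


def count_matches(transcription: str, samples: Iterable[str]) -> Tuple[int, int]:
    """Return (first_value, second_value) as described in the abstract.

    first_value:  samples whose terms match the transcription with no extra terms.
    second_value: samples that match the transcription and continue with more terms.
    """
    query = tokenize(transcription)
    exact = 0
    extended = 0
    for sample in samples:
        terms = tokenize(sample)
        # Assumption: "match" means the sample starts with the transcription's terms.
        if terms[: len(query)] != query:
            continue
        if len(terms) == len(query):
            exact += 1
        else:
            extended += 1
    return exact, extended


def is_likely_incomplete(
    transcription: str, samples: Iterable[str], ratio_threshold: float = 1.0
) -> bool:
    # Hypothetical decision rule: the utterance is treated as likely incomplete
    # when continuations outnumber exact matches by the given ratio.
    exact, extended = count_matches(transcription, samples)
    if exact == 0 and extended == 0:
        return False  # no evidence either way in the collection
    return extended > ratio_threshold * exact


samples = [
    "what is the weather",
    "what is the weather in boston",
    "what is",
    "what is the time",
    "what is the weather",
]
print(is_likely_incomplete("what is", samples))
print(is_likely_incomplete("what is the weather", samples))
```

In an endpointing system, a result of likely incomplete would presumably be used to keep listening rather than close the microphone; that use is an inference from the title and is not shown here.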
Extracting content from speech prosody
A prosodic speech recognition engine configured to identify prosodic features and patterns in a speech continuum for the extraction of linguistic content including para-syntactic content, discourse function, information structure, meaning, and speaker sentiment.
Semantic recognition method and semantic recognition device
A semantic recognition method and a semantic recognition device are provided. A spectrogram of a speech signal is generated. At least one keyword of the spectrogram is detected by inputting the spectrogram into a neural network model. A semantic category to which each of the at least one keyword belongs is distinguished. A semantic intention of the speech signal is determined according to the at least one keyword and the semantic category of the at least one keyword.
INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING METHOD
An information processing apparatus according to the present disclosure includes an acquisition unit that acquires inspiration information indicating inspiration of a user, and a prediction unit that predicts whether or not the user utters after the inspiration of the user on the basis of the inspiration information acquired by the acquisition unit.
SYSTEMS AND METHODS FOR UNSUPERVISED STRUCTURE EXTRACTION IN TASK-ORIENTED DIALOGUES
Embodiments described herein propose an approach for unsupervised structure extraction in task-oriented dialogues. Specifically, a Slot Boundary Detection (SBD) module is adopted, for which utterances from training domains are tagged with the conventional BIO schema but without the slot names. A transformer-based classifier is trained to detect the boundary of potential slot tokens in the test domain. Next, while the number of states is usually unknown, it is more reasonable to assume that the number of slots is given when analyzing a dialogue system. The detected tokens are clustered into as many groups as there are slots. Finally, the dialogue state is represented by a vector recording the number of times each slot has been modified. The slot values are then tracked through each dialogue session in the corpus, and utterances are labeled with their dialogue states accordingly. The semantic structure is portrayed by computing the transition frequencies among the unique states.
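The last two steps of this abstract (state vectors that count slot modifications, and transition frequencies between states) can be sketched as follows. This is only an illustration under assumptions: slot detection and clustering are taken as already done, each turn is given as a hypothetical mapping from slot index to value, and the names `label_states` and `transition_frequencies` are invented for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

State = Tuple[int, ...]
Turn = Dict[int, str]  # assumed input: slot index -> value mentioned in the utterance


def label_states(session: List[Turn], num_slots: int) -> List[State]:
    """Label each utterance with a vector of per-slot modification counts."""
    counts = [0] * num_slots
    current: Dict[int, str] = {}
    states: List[State] = []
    for turn in session:
        for slot, value in turn.items():
            # A slot counts as modified when its tracked value changes.
            if current.get(slot) != value:
                current[slot] = value
                counts[slot] += 1
        states.append(tuple(counts))
    return states


def transition_frequencies(
    sessions: List[List[Turn]], num_slots: int
) -> Counter:
    """Count transitions between consecutive utterance states across sessions."""
    freq: Counter = Counter()
    for session in sessions:
        states = label_states(session, num_slots)
        for prev, nxt in zip(states, states[1:]):
            freq[(prev, nxt)] += 1
    return freq


# Hypothetical session: slot 0 is a destination, slot 1 is a date.
session = [{0: "boston"}, {1: "friday"}, {0: "chicago"}, {}]
states = label_states(session, num_slots=2)
freq = transition_frequencies([session], num_slots=2)
print(states)
print(freq)
```

The resulting counter can be read as a weighted directed graph over unique states, which is one plausible way to portray the semantic structure the abstract mentions.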
COMMUNICATION DATA LOG PROCESSING APPARATUS, COMMUNICATION DATA LOG PROCESSING METHOD, AND STORAGE MEDIUM STORING PROGRAM
According to one embodiment, a communication data log processing apparatus includes a processor including hardware. The processor receives communication data contained in a communication data log as a log of the communication data containing a speech sentence and meta information. The processor determines a section to which the received communication data should belong based on the speech sentence and the meta information.