G10L2015/081

METHODS AND APPARATUS FOR LEVERAGING MACHINE LEARNING FOR GENERATING RESPONSES IN AN INTERACTIVE RESPONSE SYSTEM

Apparatus and methods for leveraging machine learning and artificial intelligence to generate a response to an utterance expressed by a user during an interaction between an interactive response system and the user are provided. The methods may include a natural language processor processing the utterance to output an utterance intent. The methods may also include a signal extractor processing the utterance, the utterance intent, and previous utterance data to output utterance signals. The methods may additionally include an utterance sentiment classifier using a hierarchy of rules to extract a label from a database, the extracting being based on the utterance signals. The methods may further include a sequential neural network classifier using a trained algorithm to process the label and a sequence of historical labels to output a sentiment score. The methods may further include outputting a response based on the utterance intent, the label, and the score.
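The abstract's two-stage design (rule hierarchy produces a label, a sequential model scores the label history) can be sketched as follows. This is a minimal illustration, not the patented implementation: the rules, label names, weights, and the decayed-sum stand-in for the trained sequential classifier are all invented.

```python
RULES = [  # hierarchy of rules: the first matching rule wins
    (lambda s: s["contains_profanity"], "angry"),
    (lambda s: s["exclamations"] >= 2, "frustrated"),
    (lambda s: s["repeats_previous"], "impatient"),
    (lambda s: True, "neutral"),       # fallback rule
]

LABEL_WEIGHT = {"angry": 3, "frustrated": 2, "impatient": 1, "neutral": 0}

def classify(signals):
    """Walk the rule hierarchy and return the first matching label."""
    for predicate, label in RULES:
        if predicate(signals):
            return label

def sentiment_score(label, history, decay=0.5):
    """Stand-in for the trained sequential classifier: an exponentially
    decayed sum over the label sequence (oldest label first)."""
    seq = history + [label]
    return sum(LABEL_WEIGHT[l] * decay ** (len(seq) - 1 - i)
               for i, l in enumerate(seq))

signals = {"contains_profanity": False, "exclamations": 2,
           "repeats_previous": True}
label = classify(signals)
score = sentiment_score(label, ["neutral", "impatient"])
```

A real system would replace the decayed sum with the trained sequential neural network named in the abstract; the point here is only the data flow from signals to label to history-aware score.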

Concurrent segmentation of multiple similar vocalizations

Various implementations disclosed herein include a training module configured to concurrently segment a plurality of vocalization instances of a voiced sound pattern (VSP) as vocalized by a particular speaker, who is identifiable by a corresponding set of vocal characteristics. Aspects of various implementations are used to determine a concurrent segmentation of multiple similar instances of a VSP using a modified hierarchical agglomerative clustering (HAC) process adapted to jointly and simultaneously segment multiple similar instances of the VSP. Information produced from multiple instances of a VSP vocalized by a particular speaker characterizes how the particular speaker vocalizes the VSP and how those vocalizations may vary between instances. In turn, in some implementations, the information produced using the modified HAC process is sufficient to determine more reliable detection (and/or matching) threshold metrics for detecting and matching the VSP as vocalized by the particular speaker.
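The joint-segmentation idea can be sketched in miniature: start with one segment per frame and repeatedly merge the adjacent pair whose frames are most similar, where similarity is averaged over all instances so every instance ends up with the same segment boundaries. This toy version assumes the instances are already time-aligned to the same frame count, which the real modified HAC process does not require.

```python
import numpy as np

def joint_hac_segment(instances, n_segments):
    """Toy joint agglomerative segmentation over time-aligned instances.

    instances: list of (n_frames, n_dims) arrays, all the same shape.
    Returns shared segment boundaries as frame indices."""
    X = np.stack(instances)                 # (n_instances, n_frames, n_dims)
    bounds = list(range(X.shape[1] + 1))    # one segment per frame initially

    def seg_mean(b0, b1):
        return X[:, b0:b1, :].mean(axis=1)  # per-instance segment mean

    while len(bounds) - 1 > n_segments:
        costs = []
        for k in range(1, len(bounds) - 1):
            left = seg_mean(bounds[k - 1], bounds[k])
            right = seg_mean(bounds[k], bounds[k + 1])
            # merge cost: mean distance between neighbours across instances
            costs.append(np.linalg.norm(left - right, axis=1).mean())
        bounds.pop(int(np.argmin(costs)) + 1)   # merge the cheapest pair
    return bounds
```

With two identical instances whose frames split cleanly into two levels, the shared boundary lands between the levels, illustrating how pooling evidence across instances yields one common segmentation.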

Decoder for searching a digraph and generating a lattice, decoding method, and computer program product
09786272 · 2017-10-10

According to an embodiment, a decoder includes a token operating unit, a node adder, and a connection detector. The token operating unit is configured to, every time a signal or a feature is input, propagate each of a plurality of tokens, each of which is an object assigned with a state of a path being searched, according to a digraph until a state or a transition assigned with a non-empty input symbol is reached. The node adder is configured to, in each instance of token propagation, add, in a lattice, a node corresponding to a state assigned to each of the plurality of tokens. The connection detector is configured to refer to the digraph and detect a node that is connected to a node added in an i-th instance in the lattice and that is added in an i+1-th instance in the lattice.
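A stripped-down token-passing loop over a toy digraph shows the three roles (propagate through epsilon transitions, record lattice nodes per input instance, record connections between consecutive instances). The graph, symbols, and data structures are invented for illustration and much simpler than a real WFST decoder.

```python
# Toy digraph: state -> list of (next_state, input_symbol); None = epsilon.
GRAPH = {
    0: [(1, None)],           # epsilon transition, crossed during propagation
    1: [(2, "a"), (3, "b")],  # arcs that consume a non-empty input symbol
    2: [(4, "c")],
    3: [(4, "c")],
    4: [],                    # final state
}

def propagate(states):
    """Advance tokens through epsilon-only states until each sits on a
    state whose outgoing arcs consume a non-empty input symbol."""
    frontier, settled = list(states), set()
    while frontier:
        s = frontier.pop()
        arcs = GRAPH[s]
        if arcs and all(sym is None for _, sym in arcs):
            frontier.extend(n for n, _ in arcs)   # keep skipping epsilons
        else:
            settled.add(s)
    return settled

def decode(frames):
    """Per input frame: record token states as lattice nodes, then record
    edges from nodes in the i-th instance to nodes in the (i+1)-th."""
    lattice_nodes, lattice_edges = [], []
    tokens = propagate({0})
    for i, sym in enumerate(frames):
        lattice_nodes.append(sorted(tokens))
        nxt = set()
        for s in tokens:
            for n, arc_sym in GRAPH[s]:
                if arc_sym == sym:
                    nxt.add(n)
                    lattice_edges.append((i, s, n))  # i-th -> (i+1)-th node
        tokens = propagate(nxt)
    lattice_nodes.append(sorted(tokens))
    return lattice_nodes, lattice_edges
```

Feeding the symbol sequence `["a", "c"]` traces one path through the graph, producing one lattice node list per propagation instance plus the connecting edges the connection detector would report.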

SPEECH RECOGNITION METHOD, APPARATUS, ELECTRONIC DEVICE AND COMPUTER READABLE STORAGE MEDIUM

A speech recognition method, an apparatus, an electronic device, and a computer-readable storage medium are provided. The method includes acquiring a first speech recognition result of a speech; acquiring context information and pronunciation feature information about a target text unit in the first speech recognition result; and acquiring a second speech recognition result of the speech based on the context information and the pronunciation feature information.
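One plausible reading of the second pass is rescoring candidate replacements for a low-confidence text unit with a context score plus a pronunciation-similarity term. The sketch below is an assumption-laden illustration: the candidate set, the mock context scorer, and the phone strings are all invented, and edit distance stands in for whatever pronunciation feature the method actually uses.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phone strings (rolling array)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def second_pass(first_pass, target, candidates, context_score, phones):
    """Replace `target` with the candidate that maximizes its context
    score minus its pronunciation distance to the target unit."""
    best = max(candidates, key=lambda c: context_score(c)
               - edit_distance(phones[c], phones[target]))
    return [best if w == target else w for w in first_pass]

phones = {"whether": "weder", "weather": "weder"}   # homophones here
context = {"weather": 2.0, "whether": 0.5}.get      # mock context scores
corrected = second_pass(["the", "whether", "is", "nice"], "whether",
                        ["whether", "weather"], context, phones)
```

Because both candidates pronounce identically, the context term decides, and the homophone error is corrected in the second result.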

INFORMATION OUTPUT SYSTEM AND INFORMATION OUTPUT METHOD
20220036888 · 2022-02-03

An information output system includes a speech acquisition unit configured to acquire a speech of a user, a recognition processing unit configured to recognize the content of the acquired speech of the user, and an output processing unit configured to output a question to the user and to perform processing for outputting a response to the content of the speech of the user who has answered the question. The output processing unit is configured to derive a user's positive degree based on the content of the speech of the user who has answered the question and to determine guidance information to be output to the user based on the derived positive degree.
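The derive-then-select flow (positive degree from the answer's content, then guidance chosen by that degree) can be mocked with word lists. Everything concrete here is invented: the word sets, the ratio formula, and the threshold table are placeholders for whatever recognition and scoring the system actually performs.

```python
POSITIVE = {"yes", "sure", "great", "love", "definitely"}
NEGATIVE = {"no", "not", "never", "hate", "maybe"}

GUIDANCE = [  # (minimum positive degree, guidance information to output)
    (0.6, "Here is a detailed offer on the topic you liked."),
    (0.2, "Here is a short overview; tell me if you want more."),
    (0.0, "Let's talk about something else."),
]

def positive_degree(answer):
    """Fraction of sentiment-bearing words that are positive."""
    words = answer.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos / (pos + neg) if pos + neg else 0.5  # neutral if no signal

def guidance_for(answer):
    d = positive_degree(answer)
    return next(text for threshold, text in GUIDANCE if d >= threshold)
```

An enthusiastic answer scores high and gets the detailed guidance; a negative answer falls through to the fallback, mirroring the abstract's degree-based determination.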

End-to-end neural networks for speech recognition and classification
11367433 · 2022-06-21

Systems and methods are disclosed for end-to-end neural networks for speech recognition and classification and additional machine learning techniques that may be used in conjunction or separately. Some embodiments comprise multiple neural networks, directly connected to each other to form an end-to-end neural network. One embodiment comprises a convolutional network, a first fully-connected network, a recurrent network, a second fully-connected network, and an output network. Some embodiments are related to generating speech transcriptions, and some embodiments relate to classifying speech into a number of classifications.
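The named chain (convolutional, fully-connected, recurrent, fully-connected, output) can be traced shape-by-shape with NumPy. This is a shape sketch under invented dimensions and random weights, not the patented architecture; a real system would use trained layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w):
    """Valid 1-D convolution over time: x is (T, Cin), w is (K, Cin, Cout)."""
    K = w.shape[0]
    return np.stack([np.tensordot(x[t:t + K], w, axes=([0, 1], [0, 1]))
                     for t in range(x.shape[0] - K + 1)])

def rnn(x, wx, wh):
    """Plain tanh recurrence over time (stand-in for the recurrent net)."""
    h, out = np.zeros(wh.shape[0]), []
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ wx + h @ wh)
        out.append(h)
    return np.stack(out)

# Invented sizes: 20 frames of 40-dim features, 10 output classes.
T, F, C1, D1, H, D2, n_classes = 20, 40, 32, 64, 48, 64, 10
x = rng.standard_normal((T, F))
x = conv1d(x, rng.standard_normal((5, F, C1)) * 0.1)       # convolutional
x = np.tanh(x @ (rng.standard_normal((C1, D1)) * 0.1))     # first FC
x = rnn(x, rng.standard_normal((D1, H)) * 0.1,
        rng.standard_normal((H, H)) * 0.1)                  # recurrent
x = np.tanh(x @ (rng.standard_normal((H, D2)) * 0.1))      # second FC
logits = x @ (rng.standard_normal((D2, n_classes)) * 0.1)  # output network
```

The valid convolution with kernel width 5 shortens the time axis from 20 to 16 frames, and the output carries one logit vector per remaining frame, which is the shape a transcription or classification head would consume.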

ABSTRACT GENERATION DEVICE, METHOD, PROGRAM, AND RECORDING MEDIUM

A speech recognition unit (12) converts an input utterance sequence into a confusion network sequence constituted by a k-best of candidate words of speech recognition results; a lattice generating unit (14) generates, from the confusion network sequence, a lattice sequence having the candidate words as internal nodes and a combination of k words among the candidate words for an identical speech as an external node, in which edges are extended between internal nodes other than internal nodes included in an identical external node; an integer programming problem generating unit (16) generates an integer programming problem for selecting, among paths following the internal nodes with the edges extended in the lattice sequence, a path that maximizes an objective function including at least a coverage score of important words; and a summary generating unit generates a high-quality summary with fewer speech recognition errors and low redundancy, using the candidate words indicated by the internal nodes included in the path selected by solving the integer programming problem, under a constraint on the length of the summary to be generated.
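On a lattice small enough to enumerate, the integer program's job (pick the path maximizing important-word coverage under a length constraint) reduces to brute force. The words, weights, and length budget below are invented; a real system would hand this objective to an ILP solver rather than enumerating.

```python
from itertools import product

# External nodes: each slot holds the k-best candidate words for one speech.
LATTICE = [["the", "a"], ["whether", "weather"], ["improved", "proved"]]
IMPORTANT = {"weather": 3.0, "improved": 2.0, "the": 0.5}  # coverage weights

def best_path(lattice, max_len):
    """Enumerate one internal node per external node and keep the path
    maximizing important-word coverage within the length budget."""
    best, best_score = None, float("-inf")
    for path in product(*lattice):
        if sum(len(w) for w in path) > max_len:
            continue                           # summary length constraint
        covered = set(path) & IMPORTANT.keys() # coverage, not repetition
        score = sum(IMPORTANT[w] for w in covered)
        if score > best_score:
            best, best_score = path, score
    return best
```

Scoring coverage over the *set* of words (rather than summing repeats) is what discourages redundancy; the ILP formulation encodes the same choice with binary node variables and edge-consistency constraints.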

ADVERSARIAL LANGUAGE IMITATION WITH CONSTRAINED EXEMPLARS

Generally discussed herein are devices, systems, and methods for generating a phrase that is confusing to a language classifier (LC). A method can include determining, by the LC, a first classification score (CS) of a prompt indicating whether the prompt is a first class or a second class; predicting, based on the prompt and by a pre-trained language model (PLM), likely next words and a corresponding probability for each of the likely next words; determining, by the LC, a second CS for each of the likely next words; determining, by an adversarial classifier, respective scores for each of the likely next words, the respective scores determined based on the first CS of the prompt, the second CS of the likely next words, and the probabilities of the likely next words; and selecting, by the adversarial classifier, a next word of the likely next words based on the respective scores.
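The scoring loop can be mocked with lookup tables: the classifier, the PLM's next-word distribution, and the way the three signals are combined are all invented here, chosen only to show the shape of the selection step.

```python
def classifier_score(text):
    """Mock language classifier: fraction of 'risky' words (class 1)."""
    risky = {"stupid", "awful"}
    words = text.split()
    return sum(w in risky for w in words) / len(words)

PLM_NEXT = {"awful": 0.4, "nice": 0.5, "stupid": 0.1}  # mock P(next word)

def pick_next_word(prompt, alpha=1.0, beta=1.0):
    base = classifier_score(prompt)          # first CS, on the prompt alone
    def adversarial(word):
        # Reward words the PLM finds likely while penalizing ones that
        # move the classifier away from its prompt-level score.
        shift = abs(classifier_score(prompt + " " + word) - base)
        return alpha * PLM_NEXT[word] - beta * shift
    return max(PLM_NEXT, key=adversarial)    # selection by respective scores
```

With the prompt "that movie was", the fluent but classifier-neutral word wins over the words that would tip the mock classifier, which is the kind of trade-off the adversarial scorer arbitrates.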

VEHICLE AND CONTROL METHOD THEREOF
20230260511 · 2023-08-17

A vehicle and a control method thereof include a microphone to which a speech command of a user is input; a communication module configured to receive contact data and contact history data from a mobile device; at least one memory configured to store a first speech recognition database obtained based on the contact history data received from the mobile device and a second speech recognition database obtained based on the contact data received from the mobile device; and at least one processor configured to, when a speech command for calling or texting is input to the microphone, determine a final recipient or generate a recipient candidate list based on recipient information included in the speech command, the stored first speech recognition database, and the stored second speech recognition database.
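A minimal resolution policy consistent with the abstract: an exact hit in either database yields a final recipient, otherwise fuzzy matches across both databases form the candidate list. The priority order, similarity measure, and cutoff are assumptions for illustration.

```python
from difflib import SequenceMatcher

def resolve_recipient(spoken_name, history_db, contacts_db, cutoff=0.6):
    """Return (final_recipient, candidate_list) for a recognized name."""
    # The history-derived database is checked first (invented priority).
    for db in (history_db, contacts_db):
        if spoken_name in db:
            return spoken_name, []        # final recipient, no candidates
    # No exact hit: build a candidate list by string similarity.
    scored = {name: SequenceMatcher(None, spoken_name, name).ratio()
              for name in set(history_db) | set(contacts_db)}
    candidates = sorted((n for n, r in scored.items() if r >= cutoff),
                        key=lambda n: -scored[n])
    return None, candidates
```

A misrecognized name like "jonh" finds no exact entry but surfaces "john" as a candidate, which is where a candidate list beats silently failing the call command.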

Information processing apparatus and destination search method
11769494 · 2023-09-26

An information processing apparatus is connected to a voice processing server that analyzes text data transmitted from a voice input/output apparatus, which converts an instruction uttered by a user into the text data and outputs the text data, and that outputs the instruction obtained by the analysis together with utterance language information indicating the language of the utterance. The information processing apparatus includes: a communicator that communicates with the voice processing server; a destination searcher that determines, on the basis of the utterance language information, whether to include a space character in a target of the search, and that searches a destination list for a name indicated in a search character string on the basis of a result of the determination; and a hardware processor that performs control to transmit the destination search result of the destination searcher to the voice processing server via the communicator.
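The language-dependent space handling can be sketched directly: for languages customarily written without word spaces, spaces are stripped from both the stored names and the search string before matching. The language set, normalization, and substring matching are invented details standing in for the apparatus's actual search logic.

```python
# Languages (ISO 639-1 codes, assumed) written without word spaces.
SPACELESS_LANGUAGES = {"ja", "zh", "th"}

def search_destinations(query, language, destination_list):
    """Match `query` against stored names, ignoring spaces when the
    utterance language does not use them."""
    ignore_spaces = language in SPACELESS_LANGUAGES

    def norm(s):
        return s.replace(" ", "") if ignore_spaces else s

    q = norm(query).lower()
    return [name for name in destination_list if q in norm(name).lower()]
```

A Japanese utterance recognized as the unspaced string "taroyamada" still matches the stored entry "Taro Yamada", while the same string under an English setting correctly fails, showing why the space decision must follow the utterance language information.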