Patent classifications
G10L2015/221
Speech recognition apparatus, vehicle including the same, and method of controlling the same
A speech recognition apparatus includes a voice collection unit configured to collect voices, and a controller configured to control recognizing speech based on signals of the voices collected during a predetermined time period from a voice collection start time, identifying whether a signal is received during the predetermined time period when an operation command corresponding to the recognized speech is not identified, determining that an early utterance occurs upon determination that the signal is received during the predetermined time period, counting the number of speech recognition failures caused by the early utterance, re-performing speech recognition when the counted number is less than a reference number, and outputting early utterance habit guidance information when the counted number equals the reference number.
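The control flow described in this abstract can be read as a bounded retry loop. The following is a minimal sketch under stated assumptions, not the patented implementation: `recognize`, `signal_in_initial_window`, the default `reference_count`, and the returned marker strings are all hypothetical names introduced for illustration.

```python
from typing import Callable, Optional


def run_recognition(
    recognize: Callable[[], Optional[str]],
    signal_in_initial_window: Callable[[], bool],
    reference_count: int = 3,
) -> str:
    """Return a recognized command or a marker string.

    recognize: hypothetical callable; returns an operation command, or None
        when no command could be identified from the recognized speech.
    signal_in_initial_window: hypothetical callable; True if a voice signal
        was received during the predetermined period after the start time.
    """
    early_failures = 0
    while True:
        command = recognize()
        if command is not None:
            return command
        if not signal_in_initial_window():
            # The failure is not attributed to an early utterance.
            return "FAILED"
        early_failures += 1  # failure attributed to an early utterance
        if early_failures >= reference_count:
            # The count has reached the reference number: give guidance.
            return "EARLY_UTTERANCE_GUIDANCE"
        # Otherwise fall through and re-perform speech recognition.
```

Because the counter grows by one per failure, `>=` triggers at the same point as the equality test in the abstract while also guarding against a misconfigured reference number.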
POS TERMINAL, PRODUCT INFORMATION REGISTRATION METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM STORING PRODUCT INFORMATION REGISTRATION PROGRAM
The present invention provides a POS terminal that can prevent errors in registering product information relating to a product for sale from occurring when sales processing is performed. The POS terminal is for managing sales information of products at a store. A voice recognition dictionary (5) includes product names of products for sale registered therein. When a product name included in a voice uttered by an operator matches one of the product names registered in the voice recognition dictionary (5), a voice recognition unit (3) outputs the product name included in the voice as a voice-recognized product name. An operator display unit (10) displays product buttons for inputting the product names. When a product name of a product button pressed by the operator matches the product name voice-recognized by the voice recognition unit (3), product information of a product having the matched product name is registered.
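The double confirmation in this abstract (voice-recognized name must match the pressed button) could be sketched as follows. This is an illustrative sketch only: the substring-based `recognize_product`, the `catalog` mapping, and all function names are assumptions, and the utterance is assumed to be already transcribed to text.

```python
from typing import Optional


def recognize_product(utterance: str, dictionary: set[str]) -> Optional[str]:
    """Return a dictionary product name contained in the utterance, if any.

    Stand-in for the voice recognition unit: a plain substring check.
    """
    text = utterance.lower()
    for name in sorted(dictionary):  # sorted for deterministic results
        if name.lower() in text:
            return name
    return None


def register_if_confirmed(
    utterance: str,
    pressed_button: str,
    dictionary: set[str],
    catalog: dict[str, dict],
) -> Optional[dict]:
    """Return product information only when voice and button input agree."""
    recognized = recognize_product(utterance, dictionary)
    if recognized is not None and recognized == pressed_button:
        return catalog[recognized]
    return None  # mismatch or nothing recognized: register nothing
```

Requiring both inputs to agree means a mis-pressed button or a misheard name alone does not register a product.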
Multi-command single utterance input method
Systems and processes are disclosed for handling a multi-part voice command for a virtual assistant. Speech input can be received from a user that includes multiple actionable commands within a single utterance. A text string can be generated from the speech input using a speech transcription process. The text string can be parsed into multiple candidate substrings based on domain keywords, imperative verbs, predetermined substring lengths, or the like. For each candidate substring, a probability can be determined indicating whether the candidate substring corresponds to an actionable command. Such probabilities can be determined based on semantic coherence, similarity to user request templates, querying services to determine manageability, or the like. If the probabilities exceed a threshold, the user intent of each substring can be determined, processes associated with the user intents can be executed, and an acknowledgment can be provided to the user.
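One of the parsing strategies named in this abstract, splitting on imperative verbs and then thresholding a per-substring probability, might look like the sketch below. The verb list, the connector list, and `toy_probability` are hypothetical stand-ins; the abstract's actual scoring (semantic coherence, template similarity, service queries) is not reproduced here.

```python
IMPERATIVE_VERBS = {"call", "play", "set", "send", "open"}  # hypothetical
CONNECTORS = {"and", "then"}  # hypothetical


def split_on_verbs(text: str) -> list[str]:
    """Start a new candidate substring at each imperative verb."""
    segments: list[list[str]] = []
    for word in text.lower().split():
        if word in IMPERATIVE_VERBS or not segments:
            segments.append([])
        segments[-1].append(word)
    cleaned = []
    for seg in segments:
        while seg and seg[-1] in CONNECTORS:  # drop trailing "and"/"then"
            seg.pop()
        if seg:
            cleaned.append(" ".join(seg))
    return cleaned


def toy_probability(candidate: str) -> float:
    """Crude stand-in score: verb-initial, multi-word strings rate high."""
    words = candidate.split()
    return 0.9 if len(words) >= 2 and words[0] in IMPERATIVE_VERBS else 0.1


def actionable_commands(text: str, threshold: float = 0.5) -> list[str]:
    """Return the candidates only if every one exceeds the threshold."""
    candidates = split_on_verbs(text)
    if all(toy_probability(c) > threshold for c in candidates):
        return candidates
    return []
```

For example, `actionable_commands("call mom and set a timer for five minutes")` yields two candidates, each of which would then be passed to intent determination.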
Data-driven and rule-based speech recognition output enhancement
According to some embodiments, a multi-layer speech recognition transcript post processing system may include a data-driven, statistical layer associated with a trained automatic speech recognition model that selects an initial transcript. A rule-based layer may receive the initial transcript from the data-driven, statistical layer and execute at least one pre-determined rule to generate a first modified transcript. A machine learning approach layer may receive the first modified transcript from the rule-based layer and perform a neural model inference to create a second modified transcript. A human editor layer may receive the second modified transcript from the machine learning approach layer along with an adjustment from at least one human editor. The adjustment may create, in some embodiments, a final transcript that may be used to fine-tune the data-driven, statistical layer.
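The layered post-processing described above can be pictured as a chain of functions, each consuming the previous layer's transcript. This is a rough sketch under assumptions: the rule table, the trivial `ml_layer` stand-in for neural inference, and all function names are hypothetical.

```python
import re
from typing import Callable, Optional


def statistical_layer(hypotheses: list[tuple[str, float]]) -> str:
    """Select the initial transcript: the highest-scoring hypothesis."""
    return max(hypotheses, key=lambda h: h[1])[0]


# Hypothetical pre-determined rules: (pattern, replacement).
RULES = [(re.compile(r"\bu s a\b"), "USA"), (re.compile(r"\s+"), " ")]


def rule_layer(text: str) -> str:
    """Apply each rule in order to produce the first modified transcript."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.strip()


def ml_layer(text: str) -> str:
    """Stand-in for neural inference: capitalize and add final punctuation."""
    if not text:
        return text
    text = text[0].upper() + text[1:]
    return text if text[-1] in ".?!" else text + "."


def human_layer(text: str, adjustment: Optional[Callable[[str], str]] = None) -> str:
    """Apply an optional human editor adjustment to get the final transcript."""
    return adjustment(text) if adjustment else text


def post_process(hypotheses, adjustment=None) -> str:
    """Run all four layers in the order given in the abstract."""
    return human_layer(ml_layer(rule_layer(statistical_layer(hypotheses))), adjustment)
```

The final transcript returned by `post_process` is what the abstract suggests feeding back to fine-tune the statistical layer; that feedback step is omitted here.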
SELECTIVE DISPLAYING OF PUSH NOTIFICATIONS
In an approach for selectively displaying a push notification, audio is captured using a microphone. A processor receives a push notification, wherein the push notification includes information. A processor identifies a keyword associated with the push notification based on the information. A processor determines that the captured audio includes the keyword. A processor determines whether to display the push notification based on the determination of whether the captured audio includes the keyword.
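The decision step in this abstract could be sketched as below. Several things are assumed for illustration: the captured audio is already transcribed to text, the keyword is looked up from a hypothetical `app` field of the notification, and the `show_when_heard` flag exists because the abstract does not say whether hearing the keyword should show or suppress the notification.

```python
def decide_display(
    notification: dict,
    transcript: str,
    keywords: dict[str, str],
    show_when_heard: bool = True,
) -> bool:
    """Decide whether to display a push notification.

    keywords: hypothetical mapping from app name to an associated keyword.
    show_when_heard: policy flag (assumption); True displays the notification
        when the keyword was heard, False suppresses it in that case.
    """
    keyword = keywords.get(notification["app"])
    if keyword is None:
        return True  # assumption: no keyword means default to displaying
    heard = keyword.lower() in transcript.lower().split()
    return heard if show_when_heard else not heard
```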
SYSTEMS AND METHOD FOR PERFORMING SPEECH RECOGNITION
A system and method for performing speech recognition. A speech recognition engine includes a plurality of grammar paths each defining a recognized phrase. The grammar paths each have at least two nodes that are connected by a recognized word. An input device receives a user specified input that corresponds to the recognized word. A microphone receives a user phrase and a processor excludes grammar paths from the speech recognition engine based on an absence of the user specified input. The processor selects the recognized phrase from the non-excluded grammar paths based on the user phrase.
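The pruning idea in this abstract, discarding grammar paths that lack the user-specified word before matching the spoken phrase, can be illustrated with a small sketch. Grammar paths are simplified to word lists, and text similarity from the standard library's `difflib.SequenceMatcher` stands in for acoustic matching; both simplifications and all names are assumptions.

```python
from difflib import SequenceMatcher
from typing import Optional


def select_phrase(
    paths: list[list[str]],
    user_input_word: str,
    heard_phrase: str,
) -> Optional[str]:
    """Exclude paths lacking the user-specified word, then pick the best match."""
    # Exclusion step: keep only paths that contain the user-specified word.
    remaining = [p for p in paths if user_input_word in p]
    if not remaining:
        return None

    def similarity(path: list[str]) -> float:
        return SequenceMatcher(None, " ".join(path), heard_phrase).ratio()

    # Selection step: closest remaining phrase to what was heard.
    return " ".join(max(remaining, key=similarity))
```

Shrinking the search space first means the matcher only has to discriminate among phrases consistent with the manual input.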
Method for controlling speech-recognition text-generation system and method for controlling mobile terminal
A method is provided for controlling a speech-recognition text-generation system that captures speech, and converts the captured speech into character strings through speech recognition. The method includes determining whether or not the character strings include a predetermined phrase, and specifying, in a case where the predetermined phrase is determined to be included, a character string associated with the predetermined phrase among the character strings as a first character string which is a deletion candidate. The method also includes displaying the first character string in a first display form on a display terminal and displaying a second character string, which is a character string other than the first character string, in a second display form on the display terminal.
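A minimal sketch of the marking step follows. The abstract does not say which character string is "associated with" the predetermined phrase, so this sketch assumes it is the string immediately preceding the one that contains the phrase; the phrase itself and the display-form labels are likewise hypothetical.

```python
PHRASE = "scratch that"  # hypothetical predetermined phrase


def mark_deletion_candidates(strings: list[str]) -> list[tuple[str, str]]:
    """Pair each recognized string with a display form.

    Assumption: the string just before one containing PHRASE is the
    deletion candidate; every other string keeps the normal form.
    """
    forms = ["normal"] * len(strings)
    for i, s in enumerate(strings):
        if PHRASE in s.lower() and i > 0:
            forms[i - 1] = "deletion_candidate"
    return list(zip(strings, forms))
```

A display layer could then render `deletion_candidate` entries differently, for instance greyed out, from `normal` ones.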
Performing speech recognition over a network and using speech recognition results based on determining that a network connection exists
Systems, methods and apparatus for initiating communication. According to one implementation, a recording of a particular user speaking a name of a contact is obtained, a voice dialing command including an utterance of the name of the contact by the particular user is received, and in response to receiving the voice dialing command including the utterance of the name of the contact by the particular user, the recording of the particular user speaking the name of the contact or a text-to-speech audio output of the name is provided for output, and communication is initiated between the particular user and the contact.
Selecting alternates in speech recognition
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting alternates in speech recognition. In some implementations, data is received that indicates multiple speech recognition hypotheses for an utterance. Based on the multiple speech recognition hypotheses, multiple alternates for a particular portion of a transcription of the utterance are identified. For each of the identified alternates, one or more feature scores are determined, the feature scores are input to a trained classifier, and an output is received from the classifier. A subset of the identified alternates is selected, based on the classifier outputs, to provide for display. Data indicating the selected subset of the alternates is provided for display.
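The selection pipeline in this abstract (alternates, feature scores, classifier output, subset) might be sketched as follows. Everything concrete here is an assumption: hypotheses are equal-length token lists with scores, the two features are illustrative, and a logistic function with hand-set weights stands in for the trained classifier.

```python
import math

Hypothesis = tuple[list[str], float]  # (tokens, recognizer score)


def alternates_at(hypotheses: list[Hypothesis], position: int) -> list[str]:
    """Collect distinct tokens at a position that differ from the top hypothesis."""
    top = hypotheses[0][0][position]
    seen: list[str] = []
    for tokens, _ in hypotheses:
        tok = tokens[position]
        if tok != top and tok not in seen:
            seen.append(tok)
    return seen


def features(hypotheses: list[Hypothesis], position: int, alt: str) -> list[float]:
    """Illustrative features: best supporting score and support fraction."""
    scores = [s for toks, s in hypotheses if toks[position] == alt]
    return [max(scores), len(scores) / len(hypotheses)]


WEIGHTS, BIAS = [4.0, 2.0], -3.0  # hand-set stand-in for trained parameters


def classify(feats: list[float]) -> float:
    """Logistic stand-in for the trained classifier."""
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, feats))
    return 1.0 / (1.0 + math.exp(-z))


def select_alternates(hypotheses, position, threshold=0.5) -> list[str]:
    """Keep alternates whose classifier output reaches the threshold."""
    return [
        a for a in alternates_at(hypotheses, position)
        if classify(features(hypotheses, position, a)) >= threshold
    ]
```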
VOICE INPUT SUPPORT METHOD AND DEVICE
An information processing system includes circuitry configured to acquire information identifying a plurality of voice commands associated with each of a plurality of screens to be displayed by a display, identify a first plurality of voice commands of the plurality of voice commands corresponding to a first screen, of the plurality of screens, currently displayed by the display, acquire first sound information captured by a microphone, compare the first sound information to first voice patterns associated with the first plurality of voice commands, and output a first result based on a first comparison between the first sound information and the first voice patterns.
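Restricting matching to the commands of the currently displayed screen can be sketched as below. The screen names, command lists, and `min_ratio` cutoff are hypothetical, and text similarity via `difflib.SequenceMatcher` is only a stand-in for comparing sound information against voice patterns.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical mapping from screen name to its voice commands.
COMMANDS_BY_SCREEN = {
    "map": ["zoom in", "zoom out"],
    "music": ["next track", "pause"],
}


def match_command(screen: str, heard: str, min_ratio: float = 0.6) -> Optional[str]:
    """Compare the input only against commands of the current screen."""
    candidates = COMMANDS_BY_SCREEN.get(screen, [])
    best, best_ratio = None, 0.0
    for command in candidates:
        ratio = SequenceMatcher(None, command, heard).ratio()
        if ratio > best_ratio:
            best, best_ratio = command, ratio
    # Reject weak matches rather than guessing.
    return best if best_ratio >= min_ratio else None
```

Because only the active screen's commands are compared, a command that belongs to another screen is not matched even if it was spoken clearly.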