G10L15/083

SYSTEMS AND METHODS FOR FAST FILTERING OF AUDIO KEYWORD SEARCH
20220020361 · 2022-01-20 · ·

An audio keyword searcher is arranged to: identify a voice segment of a received audio signal; identify, by an automatic speech recognition engine, one or more phonemes included in the voice segment; and output the one or more phonemes from the automatic speech recognition engine to a keyword filter, which detects whether the voice segment includes any of one or more first keywords of a first keyword list. If a first keyword is detected, the one or more phonemes included in the voice segment are output to a decoder; if not, they are not output to the decoder. When the phonemes are output to the decoder, the searcher generates a word lattice associated with the voice segment, searches the word lattice for one or more second keywords, and determines whether the voice segment includes the one or more second keywords.
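
The gating behavior in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `FIRST_KEYWORDS` table, the ARPAbet-like phoneme symbols, and the functions `keyword_filter`, `decode`, and `search_segment` are hypothetical, and the "decoder" is a stub that returns a trivial single-path word lattice.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical first keyword list: keyword -> phoneme sequence (symbols assumed).
FIRST_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "hello": ("HH", "AH", "L", "OW"),
    "stop": ("S", "T", "AA", "P"),
}


def contains_run(phonemes: Sequence[str], pattern: Sequence[str]) -> bool:
    """Return True if `pattern` occurs as a contiguous run inside `phonemes`."""
    n, m = len(phonemes), len(pattern)
    return any(tuple(phonemes[i:i + m]) == tuple(pattern) for i in range(n - m + 1))


def keyword_filter(phonemes: Sequence[str]) -> bool:
    """Cheap first pass: does the segment contain any first keyword?"""
    return any(contains_run(phonemes, p) for p in FIRST_KEYWORDS.values())


def decode(phonemes: Sequence[str]) -> List[List[str]]:
    """Stub decoder: a 'lattice' represented as a list of alternative word paths."""
    words = [w for w, p in FIRST_KEYWORDS.items() if contains_run(phonemes, p)]
    return [words]


def search_segment(phonemes: Sequence[str], second_keywords: Set[str]) -> Optional[List[str]]:
    """Return matched second keywords, or None if the filter rejected the segment."""
    if not keyword_filter(phonemes):
        return None  # the phonemes are never passed to the decoder
    lattice = decode(phonemes)
    return sorted({w for path in lattice for w in path if w in second_keywords})
```

Returning `None` for a rejected segment keeps "filtered out" distinct from "decoded, but no second keyword found", which yields an empty list.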

Printing system and control method
11226779 · 2022-01-18 · ·

A relay apparatus transmits information corresponding to a registered photograph to a print management apparatus when a first option is selected and an event is specified, the specified event being registration of the photograph in a Web service. When a second option is selected and a speech instruction in a predetermined phrase, spoken toward a speech recognition terminal, is specified, the relay apparatus transmits an instruction to print game contents to the print management apparatus.

INTERFACING WITH APPLICATIONS VIA DYNAMICALLY UPDATING NATURAL LANGUAGE PROCESSING
20210358489 · 2021-11-18 ·

Dynamic interfacing with applications is provided. For example, a system receives a first input audio signal. The system processes, via a natural language processing technique, the first input audio signal to identify an application. The system activates the application for execution on the client computing device. The application declares a function the application is configured to perform. The system modifies the natural language processing technique responsive to the function declared by the application. The system receives a second input audio signal. The system processes, via the modified natural language processing technique, the second input audio signal to detect one or more parameters. The system determines that the one or more parameters are compatible for input into an input field of the application. The system generates an action data structure for the application. The system inputs the action data structure into the application, which executes the action data structure.
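
One way to picture the "modify the natural language processing technique" step is a parser whose grammar grows when an application declares a function. The sketch below is an assumption-laden illustration only: the class `DynamicNlp`, its `declare` and `parse` methods, and the use of regular expressions with named groups are hypothetical stand-ins, and the returned dictionary merely plays the role of the action data structure.

```python
import re
from typing import Dict, Optional, Pattern


class DynamicNlp:
    """Toy parser that only understands functions declared by applications."""

    def __init__(self) -> None:
        self._patterns: Dict[str, Pattern[str]] = {}

    def declare(self, function: str, pattern: str) -> None:
        """Called when an activated application declares a function."""
        self._patterns[function] = re.compile(pattern)

    def parse(self, text: str) -> Optional[dict]:
        """Return an action structure if the text fits a declared function."""
        for function, pattern in self._patterns.items():
            match = pattern.search(text)
            if match:
                return {"function": function, "parameters": match.groupdict()}
        return None
```

Before any declaration `parse` returns `None`; after `declare("order_pizza", r"order a (?P<size>small|large) pizza")`, the same input text produces an action structure carrying the `size` parameter.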

INTERPRETATION SYSTEM, SERVER APPARATUS, DISTRIBUTION METHOD, AND STORAGE MEDIUM
20210358475 · 2021-11-18 ·

Conventionally, there has been no interpretation system, realized by a server apparatus and one or more terminal apparatuses and configured to distribute to one or more users one or more pieces of interpreted speech obtained from the speech of one speaker through interpretation performed by one or more interpreters, in which the server apparatus properly manages information regarding the languages of the interpreters. To solve this problem, a server apparatus includes an interpreter information group storage unit that stores an interpreter information group, which is a group of one or more pieces of interpreter information. Each piece of interpreter information concerns an interpreter who interprets speech in a first language into a second language, and has a first language identifier for identifying the first language, a second language identifier for identifying the second language, and an interpreter identifier for identifying the interpreter. Accordingly, it is possible to properly manage information regarding the languages of one or more interpreters.

Speech keyword recognition method and apparatus, computer-readable storage medium, and computer device

A speech keyword recognition method includes obtaining first speech segments based on a to-be-recognized speech signal, and obtaining first probabilities respectively corresponding to the first speech segments by using a preset first classification model. A first probability of a first speech segment is obtained from probabilities of the first speech segment respectively corresponding to pre-determined word segmentation units of a pre-determined keyword. The method also includes: obtaining second speech segments based on the to-be-recognized speech signal; respectively generating first prediction characteristics of the second speech segments based on first probabilities of first speech segments that correspond to each second speech segment; performing classification based on the first prediction characteristics by using a preset second classification model, to obtain second probabilities respectively corresponding to the second speech segments related to the pre-determined keyword; and determining, based on the second probabilities, whether the pre-determined keyword exists in the to-be-recognized speech signal.
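
The two-stage structure can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the first classification model is represented only by its per-unit output probabilities, taking the maximum over word segmentation units is one possible way to derive a first probability, the second classification model is replaced by a simple mean, and the window width and threshold are arbitrary.

```python
from typing import List, Sequence


def first_probability(unit_probs: Sequence[float]) -> float:
    """Collapse per-unit probabilities into one score (max is an assumption)."""
    return max(unit_probs)


def prediction_features(first_probs: Sequence[float], start: int, width: int) -> List[float]:
    """Features of a second segment: first probabilities of the first segments it covers."""
    return list(first_probs[start:start + width])


def second_model(features: Sequence[float]) -> float:
    """Stand-in for the second classification model: mean of the features."""
    return sum(features) / len(features)


def keyword_present(unit_probs_per_segment: Sequence[Sequence[float]],
                    width: int = 3, threshold: float = 0.5) -> bool:
    """Decide whether the keyword exists, based on the second probabilities."""
    first_probs = [first_probability(p) for p in unit_probs_per_segment]
    second_probs = [
        second_model(prediction_features(first_probs, start, width))
        for start in range(len(first_probs) - width + 1)
    ]
    return any(p >= threshold for p in second_probs)
```

Each second segment here spans `width` consecutive first segments, so a signal shorter than one window yields no second probabilities and the function returns `False`.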

Bone conduction transducers for privacy

A method for routing audio content through an electronic device that is to be worn by a user. The method obtains a communication and determines whether the communication is private. In response to determining that the communication is private, the method drives a bone conduction transducer of the electronic device with an audio signal associated with the communication. In response to determining that the communication is not private, however, the method drives a speaker of the electronic device with the audio signal.
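
The routing decision reduces to a single branch, sketched below. The abstract does not say how privacy is determined, so the `is_private` flag, the `Communication` class, and the `Output` enum are hypothetical placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class Output(Enum):
    BONE_CONDUCTION = "bone_conduction_transducer"
    SPEAKER = "speaker"


@dataclass
class Communication:
    audio: bytes
    is_private: bool  # assumed to be set by some upstream privacy check


def route(comm: Communication) -> Output:
    """Pick the transducer that should be driven with the communication's audio."""
    return Output.BONE_CONDUCTION if comm.is_private else Output.SPEAKER
```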

Performing subtask(s) for a predicted action in response to a separate user interaction with an automated assistant prior to performance of the predicted action

Implementations herein relate to pre-caching data, corresponding to predicted interactions between a user and an automated assistant, using data characterizing previous interactions between the user and the automated assistant. An interaction can be predicted based on details of a current interaction between the user and an automated assistant. One or more predicted interactions can be initialized, and/or any corresponding data pre-cached, prior to the user commanding the automated assistant in furtherance of the predicted interaction. Interaction predictions can be generated using a user-parameterized machine learning model, which can be used when processing input(s) that characterize a recent user interaction with the automated assistant. Should the user command the automated assistant in a way that is aligned with a pre-cached, predicted interaction, the automated assistant will exhibit instant fulfillment of the command, thereby eliminating any latency that the user would have otherwise experienced interacting with the automated assistant.
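
A minimal sketch of the pre-caching idea follows. The machine learning model is replaced by an arbitrary `predict_next` callable, and the class `PrecachingAssistant`, its `handle` method, and the `cache_hits` counter are hypothetical names chosen for this illustration.

```python
from typing import Callable, Dict, Optional


class PrecachingAssistant:
    """Fulfills commands and pre-computes the result of the predicted next one."""

    def __init__(self,
                 predict_next: Callable[[str], Optional[str]],
                 fulfill: Callable[[str], str]) -> None:
        self._predict_next = predict_next  # stand-in for the learned model
        self._fulfill = fulfill            # the slow operation being hidden
        self._cache: Dict[str, str] = {}
        self.cache_hits = 0

    def handle(self, command: str) -> str:
        if command in self._cache:
            # The prediction was right: answer from the cache, no fulfillment call.
            self.cache_hits += 1
            result = self._cache.pop(command)
        else:
            result = self._fulfill(command)
        predicted = self._predict_next(command)
        if predicted is not None and predicted not in self._cache:
            self._cache[predicted] = self._fulfill(predicted)
        return result
```

In this sketch the pre-caching runs synchronously inside `handle`; a real assistant would presumably do that work in the background so the current response is not delayed.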

SYSTEM AND METHOD FOR PROVIDING VOICE ASSISTANCE SERVICE
20210350797 · 2021-11-11 ·

An artificial intelligence (AI) system using a machine learning algorithm such as deep learning, and an application thereof are provided. A method of providing, by a device, a voice assistance service includes obtaining a voice input of a user, receiving certain context information from at least one peripheral device, generating first query information from the received context information and the voice input, generating second query information including noise information by inputting the first query information into a noise learning model, transmitting the generated second query information to a server, receiving, from the server, response information obtained based on the transmitted second query information, generating a response message by removing response information corresponding to the noise information from the received response information, and outputting the response message.
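
The query flow can be illustrated as below. This is a sketch under assumptions: the noise learning model is replaced by a fixed list of decoy items, queries are plain lists of strings, and the server response is assumed to be a mapping from query item to answer. The function names `build_first_query`, `add_noise`, and `strip_noise` are hypothetical.

```python
from typing import Dict, List, Tuple


def build_first_query(voice_text: str, context: Dict[str, str]) -> List[str]:
    """Combine the voice input with context information from peripheral devices."""
    return [voice_text] + [f"{key}:{value}" for key, value in sorted(context.items())]


def add_noise(first_query: List[str], decoys: List[str]) -> Tuple[List[str], List[str]]:
    """Stand-in for the noise learning model: append decoy items to the query."""
    noise = [d for d in decoys if d not in first_query]
    return first_query + noise, noise


def strip_noise(response: Dict[str, str], noise: List[str]) -> Dict[str, str]:
    """Remove the response entries that correspond to the noise items."""
    return {item: answer for item, answer in response.items() if item not in noise}
```

The device keeps the `noise` list locally, so only it can tell which parts of the server's response were answers to decoys.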

KEYWORD DETECTION APPARATUS, KEYWORD DETECTION METHOD, AND PROGRAM

A keyword is extracted robustly despite a voice recognition result including an error. A model storage unit 10 stores a keyword extraction model that accepts word vector representations of a plurality of words as an input and extracts and outputs a word vector representation of a word to be extracted as a keyword. A speech detection unit 11 detects a speech part from a voice signal. A voice recognition unit 12 executes voice recognition on the speech part of the voice signal and outputs a confusion network which is a voice recognition result. A word vector representation generating unit 13 generates a word vector representation including reliability of voice recognition with regard to each candidate word for each confusion set. A keyword extraction unit 14 inputs the word vector representation of the candidate word to the keyword extraction model in descending order of the reliability and obtains the word vector representation of the keyword.
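
The input preparation for the keyword extraction model might look like the sketch below. It assumes that a confusion set is a list of `(word, reliability)` pairs and that the reliability is appended to the word embedding as an extra dimension; both choices, along with the name `candidate_vectors`, are illustrative assumptions. The extraction model itself is omitted.

```python
from typing import Dict, List, Sequence, Tuple

ConfusionSet = List[Tuple[str, float]]  # (candidate word, recognition reliability)


def candidate_vectors(confusion_set: ConfusionSet,
                      embeddings: Dict[str, Sequence[float]]) -> List[Tuple[str, List[float]]]:
    """Build reliability-augmented word vectors, most reliable candidate first."""
    ranked = sorted(confusion_set, key=lambda cand: cand[1], reverse=True)
    return [(word, list(embeddings[word]) + [reliability]) for word, reliability in ranked]
```

The resulting list is already in descending order of reliability, so it can be fed to the model one candidate at a time.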

METHOD FOR RECOGNIZING A SLOT, AND ELECTRONIC DEVICE

A method for recognizing a slot, and an electronic device are provided. The technical solution includes: determining each first word in an input sentence and a part of speech of each first word; combining the first words in the input sentence based on the part of speech of each first word to obtain one or more candidate slot segments included in the input sentence; determining a matching degree between each first word in the one or more candidate slot segments and each second word in each reference slot of a slot library; and determining a target slot in the one or more candidate slot segments and a slot name of the target slot based on the matching degree.
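
The steps of this abstract can be sketched as follows. Several details are assumptions made for illustration: the input is already tagged with parts of speech, adjacent words whose tags fall in `SLOT_POS` are merged into a candidate segment, and the matching degree is a Jaccard overlap between word sets. The abstract does not specify the combination rule or the matching measure, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

SLOT_POS = {"NOUN", "PROPN", "NUM"}  # assumed: tags that may form a slot segment


def candidate_segments(tagged: List[Tuple[str, str]]) -> List[List[str]]:
    """Merge runs of adjacent words whose part of speech is in SLOT_POS."""
    segments: List[List[str]] = []
    current: List[str] = []
    for word, pos in tagged:
        if pos in SLOT_POS:
            current.append(word)
        elif current:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def matching_degree(segment: List[str], reference: List[str]) -> float:
    """Jaccard overlap between segment words and reference slot words (assumed measure)."""
    seg, ref = set(segment), set(reference)
    return len(seg & ref) / len(seg | ref)


def recognize_slot(tagged: List[Tuple[str, str]],
                   slot_library: Dict[str, List[List[str]]]) -> Optional[Tuple[str, str, float]]:
    """Return (target slot text, slot name, matching degree), or None if nothing matches."""
    best: Optional[Tuple[str, str, float]] = None
    for segment in candidate_segments(tagged):
        for name, references in slot_library.items():
            for reference in references:
                score = matching_degree(segment, reference)
                if score > 0 and (best is None or score > best[2]):
                    best = (" ".join(segment), name, score)
    return best
```

For the tagged sentence "book a flight to new york" and a library whose `city` slot contains the reference `["new", "york"]`, the sketch selects "new york" as the target slot with slot name `city`.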