G10L2015/0631

CHARACTERIZING, SELECTING AND ADAPTING AUDIO AND ACOUSTIC TRAINING DATA FOR AUTOMATIC SPEECH RECOGNITION SYSTEMS
20170278527 · 2017-09-28

A system for and method of characterizing a target application acoustic domain analyzes one or more speech data samples from the target application acoustic domain to determine one or more target acoustic characteristics, including a CODEC type and bit-rate associated with the speech data samples. The determined target acoustic characteristics may also include other aspects of the target speech data samples such as sampling frequency, active bandwidth, noise level, reverberation level, clipping level, and speaking rate. The determined target acoustic characteristics are stored in a memory as a target acoustic data profile. The data profile may be used to select and/or modify one or more out of domain speech samples based on the one or more target acoustic characteristics.
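A target acoustic data profile of the kind described above might be represented as follows. This is a minimal sketch, not the patented method: the class name `TargetAcousticProfile`, the helper names, and the definition of clipping level as the fraction of samples at full scale are all assumptions made for illustration, and the CODEC type and bit-rate are taken here as supplied metadata rather than detected from the audio.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TargetAcousticProfile:
    """Hypothetical container for the target acoustic characteristics."""
    codec: str
    bit_rate_kbps: float
    sampling_rate_hz: int
    clipping_level: float  # assumed definition: fraction of samples at full scale


def clipping_level(samples: Sequence[float], full_scale: float = 1.0) -> float:
    """Return the fraction of samples whose magnitude reaches full scale."""
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= full_scale)
    return clipped / len(samples)


def build_profile(samples: Sequence[float], codec: str,
                  bit_rate_kbps: float, sampling_rate_hz: int) -> TargetAcousticProfile:
    """Analyze target-domain samples and store the result as a profile."""
    return TargetAcousticProfile(codec, bit_rate_kbps, sampling_rate_hz,
                                 clipping_level(samples))
```

A profile built this way could then be compared against out-of-domain samples to decide which ones to select or modify.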

Voice wake-up detection from syllable and frequency characteristic

A voice wake-up apparatus used in an electronic device that includes a voice activity detection circuit, a storage circuit and a smart detection circuit is provided. The voice activity detection circuit receives an input sound signal and detects a voice activity section of the input sound signal. The storage circuit stores a predetermined voice sample. The smart detection circuit receives the input sound signal to perform a time domain and a frequency domain detection on the voice activity section to generate a syllable and frequency characteristic detection result, compare the syllable and frequency characteristic detection result with the predetermined voice sample and generate a wake-up signal to a processing circuit of the electronic device when the syllable and frequency characteristic detection result matches the predetermined voice sample to wake up the processing circuit.
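The time-domain part of such a detection can be illustrated with a short sketch. It assumes that a syllable shows up as a contiguous burst of frame energies above a threshold, and that the predetermined voice sample is reduced to an expected syllable count; both are simplifications introduced here, and the frequency-domain check is omitted. The function names are hypothetical.

```python
from typing import Sequence


def count_syllables(frame_energies: Sequence[float], threshold: float) -> int:
    """Count contiguous runs of frames whose energy is at or above threshold."""
    count = 0
    active = False
    for energy in frame_energies:
        if energy >= threshold and not active:
            count += 1       # a new burst starts
            active = True
        elif energy < threshold:
            active = False   # the burst has ended
    return count


def should_wake(frame_energies: Sequence[float], threshold: float,
                expected_syllables: int) -> bool:
    """Signal wake-up when the syllable count matches the stored sample."""
    return count_syllables(frame_energies, threshold) == expected_syllables
```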

Generation of trigger recognition models for robot
11250852 · 2022-02-15

Provided are a trigger recognition model generating method for a robot and a robot to which the method is applied. A trigger recognition model generating method comprises obtaining an input text which expresses a voice trigger, obtaining a first set of voice triggers by voice synthesis from the input text, obtaining a second set of voice triggers by applying a first filter in accordance with an environmental factor to the first set of voice triggers, obtaining a third set of voice triggers by applying a second filter in accordance with a mechanism characteristic of the robot to the second set of voice triggers, and applying the first, second, and third sets of voice triggers to the trigger recognition model as learning data for the voice trigger. By doing this, a trigger recognition model which is capable of recognizing a new trigger is generated.
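The three-stage construction of learning data can be sketched as below. The sketch assumes that each filter is modeled as a finite impulse response applied by direct convolution, and that the first set of synthesized triggers is already available as lists of samples; the names `apply_fir` and `build_training_sets` are hypothetical, and the abstract does not state which filter form is used.

```python
from typing import List, Sequence


def apply_fir(signal: Sequence[float], impulse_response: Sequence[float]) -> List[float]:
    """Filter a signal by direct convolution, truncated to the input length."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def build_training_sets(first_set: List[List[float]],
                        environment_ir: Sequence[float],
                        robot_ir: Sequence[float]) -> List[List[float]]:
    """Return the first, second, and third sets combined as learning data."""
    # Second set: first set passed through the environmental filter.
    second_set = [apply_fir(x, environment_ir) for x in first_set]
    # Third set: second set passed through the robot mechanism filter.
    third_set = [apply_fir(x, robot_ir) for x in second_set]
    return first_set + second_set + third_set
```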

APPARATUS AND METHOD FOR TRAINING A NEURAL NETWORK ACOUSTIC MODEL, AND SPEECH RECOGNITION APPARATUS AND METHOD

According to one embodiment, an apparatus for training a neural network acoustic model includes a calculating unit, a clustering unit, and a sharing unit. The calculating unit calculates, based on training data including a training speech and a labeled phoneme state, scores of phoneme states different from the labeled phoneme state. The clustering unit clusters a phoneme state whose score is larger than a predetermined threshold and the labeled phoneme state. The sharing unit shares probability of the labeled phoneme state by the clustered phoneme states. The training unit trains the neural network acoustic model based on the training speech and the clustered phoneme states.
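The clustering and sharing steps can be illustrated with a small sketch. It assumes that the labeled state's probability of 1 is split equally among the clustered states; the abstract does not say how the probability is divided, so the equal split and the function name `soft_targets` are assumptions for illustration only.

```python
from typing import Dict


def soft_targets(scores: Dict[str, float], labeled_state: str,
                 threshold: float) -> Dict[str, float]:
    """Cluster the labeled state with high-scoring states and share its probability.

    `scores` maps phoneme states other than the labeled one to their scores.
    """
    # Clustering: the labeled state plus every other state scoring above threshold.
    cluster = [labeled_state] + [
        state for state, score in scores.items()
        if state != labeled_state and score > threshold
    ]
    # Sharing (assumed rule): divide the probability mass equally.
    share = 1.0 / len(cluster)
    return {state: share for state in cluster}
```

The resulting distribution would replace the one-hot label as the training target for that frame.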

Structured conversation enhancement

Structured conversation enhancement can include determining an anticipated ebb point of a current conversation. The determination can be made in response to a predetermined triggering event indicating a start of the current conversation. Structured conversation enhancement also can include monitoring the current conversation using pattern recognition. A probable change in the anticipated ebb point can be determined in response to recognizing a predetermined word pattern indicating a change in the conversation. A response action can be initiated in response to the probable change in the anticipated ebb point.

ENTITY LEVEL DATA AUGMENTATION IN CHATBOTS FOR ROBUST NAMED ENTITY RECOGNITION

Techniques for data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes generating a list of values to cover for an entity, selecting utterances from a set of data that have context for the entity, converting the utterances into templates, where each template of the templates comprises a slot that maps to the list of values for the entity, selecting a template from the templates, selecting a value from the list of values based on the mapping between the slot within the selected template and the list of values for the entity; and creating an artificial utterance based on the selected template and the selected value, where the creating the artificial utterance comprises inserting the selected value into the slot of the selected template that maps to the list of values for the entity.
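The template-and-slot procedure can be sketched in a few lines. The slot syntax `{city}`, the function names, and the use of a seeded random generator for the selection steps are assumptions chosen for illustration; the abstract does not specify how templates or values are selected.

```python
import random
from typing import List


def to_template(utterance: str, entity_value: str, slot: str) -> str:
    """Convert an utterance into a template by replacing the entity with a slot."""
    return utterance.replace(entity_value, "{" + slot + "}")


def augment(templates: List[str], values: List[str], slot: str,
            rng: random.Random) -> str:
    """Create one artificial utterance from a selected template and value."""
    template = rng.choice(templates)   # select a template
    value = rng.choice(values)         # select a value from the entity's list
    return template.replace("{" + slot + "}", value)


# Usage: build a template from one utterance, then fill it with a new value.
template = to_template("book a flight to Paris", "Paris", "city")
artificial = augment([template], ["Oslo"], "city", random.Random(0))
```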

Method and device for voice information acquisition
11195515 · 2021-12-07

The present application provides a method and device for voice acquisition to reduce the effect of individual differences by quantitatively inputting voice indicators, the method comprising: displaying a first prompt word and starting to receive a first input voice of a user; after the first input voice of the user is received, recognizing the received first input voice to be a first user word; comparing the first user word with the first prompt word; if the first user word is matched with the first prompt word, then displaying a second prompt word and starting to receive a second input voice of the user; after the second input voice of the user is received, recognizing the received second input voice to be a second user word; comparing the second user word with the second prompt word; and integrating the first input voice and the second input voice to be a digital voice file, and storing the digital voice file. The method can accurately, completely and conveniently acquire user sound, thus facilitating subsequent analysis and recognition.
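The prompt, recognize, compare, and integrate sequence can be sketched as a loop. The callables `record` and `recognize` are hypothetical stand-ins for the display/capture and speech recognition steps, and stopping with `None` on a mismatch is an assumption, since the abstract does not say what happens when a word fails to match.

```python
from typing import Callable, Optional, Sequence


def acquire_voice(prompt_words: Sequence[str],
                  record: Callable[[str], bytes],
                  recognize: Callable[[bytes], str]) -> Optional[bytes]:
    """Prompt for each word in turn and integrate the matched recordings."""
    clips = []
    for prompt in prompt_words:
        clip = record(prompt)            # display the prompt and capture audio
        if recognize(clip) != prompt:    # compare the user word with the prompt word
            return None                  # assumed behavior on mismatch
        clips.append(clip)
    return b"".join(clips)               # integrate into one digital voice payload
```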

Artificial intelligence communications agent
11366857 · 2022-06-21

Systems and methods for artificial intelligence communications agents are disclosed. Implementations relate to capturing individual agents' behaviors and modelling them in artificial intelligence (AI) learning models so that the agents' behavior can be easily replicated. Some implementations further relate to systems and methods for capturing human-computer interactions (HCI) performed by agents and using robotic process automation (RPA) to automate tasks that would otherwise require human interaction. The combination of AI learning models and RPA is used to provide artificial intelligence communications agents capable of responding to a variety of topics of conversation over a variety of communication mediums.

Pattern-based statement attribution

A system, method, and computer program product for determining statement attributions. The system includes at least one processing component, at least one memory component, a feature extractor, a model generator, a model database, and an attribution selector. The method includes receiving a statement, generating at least one pattern that defines a grammatical feature of the statement, and generating a statement model from the at least one pattern. The method also includes determining a similarity value for the statement model and at least one reference model.
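The similarity determination can be illustrated under stated assumptions. The sketch treats a statement model as a set of pattern strings (for example, part-of-speech sequences produced by some upstream feature extractor) and uses the Jaccard index as the similarity value. Neither choice comes from the abstract, which does not name a similarity measure, and the function names are hypothetical.

```python
from typing import Dict, Set


def similarity(statement_model: Set[str], reference_model: Set[str]) -> float:
    """Jaccard index between two pattern sets (an assumed similarity measure)."""
    union = statement_model | reference_model
    if not union:
        return 0.0
    return len(statement_model & reference_model) / len(union)


def select_attribution(statement_model: Set[str],
                       reference_models: Dict[str, Set[str]]) -> str:
    """Return the name of the reference model most similar to the statement."""
    return max(reference_models,
               key=lambda name: similarity(statement_model, reference_models[name]))
```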

PROVIDING PRE-COMPUTED HOTWORD MODELS
20230274742 · 2023-08-31

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, for each of multiple words or sub-words, audio data corresponding to multiple users speaking the word or sub-word; training, for each of the multiple words or sub-words, a pre-computed hotword model for the word or sub-word based on the audio data for the word or sub-word; receiving a candidate hotword from a computing device; identifying one or more pre-computed hotword models that correspond to the candidate hotword; and providing the identified, pre-computed hotword models to the computing device.
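The identification step can be sketched as a lookup. The sketch assumes that the pre-computed hotword models are held in a dictionary keyed by lower-case word and that a candidate hotword is split on whitespace; the model values are placeholder strings, and the function name is hypothetical.

```python
from typing import Dict, List


def identify_models(candidate_hotword: str,
                    precomputed: Dict[str, str]) -> List[str]:
    """Return the pre-computed models that correspond to the candidate hotword."""
    words = candidate_hotword.lower().split()
    # Keep only the words for which a model was trained in advance.
    return [precomputed[w] for w in words if w in precomputed]


# Usage: models trained ahead of time, then looked up for a received candidate.
precomputed_models = {"ok": "model-ok", "computer": "model-computer"}
matched = identify_models("OK Computer", precomputed_models)
```

Because the models are trained before any candidate arrives, the lookup itself needs no audio from the requesting device.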