G10L15/142

METHOD AND SYSTEM FOR TRANSCRIPTION OF A LEXICAL UNIT FROM A FIRST ALPHABET INTO A SECOND ALPHABET
20170357634 · 2017-12-14

A server and a method for transcription of a lexical unit from a first alphabet into a second alphabet, the method comprising: acquiring a pair of (i) the lexical unit written in the first alphabet and (ii) the corresponding transcription of the lexical unit written in the second alphabet, both having been divided into respective segments, such that within the pair every segment of the lexical unit has a corresponding segment in the transcription of the lexical unit, and such that each lexical unit comprises either a sequence of sequentially alternating consonant and vowel segments, a single vowel segment, or a single consonant segment; defining, for each given segment of the lexical unit, its context; and training the server to calculate a theoretical frequency of at least one second-alphabet character representing the transcription of a particular given segment, based on the context of that segment within the lexical unit.
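A minimal sketch of the training step, assuming the simplest possible estimator: the "theoretical frequency" is approximated here by the relative count of each second-alphabet segment observed for a first-alphabet segment in a given context, where the context is one neighbouring segment on each side. The context width, the `#` word-boundary padding, the toy data, and all function names are illustrative assumptions, not the patented model.

```python
from collections import Counter, defaultdict

def segment_contexts(segments, width=1):
    """Return a (left, segment, right) triple per segment; '#' pads the word edges."""
    padded = ["#"] * width + list(segments) + ["#"] * width
    triples = []
    for i in range(width, width + len(segments)):
        left = tuple(padded[i - width:i])
        right = tuple(padded[i + 1:i + 1 + width])
        triples.append((left, padded[i], right))
    return triples

def train(pairs, width=1):
    """Count second-alphabet segments observed for each first-alphabet segment in context."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        # The pair is assumed to be segmented one-to-one, as the abstract requires.
        assert len(source) == len(target)
        for context, tgt in zip(segment_contexts(source, width), target):
            counts[context][tgt] += 1
    return counts

def frequency(counts, context, target):
    """Relative frequency of `target` given `context` (0.0 for an unseen context)."""
    seen = counts.get(context)
    if not seen:
        return 0.0
    return seen[target] / sum(seen.values())

# Toy Cyrillic-to-Latin pairs; the second spelling is a hypothetical variant.
PAIRS = [
    (["м", "и", "р"], ["m", "i", "r"]),
    (["м", "и", "р"], ["m", "ee", "r"]),
]
counts = train(PAIRS)
```

With this toy data the middle segment has two observed transcriptions in the same context, so each receives a frequency of 0.5, while the first segment is always transcribed the same way.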

Artificial intelligence apparatus for recognizing speech including multiple languages, and method for the same
11682388 · 2023-06-20

An AI apparatus includes a microphone to acquire speech data including multiple languages, and a processor to acquire text data corresponding to the speech data, determine a main language from the languages included in the text data, acquire translated text data obtained by translating a portion of the text data that is in a language other than the main language into the main language, acquire a morpheme analysis result for the translated text data, extract a keyword for intention analysis from the morpheme analysis result, acquire an intention pattern matched to the keyword, and perform an operation corresponding to the intention pattern.
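A minimal sketch of the main-language step, assuming the main language is simply the language of the majority of whitespace-separated tokens, and guessing each token's language from the Unicode script of its letters via `unicodedata.name`. The `SCRIPT_TO_LANG` table and all function names are hypothetical; a production system would use a trained language identifier, and the translation, morpheme-analysis, and intent-matching steps are not shown.

```python
import unicodedata
from collections import Counter

# Hypothetical script-to-language table; a real system would use a language identifier.
SCRIPT_TO_LANG = {"HANGUL": "ko", "LATIN": "en"}

def token_language(token):
    """Guess a token's language from the majority Unicode script of its letters."""
    scripts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0] for ch in token if ch.isalpha()
    )
    if not scripts:
        return None
    script, _ = scripts.most_common(1)[0]
    return SCRIPT_TO_LANG.get(script)

def main_language(text):
    """Main language = the language of the most tokens (an assumed criterion)."""
    counts = Counter(lang for lang in map(token_language, text.split()) if lang)
    return counts.most_common(1)[0][0] if counts else None

def foreign_spans(text, main):
    """Tokens in a known language other than the main one; these would be translated."""
    return [t for t in text.split() if token_language(t) not in (main, None)]

utterance = "오늘 날씨 play music 틀어줘"
print(main_language(utterance), foreign_spans(utterance, "ko"))
```

Here three Hangul tokens outvote two Latin ones, so Korean is chosen as the main language and `play` and `music` are the spans that would be passed to translation.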

Statistical voice dialog system and method

A method for processing a voice command using a statistical dialog model determines a belief state as a probability distribution over states organized in a hierarchy with a parent-child relationship of nodes representing the states. The belief state includes the hierarchy of state variables defining probabilities of each state to correspond to the voice command and a probability of a state of a child node in the hierarchy is conditioned on a probability of a state of a corresponding parent node. A system action is selected based on the belief state.
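A toy two-level instance of the hierarchical belief state, assuming one parent variable (the command) and one child variable (the contact) whose distribution is conditioned on the parent's value, so the belief in each joint state is P(parent) * P(child | parent). The variable names, probabilities, and the threshold-based `select_action` policy are illustrative assumptions; the abstract does not specify how the system action is chosen.

```python
# Parent variable: which command the user intends (illustrative probabilities).
P_COMMAND = {"call": 0.7, "text": 0.3}
# Child variable conditioned on the parent: P(contact | command).
P_CONTACT_GIVEN_COMMAND = {
    "call": {"alice": 0.8, "bob": 0.2},
    "text": {"alice": 0.1, "bob": 0.9},
}

def joint_belief(parent, child_given_parent):
    """Belief over (parent, child) states: P(parent) * P(child | parent)."""
    return {
        (p, c): pp * pc
        for p, pp in parent.items()
        for c, pc in child_given_parent[p].items()
    }

def child_marginal(parent, child_given_parent):
    """Marginal belief of the child node, summing over parent states."""
    marginal = {}
    for (_, c), prob in joint_belief(parent, child_given_parent).items():
        marginal[c] = marginal.get(c, 0.0) + prob
    return marginal

def select_action(belief, threshold=0.5):
    """Hypothetical policy: act on the top state if confident enough, else confirm."""
    state, prob = max(belief.items(), key=lambda kv: kv[1])
    return ("execute", state) if prob >= threshold else ("confirm", state)

belief = joint_belief(P_COMMAND, P_CONTACT_GIVEN_COMMAND)
print(select_action(belief))
```

Conditioning the child on the parent lets "bob" dominate under "text" even though "alice" dominates overall, which a flat distribution over independent variables would not capture.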

CONTEXTUAL AWARENESS IN DYNAMIC DEVICE GROUPS

Systems and methods for contextual awareness in dynamic device groups are disclosed. For example, a dynamic device group may be generated while output of content is occurring. When a user provides user input to alter the output of the content, contextual data indicating the devices in the dynamic device group when the user input is received may be generated and utilized by an application to determine which devices are to receive a command to perform an action responsive to the user input.
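One way to picture the contextual data is as a snapshot of group membership taken at the instant the user input arrives. The sketch below assumes membership changes are logged with timestamps in non-decreasing order; the class, the device names, and the `targets_for_input` helper are hypothetical.

```python
import bisect

class DeviceGroupHistory:
    """Timestamped membership log for one dynamic device group (hypothetical model)."""

    def __init__(self):
        self.times = []
        self.members = []

    def record(self, t, devices):
        # Assumes changes are recorded in non-decreasing time order.
        self.times.append(t)
        self.members.append(frozenset(devices))

    def snapshot(self, t):
        """Membership in effect at time t (empty before the first record)."""
        i = bisect.bisect_right(self.times, t) - 1
        return self.members[i] if i >= 0 else frozenset()

def targets_for_input(history, input_time):
    """Contextual data for a user input: the devices grouped when it was received."""
    return sorted(history.snapshot(input_time))

history = DeviceGroupHistory()
history.record(0, {"kitchen"})
history.record(10, {"kitchen", "living_room"})
history.record(20, {"living_room"})
print(targets_for_input(history, 15))
```

Resolving targets from the snapshot at input time, rather than from the current membership, means a command such as "pause" reaches the devices that were actually playing when the user spoke.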

APPARATUS AND METHOD FOR GENERATING VISUAL CONTENT FROM AN AUDIO SIGNAL
20170337913 · 2017-11-23

An apparatus and method for generating visual content from an audio signal are described. The method includes receiving (310) audio content; processing (320) the audio content to separate it into a first portion and a second portion; converting (330) the second portion into visual content; delaying (340) the first portion based on a time relationship between the audio content and the visual content, the delay accounting for the time taken to process the first portion and convert the second portion; and providing (350) the visual content and audio content for reproduction. The apparatus includes a source separation module (210) that processes the received audio content to separate it into a first portion and a second portion, a converter module (220) that converts the second portion into visual content, and a synchronization module (230) that delays the first portion based on a time relationship between the audio content and the visual content.
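A sketch of the synchronization arithmetic, assuming the latency of each path is known in milliseconds: the first (audio) portion is held back by the amount by which the separation-plus-conversion path exceeds the audio path, implemented as a fixed-length sample delay line. The latency figures, sample rate, and names are illustrative assumptions.

```python
from collections import deque

def audio_delay_samples(audio_path_ms, visual_path_ms, sample_rate_hz):
    """Samples by which to hold back the first (audio) portion so it stays in sync."""
    extra_ms = max(0.0, visual_path_ms - audio_path_ms)
    return round(extra_ms * sample_rate_hz / 1000)

class DelayLine:
    """Fixed-length delay: each sample re-emerges `delay` pushes later (zeros at start)."""

    def __init__(self, delay):
        self.buf = deque([0.0] * delay)

    def push(self, sample):
        self.buf.append(sample)
        return self.buf.popleft()

# Illustrative latencies: 5 ms for the audio path, 25 ms for separation plus conversion.
n = audio_delay_samples(5.0, 25.0, 48_000)  # 20 ms at 48 kHz
line = DelayLine(n)
```

If the audio path is already the slower one, the computed delay is clamped to zero rather than going negative.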

Estimating Clean Speech Features Using Manifold Modeling
20170316790 · 2017-11-02

The technology described in this document can be embodied in a computer-implemented method that includes receiving, at one or more processing devices, a portion of an input signal representing noisy speech, and extracting, from the portion of the input signal, one or more frequency domain features of the noisy speech. The method also includes generating a set of projected features by projecting each of the one or more frequency domain features on a manifold that represents a model of frequency domain features for clean speech. The method further includes using the set of projected features for at least one of: a) generating synthesized speech that represents a noise-reduced version of the noisy speech, b) performing speaker recognition, or c) performing speech recognition.
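The abstract does not say how the clean-speech manifold is built. The simplest stand-in is a linear one: an affine subspace spanned by an orthonormal basis derived from clean-speech feature vectors, onto which each noisy feature vector is orthogonally projected. This is a deliberately simplified sketch (real manifold models are typically nonlinear), and the names and toy 3-dimensional features are assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthonormal_basis(vectors, eps=1e-12):
    """Gram-Schmidt over clean-speech feature vectors (assumed already mean-centred)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis

def project(feature, mean, basis):
    """Orthogonal projection of a noisy feature onto the affine subspace mean + span(basis)."""
    centered = [f - m for f, m in zip(feature, mean)]
    projected = list(mean)
    for b in basis:
        coeff = dot(centered, b)
        projected = [p + coeff * bi for p, bi in zip(projected, b)]
    return projected

# Toy 3-D features: the clean model spans the first two dimensions only.
clean_mean = [0.0, 0.0, 0.0]
basis = orthonormal_basis([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
noisy = [3.0, 4.0, 5.0]
print(project(noisy, clean_mean, basis))
```

The component of the noisy vector that lies outside the clean model (the third dimension here) is discarded; the projected features would then feed synthesis, speaker recognition, or speech recognition.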

Apparatus and method for large vocabulary continuous speech recognition

Provided is an apparatus for large vocabulary continuous speech recognition (LVCSR) based on a context-dependent deep neural network hidden Markov model (CD-DNN-HMM) algorithm. The apparatus may include an extractor configured to extract acoustic model-state level information corresponding to an input speech signal from a training data model set using at least one of a first feature vector based on a gammatone filterbank signal analysis algorithm and a second feature vector based on a bottleneck algorithm, and a speech recognizer configured to provide a result of recognizing the input speech signal based on the extracted acoustic model-state level information.

METHODS AND SYSTEMS FOR IDENTIFYING KEYWORDS IN SPEECH SIGNAL
20170301341 · 2017-10-19

The disclosed embodiments relate to a method of keyword recognition in a speech signal. The method includes determining a first likelihood score and a second likelihood score of one or more features of a frame of the speech signal being associated with one or more states in a first model and one or more states in a second model, respectively. The one or more states in the first model correspond to one or more tied triphone states, and the one or more states in the second model correspond to one or more monophone states of a keyword to be recognized in the speech signal. The method further includes determining a third likelihood score based on the first likelihood score and the second likelihood score. The first likelihood score and the third likelihood score can be used to determine the presence of the keyword in the speech signal.
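The abstract leaves open how the third score is formed from the first two. A common choice in keyword spotting, assumed here, is a log-likelihood ratio: per frame, the keyword (monophone) log-likelihood minus the background (tied-triphone) log-likelihood, averaged over the frames and compared with a threshold. This sketch decides on the ratio alone, although the abstract indicates the first score is also used; the names, threshold, and scores are hypothetical.

```python
def third_scores(first, second):
    """Per-frame log-likelihood ratio: keyword (monophone) minus background (tied-triphone)."""
    return [s2 - s1 for s1, s2 in zip(first, second)]

def keyword_present(first, second, threshold=0.0):
    """Declare the keyword present if the mean ratio over the frames exceeds a threshold."""
    third = third_scores(first, second)
    return sum(third) / len(third) > threshold

# Hypothetical per-frame log-likelihoods for a three-frame segment.
first = [-5.0, -4.0, -6.0]    # tied-triphone (background) model
second = [-3.0, -3.5, -4.0]   # keyword monophone model
print(third_scores(first, second), keyword_present(first, second))
```

Normalizing by the background model makes the score insensitive to frames that are simply loud or clean, since those raise both likelihoods together.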

Decoder for searching a digraph and generating a lattice, decoding method, and computer program product
09786272 · 2017-10-10

According to an embodiment, a decoder includes a token operating unit, a node adder, and a connection detector. The token operating unit is configured to, every time a signal or a feature is input, propagate each of a plurality of tokens, each token being an object assigned a state of a path being searched, according to a digraph until a state or a transition assigned a non-empty input symbol is reached. The node adder is configured to, in each instance of token propagation, add to a lattice a node corresponding to the state assigned to each of the plurality of tokens. The connection detector is configured to refer to the digraph and detect a node that is connected to a node added to the lattice in the i-th instance and that was itself added to the lattice in the (i+1)-th instance.
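A simplified, unweighted sketch of the decoder loop, assuming the digraph is given as a list of arcs `(src, dst, symbol)` with `None` marking an empty input symbol, and that each input is a discrete symbol rather than a scored feature vector. Each call to `step` moves tokens across matching arcs and then through empty-input arcs, records the reached states as that instance's lattice nodes, and records which nodes of instance i connect to nodes of instance i+1. Scores and pruning are omitted, and all names are hypothetical.

```python
from collections import defaultdict

class LatticeDecoder:
    """Unweighted token passing over a digraph, building a lattice one input at a time."""

    def __init__(self, arcs, start):
        # arcs: (src, dst, symbol); symbol None marks an empty input symbol.
        self.out = defaultdict(list)
        for src, dst, sym in arcs:
            self.out[src].append((dst, sym))
        self.tokens = self._follow_empty({start})
        self.lattice = [sorted(self.tokens)]  # lattice[i]: nodes added in instance i
        self.links = set()                    # (i, node in i, node in i+1)

    def _follow_empty(self, states):
        """Propagate through empty-input arcs until only non-empty arcs remain ahead."""
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for dst, sym in self.out[s]:
                if sym is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    def step(self, symbol):
        """Consume one input symbol: move tokens, add lattice nodes, record connections."""
        i = len(self.lattice) - 1
        reached = set()
        for s in self.tokens:
            for dst, sym in self.out[s]:
                if sym == symbol:
                    for d in self._follow_empty({dst}):
                        reached.add(d)
                        self.links.add((i, s, d))
        self.tokens = reached
        self.lattice.append(sorted(reached))

arcs = [(0, 1, "a"), (1, 2, None), (2, 3, "b"), (0, 3, "b")]
dec = LatticeDecoder(arcs, start=0)
for sym in ["a", "b"]:
    dec.step(sym)
print(dec.lattice, sorted(dec.links))
```

Recording the links while tokens move avoids a separate search when the lattice is later traversed; a weighted decoder would additionally carry a score on each token and prune low-scoring ones.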

FINE-GRAINED NATURAL LANGUAGE UNDERSTANDING

A system capable of performing natural language understanding (NLU) without the concept of a domain that influences NLU results. The present system uses hierarchical organizations of intents/commands and entity types, and trained models associated with those hierarchies, so that commands and entity types may be determined for incoming text queries without necessarily determining a domain for the incoming text. The system thus operates in a domain-agnostic manner, departing from multi-domain NLU architectures in which a system determines NLU results for multiple domains simultaneously and then ranks them to select one as the result.
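A toy illustration of domain-agnostic classification over an intent hierarchy, assuming a greedy descent: starting at the root, the best-scoring child is chosen at each level until a leaf command is reached, and no domain label is ever produced. The hierarchy, the keyword-overlap scorer standing in for trained models, and the greedy strategy are all hypothetical; entity-type hierarchies are omitted.

```python
# Hypothetical intent hierarchy: parent -> children. Leaves are fine-grained commands.
HIERARCHY = {
    "Command": ["PlayCommand", "GetInfoCommand"],
    "PlayCommand": ["PlayMusic", "PlayVideo"],
    "GetInfoCommand": ["GetWeather", "GetTime"],
}
# Toy per-node keyword sets standing in for trained models.
KEYWORDS = {
    "PlayCommand": {"play", "put", "start"},
    "GetInfoCommand": {"what", "tell", "how"},
    "PlayMusic": {"song", "music", "album"},
    "PlayVideo": {"movie", "video", "episode"},
    "GetWeather": {"weather", "rain", "temperature"},
    "GetTime": {"time", "clock"},
}

def score(node, words):
    """Toy model score: number of the node's keywords present in the query."""
    return len(KEYWORDS.get(node, set()) & words)

def classify(query, root="Command"):
    """Greedily descend the hierarchy; returns the path from root to a leaf command."""
    words = set(query.lower().split())
    path = [root]
    while path[-1] in HIERARCHY:
        children = HIERARCHY[path[-1]]
        path.append(max(children, key=lambda c: score(c, words)))
    return path

print(classify("play the new album"))
```

Because every query is routed through one shared hierarchy, there is no per-domain result set to rank afterwards; the trade-off of a greedy descent is that an early wrong branch cannot be recovered.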