Patent classifications
G10L19/00
COMPRESSING AUDIO WAVEFORMS USING NEURAL NETWORKS AND VECTOR QUANTIZERS
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for compressing audio waveforms. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps; processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform; generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that together define a quantized representation of the feature vector; and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.
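The abstract leaves open how the multiple quantizers combine their code vectors; one common instantiation of this scheme is residual vector quantization, in which each quantizer codes the residual left by the previous one and the quantized representation is the sum of the selected code vectors. The sketch below assumes that reading; the codebook sizes and dimensions are illustrative.

```python
import numpy as np

def residual_vq_encode(feature, codebooks):
    """Quantize one feature vector with a cascade of codebooks.

    Returns the code indices (the coded representation) and the
    quantized vector (the sum of the selected code vectors).
    """
    indices = []
    residual = feature.astype(float).copy()
    quantized = np.zeros_like(residual)
    for codebook in codebooks:  # one codebook per vector quantizer
        # pick the code vector nearest to the current residual
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))
        indices.append(idx)
        quantized += codebook[idx]
        residual -= codebook[idx]
    return indices, quantized

# toy example: 2 quantizers, 4 code vectors each, feature dimension 3
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(4, 3)) for _ in range(2)]
feature = rng.normal(size=3)
indices, quantized = residual_vq_encode(feature, codebooks)
```

The list of indices (here two small integers per feature vector) is what would then be entropy-coded into the compressed representation.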
Portable exercise-related data apparatus
A portable apparatus includes exercise-measurement circuitry that measures exercise-related measurement data related to a user carrying out an exercise, communication circuitry configured to provide the portable apparatus with wireless communication capability, and processing circuitry configured to perform operations. The operations include receiving the exercise-related measurement data from the exercise-measurement circuitry; receiving configuration data from an external user interface apparatus over a bidirectional wireless communication connection established through the communication circuitry and capable of transferring payload data in both directions; processing the exercise-related measurement data according to exercise-related parameters in the received configuration data in order to obtain advanced exercise-related data; and communicating the advanced exercise-related data to the user interface apparatus over the bidirectional wireless communication connection.
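The abstract does not say what the exercise-related parameters or the "advanced" data are; as one hypothetical instantiation, the configuration data could carry heart-rate zone limits and the processing step could summarize raw samples against them. All field names below are illustrative, not from the patent.

```python
def process_measurements(samples, config):
    """Derive advanced exercise data from raw samples and received config."""
    low, high = config["zone_low"], config["zone_high"]
    in_zone = sum(1 for s in samples if low <= s <= high)
    return {
        "avg_hr": sum(samples) / len(samples),
        "seconds_in_zone": in_zone,  # assuming one sample per second
    }

config = {"zone_low": 120, "zone_high": 150}   # from the UI apparatus
samples = [110, 125, 140, 155, 130]            # from measurement circuitry
advanced = process_measurements(samples, config)
```

The resulting dictionary stands in for the advanced exercise-related data that would be sent back over the bidirectional connection.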
Cognitive analysis for speech recognition using multi-language vector representations
A method, system, and computer program product for speech recognition using multiple languages includes receiving, by one or more processors, an input from a user, the input including a sentence in a first language. The one or more processors translate the sentence into a plurality of languages different from the first language and create vectors associated with the plurality of languages, each vector including a representation of the sentence in one of the plurality of languages. The one or more processors calculate eigenvectors for each vector associated with a language in the plurality of languages, and, based on the calculated eigenvectors, a score is assigned to each of the plurality of languages according to its relevance for determining a meaning of the sentence.
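The abstract does not specify which matrix the eigendecomposition is taken over or how scores are derived from it. As one plausible reading, each language's sentence representation is a matrix of per-token embeddings, the eigendecomposition is of its covariance, and the relevance score is the share of variance captured by the leading eigenvector. This is an assumption for illustration, not the patent's exact formula, and the embeddings below are random stand-ins.

```python
import numpy as np

def score_language(token_embeddings):
    """Score one language's representation via its covariance spectrum."""
    cov = np.cov(token_embeddings, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric -> real, ascending
    return float(eigvals[-1] / eigvals.sum())

rng = np.random.default_rng(1)
# hypothetical embeddings for the same sentence in three languages
sentences = {lang: rng.normal(size=(6, 4)) for lang in ("en", "fr", "de")}
scores = {lang: score_language(emb) for lang, emb in sentences.items()}
best = max(scores, key=scores.get)  # language judged most relevant
```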
METHODS AND SYSTEMS FOR STREAMABLE MULTIMODAL LANGUAGE UNDERSTANDING
The present disclosure describes methods and systems for generating semantic predictions from an input speech signal representing a speaker's speech and mapping the semantic predictions to a command action that represents the speaker's intent. A streamable multimodal language understanding (MLU) system includes a machine learning-based model, such as an RNN model, that is trained to convert speech chunks and corresponding text predictions of the input speech signal into semantic predictions that represent a speaker's intent. A semantic prediction is generated and updated over a series of time steps. In each time step, a new speech chunk and corresponding text prediction of the input speech signal are obtained, encoded, and fused to generate an audio-textual representation. A semantic prediction is generated by a sequence classifier processing the audio-textual representation, and the semantic prediction is updated as new speech chunks and corresponding text predictions are obtained. Extracted semantic information contained within a sequence of semantic predictions representing a speaker's speech is acted upon through a command action performed by another computing device or computer application.
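The per-time-step loop described above can be sketched as follows. The encoders and classifier here are toy numeric stand-ins for the trained RNN components, the fusion is simple concatenation, and the intent labels and untrained weights are illustrative assumptions; only the control flow (obtain chunk and text, encode, fuse, update the semantic prediction) reflects the abstract.

```python
import numpy as np

def encode_audio(chunk):
    return np.array([np.mean(chunk), np.std(chunk)])

def encode_text(text):
    return np.array([len(text), text.count(" ") + 1])  # chars, words

def classify(fused, weights):
    logits = weights @ fused
    return int(np.argmax(logits))  # index of the predicted intent

INTENTS = ["play_music", "set_timer"]
rng = np.random.default_rng(0)
weights = rng.normal(size=(len(INTENTS), 4))  # untrained toy weights

prediction = None
stream = [(rng.normal(size=160), "set"),
          (rng.normal(size=160), "set a timer")]
for chunk, text in stream:
    # fuse the audio and text encodings into an audio-textual representation
    fused = np.concatenate([encode_audio(chunk), encode_text(text)])
    prediction = INTENTS[classify(fused, weights)]  # updated each time step
```

The final `prediction` is what would be mapped to a command action for another device or application.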
System and method of performing secured transactions in a communication network
A system and a method of data communication between a first computing device, associated with a first user, and at least one second computing device associated with a second user may include: receiving, by the first computing device, one or more data elements pertaining to details of a transaction request from the second computing device, via a voice channel; extracting said transaction request details by the first computing device; transmitting, by the first computing device, one or more authentication data elements of an electronic wallet module, comprised in the first computing device, to the second computing device, via the voice channel; and carrying out the requested transaction by the first computing device, based on the extracted transaction request details and the electronic wallet authentication data.
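The exchange can be sketched as three steps on the first computing device: extract the transaction details received over the voice channel, send back the wallet's authentication data elements, and carry out the transaction. The payload encoding, field names, and wallet structure below are hypothetical, chosen only to make the flow concrete.

```python
def extract_request(voice_payload):
    """Parse transaction details received over the voice channel."""
    fields = dict(item.split("=") for item in voice_payload.split(";"))
    return {"amount": float(fields["amount"]), "payee": fields["payee"]}

def wallet_auth_elements(wallet):
    """Authentication data elements of the electronic wallet module."""
    return {"wallet_id": wallet["id"], "token": wallet["token"]}

def carry_out_transaction(request, auth):
    return {"status": "ok", "amount": request["amount"],
            "payee": request["payee"], "wallet_id": auth["wallet_id"]}

wallet = {"id": "W-1", "token": "t0k3n"}
payload = "amount=25.0;payee=shop-42"       # received via the voice channel
request = extract_request(payload)
auth = wallet_auth_elements(wallet)         # transmitted via the voice channel
receipt = carry_out_transaction(request, auth)
```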
SPEECH SYNTHESIS METHOD AND SYSTEM
Disclosed is a speech synthesis method including: acquiring fundamental frequency information and acoustic feature information from original speech; generating an impulse train from the fundamental frequency information and inputting it to a harmonic time-varying filter; inputting the acoustic feature information into a neural network filter estimator to obtain corresponding impulse response information; generating a noise signal by a noise generator; determining, by the harmonic time-varying filter, harmonic component information through filtering processing on the impulse train and the impulse response information; determining, by a noise time-varying filter, noise component information based on the impulse response information and the noise signal; and generating synthesized speech from the harmonic component information and the noise component information. Acoustic features are processed to obtain corresponding impulse response information, and the harmonic component information and noise component information are modeled separately, thereby reducing the computation of speech synthesis and improving the quality of the synthesized speech.
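The harmonic-plus-noise structure above can be sketched numerically: an impulse train at the pitch period and a noise signal are each filtered by an impulse response and summed. In this sketch the filters are time-invariant stand-ins for the time-varying filters described, and the exponential impulse response is a fixed placeholder for what the neural network filter estimator would produce from the acoustic features.

```python
import numpy as np

def impulse_train(f0, sr, n):
    """Unit impulses spaced at the pitch period implied by f0."""
    period = int(sr / f0)
    train = np.zeros(n)
    train[::period] = 1.0
    return train

sr, n = 16000, 1600
pulses = impulse_train(200.0, sr, n)       # from fundamental frequency info
h = np.exp(-np.arange(64) / 8.0)           # stand-in impulse response
rng = np.random.default_rng(0)
noise = rng.normal(scale=0.01, size=n)     # noise generator output

harmonic = np.convolve(pulses, h)[:n]      # harmonic component information
noise_part = np.convolve(noise, h)[:n]     # noise component information
speech = harmonic + noise_part             # synthesized waveform
```

At 200 Hz and a 16 kHz sample rate the pitch period is 80 samples, so the 1600-sample frame carries 20 pulses; sharing the impulse response between the two branches mirrors how both filters consume the estimator's output.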