G10L15/063

Systems and methods for machine-generated avatars
11551705 · 2023-01-10

Systems and methods are disclosed for creating a machine-generated avatar. A machine-generated avatar is an avatar generated by processing video and audio information extracted from a recording of a human speaking a reading corpus, enabling the created avatar to say an unlimited number of utterances, i.e., utterances that were not recorded. The video and audio processing uses machine learning algorithms that may create predictive models based on pixel, semantic, phonetic, intonation, and wavelet features.

METHOD FOR OUTPUTTING BLEND SHAPE VALUE, STORAGE MEDIUM, AND ELECTRONIC DEVICE

A method for outputting a blend shape value includes: performing feature extraction on obtained target audio data to obtain a target audio feature vector; inputting the target audio feature vector and a target identifier into an audio-driven animation model; inputting the target audio feature vector into an audio encoding layer, determining an input feature vector of a next layer at a (2t−n)/2 time point based on an input feature vector of a previous layer between a t time point and a t-n time point, determining a feature vector having a causal relationship with the input feature vector of the previous layer as a valid feature vector, outputting sequentially target-audio encoding features, and inputting the target identifier into a one-hot encoding layer for binary vector encoding to obtain a target-identifier encoding feature; and outputting a blend shape value corresponding to the target audio data.
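Two ingredients of the abstract above can be illustrated with a minimal sketch: an encoding layer whose output at time t depends only on inputs between t-n and t (the causal relationship), and a one-hot encoding of the target identifier. The function names, the scalar-per-frame features, and the kernel weights are assumptions made for illustration; this is not the model claimed in the patent.

```python
def one_hot(index, num_classes):
    """Encode an integer identifier as a binary vector with a single 1."""
    if not 0 <= index < num_classes:
        raise ValueError("identifier out of range")
    return [1 if i == index else 0 for i in range(num_classes)]


def causal_conv1d(frames, kernel):
    """Convolve a 1-D sequence so that output[t] uses only frames[t-n..t].

    kernel[0] weights the current frame and kernel[j] the frame j steps back.
    Positions before the start of the sequence are treated as zeros, so no
    output ever depends on a future frame.
    """
    out = []
    for t in range(len(frames)):
        acc = 0.0
        for j, w in enumerate(kernel):
            if t - j >= 0:
                acc += w * frames[t - j]
        out.append(acc)
    return out


# Hypothetical inputs: one scalar audio feature per frame, identifier 2 of 4.
features = [1.0, 2.0, 3.0, 4.0]
encoded = causal_conv1d(features, kernel=[0.5, 0.5])
identifier = one_hot(2, num_classes=4)
```

Because each output looks only backwards in time, changing a later frame leaves every earlier output unchanged, which is the property that makes such a layer usable for streaming audio.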

DEVICE FOR DETECTING MUSIC DATA FROM VIDEO CONTENTS, AND METHOD FOR CONTROLLING SAME

A data processing method according to the present invention comprises the steps of: receiving an input of video contents including a video stream and an audio stream; detecting music data from the audio stream; and filtering the audio stream so that the music data detected from the audio stream is removed.

SYSTEM FOR TRANSCRIBING AND PERFORMING ANALYSIS ON PATIENT DATA

Methods, apparatuses, and systems for transcribing and performing analysis on patient data are disclosed. Data is collected from one or more medical professionals as well as sensors and imaging devices positioned on or oriented towards a patient. An analysis is performed on the patient data and the data is presented to a medical professional via a verbal interface in a conversational manner, allowing the medical professional to provide additional data such as observations or instructions which may be used for further analysis or to perform actions related to the patient's care.

Label generation device, model learning device, emotion recognition apparatus, methods therefor, program, and recording medium

Taking as input the correct emotion classes that listeners who have listened to a first utterance selected, from among a plurality of emotion classes C_1, ..., C_K, as the correct values of the emotion of the utterer, the number of times n_i that each emotion class C_i has been selected as the correct emotion class is obtained. The ratio of each count n_k to the sum of the counts n_1, ..., n_K, or a smoothed value of that ratio, is obtained as the correct emotion soft label t_k^(s) corresponding to the first utterance.
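The soft-label computation described above can be sketched in a few lines. The plain ratio n_k / (n_1 + ... + n_K) follows the abstract directly; the additive smoothing with a constant alpha is an assumption, since the abstract does not say which smoothing is used, and the function name is hypothetical.

```python
def soft_labels(counts, alpha=0.0):
    """Turn per-class listener vote counts into soft labels.

    counts[k] is how many listeners chose class k as the correct emotion.
    alpha is an assumed additive-smoothing constant; alpha=0 gives the
    plain ratio of each count to the total number of votes.
    """
    num_classes = len(counts)
    denom = sum(counts) + alpha * num_classes
    if denom == 0:
        raise ValueError("no votes and no smoothing: labels are undefined")
    return [(n + alpha) / denom for n in counts]


# Hypothetical example: four listeners, four emotion classes.
votes = [3, 1, 0, 0]
plain = soft_labels(votes)               # plain ratios
smoothed = soft_labels(votes, alpha=1.0)  # smoothed ratios
```

With smoothing, classes that no listener selected still receive a small non-zero target, which can help when the number of listeners per utterance is small.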

METHODS AND APPARATUS TO GENERATE TEXTUAL DATA USING MACHINE LEARNING PROCESSES
20230214592 · 2023-07-06

This application relates to apparatus and methods for automatically generating item information, such as item descriptions, and providing the item information to customers. For example, the embodiments may generate and provide personalized item descriptions to customers during conversational interactions in speech-based systems. In some examples, the embodiments determine entities (e.g., attributes) from item information, and apply trained machine learning processes to the extracted entities to generate textual data, such as item descriptions. For example, a computing device may apply a trained natural language processing technique, such as a trained transformer-based machine learning technique, to the extracted entities to generate the item descriptions. In some examples, the computing device applies post-processing techniques to the generated textual data. The generated textual data may include descriptive phrases that are user-friendly to customers in an e-commerce system. The textual data can be converted to audio and played to customers.

Learning device and method for updating a parameter of a speech recognition model

A learning device (10) includes a feature extracting unit (11) that extracts features of speech from speech data for training; a probability calculating unit (12) that, on the basis of the features of speech, performs prefix searching using a speech recognition model represented by a neural network, and calculates a posterior probability of a recognition character string to obtain a plurality of hypothetical character strings; an error calculating unit (13) that calculates an error based on word error rates between the plurality of hypothetical character strings and a correct character string for training, and obtains a parameter for the entire model that minimizes an expected value of the summed loss in the word error rates; and an updating unit (14) that updates a parameter of the model in accordance with the parameter obtained by the error calculating unit (13).
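The quantity that the error calculating unit minimizes can be illustrated as an expected word-error count over a list of hypotheses. The sketch below assumes the hypothesis posteriors are renormalized over the list and uses word-level edit distance as the error; the function names and the example strings are hypothetical, and the gradient step that would update the model is not shown.

```python
import math


def word_errors(hyp, ref):
    """Word-level Levenshtein distance between two whitespace-split strings."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, start=1):
        cur = [i]
        for j, rw in enumerate(r, start=1):
            cur.append(min(prev[j] + 1,                # delete a hypothesis word
                           cur[j - 1] + 1,             # insert a reference word
                           prev[j - 1] + (hw != rw)))  # match or substitute
        prev = cur
    return prev[-1]


def expected_word_errors(hyps, log_probs, ref):
    """Expected word errors, with posteriors renormalized over the list."""
    m = max(log_probs)  # subtract the max for numerical stability
    weights = [math.exp(lp - m) for lp in log_probs]
    z = sum(weights)
    return sum(w / z * word_errors(h, ref) for h, w in zip(hyps, weights))


# Hypothetical hypotheses with equal model scores.
hyps = ["the cat sat", "the cat sad", "a cat sat down"]
loss = expected_word_errors(hyps, [-1.0, -1.0, -1.0], ref="the cat sat")
```

Dividing each error count by the reference length would give word error rates instead of counts; the choice does not change which hypotheses are penalized.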

N-best softmax smoothing for minimum Bayes risk training of attention-based sequence-to-sequence models

A method and apparatus are provided for analyzing sequence-to-sequence data, such as sequence-to-sequence speech data or sequence-to-sequence machine translation data, by minimum Bayes risk (MBR) training of a sequence-to-sequence model and by applying softmax smoothing to the N-best generation of the MBR training of the sequence-to-sequence model.
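One common way to smooth a softmax is to scale the logits by a factor below 1 before normalizing, which flattens the distribution so that more diverse hypotheses survive N-best generation. The abstract does not specify the smoothing form, so the scaling factor beta and the function name below are assumptions; this is a sketch of the general idea only.

```python
import math


def smoothed_softmax(logits, beta=1.0):
    """Softmax over beta-scaled logits.

    beta=1 is the ordinary softmax; 0 < beta < 1 flattens the distribution
    (assumed smoothing form); beta=0 yields a uniform distribution.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(beta * (x - m)) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


# Hypothetical decoder logits for three candidate tokens.
logits = [2.0, 1.0, 0.0]
sharp = smoothed_softmax(logits, beta=1.0)
flat = smoothed_softmax(logits, beta=0.5)
```

The ranking of candidates is unchanged by the scaling; only the gap between their probabilities shrinks.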

Electronic device and method of controlling thereof

An electronic device and a method for controlling the electronic device are disclosed. The electronic device of the disclosure includes a microphone, a memory storing at least one instruction, and a processor configured to execute the at least one instruction. The processor, by executing the at least one instruction, is configured to: obtain second voice data by inputting first voice data input via the microphone to a first model trained to enhance sound quality, obtain a weight by inputting the first voice data and the second voice data to a second model, and identify input data to be input to a third model using the weight.

METHOD OF TRAINING SOUND RECOGNITION MODEL, METHOD OF RECOGNIZING SOUND, AND ELECTRONIC DEVICE FOR PERFORMING THE METHODS

Provided are a method of recognizing sound, a method of training a sound recognition model, and an electronic device performing the methods. A method of training a sound recognition model according to an example embodiment may include converting training data labeled with a sound class into a feature vector, storing the feature vector in a feature queue, transferring the feature vector stored in the feature queue to a block queue according to an operation of a feature vector transfer timer, inputting the feature vector of the block queue into a sound recognition model trained to predict the sound class and storing an output result in a result queue, and, when the result is output, transferring, by the feature vector transfer timer, the feature vector stored in the feature queue that corresponds to the timing at which the result is output to the block queue.
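The flow of feature vectors through the three queues can be sketched as follows. The timer is replaced here by a plain loop in which each iteration stands for one timer tick, the block size is a hypothetical parameter, and the model is any callable; none of these details come from the abstract.

```python
from collections import deque


def run_pipeline(feature_vectors, model, block_size):
    """Move features through feature, block, and result queues.

    Each loop iteration stands in for one tick of the transfer timer
    (an assumption): it moves up to block_size features into the block
    queue, runs the model on that block, and stores the output.
    """
    feature_queue = deque(feature_vectors)
    block_queue = deque()
    result_queue = deque()
    while feature_queue:
        for _ in range(min(block_size, len(feature_queue))):
            block_queue.append(feature_queue.popleft())
        block = list(block_queue)
        block_queue.clear()
        # The next transfer happens only after this result has been stored.
        result_queue.append(model(block))
    return list(result_queue)


# Hypothetical stand-in model: sums the scalar features in a block.
results = run_pipeline([1, 2, 3, 4, 5], model=sum, block_size=2)
```

Keeping the queues separate decouples feature extraction from inference, so a slow model call does not block incoming features from being buffered.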