G10L2021/105

Virtual photorealistic digital actor system for remote service of customers
09727874 · 2017-08-08

A system for remote servicing of customers includes an interactive display unit at the customer location that provides two-way audio/visual communication with a remote service/sales agent, wherein communication inputted by the agent is delivered to customers via a virtual Digital Actor on the display. The system also provides remote customer service using physical mannequins with interactive capability and two-way audio/visual communication with the remote agent, wherein communication inputted by the remote service or sales agent is delivered to customers through the physical mannequin. A web solution integrates the virtual Digital Actor system into a business website, and a smartphone solution provides the remote service to customers via an app. In another embodiment, the Digital Actor is instead displayed as a 3D hologram. The Digital Actor is also used in an e-learning solution, in a movie studio suite, and as a presenter on TV, online, or in other broadcasting applications.

Virtual photorealistic digital actor system for remote service of customers
09721257 · 2017-08-01

A system for remote servicing of customers includes an interactive display unit at the customer location that provides two-way audio/visual communication with a remote service/sales agent, wherein communication inputted by the agent is delivered to customers via a virtual Digital Actor on the display. The system also provides remote customer service using physical mannequins with interactive capability and two-way audio/visual communication with the remote agent, wherein communication inputted by the remote service or sales agent is delivered to customers through the physical mannequin. A web solution integrates the virtual Digital Actor system into a business website, and a smartphone solution provides the remote service to customers via an app. In another embodiment, the Digital Actor is instead displayed as a 3D hologram. The Digital Actor is also used in an e-learning solution, in a movie studio suite, and as a presenter on TV, online, or in other broadcasting applications.

Speech to Text Prosthetic Hearing Aid
20170186431 · 2017-06-29

The invention is a prosthetic hearing aid designed to assist and enrich the lives of people who are hearing impaired or have experienced a total loss of hearing by allowing them to hear or understand what is spoken to them. The invention consists of a frame assembly having left and right temples and a front; a lens assembly secured to the frame assembly; a set of microphones attached to the frame assembly, capable of detecting the sound of the spoken word; a television camera system attached to the frame assembly that is able to track lip movement; a semi-transparent viewing screen; and a CPU microprocessor with appropriate software to convert both the audio and the lip movement of the spoken word into text and also change the frequency of the spoken word.

Physical-virtual patient bed system

A patient simulation system for healthcare training is provided. The system includes one or more interchangeable shells comprising a physical anatomical model of at least a portion of a patient's body, the shell adapted to be illuminated from behind to provide one or more dynamic images viewable on the outer surface of the shells; a support system adapted to receive the shells via a mounting system, wherein the system comprises one or more image units adapted to render the one or more dynamic images viewable on the outer surface of the shells; one or more interface devices located about the patient shells to receive input and provide output; and one or more computing units in communication with the image units and interface devices, the computing units adapted to provide an interactive simulation for healthcare training.

SYSTEMS AND METHODS FOR SPEECH ANIMATION USING VISEMES WITH PHONETIC BOUNDARY CONTEXT

Speech animation may be performed using visemes with phonetic boundary context. A viseme unit may comprise an animation that simulates lip movement of an animated entity. Individual ones of the viseme units may correspond to one or more complete phonemes and phoneme context of the one or more complete phonemes. Phoneme context may include a phoneme that is adjacent to the one or more complete phonemes that correspond to a given viseme unit. Potential sets of viseme units that correspond with individual phoneme string portions may be determined. One of the potential sets of viseme units may be selected for individual ones of the phoneme string portions based on a fit metric that conveys a match between individual ones of the potential sets and the corresponding phoneme string portion.
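The selection step described above can be sketched in code. The following is an illustrative toy model, not the patented method: `VisemeUnit`, the tiny `LIBRARY`, and the two-point `fit` metric are all assumptions introduced for the example. Each candidate unit stores the adjacent (boundary) phonemes it was built with, and the fit metric scores how well that stored context matches the actual neighbours in the input phoneme string.

```python
# Toy sketch of viseme selection with phonetic boundary context.
# All names and library contents here are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class VisemeUnit:
    phonemes: tuple   # complete phonemes the unit animates
    left_ctx: str     # adjacent phoneme before the unit ("#" = silence)
    right_ctx: str    # adjacent phoneme after the unit

# A toy viseme-unit library; a real system would hold many variants.
LIBRARY = [
    VisemeUnit(("p", "a"), left_ctx="#", right_ctx="t"),
    VisemeUnit(("p", "a"), left_ctx="s", right_ctx="t"),
    VisemeUnit(("t",), left_ctx="a", right_ctx="#"),
]

def fit(unit, left, right):
    """Fit metric: how well the unit's stored boundary context matches
    the actual neighbouring phonemes (higher is better)."""
    return (unit.left_ctx == left) + (unit.right_ctx == right)

def select(phoneme_string):
    """Greedily cover the phoneme string with the best-fitting units."""
    chosen, i = [], 0
    while i < len(phoneme_string):
        candidates = [u for u in LIBRARY
                      if tuple(phoneme_string[i:i + len(u.phonemes)]) == u.phonemes]
        left = phoneme_string[i - 1] if i > 0 else "#"
        def score(u):
            end = i + len(u.phonemes)
            right = phoneme_string[end] if end < len(phoneme_string) else "#"
            return fit(u, left, right)
        best = max(candidates, key=score)
        chosen.append(best)
        i += len(best.phonemes)
    return chosen

units = select(["p", "a", "t"])
```

Here the unit whose stored context ("#" on the left, "t" on the right) matches the utterance-initial position beats the otherwise identical unit recorded after "s".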

METHOD, APPARATUS AND COMPUTER PROGRAM

A computer-implemented method comprising: receiving, from a user device, video data from a user; training a first machine learning model based on the video data to provide a second machine learning model, the second machine learning model being personalized to the user, wherein the second machine learning model is trained to predict movement of the user based on audio data; receiving further audio data from the user; determining predicted movements of the user based on the further audio data and the second machine learning model; and using the predicted movements of the user to generate an animation of an avatar of the user.
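The personalization flow in the claim can be sketched with a deliberately tiny stand-in model. This is an assumption-laden illustration, not the claimed implementation: the "model" is a one-weight linear map from an audio feature (e.g. loudness) to a movement value (e.g. mouth openness), fine-tuned on pairs extracted from the user's video, then applied to further audio.

```python
# Illustrative sketch only: a generic model m = w*a + b is "personalized"
# by fine-tuning on (audio_feature, movement) pairs from one user's video,
# then used to predict movement from new audio features.

def finetune(pairs, w=0.0, b=0.0, lr=0.05, epochs=500):
    """Least-squares fine-tuning of a linear audio->movement model."""
    for _ in range(epochs):
        for a, m in pairs:
            err = (w * a + b) - m   # prediction error on this sample
            w -= lr * err * a       # gradient step on the weight
            b -= lr * err           # gradient step on the bias
    return w, b

# (audio loudness, mouth openness) pairs from the user's training video;
# this toy data follows m = a + 0.1 exactly.
user_pairs = [(0.0, 0.1), (0.5, 0.6), (1.0, 1.1)]
w, b = finetune(user_pairs)

def predict_movement(audio_features):
    """Drive the user's avatar from further audio data."""
    return [w * a + b for a in audio_features]

moves = predict_movement([0.25, 0.75])
```

The personalized parameters converge to roughly w = 1, b = 0.1, so new audio features map to plausible movement values for this user; a real system would replace the linear map with a neural network over audio and pose features.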

Apparatus and method for generating lip sync image
12236943 · 2025-02-25

An apparatus for generating a lip sync image according to a disclosed embodiment has one or more processors and a memory which stores one or more programs executed by the one or more processors. The apparatus includes a first artificial neural network model configured to generate an utterance-match synthesis image from a person background image and an utterance-match audio signal corresponding to the person background image, and to generate an utterance-mismatch synthesis image from the person background image and an utterance-mismatch audio signal not corresponding to the person background image; and a second artificial neural network model configured to take as input a pair in which an image and a voice match and a pair in which an image and a voice do not match, and to output classification values for those input pairs.
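The two-model structure can be sketched with placeholder functions standing in for the neural networks; everything below is a structural illustration, not the patented networks. The first model combines a person background image with an audio signal, and the second scores whether an image/voice pair matches.

```python
# Structural sketch only; these functions stand in for the two networks.

def first_model(person_background, audio):
    """Stand-in for the synthesis network: combines a person background
    image with an audio signal into a synthesis 'image'."""
    return ("synth", person_background, audio)

def second_model(image, audio):
    """Stand-in for the sync classifier: outputs 1 when the image was
    synthesized from this audio (match), 0 otherwise (mismatch)."""
    return 1 if image[2] == audio else 0

bg = "person_bg"
match_img = first_model(bg, "audio_A")     # utterance-match synthesis
mismatch_img = first_model(bg, "audio_B")  # utterance-mismatch synthesis

# Classify one matching and one mismatching image/voice pair.
scores = (second_model(match_img, "audio_A"),
          second_model(mismatch_img, "audio_A"))
```

In a real system the classifier's match/mismatch scores would serve as a training signal pushing the synthesis network toward audio-synchronized lips.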

Photo-realistic synthesis of three dimensional animation with facial features synchronized with speech

Dynamic texture mapping is used to create a photorealistic three-dimensional animation of an individual with facial features synchronized with desired speech. Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which the animation will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with facial features, such as lip movements, synchronized with the desired speech. This image sequence is applied to the three-dimensional model.
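The library-matching step can be sketched as a nearest-neighbour search: for each predicted visual feature vector in the trajectory, pick the stored image whose feature vector is closest. The library contents, feature dimensions, and Euclidean distance below are illustrative assumptions, not details from the patent.

```python
# Hedged sketch: match a trajectory of predicted visual feature vectors
# to the closest images in a (toy) image library.
import math

# image library entries: (visual feature vector, image id)
image_library = [((0.0, 0.0), "mouth_closed"),
                 ((1.0, 0.2), "mouth_open"),
                 ((0.5, 0.9), "mouth_round")]

def match_sequence(trajectory):
    """Concatenate the library images whose stored features best match
    each visual feature vector generated by the statistical model."""
    seq = []
    for v in trajectory:
        feats, img = min(image_library,
                         key=lambda entry: math.dist(v, entry[0]))
        seq.append(img)
    return seq

frames = match_sequence([(0.1, 0.1), (0.9, 0.3)])
```

A production system would match short subsequences rather than single frames, to keep the concatenated images temporally smooth, before texturing the 3D model.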

Wearable speech input-based to moving lips display overlay
12243552 · 2025-03-04

Eyewear having a speech-to-moving-lips algorithm that receives and translates speech and utterances of a person viewed through the eyewear, and then displays an overlay of moving lips corresponding to that speech and those utterances on a mask worn by the viewed person. A database of text-to-moving-lips information is used to translate the speech and generate the moving lips in near-real time. This translation gives deaf or hearing-impaired users the ability to understand and communicate with the person viewed through the eyewear when that person is wearing a mask. The translation may include automatic speech recognition (ASR) and natural language understanding (NLU) as a sound recognition engine.
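The database lookup step can be sketched as a simple table from recognized text to per-frame lip shapes. The table contents and function names below are assumptions introduced for illustration, not details from the patent; a real device would map phonemes rather than whole words and render actual lip imagery.

```python
# Illustrative text-to-moving-lips lookup: recognized words map to a
# sequence of lip-shape frames to overlay on the viewed person's mask.
lips_db = {
    "hello": ["closed", "open", "wide", "round"],
    "yes":   ["wide", "narrow", "closed"],
}

def speech_to_lips(recognized_text):
    """Turn ASR output into the lip-frame sequence for the overlay;
    words missing from the database fall back to a neutral frame."""
    frames = []
    for word in recognized_text.lower().split():
        frames.extend(lips_db.get(word, ["neutral"]))
    return frames

overlay = speech_to_lips("Hello yes")
```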

Actor-replacement system for videos
12260882 · 2025-03-25

In one aspect, an example method includes (i) estimating, using a skeletal detection model, a pose of an original actor for each of multiple frames of a video; (ii) obtaining, for each of a plurality of the estimated poses, a respective image of a replacement actor; (iii) obtaining replacement speech in the replacement actor's voice that corresponds to speech of the original actor in the video; (iv) generating, using the estimated poses, the images of the replacement actor, and the replacement speech, synthetic frames corresponding to the multiple frames of the video that depict the replacement actor in place of the original actor, with the synthetic frames including facial expressions for the replacement actor that temporally align with the replacement speech; and (v) combining the synthetic frames and the replacement speech so as to obtain a synthetic video that replaces the original actor with the replacement actor.
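The five steps can be sketched as a pipeline of placeholder stages; each function below merely stands in for the corresponding component (pose estimator, image generator, voice conversion, frame synthesis, muxing) and every name is an illustrative assumption.

```python
# Pipeline sketch of steps (i)-(v); placeholder stages only.

def estimate_pose(frame):                 # (i) skeletal detection model
    return {"frame": frame, "pose": f"pose_{frame}"}

def replacement_image(pose):              # (ii) replacement-actor image per pose
    return f"img_for_{pose['pose']}"

def replacement_speech(original_speech):  # (iii) speech in replacement voice
    return original_speech.replace("orig", "repl")

def synthesize_frames(poses, images, speech):   # (iv) synthetic frames
    return [f"synth({p['pose']},{img},{speech})"
            for p, img in zip(poses, images)]

def combine(frames, speech):              # (v) mux into the synthetic video
    return {"video": frames, "audio": speech}

video_frames = [0, 1, 2]
poses = [estimate_pose(f) for f in video_frames]
images = [replacement_image(p) for p in poses]
speech = replacement_speech("orig_voice_track")
synthetic = synthesize_frames(poses, images, speech)
result = combine(synthetic, speech)
```

The sketch preserves the claim's data flow: poses and per-pose images feed frame synthesis, and the converted speech is both a synthesis input (for temporal alignment of expressions) and the final audio track.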