G10L2021/105

System and method to insert visual subtitles in videos

A system and method to insert visual subtitles in videos is described. The method comprises segmenting an input video signal to extract speech segments and music segments. Next, a speaker representation is associated with each speech segment corresponding to a speaker visible in the frame. The speech segments are then analyzed to compute the phones and the duration of each phone. The phones are mapped to corresponding visemes, and a viseme-based language model is created with a corresponding score. The most relevant viseme is selected for each speech segment by computing a total viseme score. A speaker representation sequence is then created such that phones and emotions in the speech segments are represented as reconstructed lip movements and eyebrow movements. The speaker representation sequence is integrated with the music segments and superimposed on the input video signal to create the subtitles.
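A minimal sketch of the phone-to-viseme mapping and total-score selection steps described above. The phone set, viseme classes, and scoring weights are illustrative assumptions, not the patent's actual model.

```python
# Sketch of phone-to-viseme mapping and viseme scoring; the mapping table
# and score weights below are assumptions made for illustration.

# A few phones mapped to viseme classes (many-to-one, as in most viseme sets).
PHONE_TO_VISEME = {
    "p": "BILABIAL", "b": "BILABIAL", "m": "BILABIAL",
    "f": "LABIODENTAL", "v": "LABIODENTAL",
    "aa": "OPEN", "ae": "OPEN",
    "iy": "SPREAD", "ih": "SPREAD",
    "uw": "ROUNDED", "ow": "ROUNDED",
}

def viseme_sequence(phones):
    """Map a list of (phone, duration_sec) pairs to (viseme, duration) pairs."""
    return [(PHONE_TO_VISEME.get(p, "NEUTRAL"), d) for p, d in phones]

def total_viseme_score(candidates, lm_score, acoustic_score, lm_weight=0.6):
    """Pick the most relevant viseme by combining a viseme language-model
    score with an acoustic match score (the combination is an assumption)."""
    def score(v):
        return lm_weight * lm_score[v] + (1 - lm_weight) * acoustic_score[v]
    return max(candidates, key=score)

if __name__ == "__main__":
    segment = [("p", 0.08), ("aa", 0.15), ("m", 0.07)]
    print(viseme_sequence(segment))
    lm = {"BILABIAL": 0.7, "OPEN": 0.5}
    ac = {"BILABIAL": 0.4, "OPEN": 0.9}
    print(total_viseme_score(["BILABIAL", "OPEN"], lm, ac))  # -> OPEN
```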

Physical-virtual patient bed system

A patient simulation system for healthcare training is provided. The system includes one or more interchangeable shells comprising a physical anatomical model of at least a portion of a patient's body, the shell adapted to be illuminated from behind to provide one or more dynamic images viewable on the outer surface of the shells; a support system adapted to receive the shells via a mounting system, wherein the system comprises one or more image units adapted to render the one or more dynamic images viewable on the outer surface of the shells; one or more interface devices located about the patient shells to receive input and provide output; and one or more computing units in communication with the image units and interface devices, the computing units adapted to provide an interactive simulation for healthcare training.
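One possible decomposition of the components named in this abstract (shells, image units, computing units); the class names and fields are assumptions made for the sketch, not the patent's design.

```python
# Illustrative component sketch: a computing unit drives image units that
# back-illuminate physical shells with dynamic images during a simulation.
from dataclasses import dataclass, field

@dataclass
class Shell:
    body_region: str          # e.g. "torso"; illuminated from behind
    current_image: str = ""   # dynamic image shown on the outer surface

@dataclass
class ImageUnit:
    def render(self, shell: Shell, image: str) -> None:
        shell.current_image = image  # rear-project onto the shell

@dataclass
class ComputingUnit:
    image_units: list = field(default_factory=list)

    def run_scenario(self, shell: Shell, frames: list) -> None:
        # Stream a sequence of dynamic images onto the physical shell.
        for unit in self.image_units:
            for frame in frames:
                unit.render(shell, frame)

torso = Shell("torso")
ComputingUnit([ImageUnit()]).run_scenario(torso, ["breathing_1", "breathing_2"])
print(torso.current_image)
```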

Physical face cloning

A computer-implemented method is provided for physical face cloning to generate a synthetic skin. Rather than attempt to reproduce the mechanical properties of biological tissue, an output-oriented approach is utilized that models the synthetic skin as an elastic material with isotropic and homogeneous properties (e.g., silicone rubber). The method includes capturing a plurality of expressive poses from a human subject and generating a computational model based on one or more material parameters of a material. In one embodiment, the computational model is a compressible neo-Hookean material model configured to simulate deformation behavior of the synthetic skin. The method further includes optimizing a shape geometry of the synthetic skin based on the computational model and the captured expressive poses. An optimization process is provided that varies the thickness of the synthetic skin based on a minimization of an elastic energy with respect to rest state positions of the synthetic skin.
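A toy version of the shape optimization described above: vary per-region skin thickness to minimize an elastic-energy surrogate against target (captured) displacements. The quadratic energy stand-in, stiffness value, and targets are assumptions; the patent uses a FEM neo-Hookean simulation.

```python
# Toy thickness optimization; the energy model is a crude surrogate,
# not the patent's compressible neo-Hookean material simulation.
import numpy as np
from scipy.optimize import minimize

target = np.array([0.8, 1.2, 0.5])   # desired displacements per pose (assumed)
stiffness = 2.0                      # material parameter (assumed)

def simulated_displacement(thickness):
    # Thinner skin deflects more under the same actuation (crude surrogate).
    return stiffness / thickness

def objective(thickness):
    # Matching error plus a small regularizer keeping thickness manufacturable.
    err = simulated_displacement(thickness) - target
    return np.sum(err**2) + 1e-3 * np.sum((thickness - 1.0)**2)

res = minimize(objective, x0=np.ones(3), bounds=[(0.2, 5.0)] * 3)
print("optimized thickness per region:", res.x)
```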

Method of translating and synthesizing a foreign language
20190244623 · 2019-08-08

A method to interactively convert a source language video/audio stream into one or more target languages in high definition video format using a computer. The spoken words in the converted language are synchronized with synthesized movements of a rendered mouth. Original audio and video streams from pre-recorded or live sermons are synthesized into another language with the original emotional and tonal characteristics. The original sermon could be in any language and be translated into any other language. The mouth and jaw are digitally rendered with viseme and phoneme morphing targets that are pre-generated for lip syncing with the synthesized target language audio. Each video image frame has the simulated lips and jaw inserted over the original. The new audio and video streams are then encoded and uploaded for internet viewing or recorded to a storage medium.
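A sketch of blending a pre-generated viseme morph target to pose the rendered mouth and jaw per frame, as the abstract describes. The vertex data, ease-in/out envelope, and timings are invented for illustration.

```python
# Viseme morph-target blending sketch; all shapes and values are assumptions.
import numpy as np

# Each morph target displaces the same set of mouth/jaw vertices.
NEUTRAL = np.zeros((4, 3))
VISEMES = {
    "AA": np.array([[0, -0.5, 0]] * 4),    # open jaw
    "OO": np.array([[0.3, -0.2, 0]] * 4),  # rounded lips
}

def mouth_pose(viseme, weight):
    """Linearly blend a viseme morph target over the neutral mouth."""
    return NEUTRAL + weight * VISEMES[viseme]

def frames_for_segment(viseme, duration_s, fps=30):
    """Ease the viseme in and out across its duration for smooth lip sync."""
    n = max(1, int(duration_s * fps))
    weights = np.sin(np.linspace(0, np.pi, n))  # 0 -> 1 -> 0 envelope
    return [mouth_pose(viseme, w) for w in weights]

poses = frames_for_segment("AA", 0.2)
print(len(poses), "frames; mid-frame jaw drop:", poses[len(poses) // 2][0][1])
```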

Enhanced avatar animation

Avatar animation may be enhanced to reflect emotion and other human traits when animated to read messages received from other users or other messages. A message may be analyzed to determine visual features associated with data in the message. The visual features may be depicted graphically by the avatar to create enhanced avatar animation. A text-based message may include indicators, such as punctuation, font, words, graphics, and/or other information, which may be extracted to create the visual features. This information may be used to select visual features as special animation, which may be implemented in animation of the avatar. Examples of visual features include animations of laughter, smiling, clapping, whistling, and/or other animations.
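A minimal sketch of the indicator-extraction step: patterns over punctuation and words select special animations. The indicator table below is an assumption for illustration, not the patent's actual mapping.

```python
# Map text indicators (punctuation, words) to avatar animations.
import re

INDICATOR_ANIMATIONS = {
    r"\blol\b|\bhaha+\b": "laughter",
    r":\)|:-\)": "smiling",
    r"!{2,}": "excited_gesture",
    r"\bbravo\b|\bwell done\b": "clapping",
}

def visual_features(message: str) -> list:
    """Return the special animations suggested by the message's indicators."""
    found = []
    for pattern, animation in INDICATOR_ANIMATIONS.items():
        if re.search(pattern, message, flags=re.IGNORECASE):
            found.append(animation)
    return found

print(visual_features("haha that was great!!! :)"))
# -> ['laughter', 'smiling', 'excited_gesture']
```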

SYSTEM, METHOD, AND COMPUTER PROGRAM FOR TRANSMITTING FACE MODELS BASED ON FACE DATA POINTS
20190222807 · 2019-07-18

A system, method, and computer program are provided for transmitting face models based on face data points. In use, a first image is received and at least one face associated with the first image is identified. Next, a face model is created of the at least one face by determining a structure of the at least one face, wherein the face model includes one or more face data points. The face model is transmitted. Additionally, a real-time stream is enabled of the at least one face, and a real-time face model is determined of the real-time stream using the face model. The real-time face model is then transmitted.
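A sketch of the two-phase transmission this abstract describes: send a full face model (data points) once, then stream compact real-time updates against it. The landmark count and wire format are assumptions.

```python
# Full-model-then-deltas transmission sketch; format details are assumed.
import json
import random

NUM_POINTS = 68  # a common facial-landmark count (assumption)

def build_face_model(image) -> dict:
    """Stand-in for landmark detection: one (x, y) point per landmark."""
    return {"points": [(random.random(), random.random())
                       for _ in range(NUM_POINTS)]}

def encode_full(model: dict) -> bytes:
    return json.dumps(model).encode()

def encode_realtime(model: dict, live_points: list) -> bytes:
    """Send only per-point deltas relative to the transmitted base model."""
    deltas = [(lx - bx, ly - by)
              for (bx, by), (lx, ly) in zip(model["points"], live_points)]
    return json.dumps({"deltas": deltas}).encode()

base = build_face_model(image=None)
payload = encode_full(base)                     # transmitted once
update = encode_realtime(base, base["points"])  # near-zero deltas thereafter
print(len(payload), "bytes full,", len(update), "bytes update")
```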

Producing realistic talking face with expression using images, text, and voice
20190197755 · 2019-06-27

A method for providing visual sequences using one or more images, comprising: receiving one or more images of a person showing at least one face; receiving a message to be enacted by the person, wherein the message comprises at least a text or an emotional and movement command; processing the message to extract or receive audio data related to the voice of the person and facial movement data related to the expression to be carried on the face of the person; processing the image(s), the audio data, and the facial movement data; and generating an animation of the person enacting the message.
The emotional and movement command is a GUI- or multimedia-based instruction that invokes the generation of facial expressions and/or body-part movements.
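A high-level sketch of the pipeline in this abstract: an image of a person, a text/command message, synthesized audio, and facial movement data combined into an animation. Every function below is a placeholder assumption, not the patent's implementation.

```python
# Placeholder pipeline: message -> audio + expression -> animation frames.
from dataclasses import dataclass

@dataclass
class Message:
    text: str
    emotion_command: str  # e.g. "smile", from a GUI or multimedia instruction

def synthesize_audio(text: str) -> list:
    # Placeholder TTS: one "audio chunk" per word.
    return text.split()

def facial_movement_data(command: str) -> dict:
    # Placeholder mapping from command to expression intensities.
    return {"smile": {"mouth_corner_up": 1.0}}.get(command, {})

def animate(person_image: str, message: Message) -> list:
    audio = synthesize_audio(message.text)
    expression = facial_movement_data(message.emotion_command)
    # One frame per audio chunk, each carrying the commanded expression.
    return [{"image": person_image, "audio": chunk, "expression": expression}
            for chunk in audio]

frames = animate("person.png", Message("hello there", "smile"))
print(len(frames), "frames;", frames[0]["expression"])
```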

PRODUCTION OF SPEECH BASED ON WHISPERED SPEECH AND SILENT SPEECH

A method, a system, and a computer program product are provided for interpreting low-amplitude speech and transmitting amplified speech to a remote communication device. At least one computing device receives sensor data from multiple sensors, where the sensor data is associated with the low-amplitude speech. The computing device analyzes the sensor data to map it to at least one syllable, resulting in a string of one or more words. An electronic representation of the string of one or more words may be generated and transmitted to a remote communication device for producing the amplified speech from the electronic representation.
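A minimal sketch of mapping multi-sensor frames to syllables and assembling words; the feature set, nearest-prototype classifier, and vocabulary are assumptions standing in for whatever models the patent uses.

```python
# Sensor frames -> nearest syllable prototypes -> word string.
SYLLABLE_PROTOTYPES = {
    "hel": (0.9, 0.1),   # (jaw_opening, lip_rounding) prototype features
    "lo":  (0.4, 0.8),
}
WORDS = {("hel", "lo"): "hello"}

def nearest_syllable(frame):
    """Classify one sensor frame to its closest syllable prototype."""
    def dist(proto):
        return sum((a - b) ** 2 for a, b in zip(frame, proto))
    return min(SYLLABLE_PROTOTYPES, key=lambda s: dist(SYLLABLE_PROTOTYPES[s]))

def decode(sensor_frames):
    syllables = tuple(nearest_syllable(f) for f in sensor_frames)
    return WORDS.get(syllables, " ".join(syllables))

# Two whispered-speech sensor frames -> "hello", ready to be amplified.
print(decode([(0.85, 0.15), (0.45, 0.75)]))
```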

Real-Time Lip Synchronization Animation
20190172241 · 2019-06-06

A device includes a processor and a memory that stores predetermined data including a progressive transition rule and animation models. Each of the animation models corresponds to a respective phoneme. The memory stores instructions including receiving a request from a user and obtaining an answer to the request. The answer includes first and second indicators that correspond to first and second phonemes. The instructions include, according to the first indicator, identifying a first animation model that corresponds to the first phoneme. The instructions include, according to the second indicator, identifying a second animation model that corresponds to the second phoneme. The instructions include generating a transition animation model according to the progressive transition rule using the first and second animation models. The instructions include generating images according to the first, second, and transition animation models. The instructions include outputting the images to the user via a display.
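A sketch of generating a transition animation model between two phoneme animation models, as the abstract describes. A linear cross-fade stands in for the patent's progressive transition rule, and the keyframe parameters are assumptions.

```python
# Bridge two phoneme animation models with generated transition frames.
import numpy as np

# Each animation model is a mouth keyframe (assumed: a parameter vector).
MODELS = {
    "AH": np.array([1.0, 0.2]),  # [jaw_open, lip_round]
    "OO": np.array([0.3, 0.9]),
}

def transition_model(first: str, second: str, steps: int = 5):
    """Generate intermediate frames bridging two phoneme models."""
    a, b = MODELS[first], MODELS[second]
    return [a + t * (b - a) for t in np.linspace(0.0, 1.0, steps)]

# Images for the first model, the generated transition, then the second.
sequence = [MODELS["AH"], *transition_model("AH", "OO"), MODELS["OO"]]
print(len(sequence), "frames; midpoint:", sequence[len(sequence) // 2])
```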

Actor-replacement system for videos
12014753 · 2024-06-18

In one aspect, an example method includes (i) estimating, using a skeletal detection model, a pose of an original actor for each of multiple frames of a video; (ii) obtaining, for each of a plurality of the estimated poses, a respective image of a replacement actor; (iii) obtaining replacement speech in the replacement actor's voice that corresponds to speech of the original actor in the video; (iv) generating, using the estimated poses, the images of the replacement actor, and the replacement speech, synthetic frames corresponding to the multiple frames of the video that depict the replacement actor in place of the original actor, with the synthetic frames including facial expressions for the replacement actor that temporally align with the replacement speech; and (v) combining the synthetic frames and the replacement speech so as to obtain a synthetic video that replaces the original actor with the replacement actor.
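A sketch of the actor-replacement pipeline as ordered stages matching steps (i) through (v); every function here is a placeholder for the models named in the abstract.

```python
# Placeholder pipeline: pose estimation -> synthetic frames -> muxed video.
def estimate_pose(frame):
    return {"skeleton": f"keypoints for {frame}"}   # skeletal detection model

def render_replacement(pose, actor_image, speech_t):
    # Synthetic frame: replacement actor in the original pose, with facial
    # expression aligned to the replacement speech at this timestamp.
    return {"pose": pose, "actor": actor_image, "t": speech_t}

def replace_actor(video_frames, actor_images, replacement_speech, fps=30):
    synthetic = []
    for i, frame in enumerate(video_frames):
        pose = estimate_pose(frame)                             # step (i)
        actor = actor_images[i % len(actor_images)]             # step (ii)
        synthetic.append(render_replacement(pose, actor, i / fps))  # step (iv)
    return {"frames": synthetic, "audio": replacement_speech}       # step (v)

out = replace_actor(["f0", "f1"], ["actor.png"], "speech.wav")
print(len(out["frames"]), "synthetic frames muxed with", out["audio"])
```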