Patent classifications
G06T2207/20132
Method and apparatus for detecting pores based on artificial neural network and visualizing the detected pores
According to various embodiments, a pore visualization service providing server based on artificial intelligence may include a data pre-processor for obtaining a user's face image captured by a user terminal from the user terminal and performing pre-processing on the face image based on facial feature points; a pore image extractor for generating a pore image corresponding to the user's face image by inputting the face image pre-processed by the data pre-processor into an artificial neural network; a data post-processor for post-processing the generated pore image; and a pore visualization service providing unit for superimposing the post-processed pore image on the face image and transmitting a pore visualization image to the user terminal.
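The four-stage pipeline above (pre-process, neural-network inference, post-process, overlay) can be sketched in miniature. Everything here is a placeholder for illustration: the alignment step, the "network" (a simple darkness threshold), and the blend weight are assumptions, not the patent's actual method.

```python
# Minimal sketch of the pre-process -> model -> post-process -> overlay
# pipeline. All four stages are illustrative stand-ins.

def preprocess(face, landmarks):
    # Stand-in for feature-point-based alignment: identity here.
    return face

def pore_model(face):
    # Stand-in for the artificial neural network: mark dark pixels as pores.
    return [[1 if px < 50 else 0 for px in row] for row in face]

def postprocess(mask):
    # Stand-in post-processing: pass the mask through unchanged.
    return mask

def overlay(face, mask, alpha=0.5):
    """Superimpose the pore mask (0/1) on a 0-255 grayscale face image."""
    return [[int((1 - alpha) * px + alpha * 255 * m)
             for px, m in zip(frow, mrow)]
            for frow, mrow in zip(face, mask)]

face = [[200, 30], [40, 220]]
visualization = overlay(face, postprocess(pore_model(preprocess(face, []))))
# Dark pixels (30, 40) are flagged as pores and brightened in the overlay.
```

A real implementation would replace `pore_model` with a trained segmentation network and `preprocess` with landmark-based face alignment.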
Systems and methods for augmented reality application for annotations and adding interfaces to control panels and screens
Example implementations described herein are directed to systems and methods for providing a platform to facilitate augmented reality (AR) overlays, which can involve stabilizing video received from a first device for display on a second device and, for input made to a portion of the stabilized video at the second device, generating an AR overlay on a display of the first device corresponding to that portion of the stabilized video.
MATCHING MOUTH SHAPE AND MOVEMENT IN DIGITAL VIDEO TO ALTERNATIVE AUDIO
A method for matching mouth shape and movement in digital video to alternative audio includes deriving a sequence of facial poses, including mouth shapes, for an actor from a source digital video. Each pose in the sequence of facial poses corresponds to a middle position of each audio sample. The method further includes generating an animated face mesh based on the sequence of facial poses and the source digital video, transferring tracked expressions from the animated face mesh or the target video to the source video, and generating a rough output video that includes transfers of the tracked expressions. The method further includes generating a finished video at least in part by refining the rough output video using a parametric autoencoder trained on mouth shapes in the animated face mesh or the target video. One or more computers may perform the operations of the method.
Method of Universal Automated Verification of Vehicle Damage
The present invention relates to verification of damage to vehicles. More particularly, the present invention relates to a universal approach to automated generation of a damage estimate for a vehicle using images of the vehicle and to verification of manually generated damage repair proposals using the automatically generated damage estimate.
Aspects and/or embodiments seek to provide a computer-implemented method of generating one or more repair estimates from one or more photos of a damaged vehicle and comparing the generated estimate(s) to one or more input repair estimates to verify the one or more input repair estimates.
USER INPUT BASED DISTRACTION REMOVAL IN MEDIA ITEMS
A media application receives user input that indicates one or more objects to be erased from a media item. The media application translates the user input to a bounding box. The media application provides a crop of the media item based on the bounding box to a segmentation machine-learning model. The segmentation machine-learning model outputs a segmentation mask for one or more segmented objects in the crop of the media item and a corresponding segmentation score that indicates a quality of the segmentation mask.
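The first two steps described above (translating user input into a bounding box, then cropping the media item for the segmentation model) can be sketched as follows. The function names and the list-of-lists image representation are assumptions for illustration, not the application's actual API.

```python
# Illustrative sketch: freehand user strokes -> bounding box -> crop.

def strokes_to_bbox(points):
    """Bounding box (x0, y0, x1, y1) enclosing all user-input points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def crop(image, bbox):
    """Crop a row-major 2D image (list of lists) to the bounding box."""
    x0, y0, x1, y1 = bbox
    return [row[x0:x1 + 1] for row in image[y0:y1 + 1]]

strokes = [(2, 1), (5, 3), (3, 4)]          # user taps/strokes on an object
bbox = strokes_to_bbox(strokes)             # (2, 1, 5, 4)
image = [[y * 10 + x for x in range(8)] for y in range(6)]
patch = crop(image, bbox)                   # the region sent to the model
```

In the described system, `patch` would then be fed to the segmentation model, which returns a mask plus a score indicating how trustworthy that mask is.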
ACTIVE SPEAKER DETECTION USING IMAGE DATA
A system can operate a speech-controlled device to perform active speaker detection to detect an utterance using image data showing a user speaking the utterance. This enables the device to perform utterance detection using the image data and/or determine which user is speaking the utterance. To perform active speaker detection, the device processes the image data to determine expression parameters associated with the user's face and generates facial measurements based on the expression parameters. For example, the device can use the expression parameters to generate a 3D model including an agnostic facial representation and determine a mouth aspect ratio by measuring a mouth height and a mouth width of the agnostic facial representation. As the mouth aspect ratio changes when the user is speaking, the device can determine that the user is speaking and/or detect an utterance based on an amount of variation of the mouth aspect ratio.
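The mouth-aspect-ratio (MAR) test described above is concrete enough to sketch. The landmark inputs and the variation threshold below are illustrative assumptions; the patent derives its measurements from expression parameters of a 3D agnostic facial representation rather than raw 2D landmarks.

```python
# Sketch of speaking detection via mouth-aspect-ratio variation.
# The four-landmark input and the 0.04 threshold are assumptions.

def mouth_aspect_ratio(upper_lip, lower_lip, left_corner, right_corner):
    """MAR = mouth height / mouth width, from four 2D landmark points."""
    height = abs(lower_lip[1] - upper_lip[1])
    width = abs(right_corner[0] - left_corner[0])
    return height / width if width else 0.0

def is_speaking(mar_sequence, threshold=0.04):
    """Flag speech when the MAR varies enough across recent frames."""
    if len(mar_sequence) < 2:
        return False
    mean = sum(mar_sequence) / len(mar_sequence)
    variance = sum((m - mean) ** 2 for m in mar_sequence) / len(mar_sequence)
    return variance > threshold ** 2

# A closed mouth gives a flat MAR sequence; a talking mouth oscillates.
closed = [0.05, 0.05, 0.06, 0.05]    # little variation -> not speaking
talking = [0.05, 0.45, 0.10, 0.50]   # large variation -> speaking
```

The key idea matches the abstract: absolute mouth openness is less informative than how much the ratio *varies* over a window of frames.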
TECHNIQUES FOR ENHANCING SLOW MOTION RECORDING
Methods, systems, and devices for enhancing scene statistics for slow motion recording are described. The method may include capturing, from a first sensor of the device, a first set of video frames at a first frame rate; capturing, from a second sensor of the device, a second set of video frames at a second frame rate different from the first frame rate; analyzing an aspect of the first set of video frames in relation to an aspect of the second set of video frames; generating a mapped set of video frames based on the analyzing and a mapping of the aspect of the first set of video frames to the aspect of the second set of video frames; and displaying the mapped set of video frames on a display of the device.
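One simple way to pair frames captured at two different rates, as the mapping step above requires, is to match each frame of the faster stream to the nearest-in-time frame of the slower stream. The nearest-timestamp rule below is an assumption for illustration, not the patent's mapping.

```python
# Hedged sketch: align a high-frame-rate stream with a low-frame-rate
# stream by nearest capture timestamp.

def frame_times(fps, n):
    """Capture timestamps (seconds) of n frames at the given frame rate."""
    return [i / fps for i in range(n)]

def map_frames(high_fps, low_fps, n_high, n_low):
    """Index of the nearest low-rate frame for each high-rate frame."""
    low = frame_times(low_fps, n_low)
    mapping = []
    for t in frame_times(high_fps, n_high):
        mapping.append(min(range(n_low), key=lambda i: abs(low[i] - t)))
    return mapping

# Eight 240 fps slow-motion frames paired with two 30 fps reference frames:
pairing = map_frames(240, 30, 8, 2)
```

Each group of fast frames inherits statistics (exposure, white balance, and so on) from its paired slow frame, which is one plausible reading of "mapping the aspect" of one set onto the other.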
END-TO-END ACTION RECOGNITION IN INTELLIGENT VIDEO ANALYSIS AND EDGE COMPUTING SYSTEMS
Apparatuses, systems, and techniques to perform action recognition. In at least one embodiment, action recognition is performed using one or more neural networks and hardware accelerators, in which the one or more neural networks are processed based on, for example, one or more quantization and pruning processes.
System and method for real-time creation and execution of a human Digital Twin
The present invention presents a universal reconfigurable video stream processing system in which a digital twin is applied to 3D marker-cloud mapping of a set of parameters related to the current state of the monitored person (object). The invention includes two reconfigurable units, at least one of which is universally adjustable to any input-output mapping application with a fixed input size, a fixed output size, and numerical values ordered by their meaning. Each reconfigurable unit includes at least one machine-learning-based mathematical model with a large number of parameters and non-linear functions, performing as a universal approximator and ensuring high flexibility during the training process. Each unit of the presented system that includes a machine-learning-based mathematical model should be trained, in advance of system execution, with input-output mapping examples, where the range of input values in the training example set should cover the range of input values that will be used during system execution.
SYSTEMS AND METHODS FOR AUTOMATIC TEMPLATE-BASED DETECTION OF ANATOMICAL STRUCTURES
Systems and methods for detecting anatomical structures include obtaining, for each training subject of a plurality of training subjects, a corresponding MR image, and generating an initial anatomical template based on a first training subject of the plurality of training subjects. A computing device can map MR images of the other training subjects onto a template space by applying a global transformation followed by a local transformation. The computing device can average the mapped MR images with the initial anatomical template to generate a final anatomical template, and boundaries of an anatomical structure of interest can be drawn in the final anatomical template. The computing device can fine-tune the boundaries using an edge detection algorithm. The final anatomical template can be used to identify boundaries of the anatomical structure(s) of interest automatically (e.g., without human intervention) in non-training subjects.
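The template-averaging step above can be sketched in a toy form. The 2x2 arrays stand in for registered MR volumes; the registration (global plus local transformation) itself is omitted, and the voxel-wise mean is one plausible reading of "average the mapped MR images with the initial anatomical template."

```python
# Toy sketch of template averaging: after mapping every training image
# into the template space, take the voxel-wise mean with the initial
# template. Registration is assumed to have already happened.

def average_with_template(initial_template, mapped_images):
    """Voxel-wise mean of the initial template and all mapped images."""
    stack = [initial_template] + mapped_images
    n = len(stack)
    rows, cols = len(initial_template), len(initial_template[0])
    return [[sum(img[r][c] for img in stack) / n for c in range(cols)]
            for r in range(rows)]

initial = [[10.0, 20.0], [30.0, 40.0]]
mapped = [[[12.0, 22.0], [32.0, 42.0]],
          [[8.0, 18.0], [28.0, 38.0]]]
final_template = average_with_template(initial, mapped)
# Each voxel is the mean of three values, e.g. (10 + 12 + 8) / 3 = 10.0
```

Averaging many registered subjects suppresses individual anatomical variation, which is why boundaries drawn on the final template transfer reasonably well to new, non-training subjects.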