Patent classifications
G06V10/809
Method and apparatus for data efficient semantic segmentation
A method and system for training a neural network are provided. The method includes receiving an input image, selecting at least one data augmentation method from a pool of data augmentation methods, generating an augmented image by applying the selected at least one data augmentation method to the input image, and generating a mixed image from the input image and the augmented image.
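As a rough illustration of the claimed flow (not the patent's actual implementation), the sketch below selects one augmentation from a small pool, applies it to the input image, and blends the input and augmented images into a mixed image. The augmentation pool, the blending weight `alpha`, and the function names are illustrative assumptions.

```python
import random
import numpy as np

# Illustrative augmentation pool; the abstract does not specify which
# augmentations the pool contains.
def horizontal_flip(img: np.ndarray) -> np.ndarray:
    return img[:, ::-1, :]

def brightness_shift(img: np.ndarray, delta: float = 0.1) -> np.ndarray:
    return np.clip(img + delta, 0.0, 1.0)

AUGMENTATION_POOL = [horizontal_flip, brightness_shift]

def make_mixed_image(image: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Select an augmentation, apply it, and blend the result with the input."""
    augment = random.choice(AUGMENTATION_POOL)        # select from the pool
    augmented = augment(image)                        # generate the augmented image
    return alpha * image + (1.0 - alpha) * augmented  # generate the mixed image

if __name__ == "__main__":
    image = np.random.rand(64, 64, 3).astype(np.float32)  # stand-in input image
    mixed = make_mixed_image(image)
    print(mixed.shape)
```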
Vehicle pose determining system and method
A vehicle pose determining system and method for accurately estimating the pose of a vehicle (i.e., the location and/or orientation of a vehicle). The system and method use a form of sensor fusion, where output from vehicle dynamics sensors (e.g., accelerometers, gyroscopes, encoders, etc.) is used with output from vehicle radar sensors to improve the accuracy of the vehicle pose data. Uncorrected vehicle pose data derived from dynamics sensor data is compensated with correction data that is derived from occupancy grids that are based on radar sensor data. The occupancy grids, which are 2D or 3D mathematical objects that are somewhat like radar-based maps, must correspond to the same geographic location. The system and method use mathematical techniques (e.g., cost functions) to rotate and shift multiple occupancy grids until a best fit solution is determined, and the best fit solution is then used to derive the correction data that, in turn, improves the accuracy of the vehicle pose data.
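The following sketch illustrates the best-fit idea with a brute-force search over candidate rotations and shifts of one occupancy grid against another, scored by a simple overlap cost. The cost function, the search ranges, and the synthetic grids are assumptions for illustration rather than the patented method.

```python
import numpy as np
from scipy import ndimage

def alignment_cost(grid_a: np.ndarray, grid_b: np.ndarray) -> float:
    """Cost function: negative overlap of occupied cells (lower is better)."""
    return -float(np.sum(grid_a * grid_b))

def best_fit(grid_ref, grid_new, angles, shifts):
    """Brute-force search for the rotation/shift that best aligns grid_new to grid_ref."""
    best, best_cost = (0.0, (0, 0)), np.inf
    for angle in angles:
        rotated = ndimage.rotate(grid_new, angle, reshape=False, order=0)
        for dy in shifts:
            for dx in shifts:
                shifted = ndimage.shift(rotated, (dy, dx), order=0)
                cost = alignment_cost(grid_ref, shifted)
                if cost < best_cost:
                    best_cost, best = cost, (angle, (dy, dx))
    return best  # (heading correction, translation correction)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = (rng.random((50, 50)) > 0.8).astype(float)   # radar occupancy grid at time t
    # Simulate dead-reckoning drift: the new grid appears shifted by (2, -1) cells.
    drifted = ndimage.shift(reference, (2, -1), order=0)
    correction = best_fit(reference, drifted, angles=range(-3, 4), shifts=range(-4, 5))
    print(correction)  # correction data used to compensate the dynamics-sensor pose
```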
Method for eye-tracking and terminal for executing the same
A terminal according to an embodiment is for tracking eyes on the basis of a first eye tracking model in which multiple pieces of learning data related to line-of-sight information are accumulated. The terminal may include a data collecting unit which obtains a facial image of a user using an imaging device and extracts line-of-sight information about the user from the facial image; a data transmitting unit which transmits, to a server, the line-of-sight information about the user and location information about a point to which a line of sight of the user is directed within a screen of the terminal; a model receiving unit which receives from the server a second eye tracking model obtained by training the first eye tracking model with the line-of-sight information and the location information; and an eye tracking unit which tracks eyes of the user using the second eye tracking model.
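A minimal sketch of the described client/server loop, assuming a linear gaze model and stand-in feature extraction. The unit boundaries mirror the abstract, but the feature dimensionality, the least-squares server update, and all function names here are hypothetical.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class EyeTrackingModel:
    # First model: a simple linear map from gaze features to screen coordinates.
    weights: np.ndarray  # shape (feature_dim, 2)

def extract_gaze_features(face_image: np.ndarray) -> np.ndarray:
    """Stand-in for the data collecting unit's line-of-sight extraction."""
    return face_image.reshape(-1)[:4] / 255.0  # hypothetical 4-D feature

def server_update(model, features, screen_points):
    """Stand-in for the server: refit the model on the accumulated samples."""
    weights, *_ = np.linalg.lstsq(features, screen_points, rcond=None)
    return EyeTrackingModel(weights=weights)

def track_eyes(model, features):
    """Eye tracking unit: predict the on-screen gaze point with the updated model."""
    return features @ model.weights

if __name__ == "__main__":
    first_model = EyeTrackingModel(weights=np.zeros((4, 2)))
    faces = np.random.randint(0, 255, size=(10, 8, 8))           # user face images
    feats = np.stack([extract_gaze_features(f) for f in faces])  # collected gaze features
    points = np.random.rand(10, 2)                               # on-screen gaze locations
    second_model = server_update(first_model, feats, points)     # model receiving unit
    print(track_eyes(second_model, feats[:1]))
```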
Fake finger detection based on transient features
In a method for determining whether a finger is a real finger at an ultrasonic fingerprint sensor, a sequence of images of a fingerprint of a finger is captured at the ultrasonic fingerprint sensor, wherein the sequence of images includes images captured during a change in contact state between the finger and the ultrasonic fingerprint sensor. A plurality of transient features of the finger is extracted from the sequence of images. A classifier is applied to the plurality of transient features to classify the finger as one of a real finger and a fake finger. It is determined whether the finger is a real finger based on an output of the classifier.
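The sketch below shows one plausible reading of the pipeline: transient features computed over a press-down image sequence (intensity and contact-area change) are fed to a logistic classifier. The specific features, weights, and threshold are assumptions, since the abstract only names the stages.

```python
import numpy as np

def transient_features(image_sequence: np.ndarray) -> np.ndarray:
    """Extract simple transient features from a press-down image sequence.

    The features used here (intensity slope, contact-area growth) are
    illustrative; the abstract only states that transient features are extracted.
    """
    means = image_sequence.mean(axis=(1, 2))          # mean intensity per frame
    areas = (image_sequence > 0.5).mean(axis=(1, 2))  # fraction of "in contact" pixels
    return np.array([means[-1] - means[0], areas[-1] - areas[0]])

def classify_finger(features: np.ndarray, weights: np.ndarray, bias: float) -> bool:
    """Logistic classifier: returns True for a real finger, False for a fake."""
    score = 1.0 / (1.0 + np.exp(-(features @ weights + bias)))
    return bool(score > 0.5)

if __name__ == "__main__":
    # Hypothetical, pre-trained classifier parameters.
    weights, bias = np.array([3.0, 5.0]), -1.0
    # Synthetic press-down sequence: contact intensity ramps up over 8 frames.
    frames = np.linspace(0.2, 0.8, 8)[:, None, None] * np.ones((8, 32, 32))
    feats = transient_features(frames)
    print("real finger" if classify_finger(feats, weights, bias) else "fake finger")
```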
Object detection based on machine learning combined with physical attributes and movement patterns detection
Presented herein are systems and methods for increasing reliability of object detection, comprising: receiving a plurality of images of one or more objects captured by imaging sensor(s); receiving an object classification coupled with a first probability score from machine learning model(s) trained to detect the object(s) and applied to the image(s); computing a second probability score for classification of the object(s) according to physical attribute(s) of the object(s) estimated by analyzing the image(s); computing a third probability score for classification of the object(s) according to a movement pattern of the object(s) estimated by analyzing at least some consecutive images; computing an aggregated probability score aggregating the first, second and third probability scores; and outputting, in case the aggregated probability score exceeds a certain threshold, the classification of each object coupled with the aggregated probability score for use by object detection based system(s).
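To make the score aggregation concrete, the sketch below combines a detector probability with illustrative physical-attribute and movement-pattern scores using a weighted average and a threshold. The scoring rules, weights, and threshold are assumptions, not the patented formulas.

```python
import numpy as np

def physical_attribute_score(height_m: float, expected_range=(1.4, 2.0)) -> float:
    """Second score: how well the estimated physical size fits the class (illustrative)."""
    low, high = expected_range
    return 1.0 if low <= height_m <= high else 0.2

def movement_pattern_score(speeds_mps: np.ndarray, max_expected=3.0) -> float:
    """Third score: how plausible the observed motion is for the class (illustrative)."""
    return 1.0 if speeds_mps.max() <= max_expected else 0.2

def aggregate(p_model, p_physical, p_movement, weights=(0.5, 0.25, 0.25)) -> float:
    """Weighted aggregation of the three probability scores (one of many possible rules)."""
    return float(np.dot(weights, [p_model, p_physical, p_movement]))

if __name__ == "__main__":
    p1 = 0.82                                                # from the trained ML detector
    p2 = physical_attribute_score(1.7)                       # estimated from a single image
    p3 = movement_pattern_score(np.array([0.8, 1.1, 1.4]))   # estimated over consecutive images
    score = aggregate(p1, p2, p3)
    if score > 0.7:                                          # certain threshold
        print("pedestrian", round(score, 3))
```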
Hybrid deep learning method for recognizing facial expressions
A computer implemented method for recognizing facial expressions by applying feature learning and feature engineering to face images. The method includes conducting feature learning on a face image, comprising feeding the face image into a first convolutional neural network to obtain a first decision; conducting feature engineering on the face image, comprising the steps of automatically detecting facial landmarks in the face image, transforming the facial landmarks into a two-dimensional matrix, and feeding the two-dimensional matrix into a second convolutional neural network to obtain a second decision; computing a hybrid decision based on the first decision and the second decision; and recognizing a facial expression in the face image in accordance with the hybrid decision.
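A compact sketch of the two-branch idea, assuming PyTorch: one small CNN consumes the face image, a second consumes a 2-D matrix rasterized from detected landmarks, and the two softmax outputs are averaged into a hybrid decision. The network sizes, the landmark encoding, and the averaging rule are illustrative assumptions.

```python
import torch
from torch import nn

NUM_EXPRESSIONS = 7  # e.g. the common seven basic expressions; illustrative

class SmallCNN(nn.Module):
    """Tiny CNN used for both branches; the real architectures are unspecified here."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(8, NUM_EXPRESSIONS)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def landmarks_to_matrix(landmarks: torch.Tensor, size: int = 32) -> torch.Tensor:
    """Rasterize (x, y) facial landmarks into a two-dimensional matrix (illustrative encoding)."""
    grid = torch.zeros(1, 1, size, size)
    idx = (landmarks.clamp(0, 0.999) * size).long()
    grid[0, 0, idx[:, 1], idx[:, 0]] = 1.0
    return grid

if __name__ == "__main__":
    face = torch.rand(1, 1, 48, 48)               # grayscale face image
    landmarks = torch.rand(68, 2)                 # 68 detected landmarks in [0, 1)
    learned_net = SmallCNN(in_channels=1)         # feature-learning branch
    engineered_net = SmallCNN(in_channels=1)      # feature-engineering branch
    first = torch.softmax(learned_net(face), dim=1)
    second = torch.softmax(engineered_net(landmarks_to_matrix(landmarks)), dim=1)
    hybrid = 0.5 * first + 0.5 * second           # hybrid decision (simple average)
    print(hybrid.argmax(dim=1))                   # recognized expression index
```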
Appearance and movement based model for determining risk of micro mobility users
The systems and methods disclosed herein provide a risk prediction system that uses trained machine learning models to predict that a VRU (vulnerable road user) will take a particular action. The system first receives, in a video stream, an image depicting a VRU operating a micro-mobility vehicle and extracts the depictions from the image. The extraction process may be determined by bounding box classifiers trained to identify various VRUs and micro-mobility vehicles. The system feeds the extracted depictions to machine learning models and receives, as an output, risk profiles for the VRU and the micro-mobility vehicle. The risk profile may include data associated with the VRU/micro-mobility vehicle determined based on classifications of the VRU and the micro-mobility vehicle. The system may then generate a prediction that the VRU operating the micro-mobility vehicle will take a particular action based on the risk profile.
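The sketch below outlines the pipeline shape only: crops extracted from detections are passed to a stand-in risk model that emits a risk profile, which drives an action prediction. The Detection/RiskProfile fields, the placeholder risk score, and the threshold are hypothetical.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Detection:
    label: str          # e.g. "cyclist" or "e_scooter"
    box: tuple          # (x1, y1, x2, y2) bounding box in the frame

@dataclass
class RiskProfile:
    vru_class: str
    vehicle_class: str
    risk_score: float

def extract_crop(frame: np.ndarray, box) -> np.ndarray:
    """Cut the detected region out of the frame (the 'extracted depiction')."""
    x1, y1, x2, y2 = box
    return frame[y1:y2, x1:x2]

def risk_model(vru_crop, vehicle_crop) -> RiskProfile:
    """Stand-in for the trained risk models; real features and weights are unknown."""
    appearance_proxy = float(vru_crop.mean())     # placeholder appearance/movement feature
    return RiskProfile("cyclist", "bicycle", risk_score=min(1.0, appearance_proxy))

def predict_action(profile: RiskProfile, threshold: float = 0.6) -> str:
    """Predict whether the VRU will take a particular action, e.g. enter the roadway."""
    return "will_cross" if profile.risk_score > threshold else "will_stay"

if __name__ == "__main__":
    frame = np.random.rand(480, 640)
    detections = [Detection("cyclist", (10, 20, 110, 220)),
                  Detection("bicycle", (15, 120, 115, 230))]
    vru_crop = extract_crop(frame, detections[0].box)
    vehicle_crop = extract_crop(frame, detections[1].box)
    profile = risk_model(vru_crop, vehicle_crop)
    print(predict_action(profile))
```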
Training diverse and robust ensembles of artificial intelligence computer models
Mechanisms are provided to implement a hardened ensemble artificial intelligence (AI) model generator. The hardened ensemble AI model generator co-trains at least two AI models. Based on a comparison of the at least two AI models, the generator modifies a loss surface of one or more of the AI models to prevent an adversarial attack on one AI model from transferring to another AI model, thereby generating one or more modified AI models. At least one of the one or more modified AI models then processes an input to generate an output result.
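One concrete way to modify the loss surface during co-training is to penalize alignment between the two models' input gradients, sketched below in PyTorch. This gradient-diversity penalty is a stand-in for whatever modification is actually claimed; the models, weighting factor, and data are illustrative.

```python
import torch
from torch import nn

def cotrain_step(model_a, model_b, x, y, optimizer, lambda_div=0.1):
    """One co-training step: task loss for both models plus a penalty that discourages
    aligned input gradients, so an attack crafted against one model transfers poorly
    to the other (one possible loss-surface modification)."""
    x = x.clone().requires_grad_(True)
    ce = nn.CrossEntropyLoss()
    loss_a = ce(model_a(x), y)
    loss_b = ce(model_b(x), y)
    grad_a = torch.autograd.grad(loss_a, x, create_graph=True)[0].flatten(1)
    grad_b = torch.autograd.grad(loss_b, x, create_graph=True)[0].flatten(1)
    alignment = torch.nn.functional.cosine_similarity(grad_a, grad_b, dim=1).mean()
    loss = loss_a + loss_b + lambda_div * alignment
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)

if __name__ == "__main__":
    model_a = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
    model_b = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
    optimizer = torch.optim.SGD(list(model_a.parameters()) + list(model_b.parameters()), lr=0.1)
    x, y = torch.rand(16, 1, 28, 28), torch.randint(0, 10, (16,))
    print(cotrain_step(model_a, model_b, x, y, optimizer))
```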
Cascade stage boundary awareness networks for surgical workflow analysis
Techniques are described for improving computer-assisted surgical (CAS) systems, particularly for recognizing surgical phases in a video of a surgical procedure. A CAS system includes cameras that provide a video stream of a surgical procedure. According to one or more aspects, the surgical phases are automatically detected in the video stream using a machine learning model. Particularly, the machine learning model includes a boundary-aware cascade stage network to perform surgical phase recognition.
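The sketch below approximates a boundary-aware cascade with stacked temporal-convolution stages, each refining the previous stage's per-frame phase logits and emitting a boundary score. The stage architecture, the number of phases, and the feature dimensions are assumptions rather than the described network.

```python
import torch
from torch import nn

NUM_PHASES = 7  # e.g. cholecystectomy phases; illustrative

class Stage(nn.Module):
    """One cascade stage: refines per-frame phase logits and predicts phase boundaries."""
    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv1d(in_dim, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
        )
        self.phase_head = nn.Conv1d(hidden, NUM_PHASES, kernel_size=1)
        self.boundary_head = nn.Conv1d(hidden, 1, kernel_size=1)  # boundary awareness

    def forward(self, x):
        h = self.backbone(x)
        return self.phase_head(h), torch.sigmoid(self.boundary_head(h))

class CascadePhaseNet(nn.Module):
    def __init__(self, feature_dim, num_stages=3):
        super().__init__()
        self.first = Stage(feature_dim)
        self.refiners = nn.ModuleList(Stage(NUM_PHASES) for _ in range(num_stages - 1))

    def forward(self, frame_features):               # (batch, feature_dim, time)
        phases, boundaries = self.first(frame_features)
        outputs = [(phases, boundaries)]
        for stage in self.refiners:                  # each stage refines the previous one
            phases, boundaries = stage(torch.softmax(phases, dim=1))
            outputs.append((phases, boundaries))
        return outputs

if __name__ == "__main__":
    features = torch.rand(1, 128, 300)               # per-frame features of a 300-frame clip
    outputs = CascadePhaseNet(feature_dim=128)(features)
    final_phases, final_boundaries = outputs[-1]
    print(final_phases.argmax(dim=1).shape, final_boundaries.shape)  # per-frame phase + boundary score
```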
Action localization in images and videos using relational features
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing action localization in images and videos. In one aspect, a system comprises a data processing apparatus; a memory in data communication with the data processing apparatus and storing instructions that cause the data processing apparatus to perform image processing and video processing operations comprising: receiving an input comprising an image depicting a person; identifying a plurality of context positions from the image; determining respective feature representations of each of the context positions; providing a feature representation of the person and the feature representations of each of the context positions to a context neural network to obtain relational features, wherein the relational features represent relationships between the person and the context positions; and determining an action performed by the person using the feature representation of the person and the relational features.
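As a rough sketch of the context network, the code below pairs the person's feature with each context-position feature, embeds the pairs with a small MLP, pools them into relational features, and classifies the action. The feature sizes, pooling rule, and head design are illustrative assumptions.

```python
import torch
from torch import nn

class RelationalActionHead(nn.Module):
    """Context-network sketch: pairs the person feature with each context feature,
    embeds each pair, pools the result into relational features, and classifies
    the action performed by the person."""
    def __init__(self, feat_dim=256, num_actions=80):
        super().__init__()
        self.pair_mlp = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU())
        self.classifier = nn.Linear(2 * feat_dim, num_actions)

    def forward(self, person_feat, context_feats):
        # person_feat: (feat_dim,); context_feats: (num_context, feat_dim)
        person = person_feat.expand(context_feats.shape[0], -1)
        pairs = torch.cat([person, context_feats], dim=1)   # person/context-position pairs
        relational = self.pair_mlp(pairs).mean(dim=0)       # pooled relational features
        return self.classifier(torch.cat([person_feat, relational]))

if __name__ == "__main__":
    person_feat = torch.rand(256)        # feature representation of the detected person
    context_feats = torch.rand(49, 256)  # features at 7x7 context positions in the image
    logits = RelationalActionHead()(person_feat, context_feats)
    print(logits.argmax())               # predicted action for the person
```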