Patent classifications
G06V30/1914
System to identify authorship of handwritten text based on individual alphabets
A device, method, and non-transitory computer-readable medium are described. The method includes receiving a dataset including handwritten Arabic words and handwritten Arabic alphabets from one or more users. The method further includes removing whitespace around the alphabets in the handwritten Arabic words and the handwritten Arabic alphabets in the dataset. The method further includes splitting the dataset into a training set, a validation set, and a test set. The method further includes classifying one or more user datasets from the training set, the validation set, and the test set. The method further includes identifying a target user from the one or more user datasets. The identification of the target user includes the verification accuracy of the handwritten Arabic words being larger than a verification accuracy threshold value.
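The preprocessing, splitting, and threshold steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented method: images are assumed to be lists of grayscale rows (0 = ink, 255 = paper), and the function names, the ink threshold of 200, the 70/15/15 split, and the 0.9 accuracy threshold are all hypothetical choices.

```python
import random

# Assumption: an image is a list of rows of grayscale ints (0 = ink, 255 = paper).
def crop_whitespace(img, threshold=200):
    """Trim blank margins: keep the bounding box of pixels darker than `threshold`."""
    ink_rows = [i for i, row in enumerate(img) if any(p < threshold for p in row)]
    if not ink_rows:
        return []  # blank image: nothing to keep
    ink_cols = [j for j in range(len(img[0]))
                if any(row[j] < threshold for row in img)]
    top, bottom = ink_rows[0], ink_rows[-1]
    left, right = ink_cols[0], ink_cols[-1]
    return [row[left:right + 1] for row in img[top:bottom + 1]]


def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle once, then cut into disjoint training/validation/test lists."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def is_target_user(verification_accuracy, threshold=0.9):
    """Accept the identification only if accuracy strictly exceeds the (hypothetical) threshold."""
    return verification_accuracy > threshold
```

Cropping to the ink bounding box keeps each character's shape while discarding margins that carry no writer-specific information; the strict `>` mirrors the "larger than" wording of the abstract.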
Visual labeling for machine learning training
Systems, methods, and computer-readable media are disclosed for visual labeling of training data items for training a machine learning model. Training data items may be generated for training the machine learning model. Visual labels, such as QR codes, may be created for the training data items. The creation of the training data item and the visual label may be automated. The visual labels and the training data items may be combined to obtain a labeled training data item. The labeled training data item may comprise a separator to distinguish the training data item from the visual label. The labeled training data item may be used for training and validation of the machine learning model. The machine learning model may analyze the training data item, attempt to identify the training data item, and compare the identification against the embedded label.
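The combine-with-separator and compare-against-label steps can be sketched as follows. This is a hypothetical illustration only: a real system would render an actual QR code, whereas here a single row of black/white cells encoding an 8-bit label ID stands in for it, and `SEPARATOR`, `encode_label`, `attach_label`, `detach_label`, and `prediction_matches` are invented names. The sketch assumes the training item itself never contains a row made entirely of the separator value.

```python
SEPARATOR = 128  # hypothetical marker value reserved for the separator row


def encode_label(label_id, width, bits=8):
    """Stand-in for a QR code: one row of black (bit 1) / white (bit 0) cells, MSB first."""
    if width < bits or not 0 <= label_id < 2 ** bits:
        raise ValueError("label does not fit")
    cell_w = width // bits
    row = []
    for b in range(bits):
        bit = (label_id >> (bits - 1 - b)) & 1
        row += [0 if bit else 255] * cell_w
    row += [255] * (width - len(row))  # pad any leftover columns with white
    return [row]


def attach_label(item, label_id, bits=8):
    """Labeled item = training rows + one separator row + label row."""
    width = len(item[0])
    return item + [[SEPARATOR] * width] + encode_label(label_id, width, bits)


def detach_label(labeled, bits=8):
    """Split at the first all-separator row and decode the label row after it."""
    sep = next(i for i, row in enumerate(labeled) if all(p == SEPARATOR for p in row))
    item, label_row = labeled[:sep], labeled[sep + 1]
    cell_w = len(label_row) // bits
    label_id = 0
    for b in range(bits):
        label_id = (label_id << 1) | int(label_row[b * cell_w] == 0)
    return item, label_id


def prediction_matches(labeled, predict, bits=8):
    """Let a model classify the item, then compare its answer with the embedded label."""
    item, label_id = detach_label(labeled, bits)
    return predict(item) == label_id
```

Keeping the label inside the same image as the data means the pair cannot drift apart when files are copied or shuffled, which is the practical motivation for visual labels.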
Generating depth maps for panoramic camera systems
A camera system captures images from a set of cameras to generate binocular panoramic views of an environment. The cameras are oriented in the camera system to maximize the minimum number of cameras viewing a set of randomized test points. To calibrate the system, matching features between images are identified and used to estimate three-dimensional points external to the camera system. Calibration parameters are modified to improve the three-dimensional point estimates. When images are captured, a pipeline generates a depth map for each camera using reprojected views from adjacent cameras and an image pyramid that includes individual pixel depth refinement and filtering between levels of the pyramid. The images may be used to generate views of the environment from different perspectives (relative to the image capture location) by generating depth surfaces corresponding to the depth maps and blending the depth surfaces.
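The orientation criterion, maximizing the minimum number of cameras that view any randomized test point, can be sketched as a max-min search over candidate layouts. This is a simplified sketch, not the patented procedure: it assumes all cameras share one optical center, treats test points as viewing directions, models each field of view as a cone with a hypothetical 60° half-angle, and the function names are invented.

```python
import math
import random


def sees(cam_dir, point, half_fov):
    """True if `point` lies within a cone of half-angle `half_fov` around unit `cam_dir`."""
    norm = math.sqrt(sum(p * p for p in point))
    cos_angle = sum(c * p for c, p in zip(cam_dir, point)) / norm
    return cos_angle >= math.cos(half_fov)


def min_coverage(cam_dirs, points, half_fov):
    """Smallest number of cameras that see any single test point."""
    return min(sum(sees(c, p, half_fov) for c in cam_dirs) for p in points)


def random_directions(n, rng):
    """Uniform random unit vectors (normalized Gaussian samples)."""
    dirs = []
    while len(dirs) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-9:
            dirs.append([x / norm for x in v])
    return dirs


def best_layout(candidate_layouts, n_points=500, half_fov=math.radians(60), seed=0):
    """Pick the candidate that maximizes the minimum coverage over random test points."""
    points = random_directions(n_points, random.Random(seed))
    return max(candidate_layouts, key=lambda cams: min_coverage(cams, points, half_fov))
```

For example, six cameras facing the six cube-face directions see every direction at least once with a 60° half-angle, so `best_layout` prefers that layout over six cameras all facing the same way.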
Panoramic camera systems
A camera system captures images from a set of cameras to generate binocular panoramic views of an environment. The cameras are oriented in the camera system to maximize the minimum number of cameras viewing a set of randomized test points. To calibrate the system, matching features between images are identified and used to estimate three-dimensional points external to the camera system. Calibration parameters are modified to improve the three-dimensional point estimates. When images are captured, a pipeline generates a depth map for each camera using reprojected views from adjacent cameras and an image pyramid that includes individual pixel depth refinement and filtering between levels of the pyramid. The images may be used to generate views of the environment from different perspectives (relative to the image capture location) by generating depth surfaces corresponding to the depth maps and blending the depth surfaces.
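Estimating a three-dimensional point from a matched feature pair can be sketched with midpoint triangulation: back-project the feature from each camera as a line and take the midpoint of the shortest segment between the two lines. The abstract does not specify the triangulation method; midpoint triangulation is swapped in here as a common, simple choice, and the function names are hypothetical. The returned gap illustrates one quantity a calibration step could reduce when it modifies calibration parameters.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between lines c1 + s*d1 and c2 + t*d2.

    c1, c2 are camera centers and d1, d2 are viewing directions of a matched
    feature. Returns (point, gap); gap is the distance between the two lines
    at their closest approach (zero when the lines intersect).
    """
    w0 = [p - q for p, q in zip(c1, c2)]
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("lines are (nearly) parallel")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    point = [(x + y) / 2 for x, y in zip(p1, p2)]
    gap = math.dist(p1, p2)
    return point, gap
```

With perfect calibration and matching, the two lines intersect and the gap is zero; miscalibration shows up as a nonzero gap across many matched features.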
ACTIVE LEARNING METHOD FOR TEMPORAL ACTION LOCALIZATION IN UNTRIMMED VIDEOS
Various embodiments describe active learning methods for training temporal action localization models used to localize actions in untrimmed videos. A trainable active learning selection function is used to select the unlabeled samples that can improve the temporal action localization model the most. The selected unlabeled samples are then annotated and used to retrain the temporal action localization model. In some embodiments, the trainable active learning selection function includes a trainable performance prediction model that maps a video sample and a temporal action localization model to a predicted performance improvement for the temporal action localization model.
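One select-annotate-retrain round can be sketched as below. This is a hypothetical outline: `predict_improvement(video, model)` stands in for the trainable performance prediction model, `annotate` and `retrain` are placeholder callables supplied by the caller, and the fixed per-round `budget` is an assumption.

```python
def select_for_annotation(unlabeled, model, predict_improvement, budget):
    """Rank unlabeled videos by predicted improvement and keep the top `budget`."""
    ranked = sorted(unlabeled, key=lambda v: predict_improvement(v, model), reverse=True)
    return ranked[:budget]


def active_learning_round(model, labeled, unlabeled, predict_improvement,
                          annotate, retrain, budget):
    """One round: select, annotate, retrain. Returns (model, labeled, remaining)."""
    chosen = select_for_annotation(unlabeled, model, predict_improvement, budget)
    labeled = labeled + [annotate(v) for v in chosen]
    remaining = [v for v in unlabeled if v not in chosen]
    return retrain(model, labeled), labeled, remaining
```

Because the predictor receives the current model as an argument, its ranking can change from round to round as the localization model improves.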
FEATURE MATCHING WITH A SUBSPACE SPANNED BY MULTIPLE REPRESENTATIVE FEATURE VECTORS
Methods, systems, and devices for object recognition are described. A device may generate a subspace based at least in part on a set of representative feature vectors for an object. The device may obtain an array of pixels representing an image. The device may determine a probe feature vector for the image by applying a convolutional operation to the array of pixels. The device may create a reconstructed feature vector in the subspace based at least in part on the set of representative feature vectors and the probe feature vector. The device may compare the reconstructed feature vector and the probe feature vector and recognize the object in the image based at least in part on the comparison. For example, the described techniques may support pose invariant facial recognition or other such object recognition applications.
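The subspace reconstruction and comparison can be sketched with an orthonormal basis (Gram-Schmidt) and an orthogonal projection. This is an illustrative sketch, not the patented implementation: it assumes the probe feature vector has already been produced by the convolutional operation, uses cosine similarity as the comparison, and the 0.9 threshold and function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, eps=1e-10):
    """Gram-Schmidt over the representative vectors; drops linearly dependent ones."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = dot(w, b)
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


def reconstruct(probe, basis):
    """Orthogonal projection of the probe onto the spanned subspace."""
    out = [0.0] * len(probe)
    for b in basis:
        coeff = dot(probe, b)
        out = [o + coeff * bi for o, bi in zip(out, b)]
    return out


def matches(probe, representatives, threshold=0.9):
    """Compare probe and reconstruction by cosine similarity (hypothetical threshold)."""
    recon = reconstruct(probe, orthonormal_basis(representatives))
    denom = math.sqrt(dot(recon, recon) * dot(probe, probe))
    similarity = dot(recon, probe) / denom if denom else 0.0
    return similarity >= threshold, similarity
```

If the representatives cover several poses of one face, a probe from an unseen pose that is roughly a combination of them reconstructs well, which is the intuition behind the pose-invariance claim.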
GENERATIVE AUGMENTATION OF IMAGE DATA
Systems and methods to receive one or more first images associated with a training set of images to train a machine learning model; provide the one or more first images as a first input to a first set of layers of computational units, wherein the first set of layers utilizes image filters; provide a first output of the first set of layers of computational units as a second input to a second layer of the computational units, wherein the second layer utilizes random parameter sets for computations; obtain distortion parameters from the second layer of the computational units; generate one or more second images comprising a representation of the one or more first images modified with the distortion parameters; obtain, as a third output, the one or more second images; and add the one or more second images to the training set of images to train the machine learning model.
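The flow of filter layers, then a random-parameter layer, then distortion and augmentation can be sketched without a neural network library. This is a loose, hypothetical stand-in: a 3x3 mean filter plays the role of the image-filter layers, two randomly drawn weights play the role of the random parameter sets, and the only distortions are a horizontal shift and a brightness gain. None of these specifics come from the abstract.

```python
import random


def box_filter(img):
    """3x3 mean filter with edge clamping (stand-in for the image-filter layers)."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            total = sum(img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                        for di in (-1, 0, 1) for dj in (-1, 0, 1))
            row.append(total / 9)
        out.append(row)
    return out


def distortion_params(features, rng):
    """Stand-in for the random-parameter layer: random weights turn a pooled
    feature into a horizontal shift (pixels) and a brightness gain."""
    pooled = sum(map(sum, features)) / (len(features) * len(features[0]) * 255)
    w_shift, w_gain = rng.uniform(-1, 1), rng.uniform(-0.2, 0.2)
    return {"dx": round(2 * w_shift * pooled), "gain": 1 + w_gain * pooled}


def apply_distortion(img, params):
    """Shift columns by dx (clamping at the border), scale, and clip to [0, 255]."""
    w = len(img[0])
    dx, gain = params["dx"], params["gain"]
    return [[min(255.0, max(0.0, row[min(max(j - dx, 0), w - 1)] * gain))
             for j in range(w)] for row in img]


def augment(training_set, n_new, seed=0):
    """Append `n_new` distorted copies of randomly chosen training images."""
    rng = random.Random(seed)
    generated = []
    for _ in range(n_new):
        img = rng.choice(training_set)
        params = distortion_params(box_filter(img), rng)
        generated.append(apply_distortion(img, params))
    return training_set + generated
```

The generated images keep the originals' labels, so the augmented set can be fed to the same training loop.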
IMAGE QUALITY ASSESSMENT AND IMPROVEMENT FOR PERFORMING OPTICAL CHARACTER RECOGNITION
Techniques are disclosed for performing optical character recognition (OCR) by assessing and improving the quality of electronic documents before performing the OCR. For example, a method for identifying information in an electronic document includes obtaining a reference image of the electronic document, distorting the reference image by adjusting different sets of one or more parameters associated with a quality of the reference image to generate a plurality of distorted images, analyzing each distorted image to detect the adjusted set of parameters and corresponding adjusted values, determining an accuracy of detection of the set of parameters and the adjusted values, and training a model based at least on the plurality of distorted images and the accuracy of the detection, wherein the trained model determines at least a first technique for adjusting a set of parameters in a second image to prepare the second image for optical character recognition.
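The distort-detect-score portion of this method can be sketched with two quality parameters. This is a hypothetical example: brightness and contrast are assumed as the adjusted parameters, the detector simply matches mean and standard deviation against the reference, and the model-training step is not sketched. Clipping to [0, 255] is what makes detection fail for strong distortions, which is exactly the kind of accuracy signal the abstract describes.

```python
import itertools
import statistics


def distort(img, brightness, contrast):
    """pixel' = contrast * (pixel - 128) + 128 + brightness, clipped to [0, 255]."""
    return [[min(255.0, max(0.0, contrast * (p - 128) + 128 + brightness)) for p in row]
            for row in img]


def detect(reference, distorted):
    """Estimate (brightness, contrast) by matching mean and spread to the reference.
    Assumes contrast > 0 and a non-uniform reference (non-zero standard deviation)."""
    ref = [p for row in reference for p in row]
    dis = [p for row in distorted for p in row]
    contrast = statistics.pstdev(dis) / statistics.pstdev(ref)
    brightness = statistics.fmean(dis) - (contrast * (statistics.fmean(ref) - 128) + 128)
    return brightness, contrast


def detection_accuracy(reference, brightness_levels, contrast_levels, tol=1e-6):
    """Fraction of parameter sets whose values are recovered within `tol`."""
    param_sets = list(itertools.product(brightness_levels, contrast_levels))
    hits = 0
    for b, c in param_sets:
        est_b, est_c = detect(reference, distort(reference, b, c))
        hits += abs(est_b - b) <= tol and abs(est_c - c) <= tol
    return hits / len(param_sets)
```

Mild settings are recovered exactly, while a large brightness offset saturates every pixel at 255 and the parameters can no longer be detected.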
MACHINE LEARNING ARTIFICIAL CHARACTER GENERATION
Embodiments of the technology discussed herein address problems of traditional electronic character recognition training by artificially generating handwriting in a unique way according to machine learning techniques that transform handwriting samples according to generative rules and discriminative rules. Solutions provided herein produce a wide range of artificially generated handwriting that appears to be human generated handwriting. As such, embodiments herein provide additional characters for a system's character bank that are obtained more efficiently, as compared to traditional techniques. Further, embodiments herein are designed to be suitable for machine learning, and as such, the techniques grow ever more efficient as the techniques are performed. In short, the solutions provided herein improve the computing technology itself in a manner that makes robust electronic Chinese character recognition feasible.
Methods and Systems for Testing an Optical Character Recognition Model
Systems and methods for generating optical character recognition (OCR) models configured to identify characters from a variety of different documents. The OCR models are based on a base model. One or more outside models can be tested to determine their effectiveness in supplementing the base model. When an outside model is effective it is incorporated into the base model. Generations of base models can be created that provide for additional functionality that is not present in the preceding model. A family of the generations of base models is maintained.
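The test-then-incorporate loop and the family of generations can be sketched as follows. The merge rule is an assumption made for illustration (the outside model answers only where the base model abstains by returning `None`), as are the accuracy-based effectiveness test and the generation names; the abstract does not say how models are combined or scored.

```python
def accuracy(predict, samples):
    """Fraction of (input, expected) pairs the predictor gets right."""
    return sum(predict(x) == y for x, y in samples) / len(samples)


def supplement(base, outside):
    """Hypothetical merge rule: use the outside model only where the base abstains (None)."""
    def predict(x):
        y = base(x)
        return outside(x) if y is None else y
    return predict


def evolve(base, outside_models, test_set):
    """Test each outside model; incorporate it only if it improves accuracy.
    Returns the final base model and the family of generations as (name, model)."""
    family = [("gen0", base)]
    current = base
    for name, outside in outside_models:
        candidate = supplement(current, outside)
        if accuracy(candidate, test_set) > accuracy(current, test_set):
            current = candidate
            family.append((f"gen{len(family)}+{name}", current))
    return current, family
```

Keeping every accepted generation in `family` makes it possible to roll back to an earlier base model if a later one regresses on new documents.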