G06V10/7753

SCENE PARSING

An embodiment partitions, using a trained image segmentation model, an input image into a plurality of patches. An embodiment generates, using a vision transformer model, a plurality of patch embeddings, each patch embedding comprising a multidimensional numerical representation of a patch in the plurality of patches. An embodiment generates, using a trained patch-label similarity model, a plurality of word embeddings corresponding to the plurality of patch embeddings. An embodiment generates, using a trained label prediction model and the plurality of word embeddings, a text label corresponding to the input image.
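The patch-to-label pipeline described above can be sketched end to end. Everything below is a toy stand-in: random linear maps take the place of the trained vision transformer, patch-label similarity, and label prediction models, and the names `embed_patches`, `patch_to_word`, and `predict_label` are illustrative, not from the disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_patches(patches, dim=8):
    # Stand-in for the vision transformer: one embedding per patch.
    # A real model attends across patches; a fixed random projection
    # is used here purely to illustrate the data flow.
    proj = rng.standard_normal((patches.shape[1], dim))
    return patches @ proj

def patch_to_word(patch_embs, W):
    # Trained patch-label similarity model, modeled as a linear map
    # from patch-embedding space into word-embedding space.
    return patch_embs @ W

def predict_label(word_embs, label_embs, labels):
    # Label prediction: pool the per-patch word embeddings and pick
    # the label whose embedding is most similar (cosine similarity).
    pooled = word_embs.mean(axis=0)
    sims = label_embs @ pooled / (
        np.linalg.norm(label_embs, axis=1) * np.linalg.norm(pooled) + 1e-9)
    return labels[int(np.argmax(sims))]

# Toy run: 4 patches of 16 flattened pixels each.
patches = rng.standard_normal((4, 16))
patch_embs = embed_patches(patches)
W = rng.standard_normal((8, 8))
word_embs = patch_to_word(patch_embs, W)
label_embs = rng.standard_normal((3, 8))
label = predict_label(word_embs, label_embs, ["street", "beach", "forest"])
```

The pooling-then-nearest-label step is one plausible reading of "generates a text label from the word embeddings"; the disclosure does not fix the pooling.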

AUTOMATED GATE DRAWING IN FLOW CYTOMETRY DATA
20250297939 · 2025-09-25 ·

A device receives cell representations produced by a flow cytometry machine from a sample (e.g., blood or bone marrow) of a patient, the cell representations comprising location data and organized in a graph based on the location data, each cell representation corresponding to a cell of the sample. The device inputs the cell representations into a supervised machine learning model, and receives, from the supervised machine learning model, classifications of cell type for each of the cell representations. The device applies an unsupervised machine learning model to the classified cell representations, the unsupervised machine learning model outputting different gates for each of the classifications, the different gates forming an intersection. The device reapplies the unsupervised machine learning model to cell representations within the intersection until the intersection is eliminated.
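The iterative gate-drawing loop can be sketched with axis-aligned boxes as gates. This is a minimal stand-in: quantile-trimmed bounding boxes replace the unsupervised model, and "reapply until the intersection is eliminated" is read as tightening the trim until the per-class boxes no longer overlap.

```python
import numpy as np

def fit_gate(points, q=0.0):
    # Gate = axis-aligned bounding box over one class's cell locations,
    # trimmed by quantile q on each side (stand-in for the unsupervised
    # gate-drawing model in the abstract).
    lo = np.quantile(points, q, axis=0)
    hi = np.quantile(points, 1 - q, axis=0)
    return lo, hi

def boxes_intersect(a, b):
    (alo, ahi), (blo, bhi) = a, b
    return bool(np.all(ahi >= blo) and np.all(bhi >= alo))

def draw_gates(points, labels, max_iter=20):
    # Refit gates with progressively tighter quantiles until the
    # per-class gates no longer intersect (or the budget runs out).
    classes = np.unique(labels)
    for it in range(max_iter):
        q = 0.02 * it
        gates = {c: fit_gate(points[labels == c], q) for c in classes}
        pairs = [(a, b) for i, a in enumerate(classes) for b in classes[i + 1:]]
        if not any(boxes_intersect(gates[a], gates[b]) for a, b in pairs):
            return gates
    return gates

# Two synthetic cell populations in 2-D location space.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])
labs = np.array([0] * 100 + [1] * 100)
gates = draw_gates(pts, labs)
```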

AUTOMATED ASSESSMENT OF MACHINE LEARNING MODELS USING SYNTHESIZED DATA WITH DIFFERENT CONTEXTS
20250299475 · 2025-09-25 ·

To assess a machine learning (ML) model for possible inaccuracies, different, synthesized trial scenes are applied to the ML model. Each trial scene includes a target object of interest, plus a different surrounding context. The ML model takes the scenes as input and makes a corresponding prediction. The prediction is affected by both the target object and the surrounding context. Synthesizing many trial scenes with different contexts allows an assessment of the effect of different contexts on the ML prediction. The predictions and corresponding contexts are analyzed to assess the behavior of the ML model. For example, assume that the ML model has some inaccuracy that shows up in some trial scenes but not others. The contexts for the trial scenes that exhibit the inaccuracy may then be compared with the contexts for the trial scenes that do not.
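The assessment procedure can be illustrated by running one target object under several synthesized contexts and grouping predictions by context. The `brittle_model` below is hypothetical, contrived to fail on bright backgrounds so the context comparison has something to expose.

```python
import numpy as np

def synthesize_scene(target, context):
    # Trial scene = target object pasted onto a context background.
    scene = context.copy()
    h, w = target.shape
    scene[:h, :w] = target
    return scene

def assess_by_context(model, target, contexts):
    # Run the model on the same target under each context and group
    # the predictions by context, exposing context-driven errors.
    return {name: model(synthesize_scene(target, ctx))
            for name, ctx in contexts.items()}

# Hypothetical model with a context-dependent flaw: it reports
# "object" only when the scene's mean intensity is low.
def brittle_model(scene):
    return "object" if scene.mean() < 0.5 else "background"

target = np.zeros((4, 4))                 # dark target object
contexts = {"dark": np.zeros((8, 8)),
            "bright": np.ones((8, 8))}
report = assess_by_context(brittle_model, target, contexts)
# report shows the same target classified differently per context.
```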

Character recognition model training method and apparatus, character recognition method and apparatus, device and storage medium

The present disclosure provides a character recognition model training method and apparatus, a character recognition method and apparatus, a device and a medium, relating to the technical field of artificial intelligence, and specifically to the technical fields of deep learning, image processing and computer vision, which can be applied to scenarios such as character detection and recognition. The specific implementation is: partitioning an untagged training sample into at least two sub-sample images; dividing the at least two sub-sample images into a first training set and a second training set, where the first training set includes a first sub-sample image with a visible attribute and the second training set includes a second sub-sample image with an invisible attribute; and performing self-supervised training on a to-be-trained encoder by taking the second training set as a tag of the first training set, to obtain a target encoder.
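The visible/invisible split that drives the self-supervised objective can be sketched as follows. The patch size and mask ratio are illustrative assumptions, and the actual encoder training step is omitted.

```python
import numpy as np

def partition(image, patch):
    # Partition an untagged training sample into sub-sample images.
    H, W = image.shape
    return [image[r:r + patch, c:c + patch]
            for r in range(0, H, patch) for c in range(0, W, patch)]

def split_visible_invisible(patches, mask_ratio, rng):
    # First training set: visible sub-sample images (encoder input).
    # Second training set: invisible sub-sample images, used as the
    # "tag" (self-supervised target) for the visible ones.
    idx = rng.permutation(len(patches))
    n_mask = int(len(patches) * mask_ratio)
    invisible = [patches[i] for i in idx[:n_mask]]
    visible = [patches[i] for i in idx[n_mask:]]
    return visible, invisible

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
patches = partition(img, 4)              # four 4x4 sub-sample images
vis, invis = split_visible_invisible(patches, 0.5, rng)
```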

Method of augmenting the number of labeled images for training a neural network

A method of augmenting the number of labeled images for training a neural network, comprising the steps of:
starting from a dataset of labeled images with corresponding segmentation masks and a dataset of unlabeled images, gathering, for a given image i in the dataset of labeled images, a number of images with similar metadata in said dataset of unlabeled images, so as to form a data sub-set Sim i;
training a multiclass segmentation neural network on said labeled images, thereby generating segmentation masks for the images in sub-set Sim i;
on the basis of these segmentation masks, judging similarity between the images of Sim i and image i, and finding the most similar image(s) in Sim i by computing and comparing histograms of the segmentation masks of image i and of the images in Sim i; and
transferring the histogram of the most similar image(s) in Sim i to the given image i.
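The histogram comparison step can be sketched as follows, with class-frequency histograms of the segmentation masks and an L1 distance; the distance metric here is an assumption, as the abstract does not fix one.

```python
import numpy as np

def mask_histogram(mask, n_classes):
    # Normalized class-frequency histogram of a segmentation mask.
    h = np.bincount(mask.ravel(), minlength=n_classes).astype(float)
    return h / h.sum()

def most_similar(ref_mask, candidate_masks, n_classes):
    # Compare histograms (L1 distance) and return the index of the
    # candidate whose mask histogram is closest to the reference.
    ref = mask_histogram(ref_mask, n_classes)
    dists = [np.abs(mask_histogram(m, n_classes) - ref).sum()
             for m in candidate_masks]
    return int(np.argmin(dists))

ref = np.array([[0, 0], [1, 1]])          # 50% class 0, 50% class 1
cands = [np.array([[0, 0], [0, 0]]),      # all class 0
         np.array([[0, 1], [1, 0]]),      # 50/50: closest to ref
         np.array([[1, 1], [1, 1]])]      # all class 1
best = most_similar(ref, cands, n_classes=2)
```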

Provable guarantees for self-supervised deep learning with spectral contrastive loss

A method for self-supervised learning is described. The method includes generating a plurality of augmented data from unlabeled image data. The method also includes generating a population augmentation graph for a class determined from the plurality of augmented data. The method further includes minimizing a contrastive loss based on a spectral decomposition of the population augmentation graph to learn representations of the unlabeled image data. The method also includes classifying the learned representations of the unlabeled image data to recover ground-truth labels of the unlabeled image data.
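The spectral contrastive loss has a compact closed form over a batch: minus twice the mean positive-pair similarity, plus the mean squared negative-pair similarity. A minimal numpy version, assuming `z1[i]` and `z2[i]` embed two augmentations of the same image:

```python
import numpy as np

def spectral_contrastive_loss(z1, z2):
    # z1, z2: (n, d) embeddings of two augmentations of the same n images.
    # Loss = -2 E[f(x)^T f(x+)] + E[(f(x)^T f(x'))^2], where (x, x+) are
    # positive pairs and (x, x') range over negative pairs.
    n = z1.shape[0]
    pos = np.sum(z1 * z2, axis=1).mean()        # positive-pair term
    gram = z1 @ z2.T
    off = gram[~np.eye(n, dtype=bool)]          # negative-pair similarities
    neg = (off ** 2).mean()
    return -2.0 * pos + neg

# Orthogonal unit embeddings: perfectly aligned positives, zero
# negative similarity, so the loss reaches -2.
z = np.eye(2)
loss = spectral_contrastive_loss(z, z)          # equals -2.0 here
```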

MACHINE LEARNING USING CATEGORICAL UNCERTAINTY SAMPLING

A system identifies and selects the best image to use to retrain a machine learning algorithm, thereby creating the best model. The best image is identified by determining which image, in an object-identification task, has the most uncertainty. In addition to determining uncertainties for images, the system uses priority scores, groups and orders subsets of images by the priority scores, computes a complement-of-recall or difficulty measure, and selects images based on the difficulty measure.
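Uncertainty-based selection can be sketched with predictive entropy as the uncertainty measure; entropy is one common choice, and an assumption here, since the abstract does not fix the measure.

```python
import numpy as np

def entropy(probs):
    # Categorical predictive entropy; higher means more uncertain.
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def select_most_uncertain(pred_probs, k=1):
    # Pick the k images whose predicted class distributions have the
    # highest entropy, a stand-in for the "most uncertainty" criterion.
    scores = entropy(np.asarray(pred_probs))
    return np.argsort(-scores)[:k].tolist()

preds = [[0.98, 0.01, 0.01],   # confident
         [0.34, 0.33, 0.33],   # near-uniform: most uncertain
         [0.70, 0.20, 0.10]]
chosen = select_most_uncertain(preds, k=1)
```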

Semantic image fill at high resolutions

Semantic fill techniques are described that support generating fill and editing images from semantic inputs. A user input, for example, is received by a semantic fill system that indicates a selection of a first region of a digital image and a corresponding semantic label. The user input is utilized by the semantic fill system to generate a guidance attention map of the digital image. The semantic fill system leverages the guidance attention map to generate a sparse attention map of a second region of the digital image. A semantic fill of pixels is generated for the first region based on the semantic label and the sparse attention map. The edited digital image is displayed in a user interface.
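A minimal sketch of label-guided fill, assuming the sparse attention map reduces to "attend only to pixels outside the hole that carry the requested semantic label" and the fill is their mean value; a real model predicts pixels rather than averaging.

```python
import numpy as np

def semantic_fill(image, region_mask, label_map, label):
    # Sparse-attention stand-in: source pixels are those outside the
    # selected region that already carry the requested label; the
    # region is filled with their mean value.
    out = image.copy()
    src = (label_map == label) & ~region_mask
    out[region_mask] = image[src].mean(axis=0)
    return out

img = np.array([[1., 1., 5.],
                [1., 0., 5.],
                [1., 1., 5.]])
hole = np.zeros((3, 3), bool)
hole[1, 1] = True                 # user-selected first region
labels = np.array([[0, 0, 1],
                   [0, 0, 1],
                   [0, 0, 1]])    # user tags the hole with label 0
filled = semantic_fill(img, hole, labels, label=0)
```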

System and method of training vision transformer on small-scale datasets

A deep learning training system and method includes an imaging system for capturing medical images, a machine learning engine, and a display. The machine learning engine selects a small-scale set of images from a training dataset, generates global views by randomly selecting regions in one image, generates local views by randomly selecting regions covering less than a majority of the image, receives the generated global views as a first sequence of non-overlapping image patches, receives the generated global views and the generated local views as a second sequence of non-overlapping image patches, and trains parameters in a student-teacher network to predict a class of objects by self-supervised view prediction using the first sequence and the second sequence. The teacher parameters are updated via an exponential moving average of the student network parameters. The parameters in the teacher network are transferred to the vision transformer, and the vision transformer is trained by supervised learning.
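The teacher update is the most concretely specified step: an exponential moving average of the student parameters. A minimal sketch, where the momentum value 0.996 is an illustrative assumption:

```python
import numpy as np

def ema_update(teacher, student, momentum=0.996):
    # Teacher parameters track an exponential moving average of the
    # student parameters; no gradients flow into the teacher.
    return {k: momentum * teacher[k] + (1 - momentum) * student[k]
            for k in teacher}

teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
for _ in range(10):                 # teacher drifts slowly toward student
    teacher = ema_update(teacher, student)
```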

Few-shot object detection method

A few-shot object detection method includes: sending a weight of a backbone network and a weight of a feature pyramid to a detection network; generating candidate regions, in which the candidate regions are derived from a result of foreground-and-background classification and regression, by a region proposal network, of output features of the visual representation backbone network; generating candidate region features of a uniform size using a pooling operator based on the candidate regions, and performing location regression, content classification and fine-grained feature mining on the candidate region features of the uniform size; establishing fine-grained positive sample pairs and negative sample pairs through the fine-grained feature mining, and performing contrastive learning between fine-grained features of the candidate regions; and generating a loss function according to a strategy in the fine-grained feature mining, and updating detection network parameters based on the loss function.
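The contrastive step over candidate-region features can be sketched in a supervised-contrastive style, assuming same-class region pairs are positives and different-class pairs are negatives; the temperature and pairing strategy are illustrative, not taken from the claim.

```python
import numpy as np

def pair_contrastive_loss(feats, labels, tau=0.1):
    # Contrastive objective over candidate-region features: same-class
    # pairs are positives (pulled together), different-class pairs are
    # negatives (pushed apart), InfoNCE-style with temperature tau.
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T / tau
    n = len(labels)
    loss, count = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        pos = [j for j in others if labels[j] == labels[i]]
        if not pos:
            continue
        denom = np.sum(np.exp(sim[i, others]))
        for j in pos:
            loss += -np.log(np.exp(sim[i, j]) / denom)
            count += 1
    return loss / max(count, 1)

feats = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
good = pair_contrastive_loss(feats, [0, 0, 1, 1])   # pairs match features
bad = pair_contrastive_loss(feats, [0, 1, 0, 1])    # pairs fight features
```

Well-separated features with matching labels yield a much lower loss than the same features with mismatched labels, which is the signal the fine-grained mining strategy exploits.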