G06V10/464

METHODS AND ARRANGEMENTS FOR IDENTIFYING OBJECTS

In some arrangements, product packaging is digitally watermarked over most of its extent to facilitate high-throughput item identification at retail checkouts. Imagery captured by conventional or plenoptic cameras can be processed (e.g., by GPUs) to derive several different perspective-transformed views—further minimizing the need to manually reposition items for identification. Crinkles and other deformations in product packaging can be optically sensed, allowing such surfaces to be virtually flattened to aid identification. Piles of items can be 3D-modelled and virtually segmented into geometric primitives to aid identification, and to discover locations of obscured items. Other data (e.g., including data from sensors in aisles, shelves and carts, and gaze tracking for clues about visual saliency) can be used in assessing identification hypotheses about an item. Logos may be identified and used—or ignored—in product identification. A great variety of other features and arrangements are also detailed.
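Deriving a perspective-transformed view amounts to applying a 3×3 homography to image coordinates. As a hedged illustration only (the abstract's GPU pipeline is not specified; the function name and the numpy implementation are assumptions for exposition), a homography can be applied to points like this:

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography H to an array of (x, y) points.

    Points are lifted to homogeneous coordinates, multiplied by H,
    and divided by the resulting w component.
    """
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # (x, y) -> (x, y, 1)
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]                   # perspective divide
```

A real checkout system would warp whole images on the GPU; the point form above is just the underlying arithmetic.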

SYSTEMS, METHODS, AND APPARATUSES FOR IMPLEMENTING A SELF-SUPERVISED CHEST X-RAY IMAGE ANALYSIS MACHINE-LEARNING MODEL UTILIZING TRANSFERABLE VISUAL WORDS

Not only is annotating medical images tedious and time-consuming, but it also demands costly, specialty-oriented expertise that is not easily accessible. To address this challenge, a new self-supervised framework is introduced: TransVW (transferable visual words). It exploits the prowess of transfer learning with convolutional neural networks and the unsupervised nature of visual word extraction with bags of visual words, resulting in an annotation-efficient solution to medical image analysis. TransVW was evaluated on NIH ChestX-ray14 to demonstrate its annotation efficiency. Compared with training from scratch and ImageNet-based transfer learning, TransVW reduces annotation effort by 75% and 12%, respectively, in addition to significantly accelerating convergence. More importantly, TransVW sets new records: the best average AUC across all 14 diseases, the best individual AUC scores on 10 diseases, and the second-best individual AUC scores on 3 diseases. This performance is unprecedented, because heretofore no self-supervised learning method had outperformed ImageNet-based transfer learning, and no annotation reduction had been reported for self-supervised learning. These achievements are attributable to a simple yet powerful observation: the complex and recurring anatomical structures in medical images are natural visual words that can be automatically extracted, serving as strong yet free supervision signals for CNNs to learn generalizable and transferable image representations via self-supervision.
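The core observation — recurring anatomical patches can be clustered into "visual words" whose cluster indices serve as free pseudo-labels — can be illustrated with a toy k-means sketch. This is not the authors' pipeline; the function name, the tiny k-means, and the 2-D "patch features" below are illustrative assumptions:

```python
import numpy as np

def patch_pseudo_labels(patches, k, iters=10, seed=0):
    """Cluster patch feature vectors into k 'visual words' with a tiny
    k-means; the cluster index of each patch is a free pseudo-label
    that a CNN could be trained to predict, with no human annotation.
    """
    rng = np.random.default_rng(seed)
    centroids = patches[rng.choice(len(patches), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each patch to its nearest visual word
        labels = np.argmin(
            np.linalg.norm(patches[:, None, :] - centroids[None, :, :], axis=2),
            axis=1)
        # move each centroid to the mean of its assigned patches
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = patches[labels == j].mean(axis=0)
    return labels, centroids
```

In TransVW the "patches" would be recurring anatomical structures extracted from chest X-rays, and the pseudo-labels would supervise representation learning.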

Control method, information terminal, recording medium, and determination method

If a lesion included in a specification target image is a texture lesion, a probability image calculation unit calculates, for each of a plurality of pixels of the specification target image, a probability value indicating the probability that the pixel is included in a lesion area. An output unit calculates, as a candidate area, an area including pixels whose probability values are equal to or larger than a first threshold in the probability image obtained from the probability image calculation unit and, as a modification area, an area including pixels whose probability values fall within a certain probability range that includes the first threshold. An input unit detects an input from a user on a pixel in the modification area. A lesion area specification unit specifies a lesion area on the basis of the probability image, the candidate area, the modification area, and user operation information.
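The thresholding step described above — a candidate area where the probability meets the first threshold, and a modification area where the probability lies in a band around that threshold — might be sketched as follows (the function names, the band parameter, and the click-based editing are assumptions, not the patent's implementation):

```python
import numpy as np

def segment_lesion(prob_image, t1=0.5, band=0.1):
    """Split a per-pixel probability map into a candidate area
    (pixels at or above the first threshold t1) and a modification
    area (borderline pixels within `band` of t1 that the user may
    correct)."""
    candidate = prob_image >= t1
    modification = np.abs(prob_image - t1) <= band
    return candidate, modification

def apply_user_edits(candidate, modification, clicks_add, clicks_remove):
    """Toggle borderline pixels according to user clicks, given as
    (row, col) pairs; edits are honored only inside the modification
    area."""
    lesion = candidate.copy()
    for r, c in clicks_add:
        if modification[r, c]:
            lesion[r, c] = True
    for r, c in clicks_remove:
        if modification[r, c]:
            lesion[r, c] = False
    return lesion
```

Restricting edits to the modification area keeps confident pixels stable while letting the user resolve only the genuinely ambiguous ones.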

Fault-tolerance to provide robust tracking for autonomous and non-autonomous positional awareness
10983527 · 2021-04-20

The described positional awareness techniques employ visual-inertial sensory data gathering and analysis hardware, described with reference to specific example implementations. They implement improvements in the use of sensors, techniques, and hardware design that can enable specific embodiments to provide positional awareness to machines with improved speed and accuracy.

Object authentication device and object authentication method
10997972 · 2021-05-04

An object authentication device includes a speech recognition unit configured to obtain candidates for a speech recognition result for an input speech, together with a likelihood of the speech as a speech likelihood, and an image model generation unit configured to obtain image models of a predetermined number of candidates for the speech recognition result in descending order of speech likelihood. When the image models for the candidates for the speech recognition result are generated, the image model generation unit first performs retrieval from an image model database storing the image models, and generates an image model from information acquired from a network if the image model is not stored in the image model database.
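The retrieve-then-fetch behavior of the image model generation unit is essentially a cache lookup with a network fallback. A minimal sketch, assuming a dict-backed database and a caller-supplied fetch function (both illustrative):

```python
def image_models_for_candidates(candidates, k, db, fetch_from_network):
    """candidates: list of (text, likelihood) pairs from the speech
    recognizer; db: dict mapping text -> image model.

    Returns image models for the k candidates with the highest speech
    likelihoods, falling back to a network fetch on a database miss
    and caching the result.
    """
    top = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    models = {}
    for text, _ in top:
        model = db.get(text)
        if model is None:
            model = fetch_from_network(text)  # e.g. build from web imagery
            db[text] = model                  # cache for next time
        models[text] = model
    return models
```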

METHOD, APPARATUS, AND DEVICE FOR PROCESSING IMAGE AND STORAGE MEDIUM
20210124974 · 2021-04-29

The present disclosure provides a method, an apparatus, a device, and a storage medium for processing an image. According to an example of the method, feature extraction is performed n times on a to-be-processed image to obtain a pixel feature recognition result. Each time of feature extraction includes p data nodes. In each time of feature extraction, the data on each data node except the last data node is determined based on the input data of that time of feature extraction and on data obtained by processing, via a preset feature extraction method in that time of feature extraction, the data on each of the data nodes before that data node. The feature extraction method used in each time of feature extraction comprises the feature extraction methods used for the data on the data nodes in that time of feature extraction. Thereafter, the to-be-processed image can be processed based on the pixel feature recognition result.
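One reading of the node-wise dependency — each node in a pass is computed from that pass's input plus the processed data of the earlier nodes — resembles dense connectivity. A toy scalar sketch under that assumed reading (the real method operates on image features, not numbers, and the function names are invented):

```python
def dense_stage(x, p, f):
    """One feature-extraction pass with p data nodes: each node is
    computed by applying the preset method f to the pass input plus
    the data of all earlier nodes in that pass."""
    nodes = []
    for _ in range(p):
        nodes.append(f(x + sum(nodes)))
    return nodes[-1]

def extract_features(x, n, p, f):
    """n cascaded feature-extraction passes; each pass's result is the
    input of the next, yielding the final feature recognition result."""
    for _ in range(n):
        x = dense_stage(x, p, f)
    return x
```

With `f = lambda v: 2 * v`, a single stage with three nodes on input 1 produces 2, 6, 18 in turn, showing how later nodes aggregate all earlier ones.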

Optimizing 360-degree video streaming with video content analysis

Aspects of the subject disclosure may include, for example, a method performed by a processing system that determines a present orientation of a display region presented at a first time on a display of a video viewer and predicts, based on collected data, a future orientation of the display region occurring at a second time, to obtain a predicted orientation of the display region to be presented at the second time. Based on the predicted orientation, the method identifies a first group of tiles from a video frame of a panoramic video being displayed by the video viewer, wherein the first group of tiles covers the display region in the video frame at the predicted orientation, and identifies a plurality of objects moving in the video frame from the first time to the second time, wherein each object of the plurality of objects is located in a separate spatial region of the video frame at the second time, a second group of tiles collectively covers the separate spatial regions, and the tiles in the first group and the second group are different. The method then facilitates wireless transmission of the first group of tiles and a second tile from the second group of tiles, for presentation at the video viewer at the second time. Other embodiments are disclosed.
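Mapping a predicted display region and moving-object regions onto a tile grid can be sketched with simple rectangle-to-grid arithmetic. The grid size, tile size, and function names below are illustrative assumptions, not the disclosed system:

```python
def tiles_covering(region, grid_w, grid_h, tile_w, tile_h):
    """Return the set of (col, row) tile indices overlapping a region
    given as (x, y, w, h) in frame pixels."""
    x, y, w, h = region
    return {(c, r)
            for c in range(x // tile_w, min((x + w - 1) // tile_w + 1, grid_w))
            for r in range(y // tile_h, min((y + h - 1) // tile_h + 1, grid_h))}

def select_tiles(predicted_view, object_regions, grid=(8, 4), tile=(240, 135)):
    """First group: tiles covering the predicted display region.
    Second group: tiles covering the moving objects' regions, minus
    any tiles already in the first group."""
    gw, gh = grid
    tw, th = tile
    first = tiles_covering(predicted_view, gw, gh, tw, th)
    second = set()
    for region in object_regions:
        second |= tiles_covering(region, gw, gh, tw, th)
    return first, second - first
```

Transmitting the first group plus selected tiles from the second group lets the viewer render both the predicted viewport and objects that may draw the viewer's gaze.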

Framework for Training Machine-Learned Models on Extremely Large Datasets

A MapReduce-based training framework exploits both data parallelism and model parallelism to scale the training of complex models. Particular model architectures facilitate, and benefit from, the use of such a training framework. As one example, a machine-learned model can include a shared feature extraction portion configured to receive and process a data input to produce an intermediate feature representation, and a plurality of prediction heads configured to receive and process the intermediate feature representation to respectively produce a plurality of predictions. For example, the data input can be a video and the plurality of predictions can be a plurality of classifications for the content of the video (e.g., relative to a plurality of classes).
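The shared-backbone-plus-heads shape can be sketched in a few lines of numpy: one feature extractor produces the intermediate representation, and each head maps it to its own prediction. The layer sizes and the linear/softmax choices are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_backbone(x, w):
    """Shared feature-extraction portion: a linear map + ReLU
    producing the intermediate feature representation."""
    return np.maximum(x @ w, 0.0)

def prediction_head(features, w):
    """One prediction head: a linear map + softmax over its classes."""
    logits = features @ w
    e = np.exp(logits - logits.max())
    return e / e.sum()

x = rng.normal(size=4)                               # the data input
w_shared = rng.normal(size=(4, 8))                   # shared weights
head_ws = [rng.normal(size=(8, k)) for k in (3, 5)]  # e.g. topic + genre heads

features = shared_backbone(x, w_shared)              # computed once
predictions = [prediction_head(features, w) for w in head_ws]
```

Because the backbone runs once per input while heads run independently, the heads are a natural unit of model parallelism.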

AUTOMATED CATEGORIZATION AND ASSEMBLY OF LOW-QUALITY IMAGES INTO ELECTRONIC DOCUMENTS

An apparatus includes a memory and a processor. The memory stores document categories, text generated from an image of a physical document page, and a machine learning algorithm. The machine learning algorithm is configured to extract a first plurality of features associated with natural language processing and a second plurality of features associated with the text. The machine learning algorithm is also configured to generate a feature vector that includes the first and second pluralities of features, and to generate, based on the feature vector, a set of probabilities, each of which is associated with a document category and indicates the probability that the physical document from which the text was generated belongs to that document category. The processor applies the machine learning algorithm to the text to generate the set of probabilities, identifies the largest probability, and assigns the image to the associated document category.
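The probability-then-argmax assignment might look like the following sketch, where a toy keyword-overlap scorer stands in for the patent's feature-vector-based machine learning algorithm (all names and the softmax choice are assumptions):

```python
import math

def categorize(text, categories, score):
    """Score each category for the text, turn the scores into a set of
    probabilities with a softmax, and assign the category with the
    largest probability."""
    scores = [score(text, c) for c in categories]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(categories)), key=lambda i: probs[i])
    return categories[best], probs

# Toy scoring function (a stand-in for the learned feature extractor):
# count keyword overlaps between the text and a category lexicon.
def keyword_score(text, category):
    lexicon = {"invoice": {"amount", "due", "invoice"},
               "contract": {"party", "agreement", "term"}}
    return float(len(set(text.lower().split()) & lexicon[category]))
```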

ANALYZING CONTENT OF DIGITAL IMAGES
20210049401 · 2021-02-18

Methods, apparatuses, and embodiments related to analyzing the content of digital images are disclosed. A computer extracts multiple sets of visual features, which can be keypoints, based on an image of a selected object. Each of the multiple sets of visual features is extracted by a different visual feature extractor. The computer further extracts a visual word count vector based on the image of the selected object. An image query is executed based on the extracted visual features and the extracted visual word count vector to identify one or more candidate template objects of which the selected object may be an instance. When multiple candidate template objects are identified, a matching algorithm compares the selected object with the candidate template objects to determine the particular candidate template object of which the selected object is an instance.
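A visual word count vector is a bag-of-visual-words histogram: each local descriptor is assigned to its nearest vocabulary word, and the resulting counts can be compared against candidate templates. A minimal sketch, assuming a numpy vocabulary and cosine-similarity ranking (illustrative, not the patented matching algorithm):

```python
import numpy as np

def visual_word_counts(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word and
    return the word-count vector (a bag-of-visual-words histogram)."""
    counts = np.zeros(len(vocabulary), dtype=int)
    for d in descriptors:
        counts[np.argmin(np.linalg.norm(vocabulary - d, axis=1))] += 1
    return counts

def best_templates(query_counts, template_counts, k=2):
    """Rank candidate template objects by cosine similarity between
    their count vectors and the query's count vector."""
    q = query_counts / (np.linalg.norm(query_counts) + 1e-12)
    sims = [(name, float(q @ (c / (np.linalg.norm(c) + 1e-12))))
            for name, c in template_counts.items()]
    return sorted(sims, key=lambda s: s[1], reverse=True)[:k]
```

The top-ranked candidates would then be handed to the finer-grained matching algorithm that compares the selected object against each template.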