Patent classifications
G06V10/817
PERCEPTION DIVERSITY FOR IDENTIFICATION OF OBJECTS IN ROBOTICS SYSTEMS AND APPLICATIONS
The present disclosure relates to detecting objects in detection zones using multiple analysis techniques. The multiple analysis techniques may be used to analyze sensor data corresponding to the detection zones. The techniques may be selected such that at least two of them have computational diversity, performing different types of computational analyses on the sensor data with respect to each other, and at least two of them have implementation diversity, being implemented on different types of computing platforms with respect to each other.
Analyzing content of digital images
Methods, apparatuses, and embodiments related to analyzing the content of digital images. A computer extracts multiple sets of visual features, which can be keypoints, based on an image of a selected object. Each of the multiple sets of visual features is extracted by a different visual feature extractor. The computer further extracts a visual word count vector based on the image of the selected object. An image query is executed based on the extracted visual features and the extracted visual word count vector to identify one or more candidate template objects of which the selected object may be an instance. When multiple candidate template objects are identified, a matching algorithm compares the selected object with the candidate template objects to determine a particular candidate template of which the selected object is an instance.
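As an illustration of what a visual word count vector can look like, the following is a minimal sketch in the common bag-of-visual-words style. It assumes a precomputed codebook of visual words and simply counts how many feature descriptors fall nearest to each word. The function name, the tuple-based descriptors, and the squared Euclidean distance are assumptions made for this example and are not taken from the abstract.

```python
def visual_word_counts(descriptors, codebook):
    """Count how many descriptors are nearest to each codebook entry.

    descriptors: list of equal-length numeric tuples (hypothetical feature vectors)
    codebook:    list of equal-length numeric tuples (hypothetical visual words)
    """
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        # Index of the visual word with the smallest squared Euclidean distance.
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[k])),
        )
        counts[nearest] += 1
    return counts


if __name__ == "__main__":
    words = [(0.0, 0.0), (10.0, 10.0)]
    features = [(1.0, 1.0), (9.0, 9.0), (0.0, 2.0), (11.0, 10.0)]
    print(visual_word_counts(features, words))
```

The resulting vector has one entry per visual word, so two images can be compared by comparing their count vectors.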
Using SLAM 3D information to optimize training and use of deep neural networks for recognition and tracking of 3D object
A system for tracking of an inventory of products on one or more shelves includes a mobile device. The mobile device has an image sensor, at least one processor, and a non-transitory computer-readable medium having instructions that, when executed by the processor, cause the processor to: apply a simultaneous localization and mapping in three dimensions program, on images of a shelf input from the image sensor, to thereby generate a plurality of bounding boxes, each bounding box representing a three-dimensional location and boundaries of a product from the inventory; capture a plurality of two-dimensional images of the shelf; assign an identification to each product displayed in the plurality of two-dimensional images using a deep neural network; associate each identified product in a respective two-dimensional image with a corresponding bounding box; and associate each bounding box with a textual identifier signifying the identified product.
System and method for discriminating and demarcating targets of interest in a physical scene
Captured samples of a physical structure or other scene are mapped to a predetermined multi-dimensional coordinate space, and spatially-adjacent samples are organized into array cells representing subspaces thereof. Each cell is classified according to predetermined target-identifying criteria for the samples of the cell. A cluster of spatially-contiguous cells of common classification, peripherally bounded by cells of different classification, is constructed, and a boundary demarcation is defined from the peripheral contour of the cluster. The boundary demarcation is overlaid upon a visual display of the physical scene, thereby visually demarcating the boundaries of a detected target of interest.
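The clustering and demarcation steps described above can be sketched roughly as follows. This is a simplified two-dimensional illustration that assumes the cells have already been classified into a grid of labels. The 4-connectivity rule, the function names, and the label value 1 for "target" are assumptions made for this example, not details from the abstract.

```python
from collections import deque

NEIGHBOR_OFFSETS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def cluster_cells(grid, target=1):
    """Group 4-connected cells that share the `target` classification."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != target or (r, c) in seen:
                continue
            # Breadth-first flood fill from an unvisited target cell.
            queue = deque([(r, c)])
            seen.add((r, c))
            cluster = []
            while queue:
                cr, cc = queue.popleft()
                cluster.append((cr, cc))
                for dr, dc in NEIGHBOR_OFFSETS:
                    nr, nc = cr + dr, cc + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and grid[nr][nc] == target and (nr, nc) not in seen:
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            clusters.append(cluster)
    return clusters


def boundary_cells(cluster):
    """Return cluster cells with at least one neighbor outside the cluster."""
    members = set(cluster)
    return sorted(
        (r, c)
        for r, c in cluster
        if any((r + dr, c + dc) not in members for dr, dc in NEIGHBOR_OFFSETS)
    )


if __name__ == "__main__":
    labels = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    for found in cluster_cells(labels):
        print(boundary_cells(found))
```

The boundary cells form the peripheral contour of the cluster, which a display layer could then overlay on the scene.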
INFORMATION PROCESSING DEVICE, COMPUTER PROGRAM PRODUCT, AND INFORMATION PROCESSING METHOD
An information processing device includes one or more hardware processors configured to function as a detection unit, a collation unit, a voting unit, and a determination unit. The detection unit detects a tracking target region including a tracking target from a video frame. The collation unit collates the tracking target using a collation dictionary that stores identification information for identifying a collation target and acquires identification information for identifying a collation result in the frame of the tracking target from the collation dictionary. The voting unit obtains voting data by voting, for each of the tracking targets, the identification information for identifying the collation result obtained for each of the frames. The determination unit determines whether to settle collation based on the voting data and outputs identification information for identifying the settled collation result when the collation is settled.
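The voting and determination steps above might be sketched as follows. The abstract does not state the settling criterion, so the minimum vote count and minimum vote share used here are assumptions, as are the class and method names.

```python
from collections import Counter, defaultdict


class CollationVoter:
    """Accumulates per-frame collation results for each tracking target."""

    def __init__(self, min_votes=5, min_share=0.6):
        # Both thresholds are hypothetical settling criteria.
        self.min_votes = min_votes
        self.min_share = min_share
        self.votes = defaultdict(Counter)

    def vote(self, track_id, identity):
        """Record the identity that collation returned for one frame."""
        self.votes[track_id][identity] += 1

    def settled(self, track_id):
        """Return the settled identity, or None if collation is not settled yet."""
        counts = self.votes.get(track_id, Counter())
        total = sum(counts.values())
        if total < self.min_votes:
            return None
        identity, top = counts.most_common(1)[0]
        return identity if top / total >= self.min_share else None


if __name__ == "__main__":
    voter = CollationVoter(min_votes=3, min_share=0.6)
    for frame_result in ["alice", "alice", "bob"]:
        voter.vote("track-1", frame_result)
    print(voter.settled("track-1"))
```

Accumulating votes across frames lets a single misidentified frame be outvoted before an identity is output.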
Systems and methods for object detection including pose and size estimation
The present disclosure is directed to systems and methods for performing object detection and pose estimation in 3D from 2D images. Object detection can be performed by a machine-learned model configured to determine various object properties. Implementations according to the disclosure can use these properties to estimate object pose and size.
Diversity-aware weighted majority vote classifier for imbalanced datasets supporting decision making
An ensemble learning based method is for a binary classification on an imbalanced dataset. The imbalanced dataset has a minority class comprising positive samples and a majority class comprising negative samples. The method includes: generatively oversampling the imbalanced dataset by synthetically generating minority class examples, thereby generating a generated dataset; using the generated dataset to generate subsamples, and learning a base classifier on each of the subsamples to determine a plurality of base classifiers; and learning a weighted majority vote classifier by combining outputs of the base classifiers. Each of the base classifiers is assigned a weight in such a way that a diversity between the base classifiers on the positive samples is minimized.
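The final combination step, a weighted majority vote over base classifier outputs, can be illustrated with a short sketch. It assumes each base classifier outputs +1 (positive) or -1 (negative) and that the weights have already been learned. How the weights are learned with respect to diversity is not shown, and the tie-breaking toward the negative class is an assumption made for this example.

```python
def weighted_majority_vote(predictions, weights):
    """Combine +1/-1 base classifier outputs using per-classifier weights."""
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    score = sum(w * p for w, p in zip(weights, predictions))
    # Assumption: a non-positive score falls back to the negative (majority) class.
    return 1 if score > 0 else -1


if __name__ == "__main__":
    # One heavily weighted classifier outvotes two lightly weighted ones.
    print(weighted_majority_vote([1, -1, -1], [0.6, 0.2, 0.1]))
```

With equal weights this reduces to a plain majority vote, so the weights are what allow a minority of reliable classifiers to decide the outcome.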
IMAGE AND DEPTH SENSOR FUSION METHODS AND SYSTEMS
A system for image fusion with depth data includes an imaging system that provides image data with semantic information. A depth data sensor system provides depth data of objects in a field of view. A processor independently extracts the semantic information from the imaging system and combines it with the depth data by assigning weights. The processor generates a semantic-point encoding with depth data as central data. The central data can then play the primary role in object identification, while the system retains depth data and image data for use when the other is insufficient in view of the conditions during sensing. The depth data preferably is point cloud data, such as data from a mechanical radar that is processed to provide point cloud data or a radar system that provides point cloud data.
Text recognition method and apparatus, computer-readable storage medium and electronic device
A text recognition method, a text recognition apparatus, a computer-readable storage medium, and an electronic device are provided. In the text recognition method, adjacent character strings in a plurality of character strings partially overlap, so that the plurality of character strings may reflect the contextual relationships within a text to be recognized. Word vector conversion is performed on the plurality of character strings to obtain a plurality of word vectors. Word vector recognition results respectively corresponding to the plurality of word vectors are generated on the basis of the word vectors to determine whether the text corresponding to each word vector is an effect text or a non-effect text. The plurality of word vector recognition results are then synthesized to determine a text recognition result of the text to be recognized.
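The overlapping character strings and the final synthesis step can be sketched as follows. The window size, the one-character step, and the strict-majority rule for combining per-string results are assumptions for illustration. The word vector conversion and the per-vector recognition are not shown.

```python
def overlapping_strings(text, size=3):
    """Split text into windows of `size` characters that advance one character at a time."""
    if size < 2:
        raise ValueError("size must be at least 2 for adjacent strings to overlap")
    if len(text) <= size:
        return [text]
    return [text[i:i + size] for i in range(len(text) - size + 1)]


def synthesize_results(results):
    """Combine per-string boolean results by strict majority (hypothetical rule)."""
    return sum(results) * 2 > len(results)


if __name__ == "__main__":
    print(overlapping_strings("abcde", size=3))
    print(synthesize_results([True, True, False]))
```

Because each window shares characters with its neighbors, every string carries some of the surrounding context into the recognition step.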