Patent classifications
G06V10/759
Imaging with cameras having different distortion profiles
An imaging system includes a first camera having negative distortion; a second camera whose second field of view is wider than the first field of view of the first camera, wherein the first field of view fully overlaps with a portion of the second field of view, the second camera having negative distortion at said portion and positive distortion at the remaining portion; and processor(s) configured to: capture a first image and a second image; determine an overlapping image segment and a non-overlapping image segment of the second image; and generate an output image from the first image and the second image, wherein an inner image segment of the output image is generated from at least one of the first image and the overlapping image segment, and a peripheral image segment of the output image is generated from the non-overlapping image segment.
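The composition step above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes the narrow-FOV first image maps to a known rectangle inside the wide-FOV second image, so the inner segment is taken from the first image and the peripheral segment from the remainder of the second image. The `overlap_box` layout and image sizes are illustrative assumptions.

```python
import numpy as np

def compose_output(first_img, second_img, overlap_box):
    """Compose the output image: peripheral segment from the wide camera,
    inner segment from the narrow camera.
    overlap_box = (top, left, h, w): where the first camera's field of
    view sits inside the second camera's image (an assumed mapping)."""
    t, l, h, w = overlap_box
    out = second_img.copy()            # peripheral: non-overlapping segment
    out[t:t+h, l:l+w] = first_img      # inner: first (narrow-FOV) image
    return out

first = np.full((2, 2), 255, dtype=np.uint8)   # toy narrow-FOV image
second = np.zeros((4, 4), dtype=np.uint8)      # toy wide-FOV image
out = compose_output(first, second, (1, 1, 2, 2))
```

In practice the overlap region would come from calibration between the two cameras rather than a fixed rectangle.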
ZERO-SHOT OBJECT DETECTION
A method, apparatus, and system for zero-shot object detection operate in a semantic embedding space containing embedded object class labels. The space is trained by embedding extracted features of labeled bounding boxes of known object classes, together with their object class labels, into the space. For an image containing unknown object classes, regions on which to perform object detection are determined as proposed bounding boxes; features of the proposed bounding boxes are extracted and projected into the space. A similarity measure is computed between the projected features of the proposed bounding boxes and the embedded, extracted features of the bounding boxes of the known object classes. An object class label is then predicted for each proposed bounding box by determining the nearest embedded object class label to its projected features, based on the similarity measures.
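The final prediction step can be sketched as a nearest-label lookup. This is a toy illustration under stated assumptions, not the patented system: the label embeddings and the already-projected box features are made-up vectors, and cosine similarity stands in for the similarity measure.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embedded object class labels in the semantic space.
label_embeddings = {
    "cat": np.array([1.0, 0.0, 0.0]),
    "dog": np.array([0.0, 1.0, 0.0]),
}

# Hypothetical projected features of one proposed bounding box.
projected_box = np.array([0.9, 0.1, 0.0])

# Predict the nearest embedded class label by similarity.
pred = max(label_embeddings,
           key=lambda c: cosine(projected_box, label_embeddings[c]))
```

Because the class labels themselves are embedded (e.g. via word vectors), the same lookup works for classes never seen at training time, which is the zero-shot property.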
EMBEDDING MULTIMODAL CONTENT IN A COMMON NON-EUCLIDEAN GEOMETRIC SPACE
Embedding multimodal content in a common geometric space includes: for each of a plurality of items of the multimodal content having a first modality, creating a respective first-modality feature vector representative of that content using a first machine learning model; for each item having a second modality, creating a respective second-modality feature vector using a second machine learning model; and semantically embedding the first-modality and second-modality feature vectors in a common geometric space that provides logarithm-like warping of distances, capturing hierarchical relationships between seemingly disparate embedded modality feature vectors. Embedded modality feature vectors that are related across modalities lie closer together in the geometric space than unrelated modality feature vectors.
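A concrete example of such a non-Euclidean space is the Poincaré ball, whose distance function illustrates the "logarithm-like warping" described above: equal Euclidean gaps cost far more distance near the boundary than near the origin, which is what lets tree-like hierarchies embed with low distortion. The vectors below are illustrative stand-ins, not outputs of the described models.

```python
import numpy as np

def poincare_dist(u, v):
    # Distance between two points in the Poincare ball model:
    # d(u, v) = arccosh(1 + 2||u-v||^2 / ((1-||u||^2)(1-||v||^2)))
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return float(np.arccosh(1 + 2 * sq / denom))

# The same Euclidean gap (0.1) at two locations in the ball:
near_origin = poincare_dist(np.array([0.0, 0.0]), np.array([0.1, 0.0]))
near_boundary = poincare_dist(np.array([0.8, 0.0]), np.array([0.9, 0.0]))
# Near the rim the hyperbolic distance is much larger, so abstract
# (root-like) concepts sit near the origin and specific ones near the rim.
```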
Label consistency for image analysis
Systems and techniques are disclosed for labeling objects within an image. The objects may be labeled by selecting an option from a plurality of options, each option being a potential label for the object. An option may have an option score associated with it. Additionally, a relation score may be calculated between a first option and a second option corresponding to a second object in the image. The relation score may be based on the frequency, probability, or observed co-occurrence of text associated with the first option and the second option in a text corpus such as the World Wide Web. An option may be selected as the label for an object based on a global score calculated from at least the option score and the relation score associated with that option.
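The scoring idea can be sketched numerically. This is a toy example with made-up numbers, assuming (as one simple choice) that the global score is the sum of an option's own score and its relation score with the label chosen for a second object in the image:

```python
# Hypothetical option scores for object 1 (e.g. from a classifier).
option_scores = {"horse": 0.6, "zebra": 0.5}

# Label already chosen for a second object in the image.
second_label = "saddle"

# Hypothetical relation scores from text co-occurrence in a corpus:
# "horse" co-occurs with "saddle" far more often than "zebra" does.
relation_scores = {("horse", "saddle"): 0.9,
                   ("zebra", "saddle"): 0.1}

def global_score(opt):
    # One simple combination: option score plus relation score.
    return option_scores[opt] + relation_scores[(opt, second_label)]

best = max(option_scores, key=global_score)
```

The corpus-derived relation score is what enforces label consistency: "zebra" might edge out "horse" on appearance alone, but co-occurrence with "saddle" tips the global score the other way.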
Apparatus and method for recognizing traffic signs
An apparatus for recognizing traffic signs normalizes a window of a predetermined size for a region of interest set in an image frame input from an image sensor and generates a first input vector. It extracts a candidate region of a traffic sign based on the feature pattern information of a neuron having a feature pattern vector and stores the coordinates of the extracted candidate region. It then converts the image size of the extracted candidate region, normalizes a window of a predetermined size for the resized candidate region, and generates a second input vector. From the second input vector, it determines the traffic sign content information of a neuron having a content pattern vector and stores the determined content information. When the traffic sign disappears from view, the apparatus recognizes the location and content of the traffic sign based on the stored candidate-region coordinates and the stored content information.
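The two lookups in this pipeline, candidate detection against feature pattern vectors, then content recognition against content pattern vectors, can be sketched as nearest-pattern matches. The pattern vectors and input vectors below are toy assumptions, not the apparatus's actual representations:

```python
import numpy as np

def nearest(input_vec, patterns):
    """Return the label of the stored neuron pattern vector closest to
    the normalized input vector (Euclidean distance as a stand-in)."""
    return min(patterns, key=lambda k: np.linalg.norm(input_vec - patterns[k]))

# Stage 1: feature pattern vectors for candidate-region extraction.
feature_patterns = {"sign_candidate": np.array([1.0, 1.0]),
                    "background": np.array([0.0, 0.0])}
# Stage 2: content pattern vectors for sign recognition.
content_patterns = {"stop": np.array([1.0, 0.0]),
                    "yield": np.array([0.0, 1.0])}

first_vec = np.array([0.9, 0.8])    # from the normalized ROI window
stage1 = nearest(first_vec, feature_patterns)

second_vec = np.array([0.8, 0.1])   # from the resized candidate region
stage2 = nearest(second_vec, content_patterns)
```

The real apparatus would run stage 2 only on windows that stage 1 flags as candidates, storing coordinates and content per frame until the sign leaves the field of view.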
MONITORING CHANGES IN PROCESS STATIONS UTILIZING VISUAL INDICATORS
In one aspect, a system is configured to detect a trigger event, identify a first digital image of the process station captured prior to the trigger event, identify a second digital image of the process station captured in response to the trigger event, and identify, in the first and second digital images, a visual indicator of the process station. The system is further configured to compare the visual indicator in the first and second digital images to identify a change in the visual indicator, and identify configuration data for the process station that defines an expected change in the visual indicator from the first digital image to the second digital image. The system is further configured to determine whether the change deviates from the expected change, and generate indicia representative of the identified change in response to determining that the change deviates from the expected change.
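The deviation check at the end can be sketched simply. This is an illustrative reduction, assuming the visual indicator's state has already been extracted from each digital image (e.g. as a color label) and that configuration data expresses the expected change as a before/after pair:

```python
def check_station(before_state, after_state, expected_change):
    """Compare the observed indicator change against the configured
    expected change; return alert indicia only on deviation."""
    observed = (before_state, after_state)
    if observed != expected_change:
        return f"ALERT: expected {expected_change}, observed {observed}"
    return None  # change matches expectation: no indicia generated

# Hypothetical configuration: the trigger should turn the light red.
expected = ("green", "red")

ok = check_station("green", "red", expected)        # matches expectation
alert = check_station("green", "green", expected)   # deviates
```

The substance of the patented system lies upstream, in triggering the image captures and locating the indicator in both images; this snippet only illustrates the final comparison.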
DEEP PATCH FEATURE PREDICTION FOR IMAGE INPAINTING
Techniques for using deep learning to facilitate patch-based image inpainting are described. In an example, a computer system hosts a neural network trained to generate, from an image, code vectors including features learned by the neural network and descriptive of patches. The image is received and contains a region of interest (e.g., a hole missing content). The computer system inputs it to the network and, in response, receives the code vectors. Each code vector is associated with a pixel in the image. Rather than comparing RGB values between patches, the computer system compares the code vector of a pixel inside the region to code vectors of pixels outside the region to find the best match based on a feature similarity measure (e.g., a cosine similarity). The pixel value of the pixel inside the region is set based on the pixel value of the matched pixel outside this region.
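The matching step can be sketched with stand-in code vectors. This toy example assumes random vectors in place of the neural network's learned codes; the key point is that the best match is found by cosine similarity between code vectors, not by comparing RGB patches:

```python
import numpy as np

rng = np.random.default_rng(0)
outside_codes = rng.normal(size=(50, 8))      # code vectors outside the hole
outside_values = rng.integers(0, 256, 50)     # pixel values outside the hole

# Code vector of a pixel inside the hole; here contrived to nearly
# match outside code 7 so the example has a known best match.
hole_code = outside_codes[7] + 0.01 * rng.normal(size=8)

# Cosine similarity between the hole pixel's code and every outside code.
sims = outside_codes @ hole_code / (
    np.linalg.norm(outside_codes, axis=1) * np.linalg.norm(hole_code))

best = int(np.argmax(sims))
filled_value = outside_values[best]  # copy the matched pixel's value
```

In the described technique each pixel gets one code vector from a single network pass, so this similarity search replaces the repeated patch-distance computations of classical PatchMatch-style inpainting.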
DATA VOLUME SCULPTOR FOR DEEP LEARNING ACCELERATION
Embodiments of a device include on-board memory, an applications processor, a digital signal processor (DSP) cluster, a configurable accelerator framework (CAF), and at least one communication bus architecture. The communication bus communicatively couples the applications processor, the DSP cluster, and the CAF to the on-board memory. The CAF includes a reconfigurable stream switch and a data volume sculpting unit, which has an input and an output coupled to the reconfigurable stream switch. The data volume sculpting unit has a counter, a comparator, and a controller. The data volume sculpting unit is arranged to receive a stream of feature map data that forms a three-dimensional (3D) feature map. The 3D feature map is formed as a plurality of two-dimensional (2D) data planes. The data volume sculpting unit is also arranged to identify a 3D volume within the 3D feature map that is dimensionally smaller than the 3D feature map and isolate data from the 3D feature map that is within the 3D volume for processing in a deep learning algorithm.
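In software terms, the sculpting operation reduces to selecting a sub-volume of the 3D feature map. The sketch below is only a functional analogy: where the hardware unit walks the incoming stream with a counter and a comparator against configured bounds, an array library collapses that to index slicing. Sizes and bounds are illustrative.

```python
import numpy as np

# A 3D feature map formed as a stack of 2D data planes.
feature_map = np.arange(4 * 6 * 6).reshape(4, 6, 6)  # planes x rows x cols

# Configured bounds of the smaller 3D volume to isolate (half-open).
bounds = {"plane": (1, 3), "row": (2, 5), "col": (0, 4)}

p0, p1 = bounds["plane"]
r0, r1 = bounds["row"]
c0, c1 = bounds["col"]

# Hardware: counter tracks the stream position, comparator keeps only
# elements inside the bounds. Here: a single slice.
sculpted = feature_map[p0:p1, r0:r1, c0:c1]
```

Isolating the sub-volume before it reaches the accelerator avoids streaming (and storing) feature-map data the deep learning algorithm will not touch.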
IMAGE PROCESSING DEVICE FOR IMPROVING DETAILS OF AN IMAGE, AND OPERATION METHOD OF THE SAME
Provided are an image processing apparatus and an operation method of the image processing apparatus. The image processing apparatus includes a memory storing one or more instructions, and a processor configured to execute the one or more instructions stored in the memory to, by using one or more convolutional neural networks: extract target features by performing a convolution operation between features of target regions having the same locations in a plurality of input images and a first kernel set; extract peripheral features by performing a convolution operation between features of peripheral regions located around the target regions in the plurality of input images and a second kernel set; and determine a feature of a region corresponding to the target regions in an output image, based on the target features and the peripheral features.
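The center/surround structure can be sketched with toy tensors. This is an illustrative reduction, not the patented network: the "convolution" is simplified to a matrix product, and the shapes, kernel sets, and the averaging used to combine the two feature streams are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Features at the same target location across 3 input images (dim 8 each).
target_feats = rng.normal(size=(3, 8))
# Features from the peripheral regions around those target regions.
peripheral_feats = rng.normal(size=(3, 8))

kernel_set_1 = rng.normal(size=(8, 4))   # first kernel set (targets)
kernel_set_2 = rng.normal(size=(8, 4))   # second kernel set (periphery)

# Two separate convolution-like operations with separate kernel sets.
target_out = target_feats @ kernel_set_1
peripheral_out = peripheral_feats @ kernel_set_2

# Output-image feature for the region determined from both streams
# (simple mean-combine as an assumed aggregation).
output_feature = (target_out + peripheral_out).mean(axis=0)
```

Keeping separate kernel sets lets the network weight exact-location evidence and surrounding context differently, which is what lets the apparatus sharpen details using multiple input images.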
METHOD AND SERVER FOR CLASSIFYING APPAREL DEPICTED IN IMAGES AND SYSTEM FOR IMAGE-BASED QUERYING
Many clothing listings, particularly in the secondhand market, lack comprehensive or standardized information about a product's attributes, making it difficult for consumers to find what they need. A method and server are provided for classifying images depicting apparel items based on reference shapes and a geometrical model of the body. A method and system for querying images based on the classification are further provided.