Patent classifications
G06V10/426
Unified framework for multi-modal similarity search
Technology is disclosed herein for enhanced similarity search. In an implementation, a search environment includes one or more computing hardware, software, and/or firmware components in support of enhanced similarity search. The one or more components identify a modality for a similarity search with respect to a query object. The components generate an embedding for the query object based on the modality and based on connections between the query object and neighboring nodes in a graph. The embedding for the query object provides the basis for the search for similar objects.
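As a rough illustration of the idea in this abstract, the sketch below builds an embedding from a node's own vector plus a weighted mean of its graph neighbors, where the weights depend on the chosen modality, and then ranks candidates by cosine similarity. The names `embed`, `search`, `modality_weights`, and the edge-type labels are hypothetical; the abstract does not specify how the embedding is computed.

```python
import math

def embed(node, features, edges, modality_weights):
    """Own vector plus a modality-weighted mean of neighbor vectors (assumed scheme)."""
    own = features[node]
    agg = [0.0] * len(own)
    total = 0.0
    for neighbor, edge_type in edges.get(node, []):
        w = modality_weights.get(edge_type, 0.0)  # the modality decides which edges count
        if w == 0.0:
            continue
        for i, v in enumerate(features[neighbor]):
            agg[i] += w * v
        total += w
    if total > 0:
        agg = [a / total for a in agg]
    return [o + a for o, a in zip(own, agg)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query, candidates, features, edges, modality_weights, k=3):
    """Return the k candidates whose embeddings are most similar to the query's."""
    q = embed(query, features, edges, modality_weights)
    scored = [
        (cosine(q, embed(c, features, edges, modality_weights)), c)
        for c in candidates
        if c != query
    ]
    return sorted(scored, reverse=True)[:k]
```

Switching `modality_weights` changes which neighbors contribute, so the same query object can yield a different embedding, and a different ranking, for each modality.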
Activity recognition systems and methods
An activity recognition system is disclosed. A plurality of temporal features is generated from a digital representation of an observed activity using a feature detection algorithm. An observed activity graph comprising one or more clusters of temporal features generated from the digital representation is established, wherein each one of the one or more clusters of temporal features defines a node of the observed activity graph. At least one contextually relevant scoring technique is selected from similarity scoring techniques for known activity graphs, the at least one contextually relevant scoring technique being associated with activity ingestion metadata that satisfies device context criteria defined based on device contextual attributes of the digital representation, and a similarity activity score is calculated for the observed activity graph as a function of the at least one contextually relevant scoring technique, the similarity activity score being relative to at least one known activity graph.
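The pipeline described here can be sketched as follows, under several assumptions: temporal features are `(timestamp, label)` tuples, clusters are formed by a simple time-gap rule, and the two scoring functions and their `metadata` entries are hypothetical stand-ins for the "similarity scoring techniques" and "activity ingestion metadata" named in the abstract.

```python
def cluster_by_time(features, gap):
    """Group (timestamp, label) features whose timestamps lie within `gap` of the previous one."""
    clusters = []
    for t, label in sorted(features):
        if clusters and t - clusters[-1][-1][0] <= gap:
            clusters[-1].append((t, label))
        else:
            clusters.append([(t, label)])
    return clusters

def node_labels(clusters):
    """Each cluster becomes one graph node, represented here by its set of labels."""
    return [frozenset(label for _, label in c) for c in clusters]

def jaccard_nodes(observed, known):
    """Overlap of node sets, ignoring order."""
    a, b = set(observed), set(known)
    return len(a & b) / len(a | b) if a | b else 0.0

def ordered_match(observed, known):
    """Fraction of positions at which the two node sequences agree."""
    n = max(len(observed), len(known))
    return sum(o == k for o, k in zip(observed, known)) / n if n else 0.0

# Hypothetical registry: each technique carries metadata to match against device context.
TECHNIQUES = [
    {"name": "jaccard", "fn": jaccard_nodes, "metadata": {"sensor": "camera"}},
    {"name": "ordered", "fn": ordered_match, "metadata": {"sensor": "accelerometer"}},
]

def select_techniques(device_context):
    """Keep techniques whose metadata is fully satisfied by the device context."""
    return [
        t for t in TECHNIQUES
        if all(device_context.get(k) == v for k, v in t["metadata"].items())
    ]

def similarity_score(observed, known, device_context):
    """Average the selected techniques' scores; None if no technique applies."""
    chosen = select_techniques(device_context)
    if not chosen:
        return None
    return sum(t["fn"](observed, known) for t in chosen) / len(chosen)
```

Here the device context only selects among scoring functions; a fuller system would presumably also compare edges of the activity graphs, which this sketch omits.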
Automated video segmentation
Methods and systems for automated video segmentation are disclosed. A sequence of video frames having video segments of contextually-related sub-sequences may be received. Each frame may be labeled according to segment and segment class. A video graph may be constructed in which each node corresponds to a different frame, and each edge connects a different pair of nodes and is associated with a time between video frames and a similarity metric of the connected frames. An artificial neural network (ANN) may be trained to predict both labels for the nodes and clusters of the nodes corresponding to predicted membership among the segments, using the video graph as input to the ANN, and ground-truth clusters of ground-truth labeled nodes. The ANN may be further trained to predict segment classes of the predicted clusters, using the segment classes as ground truths. The trained ANN may be configured for application to runtime video sequences.
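The video graph described in this abstract can be sketched as below: one node per frame, and one edge per frame pair carrying the time between the frames and a similarity value. The use of cosine similarity over per-frame feature vectors, and the optional `max_gap` cutoff, are assumptions made for illustration; the abstract does not name a specific metric.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_video_graph(frame_features, fps, max_gap=None):
    """Build a graph with one node per frame and one edge per frame pair.

    Each edge stores the time between its two frames ("dt", in seconds)
    and their feature similarity. If `max_gap` is given, pairs further
    apart in time than that are skipped (an assumed pruning step).
    """
    edges = {}
    for i, j in combinations(range(len(frame_features)), 2):
        dt = (j - i) / fps
        if max_gap is not None and dt > max_gap:
            continue
        edges[(i, j)] = {
            "dt": dt,
            "similarity": cosine(frame_features[i], frame_features[j]),
        }
    return {"nodes": list(range(len(frame_features))), "edges": edges}
```

A graph of this form, together with per-frame ground-truth labels, is the kind of input the abstract describes feeding to the ANN; the training step itself is not sketched here.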
Generating scene graphs from digital images using external knowledge and image reconstruction
Methods, systems, and non-transitory computer readable storage media are disclosed for generating semantic scene graphs for digital images using an external knowledgebase for feature refinement. For example, the disclosed system can determine object proposals and subgraph proposals for a digital image to indicate candidate relationships between objects in the digital image. The disclosed system can then extract relationships from an external knowledgebase for refining features of the object proposals and the subgraph proposals. Additionally, the disclosed system can generate a semantic scene graph for the digital image based on the refined features of the object/subgraph proposals. Furthermore, the disclosed system can update/train a semantic scene graph generation network based on the generated semantic scene graph. The disclosed system can also reconstruct the image using object labels based on the refined features to further update/train the semantic scene graph generation network.
Data processing device, data processing method, and data processing program for evaluating an object based on a persistence diagram from time-series image data
According to one embodiment, a data processing device includes an acquisition part, and a processor. The acquisition part is configured to acquire first data including time-series image data. The processor is configured to derive first feature information based on a multidimensional array of n dimensions based on the first data acquired by the acquisition part. n is an integer not less than 3. A first axis of the multidimensional array is related to time.
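A minimal sketch of this setup, assuming grayscale frames given as nested lists: the frames are stacked into a 3-dimensional array whose first axis is time, and 0-dimensional sublevel-set persistence pairs are computed over that array with a union-find structure. The choice of 0-dimensional persistence, the 6-neighbor connectivity, and the function names are assumptions; the abstract states only that feature information is derived from an n-dimensional array with n not less than 3.

```python
import math
from itertools import product

def stack_frames(frames):
    """Stack equally sized 2-D frames into a (time, row, col) nested list."""
    h, w = len(frames[0]), len(frames[0][0])
    for f in frames:
        if len(f) != h or any(len(r) != w for r in f):
            raise ValueError("all frames must share one shape")
    return [[list(r) for r in f] for f in frames]

def persistence_0d(volume):
    """Return sorted (birth, death) pairs of connected components as the threshold rises."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    cells = sorted(product(range(T), range(H), range(W)),
                   key=lambda c: volume[c[0]][c[1]][c[2]])
    parent, birth, pairs = {}, {}, []

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for c in cells:
        value = volume[c[0]][c[1]][c[2]]
        parent[c], birth[c] = c, value
        for axis in range(3):
            for step in (-1, 1):
                n = list(c)
                n[axis] += step
                n = tuple(n)
                if n not in parent:  # out of bounds or not yet added
                    continue
                ra, rb = find(c), find(n)
                if ra == rb:
                    continue
                # Elder rule: the component born later dies at this value.
                young, old = (ra, rb) if birth[ra] > birth[rb] else (rb, ra)
                if birth[young] < value:  # skip zero-persistence pairs
                    pairs.append((birth[young], value))
                parent[young] = old
    pairs.append((min(birth.values()), math.inf))  # the component that never dies
    return sorted(pairs)
```

Because the time axis is treated like the spatial axes, a dark region that persists across consecutive frames counts as a single component rather than one per frame.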
Storage medium, determination method, and information processing apparatus
A non-transitory computer-readable storage medium storing a determination program that causes at least one computer to execute a process, the process including: obtaining a plurality of pair images of a person obtained from an overlapping region of images captured by each of a plurality of cameras; generating a directed graph including nodes corresponding to person features obtained from each of a plurality of person images included in the plurality of obtained pair images; acquiring weights of links between the nodes in the generated directed graph based on a number of person images with similar person features between the nodes; and determining a combination of the person features in which a number of the person images with the similar person features in the plurality of pair images is maximized based on the acquired weights of the links.
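One simplified reading of this process is sketched below. It assumes each pair image has already been reduced to two feature labels, one per camera, so that a directed link from the first label to the second can be counted; the link weight is then the number of pair images supporting it, and the selected combination is the link with the largest weight. The label format and the function names are hypothetical, and the actual claim may cover combinations of more than two features.

```python
from collections import Counter

def build_directed_graph(pair_images):
    """Count directed links between feature labels.

    pair_images: iterable of (feature_from_camera_a, feature_from_camera_b)
    labels, one tuple per pair image (assumed input format).
    """
    weights = Counter()
    for src, dst in pair_images:
        weights[(src, dst)] += 1
    nodes = {n for link in weights for n in link}
    return nodes, weights

def best_combination(weights):
    """Return (link, weight) for the heaviest link, or None if there are no links."""
    if not weights:
        return None
    return max(weights.items(), key=lambda kv: kv[1])
```

For example, if two pair images map to `("f1", "g1")` and one maps to `("f2", "g1")`, the link `("f1", "g1")` has weight 2 and is the one selected.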