G06V10/806

IMAGE CLASSIFICATION METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
20230290120 · 2023-09-14 ·

Disclosed is an image classification method performed by a computer device. The method includes: acquiring an image feature of a pathological image; extracting, for each of multiple scales, a local feature corresponding to that scale from the image feature; splicing the local features corresponding to the respective scales to obtain a spliced image feature; and classifying the spliced image feature to obtain a category to which the pathological image belongs. According to the method provided in the embodiments of this application, the local features corresponding to different scales contain different information, so that the finally obtained spliced image feature contains feature information from every scale, enriching the feature information of the spliced image feature. The category to which the pathological image belongs is determined based on this enriched spliced image feature, which helps ensure the accuracy of the classification.
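The multi-scale splice described above can be sketched in a few lines. The abstract does not specify how a per-scale local feature is extracted, so average pooling with a window equal to the scale is used here purely as an illustrative assumption; `local_feature` and `spliced_feature` are hypothetical names.

```python
# Hedged sketch of multi-scale local-feature extraction and splicing.
# Assumption: the per-scale extractor is non-overlapping average pooling.
def local_feature(image_feature, scale):
    """Pool a 1-D image feature with a non-overlapping window of size `scale`."""
    return [
        sum(image_feature[i:i + scale]) / scale
        for i in range(0, len(image_feature) - scale + 1, scale)
    ]

def spliced_feature(image_feature, scales):
    """Splice (concatenate) the local features of every scale."""
    out = []
    for s in scales:
        out.extend(local_feature(image_feature, s))
    return out

print(spliced_feature([1.0, 2.0, 3.0, 4.0], [1, 2]))
# -> [1.0, 2.0, 3.0, 4.0, 1.5, 3.5]
```

Because each scale contributes its own slice of the output, the spliced vector carries information at every granularity, which is the property the abstract relies on.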

OBJECT DETECTION DEVICE AND METHOD
20230290104 · 2023-09-14 ·

An object detection device includes a processor that executes a procedure. The procedure includes: converting an input image into a first vector such that information related to an area of an object in the image is contained in the first vector; converting input text into a second vector such that information related to the order of appearance, in the text, of one or more word strings each indicating a detection target object is contained in the second vector; generating a third vector in which the first vector and the second vector have been reflected in a vector of initial values corresponding to detection target objects; and estimating, from a feature indicated by the third vector, which position in the text's order of appearance the corresponding detection target object occupies, as well as a position of the detection target object in the image.
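The order-of-appearance information folded into the second vector can be illustrated with a toy helper. The function name `appearance_order` and the naive substring search are assumptions for illustration only; the patent encodes this information into a learned vector rather than an explicit mapping.

```python
# Hedged sketch: record the 1-based order in which detection target
# word strings appear in the input text (naive substring search).
def appearance_order(text, targets):
    """Map each target word string to its order of appearance in `text`."""
    positions = sorted((text.index(t), t) for t in targets if t in text)
    return {t: i + 1 for i, (_, t) in enumerate(positions)}

print(appearance_order("a red car beside a small dog", ["dog", "car"]))
# -> {'car': 1, 'dog': 2}
```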

System and method for tracking detected objects

Systems and methods for tracking objects are disclosed herein. In one embodiment, a system having a processor merges features of detected objects extracted from a point cloud and a corresponding image to generate fused features for the detected objects, generates a learned distance metric for the detected objects using the fused features, determines matched detected objects and unmatched detected objects, applies prior tracking identifiers of the detected objects at the prior time to the matched detected objects, determines a confidence score for the fused features of the unmatched detected objects, and applies new tracking identifiers to the unmatched detected objects based on the confidence score.
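The matching-and-identifier logic of the tracking pipeline can be sketched with a greedy nearest-neighbour stand-in. The learned distance metric, the concatenation-based fusion, and the names `fuse` / `match_tracks` are all assumptions; the claimed system learns both the fusion and the metric.

```python
import math

# Hedged sketch of the track association step. Assumptions: fusion is
# concatenation, and a Euclidean distance stands in for the learned metric.
def fuse(point_feat, image_feat):
    """Fuse point-cloud and image features (here: simple concatenation)."""
    return point_feat + image_feat

def match_tracks(prior, detections, max_dist=1.0, min_conf=0.5, next_id=0):
    """Match detections to prior tracks by feature distance.

    Matched detections keep their prior tracking identifier; unmatched
    detections receive a new identifier only if their confidence score
    clears `min_conf`, mirroring the confidence gating in the abstract.
    """
    assigned = {}
    free = dict(prior)  # tracking id -> fused feature at the prior time
    for feat, conf in detections:
        best = min(free, key=lambda i: math.dist(free[i], feat), default=None)
        if best is not None and math.dist(free[best], feat) <= max_dist:
            assigned[best] = feat          # apply prior tracking identifier
            del free[best]
        elif conf >= min_conf:
            assigned[next_id] = feat       # apply new tracking identifier
            next_id += 1
    return assigned
```

A low-confidence unmatched detection is simply dropped rather than given a new identifier, which is one plausible reading of "based on the confidence score".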

HIERARCHICAL OCCLUSION MODULE AND UNSEEN OBJECT AMODAL INSTANCE SEGMENTATION SYSTEM AND METHOD USING THE SAME

The hierarchical occlusion inference method according to the exemplary embodiment of the present disclosure includes: deriving a bounding box feature of an object instance from a region-of-interest color-depth FPN feature and an object region-of-interest feature derived from a cluttered scene image including at least one object instance; deriving a visible feature of the object instance by fusing the object region-of-interest feature and the bounding box feature; deriving an amodal feature of the object instance by fusing the object region-of-interest feature, the bounding box feature, and the visible feature; deriving an occlusion feature of the object instance by fusing the object region-of-interest feature, the bounding box feature, the visible feature, and the amodal feature; and inferring occlusion of the object instance by de-convolving the occlusion feature.
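The strictly widening hierarchy (each stage fuses everything derived before it) is the key structural idea and can be sketched directly. Element-wise summation as the fusion operator is only an assumption; the patent's fusion modules are learned.

```python
# Hedged sketch of the hierarchical fusion cascade. Assumption:
# `fuse` is element-wise summation standing in for a learned module.
def fuse(*features):
    """Fuse any number of equal-length feature vectors by element-wise sum."""
    return [sum(vals) for vals in zip(*features)]

def hierarchical_occlusion(roi_feat, bbox_feat):
    """Each stage consumes every feature produced by the stages before it."""
    visible = fuse(roi_feat, bbox_feat)
    amodal = fuse(roi_feat, bbox_feat, visible)
    occlusion = fuse(roi_feat, bbox_feat, visible, amodal)
    return visible, amodal, occlusion
```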

TABLE RECOGNITION METHOD AND APPARATUS AND NON-TRANSITORY COMPUTER-READABLE MEDIUM

Disclosed are a table recognition method and apparatus. The table recognition method includes steps of obtaining an image vision feature and a character content feature of a table image; fusing the image vision feature and the character content feature of the table image to acquire a first fusion feature, and carrying out recognition based on the first fusion feature to acquire a table structure; and performing, based on the table structure, character recognition on the table image to acquire table character contents.
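The two-stage pipeline (structure first, then per-cell characters) can be sketched with placeholder components. Concatenation as the "first fusion" and every function name here are assumptions; the real encoders and decoders are unspecified models.

```python
# Hedged sketch of the two-stage table recognition pipeline.
# All callables are caller-supplied stand-ins for the real models.
def recognise_table(image, vision_encoder, char_encoder,
                    structure_decoder, char_recogniser):
    """Fuse vision and character-content features, decode the table
    structure, then recognise the characters of each structural cell."""
    fused = vision_encoder(image) + char_encoder(image)  # first fusion feature
    structure = structure_decoder(fused)                 # e.g. list of cells
    cells = {cell: char_recogniser(image, cell) for cell in structure}
    return structure, cells
```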

DISEASE PREDICTION SYSTEM AND APPARATUS BASED ON MULTI-RELATION FUNCTIONAL CONNECTIVITY MATRIX
20230290514 · 2023-09-14 ·

Disclosed are a disease prediction method, system and apparatus based on a multi-relation functional connectivity matrix. A Pearson correlation coefficient matrix and a DTW distance matrix are respectively calculated from resting-state functional magnetic resonance time series extracted from a brain atlas; the DTW distance matrix is converted, in combination with the Pearson correlation coefficient matrix, into a DTW′ matrix that includes correlation-degree and correlation-direction information and whose numerical range matches the value range of a Pearson coefficient; and a functional connectivity matrix is obtained by weighted combination of the two. By incorporating DTW distance information, the present disclosure mitigates the influence, on the functional connectivity matrix, of dynamic changes in functional connectivity and of asynchrony between functional signals in different brain regions, so that the calculated functional connectivity matrix better reflects the correlation between functional signals in different brain regions.
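One plausible realisation of the DTW′ conversion is to normalise the distance into [0, 1], invert it so that a small distance means strong correlation, and take the sign (correlation direction) from the Pearson coefficient. The exact formula and the names `dtw_prime` / `connectivity` are assumptions; the abstract only states the target properties.

```python
# Hedged sketch of the DTW' conversion and the weighted combination.
# Assumption: distances are normalised by a known maximum `dtw_max`.
def dtw_prime(dtw, pearson, dtw_max):
    """Map a DTW distance into a Pearson-like value in [-1, 1]."""
    magnitude = 1.0 - dtw / dtw_max          # small distance -> strong link
    sign = 1.0 if pearson >= 0 else -1.0     # direction from Pearson sign
    return sign * magnitude

def connectivity(pearson, dtwp, w=0.5):
    """Weighted combination into one functional-connectivity entry."""
    return w * pearson + (1 - w) * dtwp
```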

Squeeze-enhanced axial transformer, its layer and methods thereof

A squeeze-enhanced axial transformer (SeaFormer) for mobile semantic segmentation is disclosed, including a shared stem, a context branch, a spatial branch, a fusion module and a light segmentation head, wherein the shared stem produces a feature map; the context branch obtains context-rich information; the spatial branch obtains spatial information; the fusion module incorporates the features in the context branch into the spatial branch; and the light segmentation head receives the feature from the fusion module and outputs the results. This application is also related to the layer of the SeaFormer, as well as methods thereof.
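The two-branch dataflow is the architectural point, and it can be sketched with the five components as caller-supplied callables. The function name `seaformer_forward` and the stand-in modules are illustrative assumptions, not the claimed implementation.

```python
# Hedged sketch of the SeaFormer dataflow: one shared stem feeds two
# branches, and the fusion module injects context into the spatial path.
def seaformer_forward(image, stem, context_branch, spatial_branch,
                      fusion, seg_head):
    feat = stem(image)                # shared feature map
    context = context_branch(feat)    # context-rich, low-resolution path
    spatial = spatial_branch(feat)    # detail-preserving path
    fused = fusion(context, spatial)  # context features flow into spatial
    return seg_head(fused)            # light segmentation head
```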

Gesture language recognition method and apparatus, computer-readable storage medium, and computer device

A gesture language recognition method is provided. In the method, a first video is obtained. Gesture features are extracted from frames of images in the first video. Gesture change features are extracted from the frames of the images in the first video. Gesture language word information is extracted from fused features that are determined based on the gesture features and the gesture change features. The gesture language word information is combined into a gesture language sentence according to context information corresponding to the gesture language word information.
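The "gesture change features" and their fusion with per-frame gesture features can be illustrated with a minimal sketch. Frame-to-frame deltas as the change features and concatenation as the fusion are assumptions; the method's real extractors are learned.

```python
# Hedged sketch: change features as frame-to-frame deltas, fused with
# the per-frame gesture features by concatenation.
def gesture_change(gesture_feats):
    """Delta between consecutive frames' feature vectors."""
    return [[b - a for a, b in zip(f0, f1)]
            for f0, f1 in zip(gesture_feats, gesture_feats[1:])]

def fuse(gesture_feats, change_feats):
    """Concatenate each frame's gesture feature with its change feature,
    padding the first frame's change with zeros."""
    padded = [[0.0] * len(gesture_feats[0])] + change_feats
    return [g + c for g, c in zip(gesture_feats, padded)]
```

Word-level information would then be decoded from these fused features and assembled into a sentence using context, as the abstract describes.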

AUTONOMOUS DRIVING WITH SURFEL MAPS

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using a surfel map to generate a prediction for a state of an environment. One of the methods includes obtaining surfel data comprising a plurality of surfels, wherein each surfel corresponds to a respective different location in an environment, and each surfel has associated data that comprises an uncertainty measure; obtaining sensor data for one or more locations in the environment, the sensor data having been captured by one or more sensors of a first vehicle; determining one or more particular surfels corresponding to respective locations of the obtained sensor data; and combining the surfel data and the sensor data to generate a respective object prediction for each of the one or more locations of the obtained sensor data.
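Because each surfel carries an uncertainty measure, one natural way to combine it with fresh sensor data is inverse-variance weighting; this specific rule and the name `object_prediction` are assumptions, since the abstract leaves the combination open.

```python
# Hedged sketch: combine a surfel's stored value (with uncertainty)
# and a sensor observation (with uncertainty) by inverse-variance weighting.
def object_prediction(surfel_value, surfel_uncertainty,
                      sensor_value, sensor_uncertainty):
    w_map = 1.0 / surfel_uncertainty   # confident surfels weigh more
    w_obs = 1.0 / sensor_uncertainty   # confident observations weigh more
    return (w_map * surfel_value + w_obs * sensor_value) / (w_map + w_obs)
```

With equal uncertainties the prediction is the midpoint; as the surfel's uncertainty grows, the prediction leans toward the live sensor data.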

SENSOR FUSION FOR AUTONOMOUS MACHINE APPLICATIONS USING MACHINE LEARNING

In various examples, a multi-sensor fusion machine learning model, such as a deep neural network (DNN), may be deployed to fuse data from a plurality of individual machine learning models. As such, the multi-sensor fusion network may use outputs from a plurality of machine learning models as input to generate a fused output that represents data from fields of view or sensory fields of each of the sensors supplying the machine learning models, while accounting for learned associations between boundary or overlap regions of the various fields of view of the source sensors. In this way, the fused output may be less likely to include duplicate, inaccurate, or noisy data with respect to objects or features in the environment, as the fusion network may be trained to account for multiple instances of a same object appearing in different input representations.
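The de-duplication behaviour the fusion network learns for overlap regions can be mimicked with a toy merge: detections from different sensors that fall within a merge radius are collapsed into one. This greedy rule and the name `fuse_detections` are illustrative assumptions, not the trained DNN.

```python
# Hedged sketch: collapse near-duplicate detections coming from
# sensors with overlapping fields of view.
def fuse_detections(per_sensor_outputs, merge_dist=1.0):
    """Merge per-sensor 2-D detections, dropping any detection that lies
    within `merge_dist` of one already kept (a duplicate of the same object)."""
    fused = []
    for detections in per_sensor_outputs:
        for x, y in detections:
            if all((x - fx) ** 2 + (y - fy) ** 2 > merge_dist ** 2
                   for fx, fy in fused):
                fused.append((x, y))
    return fused

print(fuse_detections([[(0.0, 0.0), (5.0, 5.0)], [(0.1, 0.0), (9.0, 9.0)]]))
# -> [(0.0, 0.0), (5.0, 5.0), (9.0, 9.0)]
```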