Patent classifications
G06V10/806
Aggregating Nested Vision Transformers
A method includes receiving image data including a series of image patches of an image. The method includes generating, using a first set of transformers of a vision transformer (V-T) model, a first set of higher order feature representations based on the series of image patches and aggregating the first set of higher order feature representations into a second set of higher order feature representations that is smaller than the first set. The method includes generating, using a second set of transformers of the V-T model, a third set of higher order feature representations based on the second set of higher order feature representations and aggregating the third set of higher order feature representations into a fourth set of higher order feature representations that is smaller than the third set. The method includes generating, using the V-T model, an image classification of the image based on the fourth set.
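The two-stage aggregate-then-transform pattern described above can be sketched in a few lines. This is a minimal NumPy illustration, not the patented implementation: the identity-projection attention, pair-averaging aggregation, and 10-class head are all assumptions for the sake of a runnable example.

```python
import numpy as np

def self_attention(x):
    """Single-head self-attention with identity Q/K/V projections (sketch)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

def aggregate(tokens, factor=2):
    """Average neighbouring groups of tokens, shrinking the set by `factor`."""
    n, d = tokens.shape
    return tokens.reshape(n // factor, factor, d).mean(axis=1)

def nested_vit_classify(patches, head_w):
    first = self_attention(patches)   # first set of higher-order features
    second = aggregate(first)         # smaller second set
    third = self_attention(second)    # third set
    fourth = aggregate(third)         # smaller fourth set
    logits = fourth.mean(axis=0) @ head_w
    return int(np.argmax(logits))

rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))    # 16 image patches, 8-dim embeddings
head_w = rng.normal(size=(8, 10))     # hypothetical 10-class head
label = nested_vit_classify(patches, head_w)
```

Each aggregation halves the token set (16 → 8 → 4 here), so later transformer stages attend over progressively coarser representations.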
Fine-motion virtual-reality or augmented-reality control using radar
This document describes techniques for fine-motion virtual-reality or augmented-reality control using radar. These techniques enable small motions and displacements to be tracked, even in the millimeter or sub-millimeter scale, for user control actions even when those actions are small, fast, or obscured due to darkness or varying light. Further, these techniques enable fine resolution and real-time control, unlike conventional RF-tracking or optical-tracking techniques.
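The sub-millimeter scale claimed above is plausible because radar can resolve displacement from the phase of the return signal, which is far finer than its range resolution. The abstract does not disclose the exact method; the following assumes the standard phase-to-displacement relation for a continuous-wave return.

```python
import math

def displacement_from_phase(delta_phase_rad, freq_hz):
    """Radial displacement implied by a phase change in a radar return.

    A round-trip path change of one wavelength shifts the return phase
    by 4*pi, so displacement = delta_phase * wavelength / (4*pi).
    """
    c = 299_792_458.0                  # speed of light, m/s
    wavelength = c / freq_hz
    return delta_phase_rad * wavelength / (4 * math.pi)

# A 10-degree phase shift at 60 GHz corresponds to roughly 0.07 mm of
# motion, well inside the sub-millimeter regime these techniques target.
d = displacement_from_phase(math.radians(10), 60e9)
```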
APPARATUS AND METHOD FOR IMAGE CLASSIFICATION AND SEGMENTATION BASED ON FEATURE-GUIDED NETWORK, DEVICE, AND MEDIUM
The present invention provides an apparatus and method for image classification and segmentation based on a feature-guided network, a device, and a medium, belonging to the technical field of deep learning. The feature-guided classification network and feature-guided segmentation network of the present invention are built from basic unit blocks, among which local features are enhanced and global features are extracted. This resolves the problem that features are not fully utilized in existing image classification and image segmentation network models, so that the trained feature-guided classification network and feature-guided segmentation network perform better and are more robust. The present invention selects the feature-guided classification network or the feature-guided segmentation network according to the requirement of an input image and outputs the corresponding category or segmented image, resolving the problem that existing classification and segmentation network models produce unsatisfactory classification or segmentation results.
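The abstract names the local/global pattern inside the basic unit blocks without publishing their structure. The sketch below shows one common realization of that pattern, purely as an illustration: local enhancement via a 3x3 box filter and a global pooled feature used to gate channels; none of these choices come from the patent.

```python
import numpy as np

def basic_unit_block(x):
    """Hypothetical basic unit block: enhance local features with a 3x3
    mean filter, extract a global feature by average pooling, and use it
    to re-weight channels (a generic local/global pattern)."""
    c, h, w = x.shape
    # Local enhancement: 3x3 box filter per channel (zero-padded).
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    local = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            local += padded[:, dy:dy + h, dx:dx + w]
    local /= 9.0
    # Global feature: channel-wise average, squashed to gating weights.
    global_feat = local.mean(axis=(1, 2))         # shape (c,)
    gate = 1.0 / (1.0 + np.exp(-global_feat))     # sigmoid gating
    return local * gate[:, None, None]

x = np.random.default_rng(1).normal(size=(4, 8, 8))   # 4-channel feature map
y = basic_unit_block(x)
```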
DETECTING AN OBJECT IN AN IMAGE USING MULTIBAND AND MULTIDIRECTIONAL FILTERING
A detection method includes performing multiband filtering on a first area to obtain a plurality of band sub-images, the first area being an area in a first video frame, and performing multidirectional filtering on the plurality of band sub-images to obtain a plurality of direction sub-images. The method further includes acquiring a direction-band fused feature of the first area according to the plurality of direction sub-images, and inputting the direction-band fused feature into a detection model, which performs detection based on the fused feature to determine whether the first area comprises an object.
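One way to realize band-then-direction decomposition is in the frequency domain: rings of the FFT plane give frequency bands, angular wedges give directions. The sketch below is an illustrative stand-in, not the patent's filters; the ring/wedge masks and mean-energy fusion are assumptions.

```python
import numpy as np

def band_direction_features(area, n_bands=3, n_dirs=4):
    """Split a grayscale area into frequency bands (FFT rings), then into
    directions (angular wedges), and fuse by taking the mean energy of
    each band/direction sub-image."""
    h, w = area.shape
    spectrum = np.fft.fftshift(np.fft.fft2(area))
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.hypot(yy, xx)
    angle = np.mod(np.arctan2(yy, xx), np.pi)     # orientation in [0, pi)
    r_max = radius.max()
    feats = []
    for b in range(n_bands):                      # ring (band) masks
        ring = (radius >= r_max * b / n_bands) & (radius < r_max * (b + 1) / n_bands)
        for d in range(n_dirs):                   # wedge (direction) masks
            wedge = (angle >= np.pi * d / n_dirs) & (angle < np.pi * (d + 1) / n_dirs)
            sub = np.fft.ifft2(np.fft.ifftshift(spectrum * ring * wedge))
            feats.append(np.abs(sub).mean())      # energy of the sub-image
    return np.array(feats)                        # direction-band fused feature

area = np.random.default_rng(2).normal(size=(32, 32))
feature = band_direction_features(area)           # length 3 * 4 = 12
```

The fused feature vector would then be fed to the detection model.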
TERM WEIGHT GENERATION METHOD, APPARATUS, DEVICE AND MEDIUM
A term weight determination method includes: obtaining a video and video-associated text, the video-associated text including at least one term; generating a halfway vector of the at least one term by performing multimodal feature fusion on features of the video, the video-associated text, and the at least one term; and generating a weight of the at least one term based on the halfway vector.
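The fusion and weight stages might be realized as below. This is a minimal sketch under stated assumptions: the patent names the stages but not the network, so the concatenate-then-MLP fusion, the tanh "halfway" layer, and the sigmoid output are all hypothetical.

```python
import numpy as np

def term_weight(video_feat, text_feat, term_feat, w_fuse, w_out):
    """Fuse video, text, and term features into a 'halfway' (intermediate)
    vector, then map it to a scalar term weight in (0, 1)."""
    fused_in = np.concatenate([video_feat, text_feat, term_feat])
    halfway = np.tanh(w_fuse @ fused_in)       # multimodal halfway vector
    logit = float(w_out @ halfway)
    return 1.0 / (1.0 + np.exp(-logit))        # term weight in (0, 1)

rng = np.random.default_rng(3)
video_feat, text_feat, term_feat = rng.normal(size=(3, 16))
w_fuse = rng.normal(size=(8, 48))              # hypothetical fusion weights
w_out = rng.normal(size=8)                     # hypothetical output weights
weight = term_weight(video_feat, text_feat, term_feat, w_fuse, w_out)
```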
METHODS FOR MANAGING CLEANING ROUTES IN SMART CITIES, INTERNET OF THINGS SYSTEMS, AND STORAGE MEDIUMS
The disclosure provides a method for managing a cleaning route in a smart city, an Internet of Things system, and a storage medium. The method includes: obtaining target information of a target area within a preset time period based on an object platform; sending the target information to a management platform through a sensor network platform; and determining a cleaning route of the target area by processing the target information of the target area based on the management platform, including: determining an estimated amount of fallen leaves of each section of road in the target area; determining an estimated falling range of fallen leaves of each section of road in the target area; determining a cleaning difficulty evaluation value of each section of road based on the estimated amount of fallen leaves and the estimated falling range; and determining a cleaning route based on the cleaning difficulty evaluation value.
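The scoring-and-ordering step might look like the sketch below. The patent does not specify how the leaf amount and falling range are combined into the cleaning difficulty evaluation value, so the product used here, the hardest-first ordering, and the field names are assumptions.

```python
def cleaning_route(roads):
    """Score each road section by a cleaning-difficulty evaluation value
    and order the route hardest-first."""
    def difficulty(road):
        # Larger estimated leaf amount and wider falling range -> harder.
        return road["leaf_amount_kg"] * road["falling_range_m"]
    return sorted(roads, key=difficulty, reverse=True)

roads = [
    {"name": "Elm Ave", "leaf_amount_kg": 40, "falling_range_m": 3},
    {"name": "Oak St",  "leaf_amount_kg": 25, "falling_range_m": 8},
    {"name": "Pine Rd", "leaf_amount_kg": 10, "falling_range_m": 2},
]
route = [r["name"] for r in cleaning_route(roads)]
# -> ['Oak St', 'Elm Ave', 'Pine Rd'] (scores 200, 120, 20)
```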
MULTIMODAL FUSION FOR DIAGNOSIS, PROGNOSIS, AND THERAPEUTIC RESPONSE PREDICTION
Systems and methods can quantify the tumor microenvironment for diagnosis, prognosis, and therapeutic response prediction by fusing different data types (e.g., morphological information from histology and molecular information from omics) using an algorithm that harnesses deep learning. The algorithm employs tensor fusion to provide end-to-end multimodal fusion, modeling the pairwise interactions of features across multiple modalities (e.g., histology and molecular features). The systems and methods improve upon traditional methods for quantifying the tumor microenvironment that rely on concatenation of extracted features.
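Tensor fusion in this sense is commonly implemented as an outer product of modality embeddings, each augmented with a constant 1 so the result contains every pairwise interaction plus each unimodal feature. The sketch below assumes that standard formulation; the feature dimensions are illustrative.

```python
import numpy as np

def tensor_fusion(histology_feat, omics_feat):
    """Tensor fusion of two modality embeddings: append a constant 1 to
    each and take the outer product. The flattened result holds every
    pairwise feature interaction plus the unimodal features themselves
    (via the appended-1 row/column) -- richer than plain concatenation."""
    h = np.append(histology_feat, 1.0)
    o = np.append(omics_feat, 1.0)
    return np.outer(h, o).ravel()   # flattened (len_h+1) x (len_o+1) tensor

rng = np.random.default_rng(4)
hist = rng.normal(size=5)           # e.g. histology embedding
omic = rng.normal(size=3)           # e.g. molecular (omics) embedding
fused = tensor_fusion(hist, omic)   # length (5+1) * (3+1) = 24
```

Concatenation would yield only 5 + 3 = 8 features and no interaction terms; the fused tensor feeds the downstream deep network instead.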
LEARNING APPARATUS, LEARNING METHOD, OBJECT DETECTING APPARATUS, OBJECT DETECTING METHOD, AND RECORDING MEDIUM
A learning apparatus includes an environment information acquisition unit and a learning unit. The environment information acquisition unit acquires environment information concerning a learning image. The learning unit learns an object detection model that detects each target object included in the learning image, by using the environment information.
VISION-LiDAR FUSION METHOD AND SYSTEM BASED ON DEEP CANONICAL CORRELATION ANALYSIS
A vision-LiDAR fusion method and system based on deep canonical correlation analysis are provided. The method comprises: collecting RGB images and point cloud data of a road surface synchronously; extracting features of the RGB images to obtain RGB features; performing coordinate system conversion and then rasterization on the point cloud data, and extracting features to obtain point cloud features; inputting the point cloud features and the RGB features simultaneously into a pre-established and well-trained fusion model to output feature-enhanced fused point cloud features, wherein the fusion model fuses the RGB features into the point cloud features using correlation analysis in combination with a deep neural network; and inputting the fused point cloud features into a pre-established object detection network to achieve object detection. A similarity calculation matrix is utilized to fuse the two different modal features.
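The similarity-matrix fusion step might be sketched as below. This is an illustration only: it omits the trained deep canonical correlation projections and assumes a softmax-weighted attention over RGB features as the fusion rule.

```python
import numpy as np

def correlation_fusion(pc_feats, rgb_feats):
    """Fuse RGB features into point cloud features via a similarity
    matrix: each point cell attends to the RGB feature vectors most
    correlated with it and adds the weighted sum to its own feature."""
    sim = pc_feats @ rgb_feats.T                 # similarity calculation matrix
    sim -= sim.max(axis=1, keepdims=True)        # numerical stability
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
    return pc_feats + attn @ rgb_feats           # feature-enhanced point features

rng = np.random.default_rng(5)
pc = rng.normal(size=(100, 16))    # 100 rasterized point-cloud cells
rgb = rng.normal(size=(64, 16))    # 64 RGB feature vectors
fused_pc = correlation_fusion(pc, rgb)
```

The fused point cloud features would then be passed to the object detection network.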
GENERATING SYNTHESIZED DIGITAL IMAGES UTILIZING A MULTI-RESOLUTION GENERATOR NEURAL NETWORK
This disclosure describes methods, non-transitory computer readable storage media, and systems that generate synthesized digital images via multi-resolution generator neural networks. The disclosed system extracts multi-resolution features from a scene representation to condition a spatial feature tensor and a latent code to modulate an output of a generator neural network. For example, the disclosed system utilizes a base encoder of the generator neural network to generate a feature set from a semantic label map of a scene. The disclosed system then utilizes a bottom-up encoder to extract multi-resolution features and generate a latent code from the feature set. Furthermore, the disclosed system determines a spatial feature tensor by utilizing a top-down encoder to up-sample and aggregate the multi-resolution features. The disclosed system then utilizes a decoder to generate a synthesized digital image based on the spatial feature tensor and the latent code.
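The bottom-up/top-down flow can be sketched with simple pooling and nearest-neighbour up-sampling. This is a structural illustration only: the real encoders are learned networks, and the pooling, up-sampling, and additive aggregation here are stand-in assumptions.

```python
import numpy as np

def upsample2x(f):
    """Nearest-neighbour 2x spatial up-sampling of a (c, h, w) map."""
    return f.repeat(2, axis=1).repeat(2, axis=2)

def bottom_up(feature):
    """Build multi-resolution features by repeated 2x average pooling and
    squeeze the coarsest map into a latent code."""
    pyramid = [feature]
    while pyramid[-1].shape[1] > 4:
        c, h, w = pyramid[-1].shape
        pooled = pyramid[-1].reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
        pyramid.append(pooled)
    latent = pyramid[-1].mean(axis=(1, 2))        # latent code, shape (c,)
    return pyramid, latent

def top_down(pyramid):
    """Up-sample and aggregate the pyramid, coarse to fine, into the
    spatial feature tensor that conditions the generator."""
    agg = pyramid[-1]
    for finer in reversed(pyramid[:-1]):
        agg = upsample2x(agg) + finer
    return agg

feat = np.random.default_rng(6).normal(size=(8, 16, 16))  # from base encoder
pyramid, latent = bottom_up(feat)     # resolutions 16, 8, 4
spatial = top_down(pyramid)           # full-resolution spatial feature tensor
```

A decoder would then consume `spatial` and `latent` to produce the synthesized image.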