Patent classifications
G06V10/806
METHOD AND SYSTEM FOR DETECTING SCENE TEXT
A method and a system for detecting scene text are provided. The method includes: acquiring a scene text picture in a preset manner, pre-processing the acquired scene text picture, detecting the pre-processed scene text picture with a training model for scene text detection, and acquiring a detection result. In the method and the system, the original PSENet (Progressive Scale Expansion Network) backbone network, ResNet (Deep Residual Network), is replaced with a rich feature structure network, Res2NeXt (a combination of Res2Net and ResNeXt), to improve the network's feature extraction capability, thereby increasing its text detection precision; mixed pooling is added at an appropriate location in the backbone network to acquire useful context information by performing pooling operations with different kernel shapes and to capture long- and short-distance dependency relationships between different locations, thereby further increasing the text detection precision of the network.
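A minimal sketch of the mixed-pooling idea, assuming a PyTorch module: a square kernel captures short-distance context while 1xW and Hx1 strip pools capture long-distance dependencies. The kernel sizes and the fusion layer are illustrative assumptions, not the patent's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedPooling(nn.Module):
    """Pooling with different kernel shapes, fused into a residual context."""
    def __init__(self, channels: int):
        super().__init__()
        self.square = nn.AdaptiveAvgPool2d((3, 3))      # short-distance context
        self.strip_h = nn.AdaptiveAvgPool2d((1, None))  # long-distance, horizontal
        self.strip_v = nn.AdaptiveAvgPool2d((None, 1))  # long-distance, vertical
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        # Pool with each kernel shape, then broadcast back to the input size.
        sq = F.interpolate(self.square(x), size=(h, w),
                           mode="bilinear", align_corners=False)
        sh = self.strip_h(x).expand(-1, -1, h, -1)
        sv = self.strip_v(x).expand(-1, -1, -1, w)
        # Sum the contexts and fuse; the residual keeps local detail intact.
        return x + self.fuse(F.relu(sq + sh + sv))

feat = torch.randn(1, 64, 32, 32)   # a backbone feature map
out = MixedPooling(64)(feat)        # same shape, context-enriched
```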
UNSUPERVISED IMAGE-TO-IMAGE TRANSLATION METHOD BASED ON STYLE-CONTENT SEPARATION
The embodiments of this disclosure disclose an unsupervised image-to-image translation method. A specific implementation of this method comprises: obtaining an initial image and scaling it to a specific size; performing spatial feature extraction on the initial image to obtain feature information; inputting the feature information into a style-content separation module to obtain content feature information and style feature information; generating reference style feature information from a reference image in response to obtaining the reference image, and setting the reference style feature information to Gaussian noise in response to not obtaining the reference image; inputting the content feature information and the reference style feature information into a generator to obtain a target image; and scaling the target image to obtain a final target image. This implementation can be applied to a variety of different high-level visual tasks and improves the extensibility of the whole system.
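A minimal sketch of the described control flow in PyTorch. The encoder, generator, feature widths, and working resolution are stand-in assumptions; only the branching (reference style code vs. Gaussian noise) follows the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Separator(nn.Module):
    """Splits spatial features into a content map and a style vector."""
    def __init__(self, ch=64, style_dim=8):
        super().__init__()
        self.content = nn.Conv2d(ch, ch, 3, padding=1)
        self.style = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(ch, style_dim))
    def forward(self, f):
        return self.content(f), self.style(f)

sep = Separator()
encoder = nn.Conv2d(3, 64, 3, padding=1)        # stand-in feature extractor
generator = nn.Conv2d(64 + 8, 3, 3, padding=1)  # stand-in generator

def translate(image, reference=None, size=256, style_dim=8):
    x = F.interpolate(image, size=(size, size))   # scale input to a fixed size
    content, _ = sep(encoder(x))
    if reference is not None:
        r = F.interpolate(reference, size=(size, size))
        _, style = sep(encoder(r))                # style code from the reference
    else:
        style = torch.randn(x.size(0), style_dim) # no reference: Gaussian noise
    style_map = style[:, :, None, None].expand(-1, -1, size, size)
    out = generator(torch.cat([content, style_map], dim=1))
    return F.interpolate(out, size=image.shape[-2:])  # scale back to original

result = translate(torch.randn(1, 3, 200, 300))
```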
ALL-WEATHER TARGET DETECTION METHOD BASED ON VISION AND MILLIMETER WAVE FUSION
An all-weather target detection method based on vision and millimeter-wave fusion includes: simultaneously acquiring continuous image data and point cloud data using two types of sensors, a vehicle-mounted camera and a millimeter-wave radar; pre-processing the image data and the point cloud data; fusing the pre-processed image data and point cloud data using a pre-established fusion model, and outputting a fused feature map; and inputting the fused feature map into a YOLOv5 detection network for detection, and outputting a target detection result by non-maximum suppression. The method fully fuses millimeter-wave radar echo intensity and distance information with the vehicle-mounted camera images. It analyzes different features of the millimeter-wave radar point cloud and fuses them with image information using different feature extraction structures and methods, so that the advantages of the two types of sensor data complement each other.
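A minimal sketch of one plausible fusion step under stated assumptions: radar points carrying echo intensity and range are projected through a calibrated camera matrix and rasterized as extra image channels. The point layout and projection matrix here are illustrative; the patent's fusion model is learned, not this fixed rasterization.

```python
import torch

def fuse_radar_image(image, points, P):
    """image: (3, H, W) tensor; points: (N, 5) rows [x, y, z, intensity, range];
    P: (3, 4) calibrated camera projection matrix (assumed known)."""
    _, H, W = image.shape
    extra = torch.zeros(2, H, W)                      # intensity + range maps
    hom = torch.cat([points[:, :3], torch.ones(len(points), 1)], dim=1)
    uvw = hom @ P.T                                   # project onto image plane
    u = (uvw[:, 0] / uvw[:, 2]).long()
    v = (uvw[:, 1] / uvw[:, 2]).long()
    keep = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (uvw[:, 2] > 0)
    extra[0, v[keep], u[keep]] = points[keep, 3]      # echo intensity channel
    extra[1, v[keep], u[keep]] = points[keep, 4]      # distance channel
    return torch.cat([image, extra], dim=0)           # (5, H, W) fused map

fused = fuse_radar_image(torch.rand(3, 480, 640),
                         torch.rand(8, 5), torch.eye(3, 4))
# `fused` would then feed a detector such as YOLOv5 with a 5-channel stem.
```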
METHOD AND DEVICE FOR TRAINING MULTI-TASK RECOGNITION MODEL AND COMPUTER-READABLE STORAGE MEDIUM
A method for training a multi-task recognition model includes: obtaining a number of sample images, wherein some of the sample images provide feature-independent facial attributes, some provide feature-coupled facial attributes, and some provide face-pose attributes; training an initial feature-sharing model on a first set of the sample images to obtain a first feature-sharing model; training the first feature-sharing model on the first set and a second set of the sample images to obtain a second feature-sharing model whose loss value is less than a preset second threshold; obtaining an initial multi-task recognition model by adding a feature decoupling model to the second feature-sharing model; and training the initial multi-task recognition model on the sample images to obtain a trained multi-task recognition model.
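A minimal sketch of the staged schedule in PyTorch. The module shapes, synthetic data, loss, and thresholds are all illustrative assumptions; only the three-stage structure (shared trunk, then more data, then an added decoupling model) follows the abstract.

```python
import torch
import torch.nn as nn

shared = nn.Sequential(nn.Linear(128, 64), nn.ReLU())  # feature-sharing model
head = nn.Linear(64, 10)                                # attribute head
decouple = nn.Linear(64, 64)                            # feature decoupling model

def fit(model, make_batch, threshold, max_steps=500):
    """Train until the loss drops below the given preset threshold."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(max_steps):
        x, y = make_batch()
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if loss.item() < threshold:
            break
    return model

batch1 = lambda: (torch.randn(32, 128), torch.randn(32, 10))   # first sample set
batch12 = lambda: (torch.randn(64, 128), torch.randn(64, 10))  # first + second sets

first = fit(nn.Sequential(shared, head), batch1, threshold=0.5)
second = fit(nn.Sequential(shared, head), batch12, threshold=0.2)
multi_task = nn.Sequential(shared, decouple, head)  # add the decoupling model
trained = fit(multi_task, batch12, threshold=0.1)   # final multi-task training
```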
SYSTEM AND METHOD FOR HIERARCHICAL MULTI-LEVEL FEATURE IMAGE SYNTHESIS AND REPRESENTATION
A method for processing breast tissue image data includes: processing the image data to generate a set of image slices collectively depicting a patient's breast; for each image slice, applying one or more filters associated with a plurality of multi-level feature modules, each configured to represent and recognize an assigned characteristic or feature of a high-dimensional object; generating, at each multi-level feature module, a feature map depicting regions of the image slice having the assigned feature; combining the feature maps generated by the plurality of multi-level feature modules into a combined image object map indicating the probability that the high-dimensional object is present at a particular location of the image slice; and creating a 2D synthesized image identifying one or more high-dimensional objects based at least in part on the object maps generated for a plurality of image slices.
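A minimal NumPy sketch of the map-combine-synthesize flow. The "feature modules" here are stand-in gradient filters and the per-pixel maximum rule is an assumption; the patent's modules are designed or learned recognizers, not these toys.

```python
import numpy as np

def feature_maps(img_slice):
    # Stand-in "filters": gradient magnitudes along each axis.
    gx = np.abs(np.gradient(img_slice, axis=0))
    gy = np.abs(np.gradient(img_slice, axis=1))
    return [gx, gy]

def combined_object_map(img_slice):
    # Combine the per-module maps into one probability-like map per pixel.
    maps = feature_maps(img_slice)
    combined = sum(maps) / len(maps)
    return combined / (combined.max() + 1e-8)

def synthesize(slices):
    # For each pixel, keep the value from the slice where the combined
    # object probability peaks, yielding a single 2D synthesized image.
    probs = np.stack([combined_object_map(s) for s in slices])
    best = probs.argmax(axis=0)
    stack = np.stack(slices)
    rows, cols = np.indices(best.shape)
    return stack[best, rows, cols]

volume = [np.random.rand(64, 64) for _ in range(20)]  # toy slice stack
synthetic = synthesize(volume)                         # 2D synthesized image
```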
OBSTACLE RECOGNITION METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
An obstacle recognition method and apparatus, a computer device, and a storage medium are provided. The method comprises: acquiring point cloud data scanned by a LiDAR and time-sequence pose information of a vehicle; determining a spliced bird's eye view image according to the point cloud data, the time-sequence pose information, and a historical frame embedded image; inputting the spliced image into a preset first CNN model to obtain a current frame embedded image and pixel-level information of the bird's eye view; and determining recognition information of at least one obstacle according to the current frame embedded image and the pixel-level information.
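A minimal sketch of building the spliced input under stated assumptions: the current point cloud is rasterized into a bird's eye view grid and channel-stacked with the historical frame embedded image. Grid extent, resolution, and channel layout are illustrative.

```python
import numpy as np

def bev_grid(points, extent=50.0, cells=200):
    """points: (N, 3) LiDAR points in the (pose-compensated) vehicle frame."""
    grid = np.zeros((cells, cells), dtype=np.float32)
    ij = ((points[:, :2] + extent) / (2 * extent) * cells).astype(int)
    keep = (ij >= 0).all(axis=1) & (ij < cells).all(axis=1)
    grid[ij[keep, 0], ij[keep, 1]] = 1.0          # occupancy channel
    return grid

def spliced_image(points, history_embedding):
    # Channel-stack the current BEV with the previous frame's embedded image;
    # this stacked tensor is what the first CNN model would consume.
    return np.stack([bev_grid(points), history_embedding])

cloud = np.random.uniform(-50, 50, size=(1000, 3))
prev = np.zeros((200, 200), dtype=np.float32)     # historical frame embedding
cnn_input = spliced_image(cloud, prev)            # shape (2, 200, 200)
```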
LEARNING DEVICE, INFERENCE DEVICE, CONTROL METHOD AND STORAGE MEDIUM
The learning device 10D is trained to extract a moving image feature amount Fm, which is a feature amount relating to the moving image data Dm, when the moving image data Dm is inputted thereto, and is trained to extract a still image feature amount Fs, which is a feature amount relating to the still image data Ds, when the still image data Ds is inputted thereto. The first inference unit 32D performs a first inference regarding the moving image data Dm based on the moving image feature amount Fm. The second inference unit 34D performs a second inference regarding the still image data Ds based on the still image feature amount Fs. The learning unit 36D trains the feature extraction unit 31D based on the results of the first inference and the second inference.
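A minimal sketch of this arrangement under stated assumptions: one shared extractor (31D) feeds a moving-image head (32D) and a still-image head (34D), and only the extractor is updated from both inference losses (36D). Shapes, data, and losses are illustrative.

```python
import torch
import torch.nn as nn

feature_extractor = nn.Linear(256, 64)       # shared feature extraction unit (31D)
video_head = nn.Linear(64, 5)                # first inference unit (32D)
still_head = nn.Linear(64, 5)                # second inference unit (34D)
opt = torch.optim.Adam(feature_extractor.parameters(), lr=1e-3)

video = torch.randn(8, 256)                  # stand-in moving image data Dm
still = torch.randn(8, 256)                  # stand-in still image data Ds
labels = torch.randint(0, 5, (8,))

fm = feature_extractor(video)                # moving image feature amount Fm
fs = feature_extractor(still)                # still image feature amount Fs
loss = (nn.functional.cross_entropy(video_head(fm), labels) +
        nn.functional.cross_entropy(still_head(fs), labels))
opt.zero_grad()
loss.backward()
opt.step()                                   # learning unit (36D): update 31D only
```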
TARGET DETECTION METHOD BASED ON FUSION OF PRIOR POSITIONING OF MILLIMETER-WAVE RADAR AND VISUAL FEATURE
A target detection method based on the fusion of prior positioning by a millimeter-wave radar and a visual feature includes: simultaneously obtaining, with the millimeter-wave radar and a vehicle-mounted camera after calibration, point cloud data of the millimeter-wave radar and a camera image; performing spatial 3D coordinate transformation on the point cloud data to project the transformed point cloud data onto the camera plane; generating a plurality of anchor samples from the projected point cloud data according to a preset anchor strategy, and obtaining a final anchor sample based on a velocity-distance weight of each candidate region; fusing RGB information of the camera image with intensity information of the RCS (radar cross section) in the point cloud data to obtain a feature of the final anchor sample; and inputting the feature of the final anchor sample into a detection network to generate category and position information of a target in a scenario.
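A minimal sketch of the anchor step: anchors are placed at projected radar points and ranked by a velocity-distance weight. The weight formula and anchor sizes here are illustrative assumptions, not the disclosed strategy.

```python
import numpy as np

def radar_anchors(uv, velocity, distance, sizes=((32, 32), (64, 64))):
    """uv: projected point coordinates on the camera plane."""
    anchors, weights = [], []
    for (u, v), vel, dist in zip(uv, velocity, distance):
        for w, h in sizes:                   # anchors centered on each point
            anchors.append((u - w / 2, v - h / 2, u + w / 2, v + h / 2))
            # Assumed weighting: nearer, faster targets score higher.
            weights.append(abs(vel) / (dist + 1e-6))
    order = np.argsort(weights)[::-1]
    return np.array(anchors)[order], np.array(weights)[order]

uv = np.array([[320, 240], [100, 200]])
boxes, scores = radar_anchors(uv, velocity=[5.0, 1.0], distance=[20.0, 40.0])
final = boxes[0]   # top-weighted candidate as the final anchor sample
```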
SYSTEM AND METHOD FOR EFFICIENT VISUAL NAVIGATION
A method, apparatus, and system for efficient navigation in a navigation space include: determining semantic features, and respective 3D positional information of the semantic features, for scenes of captured image content and depth-related content in the navigation space; combining information of the determined semantic features of a scene with the respective 3D positional information using neural networks to determine an intermediate representation of the scene, which provides information regarding the positions of the semantic features in the scene and the spatial relationships among the semantic features; and using the information regarding the positions of the semantic features and the spatial relationships among the semantic features in a machine learning process to provide at least one of a navigation path in the navigation space, a model of the navigation space, and an explanation of a navigation action by a single mobile agent in the navigation space.
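A minimal sketch of one possible intermediate representation under stated assumptions: each semantic feature keeps its 3D position, and pairwise distances stand in for the spatial relationships. The labels and the relationship measure are illustrative; the patent uses neural networks to learn this representation.

```python
import numpy as np

semantics = ["door", "chair", "table"]           # detected semantic features
positions = np.array([[0.0, 0.0, 1.0],           # 3D position per feature
                      [2.0, 0.5, 0.0],
                      [2.5, 1.0, 0.0]])

# Pairwise Euclidean distances as a simple spatial-relationship matrix.
diff = positions[:, None, :] - positions[None, :, :]
relations = np.linalg.norm(diff, axis=-1)

scene = {"labels": semantics,
         "positions": positions,
         "relations": relations}                 # handed to the planning stage
print(scene["relations"][0, 1])                  # door-to-chair distance
```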
Road condition determination method and road condition determination device
In the present invention, when a road surface condition is determined based on information acquired by a camera installed in a vehicle, a route of the host vehicle is predicted and the road surface condition of the predicted route is determined. A road surface condition determination method and device determine a road surface condition of a predicted route based on information acquired by a camera installed in a host vehicle. A controller predicts a route of the host vehicle and determines a road surface friction coefficient of the predicted route based on information acquired by the camera. The determining of the road surface friction coefficient of the predicted route includes: dividing an ahead-of-vehicle image acquired by the camera into determination areas in a left-right direction, determining a road surface condition for each of the determination areas, and determining the road surface friction coefficient in the determination areas through which the predicted route will pass.
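A minimal sketch of the per-area step: the ahead-of-vehicle image is split left-to-right into determination areas, each area gets a road condition and a friction coefficient, and the predicted route picks out the relevant areas. The stand-in brightness classifier, the friction table, and the worst-case rule are illustrative assumptions.

```python
import numpy as np

FRICTION = {"dry": 0.8, "wet": 0.5, "snow": 0.3}  # assumed coefficient table

def classify(area):
    # Stand-in classifier: brighter areas read as snow, darker as wet.
    mean = area.mean()
    return "snow" if mean > 0.7 else "wet" if mean < 0.3 else "dry"

def route_friction(image, route_cols, n_areas=5):
    areas = np.array_split(image, n_areas, axis=1)  # left-right division
    mu = [FRICTION[classify(a)] for a in areas]     # coefficient per area
    width = image.shape[1]
    crossed = {min(int(c / width * n_areas), n_areas - 1) for c in route_cols}
    return min(mu[i] for i in crossed)              # worst case along the route

img = np.random.rand(240, 320)                      # ahead-of-vehicle image
mu = route_friction(img, route_cols=[150, 180, 210])  # predicted route columns
```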