G06V10/806

ELECTRONIC DEVICE AND METHOD WITH FACE KEY POINTS DETECTION

An electronic device includes a memory configured to store instructions, and a processor configured to execute the instructions to obtain a first heat map feature and a first coordinate value feature based on a face image, and to detect a face key point based on the first heat map feature and the first coordinate value feature.

Object prediction method and apparatus, and storage medium

The present application relates to an object prediction method and apparatus, an electronic device, and a storage medium. The method is applied to a neural network and includes: performing feature extraction processing on a to-be-predicted object to obtain feature information of the to-be-predicted object; determining multiple intermediate prediction results for the to-be-predicted object according to the feature information; performing fusion processing on the multiple intermediate prediction results to obtain fusion information; and determining multiple target prediction results for the to-be-predicted object according to the fusion information. The method facilitates improving the accuracy of the multiple target prediction results.

Urban remote sensing image scene classification method in consideration of spatial relationships
11710307 · 2023-07-25

An urban remote sensing image scene classification method in consideration of spatial relationships is provided and includes the following steps: cutting a remote sensing image into sub-images in an even and non-overlapping manner; performing visual information coding on each of the sub-images to obtain a feature image Fv; inputting the feature image Fv into a crossing transfer unit to obtain hierarchical spatial characteristics; applying a dimensionality-reducing convolution to the hierarchical spatial characteristics to obtain dimensionality-reduced hierarchical spatial characteristics; and performing a softmax-based classification on the dimensionality-reduced hierarchical spatial characteristics to obtain a classification result. The method comprehensively considers the role of two kinds of spatial relationships in classification, namely the regional spatial relationship and the long-range spatial relationship, and designs three paths in the crossing transfer unit for relationship fusion, thereby obtaining a better urban remote sensing image scene classification result.
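
The first and last steps of the abstract above (even, non-overlapping cutting and softmax classification) can be sketched as follows. This is a minimal illustration and not the patented method: the function names `cut_into_tiles` and `softmax` are hypothetical, the image is assumed to be a plain 2-D list whose sides are multiples of the tile size, and the visual coding and crossing transfer unit are not modeled.

```python
import math

def cut_into_tiles(image, tile):
    """Split a 2-D grid (list of rows) into non-overlapping tile x tile blocks.

    Assumption: both image dimensions are exact multiples of `tile`.
    Tiles are returned in row-major order.
    """
    rows, cols = len(image), len(image[0])
    if rows % tile or cols % tile:
        raise ValueError("image size must be a multiple of the tile size")
    tiles = []
    for r in range(0, rows, tile):
        for c in range(0, cols, tile):
            tiles.append([row[c:c + tile] for row in image[r:r + tile]])
    return tiles

def softmax(scores):
    """Turn raw class scores into probabilities (numerically stable form)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `cut_into_tiles(image, 2)` on a 4 x 4 grid yields four 2 x 2 sub-images; in a full pipeline, each sub-image would be encoded before the class scores are passed to `softmax`.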

IMAGE DATA PROCESSING METHOD AND APPARATUS

An image data processing method and apparatus are provided. In a technical solution provided by embodiments of this disclosure, M object feature maps with different sizes are obtained by extracting features from a source image. While classification confidence levels corresponding to pixel points in each of the object feature maps are acquired, initial predicted polar radii corresponding to the pixel points in each of the object feature maps may also be acquired. The initial predicted polar radii are refined based on polar radius deviations corresponding to the contour sampling points in each of the object feature maps, to acquire target predicted polar radii corresponding to the pixel points in each of the object feature maps. The object edge shape of a target object contained in the source image can then be determined based on the target predicted polar radii and the classification confidence levels.
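
To make the polar-radius representation concrete, the sketch below turns a set of radii around a center pixel into contour points. It rests on assumptions that the abstract does not state: the refinement is modeled as a simple addition of the deviation to the initial radius, the radii are assumed to be sampled at evenly spaced angles, and the names `refine_radii` and `contour_from_polar` are hypothetical.

```python
import math

def refine_radii(initial_radii, deviations):
    """Assumed refinement rule: target radius = initial radius + deviation."""
    return [r + d for r, d in zip(initial_radii, deviations)]

def contour_from_polar(center, radii):
    """Convert polar radii around `center` into (x, y) contour points.

    Assumption: radius k is measured at angle 2*pi*k/n, where n = len(radii).
    """
    cx, cy = center
    n = len(radii)
    points = []
    for k, r in enumerate(radii):
        angle = 2.0 * math.pi * k / n
        points.append((cx + r * math.cos(angle), cy + r * math.sin(angle)))
    return points
```

In a detector of this kind, the center would typically be a pixel with a high classification confidence, and the resulting points would be joined in order to form the object edge.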

ILLEGAL BUILDING IDENTIFICATION METHOD AND APPARATUS, DEVICE, AND STORAGE MEDIUM

Provided are an illegal building identification method and apparatus, a device, and a storage medium, which relate to the field of cloud computing. The specific implementation scheme is: acquiring a target image and a reference image associated with the target image; extracting a target building feature of the target image and a reference building feature of the reference image, respectively; and determining, according to the target building feature and the reference building feature, an illegal building identification result of the target image.

METHOD AND APPARATUS FOR RETRIEVING TARGET

A method and an apparatus for retrieving a target are provided. The method may include: obtaining at least one image and a description text of a designated object; extracting image features of the image and text features of the description text by using a pre-trained cross-media feature extraction network; and matching the image features with the text features to determine an image that contains the designated object.
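
The matching step can be illustrated with a small sketch. The abstract does not say how image features and text features are compared; cosine similarity is used here purely as an assumed stand-in, the feature vectors are taken as given (the cross-media extraction network is not modeled), and `cosine` and `best_match` are hypothetical names.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_match(image_features, text_feature):
    """Return (index, score) of the image whose feature best matches the text."""
    scores = [cosine(f, text_feature) for f in image_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

A threshold on the returned score could be added when none of the images is expected to contain the designated object.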

METHOD FOR CONTENT RECOMMENDATION AND DEVICE
20230004608 · 2023-01-05

A content recommendation method includes: acquiring content cover images corresponding to multiple pieces of content accessed by a user account; acquiring cover image features of the multiple content cover images, and determining user account features of the user account according to the cover image features of the multiple content cover images; determining, on the basis of cover image features of content to be recommended and the user account features, an access probability value of the user account accessing the content to be recommended; and providing, according to the access probability value, the content to be recommended to the user account.
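
A rough sketch of the data flow follows, under assumptions the abstract does not specify: the user account feature is taken as the mean of the cover image features, and the access probability is modeled as a sigmoid of the dot product between the user feature and the candidate cover feature. Both choices, and the names `user_feature` and `access_probability`, are illustrative only.

```python
import math

def user_feature(cover_features):
    """Assumed aggregation: element-wise mean of the accessed covers' features."""
    n = len(cover_features)
    return [sum(column) / n for column in zip(*cover_features)]

def access_probability(user_vec, candidate_vec):
    """Assumed scoring: sigmoid of the dot product, giving a value in (0, 1)."""
    score = sum(u * c for u, c in zip(user_vec, candidate_vec))
    return 1.0 / (1.0 + math.exp(-score))
```

Candidates would then be ranked or filtered by `access_probability` before being provided to the user account.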

RECOGNITION APPARATUS, RECOGNITION METHOD, AND NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM
20230236047 · 2023-07-27

According to one embodiment, a recognition apparatus includes processing circuitry. The processing circuitry generates a first feature quantity exhibiting a feature of sensor data based on the sensor data, converts the first feature quantity into a second feature quantity exhibiting a feature contributing to identification of a class of the sensor data, generates a significant feature quantity exhibiting a feature that is significant in the identification of the class based on a cross-correlation between the first feature quantity and the second feature quantity, generates an integrated feature quantity considering features of the first feature quantity and the second feature quantity, based on the second feature quantity and the significant feature quantity, and identifies the class based on the integrated feature quantity.

Machine learning based models for object recognition

Machine learning based models recognize objects in images. Specific features of the object are extracted from the image using machine learning based models. The specific features extracted from the image assist deep learning based models in identifying subtypes of a type of object. The system recognizes the objects and collections of objects and determines whether the arrangement of objects violates any predetermined policies. For example, a policy may specify relative positions of different types of objects, height above ground at which certain types of objects are placed, or an expected number of certain types of objects in a collection.

VIDEO CLIP POSITIONING METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
20230024382 · 2023-01-26

This application discloses a video clip positioning method performed by a computer device. In this application, clip features of video clips in a video are determined according to the unit features of video units within the video clips, so that the acquired clip features integrate the features of the video units and the time sequence correlation between the video units; and then the clip features of the video clips and a text feature of a target text are fused. The features of video clip dimensions and the time sequence correlation between the video clips are fully used in the feature fusion process, so that more accurate attention weights can be acquired based on the fused features. The attention weights are used to represent matching degrees between the video clips and the target text, and then a target video clip matching the target text can be positioned more accurately.
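
The role of the attention weights can be sketched as below. The fusion operator and scoring are not given in the abstract, so this example assumes an element-wise product for fusion and a softmax over the summed fused features; the time sequence correlation between clips is not modeled, and `fuse`, `attention_weights`, and `locate_clip` are hypothetical names.

```python
import math

def fuse(clip_feature, text_feature):
    """Assumed fusion: element-wise product of a clip feature and the text feature."""
    return [c * t for c, t in zip(clip_feature, text_feature)]

def attention_weights(clip_features, text_feature):
    """Softmax over per-clip scores; each score is the sum of the fused feature."""
    scores = [sum(fuse(c, text_feature)) for c in clip_features]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def locate_clip(clip_features, text_feature):
    """Return the index of the clip with the highest attention weight."""
    weights = attention_weights(clip_features, text_feature)
    return max(range(len(weights)), key=weights.__getitem__)
```

Here a higher weight stands for a higher matching degree between a clip and the target text, so the clip returned by `locate_clip` is the one treated as matching the text.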