Patent classifications
G06V10/806
IMAGE PROCESSING METHOD, APPARATUS AND STORAGE MEDIUM
The present disclosure relates to an image processing method and apparatus, an electronic device and a storage medium. The method includes: according to first features of a plurality of first images to be processed, determining respectively a density of each of the first features; determining density chain information corresponding to a target feature according to the density of the target feature, wherein the target feature is any one of the first features, the density chain information corresponding to the target feature includes N features, an i-th feature of the N features is one of first nearest neighbor features of an (i−1)-th feature, and the density of the i-th feature is greater than the density of the (i−1)-th feature; adjusting respectively each of the first features according to the density chain information corresponding to each of the first features to obtain second features of the plurality of first images; and clustering the second features of the plurality of first images to obtain a processing result of the plurality of first images. The embodiments of the present disclosure can improve the effect of clustering images.
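As a rough illustration of the density chain described above, the following Python sketch builds a chain for one feature and then averages the features along it. The density estimate (inverse mean distance to the k nearest neighbours), the choice of the nearest denser neighbour at each step, the averaging used as the adjustment, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import math

def knn(features, idx, k):
    """Indices of the k nearest neighbours of features[idx], nearest first."""
    dists = [(math.dist(features[idx], f), j)
             for j, f in enumerate(features) if j != idx]
    return [j for _, j in sorted(dists)[:k]]

def density(features, idx, k):
    """Assumed density: inverse of the mean distance to the k nearest neighbours."""
    nbrs = knn(features, idx, k)
    mean_d = sum(math.dist(features[idx], features[j]) for j in nbrs) / len(nbrs)
    return 1.0 / (mean_d + 1e-12)

def density_chain(features, dens, start, k, max_len):
    """Follow nearest neighbours of strictly greater density, up to max_len features."""
    chain = [start]
    while len(chain) < max_len:
        cur = chain[-1]
        denser = [j for j in knn(features, cur, k) if dens[j] > dens[cur]]
        if not denser:
            break  # no denser neighbour: the chain ends here
        chain.append(denser[0])  # assumed choice: the nearest denser neighbour
    return chain

def adjust(features, chain):
    """Assumed adjustment: mean of the features along the chain."""
    dim = len(features[0])
    return [sum(features[j][d] for j in chain) / len(chain) for d in range(dim)]
```

The adjusted features could then be passed to any standard clustering routine; pulling each feature toward denser neighbours is intended to make the clusters more compact.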
Video surveillance system
In the video surveillance system of the present invention, because the multi-channel surveillance videos are integrated into a virtual surveillance scene for panoramic viewing, surveillance videos from the plurality of channels can be viewed at the same time, which reduces viewing time and improves efficiency. In addition, because the surveillance picture changes little within the same second or even a few seconds, one frame of image is extracted at the same time point from each of the multi-channel surveillance videos, and the next frame of image is extracted after a predetermined time interval, instead of extracting every image for viewing. This improves efficiency without missing important video information.
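The interval-based extraction described above can be sketched as a simple sampling plan. This is a minimal Python sketch under stated assumptions: the channel names, the per-channel frame rates, and the function names are hypothetical, and the sketch only computes which frame index to read from each channel at each shared time point.

```python
def sample_times(duration_s, interval_s):
    """Time points (in seconds) at which one frame is taken from every channel."""
    count = int(duration_s // interval_s) + 1
    return [i * interval_s for i in range(count)]

def synchronized_frame_indices(channel_fps, duration_s, interval_s):
    """For each time point, map every channel name to the frame index to extract.

    channel_fps is a hypothetical dict of channel name -> frames per second.
    """
    plan = []
    for t in sample_times(duration_s, interval_s):
        plan.append({name: int(round(t * fps)) for name, fps in channel_fps.items()})
    return plan
```

For example, with two channels at 25 and 30 frames per second, a 10 second clip, and a 5 second interval, the plan holds three entries, one per shared time point.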
URBAN REMOTE SENSING IMAGE SCENE CLASSIFICATION METHOD IN CONSIDERATION OF SPATIAL RELATIONSHIPS
An urban remote sensing image scene classification method in consideration of spatial relationships is provided and includes the following steps: cutting a remote sensing image into sub-images in an even and non-overlapping manner; performing visual information coding on each of the sub-images to obtain a feature image Fv; inputting the feature image Fv into a crossing transfer unit to obtain hierarchical spatial characteristics; performing dimensionality-reduction convolution on the hierarchical spatial characteristics to obtain dimensionality-reduced hierarchical spatial characteristics; and performing softmax-based classification on the dimensionality-reduced hierarchical spatial characteristics to obtain a classification result. The method comprehensively considers the role in classification of two kinds of spatial relationships, namely regional spatial relationships and long-range spatial relationships, and designs three paths in the crossing transfer unit for relationship fusion, thereby obtaining a better urban remote sensing image scene classification result.
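The first and last steps above (even, non-overlapping cutting and softmax classification) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, dropping partial tiles at the image border is an assumption, and the coding, crossing transfer, and convolution steps in between are not modeled.

```python
import math

def cut_tiles(height, width, tile):
    """Return (top, left, bottom, right) boxes of non-overlapping tile x tile sub-images.

    Assumption: pixels that do not fill a whole tile at the right or
    bottom edge are dropped.
    """
    boxes = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            boxes.append((top, left, top + tile, left + tile))
    return boxes

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```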
VISUAL RELATIONSHIP DETECTION METHOD AND SYSTEM BASED ON REGION-AWARE LEARNING MECHANISMS
The present invention discloses a visual relationship detection method based on a region-aware learning mechanism, comprising: acquiring a triplet graph structure and combining features after its aggregation with neighboring nodes, using the features as nodes in a second graph structure, and connecting in accordance with equiprobable edges to form the second graph structure; combining node features of the second graph structure with features of corresponding entity object nodes in the triplet, using the combined features as a visual attention mechanism and merging internal region visual features extracted by two entity objects, and using the merged region visual features as visual features to be used in the next message propagation by corresponding entity object nodes in the triplet; and after a certain number of message propagations, combining the output triplet node features and the node features of the second graph structure to infer predicates between object sets.
SLOT FILLING WITH CONTEXTUAL INFORMATION
A system, method and non-transitory computer readable medium for editing images with verbal commands are described. Embodiments of the system, method and non-transitory computer readable medium may include an artificial neural network (ANN) comprising a word embedding component configured to convert text input into a set of word vectors, a feature encoder configured to create a combined feature vector for the text input based on the word vectors, a scoring layer configured to compute labeling scores based on the combined feature vectors, wherein the feature encoder, the scoring layer, or both are trained using multi-task learning with a loss function including a first loss value and an additional loss value based on mutual information, context-based prediction, or sentence-based prediction, and a command component configured to identify a set of image editing word labels based on the labeling scores.
Advanced driver assist systems and methods of detecting objects in the same
An advanced driver assist system (ADAS) may obtain a video sequence including a plurality of frames captured at the vehicle, each frame corresponding to a separate stereo image including a first viewpoint image and a second viewpoint image; generate disparity information associated with a stereo image; obtain depth information associated with an object included in the stereo image based on reflected electromagnetic waves captured at the vehicle; calculate correlation information between the depth information and the disparity information based on the stereo image, the depth information and the disparity information; and correct depth values associated with the stereo image based on the disparity information and the correlation information to generate a depth image with respect to the stereo image. The ADAS may detect at least one object in the stereo image based on the depth image, and may generate an output signal based on the detection.
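One simple way to relate measured depth to disparity, shown here only as a hedged sketch, uses the fact that stereo depth is inversely proportional to disparity (depth = focal length × baseline / disparity). The Python below fits a single scale factor from a few measured depths by least squares and applies it to a disparity map. This single-scale model, and the function names, are assumptions for illustration; the abstract does not describe how its correlation information is computed.

```python
def fit_depth_scale(disparities, measured_depths):
    """Least-squares fit of s in the model depth ~= s / disparity.

    disparities and measured_depths are paired samples at pixels where a
    depth measurement (for example from reflected waves) is available.
    """
    inv = [1.0 / d for d in disparities]
    num = sum(z * x for z, x in zip(measured_depths, inv))
    den = sum(x * x for x in inv)
    return num / den

def disparity_to_depth(disparity_map, scale):
    """Convert a 2D disparity map to depth; zero disparity maps to infinity."""
    return [[scale / d if d > 0 else float("inf") for d in row]
            for row in disparity_map]
```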
MULTI-MODAL, MULTI-TECHNIQUE VEHICLE SIGNAL DETECTION
A vehicle includes one or more cameras that capture a plurality of two-dimensional images of a three-dimensional object. A light detector and/or a semantic classifier search within those images for lights of the three-dimensional object. A vehicle signal detection module fuses information from the light detector and/or the semantic classifier to produce a semantic meaning for the lights. The vehicle can be controlled based on the semantic meaning. Further, the vehicle can include a depth sensor and an object projector. The object projector can determine regions of interest within the two-dimensional images, based on the depth sensor. The light detector and/or the semantic classifier can use these regions of interest to efficiently perform the search for the lights.
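To illustrate how a depth-based object projector might produce a region of interest, the following Python sketch projects the corners of a 3D box into the image with a pinhole camera model and takes their 2D bounding rectangle. The pinhole model, the intrinsics parameters (`fx`, `fy`, `cx`, `cy`), and the function names are assumptions for illustration; the abstract does not state how the projection is performed.

```python
def project_point(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point (x, y, z) to pixel (u, v)."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)

def roi_from_box_corners(corners, fx, fy, cx, cy):
    """2D bounding rectangle (u_min, v_min, u_max, v_max) of projected 3D corners.

    Corners behind the camera (z <= 0) are ignored; returns None if none remain.
    """
    pts = [project_point(p, fx, fy, cx, cy) for p in corners if p[2] > 0]
    if not pts:
        return None
    us = [u for u, _ in pts]
    vs = [v for _, v in pts]
    return (min(us), min(vs), max(us), max(vs))
```

A light detector or classifier could then search only inside the returned rectangle instead of the full image.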
METHOD AND APPARATUS FOR PERFORMING STRUCTURED EXTRACTION ON TEXT, DEVICE AND STORAGE MEDIUM
Embodiments of the present disclosure provide a method and apparatus for performing a structured extraction on a text, a device and a storage medium. The method may include: performing a text detection on an entity text image to obtain a position and content of a text line of the entity text image; extracting multivariate information of the text line based on the position and the content of the text line; performing a feature fusion on the multivariate information of the text line to obtain a multimodal fusion feature of the text line; performing category and relationship reasoning based on the multimodal fusion feature of the text line to obtain a category and a relationship probability matrix of the text line; and constructing structured information of the entity text image based on the category and the relationship probability matrix of the text line.
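The final step above, constructing structured information from line categories and a relationship probability matrix, might look like the following Python sketch. The two-category scheme ("key" and "value"), the threshold, and the function name are assumptions for illustration; the abstract does not define the categories or the decision rule.

```python
def build_structured_info(lines, categories, rel_prob, threshold=0.5):
    """Pair each 'key' line with the 'value' line it is most strongly related to.

    lines: text content of each text line.
    categories: assumed label per line, either "key" or "value".
    rel_prob: square matrix; rel_prob[i][j] is the probability that
              line i is related to line j.
    """
    result = {}
    for i, cat_i in enumerate(categories):
        if cat_i != "key":
            continue
        candidates = [(rel_prob[i][j], j)
                      for j, cat_j in enumerate(categories) if cat_j == "value"]
        if not candidates:
            continue
        prob, j = max(candidates)
        if prob >= threshold:  # keep only sufficiently confident links
            result[lines[i]] = lines[j]
    return result
```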
Portrait Segmentation Method, Model Training Method and Electronic Device
Embodiments of the present disclosure provide a portrait segmentation method, a model training method, and an electronic device. The input portrait segmentation request is received, the to-be-segmented image is obtained according to the portrait segmentation request, and the pre-trained portrait segmentation model is invoked to segment the to-be-segmented image into a portrait part and a background part. The portrait segmentation model includes a feature extraction network and a double branch network. The double branch network includes a portrait branch network and a background branch network with a same architecture. The portrait branch network is configured to accurately classify the portrait in the image, and the background branch network is configured to accurately classify the background in the image. Finally, the classification results of the two are fused to split the image into a portrait part and a background part.
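The fusion of the two branch outputs can be illustrated with a very small Python sketch. It assumes each branch yields a per-pixel score map of the same size and labels a pixel as portrait when the portrait score exceeds the background score; this comparison rule and the function name are assumptions, since the abstract does not specify how the results are fused.

```python
def fuse_branches(portrait_scores, background_scores):
    """Return a binary mask: 1 where the portrait score exceeds the background score."""
    return [
        [1 if p > b else 0 for p, b in zip(p_row, b_row)]
        for p_row, b_row in zip(portrait_scores, background_scores)
    ]
```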