Patent classifications
G06V10/86
System and method for verifying user by security token combined with biometric data processing techniques
Embodiments of the inventive concept provide a system and method that verify a user through a security token combined with biometric-information processing techniques, allowing a conventional encryption key to be changed or canceled without storing biometric data corresponding to the user's personal information.
END-TO-END SIGNALIZED INTERSECTION TRANSITION STATE ESTIMATOR WITH SCENE GRAPHS OVER SEMANTIC KEYPOINTS
Systems, methods, computer-readable media, techniques, and methodologies are disclosed for performing end-to-end, learning-based keypoint detection and association. A scene graph of a signalized intersection is constructed from an input image of the intersection. The scene graph includes detected keypoints and linkages identified between the keypoints. The scene graph can be used along with a vehicle's localization information to identify which traffic-signal keypoint is associated with the vehicle's current travel lane. An appropriate vehicle action may then be determined based on a transition state of the traffic signal keypoint and trajectory information for the vehicle. A control signal indicative of this vehicle action may then be output to cause an autonomous vehicle, for example, to implement the appropriate vehicle action.
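The lookup described in this abstract can be sketched as a small graph query: keypoints are nodes, linkages are edges, and the signal keypoint linked to the vehicle's current lane determines the action. All class and state names below are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class Keypoint:
    kp_id: int
    kind: str            # e.g. "signal" or "lane" (hypothetical labels)
    state: str = "none"  # transition state, meaningful for signal keypoints

@dataclass
class SceneGraph:
    keypoints: dict      # kp_id -> Keypoint
    linkages: set        # undirected pairs (kp_id, kp_id)

    def signal_for_lane(self, lane_id):
        """Return the signal keypoint linked to the given lane keypoint."""
        for a, b in self.linkages:
            pair = {a, b}
            if lane_id in pair:
                kp = self.keypoints[(pair - {lane_id}).pop()]
                if kp.kind == "signal":
                    return kp
        return None

def choose_action(graph, lane_id):
    # Map the associated signal's transition state to a vehicle action.
    kp = graph.signal_for_lane(lane_id)
    if kp is None:
        return "proceed_with_caution"
    return {"red": "stop", "green": "go", "yellow": "slow"}.get(kp.state, "stop")
```

This is only the association step; the patent's keypoint detection itself is learned end-to-end from the input image.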
SYSTEMS AND METHODS FOR RETRIEVING VIDEOS USING NATURAL LANGUAGE DESCRIPTION
Implementations are directed to methods, systems, and computer-readable media for obtaining videos and extracting, from each video, a key frame for the video including a timestamp. For each key frame, a scene graph is generated. Generating the scene graph for the key frame includes identifying objects in the key frame and extracting a relationship feature defining a relationship between a first object and a second, different object of the objects in the key frame. The scene graph for the key frame is generated that includes a set of nodes and a set of edges. A natural language query request for a video is received, including terms defining a relationship between two or more particular objects. A query graph is generated for the natural language query request, and a set of videos corresponding to the set of scene graphs matching the query graph is provided for display on a user device.
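The matching step can be sketched by reducing each key frame's scene graph to (subject, relation, object) triples: a video matches when its scene graph contains every edge of the query graph. This is a minimal subset-matching sketch under that assumption; the names are illustrative, not from the patent.

```python
def graph_matches(scene_triples, query_triples):
    """True if every query edge appears among the scene graph's edges."""
    return set(query_triples) <= set(scene_triples)

def retrieve(videos, query_triples):
    """videos: list of (video_id, key_frame_timestamp, scene_triples).
    Returns the (video_id, timestamp) pairs whose scene graph matches."""
    return [(vid, ts) for vid, ts, g in videos if graph_matches(g, query_triples)]
```

A query such as "a dog on a sofa" would first be parsed into a query graph, e.g. the single triple ("dog", "on", "sofa"), before matching.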
Image Recognition Method and Related Device
In an image recognition method, a terminal determines, based on first positioning information, target object information corresponding to building information in a to-be-recognized image in desensitized map data. The desensitized map data does not include a sensitive building. Then, when the terminal determines that the target object information does not include the building information, the terminal determines that the map data does not include the building information. In this case, the terminal recognizes the building information as a sensitive building. In other words, the terminal may recognize, by using the desensitized map data, building information corresponding to a sensitive building in the to-be-recognized image.
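The decision rule here is an absence test: because sensitive buildings are deliberately missing from the desensitized map, a recognized building that cannot be found among the map objects near the device's position is flagged as sensitive. A minimal sketch, with a dict standing in for the map lookup (names are assumptions):

```python
def classify_building(desensitized_map, positioning_info, building_info):
    """Flag a building as sensitive when it is absent from the
    desensitized map objects at the given position (illustrative sketch).

    desensitized_map: position key -> set of known (non-sensitive) buildings
    """
    nearby = desensitized_map.get(positioning_info, set())
    return "non_sensitive" if building_info in nearby else "sensitive"
```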
GRAPHICAL DIAGRAM COMPARISON
In an example, a method to compare graphical diagrams may include comparing a first set of graphical objects of a first graphical diagram to a second set of graphical objects of a second graphical diagram. The method may include, in response to one or more identifying properties of a given graphical object of the first set matching one or more identifying properties of a given graphical object of the second set, declaring the given graphical object of the first set and the given graphical object of the second set a matched pair. The method may include marking each unmatched graphical object as a deleted graphical object or an added graphical object. The method may include generating and displaying a comparison graphical diagram that visually identifies differences between the first and second graphical diagrams.
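The matching and marking steps amount to a set-difference over identifying properties: objects whose keys appear in both diagrams form matched pairs, objects only in the first diagram are marked deleted, and objects only in the second are marked added. A sketch under that reading, with `key` standing in for the identifying-property extractor:

```python
def compare_diagrams(first, second, key=lambda obj: obj["id"]):
    """Return (matched_pairs, deleted, added) for two lists of graphical
    objects. `key` extracts the identifying properties used for matching."""
    second_by_key = {key(o): o for o in second}
    matched, deleted = [], []
    for obj in first:
        k = key(obj)
        if k in second_by_key:
            matched.append((obj, second_by_key.pop(k)))  # matched pair
        else:
            deleted.append(obj)                          # only in first diagram
    added = list(second_by_key.values())                 # only in second diagram
    return matched, deleted, added
```

The returned three groups map directly onto the comparison diagram the method displays: matched, deleted, and added objects can each be rendered with distinct visual markings.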
METHOD OF PROCESSING VIDEO, METHOD OF QUERYING VIDEO, AND METHOD OF TRAINING MODEL
The present application provides a method of processing a video, a method of querying a video, and a method of training a video processing model. A specific implementation solution of the method of processing the video includes: extracting, for a video to be processed, a plurality of video features under a plurality of receptive fields; extracting a local feature of the video to be processed according to a video feature under a target receptive field in the plurality of receptive fields; obtaining a global feature of the video to be processed according to a video feature under a largest receptive field in the plurality of receptive fields; and merging the local feature and the global feature to obtain a target feature of the video to be processed.
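The merge described above can be sketched with array operations: the target receptive field supplies the local feature, the largest receptive field supplies the global feature, and the two are concatenated into the target feature. The temporal pooling via `mean` is an assumption for illustration; the patent does not specify the pooling or merging operators.

```python
import numpy as np

def target_feature(features_by_rf, target_rf):
    """features_by_rf: dict mapping receptive-field size -> (T, D) array
    of per-frame video features extracted under that receptive field."""
    local = features_by_rf[target_rf].mean(axis=0)       # local feature (assumed pooling)
    largest_rf = max(features_by_rf)                     # largest receptive field
    global_feat = features_by_rf[largest_rf].mean(axis=0)  # global feature
    return np.concatenate([local, global_feat])          # merged target feature
```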
SUPERVISED CONTRASTIVE LEARNING FOR VISUAL GROUNDING
A method of training a neural network model includes generating a positive image based on an original image, generating a positive text corresponding to the positive image based on an original text corresponding to the original image, the positive text referring to an object in the positive image, constructing a positive image-text pair for the object based on the positive image and the positive text, constructing a negative image-text pair for the object based on the original image and a negative text, the negative text not referring to the object, training the neural network model based on the positive image-text pair and the negative image-text pair to output features representing an input image-text pair, and identifying the object in the original image based on the features representing the input image-text pair.
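The training signal above can be sketched as a two-pair contrastive loss: a scoring model maps an image-text pair to a similarity, and the loss pushes the positive pair's score above the negative pair's score. The softmax form below is one common choice and an assumption here; the scoring function stands in for the neural network model in the abstract.

```python
import math

def contrastive_loss(score_pos, score_neg):
    """-log softmax over {positive, negative} image-text pair scores.
    Minimized when the positive pair scores much higher than the negative."""
    m = max(score_pos, score_neg)  # subtract max for numerical stability
    denom = math.exp(score_pos - m) + math.exp(score_neg - m)
    return -math.log(math.exp(score_pos - m) / denom)
```

With equal scores the loss is log 2; as the positive pair's score rises above the negative's, the loss approaches zero, which is the supervised contrastive behavior the training step relies on.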