Patent classifications
G06V10/00
Spatio-temporal graph for video captioning with knowledge distillation
A method for scene perception using video captioning based on a spatio-temporal graph model is described. The method includes decomposing the spatio-temporal graph model of a scene in input video into a spatial graph and a temporal graph. The method also includes modeling a two branch framework having an object branch and a scene branch according to the spatial graph and the temporal graph to learn object interactions between the object branch and the scene branch. The method further includes transferring the learned object interactions from the object branch to the scene branch as privileged information. The method also includes captioning the scene by aligning language logits from the object branch and the scene branch according to the learned object interactions.
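The logit alignment step can be illustrated with a minimal sketch. The abstract does not say which alignment objective is used, so the Kullback-Leibler divergence below is an assumption borrowed from common knowledge-distillation practice. The names distillation_loss, object_logits, scene_logits, and temperature are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(object_logits, scene_logits, temperature=1.0):
    """Assumed alignment objective: the object branch acts as teacher and
    the scene branch as student. The abstract does not name the loss."""
    teacher = softmax(object_logits, temperature)
    student = softmax(scene_logits, temperature)
    return kl_divergence(teacher, student)
```

Under this assumption the loss is zero when both branches produce the same word distribution and grows as they diverge, which is one way the scene branch could be pulled toward the object branch's predictions.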
Optical device and optical neural network apparatus including the same
Provided are an optical device which is capable of optically implementing an activation function of an artificial neural network and an optical neural network apparatus which includes the optical device. The optical device may include: a beam splitter splitting incident light into first light and second light; an image sensor disposed to sense the first light; an optical shutter configured to transmit or block the second light; and a controller controlling operations of the optical shutter, based on an intensity of the first light measured by the image sensor.
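The controller behaviour described above can be modelled numerically as a rough sketch. The split ratio, the threshold, and the hard on/off rule are all assumptions made for illustration. The abstract states only that the shutter is controlled based on the intensity of the first light.

```python
def optical_activation(incident_intensity, split_ratio=0.5, threshold=0.1):
    """Toy model of the beam splitter, image sensor, and shutter chain.

    split_ratio and threshold are hypothetical parameters, not values
    taken from the abstract.
    """
    # Beam splitter: divide the incident light into two paths.
    first_light = incident_intensity * split_ratio
    second_light = incident_intensity * (1.0 - split_ratio)

    # Controller: open the shutter only if the sensed intensity is high enough.
    shutter_open = first_light >= threshold

    # Shutter: transmit or block the second light.
    return second_light if shutter_open else 0.0
```

With these assumed settings the device behaves like a thresholded linear unit: weak inputs are blocked and stronger inputs pass through attenuated by the splitter.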
Method, system, and computer program product for detecting fraudulent interactions
A method for detecting fraudulent interactions may include receiving interaction data, including a first plurality of interactions with (first) fraud labels and a second plurality of interactions (without fraud labels). Second fraud label data for each of the second plurality of interactions may be generated with a first neural network (e.g., classifying whether each interaction is fraudulent or not). Generated interaction data and generated fraud label data may be generated with a second neural network. Discrimination data for each of the second plurality of interactions and generated interactions may be generated with a third neural network (e.g., classifying whether the respective interaction is real or not). Error data may be determined based on the discrimination data (e.g., whether the respective interaction is correctly classified). At least one of the neural networks may be trained based on the error data. A system and computer program product are also disclosed.
Method for encoding and decoding video, and apparatus using same
The present invention relates to a technique for encoding and decoding video data, and more particularly, to a method for performing inter-prediction in an effective manner. The present invention combines an inter-prediction method using an AMVP mode and an inter-prediction method using a merge mode so as to propose a method for using the same candidate. The method for encoding video data proposed by the present invention comprises the following steps: receiving mode information on an inter-prediction method of a current block; determining, on the basis of the received mode information, whether the inter-prediction method to be applied to the current block is an AMVP mode or a merge mode; and selecting a candidate to derive motion information of the current block, wherein the candidate is selected in a left region, top region and corner region of the current block and in the same position block as the current block, and the AMVP mode and the merge mode are applied on the basis of the selected candidate.
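The idea of one candidate list serving both modes can be sketched as follows. The list construction order, the duplicate pruning, and the use of a motion vector difference in AMVP mode follow general inter-prediction practice and are assumptions here; the abstract does not specify them. All names are hypothetical.

```python
from typing import NamedTuple

class MotionVector(NamedTuple):
    x: int
    y: int

def build_candidates(left, top, corner, colocated):
    """Collect available candidates from the four regions, skipping
    unavailable (None) entries and duplicates."""
    candidates = []
    for mv in (left, top, corner, colocated):
        if mv is not None and mv not in candidates:
            candidates.append(mv)
    return candidates

def derive_motion(mode, candidates, index, mvd=MotionVector(0, 0)):
    """Derive the current block's motion vector from the shared list."""
    selected = candidates[index]
    if mode == "merge":
        # Merge mode: reuse the selected candidate's motion as-is.
        return selected
    if mode == "amvp":
        # AMVP mode: treat the candidate as a predictor and add the
        # signalled difference (assumed, not stated in the abstract).
        return MotionVector(selected.x + mvd.x, selected.y + mvd.y)
    raise ValueError(f"unknown mode: {mode}")
```

Both branches index into the same list, so only the interpretation of the selected candidate changes with the mode.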
METHOD OF PROCESSING MAP DATA, ELECTRONIC DEVICE AND STORAGE MEDIUM
A method of processing map data, an electronic device, and a storage medium, which relate to a field of a computer technology, in particular to fields of intelligent transportation technology, image processing technology, etc. The method of processing the map data includes: processing sensor data for a traffic object to obtain point cloud data for the traffic object, where the sensor data includes image data; obtaining mesh data based on the point cloud data; processing the image data based on an association between the mesh data and the image data, so as to obtain processed image data; and obtaining the map data for the traffic object based on the processed image data.
METHOD OF LEARNING A TARGET OBJECT USING A VIRTUAL VIEWPOINT CAMERA AND A METHOD OF AUGMENTING A VIRTUAL MODEL ON A REAL OBJECT IMPLEMENTING THE TARGET OBJECT USING THE SAME
Provided is a method of learning a target object implemented on a computer-aided design program of an authoring computing device using a virtual viewpoint camera, including displaying a digital model of a target object that is a target for image recognition, setting at least one observation area surrounding the digital model of the target object and having a plurality of viewpoints on the digital model, generating a plurality of pieces of image data obtained by viewing the digital model of the target object at the plurality of viewpoints of the at least one observation area, and generating object recognition library data for recognizing a real object implementing the digital model of the target object based on the generated plurality of pieces of image data.
AUTOMATED SAMPLE WEIGHT MEASUREMENT VIA OPTICAL INSPECTION
A method includes the steps of collecting measurement data of a sample utilizing an adaptable inspection unit or while the sample is in-flight, determining a volume or area of the sample based at least in part on the measurement data, and calculating a weight of the sample based at least in part on the volume or area of the sample. The measurement data includes a captured image that includes a plurality of pixels. The determining of the volume of the sample includes determining the number of pixels in the captured image that display a portion of the sample, or determining the maximum number of consecutive pixels that display a portion of the sample in two or three dimensions.
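A minimal sketch of the pixel-counting idea follows, using the area-based route. It assumes the captured image has already been segmented into a binary mask and that two calibration constants are known. The names area_per_pixel_mm2 and areal_density_mg_per_mm2 are hypothetical and do not come from the abstract.

```python
def count_sample_pixels(mask):
    """Number of pixels in the mask that display a portion of the sample."""
    return sum(1 for row in mask for pixel in row if pixel)

def max_consecutive_run(mask):
    """Longest horizontal run of sample pixels in any row of the mask."""
    best = 0
    for row in mask:
        run = 0
        for pixel in row:
            run = run + 1 if pixel else 0
            best = max(best, run)
    return best

def estimate_weight_mg(mask, area_per_pixel_mm2, areal_density_mg_per_mm2):
    """Weight from projected area, using assumed calibration constants."""
    area_mm2 = count_sample_pixels(mask) * area_per_pixel_mm2
    return area_mm2 * areal_density_mg_per_mm2
```

For example, a mask with 6 sample pixels at 0.25 mm² per pixel and 2.0 mg/mm² gives 1.5 mm² and an estimated 3.0 mg.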
Image coding method based on secondary transform, and device therefor
An image decoding method according to the present document comprises the steps of: deriving transform coefficients through inverse quantization on the basis of quantized transform coefficients for a target block; deriving modified transform coefficients on the basis of an inverse reduced secondary transform (RST) for the transform coefficients; and generating, on the basis of an inverse primary transform for the modified transform coefficients, a restoration picture based on residual samples for the target block, wherein the modified transform coefficients derived according to the inverse RST are two-dimensionally arranged according to the order of a row priority direction or a column priority direction according to an intra prediction mode to be applied to the target block.
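The mode-dependent arrangement can be illustrated with a short sketch. The abstract says only that the order depends on the intra prediction mode, so the rule used here (row priority when the mode index is at or below mode_threshold, column priority otherwise) and the default value of 34 are assumptions.

```python
def arrange_coefficients(coeffs, width, height, intra_mode, mode_threshold=34):
    """Place a 1-D list of coefficients into a height x width block.

    mode_threshold is a hypothetical switch point between row priority
    and column priority ordering.
    """
    if len(coeffs) != width * height:
        raise ValueError("coefficient count must equal width * height")

    block = [[0] * width for _ in range(height)]
    row_priority = intra_mode <= mode_threshold
    for i, value in enumerate(coeffs):
        if row_priority:
            row, col = divmod(i, width)   # fill each row left to right
        else:
            col, row = divmod(i, height)  # fill each column top to bottom
        block[row][col] = value
    return block
```

For a square block the two orders produce transposes of each other, so a 2x2 input of [0, 1, 2, 3] becomes [[0, 1], [2, 3]] in row priority order and [[0, 2], [1, 3]] in column priority order.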
BOOTSTRAPPING A SIMULATION-BASED ELECTROMAGNETIC OUTPUT OF A DIFFERENT ANATOMY
Systems are provided for generating data representing electromagnetic states of a heart for medical, scientific, research, and/or engineering purposes. The systems generate the data based on source configurations such as dimensions of, and scar or fibrosis or pro-arrhythmic substrate location within, a heart and a computational model of the electromagnetic output of the heart. The systems may dynamically generate the source configurations to provide representative source configurations that may be found in a population. For each source configuration of the electromagnetic source, the systems run a simulation of the functioning of the heart to generate modeled electromagnetic output (e.g., an electromagnetic mesh for each simulation step with a voltage at each point of the electromagnetic mesh) for that source configuration. The systems may generate a cardiogram for each source configuration from the modeled electromagnetic output of that source configuration for use in predicting the source location of an arrhythmia.