Patent classifications
G06V10/806
VIDEO DESCRIPTION GENERATION METHOD AND APPARATUS, VIDEO PLAYING METHOD AND APPARATUS, AND STORAGE MEDIUM
The present disclosure discloses a video description generation method and apparatus, a video playing method and apparatus, and a computer-readable storage medium. The method includes: extracting video features, and obtaining a video feature sequence corresponding to video encoding moments in a video stream; encoding the video feature sequence by using a forward recurrent neural network and a backward recurrent neural network, to obtain a forward hidden state sequence and a backward hidden state sequence corresponding to each video encoding moment; and positioning, according to the forward hidden state sequence and the backward hidden state sequence, an event corresponding to each video encoding moment and an interval corresponding to the event at the video encoding moment, thereby predicting a video content description of the event. On the basis of distinguishing overlapping events, the interval corresponding to the event is introduced to predict and generate a word corresponding to the event at the video encoding moment, and events that overlap at the video encoding moment correspond to different intervals, so that the video content descriptions of events at this video encoding moment have a high degree of distinction. By analogy, events in the given video stream can be described more distinctively.
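The bidirectional encoding step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method: each video encoding moment is reduced to a single scalar feature, the recurrent cell is a plain tanh unit, and the weights `w_in` and `w_rec` are fixed, hypothetical values rather than learned parameters.

```python
import math

def rnn_pass(seq, w_in=0.5, w_rec=0.5):
    """Run a scalar tanh recurrent cell over seq and return every hidden state.

    w_in and w_rec are hypothetical fixed weights used only for illustration.
    """
    h = 0.0
    states = []
    for x in seq:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def bidirectional_encode(seq):
    """Return one (forward, backward) hidden-state pair per encoding moment."""
    forward = rnn_pass(seq)
    # The backward pass reads the sequence from the end, then is re-aligned
    # so that index t still refers to encoding moment t.
    backward = rnn_pass(seq[::-1])[::-1]
    return list(zip(forward, backward))
```

At each moment the forward state summarizes what came before and the backward state summarizes what comes after, which is the information the abstract uses to position an event and its interval.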
NATURAL LANGUAGE GENERATION BY AN EDGE COMPUTING DEVICE
Systems and methods for natural language generation by an edge computing device are disclosed. In one embodiment, a method comprises: receiving, by an edge computing device, event data from an edge event; determining, by the edge computing device, that a network connection to a cloud server is not available; extracting, by the edge computing device, features of the event data; predicting, by a local neural network of the edge computing device, an action for the edge computing device to take based on the features of the event data, wherein the action is associated with a confidence level; and determining, by the edge computing device, whether the confidence level meets a predetermined threshold value.
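The control flow in this abstract can be sketched as below. Everything named here is an assumption made for illustration: `predict_locally` is a hypothetical stand-in for the local neural network, the action names are invented, and the abstract does not say what happens once the threshold check is made, so the `"fallback"` branch is a guess at one plausible behavior.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    action: str
    confidence: float

def predict_locally(features):
    """Hypothetical stand-in for the local neural network."""
    score = sum(features) / len(features)
    action = "raise_alert" if score > 0.5 else "log_only"
    # Confidence grows with the distance of the score from the decision point.
    return Prediction(action=action, confidence=abs(score - 0.5) * 2)

def handle_event(event_data, cloud_available, threshold=0.8):
    """Decide locally only when the cloud server cannot be reached."""
    if cloud_available:
        return "defer_to_cloud"  # assumed behavior, not stated in the abstract
    # Feature extraction: here simply clamping raw readings into [0, 1].
    features = [min(max(x, 0.0), 1.0) for x in event_data]
    prediction = predict_locally(features)
    if prediction.confidence >= threshold:
        return prediction.action
    return "fallback"  # assumed behavior when confidence is too low
```

For example, `handle_event([0.95, 1.0, 0.98], cloud_available=False)` takes the predicted action, while a borderline reading such as `[0.5, 0.6]` falls below the threshold.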
Multimodal data fusion by hierarchical multi-view dictionary learning
Techniques for multimodal data fusion having a multimodal hierarchical dictionary learning framework that learns latent subspaces with hierarchical overlaps are provided. In one aspect, a method for multi-view data fusion with hierarchical multi-view dictionary learning is provided which includes the steps of: extracting multi-view features from input data; defining feature groups that group together the multi-view features that are related; defining a hierarchical structure of the feature groups; and learning a dictionary using the feature groups and the hierarchy of the feature groups. A system for multi-view data fusion with hierarchical multi-view dictionary learning is also provided.
METHOD AND SYSTEM FOR FINGERPRINT IMAGE ENHANCEMENT
The present disclosure relates to a method for fingerprint image enhancement comprising applying a first low pass filter and a first weight to raw fingerprint image data to produce a first filtered fingerprint image data set, and applying a second low pass filter and a second weight to the raw fingerprint image data to produce a second filtered fingerprint image data set. Filter coefficients of the second filter are different from filter coefficients of the first filter. The first filtered fingerprint image data set and the second filtered fingerprint image data set are combined to produce a final enhanced fingerprint image. The disclosure also relates to a fingerprint sensing system and to an electronic device comprising a fingerprint sensing system.
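A minimal sketch of the two-filter scheme follows. The abstract does not specify the filters, the weights, or how the two data sets are combined, so the choices here are assumptions: box (mean) filters of different radii serve as the two low pass filters, and the combination is a weighted sum. With weights of opposite sign this behaves like a band-pass filter, which is one common way to emphasize ridge detail.

```python
def box_filter(image, radius):
    """Mean (low pass) filter over a (2*radius+1)^2 window, edges clamped."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += image[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out

def enhance(raw, w1=1.0, w2=-1.0, r1=1, r2=3):
    """Combine two differently filtered copies of the raw image.

    w1, w2, r1 and r2 are hypothetical values chosen for illustration.
    """
    first = box_filter(raw, r1)   # first low pass filter
    second = box_filter(raw, r2)  # second filter, different coefficients
    return [
        [w1 * first[y][x] + w2 * second[y][x] for x in range(len(raw[0]))]
        for y in range(len(raw))
    ]
```

With these default weights a perfectly flat image maps to zero everywhere, so only local variation in the raw data survives.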
METHOD AND APPARATUS FOR SOUND OBJECT FOLLOWING
The present disclosure relates to a method and apparatus for processing a multimedia signal. More specifically, the present disclosure relates to a method comprising obtaining a video frame and an audio frame from the multimedia signal; obtaining at least one video object from the video frame and at least one audio object from the audio frame; determining a correlation between the at least one video object and the at least one audio object; and performing a directional rendering on a specific audio object of the at least one audio object, based on a screen location of a specific video object related with the specific audio object from among the at least one video object according to the determined correlation, and an apparatus therefor.
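One simple way to picture rendering an audio object toward the screen location of its matched video object is stereo panning. The sketch below is an assumption, not the disclosed renderer: it uses a constant-power pan law over two channels, and `pan_gains` and `render_stereo` are hypothetical names.

```python
import math

def pan_gains(x, screen_width):
    """Map a horizontal screen position to (left, right) channel gains."""
    position = min(max(x / screen_width, 0.0), 1.0)  # 0 = far left, 1 = far right
    theta = position * math.pi / 2
    # Constant-power law: left^2 + right^2 == 1 at every position.
    return (math.cos(theta), math.sin(theta))

def render_stereo(mono_samples, x, screen_width):
    """Place a mono audio object at the video object's horizontal location."""
    left, right = pan_gains(x, screen_width)
    return [(s * left, s * right) for s in mono_samples]
```

An object at the center of the screen receives equal gain in both channels, and the gains shift toward one channel as the object moves to that side.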
METHOD AND APPARATUS FOR SOUND OBJECT FOLLOWING
The present disclosure relates to a method and apparatus for processing a multimedia signal. More specifically, the present disclosure relates to a method comprising obtaining at least one video object from the multimedia signal and at least one audio object from the multimedia signal, extracting video feature information for the at least one video object and audio feature information for the at least one audio object, and determining a correlation between the at least one video object and the at least one audio object through an object matching engine based on the video feature information and the audio feature information, and an apparatus therefor.
LAYOUT-AWARE, SCALABLE RECOGNITION SYSTEM
Described herein is a mechanism for visual recognition of items or visual search using Optical Character Recognition (OCR) of text in images. Recognized OCR blocks in an image comprise position information and recognized text. The embodiments utilize a location-aware feature vector created using the position and recognized information in each recognized block. The location-aware features of the feature vector utilize position information associated with the block to calculate a weight for the block. The recognized text is used to construct a tri-character gram frequency, inverse document frequency (TGF-IDF) metric using tri-character grams extracted from the recognized text. Features in the location-aware feature vector for the block are computed by multiplying the weight and the corresponding TGF-IDF metric. The location-aware feature vector for the image is the sum of the location-aware feature vectors for the individual blocks.
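The feature construction above can be sketched as follows. The abstract does not give the weighting function or the source of the inverse document frequencies, so both are assumptions here: `block_weight` is a hypothetical rule that favors blocks nearer the top of the image, and `idf` is a precomputed table passed in by the caller.

```python
from collections import Counter

def trigrams(text):
    """Extract overlapping tri-character grams from recognized text."""
    lowered = text.lower()
    return [lowered[i:i + 3] for i in range(len(lowered) - 2)]

def block_weight(block, image_height):
    """Hypothetical position weight: blocks nearer the top count more."""
    return 1.0 - block["y"] / image_height

def image_vector(blocks, idf, image_height):
    """Sum the per-block vectors of weight * (gram frequency * IDF)."""
    vector = Counter()
    for block in blocks:
        weight = block_weight(block, image_height)
        frequencies = Counter(trigrams(block["text"]))
        for gram, frequency in frequencies.items():
            vector[gram] += weight * frequency * idf.get(gram, 0.0)
    return dict(vector)
```

Because the image vector is a plain sum over blocks, adding more recognized blocks only requires accumulating additional terms.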
MULTI-LEVEL DEEP FEATURE AND MULTI-MATCHER FUSION FOR IMPROVED IMAGE RECOGNITION
A system, method and program product for implementing image recognition. A system is disclosed that includes a training system for generating a multi-feature multi-matcher fusion (MMF) predictor for scoring pairs of images, the training system having: a neural network configurable to extract a set of feature spaces at different resolutions based on a training dataset; and an optimizer that processes the training dataset, extracted feature spaces and a set of matcher functions to generate the MMF predictor having a series of weighted feature/matcher components; and a prediction system that utilizes the MMF predictor to generate a prediction score indicative of a match for a pair of images.
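The prediction side of such a fusion predictor can be sketched as a weighted sum of feature/matcher components. This is an illustration under assumptions: the feature levels, the matcher functions, and the weights are hypothetical, and the optimizer that would learn the weights from training data is not shown.

```python
import math

def cosine(a, b):
    """Cosine similarity matcher; returns 0.0 for a zero-length vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def negative_l2(a, b):
    """Distance-based matcher: larger (closer to 0) means more similar."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mmf_score(features_a, features_b, components):
    """Score an image pair from (feature level, matcher, weight) components."""
    return sum(
        weight * matcher(features_a[level], features_b[level])
        for level, matcher, weight in components
    )
```

Each component pairs one feature space (for example a coarse or fine resolution) with one matcher, so the same features can contribute through several matchers with different weights.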
Gesture component with gesture library
A gesture component with a gesture library is described. The gesture component is configured to expose operations for execution by applications of a computing device based on detected gestures. In one example, an input is detected using a three dimensional object detection system of a gesture component of the computing device. A gesture is recognized by the gesture component based on the detected input through comparison with a library of gestures maintained by the gesture component. An operation is then recognized that corresponds to the gesture by the gesture component using the library of gestures. The operation is exposed by the gesture component via an application programming interface to at least one application executed by the computing device to control performance of the operation by the at least one application.
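A toy version of the library lookup might look like the sketch below. It is an assumption throughout: the class and method names are hypothetical, gestures are represented as short lists of 2D points rather than three dimensional sensor data, and recognition is a nearest-template comparison.

```python
class GestureComponent:
    """Maps detected inputs to operations through a library of gestures."""

    def __init__(self):
        self._library = {}  # gesture name -> (template points, operation)

    def register(self, name, template, operation):
        self._library[name] = (template, operation)

    def recognize(self, points):
        """Return the library gesture whose template is nearest to the input."""
        if not self._library:
            raise ValueError("gesture library is empty")

        def distance(template):
            return sum(
                (px - tx) ** 2 + (py - ty) ** 2
                for (px, py), (tx, ty) in zip(points, template)
            )

        return min(self._library, key=lambda name: distance(self._library[name][0]))

    def operation_for(self, points):
        """Expose the operation that corresponds to the recognized gesture."""
        return self._library[self.recognize(points)][1]
```

An application would call `operation_for` with the detected input and act on the returned operation, without needing to know how the gesture itself was recognized.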
Systems and methods for cross-modality image segmentation
Embodiments of the disclosure provide systems and methods for segmenting a medical image. The system includes a communication interface configured to receive the medical image acquired by an image acquisition device. The system also includes a memory configured to store a plurality of learning networks jointly trained using first training images of a first imaging modality and second training images of a second imaging modality. The system further includes a processor, configured to segment the medical image using a segmentation network selected from the plurality of learning networks.