Patent classifications
G06V20/41
SENSOR COMPENSATION USING BACKPROPAGATION
An embodiment includes training a first convolutional neural network (CNN) using a plurality of training images to generate first and second trained CNNs, and then adding an interface layer to the second trained CNN. The embodiment processes first and second images in a sequence of images using the first trained CNN to generate first and second result vectors. The embodiment also processes the second image using the second trained CNN and sensor data input to the interface layer to generate a third result vector. The embodiment modifies the sensor data using a compensation value. The embodiment compares the third result vector to the second result vector to generate an error value, and then calculates a modified compensation value using the error value. The embodiment then generates a sensor-compensated trained CNN based on the second trained CNN with the modified compensation value.
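The compare-and-update step in the abstract above can be illustrated with a deliberately small sketch. It assumes a single linear unit as a stand-in for the second trained CNN, a scalar compensation value added to the sensor reading, and a squared-error loss; none of these choices come from the abstract, and the name `update_compensation` and its parameters are hypothetical.

```python
def update_compensation(sensor_value, compensation, weight, target, learning_rate=0.1):
    """One gradient-descent step on a scalar compensation value.

    Assumption: a single linear unit replaces the CNN, so the output is
    weight * (sensor_value + compensation). `target` plays the role of the
    second result vector and the output plays the role of the third.
    """
    output = weight * (sensor_value + compensation)
    error = output - target              # signed difference between the two results
    gradient = 2.0 * error * weight      # derivative of error**2 w.r.t. compensation
    new_compensation = compensation - learning_rate * gradient
    return new_compensation, error ** 2


def fit_compensation(sensor_value, weight, target, steps=200):
    """Repeat the update from a zero starting compensation (illustrative only)."""
    compensation = 0.0
    for _ in range(steps):
        compensation, _ = update_compensation(sensor_value, compensation, weight, target)
    return compensation
```

With `sensor_value=1.0`, `weight=0.5`, and `target=1.0`, repeated updates move the compensation toward 1.0, the value at which the toy output matches the target.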
RESTRUCTURING TECHNIQUE FOR VIDEO FRAMES
A system for restructuring video frames.
Site-Based Calibration of Object Detection Parameters
Systems and methods for site-based calibration of object detection values, such as for surveillance video cameras, are described. Video data from a video image sensor may be processed using an object detector to determine object data and a confidence score for a detected object. The object data and confidence score may be post-processed to apply calibration values based on the camera location to one or more parameters used for determining detection events. Event notifications may be sent for detection events. The calibration values may be determined from a calibration period where a verification object detector is used to verify the object detections and failure analysis is applied to determine calibration values for the camera location.
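The post-processing step described above might look like the following minimal sketch. It assumes the calibration values for a camera location are a confidence offset and an event threshold; the abstract does not say which parameters are calibrated, so the site names, the dictionary layout, and `is_detection_event` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    confidence: float  # raw detector score in [0, 1]


# Hypothetical calibration values, one entry per camera location.
SITE_CALIBRATION = {
    "loading_dock": {"confidence_offset": -0.10, "event_threshold": 0.60},
    "lobby": {"confidence_offset": 0.05, "event_threshold": 0.50},
}


def is_detection_event(detection, site, calibration=SITE_CALIBRATION):
    """Apply the site's calibration to the raw score, then test the threshold."""
    values = calibration[site]
    adjusted = detection.confidence + values["confidence_offset"]
    adjusted = min(1.0, max(0.0, adjusted))  # keep the score in [0, 1]
    return adjusted >= values["event_threshold"]
```

Under these assumed values, the same raw score of 0.65 triggers an event in the lobby but not at the loading dock, which is the kind of location-dependent behavior the abstract describes.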
HAND DETECTION TRIGGER FOR ITEM IDENTIFICATION
A device configured to capture a first overhead depth image of a platform using a three-dimensional (3D) sensor at a first time instance and a second overhead depth image of a first object using the 3D sensor at a second time instance. The device is further configured to determine that a first portion of the first object is within a region-of-interest and a second portion of the first object is outside the region-of-interest in the second overhead depth image. The device is further configured to capture a third overhead depth image of a second object placed on the platform using the 3D sensor at a third time instance. The device is further configured to capture a first image of the second object using a camera in response to determining that the first object is outside of the region-of-interest and the second object is within the region-of-interest for the platform.
SYSTEM AND METHOD FOR CAPTURING IMAGES FOR TRAINING OF AN ITEM IDENTIFICATION MODEL
A system for capturing images for training an item identification model obtains an identifier of an item. The system detects a triggering event at a platform, where the triggering event corresponds to a user placing the item on the platform. The system causes the platform to rotate. The system causes at least one camera to capture an image of the item while the platform is rotating. The system extracts a set of features associated with the item from the image. The system associates the item with the identifier and the set of features. The system adds a new entry to a training dataset of the item identification model, where the new entry represents the item labeled with the identifier and the set of features.
DEEP LEARNING SYSTEM FOR DETERMINING AUDIO RECOMMENDATIONS BASED ON VIDEO CONTENT
Embodiments are disclosed for determining an answer to a query associated with a graphical representation of data. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input including an unprocessed audio sequence and a request to perform an audio signal processing effect on the unprocessed audio sequence. The one or more embodiments further include analyzing, by a deep encoder, the unprocessed audio sequence to determine parameters for processing the unprocessed audio sequence. The one or more embodiments further include sending the unprocessed audio sequence and the parameters to one or more audio signal processing effects plugins to perform the requested audio signal processing effect using the parameters and outputting a processed audio sequence after processing of the unprocessed audio sequence using the parameters of the one or more audio signal processing effects plugins.
ITEM IDENTIFICATION USING MULTIPLE CAMERAS
A device configured to detect a triggering event corresponding with a user placing a first item on a platform, to capture a first image of the first item on the platform using a camera, and to input the first image into a machine learning model that is configured to output a first encoded vector based on features of the first item that are present in the first image. The device is further configured to identify a second encoded vector in an encoded vector library that most closely matches the first encoded vector and to identify a first item identifier in the encoded vector library that is associated with the second encoded vector. The device is further configured to identify the user, to identify an account that is associated with the user, and to associate the first item identifier with the account of the user.
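The library lookup in the abstract above can be sketched as a nearest-neighbor search. The abstract does not name a similarity measure, so cosine similarity is an assumption here, as are the function names and the representation of the encoded vector library as a list of `(item_id, vector)` pairs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_item(query_vector, library):
    """Return the item identifier whose stored vector best matches the query.

    `library` is a hypothetical encoded vector library: a list of
    (item_id, vector) pairs.
    """
    best_id, _ = max(library, key=lambda entry: cosine_similarity(query_vector, entry[1]))
    return best_id


# Illustrative library with made-up identifiers and three-dimensional vectors.
library = [
    ("item-A", [1.0, 0.0, 0.0]),
    ("item-B", [0.0, 1.0, 0.0]),
    ("item-C", [0.7, 0.7, 0.0]),
]
match = closest_item([0.9, 0.1, 0.0], library)
```

The returned identifier would then be associated with the user's account; that step is omitted because the abstract gives no detail on how accounts are stored.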
MACHINE LEARNING MODEL AND NEURAL NETWORK TO PREDICT DATA ANOMALIES AND CONTENT ENRICHMENT OF DIGITAL IMAGES FOR USE IN VIDEO GENERATION
Systems, methods, and other embodiments are described for selecting, enriching, and sequencing digital media content to produce a narrative-oriented, ordered sub-collection of media, such as for movie creation. The method identifies, evaluates, assesses, stores, enriches, groups, and sequences content. The method identifies the content metadata. When metadata are missing or anomalous, the method attempts to populate or correct the metadata and store that new content in the database. The method evaluates content for focus quality and may exclude content based on rules. The method assesses the content, storing the people and their emotional level, animals, objects, locations, landmarks, and date/time in the database. The method can then enrich the remaining content by providing map, photo, video, text, and audio content. The method uses selection criteria for grouping and sequencing content by date, time, person, etc., and compiles the sequenced groups into the final narrative ready for distribution, e.g., movie creation.
AUTO-ADJUSTING DISPLAY TIME OF SLIDES BASED ON CONTENT INTELLIGENCE
Systems and methods are directed to auto-adjusting play time of slides based on content intelligence. The system accesses media comprising a plurality of media items, wherein a media item of the plurality of media items comprises a first content type. The system performs machine analysis associated with the first content type. Based on the machine analysis, the system determines a first display time for the first content type and derives a total display time for the media item based on the first display time. If the media item includes a second content type, then the system performs machine analysis associated with the second content type and determines a second display time for the second content type. The total display time now comprises an aggregation of the first and second display times. The system can cause a machine action based on the total display time.
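The aggregation described above can be sketched as follows. The abstract does not specify how each content type's display time is derived, so the two estimators here (a reading rate for text, a per-object allowance for images) and all parameter values are hypothetical placeholders for the machine analysis.

```python
def text_display_time(text, words_per_second=3.0):
    """Assumed estimator: time needed to read the text at a fixed rate."""
    return len(text.split()) / words_per_second


def image_display_time(num_detected_objects, base_seconds=2.0, seconds_per_object=0.5):
    """Assumed estimator: a base time plus an allowance per detected object."""
    return base_seconds + seconds_per_object * num_detected_objects


def total_display_time(media_item):
    """Sum the display times of whichever content types the item contains.

    `media_item` is a hypothetical dict with optional keys "text" (str)
    and "objects" (int, the count of objects found by image analysis).
    """
    total = 0.0
    if "text" in media_item:
        total += text_display_time(media_item["text"])
    if "objects" in media_item:
        total += image_display_time(media_item["objects"])
    return total
```

A slide with only six words of text yields 2.0 seconds under these assumptions; adding an image with two detected objects raises the total to 5.0 seconds.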
VIDEO ACTION RECOGNITION AND MODIFICATION
A system, method, and computer program product for implementing video action recognition are provided. The method includes receiving a video stream comprising user movement actions. Skeleton points associated with a video representation of a user executing the user movement actions are extracted and categorized with respect to multiple digital levels. Initial visual windows are generated within video frames, and an average movement distance for a group of the skeleton points is determined with respect to the video frames. In response, sizes for the visual windows are adjusted and feature vectors are extracted from the group of skeleton points. Point coordinates of the skeleton points are extracted and linked with the feature vectors. A convolutional neural network associated with linking the feature vectors with the point coordinates is generated, and the video stream is enabled with respect to video action recognition associated with accurate presentation of the video stream.
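The average movement distance and the window adjustment can be illustrated with a short sketch. It assumes two-dimensional skeleton points listed in the same order in every frame and a linear rule for growing the window; the abstract specifies neither, and both function names are hypothetical.

```python
import math


def average_movement_distance(frames):
    """Mean Euclidean displacement of skeleton points between consecutive frames.

    `frames` is a list of frames, each a list of (x, y) points in a
    consistent order (an assumption of this sketch).
    """
    total = 0.0
    count = 0
    for previous, current in zip(frames, frames[1:]):
        for (x0, y0), (x1, y1) in zip(previous, current):
            total += math.hypot(x1 - x0, y1 - y0)
            count += 1
    return total / count if count else 0.0


def adjusted_window_size(initial_size, avg_distance, scale=2.0):
    """Hypothetical rule: enlarge the window in proportion to the motion."""
    return initial_size + scale * avg_distance
```

For two frames in which one point moves from (0, 0) to (3, 4) and another stays at (1, 1), the average distance is 2.5, and an initial window size of 10 grows to 15 under the assumed scale of 2.0.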