Patent classifications
G06K9/62
SEMANTIC IMAGE SEGMENTATION USING CONTRASTIVE CHANNELS
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a segmentation neural network. In one aspect, a method comprises: obtaining data defining: (i) an image, and (ii) a respective class of each pixel in the image from a set of possible classes; determining a target segmentation of the image that comprises one or more target contrastive channels, wherein each target contrastive channel corresponds to a respective pair of classes including a respective first class and a respective second class from the set of possible classes; and training the segmentation neural network to process the image to generate a predicted segmentation that matches the target segmentation.
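The construction of a target contrastive channel from per-pixel class labels can be sketched as follows. This is a minimal illustration, not the claimed method: the abstract does not state how a channel encodes its pair of classes, so the encoding used here (+1 for pixels of the first class, -1 for pixels of the second class, 0 elsewhere) is an assumption, and the function name `contrastive_targets` is hypothetical.

```python
from typing import List, Sequence, Tuple

ClassMap = Sequence[Sequence[int]]  # class index of each pixel, row by row


def contrastive_targets(
    class_map: ClassMap, pairs: Sequence[Tuple[int, int]]
) -> List[List[List[int]]]:
    """Build one target channel per (first_class, second_class) pair.

    Assumed encoding (not given in the abstract):
    +1 where the pixel belongs to the first class,
    -1 where it belongs to the second class, 0 otherwise.
    """
    channels = []
    for first, second in pairs:
        channel = [
            [1 if c == first else -1 if c == second else 0 for c in row]
            for row in class_map
        ]
        channels.append(channel)
    return channels


# A 2x2 image with three classes and two class pairs.
labels = [[0, 1], [2, 1]]
targets = contrastive_targets(labels, [(0, 1), (1, 2)])
```

Under this assumed encoding, a segmentation network would be trained to regress each channel, and each pixel's class could be recovered from the signs of the channels that involve it.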
Mobile App riteTune to provide music instrument players instant feedback on note pitch and rhythms accuracy based on sheet music
A tool is needed for music instrument learners to get feedback on the correctness of their performances of a particular piece of music. The invention disclosed here is such a tool: it provides music instrument players instant feedback on note pitch and rhythm accuracy based on sheet music. This is accomplished through audio signal processing, sheet music image processing, and conversion of both analogue images and audio signals into a standard digital music representation, so that a comparison can be made and feedback can be presented to the player. An advanced feature will allow users to save the data to the cloud and retrieve it later to compare progress. It will also allow users to participate in an online competition with other players of the same piece of music.
ARTIFICIAL INTELLIGENCE (AI)-BASED SECURITY SYSTEMS FOR MONITORING AND SECURING PHYSICAL LOCATIONS
Various aspects of the disclosure relate to monitoring a physical location to determine and/or predict anomalous activities. One or more machine learning algorithms may be used to analyze inputs from one or more sensors, cameras, audio recording equipment, and/or any other types of sensors to detect anomalous measurements/patterns. Notifications may be sent to one or more devices in a network based on the detection.
GEOMETRIC AGING DATA REDUCTION FOR MACHINE LEARNING APPLICATIONS
Techniques for geometric aging data reduction for machine learning applications are disclosed. In some embodiments, an artificial-intelligence powered system receives a first time-series dataset that tracks at least one metric value over time. The system then generates a second time-series dataset that includes a reduced version of a first portion of the time-series dataset and a non-reduced version of a second portion of the time-series dataset. The second portion of the time-series dataset may include metric values that are more recent than the first portion of the time-series dataset. The system further trains a machine learning model using the second time-series dataset that includes the reduced version of the first portion of the time-series dataset and the non-reduced version of the second portion of the time-series dataset. The trained model may be applied to reduced and/or non-reduced data to detect multivariate anomalies and/or provide other analytic insights.
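The reduction step described above can be sketched as follows. This is one possible reading rather than the disclosed technique: the abstract does not specify the reduction, so averaging older values in buckets whose size grows geometrically with age is an assumption, as are the function name `geometric_age_reduce` and its parameters `keep_recent` and `base`.

```python
from typing import List, Sequence


def geometric_age_reduce(
    values: Sequence[float], keep_recent: int, base: int = 2
) -> List[float]:
    """Keep the newest `keep_recent` values unchanged and average the older
    ones in buckets whose size grows by a factor of `base` with age.

    `values` is ordered oldest to newest. The bucketing rule is an
    illustrative assumption, not taken from the abstract.
    """
    if keep_recent < 0 or base < 2:
        raise ValueError("keep_recent must be >= 0 and base must be >= 2")
    split = max(len(values) - keep_recent, 0)
    older, recent = values[:split], values[split:]

    reduced = []
    end, size = len(older), base
    while end > 0:  # walk backwards in time from the newest old value
        start = max(end - size, 0)
        bucket = older[start:end]
        reduced.append(sum(bucket) / len(bucket))
        end, size = start, size * base
    reduced.reverse()  # restore oldest-to-newest order
    return reduced + list(recent)


series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
compact = geometric_age_reduce(series, keep_recent=4)
# compact == [2.5, 5.5, 7, 8, 9, 10]
```

The resulting series keeps full resolution for recent values while older history shrinks logarithmically, and it could then be used as training input in place of the original series.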
SYSTEM AND METHOD FOR ANIMAL DETECTION
A system and a method for detecting animals in a region of interest are disclosed. An image that captures a scene in the region of interest is received. The image is fed to an animal detection model to produce a group of probability maps for a group of key points and a group of affinity field maps for a group of key point sets. One or more connection graphs are determined based on the group of probability maps and the group of affinity field maps. Each connection graph outlines a presence of an animal in the image. One or more animals present in the region of interest are detected based on the one or more connection graphs.
AUTO-ADJUSTING DISPLAY TIME OF SLIDES BASED ON CONTENT INTELLIGENCE
Systems and methods are directed to auto-adjusting play time of slides based on content intelligence. The system accesses media comprising a plurality of media items, wherein a media item of the plurality of media items comprises a first content type. The system performs machine analysis associated with the first content type. Based on the machine analysis, the system determines a first display time for the first content type and derives a total display time for the media item based on the first display time. If the media item includes a second content type, then the system performs machine analysis associated with the second content type and determines a second display time for the second content type. The total display time now comprises an aggregation of the first and second display times. The system can cause a machine action based on the total display time.
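The aggregation of per-content-type display times can be sketched as follows. The analyses shown here are placeholders chosen for illustration: a fixed reading speed for text and a fixed viewing time for images are assumptions, and the names `text_seconds`, `image_seconds`, `ANALYZERS`, and `total_display_seconds` are hypothetical.

```python
from typing import Any, Callable, Dict


def text_seconds(text: str, words_per_second: float = 3.0) -> float:
    """Assumed analysis for text: time needed to read it at a fixed speed."""
    return len(text.split()) / words_per_second


def image_seconds(_image: Any) -> float:
    """Assumed analysis for images: a fixed viewing time."""
    return 4.0


# One analysis function per content type (illustrative only).
ANALYZERS: Dict[str, Callable[[Any], float]] = {
    "text": text_seconds,
    "image": image_seconds,
}


def total_display_seconds(item: Dict[str, Any]) -> float:
    """Sum the display times determined for each content type in the item."""
    return sum(ANALYZERS[kind](content) for kind, content in item.items())


slide = {"text": "a b c d e f g h i", "image": "photo.png"}
seconds = total_display_seconds(slide)  # 3.0 for the text + 4.0 for the image
```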
DATA AUGMENTATION USING BRAIN EMULATION NEURAL NETWORKS
In one aspect, there is provided a method performed by one or more data processing apparatus, the method including receiving a training dataset having multiple training examples, where each training example includes: (i) an image, and (ii) a segmentation defining a target region of the image that has been classified as including pixels in a target category. The method further includes determining a respective refined segmentation for each training example, including, for each training example, processing the target region of the image defined by the segmentation for the training example using a de-noising neural network to generate a network output that defines the refined segmentation for the training example. The method further includes training a segmentation machine learning model on the training examples of the training dataset, including, for each training example, training the segmentation machine learning model to process the image included in the training example to generate a model output that matches the refined segmentation for the training example.
AUTOMATED LANGUAGE ASSESSMENT FOR WEB APPLICATIONS USING NATURAL LANGUAGE PROCESSING
A computer assesses language attributes of web application display text elements. The computer receives access to a selected web application. The computer parses hypertext markup language content of the web application and generates a parse tree representing the content. The computer identifies, using the parse tree, display text elements within the content and determines associated element selector queries that identify respective display text elements within the parse tree. The computer processes a set of display text elements using a plurality of Natural Language Processing classifier models, wherein each of the classifier models generates a relevant language prediction for the processed display text element. The computer collects, for each text element, groups of classifiers associated with substantially similar predictions and indexed by relevant text element selector. The computer determines a target language match condition for each group. The computer initiates at least one corresponding corrective action associated with the match condition.
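The grouping of classifiers by prediction and the match check can be sketched as follows. This is a simplified illustration under stated assumptions: predictions are taken to be plain language codes, "substantially similar" is reduced to exact equality, and the three match conditions ("match", "partial", "mismatch") as well as the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# selector -> {classifier name -> predicted language code}
Predictions = Dict[str, Dict[str, str]]


def group_by_prediction(predictions: Predictions) -> Dict[str, Dict[str, List[str]]]:
    """For each element selector, group classifier names by the language
    they predicted (exact equality stands in for 'substantially similar')."""
    grouped = {}
    for selector, by_classifier in predictions.items():
        groups = defaultdict(list)
        for classifier, language in by_classifier.items():
            groups[language].append(classifier)
        grouped[selector] = dict(groups)
    return grouped


def match_condition(groups: Dict[str, List[str]], target_language: str) -> str:
    """Assumed three-way match condition against the target language."""
    if set(groups) == {target_language}:
        return "match"  # every classifier agrees with the target
    if target_language in groups:
        return "partial"  # classifiers disagree; some predict the target
    return "mismatch"  # no classifier predicts the target


predictions = {
    "#title": {"clf_a": "en", "clf_b": "en"},
    "p.intro": {"clf_a": "en", "clf_b": "fr"},
}
grouped = group_by_prediction(predictions)
conditions = {sel: match_condition(g, "en") for sel, g in grouped.items()}
```

A corrective action, such as flagging the element for translation, could then be keyed on the condition returned for each selector.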
EMOTIONAL RESPONSE EVALUATION FOR PRESENTED IMAGES
A method, computer system, and a computer program product for image evaluation are provided. The present invention may include extracting one or more individual objects from an image. The present invention may include determining a general sentiment score for each of the one or more individual objects. The present invention may include determining a personal sentiment score for each of the one or more individual objects. The present invention may include generating an overall sentiment score for the image based on at least the general sentiment score for each of the one or more individual objects and the personal sentiment score for each of the one or more individual objects. The present invention may include determining that the overall sentiment score for the image exceeds a personal threshold of a user. The present invention may include providing one or more improvement mechanisms to the user.
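The combination of per-object scores into an overall score can be sketched as follows. The abstract does not give a formula, so the weighted mix of general and personal scores averaged over objects is an assumption, and the names `overall_sentiment` and `personal_weight` are hypothetical.

```python
from typing import Sequence, Tuple


def overall_sentiment(
    objects: Sequence[Tuple[float, float]], personal_weight: float = 0.5
) -> float:
    """Combine (general_score, personal_score) pairs, one per extracted
    object, into a single image-level score.

    Assumed formula: per-object weighted mix, then the mean over objects.
    """
    if not objects:
        raise ValueError("at least one object is required")
    mixed = [
        (1 - personal_weight) * general + personal_weight * personal
        for general, personal in objects
    ]
    return sum(mixed) / len(mixed)


# Two extracted objects with (general, personal) scores.
score = overall_sentiment([(0.5, 1.0), (0.0, 0.5)])  # 0.5
exceeds = score > 0.4  # compare against a user's personal threshold
```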
VIDEO ACTION RECOGNITION AND MODIFICATION
A system, method, and computer program product for implementing video action recognition is provided. The method includes receiving a video stream comprising user movement actions. Skeleton points associated with a video representation of a user executing the user movement actions are extracted and categorized with respect to multiple digital levels. Initial visual windows points are generated within video frames and an average movement distance for the group of skeleton points is determined with respect to the video frames. In response, sizes for the visual windows are adjusted and feature vectors are extracted from the group of skeleton points. Point coordinates of the skeleton points are extracted and linked with the feature vectors. A convolutional neural network associated with linking the feature vectors with the point coordinates is generated and the video stream is enabled with respect to video action recognition associated with accurate presentation of the video stream.
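The average movement distance of skeleton points, and a window size adjusted in response, can be sketched as follows. The abstract does not say how either quantity is computed, so the Euclidean frame-to-frame distance and the linear resizing rule are assumptions, and the names `average_movement`, `adjust_window`, `scale`, and `min_size` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def average_movement(frames: Sequence[Sequence[Point]]) -> float:
    """Mean Euclidean distance moved by each skeleton point between
    consecutive frames. Points are listed in the same order in every frame."""
    total, count = 0.0, 0
    for previous, current in zip(frames, frames[1:]):
        for (x0, y0), (x1, y1) in zip(previous, current):
            total += math.hypot(x1 - x0, y1 - y0)
            count += 1
    return total / count if count else 0.0


def adjust_window(
    size: int, avg_distance: float, scale: float = 2.0, min_size: int = 8
) -> int:
    """Assumed rule: grow the window linearly with the average movement."""
    return max(min_size, round(size + scale * avg_distance))


frames: List[List[Point]] = [
    [(0.0, 0.0), (0.0, 0.0)],
    [(3.0, 4.0), (0.0, 0.0)],
]
avg = average_movement(frames)      # 2.5
window = adjust_window(32, avg)     # 37
```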