G06V10/464

EXPLAINING A MODEL OUTPUT OF A TRAINED MODEL

The invention relates to a computer-implemented method (500) of generating explainability information for explaining a model output of a trained model. The method uses one or more aspect recognition models configured to indicate a presence of respective characteristics in the input instance. A saliency method is applied to obtain a masked source representation of the input instance at a source layer of the trained model (e.g., the input layer or an internal layer), comprising those elements at the source layer that are relevant to the model output. The masked source representation is mapped to a target layer (e.g., an input or internal layer) of an aspect recognition model, and the aspect recognition model is then applied to obtain a model output indicating a presence of the given characteristic relevant to the model output of the trained model. The characteristics indicated by the aspect recognition models are output as explainability information.
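The claimed pipeline can be sketched in a few lines, under heavy simplifying assumptions: the saliency mask is a threshold on per-element relevance scores, the layer-to-layer mapping is the identity, and the aspect models are toy predicates. All names, thresholds, and models below are illustrative, not the patented implementation.

```python
def saliency_mask(source_repr, relevance, threshold=0.5):
    """Keep only elements of the source representation deemed relevant."""
    return [x if r >= threshold else 0.0
            for x, r in zip(source_repr, relevance)]

def explain(source_repr, relevance, aspect_models, threshold=0.5):
    """Return the characteristics whose aspect models fire on the masked input."""
    masked = saliency_mask(source_repr, relevance, threshold)
    # In the claims, the masked representation may be mapped to a different
    # target layer; here the identity mapping stands in for that step.
    return [name for name, model in aspect_models.items() if model(masked)]

# Toy aspect models over the masked representation.
aspect_models = {
    "bright": lambda v: max(v) > 0.8,
    "dark":   lambda v: max(v) <= 0.2,
}
characteristics = explain([0.9, 0.1, 0.3], [0.9, 0.2, 0.1], aspect_models)
```

Here only the first element survives the relevance mask, so only the "bright" aspect model fires, and that characteristic is emitted as the explanation.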

Optimizing 360-degree video streaming with video content analysis

Aspects of the subject disclosure may include, for example, a method performed by a processing system. The method determines a present orientation of a display region presented at a first time on a display of a video viewer, and predicts, based on collected data, a future orientation of the display region occurring at a second time, to obtain a predicted orientation of the display region to be presented at the second time on the display of the video viewer. Based on the predicted orientation of the display region, the method identifies a first group of tiles from a video frame of a panoramic video being displayed by the video viewer, wherein the first group of tiles covers the display region in the video frame at the predicted orientation, and identifies a plurality of objects moving in the video frame from the first time to the second time, wherein each object of the plurality of objects is located in a separate spatial region of the video frame at the second time, a second group of tiles collectively covers the separate spatial regions, and the tiles in the first group of tiles and the tiles in the second group of tiles are different. The method then facilitates wireless transmission of the first group of tiles and a second tile from the second group of tiles, for presentation at the video viewer at the second time. Other embodiments are disclosed.
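The tile-selection step can be illustrated with a toy sketch, assuming a rectangular tile grid and axis-aligned regions (both assumptions for the example, not part of the claims): the viewport tiles are taken from the predicted orientation, and object tiles falling outside the viewport form the second group.

```python
def tiles_for_region(c0, r0, c1, r1):
    """Tiles (col, row) covering an axis-aligned region of the tile grid."""
    return {(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)}

def select_tiles(viewport_region, object_tiles_at_t2):
    """Viewport tiles, plus predicted object tiles outside the viewport."""
    viewport_tiles = tiles_for_region(*viewport_region)
    extra_tiles = set(object_tiles_at_t2) - viewport_tiles
    return viewport_tiles, extra_tiles

# Predicted viewport covers columns 2-4, rows 1-2; two objects are predicted
# at tiles (3, 1) (inside the viewport) and (6, 3) (outside it).
viewport, extra = select_tiles((2, 1, 4, 2), [(3, 1), (6, 3)])
```

Only the out-of-viewport object tile needs to be transmitted in addition to the viewport tiles, which is the bandwidth saving the abstract is after.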

Object detection device, method, and program

Even if an object to be detected is not prominent in the images, and the input includes images containing regions that are not the object to be detected but share a common appearance across the images, a region indicating the object to be detected is accurately detected. A local feature extraction unit 20 extracts a local feature of a feature point from each image included in an input image set. An image-pair common pattern extraction unit 30 extracts, from each image pair selected from the images in the image set, a common pattern constituted by a set of feature point pairs whose local features, extracted by the local feature extraction unit 20, are similar between the images constituting the pair, the feature point pairs also being geometrically similar to each other. A region detection unit 50 detects, as a region indicating the object to be detected in each image of the image set, a region based on a common pattern that is prevalent throughout the image set, from among the common patterns extracted by the image-pair common pattern extraction unit 30.
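A much-simplified sketch of this pipeline, with invented data structures: local "features" are reduced to hashable descriptors per image, a pairwise common pattern is the set of descriptors shared by two images (standing in for geometrically consistent feature-point pairs), and the detected region is whatever is prevalent across most pairs.

```python
from itertools import combinations
from collections import Counter

def pairwise_common_patterns(image_features):
    """For each image pair, the descriptors the two images share."""
    patterns = []
    for a, b in combinations(range(len(image_features)), 2):
        shared = image_features[a] & image_features[b]
        if shared:
            patterns.append(shared)
    return patterns

def detect_region(image_features, min_support=0.5):
    """Descriptors appearing in at least min_support of the common patterns."""
    patterns = pairwise_common_patterns(image_features)
    counts = Counter(d for p in patterns for d in p)
    needed = min_support * len(patterns)
    return {d for d, c in counts.items() if c >= needed}

# "cup" recurs across all images; "sky" is background shared by only one pair.
images = [{"cup", "sky"}, {"cup", "road"}, {"cup", "sky", "tree"}]
region = detect_region(images)
```

The prevalence threshold is what filters out the incidental background pattern ("sky") that appears in only one image pair.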

METHOD AND APPARATUS FOR UPDATING OBJECT RECOGNITION MODEL
20230020965 · 2023-01-19

This application provides a method and apparatus for updating an object recognition model in the field of artificial intelligence. In the technical solution provided in this application, a target image and first voice information of a user are obtained. The first voice information indicates a first category of a target object in the target image. A feature library of a first object recognition model is updated based on the target image and the first voice information. The updated first object recognition model includes a feature of the target object and a first label indicating the first category, and the feature of the target object corresponds to the first label. According to the technical solution provided in this application, the recognition rate of an object recognition model can be improved more easily.
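The update flow can be sketched as follows, with stand-ins for everything the abstract leaves open: a toy embedding function replaces the real feature extractor, the voice information is reduced to a label string, and recognition is nearest-feature lookup over the library. None of these choices come from the application itself.

```python
def embed(image):
    """Toy embedding: mean pixel value (placeholder for a feature extractor)."""
    return sum(image) / len(image)

class ObjectRecognitionModel:
    def __init__(self):
        self.feature_library = []   # list of (feature, label) pairs

    def update(self, image, voice_label):
        """Add the target image's feature under the voice-indicated category."""
        self.feature_library.append((embed(image), voice_label))

    def recognize(self, image):
        """Nearest-feature lookup over the feature library."""
        f = embed(image)
        return min(self.feature_library, key=lambda e: abs(e[0] - f))[1]

model = ObjectRecognitionModel()
model.update([10, 20, 30], "mug")       # user says: "this is a mug"
model.update([200, 210, 220], "lamp")
label = model.recognize([12, 18, 33])
```

Because the library pairs each stored feature with the label from the voice command, a new category can be taught from a single image without retraining the whole model.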

ARTIFICIAL INTELLIGENCE BASED CLASSIFICATION FOR TASTE AND SMELL FROM NATURAL LANGUAGE DESCRIPTIONS

Taste and smell classification from multilanguage descriptions can be performed by extracting, by one or more processors using natural language processing, a text including one or more words associated with taste and smell perceptions from an input received from a plurality of users. The input includes multilanguage information regarding at least one of changes in smell and changes in taste perceived by each of the plurality of users. Feature vectors are generated for the extracted text using global vectors, and a distance between the feature vectors and a plurality of reference descriptors associated with taste and smell is calculated to determine a similarity between the text and the reference descriptors and to create a training dataset, based on which a classification model is generated for categorizing the plurality of users according to the at least one of changes in smell and changes in taste.
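The distance-to-descriptor step can be illustrated with a minimal sketch, assuming toy three-dimensional vectors in place of real GloVe ("global vectors") embeddings and cosine similarity as the distance measure; the vocabulary and reference descriptors are invented for the example.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def closest_descriptor(text_vec, reference_descriptors):
    """Reference descriptor whose vector is most similar to the text's vector."""
    return max(reference_descriptors,
               key=lambda d: cosine_similarity(text_vec, reference_descriptors[d]))

references = {
    "sweet":    [1.0, 0.1, 0.0],
    "metallic": [0.0, 0.2, 1.0],
}
# Feature vector for an extracted phrase such as "everything tastes like sugar".
label = closest_descriptor([0.9, 0.2, 0.1], references)
```

The resulting (text, nearest descriptor) pairs are exactly the kind of labeled examples from which the abstract's training dataset could be assembled.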

System and method for hashed compressed weighting matrix in neural networks
11531859 · 2022-12-20

A method for a neural network includes receiving an input from a vector of inputs, determining a table index based on the input, and retrieving a hash table from a plurality of hash tables, wherein the hash table corresponds to the table index. The method also includes determining an entry index of the hash table based on an index matrix, wherein the index matrix includes one or more index values and each of the one or more index values corresponds to a vector in the hash table, and determining an entry value in the hash table corresponding to the entry index. The method also includes determining a value index, wherein the vector in the hash table includes one or more entry values and the value index corresponds to one of the one or more entry values in the vector, and determining a layer response.
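The lookup chain the claims describe can be sketched loosely as follows, with invented shapes and indexing rules: each input position selects a hash table, the index matrix selects a vector (entry) within that table, a value index selects one entry value from the vector, and the layer response accumulates input times looked-up weight. This is one plausible reading, not the patented scheme.

```python
def layer_response(inputs, hash_tables, index_matrix, value_indices):
    """Compute a layer response from hashed, compressed weights."""
    response = 0.0
    for i, x in enumerate(inputs):
        table = hash_tables[i % len(hash_tables)]   # table index from input position
        vector = table[index_matrix[i]]             # entry (a vector) in the table
        weight = vector[value_indices[i]]           # one entry value of that vector
        response += x * weight
    return response

hash_tables = [
    [[0.5, -0.5], [1.0, 2.0]],    # table 0: two 2-element vectors
    [[0.1, 0.2], [0.3, 0.4]],     # table 1
]
out = layer_response([1.0, 2.0, 3.0], hash_tables, [0, 1, 1], [1, 0, 0])
```

The point of the indirection is compression: many weight positions share the few values stored in the hash tables, so only indices, not full weight matrices, need to be stored per layer.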

METHOD FOR GENERATING 3D REFERENCE POINTS IN A MAP OF A SCENE

A method of complementing a map of a scene with 3D reference points comprises four steps. In a first step, data is collected and recorded based on samples of at least one of an optical sensor, a GNSS, and an IMU. A second step includes initial pose generation by processing the collected sensor data to provide a track of vehicle poses; a pose is based on a specific data set, on at least one data set recorded before that data set, and on at least one data set recorded after that data set. A third step includes SLAM processing of the initial poses and collected optical sensor data to generate keyframes with feature points. In a fourth step, 3D reference points are generated by fusion and optimization of the feature points, using future and past feature points together with the feature point at the point of processing. The second and fourth steps provide significantly better results than SLAM or VIO methods known from the prior art, because they operate on recorded data: whereas a normal SLAM or VIO algorithm can only access data of the past, in these steps processing may also look ahead to future positions by using the recorded data.
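The key idea of the second and fourth steps, estimating each point from samples both before and after it because the data is recorded, can be illustrated with a simple centered moving average standing in for the pose smoothing and feature-point fusion. This is a sketch of the offline-versus-online distinction only, not the patented optimization.

```python
def smooth_offline(samples, radius=1):
    """Centered window over recorded data: uses past AND future samples,
    which an online (causal) filter cannot do."""
    out = []
    for i in range(len(samples)):
        lo, hi = max(0, i - radius), min(len(samples), i + radius + 1)
        window = samples[lo:hi]
        out.append(sum(window) / len(window))
    return out

# Noisy 1-D "poses" along a track; the spike at index 2 is measurement noise.
track = [0.0, 1.0, 5.0, 3.0, 4.0]
smoothed = smooth_offline(track)
```

The spike at index 2 is pulled back toward its neighbors on both sides, which is precisely the benefit of processing recorded data with access to future samples.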

Visual-inertial positional awareness for autonomous and non-autonomous tracking
11501527 · 2022-11-15

The described positional awareness techniques employ visual-inertial sensory data gathering and analysis hardware, described with reference to specific example implementations, to implement improvements in the use of sensors, techniques, and hardware design that can enable specific embodiments to provide positional awareness to machines with improved speed and accuracy.

DIAGNOSTIC TOOL FOR DEEP LEARNING SIMILARITY MODELS
20230091435 · 2023-03-23

A diagnostic tool for deep learning similarity models and image classifiers provides valuable insight into neural network decision-making. A disclosed solution generates a saliency map by: receiving a baseline image and a test image; determining, with a convolutional neural network (CNN), a first similarity between the baseline image and the test image; based on at least determining the first similarity, determining, for the test image, a first activation map for at least one CNN layer; based on at least determining the first similarity, determining, for the test image, a first gradient map for the at least one CNN layer; and generating a first saliency map as an element-wise function of the first activation map and the first gradient map. Some examples further determine a region of interest (ROI) in the first saliency map, crop the test image to an area corresponding to the ROI, and determine a refined similarity score.
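The element-wise combination the abstract describes can be sketched with plain nested lists (a real implementation would use framework tensors and a trained CNN). The ReLU-style clamp at zero is a common choice in Grad-CAM-like saliency methods and is assumed here rather than taken from the disclosure.

```python
def saliency_map(activation_map, gradient_map):
    """Element-wise product of activations and gradients, clamped at zero."""
    return [[max(0.0, a * g) for a, g in zip(arow, grow)]
            for arow, grow in zip(activation_map, gradient_map)]

# Toy 2x2 activation and gradient maps for one CNN layer.
activations = [[0.2, 0.8],
               [0.5, 0.1]]
gradients   = [[1.0, -0.5],
               [2.0,  0.0]]
smap = saliency_map(activations, gradients)
```

Locations where activation and gradient agree in sign survive the clamp, and the largest surviving value would anchor the ROI used for the refined similarity score.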

AUTOMATED CATEGORIZATION AND ASSEMBLY OF LOW-QUALITY IMAGES INTO ELECTRONIC DOCUMENTS

An apparatus includes a memory and a processor. The memory stores document categories, text generated from an image of a physical document page, and a machine learning algorithm. The text includes errors associated with noise in the image. The machine learning algorithm is configured to extract, from the text, a first plurality of features associated with natural language processing and a second plurality of features associated with the errors. The machine learning algorithm is also configured to generate a feature vector that includes the first and second pluralities of features, and to generate, based on the feature vector, a set of probabilities, each of which is associated with a document category and indicates a probability that the physical document from which the text was generated belongs to that document category. The processor applies the machine learning algorithm to the text to generate the set of probabilities, identifies a largest probability, and assigns the image to the associated document category.
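A hypothetical sketch of the feature-vector construction: a couple of crude NLP signals are concatenated with features derived from OCR-style noise artifacts, and a softmax over linear per-category scores yields the probability set. The categories, features, and weights are all invented for illustration.

```python
import math

def nlp_features(text):
    """Crude NLP signals: word count and count of capitalized words."""
    words = text.split()
    return [len(words), sum(w[0].isupper() for w in words)]

def error_features(text):
    """Noise artifacts a low-quality scan might inject into the OCR text."""
    return [text.count("#"), text.count("~")]

def categorize(text, category_weights):
    """Softmax over per-category linear scores of the combined feature vector."""
    fv = nlp_features(text) + error_features(text)
    scores = {c: sum(w * f for w, f in zip(ws, fv))
              for c, ws in category_weights.items()}
    z = sum(math.exp(s) for s in scores.values())
    probs = {c: math.exp(s) / z for c, s in scores.items()}
    return max(probs, key=probs.get), probs

weights = {"invoice": [0.1, 0.0, 1.0, 1.0],
           "letter":  [0.2, 0.5, -1.0, -1.0]}
best, probs = categorize("Total due ### today", weights)
```

Note that the noise features actively help here: the `#` artifacts push the score toward the category whose source documents tend to scan badly, which is the abstract's point about exploiting, rather than discarding, the OCR errors.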