G06N3/096

METHOD AND SYSTEM FOR WEIGHTED KNOWLEDGE DISTILLATION BETWEEN NEURAL NETWORK MODELS

A method of training a student model includes: providing an input to a teacher model that is larger than the student model, where a layer of the teacher model outputs a first output vector; providing the input to the student model, where a layer of the student model outputs a second output vector; determining an importance value associated with each dimension of the first output vector based on gradients from the teacher model; and updating at least one parameter of the student model to minimize a difference between the second output vector and the first output vector based on the importance values.
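
The weighting step can be illustrated with a minimal sketch. The abstract does not say how importance values are derived from the teacher gradients, so the rule used here (normalized absolute gradient magnitude) and the squared-error form of the difference are assumptions, and all function names are hypothetical.

```python
def importance_from_gradients(teacher_grads):
    """Turn per-dimension teacher gradients into weights that sum to 1.

    Assumption: importance is the normalized absolute gradient magnitude.
    """
    magnitudes = [abs(g) for g in teacher_grads]
    total = sum(magnitudes)
    if total == 0.0:
        # No gradient signal: fall back to uniform weights.
        return [1.0 / len(magnitudes)] * len(magnitudes)
    return [m / total for m in magnitudes]


def weighted_distillation_loss(student_out, teacher_out, importance):
    """Importance-weighted squared difference between the two output vectors."""
    return sum(w * (s - t) ** 2
               for s, t, w in zip(student_out, teacher_out, importance))


def loss_grad_wrt_student(student_out, teacher_out, importance):
    """Gradient of the weighted loss with respect to each student output."""
    return [2.0 * w * (s - t)
            for s, t, w in zip(student_out, teacher_out, importance)]
```

In a real training loop the gradient with respect to the student outputs would be backpropagated into the student parameters; dimensions with larger teacher gradients pull the student harder toward the teacher.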

SYSTEMS AND METHODS OF DETERMINING DEGRADATION IN ANALOG COMPUTE-IN-MEMORY (ACIM) MODULES

Certain aspects of the present disclosure provide techniques for performing compute in memory (CIM) computations. A device comprises a CIM module configured to apply a plurality of analog weights to data using multiply-accumulate operations to generate an output. The device further comprises a digital weight storage unit configured to store digital weight references, wherein a digital weight reference corresponds to an analog weight of the plurality of analog weights. The device also comprises a device controller configured to program the plurality of analog weights to the CIM module based on the digital weight references and determine degradation of one or more analog weights. The digital weight references in the digital weight storage unit are populated with values from a host device. Degraded analog weights in the CIM module are replaced with corresponding digital weight references from the digital weight storage unit without reference to the host device.
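
One way to picture the degradation check is a comparison between read-back analog values and the stored digital references. The abstract does not specify how degradation is determined, so the read-back comparison, the tolerance value, and the function name below are assumptions made for illustration.

```python
def find_degraded(analog_readback, digital_refs, tolerance=0.05):
    """Return indices of analog weights that drifted from their references.

    Assumption: a weight counts as degraded when its read-back value differs
    from the digital weight reference by more than `tolerance`.
    """
    return [
        index
        for index, (analog, reference) in enumerate(zip(analog_readback, digital_refs))
        if abs(analog - reference) > tolerance
    ]
```

For example, with references `[0.5, 0.4, -0.2]` and read-back values `[0.5, 0.31, -0.2]`, only the weight at index 1 exceeds the tolerance.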

Systems and Methods of Compensating Degradation in Analog Compute-In-Memory (ACIM) Modules

Certain aspects of the present disclosure provide techniques for performing compute in memory (CIM) computations. A device comprises a CIM module configured to apply analog weights to input data using multiply-accumulate operations to generate an output. The device further comprises a digital weight storage unit configured to store digital weight references, wherein a digital weight reference corresponds to an analog weight of the analog weights. The device also comprises a device controller configured to program the analog weights to the CIM module, cause the CIM module to process the input data, and reprogram one or more analog weights that are degraded. The digital weight references in the digital weight storage unit are populated with values from a host processing device. Degraded analog weights in the CIM module are reprogrammed based on the corresponding digital weight references from the digital weight storage unit without reference to the host processing device.
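
The division of labor described above (the host populates the references once, the device repairs itself afterwards) can be sketched as follows. The class name, the tolerance-based drift test, and the list-based weight storage are hypothetical stand-ins, not the disclosed hardware design.

```python
class DeviceController:
    """Toy model of a controller that owns a local digital weight store."""

    def __init__(self, host_weights):
        # The host is consulted only here, to populate the digital references.
        self.digital_refs = list(host_weights)
        # Program the analog weights from the digital references.
        self.analog = list(self.digital_refs)

    def mac(self, inputs):
        """Multiply-accumulate of the current analog weights with the inputs."""
        return sum(w * x for w, x in zip(self.analog, inputs))

    def reprogram_degraded(self, tolerance=0.05):
        """Rewrite drifted analog weights from the local store; return their indices."""
        repaired = []
        for index, (analog, reference) in enumerate(zip(self.analog, self.digital_refs)):
            if abs(analog - reference) > tolerance:  # assumed drift criterion
                self.analog[index] = reference
                repaired.append(index)
        return repaired
```

Because `reprogram_degraded` reads only `self.digital_refs`, the repair path never needs a round trip to the host.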

METHOD AND APPARATUS FOR TRANSFER LEARNING
20220398834 · 2022-12-15

A method for transfer learning includes: obtaining a pre-trained model, and generating a model to be transferred based on the pre-trained model, in which the model to be transferred includes N Transformer layers, and N is a positive integer; obtaining a mini-batch by performing random sampling on a target training set; and training the model to be transferred based on the mini-batch, in which a loss value for each Transformer layer is generated based on an empirical loss value and a noise stability loss value.
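
A rough sketch of how a per-layer loss might combine the two terms is shown below. The abstract does not define the noise stability loss, so the version here (squared change in a layer's output when Gaussian noise is added to its input) is an assumption, as are the weighting factor `lam` and all function names.

```python
import random


def noise_stability_loss(layer_fn, hidden, sigma=0.1, rng=None):
    """Squared change in a layer's output under Gaussian input noise (assumed form)."""
    rng = rng or random.Random(0)
    noisy = [h + rng.gauss(0.0, sigma) for h in hidden]
    clean_out = layer_fn(hidden)
    noisy_out = layer_fn(noisy)
    return sum((a - b) ** 2 for a, b in zip(noisy_out, clean_out))


def layer_loss(empirical_loss, stability_loss, lam=1.0):
    """Loss for one Transformer layer: empirical term plus weighted stability term."""
    return empirical_loss + lam * stability_loss


def sample_mini_batch(training_set, batch_size, rng=None):
    """Draw a mini-batch by random sampling without replacement."""
    rng = rng or random.Random(0)
    return rng.sample(training_set, batch_size)
```

A layer whose output barely moves under input noise contributes a small stability term, so the combined loss favors layers that are robust to perturbation while still fitting the target data.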

CONTEXT ENABLED MACHINE LEARNING
20220400251 · 2022-12-15

Certain aspects of the present disclosure provide techniques for generating context-aware inferences using a machine learning model. The method generally includes receiving a time-series data sequence and a contextual model specifying characteristics of how objects behave in an environment in which the time-series data sequence was captured. A feature data set from the contextual model is extracted using a first machine learning model. Generally, the extracted feature data set comprises a representation of the specified characteristics of how objects behave in the environment. A future state of an object in the environment is predicted using the time-series data sequence and the extracted feature data set representing the specified characteristics of how objects behave in the environment as input into a second machine learning model. One or more actions are taken based on the predicted future state of the object in the environment.

Systems and methods for deep multi-task learning for embedded machine vision applications

A computer-implemented method includes receiving data generated using at least one sensor of a vehicle; and simultaneously performing multiple different prediction tasks on the data using a multi-task neural network, wherein the multi-task neural network comprises at least one shared parameter inference matrix comprising parameters shared between the multiple different prediction tasks, and the at least one shared parameter inference matrix was over-parameterized during training into at least one shared parameter matrix and multiple task-specific parameter matrices, each of the multiple task-specific parameter matrices being associated with a different one of the multiple different tasks.

Sequential ensemble model training for open sets

Disclosed are systems and methods for training an ensemble of machine learning models with a focus on feature engineering. For example, the training of the models encourages each machine learning model of the ensemble to rely on a different set of input features from the training data samples used to train the machine learning models of the ensemble. However, instead of telling each model explicitly which features to learn, in accordance with the disclosed implementations, ML models of the ensemble may be trained sequentially, with each new model trained to disregard input features learned by previously trained ML models of the ensemble and learn based on other features included in the training data samples.
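
The sequential idea can be illustrated with a deliberately simplified stand-in. Here each "model" is reduced to a single selected feature, and "disregarding features learned by earlier models" is approximated by excluding features that earlier models already picked. This is a toy substitute for the disclosed training procedure, and the scoring rule and function names are hypothetical.

```python
def feature_score(column, labels):
    """Absolute (unnormalized) covariance between one feature column and the labels."""
    n = len(labels)
    mean_c = sum(column) / n
    mean_l = sum(labels) / n
    return abs(sum((c - mean_c) * (l - mean_l) for c, l in zip(column, labels)))


def train_sequential_ensemble(samples, labels, n_models):
    """Pick one feature per model, never reusing a feature chosen earlier."""
    n_features = len(samples[0])
    used = set()
    chosen = []
    for _ in range(n_models):
        candidates = [j for j in range(n_features) if j not in used]
        if not candidates:
            break  # every feature is already claimed by an earlier model
        best = max(
            candidates,
            key=lambda j: feature_score([row[j] for row in samples], labels),
        )
        chosen.append(best)
        used.add(best)
    return chosen
```

Each later model is pushed toward features the earlier ones ignored, so the ensemble as a whole depends on a wider set of inputs than any single model would.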

Three-dimensional pose estimation

Devices and techniques are generally described for estimating three-dimensional pose data. In some examples, a first machine learning network may generate first three-dimensional (3D) data representing input 2D data. In various examples, a first 2D projection of the first 3D data may be generated. A determination may be made that the first 2D projection conforms to a distribution of natural 2D data. A second machine learning network may generate parameters of a 3D model based at least in part on the input 2D data and based at least in part on the first 3D data. In some examples, second 3D data may be generated using the parameters of the 3D model.

AGENT DECISION-MAKING METHOD AND APPARATUS

This application provides an agent decision-making method and an apparatus, to improve decision-making performance of an agent. The method is applied to a communications system. The communications system includes at least two function modules. The at least two function modules include a first function module and a second function module, where the first function module is configured with a first agent, and the second function module is configured with a second agent. The method further includes: obtaining, by the first agent, related information of the second agent; and making, by the first agent, a decision on the first function module based on the related information of the second agent.

METHOD AND SYSTEM FOR SCENE GRAPH GENERATION

Broadly speaking, the disclosure relates to computer-implemented methods and systems for scene graph generation, and in particular to training a machine learning (ML) model to generate a scene graph. The method includes inputting a training image into a machine learning model, and outputting a predicted label for at least two objects in the training image and a predicted label for a relationship between the at least two objects. The training method includes calculating a loss, which takes into account both a supervised loss calculated by comparing the predicted labels to the actual labels for the training image, and a logic-based loss calculated by comparing the predicted labels to stored integrity constraints comprising common-sense knowledge. Advantageously, this means that the performance of the model is improved without increasing processing at inference time.
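
A minimal sketch of the two-part loss is given below. The abstract does not give the form of either term, so cross-entropy for the supervised loss, the "probability mass on forbidden triples" form of the logic-based loss, the weighting factor `lam`, and the dictionary representation of predictions are all assumptions.

```python
import math


def cross_entropy(probs, true_label):
    """Supervised loss for one prediction: negative log-probability of the true label."""
    return -math.log(probs[true_label])


def logic_loss(subj_probs, rel_probs, obj_probs, forbidden_triples):
    """Probability mass placed on (subject, relation, object) triples that
    violate the stored integrity constraints (assumed form)."""
    return sum(
        subj_probs.get(s, 0.0) * rel_probs.get(r, 0.0) * obj_probs.get(o, 0.0)
        for s, r, o in forbidden_triples
    )


def total_loss(subj_probs, rel_probs, obj_probs, truth, forbidden_triples, lam=1.0):
    """Supervised loss over the three labels plus the weighted logic-based loss."""
    true_subj, true_rel, true_obj = truth
    supervised = (
        cross_entropy(subj_probs, true_subj)
        + cross_entropy(rel_probs, true_rel)
        + cross_entropy(obj_probs, true_obj)
    )
    return supervised + lam * logic_loss(subj_probs, rel_probs, obj_probs, forbidden_triples)
```

The constraints enter only through the training loss, which is consistent with the statement that inference-time processing does not increase: at inference the model simply outputs its labels.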