G06N3/0495

SPARSITY-AWARE COMPUTE-IN-MEMORY
20230049323 · 2023-02-16 ·

Certain aspects of the present disclosure provide techniques for performing machine learning computations in a compute-in-memory (CIM) array comprising a plurality of bit cells, including: determining that a sparsity of input data to a machine learning model exceeds an input data sparsity threshold; disabling one or more bit cells in the CIM array based on the sparsity of the input data prior to processing the input data; processing the input data with the bit cells not disabled in the CIM array to generate an output value; applying a compensation to the output value based on the sparsity to generate a compensated output value; and outputting the compensated output value.
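The disable-then-compensate flow can be illustrated in software. The sketch below simulates a CIM matrix-vector product in NumPy: it measures input sparsity against a threshold, skips the bit cells (columns) driven by zero inputs, and applies an optional compensation term. All names and the linear-algebra framing (`sparse_cim_matvec`, `compensation`) are illustrative assumptions, not taken from the publication.

```python
import numpy as np

def sparse_cim_matvec(x, weights, sparsity_threshold=0.5, compensation=None):
    """Software sketch of the sparsity-aware flow: measure input sparsity,
    skip (disable) the bit cells driven by zero inputs, then compensate
    the output. Names and structure are illustrative assumptions."""
    sparsity = np.mean(x == 0.0)
    if sparsity <= sparsity_threshold:
        # Dense path: use every bit cell (column) of the array.
        return weights @ x
    active = x != 0.0                     # bit cells driven by nonzero inputs
    y = weights[:, active] @ x[active]    # process only the enabled cells
    if compensation is not None:
        y = y + compensation(sparsity)    # e.g. correct an analog offset
    return y
```

With no compensation callable, the sparse path is numerically identical to the dense product; a real CIM array would supply a hardware-specific correction.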

OPTIMIZATION OF MEMORY USE FOR EFFICIENT NEURAL NETWORK EXECUTION

Implementations disclosed describe methods and systems to perform the methods of optimizing the size of memory used for accumulation of neural node outputs and for supporting multiple computational paths in neural networks. In one example, the size of memory used to perform neural layer computations is reduced by performing nodal computations in multiple batches, followed by rescaling and accumulation of nodal outputs. In another example, execution of parallel branches of neural node computations includes evaluating, prior to the actual execution, the amount of memory resources needed to execute a particular order of branches sequentially, and selecting the order that minimizes this amount or keeps it below a target threshold.
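The branch-ordering idea can be sketched as a small search. Below, each branch is modeled as a pair (working memory while it runs, output size that stays resident afterward), and every sequential order is scored by its peak memory; this cost model and the exhaustive search are assumptions for illustration only.

```python
from itertools import permutations

def best_branch_order(branches):
    """Sketch: estimate peak memory for each sequential execution order of
    parallel branches and pick the cheapest. Each branch is a tuple
    (working_memory, output_size); the cost model is an assumption."""
    def peak(order):
        held = 0    # outputs of already-finished branches we must keep
        worst = 0
        for work, out in order:
            worst = max(worst, held + work)  # memory while this branch runs
            held += out                      # its output stays resident
        return worst
    best = min(permutations(branches), key=peak)
    return list(best), peak(best)
```

Running the memory-hungry branch first, while no other outputs are held, is often what such a search discovers.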

CONTROLLING MACHINE LEARNING MODEL STRUCTURES

Examples of methods for controlling machine learning model structures are described herein. In some examples, a method includes controlling a machine learning model structure. In some examples, the machine learning model structure may be controlled based on an environmental condition. In some examples, the machine learning model structure may be controlled to control apparatus power consumption associated with a processing load of the machine learning model structure.
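One plausible shape of such control is a policy that maps an environmental condition to a model structure whose processing load fits the power budget. The structures, thresholds, and condition names below are all illustrative assumptions, not from the source.

```python
def select_model_structure(battery_fraction, thermal_ok=True):
    """Illustrative sketch only: map environmental conditions (battery
    level, thermal headroom) to a machine learning model structure that
    bounds power consumption. Thresholds and sizes are assumptions."""
    if not thermal_ok or battery_fraction < 0.2:
        return {"layers": 4, "width": 64}    # smallest structure, lowest power
    if battery_fraction < 0.5:
        return {"layers": 8, "width": 128}   # mid-size structure
    return {"layers": 16, "width": 256}      # full structure
```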

OPTIMIZER-BASED PRUNER FOR NEURAL NETWORKS

A neural network pruning system can sparsely prune neural network models using an optimizer-based approach that is agnostic to the model architecture being pruned. The neural network pruning system can prune by operating on the parameter vector of the full model and the gradient vector of the loss function with respect to the model parameters. The neural network pruning system can iteratively update parameters based on the gradients, while zeroing out as many parameters as possible based on a preconfigured penalty.
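One architecture-agnostic way to realize "gradient update, then zero out parameters under a penalty" is a proximal L1 (soft-threshold) step on the flat parameter vector. The abstract only says a preconfigured penalty, so the soft-threshold choice below is our assumption.

```python
import numpy as np

def prune_step(params, grads, lr=0.1, penalty=0.05):
    """One iteration of an optimizer-style pruning loop: ordinary gradient
    descent on the flat parameter vector, then an L1 proximal
    (soft-threshold) step that sets small parameters exactly to zero.
    The specific penalty rule is an illustrative assumption."""
    p = params - lr * grads               # standard gradient descent step
    thresh = lr * penalty
    # Shrink toward zero; anything below the threshold becomes exactly 0.
    return np.sign(p) * np.maximum(np.abs(p) - thresh, 0.0)
```

Because it touches only `params` and `grads` as flat vectors, the same step applies to any model architecture, matching the abstract's agnosticism claim.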

MACHINE LEARNING MODEL SEARCH METHOD, RELATED APPARATUS, AND DEVICE
20230042397 · 2023-02-09 ·

This application relates to the field of artificial intelligence technologies, and discloses a machine learning model search method, a related apparatus, and a device. In the method, before model search and quantization, a plurality of single-bit models are generated based on a to-be-quantized model, and evaluation parameters of the layer structures in the plurality of single-bit models are obtained. Further, after a candidate model selected from a candidate set is trained and tested to obtain a target model, a quantization weight of each layer structure in the target model may be determined based on the network structure of the target model and the evaluation parameters of all layer structures in the target model; the layer structure with the maximum quantization weight in the target model is quantized, and the model obtained through quantization is added to the candidate set.
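A single search step of this loop, quantize the layer with the maximum quantization weight and return the new candidate, can be sketched with toy representations. Here a model is a dict mapping layer name to bit width and the evaluation parameters are per-layer scores; both representations, and the halve-the-bit-width rule, are assumptions.

```python
def quantize_next(target_model, eval_params):
    """Sketch of one search step: score each layer of the target model,
    quantize the layer with the maximum quantization weight, and return
    the new candidate model. Representations are illustrative: a model is
    {layer_name: bit_width}, eval_params is {layer_name: weight}."""
    # Only layers not yet at 1 bit remain candidates for quantization.
    candidates = {n: w for n, w in eval_params.items() if target_model[n] > 1}
    layer = max(candidates, key=candidates.get)   # maximum quantization weight
    new_model = dict(target_model)
    new_model[layer] = max(1, new_model[layer] // 2)  # e.g. halve bit width
    return new_model
```

The returned model would then be added to the candidate set for the next train-and-test round.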

MODEL COMPRESSION DEVICE, MODEL COMPRESSION METHOD, AND PROGRAM RECORDING MEDIUM
20230037904 · 2023-02-09 ·

A model compression device includes a compression unit and a determination unit. The compression unit is configured to create a compression model by compressing a first prediction model created by machine learning. The determination unit is configured to determine whether or not a second prediction model, created by re-learning the compression model, can be further compressed on the basis of an index related to the performance of the second prediction model.
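The compress / re-learn / determine loop can be sketched with a toy cost model: each compression round halves the model size and the re-learned model loses a fixed amount of accuracy, and the determination unit stops when the performance index would fall below a floor. All numbers and the cost model are illustrative assumptions (integer percentages avoid floating-point comparison noise).

```python
def compress_until_limit(model_size, accuracy_pct, compress_ratio=0.5,
                         accuracy_drop_pct=1, min_accuracy_pct=90):
    """Toy sketch of the compression loop: compress, stand in for
    re-learning with a fixed accuracy drop, and let the determination
    unit stop further compression when the index falls below a floor."""
    while True:
        next_acc = accuracy_pct - accuracy_drop_pct   # stand-in for re-learning
        if next_acc < min_accuracy_pct:               # determination unit: stop
            return model_size, accuracy_pct
        model_size = int(model_size * compress_ratio) # compression unit
        accuracy_pct = next_acc
```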

SEMICONDUCTOR DEVICE

To provide a semiconductor device with a novel structure. The semiconductor device includes an accelerator. The accelerator includes a first memory circuit, a second memory circuit, and an arithmetic circuit. The first memory circuit includes a first transistor. The second memory circuit includes a second transistor. Each of the first transistor and the second transistor includes a semiconductor layer including a metal oxide in a channel formation region. The arithmetic circuit includes a third transistor. The third transistor includes a semiconductor layer including silicon in a channel formation region. The first transistor and the second transistor are provided in different layers. The layer including the first transistor is provided over a layer including the third transistor. The layer including the second transistor is provided over the layer including the first transistor. The data retention characteristics of the first memory circuit are different from those of the second memory circuit.

NETWORK QUANTIZATION METHOD AND NETWORK QUANTIZATION DEVICE
20230042275 · 2023-02-09 ·

A network quantization method quantizes a neural network, and includes a database construction step of constructing a statistical information database on tensors handled by the neural network, a parameter generation step of generating quantized parameter sets by quantizing the values included in each tensor in accordance with the statistical information database and the neural network, and a network construction step of constructing a quantized network by quantizing the neural network with use of the quantized parameter sets. The parameter generation step includes a quantization-type determination step of determining a quantization type for each of a plurality of layers that make up the neural network.
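The three steps can be sketched end to end: build a statistics database of per-tensor min/max, determine a quantization type per layer from those statistics, and quantize with the resulting parameters. The specific rule (nonnegative tensors get an asymmetric unsigned range, sign-balanced tensors a symmetric one) is a common convention we are assuming, not something the abstract specifies.

```python
import numpy as np

def build_stats_db(tensors):
    """Database construction step: per-tensor (min, max) statistics."""
    return {name: (float(t.min()), float(t.max())) for name, t in tensors.items()}

def choose_quant_type(stats):
    """Quantization-type determination step (illustrative rule):
    nonnegative tensors (e.g. post-ReLU) suit an asymmetric unsigned
    range; sign-balanced tensors suit a symmetric signed range."""
    lo, _hi = stats
    return "asymmetric" if lo >= 0 else "symmetric"

def quantize(t, stats, bits=8):
    """Parameter generation: map values onto the integer grid given by
    the tensor's statistics; returns (codes, scale, zero_point)."""
    lo, hi = stats
    scale = (hi - lo) / (2**bits - 1) or 1.0
    return np.round((t - lo) / scale), scale, lo
```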

NETWORK ACCURACY QUANTIFICATION METHOD AND SYSTEM, DEVICE, ELECTRONIC DEVICE AND READABLE MEDIUM
20230040375 · 2023-02-09 ·

Disclosed are a network accuracy quantification method, system, and device, an electronic device, and a readable medium, which are applicable to a many-core chip. The method includes: determining a reference accuracy according to the total core resource number of the many-core chip and the number of core resources required by each network to be quantified, where the number of core resources required by a network is the number determined after that network is quantified; and determining a target accuracy for each network to be quantified according to the reference accuracy and the total core resource number of the many-core chip.
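One plausible reading of this scheme, sketched below, treats the reference accuracy as the fraction of the networks' combined core demand that the chip can satisfy, and scales each network's allocation by it. Both formulas are our assumptions; the abstract does not give them.

```python
def allocate_cores(total_cores, required):
    """Illustrative reading only: the reference accuracy is the fraction
    of combined core demand the chip can satisfy; each network's target
    allocation is its demand scaled by that fraction."""
    ref = min(1.0, total_cores / sum(required))   # reference accuracy
    alloc = [int(r * ref) for r in required]      # per-network targets
    return ref, alloc
```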

METHOD, DEVICE, AND COMPUTER PROGRAM PRODUCT FOR USER BEHAVIOR PREDICTION
20230041339 · 2023-02-09 ·

Embodiments of the present disclosure relate to a method, a device, and a computer program product for user behavior prediction. In some embodiments, at a client, a first user behavior embedding engine in the client generates behavior prediction information of a target user based on feature information of the target user. The client sends the behavior prediction information of the target user to a server, and receives information about a target item recommended for the target user from the server. This method enables user privacy-related information to be processed only locally, thereby not only ensuring user privacy and security, but also significantly reducing overall resource overhead.
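The split can be sketched in two functions: the client maps private features to a behavior-prediction vector locally, and the server ranks items against only that vector, so raw features never leave the device. The linear embedding and dot-product scoring are stand-ins we are assuming; the publication does not specify the engine's internals.

```python
import numpy as np

def client_embed(features, embed_matrix):
    """Client side: the user-behavior embedding engine maps private
    feature information to behavior prediction information locally.
    A linear map is an illustrative stand-in for the engine."""
    return embed_matrix @ features

def server_recommend(behavior_vec, item_vecs):
    """Server side: recommend the item whose vector best matches the
    received behavior prediction (dot-product score, an assumption)."""
    scores = item_vecs @ behavior_vec
    return int(np.argmax(scores))
```

Only `behavior_vec` crosses the network boundary, which is the privacy property the abstract claims.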