G06N3/063

HARDWARE ACCELERATED ANOMALY DETECTION IN A SYSTEM ON A CHIP

In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two-point and two-by-two point lookups, and per-memory-bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce the programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller when performing dynamic region-based data movement operations.
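
As one illustration of the listed features, the sketch below models a min/max collector in Python: hardware that snoops every store and keeps a running minimum and maximum, so the extremes of an output buffer are available without a second reduction pass. The class and function names are hypothetical; the abstract does not specify an interface.

```python
class MinMaxCollector:
    """Sketch of a min/max collector: tracks the running minimum and
    maximum of every value stored, so no separate reduction pass over
    the output buffer is needed. Interface is illustrative."""

    def __init__(self):
        self.min = float("inf")
        self.max = float("-inf")

    def observe(self, value):
        # Updated on every store, in parallel with the write itself.
        self.min = min(self.min, value)
        self.max = max(self.max, value)


def store_vector(memory, base, values, collector):
    """Store a SIMD vector while the collector snoops each lane."""
    for lane, v in enumerate(values):
        memory[base + lane] = v
        collector.observe(v)


memory = {}
collector = MinMaxCollector()
store_vector(memory, 0, [3, -7, 12, 5], collector)
print(collector.min, collector.max)  # -7 12
```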

SPARSITY-AWARE COMPUTE-IN-MEMORY
20230049323 · 2023-02-16

Certain aspects of the present disclosure provide techniques for performing machine learning computations in a compute-in-memory (CIM) array comprising a plurality of bit cells, including: determining that a sparsity of input data to a machine learning model exceeds an input data sparsity threshold; disabling one or more bit cells in the CIM array based on the sparsity of the input data prior to processing the input data; processing the input data with bit cells not disabled in the CIM array to generate an output value; applying a compensation to the output value based on the sparsity to generate a compensated output value; and outputting the compensated output value.
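
The claimed flow maps naturally onto a short software model. The sketch below (Python) follows the abstract's steps: measure input sparsity, disable bit cells when the threshold is exceeded, compute with the remaining cells, and compensate the output. The threshold value, the per-cell offset, and the additive compensation rule are assumptions for illustration; a real CIM array would derive the compensation from its analog characteristics.

```python
import numpy as np

SPARSITY_THRESHOLD = 0.5   # hypothetical input data sparsity threshold
CELL_OFFSET = 0.01         # hypothetical per-cell analog offset

def cim_matvec(inputs, weights):
    """Measure sparsity, disable bit cells for zero inputs when the
    threshold is exceeded, compute with the rest, then compensate."""
    sparsity = np.mean(inputs == 0.0)
    if sparsity <= SPARSITY_THRESHOLD:
        return weights @ inputs              # dense path: all cells on
    active = inputs != 0.0                   # cells to leave enabled
    raw = weights[:, active] @ inputs[active]
    # Disabled cells no longer contribute their per-cell offset, so add
    # back the expected offset to keep outputs comparable (illustrative
    # compensation rule; a real array would calibrate this).
    n_disabled = int(np.sum(~active))
    return raw + n_disabled * CELL_OFFSET

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))
x = np.where(rng.random(8) < 0.7, 0.0, 1.0)  # mostly-zero input vector
print(cim_matvec(x, w))
```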

DATA TRANSFER WITH CONTINUOUS WEIGHTED PPM DURATION SIGNAL
20230046980 · 2023-02-16

A computer-implemented method for processing signals is provided, including: generating a temporally continuous weighted pulse position modulation (CW PPM) duration signal from an input analog signal; converting the CW PPM duration signal to a memory access signal; executing a multiply-and-accumulate (MAC) operation with the memory access signal; and generating the input analog signal from a result of the MAC operation by an activation function (AF).
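
The abstract does not specify how amplitudes map to pulse durations or how the duration signal drives the MAC, so the sketch below assumes a linear amplitude-to-ticks encoding and one weighted accumulation step (one memory access) per tick of each pulse. All names, constants, and the tanh activation are illustrative.

```python
import math

MAX_TICKS = 255  # hypothetical timer resolution

def to_cwppm_ticks(x):
    """Encode an analog amplitude in [0, 1] as a pulse duration."""
    return int(round(max(0.0, min(x, 1.0)) * MAX_TICKS))

def mac_from_durations(durations, weights):
    """Duration-driven MAC: each weight is accumulated once per tick
    of its input pulse, so the total is sum(w_i * t_i)."""
    acc = 0.0
    for ticks, w in zip(durations, weights):
        for _ in range(ticks):      # one memory access per tick
            acc += w
    return acc / MAX_TICKS          # rescale back to the analog domain

def activation(v):
    return math.tanh(v)             # illustrative activation function

xs = [0.2, 0.9, 0.5]
ws = [0.5, -0.25, 1.0]
ticks = [to_cwppm_ticks(x) for x in xs]
y = activation(mac_from_durations(ticks, ws))
print(y)  # ≈ tanh(0.2*0.5 - 0.9*0.25 + 0.5*1.0)
```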

INPUT CIRCUITRY FOR ANALOG NEURAL MEMORY IN A DEEP LEARNING ARTIFICIAL NEURAL NETWORK

Numerous embodiments of input circuitry for an analog neural memory in a deep learning artificial neural network are disclosed.

OPTIMIZATION OF MEMORY USE FOR EFFICIENT NEURAL NETWORK EXECUTION

Implementations disclosed describe methods and systems to perform the methods of optimizing the size of memory used for accumulation of neural node outputs and for supporting multiple computational paths in neural networks. In one example, the size of memory used to perform neural layer computations is reduced by performing nodal computations in multiple batches, followed by rescaling and accumulation of nodal outputs. In another example, executing parallel branches of neural node computations includes evaluating, prior to the actual execution, the amount of memory resources needed to execute a particular order of branches sequentially, and selecting the order that minimizes this amount or keeps it below a target threshold.
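
For the second example, the branch-ordering evaluation can be sketched directly. Assuming (hypothetically) that each branch needs a known amount of working memory while it runs and retains a known output size afterward, the Python below scores every sequential order by its peak memory and picks the minimum; a real implementation would likely prune rather than brute-force the permutations.

```python
from itertools import permutations

def peak_memory(order, branch_mem, output_mem):
    """Peak memory when branches run one after another in `order`:
    a running branch needs its working memory plus the retained
    outputs of every branch that already finished."""
    peak, retained = 0, 0
    for b in order:
        peak = max(peak, retained + branch_mem[b])
        retained += output_mem[b]
    return peak

def best_order(branches, branch_mem, output_mem):
    """Evaluate every sequential order before execution and keep the
    one with the smallest peak (brute force; fine for a few branches)."""
    return min(permutations(branches),
               key=lambda o: peak_memory(o, branch_mem, output_mem))

branch_mem = {"a": 100, "b": 40, "c": 60}   # hypothetical working sizes
output_mem = {"a": 10, "b": 30, "c": 5}     # hypothetical output sizes
order = best_order("abc", branch_mem, output_mem)
print(order, peak_memory(order, branch_mem, output_mem))
```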

OUTPUT CIRCUITRY FOR ANALOG NEURAL MEMORY IN A DEEP LEARNING ARTIFICIAL NEURAL NETWORK
20230049032 · 2023-02-16

Numerous embodiments of output circuitry for an analog neural memory in a deep learning artificial neural network are disclosed. In some embodiments, a common mode circuit is used with differential cells, W+ and W−, that together store a weight, W. The common mode circuit can utilize current sources, variable resistors, or transistors as part of the structure for introducing a common mode voltage bias.
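
The differential readout can be illustrated numerically. In the sketch below (Python, with illustrative names and values), each leg carries the common-mode bias plus its own signal term, and subtracting the legs cancels the bias and leaves the input scaled by W = W+ − W−.

```python
V_CM = 0.6  # hypothetical common-mode bias set by the common mode circuit

def differential_output(x, w_plus, w_minus):
    """Read a weight W = W+ - W- stored as a differential cell pair."""
    # Each leg carries the common-mode bias plus its own signal term;
    # subtracting the legs cancels V_CM and leaves x * (W+ - W-).
    leg_plus = V_CM + x * w_plus
    leg_minus = V_CM + x * w_minus
    return leg_plus - leg_minus

print(differential_output(0.5, 0.8, 0.3))  # 0.25 = 0.5 * (0.8 - 0.3)
```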

COMPUTING DEVICE, MEMORY CONTROLLER, AND METHOD FOR PERFORMING AN IN-MEMORY COMPUTATION

A method for performing an in-memory computation includes: storing data in memory cells of a memory array, the data including weights for computation; determining whether an update command to change at least one of the weights is received; in response to receiving the update command, performing a write operation on the memory array to update the at least one weight; and disabling the write operation on the memory array until receiving a next update command to change the at least one weight.
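
A minimal software model of this control flow is sketched below, assuming a hypothetical controller interface: writes to the weight array stay disabled except for the single write performed in response to an update command.

```python
class InMemoryComputeController:
    """Sketch of the claimed control flow: weights are written on an
    update command, and writes stay disabled until the next one.
    Interface and values are illustrative, not from the source."""

    def __init__(self, weights):
        self.weights = list(weights)   # weights stored in memory cells
        self.write_enabled = False     # disabled until an update command

    def on_update_command(self, index, new_weight):
        # An update command re-enables the write path, performs the
        # single write, then disables writes again.
        self.write_enabled = True
        self.weights[index] = new_weight
        self.write_enabled = False

    def compute(self, inputs):
        # In-memory MAC over the stored weights; writes stay disabled.
        return sum(w * x for w, x in zip(self.weights, inputs))

ctrl = InMemoryComputeController([1.0, -2.0, 0.5])
print(ctrl.compute([1, 1, 2]))    # 1*1 - 2*1 + 0.5*2 = 0.0
ctrl.on_update_command(1, 2.0)    # update one weight, then re-disable
print(ctrl.compute([1, 1, 2]))    # 1*1 + 2*1 + 0.5*2 = 4.0
```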

CHARGE DOMAIN MATHEMATICAL ENGINE AND METHOD

A multiplier has a pair of charge reservoirs connected in series. A first charge movement device induces charge movement to or from the pair of charge reservoirs at the same rate. A second charge movement device induces charge movement to or from one of the pair of reservoirs, its rate programmed to add or remove charge in proportion to the rate of the first charge movement device. During a first cycle, the first charge movement device loads a first charge into a first of the pair of charge reservoirs. During a second cycle, the first and second charge movement devices remove charge at proportional rates from the pair of charge reservoirs until the first reservoir is depleted of the first charge. The second charge reservoir thereafter holds the multiplied result.
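
The arithmetic works out as follows: if the first cycle loads charge Q1 ∝ A, and the second cycle drains the first reservoir at rate r while moving charge on the second at rate k·r with k ∝ B, depletion occurs at t = Q1/r and the second reservoir has changed by k·r·t = k·Q1 ∝ A·B. The discrete-time Python model below illustrates this; the step size, units, and function name are arbitrary assumptions.

```python
def charge_domain_multiply(a, b, rate=1.0, dt=1e-3):
    """Discrete-time model of the two-cycle multiply. Cycle 1: load the
    first reservoir with charge proportional to `a`. Cycle 2: remove
    charge from the first reservoir at `rate` while moving charge on
    the second at the proportional rate `b * rate`, stopping when the
    first reservoir is depleted."""
    q1 = a                               # cycle 1: load first reservoir
    q2 = 0.0
    while q1 > 0.0:                      # cycle 2: proportional drain
        step = min(rate * dt, q1)        # do not overshoot depletion
        q1 -= step
        q2 += b * step                   # second device moves b * step
    return q2                            # second reservoir holds a * b

print(charge_domain_multiply(3.0, 4.0))  # ≈ 12.0
```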

METHOD OF GENERATING PRE-TRAINING MODEL, ELECTRONIC DEVICE, AND STORAGE MEDIUM

A method of generating a pre-training model, an electronic device, and a storage medium relate to the field of artificial intelligence technology, and in particular to computer vision and deep learning technology. The method includes: determining a performance index set corresponding to a candidate model structure set, where the candidate model structure set is determined from a plurality of model structures included in a search space, and the search space is a super-network-based search space; determining, from the candidate model structure set, a target model structure corresponding to each chip according to the performance index set, where each target model structure is a model structure meeting a performance index condition; and determining, for each chip, the target model structure corresponding to the chip as a pre-training model corresponding to the chip, where the chip is configured to run the pre-training model corresponding to the chip.
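
As an illustration of the per-chip selection step, the sketch below assumes (hypothetically) that the performance index is measured latency and the performance index condition is a per-chip latency budget; for each chip it returns the fastest candidate structure that meets the budget. All names and numbers are invented for the example.

```python
# Hypothetical candidate sub-networks drawn from a super-network search
# space, each with a measured latency (ms) on each target chip.
candidates = {
    "subnet_a": {"chip_x": 8.0, "chip_y": 5.5},
    "subnet_b": {"chip_x": 6.5, "chip_y": 7.0},
    "subnet_c": {"chip_x": 9.0, "chip_y": 4.0},
}
latency_budget = {"chip_x": 7.0, "chip_y": 6.0}  # performance condition

def pretraining_model_per_chip(candidates, budget):
    """For each chip, pick the candidate meeting the performance index
    condition (here: the fastest model within the latency budget)."""
    result = {}
    for chip, limit in budget.items():
        feasible = {name: perf[chip] for name, perf in candidates.items()
                    if perf[chip] <= limit}
        if feasible:
            result[chip] = min(feasible, key=feasible.get)
    return result

print(pretraining_model_per_chip(candidates, latency_budget))
# {'chip_x': 'subnet_b', 'chip_y': 'subnet_c'}
```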