Patent classifications
G06N3/0495
METHODS AND APPARATUSES FOR HIGH PERFORMANCE AND ACCURACY FIXED-POINT BATCHNORM IMPLEMENTATION
A method to implement a fixed-point batchnorm layer in a neural network for data processing is provided in the present disclosure. The method includes: receiving floating-point input data over a channel of a standalone floating-point batchnorm layer, and converting the floating-point input data into fixed-point input data of the standalone floating-point batchnorm layer; obtaining fixed-point quantization parameters in each channel based on the input data and the floating-point parameters μ_i, σ_i, ε_i in each channel; converting the standalone floating-point batchnorm layer, based on the fixed-point quantization parameters, into a fixed-point batchnorm layer for processing the fixed-point input data to generate fixed-point output data; and mapping the fixed-point batchnorm layer to a fixed-point convolution layer, where the convolution is computed as a matrix multiplication that can be executed on a GEMM engine.
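As a rough illustration of the folding the abstract describes: the per-channel batchnorm y_i = (x_i - μ_i) / sqrt(σ_i² + ε_i) can be rewritten as a 1×1 per-channel convolution with weight w_i = 1/sqrt(σ_i² + ε_i) and bias b_i = -μ_i · w_i, and those parameters can then be quantized. A minimal NumPy sketch, with hypothetical fractional lengths and function names (how the quantization parameters are actually chosen per channel is the subject of the patent):

```python
import numpy as np

def quantize(x, frac_bits):
    """Round a float array to fixed point with `frac_bits` fractional bits."""
    return np.round(x * (1 << frac_bits)).astype(np.int32)

def bn_to_fixed_point_conv(mu, sigma, eps, w_frac=12, b_frac=12):
    """Fold per-channel batchnorm (x - mu_i) / sqrt(sigma_i**2 + eps_i)
    into per-channel 1x1 convolution weights and biases, then quantize."""
    w = 1.0 / np.sqrt(sigma ** 2 + eps)   # per-channel weight
    b = -mu * w                           # per-channel bias
    return quantize(w, w_frac), quantize(b, b_frac)

def fixed_point_bn(x_q, w_q, b_q, x_frac, w_frac, b_frac, out_frac):
    """Integer-only evaluation: x_q has shape (C, H, W) with x_frac
    fractional bits; the product carries x_frac + w_frac fractional bits."""
    acc = x_q.astype(np.int64) * w_q.astype(np.int64)[:, None, None]
    # align the bias to the accumulator's fractional length before adding
    acc += b_q.astype(np.int64)[:, None, None] << (x_frac + w_frac - b_frac)
    # requantize the accumulator down to the output fractional length
    return (acc >> (x_frac + w_frac - out_frac)).astype(np.int32)
```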
NEURAL NETWORK ACCELERATION
For neural network acceleration, a datapath can be configured to implement a convolution computation. A control unit can be configured to coordinate operations of the datapath to implement the convolution computation based on coded instructions representative of a neural network system. The control unit can be configured to command the datapath to convolve at least one input feature element of a set of input feature elements of at least one input feature map with at least one discretized weight of a set of discretized weights to compute an influence that the at least one input feature element of the set of input feature elements of the at least one input feature map has on one or more output feature elements of at least one output feature map.
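The "influence" formulation is an input-stationary (scatter) view of convolution: each input element is paired with each weight, and the product is accumulated into the output position it affects. A minimal NumPy sketch of that formulation for a single-channel valid convolution; the function name and loop structure are illustrative, not the patent's datapath:

```python
import numpy as np

def scatter_conv2d(x, w):
    """Input-stationary convolution: each input feature element is combined
    with each discretized weight to accumulate its influence on the output."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(H):
        for j in range(W):
            for di in range(kh):
                for dj in range(kw):
                    oi, oj = i - di, j - dj   # output position this pair influences
                    if 0 <= oi < out.shape[0] and 0 <= oj < out.shape[1]:
                        out[oi, oj] += x[i, j] * w[di, dj]
    return out
```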
MEMORY DEVICE FOR TERNARY COMPUTING
A memory device includes a pair of memory cells, an analog-to-digital converter (ADC), and a processing circuit. The pair of memory cells has a first memory cell and a second memory cell. The ADC, having a first input terminal and a second input terminal, is configured to convert a first data signal at the first input terminal and a second data signal at the second input terminal into a digital output indicating a data value associated with a particular state stored in the pair of memory cells. The processing circuit, coupled to a storage node of the first memory cell, a storage node of the second memory cell, and the first and the second input terminals, is configured to selectively adjust the first data signal and the second data signal according to first data stored in the first memory cell and second data stored in the second memory cell.
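One common way a pair of cells encodes a ternary value is differentially, with the converter reading the difference between the two data signals. A small Python sketch of that encoding and of a ternary multiply-accumulate built on it; this is an assumed encoding for illustration, not necessarily the one the patent claims:

```python
def encode_ternary(v):
    """Store a ternary value in a cell pair (assumed encoding):
    +1 -> (1, 0), -1 -> (0, 1), 0 -> (0, 0)."""
    return {1: (1, 0), -1: (0, 1), 0: (0, 0)}[v]

def ternary_mac(inputs, weights):
    """Ternary multiply-accumulate: each stored weight routes the input
    onto one of two accumulation lines; the ADC would read the two
    lines differentially to produce the digital output."""
    pos_line = sum(x for x, w in zip(inputs, weights) if w == 1)
    neg_line = sum(x for x, w in zip(inputs, weights) if w == -1)
    return pos_line - neg_line
```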
METHODS AND APPARATUSES FOR HIGH PERFORMANCE AND ACCURACY FIXED-POINT SCALE IMPLEMENTATION
A method to implement a fixed-point scale layer in a neural network for data processing is provided in the present disclosure. The method includes: receiving floating-point input data over a channel of a standalone floating-point scale layer, and converting the floating-point input data into fixed-point input data of the standalone floating-point scale layer; obtaining fixed-point quantization parameters in each channel based on the input data and the floating-point parameters γ_i, β_i in each channel; converting the standalone floating-point scale layer, based on the fixed-point quantization parameters, into a fixed-point scale layer for processing the fixed-point input data to generate fixed-point output data; and mapping the fixed-point scale layer to a fixed-point convolution layer, where the convolution is computed as a matrix multiplication that can be executed on a GEMM engine.
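The scale layer y_i = γ_i · x_i + β_i folds the same way as the batchnorm case above, with w_i = γ_i and b_i = β_i; the integer multiply-accumulate and requantization steps are then identical to the fixed_point_bn sketch earlier. A minimal sketch of just the parameter folding, with hypothetical fractional lengths:

```python
import numpy as np

def scale_to_fixed_point_conv(gamma, beta, w_frac=12, b_frac=12):
    """Fold the per-channel scale layer y_i = gamma_i * x_i + beta_i into
    1x1 convolution weights/biases and quantize them."""
    w_q = np.round(gamma * (1 << w_frac)).astype(np.int32)
    b_q = np.round(beta * (1 << b_frac)).astype(np.int32)
    return w_q, b_q
```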
DATA PROCESSING DEVICE, DATA-PROCESSING METHOD AND RECORDING MEDIA
The data processing device includes an inference processor and a learning processor. The inference processor includes: an input data determination circuit for determining whether or not each item of the binarized input data is a predetermined value; a memory for storing a plurality of coefficients and coefficient address information, the coefficient address information including information about the coefficient addresses at which the plurality of coefficients are stored; an inference controller for reading a coefficient address from the memory based on a determination result of the input data determination circuit, and reading a coefficient from the memory based on the coefficient address; and an arithmetic circuit for performing an operation using the binarized input data and the coefficients acquired by the inference controller to generate an arithmetic operation result as output data.
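In effect, the determination circuit lets the device skip memory reads for inputs equal to the predetermined value, fetching coefficients by address only for the remaining inputs. A small Python sketch of that control flow, with all names hypothetical and the skip value assumed to be 0:

```python
def sparse_binary_inference(x_bin, coeff_addrs, coeff_mem, skip_value=0):
    """Only fetch coefficients for binarized inputs that differ from the
    predetermined value; for +1/-1 inputs the multiply reduces to a
    signed addition."""
    acc = 0
    for i, x in enumerate(x_bin):
        if x == skip_value:                  # determination circuit: skip
            continue
        coeff = coeff_mem[coeff_addrs[i]]    # controller: address lookup, then fetch
        acc += x * coeff                     # arithmetic circuit
    return acc
```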
ACCELERATING CONVOLUTIONS FOR SPARSE INPUTS
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing an accelerated convolution on sparse inputs. In one aspect, a method comprises receiving sensor data input comprising input features for input spatial locations; and processing the sensor data input using a convolutional neural network having a first convolutional layer with a filter having multiple filter spatial locations to generate a network output comprising output features for output spatial locations, wherein processing the sensor data input comprises: obtaining a rule book tensor that identifies for each filter spatial location (i) a subset of the input features, and (ii) for each input feature in the subset, a respective output feature; for each particular filter spatial location: generating input tile, filter tile, and output tile sets in accordance with the rule book tensor; and generating the output features in the output tile set based on the tile sets.
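This is the gather-GEMM-scatter pattern used for sparse (active-site) convolutions: the rule book tells each filter spatial location which input rows to gather and which output rows to scatter into, so each location's work becomes a dense matrix multiplication. A minimal NumPy sketch under those assumptions; the rule book layout and names here are illustrative:

```python
import numpy as np

def sparse_conv_with_rulebook(in_feats, rulebook, filters, num_out):
    """Sparse convolution via a rule book.

    in_feats : (N_in, C_in) features at active input sites
    rulebook : dict mapping filter location k -> (in_idx, out_idx) arrays
    filters  : (K, C_in, C_out), one weight matrix per filter location
    """
    C_out = filters.shape[2]
    out = np.zeros((num_out, C_out))
    for k, (in_idx, out_idx) in rulebook.items():
        tile = in_feats[in_idx]           # input tile set (gather)
        prod = tile @ filters[k]          # filter tile GEMM
        np.add.at(out, out_idx, prod)     # output tile set (scatter-add)
    return out
```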
FIXED-POINT MULTIPLICATION FOR NETWORK QUANTIZATION
Techniques for training a neural network having a plurality of computational layers, with the associated weights and activations kept in fixed-point formats, include: determining an optimal fractional length for the weights and activations of the computational layers; training a learned clipping level with fixed-point quantization using a PACT process for the computational layers; and quantizing effective weights that fuse a weight of a convolution layer with the weight and running variance of a batch normalization layer. A fractional length for the weights of the computational layers is determined from current values of the weights using the determined optimal fractional length. A fixed-point activation between adjacent computational layers is related using PACT quantization of the clipping level and an activation fractional length from a node in the following computational layer. The resulting fixed-point weights and activation values are stored as a compressed representation of the neural network.
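A brief NumPy sketch of the two ingredients named in the abstract, PACT activation clipping and effective-weight fusion; the clipping level α is the learned parameter, and the fractional lengths and function names are assumptions for illustration:

```python
import numpy as np

def pact_quantize_act(x, alpha, frac_bits):
    """PACT: clip activations to [0, alpha], then quantize with a fixed
    fractional length (fake quantization, as used during training)."""
    y = np.clip(x, 0.0, alpha)
    scale = 1 << frac_bits
    return np.round(y * scale) / scale

def effective_weight(w_conv, gamma, running_var, eps=1e-5):
    """Fuse a conv weight (C_out, C_in, kh, kw) with the batchnorm weight
    gamma and running variance, so quantization acts on the fused weight."""
    return w_conv * (gamma / np.sqrt(running_var + eps))[:, None, None, None]
```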
NEUROMORPHIC DEVICE AND ELECTRONIC DEVICE INCLUDING THE SAME
A neuromorphic device includes a plurality of cell tiles, a reference tile, and a comparator circuit. Each cell tile includes a cell array with a plurality of memory cells storing weights of a neural network, a row driver connected to the plurality of memory cells, and cell analog-to-digital converters connected to the plurality of memory cells and converting cell currents into a plurality of pieces of digital cell data. The reference tile includes a plurality of reference cells, a reference row driver connected to the plurality of reference cells, and reference analog-to-digital converters connected to the plurality of reference cells and converting reference currents, read via a plurality of reference column lines, into a plurality of pieces of digital reference data. The comparator circuit is configured to compare the plurality of pieces of digital cell data with the plurality of pieces of digital reference data, respectively.
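A tiny sketch of the final comparison step, under one plausible reading of "respectively": each piece of digital cell data is compared against the corresponding piece of digital reference data, so the reference tile effectively calibrates out array-level offsets. The names and the one-bit output are assumptions, not the patent's circuit:

```python
def compare_with_reference(cell_data, ref_data):
    """Pairwise comparison of digitized cell currents with digitized
    reference currents; the sign of the difference gives the output bit."""
    return [int(c >= r) for c, r in zip(cell_data, ref_data)]
```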
COMPLEMENTARY SPARSITY IN PROCESSING TENSORS
A hardware accelerator that is efficient at performing computations related to tensors. The hardware accelerator may store a complementary dense process tensor that is combined from a plurality of sparse process tensors. The plurality of sparse process tensors have non-overlapping locations of active values. The hardware accelerator may perform elementwise operations between the complementary dense process tensor and an activation tensor to generate a product tensor. The hardware accelerator may re-arrange the product tensor based on a permutation logic to separate the products into groups. Each group corresponds to one of the sparse process tensors. Each group may be accumulated separately to generate a plurality of output values. The output values may be selected in an activation selection. The activation selection may be a dense activation or a sparse activation, such as a k-winner activation that sets non-winners to zero.
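A compact NumPy sketch of the complementary-sparsity pipeline: pack the non-overlapping sparse tensors into one dense tensor, do a single elementwise pass against the activation, regroup the products per source tensor (the role of the permutation logic), accumulate each group, and apply a k-winner activation selection. Shapes and names are illustrative:

```python
import numpy as np

def complementary_sparse_matvec(sparse_tensors, positions, activation, k_winners):
    """sparse_tensors : (G, N), each row sparse; rows' nonzeros don't overlap
    positions       : (G, N) boolean masks of each row's active positions
    activation      : (N,) activation tensor
    """
    dense = sparse_tensors.sum(axis=0)   # pack: valid because locations don't overlap
    products = dense * activation        # one elementwise pass over the dense tensor
    # regroup products per source tensor and accumulate each group separately
    outputs = np.array([products[mask].sum() for mask in positions])
    # k-winner activation selection: keep the k largest outputs, zero the rest
    keep = np.argsort(outputs)[-k_winners:]
    result = np.zeros_like(outputs)
    result[keep] = outputs[keep]
    return result
```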
COMPRESSED MATRIX REPRESENTATIONS OF NEURAL NETWORK ARCHITECTURES BASED ON SYNAPTIC CONNECTIVITY
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for implementing brain emulation neural networks using compressed matrix representations. One of the methods includes obtaining a network input; and processing the network input using a neural network to generate a network output, comprising: processing the network input using an input subnetwork of the neural network to generate an embedding of the network input; and processing the embedding of the network input using a brain emulation subnetwork of the neural network, wherein the brain emulation subnetwork has a brain emulation neural network architecture that represents synaptic connectivity between a plurality of biological neurons in a brain of a biological organism, the processing comprising: obtaining a compressed matrix representation of a sparse matrix of brain emulation parameters; and applying the compressed matrix representation to the embedding of the network input to generate a brain emulation subnetwork output.
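As one plausible instance of the "compressed matrix representation", the sparse brain-emulation parameters could be stored in CSR form and applied to the embedding as a sparse-dense product. A minimal SciPy sketch under that assumption; CSR is our choice here, the patent does not fix the format, and the names are illustrative:

```python
import numpy as np
from scipy.sparse import csr_matrix

def brain_emulation_layer(embedding, dense_params, threshold=0.0):
    """Compress a sparse matrix of brain-emulation parameters to CSR and
    apply it to the embedding to produce the subnetwork output."""
    pruned = np.where(np.abs(dense_params) > threshold, dense_params, 0.0)
    sparse_params = csr_matrix(pruned)   # compressed matrix representation
    return sparse_params @ embedding
```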