Patent classifications
H03M7/6058
POWER-EFFICIENT DEEP NEURAL NETWORK MODULE CONFIGURED FOR LAYER AND OPERATION FENCING AND DEPENDENCY MANAGEMENT
A deep neural network (DNN) processor is configured to execute layer descriptors in layer descriptor lists. The descriptors define instructions for performing a forward pass of a DNN by the DNN processor. The layer descriptors can also be utilized to manage the flow of descriptors through the DNN module. For example, layer descriptors can define dependencies upon other descriptors. Descriptors defining a dependency will not execute until the descriptors upon which they are dependent have completed. Layer descriptors can also define a fence, or barrier, function that can be used to prevent the processing of upstream layer descriptors until the processing of all downstream layer descriptors is complete. The fence bit guarantees that there are no other layer descriptors in the DNN processing pipeline before the layer descriptor that has the fence to be asserted is processed.
REDUCING POWER CONSUMPTION IN A NEURAL NETWORK PROCESSOR BY SKIPPING PROCESSING OPERATIONS
A deep neural network (DNN) module can determine whether processing of certain values in an input buffer or a weight buffer by neurons can be skipped. For example, the DNN module might determine whether neurons can skip the processing of values in entire columns of a neuron buffer. Processing of these values might be skipped if an entire column of an input buffer or a weight buffer are zeros, for example. The DNN module can also determine whether processing of single values in rows of the input buffer or the weight buffer can be skipped (e.g. if the values are zero). Neurons that complete their processing early as a result of skipping operations can assist other neurons with their processing. A combination operation can be performed following the completion of processing that transfers the results of the processing operations performed by a neuron to their correct owner.
NEURAL NETWORK PROCESSOR USING COMPRESSION AND DECOMPRESSION OF ACTIVATION DATA TO REDUCE MEMORY BANDWIDTH UTILIZATION
A deep neural network (DNN) module can compress and decompress neuron-generated activation data to reduce the utilization of memory bus bandwidth. The compression unit can receive an uncompressed chunk of data generated by a neuron in the DNN module. The compression unit generates a mask portion and a data portion of a compressed output chunk. The mask portion encodes the presence and location of the zero and non-zero bytes in the uncompressed chunk of data. The data portion stores truncated non-zero bytes from the uncompressed chunk of data. A decompression unit can receive a compressed chunk of data from memory in the DNN processor or memory of an application host. The decompression unit decompresses the compressed chunk of data using the mask portion and the data portion. This can reduce memory bus utilization, allow a DNN module to complete processing operations more quickly, and reduce power consumption.
MINIMIZING MEMORY READS AND INCREASING PERFORMANCE BY LEVERAGING ALIGNED BLOB DATA IN A PROCESSING UNIT OF A NEURAL NETWORK ENVIRONMENT
The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as management of data among the various memory components of the NN/DNN. By inserting a selected padding in the input data to align the input data in memory, data read/writes can be optimized for processing by the NN/DNN thereby enhancing the overall performance of a NN/DNN. Operatively, an operations controller/iterator can generate one or more instructions that inserts the selected padding into the data. The data padding can be calculated using various characteristics of the input data as well as the NN/DNN as well as characteristics of the cooperating memory components. Padding on the output data can be utilized to support the data alignment at the memory components and the cooperating processing units of the NN/DNN.
PROCESSING DISCONTIGUOUS MEMORY AS CONTIGUOUS MEMORY TO IMPROVE PERFORMANCE OF A NEURAL NETWORK ENVIRONMENT
The performance of a neural network (NN) can be limited by the number of operations being performed. Using a line buffer that is directed to shift a memory block by a selected shift stride for cooperating neurons, data that is operatively residing memory and which would require multiple write cycles into a cooperating line buffer can be processed as in a single line buffer write cycle thereby enhancing the performance of a NN/DNN. A controller and/or iterator can generate one or more instructions having the memory block shifting values for communication to the line buffer. The shifting values can be calculated using various characteristics of the input data as well as the NN/DNN inclusive of the data dimensions. The line buffer can read data for processing, shift the data of the memory block and write the data in the line buffer for subsequent processing.
POWER-EFFICIENT DEEP NEURAL NETWORK MODULE CONFIGURED FOR EXECUTING A LAYER DESCRIPTOR LIST
A deep neural network (DNN) processor is configured to execute descriptors in layer descriptor lists. The descriptors define instructions for performing a pass of a DNN by the DNN processor. Several types of descriptors can be utilized: memory-to-memory move (M2M) descriptors; operation descriptors; host communication descriptors; configuration descriptors; branch descriptors; and synchronization descriptors. A DMA engine uses M2M descriptors to perform multi-dimensional strided DMA operations. Operation descriptors define the type of operation to be performed by neurons in the DNN processor and the activation function to be used by the neurons. M2M descriptors are buffered separately from operation descriptors and can be executed at soon as possible, subject to explicitly set dependencies. As a result, latency can be reduced and, consequently, the neurons can complete their processing faster. The DNN module can then be powered down earlier than it otherwise would have, thereby saving power.
POWER-EFFICIENT DEEP NEURAL NETWORK MODULE CONFIGURED FOR PARALLEL KERNEL AND PARALLEL INPUT PROCESSING
A deep neural network (DNN) module utilizes parallel kernel and parallel input processing to decrease bandwidth utilization, reduce power consumption, improve neuron multiplier stability, and provide other technical benefits. Parallel kernel processing enables the DNN module to load input data only once for processing by multiple kernels. Parallel input processing enables the DNN module to load kernel data only once for processing with multiple input data. The DNN module can implement other power-saving techniques like clock-gating (i.e. removing the clock from) and power-gating (i.e. removing the power from) banks of accumulators based upon usage of the accumulators. For example, individual banks of accumulators can be power-gated when all accumulators in a bank are not in use, and do not store data for a future calculation. Banks of accumulators can also be clock-gated when all accumulators in a bank are not in use, but store data for a future calculation.
DYNAMICALLY PARTITIONING WORKLOAD IN A DEEP NEURAL NETWORK MODULE TO REDUCE POWER CONSUMPTION
A deep neural network (DNN) module is disclosed that can dynamically partition neuron workload to reduce power consumption. The DNN module includes neurons and a group partitioner and scheduler unit. The group partitioner and scheduler unit divides a workload for the neurons into partitions in order to maximize the number of neurons that can simultaneously process the workload. The group partitioner and scheduler unit then assigns a group of neurons to each of the partitions. The groups of neurons in the DNN module process the workload in their assigned partition to generate a partial output value. The neurons in each group can then sum their partial output values to generate a final output value for the workload. The neurons can be powered down once the groups of neurons have completed processing their assigned workload to reduce power consumption.
ENHANCING PROCESSING PERFORMANCE OF ARTIFICIAL INTELLIGENCE/MACHINE HARDWARE BY DATA SHARING AND DISTRIBUTION AS WELL AS REUSE OF DATA IN NUERON BUFFER/LINE BUFFER
An exemplary artificial intelligence/machine learning hardware computing environment having an exemplary DNN module cooperating with one or more memory components can perform data sharing and distribution as well reuse of a buffer data to reduce the number of memory component read/writes thereby enhancing overall hardware performance and reducing power consumption. Illustratively, data from a cooperating memory component is read according to a selected operation of the exemplary hardware and written to corresponding other memory component for use by one or more processing elements (e.g., neurons). The data is read in such a manner to optimize the engagement of the one or more processing elements for each processing cycle as well as to reuse data previously stored in the one or more cooperating memory components. Operatively, the written data is copied to a shadow memory buffer prior to being consumed by the processing elements.
DATA PROCESSING PERFORMANCE ENHANCEMENT FOR NEURAL NETWORKS USING A VIRTUALIZED DATA ITERATOR
The performance of a neural network (NN) and/or deep neural network (DNN) can limited by the number of operations being performed as well as management of data among the various memory components of the NN/DNN. Using virtualized hardware iterators, data for processing by the NN/DNN can be traversed and configured to optimize the number of operations as well as memory utilization to enhance the overall performance of a NN/DNN. Operatively, an iterator controller can generate instructions for execution by the NN/DNN representative of one more desired iterator operation types and to perform one or more iterator operations. Data can be iterated according to a selected iterator operation and communicated to one or more neuron processors of the NN/DD for processing and output to a destination memory. The iterator operations can be applied to various volumes of data (e.g., blobs) in parallel or multiple slices of the same volume.