Patent classifications
G06F9/345
ACCELERATION OF OPERATIONS
Apparatuses, systems, and techniques to reduce a sequence of operations to an equivalent sequence having fewer operations. In at least one embodiment, a sequence of matrix operations is accelerated by combining operations that reorder a matrix with a matrix multiplication operation.
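A minimal sketch of the idea, assuming the reordering operation is a transpose (the abstract does not say which reorderings are combined). Instead of running a transpose and then a multiply, the fused version reads the original matrix with swapped indices, so one operation replaces two. The function names are hypothetical.

```python
def transpose(m):
    """Materialize the transpose of a row-major matrix (list of rows)."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix multiply: (m x k) @ (k x n) -> (m x n)."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def matmul_at_b(a, b):
    """Fused equivalent of matmul(transpose(a), b).

    a is (k x m) and b is (k x n). Element a[p][i] is read directly wherever
    transpose(a)[i][p] would be used, so no transposed copy is ever built.
    """
    k, m, n = len(a), len(a[0]), len(b[0])
    return [[sum(a[p][i] * b[p][j] for p in range(k)) for j in range(n)] for i in range(m)]
```

For `a = [[1, 2], [3, 4], [5, 6]]` and `b = [[1, 0], [0, 1], [1, 1]]`, both `matmul(transpose(a), b)` and `matmul_at_b(a, b)` give `[[6, 8], [8, 10]]`; the fused form simply skips the intermediate matrix.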
METHOD AND APPARATUS FOR IMPLIED BIT HANDLING IN FLOATING POINT MULTIPLICATION
A method is provided that includes performing, by a processor in response to a floating point multiply instruction, multiplication of floating point numbers, wherein determination of values of implied bits of leading bit encoded mantissas of the floating point numbers is performed in parallel with multiplication of the encoded mantissas, and storing, by the processor, a result of the floating point multiply instruction in a storage location indicated by the floating point multiply instruction.
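The reason implied-bit determination can overlap with the multiply can be seen from an algebraic split of the significand product. This is a software sketch assuming IEEE 754 binary32, where the implied (hidden) leading bit is 1 for normal numbers and 0 when the exponent field is zero; whether this matches the patent's "leading bit encoded" scheme is an assumption. It only shows that the encoded-fraction product does not depend on the implied bits, not how hardware schedules the work. Infinities and NaNs are not handled, and the helper names are hypothetical.

```python
import struct

FRAC_BITS = 23  # width of the binary32 fraction (encoded mantissa) field


def fields(x):
    """Split a value rounded to IEEE 754 binary32 into (exponent field, fraction field)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> FRAC_BITS) & 0xFF, bits & ((1 << FRAC_BITS) - 1)


def implied_bit(exp_field):
    """Hidden leading bit: 1 for normal numbers, 0 for zeros and subnormals."""
    return 0 if exp_field == 0 else 1


def significand_product(x, y):
    """Product of full significands, assembled from independent partial results.

    (hx*2^F + fx) * (hy*2^F + fy) = fx*fy + (hx*fy + hy*fx)*2^F + hx*hy*2^(2F)
    """
    ex, fx = fields(x)
    ey, fy = fields(y)
    # Does not need the implied bits, so it could start before they are known.
    core = fx * fy
    # Implied-bit determination only looks at the exponent fields.
    hx, hy = implied_bit(ex), implied_bit(ey)
    # Fold in the implied-bit terms once both parts are available.
    return core + ((hx * fy + hy * fx) << FRAC_BITS) + ((hx * hy) << (2 * FRAC_BITS))
```

The result equals multiplying the full significands `(hx << 23) | fx` and `(hy << 23) | fy` directly, including when one operand is subnormal (for example `1e-40` in binary32).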
Apparatus and method for performing operations on capability metadata
An apparatus is provided comprising storage elements to store data blocks, where each data block has associated capability metadata identifying whether the data block specifies a capability, at least one capability type being a bounded pointer. Processing circuitry is arranged to respond to a bulk capability metadata operation that identifies a plurality of the storage elements by performing an operation on the capability metadata associated with each data block stored in those storage elements. A single specified operation thus enables query and/or modification of multiple items of capability metadata, providing more efficient access to that metadata.
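A minimal model of a bulk capability metadata operation, assuming the metadata is a single validity tag bit per data block; the contents of a bounded pointer (bounds, permissions) are not modeled. One call reads or writes the tags of a whole range of storage elements, packed into one integer, instead of one query per block. `TaggedStore`, `bulk_query`, and `bulk_modify` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Block:
    data: int = 0
    tag: bool = False  # capability metadata: True if the block holds a valid capability


class TaggedStore:
    """Hypothetical model: storage elements, each holding one data block plus its tag."""

    def __init__(self, n):
        self.blocks = [Block() for _ in range(n)]

    def _check(self, start, count):
        if start < 0 or count < 0 or start + count > len(self.blocks):
            raise IndexError("range outside the storage elements")

    def bulk_query(self, start, count):
        """Read the tags of `count` blocks in one call; bit i describes block start+i."""
        self._check(start, count)
        mask = 0
        for i in range(count):
            if self.blocks[start + i].tag:
                mask |= 1 << i
        return mask

    def bulk_modify(self, start, count, mask):
        """Write the tags of `count` blocks in one call from a packed mask."""
        self._check(start, count)
        for i in range(count):
            self.blocks[start + i].tag = bool((mask >> i) & 1)
```

For example, after `store = TaggedStore(8)` and `store.bulk_modify(2, 4, 0b1010)`, blocks 3 and 5 are tagged, so `store.bulk_query(0, 8)` returns `0b101000`.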
Streaming engine with flexible streaming engine template supporting differing number of nested loops with corresponding loop counts and loop offsets
A streaming engine employed in a digital data processor specifies a fixed, read-only data stream defined by plural nested loops. An address generator produces the addresses of data elements for the nested loops. A stream head register stores the data elements next to be supplied to functional units for use as operands. A stream template specifies a loop count and a loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This allows the same bits of the stream template to be interpreted differently, enabling a trade-off between the number of loops supported and the size of the loop counts and loop dimensions.
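A sketch of how a format field could reinterpret the same template bits, under a hypothetical layout: 96 payload bits hold either 3 loops with 16-bit counts and 16-bit dimensions, or 6 loops with 8-bit fields. The address generator treats each loop dimension as the address offset between iterations of that loop, with loop 0 innermost. The field widths, bit order, and function names are assumptions, not the patent's actual encoding.

```python
from itertools import product

# Hypothetical layout: format field value -> (loops, count bits, dimension bits).
# Both formats divide the same 96 payload bits: 3 * (16 + 16) == 6 * (8 + 8).
FORMATS = {0: (3, 16, 16), 1: (6, 8, 8)}


def encode_template(fmt, loops):
    """Pack (count, dim) pairs, innermost loop first, into the payload bits."""
    n, cbits, dbits = FORMATS[fmt]
    if len(loops) != n:
        raise ValueError(f"format {fmt} expects {n} loops")
    payload, shift = 0, 0
    for count, dim in loops:
        if not (0 <= count < 1 << cbits and 0 <= dim < 1 << dbits):
            raise ValueError("loop field does not fit in this format")
        payload |= count << shift
        shift += cbits
        payload |= dim << shift
        shift += dbits
    return payload


def decode_template(fmt, payload):
    """Interpret the same payload bits according to the format field."""
    n, cbits, dbits = FORMATS[fmt]
    loops, shift = [], 0
    for _ in range(n):
        count = (payload >> shift) & ((1 << cbits) - 1)
        shift += cbits
        dim = (payload >> shift) & ((1 << dbits) - 1)
        shift += dbits
        loops.append((count, dim))
    return loops


def stream_addresses(base, loops):
    """Address generator: loops[0] is innermost; dims are offsets in address units."""
    outer_first = list(reversed(loops))
    for idx in product(*(range(count) for count, _ in outer_first)):
        yield base + sum(i * dim for i, (_, dim) in zip(idx, outer_first))
```

With `loops = [(2, 1), (3, 10), (1, 0)]` in format 0, `stream_addresses(100, loops)` yields 100, 101, 110, 111, 120, 121. Format 1 doubles the number of loops but rejects any count or dimension of 256 or more.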
INFORMATION PROCESSING DEVICE
An information processing device used for a convolutional neural network includes a processor configured to acquire input data and to process it using a convolution layer that executes convolution processing and a pooling layer that executes pooling processing. The processor is configured to divide the acquired input data into processing areas having an overlapping area, in which processing areas overlap, and a non-overlapping area, in which they do not. When processing the input data in a processing area, the processor executes the convolution processing or the pooling processing in the non-overlapping area, and in the overlapping area it reuses a result of the convolution processing or the pooling processing.
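A one-dimensional sketch of the reuse scheme, assuming processing areas are overlapping tiles of convolution outputs and that the overlap with the previous tile is copied rather than recomputed. Only convolution is shown; pooling would follow the same pattern. The function names and the `tile`/`overlap` parameters are hypothetical.

```python
def conv1d_at(x, w, i):
    """One 'valid' convolution output (cross-correlation, as in CNN layers) at position i."""
    return sum(x[i + k] * w[k] for k in range(len(w)))


def tiled_conv_with_reuse(x, w, tile, overlap):
    """Compute conv outputs in overlapping tiles, reusing results in the overlap.

    Returns (tiles, computed), where `computed` counts conv outputs actually evaluated.
    """
    if not 0 <= overlap < tile:
        raise ValueError("need 0 <= overlap < tile")
    n_out = len(x) - len(w) + 1
    tiles, computed = [], 0
    prev_start, prev_vals = None, []
    start = 0
    while start < n_out:
        end = min(start + tile, n_out)
        vals = []
        for i in range(start, end):
            if prev_start is not None and prev_start <= i < prev_start + len(prev_vals):
                vals.append(prev_vals[i - prev_start])  # overlapping area: reuse
            else:
                vals.append(conv1d_at(x, w, i))  # non-overlapping area: compute
                computed += 1
        tiles.append(vals)
        if end == n_out:
            break
        prev_start, prev_vals = start, vals
        start += tile - overlap
    return tiles, computed
```

With `x = list(range(10))`, `w = [1, 2, 1]`, `tile=4` and `overlap=2`, the three tiles cover 12 output slots, but only 8 convolutions are evaluated because each overlap is taken from the previous tile.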
PARALLEL ACCELERATION METHOD FOR MEMRISTOR-BASED NEURAL NETWORK, PARALLEL ACCELERATION PROCESSOR BASED ON MEMRISTOR-BASED NEURAL NETWORK AND PARALLEL ACCELERATION DEVICE BASED ON MEMRISTOR-BASED NEURAL NETWORK
Disclosed are a parallel acceleration method for a memristor-based neural network, a parallel acceleration processor based on a memristor-based neural network, and a parallel acceleration device based on a memristor-based neural network. The neural network includes a plurality of functional layers arranged in sequence, including a first functional layer and a second functional layer following it. The first functional layer includes a plurality of first memristor arrays arranged in parallel, which are configured to execute the operation of the first functional layer in parallel and to output the result of the operation to the second functional layer. The parallel acceleration method includes executing the operation of the first functional layer in parallel via the plurality of first memristor arrays and outputting the result of the operation to the second functional layer.
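A behavioral sketch, assuming the first-layer arrays are replicas holding the same weights and that each processes a share of the input vectors; the abstract does not say how work is divided. Each crossbar is idealized as a matrix-vector product (column current equals the sum of conductance times input voltage), with signed values standing in for conductances, which are non-negative in physical devices. Threads stand in for physically parallel arrays and give no real speedup in CPython. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def crossbar_mvm(g, v):
    """Idealized memristor crossbar: column current I[j] = sum_i G[i][j] * V[i]."""
    return [sum(g[i][j] * v[i] for i in range(len(v))) for j in range(len(g[0]))]


def first_layer_parallel(arrays, inputs):
    """Run replicated first-layer arrays concurrently.

    Input vector k is assigned round-robin to array k % len(arrays); results are
    returned in the original input order for the second layer.
    """
    def run(a):
        return [(k, crossbar_mvm(arrays[a], inputs[k]))
                for k in range(a, len(inputs), len(arrays))]

    results = [None] * len(inputs)
    with ThreadPoolExecutor(max_workers=len(arrays)) as pool:
        for part in pool.map(run, range(len(arrays))):
            for k, out in part:
                results[k] = out
    return results


def second_layer(vectors):
    """Placeholder second functional layer (ReLU) consuming the first layer's output."""
    return [[max(0, x) for x in vec] for vec in vectors]
```

Because every replica holds the same weights, the parallel result matches running one array over all inputs in sequence; only the assignment of inputs to arrays changes.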