G06F9/3552

METHOD OF INTERLEAVED PROCESSING ON A GENERAL-PURPOSE COMPUTING CORE
20230367604 · 2023-11-16 ·

A method of “interleaved processing” (IP) is proposed which generalizes the functional principle of memory interleaving by extending the interleaved memory system into the processor chip and prepending each write access to one of the extended interleaved memory banks by a data transforming operation. The method opens a new dimension of large scale software parallelization and is implemented in autonomous processing units called “parallel processing channels” (PPC) that integrate processor and memory at a very low machine balance—which solves the memory wall problem— and execute on-chip machine transactions at a 1 Flop/cycle throughput. IP computing systems are linearly performance scalable and capable of pipelined execution of very large and complex HPC workloads. They have unique performance advantages in strided vector, tensor, and data set operations; for relevant HPC workload types, up to 10×-100× per-Watt single-processor performance gains compared to today's technologies are expected.

Streaming engine with compressed encoding for loop circular buffer sizes
11803477 · 2023-10-31 · ·

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template register specifies a circular address mode for the loop, first and second block size numbers and a circular address block size selection. For a first circular address block size selection the block size corresponds to the first block size number. For a first circular address block size selection the block size corresponds to the first block size number. For a second circular address block size selection the block size corresponds to a sum of the first block size number and the second block size number.

Streaming engine with compressed encoding for loop circular buffer sizes
11449429 · 2022-09-20 · ·

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template register specifies a circular address mode for the loop, first and second block size numbers and a circular address block size selection. For a first circular address block size selection the block size corresponds to the first block size number. For a first circular address block size selection the block size corresponds to the first block size number. For a second circular address block size selection the block size corresponds to a sum of the first block size number and the second block size number.

ACCELERATING PROCESSOR BASED ARTIFICIAL NEURAL NETWORK COMPUTATION
20220066776 · 2022-03-03 ·

An apparatus employed in a processing device comprises a processor configured to process data of a predefined data structure. A memory fetch device is coupled to the processor and is configured to determine addresses of the packed data for the processor. The packed data is stored on a memory device that is coupled to the processor. The memory fetch device is further configured to provide output data based on the addresses of the packed data to the processor, where the output data is configured according to the predefine data structure.

STREAMING ENGINE WITH COMPRESSED ENCODING FOR LOOP CIRCULAR BUFFER SIZES
20210248077 · 2021-08-12 ·

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template register specifies a circular address mode for the loop, first and second block size numbers and a circular address block size selection. For a first circular address block size selection the block size corresponds to the first block size number. For a first circular address block size selection the block size corresponds to the first block size number. For a second circular address block size selection the block size corresponds to a sum of the first block size number and the second block size number.

Streaming engine with compressed encoding for loop circular buffer sizes
10983912 · 2021-04-20 · ·

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template register specifies a circular address mode for the loop, first and second block size numbers and a circular address block size selection. For a first circular address block size selection the block size corresponds to the first block size number. For a first circular address block size selection the block size corresponds to the first block size number. For a second circular address block size selection the block size corresponds to a sum of the first block size number and the second block size number.

STREAMING ENGINE WITH MULTI DIMENSIONAL CIRCULAR ADDRESSING SELECTABLE AT EACH DIMENSION
20210026776 · 2021-01-28 ·

A streaming engine employed in a digital data processor may specify a fixed read-only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template register independently specifies a linear address or a circular address mode for each of the nested loops.

Compression/decompression instruction specifying a history buffer to be used in the compression/decompression of data

An instruction to perform a function of a plurality of functions is obtained. The instruction is a single architected instruction of an instruction set architecture that complies to an industry standard for compression. The instruction is executed, and the executing includes performing the function specified by the instruction. The performing includes, based on the function being a compression function or a decompression function, transforming state of input data between an uncompressed form of the input data and a compressed form of the input data to provide a transformed state of data accessing. During performing the function, history relating to the function is accessed. The history is to be used in transforming the state of input data between the uncompressed form and the compressed form.

Streaming engine with multi dimensional circular addressing selectable at each dimension
10810131 · 2020-10-20 · ·

A streaming engine employed in a digital data processor may specify a fixed read-only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template register independently specifies a linear address or a circular address mode for each of the nested loops.

COMPRESSION/DECOMPRESSION INSTRUCTION SPECIFYING A HISTORY BUFFER TO BE USED IN THE COMPRESSION/DECOMPRESSION OF DATA

An instruction to perform a function of a plurality of functions is obtained. The instruction is a single architected instruction of an instruction set architecture that complies to an industry standard for compression. The instruction is executed, and the executing includes performing the function specified by the instruction. The performing includes, based on the function being a compression function or a decompression function, transforming state of input data between an uncompressed form of the input data and a compressed form of the input data to provide a transformed state of data accessing. During performing the function, history relating to the function is accessed. The history is to be used in transforming the state of input data between the uncompressed form and the compressed form.