Patent classifications
G06F9/383
STREAM REFERENCE REGISTER WITH DOUBLE VECTOR AND DUAL SINGLE VECTOR OPERATING MODES
A streaming engine employed in a digital signal processor specifies a fixed read only data stream. Once fetched the data stream is stored in two head registers for presentation to functional units in the fixed order. Data use by the functional unit is preferably controlled using the input operand fields of the corresponding instruction. A first read only operand coding supplies data from the first head register. A first read/advance operand coding supplies data from the first head register and also advances the stream to the next sequential data elements. Corresponding second read only operand coding and second read/advance operand coding operate similarly with the second head register. A third read only operand coding supplies double width data from both head registers.
INSERTING NULL VECTORS INTO A STREAM OF VECTORS
Software instructions are executed on a processor within a computer system to configure a steaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array, a null vector count (N), and a selected dimension. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. N null stream vectors are inserted into the stream of vectors for the selected dimension without fetching respective null data from the memory.
Converting a stream of data using a lookaside buffer
A stream of data is accessed from a memory system by an autonomous memory access engine, converted on the fly by the memory access engine, and then presented to a processor for data processing. A portion of a lookup table (LUT) containing converted data elements is preloaded into a lookaside buffer associated with the memory access engine. As the stream of data elements is fetched from the memory system each data element in the stream of data elements is replaced with a respective converted data element obtained from the LUT in the lookaside buffer according to a content of each data element to thereby form a stream of converted data elements. The stream of converted data elements is then propagated from the memory access engine to a data processor.
SYSTEMS, METHODS, AND APPARATUSES FOR HETEROGENEOUS COMPUTING
- Rajesh M. Sankaran ,
- Gilbert Neiger ,
- Narayan Ranganathan ,
- Stephen R. Van Doren ,
- Joseph Nuzman ,
- Niall D. McDonnell ,
- Michael A. O'Hanlon ,
- Lokpraveen B. Mosur ,
- Tracy Garrett Drysdale ,
- Eriko Nurvitadhi ,
- Asit K. Mishra ,
- Ganesh Venkatesh ,
- Deborah T. Marr ,
- Nicholas P. Carter ,
- Jonathan D. Pearce ,
- Edward T. Grochowski ,
- Richard J. Greco ,
- Robert Valentine ,
- Jesus Corbal ,
- Thomas D. Fletcher ,
- Dennis R. Bradford ,
- Dwight P. Manley ,
- Mark J. Charney ,
- Jeffrey J. Cook ,
- Paul Caprioli ,
- Koichi Yamada ,
- Kent D. Glossop ,
- David B. Sheffield
Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.
TWO-DIMENSIONAL ZERO PADDING IN A STREAM OF MATRIX ELEMENTS
Software instructions are executed on a processor within a computer system to configure a steaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array and a specified width for two selected dimensions of the array. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. When either selected dimension in the stream of vectors exceeds a respective specified width, the streaming engine inserts null elements into each portion of a respective vector for the selected dimension that exceeds the specified width in the stream of vectors. Stream vectors that are completely null are formed by the streaming engine without accessing the system memory for respective data.
Methods and systems for nested stream prefetching for general purpose central processing units
A method and hardware system to remove the overhead caused by having stream handling instructions in nested loops. Where code contains inner loops, nested in outer loops, a compiler pass identifies qualified nested streams and generates ISA specific instructions for transferring stream information linking an inner loop stream with an outer loop stream, to hardware components of a co-designed prefetcher. The hardware components include a frontend able to decode and execute instructions for a stream linking information transfer mechanism, a stream engine unit with a streams configuration table (SCT) having a field for allowing a subordinate stream to stay pending for values from its master stream, and a stream prefetch manager with buffers for storing values of current elements of a master stream, and with a nested streams control unit for reconfiguring and iterating the streams.
PROCESSING APPARATUS, METHOD AND SYSTEM FOR EXECUTING DATA PROCESSING ON A PLURALITY OF CHANNELS
A processing apparatus, a method and a system for executing data processing on a plurality of channels are disclosed. The processing apparatus for executing data processing on a plurality of channels includes: a channel information acquiring circuit, configured to acquire channel information of the plurality of channels; a storing circuit, including a plurality of storage regions corresponding to the plurality of channels, in which the storage regions are configured to store data information for the plurality of channels; a data reading control circuit, configured to read target data information corresponding to the channel information from a target storage region among the plurality of storage regions of the storing circuit, according to the channel information; and a cache circuit, configured to pre-store the target data information read from the target storage region of the storing circuit, by the data reading control circuit, to wait for use in the data processing.
Main processor prefetching operands for coprocessor operations
Technology for providing data to a processing unit is disclosed. A computer processor may be divided into a master processing unit and consumer processing units. The master processing unit at least partially decodes a machine instruction and determines whether data is needed to execute the machine instruction. The master processing unit sends a request to memory for the data. The request may indicate that the data is to be sent from the memory to a consumer processing unit. The data sent by the memory in response to the request may be stored in local read storage that is close to the consumer processing unit for fast access. The master processing unit may also provide the machine instruction to the consumer processing unit. The consumer processing unit may access the data from the local read storage and execute the machine instruction based on the accessed data.
Method and apparatus for dual issue multiply instructions
A method is provided that includes performing, by a processor in response to a dual issue multiply instruction, multiplication of operands of the dual issue multiply instruction using multiplication units comprised in a data path of the processor and configured to operate together to determine a product of the operands, and storing, by the processor, the product in a storage location indicated by the dual issue multiply instruction.
ONE-DIMENSIONAL ZERO PADDING IN A STREAM OF MATRIX ELEMENTS
Software instructions are executed on a processor within a computer system to configure a steaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array and a specified width for a selected dimension of the array. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. When the selected dimension in the stream of vectors exceeds the specified width, the streaming engine inserts null elements into each portion of a respective vector for the selected dimension that exceeds the specified width in the stream of vectors. Stream vectors that are completely null are formed by the streaming engine without accessing the system memory for respective data.