Patent classifications
G06F9/383
ARITHMETIC PROCESSING APPARATUS AND CONTROL METHOD OF THE ARITHMETIC PROCESSING APPARATUS
An arithmetic processing apparatus includes a prefetch unit configured to send a prefetch request to a subordinate cache memory for prefetching data of a main storage device into a primary cache memory. The arithmetic processing apparatus further includes a count unit configured to count a hit count of how many times it is detected that prefetch request target data is retained in the subordinate cache memory when executing a response process to respond to the prefetch request sent from the prefetch unit. The arithmetic processing apparatus yet further includes an inhibition unit configured to inhibit the prefetch unit from sending the prefetch request when the counted hit count reaches a threshold value.
Method for performing random read access to a block of data using parallel LUT read instruction in vector processors
This disclosure is directed to the problem of paralleling random read access within a reasonably sized block of data for a vector SIMD processor. The invention sets up plural parallel look up tables, moves data from main memory to each plural parallel look up table and then employs a look up table read instruction to simultaneously move data from each parallel look up table to a corresponding part a vector destination register. This enables data processing by vector single instruction multiple data (SIMD) operations. This vector destination register load can be repeated if the tables store more used data. New data can be loaded into the original tables if appropriate. A level one memory is preferably partitioned as part data cache and part directly addressable memory. The look up table memory is stored in the directly addressable memory.
Anticipated prefetching for a parent core in a multi-core chip
Embodiments relate to prefetching data on a chip having a scout core and a parent core coupled to the scout core. The method includes determining that a program executed by the parent core requires content stored in a location remote from the parent core. The method includes sending a fetch table address determined by the parent core to the scout core. The method includes accessing a fetch table that is indicated by the fetch table address by the scout core. The fetch table indicates how many of pieces of content are to be fetched by the scout core and a location of the pieces of content. The method includes based on the fetch table indicating, fetching the pieces of content by the scout core. The method includes returning the fetched pieces of content to the parent core.
Inserting null vectors into a stream of vectors
Software instructions are executed on a processor within a computer system to configure a steaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array, a null vector count (N), and a selected dimension. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. N null stream vectors are inserted into the stream of vectors for the selected dimension without fetching respective null data from the memory.
DUAL DATA STREAMS SHARING DUAL LEVEL TWO CACHE ACCESS PORTS TO MAXIMIZE BANDWIDTH UTILIZATION
A streaming engine employed in a digital data processor specifies fixed first and second read only data streams. Corresponding stream address generator produces address of data elements of the two streams. Corresponding steam head registers stores data elements next to be supplied to functional units for use as operands. The two streams share two memory ports. A toggling preference of stream to port ensures fair allocation. The arbiters permit one stream to borrow the other's interface when the other interface is idle. Thus one stream may issue two memory requests, one from each memory port, if the other stream is idle. This spreads the bandwidth demand for each stream across both interfaces, ensuring neither interface becomes a bottleneck.
Efficient load value prediction
Certain aspects of the present disclosure provide techniques for training load value predictors, comprising: determining if a prediction has been made by one or more of a plurality of load value predictors; determining a misprediction has been made by one or more load value predictors of the plurality of load value predictors; training each of the one or more load value predictors that made the misprediction; and resetting a confidence value associated with each of the one or more load value predictors that made the misprediction.
Methods and systems for caching data using behavioral event correlations
A method is disclosed including a client accessing a cache for a value of an object based on an object identification (ID), initiating a request to a cache loader if the cache does not include a value for the object, the cache loader performing a lookup in an object table for the object ID corresponding to the object, the cache loader retrieving a vector of execution context IDs, from an execution context table that correspond to the object IDs looked up in the object table and the cache loader performing an execution context lookup in an execution context table for every retrieved execution context ID in the vector to retrieve object IDs from an object vector.
Streaming engine with stream metadata saving for context switching
A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores data elements next to be supplied to functional units for use as operands. Stream metadata is stored in response to a stream store instruction. Stored stream metadata is restored to the stream engine in response to a stream restore instruction. An interrupt changes an open stream to a frozen state discarding stored stream data. A return from interrupt changes a frozen stream to an active state.
DATA PROCESSING APPARATUS AND METHOD FOR PREFETCH GENERATION
The invention provides a data processing apparatus and a data processing method for generating prefetches of data for use during execution of instructions by processing circuitry. The prefetches that are generated are based on a nested prefetch pattern. The nested prefetch pattern comprises a first pattern and a second pattern. The first pattern is defined by a first address offset between sequentially accessed addresses and a first observed number of the sequentially accessed addresses separated by the first address offset. The second pattern is defined by a second address offset between sequential iterations of the first pattern and a second observed number of the sequential iterations of the first pattern separated by the second address offset.
Method and apparatus for permuting streamed data elements
A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.