Patent classifications
G06F9/3808
Selective suppression of instruction translation lookaside buffer (ITLB) access
Processing of an instruction fetch from an instruction cache is provided, which includes: determining whether the next instruction fetch is from a same address page as a last instruction fetch from the instruction cache; and based, at least in part, on determining that the next instruction fetch is from the same address page, suppressing for the next instruction fetch an instruction address translation table access, and comparing for an address match results of an instruction directory access for the next instruction fetch with buffered results of a most-recent, instruction address translation table access for a prior instruction fetch from the instruction cache.
Instruction cache coherence
A data processing apparatus is provided, which includes a cache to store operations produced by decoding instructions fetched from memory. The cache is indexed by virtual addresses of the instructions in the memory. Receiving circuitry receives an incoming invalidation request that references a physical address in the memory. Invalidation circuitry invalidates entries in the cache where the virtual address corresponds with the physical address. Coherency is thereby achieved when using a cache that is indexed using virtual addresses.
Automatically introducing register dependencies to tests
Method, apparatus and product for automatically introducing register dependency into tests. A test template represents an abstract test scenario to be utilized for testing a target processor. The abstract test scenario requires that a value be assigned to a register. A test that implements the abstract test scenario is generated. The test is a set of instructions that are executable by the target processor. The generation of the test comprises: determining a memory address to retain the value in a memory that is accessible to the target processor; and adding to the test an instruction to load to the register the value from the memory address, whereby adding a register dependency to the test that is not required by the abstract test scenario. The test can be executed on the target processor or simulation thereof.
Repeat Instruction for Loading and/or Executing Code in a Claimable Repeat Cache a Specified Number of Times
A processor is disclosed including: a barrel-threaded execution unit for executing concurrent threads, and a repeat cache shared between the concurrent threads. The processor's instruction set includes a repeat instruction which takes a repeat count operand. When the repeat cache is not claimed and the repeat instruction is executed in a first thread, a portion of code is cached from the first thread into the repeat cache, the state of the repeat cache is changed to record it as claimed, and the cached code is executed a number of times. When the repeat instruction is then executed in a further thread, then the already-cached portion of code is again executed a respective number of times, each time from the repeat cache. For each of the first and further instructions, the repeat count operand in the respective instruction specifies the number of times to execute the cached code.
Microprocessor with instruction fetching failure solution
A microprocessor with a solution to instruction fetching failure is shown. The branch predictor and the instruction cache are decoupled by a fetch target queue. In response to instruction fetching failure of a target fetching address, the instruction cache regains the target fetching address from the fetch target queue to restart the failed instruction fetching.
Reusing fetched, flushed instructions after an instruction pipeline flush in response to a hazard in a processor to reduce instruction re-fetching
Reusing fetched, flushed instructions after an instruction pipeline flush in response to a hazard in a processor to reduce instruction re-fetching is disclosed. An instruction processing circuit is configured to detect fetched performance degrading instructions (Pals) in a pre-execution stage in an instruction pipeline that may cause a precise interrupt that would cause flushing of the instruction pipeline. In response to detecting a PDI in an instruction pipeline, the instruction processing circuit is configured to capture the fetched PDI and/or its successor, younger fetched instructions that are processed in the instruction pipeline behind the PDI, in a pipeline refill circuit. If a later execution of the PDI in the instruction pipeline causes a flush of the instruction pipeline, the instruction processing circuit can inject the fetched PDI and/or its younger instructions previously captured from the pipeline refill circuit into the instruction pipeline to be processed without such instructions being re-fetched.
System and method for adaptively sampling application programming interface execution traces based on clustering
A system and method for sampling application programming interface (API) execution traces in a computer system uses feature vectors of the API execution traces that are generated using trace-context information. The feature vectors are then used to group the API execution traces into clusters. For the cluster, sampling rates are generated so that a sampling rate is assigned to each of the clusters. The sampling rates are then applied to the API execution traces to adaptively sample the API execution traces based on the clusters to which the API execution traces belong.
BIT STRING LOOKUP DATA STRUCTURE
Systems, apparatuses, and methods related to bit string operations using a computing tile are described. An example apparatus includes computing device (or “tile”) that includes a processing unit and a memory resource configured as a cache for the processing unit. A data structure can be coupled to the computing device. The data structure can be configured to receive a bit string that represents a result of an arithmetic operation, a logical operation, or both and store the bit string that represents the result of the arithmetic operation, the logical operation, or both. The bit string can be formatted in a format different than a floating-point format.
GENERAL-PURPOSE COMPUTING ACCELERATOR AND OPERATION METHOD THEREOF
Disclosed is a general-purpose computing accelerator which includes a memory including an instruction cache, a first executing unit performing a first computation operation, a second executing unit performing a second computation operation, an instruction fetching unit fetching an instruction stored in the instruction cache, a decoding unit that decodes the instruction, and a state control unit controlling a path of the instruction depending on an operation state of the second executing unit. The decoding unit provides the instruction to the first executing unit when the instruction is of a first type and provides the instruction to the state control unit when the instruction is of a second type. Depending on the operation state of the second executing unit, the state control unit provides the instruction of the second type to the second executing unit or stores the instruction of the second type as a register file in the memory.
Filtering micro-operations for a micro-operation cache in a processor
A processor includes a micro-operation cache having a plurality of micro-operation cache entries for storing micro-operations decoded from instruction groups and a micro-operation filter having a plurality of micro-operation filter table entries for storing identifiers of instruction groups for which the micro-operations are predicted dead on fill if stored in the micro-operation cache. The micro-operation filter receives an identifier for an instruction group. The micro-operation filter then prevents a copy of the micro-operations from the first instruction group from being stored in the micro-operation cache when a micro-operation filter table entry includes an identifier that matches the first identifier.