G06F9/3854

Instruction and logic for bulk register reclamation

A processor includes a front end, a decoder, an allocator, and a retirement unit. The decoder includes logic to identify an end-of-live-range (EOLR) indicator. The EOLR indicator specifies an architectural register and a location in code for which the architectural register is unused. The allocator includes logic to scan for a mapping of the architectural register to a physical register, based upon the EOLR indicator. The allocator also includes logic to generate a request to disassociate the architectural register from the physical register. The retirement unit includes logic to disassociate the architectural register from the physical register.
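The scan, request, and disassociate steps in this abstract can be illustrated with a toy rename map. This is only a sketch under stated assumptions: the class `RenameMap`, its method names, the free list, and the list-of-names form of the EOLR indicator are all hypothetical and do not come from the patent.

```python
class RenameMap:
    """Toy architectural-to-physical register map (hypothetical model)."""

    def __init__(self, num_physical):
        self.mapping = {}                            # architectural name -> physical index
        self.free_list = list(range(num_physical))   # unallocated physical registers
        self.pending = []                            # disassociation requests awaiting retirement

    def allocate(self, arch_reg):
        """Rename step: map arch_reg to the next free physical register."""
        phys = self.free_list.pop(0)
        self.mapping[arch_reg] = phys
        return phys

    def request_reclaim(self, eolr_regs):
        """Allocator step: scan for mappings of the registers named by an
        EOLR indicator and queue a disassociation request for each hit."""
        for arch_reg in eolr_regs:
            if arch_reg in self.mapping:
                self.pending.append(arch_reg)

    def retire(self):
        """Retirement step: disassociate every queued register in bulk and
        return the freed physical registers to the free list."""
        freed = []
        for arch_reg in self.pending:
            phys = self.mapping.pop(arch_reg, None)
            if phys is not None:
                self.free_list.append(phys)
                freed.append(phys)
        self.pending.clear()
        return freed


rmap = RenameMap(num_physical=4)
for reg in ("r1", "r2", "r3"):
    rmap.allocate(reg)
rmap.request_reclaim(["r1", "r3", "r9"])   # r9 has no mapping and is skipped
freed = rmap.retire()                      # physical registers of r1 and r3
```

In this sketch the physical registers become reusable only at `retire()`, mirroring the split in the abstract between the allocator, which generates the request, and the retirement unit, which performs the disassociation.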

Store nullification in the target field

Apparatus and methods are disclosed for nullifying memory store instructions identified in a target field of a nullification instruction. In some examples of the disclosed technology, an apparatus can include memory and one or more block-based processor cores configured to fetch and execute a plurality of instruction blocks. One of the cores can include a control unit configured, based at least in part on receiving a nullification instruction, to obtain an instruction identification for a memory access instruction of a plurality of memory access instructions, based on a target field of the nullification instruction. The memory access instruction associated with the instruction identification is nullified. The memory access instruction is in a first instruction block of the plurality of instruction blocks. Based on the nullified memory access instruction, a subsequent memory access instruction from the first instruction block is executed.

High performance processor system and method based on general purpose units

This invention provides a high-performance processor system and method based on a common general-purpose unit that may be configured into a variety of different processor architectures. Before the processor executes instructions, the instructions are filled into an instruction read buffer that is directly accessed by the processor core; the instruction read buffer then actively provides instructions to the processor core for execution, achieving a high cache hit rate.

Processor arranged to operate as a single-threaded (nX)-bit processor and as an n-threaded X-bit processor in different modes of operation
10048967 · 2018-08-14

Methods of running a 32-bit operating system on a 64-bit processor are described. In an embodiment, the processor comprises 64-bit hardware and when running a 64-bit operating system operates as a single-threaded processor. However, when running a 32-bit operating system (which may be a guest operating system running on a virtual machine), the processor operates as a two-threaded core. The register file is logically divided into two portions, one for each thread, and logic within a functional unit may be split between threads, shared between threads or duplicated to provide an instance of the logic for each thread. Configuration bits may be set to indicate whether the processor should operate as a single-threaded or multi-threaded device.

Disambiguation-free out of order load store queue
10048964 · 2018-08-14

In a processor, a disambiguation-free out of order load store queue method. The method includes implementing a memory resource that can be accessed by a plurality of asynchronous cores; implementing a store retirement buffer, wherein stores from a store queue have entries in the store retirement buffer in original program order; and upon dispatch of a subsequent load from a load queue, searching the store retirement buffer for address matching. The method further includes, in cases where there are a plurality of address matches, locating a correct forwarding entry by scanning the store retirement buffer for a first match, and forwarding data from the first match to the subsequent load.
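The first-match forwarding described here can be sketched as follows. The abstract does not state the scan direction; this sketch assumes the scan starts at the youngest entry, so that the first match is the most recent store to the address. The class `StoreRetirementBuffer` and its method names are hypothetical.

```python
from collections import deque


class StoreRetirementBuffer:
    """Toy store retirement buffer holding stores in program order."""

    def __init__(self):
        self.entries = deque()   # (address, data) pairs, oldest first

    def push(self, address, data):
        """Append a store in original program order."""
        self.entries.append((address, data))

    def forward(self, load_address):
        """Scan from the youngest entry toward the oldest (an assumption)
        and return the data of the first address match, or None."""
        for address, data in reversed(self.entries):
            if address == load_address:
                return data
        return None


srb = StoreRetirementBuffer()
srb.push(0x10, "old")
srb.push(0x20, "other")
srb.push(0x10, "new")            # second store to the same address
forwarded = srb.forward(0x10)    # two matches exist; the youngest wins
```

Because the entries are kept in program order, a single directional scan is enough to pick the forwarding source when several stores match the load address.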

METHOD FOR POPULATING AND INSTRUCTION VIEW DATA STRUCTURE BY USING REGISTER TEMPLATE SNAPSHOTS
20180225123 · 2018-08-09

A method for populating an instruction view data structure by using register template snapshots. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of register templates to track instruction destinations and instruction sources by populating the register templates with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions; populating an instruction view data structure, wherein the instruction view data structure stores instructions corresponding to the instruction blocks as recorded by the plurality of register templates; and using the instruction view data structure to feed a plurality of stacked execution units of an execution stage in accordance with the readiness of instruction sources of the instruction blocks.

INFINITE PROCESSOR THREAD BALANCING

Embodiments include load-balancing a plurality of simultaneous threads of a processor. An example method includes computing a minimum group count for a thread from the plurality of threads. The minimum group count indicates a minimum number of groups of instructions to be assigned to the thread. The method further includes computing a maximum allowed group count for the thread. The maximum allowed group count indicates a maximum number of groups of instructions to be assigned to the thread. The method further includes issuing one or more groups of instructions for execution by the thread based on the minimum group count and the maximum allowed group count for the thread.

EXECUTING MULTIPLE PROGRAMS SIMULTANEOUSLY ON A PROCESSOR CORE

Systems and methods are disclosed for allocating resources to contexts in block-based processor architectures. In one example of the disclosed technology, a processor is configured to spatially allocate resources between multiple contexts being executed by the processor, including caches, functional units, and register files. In a second example of the disclosed technology, a processor is configured to temporally allocate resources between multiple contexts, for example, on a clock cycle basis, including caches, register files, and branch predictors. Each context is guaranteed access to its allocated resources to avoid starvation from contexts competing for resources of the processor. A results buffer can be used for folding larger instruction blocks into portions that can be mapped to smaller-sized instruction windows. The results buffer stores operand results that can be passed to subsequent portions of an instruction block.

System and method of merging partial write result during retire phase
10042646 · 2018-08-07

A processor including a physical register file, a rename table, mapping logic, size tracking logic, and merge logic. The rename table maps an architectural register with a larger index and a smaller index. The mapping logic detects a partial write instruction that specifies an architectural register that is already identified by an entry of the rename table mapped to a second physical register allocated for a larger write operation, and includes an index for the allocated register for the partial write instruction into the smaller index location of the entry. The size tracking logic provides a merge indication for the partial write instruction if the write size of the previous write instruction is larger. The merge logic merges the result of the partial write instruction with the second physical register during retirement of the partial write instruction.
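The merge performed at retirement can be illustrated with simple bit masking. This is a minimal sketch that assumes the partial write targets the low-order bits of the larger register; the abstract does not say which bits are written, and the function name `merge_partial_write` is hypothetical.

```python
def merge_partial_write(full_value, partial_value, partial_bits):
    """Merge a partial write into a previously written, larger value.

    Assumes the partial write replaces the low-order `partial_bits` bits
    and leaves the remaining upper bits of `full_value` unchanged.
    """
    mask = (1 << partial_bits) - 1
    return (full_value & ~mask) | (partial_value & mask)


# A 64-bit value followed by a 16-bit partial write to the same register.
merged = merge_partial_write(0x1122334455667788, 0xABCD, 16)
```

Here `merged` keeps the upper 48 bits of the earlier write and takes its low 16 bits from the partial write.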

Managing a divided load reorder queue

Managing a divided load reorder queue (LRQ) includes storing load instruction data for a load instruction in an expanded LRQ entry in the LRQ; launching the load instruction from the expanded LRQ entry; determining that the load instruction is in a finished state; moving a subset of the load instruction data from the expanded LRQ entry to a compact LRQ entry in the LRQ, wherein the compact LRQ entry is smaller than the expanded LRQ entry; and removing the load instruction data from the expanded LRQ entry.
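The move from an expanded entry to a compact entry can be sketched with two dictionaries. Everything named here is an assumption for illustration: the class `DividedLRQ`, the entry fields, and the choice of which fields survive in the compact entry are hypothetical and are not taken from the patent.

```python
class DividedLRQ:
    """Toy load reorder queue split into expanded and compact entries."""

    # Hypothetical subset of fields kept once a load has finished.
    COMPACT_FIELDS = ("tag", "address")

    def __init__(self):
        self.expanded = {}   # tag -> full load instruction data
        self.compact = {}    # tag -> reduced load instruction data

    def store(self, tag, address, operands):
        """Store load instruction data in an expanded entry."""
        self.expanded[tag] = {
            "tag": tag,
            "address": address,
            "operands": operands,
            "finished": False,
        }

    def finish(self, tag):
        """Mark a launched load as being in the finished state."""
        self.expanded[tag]["finished"] = True

    def compact_finished(self):
        """Move a subset of each finished load's data to a compact entry
        and remove the data from the expanded entry."""
        moved = []
        for tag in list(self.expanded):
            entry = self.expanded[tag]
            if entry["finished"]:
                self.compact[tag] = {f: entry[f] for f in self.COMPACT_FIELDS}
                del self.expanded[tag]
                moved.append(tag)
        return moved


lrq = DividedLRQ()
lrq.store(tag=1, address=0x100, operands=("r1", "r2"))
lrq.store(tag=2, address=0x200, operands=("r3", "r4"))
lrq.finish(1)
moved = lrq.compact_finished()   # only the finished load is compacted
```

After the call, the finished load occupies only a compact entry, while the unfinished load keeps its expanded entry.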