Patent classifications
G06F9/268
Scheduler for amp architecture with closed loop performance and thermal controller
Systems and methods are disclosed for scheduling threads on a processor that has at least two different core types, such as an asymmetric multiprocessing system. Each core type can run at a plurality of selectable voltage and frequency scaling (DVFS) states. Threads from a plurality of processes can be grouped into thread groups. Execution metrics are accumulated for threads of a thread group and fed into a plurality of tunable controllers for the thread group. A closed loop performance control (CLPC) system determines a control effort for the thread group and maps the control effort to a recommended core type and DVFS state. A closed loop thermal and power management system can limit the control effort determined by the CLPC for a thread group, and limit the power, core type, and DVFS states for the system. Deferred interrupts can be used to increase performance.
PROGRAMMABLE CONTROLLER
A method for a controller to execute a program comprising a sequence of functions on an accelerator with a pipelined architecture comprising a microcode buffer. The method comprises executing a function of the program as a sequence of operations, wherein the sequence of operations is represented by a sequence of templates, determining whether the template is non-colliding with previously inserted templates in the microcode buffer, determining whether data in local memory will be referenced before all previously inserted templates have taken effect, determining whether registers will be referenced before all previously inserted templates in the microcode buffer have taken effect, when it is determined that the template fits, that resources are available, that local data memory accesses will not collide, and that register accesses will not collide: creating a sequence of microcode instructions in the template, and inserting the template into the microcode buffer.
Memory devices, systems, and methods for updating firmware with single memory device
A method can include storing first instruction data in a first region of a nonvolatile memory device; mapping addresses of the first region to predetermined memory address spaces of a processor device; executing the first instruction data from the first region with the processor device; receiving second instruction data for the processor device. While the first instruction data remains available to the processor device, the second instruction data can be written into a second region of the nonvolatile memory device. By operation of the processor device, addresses of the second region can be remapped to the predetermined memory address spaces of the processor device; and executing the second instruction data from the second region with the processor device.
APPARATUS AND METHOD FOR SECURE, EFFICIENT MICROCODE PATCHING
An apparatus and method for efficient microcode patching. For example, one embodiment of an apparatus comprises: a package comprising one or more integrated circuit dies, the one or more integrated circuit dies comprising: a plurality of cores; and a security controller coupled to the plurality of cores, a first core of the plurality of cores comprising: a decoder to decode a microcode patching instruction, the microcode patching instruction comprising an operand to be used to identify an address; and execution circuitry to execute the microcode patching instruction, wherein responsive to the microcode patching instruction, the execution circuitry and/or security controller are to: retrieve a microcode patch from a location in memory based on the address, validate the microcode patch, apply the microcode patch to update or replace microcode associated with the one or more integrated circuit dies, and transmit the microcode patch to a persistent storage device; wherein the microcode patch is to be subsequently retrieved from the persistent storage device by one or more external security controllers of one or more external integrated circuit dies, the one or more external security controllers to cause the microcode patch to be applied to update or replace microcode associated with the one or more external integrated circuit dies.
Generation and use of memory access instruction order encodings
Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instruction in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block, and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.
Microprocessor using compressed and uncompressed microcode storage
A microprocessor includes compressed and uncompressed microcode memory storages, having N-bit wide and M-bit wide addressable words, respectively, where N<M. The microprocessor also includes a fetch unit, an instruction translator, and an execution stage. When the instruction translator receives an architectural instruction, it writes information identifying source and destination registers specified by the architectural instruction to an indirection register. It also issues one or more fetch addresses to retrieve a sequence of one or more microcode instructions from one of the uncompressed microcode memory storage and the compressed microcode memory storage to implement the architectural instruction. It merges information in the indirection register with the sequence of one or more microcode instructions to generate a sequence of one or more implementing microinstructions.
Propagation of microcode patches to multiple cores in multicore microprocessor
A microprocessor includes a plurality of processing cores, wherein each of the plurality of processing cores executes microcode and comprises hardware to patch the microcode. A first core of the plurality of processing cores is configured to encounter an instruction that instructs the first core to apply a microcode patch. The first core of the plurality of processing cores is further configured to, in response to encountering the instruction, inform each core of the other of the plurality of processing cores of the microcode patch and apply the microcode patch to the hardware of the first core. Each core of the plurality of processing cores other than the first core is configured to apply the microcode patch to the hardware of the core, in response to being informed by the first core.
HARDWARE PROCESSORS AND METHODS FOR EXTENDED MICROCODE PATCHING
Hardware processors and methods for extended microcode patching through on-die and off-die secure storage are described. In one embodiment, the additional storage resources used for storing micro-operations are section(s) of a cache that are unused at runtime and/or unused by a configuration of a processor. For example, the additional storage resources may be a section of a cache that is used to store context information from a core when the core is transitioned to a power state that shuts off voltage to the core. Non-limiting examples of such sections are one or more sections for: storage of context information for a transition of a thread to idle or off, storage of context information for a transition of a core for a multiple core processor to idle or off, or storage of coherency information for a transition of a cache coherency circuit (e.g., cache box (CBo)) to idle or off.
Apparatus and method to preclude X86 special bus cycle load replays in an out-of-order processor
An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory, where the specified load instruction comprises a load instruction resulting from execution of an x86 special bus cycle. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has received the operand, and is configured to preclude assertion of any indications that would otherwise result in a replay event.
Hardware processors and methods for extended microcode patching and reloading
Hardware processors and methods for extended microcode patching through on-die and off-die secure storage are described. In one embodiment, the additional storage resources used for storing micro-operations are section(s) of a cache that are unused at runtime and/or unused by a configuration of a processor. For example, the additional storage resources may be a section of a cache that is used to store context information from a core when the core is transitioned to a power state that shuts off voltage to the core. Non-limiting examples of such sections are one or more sections for: storage of context information for a transition of a thread to idle or off, storage of context information for a transition of a core for a multiple core processor to idle or off, or storage of coherency information for a transition of a cache coherency circuit (e.g., cache box (CBo)) to idle or off.