Patent classifications
G06F9/30076
Tile-based result buffering in memory-compute systems
A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. A first tile in a first node can include a processor with a processor output and a first register network configured to receive information from the processor output and information from one or more of the multiple other tiles in the first node. In response to an output instruction and a delay instruction, the register network can provide an output signal to one of the multiple other tiles in the first node. Based on the output instruction, the output signal can include one or the other of the information from the processor output and the information from one or more of the multiple other tiles in the first node. A timing characteristic of the output signal can depend on the delay instruction.
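The selection and timing behavior described in this abstract can be sketched as a small cycle-based model. This is only an illustration under stated assumptions: the class `RegisterNetwork`, its methods, and the string values for the output selection are hypothetical names, and the delay is assumed to be a whole number of cycles.

```python
class RegisterNetwork:
    """Toy model of a tile's register network (all names are hypothetical)."""

    def __init__(self):
        self.cycle = 0
        self.pending = []  # list of (ready_cycle, value)

    def issue(self, output_select, delay, processor_out, neighbor_in):
        # The output instruction picks the source of the output signal.
        if output_select == "processor":
            value = processor_out
        elif output_select == "neighbor":
            value = neighbor_in
        else:
            raise ValueError(f"unknown output_select: {output_select!r}")
        # The delay instruction sets when the output signal becomes visible.
        self.pending.append((self.cycle + delay, value))

    def tick(self):
        """Advance one cycle and return the values emitted on this cycle."""
        ready = [v for (c, v) in self.pending if c == self.cycle]
        self.pending = [(c, v) for (c, v) in self.pending if c != self.cycle]
        self.cycle += 1
        return ready


net = RegisterNetwork()
net.issue("neighbor", delay=2, processor_out=10, neighbor_in=20)
outputs = [net.tick() for _ in range(3)]
```

In this sketch the forwarded neighbor value appears only on the third tick, which is the timing effect the abstract attributes to the delay instruction.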
Low-latency register error correction
To implement low-latency register error correction, a register may be read as part of an instruction when that instruction is the currently executing instruction in a processor. A correctable error in data produced from reading the register can be detected. In response to detecting the correctable error, the currently executing instruction in the processor can be changed into a register update instruction that is executed to overwrite the data in the register with corrected data. Then, the original (e.g., unchanged) instruction can be rescheduled.
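The correct-then-reschedule flow can be sketched in a few lines. The abstract does not name an error-correcting code, so this sketch assumes triple redundancy with a bitwise majority vote purely as a stand-in; `Register`, `run_with_correction`, and the instruction tuples are hypothetical.

```python
from collections import deque


def majority(a, b, c):
    """Bitwise majority vote across three copies."""
    return (a & b) | (a & c) | (b & c)


class Register:
    """Register protected by triple redundancy (a stand-in for real ECC)."""

    def __init__(self, value):
        self.copies = [value, value, value]

    def read(self):
        corrected = majority(*self.copies)
        has_error = any(c != corrected for c in self.copies)
        return corrected, has_error

    def write(self, value):
        self.copies = [value, value, value]


def run_with_correction(program, regs):
    """Execute ("use", reg) instructions, repairing registers on the fly."""
    queue = deque(program)
    trace = []
    while queue:
        instr = queue.popleft()
        _, reg = instr
        value, has_error = regs[reg].read()
        if has_error:
            # The current instruction becomes a register update ...
            regs[reg].write(value)
            trace.append(("update", reg))
            # ... and the original, unchanged instruction is rescheduled.
            queue.appendleft(instr)
            continue
        trace.append(("use", reg, value))
    return trace


regs = {"r1": Register(0b1010)}
regs["r1"].copies[1] ^= 0b0100  # inject a single-copy bit flip
trace = run_with_correction([("use", "r1")], regs)
```

The trace shows one update step followed by the replayed original instruction, which then reads clean data.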
EXECUTING SYSTEM CALL VECTORED INSTRUCTIONS IN A MULTI-SLICE PROCESSOR
Executing system call vectored (SCV) instructions in a multi-slice processor includes: receiving, by an instruction fetch unit, an SCV instruction, wherein the SCV instruction is a system call from an operating system; sending the SCV instruction to a branch issue queue; determining, by the branch issue queue, that the SCV instruction is next-to-complete; issuing the SCV instruction to a branch resolution unit; and executing the SCV instruction by the branch resolution unit.
APPARATUS AND METHOD FOR PERFORMING A SPIN-LOOP JUMP
An apparatus and method for performing a spin-loop jump. One embodiment of a processor comprises: jump-pause execution logic to execute a jump-pause instruction, the jump-pause instruction to specify a condition and identify a destination instruction; wherein responsive to the execution of the jump-pause instruction, the jump-pause execution logic is to provide a hint that a loop between the jump-pause instruction and the destination instruction comprises a spin-wait loop and to test the condition, the jump-pause execution logic to delay execution by a specified amount prior to jumping to the destination instruction if the condition is satisfied. A second embodiment of a processor comprises test-subtract execution logic to execute a test-subtract instruction, the test-subtract instruction to decrement the counter value in a second source register, the test-subtract execution logic to further test the monitored value in a first source register or memory and the counter value in the second source register, wherein the test-subtract execution logic is to exit a spin-wait loop if the monitored value has a value indicating an exit condition or if the counter value is equal to zero.
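The exit logic of the second embodiment can be sketched as a software loop. This is an illustrative model only: `tsub` and `spin_wait` are hypothetical names, the exit condition is assumed to be equality with a given value, and the counter is assumed to start at 1 or more.

```python
def tsub(monitored, counter, exit_value):
    """Model of test-subtract: decrement the counter, then test both values."""
    counter -= 1
    should_exit = (monitored == exit_value) or (counter == 0)
    return counter, should_exit


def spin_wait(read_monitored, counter, exit_value):
    """Spin until the monitored value signals exit or the counter runs out."""
    if counter < 1:
        raise ValueError("counter must start at 1 or more")
    iterations = 0
    while True:
        iterations += 1
        counter, done = tsub(read_monitored(), counter, exit_value)
        if done:
            return iterations, counter


# Exit because the monitored value changes on the third read.
values = iter([0, 0, 1])
by_value = spin_wait(lambda: next(values), counter=10, exit_value=1)

# Exit because the counter reaches zero while the value never changes.
by_timeout = spin_wait(lambda: 0, counter=3, exit_value=1)
```

Bounding the loop with a counter means a waiter cannot spin forever if the monitored location is never updated.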
CONDITIONAL EXECUTION SPECIFICATION OF INSTRUCTIONS USING CONDITIONAL EXTENSION SLOTS IN THE SAME EXECUTE PACKET IN A VLIW PROCESSOR
In one embodiment, a system includes a memory and a processor core. The processor core includes functional units and an instruction decode unit configured to determine whether an execute packet of instructions received by the processor core includes a first instruction that is designated for execution by a first functional unit of the functional units and a second instruction that is a condition code extension instruction that includes a plurality of sets of condition code bits, wherein each set of condition code bits corresponds to a different one of the functional units, and wherein the sets of condition code bits include a first set of condition code bits that corresponds to the first functional unit. When the execute packet includes the first and second instructions, the first functional unit is configured to execute the first instruction conditionally based upon the first set of condition code bits in the second instruction.
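One way to picture the condition code extension slot is as a per-unit lookup table carried in the same execute packet. The sketch below is an assumption-laden model: the unit names, the predicate names, and the function `execute_packet` are hypothetical, and each set of condition code bits is reduced to the name of a single predicate.

```python
def execute_packet(instructions, cc_extension, predicates):
    """Return the (unit, op) pairs that execute in this packet.

    instructions: list of (functional_unit, op) pairs.
    cc_extension: dict mapping functional_unit -> predicate name, or None
                  when the packet carries no condition code extension slot.
    predicates:   dict mapping predicate name -> bool.
    """
    executed = []
    for unit, op in instructions:
        cond = cc_extension.get(unit) if cc_extension else None
        # A unit with no condition entry executes unconditionally.
        if cond is None or predicates[cond]:
            executed.append((unit, op))
    return executed


packet = [("L1", "add"), ("M1", "mul")]
extension = {"L1": "p0"}  # only unit L1 is predicated

skipped = execute_packet(packet, extension, {"p0": False})
taken = execute_packet(packet, extension, {"p0": True})
plain = execute_packet(packet, None, {})
```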
Processor having multiple cores, shared core extension logic, and shared core extension utilization instructions
An apparatus of an aspect includes a plurality of cores and shared core extension logic coupled with each of the plurality of cores. The shared core extension logic has shared data processing logic that is shared by each of the plurality of cores. Instruction execution logic, for each of the cores, in response to a shared core extension call instruction, is to call the shared core extension logic. The call is to have data processing performed by the shared data processing logic on behalf of a corresponding core. Other apparatus, methods, and systems are also disclosed.
SUPPORTING EVEN INSTRUCTION TAG ('ITAG') REQUIREMENTS IN A MULTI-SLICE PROCESSOR USING NULL INTERNAL OPERATIONS (IOPS)
Supporting even instruction tag (‘ITAG’) requirements in a multi-slice processor with null internal operations (IOPs) includes: receiving an IOP with an even ITAG requirement; determining that the IOP is to be assigned an odd ITAG; and inserting a null IOP into an instruction lane ahead of the IOP, wherein the null IOP is assigned the odd ITAG, and the IOP is assigned an even ITAG.
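The padding rule can be sketched as a simple tag allocator. The function `assign_itags`, the `"null"` placeholder, and the assumption that ITAGs are handed out as consecutive integers are illustrative choices, not details taken from the abstract.

```python
def assign_itags(iops, next_itag=0):
    """Assign consecutive ITAGs, padding with null IOPs where needed.

    iops: list of (name, needs_even) pairs in program order.
    Returns a list of (itag, name) pairs as placed in the instruction lane.
    """
    lane = []
    for name, needs_even in iops:
        if needs_even and next_itag % 2 == 1:
            # The IOP would get an odd ITAG, so a null IOP takes it instead.
            lane.append((next_itag, "null"))
            next_itag += 1
        lane.append((next_itag, name))
        next_itag += 1
    return lane


lane = assign_itags([("a", False), ("b", True), ("c", True)])
```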
Command-type filtering based on per-command filtering indicator
An adjunct processor dynamically determines, on a per-command basis, whether commands obtained by the adjunct processor are to be processed by the adjunct processor. The adjunct processor obtains a command request of a requester. The command request includes at least one filtering indicator indicating at least one valid command type for processing by the adjunct processor for the requester. The adjunct processor determines using the at least one filtering indicator whether a command of the command request is valid for processing by the adjunct processor for the requester. Based on determining that the command is valid for processing by the adjunct processor, the command is processed by the adjunct processor.
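The per-command check can be sketched as follows. The request layout (a dict with `filter` and `command` keys), the handler table, and the function `process_request` are hypothetical; the abstract only says that the request carries at least one filtering indicator naming valid command types.

```python
def process_request(request, handlers):
    """Process a command only if its type passes the request's own filter."""
    allowed_types = request["filter"]  # filtering indicator(s) in the request
    command = request["command"]
    if command["type"] not in allowed_types:
        return ("rejected", command["type"])
    result = handlers[command["type"]](command["payload"])
    return ("processed", result)


handlers = {
    "encrypt": lambda data: data[::-1],  # placeholder work, not real crypto
    "admin": lambda data: "admin:" + data,
}

ok = process_request(
    {"filter": {"encrypt"}, "command": {"type": "encrypt", "payload": "abc"}},
    handlers,
)
denied = process_request(
    {"filter": {"encrypt"}, "command": {"type": "admin", "payload": "abc"}},
    handlers,
)
```

Because the filter travels with each request, the same handler table can serve requesters with different permissions.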
INSTRUCTION EXECUTION CONTROL SYSTEM AND INSTRUCTION EXECUTION CONTROL METHOD
An instruction execution control system includes a plurality of instruction storage units configured to output instructions in FIFO order to a plurality of instruction execution units configured to execute the instructions; an instruction control unit configured to assign each of a plurality of sequentially input instructions to one of the instruction storage units; and an output control unit configured to control the output of the instructions from the instruction storage units. When the input instruction is a dummy instruction to be inserted between instructions that should be executed in an execution order, the instruction control unit distributes the input instruction to the plurality of instruction storage units. When the instruction output from an instruction storage unit has become the dummy instruction, the output control unit stops output from that instruction storage unit to the instruction execution unit until the instructions output from all instruction storage units have become the dummy instruction.
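The dummy instruction acts as a barrier across the FIFO queues, which can be sketched as below. The round-robin assignment of ordinary instructions, the `DUMMY` marker, and the function `issue_with_barrier` are assumptions made for illustration; the abstract does not say how ordinary instructions are assigned to storage units.

```python
from collections import deque

DUMMY = "DUMMY"  # hypothetical marker for the dummy (barrier) instruction


def issue_with_barrier(inputs, n_queues):
    """Return the instructions issued on each cycle, honoring dummy barriers."""
    queues = [deque() for _ in range(n_queues)]
    next_queue = 0
    for instr in inputs:
        if instr == DUMMY:
            for q in queues:  # a dummy is distributed to every queue
                q.append(DUMMY)
        else:  # ordinary instructions: round-robin (an assumption)
            queues[next_queue % n_queues].append(instr)
            next_queue += 1

    issued = []
    while any(queues):
        if all(q and q[0] == DUMMY for q in queues):
            for q in queues:  # every queue reached the barrier: release it
                q.popleft()
            continue
        # Queues whose head is a dummy stay stalled for this cycle.
        cycle = [q.popleft() for q in queues if q and q[0] != DUMMY]
        issued.append(cycle)
    return issued


cycles = issue_with_barrier(["a", "b", "c", DUMMY, "d"], n_queues=2)
```

Here the second queue reaches its dummy first and stalls while the first queue drains `c`, so `d` cannot issue before everything ahead of the barrier has issued.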
Instruction and logic for tracking fetch performance bottlenecks
A processor includes a front end, an execution unit, a retirement stage, a counter, and a performance monitoring unit. The front end includes logic to receive an event instruction to enable supervision of a front end event that will delay execution of instructions. The execution unit includes logic to set a register with parameters for supervision of the front end event. The front end further includes logic to receive a candidate instruction and match the candidate instruction to the front end event. The counter includes logic to generate the front end event upon retirement of the candidate instruction.