Patent classifications
G06F9/3861
Systems for performing instructions for fast element unpacking into 2-dimensional registers
Disclosed embodiments relate to instructions for fast element unpacking. In one example, a processor includes fetch circuitry to fetch an instruction whose format includes fields to specify an opcode and locations of an Array-of-Structures (AOS) source matrix and one or more Structure of Arrays (SOA) destination matrices, wherein: the specified opcode calls for unpacking elements of the specified AOS source matrix into the specified Structure of Arrays (SOA) destination matrices, the AOS source matrix is to contain N structures each containing K elements of different types, with same-typed elements in consecutive structures separated by a stride, the SOA destination matrices together contain K segregated groups, each containing N same-typed elements, decode circuitry to decode the fetched instruction, and execution circuitry, responsive to the decoded instruction, to unpack each element of the specified AOS matrix into one of the K element types of the one or more SOA matrices.
System and method for instruction unwinding in an out-of-order processor
A system and corresponding method unwind instructions in an out-of-order (OoO) processor. The system comprises a mapper. In response to a restart event causing at least one instruction to be unwound, the mapper restores a present integer mapper state and present floating-point (FP) mapper state, used for mapping instructions, to a former integer mapper state and former FP mapper state, respectively. The mapper stores integer snapshots and FP snapshots of the present integer and FP mapper state, respectively, to expedite restoration to the former integer and FP mapper state, respectively. Access to the FP snapshots is blocked, intermittently, as a function of at least one FP present indicator used by the mapper to record presence of FP registers used as destinations in the instructions. Blocking the access, intermittently, improves power efficiency of the OoO processor.
Operating system deactivation of storage block write protection absent quiescing of processors
Operating system deactivation of write protection for a storage block is provided absent quiescing of processors in a multi-processor computing environment. The process includes receiving an address translation protection exception interrupt resulting from an attempted write access by a processor to a storage block, and determining by the operating system whether write protection for the storage block is active. Based on write protection for the storage block not being active, the operating system issues an instruction to clear or modify translation lookaside buffer entries of the processor associated with the storage block, absent waiting for an action by another processor of multiple processors of the computing environment, to facilitate write access to the storage block proceeding at the processor.
DOMAIN TRANSITION DISABLE CONFIGURATION PARAMETER
A processing circuitry having a secure domain and a less secure domain. A control storage location stores a domain transition disable configuration parameter specifying whether domain transitions between the secure domain and the less secure domain are enabled or disabled in at least one mode of the process-ing circuitry. In the at least one mode of the processing circuitry, when the domain transition disable configuration parameter specifies that said domain transitions are disabled in said at least one mode, a disabled domain transition fault is signalled in response to an attempt to transition between domains in either direction. This can help support lazy configuration of resources for the secure domain or less secure domain for a thread expected only to need the other domain.
INFERRING FUTURE VALUE FOR SPECULATIVE BRANCH RESOLUTION
Aspects of the invention include includes determining a first instruction in a processing pipeline, wherein the first instruction includes a compare instruction, determining a second instruction in the processing pipeline, wherein the second instruction includes a conditional branch instruction relying on the compare instruction, determining a predicted result of the compare instruction, and completing the conditional branch instruction using the predicted result prior to executing the compare instruction.
Apparatus and method for configuring sets of interrupts
An apparatus and method are described for efficiently processing and reassigning interrupts. For example, one embodiment of an apparatus comprises: a plurality of cores; and an interrupt controller to group interrupts into a plurality of interrupt domains, each interrupt domain to have a set of one or more interrupts assigned thereto and to map the interrupts in the set to one or more of the plurality of cores.
UNFORWARDABLE LOAD INSTRUCTION RE-EXECUTION ELIGIBILITY BASED ON CACHE UPDATE BY IDENTIFIED STORE INSTRUCTION
A microprocessor includes a cache memory, a store queue, and a load/store unit. Each entry of the store queue holds store data associated with a store instruction. The load/store unit, during execution of a load instruction, makes a determination that an entry of the store queue holds store data that includes some but not all bytes of load data requested by the load instruction, cancels execution of the load instruction in response to the determination, and writes to an entry of a structure from which the load instruction is subsequently issuable for re-execution an identifier of a store instruction that is older in program order than the load instruction and an indication that the load instruction is not eligible to re-execute until the identified older store instruction updates the cache memory with store data.
FETCH QUEUES USING CONTROL FLOW PREDICTION
A data processing apparatus is provided. It includes control flow detection prediction circuitry that performs a presence prediction of whether a block of instructions contains a control flow instruction. A fetch queue stores, in association with prediction information, a queue of indications of the instructions and the prediction information comprises the presence prediction. An instruction cache stores fetched instructions that have been fetched according to the fetch queue. Post-fetch correction circuitry receives the fetched instructions prior to the fetched instructions being received by decode circuitry, the post-fetch correction circuitry includes analysis circuitry that causes the fetch queue to be at least partly flushed in dependence on a type of a given fetched instruction and the prediction information associated with the given fetched instruction.
STORE-TO-LOAD FORWARDING CORRECTNESS CHECKS AT STORE INSTRUCTION COMMIT
A microprocessor includes a load queue, a store queue, and a load/store unit that, during execution of a store instruction, records store information to a store queue entry. The store information comprises store address and store size information about store data to be stored by the store instruction. The load/store unit, during execution of a load instruction that is younger in program order than the store instruction, performs forwarding behavior with respect to forwarding or not forwarding the store data from the store instruction to the load instruction and records load information to a load queue entry, which comprises load address and load size information about load data to be loaded by the load instruction, and records the forwarding behavior in the load queue entry. The load/store unit, during commit of the store instruction, uses the recorded store information and the recorded load information and the recorded forwarding behavior to check correctness of the forwarding behavior.
User mode event handling
A method includes asserting a field of an event flag mask register configured to inhibit an event handler. The method also includes, responsive to an event that corresponds to the field of the event flag mask register being triggered: asserting a field of an event flag register associated with the event; and based the field in the event flag register being asserted, taking an action by a task being executed by the data processor core.