Patent classifications
G06F9/312
Memory controller and memory system for generating instruction set based on non-interleaving block group information
Embodiments of the present invention include a memory controller including a buffer memory configured to store program data, an instruction set configurator configured to configure an instruction set describing a procedure for programming the program data stored in the buffer memory to target memory blocks, an instruction set performer configured to sequentially perform instructions in the instruction set and generate an interrupt at a time of completion of performance of a last instruction among the instructions, and a central processing unit configured to erase the program data stored in the buffer memory when the interrupt is received from the instruction set performer. The instruction set configurator may configure the instruction set differently according to whether a non-interleaving block group exists among the target memory blocks.
Marking current context data to control a context-data-dependent processing operation to save current or default context data to a data location
A data processing system includes processing circuitry for executing context-data-dependent program instructions which are decoded by decoder circuitry. Such context-data-dependent program instructions perform processing which is dependent upon currently existing context data. As an example, the context-data-dependent program instructions may be floating point instructions and the context data may be rounding mode information. The decoder circuitry supports a context save instruction which saves context data when it is marked as having been used and saves default context data when the current context data is marked as not having been used. The decoder circuitry further supports a context restore instruction which restores context data when the current context data is marked as having been used and permits the current context data to continue for future use when it is marked as currently unused.
Arithmetic processing device having multicore ring bus structure with turn-back bus for handling register file push/pull requests
An arithmetic processing device includes arithmetic processing units, each having a calculator unit; a scheduler that controls a push instruction to write data to a register file in one of the arithmetic processing units and a pull instruction to read data from the register file; a pull request bus to which the scheduler outputs a pull request and which is connected to the arithmetic processing units; a push request bus to which the scheduler outputs a push request and which is connected to the arithmetic processing units; and a pull data bus that inputs, into the scheduler, pull data read from the register file in response to the pull request. Each of the arithmetic processing units includes a pull data turn-back bus that propagates pull data read from its register file to the pull data bus.
Speculatively executing instructions that follow a status updating instruction
A data processing apparatus is provided that comprises fetch circuitry to fetch an instruction stream comprising a plurality of instructions, including a status updating instruction, from storage circuitry. Status storage circuitry stores a status value. Execution circuitry executes the instructions, wherein at least some of the instructions are executed in an order other than in the instruction stream. For the status updating instruction, the execution circuitry is adapted to update the status value based on execution of the status updating instruction. Flush circuitry flushes, when the status storage circuitry is updated, following instructions that appear after the status updating instruction in the instruction stream.
Load-store instruction for performing multiple loads, a store, and strided increment of multiple addresses
A processor having an instruction set including a load-store instruction having operands specifying, from amongst the registers in at least one register file, a respective destination of each of two load operations, a respective source of a store operation, and a pair of address registers arranged to hold three memory addresses, the three memory addresses being a respective load address for each of the two load operations and a respective store address for the store operation. The load-store instruction further includes three stride operands each specifying a respective stride value for each of the two load addresses and one store address, wherein at least some possible values of each stride operand specify the respective stride value by specifying one of a plurality of fields within a stride register in one of the one or more register files, each field holding a different stride value.
Instruction and logic to provide stride-based vector load-op functionality with mask duplication
Instructions and logic provide vector load-op and/or store-op with stride functionality. Some embodiments, responsive to an instruction specifying: a set of loads, a second operation, destination register, operand register, memory address, and stride length; execution units read values in a mask register, wherein fields in the mask register correspond to stride-length multiples from the memory address to data elements in memory. A first mask value indicates the element has not been loaded from memory and a second value indicates that the element does not need to be, or has already been loaded. For each having the first value, the data element is loaded from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. Then the second operation is performed using corresponding data in the destination and operand registers to generate results. The instruction may be restarted after faults.
Method and apparatus for efficiently managing architectural register state of a processor
An apparatus and method for efficiently managing the architectural state of a processor. For example, one embodiment of a processor comprises: a source mask register to be logically subdivided into at least a first portion to store a usable portion of a mask value and a second portion to store an indication of whether the usable portion of the mask value has been updated; a control register to store an unusable portion of the mask value; architectural state management logic to read the indication to determine whether the mask value has been updated prior to performing a store operation, wherein if the mask value has been updated, then the architectural state management logic is to read the usable portion of the mask value from the first portion of the source mask register and zero out bits of the unusable portion of the mask value to generate a final mask value to be saved to memory, and wherein if the mask value has not been updated, then the architectural state management logic is to concatenate the usable portion of the mask value with the unusable portion of the mask value read from the control register to generate a final mask value to be saved to memory.
Gather-op instruction to duplicate a mask and perform an operation on vector elements gathered via tracked offset-based gathering
Instructions and logic provide vector scatter-op and/or gather-op functionality. In some embodiments, responsive to an instruction specifying: a gather and a second operation, a destination register, an operand register, and a memory address; execution units read values in a mask register, wherein fields in the mask register correspond to offset indices in the indices register for data elements in memory. A first mask value indicates the element has not been gathered from memory and a second value indicates that the element does not need to be, or has already been gathered. For each having the first value, the data element is gathered from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. When all mask register fields have the second value, the second operation is performed using corresponding data in the destination and operand registers to generate results.
Non-serialized push instruction for pushing a message payload from a sending thread to a receiving thread
In at least some embodiments, a processor core executes a sending thread including a first push instruction and a second push instruction subsequent to the first push instruction in a program order. Each of the first and second push instructions requests that a respective message payload be pushed to a mailbox of a receiving thread. In response to executing the first and second push instructions, the processor core transmits respective first and second co-processor requests to a switch in the data processing system via an interconnect fabric of the data processing system. The processor core transmits the second co-processor request to the switch without regard to acceptance of the first co-processor request by the switch.
Variable latency instructions
Techniques related to executing instructions by a processor comprising receiving a first instruction for execution, determining a first latency value based on an expected amount of time needed for the first instruction to be executed, storing the first latency value in a writeback queue, beginning execution of the first instruction on the instruction execution pipeline, adjusting the latency value based on an amount of time passed since beginning execution of the first instruction, outputting a first result of the first instruction based on the latency value, receiving a second instruction, determining that the second instruction is a variable latency instruction, storing a ready value indicating that a second result of the second instruction is not ready in the writeback queue, beginning execution of the second instruction on the instruction execution pipeline, updating the ready value to indicate that the second result is ready, and outputting the second result.