G06F9/40

Operand size control

A data processing system is provided with processing circuitry as well as a bank of 64-bit registers. An instruction decoder decodes arithmetic instructions and logical instruction specifying arithmetic operations and logical operations to be performed upon operands stored within the 64-bit registers. The instruction decoder is responsive to an operand size field SF within the arithmetic instructions and the logical instructions specifying whether the operands are 64-bit operands or 32-bit operands where all of the operands are 64-bit operands or all of the operands are 32-bit operands. If a switch is made to a lower exception level, then a check is made as to whether or not a register being used was previously subject to a 64-bit write to that register. If such a 64-bit write had previously taken place, then the upper 32-bits are flushed so as to avoid data leakage from the higher exception level.

Methods and apparatus for scheduling instructions using pre-decode data

Systems and methods for scheduling instructions using pre-decode data corresponding to each instruction. In one embodiment, a multi-core processor includes a scheduling unit in each core for selecting instructions from two or more threads each scheduling cycle for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The pre-decode data is determined by a compiler and is extracted by the scheduling unit during runtime and used to control selection of threads for execution. The pre-decode data may specify a number of scheduling cycles to wait before scheduling the instruction. The pre-decode data may also specify a scheduling priority for the instruction. Once the scheduling unit selects an instruction to issue for execution, a decode unit fully decodes the instruction.

Flexible instruction execution in a processor pipeline
09747109 · 2017-08-29 · ·

Executing instructions in a processor includes analyzing operations to be performed by instructions, including: determining a latency associated with a first operation to be performed by a first instruction, determining a second operation to be performed by a second instruction, where a result of the second operation depends on a result of the first operation, and assigning a value to the second instruction corresponding to the determined latency associated with the first operation. One or more instructions are selected to be issued together in the same clock cycle of the processor from among instructions whose operations have been analyzed, the selected instructions occurring consecutively according to a program order. A start of execution of the second instruction is delayed by a particular number of clock cycles after the clock cycle in which the second instruction is issued according to the value assigned to the second instruction.

Stack pointer and memory access alignment control
09760374 · 2017-09-12 · ·

A data processing system 2 includes a stack pointer register 26, 28, 30, 32 storing a stack pointer value for use in stack access operations to a stack data store 44, 46, 48, 50. Stack alignment checking circuitry 36 which is selectively disabled may be provided to check memory address alignment of the stack pointer value associated with a stack memory access. The action of the stack alignment checking circuitry 36 is independent of any further other alignment checking performed in respect of all memory accesses. Thus, general alignment checking circuitry 38 may be provided and independently selectively disabled in respect of any memory access.

Retrieving instructions of a single branch, backwards short loop from a local loop buffer or virtual loop buffer

A method, system, and computer program product for instruction fetching within a processor instruction unit, utilizing a loop buffer, one or more virtual loop buffers, and/or an instruction buffer. During instruction fetch, modified instruction buffers coupled to an instruction cache (I-cache) temporarily store instructions from a single branch, backwards short loop. The modified instruction buffers may be a loop buffer, one or more virtual loop buffers, and/or an instruction buffer. The instruction fetch within the instruction unit of a processor retrieves the instructions for the short loop from the modified buffers during the loop cycles of the single branch, backwards short loop, rather than from the instruction cache.

Method and apparatus for sorting elements in hardware structures
09753734 · 2017-09-05 · ·

A method for sorting elements in hardware structures is disclosed. The method comprises selecting a plurality of elements to order from an unordered input queue (UIQ) within a predetermined range in response to finding a match between at least one most significant bit of the predetermined range and corresponding bits of a respective identifier associated with each of the plurality of elements. The method further comprises presenting each of the plurality of elements to a respective multiplexer. Further the method comprises generating a select signal for an enabled multiplexer in response to finding a match between at least one least significant bit of a respective identifier associated with each of the plurality of elements and a port number of the ordered queue. Finally, the method comprises forwarding a packet associated with a selected element identifier to a matching port number of the ordered queue from the enabled multiplexer.

Instruction emulation processors, methods, and systems

A processor of an aspect includes decode logic to receive a first instruction and to determine that the first instruction is to be emulated. The processor also includes emulation mode aware post-decode instruction processor logic coupled with the decode logic. The emulation mode aware post-decode instruction processor logic is to process one or more control signals decoded from an instruction. The instruction is one of a set of one or more instructions used to emulate the first instruction. The one or more control signals are to be processed differently by the emulation mode aware post-decode instruction processor logic when in an emulation mode than when not in the emulation mode. Other apparatus are also disclosed as well as methods and systems.

Computer processor with deferred operations

A computer processor and corresponding method of operation employs execution logic that includes at least one functional unit and operand storage that stores data that is produced and consumed by the at least one functional unit. The at least one functional unit is configured to execute a deferred operation whose execution produces result data. The execution logic further includes a retire station that is configured to store and retire the result data of the deferred operation in order to store such result data in the operand storage, wherein the retire of such result data occurs at a machine cycle following issue of the deferred operation as controlled by statically-assigned parameter data included in the encoding of the deferred operation.

Flexible instruction execution in a processor pipeline
09690590 · 2017-06-27 · ·

Executing instructions in a processor includes: selecting or more instructions to be issued together in the same clock cycle of the processor from among a plurality of instructions, the selected one or more instructions occurring consecutively according to a program order; and executing instructions that have been issued, through multiple execution stages of a pipeline of the processor. The executing includes: determining a delay assigned to a first instruction, and sending a result of a first operation performed by the first instruction in a first execution stage to a second execution stage, where the number of execution stages between the first execution stage and the second execution stage is based on the determined delay.

Processing sytem with a secure set of executable instructions and/or addressing scheme
09684631 · 2017-06-20 · ·

A method for securing a data processing system having a processing unit is disclosed. At least a group of N1 digital words of m1 bits is selected from among the set of M1 digital words. N1 is less than M1. These words are selected in such a way that each selected digital word differs from all the other selected digital words by a number of bits at least equal to an integer p which is at least equal to 2. The group of N1 digital words of m1 bits form at least one group of N1 executable digital instructions. The processing unit is configured to make it capable of executing each instruction of the at least one group of N1 executable digital instructions.