G06F9/38585

Determining a restart point in out-of-order execution

There is provided a data processing apparatus comprising decode circuitry responsive to receipt of a block of instructions to generate control signals indicative of each of the block of instructions, and to analyse the block of instructions to detect a potential hazard instruction. The data processing apparatus is provided with decode circuitry to encode information indicative of a clean restart point into the control signals associated with the potential hazard instruction. The data processing apparatus is provided with data processing circuitry to perform out-of-order execution of at least some of the block of instructions, and control circuitry responsive to a determination, at execution of the potential hazard instruction, that data values used as operands for the potential hazard instruction have been modified by out-of-order execution of a subsequent instruction, to restart execution from the clean restart point and to flush held data values from the data processing circuitry.

MICROPROCESSOR THAT PREVENTS STORE-TO-LOAD FORWARDING BETWEEN DIFFERENT TRANSLATION CONTEXTS
20230315838 · 2023-10-05

A processor and a method are disclosed that mitigate side channel attacks (SCAs) that exploit store-to-load forwarding operations. In one embodiment, the processor detects a translation context change from a first translation context (TC) to a second TC and responsively disallows store-to-load forwarding until all store instructions older than the TC change are committed. The TC comprises an address space identifier (ASID), a virtual machine identifier (VMID), a privilege mode (PM) or a combination of two or more of the ASID, VMID and PM, or a derivative thereof, such as a TC hash, TC generation value, or a RobID associated with the last TC-updating instruction. In other embodiments, TC generation values of load and store instructions are compared or RobIDs of the load and store instructions are compared with the RobID associated with the last TC-updating instruction. If the instructions' RobIDs straddle the TC boundary, store-to-load forwarding is not allowed.
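The RobID comparison described in the final two sentences can be sketched as follows. This is a minimal illustration only, assuming RobIDs increase monotonically with program order (no wraparound) and that the TC-updating instruction is itself neither a load nor a store. The names `straddles_tc_boundary`, `may_forward`, and `last_tc_update_rob_id` are hypothetical and do not come from the abstract.

```python
from typing import Optional


def straddles_tc_boundary(store_rob_id: int,
                          load_rob_id: int,
                          last_tc_update_rob_id: Optional[int]) -> bool:
    """Return True if the store is older and the load is younger than the
    last TC-updating instruction (assumption: a larger RobID means younger)."""
    if last_tc_update_rob_id is None:
        # No translation context change has been recorded yet.
        return False
    return store_rob_id < last_tc_update_rob_id < load_rob_id


def may_forward(store_rob_id: int,
                load_rob_id: int,
                last_tc_update_rob_id: Optional[int]) -> bool:
    """Allow store-to-load forwarding only from an older store to a younger
    load, and only when the pair does not straddle the TC boundary."""
    if store_rob_id >= load_rob_id:
        return False
    return not straddles_tc_boundary(store_rob_id, load_rob_id,
                                     last_tc_update_rob_id)
```

For example, with the last TC-updating instruction at RobID 7, a store at RobID 5 may not forward to a load at RobID 9, whereas a store at RobID 8 may.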

Method of superposition of multiple commands execution

In a method for superposition of multiple commands, one or more memory pages are received. The one or more memory pages include information corresponding to one or more code lines and one or more data lines. The one or more code lines correspond to a first set of layers in a memory layer and are configured to execute one or more functions. The one or more data lines correspond to a second set of layers in the memory layer and are configured to store one or more sets of data. Each of the one or more code lines from the one or more memory pages is executed to perform one or more corresponding functions, based on the one or more data lines from the one or more memory pages. A result of each of the one or more functions is stored within the one or more data lines.

Apparatus and method for segmenting a data stream of a physical layer
11829759 · 2023-11-28

The invention introduces an apparatus for segmenting a data stream, installed in a physical layer, to include a host interface, a data register and a boundary detector. The data register is arranged to operably store data received from the host side through the host interface. The boundary detector is arranged to operably detect the content of the data register. When the data register includes a special symbol, the boundary detector outputs the starting address at which the special symbol is stored in the data register to an offset register to update the value stored in the offset register, thereby enabling a stream splitter to divide data bits of the data register according to the updated value of the offset register.
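The interaction between the boundary detector, the offset register, and the stream splitter might look like the sketch below. It is an illustration under stated assumptions: the data register is modelled as a byte string (the abstract speaks of data bits), the special symbol value `0xBC` is a hypothetical placeholder, and the function names are invented for this example.

```python
SPECIAL_SYMBOL = 0xBC  # hypothetical value; the abstract does not specify one


def detect_boundary(data_register: bytes, offset_register: int) -> int:
    """Return the updated offset register value: the starting address of the
    special symbol if it is present, otherwise the unchanged offset."""
    position = data_register.find(bytes([SPECIAL_SYMBOL]))
    return position if position >= 0 else offset_register


def split_stream(data_register: bytes, offset_register: int):
    """Divide the register contents at the offset held in the offset register."""
    return data_register[:offset_register], data_register[offset_register:]


register = bytes([0x01, 0x02, SPECIAL_SYMBOL, 0x04])
offset = detect_boundary(register, offset_register=0)
head, tail = split_stream(register, offset)
```

Here the special symbol sits at address 2, so the splitter divides the register into the two bytes before it and the two bytes starting at it.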

Techniques to identify improper information in call stacks
11461220 · 2022-10-04

Embodiments are disclosed for obtaining a call stack for binaries, where the call stack includes a sequence of frames, and each frame has a “from” address and a “to” address for a call instruction, and for determining basic blocks of instructions for the binaries, where each basic block of instruction has one or more instructions. Further, the embodiments include traversing the call stack to validate from/to address pairs of sequential frames based on control flow routes existing between “from” addresses and “to” addresses of the from/to address pairs, where each from/to address pair has a “from” address of a frame and a “to” address of an immediate previous frame on the call stack.
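One way to picture the traversal is as a reachability check over a control flow graph of basic blocks. The sketch below is not the claimed method. It assumes frames are ordered from outermost to innermost, that basic blocks are half-open address ranges, and that a "control flow route" means graph reachability. All names and the sample data are hypothetical.

```python
from collections import deque


def block_of(address, blocks):
    """Return the name of the basic block containing the address, or None."""
    for name, (start, end) in blocks.items():
        if start <= address < end:
            return name
    return None


def route_exists(source, target, cfg):
    """Breadth-first search for a control flow route between two blocks."""
    seen = {source}
    queue = deque([source])
    while queue:
        block = queue.popleft()
        if block == target:
            return True
        for successor in cfg.get(block, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return False


def validate_call_stack(frames, blocks, cfg):
    """Check each pair: the 'to' address of the previous frame must be able
    to reach the 'from' address of the current frame."""
    for previous, current in zip(frames, frames[1:]):
        source = block_of(previous["to"], blocks)
        target = block_of(current["from"], blocks)
        if source is None or target is None:
            return False
        if not route_exists(source, target, cfg):
            return False
    return True


# Hypothetical sample data: block A falls through to block B; C is isolated.
blocks = {"A": (0x100, 0x110), "B": (0x110, 0x120), "C": (0x200, 0x210)}
cfg = {"A": ["B"]}
good_stack = [{"from": 0x010, "to": 0x100}, {"from": 0x118, "to": 0x200}]
bad_stack = [{"from": 0x010, "to": 0x200}, {"from": 0x118, "to": 0x300}]
```

In `good_stack` the first call lands in block A, which can reach the call site in block B, so the pair validates. In `bad_stack` the first call lands in block C, which has no route to B, so the stack is flagged as improper.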

Systems, methods, and apparatuses for heterogeneous computing

Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more of a plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.

Data processing apparatus and operating method thereof
11449449 · 2022-09-20

A data processing apparatus includes a master device configured to transmit commands for destinations, a slave device including a plurality of command processing regions respectively corresponding to the destinations, and a controller configured to relay communication between the master device and the slave device. The controller assigns a time stamp value to each command as an initial value when the command is received by the controller and increments the time stamp value every command arbitration cycle, selects a command having the largest time stamp value among the commands in a tournament manner by comparing commands having different destinations every command arbitration cycle, stores a command selection history of each comparison of commands, and selects the command based on the command selection history corresponding to the compared commands when the respective time stamp values of the compared commands are the same or substantially the same as each other.
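The tournament selection with a history-based tie-break could be sketched as below. The abstract does not say how the selection history resolves a tie. This example assumes a simple alternation policy (the side that lost the previous comparison at that node wins the tie) and assumes time stamps have already been incremented for the current cycle. The names `compare`, `arbitrate`, and `history` are hypothetical.

```python
def compare(left, right, history, node):
    """Pick the command with the larger time stamp; on a tie, pick the side
    that was NOT selected last time at this comparison node (assumed policy).
    Commands are (time_stamp, name) tuples; None marks an empty slot."""
    if left is None:
        return right
    if right is None:
        return left
    if left[0] != right[0]:
        side = 0 if left[0] > right[0] else 1
    else:
        # Default of 1 means the left side wins the first tie at this node.
        side = 1 - history.get(node, 1)
    history[node] = side  # record the selection history for this comparison
    return left if side == 0 else right


def arbitrate(commands, history):
    """Run one arbitration cycle as a pairwise tournament over the commands."""
    level = list(commands)
    depth = 0
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            left = level[i]
            right = level[i + 1] if i + 1 < len(level) else None
            next_level.append(compare(left, right, history, (depth, i // 2)))
        level = next_level
        depth += 1
    return level[0] if level else None


history = {}
commands = [(3, "a"), (5, "b"), (5, "c"), (1, "d")]
first = arbitrate(commands, history)
second = arbitrate(commands, history)
```

With these inputs, commands "b" and "c" tie at the final comparison. Under the assumed alternation policy the first cycle selects "b" and the second selects "c", which keeps either destination from being starved.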

Apparatus and method for segmenting a data stream of a physical layer
11422813 · 2022-08-23

The invention introduces an apparatus for segmenting a data stream, installed in a physical layer, to include a host interface, a data register and a boundary detector. The data register is arranged to operably store data received from the host side through the host interface. The boundary detector is arranged to operably detect the content of the data register. When the data register includes a boundary-lock pattern or a special symbol, the boundary detector outputs the starting address at which the boundary-lock pattern or the special symbol is stored in the data register to an offset register to update the value stored in the offset register, thereby enabling a stream splitter to divide data bits of the data register according to the updated value of the offset register.

Device and processing architecture for resolving execution pipeline dependencies without requiring no operation instructions in the instruction memory

Different processor architectures are described to evaluate and track dependencies required by instructions. The processors may hold or queue instructions that require the output of other instructions until the required data and resources are available, which may remove the requirement for NOPs in the instruction memory to resolve dependencies and pipeline hazards. The processor may divide instruction data into bundles for parallel execution and provide speculative execution. The processor may include various components to implement an evaluation unit, an execution unit and a termination unit.

HIGHLY PARALLEL PROCESSING ARCHITECTURE USING DUAL BRANCH EXECUTION
20220107812 · 2022-04-07

Techniques for task processing in a highly parallel processing architecture using dual branch execution are disclosed. A two-dimensional array of compute elements is accessed. Each compute element within the array is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Control for the array of compute elements is provided on a cycle-by-cycle basis. The control is enabled by a stream of wide, variable length, control words generated by the compiler. The control includes a branch. Two sides of the branch in the array are executed while waiting for a branch decision to be acted upon by control logic. The branch decision is based on computation results in the array. Data produced by a taken branch path is promoted. Results from a side of the branch not indicated by the branch decision are ignored or invalidated.