G06F9/30145

Instruction-based multi-thread multi-mode PDCCH decoder for cellular data device
11595154 · 2023-02-28 · ·

A cellular modem processor can include dedicated processing engines that implement specific, complex data processing operations. To implement PDCCH decoding, a cellular modem can include a pipeline having multiple processing engines, with the processing engines including functional units that execute instructions corresponding to different stages in the PDCCH decoding process. Flow control and data synchronization between instructions can be provided using a hybrid of firmware-based flow control and hardware-based data dependency management.

Handling an input/output store instruction

An input/output store instruction is handled. A data processing system includes a system nest coupled to at least one input/output bus by an input/output bus controller. The data processing system further includes at least a data processing unit including a core, system firmware and an asynchronous core-nest interface. The data processing unit is coupled to the system nest via an aggregation buffer. The system nest is configured to asynchronously load from and/or store data to at least one external device which is coupled to the at least one input/output bus. The data processing unit is configured to complete the input/output store instruction before an execution of the input/output store instruction in the system nest is completed. The asynchronous core-nest interface includes an input/output status array with multiple input/output status buffers. The system firmware includes a retry buffer and the core includes an analysis and retry logic.

Apparatus and method for inhibiting instruction manipulation

An apparatus and method are provided for inhibiting instruction manipulation. The apparatus has execution circuitry for performing data processing operations in response to a sequence of instructions from an instruction set, and decoder circuitry for decoding each instruction in the sequence in order to generate control signals for the execution circuitry. Each instruction comprises a plurality of instruction bits, and the decoder circuitry is arranged to perform a decode operation on each instruction to determine from the value of each instruction bit, and knowledge of the instruction set, the control signals to be issued to the execution circuitry in response to that instruction. An input path to the decoder circuitry comprises a set of wires over which the instruction bits of each instruction are provided. Scrambling circuitry is used to perform a scrambling function on each instruction using a secret scrambling key, such that the wire within the set of wires over which any given instruction bit is provided to the decoder circuitry is dependent on the secret scrambling key. The decode operation performed by the decoder circuitry is then adapted to incorporate a descrambling function using the secret scrambling key to reverse the effect of the scrambling function. As a result, independent of which wire any given instruction bit is provided on, the decode operation is arranged when decoding a given instruction to correctly interpret each instruction bit of that given instruction, based on knowledge of the instruction set, in order to determine from the value of each instruction bit the control signals to be issued to the execution circuitry in response to that given instruction.

SORT ACCELERATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS
20180004520 · 2018-01-04 · ·

A processor of an aspect includes packed data registers, and a decode unit to decode an instruction. The instruction may indicate a first source packed data to include at least four data elements, to indicate a second source packed data to include at least four data elements, and to indicate a destination storage location. An execution unit is coupled with the packed data registers and the decode unit. The execution unit, in response to the instruction, is to store a result packed data in the destination storage location. The result packed data may include at least four indexes that may identify corresponding data element positions in the first and second source packed data. The indexes may be stored in positions in the result packed data that are to represent a sorted order of corresponding data elements in the first and second source packed data.

PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS TO SUPPORT LIVE MIGRATION OF PROTECTED CONTAINERS

A processor includes a decode unit to decode an instruction that is to indicate a page of a protected container memory, and a storage location outside of the protected container memory. An execution unit, in response to the instruction, is to ensure that there are no writable references to the page of the protected container memory while it has a write protected state. The execution unit is to encrypt a copy of the page of the protected container memory. The execution unit is to store the encrypted copy of the page to the storage location outside of the protected container memory, after it has been ensured that there are no writable references. The execution unit is to leave the page of the protected container memory in the write protected state, which is also valid and readable, after the encrypted copy has been stored to the storage location.

Streaming engine with multi dimensional circular addressing selectable at each dimension
11709779 · 2023-07-25 · ·

A streaming engine employed in a digital data processor may specify a fixed read-only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template register independently specifies a linear address or a circular address mode for each of the nested loops.

Computing System and controller thereof
20180011710 · 2018-01-11 ·

Computing system and controller thereof are disclosed for ensuring the correct logical relationship between multiple instructions during their parallel execution. The computing system comprises: a plurality of functional modules each performing a respective function in response to an instruction for the given functional module; and a controller for determining whether or not to send an instruction to a corresponding functional module according to dependency relationship between the plurality of instructions.

Method for migrating CPU state from an inoperable core to a spare core

An apparatus is disclosed in which the apparatus may include a plurality of cores, including a first core, a second core and a third core, and circuitry coupled to the first core. The first core may be configured to process a plurality of instructions. The circuitry may be may be configured to detect that the first core stopped committing a subset of the plurality of instructions, and to send an indication to the second core that the first core stopped committing the subset. The second core may be configured to disable the first core from further processing instructions of the subset responsive to receiving the indication, and to copy data from the first core to a third core responsive to disabling the first core. The third core may be configured to resume processing the subset dependent upon the data.

APPARATUS TO OPTIMIZE GPU THREAD SHARED LOCAL MEMORY ACCESS

One embodiment provides for a graphics processor comprising first logic coupled with a first execution unit, the first logic to receive a first single instruction multiple data (SIMD) message from the first execution unit; second logic coupled with a second execution unit, the second logic to receive a second SIMD message from the second execution unit; and third logic coupled with a bank of shared local memory (SLM), the third logic to receive a first request to access the bank of SLM from the first logic, a second request to access the bank of SLM from the second logic, and in a single access cycle, schedule a read access to a read port for the first request and a write access to a write port for the second request.

Enabling removal and reconstruction of flag operations in a processor

In one embodiment, a processor includes fetch logic to fetch instructions, decode logic to decode the fetched instructions, and execution logic to execute at least some of the instructions. The decode logic may determine whether a flag portion of a first instruction to be folded is to be performed, and if not, accumulate a first immediate value of the first instruction with a folded immediate value obtained from an entry of an immediate buffer.