G06F9/30076

Core Processor and Redundant Branch Processor with Control Flow Attack Detection

A secure processor with fault detection has a core thread which executes with a redundant branch processor thread. In one configuration, the core thread is operative on a fully functional core processor configured to execute a complete instruction set, and the redundant branch processor thread contains only initialization instructions and flow control instructions such as branch instructions and is operative on a redundant branch processor which is configured to execute a subset of the complete instruction set, specifically a branch control variable initialization and a branch instruction, thereby greatly simplifying the redundant branch processor architecture. Fault conditions are detected by comparing either a history of branch taken/not taken and branch targets, or a comparison of program counter activity for the core thread and redundant branch processor thread.

Systems and methods for performing 16-bit floating-point matrix dot product instructions

Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.

Method performed by a microcontroller for managing a NOP instruction and corresponding microcontroller

Disclosed herein is a method for managing of NOP instructions in a microcontroller, the method comprising duplicating all jump instructions causing a NOP instruction to form a new instruction set; inserting an internal NOP instruction into each of the jump instructions; when a jump instruction is executed, executing a subsequent instruction of the new instruction set; and executing the internal NOP instruction when an execution of the subsequent instruction is skipped.

Extended asynchronous data mover functions compatibility indication

A method is provided that is executable by a processor of a computer. Note that the processor is communicatively coupled to a memory of the computer, and the memory stores a response block of a call command. In implementing the method, the processor defines a sub-functions field in the response block of the call command. Further the processor indicates that a set of functions of a set of instructions are installed and available at an interface based on a corresponding sub-functions flag within the sub-functions field being set. Note that the interface is also being executed on the computer and that the set of functions being represented by the corresponding sub-functions flag. The processor further indicates that the set of functions of the set of instructions are not installed based on the corresponding sub-functions flag not being set.

PROCESSOR, PROCESSING METHOD, AND RELATED DEVICE
20230093393 · 2023-03-23 ·

This application discloses a processor, a processing method, and a related device. The processor includes a processor core. The processor core includes an instruction dispatching unit and a graph flow unit and at least one general-purpose operation unit that are connected to the instruction dispatching unit. The instruction dispatching unit is configured to: allocate a general-purpose calculation instruction in a decoded to-be-executed instruction to the at least one general-purpose calculation unit, and allocate a graph calculation control instruction in the decoded to-be-executed instruction to the graph calculation unit, where the general-purpose calculation instruction is used to instruct to execute a general-purpose calculation task, and the graph calculation control instruction is used to instruct to execute a graph calculation task. The at least one general-purpose operation unit is configured to execute the general-purpose calculation instruction. The graph flow unit is configured to execute the graph calculation control instruction.

Systems and methods to load a tile register pair

Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.

Apparatus and method for performing operations on capability metadata
11481384 · 2022-10-25 · ·

An apparatus is provided comprising storage elements to store data blocks, where each data block has capability metadata associated therewith identifying whether the data block specifies a capability, at least one capability type being a bounded pointer. Processing circuitry is then arranged to be responsive to a bulk capability metadata operation identifying a plurality of the storage elements, to perform an operation on the capability metadata associated with each data block stored in the plurality of storage elements. Via a single specified operation, this hence enables query and/or modification operations to be performed on multiple items of capability metadata, hence providing more efficient access to such capability metadata.

Processing Device Using Variable Stride Pattern

For certain applications, parts of the application data held in memory of a processing device (e.g. that are produced as a result of operations performed by the execution unit) are arranged in regular repeating patterns in the memory, and therefore, the execution unit may set up a suitable striding pattern for use by a send engine. The send engine accesses the memory at locations in accordance with the configured striding pattern so as to access a plurality of items of data that are arranged together in a regular pattern. In a similar manner as done for sends, the execution may set up a striding pattern for use by a receive engine. The receive engine, upon receiving a plurality of items of data, causes those items of data to be stored at locations in the memory, as determined in accordance with the configured striding pattern.

CIRCUIT ENABLING DEVICE, NON-TRANSITORY COMPUTER READABLE MEDIUM, AND USER-SPECIFIC CIRCUIT

A circuit enabling device includes a processor configured to, when multiple predetermined setting values are written in a register in predetermined order, enable a module corresponding to the multiple predetermined setting values and the predetermined order, the multiple predetermined setting values being determined in advance for each of a multiple modules, the register being used to enable the multiple modules individually, the multiple modules being included in a user-specific circuit, the user-specific circuit including a general-purpose circuit and a user circuit, the user circuit including the multiple modules.

Memory controller
11604605 · 2023-03-14 · ·

A memory controller circuit is disclosed which is coupleable to a first memory circuit, such as DRAM, and includes: a first memory control circuit to read from or write to the first memory circuit; a second memory circuit, such as SRAM; a second memory control circuit adapted to read from the second memory circuit in response to a read request when the requested data is stored in the second memory circuit, and otherwise to transfer the read request to the first memory control circuit; predetermined atomic operations circuitry; and programmable atomic operations circuitry adapted to perform at least one programmable atomic operation. The second memory control circuit also transfers a received programmable atomic operation request to the programmable atomic operations circuitry and sets a hazard bit for a cache line of the second memory circuit.