G06F9/30076

Portions of configuration state registers in-memory

Portions of configuration state registers in-memory. An instruction is obtained, and a determination is made that the instruction accesses a configuration state register. A portion of the configuration state register is in-memory and another portion of the configuration state register is in-processor. Processing associated with the configuration state register is performed. The performing processing is based on a type of access and whether the portion or the other portion is being accessed.

ACCESSING TOPOLOGICAL MAPPING OF CORES

A method, computer program product, and system include a processor(s) issues an instruction that includes processing core information that includes locations of processing cores of the computing system (logical cores and/or physical cores), and an operator selection. The processor(s) sets security parameters for information returned by the instruction which is topological information for mapping of the logical cores to the physical cores. The processor(s) obtains the topological information and utilizes an operating system to map the logical cores to the physical cores.

Security enhancement in hierarchical protection domains

Methods and systems for allowing software components that operate at a specific exception level (e.g., EL-3 to EL-1, etc.) to repeatedly or continuously observe or evaluate the integrity of software components operating at a lower exception level (e.g., EL-2 to EL-0) to ensure that the software components have not been corrupted or compromised (e.g., subjected to malware, cyberattacks, etc.) include a computing device that identifies, by a component operating at a higher exception level (“HEL component”), at least one of a current vector base address (VBA), an exception raising instruction (ERI) address, or a control and system register value associated with a component operating at a lower exception level (“LEL component”). The computing device may perform a responsive action in response to determining that the current VBA, the ERT address, or control and system register value do not match the corresponding reference data.

Configurable delay insertion in compiled instructions
11556342 · 2023-01-17 · ·

Techniques are disclosed for utilizing configurable delays in an instruction stream. A set of instructions to be executed on a set of engines are generated. The set of engines are distributed between a set of hardware elements. A set of configurable delays are inserted into the set of instructions. Each of the set of configurable delays includes an adjustable delay amount that delays an execution of the set of instructions on the set of engines. The adjustable delay amount is adjustable by a runtime application that facilitates the execution of the set of instructions on the set of engines. The runtime application is configured to determine a runtime condition associated with the execution of the set of instructions on the set of engines and to adjust the set of configurable delays based on the runtime condition.

INTERMODAL CALLING BRANCH INSTRUCTION
20230010863 · 2023-01-12 ·

Processing circuitry has a handler mode and a thread mode. In response to an exception condition, a switch to handler mode is made. In response to an intermodal calling branch instruction specifying a branch target address when the processing circuitry is in the handler mode, an instruction decoder controls the processing circuitry to save a function return address to a function return address storage location; switch a current mode of the processing circuitry to the thread mode; and branch to an instruction identified by the branch target address. This can be useful for deprivileging of exceptions.

SHARED UNIT INSTRUCTION EXECUTION

A data processing apparatus comprises receiver circuitry for receiving instructions from each of a plurality of requester devices. Processing circuitry executes the instructions associated with each of a subset of the requester devices at a time and arbitration circuitry determines the subset of the requester devices and causes the instructions associated with each of the subset of the requester devices to be executed next. In response to the receiver circuitry receiving an instruction of a predetermined type from one of the requester devices outside the subset of requester devices, the arbitration circuitry causes the instruction of the predetermined type to be executed next.

Handling an input/output store instruction

An input/output store instruction is handled. A data processing system includes a system nest coupled to at least one input/output bus by an input/output bus controller. The data processing system further includes at least a data processing unit including a core, system firmware and an asynchronous core-nest interface. The data processing unit is coupled to the system nest via an aggregation buffer. The system nest is configured to asynchronously load from and/or store data to at least one external device which is coupled to the at least one input/output bus. The data processing unit is configured to complete the input/output store instruction before an execution of the input/output store instruction in the system nest is completed. The asynchronous core-nest interface includes an input/output status array with multiple input/output status buffers. The system firmware includes a retry buffer and the core includes an analysis and retry logic.

SYSTEMS AND METHODS FOR PERFORMING 16-BIT FLOATING-POINT MATRIX DOT PRODUCT INSTRUCTIONS

Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.

Data Object Delivery for Distributed Cluster Computing
20230007081 · 2023-01-05 ·

Methods and systems for delivering data for cluster computing are described herein. A worker device may receive a dataset and store the dataset in a local storage media. This may prevent the need for the dataset to be sent over a network each time the applications are used to perform a task. Each application may be able to access the dataset in the local storage area. This may prevent the need to copy the dataset to memory associated with each application. A worker device may store a dataset, for example, if it determines that the frequency of updates to the dataset satisfy a threshold. The worker device may receive updates to the dataset via a messaging system and may store the updated data in the local storage media.

SYSTEMS AND METHODS TO LOAD A TILE REGISTER PAIR

Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.