G06F9/30105

SAVING AND RESTORING REGISTERS
20230069266 · 2023-03-02 ·

There is provided a data processing apparatus comprising a plurality of registers, each of the registers having data bits to store data and metadata bits to store metadata. Each of the registers is adapted to operate in a metadata mode in which the metadata bits and the data bits are valid, and a data mode in which the data bits are valid and the metadata bits are invalid. Mode bit storage circuitry indicates whether each of the registers is in the data mode or the metadata mode. Execution circuitry is responsive to a memory operation that is a store operation on one or more given registers.

APPARATUS AND METHOD FOR CAPABILITY-BASED PROCESSING

Apparatus comprises a processor to execute program instructions stored at respective memory addresses, processing of the program instructions being constrained by a prevailing capability defining at least access permissions to a set of one or more memory addresses; the processor comprising: control flow change handling circuitry to perform a control flow change operation, the control flow change operation defining a control flow change target address indicating the address of a program instruction for execution after the control flow change operation; and capability generating circuitry to determine, in dependence on the control flow change target address, an address at which capability access permissions data is stored; the capability generating circuitry being configured to retrieve the capability access permissions data and to generate a capability for use as a next prevailing capability in dependence upon at least the capability access permissions data.

Circular buffer accessing device, system and method

A device includes a circular buffer, which, in operation, is organized into a plurality of subsets of buffers, and control circuitry coupled to the circular buffer. The control circuitry, in operation, receives a memory load command to load a set of data into the circular buffer. The memory load command has an offset parameter indicating a data offset and a subset parameter indicating a subset of the plurality of subsets into which the circular buffer is organized. The control circuitry responds to the command by identifying a set of buffer addresses of the circular buffer based on a value of the offset parameter and a value of the subset parameter, and loading the set of data into the circular buffer using the identified set of buffer addresses.

ROTATING ACCUMULATOR
20230116419 · 2023-04-13 · ·

A processing unit for generating an output vector is provided. The processing unit comprises an output vector register and a vector unit and is configured to execute machine code instructions, each instruction being an instance of a predefined set of instruction types in an instruction set of the processing unit. The instruction set includes a vector processing instruction defined by a corresponding opcode, which causes the processing unit to: i) process, using the vector unit, at least two input vectors to generate a result value; ii) perform a rotation operation on the plurality of elements of the output register in which the result value or a value based on the result value is placed in the first end element of the output register.

Vector bit transpose

A method to transpose source data in a processor in response to a vector bit transpose instruction includes specifying, in respective fields of the vector bit transpose instruction, a source register containing the source data and a destination register to store transposed data. The method also includes executing the vector bit transpose instruction by interpreting N×N bits of the source data as a two-dimensional array having N rows and N columns, creating transposed source data by transposing the bits by reversing a row index and a column index for each bit, and storing the transposed source data in the destination register.

VIRTUAL MODE EXECUTION MANAGER

Disclosed herein is hardware for easing the process of changing the execution mode of a virtual machine and its associated resources. By adopting the hardware, it is possible to trigger a change in the execution mode in an automatic way, without software intervention, and without interfering with the execution of other virtual machines. In addition, in case an error has occurred for a virtual machine and it is detected, the hardware can be used to disable the resources associated with that virtual machine and generate notification of the completion this operation to other hardware, which will complete the reset of the virtual machine. By adopting the hardware, the execution mode change is simplified and offers configurability and flexibility for a system running multiple virtual machines.

Mechanism for reducing coherence directory controller overhead for near-memory compute elements

A parallel processing (PP) level coherence directory, also referred to as a Processing In-Memory Probe Filter (PimPF), is added to a coherence directory controller. When the coherence directory controller receives a broadcast PIM command from a host, or a PIM command that is directed to multiple memory banks in parallel, the PimPF accelerates processing of the PIM command by maintaining a directory for cache coherence that is separate from existing system level directories in the coherence directory controller. The PimPF maintains a directory according to address signatures that define the memory addresses affected by a broadcast PIM command. Two implementations are described: a lightweight implementation that accelerates PIM loads into registers, and a heavyweight implementation that accelerates both PIM loads into registers and PIM stores into memory.

Method of using multidimensional blockification to optimize computer program and device thereof

Disclosed embodiments relate to a method and device for optimizing compilation of source code. The proposed method receives a first intermediate representation code of a source code and analyses each basic block instruction of the plurality of basic block instructions contained in the first intermediate representation code for blockification. In order to blockify the identical instructions, the one or more groups of basic block instructions are assessed for eligibility of blockification. Upon determining as eligible, the group of basic block instructions are blockified using one of one dimensional SIMD vectorization and two-dimensional SIMD vectorization. The method further generates a second intermediate representation of the source code which is translated to executable target code with more efficient processing capacity.

Apparatuses, methods, and systems for instructions to multiply floating-point values of about one

Systems, methods, and apparatuses relating to instructions to multiply floating-point values of about one are described. In one embodiment, a hardware processor includes a decoder to decode a single instruction into a decoded single instruction, the single instruction having a first field that identifies a first floating-point number, a second field that identifies a second floating-point number, and a third field that indicates an about one threshold; and an execution circuit to execute the decoded single instruction to: cause a first comparison of an exponent of the first floating-point number to the about one threshold, cause a second comparison of an exponent of the second floating-point number to the about one threshold, provide as a resultant of the single instruction a value of the first floating-point number one when both the first comparison indicates the exponent of the first floating-point number does not exceed the about one threshold and the second comparison indicates the exponent of the second floating-point number does not exceed the about one threshold, provide as the resultant of the single instruction the second floating-point number when the first comparison indicates the exponent of the first floating-point number does not exceed the about one threshold, and provide as the resultant of the single instruction a product of a multiplication of the first floating-point number and the second floating-point number when the first comparison indicates the exponent of the first floating-point number exceeds the about one threshold or and the second comparison indicates the exponent of the second floating-point number exceeds the about one threshold.

COALESCING ADJACENT GATHER/SCATTER OPERATIONS

According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.