G06F9/3818

Object-Oriented Memory Client
20230273823 · 2023-08-31

A hardware client and corresponding method employ an object-oriented memory device. The hardware client generates an object-oriented message associated with an object of an object class. The object class includes at least one data member and at least one method. The hardware client transmits the object-oriented message generated to the object-oriented memory device via a hardware communications interface. The hardware communications interface couples the hardware client to the object-oriented memory device. The object is instantiated or to-be instantiated in at least one physical memory of the object-oriented memory device according to the object class. The at least one method enables the object-oriented memory device to access the at least one data member for the hardware client.

Apparatuses, methods, and systems for enhanced matrix multiplier architecture
11334647 · 2022-05-17

Systems, methods, and apparatuses relating to enhanced matrix multiplier architecture are described. In one embodiment, an apparatus includes a matrix operations accelerator circuit having a two-dimensional grid of multiplier circuits; a first plurality of registers that represents a first two-dimensional matrix coupled to the matrix operations accelerator circuit; a second plurality of registers that represents a second two-dimensional matrix coupled to the matrix operations accelerator circuit; a decoder, of a core coupled to the matrix operations accelerator circuit, to decode a single instruction into a decoded single instruction; and an execution circuit of the core to execute the decoded single instruction to store each element of the first two-dimensional matrix from the first plurality of registers into a respective clocked flip-flop circuit of each multiplier circuit of the two-dimensional grid of multiplier circuits, store a first element of a first proper subset of elements of the second two-dimensional matrix from the second plurality of registers into a single first clocked flip-flop circuit coupled to a first proper subset of multiplier circuits of the two-dimensional grid of multiplier circuits, store a second element of the first proper subset of elements of the second two-dimensional matrix from the second plurality of registers into a single second clocked flip-flop circuit coupled to a second proper subset of multiplier circuits of the two-dimensional grid of multiplier circuits, multiply the first element of the first proper subset of elements from the single first clocked flip-flop circuit by a respective element from the clocked flip-flop circuit of each multiplier circuit of the first proper subset of multiplier circuits to generate a first plurality of resultants, and multiply the second element of the first proper subset of elements from the single second clocked flip-flop circuit by a respective element from the clocked flip-flop circuit of each multiplier circuit of the second proper subset of multiplier circuits to generate a second plurality of resultants.
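
The broadcast-and-multiply step in this abstract can be modelled in software. The following is a minimal sketch, not the patented circuit: it assumes each proper subset of multiplier circuits is one row of the grid, and the names `broadcast_multiply`, `column_sums`, `stationary`, and `broadcast_elems` are hypothetical. The column-wise sum is also an assumption about how the resultants might be combined; the abstract does not describe a reduction.

```python
def broadcast_multiply(stationary, broadcast_elems):
    """Multiply one broadcast element per grid row by that row's held elements.

    stationary      -- 2D list; stationary[r][c] models the value latched in the
                       flip-flop of the multiplier at grid position (r, c).
    broadcast_elems -- one value per grid row, modelling the single flip-flop
                       shared by every multiplier in that row (assumed layout).
    """
    if len(stationary) != len(broadcast_elems):
        raise ValueError("need exactly one broadcast element per grid row")
    return [[b * a for a in row] for row, b in zip(stationary, broadcast_elems)]


def column_sums(resultants):
    """Sum the resultants down each column (an assumed reduction step)."""
    return [sum(col) for col in zip(*resultants)]


first_matrix = [[1, 2], [3, 4]]   # held in the multiplier grid
second_matrix_row = [10, 100]     # one element broadcast to each grid row
resultants = broadcast_multiply(first_matrix, second_matrix_row)
row_of_product = column_sums(resultants)
```

With these inputs, `resultants` is `[[10, 20], [300, 400]]` and `row_of_product` is `[310, 420]`, which equals the row vector `[10, 100]` multiplied by `first_matrix`.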

Method and apparatus for dual issue multiply instructions

A method is provided that includes performing, by a processor in response to a dual issue multiply instruction, multiplication of operands of the dual issue multiply instruction using multiplication units comprised in a data path of the processor and configured to operate together to determine a product of the operands, and storing, by the processor, the product in a storage location indicated by the dual issue multiply instruction.

LOOK-UP TABLE INITIALIZE

A digital data processor includes an instruction memory storing instructions specifying a data processing operation and a data operand field, an instruction decoder coupled to the instruction memory for recalling instructions from the instruction memory and determining the operation and the data operand, and an operational unit coupled to a data register file and to the instruction decoder to perform a data processing operation upon an operand corresponding to an instruction decoded by the instruction decoder and storing results of the data processing operation. The operational unit is configured to perform a table write in response to a look-up table initialization instruction by duplicating at least one data element from a source data register to create duplicated data elements, and writing the duplicated data elements to a specified location in a specified number of at least one table and a corresponding location in at least one other table.
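
The table write described here can be illustrated with a small software model. This is a sketch under stated assumptions, not the instruction's actual semantics: tables are modelled as Python lists, the "corresponding location" is taken to be the same offset in each table, and `lut_init` with its parameters `lane`, `offset`, and `num_tables` is a hypothetical name.

```python
def lut_init(tables, source_register, lane, offset, num_tables):
    """Copy one source element to the same offset in the first num_tables tables.

    tables          -- list of equally sized lists, each modelling one table.
    source_register -- list modelling the source data register.
    lane            -- index of the element to duplicate (hypothetical field).
    offset          -- location written in every selected table (assumed to be
                       the same offset in each table).
    num_tables      -- how many tables receive the duplicated element.
    """
    if not 0 < num_tables <= len(tables):
        raise ValueError("num_tables must be between 1 and the table count")
    # Duplicate the selected element once per destination table.
    duplicated = [source_register[lane]] * num_tables
    # zip stops after num_tables entries, so later tables stay untouched.
    for table, value in zip(tables, duplicated):
        table[offset] = value
    return tables


tables = [[0] * 4 for _ in range(4)]
lut_init(tables, source_register=[7, 8, 9], lane=1, offset=2, num_tables=2)
```

After the call, the first two tables hold `8` at offset 2 and the remaining two tables are unchanged.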

DUAL BRANCH FORMAT
20220129278 · 2022-04-28

In one embodiment, a branch processing method, comprising: assigning plural branch instructions for a given clock cycle to primary branch information and secondary branch information; routing the primary branch information along a first path having adder logic and the secondary branch information along a second path having no adder logic; and writing the primary branch information including a displacement branch target address to a branch order table (BOT) and the secondary branch information without a target address to the BOT.

TRANSIENT SIDE-CHANNEL AWARE ARCHITECTURE FOR CRYPTOGRAPHIC COMPUTING

In one embodiment, a processor includes circuitry to decode an instruction referencing an encoded data pointer that includes a set of plaintext linear address bits and a set of encrypted linear address bits. The processor also includes circuitry to perform a speculative lookup in a translation lookaside buffer (TLB) using the plaintext linear address bits to obtain a physical address, buffer a set of architectural predictor state values based on the speculative TLB lookup, and speculatively execute the instruction using the physical address obtained from the speculative TLB lookup. The processor also includes circuitry to determine whether the speculative TLB lookup was correct and update a set of architectural predictor state values of the core using the buffered architectural predictor state values based on a determination that the speculative TLB lookup was correct.

PROCESSOR USING TARGET INSTRUCTIONS
20230305992 · 2023-09-28

Various example embodiments for supporting processor capabilities are presented herein. Various example embodiments for supporting processor capabilities may be configured to provide a processor configured to support execution of a program that is based on an instruction set architecture of the processor, where the program includes a target instruction configured to mark a beginning of an execution sequence of the program, wherein the target instruction is a target of a branch instruction of the program.

Processing unit with mixed precision operations

A graphics processing unit (GPU) implements operations, with associated op codes, to perform mixed precision mathematical operations. The GPU includes an arithmetic logic unit (ALU) with different execution paths, wherein each execution path executes a different mixed precision operation. By implementing mixed precision operations at the ALU in response to designated op codes that delineate the operations, the GPU efficiently increases the precision of specified mathematical operations while reducing execution overhead.
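
One way to picture op codes selecting mixed precision execution paths is a dispatch table. The sketch below is illustrative only: the op code names `FMA_F16_F32` and `DOT2_F16_F32`, and the choice of half-precision inputs with a single-precision accumulator, are assumptions and do not come from the abstract. Rounding is emulated with the standard `struct` module and is not guaranteed to match any GPU bit for bit.

```python
import struct


def to_f16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fma_f16_f32(a, b, acc):
    """Multiply two half-precision inputs, add a single-precision accumulator."""
    return to_f32(to_f16(a) * to_f16(b) + to_f32(acc))


def dot2_f16_f32(a, b, acc):
    """Two-element half-precision dot product accumulated in single precision."""
    total = to_f32(acc)
    for x, y in zip(a, b):
        total = to_f32(total + to_f16(x) * to_f16(y))
    return total


# Hypothetical op codes, each mapped to its own execution path.
EXECUTION_PATHS = {
    "FMA_F16_F32": fma_f16_f32,
    "DOT2_F16_F32": dot2_f16_f32,
}


def execute(opcode, *operands):
    """Route the operands to the execution path named by the op code."""
    return EXECUTION_PATHS[opcode](*operands)


fma_result = execute("FMA_F16_F32", 1.5, 2.0, 0.5)
dot_result = execute("DOT2_F16_F32", (1.5, 2.0), (2.0, 0.25), 0.5)
```

The inputs are chosen to be exactly representable in half precision, so `fma_result` is `3.5` and `dot_result` is `4.0`. A value such as `0.1` would instead be rounded on entry by `to_f16`.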

Look-up table initialize

A digital data processor includes an instruction memory storing instructions specifying a data processing operation and a data operand field, an instruction decoder coupled to the instruction memory for recalling instructions from the instruction memory and determining the operation and the data operand, and an operational unit coupled to a data register file and to the instruction decoder to perform a data processing operation upon an operand corresponding to an instruction decoded by the instruction decoder and storing results of the data processing operation. The operational unit is configured to perform a table write in response to a look-up table initialization instruction by duplicating at least one data element from a source data register to create duplicated data elements, and writing the duplicated data elements to a specified location in a specified number of at least one table and a corresponding location in at least one other table.

METHOD AND APPARATUS FOR VECTOR SORTING
20210357218 · 2021-11-18

A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values are sorted in an order indicated by the vector sort instruction, and storing the sorted vector in a storage location.
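
The behaviour of such an instruction can be sketched in a few lines. This is a software model under assumptions, not the patented data path: the register file is a dictionary, the sort order is carried by a hypothetical `descending` flag, and `execute_vector_sort` is an illustrative name.

```python
def execute_vector_sort(registers, dst, src, descending=False):
    """Sort the lanes of registers[src] and store the result in registers[dst].

    registers  -- dict mapping register names to lists of lane values.
    dst, src   -- destination and source register names (hypothetical operands).
    descending -- models the order indicated by the instruction.
    """
    # sorted() returns a new list, so the source register is left unchanged.
    registers[dst] = sorted(registers[src], reverse=descending)
    return registers[dst]


registers = {"v0": [5, -2, 9, 0]}
execute_vector_sort(registers, dst="v1", src="v0")
execute_vector_sort(registers, dst="v2", src="v0", descending=True)
```

After both calls, `v1` holds `[-2, 0, 5, 9]`, `v2` holds `[9, 5, 0, -2]`, and `v0` is unchanged.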