G06F9/3013

Register pressure target function splitting

Provided are embodiments for a method of performing register pressure targeted function splitting. The method can include determining a candidate region of a function, the candidate region comprising variables, and determining a number of available registers in a computing system for allocating the variables of the function. The method can also include grouping the variables in the candidate region into first variables and second variables based at least in part on the number of available registers, and splitting the candidate region of the function into split functions based at least in part on the grouping of the variables. Also provided are embodiments for a computer program product and a system for performing register pressure targeted function splitting.
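
The grouping and splitting steps in this abstract can be illustrated with a minimal sketch. It is not the claimed method: the `Statement` class, the function names, and the "first N variables in the given order" grouping heuristic are all assumptions made for illustration, and a real compiler would rely on liveness analysis instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One statement of the candidate region (hypothetical representation)."""
    text: str
    uses: frozenset  # names of the variables the statement reads or writes


def group_variables(variables, available_registers):
    """Group variables into (first, second) based on the register budget.

    Assumption: the first `available_registers` variables, in the order given,
    form the first group; a real compiler would use liveness information.
    """
    if available_registers < 0:
        raise ValueError("available_registers must be non-negative")
    first = list(variables[:available_registers])
    second = list(variables[available_registers:])
    return first, second


def split_region(statements, first_variables):
    """Split the region into two statement lists based on the grouping."""
    first_set = set(first_variables)
    first_function, second_function = [], []
    for stmt in statements:
        # A statement touching only first-group variables goes to the first
        # split function; everything else goes to the second.
        if stmt.uses <= first_set:
            first_function.append(stmt)
        else:
            second_function.append(stmt)
    return first_function, second_function
```

With four variables and two available registers, the sketch places statements that use only the first two variables in one split function and the remaining statements in the other.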

Providing code sections for matrix of arithmetic logic units in a processor
11687346 · 2023-06-27

The present invention relates to a processor having a trace cache and a plurality of ALUs arranged in a matrix, comprising an analyser unit located between the trace cache and the ALUs, wherein the analyser unit analyses the code in the trace cache, detects loops, transforms the code, and issues to the ALUs sections of the code combined into blocks for joint execution over a plurality of clock cycles.

INSTRUCTION SET ARCHITECTURE FOR A VECTOR COMPUTATIONAL UNIT
20230195458 · 2023-06-22

A microprocessor system comprises a vector computational unit and a control unit. The vector computational unit includes a plurality of processing elements. The control unit is configured to provide at least a single processor instruction to the vector computational unit. The single processor instruction specifies a plurality of component instructions to be executed by the vector computational unit in response to the single processor instruction and each of the plurality of processing elements of the vector computational unit is configured to process different data elements in parallel with other processing elements in response to the single processor instruction.

CIRCUITRY AND METHODS FOR IMPLEMENTING CAPABILITIES USING NARROW REGISTERS
20230195461 · 2023-06-22

Systems, methods, and apparatuses for implementing capabilities using narrow registers are described. In certain examples, a hardware processor core comprises a capability management circuit to check a capability for a memory access request, the capability comprising an address field, a validity field, and a bounds field that is to indicate a lower bound and an upper bound of an address space to which the capability authorizes access; a decoder circuit to decode a single instruction into a decoded single instruction, the single instruction comprising fields to indicate a memory address that stores the capability and a single destination register, and an opcode to indicate that an execution circuit is to load a first proper subset of the capability from the memory address into the single destination register and load a second proper subset of the capability from the memory address into an implicit second destination register; and the execution circuit to execute the decoded single instruction according to the opcode.
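
A toy model may help picture how one load instruction can place two parts of a capability into two narrow registers. The abstract does not say which fields form each proper subset, so the split below (address in the named destination register, validity and bounds in the implicit second register) is an assumption, as are all names and the exclusive upper bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Capability with the fields named in the abstract."""
    address: int
    valid: bool
    lower: int  # lower bound of the authorized address space
    upper: int  # upper bound (treated as exclusive here, an assumption)


def load_capability_split(memory, mem_addr, regs, dest, implicit_dest):
    """Model of the single load: one capability, two narrow registers.

    Assumption: the address goes to `dest`, and the validity and bounds
    fields go to the implicit second destination register.
    """
    cap = memory[mem_addr]
    regs[dest] = cap.address
    regs[implicit_dest] = (cap.valid, cap.lower, cap.upper)


def check_access(regs, dest, implicit_dest):
    """Capability check for a memory access through the two registers."""
    address = regs[dest]
    valid, lower, upper = regs[implicit_dest]
    return valid and lower <= address < upper
```

Here `memory` is a plain dictionary from addresses to `Capability` objects and `regs` is a dictionary standing in for the register file.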

Generation and use of memory access instruction order encodings

Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instructions in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block, and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.
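
The interaction of the store vector and the store mask can be sketched with integer bitmasks. This is an illustrative model only: it assumes each memory instruction has an ID giving its position in the block's memory ordering, that bit i of the store vector is set once instruction i has executed, and that a set bit in the store mask marks an ID that later instructions must wait for. The function names are hypothetical.

```python
def can_issue(instr_id, store_vector, store_mask):
    """True if every masked-in instruction ordered before instr_id has executed."""
    earlier = (1 << instr_id) - 1        # bits for IDs 0 .. instr_id - 1
    required = store_mask & earlier      # earlier IDs the mask says to wait for
    # Mask the store vector and compare it with the required set.
    return (store_vector & required) == required


def mark_executed(store_vector, instr_id):
    """Return the store vector with the bit for instr_id set."""
    return store_vector | (1 << instr_id)


def next_to_issue(pending_ids, store_vector, store_mask):
    """Pick the lowest-numbered pending instruction whose dependencies are met."""
    for instr_id in sorted(pending_ids):
        if can_issue(instr_id, store_vector, store_mask):
            return instr_id
    return None
```

With a store mask of `0b0101`, instruction 3 cannot issue until instructions 0 and 2 have both set their bits in the store vector, while instruction 1 only waits for instruction 0.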

INTEGRATED CIRCUIT COMPRISING A HARDWARE CALCULATOR AND CORRESPONDING CALCULATION METHOD

In an embodiment an integrated circuit includes a hardware calculator configured to calculate in parallel a first output component $Y_{n-1}$ of a first rank $n-1$ and a second output component $Y_n$ of a second rank $n$ which is higher than and consecutive to the first rank, according to the formula: $Y_m = \sum_{k=0}^{N-1} b_k x_{m-k}$, in a series of operations, wherein the hardware calculator includes a first calculation path dedicated to the first output component $Y_{n-1}$, a second calculation path dedicated to the second output component $Y_n$, wherein, for each operation, a first register is configured to contain a pair of first factors $\{x_i, x_{i-1}\}$ corresponding to terms $\{b_k x_{m-k}\}_{[k;k+1]}^{m=n-1}$ of an operation in the first path, a second register is configured to contain a pair of second factors $\{b_j, b_{j+1}\}$ corresponding to terms $\{b_k x_{m-k}\}_{[k;k+1]}^{m=n-1}$ of the operation in the first path, and a third register is configured to contain a pair of second factors $\{b_{j+2}, b_{j+3}\}$ corresponding to terms $\{b_k x_{m-k}\}_{[k+2;k+3]}^{m=n-1}$ of the next operation in the first path.
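
The formula in this abstract is the convolution sum of a finite impulse response filter: output m is the sum over k from 0 to N-1 of b[k] times x[m-k]. The sketch below computes two consecutive outputs side by side, consuming one pair of coefficients per loop iteration, to show how the two paths share input samples. It models only the arithmetic, not the claimed register arrangement; the function names and the requirement that N be even are assumptions.

```python
def fir_direct(b, x, m):
    """Reference: Y_m = sum over k of b[k] * x[m - k]."""
    return sum(b[k] * x[m - k] for k in range(len(b)))


def fir_pair(b, x, n):
    """Compute (Y_{n-1}, Y_n) together, two coefficients per step.

    Assumptions: len(b) is even, and N <= n < len(x) so that every
    sample index used below is valid.
    """
    N = len(b)
    if N % 2 != 0:
        raise ValueError("this sketch assumes an even number of coefficients")
    if n < N or n >= len(x):
        raise ValueError("n must satisfy len(b) <= n < len(x)")
    y_prev = 0  # first path, accumulates Y_{n-1}
    y_curr = 0  # second path, accumulates Y_n
    for k in range(0, N, 2):
        b0, b1 = b[k], b[k + 1]                        # coefficient pair
        y_prev += b0 * x[n - 1 - k] + b1 * x[n - 2 - k]
        y_curr += b0 * x[n - k] + b1 * x[n - 1 - k]    # reuses x[n - 1 - k]
    return y_prev, y_curr
```

Each iteration reads the sample x[n-1-k] once and uses it in both paths, which is the kind of operand sharing that makes computing two consecutive outputs together attractive.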

Apparatus and methods for vector operations

Aspects for vector operations in a neural network are described herein. The aspects may include a vector caching unit configured to store a first vector and a second vector, wherein the first vector includes one or more first elements and the second vector includes one or more second elements. The aspects may further include one or more adders and a combiner. The one or more adders may be configured to respectively add each of the first elements to a corresponding one of the second elements to generate one or more addition results. The combiner may be configured to combine the one or more addition results into an output vector.
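
In software terms, the adders and the combiner described here amount to an element-wise addition followed by assembling the results. The sketch below is a functional model only, with hypothetical function names; it says nothing about how the hardware units are built.

```python
def adder(first_element, second_element):
    """One adder: adds a first element to its corresponding second element."""
    return first_element + second_element


def combiner(addition_results):
    """Combines the individual addition results into one output vector."""
    return list(addition_results)


def vector_add(first_vector, second_vector):
    """Element-wise addition of two equal-length vectors."""
    if len(first_vector) != len(second_vector):
        raise ValueError("vectors must have the same number of elements")
    results = (adder(a, b) for a, b in zip(first_vector, second_vector))
    return combiner(results)
```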

METHOD OF IMPLEMENTING AN ARM64-BIT FLOATING POINT EMULATOR ON A LINUX SYSTEM
20230176869 · 2023-06-08

The present invention provides a method of implementing an ARM64-bit floating point emulator on a Linux system, which includes: running an ARM64-bit instruction on the Linux system; applying an instruction classifier to a first feature code of a machine code indicated by the ARM64-bit instruction to determine whether the ARM64-bit instruction is an ARM64-bit floating point instruction; and, if the ARM64-bit instruction is an ARM64-bit floating point instruction, applying the instruction classifier to a second feature code of the machine code indicated by the ARM64-bit instruction to determine which specific ARM64-bit floating point instruction it is.

Method for identifying at least one function of an operating system kernel
11263065 · 2022-03-01

A method for identifying a function of an operating system kernel of a virtual machine. The method includes: identifying an initial instruction included in the code of the operating system kernel of the virtual machine; locating at least one following block of instructions belonging to a function of the operating system kernel of the virtual machine, the following block being situated in a memory area following the initial instruction; locating at least one preceding block of instructions belonging to a function of the operating system kernel, the preceding block being situated in a memory area preceding the initial instruction; identifying a first block and a last block of instructions of the function of the operating system kernel from among the at least one following and preceding blocks; and recording start and end function addresses in association with the function of the operating system kernel.

CONDITIONAL EXECUTION SPECIFICATION OF INSTRUCTIONS USING CONDITIONAL EXTENSION SLOTS IN THE SAME EXECUTE PACKET IN A VLIW PROCESSOR

In one embodiment, a system includes a memory and a processor core. The processor core includes functional units and an instruction decode unit configured to determine whether an execute packet of instructions received by the processor core includes a first instruction that is designated for execution by a first functional unit of the functional units and a second instruction that is a condition code extension instruction that includes a plurality of sets of condition code bits, wherein each set of condition code bits corresponds to a different one of the functional units, and wherein the sets of condition code bits include a first set of condition code bits that corresponds to the first functional unit. When the execute packet includes the first and second instructions, the first functional unit is configured to execute the first instruction conditionally based upon the first set of condition code bits in the second instruction.
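
A small simulation can show how a condition code extension instruction in the same execute packet gates the other instructions. The abstract does not define the bit layout, so the encoding used here (a predicate register name plus an invert flag per functional unit) and all class, function, and unit names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical encoding of one set of condition code bits:
# (predicate register name or None for unconditional, invert flag).
CondBits = Tuple[Optional[str], bool]


@dataclass
class Instruction:
    unit: str                       # functional unit designated to execute it
    action: Callable[[dict], None]  # effect on the machine state


@dataclass
class ConditionExtension:
    cond_bits: Dict[str, CondBits]  # one set of bits per functional unit


def condition_holds(bits, predicates):
    """Evaluate one set of condition code bits against the predicate registers."""
    if bits is None or bits[0] is None:
        return True  # no condition specified for this unit
    name, invert = bits
    return (predicates[name] != 0) != invert


def execute_packet(packet, state):
    """Execute a packet; return the units whose instructions actually ran."""
    extension = next(
        (item for item in packet if isinstance(item, ConditionExtension)), None
    )
    executed = []
    for item in packet:
        if not isinstance(item, Instruction):
            continue
        bits = extension.cond_bits.get(item.unit) if extension else None
        if condition_holds(bits, state["predicates"]):
            item.action(state)
            executed.append(item.unit)
    return executed
```

In this model, a packet without an extension instruction executes every instruction unconditionally, and a packet with one consults the set of bits matching each instruction's functional unit.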