G06F9/30123

MULTIBIT SHIFT INSTRUCTION

An article of manufacture includes a non-transitory machine-readable medium. The medium includes instructions that cause a processor to execute a shift instruction. The shift instruction is to cause a source data in memory to be shifted left or shifted right. The shift instruction is to include a source parameter and a bit size parameter. The processor is to execute the shift instruction through a shift of a first source word of the source data by the bit size parameter to yield a first intermediate word, a shift of a second source word of the source data by the bit size parameter to yield a second intermediate word and a first set of shifted-out bits, and through execution of a logical OR operation on the first intermediate word and the first set of shifted-out bits to yield a first result word.

Thread transition management

A system and process for managing thread execution includes providing two data register sets coupled to a processor and using, by the processor, the two register sets as first-level registers for thread execution. A portion of main memory or cache memory is assigned as second-level registers where the second-level registers serve as registers of at least one of the two data register sets for executing the threads. Data for the threads may be moved between the first-level registers and second-level registers for different modes of thread processing.

INSTRUCTION HANDLING FOR ACCUMULATION OF REGISTER RESULTS IN A MICROPROCESSOR

A computer system, processor, and method for processing information is disclosed that includes at least one computer processor; a main register file associated with the at least one processor, the main register file having a plurality of entries for storing data, one or more write ports to write data to the main register file entries, and one or more read ports to read data from the main register file entries; one or more execution units including a dense math execution unit; and at least one accumulator register file having a plurality of entries for storing data. The results of the dense math execution unit in an aspect are written to the accumulator register file, preferably to the same accumulator register file entry multiple times, and the data from the accumulator register file is written to the main register file.

Parallel load of mapping containers for database system start and restart operations

Aspects of the current subject matter are directed to an approach in which a parallel load operation of file ID mapping containers is accomplished at start and/or restart of a database system. Parallel load operation of file ID mapping and/or large binary object (LOB) file ID mapping is done among a plurality of scanning engines into a plurality of data buffers that are associated with each of the plurality of scanning engines. Each scanning engine operates on a certain path of a page chain of a page structure including the mapping, causing the page chain to be split among scanning engines to process maps. Contents of the data buffers are pushed to mapping engines via a queue. The mapping engines load the file ID mapping and the LOB file ID mapping into maps for in-system access.

HIERARCHICAL REGISTER FILE AT A GRAPHICS PROCESSING UNIT
20170278213 · 2017-09-28 ·

A processor employs a hierarchical register file for a graphics processing unit (GPU). A top level of the hierarchical register file is stored at a local memory of the GPU (e.g., a memory on the same integrated circuit die as the GPU). Lower levels of the hierarchical register file are stored at a different, larger memory, such as a remote memory located on a different die than the GPU. A register file control module monitors the status of in-flight wavefronts at the GPU, and in particular whether each in-flight wavefront is active, predicted to be become active, or inactive. The register file control module places execution data for active and predicted-active wavefronts in the top level of the hierarchical register file and places execution data for inactive wavefronts at lower levels of the hierarchical register file.

Register caching techniques for thread switches
09817664 · 2017-11-14 · ·

Techniques are disclosed relating to register caching techniques for thread switches. In one embodiment, an apparatus includes a register file and caching circuitry. In this embodiment, the register file includes a plurality of registers and the caching circuitry is configured to store information that indicates threads that correspond to data stored in respective ones of the plurality of registers. In this embodiment, the apparatus is configured to store, at a point in time at which a first register of the plurality of registers includes first valid data corresponding to a first thread, second valid data corresponding to a second thread in a second register of the plurality of registers. In some embodiments, the disclosed techniques may reduce context switch latency, reduce pressure on a data cache, and/or allow smaller slices of thread execution, for example.

Processor element, programmable device, and processor element control method
11249753 · 2022-02-15 · ·

In order to provide processor elements and programmable devices, which have little restriction on logical block arrangement and wiring, and which can improve a degree of integration, processor element 1 includes: arithmetic units 11 that perform arithmetic processing on the basis of functions which have implemented instruction sets generated in accordance with programs; registers 12 that store arguments of the functions; bypass switches 13 of the arithmetic units 11; bypass switches 14 of the registers 12; a connection setting unit 16 that switches the connections of function units 20; a multiplexer 17 that switches the input to the connection setting unit 16; a demultiplexer 18 that switches the output destination of the output from the connection setting unit 16; and a selection unit 15 that switches, in accordance with a state, the bypass switches 13, 14, the connection setting unit 16, the multiplexer 17 and the demultiplexer 18.

Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
09766893 · 2017-09-19 · ·

A method for executing instructions using a plurality of virtual cores for a processor. The method includes receiving an incoming instruction sequence using a global front end scheduler, and partitioning the incoming instruction sequence into a plurality of code blocks of instructions. The method further includes generating a plurality of inheritance vectors describing interdependencies between instructions of the code blocks, and allocating the code blocks to a plurality of virtual cores of the processor, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines. The code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors.

SUPER-THREAD PROCESSOR
20220229662 · 2022-07-21 ·

The disclosed inventions include a processor apparatus and method that enable a general purpose processor to achieve twice the operating frequency of typical processor implementations with a modest increase in area and a modest increase in energy per operation. The invention relies upon exploiting multiple independent streams of execution. Low area and low energy memory arrays used for register files operate a modest frequency. Instructions can be issued at a rate higher than this frequency by including logic that guarantees the spacing between instructions from the same thread are spaced wider than the time to access the register file. The result of the invention is the ability to overlap long latency structures, which allows using lower energy structures, thereby reducing energy per operation.

NESTED VIRTUAL MACHINE SUPPORT FOR HYPERVISORS OF ENCRYPTED STATE VIRTUAL MACHINES
20210406051 · 2021-12-30 ·

A method includes creating, by a hypervisor executing on a processing device, a first virtual machine nested within a second virtual machine. The method further includes identifying a context of the second virtual machine and providing, to a context of the first virtual machine, a parent context pointer indicating the context of the second virtual machine.