G06F9/30138

METHOD OF OPTIMIZING SCALAR REGISTER ALLOCATION AND A SYSTEM THEREOF

The present disclosure relates to a system and a method of optimizing scalar register allocation by a processor. The method comprises receiving an intermediate code and information about one or more available physical registers in a memory of the processor, as input. The method further comprises allocating one or more virtual registers based on the received information, wherein each virtual register is having size of each available physical register. The method also comprises mapping one or more groups of 8-bit location of the one or more virtual registers to one or more register classes. The method further comprises identifying a plurality of scalar variables from the input intermediate code, and dynamically assigning the one or more available physical registers to the identified scalar variables using the one or more register classes.

Dynamic update of the number of architected registers assigned to software threads using spill counts

A computer system includes a processor, main memory, and controller. The processor includes a plurality of hardware threads configured to execute a plurality of software threads. The main memory includes a first register table configured to contain a current set of architected registers for the currently running software threads. The controller is configured to change a first number of the architected registers assigned to a given one of the software threads to a second number of architected registers when a result of monitoring current usage of the registers by the software threads indicates that the change will improve performance of the computer system. The processor includes a second register table configured to contain a subset of the architected registers and a mapping table for each software thread indicating whether the architected registers referenced by the corresponding software thread are located in the first register table or the second register table.

WRITE OUT STAGE GENERATED BOUNDING VOLUMES
20220076479 · 2022-03-10 · ·

Systems, apparatuses and methods may provide for technology that optimizes tiled rendering for workloads in a graphics pipeline including tessellation and use of a geometry shader. More particularly, systems, apparatuses and methods may provide a way to generate, by a write out fixed-function stage, one or more bounding volumes based on geometry data, as inputs to one or more stages of the graphics pipeline. The systems, apparatuses and methods may compute multiple bounding volumes in parallel, and improve the gamer experience, and enable photorealistic renderings at full speed, (e.g., such as human skin and facial expressions) that render three-dimensional (3D) action more realistically.

RUNTIME CONFIGURABLE REGISTER FILES FOR ARTIFICIAL INTELLIGENCE WORKLOADS

There is disclosed a system and method of performing an artificial intelligence (AI) inference, including: programming an AI accelerator circuit to solve an AI problem with a plurality of layer-specific register file (RF) size allocations, wherein the AI accelerator circuit comprises processing elements (PEs) with respective associated RFs, wherein the RFs individually are divided into K sub-banks of size B bytes, wherein B and K are integers, and wherein the RFs include circuitry to individually allocate a sub-bank to one of input feature (IF), output feature (OF), or filter weight (FL), and wherein programming the plurality of layer-specific RF size allocations comprises accounting for sparse data within the layer; and causing the AI accelerator circuit to execute the AI problem, including applying the layer-specific RF size allocations at run-time.

Write power optimization for hardware employing pipe-based duplicate register files

Methods and systems for controlling writing to register files in a processing system having at least two execution pipelines are provided. Aspects include obtaining a micro operation for execution by an execution unit of a first pipeline in the processing system, wherein the micro operation includes writing data to a register file. Aspects also include determining whether the data will be accessed by an execution unit of a second pipeline in the processing system. Based on a determination that the data will only be accessed by the execution unit of the first pipeline, aspects include blocking writing of the data to a register file of the second pipeline.

Virtualized multicore systems with extended instruction heterogeneity
11106623 · 2021-08-31 · ·

A system on a chip may include a plurality of data plane processor cores sharing a common instruction set architecture. At least one of the data plane processor cores is specialized to perform a particular function via extensions to the otherwise common instruction set architecture. Such systems on a chip may have reduced physical complexity, cost, and time-to-market, and may provide improvements in core utilization and reductions in system power consumption.

Block-based processor core composition register

Systems, apparatuses, and methods related to a block-based processor core composition register are disclosed. In one example of the disclosed technology, a processor can include a plurality of block-based processor cores for executing a program including a plurality of instruction blocks. A respective block-based processor core can include one or more sharable resources and a programmable composition control register. The programmable composition control register can be used to configure which resources of the one or more sharable resources are shared with other processor cores of the plurality of processor cores.

Write out stage generated bounding volumes
11094102 · 2021-08-17 · ·

Systems, apparatuses and methods may provide for technology that optimizes tiled rendering for workloads in a graphics pipeline including tessellation and use of a geometry shader. More particularly, systems, apparatuses and methods may provide a way to generate, by a write out fixed-function stage, one or more bounding volumes based on geometry data, as inputs to one or more stages of the graphics pipeline. The systems, apparatuses and methods may compute multiple bounding volumes in parallel, and improve the gamer experience, and enable photorealistic renderings at full speed, (e.g., such as human skin and facial expressions) that render three-dimensional (3D) action more realistically.

Virtualization of process address space identifiers for scalable virtualization of input/output devices

A processing device comprises an address translation circuit to intercept a work request from an I/O device. The work request comprises a first ASID to map to a work queue. A second ASID of a host is allocated for the first ASID based on the work queue. The second ASID is allocated to at least one of: an ASID register for a dedicated work queue (DWQ) or an ASID translation table for a shared work queue (SWQ). Responsive to receiving a work submission from the SVM client to the I/O device, the first ASID of the application container is translated to the second ASID of the host machine for submission to the I/O device using at least one of: the ASID register for the DWQ or the ASID translation table for the SWQ based on the work queue associated with the I/O device.

PORTIONS OF CONFIGURATION STATE REGISTERS IN-MEMORY
20210224000 · 2021-07-22 ·

Portions of configuration state registers in-memory. An instruction is obtained, and a determination is made that the instruction accesses a configuration state register. A portion of the configuration state register is in-memory and another portion of the configuration state register is in-processor. Processing associated with the configuration state register is performed. The performing processing is based on a type of access and whether the portion or the other portion is being accessed.