G06F2212/652

VIRTUAL MEMORY WITH DYNAMIC SEGMENTATION FOR MULTI-TENANT FPGAS

At least one example embodiment provides a programmable logic device comprising: a plurality of reconfigurable slots programmed to execute functions requested by a plurality of users, the plurality of reconfigurable slots allocated among the plurality of users; a memory divided into a plurality of memory segments, the plurality of memory segments allocated among the plurality of reconfigurable slots; and a memory management circuit configured to dynamically adjust the plurality of memory segments based on at least one of activity or memory requirements of the plurality of reconfigurable slots.

Fine-grained access memory controller

Systems and methods are provided to perform fine-grained memory accesses using a memory controller. The memory controller can access elements stored in memory across multiple dimensions of a matrix. The memory controller can perform accesses to non-contiguous memory locations by skipping zero or more elements across any dimension of the matrix.

Systems and methods for improving cache efficiency and utilization

Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.

GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT

Described herein is a graphics processing unit (GPU) configured to receive an instruction having multiple operands, where the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent. The GPU can process the instruction using the multiple operands, where to process the instruction includes to perform a multiply operation, perform an addition to a result of the multiply operation, and apply a rectified linear unit function to a result of the addition.

TECHNOLOGIES FOR EXECUTE ONLY TRANSACTIONAL MEMORY
20220382684 · 2022-12-01 · ·

Technologies for execute only transactional memory include a computing device with a processor and a memory. The processor includes an instruction translation lookaside buffer (iTLB) and a data translation lookaside buffer (dTLB). In response to a page miss, the processor determines whether a page physical address is within an execute only transactional (XOT) range of the memory. If within the XOT range, the processor may populate the iTLB with the page physical address and prevent the dTLB from being populated with the page physical address. In response to an asynchronous change of control flow such as an interrupt, the processor determines whether a last iTLB translation is within the XOT range. If within the XOT range, the processor clears or otherwise secures the processor register state. The processor ensures that an XOT range starts execution at an authorized entry point. Other embodiments are described and claimed.

Burst translation look-aside buffer

A comparand that includes a virtual address is received. Upon determining a match of the comparand to a burst entry tag, a candidate matching translation data unit is selected. The selecting is from a plurality of translation data units associated with the burst entry tag, and is based at least in part on at least one bit of the virtual address. Content of the candidate matching translation data unit is compared to at least a portion of the comparand. Upon a match, a hit is generated.

Translation lookaside buffer

Embodiments disclosed pertain to apparatuses, systems, and methods for Translation Lookaside Buffers (TLBs) that support virtualization and multi-threading. Disclosed embodiments pertain to a TLB that includes a content addressable memory (CAM) with variable page size entries and a set associative memory with fixed page size entries. The CAM may include: a first set of logically contiguous entry locations, wherein the first set comprises a plurality of subsets, and each subset comprises logically contiguous entry locations for exclusive use of a corresponding virtual processing element (VPE); and a second set of logically contiguous entry locations, distinct from the first set, where the entry locations in the second set may be shared among available VPEs. The set associative memory may comprise a third set of logically contiguous entry locations shared among the available VPEs distinct from the first and second set of entry locations.

GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT

Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.

ADDRESS TRANSLATION CACHE AND SYSTEM INCLUDING THE SAME
20230169013 · 2023-06-01 ·

An address translation cache (ATC) is configured to store translation entries indicating mapping information between a virtual address and a physical address of a memory device. The ATC includes a plurality flexible page group caches, a shared cache and a cache manager. Each flexible page group cache stores translation entries corresponding to a page size allocated to the flexible group cache. The shared cache stores, regardless of page sizes, translation entries that are not stored in the plurality of flexible page group caches. The cache manager allocates a page size to each flexible page group cache, manages cache page information on the page sizes allocated to the plurality of flexible page group caches, and controls the plurality of flexible page group caches and the shared cache based on the cache page information.

CREATING A DYNAMIC ADDRESS TRANSLATION WITH TRANSLATION EXCEPTION QUALIFIERS

An enhanced dynamic address translation facility product is created such that, in one embodiment, a virtual address to be translated and an initial origin address of a translation table of the hierarchy of translation tables are obtained. Dynamic address translation of the virtual address proceeds. In response to a translation interruption having occurred during dynamic address translation, bits are stored in a translation exception qualifier (TXQ) field to indicate that the exception was either a host DAT exception having occurred while running a host program or a host DAT exception having occurred while running a guest program. The TXQ is further capable of indicating that the exception was associated with a host virtual address derived from a guest page frame real address or a guest segment frame absolute address. The TXQ is further capable of indicating that a larger or smaller host frame size is preferred to back a guest frame.