G06F9/30123

Speculative buffer for speculative memory accesses with entries tagged with execution context identifiers
11210102 · 2021-12-28 · ·

An apparatus comprises processing circuitry to execute instructions from one or more of a plurality of execution contexts each associated with a respective execution context identifier; a cache; and a speculative buffer. Control circuitry controls allocation of data to the cache and the speculative buffer. A speculative entry, for which allocation is caused by a speculative memory access associated with a given execution context, is allocated to the speculative buffer instead of to the cache while the speculatively executed memory access instruction remains speculative. The speculative entry specifies, as a tagged execution context identifier, the execution context identifier associated with the given execution context. Presence of the speculative entry in the speculative buffer is prevented from being observable to execution contexts other than the execution context identified by the tagged execution context identifier.

Coprocessor context priority

A system may include a plurality of processors and a coprocessor. A plurality of coprocessor context priority registers corresponding to a plurality of contexts supported by the coprocessor may be included. The plurality of processors may use the plurality of contexts, and may program the coprocessor context priority register corresponding to a context with a value specifying a priority of the context relative to other contexts. An arbiter may arbitrate among instructions issued by the plurality of processors based on the priorities in the plurality of coprocessor context priority registers. In one embodiment, real-time threads may be assigned higher priorities than bulk processing tasks, improving bandwidth allocated to the real-time threads as compared to the bulk tasks.

Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
11204769 · 2021-12-21 · ·

A global front end scheduler to schedule instruction sequences to a plurality of virtual cores implemented via a plurality of partitionable engines. The global front end scheduler includes a thread allocation array to store a set of allocation thread pointers to point to a set of buckets in a bucket buffer in which execution blocks for respective threads are placed, a bucket buffer to provide a matrix of buckets, the bucket buffer including storage for the execution blocks, and a bucket retirement array to store a set of retirement thread pointers that track a next execution block to retire for a thread.

Deferred GPR allocation for texture/load instruction block

A graphics processing unit (GPU) utilizes block general purpose registers (bGPRs) to load multiple waves of samples for an instruction group into a processing pipeline and receive processed samples from the pipeline. The GPU acquires a credit for the bGPR for execution of the instruction group for a first wave using a persistent GPR and the bGPR. The GPU refunds the credit upon loading the first wave into the pipeline. The GPU executes a subsequent wave for the instruction group to load samples to the pipeline when at least one credit is available and the pipeline is processing the first wave. The GPU stores an indication of each wave that has been loaded into the pipeline in a queue. The GPU returns samples for a next wave in the queue from the pipeline to the bGPR for further processing when the physical slot of the bGPR is available.

VIRTUAL NETWORK PRE-ARBITRATION FOR DEADLOCK AVOIDANCE AND ENHANCED PERFORMANCE
20210382822 · 2021-12-09 ·

A device includes a data path, a first interface configured to receive a first memory access request from a first peripheral device, and a second interface configured to receive a second memory access request from a second peripheral device. The device further includes an arbiter circuit configured to, in a first clock cycle, a pre-arbitration winner between a first memory access request and a second memory access request based on a first number of credits allocated to a first destination device and a second number of credits allocated to a second destination device. The arbiter circuit is further configured to, in a second clock cycle select a final arbitration winner from among the pre-arbitration winner and a subsequent memory access request based on a comparison of a priority of the pre-arbitration winner and a priority of the subsequent memory access request.

Instruction issue according to in-order or out-of-order execution modes

Apparatus for processing data (2) includes issue circuitry (22) for issuing program instructions (processing operations) to execute either within real time execution circuitry (32) or non real time execution circuitry (24, 26, 28, 30). Registers within a register file (18) are marked as non real time dependent registers if they are allocated to store a data value which is to be written by an uncompleted program instruction issued to the non real time execution circuitry and not yet completed. Issue policy control circuitry (42) responds to a trigger event to enter a real time issue policy mode to control the issue circuitry (22) to issue candidate processing operations (such as program instruction, micro-operations, architecturally triggered processing operations etc.) to one of the non real time execution circuitry or the real time execution circuitry in dependence upon whether that candidate processing operation reads a register marked as a non real time dependent register.

PARALLEL LOAD OF MAPPING CONTAINERS FOR DATABASE SYSTEM START AND RESTART OPERATIONS

Aspects of the current subject matter are directed to an approach in which a parallel load operation of file ID mapping containers is accomplished at start and/or restart of a database system. Parallel load operation of file ID mapping and/or large binary object (LOB) file ID mapping is done among a plurality of scanning engines into a plurality of data buffers that are associated with each of the plurality of scanning engines. Each scanning engine operates on a certain path of a page chain of a page structure including the mapping, causing the page chain to be split among scanning engines to process maps. Contents of the data buffers are pushed to mapping engines via a queue. The mapping engines load the file ID mapping and the LOB file ID mapping into maps for in-system access.

APPARATUS FOR VIRTUALIZED REGISTERS AND METHOD AND COMPUTER PROGRAM PRODUCT FOR ACCESSING TO THE SAME

The invention relates to an apparatus for virtualized registers. The apparatus includes register space, group selectors, and a block selector. The register space is divided into physical blocks, each of which includes register groups, and each register group contains registers. Each group selector is coupled to a portion of the register groups in a corresponding physical block, and is arranged operably to enable one of the portion of the register groups in the corresponding physical block in accordance with a first control signal corresponding to a virtual device, or a function performed by the virtual device. The block selector, coupled to the group selectors, is arranged operably to enable one of the group selectors in accordance with a second control signal corresponding to a virtual machine instruction. The virtual machine instruction is translated into an operation of the virtual device.

AUTONOMOUS AND EXTENSIBLE RESOURCE CONTROL BASED ON SOFTWARE PRIORITY HINT

Embodiments of apparatuses, methods, and systems for resource control based on software priority are described. In embodiments, an apparatus includes resource sharing hardware and multiple cores. The resource sharing hardware is to share the shared resource among the cores. A first core includes first execution circuitry to execute multiple threads. The first core also includes registers programmable by software. A first register is to store a first identifier of a first thread and a first priority tag to indicate a first priority of the first thread relative to a second priority of a second thread. A second register to store a second identifier of the second thread and a second priority tag to indicate the second priority of the second thread relative to the first priority of the first thread. The resource sharing hardware is to use the first priority and the second priority to control access to the shared resource by the first thread and the second thread.

SOFTWARE VISIBLE AND CONTROLLABLE LOCK-STEPPING WITH CONFIGURABLE LOGICAL PROCESSOR GRANULARITIES

A processor is described. The processor includes model specific register space that is visible to software above a BIOS level. The model specific register space is to specify a granularity of a processing entity of a lock-step group. The processor also includes logic circuitry to support dynamic entry/exit of the lock-step group's processing entities to/from lock-step mode including: i) termination of lock-step execution by the processing entities before the program code to be executed in lock-step is fully executed; and, ii) as part of the exit from the lock-step mode, restoration of a state of a shadow processing entity of the processing entities as the state existed before the shadow processing entity entered the lock-step mode and began lock-step execution of the program code.