G06F9/462

VIRTUAL PROCESSOR SYSTEM AND METHOD UTILIZING DISCRETE COMPONENT ELEMENTS
20210357240 · 2021-11-18 · ·

A system and method for the dynamic, run-time configuration of logic core register files, and the provision of an associated execution context. The dynamic register files as well as the associated execution context information are software-defined so as to be virtually configured in random-access memory. This virtualization of both the processor execution context and register files enables the size, structure and performance to be specified at run-time and tailored to the specific processing, instructions and data associated with a given processor state or thread, thereby minimizing both the aggregate memory required and the context switching time. In addition, the disclosed system and method provide for processor virtualization, which further enhances flexibility and efficiency.
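As a rough illustration of the idea in this abstract, the sketch below models a register file that lives in ordinary memory and is sized per thread at run time. The names `VirtualContext` and `VirtualCore`, and the choice to model a context switch as rebinding a reference, are assumptions made for this example and are not taken from the application.

```python
class VirtualContext:
    """Hypothetical execution context whose register file lives in ordinary memory."""

    def __init__(self, num_registers, width_bits):
        # Register count and width are chosen at run time, per thread.
        self.width_mask = (1 << width_bits) - 1
        self.registers = [0] * num_registers
        self.pc = 0

    def write(self, index, value):
        # Values are truncated to the register width chosen for this context.
        self.registers[index] = value & self.width_mask

    def read(self, index):
        return self.registers[index]


class VirtualCore:
    """Hypothetical core that points at one memory-resident context at a time."""

    def __init__(self):
        self.current = None

    def switch_to(self, context):
        # Assumption: a switch rebinds a reference; no register contents are copied.
        previous, self.current = self.current, context
        return previous
```

A thread that needs only four 8-bit registers can be given `VirtualContext(4, 8)`, while another can be given `VirtualContext(64, 32)`, so each context occupies only the memory it needs.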

HIERARCHICAL GENERAL REGISTER FILE (GRF) FOR EXECUTION BLOCK

In an example, an apparatus comprises a plurality of execution units, and a first general register file (GRF) communicatively coupled to the plurality of execution units, wherein the first GRF is shared by the plurality of execution units. Other embodiments are also disclosed and claimed.

Fast thread execution transition
11169837 · 2021-11-09 · ·

Systems and methods for thread execution transition are disclosed. An example system includes a memory and a processor with first and second registers. An application and a supervisor are configured to execute on the processor, which suspends execution of a first thread executing the supervisor. One execution state of the first thread is stored in the first register. The application stores a request in a first shared memory location. The application executes on a second thread and another execution state of the second thread is stored in the second register. The processor suspends execution of the second thread and resumes execution of the first thread. The supervisor retrieves data for the request from the first shared memory location, and processes the data, including storing a result to a second shared memory location. The processor suspends execution of the first thread and resumes execution of the second thread.

METHOD FOR IMPLEMENTING A LINE SPEED INTERCONNECT STRUCTURE
20210342159 · 2021-11-04 · ·

A method and apparatus including a cache controller coupled to a cache memory, wherein the cache controller receives a plurality of cache access requests, performs a pre-sorting of the plurality of cache access requests by a first stage of the cache controller to order the plurality of cache access requests, wherein the first stage functions by performing a pre-sorting and pre-clustering process on the plurality of cache access requests in parallel to map the plurality of cache access requests from a first position to a second position corresponding to ports or banks of a cache memory, performs the combining and splitting of the plurality of cache access requests by a second stage of the cache controller, and applies the plurality of cache access requests to the cache memory at line speed.
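The mapping of requests to banks can be pictured with a small software model. This is only a sketch under stated assumptions: the function name `presort_by_bank`, the 64-byte line size, and the modulo bank selection are all hypothetical, the loop is sequential whereas the abstract describes a parallel first stage, and only the combining half of the second stage is shown (splitting is omitted).

```python
from collections import defaultdict


def presort_by_bank(addresses, num_banks, line_bytes=64):
    """Group byte addresses by cache bank, merging requests to the same line.

    Assumptions (not from the source): 64-byte lines and bank = line % num_banks.
    """
    banks = defaultdict(list)
    for addr in addresses:
        line = addr // line_bytes      # cache line that holds this address
        bank = line % num_banks        # assumed bank-selection rule
        if line not in banks[bank]:    # combine requests that hit the same line
            banks[bank].append(line)
    return dict(banks)
```

For example, `presort_by_bank([0, 8, 64, 128, 192], 2)` places lines 0 and 2 in bank 0 and lines 1 and 3 in bank 1, with the two requests to line 0 merged into one.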

Save and restore register
11300614 · 2022-04-12 · ·

A save and restore (SR) register system is disclosed. Some embodiments include a first memory state element (MSE), a second MSE, and a control circuit. The first MSE is configured to: clock in a first data value during a normal mode and hold the first data value during a first testing mode; and clock in a first test sequence during a second testing mode. The second MSE is configured to: clock in the first data value during the normal mode; and clock in a second test sequence during the first testing mode. The control circuit is configured to: restore the second MSE to the first data value based on an output port of the first MSE after the second MSE clocks in the second test sequence; and restore the first MSE based on an output port of the second MSE after the first MSE clocks in the first test sequence.

METHOD AND SYSTEM FOR EXECUTING CONTEXT SWITCHING BETWEEN MULTIPLE THREADS
20220019433 · 2022-01-20 ·

A context switching system includes a processor and a scheduler. The processor is configured to execute a first thread. A first context associated with the first thread is stored in a register set of the processor. While the first thread is being executed, the scheduler is configured to select a second thread from a set of threads, and receive and store a second context associated with the second thread in a register set of the scheduler. The second thread is to be scheduled for execution after the first thread. The scheduler is further configured to swap the first and second contexts when the execution of the first thread is halted, thereby executing the context switching. Further, the processor is configured to execute the second thread based on the second context. While the second thread is being executed, the first context is stored in a data memory.
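The sequence in this abstract (prefetch the next context, swap on halt, write the old context back in the background) can be sketched as follows. The class and method names, the dictionary representation of a context, and the first-in-first-out thread selection are assumptions made for illustration only.

```python
class Processor:
    def __init__(self, context):
        # Register set holding the context of the thread currently running.
        self.registers = context


class Scheduler:
    def __init__(self, ready_contexts, data_memory):
        self.ready = list(ready_contexts)  # contexts of waiting threads
        self.data_memory = data_memory     # backing store, modeled as a dict
        self.registers = None              # the scheduler's own register set

    def prefetch_next(self):
        # Runs while the first thread is still executing.
        # Assumption: the next thread is simply the oldest waiting one.
        self.registers = self.ready.pop(0)

    def switch(self, processor):
        # When the first thread halts, swap the two register sets.
        processor.registers, self.registers = self.registers, processor.registers

    def write_back(self):
        # Runs while the second thread executes: move the first context
        # out of the scheduler's registers and into data memory.
        context = self.registers
        self.data_memory[context["thread"]] = context
        self.registers = None
```

Because the second context is already in the scheduler's registers when the first thread halts, the switch itself reduces to a swap, and the slower store to data memory happens off the critical path.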

Supporting speculative microprocessor instruction execution

Recovering microprocessor logical register values by: partitioning a register mapper by logical register type; providing a plurality of recovery ports; assigning a logical register type to a recovery port; receiving a restore required instruction; and mapping SRB (save and restore buffer) values to the register mapper by logical register type.

Combining states of multiple threads in a multi threaded processor
11113060 · 2021-09-07 · ·

A processing apparatus comprising one or more processing modules, each comprising an execution unit. The one or more processing modules are operable to run a plurality of parallel or concurrent threads, and the processing apparatus further comprises a storage location for storing an aggregated exit state of the plurality of threads. An instruction set of the processing apparatus comprises an exit instruction for inclusion in each of the plurality of threads, the exit instruction taking an individual exit state of the respective thread as an operand. The exit instruction terminates the respective thread and also causes the individual exit state specified in the operand to contribute to the aggregated exit state.
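A minimal model of the exit instruction is sketched below. The abstract does not say how individual exit states are combined, so the logical AND used here is an assumption, as are the names `ProcessingModule`, `start`, and `exit`.

```python
class ProcessingModule:
    """Hypothetical module that aggregates the exit states of its threads."""

    def __init__(self):
        # True is the identity for AND, so the first exit sets the value cleanly.
        self.aggregated_exit_state = True
        self.running = set()

    def start(self, thread_id):
        self.running.add(thread_id)

    def exit(self, thread_id, exit_state):
        # Models the exit instruction: terminate the thread and fold its
        # exit state into the aggregate (assumed combination: logical AND).
        self.running.discard(thread_id)
        self.aggregated_exit_state = self.aggregated_exit_state and bool(exit_state)
```

With AND as the combining rule, the aggregate stays true only if every thread exits with a true state, so a supervisor can check a single stored value instead of polling each thread.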

LOADING APPARATUS AND METHOD FOR CONVOLUTION WITH STRIDE OR DILATION OF 2
20210264560 · 2021-08-26 ·

The disclosed technology generally relates to a graphics processing unit (GPU). In one aspect, a GPU includes a general purpose register (GPR) having registers, an arithmetic logic unit (ALU) configured to read pixels of an image independently of a shared memory, and a level 1 (L1) cache storing the pixels read by the ALU. The ALU can implement pixel mapping by fetching a quad of pixels, which includes pixels of first, second, third, and fourth pixel types, from the L1 cache, grouping the pixels of the different pixel types of the quad into four groups based on pixel type, and, for each group, separating the pixels included in the group into three regions that each have a set of pixels. The pixels for each group can then be loaded into the registers corresponding to the three regions.

PRIORITY DETERMINATION CIRCUIT AND METHOD OF OPERATING THE PRIORITY DETERMINATION CIRCUIT
20210191885 · 2021-06-24 · ·

Provided herein may be a priority determination circuit and a method of operating the priority determination circuit. The priority determination circuit may receive request signals from a plurality of microcontrollers respectively corresponding to a plurality of planes, and output response signals corresponding to the request signals depending on a determined priority.