Patent classifications
G06F15/167
MEMORY DATA MIGRATION METHOD AND APPARATUS
A memory data migration method, apparatus, and system are provided. During memory migration, data is classified into two parts according to how hot or cold it is. Hot data is migrated directly, while cold data is written into a storage device shared by the source and destination memories. When the cold data is needed in the destination-end memory, it may be read from the shared storage device. This reduces the amount of data that must be migrated to the destination-end memory, thereby improving memory migration efficiency.
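A minimal sketch of the hot/cold split, assuming each page carries an access count that serves as the hotness metric; the names (Page, SharedStore, migrate, on_cold_access) are illustrative, not from the patent.

```python
from dataclasses import dataclass

@dataclass
class Page:
    addr: int
    data: bytes
    access_count: int  # stand-in for the page's hot/cold degree

class SharedStore:
    """Models the storage device shared by source and destination memories."""
    def __init__(self):
        self._pages = {}

    def write(self, page):
        self._pages[page.addr] = page.data

    def read(self, addr):
        return self._pages[addr]

def migrate(pages, dest_memory, store, hot_threshold=10):
    """Copy hot pages directly; spill cold pages to the shared store."""
    for page in pages:
        if page.access_count >= hot_threshold:
            dest_memory[page.addr] = page.data   # migrated directly
        else:
            store.write(page)                    # fetched later on demand

def on_cold_access(addr, dest_memory, store):
    """Destination-side handler: pull a cold page in on first use."""
    if addr not in dest_memory:
        dest_memory[addr] = store.read(addr)
    return dest_memory[addr]
```

The migration path only moves hot pages, so the bulk transfer shrinks; cold pages cost one shared-store read the first time the destination touches them.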
LOOP EXECUTION IN A RECONFIGURABLE COMPUTE FABRIC
Various examples are directed to systems and methods for performing operations in a reconfigurable compute fabric. A dispatch interface may send a first asynchronous message to a first flow controller of a first synchronous flow. The first asynchronous message may instruct the first flow controller to begin execution of a first-level loop. The first synchronous flow may send a second asynchronous message to a second flow controller of a second synchronous flow. The second asynchronous message may instruct the second flow controller to execute a second-level loop. The first flow controller may receive a third asynchronous message indicating that the second-level loop has completed and that a synchronous flow thread is free for executing a next iteration of the first-level loop.
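A toy software model of the three-message handshake, assuming each asynchronous message can be represented as an item on a queue; the controller functions, message fields, and queue names are all illustrative.

```python
import queue
import threading

outer_inbox = queue.Queue()  # messages to the first flow controller
inner_inbox = queue.Queue()  # messages to the second flow controller

def first_flow_controller():
    start = outer_inbox.get()             # 1st message: begin first-level loop
    for i in range(start["iterations"]):
        inner_inbox.put({"cmd": "run_inner", "iter": i})   # 2nd message
        done = outer_inbox.get()          # 3rd message: second-level loop done,
        assert done["cmd"] == "inner_done"  # thread free for next iteration
    inner_inbox.put({"cmd": "stop"})

def second_flow_controller():
    while True:
        msg = inner_inbox.get()
        if msg["cmd"] == "stop":
            return
        # ... body of the second-level loop would execute here ...
        outer_inbox.put({"cmd": "inner_done"})

threading.Thread(target=second_flow_controller).start()
outer_inbox.put({"cmd": "start", "iterations": 3})
first_flow_controller()
```

The point of the pattern is that the first-level loop never polls: it blocks until the completion message tells it a synchronous flow thread is free again.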
MULTI-CHIP INTERCONNECTION SYSTEM BASED ON PCIE BUSES
Provided is a multi-chip interconnection system based on Peripheral Component Interconnect Express (PCIE) buses. The system includes: N accelerators, M processors, and M PCIE buses, N and M being positive integers, and M being greater than N. Each accelerator includes at least two endpoints. Each processor includes one root complex. One endpoint and one root complex are connected by means of one PCIE bus, so that the at least two endpoints of each accelerator are connected to at least two processors by means of different PCIE buses.
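The wiring rule can be sketched as plain data. Note the abstract only requires M > N; the sketch below additionally assumes M >= 2N so every accelerator's two endpoints can land on two distinct processors, and all names are illustrative.

```python
def build_topology(n_accelerators, m_processors):
    """Return one (accelerator, endpoint, processor) triple per PCIE bus.

    Assumes m_processors >= 2 * n_accelerators so that each accelerator's
    two endpoints connect to two different processors' root complexes.
    """
    assert m_processors >= 2 * n_accelerators
    buses = []
    proc = 0
    # First give every accelerator its two mandatory endpoint links.
    for acc in range(n_accelerators):
        for endpoint in (0, 1):
            buses.append((acc, endpoint, proc))
            proc += 1
    # Spread any remaining processors over extra endpoints, round-robin.
    acc = 0
    while proc < m_processors:
        endpoint = sum(1 for a, _, _ in buses if a == acc)
        buses.append((acc, endpoint, proc))
        proc += 1
        acc = (acc + 1) % n_accelerators
    return buses

print(build_topology(2, 5))
# [(0, 0, 0), (0, 1, 1), (1, 0, 2), (1, 1, 3), (0, 2, 4)]
```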
SUB-QUEUE INSERTION SCHEMES EXECUTABLE BY QUEUE MANAGERS AND RELATED SYSTEMS AND OPERATIONS
Introduced here are insertion schemes in which queues can be branched into one or more sub-queues for more effective management of queuing elements. Often, a computing device will have a primary buffer into which queuing elements are placed for execution by a processor. However, the amount of contiguous memory space allocated for the primary buffer may be fixed. To address this, a queue manager may insert, into the primary buffer, indicators that link to secondary buffers, thereby expanding the number of effective entries in the primary buffer.
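A minimal sketch of the indicator scheme, assuming an entry is either an ordinary queuing element or a link to a secondary buffer; SubQueue, Link, and the reserve-the-last-slot convention are illustrative choices, not taken from the patent.

```python
class SubQueue:
    """Fixed-capacity buffer, standing in for a contiguous memory region."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []

class Link:
    """Indicator entry that points at a secondary buffer."""
    def __init__(self, target):
        self.target = target

def enqueue(q, item, capacity=8):
    if len(q.entries) < q.capacity - 1:
        q.entries.append(item)
        return
    # Buffer is nearly full: keep the final slot for an indicator that
    # links to a fresh secondary buffer, expanding the effective entries.
    if not q.entries or not isinstance(q.entries[-1], Link):
        q.entries.append(Link(SubQueue(capacity)))
    enqueue(q.entries[-1].target, item, capacity)

def drain(q):
    """Yield elements in order, following indicators into sub-queues."""
    for entry in q.entries:
        if isinstance(entry, Link):
            yield from drain(entry.target)
        else:
            yield entry

primary = SubQueue(capacity=4)
for n in range(10):
    enqueue(primary, n, capacity=4)
print(list(drain(primary)))   # [0, 1, 2, ..., 9]
```

The consumer walks the primary buffer as usual and simply follows an indicator when it meets one, so the fixed allocation never has to grow.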
MECHANISM TO PROVIDE RELIABLE RECEIPT OF EVENT MESSAGES
Devices and techniques for providing receipts for event messages in a processor are described herein. A system includes multiple memory-compute nodes coupled to one another over a scale fabric; a set of registers; and event manager hardware circuitry to: receive an event message corresponding to an event, the event being associated with an event mode; track a counter value representing a number of received event messages related to the event, the counter value stored in the set of registers; compare the number of received event messages to a trigger value; and in response to the number of received event messages equaling the trigger value: use an atomic operation to reset the counter value in the set of registers while maintaining the event mode; and alert a thread of the event.
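A software analogue of the described circuitry, where a lock stands in for the hardware atomic operation and a callback stands in for alerting the thread; the class and field names are assumed.

```python
import threading

class EventManager:
    def __init__(self, trigger_value, on_alert, mode="collective"):
        self.trigger = trigger_value
        self.count = 0          # models the counter held in the registers
        self.mode = mode        # the event mode survives counter resets
        self.on_alert = on_alert
        self._lock = threading.Lock()

    def receive(self, event_msg):
        with self._lock:        # stands in for the atomic reset operation
            self.count += 1
            fire = self.count == self.trigger
            if fire:
                self.count = 0  # reset the counter; self.mode is untouched
        if fire:
            self.on_alert(event_msg)  # alert the waiting thread

mgr = EventManager(trigger_value=3, on_alert=lambda m: print("alert:", m))
for i in range(6):
    mgr.receive({"seq": i})   # alerts after seq 2 and seq 5
```

Making the increment, compare, and reset one indivisible step is the crux: two messages arriving together can never both see the trigger value or lose a count.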
Multiple dies hardware processors and methods
- Nevine Nassif
- Yen-Cheng Liu
- Krishnakanth V. Sistla
- Gerald Pasdast
- Siva Soumya Eachempati
- Tejpal Singh
- Ankush Varma
- Mahesh K. Kumashikar
- Srikanth Nimmagadda
- Carleton L. Molnar
- Vedaraman Geetha
- Jeffrey D. Chamberlain
- William R. Halleck
- George Z. Chrysos
- John R. Ayers
- Dheeraj R. Subbareddy
Methods and apparatuses relating to hardware processors with multiple interconnected dies are described. In one embodiment, a hardware processor includes a plurality of physically separate dies, and an interconnect to electrically couple the plurality of physically separate dies together. In another embodiment, a method to create a hardware processor includes providing a plurality of physically separate dies, and electrically coupling the plurality of physically separate dies together with an interconnect.
Processor device for executing SIMD instructions
In a processor device according to the present invention, a memory access unit reads data to be processed from an external memory and writes it to the one register group, among a plurality of register groups, that the plurality of processors is not accessing. A control unit causes each of the plurality of processors to execute the same instruction in sequence while changing the address of the register group that stores the data to be processed. Based on specified scenario information, a scheduler specifies, for the plurality of processors, the instruction to be executed and the register group to be accessed, and specifies, for the memory access unit, the register group to be written to among the plurality of register groups and the data to be written.
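A rough software model of the scheduling loop, assuming the scenario information reduces to a list of steps that each name an instruction, the register group to execute against, and the group for the memory access unit to refill; every name here is illustrative.

```python
def run_scenario(scenario, register_groups, num_processors, external_memory):
    for step in scenario:
        # Memory access unit: refill the register group that no
        # processor reads during this step.
        free = step["load_group"]
        register_groups[free] = list(external_memory[step["load_addr"]])

        # All processors execute the same instruction, each on its own
        # lane of the register group the scheduler designates.
        src = step["exec_group"]
        op = step["instruction"]
        assert len(register_groups[src]) == num_processors
        register_groups[src] = [op(lane) for lane in register_groups[src]]

groups = {0: [1, 2, 3, 4], 1: [0, 0, 0, 0]}
memory = {0x100: [5, 6, 7, 8]}
run_scenario(
    [{"instruction": lambda x: x * 2, "exec_group": 0,
      "load_group": 1, "load_addr": 0x100}],
    groups, num_processors=4, external_memory=memory,
)
print(groups)   # {0: [2, 4, 6, 8], 1: [5, 6, 7, 8]}
```

Because execution targets one group while the load targets another, the memory traffic overlaps the SIMD work instead of stalling it.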
SYSTEMS AND METHODS FOR EFFICIENT DATA BUFFERING
In one embodiment, one or more control units may store a position tracker associated with a first window of memory blocks and allow a first processing unit to write data within the first window. The control units may receive, from a second processing unit, a request for reading data with a memory-reading address, compare the memory-reading address to a first starting address of the first window, and prevent the second processing unit from reading the data when the memory-reading address is greater than or equal to the first starting address of the first window. The control units may store, when the data writing process is complete, an updated position tracker of a second window of memory blocks and allow the second processing unit to read the data based on a determination that the memory-reading address is less than a second starting address of the second window.
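A minimal model of the gating check, assuming the "position tracker" is simply the starting address of the window the writer currently owns; the class and method names are illustrative.

```python
class BufferControl:
    """Reduces the control units to one object guarding a run of blocks."""
    def __init__(self, window_start, window_size):
        self.window_start = window_start   # the stored position tracker
        self.window_size = window_size

    def may_read(self, read_addr):
        # Reading at or beyond the writer's window start is refused.
        return read_addr < self.window_start

    def write_complete(self):
        # Writer finished its window: store the updated tracker for the
        # next window, which unlocks the finished region for readers.
        self.window_start += self.window_size

ctrl = BufferControl(window_start=0, window_size=64)
print(ctrl.may_read(16))   # False: address 16 is inside the writer's window
ctrl.write_complete()
print(ctrl.may_read(16))   # True: the window moved to 64, so 16 is readable
```

A single compare against the tracker is enough to keep the reader from ever observing a half-written window.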
System and method for providing additional functionality to existing software in an integrated manner
A system and method are disclosed for adding functionality to software applications in an integrated manner. In one example, the method includes a computing entity having a network interface, a processor, and a memory configured to store a plurality of instructions. The instructions include a superblock application that contains a function block. The function block is configured to provide functions that are accessible to the superblock application via an application programming interface (API). Because the functions are provided within the superblock application itself, they are accessible without switching context to another application on the computing entity.
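In ordinary code the arrangement might look like the sketch below; SuperblockApp and FunctionBlock follow the abstract's terms, while the example function and its behavior are assumed purely for illustration.

```python
class FunctionBlock:
    """Extra functionality compiled into the host application itself."""
    def word_count(self, text):
        # Hypothetical added function; any capability could live here.
        return len(text.split())

class SuperblockApp:
    def __init__(self):
        # The function block is part of the application, so calling it
        # never requires a context switch to another application.
        self.api = FunctionBlock()

    def handle_document(self, text):
        return self.api.word_count(text)   # reached via the in-app API

app = SuperblockApp()
print(app.handle_document("additional functionality, integrated"))  # 3
```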