Memory-fabric-based processor context switching system
11003488 · 2021-05-11
Assignee
Inventors
- Shyamkumar T. Iyer (Austin, TX, US)
- William Price Dawkins (Lakeway, TX, US)
- Kurtis John Bowman (Austin, TX, US)
- Jimmy Doyle Pike (Georgetown, TX, US)
CPC classification
G06F3/0646
PHYSICS
G06F3/0607
International classification
G06F13/12
G06F15/173
G06F9/30
Abstract
A memory-fabric-based processor context switching system includes server devices coupled to a memory fabric. A first processing system in a first server device receives a request to move a process it is executing and, in response, copies first processing system context values to its first local memory system in the first server device, and generates a first data mover instruction that causes a first data mover device in the first server device to transmit the first processing system context values from the first local memory system to the memory fabric. A second processing system in a second server device generates a second data mover instruction that causes a second data mover device in the second server device to retrieve the first processing system context values from the memory fabric and provide the first processing system context values in a second local memory system included in the second server device.
Claims
1. A memory-fabric-based processor context switching system, comprising: a memory fabric; a first server device that is coupled to the memory fabric and that includes: a first local memory system; a first processing system that is coupled to the first local memory system and that is configured to: receive a request to move a process executing on the first processing system and, in response: copy first processing system context values from respective processing system context value storage elements to the first local memory system; and generate, subsequent to the copying of the first processing system context values from the respective processing system context value storage elements to the first local memory system, a first data mover instruction to move the first processing system context values to the memory fabric, wherein the first data mover instruction includes a push primitive instruction provided in a first Instruction Set Architecture (ISA) utilized by the first processing system; and a first data mover device that is configured to receive the first data mover instruction generated by the first processing system and, in response, move the first processing system context values from the first local memory system to the memory fabric, wherein the push primitive instruction included in the first data mover instruction is a cache flush instruction that causes the first data mover device to move the first processing system context values from the first local memory system to the memory fabric; and a second server device that is coupled to the memory fabric and that includes: a second local memory system; a second processing system that is coupled to the second local memory system and that is configured to generate a second data mover instruction to retrieve the first processing system context values from the memory fabric, wherein the second data mover instruction includes a pop primitive instruction provided in a second ISA utilized by the second processing system; and a second data mover device that is configured to receive the second data mover instruction generated by the second processing system and, in response, retrieve the first processing system context values from the memory fabric and provide the first processing system context values in the second local memory system.
2. The system of claim 1, wherein the second processing system is configured to: provide the first processing system context values from the second local memory system in the second processing system; and execute, using the first processing system context values provided in the second processing system, the process.
3. The system of claim 1, wherein the first data mover device is included in the first processing system, and wherein the second data mover device is included in the second processing system.
4. The system of claim 1, wherein the first data mover device is configured to move, in response to receiving the first data mover instruction, process data associated with the execution of the process to the memory fabric, and wherein the second data mover device is configured to retrieve, in response to receiving the second data mover instruction, the process data from the memory fabric and provide the process data in the second local memory system.
5. An Information Handling System (IHS), comprising: a memory system; a processing system that is coupled to the memory system and that is configured to receive a first request to move a first process executing on the processing system and, in response: copy first processing system context values from respective processing system context value storage elements to the memory system; and generate, subsequent to the copying of the first processing system context values from the respective processing system context value storage elements to the memory system, a first data mover instruction to move the first processing system context values to a memory fabric, wherein the first data mover instruction includes a push primitive instruction provided in a first Instruction Set Architecture (ISA) utilized by the processing system; and a data mover device that is configured to receive the first data mover instruction generated by the processing system and, in response, move the first processing system context values from the memory system to the memory fabric, wherein the push primitive instruction included in the first data mover instruction is a cache flush instruction that causes the data mover device to move the first processing system context values from the memory system to the memory fabric.
6. The IHS of claim 5, wherein the processing system is configured to: generate a second data mover instruction to retrieve second processing system context values from the memory fabric, wherein the data mover device is configured to: receive the second data mover instruction generated by the processing system and, in response, retrieve the second processing system context values from the memory fabric and provide the second processing system context values in the memory system.
7. The IHS of claim 6, wherein the processing system is configured to: provide the second processing system context values from the memory system in the processing system; and execute, using the second processing system context values provided in the processing system, a second process.
8. The IHS of claim 5, wherein the data mover device is included in the processing system.
9. The IHS of claim 5, wherein the data mover device is configured to move, in response to receiving the first data mover instruction, first process data associated with the execution of the first process to the memory fabric.
10. A method for switching processor context via a memory fabric, comprising: receiving, by a processing system that is included in a server device, a first request to move a first process executing on the processing system; copying, by the processing system in response to receiving the first request, first processing system context values from respective processing system context value storage elements to a memory system included in the server device; generating, by the processing system in response to receiving the first request and subsequent to the copying of the first processing system context values from the respective processing system context value storage elements to the memory system, a first data mover instruction to transmit the first processing system context values to a memory fabric, wherein the first data mover instruction includes a push primitive instruction provided in a first Instruction Set Architecture (ISA) utilized by the processing system; receiving, by a data mover device that is included in the server device, the first data mover instruction generated by the processing system; and moving, by the data mover device in response to receiving the first data mover instruction, the first processing system context values from the memory system to the memory fabric, wherein the push primitive instruction included in the first data mover instruction is a cache flush instruction that causes the data mover device to move the first processing system context values from the memory system to the memory fabric.
11. The method of claim 10, further comprising: generating, by the processing system, a second data mover instruction to retrieve second processing system context values from the memory fabric; receiving, by the data mover device, the second data mover instruction generated by the processing system; and retrieving, by the data mover device in response to receiving the second data mover instruction, the second processing system context values from the memory fabric and providing the second processing system context values in the memory system.
12. The method of claim 11, further comprising: providing, by the processing system, the second processing system context values from the memory system in the processing system; and executing, by the processing system using the second processing system context values provided in the processing system, a second process.
13. The method of claim 10, wherein the data mover device is included in the processing system.
14. The method of claim 10, further comprising: moving, by the data mover device in response to receiving the first data mover instruction, first process data associated with the execution of the first process to the memory fabric.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
(11) For purposes of this disclosure, an information handling system may include any instrumentality or aggregate of instrumentalities operable to compute, calculate, determine, classify, process, transmit, receive, retrieve, originate, switch, store, display, communicate, manifest, detect, record, reproduce, handle, or utilize any form of information, intelligence, or data for business, scientific, control, or other purposes. For example, an information handling system may be a personal computer (e.g., desktop or laptop), tablet computer, mobile device (e.g., personal digital assistant (PDA) or smart phone), server (e.g., blade server or rack server), a network storage device, or any other suitable device and may vary in size, shape, performance, functionality, and price. The information handling system may include random access memory (RAM), one or more processing resources such as a central processing unit (CPU) or hardware or software control logic, ROM, and/or other types of nonvolatile memory. Additional components of the information handling system may include one or more disk drives, one or more network ports for communicating with external devices as well as various input and output (I/O) devices, such as a keyboard, a mouse, touchscreen and/or a video display. The information handling system may also include one or more buses operable to transmit communications between the various hardware components.
(12) In one embodiment, IHS 100,
(13) Referring now to
(14) In the illustrated embodiment, the server devices 202, 204, and 206 are each coupled to a switch device 208 (e.g., via a network that may be provided in the networked system 200 by, for example, a Local Area Network (LAN), the Internet, and/or any other network (or combination of networks) that would be apparent to one of skill in the art in possession of the present disclosure). In an embodiment, the switch device 208 may be provided by the IHS 100 discussed above with reference to
(15) For example, the network-connected memory fabric may be a Generation Z (Gen-Z) memory fabric created and commercialized by the Gen-Z consortium. As one of skill in the art in possession of the present disclosure will recognize, a Gen-Z memory fabric extends the processing system/memory system byte-addressable load/store model to the entire networked system 200 by decoupling the processing system/compute functionality in the server devices 202, 204, and 206 from the memory system/media functionality in the memory system 210, allowing processing systems and memory systems to act as peers that communicate using the same language via simplified, high-performance, low-latency communication paths that do not incur the translation penalties and software overhead of conventional systems, thus eliminating bottlenecks and increasing efficiency via the unification of communication paths and the simplification of the software required for processing system/memory system communications. However, one of skill in the art in possession of the present disclosure will recognize that other types of memory fabrics will fall within the scope of the present disclosure as well. Furthermore, while the server devices 202, 204, and 206 are illustrated as each coupled to the memory system 210 via the switch device 208, one of skill in the art in possession of the present disclosure will recognize that in other embodiments the switch device 208 and the memory system 210 may be provided in a server device to enable the functionality described below while remaining within the scope of the present disclosure as well. As such, while a specific networked system 200 has been illustrated and described, one of skill in the art in possession of the present disclosure will recognize that the memory-fabric-based processor context switching system of the present disclosure may utilize a variety of other components and component configurations while remaining within the scope of the present disclosure as well.
(16) Referring now to
(17) As discussed below, the CPU device 306a may include a plurality of registers and/or any other processing system context value storage element that would be recognized by one of skill in the art in possession of the present disclosure as providing for the storage of processing system context values utilized in providing a process, thread, and/or other processing system result. In a specific example, the CPU device registers/context value storage elements may include an instruction pointer (IP) context value storage element; general-purpose registers that include an accumulator (AX) register, a base (BX) register, a counter (CX) register, a data (DX) register, a stack pointer (SP) register, a stack base pointer (BP) register, a source index (SI) register, and a destination index (DI) register; segment registers that include a stack segment (SS) register, a code segment (CS) register, a data segment (DS) register, an extra segment (ES) register, an F segment (FS) register, and a G segment (GS) register; a flags (EFLAGS) register; and/or a variety of other registers/context value storage elements that would be apparent to one of skill in the art in possession of the present disclosure. Furthermore, one of skill in the art in possession of the present disclosure will recognize that the examples of registers/context value storage elements discussed above that may store the processing system context values utilized by the present disclosure are specific to CPU devices, that the GPU devices, accelerator devices, and/or other processing devices discussed above utilize different registers/context value storage elements that store different context values, and that the use of those different types of processing devices with their different registers/context value storage elements and different context values will fall within the scope of the present disclosure as well.
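The register/context value storage elements listed above can be pictured as a simple data structure. The following is a minimal sketch, assuming a dictionary-backed Python model; the class and field names are illustrative inventions, not taken from the patent or from any real API.

```python
from dataclasses import dataclass, field

# Hypothetical model of the x86-style CPU context value storage elements
# enumerated in paragraph (17): instruction pointer, general-purpose
# registers, segment registers, and the flags register.
@dataclass
class CpuContext:
    ip: int = 0  # instruction pointer (IP) context value
    general: dict = field(default_factory=lambda: {
        "AX": 0, "BX": 0, "CX": 0, "DX": 0,
        "SP": 0, "BP": 0, "SI": 0, "DI": 0,
    })
    segment: dict = field(default_factory=lambda: {
        "SS": 0, "CS": 0, "DS": 0, "ES": 0, "FS": 0, "GS": 0,
    })
    eflags: int = 0  # flags (EFLAGS) register

ctx = CpuContext(ip=0x401000)
ctx.general["AX"] = 42
```

A GPU or accelerator device would carry a different set of fields, per the paragraph above; this sketch only covers the CPU example.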
(18) In the illustrated embodiment, the processing system 306 also includes a data mover device 306b. For example, the data mover device 306b may be provided by a data mover processing system (not illustrated, but which may include the processor 102 discussed above with reference to
(19) In some embodiments, in addition to the processing system context movement operations discussed below, the data mover device 306b may be configured to perform read, write, copy, and/or other data movement operations for the processing system 306 (e.g., to and from its local memory system) in order to, for example, relieve the processing system 306 from having to use processing cycles to perform those operations. However, one of skill in the art in possession of the present disclosure will recognize that the functionality of the data mover device 306b discussed below may be provided by itself and/or with other functionality while remaining within the scope of the present disclosure as well. While a few examples of data mover device implementations and functionality have been described, one of skill in the art in possession of the present disclosure will recognize that a variety of different functionality for the data mover device 306b may be enabled in a variety of manners that will fall within the scope of the present disclosure as well.
(20) In the illustrated embodiment, the chassis 302 also houses a memory system 308 (which may include the memory 114 discussed above with reference to
(21) Referring now to
(22) The method 400 begins at block 402 where a first processing system in a first server device receives a request to move a process. In an embodiment, at block 402, any of the server devices 202-206 may be providing/executing a process, thread, or other similar processing system action (referred to below as a process) and may receive a request to move that process such that it may be provided by another processing system. With reference to
(23) With reference to
(24) As such, at block 402, the processing system 306.sub.1 in the server device 202 may receive a request to move a process currently being executed by the CPU device 306a.sub.1 in that processing system 306.sub.1. For example, the request to move the process received by the processing system 306.sub.1 at block 402 may include a kernel-based context switch request that may, for example, result from a need by the process being executed by the CPU device 306a.sub.1 for privileges that are not available when that process is being provided by a first operating system kernel via its execution by the CPU device 306a.sub.1, but that will be available when the process is provided by a second operating system kernel via its execution by the CPU device 306a.sub.2 in the processing system 306.sub.2 included in the server device 204. In another example, the request to move the process received by the processing system 306.sub.1 at block 402 may include a thread-container context switch request that may, for example, provide for the moving of a thread context between containers (e.g., from one thread context to another thread context in a logical server device that shares the same operating system kernel.)
(25) In another example, the request to move the process received by the processing system 306.sub.1 at block 402 may include a thread-Virtual Machine (VM) context switch request that may, for example, provide for the moving of a thread context between virtual machines (e.g., from a first thread context in a first virtual machine to a second thread context in a second virtual machine, each running in a virtualization server device with a common hypervisor.) In yet another example, the request to move the process received by the processing system 306.sub.1 at block 402 may include a Virtual Machine (VM)-server device context switch request that may, for example, provide for the moving of a virtual machine between server devices (e.g., moving a virtual machine context from a first server device to a second server device, sometimes referred to as live migration.) However, while several specific examples have been provided, one of skill in the art in possession of the present disclosure will recognize that the request to move the process received at block 402 may include a variety of requests provided for a variety of process movement requirements that would be apparent to one of skill in the art in possession of the present disclosure.
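The four kinds of move requests described in the two paragraphs above can be summarized as a small taxonomy. The enum below is a hedged sketch; the names are assumptions introduced here for illustration and do not appear in the patent.

```python
from enum import Enum, auto

# Illustrative taxonomy of the process-move requests of block 402:
class MoveRequestType(Enum):
    KERNEL_CONTEXT_SWITCH = auto()    # process needs privileges of another OS kernel
    THREAD_CONTAINER_SWITCH = auto()  # thread context moved between containers
    THREAD_VM_SWITCH = auto()         # thread context moved between virtual machines
    VM_SERVER_SWITCH = auto()         # VM moved between server devices (live migration)
```

Any of these request types would trigger the same push/pop flow through the memory fabric described in the blocks that follow.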
(26) The method 400 then proceeds to block 404 where the first processing system in the first server device copies first processing system context values to a first memory system in the first server device. As illustrated in
(27) The method 400 then proceeds to block 406 where the first processing system in the first server device generates a first data mover instruction to transmit the first processing system context values to a memory fabric. In an embodiment, at block 406, the processing system 306.sub.1 may operate to generate a first data mover instruction for the data mover device 306b.sub.1 that includes instructions to transmit the first processing system context values 500a-500e, which were copied to the memory system 308.sub.1, to the memory system 210 that provides the memory fabric in the networked system 200. In a specific example, the first data mover instruction may be generated by the CPU device 306a.sub.1 and may include a primitive instruction or other microarchitecture control signal such as, for example, a push primitive instruction that may be provided as an enhancement to an Instruction Set Architecture (ISA) utilized by the processing system 306.sub.1 and the data mover device 306b.sub.1, although one of skill in the art in possession of the present disclosure will recognize that other first data mover instructions will fall within the scope of the present disclosure as well. Furthermore, in some embodiments of block 406, the processing system 306.sub.1 may provide instruction(s) for the data mover device 306b.sub.1 to transmit process data that was utilized by the CPU device 306a.sub.1 in providing the process from the memory system 308.sub.1 to the memory system 210 that provides the memory fabric in the networked system 200. As such, in some embodiments, at block 406 the processing system 306.sub.1 may provide one or more instructions to the data mover device 306b.sub.1 to move process data (sometimes referred to as working data) and processing system context data from the memory system 308.sub.1 to the memory system 210/memory fabric at substantially the same time.
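The sequence in blocks 404 and 406 can be sketched as follows, under the assumption that the local memory system can be modeled as a dictionary and the push primitive as a plain instruction record; every function name, address, and field here is a hypothetical stand-in for the hardware behavior described above.

```python
# Block 404: the processing system snapshots its context value storage
# elements into the local memory system.
def copy_context_to_local_memory(registers, local_memory):
    local_memory["context"] = dict(registers)

# Block 406: the processing system emits a push primitive instruction
# (an assumed ISA enhancement) addressed to the data mover device.
def generate_push_instruction(src="context", fabric_dst="proc-ctx-1"):
    return {"op": "PUSH", "src": src, "dst": fabric_dst}

registers = {"IP": 0x401000, "AX": 7, "SP": 0x7FF0}
local_memory = {}
copy_context_to_local_memory(registers, local_memory)
push_instruction = generate_push_instruction()
```

Per the paragraph above, an instruction to move the process (working) data could be generated alongside the context push at substantially the same time.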
(28) The method then proceeds to block 408 where a first data mover device in the first server device receives the first data mover instruction and transmits the first processing system context values from the first memory system in the first server device to the memory fabric. As illustrated in
(29) Furthermore, as discussed above, in some embodiments of block 408, the data mover device 306b.sub.1 may receive instructions from the processing system 306.sub.1 to transmit process data that was utilized by the CPU device 306a.sub.1 in providing the process from the memory system 308.sub.1 to the memory system 210 that provides the memory fabric in the networked system 200 and, in response, may copy the process data from the memory system 308.sub.1 and then transmit that process data to the switch device 208 for storage in the memory system 210. As such, in some embodiments, at block 408 the data mover device 306b.sub.1 may move process data (sometimes referred to as working data) and processing system context data from the memory system 308.sub.1 to the memory system 210/memory fabric at substantially the same time.
(30) While the processing system 306.sub.1 is described as copying the first processing system context values 500a-500e to the memory system 308.sub.1 at block 404, and the data mover device 306b.sub.1 is discussed as transferring the first processing system context values 500a-500e from the memory system 308.sub.1 to the memory system 210 that provides the memory fabric, in some embodiments, the data mover device 306b.sub.1 may operate to transfer the first processing system context values 500a-500e from the processing system 306.sub.1 directly to the memory system 210 that provides the memory fabric. As such, the processing system 306.sub.1 may receive the request to move the process at block 402, and may generate the first data mover instructions similarly as described with reference to block 406, but with the exception that those first data mover instructions are to transmit the first processing system context values 500a-500e from the processing system 306.sub.1 directly to the memory system 210 that provides the memory fabric. As such, at block 408 the data mover device 306b.sub.1 may copy the first processing system context values 500a-500e from the processing system 306.sub.1 (i.e., from the CPU device registers/context value storage elements), and then transmit those first processing system context values 500a-500e to the switch device 208 for storage in the memory system 210.
(31) The method then proceeds to block 410 where a second processing system in a second server device generates a second data mover instruction to retrieve the first processing system context values from the memory fabric. In an embodiment, at block 410, the processing system 306.sub.2 may operate to generate a second data mover instruction for the data mover device 306b.sub.2 that includes instructions to retrieve the first processing system context values 500a-500e that were provided on the memory system 210 that provides the memory fabric in the networked system 200 at block 408. In some embodiments, the coordination of the first processing system and second processing system may be determined by a higher level job scheduler subsystem, the operations of which one of skill in the art will recognize are akin to an operating system migrating a job to a different CPU core, with the pushing and popping of context values akin to a loader program that switches context by pushing state information into the stack memory (a memory fabric in the case of the present disclosure), and popping the context from stack memory on a different processing system to resume the job. In a specific example, the second data mover instruction may be generated by the CPU device 306a.sub.2 and may include a primitive instruction or other microarchitecture control signal such as, for example, a pop primitive instruction that may be provided as an enhancement to an Instruction Set Architecture (ISA) utilized by the processing system 306.sub.2 and the data mover device 306b.sub.2, although one of skill in the art in possession of the present disclosure will recognize that other second data mover instructions will fall within the scope of the present disclosure as well.
Furthermore, in some embodiments of block 410, the processing system 306.sub.2 may provide instruction(s) for the data mover device 306b.sub.2 to retrieve process data that was utilized by the CPU device 306a.sub.1 in providing the process and that was provided on the memory system 210 that provides the memory fabric in the networked system 200 at block 408. As such, in some embodiments, at block 410 the processing system 306.sub.2 may provide one or more instructions to the data mover device 306b.sub.2 to retrieve process data (sometimes referred to as working data) and processing system context data from the memory system 210/memory fabric at substantially the same time.
(32) The method then proceeds to block 412 where a second data mover device in the second server device receives the second data mover instruction and retrieves the first processing system context values from the memory fabric and copies the first processing system context values to a second memory system in the second server device. As illustrated in
(33) Furthermore, as discussed above, in some embodiments of block 412, the data mover device 306b.sub.2 may receive instructions from the processing system 306.sub.2 to retrieve process data that was utilized by the CPU device 306a.sub.1 in providing the process and that was provided in the memory system 210 that provides the memory fabric in the networked system 200 at block 408 and, in response, may retrieve the process data via the switch device 208 from the memory system 210 and store that process data in the memory system 308.sub.2. As such, in some embodiments, at block 412 the data mover device 306b.sub.2 may move process data (sometimes referred to as working data) and processing system context data from the memory system 210/memory fabric to the memory system 308.sub.2 at substantially the same time.
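The data mover operations of blocks 408 and 412 amount to a push into the fabric followed by a pop on the other side. The dictionary-backed sketch below assumes the fabric, both local memory systems, and the instruction format can all be modeled as plain Python objects; the addresses and names are illustrative only.

```python
# Block 408: the first data mover device consumes the push instruction and
# moves the context values from the first local memory system into the fabric.
def data_mover_push(instruction, local_memory, memory_fabric):
    memory_fabric[instruction["dst"]] = dict(local_memory[instruction["src"]])

# Block 412: the second data mover device consumes the pop instruction and
# retrieves the context values from the fabric into the second local memory.
def data_mover_pop(instruction, memory_fabric, local_memory):
    local_memory[instruction["dst"]] = dict(memory_fabric[instruction["src"]])

first_local_memory = {"context": {"IP": 0x401000, "AX": 7}}
memory_fabric = {}
second_local_memory = {}

data_mover_push({"op": "PUSH", "src": "context", "dst": "proc-ctx-1"},
                first_local_memory, memory_fabric)
data_mover_pop({"op": "POP", "src": "proc-ctx-1", "dst": "context"},
               memory_fabric, second_local_memory)
```

As the surrounding paragraphs note, the same pair of operations may move the process (working) data alongside the context values at substantially the same time.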
(34) The method then proceeds to block 414 where the second processing system in the second server device retrieves the first processing system context values from the second memory system in the second server device. As illustrated in
(35) One of skill in the art in possession of the present disclosure will recognize that the example discussed above with reference to
(36) One of skill in the art in possession of the present disclosure will recognize that different processing devices may arrive at a common way of working with different types of context values which may involve compiler-based optimizations that utilize specific type of push and pop primitive instructions. For example, variations in push primitive instruction types may include:
(37) Push(PUSHING_CONTEXT_FOR_GPU, context values)
(38) Push(PUSHING_CONTEXT_FOR_FPGA, context values)
(39) Push(PUSHING_CONTEXT_FOR_CPU, context values)
(40) Similarly, variations in pop primitive instruction types may include:
(41) Pop(POPPING_CONTEXT_FROM_CPU, context values)
(42) Pop(POPPING_CONTEXT_FROM_GPU, context values)
(43) Pop(POPPING_CONTEXT_FROM_FPGA, context values)
(44) As will be appreciated by one of skill in the art in possession of the present disclosure, pushing CPU context may include pushing the register context and memory context. Similarly, pushing GPU context may include translating the current context values into a GPU kernel context (which is essentially equivalent to GPU kernel code and is optimized to run on a GPU using, e.g., CUDA or OpenCL). Furthermore, in order to resume work in the GPU context, the CPU context values may need to be transformed, which may be assisted by the data mover device in combination with a source-to-source compiler.
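The typed push and pop primitive variations listed above can be rendered, very loosely, as tagged operations: the push records which device type produced the context, and the pop triggers a translation step when the popping device type differs. This is a hypothetical sketch mirroring the patent's pseudocode; a real ISA enhancement would encode these primitives in hardware, and the "translation" here is only a placeholder for the source-to-source compilation described above.

```python
fabric = {}  # stand-in for the memory fabric

PUSHING_CONTEXT_FOR_CPU = "CPU"
PUSHING_CONTEXT_FOR_GPU = "GPU"
PUSHING_CONTEXT_FOR_FPGA = "FPGA"

def push(context_type, context_values):
    # Tag the stored context with the device type that produced it.
    fabric["ctx"] = {"type": context_type, "values": dict(context_values)}

def pop(target_device_type):
    entry = fabric["ctx"]
    values = dict(entry["values"])
    if entry["type"] != target_device_type:
        # Placeholder for the transformation step described above
        # (e.g., CPU register context -> GPU kernel context).
        values = {f"translated_{name}": value for name, value in values.items()}
    return values
```

Popping CPU-pushed context onto another CPU is a straight copy; popping it onto a GPU routes through the translation branch.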
(45) While the data mover device 306b.sub.2 is discussed as transferring the first processing system context values 500a-500e from the memory system 210 that provides the memory fabric to memory system 308.sub.2 at block 412, and the processing system 306.sub.2 is described as retrieving the first processing system context value 500a-500e from the memory system 308.sub.2 for use in the processing system 306.sub.2 at block 414, in some embodiments the data mover device 306b.sub.2 may operate to transfer the first processing system context values 500a-500e from the memory system 210 that provides the memory fabric directly to the processing system 306.sub.2. As such, the processing system 306.sub.2 may generate the second data mover instructions similarly as described above with reference to block 410, with the exception that those second data mover instructions are to retrieve the first processing system context values 500a-500e from the memory system 210 that provides the memory fabric and provide them directly to the processing system 306.sub.2. As such, at block 412 the data mover device 306b.sub.2 may retrieve the first processing system context values 500a-500e via the switch device 208 and from the memory system 210 that provides the memory fabric, and then provide those first processing system context values 500a-500e directly in the processing system 306.sub.2 (i.e., in the CPU device registers/context value storage elements) similarly as described above as being performed by the processing system 306.sub.2 at block 414.
(46) The method then proceeds to block 416 where the second processing system in the second server device executes the process using the first processing system context values retrieved from the second memory system in the second server device. In an embodiment, at block 416, the CPU device 306a.sub.2 may operate to execute the process that was being executed by the CPU device 306a.sub.1 at or prior to block 402 of the method 400. For example, the CPU device 306a.sub.2 may access process data that provides for the execution of the process, and utilize the instruction pointer (IP) context value (which was included in the first processing system context values 500a-500e) in its instruction pointer (IP) context value storage element in order to return to a portion of the process data (e.g., the line of code at which the process was stopped at block 402) and resume the execution of the process according to any or all of the first processing system context values 500a-500e included in its context value registers. As discussed above, in some embodiments the accessing of the process data by the CPU device 306a.sub.2 at block 416 may include accessing process data that was copied to the memory system 308.sub.2 from the memory system 210 by the data mover device 306b.sub.2. However, one of skill in the art in possession of the present disclosure will recognize that the process data may be made accessible to the CPU device 306a.sub.2 via a variety of techniques that would be apparent to one of skill in the art in possession of the present disclosure.
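The resume step of block 416 hinges on the restored instruction pointer. A minimal sketch, assuming process data can be modeled as a list of steps and the IP as an index into it; the function and names are illustrative only.

```python
# Block 416: the second CPU loads the restored IP context value and
# resumes the process at the point where it stopped at block 402.
def resume_process(process_data, context):
    ip = context["IP"]  # line of code at which the process was stopped
    return [step for step in process_data[ip:]]  # execute the remainder

process_data = ["step0", "step1", "step2", "step3"]
restored_context = {"IP": 2, "AX": 7}  # popped from the memory fabric
remaining_steps = resume_process(process_data, restored_context)
```

The other restored context values (AX and the rest) would likewise be loaded into the second CPU's registers before execution resumes.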
(47) Thus, systems and methods have been described that provide for the switching of CPU context between CPUs by a data mover device and via a memory fabric. For example, a first CPU may be executing a process, and a request to move the process to a second CPU may be received. In response, the first CPU may copy its first CPU context to a first local memory system provided for the first CPU, and generate a first data mover instruction to transmit the first CPU context to a Gen-Z memory fabric. A first data mover device (included in the first CPU, coupled to the first CPU, etc.) may receive the first data mover instruction and, in response, may transmit the first CPU context from the first memory system to the Gen-Z memory fabric. A second CPU may then generate a second data mover instruction to retrieve the first CPU context from the memory fabric, and a second data mover device (included in the second CPU, coupled to the second CPU, etc.) may receive the second data mover instruction and, in response, may retrieve the first CPU context from the memory fabric and copy the first CPU context to a second memory system provided for the second CPU. The second CPU may then retrieve the first CPU context from the second memory system, and use the first CPU context to execute the process. As such, CPU context switching is provided via a memory fabric by data mover device(s) that offload many of the CPU context switching operations from the CPUs, thus providing for improved CPU context switching.
(48) Although illustrative embodiments have been shown and described, a wide range of modification, change and substitution is contemplated in the foregoing disclosure and in some instances, some features of the embodiments may be employed without a corresponding use of other features. Accordingly, it is appropriate that the appended claims be construed broadly and in a manner consistent with the scope of the embodiments disclosed herein.