INCREASING GRANULARITY OF DIRTY BIT INFORMATION IN HARDWARE ASSISTED MEMORY MANAGEMENT SYSTEMS
20180004679 · 2018-01-04
Abstract
In a computer system having virtual machines, one or more unused bits of a guest physical address range are allocated for aliasing so that multiple virtually addressed sub-pages can be mapped to a common memory page. When one bit is allocated for aliasing, dirty bit information can be provided at a granularity that is one-half of a memory page. When M bits are allocated for aliasing, dirty bit information can be provided at a granularity that is 1/2^M-th of a memory page.
Claims
1. A data structure used by a memory management unit (MMU) of a computer system having virtual machines instantiated therein, wherein the MMU is configured to execute a first mapping from a guest virtual address space to a guest physical address space and a second mapping from the guest physical address space to a machine memory address space, the data structure comprising: a set of first mapping tables that are hierarchically arranged and define mappings between guest virtual addresses and guest physical addresses; and a set of second mapping tables that are hierarchically arranged and define mappings between guest physical addresses and machine memory addresses, the set of second mapping tables including a root table and a plurality of bottom-level tables, wherein each of a plurality of entries of the bottom-level tables references a machine memory page in common with at least one other entry of the bottom-level tables, and each of the plurality of entries of the bottom-level tables includes a status bit value that indicates whether or not data stored in the machine memory page referenced by said entry is dirty, and said status bit values of at least two entries of the bottom-level tables that reference the same machine memory page are different.
2. The data structure of claim 1, wherein said at least two entries of the bottom-level tables that reference the same machine memory page are associated with at least two different guest physical addresses.
3. The data structure of claim 2, wherein the two different guest physical addresses that reference the same memory page include a first guest physical address that references one of the first mapping tables and a second guest physical address that differs from the first guest physical address by at least one bit value.
4. The data structure of claim 3, wherein the first guest physical address references a root table of the first mapping tables.
5. The data structure of claim 3, wherein the first guest physical address references an intermediate-level table of the first mapping tables.
6. The data structure of claim 3, wherein the first guest physical address references a bottom-level table of the first mapping tables.
7. The data structure of claim 2, wherein the two different guest physical addresses that reference the same memory page include a first guest physical address that references a page of data and a second guest physical address that differs from the first guest physical address by at least one bit value.
8. A method of mapping guest virtual addresses to machine memory addresses in a computer system having virtual machines instantiated therein, comprising: receiving a guest virtual address to be mapped; traversing guest page tables using portions of the guest virtual address to obtain a guest physical address corresponding to the guest virtual address; modifying a binary representation of the guest physical address corresponding to the guest virtual address by copying the value of a first bit of the binary representation to a second bit of the binary representation, wherein the second bit is more significant than the first bit; and translating the guest physical address corresponding to the guest virtual address to a machine memory address using the modified binary representation.
9. The method of claim 8, wherein the guest physical address is translated to the machine memory address using nested page tables.
10. The method of claim 8, wherein said traversing includes: generating a guest physical address corresponding to one of the guest page tables; modifying a binary representation of the guest physical address corresponding to one of the guest page tables by copying the value of a first bit of the binary representation to a second bit of the binary representation, wherein the second bit is more significant than the first bit; and traversing the nested page tables to translate the guest physical address corresponding to one of the guest page tables to a machine memory address.
11. The method of claim 10, wherein bit positions of the first and second bits in the modified binary representation of the guest physical address corresponding to the guest virtual address are the same as bit positions of the first and second bits in the modified binary representation of the guest physical address corresponding to one of the guest page tables.
12. The method of claim 8, wherein the binary representation of the guest physical address corresponding to the guest virtual address is further modified by copying the value of one or more additional bits of the binary representation to one or more additional bits of the binary representation that are more significant.
13. The method of claim 8, further comprising: prior to said modifying, verifying that the second bit has a value of zero.
14. The method of claim 8, further comprising: adding a mapping of the guest virtual address to the machine memory address as an entry to a translation lookaside buffer.
15. A method of migrating an executing state of a virtual machine running in a first computer system to a second computer system, the first and second computer systems each having a memory management unit (MMU) that executes a first mapping from a guest virtual address space to a guest physical address space and a second mapping from the guest physical address space to a machine memory address space using the data structure of claim 1, said method comprising: scanning entries of first and second page tables in the set of second mapping tables that reference a common machine memory page; determining that a first section of the common machine memory page is dirty based on the entry of the first page table that references the common machine memory page and determining that a second section of the common machine memory page is not dirty based on the entry of the second page table that references the common machine memory page; and transmitting the first section of the common machine memory page to the second computer system.
16. The method of claim 15, further comprising: stunning the virtual machine; and transmitting all sections of machine memory pages that are dirty to the second computer system.
17. The method of claim 15, wherein the first and second sections of the common machine memory page do not overlap and each has a size that is 1/2^M of the size of the common machine memory page, where M is an integer greater than or equal to 1.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
[0023] It should be recognized that the various terms, layers and categorizations used to describe the virtualization components in
[0025] Page tables depicted in
[0026] The structure of nested page tables shown in
[0027] In one embodiment of the present invention, when aliasing is permitted, the values of M bits of the guest physical address, bits [11:11-(M-1)], are respectively copied into bits [47:47-(M-1)] of the guest physical address. For example, if M is two, the values of bits [11:10] of the guest physical address are copied into bits [47:46]. This copying is performed by MMU 231; existing x86 processors may, for example, be modified to perform this copying when performing memory accesses. This copying creates 2^M aliased physical address regions within the nested page tables. The contents of corresponding entries of bottom-level page tables in multiple aliased physical address regions are configured to reference the same machine memory page (e.g., bottom-level page tables 324i, 324j reference machine memory page 325E in
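Assuming 4 KB pages and a 48-bit guest physical address space, the bit copying described above can be sketched as follows; `alias_gpa` is a hypothetical helper used only for illustration, not an actual MMU interface:

```python
def alias_gpa(gpa: int, m: int) -> int:
    """Copy bits [11:11-(m-1)] of a guest physical address into
    bits [47:47-(m-1)], producing the aliased address used for the
    nested page walk. A software sketch of the described MMU behavior."""
    for i in range(m):
        bit = (gpa >> (11 - i)) & 1   # source: top of the 4 KB page offset
        gpa |= bit << (47 - i)        # destination: top of the GPA
    return gpa
```

With M equal to two, bits [11:10] land in bits [47:46], so the four possible sub-page indices select four aliased physical address regions.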
[0028] In the page table data structure illustrated in
[0029] Embodiments of the present invention described herein employ guest and nested page tables having a 4-level hierarchy and page sizes of 4 KB. It should be understood that the present invention is applicable to page tables having different levels of hierarchy and to different page sizes by monitoring D bits at higher levels in the page table hierarchy.
[0030] In an alternative embodiment, a hardware configuration register, whose bits [47:12] can be set or cleared by hypervisor 210, is provided. For each bit that is set in this bit vector, the corresponding bit of the guest physical address is claimed as an alias bit; thus, if M bits are set in this configuration register, there are 2^M aliases. The hardware then copies bits [11:11-(M-1)] of the guest physical address into the bit positions corresponding to the set bits of the hardware configuration register, from highest to lowest. The bits that are set to 1 in the hardware configuration register need not be contiguous.
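Under the same assumptions as before, this configuration-register variant can be modeled as follows; `alias_gpa_cfg` and its `cfg` argument are illustrative stand-ins for the described register, not real hardware interfaces:

```python
def alias_gpa_cfg(gpa: int, cfg: int) -> int:
    """Copy the top bits of the page offset into the GPA bit positions
    whose bits are set in the configuration register cfg (a bit vector
    over bits [47:12]). Destinations are filled highest-to-lowest, and
    the set bits need not be contiguous."""
    dests = [b for b in range(47, 11, -1) if (cfg >> b) & 1]  # highest first
    for i, dst in enumerate(dests):
        bit = (gpa >> (11 - i)) & 1   # bits [11:11-(M-1)], in order
        gpa |= bit << dst
    return gpa
```

For example, setting non-contiguous bits 47 and 45 in the register yields two alias bits, with bit 11 routed to position 47 and bit 10 to position 45.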
[0032] The first time through the loop, the guest physical address stored in gCR3 is retrieved at step 421. During subsequent passes through the loop, the guest physical addresses are retrieved from machine memory addresses obtained from nested page walks carried out at step 425. At step 422, bits [47:47-(M-1)] of the retrieved guest physical address are examined for zeroes. If they are all zeroes, the loop continues onto step 424. If one or more of the bits [47:47-(M-1)] are not zero, a page fault is issued at step 423 and the method terminates. At step 424, bit values at bits [11:11-(M-1)] are copied into bits [47:47-(M-1)] to produce the guest physical address to be used for traversing the nested page tables. The nested page tables are traversed at step 425 using the guest physical address produced at step 424 to obtain a machine memory address of the next guest page table to be traversed or the machine memory page corresponding to the guest virtual address to be mapped. If, according to step 426, the guest table page walk is complete (e.g., the page walk shown in
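Steps 422 through 424 of this loop can be sketched as follows, assuming M alias bits and a 48-bit guest physical address; `to_nested_walk_address` is a hypothetical software model of what the hardware page walker does, and the raised exception merely stands in for the page fault of step 423:

```python
def to_nested_walk_address(gpa: int, m: int) -> int:
    """Steps 422-424: verify that bits [47:47-(m-1)] are zero, fault
    otherwise, then copy bits [11:11-(m-1)] into those positions to
    produce the address used for traversing the nested page tables."""
    high_mask = ((1 << m) - 1) << (48 - m)
    if gpa & high_mask:                       # step 422 failed
        raise RuntimeError("page fault: alias bits already set")  # step 423
    sub = (gpa >> (12 - m)) & ((1 << m) - 1)  # bits [11:11-(m-1)]
    return gpa | (sub << (48 - m))            # step 424
```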
[0034] Steps 510 and 512 are carried out to determine whether the timer that was set to the backup time interval has lapsed. If the timer has lapsed, hypervisor 210, at step 514, scans all bottom-level nested page tables for entries that have the dirty bit set to one. Then, at step 516, hypervisor 210 performs a diff operation on all machine memory page sections that are indicated as dirty by those entries. In some cases, the diff operation is also performed on a portion of an adjacent machine memory page section if it is determined that the write operation that dirtied the machine memory page section may have also dirtied (i.e., spilled over to) the adjacent machine memory page section. For example, referring to
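A sketch of how the per-section D bits might be turned into the byte ranges that the diff operation covers; the `spill` parameter is an assumption standing in for the "portion of an adjacent machine memory page section" mentioned above, whose size the description does not specify:

```python
def dirty_ranges(d_bits, page_size=4096, spill=0):
    """Byte ranges within one machine page to be diffed (step 516).

    d_bits: the D bits of the 2**M aliased bottom-level PTEs that map
    the page, one per section, in section order. spill: how many bytes
    of the following section to include when a write may have crossed
    the section boundary (hypothetical parameter)."""
    sec = page_size // len(d_bits)
    return [(i * sec, min((i + 1) * sec + spill, page_size))
            for i, d in enumerate(d_bits) if d]
```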
[0035] In the embodiment of the present invention described above, the diff operation is used to minimize the amount of data being transmitted over the network. It should be recognized that other operations that reduce network bandwidth consumption, such as compression and precopy, may be employed in place of the diff operation.
[0037] At step 610, all machine memory pages of the VM are transmitted to the destination server. While this is happening, the VM continues to run; some of these machine memory pages become dirtied, and the D bits in the entries of bottom-level nested page tables corresponding to these pages are set to one. At step 612, bottom-level nested page tables are scanned for entries that have the dirty bit set to one. Then, at step 614, the total size of the data to be transmitted to the destination server is computed and compared against a threshold. The data to be transmitted includes machine memory page sections referenced by entries in bottom-level nested page tables that have the D bit set to one. In some cases, the data to be transmitted also includes a portion of an adjacent machine memory page section if it is determined that a write operation that dirtied a machine memory page section may have also dirtied the adjacent section. If the total size computed at step 614 is not less than the threshold, all dirty machine memory page sections referenced by entries in bottom-level nested page tables, and any portions of adjacent machine memory page sections that could have been dirtied, are transmitted to the destination server at step 615. The method then returns to step 612 to identify machine memory page sections that may have become dirtied while step 615 was being carried out.
[0038] Returning to the decision block at step 614, if the total size is less than the threshold, the VM is stunned at step 616 and, at step 618, all dirty machine memory page sections referenced by entries in bottom-level nested page tables, and any portions of adjacent machine memory page sections that could have been dirtied, are transmitted to the destination server. After step 618, the method terminates, and hypervisor 210 can hand over execution control of the VM to the destination server.
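The precopy loop of steps 612 through 618 can be sketched as follows; `scan`, `transmit`, and `stun` are assumed stand-ins for hypervisor services, not real APIs:

```python
def precopy_migrate(scan, transmit, stun, threshold):
    """Steps 612-618: retransmit dirty sections while the VM runs;
    once the remaining dirty set is below the threshold, stun the VM
    and send the final set. scan() returns the list of dirty section
    byte-ranges, transmit() sends them, stun() pauses the VM."""
    while True:
        dirty = scan()                                    # step 612
        total = sum(end - start for start, end in dirty)  # step 614
        if total < threshold:
            stun()                                        # step 616
            transmit(dirty)                               # step 618
            return
        transmit(dirty)                                   # step 615, then rescan
```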
[0039] Alternative embodiments of the present invention include a backup method in which the diff operation is not performed and entire machine memory page sections are transmitted to the backup machine, and a migration method in which the diff operation is performed and only the changed parts of machine memory page sections are transmitted to the destination server. In the examples given above, more granular dirty bit information provides savings in computational power in the case where diff operations are performed and only the changed portions are transmitted over the network to the target machine, and provides savings in network bandwidth consumption in the case where diff operations are not performed and machine memory page sections in their entirety are transmitted over the network to the target machine.
[0040] In a further embodiment of the present invention, the conventional component of the MMU that handles write operations that span more than one machine memory page is modified to also handle write operations that span more than one machine memory page section within a single machine memory page. With this modification, a write operation that spans more than one machine memory page section within a single machine memory page is translated into two separate write operations or two separate TLB and MMU interactions, each of which is confined to a single machine memory page section. As a result, a write operation that dirties a machine memory page section and spills over to another machine memory page section across a page section boundary to dirty the adjacent machine memory page section is translated into two separate write operations that cause the dirty bits in the PTEs that reference these two machine memory page sections to be set to 1. In addition, when checking to see if a write operation spans more than one machine memory page section within a single machine memory page, only the first and last bytes of the write operation are checked. It should be recognized that checking the first and last bytes in this manner is valid because write operations are strictly smaller than a page size.
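The split described above can be sketched as follows, assuming a write crosses at most one section boundary (which holds when writes are smaller than a section); `split_write` is illustrative only:

```python
def split_write(addr: int, size: int, section_size: int):
    """Split a write that crosses a section boundary into two
    per-section writes, as in the MMU modification described above.
    Only the first and last byte positions are examined; each
    resulting write dirties exactly one section."""
    first = addr // section_size                 # section of the first byte
    last = (addr + size - 1) // section_size     # section of the last byte
    if first == last:
        return [(addr, size)]                    # confined to one section
    boundary = last * section_size               # the crossed boundary
    return [(addr, boundary - addr), (boundary, addr + size - boundary)]
```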
[0041] It should be recognized that an additional benefit of the embodiments of the present invention described herein is that accessed bits acquire the same improvement in granularity that the dirty bits gain. Therefore, further embodiments of the present invention contemplate the use of the more granular accessed bit information in various applications in ways that would be evident to persons of skill in the art.
[0042] The various embodiments described herein may employ various computer-implemented operations involving data stored in computer systems. For example, these operations may require physical manipulation of physical quantities which usually, though not necessarily, take the form of electrical or magnetic signals where they, or representations of them, are capable of being stored, transferred, combined, compared, or otherwise manipulated. Further, such manipulations are often referred to in terms, such as producing, identifying, determining, or comparing. Any operations described herein that form part of one or more embodiments of the invention may be useful machine operations. In addition, one or more embodiments of the invention also relate to a device or an apparatus for performing these operations. The apparatus may be specially constructed for specific required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose machines may be used with computer programs written in accordance with the description provided herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
[0043] The various embodiments described herein may be practiced with other computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.
[0044] One or more embodiments of the present invention may be implemented as one or more computer programs or as one or more computer program modules embodied in one or more computer readable media. The term computer readable medium refers to any data storage device that can store data which can thereafter be input to a computer system; computer readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer. Examples of a computer readable medium include a hard drive, network attached storage (NAS), read-only memory, random-access memory (e.g., a flash memory device), a CD-ROM (Compact Disc-ROM), a CD-R, or a CD-RW, a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over a network-coupled computer system so that the computer readable code is stored and executed in a distributed fashion.
[0045] Although one or more embodiments of the present invention have been described in some detail for clarity of understanding, it will be apparent that certain changes and modifications may be made within the scope of the claims. Accordingly, the described embodiments are to be considered as illustrative and not restrictive, and the scope of the claims is not to be limited to details given herein, but may be modified within the scope and equivalents of the claims. In the claims, elements and/or steps do not imply any particular order of operation, unless explicitly stated in the claims.
[0046] Plural instances may be provided for components, operations or structures described herein as a single instance. Finally, boundaries between various components, operations and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the invention(s). In general, structures and functionality presented as separate components in exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the appended claim(s).