Patent classifications
G06F2212/655
Systems and methods for reducing first level cache energy by eliminating cache address tags
Methods and systems which, for example, reduce energy usage in cache memories are described. Cache location information regarding the location of cachelines which are stored in a tracked portion of a memory hierarchy is stored in a cache location table. Address tags are stored with corresponding location information in the cache location table to associate the address tag with the cacheline and its cache location information. When a cacheline is moved to a new location in the memory hierarchy, the cache location table is updated so that the cache location information indicates where the cacheline is located within the memory hierarchy.
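The bookkeeping described in this abstract can be sketched as a small table keyed by address tag. This is a minimal illustration, not the patented design: the names CacheLocationTable, Location, install, move and lookup, and the level labels "L1" and "L2", are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Location:
    level: str  # hypothetical label for a level in the hierarchy, e.g. "L1"
    way: int    # hypothetical way index within that level


class CacheLocationTable:
    """Associates an address tag with the current location of its cacheline."""

    def __init__(self):
        self._entries = {}  # address tag -> Location

    def install(self, tag, level, way):
        # Store the tag together with its location information.
        self._entries[tag] = Location(level, way)

    def move(self, tag, new_level, new_way):
        # When a cacheline moves, update the table so it points at the new place.
        if tag not in self._entries:
            raise KeyError(f"tag {tag:#x} is not tracked")
        self._entries[tag] = Location(new_level, new_way)

    def lookup(self, tag):
        # Return the recorded location, or None for an untracked tag.
        return self._entries.get(tag)
```

In this sketch a lookup returns the location directly, so no per-cacheline tag comparison is modelled.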
Stride prefetching across memory pages
A prefetcher maintains the state of stored prefetch information, such as a prefetch confidence level, when a prefetch would cross a memory page boundary. The maintained prefetch information can be used both to identify whether the stride pattern for a particular sequence of demand requests persists after the memory page boundary has been crossed, and to continue to issue prefetch requests according to the identified pattern. The prefetcher therefore does not have to re-identify a stride pattern each time a page boundary is crossed by a sequence of demand requests, thereby improving the efficiency and accuracy of the prefetcher.
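The effect of keeping prefetch state across a page boundary can be shown with a simplified sketch. The page size, the confidence threshold, and the class and parameter names below are assumptions for illustration. The sketch also tracks only demand addresses, whereas the abstract speaks of prefetches that would cross the boundary.

```python
PAGE_SIZE = 4096     # assumed page size in bytes
CONF_THRESHOLD = 2   # assumed confidence needed before prefetching


class StridePrefetcher:
    def __init__(self, keep_state_across_pages=True):
        self.keep_state_across_pages = keep_state_across_pages
        self.last_addr = None
        self.stride = None
        self.confidence = 0

    def access(self, addr):
        """Observe a demand request; return an address to prefetch or None."""
        crossed = (self.last_addr is not None
                   and addr // PAGE_SIZE != self.last_addr // PAGE_SIZE)
        if crossed and not self.keep_state_across_pages:
            # Baseline behaviour for contrast: forget the pattern at the boundary.
            self.stride, self.confidence = None, 0
            self.last_addr = addr
            return None
        if self.last_addr is not None:
            delta = addr - self.last_addr
            if delta == self.stride:
                self.confidence += 1
            else:
                self.stride, self.confidence = delta, 0
        self.last_addr = addr
        if self.stride and self.confidence >= CONF_THRESHOLD:
            return addr + self.stride
        return None


trace = [1024, 2048, 3072, 4096, 5120]  # stride of 1024 crossing into page 1
keep = StridePrefetcher(keep_state_across_pages=True)
print([keep.access(a) for a in trace])   # [None, None, None, 5120, 6144]
reset = StridePrefetcher(keep_state_across_pages=False)
print([reset.access(a) for a in trace])  # [None, None, None, None, None]
```

With state kept, prefetching starts at the first access in the new page. With state reset, the pattern has to be learned again from scratch.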
PREFETCHING DATA TO REDUCE CACHE MISSES
A first memory request including a first virtual address is received. An entry in memory is accessed. The entry is selected using information associated with the first memory request, and includes at least a portion of a second virtual address (first data) and at least a portion of a third virtual address (second data). The difference between the first data and the second data is compared with differences between a corresponding portion of the first virtual address and the first data and the second data respectively. When a result of the comparison is true, then a fourth virtual address is determined by adding the difference between the first data and the second data to the first virtual address, and then data at the fourth virtual address is prefetched into the cache.
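The comparison step is stated tersely, so the following is only one possible reading of it: the pattern holds when the new address is as far from the first data as the first data is from the second data. The function name next_prefetch and the use of full addresses instead of address portions are assumptions.

```python
def next_prefetch(current, first_data, second_data):
    """Return the address to prefetch, or None if the deltas do not match.

    current     -- the first virtual address (the new request)
    first_data  -- the stored portion of the second virtual address
    second_data -- the stored portion of the third virtual address
    """
    delta = first_data - second_data
    # Assumed test: current - first_data equals the stored delta. This also
    # implies that current - second_data equals twice the delta.
    if delta != 0 and current - first_data == delta:
        return current + delta  # the fourth virtual address
    return None


print(hex(next_prefetch(0x3000, 0x2000, 0x1000)))  # 0x4000
print(next_prefetch(0x3000, 0x2000, 0x1800))       # None
```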
HYPERVISOR DEDUPLICATION PAGE COMPARISON SPEEDUP
A hypervisor deduplication system includes a memory, a processor in communication with the memory, and a hypervisor executing on the processor. The hypervisor is configured to scan a first page, detect that the first page is an unchanged page, check a first free page hint, and insert the unchanged page into a tree. Responsive to inserting the unchanged page into the tree, the hypervisor compares the unchanged page to other pages in the tree and determines a status of the unchanged page as matching one of the other pages or mismatching the other pages in the tree. Responsive to determining the status of the page as matching another page, the hypervisor deduplicates the unchanged page. Additionally, the hypervisor is configured to scan a second page of the memory, check a second free page hint, and deduplicate the second page if the second free page hint indicates that the page is unused.
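The scan loop can be sketched as follows. This is a rough illustration under several assumptions: pages are modelled as bytes objects, a dictionary keyed by page content stands in for the comparison tree, and the names scan, unchanged, free_hints and FREE are hypothetical.

```python
FREE = "free"  # hypothetical marker for a page deduplicated via its free page hint


def scan(pages, unchanged, free_hints):
    """Return {page_id: canonical_page_id or FREE} for every deduplicated page.

    pages      -- dict mapping page id to page content (bytes)
    unchanged  -- set of page ids detected as unchanged since the last scan
    free_hints -- set of page ids whose free page hint marks them as unused
    """
    tree = {}     # content -> id of the first page seen with that content
    deduped = {}
    for pid in sorted(pages):
        if pid in free_hints:
            # Unused page: deduplicate without comparing its content.
            deduped[pid] = FREE
            continue
        if pid not in unchanged:
            continue  # recently changed pages are skipped in this sketch
        content = pages[pid]
        if content in tree:
            deduped[pid] = tree[content]  # matches a page already in the tree
        else:
            tree[content] = pid           # mismatch: keep it for later comparisons
    return deduped


pages = {1: b"a" * 8, 2: b"a" * 8, 3: b"b" * 8, 4: b"stale"}
print(scan(pages, unchanged={1, 2, 3}, free_hints={4}))  # {2: 1, 4: 'free'}
```

The speedup in this sketch comes from the first branch, where a page hinted as free never enters the comparison path.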
Executing load-store operations without address translation hardware per load-store unit port
Technical solutions are described for out-of-order (OoO) execution of one or more instructions by a processing unit. An example method includes receiving, by a load-store unit (LSU) of the processing unit, an OoO window of instructions including a plurality of instructions to be executed OoO, and issuing, by the LSU, instructions from the OoO window. The issuing includes selecting an instruction from the OoO window, the instruction using an effective address. Further, in response to the instruction being a load instruction, it is determined whether the effective address is present in an effective address directory (EAD). In response to the effective address being present in the EAD, the load instruction is issued using the effective address. Further, in response to the instruction being a store instruction, a real address mapped to the effective address is determined from an effective-real translation (ERT) table, and the store instruction is issued using the real address.
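The load and store paths differ in which address they issue with, which a short sketch can make concrete. The function name issue, the tuple it returns, and the use of a set for the EAD and a dictionary for the ERT are assumptions. The abstract does not say what happens on an EAD miss, so the sketch simply returns None there.

```python
def issue(kind, ea, ead, ert):
    """Return (kind, address_type, address) for an issued instruction, or None.

    ead -- set of effective addresses present in the EAD
    ert -- dict mapping effective address to real address
    """
    if kind == "load":
        # Loads issue with the effective address when the EAD holds it.
        if ea in ead:
            return ("load", "EA", ea)
        return None  # EAD-miss handling is not covered by the abstract
    if kind == "store":
        # Stores issue with the real address taken from the ERT.
        ra = ert.get(ea)
        return ("store", "RA", ra) if ra is not None else None
    raise ValueError(f"unknown instruction kind: {kind}")
```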
Effective address based load store unit in out of order processors
Technical solutions are described for out-of-order (OoO) execution of one or more instructions by a processing unit. An example method includes looking up, by a load-store unit (LSU), an entry in an effective address directory (EAD) for an effective address (EA) of an operand of an instruction to be launched. Further, the method includes, in response to the EA being present in the EAD, launching, by the LSU, the instruction with the real address (RA) from the EAD, and in response to the EA not being present in the EAD, looking up, by the LSU, the EA in an effective real table (ERT), and launching the instruction with the RA from the matching ERT entry. Further, in response to the ERT entry being removed, the ERT entry including an ERT index and a mapping between the EA and the RA, the method includes removing the entry of the EA from the EAD.
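The two-level lookup and the invalidation rule can be sketched as below. The class and method names are hypothetical, both tables are modelled as dictionaries, and filling the EAD after an ERT hit is an assumption that the abstract does not state.

```python
class LoadStoreUnit:
    def __init__(self, ert):
        self.ert = dict(ert)  # EA -> RA (effective real table)
        self.ead = {}         # EA -> RA (effective address directory)

    def launch(self, ea):
        """Return (ra, source) where source names the table that supplied the RA."""
        if ea in self.ead:
            return self.ead[ea], "EAD"
        if ea in self.ert:
            ra = self.ert[ea]
            self.ead[ea] = ra  # assumption: remember the mapping in the EAD
            return ra, "ERT"
        return None, "miss"    # full translation is outside this sketch

    def remove_ert_entry(self, ea):
        # Removing an ERT entry also removes the EA's entry from the EAD,
        # so the directory never outlives the mapping it depends on.
        self.ert.pop(ea, None)
        self.ead.pop(ea, None)


lsu = LoadStoreUnit({0x10: 0xA0})
print(lsu.launch(0x10))  # (160, 'ERT')
print(lsu.launch(0x10))  # (160, 'EAD')
lsu.remove_ert_entry(0x10)
print(lsu.launch(0x10))  # (None, 'miss')
```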
Handling effective address synonyms in a load-store unit that operates without address translation
Technical solutions are described for issuing, by a load-store unit (LSU), a plurality of instructions from an out-of-order (OoO) window. The issuing includes, in response to determining a first effective address being used by a first instruction, the first effective address corresponding to a first real address, creating an effective real table (ERT) entry in an ERT, the ERT entry mapping the first effective address to the first real address. Further, the issuing includes, in response to determining an effective address synonym used by a second instruction, the effective address synonym being a second effective address that also corresponds to said first real address: creating a synonym detection table (SDT) entry in an SDT, wherein the SDT entry maps the second effective address to the ERT entry, and relaunching the second instruction by replacing the second effective address in the second instruction with the first effective address.
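Synonym handling can be sketched with two dictionaries. The class name SynonymTracker, the helper map ra_owner, and the assumption that the real address of each effective address is already known when issue is called are all illustrative choices, not details from the abstract.

```python
class SynonymTracker:
    def __init__(self):
        self.ert = {}       # first EA -> RA
        self.sdt = {}       # synonym EA -> EA that owns the ERT entry
        self.ra_owner = {}  # RA -> EA of its ERT entry (helper for the sketch)

    def issue(self, ea, ra):
        """Return the effective address the instruction is launched with."""
        if ea in self.sdt:
            return self.sdt[ea]       # known synonym: use the first EA
        owner = self.ra_owner.get(ra)
        if owner is None:
            self.ert[ea] = ra         # first EA seen for this RA: create ERT entry
            self.ra_owner[ra] = ea
            return ea
        if owner != ea:
            self.sdt[ea] = owner      # synonym detected: create SDT entry
            return owner              # relaunch with the first EA substituted
        return ea


t = SynonymTracker()
print(hex(t.issue(0x100, 0xA)))  # 0x100
print(hex(t.issue(0x200, 0xA)))  # 0x100
```

After the second call, the SDT maps 0x200 to 0x100 and the ERT still holds a single entry for the real address.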
Filtering of redundantly scheduled write passes
Techniques are described for improving access to a cache by a processing unit. One or more previous requests to access data from a cache are stored. A current request to access data from the cache is retrieved. A determination is made whether the current request is seeking the same data from the cache as at least one of the one or more previous requests. A further determination is made whether the at least one of the one or more previous requests seeking the same data was successful in arbitrating access to a processing unit when seeking access. A next cache write access is suppressed if the at least one of the one or more previous requests seeking the same data was successful in arbitrating access to the processing unit.
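The filtering decision can be sketched with a short history of earlier requests. The class name WriteFilter, the history depth, and the representation of a request as an (address, won_arbitration) pair are assumptions for illustration.

```python
from collections import deque


class WriteFilter:
    def __init__(self, depth=4):
        # Bounded history of (address, won_arbitration) for previous requests.
        self.history = deque(maxlen=depth)

    def record(self, address, won_arbitration):
        self.history.append((address, won_arbitration))

    def should_suppress(self, address):
        # Suppress the next write access only if an earlier request for the
        # same data already won arbitration to the processing unit.
        return any(a == address and won for a, won in self.history)


f = WriteFilter()
f.record(0x40, True)
f.record(0x80, False)
print(f.should_suppress(0x40))  # True
print(f.should_suppress(0x80))  # False
```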