Priority-Based Cache-Line Fitting in Compressed Memory Systems of Processor-Based Systems
20230236961 · 2023-07-27
Inventors
- Norris GENG (San Diego, CA, US)
- Richard SENIOR (San Diego, CA, US)
- Gurvinder Singh CHHABRA (San Diego, CA, US)
- Kan WANG (San Diego, CA, US)
Abstract
A compressed memory system of a processor-based system includes a memory partitioning circuit for partitioning a memory region into data regions with different priority levels. The system also includes a cache line selection circuit for selecting a first cache line from a high priority data region and a second cache line from a low priority data region. The system also includes a compression circuit for compressing the cache lines to obtain a first and a second compressed cache line. The system also includes a cache line packing circuit for packing the compressed cache lines such that the first compressed cache line is written to a first predetermined portion and the second cache line or a portion of the second compressed cache line is written to a second predetermined portion of the candidate compressed cache line. The first predetermined portion is larger than the second predetermined portion.
Claims
1. A method for compressing data in a compressed memory system of a processor-based system, comprising: partitioning a memory region into a plurality of data regions, each data region associated with a respective priority level; selecting (i) a first cache line from a first data region of the plurality of data regions and (ii) a second cache line from a second data region of the plurality of data regions, wherein the first data region has a higher priority level than the second data region; compressing (i) the first cache line to obtain a first compressed cache line and (ii) the second cache line to obtain a second compressed cache line; and in accordance with a determination that the first cache line is compressible: writing (i) the first compressed cache line to a first predetermined portion of a candidate compressed cache line, and (ii) either the second cache line or a second portion of the second compressed cache line to a second predetermined portion of the candidate compressed cache line, wherein the first predetermined portion is larger than the second predetermined portion.
2. The method of claim 1, further comprising: in accordance with a determination that (i) the first cache line is not compressible, (ii) the second cache line is not compressible, or (iii) the second compressed cache line does not fit within the second predetermined portion of the candidate compressed cache line, setting an overflow pointer in the candidate compressed cache line, wherein the overflow pointer points to one of a plurality of overflow blocks depending on compressibility of the second cache line or a size of the second compressed cache line, and wherein each overflow block of the plurality of overflow blocks is of a different size.
3. The method of claim 2, further comprising: receiving a read request for the first cache line or the second cache line; and in response to receiving the read request: in accordance with a determination that the overflow pointer is set, retrieving data from an overflow block of the plurality of overflow blocks according to the overflow pointer.
4. The method of claim 1, further comprising: in accordance with a determination that the first cache line is compressible: setting a first compressibility control bit in the candidate compressed cache line; and in accordance with a determination that the first cache line is not compressible: writing a first portion of the first cache line to the candidate compressed cache line; writing a remaining portion of the first cache line to an overflow block; resetting the first compressibility control bit in the candidate compressed cache line; and setting an overflow pointer in the candidate compressed cache line to point to the overflow block.
5. The method of claim 4, further comprising: in response to receiving a read request for the first cache line: in accordance with a determination that the first compressibility control bit is set: retrieving the first cache line from the candidate compressed cache line; and in accordance with a determination that the first compressibility control bit is reset: retrieving the first portion of the first cache line from the candidate compressed cache line; and retrieving the second portion of the first cache line from the overflow block based on the overflow pointer.
6. The method of claim 4, further comprising: in accordance with a determination that the first cache line is not compressible: writing either the second cache line or the second compressed cache line to the overflow block depending on whether the second cache line is compressible; and resetting a second compressibility control bit in the candidate compressed cache line to indicate if the second cache line is not compressible.
7. The method of claim 6, further comprising: in response to receiving a read request for the second cache line: retrieving either the second cache line or the second compressed cache line from the overflow block, based on the second compressibility control bit.
8. The method of claim 2, further comprising: in response to receiving a cache line write request for the first cache line: compressing the first cache line to obtain a first updated cache line of a first size; in accordance with a determination that the first size is equal to or less than a first predetermined size of the candidate compressed cache line, writing the first updated cache line to the candidate compressed cache line; in accordance with a determination that the first size is more than the first predetermined size and equal to or less than a second predetermined size of the candidate compressed cache line, performing a read-modify-write operation on the candidate compressed cache line based on the first updated cache line; and in accordance with a determination that the first size is more than the second predetermined size of the candidate compressed cache line, performing a read-modify-write operation on the candidate compressed cache line and a read-modify-write operation on an overflow block of the plurality of overflow blocks, based on the first updated cache line.
9. The method of claim 8, wherein the first predetermined size is half the size of the candidate compressed cache line.
10. The method of claim 8, further comprising: while writing either the second cache line or the second portion of the second compressed cache line to the second predetermined portion of the candidate compressed cache line, writing an ending bit index in the candidate compressed cache line to indicate where the second cache line or the second portion of the second compressed cache line was written to within the candidate compressed cache line; and computing the second predetermined size based on the ending bit index in the candidate compressed cache line.
11. The method of claim 2, further comprising: in response to receiving a cache line write request for the second cache line: compressing the second cache line to obtain a second updated cache line of a second size; in accordance with a determination that a sum of the second size and the size of the first compressed cache line is less than a first predetermined size of the candidate compressed cache line, performing a read-modify-write operation to write the second updated cache line to the candidate compressed cache line; and in accordance with a determination that the sum of the second size and the size of the first compressed cache line is not less than the first predetermined size of the candidate compressed cache line, performing (i) a first read-modify-write operation to write a first portion of the second updated cache line to the candidate compressed cache line, and (ii) a second read-modify-write operation to write a remaining portion of the second updated cache line to an overflow block pointed to by the overflow pointer.
12. The method of claim 2, further comprising: in response to receiving a cache line write request for the first cache line or the second cache line: compressing the first cache line or the second cache line to obtain an updated compressed cache line of an updated size; and in accordance with a determination that the updated size cannot fit within an overflow block pointed to by the overflow pointer, freeing the overflow pointer and updating the overflow pointer to point to a new overflow block of the plurality of overflow blocks.
13. The method of claim 1, wherein the first compressed cache line, the second compressed cache line, and the candidate compressed cache line are of equal size.
14. The method of claim 1, wherein the first data region and the second data region are of equal size.
15. The method of claim 1, wherein the second predetermined portion is less than half the size of the candidate compressed cache line.
16. The method of claim 1, wherein the first compressed cache line and the second compressed cache line are written to the candidate compressed cache line in opposite directions, and wherein the first compressed cache line and the second compressed cache line are separated by one or more bytes.
17. A compressed memory system of a processor-based system, comprising: a memory partitioning circuit configured to partition a memory region into a plurality of data regions, each data region associated with a respective priority level; a cache line selection circuit configured to select (i) a first cache line from a first data region of the plurality of data regions and (ii) a second cache line from a second data region of the plurality of data regions, wherein the first data region has a higher priority level than the second data region; a compression circuit configured to compress (i) the first cache line to obtain a first compressed cache line and (ii) the second cache line to obtain a second compressed cache line; and a cache line packing circuit configured to: in accordance with a determination that the first cache line is compressible: write (i) the first compressed cache line to a first predetermined portion of a candidate compressed cache line, and (ii) either the second cache line or a second portion of the second compressed cache line to a second predetermined portion of the candidate compressed cache line, wherein the first predetermined portion is larger than the second predetermined portion.
18. The compressed memory system of claim 17, wherein the cache line packing circuit is further configured to: in accordance with a determination that (i) the first cache line is not compressible, (ii) the second cache line is not compressible, or (iii) the second compressed cache line does not fit within the second predetermined portion of the candidate compressed cache line: set an overflow pointer in the candidate compressed cache line, wherein the overflow pointer points to one of a plurality of overflow blocks depending on compressibility of the second cache line or a size of the second compressed cache line, and wherein each overflow block of the plurality of overflow blocks is of a different size.
19. The compressed memory system of claim 18, wherein the cache line packing circuit is further configured to: receive a read request for the first cache line or the second cache line; and in response to receiving the read request: in accordance with a determination that the overflow pointer is set, retrieve data from an overflow block of the plurality of overflow blocks according to the overflow pointer.
20. The compressed memory system of claim 17, wherein the cache line packing circuit is further configured to: in accordance with a determination that the first cache line is compressible: set a first compressibility control bit in the candidate compressed cache line; and in accordance with a determination that the first cache line is not compressible: write a first portion of the first cache line to the candidate compressed cache line; write a remaining portion of the first cache line to an overflow block; reset the first compressibility control bit in the candidate compressed cache line; and set an overflow pointer in the candidate compressed cache line to point to the overflow block.
21. The compressed memory system of claim 20, wherein the cache line packing circuit is further configured to: in response to receiving a read request for the first cache line: in accordance with a determination that the first compressibility control bit is set: retrieve the first cache line from the candidate compressed cache line; and in accordance with a determination that the first compressibility control bit is reset: retrieve the first portion of the first cache line from the candidate compressed cache line; and retrieve the second portion of the first cache line from the overflow block based on the overflow pointer.
22. The compressed memory system of claim 21, wherein the cache line packing circuit is further configured to: in accordance with a determination that the first cache line is not compressible: write either the second cache line or the second compressed cache line to the overflow block depending on whether the second cache line is compressible; and reset a second compressibility control bit in the candidate compressed cache line to indicate if the second cache line is not compressible.
23. The compressed memory system of claim 22, wherein the cache line packing circuit is further configured to: in response to receiving a read request for the second cache line: retrieve either the second cache line or the second compressed cache line from the overflow block, based on the second compressibility control bit.
24. The compressed memory system of claim 18, wherein the cache line packing circuit is further configured to: in response to receiving a cache line write request for the first cache line: compress the first cache line to obtain a first updated cache line of a first size; in accordance with a determination that the first size is equal to or less than a first predetermined size of the candidate compressed cache line, write the first updated cache line to the candidate compressed cache line; in accordance with a determination that the first size is more than the first predetermined size and equal to or less than a second predetermined size of the candidate compressed cache line, perform a read-modify-write operation on the candidate compressed cache line based on the first updated cache line; and in accordance with a determination that the first size is more than the second predetermined size of the candidate compressed cache line, perform a read-modify-write operation on the candidate compressed cache line and a read-modify-write operation on an overflow block of the plurality of overflow blocks, based on the first updated cache line.
25. The compressed memory system of claim 24, wherein the first predetermined size is half the size of the candidate compressed cache line.
26. The compressed memory system of claim 24, wherein the cache line packing circuit is further configured to: while writing either the second cache line or the second portion of the second compressed cache line to the second predetermined portion of the candidate compressed cache line, write an ending bit index in the candidate compressed cache line to indicate where the second cache line or the second portion of the second compressed cache line was written to within the candidate compressed cache line; and compute the second predetermined size based on the ending bit index in the candidate compressed cache line.
27. The compressed memory system of claim 18, wherein the cache line packing circuit is further configured to: in response to receiving a cache line write request for the second cache line: compress the second cache line to obtain a second updated cache line of a second size; in accordance with a determination that a sum of the second size and the size of the first compressed cache line is less than a first predetermined size of the candidate compressed cache line, perform a read-modify-write operation to write the second updated cache line to the candidate compressed cache line; and in accordance with a determination that the sum of the second size and the size of the first compressed cache line is not less than the first predetermined size of the candidate compressed cache line, perform (i) a first read-modify-write operation to write a first portion of the second updated cache line to the candidate compressed cache line, and (ii) a second read-modify-write operation to write a remaining portion of the second updated cache line to the overflow block pointed to by the overflow pointer.
28. The compressed memory system of claim 18, wherein the cache line packing circuit is further configured to: in response to receiving a cache line write request for the first cache line or the second cache line: compress the first cache line or the second cache line to obtain an updated compressed cache line of an updated size; and in accordance with a determination that the updated size cannot fit within the overflow block pointed to by the overflow pointer, free the overflow pointer and update the overflow pointer to point to a new overflow block of the plurality of overflow blocks.
29. The compressed memory system of claim 17, wherein the first compressed cache line, the second compressed cache line, and the candidate compressed cache line are of equal size.
30. The compressed memory system of claim 17, wherein the first data region and the second data region are of equal size.
31. The compressed memory system of claim 17, wherein the second predetermined portion is less than half the size of the candidate compressed cache line.
32. The compressed memory system of claim 17, wherein the cache line packing circuit is further configured to: write the first compressed cache line and the second compressed cache line to the candidate compressed cache line in opposite directions, wherein the first compressed cache line and the second compressed cache line are separated by one or more bytes.
33. A non-transitory computer-readable medium having stored thereon computer-executable instructions which, when executed by a processor, cause the processor to: partition a memory region into a plurality of data regions, each data region associated with a respective priority level; select (i) a first cache line from a first data region of the plurality of data regions and (ii) a second cache line from a second data region of the plurality of data regions, wherein the first data region has a higher priority level than the second data region; compress (i) the first cache line to obtain a first compressed cache line and (ii) the second cache line to obtain a second compressed cache line; and in accordance with a determination that the first cache line is compressible: write (i) the first compressed cache line to a first predetermined portion of a candidate compressed cache line, and (ii) either the second cache line or a second portion of the second compressed cache line to a second predetermined portion of the candidate compressed cache line, wherein the first predetermined portion is larger than the second predetermined portion.
34. The non-transitory computer-readable medium of claim 33 having stored thereon computer-executable instructions which, when executed by a processor, further cause the processor to: in accordance with a determination that (i) the first cache line is not compressible, (ii) the second cache line is not compressible, or (iii) the second compressed cache line does not fit within the second predetermined portion of the candidate compressed cache line: set an overflow pointer in the candidate compressed cache line, wherein the overflow pointer points to one of a plurality of overflow blocks depending on compressibility of the second cache line or a size of the second compressed cache line, and wherein each overflow block of the plurality of overflow blocks is of a different size.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0048] So that the present disclosure can be understood in greater detail, a more particular description may be had by reference to the features of various embodiments, some of which are illustrated in the appended drawings. The appended drawings, however, merely illustrate the more pertinent features of the present disclosure and are therefore not to be considered limiting, for the description may admit to other effective features.
[0060] In accordance with common practice the various features illustrated in the drawings may not be drawn to scale. Accordingly, the dimensions of the various features may be arbitrarily expanded or reduced for clarity. In addition, some of the drawings may not depict all of the components of a given system, method or device. Finally, like reference numerals may be used to denote like features throughout the specification and figures.
DETAILED DESCRIPTION
[0061] Numerous details are described herein to provide a thorough understanding of the example implementations illustrated in the accompanying drawings. However, some embodiments may be practiced without many of the specific details, and the scope of the claims is only limited by those features and aspects specifically recited in the claims. Furthermore, well-known methods, components, and circuits have not been described in exhaustive detail so as not to unnecessarily obscure more pertinent aspects of the implementations described herein.
[0064] However, to provide for faster memory access without the need to compress and decompress, cache memory 214 (e.g., the L2 cache 106) is provided. Cache entries in the cache memory 214 are configured to store the cache data in uncompressed form. Each of the cache entries may be the same width as each of the memory entries for performing efficient memory read and write operations. The cache entries are accessed by respective virtual address (“VA”) tags (e.g., tags stored in the L2 cache tag 108), because as discussed above, the compressed memory system 200 provides more addressable memory space to the processor 104 than physical address space provided in the compressed data region. When the processor 104 issues a memory read request for a memory read operation, a virtual address of the memory read request is used to search the cache memory 214 to determine if the virtual address matches one of the virtual address tags of the cache entries. If so, a cache hit occurs, and the cache data in the hit cache entry of the cache entries is returned to the processor 104 without the need to decompress the cache data. However, because the number of the cache entries is less than the number of the memory entries, a cache miss can occur where the cache data for the memory read request is not contained in the cache memory 214.
[0065] Thus, with continuing reference to
[0066] With continuing reference to
[0067] To do so, the cache memory 214 first sends the virtual address and uncompressed cache data of the evicted cache entry to the compress circuit. The compress circuit receives the virtual address and the uncompressed cache data for the evicted cache entry. The compress circuit initiates a metadata read operation to the metadata cache to obtain metadata associated with the virtual address. During, before, or after the metadata read operation, the compress circuit compresses the uncompressed cache data into compressed data to be stored in the compressed data region. If the metadata read operation to the metadata cache results in a cache miss, the metadata cache issues a metadata read operation to the metadata circuit 210 in the system memory 112 to obtain metadata associated with the virtual address. The metadata cache is then stalled. Because accesses to the compressed data region can take much longer than the processor 104 can issue memory access operations, uncompressed data received from the processor 104 for subsequent memory write requests may be buffered in a memory request buffer.
[0068] After the metadata comes back from the compressed data region to update the metadata cache, the metadata cache provides the metadata to the compress circuit. The compress circuit determines whether the new compressed size of the data fits into the same memory block size in the compressed data region as was previously used to store data for the virtual address of the evicted cache entry. For example, the processor 104 may have updated the cache data in the evicted cache entry since it was last stored in the compressed data region. If a new memory block is needed to store the compressed data for the evicted cache entry, the compress circuit recycles the pointer to the current memory block in the compressed memory system 200 associated with the virtual address of the evicted cache entry to one of the free memory lists (e.g., list of free 64B blocks 216, list of free 48B blocks 218, list of free 32B blocks 220, and list of free 16B blocks 222) of pointers to available memory blocks in the compressed data region. The compress circuit then obtains, from one of the free memory lists, a pointer to a new, available memory block of the desired memory block size in the compressed data region to store the compressed data for the evicted cache entry. The compress circuit then stores the compressed data for the evicted cache entry in the memory block in the compressed data region associated with the virtual address for the evicted cache entry, as determined from the metadata.
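The block recycling described in this paragraph can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the circuit itself: the class and function names are hypothetical, pointers are modeled as plain integers, and a block is assumed to be swapped whenever the size class needed by the new compressed data differs from the current one. Only the four block sizes (16B, 32B, 48B, 64B) come from the text.

```python
# Block sizes in bytes, matching the free lists named in the text.
BLOCK_SIZES = (16, 32, 48, 64)


def block_size_for(compressed_len: int) -> int:
    """Return the smallest block size (bytes) that holds compressed_len bytes."""
    for size in BLOCK_SIZES:
        if compressed_len <= size:
            return size
    raise ValueError("data is larger than the largest block")


class FreeLists:
    """Hypothetical model: one list of free block pointers per block size."""

    def __init__(self, pointers_by_size):
        self.lists = {s: list(pointers_by_size.get(s, [])) for s in BLOCK_SIZES}

    def reassign(self, current_ptr, current_size, compressed_len):
        """Return the (pointer, block_size) to use for the evicted entry."""
        needed = block_size_for(compressed_len)
        if needed == current_size:
            # The new data still fits the same block size: keep the block.
            return current_ptr, current_size
        # Recycle the old pointer to the free list for its size ...
        self.lists[current_size].append(current_ptr)
        # ... and take a free block of the needed size
        # (raises IndexError if that list is empty).
        return self.lists[needed].pop(), needed
```

In this model, the caller would then store the compressed data at the returned pointer and, if the pointer changed, update the metadata for the virtual address.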
[0069] If a new memory block was assigned to the virtual address for the evicted cache entry, the metadata in the metadata cache entry of the metadata cache entries corresponding to the virtual address tag of the virtual address tags of the evicted cache entry is updated based on the pointer to the new memory block. The metadata cache then updates the metadata in the metadata entry of the metadata entries corresponding to the virtual address in the metadata cache based on the pointer to the new memory block 125.
[0070] Because the metadata of the metadata circuit 210 is stored in the system memory 112, the metadata circuit 210 may consume an excessive amount of the system memory 112, thus negatively impacting system performance. Accordingly, it is desirable to minimize the amount of the system memory 112 that is required to store the metadata, while still providing effective data compression. In this regard, some implementations of the compressed memory system 200 reduce metadata size. The techniques described herein can be used to
Example Read/Write Overhead and Statistics
[0081] For a cache line read, high priority and low priority cache lines that fit in one compressed cache line each require one read; all other cache lines require two reads, one for the compressed line and one for the overflow block. For a cache line write, a compressed high priority cache line that fits in 255 bits requires one write; a compressed high priority cache line that fits in 256-480 bits requires one read-modify-write; and other, non-compressible high priority cache lines (> 480 bits) require one read-modify-write to the compressed line and one read-modify-write to an overflow block. A low priority cache line that fits in the compressed line requires one read-modify-write; a low priority cache line that does not fit in the compressed line requires one read-modify-write to the compressed line and one read-modify-write to an overflow block.
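The access counts in this paragraph can be summarized as a small cost model. The following Python sketch is illustrative only: the function names are hypothetical, and the 255-bit and 480-bit thresholds are simply the example values quoted above, not limits required by any implementation.

```python
# Example thresholds quoted in the text (in bits).
HP_ONE_WRITE_BITS = 255   # HP lines up to this size need a single write
HP_IN_LINE_BITS = 480     # HP lines up to this size stay in the compressed line


def read_accesses(fits_in_line: bool) -> int:
    """Reads for a cache line read: one if in line, else line plus overflow."""
    return 1 if fits_in_line else 2


def hp_write_accesses(hp_bits: int) -> tuple:
    """Return (plain_writes, read_modify_writes) for a high priority write."""
    if hp_bits <= HP_ONE_WRITE_BITS:
        return (1, 0)          # one write, no read needed
    if hp_bits <= HP_IN_LINE_BITS:
        return (0, 1)          # one read-modify-write on the compressed line
    return (0, 2)              # read-modify-write on line and on overflow block


def lp_write_accesses(fits_in_line: bool) -> tuple:
    """Return (plain_writes, read_modify_writes) for a low priority write."""
    return (0, 1) if fits_in_line else (0, 2)
```

The model makes the asymmetry visible: only a sufficiently small high priority line can be written without first reading the compressed line, while every low priority write involves at least one read-modify-write.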
[0082] The following table shows example cache line read/write costs and statistics for modem data (50,017,075 bytes of compression candidate data).
TABLE-US-00001
Category                                    Cache Lines    %
HP fit in 255 bits                          330,998        84.71
HP fit in line (HP requires one read)       374,207        95.77
HP/LP fit in line (LP requires one read)    307,134        78.6
[0083] From the statistics, 84.7% of high priority cache lines require only one read or one write for a cache line read or write. An additional 11% of high priority cache lines require one read or one read-modify-write. Only about 4.2% of the high priority cache lines require two reads or two read-modify-writes. Of the low priority cache lines, 78.6% require one read or one read-modify-write, and the remaining 21.4% require two reads or two read-modify-writes.
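The percentages in this paragraph follow from the table by simple subtraction. The Python sketch below reproduces that arithmetic and, as an additional illustration, computes the average number of reads per cache line read. The function names are hypothetical, and the average assumes that every cache line in a priority class is equally likely to be read, which the text does not state.

```python
# Percentages taken from the table above.
HP_FIT_255 = 84.71    # HP lines that fit in 255 bits
HP_FIT_LINE = 95.77   # HP lines that fit in the compressed line
LP_FIT_LINE = 78.6    # lines where both HP and LP fit in the compressed line


def breakdown():
    """Return (hp_in_line_rmw_pct, hp_overflow_pct, lp_overflow_pct)."""
    hp_in_line_rmw = HP_FIT_LINE - HP_FIT_255   # in line, but over 255 bits
    hp_overflow = 100.0 - HP_FIT_LINE           # HP needs the overflow block
    lp_overflow = 100.0 - LP_FIT_LINE           # LP needs the overflow block
    return hp_in_line_rmw, hp_overflow, lp_overflow


def expected_reads(fit_in_line_pct: float) -> float:
    """Average reads per cache line read: one if in line, two otherwise."""
    return (fit_in_line_pct * 1 + (100.0 - fit_in_line_pct) * 2) / 100.0
```

Under the equal-likelihood assumption, this gives roughly 1.04 reads per high priority cache line read and roughly 1.21 reads per low priority cache line read.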
[0084] In
[0090] The compressed memory system 1400 also includes a cache line selection circuit 1404 configured to select (i) a first cache line from a first data region of the plurality of data regions and (ii) a second cache line from a second data region of the plurality of data regions. For example, a cache line’s address may be used by the cache line selection circuit 1404 to determine whether the cache line belongs to a first data region or a second data region. In accordance with that determination, the cache line selection circuit 1404 may select the first cache line and the second cache line. The first data region has a higher priority level than the second data region. For example, in
[0091] The compressed memory system 1400 also includes a compression circuit 1406 configured to compress (i) the first cache line to obtain a first compressed cache line and (ii) the second cache line to obtain a second compressed cache line. Examples of compression circuits are described above in reference to
[0092] The compressed memory system 1400 also includes a cache line packing circuit 1408 configured to: in accordance with a determination that the first cache line is compressible, write (i) the first compressed cache line to a first predetermined portion of a candidate compressed cache line, and (ii) either the second cache line or a second portion of the second compressed cache line to a second predetermined portion of the candidate compressed cache line. The first predetermined portion is larger than the second predetermined portion. For example,
[0093] In some implementations, the cache line packing circuit 1408 is further configured to: in accordance with a determination that (i) the first cache line is not compressible, (ii) the second cache line is not compressible, or (iii) the second compressed cache line does not fit within the second predetermined portion of the candidate compressed cache line: set an overflow pointer (e.g., the OFP 412 in
[0094] In some implementations, the cache line packing circuit 1408 is further configured to: in accordance with a determination that the first cache line is compressible, set a first compressibility control bit (e.g., HPC 406) in the candidate compressed cache line; and in accordance with a determination that the first cache line is not compressible: write a first portion of the first cache line to the candidate compressed cache line; write a remaining portion of the first cache line to an overflow block; reset the first compressibility control bit in the candidate compressed cache line; and set an overflow pointer in the candidate compressed cache line to point to the overflow block. An example is described above in reference to
[0095] In some implementations, the cache line packing circuit 1408 is further configured to: in response to receiving a cache line write request for the first cache line: compress the first cache line (e.g., using the compression circuit 1406) to obtain a first updated cache line of a first size; in accordance with a determination that the first size is equal to or less than a first predetermined size of the candidate compressed cache line, write the first updated cache line to the candidate compressed cache line; in accordance with a determination that the first size is more than the first predetermined size and equal to or less than a second predetermined size of the candidate compressed cache line, perform a read-modify-write operation on the candidate compressed cache line based on the first updated cache line; and in accordance with a determination that the first size is more than the second predetermined size of the candidate compressed cache line, perform a read-modify-write operation on the candidate compressed cache line and a read-modify-write operation on an overflow block of the plurality of overflow blocks, based on the first updated cache line. In some implementations, the first predetermined size is half the size of the candidate compressed cache line. In some implementations, the cache line packing circuit 1408 is further configured to: while writing either the second cache line or the second portion of the second compressed cache line to the second predetermined portion of the candidate compressed cache line, write an ending bit index (e.g., LPE 414) in the candidate compressed cache line to indicate where the second cache line or the second portion of the second compressed cache line was written to within the candidate compressed cache line; and compute the second predetermined size based on the ending bit index in the candidate compressed cache line.
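One way the ending bit index could relate to the second predetermined size is sketched below in Python. This is an illustration under several assumptions that the text does not fix: payload bits are numbered from 0, the high priority (HP) data grows upward from bit 0, the low priority (LP) data grows downward from the top of the payload, 255 bits are always reserved for HP, and a hypothetical one-byte gap separates the two. The constant and function names are likewise hypothetical.

```python
PAYLOAD_BITS = 480       # assumed usable payload; matches the 480-bit bound in the text
HP_RESERVED_BITS = 255   # assumed space always kept free for the HP line
GAP_BITS = 8             # hypothetical one-byte separation between HP and LP


def place_lp(hp_bits: int, lp_bits: int) -> tuple:
    """Place LP data downward from the top of the payload.

    Returns (lpe, lp_in_line_bits, lp_overflow_bits), where lpe is the
    ending bit index: the lowest payload bit occupied by LP data.
    """
    hp_space = max(hp_bits, HP_RESERVED_BITS)
    free_for_lp = max(0, PAYLOAD_BITS - hp_space - GAP_BITS)
    lp_in_line = min(lp_bits, free_for_lp)
    lpe = PAYLOAD_BITS - lp_in_line
    return lpe, lp_in_line, lp_bits - lp_in_line


def second_predetermined_size(lpe: int) -> int:
    """Bits an updated HP line may occupy in line, derived from lpe."""
    if lpe == PAYLOAD_BITS:
        # No LP data is stored in the line, so no gap is needed.
        return PAYLOAD_BITS
    return lpe - GAP_BITS
```

With this layout, an updated HP line no larger than the value returned by second_predetermined_size can be rewritten within the candidate compressed cache line without disturbing the LP data; a larger one would spill into an overflow block.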
[0096] In some implementations, the cache line packing circuit 1408 is further configured to: in response to receiving a cache line write request for the second cache line: compress the second cache line to obtain a second updated cache line of a second size; in accordance with a determination that a sum of the second size and the size of the first compressed cache line is less than a first predetermined size of the candidate compressed cache line, perform a read-modify-write operation to write the second updated cache line to the candidate compressed cache line; and in accordance with a determination that the sum of the second size and the size of the first compressed cache line is not less than the first predetermined size of the candidate compressed cache line, perform (i) a first read-modify-write operation to write a first portion of the second updated cache line to the candidate compressed cache line, and (ii) a second read-modify-write operation to write a remaining portion of the second updated cache line to the overflow block pointed to by the overflow pointer. For high-priority (HP) cache lines, in some implementations, there are no control bits to indicate where the HP data ends. In such instances, the system decompresses the HP part to determine that information. In some implementations, this is not a true decompression; rather, it is a scan through the HP compressed bits (a process very similar to decompression) to determine the end. Other implementations reserve 8-9 bits to indicate the end of the HP data, just as on the low-priority (LP) side.
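The size test for a write to the second cache line can be sketched as follows. The function name is hypothetical, and the rule for sizing the in-line portion (whatever room remains below the first predetermined size once the first compressed cache line is accounted for) is an assumption, since the description does not say how the split point is chosen.

```python
from typing import Tuple


def plan_second_line_write(lp_size: int, hp_size: int, first_limit: int) -> Tuple[int, int]:
    """Return (bytes written to the candidate line, bytes written to the overflow block)."""
    if lp_size + hp_size < first_limit:
        # Everything fits: a single read-modify-write on the candidate line.
        return lp_size, 0
    # Otherwise split: one read-modify-write on the line, one on the overflow block.
    # Equality lands here because the test above is a strict "less than".
    in_line = max(first_limit - hp_size, 0)   # assumed split rule
    return in_line, lp_size - in_line
```

For example, with a first predetermined size of 64 bytes, a 30-byte first compressed cache line, and a 40-byte second updated cache line, this plan keeps 34 bytes in the candidate line and sends 6 bytes to the overflow block.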
[0097] In some implementations, the cache line packing circuit 1408 is further configured to: in response to receiving a cache line write request for the first cache line or the second cache line: compress the first cache line or the second cache line to obtain an updated compressed cache line of an updated size; and in accordance with a determination that the updated size cannot fit within the overflow block pointed to by the overflow pointer, free the overflow pointer and update the overflow pointer to point to a new overflow block of the plurality of overflow blocks. Examples for manipulation of free lists are described above in reference to
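Re-pointing the overflow pointer when updated data outgrows its overflow block can be sketched with per-size free lists. The data layout here is an assumption for illustration: an overflow block is identified by a `(bin_size, block_id)` pair, `free_lists` maps each bin size to a list of free block ids, and the smallest bin that fits is chosen.

```python
from typing import Dict, List, Optional, Tuple

Block = Tuple[int, int]  # (bin size in bytes, block id), a hypothetical representation


def refit_overflow(needed: int, current: Optional[Block],
                   free_lists: Dict[int, List[int]]) -> Block:
    """Keep the current overflow block if it is large enough, else swap it for a bigger one."""
    if current is not None and needed <= current[0]:
        return current                       # updated data still fits; pointer unchanged
    if current is not None:
        size, block_id = current
        free_lists[size].append(block_id)    # return the old block to its free list
    for size in sorted(free_lists):          # try the smallest suitable bin first
        if size >= needed and free_lists[size]:
            return size, free_lists[size].pop()
    raise MemoryError("no free overflow block is large enough")
```

Returning the old block before allocating keeps it available to other candidate lines whose overflow data is small enough for that bin.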
[0098] In some implementations, the first compressed cache line, the second compressed cache line, and the candidate compressed cache line are of equal size. For example, in
[0099] In some implementations, the first data region and the second data region are of equal size. For example, in
[0100] In some implementations, the second predetermined portion is less than half the size of the candidate compressed cache line. Examples of layouts are described above in reference to
[0101] In some implementations, the cache line packing circuit 1408 is further configured to: write the first compressed cache line and the second compressed cache line to the candidate compressed cache line in opposite directions. The first compressed cache line and the second compressed cache line are separated by one or more bytes.
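Writing the two compressed cache lines in opposite directions can be sketched as below. The 64-byte line size, the one-byte minimum gap, and the choice to store the second line byte-reversed from the last position are assumptions for the example; control bits that would also occupy part of the line are left out for brevity.

```python
def pack_opposite(hp: bytes, lp: bytes, line_bytes: int = 64, gap: int = 1) -> bytes:
    """Place hp at the start growing forward and lp at the end growing backward."""
    if len(hp) + len(lp) + gap > line_bytes:
        raise ValueError("data plus separating gap does not fit in the candidate line")
    line = bytearray(line_bytes)
    line[:len(hp)] = hp                       # first compressed line, ascending addresses
    line[line_bytes - len(lp):] = lp[::-1]    # second compressed line, descending addresses
    return bytes(line)


# Usage: a 3-byte first line and a 2-byte second line in a 64-byte candidate line.
packed = pack_opposite(b"\x01\x02\x03", b"\xaa\xbb")
```

Because the two regions grow toward each other, either one can expand into the unused middle without the other having to move.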
[0102] In another aspect, a compressed memory system of a processor-based system is provided. The compressed memory system includes a memory region comprising a plurality of cache lines. For example,
[0103] In some implementations, the compressed memory system further includes an overflow memory region including a plurality of overflow bins (e.g., the overflow block 426). Each overflow bin is configured to hold a distinct number of bytes. For example, the overflow blocks 308, 310, and 312 each include a different number of bytes. Each compressed cache line further includes a set of overflow pointer bits (e.g., OFP 412) configured to hold a pointer to an overflow bin of the plurality of overflow bins.
[0104] In some implementations, each compressed cache line further includes: a first control bit (e.g., HPC 406) to indicate a compressibility of the first cache line; and a second control bit (e.g., LPC 408) to indicate a compressibility of the second cache line.
[0105] In some implementations, the compressed memory system further includes an overflow memory region including a plurality of overflow bins. Each overflow bin is configured to hold a distinct number of bytes. Each compressed cache line further includes a set of bits (e.g., OFB 410) configured to hold a size of an overflow bin of the plurality of overflow bins.
[0106] In some implementations, each compressed cache line further includes an ending bit index (e.g., LPE 414) indicating an end of the second set of data bits.
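The control fields named above (HPC, LPC, OFB, OFP, LPE) can be pictured as a small packed word. The sketch below is illustrative only: the field order and every bit width in `FIELDS` are assumptions, chosen so that a 9-bit LPE can address any bit of a 512-bit (64-byte) line.

```python
from typing import Dict

# (field name, width in bits), least significant field first. All widths are assumed.
FIELDS = (("hpc", 1), ("lpc", 1), ("ofb", 2), ("ofp", 16), ("lpe", 9))


def pack_control(**values: int) -> int:
    """Pack the named control fields into one integer; omitted fields default to 0."""
    word, shift = 0, 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack_control(word: int) -> Dict[str, int]:
    """Recover the control fields from a word produced by pack_control."""
    fields = {}
    for name, width in FIELDS:
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields
```

A reader of the line would consult `hpc` and `lpc` to decide whether each region needs decompression, `ofb` and `ofp` to locate any overflow bin, and `lpe` to find where the second set of data bits ends.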
[0107] In some implementations, each overflow bin is configured to hold bits of the first cache line, and/or bits of the second cache line or the second cache line after compression, in the first direction. For example, as shown in
[0108] In some implementations, the first set of data bits is separated by one or more bytes from the second set of data bits. For example, in
[0109] In some implementations, each cache line and each compressed cache line are of a same size. For example, in
[0110] In some implementations, when the first cache line after compression or the second cache line after compression does not fit in the compressed cache line, each compressed cache line further includes a set of overflow pointer bits configured to hold a pointer to an overflow bin of the plurality of overflow bins. The overflow bin is configured to hold bits of the first cache line, the first cache line after compression, the second cache line, and/or the second cache line after compression. In other words, as shown in
[0111] In some implementations, the second set of data bits is further configured to hold a plurality of control bits that indicate overflow (e.g., OFP 412, OFB 410) and an end of the second set of data bits (e.g., LPE 414) in a compressed cache line.
[0112] It will be understood that, although the terms “first,” “second,” etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another.
[0113] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the claims. As used in the description of the embodiments and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
[0114] As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in accordance with a determination” or “in response to detecting,” that a stated condition precedent is true, depending on the context. Similarly, the phrase “if it is determined [that a stated condition precedent is true]” or “if [a stated condition precedent is true]” or “when [a stated condition precedent is true]” may be construed to mean “upon determining” or “in response to determining” or “in accordance with a determination” or “upon detecting” or “in response to detecting” that the stated condition precedent is true, depending on the context.
[0115] The foregoing description, for purpose of explanation, has been described with reference to specific implementations. However, the illustrative discussions above are not intended to be exhaustive or to limit the claims to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The implementations were chosen and described in order to best explain principles of operation and practical applications, to thereby enable others skilled in the art.