ASSOCIATIVE AND ATOMIC WRITE-BACK CACHING SYSTEM AND METHOD FOR STORAGE SUBSYSTEM
20170242794 · 2017-08-24
Inventors
- Horia Cristian Simionescu (Foster City, CA, US)
- Balakrishnan Sundararaman (Austin, TX, US)
- Shashank Nemawarkar (Austin, TX, US)
- Larry Stephen King (Austin, TX, US)
- Mark Ish (Castro Valley, CA, US)
- Shailendra Aulakh (Austin, TX, US)
CPC classification
- G06F12/1081
- G06F2212/621
- G06F12/0806
- G06F12/122
Abstract
In response to a cacheable write request from a host, physical cache locations are allocated from a free list, and the data blocks are written to those cache locations without regard to whether any read requests to the corresponding logical addresses are pending. After the data has been written, and again without regard to whether any read requests are pending against the corresponding logical addresses, metadata is updated to associate the cache locations with the logical addresses. A count of data access requests pending against each cache location having valid data is maintained, and a cache location is only returned to the free list when the count indicates no data access requests are pending against the cache location.
Claims
1. A method for caching in a data storage subsystem, comprising: receiving a write request indicating one or more logical addresses and one or more data blocks to be written correspondingly to the one or more logical addresses; in response to the write request, allocating one or more physical locations in a cache memory from a free list; storing the one or more data blocks in the one or more physical locations without regard to whether any read requests to the one or more logical addresses are pending; after the one or more data blocks have been stored in the one or more physical locations, and without regard to whether any read requests are pending against the one or more logical addresses, updating metadata to associate the one or more physical locations with the one or more logical addresses; maintaining a count of data access requests, including read requests, pending against each physical location in the cache memory having valid data; and returning a physical location to the free list when the count indicates no data access requests are pending against the physical location.
2. The method of claim 1, wherein allocating the one or more physical locations comprises selecting the one or more physical locations from the free list without regard to order of the one or more physical locations in the cache memory.
3. The method of claim 1, wherein allocating the one or more physical locations comprises selecting the one or more physical locations from the free list without regard to the one or more logical addresses associated with the write request.
4. The method of claim 1, wherein updating metadata comprises: storing identification information identifying the one or more physical locations in one or more data structures; and updating an entry in a hash table to include a pointer to the one or more data structures.
5. The method of claim 4, wherein: allocating the one or more physical locations comprises generating a scatter-gather list (SGL) containing information identifying the one or more physical locations; and storing the one or more data blocks in the one or more physical locations comprises: providing the SGL to a direct memory access (DMA) engine; and the DMA engine transferring the one or more data blocks from a host interface to the cache memory in response to the SGL.
6. The method of claim 4, wherein the one or more data structures define a linked list.
7. The method of claim 4, wherein updating metadata comprises: determining a hash table slot in response to a logical address; and determining whether any of a plurality of entries in the hash table slot identifies the logical address.
8. The method of claim 7, wherein the one or more data structures define a linked list, and the method further comprises adding a new data structure to the linked list in response to determining that none of the entries in the hash table slot identifies the logical address.
9. The method of claim 8, further comprising adding information identifying the new data structure in the linked list to a least recently used dirty list.
10. The method of claim 4, wherein each of the one or more data structures includes a plurality of sub-structures, each sub-structure configured to store the identification information identifying one of the physical locations in the cache memory, each sub-structure further configured to store the count of data access requests pending against the physical location identified by the identification information.
11. The method of claim 10, wherein each sub-structure is further configured to store a dirty indicator indicating whether the physical location identified by the identification information contains dirty data.
12. A system for caching in a data storage subsystem, comprising: a cache memory; and a processing system configured to: receive a write request indicating one or more logical addresses and one or more data blocks to be written correspondingly to the one or more logical addresses; in response to the write request, allocate one or more physical locations in a cache memory from a free list; store the one or more data blocks in the one or more physical locations without regard to whether any read requests to the one or more logical addresses are pending; after the one or more data blocks have been stored in the one or more physical locations, and without regard to whether any read requests are pending against the one or more logical addresses, update metadata to associate the one or more physical locations with the one or more logical addresses; maintain a count of data access requests, including read requests, pending against each physical location in the cache memory having valid data; and return a physical location to the free list when the count indicates no data access requests are pending against the physical location.
13. The system of claim 12, wherein the processing system is configured to allocate the one or more physical locations by being configured to select the one or more physical locations from the free list without regard to order of the one or more physical locations in the cache memory.
14. The system of claim 12, wherein the processing system is configured to allocate the one or more physical locations by being configured to select the one or more physical locations from the free list without regard to the logical addresses associated with the write request.
15. The system of claim 12, wherein the processing system is configured to update metadata by being configured to: store identification information identifying the one or more physical locations in one or more data structures; and update an entry in a hash table to include a pointer to the one or more data structures.
16. The system of claim 15, wherein: the processing system is configured to allocate the one or more physical locations by being configured to generate a scatter-gather list (SGL) containing information identifying the one or more physical locations; and the processing system is configured to store the one or more data blocks in the one or more physical locations by being configured to: provide the SGL to a direct memory access (DMA) engine; and transfer, by the DMA engine, the one or more data blocks from a host interface to the cache memory in response to the SGL.
17. The system of claim 15, wherein the one or more data structures define a linked list.
18. The system of claim 15, wherein the processing system is configured to update metadata by being configured to: determine a hash table slot in response to a logical address; and determine whether any of a plurality of entries in the hash table slot identifies the logical address.
19. The system of claim 18, wherein the one or more data structures define a linked list, and the processing system is configured to add a new data structure to the linked list in response to determining that none of the entries in the hash table slot identifies the logical address.
20. The system of claim 19, wherein the processing system is further configured to add information identifying the new data structure in the linked list to a least recently used dirty list.
21. The system of claim 15, wherein each of the one or more data structures includes a plurality of sub-structures, each sub-structure configured to store the identification information identifying one of the physical locations in the cache memory, each sub-structure further configured to store the count of data access requests pending against the physical location identified by the identification information.
22. The system of claim 21, wherein each sub-structure is further configured to store a dirty indicator indicating whether the physical location identified by the identification information contains dirty data.
Description
BRIEF DESCRIPTION OF THE FIGURES
WRITTEN DESCRIPTION
[0019] As illustrated in
[0020] In the exemplary embodiment described herein, the array of multiple physical data storage devices 16, 18, 20, etc., in back-end storage 14 may conform to one or more of the principles commonly referred to under the umbrella of “RAID” or “redundant array of independent (or inexpensive) disks.” For example, in accordance with a common RAID principle known as striping, back-end storage 14 may store data in units of stripes 22. Each of physical data storage devices 16, 18, 20, etc., stores a portion of each stripe 22. Back-end storage 14 may include any number of physical storage devices 16, 18, 20, etc. (The ellipsis symbol (“ . . . ”) in
[0021] In the exemplary embodiment, storage subsystem 10 includes a cache memory 24. Cache memory 24 may be of any type, such as, for example, double data rate dynamic random access memory (DDR-DRAM). Storage subsystem 10 also includes a central processing unit (CPU) 26 and a working memory 28. Working memory 28 may be of any type, such as, for example, static RAM. While CPU 26 may perform generalized processing tasks, storage subsystem 10 further includes the following specialized processing elements: a message processor 30, a command processor 32, a cache processor 34, a buffer processor 36, a back-end processor 38, and a direct memory access (DMA) engine 40. Although in the exemplary embodiment storage subsystem 10 includes these specialized processing elements, other embodiments may include fewer or more processing elements, which in such other embodiments may perform some or all of the processing operations described herein. Storage subsystem 10 also includes a system interconnect 42, such as a system or matrix of busses, through which the above-referenced processing elements communicate with each other. Other communication or signal paths among the above-referenced elements also may be included. A host interface 44, through which storage subsystem 10 communicates with host system 12, and a storage interface 46, through which storage subsystem 10 communicates with back-end storage 14, may also be included. Host interface 44 may conform to a communication bus standard, such as, for example, Peripheral Component Interconnect Express (PCIe), and include an associated PCIe controller. Other interfaces, such as memory interfaces and associated memory controllers, also may be included but are not shown for purposes of clarity. Although not shown, storage subsystem 10 may define a portion of an accelerator card that plugs into a backplane or motherboard of host system 12.
Some or all of the above-referenced processing elements may be included in an integrated circuit device (not shown), such as a field-programmable gate array (FPGA), application-specific integrated circuit (ASIC), or other device.
[0022] As illustrated in
[0023] Cached data is stored in data area 48 in units referred to as buffer blocks. The unit defines an amount of data, such as, for example, 4 kilobytes (KB). The term “block” means that the data is contiguous. In an exemplary embodiment in which the above-referenced stripe 22 consists of, for example, 64 KB, each stripe 22 thus corresponds to 16 buffer blocks. As described in further detail below, any buffer block can be cached or stored in any available physical location (i.e., address) in data area 48 without regard to any ordering of the buffer blocks and without regard to any relationship between physical and logical addresses of the buffer blocks. As a result, buffer blocks corresponding to a stripe 22 are not necessarily stored contiguously with each other. This characteristic is referred to as associativity.
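The associativity described above can be illustrated with a minimal sketch, assuming a simple free list of buffer block identifiers (the class and method names are hypothetical, not from the patent): any BBID may be handed out for any write, with no ordering constraint and no dependence on the logical address being written.

```python
from collections import deque

class BufferBlockAllocator:
    """Hypothetical sketch of associative allocation: buffer blocks are
    handed out from a free list without regard to ordering or to the
    logical addresses of the data they will hold."""

    def __init__(self, num_blocks):
        # Free list of buffer block identifiers (BBIDs); any BBID may
        # hold any buffer block, so order is irrelevant.
        self.free_list = deque(range(num_blocks))

    def allocate(self, count):
        # Pop BBIDs from the free list; the caller's logical addresses
        # play no role in which physical locations are chosen.
        return [self.free_list.popleft() for _ in range(count)]

    def release(self, bbid):
        # A BBID is returned to the free list once no requests are pending.
        self.free_list.append(bbid)
```

Because released BBIDs simply rejoin the free list, blocks belonging to one stripe need not end up contiguous in the data area, which is the associativity property described above.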
[0024] For purposes of illustration, some exemplary buffer blocks 58, 60, 62, 64, 66, 68, 70, 72, etc., are depicted as stored in various physical locations in data area 48. (The ellipsis symbols in data area 48 indicate additional buffer blocks in additional physical locations that are not shown for purposes of clarity.) The storage capacity of data area 48 may be substantially less than the storage capacity of back-end storage 14. For example, the storage capacity of back-end storage 14 may be on the order of terabytes, while the storage capacity of data area 48 may be on the order of gigabytes or megabytes. To facilitate processing, a physical location in data area 48 may be identified by a buffer block identifier (BBID) that serves as an index or offset from a physical memory address. As described below with regard to an example, exemplary buffer blocks 68, 66, 70, 62 and 64 are ordered in the manner indicated by the broken-line arrows, with exemplary buffer block 68 being the first of a sequence, and exemplary buffer block 64 being the last of the sequence (with additional buffer blocks being indicated by the ellipsis symbol but not shown for purposes of clarity). That exemplary buffer blocks 58, 60, 62, 64, 66, 68, 70 and 72 are depicted in
[0025] For purposes of illustration, some exemplary cache segments 74, 76, 78, etc., are depicted as stored in physical locations in cache segment area 50. Additional cache segments are indicated by the ellipsis symbol but not shown for purposes of clarity. As described below in further detail, a cache segment is a data structure that contains metadata describing the cached buffer blocks.
[0026] The manner in which a hash table 80 relates to a cache segment 102 is illustrated in
[0027] Hash table 80 comprises a number (n) of slots, of which a first exemplary slot 82, a second exemplary slot 84, etc., through another exemplary slot 86, and up to a last or “nth” exemplary slot 88 are shown, with additional slots indicated by ellipsis symbols but not shown for purposes of clarity. Although hash table 80 may have any number of slots, the number is generally substantially less than the number of logical addresses in the host address space. An example of hash function 84 is: Slot=(LBA)MOD(n), where “Slot” represents an index to a slot in hash table 80, “LBA” represents a logical address, and MOD or modulo is the modular arithmetic function. As the use of a hash function to index a table is well understood in the art, further details are not described herein.
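The modulo hash above can be written directly; this sketch is simply the formula Slot=(LBA)MOD(n) from the preceding paragraph:

```python
def hash_slot(lba, n_slots):
    # Slot = (LBA) MOD (n): map a logical block address to a slot index.
    return lba % n_slots
```

Note that any two logical addresses that differ by a multiple of n map to the same slot, which is why the multi-entry slots described below are useful.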
[0028] Each slot has multiple entries 90. For example, each slot of hash table 80 may have four entries 90. Employing multiple (i.e., two or more) entries 90 per hash table slot rather than a single entry per hash table slot can help minimize “collisions,” as hash table address conflicts are commonly referred to in the art. As described below, (in the case of a miss) any empty entry 90 within a slot can be used to fulfill a write request. In an instance in which all of the entries of a slot are occupied, then additional entries 92, 94, etc., can be added in the form of a linked list.
[0029] Each entry 90 includes a logical address field 96, a cache segment identifier (CSID) field 98, and a valid entry field or bit (V) 100. As described below with regard to examples of write and read operations, cache segment identifier field 98 is configured to store a cache segment identifier (e.g., a pointer) that identifies or indexes a cache segment stored in cache segment area 50 (
[0030] Each cache segment identified by a cache segment identifier may have the structure of the exemplary cache segment 102 shown in
[0031] Each cache segment list element includes the following flag fields: a buffer block identifier (BBID) field 110; a valid buffer block field or bit (V) 112; a dirty buffer block field or bit (D) 114; a flush buffer block field or bit (F) 116; and a use count (CNT) field 118. Although the manner in which the flags stored in these flag fields are used is discussed below with regard to write and read operations, the following may be noted. The valid (buffer block) bit 112 of a cache segment list element indicates whether the buffer block identified by the buffer block identifier field 110 of that cache segment list element is valid. As understood by one of skill in the art, the term “valid” is commonly used in the context of caching to denote locations in the cache memory to which data has been written. The dirty (buffer block) bit 114 of a cache segment list element indicates whether the buffer block identified by the buffer block identifier field 110 of that cache segment list element is dirty. As understood by one of skill in the art, the term “dirty” is commonly used in the context of caching to refer to cached data that has not yet been copied to back-end storage 14. The flush (buffer block) bit 116 of a cache segment list element indicates whether the buffer block identified by the buffer block identifier field 110 of that cache segment list element is in the process of being evicted or “flushed” to back-end storage 14. The use count field 118 of a cache segment list element indicates the number of data access requests, including read requests and flush operations, which are pending against the buffer block identified by the buffer block identifier field 110 of that cache segment list element. These fields of a cache segment thus serve as metadata describing aspects of the buffer blocks identified by the buffer block identifier fields 110 of that cache segment.
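The flag fields of a cache segment list element can be sketched as a small record; the field names here are illustrative stand-ins for the numbered fields described above, not names used in the patent:

```python
from dataclasses import dataclass

@dataclass
class CacheSegmentElement:
    """Sketch of one cache segment list element (fields 110-118 above)."""
    bbid: int = 0        # buffer block identifier (field 110)
    valid: bool = False  # V bit 112: data has been written to the block
    dirty: bool = False  # D bit 114: data not yet copied to back-end storage
    flush: bool = False  # F bit 116: block is being flushed to back-end
    use_count: int = 0   # CNT 118: pending data access requests
```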
[0032] Each cache segment also comprises a previous cache segment identifier field 120 and a next cache segment identifier field 122. As illustrated in
[0033] Note that in accordance with the exemplary embodiment, in which the above-referenced stripe 22 corresponds to 16 buffer blocks: the 16 cache segment list elements (not shown) of cache segment 102 correspond to 16 exemplary buffer blocks 68, 66, 70, etc., through 128; and the 16 cache segment list elements (not shown) of cache segment 124 correspond to 16 exemplary buffer blocks 130, etc., through 62. Note in the example shown in
[0034] Scatter-gather lists (SGLs), which are data structures, may be employed to communicate information identifying physical locations in data area 48 in which buffer blocks are stored. Any number of SGLs may be linked together. For example, as illustrated in
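A chained SGL can be sketched as follows, assuming each entry simply carries a BBID identifying a physical location in the data area (the type and helper names are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Sgl:
    """Sketch of a scatter-gather list: entries identify physical
    locations (here, BBIDs) of buffer blocks; SGLs may be linked."""
    entries: list
    next_sgl: Optional["Sgl"] = None

def all_bbids(sgl):
    # Walk a chain of linked SGLs and gather every BBID they identify.
    out = []
    while sgl is not None:
        out.extend(sgl.entries)
        sgl = sgl.next_sgl
    return out
```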
[0035] As illustrated by the flow diagram of
[0036] As indicated by block 138 (
[0037] As indicated by block 140, one or more SGLs (not shown) containing information identifying the allocated physical locations may be generated. The SGLs are communicated to DMA engine 40 (
[0038] As described below, following the transfer of data from host 12 to data area 48 in response to a write request, cache processor 34 (
[0039] As indicated by block 144, cache processor 34 looks up the one or more logical addresses identified in the write request in the above-described hash table 80 (
[0040] If cache processor 34 determines (block 146) that the result of a hash table lookup is a miss, then cache processor 34 allocates a new cache segment, as indicated by block 148. The new cache segment is identified by a CSID as described above. Cache processor 34 stores the CSID in an available one of the slot entries, or, if all (e.g., four) entries of the slot itself are occupied, then the CSID of the “link” entry of the slot is updated to set the next hash link to the newly added CSID. The previous hash link of the new CSID is set to the CSID of the “link” entry in the referenced slot of the hash table. Then, as indicated by block 150, for each cache segment list element in the newly allocated cache segment, cache processor 34 copies the buffer block identifiers from the SGL into the buffer block identifier fields 110 (
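The miss path of blocks 148-150 can be sketched as follows, with simplified dictionaries standing in for the slot entries and cache segment structures (names and shapes are illustrative assumptions):

```python
def handle_write_miss(slot_entries, sgl_bbids, new_csid, segments):
    """Sketch of blocks 148-150: on a miss, allocate a new cache segment,
    record its CSID in an available slot entry, and copy the BBIDs from
    the SGL into the segment's list elements."""
    seg = {"csid": new_csid, "elements": []}
    for bbid in sgl_bbids:
        # Each SGL entry becomes a cache segment list element; the valid
        # and dirty flags are set for the newly written buffer block.
        seg["elements"].append({"bbid": bbid, "valid": True, "dirty": True,
                                "flush": False, "use_count": 0})
    segments[new_csid] = seg
    slot_entries.append(new_csid)  # occupy an available slot entry
    return seg
```

The linked-list overflow case (all slot entries occupied) is omitted here for brevity.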
[0041] If cache processor 34 determines (block 146) that the result of a hash table lookup is a hit, then cache processor 34 reads the cache segment identified by the slot entry, as indicated by block 156. Then, as indicated by block 158, for each cache segment list element that is “empty,” i.e., that does not already contain a valid buffer block identifier in its buffer block identifier field 110, cache processor 34 copies the buffer block identifier from the SGL into that buffer block identifier field 110. In that cache segment list element, cache processor 34 also sets the following flags (
[0042] As indicated by blocks 162 and 164, if cache processor 34 determines that the flush bit 116 of the cache segment list element is not “0” (“false”) or the use count 118 of the cache segment list element contains a value other than zero, then, as indicated by block 166, cache processor 34 copies or saves the flags (i.e., the values of valid bit 112, dirty bit 114, flush bit 116 and use count field 118) into, for example, the miscellaneous area 56 of cache memory 24 (
[0043] However, if cache processor 34 determines (blocks 162 and 164) that the flush bit 116 of the cache segment list element is “0” (“false”) and the use count 118 of the cache segment list element contains a value of zero, then, as indicated by block 170, cache processor 34 de-allocates the buffer block identifier that is in the buffer block identifier field 110 of that cache segment list element. That is, the buffer block identifier is returned to the above-referenced free list. Then, cache processor 34 overwrites the buffer block identifier fields 110 of that cache segment with the buffer block identifiers obtained from the SGL, as indicated by block 168.
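The guard described in blocks 162-170 can be sketched as follows; the element is a simple dictionary and the function name is hypothetical:

```python
def overwrite_element(elem, new_bbid, free_list, deferred):
    """Sketch of blocks 162-170: on a write hit to an element that already
    holds a valid BBID, the old block is reclaimed only if it is not being
    flushed and no requests are pending; otherwise its flags are saved
    so pending operations can complete, and reclamation happens later."""
    if elem["flush"] or elem["use_count"] != 0:
        # Save the old flags (e.g., to the miscellaneous area) for later.
        deferred.append(dict(elem))
    else:
        # Safe to reclaim: return the old BBID to the free list.
        free_list.append(elem["bbid"])
    elem["bbid"] = new_bbid  # overwrite with the BBID from the SGL
    elem["valid"] = True
    elem["dirty"] = True
```

This is the mechanism that lets a write proceed without regard to pending reads: the new data lands in a fresh buffer block, while the old block survives until its use count drains.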
[0044] Also in response to a hash table hit, cache processor 34 updates the LRU dirty linked list, as indicated by block 172. More specifically, if it is determined that the cache segment identifier already exists in a location in the LRU dirty linked list, then the cache segment identifier is removed from that location, and a new location for that cache segment identifier is added to (i.e., linked to) the tail of the LRU dirty linked list. Thus, the most recently written cache segment identifiers are moved to the tail of the LRU dirty linked list. As described below, maintaining the LRU dirty linked list in this manner facilitates evicting or flushing less recently written (i.e., oldest) data to back-end storage 14. The process then continues as described above with regard to block 158.
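The move-to-tail behavior just described can be sketched as follows; an ordered dictionary stands in for the doubly linked list formed by the previous/next cache segment identifier fields (the class name is illustrative):

```python
from collections import OrderedDict

class LruDirtyList:
    """Sketch of the LRU dirty list: most recently written CSIDs move to
    the tail; flush candidates are taken from the head."""

    def __init__(self):
        self._order = OrderedDict()

    def touch(self, csid):
        # Remove the CSID from its current position (if any) and re-link
        # it at the tail, marking it most recently written.
        self._order.pop(csid, None)
        self._order[csid] = True

    def oldest(self):
        # The head holds the least recently written dirty segment.
        return next(iter(self._order), None)
```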
[0045] If the write request spans more than one stripe 22, the above-described operations are repeated for each stripe 22 for which the result of the hash table lookup is a hit, as indicated by block 154. When all stripes 22 of the write operation have been processed in the above-described manner, host 12 is notified that the write operation has been completed, as indicated by block 174.
[0046] As illustrated by the flow diagram of
[0047] As indicated by block 182, cache processor 34 responds to initiation of the read operation by performing a lookup in hash table 80 (
[0048] As indicated by block 184, if cache processor 34 determines that the result of the hash table lookup is a hit (which may be a full hit or a partial hit), then cache processor 34 reads the cache segments indicated by the entry that resulted in the hit, as indicated by block 186. As described above with regard to
[0049] After all stripes associated with the read request have been read, processing continues at block 196 (
[0050] As stated above, an SGL can be used to facilitate the transfer of data from data area 48. In the case of a partial hit rather than a full hit, cache processor 34 uses the information identifying dirty and not-dirty buffer blocks to include information in the SGL indicating which of the buffer blocks to read from data area 48 and which to “skip over” in data area 48 and to instead read from back-end storage 14. Buffer blocks that are valid and dirty must be read from data area 48, but buffer blocks that are valid and not-dirty may be read from back-end storage 14. Cache processor 34 sends the SGL (or multiple SGLs linked together) either to back-end processor 38 in the case of a partial hit, or to DMA engine 40 in the case of a full hit. As indicated by block 200, the requested buffer blocks are then read from data area 48, back-end storage 14, or a combination of both data area 48 and back-end storage 14.
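The partial-hit rule above (dirty blocks must come from the cache; clean or absent blocks may come from back-end storage) can be sketched as a small planning step; the function name and return shape are illustrative assumptions:

```python
def plan_partial_hit_read(elements):
    """Sketch of the partial-hit rule: valid-and-dirty blocks must be read
    from the cache data area; anything else may come from back-end storage.
    Returns (cache_bbids, backend_indices) for building the SGL."""
    from_cache, from_backend = [], []
    for i, e in enumerate(elements):
        if e["valid"] and e["dirty"]:
            from_cache.append(e["bbid"])   # must be read from the data area
        else:
            from_backend.append(i)         # "skip over" in the data area
    return from_cache, from_backend
```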
[0051] As illustrated in
[0052] Transferring data from data area 48 to back-end storage 14, which is commonly referred to in the art as evicting data or flushing data from cache memory, is not described in detail, as evicting or flushing data can be performed in a conventional manner understood by one of ordinary skill in the art. Briefly, with reference to the flow diagram of
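Since the flush path is described only briefly above, the following is a heavily hedged sketch of one plausible flow for flushing a single segment taken from the head of the LRU dirty list: dirty blocks are marked in-flight, written out, and blocks with no pending requests are reclaimed (all names and the backend_writes stand-in are hypothetical):

```python
def flush_segment(segment, backend_writes, free_list):
    """Hypothetical sketch of flushing one cache segment: write out dirty
    blocks, clear their dirty bits, and reclaim idle blocks."""
    for e in segment["elements"]:
        if e["valid"] and e["dirty"]:
            e["flush"] = True                 # mark in-flight to back-end
            backend_writes.append(e["bbid"])  # stand-in for the real write
            e["dirty"] = False
            e["flush"] = False
        if e["valid"] and e["use_count"] == 0 and not e["dirty"]:
            free_list.append(e["bbid"])       # safe to return to free list
            e["valid"] = False
    return backend_writes
```

Consistent with the claims, a block whose use count is nonzero stays out of the free list until its pending requests complete.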
[0053] It should be understood that the flow diagrams of
[0054] It should be noted that the invention has been described with reference to one or more exemplary embodiments for the purpose of demonstrating the principles and concepts of the invention. The invention is not limited to these embodiments. As will be understood by persons skilled in the art, in view of the description provided herein, many variations may be made to the embodiments described herein and all such variations are within the scope of the invention.