Bank-selective power efficient content-addressable memory
11574680 · 2023-02-07
Assignee
Inventors
Cpc classification
International classification
G11C15/00
PHYSICS
Abstract
The present invention provides a power efficient content-addressable memory (CAM) architecture that is implementable on FPGAs. The provided CAM architecture comprises an array of CAM cells having a width C.sub.W and a depth C.sub.D, and being grouped into a B number of memory banks. Each of the CAM cells is configured for storing a memory bit and comprises a plurality of flip-flops configured to store at least a masking bit indicating the ternary nature of the stored memory bit and a storing bit saving the binary information of the stored memory bit. The provided CAM architecture allows activating only one bank in multiple banks irrespective of nature of the data set and is updated in a single access and saves power consumption by only accessing the memory in the activated bank. The dynamic power consumption is reduced by 40% compared with the state-of-the-art FPGA-based CAMs.
Claims
1. An electronic memory device comprising: an array of content-addressable memory (CAM) cells having a width C.sub.W and a depth C.sub.D, and being grouped into a B number of memory banks; a decoder configured for decoding a C.sub.W-n bits input search key to obtain a n-bits selector key to address a memory bank to be activated for searching operation, where n=log.sub.2B; and a clock-gating logic for passing a clock signal going to the addressed memory bank while stopping the clock signal going to other memory banks; wherein each of the CAM cells is configured for storing a memory bit and comprises a plurality of flip-flops configured to store at least a masking bit indicating the ternary nature of the stored memory bit and a storing bit saving the binary information of the stored memory bit.
2. The electronic memory device in accordance with claim 1, wherein B is equal to four.
3. The electronic memory device in accordance with claim 1, further comprising: a plurality of comparators configured to compare a plurality of bits of the input search key with a plurality of stored rules to output a plurality of match lines.
4. The electronic memory device in accordance with claim 1, wherein each comparator includes: a comparison gate configured to compare a bit of the input search key with a stored rule to output a match line; and a masking gate configured to receive the compared result from the comparison gate and generate a match line based on the compared result and a corresponding masking bit.
5. The electronic memory device in accordance with claim 1, further comprising a multiplexer configured to select output of the activated one of the B number of memory banks; and wherein: the plurality of comparators is configured to compare C.sub.W-n bits of the input search key with C.sub.W-n stored rules to output C.sub.W-n compared results; the plurality of masking gates is configured to receive the C.sub.W-n compared results respectively and generate C.sub.W-n matching lines for the input search key.
6. The electronic memory device in accordance with claim 1, further comprising a priority encoder configured to provide one or more address of one or more matched stored rule based on the match-lines.
7. The electronic memory device in accordance with claim 1, further comprising a backup content-accessible memory bank configured for overcoming bank overflow and having a width equals to that of each of the memory banks and a depth equals to a half of that of each of the memory banks.
8. The electronic memory device in accordance with claim 1, wherein each of the plurality of CAM cells has a binary content-addressable memory (BiCAM) structure.
9. The electronic memory device in accordance with claim 1, wherein each of the plurality of CAM cells has a ternary content-addressable memory (TCAM) structure.
10. The electronic memory device in accordance with claim 1, the plurality of flip-flops is implementable on a field-programmable gate array (FPGA).
11. The electronic memory device in accordance with claim 9, the FPGA is reconfigurable.
12. A method for operating an electronic memory device including an array of content-addressable memory (CAM) cells having a width C.sub.W and a depth C.sub.D implemented with a field-programmable gate array and divided into a B number of memory banks, each of the CAM memory cells is configured for storing a memory bit and comprises a plurality of flip-flops configured to store at least a masking bit indicating the ternary nature of the stored memory bit and a storing bit saving the binary information of the stored memory bit; the method comprising: decoding, with a decoder, a C.sub.W-n bits input search key to obtain a n-bits selector key from to address a memory bank to be activated for searching operation, where n=log.sub.2B; and passing, with a clock-gating logic, a clock signal going to the addressed memory bank while stopping the clock signal going to other memory banks; selecting, with a multiplexer configured, output of the activated one of the B number of memory banks; comparing, with a plurality of comparators, C.sub.W-n bits of the input search key with C.sub.W-n stored rules to output C.sub.W-n compared results; generating, with a plurality of masking gates, C.sub.W-n match lines for the input search key based on the C.sub.W-n compared results and C.sub.W-n corresponding masking bits; and providing, with a priority encoder, one or more address of one or more matched stored rule based on the match-lines.
13. The method in accordance with claim 12, wherein B is equal to 4.
14. The method in accordance with claim 12, wherein each of the plurality of CAM cells has a binary content-addressable memory (BiCAM) structure.
15. The method in accordance with claim 12, wherein each of the plurality of CAM memory cells has a ternary content-addressable memory (TCAM) structure.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Embodiments of the invention are described in more detail hereinafter with reference to the drawings, in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
DETAILED DESCRIPTION
(10) In the following description, a power-efficient content-addressable memory (CAM) architecture that is implementable on FPGAs and a method for operating the same are set forth as preferred examples. It will be apparent to those skilled in the art that modifications, including additions and/or substitutions may be made without departing from the scope and spirit of the invention. Specific details may be omitted so as not to obscure the invention; however, the disclosure is written to enable one skilled in the art to practice the teachings herein without undue experimentation.
(11)
(12) Each of the memory cells is configured for storing a memory bit (or stored rule) and comprises a plurality of flip-flops configured to store at least a masking bit indicating the ternary nature of the stored memory bit and a storing bit saving the binary information of the stored memory bit.
(13) The CAM architecture 100 may be grouped into a B number of memory banks, Bk.sub.d, where d=0, . . . , B−1.
(14) The CAM architecture 100 may further comprise a decoder 120, such as a n:2n decoder, configured for decoding a C.sub.W-bits input search key, S.sub.w, to obtain a n-bits selector key to address a memory bank to be activated for searching operation, where n=log.sub.2B. The position of selector key can be the most significant bits (MSBs), the least significant bits (LSBs), or even in the middle of the search keys based on the distribution of ternary/binary data and type of application. For example, in access control systems, selecting LSBs as selector key evenly distribute the keys among the banks of the proposed TCAM architecture.
(15) The data patterns (storing rules) are mapped (or populated) to the corresponding bank based on the n-bits selector key. The number of bits of the selector key, n, is determined by the number of banks using the equation: n=log.sub.2B.
(16) The CAM architecture 100 may further comprise a clock-gating logic 130 for passing a clock signal, Clk, going to the addressed memory bank while stopping the clock signal going to other memory banks. Preferably, the clock-gating logic 130 may include a combination of AND gates used to allow or stop the clock signal going to each bank.
(17) The CAM architecture may further comprise a plurality of comparators 140 configured to compare C.sub.W-n bits of the input search key with C.sub.W-n stored rules to output C.sub.W-n match lines, MLs.
(18) The CAM architecture may further comprise a priority encoder configured for providing one or more address, Add, of one or more matched stored rule based on the match lines.
(19)
(20)
(21) Referring back to
(22) Referring back to
(23)
(24)
(25) Preferably, the memory is divided into four memory banks. As only one bank is selected, only 25% of the total flip-flops are activated to perform the searching or storing operation. Theoretically, 75% of the comparators are reduced by an extra multiplexer. Other numbers of banks have been explored to find the optimal configuration of bank-selective CAM. For example, for two banks, 50% power improvement is expected, but there is almost no improvement. Similarly, if the number of banks is eight or sixteen, 87.5% or 93.75% power improvement respectively is expected, but the power consumption increases due to the large overhead of multiplexer and clock gating circuitry. Thus, the architecture with the memory divided into four memory banks is the optimal choice for power saving in FPGA-based CAM architectures.
(26) In practical applications, because the data are randomly distributed and the chances of all the rules having the same selector bits are minimal. Therefore, the rules may not be evenly distributed among the memory banks and overflowing of rules to a particular bank may occur. For example, a 64×36 CAM can store 64 rules of any pattern. If we divide 64×36 CAM into four banks, given that the number of bits of selector key is two, each bank has a capacity of 16 rules. If there are rules after 16, say, the 17.sup.th rule, to be mapped to in Bank.sub.0 which already has 16 rules, the 17.sup.th rule is overflowed.
(27)
(28) The backup CAM 710 may have a width similar to the width of the CAM 110, and a depth equal to half of one of the banks, Bk.sub.i. In other words, the backup CAM 710 may have a width of C.sub.W and a depth of C.sub.D/(B*2). If the rules having the same selector key are more than the depth of one bank in the CAM 110, the remaining rules are stored in backup CAM 710. For example, the CAM 110 is a 64×36 CAM and divided into four banks, each bank can store 16 rules. If there are rules after 16, say, the 17.sup.th rule, to be mapped to in Bank.sub.0 which already has 16 rules, it will be stored in the backup CAM 710.
(29) A Flag_bit is used to indicate whether there are overflowed rules stored in the backup CAM 710. If the Flag_bit is high, match lines generated from the bank overflow module 701 will be forwarded to the priority encoder 150.
(30) The foregoing description of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art.
(31) The apparatuses and the methods in accordance to embodiments disclosed herein may be implemented using computing devices, computer processors, or electronic circuitries and other programmable logic devices configured or programmed according to the teachings of the present disclosure. Computer instructions or software codes running in the computing devices, computer processors, or programmable logic devices can readily be prepared by practitioners skilled in the software or electronic art based on the teachings of the present disclosure.
(32) All or portions of the methods in accordance to the embodiments may be executed in one or more computing devices including server computers, personal computers, laptop computers, mobile computing devices such as smartphones and tablet computers.
(33) The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated.