Abstract
Encrypted storage often introduces unwanted access latency. This delay can force a processor to wait for critical data, slowing performance. Generally speaking, the latency is mostly an issue when reading from encrypted storage, since the processor may need the information read from encrypted storage in order to proceed. During a write operation, there typically is no such issue because the processor does not need to wait for the end of the write operation to proceed. A variant of counter (CTR) mode for a block cipher can be used to perform the majority of the decryption operation without knowledge of the ciphertext; therefore, the majority of the decryption operation can be performed concurrently with the retrieval of the ciphertext from memory. To further secure the encrypted storage, a light encryption can be applied to further obfuscate the ciphertext.
Claims
1. An encrypted storage device comprising: a memory; an encryption module; and an XOR module coupled to the memory and the encryption module; wherein the encryption module comprises: a counter function that converts a memory address and a nonce into an address based counter; and a block cipher that encrypts the address based counter into an address based pad; wherein encrypted data read from the memory at the memory address is XORed by the XOR module with the address based pad to produce unencrypted data and unencrypted data is XORed by the XOR module with the address based pad to produce encrypted data that is stored into the memory at the memory address, and wherein the nonce comprises a one-time random number shared between an encrypting party and a decrypting party; and further comprising a light encryption block cipher coupled to the XOR module and an input/output of the encrypted storage device; and wherein the light encryption block cipher comprises a plurality of parallel block ciphers having a block size smaller than the block size of the block cipher in the encryption module.
2. A method of reading from an encrypted storage having a memory at a memory address, said method comprising: receiving a nonce from an encrypting party; retrieving previously encrypted data from the memory at the memory address; concurrently with the retrieving, determining an address based pad based on the memory address using the nonce; and decrypting the previously encrypted data to produce plaintext data, said decrypting comprising applying an XOR operation to the address based pad and the previously encrypted data to produce plaintext data.
3. The method of claim 2 wherein said decrypting further comprises transforming the previously encrypted data by applying decryption by a light encryption block cipher prior to applying the XOR operation.
4. The method of claim 3 wherein said decrypting further comprises discarding a random pad.
5. The method of claim 3 wherein the light encryption block cipher comprises a plurality of parallel block ciphers having a block size smaller than the block size of the block cipher in the encryption module.
6. The method of claim 3 wherein the light encryption block cipher comprises a predetermined number of rounds of a confusion-diffusion cipher.
7. The method of claim 6 wherein the confusion-diffusion cipher is the advanced encryption standard (AES).
8. The method of claim 3 wherein the light encryption block cipher comprises a predetermined number of rounds of a Feistel Network.
9. The method of claim 8 wherein the Feistel Network is selected from a group consisting of tiny encryption algorithm (TEA), extended TEA (XTEA) and extended XTEA (XXTEA).
10. An encrypted storage comprising: a memory; a processor configured to retrieve previously encrypted data from the memory at a memory address; the processor further configured to retrieve an address based pad based on the memory address concurrently with retrieving encrypted data from memory; the processor further configured to receive a nonce from an encrypting party; and the processor configured to decrypt the encrypted data to produce plaintext data, the processor further configured to apply an XOR operation to the address based pad and the previously encrypted data to produce plaintext data.
11. The encrypted storage of claim 10 wherein the processor is further configured to apply a light encryption block cipher that transforms the previously encrypted data by applying decryption to the previously encrypted data prior to any XOR operation.
12. The encrypted storage of claim 10 wherein the processor is further configured to discard a random pad.
Description
BRIEF DESCRIPTION OF DRAWINGS
(1) Many aspects of the disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.
(2) FIG. 1 illustrates a typical block cipher;
(3) FIG. 2A illustrates encryption of a plaintext message using ECB mode;
(4) FIG. 2B illustrates decryption of a ciphertext message using ECB mode;
(5) FIG. 3 illustrates encryption and decryption using CBC mode;
(6) FIG. 4 illustrates encryption and decryption using CFB mode;
(7) FIG. 5A shows a CTR mode encryption system;
(8) FIG. 5B shows the corresponding CTR mode decryption system;
(9) FIG. 5C shows a common form of the counter used in CTR mode;
(10) FIG. 6A illustrates a typical encrypted storage system;
(11) FIG. 6B illustrates a data flow in an encrypted memory during a read from memory;
(12) FIG. 6C illustrates a data flow into an encrypted memory during a write to memory;
(13) FIG. 7A shows a construction of a counter block which is dependent on memory address and not a counter;
(14) FIG. 7B illustrates a method of reading from encrypted memory using an address based CTR mode;
(15) FIG. 8A illustrates a typical buffered write operation;
(16) FIG. 8B illustrates a write operation into memory encrypted using the address based CTR mode;
(17) FIG. 9A illustrates a block diagram of encryption using a cache;
(18) FIG. 9B illustrates a block diagram of decryption using a cache;
(19) FIG. 10A illustrates the construction of the lookup table 1010;
(20) FIG. 10B illustrates a block diagram of the encryption and storage using a lookup table;
(21) FIG. 10C illustrates a block diagram of the decryption and retrieval using a lookup table;
(22) FIG. 11 illustrates the construction of a counter block comprising a nonce and an address block where the address block is a subset of an address;
(23) FIG. 12A illustrates a block diagram of encrypted storage which overcomes the vulnerabilities introduced using address based pads;
(24) FIG. 12B illustrates the corresponding decryption system to the system in FIG. 12A;
(25) FIGS. 13A and 13B illustrate a block diagram of another alternate embodiment of encrypted storage;
(26) FIGS. 14A and 14B illustrate a block diagram of another alternate embodiment of encrypted storage;
(27) FIG. 15A illustrates one basic approach to deriving a light encryption algorithm;
(28) FIG. 15B illustrates decrypting the light encryption algorithm of FIG. 15A;
(29) FIG. 16A illustrates a typical block cipher comprising a plurality of rounds;
(30) FIG. 16B illustrates the decryption of the block cipher encryption shown in FIG. 16A;
(31) FIG. 16C shows a lightened version of the block cipher shown in FIG. 16A;
(32) FIG. 16D illustrates the decryption of the block cipher encryption shown in FIG. 16C;
(33) FIG. 17A illustrates encryption in a Feistel network block cipher;
(34) FIG. 17B illustrates decryption in a Feistel network block cipher;
(35) FIG. 17C illustrates encryption in a lightened Feistel network block cipher;
(36) FIG. 17D illustrates decryption in a lightened Feistel network block cipher;
(37) FIG. 18A illustrates an encryption cycle of TEA which is two Feistel rounds; and
(38) FIG. 18B illustrates a decryption cycle using TEA.
DETAILED DESCRIPTION
(39) A detailed description of embodiments of the present invention is presented below. While the disclosure will be described in connection with these drawings, there is no intent to limit it to the embodiment or embodiments disclosed herein. On the contrary, the intent is to cover all alternatives, modifications and equivalents included within the spirit and scope of the disclosure as defined by the appended claims.
(40) As described above, in certain applications, it is desirable to encrypt fast storage devices such as dynamic random access memory (DRAM) and static random access memory (SRAM). Chaining and feedback modes are not suitable for random access to data such as in random access memory (RAM). ECB can work, but introduces latency due to the need to encrypt and decrypt every block.
(41) Although CTR mode is often used as a stream cipher, where the counter block is of the construction described in FIG. 5C, it need not be a stream cipher if a different construction is used for the counter block. For example, if the memory address of a storage location is incorporated into the counter block, a CTR mode cipher can operate in a random access manner, that is, only the address of the memory is necessary to retrieve an encrypted block from memory or store a block to memory in an encrypted form.
(42) FIG. 7A shows a construction of a counter block which is dependent on memory address and not a counter. Counter block 702 comprises nonce 720 and memory address 722. For example, if the address space is 32-bit and the block cipher is 256-bit, a 224-bit nonce can be appended, prepended or otherwise combined with the memory address to form the counter block. For the purposes of this disclosure, the operation of constructing a counter block out of an address shall be referred to as a counter function (T.sub.i(A.sub.i)).
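As an illustrative sketch only (not part of the claimed embodiment), the counter function T.sub.i(A.sub.i) with the 224-bit nonce and 32-bit address of this example can be modeled as a byte concatenation; the big-endian layout and prepended-nonce ordering are assumptions made for the example:

```python
def counter_function(address: int, nonce: bytes) -> bytes:
    """Combine a 224-bit nonce with a 32-bit memory address into a
    256-bit counter block by prepending the nonce (illustrative layout)."""
    assert len(nonce) == 28          # 224 bits
    assert 0 <= address < 2 ** 32    # 32-bit address space
    return nonce + address.to_bytes(4, "big")   # 28 + 4 bytes = 256 bits

block = counter_function(0xDEADBEEF, bytes(28))
assert len(block) * 8 == 256
```

Any invertible, collision-free combination of nonce and address would serve equally well; concatenation is merely the simplest.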
(43) FIG. 7B illustrates a method of reading from encrypted memory using an address based CTR mode. A request to read a given address is supplied to both memory 610 and counter function 702. An address based counter produced by counter function 702 is then encrypted by block cipher 104, which produces an address based pad. For the purposes of this disclosure, an encrypted address based counter shall be referred to as an address based pad. Meanwhile, during the encryption operation, the ciphertext is retrieved from memory 610, where it can quickly be XORed with the address based pad to recover the data in plaintext. Typically, the amount of time the encryption takes is comparable to the amount of time it takes to retrieve data from a DRAM, so by parallelizing the calculation of the address based pad and the memory retrieval, no additional latency would be experienced by the processor in reading from encrypted memory. In the case of SRAM, the read times are faster, and this could introduce additional latency.
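The read path can be sketched as follows. This is a software model only: SHA-256 over (key || counter block) stands in for the block cipher E.sub.K, and a thread pool models the hardware parallelism between pad computation and memory fetch; the key and nonce values are arbitrary placeholders.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

KEY, NONCE = b"demo-key-bytes--", b"n" * 28    # placeholder secrets

def address_pad(addr: int) -> bytes:
    counter_block = NONCE + addr.to_bytes(4, "big")      # T(A_i), as in FIG. 7A
    return hashlib.sha256(KEY + counter_block).digest()  # stand-in for E_K

def fetch_ciphertext(memory: dict, addr: int) -> bytes:  # models the DRAM access
    return memory[addr]

def read_decrypt(memory: dict, addr: int) -> bytes:
    with ThreadPoolExecutor(max_workers=2) as pool:
        pad_future = pool.submit(address_pad, addr)          # pad computed...
        ct_future = pool.submit(fetch_ciphertext, memory, addr)  # ...during the fetch
        pad, ciphertext = pad_future.result(), ct_future.result()
    return bytes(c ^ p for c, p in zip(ciphertext, pad))     # only the XOR is serialized

pt = b"0123456789abcdef0123456789abcdef"
mem = {0x100: bytes(a ^ b for a, b in zip(pt, address_pad(0x100)))}
assert read_decrypt(mem, 0x100) == pt
```

The only work that cannot overlap the fetch is the final XOR, which is the source of the latency savings described above.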
(44) With regard to the write operation, FIG. 8A illustrates a typical buffered write operation. Because DRAMs are relatively slow compared to processor speeds, when a processor writes to DRAM, the data and address are buffered in write buffer 802, which typically comprises faster memory. Once the write is posted to write buffer 802, the processor is free to return to other operations. Meanwhile, the data is transferred from the write buffer to memory 610.
(45) FIG. 8B illustrates a write operation into memory encrypted using the address based CTR mode. When a write request is given, the address and data are stored into buffer 804, and the processor is then free to return to other operations. Meanwhile, counter function 702 produces an address based counter, which is then encrypted by block cipher 104 to produce an address based pad. The data is then retrieved from buffer 804 and XORed with the address based pad. The resultant ciphertext is then written to memory 610, optionally first being stored in write buffer 802. However, depending on the implementation of the encryption block, write buffer 802 may not be necessary.
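The decoupling between the processor's write and the encrypted commit can be sketched with a queue and a background worker; `queue.Queue` stands in for buffer 804, and a keyed multiply stands in for the pad derivation (both are modeling assumptions, not the embodiment's mechanisms).

```python
import queue
import threading

memory = {}
buffer_804 = queue.Queue()          # stand-in for buffer 804

def make_pad(addr: int) -> bytes:
    """Toy stand-in for E_K(T(A_i)); a real device runs the counter block
    through the block cipher here."""
    return ((addr * 2654435769) & 0xFFFFFFFF).to_bytes(4, "big")

def writeback_worker():
    while True:
        addr, data = buffer_804.get()
        if addr is None:            # sentinel to end the demo
            return
        memory[addr] = bytes(a ^ b for a, b in zip(data, make_pad(addr)))

t = threading.Thread(target=writeback_worker)
t.start()
buffer_804.put((0x40, b"data"))     # the processor is free as soon as this returns
buffer_804.put((None, None))
t.join()
assert bytes(a ^ b for a, b in zip(memory[0x40], make_pad(0x40))) == b"data"
```

The point mirrored here is that the pad derivation and XOR happen after the processor has moved on, so write latency is hidden.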
(46) If memory 610 is a faster memory, one method of reducing latency is to employ a cache of the address based pads. FIG. 9A illustrates a block diagram of encryption using a cache. To store a plaintext block to encrypted memory at address A.sub.i, cache 920 is checked to see whether the address based pad corresponding to address A.sub.i is stored. If it is, that is a cache hit, and that pad is retrieved and XORed with plaintext block 202 (retrieved from buffer 804 in the example of FIG. 8) to generate ciphertext block 906. Ciphertext block 906 is then stored in memory at address A.sub.i. If there is no address based pad corresponding to address A.sub.i stored in cache 920, a cache miss, then counter function 702 produces an address based counter from address A.sub.i, which is encrypted by block cipher 104 to produce the corresponding address based pad. The resultant address based pad is stored in cache 920 as corresponding to address A.sub.i and XORed with plaintext block 202 to generate ciphertext block 906.
(47) Similarly, FIG. 9B illustrates a block diagram of decryption using a cache. To retrieve a block in encrypted memory at address A.sub.i, ciphertext block 906 is retrieved from memory at address A.sub.i, and cache 920 is checked to see whether the address based pad corresponding to address A.sub.i is stored. If there is a cache hit, the address based pad is retrieved and XORed with ciphertext block 906 to recover plaintext block 202. If there is a cache miss, then counter function 702 produces an address based counter from address A.sub.i, which is encrypted by block cipher 104 to produce the corresponding address based pad. The resultant address based pad is stored in cache 920 as corresponding to address A.sub.i and XORed with ciphertext block 906 to recover plaintext block 202. In general, it is desirable, though not necessary, for the same cache to be used for encryption and decryption, primarily because a block that is stored will presumably need to be retrieved sometime in the future.
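The hit/miss behavior of cache 920, shared between the encrypt and decrypt paths, can be sketched with a memoized pad function; `functools.lru_cache` is an assumption about mechanism made for the sketch, and SHA-256 again stands in for E.sub.K.

```python
import functools
import hashlib

KEY, NONCE = b"k" * 16, b"n" * 28   # placeholder secrets

@functools.lru_cache(maxsize=1024)  # stand-in for cache 920
def cached_pad(address: int) -> bytes:
    counter_block = NONCE + address.to_bytes(4, "big")
    return hashlib.sha256(KEY + counter_block).digest()  # E_K runs only on a miss

pt = b"0123456789abcdef0123456789abcdef"
ct = bytes(a ^ b for a, b in zip(pt, cached_pad(7)))     # encrypt: cache miss
rt = bytes(a ^ b for a, b in zip(ct, cached_pad(7)))     # decrypt: cache hit
assert rt == pt
assert cached_pad.cache_info().hits >= 1                 # the second call was free
```

Using one cache for both directions, as suggested above, means a store warms the cache for the later load of the same address.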
(48) Another method of improving the latency performance is to store the results of the encryption (E.sub.K(T.sub.i)) in a lookup table. FIG. 10A illustrates the construction of lookup table 1010. Each address in the address space of the memory is converted to a counter block 1002. Each counter block is then encrypted with block cipher 104 to produce an address based pad, which is stored in lookup table 1010. FIG. 10B illustrates a block diagram of encryption and storage using lookup table 1010. To store a plaintext block to encrypted memory at address A.sub.i, the address based pad corresponding to address A.sub.i, previously encrypted and stored in lookup table 1010, is retrieved and XORed with plaintext 202 to form ciphertext block 1006. Ciphertext block 1006 is then stored at address A.sub.i in memory. FIG. 10C illustrates a block diagram of decryption and retrieval using lookup table 1010. To retrieve a plaintext block from encrypted memory, ciphertext block 1006 is retrieved from address A.sub.i and XORed with the address based pad stored in lookup table 1010 corresponding to address A.sub.i to recover plaintext block 202.
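The table construction and its use at access time can be sketched over a deliberately tiny address space (16 entries, purely for illustration); SHA-256 stands in for E.sub.K and the key/nonce bytes are placeholders.

```python
import hashlib

KEY, NONCE = b"key-material-demo", b"nonce-bytes-demo"
ADDRESS_SPACE = 16      # toy size; a real table needs one entry per address

# FIG. 10A: precompute E_K(counter block) for every address, once.
lookup_table = [
    hashlib.sha256(KEY + NONCE + a.to_bytes(4, "big")).digest()
    for a in range(ADDRESS_SPACE)
]

def store(mem: dict, a: int, pt: bytes) -> None:     # FIG. 10B: one XOR, no cipher
    mem[a] = bytes(x ^ y for x, y in zip(pt, lookup_table[a]))

def fetch(mem: dict, a: int) -> bytes:               # FIG. 10C: one XOR, no cipher
    return bytes(x ^ y for x, y in zip(mem[a], lookup_table[a]))

mem = {}
store(mem, 3, b"secret block".ljust(32, b"\x00"))
assert fetch(mem, 3).rstrip(b"\x00") == b"secret block"
```

No cipher invocation occurs on the access path at all, which is the latency benefit; the storage cost of a full table is the drawback addressed next.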
(49) However, storing that many results would require a memory at least as large as the encrypted storage memory itself, which generally is not practical. Moreover, a one-time pad would be cryptographically stronger and would employ the same amount of storage as the address based pads.
(50) One method of reducing the memory requirements is to use only part of the address to create a counter block. FIG. 11 illustrates the construction of counter block 1102, which comprises nonce 1112 and address block 1114, where address block 1114 is a subset of an address. For instance, if the memory uses 32-bit addresses, address block 1114 could be the least significant 16 bits of the address. Of course, any subset of the address could be used, including the most significant bits or any internal combination of bits. For more obscurity, it could be an arbitrary bit pattern, such as the odd bits or every fourth bit.
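The subset selections mentioned above can be sketched as plain bit manipulations on a 32-bit address; the particular choices below simply mirror the examples in the text.

```python
def lsb16(addr: int) -> int:
    return addr & 0xFFFF                  # least significant 16 bits

def msb16(addr: int) -> int:
    return (addr >> 16) & 0xFFFF          # most significant 16 bits

def odd_bits(addr: int) -> int:
    # Arbitrary pattern for obscurity: gather bits 1, 3, 5, ... into a 16-bit value.
    return sum(((addr >> i) & 1) << (i // 2) for i in range(1, 32, 2))

addr = 0xAABBCCDD
assert lsb16(addr) == 0xCCDD
assert msb16(addr) == 0xAABB
assert 0 <= odd_bits(addr) < 2 ** 16
```

Whichever subset is chosen, the lookup table shrinks from 2.sup.32 entries to 2.sup.16 entries in this example.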
(51) While the use of subsets of the address can dramatically reduce the amount of storage required in a lookup table, it does open up an additional vulnerability. For the sake of example, suppose the 16 most significant bits of the address are used in a 32-bit address space; then 64 k addresses are associated with each counter block. If an attacker has access to both the plaintext block and the ciphertext block, i.e., can read the memory as well as write to it, the attacker can deduce the encrypted counter block and thereby be able to access all 64 k addresses. In general, this vulnerability allows an attacker to completely bypass the cryptographic power of the block cipher, since knowing the encrypted counter blocks is sufficient for decrypting stored ciphertext; i.e., knowledge of the key or the counter blocks is not necessary. Even if it were not necessary to reduce the number of address based pads, as in the case of the systems in FIG. 7B and FIG. 8B, knowledge of both the ciphertext and plaintext corresponding to a particular location in memory would make it possible to decrypt that location for any future ciphertext content. It would also allow deliberate modification of the ciphertext memory content to represent specific plaintext.
(52) FIG. 12A illustrates a block diagram of encrypted storage which overcomes the vulnerabilities introduced by using address based pads in general, including when subsets of addresses are used. Primarily, it thwarts the known plaintext vulnerability. When storing plaintext block 202 to address A.sub.i, the address based pad corresponding to address A.sub.i is retrieved or calculated and XORed with plaintext block 202. The resultant block is then encrypted with light encryption block cipher 1204, which performs a light encryption, to produce ciphertext block 1206, which is then stored in memory at address A.sub.i. FIG. 12B illustrates the corresponding decryption system. To retrieve a plaintext block at address A.sub.i, ciphertext block 1206 is retrieved from address A.sub.i and decrypted with block cipher 1208, which reverses the light encryption of block cipher 1204; the resultant block is then XORed with address based pad 1210 corresponding to address A.sub.i to obtain plaintext block 202.
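This layering can be sketched as follows. A single fixed substitution table stands in for light encryption block ciphers 1204/1208; it is purely illustrative and far weaker than any cipher the embodiment would actually use.

```python
import random

rng = random.Random(0)              # fixed seed: reproducible, illustrative only
SBOX = list(range(256))
rng.shuffle(SBOX)
INV_SBOX = [0] * 256
for i, s in enumerate(SBOX):
    INV_SBOX[s] = i

def light_encrypt(block: bytes) -> bytes:   # stands in for cipher 1204
    return bytes(SBOX[b] for b in block)

def light_decrypt(block: bytes) -> bytes:   # stands in for cipher 1208
    return bytes(INV_SBOX[b] for b in block)

def store(pad: bytes, pt: bytes) -> bytes:  # FIG. 12A: XOR first, then light cipher
    return light_encrypt(bytes(a ^ b for a, b in zip(pt, pad)))

def fetch(pad: bytes, ct: bytes) -> bytes:  # FIG. 12B: undo light cipher, then XOR
    return bytes(a ^ b for a, b in zip(light_decrypt(ct), pad))

pad = bytes(range(16))
ct = store(pad, b"plaintext-block!")
assert fetch(pad, ct) == b"plaintext-block!"
```

With the light encryption in place, XORing a known plaintext against its stored ciphertext no longer yields the address based pad, which is exactly the known-plaintext attack being thwarted.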
(53) It should be noted that the use of light encryption block cipher 1204/1208 is equally applicable to the systems of FIGS. 7B and 8B, the use of caches or the use of lookup tables.
(54) The light encryption block cipher can be any symmetric cipher, with emphasis placed on speed and low latency rather than security. Depending on the application, the light encryption could be a full-blown block cipher or a mere obfuscation of the block. Examples of light encryption are discussed below.
(55) FIGS. 13A and 13B illustrate a block diagram of another alternate embodiment of encrypted storage. In the encryption path shown in FIG. 13A, prior to the light encryption, a random pad is applied by random padding module 1302. Block cipher 1304 is a wider block cipher that accommodates the padding. For example, if the original block size is 256 bits and a random pad of 16 bits is added, then block cipher 1304 should be a 272-bit block cipher. Additionally, ciphertext block 1306 would be 272 bits while plaintext 202 is 256 bits. This results in an increased storage requirement, but in exchange the security is enhanced because the same plaintext block stored at the same address would yield two different ciphertext blocks. In the decryption path shown in FIG. 13B, after retrieved ciphertext 1306 is decrypted by block cipher 1304, the padding is discarded by module 1310. The rest of the process is as described for FIGS. 12A and 12B. The manner of padding can depend on the nature of the light encryption. To make the random padding effective, the randomness should be diffused throughout the ciphertext. For example, if the light encryption employs a block cipher using a confusion-diffusion approach as described below in FIG. 16, appending or prepending a random pad to the plaintext prior to applying the light encryption would suffice. If the light encryption employs smaller block ciphers as described below in FIG. 15, interlacing the random pad with the plaintext prior to applying the light encryption would provide better protection.
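The widening and discard steps can be sketched with the 256-bit/16-bit sizes of the example above; a byte reversal stands in for the wider light cipher 1304 (an invertible toy, chosen only so the sketch round-trips).

```python
import os

def light_encrypt(block: bytes) -> bytes:   # toy invertible stand-in for cipher 1304
    return block[::-1]

def light_decrypt(block: bytes) -> bytes:
    return block[::-1]

def store(pad: bytes, pt: bytes) -> bytes:          # pt and pad: 32 bytes (256 bits)
    masked = bytes(a ^ b for a, b in zip(pt, pad))
    return light_encrypt(masked + os.urandom(2))    # 16-bit random pad -> 272 bits

def fetch(pad: bytes, ct: bytes) -> bytes:
    masked = light_decrypt(ct)[:-2]                 # module 1310 discards the padding
    return bytes(a ^ b for a, b in zip(masked, pad))

pad = bytes(32)
c1 = store(pad, b"A" * 32)      # same plaintext, same address...
c2 = store(pad, b"A" * 32)      # ...c1 and c2 almost certainly differ
assert fetch(pad, c1) == b"A" * 32 and fetch(pad, c2) == b"A" * 32
assert len(c1) == 34            # 272-bit stored block
```

The 2-byte growth per block is the storage cost traded for ciphertext non-determinism.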
(56) FIGS. 14A and 14B illustrate a block diagram of another alternate embodiment of encrypted storage. Unlike FIG. 7A, the memory address is subdivided into two or more subsets to produce two address based counter blocks 1402 and 1404. For example, counter block 1402 could comprise a nonce and the most significant 16 bits of a 32-bit address, and counter block 1404 could comprise another nonce and the least significant 16 bits of the address. Alternatively, the odd bits and the even bits could be used. In other variations, three subsets could be used, such as the most significant 11 bits, the middle 10 bits and the least significant 11 bits. Other combinations and divisions would no doubt be apparent to one of skill in the art. In keeping with the example of FIGS. 14A and 14B, the memory address is subdivided into two subsets to produce address based counters 1402 and 1404. These counter blocks are encrypted by block cipher 104; the resulting address based pads can be stored in lookup tables, similar to what has been described in FIGS. 10A through 10C, to reduce latency and improve performance, or could be computed in parallel on the fly in an implementation similar to FIG. 7B and FIG. 8B. In the encryption path, shown in FIG. 14A, the encrypted counter blocks 1402 and 1404 are XORed together with plaintext block 202 to generate ciphertext block 1406, which is stored at address A.sub.i. Likewise, as shown in FIG. 14B, the decryption reverses the roles of ciphertext and plaintext. This embodiment can be used in conjunction with the light encryption variants of FIGS. 12A and 12B and can also be used in conjunction with the light encryption with random padding of FIGS. 13A and 13B.
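The two-subset scheme can be sketched as deriving one pad from the high address half and one from the low half, then XORing both with the data; SHA-256 stands in for E.sub.K and the nonce/key bytes are placeholders.

```python
import hashlib

KEY, NONCE_1, NONCE_2 = b"demo-key", b"nonce-one", b"nonce-two"

def pad_hi(addr: int) -> bytes:     # counter block 1402: nonce + 16 MSBs
    msbs = ((addr >> 16) & 0xFFFF).to_bytes(2, "big")
    return hashlib.sha256(KEY + NONCE_1 + msbs).digest()

def pad_lo(addr: int) -> bytes:     # counter block 1404: nonce + 16 LSBs
    lsbs = (addr & 0xFFFF).to_bytes(2, "big")
    return hashlib.sha256(KEY + NONCE_2 + lsbs).digest()

def crypt(addr: int, block: bytes) -> bytes:
    # Both pads are XORed with the data; the same path encrypts and decrypts.
    return bytes(b ^ h ^ l for b, h, l in zip(block, pad_hi(addr), pad_lo(addr)))

pt = b"0123456789abcdef0123456789abcdef"
assert crypt(0x12345678, crypt(0x12345678, pt)) == pt
```

Each table needs only 2.sup.16 entries, yet the effective pad varies with all 32 address bits, since it is the XOR of the two halves' pads.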
(57) A number of the above embodiments employ light encryption. While the term is somewhat subjective, examples can include two parallel AES-128 block ciphers when the block size is 256 bits. However, the advanced encryption standard (AES), even at 128 bits, may still have significant latency, though it does offer stronger security. FIG. 15A illustrates one basic approach to deriving a light encryption algorithm. Module 1502 splits the block into smaller subblocks. A plurality of block ciphers 1504a, 1504b, 1504c, 1504d, etc., encrypt each of these subblocks. The encrypted subblocks are then assembled by module 1506. Each block cipher can use a different key, e.g., K.sub.1, K.sub.2, K.sub.3, and K.sub.4 in the example shown. Since most scalable block ciphers have superlinear complexity, they scale down by more than a simple scale factor, e.g., cutting the block size in half reduces complexity by more than half, leading to a net latency and complexity savings. Furthermore, by using parallel block ciphers, the process is parallelizable for further performance improvements. There are a number of 64-bit block ciphers which can be used, such as DES, Triple-DES, IDEA, CAST-128 (named for creators Carlisle Adams and Stafford Tavares), Mitsubishi Improved Security Technology (MISTY1), and KHAZAD. The Hasty Pudding cipher (HPC) and SKIP32 can operate on even smaller block sizes still. FIG. 15B illustrates decrypting the light encryption algorithm. Module 1552 splits the ciphertext block into smaller subblocks. A plurality of block ciphers 1554a, 1554b, 1554c, 1554d, etc., decrypt each of these subblocks, and module 1556 assembles the decrypted subblocks into the plaintext message.
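The split/encrypt/assemble structure of modules 1502, 1504a-d and 1506 can be sketched as below. A keyed XOR stands in for the per-subblock ciphers purely so the sketch is self-inverse and testable; a real design would substitute one of the 64-bit ciphers named above.

```python
KEYS = [b"key-one!", b"key-two!", b"key-3!!!", b"key-4!!!"]   # K_1..K_4, 8 bytes each

def sub_crypt(sub: bytes, key: bytes) -> bytes:
    # Self-inverse keyed XOR standing in for small ciphers 1504a-1504d.
    return bytes(a ^ b for a, b in zip(sub, key))

def split_crypt(block: bytes) -> bytes:
    subs = [block[i * 8:(i + 1) * 8] for i in range(4)]           # module 1502 splits
    # Each subblock could run on its own hardware unit in parallel (1504a-d),
    # and module 1506 reassembles the results:
    return b"".join(sub_crypt(s, k) for s, k in zip(subs, KEYS))

pt = b"a 32-byte (256-bit) test block!!"
assert split_crypt(split_crypt(pt)) == pt    # the toy subcipher is self-inverse
```

The four subblock operations are independent, which is what makes the scheme parallelizable in hardware.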
(58) In addition, many conventional block ciphers can be lightened. For example, in FIG. 16A, block cipher 1600 is typical of many block ciphers: it comprises a plurality of rounds, each comprising a diffusion operation and a confusion operation. The diffusion operation can comprise a permutation operation, and the confusion operation can comprise a substitution operation. Each round can contain a different diffusion operation and a different confusion operation, although they need not. Furthermore, the diffusion operation and/or confusion operation can be dependent on a key. In most block ciphers of this type, at least the confusion operation in each round depends on the supplied key. Similarly, FIG. 16B illustrates the decryption of the block cipher shown in FIG. 16A. Essentially, the rounds are performed in reverse order, and each round comprises an unconfusion operation and an undiffusion operation, which act in the reverse order to the confusion operation and the diffusion operation. This type of block cipher includes the category known as substitution-permutation block ciphers, such as AES.
(59) In FIG. 16C, block cipher 1620 is a lightened version of block cipher 1600 and comprises a subset of the plurality of rounds. Naturally, the reduced number of rounds makes the cipher less secure, but the reduction also improves the latency of this lightened block cipher. The decryption counterpart to block cipher 1620 is shown in FIG. 16D as block cipher 1670.
(60) Another significant block cipher architecture is the Feistel network block cipher. FIG. 17A illustrates a basic example. To encrypt plaintext block 202, it is divided into two half blocks. (Variants on the Feistel network divide the block into asymmetric subblocks.) In each round, one of the two half blocks undergoes a function F, which is based on a key K.sub.i, and is combined with the other half block. In the subsequent round, the other half block undergoes function F, based on key K.sub.i+1, and is combined. For example, in round 1712a, the second half block has F applied to it with key K.sub.0, and the result is combined with the first half block by combiner 1708. In the next round 1712b, the first half block has F applied to it with key K.sub.1, and the result is combined with the second half block. In the third and fourth rounds, the first half block is replaced by the result of the combination in round 1712a, and the second half block is replaced by the result of the combination in round 1712b. The process repeats until the cipher is considered secure enough. Most standard ciphers prescribe the number of rounds required to meet their cryptographic specification.
(61) FIG. 17B illustrates decryption in a Feistel network block cipher. Basically, the rounds are performed in the reverse order, but instead of a combiner, uncombiner 1718 is used. For example, if an addition operation is used as the combiner, a subtraction operation is used as the uncombiner. Classically, an XOR is used as the combiner, so an XOR can also serve as the uncombiner.
(62) FIG. 17C illustrates how a Feistel network can be lightened. Basically, the cipher of FIG. 17C takes a subset of the plurality of rounds used in FIG. 17A. For example, DES specifies using 16 rounds. A lightened DES cipher could employ 4 rounds. This would cut the latency down by a factor of 4. FIG. 17D illustrates the corresponding decryption block diagram for the lightened Feistel network cipher.
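The Feistel structure of FIGS. 17A-17D, with lightening expressed as a shorter round-key schedule, can be sketched as follows; the round function F is an arbitrary toy mixer, since the Feistel structure guarantees invertibility regardless of F.

```python
MASK = 0xFFFFFFFF

def F(half: int, key: int) -> int:
    # Toy round function; any function works, as the network never inverts F itself.
    return ((half * 0x9E3779B9) ^ key) & MASK

def feistel_encrypt(left: int, right: int, round_keys) -> tuple:
    for k in round_keys:                       # combiner 1708 is XOR
        left, right = right, left ^ F(right, k)
    return left, right

def feistel_decrypt(left: int, right: int, round_keys) -> tuple:
    for k in reversed(round_keys):             # rounds in reverse; XOR uncombines
        left, right = right ^ F(left, k), left
    return left, right

full_keys = list(range(16))                    # e.g. a DES-like 16-round schedule
light_keys = full_keys[:4]                     # lightened: 4 of the 16 rounds
l, r = feistel_encrypt(0x01234567, 0x89ABCDEF, light_keys)
assert feistel_decrypt(l, r, light_keys) == (0x01234567, 0x89ABCDEF)
```

Using 4 round keys instead of 16 models the 4x latency reduction described above: the same code path, just a shorter schedule.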
(63) A specific example of a Feistel network which has been popular for fast operation is the tiny encryption algorithm (TEA). FIG. 18A illustrates an encryption cycle of TEA, which is two Feistel rounds. TEA is a 64-bit cipher and employs a 128-bit key (some variations use multiple 128-bit keys), which can be represented by four 32-bit keys K.sub.1, K.sub.2, K.sub.3, and K.sub.4. During each cycle, the second 32-bit half block is shifted left by four bits (1806) and added to K.sub.1 (1812); the second 32-bit half block is also added to a constant Δ.sub.i (1804), where i represents a cycle index; the second 32-bit half block is also shifted right by five bits (1808) and added to K.sub.2 (1814); and all three sums are XORed together (108). The result is added to the first 32-bit half block (1802). In the second round within the cycle, the result of the sum, now the first 32-bit half block, is shifted left by four bits (1806) and added to K.sub.3 (1816); the first 32-bit half block is also added to Δ.sub.i (1804); the first 32-bit half block is also shifted right by five bits (1808) and added to K.sub.4 (1818); and all three sums are XORed together (108). The result is added to the second 32-bit half block (1802).
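The cycle just described is the well-known TEA reference algorithm, which can be written out directly; the `cycles` parameter makes the lightening of TEA discussed in this disclosure explicit (e.g., 4 cycles instead of the recommended 32).

```python
DELTA = 0x9E3779B9      # 2**31 * (sqrt(5) - 1), derived from the golden ratio
MASK = 0xFFFFFFFF       # 32-bit arithmetic

def tea_encrypt(v0: int, v1: int, key, cycles: int = 32) -> tuple:
    k1, k2, k3, k4 = key
    s = 0
    for _ in range(cycles):             # one cycle = two Feistel rounds (FIG. 18A)
        s = (s + DELTA) & MASK
        v0 = (v0 + ((((v1 << 4) + k1) ^ (v1 + s) ^ ((v1 >> 5) + k2)))) & MASK
        v1 = (v1 + ((((v0 << 4) + k3) ^ (v0 + s) ^ ((v0 >> 5) + k4)))) & MASK
    return v0, v1

def tea_decrypt(v0: int, v1: int, key, cycles: int = 32) -> tuple:
    k1, k2, k3, k4 = key
    s = (DELTA * cycles) & MASK
    for _ in range(cycles):             # cycles in reverse; subtraction uncombines
        v1 = (v1 - ((((v0 << 4) + k3) ^ (v0 + s) ^ ((v0 >> 5) + k4)))) & MASK
        v0 = (v0 - ((((v1 << 4) + k1) ^ (v1 + s) ^ ((v1 >> 5) + k2)))) & MASK
        s = (s - DELTA) & MASK
    return v0, v1

key = (0x11111111, 0x22222222, 0x33333333, 0x44444444)
c = tea_encrypt(0x01234567, 0x89ABCDEF, key)
assert tea_decrypt(*c, key) == (0x01234567, 0x89ABCDEF)
assert tea_decrypt(*tea_encrypt(1, 2, key, cycles=4), key, cycles=4) == (1, 2)
```

The running sum `s` supplies the per-cycle constant Δ.sub.i, and subtraction (block 1822 of FIG. 18B) reverses the additive combiner on decryption.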
(64) The Δ.sub.i used in each cycle is a different multiple of Δ, chosen so that no bit of the multiple will fail to change frequently. Traditionally, the constant is derived from the golden ratio, typically Δ=2.sup.31(√5−1), i.e., 0x9E3779B9.
(65) FIG. 18B illustrates decryption using TEA. The blocks function the same as during the encryption described for FIG. 18A, with the exception that block 1822 is used to subtract the result of the XOR from each half block.
(66) The recommendation for TEA is 32 cycles, or 64 rounds. TEA can be lightened to a small number of cycles; a reduction to 4 cycles reduces latency 8-fold. One can appreciate that the same approach can be applied to TEA's successors, such as extended TEA (XTEA) and extended XTEA (XXTEA).
(67) For most modern block ciphers, whether of the Feistel network, substitution-permutation network or other types, there is a recommended number of rounds to ensure security, usually published with the encryption standard. For example, AES uses 10 rounds for AES-128, 12 rounds for AES-192 and 14 rounds for AES-256; DES uses 16 rounds; and Blowfish uses 16 rounds. For the purposes of this disclosure, this shall be referred to as the recommended number of rounds for security.
(68) It should be emphasized that the above-described embodiments are merely examples of possible implementations. Many variations and modifications may be made to the above-described embodiments without departing from the principles of the present disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.