SYSTEM AND METHOD FOR CONCURRENT ENCRYPTION AND LOSSLESS COMPRESSION OF DATA
20230291417 · 2023-09-14
CPC classification: H04L9/00; H03M7/3068; H03M7/46; H03M7/40; G06F21/00
International classification: H03M7/30; H03M7/40
Abstract
A system and method for concurrent encryption and lossless compression of data with an algorithm executing on a computer platform. The lossless compression component of the algorithm preprocesses the data with a Burrows-Wheeler transformation followed by an inversion ranking transformation before applying an entropy coder, such as a binary arithmetic coder. The frequency vector of the inversion ranking transformation is then encrypted and transmitted along with the compressed data; only the frequency vector is encrypted. Since the frequency vector is required for decompression, no further encryption of the compressed data is necessary to secure the compressed file.
Claims
1. A system for concurrent encryption and lossless compression of data, comprising: a computer platform configured to receive and transmit one or more data files; a frequency vector module resident on the computer platform that selectively calculates and stores an inversion frequency vector for encryption; a lossless compressor resident on the computer platform, including: a BWT module that selectively performs a Burrows-Wheeler transformation on one or more data files received at the computer platform, thereby creating one or more Burrows-Wheeler transformed files; an inversion coder that selectively performs an inversion ranking transformation on the one or more Burrows-Wheeler transformed files thereby creating one or more inversion files, the inversion coder in selective communication with the frequency vector module such that the frequency vector module calculates an inversion frequency vector for the one or more inversion files; a zero-run-length encoder that selectively compresses the one or more inversion files thereby creating one or more zero-run-length files; and an entropy coder that selectively compresses the one or more zero-run-length files to create one or more fully compressed files; and wherein the frequency vector module further encrypts the inversion frequency vector and joins the encrypted inversion frequency vector with the one or more fully compressed files, thereby creating one or more encrypted and compressed output files.
2. The system of claim 1, wherein the computer platform is in selective communication with a network and further configured to selectively transmit the one or more encrypted and compressed output files across the network.
3. The system of claim 2, wherein the computer platform selectively receives one or more encrypted and compressed output files from the network, and the computer platform further includes a decompressor that decrypts and uncompresses the received one or more encrypted and compressed output files.
4. The system of claim 1, wherein the frequency vector module and lossless compressor are configured in firmware resident on the computer platform.
5. The system of claim 1, wherein the frequency vector module and lossless compressor are configured in software modules resident on the computer platform.
6. The system of claim 1, wherein the frequency vector module further encrypts the inversion frequency vector with an Advanced Encryption Standard (AES) Cipher Algorithm.
7. The system of claim 1, wherein the entropy coder is a binary arithmetic coder.
8. A method for concurrent encryption and lossless compression of data, comprising the steps of: receiving one or more files at a computer platform configured to receive and transmit one or more data files; compressing the received one or more data files by: performing a Burrows-Wheeler transformation on the one or more data files received at the computer platform, thereby creating one or more Burrows-Wheeler transformed files; performing an inversion ranking transformation on the one or more Burrows-Wheeler transformed files, thereby creating one or more inversion files; calculating an inversion frequency vector for the one or more inversion files at a frequency vector module, the frequency vector module resident on the computer platform and selectively calculating and storing the inversion frequency vector for encryption; compressing the one or more inversion files, thereby creating one or more zero-run-length files; and compressing the one or more zero-run-length files at an entropy coder to create one or more fully compressed files; and encrypting the one or more fully compressed files by: encrypting the inversion frequency vector; and joining the encrypted inversion frequency vector with the one or more fully compressed files, thereby creating one or more encrypted and compressed output files.
9. The method of claim 8, wherein the computer platform is connected to a network, and further including selectively transmitting the one or more encrypted and compressed output files across the network from the computer platform.
10. The method of claim 9, further including: selectively receiving one or more encrypted and compressed output files from the network; and decrypting and uncompressing the received one or more encrypted and compressed output files at a decompressor resident on the computer platform.
11. The method of claim 8, wherein the steps are performed by firmware resident on the computer platform.
12. The method of claim 8, wherein the steps are performed by software modules resident on the computer platform.
13. The method of claim 8, wherein encrypting the inversion frequency vector is encryption with an Advanced Encryption Standard (AES) Cipher Algorithm.
14. The method of claim 8, wherein compressing the one or more zero-run-length files at an entropy coder is compression with a binary arithmetic coder.
15. A non-transitory computer readable medium containing program instructions for concurrent encryption and lossless compression of data, wherein execution of the program instructions by one or more processors of a computer system causes the one or more processors to carry out the steps of: receiving one or more files at a computer platform configured to receive and transmit one or more data files; compressing the received one or more data files by: performing a Burrows-Wheeler transformation on the one or more data files received at the computer platform, thereby creating one or more Burrows-Wheeler transformed files; performing an inversion ranking transformation on the one or more Burrows-Wheeler transformed files, thereby creating one or more inversion files; calculating an inversion frequency vector for the one or more inversion files at a frequency vector module, the frequency vector module resident on the computer platform and selectively calculating and storing the inversion frequency vector for encryption; compressing the one or more inversion files, thereby creating one or more zero-run-length files; and compressing the one or more zero-run-length files at an entropy coder to create one or more fully compressed files; and encrypting the one or more fully compressed files by: encrypting the inversion frequency vector; and joining the encrypted inversion frequency vector with the one or more fully compressed files, thereby creating one or more encrypted and compressed output files.
16. The computer readable medium of claim 15, wherein the computer platform is connected to a network, and execution of the program instructions by one or more processors of a computer system further causes the one or more processors to carry out the step of selectively transmitting the one or more encrypted and compressed output files across the network from the computer platform.
17. The computer readable medium of claim 15, wherein execution of the program instructions by one or more processors of a computer system further causes the one or more processors to carry out the steps of: receiving one or more encrypted and compressed output files from the network; and decrypting and uncompressing the received one or more encrypted and compressed output files at a decompressor resident on the computer platform.
18. The computer readable medium of claim 15, wherein execution of the program instructions by one or more processors of a computer system further causes the one or more processors to carry out the step of encrypting the inversion frequency vector with an Advanced Encryption Standard (AES) Cipher Algorithm.
19. The computer readable medium of claim 15, wherein execution of the program instructions by one or more processors of a computer system further causes the one or more processors to carry out the step of compressing the one or more zero-run-length files with a binary arithmetic coder.
20. The computer readable medium of claim 15, wherein execution of the program instructions by one or more processors of a computer system further causes the one or more processors to carry out the step of configuring a compressor module and a frequency vector module.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE INVENTION
[0038] With reference to the figures, in which like numerals represent like elements throughout the several views, the system and its components are now described.
[0039] The lossless compressor 16 also includes a zero-run-length encoder 24 that selectively compresses the one or more inversion files thereby creating one or more zero-run-length files, and an entropy coder, shown here as a binary arithmetic coder 26, selectively compresses the one or more zero-run-length files to create one or more fully compressed files. The frequency vector module 14 further encrypts the inversion frequency vector and joins the encrypted inversion frequency vector with the one or more fully compressed files, thereby creating one or more encrypted and compressed output files 28.
[0040] In this embodiment, the forward algorithm starts with the input file 18 being processed with BWT at BWT module 20. Next, an inversion ranking is applied to the transformed data at inversion coder 22. The calculated inversion frequency vector F is stored for later use in encryption. To complete the compression process, the output of the inversion coder 22 is compressed with the zero-run-length encoder 24 (RLE-0) and the context-modeled binary arithmetic coder 26. After the completion of the compression process, the stored inversion frequency vector is encrypted for transmission over a non-secure channel. Depending on the intended use of the application, one can choose an encryption algorithm from various standards to encrypt the inversion frequency vector, such as the Advanced Encryption Standard (AES) Cipher Algorithm (128, 192, and 256 bits).
[0041] The efficiency of the present algorithm lies in the fact that, instead of encrypting the entire input file 18, regardless of the file size, only the much smaller inversion frequency vector needs to be encrypted. For example, assume that the alphabet in the input data contains at most 256 symbols, ordered lexicographically, because the algorithm processes data one byte at a time. The alphabets of certain data types may contain fewer symbols; in most multimedia files, however, all 256 symbols are usually present. Since the number of entries of F is the same as the number of symbols in the alphabet, the inversion frequency vector F has 256 entries as well. The entries of F are the frequencies of the symbols in the alphabet. Each entry of F is stored using 4 bytes, and thus the total size of F is 1024 bytes. Files larger than 4 GB will necessarily require more than 4 bytes for each entry of F.
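The 1 KB figure can be checked directly. A minimal sketch, assuming each frequency is stored as a 4-byte unsigned little-endian integer (the byte order is an implementation choice not fixed by the text):

```python
import struct

# 256-symbol alphabet; one frequency count per symbol.
F = [0] * 256
F[0x61] = 1000   # hypothetical frequency for symbol 0x61 ('a')

# Assuming each entry is a 4-byte unsigned integer, little-endian here:
packed = struct.pack('<256I', *F)
assert len(packed) == 1024   # only this 1 KB vector needs to be encrypted
```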
[0042] The Burrows-Wheeler Transformation (BWT) is a well-known lossless data compression algorithm, such as is used in Bzip2, but it is not commonly used in encryption. The forward transformation, for a given data string, for example, is described below:
TABLE 1. Forward Burrows-Wheeler Transformation (index, matrix M, and sorted matrix Ṁ); table content not reproduced in this text.
[0043] The matrix M is constructed as follows: the first row of M is the original data ω, and each succeeding row is obtained by left-shifting the previous row. The resulting matrix M is displayed in the second column of Table 1. We next sort the rows of M lexically, as seen in the third column of Table 1 under the heading Ṁ.
[0044] Finally, we transmit the last column ώ of Ṁ along with the index i of ω in the sorted matrix Ṁ to the receiver (in this example, i = 5, since ω is located at the fifth entry).
[0045] For the inverse BWT, upon receiving the last column ώ of Ṁ and the index (row number) i of ω in Ṁ, we construct the original data string ω as follows. First, by sorting the elements of ώ, we obtain the first column Ċ of Ṁ. The characters in column ώ have an interesting property due to the cyclical shifting: each character in ώ is the prefix character of the corresponding element of Ċ. This relationship dictates a 1-1 mapping between the elements of ώ and Ċ. By locating the elements of Ċ in ώ from top to bottom, we obtain a permutation P, which is then used to obtain the elements of ω with just N = |ω| iterations. To obtain the data elements of ω, using the row index i of ω, we locate and print Ċ.sub.i in row i. Using P.sub.i in the permutation array P as the new index i (i = P.sub.i), we print the next element of ω located in Ċ.sub.i. Repeating the process, we obtain all the elements of ω.
[0046] For our example, we can construct Table 2 using ώ as described above. Since ω is located in the fifth row (i = 5), our first element is Ċ.sub.5 = 2. The next element's index is P.sub.5 = 10. Thus, the element that follows Ċ.sub.5 = 2 is Ċ.sub.10 = 4, and since P.sub.10 = 7, the following element is Ċ.sub.7 = 3. The element that follows Ċ.sub.7 = 3 is located at P.sub.7 = 9 and is Ċ.sub.9 = 3. Repeating this process N = |ω| times, we obtain the original input string ω.
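The forward and inverse transformations described above can be sketched as follows. This is a minimal illustration: it uses 0-based row indices (the text's example counts rows from 1), and the test string "banana" is ours, not from the text.

```python
def bwt_forward(data: str):
    """Sort all cyclic rotations of `data`; return the last column of the
    sorted rotation matrix and the (0-based) row index of the original
    string in that matrix."""
    n = len(data)
    starts = sorted(range(n), key=lambda s: data[s:] + data[:s])
    last = ''.join(data[(s - 1) % n] for s in starts)
    return last, starts.index(0)

def bwt_inverse(last: str, i: int) -> str:
    """Invert BWT using the permutation method described above."""
    first = sorted(last)                   # first column of the sorted matrix
    # P[j] = position in the last column matching first[j], pairing equal
    # characters in order of occurrence (top to bottom).
    next_from, P = {}, []
    for c in first:
        p = last.index(c, next_from.get(c, 0))
        P.append(p)
        next_from[c] = p + 1
    out = []
    for _ in range(len(last)):             # N = |data| iterations
        out.append(first[i])
        i = P[i]
    return ''.join(out)

last, i = bwt_forward('banana')
assert bwt_inverse(last, i) == 'banana'    # round-trip recovers the input
```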
TABLE 2. Inverse Burrows-Wheeler Transformation (index, ώ, Ċ, and P); table content not reproduced in this text.
[0047] Inversion ranking (IR) transformation, also called “inversion coding” (IC), is commonly used to measure sortedness of a permutation or a sequence. It can also be employed effectively in lossless data compression. The encoding (forward) and decoding (backward or inverse) of IR can be demonstrated with the aid of a specific example.
[0048] Here, for illustration, we consider the specific input data vector ώ that was the output of BWT from the previous section. We scan ώ and create the alphabet set A, which includes all the symbols (characters) used in ώ.
[0049] As we scan ώ to construct A, we also collect the frequency of each character in A and store it in the frequency vector F. The number of entries of F is the same as the number of symbols in the alphabet, and the sum of the entries of F is the length of the input data vector ώ.
[0050] Next, we compute four vectors g.sub.i, called the inversion rank vectors, one for each character A.sub.i in the alphabet, as follows. The first entry of each vector g.sub.i is the position index of the first occurrence of the character A.sub.i in ώ.
[0051] For each character A.sub.i, we calculate the inversion rank (distance) between each A.sub.i and the next A.sub.i in ώ, where the inversion rank is the number of elements that are greater than A.sub.i between two consecutive A.sub.is in ώ. This way, we obtain all the entries of the four vectors g.sub.i.
[0052] Note that the entries of the frequency vector F are the lengths of the vectors g.sub.i. After calculating the inversion ranks for each character, we concatenate all the g.sub.i vectors to obtain the inversion vector ω.
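The forward inversion ranking can be sketched as follows. Positions are 1-based, as in the text's examples, and the sketch is checked against the four-symbol toy example given in paragraph [0079] (plaintext [2, 4, 1, 4, 1] with alphabet {1, 2, 3, 4}):

```python
def ir_encode(data, alphabet):
    """Forward inversion ranking. Returns the frequency vector F and the
    concatenated inversion vector (all g_i vectors joined in order)."""
    F = [sum(1 for c in data if c == a) for a in alphabet]
    omega = []
    for a in alphabet:
        prev = None
        for pos, c in enumerate(data, start=1):   # 1-based positions
            if c == a:
                if prev is None:
                    omega.append(pos)             # first occurrence: absolute position
                else:
                    # inversion rank: count of symbols greater than `a`
                    # strictly between the previous occurrence and this one
                    omega.append(sum(1 for x in data[prev:pos - 1] if x > a))
                prev = pos
    return F, omega

# Toy example from paragraph [0079]: plaintext [2, 4, 1, 4, 1]
F, omega = ir_encode([2, 4, 1, 4, 1], [1, 2, 3, 4])
assert F == [2, 1, 0, 2]
assert omega == [3, 1, 1, 2, 0]
```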
[0053] As the output of the IR process, the algorithm provides the alphabet A, the frequency vector F, and the inversion vector ω. From F, we can determine the lengths of the g.sub.i vectors, and from ω we can then identify all the g.sub.i.
[0054] To decode, one needs A, F, and ω. With F, we extract the inversion rank vectors g.sub.i from ω.
[0055] The first element of each inversion rank vector g.sub.i is the position of the first occurrence of the character A.sub.i in the input vector ώ.
[0056] Using this information, the decoding process starts by recovering the original elements of the input vector. For example, in the vector g.sub.1, the rank 1 following index 2 represents that there is one character greater than the current symbol “1” located at the 2.sup.nd position in ω. Hence, we skip one position and insert the character “1” in ώ to the 4.sup.th position:
[0057] Similarly, since the third value in g.sub.1 is 1, we insert the third “1” after skipping one location to the 6.sup.th position in ώ:
[0058] After inserting the "1"s, we process the other characters, for example "2", using g.sub.2. The second value in g.sub.2 is 1, which means that there is one element bigger than the "2" located at position 7. Hence, the second "2" is located at the 9.sup.th position in ώ. Similarly, since the third value in g.sub.2 is 1, we skip one location in ώ and insert the third "2" at the 11.sup.th position. By following the process as described above, one recovers all the elements of the input vector ώ. It is evident from the foregoing example that without knowledge of the correct frequency vector F, decoding cannot proceed. The encryption component of the system, wherever resident and however embodied, encrypts F for transmission over a non-secure channel. Since F is much smaller than the data string, this results in an efficient encryption.
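The decoding procedure can be sketched as follows. The key observation is that symbols are placed in increasing order, so every still-empty slot between two occurrences of the current symbol must belong to a greater symbol; each rank therefore tells how many empty slots to skip. As above, positions are 1-based and the sketch is checked against the toy example of paragraph [0079]:

```python
def ir_decode(alphabet, F, omega):
    """Inverse inversion ranking (1-based positions, as in the text)."""
    N = sum(F)                        # sum of F equals the plaintext length
    out = [None] * N
    k = 0
    for a, f in zip(alphabet, F):
        ranks, k = omega[k:k + f], k + f
        if f == 0:
            continue                  # symbol absent from the plaintext
        prev = ranks[0]               # first occurrence: absolute 1-based position
        out[prev - 1] = a
        for r in ranks[1:]:
            skipped, p = 0, prev + 1
            # Skip filled slots (smaller symbols) freely; count empty slots
            # (greater symbols) until exactly r have been skipped.
            while out[p - 1] is not None or skipped < r:
                if out[p - 1] is None:
                    skipped += 1
                p += 1
            out[p - 1] = a
            prev = p
    return out

# Toy example from paragraph [0079]: F = [2, 1, 0, 2], ciphertext [3, 1, 1, 2, 0]
assert ir_decode([1, 2, 3, 4], [2, 1, 0, 2], [3, 1, 1, 2, 0]) == [2, 4, 1, 4, 1]
```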
[0059] A Burrows-Wheeler Inversion Coder (BWIC) is a general-purpose lossless data compressor. The algorithm consists of four main stages, all of which are reversible. In the first stage, data is transformed with the lexical block-sorting algorithm BWT. During the transformation, while the histogram and size of the data do not change, the order of the data is re-organized by collecting similar values to increase redundancy. The lexical sorter used in this stage requires O(N log N) time to complete the transformation. In the second stage, the transformed data is processed with the IC to exploit redundancies. The IC process requires O(N log N) time.
[0060] In the third stage, the data pre-processed with BWT and inversion ranks is encoded with the zero-run-length-encoder (RLE-0). The RLE-0 plays an essential role in reducing a large number of zeros produced by inversion ranks in the previous stage and prepares the transformed but uncompressed data for the entropy coder in the next stage. The RLE-0 requires O(N) time to complete. The final stage of BWIC is the utilization of an entropy coder. A context-modeled binary arithmetic coder achieves the best compression gain. This stage requires O(N) time. Thus, the BWIC process requires O(N log N) time.
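The exact RLE-0 variant is not specified here; as a hedged illustration only, the following sketch collapses each maximal run of zeros (which inversion ranking produces in abundance) into a single (0, run-length) token. Practical RLE-0 coders often use a binary run-length representation instead.

```python
def rle0_encode(seq):
    """Collapse each maximal zero-run into a (0, run_length) token.
    Simplified stand-in for RLE-0; nonzero symbols pass through unchanged."""
    out, i = [], 0
    while i < len(seq):
        if seq[i] == 0:
            j = i
            while j < len(seq) and seq[j] == 0:
                j += 1
            out.append((0, j - i))    # one token per zero-run
            i = j
        else:
            out.append(seq[i])
            i += 1
    return out

def rle0_decode(tokens):
    """Expand (0, run_length) tokens back into runs of zeros."""
    out = []
    for t in tokens:
        if isinstance(t, tuple):
            out.extend([0] * t[1])
        else:
            out.append(t)
    return out

seq = [3, 0, 0, 0, 0, 1, 0, 2]
assert rle0_decode(rle0_encode(seq)) == seq   # the stage is reversible
```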
[0061] The effectiveness of BWIC as a general-purpose lossless data compressor has been demonstrated in various studies. For example, on a set of gray-scale X-ray images, gains over JPEG 2000 (4.1%) and JPEG-LS (5.1%) were reported. In a more recent study, BWIC was also shown to be effective in compressing various colored medical images, with better compression gains over JPEG 2000 and JPEG-LS of 20.4% and 5.6%, respectively.
[0062] To demonstrate the performance of BWIC on audio files, an audio test data set was used including sixteen speech signals in American English, spoken by eight male and eight female speakers, that are recommended by the International Telecommunication Union (ITU) for speech transmission testing. These and other speech signals in nineteen additional languages are provided as an appendix to Recommendation ITU-T P.50. The remaining entries in the audio test set are several classical and popular music snippets. The efficiency of BWIC on audio files over other well-known generic compressors is evident from Table 3. Specifically, BWIC achieves better compression gains, 15.8%, 15.7%, and 1.4% over Zip, Gzip, and Bzip2, respectively.
[0063] Table 3 is a comparative compression gains of Zip, Gzip, Bzip2, and BWIC on a test set of audio files:
TABLE 3
Audio File (wav)   File Size    Zip         Gzip        Bzip2       BWIC
A_eng_f1           385,396      274,117     273,519     219,717     213,949
A_eng_f2           369,454      273,838     273,281     226,924     223,284
A_eng_f3           328,650      234,560     234,003     188,384     182,572
A_eng_f4           318,560      251,322     250,765     219,810     217,169
A_eng_f5           420,494      313,847     313,290     257,723     255,459
A_eng_f6           419,882      310,897     310,340     256,007     253,110
A_eng_f7           405,068      298,491     297,934     248,759     245,826
A_eng_f8           443,894      328,311     328,176     269,404     265,928
A_eng_m1           405,416      286,423     285,866     234,414     231,419
A_eng_m2           327,650      250,455     249,898     205,657     202,321
A_eng_m3           313,396      239,311     238,754     204,001     200,812
A_eng_m4           355,492      262,257     261,700     221,653     219,520
A_eng_m5           467,950      350,074     349,559     294,982     290,863
A_eng_m6           462,800      348,126     347,569     295,746     291,981
A_eng_m7           436,848      330,731     330,174     276,019     271,629
A_eng_m8           446,292      334,342     333,734     280,637     276,298
ImperialMarch60    2,646,044    2,327,385   2,327,207   1,998,462   1,982,764
PinkPanther60      2,646,044    2,250,574   2,250,398   1,846,584   1,817,867
StarWars60         2,646,044    2,495,370   2,495,197   2,294,905   2,258,142
Total File Size    14,245,374   11,760,431  11,751,374  10,039,188  9,900,913
Ratio                           0.826       0.825       0.705       0.695
[0064] The performance of BWIC on large text files demonstrates that BWIC achieves 3.8% better compression than Bzip2 on the Calgary Corpus, and on the larger files in the Canterbury Corpus, the improvement increases to 10.8%. Executable code for the BWIC compressor and decompressor is commonly available.
[0065] The gains in the run-times of the encryption part of the present algorithm are demonstrated, for example, by the Kodak image 50 shown in the figures.
[0068] The encryption component of the concurrent encryption/compression algorithm consists of encrypting the unique inversion frequency vector of an input file for secure transmission. Without the correct inversion frequency vector, the decompression cannot be performed to recover the original file. However, there remains the possibility of guessing or reconstructing the inversion frequency vector of an input file through malicious attacks. Encrypted files created under the present concurrent encryption/compression system, however, are secure against the various commonly employed attacks, e.g., brute force, histogram, and correlation attacks. In particular, the keyspace is sufficiently large to resist brute force attacks, and the algorithm exhibits strong key sensitivity, as demonstrated below.
[0069] For reference, we refer to the inversion frequency vector of an input file as the key. This should not be confused with an encryption key used by a standard encryption algorithm (e.g., AES) for the purpose of secure transmission of an inversion frequency vector.
[0070] A brute force attack is one of the most commonly employed attacks, in which the attacker uses an exhaustive procedure to try every possible key. To guard against brute force attack, the number of possible keys must be large. A mathematical analysis of the keyspace of the present invention demonstrates the robustness of the encryption. For input files, we will use an alphabet of 256 symbols and order the alphabet as an increasing sequence according to the values of the symbols. Let ώ be an input data string (plaintext) of N symbols from the alphabet for IC. The output string (ciphertext) ω will also be a data string of length N. The only piece of information one can extract from the ciphertext is the sum of the entries of the inversion frequency vector (key), which equals the data length N.
[0071] If some of the symbols of the alphabet are not used in the plaintext ώ, then the corresponding entries of F will be zero. We will consider two scenarios for guessing F by brute force depending on the absence or presence of zero entries.
[0072] In the first scenario, we assume that the data length is large compared to the number of symbols used in the alphabet and that all the symbols of the alphabet appear in the data file. Therefore, all of the entries of F will be nonzero. To guess F by brute force, one must try all frequency vectors of length 256 whose entries sum to N.
[0073] To enumerate all possible such F, we will use the theory of compositions of positive integers. The notion of compositions of a positive integer is related to the well-known notion of partitions of a positive integer. However, in partitions the order of summands does not matter, whereas in compositions it does.
[0074] For this example, let N and k be positive integers with k ≤ N. A sequence of positive integers a.sub.i > 0 for i = 1, ..., k is called a composition of N with k parts if a.sub.1 + a.sub.2 + ... + a.sub.k = N.
[0075] Thus, a formula for the number of possible k-part compositions is: for positive integers N and k with k ≤ N, the number of compositions of N into k parts is the binomial coefficient C(N−1, k−1).
[0076] As an example, we consider a gray-scale image file of size 768×512. Since there are 393216 pixels in total, the length of the file is N = 393216. Moreover, because of our assumption that all 256 symbols in the alphabet are used in the image file, we take k = 256. From the result above, the number of k-part compositions of N is C(393215, 255), which is larger than 2.sup.2048. The number of necessary trials to find the correct F by a brute force attack is indeed enormous.
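The composition count and the keyspace bound can be checked directly. The sketch below verifies the counting formula C(N−1, k−1) against direct enumeration for a small case, then confirms that the keyspace for the 768×512 image exceeds 2^2048:

```python
import math

def count_compositions(N, k):
    """Number of ways to write N as an ordered sum of k positive integers."""
    return math.comb(N - 1, k - 1)

def enumerate_compositions(N, k):
    """Direct enumeration, for sanity-checking the formula on small inputs."""
    if k == 1:
        return [[N]] if N >= 1 else []
    return [[a] + rest
            for a in range(1, N - k + 2)
            for rest in enumerate_compositions(N - a, k - 1)]

# The formula agrees with brute-force enumeration:
assert len(enumerate_compositions(5, 3)) == count_compositions(5, 3) == 6

# Keyspace for a 768x512 gray-scale image with all 256 symbols present:
N, k = 393216, 256
assert count_compositions(N, k) > 2 ** 2048
```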
[0077] In the second scenario, we continue to assume that the data length is large compared to the number of symbols in the alphabet, but some of the symbols are not used in the data file. In this case, some of the entries of F will be zero. To enumerate all possible such keys, we will use the theory of weak compositions of positive integers. Note that the notion of weak compositions of a positive integer is similar to that of compositions, with the additional allowance that zero is permitted as a summand.
[0078] Thus, let N and k be positive integers with k ≤ N. A sequence of nonnegative integers a.sub.i ≥ 0 for i = 1, ..., k is called a weak composition of N with k parts if a.sub.1 + a.sub.2 + ... + a.sub.k = N.
[0079] As a toy example, let us consider the alphabet A = {1, 2, 3, 4} consisting of four symbols and the plaintext ώ = [2, 4, 1, 4, 1]. The corresponding ciphertext is ω = [3, 1, 1, 2, 0] with the frequency vector F = [2, 1, 0, 2]. To guess F by brute force, one must try all weak compositions of N = 5 with 4 parts. The number of such weak compositions can be determined as follows: for positive integers N and k with k ≤ N, the number of weak compositions of N into k parts is C(N+k−1, k−1).
[0080] For our toy example, where N = 5 and k = 4, the number of possible frequency vectors is C(8, 3) = 56. To find the correct F by brute force, one can, of course, try all 56 weak compositions. However, there is another, more economical mode of brute force attack. The idea in this second mode of attack is to try frequency vectors with positive entries while increasing their lengths. The advantage of this form of attack is that there are fewer possible F to try, but the downside is the loss of information about the missing symbols in the plaintext.
[0081] As an example, consider this form of attack on our toy example. For k = 1, there is only one frequency vector F = [5] to try, which fails. For k = 2, there are 4 such vectors, [4, 1], [3, 2], [2, 3], [1, 4], and they all fail. For k = 3, there are 6 such vectors, [3, 1, 1], [2, 2, 1], [2, 1, 2], [1, 3, 1], [1, 2, 2], [1, 1, 3], and the frequency vector [2, 1, 2] among them can be used to decode the ciphertext. Note that so far the attacker has tried 11 different frequency vectors. Once the attacker finds the corresponding frequency vector for decoding the ciphertext, the attacker still needs to determine which 3-element subset of the alphabet is used. There are 4 such possible subsets. In sum, while the first mode of attack required 56 frequency vectors, this second mode of attack requires only 15 frequency vectors. It is now evident that this second mode of attack is more efficient than the previous one for the toy example.
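The arithmetic of the two attack modes on the toy example can be verified with the standard stars-and-bars formulas:

```python
import math

def count_weak_compositions(N, k):
    """Ordered sums of N into k nonnegative parts (stars and bars)."""
    return math.comb(N + k - 1, k - 1)

# First mode: all weak compositions of N = 5 into k = 4 parts.
assert count_weak_compositions(5, 4) == 56

# Second mode: positive compositions of increasing length j = 1, 2, 3
# (1 + 4 + 6 = 11 vectors), then the candidate 3-symbol subsets (4 of them).
tried = sum(math.comb(5 - 1, j - 1) for j in (1, 2, 3))
subsets = math.comb(4, 3)
assert tried == 11 and subsets == 4
assert tried + subsets == 15     # vs. 56 for the first mode
```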
[0082] However, there is a general computational cost to the second mode of attack. For a given ciphertext ω of length N, one starts trying frequency vectors with positive entries and length k, and increases k from 1 until one finds an appropriate frequency vector to invert the ciphertext ω. These frequency vectors will be k-part compositions of N. If we assume that the successful frequency vector is a K-part composition of N, then the number of frequency vectors the attacker must try is C(N−1, 0) + C(N−1, 1) + ... + C(N−1, K−1).
[0083] A successful frequency vector of length K reveals that the plaintext uses only K symbols of the alphabet. However, it is not feasible to infer from the ciphertext ω the locations of the zero entries of F. Therefore, the attacker must try all C(256, K) K-element subsets of the alphabet to determine the plaintext. In the typical intended use of the concurrent encryption and compression algorithm, data files, e.g., image and audio files, are usually large, and most of the symbols in the alphabet are used. As an example, let us consider an image file with 393216 pixels and suppose that 246 of the 256 symbols are used. In this case, both the number of candidate frequency vectors and the number of candidate symbol subsets, C(256, 246), are enormous.
[0084] It is evident from the foregoing considerations that the key space of our present algorithm is sufficiently large to resist brute force attacks.
[0085] Furthermore, an effective cryptosystem must exhibit key sensitivity. There are two considerations for key sensitivity testing: (i) encryption of plaintext with a set of keys with minor differences should generate vastly different ciphertext. This is not an issue for the present encryption process because each plaintext generates its own unique key (frequency vector); (ii) decryption of ciphertext with keys that are arbitrarily close to the correct key should not reveal any recognizable feature of the original plaintext. This requirement is satisfied by the present system, as demonstrated below.
[0086] Since the sum of the entries of the correct key is known, one can use the following two small perturbations of a correct key to test key sensitivity: (i) swap: a randomly chosen entry of the correct key is swapped with an adjacent entry; and (ii) ±1: the value of a randomly chosen entry of the correct key is increased by 1, while the value of an adjacent entry is decreased by 1.
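The two perturbations can be sketched as follows; this is a minimal illustration, and a fuller implementation of the ±1 case would re-draw whenever the decreased neighbor is already zero. Note that both perturbations preserve the known entry sum N:

```python
import random

def swap_perturb(F):
    """Swap a randomly chosen entry of the key with an adjacent entry."""
    F = list(F)
    i = random.randrange(len(F) - 1)
    F[i], F[i + 1] = F[i + 1], F[i]
    return F

def plus_minus_perturb(F):
    """Increase a randomly chosen entry by 1 and decrease its neighbor by 1.
    (Sketch only: does not guard against driving a zero entry negative.)"""
    F = list(F)
    i = random.randrange(len(F) - 1)
    F[i] += 1
    F[i + 1] -= 1
    return F

F = [2, 1, 0, 2]                       # toy-example key from the text
assert sum(swap_perturb(F)) == sum(F)  # sum N is preserved either way
assert sum(plus_minus_perturb(F)) == sum(F)
```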
[0087] Although both BWT and IC are invertible transformations with the correct key, they cannot be inverted with certain incorrect keys. This is true for a substantial number of perturbations, including certain small perturbations described above. It does not seem feasible to determine theoretically if a particular perturbation of the correct key results in noninvertibility. In any case, when either BWT or IC cannot be inverted with a particular incorrect key, the attacker can garner no information about the original plaintext.
[0088] To demonstrate the key sensitivity of our algorithm on image files, four well-known images shown in the figures were used.
[0090] The table contains the PSNR and SSIM values of the RGB channels of the original image versus the decrypted images with swap and ±1 perturbations of the correct key. As can be seen, the PSNR and SSIM values comparing the original image against the images decrypted with perturbed keys are too low to reveal any recognizable features of the original image in the wrongly decrypted ones.
[0099] It should be evident from the foregoing figures that even the smallest possible change in a correct key value results in an entirely different decrypted image than the original one. The PSNR and SSIM values demonstrate that the decrypted images are comparable to randomly generated images.
[0100] To demonstrate the key sensitivity of the present algorithm on audio files, we used speech signals in American English that are recommended by the International Telecommunication Union (ITU) for speech transmission testing. As we have already indicated, testing for key sensitivity for our algorithm amounts to demonstrating that decryption of ciphertext with keys that are arbitrarily close to the correct key should not reveal any recognizable feature of the original plaintext. This entails measuring the differences between two audio files. There are various commonly used metrics for this purpose, e.g., waveform plotting, spectral similarity analysis, Pearson Correlation, Mean Square Error (MSE), Unified Average Changing Intensity (UACI), etc.
[0102] Here, the key sensitivity test of our algorithm on the audio file A_eng_m5.wav (ITU test speech in American English by a male speaker) uses four metrics: waveform, spectrogram, Pearson Correlation, and MSE. In
[0103] The “histogram” of a file is simply the distribution of the symbols in that file. A histogram attack is one of the most commonly used statistical model-based attacks, in which the attacker attempts to infer relationships between segments of the original and the encrypted data from their symbol distributions. The present algorithm of the system and method is resistant to histogram attacks, as demonstrated by an attack on image files. The histogram analysis of our test audio files in Table 3 yielded experimental results similar to those for images.
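The histogram notion used here, the distribution of symbols in a file, can be computed with the Python standard library. The fragment below is a simplified stand-in for the analysis tooling used in the experiments.

```python
from collections import Counter

def histogram(data: bytes):
    """Relative frequency of each byte symbol in a file's contents."""
    counts = Counter(data)
    total = len(data)
    return {sym: c / total for sym, c in counts.items()}

h = histogram(b"abracadabra")
print(h[ord("a")])  # 'a' occurs 5 of 11 times
```

A histogram attack compares such distributions between plaintext and ciphertext; resistance means the ciphertext distribution reveals nothing usable about the plaintext distribution.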
[0104] In
[0105] In general, secure encryption is expected to randomize the input data, and thus the histogram of an encrypted image is expected to have a uniform distribution of pixel values. The present algorithm does not randomize the input data, but it introduces new values that can be much larger than the pixel values (0-255). However, the two histograms of encrypted images of two vastly different pictures appear to be nearly identical, compare histogram 114 with histogram 118. To quantify this visual observation, we used the nonlinear curve fitting algorithm Exp2 of MATLAB, which employs the double exponential function:

f(x) = a·e^(b·x) + c·e^(d·x)

with four parameters, a, b, c, and d. The resulting parameter values and the fitted nonlinear curves are depicted in
[0107] It is evident that the parameter values for the histograms of the two different encrypted images are indeed nearly identical. All the encrypted pictures in the set have similar histograms. Encryption of the other image sets used in this disclosure, as well as random images, also exhibits double exponential distributions, albeit with slightly different coefficients. Therefore, it is unlikely that a histogram attack can succeed in extracting meaningful information about image files encrypted with the present technique.
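To illustrate the comparison, a small Python sketch evaluates the double exponential model and checks how close two fitted curves are over the symbol range. The parameter values below are hypothetical stand-ins, not the fitted values reported in the experiments.

```python
from math import exp

def exp2(x, a, b, c, d):
    """MATLAB-style Exp2 model: f(x) = a*e^(b*x) + c*e^(d*x)."""
    return a * exp(b * x) + c * exp(d * x)

# Hypothetical parameter sets for two encrypted-image histograms;
# nearly identical parameters imply nearly identical distributions.
p1 = (0.90, -0.050, 0.1, -0.005)
p2 = (0.91, -0.051, 0.1, -0.005)

diff = max(abs(exp2(x, *p1) - exp2(x, *p2)) for x in range(256))
print(diff < 0.05)  # the two curves stay close over the symbol range
```

Comparing fitted parameters (or the maximum pointwise gap, as here) turns the visual "the histograms look alike" observation into a quantitative statement.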
[0108] Turning to correlation analysis, adjacent pixels are strongly correlated in most images. To guard against statistical attacks, adjacent pixels of ciphered images must be decorrelated. The degree of correlation of adjacent pixels can be quantified with the Pearson correlation coefficient.
[0109] The Pearson correlation coefficient of two random variables A and B can be computed with the formula:

ρ(A, B) = (1/(N−1)) Σᵢ ((Aᵢ − μ_A)/σ_A)·((Bᵢ − μ_B)/σ_B)

where μ_A and μ_B are the means and σ_A and σ_B are the standard deviations of the random variables. Equivalently, the correlation coefficient can be calculated with the covariance of A and B as:

ρ(A, B) = cov(A, B)/(σ_A·σ_B)
The absolute value of the correlation coefficient ranges from 0 to 1, where 0 indicates complete decorrelation and 1 indicates total correlation, e.g. ρ(A, A) = 1.
[0110] To gauge the success of our encryption algorithm in decorrelating adjacent pixels in cipher images, we performed the following experiment with the results shown in
[0111] The table 136 contains the correlation coefficients of adjacent pixels on the RGB channels for four plain and cipher images, shown as red correlation graph 130, green correlation graph 132, and blue correlation graph 134. It is evident that while the pixels of the original images are highly correlated, their ciphered counterparts are decorrelated. It is clear from these values that the present encryption algorithm successfully decorrelates the highly correlated adjacent pixels in all channels.
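The decorrelation test can be sketched in Python using the covariance form of the Pearson coefficient given above. The 8×8 gradient "image" below is an illustrative stand-in for the test images, chosen because its horizontally adjacent pixels are perfectly correlated.

```python
from math import sqrt

def adjacent_pairs(img):
    """Horizontally adjacent pixel pairs from a 2-D grayscale image."""
    return [(row[i], row[i + 1]) for row in img for i in range(len(row) - 1)]

def pair_correlation(pairs):
    """Pearson coefficient of paired values: cov(A, B) / (sigma_A * sigma_B)."""
    a = [x for x, _ in pairs]
    b = [y for _, y in pairs]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in pairs)
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

# A smooth gradient image: each pixel's right neighbor is one step larger,
# so adjacent pixels correlate perfectly.
plain = [[r * 16 + c for c in range(8)] for r in range(8)]
print(pair_correlation(adjacent_pairs(plain)))  # close to 1
```

For a well-encrypted image, the same computation on the cipher image would yield a coefficient near 0, indicating that adjacent pixels have been decorrelated.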
[0112] To utilize the present algorithm over a non-secure channel, one can choose an encryption algorithm to encrypt the frequency vector. Depending on the intended use of the application, one can choose an encryption algorithm from various standards, for example the Advanced Encryption Standard (AES) Cipher Algorithm in Cipher Block Chaining (CBC) mode. The official specification for AES issued by the National Institute of Standards and Technology (NIST) is commonly available, and one can use the open-source implementation of AES by OpenSSL. A unique 256-bit encryption key and 128-bit initialization vector (IV) are generated using the gen_params() method of the libcrypto library of OpenSSL in each run. The auto-generated encryption key and IV are stored in a file as 48 = 32 + 16 bytes. As shown in the system
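The 48-byte key-material layout described above can be sketched as follows. Here os.urandom stands in for the OpenSSL generator, so this is an illustrative sketch of the layout rather than the embodiment's actual key-generation code.

```python
import os

# A 256-bit AES key plus a 128-bit IV, stored together as 48 = 32 + 16 bytes.
key = os.urandom(32)   # 256-bit encryption key
iv = os.urandom(16)    # 128-bit initialization vector

blob = key + iv        # contents of the 48-byte key file
assert len(blob) == 48

# On the decryption side, split the file back into its two parts:
key2, iv2 = blob[:32], blob[32:]
print(key2 == key and iv2 == iv)  # True
```

Because only the small frequency vector is AES-encrypted, the key material and the encrypted vector together remain a tiny fixed-size overhead regardless of input file size.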
[0114] In this embodiment, there is an encryption module 154 that implements AES. The encryption module 154 includes a key generator 156 and an AES/CBC encryptor 158. The frequency vector is then encrypted with AES (module 160) and output with the compressed file 168.
[0115] Returning to the lossless compressor 148, there also is a zero-run-length encoder 162 that selectively compresses the one or more inversion files thereby creating one or more zero-run-length files, and an entropy coder, shown here as a binary arithmetic coder 164, selectively compresses the one or more zero-run-length files to create one or more fully compressed files. The frequency vector module 146, or other device, further joins the AES encrypted inversion frequency vector with the one or more fully compressed files (Key & IV 166), thereby creating one or more encrypted and compressed output files 168.
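The zero-run-length stage can be illustrated with a simplified Python sketch. The (marker, length) scheme below is an illustrative simplification: practical coders of this family (e.g., bzip2-style RUNA/RUNB codes) encode run lengths differently, but the idea of collapsing the long zero runs produced by the inversion stage is the same.

```python
def rle0_encode(data):
    """Collapse each run of zeros into a 0 marker followed by the run length."""
    out, i = [], 0
    while i < len(data):
        if data[i] == 0:
            j = i
            while j < len(data) and data[j] == 0:
                j += 1
            out += [0, j - i]   # zero marker, then run length
            i = j
        else:
            out.append(data[i])
            i += 1
    return out

def rle0_decode(coded):
    """Expand each (0, run_length) pair back into a run of zeros."""
    out, i = [], 0
    while i < len(coded):
        if coded[i] == 0:
            out += [0] * coded[i + 1]
            i += 2
        else:
            out.append(coded[i])
            i += 1
    return out

seq = [5, 0, 0, 0, 0, 7, 0, 9]
print(rle0_encode(seq))  # [5, 0, 4, 7, 0, 1, 9]
assert rle0_decode(rle0_encode(seq)) == seq
```

The entropy coder then compresses this shortened sequence; the round-trip assertion shows the stage is lossless.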
[0117] On computer platform 142, which can be the same platform for the compressor 148 in
[0118] The run-time gains from the encryption of the present invention are readily apparent over those of encryption algorithms utilizing either EtC or CtE approaches. To determine the growth rate of the execution time of AES as a function of the input file size, we used the OpenSSL random number generator to create one dozen random data files ranging in size from 0.1 MB to 50 MB and measured the encryption times of these files with AES. To measure the running times of the AES algorithm accurately, these measurements were repeated several times and their averages calculated.
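The repeat-and-average timing methodology can be sketched as follows. The workload below is a stand-in for illustration only; in the actual experiment the timed operation was OpenSSL AES encryption of a random file of a given size.

```python
import time

def average_time(fn, repeats=5):
    """Run a measurement several times and average, as described above."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return sum(samples) / repeats

# Stand-in workload; the experiment would time AES encryption here.
payload = bytes(100_000)
avg = average_time(lambda: payload.count(0))
print(avg > 0)  # a positive average running time
```

Averaging over repeated runs smooths out scheduler noise, which matters when the quantity being measured (encrypting a 1024-byte vector) is well under a millisecond.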
[0119] As seen in
[0120] Gains in encryption times are significant for large files. For instance, while it takes 490 ms to encrypt a 50 MB file with AES (on a system with macOS 10.13.6, 2.6 GHz Intel I7-4960HQ, 16 GB RAM), encrypting a 1024-byte frequency vector requires only 0.39 ms. Decryption times of AES are comparable to those of encryption.
[0121] The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below, if any, are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of one or more aspects of the invention and the practical application, and to enable others of ordinary skill in the art to understand one or more aspects of the invention for various embodiments with various modifications as are suited to the particular use contemplated.