Efficient encryption method to secure data with reduced number of encryption operations
10554389 ยท 2020-02-04
Assignee
Inventors
Cpc classification
G09C1/00
PHYSICS
H04L9/00
ELECTRICITY
H04L9/065
ELECTRICITY
H04L2209/34
ELECTRICITY
International classification
H04L9/06
ELECTRICITY
G09C1/00
PHYSICS
H04L9/00
ELECTRICITY
Abstract
The invention is particularly related with an encryption method that aims to secure an input data with reduced number of encryption operations. The invention claim the security by using the hardness of the unique decodability of the variable-length non-prefix-free (NPF) codes. The proposed encryption system is operated according to these steps, compression of the input data and coding with the introduced NPF encoding that splits the independent and identically distributed (IID) data to the two stream as code-word boundaries information data, which has all information of boundaries of NPF code words stream, and NPF code words stream which has all payload of encoded IID data except code-word boundaries information and then encrypting the code-word boundaries information without NPF code words stream.
Claims
1. An encryption method, comprising compression of input data before all encoding operations, wherein the method further comprises the following steps: encoding with non-prefix free encoding operation of compressed input data; splitting the encoded compressed input data to two streams as code-word boundaries information data which has all information of boundaries of non-prefix free code words stream, and non-prefix free code words stream which has all payload of the encoded compressed input data except the code-word boundaries information; encryption of the code-word boundaries information data without the non-prefix free code words stream; decryption of the encrypted code-words boundary information data without the non-prefix free code words stream; decoding the non-prefix free coding of the code-words stream according to the decrypted code-word boundary information data; and generating the input data by decompressing the extracted compressed input data.
2. An encryption method, comprising taking input data where a distribution of symbols in the input data is identically and independently distributed, wherein the method further comprises the following steps: encoding with non-prefix free encoding operation of independent and identically distributed input data; splitting the encoded independent and identically distributed input data into two streams as code-word boundaries information data which has all information of boundaries of non-prefix free code words stream, and non-prefix free code words stream which has all payload of the encoded independent and identically distributed input data except the code-word boundaries information; encryption of the code-word boundaries information data without the non prefix free code words stream; decryption of the encrypted code-words boundary information data; decoding the non-prefix free coding of the code-words stream according to the decrypted code-word boundary information data; and generating the independent and identically distributed input data.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The data encryption method that is provided in order to reach the aims of the invention has been illustrated in the attached figures. According to the figures;
(2)
(3)
(4) The parts in the figures have been numbered as follows; 11. Input Data 12. Compression 13. Compressed data 14. IID data 15. NPF encoding 16. Code-word boundaries information 17. Code-word stream 18. Encryption 19. Encrypted code-word boundary information 20. Secured data
DETAILED DESCRIPTION OF THE EMBODIMENTS
(5) The proposed technique processes the n bits long IID data (4) in d-bits long blocks, and replaces every d-bits long block by its corresponding variable-length non-prefix-free code-word generated according to the introduced new coding scheme. The IID data (4) is obtained from input data (1) by compressing (2) it. When input data (1) can be assumed to be independently and identically distributed, the IID data (4) is directly the input data (1) itself omitting the compression (2) step.
(6) A typical block length parameter d is selected such that nd.2.sup.d. The IID data (4) is encoded with the introduced method of variable-length non-prefix-free coding scheme (NPF) (5). The NPF encoding (5) step creates two streams of data. The first stream of is the code-word stream (7), where each fixed d-bits long block of the IID data (4) is NPF encoded (5) with a code-word of varying bit-length. The second stream is the code-word boundary information (6) that specifies the lengths of each code-word in the code-word stream (7).
(7) The NPF code-words set is generated without the prefix-free restriction so that a code-word may be a prefix of another. Thus, non-prefix-free (NPF) code words stream (7) is ambiguous and cannot be decoded without the knowledge of the code-word boundaries information (6). The lack of unique decodability without the code-word boundaries information (6) in NPF encoding (5) provides a unique opportunity in terms of security. Therefore, encrypting (8) the code-word boundary information (6) with an appropriate encryption algorithm is sufficient to provide the security of the input data (1).
(8) The code-words stream (7) is kept intact as it cannot be decoded without the code-word boundary information (6), that is already encrypted (8).
(9) Generally, the security of a system depends on the hardness of solving a mathematical question such as the discrete logarithm or the factorization problems. In this invention it is made use of the hardness of decoding a sequence NPF encoded (5) with variable-length non-prefix-free codes in case the knowledge of the code-word boundaries information (6) is missing. The hardness of cipher-text only attacks on non-prefix-free codes and even on prefix codes had been previously studied in the literature (CITATIONs from the paper).
(10) To examine in detail, A=a.sub.1a.sub.2 . . . a.sub.n is an independently and identically distributed (i.i.d) random bit sequence (4), and d>1 is an integer indicating a bit block length. Without loss of generality, it is assumed n is divisible by d as it is always possible to pad the A with necessary number of randomly generated bits if this is not the case.
(11) The number of d-bits long blocks in A is r=n/d and A can be represented as B=b.sub.1 b.sub.2 . . . b.sub.r, where each d-bits long b.sub.i in B, for 1Ir, is drawn from the alphabet ={0, 1, 2, . . . 2.sup.d1}. For instance, when d=3, the alphabet becomes ={000,001,010,011,100,101,110,111}, and ||=2.sup.d=8. Since A has been assumed to be an i.i.d random bit sequence, B can also be assumed i.i.d. over for reasonable d values selected according to the size of A.
(12) Definition 1.
(13) The minimum binary representation (MBR) of an integer i2 is its binary representation without the leftmost 1 bit.
(14) As an example, the minimum binary representation of 21 is MBR (21)=0101 by omitting the leftmost set bit in its binary representation as 21=(10101).sub.2.
(15) Definition 2.
(16) Let ={.sub.1, .sub.2, . . . , .sub.z} for z=2.sup.d be a random permutation of the alphabet ={0, 1, 2, . . . , 2.sup.d1}, and W={w.sub.1, w.sub.2, w.sub.z} is a code-word set such that
(17)
(18) Assuming an input sequence B=b.sub.1 b.sub.2 . . . b.sub.r, where 0b.sub.i2.sup.d d for all 1ir, the non-prefix-free encoding (4) of B is denoted by N P F (B)=w.sub.1+b.sub.
(19) The NPF encoding (4) of a sample sequence according to the Definitions 1 and 2 with the parameter d=3 is shown in Scheme 2. The code-words w.sub.1 and w.sub.5 are sets due to the corresponding values of .sub.i=6=2.sup.32 and .sub.5=7=2.sup.31. Thus, when b.sub.i=4 or b.sub.i=0, a randomly selected code-word respectively from sets w.sub.5 or w.sub.1, is inserted into the NPF (B) stream. At a first glance, using multiple code-words as an exception for the two .sub.i2.sup.d2 values might seem a bit strange. The reason for that is to avoid a possible statistical attack.
(20) Proposition 1.
(21) In a code-word set W that is generated for a block length d>1 according to Definition 2, the bit-lengths of the code-words are in {1, 2, . . . , d}. The number of l-bits long code-words for each l{1, 2, . . . , d1} is 2.sup.l, and there exist 2 sets of code-words for l=d each of which includes 2.sup.d1 code-words.
(22) TABLE-US-00001 = 0 1 2 3 4 5 6 7 = 6 0 5 1 7 2 4 3 = .sub.1 .sub.2 .sub.3 .sub.4 .sub.5 .sub.6 .sub.7 .sub.8 W = {000, 011, 100, 111} 0 11 1 {001, 010, 101, 110} 00 10 01 = w.sub.1 w.sub.2 w.sub.3 w.sub.4 w.sub.5 w.sub.6 w.sub.7 w.sub.8 A = 001 110 101 011 010 111 100 000 = 1 6 5 3 2 7 4 0 NPF (
) = w.sub.2 w.sub.7 w.sub.6 w.sub.4 w.sub.3 w.sub.8 w.sub.5 w.sub.1 = 0 10 00 1 11 01 001 000
(23) Scheme. 2.
(24) A simple sketch of the non-prefix-free encoding (4) of an input data (1) bit sequence. A is the input data (1), where B is the representation of A by assuming the block length d=3. is a randomly selected permutation of the corresponding alphabet , and W is the non-prefix-free code-word set generated for according to Definition 2.
(25) Proof.
(26) According to Definition 2, the entities in W are minimum binary representations of numbers {2, 3, . . . , 2.sup.d+11}. Since the MBR bit-lengths of those numbers range from 1 to d, there are d distinct code-word lengths in W. Each code-word length l{1, 2, . . . , d1} defines 2.sup.l distinct code-words, and thus, total number of code-words defined by all possible l values becomes
.sub.i=1.sup.d12.sup.i=2.sup.d2
(27) The remaining 2 code-words out of the |W|=2.sup.d items require d-bits long bit sequences. For example, when d=3, the |W| includes 2(=2.sup.1) code-words of 1-bit long, 4(=2.sup.2) code-words of length 2, and 2(=2.sup.36) code-word sets of length 3-bits as shown in Scheme 2.
(28) Lemma 1.
(29) Assuming B=b.sub.1 b.sub.2 . . . b.sub.r denotes an input sequence, where d-bits long bi values for 1ir and d>1 are independent and identically distributed over {0, 1, 2, . . . , 2.sup.d1}, the total bit length of the non-prefix-free code-words encoding the B sequence is
(30)
(31) Proof.
(32) Since B is i.i.d., each value from set {0, 1, 2, . . . , 2.sup.d1} appears r/(2.sup.d) times in B. As stated in Proposition 1, the code-word lengths for NPF encoding (4) in W vary from 1 to d such that there are 2.sup.l code-words for each l{1, 2, . . . , d1}, and l=d bits long code-words are included in 2 separately maintained sets. The number of b.sub.i values that are represented by each l{1, 2, . . . , d1} bits in NPF(B) is (2.sup.l.(r/(2.sup.d)) and thus their contribution to |NPF(B)| is (l.2.sup.l.(r/(2.sup.d)) bits. Since there are 2 code-word set options for length l=d, they produce (d.2.(r/(2.sup.d)) bits in NPF (B). Summing these up gives the total length of the NPF code-stream as:
(33)
(34) While computing the summation term in
(35)
it is used the formula from basic algebra that
.sub.i=1.sup.pi.2.sup.i=2.sup.p+1(p1)+2
and substitute p=d1. The input sequence B is originally r.d bits long, and the NPF encoding (5) reduces that space to
(36)
(37) However, since non-prefix-free codes are not uniquely decodable, NPF (B) cannot be converted back to the original B sequence in absence of the code-word boundaries information (6) on NPF (B). Thus, it is rigidly necessary that representing the code-word boundaries information (6) positions on NPF (B) for proper decoding. Lemma 2 states an efficient method to achieve this task.
(38) Lemma 2.
(39) Let B=b.sub.1 b.sub.2 . . . b.sub.r denotes an input sequence such that the b.sub.i values are i.i.d. over {0, 1, 2, . . . , 2.sup.d1}. The number of bits required to represent the code-word boundaries information (6) in the NPF (B) is
(40)
(41) Proof.
(42) There exists 2.sup.l code-words for each code-word length l{1, 2, . . . , d1}, and each code-word is substituted in place of r/2.sup.d items in B=b.sub.1 b.sub.2 . . . b.sub.r during the NPF encoding (5) of the B. For l=d1, the number of code-words with that length is
(43)
(44) and similarly, when l=d2, there appears
(45)
(46) code-words having length d2. As a generalization, for l=d2 where i={1, 2, . . . , d1}, there are
(47)
(48) code-words with length l on NPF (B). It is maintained (d1) bitmaps denoted by D.sub.1, D.sub.2, D.sub.d-1 such that D.sub.i is dedicated to the specific code-word length l=di for i=1 to i=d1. The bit sequence D.sub.i is of
(49)
bits long, in which the positions that correspond to (di)-bits long code-words are marked with 1, and rest to 0.
(50) For instance, on the first level bitmap D.sub.1 the positions of code-words with length l=d1 are specified by 1, and rest with 0. D.sub.1 contains r bits, and the number of zero and one bits are equal since there are
(51)
(52) items whose NPF code length is l=d1. The second bitmap D.sub.2 is r/2 bits long that are reserved for the items represented by 0 in D.sub.1. In D.sub.2, the positions whose length l=d2 are set to 1, and rest to 0. Same process is repeated on all bitmaps. The last bitmap D.sub.d-i becomes
(53)
bits long, on which the code-words of lengths l=1 and l=d are marked with 1 and 0 bits, respectively. The total number of bits in these (d1) bitmaps can be computed with
(54)
(55) TABLE-US-00002 NPF ( ) = 0 10 00 1 11 01 001 000 Code - word lengths = 1 2 2 1 2 2 3 3
.sub.1 = 0 1 1 0 1 1 0 0
.sub.2 = 1 1 0 0 1 1 0 0
(56) Scheme 3.
(57) Marking the code-word boundaries information (5) on the sample NPF (B) via the proposed method. The 2-bits long code-words are marked on the first bitmap D1 by 1 bits. The lengths for those that are represented by 0 on D1 are specified on the second bitmap D2, on which 1- and 3-bits long code-words represented, respectively, by 1 and 0 bits.
(58) Scheme 2 depicts the proposed method to mark the code-word boundaries information (6) on the sample NPF (B) introduced in Scheme 2 Since d=3 in that sample, there are d1=2 bitmaps. On the first bitmap D.sub.1 that is r=8 bits long, the positions represented with 1 bits indicate that the corresponding code-words that are d1=2 bits long. The second bitmap D.sub.2 is r/2=4 bits long, on which the positions with bit 1 mark the (d2)=1 bit long code-words, and the positions with 0 bit indicate the d=3 bits long code-words. In total
(59)
bits are used.
(60) Theorem 1.
(61) Let B=b.sub.1 b.sub.2 . . . b.sub.r be an input sequence, where the bi values, for 1ir, are i.i.d. over ={0,1,2, . . . 2.sup.d1}, and represented by d-bits. The total bit length of the NPF encoding (4) of the sequence 13 with the code-word boundaries information (5) specification information is:
(62)
(63) Proof.
(64) Sum of the space consumption described in Lemmas 1 and 2 returns this.
(65)
(66) The NPF encoding (5) of the input sequence 13, which is originally r.Math.d bits long, introduces an overhead that requires
(67)
(68) more bits as seen on Theorem 1. The number of extra bits per each original bit in B can be computed with
(69)
(70) TABLE-US-00003 TABLE 1
(71) Table 1 summarizes the amount of extra bits introduced by the proposed NPF encoding (5) per each original bit IID data (4). Thanks to the exponentially increasing denominator (2.sup.d1) in the denominator of the equation
(72)
(73) that the extra space consumption quickly becomes very small, and even negligible with larger d values. When d=8, the proposed NPF encoding (5) method produces only 6.8 extra bits per a thousand bit. Similarly, the overhead becomes less than 3 bits per 100K bits, and less than 2 bits per a million bit for the values of d=16 and d=20, respectively. Thus, while selecting d, one needs to choose a high d value not to spend more space, where as a minimum d=8 seems to be useful in practical applications.