Protocol for lightweight and provable secure communication for constrained devices
10601443 ยท 2020-03-24
Assignee
Inventors
- Reza Tourani (Las Cruces, NM, US)
- Satyajayant Misra (Las Cruces, NM, US)
- Scott Ortegel (Las Cruces, NM, US)
- Travis Mick (Albuquerque, NM, US)
- Vicente Ibarra (Las Cruces, NM, US)
Cpc classification
H04L9/0861
ELECTRICITY
H03M7/42
ELECTRICITY
H04L9/0825
ELECTRICITY
H04L9/30
ELECTRICITY
H04L9/002
ELECTRICITY
International classification
H04L9/08
ELECTRICITY
Abstract
A method of sending content comprising receiving a membership request from a client at an anonymizer, the membership request being encrypted with a public key of the anonymizer, generating a table from a prefix-free source coding scheme with a full binary tree, a pseudonym range, and a master key, sending the table, the pseudonym range, and the master key, all encrypted with a public key of the client, receiving a content request with an encoded content name, the content request being encoded using the table, a pseudonym from the pseudonym range, and the master key, decoding the content name of the content request using the pseudonym, the table, and the master key, retrieving content corresponding to the content name, and sending the content and the encoded content name. Secure information sharing is also provided for.
Claims
1. A method of sending content, the method comprising the steps of: receiving a membership request from a client at an anonymizer, the membership request being encrypted with a public key of the anonymizer; generating at the anonymizer a table from a prefix-free source coding scheme with a full binary tree, a pseudonym range, and a master key; sending to the client from the anonymizer the table, the pseudonym range, and the master key, all encrypted with a public key of the client; receiving a content request with an encoded content name from the client at the anonymizer, the content request being encoded using the table, a pseudonym selected from the pseudonym range and hashed with the master key to generate a hashed value, and the master key; decoding the content name of the content request using the pseudonym, the table, and the master key; retrieving content corresponding to the content name; and sending to the client from the anonymizer the content and the encoded content name.
2. The method of claim 1, wherein the table is a Huffman table.
3. The method of claim 1, wherein the encoded content name includes a domain name of the anonymizer.
4. The method of claim 1, wherein the encoded content name includes the content name encoded with the table.
5. The method of claim 4, wherein the encoded content name includes the content name encoded with the table and XOR'd with the hashed value.
6. The method of claim 1, wherein the table is randomly generated.
7. A non-transitory, computer readable medium comprising code for sending content, the code comprising: code receiving a membership request from a client at an anonymizer, the membership request being encrypted with a public key of the anonymizer; code generating at the anonymizer a table from a prefix-free source coding scheme with a full binary tree, a pseudonym range, and a master key; code sending to the client from the anonymizer the table, the pseudonym range, and the master key, all encrypted with a public key of the client; code receiving a content request with an encoded content name from the client at the anonymizer, the content request being encoded using the table, a pseudonym selected from the pseudonym range and hashed with the master key to generate a hashed value, and the master key; code decoding the content name of the content request using the pseudonym, the table, and the master key; code retrieving content corresponding to the content name; and code sending to the client from the anonymizer the content and the encoded content name.
8. The medium of claim 7, wherein the table is a Huffman table.
9. The medium of claim 7, wherein the encoded content name includes a domain name of the anonymizer.
10. The medium of claim 7, wherein the encoded content name includes the content name encoded with the table.
11. The medium of claim 10, wherein the encoded content name includes the content name encoded with the table and XOR'd with the hashed value.
12. The medium of claim 7, wherein the table is randomly generated.
Description
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
(1) The accompanying drawings, which are incorporated into and form a part of the specification, illustrate one or more embodiments of the present invention and, together with the description, serve to explain the principles of the invention. The drawings are only for the purpose of illustrating one or more embodiments of the invention and are not to be construed as limiting the invention. In the drawings:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
DETAILED DESCRIPTION OF THE INVENTION
(11) Consider a network composed of a set of users (U), routers (R), providers (P), and a set of trusted proxies named anonymizers. A router is either a filtering router or a normal router. A filtering router (r.sub.f) filters the incoming requests' data names and drops requests whose names appear in a blacklist of contents. The filtering process is online (happens on the fly), and to reduce congestion and prevent throttling of the traffic the filtering is generally performed very close to line speed (packet arrival rate). Without loss of generality, assume that a user is directly connected to a filtering router, which is in turn connected to the rest of the network.
(12) Assuming that the Internet Protocol (IP) is not used as the network layer, in ICN, the first hop node on the path of a request is the only one that can identify the requester (from MAC-layer header); further along the requester's identity is absent in the packet. This is the worst case scenario. If IP-based addressing is used, assume that the requester's IP address is cloaked using some anonymization technique. With IP-cloaking the situation is the same as when IP addressing is not used. The present invention also applies when the filtering router and the user are separated by several hops. The user requests a content by sending an interest packet containing the name of the content chunk. The name of an object is in a hierarchical format starting with the content provider's name, e.g., www.youtube.com/ArabSpring.mpg.1 (the postfix number 1 is the chunk ID).
(13) In the present invention, instead of using the plaintext content name for the interest, the invention uses an encoded name. The encoded name can only be decoded correctly by a pre-selected anonymizer which has the decoding table. No entity in the path between the anonymizer and the requester has the decoding table and hence cannot decode the name. Between the anonymizer and the data provider (source or an intermediate router) the request is transmitted as a normal request and no filtering happens. Assume that between the anonymizer and the requester the data is encrypted; otherwise the filtering entity can identify the content. Generally, a third-party entity, chosen by the user from a publicly available list, serves as the anonymizer. This is analogous to what happens today: users evade traffic censorship by choosing an anonymizing service, such as anonymizer.com, as a proxy, and bypass the censors by tunneling their traffic (e.g., Facebook, Youtube) through the anonymizer's servers. A content provider can also operate as the anonymizer.
(14) Note that the present invention has a trade-off between user privacy and caching (network) efficiency. It has been shown that to guarantee strong unlinkability of users' identity with their requests in-network caching should be avoided. In the present framework, if a user does not trust the network, it can request the content provider to be its anonymizer. This would guarantee strong unlinkability in the network, however, the corresponding cached data at intermediate routers is unusable to serve new requests. This is true with Tor and ANDaNA. The multi-layered onion-routing based encryptions render the cached private data unusable for satisfying repeat requests. On the other hand, if the user chooses an anonymizer in the network, the cached data is only unusable by the routers in the path between the user and the anonymizer. In-network caching can be leveraged in the rest of the path. For the present analysis, assume the content provider is the same as the anonymizer.
(15) For the present analysis, assume that the attacker (censoring authority) is either an active or a passive eavesdropper. The attacker's aim is to learn how to correctly decode an interest. Also assume that the attacker has bounded capabilities, i.e., it cannot do large-scale brute force attacks. A passive eavesdropper can capture all packet transmissions to perform analysis. An active eavesdropper can capture and modify in-flight requests and also masquerade as a legitimate user to send interests.
(16) Consider an alphabet of size N for the source messages (chunk names). The source message is denoted by M.sup.k, where k is the message size. The encoded message is denoted by Z and is assumed to be a binary string of size n. In the present framework three sources of randomness exist, namely the random selections of the Huffman tree structure, the conventional key, and the source alphabet order from their respective universal sets. The user and the content provider can secretly perform one or more of these three random selections, thus making it difficult to break the system. The Huffman tree is a full binary tree with source alphabet symbols placed at the leaf nodes. Hence, for an alphabet of size N there are N leaf and N1 internal nodes. However, for the same set of symbols there can be several possible trees, one of which can be chosen randomly. The preferred convention to generate a key for a Huffman tree in detail below. The last source of randomness is the random choice of the alphabet order, that is, the order in which the alphabet symbols are placed on the leaf nodes. Instead of the standard Huffman coding where the placement of the symbols is based on the frequency of a symbol, the symbol placement can also follow a random distribution to further increase the framework's randomness.
(17) Exploiting the Huffman source coding, the present invention preferably employs a novel key generation mechanism from the Huffman tree. Assume that the source (user) and the destination (content provider) of a communication have secretly shared a Probability Mass Function (pmf), which represents the probability of the occurrence of the symbols of an alphabet set in a message. The pmf is selected deliberately as a part of the secrecy mechanism and does not reflect the true source distribution. Also assume that both the source and the destination have information to build the Huffman tree with the same structure. The Huffman tree is generated by assigning labels 0 and 1 to the left and the right branch respectively, at each depth of the tree.
(18) With a little analysis one can see that for a Huffman tree with N1 internal nodes there are 2.sup.(N1) mutation trees, where a mutation tree of a Huffman tree is generated by swapping the labels of the internal nodes. Each of these mutation trees is associated with a unique string of size 2.sup.(N1) that is obtained by traversing the tree sequentially by levels and at each level picking up the labels from left to rightsimilar to breadth-first search. Assuming that one has an arbitrary pmf for a six symbols alphabet,
(19) Note that this present convention is not restricted to the Huffman source coding. In general, every prefix-free source coding scheme with a full binary tree can use it.
(20) Huffman coding is a useful approach to mitigate censorship in ICNs. In the present framework, one encodes a part of the content name, the postfix after the domain name, using the Huffman coding algorithm. The domain name is not encoded to allow for name based routing. Note that if the anonymizer is not the provider then the domain name can also be encoded. In this case, the anonymizer's domain name will be used as the prefix of the interest and used for routing. Once the interest reaches the ingress router of the domain (a CP or an anonymizer), the name in the interest is decoded into the real name. Although it is preferred to use the Huffman coding technique to encode user interests, other coding techniques can also be used in the present framework. The present framework preferably comprises three phases: initialization, secure content sharing, and secure content response. The initialization phase is used for sharing credentials between the user and an anonymizer to enable censorship-proof communication. For simplicity of exposition, assume that the content provider is the anonymizer. Table 1 presents the notation used in the preferred protocols.
(21) TABLE-US-00001 TABLE 1 Notations Used Notation Description PK.sub.C Client's public key PR.sub.C Client's private key CR.sub.C Client's certificate PK.sub.A Anonymizer's public key PR.sub.A Anonymizer's private key CR.sub.A Anonymizer's certificate [p.sub.i.sup.l, p.sub.i.sup.h] Pseudonym range for client i M.sub.i Master key shared between anonymizer and client i HT A Huffman table shared between client and anonymizer ( . . . ).sub.HT Symmetric key operation using Huffman table HT (content encoding) { . . . }.sub.PK Encryption with public key PK { . . . }.sub.PR Decryption with private key PR [ . . . ].sub.PR Signing with private key PR [ . . . ].sub.PK Signature verification with public key PK
(22) In the initialization phase, the client sends a membership request to the anonymizer (A) encrypted using the anonymizer's public key and signed using the user's private key (Protocol 1:Line 1). On receiving the membership request, A generates a random Huffman table, by using a random pmf for the interest, and sends the membership reply secretly to the requester; it also stores these information in its table (see
(23) TABLE-US-00002 Protocol 1 Initialization Phase Input: Client's credentials, client's public/private key, anonymizer's public/private key. Output: Huffman table, pseudonym range, master key. 1: Client securely sends registration request to Anonymizer:
(24) For the secure content request phase, the client needs to generate an interest packet, where the interest name has to be customized to evade censorship. Essentially, the hierarchical content name is composed of the anonymizer's domain name in plaintext (to enable prefix based routing), concatenated with the Huffman encoded postfix of the name representing the exact chunk. The Huffman encoded portion of the name may vary depending on the secrecy level required by the user. For the highest level of secrecy, the complete name postf ix after the anonymizer's domain name needs to be encoded. For lower levels of secrecy, the user can encode a portion of the postfix. For instance, consider a content name for an Arab Spring video: www.google.com/movies/2012/ArabSpring/HD/TahrirSquare. The interest with highest secrecy level encodes all segments except www.google.com/(anonymizer's domain name); while the lowest level secrecy, only encodes the last segment, TahrirSquare, the prefix is plaintext. This flexibility helps the user to adjust its desired level of secrecy.
(25) For fast indexing at A, u.sub.i chooses a random pseudonym p.sub.i.sup.m{p.sub.i.sup.lp.sub.i.sup.h} and adds it as an identity field in the interest packet/chunk. Anonymizer A uses an ordered binary tree data structure, which has the ordered pseudonym ranges of the users as leaves, to search for ui's ID (u(i)) in its table. Then, given p.sub.i, to identify u.sub.i A makes O(log IUD comparisons, where U is the set of users. The client hashes the selected pseudonym with the master key it received from the anonymizer (Protocol 2: Line 1). After that, the client xor's the hashed value and the Huffman encoded content name to achieve higher security (Protocol 2: Line 2) and adds the anonymizer domain name as the encoded name prefix for routing sake (Protocol 2: Line 3). Clients not interested in privacy requirements can be allocated only one identifier or can use only one pseudonym from their range. After the interest packet generation, the client sends the interest for the content chunk (Protocol 2: Line 4). Step 4 of
(26) TABLE-US-00003 Protocol 2 Secure Content Request Input: Contents-name, Huffman table, pseudonym, master key. Output: Encoded request. 1: Client randomly selects a pseudonym from the assigned pseudonym range and hashes it with the master key: Value = Hash(M, p.sub.i.sup.m [p.sub.i.sup.l, p.sub.i.sup.h]) 2: It calculates the XOR of the hashed value and the encoded name (using Huffman table): NAME = Encode(content-name) Value 3: Client adds the anonymizer domain name to the NAME as its prefix: Encoded-Name =/Anonymizer/NAME 4: Client sends the request with Encoded-Name and the selected pseudonym to the anonymizer: Encoded-Name, p.sub.i.sup.m
(27) For the secure content response phase, when the interest packet arrives at a filtering node, the filtering node can only infer partial information from the received interest packet. The inference is only based on A's domain name, but is insufficient for filtering the interest to identify the content. The filtering router can only perform longest prefix matching of the interest name with entries in its forwarding information base (FIB) and forward the interest to the appropriate next hop. At A, a table look-up on the pseudonym (Step 5 of
(28) The present framework is also amenable to the other proposed FIA named-data architectures. Note that naming of data in all these architectures fall into one of two categories: hierarchical, human readable naming and flat machine-readable naming (hashed names). In both cases, the name in the interest can be treated as a source message and encoded using the shared Huffman table. For instance, in NetInf, the Anonymizer A can operate as a local or global name resolution server (outside the filtering region) and do the transformation between the encoded and the real-name of the private data. In DONA, the anonymizer can operate as a high-level resource handler beyond the filtering routers. In PSIRP, a subset of the nodes in the rendezvous system can be chosen as the anonymizer(s). These nodes do the mapping between the encoded name and the real name of the data. The anonymizer can send the forwarding identifier to the source (provider) to send the data to the requester and ask the provider to use the encoded name.
(29) TABLE-US-00004 Protocol 3 Secure Content Response Input: Encoded-Name, Huffman table, pseudonym, master key. Output: Successful content retrieval by the anonymizer. 1: Anonymizer retrieves the pseudonym from the request and looks up the client's information. 2: Anonymizer hashes the pseudonym with the master key and XORs it with the Encoded-Name from the request: Huff-Name = Encoded-Name Hash(M, p.sub.i.sup.m) 3: Anonymizer decodes the Huff-Name using the Huffman table: Original-Name = Decode(Huff-Name) 4: It retrieves the content using Original-Name. 5: Anonymizer forwards the reply including the Encoded- Name and content to the client: Encoded-Name, Content
(30) In this section, the application investigates the information-theoretic security of the framework of the invention and the protocols under different assumptions on the eavesdropper's knowledge of the system. As mentioned above, the invention uses the structure of the Huffman tree, the key, and the alphabet order as the sources of randomness. The invention preferably omits the conventional key in the Huffman tree and only considers the tree structure and the alphabet order as the combination of these two sources covers all possible coding tables. Hence, an attacker with the knowledge of the tree structure and the alphabet order can reconstruct the coding table, which breaks the system. However, as these are encrypted using PKI or symmetric keys before transmission, they are secure.
(31) Thus, there are three remaining information-theoretic attack scenarios, which the present application will study below: (i) the eavesdropper has no parsing information, that is, tree structure and the alphabet order are unknown; (ii) the eavesdropper has the order of the alphabet only; and (iii) the eavesdropper has tree structure information only. Note that the source message is essentially the hierarchical data name postfix that needs to be encoded.
(32) First, one derives basic entropy terms that one will need in the rest of the section. As mentioned before, given a tree T for alphabet size N, one has 2.sup.(N1) mutually independent mutation trees for T, that is, 2.sup.(N1) keys for T. The selection of a mutation tree (i.e., key S's selection) uniformly at random from the set of mutation trees results in S's entropy to become:
(33)
(34) Besides the selection of a mutation tree of T, the random choice of the tree structure (a uniform random distribution) is also another source of randomness. The number of mutually independent full binary trees with N leaves is given by the (N1)th Catalan number (C.sub.N1). The Nth Catalan number (C.sub.N) with increasing N is given by C.sub.N(4.sup.N/N.sup.3/2), thus C.sub.N1C.sub.N for large N. Consequently, the entropy of using a random and secret tree structure (T.sub.R) can be written as:
(35)
(36) Given an alphabet of size N, there exist N! permutations of the alphabet order for a tree. The selection of an alphabet ordering uniformly at random from the set of all possible permutations results in A's (i.e., alphabet ordering A's selection) entropy to become:
(37)
(38) Equation (13) is derived according to log(N!) asymptotic bound, log(N!)(N log(N)).
(39) Also, considering the source alphabet with N symbols (e.g., N=512 for Unicode or 256 for ASCII), which are uniformly randomly distributed, the source symbol entropy is given in Equation (16):
(40)
(41) Next, different eavesdropper attack scenarios are discussed.
(42) Scenario 1: Both Tree Structure and the Alphabet Order are Unknown.
(43) In this scenario, the eavesdropper has no knowledge of the tree structure or the alphabet order. Let M.sup.k be the sequence of k symbols to be encoded and Z be the encoded binary sequence with length n symbols. The evaluation of the mutual information between the source message and the encoded sequence is provided by Equation (19) along with (9), (13), (16):
(44)
(45) Equation (17) is obtained from the definition of the mutual information between the source message and its corresponding encoded sequence. The r.h.s. of Equation (18) is obtained from the fact that in this scenario the entropy of the message, given its encoded sequence, equals the entropies of the alphabet order and tree structure choices, and that the mutual information is always non-negative. The outcome of Equation (19) is the conditional entropy of the source sequence, given the encoded sequence, which is equal to the total randomness for both the structure and the alphabet order.
(46) Scenario 2: Tree Structure Known, but not the Alphabet Order.
(47) In this scenario, the eavesdropper has complete knowledge of the tree structure and consequently can build the Huffman tree, and the alphabet order is the only secret. Hence, the mutual information between the source message and its encoded binary string is:
(48)
(49) Equation (23) presents the entropy of the source message assuming each symbol of this message is an i.i.d. random variable. In practice, the dependency between the letters in a word in the English alphabet reduces the entropy of the upcoming symbol (letter) given the prior symbols. This will be discussed below. Equating the r.h.s. of Equation (23) to zero, one concludes that the amount of information leakage as k becomes greater than N is proportional to the value of k. Although the leakage increases linearly with k, it must be investigated whether the eavesdropper can leverage this leakage or not. This is next investigated.
(50) Scenario 3: Alphabet Order Known, but not the Tree Structure.
(51) The opposite scenario is next investigated: the eavesdropper knows the alphabet order, but does not have access to the tree structure. This can happen when the eavesdropper intercepts the alphabet order sharing communication phase between the anonymizer and the user and somehow identifies the encrypted key. Equation (27) returns the entropy of the source message under this condition:
(52)
(53) Table 2 illustrates the thresholds of source message lengths (in symbols) for perfect secrecy for the three scenarios, evaluated in Equations (19), (23), (27) respectively (i.i.d. symbols in the messages). Note that messages longer than the threshold lead to leakage; leakage is defined as the difference between the message length and the length of the threshold.
(54) TABLE-US-00005 TABLE 2 Maximum possible source message length k (in symbols) for perfect secrecy in i.i.d. messages. Scenario N = 32 N = 64 N = 128 N = 256 N = 512 Scenario 1 43.3 83.8 163.07 318.5 624.2 Scenario 2 32 64 128 256 512 Scenario 3 11.3 19.8 35.07 62.5 112.2
(55) Dependent Source Scenario's Information Leakage.
(56) So far, it has been assumed that the source message is composed of i.i.d. random variables. Although this assumption is valid in most cases (for URL names), instances exist where there is a dependency between the source message symbols. For example, if the source message uses English words, then this changes the distribution of the source symbols and they no longer follow an i.i.d. uniform distribution. Hence, one must also investigate the amount of information leakage when the symbols are dependent. Now, the probability of choosing a symbol is conditioned on the previously selected symbols in the same message, which decreases the source message's rate (i.e., the average entropy per symbol in the message).
(57) The N-gram (sequence of any |N| adjacent symbols) entropy per symbol (F.sub.n) is bounded as
(58)
in a way that given the previous a1 symbols, there is a partial ordering of the symbols in the source alphabet corresponding to their probability of appearing as the next symbol. This can be discerned as a mapping between the symbols and integers such that the most probable next symbol (the a.sup.th symbol) conditioned on the a1 previous symbols, maps to i=1, the second probable symbol maps to i=2, and so on. Hence, pa represents the probability of the i.sup.th most probable symbol (among N symbols) to be placed at the a.sup.th position in the message, conditioned on the known a1 previous symbols. Clearly, p.sub.i.sup.1 is the most probable next symbol for the a.sup.th position in the message and p.sub.i.sup.N is the least probable symbol for the same position in the message. Therefore,
p.sub.N.sup.ap.sub.i.sup.ap.sub.1.sup.a,(29)
which can be inferred from (28). The overall probability of source symbols for the (a+1).sup.th position in the source message is at least equal to the overall probability of source symbols for the a.sup.th position, that is,
(59)
(60) In other words, the probability of guessing the correct source symbols increases with the size of the source message. For instance, for the word the, the probability of guessing e after guessing t and h is higher than the probability of guessing h after guessing t.
(61) The general lower bound of the entropy of a source message with k symbols is given as:
(62)
However, calculating is not easy because of the dependence of a symbol on previous symbols. Consequently, for ease of calculation one tries to obtain a bound that approaches from below. To obtain this bound, first derive the following equation:
(63)
Equation (32) is obtained from the fact that cannot be smaller than the entropy calculated for the source message by substituting the entropy of the last symbol (the last symbol's entropy, given the knowledge of the previous symbols is very low) in place of every source symbol; also cannot be larger than the entropy calculated by substituting all symbols with the first symbol in the source.
(64) It is easy to see that in Scenario 1, the lower bound entropy of the source message is at least as high as the l.h.s. of (32). This is especially true as URL addresses tend to also have symbols other than the English alphabet and sometimes contain incomplete words or meaningless strings, which would increase their randomness. Hence, use the l.h.s. of Equation (32) to approximate the entropy of the source message, hence Equation (19) now becomes,
(65)
Equating the r.h.s. of inequality (34) to zero, Equation (35) presents the condition for perfect secrecy,
(66)
(67) Similarly, the perfect secrecy threshold for Scenario 2 is:
(68)
and Scenario 3 is
(69)
Note that in the dependent source case, the bound for k is dependent on the inter-symbols dependency, which is intrinsic to each message. Hence, it is difficult to derive something similar to Table 2. However, in both set-ups (independent/dependent sources) an important follow-up question is, what happens when k is greater than the corresponding bounds? Then the secrecy is no longer information-theoretically perfect. Two choices exist at that point.
(70) When k equals the bound the user and the anonymizer can use another Huffman table to continue perfectly secure communication. They can use a synchronized protocol where before k reaches the bound the anonymizer can piggyback a new encrypted Huffman table (a small overhead) with the data. Or, in the interest of speed and low overhead, the user can choose to keep using the current table and risk leaking information. In this latter case, an eavesdropper can utilize the information leakage to mount efficient brute-force or cryptanalysis attacks to break the framework. In the next section, this application analyzes the feasibility of such attacks.
(71) In this section, the application investigates and analyzes the security of the framework of the present invention from the perspective of well-known attacks. The present invention is secure against known plaintext attacks. Also, it is secure against the chosen plaintext attack as the eavesdropper cannot get the user/anonymizer to encrypt a chosen plaintext using the corresponding Huffman table. In the chosen ciphertext attack, the attacker needs to obtain the decryption of its selected ciphertext. This is not possible in the present framework, as the anonymizer is the only entity with decoding capability. But, the anonymizer does not publish the decoded interest. The use of independent Huffman tables for users, selected uniformly at random, prevents the information leakage of one user from affecting others. This uniform selection of coding tables also prevents users to be able to correlate their coding tables with those of others to decode their encoded interests. Ciphertext-only attack can be mounted as the attacker has access to a set of ciphertexts (encoded interests). If the user continues to use the same Huffman table, then the repeated interests will have the same encoded names, leaking information that the eavesdropper can use to make the cipher-text attack more potent. This leakage can be prevented by XOR-ing the postfix with a nonce and sending the nonce appended with the encoded URL.
(72) This leaves two attacks that can be orchestrated by an attacker (active/passive): Correctly guessing the source message from the encoded sequence, i.e., ciphertext-only attack, or using brute-force to identify the correct key and the tree structure. As to correctly guessing the source message, the present analysis uses guessing entropy, which is the expected number of guesses required by an attacker to ascertain the correct source message, to calculate the ease of guessing the source message. Let G(M.sup.k|Z) be the number of guesses required to identify M.sup.k given Z, in a way that E[G(M.sup.k|Z)], the expected number of successive attempts, is minimized. Equation (38) evaluates the corresponding E[G(M.sup.k|Z)]:
(73)
where P.sub.Z(z) is the probability of selecting an encoded binary sequence from the pool of all possible binary sequences. The guessing entropy is lower bounded by the conditional entropy as
E[G(M.sup.k|Z)]2.sup.H(M.sup.
Now one evaluates the lower bound on the guessing entropy in the three scenarios. Substituting Equation (19) in Equation (39), one has the lower bound guessing entropy for Scenario 1 as:
E[G(M.sup.k|Z)]2.sup.(2N+(N3/2)log(N)2)+1.(40)
Similarly, by substituting Equations (23) and (27) in Equation (39), one has:
E[G(M.sup.k|Z)]2.sup.(N log N)2+1(41)
and
E[G(M.sup.k|Z)]2.sup.(2N3/2 log(N)2)+1(42)
for Scenarios 2 and 3 respectively.
(74)
(75) As mentioned above, there are N! alphabet orderings for a given tree. Considering an N symbols alphabet, there exists C.sub.N=(2N!)/[(N+1)!(N!)] different Huffman trees where N=N1. Each of these Huffman trees has N! alphabet placement. For a brute-force attack to identify the mutation tree, an attacker needs to compute on average N!/2 different alphabet placement. Given this, the attacker has to use [(2N2)!/N!(N1)!] N!/2=[(2N2)!/2(N1)!] different Huffman coding tables on average to decode the encoded message when attempting the brute force attack. Even with N=256 (extended ASCII) it is computationally difficult to examine this search space at a filtering router, even when it is performed offline.
(76) So far, the application has proved the information-theoretic secrecy and the computational security of the inventive framework. In the next section, the application presents experimental results, which answer the next question: how efficient, applicable, and scalable is the framework for real-world mobile devices?
(77) In this section, the application augments the security of the proposed framework against chosen plaintext and chosen ciphertext attacks. As mentioned above, the attacker does not have access to the encoding and decoding oracles to mount chosen plaintext and chosen ciphertext attacks. However, the system is insecure if the attacker maliciously enforces the client/anonymizer to encode/decode a plaintext/ciphertext. Thus, an extension to the framework is presented to prevent such scenarios that is also secure against pattern analysis attack on the encoded message, which helps the attacker to infer some information about the message.
(78) In the initialization phase, the client and the anonymizer, in addition to a random Huffman table and a pseudonym range, share a master key (). For securely requesting a content, the client first encodes the plaintext and randomly selects an in-range pseudonym as explained above. Following the Huffman encoding, the client generates a temporary key (K) by hashing
with the selected pseudonym (will be used in the interest packet) using a keyed hash function. Eventually, the client XORs the temporary key, K, with the encoded message. If the length of K is smaller than the length of the encoded message, the temporary key will be repeated to the length of the message.
(79) The anonymizer, who is receiving the interest, looks up the client using the pseudonym to find the corresponding master key () and coding table. Using the same hash function, the anonymizer generates K by hashing the pseudonym in the interest with the master key and XORs the result with the received interest. Eventually, the anonymizer decodes the message using the corresponding Huffman table (as defined above).
(80) In order to evaluate the secrecy threshold in this new model, one first derives the entropy of the generated temporary key, K. Since K is generated by a hash function, assume that each symbols of the hashed value is an i.i.d. random variable, and hence the entropy of K is:
(81)
where |K| is the length of the temporary key (hashed value) in bits. The perfect secrecy threshold for the Equation (19), will be transformed as:
I(M.sup.k; Z)=H(M.sup.k)H(M.sup.k|Z)(47)
max(kH(X)(H(T.sub.R)+H(A)+H(K))(0))(48)
k log(N)2N+(3/2N)log(N)||.(49)
Consequently, Equation (23) will be:
(82)
and Equation 27 will be:
I(M.sup.k; Z)max(kH(X)(H(T.sub.R)+H()),0)(53)
=max(k log N(2N3/2 log N+|K|),0)(54)
=(k+3/2)log N2N||.(55)
Even in the scenario in which the attacker knows both the tree structure and the alphabet ordering, decoding the ciphertext is not trivial due to the XOR operation as:
(83)
(84) Here, one does not draw a table similar to Table 2 as different hashing functions have different output length, which makes the length of the temporary key dependent on the employed hashing function. Furthermore, converting the length of the temporary key (in bit) to the number of source symbols would not be precise due to variable length nature of Huffman codes. Although for a rough estimation, one can divide the number of bits in K to the expected length of Huffman codes, |K|/E[I(X)].
(85) For the experimental evaluation, one has clients requesting content over the network to a CCN media server (content provider), which is also the anonymizer. The testbed consists of 18 nodes, eight desktops, six laptops, and four smartphones (3 Nexus 4 and one Nexus 5). A 4-tiered line topology network is connected using switches and IPv4 routers. For the experiments, the clients and the anonymizer are placed on either ends of the linerequests travel over five hops. The client, server, and the nodes in the network employ the CCNx-0.7 code base; the present framework is implemented in C and is integrated into CCNx. The nodes route packets using longest prefix matching.
(86) For fair comparison, the tests have disabled caching, so an interest passes through all 4 tiers. The tests compare latency and content retrieval time over four different scenarios, namely the vanilla (Baseline) CCN implementation, CCN with our anti-censorship framework (CCN+Huffman), data retrieval using FTP, and using Tor, the state of the art Internet anti-censorship tool. The tests also compare the overhead of the inventive framework and Tor over their respective baselines, Baseline CCN and FTP, respectively. For testing Tor, the tests setup the testbed as a Tor network where the first three network gateways (from client towards the server) are provisioned as Tor proxiesthree onion layers of symmetric encryption for the client. All the results were averaged over 100 runs. The size of the contents in the experiments were chosen from the set {1 MB, 10 MB, 100 MB, 500 MB}.
(87) One option for encoding the data name (or the postfix after the domain name) is to use a strong symmetric key algorithm, such as AES. In Table 3, the tests compare the time taken for encryption/decryption by two widely used AES versions and for encoding/decoding using the inventive framework. As an alternative solution, the client can hash the content name with a salt given by the anonymizer. The anonymizer needs to pre-hash all the content names with each salt corresponding to each client. Upon receiving an interest from a particular client, the anonymizer does a look-up on the hashed content name to find the requested content. Though the storage requirement for these hashes grows infeasible with a large number of clients and/or contents, the tests nonetheless evaluate the performance of the Openssl SHA1 digest for the sake of comparison.
(88) TABLE-US-00006 TABLE 3 Running time comparison between the AES symmetric key cryptography and Huffman encoding. Encoding Scheme Encoding (s) Decoding (s) aescrypt in unix (laptop) 0.050 0.021 AES openssl (laptop) 0.010 0.008 Huffman coding (laptop) 0.004 0.004 AES openssl (Nexus 5) 0.041 0.023 Huffman coding (Nexus 5) 0.006 0.005 Huffman* (laptop) 0.000034 0.000027 SHA1 hashing (laptop) 0.000093
(89) The tests measured the timings on a wired laptop (AMD Turion, 2.4 GHz, dual core, 2.7 GB RAM) and on two wireless Nexus 5 smartphones (Krait 2.3 GHz, quad core, 2 GB RAM). For the Huffman operations, indicated by Huffman coding in Table 3, the time includes reading the source symbol frequencies, building the tree, and encoding/decoding the codewords. While the Huffman represents the elapsed time only for the encoding/decoding operations. The represented time for AES accounts for the encryption and the decryption operations only. The data name in the test contained 75 characters.
(90) The optimized OpenSSL AES version is almost four times as fast as the aescrypt version; note that this is the version used in the Tor experiments. Encoding/decoding in the inventive framework (Huffman) is three orders of magnitude faster than OpenSSL (0.000034 vs. 0.010). As mentioned in Protocol 2, a client has to hash the selected pseudonym with the master key and XORs the result with the encoded name. XOR is a low cost operation with negligible latency, which can be omitted. Hence, the combined cost of hashing (0.000093) and encoding (0.000034) operations on a laptop, as per Table 3 is 0.0001 second, which is two orders of magnitude faster than OpenSSL.
(91)
(92)
(93) This fact is also highlighted by the estimated round trip time (RTT) results in
(94)
(95)
(96) These results show that the inventive framework is much more efficient and scalable than Tor, the state of the art, as a mechanism to mitigate censorship of user communications.
(97) In this section, the invention extends the Huffman coding technique for secure communication. For secure communication, a client and the server (anonymizer) should authenticate each other and securely share a Huffman table as their secret. The client uses this table for encoding part of its request while the server uses it for content encoding. The client sends a membership request to the Server and requests an initiation for a secure communication.
(98) The client attaches its public key (PKC) and certificate to the membership request it sends to the server (Protocol 4: Line 1). Upon receiving this request, the server generates a random Huffman table and stores it along with the client's public key. It then forwards the Huffman table encrypted with the client's public key and signed by itself along with its public key (PKS a.k.a PKA) and certificate to the client (Protocol 4: Line 2). The client verifies the server's certificate to ensure the server's authenticity. It decrypts the Huffman table with its public key and stores it if the server's certificate is valid and drops it otherwise (Protocol 4: Line 3).
(99) At this time a secure session is established and hence, the client can request a content from the server. The client then requests a content from the server (Protocol 4: Line 4). The server, upon receiving the request, retrieves the client's Huffman table, encrypts the content with the table, signs it, and forwards it to the client (Protocol 4: Line 5). The client validates the server's signature and decrypts the content with the Huffman table if the signature is valid (Protocol 4: Line 6). The shared Huffman table needs to be updated every so often to guarantee the security of the system. For requesting a new table, the client performs the same steps as at the beginning to share a new Huffman table with the server.
(100) TABLE-US-00007 Protocol 4 Initialization Phase Input: Client's public/private key and certificate, server's public/private key and certificate. Output: A shared Huffman table between the client and server. 1: Client generates and sends membership request to the server: Request = PK.sub.C, CR.sub.C
2: Server generates a random Huffman table, stores it, and securely forwards it to the client along with its public key and certificate: Secret = {[HT].sub.PR.sub.
Secret, PK.sub.S, CR.sub.S
3: Client decrypts the Huffman table and verifies the server's certificate and signature: {[ Secret ].sub.PK.sub.
(101) The present invention provides a lightweight anti-censorship framework for ICN users, specifically for mobile users. This application proved conditions and thresholds for perfect secrecy as well as analyzed the computational complexity of the framework. The framework's breakability study showed the advantages of Huffman coding over AES, and the extensive experimental results demonstrated the efficiency of the inventive framework in comparison to other frameworks, such as Tor.
(102) The invention can be additionally strengthened via innovations in several facets of the construction of the Huffman encoding to improve the underlying strength of the protocols. The facets are: 1) Frequently changing Huffman tables to increase entropy (hence secrecy) of the system; 2) dynamic choice of tree structure/topology; 3) dynamically changing labels of the edges; 4) dynamically changing order of associating the labels for the edges; and 5) dynamic assignment of the p.m.f for the symbols.
(103) In at least one embodiment, and as readily understood by one of ordinary skill in the art, an apparatus according to the invention will include a general or specific purpose computer or distributed system programmed with computer software implementing the steps described above, which computer software may be in any appropriate computer language, including C++, FORTRAN, BASIC, Java, assembly language, microcode, distributed programming languages, etc. The apparatus may also include a plurality of such computers/distributed systems (e.g., connected over the Internet and/or one or more intranets) in a variety of hardware implementations. For example, data processing can be performed by an appropriately programmed microprocessor, computing cloud, Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA), or the like, in conjunction with appropriate memory, network, and bus elements.
(104) Embodiments of the present invention provide a technology-based solution that overcomes existing problems with the current state of the art in a technical way to satisfy an existing problem for sending, receiving, coding, and decoding of confidential communications. An embodiment of the present invention is necessarily rooted in computer technology in order to overcome a problem specifically arising in the realm of computer technology. Embodiments of the present invention achieve important benefits over the current state of the art, such as increased flexibility, faster computation times, smaller memory requirements, etc. Some of the unconventional steps of embodiments of the present invention include the use of Huffman coding, employed in a unique manner to guarantee confidentiality for secure communications.
(105) Note that in the specification and claims, about or approximately means within twenty percent (20%) of the numerical amount cited. All computer software disclosed herein may be embodied on any computer-readable medium (including combinations of mediums), including without limitation CD-ROMs, DVD-ROMs, hard drives (local or network storage device), USB keys, other removable drives, ROM, and firmware.
(106) Although the invention has been described in detail with particular reference to these embodiments, other embodiments can achieve the same results. Variations and modifications of the present invention will be obvious to those skilled in the art and it is intended to cover in the appended claims all such modifications and equivalents. The entire disclosures of all references, applications, patents, and publications cited above are hereby incorporated by reference.