Leakage-free order-preserving encryption
10833841 ยท 2020-11-10
Assignee
Inventors
Cpc classification
H04L2209/12
ELECTRICITY
G06F21/6227
PHYSICS
H04L9/0618
ELECTRICITY
H04L9/002
ELECTRICITY
International classification
G06F21/62
PHYSICS
H04L9/06
ELECTRICITY
Abstract
Embodiments implement leakage-free order-preserving encryption by assigning a distinct ciphertext for each plaintext, including repeated plaintext whose ciphertext is randomly inserted. In order to conceal insertion order, the randomized ciphertexts are compressed to minimal ciphertext space. A uniform distribution is achieved by rotating about a modulus on the ciphertexts rather than the plaintexts. The resulting ciphertext distribution has no leakage from the ciphertextseven if an adversary has perfect background knowledge on the distribution of plaintexts. The encryption may be further secured even against passive query monitoring attacks by hiding the access pattern using , -differential privacy, such that the adversary observing a sequence of queries will not learn the frequency of plaintext. The leakage-free order-preserving encryption may be converted into an adjustable encryption scheme to allow querying (e.g., on a remote server).
Claims
1. A method comprising: a client encrypting plaintext into randomized ciphertext by inserting repeated plaintext elements at random locations, such that there are multiple ciphertexts for a single plaintext; the client converting the randomized ciphertext into a compressed ciphertext space according to a probabilistic encryption scheme; the client performing a rotation of the compressed ciphertext space about a modulus to create distributed ciphertext; the client encrypting the distributed ciphertext according an adjustable encryption scheme; the client forwarding the encrypted distributed ciphertext to a server; an in-memory database engine at the server storing the encrypted distributed ciphertext in an in-memory database of the server; the client rewriting an equality query into a range query; the client communicating the range query to the server; in response to the range query, the in-memory database engine, decrypting the encrypted distributed ciphertext to a searchable encryption scheme, and searching the decrypted distributed ciphertext in the searchable encryption scheme according to the range query using an index, to produce a range query result comprising a subset of the decrypted distributed ciphertext in the searchable encryption scheme; the client receiving from the server the range query result; and the client decrypting the range query result from the searchable encryption scheme.
2. A method as in claim 1 wherein the adjustable encryption scheme stores a binary tree.
3. A method as in claim 2 wherein the binary tree is self-balancing.
4. A method as in claim 1 wherein decrypting the range query result comprises updating the modulus based upon a further rotation performed at the server.
5. A method as in claim 1 further comprising applying a differential privacy procedure to generate the range query as a private query.
6. A method as in claim 5 wherein the differential privacy procedure utilizes a , -differentially private mechanism.
7. A non-transitory computer readable storage medium embodying a computer program for performing a method, said method comprising: a client encrypting plaintext, into randomized ciphertext by inserting repeated plaintext elements at random locations, such that there are multiple ciphertexts for a single plaintext; the client converting the randomized ciphertext into a compressed ciphertext space according to a probabilistic encryption scheme; the client performing a rotation of the compressed ciphertext space about a modulus to create distributed ciphertext; the client encrypting the distributed ciphertext according an adjustable encryption scheme; the client communicating the encrypted distributed ciphertext to a server; an in-memory database engine at the server storing the encrypted distributed ciphertext in an in-memory database of the server; the client rewriting an equality query into a range query; the client communicating the range query to the server; in response to the range query, the in-memory database engine, decrypting the encrypted distributed ciphertext to a searchable encryption scheme, and searching the decrypted distributed ciphertext in the searchable encryption scheme according to the range query using an index, to produce a range query result comprising a subset of the decrypted distributed ciphertext in the searchable encryption scheme; the client receiving from the server the range query result; and the client decrypting the range query result from the searchable encryption scheme.
8. A non-transitory computer readable storage medium as in claim 7 wherein the adjustable encryption scheme stores a binary tree.
9. A non-transitory computer readable storage medium as in claim 8 wherein the binary tree is self-balancing.
10. A non-transitory computer readable storage medium as in claim 7 wherein decrypting the range query result comprises updating the modulus based upon a further rotation performed at the server.
11. A non-transitory computer readable storage medium as in claim 7 wherein the method further comprises applying a differential privacy procedure to generate the range query as a private query.
12. A non-transitory computer readable storage medium as in claim 11 wherein the differential privacy procedure utilizes a , -differentially private mechanism.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
DETAILED DESCRIPTION
(7) Described herein are methods and apparatuses implementing leakage-free order preserving encryption. In the following description, for purposes of explanation, numerous examples and specific details are set forth in order to provide a thorough understanding of embodiments according to the present invention. It will be evident, however, to one skilled in the art that embodiments as defined by the claims may include some or all of the features in these examples alone or in combination with other features described below, and may further include modifications and equivalents of the features and concepts described herein.
(8) Embodiments implement static leakage-free order-preserving encryption by assigning a distinct ciphertext for each plaintext, including repeated plaintext whose ciphertext is randomly inserted. In order to conceal insertion order, the randomized ciphertexts are compressed to minimal ciphertext space. A uniform distribution is achieved by rotating about a modulus on the ciphertexts rather than the plaintexts. The resulting ciphertext distribution has no leakage from the ciphertexts even if an adversary has perfect background knowledge on the distribution of plaintexts. The encryption may be further secured even against passive query monitoring attacks by hiding the access pattern using , -differential privacy, such that the adversary observing a sequence of queries will not learn the frequency of plaintext. The static leakage-free order-preserving encryption may be converted into an adjustable encryption scheme to allow querying (e.g., on a remote server).
(9)
(10) A security engine 106 present at the client, first converts the plaintext into randomized ciphertext 108. During this process, a ciphertext is inserted for each plaintext. Ciphertext corresponding to repeated plaintext is inserted at random locations (e.g., utilizing a random tree traversal technique).
(11) Next, the randomized ciphertext is compressed into minimal ciphertext space 110 utilizing probabilistic encryption. This conceals the insertion order.
(12) Then, the compressed ciphertext is subjected to order-preserving encryption by rotation about a modulus to achieve a uniform distribution, producing the distributed ciphertext 112. As further discussed below, according to a static encryption procedure, this rotation is performed on the ciphertext space as a whole. According to a dynamic encryption procedure, this rotation is performed after each insertion of ciphertext.
(13) Finally, the distributed ciphertext is subjected to further, adjusted encryption in order to be in a strongest encrypted form 114. An example of such strong encryption is probabilistic IND-CPA secure encryption.
(14) As further discussed in detail below, the adjusted encryption may take place according to a static encryption procedure or a dynamic encryption procedure. A static adjusted encryption procedure may store a binary sorted tree. The dynamic adjusted encryption procedure may employ a self-balancing binary tree.
(15) The adjusted encrypted data created at the client, is then forwarded to the server 116 via a network 117. There it is stored in highly secure form within database 118. The data stored in this highly encrypted form, however, is not available for searching.
(16) It is noted that the security engine may perform additional processing of the adjusted encrypted data prior to its storage. For example, as part of a dynamic encryption procedure preceding the adjustable encryption, the server engine may perform a further rotation. This is further discussed below in connection with the example.
(17)
(18) Accordingly, prior to being sent over the network to the server, the query is subjected to a differential privacy procedure 122 in order to create a private query 124. As described in the example below, this differential privacy procedure may involve the addition of (positive) noise to the query values.
(19) That private query is then communicated over the network to the server for processing on the data stored securely in the database. It is noted that the client may also employ other tactics (e.g., issuing false queries), in order to further preserve system security.
(20) As mentioned above, the adjusted encrypted data in its strongest form stored in the database, is not available for searching. However, an engine 126 on the server is configured to decrypt the adjusted encrypted data to a lower form of encryption 127 that is amenable to search. Examples of such a lower encryption scheme can include deterministic or order-preserving encryption.
(21) The server engine processes the query on the database in order to furnish a query result 128 in the lower encrypted form. That query result is then communicated to the client, where it can be converted into decrypted form 130 for display, local storage, and manipulation by a user. Where the server engine has performed further rotation of ciphertext as part of a dynamic encryption procedure of a plaintext stream, such processing by the security engine may involve rotation about an updated modulus that reflects further rotation performed by the server engine.
(22)
(23) At 204 the randomized ciphertext is converted into a compressed ciphertext space according to a probabilistic encryption scheme. At 206 a rotation of the compressed ciphertext space about a modulus creates distributed ciphertext.
(24) At 208, the distributed ciphertext is communicated for storage in a database of a server. At 210 a query is communicated to the server. This query may be in the form of a secure private query.
(25) At 212 a query result received from the server in response to the query, is decrypted. At 214 the query result is stored and presented for display to a user.
Example
(26) An example is now presented in connection with a series of attacks on deterministic encryption, but which may also extend to order-preserving encryption. Also considered are the impacts of attacks on frequency-hiding order-preserving encryption (FHOPE), a secure form of order-preserving encryption.
(27) These attacks use only the ciphertextsas obtained after a database compromiseand (imprecise) background knowledge. Background knowledge is readily available for many data sets.
(28) In this example, two auxiliary data sets may be used for background knowledge: one from the same source as the target data, and one from a different source. No further passive monitoring (e.g. queries, updates, etc.) or active attacks (e.g. result modifications, insertions, etc.) on the database are performed. The attacks hence only exploit static leakage and are simple to execute.
(29) Let be the multi-set (a multi-set may potentially include duplicate values) of ciphertexts. Let
be the multi-set of plaintexts in the background knowledge of the adversary.
(30) For simplicity only one database column is considered and semantic constraints are ignored. However, embodiments are still as secure for a single column as for a set of columns under arbitrary semantic constraints. It is assumed that the sizes of the multi-sets are equal: n=//=/
/.
(31) The frequency analysis attack computes the histograms Hist() and Hist(
) of the two multi-sets. Then it sorts the two histograms in descending order: {right arrow over (C)}=Sort(Hist(
)) and {right arrow over (m)}=Sort(Hist(
)). The cryptanalysis for c.sub.i is m.sub.i, i.e. the two vectors are aligned.
(32) l.sub.P optimization is as follows. The frequency analysis is refined to the l.sub.P-optimization attack. In this refinement the two histograms are not simply sorted and aligned, but a global minimization is run to find an alignment. Let be the set of nn permutation matrices. The attack then finds X
, such that the distance //{right arrow over (c)}X{right arrow over (m)}//.sub.P is minimized under the l.sub.P distance. For many distances l.sub.P the computation of X can be efficiently (polynomial in n) performed using an optimization procedure, such as linear programming. The cryptanalysis for c.sub.i is X(m).sub.i, i.e. the two vectors are aligned after permutation.
(33) The attack works not only for order-preserving encryption, but also for deterministic encryption. The attack is successful in experimentally recovering hospital dataeven for such deterministic encryption, with an accuracy of 100% for 100% and 95% of the hospitals for the binary attributes of mortality (whether a patient has died) and sex, respectively, under deterministic encryption.
(34) This attack is presumably already prevented by FH-OPE. However, the attack is not meant for order preserving encryption and the sorting attack may still work for FH-OPE.
(35) The sorting attack is now described. Let be the domain of all plaintexts in multi-set
. Let N=/
/ be the size of the domain
. The sorting attack assumes that
is dense, i.e. contains a ciphertext c for each m
. The adversary computes the unique elements Unique(
) and sorts them {right arrow over (c)}=Sort(Unique(
)) and the domain {right arrow over (d)}=Sort(
). The cryptanalysis for c.sub.i is d.sub.i, i.e. the order of the ciphertext and the plaintext are matched.
(36) The attack is 100% accurate, if the ciphertext multi-set is dense. For this assumption, a refinement may be presented that works also for low-density data. The cumulative attack is now described before extending the sorting attack to FH-OPE.
(37) The cumulative attack cleverly combines the l.sub.P-optimization and sorting attack. The adversary first computes the histograms {right arrow over (c)}.sub.1=Hist(), {right arrow over (m)}.sub.1==Hist(
) and the cumulative density functions {right arrow over (c)}.sub.2=CDF(
), {right arrow over (m)}.sub.2=CDF(
) of the cipher- and plaintexts. The attack then finds the permutation X
, such that the sum of the distances between the histograms and cumulative density functions //{right arrow over (c)}.sub.1X{right arrow over (m)}.sub.1//.sub.P+//{right arrow over (c)}.sub.2X{right arrow over (m)}.sub.2//.sub.P is minimized. Again, this can be done using efficient optimization procedures. The cryptanalysis for c.sub.i is X(m).sub.i.
(38) The attack is very accurate against deterministic order preserving encryption, with an accuracy of 99% for 83% of the large hospitals for the attributes of age. The age column is certainly not low-density, but it is also not dense (as the success rate shows).
(39) This attack is again presumably prevented by FH-OPE, since no frequency information is available. However, the sorting attack is not necessarily prevented and maybe extendedin a similar fashion to the cumulative attackto FH-OPE.
(40) An extended Sorting Attack on FH-OPE is now described. The extended sorting attack on FH-OPE proceeds as follows. The adversary sorts the multi-sets {right arrow over (c)}=Sort() and {right arrow over (m)}=Sort(
), i.e. it is not necessary to only use unique values. The cryptanalysis for c.sub.i is m.sub.i. Note that in FH-OPE every element c.sub.i
is unique, but after the attack aligned to the cumulative density function of
as in the cumulative attack.
(41) The accuracy for this attack can be high depending on the precision of the background knowledge . It can be dangerous to make assumptions about the adversary's background knowledge, since they are hard, if not impossible, to verify and uphold. Hence, in the security model for SLF-OPE perfect background knowledge of the adversary is assumed, i.e. the multi-set
is the exact same multi-set as the plaintexts of the ciphertexts. The extended sorting attack then succeeds with 100% accuracy.
(42) A security model according to embodiments is now described. The security model is stronger than indistinguishability under ordered chosen plaintext attack (IND-OCPA) and indistinguishability under frequency-analyzing ordered chosen plaintext attack (IND-FAOCPA) since it implies both of them. Loosely speaking, the model states that the adversary cannot gain non-negligible information from the ciphertexts.
(43) The model is motivated by recent attacks on cloud infrastructure, including real world incidents showing the risks of deterministicnot even order-preservingencryption. For example, in at least one case passwords were encrypted using a deterministic procedure and were many subsequently broken. The cryptanalysis was performed on stolen ciphertexts only (using additional plaintext hints). Other hacking incidents have been recently publicized that resulted in leakage of sensitive informationnot necessarily ciphertexts.
(44) The attacks share a common anatomy. The hackers are capable to break in, access, and copy sensitive information. They did not observe any on-going activities of users or administrators. Instead they used the opportunity of access to gain as much data as possible in a short time, i.e. the adversary obtains a static snapshot. Note that this does not rule out the revelation of more sophisticated, longitudinal attacks in the future, but underpins the pressure to secure current systems.
(45) In this respect the model achieves the following: an attacker gaining access to all ciphertexts stored in an encrypted database does not gain any additional information to his background knowledge. Perfect background knowledge is assumed, i.e. the adversary already knows all plaintexts (but not the correspondence to the ciphertexts). This may sound contradictory at firstwhy would someone break into a database which data he knows? However, if security is demonstrated under such weak assumptions, security holds even if the adversary has less background knowledge. Moreover, the adversary may not be interested in breaking a single column (such as age), but in learning the semantic constraints (such as John Doe's age). Although the model only considers a single database column, it is still as secure for a set of columns under arbitrary semantic constraints, since absolutely no additional information is leaked from each column.
(46) A non-interactive definition of the order-preserving encryption scheme is now provided. Let be the security parameter of a standard symmetric encryption scheme such as AES. Later below, an interactive definition allowing for incremental encryptions. An order-preserving encryption scheme .sub.SLF-OPE comprises the following three procedures: kKeyGen(): generates a secret state k according to the security parameter . CEncrypt(, k): computes the set of ciphertexts
for the multi-set of plaintexts
. m.sub.iDecrypt(c.sub.j, k): Computes the plaintext m.sub.i
for ciphertext c.sub.j
using key k.
(47) The security model can now be formulated as a game between an adversary and a challenger. Let n be the number of plaintexts. Our game Game.sub.SCA(,n) for a static ciphertext attack proceeds as follows:
(48) 1. The challenger executes kKeyGen() and stores the key.
(49) 2. The adversary chooses n plaintexts m.sub.i. Let
=(m.sub.0, . . . , m.sub.n-1) be the multi-set of plaintexts. The adversary sends
to the challenger.
(50) 3. The challenger executes Encrypt(
, k) and sends
to the adversary.
(51) 4. The adversary chooses an index 0i<n and outputs its guess .sub.j of c.sub.jEncrypt(m.sub.i, k).
(52) Note that at first it may seem counterintuitive to have the adversary guess a ciphertext. However, any restrictions on the entropy of the plaintexts are intentionally not enforced. Hence, in the reverse definition of the challenge (guessing a plaintext for a ciphertext) the adversary may take advantage of the entropy, e.g. by guessing the most frequent plaintext. Therefore the adversary guesses the reverse association in order to strictly bound the accuracy independent of the plaintext entropy. The ability to guess the corresponding ciphertext in combination with knowledge of the plaintext also implies the ability to decrypt that ciphertext.
(53) Security against a static ciphertext attack is defined as follows. Definition 1: an order-preserving encryption scheme .sub.SLF-OPE is IND-SCA secure against static cipher-text attack if the adversary A's advantage of outputting c.sub.j in Game.sub.SCA(, n) is at most negligibly in better than guessing c.sub.j from , i.e.
(54)
(55) Impact on attacks is now discussed. IND-SCA security is stronger than any previous definition for order-preserving encryption. It implies that a static adversary does not even learn the order of ciphertexts. How to nevertheless perform range queries on the data, is explained later below. In this sense the security model implies a static leakage-free encryption system.
(56) Now revisiting the attacks, first the security model fully captures the attack setup. The adversary is given full ciphertext information and even has perfect background knowledge. Second, the security definition implies that if the adversary is then able to infer even one plaintext better than guessing (from background knowledge) the scheme is broken. Hence, instead of an attack accuracy close to 100%, embodiments can and must expect a much lower accuracy (depending on the entropy of the database column).
(57) All attacks except the sorting attack exploit specific leaked information: frequency of plaintexts. This information is clearly not statically available in the encryption scheme. The sorting attack also exploits other leaked information: the order. This information is also not statically available in the encryption scheme. It is hence consequential that the attacks are thwarted.
(58) An order-preserving encryption scheme that fulfills the IND-SCA security definition is now presented. The system architecture is described and the intuition of the construction is given. A static encryption procedure is first presented that encrypts all plaintexts at once, and second that encryption procedure is extended into an interactive protocol that incrementally encrypts ciphertexts.
(59) The system architecture assumes a client holding the secret key kKeyGen() and server that holds the ciphertexts. The server may hold several multi-sets of ciphertexts managed independently for each database column. In the static version the client performs Encrypt(
, k) and sends
to the server. The client can issue a query Q, e.g. a range query, to the server which responds with a set
.sub.Q of ciphertexts. The query procedure is described later below. In the interactive version the encryption procedure is changed to an interactive protocol. c.sub.j; SEncrypt((m.sub.i, k), (S)): Is a secure protocol between the client and the server. The client's input is the plaintext mi and the secret key k. The server's input is its current state S. The server only learns the ciphertext c.sub.j and updates its state to S. The server does not learn m.sub.i or k.
(60)
(61) The SLF-OPE encryption scheme combines ideas of three previous order-preserving encryption schemes. One provides the basis for managing the order of ciphertexts in a stateful, interactive manner. Of course, this scheme is not secure against the attacks since it is deterministic. A second adds a frequency-hiding aspect. This scheme cannot be used as the basis of an IND-SCA secure scheme, since it partially leaks the insertion order. The leakage of the insertion order is implied in IND-FAOCPA (and IND-OCPA), but detrimental to IND-SCA security. Therefore the frequency-hiding idea needs to be fit into the first scheme. This is done by encrypting the plaintext using a probabilistic procedure and also inserting a ciphertext for each plaintext using random tree traversal. This combined construction would still be only IND-FAOCPA secure. Third and last, a modular order-preserving encryption idea is applied. This idea rotates the plaintexts around a modulus statically hiding the order. However, modular order-preserving encryption has been developed for deterministic order-preserving encryption. In the probabilistic order-preserving encryption the modulus is to be applied to the ciphertexts. This can be done in the static version of the encryption procedure. In the interactive version of the procedure ciphertexts are necessarily mutable and hence the rotation is to be reapplied after each encryption. Details are presented below.
(62) The static encryption procedure is now described. Let RND be a standard, IND-CCA secure, symmetric encryption scheme, e.g. AES in CCM mode. This encryption scheme RND supports the following procedures: KeyGen, Encrypt, and Decrypt. The procedures of the static version of our SLF-OPE scheme are now described.
(63) 1. kKeyGen(): Execute KRND.KeyGen() and uniformly choose 0r<n. Let k=(K, r).
(64) 2. Encrypt(
, k): Execute CRND. Encrypt(m.sub.i, K). Let c.sub.j=(C, j) for j=i+r mod n.
(65) 3. m.sub.iDecrypt(c.sub.j, k): Parse c.sub.j as (C, j). Let m.sub.i=RND.Decrypt(C,K).
(66) A security proof is now provided. As a first step in order to prove security of our static encryption scheme, a simulator is given the view of the adversary without input of the plaintexts. If the adversary's view is independent of its messages, then the encryption scheme cannot leak any information about the plaintexts.
(67) Lemma 2. There exists a simulator S(, n) outputting , such that the adversary A's view
in game Game.sub.SCA is computationally indistinguishable from
. According to the proof, the simulator proceeds as follows. It executes KRND. KeyGen() and uniformly and independently chooses m.sub.i from
(0i<n). It then computes
RND.Encrypt(m.sub.j, K) and sets
, j) (0j<n). The two sets
and
are computationally indistinguishable, since
and
are computationally indistinguishable due to the IND-CCA security of the encryption scheme RND and j ranges 0 to n1 in both cases.
(68) However, computational indistinguishability is not sufficient for IND-SCA security. Hence, as a second step we prove that the structure of the encryption is (sufficiently) independent of the plaintexts.
(69) Lemma 3. Any ciphertext c.sub.j has equal probability to be plaintext m.sub.i. As proof, this follows from the uniform random choice of r (which is also unknown to the adversary).
(70) Together we can formulate the following theorem. Theorem 4. Our static encryption algorithm .sub.SLF-OPE is IND-SCA secure. As proof, since the encryption is independent of the plaintexts (Lemma 2) and each ciphertext has equal probability to be plaintext m.sub.i (Lemma 3), the adversary cannot succeed in game Game.sub.SCA with probability 1/n+1/poly() (sum of the random guess and the probability of the breaking the computational indistinguishability).
(71) The dynamic encryption procedure is now described. The static procedure encrypts all plaintexts at the same time. This leads to a simple provably secure procedure. However, in most practical cases data is added incrementally. Therefore, a dynamic version is presented which can encrypt one plaintext at a time. The dynamic version is a secure protocol between the client and the server. The server only learns the ciphertext, but nothing else.
(72) The basic idea of a dynamic protocol is as follows. The server maintains the list of ciphertextsincluding the IND-CPA encrypted plaintexts. The client and server engage in an interactive binary search. When the client encounters a repeated element it flips a random coin and inserts in a random location. Second, at the end of each insert the server rotates all ciphertexts according to a uniformly chosen value.
(73) Note that in order to be able to execute the binary search the client needs to know the modulus R of the encryption on the plaintext, since it applies a comparison with the middle element. This modulus might not be unique (due to a rotation on the ciphertexts within one plaintext) and the smaller one is chosen in case. The client retrieves the minimum ciphertext after rotation at the server and updates this modulus R.
(74) Three sub-protocols are used with the server: Get(j) retrieves ciphertext c.sub.j from the server, Insert(c.sub.j) inserts ciphertext cj at the server, i.e. before the current position j and all n-j old ciphertexts c.sub.j (jj<n) are updated by incrementing j, and Rotate(r) rotates the elements. The function Random(a) returns a uniformly chosen integer between 0 and aboth inclusive. A procedure of the client in the dynamic encryption protocol is shown in
(75) A static adversary will always see a uniformly random rotation, since it was chosen freshly. Hence, the proof of IND-SCA security extends to the dynamic version. The dynamic encryption protocol has complexity O(n) at the server (due to the rotation), but only complexity O(log n) at the client which can be presumed less computationally powerful.
(76) Querying is now described. The SCA security model for order-preserving encryption only captures encryption and decryption. Nevertheless, without the capability of searchin particular range queriesstronger encryption could be achieved. Furthermore, the efficient and flexible (because retrofittable to existing database management systems) search capabilities of order-preserving encryption distinguish it from searchable encryption. Now described is how to query an order-preserving encrypted database and how to secure the queries against a curious observer.
(77) Searches are only performed on the order of the ciphertext, i.e. j for c.sub.j. The symmetrically encrypted plaintext C can be ignored for searching.
(78) Query rewriting is now described. Two types of queries are considered: equality and range queries. An equality search criterion is where an entry x in the database table is compared to a fixed parameter value v. An entry matches, if x==v. A range search criterion is where an entry x in the database table is compared to two fixed parameter values v and w (w log. v<w). An entry matches, if vxw.
(79) In deterministic encryption equality search can be mapped to equality search on ciphertexts. However, the SLF-OPE is randomized and there may be multiple ciphertexts for a single plaintext. Hence it is needed to rewrite equality search to a range search. Let c.sub.min(v) be the minimal ciphertext of plaintext v and c.sub.max(v) be the maximal ciphertext of plaintext v.
c.sub.min(v)=min(c.sub.j|DECRYPT(c.sub.j,k)=v)
c.sub.max(v)=max(c.sub.j|DECRYPT(c.sub.j,k)=v)
A query of type x==v is then rewritten to c.sub.min(v)xc.sub.max(v).
(80) Similarly range queries on encrypted data need to account for multiple ciphertexts. Let c.sub.max(w) be the maximal ciphertext of plaintext w (similar to before for v). A range query of type vxw is rewritten to c.sub.min(v)xc.sub.max(w).
(81) Special care needs to be taken if the plaintext v (or w) has ciphertexts at the upper and lower range of ciphertexts (i.e. if the modulus divides this plaintext v). In this case, the roles of minimal and maximal ciphertext need to be switched. We can identify this special case, if c.sub.min(v)=c.sub.0 and c.sub.max(v)=c.sub.n-1. Then we need to compute two different limits c.sub.min and c.sub.max which are the highest of the lower range and the lowest of the higher range, respectively.
c.sub.min(v)=max(c.sub.j|.sub.j(0j<j).Math.DECRYPT(c.sub.j,k)=v)
c.sub.max(v)=min(c.sub.j|.sub.j(j<j<n).Math.DECRYPT(c.sub.j,k)=v)
(82) An equality query x==v is rewritten to xc.sub.min(v) V xc.sub.max(v). A range query vxw (where c.sub.min(v)=c.sub.0 and c.sub.max(v)=c.sub.n-1 is the special case for v) is rewritten to xc.sub.max(x) V xc.sub.max(v). Similar rewriting rules for range queries apply, if w is the special case or if the special case plaintext m is in the range vmw.
(83) Despite the necessary rewriting it is emphasized that both query typesequality and rangecan be handled using standard range queries on the ciphertext. This implies that existing database management systems can be used to implement the queries and our SLF-OPE can still be retrofitted into them. Furthermore, advances in speeding up range queries on plaintextssuch as specialized indicescan still be used to speed up queries on encrypted data. These functionality and performance aspects clearly separate order preserving encryption from searchable encryption.
(84) Query monitoring attacks are now described. The above discussion has presented the rewriting rules for searches over the SLF-OPE scheme. While searches are efficient, they leak additional information. Particularly, an equality query leaks c.sub.min(v) and c.sub.max(v) (or the respective special case values the query also leaks whether it is the special case by the comparison operators). Given these values (and the set of ciphertexts ) an adversary can compute the frequency of v. However, hiding the frequency and preventing frequency analysis was a clear objective of SLF-OPE. A discussion below presents how to prevent this leakage.
(85) Another type of leakage has been analyzed. Since a plaintext range query will almost surely never be of type xv V xw, range queries on ciphertext will almost surely never span the modulus r. Hence, the modulus r can be inferred from the queries. A corresponding attack and leakage prevention may use fake queries. Since the approach of fake queries is orthogonal to the approach below, both can be combined in order to provide protection against frequency and modulus leakage.
(86) Differential privacy is now discussed. Differential privacy is a security (or privacy) definition that given a randomized procedure M any two adjacent inputs x and y are distinguishable from the output of M only with bounded probability. Formally we define , -differential privacy as follows. Definition 3: a randomized procedure M is , -differentially private for adjacent inputs x and y if, for all sets of outcomes S:
Pr[M(x)S]=e.sup.Pr[M(y)S]+
(87) A mechanism may turn any procedure into a , -differentially private mechanism by adding non-negative noise only. The main lemma may be restated as follows. Lemma 4. Consider the procedure M(x) that adds noise N[max(0; Laplace(, s)] to a value x>0. Then M is , -differentially private with respect to changes of up to t in x, for parameters:
(88)
(89) Differentially private queries are now discussed. Since our rewritten queries leak the frequency of the plaintexts, the , -differentially private mechanism is now discussed. Noise is added to the query values c.sub.min(v) and c.sub.max(v) before sending them to the server. It is important to restrict to including additional entries in the result set which can be filtered on the client and hence only positive noise can be added. Negative noise would require at least two queries which could be correlated by the attacker (e.g. by their timing).
(90) We determine the parameters for the mechanism. First, since we aim to hide the frequency of the plaintexts, t is the difference between the maximum and minimum frequencies of any plaintext. Second, we can choose parameters and and compute the security parameter S.
(91) Note that the order of the ciphertexts (j for c.sub.j) are continuous integers from 0 to n1 (inclusive). Hence, the noise can be rounded to the next largest integer. Any diluted ciphertext will then be also in the set of ciphertexts.
(92) Note that it is desired to hide the frequency of v, i.e. the difference between c.sub.min and c.sub.max, and not the limit values by themselves. Hence, it is needed to apply noise between both values. Furthermore, in order not to leak additional information by multiple queries the noise is computed using a pseudo-random function with fixed seed. Let y=PRF1(x) be a pseudo-random function where ymax(0,Laplace(; s)). Let y=PRF2(x) be a pseudo-random function where yUniform(0; 1). The following are computed.
{tilde over (c)}.sub.min(v)=c.sub.min(v)PRF2(v).Math.PRF1(v)
{tilde over (c)}.sub.max(v)=c.sub.max(v)+(1PRF2(v).Math.PRF1(v)
A query of type x==v is then rewritten to {tilde over (C)}.sub.min(v)x{tilde over (C)}.sub.max(v).
(93) The noise needed to be added for a database may be estimated based on the maximum distance t of any two frequencies. If we set =0:5 and =15.Math.t, we have <2.sup.10. Then the additional items to be retrieved will be between [0; [18:23.Math.t]] with 90% probability.
(94) Adjustable encryption is now described. Under a concept of adjustable encryption for database, the basic idea is that data is initially encrypted using probabilistic, IND-CPA secure encryption and only on demand decrypted (on the server) to a lower encryption scheme like deterministic or order-preserving encryption. In many aspects the SLFOPE scheme is more secure than deterministic encryption. It provably prevents frequency analysis attacks. However, it is not fully IND-CPA secure and an adjustment from IND-CPA secure encryption is not straightforward. Embodiments offer an adjustable encryption procedure that before adjustment fulfills a stronger, IND-CPA equivalent security model and offers simple adjustment with O(1) communication cost.
(95) The security model is now described. Embodiments again aim at preventing an adversary that has onetime access to the database from learning anything, i.e. it is assumed that the adversary has access to the full database. Furthermore, as before it is assumed that the adversary cannot observe queries or inserts. However, we aim at the equivalent of a chosen plaintext attack and we make an even weaker assumption on the background knowledge of the adversary. Loosely speaking, the adversary may choose the plaintext for each database entry except the challenge ciphertext. We call our security model IND-SCPAindistinguishability under static chosen plaintext attack.
(96) Again, we first give a non-interactive, static version of our adjustable, order-preserving encryption scheme. This is then extended to a dynamic version. An adjustable, order-preserving encryption scheme .sub.ADJ-SLF-OPE comprises the following four procedures: kKeyGen(): Generates a secret state k according to the security parameter . Encrypt(
, k): Computes the set of IND-SCPA secure ciphertexts
for the multi-set of plaintexts M. CAdjust(
, k.sub.ADJ): Computes the set of IND-SCA secure ciphertexts
for the set of IND-SCPA secure ciphertexts using and adjustment key k.sub.ADJ. m.sub.iDecrypt(c.sub.j, k): Computes the plaintext m.sub.i
M for ciphertext c.sub.j
using key k.
(97) We can now formulate our security model as a game between an adversary and a challenger. Our game Game.sub.SCPA(,n) for a static chosen plaintext attack proceeds as follows:
(98) 1. The challenger executes kKeyGen() and stores the key.
(99) 2. The adversary chooses n1 plaintexts m.sub.i. Furthermore, the adversary chooses two plaintext m*.sub.0 and m*.sub.1. The adversary sends m.sub.0, . . . , m.sub.n-2, m*.sub.0, m*.sub.1 to the challenger.
(100) 3. The challenger flips a coin B(0,1). Let m.sub.n-1=m*.sub.B be the n-th plaintext and M=(m.sub.i/0i<n) be set of all plaintexts. The challenger executes Encrypt(M, k) and sends m.sub.i, c.sub.j(0in2) and c.sub.j*, such that m.sub.n-1=m*.sub.B=Decrypt(c.sub.j*, k) to the adversary.
(101) 4. The adversary outputs its guess B.sub.A of B.
(102) We define security against a static chosen plaintext attack as follows. Definition 5: we say an order-preserving encryption scheme .sub.ADJ-SLF-OPE is IND-SCPA secure against static chosen plaintext attack if the adversary A's advantage of outputting a correct guess B.sub.A=B is negligibly in , i.e.:
(103)
(104) A static adjustable encryption procedure is now discussed. The idea of our adjustable encryption procedure is as follows. Instead of storing the sequence 0, . . . , n1 as order we store a binary, sorted tree. The tree defines a total order on the elements and hence the sequence can be deduced from it. In the tree structure ciphertexts can be stored in a random (also physical) order using pointers, yet searching and inserting new elements is efficient (O(log n)). Let m>>n and p.sub.i be a virtual identifier of the i-th plaintext. We can compute p.sub.i using the keyed pseudo-random function p.sub.i=PRF3(i) where 0p.sub.i<m. Our static adjustable order-preserving encryption scheme ADJ-SLF-OPE is then implemented as follows. kKeyGen(): Execute KRN D.KeyGen(), K.sub.ADJRN D.KeyGen(), choose a key for the pseudorandom function PRF3 and uniformly choose 0r<n. Let k=(K, K.sub.ADJ, PRF3, r). Encrypt(M, k): We compute the binary, sorted, balanced tree T from M. Let l.sub.i be the index of the left child of plaintext m.sub.i and r.sub.i be the index of the right child of plaintext m.sub.i. The ciphertext is computed as c.sub.i(PRF3(i), RN D.Encrypt(m.sub.i;K), RN D.Encrypt(PRF3(l.sub.i)/PRF3(r.sub.i), K.sub.ADJ)). CAdjust(C, k.sub.ADJ): Parse each ciphertext c.sub.j as (p.sub.j, C.sub.j, T.sub.j). Compute pt.sub.j; pr.sub.jDecrypt(T.sub.j). Reconstruct the tree, then from the tree the total order and compute the position j in the total order. Compute j=j+r mod n. The adjusted ciphertext c.sub.j is (C.sub.j, j). m.sub.iDecrypt(c.sub.j, k): Parse each ciphertext c.sub.j as (p.sub.j,C.sub.j,T.sub.j). Compute m.sub.iRND.Decrypt(C.sub.j, K).
(105) A security proof is now provided. In order to prove security of the static adjustable encryption scheme, we follow the same strategy as for IND-SCA security. First, we give a simulator of the view of the adversary without input of the challenge plaintexts.
(106) Lemma 8. Let =(m.sub.0, . . . , m.sub.n-2) be the set of chosen plaintexts without the challenge ciphertext. There exists a simulator S(,
) outputting
, such that the adversary A's view
in game Game.sub.SCPA is computationally indistinguishable from
. As proof the simulator proceeds as follows. It executes kKeyGen() and uniformly and independently chooses m from
(0i<n). It then computes
and
are computationally indistinguishable, since all ciphertexts are composed of a pseudo-random number and two IND-CCA secure ciphertexts in both cases.
(107) Then we prove structural independence of the encryption as follows. Lemma 9. Plaintext m.sub.n-1 always has ciphertext c.sub.n-1. As proof, this follows from the construction of the simulator in Lemma 8.
(108) Theorem 10. Our static adjustable encryption algorithm .sub.ADJ-SLF-OPE is IND-SCPA secure. As proof, since the encryption is independent of the challenge plaintext (Lemma 8) and position of challenge ciphertext c.sub.n-1 is independent of all plaintexts (Lemma 9), the adversary cannot succeed in game Game.sub.SCPA with probability +1/poly() (sum of guessing the challenge plaintext and the probability of the breaking the computational indistinguishability).
(109) A dynamic adjustable encryption procedure is now described. In order to turn our static adjustable encryption procedure into a dynamic one we replace the tree T with a self-balancing, binary tree, e.g. a red-black tree. The advantage of a self-balancing tree is that search and insert operations remain efficient.
(110) On the server we store a map of the virtual addresses p.sub.j and the ciphertexts c.sub.j. Each ciphertext c.sub.j corresponds to one node in the (self-balancing) tree. We access each node by the operations Get(p.sub.j) for retrieving a node, Update(c.sub.j) for replacing a ciphertext c.sub.j with c.sub.j at virtual address p.sub.j, and Insert(c.sub.j) for inserting a new ciphertext c.sub.j. We run this procedure on the client as part of the dynamic encryption procedure. Each access to a tree node is replaced by one of the operations above. Of course, each node is decrypted after retrieval and encrypted before update. Our dynamic encryption procedure then proceeds as follows: c.sub.n-1Encrypt(m.sub.n-1, k): Prepare an initial ciphertext c(PRF3(n1), RN D.Encrypt(m.sub.n-1, K), RND.Encrypt(1/1, K.sub.ADJ)). Insert c.sub.i into the self-balancing tree by running the insert procedure on the client replacing all accesses to tree nodes by calls to the server.
(111) It is easy to see that a static adversary will see the same view as in the static version of the protocol. Since the adversary's view in Game.sub.SCPA is static and does not include the access pattern during the encryption, all ciphertexts are indistinguishable. Each ciphertext is ordered by and addressed by the same virtual addresses and contains two additional IND-CCA secure ciphertexts. Hence, our dynamic adjustable encryption procedure is also IND-SCPA secure.
(112) The examples just provided are given for purposes of illustration only, and embodiments are not limited to them. Thus while
(113) In certain embodiments the server engine may be implemented by a database engine, for example as present in an in-memory database. One example of such an in-memory database engine is that of the HANA in-memory database available from SAP SE of Walldorf, Germany.
(114) For example,
(115) An example computer system 600 is illustrated in
(116) Computer system 610 may be coupled via bus 605 to a display 612, such as a cathode ray tube (CRT) or liquid crystal display (LCD), for displaying information to a computer user. An input device 611 such as a keyboard and/or mouse is coupled to bus 605 for communicating information and command selections from the user to processor 601. The combination of these components allows the user to communicate with the system. In some systems, bus 605 may be divided into multiple specialized buses.
(117) Computer system 610 also includes a network interface 604 coupled with bus 605. Network interface 604 may provide two-way data communication between computer system 610 and the local network 620. The network interface 604 may be a digital subscriber line (DSL) or a modem to provide data communication connection over a telephone line, for example. Another example of the network interface is a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links are another example. In any such implementation, network interface 604 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information.
(118) Computer system 610 can send and receive information, including messages or other interface actions, through the network interface 604 across a local network 620, an Intranet, or the Internet 630. For a local network, computer system 610 may communicate with a plurality of other computer machines, such as server 615. Accordingly, computer system 610 and server computer systems represented by server 615 may form a cloud computing network, which may be programmed with processes described herein. In the Internet example, software components or services may reside on multiple different computer systems 610 or servers 631-635 across the network. The processes described above may be implemented on one or more servers, for example. A server 631 may transmit actions or messages from one component, through Internet 630, local network 620, and network interface 604 to a component on computer system 610. The software components and processes described above may be implemented on any computer system and send and/or receive information across a network, for example.
(119) The above description illustrates various embodiments of the present invention along with examples of how aspects of the present invention may be implemented. The above examples and embodiments should not be deemed to be the only embodiments, and are presented to illustrate the flexibility and advantages of the present invention as defined by the following claims. Based on the above disclosure and the following claims, other arrangements, embodiments, implementations and equivalents will be evident to those skilled in the art and may be employed without departing from the spirit and scope of the invention as defined by the claims.