METHOD FOR MULTI-USER CONFIDENTIAL QUERYING OF THE PRESENCE OF A RECORD IN A DATABASE
20240054136 ยท 2024-02-15
Inventors
- Aymen Boudguiga (Gif-sur-Yvette, FR)
- Renaud SIRDEY (GIF-SUR-YVETTE, FR)
- Oana Stan (GIF-SUR-YVETTE, FR)
- Martin ZUBER (GIF-SUR-YVETTE, FR)
Cpc classification
H04L9/065
ELECTRICITY
International classification
Abstract
A method for confidentially querying the presence of a record in a database hosted by a server, the records being stored in the database in the form of digital footprints obtained by hashing a record by a public hash function. The footprints are masked by a stream cipher using a symmetric key of a first user. The first user may grant a second user authorisation to query the database by transmitting the inverse masks of various rows, encrypted by the public key of an additive homomorphic cryptosystem of the second user. The rows of the database are unmasked in the homomorphic domain and the second user transmits an encrypted request to query the base according to a PIR protocol. The second user can decrypt the response from the server using the private key of their homomorphic cryptosystem and determine whether the footprint sought is present in the response thus decrypted.
Claims
1. Method for confidentially querying the presence of a record in a database hosted by a server, said records being stored in the database in the form of digital footprints, each digital footprint being obtained by hashing a record by means of a public hash function and each row of the database containing the digital footprints sharing their m>1 first bits, wherein: a first user has a stream cipher symmetric key and a second user has an at least additive homomorphic cryptosystem; the rows of the database are encrypted by means of said stream cipher by adding masks each having the size of a row of the database; the first user authorises the access to the database to the second user by encrypting the inverse masks by means of the public key of said homomorphic cryptosystem and by transmitting the inverse masks thus encrypted to the server; the rows of the database are transcrypted in the homomorphic domain by adding to said rows the inverse masks thus encrypted; the second user computes the digital footprint of the record by means of said public hash function and deduces therefrom an integer value, q.sub.m, corresponding to the m first bits; the second user constructs a request in the form of a first vector of size 2.sup.m consisting of a homomorphic ciphertext of the value 1 in position q and of homomorphic ciphertexts of the value 0 in the other positions, the request being transmitted to the server; the server evaluates in the homomorphic domain the scalar product between the first vector and a second vector of size 2.sup.m consisting of the encrypted rows of the database, the result being transmitted as a response to the second user; the second user decrypts said response by means of the private key of their homomorphic cryptosystem and deduces therefrom the plaintext content of the row q.sub.m of the database; the second user searches whether the digital footprint of the record is among the digital footprints of the row q.sub.m thus obtained.
2. Method for confidentially querying the presence of a record in a database according to claim 1 wherein the stream cipher uses an encryption primitive selected from AES-CTR, Trivium and Grain.
3. Method for confidentially querying the presence of a record in a database according to claim 1, wherein the cryptographic public hash function is SHA-3.
4. Method for confidentially querying the presence of a record in a database according to claim 1, wherein the homomorphic cryptosystem is an additive homomorphic encryption without being FHE.
5. Method for confidentially querying the presence of a record in a database according to claim 4, wherein the additive homomorphic encryption uses a public key Paillier cryptosystem (g, n) with n=pg where p, q are two prime numbers.
6. Method for confidentially querying the presence of a record in a database according to claim 5, wherein the server returns as response a first and a second ciphertext stored in the line q.sub.m, the second user performing a decryption of each of the first and second ciphertexts by means of the private key of its homomorphic cryptosystem, then performs a decryption by means of this private key of the results thus obtained after having concatenated them.
7. Method for confidentially querying the presence of a record in a database according to claim 5, wherein the additive homomorphic encryption uses a plurality N of ciphertexts of 0 by a Paillier encryption, h.sub.i,i=1, . . . , N, the additive homomorphic encryption of an integer being obtained by:
c=g.sup..sub.i=1.sup.N max (1, u.sub.ih.sub.t) mod n.sup.2 where .sub.i, i=1, . . . , N are random bits selecting said ciphertexts of 0.
8. Method for confidentially querying the presence of a record in a database according to claim 7, wherein the ciphertexts of 0 are computed once for all and stored by the second user.
9. Method for confidentially querying the presence of a record in a database according to claim 1, wherein the transcryption of the rows of the database is carried out once for all by the server when the second user has been authorised to query the database by the first user.
10. Method for confidentially querying the presence of a record in a database according to claim 1, wherein the transcryption of the rows of the database is carried out dynamically by the server upon each query of the second user.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0030] Other features and advantages of the invention will appear upon reading a preferred embodiment of the invention, made with reference to the appended figures wherein:
[0031]
[0032]
[0033]
DESCRIPTION OF THE EMBODIMENTS
[0034] Firstly, we will consider a use case representative of the application context of the present invention. In this context, a client (also called user) can query a database hosted by a remote server. For this purpose, the client sends a request having as argument a record likely to be present in the base and receives in return information indicating whether or not this record is effectively present. More generally, a plurality of clients are likely to query the database and receive in return an indicator of the presence of the records sought. One or more administrators can further proceed to add or delete records in the base. The client stations and the administrator stations collectively form the client domain. As the remote server can itself use clusters of servers distributed in the Cloud, the servers that host the database collectively form the server domain.
[0035] In the use case considered, the database, although hosted in the server domain remains the property of the client domain. In other terms, the confidentiality of the data present in the base must be guaranteed with respect to the server while making it possible for the clients of the client domain to query the latter confidentially. The confidentiality constraints to be respected between the client domain and the server domain can be stated as follows: [0036] confidentiality of the records stored in the database hosted in the server domain with respect to the server; [0037] confidentiality of the requests from the client domain with respect to the server; [0038] confidentiality of the responses (indicators of presence of the records) sent back to the clients, with respect to the server.
[0039]
[0040] In this use case, a first user A having a terminal 110 desires to make it possible for a second user B, having a terminal 120, to query, confidentially, a database, 130, hosted by a remote server, 140. We will assume that the various actors (users and server) are curious but semi-honest, that is to say follow the protocol that is imposed on them.
[0041] The records can include a plurality of fields, and consist of, for example, alphanumeric data. For example, each record may comprise the name, first name, gender, date and place of birth, address of a person. It is essential to note that this information is not however stored as such in the base but only in the form of digital footprints. Unlike the prior art, these footprints are obtained by a public cryptographic hash function, for example the SHA-3 hash function. Public hash function means a hash function that does not use confidential elements such as a secret key.
[0042] The user A or an administrator of the client domain may compute the digital footprints of the records to be stored. The digital footprints all have the same size T, for example 256 or 512 bits. Only a portion of the bits of the digital footprint provided by the hash function may be kept, for example 64 bits. For the sake of simplification, these reduced footprints will be assimilated to the digital footprints within the previous meaning, as the size reduction of the footprint can be taken into account in the hash function.
[0043] The database is row oriented and the digital footprints are advantageously grouped by rows. The various rows are of size L, in other words each row can comprise up to L footprints. If applicable, dummy bits may be provided to complete an incomplete row. The grouping of footprints is carried out so that the footprints of the same row possess the same m first (or alternatively last) bits, in other words the footprints represent hash values that collide on their m first bits, that is to say, in an equivalent manner; that they belong to the same collision class on these m first bits.
[0044] Preferably, the value m is selected, such as K>2.sup.m (or even K>>2.sup.m) and K<2.sup.mL where K is the total number of records in the base. The first condition makes it possible to have collision classes that are not reduced to singletons, in other words to obtain a sufficiently high L value. The second condition makes it possible to classify all the footprints of the base.
[0045] The database consists of 2.sup.m rows (some of which may be empty), the content of each row i=1, . . . , 2.sup.m able to be represented by an integer .sub.i of
.sub.n=
/n
with log.sub.2 nLT. For reasons explained below, advantageously n=pg will be selected where p and q are prime numbers. The rows can be considered as integers taking their values in the set {0, . . . ,n1}.
[0046] Originally, each integer .sub.i is encrypted with the aid of a stream cipher with the aid of a symmetric key belonging to the user A. It is reminded that a stream cipher is an encryption which the message to be encrypted is simply added to a sequence of bits generated on the basis of a symmetric key, here noted s.sub.k.sup.A. To this end, encryption primitives known by the person skilled in the art may be used such as AES-CTR (Advanced Encryption Standard in Counter Mode), Grain, Trivium, etc.
[0047] Regardless of the stream cipher algorithm retained, the words .sub.i of the rows i=1, . . . , 2.sup.m are encrypted (or masked) by the user A by computing the sums (
.sub.i+b.sub.i) mod n, i=1, . . . 2.sup.m where the integers b.sub.i are extracted from the sequence of bits generated by s.sub.k.sup.A. It should be noted that the sequence of words b.sub.i, also referred to as keystream, does not need to be stored by the user A given 15 that it can be regenerated at any time on the basis of the symmetric key s.sub.k.sup.A. The integers b.sub.i are also called masks because they make it possible to make the content of the database confidential with respect to the server.
[0048]
[0049] The structure of the base before stream cipher is represented in 210. Each of the rows i=1, . . . , 2.sup.m is formed by a word .sub.i of size log.sub.2 n bits and comprises at most L digital footprints E.sub.ij, j=1, . . . , L of size T.
[0050] The structure of the base after stream cipher is shown in 220. It comprises the same number of rows, each row i=1, . . . , 2.sup.m being of size log.sub.2 n and formed by the masked word (.sub.i+b.sub.i) mod n.
[0051] The idea behind the present invention is to transcrypt the rows thus encrypted by means of an (at least) additive homomorphic cryptosystem.
[0052] It is reminded that a homomorphic encryption makes it possible to perform operations (in practice arithmetic operations of adding or multiplying) on data without ever revealing them. More specifically, an additive homomorphic encryption is an asymmetric key encryption Enc.sub.pk_H of public key pk_H) verifying the following property:
[Math. 2]
[0053]
Enc.sub.pk_H:
.fwdarw.
Dec.sub.sk_H[Enc.sub.pk_H(a) Enc.sub.pk_H(b)]=a+b(1)
where is the plaintext message space (more simply referred to as plaintext space) and is the encrypted message space (more simply referred to as ciphertext space), + an additive operation in the plaintext space conferring to a group structure, an internal addition operation in the ciphertext space conferring to a group structure. Thus, it is understood that the application of (, +) in (, ) is a group homomorphism. Dec.sub.sk_H is the decryption function corresponding to Enc.sub.pk_H where sk_H is the secret key of the user).
[0054] The result of the expression (1) is that it is possible to perform an additive operation between two plaintexts (a, b) on the basis of a corresponding operation between their ciphertexts (Enc.sub.pk_H(a), Enc.sub.pk_H(b)).
[0055] An additive homomorphic encryption has an external multiplication operation of in , noted .Math..sub.ext, such as
[Math. 3]
[0056]
Dec.sub.sk_H[a.Math..sub.ext Enc.sub.pk_H(b)]=ab(2-1)
[0057] This internal addition operation is nothing more than a consequence of the expression (1) insofar as a .Math..sub.ext means that the internal addition is applied a times.
[0058] In other words, the external multiplication operation makes it possible to multiply a plaintext by a ciphertext to obtain a ciphertext. It will be noted that this operation carries out an absorption in the homomorphic domain.
[0059] Similarly, it is always possible to define an external addition operation of in , noted .Math..sub.ext, by means of:
[Math. 4]
[0060]
a.sub.ext Enc.sub.pk_H(b)=Enc.sub.pk_H(a) (61 ) Enc.sub.pk_H(b)(2-2)
in which case:
[Math. 5]
[0061]
Dec.sub.sk_H[a.sub.ext Enc.sub.pk_H(b)]=a+b(2-3)
[0062] In some cases, the cryptosystem natively has an external addition operation. This is particularly and advantageously the case for the Paillier cryptosystem.
[0063] Without loss of generality, we will assume in the following that the additive homomorphic cryptosystem is a Paillier cryptosystem.
[0064] It is assumed in the following that each user has their own additive homomorphic cryptosystem. The private key-public key pair of the homomorphic cryptosystem of the user A is noted (sk.sup.A_H, pk.sup.A_H) and that of B is noted (sk.sup.B_H, pk.sup.B_H).
[0065] Transcryption is a cryptographic technique making it possible to change data encrypted by a first cryptosystem to the same data encrypted by a second cryptosystem (inevitably homomorphic), without passing through an intermediate step of decrypting in the plaintext space.
[0066] In the present case, the transcryption operation makes it possible to change from a stream ciphertext to a ciphertext in the homomorphic domain. More specifically, if the homomorphic ciphertext of x is noted [x].sub.B by means of the public keypk.sup.B_H, the following is obtained:
[Math.6]
[0067]
(.sub.i+b.sub.i)(.sub.ext [b.sub.i].sub.B=[
.sub.i+b.sub.ib.sub.i].sub.B=[
.sub.i].sub.B(3)
that is to say the homomorphic ciphertext of the row in question.
[0068] In practice, when the user A desires to authorise the user B to query their database, they encrypt the inverse masks, that is to say the words b.sub.i i=1, . . . , 2.sup.m (the masks b.sub.i being from the keystream), by means of the public key of the additive homomorphic cryptosystem of B then transmit them to the server which performs the unmasking operation (3) on the rows of the database. This unmasking operation in the homomorphic domain may be carried out once for all in offline mode or on the contrary, on the fly, whenever the user B asks the user A to access the database (owned by A). It will be noted that the operation (3) protects the 5 confidentiality of data with respect to the server because the latter does not have access to the first term of the expression (due to the masking) or to the second due to the homomorphic encryption.
[0069] According to one alternative embodiment, instead of the operation (3), the server can overencrypt the values (.sub.i+b.sub.i) of the base with the aid of the public key pk.sup.B_H to obtain the homomorphic ciphertext [
.sub.i+b.sub.i].sub.B. The homomorphic ciphertext of the row i can then be obtained by means of the internal addition operation:
[Math. 7]
[0070]
(.sub.i+b.sub.i).sub.ext[b.sub.i].sub.B=[
.sub.i+b.sub.ib].sub.B=[
.sub.i].sub.B(4)
[0071] Returning back to .sub.i].sub.B, i=1, . . . , 2.sup.m. Each ciphertext [
.sub.i].sub.B consequently consists of two words of log.sub.2 (n), noted a.sub.i.sup.1 and .sub.i.sup.2, stored in the row i.
[0072] The user B can then transmit to the server a request concerning the presence of a given record, according to a PIR-type protocol. For this purpose, the user B computes the digital footprint of the record sought by means of the public hash function having served for constructing the base. The integer value associated with the digital footprint is noted q and the integer value associated with the m first bits of this footprint is noted q.sub.m. The request is formed by a vector u of 2.sup.m encrypted with a ciphertext of the value 1 in position q.sub.m, the remainder consisting of ciphertexts of the value 0. In other words:
[Math. 8]
[0073]
Dec.sub.sk.sub.
where u.sub.i is the i.sup.th element of the vector u and is the Kronecker symbol.
[0074] For any i=1, . . . , 2.sup.m, the server computes U.sub.i .Math..sub.ext .sub.i.sup.1 and u.sub.i .Math..sub.ext .sub.i.sup.2, without taking into account the fact that the values and fL are ciphertext portions, then computes the following sums in the homomorphic space:
in application of the expressions (1) and (2). It will be noted that these expression correspond to an evaluation in the homomorphic domain of a scalar product in the plaintext domain.
[0075] The homomorphic ciphertexts [.sub.q.sup.1m].sub.B and [.sub.q.sup.2m].sub.B form the response that is returned to the user B. It will be understood that the server can under no circumstances determine the row (q.sub.m) that has responded or a fortiori the content of the response.
[0076] The user B then decrypts [.sub.q.sup.1m].sub.B and [.sub.q.sup.2m].sub.B by means of their private key, sk.sup.B_H, then concatenates the values obtained .sub.q.sup.1, and .sub.q.sup.2m to form the ciphertext of the row:
[Math.11]
[0077]
[.sub.qm].sub.B=.sub.q.sup.1m|.sub.q.sup.2m(7)
which is, in turn, decrypted with the private key sk.sup.B_H to provide the list of footprints E.sub.qm.sub.j, j=1, . . . , L initially stored in the row q.sup.m of the base.
[0078] The user can finally verify whether the digital footprint of the record sought is part of the list of footprints E.sub.qm.sub.j, j=1, . . . , L thus obtained.
[0079]
[0080] The left section of the figure corresponds to the terminal of the user B, and the right section to the server hosting the database. It is assumed that the database has been encrypted beforehand by means of a keystream generated on the basis of a symmetric key of A, s.sub.k.sup.A. More specifically, each row i of the base has been encrypted beforehand by means of a word b.sub.i from this keystream.
[0081] The terminal B has an additive homomorphic cryptosystem (sk.sup.B_H, pk.sup.B_H) in 310 and the hosting server of the database has received from the user A, in 315, the homomorphic ciphertexts [b.sub.i].sub.B obtained by encryption on the basis of the public key of B, pk.sup.B_H.
[0082] In step 325, the server performs a transcryption of the rows of the database by means of (.sub.i+b.sub.i).sub.ext[b.sub.i].sub.B=[
.sub.i].sub.B, i=1, . . . , 2.sup.m. This transcryption converts the rows encrypted by a symmetric key encryption s.sub.k.sup.A into rows encrypted by an additive homomorphic encryption by means of the public key pk.sup.B_H.
[0083] In step 330, the user B searching in the base for a given record, computes their digital footprint by means of the public hash function having served for constituting the base, and deduces therefrom the integer value q.sub.m associated with the first m bits of this footprint.
[0084] In step 340, the user B constructs a request R(q.sub.m) in the form of a vector u of 2.sup.m encrypted with a homomorphic ciphertext of the value 1 in position q.sub.m, the other elements of this vector being homomorphic ciphertexts of the value 0:
[Math.12]
[0085]
u=([0].sub.B, . . . , [0].sub.B, [0].sub.B, . . . , [0].sub.B)(8)
[0086] The request R(q.sub.m) is then transmitted to the server.
[0087] In step 345, the server evaluates in the homomorphic domain the scalar product between the vector u and the vector d of size 2.sup.m consisting of the encrypted rows [.sub.i].sub.B, i=1, . . . , 2.sup.m:
[Math. 13]
[0088]
d=([.sub.1].sub.B, . . . , [
.sub.2m].sub.B)(9)
and sends the result S(q m) back to the user B. In the case where the additive homomorphic cryptosystem is a Paillier cryptosystem, the result may consist of two homomorphic ciphertexts stored at the row q.sub.m, [.sub.q.sup.1m].sub.B and [.sub.q.sup.2m].sub.B, as explained above.
[0089] In step 350, the user B performs a decryption of the response received, by means of their private key sk.sup.B_H to obtain the content of the row q.sub.m in plaintext, i.e. .sub.qm. It is reminded that, in the case of a Paillier system, this step comprises three decryption steps: decryption of the ciphertexts [.sub.q.sup.1m].sub.B and [.sub.q.sup.2m].sub.B, then decryption of the concatenated results [
.sub.qm].sub.B=.sub.q.sup.1m|.sub.q.sup.2m. In the case of a fully homomorphic cryptosystem or Fully Homomorphic Encryption (FHE), a single decryption operation is necessary for the user B, the first operation able to be carried out in the homomorphic domain by the server according to a technique referred to as bootstrapping.
[0090] Finally, in step 360, the user B determines whether the digital footprint sought is among the footprints E.sub.qm.sub.j, j=1, . . . , L of t he row .sub.qm.
[0091] Although the present invention has been described in the case of an additive homomorphic cryptosystem, the person skilled in the art will understand that it applies a fortiori in the case of a FHE cryptosystem. It is reminded that a FHE is a group homomorphism both in accordance with an internal addition law and with an internal multiplication law. The homomorphic encryption (additive or FHE) may be based on a Learning With Errors (LWE) scheme, in a manner known per se.
[0092] Generally, an additive homomorphic cryptosystem will be preferred, because leading to simpler computations than in the case of a FHE cryptosystem.
[0093] According to a first example of embodiment already mentioned, the additive homomorphic cryptosystem may be a Paillier cryptosystem.
[0094] It is reminded that the public key of a Paillier cryptosystem is defined by a pair of integers (g, n) and that the ciphertext of a message .sub.n is given by:
[Math. 14]
[0095]
c=g.sup.r.sup.n mod n.sup.2(10) [0096] where r.sub.n is a random variable
[0097] It will be noted that the computation of the scalar product in step 345 only involves ciphertexts of 1 and of 0 and that consequently the cost of the encryption is essentially dominated by the computation of the term r.sup.n n mod n.sup.2, and is therefore generally high as a result of high values of the exponent n. It is reminded that n =pq where p, q are two prime numbers of large size.
[0098] Advantageously, a simpler encryption may alternatively be selected, by precomputing ciphertexts of 0 with N various random variable values, i.e.:
the ciphertext of a message .sub.n being given by:
[Math. 16]
[0099]
c=g.sup..sub.i=1.sup.N max (1, u.sub.ih.sub.i) mod n.sup.2(12)
where u.sub.i, i=1, . . . , N are random bits aiming to select said ciphertexts of 0. It will be understood that the expression max(1, u.sub.ih.sub.i) leaves the product invariant when the ciphertext of 0, h.sub.i, is not selected. The operation (13) is therefore the same in the homomorphic domain with the addition to the message of a random number =.sub.i=1.sup.N u.sub.i of 0. The values h.sub.i, i=1, . . . , N can be computed once for all and stored by the second user. These values h.sub.i can even be public. In any case, the additive homomorphic encryption simply involves selecting bits u.sub.i and performing the computation of the expression (13), the latter not requiring modular exponentiation. Of course, the bits u.sub.i may be, for their part, single-use and therefore do not need to be stored by the second user.