CRYPTOGRAPHIC PSEUDONYM MAPPING METHOD, COMPUTER SYSTEM, COMPUTER PROGRAM AND COMPUTER-READABLE MEDIUM

Abstract

The invention is a cryptographic pseudonym mapping method for an anonymous data sharing system, the method being adapted for generating pseudonymised data from entity data originating from data sources (DS.sub.i), wherein the data are identified at the data sources (DS.sub.i) by entity identifiers (D) of the respective entities, and wherein the pseudonymised data are identified by pseudonyms assigned to the respective entity identifiers (D) applying a one-to-one mapping. Furthermore, the invention is a computer system implementing the method, and a computer program and a computer-readable medium.

Claims

1. A cryptographic pseudonym mapping method for an anonymous data sharing system, the method being adapted for generating pseudonymised data from entity data originating from data sources (DS.sub.i), wherein the data are identified at the data sources (DS.sub.i) by entity identifiers (D) of the respective entities, and wherein the pseudonymised data are identified by pseudonyms (P) assigned to the respective entity identifiers (D) applying a one-to-one mapping, characterised by applying, for a number n of data sources (DS.sub.i) more than one, a number k of mappers (M.sub.j), with an algebraic group (G) having an order , and within that, a generator element (g) of the algebraic group (G) being predetermined with respect to the mappers (M.sub.j) and data sources (DS.sub.i), and with a set (H) being also predetermined, said set (H) being a subset of integers being coprime to that forms an algebraic group with regard to modulo multiplication, furthermore, a mapping (h) assigning an integer value to each entity identifier (D) with respect to the mappers (M.sub.j) and data sources (DS.sub.i) being also predetermined, and for each index j=1, 2, . . . , k: i. the actual mapper (M.sub.j) selecting for itself, in a random manner, an integer x.sub.j and an integer .sub.j, ii. the actual mapper (M.sub.j) selecting an integer b.sub.j from the set (H) in a random manner, for each index j=1, 2, . . . , k the actual mapper (M.sub.j) generating, cooperating with the other mappers (M.sub.j), such a first ElGamal public key (R.sub.j) that corresponds to the private key x.sub.1.Math.x.sub.2.Math. . . . .Math.x.sub.k, and storing the generated first ElGamal public key (R.sub.j) by the actual mapper (M.sub.j), for each index i=1, 2, . . . , n the actual data source (DS.sub.i) generating, in cooperation with the mappers (M.sub.j), such a second ElGamal public key (S.sub.i) that corresponds to the private key x.sub.1.Math.x.sub.2.Math. . . . .Math.x.sub.k, and storing the generated second ElGamal public key (S.sub.i) by the actual data source (DS.sub.i), and, in the course of the pseudonymisation of an entity identifier (D) by a data source (DS.sub.i), the data source (DS.sub.i) i. calculating an ElGamal cipher (C.sub.1) applying one of the following two alternatives: $C_{1} = {ElGamalEnc}_{S_{i}} (h (D))$ $or$ $C_{1} = {(E_{i})}^{h (D)},$ wherein, in the case of the first alternative, the range of the function h is a subset of the set (H), and in the case of the second alternative, for each index i=1, 2, . . . , n the actual data source (DS.sub.i) generates, in cooperation with the mappers (M.sub.j), such an ElGamal cipher (E.sub.i) of the value b.sub.1.Math.b.sub.2.Math. . . . .Math.b.sub.k that can be decrypted utilizing the ElGamal private key x.sub.1.Math.x.sub.2.Math. . . . .Math.x.sub.k: $ElGamalResolve ({ElGamalPartialDec}_{x_{1} .Math. x_{2} .Math. ... .Math. x_{k}} (E_{i})) b_{1} .Math. b_{2} .Math. ... .Math. b_{k} \mod$ ii. selecting a number .sub.1 from the numbers 1, 2, . . . , k in a random manner, and iii. sending the ElGamal cipher (C.sub.1) to the mapper (M.sub..sub.1) that corresponds to the number .sub.1, for each ElGamal cipher (C.sub.1) received in the system, the mappers (M.sub.j) carrying out the following operations for each index j=1, 2, . . . , k in the following order: i. the actual mapper (M.sub..sub.j) checks if both components of the ElGamal cipher C.sub.j are elements of the set (H), and continues the process only in case the result of the check is positive, ii. the actual mapper (M.sub..sub.j) calculates the subsequent ElGamal cipher (C.sub.j+1): $C_{j + 1} = {ElGamalRerand}_{R_{_{j}}} ({(C_{j})}^{_{_{j}}}),$ iii. if j<k, then the actual mapper (M.sub..sub.j) randomly selects from the numbers 1, 2, . . . , k such a number .sub.j+1 that is not among the numbers .sub.1, .sub.2, . . . , .sub.j, and then sends in a message the subsequent ElGamal cipher (C.sub.j+1) to the mapper (M.sub..sub.j+1) corresponding to the number .sub.j+1, iv. if j=k, then the actual mapper (M.sub..sub.j) randomly chooses a number custom-character .sub.1 from the numbers 1, 2, . . . , k, and sends in a message to the mapper () corresponding to the number .sub.1 the following information: $Z_{1} = g$ $K_{1} = R_{_{k}}$ $U_{1} = C_{k + 1}$ thereafter the mappers () carrying out, for each index j=1, 2, . . . , k, in this order, the following operations: i. the actual mapper ( custom-character ) checks if both components of the ElGamal cipher U.sub.j are elements of the set (H), and continues the process only in case the result of the check is positive, ii. the actual mapper () checks if both components of the ElGamal public key K.sub.j are elements of the set (H), and continues the process only in case the result of the check is positive, iii. the actual mapper ( custom-character ) randomly chooses an integer from the set (H) and determines an integer f.sub.j such that: e.sub.j.Math.f.sub.j1 mod iv. the actual mapper () calculates the followings: $a value (Z_{j + 1}) : Z_{j + 1} = {(Z_{j})}^{e_{j}},$ $a key (K_{j + 1}) : K_{j + 1} = ElGamalKeyRerand (K_{j} x_{_{j}}), and$ $a cipher (U_{j + 1}) : U_{j + 1} = {ElGamalRerand}_{K_{j + 1}} ({ElGamalPartialDec}_{x_{_{j}}} (U_{j}) .Math. f_{j}),$ v. if j<k, then the actual mapper ( custom-character ) randomly chooses from the numbers 1, 2, . . . , k such a number .sub.j+1 that is not among the numbers .sub.1, .sub.2, . . . , .sub.j, and then sends in a message to the mapper () corresponding to the chosen number .sub.j+1 the value (Z.sub.j+1), the key (K.sub.j+1) and the cipher (U.sub.j+1) generated in the previous step, vi. and, if j=k, then the actual mapper ( custom-character ) generates the pseudonym (P) corresponding to the entity identifier (D): $P = {(Z_{k + 1})}^{ElGamalResolve (U_{j + 1})}$

2. The method according to claim 1, characterised in that, in the course of generating the first ElGamal public keys (R.sub.j), for each index j=1, 2, . . . , k: vii. the actual mapper (M.sub.j) randomly chooses an integer r.sub.j from the set (H), viii. the actual mapper (M.sub.j) randomly chooses an integer number t.sub.j, ix. utilizing the chosen number t.sub.j, the actual mapper (M.sub.j) generates an ElGamal public key (K.sub.j,1) corresponding thereto: K.sub.j,1=(r.sub.j, (r.sub.j).sup.t.sup.j mod ), x. if j>1, then the actual mapper (M.sub.j) sends to the first mapper (M.sub.1) the ElGamal public key (K.sub.j,1) corresponding to the chosen number t.sub.j, xi. for each index =1, 2, . . . , k, in this order, the actual mapper (M.sub.) checks if both components of the actual ElGamal public key (K.sub.j,) are elements of the set (H), and, if the result of the check is positive, it calculates therefrom the subsequent ElGamal public key (K.sub.j,+1): K.sub.j,+1=ElGamalKeyRerand(K.sub.j,.Math.x.sub.), and if <k, sends it to the subsequent mapper (M.sub.+1), xii. the last mapper (M.sub.k) sends to the actual mapper (M.sub.j) the subsequent ElGamal public key (K.sub.j,k+1), and xiii. the actual mapper (M.sub.j) generates and stores the first ElGamal public key (R.sub.j): $R_{j} = K_{j, k + 1} t_{j} .$

3. The method according to claim 1, characterised in that, in the course of generating the second ElGamal public keys (S.sub.i), for each index i=1, 2, . . . , n: i. the actual data source (DS.sub.i) randomly chooses an integer number s.sub.i from the set (H), ii. the actual data source (DS.sub.i) randomly chooses an integer number u.sub.i iii. the actual data source (DS.sub.i) generates the ElGamal public key (L.sub.i,1) that corresponds to the chosen numbers: L.sub.i,1=(s.sub.i, (s.sub.i).sup.u.sup.i mod ), iv. the actual data source (DS.sub.i) sends to the first mapper (M.sub.1) the ElGamal public key (L.sub.i,1) that corresponds to the chosen numbers, v. for each index =1, 2, . . . , k, in this order, the actual mapper (M.sub.) checks if both components of the received ElGamal public key (L.sub.i,) are elements of the set (H), and, if the result of the check is positive, it calculates the subsequent ElGamal public key (L.sub.i,+1): L.sub.i,+1=ElGamalKeyRerand(L.sub.i,.Math.x.sub.), and if <k, sends it to the subsequent mapper (M.sub.+1), vi. the last mapper (M.sub.k) sends to the actual data source (DS.sub.i) the subsequent ElGamal public key (L.sub.i,k+1), and vii. the actual data source (DS.sub.i) generates and stores the second ElGamal public key $(S_{i}) : S_{i} = L_{i, k + 1} u_{i} .$

4. The method according to claim 1, characterised in that the ElGamal ciphers (E.sub.i) adapted for being decrypted utilizing the ElGamal private key are generated as follows: For each index i=1, 2, . . . , n: i. the data source (DS.sub.i) randomly chooses a value .sub.i,0 from the set (H). ii. the data source (DS.sub.i) generates the cipher (.sub.i,1) corresponding to the chosen $value :_{i, 1} = {ElGamalEnc}_{S_{i}} (_{i, 0}),$ iii. the data source (DS.sub.i) sends the cipher (.sub.i,1) to the first mapper (M.sub.1), iv. for each index j=1, 2, . . . , k, in this order, the actual mapper (M.sub.j) checks if both components of the actual ElGamal cipher (.sub.i,j) are elements of the set (H), and, if the result of the check is positive, it calculates the subsequent cipher (.sub.i,j+1): .sub.i,j+1=ElGamalRerand.sub.R.sub.j(.sub.i,j).Math.b.sub.j, and if j<k, it sends the calculated cipher (.sub.i,j+1) to the subsequent mapper (M.sub.j+1), v. the last mapper (M.sub.k) sends the calculated cipher (.sub.i,k+1) it has received to the data source (DS.sub.i), and vi. the data source (DS.sub.i) generates and stores the ElGamal cipher (E.sub.i): E.sub.i=.sub.i,k+1.Math.((.sub.i,0).sup.1 mod ).

5. The method according to claim 1, characterised in that: for calculating the ElGamal cipher (C.sub.1), the first alternative is applied by all data sources (DS.sub.i) for each entity identifier (D): C.sub.1=ElGamalEnc.sub.S.sub.i(h(D)), where the range of the function h is a subset of the set (H), or for calculating the ElGamal cipher (C.sub.1), the second alternative is applied by all data sources (DS.sub.i) for each entity identifier (D): C.sub.1=(E.sub.i).sup.h(D).

6. The method according to claim 1, characterised in that the random selections are performed according to a uniform distribution.

7. The method according to claim 1, characterised in that the mapping (h) adapted for assigning an integer value to each entity identifier (D) is a cryptographic hash function that is defined over the space of entity identifiers (D) and maps to an interval [0, ].

8. The method according to claim 1, characterised in that the algebraic group (G) is a Schnorr group.

9. The method according to claim 1, characterised in that the algebraic group (G) is a prime-order elliptic curve defined over a finite field.

10. The method according to claim 1, characterised in that the set (H) forms a Schnorr group with regard to modulo multiplication.

11. The method according to claim 1, characterised in that the data sources (DS.sub.i) share the ElGamal ciphers (C.sub.1) with the mappers (M.sub.j) by writing them into a database that operates according to a protocol verified by third parties and provides decentralized authenticity.

12. The method according to claim 11, characterised in that a blockchain database is applied as the database providing decentralized authenticity.

13. The method according to claim 1, characterised in that the mappers (M.sub.j) constitute a decentralized network and communicate with each other over encrypted channels.

14. The method according to claim 13, characterised in that the mappers (M.sub.j) do not immediately send the messages containing the ElGamal ciphers (C.sub.j+1), values (Z.sub.j+1), keys (K.sub.j+1), and ciphers (U.sub.j+1) generated by them to the respective subsequent mapper, but instead put them on a waiting list, and, when the size of the waiting list has exceeded a predetermined limit, they send the messages in a random order.

15. The method according to claim 13, characterised in that the mappers (M.sub.j) do not immediately send the messages containing the ElGamal ciphers (C.sub.j+1), values (Z.sub.j+1), keys (K.sub.j+1) and ciphers (U.sub.j+1) generated by them to the respective subsequent mapper, but instead send these messages after a randomly chosen time period has elapsed.

16. The method according to claim 13, characterised in that the mappers (M.sub.j) do not immediately process the received messages containing ElGamal ciphers (C.sub.j+1), values (Z.sub.j+1), keys (K.sub.j+1), and ciphers (U.sub.j+1), but instead put them on a waiting list and, after the size of the waiting list has exceeded a predetermined limit, they randomly choose a message from among the received messages and perform the subsequent mapping step on it.

17. The method according to claim 13, characterised in that the mappers (M.sub.j) do not immediately process the received messages containing ElGamal ciphers (C.sub.j+1), values (Z.sub.j+1), keys (K.sub.j+1), and ciphers (U.sub.j+1), but instead they carry out on each message to the subsequent mapping step after a respective randomly chosen time period has elapsed.

18. The method according to claim 1, characterised in that each ElGamal cipher (C.sub.j+1), value (Z.sub.j+1), key (K.sub.j+1), and cipher (U.sub.j+1) is shared by writing into a database providing decentralized authenticity.

19. The method according to claim 18, characterised in that a blockchain database is applied as the database providing decentralized authenticity.

20. The method according to claim 1, characterised in that the algebraic group (G), the generator element (g), and the set (H) are predetermined by the entity or entities responsible for the implementation or the operation of the system.

21. The method according to claim 1, characterised in that the algebraic group (G), the generator element (g), and the set (H) are predetermined by the mappers (M.sub.j) in a decentralized manner.

22. The method according to claim 1, characterised in that the algebraic group (G), the generator element (g), and the set (H) are predetermined by the following algorithm: i. choosing randomly, according to a uniform distribution, a prime number q, the binary representation of which consists of B bits, ii. searching for an integer r between 2 and B for which it holds true that p=r.Math.q+1 is prime; if no such r can be found, returning to step i, iii. searching for an integer s between 2 and B for which it holds true that N=s.Math.p+1 is prime; if no such s can be found, returning to step i, iv. choosing randomly, according to a uniform distribution, integer numbers between 2 and (p1) until such a number f is found that f is relatively prime to p, and f.sup.s1 mod N, v. defining the generator element g as the value f.sup.s mod N, and defining the group G as a group of reduced residue classes a over modulo N for which it holds true that a.sup.s1 mod N and a.sup.pmod N, vi. defining the set (H) as a set of integers a between 1 and p where a is relatively prime to p, a.sup.r1 mod P, and a.sup.q1 mod p.

23. The method according to claim 22, characterised in that a pseudorandom number generator determined in the following manner is applied in the algorithm utilized for defining the algebraic group (G), the generator element (g), and the set (H): each mapper (M.sub.j) chooses an integer N.sub.j from a predetermined range, each mapper (M.sub.j) publishes a commitment value F(N.sub.j), where F is a cryptographic hash function, each mapper (M.sub.j) waits until all of the values F(N.sub.j) are published, each mapper (M.sub.j) publishes its own N.sub.j value, the mappers (M.sub.j) calculate the value N.sub.1.Math.N.sub.2.Math. . . . .Math.N.sub.k, where the symbol .Math. denotes a bitwise XOR operation, the value N.sub.1.Math.N.sub.2.Math. . . . .Math.N.sub.k is applied as the seed of the pseudorandom number generator utilized for defining the algebraic group (G), the generator element (g), and the set (H).

24. The method according to claim 1, characterised in that one or more attributes (A) belong to each entity identifier (D), which attribute/attributes is/are attached in unencrypted form to the ElGamal cipher (C.sub.1) calculated as an encrypted entity identifier, to the value calculated in the course of pseudonym calculation, and to the calculated pseudonyms (P), followed by matching and/or collecting the attributes (A) based on the pseudonyms (P).

25. The method according to claim 1, characterised in that one or more attributes (A) belong to each entity identifier (D), which attribute/attributes is/are attached by the data source (DS.sub.i) in encrypted form to the ElGamal cipher (C.sub.1) calculated as an encrypted entity identifier, such that the attribute (A) corresponding to the entity identifier (D) is encrypted by the data source (DS.sub.i) in the following manner: $A_{1, 1} = {ElGamalEncrypt}_{S_{i}} (A)$ then, in addition to the ElGamal cipher (C.sub.1), the encrypted attribute (A.sub.1,1) is also sent by the data source (DS.sub.i) to the mapper (M.sub..sub.1) in a message, for each index j=1, 2, . . . , k: i. after receiving the encrypted attribute (A.sub.1,1) attached to an ElGamal cipher (C.sub.1), the actual mapper (M.sub..sub.j) checks if both components thereof originate from the set (H), and, if the result of the check is positive, it calculates the subsequent value (A.sub.1,j+1) of the encrypted attribute in the following manner: $A_{1, j + 1} = {ElGamalRerandomize}_{R_{_{j}}} (A_{1, j})$ ii. if j<k, then, in addition to the subsequent ElGamal cipher (C.sub.j+1), the actual mapper (M.sub..sub.j) also sends the subsequent value (A.sub.1,j+1) of the encrypted attribute in a message sent to the subsequent mapper (M.sub..sub.j+1), iii. if j=k, then, in addition to the values Z.sub.1, K.sub.1, U.sub.1, the actual mapper (M.sub..sub.j) sends in a message sent to the mapper ( custom-character ) corresponding to the number .sub.1 a value A.sub.2,1=A.sub.1,j+1 that is calculated as the subsequent value of the encrypted attribute and which corresponds to the first encrypted attribute value (A.sub.2,1) of the subsequent order of mappers, and then for each index j=1, 2, . . . , k: i. if both components of the encrypted attribute value (A.sub.2,j) are elements of the set (H), then the actual mapper ( custom-character ) calculates therefrom the subsequent value of the encrypted attribute (A.sub.2,j+1) in the following manner: $A_{2, j + 1} = {ElGamalRerandomize}_{R_{_{j}}} (A_{2, j})$ ii. if j<k, then the actual mapper () also sends the subsequent value (A.sub.2,j+1) of the encrypted attribute to the subsequent mapper ( custom-character ) in a message containing the value (Z.sub.j+1), key (K.sub.j+1), and cipher (U.sub.j+1), iii. the final encrypted form (A.sub.2,k+1) of the entity attribute is finally obtained.

26. A computer system implementing the method according to claim 1, the system comprising data sources (DS.sub.i) containing data related to entities, more than one, a number k of mappers (M.sub.j), a module adapted for generating the cryptographic keys of the data sources, a module adapted for storing the cryptographic keys of the data sources, a module adapted for generating the cryptographic keys of the mappers, a module adapted for storing the cryptographic keys of the mappers, a module adapted for encrypting the entity identifiers (D), and a module adapted for mapping the encrypted entity identifiers (C.sub.1) to the pseudonyms (P).

27. The computer system according to claim 26, characterised by further comprising databases (DBI.sub.i) stored at the data sources (DS.sub.i), in which the data are identified applying the entity identifiers (D) of the entities, and a database (DBP) containing pseudonymised data, in which the pseudonymised data are identified by pseudonyms (P) assigned to the respective entity identifiers (D) applying a one-to-one mapping,

28. The computer system according to claim 26, characterised by the system further comprising data streams (SI.sub.i) broadcast by the data sources (DS.sub.i), wherein the data are identified by the entity identifiers (D) of the entities, and a data stream (SP) containing pseudonymised data, wherein the pseudonymised data are identified by pseudonyms (P) assigned to the respective entity identifiers (D) applying a one-to-one mapping.

29. The computer system according to claim 26, characterised by further comprising a key manager adapted for storing and/or generating the cryptographic keys of the data sources (DS.sub.i), and a key manager adapted for storing and/or generating the cryptographic keys of the mappers (M.sub.j).

30. A computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of any of the methods according to claim 1.

31. A computer-readable medium adapted for storing the computer program according to claim 30.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0054] Preferred embodiments of the invention are described hereinafter by way of example with reference to the following drawings, where

[0055] FIGS. 1A, 1B are schematic diagrams of the generation process of a first ElGamal public key applying a solution according to the invention,

[0056] FIGS. 2A, 2B show schematic diagrams of the generation process of a second ElGamal public key applying a solution according to the invention,

[0057] FIGS. 3A, 3B show schematic diagrams of the generation process of ElGamal ciphers adapted to be decrypted utilizing ElGamal private keys in a solution according to the invention, and

[0058] FIGS. 4A, 4B, 4C show schematic diagrams of the mapping process applied in a technical solution according to the invention.

MODES FOR CARRYING OUT THE INVENTION

[0059] Like in the technical solution set forth in the document WO 2021/009528 A1, it is supposed that the system consists of more than one (a number n of) data sources and more than one (a number k of) mappers. The data sources will hereinafter be denoted by DS.sub.i, with the mappers being denoted by M.sub.jtherefore, DS.sub.1, DS.sub.2, . . . DS.sub.n denotes a series of all data sources, while M.sub.1, M.sub.2, . . . M.sub.k denotes a series of all mappers.

Cryptographic Structure

[0060] The basis of the pseudonymisation method is formed by an algebraic group G (which is a cyclic group) and a set H which is a subset of integers being coprime to that forms an algebraic group with regard to modulo multiplication. For carrying out the method it is not necessary (and it is even practically impossible) to list all the elements of the group G or the set H; it is sufficient if there exists an effective algorithm that is able to decide whether a given object is an element of the group G or of the set H.

[0061] The order (i.e., the number of the elements) of the group G will be denoted in this document by the symbol , with a generator element g being also designated in the group G.

[0062] A further basis of the pseudonymisation system is formed by a mapping

[00001] $h : .fwdarw. {0, 1, 2, ..., (- 1)}$

[0063] where custom-character denotes the set of nonnegative integer numbers. For example, this can be a cryptographic hash function (see e.g., in Wikipedia), or even the modulo function (yielding h(x)=x mod for each nonnegative integer x). It is strongly recommended, i.e., preferable, that h be a cryptographic hash function.

Denotations

[0064] If a and b are integers, then the expression (a,b) denotes in all cases the ordered pair consisting of the numbers a, b.

[0065] In this document, ElGamal encryption is defined over the modulo cp multiplicative group of integers, therefore: [0066] By an ElGamal private key, a positive integer between 1 and is meant. [0067] By an ElGamal public key such a number pair K=(w,z) is meant where w and z are both integers that are relatively prime to . In this case, the integers w and z are called the components of the ElGamal public key K. [0068] By an ElGamal cipher such a number pair C=(c.sub.1, c.sub.2) is meant where c.sub.1 and c.sub.2 are both integers that are relatively prime to . In this case, the integers c.sub.1 and c.sub.2 are called the components of the ElGamal cipher C.

ElGamal Encryption

[0069] If K=(w,z) is an ElGamal public key, then in this document the expression ElGamalEnc.sub.K(m) denotes the cipher of the message m generated utilizing the key K applying ElGamal encryption over the modulo multiplicative group of integers, i.e.:

[00002] ${ElGamalEnc}_{K} (m) = (w^{y} \mod, m .Math. z^{y} \mod),$

[0070] where the value of y is an integer number chosen randomly between 1 and with a uniform distribution by the entity performing the encryption (this is an ephemeral key, i.e., a different value must be chosen each time).

Decryption (Partial Decryption) of an ElGamal Cipher

[0071] If C=(c.sub.1, c.sub.2) is an ElGamal cipher and x is an ElGamal private key, then the expression ElGamalPartialDec.sub.x(C) means the following:

[00003] ${ElGamalPartialDec}_{x} (C) = (c_{1}^{x} \mod, c_{2})$

Resolving an ElGamal Cipher

[0072] If C=(c.sub.1, c.sub.2) is an ElGamal cipher, then the expression ElGamalResolve(C) denotes the following value:

[00004] $ElGamalResolve (C) = (({(c_{1})}^{- 1} \mod) .Math. c_{2}) \mod$

[0073] Here, the expression (c).sup.1 mod denotes the multiplicative inverse modulo of the value c.sub.1. This value can be generated for example by applying the Euclidean algorithm.

Rerandomization of an ElGamal Cipher

[0074] If C=(c.sub.1, c.sub.2) is an ElGamal cipher and K=(w,z) is an ElGamal public key, then the expression ElGamalRerand.sub.K(C) denotes the rerandomization of the cipher C, i.e.:

[00005] ${ElGamalRerand}_{K} (C) = (c_{1} .Math. w^{y} \mod, c_{2} .Math. z^{y} \mod),$

[0075] where the value of y is an integer number chosen randomly between 1 and with a uniform distribution by the entity performing the rerandomization (this is an ephemeral key, i.e., a different value must be chosen each time).

Rerandomization of an ElGamal Key

[0076] If K=(w,z) is an ElGamal public key, then the expression ElGamalKeyRerand(K) denotes the rerandomization of the key K, i.e.:

[00006] $ElGamalKeyRerand (K) = (w^{y} \mod, z^{y} \mod),$

[0077] where the value of y is an integer number chosen randomly between 1 and with a uniform distribution by the entity performing the rerandomization (this is an ephemeral key, i.e., a different value must be chosen each time).

Multiplication by a Scalar of an ElGamal Cipher

[0078] If C=(c.sub.1, c.sub.2) is an ElGamal cipher and X is an arbitrary integer, then the expression C.Math. is used for denoting the product by the scalar of the cipher C, i.e.:

[00007] $C .Math. = (c_{1}, c_{2} .Math. \mod)$

Exponentiation of an ElGamal Cipher

[0079] If C=(c.sub.1, c.sub.2) is an ElGamal cipher and is an arbitrary integer, then the expression C.sup. is used for denoting the -th power of the cipher C, i.e.:

[00008] $C^{} = (c_{1}^{} \mod, c_{2}^{} \mod)$

Addition of a Private Key to an ElGamal Public Key

[0080] If K=(w,z) is an ElGamal public key and x is an ElGamal private key, then the expression K.Math.x denotes the following pair of values:

[00009] $K x = (w, z^{x} \mod)$

Removal of a Private Key from an ElGamal Public Key

[0081] If K=(w,z) is an ElGamal public key and x is an ElGamal private key, then the expression Kx denotes the following pair of values:

[00010] $K x = (w^{x} \mod, z)$

Equivalence of ElGamal Ciphers

[0082] If x is an ElGamal private key and B and C are ElGamal ciphers, then the expression

[00011] $B \overset{x}{} C$

[0083] denotes the following statement:

[00012] $ElGamalResolve ({ElGamalDecrypt}_{x} (B)) = ElGamalResolve ({ElGamalDecrypt}_{x} (C))$

[0084] Informally, this can be phrased as the ciphers B and C encode the same message.

Key Exchange Process

[0085] Preparing the pseudonymisation system requires the preparation of the keys of the mappers M.sub.j and the data sources DS.sub.i. In this section, the steps of this process are described in relation to FIGS. 1A-1B, and FIGS. 2A-2B.

[0086] According to the invention, by random selection it is meant that the implementation of the method is not dependent on the particular elements of the given set that are chosen. Accordingly, random selection is meant to include also quasi-random or pseudo-random selection, as well as all such selection methods (even according to rules unknown to an observer) wherein the selection appears to be random to the outside observer. If the set constitutes an algebraic structure, then, if it has a null element and/or a unit element, then it/they are not regarded as randomly selected. However, for cryptography considerations it is worth selecting values for which the bit length of their representation fills up all the available space.

Generation of Private Keys

[0087] The following two steps must be performed for each index j=1, 2, . . . , k: [0088] 1.1.1 The mapper M.sub.j randomly selects for itselfpreferably according to a uniform distributionan integer x.sub.j and an integer .sub.j between 2 and . These values are chosen independently of each other. [0089] 1.1.2 The mapper M.sub.j randomly selects for itselfpreferably according to a uniform distributionan integer b.sub.j from the set H.

[0090] The values b.sub.j, x.sub.j, and .sub.j are kept secret by the mapper M.sub.j because they are qualified as secret cryptographic keys.

[0091] The values b.sub.j, x.sub.j, and .sub.j cannot be changed during the pseudonymisation process.

Exchange of the Distributed ElGamal Keys of Mappers (FIGS. 1A and 1B)

[0092] The following steps must be carried out for each index j=1, 2, . . . , k: [0093] 1.2.1 The mapper M.sub.j randomly choosespreferably according to a uniform distributionan integer r.sub.j between 1 and from the set H. [0094] 1.2.2 The mapper M.sub.j randomly choosespreferably according to a uniform distributionan integer t.sub.j between 1 and . [0095] 1.2.3 The mapper M.sub.j generates the ElGamal public key K.sub.j,1=(r.sub.j, (r.sub.j).sup.t.sup.j mod ). [0096] 1.2.4 The mapper M.sub.j sends the public key K.sub.j,1 to the mapper M.sub.1 (if j=1, this step can be omitted). [0097] 1.2.5 For each index =1, 2, . . . , k, in this order, the mapper M.sub. checks if the first and second components of the ElGamal public key K.sub.j, are elements of the set H, and, if the result of the check is positive, it calculates the public key K.sub.j,+1=ElGamalKeyRerand(K.sub.j,.Math.x.sub.) from the ElGamal public key K.sub.j,, and, if

[0100] Note: for each index j=1, 2, . . . , k the key R.sub.j is a public key corresponding to the ElGamal private key x.sub.1.Math.x.sub.2.Math. . . . .Math.x.sub.k.

[0101] Key exchange of the distributed ElGamal keys of the data sources (FIGS. 2A and 2B) The following steps must be carried out for each index i=1, 2, . . . , n: [0102] 1.3.1 The data source DS.sub.i randomly choosespreferably according to a uniform distributionan integer s.sub.i between 1 and from the set H. [0103] 1.3.2 The data source DS.sub.i randomly choosespreferably according to a uniform distributionan integer u.sub.i between 1 and . [0104] 1.3.3 The data source DS.sub.i generates the ElGamal public key L.sub.i,1=(s.sub.i, (s.sub.i).sup.u.sup.i mod ). [0105] 1.3.4 The data source DS.sub.i sends the ElGamal public key L.sub.i,1 to the mapper M.sub.1. [0106] 1.3.5 For each index =1, 2, . . . , k, in this order, the mapper M.sub. checks if the first and second components of the ElGamal public key L.sub.i, are elements of the set H, and, if the result of the check is positive, it calculates the public key L.sub.i,+1=ElGamalKeyRerand(L.sub.i,.Math.x.sub.) from the ElGamal public key L.sub.i,, and, if

[00013] $S_{i} = L_{i, k + 1} u_{i} .$

[0109] Note: for each index i=1, 2, . . . , n the key S.sub.i is a public key corresponding to the ElGamal private key x.sub.1.Math.x.sub.2.Math. . . . .Math.x.sub.k.

[0110] The mapping h adapted to assign integer numbers to the entity identifiers is required in order that the entity identifier D is converted to a form that can be interpreted by the mathematical function defining pseudonymisation. The first and second ElGamal public keys R.sub.i, S.sub.i are in turn adapted for securely calculating the mathematical function defining pseudonymisation.

Exchange of Distributed ElGamal Ciphers (FIGS. 3A and 3B)

[0111] The following steps must be carried out for each index i=1, 2, . . . , n: [0112] 1.4.1 The data source DS.sub.i randomly choosespreferably according to a uniform distributiona value .sub.i,0 from the set H. [0113] 1.4.2 The data source DS.sub.i generates the cipher .sub.i,1=ElGamalEnc.sub.s.sub.i(.sub.i,0). [0114] 1.4.3 The data source DS.sub.i sends the cipher .sub.i,1 to the mapper M.sub.1. [0115] 1.4.4 For each index j=1, 2, . . . , k, in this order, the mapper M.sub.j checks if the first and second components of the ElGamal cipher .sub.i,1 are elements of the set H, and, if the result of the check is positive, it calculates the cipher .sub.i,j+1=ElGamalRerand.sub.R.sub.j(.sub.i,j).Math.b.sub.j from the ElGamal cipher .sub.i,j, and, if j

[0118] Note: it can be seen that for each index i=1, 2, . . . , n

[00014] $E_{i} \overset{x_{1} .Math. x_{2} .Math. ... .Math. x_{k}}{} {ElGamalEnc}_{s_{i}} (b_{1} .Math. b_{2} .Math. .Math. .Math. b_{k})$

Pseudonymisation Process (FIGS. 4A, 4B and 4C)

[0119] Two alternatives are presented for defining the function adapted for mapping the entity identifiers to pseudonyms. For implementing the system, a decision must be made on which of the two alternatives will be utilized.

[0120] The pseudonym assigned by the system to the entity identifier D will hereinafter be denoted by P(D).

Pseudonymisation Function, First Alternative:

[00015] $\begin{matrix} P (D) = g^{({h (D)}^{_{1} .Math._{2} .Math. ... .Math._{k}} \mod)}, & (2. .1) \end{matrix}$

[0121] where the range of the function h is a subset of the set H.

Pseudonymisation Function, Second Alternative:

[00016] $\begin{matrix} P (D) = g^{({(b_{1} .Math. b_{2} .Math. .Math. .Math. b_{k})}^{h (D) .Math._{1} .Math._{2} .Math. ... .Math._{k}} \mod)} & (2. .2) \end{matrix}$

[0122] For pseudonymisation it is not necessary to utilize the ElGamal-type encryption system, because the pseudonym values are defined by the functions according to the above-described alternatives, and neither of the functions can be derived solely from the ElGamal-type encryption system. The mapping adapted for assigning pseudonyms to the entity identifiers is therefore basically not resulting from the ElGamal-type encryption system. In the course of the method described in the present invention, the mathematical operations required for calculating the formulas 2.0.1 and 2.0.2 (modular multiplication, modular exponentiation) are not obtained as a direct output of the ElGamal-type encryption system but rather due to the homomorphic properties of the ElGamal-type encryption system, i.e., thanks to the fact that the ElGamal-type encryption system is homomorphic with regard to exponentiation and to multiplication by a group element. The ElGamal-type encryption system is included in the invention not because it is required for generating the pseudonyms, but in order that the pseudonyms can be generated in a secure manner, i.e., without the participants of the process gaining any extra information during the interaction. The significance of the invention does not exclusively stem from the method being adapted for securely generating the pseudonyms, but first and foremost from the cryptographic strength of the pseudonyms themselves, including their resistance to exponent marking attacks.

Data Submission Step

[0123] In this step, one of the data sources (hereinafter: DS.sub.i) initiates the pseudonymisation of an entity identifier D in its possession. This is performed in the following manner: [0124] 2.1.1 The data source DS.sub.i calculates an ElGamal cipher C.sub.1 according to one of the following two alternatives: [0125] 2.1.1.1 In case the first alternative (2.0.1) of the pseudonymisation function is applied, the definition of the ElGamal cipher C.sub.1 is:

[00017] $C_{1} = {ElGamalEnc}_{S_{i}} (h (D))$ [0126] 2.1.1.2 In case the second alternative (2.0.2) of the pseudonymisation function is applied, the ElGamal cipher C.sub.1 is defined as:

[00018] $C_{1} = {(E_{i})}^{h (D)}$ (see the section Exponentiation of an ElGamal cipher). [0127] 2.1.2 The data source DS.sub.i randomly choosespreferably according to a uniform distributiona value .sub.1 from among the numbers 1,2, . . . , k. [0128] 2.1.3 The data source DS.sub.i sends the cipher C.sub.1 to the mapper M.sub..sub.1.

Randomization Step

[0129] This step has two objectives: [0130] To ensure that, if the entity identifier D or the cipher C.sub.1 is manipulated (altered) by the data source DS.sub.i in the submission step (chosen plaintext or chosen ciphertext attack), the attacker is not able to gain such information that would enable it to find out or correctly guess the pseudonym corresponding to an identifier (with a probability higher than that of choosing randomly from the set of all pseudonyms). [0131] To transform the cipher C.sub.1 such that no entity (including the data source DS.sub.i) is able to guess which data source is the originator of the cipher.

[0132] This step is performed by the mappers carrying out the following operations for each index j=1, 2, . . . , k in the following order: [0133] 2.2.1 The mapper M.sub..sub.j checks if the first and second components of the ElGamal cipher C.sub.j are elements of the set H, and only continues the process if the result of the check is positive. [0134] 2.2.2 The mapper M.sub..sub.j calculates the cipher

[00019] $C_{j + 1} = {ElGamalRerand}_{R}_{_{j}} ({(C_{j})}^{_{_{j}}}) .$ [0135] 2.2.2. If jj randomly selectspreferably according to a uniform distributionfrom the numbers 1, 2, . . . , k such a number .sub.j+1 that is not among the numbers .sub.1, .sub.2, . . . , .sub.j, and then sends in a message the cipher C.sub.j+1 to the mapper M.sub.j+1. [0136] 2.2.3. If, in turn, j=k, then the mapper M.sub..sub.j randomly selectspreferably according to a uniform distributionfrom the numbers 1,2, . . . , k a number custom-character .sub.1, and sends to the mapper the following in a message:

[00020] $Z_{1} = g$ $K_{1} = R_{_{k}}$ $U_{1} = C_{k + 1}$ The rerandomization step is thereby completed.

Pseudonym Decryption Step

[0137] In this step, the cipher U.sub.1 is decrypted by the mappers such that the decrypted value itself is never known, only the result of exponentiating the generator element g utilizing this value as an exponent is published.

[0138] The following operations must be performed for each index j=1, 2, . . . , k, in this order: [0139] 2.3.1 The mapper M.sub..sub.j checks if the first and second components of the ElGamal cipher U.sub.j are elements of the set H, and only continues the process if the result of the check is positive. [0140] 2.3.2 The mapper M.sub..sub.j checks if the first and second components of the ElGamal public key K.sub.j are elements of the set H, and only continues the process if the result of the check is positive. [0141] 2.3.3 The mapper custom-character randomlypreferably according to a uniform distributionchooses an integer e.sub.j from the set H, and determines an integer f.sub.j such that:

[00021] $e_{j} .Math. f_{j} 1 \mod$ Such a value f can be found for example applying the Euclidean algorithm. [0142] 2.3.4 The mapper custom-character calculates the following values:

[00022] $Z_{j + 1} = {(Z_{j})}^{e_{j}}$ $K_{j + 1} = ElGamalKeyRerand (K_{j} x_{_{j}})$ $U_{j + 1} = {ElGamalRerand}_{K_{j + 1}} ({ElGamalPartialDec}_{x_{_{j}}} (U_{j}) .Math. f_{j})$

[0143] (By the exponentiation of Z.sub.j, in this case the group operation, i.e., exponentiation defined over G is meant.) [0144] 2.3.5. If j randomly selectspreferably according to a uniform distributionfrom the numbers 1, 2, . . . , k such a number custom-character that is not among the numbers , , . . . , , and then sends in a message the value Z.sub.j+1, the key K.sub.j+1, and the cipher U.sub.j+1 to the mapper . [0145] 2.3.6. If j=k, then the mapper generates the following value:

[00023] $= {(Z_{k + 1})}^{ElGamalResolve (U_{j + 1})}$ (By the exponentiation of Z.sub.k+1, in this case the group operation, i.e., exponentiation defined over G is meant.) This value is the pseudonym corresponding to the identifier D.

Correctness of the Pseudonym

[0146] In the following we sketch a proof for the value generated by the method presented in the previous section being equal to the value according to the appropriate definition (2.0.1 or 2.0.2) of P(D).

[0147] It can be easily seen that if K is a public key corresponding to the ElGamal private key x, m is an arbitrary integer, is a positive integer, and C=ElGamalEnc.sub.K(m), then the following equivalences hold:

[00024] $\begin{matrix} {ElGamalRerand}_{K} (C) \overset{x}{} C & (3.1) \end{matrix}$ $\begin{matrix} {ElGamalRerand}_{ElGamalKeyRerand (K)} (C) \overset{x}{} C & (3.2) \end{matrix}$ $\begin{matrix} {ElGamalEnc}_{K} (m^{}) \overset{x}{} C^{} & (3.3) \end{matrix}$ $\begin{matrix} {ElGamalEnc}_{K} (m .Math.) \overset{x}{} C .Math. & (3.4) \end{matrix}$ $\begin{matrix} {ElGamalPartialDec}_{t} (C) \overset{x .Math. t^{- 1} \mod}{} {ElGamalEnc}_{K t} (m) & (3.5) \end{matrix}$

[0148] Applying the equivalences 3.1 and 3.3, from the steps 2.2.1, 2.2.2, and 2.2.3 it follows by induction that for each index j=1, 2, . . . , k:

[00025] $\begin{matrix} C_{j + 1} \overset{x_{1} .Math. x_{2} .Math. .Math. .Math. x_{k}}{} {(C_{1})}^{_{_{1}} .Math._{_{2}} .Math. .Math. .Math._{_{j}}} & (3.6) \end{matrix}$

[0149] Specially:

[00026] $\begin{matrix} C_{j + 1} \overset{x_{1} .Math. x_{2} .Math. .Math. .Math. x_{k}}{} {(C_{1})}^{_{_{1}} .Math._{_{2}} .Math. .Math. .Math._{_{j}}} = {(C_{1})}^{_{1} .Math._{2} .Math. .Math. .Math._{k}} & (3.7) \end{matrix}$

[0150] With the help of the properties 3.2, 3.4, and 3.5, based on the steps 2.3.2, 2.3.3, 2.3.4 it can be obtained by induction that for each index j=1, 2, . . . , k:

[00027] $\begin{matrix} U_{j + 1} \overset{x_{_{j + 1}} .Math. x_{_{j + 2}} .Math. .Math. .Math. x_{_{k}}}{} {ElGamalPartialDec}_{x_{_{1}} .Math. x_{_{2}} .Math. .Math. .Math. x_{_{j}}} (C_{k + 1}) .Math. f_{1} .Math. f_{2} .Math. .Math. .Math. f_{j} & (3.8) \end{matrix}$

[0151] Substituting equivalence 3.7 and considering the case j=k:

[00028] $ElGamalResolve (U_{k + 1}) = ElGamalResolve ({ElGamalPartialDec}_{1} (U_{k + 1})) = {ElGamalResolve ({ElGamalPartialDec}_{x_{1} .Math. x_{2} .Math. .Math. .Math. x_{k}} (C_{1}))}^{_{1} .Math._{2} .Math. .Math. .Math._{k}} .Math. f_{1} .Math. f_{2} .Math. .Math. .Math. f_{k}$

[0152] based on the definition of Z.sub.j+1 it is obvious that:

[00029] $Z_{k + 1} = g^{e_{1} .Math. e_{2} .Math. .Math. .Math. e_{k}}$

[0153] Applying the previous two equations and the Euler-Fermat theorem:

[00030] ${(Z_{k + 1})}^{ElGamalResolve (U_{k + 1})} = g^{(e_{1} .Math. e_{2} .Math. .Math. .Math. e_{k} .Math. f_{1} .Math. f_{2} .Math. .Math. .Math. f_{k} .Math. {ElGamalResolve ({ElGamalPartialDec}_{x_{1} .Math. x_{2} .Math. .Math. .Math. x_{k}} (C_{1}))}^{_{1} .Math._{2} .Math. .Math. .Math._{k}})} = g^{({ElGamalResolve ({ElGamalPartialDec}_{x_{1} .Math. x_{2} .Math. .Math. .Math. x_{k}} (C_{1}))}^{_{1} .Math._{2} .Math. .Math. .Math._{k}})}$

[0154] If the pseudonymisation function according to the equation 2.0.1 is being used, then

[00031] $C_{1} = {ElGamalEnc}_{S_{i}} (h (D)), and thus ElGamalResolve ({ElGamalPartialDec}_{x_{1} .Math. x_{2} .Math. .Math. .Math. x_{k}} (C_{1})) = h (D), therefore = g^{(h (D)_{1} .Math._{2} .Math. .Math. .Math._{k})} = P (D) .$

[0155] If, on the other hand, the pseudonymisation function has been defined according to the equation 2.0.2, then C.sub.1=(E.sub.i).sup.h(D), and therefore according to the equivalence 3.3

[00032] $ElGamalResolve ({ElGamalPartialDec}_{x_{1} .Math. x_{2} .Math. .Math. .Math. x_{k}} (C_{1})) = {(b_{1} .Math. b_{2} .Math. .Math. b_{k})}^{h (D)}, thus = g^{({(b_{1} .Math. b_{2} .Math. .Math. .Math. b_{k})}^{h (D) .Math._{1} .Math._{2} .Math. .Math. .Math._{k}})} = P (D) .$

[0156] It therefore holds true that in both cases =P(D).

Additional Information on Attribute Encryption

[0157] In the previous sections only the process of calculating the pseudonyms was described. In itself, this opens up few possibilities for data analysis. A preferred application of the present pseudonymisation method is that (with due care). various other data (entity attributes) can also be attached to the entity identifiers, these data accompanying the entity identifiers during the steps of the pseudonymisation process.

[0158] Preferably, in this case the entity attributes are also subjected to an encryption process, namely such that the entity attribute leaves the data source already in the form of a cipher, and this cipher is overwritten by the mappers in each step by a different respective cipher that, although different in each step, mathematically still carries the same message. This solution provides that an attacker (e.g., a data source with malicious intent) is not able to utilize the entity attributes for assigning the entity identifier, thereby cracking the anonymity provided by the method.

Entity Attribute Encryption Process

[0159] The attribute encryption step added to the submission step is as follows: [0160] 4.1.1 The entity attribute A corresponding to the entity identifier D is encrypted by the data source DS.sub.i in the following manner:

[00033] $A_{1, 1} = {ElGamalEncrypt}_{S_{i}} (A)$ [0161] 4.1.2 After that, in addition to the cipher C.sub.1, the data source DS.sub.i also sends the cipher A.sub.1,1 in a message sent to the mapper M.sub..sub.1.

[0162] The attribute encryption step added to the rerandomization step is the following:

[0163] The following operations must be performed for each index j=1, 2, . . . , k: [0164] 4.2.1 After receiving the cipher A.sub.1,j attached to the cipher C.sub.j, the mapper M.sub..sub.j checks if both components thereof are elements of the set H, and, if the result of the check is positive, it calculates the cipher A.sub.1,j+1 in the following manner:

[00034] $A_{1, j + 1} = {ElGamalRerandomize}_{R_{_{j}}} (A_{1, j})$ [0165] 4.2.2 If jj also sends the cipher A.sub.1,j+1 in a message sent to the mapper M.sub..sub.j+1. [0166] 4.2.3 If, in turn, j=k, then, in addition to the values Z.sub.1, K.sub.1, U.sub.1, the M.sub..sub.j also sends the cipher A.sub.2,1=A.sub.1,j+1 in a message sent to the mapper custom-character .

[0167] The attribute encryption step added to the pseudonym decryption step is the following:

[0168] The following operations must be performed for each index j=1, 2, . . . , k: [0169] 4.3.1 After receiving the cipher A.sub.2,j attached to the values Z.sub.j, K.sub.j, U.sub.j, the mapper custom-character checks if both components thereof are elements of the set H, and, if the result of the check is positive, it calculates the cipher A.sub.2,j+1 in the following manner:

[00035] $A_{2, j + 1} = {ElGamalRerandomize}_{R_{_{j}}} (A_{2, j})$ [0170] 4.3.2 If j also sends the cipher A.sub.2,j+1 to the mapper custom-character in a message containing the value Z.sub.j+1, the key K.sub.j+1, and the cipher U.sub.j+1. [0171] 4.3.3 The cipher A.sub.2,k+1 is the final encrypted form of entity attribute A.

Decryption Process of the Encrypted Entity Attributes

[0172] The entity attributes A can only be decrypted with the consent of all mappers.

[0173] Let .sub.1, .sub.2, . . . , .sub.k be an arbitrary-order permutation of the numbers 1,2, . . . , k.

[0174] The entity attribute is decrypted performing the following steps: [0175] 5.1.1 The mapper M.sub..sub.1 calculates the following value:

[00036] $B_{_{1}} = {ElGamalPartialDec}_{x_{_{1}}} (A_{2, k + 1})$ [0176] 5.1.2 For each value j=1, 2, . . . , k the mapper .sub.j calculates the following value:

[00037] $B_{_{j + 1}} = {ElGamalPartialDec}_{x_{_{j}}} (B_{_{j}})$

[0177] The entity attribute A is then obtained as A=ElGamalResolve(B.sub..sub.k+1.sub.).

[0178] Note: Without knowing the private key x.sub.1.Math.x.sub.2.Math. . . . .Math.x.sub.k, the ciphers A.sub.1,1, A.sub.1,2, . . . , A.sub.1,k+1 and A.sub.2,1, A.sub.2,2, . . . , A.sub.2,k+1 are indistinguishable from integer pairs chosen independently of each other in a random manner even if the value of the attribute A is known. If the system operates correctly, the same number pair A.sub.2,k+1 will never be generated twice. Therefore, the entity attributes cannot be utilized for a data marking attack.

Preferred Applications Complementing the Pseudonymisation Method

[0179] It is particularly preferable if, in the above-described system, the mappers do not process the messages in the order of their reception, but instead they apply a so-called mix network mentioned in the introduction. Furthermore, if, from a business aspect it is not important that the entity identifiers are mapped to the pseudonyms immediately after being submitted, an alternative version of the mix network can also be applied: the incoming messages are first put on a waiting list by all mappers, and, in each case when the size of its waiting list exceeds a certain predetermined limit (e.g. 100 messages), the mapper randomly chooses a message from the list, and, after forwarding it, it waits until the size of the list again exceeds the preset limit. This ensures that a relationship between the identifiers and the pseudonyms cannot be established even from the generation order of the pseudonyms. The greater the limit regarding the length of the waiting list, the more effective the temporal mixing of the messages, but also the more the mapping process is hindered. Therefore, it is expedient to choose such a limit value that balances these two effects.

[0180] There is a large number of options for choosing the type of the group G; two of these will be scrutinised below. One of the opportunities is that G is a so-called Schnorr group (see e.g., the corresponding Wikipedia article). This option makes the ElGamal encryption defined over the multiplicative group mod p more secure. According to the other suggested option, G is a group defined by a prime-order elliptic curve defined over a finite field (see the Wikipedia article Elliptic-curve cryptography). In this case, the exponentiation operation over G is defined as the multiplication by a scalar of the elliptic point (i.e., the repeated addition of the elliptic point). It is important that the order of the group G is high enough (i.e., an order ensuring that the key size of ElGamal encryption over the multiplicative group mod is secure).

[0181] The are multiple options for the exact selection of the group G. It is not recommended to choose a predetermined, known group, because, according to recent research, certain preprocessing attacks may offer significant help in cracking one of the cryptographic problems forming the basis of the present invention, namely the sqDDH problem (see the article by Henry Corrigan-Gibbs and Dmitry Kogan entitled The Discrete-Logarithm Problem with Preprocessing, DOI: 10.1007/978-3-319-78375-8_14).

[0182] The algebraic group G and the generator element g are preferably predetermined by the entity or entities responsible for the implementation or the operation of the system. Optionally, they can also be predetermined in a decentralised manner by the mappers, expediently according to some type of commitment scheme (see e.g., the Wikipedia article Commitment scheme). This, for example, can be performed by applying such a deterministic pseudorandom number generator for generating the group G that has the number N.sub.1.Math.N.sub.2.Math. . . . .Math.N.sub.k as its seed (starting value), where for each index j the integer N.sub.j is selected randomly from a sufficiently large interval by the mapper M.sub.j, and the symbol .Math. denotes a bitwise XOR operation. Each mapper reveals its own N.sub.j value only after publishing a value F(N.sub.j) (commitment) calculated therefrom by a predetermined cryptographic hash function F, and after ensuring that every other mapper has also done so. Thereafter, based on the commitment values, each mapper is able to make sure that there has been no such manipulation which could affect the random nature of the value N.sub.1.Math.N.sub.2.Math. . . . .Math.N.sub.k.

[0183] It is strongly recommended to choose the set H such that it forms a Schnorr group with regard to modulo multiplication. Otherwise, a maliciously acting mapper could choose a small integer (e.g., on the order of a million) x that is a submultiple of , and participate in the process utilizing the value

[00038] $_{j} = \frac{}{x} .$

Thereby, the range of the pseudonymisation function P could be significantly reduced (in both the first and second alternatives), to the extent that the malicious actor could even reproduce the function P applying a brute force attack, and then use it for cracking pseudonyms. Such an attack cannot be executed in case the set H is a Schnorr group, because in that case is a prime, so there does not exist a suitable x value (the attacker cannot succeed either in case it chooses x=1 or x=). It can be relatively easily provided that both the set H and the group G form a Schnorr group; a possible exemplary implementation is the following: [0184] 1. A prime number q is chosen randomly (preferably according to a uniform distribution, for example applying the Miller-Rabin prime test), of which prime number the binary representation consists of B bits. [0185] 2. Such an integer r is searched for between 2 and B for which it holds true that p=r.Math.q+1 is prime (for example it is checked applying the Miller-Rabin prime test). If there is no such number r, we return to step 1. [0186] 3. Such an integer s is searched for between 2 and B for which it holds true that N=s.Math.p+1 is prime. If there is no such number s, we return to step 1. [0187] 4. Integer numbers are randomly chosen (preferably according to a uniform distribution) between 2 and (p1) until such a number f is found that f is relatively prime to p, and f.sup.s1 mod N. [0188] 5. The generator element g is defined by the value f.sup.s mod N, the group G being defined as a group of reduced residue classes a over modulo N for which it holds true that a.sup.s1 mod N and a.sup.p1 mod N. [0189] 6. The set H is defined as a set of integers a between 1 and p that are relatively prime to p, while a.sup.r1 mod P and a.sup.q1 mod p.

[0190] In this case, random selection from the set H can be implemented easily, e.g., by applying the Monte Carlo method (see the Wikipedia article Monte Carlo method).

[0191] Data sources may optionally also fulfil a mapping role. This further improves the security of the system, as it ensures that the other mappers cannot generate the open entity identifier even if all of them maliciously collude.

[0192] Optionally, multiple entity attributes can even be submitted simultaneously, attached to the same entity identifier. In that case, the entity attributes must be encrypted parallelly with each other and with the entity identifier. It must be made sure that an attacker is not able to perform data marking by submitting an unusually large number of entity attributes simultaneously with an entity identifier. It is recommended to restrict the number of entity attributes that can be submitted simultaneously.

[0193] As a special case, in the above-described process the case A=D may also occur; in other words, the submitted entity attribute may be identical to the entity identifier. This can be useful in case the field of application requires that the entity identifiers are decrypted from time to time, for example in the case of providing fraud prevention and fraud discovery services, because the pseudonymisation process is not reversible (not even by the consent of all mappers).

A Typical, Exemplary Application and Implementation

[0194] In the following, a possible mode of application of the invention will be described. One of the unsolved problems related to the operation of commercial plasma collectors is that the companies are unable to credibly establish that a given donor has not exceeded the maximum allowed number of plasma donations (in Hungary, this number is 45). This is because a donor may apply at two different companies and donate plasma 40 times at each company. This fraud can be discovered only if the plasma companies share data between them. However, sharing the donors' personal data is ethically reprehensible (and is also against the law).

[0195] The present invention can be applied for discovering this kind of fraud in the following manner: [0196] the data sources DS.sub.i are the plasma companies participating in the fraud discovery program, [0197] the entity identifiers D are the social security numbers of the donors registered with the participating plasma companies, [0198] the entity attributes A are also the social security numbers of the donors registered with the participating plasma companies, [0199] the mappers M.sub.j are entities that cooperate with or are independent from the participating plasma companies (each plasma company may operate an own mapper M.sub.j if it desires so, thereby ensuring that the entity identifiers D submitted by it cannot be decrypted even in the case of the malicious collusion of all the other plasma companies), [0200] applying the above-described method, the mappers M.sub.j calculate the pseudonyms P and the encrypted entity attributes A.sub.2,k+1 corresponding thereto, [0201] if a pseudonym P has more occurrences in the database containing pseudonymised data than what is allowed (e.g. >45), the encrypted entity attribute A.sub.2,k+1 is decrypted by the mappers M.sub.j by mutual agreement, thereby obtaining the entity identifier D of the fraudulent donor.

CONCLUSION

[0202] Therefore, the following data conversion is performed by the invention in order to ensure that even a collusion between all of the data sources and all but one of the mappers is not sufficient to allow for deciphering the relationship between the pseudonym P(D) and the entity identifier D: [0203] from data that are available at data sources DS.sub.i and that are suitable for identifying persons, things, or other entities by a characteristic name, i.e., from the entity identifier D, [0204] such pseudonymised data are produced in which each entity identifier D is replaced by a respective pseudonym P assigned thereto in a one-to-one manner (independent of the value of the cryptographic keys utilized by the data source DS.sub.i),
such that [0205] the data sources DS.sub.i generate the cipher C.sub.1 from the entity identifiers D in their possession, [0206] these ciphers are mapped into respective pseudonyms by a plurality of encryption means called mappers M.sub.j, the mapping being performed by the mappers M.sub.j in a centralized system or in a decentralized (peer-to-peer, e.g., blockchain) network applying their respective own unique cryptographic keys b.sub.j or mapping cryptographic keys .sub.j executing their own operation in an arbitrary order,
such that [0207] the mappers calculate from each cipher the value

[00039] $P (D) = g^{({h (D)}^{_{1} .Math._{2} .Math. ... .Math._{k}} \mod)}$ or the value

[00040] $P (D) = g^{({(b_{1} .Math. b_{2} .Math. ... .Math. b_{k})}^{h (D) .Math._{1} .Math._{2} .Math. ... .Math._{k}} \mod)}$ in a secure manner, where g is the generator element of the group G utilized for encryption, is the order (number of elements) of the group G, and h is a predetermined function (e.g. a cryptographic hash function) mapping integers to integers, [0208] this modular exponentiation, or a deterministic function thereof (e.g., applying a cryptographic hash function) will be treated as the pseudonym of the entity.

[0209] The computer system implementing the functionalities according to the invention is adapted for generating pseudonymised data related to entities, and comprises [0210] databases DBI.sub.i containing the entity-related data that are owned by the data sources DS.sub.i, wherein the data are identified applying the entity identifiers D of the entities, [0211] a database DBP containing pseudonymised data, wherein the pseudonymised data are identified by pseudonyms P assigned to the respective entity identifiers D applying a one-to-one mapping, [0212] more than one (a number k of) mappers M.sub.j, [0213] optionally, a key manager KM, and [0214] modules implementing the above-described functions and/or entities, which modules can be hardware, software, or combined hardware-software modules.

[0215] Another aspect of the invention is a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of the method according to the invention. The invention further relates to a computer-readable medium adapted for storing the above-mentioned computer program.

[0216] The invention can also be applied for all other purposes for which the technical solution described in the document WO 2021/009528 A1 can be applied.

CRYPTOGRAPHIC PSEUDONYM MAPPING METHOD, COMPUTER SYSTEM, COMPUTER PROGRAM AND COMPUTER-READABLE MEDIUM

Inventors

Cpc classification

Classification Explorer

H04L9/3066

ELECTRICITY

Classification Explorer

H04L2209/08

ELECTRICITY

Classification Explorer

H04L9/3013

ELECTRICITY

Classification Explorer

H04L9/008

ELECTRICITY

Classification Explorer

H04L2209/42

ELECTRICITY

International classification

Classification Explorer

H04L9/30

ELECTRICITY

Classification Explorer

H04L9/00

ELECTRICITY

Abstract

Claims

Description