Cryptographic Pseudonym Mapping Method, Computer System, Computer Program And Computer-Readable Medium
20220318403 · 2022-10-06
Assignee
Inventors
Cpc classification
H04L9/0866
ELECTRICITY
H04L9/0861
ELECTRICITY
H04L9/3066
ELECTRICITY
H04L9/0825
ELECTRICITY
H04L9/302
ELECTRICITY
H04L9/0897
ELECTRICITY
International classification
Abstract
The invention is a cryptographic pseudonym mapping method for an anonymous data sharing system, the method being adapted for generating a pseudonymized database (DB) from data relating to entities and originating from data sources (DS.sub.i), wherein the data are identified at the data sources (DS.sub.i) by entity identifiers (D) of the respective entities, and wherein the data are identified in the pseudonymized database (DB) by pseudonyms (P) assigned to the respective entity identifiers (D) applying a one-to-one mapping. According to the invention, one mapper (M) and one key manager (KM) are applied, and a respective pseudonym (P) is generated by the mapper (M), for each encrypted entity identifier (C.sub.i) encrypted by the data source (DS.sub.i), utilizing the mapping cryptographic key (hi) corresponding to the particular data source (DS.sub.i). The invention is further a computer system realizing the invention, as well as a computer program and a computer-readable medium.
Claims
1. A cryptographic pseudonym mapping method for an anonymous data sharing system, the method being adapted for generating a pseudonymised database (DB) from data relating to entities and originating from data sources (DS.sub.i), wherein the data are identified at the data sources (DS.sub.i) by entity identifiers (D) of the respective entities, and wherein the data are identified in the pseudonymised database (DB) by pseudonyms (P), characterised in that the pseudonyms (P) are assigned to the respective entity identifiers (D) applying a one-to-one mapping, irrespective of the originating data source, by — applying one mapper (M) and one key manager (KM), — wherein the key manager (KM) is applied for — selecting a secret element (b) from a predetermined algebraic structure constituting a multiplicative or additive cyclic group, — selecting, for each data source (DS.sub.i) an element (e.sub.i, a.sub.i) to be applied as a cryptographic key by the given data source (DS.sub.i), from the algebraic structure and sending it to the given data source (DS.sub.i) while the element (e.sub.i, a.sub.i) is kept secret, — calculating an inverse (d.sub.i) of the element (e.sub.i, a.sub.i) in the given algebraic structure, and using it, together with the secret element (b), for generating a mapping cryptographic key (h.sub.i) of the mapper (M), the mapping cryptographic key (h.sub.i) corresponding to the given data source (DS.sub.i), and sending the mapping cryptographic key (h.sub.i) to the mapper (M) while keeping it secret and assigned to the given data source (DS.sub.i), — transforming by the data source (DS.sub.i), each entity identifier (D) to be mapped, into a respective encrypted entity identifier (C.sub.i) by using the element (e.sub.i, a.sub.i) to be applied as a cryptographic key, and — a respective pseudonym (P) is generated by the mapper (M), for each encrypted entity identifier (C.sub.i) encrypted by the data source (DS.sub.i), utilizing the mapping cryptographic key (h.sub.i) corresponding to the particular data source (DS.sub.i).
2. The method according to claim 1, characterised by applying an algebraic structure constituting a multiplicative cyclic group, wherein values selected by the key manager (KM) are represented by residue classes modulo N, for which algebraic structure constants N=p.Math.q, φ(N)=(p−1).Math.(q−1) are predetermined, wherein p and q are randomly selected prime numbers, the secret element (b) is a randomly selected prime number, and φ(N) is the value of the Euler function obtained for N, — the element (e.sub.i) to be applied as the own cryptographic key of the data source (DS.sub.i) is randomly selected by the key manager (KM), — the key manager (KM) is applied for computing the inverse (d.sub.i) of the element (e.sub.i) to be applied as a cryptographic key, for which the equation e.sub.i.Math.d.sub.i≡1 mod φ(N) is satisfied, — then, the mapping cryptographic key h.sub.i=d.sub.i.Math.b is generated by the key manager (KM), and is sent to the mapper (M), — the encrypted entity identifier (C.sub.i) is computed by the data source (DS.sub.i) utilizing the formula C.sub.i=D.sup.e.sup.
3. The method according to claim 2, characterised in that the randomly selected prime numbers p and q can be represented utilizing half the number of bits of a chosen key size.
4. The method according to claim 2, characterised by sending to an authorised entity (A) — by the key manager (KM), the inverse (d.sub.i) of the element (e.sub.i) to be applied as a cryptographic key, — by the mapper (M), the encrypted entity identifier (C.sub.i).
5. The method according to claim 1, characterised by applying an algebraic structure constituting an additive cyclic group, wherein values are represented by points of elliptic curves defined over a number field of residue classes modulo p, where p is a prime number, for which algebraic structure the following constants are predetermined: the parameters A, B of the formula y.sup.2=x.sup.3+Ax+B mod p defining the points of an elliptic curve defined over the residue classes of the prime number p, and a point G of the curve that has an order q that is greater than the number of entity identifiers (D), a secret element (b) having a value chosen arbitrarily from among residue classes mod q, — the key manager (KM) is applied for randomly selecting, from among residue classes mod q, an element (a.sub.i) to be applied as a cryptographic key by the data source (DS.sub.i), — then, a mapping cryptographic key h.sub.i=a.sub.i+b is generated by the key manager (KM), and is sent to the mapper (M), — the encrypted entity identifier (C.sub.i) is computed by the data source (DS.sub.i) utilizing the formula C.sub.i=M⊕a.sub.iG, where operator ⊕ is the sum of the points of the elliptic curve, and — and the pseudonym (P) P=C.sub.i⊖h.sub.iG is calculated by the mapper (M), where A⊖B=A⊕(−B).
6. The method according to claim 5, characterised by passing on to an authorised entity (A) — by the key manager (KM), the inverse (d.sub.i) of the element (a.sub.i) to be applied as a cryptographic key, and — by the mapper (M), the encrypted entity identifier (C.sub.i).
7. A computer system for cryptographic pseudonymisation, the system comprising: — data sources (DS.sub.i) comprising data by the entity identifiers (D) of the entities, — a pseudonymised database (DB), in which the data are identified by pseudonyms (P), characterised in that the pseudonyms (P) are assigned to the respective entity identifiers (D) applying a one-to-one mapping, irrespective of the originating data source, and the system further comprises — a mapper (M) and a key manager (KM), — a module adapted for selecting, applying the key manager (KM), a secret element (b) from a predetermined algebraic structure constituting a multiplicative or an additive cyclic group, — a module adapted for selecting from the algebraic structure, applying the key manager (KM), for each data source (DS.sub.i), an element (e.sub.i, a.sub.i) to be applied as a cryptographic key by the given data source (DS.sub.i), and for sending it to the given data source (DS.sub.i) while the element (e.sub.i, a.sub.i) is kept secret, — a module adapted for computing, applying the key manager (KM), an inverse (d.sub.i) of the element (e.sub.i, a.sub.i) in the given algebraic structure, for generating, by using said inverse (d.sub.i), together with the secret element (b), a mapping cryptographic key (h.sub.i) of the mapper (M) corresponding to the given data source (DS.sub.i), and for sending said mapping cryptographic key (h.sub.i) to the mapper (M) while keeping it secret and assigned to the given data source (DS.sub.i), — a module adapted for transforming, applying the data source (DS.sub.i), each entity identifier (D) to be mapped into a respective encrypted entity identifier (C.sub.i) utilizing the element (e.sub.i, a.sub.i) to be applied as a cryptographic key, and — a module adapted for mapping, applying the mapper (M), each encrypted entity identifier (C.sub.i) encrypted by the data source (DS.sub.i) utilizing the mapping cryptographic key (h.sub.i) corresponding to the particular data source (DS.sub.i) into a respective pseudonym (P).
8. The computer system according to claim 7, characterised in that it comprises modules adapted for performing a cryptographic pseudonym mapping method for an anonymous data sharing system, the method being adapted for generating a pseudonymised database (DB) from data relating to entities and originating from data sources (DS.sub.i), wherein the data are identified at the data sources (DS.sub.i) by entity identifiers (D) of the respective entities, and wherein the data are identified in the pseudonymised database (DB) by pseudonyms (P), characterised in that the pseudonyms (P) are assigned to the respective entity identifiers (D) applying a one-to-one mapping, irrespective of the originating data source, by — applying one mapper (M) and one key manager (KM), — wherein the key manager (KM) is applied for — selecting a secret element (b) from a predetermined algebraic structure constituting a multiplicative or additive cyclic group, — selecting, for each data source (DS.sub.i) an element (e.sub.i, a.sub.i) to be applied as a cryptographic key by the given data source (DS.sub.i), from the algebraic structure and sending it to the given data source (DS.sub.i) while the element (e.sub.i, a.sub.i) is kept secret, — calculating an inverse (d.sub.i) of the element (e.sub.i, a.sub.i) in the given algebraic structure, and using it, together with the secret element (b), for generating a mapping cryptographic key (h.sub.i) of the mapper (M), the mapping cryptographic key (h.sub.i) corresponding to the given data source (DS.sub.i), and sending the mapping cryptographic key (h.sub.i) to the mapper (M) while keeping it secret and assigned to the given data source (DS.sub.i), — transforming by the data source (DS.sub.i), each entity identifier (D) to be mapped, into a respective encrypted entity identifier (C.sub.i) by using the element (e.sub.i, a.sub.i) to be applied as a cryptographic key, and a respective pseudonym (P) is generated by the mapper (M), for each encrypted entity identifier (C.sub.i) encrypted by the data source (DS.sub.i), utilizing the mapping cryptographic key (h.sub.i) corresponding to the particular data source (DS.sub.i).
9. A computer program comprising instructions which, when the program is executed by a computer, cause the computer to perform the steps of the method according to claim 1.
10. A computer-readable medium storing the computer program according to claim 9.
Description
BRIEF DESCRIPTON OF THE DRAWINGS
[0031] Preferred embodiments of the invention are described hereinafter by way of example with reference to the following diagram, where
[0032]
MODES FOR CARRYING OUT THE INVENTION
[0033] According to the invention it has been recognised that the characteristics of algebraic structures constituting multiplicative or additive cyclic groups can be preferably utilized to fulfil the objects of the invention. Two types of solutions based on such algebraic structures are described below in more detail, but, according to the invention, other such algebraic structures that provide the arithmetic required for the operation of the invention can also be applied. Of the exemplary algebraic structures, a solution involving residue classes modulo N (where N is a positive integer) is first described in detail, followed by describing, in relation to the former, a solution involving points of elliptic curves defined over the number field of residue classes modulo p (where p is a prime).
[0034] The entity identifiers and corresponding data are stored in databases at mutually independent data sources, and, after pseudonymisation according to the invention, the data, together with the pseudonyms generated from the entity identifiers, are stored, assigned to each other, in a central pseudonymised database. Complying with the conditions of the object set for the invention, for a given entity identifier, the relationship between unencrypted data and the pseudonym cannot be affected by the origin of the data (i.e. what data source it came from). However, the process of mappings, i.e. the operations performed at the particular stages, are unique for each data source, that is, different cryptographic keys (for example, modular exponents) have to be used for performing the same mapping. The implementation of a data gathering system necessarily begins with selecting an appropriate modulus. This is carried out in practice by the provider of the data gathering service, or the data gathering community first deciding upon the bit length of the applicable keys. Then, two such prime numbers are selected of which the product (applied as the modulus) can be represented using the given number of bits. The entity generating the keys (hereinafter: the key manager) has to know the modulus N and also the value φ(N) given by the Euler function, or in other words, its totient value. The value N of the modulus has to be known by all participating entities performing mappings. If the representation size of the entity identifiers to be mapped is significantly smaller than the key size, some kind of padding method is preferably applied. This method has to be deterministic in the sense that every data source has to receive the same value such that the pseudonym is also deterministic, irrespective of the data source. The basic data of the mapping are therefore N and φ(N).
[0035] According to the invention, by random selection it is meant that the implementation of the method is not dependent on which particular elements of the given set are chosen. Accordingly, random selection is meant to include also quasi-random or pseudo-random selection, as well as all such selection methods (even according to rules unknown to an observer) wherein the selection appears to be random to the outside observer. If the set constitutes an algebraic structure, then, if it has a null element and/or a unit element, then it/they are not regarded as randomly selected. Also, in the case of residue classes, the selection of non-relatively prime values is avoided. However, for cryptography considerations it is worth selecting such values for which the bit length of their representation fills up all the available space.
[0036] As can be seen in
[0037] The attributes related to the entity identifiers D are preferably passed on by the data sources DS; as unencrypted data, while the entity identifiers D are encrypted by the data sources DS.sub.i utilizing their own cryptographic keys. The resulting cipher is sent to the entity adapted to perform the mapping to the pseudonym P, i.e. to the mapper M. At the same time, an assignment between the unencrypted data and the cipher, i.e. the encrypted entity identifier is maintained, because the database required for data analysis can only be loaded with useful information in such a manner.
[0038] The inventive technical solution applying a single mapper M common for all data sources DS; is implemented as follows. In a first step, encryption is performed by the data source DS.sub.i, followed by the pseudonym P being computed by the mapper M in a second step. If these two entities cooperate, then they are able to connect the unencrypted data with the pseudonym P, so they can even generate a rainbow table. The solution can be applied therefore in case supervision of the mapper M is complemented by further informational technology means.
[0039] The i-th data source DS.sub.i uses its own secret cryptographic key that is referred to as (e.sub.i, N) using the references applied above. Each data source may utilize an arbitrary number of keys, so more than one key identifier index i can correspond to it. Therefore, the entity identifier D is sent to the mapper as the cipher
C.sub.i≡D.sup.e.sup.
[0040] As it was shown above, the pseudonym is generated by the mapper performing the operation
P≡((C.sub.i).sup.d.sup.
[0041] As a result of the intermediate calculations, by computing the value (C.sub.i).sup.d.sup.
P≡C.sub.i.sup.h.sup.
[0042] It has to be noted that the result of the operations performed on the exponents of modular exponentiation has to be computed with a modulus according to φ(N), because these exponents form a cyclic group having a number φ(N) of elements.
[0043] Of course, it has to be made sure whether the entity identifier D, i.e. the unencrypted data appears somewhere in the course of the above calculation. If the modular exponentiation above was performed as a sequence of multiplications, then after performing d.sub.i multiplications the unencrypted identifier D would be obtained, which would make the method useless because the entity performing the computation can easily check if the obtained partial result is of the form expected from the entity identifier, especially if it also contains a CDV (check digit value). However, due to the minimally applicable key size of 2048 bits, the decimal form of the exponent has 616 digits, i.e. as many as 10.sup.616 modular multiplications would have to be carried out. In practice, this is not feasible. Therefore, the simplest (but by far not the most effective) practically feasible method will be described below. The exponentiation x.sup.y is performed applying a binary exponentiation method (see for example: modular exponentiation, right-to-left binary method). The numbers are represented in binary form, as a sum of powers of two. For example, for y:
y=y.sub.0.Math.2.sup.0+y.sub.1.Math.2.sup.1+y.sub.1.Math.2.sup.2+y.sub.3.Math.2.sup.3y.sub.n−1.Math.2.sup.n−1
[0044] where y.sub.i is the i-th bit of the binary representation of the number, i.e. it is either zero or one, contributing either zero or 2.sup.i to the sum. Thus
x.sup.y≡x.sup.y.sup.
[0045] Therefore, if the i-th bit of the exponent y is zero, then the intermediate result is multiplied by one, while if the i-th bit is one, then it is multiplied by x.sup.2.sup.
[0046] Let us consider the case where d.sub.i and h.sub.i do not have the same length. If the multiplication d.sub.i.Math.b is performed according to the “long multiplication” algorithm (like on paper), utilizing the digits of the multiplier d.sub.i for multiplying the multiplicand b considered as unknown, then where a given bit of d.sub.i is one, the result is b, while where the given bit is zero, the result is zero. The value of the given bit can practically be considered random, so statistically the intermediate result will be non-zero in half of the cases. The intermediate results are then shifted according to the corresponding digit of the multiplier and are added up. As a result, as many equations as the number of non-zero bits in d.sub.i are obtained, the number being on average half the length of d.sub.i. In all of these equations it is stated that the sum of particular bits of b equals the value of the corresponding digit of d.sub.i, i.e. zero or one. With a key size of 2048 bits, the solution has to satisfy, on average, 1024 statements. On the right side of the equations a value of one or zero can stand, which is obtained exclusively as a sum of zeroes and ones. If, for a randomly chosen b value the calculated value is not the same as the zero or one at the just-calculated digit of the product, then changing a single bit of b included in the calculation will yield a correct result. This, however, each time cuts in half the number of the potentially selectable values that are binary representations of b. Therefore, an operation that yields the result of an exponentiation performed applying d.sub.i as the exponent is carried out with a probability of 2.sup.−1024, i.e. the unencrypted data is obtained with a probability of 10.sup.−308. The unencrypted data is therefore practically never generated in the course of the calculations.
[0047] The common identifiers of the data gathering network are characterised by the following. In the course of mapping the keys, the modulus N is generated applying the method specified in WO 2017/141065 A1, as the product N=p.Math.q of two prime numbers of appropriate magnitude. In such a case, φ(N)=(p−1).Math.(q−1). The only requirement for the exponent b utilized for mapping the unencrypted data into the pseudonym is that it is relatively prime to φ(N). The modular multiplicative inverse of b for φ(N) (the multiplicative inverse of a is a.sup.−1 modulo m if a.sup.−1 a≡1 mod m) is not computed because it is not used for any calculations.
[0048] The process of generating a pair of mapping keys is the following: After carrying out the above calculations, an exponent d.sub.i that is relatively prime to φ(N) is chosen randomly. Then the extended Euclidean algorithm is applied for computing e.sub.i, for which the formula e.sub.id.sub.i≡1 mod φ(N) will hold true. After that, h.sub.i≡d.sub.i.Math.b mod φ(N) is computed.
[0049] Because in possession of b a pseudonym can be generated from the unencrypted data, this data can be made accessible to such entities that cannot access the unencrypted data at all. Therefore, in such systems a key manager KM is required that is adapted for generating the keys and making them accessible to the entities performing the computations. The values of p, q and b are thus generated by the key manager and are kept secret. The value φ(N) is accessible also only to the key manager, i.e. it is the only entity that is capable of generating keys (e.sub.i, d.sub.i and h.sub.i), i.e. the above described pair of mapping keys is generated by the key manager KM.
[0050] It may be allowed by legislation or by the rules of the data gathering and data analysing community that an authority A or an authority trusted by the community is given access to the unencrypted data. Because it is not desired that even this entity is capable of making a connection between unencrypted data and the pseudonyms, ciphers originating directly from the data sources DS.sub.i are preferably applied for this purpose. In order to carry out the decryption operation it is required that the key manager KM pass on the key d.sub.i to this entity over an encrypted data channel. Then, provided that it has the required legal authorisation or the permission of the data gathering community, it may request the required cipher data from the mapper M. This entity is thus capable of accessing the received unencrypted data by performing the calculation D=C.sub.i.sup.d.sup.
[0051] An exemplary solution that can be seen in
[0057] In the above described embodiment, the data are encrypted by the data sources DS.sub.i applying respective own secret cryptographic keys e.sub.i identified by the index i, where a data source DS.sub.i can have an arbitrary number of keys.
[0058] It is particularly preferable to choose prime numbers as the values p and q, because in that case the number of relative primes is known (it is (p−1).Math.(q−1)).
[0059] As it was mentioned in the introduction, pseudonym mapping can also be performed applying points of elliptic curves (see for example the Wikipedia article “Elliptic curve”) defined over the number field of residue classes modulo p (where p is a prime). In this context, let the algebraic structure be the set of points satisfying the equation y.sup.2=x.sup.3+Ax+B mod p, where x, y, A and B are the residue classes of the prime number p. First, the unencrypted entity identifier m has to be assigned to a point of the curve. Let us choose a point G of the curve having an order q that is sufficiently great that the points of the message space can be assigned to the points generated by G applying a one-to-one mapping. (For all points of the curve there is a number q being the number of additions to itself of the point required for reaching the point O at infinity. The smallest of such numbers q gives the order of the point.) To achieve that, for example the following method can be applied (Aritro Sengupta, Utpal Kumar Ray: Message mapping and reverse mapping in elliptic curve cryptosystem (2016)). At the low order digits the binary representation of D is complemented by 8 bits. In the above defined formula of the curve, x is substituted with the value thus obtained. If no solution exists for y, then the value of x is increased by one. If a solution does exist, then a point M of the finite algebraic structure defined by the curve has been obtained. The description related to the specification of the objects above is applied here such that this point is projected by the i-th data source DS.sub.i to another point C.sub.i of the curves applying its own cryptographic key, followed by it being projected by the mappers to the point P utilized as a pseudonym such that the different ciphers C.sub.i are assigned to the same point P if and only if the point M was identical.
[0060] Because the solution based on algebraic structures forming an additive cyclic group operates in an analogous manner to the solution based on a multiplicative cyclic group, it is not shown separately. The references shown in the figures can be substituted, where needed, with the corresponding operations and references included in the following description. The values x, y, A, B and p adapted to define the algebraic structure are defined by the key manager KM that also selects the point G with a known order greater than the multiplicity of the message space. It then shares these data with the data sources DS.sub.i and the mapper M. A respective secret key b is then chosen randomly or arbitrarily from the residue classes mod q, selecting values different from zero and one.
[0061] For data provision, a number a.sub.i is selected, randomly or arbitrarily, for the i-th data source DS.sub.i, from the residue classes mod q by the key manager KM, which number will be utilized (after being received in encrypted form) by the data source DS.sub.i as a cryptographic key e.sub.i=a.sub.i. Applying the formula h.sub.i=a.sub.i+b, it generates the mapping key corresponding to the data source DS.sub.i. The key is then passed on to the mapper M in encrypted form.
[0062] After that, the encryption operation is performed by the data source DS.sub.i by adding two points: C.sub.i=M⊕a.sub.iG, where the operator ⊕ denotes the addition of two points of the curve, and scalar multiplication denotes repeated addition. The above process carried out on residue classes is modified only in that the below described operation is performed on the points of the curve. The following operation is performed by the mapper M on the data originating from the i-th data source: P=C.sub.i⊕(−h.sub.iG), where the unary operator “—” denotes the reflection of a curve point over the x axis. The operation ⊕ utilizing such values are hereinafter denoted with the operator ⊖. Thus, the pseudonym P is obtained through the combination of the two mappings:
[0063] Thus, the same entity identifier D is sent by each data source as a different cipher, but finally it is assigned to the same pseudonym P. Optionally, the x coordinate of the point P can also be applied as the pseudonym.
[0064] It may be allowed by legislation or by the rules of the data gathering and data analysing community that an authority A or an authority trusted by the community is given access to the unencrypted data. Because it is not desired that even this entity is capable of making a connection between unencrypted data and the pseudonyms, ciphers originating directly from the data sources DS.sub.i are preferably applied for this purpose. In order to carry out the decryption operation it is required that the key manager KM pass on the key a; to this entity over an encrypted data channel. The encrypted data and also the pseudonymised data can be stored, and passed on to authorised parties, by the key manager KM. In such a case, provided that it has the required legal authorisation or the permission of the data gathering community, the authority A may request the required cipher data from the mapper M. In possession of these data, the unencrypted data can be obtained by performing the calculation D=C.sub.i⊖a.sub.iG.
[0065] Therefore, in order to ensure that possessing any component of the system is not sufficient to allow for deciphering the relationship between the pseudonym P and the entity identifier D, the following data conversion is performed by the pseudonym mapping system according to the invention: [0066] — producing from data that are available at data sources DS.sub.i and that are suitable for identifying persons, things or other entities by a characteristic name, i.e. from the entity identifier D, [0067] — such pseudonymised data, in which the entity identifiers D are replaced by a pseudonym P assigned thereto in a one-to-one manner independent of the cryptographic key e.sub.i utilized by the data source DS.sub.i,
[0068] such that [0069] — applying an encryption means/module, an element b is chosen randomly or arbitrarily by the key manager KM from an algebraic structure forming a multiplicative or additive cyclic group utilized by the cryptographic algorithm (preferably, here the following functionality is implemented by the encryption means/module: it generates a random number which can be mapped into the key space applying a suitable mapping in order to select from among the elements of the key space with a near-uniform probability, i.e. randomly), and [0070] — from the same algebraic structure, it selects randomly or arbitrarily for each data source DS.sub.i at least one element e.sub.i or a.sub.i that it passes on to the data source DS.sub.i in such a manner that it can be accessed only by the particular data source DS.sub.i that will apply it as a cryptographic key, [0071] — such that the unencrypted data D is mapped into the cipher C.sub.i, and [0072] — depending on the algebraic structure, it computes the multiplicative or additive inverse cryptographic key d.sub.i or a.sub.i of the cipher, and [0073] — depending on the algebraic structure, it multiplies it with or adds it to the element b, and it passes on the element h.sub.i thus obtained to the mapper M such that it can only be accessed by the mapper M, [0074] — which mapper M will then utilize h.sub.i as a cryptographic key for mapping data encrypted by the data source DS.sub.i into a pseudonym P
[0075] and, optionally, [0076] — if conditions defined outside the system are fulfilled, it passes on the cryptographic key d.sub.i to the authorised entity (i.e. the authority A), which entity [0077] — if conditions defined outside the system are fulfilled, receives the ciphers C.sub.i corresponding to the data specified by such conditions from the mapper M of the database DB, [0078] — such that it can obtain the unencrypted data D.
[0079] The computer system for cryptographic pseudonymisation according to the invention comprises [0080] — data sources DS.sub.i containing data related to entities, the data being identified at the data sources DS.sub.i by the entity identifiers D of the entities, [0081] — a pseudonymised database DB wherein the data are identified applying respective pseudonyms P assigned, in a one-to-one manner, to each of the entity identifiers D, [0082] — a mapper M, [0083] — a key manager KM, and [0084] — modules implementing the above described functions and/or entities, which modules can be hardware, software, or combined hardware-software modules.
[0085] The key manager KM is preferably an apparatus comprising a processor adapted for executing a program and memory adapted for providing data writing, storage, and read-out functions. The program run on the apparatus is adapted to generate the data required for executing the mappings, for example the modular exponent adapted for generating a pseudonym from unencrypted data and the totient value of the modulus. The apparatus is adapted for storing these values such that they cannot be accessed by anybody else, but it can still be capable of performing computations utilizing them. In addition to that, it is also capable of computing modular exponent key pairs applying the above described process, for example the extended Euclidean algorithm, and of passing on the encrypted exponent to the data source over a secure data channel and computing the exponent applied for pseudonym mapping, which latter it can also pass on to the entity performing the mapping over a secure data channel. All these requirements are fulfilled for example by the above-mentioned Trusted Platform Module (TPM) circuits.
[0086] The mapper M.sub.j is preferably an apparatus that is adapted for reading any input parameters of modular exponentiation (base, exponent, modulus), as well as executing the operation and making the result available for readout. Such a module can for example be implemented as a general-purpose computer or microcontroller. TPM circuits also fulfil all the above listed requirements.
[0087] Another aspect of the invention is a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of the method according to the invention. The invention further relates to a computer-readable medium adapted for storing the above-mentioned computer program.
[0088] The invention can be applied for various purposes; one of these being the analysis of loyalty card purchase databases involving multiple stores. Let us assume that a company engaged in business analysis and market research activities prepares an analysis of typical customer behaviour in retail stores, which is then purchased by its clients. The analysis is aimed at defining customer groups based on characteristics like the products purchased, the frequency of purchases, the relationship between type and location of stores, the season of year, and the products purchased, etc.
[0089] In order to prepare the analysis, the company needs data. In addition to using publicly available statistical data, such companies often seek to motivate retailer chains and individual stores into cooperating with them. To facilitate that, they for example share part of their research results with the retailers so that they can improve the efficacy of their advertising and improve their selection of products. In many store purchase transactions, none of the characteristics of the customer are known. Although the data included in the receipt can be utilized, the only extra information it provides compared to product sale statistics is that it includes information on products sold during a single purchase transaction and the exact time and date thereof. At the same time, the stores can also offer loyalty card programs. Customers are offered various discounts for taking part in such programs. In the case of such purchases, personal information on the customer and other data thereof relevant for analytic purposes are known. Such data have been passed on (in varying detail) to market research companies by some of the stores (data sources), however, due to a change in legislation related to protecting personal data, this practice will soon end. So, the most important product of the market research company, the “retail market report” has become jeopardized. The regulation on personal data protection makes the above business impossible, although analysing the behaviour of customer groups does not require the possession of concrete personal data of any of the customers.
[0090] If those pieces of data that are applicable for personal identification are simply removed from the data passed on by the stores (except for, possibly, sex, age and postcode) then more valuable results can be obtained compared to those based on purchase receipts, but the information related to particular purchases of a given (anonymous) person at a given store is lost, although possessing and processing such information is not legally prohibited. The stores have therefore committed to use a made-up identifier, i.e. a pseudonym for the identification of the purchases of a given customer. This further improves analysability, but this way a customer who made purchases in different stores will be treated as multiple different persons if the mode of pseudonymization is not uniform.
[0091] The idea may arise that a mapping implemented utilizing a so-called “salted” cryptographic hash function can be applied to the personal data (such as name, sex, birth date, and postcode), but certain lawyers representing the stores may reject this option because the resulting hash data can be connected, by the entity performing the data analysis, to the personal data simply by registering itself as a store and compiling a rainbow table for example from the electoral register. The invention provides a solution to this problem. The implementation of the solution according to the invention can comprise a server software component that allows that the data sources DS.sub.i receive the cryptographic key generated by the key manager KM over an encrypted data channel after authentication at a web page. A computer implementation of the computations performed by the mapper M can be provided. The key generation and mapping service can be activated a cloud service provider such that its operation cannot be affected (except for starting and stopping it) by any of the entities; this setup can preferably also be audited. The mapping factors corresponding to the cryptographic keys of the stores are passed on by the key manager KM to the mapper M over an encrypted data channel, the mapper M then applying them for computing the pseudonym P.
[0092] Thereby, the ciphers generated individually by the different stores are mapped into the same value by the entire computational chain.
[0093] Thus, the analyses can be applied for picking out customers who typically make their purchases in a given store but usually buy a particular product somewhere else, or on certain days do their shopping at a different location shortly after store closure. These are valuable pieces of information that can support business decisions. For example, it is preferable to stock another brand of a particular product, or to close an hour later on Fridays.
LIST OF REFERENCE SIGNS
[0094] D entity identifier [0095] P pseudonym [0096] DS.sub.i data sources [0097] M mapper [0098] KM key manager [0099] d.sub.i inverse [0100] e.sub.i element applicable as cryptographic key (in a multiplicative algebraic structure) [0101] a.sub.1 element applicable as cryptographic key (in an additive algebraic structure) [0102] b secret element [0103] C.sub.i encrypted entity identifier [0104] h.sub.i mapping cryptographic keys [0105] DB database