METHOD AND SYSTEM FOR CONFIDENTIAL CLASSIFICATION OF DATA

Abstract

A computer platform includes an artificial neural network (ANN) as well as a classifier. The ANN is configured, after a learning phase, to transform an input data vector into a discriminating feature vector having a smaller dimension. A user then generates, from a plurality of reference data vectors, the same plurality of reference feature vectors, which are encrypted in an encryption module using the public key of a homomorphic cryptosystem and stored in a reference database of the platform. When the user requests the classification of an input data vector, the ANN, or a copy thereof, provides the classifier with a corresponding discriminating feature vector (y). Distances from the vector to the different reference feature vectors are calculated in the homomorphic domain and the index of the reference feature vector closest to y, i.e. the identifier i.sub.0 of the class to which it belongs, is returned to the user.

Claims

1: A confidential data classification platform comprising: an artificial neural network cascaded with a classifier, the artificial neural network being configured to be trained during a learning phase on data vectors from a learning database and to transform, in an operational phase, input data vectors into discriminative characteristic vectors, said discriminative characteristic vectors being provided to the classifier and having a size smaller than that of the data vectors; and a reference base in which are stored, during an initialisation phase of the classifier, reference characteristic vectors, obtained by transforming, using the artificial neural network or a copy thereof, reference data vectors, the reference characteristic vectors being stored in encrypted form by a public key of a homomorphic cryptosystem of a user, wherein, after the initialisation phase, when the user requests the classification platform to classify an input data vector, the classifier evaluates a classification function in a homomorphic domain, from a discriminative characteristic vector (y) provided by the artificial neural network or the copy thereof, and from the reference characteristic vectors stored in encrypted form in the reference base (Enc(y.sub.i.sup.ref,HE.pk), i=1, . . . , N), and transmits a result of the evaluation to the user.

2: The confidential data classification platform according to claim 1, wherein the artificial neural network is an auto-encoder.

3: The confidential data classification platform according to claim 1, wherein the homomorphic system is additive.

4: The confidential data classification platform according to claim 1, wherein the homomorphic system is a SHE or FHE system.

5: The confidential data classification platform according to claim 1, wherein: the user downloads from the platform a copy of the artificial neural network at an end of the learning phase, and the copy of the neural network is configured to provide, during the initialisation phase, the reference characteristic vectors from the reference data vectors, the user having a module for homomorphic encryption by the public key configured to encrypt the reference characteristic vectors and transmit them to the platform where they are stored in the reference base.

6: The confidential data classification platform according to claim 1, wherein the platform further comprises a second artificial neural network configured to perform in the homomorphic domain a transformation operation equivalent to that performed by the artificial neural network in a cleartext domain, the user having a module for homomorphic encryption by the public key configured to encrypt the reference data vectors and transmit them to the platform where they are transformed by the second neural network into reference characteristic vectors encrypted in the homomorphic domain before being stored in the reference base.

7: The confidential classification platform according to claim 5, wherein the classifier is configured to compute in the homomorphic domain a scalar product of the discriminative characteristic vector, y, with the reference characteristic vectors from the cleartext components (y.sub.1, . . . , y.sub.M) of the discriminative characteristic vector, and the encrypted components of the reference characteristic vectors, using $Enc\left(\sum_{m = 1}^{M} y_{i, m}^{ref} y_{m}; HE.pk\right) = \underset{m = 1, \ldots, M}{\bigoplus} \left(Enc(y_{i, m}^{ref}, HE.pk) \otimes y_{m}\right)$ where ⊕ is an internal addition operation in the homomorphic domain and ⊗ is an external multiplication operation between an element of the homomorphic domain and a cleartext.

8: The confidential classification platform according to claim 7, wherein said scalar product is computed using batch processing, by clustering different components of each reference characteristic vector within a first composite cleartext message, the first composite cleartext message being subsequently homomorphically encrypted to obtain a first composite encrypted message, and by clustering different components of the discriminative characteristic vector within a second composite cleartext message, the batch processing comprising an external multiplication of the encrypted first composite message with the second composite cleartext message.

9: The confidential data classification platform according to claim 7, wherein the user provides the classifier with the public key HE.pk of his/her homomorphic cryptosystem and the classifier is further configured to compute the ciphertext of a norm of the discriminative characteristic vector, Enc(∥y∥.sup.2,HE.pk).

10: The confidential data classification platform according to claim 9, wherein the classifier is further configured to compute, in the homomorphic domain, Euclidean distances between the discriminative characteristic vector (y) and the reference characteristic vectors (y.sub.i.sup.ref, i=1, . . . , N), from an encrypted norm of the discriminative characteristic vector, encrypted norms of the reference characteristic vectors, and respective scalar products between the discriminative characteristic vector and the reference characteristic vectors.

11: The confidential data classification platform according to claim 10, wherein the classifier compares the Euclidean distances in the homomorphic domain and returns to the user an index of the reference characteristic vector corresponding to a smallest Euclidean distance.

12: The confidential data classification platform according to claim 10, wherein the classifier returns to the user the Euclidean distances obtained in the homomorphic domain and the user derives therefrom using a private key of his/her homomorphic cryptosystem, respective probabilities that the input data vector corresponds to the reference data vectors.

13: A method for confidentially classifying data using an artificial neural network and a classifier, wherein the artificial neural network is trained during a learning phase on data vectors of a learning base, and the artificial neural network or a copy thereof transforms, in an operational phase, input data vectors into discriminative characteristic vectors, said discriminative characteristic vectors being provided to the classifier and having a size smaller than that of the data vectors, said method comprising: during an initialisation phase of the classifier, storing in a reference base reference characteristic vectors obtained by transforming, using the artificial neural network or the copy, reference data vectors, the reference characteristic vectors being stored in encrypted form by a public key of a homomorphic cryptosystem of the user; and after the initialisation phase, when the user requests classification of an input data vector from the classification platform, a classification function is evaluated by the classifier in a homomorphic domain, from a discriminative characteristic vector (y) provided by the artificial neural network and the reference vectors stored in encrypted form in the reference base (Enc(y.sub.i.sup.ref,HE.pk), i=1, . . . , N), a result of the evaluation being transmitted to the user.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0027] Further characteristics and advantages of the invention will become apparent from a preferred embodiment of the invention, described with reference to the accompanying figures, among which:

[0028] FIG. 1 schematically represents a confidential data classification system according to a first embodiment of the invention;

[0029] FIG. 2 represents the flowchart of a confidential data classification method according to a first embodiment of the invention;

[0030] FIG. 3 schematically represents a confidential data classification system according to a second embodiment of the invention;

[0031] FIG. 4 represents the flowchart of a confidential data classification method according to a second embodiment of the invention.

DETAILED DISCLOSURE OF PARTICULAR EMBODIMENTS

[0032] In the following, a confidential data classification system will be considered. Without loss of generality, a facial recognition application operating on image data will be referred to. However, the skilled person will understand that this system can be applied to the confidential classification of other data types such as speech data for speaker recognition or even traffic measurements in a computer network for recognition of a type of cyber attack or anomaly.

[0033] FIG. 1 schematically represents a confidential data classification system according to a first embodiment of the invention.

[0034] The confidential classification system, 100, also referred to as a confidential classification platform, first comprises an artificial neural network, 110, for learning, in a learning phase, discriminative characteristics from input data.

[0035] The input data may, where appropriate, have been pre-processed in a pre-processing module (not represented) before being provided to the artificial neural network. Examples of such pre-processing are a spatial or temporal Fourier transform as appropriate, a wavelet transform, a decomposition into an ad hoc orthogonal basis etc.

[0036] In any case, it will be assumed that the input data, whether raw data or pre-processed data, form a vector x in a space of dimension n, x∈ℝ.sup.n. For example, in the case of image data, n may be a number of pixels or a number of spectral components.

[0037] The artificial neural network is adapted to provide a compressed representation of the input data in the form of discriminative characteristics, y∈ℝ.sup.m, belonging to a space of reduced dimension m<n. The artificial neural network may especially be an auto-encoder comprising one or more hidden layers. This auto-encoder is trained in an unsupervised manner on a learning database, 150, herein an image database in the case of a facial recognition system. In the case of an image database, the artificial neural network may be a convolutional neural network.

[0038] The compression (or projection) function ƒ of the input data on the space of discriminative characteristics is such that if two input data vectors x.sub.1,x.sub.2 are close in the sense of a certain proximity relation in ℝ.sup.n, the corresponding discriminative characteristic vectors ƒ(x.sub.1), ƒ(x.sub.2) are also close in the sense of a norm in ℝ.sup.m. In other words, if two images in the learning database 150 are similar, their compressed representations in the characteristic space will be “close” in the sense of a metric derived from a norm of ℝ.sup.m.

[0039] Once the artificial neural network has been trained on the vectors x.sub.train of the learning database, in other words once it is initialised, the network is defined by a synaptic weight matrix (stored, for example, as a Python NumPy file).

[0040] The user imports the synaptic weight matrix from the platform and builds a local neural network, 110′, with the same synaptic weights as the one trained on the platform.
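By way of illustration only, the export of the synaptic weights by the platform and the construction of an identical local network by the user can be sketched as follows. The two-layer encoder, the `tanh` activation, the random stand-in weights and the names `encode`, `platform_weights` and `local_weights` are assumptions made for this sketch; the text does not specify the architecture or the file layout.

```python
import io
import numpy as np

def encode(x, weights, biases):
    """Forward pass of a small fully connected encoder f: R^n -> R^m (assumed architecture)."""
    h = x
    for W, b in zip(weights, biases):
        h = np.tanh(W @ h + b)
    return h

rng = np.random.default_rng(0)
n, hidden, m = 16, 8, 4

# Stand-in for the weights obtained on the platform at the end of the learning phase.
platform_weights = [rng.normal(size=(hidden, n)), rng.normal(size=(m, hidden))]
platform_biases = [rng.normal(size=hidden), rng.normal(size=m)]

# Platform side: serialise the weights (an in-memory buffer replaces the downloaded file).
buf = io.BytesIO()
np.savez(buf, W0=platform_weights[0], W1=platform_weights[1],
         b0=platform_biases[0], b1=platform_biases[1])
buf.seek(0)

# User side: load the weights and build the local network with the same coefficients.
data = np.load(buf)
local_weights = [data["W0"], data["W1"]]
local_biases = [data["b0"], data["b1"]]

# Both networks map the same input vector to the same characteristic vector.
x = rng.normal(size=n)
y_platform = encode(x, platform_weights, platform_biases)
y_local = encode(x, local_weights, local_biases)
```

Because the local network reuses the exact coefficients of the platform network, a characteristic vector computed by the user is interchangeable with one computed by the platform.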

[0041] According to a first embodiment, the user then provides reference data to the local neural network. These reference data are data that will later have to be recognised. For example, the reference data can be images of individuals that will later have to be identified. It is important to note that these reference data are generally not present in the learning database. The reference data vectors will be noted in the following, x.sub.1.sup.ref, . . . , x.sub.N.sup.ref.

[0042] The user thus obtains, output from the local neural network, discriminative characteristic vectors corresponding to these reference data vectors, y.sub.1.sup.ref=ƒ(x.sub.1.sup.ref), . . . , y.sub.N.sup.ref=ƒ(x.sub.N.sup.ref). These vectors will be referred to as reference characteristic vectors in the following.

[0043] Now it is assumed that the user wants to classify an unknown image represented by an input data vector, x. This vector is evaluated by the local neural network 110′, which provides a discriminative characteristic vector y=ƒ(x). Distances between the vector y thus obtained and the reference characteristic vectors, y.sub.i.sup.ref, i=1, . . . , N make it possible to classify the input data vector, x.

[0044] According to a first alternative, the classification can be performed by searching for the nearest neighbour, that is the reference characteristic vector with the smallest distance to the discriminative characteristic vector y:

[00002] $\begin{matrix} i_{0} = \underset{i = 1, \ldots, N}{\operatorname{argmin}} \left( \left\| y - y_{i}^{ref} \right\|^{2} \right) = F(y) & (1) \end{matrix}$

[0045] In other words, the reference data vector x.sub.i.sub.0.sup.ref is recognised in the input data vector x.

[0046] Alternatively, the classifier can compute probabilities that each reference data vector is recognised in the input data vector:

[00003] $\begin{matrix} p_{i} = \frac{\left\| y - y_{i}^{ref} \right\|}{\sum_{k = 1}^{N} \left\| y - y_{k}^{ref} \right\|} & (2) \end{matrix}$
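As a cleartext illustration of expressions (1) and (2), the following sketch computes the index of the nearest reference characteristic vector and the ratios of expression (2) exactly as written. The function names and the sample vectors are hypothetical, and indices are 0-based here whereas the text numbers the reference vectors from 1.

```python
import numpy as np

def nearest_reference(y, y_ref):
    """Expression (1): index of the reference vector with the smallest squared distance to y."""
    d2 = np.sum((y_ref - y) ** 2, axis=1)
    return int(np.argmin(d2)), d2

def distance_ratios(y, y_ref):
    """Expression (2) as written: each distance divided by the sum of all distances."""
    d = np.linalg.norm(y_ref - y, axis=1)
    return d / d.sum()

# Three hypothetical reference characteristic vectors (one per row) and one query vector.
y_ref = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
y = np.array([3.0, 3.0])

i0, d2 = nearest_reference(y, y_ref)   # squared distances are 18, 1 and 34
p = distance_ratios(y, y_ref)          # the values p_i sum to 1
```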

[0047] According to a second alternative, several reference data vectors x.sub.i,1.sup.ref, . . . , x.sub.i,M.sup.ref can be associated with the same identifier i. This will be the case especially when M different images of the face of a same individual are evaluated by the artificial neural network and the M corresponding reference characteristic vectors are available. A k-nearest neighbour type classification can then be performed to determine the identifier.
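A k-nearest neighbour decision over several reference characteristic vectors per identifier can be sketched in cleartext as below. The majority vote, the value k=3 and the names used are assumptions of this sketch; the text only states that a k-nearest neighbour type classification can be performed.

```python
from collections import Counter
import numpy as np

def knn_identifier(y, y_ref, identifiers, k=3):
    """Return the identifier most represented among the k reference vectors closest to y."""
    d2 = np.sum((y_ref - y) ** 2, axis=1)
    nearest = np.argsort(d2, kind="stable")[:k]
    votes = Counter(identifiers[j] for j in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical example: three reference characteristic vectors for each of two identifiers.
y_ref = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
identifiers = ["A", "A", "A", "B", "B", "B"]

result = knn_identifier(np.array([4.5, 5.2]), y_ref, identifiers, k=3)
```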

[0048] The purpose of the invention, namely confidential classification over a confidential set of classes, is achieved in that the classifier 130 of the platform operates in the homomorphic domain. More precisely, the discriminative characteristic vectors are encrypted using a homomorphic encryption and the search for the nearest neighbour is performed in the homomorphic domain.

[0049] It is assumed that the user has a homomorphic cryptosystem, comprised of a private key, HE.sk, and a public key, HE.pk.

[0050] The reference characteristic vectors y.sub.i.sup.ref, i=1, . . . , N, obtained by means of the local neural network, are encrypted by the user using his/her public key, and then transmitted to the platform 100, which stores them in the reference base, 120. The reference characteristic vectors thus encrypted are noted Enc(y.sub.i.sup.ref,HE.pk). More precisely, Enc(y.sub.i.sup.ref,HE.pk)=(Enc(y.sub.i,1.sup.ref,HE.pk), . . . , Enc(y.sub.i,M.sup.ref,HE.pk)) is the vector whose components are the ciphertexts of the components of y.sub.i.sup.ref. In the same way, the user encrypts the squared norms of these vectors using his/her public key, Enc(∥y.sub.i.sup.ref∥.sup.2,HE.pk), i=1, . . . , N, and transmits them to the platform, which stores them in relation to the encrypted reference characteristic vectors, Enc(y.sub.i.sup.ref,HE.pk), i=1, . . . , N.

[0051] If the reference base is to be subsequently completed by the user, for example if a new face is to be added to those to be identified, the image of this face is provided to the local neural network 110′. The user encrypts the reference characteristic vector obtained, as well as its norm, by means of his/her public key HE.pk as previously explained. The user transmits the encrypted vector and norm to the platform, which stores them in the reference base.

[0052] According to one advantageous embodiment, the homomorphic cryptosystem is simply of the additive type, that is it satisfies the following relationships:

Enc(a,HE.pk)⊕Enc(b,HE.pk)=Enc(a+b,HE.pk) (3-1)

and

Enc(a,HE.pk)⊗k=Enc(ka,HE.pk) (3-2)

where ⊕ is an addition operation between ciphertexts in the homomorphic domain and ⊗ is an external multiplication operation between a ciphertext and a cleartext.
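Relations (3-1) and (3-2) can be checked on a toy additive cryptosystem. The sketch below uses a textbook Paillier scheme purely as a stand-in, since the text does not name a particular cryptosystem; the key sizes are far too small to be secure, and the function names (`enc`, `dec`, `h_add`, `h_mul_clear`) are hypothetical.

```python
import math
import secrets

# Toy key material: two Mersenne primes. Illustrative sizes only, NOT secure.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                      # public modulus (plays the role of HE.pk)
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # private key (role of HE.sk)
MU = pow(LAM, -1, N)

def enc(m):
    """Encrypt an integer m with generator g = N + 1; negative values are reduced mod N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + (m % N) * N) * pow(r, N, N2)) % N2

def dec(c):
    """Decrypt and map the result back to a signed integer."""
    m = ((pow(c, LAM, N2) - 1) // N) * MU % N
    return m - N if m > N // 2 else m

def h_add(c1, c2):
    """Addition of two ciphertexts, relation (3-1)."""
    return (c1 * c2) % N2

def h_mul_clear(c, k):
    """External multiplication of a ciphertext by a cleartext integer k, relation (3-2)."""
    return pow(c, k % N, N2)

sum_check = dec(h_add(enc(5), enc(7)))        # decrypts to 5 + 7
scale_check = dec(h_mul_clear(enc(5), 3))     # decrypts to 3 * 5
neg_check = dec(h_mul_clear(enc(5), -2))      # decrypts to -2 * 5
```

In this particular scheme the addition of ciphertexts is a modular product and the external multiplication is a modular exponentiation; other additive cryptosystems realise the same two relations differently.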

[0053] Consider now the scenario in which a user wishes to classify (or identify) an input data vector x, for example the image of an individual.

[0054] The input data vector x is provided directly to the platform. The input data can be, for example, from a video stream directly transmitted to the platform without passing through the user. In this case, the neural network 110 performs the compression y=ƒ(x), and then provides the vector y to the classifier 130.

[0055] Alternatively (a variant represented by the dotted arrows in FIG. 1), the user has access to the cleartext input data vector and provides it to the local neural network 110′. The latter then transforms it into a discriminative characteristic vector, also in cleartext, y=ƒ(x) and then transmits the vector y to the classifier 130.

[0056] In any case, the classifier can compute in the homomorphic domain, by means of operations (3-1) and (3-2), the Euclidean distances between the (cleartext) vector y and the (encrypted) reference characteristic vectors y.sub.i.sup.ref:

[00004] $\begin{matrix} Enc\left(\left\| y - y_{i}^{ref} \right\|^{2}, HE.pk\right) = Enc\left(\left\| y \right\|^{2}, HE.pk\right) \oplus Enc\left(\left\| y_{i}^{ref} \right\|^{2}, HE.pk\right) \oplus \left(\underset{m = 1, \ldots, M}{\bigoplus} \left(Enc(y_{i, m}^{ref}, HE.pk) \otimes y_{m}\right)\right) \otimes (-2) & (4) \end{matrix}$

where y=(y.sub.1, . . . , y.sub.M), y.sub.i.sup.ref=(y.sub.i,1.sup.ref, . . . , y.sub.i,M.sup.ref) and ⊗ denotes the external multiplication of relation (3-2).

[0057] The first term is the homomorphically encrypted norm of the discriminative characteristic vector. This encrypted norm can be transmitted by the user. Alternatively, it can easily be computed by the classifier if it has the cleartext vector y and the public key HE.pk (or even the encrypted components Enc(y.sub.m,HE.pk), in which case the encrypted norm is obtained by the expression below):

[00005] $\underset{m = 1, \ldots, M}{\bigoplus} \left(Enc(y_{m}, HE.pk) \otimes y_{m}\right).$

[0058] The second term is the homomorphically encrypted norm of the reference characteristic vector of index i, Enc(∥y.sub.i.sup.ref∥.sup.2,HE.pk): this term is read from the reference base where it is stored in relation to the ciphertext of the reference characteristic vector i, Enc(y.sub.i.sup.ref,HE.pk).

[0059] The third term is computed by the classifier in the homomorphic domain after retrieving from the reference base the encrypted components Enc(y.sub.i,m.sup.ref,HE.pk) stored in relation to the ciphertext of the reference characteristic vector of index i, Enc(y.sub.i.sup.ref,HE.pk). It will be understood that the part in brackets of the third term is none other than the ciphertext of the scalar product

[00006] $Enc\left(\sum_{m = 1}^{M} y_{i, m}^{ref} y_{m}; HE.pk\right).$

[0060] Finally, the sum of the three terms is obtained by the classifier in the homomorphic domain.
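The three terms of expression (4) can be assembled as in the following sketch, which again relies on a toy Paillier scheme as a stand-in additive cryptosystem (insecure key sizes). It assumes integer-valued characteristic vectors, which in practice would require the real-valued network outputs to be quantised; that quantisation step and all function names are assumptions of this sketch.

```python
import math
import secrets

# Toy Paillier key material (Mersenne primes, illustrative sizes only, NOT secure).
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def enc(m):
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + (m % N) * N) * pow(r, N, N2)) % N2

def dec(c):
    m = ((pow(c, LAM, N2) - 1) // N) * MU % N
    return m - N if m > N // 2 else m

def h_add(c1, c2):            # addition of ciphertexts, relation (3-1)
    return (c1 * c2) % N2

def h_mul_clear(c, k):        # external multiplication by a cleartext, relation (3-2)
    return pow(c, k % N, N2)

def encrypt_reference(y_ref):
    """User side: encrypt each component and the squared norm of a reference vector."""
    return [enc(v) for v in y_ref], enc(sum(v * v for v in y_ref))

def encrypted_sq_distance(y, enc_ref, enc_ref_norm):
    """Classifier side: expression (4), from the cleartext y and encrypted reference data."""
    term1 = enc(sum(v * v for v in y))          # first term: encrypted squared norm of y
    dot = enc(0)
    for c, y_m in zip(enc_ref, y):              # encrypted scalar product
        dot = h_add(dot, h_mul_clear(c, y_m))
    term3 = h_mul_clear(dot, -2)                # third term: scalar product times -2
    return h_add(h_add(term1, enc_ref_norm), term3)

enc_ref, enc_ref_norm = encrypt_reference([1, 2, 2])
enc_d2 = encrypted_sq_distance([3, -1, 4], enc_ref, enc_ref_norm)
d2 = dec(enc_d2)   # only the holder of the private key can perform this step
```

The classifier manipulates only ciphertexts of the reference data; the final decryption is shown solely to check the result against the cleartext value 2² + 3² + 2² = 17.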

[0061] The comparison of distances can be performed in the homomorphic domain by means of Boolean circuits for the operator “>” (greater than) on binary representations of encrypted data, as described in the article by J. Garay et al. entitled “Practical and secure solutions for integer comparison”, published in T. Okamoto and X. Wang editors, Public Key Cryptography—PKC 2007, volume 4450 of Lecture Notes in Computer Science, pages 330-342. Springer Berlin, Heidelberg, 2007.
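To illustrate the kind of Boolean circuit involved, the sketch below evaluates a “greater than” comparison using only XOR and AND gates on binary representations, and uses it to select a minimum index without any data-dependent branching. It runs on cleartext bits and is not the protocol of the cited article; in a homomorphic setting each bit would be a ciphertext and each gate a homomorphic operation. The function names and the bit width are assumptions.

```python
def to_bits(v, width):
    """Binary representation of a non-negative integer, most significant bit first."""
    return [(v >> i) & 1 for i in reversed(range(width))]

def greater_than(a_bits, b_bits):
    """Return 1 if a > b, else 0, using only XOR (^) and AND (&) gates."""
    gt, eq = 0, 1
    for a, b in zip(a_bits, b_bits):
        gt ^= eq & a & (1 ^ b)   # a wins at the first position where the bits differ
        eq &= 1 ^ a ^ b          # stays 1 while all bits seen so far are equal
    return gt

def oblivious_argmin(values, width):
    """Index of the smallest value, selected arithmetically from the comparison bit."""
    best_val, best_idx = values[0], 0
    for i in range(1, len(values)):
        c = greater_than(to_bits(best_val, width), to_bits(values[i], width))
        best_val = c * values[i] + (1 - c) * best_val
        best_idx = c * i + (1 - c) * best_idx
    return best_idx

i0 = oblivious_argmin([18, 1, 34], width=6)
```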

[0062] Finally, the index i.sub.0 of the reference characteristic vector with the minimum distance to the vector y,

[00007] $i_{0} = \underset{i = 1, \ldots, N}{\operatorname{argmin}} \left( \left\| y - y_{i}^{ref} \right\|^{2} \right),$

is transmitted to the user. Since the platform does not have access to the cleartext reference characteristic vectors, the result of the classification remains confidential with respect to the platform.

[0063] According to one alternative, the classifier can transmit the Euclidean distances in encrypted form to the user. The user then decrypts the distance values with his/her private key HE.sk and can either search for the index of the vector with the smallest distance to the vector y (cf. expression (1)) or estimate the probabilities that the vector y falls into one of N classes of the classifier (cf. expression (2)).

[0064] So far it has been assumed that classification was based on the search for the nearest neighbour. However, other classification functions F can be contemplated, for example a classification function in the form of a linear function or a polynomial function of the data to be classified. The classification function associates a class with each input data vector. This class can be defined by a representative, for example a reference data vector or, equivalently, the reference characteristic vector obtained by transforming the previous one by means of the artificial neural network.

[0065] A hyperplane classifier is an example of a linear classification function, and a Gaussian classifier is an example of a polynomial (quadratic) classification function. In such cases, the cryptosystem used is preferably no longer an additive homomorphic cryptosystem in the above sense, but a somewhat homomorphic cryptosystem (SHE) or even a fully homomorphic cryptosystem (FHE).

[0066] Advantageously, a SHE or FHE encryption is used which allows batching operations to be carried out. A description of such an encryption method can be found in the article by J-S. Coron et al. entitled “Batch fully homomorphic encryption of the integers” published in Advances in Cryptology—EUROCRYPT 2013, Lecture Notes in Computer Science, vol 7881. Springer, Berlin, Heidelberg.

[0067] The principle of batch processing is to multiplex several cleartexts to form a composite cleartext in order to obtain a single ciphertext. Thus, instead of encrypting the cleartexts independently of each other, a composite cleartext message is encrypted that is built from the cleartexts in question.

[0068] Batch processing allows the same operation to be parallelised over a plurality of ciphertexts in the homomorphic domain. More precisely, let α.sub.1, . . . , α.sub.L denote a plurality of first cleartexts and (α.sub.1, . . . , α.sub.L) the first composite cleartext message built from them by batching; likewise, let β.sub.1, . . . , β.sub.L denote the same number of second cleartexts and (β.sub.1, . . . , β.sub.L) the second composite cleartext message built from them by batching. If the first and second composite messages are encrypted as:

$Enc(α_{1}, \ldots, α_{L}; HE.pk) = \overline{α}$ (5-1)

$Enc(β_{1}, \ldots, β_{L}; HE.pk) = \overline{β}$ (5-2)

then an addition or multiplication operation can be performed in parallel on the ciphertexts in the homomorphic domain by computing $\overline{α} \oplus \overline{β}$ and $\overline{α} \odot \overline{β}$ respectively (the notation ⊙ denotes the internal multiplication law), given that:

$Dec(\overline{α} \oplus \overline{β}; HE.sk) = (α_{1} + β_{1}, \ldots, α_{L} + β_{L})$ (6-1)

$Dec(\overline{α} \odot \overline{β}; HE.sk) = (α_{1} × β_{1}, \ldots, α_{L} × β_{L})$ (6-2)

[0069] In the same way, an external multiplication on ciphertexts can be performed in parallel by computing $k \otimes \overline{α}$, given that:

$Dec(k \otimes \overline{α}; HE.sk) = (kα_{1}, \ldots, kα_{L})$ (7)

[0070] Furthermore, some homomorphic encryption methods, such as BGV, make it possible to perform accumulation of the constituent cleartexts of the composite cleartext, as described in the original article by Z. Brakerski et al. entitled “Fully homomorphic encryption without bootstrapping” published in Cryptology ePrint Archive, Report 2011/277. More precisely, a second ciphertext $\overline{α}'$ can be obtained from the ciphertext $\overline{α}$, whose decryption gives:

[00008] $\begin{matrix} Dec(\overline{α}'; HE.sk) = \left(\sum_{ℓ = 1}^{L} α_{ℓ}, \ldots, \sum_{ℓ = 1}^{L} α_{ℓ}\right) & (8) \end{matrix}$

In other words, after decryption, a second composite message is obtained whose constituent elements are all equal to the sum of the cleartexts composing the original composite message.

[0071] The above-mentioned batch processing properties allow the computation of Euclidean distances in the homomorphic domain in a small number of operations. Indeed, in this case, a single homomorphic ciphertext represents all the components of a reference characteristic vector y.sub.i.sup.ref=(y.sub.i,1.sup.ref, . . . , y.sub.i,M.sup.ref). The third term of expression (4) can then be computed by means of only two homomorphic operations (a batched multiplication followed by an accumulation as in expression (8)), regardless of the dimension M of the characteristic space. Two additional additions in the homomorphic domain allow the computation of a Euclidean distance in the homomorphic domain to be completed. The computation of Euclidean distances is thereby particularly efficient.
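The operation count described above can be followed on a plaintext simulation of a batched ciphertext. The class below performs no encryption at all: it only mimics the slot-wise semantics of expressions (6-1), (7) and (8) and of the multiplication by a composite cleartext, so that the sequence of operations is visible. The class name, its methods and the folding of the factor −2 into the composite cleartext are assumptions of this sketch.

```python
import numpy as np

class MockBatchCiphertext:
    """Plaintext stand-in for a batched ciphertext (no security, slot semantics only)."""

    def __init__(self, slots):
        self.slots = np.asarray(slots, dtype=np.int64)

    def add(self, other):
        """Slot-wise addition of two ciphertexts, expression (6-1)."""
        return MockBatchCiphertext(self.slots + other.slots)

    def mul_clear(self, clear_slots):
        """External multiplication by a composite cleartext message, slot by slot."""
        return MockBatchCiphertext(self.slots * np.asarray(clear_slots, dtype=np.int64))

    def accumulate(self):
        """Every slot receives the sum of all slots, expression (8)."""
        return MockBatchCiphertext(np.full_like(self.slots, self.slots.sum()))

    def decrypt(self):
        return self.slots.copy()

def batched_sq_distance(y, ct_ref, ct_ref_norm):
    """Squared distance of expression (4) with one ciphertext per reference vector."""
    y = np.asarray(y, dtype=np.int64)
    third = ct_ref.mul_clear(-2 * y).accumulate()       # two operations, whatever M is
    ct_y_norm = MockBatchCiphertext(np.full(len(y), int(np.dot(y, y))))
    return ct_y_norm.add(ct_ref_norm).add(third)        # two additions

y_ref = np.array([1, 2, 2])
ct_ref = MockBatchCiphertext(y_ref)                                   # all M components
ct_ref_norm = MockBatchCiphertext(np.full(3, int(np.dot(y_ref, y_ref))))
result = batched_sq_distance([3, -1, 4], ct_ref, ct_ref_norm).decrypt()
```

After decryption every slot carries the same squared distance, so any single slot can be read.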

[0072] Alternatively, batch processing can be carried out component by component on all the reference characteristic vectors. More precisely, each ciphertext represents all the components of the same rank of the different reference characteristic vectors. This batch processing is adapted to the case where the classification function is more complex than that of a search for the nearest neighbour, especially when it is expressed as a polynomial function of the data to be classified.
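The alternative packing, in which one ciphertext holds the components of the same rank of all the reference characteristic vectors, can likewise be simulated in plaintext. In the sketch below NumPy arrays stand in for batched ciphertexts (no encryption is performed), slot i corresponds to reference vector i, and the squared distances to all N reference vectors are obtained together. The variable and function names are hypothetical.

```python
import numpy as np

# N = 4 hypothetical reference characteristic vectors of dimension M = 3 (one per row).
y_ref = np.array([[1, 2, 2],
                  [3, -1, 4],
                  [0, 0, 0],
                  [5, 5, 5]])

# Stand-in "ciphertext" number m packs component m of every reference vector.
packed = [y_ref[:, m].copy() for m in range(y_ref.shape[1])]
# Stand-in "ciphertext" of the squared norms, one slot per reference vector.
packed_norms = np.sum(y_ref ** 2, axis=1)

def all_sq_distances(y, packed, packed_norms):
    """Squared distances from y to every reference vector, computed slot-wise."""
    acc = packed_norms + int(np.dot(y, y))        # add the squared norm of y to each slot
    for ct_m, y_m in zip(packed, y):
        acc = acc + ct_m * (-2 * int(y_m))        # external multiplication, then addition
    return acc

distances = all_sq_distances(np.array([3, -1, 4]), packed, packed_norms)
```

With this layout the number of homomorphic operations grows with the dimension M but not with the number N of reference vectors.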

[0073] FIG. 2 represents the flowchart of a confidential data classification method according to the first embodiment of the invention.

[0074] This method implements an artificial neural network cascaded with a classifier of a classification platform as represented in FIG. 1.

[0075] In a first phase, called a learning phase, the artificial neural network, advantageously an auto-encoder, is trained in 210 on a learning database. Thus the auto-encoder can be trained in an unsupervised manner on images stored in this database. At the end of this learning phase, the artificial neural network is able to represent the input data vectors (an image) in a characteristic space of lower dimension (embedding).

[0076] The user then downloads the file of synaptic coefficients from the neural network thus trained to build in 220 a local neural network, NN′, identical to the one of the platform.

[0077] An operational phase follows this learning phase. This includes an initialisation phase of the homomorphic classifier followed by a classification phase.

[0078] In the initialisation phase, the user provides the platform with a plurality of reference characteristic vectors, encrypted in the homomorphic domain and derived from reference data vectors (for example a set of images of faces to be recognised).

[0079] These reference data vectors are transformed in 230 by the local neural network into reference characteristic vectors. In step 240, the reference characteristic vectors are homomorphically encrypted by the user using his/her public key HE.pk. Also, the norms of these reference characteristic vectors are homomorphically encrypted. In step 250, the user transmits the vectors and corresponding norms thus encrypted to the platform to be stored in the reference base. It should be noted that the platform is agnostic because it does not have access to the cleartext reference data (for example, the faces to be recognised).

[0080] In the classification phase itself, the user can have the platform classify an input data vector (for example images). This input data vector is first transformed in 260 by the neural network NN of the platform into a discriminative characteristic vector, which is then provided to the classifier. Alternatively, as seen above, the input data vector can be transformed by the user's local neural network and then transmitted by the latter to the platform.

[0081] In any case, the classifier evaluates in 270 the classification function F in the homomorphic domain, based on the discriminative characteristic vector and the reference characteristic vectors stored in encrypted form in the reference base. Finally, it transmits the result of the evaluation to the user in 280.

[0082] The evaluation result may be the index of the reference characteristic vector (and thus the index of the corresponding reference data vector) with the smallest distance to the discriminative characteristic vector, y. Alternatively, the homomorphically encrypted Euclidean distances, Enc(∥y−y.sub.i.sup.ref∥.sup.2,HE.pk), i=1, . . . , N, can be transmitted to the user, who decrypts them with his/her secret key HE.sk and can determine, by means of expression (2), the respective probabilities that the input data vector corresponds to the different reference data vectors.

[0083] FIG. 3 schematically represents a confidential data classification system according to a second embodiment of the invention.

[0084] As in the first embodiment, the classification platform includes an artificial neural network, 310, cascaded with a classifier 330, adapted to operate in the homomorphic domain.

[0085] The artificial neural network is trained on data vectors x.sub.train read from the learning database 350. Furthermore, the platform builds the equivalent neural network in the homomorphic domain, 315, hereafter referred to as NNH.

[0086] Unlike the first embodiment, the reference data vectors, x.sub.1.sup.ref, . . . , x.sub.N.sup.ref, are encrypted by the user in the homomorphic domain and transmitted to the platform, 300, to be provided to the neural network NNH. Specifically, the neural network NNH performs the equivalent of the compression ƒ in the homomorphic domain, in other words it outputs the homomorphic ciphertext of the reference characteristic vector, Enc(y.sub.i.sup.ref,HE.pk), from the homomorphic ciphertext of the reference data vector, Enc(x.sub.i.sup.ref,HE.pk). This compression operation in the homomorphic domain is performed once and for all, on the set of reference data vectors x.sub.1.sup.ref, . . . , x.sub.N.sup.ref. The homomorphic ciphertexts of the reference characteristic vectors, Enc(y.sub.i.sup.ref,HE.pk), i=1, . . . , N, are stored in the reference base, 320, the homomorphic ciphertext Enc(y.sub.i.sup.ref,HE.pk) being represented by a vector of encrypted components Enc(y.sub.i,m.sup.ref,HE.pk), m=1, . . . , M. The norm of each vector y.sub.i.sup.ref, i=1, . . . , N is deduced therefrom in the homomorphic domain, that is Enc(∥y.sub.i.sup.ref∥.sup.2,HE.pk), and is stored in the reference base 320 in relation to the vector Enc(y.sub.i.sup.ref,HE.pk).
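As a partial illustration of a network operating on encrypted inputs, the sketch below evaluates a single linear layer with cleartext integer weights on an encrypted vector, using a toy Paillier scheme as a stand-in additive cryptosystem (insecure key sizes). It does not cover non-linear activations, which would call for a SHE or FHE scheme or some other technique not shown here; the weights, the quantisation to integers and all names are assumptions, and the text does not describe how the network NNH is actually built.

```python
import math
import secrets

# Toy Paillier key material (Mersenne primes, illustrative sizes only, NOT secure).
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def enc(m):
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + (m % N) * N) * pow(r, N, N2)) % N2

def dec(c):
    m = ((pow(c, LAM, N2) - 1) // N) * MU % N
    return m - N if m > N // 2 else m

def h_add(c1, c2):            # addition of ciphertexts
    return (c1 * c2) % N2

def h_mul_clear(c, k):        # external multiplication by a cleartext integer
    return pow(c, k % N, N2)

def h_linear_layer(enc_x, W):
    """Return the encrypted product W x from the encrypted vector x and cleartext weights W."""
    out = []
    for row in W:
        acc = enc(0)
        for c, w in zip(enc_x, row):
            acc = h_add(acc, h_mul_clear(c, w))
        out.append(acc)
    return out

# User side: encrypt a (quantised) reference data vector.
enc_x = [enc(v) for v in [2, -1, 3]]
# Platform side: apply hypothetical integer weights without seeing the cleartext input.
W = [[1, 0, 2],
     [-1, 4, 1]]
enc_y = h_linear_layer(enc_x, W)
# Decryption is shown only to check the result; it requires the user's private key.
y = [dec(c) for c in enc_y]
```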

[0087] Again, this is a scenario where the user wishes to classify an input data vector x (for example an image from a video stream for facial recognition). This vector is usually provided directly to the platform without passing through the user. In some cases, however, the user could provide the input data vector.

[0088] The neural network of the platform performs compression of the input data vector to deduce the discriminative characteristic vector y therefrom, and then provides it to the classifier 330. The function of the classifier 330 is identical to that of the classifier 130. In particular, the classifier 330 computes in the homomorphic domain the respective distances between the discriminative characteristic vector and the different reference vectors. It returns to the user the index of the reference vector corresponding to the smallest distance or the distances to these vectors, encrypted in the homomorphic domain.

[0089] FIG. 4 represents the flowchart of a confidential data classification method according to a second embodiment of the invention.

[0090] Unlike the first embodiment, the learning phase comprises, in addition to a training step, 410, of the neural network NN (identical to step 210 of FIG. 2), a step 425 in which the platform builds a neural network NNH performing the equivalent of the compression ƒ in the homomorphic domain.

[0091] This learning phase is followed by an operational phase, comprising an initialisation phase 435-450 and a classification phase 460-480.

[0092] In step 435, the user encrypts the reference data vectors x.sub.1.sup.ref, . . . , x.sub.N.sup.ref and then transmits them to the platform.

[0093] In step 440, the neural network NNH performs compression of the reference data vectors in the homomorphic domain to obtain the reference characteristic vectors, encrypted in the homomorphic domain, Enc(y.sub.i.sup.ref,HE.pk), i=1, . . . , N. The norms of these vectors are also computed in the homomorphic domain, Enc(∥y.sub.i.sup.ref∥.sup.2,HE.pk), i=1, . . . , N.

[0094] In step 450, these vectors (as well as their norms) are stored in the reference base of the platform.

[0095] The classification phase comprises steps 460-480, as in the first embodiment. Step 460 differs from step 260 only in that the input data vector x to be classified is always provided to the platform to be transformed by the artificial neural network NN into a discriminative characteristic vector y. In other words, the compression is here performed by the neural network of the platform and never by a local neural network of the user.

[0096] On the other hand, the following steps 470 and 480 are respectively identical to the steps 270 and 280 described in connection with FIG. 2.

[0097] At the end of the classification phase, the platform transmits to the user the index of the reference characteristic vector with the smallest distance to the discriminative characteristic vector, y, or the Euclidean distances homomorphically encrypted, Enc(∥y−y.sub.i.sup.ref∥.sup.2,HE.pk).

[0098] The second embodiment differs from the first one in that the artificial neural network is not shared between the platform and the user. It therefore requires the platform to perform the compression of the reference data vectors in the homomorphic domain, which, as has been seen in the introductory section, can be computationally very costly. However, it is essential to note that, unlike the prior art, the operation of the network in the homomorphic domain takes place here only in the initialisation phase and not at each classification request.

CPC classification: G06N3/088 (PHYSICS); G06N3/045 (PHYSICS); H04L9/008 (ELECTRICITY); G06F21/6254 (PHYSICS); H04L2209/42 (ELECTRICITY)

International classification: H04L9/00 (ELECTRICITY); G06F21/62 (PHYSICS); G06N3/08 (PHYSICS)