Method and unit of operating a storage means, storage means and system for data processing

Abstract

A method of operating a storage means, wherein for writing and storing a storage item to the storage means the storage item to be written and stored is provided, in particular by using the concept and theory of identification, an encoding process by means of randomization is applied to the storage item to generate and to provide a randomized encoded storage item, and the randomized encoded storage item is written and stored to the storage means. At least a first randomization process underlies the encoding process and is a randomization process dedicated and assigned to the underlying storage means. The present disclosure further refers to a unit for operating a storage means, to a storage means and to a system for processing data. By having two randomization processes underlying the encoding process, a distinction can be made between secrecy-ensuring and non-secrecy-ensuring randomization processes.

Claims

1. A method (S) of operating a storage means (10), the method comprising: writing and storing a storage item (d) to the storage means (10), wherein the writing and storing comprises: providing (S1) the storage item (d) to be written and stored, applying, to the storage item (d), an encoding process (S2) by means of randomization in order to generate and to provide a randomized encoded storage item (U.sup.k), and writing and storing (S3) the randomized encoded storage item (U.sup.k) to the storage means (10), wherein at least a first randomization process (S4) is underlying the encoding process (S2), and wherein said first randomization process (S4) is a randomization process dedicated and assigned to the underlying storage means (10), wherein the encoding process (S2) and an underlying encoder (Φ.sub.d) for the encoding process (S2) are configured in order to generate from the provided storage item (d) the encoded storage item (U.sup.k) based on a source item (X.sup.n) obtained from a discrete memoryless source (30, 40) as a concatenation of (i) helper data (M) derived by a dedicated unit of the encoder (Φ.sub.d), (ii) a helper message ({acute over (M)}) also derived by a dedicated unit of the encoder (Φ.sub.d), and (iii) an image of common randomness (K) under a mapping (T.sub.d) characteristic for an underlying identification protocol and corresponding to the storage item (d), encrypted using a secret key ({acute over (K)}), in particular by group composition, and wherein said common randomness (K) and said secret key ({acute over (K)}) are generated and derived by dedicated units of the encoder (Φ.sub.d) and/or based on the storage item (d) and the source item (X.sup.n) obtained from a public source (30), a Physical Unclonable Function (PUF) source (40) and/or a general and underlying discrete memoryless multiple source (P.sub.xy) on an underlying alphabet (X×Y).

2. The method (S) according to claim 1, wherein at least one second randomization process (S5) is underlying the encoding process (S2).

3. The method (S) according to claim 2, wherein a respective randomization process of the first randomization process (S4) and the second randomization process (S5) is obtained from and/or based on a discrete memoryless multiple source with respect to one or multiple underlying probability distributions and alphabets.

4. The method (S) according to claim 2, wherein the second randomization process (S5) is a randomization process dedicated to a particular hardware item.

5. The method (S) according to claim 4, wherein the second randomization process (S5) is based on a physical unclonable function (PUF) signature of the underlying hardware item.

6. The method (S) according to claim 1, wherein the first randomization process (S4) is a public randomization process.

7. The method (S) according to claim 1, wherein the method further comprises identifying within the storage means (10) the presence or absence of a storage item (d), wherein the identifying includes providing (S1) the storage item (d) to be identified with respect to its presence or absence in the storage means (10), applying a decoding process (S6) for identification by means of randomization to the storage item (d) in order to generate and to provide a randomized encoded storage item (U.sup.k′), checking the randomized encoded storage item (U.sup.k′) to be identified (S7) within the storage means (10), and generating and/or outputting (S8) an identification message being representative for the presence or absence of the randomized encoded storage item (U.sup.k′) in the storage means (10).

8. The method (S) according to claim 7, wherein the encoding process (S2) and its underlying encoder (Φ.sub.d) and/or the decoding process (S6) and an underlying decoder (Ψ.sub.d) are configured, such that by taking into account said helper data (M) and said helper message (M) conveyed with the encoded storage item (U.sup.k′) written to the storage means (10) (a) together with a source item (Y.sup.n) obtained from an underlying randomized source (30, 40) the decoder (Ψ.sub.d) is capable of reconstructing common randomness ({circumflex over (K)}) and a secret key ({circumflex over (K)}) as attempts or approximations of common randomness (K) and the secret key (K) at the site of an encoder (Φ.sub.d) respectively, and equaling the same with a high degree of probability, and (b) the decoder (Ψ.sub.d) is capable of reconstructing the image (T.sub.d(K)) of common randomness (K) from an encrypted image (T.sub.d(K)⊕K) of common randomness (K) and by using the inverse and thus decrypted form (−{circumflex over (K)}) of the secret key (K).

9. The method (S) according to claim 8, wherein for the identification process (S7) and/or for the outputting process (S8) regarding the identification message, the decoding process (S6) and its underlying decoder (Ψ.sub.d) are configured in order to compare reconstructions (T.sub.d(K), T.sub.d(K)) of an underlying secret key (K) for an interested storage item (d) and for any storage item (d) stored in the storage means (10) in view of the mapping (T.sub.d) being characteristic for the underlying identification protocol and in particular to output a confirming message in case that the reconstructions coincide for at least one storage item (d) stored in the storage means (10) and to output a non-confirming message in case that the reconstructions do not coincide for each storage item (d) stored in the storage means (10).

10. A unit for operating a storage means (10), which is configured to initiate, perform and/or control a method according to claim 1.

11. A storage means (10), which is configured to store storage items and comprises the unit according to claim 10 and/or a connection to the unit according to claim 10.

12. A system (1) for data processing, which comprises a storage means (10) according to claim 11.

13. A method (S) of operating a storage means (10), the method comprising: identifying within the storage means (10) the presence or absence of a storage item (d), wherein the identifying comprises: providing (S1) the storage item (d) to be identified with respect to its presence or absence in the storage means (10), applying to the storage item (d) a decoding process (S6) for identification by means of randomization d in order to generate and to provide a randomized encoded storage item (U.sup.k′), checking the randomized encoded storage item (U.sup.k′) to be identified (S7) within the storage means (10), and generating and/or outputting (S8) an identification message being representative for the presence or absence of the randomized encoded storage item (U.sup.k′) in the storage means (10), and wherein the decoding process (S6) and an underlying decoder (Ψ.sub.d) are configured, such that by taking into account helper data (M) and a helper message (M) conveyed with the encoded storage item (U.sup.k′) written to the storage means (10) (a) together with a source item (Y.sup.n) obtained from an underlying randomized source (30, 40), the decoder (Ψ.sub.d) is capable of reconstructing common randomness ({circumflex over (K)}) and a secret key ({circumflex over (K)}) as attempts or approximations of common randomness (K) and the secret key (K) at the site of an encoder (Φ.sub.d), respectively, and equaling the same with a high degree of probability, and (b) the decoder (Ψ.sub.d) is capable of reconstructing an image (T.sub.d(K)) of common randomness (K) from an encrypted image (T.sub.d(K)⊕K) of common randomness (K) and by using the inverse and thus decrypted form (−{circumflex over (K)}) of the secret key (K).

Description

(1) These and further details, advantages and features of the present invention will be described based on embodiments of the invention and by taking reference to the accompanying figures.

(2) FIG. 1 is a schematic block diagram of the first embodiment of the storage system and the method for operating a storage means according to the present invention with a particular focus on the process of encoding and writing a provided storage item.

(3) FIG. 2 is a schematic block diagram of the further embodiment of the storage system and the method for operating a storage means according to the present invention with a particular focus on the process of identifying a provided storage item with respect to its possible presence or absence in an underlying storage means.

(4) FIGS. 3 and 4 are schematic block diagrams of further embodiments of the storage system and the method of operating a storage means according to the present invention with a particular focus on a single PUF source and a PUF source combined with the public source, with all sources being formed as discrete memoryless multiple sources.

(5) In the following, embodiments and the technical background of the present invention are presented in detail with reference to the accompanying FIGS. 1 to 4. Identical or equivalent elements and elements which act identically or equivalently are denoted with the same reference signs. A detailed description of these elements and components is not repeated at each occurrence.

(6) The depicted and described features and further properties of the invention's embodiments can arbitrarily be isolated and recombined without leaving the gist of the present invention.

(7) The present invention refers to a method S of operating a storage means 10, wherein for writing and storing a storage item d to the storage means 10 the storage item d to be written and stored is provided S1, in particular by using the concept and theory of identification, an encoding process S2 by means of randomization is applied to the storage item d in order to generate and to provide a randomized encoded storage item U.sup.k, and the randomized encoded storage item U.sup.k is written and stored S3 to the storage means 10. At least a first randomization process S4 underlies the encoding process S2. Said first randomization process S4 is a randomization process dedicated and assigned to the underlying storage means 10. The present invention further refers to a unit for operating a storage means 10, to a storage means 10 and to a system 1 for processing data. By having two randomization processes S4, S5 underlying the encoding process S2, a distinction can be made between secrecy-ensuring and non-secrecy-ensuring randomization processes.

(8) FIG. 1 is a schematic block diagram of the first embodiment of the data processing system 1 and the method S for operating a storage means 10 according to the present invention with a particular focus on the process S2 of encoding and S3 of writing a provided message or storage item.

(9) In FIG. 1, a message or storage item d chosen from a set of messages or storage items is provided according to process S1 and subjected to a process S2 of encoding the storage item d based on a combination of randomization processes S4 and S5, which are based on a public source 30 dedicated to the underlying storage means 10 and on a PUF source 40 which is secret and given only to an authorized user or group of authorized users.

(10) The process S2 of encoding the storage item d or message is realized by an encoder Φ.sub.d as already defined above and as further elucidated below and it yields an encoded storage item U.sup.k or message which is written by a process S3 and thereby stored into the underlying storage means 10, which is for instance realized by a public database.

(11) The process S2 of encoding the storage item d as realized by the encoder Φ.sub.d thus depends on the underlying message or storage item d to be written or stored, on the source inputs X.sup.n provided by the first and second randomization processes S4 and S5 and their underlying sources 30 and 40, respectively, and possibly on the further concrete nature and properties of the encoder Φ.sub.d.

(12) FIG. 2 is a schematic block diagram of the further embodiment of the data processing system 1 and the method S for operating a storage means 10 according to the present invention with a particular focus on the process S7 of identifying a provided storage item d with respect to its possible presence or absence in the underlying storage means 10.

(13) FIG. 2 elucidates in more detail the identification process S8 for a message or storage item d to be checked. First of all the storage item d to be checked regarding its presence or absence in the storage means 10 is provided S1 to a process S6 of decoding the storage item d. Said process S6 of decoding is based on one or plural randomization processes S4 and S5 and respective underlying sources 30 and 40, respectively, too, and in particular on the source inputs Y.sup.n and it is realized by a decoder Ψ.sub.d as already described above and as further elucidated below.

(14) The result of the decoding process S6 is provided to a process S7 of identification which controls—by a process S7′—a subsequent process S8 of outputting an identification message.

(15) Said identification message provided by the process S8 of outputting yields a confirming result and for instance a “yes” in case that the investigated message d has an instance or a representation stored in the storage means 10. The process S8 of outputting an identification message yields a result confirming the absence of the investigated message d and for instance a “no” if the storage means 10 does not contain any instance or representation for the investigated message d.

(16) FIGS. 3 and 4 are schematic block diagrams of further embodiments of the data processing system 1 and the method S of operating a storage means 10 according to the present invention, namely with a particular focus on the provision of a single PUF source 40 and the provision of a PUF source 40 combined with the public source 30, respectively, with all sources 30, 40 being preferably formed as a discrete memoryless multiple source.

(17) FIGS. 3 and 4 in more detail elucidate the presence of one or two discrete memoryless multiple sources 30 and 40, namely of the single PUF source 40 in FIG. 3 and dedicated to a certain external hardware item on the one hand and a public source 30 together with a PUF source 40 on the other hand in FIG. 4.

(18) In addition, the attack of an eavesdropper 20 is elucidated in FIGS. 3 and 4 having access to the public database 10 but having no access to the PUF source 40 and its corresponding data for reconstructing an underlying secret key K as elucidated already above and as further explained below. Consequently, the eavesdropper 20 is not capable of identifying any storage item or message d—detecting and/or confirming the presence or absence—which might have been written and stored to the storage means 10 making use of the data of the PUF source 40.

(19) More details on the embodiments shown in FIGS. 3 and 4 are provided in the following sections.

(20) Thus, these and further aspects of the present invention will also be described in detail in the following:

(21) General Technical Formulation

(22) In connection with the present invention, secure storage on a public database such that stored messages can be identified is considered. It is assumed that legitimate users have access to the output of a source. This source is configured and used to generate common randomness which is used for identification. A protocol is defined for secure storage for identification such that the number of messages that can be identified grows doubly exponentially with the number of symbols read from the source. In addition, privacy leakage of the protocols used for identification is considered.

(23) In the following, some of the aspects of identification underlying the present invention will first of all be developed by means of the concept of point-to-point transmissions and will then be used and applied in order to develop a storage for identification model on which the present invention resides:

(24) One of the most basic models in information theory is the discrete memoryless channel or DMC for point to point transmission. This concept has been introduced by C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 3, pp. 379 to 423, 1948, [1]. For this model, the Shannon capacity is defined as the supremum of all achievable transmission rates.

(25) Informally speaking, a rate is achievable if it is possible to transmit messages at this rate, while the message sent can be reconstructed from the channel output with high probability. The number of messages that can reliably be transmitted for this notion of achievability grows exponentially with the block length. In addition to the Shannon capacity, the identification capacity may be introduced as set forth by R. Ahlswede and G. Dueck, “Identification via channels,” IEEE Transactions on Information Theory, vol. 35, no. 1, pp. 15 to 29, 1989, [2].

(26) Here again point to point transmission over a discrete memoryless channel is considered, but the definition of achievability is different.

(27) The decoder now does not try to find out from the channel output which message has been sent. Instead, the decoder is interested in a single, distinct message and tries to find out whether or not this message has been sent, i.e. the decoder tries to identify the message. Of course, the sender does not know the message the receiver is interested in.

(28) In this scenario the probability that the receiver correctly identifies the message should be close to one. For this notion of achievability the number of messages that can reliably be identified grows doubly exponentially with the block length.
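To make the difference in scale concrete, the following minimal Python sketch compares the two growth laws. The function names and the example values (block length n = 20, rate 0.5 bit per symbol) are illustrative assumptions and are not taken from the present disclosure.

```python
def log2_transmission_messages(n: int, rate: float) -> float:
    """log2 of the message count for transmission: 2**(n*rate) messages."""
    return n * rate


def log2_identification_messages(n: int, rate: float) -> float:
    """log2 of the message count for identification: 2**(2**(n*rate)) messages."""
    return 2.0 ** (n * rate)


if __name__ == "__main__":
    n, rate = 20, 0.5  # hypothetical block length and rate in bits per symbol
    print(log2_transmission_messages(n, rate))    # 10.0, i.e. 2**10 messages
    print(log2_identification_messages(n, rate))  # 1024.0, i.e. 2**1024 messages
```

With these assumed values, transmission distinguishes 2^10 messages, whereas identification distinguishes 2^1024 messages for the same block length.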

(29) A corresponding strong converse result has been found by T. S. Han and S. Verdu, “New results in the theory of identification via channels”, IEEE Transactions on Information Theory, vol. 38, no. 1, pp. 14 to 25, 1992, [3].

(30) The further development of the concept of identification used for and applied to a model of storage for identification can be further motivated by having a look at possible use cases taken from the variety of applications one can think of in the context of the present invention: storing private information of users in public data clouds by a service provider; storage of data in industrial applications on databases, for example industrial data (production data, operating data, etc.) from the operation of complex equipment, with subsequent identification of errors or exceptional situations; and storage of large amounts of data in public space (for example video surveillance) for the later identification of persons or dangerous situations, whereby on the one hand certain persons/events should be identified and on the other hand the privacy of other persons should be protected.

(31) In addition, there exists a variety of execution examples, one of which is given by the following scheme: Data is stored on a database by an authority. When identifying persons and events, the police may access the database. The police try to identify the specific person k. This task can be authorized by judges. Then the police must use a decoder D.sub.k for person k for identification. The police only get the answer yes or no. Furthermore, decoder D.sub.k cannot be used to identify another person because the decoder always provides a wrong answer for another person.

(32) In this context two different models are considered:

1. Secure storage on a public database for identification: In particular, a physical unclonable function (PUF) source is involved. A PUF source is essentially equivalent to a biometric source. We assume that the output of a biometric source uniquely characterizes a person whereas the output of a PUF source uniquely characterizes a device. This allows us to use the output of a PUF source for secure storage. The secure storage for identification process depicted in FIG. 3 consists of two phases. In the first phase the system gets the message d that should be stored on the database consisting of k storage cells, each of which can store a value from the alphabet 𝒰. The system reads X.sup.n from the PUF source. The system then generates U.sup.k, i.e. the item to be stored for the message d, from X.sup.n using an encoder depending on d and stores U.sup.k on the public database. In the second phase the system reads U.sup.k from the database and Y.sup.n from the PUF source. The system then uses a decoder, depending on the message d of interest, in order to decide whether d is a message stored on the underlying database, making use of Y.sup.n and U.sup.k.

2. Storage for identification model with two sources: The secure storage for identification process depicted in FIG. 4 is considered. The process consists of two phases. In the first phase the system gets the message d that should be stored on the database consisting of k storage cells, each of which can store a value from the alphabet 𝒰. We use α for timesharing between the PUF source and the public source. The system reads X.sub.1.sup.⌈αn⌉ from the public source and X.sub.2.sup.n−⌈αn⌉ from the PUF source. The system then generates U.sup.k from (X.sub.1.sup.⌈αn⌉,X.sub.2.sup.n−⌈αn⌉) using an encoder depending on d and stores U.sup.k on the public database. In the second phase the system reads U.sup.k from the database, Y.sub.1.sup.⌈αn⌉ from the public source and Y.sub.2.sup.n−⌈αn⌉ from the PUF source. The system then uses a decoder, depending on the message d of interest, in order to decide whether d is a message stored on the database, making use of (Y.sub.1.sup.⌈αn⌉,Y.sub.2.sup.n−⌈αn⌉) and U.sup.k.
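The two phases of the single-source model can be illustrated by the following minimal Python sketch. It is a toy under stated assumptions and not the encoder or decoder of the present disclosure: the key derivation by blockwise majority vote, the use of SHA-256 as a stand-in for the mapping T.sub.d, bitwise XOR as the group composition, and all names and parameters (REP, CR_BITS, TAG_BITS, extract, reproduce, tag, encode, identify) are hypothetical choices made only for illustration.

```python
import hashlib
import random

REP = 5        # repetition factor of the toy error correction (assumption)
CR_BITS = 16   # length of the common randomness K (assumption)
TAG_BITS = 32  # length of the tag T_d(K) and of the secret key K' (assumption)
N = REP * (CR_BITS + TAG_BITS)  # symbols read from the source per stored item


def majority(bits):
    """Majority vote over consecutive blocks of REP bits."""
    return [int(sum(bits[i:i + REP]) > REP // 2) for i in range(0, len(bits), REP)]


def extract(x):
    """Encoder side: derive a key and helper data from a source readout x."""
    key = majority(x)
    helper = [xi ^ key[i // REP] for i, xi in enumerate(x)]
    return key, helper


def reproduce(y, helper):
    """Decoder side: re-derive the key from a noisy readout y and the helper data."""
    return majority([yi ^ hi for yi, hi in zip(y, helper)])


def tag(d: bytes, k):
    """Toy stand-in for T_d: a d-dependent hash of K, truncated to TAG_BITS bits."""
    digest = hashlib.sha256(d + b"|" + bytes(k)).digest()
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(TAG_BITS)]


def encode(d: bytes, x):
    """Phase 1: build the stored item (M, M', T_d(K) xor K') from d and x."""
    k, m = extract(x[:REP * CR_BITS])
    k_sec, m_sec = extract(x[REP * CR_BITS:])
    cipher = [t ^ s for t, s in zip(tag(d, k), k_sec)]
    return m, m_sec, cipher


def identify(d: bytes, stored, y) -> bool:
    """Phase 2: decide whether d is the stored message, using a fresh readout y."""
    m, m_sec, cipher = stored
    k_hat = reproduce(y[:REP * CR_BITS], m)
    k_sec_hat = reproduce(y[REP * CR_BITS:], m_sec)
    recovered = [c ^ s for c, s in zip(cipher, k_sec_hat)]
    return recovered == tag(d, k_hat)


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.randint(0, 1) for _ in range(N)]
    y = list(x)
    for i in range(0, N, REP):  # one flipped bit per block models readout noise
        y[i] ^= 1
    stored = encode(b"message-7", x)
    print(identify(b"message-7", stored, y))  # True: the stored message is confirmed
    print(identify(b"message-8", stored, y))  # False, barring a 2**-32 tag collision
```

In this sketch the stored item consists of the two helper parts and the encrypted tag only, so the decision for a message of interest requires a fresh readout of the source. No security claim is made for this toy construction.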
Further Technical Background

(33) In the following, additional technical background for better understanding the present invention's gist and its differences when compared to common strategies of transmitting, writing and/or storing storage items or messages is summarized:

(34) Storing is traditionally only performed in the Shannon picture. Here, all messages are stored so that exponentially many messages can be stored. When reading the memory contents, the question is answered which message was stored.

(35) R. Ahlswede and I. Csiszar, “Common randomness in information theory and cryptography. II. CR capacity”, IEEE Transactions on Information Theory, vol. 44, no. 1, pp. 225 to 240, 1998, [4], defined the so-called source model for generating common randomness.

(36) Common randomness plays an important role for identification. In addition to R. Ahlswede and I. Csiszar [4], it is described by R. Ahlswede and V. B. Balakirsky, “Identification under random processes”, Citeseer, 1995, [5], how to make use of the common randomness generated from the source to reliably identify a message by sending a helper message over a channel. Here the number of messages that can reliably be identified grows doubly exponentially with the number of symbols read from the source.

(37) Security is a key requirement for modern communication and storage systems. A promising approach to realize security is physical layer security based on information theoretic security.

(38) A basic model considered in information theoretic security is the wiretap channel as discussed by A. D. Wyner, “The wire-tap channel”, Bell Labs Technical Journal, vol. 54, no. 8, pp. 1355 to 1387, 1975, [6], and I. Csiszar and J. Korner, “Broadcast channels with confidential messages”, IEEE transactions on information theory, vol. 24, no. 3, pp. 339-348, 1978, [7].

(39) In this background context and in contrast to point-to-point transmissions it is preferably assumed in the context of the present invention that an attacker or eavesdropper has access to the message sent via an additional discrete memoryless channel. In particular, the present invention is concerned with protocols which allow for reliable communication between the legitimate users while making it hard for an attacker to decode the message from the channel output he has access to. The number of messages that can reliably and securely be transmitted in this scenario grows exponentially with the block length.

(40) In R. Ahlswede and Z. Zhang, “New directions in the theory of identification via channels”, IEEE Transactions on Information Theory, vol. 41, no. 4, pp. 1040 to 1050, 1995, [8], identification for the wiretap channel is considered. It can be shown that the number of messages that can reliably be identified as described above in this case grows doubly exponentially with the block length. The secure identification capacity even equals the Shannon capacity of the main channel. This result can be generalized: in H. Boche and C. Deppe, “Secure identification for wiretap channels; robustness, super-additivity and continuity”, IEEE Transactions on Information Forensics and Security, 2018, [9], and “Secure identification under jamming attacks”, in Information Forensics and Security (WIFS), 2017 IEEE Workshop on. IEEE, 2017, pp. 1 to 6, [10], robust identification for wiretap channels is considered.

(41) For the source model one can also consider secret key generation as indicated by R. Ahlswede and I. Csiszar, “Common randomness in information theory and cryptography, Part I: Secret sharing”, IEEE Transactions on Information Theory, vol. 39, no. 4, 1993, [11].

(42) T. Ignatenko and F. M. Willems, “Biometric security from an information theoretical perspective”, Now, 2012, [12], and L. Lai, S.-W. Ho, and H. V. Poor, “Privacy security trade offs in biometric security systems”, in Communication, Control, and Computing, 2008, 46.sup.th Annual Allerton Conference on IEEE, 2008, pp. 268 to 273, [13], interpret the discrete memoryless source from the source model as a biometric source and they consider the privacy leakage of the protocols for secret key generation.

(43) Some results concerning common randomness and secret key generation from a discrete memoryless multiple source are essential for the present invention. In the following, common randomness is also referred to as CR, a secret key is also referred to as SK, and a discrete memoryless multiple source is also referred to as DMMS.

(44) In the following, particular information theoretic entities and requirements for defining the present invention will be motivated, introduced and defined:

(45) First of all, in the context of the present invention inter alia the following information theoretic model is considered:

(46) Definition 1.

(47) Let n∈ℕ be a natural number. The source model consists of a discrete memoryless multiple source (DMMS) P.sub.XY, a (possibly randomized) encoder F: 𝒳.sup.n→𝒦×ℳ and a (possibly randomized) decoder G: 𝒴.sup.n×ℳ→𝒦. Let X.sup.n and Y.sup.n be the output of the DMMS. The random variables or RVs (K,M) are generated from X.sup.n using F and the random variable {circumflex over (K)} is generated from (Y.sup.n,M) using G. We call (F,G) a common randomness/secret key or CR/SK generation protocol.

(48) In addition, the generation of common randomness or CR as introduced above is considered.

(49) Definition 2.

(50) Let L≥0. The item R(L)≥0 is called an achievable common randomness or CR generation rate with forward communication rate constraint L for the source model if for every δ>0 there is an n.sub.0=n.sub.0(δ) such that for all n≥n.sub.0 there is a common randomness/secret key or CR/SK generation protocol such that the relations

(51) $\begin{matrix} \frac{1}{n} \log |\mathcal{M}| \leq L + \delta, \quad \Pr(K = \hat{K}) \geq 1 - \delta, \quad \frac{1}{n} H(K) \geq R - \delta & (1) \\ \frac{1}{n} \log |\mathcal{K}| \leq c & (2) \end{matrix}$
are fulfilled for a c>0. The corresponding CR/SK generation protocols are called common randomness or CR generation protocols with rate constraint. The supremum of all achievable CR generation rates with forward communication rate constraint L is denoted by CR capacity C.sub.CR(L).
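The quantities in this definition can be made tangible with a small simulation. The following Python sketch is a toy under stated assumptions, not a capacity-achieving protocol: it assumes a binary DMMS in which Y differs from X with probability p, and a hypothetical protocol (encoder_f, decoder_g) based on blockwise majority voting. It estimates Pr(K = {circumflex over (K)}) empirically and reports H(K)/n.

```python
import random

REP = 5       # source symbols spent per key bit (assumption)
KEY_BITS = 8  # number of key bits generated per protocol run (assumption)
N = REP * KEY_BITS


def sample_dmms(n, p, rng):
    """Draw (X^n, Y^n) i.i.d.: X uniform on {0,1}, Y equals X flipped with probability p."""
    x = [rng.randint(0, 1) for _ in range(n)]
    y = [xi ^ (rng.random() < p) for xi in x]
    return x, y


def encoder_f(x):
    """F: X^n -> (K, M). K is the blockwise majority of x, M marks deviations from it."""
    k = [int(sum(x[i:i + REP]) > REP // 2) for i in range(0, len(x), REP)]
    m = [xi ^ k[i // REP] for i, xi in enumerate(x)]
    return k, m


def decoder_g(y, m):
    """G: (Y^n, M) -> K_hat by blockwise majority vote over y xor m."""
    z = [yi ^ mi for yi, mi in zip(y, m)]
    return [int(sum(z[i:i + REP]) > REP // 2) for i in range(0, len(z), REP)]


def agreement_rate(p, trials, seed=0):
    """Empirical estimate of Pr(K = K_hat) over independent protocol runs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = sample_dmms(N, p, rng)
        k, m = encoder_f(x)
        hits += decoder_g(y, m) == k
    return hits / trials


if __name__ == "__main__":
    # K is uniform on KEY_BITS bits for a uniform source, so H(K)/n = KEY_BITS/N.
    print("H(K)/n =", KEY_BITS / N)
    print("Pr(K = K_hat) ~", agreement_rate(p=0.02, trials=2000))
```

The decoder errs only if three or more of the five symbols of a block are flipped, so the agreement probability approaches one for small p, at the price of a low CR rate of one key bit per five source symbols.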

(52) Remark 1.

(53) The last achievability requirement (2) is required in order to avoid protocols where the CR is generated deterministically while H(K) is arbitrarily large. It can be motivated to require $\hat{\mathcal{K}} = \mathcal{K}$, i.e. that K and {circumflex over (K)} take values in the same set, by arguing that, together with the bound on |𝒦|, this implies an arbitrarily small distance between $\frac{1}{n} H(\hat{K})$ and $\frac{1}{n} H(K)$. So $\hat{\mathcal{K}} = \mathcal{K}$ is required.

(56) Remark 2.

(57) It can be seen that for each CR generation protocol with rate constraint one can find a CR generation protocol with rate constraint such that

(58) $\sum_{k \in \mathcal{K}} \left| P_{K}(k) - \frac{1}{|\mathcal{K}|} \right| < \exp(-nc)$
is valid for a c>0. That is why in the following one can always consider such protocols where the distribution of the common randomness CR is in this sense near the uniform distribution.

(59) In Ahlswede and Csiszar 1998 C.sub.CR(L) has been further characterized.

(60) Also privacy leakage for the source model is considered. This makes sense when one assumes that the DMMS, that is part of the source model, models a PUF source.

(61) Definition 3.

(62) A triple (R.sub.CR,R.sub.FC,R.sub.PL), R.sub.CR,R.sub.FC,R.sub.PL≥0 is called an achievable CR generation rate versus forward communication rate versus privacy leakage rate triple for the source model if for every δ>0 there is an n.sub.0=n.sub.0(δ) such that for all n≥n.sub.0 there is a CR/SK generation protocol such that the relations

(63) $\begin{matrix} \Pr(K = \hat{K}) \geq 1 - \delta \\ \frac{1}{n} \log |\mathcal{M}| \leq R_{FC} + \delta \\ \frac{1}{n} I(M; X^{n}) \leq R_{PL} + \delta \\ \frac{1}{n} H(K) \geq R_{CR} - \delta \\ \frac{1}{n} \log |\mathcal{K}| \leq c \end{matrix}$
are fulfilled for a c>0. The corresponding CR/SK generation protocols are referred to as private CR generation protocols. The set of all rate triples that are achievable using private CR generation protocols is referred to as the CR capacity region ℛ.sub.CR.

(64) In the context of the present invention, one is interested in ℛ.sub.CR. In a first approach one considers private CR generation protocols with deterministic encoders and decoders (f, g).

(65) The corresponding CR capacity region is denoted by ℛ.sub.CR.sup.d.
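For a very small deterministic encoder the quantities entering a rate triple can be computed exactly by enumeration. The following Python sketch is a toy under stated assumptions: X.sup.n is assumed uniform on {0,1}.sup.n with n = 5, and the hypothetical encoder takes K as the majority bit of x and M as x with K added bitwise. The names encoder, entropy and analyse are illustrative only.

```python
from collections import Counter
from itertools import product
from math import log2


def encoder(x):
    """Toy deterministic encoder f: K is the majority bit of x, M = x xor (K, ..., K)."""
    k = int(sum(x) > len(x) // 2)
    return k, tuple(xi ^ k for xi in x)


def entropy(samples):
    """Entropy in bits of the empirical distribution of equally likely samples."""
    total = len(samples)
    return -sum(c / total * log2(c / total) for c in Counter(samples).values())


def analyse(n=5):
    """Exact rates for X^n uniform on {0,1}^n (n odd) under the toy encoder."""
    pairs = [encoder(x) for x in product((0, 1), repeat=n)]
    h_k = entropy([k for k, _ in pairs])
    h_m = entropy([m for _, m in pairs])
    h_km = entropy(pairs)
    return {
        "cr_rate": h_k / n,                          # (1/n) H(K)
        "privacy_leakage_rate": h_m / n,             # (1/n) I(M; X^n) = (1/n) H(M)
        "key_helper_information": h_k + h_m - h_km,  # I(K; M)
    }


if __name__ == "__main__":
    print(analyse())
```

Because M is a function of X.sup.n, the privacy leakage I(M;X.sup.n) equals H(M). For n = 5 the helper M takes 16 equally likely values, so the sketch yields a CR rate of 1/5, a privacy leakage rate of 4/5, and I(K;M) = 0. The helper data reveals much about the source readout but nothing about the key.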

(66) In Ahlswede and Csiszar 1998, deterministic CR generation protocols with rate constraint have been considered and the corresponding capacity has been characterized, which is here referred to as C.sub.CR.sup.d(L).

(67) The following property is valid:

(68) Theorem 1.

(69) It holds that

(70) $C_{C R}^{d} (L) = \max_{V} I (V; X),$
where the maximization runs over all random variables V such that the Markov chain V−X−Y and the constraint I(V;X)−I(V;Y)≤L are fulfilled. One also only has to consider random variables V whose alphabet 𝒱 obeys |𝒱|≤|𝒳|.
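The optimization in Theorem 1 can be explored numerically for a simple source. The following Python sketch is an illustration under stated assumptions: the DMMS is assumed binary with X uniform and Y differing from X with probability p, and the search is restricted to auxiliary variables V obtained from X through a binary symmetric test channel with crossover probability q. Because of this restriction the returned value is in general only a lower bound on the maximum. The names h2 and cr_rate_lower_bound are hypothetical.

```python
from math import log2


def h2(q: float) -> float:
    """Binary entropy function in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)


def cr_rate_lower_bound(p: float, L: float, steps: int = 5000) -> float:
    """Largest I(V;X) found over binary symmetric V with I(V;X) - I(V;Y) <= L.

    Source assumption: X uniform on {0,1}, Y = X xor Bernoulli(p), V - X - Y.
    """
    best = 0.0  # a constant V is always feasible and gives I(V;X) = 0
    for i in range(steps + 1):
        q = 0.5 * i / steps             # crossover probability between V and X
        pq = p * (1 - q) + q * (1 - p)  # resulting crossover between V and Y
        i_vx = 1.0 - h2(q)
        i_vy = 1.0 - h2(pq)
        if i_vx - i_vy <= L + 1e-12:    # forward communication rate constraint
            best = max(best, i_vx)
    return best


if __name__ == "__main__":
    p = 0.11  # hypothetical crossover probability of the source
    for L in (0.0, 0.1, 0.2, h2(p)):
        print(f"L = {L:.4f}  ->  I(V;X) = {cr_rate_lower_bound(p, L):.4f}")
```

Under these assumptions the value rises from 0 at L = 0 to H(X) = 1 bit once L reaches H(X|Y) = h2(p), at which point V = X becomes feasible.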

(71) One also considers secret key generation with perfect secrecy.

(72) Definition 4.

(73) The item R≥0 is called an achievable SK generation rate for the source model if for every δ>0 there is an n.sub.0=n.sub.0(δ) such that for all n≥n.sub.0 there is a CR/SK generation protocol such that the relations

(74) $\Pr (K = \hat{K}) \geq 1 - δ$ $I (K; M) = 0$ $H (K) = \log |𝒦|$ $\frac{1}{n} \log |𝒦| \geq R - δ$
are fulfilled. The corresponding CR/SK generation protocols are denoted as perfect SK generation protocols. One refers to the supremum of all achievable SK generation rates as the SK capacity C.sub.SK.

(75) The following result can be proven:

(76) Theorem 2.

(77) It holds that C.sub.SK=I(X;Y).
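
As a worked example for Theorem 2, the following short Python sketch evaluates I(X;Y) for a given joint distribution via I(X;Y)=H(X)+H(Y)−H(X,Y). The example distribution and the function names are assumptions chosen for illustration, and logarithms are taken to base 2, so the result is in bits per source symbol.

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def sk_capacity(p_xy):
    """Evaluate I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint pmf p_xy[x][y]."""
    p_x = [sum(row) for row in p_xy]
    p_y = [sum(col) for col in zip(*p_xy)]
    p_joint = [p for row in p_xy for p in row]
    return entropy(p_x) + entropy(p_y) - entropy(p_joint)

# Assumed example: X uniform on {0, 1}, Y equals X except with probability 0.1.
P_XY = [[0.45, 0.05], [0.05, 0.45]]
print(round(sk_capacity(P_XY), 3))
```

For this assumed source the value is approximately 0.531 bits per symbol, that is, one minus the binary entropy of 0.1.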

(78) Remark 3.

(79) In the achievability proof one can use a deterministic encoder and decoder. This implies the relation

(80) $\frac{1}{n}$
log| custom character |≤log||.
Limitations of the Current State of the Art

(81) The section describing the state of the art has described the storage of exponential amounts of data according to the Shannon picture as used today. In Shannon's picture of storage, big data is a huge problem: the gap between the data rate generated by big data and the storage capacity that Moore's law provides for the development of storage media continues to widen. For storage for later identification, this problem does not exist.

(82) One considers the source model for generating common randomness. But in contrast to Ahlswede and Csiszar 1998 one may also consider privacy leakage of the corresponding protocols while interpreting the source as a biometric source. One can then use common randomness for identification.

(83) Therefore, the invention's contribution is, inter alia, twofold, as set out in the following.

(84) The capacity for common randomness generation is characterized from a discrete memoryless source while considering privacy leakage. Protocols for identification using a discrete memoryless source are constructed. In contrast to Ahlswede and Csiszar 1998 and Ahlswede and Balakirsky 1995 it is assumed in the context of the present invention that a helper message is stored on a public database.

(85) The protocols for identification are constructed such that they provide secrecy. So these protocols allow for secure storage for identification. The present invention may also consider the privacy leakage of these protocols.

(86) Aspects of the Present Invention

(87) The present invention is inter alia based on the presentation of a model for secure storage for identification and corresponding protocols.

(88) In the following, an information theoretic model of the storage process for identification underlying the present invention is defined.

(89) Definition 5.

(90) Let k, n∈ℕ. The storage for identification model consists of the alphabet 𝒰, a discrete memoryless multiple source (DMMS) P.sub.XY on the alphabet 𝒳×𝒴, a set of (possibly randomized) encoders {Φ.sub.d}.sub.d∈𝒟, Φ.sub.d: 𝒳.sup.n→𝒰.sup.k, and a set of (possibly randomized) decoders {Ψ.sub.d}.sub.d∈𝒟, Ψ.sub.d: 𝒰.sup.k×𝒴.sup.n→{0,1}, for all d∈𝒟. Let X.sup.n and Y.sup.n be the random variables (RVs) generated from P.sub.XY. We call ({Φ.sub.d}.sub.d∈𝒟, {Ψ.sub.d}.sub.d∈𝒟) a storage for identification protocol.

(91) Assume that for each storage cell we read B>0 symbols from the PUF source. Now properties of intuitively good storage for identification protocols are discussed.

(92) When the decoder Ψ.sub.d is interested in the message d it is reasonable to require that when d is stored on the database the decoder Ψ.sub.d decides correctly with high probability. One refers to the corresponding error as an error of the first kind. So the probability that the decoder makes an error of the first kind should be small.

(93) When the message stored on the database is not d the decoder Ψ.sub.d should also decide correctly with high probability. We call the corresponding error an error of the second kind. So the probability that the decoder Ψ.sub.d makes an error of the second kind should be small.

(94) One is interested in the largest possible identification rate, where one considers the number of storage cells as a resource. As usual for identification one considers the second order rate.

(95) One considers an eavesdropper 20 who reads from the public database 10. It is assumed that the eavesdropper 20 wants to identify a specific message. The eavesdropper 20 knows the protocol used and one can even assume that the eavesdropper 20 knows the message the decoder wants to identify. It is desired that the sum of the probability that the eavesdropper makes an error of the first kind and the probability that the eavesdropper makes an error of the second kind is close to one.

(96) The output of the PUF source uniquely characterizes a device, so one possibly wants to reuse parts of it. That is why one wants that the attacker does not have a lot of information about the PUF source output X.sup.n.

(97) This motivates the following definition of achievability for the storage for identification model.

(98) Definition 6.

(99) Let B>0. The tuple (R.sub.ID,R.sub.PL), R.sub.ID,R.sub.PL≥0, is called an achievable rate pair for the storage for identification model if for every δ>0 there is a k.sub.0=k.sub.0(δ) such that for all k≥k.sub.0 and n=┌B·k┐ there exists a storage for identification protocol such that for all d, d̄∈𝒟, d≠d̄, the following relations

(100) $\Pr (Ψ_{d} (Φ_{d} (X^{n}), Y^{n}) = 0) \leq δ$ $\Pr (Ψ_{d} (Φ_{\overline{d}} (X^{n}), Y^{n}) = 1) \leq δ$ $\Pr (Ψ_{d}^{E} (Φ_{d} (X^{n})) = 0) + \Pr (Ψ_{d}^{E} (Φ_{\overline{d}} (X^{n})) = 1) \geq 1 - δ$ $\frac{1}{k} \log \log |𝒟| \geq R_{ID} - δ$ $\frac{1}{k} I (Φ_{d} (X^{n}); X^{n}) \leq R_{PL} + δ$
are fulfilled for all decoding strategies {Ψ.sub.d.sup.E}.sub.d∈𝒟 of an eavesdropper 20.

(101) The first item describes decoder errors of the first kind, the second item describes decoder errors of the second kind, the third item describes the property of the model in view of an eavesdropper 20, the fourth item describes the increase of manageable storage items in the model with its double exponential growth or increase, the fifth item describes the model's privacy leakage properties.
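
The double exponential growth described by the fourth item can be made concrete with a small Python sketch. It compares the number of items that can be stored and fully reconstructed in k cells with the number of items that can be identified at a given second order rate. The function names and the choice of base-2 logarithms are assumptions made for illustration.

```python
import math

def storable_messages(k, alphabet_size):
    """Storage with reconstruction: at most |U|^k distinct words fit in k cells."""
    return alphabet_size ** k

def identifiable_messages(k, r_id):
    """Identification: log2 log2 |D| = k * r_id, i.e. |D| = 2^(2^(k * r_id))."""
    return 2 ** round(2 ** (k * r_id))

def second_order_rate(num_messages, k):
    """Second order rate (1/k) log2 log2 |D| as used in Definition 6."""
    return math.log2(math.log2(num_messages)) / k

# Assumed example: k = 10 binary storage cells and a second order rate of 1.
k = 10
print(storable_messages(k, 2))                   # 2^10 reconstructable items
print(identifiable_messages(k, 1.0).bit_length())  # about 2^10 bits are needed just to write |D|
```

With these assumed numbers, reconstruction allows 2.sup.10 items, whereas identification at second order rate 1 allows 2 to the power 2.sup.10 items.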

(102) The corresponding storage for identification protocols are referred to as secure storage protocols. We call the set of all rate pairs that are achievable using such storage for identification protocols the capacity region ℛ.sub.ID(B).

(103) Remark 4.

(104) Requirement (3)—the third item as given above in definition 6—ensures that the protocols are optimal considering security in the following sense. There are decoding strategies for the eavesdropper such that the sum of the probability that the eavesdropper makes an error of the first kind and the probability that he makes an error of the second kind is 1, while the eavesdropper does not use any of his observations from the public database.
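
The statement of Remark 4 can be checked with a short Python sketch of an eavesdropper who ignores the database entirely. The strategy shown (answering "stored" with a fixed probability) and the function name are assumptions used only to illustrate why an error sum of 1 is always attainable without any observation.

```python
from fractions import Fraction

def blind_eavesdropper_errors(p_yes):
    """Error probabilities of an eavesdropper who ignores all observations.

    The eavesdropper answers "d is stored" with probability p_yes,
    independently of what is written on the public database.
    """
    first_kind = 1 - p_yes   # answers "not stored" although d is stored
    second_kind = p_yes      # answers "stored" although another item is stored
    return first_kind, second_kind

for p in (Fraction(0), Fraction(1, 3), Fraction(1)):
    e1, e2 = blind_eavesdropper_errors(p)
    print(p, e1, e2, e1 + e2)
```

For every choice of the answering probability the two error probabilities add up to exactly 1, so the third relation of Definition 6 demands that reading the database gives the eavesdropper essentially no advantage over such blind guessing.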

(105) Remark 5.

(106) The secret model chosen from Ignatenko and Willems 2012 can be interpreted as a model for secure storage making use of a biometric source. But here the decoder reconstructs the message stored on the database instead of identifying it. Correspondingly, the set of messages that can be stored on the database grows exponentially with the block length, instead of doubly exponentially.

(107) The following observation concerning the capacity region ℛ.sub.ID(B) can be derived:

(108) Lemma 1.

(109) Let B>0. ℛ.sub.ID(B) is a closed set.

(110) One may use Theorem 1 obtained from Ahlswede and Csiszar 1998 in order to characterize ℛ.sub.CR.sup.d.

(111) Theorem 3.

(112) It holds that

(113) $ℛ_{CR}^{d} = \bigcup_{V : V - X - Y} \{(R_{CR}, R_{FC}, R_{PL}) : 0 \leq R_{CR} \leq I (V; X), \; R_{FC} \geq I (V; X | Y), \; R_{PL} \geq I (V; X | Y)\}$
and one only has to consider random variables V fulfilling |𝒱|≤|𝒳|+1.

(114) Now one considers CR generation with randomized private CR generation protocols.

(115) Theorem 4.

(116) It holds that

(117) $ℛ_{CR} = \bigcup_{V : V - X - Y} \{(R_{CR}, R_{FC}, R_{PL}) : 0 \leq R_{CR} \leq R_{FC} + I (V; Y), \; R_{FC} \geq I (V; X | Y), \; R_{PL} \geq I (V; X | Y)\}$
and one only has to consider random variables V fulfilling |𝒱|≤|𝒳|+1.

(118) Now ℛ.sub.ID(B) is characterized. In order to do so one makes use of results for CR and SK generation while considering the privacy leakage. Firstly, consider deterministic secure storage for identification protocols ({Φ.sub.d}.sub.d∈𝒟, {Ψ.sub.d}.sub.d∈𝒟). One denotes the corresponding capacity region by ℛ.sub.ID.sup.d(B) and obtains the following achievability result.

(119) Theorem 5.

(120) It holds that

(121) $ℛ_{ID}^{d} (B) \supseteq \bigcup_{V} \{(R_{ID}, R_{PL}) : 0 \leq R_{ID} \leq I (V; X) B, \; R_{PL} \geq I (V; X | Y) B\},$
wherein the union is taken over all random variables V fulfilling V−X−Y and I(V;X|Y)B≤log|𝒰|.

(122) Now randomized secure storage for identification protocols are considered.

(123) Theorem 6.

(124) It holds that

(125) $ℛ_{ID} (B) \supseteq \bigcup_{ϵ > 0} \bigcup_{V} \{(R_{ID}, R_{PL}) : 0 \leq R_{ID} \leq \log |𝒰| + I (V; Y) B, \; R_{PL} \geq I (V; X | Y) B\},$
wherein the union is taken over all random variables V such that V−X−Y and I(V;X|Y)B≤log|𝒰|−ϵB.
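
The inner bounds of Theorems 5 and 6 can be compared for one fixed choice of V with the following Python sketch. It takes I(V;X) and I(V;Y) as inputs, uses I(V;X|Y)=I(V;X)−I(V;Y), which holds because V−X−Y is a Markov chain, and returns the corner point of each bound when the respective storage constraint is met. The function names, the example source, the choice V=X, the value of ϵ and the use of base-2 logarithms are assumptions made for illustration.

```python
import math

def binary_entropy(p):
    """Binary entropy h(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def corner_points(i_vx, i_vy, b, alphabet_size, eps=0.01):
    """Corner points (R_ID, R_PL) of both inner bounds for one V with V - X - Y.

    i_vx, i_vy:    I(V;X) and I(V;Y) in bits
    b:             number B of source symbols read per storage cell
    alphabet_size: size |U| of the storage alphabet
    eps:           slack of Theorem 6 (must be positive there)
    An entry is None if the corresponding storage constraint is violated.
    """
    i_vx_given_y = i_vx - i_vy            # valid under the Markov chain V - X - Y
    log_u = math.log2(alphabet_size)
    r_pl = i_vx_given_y * b
    deterministic = (i_vx * b, r_pl) if r_pl <= log_u else None
    randomized = (log_u + i_vy * b, r_pl) if r_pl <= log_u - eps * b else None
    return {"deterministic": deterministic, "randomized": randomized}

# Assumed example: X uniform, Y = X flipped with probability 0.1, and V = X,
# so that I(V;X) = H(X) = 1 bit and I(V;Y) = 1 - h(0.1).
points = corner_points(1.0, 1.0 - binary_entropy(0.1), b=2, alphabet_size=2)
print(points)
```

Whenever the constraint I(V;X|Y)B≤log|𝒰| is met, log|𝒰|+I(V;Y)B is at least I(V;X)B, so for the same V the randomized bound on R.sub.ID is never smaller than the deterministic one.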

(126) In the following, an information theoretic model of a storage process for identification with two sources 30, 40 is defined.

(127) Definition 7.

(128) Let k, n∈ℕ, 𝒟 a finite set and 1≥α≥0. The two source storage for identification model consists of the alphabet 𝒰, two discrete memoryless multiple sources (DMMSs) P.sub.X.sub.1.sub.Y.sub.1 and P.sub.X.sub.2.sub.Y.sub.2 on the alphabets 𝒳.sub.1×𝒴.sub.1 and 𝒳.sub.2×𝒴.sub.2 respectively, a set of (possibly randomized) encoders {Φ.sub.d}.sub.d∈𝒟, Φ.sub.d: 𝒳.sub.1.sup.┌αn┐×𝒳.sub.2.sup.n−┌αn┐→𝒰.sup.k for all d∈𝒟 and a set of (possibly randomized) decoders {Ψ.sub.d}.sub.d∈𝒟, Ψ.sub.d: 𝒰.sup.k×𝒴.sub.1.sup.┌αn┐×𝒴.sub.2.sup.n−┌αn┐→{0,1} for all d∈𝒟. Let X.sub.1.sup.┌αn┐ and Y.sub.1.sup.┌αn┐ be the random variables (RVs) generated from P.sub.X.sub.1.sub.Y.sub.1 and let X.sub.2.sup.n−┌αn┐ and Y.sub.2.sup.n−┌αn┐ be the RVs generated from P.sub.X.sub.2.sub.Y.sub.2. One defines X.sup.n=(X.sub.1.sup.┌αn┐,X.sub.2.sup.n−┌αn┐) and Y.sup.n=(Y.sub.1.sup.┌αn┐,Y.sub.2.sup.n−┌αn┐). The entity ({Φ.sub.d}.sub.d∈𝒟, {Ψ.sub.d}.sub.d∈𝒟, α) is referred to as a two source storage for identification protocol. In the following, properties that intuitively good storage for identification protocols should have are discussed.

(129) It is reasonable to require a small probability that an error of the first kind occurs when using the decoder Ψ.sub.d for the message or storage item d to find out whether or not the message or storage item d is stored on the database 10. One also desires that an error of the second kind occurs with a small probability. One considers an eavesdropper 20 who reads from the public database 10 and who wants to find out whether or not message d is stored on the database 10. The eavesdropper 20 also has access to the public source 30. It is desired that the sum of the probability that the eavesdropper 20 makes an error of the first kind and the probability that the eavesdropper 20 makes an error of the second kind is close to 1.

(130) One is interested in the largest possible identification rate, where one considers the number of storage cells as a resource. One considers a fixed ratio B of the number of symbols read from the two sources and the number of storage cells in the database 10.

(131) The output of the PUF source uniquely characterizes a device, so one possibly wants to reuse parts of it. That is why one desires that the attacker 20 does not have a lot of information about the PUF source output X.sub.2.sup.n−┌αn┐.

(132) This motivates the following definition of achievability for the storage for identification model.

(133) Definition 8.

(134) Let B>0. We call the tuple (R.sub.ID,R.sub.PL), R.sub.ID,R.sub.PL≥0, an achievable rate pair for the storage for identification model if for every δ>0 there is a k.sub.0=k.sub.0(δ) such that for all k≥k.sub.0 and n=┌B·k┐ there exists a storage for identification protocol such that for all d, d̄∈𝒟, d≠d̄, the following relations

(135) $\Pr (Ψ_{d} (Φ_{d} (X^{n}), Y^{n}) = 0) \leq δ$ $\Pr (Ψ_{d} (Φ_{\overline{d}} (X^{n}), Y^{n}) = 1) \leq δ$ $\Pr (Ψ_{d}^{E} (Φ_{d} (X^{n}), X_{1}^{\lceil α n \rceil}, Y_{1}^{\lceil α n \rceil}) = 0) + \Pr (Ψ_{d}^{E} (Φ_{\overline{d}} (X^{n}), X_{1}^{\lceil α n \rceil}, Y_{1}^{\lceil α n \rceil}) = 1) \geq 1 - δ$ $\frac{1}{k} \log |𝒟| \geq R_{ID} - δ$ $\frac{1}{k} I (Φ_{d} (X^{n}), X_{1}^{\lceil α n \rceil}, Y_{1}^{\lceil α n \rceil}; X_{2}^{n - \lceil α n \rceil}) \leq R_{PL} + δ$
are fulfilled for all decoding strategies {Ψ.sub.d.sup.E}.sub.d∈𝒟 of an eavesdropper 20. The set of all rate pairs that are achievable using such storage for identification protocols is referred to as the capacity region ℛ.sub.ID,2(B).
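
The block lengths used in Definitions 7 and 8 can be computed as in the following Python sketch: n=┌B·k┐ source symbols are read in total, of which the first ┌αn┐ come from the first source and the remaining n−┌αn┐ from the second source. The function names are hypothetical, and exact rational arithmetic is an implementation choice made here so that the ceiling is not affected by floating point rounding.

```python
from fractions import Fraction
from math import ceil

def block_lengths(k, b, alpha):
    """Return (n, n1, n2) with n = ceil(b*k), n1 = ceil(alpha*n), n2 = n - n1."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie in [0, 1]")
    n = ceil(Fraction(b) * k)
    n1 = ceil(Fraction(alpha) * n)
    return n, n1, n - n1

def split_source_block(symbols, alpha):
    """Split a block of n symbols into the parts attributed to source 1 and source 2."""
    n1 = ceil(Fraction(alpha) * len(symbols))
    return symbols[:n1], symbols[n1:]

# Assumed example: k = 4 storage cells, B = 5/2 symbols per cell, alpha = 3/10.
print(block_lengths(4, Fraction(5, 2), Fraction(3, 10)))
print(split_source_block(list(range(10)), Fraction(3, 10)))
```

The boundary cases α=0 and α=1 correspond to using only the second or only the first source, respectively.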

(136) The considerations on CR generation may be extended by adding a second source.

(137) Definition 9.

(138) Let n∈ℕ and let 1≥α≥0. A two source model consists of two discrete memoryless multiple sources (DMMSs) P.sub.X.sub.1.sub.Y.sub.1 and P.sub.X.sub.2.sub.Y.sub.2 on the alphabets 𝒳.sub.1×𝒴.sub.1 and 𝒳.sub.2×𝒴.sub.2 respectively, a (possibly randomized) encoder F: 𝒳.sup.n→𝒦×ℳ and a (possibly randomized) decoder G: 𝒴.sup.n×ℳ→𝒦. Let X.sub.1.sup.┌αn┐ and Y.sub.1.sup.┌αn┐ be the random variables (RVs) generated from P.sub.X.sub.1.sub.Y.sub.1 and let X.sub.2.sup.n−┌αn┐ and Y.sub.2.sup.n−┌αn┐ be the RVs generated from P.sub.X.sub.2.sub.Y.sub.2. One defines X.sup.n=(X.sub.1.sup.┌αn┐,X.sub.2.sup.n−┌αn┐) and Y.sup.n=(Y.sub.1.sup.┌αn┐,Y.sub.2.sup.n−┌αn┐). The RVs (K,M) are generated from X.sup.n using F and the RV K̂ is generated from (Y.sup.n,M) using G. One can call (F,G,α) a two source CR generation protocol.

(139) Inspired by the discussion on the achievability for the source model and the storage for identification model one can define achievability for the two source model.

(140) Definition 10.

(141) The triple (R.sub.CR,R.sub.FC,R.sub.PL), R.sub.CR,R.sub.FC,R.sub.PL≥0, is referred to as an achievable CR generation rate versus forward communication rate versus privacy leakage rate triple for the two source model if for every δ>0 there is an n.sub.0=n.sub.0(δ) such that for all n≥n.sub.0 there is a CR generation protocol such that the relations

(142) $\Pr (K = \hat{K}) \geq 1 - δ$ $\frac{1}{n} \log |ℳ| \leq R_{FC}$ $\frac{1}{n} I (M, X_{1}^{\lceil α n \rceil}, Y_{1}^{\lceil α n \rceil}; X_{2}^{n - \lceil α n \rceil}) \leq R_{PL} + δ$ $\frac{1}{n} H (K) \geq R_{CR} - δ$ $\sum_{k \in 𝒦} \left| P_{K} (k) - \frac{1}{|𝒦|} \right| < \exp (- n c),$
are fulfilled for a c>0. The set of all rate triples that are achievable using such CR generation protocols is referred to as the CR capacity region ℛ.sub.CR′.
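
The last relation of Definition 10 requires the common randomness K to be close to uniformly distributed. The following Python sketch shows one way to test this for a given distribution of K, reading the relation as a bound on the sum of the absolute deviations of P.sub.K from the uniform distribution on 𝒦. The function names and the example values of n and c are assumptions made for illustration.

```python
import math

def distance_from_uniform(p_k):
    """Sum over all k of |P_K(k) - 1/|K|| for a pmf given as a list."""
    size = len(p_k)
    return sum(abs(p - 1.0 / size) for p in p_k)

def is_nearly_uniform(p_k, n, c):
    """Check the uniformity relation for block length n and constant c > 0."""
    return distance_from_uniform(p_k) < math.exp(-n * c)

# Assumed examples on a key set of size 4 with n = 10 and c = 0.1.
print(is_nearly_uniform([0.25, 0.25, 0.25, 0.25], n=10, c=0.1))
print(is_nearly_uniform([0.5, 0.3, 0.2, 0.0], n=10, c=0.1))
```

Since the bound exp(−nc) decays exponentially in n, the admissible deviation from uniformity shrinks rapidly with the block length.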

(143) In addition to the foregoing description of the present invention, explicit reference is made, for additional disclosure, to the graphic representations of FIGS. 1 to 4.

LIST OF REFERENCES

(144) [1] C. E. Shannon, “A mathematical theory of communication”, The Bell System Technical Journal, vol. 27, no. 3, pp. 379 to 423, 1948
[2] R. Ahlswede and G. Dueck, “Identification via channels”, IEEE Transactions on Information Theory, vol. 35, no. 1, pp. 15 to 29, 1989
[3] T. S. Han and S. Verdu, “New results in the theory of identification via channels”, IEEE Transactions on Information Theory, vol. 38, no. 1, pp. 14 to 25, 1992
[4] R. Ahlswede and I. Csiszar
[5] R. Ahlswede and V. B. Balakirsky, “Identification under random processes”, Citeseer, 1995
[6] A. D. Wyner, “The wire-tap channel”, Bell Labs Technical Journal, vol. 54, no. 8, pp. 1355 to 1387, 1975
[7] I. Csiszar and J. Korner, “Broadcast channels with confidential messages”, IEEE Transactions on Information Theory, vol. 24, no. 3, pp. 339 to 348, 1978
[8] R. Ahlswede and Z. Zhang, “New directions in the theory of identification via channels”, IEEE Transactions on Information Theory, vol. 41, no. 4, pp. 1040 to 1050, 1995
[9] H. Boche and C. Deppe, “Secure identification for wiretap channels; robustness, super-additivity and continuity”, IEEE Transactions on Information Forensics and Security, 2018
[10] H. Boche and C. Deppe, “Secure identification under jamming attacks”, in Information Forensics and Security (WIFS), 2017 IEEE Workshop on. IEEE, 2017, pp. 1 to 68
[11] R. Ahlswede and I. Csiszar, “Common randomness in information theory and cryptography, Part I: secret sharing”, IEEE Transactions on Information Theory, vol. 39, no. 4, 1993
[12] T. Ignatenko and F. M. Willems, “Biometric security from an information theoretical perspective”, Now, 2012
[13] L. Lai, S.-W. Ho, and H. V. Poor, “Privacy security trade offs in biometric security systems”, in Communication, Control, and Computing, 2008, 46th Annual Allerton Conference on. IEEE, 2008, pp. 268 to 273
[14] I. Csiszar and J. Korner, “Information theory: coding theorems for discrete memoryless systems”, Cambridge University Press, 2011
[15] S. Verdu and V. K. Wei, “Explicit construction of optimal constant-weight codes for identification via channels”, IEEE Transactions on Information Theory, vol. 39, no. 1, 1993

LIST OF REFERENCE SIGNS AND ABBREVIATIONS

(145) 1 system, storage system
10 storage means, public data base
20 eavesdropper
30 public source
40 PUF source
B (incoming) storage item, bit sequence
B′ (randomized encoded) storage item
CR common randomness
𝒟 set of messages
d message, storage item (from the set of messages) written/stored in the storage means 10
d message, storage item (from the set of messages) to be identified
DMC discrete memoryless channel
DMMS discrete memoryless multiple source
K common randomness
Ḱ secret key
M helper data
Ḿ helper message
PUF physical unclonable function
P.sub.XY discrete memoryless multiple source
P.sub.X.sub.1.sub.Y.sub.1 discrete memoryless multiple source
P.sub.X.sub.2.sub.Y.sub.2 discrete memoryless multiple source
S operating method
S1 providing storage item
S2 encoding of storage item (based on randomization process)
S3 writing encoded storage item
S4 1.sup.st randomization process (encoder site)
S4′ 1.sup.st randomization process (decoder site)
S5 2.sup.nd randomization process (encoder site)
S5′ 2.sup.nd randomization process (decoder site)
S6 decoding storage item for identification (based on randomization process)
S7 identification process
S7′ control process (applied to process of outputting identification message S8)
S8 outputting identification message
SK secret key
T.sub.d mapping being characteristic for an identification protocol
U.sup.k encoded storage item/message (in particular written/stored to storage means 10)
𝒰 alphabet
X.sup.n source item, random variable
X.sub.1.sup.┌αn┐ source item, random variable
X.sub.2.sup.n−┌αn┐ source item, random variable
𝒳 alphabet
𝒳×𝒴 alphabet
𝒳.sub.1×𝒴.sub.1 alphabet
𝒳.sub.2×𝒴.sub.2 alphabet
Y.sup.n source item, random variable
Y.sub.1.sup.┌αn┐ source item, random variable
Y.sub.2.sup.n−┌αn┐ source item, random variable
𝒴 alphabet
Φ.sub.d encoder
Ψ.sub.d decoder


CPC classification: G06F3/0604 (Physics); H04L9/0866 (Electricity); G06F3/0655 (Physics); H04L9/14 (Electricity); G06F21/78 (Physics); G06F3/0673 (Physics)

International classification: G06F3/06 (Physics)
