IDENTIFICATION ESTIMATE RISK EVALUATION APPARATUS, IDENTIFICATION ESTIMATE RISK EVALUATION METHOD, AND PROGRAM
20220350924 · 2022-11-03
Assignee
Inventors
Cpc classification
G09C1/00
PHYSICS
International classification
G06F21/62
PHYSICS
Abstract
An identification-estimation risk assessment apparatus includes a first calculation unit for calculating a first value Σ.sub.γPr[Δ(t)=t′∘γ], which is a sum related to all shuffles of a probability Pr[Δ(t)=t′∘γ] that a table Δ(t) obtained by anonymizing the original table t through randomization Δ coincides with a table t′∘γ represented as a composite of a shuffle γ and the anonymized table t′; a second calculation unit for calculating a second value Σ.sub.γ(r)=r′Pr[Δ(t)=t′∘γ], which is a sum related to a shuffle that satisfies γ(r)=r′ of the probability Pr[Δ(t)=t′∘γ]; and a third calculation unit for calculating a third value Σ.sub.γ(r)=r′Pr[Δ(t)=t′∘γ]/Σ.sub.γPr[Δ(t)=t′∘γ] as a risk assessment value of a risk that a record number r is identified and estimated as corresponding to a record number r′, based on the first value and the second value.
Claims
1. An identification-estimation risk assessment apparatus that calculates a risk assessment value of a risk that a record number r is identified and estimated as corresponding to a record number r′ from an original table t, an anonymized table t′, a record number r∈[N] in the original table t, and a record number r′∈[N] in the anonymized table t′, wherein [N]={1, 2, . . . , N} (here, N is an integer of 1 or more) and a shuffle is a mapping of [N].fwdarw.[N], the apparatus comprising: processing circuitry configured to: execute a first calculation processing for calculating a first value Σ.sub.γPr[Δ(t)=t′∘γ], which is a sum related to all shuffles of a probability Pr[Δ(t)=t′∘γ] that a table Δ(t) obtained by anonymizing the original table t through randomization Δ coincides with a table t′∘γ represented as a composite of a shuffle γ and the anonymized table t′; a second calculation processing for calculating a second value Σ.sub.γ(r)=r′Pr[Δ(t)=t′∘γ], which is a sum related to a shuffle that satisfies γ(r)=r′ of the probability Pr[Δ(t)=t′∘γ]; and a third calculation processing for calculating, as the risk assessment value, a third value Σ.sub.γ(r)=r′Pr[Δ(t)=t′∘γ]/Σ.sub.γPr[Δ(t)=t′∘γ] based on the first value Σ.sub.γPr[Δ(t)=t′∘γ] and the second value Σ.sub.γ(r)=r′Pr[Δ(t)=t′∘γ].
2. The identification-estimation risk assessment apparatus according to claim 1, wherein the first calculation processing calculates the first value Σ.sub.γPr[Δ(t)=t′∘γ] using the following equation:
Σ.sub.γPr[Δ(t)=t′∘γ]=Σ.sub.γΠ.sub.i∈[N]a.sub.iγ(i) [Math. 13] (here, a.sub.ij (i∈[N], j∈[N]) represents the probability that a record with a record number i in the original table t becomes a record with a record number j in the anonymized table t′ through the randomization Δ).
3. The identification-estimation risk assessment apparatus according to claim 2, wherein A is a matrix with the probability a.sub.ij as a (i, j)th element, and the second calculation processing calculates the second value Σ.sub.γ(r)=r′Pr[Δ(t)=t′∘γ] through pseudo-cofactor expansion of (r, r′) for the matrix A.
4. The identification-estimation risk assessment apparatus according to claim 1, wherein the randomization Δ is PRAM.
5. An identification-estimation risk assessment method by which an identification-estimation risk assessment apparatus calculates a risk assessment value of a risk that a record number r is identified and estimated as corresponding to a record number r′ from an original table t, an anonymized table t′, a record number r∈[N] in the original table t, and a record number r′∈[N] in the anonymized table t′, wherein [N]={1, 2, . . . , N} (here, N is an integer of 1 or more) and a shuffle is a mapping of [N].fwdarw.[N], the method comprising: a first calculation step in which the identification-estimation risk assessment apparatus calculates a first value Σ.sub.γPr[Δ(t)=t′∘γ], which is a sum related to all shuffles of a probability Pr[Δ(t)=t′∘γ] that a table Δ(t) obtained by anonymizing the original table t through randomization Δ coincides with a table t′∘γ represented as a composite of a shuffle γ and the anonymized table t′; a second calculation step in which the identification-estimation risk assessment apparatus calculates a second value Σ.sub.γ(r)=r′Pr[Δ(t)=t′∘γ], which is a sum related to a shuffle that satisfies γ(r)=r′ of the probability Pr[Δ(t)=t′∘γ]; and a third calculation step in which the identification-estimation risk assessment apparatus calculates, as the risk assessment value, a third value Σ.sub.γ(r)=r′Pr[Δ(t)=t′∘γ]/Σ.sub.γPr[Δ(t)=t′∘γ] based on the first value Σ.sub.γPr[Δ(t)=t′∘γ] and the second value Σ.sub.γ(r)=r′Pr[Δ(t)=t′∘γ].
6. A non-transitory computer-readable storage medium storing a program for causing a computer to function as the identification-estimation risk assessment apparatus according to claim 1.
7. A non-transitory computer-readable storage medium storing a program for causing a computer to function as the identification estimate risk evaluation apparatus according to claim 2.
8. A non-transitory computer-readable storage medium storing a program for causing a computer to function as the identification estimate risk evaluation apparatus according to claim 3.
9. A non-transitory computer-readable storage medium storing a program for causing a computer to function as the identification estimate risk evaluation apparatus according claim 4.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0010]
[0011]
[0012]
[0013]
[0014]
DESCRIPTION OF EMBODIMENTS
[0015] Hereinafter, an embodiment of the present invention will be described in detail. Note that constituent units having the same functions are assigned the same reference numerals, and redundant description is omitted.
[0016] Prior to the description of the embodiment, a notation method in this specification will be described.
[0017] _ (underscore) denotes a subscript. For example, “x.sup.y_z” represents that y.sub.z is a superscript for x, and “x.sub.y_z” represents that y.sub.z is a subscript for x.
[0018] Superscripts “{circumflex over ( )}” and “˜” for a certain letter x, as in “{circumflex over ( )}x” and “˜x”, should originally be written directly above “x”, but are written as “{circumflex over ( )}x” and “˜x” due to the restrictions of the descriptive notation of the specification.
TECHNICAL BACKGROUND
Definitions
[0019] For sets A and B, a set of all mappings from. A to B is denoted as A.Math.B. For a positive integer N, [N]={1, . . . , N}.
[Table]
[0020] R is a set of individuals. Also, N=|R| (here, N is an integer of 1 or more). That is to say, N represents the cardinality of the set R.
[0021] Labels 1, . . . , N are assigned to respective individuals who are the elements of the set R. A is a set of all attributes, V.sub.a is a set of attribute values of an attribute a for each a∈A, and V=⊂.sub.a∈AV.sub.a. At this time, T=[N].Math.V, and each t∈T is called a table. For a table t, r∈[N] is called a record number of the table t.
[0022]
[Table Protection]
[0023] For table sets T and T′, a mapping Δ:T.fwdarw.T′ is to output a table t′∈T′ in response to an input table t∈T in accordance with a probability distribution corresponding to t, and this mapping Δ is referred to as randomization.
[Shuffle]
[0024] A mapping γ: [N].fwdarw.[N] is called a shuffle. A set of all shuffles γ is Γ. That is to say, Γ is a set of all patterns of shuffles.
[PRAM]
[0025] PRAM is an anonymization processing method that satisfies Pk-anonymity. PRAM protects privacy by probabilistically replacing an attribute value of each record in a table based on a matrix called a transition probability matrix. Here, a transition probability matrix refers to a |V.sub.a|×|V.sub.a| matrix P.sub.a that has, as an element, a probability Pr(v′.sub.a|v.sub.a) (which is called a transition probability) that an attribute value v.sub.a∈V.sub.a of a certain attribute a∈A is replaced with an attribute value v′.sub.a∈V.sub.a.
[0026] As a method for setting a transition probability, a method is conceivable in which a value is retained with a certain probability ρ.sub.a, and the value is randomly rewritten with a probability of 1−ρ.sub.a, as expressed by the following equation.
[0027] Accordingly, the transition probability matrix P.sub.a is represented by the following equation.
<<Risk Assessment>>
[0028] Here, a description will be given of an identification-estimation probability used in the risk assessment in the embodiment of the present invention.
[0029] When tΣT, t′ΣT′, r, r′∈[N], and t∈T are tables that an attacker has as background knowledge, an identification-estimation probability n (t, t′, r, r′) is defined as the following equation.
[Math. 3]
η(t,t′,r,r′)=Pr[Γ(r)=r′|Δ(t)=t′∘Γ] (1)
[0030] Here, t is an original table, t′ is an anonymized table, r is a record number in the original table t, and r′ is a record number of the anonymized table t′.
[0031] When an attacker who has a table t as background knowledge is assumed, the identification-estimation probability η(t, t′, r, r′) represents the probability that the record number r in the original table t is identified and estimated as corresponding to the record number r′ in the anonymized table t′.
[0032] A defining equation (1) of the identification-estimation probability n (t, t′, r, r′) includes a probability variable Γ. Thus, if the equation (1) is deformed using an instance γΣΓ, it results in the following equation.
[0033] Accordingly, the probability that the record number r is identified and estimated as corresponding to the record number r′ can be obtained by calculating the denominator and the numerator of the equation (2).
(Method for Calculating Denominator of Equation (2))
[0034] When the probability that a record with a record number i in the original table t becomes a record with a record number j in the anonymized table t′ through randomization Δ is a.sub.ij (i∈[N], j∈[N]), Σ.sub.γPr[Δ(t)=t′∘γ] is calculated by the following equation.
[Math. 5]
Σ.sub.γPr[Δ(t)=t′∘γ]=Σ.sub.γΠ.sub.i∈[N]a.sub.iγ(i) (3)
[0035] If a matrix with a.sub.ij as a (i, j)th element is A, the right side of the equation (3) coincides with an equation called a permanent of the matrix A.
[0036] Note that, for a permanent of an N×N matrix, a method of calculation with a complexity O(2.sup.N) (Ryser's Algorithm) is known.
(Method for Calculating Numerator of Equation (2))
[0037] Σ.sub.γ(r)=r′Pr[Δ(t)=t′∘γ] can be calculated by performing an operation similar to cofactor expansion of a matrix (hereafter, this operation is called pseudo-cofactor expansion). Pseudo-cofactor expansion of (i, j) for the matrix A is defined as the product of the (i, j)th element of A and the permanent of a (N−1)×(N−1) matrix excluding the ith rows and the jth column.
[0038] Accordingly, Σ.sub.γ(r)=r′Pr[Δ(t)=t′∘γ] can be calculated as the pseudo-cofactor expansion of (r, r′) for the matrix A, i.e., the product of the (r, r′) element of A and the permanent of the (N−1)×(N−1) matrix excluding the rth row and the r′th column.
Specific Example
[0039] Methods for calculating the identification-estimation probability will be described in detail below. First, the premise common to the calculation methods will be described. Here, the identification-estimation probability in the case of using PRAM as randomization is calculated.
[0040]
[0041] The transition probability matrix P.sub.attr1 and the transition probability matrix P.sub.attr2 represent that the probability that a certain attribute value takes the same value as a result of randomization is 0.8, and the probability that this attribute value takes other values is 0.1, respectively.
[0042] Based on the above premise, methods for calculating the identification-estimation probability η(t, t′, 1, 1) will be described.
(Calculation Method 1)
[0043] First, consideration is given to the case where there has been no change in the arrangement due to the shuffle κ. If there has been no change in the arrangement due to the shuffle κ, Pr[Δ(t)=t′] is the product of the probability that an element in the original table t transitions to a corresponding element in the anonymized table t′, i.e., Pr[Δ(t)=t′]=(the probability that a.fwdarw.a)×(the probability that A.fwdarw.C)×(the probability that b.fwdarw.b)×(the probability that B.fwdarw.B)×(the probability that c.fwdarw.b)×(the probability that C.fwdarw.A). Accordingly, Pr[Δ(t)=t′] is calculated by the following equation.
[0044] The probability Pr[Δ(t)=t′∘γ] is similarly calculated for all patterns of shuffles.
[0045] Then, the denominator of the equation (2) can be obtained by taking the sum of the probability Pr[Δ(t)=t′∘γ]. Meanwhile, for the numerator of the equation (2), the sum of the probability Pr[Δ(t)=t′∘γ] for the shuffle that satisfies γ(1)=1′, of all patterns of shuffles, is taken.
[0046] As understood from the above, the probability calculation needs to be performed for all patterns of shuffles in the calculation method 1, and therefore the complexity is O(N!).
(Calculation Method 2)
[0047] Here, a description will be given of a method using a matrix permanent and pseudo-cofactor expansion.
[0048] First, calculation of Σ.sub.γPr[Δ(t)=t′∘γ], which is the denominator of the equation (2), will be described. Initially, the matrix A is obtained. For example, the probability that a record with a record number 1 in the original table t transitions to a record with a record number 1 in the anonymized table t′ is the product of 0.8, which is the probability that a.fwdarw.a, and 0.1, which is the probability that A.fwdarw.C, i.e., 0.08. The matrix A is obtained by subsequently performing calculation similarly.
[0049] Next, the permanent perm (A) of the matrix A is calculated. If calculation is performed in accordance with the defining equation of the permanent, the following result is obtained.
[0050] Although perm(A) is calculated here using a primitive method according to the defining equation of permanent, perm(A) can be efficiently calculated by using Ryser's Algorithm, as mentioned above.
[0051] Meanwhile, for Σ.sub.γ(r)=r′Pr[Δ(t)=t′∘γ], which is the numerator of the equation (2), pseudo-cofactor expansion of the matrix is used. In the case of calculating the identification-estimation probability η(t, t′, 1, 1), using pseudo-cofactor expansion of the matrix A while assuming that i=1, j=1,
[0052] is the numerator of the equation (2).
[0053] From the above, η(t, t′, 1, 1)=0.000576/0.004745=0.12139.
[0054] For reference, if Σ.sub.γ(r)=r′Pr[Δ(t)=t′∘γ] is calculated for all combinations of r∈[3] and r′∈[3], and a matrix is created in which the results of r′=1, 2, 3 for r=1 are arrayed in the first row, the results of r′=1, 2, 3 for r=2 are arrayed in the second row, and the results of r′=1, 2, 3 for r=3 are arrayed in the third row,
[0055] is obtained.
[0056] In the calculation method 2, the complexity is O(2.sup.N) due to using Ryser's Algorithm.
First Embodiment
[0057] The identification-estimation risk assessment apparatus 100 will be described below with reference to
[0058] The identification-estimation risk assessment apparatus 100 calculates a risk assessment value of the risk that the record number r is identified and estimated as corresponding to the record number r′ from the original table t, the anonymized table t′, a record number r∈[N] in the original table t, and a record number r′∈[N] in the anonymized table t′.
[0059] An operation of the identification-estimation risk assessment apparatus 100 will be described in accordance with
[0060] In S110, the first calculation unit 110 receives input of the original table t and the anonymized table t′. The first calculation unit 110 calculates, based on the original table t and the anonymized table t′, a first value Σ.sub.γPr[Δ(t)=t′∘γ], which is the sum related to all shuffles of the probability Pr[Δ(t)=t′∘γ] that a table Δ(t) obtained by anonymizing the original table t through the randomization Δ coincides with a table t′∘γ represented as a composite of the shuffle γ and the anonymized table t′, and outputs the result. PRAM can be used in the randomization Δ, for example. In this case, the probability Pr[Δ(t)=t′∘γ] can be calculated using the transition probability.
[0061] The first calculation unit 110 may alternatively calculate the first value Σ.sub.γPr[Δ(t)=t′∘γ] using the following equation.
Σ.sub.γPr[Δ(t)=t′∘γ]=Σ.sub.γΠ.sub.i∈[N]a.sub.iγ(i) [Math. 12]
[0062] Here, a.sub.ij (i∈[N], j∈[N]) represents the probability that the record with the record number i in the original table t becomes the record with the record number j in the anonymized table t′ through the randomization Δ.
[0063] In S120, the second calculation unit 120 receives input of the original table t, the anonymized table t′, the record number r in the original table t, and the record number r′ in the anonymized table t′. The second calculation unit 120 calculates a second value Σ.sub.γ(r)=r′Pr[Δ(t)=t′∘γ], which is the sum related to the shuffle that satisfies γ(r)=r′ of the probability Pr[Δ(t)=t′∘γ], based on the original table t, the anonymized table t′, the record number r in the original table t, and the record number r′ in the anonymized table t′, and outputs the result.
[0064] Wherein A is a matrix with the probability a.sub.ij that the ith record in the original table t becomes the jth record in the anonymized table t′ as a result of the randomization Δ as the (i, j)th element, the second calculation unit 120 may calculate the second value Σ.sub.γ(r)=r′Pr[Δ(t)=t′∘γ] through pseudo-cofactor expansion of (r, r′) for the matrix Δ.
[0065] In S130, the third calculation unit 130 receives input of the first value Σ.sub.γPr[Δ(t)=t′∘γ] calculated in S110 and the second value Σ.sub.γ(r)=r′Pr[Δ(t)=t′∘γ] calculated in S120, calculates, as a risk assessment value, a third value Σ.sub.γ(r)=r′Pr[Δ(t)=t′∘γ]/Σ.sub.γPr[Δ(t)=t′∘γ] based on the first value Σ.sub.γPr[Δ(t)=t′∘γ] and the second value Σ.sub.γ(r)=r′Pr[Δ(t)=t′∘γ], and outputs the result.
[0066] According to the embodiment of the present invention, it is possible to assess the risk that a record in a randomized table is identified and estimated in the case where knowledge that an attacker has about the table is fixed.
<Supplementary Note>
[0067]
[0068] For example, the apparatus of the present invention has, as a single hardware entity, an input unit to which a keyboard or the like can be connected, an output unit to which a liquid crystal display can be connected, a communication unit to which a communication device (e.g., a communication cable) capable of communicating with a device outside the hardware entity can be connected, a CPU (Central Processing Unit, which may also include a cache memory, a register, or the like), a RAM and a ROM, which are memories, an external storage device, which is a hard disk, and a bus that connects the input unit, the output unit, the communication unit, the CPU, the RAM, the ROM, and the external storage device to each other such that data can be exchanged therebetween. Further, the hardware entity may also be provided with a device (drive) or the like that can write and read data to and from a recording medium such as a CD-ROM, as necessary. A physical entity equipped with those hardware resources is, for example, a general-purpose computer.
[0069] A program required to realize the above-described functions, data required for processing of this program, or the like is stored in the external storage device of the hardware entity (not limited to the external storage device, for example, the program may alternatively be stored in a ROM, which is a read-only storage device). Data or the like that is obtained as a result of processing of the program is stored in the RAM, the external storage device, or the like, as appropriate.
[0070] In the hardware entity, each program stored in the external storage device (or the ROM etc.) and data required for processing of the program are loaded to the memory as needed, and is interpreted, executed, and processed by the CPU, as appropriate. As a result, the CPU realizes predetermined functions (the above-described functional units represented as “ . . . unit”, “ . . . means”, etc.).
[0071] The present invention is not limited to the above embodiment, and may be modified, as appropriate, without departing from the gist of the present invention. The processing described in the above embodiment may not only executed in time series in the above-described order, but also executed in parallel or separately, in accordance with the processing capacity of the device that executes the processing, or as needed.
[0072] As already described, when the processing functions of the hardware entity (the apparatus of the present invention) described in the above embodiment are realized by a computer, the processing content of the functions that the hardware entity should have are described by the program. By executing this program on the computer, the processing functions of the above-described hardware entity are realized on the computer.
[0073] The program that describes this processing content can be recorded in a computer-readable recording medium. The computer-readable recording medium may be of any kind, e.g., a magnetic recording device, an optical disk, a magneto-optical recording medium, a semiconductor memory, or the like. Specifically, for example, a hard disk device, a flexible disk, a magnetic tape, or the like can be used as a magnetic recording device. A DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only), a CD-R (Recordable)/RW (ReWritable), or the like can be used as an optical disk. An MO (Magneto-Optical disc) or the like can be used as a magneto-optical recording medium. An EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) or the like can be used as a semiconductor memory.
[0074] This program is distributed by, for example, selling, transferring, or lending a portable recording medium, such as a DVD or a CD-ROM, in which the program is recorded. Furthermore, a configuration may also be employed in which this program is stored in a storage device in a server computer, and is distributed by transferring the program from the server computer to other computers via a network.
[0075] For example, first, a computer that executes this program temporarily stores the program recorded in the portable recording medium or the program transferred from the server computer in a storage device of this computer. When executing the processing, the computer loads the program stored in its own storage device, and executes the processing in accordance with the loaded program. As another mode of executing this program, the computer may directly load the program from the portable recording medium and execute the processing in accordance with the program, or may sequentially execute the processing in accordance with a received program every time the program is transferred from the server computer to this computer. A configuration is also possible in which the above-described processing is performed through a so-called ASP (Application Service Provider)-type service that realizes processing functions only by giving instructions to execute the program and acquiring the results, without transferring the program to the computer. Note that the program in this mode may include information for use in processing performed by an electronic computer that is equivalent to a program (e.g., data that is not a direct instruction to the computer but has properties that define computer processing.
[0076] In this mode, the hardware entity is constituted by executing a predetermined program on a computer, but at least a part of the processing content may alternatively be realized by hardware.
[0077] The above description of the embodiment of the present invention has been presented for the purposes of illustration and description. There is no intention to be exhaustive, nor is there any intention to limit the invention to the exact form disclosed. Modifications and variations are possible from the above teachings. The embodiments have been chosen and expressed to provide the best illustration of the principles of the invention and to enable those skilled in the field to utilize the invention in a variety of embodiments and with various additional modifications to make the invention suitable for contemplated and actual use. All such modifications and variations are within the scope of the invention as set forth by the appended claims construed according to the fairly and legally given breadth.