INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING SYSTEM, AND INFORMATION PROCESSING METHOD, AND PROGRAM
20180366227 · 2018-12-20
Inventors
Cpc classification
G09C1/00
PHYSICS
H04L2209/46
ELECTRICITY
G06F21/62
PHYSICS
G16H50/20
PHYSICS
G06F17/18
PHYSICS
International classification
G16H50/20
PHYSICS
G06F17/18
PHYSICS
Abstract
To achieve high-speed and efficient parameter calculation processing of a logistic regression model. A logistic regression parameter is calculated, the logistic regression parameter being a parameter of the logistic regression model indicating the relationship between an explanatory variable and an outcome variable being secure data corresponding to each sample. A data processing unit calculates the inner product (t_s) of the explanatory variable and the outcome variable with application of secure computation being computation processing applied with converted data of each of the variables, and performs computation processing excluding the calculation processing of the inner product, as computation processing without the converted data, to calculate the logistic regression parameter in accordance with the maximum likelihood method with the Newton-Raphson method (iterative convergence method).
Claims
1. An information processing device comprising: a data processing unit configured to calculate a logistic regression parameter being a parameter of a logistic regression model indicating a relationship between a first variable and a second variable being two different types of secure data associated with each sample, wherein the data processing unit calculates an inner product (t_s) of the first variable and the second variable with application of secure computation being computation processing applied with converted data of each of the variables, and performs computation processing excluding the calculation processing of the inner product, as computation processing without the converted data, to calculate the logistic regression parameter.
2. The information processing device according to claim 1, wherein the data processing unit calculates the logistic regression parameter in accordance with a maximum likelihood method with a Newton-Raphson method (iterative convergence method).
3. The information processing device according to claim 1, wherein the first variable is an explanatory variable, and the second variable is an outcome variable.
4. The information processing device according to claim 3, wherein the data processing unit performs the calculation processing of the inner product (t_s) of the explanatory variable and the outcome variable with the secure computation applied with segmented data of the explanatory variable and segmented data of the outcome variable.
5. The information processing device according to claim 3, wherein the information processing device is a retaining device of the explanatory variable, and the data processing unit performs the computation processing excluding the calculation processing of the inner product, applied with the explanatory variable, as computation processing applied with the explanatory variable remaining intact, without the application of the secure computation, in the calculation processing of the logistic regression parameter based on a maximum likelihood method with a Newton-Raphson method (iterative convergence method).
6. The information processing device according to claim 3, wherein the information processing device is a retaining device of the explanatory variable, and the data processing unit receives a computed result applied with the outcome variable from an outcome-variable retaining device, and calculates the logistic regression parameter with the computed result applied with the received outcome variable.
7. The information processing device according to claim 6, wherein the computed result applied with the outcome variable is a sum total (t_0) of the outcome variable.
8. The information processing device according to claim 3, wherein the information processing device is a retaining device of the explanatory variable, and the data processing unit outputs the logistic regression parameter calculated to an outcome-variable retaining device.
9. An information processing system comprising: an explanatory-variable retaining device retaining an explanatory variable being secure data associated with each sample; and an outcome-variable retaining device retaining an outcome variable being secure data associated with each sample, wherein the outcome-variable retaining device calculates and outputs a sum total (t_0) of the outcome variable associated with each sample to the explanatory-variable retaining device, the explanatory-variable retaining device includes a data processing unit configured to calculate a logistic regression parameter being a parameter of a logistic regression model indicating a relationship with the outcome variable, and the data processing unit calculates an inner product (t_s) of the explanatory variable and the outcome variable, with application of secure computation being computation processing applied with converted data of each of the variables, and calculates the logistic regression parameter with application of the inner product (t_s) calculated and the sum total (t_0) of the outcome variable input from the outcome-variable retaining device.
10. The information processing system according to claim 9, wherein the data processing unit calculates the logistic regression parameter in accordance with a maximum likelihood method with a Newton-Raphson method (iterative convergence method).
11. The information processing system according to claim 9, wherein the data processing unit performs the calculation processing of the inner product (t_s) of the explanatory variable and the outcome variable, with the secure computation applied with segmented data of the explanatory variable and segmented data of the outcome variable.
12. The information processing system according to claim 9, wherein the data processing unit performs computation processing excluding the calculation processing of the inner product, applied with the explanatory variable, as computation processing applied with the explanatory variable remaining intact, without the application of the secure computation, in the calculation processing of the logistic regression parameter based on a maximum likelihood method with a Newton-Raphson method (iterative convergence method).
13. The information processing system according to claim 9, wherein the explanatory-variable retaining device outputs the logistic regression parameter calculated to the outcome-variable retaining device.
14. An information processing method to be performed in an information processing device including a data processing unit configured to calculate a logistic regression parameter being a parameter of a logistic regression model indicating a relationship between a first variable and a second variable being two different types of secure data associated with each sample, the information processing method comprising: calculating, by the data processing unit, an inner product (t_s) of the first variable and the second variable with application of secure computation being computation processing applied with converted data of each of the variables; and calculating the logistic regression parameter with performance of computation processing excluding the calculation processing of the inner product, as computation processing without the converted data.
15. An information processing method to be performed in an information processing system including: an explanatory-variable retaining device retaining an explanatory variable being secure data associated with each sample; and an outcome-variable retaining device retaining an outcome variable being secure data associated with each sample, the information processing method comprising: calculating and outputting, by the outcome-variable retaining device, a sum total (t_0) of the outcome variable associated with each sample, to the explanatory-variable retaining device; and by a data processing unit included in the explanatory-variable retaining device, configured to calculate a logistic regression parameter being a parameter of a logistic regression model indicating a relationship with the outcome variable, calculating an inner product (t_s) of the explanatory variable and the outcome variable with application of secure computation being computation processing applied with converted data of each of the variables and calculating the logistic regression parameter with application of the inner product (t_s) calculated and the sum total (t_0) of the outcome variable input from the outcome-variable retaining device.
16. A program for causing information processing to be executed in an information processing device including a data processing unit configured to calculate a logistic regression parameter being a parameter of a logistic regression model indicating a relationship between a first variable and a second variable being two different types of secure data associated with each sample, the program causing the data processing unit to execute: processing of calculating an inner product (t_s) of a first variable and a second variable with application of secure computation being computation processing applied with converted data of each of the variables; and processing of calculating the logistic regression parameter with performance of computation processing excluding the processing of calculating the inner product, as computation processing without the converted data.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0042]
[0043]
[0044]
[0045]
[0046]
[0047]
[0048]
[0049]
[0050]
[0051]
[0052]
[0053]
[0054]
[0055]
MODE FOR CARRYING OUT THE INVENTION
[0056] An information processing device, an information processing system, and an information processing method, and a program according to the present disclosure will be described in detail below with reference to the drawings. The descriptions will be given in accordance with the following items.
[0057] 1. Outline of Logistic Regression Analysis
[0058] 2. Parameter Estimation Processing with Logistic Regression Analysis
[0059] 3. Estimation Processing of Logistic Regression Parameter with Maximum Likelihood Method
[0060] 4. Estimation Method of Logistic Regression Parameter with Secure Computation
[0061] 5. Estimation Method of Logistic Regression Parameter with Secure Computation Reduced
[0062] 6. Reduction Effect in Computational Complexity of Parameter Calculation Processing according to Present Disclosure
[0063] 7. Exemplary Hardware Configuration of Information Processing Device
[0064] 8. Summary of Configuration of Present Disclosure
[0065] [1. Outline of Logistic Regression Analysis]
[0066] First, an outline of logistic regression analysis will be described.
[0067] The logistic regression analysis has been known as a technique of predicting an outcome variable (y) from an explanatory variable (x).
[0068] Processing with the logistic regression analysis will be described.
[0069]
[0070] A list of an outcome variable (y) and an explanatory variable (x) for a plurality of samples (i) is illustrated. A sample i corresponds to, for example, one user i.
[0071] The outcome variable (y) includes onset or non-onset of disease, for example, hyperlipemia (onset=1, non-onset=0).
[0072] The explanatory variable (x) includes gender (x1), age (x2), and cholesterol level (x3).
[0073] As described above, an organization A (entity A), specifically, for example, the operator of a Web site can acquire the explanatory variables (x1 to x3) for a large number of users (samples (i)), for example, 100 people (i=1 to 100), on the basis of, for example, browsing information from browsing users of the Web site.
[0074] The data generated and acquired by the organization A (entity A) on the basis of, for example, the browsing information from the browsing users of the Web site, is valuable in marketing. However, the data is information including personal information, and thus is undesirable to release. That is, the data is secure data (also referred to as, for example, sensitive data) and thus is to be prevented from leaking out.
[0075] Meanwhile, a different organization B (entity B), for example, a hospital retains the outcome variable (y) for the one hundred users (samples), namely, (y1): onset or non-onset of disease (e.g., hyperlipemia) (onset=1, non-onset=0).
[0076] The data retained by the hospital is also secure data, and thus is to be prevented from leaking out.
[0077] That is, the explanatory variables (x1 to x3) and the outcome variable (y1) illustrated in
[0078] Therefore, there is provided an arrangement in which a third party is not allowed to check the explanatory variables (x1 to x3) and the outcome variable (y1) together, similarly to the organizations A and B.
[0079] In such an arrangement, for example, the retainer of the explanatory variable (x) uses the logistic regression analysis in order to predict the outcome variable (y) from the explanatory variable (x).
[0080] Exemplary specific logistic regression analysis processing will be described.
[0081] As illustrated in
[0082] (x1): gender of user (male=1, female=0),
[0083] (x2): age of user (from 0), and
[0084] (x3): cholesterol level of user (e.g., 150 to 250). In addition, the outcome variable (y) is defined as the one outcome variable (y1):
[0085] (y1): onset or non-onset of disease (e.g., hyperlipemia) (onset=1, non-onset=0).
[0086] As described above, the organization A (entity A), specifically, for example, the operator of the Web site can acquire the explanatory variables (x1 to x3) for a large number of users, for example, 100 people, on the basis of, for example, the browsing information from the browsing users of the Web site.
[0087] However, the outcome variable (y) for the one hundred users, namely, (y1): onset or non-onset of disease (e.g., hyperlipemia) (onset=1, non-onset=0), is the secure data retained by the different organization B (entity B), for example, the hospital.
[0088] Therefore, the organization A (entity A) is not allowed to acquire the outcome variable (y) for the one hundred users.
[0089] Similarly, the retainer of the explanatory variable (x) being the secure data is not allowed to receive the outcome variable (y) from the retainer of the outcome variable (y) being the secure data. However, the retainer of the explanatory variable (x) is allowed to receive data including the outcome variable (y) subjected to cryptographic processing or conversion processing, namely, converted data (concealed data) of the secure data.
[0090] The retainer of the explanatory variable (x) receives the converted data (concealed data) of the outcome variable (y) and then performs various types of arithmetic, so that the outcome variable (y) associated with a predetermined explanatory variable (x) can be estimated.
[0091] One representative technique of the estimation processing is the logistic regression analysis.
[0092] The logistic regression analysis is one type of statistical regression model often used in medical science or social science, and is a data analysis technique for predicting an outcome variable from an explanatory variable.
[0093] In the logistic regression analysis, an expression of calculating the probability p(x) of occurrence of an event is set under a condition including observation values of the explanatory variable (x), such as (x1 to x3) illustrated in
[0094] In the example illustrated in
[0095] Under a condition including the observation values (x1 to xr) of the explanatory variable (x) given, an expression of calculating the probability p(x) of occurrence of an event, is given in (Expression 1) below.
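In its standard form, which agrees with the variable and parameter definitions given in this section, the logistic regression model can be written as follows. The symbol β for the parameters and the treatment of β_0 as the intercept term are assumptions of this sketch.

```latex
p(x) = \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_r x_r)\}}
```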
[0096] (Expression 1) above is referred to as a logistic regression model.
[0097] x_1, . . . , x_r represent explanatory variables in (Expression 1) above.
[0098] β_0, . . . , β_r represent logistic regression parameters. Hereinafter, the logistic regression parameters are simply referred to as parameters.
[0099] Note that, a character subsequent to an underscore (e.g., β_0) represents a subscript in the following descriptions.
[0100] β_0, . . . , β_r represent β.sub.0 to β.sub.r, respectively.
[0101] Processing of estimating the parameters β_0, . . . , β_r in (Expression 1) above, is performed in the logistic regression analysis.
[0102] Determination of the parameters β_0, . . . , β_r enables the probability p(x) of occurrence of the event, to be calculated under the condition including the observation values (x_1, . . . , x_r) of the explanatory variable (x) given, in accordance with (Expression 1) above.
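As a minimal sketch of how the determined parameters are used, the following Python function evaluates the standard logistic model for one sample. The function name `event_probability` and the parameter values are hypothetical and chosen only for illustration; they are not taken from the present disclosure.

```python
import math

def event_probability(beta, x):
    """Evaluate p(x) = 1 / (1 + exp(-(beta_0 + sum_j beta_j * x_j)))."""
    if len(beta) != len(x) + 1:
        raise ValueError("beta needs an intercept plus one value per explanatory variable")
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    # Two algebraically equal branches avoid overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))

# Hypothetical parameters for (gender, age, cholesterol level); illustrative only.
beta = [-12.0, 0.5, 0.04, 0.04]
p = event_probability(beta, [1, 54, 213])
```

A value of `p` near 1 would indicate a high estimated possibility of the event, and a value near 0 a low one.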
[0103] [2. Parameter Estimation Processing with Logistic Regression Analysis]
[0104] Next, the parameter estimation processing with the logistic regression analysis will be described.
[0105]
[0106] As illustrated in
[0107] The information processing device A 110 and the information processing device B 120 each retain only either the explanatory variable (x) or the outcome variable (y).
[0108] According to the present embodiment, the information processing device A 110 is an outcome-variable retaining device that retains the outcome variable (y) and the information processing device B 120 is an explanatory-variable retaining device that retains the explanatory variable (x).
[0109] For example, the two information processing devices A 110 and B 120 hold pieces of data as in
[0110] In addition, the companies each are in a state where the data is an asset having an economic value and is undesirable to supply to a different company.
[0111] Meanwhile, there is a need to acquire far more knowledge by combining data between different companies than can be obtained through individual use. In the processing to be described below according to the present disclosure, the two entities (information processing device A 110 and information processing device B 120) securely estimate the logistic regression parameters, namely, the parameters β_0, . . . , β_r in (Expression 1) described earlier, without mutually sharing the data itself.
[0112] The processing to be described below according to the present technology enables the two entities (information processing device A 110 and information processing device B 120) to estimate the logistic regression parameters β_0, . . . , β_r without the mutual data sharing. The parameter estimation enables each of the entities (information processing device A 110 and information processing device B 120) to derive (estimate) the relationship between the explanatory variable (x) and the outcome variable (y).
[0113] As illustrated in
[0114] Note that, the logistic regression model is the expression of calculating the event occurrence probability p(x) from the explanatory variable (x) and the logistic regression parameters β_0, . . . , β_r, expressed in (Expression 1) described earlier. The event occurrence probability p(x) corresponds to, for example, the estimate (0 to 1) of the outcome variable (y).
[0115] Specifically, p(x)=1 represents the outcome variable y=1, namely, onset of disease, and p(x)=0 represents the outcome variable y=0, namely, non-onset of disease.
[0116] The parameters β_0, . . . , β_r are estimated by the parameter estimation with the logistic regression model expressed in (Expression 1), and the estimated parameters are set into (Expression 1). Substituting the explanatory variables (x1 to x3) of a user i (sample i) whose outcome variable (y) has not been acquired then enables a value of 0 to 1 to be calculated for the event occurrence probability p(x).
[0117] If the calculated value p(x) is approximate to 1, a high possibility of onset of disease can be determined for the user i (sample i).
[0118] Meanwhile, if the calculated value p(x) is approximate to 0, a low possibility of onset of disease can be determined for the user i (sample i).
[0119] A specific embodiment for estimating the logistic regression parameters β_0, . . . , β_r, will be described below.
[0120] Before the specific description, definition of terms and fundamental algorithms will be first described.
[0121] (2-1. Explanatory Variable)
[0122] (2-1-1) Parameter Estimation Algorithm for Explanatory Variable (x) being Continuous Variable
[0123] A continuous variable is a measurable variable in number or quantity, and is, for example, age, cholesterol level, or the like in the example illustrated in
[0124] In this manner, in a case where the explanatory variable (x) is the continuous variable, the value of the explanatory variable (x) being the continuous variable, remaining intact may be substituted for the explanatory variables (x_1, . . . , x_r) of the probability estimation expression based on (Expression 1) described earlier.
[0125] That is, for example, age data (54) indicating age, data (213) indicating cholesterol level, and the like in the explanatory variable (x) remaining intact may be substituted for the explanatory variables (x_1, . . . , x_r) in (Expression 1).
[0126] (2-1-2) Parameter Estimation Algorithm for Explanatory Variable (x) being Categorical Variable
[0127] A categorical variable is an unmeasurable variable in number or quantity, and is, for example, data of gender or the like (e.g., male=1, female=0). In a case where two values to be taken by the categorical variable are provided, the value of the explanatory variable (x) is 0 or 1.
[0128] In this case, the value (0 or 1) of the explanatory variable (x) remaining intact may be substituted for the explanatory variables (x_1, . . . , x_r) of the probability estimation expression based on (Expression 1) described earlier.
[0129] In a case where three or more values to be taken by the categorical variable are provided, for example, in a case where the explanatory variable (x) having three or more categories, such as residence (Tokyo, Kanagawa, Saitama, and the like), is used, the value of the explanatory variable (x) remaining intact cannot be substituted for the explanatory variables (x_1, . . . , x_r) of the probability estimation expression based on (Expression 1) described earlier.
[0130] A category number of three or more in the j-th explanatory variable (x_j) is defined as K, and a categorical identifier is defined as k=1, 2, . . . , K.
[0131] At this time, K number of explanatory variables (x_jk) corresponding to the category number K, are set for the j-th explanatory variable (x_j), and the K number of explanatory variables (x_jk) in value are set as follows:
[0132] x_jk=1: belonging to the k category of the j-th explanatory variable, and
[0133] x_jk=0: not belonging to the k category of the j-th explanatory variable.
[0134] k includes 1 to K, and the explanatory variables (x_jk) are set in the same number as the category number K.
[0135] Furthermore, for the parameter β, parameters are set in a number corresponding to the category number K in the j-th explanatory variable (x_j). That is, the parameter β_jk (k=1, . . . , K_j) is a parameter corresponding to the explanatory variable (x_jk).
[0136] The processing alters (Expression 1) described earlier, namely, the expression of calculating the probability p(x) of occurrence of the event under the condition including the observation values (x1 to xr) of the explanatory variable (x) given, into (Expression 2) below.
[0137] In (Expression 2) above, x_1k, . . . , x_rk each are the explanatory variable of the category k (k=1 to K_j) of the event j (j=1 to r).
[0138] The explanatory variable (x_jk) is a provisional explanatory variable corresponding to the category, generated from the original explanatory variable (x_j), and is also referred to as a dummy variable.
[0139] In addition, β_0, β_1k, . . . , β_rk are logistic regression parameters.
[0140] Note that, β_1k, . . . , β_rk each are the logistic regression parameter corresponding to the explanatory variable of the category k (k=1 to K_j) of the event j (j=1 to r).
[0141] Note that, for use of (Expression 2) above, the estimate of the parameter (β_jk) corresponding to each category is meaningful not as an absolute value but only as a relative difference between categories, and thus a first category parameter is typically set to zero, for example. Thus, the degree of freedom is K−1 for the category number K.
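The dummy-variable construction described above can be sketched as follows. The function name `dummy_encode` and the `drop_first` flag are hypothetical; dropping the first category mirrors the convention of fixing the first category parameter to zero, which leaves K−1 free parameters.

```python
def dummy_encode(value, categories, drop_first=True):
    """Return the 0/1 dummy variables x_jk for one categorical value.

    x_jk = 1 when the value belongs to category k, otherwise 0.
    With drop_first=True the first category acts as the reference
    (its parameter is fixed to zero), leaving K-1 variables.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    flags = [1 if value == category else 0 for category in categories]
    return flags[1:] if drop_first else flags

# Residence example from the text, with K = 3 categories.
residences = ["Tokyo", "Kanagawa", "Saitama"]
kanagawa = dummy_encode("Kanagawa", residences)
tokyo = dummy_encode("Tokyo", residences)
```

With the reference category dropped, a sample belonging to the first category is represented by all zeros.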
[0142] (2-1-3) Parameter Estimation Algorithm for Explanatory Variable (x) Including Continuous Variable and Categorical Variable Mixed
[0143] Next, a parameter estimation algorithm for the explanatory variable (x) including the continuous variable and the categorical variable mixed, will be described.
[0144] Parameters to be set corresponding to the explanatory variable (x_j) corresponding to the continuous variable and the explanatory variable (x_jk) corresponding to the categorical variable, are as follows:
[0145] (a) a parameter (β_j) corresponding to the explanatory variable (x_j) corresponding to the continuous variable, and
[0146] (b) a parameter (β_jk) corresponding to the explanatory variable (x_jk) corresponding to the categorical variable.
[0147] The degree of freedom of each parameter (number of parameters to be estimated independently) is as follows:
[0148] (a) 1 for the parameter (β_j) corresponding to the explanatory variable (x_j) corresponding to the continuous variable, and
[0149] (b) K−1 (category number=K) for each j for the parameter (β_jk) corresponding to the explanatory variable (x_jk) corresponding to the categorical variable.
[0150] Therefore, in a case where s number of explanatory variables (x_j) corresponding to the continuous variable and t number of explanatory variables (x_jk) corresponding to the categorical variable are mixed, the number of independent parameters relating to the s number of explanatory variables (x_j) corresponding to the continuous variable is s in number and the number of independent parameters relating to the t number of explanatory variables (x_jk) corresponding to the categorical variable with a category number of (K_j) is (K_1−1)+(K_2−1)+ . . . +(K_t−1) in number.
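The count of independent parameters above reduces to a one-line calculation. The helper name `independent_parameter_count` is hypothetical, and the count deliberately follows the text in covering only the parameters tied to explanatory variables (it does not add the intercept β_0).

```python
def independent_parameter_count(num_continuous, category_counts):
    """s continuous variables contribute s parameters; each categorical
    variable with K_j categories contributes K_j - 1 parameters."""
    if any(k < 2 for k in category_counts):
        raise ValueError("each categorical variable needs at least two categories")
    return num_continuous + sum(k - 1 for k in category_counts)

# Example: age and cholesterol level (continuous), gender (K=2), residence (K=3).
count = independent_parameter_count(2, [2, 3])
```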
[0151] (2-1-4) Sample and Profile
[0152] Next, a sample being data to be used for the parameter estimation and a profile being an intermediate data structure to be generated from the sample, will be described.
[0153] The sample includes, for example, the samples (i) of
[0154] Each of the samples (i) has j number of explanatory variables (x_j) and at least one outcome variable (y) set in value.
[0155] (i) Sample
[0156] With the sample being n in size (number), the value of the outcome variable (y_i) corresponding to the i-th sample (i=1, . . . , n), is defined as follows:
[0157] y_i=1: occurrence of an event, and
[0158] y_i=0: non-occurrence of the event.
[0159] Similarly, r number of explanatory variables (x.sup.i_1, x.sup.i_2, . . . , x.sup.i_r) are ready for the explanatory variable (x_j) corresponding to the i-th sample (i=1, . . . , n).
[0160] For example, the data is similar to (1) sample unit data illustrated on the left of
[0161] The number of times of occurrence of the event corresponding to the number of samples satisfying that the value of the outcome variable (y) is 1, namely, satisfying y_i=1, is expressed in (Expression 3) below.
[0162] (ii) Profile
[0163] A vector including the configuration values of the explanatory variables (x.sup.i_1, x.sup.i_2, . . . , x.sup.i_r), note that i=1 to n, is defined as an explanatory variable vector x.sup.i.
[0164] The distinct patterns x_j (j=1, . . . , J) extracted and numbered from the n number of explanatory variable vectors x.sup.i are referred to as the profile.
[0165] The profile extraction generates (2) profile unit data illustrated on the right of
[0166] When the number of samples and the number of times of occurrence of the event in the profile x_j are defined as n_j and d_j, respectively, (Expression 4) below is satisfied.
[0167] In (Expression 4) above, J represents the number of patterns of the explanatory variable occurring in the sample.
[0168] In addition, the following expression is defined: x_j=(x_j1, . . . , x_jr).
[0169] (d), in (2) the profile unit data, includes data corresponding to the number of samples having the outcome variable (y) satisfying y=1.
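The conversion from sample unit data to profile unit data can be sketched as a grouping step. The function name `build_profiles` and the tuple layout are hypothetical; the sketch assumes that each profile records its pattern x_j, its sample count n_j, and its event count d_j, so that the n_j sum to the sample size n and the d_j sum to the total number of events.

```python
def build_profiles(samples):
    """Group samples by identical explanatory-variable vector.

    samples: iterable of (x_vector, y) pairs with y in {0, 1}.
    Returns a list of (x_j, n_j, d_j) in order of first appearance,
    where n_j is the number of samples sharing the pattern x_j and
    d_j is the number of those samples with y = 1.
    """
    counts = {}
    for x, y in samples:
        key = tuple(x)
        n, d = counts.get(key, (0, 0))
        counts[key] = (n + 1, d + y)
    return [(key, n, d) for key, (n, d) in counts.items()]

# Illustrative data only: (gender, age) with the outcome y.
samples = [((1, 54), 1), ((0, 40), 0), ((1, 54), 0), ((0, 40), 0)]
profiles = build_profiles(samples)
```

Working on profiles instead of individual samples shortens the later likelihood computation whenever many samples share the same pattern.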
[0170] [3. Estimation Processing of Logistic Regression Parameter with Maximum Likelihood Method]
[0171] As described earlier, the estimation of the logistic regression parameters (β_0, . . . , β_r) with (Expression 1) above, namely, (Expression 1) based on the logistic regression model, enables, when values of the explanatory variable (x) are given, the outcome variable (y) corresponding to the explanatory variable to be estimated more reliably.
[0172] (Expression 1: the logistic regression model) above is the expression of calculating the probability p(x) of occurrence of the event with arithmetic of the observation values (x1 to xr) of the explanatory variable (x) and the logistic regression parameters (β_0, . . . , β_r).
[0173] A method of estimating the parameter β=β_0, . . . , β_r with the maximum likelihood method in a case where the sample and the profile have been given, will be first described.
[0174] For example, the method is parameter estimation processing in a case where all the data illustrated in
[0175] That is, a method of estimating the parameter β=β_0, . . . , β_r with the maximum likelihood method will be described for a case where, for example, one organization (entity) retains data including both an outcome variable value and an explanatory variable value, and a storage unit in an information processing device available to the one organization (entity) stores the data including the outcome variable value and the explanatory variable value for a plurality of samples.
[0176] The likelihood of a group having the profile x_j observed, is defined in (Expression 5) below.
[0177] With the likelihood of the group having the profile x_j observed defined in (Expression 5) above, the entire likelihood is expressed in (Expression 6) below.
[0178] The maximum likelihood method finds the most suitable value of the parameter β when the samples are given. That is, the value of the parameter β at which the likelihood of the observed data set is maximum is found from all available values of the parameter β.
[0179] Specifically, a maximum likelihood estimate β_ML maximizing the likelihood function L(β) is acquired to estimate the parameter β maximizing the likelihood. (Expression 7) below is used for the computation.
[0180] It is only necessary to solve the simultaneous equations obtained by setting the partial derivatives of (Expression 7) above with respect to the parameter β to zero.
[0181] That is, simultaneous equations in (Expression 8) below are solved.
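For orientation, one standard formulation of the profile-based likelihood and the resulting likelihood equations is sketched below. It is an assumption of this sketch that the per-profile likelihood is binomial in (n_j, d_j) and that the log of the entire likelihood is the function being differentiated; the symbols l_j and L are used only for illustration.

```latex
% Likelihood of the group with profile x_j (n_j samples, d_j events)
l_j(\beta) = \binom{n_j}{d_j}\, p(x_j)^{d_j}\, \{1 - p(x_j)\}^{\,n_j - d_j}

% Entire likelihood over the J profiles, and its logarithm
L(\beta) = \prod_{j=1}^{J} l_j(\beta), \qquad
\log L(\beta) = \sum_{j=1}^{J} \left[ \log\binom{n_j}{d_j}
  + d_j \log p(x_j) + (n_j - d_j)\log\{1 - p(x_j)\} \right]

% Likelihood equations, with x_{j0} = 1 for the intercept
\frac{\partial \log L(\beta)}{\partial \beta_s}
  = \sum_{j=1}^{J} \{ d_j - n_j\, p(x_j) \}\, x_{js} = 0,
  \qquad s = 0, 1, \ldots, r
```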
[0182] Because the simultaneous equations expressed in (Expression 8) above are nonlinear with respect to the parameter β, β is acquired by linear approximation of Taylor expansion with the Newton-Raphson method (iterative convergence method).
[0183] The parameter β is calculated with the Newton-Raphson method (iterative convergence method). Typically, the solution of the maximum likelihood estimate of the parameter β can be calculated by iterative computation below.
[Math. 9]
\beta^{(k+1)} = \beta^{(k)} + I^{-1}(\beta^{(k)})\, S(\beta^{(k)})    (Expression 9)
[0184] (Expression 9) above is repeated until (Expression 10) below is satisfied.
[0185] Note that, k in (Expression 9) above represents the number of repetitions.
[0186] An appropriate arbitrary value is set as the parameter initial value β^(k) with k=0, and then the iterative computation starts.
[Math. 10]
\left| \{ L(\beta^{(k+1)}) - L(\beta^{(k)}) \} / L(\beta^{(k)}) \right| < \varepsilon \quad (\varepsilon = \text{approximately } 0.00001)    (Expression 10)
[0187] The iterative computation of (Expression 9) above until the satisfaction of (Expression 10) above, can acquire the parameter β.
[0188] The meaning of each variable is expressed in (Expression 11) below.
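The iteration of (Expression 9) with the stopping rule of (Expression 10) can be sketched in plain Python as follows. Several points are assumptions of this sketch rather than statements of the disclosure: S is taken to be the score vector Σ(y_i − p_i)x_i, I the information matrix Σ p_i(1 − p_i)x_i x_iᵀ, the computation runs over individual samples rather than profiles, and the stopping rule is applied to the log-likelihood. All function names are hypothetical.

```python
import math

def sigmoid(z):
    # Overflow-safe logistic function.
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def solve(a, b):
    """Solve a @ v = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def log_likelihood(beta, rows, ys):
    total = 0.0
    for row, y in zip(rows, ys):
        p = sigmoid(sum(b * v for b, v in zip(beta, row)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def newton_raphson(xs, ys, eps=1e-5, max_iter=50):
    rows = [[1.0] + list(x) for x in xs]      # leading 1 carries the intercept beta_0
    dim = len(rows[0])
    beta = [0.0] * dim                        # arbitrary initial value (k = 0)
    previous = log_likelihood(beta, rows, ys)
    for _ in range(max_iter):
        score = [0.0] * dim
        info = [[0.0] * dim for _ in range(dim)]
        for row, y in zip(rows, ys):
            p = sigmoid(sum(b * v for b, v in zip(beta, row)))
            for a in range(dim):
                score[a] += (y - p) * row[a]
                for c in range(dim):
                    info[a][c] += p * (1.0 - p) * row[a] * row[c]
        step = solve(info, score)             # I^{-1}(beta) S(beta)
        beta = [b + s for b, s in zip(beta, step)]
        current = log_likelihood(beta, rows, ys)
        if abs((current - previous) / previous) < eps:
            break
        previous = current
    return beta

# Illustrative data only: one binary explanatory variable.
xs = [[0], [0], [0], [1], [1], [1]]
ys = [0, 0, 1, 1, 1, 0]
beta_hat = newton_raphson(xs, ys)
```

The sketch assumes the data are not perfectly separable; with separable data the estimates diverge and the logarithm of a zero probability would fail.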
[0189] The technique described above is a parameter estimation method in the situation in which the explanatory variable (x) and the outcome variable (y) both are known.
[0190] However, as described above, practically, the explanatory variable (x) and the outcome variable (y) each are often the secure data, such as personal data, and thus the situation in which the explanatory variable (x) and the outcome variable (y) both are known is often difficult to acquire.
[0191] A parameter estimation method in that case will be described below.
[0192] [4. Estimation Method of Logistic Regression Parameter with Secure Computation]
[0193] Next, a method of estimating the parameter β=β_0, . . . , β_r with the maximum likelihood method with secure computation, in a case where the pieces of data of the explanatory variable (x) and the outcome variable (y) are separately retained by, for example, different organizations and the pieces of data are not allowed to be disclosed mutually as illustrated in
[0194] As described earlier with reference to
[0195] In addition, each company regards its data as an asset having economic value and is reluctant to supply the data to a different company.
[0196] Meanwhile, there is a need to acquire far more knowledge by combining data between different companies than can be obtained through individual use.
[0197] Processing will be described below in which the two entities (information processing device A 110 and information processing device B 120) illustrated in
[0198] In the processing to be described below, the two entities (information processing device A 110 and information processing device B 120) estimate the logistic regression parameters β_0, . . . , β_r without mutually sharing the secure data.
[0199] The parameter estimation enables each of the entities (information processing device A 110 and information processing device B 120) to derive (estimate) the relationship between the explanatory variable (x) and the outcome variable (y).
[0200] Each of the two different devices, retaining only either the explanatory variable (x) or the outcome variable (y), performs data conversion, such as encryption, on its own explanatory variable (x) or outcome variable (y), and provides the other device with the converted data.
[0201] The logistic regression parameters β_0, . . . , β_r set in the logistic regression model, namely, (Expression 1) described above are estimated with application of the converted data.
[0202] In this manner, without performing the sharing processing of the secure data, such as the explanatory variable (x) or the outcome variable (y), each of the entities (information processing device A 110 and information processing device B 120) performs arithmetic processing with the converted data of the secure data to acquire various arithmetic results of the secure data, such as an added result, a multiplied result, and an inner product of the secure data, for example.
[0203] Note that, the computation processing with the converted data of the secure data is referred to as the secure computation.
[0204] For the secure computation, the converted data of the secure data is used instead of the secure data itself. Various types of converted data, such as encrypted data and segmented data of the secure data, for example, are provided as the converted data.
[0205] An example of the secure computation is a GMW scheme described in Non-Patent Document 1 (O. Goldreich, S. Micali, and A. Wigderson. How to play any mental game. STOC'87, pp. 218-229, 1987), for example.
[0206] An outline of secure computation processing based on the GMW scheme will be described with reference to
[0207]
[0208] A device A 210 retains secure data X (e.g., explanatory variable (x)).
[0209] In addition, a device B 220 retains secure data Y (e.g., outcome variable (y)).
[0210] The secure data X and the secure data Y are the secure data, such as personal data, undesirable to release.
[0211] The device A 210 segments the secure data X into two pieces of data as below. Note that X is treated as a residue modulo a predetermined numerical value m.
X=((x_1)+(x_2)) mod m
[0212] In the above expression, (x_1) is selected from 0 to (m−1) uniformly and randomly, and (x_2) is determined to satisfy the following expression: (x_2)=(X−(x_1)) mod m.
[0213] In this manner, the two pieces of segmented data (x_1) and (x_2) are generated.
[0214] Note that, here, the data to be segmented is, for example, the value (1) of gender of a sample (user) in the secure data illustrated in
[0215] The value (0) of gender can be subjected to processing such as segmentation into (40) and (60) as segmented values (for example, with m=100, because (40+60) mod 100=0).
[0216] Age (54) can be subjected to processing such as segmentation into (10) and (44) or can be subjected to other various types of segmentation processing.
[0217] What is important is that the original secure data (explanatory variable) cannot be identified from an individual piece of converted data (here, one piece of segmented data).
[0218] For example, the segmented data is not released as a set, and, for example, only one piece of segmented data is released, namely, is provided to the other device.
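The segmentation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `split` and `reconstruct` are hypothetical, and the modulus m=100 in the usage note is chosen only to match the numeric examples in the text.

```python
import secrets

def split(value, m):
    """Split value into two additive pieces modulo m."""
    piece_1 = secrets.randbelow(m)      # uniform over 0 .. m-1
    piece_2 = (value - piece_1) % m     # chosen so the pieces sum to value mod m
    return piece_1, piece_2

def reconstruct(piece_1, piece_2, m):
    """Recombine both pieces; a single piece alone reveals nothing about value."""
    return (piece_1 + piece_2) % m
```

With m=100, `reconstruct(40, 60, 100)` gives 0 and `reconstruct(10, 44, 100)` gives 54, matching the gender and age examples. Because the first piece is uniform and independent of the value, either piece on its own is uniformly distributed, which is why releasing only one piece is acceptable.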
[0219] Meanwhile, the device B 220 also segments the secure data Y into two pieces of data as below:
Y=((y_1)+(y_2)) mod m.
[0220] In the above expression, (y_1) is selected from 0 to (m−1) uniformly and randomly, and (y_2) is determined to satisfy the following expression: (y_2)=(Y−(y_1)) mod m.
[0221] In this manner, the two pieces of segmented data (y_1) and (y_2) are generated.
[0222] As illustrated in
[0223] The device A 210 provides the device B 220 with the segmented data (x_1).
[0224] Meanwhile, the device B 220 provides the device A 210 with the segmented data (y_2).
[0225] X and Y each are the secure data, and thus are not allowed to leak.
[0226] However, even if only one piece of data of the pieces of segmented data (x_1) and (x_2) of X is acquired, the secure data X cannot be specified.
[0227] Similarly, even if only one piece of data of the pieces of segmented data (y_1) and (y_2) of Y is acquired, the secure data Y cannot be specified.
[0228] Therefore, a partial piece of the segmented data alone is insufficient to identify the secure data, and thus is allowed to be output externally.
[0229] In this manner, the device A 210 outputs the segmented data (x_1) to a computation-processing execution unit of the device B 220.
[0230] Meanwhile, the device B 220 outputs the segmented data (y_2) to a computation-processing execution unit of the device A 210.
[0231] (Step S21a)
[0232] At step S21a, the computation-processing execution unit of the device A 210 performs the following inter-segmented-data addition processing with the segmented data:
((x_2)+(y_2))mod m.
[0233] The device A 210 outputs an added result thereof to the computation-processing execution unit of the device B 220.
[0234] (Step S21b)
[0235] Meanwhile, at step S21b, the computation-processing execution unit of the device B 220 performs the following inter-segmented-data addition processing with the segmented data:
((x_1)+(y_1))mod m.
[0236] The device B 220 outputs an added result thereof to the computation-processing execution unit of the device A 210.
[0237] (Step S22a)
[0238] Next, at step S22a, the computation-processing execution unit of the device A 210 performs the following processing.
[0239] Two added results are further added, the two added results including: (1) the added result (x_2)+(y_2) of the segmented data calculated at step S21a; and (2) the added result (x_1)+(y_1) of the segmented data input from the device B 220. That is, the following computation is performed.
((x_1)+(y_1)+(x_2)+(y_2))mod m
[0240] The total added value of the segmented data is equivalent to the added value of the original secure data X and secure data Y.
[0241] That is, the following expression is satisfied: ((x_1)+(y_1)+(x_2)+(y_2)) mod m=(X+Y) mod m.
[0242] (Step S22b)
[0243] Meanwhile, at step S22b, the computation-processing execution unit of the device B 220 performs the following processing.
[0244] Two added results are further added, the two added results including: (1) the added result (x_1)+(y_1) of the segmented data calculated at step S21b; and (2) the added result (x_2)+(y_2) of the segmented data input from the device A 210. That is, the following computation is performed.
((x_1)+(y_1)+(x_2)+(y_2))mod m
[0245] The total added value of the segmented data is equivalent to the added value of the original secure data X and secure data Y.
[0246] That is, the following expression is satisfied: ((x_1)+(y_1)+(x_2)+(y_2)) mod m=(X+Y) mod m.
[0247] In this manner, both the device A and the device B can calculate, without outputting the secure data X and the secure data Y outward, respectively, the added value of the secure data X and the secure data Y, namely, X+Y.
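Steps S21a to S22b can be sketched in a single process as follows. This is only a simulation of the message flow, assuming a hypothetical function name `secure_add`; in practice the two halves would run on separate devices and only x_1, y_2, and the two partial sums would cross the boundary.

```python
import secrets

def split(value, m):
    # Additive segmentation modulo m, as described for X and Y.
    piece_1 = secrets.randbelow(m)
    return piece_1, (value - piece_1) % m

def secure_add(x, y, m):
    # Device A segments X; device B segments Y.
    x1, x2 = split(x, m)
    y1, y2 = split(y, m)
    # Exchange: A sends x1 to B, B sends y2 to A.
    partial_a = (x2 + y2) % m                    # step S21a, computed on device A
    partial_b = (x1 + y1) % m                    # step S21b, computed on device B
    # The partial sums are exchanged and combined on both sides.
    result_a = (partial_a + partial_b) % m       # step S22a
    result_b = (partial_b + partial_a) % m       # step S22b
    return result_a, result_b
```

For instance, `secure_add(54, 1, 100)` yields `(55, 55)`. The result is X+Y only modulo m, so m has to exceed the largest sum that can occur if the exact sum is needed.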
[0248] The processing illustrated in
[0249] Note that, the processing described with reference to
[0250]
[0251] The device A 210 retains the secure data X.
[0252] In addition, the device B 220 retains the secure data Y. The secure data X and the secure data Y are the secure data undesirable to release.
[0253] The device A 210 segments the secure data X into two pieces of data:
X=((x_1)+(x_2))mod m.
[0254] In this manner, the secure data X is randomly segmented to generate the two pieces of segmented data (x_1) and (x_2).
[0255] Meanwhile, the device B 220 also segments the secure data Y into two pieces of data:
Y=((y_1)+(y_2))mod m.
[0256] In this manner, the secure data Y is randomly segmented to generate the two pieces of segmented data (y_1) and (y_2).
[0257] At step S30 illustrated in
[0258] Meanwhile, the device B 220 provides the computation-processing execution unit of the device A 210 with the segmented data (y_2).
[0259] X and Y are the secure data, and thus are not allowed to leak.
[0260] However, even if only one piece of data of the pieces of segmented data (x_1) and (x_2) of X is acquired, the secure data X cannot be specified.
[0261] Similarly, even if only one piece of data of the pieces of segmented data (y_1) and (y_2) of Y is acquired, the secure data Y cannot be specified.
[0262] Therefore, a partial piece of the segmented data alone is insufficient to identify the secure data, and thus is allowed to be output externally.
[0263] In this manner, the device A 210 outputs the segmented data (x_1) to the computation-processing execution unit of the device B 220.
[0264] Meanwhile, the device B 220 outputs the segmented data (y_2) to the computation-processing execution unit of the device A 210.
[0265] Processing in the computation-processing execution unit of the device A 210 will be described.
[0266] The device A 210 retains the pieces of segmented data (x_1) and (x_2) of X and the segmented data (y_2) of Y received from the device B 220.
[0267] The processing is performed by the following procedure.
[0268] (Step S31a)
[0269] The computation-processing execution unit of the device A 210 performs [1-out-of-m OT] together with the device B 220, with an input value x_2 and an output value M_(x_2) satisfying M_(x_2)=(x_2)×(y_1)+r.
[0270] Note that, [1-out-of-m Oblivious Transfer (OT)] is an arithmetic protocol for performing the following processing.
[0271] Two entities being a sender and a selector are present.
[0272] The sender has an input value (M_0, M_1, . . . , M_(m−1)) including m number of elements.
[0273] The selector has an input value σ ∈ {0, 1, . . . , m−1}.
[0274] The selector requests the sender having the m number of elements to send one element, so that the selector can acquire only the value of the one element M_σ. The other (m−1) number of elements M_i (i≠σ) are not allowed to be acquired.
[0275] Meanwhile, the sender is not allowed to know the input value of the selector.
[0276] In this manner, the [1-out-of-m OT] protocol performs arithmetic processing with the transfer of only one element out of the m number of elements, and is configured so that the sender cannot identify which one of the m number of elements has been transferred.
[0277] (Step S32a)
[0278] The computation-processing execution unit of the device A 210 performs [1-out-of-m OT] together with the device B 220, with an input value y_2 and an output value M_(y_2) satisfying M_(y_2)=(x_1)×(y_2)+r′. Here, r′ is the random number selected by the device B 220 at step S32b, separately from the random number r used at step S31a.
[0279] (Step S33a)
[0280] As the output value of the device A 210, an output value: (x_2)(y_2)+M_(x_2)+M_(y_2) is computed in accordance with the following expression:
(x_2)(y_2)+M_(x_2)+M_(y_2)=((x_2)(y_2)+(x_2)(y_1)+r+(x_1)(y_2)+r′) mod m.
[0281] Processing in the computation-processing execution unit of the other device B 220 will be described.
[0282] The device B 220 retains the pieces of segmented data (y_1) and (y_2) of Y and the segmented data (x_1) of X received from the device A 210.
[0283] The processing is performed by the following procedure.
[0284] (Step S31b)
[0285] With selection of a random number r ∈ {0, . . . , m−1}, an input value string to be used for [1-out-of-m OT] is generated on the basis of the segmented value y_1 of the secure data Y, the input value string being i×(y_1)+r, where i=0, 1, . . . , (m−1).
[0286] Specifically, the following input value strings M_0 to M_(m−1) are generated:
M_0=0×(y_1)+r, M_1=1×(y_1)+r, . . . , M_(m−1)=(m−1)×(y_1)+r.
[0287] The input value strings are generated.
[0288] Furthermore, the computation-processing execution unit of the device B 220 performs [1-out-of-m OT] based on the setting at step S31a described above, together with the device A 210.
[0289] (Step S32b)
[0290] With selection of a random number r′ ∈ {0, . . . , m−1}, an input value string to be used for [1-out-of-m OT] is generated on the basis of the segmented value x_1 of the secure data X, the input value string being i×(x_1)+r′, where i=0, 1, . . . , (m−1).
[0291] Specifically, the following input value strings M_0 to M_(m−1) are generated:
M_0=0×(x_1)+r′, M_1=1×(x_1)+r′, . . . , M_(m−1)=(m−1)×(x_1)+r′.
[0292] The input value strings are generated.
[0293] Furthermore, the computation-processing execution unit of the device B 220 performs [1-out-of-m OT] based on the setting at step S32a described above, together with the device A 210.
[0294] (Step S33b)
[0295] The following output value is calculated as the output value of the device B 220, where r and r′ are the random numbers selected at steps S31b and S32b, respectively:
((x_1)(y_1)−r−r′) mod m.
[0296] The value is calculated as the output value of the device B 220.
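[0297] The following computation processing with the output value calculated by the device A 210 at step S33a and the output value calculated by the device B 220 at step S33b can calculate the multiplied value XY of the secure data X and the secure data Y, because the random numbers cancel out:
XY=((output value at step S33a)+(output value at step S33b)) mod m=((x_1)+(x_2))((y_1)+(y_2)) mod m.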
[0298] The mutual provision of the calculated result at step S33a and the calculated result at step S33b between the device A 210 and the device B 220 can calculate the multiplied value XY of the secure data X and the secure data Y.
[0299] In this manner, both the device A and the device B can calculate, without outputting the secure data X and the secure data Y outward, respectively, the multiplied value of the secure data X and the secure data Y, namely, XY.
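The multiplication steps S31a to S33b can be sketched as follows. This is a single-process simulation under explicit assumptions: `ideal_ot` is a hypothetical stand-in that only models the input/output behavior of [1-out-of-m OT] (a real protocol would hide the choice from the sender and the unchosen elements from the selector), and `secure_multiply` is likewise a hypothetical name. The two random masks are written `r` and `r_prime`.

```python
import secrets

def split(value, m):
    # Additive segmentation modulo m.
    piece_1 = secrets.randbelow(m)
    return piece_1, (value - piece_1) % m

def ideal_ot(messages, choice):
    # Idealized 1-out-of-m OT: the selector obtains messages[choice] only.
    # NOT a cryptographic implementation; it models the functionality alone.
    return messages[choice]

def secure_multiply(x, y, m):
    x1, x2 = split(x, m)                 # device A segments X
    y1, y2 = split(y, m)                 # device B segments Y
    # Exchange: B receives x1, A receives y2.

    # Step S31b: B masks i * y1 with a random r for every possible index i.
    r = secrets.randbelow(m)
    table_1 = [(i * y1 + r) % m for i in range(m)]
    # Step S31a: A selects index x2 and obtains x2 * y1 + r.
    m_x2 = ideal_ot(table_1, x2)

    # Step S32b: B masks i * x1 with a second random r_prime.
    r_prime = secrets.randbelow(m)
    table_2 = [(i * x1 + r_prime) % m for i in range(m)]
    # Step S32a: A selects index y2 and obtains x1 * y2 + r_prime.
    m_y2 = ideal_ot(table_2, y2)

    out_a = (x2 * y2 + m_x2 + m_y2) % m          # step S33a (device A)
    out_b = (x1 * y1 - r - r_prime) % m          # step S33b (device B)
    return out_a, out_b
```

Adding the two outputs modulo m gives X×Y mod m, since the masks cancel and the four cross terms form ((x_1)+(x_2))((y_1)+(y_2)). Note that each OT table has m entries, so the cost of this sketch grows linearly with the modulus.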
[0300] The processing illustrated in
[0301] Note that, the processing described with reference to
[0302] In addition, the exemplary secure computation processing illustrated in
[0303] Exemplary secure computation will be described with reference to
[0304] (Expression a) illustrated in
[0305] That is, (Expression a) is intended for estimating the parameter β in accordance with the maximum likelihood method with the Newton-Raphson method (iterative convergence method).
[0306] The parameter β is calculated with the Newton-Raphson method (iterative convergence method). Typically, the solution of the maximum likelihood estimate of the parameter β can be calculated by iterative computation of (Expression a) below.
[Math. 12]
$\beta^{(k+1)} = \beta^{(k)} + I^{-1}(\beta^{(k)})\,S(\beta^{(k)})$ (Expression a)
[0307] (Expression a) above is repeated until (Expression a2) below is satisfied.
[Math. 13]
$\left|\{L(\beta^{(k+1)}) - L(\beta^{(k)})\}/L(\beta^{(k)})\right| < \varepsilon$ (ε = approximately 0.00001) (Expression a2)
[0308] Iterating (Expression a) above until (Expression a2) above is satisfied yields the parameter β.
[0309] (Expression a) above can be expanded as illustrated in
[0310] As illustrated in
[0311] Furthermore, (Expression b) above includes matrices X and V expressed in (Expression b2) below.
[0312] As illustrated in
[0313] In addition, (Expression c) above includes (Expression d) and (Expression e) below as illustrated in
[0314] (Expression d) and (Expression e) above correspond to the simultaneous equations in (Expression 8) described earlier. That is, they correspond to the simultaneous equations obtained by setting to 0 the partial derivatives, with respect to β, of L(β)=log {like(β)}= . . . in (Expression 7), which is used to acquire the maximum likelihood estimate β_ML maximizing the likelihood function like(β).
[0315] As illustrated in
[0316] Note that, (d_j) included in (Expression d) and (Expression e) of
[0317] As described above, the iterative computation of (Expression a) illustrated in
[0318] However, as illustrated in
[0319] The secure data, namely, the explanatory variable (x) and the outcome variable (y) individually retained by the two different information processing devices, are not allowed to be shared or released.
[0320] Therefore, without use of the explanatory variable (x) and the outcome variable (y) remaining intact, the iterative computation processing of (Expression a) illustrated in
[0321] The secure computation performs computation applied with the converted data of each piece of secure data input or output between the devices, for example, generation of the converted data of the secure data (e.g., segmented data) and input or output of the converted data between the devices, as described with reference to
[0322] For example, the matrix X and the matrix V expressed in
[0323] Therefore, in order to perform the secure computation, there is a need to generate the converted data, such as the segmented data, for each of the explanatory variables included in the matrix X and the matrix V illustrated in
[0324] For (Expression d) and (Expression e) illustrated in
[0325] The processing load of such data conversion processing, data input/output processing, and computation processing with the converted data increases as the amount of secure data to be applied to the secure computation increases.
[0326] Therefore, for a large amount of secure data, the iterative computation processing of (Expression a) illustrated in
[0327] [5. Estimation Method of Logistic Regression Parameter with Secure Computation Reduced]
[0328] As described above, in a case where the pieces of data of the explanatory variable (x) and the outcome variable (y) are separately retained by, for example, the different organizations and the pieces of data are not allowed to be disclosed mutually, the estimation of the parameter β=β_0, . . . , β_r with the secure computation requires a large amount of computational time and computational resources, and thus has a problem that the computational cost increases.
[0329] A configuration that solves the problem, namely, processing capable of estimating the logistic regression parameter β=β_0, . . . , β_r with reduced computational complexity of the secure computation and without mutual disclosure of the pieces of data of the explanatory variable (x) and the outcome variable (y), will be described below.
[0330] As described earlier with reference to
[0331] In addition, each company regards its data as an asset having economic value and is reluctant to supply the data to a different company.
[0332] Meanwhile, there is a need to acquire far more knowledge by combining data between different companies than can be obtained through individual use. In the processing to be described below according to the present disclosure, the two entities (information processing device A 110 and information processing device B 120) illustrated in
[0333] Note that setting the estimated parameters into, for example, the logistic regression model ((Expression 1) described above) enables the probability p(x), namely, the estimate of the outcome variable (y), to be calculated from various values of the explanatory variable (x).
[0334] That is, each of the entities (information processing device A 110 and information processing device B 120) can estimate the relationship between the explanatory variable (x) and the outcome variable (y).
[0335] Each of the two different devices, retaining only either the explanatory variable (x) or the outcome variable (y), performs data conversion, such as encryption, on its own explanatory variable (x) or outcome variable (y), and provides the other device with the converted data.
[0336] The logistic regression parameters β_0, . . . , β_r set in the logistic regression model, namely, (Expression 1) described above are estimated with application of the converted data.
[0337]
[0338]
[0339] The parameter-calculation execution units 111 and 121 perform the parameter estimation without leaking the explanatory variable (x) and the outcome variable (y) outward.
[0340] The parameter-calculation execution unit 111 of the information processing device A 110 being the outcome-variable retaining device, includes an input unit 131, an inner-product computation unit 132, an iterative-computation input-value generation unit 133, and a data transmission/reception unit 134.
[0341] Meanwhile, the parameter-calculation execution unit 121 of the information processing device B 120 being the explanatory-variable retaining device, includes an input unit 141, an inner-product computation unit 142, a data transmission/reception unit 143, an iterative computation unit 144, and an output unit 145.
[0342]
[0343] That is, the flowchart describes the processing sequence of estimating the logistic regression parameter β=β_0, . . . , β_r in the logistic regression model (Expression 1) with the maximum likelihood method.
[0344] The sequence of the calculation processing of the logistic regression parameter β=β_0, . . . , β_r with the maximum likelihood method will be specifically described below with reference to the block diagram illustrated in
[0345] (a. Setting)
[0346] The element (i), and the explanatory variable (x) and the outcome variable (y) set corresponding to each element, included in the data to be subjected to the calculation processing of the logistic regression parameter β=β_0, . . . , β_r in the logistic regression model (Expression 1), are set as follows:
[0347] For n number of samples and the i-th sample (i=1, . . . , n),
[0348] outcome variable: y_i ∈ {0, 1} and
[0349] explanatory variable: r number of variables (x^i_1, x^i_2, . . . , x^i_r).
[0350] The explanatory variable and the outcome variable are associated with each other.
[0351] The information processing device A 110 retains data y_i (i=1, . . . , n) including an outcome variable value.
[0352] The information processing device B 120 retains data (x^i_1, x^i_2, . . . , x^i_r) (i=1, . . . , n) including an explanatory variable value.
[0353] The pieces of data are the secure data not allowed to be released.
[0354] The logistic regression parameter β=β_0, . . . , β_r is estimated without mutual disclosure of the outcome variable and the explanatory variable individually retained by the devices.
[0355] (b. Procedure)
[0356] Next, the procedure of the estimation processing of the logistic regression parameter β=β_0, . . . , β_r will be described.
[0357] The processing at each step in the flowchart illustrated in
[0358] (Step S101)
[0359] The processing at step S101 includes data input processing of the input units.
[0360] At step S101a, the input unit 131 of the parameter-calculation execution unit 111 in the information processing device A 110 being the outcome-variable (y) retaining device illustrated in
[0361] Meanwhile, at step S101b, the input unit 141 of the parameter-calculation execution unit 121 in the information processing device B 120 being the explanatory-variable (x) retaining device acquires the explanatory variables (x^i_1, x^i_2, . . . , x^i_r) (note that, i=1, . . . , n) retained in a storage unit of the information processing device B 120, from the storage unit, to input the explanatory variables (x^i_1, x^i_2, . . . , x^i_r) into the parameter-calculation execution unit 121.
[0362] (Step S102)
[0363] The processing at step S102 includes processing to be performed by the inner-product computation units 132 and 142 in the parameter-calculation execution units 111 and 121 of the information processing device A 110 and the information processing device B 120, respectively.
[0364] The inner-product computation units 132 and 142 calculate the inner product (t_s) of the explanatory variable (x) and the outcome variable (y), in accordance with (Expression 12) below.
[Math. 17]
$t_s = \sum_{i=1}^{n} x_s^i\, y_i$ (s=1, . . . , r) (Expression 12)
[0365] Note that, because the explanatory variable (x) and the outcome variable (y) both are the secure data subject to restriction of release, the calculation processing of the inner product (t_s) based on (Expression 12) above is performed with arithmetic not applied directly with the explanatory variable (x) and the outcome variable (y) being the secure data, namely, the secure computation applied with the converted data of the explanatory variable (x) and the outcome variable (y) as described with reference to
[0366] The calculation processing of the inner product (t_s) based on (Expression 12) above is performed with the secure computation, which does not directly use the data y_i (i=1, . . . , n) including the outcome variable value, being the input value of the information processing device A 110, or the data (x^i_1, x^i_2, . . . , x^i_r) (i=1, . . . , n) including the explanatory variable value, being the input value of the information processing device B 120.
[0367] As described earlier with reference to
[0368] Note that, the inner product (t_s) of the explanatory variable (x) and the outcome variable (y) expressed in (Expression 12) above can be expressed in (Expression 13) below including (d) in (2) the profile unit data illustrated on the right of
[0369] The arithmetic applied with d expressed in (Expression 13) above, namely, the arithmetic expression applied with the data d corresponding to the number of samples having the outcome variable (y) satisfying y=1, is included in part of (Expression e) in the computational expression for estimating the parameter in accordance with the maximum likelihood method with the Newton-Raphson method (iterative convergence method) described earlier with reference to
[0370]
[0371] As illustrated in
[0372] The calculation processing of the inner product (t_s) to be performed at step S102, namely, the calculation processing of the inner product (t_s) of the explanatory variable (x) and the outcome variable (y) corresponds to processing of performing, as the secure computation, the arithmetic expression 301 in (Expression e) in
[0373] Note that, as described above, for the secure computation, the converted data of the secure data is used instead of the secure data itself.
[0374] Various types of converted data, such as encrypted data of the secure data and the segmented data described with reference to
[0375]
[0376]
[0377] In addition,
[0378] As described with reference to
[0379] The processing at step S102 illustrated in the flowchart of
[0380] A combination of the processing of calculating the added value of the secure data X and the secure data Y described earlier with reference to
[0381] That is, at step S102, the information processing device A 110 and the information processing device B 120 each output only the converted data to the other device to calculate the inner product (t_s) of the explanatory variable (x) and the outcome variable (y) with the secure computation, without mutual disclosure of the value of the outcome variable (y) and the value of the explanatory variable (x) being the secure data retained by the devices.
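One way to picture how the multiplication and addition building blocks combine into the inner product (t_s) is sketched below. The sketch rests on assumptions: `product_pieces` is a hypothetical stand-in that merely reproduces the output property of the share-based multiplication (two outputs that sum to x×y modulo m) without running any oblivious transfer, and `secure_inner_product` is a hypothetical name. The modulus m is assumed to be larger than the true inner product.

```python
import secrets

def product_pieces(x, y, m):
    # Hypothetical stand-in for the secure multiplication: returns two values,
    # one per device, each uniform on its own, whose sum is x * y modulo m.
    out_a = secrets.randbelow(m)
    out_b = (x * y - out_a) % m
    return out_a, out_b

def secure_inner_product(xs, ys, m):
    # xs: values x_s^i of one explanatory variable; ys: outcome values y_i.
    sum_a = 0   # running total kept by one device
    sum_b = 0   # running total kept by the other device
    for x, y in zip(xs, ys):
        out_a, out_b = product_pieces(x, y, m)
        sum_a = (sum_a + out_a) % m     # local addition, no communication
        sum_b = (sum_b + out_b) % m
    # Only the two totals are exchanged and added, revealing t_s alone.
    return (sum_a + sum_b) % m
```

For example, `secure_inner_product([54, 30, 41], [1, 0, 1], 1009)` returns 95, the sum of the x values for the samples with y=1. Summing the per-sample outputs locally before the exchange means that the individual products are never revealed, only their total.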
[0382] (Step S103)
[0383] Next, at step S103 of the flow illustrated in
[0384] The data transmission/reception unit 143 of the parameter-calculation execution unit 121 in the information processing device B 120 being the explanatory-variable (x) retaining device receives the sum total (t_0) of the outcome variable (y) transmitted by the information processing device A.
[0385] Note that, the sum total (t_0) of the outcome variable (y) expressed in (Expression 14) above can be expressed in (Expression 15) below including (d) in (2) the profile unit data illustrated on the right of
[0386] The arithmetic applied with d expressed in (Expression 15) above, namely, the arithmetic expression applied with the data d corresponding to the number of samples having the outcome variable (y) satisfying y=1, is included in part of (Expression d) expressed in the computational expression for estimating the parameter in accordance with the maximum likelihood method with the Newton-Raphson method (iterative convergence method) described earlier with reference to
[0387] As illustrated in
[0388] The calculation processing of the sum total (t_0) of the outcome variable (y), to be performed at step S103, corresponds to processing of performing the arithmetic expression 302 in (Expression d) in
[0389] Note that, because the processing at step S103 is performed inside the information processing device A 110 being the outcome-variable (y) retaining device, the processing is not required to be performed as the secure computation.
[0390] That is, the processing at step S103 does not require generation of the converted data of the outcome variable (y) or output of the converted data to the external device. The arithmetic device inside the information processing device A 110 acquires the outcome variable (y), being the secure data retained inside the information processing device A 110, and applies it as is to calculate the sum total (t_0) of the outcome variable (y).
[0391] Note that, the sum total (t_0) of the outcome variable (y) is not the secure data and thus can be output outward.
[0392] In this manner, the information processing device A 110 being the outcome-variable (y) retaining device calculates the sum total (t_0) of the outcome variable (y) with typical arithmetic processing applied directly to the secure data, instead of the secure computation, and outputs the sum total (t_0) to the information processing device B 120.
[0393] Such typical arithmetic processing can make a considerable reduction in computational time or computational resources in comparison to performance of the secure computation.
[0394] The iterative-computation input-value generation unit 133 in the information processing device A 110 calculates the sum total (t_0) of the outcome variable (y) in accordance with (Expression 14) or (Expression 15) described above to output the calculated value to the parameter-calculation execution unit 121 in the information processing device B 120 through the data transmission/reception unit 134.
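Why the aggregates (t_0) and (t_s) are the only quantities that depend on the outcome variable can be seen from the following derivation. It assumes the standard logistic regression log-likelihood; the shorthand η_i and p_i is introduced here only for illustration and is not notation taken from the expressions of this description.

```latex
\begin{aligned}
t_0 &= \sum_{i=1}^{n} y_i, \qquad
t_s = \sum_{i=1}^{n} x_s^i\, y_i \quad (s = 1, \dots, r),\\
\eta_i &= \beta_0 + \sum_{s=1}^{r} \beta_s x_s^i, \qquad
p_i = \frac{1}{1 + e^{-\eta_i}},\\
L(\beta) &= \sum_{i=1}^{n} \bigl\{ y_i \eta_i - \log(1 + e^{\eta_i}) \bigr\}
          = \beta_0 t_0 + \sum_{s=1}^{r} \beta_s t_s - \sum_{i=1}^{n} \log(1 + e^{\eta_i}),\\
\frac{\partial L}{\partial \beta_0} &= t_0 - \sum_{i=1}^{n} p_i, \qquad
\frac{\partial L}{\partial \beta_s} = t_s - \sum_{i=1}^{n} x_s^i\, p_i .
\end{aligned}
```

Under this assumption, every remaining term involves only the explanatory variable and the current parameter value, so a device holding x needs nothing about y beyond t_0 and t_1, . . . , t_r.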
[0395] (Step S104)
[0396] Next, at step S104, the iterative computation unit 144 of the parameter-calculation execution unit 121 in the information processing device B 120 being the explanatory-variable (x) retaining device applies the iterative computation of the Newton-Raphson method to the expression based on the logistic regression model expressed in (Expression 1) described earlier, to update and calculate the logistic regression parameter β_i (i=0, 1, . . . , r).
[0397] Specifically, the computation for (a) and (b) expressed in (Expression 17) below is repeated until (Expression 16) below is satisfied for a preset ε (e.g., ε=0.00001).
[0398] Repeating the computation for (a) and (b) expressed in (Expression 17) until (Expression 16) above is satisfied updates the logistic regression parameter β_i (i=0, 1, . . . , r), and the parameter at the point in time when (Expression 16) above is satisfied is determined as the output parameter.
[0399] Note that an appropriate arbitrary value may be set as the parameter initial value β^(0) in (Expression 16) and (Expression 17) above.
[0400] In addition, the meaning of each symbol expressed in (Expression 16) and (Expression 17) above is the same as that of each symbol expressed in (Expression 6) to (Expression 11) described earlier as the estimation processing of the logistic regression parameter based on the maximum likelihood method. For example, the following expression is provided:
L(β)=log {like(β)}.
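The iteration can be sketched as follows. This is only an illustration under stated assumptions: it uses the textbook Newton-Raphson update for logistic regression, β ← β + (XᵀVX)⁻¹(t − Xᵀp), and stops when the largest change in any parameter falls below ε, which may differ in form from (Expression 16) and (Expression 17). The names `solve` and `newton_raphson` and the sample data are hypothetical. The outcome variable enters only through the vector t = (t_0, t_1, . . . , t_r), so the explanatory-variable retaining device can run the loop on its own data as they are.

```python
import math


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def newton_raphson(X, t, eps=1e-5, max_iter=50):
    """X: n rows of [1, x_1, ..., x_r]; t: [t_0, t_1, ..., t_r]."""
    k = len(t)
    beta = [0.0] * k  # arbitrary initial value
    for _ in range(max_iter):
        # Predicted probabilities under the current parameter.
        p = [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
             for row in X]
        # Gradient of the log-likelihood: t - X^T p.
        grad = [t[s] - sum(row[s] * pi for row, pi in zip(X, p))
                for s in range(k)]
        # X^T V X with V = diag(p_i (1 - p_i)); no outcome data is needed.
        hess = [[sum(row[a] * row[b] * pi * (1.0 - pi)
                     for row, pi in zip(X, p))
                 for b in range(k)] for a in range(k)]
        step = solve(hess, grad)
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < eps:  # assumed stopping rule
            break
    return beta


# Hypothetical data: one explanatory variable, six samples.
X = [[1.0, float(v)] for v in [0, 1, 2, 3, 4, 5]]
t = [3.0, 11.0]  # t_0 and t_1 for outcomes y = [0, 0, 1, 0, 1, 1]
beta = newton_raphson(X, t)
```

A hand-written linear solver is used only to keep the sketch free of external libraries; any numerical library routine could take its place.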
[0401] At step S104, the processing to be performed by the iterative computation unit 144 of the parameter-calculation execution unit 121 in the information processing device B 120 being the explanatory-variable (x) retaining device includes the iterative computation of the Newton-Raphson method illustrated in
[0402] However, no secure computation is required in the iterative computation of the Newton-Raphson method at step S104.
[0403] Also at step S104, for example, the matrix X and the matrix V are computed in the iterative computation of the Newton-Raphson method illustrated in
[0404] However, the information processing device B 120 being the explanatory-variable retaining device performs the processing at step S104.
[0405] The information processing device B 120 being the explanatory-variable retaining device sets the matrix X and the matrix V expressed in (Expression b2) of
[0406] That is, the information processing device B 120 being the explanatory-variable retaining device does not need to output the secure data (explanatory variable) outward, and thus can perform the computation with the matrices X and V that include the explanatory variable input at step S101b, with the explanatory variable remaining intact.
[0407] In addition, the value (d) based on the outcome variable (y) being the secure data is used in (Expression d) illustrated in
[0408] However, at step S103, the information processing device A 110 being the outcome-variable retaining device generates the computed result with the value (d) based on the outcome variable (y), namely, the arithmetic result (t_0) of the arithmetic expression 302 illustrated in
[0409] Therefore, the information processing device B 120 is required only to substitute the input value (t_0) into (Expression d) of
[0410] The arithmetic expression 301 expressed in (Expression e) of
[0411] In this manner, the performance of the processing based on the flow illustrated in
[0412] (Step S105)
[0413] Next, at step S105, the output unit 145 of the parameter-calculation execution unit 121 in the information processing device B 120 being the explanatory-variable (x) retaining device outputs the logistic regression parameter β_i (i=0, 1, . . . , r) calculated at step S104 to the data processing unit in the information processing device B 120.
[0414] The data processing unit in the information processing device B 120 substitutes the logistic regression parameter β_i (i=0, 1, . . . , r) output from the parameter-calculation execution unit 121, into the logistic regression model, namely, (Expression 1) described earlier, to perform processing of estimating the outcome variable (y) from various values of the explanatory variable (x).
[0415] As described earlier, in accordance with the logistic regression model expressed in (Expression 1), the probability p(x) of occurrence of the event can be calculated under the condition including the observation values (x_1, . . . , x_r) of the explanatory variable (x) given.
[0416] The probability p(x) corresponds to the value of the outcome variable (y).
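The estimation can be illustrated with a short Python sketch. (Expression 1) is not reproduced in this passage, so the sketch assumes the standard logistic form p(x) = 1 / (1 + exp(−(β_0 + β_1 x_1 + . . . + β_r x_r))). The function name `predict_probability` and the numerical values are hypothetical.

```python
import math


def predict_probability(beta, x):
    """Return p(x) for parameters beta = [b_0, ..., b_r] and x = [x_1, ..., x_r].

    Assumes the standard logistic model; beta[0] is the intercept.
    """
    if len(beta) != len(x) + 1:
        raise ValueError("beta must have one more element than x")
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameter values and a new observation.
beta = [-2.0, 0.8]
p = predict_probability(beta, [4.0])  # probability that the event occurs
```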
[0417] Note that, as interpreted from the flowchart illustrated in
[0418] The information processing device A 110 being the outcome-variable (y) retaining device does not perform the calculation of the logistic regression parameter β_i (i=0, 1, . . . , r).
[0419] The information processing device B 120 being the explanatory-variable (x) retaining device that has performed the calculation of the logistic regression parameter β_i (i=0, 1, . . . , r), can provide the calculated parameter to the information processing device A 110 in response to a request from the information processing device A 110 being the outcome-variable (y) retaining device. The logistic regression parameter β_i (i=0, 1, . . . , r) itself is not the secure data, and thus is allowed to be subjected to input/output processing or sharing processing between the devices.
[0420] In the processing based on the flow illustrated in
[0421] That is, as described earlier, only the calculation processing of the inner product (t_s) based on (Expression 13) below, is included.
[0422] The inner product (t_s) of the explanatory variable (x) and the outcome variable (y) expressed in (Expression 13) above is arithmetic including the explanatory variable (x) and the outcome variable (y) being the secure data not allowed to be released, and the arithmetic is required to be performed as the secure computation.
[0423] That is, for example, as described earlier with reference to
[0424] However, in the flow illustrated in
[0425] That is, the secure computation of, for example, the matrix X and the matrix V required in the iterative computation of the Newton-Raphson method described earlier with reference to
[0426] [6. Reduction Effect in Computational Complexity of Parameter Calculation Processing According to Present Disclosure]
[0427] Next, a reduction effect in the computational complexity of the parameter calculation processing according to the present disclosure, will be described with reference to two flowcharts illustrated in
[0428]
[0429] (1) a processing flow to be performed with the secure computation having the converted data of all of the explanatory variable (x) and the outcome variable (y) to be applied to the iterative computation of the Newton-Raphson method, and
[0430] (2) a processing flow according to the present disclosure to be performed with the secure computation only for the calculation processing of the inner product (t_s) of the explanatory variable (x) and the outcome variable (y).
[0431] The calculation sequence of the logistic regression parameter β_i (i=0, 1, . . . , r) based on each of the two processing flows, will be described.
[0432] First, (1) the processing to be performed with the secure computation having the converted data of all of the explanatory variable (x) and the outcome variable (y) to be applied to the iterative computation of the Newton-Raphson method will be described in accordance with the flowchart illustrated in
[0433] (Steps S201a and S201b)
[0434] The processing at steps S201a and S201b includes the data input processing of the input units.
[0435] At step S201a, the information processing device A 110 being the outcome-variable (y) retaining device acquires the outcome variable y_i (note that, i=1, . . . , n) retained in the storage unit of the information processing device A 110, from the storage unit, to input the outcome variable y_i into the data processing unit (arithmetic execution unit) of the information processing device A 110.
[0436] Meanwhile, at step S201b, the information processing device B 120 being the explanatory variable (x) retaining device acquires the explanatory variables (x.sup.i_1, x.sup.i_2, . . . , x.sup.i_r) (note that, i=1, . . . , n) retained in the storage unit of the information processing device B 120, from the storage unit, to input the explanatory variables (x.sup.i_1, x.sup.i_2, . . . , x.sup.i_r) into the data processing unit (arithmetic execution unit).
[0437] (Steps S202a and S202b)
[0438] The processing at steps S202a and S202b includes the generation processing of the converted data of the secure data in the data processing units (arithmetic execution units) of the information processing device A 110 and the information processing device B 120.
[0439] The explanatory variable (x) and the outcome variable (y) both are the secure data subject to restriction of release, and thus the secure data is not allowed to be directly used in the calculation processing of the logistic regression parameter β_i (i=0, 1, . . . , r).
[0440] Thus, the generation processing of the converted data of the explanatory variable (x) and the outcome variable (y) being the secure data is performed.
[0441] At step S202a, the information processing device A 110 being the outcome-variable retaining device generates the converted data of the outcome variable (y).
[0442] Meanwhile, at step S202b, the information processing device B 120 being the explanatory variable (x) retaining device generates the converted data of the explanatory variable (x).
[0443] Various modes of converted data, such as encrypted data of the secure data (explanatory variable (x) and outcome variable (y)) and the segmented data described with reference to
[0444] (Step S203)
[0445] The next processing at step S203 includes the calculation processing of the logistic regression parameter β_i (i=0, 1, . . . , r) based on the maximum likelihood method with the Newton-Raphson method (iterative convergence method) described earlier with reference to
[0446] As described earlier with reference to
[0447] However, as illustrated in
[0448] The secure data, namely, the explanatory variable (x) and the outcome variable (y) individually retained by the two different information processing devices are not allowed to be released mutually.
[0449] Therefore, the iterative computation processing of (Expression a) illustrated in
[0450] The secure computation needs processing of individually converting the secure data and making an input or output between the devices, for example, generation of the segmented data of the secure data and input or output of part of the segmented data between the devices as described with reference to
[0451] For example, the matrix X and the matrix V expressed in (Expression b2) of
[0452] Therefore, in order to perform the secure computation, for example, processing of generating the converted data, such as the segmented data, for each of the explanatory variables included in the matrix X and the matrix V expressed in (Expression b2) of
[0453] For (Expression d) and (Expression e) illustrated in
[0454] Such data conversion processing and data input/output processing increase as the amount of secure data to be applied to the secure computation increases.
[0455] Therefore, for a large amount of secure data, the iterative computation processing of (Expression a) illustrated in
[0456] That is, the processing at step S203 illustrated in
[0457] (Step S204)
[0458] After the calculation of the logistic regression parameter β_i (i=0, 1, . . . , r) with the secure computation at step S203, the two information processing devices A and B next output the parameter to the data processing units at step S204.
[0459] The data processing units each perform, for example, processing of estimating an outcome variable from a new explanatory variable with the calculated parameter, in accordance with (Expression 1) described earlier, namely, the logistic regression model.
[0460] In the flow illustrated in
[0461] This is because, as described earlier with reference to
[0462] The matrix X and the matrix V expressed in (Expression b2) of
[0463] For (Expression d) and (Expression e) illustrated in
[0464] Therefore, in a case where the computation of the expressions is performed, there is a need to perform computation processing with generation of the converted data, such as the segmented data, corresponding to each of the explanatory variables and the outcome variables being the secure data.
[0465] In this manner, the performance of the processing based on the flow illustrated in
[0466] Next, the flow illustrated in
[0467] (Steps S301a and S301b)
[0468] The processing at steps S301a and S301b includes the data input processing of the input units.
[0469] At step S301a, the information processing device A 110 being the outcome-variable (y) retaining device acquires the outcome variable y_i (note that, i=1, . . . , n) retained in the storage unit of the information processing device A 110, from the storage unit, to input the outcome variable y_i into the data processing unit (arithmetic execution unit) of the information processing device A 110.
[0470] Meanwhile, at step S301b, the information processing device B 120 being the explanatory variable (x) retaining device acquires the explanatory variables (x.sup.i_1, x.sup.i_2, . . . , x.sup.i_r) (note that, i=1, . . . , n) retained in the storage unit of the information processing device B 120, from the storage unit, to input the explanatory variables (x.sup.i_1, x.sup.i_2, . . . , x.sup.i_r) into the data processing unit (arithmetic execution unit).
[0471] (Steps S302a and S302b)
[0472] The processing at steps S302a and S302b includes the generation processing of the converted data of the secure data in the data processing units (arithmetic execution units) of the information processing device A 110 and the information processing device B 120.
[0473] The explanatory variable (x) and the outcome variable (y) both are the secure data subject to restriction of release, and thus the secure data is not allowed to be directly used in the calculation processing of the logistic regression parameter β_i (i=0, 1, . . . , r).
[0474] Thus, the generation processing of the converted data of the explanatory variable (x) and the outcome variable (y) being the secure data is performed.
[0475] At step S302a, the information processing device A 110 being the outcome-variable retaining device generates the converted data of the outcome variable (y).
[0476] Meanwhile, at step S302b, the information processing device B 120 being the explanatory variable (x) retaining device generates the converted data of the explanatory variable (x).
[0477] Various modes of converted data, such as encrypted data of the secure data (explanatory variable (x) and outcome variable (y)) and the segmented data described with reference to
[0478] (Step S303)
[0479] The processing at step S303 includes the calculation processing of the inner product (t_s) of the explanatory variable (x) and the outcome variable (y) in the data processing units (arithmetic execution units) of the information processing device A 110 and the information processing device B 120.
[0480] The processing corresponds to the processing at step S102 in the flow of
[0481] As described earlier, the inner product (t_s) of the explanatory variable (x) and the outcome variable (y) is calculated in accordance with (Expression 12) below.
[Math. 24]
t_s = \sum_{i=1}^{n} x_s^i y_i   (s=1, . . . , r)   (Expression 12)
[0482] Note that, as described above, the inner product (t_s) of the explanatory variable (x) and the outcome variable (y) expressed in (Expression 12) above, can be expressed in (Expression 13) below including (d) in (2) the profile unit data illustrated on the right of
[0483] As described with reference to
[0484] Because the explanatory variable (x) and the outcome variable (y) both are the secure data subject to restriction of release, the calculation processing of the inner product (t_s) based on (Expression 12) above is required to be performed with arithmetic not applied directly with the explanatory variable (x) and the outcome variable (y) being the secure data, namely, the secure computation as described with reference to
[0485] The converted data of the secure data (explanatory variable (x) and outcome variable (y)) generated at steps S302a and S302b, is used for the secure computation.
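As a rough illustration of how an inner product can be obtained from segmented data without either device releasing its raw values, the following Python sketch uses additive secret sharing modulo a prime together with a multiplication triple supplied by a trusted dealer. This is a generic textbook technique chosen for the example and is not asserted to be the secure computation of the present disclosure. The dealer, the modulus `P`, the function names, and the sample data are all assumptions. Both parties are simulated in one process for readability.

```python
import secrets

P = 2**61 - 1  # prime modulus for the segments (assumed for this sketch)


def split(value):
    """Split one integer into two additive segments modulo P."""
    first = secrets.randbelow(P)
    return first, (value - first) % P


def split_vector(values):
    pairs = [split(v) for v in values]
    return [a for a, _ in pairs], [b for _, b in pairs]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % P


def secure_inner_product(x_s, y):
    """Return sum_i x_s[i] * y[i] mod P, computed on segments only."""
    n = len(x_s)
    # Hypothetical trusted dealer: random vectors a, b and c = a . b,
    # each handed out as two segments.
    a = [secrets.randbelow(P) for _ in range(n)]
    b = [secrets.randbelow(P) for _ in range(n)]
    a0, a1 = split_vector(a)
    b0, b1 = split_vector(b)
    c0, c1 = split(dot(a, b))
    # Device B segments x_s and device A segments y; index 0 and index 1
    # denote the segments held by the two parties.
    x0, x1 = split_vector(x_s)
    y0, y1 = split_vector(y)
    # The parties open only the masked differences d = x - a and e = y - b.
    d = [(x0[i] - a0[i] + x1[i] - a1[i]) % P for i in range(n)]
    e = [(y0[i] - b0[i] + y1[i] - b1[i]) % P for i in range(n)]
    # Local computation on segments; the public term d . e is added once.
    z0 = (c0 + dot(d, b0) + dot(e, a0) + dot(d, e)) % P
    z1 = (c1 + dot(d, b1) + dot(e, a1)) % P
    return (z0 + z1) % P  # reconstructed inner product t_s


# Hypothetical data: x_1 held by device B, y held by device A.
x_1 = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 0, 1, 1]
t_1 = secure_inner_product(x_1, y)
```

The sketch assumes non-negative integer inputs whose true inner product is smaller than `P`; real-valued explanatory variables would need a fixed-point encoding, which is omitted here.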
[0486] In the flow illustrated in
[0487] Only the computation processing of part of (Expression e) described earlier with reference to
[0488] Similarly to the flow illustrated in
[0489] In the flow illustrated in
[0490] However, in the processing based on the flow illustrated in
[0491] (Step S304)
[0492] In the next processing at step S304, the information processing device A 110 being the outcome-variable (y) retaining device calculates the sum total (t_0) of the outcome variable (y) in accordance with (Expression 14) below, and outputs the calculated value to the parameter-calculation execution unit 121 of the information processing device B 120 through the data transmission/reception unit 134.
[0493] Note that, the sum total (t_0) of the outcome variable (y) expressed in (Expression 14) above can be expressed in (Expression 15) below including (d) in (2) the profile unit data illustrated on the right of
[0494] The arithmetic applied with d expressed in (Expression 15) above, namely, the arithmetic expression applied with the data d corresponding to the number of samples having the outcome variable (y) satisfying y=1, is included in part of (Expression d) expressed in the computational expression for estimating the parameter in accordance with the maximum likelihood method with the Newton-Raphson method (iterative convergence method) described earlier with reference to
[0495] As illustrated in
[0496] The calculation processing of the sum total (t_0) of the outcome variable (y), to be performed at step S304, corresponds to the processing of performing the arithmetic expression 302 in (Expression d) in
[0497] Note that, the processing at step S304 is performed inside the information processing device A 110 being the outcome-variable (y) retaining device, and thus the processing is not required to be performed as the secure computation.
[0498] That is, the processing at step S304 requires neither generation processing of the converted data of the outcome variable (y) nor output processing of the converted data to the external device. The arithmetic device inside the information processing device A 110 acquires the outcome variable (y) being the secure data retained inside the information processing device A 110, and calculates the sum total (t_0) of the outcome variable (y) with application of the acquired outcome variable (y) remaining intact.
[0499] In this manner, the typical arithmetic processing applied with the secure data, instead of the secure computation, can make a considerable reduction in computational time or computational resources in comparison to performance of the secure computation.
[0500] The information processing device A 110 calculates the sum total (t_0) of the outcome variable (y) in accordance with (Expression 14) or (Expression 15) described above to output the calculated value to the information processing device B 120. The sum total (t_0) of the outcome variable (y) itself is not the secure data, and thus can be output outward.
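The reason the two aggregates are sufficient can be seen from the gradient of the log-likelihood. The following derivation is a sketch that assumes the standard logistic model with a constant term; the symbol p_i and the arrangement of terms are assumptions and may differ from the arrangement of (Expression d) and (Expression e).

```latex
\begin{aligned}
L(\beta) &= \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right],
\qquad
p_i = \frac{1}{1 + \exp\left( -\beta_0 - \sum_{s=1}^{r} \beta_s x_s^i \right)} \\
\frac{\partial L}{\partial \beta_0} &= \sum_{i=1}^{n} (y_i - p_i)
  = t_0 - \sum_{i=1}^{n} p_i \\
\frac{\partial L}{\partial \beta_s} &= \sum_{i=1}^{n} x_s^i (y_i - p_i)
  = t_s - \sum_{i=1}^{n} x_s^i p_i \qquad (s = 1, \ldots, r)
\end{aligned}
```

Under this assumption the outcome variable appears only through t_0 and t_s, while p_i and the second derivatives depend only on the explanatory variable and the current parameter.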
[0501] (Step S305)
[0502] Next, at step S305, the information processing device B 120 being the explanatory variable (x) retaining device performs the iterative computation of the Newton-Raphson method described earlier with reference to
[0503] (Step S306)
[0504] Next, at step S306, the information processing device B 120 being the explanatory variable (x) retaining device, outputs the logistic regression parameter β_i (i=0, 1, . . . , r) calculated at step S305, to the data processing unit of the information processing device B 120.
[0505] The data processing unit of the information processing device B 120 substitutes the logistic regression parameter β_i (i=0, 1, . . . , r) into the logistic regression model, namely, (Expression 1) described earlier, to perform the processing of estimating the outcome variable (y) from various values of the explanatory variable (x).
[0506] Note that the information processing device B 120 being the explanatory variable (x) retaining device that has performed the calculation of the logistic regression parameter β_i (i=0, 1, . . . , r) provides the calculated parameter to the information processing device A 110 in response to a request from the information processing device A 110 being the outcome-variable (y) retaining device. The logistic regression parameter β_i (i=0, 1, . . . , r) itself is not the secure data, and thus is allowed to be subjected to the input/output processing or the sharing processing between the devices.
[0507] In the processing based on the flow illustrated in
[0508] At step S305 in the flow described with reference to
[0509] However, because the processing at step S305 is performed in the information processing device B being the explanatory-variable retaining device, the secure data (explanatory variable) is not required to be output outward, so that the computation can be performed with the matrices X and V that include the explanatory variable input at step S101b, with the explanatory variable remaining intact.
[0510] In addition, the value (d) based on the outcome variable (y) being the secure data is used in (Expression d) illustrated in
[0511] However, the information processing device A being the outcome-variable retaining device generates, at step S304, the computed result with the value (d) based on the outcome variable (y), namely, the arithmetic result of the arithmetic expression 302 illustrated in
[0512] In this manner, the performance of the processing based on the flow illustrated in
[0513] [7. Exemplary Hardware Configuration of Information Processing Device]
[0514] Finally, an exemplary hardware configuration of an information processing device that performs the processing according to the embodiment, will be described with reference to
[0515]
[0516] A central processing unit (CPU) 401 functions as a control unit or a data processing unit that performs various types of processing in accordance with a program stored in a read only memory (ROM) 402 or a storage unit 408. For example, the CPU 401 performs the processing based on the sequence described in the embodiment. A random access memory (RAM) 403 stores, for example, the program to be performed by the CPU 401 and data. The CPU 401, the ROM 402, and the RAM 403 are mutually connected through a bus 404.
[0517] The CPU 401 is connected to an input/output interface 405 through the bus 404, and the input/output interface 405 is connected with an input unit 406 including various switches, a keyboard, a mouse, a microphone, and the like and an output unit 407 including a display, a speaker, and the like. The CPU 401 performs the various types of processing in response to a command input from the input unit 406 to output a processing result to, for example, the output unit 407.
[0518] The storage unit 408 connected to the input/output interface 405 includes, for example, a hard disk and the like, and stores the program to be performed by the CPU 401 and various types of data. A communication unit 409 functions as a transmission/reception unit for data communication through a network, such as the Internet or a local area network, and communicates with an external device.
[0519] A drive 410 connected to the input/output interface 405 drives a removable medium 411 such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, such as a memory card, to perform recording or reading of data.
[0520] [8. Summary of Configuration of Present Disclosure]
[0521] The embodiment of the present disclosure has been described in detail above with reference to the specific embodiment. However, it is obvious that a person skilled in the art may make alterations or replacements to the embodiment without departing from the scope of the spirit of the present disclosure. That is, the present invention has been disclosed in an exemplified mode, and thus the present invention should not be interpreted in a limited way. The scope of the claims should be considered in order to judge the spirit of the present disclosure.
[0522] Note that, the technology disclosed in the present specification can have the following configurations.
[0523] (1) An information processing device including: a data processing unit configured to calculate a logistic regression parameter being a parameter of a logistic regression model indicating a relationship between a first variable and a second variable being two different types of secure data associated with each sample,
[0524] in which the data processing unit calculates an inner product (t_s) of the first variable and the second variable with application of secure computation being computation processing applied with converted data of each of the variables, and
[0525] performs computation processing excluding the calculation processing of the inner product, as computation processing without the converted data, to calculate the logistic regression parameter.
[0526] (2) The information processing device described in (1), in which the data processing unit calculates the logistic regression parameter in accordance with a maximum likelihood method with a Newton-Raphson method (iterative convergence method).
[0527] (3) The information processing device described in (1), in which the first variable is an explanatory variable, and
[0528] the second variable is an outcome variable.
[0529] (4) The information processing device described in (3), in which the data processing unit performs the calculation processing of the inner product (t_s) of the explanatory variable and the outcome variable with the secure computation applied with segmented data of the explanatory variable and segmented data of the outcome variable.
[0530] (5) The information processing device described in (3) or (4), in which the information processing device is a retaining device of the explanatory variable, and
[0531] the data processing unit performs the computation processing excluding the calculation processing of the inner product, applied with the explanatory variable, as computation processing applied with the explanatory variable remaining intact, without the application of the secure computation, in the calculation processing of the logistic regression parameter based on a maximum likelihood method with a Newton-Raphson method (iterative convergence method).
[0532] (6) The information processing device described in any of (3) to (5), in which the information processing device is a retaining device of the explanatory variable, and
[0533] the data processing unit receives a computed result applied with the outcome variable from an outcome-variable retaining device, and calculates the logistic regression parameter with the computed result applied with the received outcome variable.
[0534] (7) The information processing device described in (6), in which the computed result applied with the outcome variable is a sum total (t_0) of the outcome variable.
[0535] (8) The information processing device described in any of (3) to (7), in which the information processing device is a retaining device of the explanatory variable, and
[0536] the data processing unit outputs the logistic regression parameter calculated to an outcome-variable retaining device.
[0537] (9) An information processing system including:
[0538] an explanatory-variable retaining device retaining an explanatory variable being secure data associated with each sample; and
[0539] an outcome-variable retaining device retaining an outcome variable being secure data associated with each sample,
[0540] in which the outcome-variable retaining device calculates and outputs a sum total (t_0) of the outcome variable associated with each sample to the explanatory-variable retaining device,
[0541] the explanatory-variable retaining device includes a data processing unit configured to calculate a logistic regression parameter being a parameter of a logistic regression model indicating a relationship with the outcome variable, and
[0542] the data processing unit calculates an inner product (t_s) of the explanatory variable and the outcome variable, with application of secure computation being computation processing applied with converted data of each of the variables, and
[0543] calculates the logistic regression parameter with application of the inner product (t_s) calculated and the sum total (t_0) of the outcome variable input from the outcome-variable retaining device.
[0544] (10) The information processing system described in (9), in which the data processing unit calculates the logistic regression parameter in accordance with a maximum likelihood method with a Newton-Raphson method (iterative convergence method).
[0545] (11) The information processing system described in (9) or (10), in which the data processing unit performs the calculation processing of the inner product (t_s) of the explanatory variable and the outcome variable, with the secure computation applied with segmented data of the explanatory variable and segmented data of the outcome variable.
[0546] (12) The information processing system described in any of (9) to (11), in which the data processing unit performs computation processing excluding the calculation processing of the inner product, applied with the explanatory variable, as computation processing applied with the explanatory variable remaining intact, without the application of the secure computation, in the calculation processing of the logistic regression parameter based on a maximum likelihood method with a Newton-Raphson method (iterative convergence method).
[0547] (13) The information processing system described in any of (9) to (12), in which the explanatory-variable retaining device outputs the logistic regression parameter calculated to the outcome-variable retaining device.
[0548] (14) An information processing method to be performed in an information processing device including
[0549] a data processing unit configured to calculate a logistic regression parameter being a parameter of a logistic regression model indicating a relationship between a first variable and a second variable being two different types of secure data associated with each sample, the information processing method including:
[0550] calculating, by the data processing unit, an inner product (t_s) of the first variable and the second variable with application of secure computation being computation processing applied with converted data of each of the variables; and
[0551] calculating the logistic regression parameter with performance of computation processing excluding the calculation processing of the inner product, as computation processing without the converted data.
[0552] (15) An information processing method to be performed in an information processing system including:
[0553] an explanatory-variable retaining device retaining an explanatory variable being secure data associated with each sample; and
[0554] an outcome-variable retaining device retaining an outcome variable being secure data associated with each sample, the information processing method including:
[0555] calculating and outputting, by the outcome-variable retaining device, a sum total (t_0) of the outcome variable associated with each sample to the explanatory-variable retaining device; and
[0556] by a data processing unit included in the explanatory-variable retaining device, configured to calculate a logistic regression parameter being a parameter of a logistic regression model indicating a relationship with the outcome variable,
[0557] calculating an inner product (t_s) of the explanatory variable and the outcome variable with application of secure computation being computation processing applied with converted data of each of the variables, and
[0558] calculating the logistic regression parameter with application of the inner product (t_s) calculated and the sum total (t_0) of the outcome variable input from the outcome-variable retaining device.
[0559] (16) A program for causing information processing to be executed in an information processing device including a data processing unit configured to calculate a logistic regression parameter being a parameter of a logistic regression model indicating a relationship between a first variable and a second variable being two different types of secure data associated with each sample, the program causing the data processing unit to execute:
[0560] processing of calculating an inner product (t_s) of a first variable and a second variable with application of secure computation being computation processing applied with converted data of each of the variables; and
[0561] processing of calculating the logistic regression parameter with performance of computation processing excluding the processing of calculating the inner product, as computation processing without the converted data.
[0562] In addition, the series of processing described in the present specification can be performed by hardware, software, or a combination of the two. In a case where the processing is performed by software, a program in which the processing sequence is recorded is installed into a memory of a computer built into dedicated hardware, or the program is installed into a general-purpose computer capable of performing various types of processing, so that the processing can be performed. For example, the program can be recorded in advance in a recording medium. In addition to being installed from the recording medium into a computer, the program can be received through a network, such as a local area network (LAN) or the Internet, and installed into a built-in recording medium, such as a hard disk.
[0563] Note that the various types of processing described in the specification may be performed not only in time series in accordance with the description but also in parallel or individually, depending on the throughput of the device that performs the processing or as necessary. In addition, a system in the present specification is a logical aggregate configuration of a plurality of devices, and is not limited to a configuration in which the constituent devices are in the same housing.
INDUSTRIAL APPLICABILITY
[0564] As described above, according to the configuration of one embodiment of the present disclosure, high-speed and efficient parameter calculation processing of a logistic regression model is achieved.
[0565] Specifically, a logistic regression parameter is calculated, the logistic regression parameter being a parameter of the logistic regression model indicating the relationship between an explanatory variable and an outcome variable being secure data corresponding to each sample. A data processing unit calculates the inner product (t_s) of the explanatory variable and the outcome variable with application of secure computation being computation processing applied with converted data of each of the variables, and performs computation processing excluding the calculation processing of the inner product, as computation processing without the converted data, to calculate the logistic regression parameter in accordance with the maximum likelihood method with the Newton-Raphson method (iterative convergence method).
[0566] According to the present configuration, the high-speed and efficient parameter calculation processing of the logistic regression model is achieved.
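As an illustration of why the inner product (t_s) and the sum total (t_0) suffice for the iterative step, the following is a minimal sketch of the Newton-Raphson maximum likelihood update. It assumes a model with one explanatory variable and an intercept, logit(p) = b0 + b1*x, which is an assumption made for this example. Under that model the gradient of the log-likelihood depends on the outcome variable only through t_0 and t_s, so the function never reads the outcome values themselves. The function name `fit_logistic`, the parameter names, and the stopping rule are hypothetical.

```python
import math


def fit_logistic(x, t0, ts, iters=25, tol=1e-10):
    """Newton-Raphson fit of logit(p) = b0 + b1*x from x, t0 = sum(y), ts = sum(x*y)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood: the outcome enters only via t0 and ts.
        g0 = t0 - sum(p)
        g1 = ts - sum(xi * pi for xi, pi in zip(x, p))
        # Information matrix (negative Hessian); it depends on x only.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(xi * wi for xi, wi in zip(x, w))
        h11 = sum(xi * xi * wi for xi, wi in zip(x, w))
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system H * d = g for the Newton step d.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Toy data, made up for this example.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
t0 = sum(y)                                # would come from the outcome holder
ts = sum(xi * yi for xi, yi in zip(x, y))  # would come from secure computation
b0, b1 = fit_logistic(x, t0, ts)
```

In the toy data the two statistics are computed in the clear only to keep the example self-contained. In the configuration described above, t_0 would be received from the outcome-variable retaining device and t_s would be the output of the secure computation.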
REFERENCE SIGNS LIST
[0567] 110 Information processing device A
[0568] 111 Parameter-calculation execution unit
[0569] 112 Inner-product computation unit
[0570] 113 Iterative-computation input-value generation unit
[0571] 114 Data transmission/reception unit
[0572] 120 Information processing device B
[0573] 121 Input unit
[0574] 122 Inner-product computation unit
[0575] 123 Data transmission/reception unit
[0576] 124 Iterative computation unit
[0577] 125 Output unit
[0578] 401 CPU
[0579] 402 ROM
[0580] 403 RAM
[0581] 404 Bus
[0582] 405 Input/output interface
[0583] 406 Input unit
[0584] 407 Output unit
[0585] 408 Storage unit
[0586] 409 Communication unit
[0587] 410 Drive
[0588] 411 Removable medium