Systems and methods for optimal privacy-preserving information revelation
11574076 · 2023-02-07
Inventors
CPC classification
G06F21/6254
PHYSICS
International classification
Abstract
The present system relates to a platform for addressing the optimal privacy-accuracy trade-off in the revelation of a user's valuable information to a third party. Specifically, the present system formalizes the privacy-accuracy trade-off in a precise mathematical framework, wherein the mathematical formalization captures the user's privacy preference with a single parameter. The system provides a revelation method for user data that is optimal, in the sense of abiding by the user's privacy preference while providing the most accurate description to the third party subject to the aforementioned privacy preference constraint.
Claims
1. A method of preserving privacy in a data set used to estimate information configured to be used by a third party, the method comprising the steps of: receiving an initial information data set and a user's privacy setting, the user's privacy setting including one or more privacy instructions defining conditions for sharing regarding the initial information data set; using the initial information data set and the user's privacy setting as input, producing an adjusted information data set and a privacy-preserving stochastic map describing a mechanism used to produce the adjusted information data set; and using the adjusted information data set and the privacy-preserving stochastic map as inputs, applying a stochastic inference algorithm to produce an estimate of the initial information data set and an estimation error value; wherein the estimate of the initial information data set is an estimate of the user's likelihood of having a given genetic trait, the first data set is the user's DNA, and the initial information data set is inferred from the user's DNA and produced using a DNA sequencer.
2. The method of claim 1, wherein the adjusted information data set is constrained to meet every requirement included within the user's privacy setting.
3. The method of claim 1, further including the step of extracting an initial information data set from a first data source.
4. The method of claim 3, further including the step of producing extraction noise statistics describing the noise introduced into the initial information data set by the extraction.
5. The method of claim 4, wherein the extraction noise statistics is an additional input used to produce the adjusted information data set and the privacy-preserving stochastic map describing the mechanism used to produce the adjusted information data set.
6. The method of claim 1, further including the step of updating a prior knowledge data set based on information derived from one or more elements of a first data source.
7. The method of claim 6, wherein the updated prior knowledge data set is an additional input used to produce the adjusted information data set and the privacy-preserving stochastic map describing the mechanism used to produce the adjusted information data set.
8. The method of claim 1, wherein the initial information data set is time-variant.
9. The method of claim 1, wherein the initial information data set is time-invariant.
10. The method of claim 1, wherein the initial information data set is directly available.
11. The method of claim 1, wherein the initial information data set is inferred from a first data set.
12. The method of claim 11, wherein the estimate of the initial information data set is an estimate of the user's likelihood of taking a given action.
13. The method of claim 12, wherein the first data set is a web browsing history.
14. The method of claim 13, wherein the user's likelihood of taking a given action is the user's likelihood of purchasing a given product within a given timeframe.
15. The method of claim 1, wherein the estimate of the initial information data set is an estimate of the user's likelihood of being in a given location within a given timeframe and is inferred from a history of a user's locations.
16. The method of claim 1, wherein the step of producing the adjusted information data set and the privacy-preserving stochastic map describing the mechanism used to produce the adjusted information data set includes producing the privacy-preserving stochastic map using an updated prior knowledge data set, the user's privacy setting, and extraction noise statistics as inputs and then applying the privacy-preserving stochastic map to an extracted initial information data set to produce the adjusted information data set.
17. A method of preserving privacy in a data set used to estimate information configured to be used by a third party, the method comprising the steps of: receiving an initial information data set and a user's privacy setting, the user's privacy setting including one or more privacy instructions defining conditions for sharing regarding the initial information data set; using the initial information data set and the user's privacy setting as input, producing an adjusted information data set and a privacy-preserving stochastic map describing a mechanism used to produce the adjusted information data set; and using the adjusted information data set and the privacy-preserving stochastic map as inputs, applying a stochastic inference algorithm to produce an estimate of the initial information data set and an estimation error value; wherein the step of producing the adjusted information data set and the privacy-preserving stochastic map describing the mechanism used to produce the adjusted information data set includes producing the privacy-preserving stochastic map using an updated prior knowledge data set, the user's privacy setting, and extraction noise statistics as inputs and then applying the privacy-preserving stochastic map to an extracted initial information data set to produce the adjusted information data set; and wherein: the updated prior knowledge data set is a probability distribution over a set from which the initial information data set can take its values; the user's privacy setting includes a condition enabling the user to control a statistical distance between the adjusted information data set and the initial information data set; the stochastic inference algorithm minimizes an estimation error, wherein the estimation error is defined by an expected value of a distance between the initial information data set and the estimate of the initial information data set, where the distance is measured with respect to a given loss function; and the privacy-preserving stochastic map minimizes the stochastic inference algorithm's estimation error subject to the user's privacy setting.
18. The method of claim 17, wherein: the statistical distance is defined as a convex functional of the conditional distribution between the adjusted information data set and the initial information data set; a loss function defining an induced expected distance between the initial information data set and the estimate of the initial information data set is a concave functional of the conditional distribution between the adjusted information data set and the initial information data set; and the privacy-preserving stochastic map is computed by a two-step procedure comprising: a polyhedral set of all possible maps is partitioned into sub-polyhedral regions; and a convex maximization algorithm is carried out over each convex set defined by an intersection of every sub-polyhedral region with a set of maps that satisfy user's privacy setting.
19. A system for preserving privacy in a data set used to estimate information configured to be used by a third party, the system comprising: an initial information data set; a user's privacy setting including one or more privacy instructions defining conditions for sharing regarding the initial information data set; and a processor including memory storing instructions that, when executed, cause the processor to: produce an adjusted information data set and a privacy-preserving stochastic map describing a mechanism used to produce the adjusted information data set using the initial information data set and the user's privacy setting as inputs; and apply a stochastic inference algorithm to produce an estimate of the initial information data set and an estimation error value using the adjusted information data set and the privacy-preserving stochastic map as inputs; wherein the estimate of the initial information data set is an estimate of the user's likelihood of having a given genetic trait, the first data set is the user's DNA, and the initial information data set is inferred from the user's DNA and produced using a DNA sequencer.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE DRAWINGS
(9) The meta data source 101 contains the valuable information of the user. Valuable information extractor 102 takes this as input and processes it to produce extracted valuable information 103, along with statistics of the noise introduced in this process as extraction noise statistics 104 and updated prior knowledge 105.
(10) In some examples of the system, meta data source 101 is the readily available valuable information itself; hence unit 102 performs no processing. Accordingly, 104 is an identity mapping, i.e., no noise is introduced in the extraction process, and 105 is the initial prior knowledge, i.e., the prior knowledge of the valuable information is not updated. Examples of such embodiments include susceptibility to a certain disease due to family history.
(11) In some embodiments of the system, the valuable information is not readily available in 101; hence 102 performs a process to distill the valuable information from the meta data source 101. For example, if the valuable information is the susceptibility to a certain disease due to inherited genetic mutations in certain genes, then 102 needs to perform a measurement operation that is prone to imperfections, i.e., noise. Thus, in such embodiments extraction noise statistics 104 is not an identity mapping but the estimated statistics of the introduced noise, whereas 105 is still the initial prior knowledge, since the sought-after valuable information here is time-invariant by definition.
(12) In some embodiments of the system, the valuable information is neither readily available in 101 nor time-invariant. For example, a user's likelihood of purchasing a given product varies over time and is not directly observable. As such, for the embodiments of the system in which the valuable information is the aforementioned likelihood, this information should be inferred from a relevant meta data source, such as the user's web history, by using a certain fraction of this meta data source, such as past web activity, as a training set. Therefore, in such examples, the system outputs both the statistics of the noise introduced in this process, i.e., 104, and the updated prior knowledge, i.e., 105, gained through this training process.
(13) In light of the above examples, it will be clear to those skilled in the art that the operation of 102, and hence its outputs, are directly dictated by the type of the valuable information, and hence by the meta data source 101 containing it. For example, if the valuable information is whether the user has genetic mutations in certain genes, which, in turn, increase her susceptibility to certain diseases, then 102 might be a DNA sequencing system, along with its data processing pipeline. Extracted valuable information 103 is a noisy observation of whether the user has these genetic mutations. Hence, the outputs of the system are this noisy observation, along with the statistics of the noise introduced by the aforementioned measurement process. Note that prior knowledge on whether the user has these genetic mutations is agnostic to the aforementioned extraction process.
(14) As another example, if the valuable information is the user's likelihood of purchasing a given product, then 102 might be a data mining procedure implemented on a computer or a mobile device on which the meta data source is generated and/or stored. By using a certain amount of the user's past web activity, this procedure might be trained to obtain a potentially noisy observation of the valuable information, to estimate the parameters of an underlying dynamical model that governs the valuable information, which is not directly observable, and to estimate the statistics of the observation noise 104. Due to the dynamic nature of this procedure, the system also produces updated prior knowledge 105 about the valuable information as one of its outputs.
(15) There are three relevant valuable information types: (i) directly available; (ii) not directly available and time-invariant; and (iii) not directly available and time-varying. Note that if the valuable information is directly available, whether it is time-invariant is irrelevant. (i) In the first case, the only non-trivial output of 102 is 103, since no noise is introduced and the prior knowledge need not be updated. (ii) In the second case, the system introduces observation noise, so 103 and 104 are the non-trivial outputs of 102. The time-invariant nature of the valuable information ensures that the prior knowledge does not need to be updated. (iii) Finally, in the third case, 103, 104, and updated prior knowledge 105 are all non-trivial outputs of 102.
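The three cases above can be sketched as a toy extractor. This is a minimal illustration, not the disclosed implementation: the function name, binary alphabet, flip probability, and case labels are all assumptions introduced here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract(v, case, p_flip=0.1):
    """Toy sketch of unit 102: returns (extracted info 103,
    noise statistics 104, updated prior 105) for the three cases."""
    if case == "direct":                      # (i) directly available
        return v, np.eye(2), None             # 104 is an identity mapping, 105 trivial
    P_YgV = np.array([[1 - p_flip, p_flip],   # binary symmetric observation channel
                      [p_flip, 1 - p_flip]])  # playing the role of statistics 104
    y = int(rng.random() < P_YgV[v, 1])       # noisy reading of v
    if case == "static":                      # (ii) time-invariant: prior not updated
        return y, P_YgV, None
    prior = np.array([0.5, 0.5])              # (iii) time-varying: prior re-estimated
    return y, P_YgV, prior                    # (placeholder for a trained estimate)
```

In case (i) the noise statistics degenerate to an identity mapping and the prior is untouched; in case (iii) all three outputs are non-trivial, matching the enumeration above.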
(16) Privacy-adjusted valuable information computing unit 107 performs the core operation of the system by taking extracted valuable information 103, extraction noise statistics 104, updated prior knowledge 105, and user's privacy preference 106 as inputs and creating privacy-adjusted valuable information 108, along with the optimal privacy-preserving stochastic mapping 109. The optimal privacy-preserving stochastic mapping 109 is also utilized in the system to create 108, as explained below.
(18) It should be noted that the privacy-adjusting operation of 107 is optimal, as reflected by the optimality of 109, in the sense that it creates a version of extracted valuable information 103 that abides by the user's privacy preference while rendering the inference of the valuable information based on the revelation as accurate as possible subject to that privacy preference constraint. The notions of privacy and accuracy mentioned above are formalized in terms of precise mathematical definitions disclosed herein.
(19) Finally, in
(20) It should be noted that in some examples of the system, the third party may want to perform the statistical inference step herself. In those scenarios, the outputs of the system would be 104, 105, 108, and 109, so that the operation of valuable information inference unit 110 can be performed by the third party herself.
(21) Description of Optimal Privacy-preserving Stochastic Mapping Computing Unit. The optimal privacy-preserving stochastic mapping 109, which is the output of optimal privacy-preserving stochastic mapping computing unit 201, is used to create the privacy-adjusted revelation of the extracted valuable information of the user. The optimality is in the sense of achieving the optimal trade-off between the user's privacy and the third party's utility, i.e., accuracy. In order to formalize the trade-off between privacy and accuracy, one needs to have a precise way of quantifying both privacy and accuracy. Quantification of these notions, however, necessitates a rigorous mathematical framework. To this end, the system includes a statistical framework to formalize privacy and accuracy.
(22) Statistical Model. The optimal privacy-preserving stochastic mapping 109 depends on both the statistics of the noise introduced 104 in the extraction of the valuable information and the estimation procedure employed in the valuable information inference unit 110. Therefore, any attempt to formalize the optimal privacy-accuracy trade-off should account for this interplay. In order to capture this interplay in the system depicted in
(23) User data 301, which will be denoted by V henceforth, is assumed to be distributed according to a probability mass function (p.m.f.) P.sub.V, which will be called the prior distribution of user data henceforth. It is assumed that V takes values in the set 𝒱:={1, . . . , K.sub.V}, where K.sub.V ∈{2, 3, . . . }. A noisy observation of user data 303, which will be denoted by Y henceforth, is available. It is assumed that Y takes values in the set 𝒴:={1, . . . , K.sub.Y}, where K.sub.Y ∈{2, 3, . . . }. Further, Y is the output of a stochastic mapping P.sub.Y|V: 𝒱→𝒴 with V as the input of this mapping. In other words, Y is the output of a noisy observation channel 302 with input V, where the channel noise is induced by the stochastic mapping P.sub.Y|V. The third party's observation regarding user data is 305, which will be denoted by Z henceforth. It is assumed that Z takes values in the set 𝒵:={1, . . . , K.sub.Z}, where K.sub.Z ∈{2, 3, . . . }. Similar to Y, Z is the output of a stochastic mapping P.sub.Z|Y: 𝒴→𝒵 with Y as the input of this mapping. In other words, Z is the output of a privacy-preserving noisy channel 304, which performs the main operation of the model. The third party's estimate of user data 307, which will be denoted by {circumflex over (V)}(Z) henceforth, is generated by the estimator 306, which is a (potentially stochastic) mapping denoted by {circumflex over (V)}: 𝒵→𝒮, where 𝒮 is a subset of the real numbers that is a superset of 𝒱, i.e., of the set in which V takes values. It should be noted that while it is customary to restrict the range of an estimator to the set in which the random variable to be estimated takes values, the aforementioned generalization is advantageous in that it enables a wider range of loss functions to be handled, as will be evident in what follows.
(24) User data 301, i.e., V, corresponds to the valuable information of the user, which is contained in her meta data source 101. The valuable information may not be readily available in meta data source, and hence may need to be extracted. The transformation of V to Y by means of P.sub.Y|V captures this extraction process. Specifically, Y, i.e., noisy observation of user data 303, represents the extracted valuable information 103 and P.sub.Y|V, i.e., noisy observation channel 302, corresponds to the extraction noise statistics 104 in Section 2. P.sub.V, which encapsulates the prior knowledge regarding user data, corresponds to updated prior knowledge 105 on the valuable information. P.sub.Z|Y, i.e., privacy-preserving noisy channel 304, represents the statistics of the mapping carried out in privacy-adjusted valuable information computing unit 107. Z, i.e., third party's observation 305, corresponds to the privacy-adjusted valuable information 108. The operation of valuable information inference unit 110 is captured by the estimator 306 in
(26)
P.sub.VYZ=P.sub.VP.sub.Y|VP.sub.Z|Y (1)
(27) It is assumed that prior distribution of user data 301, i.e., P.sub.V, statistics of the noisy observation channel 302, i.e., P.sub.Y|V, and the statistics of the privacy-preserving noisy channel 304, i.e., P.sub.Z|Y, are known to both system designer and third party.
(28) A brief explanation of the operational meanings of the distributions on the right side of (1) is next. P.sub.V encapsulates the prior knowledge of both the system designer and the third party regarding user data 301. As such, one cannot expect either of the said parties to be able to control it. P.sub.Y|V models the imperfection(s) of the system's observation of user data; in particular, it accounts for the noise in the system's observation process of user data. P.sub.V and P.sub.Y|V can be assumed to be an arbitrary but fixed distribution and channel, respectively. Finally, P.sub.Z|Y introduces a certain amount of noise to 303, i.e., Y, according to the user's privacy preference to create the third party's observation 305, i.e., Z. Recalling the main goal of the model, i.e., to reveal a version of user data to the third party such that the privacy loss the user experiences satisfies her privacy preference while the third party's observation is as accurate as possible, the statistics of this privacy-preserving noisy channel 304, i.e., P.sub.Z|Y, are the main design variable the system designer chooses. As such, by optimally choosing P.sub.Z|Y, the system designer needs to achieve the following competing goals: (1) the user's privacy preference is satisfied; and (2) the third party's estimate 307, when it is generated by an estimator 306 that is optimal in a precise mathematical sense, is as accurate as possible.
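A minimal numerical sketch of the factorization in (1), using the stochastic-matrix view that the disclosure later formalizes in (32). The binary alphabets and all numbers are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

# Toy instance of the chain V -> Y -> Z behind equation (1).
P_V   = np.array([0.3, 0.7])                  # prior P_V on user data
P_YgV = np.array([[0.9, 0.1], [0.2, 0.8]])    # observation channel P_{Y|V}
P_ZgY = np.array([[0.8, 0.2], [0.25, 0.75]])  # privacy channel P_{Z|Y} (design variable)

P_ZgV = P_YgV @ P_ZgY                         # end-to-end channel P_{Z|V}
P_VZ  = P_V[:, None] * P_ZgV                  # joint P_VZ(v, z) = P_V(v) P_{Z|V}(z|v)

assert np.allclose(P_VZ.sum(), 1.0)           # a valid joint distribution
```

The matrix product `P_YgV @ P_ZgY` is exactly the composition of the two channels; the third party's inference operates on the resulting joint `P_VZ`.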
(29) In the following sections, the notions of optimality in system designer's choice of privacy-preserving noisy channel 304, and in third party's estimate 307, are formalized.
(30) Optimum privacy-preserving noise channel. The characterization of the optimum privacy-preserving noisy channel, which will be denoted by P*.sub.Z|Y henceforth, is accomplished by imposing the privacy of the user as a constraint and optimizing the accuracy of the third party's estimate subject to this constraint. In mathematical terms, P*.sub.Z|Y is a maximizer of the following optimization problem:
(31)
P*.sub.Z|Y∈arg max ƒ.sub.acc(P.sub.V,P.sub.Y|V,P.sub.Z|Y) subject to ƒ.sub.priv(P.sub.V,P.sub.Y|V,P.sub.Z|Y)≤r.sub.o (2)
where the maximum is over all noisy channels from 𝒴 to 𝒵, the set of which will be denoted by 𝒫(𝒵|𝒴) henceforth, ƒ.sub.priv(P.sub.V, P.sub.Y|V, P.sub.Z|Y) is a normalized measure of privacy, ƒ.sub.acc(P.sub.V, P.sub.Y|V, P.sub.Z|Y) is a measure of accuracy, and r.sub.o ∈[0,1] is the privacy preference of the user, which captures the privacy loss the user can tolerate. Specifically, r.sub.o=1 means the user is indifferent to privacy loss, while r.sub.o=0 means she does not want any privacy loss at all.
(32) From a practical perspective, arguably the most relevant embodiments of the model depicted in
(33) It should be noted that (2) formalizes the privacy-accuracy trade-off for the system depicted in
(34) It will be evident to those skilled in the art that the choices of privacy and accuracy measures, i.e., ƒ.sub.priv and ƒ.sub.acc, in (2) determine the specific form of the optimization problem. Thus, in order to devise a system that computes P*.sub.Z|Y, operationally meaningful choices of accuracy and privacy are required, which will be explained in the following sections.
(35) Statistical measures of accuracy. Typically, the performance of a statistical inference procedure in the Bayesian framework, in which the model described herein falls, is measured by means of a loss function ℓ(.,.), which is a non-negative mapping defined on the cartesian product of the sets in which V and {circumflex over (V)} take their values, i.e., ℓ: 𝒱×𝒮→[0, ∞), where 𝒮 denotes the set in which {circumflex over (V)} takes its values. Thus, for a given loss function ℓ(.,.), the accuracy of an estimator {circumflex over (V)}(.) is measured in terms of the following quantity:
(36)
E[ℓ(V,{circumflex over (V)}(Z))] (3)
where E[.] denotes the expectation operator and P.sub.VZ denotes the joint distribution of V and Z, defined by:
(37)
P.sub.VZ(v,z):=Σ.sub.y∈𝒴P.sub.V(v)P.sub.Y|V(y|v)P.sub.Z|Y(z|y) (4)
(38) From a practical point of view, arguably the most important case is when the estimator 306 the third party uses to form her estimate 307 is optimal. As such, said estimator will be assumed to be optimal henceforth.
(39) In light of the above discussion on the practical merit of using an optimal estimator in 306, the following is an operationally meaningful choice for an accuracy measure, given a loss function:
(40)
ƒ.sub.acc(P.sub.V,P.sub.Y|V,P.sub.Z|Y):=c−min.sub.{circumflex over (V)}(.)E[ℓ(V,{circumflex over (V)}(Z))] (5)
for a constant c whose choice is a part of the definition of the accuracy measure.
(41) The minimum in (5) can depend on the choice of the loss function. Thus, a complete characterization of the accuracy measure defined in (5) necessitates particularizing a loss function. Two widely used loss functions in the statistics literature are the zero-one and squared-error loss functions, defined as:
(42)
ℓ.sub.0-1(v,{circumflex over (v)}):=1{v≠{circumflex over (v)}} (6)
ℓ.sub.MSE(v,{circumflex over (v)}):=(v−{circumflex over (v)}).sup.2 (7)
respectively. The next step in the complete characterization of the accuracy measure in (5) with loss function choices of either (6) or (7) is the characterization of the corresponding minima on the right side of (5) with these choices, which is explained next.
(43) First, it is well known that for the loss function in (6), the optimal estimator is the maximum a posteriori probability (MAP) estimator, defined as:
(44)
{circumflex over (V)}.sub.MAP(z):=arg max.sub.v∈𝒱P.sub.V|Z(v|z) (8)
where P.sub.V|Z, i.e., conditional distribution of V given Z, is defined according to Bayes' rule:
(45)
P.sub.V|Z(v|z)=P.sub.VZ(v,z)/P.sub.Z(z) (9)
where P.sub.Z(z):=Σ.sub.v∈𝒱P.sub.VZ(v,z) (10)
and the ties in (8) are broken uniformly at random, i.e., if the maximizer in (8) is unique, then it is declared as the outcome of the estimator; else, one of the maximizers is chosen uniformly at random and declared as the outcome of the estimator. This randomization in the presence of multiple maximizers is fairly standard in MAP estimation theory, since any optimizer of (8) is exactly as good as any other optimizer in terms of minimizing the cost function in (6). It will be evident to those skilled in the art that the average correct decision probability of the MAP rule, which will be denoted by Γ.sub.MAP(P.sub.V, P.sub.Y|V, P.sub.Z|Y) henceforth, can be written as follows:
(46)
Γ.sub.MAP(P.sub.V,P.sub.Y|V,P.sub.Z|Y)=E[1{{circumflex over (V)}.sub.MAP(Z)=V}]=Σ.sub.z∈𝒵max.sub.v∈𝒱P.sub.VZ(v,z) (11)
where 1{·} is the standard indicator function. Equation (11), along with the choice of c=1 in (5), gives the following accuracy measure:
ƒ.sub.acc,MAP(P.sub.V,P.sub.Y|V,P.sub.Z|Y):=Γ.sub.MAP(P.sub.V,P.sub.Y|V,P.sub.Z|Y) (12)
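As an illustration of (11)-(12), the sketch below computes Γ.sub.MAP on toy binary alphabets: for each z it keeps the largest joint mass P.sub.VZ(v, z), i.e., the probability that the MAP rule decides correctly. All alphabets and numbers are illustrative assumptions.

```python
import numpy as np

def gamma_map(P_V, P_YgV, P_ZgY):
    """Accuracy measure (12): MAP correct-decision probability,
    Gamma_MAP = sum_z max_v P_VZ(v, z), per equation (11)."""
    P_VZ = P_V[:, None] * (P_YgV @ P_ZgY)   # joint of (V, Z)
    return float(P_VZ.max(axis=0).sum())    # a-posteriori-best v for each z

P_V   = np.array([0.4, 0.6])
P_YgV = np.array([[0.9, 0.1], [0.1, 0.9]])
eye   = np.eye(2)                           # no privacy noise at all
full  = np.full((2, 2), 0.5)                # fully private channel (Z independent of V)

acc_clear = gamma_map(P_V, P_YgV, eye)      # ≈ 0.9: MAP accuracy on Y itself
acc_priv  = gamma_map(P_V, P_YgV, full)     # ≈ 0.6: falls back to guessing the prior mode
```

The two extreme channels bracket the trade-off: any admissible P.sub.Z|Y yields an accuracy between these values.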
(47) Secondly, it is well known that for the loss function in (7), the optimal estimator is the conditional mean, i.e.,
{circumflex over (V)}.sub.MSE(z):=E[V|Z=z] (13)
with the corresponding minimum mean-square error (MMSE):
(48)
min.sub.{circumflex over (V)}(.)E[(V−{circumflex over (V)}(Z)).sup.2]=E[Var [V|Z]]
where Var [V|Z] denotes the conditional variance of V given Z. Hence, picking c=0 in (5) gives the following accuracy measure:
ƒ.sub.acc,MSE(P.sub.V,P.sub.Y|V,P.sub.Z|Y):=−E[Var [V|Z]] (14)
Equations (12) and (14) complete the characterization of (5) for the loss functions given in (6) and (7).
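Similarly, the accuracy measure in (14) can be sketched numerically: compute the posterior by Bayes' rule, then the negated expected conditional variance −E[Var[V|Z]]. Taking V to be valued in {0, 1} and the channels below are illustrative assumptions for concreteness.

```python
import numpy as np

def f_acc_mse(P_V, P_YgV, P_ZgY, values=None):
    """Accuracy measure (14): negated MMSE of the conditional-mean
    estimator (13), i.e., -E[Var[V|Z]]."""
    v = np.arange(len(P_V)) if values is None else np.asarray(values)
    P_VZ = P_V[:, None] * (P_YgV @ P_ZgY)
    P_Z = P_VZ.sum(axis=0)
    P_VgZ = P_VZ / P_Z                                     # posterior columns P_{V|Z}
    cond_mean = (v[:, None] * P_VgZ).sum(axis=0)           # E[V | Z=z]
    cond_var = ((v[:, None] - cond_mean) ** 2 * P_VgZ).sum(axis=0)
    return float(-(P_Z * cond_var).sum())                  # -E[Var[V|Z]]
```

With noiseless channels the posterior is degenerate and the measure attains its maximum 0; with a fully private channel it drops to minus the prior variance.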
(49) Statistical measures of privacy. In general, it is not a straightforward task to quantify an abstract notion like privacy. Nevertheless, one can gain some insight by recalling the extremes of privacy preferences in the framework, where r.sub.o=1 and r.sub.o=0. Specifically, r.sub.o=1, which means that the user is indifferent to privacy loss, implies that the user allows the possibility of the following scenario: Y and Z convey an equal amount of information about V. Conversely, r.sub.o=0, which means that the user does not want any privacy loss, necessitates the following: V and Z are statistically independent, because otherwise one cannot guarantee that knowledge of Z provides no additional information compared with the prior knowledge about V. In light of these observations, decreasing the information content of Z about V compared to the information content of Y about V is a way of reducing the privacy loss the user experiences. Alternatively, the amount of statistical dependence of V and Z compared to the amount of statistical dependence of V and Y can be an indicator of the amount of privacy loss the user experiences. Thus, we have the following two broad avenues to formalize a statistical measure of privacy: (1) the information Z conveys about V compared to the information Y conveys about V; and (2) the amount of statistical dependence between V and Z compared to the amount of statistical dependence between V and Y.
(50) In order to proceed with the first alternative above, one needs to choose a measure of information one random variable conveys about another random variable, i.e., information measure. Among many information measures in the literature, Rényi information is noteworthy thanks to its generality and operational significance in various practical settings. Specifically, for any λ>1 and pair of random variables (V, Y) with joint distribution P.sub.VY, Rényi information of order λ between V and Y, denoted by I.sub.λ(V;Y), is defined as follows:
(51)
I.sub.λ(V;Y):=(λ/(λ−1))log Σ.sub.y∈𝒴(Σ.sub.v∈𝒱P.sub.V(v)P.sub.Y|V(y|v).sup.λ).sup.1/λ (15)
It is well known that I.sub.λ(V;Y) is increasing in λ, and also satisfies:
(52)
lim.sub.λ↓1I.sub.λ(V;Y)=I(V;Y) (16)
where the right side is the well-known mutual information, which is arguably the most relevant information measure from a practical perspective, defined as:
(53)
I(V;Y):=Σ.sub.v∈𝒱Σ.sub.y∈𝒴P.sub.VY(v,y)log(P.sub.VY(v,y)/(P.sub.V(v)P.sub.Y(y))) (17)
with P.sub.Y denoting the marginal distribution of Y. Thus, (16) ensures that Rényi information recovers mutual information as a limiting case. Further, Rényi information also satisfies the data processing inequality, i.e.,
I.sub.λ(V;Z)≤I.sub.λ(V;Y) (18)
which, in turn, aids us in putting forward the following measure of privacy:
(54)
ƒ.sub.priv,inf(P.sub.V,P.sub.Y|V,P.sub.Z|Y):=I.sub.λ(V;Z)/I.sub.λ(V;Y)∈[0,1] (19)
where the last assertion follows from (18) and the non-negativity of (15).
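The Rényi-information privacy measure can be sketched as below. The closed form used here is Sibson's version of Rényi information, a common choice consistent with the properties cited above (monotone in λ, recovers mutual information as λ→1, and satisfies data processing); whether it matches the disclosure's exact definition is an assumption, and the alphabets and numbers are illustrative.

```python
import numpy as np

def renyi_info(P_A, P_BgA, lam):
    """Sibson's Renyi information of order lam > 1 (assumed form):
    lam/(lam-1) * log sum_b (sum_a P_A(a) P_{B|A}(b|a)^lam)^(1/lam)."""
    inner = (P_A[:, None] * P_BgA ** lam).sum(axis=0) ** (1.0 / lam)
    return float(lam / (lam - 1.0) * np.log(inner.sum()))

def f_priv_inf(P_V, P_YgV, P_ZgY, lam=2.0):
    """Privacy measure (19): the ratio I_lam(V;Z) / I_lam(V;Y) in [0, 1]."""
    return renyi_info(P_V, P_YgV @ P_ZgY, lam) / renyi_info(P_V, P_YgV, lam)

P_V   = np.array([0.5, 0.5])
P_YgV = np.array([[0.9, 0.1], [0.1, 0.9]])
ratio_clear = f_priv_inf(P_V, P_YgV, np.eye(2))           # 1: Z as informative as Y
ratio_priv  = f_priv_inf(P_V, P_YgV, np.full((2, 2), 0.5))  # 0: Z independent of V
```

The two endpoints reproduce the operational reading of r.sub.o: a ratio of 1 corresponds to no privacy protection, a ratio of 0 to full statistical independence of V and Z.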
(55) A classical way of measuring statistical dependence is to quantify the “distance” between the joint distribution of random variables and the product of their marginal distributions, which would have been the joint distribution if they were independent. Divergences provide such a tool, one that also has an operational meaning in terms of various statistical inference methods. A broad family of divergences is the ƒ-divergence. In particular, for a given convex function ƒ such that ƒ(1)=0, the ƒ-divergence between P.sub.VY and P.sub.VP.sub.Y is defined as follows:
(56)
D.sub.f(P.sub.VY∥P.sub.VP.sub.Y):=Σ.sub.v∈𝒱Σ.sub.y∈𝒴P.sub.V(v)P.sub.Y(y)ƒ(P.sub.VY(v,y)/(P.sub.V(v)P.sub.Y(y))) (20)
Various well-known divergences, such as relative entropy, chi-squared divergence, Hellinger divergence, Rényi divergence, and so on, can be shown to be an ƒ-divergence for some convex function ƒ. Further,
D.sub.f(P.sub.VY∥P.sub.VP.sub.Y)≥0 (21)
with equality if and only if V and Y are independent, and it also satisfies the data processing inequality, i.e.,
D.sub.f(P.sub.VZ∥P.sub.VP.sub.Z)≤D.sub.f(P.sub.VY∥P.sub.VP.sub.Y) (22)
which, in turn, aids us in putting forward the following measure of privacy:
(57)
ƒ.sub.priv,div(P.sub.V,P.sub.Y|V,P.sub.Z|Y):=D.sub.f(P.sub.VZ∥P.sub.VP.sub.Z)/D.sub.f(P.sub.VY∥P.sub.VP.sub.Y)∈[0,1] (23)
where the last assertion follows from (21) and (22).
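The divergence-ratio privacy measure can be instantiated with any ƒ-divergence; the sketch below uses the chi-squared divergence (ƒ(t)=(t−1).sup.2), chosen here only because it has a simple closed form. The alphabets and numbers are illustrative assumptions.

```python
import numpy as np

def chi2_dependence(P_A, P_BgA):
    """Chi-squared f-divergence D_f(P_AB || P_A x P_B) with f(t) = (t-1)^2,
    i.e., sum over (a, b) of (P_AB - P_A*P_B)^2 / (P_A*P_B)."""
    P_AB = P_A[:, None] * P_BgA
    prod = np.outer(P_A, P_AB.sum(axis=0))   # product of the marginals
    return float(((P_AB - prod) ** 2 / prod).sum())

def f_priv_div(P_V, P_YgV, P_ZgY):
    """Privacy measure (23): a divergence ratio lying in [0, 1]."""
    return chi2_dependence(P_V, P_YgV @ P_ZgY) / chi2_dependence(P_V, P_YgV)

P_V   = np.array([0.4, 0.6])
P_YgV = np.array([[0.85, 0.15], [0.1, 0.9]])
assert abs(f_priv_div(P_V, P_YgV, np.eye(2)) - 1.0) < 1e-12   # no extra noise
assert f_priv_div(P_V, P_YgV, np.full((2, 2), 0.5)) < 1e-12   # independence: ratio 0
```

By (21) and the data processing inequality (22), the ratio is guaranteed to lie in [0, 1] for any choice of ƒ, which is what makes it a normalized privacy measure.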
(58) Analysis. The choices for measuring the accuracy and privacy give a special structure to the constrained optimization problem in (2), which, in turn, is employed to devise an efficient method to compute the optimum value and an optimizer in (2).
(59) Theorem 1. Particularization of (2) with:
ƒ.sub.acc←ƒ.sub.acc,acc-meas (24)
ƒ.sub.priv←ƒ.sub.priv,priv-meas (25)
where priv-meas ∈{inf, div} and acc-meas ∈{MAP, MSE} is a convex maximization problem over a convex set.
Proof. We prove that with the aforementioned choices, the feasible set as well as the cost function is convex and begin with the feasible set, i.e., privacy measure. For notational convenience, let:
(60)
S(P.sub.V,P.sub.Y|V,λ,r.sub.o):={P.sub.Z|Y∈𝒫(𝒵|𝒴): I.sub.λ(V;Z)≤r.sub.oI.sub.λ(V;Y)}
S(P.sub.V,P.sub.Y|V,ƒ,r.sub.o):={P.sub.Z|Y∈𝒫(𝒵|𝒴): D.sub.f(P.sub.VZ∥P.sub.VP.sub.Z)≤r.sub.oD.sub.f(P.sub.VY∥P.sub.VP.sub.Y)}
where λ>1, ƒ is a convex function with ƒ(1)=0, and r.sub.o ∈[0, 1].
Lemma 1. S(P.sub.V, P.sub.Y|V, λ, r.sub.o) is a convex set, i.e.,
αP.sub.Z|Y+(1−α)Q.sub.Z|Y∈S(P.sub.V,P.sub.Y|V,λ,r.sub.o) (26)
for any P.sub.Z|Y,Q.sub.Z|Y ∈S(P.sub.V,P.sub.Y|V,λ,r.sub.o) and α∈(0,1).
Proof. First, the condition that P.sub.Z|Y ∈𝒫(𝒵|𝒴) induces a set of linear equality constraints, and the Rényi information ratio term induces a non-linear inequality constraint. In order to represent the non-linear constraint in a more convenient way, define the following function, denoted by g henceforth, for a given P.sub.A, P.sub.B|A pair and λ>1:
(61)
g(P.sub.A,P.sub.B|A,λ):=Σ.sub.b(Σ.sub.aP.sub.A(a)P.sub.B|A(b|a).sup.λ).sup.1/λ (27)
Equations (15) and (27) imply that:
(62)
I.sub.λ(V;Z)=(λ/(λ−1))log g(P.sub.V,P.sub.Y|VP.sub.Z|Y,λ) (28)
which, along with the monotonicity of log(·), implies that:
(63)
P.sub.Z|Y∈S(P.sub.V,P.sub.Y|V,λ,r.sub.o) if and only if g(P.sub.V,P.sub.Y|VP.sub.Z|Y,λ)≤exp(((λ−1)/λ)r.sub.oI.sub.λ(V;Y)) (29)
Hence, in order to conclude the proof, it suffices to prove that g(P.sub.V, P.sub.Y|VP.sub.Z|Y, λ) is a convex function of P.sub.Z|Y. Note that if we define the conditional distribution of Z given V, i.e., P.sub.Z|V, as:
(64)
P.sub.Z|V(z|v):=Σ.sub.y∈𝒴P.sub.Z|Y(z|y)P.sub.Y|V(y|v) (30)
then, it is easy for those skilled in the art to verify that, as a function of P.sub.Z|V,
g(P.sub.V,P.sub.Z|V,λ)=g(P.sub.V,P.sub.Y|VP.sub.Z|Y,λ) (31)
is convex. Further, if one views P.sub.Y|V and P.sub.Z|Y as stochastic matrices, i.e., each row containing P.sub.Y|V(·|v) and P.sub.Z|Y(·|y), which are probability distributions on 𝒴 and 𝒵, respectively, then (30) can be written as a standard matrix multiplication, i.e.,
P.sub.Z|V=P.sub.Y|VP.sub.Z|Y (32)
One can conclude the proof by capitalizing on the linearity of the right side of (32), along with the aforementioned convexity of g(P.sub.V, ·, λ). QED
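The central fact in the proof above can be checked numerically: with g denoting the function defined in (27) (the closed form below is assumed from the Rényi information definition), the map P.sub.Z|Y ↦ g(P.sub.V, P.sub.Y|VP.sub.Z|Y, λ) lies under its chords along any segment between two channels. All distributions here are illustrative toy values.

```python
import numpy as np

def g(P_A, P_BgA, lam):
    """The function in (27): sum_b (sum_a P_A(a) P_{B|A}(b|a)^lam)^(1/lam)."""
    return float(((P_A[:, None] * P_BgA ** lam).sum(axis=0) ** (1.0 / lam)).sum())

rng   = np.random.default_rng(1)
P_V   = np.array([0.3, 0.7])
P_YgV = np.array([[0.9, 0.1], [0.2, 0.8]])
P = rng.dirichlet(np.ones(2), 2)      # two random 2x2 privacy channels
Q = rng.dirichlet(np.ones(2), 2)      # (each row is a distribution over Z)

lam = 2.0
for a in np.linspace(0.0, 1.0, 11):   # convexity: function value <= chord value
    mix = a * P + (1.0 - a) * Q
    lhs = g(P_V, P_YgV @ mix, lam)
    rhs = a * g(P_V, P_YgV @ P, lam) + (1.0 - a) * g(P_V, P_YgV @ Q, lam)
    assert lhs <= rhs + 1e-12
```

The check passes because each term of g is a weighted λ-norm of a column of the end-to-end channel (convex for λ>1), composed with the linear map in (32), exactly as Lemma 1 argues.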
Lemma 2. S(P.sub.V, P.sub.Y|V, ƒ, r.sub.o) is a convex set, i.e.,
αP.sub.Z|Y+(1−α)Q.sub.Z|Y ∈S(P.sub.V,P.sub.Y|V,ƒ,r.sub.o) (33)
for any P.sub.Z|Y, Q.sub.Z|Y ∈S(P.sub.V, P.sub.Y|V, ƒ, r.sub.o) and α∈(0,1).
Proof. The proof follows via arguments similar to Lemma 1, by replacing the convexity of the function defined in (27) with the convexity of D.sub.f(P.sub.VZ∥P.sub.VP.sub.Z) as a function of P.sub.Z|Y, which is an easy consequence of the facts that D.sub.f(·∥·) is jointly convex in its arguments, that the operations in (32) are linear, and that the marginalization of a joint distribution is linear. QED
Lemma 3. ƒ.sub.acc,MAP(P.sub.V, P.sub.Y|V, .Math.) is a convex function over noisy channels from 𝒴 to 𝒵.
Proof. One can write Γ.sub.MAP(P.sub.V, P.sub.Y|V, P.sub.Z|Y) as follows:
(65)
In light of (34), one can conclude the proof by recalling the facts that the sum of convex functions is also a convex function and that the pointwise maximum of a family of affine functions is a convex function. QED
Lemma 4. ƒ.sub.acc,MSE(P.sub.V, P.sub.Y|V, .Math.) is a convex function over noisy channels from 𝒴 to 𝒵.
Proof. It is easy to show that −E[Var[V|Z]] is convex as a function of the joint distribution P.sub.VZ. This, along with the linearity of the operation in (32), allows us to conclude the proof. QED Lemmas 1-4 together imply Theorem 1. QED
(66) Equipped with Theorem 1, a high-level description of a system to compute the statistics of an optimal privacy-preserving noisy channel is given in
(67) Computation of the optimal privacy-preserving noisy channel. All the maximizer(s) of a convex function over a convex set occur(s) at the extreme points of the feasible set. Thus, finding a global maximizer of a convex function over a convex set typically necessitates evaluating the cost function at all extreme points, which, in turn, makes it computationally demanding. Further, when the feasible convex set is not a polyhedron (a polyhedron is a convex set defined by the intersection of a finite number of half-spaces; a bounded polyhedron is called a convex polytope, or simply a polytope), its extreme points may form a continuum, i.e., the set of extreme points of the feasible set is uncountably infinite, and it becomes extremely difficult to devise an algorithm that computes the global maximum exactly in finite time. Thus, it is customary to devise iterative methods that compute upper and lower bounds on this value that approach each other in every iteration. Although one such methodology, in the spirit of the outer approximations in the convex maximization literature, can be directly applied to implement the system 404, it is possible to get a significant performance improvement by exploiting the particular properties of the problem in (2) beyond its convexity.
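The vertex-evaluation principle above can be illustrated with a minimal sketch, using a hypothetical convex cost over a toy polytope (not the actual objective in (2)):

```python
# Maximizing a convex function over a polytope: the maximum is attained at
# an extreme point, so it suffices to evaluate the cost at every vertex.
# Toy convex cost f(x, y) = x^2 + y^2 over the unit square.

def f(x, y):
    return x * x + y * y

# Vertices (extreme points) of the unit square [0, 1] x [0, 1].
vertices = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

best_vertex = max(vertices, key=lambda v: f(*v))
best_value = f(*best_vertex)
```

Any interior point of the square attains a strictly smaller value than the best vertex, which is why vertex enumeration, however costly, is unavoidable in exact convex maximization.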
(68) Method for computing the optimal privacy-preserving noisy channel.
(69) The offline method consists of two sub-methods, the initial polyhedron generator 504 and the pre-processing method 507. The pre-processing method, which will be explained separately in a following section, has two inputs, the depth of partitioning 506 and the initial polyhedron 505. The initial polyhedron 505, denoted by S.sub.o, is generated by the method 504, whose only input 503 is the pair (K.sub.Y, K.sub.Z), i.e., the cardinalities of the input and output alphabets of the privacy-preserving noisy channels. The output of the offline method is a non-redundant partition of the initial polyhedron 508.
(70) The main operation of the online method is carried out by 510, i.e., the purging partitioned polyhedron (P.P.P.) method, which will be explained separately in a following section. The P.P.P. method has two inputs, a non-redundant partition of the initial polyhedron 508 and a 5-tuple of parameters 509, which includes P.sub.V, i.e., the prior distribution of user information; P.sub.Y|V, i.e., the statistics of the noisy observation channel; r.sub.o, i.e., the user's privacy preference; acc-meas ∈{MAP, MSE}, i.e., the accuracy measure choice; and priv-meas ∈{inf, div}, i.e., the privacy measure choice. The output of the online method 511, which is also the output of the overall method, consists of an approximation of the optimum value of the optimization problem in (2) and a channel achieving this value.
(71) Unlike the pre-processing method 507 and the P.P.P. method 510, the operation of the initial polyhedron generating method 504 will be evident to those skilled in the art, and it is explained here for the sake of completeness. To this end, first define the following mapping from a given stochastic matrix of size K.sub.Y×K.sub.Z to a K.sub.YK.sub.Z-length vector:
{right arrow over (x)}(P.sub.Z|Y):=[P.sub.Z|Y(.Math.|1),P.sub.Z|Y(.Math.|2), . . . ,P.sub.Z|Y(.Math.|K.sub.Y)] (37)
where P.sub.Z|Y(.Math.|i) is a probability distribution on 𝒵, corresponding to the transition probabilities when the input to the channel is i ∈{1, . . . , K.sub.Y}. Let P.sub.Z|Y({right arrow over (x)}) denote its inverse mapping from a given K.sub.YK.sub.Z-length vector to a matrix of size K.sub.Y×K.sub.Z. The fact that P.sub.Z|Y is a transition probability matrix can be captured by imposing the following linear constraints on {right arrow over (x)}(P.sub.Z|Y):
(72)
where the subscript i denotes the i-th element of a vector. Equations (38) and (39) suffice to characterize the linear constraints of the feasible set. Let A.sub.o denote the matrix that succinctly summarizes (38) and (39). In particular, let:
S.sub.o:={{right arrow over (x)}: A.sub.o{right arrow over (x)}≤{right arrow over (b)}.sub.o} (40)
denote the polyhedron corresponding to (38) and (39). Here, {right arrow over (b)}.sub.o represents the vector that collects the right sides of (38) and (39), the inequality is understood element-wise, and we convert the equality constraints in (39) into pairs of inequalities by the usual methodology exemplified below:
[a=b]⇔[a≥b and −a≥−b] (41)
for any a, b ∈ℝ. Finally, note that:
S.sub.o={{right arrow over (x)}(P.sub.Z|Y): P.sub.Z|Y∈𝒫(𝒵|𝒴)} (42)
which is 505, i.e., the set of all noisy channels from 𝒴 to 𝒵 in vectorized form, in the sense of (37).
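A minimal sketch of the constraint construction in (38)-(42) follows, under assumed alphabet sizes; the variable names and sizes are illustrative, not the system's implementation:

```python
# Sketch of the vectorization in (37) and the polyhedron S_o in (40), for
# hypothetical alphabet sizes K_Y = 2, K_Z = 3.
K_Y, K_Z = 2, 3
n = K_Y * K_Z

A_o, b_o = [], []

# Non-negativity of every entry, written as -x_k <= 0 (K_Y*K_Z rows).
for k in range(n):
    row = [0.0] * n
    row[k] = -1.0
    A_o.append(row)
    b_o.append(0.0)

# Each row of P_{Z|Y} sums to 1; per (41), a = b becomes a <= b and -a <= -b.
for i in range(K_Y):
    for sign in (1.0, -1.0):
        row = [0.0] * n
        for j in range(K_Z):
            row[i * K_Z + j] = sign
        A_o.append(row)
        b_o.append(sign * 1.0)

def satisfies(x):
    """Check A_o x <= b_o element-wise (with a small tolerance)."""
    return all(sum(a * v for a, v in zip(row, x)) <= b + 1e-12
               for row, b in zip(A_o, b_o))

# Vectorization x(P_{Z|Y}) of a valid channel: concatenate the rows.
P = [[0.2, 0.3, 0.5], [1.0, 0.0, 0.0]]
x = [p for row in P for p in row]
```

Every row-stochastic matrix, once vectorized, satisfies A_o x ≤ b_o, while matrices with negative entries or off-unit row sums violate some constraint.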
(73) Pre-processing method. Pre-processing method 507, which is depicted in
(74) It is important to note that the redundancy removal routine 603 explained in this section does not depend on P.sub.V, P.sub.Y|V, r.sub.o, acc-meas, priv-meas. As such, it can be computed offline and used for any of the problem instances with the corresponding (K.sub.Y, K.sub.Z) values.
(75) Polyhedron partitioning routine. Consider an arbitrary but fixed polyhedron S⊂ℝ.sup.n. Let {F.sub.1, . . . , F.sub.N} be the facets, i.e., the (n−1)-dimensional faces, of S, and assume that N>2. Further, let {V.sub.1, . . . , V.sub.N} denote the V-representations of the facets, i.e., for each i ∈{1, . . . , N}, V.sub.i={{right arrow over (v)}.sub.i,1, . . . , {right arrow over (v)}.sub.i,M.sub.i} is the vertex set of the facet F.sub.i, which is a polyhedron by definition. Let {right arrow over (x)}.sub.o be the center of mass of S, which can be found as the average of the vertices of S. To be precise, let V.sub.o={{right arrow over (v)}.sub.o,1, . . . , {right arrow over (v)}.sub.o,M.sub.o} denote the set of vertices of S; then:
(76) {right arrow over (x)}.sub.o=(1/M.sub.o)Σ.sub.k{right arrow over (v)}.sub.o,k
For every i ∈{1, . . . , N}, define the polyhedron:
S.sub.o,i:=conv(V.sub.i∪{{right arrow over (x)}.sub.o}) (60)
i.e., the convex hull of the points {{right arrow over (v)}.sub.i,1, . . . , {right arrow over (v)}.sub.i,M.sub.i} together with {right arrow over (x)}.sub.o.
(77) One can use this procedure in the polyhedron partitioning routine 601 as many times as specified by the depth of partitioning 506, by forming a tree of polyhedra, where at each step each polyhedron is partitioned by the aforementioned procedure to form the nodes of the next step, and the root of the tree is S.sub.o, i.e., 505. The aforementioned steps result in a partitioning of the initial polyhedron S.sub.o, i.e., 602.
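The partitioning step can be sketched in two dimensions, where facets are edges and each sub-polytope conv(facet ∪ {x_o}) is a triangle with apex at the center of mass; the square used here is an illustrative assumption:

```python
# Sketch of the facet-centroid partitioning in 2D: split a polygon into
# triangles, one per facet (edge), each with apex at the center of mass.
# Hypothetical example polytope: the unit square.

def shoelace_area(pts):
    """Area of a polygon given its vertices in order (shoelace formula)."""
    n = len(pts)
    s = 0.0
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Center of mass x_o: the average of the vertices, as in (76).
x_o = (sum(p[0] for p in vertices) / 4, sum(p[1] for p in vertices) / 4)

# Facets of a 2D polytope are its edges; each S_{o,i} = conv(facet, x_o).
facets = [(vertices[i], vertices[(i + 1) % 4]) for i in range(4)]
pieces = [[a, b, x_o] for (a, b) in facets]

# The pieces partition the original polygon: their areas sum to its area.
total = sum(shoelace_area(t) for t in pieces)
```

The area check confirms that the sub-polytopes cover the original set without overlap of positive measure, which is exactly what a partition requires.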
(78) Redundancy removal routine. Consider the above partitioning scheme with S←S.sub.o, where S.sub.o is given in (42). Based on the descriptions provided herein, it will be evident to those skilled in the art that it is possible to determine an H-representation of the facets of S.sub.o by imposing each inequality constraint as an equality constraint sequentially, since the H-representation of S.sub.o is in non-redundant form. Further, the center of mass is simply:
(79) {right arrow over (x)}.sub.o=(1/K.sub.Z){right arrow over (1)}
where {right arrow over (1)} denotes the vector of 1s. Hence, applying the procedure in Section 3.2.1 results in the set of sub-polyhedra:
{S.sub.o,1, . . . ,S.sub.o,K.sub.Y.sub.K.sub.Z} (61)
since there are K.sub.YK.sub.Z inequality constraints in the H-representation of S.sub.o.
(80) One can apply the same procedure to every S.sub.o,i to further partition the original set. However, as is argued below, some of the polyhedra in (61) might be “column permutations of each other”, a notion whose precise definition is given below, which, in turn, makes them redundant. Thus, filtering out these redundant sub-polyhedra as a starting point of the next level of partitioning significantly improves performance without resulting in a loss of optimality.
(81) Consider two arbitrary elements S.sub.o,i, S.sub.o,j of the set of polyhedra in (61). Let:
V.sub.o,i={{right arrow over (v)}.sub.o,i,1, . . . , {right arrow over (v)}.sub.o,i,M.sub.i}
V.sub.o,j={{right arrow over (v)}.sub.o,j,1, . . . , {right arrow over (v)}.sub.o,j,M.sub.j}
denote V-representations of S.sub.o,i and S.sub.o,j, respectively. If: (i) |V.sub.o,i|=|V.sub.o,j|; and (ii) for all k ∈{1, . . . , M.sub.i},
P.sub.Z|Y({right arrow over (v)}.sub.o,j,k)=P.sub.Z|Y({right arrow over (v)}.sub.o,i,k)Q.sub.P (64)
for an arbitrary but fixed permutation matrix Q.sub.P of size K.sub.Z×K.sub.Z, then S.sub.o,i and S.sub.o,j are equivalent as far as the optimization problem in (2) goes, since for any {right arrow over (x)}∈S.sub.o,i (resp. {right arrow over (y)}∈S.sub.o,j), there exists some {right arrow over (y)}∈S.sub.o,j (resp. {right arrow over (x)}∈S.sub.o,i) with exactly the same cost and constraint function values; the proof follows from routine calculations that can be carried out by those skilled in the art.
(82) Thus, one can define an equivalence class of sub-polyhedra in terms of the property defined by the conditions in items (i) and (ii) above. In each equivalence class, it is sufficient to keep only one representative for either further partitioning or evaluating the optimum value.
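The equivalence test in items (i) and (ii) can be sketched by brute force over column permutations; the vertex channels below are hypothetical and small enough for exhaustive search:

```python
from itertools import permutations

# Sketch of the column-permutation equivalence test in (64): two vertex
# lists of channels are equivalent if one fixed column permutation maps
# every vertex of the first list to the corresponding vertex of the second.

def permute_columns(M, perm):
    """Apply the column permutation `perm` to matrix M (list of rows)."""
    return [[row[j] for j in perm] for row in M]

def are_permutation_equivalent(A_list, B_list):
    """Check conditions (i) and (ii): same cardinality, and one single
    permutation Q_P with B_k = A_k Q_P for all k (brute force)."""
    if len(A_list) != len(B_list):
        return False  # condition (i) fails
    K_Z = len(A_list[0][0])
    for perm in permutations(range(K_Z)):
        if all(permute_columns(A, perm) == B
               for A, B in zip(A_list, B_list)):
            return True
    return False

# Hypothetical vertex channels with K_Y = 1, K_Z = 3; B applies the same
# column permutation (2, 0, 1) to both vertices of A.
A_list = [[[1.0, 0.0, 0.0]], [[0.5, 0.5, 0.0]]]
B_list = [[[0.0, 1.0, 0.0]], [[0.0, 0.5, 0.5]]]
```

Brute force over K_Z! permutations is only a sketch; in the method, one representative per equivalence class is kept, which shrinks the next level of partitioning.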
(83) Purging partitioned polyhedron (P.P.P.) method. The inputs of this method are the non-redundant partition of the initial polyhedron 508 and the 5-tuple of parameters 509, i.e., the partition of S.sub.o and the problem parameters, respectively.
(84) Before proceeding further, define the following quantities: For any given polyhedron S⊂S.sub.o such that its intersection with the feasible set, i.e.,
S.sub.priv-meas(P.sub.V,P.sub.Y|V,r.sub.o):={{right arrow over (x)}(P.sub.Z|Y): P.sub.Z|Y∈𝒫(𝒵|𝒴), ƒ.sub.priv,priv-meas(P.sub.V,P.sub.Y|V,P.sub.Z|Y)≤r.sub.o} (67)
is not empty, ū(k,S) and u(k,S) denote the outer and inner approximation values of the optimization problem:
(85)
with at most k iterations of the simple approximation routine, which is explained in Section 3.2.3.2.
(86) Returning to the description, the following steps are repeated n times, where I.sub.m−1 denotes the set of indices of the sub-polyhedra at the beginning of step m; m=1, i.e., 704, is the initial value for the iteration counter; and {k.sub.1, . . . , k.sub.n}, i.e., 705, determines the max-iter for the simple inner approximation at each step, as well as the total number of times the following steps are repeated, i.e., n: 1. For every i∈I.sub.m−1, compute ū(k.sub.m, S.sub.o,i(m−1)), u(k.sub.m, S.sub.o,i(m−1)), and the updated polyhedron S.sub.o,i(m) via the simple approximation procedure, i.e., perform the operation 706, and output these quantities, i.e., 707. 2. Eliminate all sub-optimal polyhedra 708 with
(87)
(88) since they cannot contain a global maximizer. In other words, compute:
I.sub.m:={i∈I.sub.m−1: ū(k.sub.m,S.sub.o,i(m−1))≥max.sub.j∈I.sub.m−1u(k.sub.m,S.sub.o,j(m−1))}
and output the 4-tuple (I.sub.m, {S.sub.o,i(m)}.sub.i∈I.sub.m, {ū(k.sub.m,S.sub.o,i(m−1))}.sub.i∈I.sub.m, {u(k.sub.m,S.sub.o,i(m−1))}.sub.i∈I.sub.m) for the next iteration, i.e., perform the operation 709. 3. Check whether the current iteration m exceeds n+1, i.e., perform 710. If not, increment the counter by 1, i.e., perform 711.
(89) Finally, once I.sub.n, {S.sub.o,i(n)}.sub.i∈I.sub.n, and the corresponding outer and inner approximation values are computed, a final application of the simple approximation routine, i.e., 713, is used to produce the output of the method, i.e., 511.
(90) The following points regarding the practical implementation of the above procedure should be noted. First, the applications of the simple approximation routine in 706 and 713 can be carried out in parallel. Thus, the computationally most demanding part of the method can be accomplished in a parallelized fashion. Second, in practice, as the granularity of the initial partition increases, the number of cuts needed to get close enough inner and outer approximations in the simple approximation procedure appears to decrease. Hence, the computation time for each sub-polyhedron appears to decrease as the granularity of the partition increases.
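The purging step can be sketched as follows, with hypothetical bound values; this illustrates the elimination rule (discard a region whose upper bound falls below the best lower bound), not the patented implementation itself:

```python
# Sketch of the purging step: a sub-polyhedron can be discarded when its
# outer (upper) approximation value falls below the best inner (lower)
# approximation value found so far, since it then cannot contain a global
# maximizer. The bound values below are hypothetical.

def purge(bounds):
    """bounds: dict index -> (upper, lower). Return surviving indices."""
    best_lower = max(lo for (_, lo) in bounds.values())
    return {i for i, (up, _) in bounds.items() if up >= best_lower}

# Hypothetical (upper, lower) values for four sub-polyhedra.
bounds = {0: (1.0, 0.4), 1: (0.6, 0.5), 2: (0.3, 0.1), 3: (0.9, 0.7)}
survivors = purge(bounds)
```

Here the best lower bound is 0.7 (region 3), so regions 1 and 2, whose upper bounds are 0.6 and 0.3, are eliminated before the next round of partitioning.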
(91) Feasibility Check via Alternating Projections. Consider a set of polyhedra {S.sub.o,i}.sub.i=1.sup.N such that S.sub.o,i⊂S.sub.o for all i∈{1, . . . , N}. Recall that the feasible set, i.e.,
S.sub.priv-meas(P.sub.V,P.sub.Y|V,r.sub.o)={{right arrow over (x)}(P.sub.Z|Y): P.sub.Z|Y∈𝒫(𝒵|𝒴), ƒ.sub.priv,priv-meas(P.sub.V,P.sub.Y|V,P.sub.Z|Y)≤r.sub.o} (67)
is a convex set. Hence, the well-known alternating projection algorithm can be used to decide whether S.sub.o,i and S.sub.priv-meas(P.sub.V, P.sub.Y|V, r.sub.o), are disjoint. Specifically, this algorithm either finds a point in their intersection, if it is not empty, or converges to two points in each set closest to each other, if the intersection is empty.
(92) For completeness, a summary of the alternating projection procedure is given as follows: Start with a point in S.sub.priv-meas(P.sub.V, P.sub.Y|V, r.sub.o), e.g., {right arrow over (x)}.sub.o, i.e., center of mass of S.sub.o. At j-th step (j≥1), compute the following quantities:
{right arrow over (y)}.sub.j: projection of {right arrow over (x)}.sub.j onto S.sub.o,i (68)
{right arrow over (x)}.sub.j+1: projection of {right arrow over (y)}.sub.j onto S.sub.priv-meas(P.sub.V,P.sub.Y|V,r.sub.o) (69)
(93) At each step, if {right arrow over (x)}.sub.j+1={right arrow over (y)}.sub.j, terminate and declare that the intersection is not empty, which, in turn, implies that S.sub.o,i is feasible. Continue for a maximum number of iterations (in practice, 5 to 10 iterations appear to suffice) and, if {right arrow over (x)}.sub.j+1≠{right arrow over (y)}.sub.j after the final iteration, declare that the sets are disjoint, which is equivalent to saying S.sub.o,i is infeasible.
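A minimal sketch of alternating projections follows, using a disk and a half-plane as stand-ins for the two convex sets (in the method, the sets are S.sub.o,i and the feasible set; the geometry here is an illustrative assumption):

```python
import math

# Sketch of alternating projections between two convex sets in the plane:
# a unit disk and the half-plane {x >= 0.5}. If consecutive iterates
# coincide, a common point has been found, so the intersection is non-empty.

def project_disk(p, center=(0.0, 0.0), r=1.0):
    """Euclidean projection of p onto the disk of radius r."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    d = math.hypot(dx, dy)
    if d <= r:
        return p
    return (center[0] + r * dx / d, center[1] + r * dy / d)

def project_halfplane(p):
    """Euclidean projection onto {x >= 0.5}: clamp the first coordinate."""
    return (max(p[0], 0.5), p[1])

x = (3.0, 0.0)  # starting point inside the half-plane
for _ in range(10):  # 5 to 10 iterations typically suffice in practice
    y = project_disk(x)
    x = project_halfplane(y)
    if x == y:
        break
```

Since the disk and the half-plane intersect, the iterates coincide after the first round; with disjoint sets, the iterates would instead converge to the closest pair of points.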
(94) Simple Approximation Routine. Recall that S.sub.priv-meas(P.sub.V, P.sub.Y|V, r.sub.o) is a subset of S.sub.o, and the containment is strict unless r.sub.o=1. However, since S.sub.o is a polyhedron, it has a finite number of extreme points (equivalently, vertices), and there are known algorithms to characterize them, such as Fourier-Motzkin elimination and linear programming-based methods. Thus, if one can “peel off” S.sub.o by successive cuts so that the maximum of the cost function over the reduced polyhedron strictly improves, i.e., gets lower, one should progress toward a (local) optimum value of the original problem. Depending on the cutting methodology, it is well known to those skilled in the art that outer approximations in the spirit of the above ideas will lead to a global optimum of the original problem.
(95) One such procedure is explained below where
priv-meas ∈{inf,div} (43)
acc-meas ∈{MAP,MSE} (44) 1. Initialize with S.sub.o and i=0. 2. Repeat the following steps until convergence, where S.sub.i denotes the polyhedron at the i-th iteration. 2.1. Compute the i-th outer approximation point:
(96)
(97)
where ∥.∥.sub.2 denotes the Euclidean norm. 2.3. Compute the supporting hyperplane of S.sub.priv-meas(P.sub.V, P.sub.Y|V, r.sub.o) at {right arrow over (y)}.sub.i and update A.sub.i by adding the equation of this hyperplane, i.e., cut S.sub.i with the aforementioned supporting hyperplane, to get A.sub.i+1 and {right arrow over (b)}.sub.i+1 to deduce:
S.sub.i+1={{right arrow over (x)}: A.sub.i+1{right arrow over (x)}≤{right arrow over (b)}.sub.i+1} (48)
(98) In what follows, we list some remarks regarding the implementation of the aforementioned routine.
(99) Computing the maximizer of the relaxed problem. Recall the problem:
(100)
where S.sub.i={{right arrow over (x)}: A.sub.i{right arrow over (x)}≤{right arrow over (b)}.sub.i}, i.e., the polyhedron that includes the feasible set. As noted above, any maximizer of this problem is a vertex of S.sub.i, and one needs to evaluate the cost function over all these vertices to find the global maximizer. Computing the cost function itself is not computationally demanding. Although there are well-known numerical routines to compute the vertices of a given polyhedron, such as the double description method based on Fourier-Motzkin elimination and methods based on linear programming, vertex enumeration is computationally more demanding than the remaining parts, especially as the number of iterations, i.e., the number of cuts, grows beyond a moderate number that depends on the dimensionality of the channels, i.e., K.sub.V, K.sub.Y, and K.sub.Z. As such, an efficient implementation necessitates limiting the number of cuts.
(101) Computing the hyperplane cuts. An implementation of the cutting-plane procedure is the following: Given {right arrow over (y)}.sub.i on the boundary of the feasible set, a supporting hyperplane can be found by evaluating the gradient of ƒ.sub.priv,priv-meas(P.sub.V, P.sub.Y|V, .Math.) at this point, which will be the normal vector of the supporting hyperplane. In particular, the half-space defined by this supporting hyperplane is given by:
H.sub.i:={{right arrow over (x)}: ⟨∇ƒ.sub.priv,priv-meas(P.sub.V,P.sub.Y|V,P.sub.Z|Y({right arrow over (y)}.sub.i)),{right arrow over (y)}.sub.i−{right arrow over (x)}⟩≥0} (50)
where ∇ and ⟨.Math.,.Math.⟩ denote the gradient and inner product operators, respectively. One can augment the inequality constraint in (50) into A.sub.i to get the H-representation of the updated polytope S.sub.i+1 for the (i+1)-th step.
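The half-space construction in (50) can be sketched for a hypothetical constraint function g(x)=x.sub.1.sup.2+x.sub.2.sup.2 with r.sub.o=1 (the unit disk); the gradient and the membership test are illustrative assumptions, not the system's actual privacy measure:

```python
# Sketch of the cut in (50): given a boundary point y_i of the feasible set
# {x : g(x) <= r_o}, the gradient of g at y_i is the normal vector of a
# supporting hyperplane, and the retained half-space is
#   H_i = {x : <grad g(y_i), y_i - x> >= 0}.
# Hypothetical constraint: g(x) = x_1^2 + x_2^2 with r_o = 1.

def grad_g(x):
    return [2.0 * x[0], 2.0 * x[1]]  # gradient of x_1^2 + x_2^2

y_i = [1.0, 0.0]          # boundary point: g(y_i) = 1 = r_o
normal = grad_g(y_i)      # normal vector of the supporting hyperplane

def in_halfspace(x):
    """Membership in H_i, i.e., the side containing the feasible set."""
    inner = sum(n_k * (y_k - x_k) for n_k, y_k, x_k in zip(normal, y_i, x))
    return inner >= -1e-12

kept = in_halfspace([0.0, 0.0])      # interior feasible point is retained
cut = not in_halfspace([2.0, 0.0])   # point beyond the hyperplane is cut
```

By the convexity of g, every feasible point lies in H_i, so appending this inequality to A_i shrinks the outer polytope without discarding any feasible channel.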
(102) Thresholds for termination. The routine outlined above can be shown to converge to a global maximizer as the number of iterations grows unboundedly. Yet, for practical applications, one needs to impose some termination conditions. To this end, define the following quantities:
(103)
where .sub.outer(i) (resp.
.sub.inner(i) are the outer (resp. inner) approximation values after i iterations, and υ.sub.outer(0) and
.sub.inner(0) are the initial values of these quantities, provided as part of the initial conditions of the routine.
Let δ.sub.outer-app-improv, δ.sub.outer-inner-app and δ.sub.const-qual denote the tolerance on the outer approximation improvement, inner-outer approximation distance, and distance to the feasible region, respectively. Also, let max-iter denote the maximum number of iterations allowed.
(104) The iteration is terminated if any of the following four conditions is satisfied:
|υ.sub.outer(i+1)−υ.sub.outer(i)|≤δ.sub.outer-app-improv (55)
|υ.sub.outer(i)−υ.sub.inner(i)|≤δ.sub.outer-inner-app (56)
|ƒ.sub.priv,priv-meas(P.sub.V,P.sub.Y|V,P.sub.Z|Y ({right arrow over (x)}*.sub.i))−r.sub.o|≤δ.sub.const-qual (57)
i≥max-iter (58)
(105) With these termination conditions, we conclude the remarks regarding the practical implementation of the simple approximation routine.
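The four termination tests (55)-(58) can be sketched as a single predicate; the tolerance values and the numerical inputs below are hypothetical:

```python
# Sketch of the termination conditions (55)-(58). The routine stops when
# the outer bound stops improving, the outer-inner gap closes, the
# constraint is met to tolerance, or the iteration budget is exhausted.

def should_terminate(v_outer_prev, v_outer, v_inner, g_at_opt, r_o, i,
                     tol_improv=1e-6, tol_gap=1e-4, tol_const=1e-4,
                     max_iter=50):
    return (abs(v_outer - v_outer_prev) <= tol_improv    # (55)
            or abs(v_outer - v_inner) <= tol_gap         # (56)
            or abs(g_at_opt - r_o) <= tol_const          # (57)
            or i >= max_iter)                            # (58)
```

For example, with no change in the outer bound, condition (55) alone triggers termination; with a stalled run at the iteration budget, condition (58) does.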
(106) In conclusion, we would like to reiterate the differences the present system offers compared to conventional systems, and to give a detailed comparison with the existing work in the literature that is arguably the most relevant to the present system.
(107) The present system offers the following differences as compared to conventional systems. First, in conventional systems, accuracy is based on the closeness of Z to Y, measured either in terms of an average distortion measure or an information measure, rather than the performance of the best statistical inference algorithm in inferring V based on the observation Z, as in our case. Second, conventional systems are “dual” to our formulation, i.e., accuracy is guaranteed and privacy is optimized; in our case, it is the other way around. Third, the implementation of conventional systems involves a convex minimization over a convex set, rather than a convex maximization over a convex set as in our case. Fourth, in conventional systems, the privacy measure is not normalized, as opposed to our case.
(108) Privacy-preserving Data Mapping Under a Privacy-Utility Trade-off. In their paper, entitled “Privacy against statistical inference,” in 50th Annual Allerton Conference on Communication, Control, and Computing, Monticello, Ill., 2012, F. du Pin Calmon and N. Fawaz consider the following setting, which is reproduced here in the notation we have used above: Given the Markov chain V→Y→Z,
(109)
where
(110)
and C(V, Q.sub.V) is a loss function from the cartesian product of 𝒱 and the probability distributions defined on it to the real numbers, d: 𝒴×𝒵→ℝ.sub.+ is a distortion measure and
(111)
The optimization variables of the minimization problems in (59) and (60) are stochastic mappings from Y to Z, i.e., P.sub.Z|Y. Note that the above formulation has been shown to be stronger than differential privacy, and hence, in a sense, a generalization of differential privacy. It will be evident to those skilled in the art that the aforementioned setting is fundamentally different from the privacy-preserving sufficient statistic. The following is a list of some of the most notable differences. First, accuracy in the aforementioned formulation is based on the closeness of Z to Y, measured in terms of an average distortion measure; whereas in the privacy-preserving sufficient statistic, it is based on the accuracy of an optimal statistical inference algorithm's inference of V, for a given loss function, based on the observation Z. Second, the aforementioned formulation is the “dual” of the formulation of the privacy-preserving sufficient statistic, i.e., in the former the accuracy of the third party's observation is guaranteed to exceed a certain amount and the privacy loss of the user is minimized; whereas in the latter, the privacy loss the user experiences is guaranteed not to exceed a certain amount and the accuracy of the third party's inference is maximized. Third, the particularization of the aforementioned formulation with the self-information cost function reduces to a convex minimization over a convex set; whereas the particularizations of the privacy-preserving sufficient statistic with the privacy and accuracy measures mentioned above reduce to a convex maximization over a convex set. It will be evident to those skilled in the art that these two types of optimization problems are fundamentally different as far as the methods to compute their respective optimizers are concerned. Fourth, in the aforementioned formulation, the privacy measure is not normalized, whereas in the privacy-preserving sufficient statistic the privacy measure is normalized.
(112) Information Bottleneck/Privacy Funnel. The so-called information bottleneck, introduced by N. Tishby and his co-workers in their paper “The information bottleneck method,” arXiv:physics/0004057, 2000, and its dual, the privacy funnel, introduced by A. Makhdoumi and his co-workers in their paper “From the information bottleneck to the privacy funnel,” in 2014 IEEE Information Theory Workshop (ITW 2014), 2014, are two noteworthy attempts to formalize the privacy-utility trade-off, each of which is incorporated herein by reference in its entirety. In our notation, they read as follows: Given the Markov chain V→Y→Z and λ>0,
(113)
where (61) (resp. (62)) is called the information bottleneck (resp. privacy funnel). Note that in both cases, the second optimization problem is typically interpreted as a Lagrange multiplier version of the first one, claimed to be an equivalent formulation. There are also ƒ-divergence versions, to which the following comments also apply. Among the many differences with the formulation presented herein, the most notable ones follow. First, privacy is measured in terms of the information Z contains about Y, yet in our case we use a normalized version of the information Z contains about the user preference V. Second, accuracy is measured in terms of the mutual information, i.e., an information measure, whereas in our case we use the accuracy of the best statistical inference algorithm's performance in inferring V based on the observation Z, and these two notions do not have a direct mapping between each other. Third, for the constrained versions, algorithms are given only to compute the boundary of the set {[I(Y;Z), I(V;Z)]: V→Y→Z}, which does not give an optimizer but rather provides, in a sense, a subset of the feasible set that includes the global optimizer(s), whereas we have an algorithm that computes a global optimizer.
(114) An Example of the Implementation of Valuable Information Extractor, i.e., 102, and Clarification on the Role of Updated Prior Information. Consider the following scenario: valuable information is a rating of a user's likelihood of purchasing a product at a given time instance. A particular example can be constructed akin to stock ratings in which an analyst summarizes her belief of a stock's future potential in discretized options such as buy, hold, sell. Specifically, the aforementioned rating of a user's likelihood of purchasing a product at a given time instance can consist of three possible values: (i) Interested; (ii) Neutral; and (iii) Not interested.
(115) In order to extract this valuable information, the method can use the totality of a user's web activity, such as websites visited, search queries, shopping carts in e-commerce websites, etc., which represents the meta data source in the above discussion. In any practical embodiment of the aforementioned scenario, it should be clear that the valuable information is not readily available to the method and needs to be extracted from the meta data source. Next, we outline one way to implement the method in such a scenario and outline the role prior knowledge plays in this implementation.
(116) Let {X.sub.1, . . . , X.sub.T} be a sequence of the ratings of a user's likelihood of purchasing a given product, where X.sub.i is the said rating at time i, and assume the method's ultimate goal is to extract an estimate of X.sub.T, i.e., the said rating at time T. One way to achieve this in a dynamic manner is to extract estimates of X.sub.i at each time instance, i.e., iteratively refine the estimate of the rating in light of further information gathered at each time. Let {Y.sub.1, . . . , Y.sub.T} represent the sequence of such iterative estimates. At the beginning of the procedure, i.e., the operation to produce Y.sub.1, the method would use all the prior knowledge about the aforementioned rating, along with all the available meta data at that point in time.
(117) One can succinctly summarize this prior knowledge with a probability distribution over the possible values this rating can take. For example, going back to the initial example, if no prior information were available, then one could simply assume that all three possibilities are equally likely, i.e., each possibility has a probability of ⅓. After the extraction operation at each time i, the method has potentially updated information, which is due not only to the availability of more meta data, but also to the fact that the extraction operation distills more refined information regarding X.sub.i. As such, in order to accomplish the extraction during the next time instance, the prior knowledge to be used will be different and, hence, needs to be updated. In particular, for the ultimate step, in which Y.sub.T is produced, all the previous estimates, i.e., {Y.sub.1, . . . , Y.sub.T−1}, are available to the system, and hence the prior knowledge, i.e., the probability distribution of X.sub.T, to be used will be a function of these values. Therefore, the prior knowledge used in the ultimate step is a potentially different, i.e., more refined, version of the prior knowledge at the very beginning of the procedure.
(118) We continue with a couple of remarks regarding the above discussion. The updated prior knowledge has two roles in the aforementioned implementation.
(119) First, as depicted in
(120) Second, during the next epoch of the extraction procedure, in which the goal is to extract an estimate of X.sub.2T, one will start the extraction procedure with, as the prior knowledge, the posterior distribution computed by using the prior distribution representing this updated prior knowledge, along with the extraction noise statistics and Bayes' theorem.
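The prior update described above can be sketched with Bayes' theorem for the three-valued rating; the likelihood values standing in for the extraction noise statistics are hypothetical:

```python
# Sketch of the Bayesian prior update for the three-valued rating
# {Interested, Neutral, Not interested}. The likelihoods below are
# hypothetical stand-ins for the extraction noise statistics.

def bayes_update(prior, likelihood_row):
    """Posterior ∝ prior * likelihood of the observed estimate."""
    unnorm = [p * l for p, l in zip(prior, likelihood_row)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# No prior information: uniform over the three ratings, as in the text.
prior = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical P(Y = "Interested" | X = rating) for each true rating.
likelihood_interested = [0.7, 0.2, 0.1]

# Observing the estimate "Interested" refines the prior for the next epoch.
posterior = bayes_update(prior, likelihood_interested)
```

The refined posterior then plays the role of the prior knowledge at the start of the next extraction epoch, exactly as the paragraph above describes.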
(121) The operation in the preceding paragraph, is not necessarily reflected in
(122) The statistical framework used in
(123) It should be noted that various changes and modifications to the embodiments described herein will be apparent to those skilled in the art. Such changes and modifications may be made without departing from the spirit and scope of the present invention and without diminishing its attendant advantages. For example, various embodiments of the systems and methods may be provided based on various combinations of the features and functions from the subject matter provided herein.