System and method for processing data and managing information
11580417 · 2023-02-14
Assignee
Inventors
Cpc classification
H04L9/3239
ELECTRICITY
H04L9/085
ELECTRICITY
H04L67/10
ELECTRICITY
H04L9/0894
ELECTRICITY
H04L9/0891
ELECTRICITY
International classification
Abstract
A method including receiving, at multiple cloud computing servers, multiple streaming data sets for the same sensing task each from a respective client device. The streaming data set from each client device comprises sensed data sensed by one or more sensors of said client device. The streaming data sets are encrypted. Each respective streaming data set from a respective client device is divided into multiple streaming data set portions, each to be received at a respective one of the cloud computing server. The method also includes processing, at each respective cloud computing server, the corresponding streaming data set portions received to generate a corresponding share of a result for the sensing task. The method also includes encrypting, at each respective one of the cloud computing servers, the corresponding share of the result; and facilitating creation or update of a blockchain based on the encrypted shares of the result.
Claims
1. A method for processing data and managing information, comprising: receiving, at two or more cloud computing servers operably connected with each other, and from each respective one of a plurality of client devices, a respective streaming data set for a same sensing task, a respective first multiplication triplet and a respective second multiplication triplet, wherein the streaming data set from each respective client device comprises sensed data sensed by one or more sensors of said client device and is encrypted at the respective client device, wherein each respective streaming data set from a respective client device is divided by the respective client device into two or more streaming data set portions each received at a respective one of the two or more cloud computing servers, wherein each respective first multiplication triplet from a respective client device is divided by the respective client device into two or more first multiplication triplet portions each received at a respective one of the two or more cloud computing servers, and wherein each respective second multiplication triplet from a respective client device is divided by the respective client device into two or more second multiplication triplet portions each received at a respective one of the two or more cloud computing servers; processing, at each respective one of the two or more cloud computing servers, the corresponding streaming data set portions and the corresponding first and second multiplication triplet portions received from the plurality of client devices to generate (i) respective share of a result for the sensing task using the first multiplication triplet portions and (ii) respective share of weight of the client devices using the second multiplication triplet portions, the weight of each of the client devices being associated with quality of the corresponding streaming data set provided by the client device; encrypting, at each respective one of the two or more cloud computing servers, the corresponding share of the result; facilitating creation or update of a blockchain based on all of the encrypted shares of the result so as to store the encrypted shares of the result in the blockchain; and hosting the blockchain at the two or more cloud computing servers.
2. The method of claim 1, wherein the streaming data sets are encrypted by secret sharing.
3. The method of claim 1, wherein the processing of the streaming data set portions comprises updating an existing share of the result at the respective cloud computing server.
4. The method of claim 3, wherein the updating of the existing share of the result comprises applying a weighting to the data associated with the existing share of the result and the streaming data set portions received.
5. The method of claim 4, wherein the weighting comprises an exponential weighted moving average.
6. The method of claim 1, wherein each of the client devices includes a respective data-quality-weighting, and wherein the processing of the streaming data set portions comprises updating the data-quality-weighting of the respective client device based on the generated share of the result.
7. The method of claim 1, wherein for each respective cloud computing server the processing of the streaming data set portions is conducted in a ciphertext domain.
8. The method of claim 1, further comprising: receiving, from a requester device, an indication of interest to purchase the result of the sensing task, processing the indication of interest at the two or more cloud computing servers to determine whether to provide the encrypted shares of the result to the requester device; providing the encrypted shares of the result to the requester device upon determining that the result should be provided to the requester device.
9. The method of claim 8, wherein the indication of interest comprises a purchase request and a payment.
10. The method of claim 9, wherein the payment from the requester device to the two or more cloud computing servers and the provision of the encrypted shares of the result to the requester device from the two or more cloud computing servers occur substantially simultaneously.
11. The method of claim 9, further comprising removing the encrypted shares of the result from the two or more cloud computing servers after providing the encrypted shares of the result to the requester device.
12. The method of claim 9, further comprising providing a reward to the respective client devices upon providing the encrypted shares of the result to the requester device.
13. The method of claim 12, wherein an amount of the reward to be provided to the respective client device is based on a data-quality-weighting of the respective client device.
14. The method of claim 1, wherein each respective streaming data set is comprised of a sensed value for the same sensing task and the sensed value is split into two or more shares by the respective client device.
15. A system for processing data and managing information, comprising: two or more cloud computing servers operably connected with each other and arranged to: receive, from a respective one of a plurality of client devices, a respective streaming data set for a same sensing task, a respective first multiplication triplet, and a respective second multiplication triplet, wherein the streaming data set from each respective client device comprises sensed data sensed by one or more sensors of said client device and is encrypted at the respective client device, wherein each respective streaming data set from a respective client device is divided by the respective client device into two or more streaming data set portions each received at a respective one of the two or more cloud computing servers, wherein each respective first multiplication triplet from a respective client device is divided by the respective client device into two or more first multiplication triplet portions each received at a respective one of the two or more cloud computing servers, and wherein each respective second multiplication triplet from a respective client device is divided by the respective client device into two or more second multiplication triplet portions each received at a respective one of the two or more cloud computing servers; process, at each respective one of the two or more cloud computing servers, the corresponding streaming data set portions and the corresponding first and second multiplication triplet portions received from the plurality of client devices to generate (i) respective share of a result for the sensing task using the first multiplication triplet portions and (ii) respective share of weight of the client devices using the second multiplication triplet portions, the weight of each of the client devices being associated with quality of the corresponding streaming data set provided by the client device; encrypt, at each respective one of the two or more cloud computing servers, the corresponding share of the result; facilitate creation or update of a blockchain based on all of the encrypted shares of the result so as to store the encrypted shares of the result in the blockchain; and host the blockchain at the two or more cloud computing servers.
16. The system of claim 15, wherein each respective streaming data set is comprised of a sensed value for the same sensing task and the sensed value is split into two or more shares by the respective client device.
17. A non-transitory computer readable medium for storing computer instructions that, when executed by a plurality of cloud computing servers, causes the cloud computing servers to perform a method for processing data and managing information, comprising: receiving, at two or more cloud computing servers operably connected with each other, and from each respective one of a plurality of client devices, a respective streaming data set for a same sensing task, a respective first multiplication triplet, and a respective second multiplication triplet, wherein the streaming data set from each respective client device comprises sensed data sensed by one or more sensors of said client device and is encrypted at the respective client device, wherein each respective streaming data set from a respective client device is divided by the respective client device into two or more streaming data set portions each received at a respective one of the two or more cloud computing servers, wherein each respective first multiplication triplet from a respective client device is divided by the respective client device into two or more first multiplication triplet portions each received at a respective one of the two or more cloud computing servers, and wherein each respective second multiplication triplet from a respective client device is divided by the respective client device into two or more second multiplication triplet portions each received at a respective one of the two or more cloud computing servers; processing, at each respective one of the two or more cloud computing servers, the corresponding streaming data set portions and the corresponding first and second multiplication triplet portions received from the plurality of client devices to generate (i) respective share of a result for the sensing task using the first multiplication triplet portions and (ii) respective share of weight of the client devices using the second multiplication triplet portions, the weight of each of the client devices being associated with quality of the corresponding streaming data set provided by the client device; encrypting, at each respective one of the two or more cloud computing servers, the corresponding share of the result; facilitating creation or update of a blockchain based on all of the encrypted shares of the result so as to store the encrypted shares of the result in the blockchain; and hosting the blockchain at the two or more cloud computing servers.
18. The non-transitory computer readable medium of claim 17, wherein each respective streaming data set is comprised of a sensed value for the same sensing task and the sensed value is split into two or more shares by the respective client device.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
(15)
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
(16) Overview
(17) A. Architectural Model
(18) In one embodiment of the invention, there is provided a privacy-aware crowdsensing framework that can efficiently leverage encrypted crowdsensed data streams to discover and sell knowledge.
(19) In this embodiment, clients 102 (client devices associated with users) are participants in crowdsensing applications (e.g., air quality monitoring and traffic monitoring). They are willing to perform sensing tasks and share the sensed data. The sensed data of the clients 102 are preferably encrypted for privacy protection. Monetary rewards may be provided to clients 102 in the system 100 as a reward. To achieve privacy protection and cost efficiency, the present embodiment uses a lightweight cryptographic technique, i.e., additive secret sharing, for encrypting the client's sensed data. In the present embodiment, at each epoch, a client 102 splits its sensed data into two shares for submission.
(20) The cloud (or server) 104 is deployed for collecting and processing the encrypted data streams received from the clients. In this embodiment, the cloud entity includes two independent cloud servers, i.e., S.sub.0 and S.sub.1, each arranged to obtain a respective share of the sensed data of a client at every epoch. The two servers S.sub.0 and S.sub.1 jointly perform truth discovery in the ciphertext domain, then incrementally update the truth for each sensing task and update each client's weight. As a result, at each epoch, each cloud server obtains, respectively, one share of the truth for each sensing task as well as the client's weight. Such an encrypted truth discovery procedure is conducted in a streaming manner to efficiently handle the streaming data. In other words, at each epoch, the shares of clients' sensed data will be scanned only once at each server.
(21) Once the encrypted truth discovery procedure at each epoch is completed, the framework in the present embodiment supports transparent and streamlined knowledge monetization by leveraging blockchain-based technology, preferably with quality-aware rewards delivered to the clients 102 who contributed their data. Particularly, at the end of each epoch, the servers S.sub.0 and S.sub.1 independently encrypt and commit its corresponding share of the learned truth for the sensing task and of each client's weight to a blockchain, preferably also on the server (e.g., the same or other computer systems), for transparent and streamlined knowledge monetization with the requester 106. In the system 100 of
(22) Knowledge monetization in the present invention may operate under various kinds of models. In one example, the two cloud servers jointly initiate the crowdsensing application with the purpose of making profit by selling the learned truths to requester who shows interests. And the clients are also rewarded in return, preferably according to their weights. In another example, the requester initiates the crowdsensing application and employs the two cloud servers to continuously collect sensed data from the clients and perform streaming truth discovery. The two cloud servers get paid by the requester for the crowdsensing service, and the clients also obtain quality-aware rewards in return, preferably according to their weights.
(23) B. Security Goals and Threat Assumptions
(24) The present embodiment has two main security goals.
(25) First, before knowledge monetization is initiated, the design aims to ensure that the two cloud servers learn nothing useful on its own. Preferably, the processing of the clients' sensed data at the cloud is conducted in the ciphertext domain throughout the streaming truth discovery procedure so that the two cloud servers are oblivious to the underlying private data. The two cloud servers in the present embodiment can be considered as non-colluding semi-honest adversaries. Preferably, each of the cloud servers will follow the protocol specification, and is interested in independently learning individual clients' sensed data.
(26) Second, for blockchain-based knowledge monetization, the present design primarily aims to ensure monetization fairness and knowledge confidentiality. In terms of monetization fairness, preferably, the requester is able to obtain the knowledge once the offered payment is finalized on the blockchain and the cloud servers and each client can automatically obtain monetary rewards in return. Regarding knowledge confidentiality, preferably, the knowledge is not revealed to any parties prior to knowledge monetization and, once knowledge monetization is realised, the knowledge is only revealed to the requester and becomes his proprietary information (e.g., no longer available for others). In one embodiment, during knowledge monetization, the requester and the two cloud servers would want to protect their respective interests. That is, the requester is not willing to pay to the two cloud servers and clients in advance while the two cloud servers are not willing to reveal the truth shares (i.e., the useful information) to the requester in advance. In the present embodiment, individual clients' weights are not protected (could be made known) during knowledge monetization as the weight information is inherently required for transparent and automatic quality-aware reward. However, in embodiments without reward scheme, the clients' weights may be protected.
(27) Preliminaries
(28) A. Streaming Truth Discovery
(29) Streaming truth discovery can be used to support real-time truthful aggregation (i.e., weighted aggregation) of sequentially incoming sensed data in crowdsensing while fully respecting historical aggregation results. The present embodiment uses streaming truth discovery, which, unlike common truth discovery techniques that run a batch algorithm over a static dataset iteratively, scans data once and performs real-time processing to update the truth (i.e., weighted aggregation result) and weights of data contributors incrementally, taking into account (e.g., integrating) historical results. Depending on the application context, various ways can be applied to control the effect of historical results on the latest truth. In one embodiment, exponential weighted moving average is applied to the data so that the historical effect of data on the latest result decreases exponentially.
(30) Algorithm 1 in
(31) In Algorithm 1, the initial weights of all clients in the first epoch are initially set to 1. In the subsequent epoch l, with each client k's weight w.sub.k, the truth Φ.sub.l is first updated via weighted aggregation, as shown in Equation (1) in
(32) B. Garbled Circuit
(33) A. C. Yao, “How to generate and exchange secrets (extended abstract)” in Proc. of FOCS, 1986 has disclosed a garbled circuit that enables two parties, with private input x and y respectively, to jointly compute an arbitrary function ƒ(x, y), without leaking information about their private inputs beyond the function output. Specifically, one party named the garbler first generates and garbles a circuit that computes ƒ(). Then, the garbled circuit and the garbled input {tilde over (x)} corresponding to its input x are given to the other party named evaluator. To avoid leaking its private input to the garbler, the evaluator runs an oblivious transfer protocol (OT) with the garbler to learn the garbled input y that corresponds to its input y. With {tilde over (x)} and {tilde over (y)}, the evaluator can now evaluate the garbled circuit and learn the function output ƒ(x, y).
(34) Mining Truthful Knowledge Through Privacy-Preserving Streaming Truth Discovery
(35) This section illustrates how crowdsensing with privacy-preserving streaming truth discovery is enabled in the present embodiment for producing truthful knowledge that can later be monetized via blockchain-based technology. To simplify illustration, the sensed data is referred below as sensing values.
(36) A. Design Overview
(37) To support privacy-preserving streaming truth discovery in a mobile-friendly and cost-effective manner, the present embodiment utilizes a lightweight cryptographic primitive, i.e., secret sharing, for protecting each client's sensing values. Specifically, at each epoch, given a sensing value x, a client splits it into two shares [x].sub.0 and [x].sub.1, by randomly choosing [x].sub.0 ∈F.sub.q and constructing [x].sub.1=x−[x].sub.0 ∈F.sub.q, where F is a finite field and q is a large prime. Denote this additive secret-sharing scheme as SS(⋅). The cloud server S.sub.0 holds [x].sub.0 while the cloud server S.sub.1 holds [x].sub.1. Then, the two cloud servers jointly perform secure streaming truth discovery algorithm based on the shares. As a result, at each epoch, each cloud server obtains one share of the truth for each task and one share of each client's weight.
(38) The following describes how to securely perform the atomic operations underlying the streaming truth discovery algorithm based on the shares of clients' sensing values. The following first considers how secure addition and multiplication can be performed. The goal in this instance is to add or multiply the shares of two inputs, e.g., [x].sub.i and [y].sub.i, so as to allow cloud server S.sub.i(i∈{0,1}) to get a share [x+y].sub.i or [x−y].sub.i without leaking private inputs x and y to both cloud servers. For the addition operation, the shares can be readily added by having S.sub.i compute [x+y].sub.i=[x].sub.i+[y].sub.i. For multiplication, if one of the inputs is a constant A∈Fq, S.sub.i computes as [Ay].sub.i=A[y].sub.i. Denote multiplication by a constant as Mul.sub.c([y].sub.i, A). If both of the two inputs are shared values, a viable technique for multiplication could be applied.
(39) In one embodiment, the design adapts a multiplication triplet technique disclosed in D. Beaver, “Efficient multiparty protocols using circuit randomization,” in Proc. of CRYPTO, 1991 to support multiplication over secret-shared values. Suppose that the cloud servers already share a multiplication triplet (a, b, c)∈F.sub.q.sup.3 where a and b are random values and a.Math.b=c∈F.sub.q. Beaver's technique implies that multiplication over shared values x and y can be performed as follows. In particular, each S.sub.i locally computes [u].sub.i=−[a].sub.i and [v]=[y].sub.i−[b].sub.i. Then, each S.sub.i broadcasts [u].sub.i and [v].sub.i. Each S.sub.i can reconstruct u and v, and further compute the share
(40)
Such multiplication over secret-shared values is denoted by Mul.sub.tri ([x].sub.i, [y].sub.i). To generate the multiplication triplet, one option is to run a cryptographic protocol between the two cloud servers. However, this option could be expensive. For higher cost efficiency, it is preferred that the multiplication triplets are generated on the client side, so that the expensive cryptographic protocol for multiplication triplet generation is avoided.
(41) Besides secure addition and multiplication operations, it is observed that secure division and logarithm operations may also be required. As these operations may be difficult for direct handling under the secret sharing technique, garbled circuits can be applied to securely reconstruct original values, perform the operations, and secret-share the computation result among the two cloud servers again for future processing. In one embodiment, the idea is to let S.sub.0 provide S.sub.1 with a garbled circuit inside which their shares are combined, and division and/or logarithm computation is performed over the recovered data. The evaluation of the garbled circuit on the S.sub.1 side will output a masked result m−∈F.sub.q, where in is the computation result and r is a random value chosen by S.sub.0. So, [m].sub.1=m−r is the share of in for S.sub.1, while [m].sub.0=r is the corresponding share for S.sub.0.
(42) For the non-linear logarithm function, it may be difficult to be efficiently computed inside the garbled circuit. In this case, a secure computation friendly approximation for the logarithm function may be required for efficiently support in the encrypted domain. One option would be to approximate such function by high-degree polynomials, e.g., Taylor series expansion. However, the handling of high-degree polynomials in the encrypted domain is generally not efficient. Thus, one embodiment applies approximation by low-degree polynomials and particularly the technique of piecewise linear polynomial approximation. In this case, the non-linear logarithm function is approximated via a set of piecewise continuous polynomials, in which the polynomial degree is 1. Consequently, the logarithm function inside the garbled circuit can be replaced by relatively lightweight operations like comparison, addition, and (depth-1) multiplication.
(43) B. Detailed Construction
(44) The construction in the present embodiment includes three steps at each epoch l. To handle floating-point numbers, a scaling factor L is used to scale up a fractional value into an integer if needed. Note that all intermediate and final values (except constants) at each epoch are secret-shared between the two cloud servers S.sub.0 and S.sub.1.
(45) Step 1: Data Shares Submission
(46) In this phase of epoch l, each client k first splits the sensing value x.sub.l.sup.k into two shares via ([x.sub.l.sup.k].sub.0, [x.sub.l.sup.k].sub.1)←SS(x.sub.l.sup.k). Besides, each client k also randomly generates two multiplication triplets Tri.sub.0=(a.sub.0, b.sub.0, c.sub.0) and Tri.sub.1=(a.sub.1, b.sub.1, c.sub.1), and splits each of them into two shares via SS(⋅). Then, each client k submits the share {[x.sub.l.sup.k].sub.1, [Tri.sub.0].sub.i, [Tri.sub.1].sub.i} to the cloud server S.sub.i, via a secure and authenticated channel.
(47) Step 2: Truth Shares Update
(48) In the first epoch, the weight and accumulated distance of each client k are first initialized as constants by each of the cloud servers. Then, each cloud server S.sub.i can locally update its share of truth Φ.sub.i via Equation (a) in , [
, {tilde over (r)}) to S.sub.1. S.sub.1 runs an oblivious transfer protocol with S.sub.0 to obtain its garbled values [
and [
. Then, with the given garbled values [
, [
, [
, [
, and {tilde over (r)}, S.sub.1 evaluates the garbled circuit and obtains output Φ−r, which is the truth share Φ.sub.1 on the S.sub.1 side.
(49) Step 3: Weight Shares Update
(50) After updating the truth shares, the weight shares can then be securely updated. First, the squared error d.sub.k=(x.sub.l.sup.k−Φ).sup.2 needs to be computed for each client k in the encrypted domain using Equation (d) in
(51) To simplify illustration, the above design is described mainly focusing on a single sensing task. It should be appreciated that the proposed design can readily support the scenario of multiple sensing tasks by slightly modifying the secure truth update and weight update algorithms. First, the truth share t of each sensing task n needs to be updated separately in the truth update step. Second, the accumulated distance sum across all sensing tasks need to be computed, i.e., [st*].sub.i=Σ.sub.n=1.sup.NΣ.sub.k=1.sup.K[st(k)(n)].sub.i, where N is the number of sensing tasks and ([st(k)(n)].sub.i denotes the state of client k for sensing task n. Client k's state [st(k)].sub.i is computed as [st(k)].sub.i=Σ.sub.n=1.sup.N[st(k)(n)].sub.i, and then weight estimation is performed over [st(k)].sub.i and [st*].sub.i to update its weight share.
(52) The present invention is not limited to the streaming truth discovery algorithm described above. Rather, the present invention also covers other algorithms, as long as the operations of the underlying plaintext algorithm can be decomposed into common arithmetic operations (addition, multiplication, and division), and some non-linear function.
(53) C. Security Guarantees
(54) The above design ensures that under the non-colluding and honest-but-curious adversary model, the two cloud servers learn nothing about the client's private sensing values throughout the procedure of secure streaming truth discovery.
(55) At each epoch, using one client as an example, each cloud server first receives a share of all of the client's sensing values and a share of the multiplication triplets from. The security of the secret sharing technique ensures that nothing is revealed to each cloud server. Then, to conduct truth update and weight update, the two cloud servers do not only operate their respective secret-shared values locally but also interact with each other. In the present embodiment, the interactions are either based on the Beaver's multiplication technique to perform secure multiplication, or are based on garbled circuits to perform secure division and/or (approximate) logarithm. The security properties of the Beaver's technique and garbled circuits ensure that nothing is revealed to each cloud server from the interactions. From the interactions, each cloud server only obtains a share of the computation result.
(56) Note that in the present design embodiment, it is assumed that the sensing values from the clients are well-formed, i.e., they are within a proper range. However, the present design embodiment can also be extended to handle malicious clients that provide malformed sensing values. For example, it is possible t integrate the latest secret-shared non-interactive zero knowledge proofs for efficiently checking whether a secret-shared sensing value is within a proper range.
(57) Blockchain-Based Knowledge Monetization with Quality-Aware Reward Mechanism
(58) With the truth shares, the present embodiment also presents how to support full-fledged blockchain-based knowledge monetization.
(59) A. Design Rationale
(60) The design of the present embodiment focuses on introducing how to establish a transparent and fair knowledge market, where everyone can browse and buy knowledge (i.e., learned truths) with fairness. This is catered for an application model in which the two cloud servers jointly initiate the crowdsensing application to make profit by selling the learned truths to the requester. It should be noted that other embodiments of the invention would support an alternative application model where the requester initiates the crowdsensing application. The design of the present embodiment aims to provide a knowledge market with the following properties: 1) Transparency and streamlined processing 2) Monetization fairness with (on-chain) knowledge confidentiality 3) Automatic and quality-aware rewards for clients.
(61) First, to enable end-to-end visibility and streamline the process, in the present embodiment, the knowledge market is deployed based on blockchain. Here, available knowledge and their corresponding metadata are committed and became publicly accessible, and clients are aware of the on-going knowledge monetization processes. Also, both the requester and servers need only to interact with the blockchain, which will automatically execute the monetization process with guaranteed correctness.
(62) It is necessary to achieve monetization fairness with knowledge confidentiality. Fairness should be deemed as a natural requirement in knowledge monetization. For one thing, the requester should learn the knowledge once money is paid to the cloud servers and clients. For another, the cloud servers and clients should get paid once the knowledge is revealed to the requester. As blockchain inherently does not provide on-chain data privacy, directly posting the knowledge shares on the blockchain will compromise knowledge confidentiality. So, in the present embodiment only stores the encrypted knowledge shares on the blockchain.
(63) To securely transfer the decryption keys to the requester with fairness, one method would be to let the cloud servers first commit the decryption keys to the blockchain, and later post them on the blockchain to claim the payment. However, this method still does not ensure knowledge confidentiality as the encrypted knowledge shares are also publicly accessible on the blockchain.
(64) To address this problem, the present embodiment provides a new protocol that leverages smart contract and customized protection techniques to ensure fairness with knowledge confidentiality. Particularly, in this embodiment, only the masked keys are committed on the blockchain, and the masks are securely transferred by encrypting them with the symmetric keys that are generated by the offering requester. Hence, even the masked keys are posted on the blockchain for verification later, only the requester can unmask them and recover the decryption keys. In one embodiment, to further ensure knowledge confidentiality after monetizing the knowledge, the sold knowledge is permanently removed from the market. However, in other embodiment, the knowledge may be rented or otherwise remained.
(65) Preferably, this design embodiment also provides automatic quality-aware reward for the clients via the smart contract. After completing a knowledge monetization process, i.e., after a requester pays and gets the knowledge, the clients are rewarded, preferably according to the quality (i.e., weights) of their contributed sensing value.
(66) B. The Proposed Protocol
(67) At a high level, the proposed knowledge monetization protocol comprises three phases, i.e., commit, offer, and harvest. First, in the commit phase, the cloud servers upload the encrypted knowledge shares, masked keys, and weight shares of clients to the knowledge market for sell. Second, in the offer phase, the requester C issues an offer to try buying the knowledge pertaining to his interests. Finally, in the harvest phase, if a knowledge offer is settled, the requester C recovers the knowledge, the cloud servers are rewarded equally, and quality-aware rewards are made to the contributed clients.
(68) In the following protocol description, consider that the knowledge shares Φ.sub.0 and Φ.sub.1 at epoch l are jointly discovered by the cloud servers, and a requester C wants to learn Φ after browsing the knowledge market. In this embodiment, the blockchain is trusted in terms of execution correctness and availability, but not for privacy. The blockchain serves as a ledger that maintains balances for all involved parties, and correctly executes arbitrary self-enforcing programs, i.e., smart contracts. In this description, the terms blockchain and ledger are used interchangeably unless otherwise stated. In order to facilitate better understanding of the protocol of this embodiment, two key features of the smart contract in the knowledge monetization process is provided below:
(69) (i) Timing and Round-based Execution: A smart contract has a timer that is modeled as a discrete variable T and increments in rounds. In each round, messages arriving at the smart contract are executed at the beginning of the next round, which starts when T increases.
(70) (2) Entity Identifiers: The cloud servers, clients, and requester have identities, i.e., S.sub.0, S.sub.1, {P.sub.k}.sub.k=1.sup.K, and C, associated with their accounts on the smart contract. They can send authenticated messages to the contract using their accounts' secret keys, and ledger[S] denotes S's balance in the ledger.
(71) The protocol in one embodiment of the invention is as follows.
(72) Commit Phase
(73) The knowledge market is established on the ledger by letting cloud servers committing the knowledge shares, weight shares and masked keys: S.sub.i computes id:=ID(l), where l is the epoch number and ID generates an identifier. S.sub.i encrypts the share of knowledge Φ.sub.i via C.sub.i:=SENC(Φ.sub.i, k.sub.i), where SENC is a symmetric encryption scheme and k.sub.i is a randomly selected symmetric key. S.sub.i encrypts the weight shares via wt.sub.k.sup.i:=XOR([w.sub.k].sub.i,p.sub.k.sup.i), where {P.sub.k.sup.i}.sub.k=1.sup.K are randomly selected. S.sub.i masks the encryption key via key.sub.i:=XOR(k.sub.i,mask.sub.i) using a fresh nonce mask.sub.i, and commits the masked key cm.sub.key1:=Comm(key.sub.i). S.sub.i sends commit:=(id, cm.sub.keyi, c.sub.i, {wt.sub.k.sup.i}.sub.k=1.sup.K) to Blockchain.sub.KM, i.e., the smart contract on blockchain.
(74) In addition to commitments, cloud servers can also attach some public metadata for id, e.g., a brief task description and its expected price, to facilitate trade. Upon receiving both commit requests from the cloud servers, Blockchain.sub.KM stores the ciphertexts and commitments, and labels knowledge id as available for sell. The detailed functionality is shown in
(75) Offer Phase
(76) The requester C can browser the knowledge market, and send an offer to Blockchain.sub.KM to indicate interest. Suppose now C wants to buy the knowledge Φ with id. It randomly selects two symmetric keys S.sub.0 and S.sub.1, and encrypts them under the two cloud servers' public keys: C generates ct.sub.0:=ENC(S.sub.0. epk, s.sub.0) and ct.sub.1:=ENC(S.sub.1. epk, s.sub.1), where ENC is a secure public key based encryption scheme. C sends pay:=(id, $offer, ct.sub.0, ct.sub.1) to Blockchain.sub.KM.
(77) Upon receiving an offer from the requester, Blockchain.sub.KM checks whether the requester has enough balance in the ledger, and sends the corresponding key to each cloud server. After receiving the offer, each cloud server will evaluate it to decide whether or not to sell knowledge Φ to the requester. Specifically, if S.sub.i accepts the offer, it will use the symmetric key S.sub.i, which is given from the requester, to encrypt the corresponding mask used in key.sub.i, i.e., sn.sub.i=SENC(mask.sub.i,s.sub.i). The detailed functionality is shown in
(78) Harvest Phase
(79) To claim the reward $offer, the masked key key.sub.i, sn.sub.i, and the masks used for decrypting the client weight shares are posted on the blockchain by the cloud server S.sub.i. In particular, S.sub.i retrieves key.sub.i for id, and then sends harvest:=(id, key.sub.i, sn.sub.i, {p.sub.k.sup.i}.sub.k=1.sup.K) to Blockchain.sub.KM.
(80) Upon receiving the harvest requests from the cloud servers, Blockchain.sub.KM checks 1) whether both masked keys, i.e., key.sub.0 and key.sub.1, are posted; 2) whether the posted masked keys match the commitments attached to id. If both requirements are satisfied, the offer from the requester C is finalized and transferred to the cloud servers and clients, and harvest.sub.c=(key.sub.0, key.sub.1, sn.sub.0, sn.sub.1) is given to the requester for recovering knowledge. To split the reward, the cloud servers will get a share $soffer, and the remaining reward $coffer is given to the clients, preferably according to their committed weights. Specifically, Blockchain.sub.KM will unmask their corresponding weight shares and recover each client's weight, and reward each of them via Deg(w.sub.k), where
(81)
The detailed functionality is shown in
(82) With harvest.sub.c, the requester C obtains the decryption keys by unmasking key.sub.0 and key.sub.1, and then recovers the knowledge from the two encrypted knowledge shares: C obtains decryption masks via mask.sub.0:=SDEC(sn.sub.0, s.sub.0) and mask.sub.1:=SDEC(sn.sub.1, s.sub.1). C unmasks keys k.sub.0:=XOR(key.sub.0, mask.sub.0) and k.sub.1:=XOR(key.sub.1, mask.sub.1). C computes knowledge shares Φ.sub.0:=SDEC(c.sub.0, k.sub.0) and Φ.sub.1:=SDEC(c.sub.1, k.sub.1). C recovers the knowledge via Φ:=Φ.sub.0+Φ.sub.1.
(83) While the protocol of this embodiment is tailored for a transparent knowledge market, it may be modified to provide another embodiment that supports the alternative application discussed above. As the discovered knowledge will be sold to a specific requester initiating the crowdsensing application, fulfillment of offer needs to be further ensured, so that the cloud servers and clients are rewarded even if the requester later cancels the offer, as they have wasted or spent resources. In one embodiment, a proper offer will be issued by that requester in the offer phase. Here, the requester may have to deposit money and commit key ciphertexts on the blockchain in advance. Later, after the cloud servers commit knowledge to the blockchain, an offer is automatically triggered.
(84) C. Protocol Analysis
(85) Monetization Fairness
(86) The fairness in the above knowledge monetization protocol is two-fold: the requester obtains the knowledge once he (or she, mentioned once and for all) pays, and the cloud servers and clients are rewarded once they reveal the knowledge. Recall that in the above protocol embodiment, the requester should first commit an offer via the smart contract before he initiates the knowledge monetization process. Then, only when both cloud servers provide the decryption materials to the requester via the blockchain, the monetization process will succeed and the cloud servers and clients are rewarded. If either of the cloud servers does not accept the offer and provide the material, the committed payment of a requester will be aborted by the smart contract automatically and returned to the requester.
(87) Knowledge Confidentiality
(88) The above protocol embodiment preserves knowledge confidentiality. That is, before knowledge monetization, the knowledge is not revealed to any parties. Once the knowledge monetization is done, the knowledge is only revealed to the requester and becomes his proprietary information. In particular, The above protocol embodiment provides the following theorem.
(89) Theorem 1: Assuming the commitment scheme Comm is computational hiding and perfectly binding, the encryption schemes ENC and SENC are perfectly correct and semantically secure, the masks for encryption are randomly generated, and the two cloud servers do not collude, the above protocol for knowledge monetization guarantees knowledge confidentiality before and after the monetization processing.
(90) Proof Sketch
(91) First, analyse knowledge confidentiality before monetization processing. Recall that the knowledge shares are encrypted using random symmetric keys of the cloud servers, via SENC(Φ.sub.i, k.sub.i). Thus, the requester can learn nothing from the posted knowledge shares ciphertexts on the market because he does not know the decryption keys. Also, the knowledge shares are encrypted independently by the two cloud servers. Each of the cloud servers cannot learn information from the other cloud server's knowledge share ciphertexts. Therefore, before the knowledge monetization process, the knowledge is not revealed to any party.
(92) Second, analyse knowledge confidentiality after monetization processing. Initially, the decryption keys for the knowledge shares are masked before committing, and the masks are given to the requester via SENC(mask.sub.i, s.sub.i), where s.sub.i is the symmetric key selected by the requester. Here, s.sub.0 and s.sub.1 are encrypted under each cloud server's public key, i.e., ENC(S.sub.i.epk, s.sub.i), before posting them on contract. Hence, posting the masked keys and encrypted masks on the blockchain will not leak information to others about the decryption keys, which ensures that the monetized knowledge is revealed only to the requester who has bought it. Also, after a knowledge monetization process, metadata of knowledge id, i.e., id; cm.sub.key0 and cm.sub.key1, are removed from the knowledge market. It ensures that the knowledge id cannot be monetized for a second time, and is owned by the requester as a private property.
Experiments
(93) A. Implementation
(94) The performance of the above privacy-preserving streaming truth discovery design and blockchain-based knowledge monetization protocol embodiment has been evaluated via a proof-of-concept implementation. The plaintext streaming truth discovery scheme I-CRH disclosed in Y Li, Q. Li, J. Gao, L. Su, B. Zhao, W. Fan, and J. Han, “Conflicts to harmony: A framework for resolving conflicts in heterogeneous data by truth discovery,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 8, pp. 1986-1999, 2016 were also implemented for accuracy comparison and validation. For cryptographic primitives, the standard cryptographic toolkit in Python was used. Specifically, AES was used for symmetric encryption and RSA was used for public key based encryption, with 256-bit key size and 2048-bit security level, respectively. And SHA-256 was used as the commitment scheme. All implementations except the smart contract and garbled circuit were in Python—the program contains more than 1400 lines of code. In the experiment, the Ethereum blockchain was used for test, and the smart contract is implemented with the Ethereum programming language Solidity with 240 lines of code. For garbled circuits, ObliVM-lang was used. A piecewise linear polynomial approximation could be used to approximate the non-linear logarithm function for evaluation inside the garble circuit. Apparently, by splitting the logarithm function into more pieces, the approximation result would be more accurate But this would also lead to an increase in the circuit size and computation overhead. Here, the approximation range was set as [0,1] and the base 2 logarithm function was approximated with 16 pieces. The polynomial expression of each interval was determined by fitting a smoothing spline with optimized switchover positions (i.e., knots).
(95) In the experiment, the client side and cloud server side operations were tested on a Microsoft Azure standard D8s v3 instance (8 virtual CPU, 32 GB memory, and SSD access), with Ubuntu 16.04 LTS operating system. The smart contract was deployed on a test RPC environment that runs on the instance above. In the experiments, synthetic datasets were used and the decay rate λ was set to 0.5. Particularly, the sensing values of clients are drawn from normal distribution. The mean value was used to represent the ground truth, and vary the standard deviation for each client to resemble their respective data quality levels. Note that the correctness and performance of the present design embodiment depends on the underlying cryptographic primitives, regardless of the type of dataset being evaluated. For the number of clients or sensing tasks, a parameter setting which has the same order of magnitude as in I-CRH was used. In particular, the number of clients and sensing tasks in the experiment ranged from 4 to 12, and from 10 to 30 respectively for demonstration. Such a parameter setting is consistent with some real-world crowdsensing applications, e.g., urban air quality monitoring and noise monitoring, where there are usually just a few tens of clients.
(96) B. Performance Evaluation
(97) Accuracy
(98) First, the accuracy of the privacy preserving streaming truth discovery design is measured. Here, for demonstration, the accuracy result was reported through a continuous trial over 20 epochs, with a sensing task and 10 clients. The standard root of mean squared error (RMSE) was used to measure the difference between the estimated ground truth and the ground truth. Recall that a scaling factor L was adopted to round fractional data values and polynomials for approximating the logarithm function. First, the accuracy was measured by varying the scaling factor L. In particular, L was varied from 10.sup.0 to 10.sup.5, and the RMSE result of the present design embodiment was compared with that of I-CRH, as illustrated in the graph of
(99) Next, with the configured scaling factor, the estimation error brought by the approximation of the logarithm function was measured.
(100)
(101) Computation Efficiency
(102) First the computation cost at the client side was measured. Recall that at each epoch, each client is only required to generating shares of their sensing values and two multiplication triplets. Share generation process with 100 sensing tasks was evaluated. It took around 0.48 ms and 2.02 ms for generating shares of sensing values and of multiplication triplets, respectively. This result indicates the practical computation efficiency on the client side.
(103) Next, the computation cost on the cloud server side for privacy-preserving streaming truth discovery is tested.
(104) Blockchain Performance
(105) The smart contract implementation in the present design embodiment consists of three main functions that reflect the functionalities in the Commit, Offer, and Harvest phases respectively. To evaluate the performance of the smart contract over blockchain, the knowledge monetization process was simulated by the following four steps: 1) contract deployment; 2) transmission of transaction from S.sub.0 and S.sub.1 respectively, for sequentially committing data; 3) transmission of offer transaction to trigger the knowledge monetization process; 4) transmission of harvest transactions to claim the payment, and to trigger automatic quality-aware reward for the clients.
(106)
(107) With the gas price 2×10.sup.−9 ether and exchange rate 1 ether=USD $300.85 on the official Ethereum network, the contract deployment cost is about 2.7 million gas ($1.62), and the total cost of the entire knowledge monetization process is about 3.7 million gas ($2.22) when the smart contract is deployed on the official Ethereum blockchain. Note that the contract deployment is a one-time cost. In addition, since the implementation of the underlying blockchain is agnostic to the present design embodiment, a lower financial cost may be expected by deploying the smart contract to a permissioned blockchain system, e.g., Azure consortium blockchain service.
(108) Exemplary Applications
(109) The system and method in the present invention are potentially applicable in various applications.
(110) One potential application is to enable secure healthcare applications over crowdsourcing data. Nowadays, healthcare data are collected by the medical researchers and biotechnology companies to help conduct better analytics, more efficient practices and overall better patient care. Due to raising privacy concerns on the personal healthcare data, there is a growing trend to encrypt the data before outsourcing it. Also, the client who contributed healthcare data is expected to be rewarded according to its contribution for the medical research/analysis, so that more clients will be incentivized to provide their useful data. During the entire knowledge/analysis discovery over the healthcare data, the service providers know nothing about the sensitive data of each client. Eventually, the discovered knowledge can be sold to a requester (hospital or researcher) with guaranteed confidentiality, and the contributed clients will be fairly rewarded accordingly to their data quality.
(111) Another potential application is to build a transparent knowledge market that truthful knowledge can be discovered and contributed by all individuals. With the rapid use of “wisdom of the crowd” to discover useful knowledge and make decisions in various areas (e.g., stock market), a growing number of users are fond of contributing their data/opinion to conduct some crowdsourcing tasks and earn money. However, their contributed data/opinion privacy is not guaranteed, and the leaked information could later be reused without being noticed by the original client. Here, the present invention guarantees data privacy on the contributed data. Also, the users are guaranteed to be rewarded automatically through the smart contract on the blockchain. All the transactions of the knowledge are publicly audit-able, so neither the service providers nor the knowledge requester can cheat the users and fraudulently retrieve the knowledge without rewarding the users.
(112) Exemplary System
(113) Referring to
(114) Although not required, the embodiments described with reference to the Figures can be implemented as an application programming interface (API) or as a series of libraries for use by a developer or can be included within another software application, such as a terminal or personal computer operating system or a portable computing device operating system. Generally, as program modules include routines, programs, objects, components and data files assisting in the performance of particular functions, the skilled person will understand that the functionality of the software application may be distributed across a number of routines, objects or components to achieve the same functionality desired herein.
(115) It will also be appreciated that where the methods and systems of the invention are either wholly implemented by computing system or partly implemented by computing systems then any appropriate computing system architecture may be utilized. This will include stand-alone computers, network computers, dedicated or non-dedicated hardware devices. Where the terms “computing system” and “computing device” are used, these terms are intended to include any appropriate arrangement of computer or information processing hardware capable of implementing the function described.
CONCLUSION
(116) The above embodiment has provided a crowdsensing framework enabling private truthful knowledge discovery and full-fledged blockchain-based knowledge monetization system and method. The above embodiment enables privacy-preserving and efficient truth discovery over encrypted crowdsensed data streams for truthful knowledge discovery. Meanwhile, with careful integration of the newly emerging blockchain-based smart contract technology, the above embodiment allows full-fledged knowledge monetization. Tackling the challenges of monetization fairness and on-chain knowledge confidentiality, the customized knowledge monetization design in the above embodiment well respects the interests of knowledge seller and requester, with full support of transparency, streamlined processing, and automatic quality-aware rewards for clients. Extensive experiments on Microsoft Azure cloud and Ethereum blockchain have demonstrated the practically affordable performance of the proposed design.
(117) It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the scope of the invention as defined in the claims. The described embodiments of the invention should therefore be considered in all respects as illustrative, not restrictive.
(118) For example, the present invention is not limited to crowdsensing applications. The present invention may be applied to crowdsourcing applications, or like applications that leverage manual or machine input from different client devices. The created useful knowledge need not be blockchain-based but is preferably blockchain-based. Various block-chain based technology or platform may be employed. The transaction of the useful knowledge, or the associated blockchain, need not be associated with money. It could be, alternatively or additionally, associated with other points, credits, or other incentives. The number of cloud computing servers may be any number larger than or equal to two (although two is preferred). The data-quality-weighting can be omitted in some embodiments. The reward as a result of the transaction can also be omitted in some embodiments. The reward can be a fixed reward, or a floating reward based on the data quality weighting. The useful knowledge purchased or otherwise transferred to the requester may be removed from the server, or alternatively, remained on the server. For example, the transaction of the useful information may be in the form of a loan (available for a certain period of time), not proprietary.