Secure probabilistic analytics using an encrypted analytics matrix
11451370 · 2022-09-20
Assignee
Inventors
Cpc classification
G09C1/00
PHYSICS
H04L2209/12
ELECTRICITY
H04L9/0861
ELECTRICITY
H04L63/0435
ELECTRICITY
H04L9/30
ELECTRICITY
H04L9/3242
ELECTRICITY
G06F21/6218
PHYSICS
G06F16/254
PHYSICS
G06F40/211
PHYSICS
G06F17/16
PHYSICS
H04L9/302
ELECTRICITY
International classification
H04L9/00
ELECTRICITY
H04L9/08
ELECTRICITY
G06F17/16
PHYSICS
G09C1/00
PHYSICS
H04L9/32
ELECTRICITY
G06F16/25
PHYSICS
G06F21/62
PHYSICS
H04L9/30
ELECTRICITY
H04L9/06
ELECTRICITY
Abstract
Provided are methods and systems for performing a secure probabilistic analytic using an encrypted analytics matrix. An example method includes acquiring, by a client, an analytic, analytic parameters associated with the analytic, and a homomorphic encryption scheme including a public key for encryption and a private key for decryption. The method further includes generating, using the encryption scheme, an analytic matrix based on the analytic and analytic parameters, and sending the analytic matrix and the encryption scheme to at least one server. The method includes generating, by the server and based on the encryption scheme, a set of terms from a data set, evaluating the analytic matrix over the set of terms to obtain an encrypted result, and estimating a probabilistic error of the encrypted result. The method further includes sending, by the server, the encrypted result and the probabilistic error to the client, where the encrypted result is decrypted.
Claims
1. A method for building an analytic matrix, comprising: determining an analytic and at least one parameter associated with the analytic; extracting, using a term generation function, a set of term elements from the analytic and the at least one analytic parameter; generating, using a keyed hash function, a set of hashes from the set of term elements; partitioning elements of the set of hashes to generate a set of vectors, with each vector of the set of vectors having a pre-defined dimension; determining whether for at least one matrix element associated with row j and column m of at least one analytic matrix, there is a vector C from the set of vectors wherein C[j]=m; assigning the at least one matrix element a non-zero value when a result of the determination is positive; and assigning the at least one matrix element a zero value when the result of the determination is negative.
2. The method according to claim 1, wherein a number of rows of the at least one analytic matrix is equal or greater than a number of elements in the set of term elements and length of the elements in the set of hashes is divisible by a number of columns of the at least one analytic matrix.
3. The method according to claim 1, wherein the non-zero value is an encrypted value of a non-zero bitmask of a term element of the set of term elements, wherein a hash of the term element is used to generate the vector C, the encrypted value being obtained using a homomorphic encryption scheme.
4. The method according to claim 1, wherein the analytic and the at least one parameter are encoded using a homomorphic encryption scheme.
5. The method according to claim 4, wherein the homomorphic encryption scheme includes a public key for encryption and a private key for decryption.
6. The method according to claim 1, further comprising extracting the set of term elements from a data set using the term generation function, the keyed hash function, and the analytic.
7. The method according to claim 1, further comprising evaluating the at least one analytic matrix over an encrypted data set to obtain at least one encrypted result.
8. The method according to claim 7, further comprising estimating a probabilistic error bound of the at least one encrypted result.
9. The method according to claim 8, further comprising sending either or both of the at least one encrypted result and the probabilistic error bound to a client.
10. A system for building an analytic matrix, comprising: a processor; and a memory for storing instructions, the processor executing the instructions to: determine an analytic and at least one parameter associated with the analytic; extract, using a term generation function, a set of term elements from the analytic and the at least one analytic parameter; generate, using a keyed hash function, a set of hashes from the set of term elements; partition elements of the set of hashes to generate a set of vectors, with each vector of the set of vectors having a pre-defined dimension; determine whether for at least one matrix element associated with row j and column m of at least one analytic matrix, there is a vector C from the set of vectors wherein C[j]=m; assign the at least one matrix element a non-zero value when a result of the determination is positive; and assign the at least one matrix element a zero value when the result of the determination is negative.
11. The system according to claim 10, wherein a number of rows of the at least one analytic matrix is equal or greater than a number of elements in the set of term elements and length of the elements in the set of hashes is divisible by a number of columns of the at least one analytic matrix.
12. The system according to claim 10, wherein the non-zero value is an encrypted value of a non-zero bitmask of a term element of the set of term elements, wherein a hash of the term element is used to generate the vector C, the encrypted value being obtained using a homomorphic encryption scheme.
13. The system according to claim 10, wherein the analytic and the at least one parameter are encoded using a homomorphic encryption scheme.
14. The system according to claim 13, wherein the homomorphic encryption scheme includes a public key for encryption and a private key for decryption.
15. The system according to claim 10, wherein the processor is configured to extract the set of term elements from a data set using the term generation function, the keyed hash function, and the analytic.
16. The system according to claim 10, wherein the processor is configured to evaluate the at least one analytic matrix over an encrypted data set to obtain at least one encrypted result.
17. The system according to claim 16, wherein the processor is configured to estimate a probabilistic error bound of the at least one encrypted result.
18. The system according to claim 17, wherein the processor is configured to send either or both of the at least one encrypted result and the probabilistic error bound to a client.
19. A non-transitory computer-readable storage medium having embodied thereon instructions, which when executed by at least one processor, perform steps of a method, the method comprising: determining an analytic and at least one parameter associated with the analytic; extracting, using a term generation function, a set of term elements from the analytic and the at least one analytic parameter; generating, using a keyed hash function, a set of hashes from the set of term elements; partitioning elements of the set of hashes to generate a set of vectors, with each vector of the set of vectors having a pre-defined dimension; determining whether for at least one matrix element associated with row j and column m of at least one analytic matrix, there is a vector C from the set of vectors wherein C[j]=m; assigning the at least one matrix element a non-zero value when a result of the determination is positive; and assigning the at least one matrix element a zero value when the result of the determination is negative.
20. The non-transitory computer-readable storage medium according to claim 19, wherein the method further comprises: evaluating the at least one analytic matrix over an encrypted data set to obtain at least one encrypted result; estimating a probabilistic error bound of the at least one encrypted result; and sending either or both of the at least one encrypted result and the probabilistic error bound to a client.
Description
BRIEF DESCRIPTION OF DRAWINGS
(1) Exemplary embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements.
(2)
(3)
(4)
(5)
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS
(6) The technology disclosed herein is concerned with methods and systems for performing secure probabilistic analytics over a data source using homomorphic encryption. Embodiments of the present disclosure may facilitate a secure transmission of analytics from a client device to computing resource(s) providing a target data source and secure transmission of results of analytics from the computing resource(s) back to the client device.
(7) Some embodiments of the present disclosure may be used to encrypt an analytic on a client device using an encrypted analytics matrix. The encrypted analytics matrix can be sent to computing resource(s) providing desired data source(s). The encrypted analytics matrix can be evaluated over the desired data source(s) to produce encrypted results. The encrypted results can be returned to the client device and decrypted using the homomorphic encryption techniques. Embodiments of the present disclosure may allow an analytic to be performed over desired data sources in a secure and private manner because neither the content of the analytic nor the results of the analytic are revealed to a data owner, observer, or attacker.
(8) According to one example embodiment of the present disclosure, a method for performing secure probabilistic analytics using an encrypted analytics matrix commences with receiving, by at least one server from a client, at least one analytic matrix. The at least one analytic matrix can be encrypted using a homomorphic encryption scheme. The homomorphic encryption scheme may include a public key for encryption and a private key for decryption. The method may further include extracting, by the at least one server and based on the homomorphic encryption scheme, a set of terms from a data set. The method may further include evaluating, by the at least one server, the at least one analytic matrix over the set of terms to obtain at least one encrypted result. The method may further include estimating, by the at least one server, a probabilistic error bound of the at least one encrypted result. The method may further include sending, by the at least one server, the at least one encrypted result and the probabilistic error bound to the client. The client can be configured to decrypt the at least one encrypted result using the homomorphic encryption scheme.
(9) Referring now to the drawings, various embodiments are described in which like reference numerals represent like parts and assemblies throughout the several views. It should be noted that the reference to various embodiments does not limit the scope of the claims attached hereto. Additionally, any examples outlined in this specification are not intended to be limiting and merely set forth some of the many possible embodiments for the appended claims.
(10)
(11) As shown in
(12) In some embodiments, the server(s) 110 may be configured to store or provide access to at least one data source 115. In certain embodiments, the server(s) 110 may include a standalone computing device. In various embodiments, the data source(s) 115 may be located on a single server 110 or distributed over multiple server(s) 110. The data source 115 may include plaintext data, deterministically encrypted data, semantically encrypted data, or a combination thereof.
(13) In some embodiments, the server(s) 110 may be implemented as a cloud-based computing resource shared by multiple users. The cloud-based computing resource(s) can include hardware and software available at a remote location and accessible over a network (for example, the Internet). The cloud-based computing resource(s) can be dynamically re-allocated based on demand. The cloud-based computing resources may include one or more server farms/clusters including a collection of computer servers which can be co-located with network switches and/or routers.
(14) In various embodiments, the client(s) 105 can make certain client inquiries within the environment 100. For example, the client(s) 105 may be configured to send analytics to the server 110 to be performed over the data source(s) 115. The server 110 can be configured to perform the analytics over the data source(s) 115 and return the results of analytics to the client(s) 105.
(15) To protect the content of the analytics, the client(s) 105 can be configured to encrypt the analytics into a homomorphic analytic matrix based on a homomorphic encryption scheme. The homomorphic encryption scheme can include a partially homomorphic encryption scheme or a fully homomorphic encryption scheme. The partially homomorphic encryption scheme can include one of a Rivest, Shamir and Adleman (RSA) cryptosystem, an Elgamal cryptosystem, a Benaloh cryptosystem, a Goldwasser-Micali cryptosystem, and a Paillier cryptosystem. The analytics can be encrypted with use of a public (encryption) key of the homomorphic encryption scheme. The homomorphic analytic matrix and the public key can be sent to the server 110. The homomorphic analytic matrix can only be decrypted with a private (decryption) key of the homomorphic encryption scheme. The decryption key can be kept on the client(s) 105 and never provided to the server(s) 110.
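As an illustration of the homomorphic property relied on here, the following is a minimal sketch of the Paillier cryptosystem, one of the partially homomorphic schemes listed above. The function names, the toy primes, and the choice g = n + 1 are assumptions made for this example and are not taken from the disclosure; keys of this size offer no security.

```python
import math
import secrets

def paillier_keygen(p=1789, q=1861):
    # Toy primes for illustration only (an assumption of this sketch);
    # real deployments use primes of 1024 bits or more.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    mu = pow(lam, -1, n)  # valid shortcut because g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    # Public-key encryption: c = g^m * r^n mod n^2 with random r coprime to n.
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    # Private-key decryption: m = L(c^lam mod n^2) * mu mod n, L(u) = (u-1)/n.
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts (mod n).
    n, _ = pub
    return (c1 * c2) % (n * n)
```

A party holding only the public key can call `encrypt` and `add_encrypted`, while `decrypt` requires the private pair `(lam, mu)`, which in the arrangement described above would stay on the client.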
(16) To protect the content of the results of the analytic, the server(s) 110 can be configured to evaluate the encrypted analytics matrix over the data source using the same homomorphic encryption scheme and the public key received from the client 105 and, thereby, obtain encrypted results of the analytics. The encrypted results can be sent to the client(s) 105. The client(s) 105 can decrypt the encrypted results using the private key. Because the private key is always kept on the client(s) 105, neither the homomorphic analytic matrix nor the encrypted results of the analytic can be decrypted on the server(s) 110 or when intercepted while in transit between the client(s) 105 and the server(s) 110.
(17)
(18) The scheme 200 may include a term generation (TG) function 210. The term generation function 210 can be used to extract a set of term elements {T} of analytic A. For example, if analytic A is a database frequency distribution analytic for elements in <row:column> pairs, then the set {T} reflects the frequency distribution of these elements from the database.
(19) The scheme 200 may further include a keyed hash function H(T) 220. The hash function H(T) can be used to obtain a set of hashes H(T)={H(T):T in {T}}. The set of hashes H(T) denotes the range of the hash function H(T) over the set of term elements {T}. The keyed hash function H(T) can be associated with a public key used for the encryption. The number of distinct elements in the set H(T) is equal to the number of distinct elements in the set of term elements {T}.
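One way to picture a keyed hash over the term set is sketched below. The disclosure does not name a specific hash construction; HMAC-SHA256 truncated to a fixed number of bits, the byte-string key, and the function names are all assumptions of this example.

```python
import hashlib
import hmac

def keyed_hash_bits(key, term, n_bits=12):
    # HMAC-SHA256 is an assumed stand-in for the keyed hash function H;
    # the digest is truncated to its n_bits most significant bits.
    digest = hmac.new(key, term.encode("utf-8"), hashlib.sha256).digest()
    value = int.from_bytes(digest, "big") >> (256 - n_bits)
    return format(value, "0{}b".format(n_bits))

def hash_terms(key, terms, n_bits=12):
    # Map each distinct term element T to its bit-string hash H(T).
    return {t: keyed_hash_bits(key, t, n_bits) for t in set(terms)}

def is_collision_free(hashes):
    # True when the number of distinct hashes equals the number of terms,
    # which is the property stated for the set H(T) above.
    return len(set(hashes.values())) == len(hashes)
```

With a short output such as 12 bits, collisions become likely as the term set grows, so a caller would check `is_collision_free` and pick a different key or a longer output when it fails.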
(20) The scheme 200 may further include an analytics matrix construction module 230. The module 230 can be used to construct a homomorphic analytic matrix A_M for the analytic A associated with an analytic parameter set {A_P}. The desired dimensions s×d of the matrix A_M can be selected to satisfy the following conditions: the number of rows s is equal to or greater than the number of distinct elements in the set of term elements {T}, s is equal to or greater than d, and |H(T)| is divisible by d.
(21) The module 230 can be configured to partition each of the hashes H(T) into a vector c_T of d-many bitwise components c_T[j], j=0, . . . , d−1. C(H(T)) denotes the set of vectors obtained as a result of partitioning the hashes in the set {H(T)}. It should be noted that |{C(H(T))}|=|{H(T)}|=|{T}|. For example, if d=3 and H(T)=000001001111, then vector c_T={c_T[0], c_T[1], c_T[2]} where c_T[0]=0000, c_T[1]=0100, and c_T[2]=1111.
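The partitioning step can be sketched as follows, reusing the d=3 example from the paragraph above. The function name and the choice to return each component as an integer are assumptions of this sketch.

```python
def partition_hash(hash_bits, d):
    # Split a bit-string hash into d equal-width components and return
    # each component as an integer, so it can later serve as an index m.
    if d <= 0 or len(hash_bits) % d != 0:
        raise ValueError("hash length must be divisible by d")
    width = len(hash_bits) // d
    return [int(hash_bits[j * width:(j + 1) * width], 2) for j in range(d)]

# The example hash 000001001111 splits into 0000, 0100, 1111.
components = partition_hash("000001001111", 3)
print(components)  # [0, 4, 15]
```

The divisibility check mirrors the condition that |H(T)| be divisible by d.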
(22) The module 230 can be further configured to determine whether c_T[d−1] is distinct for all elements in {H(T)}. If the result of the determination is negative, the module 230 can select a different keyed hash function H and reconstruct {H(T)} and C(H(T)) before constructing the homomorphic analytic matrix A_M.
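The distinctness check and the fallback to a different keyed hash function might look like the sketch below. The helper names, the `make_vectors` callback (which stands for "hash every term under this key and partition the result"), and the idea of iterating over a list of candidate keys are assumptions of this example; the disclosure does not say how the replacement hash function is chosen.

```python
def last_components_distinct(vectors):
    # True when c_T[d-1] differs for every vector in the collection.
    last = [v[-1] for v in vectors]
    return len(set(last)) == len(last)

def build_with_retry(make_vectors, candidate_keys):
    # Try each candidate key in turn; make_vectors(key) is a hypothetical
    # callback that returns the list of partitioned hash vectors for that key.
    for key in candidate_keys:
        vectors = make_vectors(key)
        if last_components_distinct(vectors):
            return key, vectors
    raise RuntimeError("no candidate key produced distinct last components")
```

For instance, calling `build_with_retry(lambda k: [[0, 1, k], [2, 3, 5]], [5, 7])` rejects key 5, because both vectors would end in 5, and returns key 7 with its vectors.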
(23) Elements A_M(j,m), wherein j=0, . . . , (d−1) and m=0, . . . , (s−1), can be determined as follows. If there is an element c_T[j] in the set of vectors {C(H(T))} such that c_T[j]=m, then A_M(j,m) is set to E(B_j,m), where B_j,m is a nonzero bit mask corresponding to the term element T; otherwise, A_M(j,m) is set to E(0), wherein E is the homomorphic encryption. In this manner, the homomorphic analytic matrix A_M includes encryptions of nonzero bitmasks for only the term elements present in the set {T}. The analytic A cannot be recovered from the homomorphic analytic matrix A_M without a private key associated with the homomorphic encryption scheme E.
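The element-assignment rule above can be sketched as follows. The function signature, the dictionary inputs, and the way bitmasks are supplied are assumptions of this example; `encrypt` is a caller-provided stand-in for the homomorphic encryption E. The disclosure does not say what happens when two terms share the same component value at the same index j, so this sketch simply keeps the bitmask of the last term seen.

```python
def build_analytic_matrix(term_vectors, bitmasks, d, s, encrypt):
    # term_vectors: term -> list of d integer components c_T[0..d-1]
    # bitmasks:     term -> nonzero integer bitmask B for that term
    # encrypt:      callable standing in for the homomorphic encryption E
    occupied = {}
    for term, vector in term_vectors.items():
        if len(vector) != d:
            raise ValueError("every vector must have d components")
        for j, m in enumerate(vector):
            if not 0 <= m < s:
                raise ValueError("component value outside 0..s-1")
            # Assumption: on a shared (j, m) position the last term wins.
            occupied[(j, m)] = bitmasks[term]
    # Every position is encrypted, E(B) where occupied and E(0) elsewhere.
    return [[encrypt(occupied.get((j, m), 0)) for m in range(s)]
            for j in range(d)]
```

Passing the identity function as `encrypt` makes the structure visible for debugging; with a randomized scheme such as Paillier, the encryptions of zero and of the bitmasks would not be distinguishable by inspection.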
(24) The client 105 can be further configured to send the homomorphic analytic matrix A_M, the term generation function TG, and the hash function H(T) with the public key to the server(s) 110.
(25) In some embodiments, the server(s) 110 can be configured to extract a set of term elements {T} from the data source 115 using the term generation function TG and the keyed hash function H(T). The server(s) 110 can be further configured to evaluate the homomorphic analytic matrix A_M over the set of term elements {T} to produce encrypted results E(R). The server(s) 110 can be further configured to estimate a probabilistic error bound b of the encrypted results E(R) based on a hash collision rate of the hash function H(T) over the data source 115. The server(s) 110 can be further configured to send the encrypted results E(R) and the probabilistic error bound b to the client 105.
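The disclosure does not give a formula for the error bound b. As one possible reading, the sketch below measures an empirical collision rate as the fraction of distinct terms lost when they are mapped through the hash function. The function name, the estimator itself, and the `hash_fn` parameter are assumptions of this example rather than the method described above.

```python
def empirical_collision_rate(terms, hash_fn):
    # Fraction of distinct terms that do not receive a hash of their own:
    # 1 - (number of distinct hashes / number of distinct terms).
    distinct_terms = set(terms)
    if not distinct_terms:
        return 0.0
    distinct_hashes = {hash_fn(t) for t in distinct_terms}
    return 1.0 - len(distinct_hashes) / len(distinct_terms)
```

A rate of 0.0 means every distinct term in the data source received its own hash; larger values suggest that more terms may have been matched against a matrix entry intended for a different term.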
(26) The client 105 can be further configured to decrypt the encrypted results E(R) in order to obtain the results R using the private key of the homomorphic encryption scheme E.
(27)
(28) The method 300 may commence, in block 305, with receiving, by at least one server, from a client, at least one analytic matrix, a term generation function, and a keyed hash function. The at least one analytic matrix can be encrypted using the homomorphic encryption scheme. The homomorphic encryption scheme may include a public key for encryption and a private key for decryption.
(29) In block 310, the method 300 may proceed with extracting, by the at least one server, a set of term components from a data set using the term generation function and the keyed hash function.
(30) In block 315, the method 300 may evaluate, by the at least one server, the at least one analytic matrix over the set of term components to obtain at least one encrypted result.
(31) In block 320, the method 300 may proceed with estimating, by the at least one server, a probabilistic error of the at least one encrypted result. The estimate can be based on a hash collision rate of the keyed hash function over the data set.
(32) In block 325, the method may include sending, by the at least one server, the at least one encrypted result and the probabilistic error to the client. The client can be configured to decrypt the at least one encrypted result using the homomorphic encryption scheme.
(33)
(34) The components shown in
(35) Mass data storage 430, which can be implemented with a magnetic disk drive, solid state drive, or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor unit 410. Mass data storage 430 stores the system software for implementing embodiments of the present disclosure for purposes of loading that software into main memory 420.
(36) Portable storage device 440 operates in conjunction with a portable non-volatile storage medium, such as a flash drive, floppy disk, compact disk, digital video disc, or Universal Serial Bus (USB) storage device, to input and output data and code to and from the computer system 400 of
(37) User input devices 460 can provide a portion of a user interface. User input devices 460 may include one or more microphones, an alphanumeric keypad, such as a keyboard, for inputting alphanumeric and other information, or a pointing device, such as a mouse, a trackball, stylus, or cursor direction keys. User input devices 460 can also include a touchscreen. Additionally, the computer system 400 as shown in
(38) Graphics display system 470 includes a liquid crystal display (LCD) or other suitable display device. Graphics display system 470 is configurable to receive textual and graphical information and process the information for output to the display device.
(39) Peripheral devices 480 may include any type of computer support device to add additional functionality to the computer system.
(40) The components provided in the computer system 400 of
(41) The processing for various embodiments may be implemented in software that is cloud-based. In some embodiments, the computer system 400 is implemented as a cloud-based computing environment, such as a virtual machine operating within a computing cloud. In other embodiments, the computer system 400 may itself include a cloud-based computing environment, where the functionalities of the computer system 400 are executed in a distributed fashion. Thus, the computer system 400, when configured as a computing cloud, may include pluralities of computing devices in various forms, as will be described in greater detail below.
(42) In general, a cloud-based computing environment is a resource that typically combines the computational power of a large grouping of processors (such as within web servers) and/or that combines the storage capacity of a large grouping of computer memories or storage devices. Systems that provide cloud-based resources may be utilized exclusively by their owners or such systems may be accessible to outside users who deploy applications within the computing infrastructure to obtain the benefit of large computational or storage resources.
(43) The cloud may be formed, for example, by a network of web servers that comprise a plurality of computing devices, such as the computer system 400, with each server (or at least a plurality thereof) providing processor and/or storage resources. These servers may manage workloads provided by multiple users (e.g., cloud resource customers or other users). Typically, each user places workload demands upon the cloud that vary in real-time, sometimes dramatically. The nature and extent of these variations typically depends on the type of business associated with the user.
(44) The present technology is described above with reference to example embodiments. Therefore, other variations upon the example embodiments are intended to be covered by the present disclosure.