Decentralized file system and message bus architecture for processing training sets in multi-cloud computing environment
11507540 · 2022-11-22
Assignee
Inventors
- Stephen J. Todd (Shrewsbury, MA)
- Kun Wang (Beijing, CN)
- Layne Lin Peng (Shanghai, CN)
- Pengfei Wu (Shanghai, CN)
Cpc classification
International classification
G06F16/00
PHYSICS
Abstract
In a multi-cloud computing environment comprising a plurality of cloud platforms, wherein one cloud platform is a source of a model and a data set and further wherein the model is to be executed against the data set on one or more of the other cloud platforms, the method maintains a decentralized architecture comprising a file system and a message bus, wherein the file system comprises a plurality of decentralized file system nodes corresponding to the plurality of cloud platforms, and the message bus comprises a plurality of decentralized message bus nodes corresponding to the plurality of cloud platforms. Further, the method manages sharing of the model and the data set via at least a portion of the decentralized file system nodes and manages messaging related to execution of the model against the data set via at least a portion of the decentralized message bus nodes.
Claims
1. A method comprising: in a given cloud platform comprising a source of a model and a data set, the given cloud platform communicating with a multi-cloud computing environment comprising one or more of other cloud platforms, the given cloud platform: storing the model and the data set as one or more local files; maintaining a given decentralized file system node and a given decentralized message bus node of a decentralized architecture of the multi-cloud computing environment, the decentralized architecture comprising a file system and a message bus; sharing the model and the data set with the one or more of the other cloud platforms via the given decentralized file system node; sending one or more messages related to execution of the model against the data set to the one or more of the other cloud platforms via the given decentralized message bus node, the one or more messages comprising instructions regarding execution of the model against the data set to enable execution across the one or more of the other cloud platforms; enabling, via the given decentralized file system node, access to and receipt of first results of the execution of the model against the data set by a first execution cloud platform of the one or more of the other cloud platforms; and enabling, via the given decentralized file system node, access to and receipt of second results of at least a subsequent execution of the model against the data set by a second execution cloud platform of the one or more of the other cloud platforms; wherein the message bus comprises a distributed ledger system; wherein the file system comprises a content address-based distributed file system; and wherein the method is implemented via one or more processing devices each comprising a processor coupled to a memory.
2. The method of claim 1, further comprising the given decentralized file system node of the given cloud platform receiving the model and the data set from the given cloud platform.
3. The method of claim 2, further comprising the given decentralized file system node of the given cloud platform enabling access of the model and the data set to a first decentralized file system node of the first execution cloud platform of the one or more of the other cloud platforms to permit execution of the model against the data set by the first execution cloud platform.
4. The method of claim 3, further comprising receiving by the given decentralized message bus node of the given cloud platform a message through a first decentralized message bus node of the first execution cloud platform indicating availability of the first results.
5. The method of claim 1, wherein the model comprises a training model in the form of an analytic algorithm and wherein the data set against which the model is executed is a training data set.
6. The method of claim 3, further comprising receiving by the given decentralized message bus node of the given cloud platform a message through a second decentralized message bus node of the second execution cloud platform indicating availability of the second results.
7. The method of claim 1, wherein: the given cloud platform comprises one or more private cloud platforms; and each of the first execution cloud platform and the second execution cloud platform comprises one or more public cloud platforms.
8. An article of manufacture comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes said at least one processing device to: in a given cloud platform comprising a source of a model and a data set, the given cloud platform communicating with a multi-cloud computing environment comprising one or more of other cloud platforms; store the model and the data set as one or more local files; maintain a given decentralized file system node and a given decentralized message bus node of a decentralized architecture of the multi-cloud computing environment, the decentralized architecture comprising a file system and a message bus; share the model and the data set with the one or more of the other cloud platforms via the given decentralized file system node; send one or more messages related to execution of the model against the data set to the one or more of the other cloud platforms via the given decentralized message bus node, the one or more messages comprising instructions regarding execution of the model against the data set to enable execution across the one or more of the other cloud platforms; enable, via the given decentralized file system node, access to and receipt of first results of the execution of the model against the data set by a first execution cloud platform of the one or more of the other cloud platforms; and enable, via the given decentralized file system node, access to and receipt of second results of at least a subsequent execution of the model against the data set by a second execution cloud platform of the one or more of the other cloud platforms; wherein the message bus comprises a distributed ledger system; and wherein the file system comprises a content address-based distributed file system.
9. The article of manufacture of claim 8, further comprising the given decentralized file system node of the given cloud platform receiving the model and the data set from the given cloud platform.
10. The article of manufacture of claim 9, further comprising the given decentralized file system node of the given cloud platform enabling access of the model and the data set to a first decentralized file system node of the first execution cloud platform of the one or more of the other cloud platforms to permit execution of the model against the data set by the first execution cloud platform.
11. The article of manufacture of claim 10, further comprising receiving by the given decentralized message bus node of the given cloud platform a message through a first decentralized message bus node of the first execution cloud platform indicating availability of the first results.
12. The article of manufacture of claim 8, wherein the model comprises a training model in the form of an analytic algorithm and wherein the data set against which the model is executed is a training data set.
13. The article of manufacture of claim 11, further comprising receiving by the given decentralized message bus node of the given cloud platform a message through a second decentralized message bus node of the second execution cloud platform indicating availability of the second results.
14. The article of manufacture of claim 8, wherein: the given cloud platform comprises one or more private cloud platforms; and each of the first execution cloud platform and the second execution cloud platform comprises one or more public cloud platforms.
15. A system comprising: in a given cloud platform comprising a source of a model and a data set, the given cloud platform communicating with a multi-cloud computing environment comprising one or more of other cloud platforms; one or more processing devices configured to: store the model and the data set as one or more local files; maintain a given decentralized file system node and a given decentralized message bus node of a decentralized architecture of the multi-cloud computing environment, the decentralized architecture comprising a file system and a message bus; share the model and the data set with the one or more of the other cloud platforms via the given decentralized file system node; send one or more messages related to execution of the model against the data set to the one or more of the other cloud platforms via the given decentralized message bus node, the one or more messages comprising instructions regarding execution of the model against the data set to enable execution across the one or more of the other cloud platforms; enable, via the given decentralized file system node, access to and receipt of first results of the execution of the model against the data set by a first execution cloud platform of the one or more of the other cloud platforms; and enable, via the given decentralized file system node, access to and receipt of second results of at least a subsequent execution of the model against the data set by a second execution cloud platform of the one or more of the other cloud platforms; wherein the message bus comprises a distributed ledger system; and wherein the file system comprises a content address-based distributed file system.
16. The system of claim 15, further comprising the given decentralized file system node of the given cloud platform receiving the model and the data set from the given cloud platform.
17. The system of claim 16, further comprising the given decentralized file system node of the given cloud platform enabling access of the model and the data set to a first decentralized file system node of the first execution cloud platform of the one or more of the other cloud platforms to permit execution of the model against the data set by the first execution cloud platform.
18. The system of claim 17, further comprising receiving by the given decentralized message bus node of the given cloud platform a message through a first decentralized message bus node of the first execution cloud platform indicating availability of the first results.
19. The system of claim 18, further comprising receiving by the given decentralized message bus node of the given cloud platform a message through a second decentralized message bus node of the second execution cloud platform indicating availability of the second results.
20. The system of claim 15, wherein: the given cloud platform comprises one or more private cloud platforms; and each of the first execution cloud platform and the second execution cloud platform comprises one or more public cloud platforms.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
DETAILED DESCRIPTION
(11) Illustrative embodiments will be described herein with reference to exemplary information processing systems (referred to as training systems) and associated host devices, storage devices and other processing devices. It is to be appreciated, however, that embodiments are not restricted to use with the particular illustrative system and device configurations shown. Accordingly, the terms “information processing system” and “training system” as used herein are intended to be broadly construed, so as to encompass, for example, processing systems comprising cloud computing and storage systems, as well as other types of processing systems comprising various combinations of physical and virtual computing resources. An information processing system or training system may therefore comprise, for example, a cloud infrastructure hosting multiple tenants that share cloud computing resources. Such systems are considered examples of what are more generally referred to herein as cloud computing environments. A cloud computing environment with multiple cloud platforms is referred to as a “multi-cloud computing environment.” As mentioned above, enterprises may desire to deploy and execute a training model against some training data on such a multi-cloud computing environment. The training system is referred to as a “multi-cloud training system.” The term “enterprise” as used herein is intended to be broadly construed, and may comprise, for example, one or more businesses, one or more corporations or any other one or more entities, groups, or organizations. An “entity” as illustratively used herein may be a person or system.
(12)
(13) In order for each of the workers 122-1, 122-2, 122-3 and 122-4 to run the training model 112 against the training data set 114, the training model 112 and the training data set 114 must be transferred to the public cloud environment 120 (as graphically illustrated in
(14) The workers 122-1, 122-2, 122-3 and 122-4 could also experience significantly more inefficiency when they are distributed across multiple clouds. The decision to run workers across cloud platforms could occur for any number of reasons including, but not limited to:
(15) (i) A given cloud may not have enough compute, network, and or storage resources for all workers.
(16) (ii) The model owner may wish to configure the workers across multiple clouds for availability/performance reasons.
(17) (iii) The model owner may wish to configure the workers across multiple clouds for evaluation purposes (evaluating the performance of multiple clouds, e.g., a bake-off).
(18) (iv) The model owner may not wish to have all the data processed on one cloud for security reasons.
(19) (v) The model owner may wish to quickly deploy all workers to a different cloud for cost and/or speed purposes.
(20) In addition, an enterprise may wish to achieve fluidity in their choice of public cloud providers.
(21) As shown in
(22) As described above, there are a variety of ways to transfer training data sets and training models from a master node of a private cloud to the workers in a given public cloud. One approach is to transfer the training data set and the training model to each worker, while another is to transfer the training data set/training model (once) to a shared location available to all the workers.
(23) The following problems arise with respect to transferring the training model and training data set. Note that many of these problems are common to workers that run within the context of one cloud as well as across multiple clouds.
(24) Wait Time for Worker Start
(25) Each worker must wait for the receipt of the entire training model and entire training data set before they can start. This increases the overall completion time of the analytic system (e.g., the training window takes longer to complete).
(26) Data Duplication
(27) Should the system transfer identical copies of the training data set to each worker, a penalty will be paid for: (a) the cost of multiple network transfers; and (b) the cost of duplicate data storage.
(28) Latency for Shared Storage
(29) Should the system place the training data set on a shared storage location available to all the workers, this shared location may not have ideal latency characteristics, especially in environments where workers are distributed across multiple clouds.
(30) Preservation of Worker Results
(31) As workers complete their runs and return results (e.g., parameters) to a master node, the record of these results is often lost. Preservation of these results on a per-worker basis can be used for a variety of purposes, not the least of which is to serve as a permanent record of how quickly the worker came up with their results, and what those results were.
(32) Visibility of Cross-Worker Results
(33) Worker results may be visible to the master node but not to other workers. This prevents slower workers from recognizing delays in their processing (as opposed to the faster workers) and also prohibits them from taking corrective action based on this knowledge.
(34) Specialized Application Programming Interfaces (APIs) for Multi-Cloud Targets
(35) When dealing with multiple cloud provider targets, each target platform has their own nuances for file transfer and collection of results, these nuances become limiting and often require specialized coding, which can slow down implementation and/or introduce errors into the process.
(36) Multi-Cloud Pipelining of Models
(37) The output of one public cloud analytic model may then be fed into a model (different or the same) running on a different cloud.
(38) Illustrative embodiments overcome the above and other challenge in existing training systems by providing a decentralized file system and message bus architecture.
(39)
(40) As shown in
(41) Further, as shown in
(42) It is to be appreciated that one or more computing devices at each of the cloud platforms may implement each dFS node and each dMB node.
(43) In one or more illustrative embodiments, the decentralized file system and message bus architecture 400 can be implemented using a decentralized ledger or blockchain network (or RabbitMQ-type framework) and a decentralized data sharing system such as the Interplanetary File System (IPFS) protocol. That is, in illustrative embodiments, the dMB nodes and the dFS nodes in
(44) As used herein, the terms “blockchain,” “digital ledger” and “blockchain digital ledger” may be used interchangeably. As is known, the blockchain or digital ledger protocol is implemented via a distributed, decentralized computer network of compute nodes (e.g., dMB nodes in
(45) In the case of a “bitcoin” implementation of a blockchain distributed ledger, the blockchain contains a record of all previous transactions that have occurred in the bitcoin network. The bitcoin system was first described in S. Nakamoto, “Bitcoin: A Peer to Peer Electronic Cash System,” 2008, the disclosure of which is incorporated by reference herein in its entirety.
(46) A key principle of the blockchain is that it is trusted. That is, it is critical to know that data in the blockchain has not been tampered with by any of the compute nodes in the computer network (or any other node or party). For this reason, a cryptographic hash function is used. While such a hash function is relatively easy to compute for a large data set, each resulting hash value is unique such that if one item of data in the blockchain is altered, the hash value changes. However, it is realized that given the constant generation of new transactions and the need for large scale computation of hash values to add the new transactions to the blockchain, the blockchain protocol rewards compute nodes that provide the computational service of calculating a new hash value. In the case of a Bitcoin network, a predetermined number of bitcoins are awarded for a predetermined amount of computation. The compute nodes thus compete for bitcoins by performing computations to generate a hash value that satisfies the blockchain protocol. Such compute nodes are referred to as “miners.” Performance of the computation of a hash value that satisfies the blockchain protocol is called “proof of work.” While bitcoins are one type of reward, blockchain protocols can award other measures of value (monetary or otherwise) to successful miners.
(47) It is to be appreciated that the above description represents an illustrative implementation of the blockchain protocol and that embodiments of the invention are not limited to the above or any particular blockchain protocol implementation. As such, other appropriate processes may be used to securely maintain and add to a set of data in accordance with embodiments of the invention. For example, distributed ledgers such as, but not limited to, R3 Corda, Ethereum, and Hyperledger may be employed in alternative embodiments.
(48) As mentioned above, a data sharing system such as the IPFS protocol may be employed in the decentralized file system and message bus architecture 400. IPFS is an open-source protocol that provides a decentralized method of storing and sharing files relying on a content-addressable, peer-to-peer hypermedia distribution. The compute nodes in an IPFS network form a distributed file system. The IPFS protocol was developed to replace the HyperText Transfer Protocol (HTTP) of the Internet which relies on location addressing (i.e., using Internet Protocol (IP) addresses to identify the specific computing resource that is hosting a desired data set). As such, the subject data set must be retrieved from the computing resource where it originated or some computing resource within the content delivery network (CDN) each time the data set is requested.
(49) IPFS operates by operatively coupling computing resources with the same system of files via a system of nodes (e.g., dFS nodes in
(50) In one example, the IFPS system is further described in J. Benet, “IPFS—Content Addressed, Versioned, P2P File System,” 2014, the disclosure of which is incorporated by reference herein in its entirety. However, illustrative embodiments are not limited to this particular data sharing system and alternative systems may be employed.
(51) Returning now to
(52) Cloud-Neutral Presentation of Data, Models, and Results
(53) By using the architecture depicted in
(54) This feature solves a number of problems including, but not limited to, the amount of time it would take to transfer the entire file to one (or more) of the cloud providers. It also addresses many of the manageability issues that go along with these file transfers (e.g., account addresses, permissions, etc.).
(55)
(56) Cloud Neutral Message Bus for Running Models
(57) Once a training model and a training data set have been loaded into a dFS node of dFS 410, the owner that wishes to run the model can simply insert a message into the decentralized message bus 420 (
(58) Instant Launch and Completion/Results
(59) Once a command has been issued to the dMB 420, the target cloud platform instantly begins running the training model against the training data set. This significantly shortens model iteration time for a number of reasons:
(60) (i) The remote cloud does not have to wait for the training model, and the entire training data set, to be transferred over.
(61) (ii) The local user does not have to step through the manual processes involved with transferring the entirety of both files, maintaining/keeping/deleting previous versions, dealing with user accounts, etc.
(62) (iii) Publishing of results is similarly instant; the cloud provider can simply notify the decentralized message bus that the run is complete. Neither the local user or the remote cloud has to manually transfer the results back to the central location.
(63) Reduced Data Transfers/Costs
(64) Training data sets can be large, and the transfer of the entire training data set to a public cloud provider will come with a cost. This is especially wasteful since an analytic training model will likely only access a portion of the training data set (and not the entire file).
(65) Accordingly, in one or more illustrative embodiments of the decentralized file system, only the requested segments of the file are transferred. In addition, once these segments cross over to the public cloud to a specific worker, that worker keeps (or “pins”) the segment locally so that it can be shared with other workers. The optimization of only transferring the data (once) that is needed to the public cloud not only speeds up the overall run time but it also reduces the bill that the public cloud provider will send when counting bytes for data transfer.
(66)
(67) If any of the workers 222-1, 222-2, 222-3 and 222-4 in public cloud platform 220 request a data segment that has already been stored in the dFS node 412-1 (e.g., worker 222-1 requesting “Range1” in
(68) Efficient Pipelining
(69) As multi-stage analytic jobs execute across multiple cloud platforms, the intermediate published results can simply be placed in a dFS folder and a message signaled to the next cloud via the dMB. This eliminates the bottleneck of pushing results back to a central arbiter (e.g., master node of the private cloud platform 210), which in turn pushes these results (and additional data) to the next processing cloud in the pipeline.
(70) More particularly,
(71)
(72) At least portions of the decentralized file system and message bus architecture shown in
(73) As is apparent from the above, one or more of the processing modules or other components of the decentralized file system and message bus architecture shown in
(74) The processing platform 1000 in this embodiment comprises a plurality of processing devices, denoted 1002-1, 1002-2, 1002-3, . . . 1002-N, which communicate with one another over a network 1004.
(75) The network 1004 may comprise any type of network, including by way of example a global computer network such as the Internet, a WAN, a LAN, a satellite network, a telephone or cable network, a cellular network, a wireless network such as a WiFi or WiMAX network, or various portions or combinations of these and other types of networks.
(76) As mentioned previously, some networks utilized in a given embodiment may comprise high-speed local networks in which associated processing devices communicate with one another utilizing Peripheral Component Interconnect Express (PCIe) cards of those devices, and networking protocols such as InfiniBand, Gigabit Ethernet or Fibre Channel.
(77) The processing device 1002-1 in the processing platform 1000 comprises a processor 1010 coupled to a memory 1012.
(78) The processor 1010 may comprise a microprocessor, a microcontroller, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other type of processing circuitry, as well as portions or combinations of such circuitry elements.
(79) The memory 1012 may comprise random access memory (RAM), read-only memory (ROM) or other types of memory, in any combination. The memory 1012 and other memories disclosed herein should be viewed as illustrative examples of what are more generally referred to as “processor-readable storage media” storing executable program code of one or more software programs.
(80) Articles of manufacture comprising such processor-readable storage media are considered embodiments of the present disclosure. A given such article of manufacture may comprise, for example, a storage array, a storage disk or an integrated circuit containing RAM, ROM or other electronic memory, or any of a wide variety of other types of computer program products. The term “article of manufacture” as used herein should be understood to exclude transitory, propagating signals. Numerous other types of computer program products comprising processor-readable storage media can be used.
(81) Also included in the processing device 1002-1 of the example embodiment of
(82) The other processing devices 1002 of the processing platform 1000 are assumed to be configured in a manner similar to that shown for processing device 1002-1 in the figure.
(83) Again, this particular processing platform is presented by way of example only, and other embodiments may include additional or alternative processing platforms, as well as numerous distinct processing platforms in any combination, with each such platform comprising one or more computers, servers, storage devices or other processing devices.
(84) For example, other processing platforms used to implement embodiments of the disclosure can comprise different types of virtualization infrastructure, in place of or in addition to virtualization infrastructure comprising virtual machines. Such virtualization infrastructure illustratively includes container-based virtualization infrastructure configured to provide Docker containers or other types of Linux containers (LXCs).
(85) The containers may be associated with respective tenants of a multi-tenant environment, although in other embodiments a given tenant can have multiple containers. The containers may be utilized to implement a variety of different types of functionality within the system. For example, containers can be used to implement respective cloud compute nodes or cloud storage nodes of a cloud computing and storage system. The compute nodes or storage nodes may be associated with respective cloud tenants of a multi-tenant environment. Containers may be used in combination with other virtualization infrastructure such as virtual machines implemented using a hypervisor.
(86) As another example, portions of a given processing platform in some embodiments can comprise converged infrastructure such as VxRail™, VxRack™ or Vblock® converged infrastructure commercially available from VCE, the Virtual Computing Environment Company, now the Converged Platform and Solutions Division of Dell EMC. For example, portions of a system of the type disclosed herein can be implemented utilizing converged infrastructure.
(87) It should therefore be understood that in other embodiments different arrangements of additional or alternative elements may be used. In many embodiments, at least a subset of these elements may be collectively implemented on a common processing platform, or each such element may be implemented on a separate processing platform.
(88) Also, in other embodiments, numerous other arrangements of computers, servers, storage devices or other components are possible in the decentralized file system and message bus architecture. Such components can communicate with other elements of the system over any type of network or other communication media.
(89) As indicated previously, in some embodiments, components of the decentralized file system and message bus architecture as disclosed herein can be implemented at least in part in the form of one or more software programs stored in memory and executed by a processor of a processing device. For example, at least portions of the dFS, the dMB, or other system components are illustratively implemented in one or more embodiments in the form of software running on a processing platform comprising one or more processing devices.
(90) It should again be emphasized that the above-described embodiments of the disclosure are presented for purposes of illustration only. Many variations and other alternative embodiments may be used. For example, the disclosed techniques are applicable to a wide variety of other types of systems. Also, the particular configurations of system and device elements, associated processing operations and other functionality illustrated in the drawings can be varied in other embodiments. Moreover, the various assumptions made above in the course of describing the illustrative embodiments should also be viewed as exemplary rather than as requirements or limitations of the embodiments. Numerous other alternative embodiments within the scope of the appended claims will be readily apparent to those skilled in the art.