Distributed computer system for delivering data

11663209 · 2023-05-30

Abstract

A distributed computer system for delivering data to client-side application(s) is provided. The system includes a database configured to store immutable data blocks and a data distribution entity configured to split source-data into immutable data blocks and metadata. The data distribution entity is configured to replicate and store the data blocks on different storage node(s) of the database. The metadata comprises values referencing the data blocks for a key-value database call. The system further comprises a data fetching/delivering entity with a fuse-daemon configured to form a quorum-read request for data block(s) out of a client-side request for a certain data range. The quorum-read request is a bundle of parallel requests to different storage nodes. The fuse-daemon is configured to fetch the data blocks delivered in the fastest response and to discard the rest. The fuse-daemon generates a virtual file comprising the corresponding range of data from the fetched data blocks.

Claims

1. A distributed computer system having at least one hardware processor and at least one memory, the distributed computer system for delivering data to at least one client-side application, the distributed computer system comprising: a database configured to store immutable data blocks, a data distribution entity configured to split source-data in the form of a data file into immutable data blocks and metadata, wherein the data distribution entity is configured to replicate the immutable data blocks and to store the immutable data blocks on at least two different storage nodes of the database, wherein the metadata comprises values referencing the immutable data blocks in the at least two storage nodes for a key-value database call, a data fetching and delivering entity comprising: a fuse-daemon configured to translate a request for a range of data of a file triggered by at least one client-side application into a quorum-read request for at least one immutable data block to the database, wherein the quorum-read request comprises a plurality of individual parallel requests to different storage nodes storing the same immutable data block, wherein the fuse-daemon is configured to fetch the data blocks delivered by the database in the fastest response and is configured to discard results delivered subsequently to the fastest response, wherein upon receiving the fastest response, the fuse-daemon orders the storage nodes of the database to cancel processing of the not yet served requests, wherein one of the different storage nodes responds more quickly than others of the different storage nodes to the quorum-read request and provides the fastest response, and wherein the quorum-read request gives an exact same result for each of the individual parallel requests of the quorum-read request, wherein the fuse-daemon is configured to generate a virtual file comprising the corresponding range of data from the fetched data blocks.

2. The distributed computer system of claim 1, wherein the data fetching and delivering entity comprises an operating system having an operating system page cache, wherein the operating system page cache is configured to store at least the parts of the fetched data blocks corresponding to the range of data of a file requested by the client-side application.

3. The distributed computer system of claim 1, wherein the data in the virtual file is retrievable by the at least one client-side application as response to the request for a range of data of a file and as response to any future request for the same range of data of a file.

4. The distributed computer system of claim 1, wherein the database comprising the storage nodes is a NoSQL database.

5. The distributed computer system of claim 1, wherein the values comprised by the metadata are referencing values referencing the immutable data blocks for a key-value database call, these referencing values being generated through hash values of each corresponding data block.

6. The distributed computer system of claim 1, wherein the database is configured to deliver data corresponding to the file including data of a data block outside of the requested range of the file.

7. The distributed computer system of claim 6, wherein the virtual file comprises, along with the range of data requested, the metadata of data blocks corresponding to data outside of the requested range of the file.

8. The distributed computer system of claim 6, wherein the fuse daemon is configured to generate the virtual file comprising the metadata.

9. The distributed computer system of claim 1, wherein at least two client-side applications employ a common middleware client library to access the virtual file.

10. The distributed computer system of claim 1, wherein the fuse-daemon is configured to perform the query for the quorum read operation at least three times in parallel and wherein the same data blocks are stored at least three times on at least three different database storage nodes.

11. The distributed computer system of claim 10, wherein the fuse-daemon is configured to perform the query for the quorum read operation three times in parallel and wherein the same data blocks are stored five times on at least five different database storage nodes.

12. The distributed computer system of claim 2, wherein the operating system of the data fetching and delivering entity is a UNIX based operating system.

13. The distributed computer system of claim 1, wherein the fuse daemon is configured to select and to fetch data blocks based on the range of the data of the file by selecting and fetching those immutable data blocks that contain the data within the requested range of data.

14. The distributed computer system of claim 13, wherein before the quorum read request for at least one immutable data block is issued by the data fetching and delivering entity, a request from the client application to the fuse daemon comprising the name of the data file is issued when a request for opening the data file is issued by the client application.

15. The distributed computer system of claim 14, wherein the fuse daemon is further configured to retrieve metadata of the file that comprises the list of blocks with their hash value, the size of the file and the block size, and is to store the metadata on the data fetching and delivering entity, wherein the metadata is retrieved when a request for opening the data file performed by the client application is issued.

16. The distributed computer system of claim 15, wherein the fuse daemon is to translate the request for a range of data of a file to the quorum read request using the information contained in the metadata, the translation comprising a calculation involving the size of the file, the block size and the requested range, using a calculation rule also obtained from the metadata of the file.

17. A computer-implemented method of delivering data to at least one client-side application, the method comprising: splitting, via a distributed computer system, source-data in the form of a data file into immutable data blocks and metadata, replicating and storing, via the distributed computer system, said data blocks on at least two different storage nodes of a database, wherein the metadata comprises values referencing the immutable data blocks in the at least two storage nodes for a key-value database call, translating, via the distributed computer system, a request for a range of data of a file triggered by at least one client-side application into a quorum-read request for at least one immutable data block to the database, wherein the quorum-read request comprises a plurality of individual parallel requests to different storage nodes storing the same immutable data block, fetching, via the distributed computer system, the data blocks delivered by the database in the fastest response and discarding results delivered subsequently to the fastest response, upon receiving the fastest response, ordering the storage nodes of the database to cancel processing of the not yet served requests, wherein one of the different storage nodes responds more quickly than others of the different storage nodes to the quorum-read request and provides the fastest response, and wherein the quorum-read request gives an exact same result for each of the individual parallel requests of the quorum-read request, generating, via the distributed computer system, a virtual file comprising the corresponding range of data from the fetched data blocks.

18. A non-transitory computer program product comprising program code instructions stored on a non-transitory computer readable medium to execute a computer-implemented method of delivering data to at least one client-side application, the method comprising: splitting, via a distributed computer system, source-data in the form of a data file into immutable data blocks and metadata, replicating and storing, via the distributed computer system, said data blocks on at least two different storage nodes of a database, wherein the metadata comprises values referencing the immutable data blocks in the at least two storage nodes for a key-value database call, translating, via the distributed computer system, a request for a range of data of a file triggered by at least one client-side application into a quorum-read request for at least one immutable data block to the database, wherein the quorum-read request comprises a plurality of individual parallel requests to different storage nodes storing the same immutable data block, fetching, via the distributed computer system, the data blocks delivered by the database in the fastest response and discarding results delivered subsequently to the fastest response, upon receiving the fastest response, ordering the storage nodes of the database to cancel processing of the not yet served requests, wherein one of the different storage nodes responds more quickly than others of the different storage nodes to the quorum-read request and provides the fastest response, and wherein the quorum-read request gives an exact same result for each of the individual parallel requests of the quorum-read request, generating, via the distributed computer system, a virtual file comprising the corresponding range of data from the fetched data blocks.

Description

BRIEF DESCRIPTIONS OF THE DRAWINGS

(1) Examples of the invention are now described, also with reference to the accompanying drawings, wherein:

(2) FIG. 1 shows an example of a distributed computer system with a data distribution entity, a database and a data fetching and delivering entity as well as client applications and a fuse daemon, with immutable data blocks being replicated and stored by the data distribution entity on the database and queried via a quorum read by the data fetching and delivering entity.

(3) FIG. 2 shows an example of a NoSQL database on which data blocks of a versioned file and metadata regarding that file are stored.

(4) FIG. 3 shows an example of a data fetching and delivering entity in communication with the database for retrieving the data blocks and the metadata to a virtual file stored on the data fetching and delivering entity.

(5) FIG. 4 shows an example of a flow chart illustrating a method of delivering data from a database to at least one client-side application, in a distributed computer environment.

(6) The drawings and the description of the drawings are of examples of the invention and are not of the invention itself. Like reference signs refer to like elements throughout the following description of examples.

DESCRIPTION

(7) An example of a distributed computer system 100 with a data distribution entity 3, a database 30 and a data fetching and delivering entity 50 as well as client applications 1, 1′ and a fuse daemon 51, with immutable data blocks 20 being replicated and stored by the data distribution entity on the database 30 and queried via a quorum read 65 by the data fetching and delivering entity 50 is shown in FIG. 1.

(8) A data producer application 23 is configured to produce a data file 5, for example, an airport list exported from an airport dataset. The data producer application 23 is configured to send the generated data file 5 to a data distribution entity 3. The data distribution entity 3 comprises a distribution box 3′ (also referred to as “distribox”) and a distribox back end 3″, which serves as a persistent file storage for the data file 5. The distribox 3′ is to import the file 5 from the distribox back end 3″. The data file 5 comprises data that is, for example, segmented into immutable data blocks 20. The distribox 3′ is configured to replicate all or at least a significant part of the immutable data blocks 20 comprised by the data file 5 and to store the replicated immutable data blocks 20 on storage nodes 32 of a database 30.

(9) In the example illustrated by FIG. 1, the immutable data blocks 20 are replicated five times and are stored on five different storage nodes 32. When the data file 5 is segmented into five different immutable data blocks 20, each storage node 32 may contain a copy of each different immutable data block 20. Hence, in total, twenty-five data blocks 20 are stored on the database 30.
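
By way of a non-limiting illustration, the replication arithmetic of this example may be sketched in Python as follows. The function names `split_into_blocks` and `replicate`, the block size of ten bytes, and the use of plain dictionaries as stand-ins for the storage nodes 32 are assumptions made for this sketch only and are not taken from the described system.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Split source data into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def replicate(blocks: List[bytes], node_count: int) -> List[Dict[int, bytes]]:
    """Place a copy of every block on every node (full replication)."""
    return [dict(enumerate(blocks)) for _ in range(node_count)]


# Hypothetical 50-byte data file; Python `bytes` objects are immutable.
data_file = bytes(range(50))
blocks = split_into_blocks(data_file, block_size=10)
nodes = replicate(blocks, node_count=5)
total_copies = sum(len(node) for node in nodes)
print(len(blocks), len(nodes), total_copies)
```

With five blocks replicated to five nodes, the sketch counts twenty-five stored block copies, matching the figure given for FIG. 1.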

(10) Client-side applications 1, 1′ might run on respective application servers, indicated by the dashed boxes comprising the client-side applications 1, 1′. Those data consuming applications might be connected to a data fetching and delivering entity 50, which is, for example, an OpenShift node. The client-side applications 1, 1′ might send a request for a range of data of a file 60 to the data fetching and delivering entity 50 via a middleware that is common to both client-side applications 1, 1′ and via an application back-end interface 51″. From the side of the data fetching and delivering entity 50, this communication may take place via a file interface 91. The file interface 91 supports sequential and random read access.

(11) If the request for a range of data of the file 60 is triggered for the first time, no data corresponding to the requested range of data can be delivered directly from the virtual file 72. Therefore, the data blocks 20 corresponding to said data range 73, 74 (see FIGS. 2 and 3) have to be fetched from the database 30. For this purpose, the data fetching and delivering entity 50 employs a fuse daemon 51. This fuse daemon 51 is configured to translate the request for a range of data of the file 60 into a quorum-read request 65 for at least one immutable data block 20 to the database 30. After performing this translation to a quorum-read request 65 (see the example translation explained in the section before), the fuse daemon 51 issues this request towards the database 30.

(12) Herein, the quorum-read request 65 comprises three individual parallel requests 66 to different storage nodes 32 storing the same immutable data block 20. The parallel requests 66 are identical with respect to the immutable data block 20 they are directed to. As such, all parallel requests 66 are, for example, directed to the first two immutable data blocks 20 of a file 5 comprising overall ten immutable data blocks 20.

(13) The quorum read request 65 is then processed by every storage node 32 to which one of the individual requests 66 is sent, and each of these storage nodes 32 sends an answer in the form of the requested data blocks 20 towards the data fetching and delivering entity 50. The data fetching and delivering entity 50 may receive every response but discard every response except the fastest response 70. Alternatively, the data fetching and delivering entity 50 may receive the fastest response 70 and cause the storage nodes 32 to stop processing those requests that have not been answered yet.
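
As a non-limiting sketch of the second alternative, the following Python example issues three parallel requests for the same block, keeps the first response to arrive and cancels the requests that have not been served yet. The storage nodes are simulated by dictionaries with artificial latencies; the names `read_from_node` and `quorum_read`, the key `"block-a"` and the delay values are hypothetical, and a real deployment would use the client interface of the database instead.

```python
import asyncio
from typing import Dict, List


async def read_from_node(node: Dict[str, bytes], key: str, delay: float) -> bytes:
    """Simulate one individual request to one storage node with a given latency."""
    await asyncio.sleep(delay)
    return node[key]


async def quorum_read(nodes: List[Dict[str, bytes]], delays: List[float], key: str) -> bytes:
    """Send the same request to all given nodes and keep only the fastest response."""
    tasks = [
        asyncio.create_task(read_from_node(node, key, delay))
        for node, delay in zip(nodes, delays)
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # Cancel the requests that have not been answered yet.
    for task in pending:
        task.cancel()
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)
    # Every replica holds the same immutable block, so any finished task will do.
    return next(iter(done)).result()


# Three replicas of the same block; the second simulated node answers first.
replicas = [{"block-a": b"LHR,CDG,FRA"} for _ in range(3)]
fastest = asyncio.run(quorum_read(replicas, [0.05, 0.01, 0.03], "block-a"))
print(fastest)
```

Because the data blocks are immutable and identical on every replica, the sketch does not need to compare responses; it only needs the first one.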

(14) The fuse daemon 51 may receive the data block(s) 20 contained in the fastest response 70 and store these data blocks in a virtual file 72. This virtual file 72 is stored in a page cache 52 of an operating system 51, for example, a LINUX operating system. The fuse daemon 51 is configured to keep fetched data blocks 20 in the page cache 52 as long as possible, that is, until the data file 5, and with it the data block 20, has changed. The originally requested range of data 73, 74 may be cut out of the fetched immutable data blocks 20 resident in the virtual file 72 of the page cache 52.

(15) From the virtual file 72 in the page cache 52, the requested range of data 73, 74 is delivered to the client-side application(s) 1, 1′ in a response 69 to the initial request. As the fetched data blocks 20 are held in the virtual file 72, these immutable data blocks 20 can be used for any future request for a range of data 73, 74 comprised by the previously fetched data blocks. This significantly reduces the processing times for future requests for the same range of data or a range of data 73, 74 also comprised by previously fetched immutable data blocks 20.
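
The effect of holding fetched blocks for later requests may be illustrated by the following simplified Python sketch. It is an in-process stand-in only: the described system relies on the page cache 52 of the operating system, whereas the class `BlockCache`, its methods and the `fetch_block` callback are hypothetical names introduced here for illustration.

```python
from typing import Callable, Dict


class BlockCache:
    """Keep fetched blocks until the underlying data file changes."""

    def __init__(self, fetch_block: Callable[[int], bytes]) -> None:
        self._fetch_block = fetch_block      # e.g. a quorum read against the database
        self._blocks: Dict[int, bytes] = {}
        self.fetch_count = 0                 # number of database round trips

    def get(self, index: int) -> bytes:
        """Return a block, fetching it only on the first request."""
        if index not in self._blocks:
            self._blocks[index] = self._fetch_block(index)
            self.fetch_count += 1
        return self._blocks[index]

    def invalidate(self) -> None:
        """Drop all blocks, e.g. when a new version of the data file is published."""
        self._blocks.clear()


remote_blocks = {0: b"LHR,CDG,", 1: b"FRA,AMS,"}   # simulated database content
cache = BlockCache(lambda index: remote_blocks[index])
first = cache.get(0)    # fetched from the simulated database
second = cache.get(0)   # served from the cache, no second fetch
print(cache.fetch_count)
```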

(16) An example of a NoSQL database on which data blocks of a versioned file and metadata regarding that file are stored is shown in FIG. 2.

(17) The latest three versions of a data file, namely data file 5, data file 5′, and data file 5″, are kept in an external storage, such as the persistent file storage of a distribox back end 3″ shown in FIG. 1. As these data files 5, 5′, 5″ are external to the illustrated content of NoSQL database 30′, they are illustrated in FIG. 2 by dashed shapes. The NoSQL database 30′ is a possible implementation of the database 30 shown in FIG. 1.

(18) The storage node 32 of NoSQL database 30′, illustrated in more detail in FIG. 2, may store three different immutable data blocks 20, 20′, 20″. These immutable data blocks 20, 20′, 20″ are comprised by the latest version of the data file, namely data file 5 and are replicated, among others, to said storage node 32. These immutable data blocks cover two ranges of data 73, 74. The first range of data 73 is comprised by the data block 20 and the data block 20′, whereas the second range of data 74 is comprised by the data block 20′ and the data block 20″. The first range of data 73 might be a range of data corresponding to an actual request for a range of data of a file 60, while the second range of data 74 may correspond to the remaining data of a file 74, outside of the requested range 73.

(19) The NoSQL database 30′ stores metadata 4 that is related to the immutable data blocks 20, 20′, 20″. The metadata 4 comprises values 42 which reference the immutable data blocks 20, 20′, 20″ for a key-value database call. In the example illustrated by FIG. 2, these referencing values 42 are hash values 42′. For example, each immutable data block 20, 20′, 20″ is associated with a different unique hash value 42′.
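
A minimal Python sketch of hash values serving as keys for a key-value call is given below. The hash function is not specified in this description; SHA-256 is assumed here purely for illustration, and the names `block_key`, `build_metadata` and the dictionary `store` standing in for a storage node are hypothetical.

```python
import hashlib
from typing import Dict, List


def block_key(block: bytes) -> str:
    """Derive the referencing value of a block from its content (SHA-256 assumed)."""
    return hashlib.sha256(block).hexdigest()


def build_metadata(blocks: List[bytes], block_size: int) -> dict:
    """Collect the list of block hashes, the file size and the block size."""
    return {
        "block_hashes": [block_key(block) for block in blocks],
        "file_size": sum(len(block) for block in blocks),
        "block_size": block_size,
    }


example_blocks = [b"LHR,CDG,", b"FRA,AMS,", b"MAD"]
metadata = build_metadata(example_blocks, block_size=8)

# A storage node modelled as a key-value store keyed by the hash values.
store: Dict[str, bytes] = {block_key(block): block for block in example_blocks}

# Key-value call: look up the second block via its hash from the metadata.
second_block = store[metadata["block_hashes"][1]]
print(second_block)
```

Since a key derived this way depends only on the block content, a block that changes receives a new key, which is consistent with treating stored blocks as immutable.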

(20) The metadata 4 of the data file 5 comprises a list of blocks with their hash value 42′, the size of the file 43 and the block size 44. The metadata 4 of the data file 5 further comprises a calculation rule 45 being readily exposed for retrieval by the data fetching and delivering entity 50 (see FIG. 1). The calculation rule 45 involves a rule for translating a request for a certain range of a file 73 with a certain offset to a request for certain immutable data blocks 20, 20′, 20″. This calculation rule 45 uses the size of the file 43 and the block size 44 stored in the list of blocks to output the hash values for which the data fetching and delivering entity 50 has to query in order to obtain the immutable data blocks 20, 20′ that comprise the range of data of the file 73. For an example of such a calculation rule and the application of that rule in order to fetch particular immutable data blocks 20, 20′, 20″, reference is made to the general description.
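
One possible form of such a rule, assuming fixed-size blocks and integer division of byte offsets by the block size, is sketched below in Python. This is an illustrative assumption and not necessarily the calculation rule 45 itself; the function name `blocks_for_range` and the example sizes are hypothetical.

```python
def blocks_for_range(offset: int, length: int, file_size: int, block_size: int) -> range:
    """Return the indices of the blocks that cover bytes [offset, offset + length)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if offset < 0 or length <= 0 or offset >= file_size:
        return range(0)                      # nothing to fetch
    end = min(offset + length, file_size)    # exclusive end, clamped to the file size
    first = offset // block_size
    last = (end - 1) // block_size
    return range(first, last + 1)


# Hypothetical file of 2500 bytes stored in three blocks of up to 1000 bytes.
first_range = list(blocks_for_range(offset=800, length=400, file_size=2500, block_size=1000))
second_range = list(blocks_for_range(offset=1200, length=1300, file_size=2500, block_size=1000))
print(first_range, second_range)
```

In this sketch the first range falls into the first and second block and the second range into the second and third block, analogous to the ranges 73 and 74 of FIG. 2. The resulting indices would then be used to select the corresponding hash values from the list of blocks in the metadata.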

(21) Besides the storage node 32, the NoSQL database 30′ comprises further storage nodes, such as the storage node 32′, on which, for example, the data blocks 20, 20′, 20″ are stored as well.

(22) An example of a data fetching and delivering entity in communication with the database for retrieving the data blocks and the metadata to a virtual file stored on the data fetching and delivering entity is shown in FIG. 3.

(23) The data consuming application 1 issues a request 61′ for opening the data file to be read out to the fuse daemon 51. Concurrently with this request 61′ or subsequent to this request, a request 62′ comprising the name of the data file is sent from the client application 1 to the fuse daemon 51. The fuse daemon 51 issues requests 61, 62 that correspond to the above-mentioned requests 61′, 62′ to the database. The requests 61′ and 62′ as well as the requests 61 and 62 can also be realized as a single request.

(24) Subsequent to or concurrently with the request for opening the file 61, 61′, the fuse daemon 51 is to retrieve metadata 4 of the data file 5 (see FIGS. 1 and 2). As described in conjunction with FIG. 2, this metadata 4 comprises a list of blocks with their hash value, a size of the file and a block size, as well as a calculation rule of how to translate the request for a range of data of the file to a request for certain blocks. The metadata 4 is stored at the data fetching and delivering entity 50, for example, in the virtual file 72.

(25) When the calculation rule is received, the fuse daemon 51 performs a translation of the request for a range of data of a file 60 to said quorum read request 65, which is sent to the database 30.

(26) A first such quorum read request 65 might be sent to retrieve a range of data 73 included in the immutable data blocks 20, 20′. These immutable data blocks 20, 20′ are retrieved and stored in the virtual file 72. The data of the immutable data blocks 20, 20′ corresponding to the data range 73 may be isolated from these fetched immutable data blocks 20, 20′ and the data of the range 73 is, for example, also stored in the virtual file 72.

(27) As such, the immutable data blocks 20, 20′ may remain stored in addition to the range of data 73 originally requested by the client application 1 in the request 60 for a range of data. However, the immutable data blocks 20, 20′ may also be deleted from the virtual file 72 after having isolated and stored the range of data 73 in the virtual file 72.

(28) In the same way, data from a data range 74, corresponding to the remaining data of the file 5 may be retrieved. This data range 74 is data comprised by the immutable data blocks 20′, 20″ which are again retrieved by the fuse daemon and stored in the virtual data file 72 on the data fetching and delivering entity 50. Also here, the range of data 74 may be isolated from the immutable data blocks 20′, 20″.
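
Isolating a requested range from fetched blocks may be sketched in Python as follows, assuming fixed-size blocks addressed by their index within the file. The function name `extract_range`, the block size of ten bytes and the example data are hypothetical and serve illustration only.

```python
from typing import Dict


def extract_range(blocks: Dict[int, bytes], offset: int, length: int, block_size: int) -> bytes:
    """Cut bytes [offset, offset + length) out of already fetched blocks."""
    if length <= 0:
        return b""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    joined = b"".join(blocks[index] for index in range(first, last + 1))
    start = offset - first * block_size      # position of the range inside the first block
    return joined[start:start + length]


# Hypothetical 25-byte file held as three fetched blocks of up to 10 bytes.
file_data = bytes(range(25))
fetched = {0: file_data[0:10], 1: file_data[10:20], 2: file_data[20:25]}

range_a = extract_range(fetched, offset=8, length=4, block_size=10)    # spans blocks 0 and 1
range_b = extract_range(fetched, offset=12, length=13, block_size=10)  # spans blocks 1 and 2
print(range_a == file_data[8:12], range_b == file_data[12:25])
```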

(29) An example of a flow chart illustrating a method of delivering data from a database to at least one client-side application, in a distributed computer environment is shown in FIG. 4.

(30) In activity S1, source data in the form of a data file is split into immutable data blocks and metadata.

(31) In activity S2, the data blocks are replicated and stored on at least two different storage nodes of a database, wherein the metadata created in activity S1 comprises values referencing the immutable data blocks in the at least two storage nodes for a key-value database call.

(32) In activity S3, a request for a range of data of a file triggered by at least one client-side application is translated into a quorum-read request for at least one immutable data block to the database, wherein the quorum read request comprises a plurality of individual parallel requests to different storage nodes storing the same immutable data block.

(33) In activity S4, the data blocks delivered by the database in the fastest response are fetched and those results delivered subsequently to the fastest result are discarded.

(34) In activity S5, a virtual file comprising the corresponding range of data from the fetched data blocks is created.
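
The sequence of activities S1 to S5 may be sketched end to end in Python as follows. The sketch is a simplified, single-process illustration under stated assumptions: storage nodes are dictionaries, SHA-256 is assumed as the hash function, a thread pool stands in for the parallel requests, and all function names (`s1_split` to `s5_virtual_file`), the block size and the sample data are hypothetical.

```python
import hashlib
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # hypothetical block size in bytes


def s1_split(data: bytes) -> Tuple[List[bytes], dict]:
    """S1: split the data file into blocks and metadata."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    metadata = {
        "hashes": [hashlib.sha256(block).hexdigest() for block in blocks],
        "file_size": len(data),
        "block_size": BLOCK_SIZE,
    }
    return blocks, metadata


def s2_replicate(blocks: List[bytes], metadata: dict, node_count: int) -> List[Dict[str, bytes]]:
    """S2: store every block, keyed by its hash, on each storage node."""
    return [dict(zip(metadata["hashes"], blocks)) for _ in range(node_count)]


def s3_translate(offset: int, length: int, metadata: dict) -> List[str]:
    """S3: translate a byte range (length > 0) into the hash keys to query."""
    end = min(offset + length, metadata["file_size"])
    first = offset // metadata["block_size"]
    last = (end - 1) // metadata["block_size"]
    return metadata["hashes"][first:last + 1]


def s4_fetch_fastest(nodes: List[Dict[str, bytes]], key: str, pool: ThreadPoolExecutor) -> bytes:
    """S4: query all nodes in parallel and keep the first finished response."""
    futures = [pool.submit(node.get, key) for node in nodes]
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    for future in pending:
        future.cancel()  # only takes effect for requests that have not started
    return next(iter(done)).result()


def s5_virtual_file(offset: int, length: int, metadata: dict, fetched: List[bytes]) -> bytes:
    """S5: assemble the requested range from the fetched blocks."""
    start = offset % metadata["block_size"]
    return b"".join(fetched)[start:start + length]


source = b"LHR,CDG,FRA,AMS,MAD"   # hypothetical airport list
blocks, metadata = s1_split(source)
nodes = s2_replicate(blocks, metadata, node_count=3)
offset, length = 4, 7
with ThreadPoolExecutor(max_workers=3) as pool:
    fetched = [s4_fetch_fastest(nodes, key, pool) for key in s3_translate(offset, length, metadata)]
virtual_range = s5_virtual_file(offset, length, metadata, fetched)
print(virtual_range)
```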