Apparatus, system and method for the efficient storage and retrieval of 3-dimensionally organized data in cloud-based computing architectures
10482107 · 2019-11-19
Assignee
Inventors
Cpc classification
G06F16/283
PHYSICS
G01V1/345
PHYSICS
H04L67/568
ELECTRICITY
International classification
G06F16/00
PHYSICS
G06F16/28
PHYSICS
Abstract
A cloud based storage system and methods for uploading and accessing 3-D data partitioned across distributed storage nodes of the system. The data cube is processed to identify discrete partitions thereof, which partitions may be organized according to the x (e.g., inline), y (e.g., crossline) and/or z (e.g., time) aspects of the cube. The partitions are stored in unique storage nodes associated with unique keys. Sub-keys may also be used as indexes to specific data values or collections of values (e.g., traces) within a partition. Upon receiving a request, the proper partitions and values within the partitions are accessed, and the response may be passed to a renderer that converts the values into an image displayable at a client device. The request may also facilitate data or image access at a local cache, a remote cache, or the storage partitions using location, data, retrieval, and/or rendering parameters.
Claims
1. A method of obtaining image data from cloud storage comprising: generating a request for an image at a client device, wherein: the image is a graphical rendering of image data rendered according to a rendering parameter, the image data is a subset of an image data set having a multi-dimensionally organized form, the image data set partitioned across a plurality of remote storage nodes of a cloud storage, the request includes each of an identifier associated with a location in the image data set and the rendering parameter, the request is used to determine whether a cached image is available in a local cache of the client device, the cached image including the requested image; and when the cached image is not available in the local cache, the request is further used to determine whether the image data is cached in a remote cache, the remote cache configured to store the image data for rendering the image according to the rendering parameter when the cached image is not available in the local cache; and when the image data is not available in the remote cache, caching the image data for rendering the image in the remote cache, the image data retrieved from the plurality of remote storage nodes.
2. The method of claim 1 wherein the request further comprises a uniform resource locator comprising a network location for a server configured to access the remote cache and the plurality of remote storage nodes of the cloud storage and a retrieval parameter.
3. The method of claim 2 wherein: the image data set comprises a plurality of seismic trace data values, the network location is a network address for the server, the server manages a retrieval of the seismic trace data values from the plurality of remote storage nodes, the retrieval parameter includes at least one trace identifier, and the rendering parameter includes data processing information applied by a rendering application run against the image data.
4. The method of claim 2 wherein: the network location comprises a network address of the server, the server manages retrieval and rendering of images from image data of the image data set, and the retrieval parameter includes at least one of an inline value, a cross line value, and a time value.
5. The method of claim 1 wherein: the local cache is a browser cache, the request includes a full uniform resource locator comprising a network location, a retrieval parameter, and the rendering parameter, the full uniform resource locator for accessing the cached image in the browser cache; the network location and the retrieval parameter are used to access the image data cached in the remote cache, the rendering parameter is used by a rendering application to generate the image; and the retrieval parameter is used to generate a plurality of partition keys to retrieve the image data from the plurality of remote storage nodes.
6. The method of claim 1 wherein each of the plurality of remote storage nodes is a virtual storage node.
7. The method of claim 1 wherein the image data set comprises three-dimensional data and the image data for rendering the image is a two-dimensional portion of the three-dimensional data.
8. The method of claim 7 wherein the request further comprises a retrieval parameter, the retrieval parameter including an aspect parameter and a number parameter for generating keys to retrieve the image data from the plurality of remote storage nodes for rendering the two-dimensional portion of the three-dimensional data.
9. A method of obtaining data from cloud storage comprising: receiving a request for an image, wherein: the image is a graphical rendering of image data rendered according to a rendering parameter, the image data is a subset of an image data set having a multi-dimensionally organized form and partitioned across at least a first node and a second node of a remote distributed storage system, and the request includes an identifier associated with the requested image and the rendering parameter; when the image data corresponding to the requested image is not cached in a remote cache, generating a first key for retrieving a first portion of the image data from the first node and a second key for retrieving a second portion of the image data from the second node; and providing the first portion of the image data, the second portion of the image data, and the rendering parameter to a browser for rendering of the requested image according to the rendering parameter.
10. The method of claim 9 wherein the request includes a network address for a server in operable communication with the remote distributed storage system.
11. The method of claim 9 further comprising caching the first portion of the image data and the second portion of the image data in the remote cache and caching the requested image at a rendering cache.
12. The method of claim 11 wherein the rendering cache is a browser cache.
13. The method of claim 9 wherein the image data set comprises three-dimensional data and the request further comprising an aspect parameter, the method further comprising: generating the first key and the second key based on the aspect parameter; providing the image data, the aspect parameter, and the rendering parameter to the browser, the image data being a two-dimensional portion of the three-dimensional data; rendering the image data at the browser based on the aspect parameter and the rendering parameter to generate the requested image; and caching the requested image in a browser cache of the browser.
14. The method of claim 10 wherein the request further includes a network parameter, an aspect parameter and a number parameter, the method further comprising, when the image is not in a rendering cache, at the server: when the first portion of the image data and the second portion of the image data are cached at a remote cache, retrieving the first portion of the image data and the second portion of the image data from the remote cache; when the first portion of the image data and the second portion of the image data are not cached at the remote cache, retrieving the first portion of the image data and the second portion of the image data from the remote distributed storage system; and transmitting the first portion of the image data and the second portion of the image data to a renderer to generate the requested image according to the rendering parameter.
15. The method of claim 14, further comprising caching the first portion of the image data and the second portion of the image data in the remote cache when the first portion of the image data and the second portion of the image data are not cached in the remote cache.
16. The method of claim 15 further comprising generating, when the first portion of the image data and the second portion of the image data are not cached in the remote cache, the first key for retrieving the first portion of the image data from the first node and the second key for retrieving the second portion of the image data from the second node using the aspect parameter and a number parameter of the request.
17. The method of claim 9 wherein the image data set comprises three-dimensional data, the method further comprising: providing the image data, an aspect parameter, and the rendering parameter to the browser, the image data being a two-dimensional portion of the image data set, and rendering the image data at the browser based on the aspect parameter and the rendering parameter to generate the requested image.
18. The method of claim 17 further comprising caching the two-dimensional portion of the image data set in a rendering cache.
19. An apparatus comprising: a server in operable communication with a remote distributed storage and a remote cache, the server including a processor operably coupled with a tangible computer memory including computer readable instructions for: receiving a request for an image, wherein: the image is a graphical rendering of image data rendered according to a rendering parameter, the image data is a subset of an image data set having a multi-dimensionally organized form and partitioned across at least a first node and a second node of remote distributed storage, the request including an identifier associated with the requested image and the rendering parameter; when the image data is not cached in the remote cache, generating a first key for retrieving a first portion of the image data from the first node and a second key for retrieving a second portion of the image data from the second node; and providing each of the first portion of the image data and the second portion of the image data to a requesting browser for rendering by the browser according to the rendering parameter to generate the requested image.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Aspects of the present disclosure may be better understood and its numerous objects, features, and advantages made apparent to those skilled in the art by referencing the accompanying drawings. It should be understood that these drawings depict only typical embodiments of the present disclosure and, therefore, are not to be considered limiting in scope.
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
DETAILED DESCRIPTION
(13) Aspects of the present disclosure involve apparatus, systems and methods for the efficient storage and retrieval of 3-dimensionally (3-D) organized sample data in cloud-based computing architectures. More particularly, aspects of the present disclosure involve partitioning 3-dimensionally organized data and storing the partitions in distinct storage partitions within a cloud storage arrangement. The storage partitions may be associated with independent and dedicated processing threads and controls such that the data partitions may be written into storage and read from storage partially or completely in parallel. The data may be partitioned based on different axes of the 3-dimensional data set, and stored in the cloud partitions based on the partitioning of the data. The data may also be selectively read from the various cloud partitions so that different aspects of the data may be read without reading the entirety of the 3-D data stored. Reading, like writing the data, may also occur in parallel.
(14) The various apparatus, systems and methods discussed herein provide for efficient storage and retrieval of massive 3-dimensionally organized data to and from the cloud, access and sharing of the data across the world, greater data security, and numerous other advantages and efficiencies over conventional systems. The example implementations discussed herein reference the 3-D data as a seismic stack. However, the present disclosure is applicable to any number of other types of 3-D data such as medical data (e.g., magnetic resonance imaging (MRI)), oceanic data, weather data, etc.
(15) Referring now to
(16) Referring to
(17) In one particular case, the storage node involves a non-relational database with underlying disk for physically storing the data. Further, in one particular example, the data cube (stack), which may comprise some number of traces, is partitioned into 32-trace × 32-trace columns or partitions. The depth of the column along the time or z-axis depends on the depth of the stack from which the partitions are created. Thus, for example, if the stack has 128 traces along the x-axis (inline), then the stack is divided into four 32-trace columns along the x-axis. If the same stack is 128 traces along the y-axis (crossline), then the stack is divided into a total of 16 columns of 32 × 32 traces (4 columns × 4 columns), as shown in
(18) Any particular trace within the stack has an inline (i) location and a crossline (c) location. Through an integer division and modulo operation, any trace within the set of columns may be identified, even when separated and stored in different storage nodes. The following operations provide an index into the proper partition, and into the proper trace within a partition:
a. Inter-partition index (i/32, c/32): index to a column having the particular trace
b. Intra-partition index (i % 32, c % 32): index to a particular trace within a given column
So, for example, referring again to
(19) In the example above, the inter-partition index uses integer division, which keeps only the whole-number part of the quotient: 7/32=0 and 127/32=3. The intra-partition index uses the modulo operation, which keeps only the remainder: 7%32=7 and 127%32=31.
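The two indexing operations can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and treating 7 as the inline number and 127 as the crossline number is an assumption drawn from the arithmetic in the worked example.

```python
PARTITION_SIZE = 32  # traces per column side; the text notes other values are possible


def inter_partition_index(inline, crossline, size=PARTITION_SIZE):
    """Return the (x, y) index of the column that holds the trace."""
    return (inline // size, crossline // size)


def intra_partition_index(inline, crossline, size=PARTITION_SIZE):
    """Return the (x, y) position of the trace inside its column."""
    return (inline % size, crossline % size)


# Assumed example trace: inline 7, crossline 127.
print(inter_partition_index(7, 127))  # (0, 3)
print(intra_partition_index(7, 127))  # (7, 31)
```

Together the two tuples locate the trace: the first selects the column, the second selects the trace within that column.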
(20) The divisor of 32 used in both operations may take other values. Generally speaking, it represents a possible range of inline or crossline values, and is an integer. Other possible values are 8, 10, 16, 64, 128, etc.; the choice depends on various factors including the data cube size, storage setup, etc.
(21) Returning to
(22) From blob storage, the stack is partitioned into the discrete partitions (or chunks or bricks), and each partition is stored into a storage node 16. The act of partitioning includes creating columns and column indexes as well as indexes to particular traces within each column. The partitions may be encrypted and/or compressed. Storage of the partitions may occur non-serially, meaning that it is not necessary (albeit possible) that one partition be stored first followed by another, and so on. Rather, depending on the architecture of the storage, the partitions may be stored using separate processors and/or threads. In some systems, this may be considered to be occurring in parallel.
(23) Partitioning, which may also be referred to as sharding, involves the partitioning of the 3-D stack of contiguous data (traces) into partitions. Each trace of the stack involves a trace header defining the inline and crossline numbers of the trace within the stack. The trace also has measurement values corresponding to seismic measurements taken along the z-axis, which are typically a series of seismic values at time increments such as 2 milliseconds. The inter- and intra-partition indexing schemes defined above are used to initially assign traces to a column. Essentially, the inline and crossline values of each trace are used to calculate the column number to which it will be assigned, as well as to identify the location within the assigned column for later retrieval. So, all of the traces for partition (0,0) will be stored in the first storage node, all of the traces for partition (0,1) will be stored in the second storage node, and so on.
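The assignment of traces to columns described above might look like the following sketch. It assumes, purely for illustration, that the stack is held as a dictionary keyed by (inline, crossline) and that each column is a dictionary keyed by the intra-partition index; `shard_traces` is a hypothetical name, not a function from the disclosure.

```python
from collections import defaultdict


def shard_traces(traces, size=32):
    """Group traces into columns.

    `traces` maps (inline, crossline) -> list of samples along the z-axis.
    Returns {column index: {position within column: samples}}.
    """
    partitions = defaultdict(dict)
    for (inline, crossline), samples in traces.items():
        column = (inline // size, crossline // size)  # inter-partition index
        slot = (inline % size, crossline % size)      # intra-partition index
        partitions[column][slot] = samples
    return dict(partitions)


# A toy 128 x 128 stack with three samples per trace.
stack = {(i, c): [float(i), float(c), 0.0] for i in range(128) for c in range(128)}
partitions = shard_traces(stack)
print(len(partitions))          # 16 columns
print(len(partitions[(0, 0)]))  # 1024 traces per column
```

Each entry of `partitions` would then be written to its own storage node, which is what allows the writes to proceed independently of one another.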
(24) Indexing information may be used to generate a key for each of the partitions, which may be then used by the storage partitions (nodes) to store the appropriate data partition. Generally speaking, a distributed storage system involves a flat one-dimensional collection of storage nodes. As conceptually illustrated in
(25) So, for column (0,3), the partition key involves the following string [0000][0003], which is a concatenation of the inline and crossline identifiers for the partition. The storage then uses the partition key to determine where to store a column and to create a one-to-one correspondence between the columns and storage nodes. The partition key is also used to extract the data later from the storage partition, as discussed below. Generation of the partition keys, in this specific example, involves concatenating or otherwise combining the x-column (partition) index with the y-column (partition) index, so that (0, 3) becomes 0-3 (represented by key 4 of
(26)
(27) Since the stack has been partitioned and distributed across several storage nodes, the system identifies the columns having portions of the slice, and then extracts the relevant traces from the appropriate subset of all nodes holding the entirety of the stack partitions.
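As a rough sketch of how the relevant columns might be identified for a slice, the code below builds partition keys by concatenating zero-padded column indexes and lists the keys an inline or crossline slice would touch. The exact key layout (eight digits, no brackets or separator) and both function names are assumptions made for illustration; the stack dimensions follow the 128 x 128 example.

```python
def partition_key(column_x, column_y, width=4):
    """Concatenate zero-padded column indexes, e.g. (0, 3) -> '00000003'."""
    return f"{column_x:0{width}d}{column_y:0{width}d}"


def keys_for_slice(aspect, number, n_inline=128, n_crossline=128, size=32):
    """Return the keys of the columns that hold part of the requested slice."""
    columns_x = -(-n_inline // size)     # ceiling division
    columns_y = -(-n_crossline // size)
    if aspect == "inline":
        # An inline slice stays in one x-column and crosses every y-column.
        return [partition_key(number // size, y) for y in range(columns_y)]
    if aspect == "crossline":
        # A crossline slice stays in one y-column and crosses every x-column.
        return [partition_key(x, number // size) for x in range(columns_x)]
    raise ValueError(f"unsupported aspect: {aspect}")


print(keys_for_slice("inline", 7))
# ['00000000', '00000001', '00000002', '00000003']
```

Only four of the sixteen columns are named in the result, so the remaining twelve storage nodes need not be read at all for this slice.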
(28) For context,
(29)
(30) With inline and crossline slices, as well as various other possible slices, the system takes advantage of both distributed storage and the various threads (and/or processors) that are dispatched to service the data requests. So, in the above two examples, four threads were dispatched to access four (4) of the sixteen (16) total partitions to extract only the relevant data, in parallel.
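The parallel read can be illustrated with Python's standard thread pool. This is a sketch under stated assumptions: the `nodes` dictionary is an in-memory stand-in for the sixteen storage nodes, the trace values are placeholder strings, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

SIZE = 32     # traces per column side
COLUMNS = 4   # columns per axis for a 128 x 128 stack

# Stand-in for 16 storage nodes: column index -> {position in column: trace}.
nodes = {
    (px, py): {
        (i, c): f"trace({px * SIZE + i},{py * SIZE + c})"
        for i in range(SIZE)
        for c in range(SIZE)
    }
    for px in range(COLUMNS)
    for py in range(COLUMNS)
}


def read_inline_from_node(column, inline):
    """Extract only the traces of one inline from a single column."""
    row = inline % SIZE
    node = nodes[column]
    return [node[(row, c)] for c in range(SIZE)]


def read_inline_slice(inline):
    """Dispatch one worker per relevant column and join the results in order."""
    columns = [(inline // SIZE, py) for py in range(COLUMNS)]
    with ThreadPoolExecutor(max_workers=len(columns)) as pool:
        parts = list(pool.map(lambda col: read_inline_from_node(col, inline), columns))
    return [trace for part in parts for trace in part]


traces = read_inline_slice(7)
print(len(traces))   # 128
print(traces[127])   # trace(7,127)
```

`pool.map` returns results in the order of its inputs, so the traces come back sorted by crossline even though the four reads may finish in any order.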
(31) For time slices (or those occurring along the z-axis), more of the partitions are accessed to extract the relevant data; however, the system can still take advantage of parallelism.
(32) It is also possible to partition the stack along the time axis, as well as the inline and crossline axes as discussed relative to
(33)
(34) The initial request is received at a web server 96 that communicates with the cloud storage 94 as discussed with reference to
(35) More specifically, the data portion of the URL (e.g., website.com?name=basic&aspect=inline&number=7) is broken down into its components: name=basic, aspect=inline, and number=7. Here, basic is the name of the 3D dataset, and it is used to look up, in a relational table in the system, the exact address in distributed storage where the seismic traces are stored. Keep in mind that the logical entity basic is a single thing, even though it is partitioned across nodes of the distributed storage system. Next, per the discussion above, the aspect and number are used to produce a key to the correct partition, and a key to the needed items from within the partition.
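Breaking the request into its components could be sketched as follows. The `DATASET_TABLE` dictionary stands in for the relational table, and its storage address is a made-up placeholder; `parse_retrieval` is likewise a hypothetical name.

```python
from urllib.parse import parse_qsl

# Hypothetical stand-in for the relational table mapping dataset names
# to their address in distributed storage.
DATASET_TABLE = {"basic": "placeholder-storage-address-for-basic"}


def parse_retrieval(url):
    """Split a request into network location and retrieval components."""
    location, _, query = url.partition("?")
    fields = dict(parse_qsl(query))
    return {
        "location": location,
        "storage_address": DATASET_TABLE[fields["name"]],
        "aspect": fields["aspect"],
        "number": int(fields["number"]),
    }


request = parse_retrieval("website.com?name=basic&aspect=inline&number=7")
print(request["aspect"], request["number"])  # inline 7
```

The aspect and number recovered here are the inputs from which the partition keys and the within-partition indexes would then be generated.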
(36) The cloud storage then returns the appropriate traces to the web server 96. In this example, the storage would return 128 traces corresponding to each of the traces that fall along inline 7. The traces are provided to a rendering application 100, which may or may not be integrated with the web server, and the renderer returns a 2-dimensional digital image of inline 7, as shown in
(37) Seismic data may be presented in different visual forms. At a simple level, it is possible to request and view grayscale (black and white) or color seismic data. If the user's initial request was for grayscale data, then a grayscale inline slice would be displayed and cached at the browser. During the course of analyzing the seismic data, the user will typically request other slices, often viewing slices deeper and deeper into the data such as from inline 7, to 8, to 9, and so on. Often, however, the user will return to and request inline 7 again.
(38)
(39) To provide the system with the capability to issue one request and select the appropriate data to service the request, a unique query string is used. The query string is as follows:
request(query string)=network location: retrieval information: rendering parameters.
(40) The network location identifies where on a network the data resides. The network location may take the form of a uniform resource locator (URL), a uniform resource identifier (URI) or other addressing scheme. The retrieval information may include the name of the cube, the aspect of the section of data being requested, and the section or slice number, if available. The name of the data cube is used to identify the data cube including the requested information, and to distinguish it from other cubes that may be accessible from the system. The aspect, in the example of a seismic slice, may include inline, crossline, time, and other aspects such as those depicted in
(41) The rendering information may include a number of different possible fields. Examples of rendering information include palette (e.g., simple, grayscale, color), gain (a multiplier to apply to the data before rendering) and scale (number of image pixels per data sample). Generally, the rendering parameter describes how the data should be rendered, e.g., grayscale, color, with or without gain, with or without scaling, etc.
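To make the gain and scale fields concrete, the sketch below renders a small grid of samples as grayscale pixel levels. Several details are assumptions not stated in the disclosure: samples are taken to lie in [-1, 1], amplified values are clipped to that range, the mapping to 0..255 is linear, and scale is applied by repeating each pixel in both directions.

```python
def render_grayscale(samples, gain=1.0, scale=1):
    """Map rows of samples in [-1, 1] to rows of gray levels in 0..255.

    `gain` multiplies each sample before rendering; `scale` is the number
    of image pixels produced per data sample along each axis.
    """
    image = []
    for row in samples:
        pixels = []
        for value in row:
            amplified = max(-1.0, min(1.0, value * gain))  # clip after gain
            level = int((amplified + 1.0) * 127.5)         # -1 -> 0, 1 -> 255
            pixels.extend([level] * scale)
        for _ in range(scale):
            image.append(list(pixels))
    return image


print(render_grayscale([[-1.0, 0.0, 1.0]], scale=2))
# [[0, 0, 127, 127, 255, 255], [0, 0, 127, 127, 255, 255]]
```

The same samples can be rendered again with a different gain, scale, or palette without being re-read from storage, which is why the rendering parameters are kept separate from the retrieval parameters.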
(42) Assuming for the sake of an example, that the stack 18 is named basic, and referring specifically to a grayscale request for inline 7 with the webserver located at www.website.com, the request is as follows:
website.com?name=basic&aspect=inline&number=7&palette=grayscale&scale=1.
Similarly, a color request for inline 7 is as follows:
website.com?name=basic&aspect=inline&number=7&palette=color&scale=1.
In this example, the retrieval parameters are: name=basic&aspect=inline&number=7. The rendering parameters are: palette=grayscale&scale=1.
(43) In a high performance, cloud-based system with many simultaneous users, caching is used to enhance system performance. In the system described herein, a two-layer cache is used. The first cache is the cloud cache, which stores the data samples (e.g., traces) for each request, for a period of time, under the key composed of the concatenated retrieval parameters. The cloud cache is implemented as close as possible to the processing resources of the cloud-based system. In some cloud-based systems, the cloud cache may be referred to as an in-memory application cache.
(44) In the example set out above, the key to the cloud cache would be: website.com?name=basic&aspect=inline&number=7. The second cache is the browser cache, which may be referred to as a rendering cache and which stores in memory the image for each request, for a period of time, under the key composed of the concatenated base, retrieval, and rendering parameters (the full URL) (e.g., website.com?name=basic&aspect=inline&number=7&palette=grayscale&scale=1). The rendering cache may be implemented as close as possible to the user of the cloud-based system. In some specific examples, the rendering cache may be a browser cache. Alternatively, a content delivery network (CDN) can be used for the rendering cache.
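The two cache keys can be derived from a single request string, as in the sketch below. The set of rendering field names is taken from the examples given earlier (palette, gain, scale); treating every other field as a retrieval parameter is an assumption, and `cache_keys` is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlencode

# Fields that affect only how the data is drawn, not which data is read.
RENDERING_FIELDS = {"palette", "gain", "scale"}


def cache_keys(url):
    """Return (cloud cache key, rendering cache key) for a request."""
    base, _, query = url.partition("?")
    retrieval = [(k, v) for k, v in parse_qsl(query) if k not in RENDERING_FIELDS]
    cloud_key = f"{base}?{urlencode(retrieval)}"  # location + retrieval parameters
    rendering_key = url                           # the full URL
    return cloud_key, rendering_key


gray = cache_keys("website.com?name=basic&aspect=inline&number=7&palette=grayscale&scale=1")
color = cache_keys("website.com?name=basic&aspect=inline&number=7&palette=color&scale=1")
print(gray[0])              # website.com?name=basic&aspect=inline&number=7
print(gray[0] == color[0])  # True
```

Because the grayscale and color requests share a cloud cache key, the traces fetched for one can serve the other; only the rendered images are cached separately.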
(45) Referring again to
(46) If the resource is not in its browser cache, the browser forwards the request for the png file (in this example) to the web server 96 (operation 1010). The server then strips the image rendering parameters (palette=grayscale&scale=1) from the URL, and uses the remaining data parameters (website.com?name=basic&aspect=inline&number=7) as the key for searching the remote cache 104 for the seismic trace data. The rendering parameters are not relevant at this point. If the trace data for inline 7 is in the cache, the server reads it out and sends it to the renderer 100, which uses the rendering parameters to direct the creation of the png file from the seismic traces (operations 1010 and 104). If the seismic traces are not in the cloud cache 104, the server uses the data parameters to read the seismic trace data from storage 94 (operation 1020).
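The lookup order described above (browser cache, then cloud cache, then storage) can be sketched as a single class. This collapses the client and server into one object purely for illustration; in the described system the browser cache lives at the client and the cloud cache at the server. The class, the stand-in storage and renderer functions, and the rendering field names are all assumptions.

```python
from urllib.parse import parse_qsl, urlencode

RENDERING_FIELDS = {"palette", "gain", "scale"}


class SliceService:
    """Toy model of the two-layer cache in front of distributed storage."""

    def __init__(self, read_from_storage, render):
        self.read_from_storage = read_from_storage
        self.render = render
        self.browser_cache = {}  # full URL -> rendered image
        self.cloud_cache = {}    # location + retrieval parameters -> traces

    def get_image(self, url):
        if url in self.browser_cache:          # layer 1: rendered image
            return self.browser_cache[url]
        base, _, query = url.partition("?")
        pairs = parse_qsl(query)
        data = [(k, v) for k, v in pairs if k not in RENDERING_FIELDS]
        rendering = {k: v for k, v in pairs if k in RENDERING_FIELDS}
        data_key = f"{base}?{urlencode(data)}"
        if data_key not in self.cloud_cache:   # layer 2: trace data
            self.cloud_cache[data_key] = self.read_from_storage(dict(data))
        image = self.render(self.cloud_cache[data_key], rendering)
        self.browser_cache[url] = image
        return image


storage_reads = []


def fake_storage(params):
    storage_reads.append(params)  # record every read that reaches storage
    return ["trace"] * 128


def fake_render(traces, rendering):
    return f"{rendering['palette']} image of {len(traces)} traces"


service = SliceService(fake_storage, fake_render)
base_url = "website.com?name=basic&aspect=inline&number=7"
service.get_image(base_url + "&palette=grayscale&scale=1")
service.get_image(base_url + "&palette=color&scale=1")
service.get_image(base_url + "&palette=grayscale&scale=1")
print(len(storage_reads))  # 1
```

Three requests reach storage only once: the color request reuses the cached traces, and the repeated grayscale request is answered from the rendering cache without touching the server-side path at all.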
(47) The various inventive concepts described above may be implemented on virtually any type of computer regardless of the platform being used. For example, as shown in
(48) Further, those skilled in the art will appreciate that one or more elements of the computer system 500 may be located at a remote location and connected to the other elements over a network. The invention may be implemented on a distributed system having a plurality of nodes, where each portion of the invention (e.g., the operating system, file system, cache, application(s), etc.) may be located on a different node within the distributed system, and each node may correspond to a computer system. Alternatively, the node may correspond to a processor with associated physical memory. The node may alternatively correspond to a processor with shared memory and/or resources. Further, software instructions to perform embodiments of the invention may be stored on a tangible computer readable medium such as a compact disc (CD), a diskette, a tape, a digital versatile disk (DVD), or any other suitable tangible computer readable storage device.
(49) The description above includes example systems, methods, techniques, instruction sequences, and/or computer program products that embody techniques of the present disclosure. However, it is understood that the described disclosure may be practiced without these specific details. In the present disclosure, the methods disclosed may be implemented as sets of instructions or software readable by a device. Further, it is understood that the specific order or hierarchy of steps in the methods disclosed are instances of example approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the method can be rearranged while remaining within the disclosed subject matter. The accompanying method claims present elements of the various steps in a sample order, and are not necessarily meant to be limited to the specific order or hierarchy presented.
(50) The described disclosure may be provided as a computer program product, or software, that may include a machine-readable medium having stored thereon instructions, which may be used to program a computer system (or other electronic devices) to perform a process according to the present disclosure. A machine-readable medium includes any mechanism for storing information in a form (e.g., software, processing application) readable by a machine (e.g., a computer). The machine-readable medium may include, but is not limited to, magnetic storage medium (e.g., floppy diskette), optical storage medium (e.g., CD-ROM); magneto-optical storage medium; read only memory (ROM); random access memory (RAM); erasable programmable memory (e.g., EPROM and EEPROM); flash memory; or other types of medium suitable for storing electronic instructions.
(51) It is believed that the present disclosure and many of its attendant advantages will be understood by the foregoing description, and it will be apparent that various changes may be made in the form, construction and arrangement of the components without departing from the disclosed subject matter or without sacrificing all of its material advantages. The form described is merely explanatory, and it is the intention of the following claims to encompass and include such changes. While the present disclosure has been described with reference to various embodiments, it will be understood that these embodiments are illustrative and that the scope of the disclosure is not limited to them. Many variations, modifications, additions, and improvements are possible. More generally, embodiments in accordance with the present disclosure have been described in the context of particular implementations. Functionality may be separated or combined in blocks differently in various embodiments of the disclosure or described with different terminology. These and other variations, modifications, additions, and improvements may fall within the scope of the disclosure as defined in the claims that follow.