Equi-joins between split tables
09846709 · 2017-12-19
Assignee
Inventors
Cpc classification
G06F16/278
PHYSICS
International classification
Abstract
A join operation between split data tables includes providing value IDs. For each of the value IDs, a unique global ID may be associated with the value ID when the actual value represented by the value ID occurs among actual values comprising the second attribute of the second partition. For each identified unique global ID, the identified unique global ID may be paired with a document ID of a data record contained in a second partition stored at the second server in which the actual value in the data record is represented by the value ID associated with the identified unique global ID.
Claims
1. A method for a join operation between a first data table and a second data table based on a first attribute of the first data table and a second attribute of the second data table, the method comprising: a server among a plurality of second servers receiving from a plurality of first servers a plurality of value IDs associated with a plurality of first partitions that comprise the first data table, the first partitions distributed among the first servers, the first servers different from the second servers; and for a given second partition among a plurality of second partitions that comprise the second data table, distributed among the plurality of second servers, performing at one of the second servers operations including: for each of the value IDs, identifying a unique global ID associated with the value ID when an actual value represented by the value ID occurs among actual values comprising the second attribute of the second partition; and for each identified unique global ID, pairing the identified unique global ID with a document ID of a data record contained in a second partition stored at the second server in which the actual value in the data record is represented by the value ID associated with the identified unique global ID, wherein a plurality of first globalized lists from one or more of the first partitions are combined into a first compiled list, wherein a plurality of second globalized lists from one or more of the second partitions are combined into a second compiled list, wherein the first and second compiled lists are joined based on global IDs in the first compiled list and the global IDs in the second compiled list.
2. The method of claim 1, wherein a globalized list from one of the first partitions comprises a document ID paired with a global ID from one of the second partitions.
3. The method of claim 2, wherein the document ID identifies a data record in the first partition in which the actual value of its first attribute is represented by the value ID associated with the global ID from one of the second partitions.
4. The method of claim 1, wherein each of the second partitions is associated with a pairing the associates value IDs with unique global IDs.
5. The method of claim 1, further comprising sending each of the second globalized lists to a recipient server, the recipient server generating the second combined list from the second globalized lists received.
6. The method of claim 1, wherein multiple occurrences of an actual value in a global list of a given partition are associated with the same global ID.
7. A non-transitory computer readable storage medium having stored thereon computer executable program code, which when executed, will cause a computer processor in a first server among a plurality of second servers to perform steps for a join operation between a first data table and a second data table based on a first attribute of the first data table and a second attribute of the second data table, the steps including: receiving from a plurality of first servers a plurality of value IDs associated with a plurality of first partitions that comprise the first data table, the first partitions distributed among the first servers, the first servers different from the second servers; and for a given second partition among a plurality of second partitions that comprise the second data table, distributed among the plurality of second servers, performing operations including: for each of the value IDs, identifying a unique global ID associated with the value ID when an actual value represented by the value ID occurs among actual values comprising the second attribute of the second partition; and for each identified unique global ID, pairing the identified unique global ID with a document ID of a data record contained in a second partition stored at the second server in which the actual value in the data record is represented by the value ID associated with the identified unique global ID, wherein a plurality of first globalized lists from one or more of the first partitions are combined into a first compiled list, wherein a plurality of second globalized lists from one or more of the second partitions are combined into a second compiled list, wherein the first and second compiled lists are joined based on global IDs in the first compiled list and the global IDs in the second compiled list.
8. The non-transitory computer readable storage medium of claim 7, wherein a globalized list from one of the first partitions comprises a document ID paired with a global ID from one of the second partitions.
9. The non-transitory computer readable storage medium of claim 8, wherein the document ID identifies a data record in the first partition in which the actual value of its first attribute is represented by the value ID associated with the global ID from one of the second partitions.
10. The non-transitory computer readable storage medium of claim 7, wherein each of the second partitions is associated with a pairing the associates value IDs with unique global IDs.
11. The non-transitory computer readable storage medium of claim 7, further comprising sending each of the second globalized lists to a recipient server, the recipient server generating the second combined list from the second globalized lists received.
12. The non-transitory computer readable storage medium of claim 7, wherein multiple occurrences of an actual value in a global list of a given partition are associated with the same global ID.
13. A first server among a plurality of second servers, the first server comprising: a computer processor; a memory; and executable program code stored in the memory to perform a join operation between a first data table and a second data table based on a first attribute of the first data table and a second attribute of the second data table, the executable program code, which when executed by the computer processor, will cause the computer processor to: receive from a plurality of first servers a plurality of value IDs associated with a plurality of first partitions that comprise the first data table, the first partitions distributed among the first servers, the first servers different from the second servers; and for a given second partition among a plurality of second partitions that comprise the second data table, distributed among Hall the plurality of second servers, perform operations including: for each of the value IDs, identifying a unique global ID associated with the value ID when an actual value represented by the value ID occurs among actual values comprising the second attribute of the second partition; and for each identified unique global ID, pairing the identified unique global ID with a document ID of a data record contained in a second partition stored at the second server in which the actual value in the data record is represented by the value ID associated with the identified unique global ID, wherein a plurality of first globalized lists from one or more of the first partitions are combined into a first compiled list, wherein a plurality of second globalized lists from one or more of the second partitions are combined into a second compiled list, wherein the first and second compiled lists are joined based on global IDs in the first compiled list and the global IDs in the second compiled list.
14. The system of claim 13, wherein a globalized list from one of the first partitions comprises a document ID paired with a global ID from one of the second partitions.
15. The system of claim 14, wherein the document ID identifies a data record in the first partition in which the actual value of its first attribute is represented by the value ID associated with the global ID from one of the second partitions.
16. The non system of claim 13, wherein each of the second partitions is associated with a pairing the associates value IDs with unique global IDs.
17. The system of claim 13, further comprising sending each of the second globalized lists to a recipient server, the recipient server generating the second combined list from the second globalized lists received.
18. The system of claim 13, wherein multiple occurrences of an actual value in a global list of a given partition are associated with the same global ID.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
(15)
DETAILED DESCRIPTION
(16) In the following description, for purposes of explanation, numerous examples and specific details are set forth in order to provide a thorough understanding of the present invention. It will be evident, however, to one skilled in the art that the present invention as defined by the claims may include some or all of the features in these examples alone or in combination with other features described below, and may further include modifications and equivalents of the features and concepts described herein.
(17) Aspects of the present invention relate to performing a join operation between two distributed data tables. In particular, an equi-join operation may be performed between data tables that are split into multiple distributed partitions.
(18) Referring to
(19) Communication networks can connect the servers 210, 215, and 231-235. For example, a local network 220 may connect servers 210, 215. A publicly accessed network (e.g., the Internet) 230 may connect server 231-235. The local network 220 may be connected to the publicly accessed network 230, allowing communication among the database servers 210, 215, and 231-235.
(20) Each server (e.g., 210) may include a data processor subsystem 201 that may comprise one or more data processing units. A memory subsystem 202 may comprise random access memory (usually volatile memory such as DRAM) and non-volatile memory such as FLASH memory, ROM, and so on. The memory subsystem 202 may store computer executable programs, which when executed can cause the data processing subsystem 201 to operate as a database system in accordance with aspects of the present invention disclosed herein. A storage subsystem 203 may comprise one or more mass storage devices such as hard disk drives and the like. The storage subsystem 203 may include remote storage systems; e.g., for data mirroring, remote backup and such. A network interface subsystem 204 can provide access to the local network 220 and provide users with access to the server 210. A system of buses 205 can interconnect the foregoing subsystems, providing control lines, data lines, and/or voltage supply lines to/from the various subsystems. The server 210 may include a suitable display(s) 212 and input devices 211 such as a keyboard and a mouse input device.
(21)
(22) In embodiments, the two data tables A and B may be split into respective partitions A.sub.1-A.sub.n and B.sub.1-B.sub.m. Referring for a moment to
(23) Turning to the execution plan illustrated in
(24) In a step 302, each partition A.sub.x among the partitions A.sub.1-A.sub.n generates reduction data 102 and sends a copy of the generated reduction data to each partition B.sub.1-B.sub.m. Accordingly, each partition B.sub.1-B.sub.m receives reduction data 102 from each partition A.sub.1-A.sub.n. In embodiments, the reduction data 102 for partition A.sub.x comprises value IDs associated with actual values of instances of the specified attribute that occur in the partition. The step 302 can be performed by each partition A.sub.x independently of the other partitions A.sub.1-A.sub.n; there is no need to synchronize or otherwise coordinate their activity. In other words, individual reduction data 102 from each partition A.sub.1-A.sub.n can be communicated to the partitions B.sub.1-B.sub.m in parallel fashion.
(25) In a step 304, each partition B.sub.x among the partitions B.sub.1-B.sub.m associates a global ID for each actual value of the specified attribute in the partition that also occurs in at least one of the partitions A.sub.1-A.sub.n. In embodiments, the reduction data 102 received from the partitions A.sub.1-A.sub.n can be used to perform this step. Multiple occurrences of a given actual value in the partition B.sub.x will be associated with the same global ID. In accordance with the present invention, occurrences of a given actual value among all the partitions of data table A and data table B can be associated with the same global ID. This aspect of the present invention will be discussed in more detail in connection with a specific example discussed below.
(26) In a step 306, each partition B.sub.x among the partitions B.sub.1-B.sub.m translates the value IDs that comprise the reduction data 102 from each partition A.sub.1-A.sub.n, producing a set of translated value IDs 104 for each partition A.sub.1-A.sub.n. Consider a partition A.sub.x, for example. Translated value IDs 104 for partition A.sub.x are generated from the value IDs that comprise the reduction data 102 received from that partition. In particular, each value ID in the reduction data 102 is first associated with an actual value stored in the partition A.sub.x. Next, if that actual value also occurs in the partition B.sub.x, then the value ID is “translated” by pairing it with the global ID that is associated with that actual value. This translation is attempted for each value ID in the reduction data 102 of partition A.sub.x. Accordingly, the translated value IDs 104 will comprise one or more values IDs that are paired with respective global IDs. Some value IDs in the reduction data 102 may be associated with actual values which do not occur in the partition B.sub.x and so no translation is made. The translated value IDs 104 are then sent to the partition A.sub.x (backward communication). This step is performed for each of the partitions A.sub.1-A.sub.x, and by each partition B.sub.1-B.sub.m. This aspect of the present invention will be discussed in more detail below.
(27) In a step 308, each partition B.sub.x among the partitions B.sub.1-B.sub.m generates a globalized list 106 that identifies one or more rows stored in the partition. Each row that is identified in the globalized list 106 contains an actual value of the specified attribute which also occurs in one of the partitions A.sub.1-A.sub.N (based on the reduction data 102 as explained above). In an embodiment, the globalized list 106 may include a 2-tuple (a data pair) comprising a Doc ID and a global ID. The Doc IDs in the 2-tuples of the globalized list 106 identify specific rows in the partition B.sub.x for which the actual values of the specified attribute also occur in one of the partitions A.sub.1-A.sub.n. The global IDs in the 2-tuples are associated with those actual values. As a side note, the term “Doc ID” will be used herein to refer to an identifier that identifies a particular row in a data table. This aspect of the present invention will be made more clear in the specific example discussed below.
(28) In embodiments, the globalized list 106 can be sent to a receiving (recipient) data server. For example, any data server 210, 215, 231-235 can be designated as the recipient data server. The recipient data server can be selected from among a population of candidate data servers in round-robin fashion, or selected randomly. In an embodiment, a particular data server can be designated as always being the recipient data server rather than employing a round robin selection process.
(29) In a step 310, each partition A.sub.x among the partitions A.sub.1-A.sub.n generates a globalized list 108 based on the translated value IDs 104 received from each of the partitions B.sub.1-B.sub.m. The globalized list 108 identifies one or more rows stored in the partition A.sub.x. In an embodiment, the globalized list 108 includes a 2-tuple (a data pair) for each identified row. Each 2-tuple in turn comprises a Doc ID and a global ID. The Doc IDs identify one or more rows of the partition A.sub.x. The global IDs are obtained from the translated value IDs 104 received from partitions B.sub.1-B.sub.m. In embodiments, the globalized list 108 identifies rows in partition A.sub.x for which actual values (represented by the global IDs) of the specified attribute also occur in one of the partitions B.sub.1-B.sub.m. This aspect of the present invention will be discussed in more detail below. The globalized list 108 can be sent to a recipient data server. In an embodiment, the recipient data server can be the same data server employed to receive globalized lists 106 from partitions B.sub.1-B.sub.m. In an embodiment, a different data server can be employed.
(30) In a step 312, when the recipient data server has received all of the globalized lists 106 from partitions B.sub.1-B.sub.m, a compiled B list 112 can be created. The compiled B list 112 comprises pairs of Doc IDs and global IDs, and identifies those rows among partitions B.sub.1-B.sub.m for which the actual values of the specified attribute also occur in at least one of the partitions A.sub.1-A.sub.n. Similarly, when the recipient data server has received all of the globalized lists 108 from partitions A.sub.1-A.sub.n, a compiled A list 114 can be created. The compiled A list 114 comprises pairs of Doc IDs and global IDs, and identifies those rows among partitions A.sub.1-A.sub.n for which the actual values of the specified attribute also occur in at least one of the partitions B.sub.1-B.sub.m.
(31) In a step 314, a join operation is performed between the compiled A list 114 and the compiled B list 112, where the join condition is based on the global ID attribute that is common to the compiled A list and the compiled B list. Since the global IDs are associated with actual values of the specified attribute, the join operation performed in this step is equivalent to the desired join operation between the specified attributes.
(32) Following will be a more detailed discussion of the foregoing processing steps shown in
(33)
(34) Suppose a join operation is performed on the data tables A and B, predicated on the Name attribute in data table A being equal to the Name attribute in data table B. An execution plan for the join operation may involve generating an index on the Name attribute for each data table A and B. For example,
(35) Referring to
(36) Consider now the join operation discussed above in connection with
(37) Step 302—Generate and Send Reduction Data
(38) Referring to
(39) Step 304—Associate Global IDs
(40) Each partition B.sub.1, B.sub.2 associates a global ID for each actual value of the Name attribute in the partition that also occurs in at least one of the partitions A.sub.1, A.sub.2. Referring to
(41) In a step 322, various tables can be created and initialized in partition B.sub.1. In an embodiment, these tables are local to partition B.sub.1.
(42) The translation matrix 702 includes a global ID column 716 which is initialized with its corresponding ordinal position in the translation matrix. In general, any set of numbers can be used to initialize the global ID column 716 so long as each row has a unique number and the same set of numbers is used by each of the partitions B.sub.1, B.sub.2. Initially, the value IDs 1, 2, 3, and 4 from partition A.sub.1 are initially mapped to global IDs 1, 2, 3, and 4 (namely rows 1-4 of the translation matrix 702); and the value IDs 1, 2, 3, 4, and 5 from partition A.sub.2 are initially mapped to global IDs 5, 6, 7, 8, and 9 (rows 5-9 of the translation matrix).
(43) Additional tables are created and initialized. An A2B translation table is created for each partition A.sub.x, and initialized to “−1”. For example, partition A.sub.1 has a corresponding A2B translation table 704a and partition A.sub.2 has a corresponding A2B translation table 704b. The translation tables 704a, 704b serve to translate the value IDs from partitions A.sub.1 and A.sub.2 to corresponding value IDs in partition B.sub.1. Since the respective value IDs for A.sub.1, A.sub.2 are consecutive by convention, the value IDs correspond directly with the row indices in tables 704a, 704b. For example, the value ID “3” for partition A.sub.1 can be used to index into translation table 704a to obtain the translated value ID in partition B.sub.1. This use of the indices is illustrated in
(44) A B2A translation table is created and initialized with “−2”. The partition B.sub.1 has a corresponding B2A translation table 706 which includes a corresponding column for each partition A.sub.x. The B2A translation table 706 serves to provide a translation of the value IDs in the local dictionary of partition B.sub.1 to corresponding value IDs in partitions A.sub.1, A.sub.2. Since the value IDs are consecutive, the row in table 706 for a given value ID can be accessed using the value ID itself as an index into the table.
(45) Continuing with
(46) Steps 324a-324d are performed 324 by each partition A.sub.x. For example,
(47) It can be appreciated from the foregoing that the “−1” values in the A2B translation tables 704a, 704b indicate that there is no translation of value IDs from the respective A.sub.1, A.sub.2, partitions to the B.sub.1 partition. Consider translation table 704a for partition A.sub.1, for example. The second entry corresponds to a value ID 2 in partition A.sub.1, which in turn corresponds to the value “Hugo”. Since the local dictionary for partition B.sub.1 does not have an entry for “Hugo”, the entry in table 704a remains unchanged, namely it is “−1”.
(48) It can be further appreciated that the “−2” values in the B2A translation table 706 indicates that the partition B.sub.1 has not yet received the corresponding value from the respective the A.sub.x partition. For example, consider the first entry in the A.sub.1 ID column of the table 706. This entry corresponds to the first entry in the local dictionary for partition B.sub.1, which contains the value “Achim”. Since the local dictionary for partition A.sub.1 (see
(49) Continuing with
(50) For entries in the tables 704a, 704b which have no translation to corresponding value IDs in partition B.sub.1, the corresponding global IDs in column 716 are changed to “−1” to indicate this fact. Thus, for example, value IDs 2, 3, and 4 in partition A.sub.1 have no corresponding value IDs in partition B.sub.1, and so the global IDs in rows 2, 3, and 4 of the translation matrix 702 are set to “−1”. Similarly, value IDs 2, 3, and 4 in partition A.sub.2 have no corresponding value IDs in partition B.sub.1, and so the global IDs in rows 6, 7, and 8 of the translation matrix 702 are set to “−1”.
(51) In a step 328, any duplicate VID-B values in column 714 would be handled in the following manner: For each VID-B value in column 714 that occurs more than once, copy the global ID (column 716) associated with the first occurrence of that VID-B value into the global ID column of each duplicated VID-B value. However, the translation matrix 702 shown in
(52) Refer now to
(53) Referring now to
(54) Referring to
(55) In step 328, duplicate VID-B values in column 714 of the translation matrix 702 for partition B.sub.2 are handled in the following manner: For each VID-B value in column 714 that occurs more than once, copy the global ID (column 716) associated with the first occurrence of that VID-B value into the global ID column of each duplicated VID-B value. Referring to
(56) This concludes the discussion of step 304 (
(57) The discussion will now continue with an explanation of the remaining steps 306-312 of
(58) Step 306—Send Translated Value IDs
(59) Each partition B.sub.1, B.sub.2 translates the value IDs that comprise the reduction data 102a, 102b from respective partitions A.sub.1, A.sub.2, producing a set of translated value IDs. Referring to
(60) Step 308—Generate Global B Lists
(61) Each partition B.sub.1, B.sub.2 generates a globalized list that identifies one or more rows stored in the partition. Each row that is identified in the globalized list contains an actual value of the specified attribute which also occurs in one of the partitions A.sub.1, A.sub.2. Referring to
(62) In an embodiment, for each row in the index table 1102: (1) if the value ID appears in column 714 of the translation matrix 702, then (2) copy the corresponding global ID from column 716 into a globalized list 106a for partition B.sub.1, and (3) copy the corresponding Doc ID from the index table for all instances of the value ID. For example, value ID 2 appears in column 714 of the translation matrix 702. Two instances of the value ID 2 appear in the index table 1102, and the corresponding Doc IDs are 2 and 4. Accordingly, 2 and 4 are recorded in the globalized list 106a. The global ID corresponding to value ID 2 is 1, and so 1 is recorded in the globalized list 106a next to Doc IDs 2 and 4. This is repeated for value IDs 3 and 4, which also appear in column 714 of the translation matrix 702. The completed globalized list 106a can then be communicated to a recipient server.
(63) Referring to
(64) Step 310—Generate Global A Lists
(65) Referring now to
(66) In an embodiment, using the tuple list 604 obtained (
(67) Step 312—Compile Combined Lists
(68) The recipient server will receive the globalized lists 106a, 106b from respective partitions B.sub.1, B.sub.2. A compiled B list 112 can be created by concatenating the two globalized lists 106a, 106b. This is illustrated in
(69) Step 314—Join the Compiled Lists
(70) Still referring to
(71) A comparison of the join result 122 in
(72) The above description illustrates various embodiments of the present invention along with examples of how aspects of the present invention may be implemented. The above examples and embodiments should not be deemed to be the only embodiments, and are presented to illustrate the flexibility and advantages of the present invention as defined by the following claims. Based on the above disclosure and the following claims, other arrangements, embodiments, implementations and equivalents will be evident to those skilled in the art and may be employed without departing from the spirit and scope of the invention as defined by the claims.