Database managing method, database managing system, and database tree structure
10664459 ยท 2020-05-26
Assignee
Inventors
Cpc classification
International classification
Abstract
Provided are a database (DB) managing method and system, wherein, while forming an index of a DB, a lower value and an upper value of key values of a plurality of records included in one page are stored as separators and an overlapping part of the key values is deleted from the plurality of records by using the separators to save a storage space where pages of the index is stored, and thus the performance of the DB is improved.
Claims
1. A method for managing a database (DB) in communication with a computer system, using an index having a plurality of pages, each having a plurality of records, each of the plurality of records comprising an original key value and an object or record identifier providing a pointer to a record in the database, the method comprising: for each respective page of at least one of the plurality of pages: generating, by an index manager in the computer system, a lower fence key (LFK) from a lower value of the original key values of the plurality of records included in the respective page and an upper fence key (UFK) from an upper value of the original key values of the plurality of records included in the respective page; extracting, by the index manager, a common region from each of the original key values of the plurality of records included in the respective page, as a prefix; excluding the prefix from each of the original key values of the plurality of records in the respective page except for the lower fence key (LFK) and the upper fence key (UFK), to provide remaining parts of the original key values; and storing, by the index manager, together in the respective page, the generated lower fence key (LFK) the generated upper fence key (UFK) and the provided remaining parts of the original key values, wherein the prefix is included in the stored LFK and UFK, and wherein the provided remaining parts of the original key values are stored in the respective page between the lower fence key (LFK) and the upper fence key (UFK) to provide a compressed page.
2. The DB managing method of claim 1, wherein key values obtained by excluding the prefix from the original key values of the plurality of records are stored in the plurality of records of the at least one of the plurality of compressed pages, except in where the LFK or the UFK are stored.
3. The DB managing method of claim 1, wherein each of the plurality of records is in a multi-column form comprising a plurality of parts of key values.
4. The DB managing method of claim 1, wherein each of the plurality of pages is a leaf node having a B tree or B+ tree structure.
5. The DB managing method of claim 1, further comprising restoring, by the index manager, the original key values of the plurality of records included in each of the at least one compressed page of the plurality of pages, wherein the restoring comprises: determining whether a select page is a compressed page, wherein said determining comprises determining whether the LFK and the UFK exist in the select page; if it is determined that the select page is a compressed page, comparing the LFK and the UFK to extract the prefix between the LFK and the UFK if the LFK and the UFK exist in the select page; and restoring the original key values by combining the extracted prefix and the remaining part of the original key values of a select record of the select page.
6. The DB managing method of claim 5, wherein, if the LFK and the UFK do not exist in the select page, key values stored in each record of the select page are the original key values.
7. The DB managing method of claim 1, further comprising adding, by the index manager, a new record to a select page or changing, by the index manager, a select record from the select page, wherein the adding or changing comprises: determining whether a select page is a compressed page, wherein said determining comprises determining whether the LFK and the UFK exist in the select page; if it is determined that the select page is a compressed page, comparing the LFK and the UFK to extract the prefix between the LFK and the UFK if the LFK and the UFK exist in the select page; and adding, to the select page, a remaining part of a key value of the new record obtained by excluding the prefix from the new record to be added, or changing the remaining part of the key value of the select record obtained by excluding the prefix from the select record.
8. The DB managing method of claim 7, wherein, if the LFK and the UFK do not exist in the select page, the new record is added to the select page or the select record of the select page is changed without the prefix being extracted.
9. A non-transitory computer-readable recording medium having recorded thereon a program for executing the DB managing method of claim 1.
10. A database (DB) managing system having a processor for managing a database (DB) in communication with the DB managing system, using an index having a plurality of pages, each having a plurality of records, each of the plurality of records comprising an original key value and an object or record identifier providing a pointer to a record in the database, comprising: a query analyzer that receives and analyzes a query in which a fetch request for a record included in a certain table in a database and an update request for at least one column included in the record are defined; an execution plan generator that generates an execution plan for performing the analyzed query; an execution plan execution unit that executes the execution plan by fetching the record and updating the at least one column according to the execution plan; and an index manager that comprises an index generator generating, for each respective page of at least one of the plurality of pages, a lower fence key (LFK) from a lower value of the original key values of the plurality of records included in the respective page and an upper fence key (UFK) from an upper value of the original key values of the plurality of records included in the respective page; wherein the index manager extracts a common region from each of the original key values of the plurality of records included in the at least one of the plurality of pages, as a prefix; excludes the prefix from each of the original key values of the plurality of records in the respective page except for the lower fence key (LFK) and the upper fence key (UFK), to provide remaining parts of the original key values; and stores, together in the respective page, the generated lower fence key (LFK), the generated upper fence key (UFK) and the provided remaining parts of the original key values to provide a compressed page, wherein the prefix is included in the stored LFK and UFK, and wherein the provided remaining parts of the original key values are stored in the respective compressed page between the lower fence key (LFK) and the upper fence key (UFK).
11. The DB managing system of claim 10, wherein the prefix is stored only in the LFK and the UFK.
12. The DB managing system of claim 11, wherein key values obtained by excluding the prefix from the original key values of the plurality of records are stored in the plurality of records of the at least one of the compressed pages, except in where the LFK or the UFK is stored.
13. The DB managing system of claim 10, wherein the index manager further comprises a record restorer that restores the original key values from the plurality of records included in each of the at least one of the compressed pages of the index.
14. The DB managing system of claim 13, wherein the record restorer determines whether a select page is a compressed page, wherein said determining comprises determining whether the LFK and the UFK exist in a select page, if the LFK and the UFK exist in the select page, compares the LFK and the UFK to extract the prefix, and restores the original key values by combining the extracted prefix and the remaining part of the original key values of a select record of the select page.
15. The DB managing system of claim 10, wherein the index manager further comprises a record updater that adds a new record to a select page of the index or changes a select record from the select page.
16. The DB managing system of claim 15, wherein the record updater determines whether a select page is a compressed page, wherein said determining comprises determining the LFK and the UFK exist in the select page, compares the LFK and the UFK to extract the prefix if the LFK and the UFK exist in the relevant page, and adds, to the select page, a remaining part of the key value of the new record obtained by excluding the prefix from the new record to be added, or changes the remaining part of the key value of the select record obtained by excluding the prefix from the select record.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) These and/or other aspects will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
DETAILED DESCRIPTION
(11) Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. In this regard, the present embodiments may have different forms and should not be construed as being limited to the descriptions set forth herein. Accordingly, the embodiments are merely described below, by referring to the figures, to explain aspects of the present description.
(12) Hereinafter, one or more embodiments of the present invention will be described in detail with reference to accompanying drawings, to be easily executed by one of ordinary skill in the art.
(13)
(14) First, various types of data are stored in a DB 110 in a form of a table, and as described above, each table includes at least one record, and each record includes at least one column. For example, when a DB stores bulletins on a predetermined bulletin board, a table denotes a group of bulletins, and a record denotes each bulletin, and a column denotes an area where a bulletin identifier, a bulletin writer, or a bulletin click-count is stored. In
(15) The DB managing system 100 is connected to the DB 110 to systematically manage the DB 110, such as update or delete data recorded in the DB 110 or add data to the DB 110, and largely includes a query analyzer 120, an execution plan generator 130, and an execution plan execution unit 140. Also, the DB managing system 100 may further include an entry manager 150 and an index manager 160.
(16) The query analyzer 120 receives a query for processing data stored in the DB 110 from any one of various external servers (not shown) or a manager terminal (not shown) interworking with the DB managing system 100, and analyzes the received query. Such a query analyzer 120 may include a query receiver and a parser, and may further include a validity verifier.
(17) The execution plan generator 130 generates an execution plan for fetching a requested record and updating a column included in the requested record based on a parse tree determined to be valid by the validity verifier of the query analyzer 120, and stores the generated execution plan in a memory 170 that is described in detail later. Here, the execution plan denotes a data structure including a method of fetching a record from a certain table, a result record list, and information about whether to perform an adding operation on a column requested to be updated.
(18) According to an embodiment, the execution plan generator 130 may select any one of a sequential scanning method and an index scanning method to fetch a requested record from a certain table. Here, a sequential scanning method fetches a record having an identifier of a requested record while sequentially scanning records included in a certain table, and the index scanning method fetches a requested record by only scanning a relevant index since an index is generated according to identifiers of records. An index of the DB managing system 100 will be described in detail later.
(19) The execution plan execution unit 140 fetches the requested record from the certain table according to the execution plan generated by the execution plan generator 130, and updates a column value of the column requested to be updated by performing an adding operation on a column value recorded in a column on a record corresponding to a physical location of the column requested to be updated. In detail, the execution plan execution unit 140 generates a transaction for executing the execution plan generated by the execution plan generator 130 to process the generated execution plan during the transaction. Here, a transaction indicates one logical working unit, and is defined by using at least one structured query language (SQL) statement. By using such a transaction, data consistency and data concurrency may be guaranteed.
(20) The DB managing system 100 may further include the entry manager 150 that generates or deletes an entry including an identifier of a record and an identifier of a column requested to be updated, and stores the generated entry in the memory 170 by matching the generated entry to a column value corresponding to a column identifier included in an entry), wherein the column value of the column requested to be updated may be matched to the entry generated by the entry manager 150, and stored in the memory 170.
(21) Meanwhile, the DB managing system 100 may further include the index manager 160 that generates or deletes an index, and stores the generated index in the memory 170. The index manager 160 may include an index generator 161, a record restorer 163, and a record updater 165.
(22) The index generator 161 generates an index regarding the DB 110, and at this time, a lower value of key values in each page of the index is stored as a lower fence key (LFK) and an upper value of the key values is stored as an upper fence key (UFK). Also, the index generator 161 extracts, as a prefix, a common region of key values of a plurality of records forming each of the pages. Then, the index generator 161 deletes a part corresponding to the prefix from the key values of the plurality of records included each of the pages, and then stores the key values in the index.
(23) The record restorer 163 restores original key values from the record included in each page of the index. In detail, the record restorer 163 determines whether the LFK and the UFK exist in each page, and if the LFK and the UFK exist, extracts the prefix by comparing the LFK and the UFK and restores the original key values by combining the extracted prefix and the key values of the record.
(24) The record updater 165 adds a new record to each page of the index or changes a pre-existing record. In detail, the record updater 165 determines whether the LFK and the UFK exist in each page, and if the LFK and the UFK exist, extracts the prefix by comparing the LFK and the UFK, and adds, to a record of a relevant page, remaining data obtained by excluding the prefix from a record to be added or change the record of the relevant page to remaining data obtained by excluding the prefix from a record to replace the record of the relevant page.
(25) The DB managing system 100 may be implemented in a dedicated computing or processing device or made part of an existing computing or processing device. The different components of the DB managing system 100 described above may be separate programs or functions that are executed by the computing or processing devices that implements the DB managing system.
(26) The transmitting unit 52, receiving unit 54, memory unit 56, and processing unit 58 are connected to one another via the data bus 59, and send data to and/or receive data from one another using the data bus 59.
(27) The transmitting unit 52 is a device that includes hardware and any necessary software for transmitting signals including, for example, control signals or data signals via one or more wired and/or wireless connections to other network element.
(28) The receiving unit 54 is a device that includes hardware and any necessary software for receiving wireless signals including, for example, control signals or data signals via one or more wired and/or wireless connections to other network elements.
(29) The memory unit 56 may be any device capable of storing data including magnetic storage, flash storage, etc.
(30) The processing unit 58 may be any device capable of processing data including, for example, a processor. The term processor, as used herein, refers to, for example, a hardware-implemented data processing device having circuitry that is physically structured to execute desired operations including, for example, operations represented as code and/or instructions included in a program. Examples of the above-referenced hardware-implemented data processing device include, but are not limited to, a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA).
(31) Hereinafter, a DB managing method using a separator-based index compression method will be described in detail.
(32)
(33) Referring to
(34) The DB managing method will now be described in detail.
(35) An index indicates a data structure that increases an operation speed regarding a table in DB fields. The index may be generated by using one column (a single column index) or several columns (multi-column index) in the table, and provides a base for not only a high speed search operation but also an efficient ordering operation with regard to accessing a record. A disk space required to store the index is generally smaller than that required to store the table because the index generally has a key and a field and does not have other detailed items of the table.
(36) A B tree (or a B+ tree) is a type of a tree data structure widely used in a DB and a file system to form such an index, and is an associative mapping data structure for quickly inquiring into a record having a certain key value. Since data is recorded on a high capacity disk that is slowly accessed, the B tree (or the B+ tree) has a tree structure in page units to reduce numbers of input/output (I/O). Key values and locations (object identifiers or record identifiers) of actual records including the key values as attributes are recorded in one page in an order of the key values. In other words, an index record (also referred to as a record herein) is formed by combining {key-OID}.
(37) Due to such characteristics, adjacent key values of a record in one page may be considerably similar. For example, when an identification (ID) number, a mail folder number, and a mail serial number are assigned as index keys for classifying mails in a mail system of a company (i.e., a multi-column index), and an employee has total 100,000 mails and 10,000 mails in one mail folder, the 10,000 mails have the same ID number and the same mail folder number and the 100,000 mails have the same ID number in an index. If 1,000 mails are recorded in one page of a general B tree according to a general method, the same ID number and the same mail folder number are repeatedly stored in about 10 pages. Accordingly, since same values are repeatedly stored, a memory is unnecessarily wasted.
(38) Accordingly, a DB managing method and system, which use a separator-based index compression method, according to one or more embodiments of the present invention save a memory space by storing a prefix that is repeatedly stored from among key values of one page, in a lower fence key (LFK) and an upper fence key (UFK) that are separators.
(39) Referring back to
(40) Operation S100 will now be described in more detail.
(41) Referring to
(42) Here, the root node of the B tree index of
(43) As described above, the DB managing method according to an embodiment of the present invention stores a lower value of key values in each page as an LFK and an upper value of the key values in each page as an UFK. Here, in
(44) Also, the DB managing method according to an embodiment of the present invention extracts a common region from key values of a plurality of records between the LFK and the UFK in each page as a prefix, deletes the prefix from the key values of the plurality of records except for the LFK and the UFK, and stores the remaining key values, thereby preventing repeated storage of data to save a memory space.
(45) In other words, the first separation key value P1 is an UFK UFK1 of the first page and the same time, is an LFK LFK2 of the second page. Similarly, the second separation key value P2 is an UFK UFK2 of the second page and at the same time, is an LFK LFK3 of the third page. Similarly, the third separation key value P3 is an UFK UKF3 of the third page and at the same time, is an LFK LFK4 of the fourth page. Here, an LFK cannot be set for the leftmost leaf node (the first page) of the four leaf nodes, and thus a prefix is not extracted from the first page. Similarly, an UFK cannot be set for the rightmost leaf node (the fourth page) of the four leaf nodes, and thus a prefix is not extracted from the fourth page. In
(46) For more detailed description,
(47) Referring to
(48) The restoring of a record during the DB managing method according to an embodiment of the present invention will now be described in detail. Referring back to
(49) First, a page having key values to be restored is found via a binary search, and it is determined whether an LFK and an UFK exist in the page. If any of the LFK and the UFK does not exist in the page, the page is not compressed, i.e., overlapping data is not deleted by using a prefix, and thus it is determined that key values stored in each record of the page are original key values (operation S240). Meanwhile, if both the LFK and the UFK exist in the page, it is determined that the data is compressed by using a prefix in the page, and thus a predetermined restoration routine is performed. In other words, the LFK that is a lower value of the page and the UFK that is an upper value of the page are compared to extract a common region of the LFK and the UFK, i.e., a prefix. In
(50) The adding or changing of a record during the DB managing method according to an embodiment of the present invention will now be described in detail. Referring back to
(51) First, a page having key values to be added or changed is found via a binary search, and it is determined whether an LFK and an UFK exist in the page. If any of the LFK and the UFK does not exist in the page, the page is not compressed, i.e., overlapping data is not deleted by using a prefix, and thus a new key value is added to a key value of a relevant record or the key value of the relevant record is changed without any separate process (operation S340). Meanwhile, if both the LFK and the UFK exist in the page, it is determined that the data is compressed by using a prefix in the page, and thus a predetermined disassembling routine is performed. In other words, the LFK that is a lower value of the page and the UFK that is an upper value of the page are compared to extract a common region of the LFK and the UFK, i.e., a prefix. In
(52) A process of splitting a page or merging pages during the DB managing method according to an embodiment of the present invention will now be described.
(53)
(54) A root node of the B tree index of
(55) Here, let's assume that there is no more storage space in the second page and thus the second page needs to be split based on a predetermined value S between the first and second separation key values P1 and P2.
(56) In this case, as shown in
(57) Meanwhile, a third page that is newly generated in
(58) Lastly, in the third page of
(59) Meanwhile, although not shown in
(60) Also, although not shown in
(61) Table 1 below is obtained by comparing a total page count of an index when a general DB managing method and system are applied and a total page count of an index when a DB managing method and system according to one or more embodiments of the present invention are applied. Also, Table 2 below is obtained by comparing an average key length when a general DB managing method and system are applied and an average key length when a DB managing method and system according to one or more embodiments of the present invention are applied.
(62) TABLE-US-00001 TABLE 1 Total Page Reduction Rate of Page Count of Count of Index when Present Name of Index General Index Invention is Applied (%) PK (primary key) 4,120,635 21 Index 1 5,740,191 23 Index 2 6,056,621 18 Index 3 6,849,043 19 Index 4 5,793,081 15
(63) TABLE-US-00002 TABLE 2 General Reduction Rate of Average Average Key Key Length when Present Name of Index Length (bytes) Invention is Applied (%) PK (primary key) 18 39 Index 1 30 33 Index 2 32 28 Index 3 38 29 Index 4 30 27
(64) As shown in Tables 1 and 2, the total page count is reduced by about 19% and the average key length is reduced by about 31% when the DB managing method and system according to one or more embodiments of the present invention are applied compared to when the general DB managing method and system are applied. In other words, a space storage may be saved according to one or more embodiments of the present invention, and thus the performance of a DB may be improved.
(65) In the DB managing method according to an embodiment of the present invention, a key value used as a separator of a B tree page in the B tree page is set as a {virtual key-OID} record and added to each of fence keys (an LFK and an UFK) at two ends of the B tree page, and thus a combining process of obtaining original key values based on the LFK and the UFK and a disassembling process of deleting a prefix may be quickly performed.
(66) If an LFK or an UFK does not exist in a page, a record may be stored by using a general method, and thus an index may include both a compressed page (a page wherein a prefix is deleted from each record) and an uncompressed page (a page wherein a prefix is not deleted from each record). Also, a compressed record and an uncompressed record may co-exist in a page having an LFK and an UFK. Accordingly, compression may be dynamically set in run-time regardless of a current state of a B tree. Accordingly, a DB operation efficiency may be improved by adjusting compression to be not performed when an insert/delete load of a certain region is increased.
(67) Meanwhile, since compression using a prefix is determined by using an LKF and an UFK, metadata about a secondary compression method and a range may not be additionally recorded. Accordingly, as a number of records stored in a page increases, a compression efficiency is increased compared to a general method of including meta information to a compressed record.
(68) Since only key values without a prefix are stored according to an embodiment of the present invention, more records may be stored in one page of B tree. Accordingly, since a storage space may be saved, it is highly likely that the storage space is included in a buffer cache of a main memory, and thus the performance of a DB may be increased. Moreover, compared to a general compression method, in the DB managing method according to an embodiment of the present invention, splitting a page and restoring a record are easily performed, lower compatibility is satisfactory since a storage structure is not changed, and a structure of an index is simple since only a flag indicating an LFK and an UFK is added and no other metadata is additionally recorded.
(69) In addition, by using the DB managing method according to an embodiment of the present invention, a validity check may be performed on a leaf node by using an LFK and an UFK of each page whenever a tree structure of an index is traversed, and thus an error of an index structure may be easily determined. Also, since a connection relationship between leaf nodes is determined by using an LFK and an UFK, a link may be removed from the leaf nodes, and thus the performance and efficiency of a structure modification operation (SMO) may be increased.
(70) As described above, according to the one or more of the above embodiments of the present invention, a storage space where pages of an index are stored may be saved, and thus the performance of a DB may be increased.
(71) Also, according to one or more embodiments of the present invention, compression may be set in run-time, and thus a DB operation efficiency may be increased by adjusting compression not to be performed when an insert/delete load in a creation region is increased.
(72) Also, according to one or more embodiments of the present invention, metadata about a secondary compression method and a range may not be additionally recorded, and thus a compression efficiency is increased as a number of records stored in a page increases.
(73) Also, according to one or more embodiments of the present invention, a validity check is performed on a leaf node by using an LFK and an UFK of each page whenever a tree structure of an index is traversed, and thus an error of an index structure may be easily determined.
(74) The DB managing method described above may be written as a program that is executable by using any one of various computers. Here, the program for performing the DB managing method may be stored in a computer-readable recording medium, such as a hard disk, a CD-ROM, a DVD, a ROM, a RAM, or a flash memory.
(75) While one or more embodiments of the present invention have been described with reference to the figures, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims.