Detecting and processing cache hits for queries with aggregates
09563662 ยท 2017-02-07
Assignee
Inventors
- Donovan Alfred Schneider (San Francisco, CA, US)
- Edward Shaw-Lee Suen (San Francisco, CA, US)
- Kazi Atif-Uz Zaman (Belmont, CA, US)
Cpc classification
Y10S707/99936
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
Y10S707/99932
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
Y10S707/99935
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
Y10S707/99934
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
Y10S707/99933
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
International classification
Abstract
Techniques to improve query caching performance by efficiently selecting queries stored in a cache for evaluation and increasing the cache hit rate by allowing for inexact matches. A list of candidate queries stored in the cache that potentially could be used to answer a new query is first determined. This list may include all cached queries, cached queries containing exact matches for select list items, or cached queries containing exact and/or inexact matches. Each of at least one candidate query is then evaluated to determine whether or not there is a cache hit, which indicates that the candidate query could be used to answer the new query. The evaluation is performed using a set of rules that allows for inexact matches of aggregates, if any, in the new query. A query plan is generated for the new query based on a specific candidate query with a cache hit.
Claims
1. A non-transitory computer-usable storage medium storing instructions executable by a processor, said instructions comprising: a first set of instructions configured to maintain a select list index of a plurality of cached queries, wherein said select list index comprises key field entries and data item field entries, said select list index stores one or more unique items, each of said one or more unique items is extracted from a clause of a statement of at least one cached query of said plurality of cached queries, said plurality of cached queries are stored in a cache, each key field entry stores a unique item of said one or more unique items, said each key field entry is associated with a corresponding data item field entry of said data item field entries, said corresponding data item field entry is configured to store information identifying said at least one cached query that comprises said unique item stored in said each key field entry, and at least one of said one or more unique items comprises a precomputed aggregate from a previously cached query; and a second set of instructions configured to determine a list of candidate queries configured to be used to answer a new query that includes at least a first unique item and a second unique item, wherein the list of candidate queries is determined based on an intersection of a first set of cached queries that include the first unique item and a second set of cached queries that include the second unique item.
2. The non-transitory computer-usable storage medium of claim 1, wherein said list of candidate queries comprises a subset of said plurality of cached queries, said new query comprises a new query aggregate, said new query uses a cross-section of results of previously computed non-identical queries, said new query aggregate does not exactly match any of said unique items in said select list index, said aggregate in said select list is evaluated to be usable for performing aggregate roll-up for said new query aggregate, and said at least one cached query that comprises said aggregate is identified as a candidate query to potentially be used to answer said new query.
3. The non-transitory computer-usable storage medium of claim 1, wherein said one or more unique items further comprise a first unique item, wherein said first unique item is an expression comprising a mathematical operator.
4. A non-transitory computer-usable storage medium storing instructions executable by a processor, said instructions comprising: a first set of instructions configured to maintain an index of a plurality of aggregates, wherein said index comprises key field entries and data item field entries, said index stores one or more unique items comprising a precomputed aggregate from a previously cached query, each of said one or more unique items is aggregated upon by one or more aggregates extracted from a clause of a statement of at least one cached query of a plurality of cached queries, said plurality of cached queries are stored in a cache, and each key field entry stores a unique item of said one or more unique items, said each key field entry is associated with a corresponding data item field entry of said data item field entries, said corresponding data item field entry is configured to store information identifying said one or more aggregates that operate upon said unique item, wherein said one or more aggregates are drawn from a select list index; and a second set of instructions configured to access said index and rewrite a selected aggregate in a new query, said second set of instructions is configured to rewrite said selected aggregate only if an exact match for said selected aggregate cannot be found in said select list index.
5. The non-transitory computer-usable storage medium of claim 4, wherein said one or more unique items further comprise a first unique item, said first unique item is a first subexpression generated by a decomposition of an expression.
6. The non-transitory computer-usable storage medium of claim 4, wherein said one or more aggregates in said data item field comprise only distributive aggregates.
7. The non-transitory computer-usable storage medium of claim 5, wherein, said decomposition comprises storing a first portion of said expression in said index, and said decomposition further comprises excluding from said first portion a mathematical operator and a constant.
8. The non-transitory computer-usable storage medium of claim 1, wherein, said one or more unique items further comprise a first unique item and a second unique item, said first unique item is a first subexpression generated by a decomposition of an expression, said second unique item is a second subexpression generated by said decomposition of said expression, said decomposition comprises storing as said first unique item a first portion of said expression in said select list index, said decomposition further comprises storing as said second unique item a second portion of said expression in said select list index, and said first portion and said second portion are distinct.
9. The non-transitory computer-usable storage medium of claim 8, wherein, said new query uses a cross-section of results of previously computed non-identical queries, said new query comprises an expression comprising a mathematical operator, said first unique item, and said second unique item, and said list of candidate queries comprises an intersection of cached queries for said first unique item and said second unique item.
10. The non-transitory computer-usable storage medium of claim 1, wherein said select list index is revised to reflect additions to and deletions from said cache.
11. The non-transitory computer-usable storage medium of claim 4, wherein said select list index comprises one entry for each of said one or more unique items, and each entry corresponds to a set of cached queries that comprise said each unique item.
12. The non-transitory computer-usable storage medium of claim 4, wherein said index is revised to reflect additions to and deletions from said cache.
13. The non-transitory computer-usable storage medium of claim 4, wherein said select list index is revised to reflect said additions to and said deletions from said cache.
14. The non-transitory computer-usable storage medium of claim 4, wherein said instructions configured to rewrite said selected aggregate are limited to rolling up results of said one or more aggregates.
15. The non-transitory computer-usable storage medium of claim 4, wherein said instructions configured to rewrite said selected aggregate further comprise a third set of instructions configured to perform a rewrite evaluation of said at least one cached query that comprises said at least one aggregate, and said rewrite evaluation comprises a check to ensure that all columns associated with said selected aggregate are present in said at least one cached query.
16. The non-transitory computer-usable storage medium of claim 4, further comprising a third set of instructions configured to ensure that all filters of said are compatible with said selected aggregate.
17. The non-transitory computer-usable storage medium of claim 16, wherein said filters comprise a where clause.
18. The non-transitory computer-usable storage medium of claim 16, wherein said filters comprise a having clause.
19. The non-transitory computer-usable storage medium of claim 4, wherein said selected aggregate operates on a new query item, said new query item matches at least one unique item in said index, at least one aggregate of said one or more aggregates that operate upon said one unique item is evaluated to be usable for performing aggregate rewrite for said selected aggregate, and said selected aggregate in said new query is rewritten in terms of said at least one aggregate of said one or more aggregates.
20. The non-transitory computer-usable storage medium of claim 2, wherein said at least one cached query identified as said candidate query is part of a first set of potential candidate queries, said new query comprises a new query item that matches one of said unique items in said select list index, a second subset of cached queries comprising said one unique item are identified as a second set of potential candidate queries, and said list is determined by intersecting said first set of potential candidate queries with said second set of potential candidate queries.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
DESCRIPTION OF THE SPECIFIC EMBODIMENTS
(10)
(11) Depending on its design, Web/application server 120 may support various software applications (e.g., analytics), with only one application 122 being shown in
(12) Query server 130 receives logical query requests (or queries) from Web/application server 120 , processes each query, and provides the results for the queries back to the Web/application server. Query server 130 performs all of the query processing, including sending SQL requests to the databases. In the embodiment shown in
(13) Databases 150 store data that may be accessed by server 130 as needed. A metadata store 160 stores information about the application environment, data modeling, aggregate navigation, caching, and so on. This information is used by server 130 to perform various processing. A cache 170 provides storage for results for prior queries and is implemented with a storage unit (e.g., a disk drive or a memory unit) that can be accessed more readily and quickly than databases 150.
(14)
(15) The processing for a new query may proceed as follows. The query is first provided to SQL generation engine 134, which translates the logical request for this query into an SQL statement. Query processing engine 140 then determines whether or not this query may be answered using a query that is stored in cache 170. If the answer is yes, then the query request is run against the cached query. Otherwise, query processing engine 140 translates the SQL statement into physical queries against one or more databases 150. In either case, computation engine 136 performs post processing of the intermediate results from the databases or the cache to provide the desired results.
(16) A logical request may be issued by an end-user or some other entity to query objects in one or more databases to obtain specific results of interest. The logical request may be formulated using a SELECT statement, which is the basis for querying a structured query language (SQL) database. The SELECT statement for a query, which is also referred to as a query specification, has the following basic syntax:
(17) TABLE-US-00001 SELECT [DISTINCT] select_list FROM from_clause [WHERE search_condition] [GROUP BY column {, column} [HAVING search_condition]] [ORDER BY column {, column}]
where select_list is a list of items to be provided as results for the query; from_clause is a list of tables containing data being accessed by the query, and may optionally include certain join information for the query; search_condition may specify any combination of conditions or restrictions to form a conditional test; and column is a column (or alias) belonging to a table defined in a database.
In the above syntax for the SELECT statement, [ ] and { } each indicate zero or one instance or occurrence of the item within the bracket. Thus, the items within the [ ] and { } brackets are optional.
(18) Queries are normally run against one or more databases, each of which may include one or more tables. Each table is uniquely identified by a table name and typically includes a number of columns that contain the data for that table. Each column of a table is also uniquely identified by a column name. However, columns of different tables may have the same name, in which case they may be uniquely identified by the combination of table and column names (e.g., TableX.ColumnY).
(19) A query may ask for data in raw form, such as the data in a column of a table. A query may also ask for data to be aggregated in some defined manner. For example, the query may ask for the sum of all revenues for each product. The level of aggregation may be specified in the query (e.g., sum of revenues for each product, for each product and in each year, for each year, and so on). Aggregates are described in further detail below.
(20) As shown above, the SELECT statement for a query includes a number of clauses, some of which are optional. The FROM clause includes a list of tables from which the results of the query may be obtained.
(21) The SELECT clause includes a list of one or more items to be provided as the results of the query. Each select list item may be (1) a column in a table identified by the FROM clause (e.g., revenue), (2) an aggregate, which is typically a particular operation or function to be performed on a specified column (e.g., sum(revenue)), or (3) an expression, which is an operation to be performed on each entry in one or more specified columns (e.g., revenue+5). One column is provided in a results table for the query for each item in the select list. Thus, the SELECT clause defines the format of the results table for the query.
(22) The WHERE clause includes one or more conditions or restrictions used to limit the query results. Thus, only data that satisfies the specified conditions is included in the results table. The conditions in the WHERE clause may be viewed as filters for the data obtained by the SELECT clause.
(23) The GROUP BY clause includes one or more columns by which aggregates, if any, in the SELECT clause are computed. The aggregation is performed such that one aggregated value is provided for each unique combination of the elements in the column(s) specified in the GROUP BY clause. For example, if the SELECT clause includes an aggregate sum(revenue) and the GROUP BY clause includes product, then the sum of revenue is computed for each unique product. The level of aggregation is thus specified by the column(s) in the GROUP BY clause.
(24) The HAVING clause includes one or more conditions or restrictions for aggregates. The conditions in the HAVING clause may be viewed as filters for the data obtained by aggregates in the SELECT clause.
(25) The ORDER BY clause is used to arrange the order of the columns in the output table.
(26) As an example, a SELECT statement for a first query may be written as follows:
(27) TABLE-US-00002 select year, product, sum(revenue) from time, products, facts where year > 1997 group by year
The results of this query may be as follows:
(28) TABLE-US-00003 YEAR PRODUCT SUM(REVENUE) 1998 Coke 1100 1998 Pepsi 1100 1999 Coke 1150 1999 Pepsi 1150 2000 Coke 1400 2000 Pepsi 1400
This query includes one aggregate: sum(revenue), and the GROUP BY clause includes one column: year. Thus, the values in the revenue column are summed for all products for each year.
(29) In one specific SQL instantiation, if the GROUP BY clause is not included in the SELECT statement, then the level of aggregation is determined by the non-aggregate columns in the SELECT clause. For example, if the SELECT statement for the first query did not include the GROUP BY clause, then the level of aggregation would be product and year (i.e., the aggregation is performed for each product of each year).
(30) The SELECT statement for a second query may be written as follows:
(31) TABLE-US-00004 select year, product, sum(revenue) from time, products, facts where year > 1997 group by product, year
The results of this query may be as follows:
(32) TABLE-US-00005 YEAR PRODUCT SUM(REVENUE) 1998 Coke 500 1998 Pepsi 600 1999 Coke 600 1999 Pepsi 550 2000 Coke 800 2000 Pepsi 600
This query is the same as the first query and includes one aggregate: sum(revenue). However, the GROUP BY clause includes two columns: product and year. Thus, the values in the revenue column are summed for each product of each year.
(33) The SELECT statement for a third query may be written as follows:
(34) TABLE-US-00006 select year, product, sum(revenue) from time, products, facts where year > 1997 group by product, year having sum(revenue) > 800
This query is similar to the second query, except that only the results for the product/year combinations with the sum of revenue exceeding 800 are provided in the results table.
(35) Different instantiations of SQL may support different features not present in the standard SQL. One specific extended SQL instantiation is described below having various capabilities not supported by standard SQL. This extended SQL has the ability to perform functional aggregates such as rank, median, topn, bottomn, and running/moving aggregates. This extended SQL may further have the ability to, for regular aggregates such as sum, output aggregates with multiple levels of aggregation in a single row of the results table. For example, the extended SQL would allow for the following items in the select list: country, region, sum(sales by country, region), and sum(sales by country). Standard SQL does not have the ability to support the previous select list. This extended SQL also allows users to place conditions in the WHERE and HAVING clauses interchangeably. The query compiler would then determine the appropriate place for each clause. In much of the following description, the extended SQL is assumed to be used and all conditions are deemed to be placed in the WHERE clause.
(36) Extended SQL also supports another syntax in addition to the one used for standard SQL. For example, the SELECT statement for the first query described above may also be written as: select year, product, sum(revenue by year)
where the GROUP BY clause is effectively included in the aggregate.
(37) Referring back to
(38) The above examples show the use of aggregates in queries. One fast way to process a query is to skip the bulk of the processing and use precomputed results. Aggregate tables are examples of such and contain precomputed results for a particular aggregation level. For example, an aggregate table may store sales results for each product by month, when the granularity of detail for the database may be at the day level. To create this aggregate table, a process (often a query) computes the results and then stores them in a table in the database.
(39) As noted above, the processing of queries may require a long time and further consume large amounts of resources. Query caching may be used to store the results of prior queries in a cache so that these results may thereafter be used to obtain results for new queries. If a new query can be answered using the stored results (a cache hit), then the cost of evaluating this query is considerably cheaper than having to pose the query anew against one or more databases (a cache miss). The throughput and performance of the system would be determined by the cache hit rate.
(40) A query caching system typically includes a number of components, two of which are cache hit detection and query plan generation. The cache hit detection component determines whether or not a given new query can be answered by any one of the cached queries. If there is a cache hit, then the query plan generation component generates a query plan for the new query in terms of a selected cached query with the cache hit. The query plan will then be processed to obtain the results for the new query based on the results for the selected cached query.
Cache Hit Detection
(41) A major design tradeoff for a query caching system is the cost incurred for processing each new query to detect a cache hit versus the benefits of reduced response time and resources usage whenever a cached query is used to answer the new query. On one extreme, a cache hit detection scheme may require an exact match between a new query and a cached query to be declared a cache hit. An exact match would occur if the elements in each clause of the SELECT statements for the two queries are identical, including the WHERE clause conditions. To determine exact match, the expression tree for the SELECT statement for each cached query may be viewed as a key. An index structure populated with the keys for all cached queries may then be compared with the key for the new query to determine whether or not there is an exact match. Exact match is relatively simple to evaluate but is also the most restrictive, and cache misses may be erroneously declared in instances where cached queries may actually be used to answer the new query.
(42) A cache hit detection scheme that can identify cache hits based on an inexact match between the new query and cached queries requires more processing but may be able to provide improved performance by increasing the cache hit rate. An inexact match cache hit may be determined by evaluating the new query against a cached query based on the semantics of queries. For a given query, the semantics specify both the operations to be performed for that query and the order for performing these operations to obtain the results. After evaluating the new query against the cached query, it may be possible to ascertain whether the results for the new query could be obtained from the results for the cached query.
(43) The computational semantics for a query may be expressed as follows: 1) Evaluate the FROM clause to determine the data sources (perform any joins). 2) Retrieve the columns identified in the SELECT clause. 3) Filter the rows of the retrieved columns in accordance with the WHERE clause. 4) Divide the results of the filtering into groups specified in the GROUP BY clause (the GROUP BY clause may be implicit in the SELECT statement). This requires evaluation of grouping columns in the select list. 5) Process all aggregations in the select list (including regular aggregations and functional aggregations, which are described below). 6) Retain all groups from step 5 that satisfy the HAVING clause. (This also includes WHERE clause filters that reference a regular or functional aggregation.) 7) Remove duplicates if DISTINCT is specified in the SELECT clause. 8) Order the results from step 7 in a results table according to the ORDER BY clause.
(44) Whether or not the new query can be answered by a given cached query may be ascertained by performing steps 1 through 7 described above for the new query and the cached query. As illustrated above, the processing to determine inexact match may be complicated.
(45) The cache may be designed to store a large number of queries. In that case, another important design consideration is the selection of queries stored in the cache to evaluate against the new query to detect for cache hits. If the processing to evaluate a cached query for cache hit is simple, as is the case for exact match, then the new query may be evaluated against each cached query, one at a time, until a cache hit is declared. This is often referred to as the nave strategy. However, if the processing to evaluate a cached query for cache hit is more complicated, as may be the case for inexact match, and if the number of queries stored in the cache is large, then the costs of traversing through all cached queries to search for a cache hit may be excessive.
(46) Techniques are provided herein to efficiently determine whether or not a new query may be answered by a cached query. These techniques improve query caching performance by (1) selecting cached queries for evaluation in an efficient manner, and (2) increasing the cache hit rate by allowing for inexact matches. These techniques may be advantageously used in query caching systems that store a large number of queries in the cache.
(47)
(48) Initially, a new query is received for processing (step 208). A first list of candidate cached queries that potentially may be used to answer the new query is then determined based on items in the select list (step 212). If the cache stores a large number of cached queries, then a large amount of resources may be required to evaluate each cached query to determine whether or not there is a cache hit. Techniques are provided herein to efficiently compare the new query against the cached queries to trim down the number of candidate cached queries for evaluation. The first list may be determined as described in further detail below.
(49) If the first list is not empty, as determined in step 214, then the new query is evaluated against each candidate cached query in the first list, one at a time, to determine whether or not the new query can be answered by the cached query (step 216). This evaluation may be performed as described in further detail below. If there is a cache hit for any cached query, as determined in step 218, then a cache hit is declared (step 230) and a query plan is generated for the new query based on the cached query with the cache hit (step 232). Multiple cache hits may be possible for the candidate cached queries in the first list. However, the evaluation in step 216 may stop after the first cache hit has been detected. The process then terminates.
(50) If the first list is empty, as determined in step 214, or if no cache hit is achieved for any of the candidate cached queries in the first list, as determined in step 218, then a second list of candidate cached queries is determined by rewriting certain aggregates, if any, included in the new query (step 222). The new query may include one or more aggregates that may not be exactly like the ones in the cached queries. Techniques are provided herein to efficiently determine whether or not each aggregate in the new query may be answered by aggregates in the cached queries even if these aggregates are not exactly alike, as described in further detail below.
(51) If the second list is not empty, as determined in step 224, then the new query is evaluated against each candidate cached query in the second list, one at a time, to determine whether or not the new query can be answered by the cached query (step 226). If there is a cache hit for any of the cached queries, as determined in step 228, then a cache hit is declared (step 230) and a query plan is generated for the new query based on a selected cached query with the cache hit (step 232). Otherwise, if the second list is empty, as determined in step 224, or if no cache hit is achieved for any of the candidate cached queries in the second list, as determined in step 228, then a cache miss is declared (step 240). In this case, the new query is run against the databases (step 242) and the new query and its results are stored in the cache for possible use by future queries (step 244). After steps 232 and 244, the process terminates.
(52) As shown in
(53) As also shown in
(54) Numerous modifications may be made to the process shown in
Candidate List formed by Select List Item Matching
(55) In an aspect, the first list of candidate cached queries to evaluate for cache hit is determined based on items in the select list. As shown above, the SELECT clause for each query includes a list of one or more items, where each item may be column, an aggregate, or an expression. A cached query may be included in the first candidate list if each item in the select list of the new query has an exact match in the select list of the cached query.
(56)
(57) In the embodiment shown in
(58) For each entry in the select list index, field 312 stores a specific select list item and field 314 stores a pointer to each results table whereby the associated query includes this select list item. For example, since employeeid is included in the SELECT clause of cached queries 1, 4, and 5, the data item for employeeid includes pointers to the results tables for queries 1, 4, and 5. Thus, each unique select list item is associated with a data item that is the set of cached queries containing that particular item.
(59) In one embodiment, whole expressions are stored in the select list index. For this embodiment, an expression such as productid+5 would be stored in its entirety in the table.
(60) In another embodiment, subexpressions are stored in the select list index. Each expression may be decomposed into subexpressions, and each subexpression may then be stored in the select list index. For the above example, the expression productid+5 can be decomposed into productid and 5, and productid may be stored in the table. Productid would then be deemed an inexact match for expressions such as productid+5 and productid*2 since these expressions may be computed from productid by the addition of 5 and the multiplication of 3, respectively.
(61) If an expression includes multiple items stored in the select list index, the set of cached queries for this expression would be the intersection of the sets of cached queries for all items. For example, the set of cached queries for the expression productid+siteid would be the intersection of the set of cached queries containing productid and the set of cached queries containing siteid, since both items are needed to compute productid+siteid.
(62) The select list index is updated as necessary to ensure that its contents remain accurate. In particular, whenever an existing query is deleted from the cache, the select list index is updated so that each select list item that references the deleted query is revised or removed from the table. Similarly, whenever a new query is stored in the cache, each select list item included in the new cached query is revised or added to the index.
(63) The select list index in
(64)
(65) Initially, the first item in the select list of the new query is identified (step 412). The set of cached queries containing this select list item is then obtained by looking up the select list index (step 414). This can be achieved by looking up the key corresponding to this select list item and retrieving the data item associated with this key. A candidate set is then initialized to include all cached queries contained in the set just obtained from the select list index (step 416).
(66) A determination is then made whether or not the candidate set is empty (step 418). If the answer is yes, then the process proceeds to step 428. This is because any additional select list item can only reduce the candidate set but never enlarge it, and an empty candidate set at any stage would indicate that there are no possible candidate cached queries for the new query.
(67) If the candidate set is not empty, as determined in step 418, then a determination is made whether or not all select list items for the new query have been considered (step 420). If the answer is no, then the next item in the select list of the new query is identified (step 422), and the set of cached queries containing this select list item is obtained by looking up the select list index (step 424). The set of all cached queries that contains all select list items considered thus far may then be determined by intersecting the set just obtained from the select list index with the candidate set (step 426). The new candidate set would include only those cached queries, if any, that are present in both the prior candidate set and the set just obtained. The process then returns to step 418.
(68) If the candidate set is empty, as determined in step 418, or if all select list items for the new query have been considered, as determined in step 420, then the cached queries included in the candidate set are provided as the first list of candidate cached queries to be evaluated (step 428). This first list may be a null list if no cached query contains all of the select list items included in the new query. The process then terminates.
(69) The process shown in
The select list for this query includes two items: employeeid and rank(revenue). Using the select list index shown in
(70) In the above description, the first list of candidate cached queries is determined based on exact match of select list items. Items for other clauses in the SQL statement may also be stored in an index, in conjunction with or as substitute for the select list items, and this is within the scope of the invention. For example, if a sufficient number of tables or data sources are accessed, then these may also be stored in the index. For example, one entry may be provided in the index for each unique item in the FROM clause. The items in the FROM clause of the new query may then be processed in similar manner as that described above for the select list items.
Candidate List formed using Aggregate Rewrite
(71) Queries often include aggregates. Each aggregate typically denotes a particular operation (e.g., sum) to be performed on a particular data column with a particular level of aggregation, all of which are specified in the SELECT statement for the query.
(72) The set of all aggregates supported by a particular SQL instantiation may be classified into a number of different categories based on the properties of the resultant data generated by the aggregate operation or function. These categories include regular aggregates, distributive aggregates, and functional aggregates. Distributive aggregates include operations such as SUM, COUNT, MAX, and MIN. Regular aggregates are a superset of distributive aggregates (e.g., average is a regular aggregate that is not a distributive aggregate). Functional aggregates include operations such as RANK, PERCENTILE, TOPN, BOTTOMN, and MAVG. Of these categories, distributive aggregates are of special interest because the results of a distributive aggregate operation may be subjected to the same operation with a coarser level of aggregation and still produce the correct results.
(73) For example, a cached query may contain the aggregate sum(revenue by productid, year), which sums revenue for each product and in each year. A new query may then ask for the aggregate sum(revenue by year), which sums revenue for each year. In this case, the results for the new query can be obtained from the results for the cached query by performing an additional aggregation over all products for each year. Similarly, if another query asks for the aggregate sum(revenue by productid), which sums revenue for each product, then the results for this query can be obtained from the results for the cached query by performing an additional aggregation over all years for each product. The aggregate sum(revenue by year) and sum(revenue by productid) in the two new queries may be rewritten in terms of the aggregate sum(revenue by productid, year) in the cached query.
(74) The process of aggregating over a coarser level of aggregation is also referred to as aggregate roll-up. Aggregate roll-up can be validly applied for certain aggregates, such as distributive aggregates, to produce the correct results. Aggregate roll-up would produce incorrect results if applied to some other aggregates. For example, the aggregate avg(revenue by productid) could not be obtained from the aggregate avg(revenue by productid, year) unless additional information regarding revenue is available.
(75) Since distributive aggregates can be rolled up, they are good candidates for being rewritten if they appear in the new query. In one embodiment, using a nave strategy, each distributive aggregate in the new query is checked against all aggregates in all cached queries for a possible rewrite. This embodiment can provide satisfactory performance when the number of cached queries is not too large. In another embodiment, the search for aggregates in the cached queries for a possible rewrite of a distributive aggregate in the new query is made using an aggregate index.
(76)
(77) In the embodiment shown in
(78) The aggregate index is updated as necessary to ensure that its contents remain accurate. In particular, whenever an existing query is deleted from the cache or a new query is stored to the cache, the aggregate index is updated so that aggregates, if any, included in the deleted or added query are deleted from or added to the index.
(79)
(80) In the first step of the process, the set of cached queries for each matched select list item in the new query is obtained using the select list index (step 602). A matched select list item is one that is found in the select list index, and an unmatched select list item is one that is not found in the index.
(81) The set of aggregates is then obtained for each unmatched select list item in the new query using the aggregate index (step 604). The aggregate set for each unmatched select list item is further operated on to obtain the set of cached queries for the select list item. In particular, for each aggregate set, the set of cached queries for each aggregate in the set is first obtained using the select list index, and a union is then performed on the sets of cached queries for all aggregates in the set to obtain the set of cached queries for the unmatched select list item (step 606).
(82) A set of cached queries is thus obtained for each matched select list item in step 602 and a set of cached queries is obtained for each unmatched select list item in steps 604 and 606. An intersection is then performed on all of these sets of cached queries to provide the second list of candidate cached queries for the new query (step 608).
(83) Aggregate rewrite can increase the search space of candidate cached queries. In an embodiment, to prevent a possible explosion in the size of the search space, a rewrite is attempted for an aggregate (e.g., aggregate(qtysold by sited)) only if an exact match for the aggregate cannot be found in the select list index. Moreover, the increase in the search space may be minimized by choosing a specific order in which to process the items in the select list of the new query. Since only aggregates are candidates for rewrites and since rewrites can increase search space, they may be processed last.
(84)
(85) For process 222b, a candidate set is first initialized to include all queries stored in the cache (step 610). The first item in the select list of the new query is then identified (step 612). A determination is then made whether or not this select list item is found in the select list index (i.e., whether or not this is a match or unmatched select list item) (step 614). If the select list item is found in the select list index, then the set of cached queries containing this select list item is obtained by looking up the select list index (step 616).
(86) Otherwise, if the select list item is not found in the select list index, then a determination is made whether or not the item is an aggregate (step 622). If the answer is no, then the select list item is deemed to not be included in any of the cached queries, and the process proceeds to step 644 where a null second list is provided for the new query. Otherwise, if the select list item is an aggregate, then the item being aggregated upon is determined (step 624). The set of aggregates in the cached queries that contain or aggregated over this item is then obtained by looking up the aggregate index (step 626).
(87) A temporary set is next initialized to null (step 628). The first aggregate in the aggregate set is then identified for consideration (step 630) and this aggregate is evaluated to see whether or not it can be used for a rewrite of the select list aggregate in the new query currently being processed (step 632). A rewrite would be possible if the select list aggregate merely roll ups the results of the aggregate under consideration. As part of the rewrite evaluation, a check is performed to ensure that all columns identified in the select list aggregate are included in the select list of the cached query. For example, if aggregate(qtysold by siteid) is to be rewritten using aggregate(qtysold by siteid, productid), then the item productid needs to be present in the select list of the cached query.
(88) If a rewrite is not possible, as determined in step 634, then the process proceeds to step 640. Otherwise, if a rewrite is possible, then the set of cached queries containing the aggregate under consideration is obtained by looking up the select list index (step 636), and a union is performed between this set of cached queries and the temporary set. The union is performed because any of the aggregates that can be used for a rewrite may thereafter be selected for the select list aggregate.
(89) A determination is then made whether or not all aggregates in the aggregate set have been considered (step 640). If the answer is no, then the next aggregate in the set is identified (step 642), and the process returns to step 632 to evaluate this aggregate. Otherwise, if all aggregates in the set have been considered, then the process proceeds to step 652.
(90) A set of cached queries is obtained for each match select list item in step 616 and a set of cached queries (which is the temporary set) is obtained for each unmatched select list item in steps 622 through 642. An intersection is then performed between the set of cached queries obtained for the select list item currently being processed and the candidate set (step 652).
(91) If the candidate set is not empty (as determined in step 654) and if all select list items for the new query have not been considered (as determined in step 656), then the next item in the select list of the new query is identified (step 658), and the process returns to step 614 to process this select list item. Otherwise, if the candidate set is empty or if all select list items for the new query have been considered, then the cached queries included in the candidate set are provided as the second list of candidate cached queries for the new query (step 660). This second list may be a null list if no cached query contains all of the select list items (or rewrite candidates) included in the new query. The process then terminates.
(92) In an embodiment, aggregate rewrites may also be performed on subexpressions. For example, consider the expression sum(revenue by country)*2. This expression may be decomposed into sum(revenue by country) and 2. The subexpression sum(revenue by country) may then be written in terms of the aggregate sum(revenue by country, year).
(93) The operation of processes shown in
The select list for this query includes two items: siteid and sum(revenue by siteid). The first select list item siteid is used to index the select list index and, for the example index shown in
(94) Using the example aggregate index shown in
(95) The select list index is then accessed to obtain (1) the set of cached queries containing the first aggregate in the set sum(revenue by siteid, productid), which may be, for example, {1, 2, 7}, and (2) the set of cached queries containing the second aggregate in the set sum(revenue by siteid, month), which may be, for example, {3, 9}. The union of these two sets would be {1, 2, 3, 7, 9}, which is the set of cached queries that support the second select list item sum(revenue by siteid) in the new query.
(96) The set obtained for the first select list item {2, 3} is then intersected with the set obtained for the second select list item {1, 2, 3, 7, 9} to provide the second list of candidate cached queries {2, 3} that supports all select list items of the new query.
(97) The aggregate index shown in
Evaluation of Candidate Cached Query with Exact Match for Select List Items
(98) The candidate selection process described above ensures compatibility between the select list of the new query and the select list of each candidate cached query. However, each query may specify one or more filters via the WHERE and HAVING clauses. These filters limit the results provided for the query based on the conditions or restrictions specified by the filters. Thus, each candidate cached query would need to be evaluated to ensure that the filters, if any, specified by the cached query did not filter out any data needed by the new query. This evaluation is then to verify that the filters present in the new and cached queries are compatible.
(99) For standard SQL, the WHERE clauses cannot contain aggregates. In some other SQL instantiations, such as the extended SQL described above, aggregates may be allowed in the WHERE and HAVING clauses, in which case the evaluation would need to account for any aggregates that may appear in these clauses. The server can be designed to automatically determine where each predicate should be placed. In the following analysis, an assumption is made that all filters are present in the WHERE clause prior to being appropriately placed.
(100) Filters are often defined using WHERE clause conjuncts, which can be described as follows. Consider an SQL query:
(101) TABLE-US-00007 SELECT lastname, firstname FROM employees WHERE gender = male AND age > 50
This query asks for the lastname and firstname of all male employees with age greater than 50. In this query, gender=Male and age>50 are WHERE clause conjuncts. They are referred to as conjuncts because both conditions have to be satisfied for an employee's name to satisfy this WHERE clause condition. This is because the two conditions are joined with the AND operator. If these conditions had been joined with the OR operator, then they would be referred to as WHERE clause disjuncts. Any logical expression can be normalized so that it consists of a sequence of conjuncts. The following description assumes that logical expressions are normalized before they are processed.
(102)
(103) Initially, a determination is made whether or not the WHERE clause for the new query exactly matches that of the target query (step 710). If the answer is yes, which is the simplest case, then there is a cache hit (step 714).
(104) Otherwise, if the WHERE clauses do not exactly match, then a determination is made whether or not all non-matching WHERE clause conjuncts reference aggregates (step 720). A WHERE clause conjunct is said to reference an aggregate if the aggregate is used in the WHERE clause conjunct (e.g., sum(revenue by productid)>1000). Step 720 effectively determines if there is a HAVING clause constraint. If the answer is yes, then a determination is made whether or not the WHERE clause contains rewrite AND the matching filter has aggregate (step 722). There is a cache miss if the answer is yes (step 724) and a cache hit if the answer is no (step 726).
(105) If the answer to step 720 is no, then a determination is made whether or not all non-matching WHERE clause conjuncts reference non-aggregates (step 730). Step 730 effectively determines if there is a WHERE clause constraint. If the answer is yes, then a determination is made whether the select list or the WHERE clause contains any aggregate (step 732). There is a cache miss if the answer is yes (step 734) and a cache hit if the answer is no (step 736).
(106) If the answer to step 730 is no, then a determination is made whether the select list or the WHERE clause contains any aggregate (step 742). There is a cache miss if the answer is yes (step 744) and a cache hit if the answer is no (step 746). The process then terminates.
(107) The algorithm shown in
(108) For clarity, specific examples are provided in Appendix A to illustrate the criteria described above in
(109)
(110) Initially, a determination is made whether or not the WHERE clause for the new query exactly matches that of the target query (step 810). If the answer is yes, then a determination is made whether or not the WHERE clause contains a functional aggregate (step 820). If the answer is also yes, then the following three criteria apply. First, there is a cache hit if all inexact matches in the select list reference non-aggregates (steps 822 and 824). Second, there is a cache miss if one or more inexact matches in the select list reference a functional aggregate where this aggregate is above the hit (i.e., the functional aggregate is computed from the cached rows) (steps 826 and 828). Third, there is a cache hit if the select list does not contain a rewrite (steps 832 and 834) and a cache miss otherwise.
(111) If the answer to step 820 is no, then a determination is made whether or not the WHERE clause contains no aggregates (step 840). There is a cache hit if the answer is yes (step 842). If the answer to step 840 is no, then a determination is made whether or not the WHERE clause contains regular aggregates (step 850). If the answer is yes, then there is a cache hit if the select list does not contain an aggregate rewrite (steps 852 and 854) and a cache miss otherwise (step 856).
(112) In
(113) In
(114) If the answer to step 880 is no, then a determination is made whether or not the non-matching WHERE clause conjuncts contain a mix of aggregates and non-aggregates (step 890). If the answer is yes, then the same two criteria for step 880 also apply, as shown in
(115) The algorithm shown in
(116) For clarity, specific examples are provided in Appendix A to illustrate the criteria described above in
Cached Query Selection
(117) Given a new query (Q) and a set of cached queries (Q1, Q2 . . . QN), a determination needs to be made as to which specific cached query to use to generate the results for the new query. One or more cached queries may be hits and would be possibilities. Different amounts of processing may be required to be performed on the results for different cached queries to obtain the results for the new query. In one embodiment, the first candidate cached query that hits is selected for use to generate the results for the new query. In a second embodiment, some or all cached queries that hit may be further evaluated to determine the best one to use. The first embodiment may be used if the benefit from using the best cached query is outweighed by the cost of determining the best one.
Query Plan Generation
(118) Once a specific cached query has been selected for use to generate the results for the new query, the query plan generation phase is entered (step 232 in
(119) In the query plan generation, if an expression in the select list of the new query contains an aggregate and there is an exact match for this expression in the select list index, then the expression is treated as a column index and not as an aggregate by the query compiler. For example, the new query may contain the expression rank(qtysold)+5 in the select list. If there is an exact match for this expression in the select list index, then the expression is rendered in the new plan as a column index of rank(qtysold)+5 in the target query's select list (which may be 1, 2, 3, and or on).
(120) A major issue in query plan generation is to ensure that the correct order of filter execution is preserved. In an embodiment, filter execution order is preserved by inserting additional levels of nesting and moving filters where appropriate. In one implementation, rewrite rules for an expanded summary filter examine predicates in the detail filter and move them to the summary filter if appropriate. A summary filter is one normally included in the HAVING clause in standard SQL, and a detail filter is one normally included in the WHERE clause in standard SQL. The rewrite rules also replace summary filter lists by converting them to detail filters in a newly added layer of nesting. In general, the query plan produced by the plan generation should be equivalent to a query plan produced without query caching.
Examples of Evaluation of Cached Queries for Cache Hit
(121) A number of examples are provided in Appendix A to illustrate the evaluation performed in
Computer System
(122)
(123) Memory subsystem 912 may include a RAM 932 and a ROM 934 used to store codes and data that implement various aspects of the invention. For example, memory subsystem 912 may be used for metadata store 160 and possibly cache 170 in
(124) Input device interface 916 provides interface with various input devices such as a keyboard 952, a pointing device 954 (e.g., a mouse, a trackball, a touch pad, a graphics tablet, a scanner, or a touch screen), and other input device(s) 956. Output device interface 918 provides an interface with various output devices such as a display 962 (e.g., a CRT or an LCD) and other output device(s) 964. Network interface 920 provides an interface for system 900 to communicate with other computers coupled to network 112.
(125) Many other devices or subsystems (not shown) may also be coupled to system 900. In addition, it is not necessary for all of the devices shown in
(126) Headings are included herein for reference and to aid in locating certain sections. These headings are not intended to limit the scope of the concepts described therein under, and these concepts may have applicability in other sections throughout the entire specification.
(127) The foregoing description of the specific embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the use of the inventive faculty. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein, and as defined by the following claims.
Appendix A
Example 1
(128) Cache query: SELECT employeeid, percentile(qtysold) FROM employee
(129) WHERE rank(employeeid) in (2,5). New query: SELECT employeeid, percentile(qtysold) FROM employee
(130) WHERE rank(employeeid) in (2, 5) Result: Cache hit. (Step 714 in
Example 2
(131) Cache query: SELECT siteid, productid, qtysold FROM sales
(132) WHERE rank(qtysold)<10. New query: SELECT siteid, productid, qtysold FROM sales
(133) WHERE rank(qtysold)<10 AND aggregate(qtysold BY siteid)>1000 Result: Cache miss. (Step 724 in
(134) The aggregate in the WHERE clause is a rewrite.
Example 3
(135) Cache query: SELECT siteid, productid, qtysold, aggregate(qtysold BY siteid)
(136) FROM sales
(137) WHERE rank(qtysold)<10. New query: SELECT siteid, productid, qtysold, aggregate(qtysold BY siteid)
(138) FROM sales
(139) WHERE rank(qtysold)<10 AND aggregate(qtysold BY siteid)>1000 Result: Cache hit. (Step 726 in
Example 4
(140) Cache query: SELECT employeeid, percentile(qtysold) FROM employee. New query: SELECT employeeid, percentile(qtysold) FROM employee
(141) WHERE rank(employeeid) in (2, 5) Result: Cache hit. (Step 726 in
Example 5
(142) Cache query: SELECT employeeid, lastname, rank(revenue) FROM employee
(143) WHERE rank(qtysold)<5. New query: SELECT employeeid, lastname, rank(revenue) FROM employee
(144) WHERE rank(qtysold)<5 and rank(revenue)<4. Result: Can be a cache hit. (Step 726 in
Example 6
(145) Cache query: SELECT revenue, qtysold, rank(revenue) FROM employee
(146) WHERE rank(revenue)<5. New query: SELECT revenue, rank(revenue) FROM employee
(147) WHERE qtysold<1000.0 and rank(revenue)<5 Result: Can be a cache hit. (Step 726 in
Example 7
(148) Cache query: SELECT employeeid, percentile(qtysold) FROM employee New query: SELECT employeeid, percentile(qtysold) FROM employee
(149) WHERE employeeid in (2, 3, 5) Result: Cache miss. (Step 734 in
Example 8
(150) Cache query: SELECT siteid, productid, qtysold, aggregate(qtysold BY siteid)
(151) FROM sales
(152) WHERE rank(qtysold)<10 New query: SELECT siteid, productid, qtysold, aggregate(qtysold BY siteid)
(153) FROM sales
(154) WHERE rank(qtysold)<10 AND productid<5 Result: Cache miss. (Step 734 in
Example 9
(155) Cache query: SELECT siteid, productid, qtysold FROM sales
(156) WHERE rank(qtysold)<10 New query: SELECT siteid, productid, qtysold, FROM sales
(157) WHERE rank(qtysold)<10 AND productid<5 Result: Cache miss. (Step 734 in
Example 10
(158) Cache query: SELECT employeeid, lastname FROM employee New query: SELECT employeeid, lastname FROM employee
(159) WHERE lastname =King Result: Cache hit. (Step 736 in
Example 11
(160) Cache query: SELECT lastname, qtysold FROM employee New query: SELECT lastname, qtysold FROM employee
(161) WHERE lastname =Smith Result: Can be a cache hit. (Step 736 in
Example 12
(162) Cache query: SELECT employeeid, qtysold, revenue FROM employee
(163) WHERE rank(qtysold)<10 New query: SELECT employeeid, qtysold, cast(employeeid as char(5))
(164) FROM employee
(165) WHERE rank(qtysold)<10 Result: Can be a cache hit. (Step 824 in
Example 13
(166) Cache query: SELECT employeeid, qtysold, revenue FROM employee
(167) WHERE rank(qtysold)<10 New query: SELECT employeeid, qtysold, rank(revenue) FROM employee
(168) WHERE rank(qtysold)<10 Result: Cache miss. (Step 828 in
Example 14
(169) Cache query: SELECT revenue FROM employee
(170) WHERE rank(qtysold)<5 New query: SELECT revenue, percentile(revenue) FROM employee
(171) WHERE rank(qtysold)<5 Result: Cache miss. (Step 828 in
Example 15
(172) Cache query: SELECT revenue FROM employee
(173) WHERE rank(revenue)<5 New query: SELECT revenue, rank(revenue) FROM employee
(174) WHERE rank(revenue)<5 Result: May be a cache hit. (Step 828 in
Example 16
(175) Cache query: SELECT siteid, productid, qtysold FROM sales
(176) WHERE rank(qtysold)<10 New query: SELECT siteid, productid, qtysold, aggregate(qtysold BY siteid)
(177) FROM sales
(178) WHERE rank(qtysold)<10 Result: Cache miss. (Step 836 in
Example 17
(179) Cache query: SELECT siteid, productid, qtysold FROM sales
(180) WHERE rank(qtysold)<10 New query: SELECT siteid, productid, qtysold, log(qtysold) FROM sales
(181) WHERE rank(qtysold)<10 Result: Cache hit. (Step 834 in
Example 18
(182) Cache query: SELECT employeeid, qtysold, revenue FROM employee
(183) WHERE employeeid<5 New query: SELECT employeeid, qtysold, cast(employeeid as char(5)), rank(qtysold) FROM employee
(184) WHERE employeeid<5 Result: May be a cache hit. (Step 842 in
Example 19
(185) Cache query: SELECT siteid, productid, qtysold, aggregate(qtysold BY siteid)
(186) FROM sales
(187) WHERE aggregate(qtysold BY siteid)>1000 New query: SELECT siteid, productid, qtysold, aggregate(qtysold BY siteid), aggregate(qtysold BY) FROM sales
(188) WHERE aggregate(qtysold BY siteid)>1000 Result: Cache miss. (Step 856 in
Example 20
(189) Cache query: SELECT siteid, productid, qtysold FROM sales
(190) WHERE aggregate(qtysold BY siteid)>1000 New query: SELECT siteid, productid, qtysold, log(qtysold) FROM sales
(191) WHERE aggregate(qtysold BY siteid)>1000 Result: Cache hit. (Step 854 in
Example 21
(192) Cache query: SELECT revenue FROM employee
(193) WHERE rank(revenue)<5 New query: SELECT revenue, rank(revenue) FROM employee
(194) WHERE rank(revenue)<3 Result: Can be a cache hit. (Step 864 in
Example 22
(195) Cache query: SELECT revenue FROM employee
(196) WHERE rank(revenue)<5 New query: SELECT revenue, percentile(revenue) FROM employee
(197) WHERE rank(revenue)<3 Result: Cache miss. (Step 864 in
Example 23
(198) Cache query: SELECT siteid, productid, qtysold FROM sales
(199) WHERE rank(qtysold)<10 New query: SELECT siteid, productid, qtysold, aggregate(qtysold BY siteid)
(200) FROM sales
(201) WHERE rank(qtysold)<10 AND aggregate(qtysold BY siteid)>1000 Result: Cache miss. (Step 872 in
Example 24
(202) Cache query: SELECT siteid, productid, qtysold FROM sales
(203) WHERE productid<10 New query: SELECT siteid, productid, qtysold , aggregate(qtysold BY siteid)
(204) FROM sales
(205) WHERE productid<10 AND rank(qtysold)<20 Result: Cache hit. (Step 876 in
Example 25
(206) Cache query: SELECT revenue FROM employee
(207) WHERE rank(revenue)<5 New query: SELECT log(revenue), revenue/1000.0, revenue FROM employee
(208) WHERE rank(revenue) in (1,3) Result: Cache hit. (Step 868 in
Example 26
(209) Cache query: SELECT employeeid, rank(qtysold), revenue FROM employee
(210) WHERE employeeid<5 and rank(revenue)<6 New query: SELECT employeeid, rank(qtysold), revenue FROM employee
(211) WHERE employeeid in (2,4) and rank(revenue)<6 Result: Cache miss. (Step 884 in
Example 27
(212) Cache query: SELECT employeeid, rank(qtysold), revenue FROM employee
(213) WHERE employeeid<5 and rank(revenue)<6 New query: SELECT employeeid, rank(qtysold), percentile(revenue) FROM employee
(214) WHERE employeeid in (1,2,4) and rank(revenue)<6 Result: Cache miss. (Step 884 in
Example 28
(215) Cache query: SELECT siteid, productid, qtysold FROM sales
(216) WHERE rank(qtysold)<100 New query: SELECT siteid, productid, qtysold, aggregate(qtysold BY siteid)
(217) FROM sales
(218) WHERE rank(qtysold)<100 AND aggregate(qtysold BY siteid)>1000 Result: Cache miss. (Step 884 in
Example 29
(219) Cache query: SELECT employeeid, rank(qtysold), revenue FROM employee
(220) WHERE employeeid<5 and rank(revenue)<6 New query: SELECT employeeid, log(revenue) FROM employee
(221) WHERE employeeid in (1,2,4) and rank(revenue)<6 Result: Cache hit. (Step 888 in