G06F16/24537

GENERATING CLASSIFICATION DATA VIA A QUERY PROCESSING SYSTEM

A database system is operable to determine a request to implement a K Nearest Neighbors (KNN) algorithm to generate classification data for a set of new records. A query operator execution flow that includes a KNN-join operator is determined for the request. A query resultant that indicates classification data for the set of new records is generated by performing a plurality of operator executions in accordance with the query operator execution flow. For each record of the set of new records, this includes generating a plurality of similarity measures by performing a similarity function on that record and each of a set of previously-classified records; identifying a proper subset of the set of previously-classified records that includes exactly a predefined number of records; and joining that record with the proper subset.
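
The per-record steps of a KNN join (score every previously-classified record, keep exactly k of them, join) can be illustrated in a few lines. This is a minimal sketch, not the patented operator: it assumes records are numeric feature tuples, uses negative Euclidean distance as the similarity function, and classifies by majority vote. The names `knn_join` and `classify` are hypothetical.

```python
from collections import Counter
from math import sqrt

def similarity(a, b):
    # Assumed similarity function: negative Euclidean distance (larger means more similar).
    return -sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_join(new_records, classified, k):
    """Join each new record with exactly k previously-classified records (its k most similar)."""
    if not 0 < k < len(classified):
        raise ValueError("k must select a proper, non-empty subset of the classified records")
    joined = []
    for rec in new_records:
        # One similarity measure per previously-classified (features, label) pair.
        scored = [(similarity(rec, feats), feats, label) for feats, label in classified]
        scored.sort(key=lambda t: t[0], reverse=True)
        neighbors = [(feats, label) for _, feats, label in scored[:k]]
        joined.append((rec, neighbors))
    return joined

def classify(joined):
    # Majority vote over the labels of each record's joined neighbors.
    return [Counter(label for _, label in nbrs).most_common(1)[0][0] for _, nbrs in joined]

classified = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
print(classify(knn_join([(0.2, 0.3), (5.5, 5.5)], classified, k=3)))  # ['a', 'b']
```

Sorting the full candidate list is the simplest correct choice here; a query engine would more likely keep a bounded heap of size k per record.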

PIPELINED SEARCH QUERY, LEVERAGING REFERENCE VALUES OF AN INVERTED INDEX TO ACCESS A SET OF EVENT DATA AND PERFORMING FURTHER QUERIES ON ASSOCIATED RAW DATA

Embodiments of the present disclosure provide techniques for using an inverted index in a pipelined search query. A field-searchable data store is provided that comprises a plurality of event records, each event record comprising a time-stamped portion of raw machine data. Responsive to the receipt of an incoming search query, the search engine accesses an inverted index, wherein each entry in the inverted index comprises at least one field name, a corresponding at least one field value, and a reference value associated with each field name and value pair that identifies a location in the data store where an associated event record is stored. Once the inverted index is accessed, it can be used to identify and search a subset of the plurality of event records, wherein the subset comprises one or more event records with corresponding reference values in the inverted index.
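
A small sketch can show the two pipeline stages: an inverted index maps each (field name, field value) pair to reference values, and a further query then runs only over the raw data of the referenced events. It assumes, for illustration only, that the event store is an in-memory list whose positions serve as reference values and that the follow-up query is a regular expression over the raw text; `build_inverted_index` and `pipelined_search` are hypothetical names.

```python
import re
from collections import defaultdict

def build_inverted_index(event_store):
    """Map each (field name, field value) pair to the reference values of matching events."""
    index = defaultdict(list)
    for ref, event in enumerate(event_store):  # list position stands in for a storage location
        for name, value in event["fields"].items():
            index[(name, value)].append(ref)
    return index

def pipelined_search(event_store, index, field, value, raw_pattern):
    # Stage 1: the inverted index narrows the search to a subset of event records.
    refs = index.get((field, value), [])
    # Stage 2: a further query runs only over the raw machine data of that subset.
    regex = re.compile(raw_pattern)
    return [event_store[r] for r in refs if regex.search(event_store[r]["raw"])]

store = [
    {"time": 1, "raw": "host=web1 status=500 GET /a", "fields": {"host": "web1", "status": "500"}},
    {"time": 2, "raw": "host=web2 status=500 POST /b", "fields": {"host": "web2", "status": "500"}},
    {"time": 3, "raw": "host=web1 status=200 GET /c", "fields": {"host": "web1", "status": "200"}},
]
idx = build_inverted_index(store)
print([e["time"] for e in pipelined_search(store, idx, "status", "500", r"GET")])  # [1]
```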

Dynamic Stream Operator Fission and Fusion with Platform Management Hints

Methods and apparatus, including computer program products, implementing and using techniques for data stream processing in a runtime data processing environment. A stream processing graph that includes several connected operators is received. Source code of the operators is analyzed to identify hints describing whether an operator contains data structures, method parameters, or other data that can be applied in a parallel data processing environment. Performance metrics of the data processing environment within parallel regions are evaluated to determine whether data processing resources can be dynamically scaled up or down. In response to determining that the data processing resources can be dynamically scaled up, one or more operators are split to be processed on two or more parallel processing resources. In response to determining that the data processing resources can be dynamically scaled down, one or more operators are combined to be processed on a single parallel processing resource.
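
The scale-up/scale-down decision can be pictured as a per-operator rule. In this sketch the `parallelizable` flag stands in for the hint produced by source analysis, and the utilization thresholds, the doubling policy, and `max_replicas` are all assumptions rather than values from the abstract. Fusion here collapses one operator's replicas back onto a single resource; fusing distinct adjacent operators is not shown.

```python
from dataclasses import dataclass

@dataclass
class Operator:
    name: str
    parallelizable: bool  # hint from source analysis (assumed precomputed)
    replicas: int = 1

def rescale(op: Operator, utilization: float, high: float = 0.8,
            low: float = 0.3, max_replicas: int = 8) -> int:
    """Return a new replica count for one operator inside a parallel region."""
    if not op.parallelizable:
        return op.replicas  # no hint that splitting is safe, so leave the operator alone
    if utilization > high and op.replicas < max_replicas:
        return min(op.replicas * 2, max_replicas)  # fission: split across more resources
    if utilization < low and op.replicas > 1:
        return 1  # fusion: combine the replicas onto a single resource
    return op.replicas

print(rescale(Operator("parse", True, 2), 0.9))  # 4
```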

Dynamic partition selection

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for dynamic partition selection. One of the methods includes receiving a representation of a query plan generated for a query, wherein the query plan includes a dynamic scan operator that represents a first computing node obtaining tuples of one or more partitions of a table from storage and transferring the tuples to a second computing node that executes a parent operator of the dynamic scan operator. A partition selector operator is generated corresponding to the dynamic scan operator. A location in the query plan is determined for the partition selector operator. A modified query plan is generated having the partition selector operator at the determined location.
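
The abstract is about rewriting the plan to add a partition selector. The runtime effect of such a selector can be sketched separately: it collects the partitions that can contain matches, and the dynamic scan reads only those. This sketch assumes range partitioning with sorted boundaries and a selector fed by join keys from the other side of the join; the function names and data layout are hypothetical.

```python
from bisect import bisect_right

def partition_of(key, bounds):
    # Assumed range partitioning: partition i holds keys in [bounds[i], bounds[i + 1]).
    return bisect_right(bounds, key) - 1

def partition_selector(join_keys, bounds):
    """Collect the partitions that can contain a match for the keys seen by the selector."""
    return {partition_of(k, bounds) for k in join_keys if bounds[0] <= k < bounds[-1]}

def dynamic_scan(partitions, selected):
    """Yield tuples only from the selected partitions instead of scanning the whole table."""
    for pid in sorted(selected):
        yield from partitions[pid]

bounds = [0, 10, 20, 30]
partitions = {0: [(1, "a"), (5, "b")], 1: [(12, "c")], 2: [(25, "d")]}
selected = partition_selector([3, 27, 99], bounds)
print(list(dynamic_scan(partitions, selected)))  # [(1, 'a'), (5, 'b'), (25, 'd')]
```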

SCALING QUERY PROCESSING RESOURCES FOR EFFICIENT UTILIZATION AND PERFORMANCE

Scaling of query processing resources for efficient utilization and performance is implemented for a database service. A query is received via a network endpoint associated with a database managed by a database service. Respective response times predicted for the query using different query processing configurations available to perform the query are determined. Those query processing configurations with response times that exceed a variability threshold determined for the query may be excluded. A remaining query processing configuration may then be selected to perform the query.
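
The abstract does not say how the variability threshold is derived or which remaining configuration is chosen. This sketch assumes the threshold is a tolerance above the fastest predicted response time and picks the least resource-intensive remaining configuration. The configuration names, relative costs, and predicted times are hypothetical.

```python
def choose_configuration(configs, tolerance=0.2):
    """configs maps a configuration name to (relative_cost, predicted_response_time_s)."""
    fastest = min(t for _, t in configs.values())
    threshold = fastest * (1 + tolerance)  # assumed form of the per-query threshold
    # Exclude configurations whose predicted response time exceeds the threshold.
    remaining = {name: c for name, c in configs.items() if c[1] <= threshold}
    # Among the remaining configurations, prefer the cheapest one for efficient utilization.
    return min(remaining, key=lambda name: remaining[name][0])

configs = {"small": (1, 12.0), "medium": (2, 9.5), "large": (4, 8.5)}
print(choose_configuration(configs))  # medium
```

A larger tolerance trades response time for cheaper resources: with `tolerance=0.5` the small configuration would also qualify and be selected.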

LOW-LATENCY DATABASE SYSTEM

A database system composed of a decoupled compute layer and storage layer is implemented to store, build, and maintain a canonical dataset, a temporary buffer, and an edits dataset. The canonical dataset is a set of batch-updated data. Data is appended to the canonical dataset in chunks, so that over time the canonical dataset becomes a historical dataset. The buffer is a write-ahead log that contains the most recent chunks of data and provides atomicity and durability for the database system. The edits dataset contains edits such as cell mutations, row appends, and/or row deletions. The database system enables users to make cell- or row-level edits to tables and observe those edits in analytical systems or downstream builds with minimal latency.
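
One way to see how the three datasets combine is a read path that layers them: canonical chunks first, then the newer chunks still in the buffer, then the edits in order. This is a sketch under assumed data shapes (chunks as dicts keyed by row id, edits as tagged tuples); the abstract does not specify the merge procedure, and `materialize` is a hypothetical name.

```python
def materialize(canonical, buffer, edits):
    """Build the current view of a table keyed by row id.

    canonical, buffer: lists of chunks, each a dict of row_id -> row dict.
    edits: ordered ("set", row_id, column, value), ("append", row_id, row), or ("delete", row_id).
    """
    table = {}
    # Historical batch chunks first, then the most recent chunks held in the buffer.
    for chunk in list(canonical) + list(buffer):
        for row_id, row in chunk.items():
            table[row_id] = dict(row)
    # Apply cell mutations, row appends, and row deletions in order.
    for edit in edits:
        kind = edit[0]
        if kind == "set":
            _, row_id, column, value = edit
            if row_id in table:
                table[row_id][column] = value
        elif kind == "append":
            _, row_id, row = edit
            table[row_id] = dict(row)
        elif kind == "delete":
            table.pop(edit[1], None)
        else:
            raise ValueError(f"unknown edit kind: {kind}")
    return table

view = materialize([{1: {"x": 1}, 2: {"x": 2}}], [{3: {"x": 3}}],
                   [("set", 1, "x", 10), ("delete", 2), ("append", 4, {"x": 4})])
print(view)  # {1: {'x': 10}, 3: {'x': 3}, 4: {'x': 4}}
```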

Database query optimization methods, apparatuses, and computer devices

A database query optimization computer-implemented method, medium, and system are disclosed. In one computer-implemented method, a data query request sent by a client device is received and parsed. An execution plan for executing the data query request is determined based on the parsing result. If the execution plan is a nested loop anti-join, it is determined whether a to-be-queried field in a to-be-queried data table indicated by the data query request may contain a NULL value. If the to-be-queried field may contain a NULL value, a filter condition is generated and the execution plan is optimized based on the filter condition.
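
NULLs matter for an anti-join because SQL `x NOT IN (subquery)` uses three-valued logic: if the subquery returns any NULL (and is non-empty), no outer row qualifies. The sketch below contrasts a reference nested-loop evaluation with a version that checks the NULL condition once up front. Treating that up-front check as the "filter condition" is an assumption for illustration; Python `None` stands in for SQL NULL.

```python
def naive_not_in(outer, inner):
    """Reference nested-loop anti-join with SQL NOT IN semantics (None stands for NULL)."""
    result = []
    for x in outer:
        keep = True
        for s in inner:
            if x is None or s is None:
                keep = False  # comparison is UNKNOWN, so NOT IN cannot be TRUE
            elif x == s:
                keep = False
                break
        if keep:
            result.append(x)
    return result

def null_aware_not_in(outer, inner):
    """Same result, but the NULL check on the inner side is evaluated once."""
    inner = list(inner)
    if not inner:
        return list(outer)  # x NOT IN (empty set) is TRUE for every x, including NULL
    if any(s is None for s in inner):
        return []  # every outer row evaluates to FALSE or UNKNOWN
    # Inner side is known NULL-free: only NULL outer values need special handling.
    return [x for x in outer if x is not None and x not in inner]

print(naive_not_in([1, 2], [2, None]), null_aware_not_in([1, 2, None], [2, 3]))  # [] [1]
```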

Joining multiple events in data streaming analytics systems
11669528 · 2023-06-06

A method for joining events from a plurality of data streams is provided. The method includes determining, in accordance with a first key, whether second event data of a second data stream of the plurality of data streams is stored in a cache memory. The method further includes performing a join operation of first event data and the second event data based at least in part on whether the second event data is stored in the cache memory.
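
A keyed stream join that consults a cache can be sketched as a symmetric join: each arriving event looks up the other stream's cached events under the same key, joins with any it finds, and is then cached itself. This is a hypothetical design for illustration (class and method names are invented). It keeps everything in memory with no eviction, whereas a practical system would bound the cache, for example with a time window.

```python
from collections import defaultdict

class StreamJoiner:
    """Symmetric keyed join of two event streams using an in-memory cache (hypothetical design)."""

    def __init__(self):
        self.cache = {"left": defaultdict(list), "right": defaultdict(list)}

    def on_event(self, side, key, event):
        if side not in self.cache:
            raise ValueError("side must be 'left' or 'right'")
        other = "right" if side == "left" else "left"
        # Look up cached events from the other stream that share this key.
        matches = self.cache[other].get(key, [])
        joined = [(event, m) if side == "left" else (m, event) for m in matches]
        # Cache this event so later events from the other stream can join with it.
        self.cache[side][key].append(event)
        return joined

j = StreamJoiner()
j.on_event("left", "order-1", {"amount": 5})
print(j.on_event("right", "order-1", {"status": "paid"}))  # [({'amount': 5}, {'status': 'paid'})]
```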

GraphQL management layer

Aspects of the invention include assessing, by a management layer executing on a first processor, a query from a client application requesting data from a server. The assessing occurs prior to the query being executed by a provider. The assessing includes extracting, by the management layer, characteristics of the query. The management layer compares the extracted query characteristics with a policy defined by the provider. Based at least in part on results of the comparing, it is determined by the management layer whether the query is permitted to be executed by the provider at the server. The management layer initiates execution of the query at the server in response to determining that the query is permitted to be executed. The management layer prevents execution of the query at the server in response to determining that the query is not permitted to be executed.
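
The assess-then-allow flow can be illustrated by extracting two simple query characteristics (maximum selection-set depth and number of requested fields) and comparing them with a provider policy before execution. The `Policy` fields are hypothetical, and the tokenizer is deliberately crude, not a GraphQL parser: it assumes no aliases, directives, fragments, comments, or string literals containing braces or parentheses.

```python
import re
from dataclasses import dataclass

TOKEN = re.compile(r"[{}()]|[A-Za-z_][A-Za-z0-9_]*")

@dataclass
class Policy:
    max_depth: int   # hypothetical policy limits defined by the provider
    max_fields: int

def query_characteristics(query):
    """Crude extraction of selection-set depth and field count from a GraphQL query string."""
    depth = max_depth = parens = fields = 0
    for tok in TOKEN.findall(query):
        if tok == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif tok == "}":
            depth -= 1
        elif tok == "(":
            parens += 1
        elif tok == ")":
            parens -= 1
        elif depth >= 1 and parens == 0:
            fields += 1  # identifier inside a selection set and outside an argument list
    return {"depth": max_depth, "fields": fields}

def is_permitted(query, policy):
    # Execution is initiated only if every extracted characteristic is within the policy.
    c = query_characteristics(query)
    return c["depth"] <= policy.max_depth and c["fields"] <= policy.max_fields

q = "query { user(id: 1) { name friends(first: 10) { name } } }"
print(query_characteristics(q), is_permitted(q, Policy(max_depth=2, max_fields=10)))
# {'depth': 3, 'fields': 4} False
```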

Graph Data Search Method and Apparatus
20170286484 · 2017-10-05

A graph data search method and apparatus. The method includes obtaining a query request that includes a query condition carrying a start graph node, where the query request queries, from a graph data set, a first to-be-queried graph node that matches the query condition. The graph data set includes the start graph node, a plurality of to-be-queried graph nodes, an association relationship between the start graph node and the plurality of to-be-queried graph nodes, and an association relationship between each to-be-queried graph node and another graph node. The method further includes filtering out, according to the query condition and a preset available resource condition, a second to-be-queried graph node that does not meet the query condition, together with each association relationship in the graph data set that includes the second to-be-queried graph node, to obtain a reduced subgraph, and then performing a query in the reduced subgraph using the query condition.
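
The filter-then-query idea can be sketched as two steps: drop every to-be-queried node that fails the query condition along with every edge touching it, then search outward from the start node in what remains. The adjacency-list representation and the node-condition predicate are assumptions, and modeling the preset available resource condition as a hop limit on the search is likewise an illustrative assumption; the function names are hypothetical.

```python
from collections import deque

def reduce_graph(adj, start, keep):
    """Remove nodes that fail the condition (except the start node) and every edge touching them."""
    kept = {n for n in adj if n == start or keep(n)}
    return {n: [m for m in adj[n] if m in kept] for n in kept}

def search(adj, start, keep, max_hops):
    """Breadth-first query in the reduced subgraph; max_hops stands in for a resource limit."""
    sub = reduce_graph(adj, start, keep)
    seen, found = {start}, []
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if node != start:
            found.append(node)
        if hops == max_hops:
            continue  # do not expand beyond the assumed resource budget
        for nxt in sub.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return found

adj = {"me": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"], "d": []}
ages = {"a": 30, "b": 15, "c": 40, "d": 22}
print(search(adj, "me", lambda n: ages[n] >= 18, max_hops=2))  # ['a', 'c']
```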