G06F16/2453

Compression, searching, and decompression of log messages
11593373 · 2023-02-28

Log messages are compressed, searched, and decompressed. A dictionary is used to store non-numeric expressions found in log messages. Both numeric and non-numeric expressions found in log messages are represented by placeholders in a string of log “type” information. Another dictionary is used to store the log type information. A compressed log message contains a key to the log-type dictionary and a sequence of values that are keys to the non-numeric dictionary and/or numeric values. Searching may be performed by parsing a search query into subqueries that target the dictionaries and/or content of the compressed log messages. A dictionary may reference segments that contain a number of log messages, so that not all log messages need to be considered for some searches.
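
The two-dictionary scheme described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `LogCompressor`, the placeholder characters `NUM` and `VAR`, and the tokenization rule (split on single spaces, treat any token containing a digit as a variable) are all assumptions made for illustration.

```python
import re

# Hypothetical placeholder characters; the abstract does not specify them.
NUM = "\x11"  # marks a numeric value stored directly in the record
VAR = "\x12"  # marks a non-numeric value stored as a dictionary key

# Only canonical integers are stored as numbers so decompression is lossless.
CANONICAL_INT = re.compile(r"0|-?[1-9]\d*")


class LogCompressor:
    def __init__(self):
        self.var_keys, self.var_values = {}, []    # non-numeric dictionary
        self.type_keys, self.type_values = {}, []  # log-type dictionary

    @staticmethod
    def _intern(value, keys, values):
        """Return the dictionary key for value, adding it if it is new."""
        if value not in keys:
            keys[value] = len(values)
            values.append(value)
        return keys[value]

    def compress(self, message):
        template, values = [], []
        for token in message.split(" "):
            if CANONICAL_INT.fullmatch(token):
                template.append(NUM)
                values.append(int(token))
            elif any(ch.isdigit() for ch in token):
                # Assumption: a token containing a digit is a variable.
                template.append(VAR)
                values.append(self._intern(token, self.var_keys, self.var_values))
            else:
                template.append(token)  # static text stays in the log type
        log_type = " ".join(template)
        type_key = self._intern(log_type, self.type_keys, self.type_values)
        return type_key, values

    def decompress(self, record):
        type_key, values = record
        remaining = iter(values)
        out = []
        for token in self.type_values[type_key].split(" "):
            if token == NUM:
                out.append(str(next(remaining)))
            elif token == VAR:
                out.append(self.var_values[next(remaining)])
            else:
                out.append(token)
        return " ".join(out)


compressor = LogCompressor()
first = "Task task_12 assigned to container_7 in 154 ms"
second = "Task task_13 assigned to container_7 in 98 ms"
record_1 = compressor.compress(first)
record_2 = compressor.compress(second)
```

Both messages share one log-type entry, so each compressed record is only a type key plus a short list of values. A search for a variable such as `container_7` could first check the non-numeric dictionary and skip the records entirely if the key is absent.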

Automated feedback and continuous learning for query optimization

In an approach to improve query optimization in a database management system, embodiments identify opportunities for improvement in a cardinality estimate using a workload feedback process that uses query feedback performed during query compilation. Embodiments identify correlations and relationships based on the structure of the query feedback and the runtime feedback performed, and collect data from the execution of a query to identify errors in estimates of the query optimizer. Further, embodiments submit the query feedback and the runtime feedback to a machine learning engine to update a set of models. Additionally, embodiments update a set of models based on the submitted query feedback and runtime feedback, and output a new, updated, or re-trained model based on collected data from the execution of the query to identify the errors in estimates of the query optimizer, the submitted query feedback and the runtime feedback, or a trained generated model.

Trimming blackhole clusters
11704315 · 2023-07-18

Disclosed are techniques for trimming large clusters of related records. In one embodiment, a method is disclosed comprising receiving a set of clusters, each cluster in the set including a plurality of records. The method extracts an oversized cluster from the set of clusters and performs a breadth-first search (BFS) on the oversized cluster to generate a list of visited records. The method terminates the BFS upon determining that the size of the list of visited records exceeds a maximum size, generates a new cluster from the list of visited records, and adds the new cluster to the set of clusters. By recursively performing BFS traversal over the oversized cluster and extracting smaller new clusters from it, the oversized cluster is eventually partitioned into a set of sub-clusters, each smaller than the predefined threshold.
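
A minimal Python sketch of BFS-based trimming follows. It assumes the oversized cluster is given as an adjacency mapping from each record to its related records, which the abstract does not specify, and the name `trim_cluster` is hypothetical. It also departs from the abstract in two labeled ways: it uses a loop instead of recursion, and it stops each BFS when the visited list reaches `max_size` (rather than exceeds it) so that no sub-cluster is larger than the limit.

```python
from collections import deque


def trim_cluster(adjacency, max_size):
    """Partition one cluster into sub-clusters of at most max_size records.

    adjacency maps each record to a list of related records (an assumed
    representation). Records must be orderable so the start node is stable.
    """
    remaining = set(adjacency)
    sub_clusters = []
    while remaining:
        start = min(remaining)  # deterministic choice of BFS root
        visited = [start]
        seen = {start}
        queue = deque([start])
        # Bounded BFS: stop as soon as the visited list is full.
        while queue and len(visited) < max_size:
            record = queue.popleft()
            for neighbor in adjacency[record]:
                if neighbor in remaining and neighbor not in seen:
                    seen.add(neighbor)
                    visited.append(neighbor)
                    queue.append(neighbor)
                    if len(visited) >= max_size:
                        break
        sub_clusters.append(visited)
        remaining.difference_update(visited)  # continue on what is left
    return sub_clusters


# A chain of seven related records, trimmed to clusters of at most three.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
parts = trim_cluster(chain, max_size=3)
```

Because BFS visits the nearest records first, each extracted sub-cluster stays connected within the original cluster, which is presumably why a breadth-first order is preferred over an arbitrary split.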

Parallel branch operation using intermediary nodes

The disclosed implementations include a method performed by a data intake and query system. The method includes receiving a search query at a search head, the search query including a branching operation between sets of data, generating a first subquery and a second subquery corresponding to the sets of data for execution by a search node, generating instructions for an intermediary node to combine partial results of the first subquery and the second subquery and instructions to concurrently communicate the subqueries to a search node, and executing the query by providing the instructions for the intermediary node to the intermediary node and the subqueries to the search node, the intermediary node receiving sets of partial search results for the subqueries, performing at least a portion of the branching operation on the partial results, and communicating the combined results to another intermediary node or the search head.

AGGREGATION FRAMEWORK SYSTEM ARCHITECTURE AND METHOD

Database systems and methods that implement a data aggregation framework are provided. The framework can be configured to optimize aggregate operations over non-relational distributed databases, including, for example, data access, data retrieval, data writes, indexing, etc. Various embodiments are configured to aggregate multiple operations and/or commands, where the results (e.g., database documents and computations) captured from the distributed database are transformed as they pass through an aggregation operation. The aggregation operation can be defined as a pipeline which enables the results from a first operation to be redirected into the input of a subsequent operation, which output can be redirected into further subsequent operations. Computations may also be executed at each stage of the pipeline, where each result at each stage can be evaluated by the computation to return a result. Execution of the pipeline can be optimized based on data dependencies and re-ordering of the pipeline operations.
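
The pipeline idea, including one simple re-ordering optimization, can be sketched in Python as follows. The stage constructors `match_stage`, `sort_stage`, and `group_sum_stage`, and the single rewrite rule in `optimize`, are hypothetical and chosen for illustration; they are not the API of any particular database.

```python
from collections import defaultdict


def match_stage(predicate):
    """Keep only documents that satisfy predicate."""
    return ("match", lambda docs: [d for d in docs if predicate(d)])


def sort_stage(field):
    """Order documents by one field (stable sort)."""
    return ("sort", lambda docs: sorted(docs, key=lambda d: d[field]))


def group_sum_stage(key, field):
    """Group documents by key and sum field within each group."""
    def run(docs):
        totals = defaultdict(int)
        for doc in docs:
            totals[doc[key]] += doc[field]
        return [{key: k, "total": v} for k, v in totals.items()]
    return ("group", run)


def optimize(stages):
    """Move each match ahead of an adjacent preceding sort.

    Filtering commutes with a stable sort, so the result is unchanged
    while the sort sees fewer documents.
    """
    stages = list(stages)
    changed = True
    while changed:
        changed = False
        for i in range(len(stages) - 1):
            if stages[i][0] == "sort" and stages[i + 1][0] == "match":
                stages[i], stages[i + 1] = stages[i + 1], stages[i]
                changed = True
    return stages


def run_pipeline(docs, stages):
    """Feed the output of each stage into the next one."""
    for _, stage in optimize(stages):
        docs = stage(docs)
    return docs


orders = [
    {"city": "A", "qty": 5},
    {"city": "B", "qty": 1},
    {"city": "A", "qty": 2},
    {"city": "B", "qty": 7},
]
pipeline = [
    sort_stage("qty"),
    match_stage(lambda d: d["qty"] > 1),
    group_sum_stage("city", "qty"),
]
result = run_pipeline(orders, pipeline)
```

Here the optimizer runs the filter before the sort, and the grouped totals are the same as they would be in the original stage order.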

SYSTEM PERFORMANCE LOGGING OF COMPLEX REMOTE QUERY PROCESSOR QUERY OPERATIONS

Described are methods, systems and computer readable media for performance logging of complex query operations.

PARALLEL PROCESSING DATABASE SYSTEM

A method and system for executing database queries in parallel using a shared metadata store. The metadata store may reside on a master node, where the master node is the root node in a tree. The master node may distribute query plans and query metadata to other nodes in the cluster. These additional nodes may request additional metadata from each other or from the master node as necessary.

Runtime metric estimations for functions

In some examples, a system receives function descriptors for different types of functions to be used when processing database queries, each function descriptor of the function descriptors comprising information relating to a respective function of the different types of functions. The system computes, based on a first function descriptor for a first function of the different types of functions, an estimate of a runtime metric associated with execution of the first function for processing a database query.

Method and Query Optimization Server for Associating Functions with Columns for Optimizing Query Execution
20180011901 · 2018-01-11

A method of optimizing query execution by associating functions with columns includes receiving, by a query optimization server, a data definition statement including information on one or more columns and function information for each of the one or more columns. The query optimization server associates the columns having the function information with corresponding predefined functions and stores the associations in a memory. Upon receiving a query comprising a function associated with a column, the query optimization server compares the function with the predefined functions stored in the memory. Based on the comparison, the query optimization server accesses the predefined function from the memory for executing the query.

Systems and methods for managing a highly available distributed hybrid transactional and analytical database

Systems and methods for managing a highly available distributed hybrid database comprising: a memory storing instructions; and one or more processors configured to execute the instructions to: receive a query from a user device to retrieve data from a distributed database comprising a source node, a first plurality of replica nodes, and a second plurality of replica nodes, wherein the source node and the first plurality of replica nodes form a transactional cluster, and wherein the second plurality of replica nodes forms an analytical cluster; determine whether to process the query using the transactional cluster or the analytical cluster based on one or more rules; translate the query into a first protocol that the determined cluster comprehends; select a replica node corresponding to the determined cluster; process the query using the selected replica node; and send data associated with results from processing the query to the user device.
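
The routing step can be sketched in Python under stated assumptions. The abstract says only that the cluster is chosen "based on one or more rules"; the single rule below (queries that aggregate go to the analytical cluster), the round-robin replica selection, and the names `QueryRouter` and `ANALYTICAL_PATTERN` are hypothetical. Protocol translation and query execution are omitted.

```python
import itertools
import re

# Hypothetical rule: a query that groups or aggregates is analytical.
ANALYTICAL_PATTERN = re.compile(
    r"\b(GROUP\s+BY|SUM|AVG|COUNT|MIN|MAX)\b", re.IGNORECASE
)


class QueryRouter:
    def __init__(self, transactional_replicas, analytical_replicas):
        # Assumed selection policy: round-robin within each cluster.
        self._next_replica = {
            "transactional": itertools.cycle(transactional_replicas),
            "analytical": itertools.cycle(analytical_replicas),
        }

    @staticmethod
    def choose_cluster(query):
        """Apply the rule set to pick a cluster for the query."""
        if ANALYTICAL_PATTERN.search(query):
            return "analytical"
        return "transactional"

    def route(self, query):
        """Return the chosen cluster and the replica node selected in it."""
        cluster = self.choose_cluster(query)
        return cluster, next(self._next_replica[cluster])


router = QueryRouter(["t1", "t2"], ["a1", "a2"])
lookup = router.route("SELECT name FROM users WHERE id = 1")
report = router.route("SELECT region, SUM(amount) FROM orders GROUP BY region")
report_again = router.route("SELECT COUNT(*) FROM orders")
```

A keyword rule like this is only a stand-in; a real rule set might also consider table sizes, expected latency, or replica lag.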