Patent classifications
G06F16/24554
SYSTEMS AND METHODS FOR DYNAMIC PARTITIONING IN DISTRIBUTED ENVIRONMENTS
Methods, systems, and computer-readable media are disclosed for dynamic partitioning in distributed computing environments. One method includes: receiving a first data set and a second data set; mapping the first data set into a first set of key-value pairs; mapping the second data set into a second set of key-value pairs; estimating, using a sketch, a frequency count for each key based on the first set of key-value pairs and the second set of key-value pairs; determining whether the estimated frequency count for each key is greater than or equal to a predetermined threshold; and partitioning the key when the estimated frequency count for the key is greater than or equal to the predetermined threshold.
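The frequency-estimation and threshold steps of this method can be illustrated with a count-min sketch. This is only a minimal sketch under stated assumptions: the abstract says "a sketch" without naming a type, so the count-min structure, the class `CountMinSketch`, the function `keys_to_partition`, and the width and depth values are all hypothetical choices made for illustration.

```python
import hashlib


class CountMinSketch:
    """Approximate per-key counter. Estimates may overcount but never undercount."""

    def __init__(self, width=1024, depth=4):
        # width and depth are illustrative values, not taken from the source
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _slot(self, key, row):
        # Derive one hash per row by salting the key with the row index
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.rows[row][self._slot(key, row)] += count

    def estimate(self, key):
        # The smallest counter across rows is the least-inflated estimate
        return min(self.rows[row][self._slot(key, row)] for row in range(self.depth))


def keys_to_partition(first_pairs, second_pairs, threshold):
    """Return keys whose estimated frequency across both sets meets the threshold."""
    all_pairs = list(first_pairs) + list(second_pairs)
    sketch = CountMinSketch()
    for key, _value in all_pairs:
        sketch.add(key)
    return {key for key, _value in all_pairs if sketch.estimate(key) >= threshold}


# Two mapped data sets as key-value pairs (illustrative data)
first_set = [("a", 1)] * 5 + [("b", 1)]
second_set = [("a", 2)] * 5 + [("c", 1)]
hot_keys = keys_to_partition(first_set, second_set, threshold=10)
```

Because a count-min sketch can only overestimate, a key that truly reaches the threshold is always selected; an infrequent key could be selected by mistake only through hash collisions.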
DATA ARRANGEMENT MANAGEMENT IN A DISTRIBUTED DATA CLUSTER ENVIRONMENT OF A SHARED POOL OF CONFIGURABLE COMPUTING RESOURCES
Disclosed aspects relate to data arrangement management in a distributed data cluster environment of a shared pool of configurable computing resources. In the distributed data cluster environment, a set of data is monitored for a data redistribution candidate trigger. The data redistribution candidate trigger is detected with respect to the set of data. Based on the data redistribution candidate trigger, the set of data is analyzed with respect to a candidate data redistribution action. Using the candidate data redistribution action, a new data arrangement associated with the set of data is determined. Accordingly, the new data arrangement is established.
Scalable, schemaless document query model
Query models for document sets (such as XML documents or records in a relational database) typically involve a schema defining the structure of the documents. However, rigidly defined schemas often make document validation difficult when documents contain even inconsequential structural variations. Additionally, queries developed against schema-constrained documents are often sensitive to structural details and variations that are inconsequential to the query, which leads to inaccurate results and development complications, and such queries may break upon schema changes. Instead, query models for hierarchically structured documents can enable “twig” queries that specify only the structural details of document nodes that are relevant to the query (e.g., students in a student database having a sibling named “Lee” and a teacher named “Smith,” irrespective of unrelated structural details of the document). Such “twig” query models may enable more natural query development and continued accuracy of queries in the event of unrelated schema variations and changes.
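The student example can be made concrete with a small matcher over XML. This is a hedged illustration, not the patented query model: the tuple-based twig format `(tag, text, branches)`, the functions `matches` and `twig_query`, and the sample document are all assumptions introduced here. Each branch of the twig only has to be satisfied by some descendant at any depth, so unrelated structure is ignored.

```python
import xml.etree.ElementTree as ET


def matches(element, twig):
    """Check whether element satisfies a twig of the form (tag, text, branches)."""
    tag, text, branches = twig
    if element.tag != tag:
        return False
    if text is not None and (element.text or "").strip() != text:
        return False
    # Every branch must match some descendant, however deeply it is nested
    return all(
        any(matches(d, branch) for d in element.iter() if d is not element)
        for branch in branches
    )


def twig_query(root, twig):
    """Return all elements in the document that satisfy the twig."""
    return [e for e in root.iter(twig[0]) if matches(e, twig)]


# Illustrative document: the three students use different internal layouts
DOC = """
<school>
  <student><name>Kim</name>
    <sibling><name>Lee</name></sibling>
    <teacher><name>Smith</name></teacher>
  </student>
  <student><name>Park</name>
    <family><sibling><name>Lee</name></sibling></family>
    <courses><course><teacher><name>Smith</name></teacher></course></courses>
  </student>
  <student><name>Cho</name>
    <sibling><name>Lee</name></sibling>
    <teacher><name>Jones</name></teacher>
  </student>
</school>
"""

# Students having a sibling named "Lee" and a teacher named "Smith"
TWIG = ("student", None, [
    ("sibling", None, [("name", "Lee", [])]),
    ("teacher", None, [("name", "Smith", [])]),
])

root = ET.fromstring(DOC)
found = [s.find("name").text for s in twig_query(root, TWIG)]
```

Here `found` contains Kim and Park: both satisfy the twig even though Park's sibling and teacher sit under extra wrapper elements, while Cho is excluded because the teacher is not named Smith.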
Partition-aware distributed execution of window operator
Partition-aware calculation of a window operator can be supported. Different nodes can calculate window function sub-results on database partitions locally, in parallel and independently. Recognition of scenarios in which such parallelism is permissible can be performed. Overall superior performance can result.
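The idea of computing window sub-results locally can be sketched as follows. The abstract does not say how permissible scenarios are recognized, so the rule in `can_run_locally` (the table's partitioning columns are a subset of the window's PARTITION BY columns) is an assumption, as are all function names and the running-sum window function chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter


def can_run_locally(window_partition_cols, table_partition_cols):
    """Assumed rule: if the table is partitioned on a subset of the window's
    PARTITION BY columns, every window partition lies inside one database
    partition, so no rows need to move between nodes."""
    return bool(table_partition_cols) and set(table_partition_cols) <= set(window_partition_cols)


def running_sum(rows, key, order, value):
    """Compute SUM(value) OVER (PARTITION BY key ORDER BY order) on one node."""
    out = []
    ordered = sorted(rows, key=itemgetter(key, order))
    for _, group in groupby(ordered, key=itemgetter(key)):
        total = 0
        for row in group:
            total += row[value]
            out.append({**row, "running_sum": total})
    return out


def windowed(partitions, key, order, value):
    """Evaluate the window function on each database partition in parallel."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda part: running_sum(part, key, order, value), partitions)
        return [row for part in results for row in part]


# Two database partitions, split by region (illustrative data)
partitions = [
    [{"region": "A", "day": 2, "sales": 5}, {"region": "A", "day": 1, "sales": 3}],
    [{"region": "B", "day": 1, "sales": 7}],
]
if can_run_locally(window_partition_cols=["region"], table_partition_cols=["region"]):
    result = windowed(partitions, key="region", order="day", value="sales")
```

Threads stand in for database nodes here; the point is only that each partition is processed independently and the sub-results are concatenated without a global shuffle.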
Targeted sweep method for key-value data storage
A computer-implemented method for targeted sweep of a key-value data storage is provided. The method comprises: before a write transaction to a database having a key-value store commits, and before each of one or more write commands of the write transaction is persisted to the key-value store, writing an entry for each of the one or more write commands to an end of a targeted sweep queue, the entry comprising metadata including: data identifying a cell to which the write command relates, a start timestamp of the write transaction, and information identifying a type of the write transaction.
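The ordering described here (enqueue sweep entries first, then persist the writes) can be sketched with an in-memory store. Everything named below is hypothetical: `SweepEntry`, `Store`, the `(table, row, column)` cell layout, and the string used for the transaction type are illustrative assumptions, and the commit step itself is left out.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class SweepEntry:
    """Metadata recorded for one write command."""
    cell: tuple           # assumed layout: (table, row, column)
    start_timestamp: int  # start timestamp of the write transaction
    txn_type: str         # information identifying the transaction type


class Store:
    def __init__(self):
        self.sweep_queue = deque()  # targeted sweep queue, appended at the end
        self.kv = {}                # key-value store: (cell, timestamp) -> value

    def write_transaction(self, start_timestamp, txn_type, writes):
        # Step 1: before anything is persisted, append one queue entry per write
        for cell in writes:
            self.sweep_queue.append(SweepEntry(cell, start_timestamp, txn_type))
        # Step 2: persist the write commands to the key-value store
        for cell, value in writes.items():
            self.kv[(cell, start_timestamp)] = value
        # Step 3 (omitted in this sketch): commit the transaction


store = Store()
store.write_transaction(
    start_timestamp=10,
    txn_type="example-type",  # placeholder value, not from the source
    writes={("users", "row1", "email"): "a@example.com"},
)
```

Because the queue entry exists before the write is persisted, a later sweep process can find every cell that may have been written, including cells written by transactions that never committed.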
SYSTEM AND METHODS FOR PROCESSING LARGE SCALE DATA
A clustered system is provided for querying large amounts of data at high speed, allowing for variable sampling and speculation to speed up subsequent queries. An API using actor messages is provided so that the user can send SQL queries, the desired sample rate, and the cube schema in which the user believes all queries in this session should fit. The system is agnostic to the underlying data store and can utilize any system that supports aggregation.
COMPUTER SYSTEM AND METHOD FOR INDEXING AND RETRIEVAL OF PARTIALLY SPECIFIED TYPE-LESS SEMI-INFINITE INFORMATION
A system for Partial Unstructured Information Processing, constituting storing, indexing, querying and retrieval of partially specified unstructured data, the system comprising: a Quantum Clustering Algorithm that partitions data records into different clusters such that the data in each cluster can be indexed efficiently, a Compressed Ternary Tree that replaces all conceivable indices for each cluster thereby solving the Unthinkable Query Problem for each cluster, and a Virtual Query Processor that converts traditional database queries to raw Compressed Ternary Tree queries and appropriate filters.
SYSTEMS AND METHODS FOR EXTENDING THE DATA MODEL OF A MONOLITHIC DATABASE THROUGH A MICROSERVICE FOR A MULTI-TENANT PLATFORM
A multi-tenant system comprises a monolithic database storing global records, each including global fields common for all tenants; a custom field database storing custom records, each including custom fields for a tenant; a custom field record service processing a custom record storage request by instructing the custom field database to store custom field values of the custom record for the tenant, and processing a custom record fetch request by instructing the custom field database to retrieve the custom field values; a monolithic application configured to receive a record storage or fetch request, configured to partition the record storage request into the global record storage request and the custom record storage request, configured to send the custom record storage request to the custom field record service, configured to partition the record fetch request into the global record fetch request and the custom record fetch request, and configured to send the custom record fetch request to the custom field record service.
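The request partitioning described above can be sketched with two in-memory stores. This is an illustrative assumption rather than the claimed implementation: the field names in `GLOBAL_FIELDS`, the classes `CustomFieldRecordService` and `MonolithicApp`, and their method names are all hypothetical.

```python
# Assumed set of global fields common to all tenants (hypothetical names)
GLOBAL_FIELDS = {"id", "name", "email"}


class CustomFieldRecordService:
    """Stands in for the microservice in front of the custom field database."""

    def __init__(self):
        self._custom_db = {}  # (tenant, record_id) -> custom field values

    def store(self, tenant, record_id, custom_values):
        self._custom_db[(tenant, record_id)] = dict(custom_values)

    def fetch(self, tenant, record_id):
        return dict(self._custom_db.get((tenant, record_id), {}))


class MonolithicApp:
    """Splits each request into a global part and a custom part."""

    def __init__(self, service):
        self._global_db = {}  # stands in for the monolithic database
        self._service = service

    def store_record(self, tenant, record):
        global_part = {k: v for k, v in record.items() if k in GLOBAL_FIELDS}
        custom_part = {k: v for k, v in record.items() if k not in GLOBAL_FIELDS}
        self._global_db[record["id"]] = global_part
        self._service.store(tenant, record["id"], custom_part)

    def fetch_record(self, tenant, record_id):
        record = dict(self._global_db[record_id])
        record.update(self._service.fetch(tenant, record_id))
        return record


service = CustomFieldRecordService()
app = MonolithicApp(service)
original = {"id": 7, "name": "Ada", "email": "ada@example.com", "loyalty_tier": "gold"}
app.store_record("tenant-a", original)
fetched = app.fetch_record("tenant-a", 7)
```

The monolithic schema never sees `loyalty_tier`; only the service's store holds it, which is what lets a tenant add fields without altering the shared database.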
Analytics query response transmission
Transmission handling of an analytics query response includes a search head, in a data intake and query system, receiving a query from an analytics system. The search head distributes at least a portion of the query to at least one indexer for processing the query. The at least one indexer transmits events matching the query to the analytics system, bypassing the search head. The search head receives, from the at least one indexer, data regarding the events and sends the data regarding the events to the analytics system.
System and method for accelerated data search of database storage system
Embodiments of the present disclosure provide a system for accelerated data search of a database storage system. The system includes a host device including a database storage engine; and a memory system including a controller and a memory device, which includes a plurality of pages storing multiple records. The controller includes a page processing accelerator configured to: read, from the plurality of pages, multiple pages in response to a filtered read command; filter particular pages among the multiple pages based on a column full search condition, the filtered pages including entries satisfying the column full search condition; and transfer, to the host device, information regarding the filtered pages.
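The filtering step performed by the page processing accelerator can be sketched in software. This is a loose illustration under assumptions: the abstract does not define a "column full search condition" precisely, so it is modeled here as a predicate applied to one column of every record, and `Page`, `filtered_read`, and the returned summary format are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_id: int
    records: list  # each record is a dict mapping column name -> value


def filtered_read(pages, column, predicate):
    """Scan every record on every page and report only the pages that contain
    at least one record whose column value satisfies the predicate."""
    info = []
    for page in pages:
        hits = sum(1 for r in page.records if column in r and predicate(r[column]))
        if hits:
            # Only a summary of each matching page goes back to the host
            info.append({"page_id": page.page_id, "matching_records": hits})
    return info


# Illustrative pages held by the memory device
pages = [
    Page(0, [{"city": "Seoul"}, {"city": "Busan"}]),
    Page(1, [{"city": "Paris"}]),
    Page(2, [{"city": "Seoul"}, {"city": "Seoul"}]),
]
page_info = filtered_read(pages, "city", lambda value: value == "Seoul")
```

The benefit suggested by the abstract is that pages without matches, such as page 1 here, never need to be transferred to the host.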