Patent classifications
G06F16/254
DATA-SHARDING FOR EFFICIENT RECORD SEARCH
Data-sharding systems and/or methods for cost- and time-efficient record search are described. Data-sharding embodiments utilize a name-sharding dimension, optionally in combination with one or more additional dimensions such as record type and year, to reduce latency and search-associated costs. The data-sharding system and method embodiments utilize an optimization algorithm to determine a distribution of records related to names. The optimization algorithm may use a three-character prefix for surnames in records to distribute shards across documents, with specific shards allocated to no-name and multi-name records.
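As a rough illustration of the name-sharding dimension, the sketch below derives a shard key from a three-character surname prefix and routes no-name and multi-name records to dedicated shards. The function name `shard_key`, the labels `NO_NAME` and `MULTI_NAME`, and the `/`-joined key layout are assumptions made for this example; the abstract does not specify them, and the optimization algorithm that balances the distribution is not modeled here.

```python
def shard_key(surnames, record_type=None, year=None):
    """Build a hypothetical shard key from a record's surnames.

    Assumed layout: <name part>[/<record type>][/<year>].
    """
    # Ignore empty or whitespace-only surname entries.
    names = [s.strip() for s in surnames if s and s.strip()]
    if not names:
        name_part = "NO_NAME"        # dedicated shard for records without a name
    elif len(names) > 1:
        name_part = "MULTI_NAME"     # dedicated shard for records with several names
    else:
        # Keep letters only, upper-case them, and take the first three.
        letters = "".join(ch for ch in names[0].upper() if ch.isalpha())
        name_part = letters[:3] or "NO_NAME"
    parts = [name_part]
    # Optional additional sharding dimensions.
    if record_type is not None:
        parts.append(str(record_type))
    if year is not None:
        parts.append(str(year))
    return "/".join(parts)


print(shard_key(["Smith"]))
print(shard_key(["O'Brien"], "census", 1900))
```

A search for a given surname then only needs to touch the shard for its prefix plus the no-name and multi-name shards, instead of scanning every shard.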
INTERACTIVELY BUILDING PREVIEWS OF EXTRACT, TRANSFORM, LOAD (ETL) GRAPHS USING CACHED PREVIEWS OF SUBGRAPHS
Disclosed are some implementations of systems, apparatus, methods and computer program products for executing a process flow represented by a graph or portion thereof using cached subgraphs. A first request to execute a first portion of a process flow is processed, where the first portion of the process flow is represented by a first subgraph of a graph representing the process flow and a final node of the first subgraph corresponds to a set of computer-readable instructions. The first portion of the process flow is executed such that a first output of executing the first portion of the process flow is obtained. The first subgraph is stored in association with the first output in a first cache entry of a cache. A second request to execute a second portion of the process flow is processed, where the second portion of the process flow is represented by a second subgraph of the graph. At least one cache entry for which a corresponding subgraph matches at least a portion of the second subgraph is identified in the cache, where the at least one cache entry includes the first cache entry. The first output is retrieved from the first cache entry, a node of the second subgraph to which the final node of the first subgraph is connected is identified, and the second portion of the process flow is executed by providing the first output as input to the identified node of the second subgraph without executing the set of computer-readable instructions.
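The caching idea can be sketched as follows, assuming each cache entry is keyed by a structural signature of the subgraph that ends at a node. The names `Node`, `signature`, and `PreviewCache` are hypothetical and are not taken from the disclosure; the signature here uses node names only, which is a simplification.

```python
class Node:
    """One step of a process flow; `inputs` are the upstream nodes."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func
        self.inputs = list(inputs)


def signature(node):
    """Hashable description of the subgraph whose final node is `node`."""
    return (node.name, tuple(signature(i) for i in node.inputs))


class PreviewCache:
    def __init__(self):
        self.entries = {}      # subgraph signature -> cached output
        self.executions = []   # names of nodes that were actually executed

    def run(self, node):
        key = signature(node)
        if key in self.entries:
            # A matching subgraph was executed before: reuse its output.
            return self.entries[key]
        args = [self.run(i) for i in node.inputs]
        self.executions.append(node.name)
        output = node.func(*args)
        self.entries[key] = output
        return output


cache = PreviewCache()
extract = Node("extract", lambda: [1, 2, 3, 4])
keep_even = Node("keep_even", lambda rows: [r for r in rows if r % 2 == 0], [extract])
first_output = cache.run(keep_even)    # first request: executes both nodes

total = Node("total", lambda rows: sum(rows), [keep_even])
second_output = cache.run(total)       # second request: reuses the cached prefix
```

In the second request only `total` is executed; the output of the `keep_even` subgraph is read from the cache and supplied as input to the connected node.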
LOOKUP AND RELATIONSHIP CACHES FOR DYNAMIC FETCHING
Disclosed are methods, systems, and computer-readable media for providing report results. Viscous attributes and non-viscous attributes may be identified. A smart cube may be received and may include viscous values for the viscous attributes. The smart cube may be stored at a local cache. A report associated with an organization may be initiated. A runtime generation of the report may be generated based on initiating the report. The report may call a viscous attribute from the viscous attributes and call a non-viscous attribute from the non-viscous attributes. The runtime generation may be modified to remove the viscous attribute from the runtime generation. A viscous value for the viscous attribute may be retrieved from the smart cube at the local cache. The modified runtime generation may be executed to retrieve a non-viscous value for the non-viscous attribute from a remote database, and a report result may be provided.
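A minimal sketch of the split between cached and remote lookups, assuming the smart cube can be modeled as a dictionary in the local cache and the remote database as a callable. The names `run_report` and `fake_remote` and the sample attributes are invented for illustration only.

```python
def run_report(requested, viscous_attrs, local_cube, fetch_remote):
    """Serve viscous attributes from the local cube, the rest remotely."""
    # Attributes that are viscous and present in the locally cached cube.
    from_cache = [a for a in requested if a in viscous_attrs and a in local_cube]
    # "Modified runtime generation": the request with cached attributes removed.
    remaining = [a for a in requested if a not in from_cache]
    result = {a: local_cube[a] for a in from_cache}
    if remaining:
        result.update(fetch_remote(remaining))
    return result


remote_calls = []


def fake_remote(attrs):
    """Stand-in for the remote database; records what was requested."""
    remote_calls.append(list(attrs))
    return {a: f"remote:{a}" for a in attrs}


report = run_report(
    requested=["region", "daily_sales"],
    viscous_attrs={"region"},
    local_cube={"region": "EMEA"},
    fetch_remote=fake_remote,
)
```

Only `daily_sales` reaches the remote stand-in; `region` is answered from the local cube.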
LOW LATENCY INGESTION INTO A DATA SYSTEM
Described herein are techniques for improving transfer of metadata from a metadata database to a database stored in a data system, such as a data warehouse. The metadata may be written into the metadata database with a version stamp, which is a monotonically increasing register value, and a partition identifier, which can be generated using attribute values of the metadata. A plurality of readers can scan the metadata database based on version stamp and partition identifier values to export the metadata to a cloud storage location. From the cloud storage location, the exported data can be automatically ingested into the database, which includes a journal and snapshot table.
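The write and scan paths might look like the sketch below. It assumes an in-memory list in place of the metadata database, a simple counter as the version stamp, and a SHA-256 hash of the attribute values as the partition identifier; none of these choices, nor the `MetadataStore` name, come from the abstract.

```python
import hashlib
import itertools


class MetadataStore:
    """In-memory stand-in for the metadata database (illustrative only)."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._versions = itertools.count(1)   # monotonically increasing stamp
        self.rows = []

    def partition_id(self, attrs):
        # Assumed scheme: hash the attribute values into a fixed partition range.
        digest = hashlib.sha256("|".join(attrs).encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.num_partitions

    def write(self, attrs, payload):
        row = {
            "version": next(self._versions),
            "partition": self.partition_id(attrs),
            "payload": payload,
        }
        self.rows.append(row)
        return row

    def scan(self, partition, after_version):
        """Rows a reader for `partition` has not exported yet."""
        return [
            r for r in self.rows
            if r["partition"] == partition and r["version"] > after_version
        ]


store = MetadataStore(num_partitions=4)
for i in range(10):
    store.write(attrs=("table", str(i)), payload={"n": i})

# One reader per partition, each starting from version 0.
exported = [row for p in range(4) for row in store.scan(p, after_version=0)]
```

Each reader can remember the highest version it exported and pass it as `after_version` on the next scan, so repeated scans pick up only new rows.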
Generating real-time aggregates at scale for inclusion in one or more modified fields in a produced subset of data
A data processing system for producing a subset of data from a plurality of data sources, including: memory storing a plurality of data sources to be represented in an editor interface; a data structure modification module that selects a plurality of data sources to be represented in an editor interface and generates a subset of data included in the plurality of data sources; memory that stores the selected data structures included in the subset, with at least one of the stored data structures including the one or more modified attributes of the one or more respective fields; a rendering module that displays, in the editor interface, representations of the stored data structures; and a segmentation module that segments a plurality of received data records.
Techniques for relationship discovery between datasets
The present disclosure relates to techniques for analyzing data from multiple different data sources to determine a relationship between the data (also referred to herein as “data relationship discovery”). The relationships between any two compared datasets may be used to determine one or more recommendations for merging (e.g., joining), or “blending,” the datasets together. Relationship discovery may include determining a relationship between a subset of data, such as a relationship between a pair of columns, or column pair, each column in a different dataset of the datasets that are compared. Given two datasets to process for relationship discovery, relationship discovery may identify and recommend a ranked subset of column pairs between the two compared datasets. The ranked column pairs identified as a relationship may be useful for blending the datasets with respect to those column pairs.
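One simple way to produce a ranked list of column pairs is to score every cross-dataset pair by value overlap. The sketch below uses the Jaccard index as the score; this metric, the function name `rank_column_pairs`, and the sample datasets are assumptions chosen for illustration and are not stated in the disclosure.

```python
def rank_column_pairs(left, right, top_k=3):
    """Rank (left column, right column) pairs by Jaccard overlap of values.

    `left` and `right` map column names to lists of values.
    """
    scored = []
    for left_name, left_values in left.items():
        for right_name, right_values in right.items():
            a, b = set(left_values), set(right_values)
            union = a | b
            score = len(a & b) / len(union) if union else 0.0
            scored.append((left_name, right_name, score))
    # Highest score first; ties broken by column names for determinism.
    scored.sort(key=lambda t: (-t[2], t[0], t[1]))
    return scored[:top_k]


orders = {"customer_id": [1, 2, 3, 4], "amount": [10, 20, 30, 40]}
customers = {"id": [1, 2, 3, 5], "age": [30, 40, 50, 60]}

ranked = rank_column_pairs(orders, customers, top_k=2)
```

Here `customer_id` and `id` rank first. The second-ranked pair, `amount` and `age`, overlaps only by coincidence, which is why a recommendation step would normally combine overlap with other signals such as data types or column names.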
Self-service data provisioning system
A data exchange that provides self-service data provisioning is provided. The data exchange may include a raw data layer, a model data layer, a plurality of workspaces and a testing environment. The raw data layer may be a landing zone for raw data records received from systems of record. The raw data layer may receive a plurality of raw data records, model and process the data records and transfer the data records to the model data layer. The model data layer may be a data layer that includes data modeled to data exchange specifications and enables queries to be executed on the data included in the model data layer. Each workspace may be allocated to a consumer. The consumer may query the plurality of data records within the model data layer. The testing environment may test scripts to ensure that the scripts conform to a predetermined set of testing specifications.
Compatibility-based feature management for data prep applications
A method executes at a computing device having a display, processors, and memory. The device displays a user interface for a data preparation application, including icons in a flow element palette, each icon representing a parameterized operation that can be inserted into data preparation flows in a flow pane of the user interface. A user places icons into the flow pane, visually defining flow elements for a flow that extracts data from selected data sources, transforms the extracted data, and exports the transformed data. The device retrieves the version number of a corresponding server application running on a server. Using a feature matrix, the device determines which flow elements are not supported by the data prep server application according to the version number. When there are flow elements not supported by the data prep server application running on the server, the device indicates this to the user.
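The compatibility check can be sketched as a lookup in a feature matrix that maps each flow element to the earliest server version supporting it. The matrix contents, the element names, the dotted version format, and the rule that unknown elements count as unsupported are all assumptions made for this example.

```python
# Hypothetical feature matrix: flow element -> minimum supporting server version.
FEATURE_MATRIX = {
    "clean": (2018, 1),
    "pivot": (2019, 1),
    "script": (2019, 3),
}


def parse_version(text):
    """Turn a dotted version string such as '2019.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def unsupported_elements(flow_elements, server_version, matrix=FEATURE_MATRIX):
    """Return the flow elements the server at `server_version` cannot run."""
    server = parse_version(server_version)
    # Assumption: elements missing from the matrix are treated as unsupported.
    return [
        element for element in flow_elements
        if element not in matrix or matrix[element] > server
    ]


flagged = unsupported_elements(["clean", "pivot", "script"], "2019.2")
print(flagged)
```

The user interface could then mark the flagged elements in the flow pane before the flow is published to the server.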
Merging buckets in a data intake and query system
Systems and methods are disclosed for processing and executing queries in a data intake and query system. An indexing system of the data intake and query system receives data and stores at least a portion of it in buckets, which are then stored in a shared storage system. The indexing system merges multiple buckets to generate merged buckets and uploads the merged buckets to the shared storage system.
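A bucket merge can be sketched as combining several time-ordered event lists into one. The bucket layout (a dictionary with `start`, `end`, and `events` fields) and the function name `merge_buckets` are hypothetical, and the upload to shared storage is left out.

```python
import heapq


def merge_buckets(buckets):
    """Merge buckets whose events are already sorted by timestamp.

    Each bucket is assumed to look like
    {"start": int, "end": int, "events": [(timestamp, text), ...]}.
    """
    if not buckets:
        raise ValueError("need at least one bucket to merge")
    # heapq.merge walks the sorted inputs lazily and keeps the result sorted.
    events = list(heapq.merge(*(b["events"] for b in buckets), key=lambda e: e[0]))
    return {
        "start": min(b["start"] for b in buckets),
        "end": max(b["end"] for b in buckets),
        "events": events,
    }


small_a = {"start": 100, "end": 130, "events": [(100, "login"), (130, "logout")]}
small_b = {"start": 110, "end": 150, "events": [(110, "query"), (150, "error")]}

merged = merge_buckets([small_a, small_b])
```

Fewer, larger buckets generally mean fewer objects to list and open at search time, which is one plausible motivation for merging before upload.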
Systems and methods for configuring system memory for extraction of latent information from big data
A system for extracting latent information from data includes obtaining or generating components of the data, where the data components include scores indicating how each component relates to the data. Memory is allocated for the components and the components are stored in the allocated memory. The components are then transformed into documents using a suitable transformation function, and the documents are analyzed using natural language processing to extract latent information contained in the data.