Patent classifications
G06F16/2282
Processing of computer readable tables in a datalake
Systems and methods for identifying one or more master tables of a datalake are described. A system may obtain a plurality of computer readable tables of a datalake (with each computer readable table including one or more features). The system may also group the plurality of computer readable tables into a plurality of groups based on a number of features of each computer readable table of the plurality of computer readable tables. The system may further generate, for each of one or more groups of the plurality of groups, one or more neighborhoods based on a similarity of features between computer readable tables of the group. The system may also identify, for each neighborhood, one or more master tables from the one or more computer readable tables of the group. The system may further provide an indication of one or more master tables identified in the datalake.
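The three stages in this abstract (group by feature count, form neighborhoods by feature similarity, pick master tables) can be sketched as follows. This is a minimal illustration and not the claimed method: the bucket width, the Jaccard similarity measure, the 0.5 threshold, and the rule "the master is the table with the most features" are all assumptions made for the example, and the function names are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two collections of feature names."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


def find_master_tables(tables, bucket_size=5, threshold=0.5):
    """tables maps a table name to its list of feature (column) names."""
    # Stage 1: group tables by number of features (bucket width is assumed).
    groups = defaultdict(list)
    for name, features in tables.items():
        groups[len(features) // bucket_size].append(name)

    masters = []
    for names in groups.values():
        # Stage 2: neighborhoods are connected components of the graph whose
        # edges join tables with similarity >= threshold (assumed rule).
        unvisited = set(names)
        while unvisited:
            seed = min(unvisited)
            unvisited.remove(seed)
            neighborhood, stack = [seed], [seed]
            while stack:
                current = stack.pop()
                close = [n for n in sorted(unvisited)
                         if jaccard(tables[current], tables[n]) >= threshold]
                for n in close:
                    unvisited.remove(n)
                    neighborhood.append(n)
                    stack.append(n)
            # Stage 3: assumed rule, the widest table is the master.
            masters.append(max(neighborhood, key=lambda n: (len(tables[n]), n)))
    return sorted(masters)


tables = {
    "orders_v1": ["id", "customer", "total"],
    "orders_v2": ["id", "customer", "total", "currency"],
    "users": ["uid", "email", "name"],
}
print(find_master_tables(tables))
```

Here the two order tables share most of their features and fall into one neighborhood, while the user table forms its own.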
Dynamic updating of query result displays
Described are methods, systems and computer readable media for dynamic updating of query result displays.
Dynamic performance tuning based on implied data characteristics
Techniques for improving system performance based on data characteristics are disclosed. A system may receive updates to a first data set at a first frequency. The system selects a first storage configuration, from a plurality of storage configurations, for storing the first data set based on the first frequency, and stores the first data set in accordance with the first storage configuration. The system may further receive updates to a second data set at a second frequency. The system selects a second storage configuration, from the plurality of storage configurations, for storing the second data set based on the second frequency, and stores the second data set in accordance with the second storage configuration. The second storage configuration is different than the first storage configuration.
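Selecting a storage configuration from the observed update frequency can be sketched as a simple threshold lookup. The configuration names and frequency limits below are hypothetical placeholders chosen for illustration; the abstract does not name any particular configurations or cut-offs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageConfig:
    name: str
    max_updates_per_hour: float  # highest update rate this config is meant for


# Hypothetical configurations, ordered from least to most frequently updated.
CONFIGS = [
    StorageConfig("columnar_compressed", 1),
    StorageConfig("row_store", 1000),
    StorageConfig("in_memory", float("inf")),
]


def select_storage_config(updates_per_hour, configs=CONFIGS):
    """Return the first configuration whose limit covers the observed rate."""
    for config in configs:
        if updates_per_hour <= config.max_updates_per_hour:
            return config
    return configs[-1]


first = select_storage_config(0.1)    # rarely updated data set
second = select_storage_config(5000)  # frequently updated data set
print(first.name, second.name)
```

Two data sets with sufficiently different update frequencies end up with different configurations, which is the behavior the abstract describes.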
Database with client-controlled encryption key
A distributed database encrypts a table using a table encryption key protected by a client master encryption key. The encrypted table is replicated among a plurality of nodes of the distributed database. The table encryption key is replicated among the plurality of nodes, and is stored on each node in a respective secure memory. In the event of node failure, a copy of the stored key held by another member of the replication group is used to restore a node to operation. The replication group may continue operation in the event of a revocation of authorization to access the client master encryption key.
Accelerating change data capture determination using row bitsets
Techniques described herein can accelerate change data capture determinations such as stream reads, which show changes made to a table between two points in time. Three distinct row bitsets that mark deleted, updated, and inserted rows in micro-partitions can be added as metadata for the table. These bitsets can be generated during DML operations and then stored as metadata of the new partition generated by the DML operations. The bitsets can then be used to generate streams showing the changes in the table between two points in time (the changes interval).
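A minimal sketch of the idea, using Python integers as bitsets. The Partition class, the version counter, and the choice to keep deleted rows in the partition as tombstones are assumptions made to keep the example short; the abstract does not specify how deleted rows are physically represented.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    rows: list
    version: int       # table version at which a DML operation created this partition
    inserted: int = 0  # bit i set: row i was inserted by that DML operation
    updated: int = 0   # bit i set: row i was updated by that DML operation
    deleted: int = 0   # bit i set: row i was deleted (kept as a tombstone here)


def rows_for(bitset, rows):
    """Return the rows whose bit is set in the bitset."""
    return [row for i, row in enumerate(rows) if (bitset >> i) & 1]


def read_stream(partitions, start_version, end_version):
    """Collect changes made after start_version, up to and including end_version."""
    changes = {"inserted": [], "updated": [], "deleted": []}
    for p in partitions:
        if start_version < p.version <= end_version:
            changes["inserted"] += rows_for(p.inserted, p.rows)
            changes["updated"] += rows_for(p.updated, p.rows)
            changes["deleted"] += rows_for(p.deleted, p.rows)
    return changes


p1 = Partition(rows=["a", "b", "c"], version=1, inserted=0b111)
p2 = Partition(rows=["a", "B", "c"], version=2, updated=0b010, deleted=0b100)
print(read_stream([p1, p2], start_version=1, end_version=2))
```

Because the bitsets are stored as partition metadata, the stream read only tests bits and does not need to compare row contents between versions.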
File defragmentation service
The subject technology selects a most recently created file from a set of files stored in a source table. The subject technology iterates, in the source table, starting from the most recently created file up to an age threshold to select a first set of files for performing a first defragmentation process. The subject technology sets an indication corresponding to a particular file that is a last file, from the first set of files, that meets the age threshold. The subject technology performs the first defragmentation process on the selected first set of files. The subject technology determines that the first defragmentation process was successful.
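The file selection step can be sketched as a walk from the newest file toward older ones that stops at the age threshold. The FileEntry type and its field names are hypothetical, and the defragmentation itself is left out.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    name: str
    created_at: float  # creation time, in seconds


def select_for_defrag(files, now, age_threshold):
    """Return (selected files, name of the last file that meets the threshold)."""
    selected = []
    marker = None
    # Start at the most recently created file and move toward older files.
    for f in sorted(files, key=lambda f: f.created_at, reverse=True):
        if now - f.created_at > age_threshold:
            break  # this file and every older one exceed the age threshold
        selected.append(f)
        marker = f.name  # indication for the last file meeting the threshold
    return selected, marker


files = [FileEntry("a", 100.0), FileEntry("b", 90.0), FileEntry("c", 10.0)]
selected, marker = select_for_defrag(files, now=105.0, age_threshold=20.0)
print([f.name for f in selected], marker)
```

One possible use of the marker, assumed here and not stated in the abstract, is to let a later pass resume from that file.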
Maintenance of clustered materialized views on a database system
A cluster view method of a database to perform compaction and clustering of database objects, such as database materialized views, is shown. The database can comprise a cache to store changes to storage units of tables of the database objects. The cluster view method can implement compaction to remove data based on the cache and clustering to group the data of the materialized view.
Systems and methods for a multi-hierarchy physical storage architecture for managing program and outcome data
In some aspects, the disclosure is directed to methods and systems for data storage and retrieval from a computer memory. A computing device may store a first hierarchical data structure having a first sequence of sub-data structures and a second hierarchical data structure having a second sequence of sub-data structures in memory. The computing device may link the first hierarchical data structure and the second hierarchical data structure together. The computing device may link the first hierarchical data structure and the second hierarchical data structure by inserting an identifier of a sub-data structure of the second sequence in the first sequence.
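The linking step can be sketched with two sequences of sub-data structures, where a link is an entry in the first sequence that carries the identifier of an entry in the second. All class and function names here are hypothetical, and the in-memory list representation is an assumption; the abstract does not describe the physical layout.

```python
from dataclasses import dataclass, field


@dataclass
class SubStructure:
    id: str
    data: str = ""


@dataclass
class Link:
    target_id: str  # identifier of a sub-data structure in another hierarchy


@dataclass
class Hierarchy:
    sequence: list = field(default_factory=list)

    def ids(self):
        """Identifiers of this hierarchy's own sub-data structures."""
        return [s.id for s in self.sequence if isinstance(s, SubStructure)]


def link_hierarchies(first, second, target_id, position):
    """Insert the identifier of one of second's sub-structures into first's sequence."""
    if target_id not in second.ids():
        raise KeyError(target_id)
    first.sequence.insert(position, Link(target_id))


def resolve(link, hierarchy):
    """Follow a link to the sub-data structure it identifies."""
    return next(s for s in hierarchy.sequence
                if isinstance(s, SubStructure) and s.id == link.target_id)


programs = Hierarchy([SubStructure("p1", "intake"), SubStructure("p2", "training")])
outcomes = Hierarchy([SubStructure("o1", "placed"), SubStructure("o2", "retained")])
link_hierarchies(programs, outcomes, "o2", position=1)
print(resolve(programs.sequence[1], outcomes).data)
```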
Compression, searching, and decompression of log messages
Log messages are compressed, searched, and decompressed. A dictionary is used to store non-numeric expressions found in log messages. Both numeric and non-numeric expressions found in log messages are represented by placeholders in a string of log “type” information. Another dictionary is used to store the log type information. A compressed log message contains a key to the log-type dictionary and a sequence of values that are keys to the non-numeric dictionary and/or numeric values. Searching may be performed by parsing a search query into subqueries that target the dictionaries and/or content of the compressed log messages. A dictionary may reference segments that contain a number of log messages, so that not all log messages need be considered for some searches.
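The two-dictionary scheme can be sketched as follows. This is an illustration under stated assumptions, not the patented encoding: the placeholder strings, the rule that any token containing a digit is a variable expression, and splitting on single spaces are all choices made for the example. A message that literally contains one of the placeholder strings would not round-trip in this sketch.

```python
import re

NUM = "<num>"  # assumed placeholder for a numeric value
VAR = "<var>"  # assumed placeholder for a non-numeric expression
CANONICAL_INT = re.compile(r"0|-?[1-9]\d*")  # integers that survive int() and str()


class LogCompressor:
    def __init__(self):
        self.var_dict, self._var_keys = [], {}    # non-numeric expressions
        self.type_dict, self._type_keys = [], {}  # log types

    @staticmethod
    def _intern(value, table, keys):
        """Return the dictionary key for value, adding it if it is new."""
        if value not in keys:
            keys[value] = len(table)
            table.append(value)
        return keys[value]

    def compress(self, message):
        parts, values = [], []
        for token in message.split(" "):
            if CANONICAL_INT.fullmatch(token):
                parts.append(NUM)
                values.append(int(token))  # numeric values are stored directly
            elif any(ch.isdigit() for ch in token):
                parts.append(VAR)
                values.append(self._intern(token, self.var_dict, self._var_keys))
            else:
                parts.append(token)  # static text stays in the log type
        log_type = " ".join(parts)
        return self._intern(log_type, self.type_dict, self._type_keys), values

    def decompress(self, compressed):
        type_key, values = compressed
        it = iter(values)
        out = []
        for part in self.type_dict[type_key].split(" "):
            if part == NUM:
                out.append(str(next(it)))
            elif part == VAR:
                out.append(self.var_dict[next(it)])
            else:
                out.append(part)
        return " ".join(out)


c = LogCompressor()
m1 = "Task task_12 finished in 35 ms"
m2 = "Task task_7 finished in 4 ms"
c1, c2 = c.compress(m1), c.compress(m2)
print(c1, c2, c.type_dict)
```

Both messages share a single log type, so each compressed message reduces to one type key plus two values. A search for a particular task name could first check the non-numeric dictionary and skip the messages entirely if the name is absent.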
Identifying data relationships from a spreadsheet
Proposed are concepts for identifying data relationships from a spreadsheet. Such a concept may transform formulae by replacing the variables in each formula with descriptive labels. This may, for example, express the transformed formulae in terms that have more meaning to a user, thereby facilitating understanding and/or analysis that would otherwise not be possible with existing tools.
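Replacing cell references with descriptive labels can be sketched with a regular-expression substitution. The label mapping is supplied by hand here; how labels are discovered (for example from header cells) is not described in the abstract and is outside this sketch. The function name and the A1-style reference pattern are assumptions.

```python
import re

# A1-style cell reference, with optional $ anchors (assumed notation).
CELL_REF = re.compile(r"\$?([A-Z]{1,3})\$?(\d+)")


def label_formula(formula, labels):
    """Replace each cell reference that has a known label with that label."""
    def replace(match):
        ref = match.group(1) + match.group(2)  # normalize $B$2 to B2
        return labels.get(ref, match.group(0))  # leave unknown references as-is
    return CELL_REF.sub(replace, formula)


labels = {"B2": "Price", "C2": "Quantity", "D2": "Discount"}
print(label_formula("=B2*C2-D2", labels))
```

The transformed formula reads as Price times Quantity minus Discount, which states the relationship between the data items directly.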