Patent classifications
G06F11/3086
Continuous data quality assessment and monitoring for big data
A data quality assessment and monitoring tool addresses inconsistency in large data sets from differing sources, determining data quality attributes such as completeness, conformity, validity, and accuracy. Flexible taxonomies and rollup strategies accommodate diverse business unit needs across a complex enterprise and provide insight into individual entities' performance. An exemplary tool comprises a data importer for importing data from a data lake; a rules manager for generating rules and rule sets; a scoring engine for generating data quality scores; a job manager; a data profiler for running data assessment tasks and collating the data quality scores for a plurality of hierarchical data entity units; a hierarchical scoring aggregator for aggregating sets of data quality scores into a plurality of first tier aggregate data quality scores and for further aggregating the first tier aggregate data quality scores into one or more second tier aggregate data quality scores; and a reporting component.
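The two-tier aggregation described above could be sketched as follows. This is a hypothetical illustration: the function name, the use of a simple mean as the rollup strategy, and the shape of the entity-to-tier mappings are assumptions, not the patent's actual implementation.

```python
from statistics import mean

def aggregate_scores(entity_scores, tier1_map, tier2_map):
    """Roll entity-level data quality scores up into first-tier and
    second-tier aggregates (e.g. table -> business unit -> division).

    entity_scores: {entity_id: score in [0, 100]}
    tier1_map:     {entity_id: tier1_group}
    tier2_map:     {tier1_group: tier2_group}
    """
    # First tier: group entity scores and average within each group.
    tier1 = {}
    for entity, score in entity_scores.items():
        tier1.setdefault(tier1_map[entity], []).append(score)
    tier1_scores = {g: mean(v) for g, v in tier1.items()}

    # Second tier: aggregate the first-tier aggregates the same way.
    tier2 = {}
    for group, score in tier1_scores.items():
        tier2.setdefault(tier2_map[group], []).append(score)
    tier2_scores = {g: mean(v) for g, v in tier2.items()}
    return tier1_scores, tier2_scores
```

A weighted mean, a minimum, or any other rollup strategy could be substituted per business unit, which is the flexibility the abstract emphasizes.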
MANAGEMENT AND REMEDIATION OF DATABASE ISSUES
Systems and methods are described that identify a database metric value associated with a database instance storing a dataset associated with a user system. A database issue is detected in view of a determination that the database metric value satisfies a condition. In response to satisfaction of the condition, a set of user action metrics associated with the user system is collected from one or more data monitoring systems. At least one notification communication is generated, including at least a portion of the set of user action metrics and information identifying the database issue. The at least one notification communication is transmitted to a remediation execution system configured to execute, using the at least a portion of the set of user action metrics and the information identifying the database issue, a remedial action in response to the database issue.
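The detect-then-notify flow might look like the following minimal sketch; the threshold comparison as the "condition", the dictionary payload shape, and the function name are all assumed for illustration.

```python
def detect_and_notify(metric_value, threshold, user_action_metrics, issue_id):
    """If the database metric satisfies the condition (here: exceeds a
    threshold), build a notification communication bundling the issue
    identity with the collected user action metrics."""
    if metric_value <= threshold:
        return None  # condition not satisfied; no database issue detected
    return {
        "issue": issue_id,
        "metric_value": metric_value,
        "user_action_metrics": user_action_metrics,
    }
```

The returned payload would then be transmitted to the remediation execution system, which selects a remedial action from it.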
Multi-partitioning for combination operations
Systems and methods are disclosed for processing and executing queries against one or more datasets. As part of processing a query, the system determines whether the query is susceptible to a significantly imbalanced partition. In the event the query is susceptible to an imbalanced partition, the system monitors the query and determines whether to perform a multi-partitioning determination to avoid a significantly imbalanced partition.
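One plausible form of the imbalance check and the multi-partitioning step is sketched below; the skew-ratio heuristic, the greedy assignment, and all names are assumptions rather than the disclosed method.

```python
def is_significantly_imbalanced(partition_sizes, skew_ratio=4.0):
    """Flag a layout whose largest partition exceeds the mean partition
    size by more than skew_ratio (a hypothetical heuristic)."""
    if not partition_sizes:
        return False
    mean_size = sum(partition_sizes) / len(partition_sizes)
    return max(partition_sizes) > skew_ratio * mean_size

def repartition(key_counts, num_partitions):
    """Greedy multi-partitioning: assign the heaviest keys first, each to
    the currently lightest partition, to spread the load."""
    loads = [0] * num_partitions
    assignment = {}
    for key, count in sorted(key_counts.items(), key=lambda kv: -kv[1]):
        idx = loads.index(min(loads))
        assignment[key] = idx
        loads[idx] += count
    return assignment, loads
```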
DETERMINING INSIGHTS RELATED TO PERFORMANCE BOTTLENECKS IN A MULTI-TENANT DATABASE SYSTEM
Methods, systems, apparatuses, and computer program products are described. A system, such as a multi-tenant database system, may store tenant-specific observability data for multiple tenants of the system. The system may detect an inefficiency related to a performance metric for a tenant of the multiple tenants based on a subset of the data associated with the tenant and corresponding to a threshold time window. In some examples, the system may analyze the subset of the data for the threshold time window to determine an insight indicating a cause of the inefficiency. The system may determine a suggested action for the tenant based on the insight indicating the cause of the inefficiency, and the system may send, for display at a user interface of a user device, an indication of the insight and the suggested action, the user device operated by a user associated with the tenant.
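The window-scoped detection and insight generation could be sketched as below. The averaging rule, the insight fields, and the suggested-action text are illustrative assumptions only.

```python
def detect_inefficiency(samples, window_start, window_end, threshold):
    """Average a tenant's metric samples over the threshold time window
    and return an insight when the average breaches the threshold.

    samples: list of (timestamp, metric_value) pairs for one tenant.
    """
    in_window = [v for t, v in samples if window_start <= t <= window_end]
    if not in_window:
        return None
    avg = sum(in_window) / len(in_window)
    if avg <= threshold:
        return None  # no inefficiency detected in this window
    return {
        "cause": "metric average above threshold in window",
        "observed_avg": avg,
        "suggested_action": "review recent queries for this tenant",
    }
```

In the described system, the insight and suggested action would then be sent for display at the tenant user's device.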
Global inline name space verification for distributed file systems
Embodiments provide global inline name space verification for a distributed file system in a network of a metadata server coupled to a plurality of data servers by taking a global dataless snapshot of a namespace of the distributed file system; walking all of the files in the namespace for each data server and the metadata server to generate parsed information; combining, by an XOR operation, the parsed information into data blocks for each server; obtaining a checksum of each data block of the data blocks; comparing actual and expected checksums from the metadata server and all of the data servers; and generating an alert if any actual and expected checksums do not match.
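The XOR-combine and checksum-compare steps could look roughly like this sketch; SHA-256 as the checksum, the fixed block size, and the function names are assumptions, not details from the embodiment.

```python
import hashlib

def combine_xor(parsed_records, block_size=32):
    """XOR fixed-size parsed-information records into one data block."""
    acc = bytearray(block_size)
    for record in parsed_records:
        padded = record.ljust(block_size, b"\x00")[:block_size]
        for i, byte in enumerate(padded):
            acc[i] ^= byte
    return bytes(acc)

def verify(server_blocks, expected_checksums):
    """Compare the actual checksum of each server's data block against
    its expected checksum; return the servers that need an alert."""
    alerts = []
    for server, block in server_blocks.items():
        actual = hashlib.sha256(block).hexdigest()
        if actual != expected_checksums.get(server):
            alerts.append(server)
    return alerts
```

A useful property of the XOR combine is order independence: records can be folded in as the namespace walk encounters them, in any order.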
Automated alert augmentation for deployments of software-defined storage
Methods, apparatus, and processor-readable storage media for automated alert augmentation for deployments of software-defined storage are provided herein. An example computer-implemented method includes obtaining an alert from at least one software-defined storage device; determining one or more items of additional information pertaining to one or more of the alert and the at least one software-defined storage device; augmenting the alert based at least in part on the one or more determined items of additional information; generating a modified version of the augmented alert by incorporating, into the augmented alert, dependency information pertaining to the at least one software-defined storage device and one or more additional software-defined storage devices; and performing one or more automated actions based at least in part on the modified version of the augmented alert.
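The augment-then-modify sequence might be sketched as follows; the alert and dependency-map shapes are hypothetical, chosen only to make the two stages (augmentation with extra items, then incorporation of dependency information) concrete.

```python
def augment_alert(alert, extra_info, dependency_map):
    """Augment an alert from a software-defined storage device with
    additional determined items, then generate a modified version by
    folding in dependency information for the originating device."""
    augmented = dict(alert)          # keep the original alert intact
    augmented.update(extra_info)     # step 1: augment with extra items
    device = alert["device"]
    # Step 2: incorporate dependency info for this device and the
    # additional devices that depend on it.
    augmented["dependencies"] = dependency_map.get(device, [])
    return augmented
```

Automated actions (paging, ticket creation, remediation) would then key off the modified alert rather than the raw one.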
Prototype-based machine learning reasoning interpretation
In some examples, a prototype model that includes a representative subset of data points (e.g., inputs and output classifications) of a machine learning model is analyzed to efficiently interpret the machine learning model's behavior. Performance metrics such as a critic fraction, local explanation scores, and global explanation scores are determined. A local explanation score captures the importance of a feature of a test point to the machine learning model's determination of a particular class for the test point, and is computed by comparing the value of that feature to the corresponding values for prototypes of the prototype model. Using a similar approach, global explanation scores may be computed for features by combining local explanation scores across data points. A critic fraction may be computed to quantify the misclassification rate of the prototype model, indicating the interpretability of the model.
Processing data object modifications in a networked storage system
Disclosed are various embodiments for triggering a processing of a data object stored in a bucket in a data store. In one embodiment, a cloud computing resource comprising one or more processors in communication with a data store receives a notification event generated in response to a modification of a data object in one or more buckets. The cloud computing resource determines, responsive to the received notification event, that a data object stored in a bucket in the data store has been modified. The cloud computing resource triggers a processing of the data object by executable code associated with the data object. The executable code causes generation of a notification message, processing of the modified data object, or other user-specified processes within a platform-as-a-service cloud computing offering.
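The event-to-code dispatch could be sketched as a small router; the event field names and the per-bucket handler registry are illustrative assumptions about how the executable code is "associated with" the data object.

```python
def handle_notification_event(event, handlers):
    """Route a bucket-modification notification event to the executable
    code registered for the modified object's bucket.

    event:    {"bucket": ..., "key": ...} describing the modified object.
    handlers: {bucket_name: callable(bucket, key)} registry.
    """
    handler = handlers.get(event["bucket"])
    if handler is None:
        return None  # no executable code associated with this bucket
    return handler(event["bucket"], event["key"])
```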
Preserving system integrity using file manifests
A system is described for preserving integrity of computing devices. A manifest that uniquely identifies files on a computing device is periodically captured from the computing device. The manifest is compared against a reference manifest, which represents an ideal or clean state of the device. If the manifest comparison indicates that there have been changes to the contents of the computing device, the system can determine whether the changes constitute a compromise to the endpoint's integrity. If it is determined that a change constitutes a compromise to the endpoint's integrity, the system can perform certain remedial actions, such as sending a message to an administrator or enforcing a base layer onto the device so that the content of the device is replaced with a clean image.
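The manifest capture and comparison could be sketched as follows; SHA-256 digests as the unique file identifier and the three-way diff are reasonable assumptions, not the system's specified format.

```python
import hashlib
import os

def build_manifest(root):
    """Map each file path under root to a SHA-256 digest of its
    contents, uniquely identifying the device's files."""
    manifest = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            manifest[os.path.relpath(path, root)] = digest
    return manifest

def diff_manifests(current, reference):
    """Report files added, removed, or modified relative to the
    reference manifest (the device's ideal or clean state)."""
    added = sorted(set(current) - set(reference))
    removed = sorted(set(reference) - set(current))
    modified = sorted(p for p in set(current) & set(reference)
                      if current[p] != reference[p])
    return added, removed, modified
```

A nonempty diff would then feed the compromise determination, and in turn the remedial actions such as notifying an administrator or re-enforcing the base layer.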
System for generating electronic alerts through predictive optimization analysis of resource conversions
A system generates electronic alerts through predictive analysis of resource conversions. The system may continuously monitor executed resource transfers to generate historical resource transfer data. Based on the historical resource transfer data, the system may generate a predicted outcome of executing transfers of resources in a first format compared to transfers of resources in a second format. The predicted outcome may then be implemented by the system to select a resource format for transfers occurring in the future and/or at specified intervals.
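The format-selection step could be sketched as a comparison of historical average outcomes per format; treating "outcome" as a cost to minimize and the flat history shape are assumptions made for illustration.

```python
def choose_format(history):
    """Pick the resource format whose historical transfers had the
    better (lower) average cost.

    history: list of (format, cost) pairs from monitored transfers.
    """
    totals = {}
    for fmt, cost in history:
        total, count = totals.get(fmt, (0.0, 0))
        totals[fmt] = (total + cost, count + 1)
    averages = {fmt: total / count
                for fmt, (total, count) in totals.items()}
    return min(averages, key=averages.get)
```

The chosen format would then be applied to future or interval-scheduled transfers, and the comparison re-run as new transfer data accumulates.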