Patent classifications
G06F16/215
Dynamic updating of query result displays
Described are methods, systems and computer readable media for dynamic updating of query result displays.
ELASTIC DATA SAMPLING IN A DATA PIPELINE
Various embodiments comprise systems and methods to sample data outputs of a data pipeline. In some examples, data monitoring circuitry monitors the data pipeline wherein the data pipeline receives an input data set, processes the input data set, and responsively generates and transfers an output data set. The data monitoring circuitry ingests the output data set, determines an amount of available computing resources, and selects an amount of the values from the output data set based on the amount of available computing resources. The data monitoring circuitry generates a quality score for the selected values based on a data quality, generates a confidence score based on the amount of the selected values, and reports the quality score and the confidence score.
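The sampling idea in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how resources are measured or how the scores are computed, so everything here is an assumption: the function names, the `available_fraction` parameter (free capacity between 0 and 1), the non-null ratio as a quality measure, and sample coverage as a confidence measure are all hypothetical.

```python
import random


def elastic_sample(values, available_fraction, max_sample=1000, seed=0):
    """Select a sample whose size scales with the free computing capacity.

    available_fraction is a hypothetical stand-in (0.0 to 1.0) for the
    "amount of available computing resources" in the abstract.
    """
    if not values:
        return []
    # Assumed policy: sample size proportional to free capacity, at least 1.
    size = max(1, min(len(values), int(max_sample * available_fraction)))
    return random.Random(seed).sample(values, size)


def quality_score(sample):
    """Assumed quality measure: fraction of sampled values that are not None."""
    if not sample:
        return 0.0
    return sum(v is not None for v in sample) / len(sample)


def confidence_score(sample_size, population_size):
    """Assumed confidence measure: fraction of the output data set inspected."""
    if population_size == 0:
        return 0.0
    return sample_size / population_size
```

With 100 output values, `max_sample=40`, and `available_fraction=0.25`, the sketch inspects 10 values, so the reported confidence is 0.1; more free capacity yields a larger sample and a higher confidence.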
STRING ENTROPY IN A DATA PIPELINE
Various embodiments comprise systems and methods to determine entropy in strings generated by a data pipeline. In some examples, data monitoring circuitry monitors a data pipeline that ingests input data, processes the input data, and responsively generates and transfers a data string that comprises character groups. The data monitoring circuitry receives the data string, identifies character groups in the data string, identifies group types for the character groups, and assigns numbers to the character groups based on the group types. The data monitoring circuitry determines a probability distribution for the numbers, calculates entropy for the data string based on the probability distribution, and generates an entropy histogram based on the entropy. The data monitoring circuitry compares the entropy histogram of the data string to another entropy histogram for another data string, determines a change in entropy, and reports the change in entropy.
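The entropy calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the three group types (letters, digits, other), the numbers assigned to them, and the use of Shannon entropy in bits are choices made here, not details given in the abstract. The histogram step is omitted, and the change is reported as a simple difference between two strings.

```python
import math
import re
from collections import Counter

# Hypothetical group types and the numbers assigned to each type.
GROUP_NUMBERS = {"alpha": 0, "digit": 1, "other": 2}
_GROUP_RE = re.compile(
    r"(?P<alpha>[A-Za-z]+)|(?P<digit>[0-9]+)|(?P<other>[^A-Za-z0-9]+)"
)


def group_numbers(s):
    """Split s into character groups and map each group to its type number."""
    return [GROUP_NUMBERS[m.lastgroup] for m in _GROUP_RE.finditer(s)]


def string_entropy(s):
    """Shannon entropy (bits) of the distribution of group-type numbers in s."""
    numbers = group_numbers(s)
    if not numbers:
        return 0.0
    total = len(numbers)
    probabilities = [count / total for count in Counter(numbers).values()]
    return sum(-p * math.log2(p) for p in probabilities)


def entropy_change(previous, current):
    """Difference in entropy between two data strings."""
    return string_entropy(current) - string_entropy(previous)
```

For example, `"abc"` contains a single letter group and has entropy 0, while `"abc123"` contains one letter group and one digit group and has entropy 1 bit, so the change between them is 1.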
System and method for automatic correction/rejection in an analysis applications environment
Systems and methods for automatic error rejection are provided. Systems and methods described herein bypass the creation of a staging table at the outset and, instead, attempt a direct merge from a source data location to a target data location. In the event that the merge fails, then a temporary/staging table can be loaded where errors can be logged, validations can be performed, and erroneous data can be corrected.
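A minimal sketch of the merge-first, stage-on-failure flow, using Python's built-in `sqlite3` module. The table names (`source`, `target`, `errors`, `staging`), the columns, and the single validation rule (a NULL `amount`) are hypothetical, and the sketch rejects bad rows rather than correcting them.

```python
import sqlite3


def merge_with_fallback(conn):
    """Attempt a direct source-to-target merge; stage and validate on failure.

    Assumes hypothetical tables source(id, amount), target(id, amount) with
    constraints, and errors(id, reason). Returns "direct" or "staged".
    """
    try:
        with conn:  # commits on success, rolls back if the merge raises
            conn.execute(
                "INSERT INTO target (id, amount) SELECT id, amount FROM source"
            )
        return "direct"
    except sqlite3.IntegrityError:
        pass  # the direct merge failed; fall back to a staging table

    with conn:
        conn.execute("CREATE TEMP TABLE staging AS SELECT id, amount FROM source")
        # Log the rows that fail validation (assumed rule: amount is required).
        conn.execute(
            "INSERT INTO errors (id, reason) "
            "SELECT id, 'amount is NULL' FROM staging WHERE amount IS NULL"
        )
        # Reject the erroneous rows, then merge what remains.
        conn.execute("DELETE FROM staging WHERE amount IS NULL")
        conn.execute(
            "INSERT INTO target (id, amount) SELECT id, amount FROM staging"
        )
        conn.execute("DROP TABLE staging")
    return "staged"
```

The design point is that the staging table costs nothing when the data is clean: it is created only after the direct merge has failed.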
System for detecting data relationships based on sample data
A method of identifying relationships between data collections is disclosed. Each data collection comprises a plurality of data records made up of data fields. The method comprises performing a relationship search process based on a first seed value and a second seed value. A first set of records from the data collections is identified based on the first seed value. A second set of records from the data collections is identified based on the second seed value. The process then searches for a common value across the first and second record sets, wherein the common value is a value which appears in a first field in a first record of the first record set and in a second field in a second record of the second record set, wherein the first record is from a first data collection and the second record is from a second data collection. In response to identifying the common value, an indication is output identifying a candidate relationship between the first field of the first data collection and the second field of the second data collection.
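The seed-based search can be sketched in a few lines of Python. The data layout (a dict mapping collection names to lists of record dicts), the exact-match rule, and the function names are assumptions made for illustration; the abstract does not specify them.

```python
def records_matching(collections, seed):
    """Return (collection_name, record) pairs whose fields contain the seed."""
    return [
        (name, record)
        for name, records in collections.items()
        for record in records
        if seed in record.values()
    ]


def candidate_relationships(collections, seed_a, seed_b):
    """Find field pairs in different collections that share a common value.

    Each result is ((collection_a, field_a), (collection_b, field_b)).
    """
    first_set = records_matching(collections, seed_a)
    second_set = records_matching(collections, seed_b)
    candidates = set()
    for coll_a, rec_a in first_set:
        for coll_b, rec_b in second_set:
            if coll_a == coll_b:
                continue  # the two records must come from different collections
            for field_a, value_a in rec_a.items():
                for field_b, value_b in rec_b.items():
                    if value_a == value_b:
                        candidates.add(((coll_a, field_a), (coll_b, field_b)))
    return candidates


# Hypothetical sample data: a customer named "Ada" who ordered a "lamp".
collections = {
    "customers": [{"cust_id": 7, "name": "Ada"}],
    "orders": [{"order_id": 100, "customer": 7, "item": "lamp"}],
}
relationships = candidate_relationships(collections, "Ada", "lamp")
```

Here the seeds `"Ada"` and `"lamp"` select one record from each collection, and the shared value 7 suggests a candidate relationship between `customers.cust_id` and `orders.customer`.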
Systems and methods for verification of property records
Systems and methods for verification of public property records and other information associated with real estate properties compare information from different providers. The information is formatted in different provider-specific ways. The systems and methods enable comparisons through predetermined sets of textual manipulations that counteract or remedy differences in formatting, collection methodology, and data management practices.
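One way to picture a predetermined set of textual manipulations is a small normalization function applied to both providers' values before comparison. The specific steps (uppercasing, stripping punctuation, collapsing whitespace, expanding abbreviations) and the abbreviation table are assumptions chosen for illustration, not rules taken from the abstract.

```python
import re

# Hypothetical abbreviation table; a real one would be far larger.
ABBREVIATIONS = {
    "ST": "STREET",
    "AVE": "AVENUE",
    "RD": "ROAD",
    "APT": "APARTMENT",
}


def normalize_address(text):
    """Apply a fixed sequence of textual manipulations to an address string."""
    text = text.upper()
    text = re.sub(r"[^A-Z0-9\s]", " ", text)  # punctuation becomes a space
    tokens = text.split()  # also collapses runs of whitespace
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def records_agree(provider_a_value, provider_b_value):
    """Compare two provider-specific strings after normalization."""
    return normalize_address(provider_a_value) == normalize_address(provider_b_value)
```

Under these rules, `"123 Main St."` and `"123  MAIN STREET"` both normalize to `123 MAIN STREET` and therefore agree, while `"125 Main St"` does not.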
Destination file copying and error handling
An object service receives a stream of fingerprints, corresponding to file segments, from a file source and identifies sequential fingerprints in the stream as a fingerprints group. The object service identifies a group identifier for the fingerprints group and communicates the group to the deduplication service associated with the group identifier range that includes the group identifier. The deduplication service identifies the fingerprints in the group that are missing from fingerprint storage and communicates them to the object service, which requests the corresponding file segments from the file source. The deduplication service receives the requested segments from the file source and stores them. The system identifies a generation identifier associated with one time of communicating by the object service or the deduplication service, and a generation identifier associated with another time of communicating by the object service or the deduplication service. If the two generation identifiers differ, the object service or the deduplication service restarts the communication.
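The fingerprint exchange can be sketched in-process with plain Python. This is a simplified illustration, not the patented system: SHA-256 as the fingerprint, fixed-size groups, routing by the first fingerprint modulo the number of services, and the class and function names are all assumptions. The generation-identifier check and restart are left out.

```python
import hashlib


def fingerprint(segment):
    """Assumed fingerprint function: SHA-256 hex digest of the segment bytes."""
    return hashlib.sha256(segment).hexdigest()


def group_fingerprints(fingerprints, group_size):
    """Split the stream into groups of sequential fingerprints."""
    return [
        fingerprints[i:i + group_size]
        for i in range(0, len(fingerprints), group_size)
    ]


def group_identifier(group, num_services):
    """Hypothetical routing rule: derive the id from the group's first fingerprint."""
    return int(group[0], 16) % num_services


class DedupService:
    """Stands in for one deduplication service and its fingerprint storage."""

    def __init__(self):
        self.store = {}

    def missing(self, group):
        """Return the unique fingerprints in the group that are not stored yet."""
        return list(dict.fromkeys(fp for fp in group if fp not in self.store))

    def put(self, fp, segment):
        self.store[fp] = segment


def copy_file(segments, services, group_size=2):
    """Send only the segments whose fingerprints are missing; return the count."""
    fingerprints = [fingerprint(s) for s in segments]
    segment_by_fp = dict(zip(fingerprints, segments))
    transferred = 0
    for group in group_fingerprints(fingerprints, group_size):
        service = services[group_identifier(group, len(services))]
        for fp in service.missing(group):
            service.put(fp, segment_by_fp[fp])  # the "requested segment" step
            transferred += 1
    return transferred
```

With a single service and the segments `[b"a", b"b", b"a", b"c"]`, the first copy transfers three segments because the repeated `b"a"` is already stored, and copying the same file again transfers none.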