Patent classifications
G06F16/2386
Fast database loading with time-stamped records
For a first record of a batch of records, a first timestamp may be determined to be stored with the first record in a database into which the batch of records is to be loaded as part of a database loading process. For each remaining record of the batch of records, a future timestamp may be generated using the first timestamp, until a final timestamp of a final record of the batch of records is generated. If the load completion time, at which the database loading process completes, is prior to the final timestamp, a wait time until a batch completion time may be determined by comparing the load completion time and the final timestamp, and the process may wait for the wait time to elapse so that the batch completion time is reached. If the load completion time is at or after the final timestamp, the batch completion time may be reached at the load completion time.
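The timestamp and wait-time logic described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `assign_timestamps` and `wait_time` and the fixed `step` between consecutive timestamps are assumptions made for the example.

```python
from datetime import datetime, timedelta


def assign_timestamps(first_ts, count, step=timedelta(microseconds=1)):
    """Return one timestamp per record: the first, then future ones derived from it.

    The fixed `step` is an assumption; the abstract only says that future
    timestamps are generated using the first timestamp.
    """
    return [first_ts + i * step for i in range(count)]


def wait_time(load_completion, final_ts):
    """Return how long to wait after loading so the batch completes at or after final_ts."""
    if load_completion < final_ts:
        return final_ts - load_completion
    # Loading finished at or after the final timestamp: no wait is needed.
    return timedelta(0)


start = datetime(2024, 1, 1, 12, 0, 0)
stamps = assign_timestamps(start, 5, step=timedelta(seconds=1))
final_ts = stamps[-1]

# Loading finished before the final timestamp: wait for the difference.
early = wait_time(start + timedelta(seconds=1), final_ts)
# Loading finished after the final timestamp: the batch completes immediately.
late = wait_time(start + timedelta(seconds=10), final_ts)
```

Here `early` is three seconds and `late` is zero, matching the two cases in the abstract.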
METHOD AND APPARATUS FOR INGESTING DATA, DEVICE AND STORAGE MEDIUM
The present disclosure provides a method and apparatus for ingesting data, a device, a storage medium and a program product, and particularly to the field of computer technology, the field of intelligent traffic, and the like. A specific implementation comprises: deploying a first data ingestion system at a first data source, the first data ingestion system comprising a first read plugin, a first transmission plugin and a first write plugin; reading data from the first data source through the first read plugin; writing the data into a first transmission buffer queue through the first transmission plugin; and acquiring the data from the first transmission buffer queue through the first write plugin and writing the data into a first storage component. The data ingestion system in the present disclosure is divided into a read part, a transmission part and a write part. Each part works by means of a plugin.
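The read, transmission and write split might look like the sketch below. It is an assumption-laden illustration: the function names, the in-memory list used as a storage component, and the `_DONE` sentinel are all hypothetical, and the three stages run sequentially in one process rather than as deployed plugins.

```python
from queue import Queue

_DONE = object()  # hypothetical sentinel marking the end of the stream


def read_plugin(source):
    """Read part: yield records from the data source."""
    yield from source


def transmission_plugin(records, buffer_queue):
    """Transmission part: write each record into the transmission buffer queue."""
    for record in records:
        buffer_queue.put(record)
    buffer_queue.put(_DONE)


def write_plugin(buffer_queue, storage):
    """Write part: take records from the buffer queue and write them to storage."""
    while True:
        item = buffer_queue.get()
        if item is _DONE:
            break
        storage.append(item)


def ingest(source, storage):
    """Wire the three parts together around one buffer queue."""
    buffer_queue = Queue()
    transmission_plugin(read_plugin(source), buffer_queue)
    write_plugin(buffer_queue, storage)
    return storage


stored = ingest(["a", "b", "c"], [])
```

Because each part only touches the queue or its own endpoint, any one of them could be swapped for a different plugin without changing the other two.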
TUNING EXTERNAL INVOCATIONS UTILIZING WEIGHT-BASED PARAMETER RESAMPLING
Techniques are disclosed for tuning external invocations utilizing weight-based parameter resampling. In one example, a computer system determines a plurality of samples, each sample being associated with a parameter value of a plurality of potential parameter values of a particular parameter. The computer system assigns weights to each of the parameter values, and then selects a first sample for processing via a first external invocation based on a weight of the parameter value of the first sample. The computer system then determines feedback data associated with a level of performance of the first external invocation. The computer system adjusts the weights of the parameter values of the particular parameter based on the feedback data. The computer system then selects a second sample of the plurality of samples to be processed via execution of a second external invocation based on the adjustment of weights of the parameter values.
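One way to picture the select, measure and reweight loop is the sketch below. The multiplicative update rule, the `rate` parameter, and the function names are assumptions chosen for illustration; the abstract does not specify how weights are adjusted.

```python
import random


def select_sample(samples, weights, rng):
    """Pick one (sample_id, parameter_value) pair, weighted by its parameter value."""
    sample_weights = [weights[value] for _, value in samples]
    return rng.choices(samples, weights=sample_weights, k=1)[0]


def adjust_weights(weights, value, feedback, rate=0.5):
    """Scale the weight of `value` by the feedback, then renormalize to sum to 1.

    Positive feedback raises the weight and negative feedback lowers it.
    This update rule is an assumption made for the sketch.
    """
    updated = dict(weights)
    updated[value] = max(1e-6, updated[value] * (1 + rate * feedback))
    total = sum(updated.values())
    return {v: w / total for v, w in updated.items()}


rng = random.Random(0)
samples = [("s1", 10), ("s2", 50), ("s3", 50)]
weights = {10: 0.5, 50: 0.5}

first = select_sample(samples, weights, rng)
# Suppose the invocation using parameter value 50 performed well.
new_weights = adjust_weights(weights, 50, feedback=1.0)
second = select_sample(samples, new_weights, rng)
```

After the adjustment, samples with parameter value 50 are more likely to be chosen for the second invocation.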
Adaptively updating databases of publish and subscribe systems using optimistic updates
An example system includes a processor to receive updates for a database of a publish and subscribe system. The processor is to generate an optimistic update based on the received updates. The processor is also to adaptively update the database using the optimistic update. In some examples, the processor generates a bulk optimistic update based on the optimistic update and adaptively updates the database using the bulk optimistic update or a read-modify-write sequence.
System and method for initiating bulk inserts in a distributed database
Described herein is a system and method for initiating a bulk insert in a distributed database. The system described herein provides for a client to identify target index servers of the data records by examining the data records. The system identifies a partition criterion of the target index servers and partitions the data records, based on the partition criterion, into subsets of data records. The system builds buckets including the subsets of data records that correspond to a partition. The system inserts each bucket, including the subset of the data records corresponding to a specific partition, into the respective target index server where the specific partition is physically located, and inserts the subset of data records in the table locally. The system inserts each bucket in each respective target index server in parallel.
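The partition, bucket and parallel-insert flow can be sketched as follows. Everything here is a stand-in: the index servers are plain in-memory lists, the partition criterion is a caller-supplied function, and the names `build_buckets` and `bulk_insert` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_buckets(records, partition_of):
    """Group records into buckets keyed by the partition each record belongs to."""
    buckets = defaultdict(list)
    for record in records:
        buckets[partition_of(record)].append(record)
    return dict(buckets)


def bulk_insert(records, partition_of, servers):
    """Send each bucket to the server holding its partition, all buckets in parallel.

    `servers` maps a partition key to a list standing in for an index server.
    """
    buckets = build_buckets(records, partition_of)
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(servers[partition].extend, rows)
            for partition, rows in buckets.items()
        ]
        for future in futures:
            future.result()  # surface any insert error
    return buckets


records = [{"id": i} for i in range(6)]
servers = {0: [], 1: [], 2: []}
buckets = bulk_insert(records, lambda r: r["id"] % 3, servers)
```

Each bucket targets a different server, so the parallel inserts never write to the same destination.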
SOURCE-ADAPTED DATA RETRIEVAL FOR MULTI-TENANT SYSTEM
A method includes: maintaining, at an integration computing device, a scheduling parameter corresponding to a source computing device maintaining a plurality of source data records with record identifiers; receiving, from the source computing device, a plurality of modification indicators, each modification indicator generated at the source computing device in response to modification of a source data record, and containing the record identifier of the source data record; storing the modification indicators in a repository; according to the scheduling parameter, periodically obtaining a subset of the source data records from the source computing device, by: retrieving the modification indicators from the repository; generating a bulk record request containing the plurality of record identifiers from the retrieved modification indicators; transmitting the bulk record request to the source computing device; and responsive to transmitting the bulk record request, receiving the subset of source data records from the source computing device.
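The indicator-then-bulk-request cycle might be sketched like this. The repository is a plain list, the source is a dictionary keyed by record identifier, and deduplicating identifiers before building the request is an assumption of the sketch rather than something the abstract states.

```python
def on_modification(repository, record_id):
    """Store a modification indicator carrying the modified record's identifier."""
    repository.append({"record_id": record_id})


def build_bulk_request(repository):
    """Drain the stored indicators into one bulk request of record identifiers."""
    record_ids = []
    for indicator in repository:
        if indicator["record_id"] not in record_ids:  # dedup is an assumption
            record_ids.append(indicator["record_id"])
    repository.clear()
    return {"record_ids": record_ids}


def fetch_subset(source_records, request):
    """Stand-in for the source device answering the bulk record request."""
    return [
        source_records[rid]
        for rid in request["record_ids"]
        if rid in source_records
    ]


source_records = {1: "alpha", 2: "beta", 3: "gamma"}
repository = []
for rid in (2, 3, 2):
    on_modification(repository, rid)

request = build_bulk_request(repository)
subset = fetch_subset(source_records, request)
```

In a scheduled deployment, `build_bulk_request` and `fetch_subset` would run once per period defined by the scheduling parameter.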
AUTOMATED BATCH GENERATION AND SUBSEQUENT SUBMISSION AND MONITORING OF BATCHES PROCESSED BY A SYSTEM
Automated batch generation and subsequent submission and monitoring of batches processed by a system is disclosed. A plurality of records to be submitted to a system for processing is accessed. A plurality of record groups is generated, each record group corresponding to a different subset of records. A plurality of batch transaction records is generated. Each batch transaction record corresponds to one of the plurality of record groups, and includes a subset identifier that references a subset of records of a record group to which the batch transaction record corresponds, and a status field. A first batch transaction record that identifies a first subset of records is accessed. The status field of the first batch transaction record is set to a submitted value, and instructions are sent to the system to process the first subset of records.
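A minimal sketch of the record groups and batch transaction records follows. The field names, the status values `"pending"` and `"submitted"`, the fixed group size, and the `process` callback standing in for the target system are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BatchTransactionRecord:
    subset_id: int            # references the record group this batch covers
    status: str = "pending"   # hypothetical initial status value


def make_groups(records, group_size):
    """Split records into groups, each holding a different subset of the records."""
    return {
        start // group_size: records[start:start + group_size]
        for start in range(0, len(records), group_size)
    }


def submit(batch, groups, process):
    """Mark the batch as submitted and hand its subset of records to the system."""
    batch.status = "submitted"
    process(groups[batch.subset_id])


records = list(range(7))
groups = make_groups(records, 3)
batches = [BatchTransactionRecord(subset_id=key) for key in groups]

processed = []
submit(batches[0], groups, processed.extend)
```

Only the first batch changes status here; the remaining batch records stay pending until they are submitted in turn.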
Computerized system for programmatic mapping of record lineage based on data flow through data storage components
An apparatus includes processing circuitry and a memory storing instructions that, when executed by the processing circuitry, cause the apparatus to identify a plurality of components and a data flow that interconnects the plurality of components. The instructions cause the apparatus to determine a lineage of a record generated by the plurality of components based on the data flow. The lineage indicates the data flow from a first component to a second component of the plurality of components to generate the record. The instructions cause the apparatus to present, to a user, a visual depiction of the lineage of the record. The visual depiction indicates the data flow of a query through at least the first component and the second component of the plurality of the components to generate the record.
DETECTION AND RESOLUTION OF DATA SYNCHRONIZATION CONFLICTS
A computer-implemented method for performing data synchronization between a plurality of computing devices within a communication network includes detecting, by a first computing device of the plurality of computing devices, a changed data field of a plurality of available data fields within a first database table tuple. A bitmap within a plurality of configuration fields of the first database table tuple is updated to indicate a bit location associated with the changed data field. Synchronization data including the changed one or more data fields, the bitmap, and a version number of the first database table tuple within the plurality of configuration fields is sent to a second computing device. A conflict resolution procedure is performed based on a synchronization response received from the second computing device. The synchronization response includes status information associated with the synchronization of the changed data field.
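The changed-field bitmap can be illustrated with the sketch below. The field list, the bit ordering (bit 0 for the first field), and the shape of the synchronization message are assumptions for the example; the conflict resolution step is not shown.

```python
FIELDS = ["name", "email", "phone", "address"]  # hypothetical tuple layout


def changed_bitmap(old, new, fields=FIELDS):
    """Set one bit per changed field; bit position follows the field order."""
    bitmap = 0
    for bit, field in enumerate(fields):
        if old.get(field) != new.get(field):
            bitmap |= 1 << bit
    return bitmap


def build_sync_message(old, new, version, fields=FIELDS):
    """Bundle the changed fields, the bitmap and the tuple version for sending."""
    bitmap = changed_bitmap(old, new, fields)
    changed = {
        field: new.get(field)
        for bit, field in enumerate(fields)
        if (bitmap >> bit) & 1
    }
    return {"bitmap": bitmap, "version": version, "changed": changed}


old = {"name": "Ann", "email": "a@x.org", "phone": "1", "address": "A"}
new = dict(old, email="ann@x.org", address="B")
message = build_sync_message(old, new, version=7)
```

Sending only the flagged fields keeps the synchronization payload small, and the receiver can use the bitmap to tell which fields the sender touched.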
Content processing management system and method
Systems and methods that offer significant improvements to current content processing management systems for heterogeneous and widely distributed content sources are disclosed. The proposed systems and methods are configured to provide a framework and libraries of extensible components that together are designed to enable creation of solutions to acquire data from one or more content repositories, possibly distributed around the world across a wide range of operating systems and hardware, process said content, and publish the resulting processed information to a search engine or other target application. The proposed embodiments offer an improved architecture that incorporates manager nodes and worker (processing) nodes, where worker nodes are configured to scan and process data, while manager nodes are configured to handle all allocation of work (including throttling) and control state and failover. Such an arrangement enables the system to perform with greater scalability and reliability.