G06F16/2272

COMPUTER SYSTEM, INFERENCE METHOD, AND NON-TRANSITORY MACHINE-READABLE MEDIUM
20220398473 · 2022-12-15

A computer system manages a data set of learning data. The computer system is configured to: generate, in a case where a plurality of pieces of input data including a value of an explanatory variable and forming a time series are received, groups by arranging a plurality of pieces of the learning data in time-series order and grouping the plurality of pieces of the learning data in predetermined time widths; execute, for each of a plurality of the groups, index calculation processing of calculating a selection index for sampling the learning data; select the plurality of pieces of the learning data from the data set based on the selection index; train a model by using the selected plurality of pieces of the learning data; and output a predicted value for each of the plurality of pieces of input data by using the model.
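As a rough illustration of the grouping and index-calculation steps, the sketch below buckets time-ordered learning data into fixed time widths and scores each group. The abstract does not say how the selection index is computed; the variance used here, the "keep the top-scoring groups" rule, and the names `group_by_time_width`, `selection_index`, and `pick_groups` are all assumptions made for illustration.

```python
from collections import defaultdict


def group_by_time_width(records, width):
    """Arrange (timestamp, value) records in time order and bucket them by fixed width."""
    groups = defaultdict(list)
    for ts, value in sorted(records):
        groups[ts // width].append((ts, value))
    return dict(groups)


def selection_index(group):
    """Hypothetical selection index: population variance of the group's values."""
    values = [v for _, v in group]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pick_groups(groups, top_n):
    """Keep the learning data of the top_n groups with the highest index (assumed rule)."""
    ranked = sorted(groups, key=lambda g: selection_index(groups[g]), reverse=True)
    return [rec for g in ranked[:top_n] for rec in groups[g]]


records = [(12, 2.0), (0, 1.0), (3, 3.0), (10, 2.0)]
groups = group_by_time_width(records, width=10)
selected = pick_groups(groups, top_n=1)
```

With a width of 10, the four records fall into two groups, and only the group whose values vary is selected for training.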

ITERATIVE PERFORMANCE ANALYSIS WITH INTERVAL EXPANSION
20220398238 · 2022-12-15

Performance data characterizing operations of an application may be collected by time interval, and a plurality of keys may be associated with each element of the performance data. A first time interval may be received. An iterative group-and-filter search may be executed against the keyed elements within the first time interval, each iteration including an iteration key used to perform a key-based grouping operation followed by a group-based filter operation, wherein each iteration key is added to a composite key at each iteration. A selection of at least one keyed element within the first time interval and obtained from the iterative group-and-filter search may be received. A second time interval that precedes the first time interval may be received, and the keyed elements may be filtered using the composite key and within the second time interval to return the at least one keyed element within the second time interval.
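The iterative group-and-filter search and the reuse of the composite key on an earlier interval might look roughly like the following. This is a minimal sketch, not the patented method: the element layout (`t`, `keys`, `latency_ms`), the helper names, and the idea that the filter step keeps one chosen group per iteration are assumptions.

```python
from collections import defaultdict


def in_interval(elements, start, end):
    """Keep elements whose timestamp t satisfies start <= t < end."""
    return [e for e in elements if start <= e["t"] < end]


def group_by(elements, key):
    """Key-based grouping: map each value of `key` to the elements carrying it."""
    groups = defaultdict(list)
    for e in elements:
        groups[e["keys"][key]].append(e)
    return dict(groups)


def drill_down(elements, choices):
    """Run one group-then-filter iteration per (key, chosen_value) pair.

    Each iteration key is added to the composite key as the search narrows.
    """
    composite = {}
    for key, chosen in choices:
        elements = group_by(elements, key).get(chosen, [])
        composite[key] = chosen
    return elements, composite


def filter_by_composite(elements, composite):
    """Keep elements matching every key/value pair in the composite key."""
    return [e for e in elements
            if all(e["keys"].get(k) == v for k, v in composite.items())]


# Hypothetical performance data; field names are illustrative only.
data = [
    {"t": 5, "keys": {"host": "a", "endpoint": "/pay"}, "latency_ms": 40},
    {"t": 7, "keys": {"host": "b", "endpoint": "/pay"}, "latency_ms": 35},
    {"t": 12, "keys": {"host": "a", "endpoint": "/pay"}, "latency_ms": 900},
    {"t": 14, "keys": {"host": "a", "endpoint": "/cart"}, "latency_ms": 30},
]

first = in_interval(data, 10, 20)
found, composite = drill_down(first, [("host", "a"), ("endpoint", "/pay")])
earlier = filter_by_composite(in_interval(data, 0, 10), composite)
```

Here the search over the first interval isolates the slow element, and the accumulated composite key retrieves the matching element from the preceding interval without repeating the iterations.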

Incremental dynamic document index generation

A contextual index compendium that includes contextual index item generation rules that define document index entry generation transforms usable to transform text of the documents into embedded document index entries of document indexes within the documents is obtained by a processor. Using the document index entry generation transforms defined within the contextual index item generation rules in association with a document that includes embedded document index entries that are both embedded at locations of associated text distributed throughout the document and added as part of a document index within the document, new text of the document is programmatically transformed into at least one new document index entry in response to determining that at least one portion of the new text includes candidate text that is not already indexed within the existing embedded document index entries and the document index within the document.

System and method for preparing a data set for searching

Embodiments of the present disclosure relate to systems and methods for preparing a data set for searching. In addition, embodiments of the present disclosure relate to solutions for configuring a storage infrastructure and indexing process for a data set.

Computer-based systems for data entity matching detection based on latent similarities in large datasets and methods of use thereof

At least some embodiments are directed to an entity matching detection system. The entity matching detection system includes a latent similarity identification machine learning model that receives one or more data records and generates a final similarity score indicative of a latent similarity between the one or more data records and a second data record. The entity matching detection system can identify lexical and semantic similarities between attribute values and can analyze and compute similarity scores for direct-linked attribute values and cross-linked attribute values extracted from different data records.

Computer-readable recording medium recording index generation program, information processing apparatus and search method

A non-transitory computer-readable recording medium records an index generation program for causing a computer to execute processing of: inputting data which is described by a combination of an item and a value; and generating index information regarding an appearance position of each of the item and the value for each of the item and the value which are included in the data.
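One way to picture such an index is a pair of maps, one from each item and one from each value to the positions where it appears. The sketch below assumes a position is a `(record number, pair offset)` tuple and that a search intersects the two position lists; neither detail comes from the abstract, and `build_index` and `search` are hypothetical names.

```python
from collections import defaultdict


def build_index(records):
    """Index the appearance positions of every item and every value.

    Each record is a list of (item, value) pairs; a position is the assumed
    tuple (record number, pair offset within the record).
    """
    items = defaultdict(list)
    values = defaultdict(list)
    for rec_no, record in enumerate(records):
        for offset, (item, value) in enumerate(record):
            items[item].append((rec_no, offset))
            values[value].append((rec_no, offset))
    return dict(items), dict(values)


def search(items, values, item, value):
    """Return positions where `item` and `value` appear as the same pair."""
    return sorted(set(items.get(item, [])) & set(values.get(value, [])))


records = [
    [("name", "alice"), ("city", "paris")],
    [("name", "bob"), ("city", "paris"), ("pet", "alice")],
]
items, values = build_index(records)
hits = search(items, values, "name", "alice")
```

Because the value "alice" also appears under the item "pet", intersecting the two position lists is what restricts the hit to the pair `name = alice`.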

Remote control of a change data capture system

The present disclosure relates to a control system for remotely controlling a change data capture (CDC) system. The CDC system comprises a source computing system and a target computing system. The target computing system is configured to store a copy of data of the source computing system. The source computing system and the target computing system are configured to execute coordinated actions using predefined agents in order to identify a change to data of the source computing system and to propagate and store the change to the target computing system. The control system is configured for dynamically installing user-defined functions (UDFs) in the source and target systems in order to control the agents to perform the predefined actions.

Multicriteria record linkage with surrogate blocking keys

A computer-implemented method and a related system for record linkage of an incoming record to a reference data set may be provided. The method comprises providing a reference data set comprising a plurality of records, each record comprising a plurality of attributes. The method further comprises assigning each of the plurality of records an initial surrogate identifier value, assigning a plurality of block identifiers to each of the records by applying a locality sensitive hashing function to a predefined attribute of the records, resulting in the plurality of the block identifiers, and determining a final surrogate identifier value for each of the records assigned to one of the blocks such that the final surrogate identifier values in each block are uniformly distributed.
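The block-identifier step can be sketched with MinHash plus banding, one common family of locality sensitive hashing. The abstract does not name the hashing function, so this choice, the character 3-gram shingling, the parameter values, and all function names below are assumptions; the surrogate identifier assignment is not sketched.

```python
import hashlib


def shingles(text, k=3):
    """Character k-grams of the attribute; short strings yield one shingle."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def seeded_hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(text, num_hashes=8):
    """MinHash signature: per seed, the minimum seeded hash over all shingles."""
    grams = shingles(text.lower())
    return [min(seeded_hash(seed, g) for g in grams) for seed in range(num_hashes)]


def block_ids(text, num_hashes=8, rows_per_band=2):
    """Split the signature into bands; each band yields one block identifier."""
    sig = minhash_signature(text, num_hashes)
    ids = []
    for band, start in enumerate(range(0, num_hashes, rows_per_band)):
        chunk = ",".join(str(h) for h in sig[start:start + rows_per_band])
        short = hashlib.sha256(chunk.encode("utf-8")).hexdigest()[:12]
        ids.append(f"{band}:{short}")
    return ids


a = block_ids("Jonathan Smith")
b = block_ids("jonathan smith")
```

Each record receives one block identifier per band, so two records become match candidates as soon as they agree on any single band. Attribute values that differ only in letter case, as above, land in exactly the same blocks.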

PRESERVING METADATA CONTEXT IN A HYBRID CLOUD CONTEXT

A technique for retaining a context in which data resides independently of a data store from which the data originates is disclosed. In relation to a method aspect of the technique, a computer-implemented method provides data with related first metadata, both originating from a data store, and extracts the data and the related first metadata independently from the data store. A universal unique identifier of a portion of the data for which portion-specific first metadata exists is created as part of the related first metadata. The universal unique identifier of the portion of the data is integrated into the related first metadata, thereby creating modified first metadata as an independently manageable and linkable representation of the related first metadata.

Staggered merging in log-structured merge forests
11514014 · 2022-11-29

At least one aspect of the present disclosure is directed to systems and methods of maintaining key-value stores. The method can include establishing a first run of data records indexed by a key value. The method can include tracking, using an index, a merging of the data records of the first run onto a merge level on a database. The method can include establishing, concurrent to the merging of the first run, a second run of data records indexed by a key value. The method can include determining that the index tracking the merge of the data records of the first run onto the merge level satisfies a quantile condition. The method can include adding a subset of the data records of the second run to the merging of the data records of the first run onto the merge level maintained on the database.