G06F16/258

DATA STRUCTURE MANAGEMENT SYSTEM
20230043217 · 2023-02-09

A computing device generates a first token for first data content that is associated with a first relationship and a second relationship, and a second token for second data content that is associated with the first relationship and a third relationship, such that the first token and second token are generated based on a frequency of use of data values included in the first and the second data content. The computing device calculates a first similarity score of data values from third data content that is associated with the second relationship and a fourth relationship with data values from fourth data content that is associated with the third relationship and the fourth relationship in response to the first and second token matching. The computing device then performs, in response to the first similarity score satisfying a similarity threshold, a first modification to any of the data content.
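
The token-then-similarity flow in this abstract can be illustrated with a minimal sketch. The abstract does not say how tokens or similarity scores are computed, so everything below is an assumption made for illustration: the token is taken to be the most frequent values of a column, the similarity score is Jaccard overlap, and the names `frequency_token`, `jaccard`, and `should_modify` are hypothetical.

```python
from collections import Counter


def frequency_token(values, k=3):
    """Assumed token: the k most frequently used values, most frequent first."""
    counts = Counter(values)
    # Sort by descending count, then by value, so ties resolve deterministically.
    top = sorted(counts.items(), key=lambda item: (-item[1], str(item[0])))[:k]
    return tuple(value for value, _ in top)


def jaccard(a, b):
    """Assumed similarity score: overlap of the two sets of data values."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def should_modify(first, second, third, fourth, threshold=0.5):
    """Compare the related content only when the two tokens match."""
    if frequency_token(first) != frequency_token(second):
        return False
    return jaccard(third, fourth) >= threshold


# Hypothetical data: two country columns and the city columns related to them.
first = ["US", "US", "DE", "FR", "US", "DE"]
second = ["US", "DE", "US", "FR", "US", "DE"]
third = ["Berlin", "Paris", "Austin"]
fourth = ["Berlin", "Paris", "Boston"]
decision = should_modify(first, second, third, fourth)
```

Here the two tokens match, the city columns share two of four distinct values, and the assumed threshold of 0.5 is met, so `decision` is `True`. What the resulting modification would be is left open, as in the abstract.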

Cognitive data discovery and mapping for data onboarding

Performing an operation comprising transforming an input dataset to a predefined format, extracting, from the transformed dataset, a plurality of features describing the transformed dataset, and generating, by a machine learning (ML) algorithm executing on a processor and based on an ML model, a plurality of rules for modifying the transformed dataset to conform with a first data model.

Centralized data management

A data management platform for managing interconnected data and its derivatives is disclosed. In one example of the present disclosure, the data management platform receives data assets through an API gateway. The data assets are reformatted based upon a corresponding data model. A set of data management modules is accessed through a corresponding API. The set of data management modules includes tagging, ownership, relationship, cataloging, discovery, lineage and provenance, and lifecycle. The management modules provide dynamic identification of interconnections between the data assets. Interconnections for the data assets are generated, and the data assets and the interconnection data are stored based upon a format of the data.

Generation of text from structured data

Implementations of the subject matter described herein provide a solution for generating text from structured data. In this solution, the structured data is converted into a representation, where the structured data comprises a plurality of cells, and the representation of the structured data comprises a plurality of representations of the plurality of cells. A natural language sentence associated with the structured data may be determined based on the representation of the structured data, thereby implementing the function of converting the structured data into text.

Pagination processing and display of data sets
11550814 · 2023-01-10

A method including receiving a request for a report on a data set. The method also includes providing the report. The report includes a macro page having a subset of the data set. The method also includes converting the macro page into a primary tree data structure having levels. The method also includes buffering the primary tree data structure in a buffer to form a buffered tree data structure. The buffered tree data structure is buffered in a level order of the levels. The method also includes selecting a first micro page from the buffered tree data structure. The first micro page is configured for display on a user interface. The method also includes transmitting, to the user interface, the first micro page.
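
The buffering step can be sketched as a breadth-first traversal, which is one common way to visit a tree in level order. This is a minimal illustration, not the claimed method: the `Node` class, the helper names `buffer_level_order` and `micro_page`, and the fixed micro-page size are all assumptions.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node of the primary tree built from a macro page."""
    label: str
    children: list[Node] = field(default_factory=list)


def buffer_level_order(root):
    """Buffer the tree breadth-first so shallower levels come first."""
    buffered = []
    queue = deque([(root, 0)])
    while queue:
        node, level = queue.popleft()
        buffered.append((level, node.label))
        for child in node.children:
            queue.append((child, level + 1))
    return buffered


def micro_page(buffered, index, size):
    """Select one micro page (a fixed-size slice) from the buffered tree."""
    start = index * size
    return buffered[start:start + size]


# Hypothetical macro page: a report grouped by region.
macro = Node("report", [
    Node("region A", [Node("row 1"), Node("row 2")]),
    Node("region B", [Node("row 3")]),
])
buffered = buffer_level_order(macro)
first_page = micro_page(buffered, 0, 3)
```

With this input, `first_page` holds the root and both region nodes, so the first micro page sent to the user interface shows the top of the hierarchy before any detail rows.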

Data-determinant query terms

Systems and methods are disclosed for flexibly applying a query term to heterogeneous data. A query system can receive a query that includes a data-determinant query term. As the system executes the query, it can generate interim search results. As the system processes the interim search results based on the query, it can apply the data-determinant query term to records of the interim search results based on the structure of the records.

System for detecting data relationships based on sample data

A method of identifying relationships between data collections is disclosed. Each data collection comprises a plurality of data records made up of data fields. The method comprises performing a relationship search process based on a first seed value and a second seed value. A first set of records from the data collections is identified based on the first seed value. A second set of records from the data collections is identified based on the second seed value. The process then searches for a common value across the first and second record sets, wherein the common value is a value which appears in a first field in a first record of the first record set and in a second field in a second record of the second record set, wherein the first record is from a first data collection and the second record is from a second data collection. In response to identifying the common value, an indication is output identifying a candidate relationship between the first field of the first data collection and the second field of the second data collection.
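
The seed-based search described above can be sketched as follows. This is a simplified illustration under stated assumptions: each data collection is modeled as a list of dictionaries, matching is by exact equality, and the names `records_matching` and `candidate_relationships` are hypothetical.

```python
def records_matching(collections, seed):
    """Return (collection name, record) pairs whose fields contain the seed value."""
    hits = []
    for name, records in collections.items():
        for record in records:
            if seed in record.values():
                hits.append((name, record))
    return hits


def candidate_relationships(collections, seed_a, seed_b):
    """Find field pairs in different collections that share a common value."""
    first_set = records_matching(collections, seed_a)
    second_set = records_matching(collections, seed_b)
    candidates = set()
    for coll_a, rec_a in first_set:
        for coll_b, rec_b in second_set:
            if coll_a == coll_b:
                continue  # the relationship must span two data collections
            for field_a, value_a in rec_a.items():
                for field_b, value_b in rec_b.items():
                    if value_a == value_b:
                        candidates.add(((coll_a, field_a), (coll_b, field_b)))
    return candidates


# Hypothetical sample data: the seeds are a customer name and an ordered item.
collections = {
    "customers": [
        {"customer_id": 17, "name": "Ada Lovelace"},
        {"customer_id": 18, "name": "Alan Turing"},
    ],
    "orders": [
        {"order_id": 900, "buyer": 17, "item": "notebook"},
        {"order_id": 901, "buyer": 18, "item": "pen"},
    ],
}
found = candidate_relationships(collections, "Ada Lovelace", "notebook")
```

In this example the value 17 appears in `customers.customer_id` and in `orders.buyer`, so that pair is reported as the only candidate relationship.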

Systems and methods for verification of property records

Systems and methods for verification of public property records and other information associated with real estate properties compare information from different providers. The information is formatted in different provider-specific ways. The systems and methods enable comparisons through predetermined sets of textual manipulations that counteract or remedy differences in formatting, collection methodology, and data management practices.
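
As an illustration of a predetermined set of textual manipulations, the sketch below normalizes street addresses before comparing them. The abstract does not list the actual manipulations, so the specific rules (upper-casing, stripping punctuation, abbreviating street suffixes), the `SUFFIXES` table, and the function names are assumptions.

```python
import re

# Assumed abbreviation table; a real system would likely use a much longer list.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "DRIVE": "DR"}


def normalize_address(raw):
    """Apply a fixed sequence of manipulations to remove formatting differences."""
    text = raw.upper()
    text = re.sub(r"[^\w\s]", " ", text)  # replace punctuation with spaces
    tokens = text.split()                 # also collapses repeated whitespace
    tokens = [SUFFIXES.get(token, token) for token in tokens]
    return " ".join(tokens)


def records_agree(address_a, address_b):
    """Two provider records agree when their normalized addresses are equal."""
    return normalize_address(address_a) == normalize_address(address_b)


# Hypothetical records for the same property from two providers.
provider_one = "123 Main Street, Apt. 4"
provider_two = "123 main st apt 4"
same_property = records_agree(provider_one, provider_two)
```

Both inputs normalize to `123 MAIN ST APT 4`, so the two provider-specific formats compare as equal.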

Custodian disambiguation and data matching

Provided is a technique for matching different user representations of a person in a plurality of computer systems. The technique includes collecting information sets about user representations from a plurality of computer systems; normalizing the information sets to a unified format; grouping the information sets in the unified format into indexing buckets based on a user name using a non-phonetic algorithm; determining a similarity score for each pair of information sets in each of the indexing buckets; classifying each information set pair into a set of classes based on the similarity scores, wherein the set of classes comprises at least matches and non-matches; and using a data structure for merging information of information set pairs classified as matches.
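
The steps listed above can be sketched end to end. The abstract leaves the concrete algorithms open, so this sketch makes several assumptions: the non-phonetic bucket key is the sorted initials of the name tokens, the similarity score comes from Python's `difflib.SequenceMatcher`, the match threshold is 0.85, and the merging data structure is a union-find forest. All function names are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record):
    """Unified format: lower-case name with punctuation removed and tokens sorted."""
    tokens = sorted(record["name"].lower().replace(",", " ").split())
    return {"system": record["system"], "name": " ".join(tokens)}


def bucket_key(name):
    """Assumed non-phonetic key: sorted initials of the name tokens."""
    return "".join(sorted(token[0] for token in name.split()))


def merge_matches(records, threshold=0.85):
    normalized = [normalize(record) for record in records]
    buckets = defaultdict(list)
    for index, record in enumerate(normalized):
        buckets[bucket_key(record["name"])].append(index)

    parent = list(range(len(records)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Score only pairs that share a bucket, then merge the pairs classed as matches.
    for members in buckets.values():
        for a, b in combinations(members, 2):
            score = SequenceMatcher(
                None, normalized[a]["name"], normalized[b]["name"]
            ).ratio()
            if score >= threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(set)
    for index, record in enumerate(normalized):
        groups[find(index)].add(record["system"])
    return sorted(sorted(group) for group in groups.values())


# Hypothetical user representations collected from three systems.
records = [
    {"system": "mail", "name": "Ada Lovelace"},
    {"system": "hr", "name": "Lovelace, Ada"},
    {"system": "wiki", "name": "Alan Lang"},
]
merged = merge_matches(records)
```

All three records fall into the same bucket, but only the two spellings of Ada Lovelace score above the threshold, so the `mail` and `hr` representations are merged while `wiki` stays separate.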

Multi-tenant system for providing arbitrary query support

A method comprising receiving by an arbitrary query engine a user request to perform a query associated with user data including first data and second data; partitioning the query into first and second sub-queries; providing the first sub-query to a first service provider interface (SPI) integrated into a first service configured to operate on the first data in a first datastore, the first SPI including a common interface component configured based on a uniform access specification to facilitate external communication between the arbitrary query engine and the first SPI, and the first SPI including a first service interface component configured to transform between the uniform access specification and a first service data specification and to facilitate internal data management; obtaining from the first datastore the first data formatted according to the first service data specification; transforming the first data; and providing the transformed first data to the arbitrary query engine.