G06F16/81

SCHEMA VALIDATION WITH SUPPORT FOR ORDERING
20230014239 · 2023-01-19

Computer-readable media, methods, and systems are disclosed for validating data associated with schemas. A user defines the object model of at least one asset; a first schema is generated in accordance with the defined object model, and a unique fingerprint is generated for it. Data is collected from one or more devices in accordance with the object model. The collected data is serialized, and a second schema is generated. The second schema is ordered in accordance with the first schema, and a unique fingerprint is generated for it. The fingerprint of the first schema is compared with the fingerprint of the second schema to determine efficiently whether the schemas are equal, and the associated data may then be validated. A fingerprint cache may be updated with fingerprints associated with a plurality of schemas, as well as the version history of each schema, so that later reviews remain efficient.
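The ordering and fingerprint comparison described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: representing schemas as nested dictionaries, using SHA-256 as the fingerprint, and the names `order_like`, `fingerprint`, and `fingerprint_cache` are all assumptions made for illustration.

```python
import hashlib
import json


def order_like(reference, schema):
    """Reorder the keys of `schema` to follow the key order of `reference`.

    Assumption: both schemas are nested dicts of field name -> type.
    Keys missing from `reference` are appended in their original order.
    """
    if not isinstance(reference, dict) or not isinstance(schema, dict):
        return schema
    ordered = {}
    for key in reference:
        if key in schema:
            ordered[key] = order_like(reference[key], schema[key])
    for key in schema:
        if key not in ordered:
            ordered[key] = schema[key]
    return ordered


def fingerprint(schema):
    """Hash the serialized schema; key order affects the result."""
    text = json.dumps(schema, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# First schema: generated from the user-defined object model (hypothetical).
first = {"name": "string", "reading": {"value": "double", "unit": "string"}}
# Second schema: derived from serialized device data, fields in another order.
second = {"reading": {"unit": "string", "value": "double"}, "name": "string"}

fp_first = fingerprint(first)
fp_second = fingerprint(order_like(first, second))
schemas_equal = fp_first == fp_second

# Hypothetical cache: schema name -> fingerprints of successive versions.
fingerprint_cache = {}
fingerprint_cache.setdefault("sensor", []).append(fp_first)
```

Comparing two fixed-length digests avoids a field-by-field walk of both schemas, which is one plausible reading of the efficiency the abstract claims. Ordering the second schema first matters because the serialized text, and therefore the hash, depends on key order.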

Parser for schema-free data exchange format

A method includes obtaining a query containing at least one field from which data is being queried, obtaining a dataset having a schema-free data exchange format having multiple fields of data at different physical positions in the dataset, and parsing the dataset by obtaining a structural index that maps logical locations of fields to physical locations of the fields of the dataset, accessing the structural index with logical locations of the fields that index to the physical locations, and providing data from the fields based on the physical locations responsive to the query.
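As a rough illustration of a structural index, the sketch below scans a JSON record once, records the character offsets of each top-level field's value, and then decodes only the fields named in a query. JSON as the schema-free format, the restriction to top-level fields of a single object, and the names `build_structural_index` and `query_fields` are assumptions; the abstract does not specify any of them.

```python
import json


def build_structural_index(text):
    """Map each top-level field name to the (start, end) offsets of its value.

    Assumption: `text` is one valid JSON object.
    """
    index = {}
    depth = 0
    in_string = False
    escaped = False
    string_start = 0
    pending_key = None
    key = None
    value_start = None
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
                # A string at depth 1 seen before a colon is a field name.
                if depth == 1 and value_start is None:
                    pending_key = json.loads(text[string_start:i + 1])
            continue
        if ch == '"':
            in_string = True
            string_start = i
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            if depth == 1 and value_start is not None:
                index[key] = (value_start, i)
                value_start = None
            depth -= 1
        elif ch == ":" and depth == 1:
            key = pending_key
            value_start = i + 1
        elif ch == "," and depth == 1:
            index[key] = (value_start, i)
            value_start = None
    return index


def query_fields(text, index, fields):
    """Decode only the requested fields, using their physical offsets."""
    result = {}
    for field in fields:
        start, end = index[field]
        result[field] = json.loads(text[start:end])
    return result


original = {"id": 7, "user": {"name": 'a,b: "x"'}, "tags": ["p", "q"], "note": "hi}"}
record = json.dumps(original)
index = build_structural_index(record)
selected = query_fields(record, index, ["tags", "id"])
```

Here the field name is the logical location and the offset pair is the physical location. Once the index exists, a query touches only the slices it needs instead of decoding the whole record.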

Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags

Computer-implemented systems and methods are disclosed to interface with one or more storage devices storing a plurality of documents. Each of the plurality of documents is associated with one or more tags of one or more predefined hierarchies of tags, and the one or more hierarchies of tags include multiple dimensions. In accordance with some embodiments, a method is provided to identify one or more documents from the storage devices. The method comprises acquiring, via an interface, a selection of one or more tags of the one or more predefined hierarchies of tags. The method further comprises identifying one or more documents from the storage devices in response to the selection, the identified one or more documents having tags that have a relationship with the selected tags, and providing data corresponding to the identified documents for display in the interface.
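One way to picture tag selection across hierarchies and dimensions is the sketch below. It rests on assumptions the abstract does not state: each hierarchy root is treated as a dimension, a document tag is "related" to a selected tag when it equals the tag or descends from it, and selections in different dimensions are combined with AND. The tag names, the `PARENT` table, and the function names are hypothetical.

```python
# Hypothetical tag hierarchies: child tag -> parent tag (roots map to None).
PARENT = {
    "region": None, "europe": "region", "france": "europe", "germany": "europe",
    "topic": None, "finance": "topic", "tax": "finance",
}


def dimension(tag):
    """Return the root of the hierarchy that `tag` belongs to."""
    while PARENT[tag] is not None:
        tag = PARENT[tag]
    return tag


def is_related(doc_tag, selected_tag):
    """True if `doc_tag` equals `selected_tag` or is one of its descendants."""
    current = doc_tag
    while current is not None:
        if current == selected_tag:
            return True
        current = PARENT[current]
    return False


def find_documents(documents, selected_tags):
    """Return ids of documents related to the selection in every dimension."""
    by_dimension = {}
    for tag in selected_tags:
        by_dimension.setdefault(dimension(tag), []).append(tag)
    results = []
    for doc_id, tags in documents.items():
        if all(
            any(is_related(t, s) for t in tags for s in selection)
            for selection in by_dimension.values()
        ):
            results.append(doc_id)
    return results


documents = {
    "d1": {"france", "tax"},
    "d2": {"germany"},
    "d3": {"france"},
    "d4": {"finance"},
}
both_dimensions = find_documents(documents, ["europe", "finance"])
one_dimension = find_documents(documents, ["europe"])
```

With these sample documents, selecting `europe` and `finance` leaves only `d1`, because it is the only document tagged in both dimensions. Selecting `europe` alone returns every document tagged with a European country.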

TECHNIQUES FOR IMAGE CONTENT EXTRACTION

Embodiments are directed to techniques for image content extraction. Some embodiments include extracting contextually structured data from document images, such as by automatically identifying document layout, document data, document metadata, and/or correlations therebetween in a document image. Some embodiments utilize breakpoints to enable the system to match different documents with internal variations to a common template. Several embodiments include extracting contextually structured data from table images, such as gridded and non-gridded tables. Many embodiments are directed to generating and utilizing a document template database for automatically extracting document image contents into a contextually structured format. Several embodiments are directed to automatically identifying and associating document metadata with corresponding document data in a document image to generate a machine-facilitated annotation of the document image. In some embodiments, the machine-facilitated annotation may be used to generate a template for the template database.

Metadata driven dataset management

A method is described for configuring the operation of the software of a data as a service (DAAS) system during run time. The configuring includes at least one of (i) how to ingest a vendor dataset to produce an ingested dataset and (ii) which analysis operations to perform on the vendor dataset to produce an analyzed dataset. The configuring also includes at least one of (i) how to search the vendor dataset based on a search query from a customer, allowing the customer to locate a new record in the vendor dataset, and (ii) how to match records in the vendor dataset with a match query from the customer, providing an updated record to the customer.
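A small sketch can make the idea of metadata-driven configuration concrete: a metadata dictionary, read at run time, decides how vendor rows are ingested, which analysis steps run, which fields a search covers, and which keys a match uses. The metadata layout, the field names, and the `ANALYSES` registry are all hypothetical; the abstract does not describe a concrete format.

```python
# Hypothetical registry of analysis operations selectable by name.
ANALYSES = {
    "strip": lambda rec: {k: v.strip() if isinstance(v, str) else v
                          for k, v in rec.items()},
    "upper_country": lambda rec: {**rec, "country": rec["country"].upper()},
}


def ingest(rows, meta):
    """Rename vendor columns as directed by the metadata."""
    rename = meta["ingest"]["rename"]
    return [{rename.get(k, k): v for k, v in row.items()} for row in rows]


def analyze(records, meta):
    """Apply the analysis operations listed in the metadata, in order."""
    for name in meta["analysis"]:
        records = [ANALYSES[name](r) for r in records]
    return records


def search(records, meta, text):
    """Case-insensitive substring search over the configured fields."""
    needle = text.lower()
    fields = meta["search"]["fields"]
    return [r for r in records
            if any(needle in str(r[f]).lower() for f in fields)]


def match(records, meta, customer_record):
    """Return the vendor record that agrees on all configured match keys."""
    keys = meta["match"]["keys"]
    for r in records:
        if all(r[k] == customer_record[k] for k in keys):
            return r
    return None


meta = {
    "ingest": {"rename": {"co_name": "name", "ctry": "country", "vendor_id": "id"}},
    "analysis": ["strip", "upper_country"],
    "search": {"fields": ["name"]},
    "match": {"keys": ["id"]},
}
vendor_rows = [
    {"co_name": " Acme Ltd ", "ctry": "gb", "vendor_id": "1"},
    {"co_name": "Globex", "ctry": "us", "vendor_id": "2"},
]
records = analyze(ingest(vendor_rows, meta), meta)
found = search(records, meta, "acme")
updated = match(records, meta, {"id": "2", "name": "Globex Corp"})
```

In this sketch, changing `meta` changes the behavior of every stage without touching the functions themselves, which is the sense in which the dataset handling is driven by metadata.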

Managing data objects for graph-based data structures

Various embodiments provide methods, systems, apparatus, computer program products, and/or the like for managing, ingesting, monitoring, updating, and/or extracting/retrieving information/data associated with an electronic record (ER) stored in an ER data store and/or accessing information/data from the ER data store, wherein the ERs are generated, updated/modified, and/or accessed via a graph-based domain ontology.