G06F40/149

Reducing nonvisual noise byte codes in machine readable format documents

A method may include obtaining a first byte stream from first document code and a second byte stream from second document code. The first document code has a document type and the second document code has the document type. The method may further include identifying, in the first byte stream, nonvisual noise corresponding to a custom byte code defined in a custom character encoding set. The nonvisual noise is invisible when rendering the first document code. The method may further include replacing, in the first byte stream, the custom byte code with at least one standard byte code defined in a standard character encoding set to obtain modified document code. The second document code uses the standard character encoding set. The method may further include comparing the modified document code with the second document code by comparing the first byte stream with the second byte stream.
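The replace-then-compare step in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the mapping `NOISE_MAP`, the byte `0x80` standing in for a custom byte code, and the helpers `normalize` and `streams_match` do not come from the patent.

```python
# Hypothetical mapping from custom byte codes to standard byte codes.
# 0x80 is an invented stand-in for a custom code that renders invisibly.
NOISE_MAP = {
    b"\x80": b" ",
}


def normalize(stream: bytes, mapping: dict) -> bytes:
    """Replace every custom byte code in the stream with its standard equivalent."""
    for custom, standard in mapping.items():
        stream = stream.replace(custom, standard)
    return stream


def streams_match(first: bytes, second: bytes, mapping: dict) -> bool:
    """Compare the two documents byte for byte after normalizing the first."""
    return normalize(first, mapping) == second


first_stream = b"Hello\x80World"   # first document code, contains custom noise
second_stream = b"Hello World"     # second document code, standard encoding only
matches = streams_match(first_stream, second_stream, NOISE_MAP)
```

Without the replacement the two streams differ at one byte even though they render identically, which is the false difference the abstract aims to remove.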

Uniform parsing of configuration files for multiple product types

A platform is provided for uniform parsing of configuration files for multiple product types. One method comprises obtaining, by a parser of a given product type, a given request from a message queue based on a metadata message of an incoming configuration file from a remote product of the given product type, wherein the message queue stores metadata messages for a plurality of product types; extracting information from the incoming configuration file based on product-specific business logic obtained from a table store comprising tables for the plurality of product types, wherein the business logic provides a mapping between information extracted from the incoming configuration file and destination database tables; and storing the extracted information in the destination database tables of a product-specific predefined database schema.
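The flow described above (shared queue, per-product mapping, destination tables) might look roughly like the following sketch. The names `TABLE_STORE` and `parse_next`, the JSON file format, the message fields, and the `devices` table are all hypothetical; the abstract does not specify any of them.

```python
import json
import sqlite3
from collections import deque

# Hypothetical table store: for each product type, the destination table and
# a mapping from configuration keys to destination columns.
TABLE_STORE = {
    "storage_array": {
        "table": "devices",
        "columns": {"serial": "serial_number", "fw": "firmware"},
    },
}


def parse_next(queue, product_type, files, db):
    """Take the first queued metadata message for this product type,
    extract the mapped fields from its file, and store them."""
    msg = next((m for m in queue if m["product_type"] == product_type), None)
    if msg is None:
        return None
    queue.remove(msg)

    config = json.loads(files[msg["file_id"]])
    logic = TABLE_STORE[product_type]
    row = {column: config[key] for key, column in logic["columns"].items()}

    # Table and column names come from the trusted table store, not user input.
    columns = ", ".join(row)
    marks = ", ".join("?" for _ in row)
    db.execute(
        f"INSERT INTO {logic['table']} ({columns}) VALUES ({marks})",
        list(row.values()),
    )
    return row


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE devices (serial_number TEXT, firmware TEXT)")
queue = deque([
    {"product_type": "switch", "file_id": "a"},
    {"product_type": "storage_array", "file_id": "b"},
])
files = {"b": '{"serial": "X1", "fw": "2.0"}'}
row = parse_next(queue, "storage_array", files, db)
```

Keeping the mapping in data rather than in parser code is what lets one parser routine serve several product types in this sketch.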

PARALLEL PROCESSING OF HIERARCHICAL TEXT
20230367964 · 2023-11-16

Apparatuses, systems, and techniques to parse textual data using parallel computing devices. In at least one embodiment, text is parsed by a plurality of parallel processing units using a finite state machine and logical stack to convert the text to a tree data structure. Data is extracted from the tree by the plurality of parallel processors and stored.
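To make the finite state machine and logical stack concrete, here is a small single-threaded sketch. It assumes an invented text format in which alphanumeric names are nodes and `{` and `}` delimit children; the patent's actual format and its distribution of work across parallel processing units are not shown. The function `parse` and the state names are hypothetical.

```python
def parse(text):
    """Convert bracketed hierarchical text into a tree of dict nodes."""
    root = {"name": "", "children": []}
    stack = [root]        # logical stack of currently open nodes
    state = "BETWEEN"     # FSM state: "BETWEEN" names or "IN_NAME"
    last = root           # most recently created node

    for ch in text:
        if ch.isalnum():
            if state == "BETWEEN":
                # Start a new node under the node on top of the stack.
                last = {"name": ch, "children": []}
                stack[-1]["children"].append(last)
                state = "IN_NAME"
            else:
                last["name"] += ch
        elif ch == "{":
            stack.append(last)   # the last node now receives children
            state = "BETWEEN"
        elif ch == "}":
            if len(stack) == 1:
                raise ValueError("unmatched closing brace")
            stack.pop()
            state = "BETWEEN"
        else:
            state = "BETWEEN"    # whitespace, commas, and other separators

    if len(stack) != 1:
        raise ValueError("unclosed brace")
    return root["children"]


tree = parse("root{a{b} c}")
```

The stack depth at any character equals the nesting level, which is the piece of context a parallel implementation would need to reconstruct for each chunk of text.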

PROVIDING A BINARY DATA FILE TO A CLIENT APPLICATION USING A DOCUMENT MODEL
20220245588 · 2022-08-04

A method for providing a binary data file to at least a first client application and a second client application includes receiving a first request to access the binary data file from the first client application, and identifying a first document model associated with the binary data file, wherein the first document model reflects edits made to the binary data file by the first client application and the second client application. The method further includes transmitting data corresponding to the first document model to the first client application, receiving, from the second client application, a second request indicating a modification to the binary data file, revising the first document model associated with the binary data file to include the modification to the binary data file, and transmitting data corresponding to the revised first document model to the second client application.
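The request and revision cycle in this abstract can be sketched as follows. The classes `DocumentModel` and `FileServer`, the offset-based edit format, and the client identifiers are assumptions made for illustration; the abstract does not describe how the document model is represented.

```python
class DocumentModel:
    """Hypothetical in-memory model of a binary file plus its edit history."""

    def __init__(self, content: bytes):
        self.content = bytearray(content)
        self.edits = []  # list of (client_id, offset, data) tuples

    def apply(self, client_id: str, offset: int, data: bytes) -> None:
        """Overwrite bytes at the offset and record which client made the edit."""
        self.content[offset:offset + len(data)] = data
        self.edits.append((client_id, offset, data))

    def snapshot(self) -> bytes:
        return bytes(self.content)


class FileServer:
    """Serves one shared document model per file to every client."""

    def __init__(self, files: dict):
        self.models = {fid: DocumentModel(raw) for fid, raw in files.items()}

    def access(self, file_id: str) -> bytes:
        # First request: return data corresponding to the current model.
        return self.models[file_id].snapshot()

    def modify(self, file_id: str, client_id: str, offset: int, data: bytes) -> bytes:
        # Second request: revise the model, then return the revised data.
        model = self.models[file_id]
        model.apply(client_id, offset, data)
        return model.snapshot()


server = FileServer({"f": b"hello world"})
seen_by_first = server.access("f")
seen_by_second = server.modify("f", "client2", 0, b"HELLO")
```

Because both clients read from the same model, a later `access` call by the first client would also reflect the second client's edit.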

System and method for obtaining documents from a composite file
11410445 · 2022-08-09

A system for obtaining documents from a composite file comprising a stream of multiple pages is provided. The system may comprise one or more processors configured to receive the composite file comprising the multiple pages and split the composite file to obtain individual pages of the composite file. The processor may obtain an image of each of the individual pages and, from the image of the respective page, an image vector for that page. The processor may further obtain the text present in each of the individual pages and, from the text of the respective page, a text vector for that page. The processor may further determine a continuity pattern between consecutive pages based on the image vectors and the text vectors of the consecutive pages, and may categorize the consecutive pages as belonging to the same document when the determined continuity pattern indicates that they belong to the same document.
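One simple way to picture the continuity check is the sketch below. It substitutes cosine similarity with a fixed threshold for the patent's continuity pattern, which the abstract does not define; the functions `cosine` and `split_documents`, the averaging of the two scores, and the threshold value are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def split_documents(pages, threshold=0.8):
    """Group page indices into documents.

    pages: list of (image_vector, text_vector) pairs, one per page.
    Consecutive pages stay in the same document when the average of their
    image and text similarities reaches the (assumed) threshold.
    """
    if not pages:
        return []
    documents = [[0]]
    for i in range(1, len(pages)):
        prev_image, prev_text = pages[i - 1]
        cur_image, cur_text = pages[i]
        score = (cosine(prev_image, cur_image) + cosine(prev_text, cur_text)) / 2
        if score >= threshold:
            documents[-1].append(i)
        else:
            documents.append([i])
    return documents


pages = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([1.0, 0.1], [1.0, 0.0]),
    ([0.0, 1.0], [0.0, 1.0]),
]
groups = split_documents(pages)
```

In this toy input the first two pages have nearly identical vectors and are grouped together, while the third page starts a new document.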

System and method for spatial encoding and feature generators for enhancing information extraction
11837002 · 2023-12-05

A system and method are provided for extracting data from a piece of content using spatial information about the piece of content. The system and method may use a conditional random fields (CRF) process, or a bidirectional long short-term memory (BiLSTM) process combined with a CRF process, to extract structured data using the spatial information.
