Patent classifications
G06V30/413
AUTOMATED DOCUMENT PROCESSING FOR DETECTING, EXTRACTING, AND ANALYZING TABLES AND TABULAR DATA
According to one embodiment, a computer-implemented method for detecting and classifying columns of tables and/or tabular data arrangements within image data includes: detecting one or more tables and/or one or more tabular data arrangements within the image data; extracting the one or more tables and/or the one or more tabular data arrangements from the processed image data; and classifying either: a plurality of columns of the one or more extracted tables; a plurality of columns of the one or more extracted tabular data arrangements; or both the columns of the one or more extracted tables and the columns of the one or more extracted tabular data arrangements. Corresponding systems and computer program products are also disclosed.
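The column-classification step can be illustrated with a minimal sketch. It assumes the table has already been detected and extracted into rectangular rows of cell strings; the labels ("numeric", "date", "text"), the regular expressions, and the majority-vote rule are hypothetical choices for illustration and are not taken from the abstract.

```python
import re
from collections import Counter

# Hypothetical cell patterns; a real system would use richer features.
_NUMERIC = re.compile(r"^[-+]?[$€£]?\d[\d,]*(\.\d+)?%?$")
_DATE = re.compile(r"^\d{1,4}[-/]\d{1,2}[-/]\d{1,4}$")


def classify_cell(text):
    """Assign a coarse type label to one cell's text."""
    t = text.strip()
    if not t:
        return "empty"
    if _DATE.match(t):
        return "date"
    if _NUMERIC.match(t):
        return "numeric"
    return "text"


def classify_columns(rows, has_header=True):
    """Label each column by the most common non-empty cell type.

    Assumes every row has the same number of cells.
    """
    body = rows[1:] if has_header else rows
    labels = []
    for column in zip(*body):
        counts = Counter(classify_cell(cell) for cell in column)
        counts.pop("empty", None)  # ignore blank cells when voting
        labels.append(counts.most_common(1)[0][0] if counts else "empty")
    return labels


rows = [
    ["Item", "Date", "Amount"],
    ["Widget", "2021-03-04", "$1,200.50"],
    ["Gadget", "2021/04/05", "87"],
    ["Thing", "", "n/a"],
]
print(classify_columns(rows))
```

Voting over the body cells keeps a single malformed cell (such as "n/a" in a numeric column) from changing the column label.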
Method and apparatus for detecting anomalies in mission critical environments using word representation learning
A method and system for detecting anomalies in mission-critical environments using word representation learning are provided. The method includes parsing at least one received data set into a text structure; isolating a protocol language of the at least one received data set, wherein the protocol language is a standardized pattern for communication over at least one communication protocol; generating at least one document from the contents of the received at least one data set, wherein the at least one document includes at least one parsed text structure referencing a unique identifier; detecting insights in the at least one generated document, wherein insights are detected in at least one representation having at least one dimension, wherein the representation is mapped to at least one learned hyperspace; extracting rules from the detected insights; and detecting anomalies by applying the extracted rules on patterns for communication over at least one communication protocol.
Systems and methods for recommendation generation
One or more computing devices, systems, and/or methods for generating and providing recommendations of products are provided. For example, content is extracted from a message sent to a user. The content is evaluated to identify a product identifier corresponding to a product title of a product. If the product identifier is a truncated version of the product title, then a database of product titles and frequencies of occurrence of the product titles is used to complete the product title. A model is used to infer a product category for the product title. Matching scores are assigned to products within a product category based upon weighted attributes. A recommendation is provided to the user for a product having a matching score greater than a matching threshold.
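As a rough sketch of two steps in this abstract, the code below completes a truncated product identifier from a frequency table and scores catalog products with weighted attributes. The title table, attribute names, weights, and threshold are all hypothetical, and prefix matching is only one plausible way to complete a title; the category-inference model is omitted.

```python
from typing import Optional

# Hypothetical database: product title -> frequency of occurrence.
TITLE_FREQUENCIES = {
    "Acme Wireless Noise Cancelling Headphones": 120,
    "Acme Wireless Mouse": 45,
    "Acme Wired Keyboard": 30,
}


def complete_title(identifier: str, frequencies: dict) -> Optional[str]:
    """Return the most frequent known title that starts with the identifier."""
    prefix = identifier.rstrip(".… ").lower()  # drop trailing ellipsis marks
    candidates = [t for t in frequencies if t.lower().startswith(prefix)]
    if not candidates:
        return None
    return max(candidates, key=lambda t: frequencies[t])


def matching_score(reference: dict, candidate: dict, weights: dict) -> float:
    """Weighted fraction of attributes on which candidate equals reference."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    matched = sum(
        w for attr, w in weights.items()
        if attr in reference and reference[attr] == candidate.get(attr)
    )
    return matched / total


def recommend(reference: dict, catalog: list, weights: dict, threshold: float) -> list:
    """Titles of catalog products scoring strictly above the threshold."""
    return [
        p["title"] for p in catalog
        if matching_score(reference, p, weights) > threshold
    ]


weights = {"brand": 2, "type": 5, "wireless": 3}  # hypothetical weights
reference = {"brand": "Acme", "type": "headphones", "wireless": True}
catalog = [
    {"title": "Other ANC Headphones", "brand": "Other", "type": "headphones", "wireless": True},
    {"title": "Acme Wired Keyboard", "brand": "Acme", "type": "keyboard", "wireless": False},
]
print(complete_title("Acme Wireless...", TITLE_FREQUENCIES))
print(recommend(reference, catalog, weights, threshold=0.7))
```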
Systems, methods and computer readable media for identifying content to represent web pages and creating a representative image from the content
Provided herein are systems, methods and computer readable media for identifying content to represent web pages and creating a representative image from the content. An example method may include retrieving a web document using a uniform resource locator (URL) contained in a dequeued work item, determining, from the web document, candidate images for creation of the representative image including extracting image references, wherein the image references are extracted by identifying image tags with source attributes, values of which are URLs locating images, filtering the URLs using a blacklist of expressions designed to match the URLs of images comprising one or more predefined undesirable characteristics, and retrieving the images which do not match any of the expressions using an HTTP client, and creating the representative image, comprising at least modifying a chosen image selected from among the candidate images.
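The image-reference extraction and URL filtering described here might look roughly like the following, using only the Python standard library. The blacklist patterns and the example URLs are hypothetical, and the sketch stops before retrieving the surviving images with an HTTP client or modifying the chosen image.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcParser(HTMLParser):
    """Collects absolute URLs from <img> tags that carry a src attribute."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative references against the page URL.
                self.urls.append(urljoin(self.base_url, src))


# Hypothetical expressions for undesirable images (spacers, ads, GIFs).
BLACKLIST = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"spacer", r"/ads?/", r"\.gif$")
]


def candidate_image_urls(html, base_url, blacklist=BLACKLIST):
    """Return image URLs that match none of the blacklist expressions."""
    parser = ImgSrcParser(base_url)
    parser.feed(html)
    return [u for u in parser.urls if not any(p.search(u) for p in blacklist)]


page = (
    '<img src="/img/hero.jpg">'
    '<img src="http://cdn.example.com/ads/banner.png">'
    '<img src="spacer.gif">'
    '<img alt="no source">'
)
print(candidate_image_urls(page, "http://example.com/page"))
```

Filtering on the URL before any download means undesirable images are never fetched, which matches the order of steps given in the abstract.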
Fast identification of text intensive pages from photographs
Methods and systems for training a neural network to distinguish between text documents and image documents are described. A corpus of text and image documents is obtained. A page of a text document is scanned by shifting a text window to a plurality of locations. In accordance with a determination that the text in the window at a respective location meets text line criteria, the text in the window is stored as a respective text snippet. A plurality of image windows are superimposed over at least one page of an image document. In accordance with a determination that the content of a respective image window meets image criteria, content of the image window is stored as a respective image snippet. The respective text snippet and the respective image snippet are provided to a classifier.
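The window-shifting step can be sketched generically: slide a fixed-size window over a page and keep the windows that satisfy a criterion. Here the page is a hypothetical 2D grid of 0/1 values, and the "enough ink" test stands in for the text line criteria and image criteria, which the abstract does not specify; the classifier itself is not shown.

```python
def window_positions(height, width, win_h, win_w, stride):
    """Yield (top, left) corners of every window that fits on the page."""
    for top in range(0, height - win_h + 1, stride):
        for left in range(0, width - win_w + 1, stride):
            yield top, left


def has_enough_ink(window, min_fraction=0.25):
    """Hypothetical criterion: at least min_fraction of cells are non-zero."""
    cells = [c for row in window for c in row]
    return sum(1 for c in cells if c) / len(cells) >= min_fraction


def collect_snippets(page, win_h, win_w, stride, meets_criteria=has_enough_ink):
    """Return the sub-grids of page that satisfy meets_criteria."""
    height = len(page)
    width = len(page[0]) if page else 0
    snippets = []
    for top, left in window_positions(height, width, win_h, win_w, stride):
        window = [row[left:left + win_w] for row in page[top:top + win_h]]
        if meets_criteria(window):
            snippets.append(window)
    return snippets


page = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
snippets = collect_snippets(page, win_h=2, win_w=2, stride=2)
print(len(snippets))
```

Passing the criterion as a function lets the same loop serve both text windows and image windows with different acceptance tests.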
DERIVING GLOBAL INTENT FROM A COMPOSITE DOCUMENT TO FACILITATE EDITING OF THE COMPOSITE DOCUMENT
An illustrator system accesses a multi-element document, the multi-element document including a plurality of elements. The illustrator system determines, for each of the plurality of elements, an element-specific topic distribution comprising a ranked list of topics. The illustrator system creates a first aggregated topic distribution from the determined element-specific topic distributions. The illustrator system determines a global intent for the multi-element document, the global intent including one or more terms from the first aggregated topic distribution. The illustrator system queries a database using the global intent to retrieve a substitute element. The illustrator system generates a replacement multi-element document that includes a substitute element in place of an element in the multi-element document. The at least one substitute element is different from the element in the displayed multi-element document.
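One way to picture the aggregation step is the sketch below, which assumes each element's topic distribution is a mapping from topic to score and combines them by summing scores. The summation rule, the topic names, and the choice of the top two terms as the global intent are assumptions for illustration; the abstract does not say how the aggregated distribution is computed.

```python
from collections import Counter


def aggregate_topics(element_distributions):
    """Sum per-element topic scores and return topics ranked by total score."""
    totals = Counter()
    for distribution in element_distributions:
        totals.update(distribution)  # adds scores topic by topic
    # Highest score first; ties broken alphabetically for a stable ranking.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


def global_intent(element_distributions, top_k=2):
    """Take the top_k topics of the aggregated distribution as query terms."""
    ranked = aggregate_topics(element_distributions)
    return [topic for topic, _ in ranked[:top_k]]


# Hypothetical distributions for three elements of a composite document.
elements = [
    {"beach": 0.5, "sunset": 0.25},
    {"beach": 0.25, "travel": 0.5},
    {"sunset": 0.5},
]
print(global_intent(elements))
```

The resulting terms would then be used to query a database for a substitute element, a step not shown here.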