G06F16/383

METHOD AND SYSTEM FOR IMPLEMENTING MACHINE LEARNING ANALYSIS OF DOCUMENTS

Disclosed is an approach for performing auto-classification of documents. A machine learning framework is provided to analyze the document, where labels associated with certain documents can be propagated to other documents.

AUTOMATIC DETECTION OF PERSONAL IDENTIFIABLE INFORMATION
20230214522 · 2023-07-06

Described herein are example implementations for the automatic detection and handling of personal identifiable information (PII) in electronic records. In some aspects, a system receives one or more computer readable logs of information for one or more computer services, with each log including a string of characters. The system performs one or more string search algorithm based operations on the entirety of the one or more strings of the one or more computer readable logs to identify a range of the one or more strings to be searched for PII that is less than the entirety of the one or more strings. The system also performs one or more regular expression algorithm based operations on the range of the one or more strings to identify one or more instances of PII. The system generates and outputs an indication of the one or more instances of the PII that are identified.
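
The two-stage search described above can be illustrated with a minimal sketch. It is not the claimed implementation: the marker keywords, the window size, and the regular expressions are all assumptions chosen for illustration. A plain substring search over the whole log first narrows the string to short ranges, and regular expressions then run only inside those ranges.

```python
import re

# Hypothetical marker keywords hinting that PII may follow (not from the source).
MARKERS = ("email=", "ssn=")
WINDOW = 40  # assumed number of characters to scan after each marker

# Illustrative patterns only; real PII formats vary widely.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def candidate_ranges(log: str) -> list[tuple[int, int]]:
    """Stage 1: substring search over the entire string to find narrow ranges."""
    ranges = []
    for marker in MARKERS:
        start = log.find(marker)
        while start != -1:
            begin = start + len(marker)
            ranges.append((begin, min(len(log), begin + WINDOW)))
            start = log.find(marker, start + 1)
    return sorted(ranges)

def find_pii(log: str) -> list[tuple[str, str]]:
    """Stage 2: run the regular expressions only inside the narrowed ranges."""
    hits = []
    for begin, end in candidate_ranges(log):
        segment = log[begin:end]
        for kind, pattern in PII_PATTERNS.items():
            for match in pattern.finditer(segment):
                hits.append((kind, match.group()))
    return hits
```

In this sketch, a log with no marker keyword yields no ranges, so the regular expressions never run on it. That is the cost saving the abstract points at, and it also means PII outside any marked range would go undetected here.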

Generating plug-in application recipe extensions

Techniques for generating plug-in application recipe (PIAR) extensions are disclosed. A PIAR management application discovers a particular data type within one or more data values for a particular field of a plug-in application, where the particular data type is (a) different from a data type of the particular field as reported by the plug-in application and (b) narrower than the data type of the particular field while complying with the data type of the particular field. The PIAR management application identifies one or more mappings between (a) the particular data type and (b) one or more data types for fields accepted by actions of plug-in applications. The PIAR management application presents a user interface including one or more candidate PIAR extensions based on the mapping(s). Based on a user selection of a candidate PIAR extension, the PIAR management application executes a PIAR that includes the selected PIAR extension.
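
As a rough illustration of the discovery and mapping steps, the sketch below assumes a field that the plug-in application reports as a generic string. The narrower types, their patterns, and the catalog of action fields are all hypothetical names invented for this example, not part of the disclosure.

```python
import re

# Hypothetical narrower types, each a refinement of the reported type "string".
NARROW_TYPES = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "url": re.compile(r"https?://\S+"),
}

# Hypothetical catalog: (application, action, field) -> data type it accepts.
ACTION_FIELDS = {
    ("mail_app", "send_message", "recipient"): "email",
    ("browser_app", "open_page", "address"): "url",
}

def discover_narrow_type(values):
    """Return a narrower type that every observed value satisfies, if any."""
    for name, pattern in NARROW_TYPES.items():
        if values and all(pattern.fullmatch(v) for v in values):
            return name
    return None

def candidate_extensions(values):
    """Map the discovered type to action fields that accept it."""
    narrow = discover_narrow_type(values)
    return [target for target, accepted in ACTION_FIELDS.items() if accepted == narrow]
```

The returned targets stand in for the candidate PIAR extensions that would be presented to the user for selection.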

AUTOMATIC LABELING OF TEXT DATA

The technology described herein determines whether a candidate text is in a requested class by using a generative model that may not be trained on the requested class. The present technology may make use of a model trained primarily in an unsupervised mode, without requiring a large number of manual user-input examples of a label class. The technology may produce a semantically rich positive example of label text from a candidate text and label. Likewise, the technology may produce from the candidate text and the label a semantically rich negative example of label text. The labeling service makes use of a generative model to produce a generative result, which estimates the likelihood that the label properly applies to the candidate text. In another aspect, the technology is directed toward a method for obtaining a semantically rich example that is similar to a candidate text.

Evaluating text classification anomalies predicted by a text classification model

In response to running at least one testing phrase on a previously trained text classifier and identifying a separate predicted classification label based on a score calculated for each respective at least one testing phrase, a text classifier decomposes extracted features summed in the score into word-level scores for each word in the at least one testing phrase. The text classifier assigns a separate heatmap value to each of the word-level scores, each respective separate heatmap value reflecting a weight of each word-level score. The text classifier outputs the separate predicted classification label and each separate heatmap value reflecting the weight of each word-level score for defining a heatmap identifying the contribution of each word in the at least one testing phrase to the separate predicted classification label for facilitating client evaluation of text classification anomalies.
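
The decomposition into word-level scores can be sketched for the simplest case, assuming a linear classifier whose score for a label is the sum of per-word weights. The weight table and the normalization used for the heatmap values are assumptions made for illustration; the abstract does not specify either.

```python
# Hypothetical per-label word weights of an already-trained linear classifier.
WEIGHTS = {
    "billing": {"invoice": 2.0, "refund": 1.5, "late": 0.5},
    "support": {"password": 2.0, "reset": 1.5, "late": 0.25},
}

def classify_with_heatmap(phrase):
    words = phrase.lower().split()
    # Score each label as the sum of its word weights (unknown words add 0).
    scores = {
        label: sum(weights.get(word, 0.0) for word in words)
        for label, weights in WEIGHTS.items()
    }
    label = max(scores, key=scores.get)
    # Decompose the winning score into one word-level score per word.
    word_scores = [WEIGHTS[label].get(word, 0.0) for word in words]
    # Assumed heatmap value: each word's share of the largest absolute score.
    peak = max((abs(s) for s in word_scores), default=0.0) or 1.0
    heatmap = [(word, abs(s) / peak) for word, s in zip(words, word_scores)]
    return label, heatmap
```

For the phrase "my invoice is late" this sketch predicts "billing", with "invoice" at heatmap value 1.0 and "late" at 0.25. A client inspecting a misclassified phrase could use the values in the same way to see which word drove the label.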

Systems and methods for creating enhanced documents for perfect automated parsing

The disclosed enhanced document creation and parsing systems deal with enhanced documents that allow for the presentation of document content in a preferred visual manner, while ensuring that the document content can be captured accurately by an automated parser with nothing being discarded or misrepresented. The enhanced document creation system may create an enhanced document by encoding document content in accordance with a defined schema, optionally encrypting the resulting structured data into an encrypted byte string, and embedding the encrypted byte string as non-visible metadata in a rendered document. The resulting enhanced document can be completely and accurately parsed by an enhanced document parsing system that is capable of extracting, decrypting and decoding the embedded document metadata.
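
A minimal sketch of the create-and-parse round trip follows, under several assumptions: the rendered document is HTML, the schema is plain JSON with sorted keys, and the metadata key name is hypothetical. The optional encryption step is omitted; base64 here is an encoding only and provides no confidentiality.

```python
import base64
import html
import json
import re

META_NAME = "enhanced-doc-payload"  # hypothetical metadata key

def create_enhanced_document(content: dict) -> str:
    # Encode the content per a (here trivial) schema: JSON with sorted keys.
    structured = json.dumps(content, sort_keys=True).encode("utf-8")
    # An encryption step would go here; base64 is NOT encryption.
    payload = base64.b64encode(structured).decode("ascii")
    # Visible rendering, free to use any preferred presentation.
    visible = "".join(
        f"<p>{html.escape(str(key))}: {html.escape(str(value))}</p>"
        for key, value in content.items()
    )
    return (
        f'<html><head><meta name="{META_NAME}" content="{payload}"></head>'
        f"<body>{visible}</body></html>"
    )

def parse_enhanced_document(document: str) -> dict:
    # Ignore the visible rendering; read only the embedded metadata.
    pattern = rf'<meta name="{re.escape(META_NAME)}" content="([^"]*)">'
    match = re.search(pattern, document)
    if match is None:
        raise ValueError("no embedded payload found")
    return json.loads(base64.b64decode(match.group(1)))
```

Because the parser reads the embedded payload and not the visible markup, the recovered content matches the original regardless of how the visible part is styled or escaped.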