G06F16/355

Automated assignment of data set value via semantic matching

An apparatus comprises a processing platform implementing a data set discovery engine and a data set valuation engine. The data set discovery engine is configured to generate data set similarity measures each relating a corresponding one of a plurality of data sets to one or more other ones of the plurality of data sets. The data set valuation engine is coupled to the data set discovery engine and configured to generate valuation measures for respective ones of at least a subset of the plurality of data sets based at least in part on respective ones of the data set similarity measures generated by the data set discovery engine. For example, the data set valuation engine may generate the valuation measure for a given data set as a function of valuation measures previously generated for respective other data sets determined to exhibit at least a threshold similarity to the given data set.
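The last sentence of this abstract can be illustrated with a minimal sketch. It assumes the function is a similarity-weighted mean, which the abstract does not specify; the names `estimate_valuation`, `similarities`, and `known_valuations` and the threshold of 0.8 are hypothetical.

```python
from typing import Dict, Optional


def estimate_valuation(
    similarities: Dict[str, float],
    known_valuations: Dict[str, float],
    threshold: float = 0.8,
) -> Optional[float]:
    """Similarity-weighted mean of the valuations of sufficiently similar data sets.

    The weighted mean is an assumption; the abstract only says "a function of"
    previously generated valuation measures.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for name, sim in similarities.items():
        # Only data sets at or above the similarity threshold contribute.
        if sim >= threshold and name in known_valuations:
            weighted_sum += sim * known_valuations[name]
            weight_total += sim
    if weight_total == 0.0:
        return None  # no sufficiently similar data set has a valuation yet
    return weighted_sum / weight_total


# Hypothetical similarity measures between a new data set and three others.
similarities = {"sales_2021": 0.9, "sales_2020": 0.8, "hr_records": 0.2}
known = {"sales_2021": 100.0, "sales_2020": 50.0, "hr_records": 10.0}
estimate = estimate_valuation(similarities, known, threshold=0.8)
```

Here `hr_records` falls below the threshold and is ignored, so the estimate depends only on the two sales data sets.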

System and method for peer group detection, visualization and analysis in identity management artificial intelligence systems using cluster based analysis of network identity graphs

Systems and methods for graph based artificial intelligence systems for identity management systems are disclosed. Embodiments of the identity management systems disclosed herein may utilize a network graph approach to peer grouping of identities of a distributed networked enterprise computing environment. Specifically, in certain embodiments, data on the identities and the respective entitlements assigned to each identity as utilized in an enterprise computer environment may be obtained by an identity management system. A network identity graph may be constructed using the identity and entitlement data. The identity graph can then be clustered into peer groups of identities. The peer groups of identities may be used by the identity management system and users thereof in risk assessment or other identity management tasks.
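One way to picture the graph construction and clustering steps is the sketch below. It assumes edges are weighted by Jaccard similarity of entitlement sets and that peer groups are the connected components of the thresholded graph; the abstract names neither choice, and the identities, entitlements, and `min_similarity` value are made up for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of entitlements two identities have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def peer_groups(
    entitlements: Dict[str, Set[str]], min_similarity: float = 0.5
) -> List[Set[str]]:
    # Build the identity graph: connect identities whose entitlement
    # sets are similar enough (assumed criterion).
    neighbors: Dict[str, Set[str]] = {identity: set() for identity in entitlements}
    for a, b in combinations(entitlements, 2):
        if jaccard(entitlements[a], entitlements[b]) >= min_similarity:
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Cluster the graph: here a peer group is simply a connected component.
    groups: List[Set[str]] = []
    seen: Set[str] = set()
    for start in entitlements:
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        groups.append(component)
    return groups


# Hypothetical identity and entitlement data.
entitlements = {
    "alice": {"git", "jira", "vpn"},
    "bob": {"git", "jira"},
    "carol": {"payroll", "hr_portal"},
    "dave": {"payroll", "hr_portal", "vpn"},
}
groups = peer_groups(entitlements, min_similarity=0.5)
```

With this data the engineering identities and the HR identities end up in separate peer groups, since the single shared `vpn` entitlement is not enough to link them.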

Generating plug-in application recipe extensions

Techniques for generating plug-in application recipe (PIAR) extensions are disclosed. A PIAR management application discovers a particular data type within one or more data values for a particular field of a plug-in application, where the particular data type is (a) different from a data type of the particular field as reported by the plug-in application and (b) narrower than the data type of the particular field while complying with the data type of the particular field. The PIAR management application identifies one or more mappings between (a) the particular data type and (b) one or more data types for fields accepted by actions of plug-in applications. The PIAR management application presents a user interface including one or more candidate PIAR extensions based on the mapping(s). Based on a user selection of a candidate PIAR extension, the PIAR management application executes a PIAR that includes the selected PIAR extension.
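The type discovery and mapping steps might look roughly like the following sketch. It assumes the reported field type is a plain string and that narrower types are detected with regular expressions; the patterns, the `ACTION_FIELDS` table, and all field names are hypothetical and not taken from the source.

```python
import re
from typing import Dict, List, Optional

# Assumed narrower types that still comply with a reported "string" type.
NARROW_TYPES: Dict[str, re.Pattern] = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "url": re.compile(r"^https?://\S+$"),
    "iso_date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}

# Hypothetical action fields of other plug-in applications and the types they accept.
ACTION_FIELDS: Dict[str, str] = {
    "send_mail.to": "email",
    "open_page.address": "url",
    "create_event.day": "iso_date",
}


def discover_narrow_type(values: List[str]) -> Optional[str]:
    """Return a narrower type name if every observed value matches it."""
    if not values:
        return None
    for name, pattern in NARROW_TYPES.items():
        if all(pattern.match(value) for value in values):
            return name
    return None


def candidate_extensions(source_field: str, values: List[str]) -> List[str]:
    """List candidate mappings from the source field to compatible action fields."""
    narrow = discover_narrow_type(values)
    return [
        f"{source_field} -> {target}"
        for target, accepted in ACTION_FIELDS.items()
        if accepted == narrow
    ]


candidates = candidate_extensions(
    "contacts.notes", ["a@example.com", "b@example.org"]
)
```

A user interface would then present `candidates` for selection; values that match no narrower type produce an empty list.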

Permutation-based clustering of computer-generated data entries
11693851 · 2023-07-04

A computer-generated data entry is received. The computer-generated data entry is segmented into a set of tokens. A plurality of different token permutation groupings are determined. Each of the different token permutation groupings includes a different subset of tokens from the set of tokens of the computer-generated data entry. For the computer-generated data entry, a corresponding token permutation grouping identifier is determined for each grouping of the plurality of different token permutation groupings. It is determined whether the computer-generated data entry belongs to any data entry cluster among a plurality of previously identified data entry clusters based on a search performed using the token permutation grouping identifiers of the computer-generated data entry.
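A minimal sketch of the lookup described above follows. It assumes whitespace tokenization, groupings formed by leaving out one token at a time (plus the full token set), and SHA-1 digests as grouping identifiers; the abstract does not specify any of these, and `ClusterIndex` is a hypothetical name.

```python
import hashlib
import re
from typing import Dict, List


def tokenize(entry: str) -> List[str]:
    """Segment an entry into whitespace-delimited tokens (assumed scheme)."""
    return re.findall(r"\S+", entry)


def grouping_ids(tokens: List[str]) -> List[str]:
    """One identifier per grouping: the full token set plus each leave-one-out subset."""
    n = len(tokens)
    skips = [None] + (list(range(n)) if n > 1 else [])
    ids = []
    for skip in skips:
        # Keep positions so that tokens only match at the same index.
        kept = [f"{i}:{tok}" for i, tok in enumerate(tokens) if i != skip]
        payload = "\x1f".join([str(n)] + kept)
        ids.append(hashlib.sha1(payload.encode("utf-8")).hexdigest())
    return ids


class ClusterIndex:
    """Maps grouping identifiers to previously identified clusters."""

    def __init__(self) -> None:
        self._by_grouping: Dict[str, int] = {}
        self._next_cluster = 0

    def assign(self, entry: str) -> int:
        ids = grouping_ids(tokenize(entry))
        # Search existing clusters using the entry's grouping identifiers.
        cluster = next(
            (self._by_grouping[g] for g in ids if g in self._by_grouping), None
        )
        if cluster is None:
            cluster = self._next_cluster
            self._next_cluster += 1
        for g in ids:
            self._by_grouping.setdefault(g, cluster)
        return cluster


index = ClusterIndex()
a = index.assign("user 17 logged in")
b = index.assign("user 42 logged in")
c = index.assign("disk sda1 is full")
```

The first two entries differ only in one token, so they share the grouping that omits that token and land in the same cluster; the third entry shares no grouping and starts a new one.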

Cloud environment analytics using snapshotting
11695785 · 2023-07-04

The technology disclosed relates to streamlined analysis of security posture of a cloud environment. In particular, the disclosed technology relates to a system that analyzes data posture in a cloud environment database using a snapshot of the database. A computer-implemented method includes receiving a request to access a database in the cloud environment, wherein the database includes a first authentication requirement. The method includes identifying a snapshot of the database, wherein the snapshot includes a second authentication requirement that is different than the first authentication requirement. The method includes accessing the snapshot using the second authentication requirement, generating a representation of the database using the snapshot, and generating a data posture analysis result indicative of a data posture of the database based on scanning the representation of the database.

Multilayered Generation and Processing of Computer Instructions

Systems, devices, computer-implemented methods, and tangible non-transitory computer readable media for performing multilayered generation and processing of computer instructions are provided. For example, a computing device may receive a request with instructions in a first computer language, parse the instructions in the first computer language, analyze the instructions in the first computer language in view of information describing structure of a first application, generate instructions in a second computer language different from the first computer language where the instructions in the second computer language are generated based on the instructions in the first computer language and the information describing structure of the first application, obtain a result from a second application where the result comprises information based on the instructions in the second computer language, and provide the result in response to the request comprising the instructions in the first computer language.

METHOD OF GRAPH MODELING ELECTRONIC DOCUMENTS WITH AUTHOR VERIFICATION

A method for generating a graphical model of a plurality of electronic documents establishes connections between individual electronic documents with common authorship even if the spelling of the name of the author varies amongst the documents, for instance, due to the use of abbreviations, pseudonyms, misspellings, and the like. The graphical model is generated by ingesting data from the electronic documents and constructing a base graphical model using the processed data. Thereafter, as part of a disambiguation step, similar authors amongst the plurality of electronic documents are identified and clustered to yield an author similarity graph, which is preferably refined over time. A degree of belief, or similarity inference, is then calculated for documents determined to have common authorship and, in turn, incorporated into the base graphical model. As a result, an inference of the accuracy of linked information in the graphical model can be established.
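The disambiguation step can be sketched as below, assuming the degree of belief is a plain string-similarity ratio between normalized author names. The source does not say how belief is computed, so `difflib.SequenceMatcher`, the normalization rules, and the 0.8 cutoff are stand-ins chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple


def normalize_author(name: str) -> str:
    """Lowercase the name and drop punctuation commonly used in abbreviations."""
    return " ".join(name.replace(".", " ").replace(",", " ").lower().split())


def belief(a: str, b: str) -> float:
    """Assumed degree of belief that two author strings name the same person."""
    return SequenceMatcher(None, normalize_author(a), normalize_author(b)).ratio()


def common_author_edges(
    doc_authors: Dict[str, str], min_belief: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Edges to add to the base graph, each carrying its degree of belief."""
    edges = []
    for (doc_a, author_a), (doc_b, author_b) in combinations(doc_authors.items(), 2):
        score = belief(author_a, author_b)
        if score >= min_belief:
            edges.append((doc_a, doc_b, score))
    return edges


# Hypothetical documents and author strings.
doc_authors = {"d1": "Jane A. Smith", "d2": "jane a smith", "d3": "Robert Brown"}
edges = common_author_edges(doc_authors)
```

Each returned edge links two documents with inferred common authorship and keeps the score, so downstream consumers can judge how far to trust the link.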

TEXT STRING COMPARISON FOR DUPLICATE OR NEAR-DUPLICATE TEXT DOCUMENTS IDENTIFIED USING AUTOMATED NEAR-DUPLICATE DETECTION FOR TEXT DOCUMENTS
20250231993 · 2025-07-17

Techniques described herein provide for text string comparison for documents identified using automated near-duplicate detection. In one example, a system can receive a pair of documents. The system can extract text strings from the documents. The system can normalize the extracted text strings using a predefined normalization scheme. The system can identify boilerplate text segments in the normalized text strings. The system can remove the boilerplate text segments from the normalized text strings to generate filtered text strings. The system can divide the filtered text strings by identifying section indicators. The system can, for each section, generate groupings of text strings and determine a similarity score between each pair of corresponding groupings to identify matching groupings of text strings. The system can generate an output for display showing visual indications of the matched groupings of text strings.
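The pipeline above can be sketched in a few functions. This is a simplified illustration under several assumptions: normalization is lowercasing plus whitespace collapsing, boilerplate is a known set of lines, section indicators look like "Section N", each section forms a single grouping, and similarity is `difflib.SequenceMatcher.ratio()`. None of these specifics come from the source.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Set, Tuple

SECTION_RE = re.compile(r"^section \d+$")  # assumed section indicator format


def normalize(text: str) -> List[str]:
    """Lowercase, collapse whitespace, and drop empty lines."""
    lines = (re.sub(r"\s+", " ", line).strip().lower() for line in text.splitlines())
    return [line for line in lines if line]


def split_sections(lines: List[str], boilerplate: Set[str]) -> Dict[str, List[str]]:
    """Remove boilerplate lines, then divide the rest by section indicators."""
    sections: Dict[str, List[str]] = {}
    current = "preamble"
    for line in lines:
        if line in boilerplate:
            continue
        if SECTION_RE.match(line):
            current = line
            continue
        sections.setdefault(current, []).append(line)
    return sections


def compare(
    doc_a: str, doc_b: str, boilerplate: Set[str], min_score: float = 0.8
) -> List[Tuple[str, float]]:
    """Return (section, score) for corresponding sections that match."""
    a = split_sections(normalize(doc_a), boilerplate)
    b = split_sections(normalize(doc_b), boilerplate)
    matches = []
    for name, strings in a.items():
        if name not in b:
            continue
        score = SequenceMatcher(None, " ".join(strings), " ".join(b[name])).ratio()
        if score >= min_score:
            matches.append((name, score))
    return matches


doc_a = (
    "CONFIDENTIAL\nSection 1\nThe supplier shall deliver goods within 30 days.\n"
    "Section 2\nPayment is due on receipt."
)
doc_b = (
    "CONFIDENTIAL\nSection 1\nThe supplier shall deliver goods within 45 days.\n"
    "Section 2\nEither party may terminate this agreement."
)
matches = compare(doc_a, doc_b, boilerplate={"confidential"})
```

Only the first section is reported as a match here; a display layer could then highlight it in both documents.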

EXPEDITING AUTOMATED NEAR-DUPLICATE DETECTION FOR NEW TEXT DOCUMENTS
20250231992 · 2025-07-17

Techniques described herein provide for automated near-duplicate detection for new text documents given text documents that were previously processed using automated near-duplicate detection for text documents. In one example, a system can receive new documents and documents that were previously processed using a predefined processing technique for automated near-duplicate detection. The system can process the new documents and cluster the new documents into multiple predefined clusters previously identified using the predefined processing technique. For each predefined cluster including at least one new document, the system can generate document groups by determining similarity scores using the predefined processing technique as applied to the documents in the predefined clusters. The system can identify a representative document for each document group and generate an output data structure including the document groups and the representative document for each group.
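A rough sketch of this incremental flow is shown below. It assumes each predefined cluster is summarized by one exemplar text, that grouping is a greedy pass against the first member of each group, and that the representative is the member with the highest total similarity to the others; the source specifies none of these, and the similarity measure is again `difflib.SequenceMatcher.ratio()` as a stand-in.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def assign_cluster(text: str, exemplars: Dict[str, str]) -> str:
    """Place a new document in the predefined cluster with the closest exemplar."""
    return max(exemplars, key=lambda name: similarity(text, exemplars[name]))


def group_documents(docs: Dict[str, str], threshold: float = 0.8) -> List[List[str]]:
    """Greedily group documents within one cluster by similarity to a group's first member."""
    groups: List[List[str]] = []
    for doc_id, text in docs.items():
        for group in groups:
            if similarity(text, docs[group[0]]) >= threshold:
                group.append(doc_id)
                break
        else:
            groups.append([doc_id])
    return groups


def representative(group: List[str], docs: Dict[str, str]) -> str:
    """Pick the member most similar, in total, to the rest of its group."""
    return max(group, key=lambda d: sum(similarity(docs[d], docs[o]) for o in group))


# Hypothetical previously processed clusters, each with one exemplar document.
exemplars = {
    "invoices": "invoice 1001 total due 500 usd",
    "memos": "memo regarding office relocation schedule",
}
cluster_docs = {"invoices": {"old1": exemplars["invoices"]}, "memos": {}}

new_text = "invoice 1002 total due 500 usd"
cluster = assign_cluster(new_text, exemplars)
cluster_docs[cluster]["new1"] = new_text

doc_groups = group_documents(cluster_docs[cluster])
reps = [representative(group, cluster_docs[cluster]) for group in doc_groups]
```

Only the cluster that received a new document is regrouped, which is the point of reusing the earlier clustering.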

CRF-based span prediction for fine machine learning comprehension

A method for determining, from a document, an answer to a query using a query answering system, comprising: (i) encoding, using an encoder, one or more documents; (ii) encoding a received query; (iii) generating, using an attention mechanism, a query-aware document representation comprising alignment between one or more words in one of the one or more documents and one or more words in the query; (iv) generating, using a hierarchical self-attention mechanism, a word-to-sentence alignment of the query-aware document representation; (v) labeling, using a conditional random field classifier, each of a plurality of words in the word-to-sentence alignment with one of a plurality of different sequence identifiers, resulting in one or more possible labeled answering spans; and (vi) generating, from the one or more possible labeled answering spans, a response to the query.
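Steps (v) and (vi) can be illustrated with Viterbi decoding, the usual way to find the best label sequence in a linear-chain conditional random field. The sketch assumes a B/I/O labeling scheme for the sequence identifiers and uses hand-picked emission and transition scores in place of scores learned from the encoder and attention layers; none of these values come from the source.

```python
from typing import List

LABELS = ["O", "B", "I"]  # assumed sequence identifiers: outside, begin, inside
NEG_INF = float("-inf")


def viterbi(
    emissions: List[List[float]], transitions: List[List[float]], start: List[float]
) -> List[str]:
    """Return the highest-scoring label sequence for per-word emission scores."""
    if not emissions:
        return []
    k = len(LABELS)
    score = [start[j] + emissions[0][j] for j in range(k)]
    back: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Follow back-pointers from the best final label.
    path = [max(range(k), key=lambda j: score[j])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return [LABELS[j] for j in path]


def extract_spans(words: List[str], labels: List[str]) -> List[str]:
    """Collect B/I runs into candidate answer spans."""
    spans, current = [], []
    for word, label in zip(words, labels):
        if label == "B":
            if current:
                spans.append(" ".join(current))
            current = [word]
        elif label == "I" and current:
            current.append(word)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


words = ["born", "in", "New", "York", "today"]
# Hypothetical per-word scores in LABELS order (O, B, I).
emissions = [[2, 0, 0], [2, 0, 0], [0, 2, 1], [1, 0.5, 2], [2, 0, 0]]
# transitions[i][j]: score of moving from label i to label j; O -> I is forbidden.
transitions = [[0, 0, NEG_INF], [0, 0, 0.5], [0, 0, 0]]
start = [0, 0, NEG_INF]  # a sequence cannot start inside a span

labels = viterbi(emissions, transitions, start)
spans = extract_spans(words, labels)
```

With these scores the decoder labels the words O, O, B, I, O, and the single candidate span is "New York"; a response generator would then choose among such spans.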