G06F16/90344

Accuracy metric for regular expression
11520831 · 2022-12-06 · ·

A regular expression that is able to be used to identify an item as belonging to a specific group among a plurality of different groups is determined. The regular expression is tested against a sampling of items known to belong to the specific group to determine a true positive metric. The regular expression is tested against a sampling of items known to belong to other groups among the plurality of different groups outside the specific group to determine a false positive metric. An accuracy metric of the determined regular expression is calculated based at least in part on the true positive metric and the false positive metric. The accuracy metric is provided for use in evaluating the regular expression.
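
The testing procedure in this abstract can be sketched in a few lines of Python. This is a minimal illustration only: the function name `regex_accuracy`, the sample data, and the way the two metrics are combined (a balanced-accuracy average) are assumptions, since the abstract says only that the accuracy metric is based at least in part on the true positive and false positive metrics.

```python
import re

def regex_accuracy(pattern, in_group, out_group):
    """Return (true positive rate, false positive rate, accuracy) for a pattern.

    in_group:  sample of items known to belong to the specific group
    out_group: sample of items known to belong to other groups
    """
    rx = re.compile(pattern)
    tp_rate = sum(bool(rx.fullmatch(s)) for s in in_group) / len(in_group)
    fp_rate = sum(bool(rx.fullmatch(s)) for s in out_group) / len(out_group)
    # Assumption: combine the two metrics as balanced accuracy.
    accuracy = (tp_rate + (1.0 - fp_rate)) / 2
    return tp_rate, fp_rate, accuracy

# Hypothetical samples: the "specific group" is ZIP-code-like strings.
zip_like = r"\d{5}(-\d{4})?"
in_group = ["12345", "90210-1234", "ABCDE", "10001"]
out_group = ["555-1234", "hello", "54321", "2022-12-06"]
print(regex_accuracy(zip_like, in_group, out_group))
```

Any other combination, such as precision or an F-score, could be substituted for the averaging step without changing the two sampling passes.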

Method and system for technical language processing
11514093 · 2022-11-29 · ·

Exemplary embodiments disclose a method, a computer program product, and a computer system for searching technical documents. Exemplary embodiments may include the use of lexicons to generate customized hash functions; utilizing customized hash functions to generate hashcodes of technical text in document repositories; building a database of hashcodes from the repository; utilizing the customized hash functions for generating a hashcode of a search query; and correlating the search hashcode with the hashcode database to produce search results. A computer-implemented method to search technical text includes constructing one or more base hash functions for generating hashcodes that represent semantic content of technical text and accessing one or more lexicons describing technical terminology. The method includes utilizing the one or more lexicons to create one or more customized hash functions from the one or more base hash functions to generate hashcodes that more accurately represent a semantic content of the technical terminology in the one or more lexicons compared to the base hash functions.
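
As a rough illustration of the pipeline in this abstract, the sketch below derives a customized hash function from a base one by normalizing tokens through a lexicon, hashes a small repository, and ranks documents by Hamming distance to the query hashcode. The base function here is a SimHash-style hash chosen for the example, not the hash the patent describes, and the lexicon, documents, and all function names are hypothetical.

```python
import hashlib
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def base_hash(tokens, bits=64):
    """SimHash-style base function: similar token bags give nearby hashcodes."""
    totals = [0] * bits
    for tok in tokens:
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            totals[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, t in enumerate(totals) if t > 0)

def customize(lexicon):
    """Build a customized hash function: map lexicon synonyms to a
    canonical term before applying the base hash (an assumed strategy)."""
    def custom_hash(text):
        return base_hash([lexicon.get(t, t) for t in tokenize(text)])
    return custom_hash

def search(query, database, hash_fn):
    """Rank documents by Hamming distance between query and stored hashcodes."""
    q = hash_fn(query)
    return sorted((bin(q ^ code).count("1"), doc_id)
                  for doc_id, code in database.items())

# Hypothetical lexicon and repository.
lexicon = {"cpu": "processor", "ram": "memory"}
custom_hash = customize(lexicon)
docs = {"d1": "CPU overheats under load", "d2": "invoice payment overdue"}
database = {doc_id: custom_hash(text) for doc_id, text in docs.items()}
results = search("processor overheats under load", database, custom_hash)
print(results[0])
```

Because the query and document `d1` differ only by a lexicon synonym, the customized function gives them identical hashcodes, which is the kind of improvement over the base function that the abstract refers to.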

System and method for serving subject access requests

Systems and methods for serving subject access requests (SARs) are disclosed. A network connection is established with a user. An SAR, including at least one piece of personal data corresponding to an entity associated with said user, is received from the user via the network connection. Text data is extracted from a plurality of data objects, the data objects including personal data associated with the user. The text data is then processed to identify instances of names and instances of personal data within the text data. Associations are generated between identified names and identified personal data. A subset of the identified personal data that corresponds to the entity is identified based on the associations. A response to the SAR is provided, based at least in part on the identified personal data corresponding to the entity.

Software application container hosting

Methods, computer program products, and systems are presented. The methods, computer program products, and systems can include, for instance: examining target application container configuration data to identify one or more target container base images referenced in the target application container configuration data; subjecting script data associated with the one or more target container base images to text-based processing for evaluation of security risk associated with the one or more target container base images, the script data obtained from at least one candidate hosting computing environment; and selecting a hosting computing environment from the at least one candidate hosting computing environment for hosting the target application container, the selecting in dependence on the text-based processing.

System and method to analyse and predict impact of textual data

A system and method to analyze and predict the impact of textual data are provided. The system includes a processing subsystem configured to select textual data from a plurality of data sets stored in a memory, to extract data from external sources using crawling, and to identify at least one context of the textual data using one or more identification methods. The processing subsystem includes an NLP module configured to match the textual data with NLP frameworks using a mapping method based on a plurality of parameters, to apply feature engineering and transformation on the textual data to extract a plurality of features from the plurality of data sets, and to analyze the matched textual data using at least one analysis method. The processing subsystem also includes a predictive module configured to predict one or more future values of the analyzed textual data using one or more predictive methods.

Automated metadata asset creation using machine learning models
11593435 · 2023-02-28 · ·

Systems and methods are described that employ machine learning models to optimize database management. Machine learning models may be utilized to decide whether a new database record needs to be created (e.g., to avoid duplicates) and to decide what record to create. For example, candidate database records potentially matching a received database record may be identified in a local database, and a respective probability of each candidate database record matching the received record is output by a match machine learning model. A list of statistical scores is generated based on the respective probabilities and is input to an in-database machine learning model to calculate the probability that the received database record already exists in the local database.

Machine learning techniques for generating string-based database mapping prediction

Various embodiments of the present invention provide methods, apparatus, systems, computing devices, computing entities, and/or the like for performing predictive mapping operations with respect to a ground-truth database table. Certain embodiments of the present invention utilize systems, methods, and computer program products that perform predictive mapping operations utilizing a hierarchical string matching machine learning framework using one or more of an exact match model, a probabilistic match model, a disjoint match model, and an embedding-based match model.
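
The idea of a hierarchy of matchers can be illustrated with a two-stage cascade: try an exact match first, then fall back to a similarity-based match. This is a simplified sketch, not the patented framework. The standard-library `difflib` similarity ratio stands in for a probabilistic match model, the disjoint and embedding-based models are omitted, and the function name `map_string`, the threshold, and the table values are hypothetical.

```python
import difflib

def map_string(value, ground_truth, threshold=0.8):
    """Map a string to an entry of a ground-truth column.

    Returns (matched_entry_or_None, stage_name).
    """
    # Stage 1: exact match after trimming and case folding.
    normalized = {entry.strip().casefold(): entry for entry in ground_truth}
    key = value.strip().casefold()
    if key in normalized:
        return normalized[key], "exact"
    # Stage 2: similarity-based match (stand-in for a probabilistic model).
    close = difflib.get_close_matches(value, ground_truth, n=1, cutoff=threshold)
    if close:
        return close[0], "fuzzy"
    return None, "unmatched"

table = ["Acme Corp", "Globex"]
print(map_string("acme corp", table))
print(map_string("Acme Corp.", table))
print(map_string("zzz", table))
```

Ordering the stages from cheapest and most certain to most permissive keeps the more expensive models off the common path of clean inputs.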

Fast and accurate geomapping

A system and method are provided for discovering k-nearest-neighbors to a given point within a certain distance d. The method includes constructing an index of geometries using geohashes of geometries as an indexing key to obtain an indexed set of geometries, and calculating a geohash representation of the given point with a resolution equal to a magnitude value of d. The method includes searching for a closest-prefix geometry from the indexed set using the geohash representation of the given point, and identifying geometries from the indexed set having a same prefix as the closest-prefix geometry. The method further includes calculating distances between the given point and the geometries identified from the indexed set having the same prefix as the closest-prefix geometry, and determining k geometries with respective shortest distances less than d from the geometries identified from the indexed set having the same prefix as the closest-prefix geometry.
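
The steps in this abstract can be sketched as follows, under several simplifying assumptions: geometries are plain points, distances are great-circle distances in kilometres, and the mapping from d to a geohash length uses an approximate cell-size table. The names `build_index`, `knn_within`, and `_CELL_KM`, and the sample points, are hypothetical.

```python
import bisect
import math

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"
# Approximate smallest cell dimension in km per geohash length (assumed values).
_CELL_KM = {1: 4992.0, 2: 624.0, 3: 156.0, 4: 19.5, 5: 4.89, 6: 0.61}

def geohash(lat, lon, precision):
    """Standard geohash: interleave lon/lat bisection bits, base32-encode."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, value, nbits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        bit = coord >= mid
        rng[0 if bit else 1] = mid  # keep the half that contains the point
        value = value * 2 + int(bit)
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:
            chars.append(_BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)

def haversine_km(lat1, lon1, lat2, lon2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def build_index(points, precision=9):
    """Index points by geohash; sorting keeps equal prefixes adjacent."""
    return sorted((geohash(lat, lon, precision), name, lat, lon)
                  for name, lat, lon in points)

def knn_within(index, lat, lon, k, d_km):
    # Resolution: longest geohash whose cell is still at least d across.
    precision = max((p for p, size in _CELL_KM.items() if size >= d_km), default=1)
    q = geohash(lat, lon, precision)
    i = bisect.bisect_left(index, (q,))
    neighbours = index[max(i - 1, 0): i + 1]
    if not neighbours:
        return []
    # Closest-prefix entry: longest shared prefix with the query hash.
    best = max(neighbours, key=lambda e: common_prefix_len(e[0], q))
    prefix = q[:common_prefix_len(best[0], q)]
    candidates = [e for e in index if e[0].startswith(prefix)]
    scored = sorted((haversine_km(lat, lon, e[2], e[3]), e[1]) for e in candidates)
    return [(name, dist) for dist, name in scored if dist < d_km][:k]

points = [("A", 42.60, -5.60), ("B", 42.62, -5.59),
          ("C", 42.65, -5.55), ("D", 57.64911, 10.40744)]
index = build_index(points)
print(knn_within(index, 42.605, -5.595, k=2, d_km=3.0))
```

A prefix-only search can miss a neighbour that lies just across a cell boundary, so practical implementations usually also examine the adjacent cells; the sketch leaves that out for brevity.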

Interpreting meaning of content

A method for execution by a computing device includes obtaining a phrase that includes a string of words and generating a valid sequence of words utilizing the phrase. The method further includes identifying a set of identigens for each word of the valid sequence of words to produce sets of identigens. The method further includes identifying, for each identigen of the sets of identigens, a word type associated with phrase structure grammar rules to produce sets of identigen-type associations. The method further includes interpreting, utilizing the phrase structure grammar rules, the sets of identigen-type associations to produce an entigen group. The entigen group represents a most likely interpretation of the phrase.

Method and system for enhancing a VMS by intelligently employing access control information therein

Methods, systems, and techniques for enhancing a VMS are disclosed. One of the disclosed methods includes populating a user interface page with one or more images, each showing a single person matched to a known identity, and each taken contemporaneously with one or more respective access control event occurrences identifiable to the single person. User selection input is receivable to mark at least one of the images as a reference image for an appearance search to find additional images of the single person captured by video cameras within a surveillance system.