IPIQ

G06F16/355

CLASSIFICATION DEVICE, CLASSIFICATION METHOD, AND CLASSIFICATION PROGRAM

20220365972 · 2022-11-17 ·

A classification device (10) includes: an obtainment section (141) that obtains operation logs describing operation content for a window on a terminal screen; a classification section (142) that classifies the operation logs into a plurality of groups, using a document classification method, based on the operation content of the operation logs, and assigns words characteristic of the respective groups as labels to these groups; and an information display section (143) that causes an output unit (15) to output the operation logs assigned the labels.
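
The grouping-and-labeling idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which document classification method is used, so the greedy Jaccard grouping and the TF-IDF-style label scoring below are assumptions chosen for brevity. The function names (`tokenize`, `group_logs`, `label_groups`), the threshold, and the sample logs are all hypothetical.

```python
from collections import Counter
import math

def tokenize(text):
    # Assumption: operation content is whitespace-separated text.
    return text.lower().split()

def group_logs(logs, threshold=0.3):
    """Greedy one-pass grouping: a log joins the first group whose
    seed log shares enough words with it (Jaccard similarity)."""
    groups = []  # list of (seed_token_set, member_logs)
    for log in logs:
        tokens = set(tokenize(log))
        for seed, members in groups:
            union = seed | tokens
            if union and len(seed & tokens) / len(union) >= threshold:
                members.append(log)
                break
        else:
            groups.append((tokens, [log]))
    return [members for _, members in groups]

def label_groups(groups, top_n=2):
    """Label each group with words that are frequent inside it and
    rare across the other groups (a TF-IDF-style score)."""
    counts_per_group = [
        Counter(w for log in g for w in tokenize(log)) for g in groups
    ]
    n = len(groups)
    labels = []
    for counts in counts_per_group:
        def score(word):
            df = sum(1 for c in counts_per_group if word in c)
            return counts[word] * math.log((1 + n) / df)
        # Ties are broken alphabetically so the output is deterministic.
        ranked = sorted(counts, key=lambda w: (-score(w), w))
        labels.append(ranked[:top_n])
    return labels

# Hypothetical operation logs, for illustration only.
logs = [
    "click save button editor window",
    "click save button editor dialog",
    "type text search box browser",
    "type query search box browser",
]
groups = group_logs(logs)
labels = label_groups(groups)
for label, members in zip(labels, groups):
    print(label, members)
```

A real system would likely use a stronger clustering or topic model. The sketch only shows the two stages the abstract names: grouping by operation content, then assigning characteristic words as labels.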

Natural language processing of unstructured data

11586813 · 2023-02-21 ·

International Business Machines Corporation

A computer system for processing unstructured data, the computer system comprising a computer processor, a computer memory operatively coupled to the computer processor and the computer memory having disposed within it computer program instructions that, when executed by the processor, cause the computer system to carry out the steps of receiving unstructured data input from a client device, analyzing the unstructured data for features that satisfy logical segment criteria by using natural language processing (NLP), and partitioning the unstructured data into logical segments based on satisfaction of the logical segment criteria.

Orchestrator for machine learning pipeline

11586986 · 2023-02-21 ·

Sap Se

Provided is a system and method for training and validating models in a machine learning pipeline for failure mode analytics. The machine learning pipeline may include an unsupervised training phase, a validation phase and a supervised training and scoring phase. In one example, the method may include receiving an identification of a machine learning model, executing a machine learning pipeline comprising a plurality of services which train the machine learning model via at least one of an unsupervised learning process and a supervised learning process, the machine learning pipeline being controlled by an orchestration module that triggers ordered execution of the services, and storing the trained machine learning model output from the machine learning pipeline in a database associated with the machine learning pipeline.

Training a Model in a Data-Scarce Environment Using Added Parameter Information

20220366133 · 2022-11-17 ·

Microsoft Technology Licensing, Llc

Peter Joseph POTASH

A training process produces a machine-learned model that, once trained, can be applied to process different types of data items. The training process accomplishes this result by combining data items in a training set with type-specific parameter information, to produce supplemented data items. The training process then trains a model based on the supplemented data items. Training involves adjusting model weights together with the type-specific parameter information. In an inference stage of processing, the technology combines a new data item with an appropriate type of trained parameter information, and then maps the resultant supplemented data item to an output data item. The technology is particularly effective in adapting an initial model to a new subject matter domain in those situations in which a robust set of data items that pertain to the subject matter domain and which have a desired type is lacking.

System and method for identifying cyberthreats from unstructured social media content

11586739 · 2023-02-21 ·

Proofpoint, Inc.

Daniel Clark Salo

A cyberthreat detection system queries a content database for unstructured content that contains a set of keywords, clusters the unstructured content into clusters based on topics, and determines a cybersecurity cluster utilizing a list of vetted cybersecurity phrases. The set of keywords represents a target of interest such as a newly discovered cyberthreat, an entity, a brand, or a combination thereof. The cybersecurity cluster thus determined is composed of unstructured content that has the set of keywords as well as some percentage of the vetted cybersecurity phrases. If the size of the cybersecurity cluster, as compared to the amount of unstructured content queried from the content database, meets or exceeds a predetermined threshold, the query is saved as a new classifier rule that can then be used by a cybersecurity classifier to automatically, dynamically and timely identify the target of interest in unclassified unstructured content.
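
The threshold test at the end of this abstract can be sketched as follows. This is a simplified illustration, not the patented method: it skips the topic-clustering step and treats the cybersecurity cluster as the queried items that contain at least a given fraction of the vetted phrases. The names `run_query`, `phrase_fraction`, and `should_save_rule`, the default thresholds, and the sample data are all hypothetical.

```python
def run_query(docs, keywords):
    """Return the documents that contain every keyword (case-insensitive)."""
    kws = [k.lower() for k in keywords]
    return [d for d in docs if all(k in d.lower() for k in kws)]

def phrase_fraction(doc, phrases):
    """Fraction of vetted phrases found in the document.
    Assumes `phrases` is non-empty."""
    text = doc.lower()
    return sum(1 for p in phrases if p.lower() in text) / len(phrases)

def should_save_rule(docs, keywords, phrases,
                     min_phrase_fraction=0.5, size_threshold=0.6):
    """Decide whether the query should be saved as a classifier rule:
    the cybersecurity cluster must make up at least `size_threshold`
    of everything the query returned."""
    hits = run_query(docs, keywords)
    if not hits:
        return False
    cluster = [d for d in hits
               if phrase_fraction(d, phrases) >= min_phrase_fraction]
    return len(cluster) / len(hits) >= size_threshold

# Hypothetical content and phrase list, for illustration only.
docs = [
    "AcmeBank phishing campaign steals credentials",
    "New malware targets AcmeBank customers with phishing",
    "AcmeBank opens a new branch downtown",
    "Weather is nice today",
]
save = should_save_rule(docs, ["acmebank"], ["phishing", "malware"])
print(save)
```

Here three documents match the keyword and two of them carry enough vetted phrases, so the cluster covers two thirds of the queried content and the query would be kept under the assumed 0.6 threshold.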

DOCUMENT MANAGEMENT PLATFORM

20220365981 · 2022-11-17 ·

In some implementations, a device may receive a request for a plurality of documents that are associated with an individual or entity. The device may perform a search for the plurality of documents in one or more document repositories. The device may store, based on a determination that a first document, of the plurality of documents, is available in a document repository, of the one or more document repositories, information indicating that the first document is available in the document repository. The device may determine, based on a determination that a second document, of the plurality of documents, is not available in the one or more document repositories, a procedure that is to be used for obtaining the second document. The device may perform the procedure for obtaining the second document based on determining the procedure that is to be used for obtaining the second document.

Managing and measuring semantic coverage in knowledge discovery processes

11586826 · 2023-02-21 ·

CrowdSmart, Inc.

Thomas Kehler

Provided are processes of balancing between exploration and optimization with knowledge discovery processes applied to unstructured data with tight interrogation budgets. Natural language texts may be processed, such as into respective vectors, by a natural language processing (NLP) model. An output vector of (or intermediate vector within) an example NLP model may include over 500 dimensions, and in many cases 700-800 dimensions. A process may manage and measure semantic coverage by defining geometric characteristics, such as size or a relative distance matrix, of a semantic space corresponding to an evaluation during which the natural language texts are obtained, based on the vectors of the natural language texts. A system executing the process may generate a visualization of the semantic space, which may be reduced to, or may itself be, a latent embedding space, by reducing the dimensionality of the vectors while preserving the relative distances between their high- and reduced-dimensionality forms.

GENERATING PLUG-IN APPLICATION RECIPE EXTENSIONS

20230047578 · 2023-02-16 ·

Oracle International Corporation

Techniques for generating plug-in application recipe (PIAR) extensions are disclosed. A PIAR management application discovers a particular data type within one or more data values for a particular field of a plug-in application, where the particular data type is (a) different from a data type of the particular field as reported by the plug-in application and (b) narrower than the data type of the particular field while complying with the data type of the particular field. The PIAR management application identifies one or more mappings between (a) the particular data type and (b) one or more data types for fields accepted by actions of plug-in applications. The PIAR management application presents a user interface including one or more candidate PIAR extensions based on the mapping(s). Based on a user selection of a candidate PIAR extension, the PIAR management application executes a PIAR that includes the selected PIAR extension.

Event detection based on text streams

11501058 · 2022-11-15 ·

Microsoft Technology Licensing, Llc

Alexander James Wilson

A text stream source is accessed that includes a plurality of text content items. Unique word groupings are determined for the plurality of text content items. A burst detection algorithm is executed to determine word groupings that are currently bursting and that started within a specified time period. From the word groupings, an issue is determined by identifying a set of texts forming at least one clique.
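
A rough sketch of the burst-detection step is shown below. The abstract does not name a particular algorithm, so this uses a simple ratio test as an assumption: a word grouping counts as bursting when its count in the current time window is both large enough and several times its average over earlier windows. Word bigrams stand in for "word groupings". The names `word_groupings` and `bursting`, the parameters, and the sample texts are hypothetical. The clique-identification step is not covered.

```python
from collections import Counter

def word_groupings(text, n=2):
    """Unique word n-grams in one text item (bigrams by default)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def bursting(windows, ratio=3.0, min_count=2):
    """`windows` is a non-empty list of time windows, oldest first; each
    window is a list of texts and the last one is the current window.
    Returns the groupings whose current count is at least `min_count`
    and at least `ratio` times their mean count in the earlier windows."""
    counts = [Counter(g for text in w for g in word_groupings(text))
              for w in windows]
    *history, current = counts
    result = set()
    for grouping, count in current.items():
        baseline = (sum(h[grouping] for h in history) / len(history)
                    if history else 0.0)
        if count >= min_count and count >= ratio * baseline:
            result.add(grouping)
    return result

# Hypothetical text stream split into three time windows.
windows = [
    ["login page works fine", "checkout works fine"],
    ["login page works fine", "login page loads fast"],
    ["login page down again",
     "cannot reach login page down",
     "login page down since noon"],
]
bursts = bursting(windows)
print(bursts)
```

In this example "login page" appears in every window, so it never exceeds its own baseline, while "page down" is absent from the history and appears three times in the current window, which marks it as a grouping that has just started bursting.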

Artificial intelligence (AI) based data processing

11501186 · 2022-11-15 ·

Accenture Global Solutions Limited

An Artificial Intelligence (AI)-based data processing system employs a trained AI model for extracting features of products from various product classes and building a product ontology from the features. The product ontology is used to respond to user queries with product recommendations and customizations. Training data for the generation of the AI model for feature extraction is initially accessed and verified to determine if the training data meets a data density requirement. If the training data does not meet the data density requirement, data from one of a historic source or external sources is added to the training data. One of a plurality of AI models is selected for training based on the degree of overlap and the inter-class distance between the datasets of the various product classes within the training data.
