G06F16/2465

Methods and apparatus to partition data

Methods and apparatus to partition data are disclosed. An example apparatus to partition panelist data includes an indicator matrix generator to determine an indicator matrix including panelist vectors for panelists based on panelist data associated with the panelists, a first one of the panelist vectors for a first one of the panelists to indicate whether the first one of the panelists: has a first characteristic, meets a first criterion based on the first characteristic and a second characteristic, has a third characteristic, and meets a second criterion based on the third characteristic and a fourth characteristic. The example apparatus further includes a matrix reducer to reduce the indicator matrix to a set of unique panelist vectors that represent partitions of the panelist data, the partitions of the panelist data to utilize less storage capacity than the panelist data.
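A minimal Python sketch of the indicator-matrix idea, offered as an illustration and not as the claimed apparatus. The `Panelist` fields, the age/income/household thresholds, and the function names are all hypothetical. Each panelist becomes a four-bit vector (two "has a characteristic" bits and two "meets a criterion" bits), and reducing the matrix keeps each distinct vector once, so that each surviving row stands for one partition:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Panelist:
    # Hypothetical fields standing in for the four characteristics.
    panelist_id: str
    age: int
    income: int
    has_children: bool
    household_size: int

def panelist_vector(p: Panelist) -> tuple:
    """Build one panelist vector; the thresholds are illustrative assumptions."""
    is_adult = p.age >= 18                               # has the first characteristic
    adult_high_income = is_adult and p.income >= 50_000  # first criterion (1st + 2nd)
    is_parent = p.has_children                           # has the third characteristic
    large_family = is_parent and p.household_size >= 4   # second criterion (3rd + 4th)
    return (is_adult, adult_high_income, is_parent, large_family)

def indicator_matrix(panelists):
    """One row (panelist vector) per panelist."""
    return [panelist_vector(p) for p in panelists]

def reduce_matrix(matrix):
    """Keep each distinct row once, in first-seen order; each row is one partition."""
    return list(dict.fromkeys(matrix))

panelists = [
    Panelist("a", 30, 60_000, True, 5),
    Panelist("b", 45, 80_000, True, 4),
    Panelist("c", 16, 0, False, 3),
]
partitions = reduce_matrix(indicator_matrix(panelists))
print(partitions)  # [(True, True, True, True), (False, False, False, False)]
```

Three panelist rows collapse to two unique vectors. Mapping each panelist ID to the index of its row in `partitions` would preserve membership while storing each distinct vector only once.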

Method and system for automated intent mining, classification and disposition

An agent automation system includes a memory configured to store a corpus of utterances and a semantic mining framework and a processor configured to execute instructions of the semantic mining framework to cause the agent automation system to perform actions, wherein the actions include: detecting intents within the corpus of utterances; producing intent vectors for the intents within the corpus; calculating distances between the intent vectors; generating meaning clusters of intent vectors based on the distances; detecting stable ranges of cluster radius values for the meaning clusters; and generating an intent/entity model from the meaning clusters and the stable ranges of cluster radius values, wherein the agent automation system is configured to use the intent/entity model to classify intents in received natural language requests.
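The clustering and "stable range" steps can be sketched as follows, under loudly stated assumptions: the intent vectors are toy 2-D tuples rather than real embeddings, clustering is plain single-linkage at a given radius (the abstract does not name an algorithm), and a range counts as "stable" when the cluster count stays unchanged for at least a hypothetical `min_steps` consecutive radius values:

```python
import math
from itertools import combinations

def count_clusters(vectors, radius):
    """Single-linkage clustering: vectors within `radius` of each other are merged."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(vectors)), 2):
        if math.dist(vectors[i], vectors[j]) <= radius:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(vectors))})

def stable_ranges(vectors, radii, min_steps=3):
    """Return (first_radius, last_radius, n_clusters) runs whose cluster count stays fixed."""
    counts = [count_clusters(vectors, r) for r in radii]
    ranges, start = [], 0
    for k in range(1, len(radii) + 1):
        if k == len(radii) or counts[k] != counts[start]:
            if k - start >= min_steps:
                ranges.append((radii[start], radii[k - 1], counts[start]))
            start = k
    return ranges

# Toy 2-D "intent vectors": two tight groups far apart.
intents = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(stable_ranges(intents, [0.05, 0.2, 0.5, 1.0, 2.0, 10.0]))  # [(0.2, 2.0, 2)]
```

The two-cluster answer persists across radii from 0.2 to 2.0, which is the kind of plateau that suggests a robust grouping of meanings.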

Rapid predictive analysis of very large data sets using the distributed computational graph using configurable arrangement of processing components

A system for predictive analysis of very large data sets using a distributed computational graph that intelligently combines processing of a current data stream with the ability to retrieve relevant stored data in such a way that conclusions or actions may be drawn in a predictive manner. The system has a pipeline construction module that allows a user to construct a streaming analytic workflow using modular building blocks, each of which represents either an environmental orchestration stage or a data processing stage of a streaming analytic workflow, and has a pipeline processing module that receives a data stream and constructs a directed computational graph by processing the data stream through the streaming analytic workflow. The directed computational graph is used to analyze the data stream.
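A hypothetical sketch of building a workflow from modular blocks and executing it as a directed graph. The `Workflow` class and its `add`/`run` methods are invented for illustration, only data processing stages are modeled (environmental orchestration stages are omitted), and a single batch of stream records stands in for a live stream. Execution order comes from Python's standard `graphlib.TopologicalSorter` (Python 3.9+):

```python
from graphlib import TopologicalSorter  # Python 3.9+

class Workflow:
    """Hypothetical builder: each block is a named stage with upstream dependencies."""

    def __init__(self):
        self.stages = {}
        self.deps = {}

    def add(self, name, fn, after=()):
        self.stages[name] = fn
        self.deps[name] = tuple(after)  # argument order follows `after`
        return self

    def run(self, records):
        data = list(records)  # materialize one batch of the stream
        results = {}
        # Execute stages in an order that respects the directed graph's edges.
        for name in TopologicalSorter(self.deps).static_order():
            inputs = [results[d] for d in self.deps[name]] or [data]
            results[name] = self.stages[name](*inputs)
        return results

wf = (
    Workflow()
    .add("parse", lambda xs: [float(x) for x in xs])
    .add("mean", lambda xs: sum(xs) / len(xs), after=["parse"])
    .add("max", max, after=["parse"])
    .add("spread", lambda mx, m: mx - m, after=["max", "mean"])
)
print(wf.run(["1", "2", "3"])["spread"])  # 1.0
```

Source stages (those with no upstream dependency) receive the raw batch; every other stage receives the outputs of its upstream stages in the order they were listed.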

Unsupervised anomaly detection
11507563 · 2022-11-22

Described are techniques for anomaly detection including a method comprising sorting a univariate data set in a numeric order and generating a second univariate data set based on the sorted univariate data set, where respective elements in the second univariate data set correspond to respective differences between consecutive elements in the sorted univariate data set. The method further comprises sorting the second univariate data set in numeric order and generating a third univariate data set that includes index values corresponding to respective differences in the sorted second univariate data set that are above a threshold. The method further comprises modifying the third univariate data set and defining a set of clusters based on the modified third univariate data set. The method further comprises clustering the sorted univariate data set according to the set of clusters and characterizing a new data point as anomalous in response to the clustering.
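One possible reading of these steps as a short Python sketch. It is an interpretation, not the patented method: the threshold is a fixed hypothetical parameter, the "modification" of the index set is assumed to be ordering the cut indices and adding the start and end positions, and clusters are treated as closed value ranges:

```python
def gap_clusters(data, threshold):
    """Return (low, high) cluster ranges split at unusually large gaps."""
    xs = sorted(data)                                    # sorted univariate data set
    if not xs:
        return []
    diffs = [b - a for a, b in zip(xs, xs[1:])]          # second data set: consecutive gaps
    ranked = sorted(range(len(diffs)), key=lambda i: diffs[i])  # gaps in numeric order
    cuts = [i + 1 for i in ranked if diffs[i] > threshold]      # third data set: index values
    bounds = [0] + sorted(cuts) + [len(xs)]              # modification: order, add endpoints
    return [(xs[lo], xs[hi - 1]) for lo, hi in zip(bounds, bounds[1:])]

def is_anomalous(x, clusters):
    """A new point is anomalous if it falls outside every cluster range."""
    return not any(lo <= x <= hi for lo, hi in clusters)

clusters = gap_clusters([1.0, 1.2, 0.9, 10.0, 10.3, 9.8], threshold=2.0)
print(clusters)                     # [(0.9, 1.2), (9.8, 10.3)]
print(is_anomalous(5.0, clusters))  # True
```

Sorting the gaps is not strictly needed for a fixed threshold, but it mirrors the abstract and is the natural place to choose a threshold adaptively, for example from the few largest gaps.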

Information Infrastructure Management Tools with Extractor, Secure Storage, Content Analysis and Classification and Method Therefor
20230058063 · 2023-02-23

A data processing method organizes and processes data in a distributed computing system. Select/secret content is organized with [enterprise and external designated] categorical filters (content, contextual and taxonomic) to create further search terms for data mining both enterprise and external data sources (databases, data collections, data stores). The result is aggregated select/secret content, which is stored in the corresponding select/secret content data store and further processed for convergent or divergent characteristics. When unknown data elements are identified, another search gathers supplemental documents. Data element ranges are set by the taxonomic filter and the contextual filters. Relevancy factors are identified from the relationship between the input and supplemental documents. The search is controlled by user selection, continuous search, iterative search (n cycles), search within m search terms, and search time.

Vectorization of structured documents with multi-modal data
11507886 · 2022-11-22

Methods, systems, and computer-readable storage media for receiving structured data including a set of columns and a set of rows, determining, for each column, a column width defining a number of characters, providing, for each row, a set of padded values, each padded value corresponding to a column and including a value and one or more padding characters, the value and the one or more padding characters collectively having a length equal to a respective column width, defining a set of strings by, for each row, concatenating padded values in the set of padded values to provide a string, and training a machine learning (ML) model by providing, for each string in the set of strings, an embedding as an abstract representation of a record of a respective row and processing the embedding through an attention layer of the ML model.
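The string-construction part of this pipeline is easy to sketch in Python; the embedding and attention-layer training are not shown. The `#` padding character, converting values with `str()`, and taking each column's width as its longest value are assumptions for illustration:

```python
def column_widths(rows, columns):
    """Width of each column = longest string form of any value in it."""
    return {c: max(len(str(row[c])) for row in rows) for c in columns}

def row_strings(rows, columns, pad="#"):
    """Pad each value to its column width and concatenate the padded values per row."""
    widths = column_widths(rows, columns)
    return ["".join(str(row[c]).ljust(widths[c], pad) for c in columns) for row in rows]

rows = [{"id": 7, "name": "Ann"}, {"id": 42, "name": "Bo"}]
print(row_strings(rows, ["id", "name"]))  # ['7#Ann', '42Bo#']
```

Because every string has the same layout, a given column always occupies the same character positions, which gives a downstream model a consistent positional cue for each field.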

Associating data from different nodes of a distributed ledger system

Systems and methods are described to associate data from different nodes of a distributed ledger system. The nodes can generate transaction notifications, log data, and/or metrics data. At least some of the data generated by the nodes can be obtained by a data intake and query system via a distributed ledger system monitor. The data from the distributed ledger system can be stored in the data intake and query system and correlated. Based on an association between at least some of the data of the first node and at least some of the data of the second node, the data intake and query system can determine at least a partial history of a transaction in the distributed ledger system, relationships between components of the distributed ledger system, and/or an architecture of the distributed ledger system.
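A minimal sketch of one way such an association could work: grouping events reported by different nodes under a shared transaction identifier and ordering them by time to recover a partial transaction history. The field names (`tx_id`, `ts`, `node`, `event`) and node names are hypothetical, not a real ledger or data intake and query system schema:

```python
from collections import defaultdict

def transaction_histories(*node_events):
    """Group events from several nodes by transaction ID, each history ordered by timestamp."""
    grouped = defaultdict(list)
    for events in node_events:
        for event in events:
            grouped[event["tx_id"]].append(event)
    return {tx: sorted(evts, key=lambda e: e["ts"]) for tx, evts in grouped.items()}

peer_log = [{"tx_id": "t1", "ts": 1, "node": "peer0", "event": "endorsed"}]
orderer_log = [
    {"tx_id": "t1", "ts": 3, "node": "orderer0", "event": "committed"},
    {"tx_id": "t1", "ts": 2, "node": "orderer0", "event": "ordered"},
]
history = transaction_histories(peer_log, orderer_log)
print([e["event"] for e in history["t1"]])  # ['endorsed', 'ordered', 'committed']
```

The same grouping also hints at component relationships: nodes whose events share transaction IDs are evidently interacting.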

COHORT BASED RESILIENCY MODELING

A method including: storing, by a computing device, obfuscated metadata from a plurality of interconnected computing environments into respective data puddles; identifying, by the computing device, a behavior of a first computing environment of the plurality of interconnected computing environments; determining, by the computing device, an expected future performance issue associated with a second computing environment of the plurality of interconnected computing environments based on the identified behavior of the first computing environment; identifying, by the computing device, a locus of the expected future performance issue associated with the second computing environment based on the identified behavior of the first computing environment; and outputting, by the computing device and to an operator of the second computing environment, an impact notification and remedial steps being taken to prevent the expected future performance issue.

Tool-specific alerting rules based on abnormal and normal patterns obtained from history logs

A computer-implemented method is presented for automatically generating alerting rules. The method includes identifying, via offline analytics, abnormal patterns and normal patterns from history logs based on machine learning, statistical analysis and deep learning, the history logs stored in a history log database, automatically generating the alerting rules based on the identified abnormal and normal patterns, and transmitting the alerting rules to an alerting engine for evaluation. The method further includes receiving a plurality of online log messages from a plurality of computing devices connected to a network, augmenting the plurality of online log messages, and extracting information from the plurality of augmented online log messages to be provided to the alerting engine, the alerting engine configured to approve and enforce the alerting rules automatically generated by the offline analytics processing.
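The abstract mentions machine learning, statistical analysis, and deep learning; the sketch below swaps in only a simple statistical stand-in to show the offline/online split. It assumes history logs already labeled normal or abnormal, and the `min_ratio` and `min_count` thresholds are hypothetical. Tokens that occur far more often in abnormal logs become "message contains token" rules, which an alerting engine then checks against online messages:

```python
from collections import Counter

def generate_rules(normal_logs, abnormal_logs, min_ratio=3.0, min_count=2):
    """Tokens far more frequent in abnormal history logs become 'contains token' rules."""
    def doc_freq(lines):
        return Counter(tok for line in lines for tok in set(line.lower().split()))

    normal, abnormal = doc_freq(normal_logs), doc_freq(abnormal_logs)
    rules = []
    for token, count in abnormal.items():
        # Add-one smoothing keeps the ratio finite for tokens never seen in normal logs.
        ratio = (count / len(abnormal_logs)) / ((normal[token] + 1) / (len(normal_logs) + 1))
        if count >= min_count and ratio >= min_ratio:
            rules.append(token)
    return sorted(rules)

def evaluate(rules, message):
    """Return the rules that an online log message triggers."""
    words = set(message.lower().split())
    return [rule for rule in rules if rule in words]

normal_history = ["user login ok", "disk check ok", "user logout ok"]
abnormal_history = ["disk error sector 7", "disk error sector 9"]
rules = generate_rules(normal_history, abnormal_history)
print(rules)                                    # ['error', 'sector']
print(evaluate(rules, "Disk ERROR on node 3"))  # ['error']
```

Note that "disk" is not promoted to a rule because it also appears in normal logs; the ratio test is what separates tool-specific failure vocabulary from ordinary activity.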

DEVICE AND METHOD FOR DISCOVERING CAUSAL PATTERNS

A method of identifying causal relationships includes receiving data comprising a set of values corresponding to one or more variables, and generating a list of candidate causal models of relationships between or within the variables. The list is ranked based on a likelihood of each candidate causal model, wherein the likelihood includes at least a correlation value. The method further includes receiving feedback identifying a candidate causal model and a change in rank of the candidate causal model, re-ranking the list based on the feedback, and displaying the re-ranked list. The method generates an intervention comprising a suggested modification corresponding to a variable of a selected causal model among the candidate causal models in the re-ranked list, receives additional data corresponding to the variable of the suggested modification and evaluates the additional data to determine whether the likelihood of the selected causal model has changed.
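A hypothetical sketch of the ranking and feedback steps only (the intervention and re-evaluation steps are not shown). Candidate causal models are reduced to ordered (cause, effect) pairs, and the absolute Pearson correlation is used as the stand-in likelihood; the abstract says only that the likelihood includes at least a correlation value. Feedback moves one candidate to a new rank while the others keep their relative order:

```python
from itertools import permutations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_candidates(data):
    """Candidates are (cause, effect) pairs, scored by |r| as a stand-in likelihood."""
    scored = [((a, b), abs(pearson(data[a], data[b]))) for a, b in permutations(data, 2)]
    return sorted(scored, key=lambda c: c[1], reverse=True)  # stable: ties keep order

def apply_feedback(ranked, candidate, new_rank):
    """Move `candidate` to 0-based position `new_rank`; others keep their relative order."""
    entry = next(c for c in ranked if c[0] == candidate)
    reranked = [c for c in ranked if c[0] != candidate]
    reranked.insert(new_rank, entry)
    return reranked

data = {"ads": [1, 2, 3, 4], "sales": [2, 4, 6, 8.5], "temp": [3, 1, 4, 1]}
ranked = rank_candidates(data)
print([c[0] for c in ranked[:2]])  # [('ads', 'sales'), ('sales', 'ads')]
reranked = apply_feedback(ranked, ("sales", "ads"), 0)
print(reranked[0][0])              # ('sales', 'ads')
```

Correlation is symmetric, so both directions of a relationship tie on score; user feedback is one way to break such ties toward the plausible causal direction.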