G06F16/90

SYSTEM AND METHOD FOR PROVIDING AN AGGREGATION TOOL

Embodiments of the present invention assist in the development, management, and deployment of aggregated data attributes for multiple data sources. One embodiment provides a development interface that allows for elements of attributes, including filters, to be moved into a coding area in which an attribute or an attribute element is being edited. In another embodiment, the user interface presents data fields to assist in the development of filters for multiple data sources with divergent formats. The application further provides a validation interface through which users can validate attributes and trace the results returned by various elements referenced by the attributes under validation. Another embodiment provides a system for managing attributes and deploying them to various systems by creating a deployment file that is used by an attribute calculation system. In one embodiment, the attribute calculation system is a scalable system that dynamically calculates attributes for multiple data sources.

Data deduplication in data platforms

One embodiment of the invention provides a method for data deduplication storage management in a data platform including a plurality of data stores. The method comprises, for each data store of the plurality of data stores, determining a corresponding multi-level signature mapping data content of the data store into an ordered logical form comprising a plurality of data abstraction levels, determining a data similarity between the data store and each other data store of the plurality of data stores based on the multi-level signature corresponding to the data store and another multi-level signature corresponding to the other data store, and determining data usage of the data content of the data store. The method further comprises improving storage in the data platform by detecting duplicate data across the plurality of data stores based on each data similarity determined and each data usage determined.
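
The steps in this abstract can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in and not taken from the source: the three abstraction levels (column names, value types, row hashes), the Jaccard measure, the level weights, the `reads` usage counter, and the 0.9 threshold are all assumptions chosen only to make the flow concrete.

```python
from itertools import combinations


def multi_level_signature(store):
    """Map a store's rows into an ordered tuple of abstraction levels.

    Assumed levels, most abstract first: column names, (column, type) pairs,
    and per-row content hashes.
    """
    rows = store["rows"]
    columns = frozenset(k for r in rows for k in r)
    types = frozenset((k, type(v).__name__) for r in rows for k, v in r.items())
    content = frozenset(hash(tuple(sorted(r.items()))) for r in rows)
    return (columns, types, content)


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(sig_a, sig_b, weights=(0.2, 0.2, 0.6)):
    """Weighted per-level similarity (weights are an assumption)."""
    return sum(w * jaccard(x, y) for w, x, y in zip(weights, sig_a, sig_b))


def find_duplicates(stores, threshold=0.9):
    """Return (keep, drop, score) for store pairs that look duplicated.

    Usage decides which copy to keep: the store with more reads survives.
    """
    sigs = {name: multi_level_signature(s) for name, s in stores.items()}
    found = []
    for a, b in combinations(sorted(stores), 2):
        score = similarity(sigs[a], sigs[b])
        if score >= threshold:
            if stores[a]["reads"] >= stores[b]["reads"]:
                found.append((a, b, score))
            else:
                found.append((b, a, score))
    return found
```

In this sketch, usage only breaks the tie between two similar stores; a real system could equally use it to decide whether deduplication is worthwhile at all.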

User switching for multi-user devices

User switching for multi-user devices is provided. Each of multiple users of the multi-user device can have multiple external accounts that are each associated with one or more applications installed at the device. The user switching is provided such that each application installed at the device can be operated using user-specific data for the correct account of the correct user. The user switching is provided such that an app developer of an application for the multi-user device does not have to manage multiple data stores for multiple users.

Mode-specific search query processing

Provided are techniques for mode-specific search query processing. A current search query is received from a user, wherein the user has a user profile. In response to determining that a query mode for the current search query is a guided mode, a query context of the current search query is determined. A classification for the current search query is determined. One or more search influencers are identified using the classification, where each of the one or more search influencers has a corresponding user profile. The current search query is rewritten based on the query context, a private portion of the user profile of the user, and a public portion of each corresponding user profile of each of the one or more search influencers. The rewritten search query is executed to generate search results, and the search results are returned.
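
A rough sketch of the guided-mode flow described above follows. The keyword table, the profile layout (`private`, `public`, `expert_in`), and the rewrite strategy of appending terms are all hypothetical; the abstract does not say how classification or rewriting is actually done.

```python
# Hypothetical keyword table used to classify a query into a topic.
TOPIC_KEYWORDS = {
    "cooking": {"recipe", "bake"},
    "travel": {"flight", "hotel"},
}


def classify(query):
    """Return the first topic whose keywords appear in the query, else None."""
    words = set(query.lower().split())
    for topic, keywords in TOPIC_KEYWORDS.items():
        if words & keywords:
            return topic
    return None


def rewrite(query, mode, context, user, profiles):
    """Rewrite a query in guided mode; return it unchanged in any other mode.

    Terms come from the query context, the private portion of the user's
    profile, and the public portion of each matching influencer profile.
    """
    if mode != "guided":
        return query
    topic = classify(query)
    influencers = [p for p in profiles if topic in p["expert_in"]]
    terms = list(context) + list(user["private"]["interests"])
    for profile in influencers:
        terms += profile["public"]["interests"]
    present = set(query.lower().split())
    # dict.fromkeys removes repeated terms while keeping their order.
    extra = [t for t in dict.fromkeys(terms) if t not in present]
    return " ".join([query] + extra)
```

Only the public portion of an influencer profile is read, which mirrors the private/public split in the abstract.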

Smart rollover

A system and method include determining, by a processor, a data type for each column of a database table; determining, by the processor and based on the determined data type for each column of the database table, an indication of a size of the database table; calculating, by the processor and based on the determined indication of the size of the database table, a start nbit size for an nbit compression process to be used on the database table; specifying, by the processor, the calculated start nbit size for the nbit compression process; and compressing the database table by executing the nbit data compression process using the specified start nbit size.
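
One way to picture these steps is the sketch below. It assumes that nbit compression means dictionary encoding with n-bit codes, and that a rollover is the widening of the code when the dictionary outgrows 2^n entries; the abstract does not define either term. The byte widths and the size thresholds are invented for illustration.

```python
# Assumed per-type byte widths; not taken from the source.
TYPE_WIDTH = {"int": 4, "bigint": 8, "double": 8, "date": 4, "varchar": 32}


def estimate_table_bytes(column_types, row_count):
    """Size indication: row count times the summed width of the column types."""
    return row_count * sum(TYPE_WIDTH[t] for t in column_types)


def start_nbit(table_bytes):
    """Pick a starting code width from the size estimate (hypothetical thresholds)."""
    if table_bytes < 1_000_000:
        return 4
    if table_bytes < 100_000_000:
        return 8
    return 16


def nbit_encode(values, nbits):
    """Dictionary-encode values, widening the code by one bit on each rollover.

    Returns (codes, final_nbits, rollover_count).
    """
    dictionary = {}
    codes = []
    rollovers = 0
    for value in values:
        if value not in dictionary:
            if len(dictionary) == 2 ** nbits:
                # Dictionary is full at the current width: roll over.
                nbits += 1
                rollovers += 1
            dictionary[value] = len(dictionary)
        codes.append(dictionary[value])
    return codes, nbits, rollovers
```

Under these assumptions, a well-chosen start width reduces the number of rollovers: encoding 20 distinct values starting at 4 bits rolls over once, while starting at 1 bit rolls over four times.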

Recommending machine learning models and source codes for input datasets

Asset recommendation for a particular input dataset is provided. Candidate data analysis assets having a corresponding relatedness score associated with the particular input dataset greater than a defined relatedness score threshold value are selected. The selected candidate data analysis assets are ranked by score and listed by rank from highest to lowest. A justification for each candidate data analysis asset is inserted in the ranked list of candidate data analysis assets. The ranked list of candidate data analysis assets along with each respective justification is outputted on a display device.
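
The select, rank, and justify steps can be sketched in a few lines. The asset fields (`name`, `score`, `why`) and the output format are hypothetical; how relatedness scores and justifications are produced is outside this sketch.

```python
def recommend(assets, threshold):
    """Return a ranked, justified list of assets scoring above the threshold."""
    # Selection is strict: a score equal to the threshold is not selected.
    selected = [a for a in assets if a["score"] > threshold]
    ranked = sorted(selected, key=lambda a: a["score"], reverse=True)
    return [
        f"{rank}. {a['name']} (score {a['score']:.2f}): {a['why']}"
        for rank, a in enumerate(ranked, start=1)
    ]
```

The returned lines can be printed or passed to whatever display layer is in use.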

Feature engineering in neural networks optimization

A transitive closure data structure is constructed for a pair of features represented in a vector space corresponding to an input dataset. The data structure includes a set of entries corresponding to a set of all possible paths between a first feature in the pair and a second feature in the pair in a graph of the vector space. The data structure is reduced by removing a subset of the set of entries such that only a single entry corresponding to a single path remains in the transitive closure data structure. A feature cross is formed from a cluster of features remaining in a reduced ontology graph resulting from reducing the transitive closure data structure. A layer is configured in a neural network to represent the feature cross, which causes the neural network to produce a prediction that is within a defined accuracy relative to the dataset.
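
The path enumeration and reduction can be sketched as follows. Two loud assumptions: the graph is a plain adjacency dictionary over feature names, and the single surviving path is the shortest one. The abstract does not say which path is kept, and the neural network layer is not modeled here.

```python
def all_simple_paths(graph, start, goal, path=None):
    """Enumerate every cycle-free path from start to goal by depth-first search."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    paths = []
    for neighbor in graph.get(start, ()):
        if neighbor not in path:
            paths.extend(all_simple_paths(graph, neighbor, goal, path))
    return paths


def reduce_to_single_path(paths):
    """Keep one entry. Assumption: the shortest path survives."""
    return min(paths, key=len)


def feature_cross(path):
    """Name the cross formed from the features left on the surviving path."""
    return "_x_".join(path)
```

For a graph where `city` reaches `country` both directly through `region` and indirectly through `zip`, the reduction keeps the route through `region` alone, and the cross is built from those three features.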

Detecting, diagnosing, and directing solutions for source type mislabeling of machine data, including machine data that may contain PII, using machine learning

A computerized method of diagnosing a mislabeling of a source type of a received event is provided. The method comprises operations of receiving an event by a source type analysis logic with a data index and query system, wherein the event includes a portion of raw machine data and is associated with a specific point in time, and obtaining an original source type assigned to the event and one or more predicted source types. The one or more predicted source types are determined by analysis of a data representation of the event in view of training data, and the training data includes a plurality of data representations corresponding to known source types. The computerized method also includes operations of determining whether the event has been mislabeled and, in response to determining the event has been mislabeled, diagnosing a source of the mislabeling.
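
The detection step can be illustrated with a deliberately crude sketch. It assumes the data representation is the event's punctuation pattern and that prediction is an exact-match lookup against patterns seen in training; the source describes a machine learning approach without these specifics, and the diagnosis step is not covered here.

```python
import string
from collections import Counter, defaultdict


def representation(raw):
    """Reduce raw machine data to its punctuation pattern (assumed representation)."""
    return "".join(ch for ch in raw if ch in string.punctuation)


def train(examples):
    """Count known source types per representation from (raw, source_type) pairs."""
    model = defaultdict(Counter)
    for raw, source_type in examples:
        model[representation(raw)][source_type] += 1
    return model


def predict(model, raw, k=2):
    """Return up to k most frequent source types seen for this representation."""
    counts = model.get(representation(raw), Counter())
    return [source_type for source_type, _ in counts.most_common(k)]


def is_mislabeled(model, raw, original):
    """Flag the event when predictions exist and none matches the original label."""
    predicted = predict(model, raw)
    return bool(predicted) and original not in predicted
```

An event whose pattern was never seen in training yields no prediction and is therefore not flagged, which keeps this sketch conservative.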

Using a graph representation of join history to distribute database data

A graph representation of join history may be used to distribute database data. Join history may be collected, captured, or tracked which describes the history of join operations between columns of different tables in a database. A graph representation of the join history may be generated. The graph representation may indicate a likelihood of different joins that may be performed between the columns of the tables of a database. An evaluation of the join history may be performed to identify columns for tables in the database to distribute the data of the tables amongst multiple storage locations according to the identified columns.
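
A minimal sketch of this idea follows. It assumes join history arrives as pairs of (table, column) tuples, that edge weight is the raw join count standing in for likelihood, and that each table is distributed on its most frequently joined column. All three choices are illustrative assumptions.

```python
from collections import Counter, defaultdict


def build_join_graph(join_history):
    """Build an undirected weighted graph: nodes are (table, column), edges count joins."""
    graph = Counter()
    for left, right in join_history:
        graph[frozenset((left, right))] += 1
    return graph


def choose_distribution_columns(graph):
    """For each table, pick the column with the largest total join weight."""
    weight = defaultdict(Counter)
    for edge, count in graph.items():
        for table, column in edge:
            weight[table][column] += count
    return {table: cols.most_common(1)[0][0] for table, cols in weight.items()}
```

Distributing two tables on the columns they are most often joined on tends to place matching rows at the same storage location, which is the motivation for weighting by join frequency here.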