Patent classifications
G06F16/2465
DATA MODEL FOR MINING
This disclosure relates to managing data by an agent located within a mining operation. The data is stored as voxel data on a voxel net server. The server processes user input from a user controlling the agent within the mining operation and receives from the agent a request for voxel data associated with one or more voxels. The one or more voxels are a subset of voxels stored on the voxel net server and each of the one or more voxels is identified based on connections with voxels of previous requests. The server then queries a database representing the voxel net for the one or more voxels to retrieve associated voxel data based on the connections and returns the voxel data to the agent. Finally, the voxel data is displayed on a user device to the user.
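The connection-based voxel lookup described above can be sketched as follows. The voxel store, the adjacency map, and the agent's request format are hypothetical stand-ins for the patent's "voxel net server"; only the idea of identifying the next voxels from connections with previously requested voxels comes from the abstract.

```python
voxel_data = {  # voxel id -> stored attributes on the voxel net server
    "v1": {"ore": "iron", "density": 0.7},
    "v2": {"ore": "copper", "density": 0.4},
    "v3": {"ore": "none", "density": 0.1},
}
connections = {  # adjacency (connections) within the voxel net
    "v1": ["v2"],
    "v2": ["v1", "v3"],
    "v3": ["v2"],
}

def query_connected(previous_request: list[str]) -> dict:
    """Return data for voxels connected to those in previous requests."""
    requested = {
        neighbor
        for vid in previous_request
        for neighbor in connections.get(vid, [])
    }
    return {vid: voxel_data[vid] for vid in requested if vid in voxel_data}

# The agent previously requested "v1"; the server identifies "v2" via its
# connection to "v1" and returns its voxel data for display.
result = query_connected(["v1"])
```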
System and method of selecting events or locations based on content
Systems and methods of returning location and/or event results using information mined from non-textual information are provided. Non-textual information is captured using a hardware component of a user device. Text-based social media content input on the user device is then retrieved. A location of the user device is determined using a global positioning system module in the user device. The non-textual information is converted to a machine-analyzable format, and the converted non-textual information is compared to a database of converted non-textual information samples to analyze and classify the converted non-textual information. The classification is sent to a server for storage in a database in a manner that ties the classification to the geographical location of the user device.
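The convert-then-compare step can be sketched as a nearest-sample classifier. The "conversion" here (a normalized byte histogram) and the sample labels are placeholder assumptions; the abstract does not specify the machine-analyzable format or the comparison metric.

```python
def convert(raw: bytes) -> list[float]:
    """Toy 'machine-analyzable format': normalized 4-bin byte histogram."""
    hist = [0.0] * 4
    for b in raw:
        hist[b % 4] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]

# Database of converted non-textual information samples (illustrative).
sample_db = {
    "concert": convert(b"\x00\x04\x08\x0c"),
    "sports":  convert(b"\x01\x05\x09\x0d"),
}

def classify(raw: bytes) -> str:
    """Compare converted input against the sample database; pick nearest."""
    features = convert(raw)
    def dist(label: str) -> float:
        ref = sample_db[label]
        return sum((a - b) ** 2 for a, b in zip(features, ref))
    return min(sample_db, key=dist)

label = classify(b"\x00\x00\x04\x08")
```

The resulting classification would then be sent to the server alongside the GPS-derived location, as the abstract describes.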
Distributed sequential pattern mining (SPM) using static task distribution strategy
Seed patterns are derived from a sequence database. Execution costs for types of seed patterns are computed. Each seed pattern is iteratively distributed to distributed nodes along with that seed pattern's assigned execution cost. The distributed nodes process in parallel to mine the sequence database for super patterns. When a distributed node exhausts its execution budget, any remaining mining needed for the seed pattern being mined is reallocated to another distributed node having remaining execution budget.
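The static distribution step can be sketched as dealing cost-annotated seed patterns to nodes with execution budgets. The cost values, budget values, and the greedy "most remaining budget" heuristic are illustrative assumptions, not the patent's actual strategy.

```python
def distribute(seeds: dict[str, int], budgets: dict[str, int]) -> dict[str, list[str]]:
    """Assign each seed pattern (with its execution cost) to a node."""
    assignment: dict[str, list[str]] = {node: [] for node in budgets}
    remaining = dict(budgets)
    for pattern, cost in seeds.items():
        # Pick the node with the most remaining budget to absorb this cost;
        # a real system would reallocate mid-mining when a budget runs out.
        node = max(remaining, key=remaining.get)
        if remaining[node] < cost:
            raise RuntimeError("no node has enough remaining budget")
        assignment[node].append(pattern)
        remaining[node] -= cost
    return assignment

seeds = {"<a>": 3, "<b>": 5, "<ab>": 2}     # seed pattern -> execution cost
budgets = {"node1": 6, "node2": 6}          # node -> execution budget
plan = distribute(seeds, budgets)
```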
Systems and methods for accelerating exploratory statistical analysis
Embodiments of the invention utilize a “data canopy” that breaks statistical measures down into basic primitives for various data portions and stores the basic aggregates in a library within an in-memory data structure. When a queried statistical measure involves a data portion that at least partially overlaps the data portion associated with a basic aggregate stored in the library, the stored aggregate may be reused in the statistical computation of the queried measure.
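A minimal sketch of the data-canopy idea: cache basic primitives (count, sum, sum of squares) per fixed-size chunk, then compose the cached aggregates to answer a statistical query without rescanning the raw data. The chunk size and cache layout are assumptions for illustration.

```python
CHUNK = 4
data = [2.0, 4.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]

# Library of basic aggregates: chunk index -> (count, sum, sum of squares).
library: dict[int, tuple[int, float, float]] = {}
for i in range(0, len(data), CHUNK):
    chunk = data[i:i + CHUNK]
    library[i // CHUNK] = (len(chunk), sum(chunk), sum(x * x for x in chunk))

def mean_over_chunks(first: int, last: int) -> float:
    """Mean over whole chunks [first, last], reusing cached primitives only."""
    n, s = 0, 0.0
    for c in range(first, last + 1):
        cn, cs, _ = library[c]
        n += cn
        s += cs
    return s / n

m = mean_over_chunks(0, 1)  # mean over all of `data`, no rescan
```

The same cached triples can serve variance or standard-deviation queries, which is what makes the primitives reusable across different statistical measures.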
Dynamically normalizing intervals in a table
Dynamically normalizing intervals in a table including receiving, from a client computing system, a request to normalize intervals for a data set on a cloud-based data warehouse, wherein the request comprises a reference to the data set and a data range; generating, on the cloud-based data warehouse, an interval table using the data range; joining, into a joined table on the cloud-based data warehouse, the interval table and the data set; receiving the joined table from the cloud-based data warehouse; and presenting, via a graphical user interface on the client computing system, the joined table as a worksheet.
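The interval-normalization flow above can be sketched in miniature: generate an interval table from the requested data range, then join each record of the data set into its interval. The fixed interval width and the (timestamp, value) record shape are assumptions; the warehouse-side SQL join is modeled here as an in-memory loop.

```python
def make_intervals(lo: int, hi: int, width: int) -> list[tuple[int, int]]:
    """Interval table: half-open [start, end) rows covering [lo, hi)."""
    return [(s, min(s + width, hi)) for s in range(lo, hi, width)]

def join_intervals(rows: list[tuple[int, float]],
                   intervals: list[tuple[int, int]]) -> dict:
    """Join each (timestamp, value) row into its interval bucket."""
    joined: dict[tuple[int, int], list[float]] = {iv: [] for iv in intervals}
    for ts, val in rows:
        for start, end in intervals:
            if start <= ts < end:
                joined[(start, end)].append(val)
                break
    return joined

intervals = make_intervals(0, 30, 10)        # [(0,10), (10,20), (20,30)]
rows = [(3, 1.5), (12, 2.5), (27, 3.5), (14, 4.0)]
joined = join_intervals(rows, intervals)     # the "joined table"
```

The joined result is what the client would render as a worksheet, with empty intervals present so the presentation is normalized across the full data range.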
SYSTEM AND METHODS FOR CREDIT UNDERWRITING AND ONGOING MONITORING USING BEHAVIORAL PARAMETERS
Embodiments of the present disclosure may include a method for credit underwriting, the method including receiving a dataset of user details. Embodiments may also include creating a convolutional neural network (CNN) with the dataset of user details. In some embodiments, the convolutional neural network organizes at least a portion of the dataset of user details into a layered and weighted dataset. Embodiments may also include creating an enriched layered and weighted dataset. Embodiments may also include executing a back-propagation operation to remove at least a portion of the layered and weighted data from the layered and weighted dataset. Embodiments may also include receiving a plurality of enriched layered and weighted datasets.
DATA-SHARDING FOR EFFICIENT RECORD SEARCH
Data-sharding systems and/or methods for cost- and time-efficient record search are described. Data-sharding embodiments utilize a name-sharding dimension, optionally in combination with one or more additional dimensions such as record type and year, to reduce latency and search-associated costs. Embodiments of the data-sharding systems and methods utilize an optimization algorithm to determine a distribution of records related to names. The optimization algorithm may use a three-character prefix for surnames in records to distribute shards across documents, with dedicated shards allocated for no-name and multi-name records.
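The name-sharding dimension can be sketched as a routing function: each record goes to a shard keyed by a three-character surname prefix, with dedicated shards for no-name and multi-name records. The shard count and the hash-style prefix-to-shard mapping are illustrative; the patent's optimization algorithm would learn the actual distribution.

```python
NUM_NAME_SHARDS = 8
NO_NAME_SHARD = "no-name"
MULTI_NAME_SHARD = "multi-name"

def shard_for(surnames: list[str]) -> str:
    """Route a record to a shard by its surname(s)."""
    if not surnames:
        return NO_NAME_SHARD            # dedicated no-name shard
    if len(surnames) > 1:
        return MULTI_NAME_SHARD         # dedicated multi-name shard
    prefix = surnames[0][:3].lower()    # three-character surname prefix
    # Stable prefix -> shard mapping (illustrative; a real system would use
    # the optimization algorithm's learned distribution instead of a hash).
    bucket = sum(ord(c) for c in prefix) % NUM_NAME_SHARDS
    return f"name-{bucket}"
```

A search for "Smith" then only touches the one shard holding the `smi` prefix, rather than every document, which is the latency and cost win the abstract claims.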
Visual data computing platform using a progressive computation engine
The subject matter herein provides a method, apparatus and computer program product that combines, in one intuitive interface, visualization user interfaces (UIs) as used for descriptive analytics, with workflow UIs as used for predictive analytics. These interfaces provide a visual workspace front-end. The workspace is coupled to a back-end that comprises a data processing engine that combines progressive computation, approximate query processing, and sampling, together with a focus on supporting user-defined operations, to drive the front-end efficiently and in real-time. The processing engine achieves rapid responsiveness through progressive sampling, quickly returning an initial answer, typically on a random sample of data, before continuing to refine that answer in the background. In this manner, any operation carried out in the platform immediately provides a visual response, regardless of the underlying complexity of the operation or data size.
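The progressive-sampling behavior can be sketched as a generator that yields a quick estimate from a small random sample and then refines it as more data is folded in. The generator interface and batch size are assumptions about how the engine would stream successively better answers to the front-end.

```python
import random

def progressive_mean(data: list[float], batch: int = 100):
    """Yield successively refined estimates of the mean of `data`."""
    order = list(range(len(data)))
    random.shuffle(order)              # progressive *random* sampling
    total, seen = 0.0, 0
    for start in range(0, len(order), batch):
        for idx in order[start:start + batch]:
            total += data[idx]
            seen += 1
        yield total / seen             # estimate after each batch

data = [float(i) for i in range(1000)]    # true mean 499.5
estimates = list(progressive_mean(data))  # first estimate arrives early
final = estimates[-1]                     # converges to the exact answer
```

The first yielded value is the "initial answer on a random sample"; the front-end can render it immediately while later, more accurate estimates arrive in the background.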
Enabling advanced analytics with large data sets
The present disclosure describes methods, systems, and computer program products for enabling advanced analytics with large datasets. One computer-implemented method includes receiving, by operation of a computer system, a dataset of multiple data records, each of the multiple data records comprising one or more features and a target variable; selecting key features among the one or more features based at least on relevance measures of the one or more features with respect to the target variable; dividing the dataset into multiple subsets; for each of the multiple subsets, identifying a number of clusters and respective centroids of the number of clusters based on the key features; identifying a number of final centroids based on the respective centroids of the number of clusters for each of the multiple subsets, the number of final centroids being respective centroids of a number of final clusters; and for each data record in the multiple subsets, assigning the data record to one of the number of final clusters based on distances between the data record and the number of final centroids.
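The divide-cluster-merge flow above can be sketched with one key feature (a scalar) and a trivial 2-means step: cluster each subset, pool the per-subset centroids, cluster those into final centroids, then assign every record to its nearest final centroid. The clustering algorithm and cluster count are illustrative assumptions.

```python
def two_means_1d(values: list[float], iters: int = 10) -> list[float]:
    """Plain 2-means on scalars; returns two centroids."""
    c = [min(values), max(values)]
    for _ in range(iters):
        groups: tuple[list[float], list[float]] = ([], [])
        for v in values:
            groups[0 if abs(v - c[0]) <= abs(v - c[1]) else 1].append(v)
        c = [sum(g) / len(g) if g else c[i] for i, g in enumerate(groups)]
    return c

# Step 1: cluster each subset independently.
subsets = [[1.0, 2.0, 9.0, 10.0], [0.0, 3.0, 8.0, 11.0]]
per_subset = [cent for s in subsets for cent in two_means_1d(s)]

# Step 2: cluster the pooled per-subset centroids into final centroids.
final_centroids = sorted(two_means_1d(per_subset))

# Step 3: assign each record to the nearest final cluster.
def assign(record: float) -> int:
    return min(range(2), key=lambda i: abs(record - final_centroids[i]))
```

Because step 2 only sees centroids rather than raw records, the expensive clustering never touches the whole dataset at once, which is what makes the method viable for large datasets.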
Systems and methods for harvesting data associated with fraudulent content in a networked environment
Exemplary embodiments of the present disclosure relate to systems, methods, and non-transitory computer-readable media for harvesting, parsing, and analyzing item identifiers in networked content to identify fraudulent content.