Patent classifications
G06F16/00
Data storage method and method for executing an application with reduced access time to the stored data
The invention concerns a storage method for storing, on data servers (3, 4), slices (51 to 58) of data files (5, 61 to 64) resulting from the execution of a plurality of processes (65 to 68) of one or more applications (83, 85), comprising: distributing the stored data file (5, 61 to 64) slices (51 to 58) over different data servers (3, 4), characterized in that: this distribution is carried out in such a way that the data file (5, 61 to 64) slices (51 to 58) likely to be subsequently accessed simultaneously by different application (83, 85) processes (65 to 68) are stored on different data servers (3, 4), so as to reduce subsequent simultaneous access to each of all or part of these data servers (3, 4) by too many application (83, 85) processes (65 to 68); and in that: the determination of the data file (5, 61 to 64) slices (51 to 58) likely to be subsequently accessed simultaneously by different application (83, 85) processes (65 to 68) has been carried out, during a prior phase of executing these application (83, 85) processes (65 to 68), by observing how these processes (65 to 68) access these stored data file (5, 61 to 64) slices (51 to 58) over time.
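The co-access-aware distribution described above can be sketched as follows: pairs of slices observed being accessed in the same time window during the prior observation phase are penalized when a candidate server already holds the partner slice. All names and the greedy heuristic are illustrative assumptions, not the patent's actual method.

```python
from itertools import combinations

def place_slices(slices, num_servers, co_access_log):
    """Assign each slice to a server, avoiding servers that already hold
    a slice observed being accessed simultaneously with it.
    co_access_log: list of sets of slice ids seen in the same window."""
    # Count how often each pair of slices was accessed together.
    conflicts = {}
    for window in co_access_log:
        for a, b in combinations(sorted(window), 2):
            conflicts[(a, b)] = conflicts.get((a, b), 0) + 1
    placement = {}
    for s in slices:
        # Penalize servers already holding a slice co-accessed with s.
        def penalty(server):
            return sum(conflicts.get(tuple(sorted((s, t))), 0)
                       for t, srv in placement.items() if srv == server)
        # Break ties by current load to keep the distribution balanced.
        load = {srv: sum(1 for v in placement.values() if v == srv)
                for srv in range(num_servers)}
        placement[s] = min(range(num_servers),
                           key=lambda srv: (penalty(srv), load[srv]))
    return placement
```

With two servers and an observation log showing slices 1/2 and 3/4 accessed together, each conflicting pair ends up on different servers.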
System for implementing intelligent data analysis
Systems, computer program products, and methods are described herein for implementing intelligent data analysis. The present invention is configured to receive, from a computing device of a user, a first data file, wherein the first data file is associated with a base version of a data source; receive, from the computing device of the user, a second data file, wherein the second data file is associated with an updated version of the data source; initiate an intelligent data analysis engine on the first data file and the second data file; analyze the second data file to determine one or more discrepancies in relation to the first data file; determine one or more discrepancy types associated with the one or more discrepancies; retrieve, from an action datastore, one or more offset actions to rectify the one or more discrepancies; and automatically execute the one or more offset actions.
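The pipeline in this abstract (compare base and updated versions, classify discrepancies, retrieve offset actions from an action datastore) can be sketched as below. The field-level diff, the three discrepancy types, and the dict-based datastore are illustrative assumptions.

```python
def analyze(base, updated, action_store):
    """Compare two versions of a data source (dicts of field -> value),
    classify each discrepancy, and look up the offset action registered
    for that discrepancy type in the action datastore."""
    discrepancies = []
    for key in set(base) | set(updated):
        if key not in updated:
            discrepancies.append((key, "missing_field"))
        elif key not in base:
            discrepancies.append((key, "new_field"))
        elif base[key] != updated[key]:
            discrepancies.append((key, "value_changed"))
    # Pair each discrepancy with the offset action for its type.
    return [(key, dtype, action_store.get(dtype))
            for key, dtype in discrepancies]
```

In a full system the returned actions would then be executed automatically; here they are simply returned for inspection.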
Model variable candidate generation device and method
A model variable candidate generation device generating explanatory variable candidates to be used as candidates for an explanatory variable in generation of a prediction model includes: a data input unit inputting analysis data, each entry of which has one or more items, the items having item values; a first item determination unit preliminarily setting properties of the items included in the analysis data as first item properties; a data property determination unit determining data properties of the analysis data on the basis of the first item properties; a second item determination unit determining properties of the items included in the analysis data as second item properties on the basis of the data properties of the analysis data; and a variable candidate generation unit generating the explanatory variable candidates by selecting from the items or processing the items on the basis of the second item properties.
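The two-pass determination above (preliminary item properties, then data-level properties, then refined item properties driving candidate generation) can be sketched as follows. The numeric/categorical dichotomy and the low-cardinality refinement rule are illustrative assumptions.

```python
def generate_candidates(entries):
    """Two-pass sketch: infer preliminary per-item properties, derive a
    data-level property from them, refine the item properties, then emit
    explanatory-variable candidates accordingly."""
    items = entries[0].keys()
    # First pass: preliminary item properties from value types.
    first = {i: ("numeric" if all(isinstance(e[i], (int, float)) for e in entries)
                 else "categorical") for i in items}
    # Data property: does the data set contain any numeric item at all?
    has_numeric = any(p == "numeric" for p in first.values())
    # Second pass: refine using the data property, e.g. treat
    # low-cardinality numeric items as categorical codes.
    second = {}
    for i, p in first.items():
        values = {e[i] for e in entries}
        if p == "numeric" and has_numeric and len(values) <= 2:
            second[i] = "categorical"
        else:
            second[i] = p
    # Generate candidates: numeric items as-is, categoricals expanded
    # into one indicator candidate per observed value.
    candidates = []
    for i, p in second.items():
        if p == "numeric":
            candidates.append(i)
        else:
            candidates.extend(f"{i}={v}" for v in sorted({str(e[i]) for e in entries}))
    return sorted(candidates)
```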
Statistics based query transformation
Techniques are described for responding to aggregate queries using optimizer statistics already available in the data dictionary of the database in which the database object targeted by the aggregate query resides, without the user creating any additional objects (e.g. materialized views) and without requiring the objects to be loaded into volatile memory in a columnar fashion. The user query is rewritten to produce a transformed query that targets the dictionary tables to form the aggregate result without scanning the user tables. "Accuracy indicators" may be maintained to indicate whether those statistics are accurate. Only accurate statistics are used to answer queries that require accurate answers. The accuracy check can be made during runtime, allowing the query plan of the transformed query to be used regardless of the accuracy of the statistics. For queries that request approximations, inaccurate statistics may be used so long as the statistics are "accurate enough".
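The runtime decision described above can be sketched as: answer from the statistics when the accuracy indicator is set (or the query tolerates approximation), otherwise fall back to scanning the user table. The dict-shaped statistics and flags are illustrative assumptions, not the database's actual dictionary layout.

```python
def answer_aggregate(table, agg, column, stats, accurate, allow_approx=False):
    """Answer COUNT/MIN/MAX either from optimizer statistics (per-column
    dicts) or by scanning rows, guarded by a per-column accuracy
    indicator that is checked at runtime."""
    col_stats = stats.get(column, {})
    use_stats = accurate.get(column, False) or allow_approx
    if use_stats and agg in col_stats:
        return col_stats[agg], "from_statistics"
    # Fall back to scanning the user table when stats may be stale.
    values = [row[column] for row in table]
    result = {"count": len, "min": min, "max": max}[agg](values)
    return result, "from_scan"
```

Because the branch is taken at runtime, the same "transformed" code path serves both cases, mirroring how the patent lets one query plan cover accurate and stale statistics.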
Method of distributed graph loading for minimal communication and good balance via lazy materialization and directory indirection using indexed tabular representation
Techniques herein minimally communicate between computers to repartition a graph. In embodiments, each computer receives a partition of edges and vertices of the graph. For each of its edges or vertices, each computer stores an intermediate representation into an edge table (ET) or vertex table. Different edges of a vertex may be loaded by different computers, which may cause a conflict. Each computer announces that a vertex resides on the computer to a respective tracking computer. Each tracking computer makes assignments of vertices to computers and publicizes those assignments. Each computer that loaded conflicted vertices transfers those vertices to computers of the respective assignments. Each computer stores a materialized representation of a partition based on: the ET and vertex table of the computer, and the vertices and edges that were transferred to the computer. Edges stored in the materialized representation are stored differently than edges stored in the ET.
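The announce/assign/transfer protocol above can be sketched in a single process as follows: each vertex is announced to a tracking computer chosen by hashing its id, each tracker assigns conflicted vertices to one loader, and the other loaders transfer theirs. The hash-based tracker choice and lowest-id tie-break are illustrative assumptions.

```python
def resolve_vertex_ownership(loaded, num_trackers):
    """loaded: loader id -> set of vertex ids that loader read.
    Returns (vertex -> owning loader, list of required transfers
    as (vertex, from_loader, to_loader))."""
    # Announcements grouped per tracking computer.
    announcements = {t: {} for t in range(num_trackers)}
    for loader, vertices in loaded.items():
        for v in vertices:
            tracker = hash(v) % num_trackers
            announcements[tracker].setdefault(v, []).append(loader)
    # Each tracker assigns a conflicted vertex to the lowest-id loader
    # and (conceptually) publicizes the assignment.
    assignment = {}
    for tracker_view in announcements.values():
        for v, loaders in tracker_view.items():
            assignment[v] = min(loaders)
    # Loaders holding a vertex assigned elsewhere must transfer it.
    transfers = [(v, loader, assignment[v])
                 for loader, vertices in loaded.items()
                 for v in vertices if assignment[v] != loader]
    return assignment, transfers
```

Only conflicted vertices generate transfers, which is what keeps communication minimal in the distributed setting.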
Fast migration of metadata
One or more buckets of key-value pairs of a first node of a distributed storage system are selected to be migrated to a second node of the distributed storage system. One or more underlying database files corresponding to the one or more selected buckets are identified. The one or more identified underlying database files are directly copied from a storage of a first node to a storage of the second node. The copied underlying database files are linked in a database of the second node to implement the one or more selected buckets in the second node.
IMAGE-BASED POPULARITY PREDICTION
A machine may be configured to access an image of an item described by a description of the item. The machine may determine an image quality score of the image based on an analysis of the image. A request for search results that pertain to the description may be received by the machine, and the machine may present a search result that references the item's image, based on its image quality score. Also, the machine may access images of items and descriptions of items and generate a set of most frequent text tokens included in the item descriptions. The machine may identify an image feature exhibited by an item's image and determine that a text token from the corresponding item description matches one of the most frequent text tokens. A data structure may be generated by the machine to correlate the identified image feature with the text token.
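The second half of this abstract (most frequent text tokens correlated with image features) can be sketched as follows. Real feature extraction would use a vision model; here the features are given, and all names are illustrative.

```python
from collections import Counter

def build_feature_token_map(items, top_k=3):
    """items: list of dicts with 'features' (image features detected in
    the item's image) and 'description'. Builds the set of most frequent
    description tokens, then correlates each image feature with the
    frequent tokens appearing in that item's description."""
    token_counts = Counter(tok for it in items
                           for tok in it["description"].lower().split())
    frequent = {tok for tok, _ in token_counts.most_common(top_k)}
    correlation = {}
    for it in items:
        for feature in it["features"]:
            for tok in it["description"].lower().split():
                if tok in frequent:
                    correlation.setdefault(feature, set()).add(tok)
    return correlation
```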
Iterative data processing
Data is processed iteratively by a database system with a first cache storing key-value data which resulted from previous iterations of processing input data and a second cache storing aggregated data which resulted from previous iterations of processing key-value data stored in the first cache. In a current iteration, the database system receives further input data related to the input data of the previous iterations, transforms the further input data into further key-value data and stores the further key-value data in the first cache in addition to the stored key-value data which resulted from previous iterations. The database system further processes the further key-value data and the aggregated data stored in the second cache to form updated aggregated data, and stores the updated aggregated data in the second cache for usage in further iterations. The database system also provides the updated aggregated data to at least one client.
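The two-cache scheme above can be sketched as follows: the first cache accumulates key-value data across iterations, while the second holds the running aggregate, which is updated incrementally from only the new key-value data each iteration. The class shape and sum-aggregation are illustrative assumptions.

```python
class IterativeProcessor:
    """Minimal two-cache sketch of the iterative processing scheme."""

    def __init__(self):
        self.kv_cache = []      # first cache: key-value data so far
        self.agg_cache = {}     # second cache: aggregated totals per key

    def iterate(self, input_records):
        # Transform the further input data into key-value pairs and
        # append them to the first cache.
        new_kv = [(r["key"], r["value"]) for r in input_records]
        self.kv_cache.extend(new_kv)
        # Update the aggregate from only the new pairs plus the cached
        # aggregate, avoiding reprocessing all prior iterations.
        for k, v in new_kv:
            self.agg_cache[k] = self.agg_cache.get(k, 0) + v
        # Provide the updated aggregated data to clients.
        return dict(self.agg_cache)
```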
Digital Messages in a Load Control System
A load control system may comprise load control devices for controlling respective electrical loads, and a system controller operable to transmit digital messages including different commands to the load control devices in response to a selection of a preset. The different commands may include a preset command configured to identify preset data in a device database stored at the load control device and/or a multi-output command configured to define the preset data for being stored in the device database. The system controller may decide which of the commands to transmit to the load control devices in response to the selection of the preset.
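The controller's choice between the two command types can be sketched as below: a compact preset command when the device database already stores the preset data, or a multi-output command that defines the data first. The message layout is an assumption, not the patent's wire format.

```python
def build_message(preset_id, device_db, define=False, preset_data=None):
    """Decide which command to send in response to a preset selection:
    a preset command referencing data already in the device database,
    or a multi-output command that defines that data."""
    if not define and preset_id in device_db:
        # Device already stores the preset data: send only its identifier.
        return {"command": "preset", "preset_id": preset_id}
    # Otherwise define the preset data so the device can store it.
    device_db[preset_id] = preset_data or {}
    return {"command": "multi_output", "preset_id": preset_id,
            "preset_data": device_db[preset_id]}
```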