G06F16/2462

Systems and methods for accelerating exploratory statistical analysis

Embodiments of the invention utilize a “data canopy” that breaks statistical measures down into basic aggregates for various data portions and stores those basic aggregates in a library within an in-memory data structure. When a queried statistical measure over a data portion involves a basic aggregate stored in the library, and the queried data portion at least partially overlaps the data portion associated with the stored aggregate, the stored aggregate may be reused in computing the queried measure.
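The reuse of stored basic aggregates can be illustrated with a minimal sketch. It assumes fixed-size chunks, chunk-aligned query ranges, and three basic aggregates (count, sum, sum of squares); the `Canopy` class and its method names are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Canopy:
    """Hypothetical in-memory library of per-chunk basic aggregates."""
    data: List[float]
    chunk_size: int
    cache: Dict[int, Tuple[int, float, float]] = field(default_factory=dict)
    computed: int = 0  # how many chunks were actually scanned

    def _chunk(self, i: int) -> Tuple[int, float, float]:
        # Scan a chunk only the first time it is needed, then reuse the result.
        if i not in self.cache:
            part = self.data[i * self.chunk_size:(i + 1) * self.chunk_size]
            self.cache[i] = (len(part), sum(part), sum(x * x for x in part))
            self.computed += 1
        return self.cache[i]

    def _combine(self, first: int, last: int) -> Tuple[int, float, float]:
        # Merge the basic aggregates of chunks first..last (inclusive).
        n, s, ss = 0, 0.0, 0.0
        for i in range(first, last + 1):
            c, cs, css = self._chunk(i)
            n, s, ss = n + c, s + cs, ss + css
        return n, s, ss

    def mean(self, first: int, last: int) -> float:
        n, s, _ = self._combine(first, last)
        return s / n

    def variance(self, first: int, last: int) -> float:
        # Population variance derived from count, sum, and sum of squares.
        n, s, ss = self._combine(first, last)
        return ss / n - (s / n) ** 2
```

In this sketch, a mean over chunks 0 to 1 followed by a variance over chunks 0 to 2 scans only one new chunk for the second query, because the aggregates for the overlapping chunks are already in the cache.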

Leveraging feature engineering to boost placement predictability for seed product selection and recommendation by field

An example computer-implemented method includes receiving a plurality of agricultural data records including yield properties of products grown in fields and raw field features of the fields. The method also includes transforming the raw field features into distinct feature classes that characterize key features affecting yield of the one or more products, and generating, using data from the plurality of agricultural data records and the distinct feature classes, genomic-by-environmental relationships between one or more products, yield properties of the one or more products, and field features associated with the one or more products. Further, the method includes generating, based at least in part on the genomic-by-environmental relationships, predicted yield performance for a set of products associated with one or more target environments, generating product recommendations for the one or more target environments based on the predicted yield performance for the set of products, and providing one or more instructions configured to cause display of the product recommendations.

ELECTIVE DEDUPLICATION
20230237030 · 2023-07-27

Techniques described herein elect how data is deduplicated in a storage system. A similarity hash signature for a data unit is calculated. A digest table is searched for a similarity hash signature within a predetermined distance of the similarity hash signature for the data unit. Based on the search, either a similarity hash signature or a strong hash signature of the data unit is added to the digest table.
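The election step can be sketched as follows. The abstract does not say which search outcome leads to which signature being stored, so the policy here is purely an assumption for illustration: when no stored similarity signature lies within the distance threshold, the similarity signature is added; otherwise the strong hash is added. Hamming distance, SHA-256 as the strong hash, and the `elect` function name are likewise assumptions.

```python
import hashlib
from typing import Dict, Optional, Set


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer signatures."""
    return bin(a ^ b).count("1")


def find_near(signatures: Set[int], sig: int, max_distance: int) -> Optional[int]:
    """Return a stored similarity signature within max_distance, if any."""
    for known in signatures:
        if hamming(known, sig) <= max_distance:
            return known
    return None


def elect(table: Dict[str, set], unit: bytes, sim_sig: int, max_distance: int = 1) -> str:
    """Add one signature for the data unit and report which kind was chosen.

    Assumed policy (not confirmed by the source): no near neighbour means the
    similarity signature is stored; a near neighbour means the strong hash is.
    """
    if find_near(table["similarity"], sim_sig, max_distance) is None:
        table["similarity"].add(sim_sig)
        return "similarity"
    table["strong"].add(hashlib.sha256(unit).hexdigest())
    return "strong"
```

The similarity signature is passed in as an integer so the sketch stays independent of any particular similarity-hashing scheme.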

Detecting relationships across data columns

There is a need for more effective and efficient detection of cross-data-column relationships. This need can be addressed by, for example, techniques for detecting cross-data-column data relationships that utilize at least one of feature-based similarity models and deep-learning-based similarity models. The cross-data-column data relationships may be displayed to an end-user using a cross-column relationship detection user interface.

Statistics based query transformation

Techniques are described for responding to aggregate queries using optimizer statistics already available in the data dictionary of the database in which the database object targeted by the aggregate query resides, without the user creating any additional objects (e.g. materialized views) and without requiring the objects to be loaded into volatile memory in a columnar fashion. The user query is rewritten to produce a transformed query that targets the dictionary tables to form the aggregate result without scanning the user tables. “Accuracy indicators” may be maintained to indicate whether those statistics are accurate. Only accurate statistics are used to answer queries that require accurate answers. The accuracy check can be made during runtime, allowing the query plan of the transformed query to be used regardless of the accuracy of the statistics. For queries that request approximations, inaccurate statistics may be used so long as the statistics are “accurate enough”.
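A minimal sketch of the runtime accuracy check, using plain Python rather than a real database: `ColumnStats` stands in for dictionary statistics, its `accurate` field stands in for an accuracy indicator, and `answer_aggregate` is a hypothetical name. As a simplification, any stale statistic is treated as "accurate enough" when the caller accepts an approximation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ColumnStats:
    """Hypothetical stand-in for optimizer statistics kept in a data dictionary."""
    row_count: int
    min_value: float
    max_value: float
    accurate: bool  # stand-in for an "accuracy indicator"


def answer_aggregate(agg: str, stats: ColumnStats, rows: List[float],
                     approximate_ok: bool = False) -> Tuple[float, str]:
    """Answer COUNT/MIN/MAX from statistics when allowed, else scan the rows.

    The accuracy check happens at call time, so the same code path serves
    both accurate and stale statistics.
    """
    from_stats = {"count": stats.row_count,
                  "min": stats.min_value,
                  "max": stats.max_value}
    if stats.accurate or approximate_ok:
        return from_stats[agg], "dictionary"
    scans = {"count": len, "min": min, "max": max}
    return scans[agg](rows), "scan"
```

With stale statistics, an exact `max` falls back to scanning the rows, while a request that tolerates approximation is served from the statistics without touching the rows.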

IMAGE-BASED POPULARITY PREDICTION
20230229692 · 2023-07-20

A machine may be configured to access an image of an item described by a description of the item. The machine may determine an image quality score of the image based on an analysis of the image. A request for search results that pertain to the description may be received by the machine, and the machine may present a search result that references the item's image, based on its image quality score. Also, the machine may access images of items and descriptions of items and generate a set of most frequent text tokens included in the item descriptions. The machine may identify an image feature exhibited by an item's image and determine that a text token from the corresponding item description matches one of the most frequent text tokens. A data structure may be generated by the machine to correlate the identified image feature with the text token.

FUZZY LOGIC MODELING FOR DETECTION AND PRESENTMENT OF ANOMALOUS MESSAGING
20230231876 · 2023-07-20

Disclosed is an approach that applies a fuzzy logic model that may involve fuzzy-matching a plurality of address fields to determine a common physical address, and determining a number of communiques directed to that address with reference to a threshold that may determine an excessive number of communiques. The plurality of address fields may also be fuzzy-matched to information in a fraud-risk database which may comprise a fraud-risk address. One or more matches may be presented to a user who may adjust the views of the various matches, track various trends within the data, and harmonize the various address fields relating to a physical address.
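The address-matching and threshold steps can be sketched with Python's standard `difflib`. This is not the fuzzy logic model from the disclosure: string-similarity ratios are swapped in as a simple substitute, and the abbreviation table, the 0.85 ratio cutoff, and the function names are all assumptions.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List

# Illustrative abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def normalize(address: str) -> str:
    """Lowercase, drop punctuation, and expand a few common abbreviations."""
    tokens = re.sub(r"[^\w\s]", "", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def group_addresses(addresses: List[str], min_ratio: float = 0.85) -> Dict[str, List[str]]:
    """Group raw address fields under the first sufficiently similar canonical form."""
    groups: Dict[str, List[str]] = {}
    for raw in addresses:
        norm = normalize(raw)
        for canonical in groups:
            if SequenceMatcher(None, canonical, norm).ratio() >= min_ratio:
                groups[canonical].append(raw)
                break
        else:
            groups[norm] = [raw]
    return groups


def excessive(groups: Dict[str, List[str]], threshold: int) -> List[str]:
    """Canonical addresses that received more communiques than the threshold."""
    return [c for c, members in groups.items() if len(members) > threshold]
```

Each raw address field counts as one communique in this sketch, so the group size doubles as the message count compared against the threshold.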

MODELING METHOD AND APPARATUS

A modeling method and an apparatus are disclosed. The method includes: obtaining a first data set of a first indicator, and determining, based on the first data set, a second indicator similar to the first indicator; and determining a first model based on one or more second models associated with the second indicator. The first model is used to detect a status of the first indicator, and the status of the first indicator includes an abnormal state or a normal state. The second models are used to detect a status of the second indicator, and the status of the second indicator includes an abnormal state or a normal state.

SEARCH DEVICE, SEARCHING METHOD, AND PLASMA PROCESSING APPARATUS

A model learning unit learns a prediction model on the basis of learning data. A target setting unit sets a target output parameter value by interpolating between a goal output parameter value and the output parameter value in the learning data that is closest to the goal output parameter value. A processing condition search unit estimates input parameter values that correspond to the goal output parameter value and the target output parameter value. The model learning unit then updates the prediction model by using, as additional learning data, a set of the estimated input parameter values and an output parameter value that results from processing performed by a processing device.
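The target-setting step can be sketched as a linear interpolation. Euclidean distance for "closest", a single interpolation fraction, and the function names are assumptions made for illustration; the abstract does not specify them.

```python
import math
from typing import List, Sequence


def closest_output(goal: Sequence[float],
                   observed: Sequence[Sequence[float]]) -> Sequence[float]:
    """Observed output vector nearest to the goal (Euclidean distance assumed)."""
    return min(observed, key=lambda y: math.dist(y, goal))


def set_target(goal: Sequence[float], observed: Sequence[Sequence[float]],
               fraction: float = 0.5) -> List[float]:
    """Interpolate from the closest observed output toward the goal.

    fraction=0 returns the closest observed output; fraction=1 returns the goal.
    """
    nearest = closest_output(goal, observed)
    return [n + fraction * (g - n) for n, g in zip(nearest, goal)]
```

An intermediate target of this kind lies nearer to outputs already present in the learning data than the goal itself does.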

Dynamic-Ledger-Enabled Edge-Device Query Processing
20230222413 · 2023-07-13

A method for processing a query for data stored in a distributed database includes receiving, at an edge device, the query for data stored in the distributed database from a query device. The method includes causing, by the edge device, the query to be stored on a dynamic ledger maintained by the distributed database. The method includes detecting, by the edge device, that summary data has been stored on the dynamic ledger. The method includes generating, by the edge device, an approximate response to the query based on the summary data stored on the dynamic ledger. The method includes transmitting, to the query device, the approximate response.
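The sequence of steps at the edge device can be sketched with an in-memory list standing in for the dynamic ledger. The entry layout, the `sample_sum` and `sample_count` summary fields, and the use of a sample mean as the approximate response are all hypothetical; the abstract does not describe the form of the summary data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ledger:
    """Hypothetical append-only stand-in for the dynamic ledger."""
    entries: List[dict] = field(default_factory=list)

    def append(self, entry: dict) -> None:
        self.entries.append(entry)


class EdgeDevice:
    def __init__(self, ledger: Ledger) -> None:
        self.ledger = ledger

    def submit(self, query_id: str, text: str) -> None:
        # Step 1: record the received query on the ledger.
        self.ledger.append({"type": "query", "id": query_id, "text": text})

    def find_summary(self, query_id: str) -> Optional[dict]:
        # Step 2: detect whether summary data for this query has been stored.
        for entry in self.ledger.entries:
            if entry["type"] == "summary" and entry["query_id"] == query_id:
                return entry
        return None

    def approximate_response(self, query_id: str) -> Optional[float]:
        # Step 3: derive an approximate answer from the summary alone.
        summary = self.find_summary(query_id)
        if summary is None:
            return None
        return summary["sample_sum"] / summary["sample_count"]
```

In this sketch the device returns `None` until some other participant writes a summary entry for the query, after which it answers from the summary without reading the underlying data.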