Patent classifications
G06F16/254
Systems and methods for data linkage and entity resolution of continuous and un-synchronized data streams
The present disclosure is directed to a scalable, extensible, fault-tolerant system for stateful joining of two or more streams that are not fully synchronized, in which event ordering is not guaranteed and certain events arrive late. The system combines events or links data in near real time with low latency to mitigate impacts on downstream applications, such as ML models for determining suspicious behavior. Apart from combining events, the system can propagate the needed entities to other product streams or help in entity resolution. If any of the needed data is yet to arrive, a user can configure a few parameters to achieve the desired eventual and attribute consistency. The architecture is designed to be agnostic of the stream processing framework and can work with both streaming and batch paths.
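As a rough illustration of stateful joining with late arrivals, the sketch below buffers an unmatched event per key and emits a joined pair when its partner shows up within a configurable wait. The class name `StreamJoiner`, the `max_wait` parameter, and the "left"/"right" stream labels are assumptions made for this example, not terms from the abstract, and the in-memory buffer is not fault-tolerant.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class StreamJoiner:
    """Hypothetical two-stream joiner keyed on a shared identifier."""

    max_wait: float  # assumed knob: seconds an unmatched event is kept
    _pending: Dict[Tuple[str, str], Tuple[float, Any]] = field(default_factory=dict)

    def on_event(self, side: str, key: str, payload: Any, now: float) -> Optional[Tuple[Any, Any]]:
        """Return (left, right) if the partner is buffered and fresh, else buffer this event."""
        other = "right" if side == "left" else "left"
        match = self._pending.pop((other, key), None)
        if match is not None and now - match[0] <= self.max_wait:
            partner = match[1]
            return (payload, partner) if side == "left" else (partner, payload)
        # No partner yet, or the buffered partner was too old and has been dropped.
        self._pending[(side, key)] = (now, payload)
        return None

    def expire(self, now: float) -> List[Tuple[str, str, Any]]:
        """Evict and return events whose partner did not arrive within max_wait."""
        expired = [(s, k, p) for (s, k), (t, p) in self._pending.items() if now - t > self.max_wait]
        for s, k, _ in expired:
            del self._pending[(s, k)]
        return expired


joiner = StreamJoiner(max_wait=5.0)
joiner.on_event("right", "u1", {"score": 7}, now=0.0)          # buffered, no partner yet
joined = joiner.on_event("left", "u1", {"name": "a"}, now=3.0)  # partner arrives out of order
```

Evicted events could be forwarded unjoined or retried, depending on the consistency a user wants; that policy is left open here.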
CONFIGURABLE PARSER AND A METHOD FOR PARSING INFORMATION UNITS
A packet processing technique can include receiving a packet, and parsing the packet based on a protocol field to generate a parse result vector. The parse result vector is used to select between forwarding the packet to a virtual machine executing on a host processing integrated circuit, forwarding the packet to a physical media access controller, multicasting the packet to multiple virtual machines executing on the host processing integrated circuit, and sending the packet to a hypervisor.
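The selection step can be pictured with a small sketch. It assumes an Ethernet frame, treats the EtherType as the protocol field, and uses an invented policy for choosing among the four destinations; the `Action` names, the `KNOWN_ETHERTYPES` set, and the dictionary standing in for the parse result vector are all illustrative assumptions.

```python
import struct
from enum import Enum


class Action(Enum):
    TO_VM = "vm"
    TO_PHYSICAL_MAC = "mac"
    MULTICAST_TO_VMS = "multicast"
    TO_HYPERVISOR = "hypervisor"


KNOWN_ETHERTYPES = {0x0800, 0x86DD}  # IPv4 and IPv6; assumed set of fast-path protocols


def parse(frame: bytes) -> dict:
    """Build a small parse result vector from the 14-byte Ethernet header."""
    dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "dst": dst,
        "ethertype": ethertype,
        "is_multicast": bool(dst[0] & 0x01),  # group bit of the destination MAC
        "known_protocol": ethertype in KNOWN_ETHERTYPES,
    }


def select_action(prv: dict, local_vm_macs: set) -> Action:
    """Hypothetical policy mapping a parse result vector to one of four destinations."""
    if not prv["known_protocol"]:
        return Action.TO_HYPERVISOR
    if prv["is_multicast"]:
        return Action.MULTICAST_TO_VMS
    if prv["dst"] in local_vm_macs:
        return Action.TO_VM
    return Action.TO_PHYSICAL_MAC


vm_mac = bytes.fromhex("020000000001")
frame = vm_mac + bytes.fromhex("020000000002") + b"\x08\x00" + b"payload"
action = select_action(parse(frame), {vm_mac})
```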
EXTRACTING DATA FROM A BLOCKCHAIN NETWORK
According to at least one embodiment, a method, computer system, and computer program product for model-driven extraction of event data by a computational device is provided. The present invention may include transforming event data of an event occurring on a blockchain network using an event schema to comply with a data model of an external data structure; mapping, based on the event schema and the data model, the event data to a logical structure of the external data structure; and loading the transformed event data to the external data structure to modify a data content of the external data structure based on the mapping.
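A minimal sketch of the transform, map, and load steps follows, assuming the external data structure is a relational table. The schema layout, the `transfers` table, and the field names are hypothetical; SQLite is used only to keep the example self-contained.

```python
import sqlite3

# Hypothetical event schema: event field -> (target column, converter).
TRANSFER_SCHEMA = {
    "from": ("sender", str.lower),
    "to": ("recipient", str.lower),
    "value": ("amount", int),
}


def transform(event: dict, schema: dict) -> dict:
    """Convert each event field and rename it to its mapped column."""
    return {column: convert(event[name]) for name, (column, convert) in schema.items()}


def load(conn: sqlite3.Connection, table: str, row: dict) -> None:
    """Insert one transformed row; table and column names come from trusted config."""
    columns = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", tuple(row.values()))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfers (sender TEXT, recipient TEXT, amount INTEGER)")
row = transform({"from": "0xAB", "to": "0xCD", "value": "42"}, TRANSFER_SCHEMA)
load(conn, "transfers", row)
```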
PROCESSING DATA INPUTS FROM ALTERNATIVE SOURCES TO GENERATE A PREDICTIVE SIGNAL
A computer-implemented method comprises using at least one hardware processor to: receive a plurality of data from a plurality of data sources; standardize the plurality of data; tag the standardized plurality of data with one or more companies; train a prediction model to predict a metric for each of the one or more companies based on the standardized plurality of data tagged with that company and historical measurements for that company; and apply the prediction model to new data to predict the metric for at least one of the one or more companies.
A SYSTEM AND METHOD FOR ETL PIPELINE PROCESSING
The invention provides an ETL pipeline system including an interface configured to obtain a plurality of graph configuration components. Each graph configuration component includes information representative of one or more computational logic rules. The system further includes a computation graph generator configured to generate a computation graph based on the obtained graph configuration components. The generated computation graph includes a node for each graph configuration component and one or more links representative of relationships between the nodes. The system further includes a computation graph adaptor configured to receive, from an external source, external information relating to the graph configuration components, and to adapt the generated computation graph based on the external information. The system further includes a computation graph runner configured to run the adapted computation graph.
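The generator, adaptor, and runner roles can be sketched as three small functions. The component layout (`depends_on`, `rule`) is an assumption made for this example, and the standard-library `graphlib` module (Python 3.9+) stands in for whatever scheduling the invention uses.

```python
from graphlib import TopologicalSorter


def build_graph(components: dict) -> dict:
    """Generator: one node per component, linked to the nodes it depends on."""
    return {name: set(spec["depends_on"]) for name, spec in components.items()}


def adapt_graph(components: dict, external: dict) -> dict:
    """Adaptor: return a copy with externally supplied overrides applied."""
    adapted = {name: dict(spec) for name, spec in components.items()}
    for name, overrides in external.items():
        adapted[name].update(overrides)
    return adapted


def run_graph(components: dict) -> dict:
    """Runner: evaluate each rule after all of its dependencies."""
    results = {}
    for name in TopologicalSorter(build_graph(components)).static_order():
        spec = components[name]
        results[name] = spec["rule"](*(results[dep] for dep in spec["depends_on"]))
    return results


components = {
    "extract": {"depends_on": [], "rule": lambda: [1, 2, 3]},
    "scale": {"depends_on": ["extract"], "rule": lambda xs: [2 * x for x in xs]},
    "total": {"depends_on": ["scale"], "rule": lambda xs: sum(xs)},
}
adapted = adapt_graph(components, {"scale": {"rule": lambda xs: [3 * x for x in xs]}})
```

Here `run_graph(components)` evaluates the original rules, while `run_graph(adapted)` evaluates the graph after the external override replaces the `scale` rule.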
Systems and methods of generating datasets from heterogeneous sources for machine learning
A computer system is provided that is programmed to select feature sets from a large number of features. Features for a set are selected based on metagradient information returned from a machine learning process that has been performed on an earlier selected feature set. The process can iterate until a selected feature set converges or otherwise meets or exceeds a given threshold.
ON-DEMAND DATA FROM HETEROGENEOUS SOURCES
A method enables on-demand delivery of data from a plurality of heterogeneous external data sources to a data analytics tool. With a mapping of one or more identified data connectors, a no-code client query, as formulated in a first data model, is dynamically translated to one or more external queries formulated in one or more alternate data models of the heterogeneous external data sources. With the mappings of the one or more identified connectors, each response to the one or more external queries is reformulated from the one or more alternate data models to the first data model to yield one or more client query results objects. The client query results objects are sent to the data analytics tool.
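The two translation directions can be illustrated with a field-name mapping per connector. The query shape (`select`, `where`) and the field names below are invented for the example; a real connector would also translate operators, types, and query syntax.

```python
def translate_query(query: dict, mapping: dict) -> dict:
    """Rewrite a client query from the first data model into a source's field names."""
    return {
        "select": [mapping[name] for name in query["select"]],
        "where": {mapping[name]: value for name, value in query["where"].items()},
    }


def reformulate(rows: list, mapping: dict) -> list:
    """Rewrite source rows back into the first data model's field names."""
    reverse = {external: client for client, external in mapping.items()}
    return [{reverse[name]: value for name, value in row.items()} for row in rows]


# Hypothetical connector mapping: client field -> external field.
crm_mapping = {"customer": "cust_nm", "revenue": "rev_amt"}
client_query = {"select": ["customer", "revenue"], "where": {"customer": "Acme"}}

external_query = translate_query(client_query, crm_mapping)
results = reformulate([{"cust_nm": "Acme", "rev_amt": 10}], crm_mapping)
```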
Machine learning systems for ETL data streams
Apparatus and methods for reducing failure in an informational flow of a data stream controlled by an Extract Transform Load process, using a machine learning (“ML”) model training system, are provided. The method may include deploying a software sensor that periodically captures data points for an extract job executed during an extract phase of the process. The method may also include building a behavior profile concurrently with the receipt of each of the data points. The method may further include comparing the behavior profile to behavior profiles stored in an Adverse Behavior Model database and behavior profiles stored in a Normal Behavior Model database. When the behavior profile is determined to have a threshold number of match points matching the behavior profile to behavior profiles in the Adverse Behavior Model database, the method may include increasing a target database storage capacity.
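One way to picture the match-point comparison is to count the metrics on which two profiles agree within a tolerance. The metric names, the 10% relative tolerance, and the fixed capacity step in this sketch are assumptions; the abstract does not say how match points are computed or by how much capacity grows.

```python
def match_points(profile: dict, reference: dict, tolerance: float = 0.1) -> int:
    """Count metrics whose values agree within a relative tolerance."""
    points = 0
    for metric, value in profile.items():
        ref = reference.get(metric)
        if ref is not None and abs(value - ref) <= tolerance * max(abs(ref), 1e-9):
            points += 1
    return points


def classify(profile: dict, adverse_db: list, normal_db: list, threshold: int) -> str:
    """Label a profile by whichever database yields at least `threshold` match points."""
    if any(match_points(profile, ref) >= threshold for ref in adverse_db):
        return "adverse"
    if any(match_points(profile, ref) >= threshold for ref in normal_db):
        return "normal"
    return "unknown"


def next_capacity(current_gb: int, verdict: str, step_gb: int = 50) -> int:
    """Increase the target capacity only on an adverse match (hypothetical step size)."""
    return current_gb + step_gb if verdict == "adverse" else current_gb


profile = {"rows_per_sec": 100.0, "error_rate": 0.2, "lag_sec": 30.0}
adverse_db = [{"rows_per_sec": 105.0, "error_rate": 0.21, "lag_sec": 300.0}]
normal_db = [{"rows_per_sec": 1000.0, "error_rate": 0.0, "lag_sec": 1.0}]
verdict = classify(profile, adverse_db, normal_db, threshold=2)
```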
DIGITAL FORENSICS TOOL AND METHOD
A digital forensics tool and associated method are disclosed for extracting digital data from a user computing device, transforming and analyzing the digital data, and generating an interactive user interface that facilitates the identification of important digital data, such as for a criminal investigation.
TRANSFERRING DATA TO BACKUP DATA REPOSITORIES
Disclosed herein are system, method, and computer program product embodiments for managing data storage devices. In some embodiments, a server receives a request to store data in a first data storage device. The server determines the storage space remaining in the first data storage device based on historical data associated with the first data storage device. The server further determines that the first data storage device will exceed its storage capacity based on a size of the data and the storage space remaining in the first data storage device. The server transfers a set of data stored in the first data storage device to a second data storage device. Transferring the set of data causes the storage space remaining in the first data storage device to be greater than or equal to the size of the data. The server stores the data in the first data storage device.
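The transfer-then-store sequence can be sketched as follows. As a simplification, remaining space is computed from tracked item sizes rather than from historical data, items are moved oldest first, and the second device's capacity is ignored; the function and dictionary layout are hypothetical.

```python
def store_with_offload(first: dict, second: dict, name: str, size: int) -> list:
    """Store an item on `first`, moving its oldest items to `second` if space is short.

    Each device is {"capacity": int, "items": {name: size}}; dict insertion
    order serves as item age. Returns the names that were moved.
    """
    if size > first["capacity"]:
        raise ValueError("data is larger than the first device's total capacity")
    remaining = first["capacity"] - sum(first["items"].values())
    moved = []
    while remaining < size:
        oldest = next(iter(first["items"]))
        oldest_size = first["items"].pop(oldest)
        second["items"][oldest] = oldest_size
        remaining += oldest_size
        moved.append(oldest)
    first["items"][name] = size
    return moved


first = {"capacity": 100, "items": {"a": 40, "b": 50}}
second = {"capacity": 1000, "items": {}}
moved = store_with_offload(first, second, "c", 30)
```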