G06F16/24568

Computer-based systems configured to adjust data capacity in a data stream generated from multiple data producer applications and methods of use thereof

A method includes receiving from a transmitting data interface, a data stream mapping of a data input into data shards for transmission in a data stream over a data stream communication channel. Data capacity for a data producing software application from a plurality of data producing software applications is adjusted by increasing or decreasing a number of data shards in the data stream assigned to the data producing software application. An updated data stream mapping of the data input into the plurality of data shards is generated by updating a start hash key and an end hash key in a range for each of the data shards assigned to the data producing software application. The updated data stream mapping is sent to the transmitting data interface for adjusting the data capacity in the data stream transmitted over the data stream communication channel of the data producing software application.

VARIABLE DENSITY-BASED CLUSTERING ON DATA STREAMS
20230044676 · 2023-02-09 ·

In some implementations, a device may receive, from a data stream, a set of data points arranged in a dimensional data space. The device may compare the set of data points to identify one or more clusters using values of a distance parameter for data points included in the set of data points, wherein the values of distance parameter includes different values of the distance parameter for different data points. The device may transmit an indication of the one or more clusters to cause a device to display information associated with the one or more clusters. The device may receive, from the device, feedback information associated with at least one data point, wherein the feedback information indicates that at least one data point is associated with an error. The device may modify a value of the distance parameter associated with the at least one data point to a modified value.

Data stream processing

Techniques for partitioning data from a data stream into batches and inferring schema for individual batches based on the field values of each batch are disclosed. The system may infer different schemas corresponding to different batches of data records even though the batches are received from a common data stream or a common data source. The system may infer a schema by determining whether a field contains single values or multiple values. Then the system determines the field type(s) associated with the values. These determinations are then stored in a dictionary generated for each batch.

Joining and dimensional annotation in a streaming pipeline

Disclosed are embodiments for providing batch performance using a stream processor. In one embodiment, a method is disclosed comprising receiving, at a stream processor, an event, the stream processor including a plurality of processing stages; generating, by the stream processor, an augmented event based on the event, the augmented event including at least one additional field not appearing in the event, the additional field generated by an operation selected from the group consisting of a join or dimensional annotation operation; and emitting, by the stream processor, the augmented event to downstream consumer.

Abstraction layer for streaming data sources

Methods and systems for implementing an abstraction layer for streaming data sources are disclosed. A request to perform an operation based on one or more keys is received using a key-value interface. A streaming data source is selected based on the request. The operation is performed using the streaming data source, wherein the operation comprises storing or retrieving one or more values based on the one or more keys.

STRING ENTROPY IN A DATA PIPELINE
20230040648 · 2023-02-09 ·

Various embodiments comprise systems and methods to determine entropy in strings generated by a data pipeline. In some examples, data monitoring circuitry monitors a data pipeline that ingests input data, processes the input data, and responsively generates and transfers a data string that comprises character groups. The data monitoring circuitry receives the data string, identifies character groups in the data string, identifies group types for the character groups, and assigns numbers to the character groups based on the group types. The data monitoring circuitry determines a probability distribution for the numbers, calculates entropy for the data string based on probability distribution, and generates an entropy histogram based on the entropy. The data monitoring circuitry compares the entropy histogram of the data string to another entropy histogram for another data string, determines a change in entropy, and reports the change in entropy.

Managing sharable cell-based analytical notebooks

In an embodiment, a data processing method comprises accessing a computer memory comprising a shareable cell-based computation notebook comprising: notebook metadata specifying a kernel for execution, and a computational cell comprising cell metadata, a source code reference, and an output reference, wherein the cell metadata identifies a particular version of source code of a function that defines an input dataset, a transformation, and one or more variables that are to be associated with output data that is to be generated as a result of executing the particular version of the source code; updating the source code reference to identify a first storage location that is to contain the particular version of the source code of the function; and updating the output reference to identify a second storage location that is to contain the output data that is to be generated as a result of executing the particular version of the source code identified in the cell metadata using the kernel specified in the notebook metadata, wherein the method is performed by one or more computing devices.

Apparatuses, methods, and computer program products for triggering component workflows within a multi-component system

Methods, apparatuses, or computer program products provide for triggering component workflows within a multi-component system. An update to one or more component metadata records of a component metadata vector associated with a first component identifier may be received. The component metadata vector may include a plurality of records. Each record of the plurality of records may include a unique component metadata record identifier and a component metadata value. The component metadata vector associated with the first component identifier may be traversed after updating the one or more component metadata records. Based at least in part on detecting a component metadata condition associated with a component workflow trigger associated with the first component identifier, a first component workflow action of a first component workflow action series comprising a plurality of component workflow actions may be executed. Furthermore, a component workflow trigger notification may be transmitted to a first computing device.

Providing writable streams for external data sources
11593310 · 2023-02-28 · ·

The subject technology determines, using a connection to an external data source, a set of shards stored in an external data source, the connection to the external data source being established using an external integration, the external integration including security and configuration information. The subject technology determines a set of offsets of each shard of the set of shards. The subject technology generates a query plan indicating a degree of parallelism based at least in part on a size of the set of offsets. The subject technology, based on the set of shards and the set of offsets, performs an operation on the external data source by performing, using the connection to the external data source, a write operation from a query statement on the external data source, the external data source being different than a storage platform associated with the system.

DYNAMICALLY CHANGING INPUT DATA STREAMS PROCESSED BY DATA STREAM LANGUAGE PROGRAMS
20180011695 · 2018-01-11 ·

An instrumentation analysis system processes data streams by executing instructions specified using a data stream language program. The data stream language allows users to specify a search condition using a find block for identifying the set of data streams processed by the data stream language program. The set of identified data streams may change dynamically. The data stream language allows users to group data streams into sets of data streams based on distinct values of one or more metadata attributes associated with the input data streams. The data stream language allows users to specify a threshold block for determining whether data values of input data streams are outside boundaries specified using low/high thresholds. The elements of the set of data streams input to the threshold block can dynamically change. The low/high threshold values can be specified as data streams and can dynamically change.