G06F16/2255

Cache-aware system and method for identifying matching portions of two sets of data in a multiprocessor system

A system and method matches data from a first set of data with that of another set of data in a manner based on the size of a cache.
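The abstract does not say how the cache size is used. One common way to make matching cache-aware is a partitioned hash join: both sets are split by the same hash so that each partition's lookup table fits in a cache-sized budget, and each pair of partitions is then matched independently. The sketch below shows that general technique, not necessarily the patented one. The function name and the `cache_bytes` and `entry_bytes` parameters are hypothetical, and Python itself gives no control over what actually stays in a hardware cache.

```python
import math
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def cache_partitioned_match(
    left: Sequence[Hashable],
    right: Sequence[Hashable],
    cache_bytes: int = 32 * 1024,  # hypothetical per-core cache budget
    entry_bytes: int = 64,         # hypothetical cost of one hash-table entry
) -> List[Tuple[int, int]]:
    """Return (left_index, right_index) pairs whose keys are equal."""
    # Choose enough partitions that each partition's table fits the budget.
    per_partition = max(1, cache_bytes // entry_bytes)
    n_parts = max(1, math.ceil(len(left) / per_partition))

    # Split both inputs with the same hash so equal keys share a partition.
    left_parts = [[] for _ in range(n_parts)]
    right_parts = [[] for _ in range(n_parts)]
    for i, key in enumerate(left):
        left_parts[hash(key) % n_parts].append((i, key))
    for j, key in enumerate(right):
        right_parts[hash(key) % n_parts].append((j, key))

    matches = []
    # Partition pairs are independent, so in a multiprocessor setting they could
    # be handed to different cores; here they are processed sequentially.
    for lp, rp in zip(left_parts, right_parts):
        table = defaultdict(list)  # small, cache-sized build table
        for i, key in lp:
            table[key].append(i)
        for j, key in rp:
            for i in table.get(key, ()):
                matches.append((i, j))
    return matches
```

Because every key is routed by the same hash on both sides, a match can only occur within a partition pair, which is what makes the per-partition work independent.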

Method for determining duplication of security vulnerability and analysis apparatus using same

A method for determining duplication of a vulnerability may include a vulnerability extraction step of extracting, from an analysis target server, vulnerability uniform resource locator (URL) addresses that include the vulnerability; a hash generation step of generating, from the vulnerability URL address, a URL hash value corresponding to the extracted vulnerability; and a duplication determination step of determining, when the URL hash value is present in a first comparison table, that the vulnerability is duplicated and excluding the corresponding vulnerability from the vulnerability information.
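The hash generation and duplication determination steps can be sketched as below. This is a minimal illustration under stated assumptions: the URL normalization in `url_hash` (lower-casing scheme and host, dropping the fragment) is hypothetical, the comparison table is modeled as a Python `set`, and recording each new hash in the table is assumed rather than stated in the abstract.

```python
import hashlib
from typing import Iterable, List, Set
from urllib.parse import urlsplit, urlunsplit


def url_hash(url: str) -> str:
    """Hash generation: map a vulnerability URL to a fixed-size digest."""
    # Hypothetical normalization: lower-case scheme and host, drop the fragment.
    parts = urlsplit(url)
    normalized = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, "")
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_duplicates(vuln_urls: Iterable[str], comparison_table: Set[str]) -> List[str]:
    """Duplication determination: keep only URLs whose hash is not in the table."""
    reported = []
    for url in vuln_urls:
        digest = url_hash(url)
        if digest in comparison_table:
            continue  # duplicate vulnerability: exclude it from the results
        comparison_table.add(digest)  # assumption: new hashes are recorded
        reported.append(url)
    return reported
```

With this normalization, `http://Example.com/a?id=1` and `http://example.com/a?id=1#x` hash to the same value, so the second is treated as a duplicate.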

ELECTIVE DEDUPLICATION
20230237030 · 2023-07-27

Techniques described herein elect how data is deduplicated in a storage system. A similarity hash signature for a data unit is calculated. A digest table is searched for a similarity hash signature within a predetermined distance of the similarity hash signature for the data unit. Based on the search, either a similarity hash signature or a strong hash signature of the data unit is added to the digest table.
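A minimal sketch of the election step follows, assuming a 64-bit SimHash as the similarity hash signature, Hamming distance as the distance measure, and SHA-256 as the strong hash. The abstract does not say which outcome stores which signature; the policy below (store a strong hash when a near neighbour exists, otherwise store the similarity signature) is an assumption, as are the names `DigestTable`, `elect` and `max_distance`.

```python
import hashlib
from typing import List, Set

SIG_BITS = 64


def simhash(data: bytes, shingle: int = 4) -> int:
    """64-bit SimHash over overlapping byte shingles."""
    counts = [0] * SIG_BITS
    for start in range(max(1, len(data) - shingle + 1)):
        piece = data[start:start + shingle]
        h = int.from_bytes(hashlib.blake2b(piece, digest_size=8).digest(), "big")
        for bit in range(SIG_BITS):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    # A signature bit is set when most shingles voted for it.
    return sum(1 << bit for bit in range(SIG_BITS) if counts[bit] > 0)


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


class DigestTable:
    def __init__(self, max_distance: int = 3):
        self.max_distance = max_distance  # hypothetical "predetermined distance"
        self.similarity: List[int] = []
        self.strong: Set[str] = set()

    def elect(self, unit: bytes) -> str:
        """Store one signature for `unit` and report which kind was stored."""
        sig = simhash(unit)
        # Linear search for a near neighbour; a real system would index this.
        if any(hamming(sig, s) <= self.max_distance for s in self.similarity):
            # Assumed policy: a near match exists, so record an exact hash.
            self.strong.add(hashlib.sha256(unit).hexdigest())
            return "strong"
        self.similarity.append(sig)
        return "similarity"
```

Similar inputs tend to produce SimHash signatures that differ in few bits, which is why a small Hamming radius can stand in for "similar data unit".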

Fast Skip-List Scan and Insert
20230237035 · 2023-07-27

Techniques are disclosed relating to efficiently managing skip list data structures. In various embodiments, a computing system stores a skip list including a plurality of key-value records that include one or more pointers to others of the plurality of key-value records. The computing system scans the skip list for a location associated with a particular key. The scanning includes using a prefix of the particular key to identify a particular portion of the skip list, where the particular portion includes key-value records having keys with the same prefix. The scanning further includes initiating a scan for the location within the identified portion. In some embodiments, the computing system inserts a key-value record into the skip list at the location associated with the particular key in response to the scan identifying the location.
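The prefix-directed scan can be sketched as follows, assuming string keys, a fixed-length prefix, and a hypothetical `prefix_index` that maps each prefix to the first record carrying it. Because all keys sharing a prefix are contiguous in sorted order, a scan can start inside that portion of the bottom level instead of at the head. To keep every level correctly linked, this simplified `insert` still uses the ordinary top-down predecessor search; only lookups take the prefix shortcut.

```python
import random
from typing import Any, Dict, List, Optional


class Node:
    def __init__(self, key: Optional[str], value: Any, level: int):
        self.key = key
        self.value = value
        self.next: List[Optional["Node"]] = [None] * level  # one pointer per level


class PrefixSkipList:
    MAX_LEVEL = 8

    def __init__(self, prefix_len: int = 2, seed: int = 0):
        self.head = Node(None, None, self.MAX_LEVEL)  # sentinel, holds no record
        self.prefix_len = prefix_len
        # Hypothetical index: prefix -> first record whose key has that prefix.
        self.prefix_index: Dict[str, Node] = {}
        self.rng = random.Random(seed)

    def _random_level(self) -> int:
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < 0.5:
            level += 1
        return level

    def _predecessors(self, key: str) -> List[Node]:
        """Ordinary top-down search: the last node with key < `key` per level."""
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for lvl in reversed(range(self.MAX_LEVEL)):
            while node.next[lvl] is not None and node.next[lvl].key < key:
                node = node.next[lvl]
            update[lvl] = node
        return update

    def scan(self, key: str) -> Optional[Node]:
        """Return the first record with record.key >= key."""
        start = self.prefix_index.get(key[: self.prefix_len])
        if start is None:
            # Unknown prefix: fall back to the ordinary search.
            return self._predecessors(key)[0].next[0]
        # Keys sharing a prefix are contiguous, so begin inside that portion.
        node = start
        while node is not None and node.key < key:
            node = node.next[0]
        return node

    def insert(self, key: str, value: Any) -> None:
        update = self._predecessors(key)  # needed to splice every level
        existing = update[0].next[0]
        if existing is not None and existing.key == key:
            existing.value = value
            return
        node = Node(key, value, self._random_level())
        for lvl in range(len(node.next)):
            node.next[lvl] = update[lvl].next[lvl]
            update[lvl].next[lvl] = node
        prefix = key[: self.prefix_len]
        first = self.prefix_index.get(prefix)
        if first is None or key < first.key:
            self.prefix_index[prefix] = node
```

For example, after inserting `ab1`, `ab2` and `ab3`, a scan for `ab25` starts at `ab1` via the index and stops at `ab3`.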

Tree-based format for data storage

A tree-based format may be implemented for data stored in a data store. A table may be maintained across one or multiple storage nodes in storage slabs. Storage slabs may be mapped to different nodes of a tree. Each node of the tree may be assigned a different range of distribution scheme values which identify what portions of the table are stored in the storage slab. Storage slabs mapped to child nodes in the tree may be assigned portions of the range of distribution scheme values assigned to a parent. Storage nodes may be added or removed for storing the table. Storage slabs may be moved from one storage node to another in order to accommodate the addition or removal of storage nodes.
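A minimal sketch of the range-splitting tree, assuming the distribution scheme value is a hash of the row key reduced into a fixed space (`SPACE` and `distribution_value` are hypothetical) and that a child slab receives an equal, contiguous share of its parent's range. Each slab records the storage node that currently holds it.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List

SPACE = 1 << 16  # hypothetical size of the distribution-value space


def distribution_value(row_key: str) -> int:
    """Map a row key to a distribution value in [0, SPACE)."""
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SPACE


@dataclass
class Slab:
    lo: int            # inclusive start of this slab's value range
    hi: int            # exclusive end of this slab's value range
    storage_node: str  # storage node currently holding the slab
    children: List["Slab"] = field(default_factory=list)

    def split(self, storage_nodes: List[str]) -> None:
        """Give each child an equal, contiguous share of this slab's range."""
        n = len(storage_nodes)
        width = (self.hi - self.lo) // n
        bounds = [self.lo + i * width for i in range(n)] + [self.hi]
        self.children = [
            Slab(bounds[i], bounds[i + 1], storage_nodes[i]) for i in range(n)
        ]

    def locate(self, value: int) -> "Slab":
        """Descend to the deepest slab whose range contains `value`."""
        for child in self.children:
            if child.lo <= value < child.hi:
                return child.locate(value)
        return self
```

In this model, moving a slab to another storage node only reassigns its `storage_node` field, because the range assignment lives in the tree rather than in the node that stores it.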

Message Object Traversal In High-Performance Network Messaging Architecture
20230027817 · 2023-01-26

A communications system implements instructions including maintaining a message object that includes an array of entries. Each entry of the array includes a field identifier, a data type, and a next entry pointer. The next entry pointers and a head pointer establish a linked list of entries. The instructions include, in response to a request to add a new entry to the message object, calculating an index based on a field identifier of the new entry and determining whether the entry at the calculated index within the array of entries is active. The instructions include, if the entry is inactive, writing a data type, field identifier, and data value of the new entry to the calculated index, and inserting the new entry into the linked list. The instructions include, if the entry is already active, selectively expanding the size of the array and repeating the calculating and determining.
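The add path described above can be sketched as an array indexed by `field_id % capacity`, with next-entry indices threading active entries into a linked list. The following assumptions are not in the abstract: re-adding an existing field identifier updates it in place, and "selectively expanding" means doubling the array until every active field maps to a distinct slot.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List, Optional


@dataclass
class Entry:
    active: bool = False
    field_id: int = 0
    data_type: str = ""
    value: Any = None
    next: Optional[int] = None  # array index of the next entry in the list


class MessageObject:
    def __init__(self, capacity: int = 4):
        self.entries: List[Entry] = [Entry() for _ in range(capacity)]
        self.head: Optional[int] = None  # first entry in insertion order
        self.tail: Optional[int] = None  # last entry, kept so appends are O(1)

    def add(self, field_id: int, data_type: str, value: Any) -> None:
        while True:
            idx = field_id % len(self.entries)  # index calculated from the field id
            entry = self.entries[idx]
            if not entry.active:
                self.entries[idx] = Entry(True, field_id, data_type, value)
                self._link(idx)
                return
            if entry.field_id == field_id:  # assumption: same field id updates
                entry.data_type, entry.value = data_type, value
                return
            self._grow()  # collision with another field: expand, then retry

    def _link(self, idx: int) -> None:
        if self.tail is None:
            self.head = idx
        else:
            self.entries[self.tail].next = idx
        self.tail = idx

    def _grow(self) -> None:
        old = list(self)  # active entries in linked-list order
        capacity = len(self.entries) * 2
        # Keep doubling until every existing field id maps to a distinct slot.
        while len({e.field_id % capacity for e in old}) < len(old):
            capacity *= 2
        self.entries = [Entry() for _ in range(capacity)]
        self.head = self.tail = None
        for e in old:
            idx = e.field_id % capacity
            self.entries[idx] = Entry(True, e.field_id, e.data_type, e.value)
            self._link(idx)

    def __iter__(self) -> Iterator[Entry]:
        """Traverse via next pointers, visiting only active entries."""
        idx = self.head
        while idx is not None:
            entry = self.entries[idx]
            yield entry
            idx = entry.next
```

The linked list lets traversal touch only active entries, so a sparse array does not cost extra time when the message is serialized or walked.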

HASH-BASED IDENTIFICATION OF DATA CORRUPTION ISSUES IN TIME-SERIES DATA
20230025284 · 2023-01-26

An apparatus includes a memory and a processor. The memory stores a time-series of data sets, and a first version of a data structure generated from the time-series as it existed at a first time. The data structure includes a bottom level of nodes, and subsequent levels of nodes, ending with a top level terminal node. Each bottom level node stores a hash of an assigned time-series data set. Each node of each subsequent level stores data generated from an assigned group of nodes of a previous level. The processor receives a validation request. In response, the processor generates a second version of the data structure based on the time-series as it exists at a second time. The processor determines that the terminal nodes in the first and second versions of the data structure do not match. In response, the processor generates an alert.
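The leveled structure is essentially a hash tree. A minimal sketch follows, assuming SHA-256, a fixed fan-out of two nodes per group, and data sets already serialized to bytes; the function names are hypothetical. Validation rebuilds the tree from the current time series and compares terminal nodes.

```python
import hashlib
from typing import List


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(datasets: List[bytes], fanout: int = 2) -> List[List[bytes]]:
    """Return every level, bottom first (datasets non-empty, fanout >= 2)."""
    level = [_hash(d) for d in datasets]  # bottom level: one hash per data set
    levels = [level]
    while len(level) > 1:
        # Each higher node hashes the concatenation of its assigned group.
        level = [
            _hash(b"".join(level[i:i + fanout]))
            for i in range(0, len(level), fanout)
        ]
        levels.append(level)
    return levels


def terminal_node(datasets: List[bytes]) -> bytes:
    return build_levels(datasets)[-1][0]


def validate(stored_terminal: bytes, datasets_now: List[bytes]) -> bool:
    """Rebuild from the current series; False means an alert should be raised."""
    return terminal_node(datasets_now) == stored_terminal
```

Comparing only the terminal nodes is a single equality check, yet any change to any stored data set propagates up to it.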

HARDWARE-BASED SENSOR ANALYSIS
20230229549 · 2023-07-20

A method of monitoring messages from a sensor using an integrated circuit is provided. The messages include data measured by that sensor. The method includes reading a first message from interconnect circuitry of the integrated circuit. The interconnect circuitry connects the sensor to one or more core devices configured to process the messages. A first hash value is calculated for the first message. The first hash value is compared to one or more prior hash values stored in a hash store. Each prior hash value of the one or more prior hash values corresponds to a message that was read from the interconnect circuitry prior to the first message. A corrective action is performed when a difference between the first hash value and at least one of the prior hash values stored in the hash store is below a predetermined threshold.
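A software model of the hash-store comparison is sketched below; the patent describes hardware on the interconnect, and this only mirrors the logic. It assumes a bounded store of recent 64-bit BLAKE2b hashes and uses Hamming distance as the "difference" between hash values; the class name, `depth` and `threshold` are hypothetical. With a cryptographic hash and `threshold=1`, only an exact repeat of a recent message is flagged.

```python
import hashlib
from collections import deque


class HashStore:
    """Software model of the hash store; depth and threshold are hypothetical."""

    def __init__(self, depth: int = 8, threshold: int = 1):
        self.prior = deque(maxlen=depth)  # hashes of the most recent messages
        self.threshold = threshold

    @staticmethod
    def message_hash(message: bytes) -> int:
        return int.from_bytes(hashlib.blake2b(message, digest_size=8).digest(), "big")

    def needs_correction(self, message: bytes) -> bool:
        """Hash the message, compare with prior hashes, then remember it."""
        h = self.message_hash(message)
        # Assumed difference metric: Hamming distance between the 64-bit hashes.
        flagged = any(bin(h ^ p).count("1") < self.threshold for p in self.prior)
        self.prior.append(h)
        return flagged
```

Flagging "near" rather than identical messages would require a locality-sensitive hash, since a cryptographic hash gives unrelated outputs for inputs that differ by one bit.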

Method and system to estimate the cardinality of sets and set operation results from single and multiple HyperLogLog sketches
11561954 · 2023-01-24

A system and method for the estimation of the cardinality of large sets of transaction trace data is disclosed. The estimation is based on HyperLogLog data sketches, which can store cardinality-relevant data of large sets with low and fixed memory requirements. The disclosure contains improvements to the known analysis methods for HyperLogLog data sketches that improve relative error behavior by eliminating a bias of the relative error that depends on the cardinality range. A new analysis method for HyperLogLog data structures is shown that applies maximum likelihood estimation to an approximate, Poisson-based probability model. In addition, a variant of the new analysis model is disclosed that uses multiple HyperLogLog data structures to estimate the results of set operations, such as intersections or relative complements, directly from the HyperLogLog input data.
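For context, the baseline these improvements target can be sketched as the classic HyperLogLog estimator with the original small-range (linear counting) correction. This is not the disclosed maximum-likelihood method. It assumes a 64-bit hash taken from SHA-256 and p = 12 (4096 registers). Unions merge by element-wise register maximum; an intersection can then be approximated naively by inclusion-exclusion, |A ∩ B| ≈ |A| + |B| - |A ∪ B|, which is the kind of set-operation estimate the disclosed variant computes directly instead.

```python
import hashlib
import math


class HyperLogLog:
    """Classic HyperLogLog with 2**p registers and a 64-bit hash."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        x = int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "big")
        idx = x >> (64 - self.p)                       # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def union(self, other: "HyperLogLog") -> "HyperLogLog":
        """Sketch of the set union: element-wise maximum of the registers."""
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # linear counting correction
        return raw
```

The switch between linear counting and the raw estimate is where the classic method shows the cardinality-range-dependent bias mentioned above.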

CROSS-SILO DATA STORAGE AND DEDUPLICATION
20230229643 · 2023-07-20

In some aspects, a computing system may generate a content-defined tree. A content-defined tree may be a tree of cryptographic hashes where each leaf is a hash of a chunk (e.g., data chunk) of a data object, and each parent node (e.g., interior node) is the hash of a concatenation of its child nodes' hashes. To create parent nodes for the leaf nodes, a computing system may group leaf nodes together based on a rolling hash (e.g., a rolling hash of the hashes of the leaf nodes) satisfying a condition. Each parent node may include a hash that represents the concatenation of the hashes of the leaf nodes that fall under the corresponding parent node.
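A minimal sketch of building such a tree, under these assumptions: chunks are given in advance, SHA-256 is the cryptographic hash, the rolling hash is a polynomial hash over a window of the last three child hashes, and the grouping condition is "rolling value divisible by 4". All constants and function names are hypothetical. A guard forces all nodes into one parent if a level would otherwise not shrink.

```python
import hashlib
from typing import List

MOD = (1 << 61) - 1   # prime modulus for the rolling hash
BASE = 1_000_003      # hypothetical polynomial base
WINDOW = 3            # hypothetical number of child hashes in the window
DIVISOR = 4           # boundary when rolling % DIVISOR == 0 (~4 children each)


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def group(hashes: List[bytes]) -> List[List[bytes]]:
    """Cut `hashes` into groups wherever the rolling hash meets the condition."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest value in the window
    window: List[int] = []
    rolling = 0
    groups: List[List[bytes]] = []
    current: List[bytes] = []
    for h in hashes:
        v = int.from_bytes(h[:8], "big") % MOD
        if len(window) == WINDOW:
            rolling = (rolling - window.pop(0) * top) % MOD  # drop oldest value
        rolling = (rolling * BASE + v) % MOD
        window.append(v)
        current.append(h)
        if rolling % DIVISOR == 0:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def content_defined_root(chunks: List[bytes]) -> bytes:
    """Build the tree bottom-up and return the root hash (chunks non-empty)."""
    level = [sha256(c) for c in chunks]  # leaves: one hash per chunk
    while len(level) > 1:
        groups = group(level)
        if len(groups) == len(level):
            groups = [level]  # guard: force progress if every hash was a boundary
        # Each parent is the hash of its children's concatenated hashes.
        level = [sha256(b"".join(g)) for g in groups]
    return level[0]
```

Because each boundary decision depends only on the last `WINDOW` child hashes, editing one chunk changes the grouping only near the edit, so subtrees elsewhere keep their hashes and can be shared between data silos.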