G06F16/152

Efficient filename storage and retrieval
11704336 · 2023-07-18

The disclosed technology relates to a system configured to detect a modification to a node in a tree data structure. The node is associated with a content item managed by a content management service as well as a filename. The system may append the filename and a separator to a filename array, determine a location of the filename in the filename array, and store the location of the filename in the node.
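
The append-and-record-offset idea in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `FilenameArray`, `Node`, and `on_node_modified` are hypothetical, and the NUL byte as separator is an assumption.

```python
from typing import Optional


class FilenameArray:
    """Append-only byte buffer holding filenames delimited by a separator."""

    SEPARATOR = b"\x00"  # assumption: NUL never appears inside a filename

    def __init__(self) -> None:
        self._buf = bytearray()

    def append(self, filename: str) -> int:
        """Append the filename plus separator; return the filename's offset."""
        offset = len(self._buf)
        self._buf += filename.encode("utf-8") + self.SEPARATOR
        return offset

    def get(self, offset: int) -> str:
        """Read the filename starting at offset, up to the next separator."""
        end = self._buf.index(self.SEPARATOR, offset)
        return self._buf[offset:end].decode("utf-8")


class Node:
    """Tree node that stores only the location of its filename."""

    def __init__(self) -> None:
        self.name_offset: Optional[int] = None


def on_node_modified(node: Node, filename: str, names: FilenameArray) -> None:
    # On a detected modification, append the name and keep its location.
    node.name_offset = names.append(filename)


names = FilenameArray()
node = Node()
on_node_modified(node, "report.txt", names)
print(names.get(node.name_offset))
```

Storing an integer offset in each node instead of a string keeps nodes a fixed size, at the cost of one extra lookup when the name is needed.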

SYSTEMS AND METHODS FOR PERFORMANT DATA MATCHING
20230014556 · 2023-01-19

The present disclosure is directed to systems and methods for performant data matching. Entities maintain large amounts of data and desire to reconcile duplicative records. One way to solve this problem is through data matching. However, standard data matching at the record level can be laborious and inefficient. To remedy these inefficiencies in data matching, the present disclosure describes a system where the token records are tokenized a second time into token sets based on the token records satisfying at least one token set rule. A token set rule may be based on the common presence of multiple tokens in a token record. If multiple token records have the required tokens from the set rule, then those token records can be hashed and rolled up into the token set (i.e., tokenized a second time into the token set). The token set allows for more efficient data matching.
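
One way to picture the second tokenization step is the sketch below. It assumes whitespace tokenization, SHA-256 hashing, and a rule expressed as a set of required tokens; the function names and the sample rule are hypothetical and are not taken from the disclosure.

```python
import hashlib
from collections import defaultdict


def tokenize(record: str) -> frozenset:
    """First tokenization: turn a raw record into a token record."""
    return frozenset(record.lower().split())


def record_hash(tokens: frozenset) -> str:
    """Hash a token record in an order-independent way."""
    joined = "\x1f".join(sorted(tokens))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def roll_up(token_records, rules):
    """Second tokenization: group hashed token records under each rule
    whose required tokens are all present in the record."""
    token_sets = defaultdict(list)
    for tokens in token_records:
        for rule_name, required in rules.items():
            if required <= tokens:  # subset test: all required tokens present
                token_sets[rule_name].append(record_hash(tokens))
    return dict(token_sets)


records = ["Jane Doe 12 Main St", "jane doe 99 Oak Ave", "John Roe 12 Main St"]
rules = {"jane_doe": frozenset({"jane", "doe"})}  # hypothetical token set rule
token_sets = roll_up([tokenize(r) for r in records], rules)
print({name: len(members) for name, members in token_sets.items()})
```

Matching can then compare candidates within a token set rather than across every pair of records.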

Virtual file organizer
11698887 · 2023-07-11

A virtual file organization system, method and program product are disclosed. Included is a system that assigns classification tags to files stored within a storage system based on a natural language processing (NLP) context analysis of each file; and a virtual smart folder that is viewable within a user interface, wherein: opening the virtual smart folder causes a set of virtual subfolders to be displayed in which each virtual subfolder includes a category title; opening of a virtual subfolder causes a set of files residing at disparate locations in the storage system to be displayed; and the files displayed by opening the virtual subfolder each include an assigned classification tag that is associated with the category title of the virtual subfolder.

System and method for performing an antivirus scan using file level deduplication

Aspects of the disclosure describe methods and systems for performing an antivirus scan using file level deduplication. In an exemplary aspect, prior to performing an antivirus scan on files stored on at least two storage devices, a deduplication module calculates a respective hash for each respective file stored on the storage devices. The deduplication module identifies a first file stored on the storage devices and determines whether at least one other copy of the first file exists on the storage devices. In response to determining that another copy exists, the deduplication module stores the first file in a shared database, replaces all copies of the first file on the storage devices with a link to the first file in the shared database, and performs the antivirus scan on (1) the first file in the shared database and (2) the files stored on the storage devices.
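
The deduplicate-then-scan flow can be sketched with in-memory dictionaries standing in for the storage devices and the shared database. All names here (`deduplicate`, `scan_targets`, the `Devices` alias) are hypothetical, SHA-256 is an assumed hash, and no real antivirus engine is involved.

```python
import hashlib
from collections import defaultdict

Devices = dict[str, dict[str, bytes]]  # device name -> {path: file content}


def deduplicate(devices: Devices):
    """Hash every file; move content with more than one copy into a shared
    store and record a link for each copy that would be replaced."""
    by_hash = defaultdict(list)
    for dev, files in devices.items():
        for path, data in files.items():
            by_hash[hashlib.sha256(data).hexdigest()].append((dev, path))

    shared = {}  # digest -> single stored copy
    links = {}   # (device, path) -> digest of the shared copy
    for digest, locations in by_hash.items():
        if len(locations) > 1:
            dev, path = locations[0]
            shared[digest] = devices[dev][path]
            for loc in locations:
                links[loc] = digest
    return shared, links


def scan_targets(devices: Devices, shared, links):
    """Content to scan: each shared file once, plus every non-linked file."""
    targets = list(shared.values())
    for dev, files in devices.items():
        for path, data in files.items():
            if (dev, path) not in links:
                targets.append(data)
    return targets


devices = {
    "disk_a": {"/x": b"same bytes", "/y": b"unique"},
    "disk_b": {"/z": b"same bytes"},
}
shared, links = deduplicate(devices)
print(len(scan_targets(devices, shared, links)))  # three files, two scans
```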

STORAGE OF ORDER BOOKS WITH PERSISTENT DATA STRUCTURES
20230214355 · 2023-07-06

An electronic message is read, and a delta is generated based on a comparison of the electronic message to an existing order book. A new order book is generated based on the delta. An event is generated based on the existing order book, the delta, and the new order book. A sequence of events, including the event, is accumulated in a queryable persistent data structure over a time span. The queryable persistent data structure thus efficiently stores representations of order books.
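
A simplified sketch of the delta-and-event pipeline is shown below. It assumes each message is a full snapshot mapping price to quantity, and it replays a plain event list instead of using a structurally shared persistent structure; `Event`, `EventLog`, and the helper names are hypothetical.

```python
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: int
    delta: dict  # price -> new quantity, or None when the level is removed


def make_delta(book: dict, message: dict) -> dict:
    """Compare a snapshot message with the existing book."""
    delta = {p: q for p, q in message.items() if book.get(p) != q}
    delta.update({p: None for p in book if p not in message})
    return delta


def apply_delta(book: dict, delta: dict) -> dict:
    """Return a new book; the existing book is left untouched."""
    new_book = {p: q for p, q in book.items() if p not in delta}
    new_book.update({p: q for p, q in delta.items() if q is not None})
    return new_book


class EventLog:
    """Accumulates events over time and answers 'book as of time t'."""

    def __init__(self) -> None:
        self.events = []

    def record(self, timestamp: int, book: dict, message: dict) -> dict:
        delta = make_delta(book, message)
        self.events.append(Event(timestamp, delta))
        return apply_delta(book, delta)

    def book_at(self, timestamp: int) -> dict:
        # Assumes events were recorded in increasing timestamp order.
        book = {}
        for event in self.events:
            if event.timestamp > timestamp:
                break
            book = apply_delta(book, event.delta)
        return book


log = EventLog()
book1 = log.record(1, {}, {100: 5, 101: 2})
book2 = log.record(2, book1, {100: 5, 102: 7})
print(log.book_at(1), log.book_at(2))
```

Only the changed price levels are stored per event, which is what makes the accumulated history compact compared with storing every full book.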

FILE DE-DUPLICATION FOR A DISTRIBUTED DATABASE

A device is configured to identify a file in a network device, to generate a first set of block hash codes for data blocks for a first instance of the file, and to generate a second set of block hash codes for data blocks for a second instance of the file. The device is further configured to determine that the first set of block hash codes matches the second set of block hash codes and to generate an entry in a file list for the instances of the file. The device is further configured to count the number of entries that are associated with the file and to determine that the number of entries is greater than a redundancy threshold value. The device is further configured to delete one or more instances of the file in response to determining that the number of entries is greater than the redundancy threshold value.
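
The block-hash comparison and threshold check might look like the sketch below. The tiny block size, SHA-256, and the policy of deleting down to exactly the threshold are assumptions made for illustration; the abstract only says that one or more instances are deleted.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4  # deliberately tiny so the example has several blocks


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> tuple:
    """Hash each fixed-size block of one file instance."""
    return tuple(
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    )


def prune_redundant(instances: dict, redundancy_threshold: int) -> list:
    """Return the instance names to delete.

    Instances whose block-hash sequences match share one file-list entry
    group; groups larger than the threshold are trimmed back to it.
    """
    file_list = defaultdict(list)  # block-hash tuple -> instance names
    for name, data in instances.items():
        file_list[block_hashes(data)].append(name)

    to_delete = []
    for entries in file_list.values():
        if len(entries) > redundancy_threshold:
            to_delete.extend(entries[redundancy_threshold:])
    return to_delete


instances = {
    "node1/a": b"hello world",
    "node2/a": b"hello world",
    "node3/a": b"hello world",
    "node1/b": b"other",
}
print(prune_redundant(instances, redundancy_threshold=2))
```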

Systems and methods for performant data matching

The present disclosure is directed to systems and methods for performant data matching. Entities maintain large amounts of data and desire to reconcile duplicative records. One way to solve this problem is through data matching. However, standard data matching at the record level can be laborious and inefficient. To remedy these inefficiencies in data matching, the present disclosure describes a system where the token records are tokenized a second time into token sets based on the token records satisfying at least one token set rule. A token set rule may be based on the common presence of multiple tokens in a token record. If multiple token records have the required tokens from the set rule, then those token records can be hashed and rolled up into the token set (i.e., tokenized a second time into the token set). The token set allows for more efficient data matching.

INFORMATION PROCESSING DEVICE AND FILE ACCESS METHOD
20220414096 · 2022-12-29

An attribute information setting section loads, into a memory, information indicating whether or not access to each of a plurality of files is allowed. A readout request receiving section receives a readout request including a file path from a program. A hash value deriving section derives a hash value of the file path included in the readout request. A file confirming section confirms whether or not the derived hash value matches one of the hash values of the files included in software. In a case in which matching of the hash values is confirmed, a determining section refers to the information loaded into the memory by the attribute information setting section to determine whether or not a process on the file that has been subjected to the readout request is executable.
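
The hash-then-lookup check can be sketched as below. `FileAccessChecker` and `path_hash` are hypothetical names, SHA-256 is an assumed hash function, and rejecting a path whose hash matches no known file is an assumption, since the abstract does not say what happens in that case.

```python
import hashlib


def path_hash(path: str) -> str:
    """Derive a hash value from a file path."""
    return hashlib.sha256(path.encode("utf-8")).hexdigest()


class FileAccessChecker:
    def __init__(self, access_table: dict) -> None:
        # Attribute information held in memory: path hash -> access allowed?
        self._allowed = {path_hash(p): ok for p, ok in access_table.items()}

    def can_read(self, requested_path: str) -> bool:
        digest = path_hash(requested_path)
        if digest not in self._allowed:
            return False  # assumption: unknown files are rejected
        return self._allowed[digest]


checker = FileAccessChecker({
    "/app/data/level1.bin": True,
    "/app/secret.key": False,
})
print(checker.can_read("/app/data/level1.bin"))
print(checker.can_read("/app/secret.key"))
```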

Markers for hash code calculations on occupied portions of data blocks

A method for performing hash code calculations may include calculating, during a write operation for a data block, a hash code for an occupied portion of the data block, inserting, during the write operation, a marker into the data block, calculating, during a read operation for the data block, a hash code for the occupied portion of the data block, searching, during the read operation, for the marker in the data block, and terminating the hash code calculation in response to finding the marker. A system may include a first interface configured to receive data blocks, a second interface configured to transmit data blocks, and hash logic coupled between the first and second interfaces, wherein the hash logic is configured to calculate a hash code for the occupied portion of a data block received through the first interface, and insert a marker in an unoccupied portion of the data block.
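
A minimal sketch of the marker idea follows. It assumes a fixed 64-byte block, CRC-32 as the hash code, and a 4-byte marker that never occurs in the payload; all three choices, along with the function names, are illustrative rather than taken from the patent.

```python
import zlib

BLOCK_SIZE = 64
MARKER = b"\xde\xad\xbe\xef"  # assumption: never occurs inside a payload


def write_block(payload: bytes) -> tuple:
    """Hash the occupied portion, then place the marker right after it and
    zero-fill the rest of the unoccupied portion."""
    if MARKER in payload or len(payload) + len(MARKER) > BLOCK_SIZE:
        raise ValueError("payload collides with marker or does not fit")
    code = zlib.crc32(payload)
    block = payload + MARKER
    block += b"\x00" * (BLOCK_SIZE - len(block))
    return block, code


def read_block(block: bytes) -> int:
    """Locate the marker and hash only the bytes before it, so the padding
    never enters the calculation."""
    end = block.find(MARKER)
    if end == -1:
        end = len(block)  # no marker: treat the whole block as occupied
    return zlib.crc32(block[:end])


block, code_on_write = write_block(b"payload bytes")
print(read_block(block) == code_on_write)
```

Because the read path stops at the marker, the cost of the hash scales with the occupied bytes rather than with the full block size.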

FILE STORAGE METHOD, TERMINAL, AND STORAGE MEDIUM
20220407725 · 2022-12-22

Embodiments of the present disclosure disclose a file storage method, terminal, and storage medium. The file storage method includes: obtaining a to-be-stored file, performing splitting processing on the to-be-stored file to obtain N sub-files corresponding to the to-be-stored file, wherein N is an integer greater than or equal to 1; sending the N sub-files to an IPFS, and receiving M pieces of address information corresponding to the N sub-files returned by the IPFS, wherein M is an integer greater than or equal to 1 and less than or equal to N; generating an address set corresponding to the to-be-stored file according to the M pieces of address information, and encrypting the address set to obtain an address set ciphertext; sending the address set ciphertext to a blockchain network and receiving a target index value returned by the blockchain network, wherein the target index value is used to identify the address set ciphertext.