G06F16/174

Systems and methods for document search and aggregation with reduced bandwidth and storage demand

Methods and systems comprising a gateway coordinator of a local system that receives a task comprising search criteria, crawls for files on a local data source of the local system, and encounters one or more files of interest. The one or more files of interest may be deNISTed and deduplicated and sent to an upload coordinator of a remote cloud facility. In one or more examples, the gateway coordinator may be a virtual machine.
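The gateway-side filtering can be pictured as a small sketch. deNISTing removes files whose hashes appear in the NIST National Software Reference Library (NSRL), which lists known operating-system and application files. The sketch hashes each crawled file, drops files whose hash is in a known-file set, and keeps only the first file for each remaining hash. The helper names `file_digest` and `filter_files_of_interest`, the choice of SHA-256, and the in-memory hash set are illustrative assumptions, not the patented implementation.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def file_digest(path: Path) -> str:
    """Hypothetical helper: SHA-256 hex digest of a file, read in 64 KiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def filter_files_of_interest(paths: Iterable[Path], known_hashes: set[str]) -> list[Path]:
    """Hypothetical helper: deNIST, then deduplicate.

    known_hashes stands in for a reference set such as the NSRL hashes
    (assumed here to be SHA-256 digests).
    """
    seen: set[str] = set()
    kept: list[Path] = []
    for path in paths:
        digest = file_digest(path)
        if digest in known_hashes:  # deNIST: file matches a known-file hash
            continue
        if digest in seen:          # dedup: identical content already kept
            continue
        seen.add(digest)
        kept.append(path)
    return kept
```

Only the surviving paths would then be handed to the upload coordinator, so known files and repeated copies never cross the network.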

INLINE DEDUPLICATION BETWEEN NODES IN STORAGE SYSTEMS
20230237021 · 2023-07-27

Techniques described herein coordinate inline deduplication among nodes in a storage system. The method includes storing, in a page descriptor ring on a node, data and a fingerprint associated with the data in an entry. The method includes determining that a flushing work set (FWS) has been frozen. The node identifies, in the page descriptor ring, entries associated with the frozen FWS and having fingerprints with a parity associated with the node. The node deduplicates the entries based on a fingerprint database on the node. The node synchronizes deduplication of the frozen FWS with a peer node, so as to receive deduplication results concerning entries having fingerprints with a parity associated with the peer node. The node replaces entries in the page descriptor ring with the deduplication results from the peer node, and flushes entries in the frozen FWS to a storage device.
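The parity-based division of work between the two peer nodes can be sketched as follows. The sketch makes several assumptions. A fingerprint's parity is taken to be its lowest bit, so node 0 owns even fingerprints and node 1 owns odd ones. The page descriptor ring is assumed to be mirrored on both nodes, so ring indexes agree. The fingerprint database is a plain dictionary from fingerprint to stored location. The names `PageDescriptor`, `dedupe_owned_entries`, and `apply_peer_results` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageDescriptor:
    """Hypothetical ring entry: data plus its fingerprint."""
    fingerprint: int
    data: bytes
    fws_id: int                      # flushing work set this entry belongs to
    dedup_ref: Optional[int] = None  # location of an existing copy, if deduplicated


def owns(node_id: int, fingerprint: int) -> bool:
    """Assumed parity rule: node 0 handles even fingerprints, node 1 odd ones."""
    return (fingerprint & 1) == node_id


def dedupe_owned_entries(node_id: int, ring: list[PageDescriptor], frozen_fws: int,
                         fingerprint_db: dict[int, int]) -> dict[int, int]:
    """Deduplicate the frozen-FWS entries whose fingerprint parity this node owns.

    Returns {ring index: existing location}, i.e. the results this node would
    send to its peer when the two nodes synchronize.
    """
    results: dict[int, int] = {}
    for i, entry in enumerate(ring):
        if entry.fws_id != frozen_fws or not owns(node_id, entry.fingerprint):
            continue
        location = fingerprint_db.get(entry.fingerprint)
        if location is not None:
            results[i] = location
    return results


def apply_peer_results(ring: list[PageDescriptor], peer_results: dict[int, int]) -> None:
    """Replace local ring entries with the deduplication results from the peer."""
    for i, location in peer_results.items():
        ring[i].dedup_ref = location
        ring[i].data = b""  # the duplicate payload no longer needs to be flushed
```

Because each fingerprint is owned by exactly one node, neither node looks up the other's half of the frozen FWS, and each node's fingerprint database only needs to cover its own parity.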

CONTAINER-BASED ERASURE CODING

A repository of replicated chunk files is analyzed to identify chunk files that meet at least a portion of combination criteria. Selected chunk files are associated together under a data protection grouping container. Erasure coding is applied to the data protection grouping container including by utilizing the selected chunk files as different data stripes of the erasure coding and generating one or more parity stripes based on the different data stripes.
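A minimal sketch of the grouping step uses the simplest possible erasure code: a single XOR parity stripe over the selected chunk files, each padded to a common length. Production systems would typically use a Reed-Solomon code to produce more than one parity stripe. The size-based selection criterion, the container layout, and the names `build_container` and `recover_stripe` are assumptions made for illustration.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length stripes."""
    return bytes(x ^ y for x, y in zip(a, b))


def build_container(chunk_files: dict[str, bytes], max_size: int, width: int) -> dict:
    """Hypothetical helper: group up to `width` chunk files no larger than
    `max_size` (an assumed combination criterion) and add one XOR parity stripe."""
    selected = [n for n, d in sorted(chunk_files.items()) if len(d) <= max_size][:width]
    if not selected:
        raise ValueError("no chunk files meet the combination criteria")
    stripe_len = max(len(chunk_files[n]) for n in selected)
    # Each selected chunk file becomes one data stripe, zero-padded to stripe_len.
    stripes = [chunk_files[n].ljust(stripe_len, b"\0") for n in selected]
    return {
        "members": selected,
        "lengths": [len(chunk_files[n]) for n in selected],
        "stripes": stripes,
        "parity": reduce(xor_bytes, stripes),
    }


def recover_stripe(container: dict, lost_index: int) -> bytes:
    """Rebuild one lost data stripe by XOR-ing the parity with the survivors."""
    survivors = [s for i, s in enumerate(container["stripes"]) if i != lost_index]
    rebuilt = reduce(xor_bytes, survivors, container["parity"])
    return rebuilt[: container["lengths"][lost_index]]  # strip the padding
```

Once the parity stripe exists, the container can tolerate the loss of any one member, which is what allows the extra replicas of those chunk files to be released.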

COMPRESSION OF LOCALIZED FILES
20230021891 · 2023-01-26

A method for compressing a first application file and a second application file includes accessing the first and second application files, the first application file being in a first language and the second application file being in a second language and being a counterpart of the first application file; decompressing the first and second application files to access their internal files; and comparing one of the first internal files to one of the second internal files. Upon determining that the first internal file is identical to the second internal file, one of the internal files is copied to an output folder. Upon determining that the files are not identical, either both internal files are copied to the output folder, or a differencing procedure is executed on the first and second internal files to identify the differences between them and data about those differences is stored in the output folder. The output folder is then compressed into one output file.
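The per-entry comparison can be sketched with Python's standard `zipfile` and `difflib` modules, assuming both application files are ZIP-based packages whose internal files share relative paths. The sketch stores identical entries once. For differing text entries it keeps the first version and a unified diff, and for differing binary entries it keeps both copies. This is one possible reading of the abstract's "differencing procedure". The output layout (`shared/`, `first/`, `second/`, `diffs/`) and the name `compress_localized_pair` are hypothetical.

```python
import difflib
import io
import zipfile


def compress_localized_pair(first_zip: bytes, second_zip: bytes) -> bytes:
    """Hypothetical helper: merge two localized packages into one output archive."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(first_zip)) as a, \
         zipfile.ZipFile(io.BytesIO(second_zip)) as b, \
         zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as z:
        names_a, names_b = set(a.namelist()), set(b.namelist())
        for name in a.namelist():
            data_a = a.read(name)
            if name not in names_b:
                z.writestr("first/" + name, data_a)
                continue
            data_b = b.read(name)
            if data_a == data_b:                # identical: store one copy
                z.writestr("shared/" + name, data_a)
                continue
            z.writestr("first/" + name, data_a)
            try:
                text_a, text_b = data_a.decode("utf-8"), data_b.decode("utf-8")
            except UnicodeDecodeError:          # binary: keep both copies
                z.writestr("second/" + name, data_b)
                continue
            diff = difflib.unified_diff(text_a.splitlines(keepends=True),
                                        text_b.splitlines(keepends=True),
                                        fromfile=name, tofile=name)
            z.writestr("diffs/" + name + ".diff", "".join(diff))
        for name in sorted(names_b - names_a):  # entries only in the second file
            z.writestr("second/" + name, b.read(name))
    return out.getvalue()
```

Rebuilding the second application file from `first/` and `diffs/` would require applying the unified diffs with a patch tool. The stdlib `difflib` module produces diffs but does not apply unified diffs.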

SYSTEM AND METHOD FOR A CONTENT-AWARE AND CONTEXT-AWARE COMPRESSION ALGORITHM SELECTION MODEL FOR A FILE SYSTEM
20230021513 · 2023-01-26

A method for managing a file system includes obtaining, by a compression optimizing manager, a compression algorithm selection request for the file system, determining a set of selection inputs based on a set of file system parameters of the file system, applying a compression selection model to the set of selection inputs to obtain a compression algorithm selection, and initiating a file system compression implementation of the file system using the compression algorithm selection.
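The abstract does not say what the compression selection model is, so the following sketch uses a rule-based stand-in. It derives selection inputs from two assumed file-system parameters: a sample of stored data and a latency-sensitivity flag. It then picks among the standard-library codecs `zlib`, `bz2`, and `lzma` by trial-compressing the sample. The parameter names, the candidate list, the 0.95 incompressibility threshold, and the rules themselves are all hypothetical.

```python
import bz2
import lzma
import zlib
from dataclasses import dataclass

# Candidate algorithms (assumed): name -> compression function.
CODECS = {
    "zlib": lambda data: zlib.compress(data, 6),
    "bz2": lambda data: bz2.compress(data, 9),
    "lzma": lambda data: lzma.compress(data),
}


@dataclass
class FileSystemParams:
    """Hypothetical file system parameters."""
    sample: bytes            # representative data read from the file system (content)
    latency_sensitive: bool  # e.g. an interactive primary workload (context)


def selection_inputs(params: FileSystemParams) -> dict[str, float]:
    """Turn file system parameters into numeric selection inputs."""
    size = max(len(params.sample), 1)
    inputs = {f"ratio_{n}": len(fn(params.sample)) / size for n, fn in CODECS.items()}
    inputs["latency_sensitive"] = float(params.latency_sensitive)
    return inputs


def select_algorithm(inputs: dict[str, float]) -> str:
    """Rule-based stand-in for a trained compression selection model."""
    if inputs["ratio_zlib"] > 0.95:
        return "none"  # content is effectively incompressible
    if inputs["latency_sensitive"]:
        return "zlib"  # generally the fastest of the three to decompress
    return min(CODECS, key=lambda n: inputs[f"ratio_{n}"])  # best ratio wins
```

A learned model would replace `select_algorithm` while keeping the same inputs-to-selection interface.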

Information source agent systems and methods for distributed data storage and management using content signatures
11561931 · 2023-01-24

Information source agent systems and methods for distributed content storage and management using content signatures that use file identicality properties are provided. A data management system is provided that includes a content engine for managing the storage of file content, a content signature generator that generates a unique content signature for a file processed by the content engine, a content signature comparator that compares content signatures and a content signature repository that stores content signatures. Information source agents are provided that include content signature generators and content signature comparators. Methods are provided for the efficient management of files using content signatures that take advantage of file identicality properties. Content signature application modules and registries exist within information source clients and centralized servers to support the content signature methods.
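A content signature that exploits file identicality can be as simple as a cryptographic digest of the file's bytes. Identical files always produce the same signature, and distinct files collide only with negligible probability. Comparing two files therefore reduces to comparing their signatures. The sketch pairs a signature generator with a repository that stores each distinct content once. SHA-256 is an assumed choice because the abstract names no algorithm, and `content_signature` and `SignatureRepository` are hypothetical names.

```python
import hashlib


def content_signature(data: bytes) -> str:
    """Hypothetical signature generator: SHA-256 digest of the file content."""
    return hashlib.sha256(data).hexdigest()


class SignatureRepository:
    """Hypothetical repository: each distinct content is stored once, keyed by
    its signature, together with every file name that refers to it."""

    def __init__(self) -> None:
        self.contents: dict[str, bytes] = {}
        self.names: dict[str, list[str]] = {}

    def has(self, sig: str) -> bool:
        """Comparator: lets an agent skip sending content already held."""
        return sig in self.contents

    def store(self, name: str, data: bytes) -> bool:
        """Store a file; return True if its content was new to the repository."""
        sig = content_signature(data)
        is_new = not self.has(sig)
        if is_new:
            self.contents[sig] = data
        self.names.setdefault(sig, []).append(name)
        return is_new
```

An information source agent would compute the signature locally and call `has` before transmitting, so identical content is sent and stored only once.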

Management of encryption agents in data storage systems

A method for managing keys and encrypting data is provided. The method includes receiving data to be written to a logical disk, generating an encryption table indicating one or more locations on the logical disk for storing the data and indicating a key used for encrypting the data, encrypting the data to be written to the logical disk, and transmitting the encrypted data and the encryption table to a storage array.
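The encryption table can be pictured as a list of (starting block, block count, key identifier) entries sent to the storage array alongside the ciphertext. Python's standard library has no block cipher, so this sketch takes the cipher as an injected callable. A real deployment would supply an actual cipher such as AES-XTS from a cryptography library. The table layout, the 512-byte block size, and the names `EncryptionTableEntry` and `prepare_write` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable

BLOCK_SIZE = 512  # assumed logical block size in bytes


@dataclass(frozen=True)
class EncryptionTableEntry:
    """Hypothetical table entry describing where encrypted data lives."""
    start_lba: int    # first logical block address written
    block_count: int  # number of blocks covered by this entry
    key_id: str       # identifier of the key used, never the key itself


def prepare_write(data: bytes, start_lba: int, key_id: str, key: bytes,
                  encrypt: Callable[[bytes, bytes], bytes]
                  ) -> tuple[bytes, list[EncryptionTableEntry]]:
    """Hypothetical helper: encrypt data bound for a logical disk and build
    its encryption table. `encrypt(key, plaintext)` is an assumed interface
    that must be backed by a real cipher implementation."""
    block_count = -(-len(data) // BLOCK_SIZE)  # ceiling division
    table = [EncryptionTableEntry(start_lba, block_count, key_id)]
    return encrypt(key, data), table
```

Recording only a key identifier in the table lets the storage array track which key protects each region without ever holding the key material itself.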

FILE COMPRESSION USING SEQUENCE SPLITS AND SEQUENCE ALIGNMENT
20230229632 · 2023-07-20

Compressing files is disclosed. An input file to be compressed is first aligned. Aligning the file includes splitting the file into sequences that can be aligned. When splitting the file into sequences or when performing subsequent recursive splitting, the splitting is based on a longest sequence match. The result is a compression matrix, where each row of the matrix corresponds to part of the file. A consensus sequence is determined from the compression matrix. Using the consensus sequence, pointer pairs are generated. Each pointer pair identifies a subsequence of the consensus sequence. The compressed file includes the pointer pairs and the consensus sequence.
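Recursive splitting on a longest sequence match can be illustrated with `difflib.SequenceMatcher.find_longest_match` from the Python standard library. Given two sequences (for example, two parts of the same file), the sketch finds their longest common block, cuts both sequences around it, and recurses on the left and right remainders. The result is a list of aligned segment pairs. Using `difflib` to find the match, the `min_len` cutoff, and the name `split_on_longest_match` are assumptions, since the abstract does not specify how the match is computed.

```python
from difflib import SequenceMatcher


def split_on_longest_match(a: str, b: str, min_len: int = 3) -> list[tuple[str, str]]:
    """Hypothetical helper: recursively split a and b around their longest
    common substring.

    Returns aligned (segment_of_a, segment_of_b) pairs in order. Matched
    segments appear as identical pairs; unmatched remainders as differing pairs.
    """
    if not a or not b:
        return [(a, b)] if (a or b) else []
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    if m.size < min_len:  # no useful match left: keep the remainder as one pair
        return [(a, b)]
    left = split_on_longest_match(a[: m.a], b[: m.b], min_len)
    middle = [(a[m.a : m.a + m.size], b[m.b : m.b + m.size])]
    right = split_on_longest_match(a[m.a + m.size :], b[m.b + m.size :], min_len)
    return left + middle + right
```

For example, `split_on_longest_match("the quick brown fox", "a quick brown dog")` aligns the shared `" quick brown "` block and leaves `("the", "a")` and `("fox", "dog")` as unmatched pairs on either side. Stacking such aligned segments as rows is what would produce a compression matrix.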

FILE COMPRESSION USING SEQUENCE ALIGNMENT
20230229631 · 2023-07-20

Compressing files is disclosed. An input file to be compressed is first aligned. Aligning the file includes splitting the file into sequences that can be aligned. The result is a compression matrix, where each row of the matrix corresponds to part of the file. A consensus sequence is determined from the compression matrix. Using the consensus sequence, pointer pairs are generated. Each pointer pair identifies a subsequence of the consensus sequence. The compressed file includes the pointer pairs and the consensus sequence.
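Once a consensus sequence exists, the file can be encoded as pointer pairs into it. The sketch treats a pointer pair as (start offset, length) within the consensus sequence and covers the input greedily with the longest substring it can find at each step. That representation, the greedy strategy, and the names `encode_pointer_pairs` and `decode_pointer_pairs` are assumptions. The encoder also assumes every symbol of the input occurs somewhere in the consensus sequence and raises an error otherwise.

```python
def encode_pointer_pairs(data: str, consensus: str) -> list[tuple[int, int]]:
    """Hypothetical encoder: greedily cover `data` with the longest substrings
    found in `consensus`, emitting (start, length) pointer pairs."""
    pairs: list[tuple[int, int]] = []
    i = 0
    while i < len(data):
        best_start, best_len = -1, 0
        length = 1
        # Grow the candidate while data[i:i+length] still occurs in the consensus.
        while i + length <= len(data):
            pos = consensus.find(data[i : i + length])
            if pos < 0:
                break
            best_start, best_len = pos, length
            length += 1
        if best_len == 0:
            raise ValueError(f"symbol {data[i]!r} is not in the consensus sequence")
        pairs.append((best_start, best_len))
        i += best_len
    return pairs


def decode_pointer_pairs(pairs: list[tuple[int, int]], consensus: str) -> str:
    """Rebuild the original data by concatenating the referenced subsequences."""
    return "".join(consensus[s : s + n] for s, n in pairs)
```

For example, with the consensus `"ACGTACGGT"`, the input `"ACGGTACG"` encodes to `[(4, 5), (0, 3)]`, and decoding those pairs returns the input. The compressed file then needs only the consensus sequence and the pair list.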

ADDING CONTENT TO COMPRESSED FILES USING SEQUENCE ALIGNMENT
20230229633 · 2023-07-20

Compressing files is disclosed. An input, which is associated with an original file and new content, is to be compressed. The input includes a consensus sequence of the original file and the new content. The new content is aligned using the consensus sequence of the original file in order to generate a new consensus sequence that reflects both the original content and the new content. The compression engine generates a new compression matrix and a new consensus sequence. Using the new consensus sequence, pointer pairs are generated. Each pointer pair identifies a subsequence of the new consensus sequence. The new compressed file includes the pointer pairs and the new consensus sequence.
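A simplified way to add content without recompressing from scratch is to extend the old consensus sequence with whatever parts of the new content it cannot already cover, then point into the extended sequence. This append-only update is a deliberate simplification, not the full re-alignment the abstract describes. Its useful property is that existing pointer pairs stay valid because the consensus only grows at the end. The function name `add_new_content`, the greedy matcher, and the `min_match` threshold are assumptions.

```python
def add_new_content(consensus: str, pairs: list[tuple[int, int]], new_content: str,
                    min_match: int = 4) -> tuple[str, list[tuple[int, int]]]:
    """Hypothetical helper: extend a consensus sequence and its (start, length)
    pointer pairs so that they also encode `new_content`.

    Runs of at least `min_match` symbols found in the consensus reuse existing
    positions; unmatched runs are appended to the consensus and referenced there.
    """
    new_pairs = list(pairs)
    unmatched = ""  # run of new symbols not yet placed in the consensus

    def flush() -> None:
        nonlocal consensus, unmatched
        if unmatched:
            new_pairs.append((len(consensus), len(unmatched)))
            consensus += unmatched
            unmatched = ""

    i = 0
    while i < len(new_content):
        best_start, best_len = -1, 0
        length = min_match
        while i + length <= len(new_content):
            pos = consensus.find(new_content[i : i + length])
            if pos < 0:
                break
            best_start, best_len = pos, length
            length += 1
        if best_len:
            flush()  # place any pending unmatched run before this match
            new_pairs.append((best_start, best_len))
            i += best_len
        else:
            unmatched += new_content[i]
            i += 1
    flush()
    return consensus, new_pairs
```

Decoding the returned pairs against the returned consensus yields the original content followed by the new content. A full re-alignment, as in the abstract, could produce a shorter consensus at the cost of regenerating every pointer pair.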