G06F16/152

Sketch-based image retrieval techniques using generative domain migration hashing

This disclosure relates to improved sketch-based image retrieval (SBIR) techniques. The SBIR techniques utilize a neural network architecture to train a domain migration function and a hashing function. The domain migration function is configured to transform sketches into synthetic images, and the hashing function is configured to generate hash codes from synthetic images and authentic images in a manner that preserves semantic consistency across the sketch and image domains. The hash codes generated from the synthetic images can be used for accurately identifying and retrieving authentic images corresponding to sketch queries, or vice versa.

File Storage Method and Apparatus, and Device and Readable Storage Medium
20230008406 · 2023-01-12

Disclosed are a file storage method, apparatus, and device, and a readable storage medium. The method includes: performing striping processing on a target file to obtain multiple target objects, and calculating fingerprint information of each target object; using a first target object and logical information of the target file to form a logical header object, and storing the logical header object in a storage system; using the fingerprint information of each second target object to determine whether the second target object has been stored in the storage system; and if the second target object has not been stored in the storage system, determining the second target object as a third target object and storing it in the storage system. According to the method, logical information of each file can be preserved, and files of some users can be prevented from being modified or deleted after deduplication is performed in the storage system.
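
The storage flow above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name a fingerprint function or a stripe size, so SHA-256, the 4-byte `STRIPE_SIZE`, the dictionary-backed `store`, the choice of the first stripe as the "first target object", and the helper names `store_file` and `read_file` are all hypothetical.

```python
import hashlib

STRIPE_SIZE = 4  # hypothetical stripe size in bytes, tiny for illustration


def store_file(name: str, data: bytes, store: dict) -> str:
    """Stripe `data`, store a logical header object, and deduplicate the rest."""
    stripes = [data[i:i + STRIPE_SIZE] for i in range(0, len(data), STRIPE_SIZE)] or [b""]
    fingerprints = [hashlib.sha256(s).hexdigest() for s in stripes]

    # Assumption: the first stripe plus the file's logical information form
    # the header object. It is always stored, so every file keeps its own
    # logical information even when its content duplicates another file.
    header_key = f"header:{name}"
    store[header_key] = {
        "name": name,
        "size": len(data),
        "first": stripes[0],
        "rest": fingerprints[1:],
    }

    # The remaining stripes are stored only if their fingerprint is unknown.
    for fp, stripe in zip(fingerprints[1:], stripes[1:]):
        if fp not in store:
            store[fp] = stripe
    return header_key


def read_file(header_key: str, store: dict) -> bytes:
    """Reassemble a file from its header object and the shared stripes."""
    header = store[header_key]
    return header["first"] + b"".join(store[fp] for fp in header["rest"])
```

In this sketch, two files that share trailing stripes each get their own header object, while the shared stripes are stored once.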

Source file copying and error handling

Object service receives request to copy file to destination and identifies group identifier for fingerprints group corresponding to sequential segments in file. Object service communicates request for fingerprints group to deduplication service associated with group identifier range including group identifier. Deduplication service communicates fingerprints group, retrieved from fingerprint storage, to object service, which communicates fingerprints group and group identifier to destination. Object service communicates request for file segments, corresponding to fingerprints missing in destination, communicated from destination, to deduplication service, which communicates requested segments, retrieved from source storage, to object service, which communicates requested segments to destination. System identifies generation identifier associated with time of communicating by object service or deduplication service, and generation identifier associated with another time of communicating by object service or deduplication service. If generation identifier associated with time differs from generation identifier associated with other time, object service or deduplication service restarts communication.
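
A minimal sketch of the fingerprint exchange described above, assuming SHA-256 fingerprints and a destination modeled as a dictionary keyed by fingerprint. The names `fingerprint` and `copy_file` are hypothetical, and the group identifier ranges and the generation identifier check that triggers a restart are omitted.

```python
import hashlib
from typing import Dict, List


def fingerprint(segment: bytes) -> str:
    """Assumed fingerprint function; the abstract does not specify one."""
    return hashlib.sha256(segment).hexdigest()


def copy_file(segments: List[bytes], destination: Dict[str, bytes]) -> int:
    """Copy `segments` to `destination`, sending only the segments it lacks.

    Returns the number of segments actually transferred.
    """
    # Fingerprints group for the file's sequential segments.
    group = [fingerprint(s) for s in segments]
    # The destination replies with the fingerprints it is missing.
    missing = {fp for fp in group if fp not in destination}

    sent = 0
    for fp, segment in zip(group, segments):
        # Send each missing segment once, even if it repeats within the file.
        if fp in missing and fp not in destination:
            destination[fp] = segment
            sent += 1
    return sent
```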

Identifying similar documents in a file repository using unique document signatures
11593439 · 2023-02-28

Methods, systems, and non-transitory computer readable storage media are disclosed for determining clusters of similar digital documents using unique document signatures. Specifically, the disclosed system processes digital text in a digital document to tokenize character strings (e.g., words) in the digital document by combining a subset of character values and string lengths in the character strings. Additionally, the disclosed system generates a document signature for the digital document by combining subsets of tokens generated for the digital document into a token sequence indicative of the digital text in the digital document. The disclosed system determines a cluster of similar digital documents including the digital document by comparing the document signature of the digital document to document signatures corresponding to a plurality of digital documents.
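
One way to picture the tokenization and signature steps is the sketch below. The specific choices are assumptions, not the disclosed method: each token combines a word's first and last characters with its length, the signature keeps tokens only for words of at least `min_len` characters, and signatures are compared by Jaccard overlap. The function names are hypothetical.

```python
import re


def tokenize(word: str) -> str:
    """Combine a subset of character values (first, last) with the word length."""
    return f"{word[0]}{word[-1]}{len(word)}"


def signature(text: str, min_len: int = 4) -> tuple:
    """Build a token sequence from the longer words, in document order."""
    words = re.findall(r"[a-z]+", text.lower())
    return tuple(tokenize(w) for w in words if len(w) >= min_len)


def similarity(sig_a: tuple, sig_b: tuple) -> float:
    """Jaccard overlap of two signatures' token sets (an assumed comparison)."""
    a, b = set(sig_a), set(sig_b)
    return len(a & b) / len(a | b) if a | b else 1.0
```

Documents whose pairwise similarity exceeds a chosen threshold could then be grouped into the same cluster.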

Update of deduplication fingerprint index in a cache memory

In some examples, a system performs data deduplication using a deduplication fingerprint index in a hash data structure comprising a plurality of blocks, wherein a block of the plurality of blocks comprises fingerprints computed based on content of respective data values. The system merges, in a merge operation, updates for the deduplication fingerprint index to the hash data structure stored in a persistent storage. As part of the merge operation, the system mirrors the updates to a cached copy of the hash data structure in a cache memory, and updates, in an indirect block, information regarding locations of blocks in the cached copy of the hash data structure.

Optimized client-side deduplication

One example method includes optimizing client-side deduplication. When backing up a client, an overwrite ratio is determined based on a size of actual changes made to a volume and a size indicated by changes in a change log. Client-side deduplication is enabled or disabled based on a value of the overwrite ratio.
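
The decision can be sketched as below. The abstract does not give the exact formula, the threshold, or which side of the threshold enables deduplication, so the ratio definition (actual changed bytes divided by bytes indicated by the change log), the `0.5` threshold, and the direction of the comparison are all assumptions made for illustration.

```python
def overwrite_ratio(actual_changed_bytes: int, logged_changed_bytes: int) -> float:
    """Assumed definition: actual change size over change-log change size."""
    if logged_changed_bytes <= 0:
        raise ValueError("change log must indicate a positive size")
    return actual_changed_bytes / logged_changed_bytes


def use_client_side_dedup(ratio: float, threshold: float = 0.5) -> bool:
    """Hypothetical policy: enable deduplication at or above the threshold."""
    return ratio >= threshold
```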

EXTENDING FILESYSTEM DOMAINS WITH A DOMAIN MEMBERSHIP CONDITION
20230237016 · 2023-07-27

The described technology is generally directed to an extension to the IFS domains architecture, referred to herein as filter domains. IFS domains allow tagging of files in a tree-like dataset. Thus, a domain can be defined at the root of the dataset, such as the topmost directory under which all files reside. These domains are inherently hierarchical, path-based entities. Filter domains extend this organization to allow domains to be applied beyond hierarchical tree structures in order to also provide arbitrary grouping of file objects based on any suitable membership condition.
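
The contrast between path-based domains and filter domains can be illustrated by modeling both as membership predicates. This is a conceptual sketch only; `FileObject`, `path_domain`, `filter_domain`, and `members` are hypothetical names and do not reflect any actual IFS interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class FileObject:
    path: str
    owner: str
    size: int


Domain = Callable[[FileObject], bool]


def path_domain(root: str) -> Domain:
    """Hierarchical domain: membership follows from the directory tree."""
    prefix = root.rstrip("/") + "/"
    return lambda f: f.path.startswith(prefix)


def filter_domain(condition: Domain) -> Domain:
    """Filter domain: membership follows from an arbitrary condition."""
    return condition


def members(domain: Domain, files: Iterable[FileObject]) -> List[str]:
    """List the paths of the files that belong to `domain`."""
    return [f.path for f in files if domain(f)]
```

For example, `filter_domain(lambda f: f.owner == "bob")` groups files from different directories, which a purely path-based domain cannot express.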

INLINE DEDUPLICATION BETWEEN NODES IN STORAGE SYSTEMS
20230237021 · 2023-07-27

Techniques described herein coordinate inline deduplication among nodes in a storage system. The method includes storing, in a page descriptor ring on a node, data and a fingerprint associated with the data in an entry. The method includes determining that a flushing work set (FWS) has been frozen. The node identifies, in the page descriptor ring, entries associated with the frozen FWS and having fingerprints with a parity associated with the node. The node deduplicates the entries based on a fingerprint database on the node. The node synchronizes deduplication of the frozen FWS with a peer node, so as to receive deduplication results concerning entries having fingerprints with a parity associated with the peer node. The node replaces entries in the page descriptor ring with the deduplication results from the peer node, and flushes entries in the frozen FWS to a storage device.
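
The parity-based division of work between two nodes can be sketched as follows, assuming integer fingerprints, a set as each node's fingerprint database, and a frozen work set represented as a list of `(fingerprint, data)` entries. The function name `dedupe_for_node` and the result format are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def dedupe_for_node(
    entries: List[Tuple[int, bytes]],
    node_parity: int,
    fingerprint_db: Set[int],
) -> Dict[int, bool]:
    """Deduplicate only the entries whose fingerprint parity matches this node.

    Returns {entry index: True if the entry is a duplicate, else False}.
    """
    results = {}
    for index, (fp, _data) in enumerate(entries):
        if fp % 2 != node_parity:
            continue  # the peer node is responsible for this fingerprint
        results[index] = fp in fingerprint_db
        fingerprint_db.add(fp)
    return results
```

Each node would run this over the same frozen work set with its own parity, merge its results with those received from the peer, and flush only the entries not marked as duplicates.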

LOGICAL IMAGING APPARATUS AND METHOD FOR DIGITAL FORENSIC TRIAGE

Disclosed herein are a logical imaging apparatus and method for digital forensic triage. The logical imaging method for digital forensic triage includes receiving files selected as a digital evidence target, creating a logical imaging file whose interior is formatted in a predetermined file system structure, recording the selected files in accordance with the file system structure of the created logical imaging file, storing selected file list information about a list of the recorded selected files, and creating a separate selected list information file and a separate logical imaging summary file outside the logical imaging file.

TIME-SERIES DATA DEDUPLICATION (DEDUP) CACHING
20230229329 · 2023-07-20

Aspects of the present disclosure relate to data deduplication (dedup) techniques for storage arrays. In embodiments, a sequence of input/output (IO) operations in an IO stream received from one or more host devices by a storage array is identified. Additionally, a determination is made as to whether previously received IO operations match the identified IO sequence based on an IO rolling offsets empirical distribution model. Further, one or more data deduplication (dedup) techniques are performed on the matching IO sequence based on a comparison of a source compression technique and a target compression technique related to the identified IO sequence.