Patent classifications
G06F16/1748
System and method for using multimedia content as search queries
There is provided a method for searching a plurality of information sources using a multimedia element. The method may include: receiving at least one multimedia element; generating, by a signature generator, at least one signature for the at least one multimedia element, where the signature is unidirectional and yields compression; generating at least one textual search query using the at least one signature, where generating the textual search query comprises (a) searching for at least one stored signature that matches one or more of the at least one signature, and (b) using a mapping between stored signatures and textual search queries, selecting at least one textual search query mapped to at least one matching stored signature; searching the plurality of information sources using the at least one textual search query; and causing a display of the search results retrieved from the plurality of information sources.
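The query-generation flow can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented signature generator. A 64-bit SimHash over byte shingles stands in for the unidirectional, compressing signature. "Matching" is assumed to mean a Hamming distance of at most `max_distance` bits. The substring search over the in-memory `sources` dictionary is a hypothetical placeholder for real information sources. All function names are hypothetical.

```python
import hashlib

def signature(element: bytes, k: int = 4) -> int:
    """64-bit SimHash of k-byte shingles: one-way, and far smaller than the input."""
    weights = [0] * 64
    for i in range(len(element) - k + 1):
        h = int.from_bytes(hashlib.blake2b(element[i:i + k], digest_size=8).digest(), "big")
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if weights[bit] > 0)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def textual_queries(element: bytes, mapping: dict[int, list[str]], max_distance: int = 3) -> list[str]:
    """(a) Find stored signatures close to the element's signature; (b) collect their mapped queries."""
    sig = signature(element)
    queries: list[str] = []
    for stored_sig, mapped in mapping.items():
        if hamming(sig, stored_sig) <= max_distance:
            queries.extend(mapped)
    return queries

def search(sources: dict[str, list[str]], queries: list[str]) -> dict[str, list[str]]:
    """Hypothetical search: substring match of each query against each source's documents."""
    return {name: [doc for doc in docs if any(q in doc for q in queries)]
            for name, docs in sources.items()}

clip = b"\x89PNG sunset-photo-pixels 0123456789abcdef"
mapping = {signature(clip): ["sunset"]}
sources = {"web": ["sunset over lake", "city at night"], "news": ["sunset law passed"]}
print(search(sources, textual_queries(clip, mapping)))
# prints {'web': ['sunset over lake'], 'news': ['sunset law passed']}
```

A locality-sensitive signature such as SimHash is used here so that near-identical media can still match a stored signature; an exact cryptographic hash would only match byte-identical inputs.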
Determining content-dependent deltas between data sectors
In one implementation, a method includes identifying a first content-dependent feature associated with a data sector. The method further includes determining a baseline data sector associated with the data sector. The method further includes determining, by a processing device, a content-dependent delta between the first content-dependent feature and a second content-dependent feature of the baseline data sector. The method further includes providing the content-dependent delta and an indicator to the baseline data sector for storage on a plurality of storage devices.
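The steps above can be sketched as follows, under assumptions the abstract does not state. The "content-dependent feature" is modeled as the set of hashes of every 16-byte window of the sector. The baseline is assumed to be the stored sector with the highest Jaccard similarity. The delta is the pair of feature sets added and removed relative to that baseline, and the baseline's id serves as the indicator. All names are hypothetical.

```python
import hashlib

def content_features(sector: bytes, window: int = 16) -> set[bytes]:
    """Hypothetical content-dependent feature: 8-byte hashes of every window-sized substring."""
    starts = range(max(1, len(sector) - window + 1))
    return {hashlib.blake2b(sector[i:i + window], digest_size=8).digest() for i in starts}

def choose_baseline(features: set[bytes], stored: dict[str, set[bytes]]) -> str:
    """Assumed policy: pick the stored sector with the highest Jaccard similarity."""
    def jaccard(other: set[bytes]) -> float:
        union = features | other
        return len(features & other) / len(union) if union else 0.0
    return max(stored, key=lambda sector_id: jaccard(stored[sector_id]))

def content_delta(sector: bytes, stored: dict[str, set[bytes]]) -> tuple[dict[str, set[bytes]], str]:
    """Return the feature delta against the baseline plus an indicator (the baseline's id)."""
    features = content_features(sector)
    baseline_id = choose_baseline(features, stored)
    baseline = stored[baseline_id]
    delta = {"added": features - baseline, "removed": baseline - features}
    return delta, baseline_id

def apply_delta(baseline: set[bytes], delta: dict[str, set[bytes]]) -> set[bytes]:
    """Recover the sector's features from the baseline's features and the delta."""
    return (baseline - delta["removed"]) | delta["added"]

s1 = bytes(range(256)) * 2
stored = {"s1": content_features(s1), "s2": content_features(b"\x00" * 512)}
new_sector = bytearray(s1)
new_sector[100:104] = b"XXXX"          # a small edit to an existing sector
delta, baseline_id = content_delta(bytes(new_sector), stored)
```

Because only the windows overlapping the 4-byte edit change, the delta is much smaller than the sector's full feature set, which is what makes storing `(delta, baseline_id)` attractive.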
DATA DEDUPLICATION IN A DISAGGREGATED STORAGE SYSTEM
A data deduplication process is performed in a storage system that includes storage nodes and storage control nodes that can access data directly from each storage node. A first storage control node sends a message to a second storage control node to initiate a deduplication process with respect to a given data block and an original data block owned by the second storage control node. The second storage control node increments a reference counter associated with the original data block and sends the first storage control node a message that includes metadata. The first storage control node uses the metadata to read the original data block from a given storage node, performs a data compare process to determine whether the given data block matches the original data block, and, if the given data block matches the original data block, creates a reference to it.
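The message exchange can be sketched in Python, with inter-node messages modeled as direct method calls. The class and method names, the metadata layout, and the decrement on a failed compare are assumptions; the abstract only describes the increment and the match case.

```python
from dataclasses import dataclass, field

@dataclass
class StorageNode:
    """Shared storage that every storage control node can read directly."""
    blocks: dict[str, bytes] = field(default_factory=dict)

@dataclass
class ControlNode:
    name: str
    refcounts: dict[str, int] = field(default_factory=dict)                     # originals this node owns
    locations: dict[str, tuple[StorageNode, str]] = field(default_factory=dict)  # original id -> (node, key)
    references: dict[str, str] = field(default_factory=dict)                    # logical block -> original id

    def handle_dedup_request(self, original_id: str) -> dict:
        """Owner side: increment the reference counter, then reply with metadata."""
        self.refcounts[original_id] += 1
        node, key = self.locations[original_id]
        return {"storage_node": node, "key": key}

    def handle_dedup_abort(self, original_id: str) -> None:
        """Assumed cleanup (not in the abstract): undo the increment when the compare fails."""
        self.refcounts[original_id] -= 1

    def deduplicate(self, logical_id: str, data: bytes, owner: "ControlNode", original_id: str) -> bool:
        """Requester side: read the original directly, compare, and reference it on a match."""
        meta = owner.handle_dedup_request(original_id)
        original = meta["storage_node"].blocks[meta["key"]]
        if original == data:
            self.references[logical_id] = original_id
            return True
        owner.handle_dedup_abort(original_id)
        return False
```

Incrementing the counter before the requester reads the block keeps the original from being reclaimed while the compare is in flight, so the owner never has to ship the data itself.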
Data deduplication method and apparatus
A data deduplication method includes receiving an overwrite request sent by an external device, where the overwrite request carries a data block and a first address at which the data block is to be stored, and determining whether the overwrite quantity of the first address (the number of times it is overwritten) exceeds a first threshold within a time period [t1, t2], where t1 and t2 are time points and t2 is later than t1. If the overwrite quantity exceeds the first threshold within [t1, t2], the deduplication operation on the data block is skipped; otherwise, a deduplication operation is performed on the data block.
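A sliding-window counter per address is one way to implement the threshold test. This sketch assumes the window ends at the current request (t2 = now, t1 = now - window) and that the current overwrite counts toward the quantity; the class and parameter names are hypothetical.

```python
from collections import defaultdict, deque

class OverwriteAwareDeduplicator:
    def __init__(self, threshold: int, window: float):
        self.threshold = threshold        # the "first threshold"
        self.window = window              # length of [t1, t2]
        self.writes = defaultdict(deque)  # address -> timestamps of recent overwrites

    def should_deduplicate(self, address: int, now: float) -> bool:
        times = self.writes[address]
        times.append(now)
        t1 = now - self.window            # assumption: t2 is the current request time
        while times and times[0] < t1:    # drop overwrites that fell out of [t1, t2]
            times.popleft()
        # A frequently overwritten ("hot") address is likely to change again soon,
        # so deduplicating it would waste work: skip when over the threshold.
        return len(times) <= self.threshold

dedup = OverwriteAwareDeduplicator(threshold=2, window=10)
decisions = [dedup.should_deduplicate(0x10, t) for t in (0, 1, 2, 20)]
```

Here the third write at t = 2 pushes address `0x10` over the threshold, so that block is stored without deduplication; by t = 20 the earlier writes have aged out of the window and deduplication resumes.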
PERSISTENT MEMORY TIERING SUPPORTING FAST FAILOVER IN A DEDUPLICATED FILE SYSTEM
A memory tier including persistent memory (PMEM) devices is established in nodes of a cluster system having a deduplicated file system. At least a portion of metadata generated by the deduplicated file system is persisted to the memory tier. The portion of metadata includes an index of fingerprints corresponding to data segments stored by the deduplicated file system to a storage pool. A determination is made that an instance of the deduplicated file system has failed. A new instance of the deduplicated file system is started to recover file system services by loading the index of fingerprints from the memory tier.
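The failover path can be illustrated with a toy model in which a local file stands in for the PMEM tier. This is a sketch under that assumption only: real PMEM access (for example, memory-mapped DAX files) is not modeled, and the class name and segment locations are hypothetical.

```python
import hashlib
import json
import os

class DedupFileSystem:
    """Toy deduplicated file system whose fingerprint index lives on a simulated PMEM tier."""

    def __init__(self, pmem_path: str):
        self.pmem_path = pmem_path
        self.index: dict[str, str] = {}   # fingerprint -> hypothetical segment location

    def write_segment(self, data: bytes) -> str:
        fp = hashlib.sha256(data).hexdigest()
        if fp not in self.index:           # only unique segments are added to the index
            self.index[fp] = f"container-{len(self.index)}"
            self._persist_index()
        return fp

    def _persist_index(self) -> None:
        tmp = self.pmem_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.index, f)
        os.replace(tmp, self.pmem_path)    # swap in one step so a reader never sees a half-written index

    @classmethod
    def recover(cls, pmem_path: str) -> "DedupFileSystem":
        """Failover: a new instance loads the persisted index instead of rebuilding it from the pool."""
        instance = cls(pmem_path)
        with open(pmem_path) as f:
            instance.index = json.load(f)
        return instance
```

The point of the design is the `recover` path: loading a persisted index is proportional to the index size, whereas rebuilding it means re-fingerprinting or rescanning the storage pool.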
SYSTEMS AND METHODS FOR PHYSICAL CAPACITY ESTIMATION OF LOGICAL SPACE UNITS
Systems and methods of determining physical capacity of logical space units are disclosed. The method populates a first smart filter to track a physical capacity of a first logical space unit (LSU). The method adds fingerprints from the first LSU to register(s) of the first smart filter. The method populates a second smart filter to track fingerprints deleted by garbage collection (GC). The method adds the deleted fingerprints to register(s) of the second smart filter. Using the first and second smart filters, the method determines an intersection cardinality of the first LSU and the deleted fingerprints. The method determines a cardinality of unique fingerprints in the first LSU based on the intersection cardinality of the first LSU and the deleted fingerprints. The method determines the physical capacity of the first LSU based at least on the cardinality of unique fingerprints in the first LSU.
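The register-based "smart filter" described above resembles a HyperLogLog sketch. The following assumes that correspondence. It estimates the intersection cardinality by inclusion-exclusion, |A| + |B| - |A ∪ B|, where the union is the register-wise maximum of the two sketches. It then converts the count of live unique fingerprints into bytes using an assumed average segment size. The class and function names are hypothetical.

```python
import hashlib
import math

class HyperLogLog:
    """Minimal HyperLogLog sketch (assumed stand-in for a 'smart filter')."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, fingerprint: str) -> None:
        x = int.from_bytes(hashlib.sha256(fingerprint.encode()).digest()[:8], "big")
        idx = x >> (64 - self.p)                      # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)  # small-range (linear counting) correction
        return estimate

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        """Union sketch: register-wise maximum."""
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

def intersection_cardinality(a: HyperLogLog, b: HyperLogLog) -> float:
    # inclusion-exclusion: |A| + |B| - |union(A, B)|
    return max(0.0, a.count() + b.count() - a.merge(b).count())

def physical_capacity(lsu: HyperLogLog, deleted: HyperLogLog, avg_segment_bytes: int) -> tuple[float, float]:
    """Unique live fingerprints in the LSU, and an assumed capacity of unique * average segment size."""
    unique = max(0.0, lsu.count() - intersection_cardinality(lsu, deleted))
    return unique, unique * avg_segment_bytes

lsu, deleted = HyperLogLog(), HyperLogLog()
for i in range(10_000):
    lsu.add(f"fp-{i}")
for i in range(3_000):          # GC deleted the first 3,000 fingerprints
    deleted.add(f"fp-{i}")
unique, capacity = physical_capacity(lsu, deleted, avg_segment_bytes=8192)
```

With p = 12 each sketch is only 4,096 small registers regardless of how many fingerprints it has seen, so the estimate for 7,000 live fingerprints is approximate (typically within a few percent), not exact.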
DUPLICATE FILE MANAGEMENT FOR CONTENT MANAGEMENT SYSTEMS AND FOR MIGRATION TO SUCH SYSTEMS
In large installations of document management systems, files are often duplicated. Users may place their own copies of files in convenient locations, or files may be unintentionally duplicated for other reasons. Duplication of files causes many problems for systems reliant on document management, chiefly because the additional (identical) files consume extra storage space and must be handled like all other files, which increases network and resource utilization (with a concomitant increase in processing, search, and retrieval times). A tool is disclosed that standardizes the identification of duplicate files (based on their binary contents), as well as the identification of a primary duplicate (the original file), across multiple repositories in a manner that minimizes identification time.
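Binary-content duplicate detection is commonly done in two passes, and the sketch below follows that pattern: group by file size first (cheap), then hash only the files that share a size. It treats the file with the earliest modification time as the primary duplicate. That choice is an assumption, since the abstract does not say how the "original file" is determined. The function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict

def file_digest(path: str, chunk: int = 1 << 20) -> str:
    """SHA-256 of the file's binary contents, read in chunks to bound memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def find_duplicates(paths: list[str]) -> list[tuple[str, list[str]]]:
    """Return (primary, other_copies) groups across all given repositories."""
    by_size: dict[int, list[str]] = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:                 # a unique size cannot have duplicates: skip hashing
            continue
        by_hash: dict[str, list[str]] = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p)].append(p)
        for dupes in by_hash.values():
            if len(dupes) > 1:
                # assumption: the earliest-modified copy is the primary (original) file
                primary = min(dupes, key=lambda p: (os.path.getmtime(p), p))
                groups.append((primary, sorted(d for d in dupes if d != primary)))
    return groups
```

The size pass keeps hashing cost proportional to the files that could actually be duplicates, which is one straightforward way to reduce identification time.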
Distributed storage device and data management method in distributed storage device
The number of inter-node communications in inter-node deduplication can be reduced, and both performance stability and high capacity efficiency can be achieved. The storage drives of the storage nodes store three kinds of files: files that are not deduplicated across the plurality of storage nodes; duplicate data storage files, in which deduplicated duplicate data is stored; and cache data storage files, in which cache data of duplicate data stored in another storage node is stored. When a read access request for the cache data is received, the processors of the storage nodes read the cache data if it is stored in the cache data storage file, and request another storage node to read the duplicate data related to the cache data if the cache data has been discarded.
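The read path for cache data can be sketched as follows. Peer requests are modeled as direct dictionary lookups. Refilling the cache after a remote read is an assumption not stated in the abstract, and all names are hypothetical.

```python
class StorageNode:
    def __init__(self, name: str):
        self.name = name
        self.duplicate_data: dict[str, bytes] = {}  # deduplicated data this node owns
        self.cache_data: dict[str, bytes] = {}      # cached copies of duplicates owned by peers
        self.peers: dict[str, "StorageNode"] = {}
        self.remote_reads = 0                        # inter-node requests issued

    def read_cache(self, key: str, owner: str) -> bytes:
        if key in self.cache_data:                   # cache hit: no inter-node communication
            return self.cache_data[key]
        self.remote_reads += 1                       # cache discarded: ask the owning node
        data = self.peers[owner].duplicate_data[key]
        self.cache_data[key] = data                  # assumption: refill the cache after a miss
        return data

    def discard_cache(self, key: str) -> None:
        self.cache_data.pop(key, None)

a, b = StorageNode("A"), StorageNode("B")
a.duplicate_data["d1"] = b"shared"
b.peers["A"] = a
b.cache_data["d1"] = b"shared"
```

Reads of duplicate data owned elsewhere cost an inter-node round trip only when the local cached copy has been discarded, which is how the scheme trades some capacity for fewer inter-node communications.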
Utilizing data source identifiers to obtain deduplication efficiency within a clustered storage environment
Described is a system (and method) that intelligently distributes data within a clustered storage environment. To provide such a capability, the system may distribute backup files by considering a source of the data to be backed up. In particular, the system may leverage the ability of front-end components such as a backup application to perform a granular data source identification of data. Such information may be propagated to back-end components such as a storage filesystem in the form of a data source identifier (e.g., a placement tag). The data source identifiers may then be accessed by the clustered storage system to intelligently distribute backup files amongst a set of storage nodes forming a cluster. For example, backup files from the same data source may be stored on the same storage node to obtain the same deduplication efficiency as a single storage system.
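Tag-based placement can be sketched with rendezvous (highest-random-weight) hashing. The abstract does not name a placement algorithm, so this technique is an assumption chosen because it keeps each tag on the same node while the node set changes, unless that node itself leaves. Function names are hypothetical.

```python
import hashlib
from collections import defaultdict

def place(placement_tag: str, nodes: list[str]) -> str:
    """Pick a node for a data source: the node with the highest hash weight for this tag."""
    return max(nodes, key=lambda node: hashlib.sha256(f"{placement_tag}|{node}".encode()).digest())

def distribute(backups: list[tuple[str, str]], nodes: list[str]) -> dict[str, list[str]]:
    """backups: (file name, placement tag) pairs; files sharing a tag land on one node."""
    layout: dict[str, list[str]] = defaultdict(list)
    for file_name, tag in backups:
        layout[place(tag, nodes)].append(file_name)
    return dict(layout)

nodes = ["node-1", "node-2", "node-3", "node-4"]
backups = [("mon.bak", "db-01"), ("tue.bak", "db-01"), ("wed.bak", "db-01"), ("mon.vmdk", "vm-07")]
layout = distribute(backups, nodes)
```

Keeping every generation of a data source's backups on one node means consecutive backups, which share most of their segments, deduplicate against each other locally.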
REDUCING BANDWIDTH DURING SYNTHETIC RESTORES FROM A DEDUPLICATION FILE SYSTEM
A request is received to restore a file at a deduplicated storage system to a client. The file resides at the storage system as a synthetic file based on a base file at the storage system. The request includes an indication that the base file is also present at the client. Metadata generated during a backup of the file to the storage system is reviewed. The metadata includes references to data determined to be in the base file at the storage system, and references to other data determined to not be in the base file at the storage system. The other data determined to not be in the base file is read from the storage system and transmitted to the client. Upon receipt, the client assembles the requested file using the base file present at the client and the other data determined to not be in the base file.
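The restore exchange can be sketched with a hypothetical backup recipe: an ordered list of extents, each tagged as coming from the base file or from other data held only at the storage system. The recipe format and function names are assumptions, not the patented metadata layout.

```python
Extent = tuple[str, int, int]  # (source, offset, length); source is "base" or "other"

def server_payload(recipe: list[Extent], other_data: bytes) -> list[bytes]:
    """Storage-system side: read and send only the extents that are not in the base file."""
    return [other_data[off:off + n] for src, off, n in recipe if src == "other"]

def client_assemble(recipe: list[Extent], base_file: bytes, payload: list[bytes]) -> bytes:
    """Client side: rebuild the file from its local base copy plus the received extents."""
    received = iter(payload)
    out = bytearray()
    for src, off, n in recipe:
        out += base_file[off:off + n] if src == "base" else next(received)
    return bytes(out)

base = b"hello world"
other = b"XYZ"
recipe = [("base", 0, 6), ("other", 0, 3), ("base", 6, 5)]
payload = server_payload(recipe, other)
restored = client_assemble(recipe, base, payload)   # b"hello XYZworld"
```

Only 3 of the 14 restored bytes cross the network here; the saving grows with the fraction of the synthetic file that the client's base copy already covers.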