Patent classifications
G06F16/1748
METHODS AND SYSTEMS FOR REDUCING THE STORAGE VOLUME OF LOG MESSAGES
Automated methods and systems for compressing log messages stored in a log message database are described herein. The automated methods and systems perform lossy compression of an original set of log messages by identifying log messages that represent each of the various types of events recorded in the original set. The log messages in the original set are overwritten by corresponding representative log messages. Source coding is used to construct a source coding scheme and variable length binary codewords for each of the representative log messages. The representative log messages are replaced by the codewords, which occupy significantly less storage space than the original set. The lossy compressed set of log messages can be decompressed to obtain the representative log messages using the source coding scheme.
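The abstract does not name a particular source coding scheme. As a minimal sketch, assuming Huffman coding is used to assign the variable length binary codewords, the encode and decode steps could look like the following. The function names are hypothetical and the input is assumed to already consist of representative log messages.

```python
import heapq
from collections import Counter

def build_codewords(messages):
    """Map each distinct representative message to a prefix-free binary codeword."""
    counts = Counter(messages)
    if not counts:
        return {}
    if len(counts) == 1:
        # A lone symbol still needs a one-bit codeword.
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, unique tie breaker, {message: partial codeword}).
    heap = [(freq, i, {msg: ""}) for i, (msg, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {m: "0" + c for m, c in left.items()}
        merged.update({m: "1" + c for m, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(messages, codewords):
    """Replace each representative message by its codeword."""
    return "".join(codewords[m] for m in messages)

def decode(bits, codewords):
    """Recover the representative messages from the bit string."""
    inverse = {c: m for m, c in codewords.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out
```

Frequent event types receive shorter codewords, which is where the storage saving comes from. Decoding returns the representative messages, not the original ones, so the scheme as a whole remains lossy.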
System and method for performing an antivirus scan using file level deduplication
Aspects of the disclosure describe methods and systems for performing an antivirus scan using file level deduplication. In an exemplary aspect, prior to performing an antivirus scan on files stored on at least two storage devices, a deduplication module calculates a respective hash for each respective file stored on the storage devices. The deduplication module identifies a first file stored on the storage devices and determines whether at least one other copy of the first file exists on the storage devices. In response to determining that another copy exists, the deduplication module stores the first file in a shared database, replaces all copies of the first file on the storage devices with a link to the first file in the shared database, and performs the antivirus scan on (1) the first file in the shared database and (2) the files stored on the storage devices.
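The hash-and-compare step can be illustrated with a short sketch. It assumes SHA-256 as the hash and treats each storage device as a directory tree. Both choices are assumptions, as are the function names. Moving files into a shared database and replacing copies with links are left out.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def file_hash(path, chunk_size=65536):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(roots):
    """Return {hash: [paths]} for every file that has at least one other copy."""
    by_hash = defaultdict(list)
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                by_hash[file_hash(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

Each group returned by `find_duplicates` would need to be scanned only once after its copies are replaced by links to a single shared file.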
PREVENTING DUPLICATION OF FILES IN A STORAGE DEVICE
Duplication of files in a storage device of a computing device can be avoided using some techniques described herein. In one example, a system can determine a checksum of a file in a software package. The system can then determine that the file is absent from a storage device by issuing a command for accessing the file based on the checksum. In response to determining that the file is absent from the storage device, the system can download a copy of the file from a remote computing device to the storage device over a network.
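One way to picture this flow is a store in which each file is addressed by its checksum, so that an access attempt by checksum reveals whether the file is already present. The sketch below assumes SHA-256 checksums and a hypothetical `fetch` callable standing in for the network download. Neither is specified in the abstract.

```python
import hashlib
from pathlib import Path

def ensure_file(store_dir, checksum, fetch):
    """Return (path, downloaded). Download only if no file with this checksum exists."""
    target = Path(store_dir) / checksum
    if target.exists():
        # The file is already on the storage device, so no download is needed.
        return target, False
    data = fetch(checksum)  # hypothetical download from the remote computing device
    if hashlib.sha256(data).hexdigest() != checksum:
        raise ValueError("downloaded content does not match checksum")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target, True
```

Calling `ensure_file` twice with the same checksum triggers at most one download, which is the duplication-avoiding behavior the abstract describes.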
METHOD AND SYSTEM FOR FACILITATING DISTRIBUTED ENTITY RESOLUTION
A method for providing data blocking to facilitate distributed entity resolution is disclosed. The method includes receiving data sets from a source, the data sets including records that correspond to an entity; grouping each of the records into a block based on a shared characteristic, the block including a blocking key; converting the block into a data file, the data file corresponding to a predetermined file format; partitioning the data file based on the corresponding blocking key; determining, via a worker node, a potential record pair by using the partitioned data file; and persisting the potential record pair.
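The blocking step can be sketched in a few lines. The record fields (`id`, `surname`, `zip`) and the blocking key used here are illustrative assumptions. File conversion, partitioning across worker nodes, and persistence are omitted.

```python
from collections import defaultdict
from itertools import combinations

def blocking_key(record):
    """Assumed shared characteristic: surname prefix plus postal code."""
    return (record["surname"][:3].lower(), record["zip"])

def build_blocks(records):
    """Group records that share a blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)
    return blocks

def candidate_pairs(blocks):
    """Yield potential record pairs, comparing only records within the same block."""
    for key, members in blocks.items():
        for a, b in combinations(members, 2):
            yield key, a["id"], b["id"]
```

Because pairs are formed only inside a block, each block can be handed to a separate worker, which is what makes partitioning by blocking key useful in a distributed setting.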
FILE DE-DUPLICATION FOR A DISTRIBUTED DATABASE
A device is configured to identify a file in a network device, to generate a first set of block hash codes for data blocks for a first instance of the file, and to generate a second set of block hash codes for data blocks for a second instance of the file. The device is further configured to determine that the first set of block hash codes matches the second set of block hash codes and to generate an entry in a file list for the instances of the file. The device is further configured to count the number of entries that are associated with the file and to determine that the number of entries is greater than a redundancy threshold value. The device is further configured to delete one or more instances of the file in response to determining that the number of entries is greater than the redundancy threshold value.
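A minimal sketch of the block-hash comparison and threshold check follows. It assumes fixed-size blocks, SHA-256 block hash codes, and a policy of keeping exactly `redundancy_threshold` instances. The abstract only says that one or more instances are deleted, so that policy and all names are assumptions.

```python
import hashlib
from collections import defaultdict

def block_hashes(data, block_size=4096):
    """Hash each fixed-size block of an instance's contents."""
    return tuple(
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    )

def instances_to_delete(instances, redundancy_threshold):
    """instances maps a location to file contents; return the surplus locations."""
    file_list = defaultdict(list)
    for location, data in instances.items():
        # Instances whose block hash sets match end up in the same file list entry.
        file_list[block_hashes(data)].append(location)
    surplus = []
    for entries in file_list.values():
        if len(entries) > redundancy_threshold:
            surplus.extend(entries[redundancy_threshold:])
    return surplus
```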
Unique ID generation for sensors
Systems, methods, and computer-readable media are provided for generating a unique ID for a sensor in a network. Once the sensor is installed on a component of the network, the sensor can send attributes of the sensor to a control server of the network. The attributes of the sensor can include at least one unique identifier of the sensor or the host component of the sensor. The control server can determine a hash value using a one-way hash function and a secret key, send the hash value to the sensor, and designate the hash value as a sensor ID of the sensor. In response to receiving the sensor ID, the sensor can incorporate the sensor ID in subsequent communication messages. Other components of the network can verify the validity of the sensor using a hash of the at least one unique identifier of the sensor and the secret key.
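A keyed one-way hash of this kind is commonly realized with HMAC. The sketch below assumes HMAC-SHA256 and hypothetical function names. The abstract specifies only "a one-way hash function and a secret key".

```python
import hashlib
import hmac

def make_sensor_id(secret_key: bytes, unique_identifier: str) -> str:
    """Control server side: derive the sensor ID from the identifier and the secret key."""
    return hmac.new(secret_key, unique_identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def verify_sensor_id(secret_key: bytes, unique_identifier: str, sensor_id: str) -> bool:
    """Verifier side: recompute the hash and compare in constant time."""
    expected = make_sensor_id(secret_key, unique_identifier)
    return hmac.compare_digest(expected, sensor_id)
```

Any component holding the secret key can check a claimed sensor ID against the sensor's unique identifier without contacting the control server.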
Efficient mechanism to perform auto retention locking of files ingested via distributed segment processing in deduplication backup servers
A command requesting creation of a backup file and issued by a client-side deduplication library is received. Upon creating the file, a first flag is set on the file indicating that the file should be automatically retention locked after a cooling off period has elapsed. During the cooling off period, a command requesting that the file be opened for writes is received. The first flag is cleared to exclude the file from being automatically retention locked after the cooling off period has elapsed. A second flag is set on the file indicating that writes to the file are in progress. A command requesting that the file be closed, the writes to the backup file thereby being complete, is received. The second flag is cleared. The first flag is reset to allow the file to be automatically retention locked after the cooling off period has elapsed.
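The flag handling described above amounts to a small state machine. The following sketch uses hypothetical attribute and method names, and does not model the cooling off timer itself.

```python
class BackupFile:
    """Tracks the two flags described in the abstract for a single backup file."""

    def __init__(self):
        self.auto_lock_pending = True     # first flag, set when the file is created
        self.writes_in_progress = False   # second flag

    def open_for_write(self):
        # Exclude the file from automatic locking while writes are under way.
        self.auto_lock_pending = False
        self.writes_in_progress = True

    def close(self):
        # Writes are complete: clear the second flag and reset the first.
        self.writes_in_progress = False
        self.auto_lock_pending = True

    def eligible_for_lock(self, cooling_off_elapsed):
        return cooling_off_elapsed and self.auto_lock_pending and not self.writes_in_progress
```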
Container management system with a remote sharing manager
Methods, systems, and computer storage media are described for providing a set of common flat files in a composite image that can be mounted as a container (i.e., a composite container) to support isolation and interoperation of computing resources. Container management is provided for a container management system based on a composite image file system engine that executes composite operations. In particular, a remote sharing manager operates with a composite engine interface to support generating composite images configured for split layer memory sharing, split layer direct access memory sharing, and dynamic base images. In operation, a plurality of files and a selection of a remote sharing configuration for generating a composite image are accessed. The composite image for the plurality of files and the remote sharing configuration is generated. The composite image is communicated to cause sharing of the composite image, where the sharing is based on the remote sharing configuration.
OPTIMIZING RESOURCES IN A DISASTER RECOVERY CLEANUP PROCESS
In an approach for optimizing resources in a disaster recovery cleanup process, processors are configured for receiving transaction entries represented by transaction identifiers at a source database in communication with target databases via Synchronous-to-Asynchronous Traffic Converters (SATCs). Further, the processors are configured for transmitting a transaction payload from the SATCs to the target databases; identifying completed tracking entries corresponding to tracking entries having a complete status for the SATCs; deleting remaining transaction entries ranging from a transaction entry associated with a highest processed transaction identifier to a transaction entry associated with a lowest processed transaction identifier; providing a list of the remaining transaction entries that were deleted to predecessors of the SATCs; removing the remaining transaction entries from the SATCs if the transaction entries were delivered to all target databases; and detecting a topology change corresponding to one or more additional SATCs integrated with the one or more SATCs.
Elastic, ephemeral in-line deduplication service
A deduplication service can be provided to a storage domain from a services framework that expands and contracts to both meet service demand and to conform to resource management of a compute domain. The deduplication service maintains a fingerprint database and reference count data in compute domain resources, but persists these into the storage domain for use in the case of a failure or interruption of the deduplication service in the compute domain. The deduplication service responds to service requests from the storage domain with indications of paths in a user namespace and whether or not a piece of data had a fingerprint match in the fingerprint database. The indication of a match guides the storage domain to either store the piece of data into the storage backend or to reference another piece of data. The deduplication service uses the fingerprints to define paths for corresponding pieces of data.
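The request handling can be sketched as an in-memory service that keeps a fingerprint database and reference counts. SHA-256 fingerprints, the path layout, and the class and method names are all assumptions. Persisting state into the storage domain and elastic scaling are not shown.

```python
import hashlib

class DedupService:
    """Answers each request with a path and whether the fingerprint already existed."""

    def __init__(self):
        self.fingerprints = {}   # fingerprint -> path in the user namespace
        self.ref_counts = {}     # fingerprint -> number of references

    def handle(self, data: bytes):
        fp = hashlib.sha256(data).hexdigest()
        # Assumed layout: the fingerprint itself defines the path.
        path = f"/{fp[:2]}/{fp[2:4]}/{fp}"
        matched = fp in self.fingerprints
        if not matched:
            self.fingerprints[fp] = path
        self.ref_counts[fp] = self.ref_counts.get(fp, 0) + 1
        return path, matched
```

A `matched` value of `False` would tell the storage domain to store the data at `path`, while `True` would tell it to reference the data already stored there.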