Patent classifications
G06F16/152
COMPARING DATASETS USING HASH VALUES OVER A SUBSET OF FIELDS
Methods and apparatus are disclosed for comparing relevant content of datasets. A hash value is computed over a selected subset of fields to obtain a signature of a dataset, with other fields being disregarded. A hash value can be computed directly for all records of the dataset, or by combining individual hash values for each record. Comparison of the signature with that of other datasets leads to efficient determination whether two datasets match with respect to relevant content in the selected fields. For larger groups of datasets, lists of matched and mismatched datasets can be reported. Optional features include matches insensitive to permutation of the records, or identification of which records in a dataset fail to match.
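As an illustration of the signature idea described above, the following is a minimal sketch, not the patented method itself. It assumes records are dictionaries, uses SHA-256 as the hash, and combines per-record hashes into a dataset signature; the function names (`record_hash`, `dataset_signature`, `mismatched_records`) and the length-prefix encoding are hypothetical choices made for this example.

```python
import hashlib

def record_hash(record, fields):
    """Hash only the selected fields of one record; all other fields are ignored."""
    h = hashlib.sha256()
    for name in fields:
        value = str(record[name]).encode("utf-8")
        # Length-prefix each value so ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(value).to_bytes(8, "big"))
        h.update(value)
    return h.digest()

def dataset_signature(records, fields, order_insensitive=False):
    """Combine the per-record hashes into a single signature for the dataset."""
    hashes = [record_hash(r, fields) for r in records]
    if order_insensitive:
        hashes.sort()  # sorting makes the signature independent of record order
    combined = hashlib.sha256()
    for digest in hashes:
        combined.update(digest)
    return combined.hexdigest()

def mismatched_records(records_a, records_b, fields):
    """Return the positions at which two equally long datasets differ."""
    return [i for i, (a, b) in enumerate(zip(records_a, records_b))
            if record_hash(a, fields) != record_hash(b, fields)]

# Two datasets that agree on "id" and "amount" but differ in order and in "updated".
a = [{"id": 1, "amount": 10, "updated": "2021-01-01"},
     {"id": 2, "amount": 20, "updated": "2021-01-02"}]
b = [{"id": 2, "amount": 20, "updated": "2022-05-05"},
     {"id": 1, "amount": 10, "updated": "2022-05-06"}]
relevant = ["id", "amount"]
same_content = (dataset_signature(a, relevant, order_insensitive=True)
                == dataset_signature(b, relevant, order_insensitive=True))
```

Sorting the per-record hashes is only one way to obtain permutation insensitivity; a commutative combination (for example, summing the hashes modulo a large number) would serve the same purpose.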
METHOD AND APPARATUS FOR REPLICATING A TARGET FILE BETWEEN DEVICES
There is provided a method and apparatus for remote differential compression (RDC) and data deduplication. According to embodiments, when a sending device acquires a new target file, the following steps are performed. Initially, Jaccard segmentation is performed, followed by performing identity-based segment deduplication and similarity-based segment deduplication. The transmission of the target file in the deduplicated form to the recipient device is subsequently performed. The recipient device can then rebuild the original target file from the deduplicated form thus replicating the target file at the recipient device with the target file originally present at the sending device.
Ensuring consistent metadata across computing devices
Techniques are disclosed for ensuring consistent metadata across computing devices. In one example, a user device of a plurality of user devices receives a manifest that includes first metadata associated with a file system update of a file system of the user device. The user device generates second metadata of the file system based on performing the file system update. The user device then generates a dictionary based on comparing metadata records of the first metadata with metadata records of the second metadata. The dictionary may indicate a difference between at least one metadata record of the first metadata and at least one metadata record of the second metadata. The user device then updates the second metadata of the file system to match the first metadata based at least in part on the difference indicated by the dictionary.
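The comparison step described above can be sketched as follows. This is a hedged illustration only: it assumes each set of metadata is a mapping from file path to a record dictionary, and the names `diff_metadata` and `reconcile` are hypothetical, not taken from the source.

```python
def diff_metadata(manifest, local):
    """Return {path: (expected, actual)} for every record that differs.

    A path that is missing on one side is reported with None on that side.
    """
    diff = {}
    for path in manifest.keys() | local.keys():
        expected = manifest.get(path)
        actual = local.get(path)
        if expected != actual:
            diff[path] = (expected, actual)
    return diff

def reconcile(manifest, local):
    """Update the local metadata in place so that it matches the manifest."""
    for path, (expected, _actual) in diff_metadata(manifest, local).items():
        if expected is None:
            del local[path]               # record exists locally but not in the manifest
        else:
            local[path] = dict(expected)  # copy the manifest's record
    return local

# First metadata (from the manifest) and second metadata (generated on the device).
manifest = {"/a": {"mode": 0o644}, "/b": {"mode": 0o755}}
local = {"/a": {"mode": 0o600}, "/c": {"mode": 0o644}}
differences = diff_metadata(manifest, local)
```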
MEDIA MONITORING USING MULTIPLE TYPES OF SIGNATURES
Example local devices disclosed herein include memory including a set of reference fingerprints corresponding to media, the set of reference fingerprints from a remote device different from the local device and one or more processor circuits to execute machine readable instructions to generate a monitored fingerprint of the media presented at a location and compare the monitored fingerprint to at least some of the set of reference fingerprints from the remote device. Additionally, the one or more processor circuits are to determine an amount of time that has passed since the media started and after a match between the monitored fingerprint and one or more reference fingerprints of the set of reference fingerprints, cause transmission of audience measurement information to identify the media, the audience measurement information including data indicative of the amount of time that has passed since the media started.
SYNCHRONIZATION SYSTEM AND METHOD
A synchronization system is configured to identify a first workload file on a central storage and identify a first workload file on a compute system remote from the central storage. The synchronization system determines if the first workload file on the central storage is different than the first workload file on the compute system. If the two first workload files are different, the synchronization system automatically copies the first workload file on the central storage to the compute system so that a compute system workload will be performed using the copy of the first workload file from the central storage and the compute system workload will not be performed using the first workload file on the compute system.
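A minimal sketch of the compare-then-copy behavior follows. The abstract does not say how the two files are compared; comparing SHA-256 digests is an assumption made here, and `file_digest` and `sync_workload_file` are hypothetical names.

```python
import hashlib
import shutil
from pathlib import Path

def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def sync_workload_file(central, local):
    """Overwrite the local file with the central copy if the two differ.

    Returns True if a copy was made, False if the files already matched.
    """
    central, local = Path(central), Path(local)
    if local.exists() and file_digest(central) == file_digest(local):
        return False
    local.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(central, local)
    return True
```

Calling `sync_workload_file` before launching a workload means the workload always runs against the central version of the file, at the cost of reading both files once to compute their digests.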
Systems and methods for sharding based on distributed inverted indexes
According to one embodiment, distributing data across a plurality of storage shards can comprise generating a file key for each file of a plurality of files stored in a plurality of physical shards, each physical shard maintained by a node of a plurality of nodes in one or more clusters. The file key can comprise a hash of an enterprise identifier for an entity of which the creator of the file is a member, a hash of a folder identifier for a location in which the file is stored, and a hash of a file identifier uniquely identifying the file. The generated file keys can be sorted into an ordered list and the ordered list can be logically partitioned into a plurality of logical shards. Each logical shard of the plurality of logical shards can then be mapped to one of the plurality of physical shards.
System and method for content-hashed object storage
Features are detected from a sensor signal via a deep-learning network or other feature engineering methods in an edge processing node. Machine-learned metadata is created that describes the features, and a hash is created with the machine-learned metadata. The sensor signal is stored as a content object at the edge processing node, the object being keyed with the hash at the edge processing node.
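A minimal sketch of keying a stored object by a hash of its metadata follows. It assumes the machine-learned metadata is a JSON-serializable dictionary and that the key is a SHA-256 digest of its canonical JSON form; the `ContentStore` class is a hypothetical in-memory stand-in for storage at the edge node, and feature detection itself is out of scope here.

```python
import hashlib
import json

class ContentStore:
    """In-memory stand-in for an object store keyed by a metadata hash."""

    def __init__(self):
        self._objects = {}

    def put(self, signal_bytes, metadata):
        # Canonical JSON (sorted keys, fixed separators) gives a stable hash
        # regardless of the order in which metadata fields were produced.
        canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
        key = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        self._objects[key] = (signal_bytes, metadata)
        return key

    def get(self, key):
        return self._objects[key]

store = ContentStore()
key = store.put(b"\x00\x01\x02", {"label": "vehicle", "score": 0.9})
```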
INDEXING DOCUMENTS IN A NESTED HIERARCHY OF DIRECTORIES
An online storage system receives a plurality of documents to be stored in a directory. The storage system stores document data from each document in a document database. The storage system generates an entry for each document in an entry table and indexes the documents stored in the directory. The storage system samples a subset of the plurality of documents assigned to the directory in a directory index to determine a sampled subset of the plurality of documents. The storage system indexes the sampled subset in a directory index. The storage system can receive a request, from a client device, to view the indexed documents in the directory. Responsive to the request, the storage system presents the indexed documents in the directory retrieved from the directory index.
Persistent indexing and free space management for flat directory
Methods, non-transitory computer readable media, computing devices and systems for persistent indexing and space management for flat directory include creating, using at least one of said at least one processor, an index file to store mapping information, computing, using at least one of said at least one processor, a hash based on a lookup filename, searching, using at least one of said at least one processor, the index file to find all matching directory cookies based on the computed hash, selecting, using at least one of said at least one processor, the directory entity associated with the lookup filename from among the matched directory cookies, and returning, using at least one of said at least one processor, the determined directory entity.
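The lookup sequence above (hash the filename, find all cookies with that hash, then select the entry whose name matches) can be sketched as follows. This is an assumption-laden illustration: the index is held in memory rather than in a persistent index file, the hash is a CRC-32 truncated to 16 bits so that collisions are plausible, and the `DirectoryIndex` class and its methods are hypothetical names.

```python
import zlib
from collections import defaultdict

class DirectoryIndex:
    """In-memory stand-in for an index mapping filename hashes to directory cookies."""

    def __init__(self):
        self._entries = {}               # cookie -> filename (the directory entries)
        self._index = defaultdict(list)  # hash -> list of cookies with that hash
        self._next_cookie = 0

    @staticmethod
    def _hash(filename):
        # Truncated CRC-32; different filenames may share a hash value.
        return zlib.crc32(filename.encode("utf-8")) & 0xFFFF

    def add(self, filename):
        """Create a directory entry and record its cookie under the filename hash."""
        cookie = self._next_cookie
        self._next_cookie += 1
        self._entries[cookie] = filename
        self._index[self._hash(filename)].append(cookie)
        return cookie

    def lookup(self, filename):
        """Return the cookie for filename, or None if it is not in the directory."""
        for cookie in self._index.get(self._hash(filename), []):
            # Several cookies can share a hash, so confirm the actual name.
            if self._entries[cookie] == filename:
                return cookie
        return None

idx = DirectoryIndex()
first = idx.add("report.txt")
second = idx.add("notes.txt")
```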
Traversal rights
The present technology pertains to an organization directory hosted by a synchronized content management system. The organization directory can provide access to user accounts for all members of the organization to all content items in the organization directory on the respective file systems of the members' client devices. Members can reach any content item at the same path as other members relative to the organization directory root on their respective client device. In some embodiments, novel access permissions are granted to maintain path consistency.