G06F16/164

Method For Decentralized Accessioning For Distributed Machine Learning and Other Applications

A method for injecting metadata into an existing artifact is described. The method generates metadata related to an existing artifact having a predetermined structure and encodes the metadata in accordance with the predetermined structure. The encoded metadata is embedded within the existing artifact in accordance with the predetermined structure and is delineated within the predetermined structure as one or more individual records. The artifact, including embedded metadata, is stored within a storage entity and is accessible to processes related to the artifact. Additional records may be generated and embedded over time, thus creating a timeline of events related to the artifact.
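As a minimal sketch of the idea, assume the artifact is a JSON-serializable dictionary and that embedded records live under a reserved `_records` key. The key name, the record fields, and the helper names `embed_record` and `timeline` are all hypothetical and are not taken from the patent.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional


def embed_record(artifact: Dict, event: str, details: Dict,
                 timestamp: Optional[str] = None) -> Dict:
    """Return a copy of the artifact with one more metadata record embedded.

    Assumption: records are stored under the reserved key "_records".
    """
    record = {
        "event": event,
        "details": details,
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
    }
    updated = dict(artifact)
    updated["_records"] = list(artifact.get("_records", [])) + [record]
    return updated


def timeline(artifact: Dict) -> List[str]:
    """List embedded events in timestamp order."""
    records = sorted(artifact.get("_records", []), key=lambda r: r["timestamp"])
    return [r["event"] for r in records]


model = {"weights": [1, 2]}
model = embed_record(model, "trained", {"epochs": 3},
                     timestamp="2023-01-01T00:00:00+00:00")
model = embed_record(model, "validated", {"accuracy": 0.9},
                     timestamp="2023-01-02T00:00:00+00:00")
stored = json.dumps(model)  # the artifact and its records are stored together
```

Because the records stay inside the artifact's own structure, any process that can read the artifact can also read its history without consulting a central registry.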

Intelligent routing based on the data extraction from the document
11556502 · 2023-01-17

An approach is provided for using parsing rules to automatically identify attributes and attribute values from documents and generate metadata that maps attribute values to display labels that may be searched, filtered, and sorted upon within an external storage service. A document processing system maintains parsing rules that define how to identify field labels, which represent attributes, and corresponding field values, which represent attribute values, and metadata mappings that map associations between field values and display labels. The display labels are used within the graphical user interface of the external storage service. The system receives a batch of multiple documents and uses the parsing rules to identify field labels and field values. The system generates metadata using the defined metadata mappings and associates the metadata to the documents processed. The system then sends the documents and their associated metadata to the external storage service for storage.
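The flow described above might look roughly like the following sketch. The rule table, the display-label table, and the sample documents are invented for illustration. Keying the mapping by field label is a simplification of the mapping described in the abstract.

```python
import re

# Hypothetical parsing rules: field label -> pattern that captures the field value.
PARSING_RULES = {
    "Invoice No": re.compile(r"Invoice No:\s*(\S+)"),
    "Total": re.compile(r"Total:\s*([\d.]+)"),
}

# Hypothetical metadata mappings: field label -> display label shown by the
# external storage service.
DISPLAY_LABELS = {"Invoice No": "invoice_number", "Total": "amount_due"}


def extract_metadata(text):
    """Apply every parsing rule and key the found values by display label."""
    metadata = {}
    for label, pattern in PARSING_RULES.items():
        match = pattern.search(text)
        if match:
            metadata[DISPLAY_LABELS[label]] = match.group(1)
    return metadata


def process_batch(documents):
    """Pair each document with its metadata, ready to send to storage."""
    return [(doc, extract_metadata(doc)) for doc in documents]


batch = ["Invoice No: A-17\nTotal: 42.50", "Memo without fields"]
results = process_batch(batch)
```

The upload to the external storage service is left out; in this sketch the pairs returned by `process_batch` are what would be sent.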

MULTI-DIMENSIONAL DATA LABELING
20230009237 · 2023-01-12

Methods and systems for multi-dimensional data labeling. A structured data set having a plurality of rows is obtained, the structured data set comprising a set of data attributes, each data attribute having a data value for each of the plurality of rows of the structured data set. The structured data set is decomposed into a plurality of dimensions, each dimension defining a proper subset of the data attributes based on a coherence criterion. A dimension label is obtained for each dimension of at least a portion of the plurality of rows of the structured data set, and the dimension labels for a given one of the rows of the structured data set are consolidated into at least one row label for the given one of the rows.
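One way to picture the decompose, label, and consolidate steps is the sketch below. The dimensions, the labeling functions, and the rule of joining labels with "/" are assumptions made for the example. The abstract does not say how dimensions are chosen or how labels are consolidated.

```python
# Hypothetical decomposition: each dimension is a proper subset of the attributes.
DIMENSIONS = {
    "demographics": ["age", "region"],
    "activity": ["logins", "purchases"],
}


def label_demographics(values):
    return "senior" if values["age"] >= 65 else "adult"


def label_activity(values):
    return "active" if values["logins"] > 10 else "dormant"


# Hypothetical per-dimension labelers (these could be human annotators or models).
LABELERS = {"demographics": label_demographics, "activity": label_activity}


def label_row(row):
    """Label each dimension separately, then consolidate into one row label."""
    dimension_labels = []
    for dimension, attributes in DIMENSIONS.items():
        subset = {name: row[name] for name in attributes}
        dimension_labels.append(LABELERS[dimension](subset))
    return "/".join(dimension_labels)


rows = [
    {"age": 70, "region": "EU", "logins": 12, "purchases": 3},
    {"age": 30, "region": "US", "logins": 2, "purchases": 0},
]
row_labels = [label_row(r) for r in rows]
```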

Metadata management in storage systems
11550479 · 2023-01-10

Techniques are disclosed for managing metadata of a storage system. A storage control system receives data to be written to primary storage, and writes the received data together with metadata to a write cache. The storage control system destages the metadata from the write cache to a primary metadata structure which is configured to persistently store and index the metadata. The primary metadata structure comprises (i) a first data structure that is configured to accumulate the metadata destaged from the write cache and organize the accumulated metadata in blocks of metadata sorted by index keys, and (ii) a second data structure that is configured to receive the accumulated metadata from the first data structure, and organize the received metadata using an index structure that enables random-access to the metadata using the index keys.
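The two-tier arrangement can be sketched as follows, under loud assumptions: an in-memory list of key-sorted blocks stands in for the first data structure, and a sorted key list searched with `bisect` stands in for the index structure of the second. The class names and the block size are hypothetical, and persistence is omitted.

```python
import bisect


class SortedBlockStore:
    """First structure: accumulates destaged metadata in key-sorted blocks."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.pending = []   # entries destaged since the last block was sealed
        self.blocks = []    # sealed blocks, each sorted by index key

    def destage(self, key, value):
        self.pending.append((key, value))
        if len(self.pending) >= self.block_size:
            self.blocks.append(sorted(self.pending, key=lambda kv: kv[0]))
            self.pending = []

    def drain(self):
        """Hand all sealed blocks to the second structure."""
        blocks, self.blocks = self.blocks, []
        return blocks


class IndexedStore:
    """Second structure: keeps an ordered key index for random access."""

    def __init__(self):
        self.keys = []      # sorted index keys
        self.values = {}

    def merge(self, blocks):
        for block in blocks:
            for key, value in block:
                if key not in self.values:
                    bisect.insort(self.keys, key)
                self.values[key] = value   # later writes overwrite earlier ones

    def lookup(self, key):
        return self.values.get(key)

    def range(self, lo, hi):
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        return [(k, self.values[k]) for k in self.keys[i:j]]
```

A possible usage: call `destage` for each entry leaving the write cache, then periodically pass `drain()` to `merge` so that lookups by index key are served from the second structure.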

Cloud hybrid application storage management (CHASM) system

The cloud hybrid application storage management system spans local data center and cloud-based storage and provides a unified view of content and administration throughout an enterprise. The system manages synchronization of storage locations, ensuring that files are replicated, uniquely identified, and protected against corruption. The system ingests digital media assets, creates instances of the assets with their own identification and rights, and houses the identification and relationships in a Central Asset Registry (CAR). The system tracks the different instances of the assets in multiple storage locations using the CAR, which ties together the disparate digital asset management repository systems (DAMs) and cloud-based storage archives in which the instances reside. While the invention treats and manages multiple files/instances independently, the CAR identifies them as related to each other.

Systems and methods for managed asset distribution in a distributed heterogeneous storage environment
11574025 · 2023-02-07

Embodiments of systems and methods for the rules based distribution of managed content across heterogeneous storage distributed in a network environment are disclosed. In particular, certain embodiments may employ entity rules in association with a content management system. An entity rule may be a rule specifying a set of parameters and a destination secondary storage location. When the entity rule is evaluated by the content system, a set of content managed by the content management system responsive to the rule may be determined using the parameters of the rule. Responsive content can be determined, for example, by searching the content of the content management system based on the parameters. Responsive content may be moved from the primary storage location of the content management system to the secondary storage location specified by the entity rule.
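A rough sketch of evaluating one entity rule is given below. It assumes that content is a dictionary of attribute dictionaries and that a parameter matches on exact equality. The `EntityRule` class, the store layout, and the location name are hypothetical.

```python
class EntityRule:
    """A set of matching parameters plus a destination secondary location."""

    def __init__(self, parameters, destination):
        self.parameters = parameters
        self.destination = destination


def evaluate_rule(rule, primary, secondary):
    """Move content responsive to the rule from primary to secondary storage.

    Assumption: content is responsive when every rule parameter equals the
    corresponding content attribute.
    """
    responsive = [
        content_id
        for content_id, attributes in primary.items()
        if all(attributes.get(k) == v for k, v in rule.parameters.items())
    ]
    for content_id in responsive:
        target = secondary.setdefault(rule.destination, {})
        target[content_id] = primary.pop(content_id)
    return responsive


primary = {
    "doc1": {"type": "contract", "year": 2019},
    "doc2": {"type": "memo", "year": 2019},
    "doc3": {"type": "contract", "year": 2022},
}
secondary = {}
rule = EntityRule({"type": "contract", "year": 2019}, "archive-eu")
moved = evaluate_rule(rule, primary, secondary)
```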

Automated runtime configuration for dataflows

Methods, systems and computer program products are provided for automated runtime configuration for dataflows to automatically select or adapt a runtime environment or resources to a dataflow plan prior to execution. Metadata generated for dataflows indicates dataflow information, such as numbers and types of sources, sinks and operations, and the amount of data being consumed, processed and written. Weighted dataflow plans are created from unweighted dataflow plans based on metadata. Weights that indicate operation complexity or resource consumption are generated for data operations. A runtime environment or resources to execute a dataflow plan is/are selected based on the weighted dataflow and/or a maximum flow. Preferences may be provided to influence weighting and runtime selections.

REGENERATED CONTAINER FILE STORING
20180004766 · 2018-01-04

A regenerated container file is detected, and a file in the regenerated container file is determined that is different from any file in an existing container file related to the regenerated container file. To store the regenerated container file, the different file is sent to the data storage for storing.
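The comparison step could be sketched like this, assuming a container file is modeled as a mapping from file name to bytes and that "different" means the content hash is absent from the existing container. Both are assumptions, since the abstract does not specify how the difference is detected.

```python
import hashlib


def content_digests(container):
    """Set of SHA-256 digests for every file in a container."""
    return {hashlib.sha256(data).hexdigest() for data in container.values()}


def files_to_store(existing, regenerated):
    """Files in the regenerated container whose content matches no existing file."""
    known = content_digests(existing)
    return {
        name: data
        for name, data in regenerated.items()
        if hashlib.sha256(data).hexdigest() not in known
    }


existing = {"app.py": b"print(1)", "lib.so": b"\x00\x01"}
regenerated = {"app.py": b"print(2)", "lib.so": b"\x00\x01"}
to_send = files_to_store(existing, regenerated)  # only the changed app.py
```

Only the entries in `to_send` would need to go to data storage; unchanged files are already covered by the existing container.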

SEARCH FILTERED FILE SYSTEM USING SECONDARY STORAGE, INCLUDING MULTI-DIMENSIONAL INDEXING AND SEARCHING OF ARCHIVED FILES

Techniques for enabling user search of content stored in a file archive include providing a search interface comprising a search rules portion and an action rules portion, receiving a file archive search criterion comprising at least one search rule, and searching the file archive using the search criterion. The techniques also include generating a set of files filtered using the search criterion and performing an action specified in the action rules portion on a file included in the set of files.
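The split between search rules and action rules can be illustrated with a small sketch. It assumes that archive entries are dictionaries, that search rules are predicates which must all hold, and that an action is a function applied to each filtered file. None of these representations comes from the abstract.

```python
def search_archive(archive, search_rules):
    """Return the files that satisfy every search rule."""
    return [f for f in archive if all(rule(f) for rule in search_rules)]


def apply_action(files, action):
    """Perform the action from the action rules portion on each filtered file."""
    return [action(f) for f in files]


archive = [
    {"name": "a.log", "size": 10},
    {"name": "b.txt", "size": 5000},
    {"name": "c.log", "size": 9000},
]
search_rules = [
    lambda f: f["name"].endswith(".log"),   # hypothetical rule: log files only
    lambda f: f["size"] > 1000,             # hypothetical rule: larger than 1000 bytes
]
hits = search_archive(archive, search_rules)
actions_taken = apply_action(hits, lambda f: "restore " + f["name"])
```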

Virtual client file systems and methods within remote sessions

A method is provided that includes establishing, by an application server, a remote access session with a client device, and creating, by a file system agent running on the application server, a metadata-only virtual file system associated with the remote access session, wherein the virtual file system only comprises file metadata associated with a plurality of files residing in a local file system of the client device. The method further includes responsive to receiving, by the virtual file system, a request to access content of a file referenced by the virtual file system, redirecting the request to a file system driver implementing at least a sub-tree of the local file system of the client device.
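A minimal sketch of the redirection idea follows. It assumes the client-side file system driver can be represented as a callable that returns file content for a path. The `VirtualFileSystem` class and its method names are hypothetical.

```python
class VirtualFileSystem:
    """Server-side view that holds only metadata, never file content."""

    def __init__(self, metadata, client_driver):
        self.metadata = dict(metadata)       # path -> metadata, no content
        self.client_driver = client_driver   # stand-in for the client's driver

    def stat(self, path):
        """Metadata requests are answered locally on the application server."""
        return self.metadata[path]

    def read(self, path):
        """Content requests are redirected to the client's file system driver."""
        if path not in self.metadata:
            raise FileNotFoundError(path)
        return self.client_driver(path)


client_files = {"/docs/a.txt": b"hello"}
redirected = []


def driver(path):
    redirected.append(path)   # record that the request reached the client
    return client_files[path]


vfs = VirtualFileSystem({"/docs/a.txt": {"size": 5}}, driver)
info = vfs.stat("/docs/a.txt")      # served without contacting the client
content = vfs.read("/docs/a.txt")   # fetched through the driver
```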