G06F16/156

Smart near-real-time folder scan based on a breadth first search

In response to a folder event received for a first folder, a first work item is dequeued from an ID queue, and metadata of the first folder and of its immediate children is fetched and enqueued as work items in a metadata queue. If children of the first folder remain to be scanned, the first work item is updated with a child ID for each immediate child of the first folder that is itself a folder, and the work item is inserted back into the ID queue. In a second pass, a child ID is dequeued, and metadata of the immediate children of the folder associated with that child ID is fetched and enqueued as work items in the metadata queue. The second pass is repeated for every child ID in the updated work item. This process is repeated for each generation of descendants of the first folder, or until a specified limit is reached.
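The two-queue, generation-by-generation scan above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the in-memory `FOLDERS` tree, `fetch_children` (standing in for a metadata service call), `WorkItem`, and the `max_generations` limit are all hypothetical names chosen for the example.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical in-memory folder tree: folder ID -> [(child ID, is_folder)].
FOLDERS = {
    "root": [("a", True), ("b", True), ("f1", False)],
    "a": [("a1", True), ("f2", False)],
    "b": [("f3", False)],
    "a1": [("f4", False)],
}


@dataclass
class WorkItem:
    # IDs of folders whose immediate children have not been fetched yet.
    pending_ids: list = field(default_factory=list)


def fetch_children(folder_id):
    """Stand-in for a metadata service call returning a folder's immediate children."""
    return FOLDERS.get(folder_id, [])


def scan(folder_id, max_generations=10):
    """Breadth-first scan triggered by a folder event on `folder_id`."""
    id_queue = deque([WorkItem([folder_id])])
    metadata_queue = deque([folder_id])  # metadata of the first folder itself
    generation = 0
    while id_queue and generation < max_generations:
        item = id_queue.popleft()
        child_folders = []
        # One pass per pending ID: fetch and enqueue metadata of its immediate children.
        for fid in item.pending_ids:
            for child_id, is_folder in fetch_children(fid):
                metadata_queue.append(child_id)
                if is_folder:
                    child_folders.append(child_id)
        generation += 1
        # Folders remain: update the same work item with their IDs and requeue it.
        if child_folders:
            item.pending_ids = child_folders
            id_queue.append(item)
    return list(metadata_queue)
```

Calling `scan("root", max_generations=1)` stops after the first folder's immediate children, which is one way the "specified limit" could be expressed; the limit in the source may be defined differently.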

Unified data processing across streaming and indexed data sets

Systems and methods are described for unified processing of indexed and streaming data. A system enables users to query indexed data or to specify processing pipelines to be applied to streaming data. In some instances, a user may specify a query intended to run against indexed data but specify criteria that include not-yet-indexed data (e.g., a future time frame). The system may convert the query into a data processing pipeline applied to the not-yet-indexed data, thus increasing the efficiency of the system. Similarly, in some instances, a user may specify a data processing pipeline to be applied to a data stream but specify criteria that include data items outside the data stream. For example, a user may wish to apply the pipeline retroactively to data items that have already exited the data stream. The system can convert the pipeline into a query against indexed data to satisfy the user's processing requirements.
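One way to picture the conversion is to split the requested time range at the boundary between indexed and not-yet-indexed data. The sketch below assumes such a boundary exists and calls it a `watermark`; that name, the `Plan` type, and the integer timestamps are illustrative assumptions rather than details from the source.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Range = Tuple[int, int]  # half-open [start, end) time range, e.g. epoch seconds


@dataclass
class Plan:
    indexed: Optional[Range]    # portion answered by a query against indexed data
    streaming: Optional[Range]  # portion answered by a pipeline on the live stream


def plan(start: int, end: int, watermark: int) -> Plan:
    """Split a requested time range at the (hypothetical) index watermark.

    Data older than `watermark` is assumed to be indexed already; anything at
    or after it has not been indexed yet and must be handled by streaming.
    """
    indexed = (start, min(end, watermark)) if start < watermark else None
    streaming = (max(start, watermark), end) if end > watermark else None
    return Plan(indexed, streaming)
```

A query whose range extends into the future yields a non-empty `streaming` part, while a pipeline requested retroactively yields a non-empty `indexed` part. Both directions described above fall out of the same split.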

Interactive visual data preparation service

Techniques for visual data preparation are described. An interactive visual data preparation service provides a user with a graphical user interface that presents values from a sample of a dataset, along with statistical information about those values. Through the graphical user interface, the user can try out transformations on the sample and view near-immediate results of each transformation as applied to the sample. The desired set of transformations is represented as a recipe object, which can then be used to prepare the full dataset, or other datasets, on behalf of the user or other users.
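The recipe idea can be illustrated as an ordered list of named row transformations that is first previewed on a sample and then replayed on the full dataset. This is a minimal sketch; `Recipe`, `take_sample`, and `column_stats` are hypothetical names, and real data preparation services support far richer transformation types than per-row functions.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Recipe:
    """Ordered, reusable list of named row transformations."""
    steps: List[tuple] = field(default_factory=list)

    def add(self, name: str, fn: Callable[[Row], Row]) -> "Recipe":
        self.steps.append((name, fn))
        return self

    def apply(self, rows: List[Row]) -> List[Row]:
        # Replay every step, in order, over the given rows.
        for _name, fn in self.steps:
            rows = [fn(row) for row in rows]
        return rows


def take_sample(rows: List[Row], k: int, seed: int = 0) -> List[Row]:
    """Random sample used for the interactive preview."""
    return random.Random(seed).sample(rows, min(k, len(rows)))


def column_stats(rows: List[Row], column: str) -> Dict[str, object]:
    """Simple statistics that could be shown next to a column's values."""
    values = [row[column] for row in rows]
    return {"count": len(values), "distinct": len(set(values))}
```

For example, a recipe built with `Recipe().add("trim name", ...).add("cast age", ...)` can be applied to `take_sample(data, 2)` for a quick preview and then to `data` itself once the user is satisfied.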

DOCUMENT SEARCH SYSTEM, DOCUMENT SEARCH METHOD, AND COMPUTER-READABLE STORAGE MEDIUM
20220350777 · 2022-11-03

A document search system is provided that allows a user to easily and intuitively designate a search condition including a feature amount of a document. The document search system searches for documents stored in a file server by referring to at least one index that includes a feature amount relating to at least one object contained in each document stored in the file server. The system finds the documents that match the search condition by referring to the at least one index and to a search condition that includes disposition (placement) information about at least one symbol on a virtual page.
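To make the idea of searching by symbol placement concrete, the sketch below treats each index entry as a list of typed objects with normalized page coordinates and matches a query if every symbol placed on the virtual page has a nearby indexed object of the same kind. The `PlacedObject` type, the coordinate scheme, and the `tol` tolerance are assumptions made for illustration; the actual feature amounts in the source are not specified.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PlacedObject:
    kind: str  # hypothetical object type, e.g. "table" or "figure"
    x: float   # normalized horizontal position on the page, 0..1
    y: float   # normalized vertical position on the page, 0..1


def matches(query: List[PlacedObject], objects: List[PlacedObject], tol: float = 0.15) -> bool:
    """True if every query symbol has an indexed object of the same kind nearby."""
    return all(
        any(o.kind == s.kind and abs(o.x - s.x) <= tol and abs(o.y - s.y) <= tol
            for o in objects)
        for s in query
    )


def search(query: List[PlacedObject], index: Dict[str, List[PlacedObject]],
           tol: float = 0.15) -> List[str]:
    """Return the documents whose indexed objects satisfy the placement query."""
    return [doc for doc, objects in index.items() if matches(query, objects, tol)]
```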

SYSTEM AND METHOD FOR MANAGING A PLURALITY OF DATA STORAGE DEVICES
20220342849 · 2022-10-27

A system and method for managing files on multiple storage devices, such as USB sticks. The system includes a hub with multiple input ports for the storage devices, each port having a unique code visually presented next to it. The system can assign a barcode label to each storage device, which can be printed and affixed to that device. The system further scans the files on each storage device to generate a master index based on the unique identification code of each storage device.
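A master index keyed by each device's unique code could be built roughly as follows. This sketch assumes each device is visible as a mounted directory and that the caller supplies the mapping from port code to mount point; `build_master_index` and `lookup` are hypothetical helpers, and barcode generation is not shown.

```python
import os
from typing import Dict, List


def build_master_index(ports: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each device's unique code to the files found on it.

    `ports` maps the code shown next to each hub port to the mount point of
    the storage device plugged into that port.
    """
    index: Dict[str, List[str]] = {}
    for code, mount in ports.items():
        files = []
        for dirpath, _dirnames, filenames in os.walk(mount):
            for name in filenames:
                # Store paths relative to the device root so they stay valid on remount.
                files.append(os.path.relpath(os.path.join(dirpath, name), mount))
        index[code] = sorted(files)
    return index


def lookup(index: Dict[str, List[str]], filename: str) -> List[str]:
    """Return the codes of every device holding a file with this base name."""
    return [code for code, files in index.items()
            if any(os.path.basename(f) == filename for f in files)]
```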

Apparatuses, computer-implemented methods, and computer program products for improved file scanning and remediation in data systems

Embodiments of the present disclosure enable improved methods for efficiently scanning large file repositories and managing the target files that such scanning identifies. Embodiments of the present disclosure scan any number of file repositories of a data system to identify target files that satisfy scan criteria, and then process the identified target files. The target files may be processed to identify file owner data, which can be used for a variety of purposes, for example to send scan alerts about the target files to their owners. Various file remediation actions may then be performed based on the scan results, either by the users who receive the scan alerts, automatically, or both.
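The scan, owner lookup, and alert steps can be sketched over in-memory file records. `FileRecord`, `scan_repositories`, and `alerts_by_owner` are hypothetical; a real system would read owner data from the file system or a directory service, and the scan criteria shown in usage (a simple substring check) are only an example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FileRecord:
    path: str
    owner: str
    content: str


def scan_repositories(repositories: List[List[FileRecord]],
                      criteria: Callable[[FileRecord], bool]) -> List[FileRecord]:
    """Return target files from all repositories that satisfy the scan criteria."""
    return [f for repo in repositories for f in repo if criteria(f)]


def alerts_by_owner(targets: List[FileRecord]) -> Dict[str, str]:
    """Group target files by owner and build one alert message per owner."""
    grouped = defaultdict(list)
    for f in targets:
        grouped[f.owner].append(f.path)
    return {owner: f"{len(paths)} file(s) need review: {', '.join(sorted(paths))}"
            for owner, paths in grouped.items()}
```

Grouping by owner before alerting means each owner receives a single consolidated alert, which is one plausible reading of using file owner data to direct scan alerts.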

Project generating system and method thereof

A project generating method comprises: extracting a keyword from a plurality of text files in a specified category; determining whether the keyword is a theme; extracting a geographical name from the text files corresponding to the theme; determining whether to keep the theme according to the internet volume of the theme; filtering a plurality of review files from a review website according to the geographical name; calculating a first ratio from the filtered review files to determine whether to keep the theme; and generating a project that includes the geographical name and the theme, serving as a recommendation row.
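The sequence of filters above can be sketched as follows. The source does not define how a keyword is judged to be a theme, how internet volume is measured, or what the first ratio is, so this sketch assumes a theme vocabulary, a precomputed volume table, and a ratio equal to the share of location-filtered reviews that mention the theme. All parameter names and thresholds are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def generate_projects(
    texts: List[str],
    theme_vocab: Set[str],
    gazetteer: Set[str],
    volume: Dict[str, int],
    reviews: List[str],
    min_volume: int = 1000,
    min_ratio: float = 0.3,
) -> List[Tuple[str, str]]:
    projects = []
    keywords = sorted({w.strip(".,!?").lower() for t in texts for w in t.split()})
    for keyword in keywords:
        # Assumed theme test: the keyword appears in a known theme vocabulary.
        if keyword not in theme_vocab:
            continue
        # Geographical names appearing in the texts that mention the theme.
        places = {p for t in texts if keyword in t.lower() for p in gazetteer if p in t}
        # Drop themes whose (assumed, precomputed) internet volume is too low.
        if volume.get(keyword, 0) < min_volume:
            continue
        for place in sorted(places):
            # Reviews filtered by the geographical name.
            related = [r for r in reviews if place in r]
            if not related:
                continue
            # Assumed "first ratio": share of those reviews that mention the theme.
            ratio = sum(keyword in r.lower() for r in related) / len(related)
            if ratio >= min_ratio:
                projects.append((place, keyword))
    return projects
```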

CONSENT DATA PIPELINE ARCHITECTURE AND OPERATION

The disclosure herein describes processing consent data and using the processed consent data in workflows. Customer consent data is accessed that includes subject consent instances, each with associated consent purpose-value pairs. The customer consent data is mapped to a raw consent data schema according to selections made in a mapping UI; this includes mapping the consent purpose-value pairs of the consent instances to data columns of the raw consent data schema. Metadata representing one or more consent rules related to the raw consent data schema is generated from rule selections made in a rule configuration UI, and the consent rules are applied to one or more workflows. The disclosure enables consent data in different formats and/or from different sources to be ingested and standardized in a single platform, so that consent checking can be provided for applications in a consistent and comprehensive manner.
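The mapping and rule steps can be sketched with plain dictionaries: a `mapping` dict plays the role of the mapping UI selections, and a rule dict plays the role of the generated rule metadata. The schema columns, value normalization, and helper names below are illustrative assumptions, not details from the disclosure.

```python
from typing import Dict, List

# Hypothetical raw consent data schema (column names are illustrative).
RAW_SCHEMA = ("subject_id", "email_marketing", "analytics")


def normalize(value) -> bool:
    """Standardize consent values arriving in different formats."""
    if isinstance(value, str):
        return value.strip().lower() in {"yes", "y", "true", "1", "granted"}
    return bool(value)


def map_instance(instance: Dict, mapping: Dict[str, str]) -> Dict:
    """Map one consent instance's purpose-value pairs onto raw-schema columns.

    `mapping` stands in for the selections made in the mapping UI.
    """
    row = {column: None for column in RAW_SCHEMA}
    row["subject_id"] = instance["subject"]
    for purpose, value in instance["consents"].items():
        column = mapping.get(purpose)
        if column in row and column != "subject_id":
            row[column] = normalize(value)
    return row


def permitted(rows: List[Dict], rule: Dict) -> List[str]:
    """Apply a consent rule: return subjects allowed to enter the rule's workflow."""
    return [row["subject_id"] for row in rows if row.get(rule["requires"]) is True]
```

Because two source-specific purpose names (for example `"newsletter"` and `"Newsletter Opt-In"`) can map to the same column, consent from different sources ends up in one standardized schema that a single rule can check.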

SYSTEM AND METHOD FOR UTILIZING SEARCH TREES AND TAGGING DATA ITEMS FOR DATA COLLECTION MANAGING TASKS
20230071438 · 2023-03-09

A method and system are described for presenting to a user a search tree session of a storage tree associated with a collection of data items. First, a search query is received at a computer configured to access the storage tree. The search query includes one or more terms or non-term conditions for searching through the storage tree. A search tree that includes one or more search tree nodes is created based on the search query. One or more grouping search tree nodes are then created, each including a search tree node, a node presentation parameter corresponding to how the node will be displayed, and a search query parameter corresponding to the search query. A search tree representation incorporating at least one of the grouping search tree nodes is then created and output to a display for the user.
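A grouping search tree node, as described, bundles a search tree node with a presentation parameter and the query it came from. The sketch below models that with dataclasses, builds a tree with one branch per query term over tagged items, and renders it as indented text. The tag-matching rule, the `max_depth` presentation parameter, and all names are hypothetical; non-term conditions are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SearchTreeNode:
    label: str
    children: List["SearchTreeNode"] = field(default_factory=list)


@dataclass
class GroupingNode:
    node: SearchTreeNode
    presentation: Dict[str, int]  # hypothetical presentation parameter, e.g. {"max_depth": 2}
    query: str                    # search query parameter the node was built from


def build_search_tree(query: str, items: Dict[str, Set[str]]) -> SearchTreeNode:
    """Create one child per query term, holding the items tagged with that term."""
    root = SearchTreeNode(query)
    for term in query.split():
        hits = [SearchTreeNode(name) for name, tags in items.items() if term in tags]
        if hits:
            root.children.append(SearchTreeNode(term, hits))
    return root


def render(group: GroupingNode) -> List[str]:
    """Render the grouped tree as indented lines, honoring max_depth."""
    max_depth = group.presentation.get("max_depth", 99)
    lines: List[str] = []

    def walk(node: SearchTreeNode, depth: int) -> None:
        lines.append("  " * depth + node.label)
        if depth < max_depth:
            for child in node.children:
                walk(child, depth + 1)

    walk(group.node, 0)
    return lines
```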

Real-time document filtering systems and methods

Methods, systems, and computer program products for organizing and displaying, in real time, data related to a plurality of documents. A plurality of documents and a plurality of entity identifiers are stored in a relational database storage. Each entity identifier has an entity type selected from a plurality of entity types. A plurality of entity associations between the entity identifiers and the documents are stored in a non-relational database storage. Each entity association defines a relationship between one or more entity identifiers and a selected document. A plurality of file icons are displayed in a display interface. The file icons include active icons corresponding to a selection of the documents, where the selection is determined by querying the non-relational database storage using at least one currently selected entity identifier.
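The split between relational and non-relational storage, and the way active icons are derived, can be sketched with dictionaries standing in for both stores. Treating multiple selected entities as an AND filter (set intersection) is an assumption of this sketch, as are all the names and sample data.

```python
from typing import List, Set, Tuple

# Stand-in for the relational side: documents and typed entity identifiers.
DOCUMENTS = {1: "contract.pdf", 2: "invoice.pdf", 3: "memo.docx"}
ENTITIES = {"acme": "client", "q3": "project", "jane": "person"}

# Stand-in for the non-relational side: entity ID -> IDs of associated documents.
ASSOCIATIONS = {"acme": {1, 2}, "q3": {2, 3}, "jane": {3}}


def active_documents(selected: List[str]) -> Set[int]:
    """Documents associated with every currently selected entity identifier."""
    if not selected:
        return set(DOCUMENTS)
    return set.intersection(*(ASSOCIATIONS.get(e, set()) for e in selected))


def icons(selected: List[str]) -> List[Tuple[str, bool]]:
    """One (file name, is_active) pair per document, in document-ID order."""
    active = active_documents(selected)
    return [(name, doc_id in active) for doc_id, name in sorted(DOCUMENTS.items())]
```

Keeping the associations in a key-value structure means each filter change is a few set lookups rather than a relational join, which is one plausible reason for the storage split the abstract describes.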