G06F16/334

SELF-SUPERVISED DOCUMENT REPRESENTATION LEARNING

One example method involves operations for a processing device that include receiving, by a machine learning model trained to generate a search result, a search query for a text input. The machine learning model is trained by receiving pre-training data that includes multiple documents and pre-training the machine learning model by generating, using an encoder, feature embeddings for each of the documents included in the pre-training data. The feature embeddings are generated by applying a masking function to visual and textual features in the documents. Training the machine learning model also includes generating, using the feature embeddings, output features for the documents by concatenating the feature embeddings and applying a non-linear mapping to the feature embeddings. Training the machine learning model further includes applying a linear classifier to the output features. Additionally, the operations include generating, for display, a search result using the machine learning model based on the input.
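
The pipeline of masking, concatenation, non-linear mapping, and linear classification can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the zero-out masking, the tanh layer, and the random weights are hypothetical and are not taken from the abstract, which does not specify the encoder or the training objective.

```python
import math
import random

def mask_features(features, mask_prob, rng):
    """Zero out each feature with probability mask_prob (an assumed masking function)."""
    return [0.0 if rng.random() < mask_prob else x for x in features]

def nonlinear_map(vector, weights):
    """One dense layer followed by tanh; each row of weights yields one output feature."""
    return [math.tanh(sum(w * x for w, x in zip(row, vector))) for row in weights]

def linear_classifier(vector, weights):
    """Return the index of the class with the highest linear score."""
    scores = [sum(w * x for w, x in zip(row, vector)) for row in weights]
    return scores.index(max(scores))

rng = random.Random(0)
visual = [0.2, 0.9, 0.4]   # hypothetical visual features of one document
textual = [0.7, 0.1]       # hypothetical textual features of the same document

# Mask each modality, then concatenate the results into one embedding.
embedding = mask_features(visual, 0.3, rng) + mask_features(textual, 0.3, rng)

# Untrained random weights stand in for learned parameters.
hidden_w = [[rng.uniform(-1, 1) for _ in embedding] for _ in range(4)]
output_features = nonlinear_map(embedding, hidden_w)

class_w = [[rng.uniform(-1, 1) for _ in output_features] for _ in range(2)]
label = linear_classifier(output_features, class_w)
```

The sketch only shows how data flows between the stages; a real system would learn the weights from the pre-training documents.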

Distributed data acquisition, indexing and search system
11513846 · 2022-11-29

A scheduler manages execution of a plurality of data-collection jobs, assigns individual jobs to specific forwarders in a set of forwarders, and generates and transmits tokens (e.g., pairs of data-collection tasks and target sources) to assigned forwarders. A forwarder uses the tokens, along with stored information applicable across jobs, to collect data from the target source and forward it on to an indexer for processing. For example, the indexer can then break a data stream into discrete events, extract a timestamp from each event, and index (e.g., store) the event based on the timestamp. The scheduler can monitor forwarders' job performance and use that performance to influence subsequent job assignments. Thus, data-collection jobs can be efficiently assigned to and executed by a group of forwarders, where the group can potentially be diverse and dynamic in size.
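
The indexer step described above (break a stream into events, extract a timestamp, index by timestamp) might look like the following Python sketch. The line-per-event boundary, the ISO-like timestamp pattern, and the names `break_into_events` and `index_events` are assumptions chosen for illustration, not details from the abstract.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed timestamp layout, e.g. 2022-11-29T10:00:01
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

def break_into_events(stream):
    """Treat every non-empty line as one event (an assumed event boundary)."""
    return [line for line in stream.splitlines() if line.strip()]

def index_events(stream):
    """Map each extracted timestamp to the list of events that carry it."""
    index = defaultdict(list)
    for event in break_into_events(stream):
        match = TIMESTAMP.search(event)
        if match is None:
            continue  # no timestamp found; a real indexer would need a fallback
        stamp = datetime.strptime(match.group(), "%Y-%m-%dT%H:%M:%S")
        index[stamp].append(event)
    return index

raw = (
    "2022-11-29T10:00:01 job=7 status=ok\n"
    "\n"
    "2022-11-29T10:00:05 job=8 status=fail\n"
)
index = index_events(raw)
```

Keying the index on parsed `datetime` values rather than raw strings lets later queries select events by time range.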

Generating statistics associated with unique field values

Embodiments are directed towards real time display of event records and extracted values based on at least one extraction rule, such as a regular expression. A user interface may be employed to enable a user to have an extraction rule automatically generated and/or to manually enter an extraction rule. The user may be enabled to manually edit a previously provided extraction rule, which may result in real time display of updated extracted values. The extraction rule may be utilized to extract values from each of a plurality of records, including event records of unstructured machine data. Statistics may be determined for each unique extracted value, and may be displayed to the user in real time. The user interface may also enable the user to select at least one unique extracted value to display those event records that include an extracted value that matches the selected value.
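
As a rough illustration of applying an extraction rule and computing statistics per unique value, consider the Python sketch below. It assumes the rule is a regular expression with exactly one capture group and that the statistics are a count and a share of matches; the function names and sample records are hypothetical.

```python
import re
from collections import Counter

def extract_value_stats(records, rule):
    """Apply an extraction rule and return {value: (count, share)} per unique value."""
    pattern = re.compile(rule)
    values = []
    for record in records:
        match = pattern.search(record)
        if match:
            values.append(match.group(1))
    total = len(values)
    return {value: (n, n / total) for value, n in Counter(values).items()}

def matching_records(records, rule, selected):
    """Return the records whose extracted value equals the selected value."""
    pattern = re.compile(rule)
    result = []
    for record in records:
        match = pattern.search(record)
        if match and match.group(1) == selected:
            result.append(record)
    return result

records = [
    "GET /index.html status=200",
    "GET /missing status=404",
    "POST /login status=200",
]
rule = r"status=(\d+)"
stats = extract_value_stats(records, rule)
not_found = matching_records(records, rule, "404")
```

Editing `rule` and calling both functions again corresponds to the described real time update after a user changes the extraction rule.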

Natural language processing for entity resolution

An apparatus includes a data access circuit that interprets data records, each having a number of data fields, a record parsing circuit that determines a number of n-grams from terms of each of the data records and maps the number of n-grams to a corresponding number of mathematical vectors, and a record association circuit that determines whether a similarity value between a first mathematical vector for a first data record and a second mathematical vector for a second data record is greater than a threshold similarity value, and associates the first and second data records in response to the similarity value exceeding the threshold similarity value. An example apparatus includes a reporting circuit that provides a catalog entity identifier, associates each of the first term and the second term with the catalog entity identifier, and provides a summary of activity for an entity.
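
The n-gram mapping and threshold comparison can be sketched in Python as follows. The abstract does not say which n-grams, vector form, or similarity measure are used; character trigrams, count vectors, cosine similarity, and the 0.5 threshold are all assumptions made for this example.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased term."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def to_vector(text, n=3):
    """Map a record's term to a sparse count vector over its n-grams."""
    return Counter(char_ngrams(text, n))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def same_entity(first, second, threshold=0.5):
    """Associate two records when their similarity exceeds the threshold."""
    return cosine(to_vector(first), to_vector(second)) > threshold

linked = same_entity("Acme Corporation", "ACME Corp.")
unrelated = same_entity("Acme Corporation", "Globex Inc")
```

Character n-grams tolerate abbreviations and case differences, which is why the two spellings of the same company share most of their trigrams.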

Generating, using a machine learning model, request agnostic interaction scores for electronic communications, and utilization of same

Training and/or utilizing a machine learning model to generate request agnostic predicted interaction scores for electronic communications, and utilizing the request agnostic predicted interaction scores in determining whether, and/or how, to provide corresponding electronic communications to a client device in response to a request. A request agnostic predicted interaction score for an electronic communication provides an indication of quality of the communication, and is generated independent of corresponding request(s) for which it is utilized. In many implementations, a request agnostic predicted interaction score for an electronic communication is generated “offline” relative to corresponding request(s) for which it is utilized, and is pre-indexed with (or otherwise assigned to) the electronic communication. This enables fast and efficient retrieval, and utilization, of the request agnostic interaction score by computing device(s), when the electronic communication is responsive to a request.

Search and navigation of hidden elements of a web page

A system, method and computer program product for a search tool is provided. A user-input search term is received in a search tool for finding instances of the search term in a document currently presented to the user. An instance of the search term in a hidden content element of the document is identified. A content element is determined for exposing the hidden content element. The determined content element is presented to the user and highlighted, to prompt user interaction therewith so as to expose the hidden content element.

System and method for update of data and meta data via an enumerator

A data storage system includes storage and a global enumerator. The storage stores data chunks, object level metadata associated with portions of the data chunks, and chunk level metadata associated with respective data chunks. The global enumerator obtains an update request including a metadata characteristic and update data and, in response to obtaining the update request, matches the metadata characteristic to at least one selected from a group consisting of a portion of the object level metadata and a portion of the chunk level metadata to identify an implicated metadata portion, and modifies, based on the update data, the implicated metadata portion.
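
A minimal Python sketch of the match-then-modify behavior is given below. It assumes that metadata is held in plain dictionaries and that a metadata characteristic is a single key/value pair; `apply_update` and the sample entries are hypothetical names used only for illustration.

```python
def apply_update(object_metadata, chunk_metadata, characteristic, update_data):
    """Modify every metadata entry that has the given (key, value) characteristic.

    Returns the (level, entry_id) pairs of the implicated metadata portions.
    """
    key, value = characteristic
    implicated = []
    for level, store in (("object", object_metadata), ("chunk", chunk_metadata)):
        for entry_id, entry in store.items():
            if entry.get(key) == value:
                entry.update(update_data)  # apply the update data in place
                implicated.append((level, entry_id))
    return implicated

object_metadata = {
    "obj-1": {"owner": "alice", "retention": "30d"},
    "obj-2": {"owner": "bob", "retention": "30d"},
}
chunk_metadata = {
    "chunk-9": {"owner": "alice", "tier": "hot"},
}

changed = apply_update(
    object_metadata, chunk_metadata, ("owner", "alice"), {"retention": "90d"}
)
```

A single request thus updates both metadata levels without the caller needing to know where the matching entries are stored.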

APPARATUS AND METHOD FOR MACHINE-LEARNING-BASED POSITIONING DATABASE CREATION AND POSITIONING OF UNCOLLECTED POINTS USING MATCHING FEATURE WITH WIRELESS COMMUNICATION INFRASTRUCTURE

Disclosed herein are an apparatus and method for positioning of uncollected points based on machine learning using matching wireless communication infrastructure points. The apparatus includes memory in which at least one program according to an embodiment is recorded and a processor for executing the program. The program may compare collected data acquired from wireless communication infrastructure with positioning data measured by a positioning target terminal and thereby extract matching feature points; create a fingerprint database of global grid cells, including uncollected points, for the extracted feature points in real time; and estimate the optimal composite location of the positioning target terminal based on the created fingerprint database.

Assigning documents to entities of a database

In an approach, a processor groups documents into a plurality of groups based on similarity, where: documents of each group have a same document structure; and the document structure is defined by coordinates of text blocks. A processor, for each group of the plurality of groups and for each document of the respective group: retrieves a value of each text block of the respective document in accordance with a document structure of the group; and assigns to each text block of the respective document an attribute that represents the retrieved value of the text block. A processor assigns a first document of the documents to an entity of a database that matches the first document based on the group of text block values and the assigned attributes of the document.
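
The grouping and assignment steps can be sketched in Python under some simplifying assumptions: two documents share a structure only when their text block coordinates are identical, each coordinate maps to one attribute name, and an entity matches when all labeled values agree. The data layout and function names are hypothetical.

```python
from collections import defaultdict

def structure_key(document):
    """A document's structure: the sorted coordinates of its text blocks."""
    return tuple(sorted(block["xy"] for block in document["blocks"]))

def group_by_structure(documents):
    """Group documents whose text blocks sit at the same coordinates."""
    groups = defaultdict(list)
    for document in documents:
        groups[structure_key(document)].append(document)
    return groups

def label_blocks(document, attribute_by_xy):
    """Assign each text block's value to the attribute tied to its coordinates."""
    return {attribute_by_xy[b["xy"]]: b["text"] for b in document["blocks"]}

def match_entity(labeled, entities):
    """Return the first entity whose fields agree with every labeled value."""
    for entity_id, fields in entities.items():
        if all(fields.get(k) == v for k, v in labeled.items()):
            return entity_id
    return None

docs = [
    {"name": "inv-1", "blocks": [{"xy": (10, 20), "text": "Acme"}, {"xy": (10, 80), "text": "A-100"}]},
    {"name": "inv-2", "blocks": [{"xy": (10, 80), "text": "B-200"}, {"xy": (10, 20), "text": "Globex"}]},
    {"name": "memo", "blocks": [{"xy": (5, 5), "text": "Hello"}]},
]
groups = group_by_structure(docs)

attribute_by_xy = {(10, 20): "customer", (10, 80): "order_id"}
entities = {
    "e1": {"customer": "Acme", "order_id": "A-100"},
    "e2": {"customer": "Globex", "order_id": "B-200"},
}
assigned = match_entity(label_blocks(docs[1], attribute_by_xy), entities)
```

Exact coordinate equality is the strictest possible notion of similarity; a tolerance on the coordinates would be needed for scanned documents.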

RETRIEVAL OF UNSTRUCTURED DATA IN DPP INFORMATION ACCESS
20230054316 · 2023-02-23

Systems, methods, and computer-readable media are disclosed for the centralized retrieval of personal data about a data subject across a plurality of applications. The data subject may request the retrieval of personal data from a company. To retrieve the personal data, a data model may be created for each application having personal data about the data subject. Each application may store personal data in the form of attachments. The data model may be in tabular form and store virtual representations of the attachments. Metadata for the attachments may be retrieved using the virtual representations of the attachments. The attachment metadata may then be used to retrieve the attachments. The attachments may then be provided to the data subject for download. The personal data may be provided to the data subject in both machine-readable and human-readable form to comply with data privacy regulations.