G06F16/325

Identifying similar documents in a file repository using unique document signatures
11593439 · 2023-02-28

Methods, systems, and non-transitory computer readable storage media are disclosed for determining clusters of similar digital documents using unique document signatures. Specifically, the disclosed system processes digital text in a digital document to tokenize character strings (e.g., words) in the digital document by combining a subset of character values and string lengths in the character strings. Additionally, the disclosed system generates a document signature for the digital document by combining subsets of tokens generated for the digital document into a token sequence indicative of the digital text in the digital document. The disclosed system determines a cluster of similar digital documents including the digital document by comparing the document signature of the digital document to document signatures corresponding to a plurality of digital documents.
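The abstract does not fix the exact token format or comparison rule, so the following is only a minimal sketch under stated assumptions. Each word becomes a token built from its first character, last character, and length. Each run of three consecutive tokens becomes one element of the signature. Signatures are compared with Jaccard similarity against a hypothetical 0.5 threshold. The names `tokenize`, `signature`, `similarity`, and `cluster` are illustrative and do not come from the patent.

```python
import re


def tokenize(text: str) -> list[str]:
    """Hypothetical token: first char + last char + string length of each word."""
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    return [f"{w[0]}{w[-1]}{len(w)}" for w in words]


def signature(text: str, k: int = 3) -> set[str]:
    """Combine each run of k consecutive tokens into one element of the signature."""
    tokens = tokenize(text)
    if len(tokens) < k:
        return {"|".join(tokens)} if tokens else set()
    return {"|".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two signatures (an assumed comparison rule)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs: dict[str, str], threshold: float = 0.5) -> list[list[str]]:
    """Greedy clustering: join the first cluster whose first member is similar enough."""
    sigs = {name: signature(text) for name, text in docs.items()}
    clusters: list[list[str]] = []
    for name in docs:
        for group in clusters:
            if similarity(sigs[name], sigs[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

Because the tokens keep only two characters and a length, small spelling changes inside a word leave the token unchanged. That tolerance (and the collisions it causes) is the design trade-off of this kind of compact token.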

SYSTEM AND METHOD FOR FINGERPRINTING-BASED CONVERSATION THREADING
20180011880 · 2018-01-11

Systems, methods, and computer-readable media for staging a corpus of electronic communication documents for analysis, for example via a content analysis platform. The staging may include a staging platform accessing the corpus of electronic communication documents. For each electronic communication document within the corpus, the staging platform may generate a fingerprint based on the output of a hash function executed on a set of characteristics corresponding to each segment within the electronic communication document. The staging platform may analyze the generated fingerprints to generate a plurality of threaded conversations that exclude electronic communication documents conveying no new information. The systems and methods may also include detecting and flagging any segments within an electronic communication document that its author may have mutated (altered).
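A minimal sketch of the fingerprinting step, assuming the "set of characteristics" is just the lowercased sender plus whitespace-normalized body of each segment, hashed with SHA-256. A message is treated as conveying no new information when its segment fingerprints are a strict subset of another message's (for example, an earlier email that is fully quoted in a later reply), or when it duplicates an earlier message. The segment dictionary layout and function names are hypothetical.

```python
import hashlib


def segment_fingerprints(segments: list[dict]) -> set[str]:
    """Hash assumed characteristics (sender + normalized body) of each segment."""
    fps = set()
    for seg in segments:
        body = " ".join(seg["body"].split()).lower()  # collapse whitespace, ignore case
        key = seg["sender"].lower() + "\x00" + body
        fps.add(hashlib.sha256(key.encode("utf-8")).hexdigest())
    return fps


def drop_redundant(corpus: dict[str, list[dict]]) -> list[str]:
    """Keep only documents whose segments are not fully contained in another document."""
    fps = {doc_id: segment_fingerprints(segs) for doc_id, segs in corpus.items()}
    ids = list(fps)
    keep = []
    for i, doc_id in enumerate(ids):
        f = fps[doc_id]
        redundant = any(
            f < fps[other] or (f == fps[other] and j < i)  # strict subset, or later duplicate
            for j, other in enumerate(ids)
            if j != i
        )
        if not redundant:
            keep.append(doc_id)
    return keep
```

Normalizing whitespace before hashing lets a quoted copy that was re-wrapped by a mail client still match the original segment.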

KEY PACKING FOR FLASH KEY VALUE STORE OPERATIONS

A key-value (KV) store, a method thereof, and a storage system are provided herein. The KV store may include a key logger and a processor configured to: receive a first command for storing a first KV in the KV store; write a first value of the first KV to a first NAND page; generate an extent map identifying the first memory page that includes the first value; write the extent map to a second memory page; append an entry for storing the first KV to the key logger; and, upon a threshold being met within the key logger, update a device hashmap of the KV store to include a first key of the first KV.
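The write path in the abstract can be simulated in memory as a rough sketch. Here a Python list stands in for NAND pages, the extent map is a small dict stored in its own page, and the key logger is flushed into the device hashmap once it holds a hypothetical `log_threshold` entries. Page size, threshold, and all names are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class KVStore:
    page_size: int = 4096        # hypothetical page size in bytes
    log_threshold: int = 4       # hypothetical key-logger flush threshold
    pages: list = field(default_factory=list)     # simulated NAND pages
    key_log: list = field(default_factory=list)   # key logger: (key, extent-map page)
    hashmap: dict = field(default_factory=dict)   # device hashmap: key -> extent-map page

    def _write_page(self, data) -> int:
        self.pages.append(data)
        return len(self.pages) - 1

    def put(self, key: bytes, value: bytes) -> None:
        # 1. Write the value to one or more pages.
        value_pages = [
            self._write_page(value[i:i + self.page_size])
            for i in range(0, max(len(value), 1), self.page_size)
        ]
        # 2. Write an extent map that identifies those pages to a separate page.
        extent_page = self._write_page({"pages": value_pages, "length": len(value)})
        # 3. Append an entry to the key logger; update the hashmap only at the threshold.
        self.key_log.append((key, extent_page))
        if len(self.key_log) >= self.log_threshold:
            self._flush_log()

    def _flush_log(self) -> None:
        for key, extent_page in self.key_log:
            self.hashmap[key] = extent_page
        self.key_log.clear()

    def get(self, key: bytes) -> bytes:
        # Newest unflushed log entry wins; otherwise fall back to the hashmap.
        extent_page = None
        for logged_key, page in reversed(self.key_log):
            if logged_key == key:
                extent_page = page
                break
        if extent_page is None:
            extent_page = self.hashmap[key]  # raises KeyError for unknown keys
        extent = self.pages[extent_page]
        data = b"".join(self.pages[p] for p in extent["pages"])
        return data[: extent["length"]]
```

Deferring hashmap updates until the logger fills batches many small metadata updates into one, which is the point of packing keys before touching the device hashmap.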

EFFICIENT SEARCH FOR COMBINATIONS OF MATCHING ENTITIES GIVEN CONSTRAINTS

Methods, systems, and computer-readable storage media for: receiving a set of inference results generated by an ML model, the inference results including a set of query entities and a set of target entities, each query entity having one or more target entities matched to it by the ML model; processing the set of inference results to generate a set of matched subsets of target entities by executing a search over the target entities based on constraints, where each problem in a set of problems is provided as a tuple including an index value representing a target entity in the set of target entities and a value associated with the query entity, the value including a constraint relative to the query entity; and executing at least one task in response to one or more matched subsets in the set of matched subsets.
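The abstract does not say what the constraint is. The sketch below assumes, purely for illustration, that the values of a matched subset of target entities must sum to the query entity's value within a tolerance, and that subsets are capped at `max_size` entities. Targets are represented as `(index, value)` tuples, loosely following the tuple representation described above. This is a brute-force enumeration, not the efficient search the patent claims.

```python
from itertools import combinations


def matched_subsets(query_value: float, targets: list[tuple[int, float]],
                    max_size: int = 3, tol: float = 0.01) -> list[tuple[int, ...]]:
    """Return index tuples of target subsets whose values sum to query_value (assumed constraint)."""
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(targets, size):
            total = sum(value for _, value in combo)
            if abs(total - query_value) <= tol:
                results.append(tuple(index for index, _ in combo))
    return results
```

The search space grows combinatorially with the number of targets, which is why bounding the subset size (or pruning on partial sums) matters in practice.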

Search system for providing search results using query understanding and semantic binary signatures
11698921 · 2023-07-11

Technology for improved processing of search queries is provided. In one embodiment, methods may return semantically relevant search results for a search query. During offline pre-computation, an inventory semantic index may be generated that includes inventory binary hashing signatures associated with inventory listings, such as goods or services for sale, and the index may be partitioned by categories and shards. When a search query is received, relevant categories are determined using a relevant category recognition service, and a search query binary hashing signature may be generated for the search query. The relevant categories are searched to determine Hamming distances between the inventory binary hashing signatures and the search query binary hashing signature, where the Hamming distance indicates semantic relevance (a smaller distance meaning a closer match).
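A minimal sketch of the lookup, assuming signatures are stored as Python integers and that a signature is obtained by taking the sign of each dimension of some embedding vector (the abstract does not say how signatures are produced, so `binarize` is a hypothetical stand-in). The index layout `{category: [shard, ...]}`, with each shard a list of `(listing_id, signature)` pairs, is also an assumption.

```python
def binarize(vector: list[float]) -> int:
    """Hypothetical signature: bit i is set when dimension i of the embedding is positive."""
    return sum(1 << i for i, x in enumerate(vector) if x > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def search(index: dict, query_sig: int, categories: list[str], k: int = 10):
    """Scan only the shards of the recognized categories; return (distance, listing_id) pairs."""
    scored = []
    for category in categories:
        for shard in index.get(category, []):
            for listing_id, sig in shard:
                scored.append((hamming(query_sig, sig), listing_id))
    scored.sort()  # smallest Hamming distance first
    return scored[:k]
```

Restricting the scan to the recognized categories keeps the number of XOR-and-popcount comparisons small, while shards allow each category to be scanned in parallel.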

Method and apparatus for determining evidence authenticity based on blockchain ledger
11551319 · 2023-01-10

Disclosed are a method and apparatus for determining evidence authenticity based on a blockchain ledger. The method includes: identifying target electronic evidence; providing a relatively high authenticity reference score for the target electronic evidence in response to determining that it is stored by at least one candidate blockchain ledger platform; and providing a relatively low authenticity reference score in response to determining that it is not stored by any candidate blockchain ledger platform. A relatively high authenticity reference score indicates that the identified target electronic evidence has a relatively high degree of authenticity (likelihood of being authentic) and a relatively low likelihood of having been tampered with.
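A minimal sketch of the scoring rule, assuming that "stored by a ledger platform" means the SHA-256 digest of the evidence appears among the digests that platform has recorded. Each platform is modeled as a plain set of hex digests rather than a real blockchain client, and the score values 0.9 and 0.1 are hypothetical placeholders for "relatively high" and "relatively low".

```python
import hashlib

HIGH_SCORE = 0.9  # hypothetical "relatively high" authenticity reference score
LOW_SCORE = 0.1   # hypothetical "relatively low" authenticity reference score


def evidence_digest(evidence: bytes) -> str:
    return hashlib.sha256(evidence).hexdigest()


def authenticity_score(evidence: bytes, ledgers: dict[str, set[str]]) -> float:
    """High score if any candidate ledger platform stores the evidence digest, else low."""
    digest = evidence_digest(evidence)
    if any(digest in stored for stored in ledgers.values()):
        return HIGH_SCORE
    return LOW_SCORE
```

Any change to the evidence bytes changes the digest, so a tampered copy falls through to the low score even if the original was anchored on a ledger.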

Systems and methods for indexing geological features

Systems and methods for indexing geological features are disclosed. In one embodiment, a method for indexing geological features includes accessing a database storing a plurality of map objects that originate from documents. Each map object includes a map defined by a geographical boundary and a text caption. The method includes, for each map object, determining a plurality of geohashes within the geographical boundary, and includes, for each map object, comparing terms of the text caption with a list of geological keywords. For each map object, the method includes identifying one or more geological noun phrases within the text caption that match one or more geological noun phrases of the list. The method includes determining, for each geological noun phrase, one or more geohashes associated with the geological noun phrase and, for each geohash, determining a frequency that the geohash is associated with the geological noun phrase.
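The steps above can be sketched as follows. `geohash_encode` implements the standard public geohash algorithm (interleaved longitude/latitude bisection, base-32 output). Two parts are assumptions, not the patent's method: `geohashes_in_box` approximates "geohashes within the geographical boundary" by encoding a grid of sample points, so a coarse grid can miss small cells, and keyword matching is a plain case-insensitive substring test standing in for noun-phrase extraction.

```python
from collections import Counter

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Standard geohash: alternate longitude/latitude bisection bits, 5 bits per character."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, n_bits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits <<= 1
            rng[1] = mid
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:
            chars.append(_BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


def geohashes_in_box(min_lat, min_lon, max_lat, max_lon, precision=3, steps=10):
    """Approximate cover: geohashes of a (steps+1) x (steps+1) grid of sample points."""
    cells = set()
    for i in range(steps + 1):
        lat = min_lat + (max_lat - min_lat) * i / steps
        for j in range(steps + 1):
            lon = min_lon + (max_lon - min_lon) * j / steps
            cells.add(geohash_encode(lat, lon, precision))
    return cells


def index_maps(map_objects, keywords, precision=3) -> Counter:
    """map_objects: [((min_lat, min_lon, max_lat, max_lon), caption), ...].
    Returns how often each (phrase, geohash) pair co-occurs across map objects."""
    freq = Counter()
    for bbox, caption in map_objects:
        text = caption.lower()
        phrases = [kw for kw in keywords if kw.lower() in text]
        cells = geohashes_in_box(*bbox, precision=precision)
        for phrase in phrases:
            for cell in cells:
                freq[(phrase, cell)] += 1
    return freq
```

Geohash prefixes nest spatially, so choosing the precision sets the spatial resolution of the resulting phrase-to-area frequency index.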

Multi-layered key-value storage

Systems and methods for multi-layered key-value storage are described. For example, methods may include receiving two or more put requests that each include a respective primary key and a corresponding respective value; storing the two or more put requests in a buffer in a first datastore; determining whether the buffer is storing put requests that collectively exceed a threshold; responsive to the determination that the threshold has been exceeded, transmitting a write request to a second datastore, including a subsidiary key and a corresponding data file that includes the respective values of the two or more put requests at respective offsets in the data file; for the two or more put requests, storing respective entries in an index in the first datastore that associate the respective primary keys with the subsidiary key and the respective offsets; and deleting the two or more put requests from the buffer.
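The buffering-and-packing flow can be sketched in memory, with dicts standing in for both datastores. The threshold is measured here as the total bytes of buffered values (the abstract leaves the measure open), and the subsidiary-key format `file-N` is a hypothetical choice.

```python
class MultiLayerKV:
    def __init__(self, threshold_bytes: int = 64):
        self.threshold = threshold_bytes  # hypothetical flush threshold (bytes buffered)
        self.buffer = {}                  # first datastore: buffered put requests
        self.index = {}                   # first datastore: primary key -> (subsidiary key, offset, length)
        self.second = {}                  # second datastore: subsidiary key -> data file
        self._next_file = 0

    def put(self, key: str, value: bytes) -> None:
        self.buffer[key] = value
        if sum(len(v) for v in self.buffer.values()) > self.threshold:
            self._flush()

    def _flush(self) -> None:
        sub_key = f"file-{self._next_file}"
        self._next_file += 1
        data = bytearray()
        entries = {}
        for key, value in self.buffer.items():
            entries[key] = (sub_key, len(data), len(value))  # value starts at current offset
            data += value
        self.second[sub_key] = bytes(data)  # a single write request to the second datastore
        self.index.update(entries)          # index entries live in the first datastore
        self.buffer.clear()                 # delete the flushed put requests

    def get(self, key: str) -> bytes:
        if key in self.buffer:
            return self.buffer[key]
        sub_key, offset, length = self.index[key]
        return self.second[sub_key][offset:offset + length]
```

Packing many small values into one data file turns many small writes to the second datastore into one large write, at the cost of an extra index lookup on reads.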

Data Loss Prevention via dual mode Indexed Document Matching
20230037489 · 2023-02-09

Cloud-based data loss prevention (DLP) systems and methods include: monitoring a file, from a user associated with a tenant, to be checked for sensitive data; obtaining one or more dictionaries for the tenant; identifying a DLP match based on any of (i) exact document matches between the file and files in the one or more dictionaries, (ii) the same text in the file as in an indexed document in the one or more dictionaries, (iii) content in the file that contains a subset of the text of an indexed document in the one or more dictionaries, and (iv) content that is similar, but not identical, to the text of an indexed document in the one or more dictionaries; and, responsive to the DLP match, blocking the file in the cloud-based system.
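A minimal sketch of the four match types, checked in the order listed above. The concrete tests are assumptions, not the patent's method: an exact match is equal SHA-256 digests of the raw bytes; same text is an equal lowercased word sequence; "contains a subset" is any shared run of `excerpt_words` consecutive words; and "similar" is a Jaccard similarity of 3-word shingles at or above a hypothetical 0.5 threshold. The dictionary is modeled as a list of document byte strings.

```python
import hashlib
import re


def _words(data: bytes) -> list[str]:
    return re.findall(r"\w+", data.decode("utf-8", errors="ignore").lower())


def _shingles(words: list[str], k: int) -> set[tuple[str, ...]]:
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def dlp_match(file_bytes: bytes, dictionary: list[bytes],
              excerpt_words: int = 8, sim_threshold: float = 0.5):
    """Return the first match type found against any indexed document, or None."""
    file_hash = hashlib.sha256(file_bytes).hexdigest()
    f_words = _words(file_bytes)
    f_runs = _shingles(f_words, excerpt_words)
    f_sh = _shingles(f_words, 3)
    for doc in dictionary:
        if hashlib.sha256(doc).hexdigest() == file_hash:
            return "exact_document"
        d_words = _words(doc)
        if d_words == f_words:
            return "same_text"
        if f_runs & _shingles(d_words, excerpt_words):
            return "contains_excerpt"
        d_sh = _shingles(d_words, 3)
        union = f_sh | d_sh
        if union and len(f_sh & d_sh) / len(union) >= sim_threshold:
            return "similar"
    return None


def enforce(file_bytes: bytes, dictionary: list[bytes]) -> str:
    """Block the file on any DLP match, otherwise allow it."""
    return "block" if dlp_match(file_bytes, dictionary) else "allow"
```

The cheap byte-level hash check runs first, and the more expensive shingle comparisons only run when it fails.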

Determining Similar Loan Documents
20230100396 · 2023-03-30

The system prepares PDF documents to be digitally populated or signed. The method may comprise converting a document into an image; detecting words on the document; searching the words for keywords; searching for an object on the document; determining an object field based on the keywords and the object; creating a tag with metadata about the object field; and associating the tag with the object field. The method may also comprise determining, by a processor, metadata about a document; creating, by the processor, a hash from the metadata; storing, by the processor, an association of the hash, the metadata and the document in a knowledge database; creating, by the processor, a new hash for a new document; comparing, by the processor, the hash with the new hash; and determining, by the processor, that the new document has similar characteristics as the document based on the comparing.
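A minimal sketch of the hash-and-compare steps, assuming the metadata is a JSON-serializable dict (the fields in the usage example below are hypothetical) and that the hash is SHA-256 over canonical JSON. A dict keyed by hash stands in for the knowledge database. A cryptographic hash only detects identical metadata; grading partial similarity would need a locality-sensitive hash instead, which the abstract does not specify.

```python
import hashlib
import json


def metadata_hash(metadata: dict) -> str:
    """Hash canonical JSON (sorted keys, fixed separators) so field order is irrelevant."""
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class KnowledgeDB:
    """Stores associations of hash, metadata, and document (here a document id)."""

    def __init__(self):
        self._by_hash: dict[str, list[tuple[str, dict]]] = {}

    def add(self, doc_id: str, metadata: dict) -> str:
        h = metadata_hash(metadata)
        self._by_hash.setdefault(h, []).append((doc_id, metadata))
        return h

    def similar_to(self, metadata: dict) -> list[str]:
        """Ids of stored documents whose metadata hash equals the new document's hash."""
        return [doc_id for doc_id, _ in self._by_hash.get(metadata_hash(metadata), [])]
```

For example, a new loan form with the same form type, page count, and field list as a stored one would hash identically and be reported as having similar characteristics.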