Patent classifications
G06F16/2468
METHOD AND SYSTEM FOR PROCESSING SUBPOENA DOCUMENTS
A method and a system for extracting information from a subpoena document are provided. The method includes: receiving a subpoena document; extracting raw text included in the subpoena document; identifying, based on the extracted raw text, entities that are named in the subpoena document; determining, based on the extracted raw text, first information that relates to a scope period, a law enforcement agency, and/or an investigative agent associated with the subpoena document; retrieving second information that relates to the identified entities from a customer database; and outputting a subset of the determined first information and a subset of the obtained second information. The method may also include using a weighted fuzzy name match algorithm to match the identified entities with the second information.
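The abstract does not say how the weighted fuzzy name match works. As a minimal sketch of one possible reading, the Python below scores each name field with the standard library's `difflib.SequenceMatcher` and combines the field scores using weights. The field names, the weight values, and the `threshold` parameter are all assumptions made for illustration, not details from the patent.

```python
from difflib import SequenceMatcher

# Hypothetical weights: the surname is assumed to matter most.
WEIGHTS = {"last": 0.5, "first": 0.3, "middle": 0.2}


def field_similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] for two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weighted_name_score(query, candidate, weights=WEIGHTS):
    """Weighted average of per-field similarities over fields present in both names."""
    total = 0.0
    used = 0.0
    for field, weight in weights.items():
        q, c = query.get(field), candidate.get(field)
        if q and c:
            total += weight * field_similarity(q, c)
            used += weight
    return total / used if used else 0.0


def best_match(query, customers, threshold=0.85):
    """Return the highest-scoring customer record, or None if below the threshold."""
    scored = [(weighted_name_score(query, c), c) for c in customers]
    score, customer = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    return customer if score >= threshold else None
```

For example, an entity extracted as "Jon Smith" would be matched to a customer record for "John Smith" under these assumed weights, while a name with no similar customer record yields `None`.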
SYSTEMS AND METHODS FOR EXECUTING QUERIES ON A BITMAP INDEX
Systems and methods for executing queries on a bitmap index are disclosed. The system may receive a first data stream from a database and generate a bitmap index based on the first data stream. The system may receive an input selection of one or more data conditions from a user device and generate a Boolean expression based on the input selection. The system may query the bitmap index using the Boolean expression and generate a bitmap vector. The system may output a first data subset represented by the generated bitmap vector to a graphical user interface. The bitmap index may include probabilistic entries, and the system may validate the probabilistic entries by receiving a second data stream, identifying one or more entries correlated to the probabilistic entries, determining a divergence between the identified entries and the probabilistic entries, and updating parameters of a classifier model associated with the probabilistic entries.
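The core query path described above (build a bitmap index, turn selected conditions into a Boolean expression, evaluate it to a bitmap vector, return the matching subset) can be sketched in Python as follows. This is an illustrative sketch only: the nested-tuple expression format and the function names are assumptions, Python integers stand in for bitmaps, and the probabilistic entries and classifier update are not modeled.

```python
def build_bitmap_index(rows, columns):
    """Map (column, value) to an integer whose bit i is set when row i has that value."""
    index = {}
    for i, row in enumerate(rows):
        for col in columns:
            key = (col, row[col])
            index[key] = index.get(key, 0) | (1 << i)
    return index


def evaluate(expr, index, n_rows):
    """Evaluate a nested-tuple Boolean expression into a bitmap vector (an int)."""
    all_rows = (1 << n_rows) - 1
    op = expr[0]
    if op == "eq":
        _, col, value = expr
        return index.get((col, value), 0)
    if op == "and":
        result = all_rows
        for sub in expr[1:]:
            result &= evaluate(sub, index, n_rows)
        return result
    if op == "or":
        result = 0
        for sub in expr[1:]:
            result |= evaluate(sub, index, n_rows)
        return result
    if op == "not":
        # Mask the complement so only bits for real rows remain set.
        return ~evaluate(expr[1], index, n_rows) & all_rows
    raise ValueError(f"unknown operator: {op!r}")


def select(rows, bitmap):
    """Return the rows whose bit is set in the bitmap vector."""
    return [row for i, row in enumerate(rows) if (bitmap >> i) & 1]
```

A query such as `("and", ("eq", "state", "NY"), ("not", ("eq", "tier", "basic")))` is answered with bitwise operations on the index alone; the underlying rows are touched only when the final subset is materialized.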
Algorithm for the Non-exact Matching of Large Datasets
A two-step algorithm for conducting near real-time fuzzy searches of a target on one or more large data sets is described. The algorithm first simplifies the data by removing grammatical constructs to bring the target search term (and the stored database) to their base elements, and then performs a Levenshtein comparison to create a subset of the data set that may be a match. A scoring algorithm is then applied while comparing the target to that subset to identify any matches.
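A rough Python sketch of the two steps follows. The abstract does not define "grammatical constructs" or the scoring algorithm, so this sketch assumes that simplification means lowercasing and stripping everything except letters and digits, and that the score is the edit distance normalized by string length. The function names and the `max_distance` and `min_score` parameters are hypothetical.

```python
import re


def simplify(text):
    """Reduce a string to lowercase letters and digits (an assumed notion of 'base elements')."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def levenshtein(a, b):
    """Edit distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_search(target, records, max_distance=2, min_score=0.75):
    """Step 1: filter by edit distance on simplified strings. Step 2: score the survivors."""
    key = simplify(target)
    candidates = []
    for record in records:
        simplified = simplify(record)
        distance = levenshtein(key, simplified)
        if distance <= max_distance:
            candidates.append((record, simplified, distance))

    results = []
    for record, simplified, distance in candidates:
        longest = max(len(key), len(simplified), 1)
        score = 1 - distance / longest  # assumed scoring: normalized edit distance
        if score >= min_score:
            results.append((record, score))
    return sorted(results, key=lambda pair: -pair[1])
```

In this sketch, searching for "O'Brien, John" returns both "OBrien John" (identical after simplification) and "O'Brian, Jon" (edit distance 2), while an unrelated name is dropped by the first-step filter. A production system would precompute the simplified form of the stored data rather than simplifying on every query.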
Search and ranking of records across different databases
A search system performs a federated search across multiple databases and generates a ranked combined list of found genealogical records. The system receives a user query with one or more specified characteristics. The system may determine expanded characteristics derived from the specified characteristics. The system searches the various databases using the characteristics and retrieves records that match them. The system combines the retrieved records and ranks them using a machine learning model. The machine learning model is configured to assign a weight to the records returned from each of the genealogical databases based on the characteristics specified in the user query. The machine learning model may be trained by any combination of one or more of: a Nelder-Mead method, a coordinate ascent method, and a simulated annealing method. The ranked combined results are provided in response to the user query.
Technologies for tuning performance and/or accuracy of similarity search using stochastic associative memories
Technologies for tuning performance and/or accuracy of similarity search using stochastic associative memories (SAM) are disclosed. Under a first subsampling approach, columns associated with set bits in a search key comprising a binary bit vector are subsampled. Matching set bits for the subsampled columns are aggregated on a row-wise basis to generate similarity scores, which are then ranked. A similar scheme is applied for all the columns with set bits in the search key, and the results for top-ranked rows are compared to evaluate the tradeoff between throughput boost and lost accuracy. Under a second approach, called continuous column read, an iterative approach is employed that continuously scores the rows as each new column read is complete. The similarity scores for the (N-1)th and Nth iterations are ranked, a rank correlation is calculated, and a determination is made as to whether the rank correlation meets or exceeds a threshold.
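The first (subsampling) approach can be illustrated in plain Python, with lists of 0/1 values standing in for the memory's rows. This is a sketch under stated assumptions, not the hardware implementation: the function names, the `fraction` parameter, and the use of top-k overlap as the accuracy measure are choices made for this example.

```python
import random


def similarity_scores(rows, columns):
    """For each row, count the set bits in the given columns."""
    return [sum(row[c] for c in columns) for row in rows]


def top_k(scores, k):
    """Indices of the k highest-scoring rows (ties broken by lower row index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def subsampled_search(rows, key, fraction, k, rng=None):
    """Compare top-k results from a column subsample against all set-bit columns."""
    rng = rng or random.Random(0)
    set_cols = [c for c, bit in enumerate(key) if bit]
    if not set_cols:
        raise ValueError("search key has no set bits")
    n_sampled = max(1, round(fraction * len(set_cols)))
    sampled = rng.sample(set_cols, n_sampled)

    approx = top_k(similarity_scores(rows, sampled), k)
    exact = top_k(similarity_scores(rows, set_cols), k)
    overlap = len(set(approx) & set(exact)) / k  # fraction of exact top-k recovered
    return approx, exact, overlap
```

Reading fewer columns reduces the work per query, and the `overlap` value gives one simple way to quantify how much ranking accuracy is lost at a given `fraction`.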
Automatic Fuzz Testing Framework
Various aspects relate to methods, systems, and computer readable media for automatic fuzz testing. An example method of automatic software fuzz testing can include receiving a description of a target software application, determining, based on the description, a type of fuzzing, identifying one or more fuzzers based on the type of fuzzing, executing the one or more fuzzers on the target software application, extracting prioritized results of the executing of the one or more fuzzers, and presenting the prioritized results.
INCREMENTAL UPDATES OF CONFLATED DATA RECORDS
According to examples, an apparatus may include a processor and a memory on which are stored machine-readable instructions that, when executed by the processor, may cause the processor to receive an updated data record from a data source and determine a first conflated data record. The first conflated data record may be associated with the updated data record and include data records in a first grouping from among a plurality of data sources. The processor may identify the data records included in the first conflated data record and may generate a second conflated data record that updates conflations among the updated data record and the identified data records. The second conflated data record may include a second grouping of data records. The processor may replace the first conflated data record with the second conflated data record to incrementally update a set of conflated data records.
Fuzzy data operations
A method for clustering data elements stored in a data storage system includes reading data elements from the data storage system. Clusters of data elements are formed with each data element being a member of at least one cluster. At least one data element is associated with two or more clusters. Membership of the data element belonging to respective ones of the two or more clusters is represented by a measure of ambiguity. Information is stored in the data storage system to represent the formed clusters.
Fuzzy logic modeling for detection and presentment of anomalous messaging
Disclosed is an approach that applies a fuzzy logic model that may involve fuzzy-matching a plurality of address fields to determine a common physical address, and determining a number of communiques directed to that address with reference to a threshold that may determine an excessive number of communiques. The plurality of address fields may also be fuzzy-matched to information in a fraud-risk database which may comprise a fraud-risk address. One or more matches may be presented to a user who may adjust the views of the various matches, track various trends within the data, and harmonize the various address fields relating to a physical address.
Method and Apparatus for Asset Management
Various embodiments of the teachings herein include a method for asset management. The method includes: acquiring an asset management request including a list of asset information of assets to be managed; searching the acquired asset information in an asset information database; performing rule matching based on a rule library using the asset information, if the acquired asset information is not found in the asset information database; performing fuzzy matching using the asset information, if the rule matching is unsuccessful; and inserting assets obtained in the fuzzy matching into an asset database, if the fuzzy matching is successful.
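The fallback order in this abstract (database lookup, then rule matching, then fuzzy matching, then insertion) can be sketched in Python as below. The data structures are assumptions for illustration: the asset information database is a dict keyed by asset name, the rule library is a list of (regular expression, asset type) pairs, and the asset database is a list. Fuzzy matching uses the standard library's `difflib.get_close_matches`, and the `cutoff` value is arbitrary.

```python
import difflib
import re


def manage_asset(name, info_db, rules, asset_db, cutoff=0.8):
    """Resolve one asset name through lookup, rule matching, then fuzzy matching."""
    # 1. Exact search in the asset information database.
    if name in info_db:
        return "found", info_db[name]

    # 2. Rule matching against the rule library.
    for pattern, asset_type in rules:
        if re.search(pattern, name):
            return "rule", asset_type

    # 3. Fuzzy matching against known asset names.
    close = difflib.get_close_matches(name, list(info_db), n=1, cutoff=cutoff)
    if close:
        asset_type = info_db[close[0]]
        # 4. Insert the asset obtained by fuzzy matching into the asset database.
        asset_db.append({"name": name, "type": asset_type, "matched_to": close[0]})
        return "fuzzy", asset_type

    return "unmatched", None


def manage_assets(request, info_db, rules, asset_db):
    """Process every asset name in an asset management request."""
    return {name: manage_asset(name, info_db, rules, asset_db) for name in request}
```

Each stage runs only when the previous one fails, so the cheaper exact and rule-based checks handle most names and the fuzzy comparison is reserved for the remainder.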