G06F16/2468

PROBABILISTIC INDICES FOR ACCESSING AUTHORING STREAMS
20220318294 · 2022-10-06

In various implementations, streams are sent and received by a computing device engaged in an authoring session with respect to an electronic document. The computing device stores the streams in multiple container files associated with an instance of the electronic document open on the computing device. The device maintains an indices file to reflect a presence of the streams in the container files, such that fast access can be provided to the streams at a later time. The indices file comprises multiple probabilistic data structures corresponding to the container files that each indicate on a probabilistic basis whether a given stream is present in a corresponding one of the container files. The computing device uses the indices file to retrieve the streams from the container files.
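
The abstract does not name the probabilistic data structure, so the following is only a minimal sketch that assumes a Bloom filter per container file. `BloomFilter`, `StreamStore`, the dict-based stand-in for container files, and all sizes are hypothetical names and values chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class StreamStore:
    """Hypothetical store: each container file is modeled as a dict,
    and the 'indices file' as one Bloom filter per container."""

    def __init__(self, num_containers=4):
        self.containers = [{} for _ in range(num_containers)]
        self.indices = [BloomFilter() for _ in range(num_containers)]

    def put(self, container_id, stream_id, payload):
        self.containers[container_id][stream_id] = payload
        self.indices[container_id].add(stream_id)

    def get(self, stream_id):
        # Only open containers whose filter says the stream may be present;
        # the real lookup then weeds out any false positive.
        for container, index in zip(self.containers, self.indices):
            if index.might_contain(stream_id) and stream_id in container:
                return container[stream_id]
        return None
```

In this sketch a lookup skips every container whose filter rules the stream out, which is where the fast access described above would come from.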

Efficient similarity search
11645292 · 2023-05-09

A system for measuring similarity between a binary query vector and a plurality of binary candidate vectors includes a storage unit and a processor. The storage unit stores the binary query vector and the plurality of candidate vectors, and the processor performs Tanimoto calculations in terms of Hamming distances. The processor includes a Tanimoto to Hamming threshold converter, a Hamming measurer, and a Hamming comparator. The Tanimoto to Hamming threshold converter converts a Tanimoto threshold into a Hamming threshold. The Hamming measurer measures the Hamming distances between the candidate vectors and the query vector. The Hamming comparator selects candidate vectors whose Hamming distance from the query vector is less than or equal to the Hamming threshold.
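
One way to see why a Tanimoto threshold can be turned into a Hamming threshold: for binary vectors with a and b set bits and c bits in common, the Tanimoto coefficient is c / (a + b - c) and the Hamming distance is a + b - 2c. It follows that Tanimoto >= t exactly when Hamming <= (a + b)(1 - t) / (1 + t). The sketch below uses that identity; it is an illustration under this assumption, not necessarily the conversion the patent claims, and the function names are hypothetical.

```python
from fractions import Fraction


def popcount(x):
    """Number of set bits in a non-negative integer used as a bit vector."""
    return bin(x).count("1")


def hamming_threshold(a, b, t):
    """Largest Hamming distance for which Tanimoto >= t still holds,
    given popcounts a and b (derived from the identity in the lead-in)."""
    t = Fraction(t)  # exact arithmetic avoids float rounding at the boundary
    return (a + b) * (1 - t) / (1 + t)


def select_candidates(query, candidates, t):
    """Keep candidates whose Hamming distance to the query is within
    the per-candidate threshold converted from the Tanimoto threshold t."""
    query_bits = popcount(query)
    selected = []
    for cand in candidates:
        bound = hamming_threshold(query_bits, popcount(cand), t)
        if popcount(query ^ cand) <= bound:  # XOR popcount = Hamming distance
            selected.append(cand)
    return selected
```

Because the bound depends on the popcount of each candidate, this sketch recomputes it per candidate; a hardware design could precompute it per popcount value instead.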

Query execution utilizing probabilistic indexing
11645273 · 2023-05-09

A method for execution by at least one processor of a database system includes indexing a first column via a probabilistic indexing scheme. An IO pipeline that includes a probabilistic index-based IO construct for access of the first column is determined based on a query including a query predicate indicating the first column. The probabilistic index-based IO construct is applied in conjunction with execution of the query via the IO pipeline by applying an index element of the probabilistic index-based IO construct to identify a first subset of rows based on index data of the probabilistic indexing scheme for the first column. A filter element of the probabilistic index-based IO construct is applied to identify ones of a first subset of the plurality of column values corresponding to the first subset of rows that compare favorably to the query predicate.
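
The index-then-filter pattern described above can be illustrated with a lossy hash-bucket index: the index element returns a superset of matching rows, and the filter element removes false positives by checking the actual column values. This is a minimal sketch; the bucket scheme, `ProbabilisticColumnIndex`, and `run_equality_query` are assumptions for illustration and are not taken from the patent.

```python
import zlib
from collections import defaultdict


class ProbabilisticColumnIndex:
    """Lossy index: distinct values may share a bucket (false positives),
    but every row holding a value is always in that value's bucket."""

    def __init__(self, values, num_buckets=8):
        self.num_buckets = num_buckets
        self.buckets = defaultdict(list)
        for row_id, value in enumerate(values):
            self.buckets[self._bucket(value)].append(row_id)

    def _bucket(self, value):
        # crc32 is deterministic across runs, unlike the built-in str hash.
        return zlib.crc32(repr(value).encode()) % self.num_buckets

    def candidate_rows(self, value):
        return self.buckets.get(self._bucket(value), [])


def run_equality_query(values, index, target):
    # Index element: a first subset of rows that may satisfy the predicate.
    candidates = index.candidate_rows(target)
    # Filter element: read the column values and keep only true matches.
    return [row for row in candidates if values[row] == target]
```

With very few buckets most rows collide, so the filter step does nearly all the work; with many buckets the index step prunes most rows before any column value is read.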

RELATING COLLECTIONS IN AN ITEM UNIVERSE

Disclosed are various embodiments for identifying related collections of items within an item universe. Related collections of items can be identified based upon title similarity or a degree of overlap between collections of items. Additionally, relationships between collections of items can be generated if the collections have identical or nearly identical collection titles.
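
As a rough illustration of the two signals named above, the sketch below measures item overlap as a Jaccard ratio and title similarity with Python's `difflib.SequenceMatcher`. The abstract does not specify either measure or any threshold, so the functions, the dict layout of a collection, and the cutoff values are all assumptions.

```python
from difflib import SequenceMatcher


def overlap(items_a, items_b):
    """Jaccard overlap between two collections of item identifiers."""
    a, b = set(items_a), set(items_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def title_similarity(title_a, title_b):
    """Similarity in [0, 1] after trimming whitespace and lowercasing."""
    return SequenceMatcher(
        None, title_a.strip().lower(), title_b.strip().lower()
    ).ratio()


def are_related(coll_a, coll_b, min_overlap=0.5, min_title=0.9):
    """Relate two collections if their titles are nearly identical
    or if enough of their items overlap (thresholds are hypothetical)."""
    return (
        title_similarity(coll_a["title"], coll_b["title"]) >= min_title
        or overlap(coll_a["items"], coll_b["items"]) >= min_overlap
    )
```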

Intelligent Cascading Linkage Machine for Fuzzy Matching in Complex Computing Networks

This disclosure is directed to an intelligent cascading linkage machine that transforms input signals into comparable signals and cascades through matching operations, including but not limited to a fuzzy matching comparison technique, to link the transformed input signals (comparable signals) to the stored signals in a database that match them. The fuzzy matching technique may use a random forest processing technique and/or a logistic regression technique. The machine can also calibrate its matching technique, based on the linking of a comparable or input signal to a stored signal in a database, in order to calculate an accuracy indicator.

PERSONAL DATA ASSOCIATION METHOD
20230205790 · 2023-06-29

A computer-implemented method of identifying an individual independently of the individual’s personally identifying information includes providing independent data stores for elements of personal identifying information for a population and fuzzy searching the data stores independently for the elements. Each data store associates each element value and its known variations with a unique static code. The search returns the unique static code associated with each of the elements found and a new independent code is generated if no code is found. The returned codes are concatenated to form a person code. The person codes link information to produce a relationship between disparate data without a master database of people and PII.
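
The flow described above (independent stores, fuzzy lookup per element, a new code when nothing is found, then concatenation) can be sketched as follows. This is an assumption-laden illustration: `ElementStore`, the code format, the `-` separator, and the use of `difflib.get_close_matches` as the fuzzy search are all hypothetical choices, not details from the patent.

```python
import difflib
import itertools


class ElementStore:
    """One independent store per PII element (e.g. first name).
    Every known variant of a value maps to the same static code."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.code_by_variant = {}
        self._counter = itertools.count(1)

    def add_variants(self, variants):
        code = f"{self.prefix}{next(self._counter):04d}"
        for variant in variants:
            self.code_by_variant[variant.lower()] = code
        return code

    def lookup(self, value, cutoff=0.8):
        # Fuzzy search over known variants; cutoff is a hypothetical setting.
        matches = difflib.get_close_matches(
            value.lower(), list(self.code_by_variant), n=1, cutoff=cutoff
        )
        if matches:
            return self.code_by_variant[matches[0]]
        # No code found: generate a new independent code for this value.
        return self.add_variants([value])


def person_code(stores, elements):
    """Concatenate the per-element codes into a single person code."""
    return "-".join(stores[name].lookup(value) for name, value in elements.items())
```

Note that the person code contains only opaque element codes, so records can be linked on it without any store holding a complete identity.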

Data retrieving apparatus, method, and program

A data search apparatus according to an embodiment includes: an input unit; and a storage apparatus configured to store master data names managed with master data. The data search apparatus calculates edit distances between master data names stored in the storage apparatus and input data names input in the input unit, calculates degrees of similarity between the master data names and the input data names based on term frequency and inverse document frequency of the master data names and the input data names, performs processing for narrowing down candidates for the data name being searched for in the master data names based on the calculation results and adjacency information indicating adjacency relationships between the master data names and the input data names, and outputs information indicating correspondence between the master data names and the input data names based on the candidate for the data name being searched for, the candidate for the data name being obtained through the narrowing-down processing.
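
A minimal sketch of combining the two measures named above, edit distance and TF-IDF similarity, to narrow down candidates. The adjacency information mentioned in the abstract is omitted, and the word-level tokenization, the smoothed IDF formula, the `max_edits` cutoff, and the ranking rule are assumptions made for illustration only.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def tfidf_vectors(names):
    """One sparse TF-IDF vector (dict) per name, using a smoothed IDF."""
    docs = [Counter(name.lower().split()) for name in names]
    doc_freq = Counter(term for doc in docs for term in doc)
    n = len(docs)
    return [
        {t: tf * (math.log((1 + n) / (1 + doc_freq[t])) + 1) for t, tf in doc.items()}
        for doc in docs
    ]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def best_match(input_name, master_names, max_edits=3):
    """Narrow candidates by either measure, then rank by similarity
    (ties broken by smaller edit distance)."""
    vectors = tfidf_vectors(master_names + [input_name])
    query_vec = vectors[-1]
    scored = []
    for name, vec in zip(master_names, vectors):
        dist = edit_distance(input_name.lower(), name.lower())
        sim = cosine(query_vec, vec)
        if dist <= max_edits or sim > 0:
            scored.append((sim, -dist, name))
    return max(scored)[2] if scored else None
```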

Standardization of addresses and location information

Computer program products, methods, systems, apparatus, and computing entities are provided for standardizing addresses and providing information associated with geographic areas/points of interest. For example, location data can be collected for serviceable points. From the collected location data, addresses can be standardized, location-based searches can be performed, correct locations of serviceable points can be confirmed, and geographic representations can be generated.

Algorithm for the non-exact matching of large datasets

A two-step algorithm for conducting near real-time fuzzy searches of a target on one or more large data sets is described. The first step simplifies the data by removing grammatical constructs, reducing the target search term (and the stored database) to their base elements, and then performs a Levenshtein comparison to create a subset of the data set that may contain a match. The second step applies a scoring algorithm while comparing the target to that subset to identify any matches.
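
The two steps can be sketched roughly as below. The abstract does not say which grammatical constructs are removed or how the score is computed, so the stop-word list, the distance cutoff, and the normalized-similarity score are hypothetical stand-ins.

```python
import re

# Hypothetical list of "grammatical constructs" to strip during simplification.
STOP_WORDS = {"the", "a", "an", "of", "and", "inc", "llc", "ltd"}


def simplify(text):
    """Lowercase, drop punctuation and stop words, keep the base elements."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(w for w in words if w not in STOP_WORDS)


def levenshtein(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fuzzy_search(target, records, max_distance=2, min_score=0.8):
    key = simplify(target)
    # Step 1: simplify every record and keep those within max_distance edits.
    subset = []
    for record in records:
        base = simplify(record)
        distance = levenshtein(key, base)
        if distance <= max_distance:
            subset.append((record, base, distance))
    # Step 2: score the subset (assumed score: 1 - distance / longer length).
    results = []
    for record, base, distance in subset:
        score = 1 - distance / max(len(key), len(base), 1)
        if score >= min_score:
            results.append((record, score))
    return sorted(results, key=lambda pair: -pair[1])
```

In a real deployment the stored records would be simplified once ahead of time rather than on every query, which is what makes the near real-time claim plausible for large data sets.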