Patent classifications
G06F16/2237
Encoding and decoding tree data structures as vector data structures
Systems, computer-implemented methods, and computer program products that can facilitate encoding a tree data structure into a vector based on a set of constraints are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a constraint former that can form a set of constraints based on a first tree data structure and a vector encoder that can encode the first tree data structure into a vector based on the set of constraints.
AGGREGATING METRICS IN DISTRIBUTED FILE SYSTEMS
Embodiments are directed to managing file systems over a network. A hierarchical index may be provided based on a file system and a plurality of objects stored in the file system. A token index may be generated based on the hierarchical index. Each token may be a portion of the path of the objects. Metric indices may be generated based on the hierarchical index and a plurality of metrics associated with the objects such that the metric indices include one or more rows that each correspond to a place position for a metric value. The token index and the metric indices may be employed to generate query results based on the plurality of metrics associated with the objects.
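The token index and place-position metric rows described above can be sketched as bitmaps. This is only an illustration under stated assumptions: the abstract does not say that "place position" means a binary place, and the function names (`build_token_index`, `build_metric_index`, `sum_metric`) and sample paths are hypothetical.

```python
def build_token_index(paths):
    """Map each path token to an int bitmap; bit i set means object i's path has the token."""
    index = {}
    for i, path in enumerate(paths):
        for token in path.strip("/").split("/"):
            index[token] = index.get(token, 0) | (1 << i)
    return index


def build_metric_index(values):
    """One bitmap row per binary place (assumed): row p has bit i set if bit p of values[i] is 1."""
    places = max(values).bit_length() if values else 0
    return [
        sum(((v >> p) & 1) << i for i, v in enumerate(values))
        for p in range(places)
    ]


def sum_metric(metric_rows, selection):
    """Sum the metric over the selected objects using only bitmap ANDs and popcounts."""
    return sum(
        (1 << p) * bin(row & selection).count("1")
        for p, row in enumerate(metric_rows)
    )


# Hypothetical objects and a per-object size metric.
paths = ["/home/a/x.txt", "/home/b/y.txt", "/var/log/z.log"]
sizes = [5, 3, 10]

tokens = build_token_index(paths)
size_rows = build_metric_index(sizes)
home_total = sum_metric(size_rows, tokens["home"])  # total size under "home"
```

The aggregate is computed without visiting individual objects: each row contributes its place value times the number of selected objects with that bit set.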
Technologies for tuning performance and/or accuracy of similarity search using stochastic associative memories
Technologies for tuning performance and/or accuracy of similarity search using stochastic associative memories (SAM). Under a first subsampling approach, columns associated with set bits in a search key comprising a binary bit vector are subsampled. Matching set bits for the subsampled columns are aggregated on a row-wise basis to generate similarity scores, which are then ranked. A similar scheme is applied for all the columns with set bits in the search key, and the results for top-ranked rows are compared to evaluate the tradeoff between throughput boost and lost accuracy. Under a second approach, called continuous column read, an iterative approach is employed that continuously scores the rows as each new column read completes. The similarity scores for the (N-1)th and Nth iterations are ranked, a rank correlation is calculated, and a determination is made as to whether the rank correlation meets or exceeds a threshold.
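The first (subsampling) approach can be sketched in software as follows. This is a minimal illustration, not the SAM hardware method: the function names, the `fraction` and `seed` parameters, and the top-k overlap measure used to compare the two rankings are all assumptions.

```python
import random


def similarity_scores(rows, key, columns):
    """Per row, count the given columns where both the row and the key have a set bit."""
    return [sum(row[c] & key[c] for c in columns) for row in rows]


def top_k(scores, k):
    """Indices of the k highest-scoring rows; ties go to the lower row index."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def subsampled_search(rows, key, fraction, k, seed=0):
    """Rank rows using all set-bit columns and a random subsample, then compare."""
    set_cols = [c for c, bit in enumerate(key) if bit]
    n = min(len(set_cols), max(1, int(len(set_cols) * fraction)))
    sampled = random.Random(seed).sample(set_cols, n)

    full = top_k(similarity_scores(rows, key, set_cols), k)
    approx = top_k(similarity_scores(rows, key, sampled), k)
    # Assumed accuracy measure: share of the full top-k kept by the subsample.
    overlap = len(set(full) & set(approx)) / k
    return full, approx, overlap


rows = [[1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 1, 1]]
key = [1, 0, 1, 1]
full, approx, overlap = subsampled_search(rows, key, fraction=1.0, k=2)
```

Lowering `fraction` reads fewer columns per query, and `overlap` gives one rough view of how much ranking accuracy is lost in exchange.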
REDUCING MATCHING DOCUMENTS FOR A SEARCH QUERY
The technology described herein provides for identifying matching documents for a search query using a bit vector search index. When a search query is received, a term is identified from the search index, and a number of bit vectors corresponding to the term are identified. Each bit vector comprises an array of bits in which at least one bit in each bit vector indicates that a corresponding document includes the term. Each bit vector also includes other bits indicating other documents include other terms. A determination is made that an unacceptable number of possible matching documents is likely to be returned. In response to the determination, a strengthening row bit vector is selected to reduce the number of possible matching documents. The identified bit vectors and the selected strengthening row are intersected to identify matching documents that contain the term.
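The intersection step above can be sketched with Python integers standing in for bit vectors. The names are hypothetical, and the check on the candidate count is a simplification: the abstract describes predicting that too many documents are likely to match, whereas this sketch simply counts them after intersecting.

```python
def intersect(rows):
    """AND together bit-vector rows; bit i set means document i may match."""
    result = rows[0]
    for row in rows[1:]:
        result &= row
    return result


def doc_ids(bits):
    """List the document indices whose bit is set."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


def candidates(term_rows, strengthening_row, max_candidates):
    """Intersect the term's rows; add the strengthening row if too many remain."""
    bits = intersect(term_rows)
    if bin(bits).count("1") > max_candidates:  # assumed threshold test
        bits &= strengthening_row
    return doc_ids(bits)


term_rows = [0b101111, 0b001111]   # rows shared with other terms, hence noisy
strengthening = 0b000101           # hypothetical extra row for the same term
narrowed = candidates(term_rows, strengthening, max_candidates=2)
unchanged = candidates(term_rows, strengthening, max_candidates=10)
```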
Set operations using multi-core processing unit
Performing set operations using sparse matrix operations offered by a multi-core processing unit (such as a graphics processing unit). The set operation is converted into operand matrices and sparse matrix operations, forgoing the use of hash tables. The input set is converted into a matrix, a matrix operation corresponding to the set operation is identified, and one or more operands of the set operation are also represented within a matrix. The matrix operation is then performed on these matrices to obtain an output matrix, which is then converted to an output set.
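As a rough single-threaded sketch of the idea, set intersection can be expressed as an elementwise product of sparse 0/1 indicator vectors. Everything here is an assumption for illustration: the sorted-index sparse representation, the function names, and the choice of intersection as the set operation. A GPU implementation would use a sparse matrix library instead.

```python
from bisect import bisect_left


def to_sparse(items, universe):
    """Indicator vector of `items` over the sorted `universe`, kept as sorted nonzero indices."""
    return sorted(bisect_left(universe, x) for x in items)


def sparse_multiply(a, b):
    """Elementwise product of two sparse 0/1 vectors: nonzero only where both are nonzero."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def from_sparse(indices, universe):
    """Convert nonzero indices back to the output set."""
    return {universe[i] for i in indices}


universe = [1, 2, 3, 5, 8]                 # sorted; every input element must appear here
left = to_sparse({1, 3, 8}, universe)
right = to_sparse({3, 5, 8}, universe)
result = from_sparse(sparse_multiply(left, right), universe)
```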
Systems and methods for removing identifiable information
Systems and methods for censoring text characters in text-based data are provided. In some embodiments, an artificial intelligence system may be configured to receive text-based data and store the text-based data in a database. The artificial intelligence system may be configured to receive a list of target pattern types identifying sensitive data and receive censorship rules for the target pattern types determining target pattern types requiring censorship. The artificial intelligence system may be configured to assemble a computer-based model related to a received target pattern type in the list of target pattern types. The artificial intelligence system may be configured to use a computer-based model to identify a target data pattern corresponding to the received target pattern type within the text-based data, identify target characters within the target data pattern, and to assign an identification token to the target characters.
In-document search method and device for query
The present invention relates to an in-document search method and device for a query vector, and an object of the present invention is to improve the accuracy of a response by generating sentence data corresponding to data in table form stored in a database. The in-document search method for a query vector includes a step A of receiving a user query from a user terminal, a step B of generating a user query vector for the user query, a step C of extracting candidate table data based on the user query vector in a data storage module, a step D of searching for a response corresponding to the user query vector in the candidate table data, and a step E of providing the response to the user terminal.
APPARATUSES, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR GENERATING AND PRESENTING COMPONENT SCORE INTERFACES WITHIN A MULTI-COMPONENT SYSTEM
Methods, apparatuses, or computer program products provide a component score interface within a multi-component system. A component metadata vector associated with a first component identifier may be retrieved from a component metadata repository. One or more component metadata values are extracted from the component metadata vector based at least in part on a first component score type. A first component score may be generated based at least in part on the one or more component metadata values extracted from the component metadata vector. Additionally, the first component score generation may comprise applying a unique weight value to each component metadata value in accordance with the first component score type. A component score interface comprising instructions for rendering a first component score interface element representing the first component score may be generated. Furthermore, the component score interface may be transmitted to a first client device.
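The weighted scoring step can be sketched as below. The score type name, metadata field names, and weights are hypothetical, and the metadata vector is modeled as a dictionary for readability.

```python
# Hypothetical score types: each selects metadata fields and gives each its own weight.
SCORE_TYPES = {
    "reliability": {"uptime": 0.7, "incident_rate": 0.3},
}


def component_score(metadata, score_type):
    """Extract the fields the score type names and combine them with their weights."""
    weights = SCORE_TYPES[score_type]
    return sum(weight * metadata[name] for name, weight in weights.items())


# Fields not named by the score type (here "owner_count") are ignored.
metadata = {"uptime": 0.9, "incident_rate": 0.5, "owner_count": 3}
score = component_score(metadata, "reliability")
```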
Automatically suggesting completions of text
A user may respond to a request of another user by entering text, such as a customer service representative responding to a customer. As the user enters text, a completion of the text may be suggested to the user so that the user may select the suggested completion instead of continuing to enter text. Previous messages between the two users and other information may be used to determine an appropriate suggested completion to the entered text. A neural network language model and a search graph of possible suggested completions may be used to select a suggested completion.
Database key identification
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for database key identification. One of the methods includes receiving an identification of a first field in a first data set, the first data set including records. The method includes identifying a set of values, the set including, for each record, a value associated with the field. The method includes generating a filter mask based on the set of values, where application of the filter mask is capable of determining that a given value is not in the set of values. The method includes receiving a second data set including a second field, the second data set including records. The method includes determining a count of a number of records in the second data set having a value associated with the second field that passes the filter mask. The method also includes storing the count in a profile.
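A filter mask that can rule values out but never wrongly rejects a present value behaves like a Bloom filter, so the steps above can be sketched with one. The abstract does not name the filter type, so the Bloom filter, the hashing scheme, the `size` and `hashes` parameters, and the field names are all assumptions.

```python
import hashlib


def _positions(value, size, hashes):
    """Derive `hashes` bit positions for a value (assumed hashing scheme)."""
    for seed in range(hashes):
        digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % size


def build_mask(values, size=1024, hashes=3):
    """Set the bits for every value of the first field."""
    mask = 0
    for value in values:
        for pos in _positions(value, size, hashes):
            mask |= 1 << pos
    return mask


def passes(mask, value, size=1024, hashes=3):
    """False means the value is definitely absent; True means it may be present."""
    return all((mask >> pos) & 1 for pos in _positions(value, size, hashes))


def count_passing(mask, records, field, size=1024, hashes=3):
    """Count records in the second data set whose field value passes the mask."""
    return sum(passes(mask, rec[field], size, hashes) for rec in records)


first = [{"id": 1}, {"id": 2}, {"id": 3}]
second = [{"cust": 1}, {"cust": 3}, {"cust": 3}]
mask = build_mask(rec["id"] for rec in first)
profile = {"cust_matches_id": count_passing(mask, second, "cust")}
```

A count close to the size of the second data set hints that its field may reference the first field as a key. Because the mask can produce false positives, the count is an upper bound on the true number of matches.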