Patent classifications
G06F16/31
Machine-learning model for resource assessments
A centralized system may collect and aggregate assessments from multiple websites. An aggregate score may be calculated for the resource that cumulatively considers assessments from a plurality of different websites from which assessments are received from users. Text descriptions associated with each of the assessments may be provided to a machine-learning system that uses a trained model to assign identifiers to the assessments as they are received. These identifiers may include common words or text that are descriptive of different facets of user experiences related to receiving and using the resource. After selecting one or more identifiers, assessments associated with that identifier may be included or excluded from the display. Additionally, the overall aggregate score for the resource may be recalculated by removing components of that score that are based on assessments with identifiers that have been selected for exclusion.
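The recalculation step described above can be sketched as follows. This is a minimal illustration, not the patented method: it assumes the aggregate score is a plain mean, and the `Assessment` class, the `aggregate_score` function, and the sample data are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Assessment:
    site: str                      # website the assessment came from
    score: float                   # numeric rating given by the user
    identifiers: set = field(default_factory=set)  # labels assigned by the model


def aggregate_score(assessments, excluded=()):
    """Mean score over assessments that carry none of the excluded identifiers.

    Assumption: the aggregate is an unweighted mean. Returns None when every
    assessment has been excluded.
    """
    kept = [a for a in assessments if a.identifiers.isdisjoint(excluded)]
    if not kept:
        return None
    return sum(a.score for a in kept) / len(kept)


reviews = [
    Assessment("site-a", 5.0, {"quality"}),
    Assessment("site-b", 1.0, {"shipping"}),
    Assessment("site-c", 4.0, {"quality", "price"}),
]

overall = aggregate_score(reviews)
without_shipping = aggregate_score(reviews, excluded={"shipping"})
```

Excluding the "shipping" identifier drops the single low score, so the recalculated aggregate is based only on the two remaining assessments.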
LDAP query optimization with smart index selection
The present disclosure relates generally to Lightweight Directory Access Protocol (LDAP), and more particularly, to techniques for improving query performance on an LDAP server. One particular technique includes receiving an LDAP query having search criteria, identifying one or more search filters within the search criteria, determining candidate indices based on the identified one or more search filters, evaluating the candidate indices based on statistics collected for the candidate indices, selecting one or more indices from the candidate indices based on the evaluating, and executing the LDAP query on an LDAP directory using the selected one or more indices.
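As a rough sketch of the selection step, the snippet below extracts attribute names from an LDAP filter string, keeps those that have an index, and ranks them by an estimated match count. The statistics format (`index_stats` mapping an attribute to an estimated number of matching entries), the function names, and the "fewest matches wins" rule are assumptions made for illustration; the abstract does not say which statistics or ranking rule are used.

```python
import re

# Matches "(attr=", "(attr~=", "(attr>=" and "(attr<=" inside a filter string.
ATTR_PATTERN = re.compile(r"\(\s*([A-Za-z][A-Za-z0-9;.-]*)\s*(?:=|~=|>=|<=)")


def filter_attributes(ldap_filter):
    """Return the attribute names used in the filter, in order, without repeats."""
    return list(dict.fromkeys(ATTR_PATTERN.findall(ldap_filter)))


def select_indices(ldap_filter, index_stats, max_indices=1):
    """Pick the indexed attributes expected to match the fewest entries.

    index_stats is a hypothetical {attribute: estimated matching entries} map.
    """
    candidates = [a for a in filter_attributes(ldap_filter) if a in index_stats]
    return sorted(candidates, key=lambda a: index_stats[a])[:max_indices]


query = "(&(objectClass=person)(uid=jdoe)(description=*admin*))"
stats = {"objectClass": 50000, "uid": 1}
chosen = select_indices(query, stats)
```

Here `description` is not a candidate because no index exists for it, and `uid` is preferred over `objectClass` because it is far more selective.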
Identifying similar documents in a file repository using unique document signatures
Methods, systems, and non-transitory computer readable storage media are disclosed for determining clusters of similar digital documents using unique document signatures. Specifically, the disclosed system processes digital text in a digital document to tokenize character strings (e.g., words) in the digital document by combining a subset of character values and string lengths in the character strings. Additionally, the disclosed system generates a document signature for the digital document by combining subsets of tokens generated for the digital document into a token sequence indicative of the digital text in the digital document. The disclosed system determines a cluster of similar digital documents including the digital document by comparing the document signature of the digital document to document signatures corresponding to a plurality of digital documents.
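One way to picture the signature idea is the sketch below. It assumes a token is built from a word's first character, its length, and its last character, and that the signature is the first few tokens of the longer words; both choices, along with the function names and the use of `difflib.SequenceMatcher` for comparison, are illustrative assumptions rather than the scheme the disclosure actually claims.

```python
import re
from difflib import SequenceMatcher


def tokenize(word):
    """Combine a subset of character values with the string length."""
    return f"{word[0]}{len(word)}{word[-1]}"


def signature(text, min_len=4, max_tokens=32):
    """Token sequence built from words of at least min_len characters."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [tokenize(w) for w in words if len(w) >= min_len]
    return tuple(tokens[:max_tokens])


def similarity(sig_a, sig_b):
    """Ratio in [0, 1]; 1.0 means the two token sequences are identical."""
    return SequenceMatcher(None, sig_a, sig_b).ratio()


sig_a = signature("The quick brown fox")
sig_b = signature("the QUICK, brown fox!")
```

Documents whose signatures score above a chosen similarity threshold could then be grouped into the same cluster.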
Contextual span framework
A phrase is received that includes a trigger word that modifies the meaning of other words within the phrase. The trigger word is identified. The words of the phrase that are modified by the trigger word are identified by analyzing features of the phrase that link the trigger word to other words. The phrase is interpreted by modifying the identified words according to the modification of the trigger word.
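A toy version of span detection is sketched below using negation as the modification. It swaps in a simple heuristic (the span runs from the trigger word to the next boundary word) in place of the feature analysis the abstract mentions; the `TRIGGERS` and `BOUNDARIES` sets and the function name are hypothetical.

```python
TRIGGERS = {"no", "not", "without", "denies"}            # assumed trigger words
BOUNDARIES = {"but", "and", "however", ",", ".", ";"}    # assumed span terminators


def modified_span(tokens):
    """Return (trigger_index, indices of tokens the trigger modifies).

    Heuristic only: the span is every token after the first trigger word
    up to, but not including, the next boundary token.
    """
    for i, tok in enumerate(tokens):
        if tok.lower() in TRIGGERS:
            span = []
            for j in range(i + 1, len(tokens)):
                if tokens[j].lower() in BOUNDARIES:
                    break
                span.append(j)
            return i, span
    return None, []


tokens = "patient denies chest pain but reports nausea".split()
trigger, span = modified_span(tokens)
negated = [tokens[j] for j in span]
```

In this example only "chest pain" falls inside the span, so "nausea" is left unmodified when the phrase is interpreted.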
Method for automatically indexing an electronic document
Generating unique document identifiers from content within a selected page region is disclosed. A selection of a first region within a first page of a document is received from a user, and is defined by a set of first boundaries relative to the first page. A text string of a first base selection page content within the first region is retrieved from the first page. Then the retrieved text string is assigned to a page location index associated with the first page. A text string of a first replicated selection page content is retrieved from a second page. The first replicated selection page content is included in the same first region defined by the set of first boundaries relative to the second page. The retrieved text string of the first replicated selection page content is assigned to a page location index of the second page.
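The region replication described above can be illustrated with a short sketch. It assumes each page is already available as a list of `(text, x, y)` word positions and that a region is a `(left, top, right, bottom)` rectangle; the data layout, function names, and sample pages are hypothetical.

```python
def text_in_region(page_words, region):
    """Join the words whose position falls inside the region, in page order."""
    left, top, right, bottom = region
    return " ".join(
        text for text, x, y in page_words
        if left <= x <= right and top <= y <= bottom
    )


def index_pages(pages, region):
    """Apply the same region to every page and map page number -> extracted text."""
    return {page_no: text_in_region(words, region) for page_no, words in pages.items()}


pages = {
    1: [("Invoice", 10, 10), ("INV-001", 80, 12), ("Total", 10, 200)],
    2: [("Invoice", 10, 10), ("INV-002", 80, 12)],
}
region = (70, 0, 120, 20)   # selected once by the user on page 1
page_index = index_pages(pages, region)
```

The region is chosen once on the first page and then reused unchanged on the second page, which is what yields a distinct index value per page.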
Enhanced natural language query segment tagging
Computer-implemented techniques are disclosed for enhanced tagging of natural language queries that are initially segmented and tagged by a named entity recognition system. The enhanced tagging represents a deeper understanding of the query. It improves the operation of search engines that use the enhanced tags by enabling them to identify and return more relevant search results in answer to natural language queries.
DNA alignment using a hierarchical inverted index table
System and method for constructing a hierarchical index table usable for matching a search sequence to reference data. The index table may be constructed to contain entries associated with an exhaustive list of all subsequences of a given length, wherein each entry contains the number and locations of matches of each subsequence in the reference data. The hierarchical index table may be constructed in an iterative manner, wherein entries for each lengthened subsequence are selectively and iteratively constructed based on the number of matches being greater than each of a set of respective thresholds. The hierarchical index table may be used to search for matches between a search sequence and reference data, and to perform misfit identification and characterization upon each respective candidate match.
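The iterative construction can be sketched as below. Two simplifications are assumed: the table only holds subsequences that actually occur in the reference (rather than an exhaustive list that includes zero-match entries), and a subsequence is lengthened by one character whenever its match count exceeds the threshold for that level. The function names and the `thresholds` parameter layout are hypothetical.

```python
def build_level(reference, k):
    """Map every length-k subsequence found in the reference to its start positions."""
    table = {}
    for pos in range(len(reference) - k + 1):
        table.setdefault(reference[pos:pos + k], []).append(pos)
    return table


def build_hierarchical_index(reference, base_k, thresholds):
    """Extend entries by one character per level while they match too often.

    thresholds[i] is the match-count limit applied to entries of length base_k + i.
    """
    index = build_level(reference, base_k)
    frontier = dict(index)
    for step, threshold in enumerate(thresholds):
        k = base_k + step + 1
        next_frontier = {}
        for positions in frontier.values():
            if len(positions) <= threshold:
                continue  # selective enough, no need to lengthen
            for pos in positions:
                if pos + k <= len(reference):
                    next_frontier.setdefault(reference[pos:pos + k], []).append(pos)
        index.update(next_frontier)
        frontier = next_frontier
    return index


index = build_hierarchical_index("ACGTACGA", base_k=2, thresholds=[1])
```

With this input, "AC" and "CG" each occur twice and are therefore lengthened to three characters, while the single-match entries stay at length two.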
Parallelization of electronic discovery document indexing
A system and method for parallelizing document indexing in a data processing system. The data processing system includes a primary processor for receiving a list of data having embedded data associated therewith, at least one secondary processor to process the data as provided by the primary processor, a data processor to determine a characteristic of the embedded data and process the embedded data based upon the characteristic, and a messaging module to exchange at least one status message between the primary processor and the at least one secondary processor.
Method and system for generating conversation summary
Methods and systems for generating and using a conversation summary model. The method comprises receiving at least one training dataset. The at least one training dataset comprises data samples, each data sample comprising a text comprising text segments. The text is labelled with a conversation summary comprising any of the text segments which summarize the text. The at least one training dataset includes a dataset from a specific source. The method further comprises generating the conversation summary model by fine-tuning a pre-trained model using the at least one training dataset. The generated conversation summary model may be used to generate conversation summaries for chat conversations.
Look-up table initialize
A digital data processor includes an instruction memory storing instructions specifying a data processing operation and a data operand field, an instruction decoder coupled to the instruction memory for recalling instructions from the instruction memory and determining the operation and the data operand, and an operational unit coupled to a data register file and to the instruction decoder to perform a data processing operation upon an operand corresponding to an instruction decoded by the instruction decoder and to store results of the data processing operation. The operational unit is configured to perform a table write in response to a look up table initialization instruction by duplicating at least one data element from a source data register to create duplicated data elements, and writing the duplicated data elements to a specified location in a specified number of at least one table and a corresponding location in at least one other table.