G06F16/35

Generating and visualizing bias scores representing bias in digital segments within segment-generation-user interfaces
11556567 · 2023-01-17

This disclosure relates to methods, non-transitory computer readable media, and systems that generate and visualize bias scores within segment-generation-user interfaces prior to executing proposed actions with regard to target segments. For example, the disclosed systems can generate a bias score indicating a measure of bias for a characteristic within a segment of users selected for a proposed action and visualize the bias score and corresponding characteristic in a segment-generation-user interface. In some implementations, the disclosed systems can further integrate detecting and visualizing bias as a bias score with selectable options for a segmentation-bias system to generate and modify segments of users to reduce detected bias.
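The abstract does not say how the bias score is computed. As a minimal sketch, assume the score is the largest absolute gap between a characteristic value's share in the selected segment and its share in the overall user population. The function names, the `age_group` characteristic, and the sample users are hypothetical and chosen only for illustration.

```python
from collections import Counter

def value_shares(users, characteristic):
    """Return the share of each value of `characteristic` among `users`."""
    counts = Counter(u[characteristic] for u in users)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def bias_score(segment, population, characteristic):
    """Assumed measure: largest absolute gap between segment and population shares."""
    seg = value_shares(segment, characteristic)
    pop = value_shares(population, characteristic)
    return max(abs(seg.get(v, 0.0) - pop.get(v, 0.0)) for v in set(seg) | set(pop))

# Hypothetical population: five users in each of two age groups.
population = [{"id": i, "age_group": "18-34" if i < 5 else "35+"} for i in range(10)]
# Hypothetical segment selected for a proposed action.
segment = [u for u in population if u["id"] in (0, 1, 2, 7)]

score = bias_score(segment, population, "age_group")
print(score)  # 0.25: "18-34" is 75% of the segment but 50% of the population
```

Under this assumed measure, a score near 0 means the segment mirrors the population for that characteristic. A modified segment could be rescored the same way before the proposed action is executed.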

Analyzing documents using machine learning

A document analysis device includes a memory operable to store a machine learning model configured to receive a sentence as an input and to output a classification identifier that is associated with a sentence type for the received sentence. The device further includes an artificial intelligence (AI) processing engine configured to receive a document comprising text, to identify sentences within the document, and to classify the sentences using the machine learning model. The AI processing engine is further configured to identify tagging rules for the document and to annotate one or more sentences from the document with a sentence type that matches a sentence type that is identified by the tagging rules for the document.
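The flow described above (split the document into sentences, classify each one, annotate those whose type appears in the tagging rules) can be sketched as follows. This is an illustration only: the keyword-based `classify` function stands in for the trained machine learning model, and the sentence types and splitter are assumptions not taken from the abstract.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def classify(sentence):
    """Stand-in for the trained model: returns a hypothetical sentence-type label."""
    lowered = sentence.lower()
    if "shall" in lowered or "must" in lowered:
        return "obligation"
    if lowered.endswith("?"):
        return "question"
    return "statement"

def annotate(text, tagging_rules):
    """Return (sentence, sentence_type) pairs for types named in the tagging rules."""
    tagged = []
    for sentence in split_sentences(text):
        sentence_type = classify(sentence)
        if sentence_type in tagging_rules:
            tagged.append((sentence, sentence_type))
    return tagged

document = "The supplier shall deliver by May. Is delivery insured? Payment is due on receipt."
annotations = annotate(document, tagging_rules={"obligation", "question"})
for sentence, sentence_type in annotations:
    print(f"[{sentence_type}] {sentence}")
```

Keeping the tagging rules as a plain set of sentence types means the same classifier can serve documents with different annotation needs.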

Descriptor uniqueness for entity clustering

A mechanism is provided in a data processing system to implement a cognitive natural language processing (NLP) system with descriptor uniqueness identification to support named entity mention clustering. The mechanism annotates a set of documents from a corpus of documents for entity types and mentions, collects descriptor usages from all documents in the corpus of documents, analyzes the descriptor usages to classify the descriptors as base terms or modifier terms, generates compatibility scores for the descriptors, and performs entity merging of entity clusters based on the compatibility scores.

Machine-learning model for resource assessments
11550836 · 2023-01-10

A centralized system may collect and aggregate assessments of a resource from multiple websites. An aggregate score may be calculated for the resource that cumulatively considers assessments received from users across a plurality of different websites. Text descriptions associated with each of the assessments may be provided to a machine-learning system that uses a trained model to assign identifiers to the assessments as they are received. These identifiers may include common words or text that are descriptive of different facets of user experiences related to receiving and using the resource. After one or more identifiers are selected, assessments associated with those identifiers may be included in or excluded from the display. Additionally, the overall aggregate score for the resource may be recalculated by removing components of that score that are based on assessments with identifiers that have been selected for exclusion.
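The recalculation step can be illustrated with a short sketch. It assumes the aggregate score is a plain mean of numeric ratings, which the abstract does not state. The field names, site names, and identifiers are hypothetical, and the identifiers are hard-coded here rather than assigned by a trained model.

```python
def aggregate_score(assessments, excluded_identifiers=frozenset()):
    """Mean rating over assessments that carry none of the excluded identifiers."""
    kept = [a for a in assessments if not (a["identifiers"] & excluded_identifiers)]
    if not kept:
        return None  # nothing left to score after exclusion
    return sum(a["rating"] for a in kept) / len(kept)

# Hypothetical assessments collected from several websites.
assessments = [
    {"site": "site-a", "rating": 5, "identifiers": {"fast shipping"}},
    {"site": "site-b", "rating": 2, "identifiers": {"damaged box"}},
    {"site": "site-b", "rating": 4, "identifiers": {"fast shipping", "good value"}},
    {"site": "site-c", "rating": 1, "identifiers": {"damaged box", "late"}},
]

overall = aggregate_score(assessments)
without_damage = aggregate_score(assessments, {"damaged box"})
print(overall, without_damage)  # 3.0 4.5
```

Excluding the "damaged box" identifier drops two assessments, so the score is recomputed from the remaining two ratings only.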

SYSTEM AND METHOD FOR RECOMMENDATION OF PRODUCTS AND SERVICES
20230043314 · 2023-02-09

This invention pertains to a system and method for a recommendation engine that helps customers find answers and services. The engine understands the context of what the user is looking at and recommends products and services to meet the needs of the user. The engine is operated through a search index that gives access to more data and better scalability to provide answers and optimal solutions to the user. The recommendation engine functions as a search tool, whereby the user can search by fields, including title, abstract, claims, inventor, specification, or assignee. The recommendation engine can also recommend products and services provided by platform partners or third-party providers, and implement a tracking system for referrals to third-party providers for revenue generation.

STRING ENTROPY IN A DATA PIPELINE
20230040648 · 2023-02-09

Various embodiments comprise systems and methods to determine entropy in strings generated by a data pipeline. In some examples, data monitoring circuitry monitors a data pipeline that ingests input data, processes the input data, and responsively generates and transfers a data string that comprises character groups. The data monitoring circuitry receives the data string, identifies character groups in the data string, identifies group types for the character groups, and assigns numbers to the character groups based on the group types. The data monitoring circuitry determines a probability distribution for the numbers, calculates entropy for the data string based on the probability distribution, and generates an entropy histogram based on the entropy. The data monitoring circuitry compares the entropy histogram of the data string to another entropy histogram for another data string, determines a change in entropy, and reports the change in entropy.
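A minimal sketch of the entropy calculation, under stated assumptions: character groups are taken to be runs of ASCII letters, digits, or other characters, the type-to-number mapping is hypothetical, and entropy is Shannon entropy in bits. The histogram step is omitted, so the sketch compares two scalar entropies directly rather than two histograms.

```python
import math
import re
from collections import Counter

# Hypothetical group types and the numbers assigned to them.
GROUP_TYPES = {"alpha": 0, "digit": 1, "other": 2}
PATTERN = re.compile(r"(?P<alpha>[A-Za-z]+)|(?P<digit>[0-9]+)|(?P<other>[^A-Za-z0-9]+)")

def group_numbers(data_string):
    """Identify character groups and map each one to the number of its group type."""
    return [GROUP_TYPES[m.lastgroup] for m in PATTERN.finditer(data_string)]

def shannon_entropy(numbers):
    """Shannon entropy (bits) of the empirical distribution of the numbers."""
    counts = Counter(numbers)
    total = len(numbers)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def entropy_change(string_a, string_b):
    """Entropy of string_b's group numbers minus entropy of string_a's."""
    return shannon_entropy(group_numbers(string_b)) - shannon_entropy(group_numbers(string_a))

print(group_numbers("abc-123"))                   # [0, 2, 1]
print(shannon_entropy([0, 1, 0, 1]))              # 1.0
print(entropy_change("abc123abc123", "abc-123"))  # log2(3) - 1, about 0.585
```

A jump in this value between successive strings suggests the pipeline's output format has shifted, for example when a new separator or field type appears.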

Identifying similar documents in a file repository using unique document signatures
11593439 · 2023-02-28

Methods, systems, and non-transitory computer readable storage media are disclosed for determining clusters of similar digital documents using unique document signatures. Specifically, the disclosed system processes digital text in a digital document to tokenize character strings (e.g., words) in the digital document by combining a subset of character values and string lengths in the character strings. Additionally, the disclosed system generates a document signature for the digital document by combining subsets of tokens generated for the digital document into a token sequence indicative of the digital text in the digital document. The disclosed system determines a cluster of similar digital documents including the digital document by comparing the document signature of the digital document to document signatures corresponding to a plurality of digital documents.
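The signature idea can be sketched as follows. The abstract does not specify which character values or token subsets are combined, so this sketch assumes a token made of a word's first character, last character, and length, and a signature built from every second word. Comparing signatures with `difflib.SequenceMatcher` and a 0.8 threshold is likewise an assumption.

```python
import re
from difflib import SequenceMatcher

def tokenize(word):
    """Assumed token: first character, last character, and length of the word."""
    return f"{word[0]}{word[-1]}{len(word)}"

def signature(text, step=2):
    """Assumed signature: tokens of every `step`-th word, joined in order."""
    words = re.findall(r"\w+", text.lower())
    return " ".join(tokenize(w) for w in words[::step])

def similar_documents(target, documents, threshold=0.8):
    """Return names of documents whose signature is close to the target's."""
    target_sig = signature(target)
    return [
        name for name, text in documents.items()
        if SequenceMatcher(None, target_sig, signature(text)).ratio() >= threshold
    ]

target = "The quarterly report shows revenue growth in all regions."
documents = {
    "a": "The quarterly report shows revenue growth in most regions.",
    "b": "Meeting notes: discuss hiring plan and office move.",
}
print(signature(target))                     # te3 rt6 re7 in2 rs7
print(similar_documents(target, documents))  # ['a']
```

Because only a subset of words contributes to the signature, the one-word edit in document "a" falls outside the sampled positions and the two signatures match exactly. That tolerance for small edits is what lets near-duplicates land in the same cluster, at the cost of occasionally missing a change.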
