G06F16/3346

Searching for ideograms in an online social network

In one embodiment, a method includes receiving, at a client system, a first query comprising one or more n-grams inputted by a first user; accessing, by the client system, a plurality of ideograms each associated with one or more tags matching at least one of the n-grams of the first query, where the ideograms are ranked in an order based on their respective user-probabilities, each calculated based at least in part on a frequency of use associated with the respective ideogram; presenting one or more of the ideograms at the client system in ranked order, where each ideogram is selectable by the first user; and receiving, at the client system, a selection of one of the presented ideograms.
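A minimal sketch of the tag-matching and ranking step follows. It assumes that the n-grams are whitespace-separated unigrams and that an ideogram's user-probability is simply its share of the user's total ideogram uses. The `Ideogram` type, its `use_count` field, and `search_ideograms` are hypothetical names used for illustration. They are not part of the described method.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Ideogram:
    symbol: str
    tags: Set[str]
    use_count: int  # hypothetical per-user frequency-of-use counter


def search_ideograms(query: str, ideograms: List[Ideogram]) -> List[Tuple[str, float]]:
    """Return (symbol, user_probability) pairs for tag matches, most-used first."""
    ngrams = set(query.lower().split())  # unigrams only, for simplicity
    total = sum(i.use_count for i in ideograms) or 1  # avoid division by zero
    matches = [i for i in ideograms if i.tags & ngrams]
    # Assumed user-probability: share of this user's total ideogram uses.
    ranked = sorted(matches, key=lambda i: i.use_count, reverse=True)
    return [(i.symbol, i.use_count / total) for i in ranked]
```

Consider a catalog where `:joy:` has 30 uses, `:smile:` has 12, and `:pizza:` has 8. For the query "so happy", this sketch returns `:joy:` (0.6) ahead of `:smile:` (0.24).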

System and method for free space estimation

A system and method for estimating free space and assigning free-space probabilities in point cloud data associated with an autonomous vehicle traveling on a surface. The estimate takes into account sensor noise, sensor availability, obstacle heights, and the distance of obstacles from the sensor. Among other factors, the system and method can determine surface planes and classify point cloud points according to whether or not they fall on those planes.
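The surface-plane classification step might be sketched as follows, under several assumptions. The sensor sits at the origin. The surface plane is defined by three known ground points; a real system would fit it robustly from the point cloud. The on-plane tolerance grows linearly with range, as a crude stand-in for sensor noise. `plane_from_points`, `classify_points`, `base_tol`, and `noise_per_m` are hypothetical names and parameters.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def plane_from_points(p1: Vec, p2: Vec, p3: Vec) -> Tuple[List[float], float]:
    """Unit normal n and offset d of the plane n.x + d = 0 through three points."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]  # cross product of u and v
    norm = math.sqrt(sum(c * c for c in n))
    if norm == 0:
        raise ValueError("points are collinear")
    n = [c / norm for c in n]
    d = -sum(n[i] * p1[i] for i in range(3))
    return n, d


def classify_points(points, plane, base_tol=0.05, noise_per_m=0.01) -> List[str]:
    """Label each point 'surface' or 'obstacle'; sensor assumed at the origin."""
    n, d = plane
    labels = []
    for p in points:
        height = abs(sum(n[i] * p[i] for i in range(3)) + d)  # distance to plane
        rng = math.sqrt(sum(c * c for c in p))  # range from the sensor
        tol = base_tol + noise_per_m * rng  # assumed noise model: grows with range
        labels.append("surface" if height <= tol else "obstacle")
    return labels
```

Because the tolerance widens with range, a point 0.2 m above the plane at 20 m is still treated as surface. The same height at 2 m would be labeled an obstacle.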

MACHINE LEARNING METHODS AND SYSTEMS FOR PROTECTION AND REDACTION OF PRIVACY INFORMATION
20220027508 · 2022-01-27

Methods, systems, and computer-program products are directed to a Privacy Engine that evaluates initial electronic documents. The engine identifies document content categories for portions of content within them, with respect to extracted document structures and document positions, where those portions may include privacy information for possible redaction via visual modification. The Privacy Engine builds a content profile by detecting information in respective portions of electronic document content that indicates one or more predefined categories and/or sub-categories. For each such portion, the Privacy Engine applies a machine learning model corresponding to the indicated category (or categories and sub-categories). The model determines a probability value of whether the portion includes data likely to be privacy information. The Privacy Engine then recreates the one or more initial electronic documents with the privacy-information redactions applied at the respective locations of those portions of content.
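The per-category scoring and redaction loop can be sketched as follows. This assumes category detection has already produced `(category, text)` portions. The regular-expression lambdas in `MODELS` are hypothetical placeholders for trained per-category models. The `[REDACTED]` token stands in for visual modification of the recreated document.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for trained per-category models: each maps a portion
# of text to the probability that it contains privacy information.
MODELS: Dict[str, Callable[[str], float]] = {
    "contact": lambda text: 0.95 if re.search(r"\b\d{3}-\d{3}-\d{4}\b", text) else 0.05,
    "financial": lambda text: 0.90 if re.search(r"\b\d{12,19}\b", text) else 0.10,
}


def redact(portions: List[Tuple[str, str]], threshold: float = 0.5) -> str:
    """Recreate a document from (category, text) portions, masking likely-private ones."""
    out = []
    for category, text in portions:
        model = MODELS.get(category)
        prob = model(text) if model else 0.0  # no model for this category: keep text
        out.append("[REDACTED]" if prob >= threshold else text)
    return " ".join(out)
```

The design point is that the model is chosen by category, so each category's model only has to separate private from non-private content within that category.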

PERFORMING CROSS-DATASET FIELD INTEGRATION
20210357442 · 2021-11-18

There is a need for more effective and efficient cross-dataset field integration. In one example, a method comprises: determining a primary integration feature vector for a primary dataset field; determining a secondary integration feature vector for each secondary dataset field of a plurality of secondary dataset fields; determining an integration space based at least in part on the primary integration feature vector and each secondary integration feature vector; determining, for each secondary dataset field and based at least in part on the integration space, a distance measure between the primary dataset field and that secondary dataset field; identifying, based at least in part on these distance measures, a predefined number of the plurality of secondary dataset fields; and performing the cross-dataset field integration based at least in part on the identified secondary dataset fields.
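One way to picture the distance-and-selection steps is sketched below. It assumes the integration space is a per-dimension z-score standardization over all feature vectors and that the distance measure is Euclidean in that space. Both choices, and the names `integration_space` and `top_k_fields`, are assumptions rather than the claimed definitions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def integration_space(vectors: List[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Assumed integration space: per-dimension mean and std over all fields."""
    n, dims = len(vectors), len(vectors[0])
    means = [sum(v[i] for v in vectors) / n for i in range(dims)]
    stds = [math.sqrt(sum((v[i] - means[i]) ** 2 for v in vectors) / n) or 1.0
            for i in range(dims)]  # constant dimensions get std 1.0
    return means, stds


def top_k_fields(primary: Sequence[float],
                 secondary: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Names of the k secondary fields closest to the primary field."""
    means, stds = integration_space([primary, *secondary.values()])

    def project(v):
        return [(x - m) / s for x, m, s in zip(v, means, stds)]

    p = project(primary)
    distance = {name: math.dist(p, project(v)) for name, v in secondary.items()}
    return sorted(distance, key=distance.get)[:k]
```

Standardizing before measuring distance keeps a feature with a large numeric range from dominating the comparison. Here `k` plays the role of the predefined number of fields.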

SEARCHING ELECTRONIC DOCUMENTS BASED ON EXAMPLE-BASED SEARCH QUERY
20210357455 · 2021-11-18

A computer-implemented method for searching electronic documents is provided, together with an associated system and computer program product. The method includes receiving and processing an input representing an example-based search query. It also includes determining, for each of the electronic documents, a relevance score between the input and that document. The score is based at least on textual similarity and topical similarity between the input and the document, as well as on linkage relationships in a linkage network of the electronic documents. Based on the determined relevance scores, the method determines a search result containing one or more of the electronic documents, and the search result is provided to the user.
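The three signals can be combined into one relevance score as sketched below, under several assumptions. Textual similarity is Jaccard overlap of token sets. Topical similarity is cosine similarity over precomputed topic vectors. The linkage signal is a document's in-link count normalized by the maximum. The weights 0.5, 0.3, and 0.2 are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_documents(query_tokens: Set[str], query_topics: Sequence[float],
                   docs: Dict[str, Tuple[Set[str], Sequence[float]]],
                   in_links: Dict[str, Set[str]],
                   weights=(0.5, 0.3, 0.2), top_n: int = 3) -> List[str]:
    """docs: id -> (token set, topic vector); in_links: id -> ids linking to it."""
    max_in = max((len(s) for s in in_links.values()), default=0) or 1
    w_text, w_topic, w_link = weights
    scores = {}
    for doc_id, (tokens, topics) in docs.items():
        link_score = len(in_links.get(doc_id, ())) / max_in  # normalized in-degree
        scores[doc_id] = (w_text * jaccard(query_tokens, tokens)
                          + w_topic * cosine(query_topics, topics)
                          + w_link * link_score)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```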

Chat delta prediction and cognitive opportunity system

Thread delta prediction and cognitive conversation opportunity generation are described. A prediction model may be generated based on chat space conversation data and used to predict a chat thread delta associated with current chat conversation data. A data package may be created based on the chat thread delta. Based on the data package, a conversation opportunity on a given topic may be determined for a given user in the chat space, to be conducted in real time during the time period of the chat thread delta.

Text entity extraction method for extracting text from target text based on combination probabilities of segmentation combination of text entities in the target text, apparatus, and device, and storage medium

A text entity extraction method, apparatus, and storage medium are provided. The method includes determining candidate text entities in a target text. Portions of the candidate text entities are combined to generate candidate segmentation combinations corresponding to the target text, with each candidate segmentation combination containing a different combination of candidate text entities. A combination probability is calculated for each candidate segmentation combination. This is the probability that the target text is grammatically correct when segmented according to that combination. A target segmentation combination for the target text is determined according to the combination probabilities, and a text entity is extracted from the target text according to the target segmentation combination.
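The enumerate, score, and select structure can be sketched as below. The scorer is a toy unigram-style stand-in for the grammar-correctness probability. `entity_prob` gives a hypothetical probability per candidate span, and each token not covered by an entity contributes a fixed `word_prob`. Spans are `(start, end)` token indices. The rule that overlapping candidates may not appear in the same combination is an assumption.

```python
from itertools import combinations
from math import prod
from typing import Dict, Iterator, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def segmentation_combinations(candidates: List[Span]) -> Iterator[Tuple[Span, ...]]:
    """All non-empty sets of mutually non-overlapping candidate spans."""
    for r in range(1, len(candidates) + 1):
        for combo in combinations(candidates, r):
            if all(not overlaps(x, y) for x, y in combinations(combo, 2)):
                yield combo


def combination_probability(combo, n_tokens: int, entity_prob: Dict[Span, float],
                            word_prob: float = 0.1) -> float:
    """Toy stand-in score: entity probabilities times a per-uncovered-token factor."""
    covered = sum(end - start for start, end in combo)
    return prod(entity_prob[span] for span in combo) * word_prob ** (n_tokens - covered)


def extract_entities(tokens: List[str], entity_prob: Dict[Span, float]) -> List[str]:
    best = max(segmentation_combinations(list(entity_prob)),
               key=lambda c: combination_probability(c, len(tokens), entity_prob),
               default=())
    return [" ".join(tokens[s:e]) for s, e in sorted(best)]
```

For "I study at New York University", the candidates are "New York", "York University", and "New York University". The longer span wins because it leaves fewer uncovered tokens.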

DATA RECOVERY THROUGH REVERSAL OF HASH VALUES USING PROBABILISTIC DATA STRUCTURES
20220004621 · 2022-01-06

Systems and methods are provided for recovering passwords from an input hash value. A password space may be segmented into password sets, and a digest set may be generated for each password set. Probabilistic data structures representing the digest sets may then be generated. A probabilistic data structure may be queried with the input hash value to determine whether that value is likely included in the corresponding digest set. If it is, the passwords constituting the corresponding password set may be regenerated, along with the hash values constituting that digest set. The regenerated hash values may then be compared with the input hash value to find a match, which recovers the password associated with the matching hash value.
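The scheme can be illustrated with a small, self-contained example. It assumes SHA-256 digests, a password space of four-digit PINs segmented by leading digit, and one Bloom filter per password set; the class and function names are hypothetical. A Bloom filter can return false positives but never false negatives. Every positive hit is therefore confirmed by regenerating the password set and comparing digests.

```python
import hashlib
from typing import Iterator, List, Optional


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, size_bits: int = 16384, num_hashes: int = 3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        for i in range(self.k):  # derive k bit positions by prefixing a counter byte
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def password_set(prefix: int) -> List[str]:
    """Hypothetical segmentation: four-digit PINs grouped by their first digit."""
    return [f"{prefix}{n:03d}" for n in range(1000)]


def digest(password: str) -> bytes:
    return hashlib.sha256(password.encode()).digest()


def build_filters() -> List[BloomFilter]:
    filters = []
    for prefix in range(10):
        bf = BloomFilter()
        for pw in password_set(prefix):
            bf.add(digest(pw))  # only the filter is kept, not the digest set
        filters.append(bf)
    return filters


def recover(target: bytes, filters: List[BloomFilter]) -> Optional[str]:
    for prefix, bf in enumerate(filters):
        if target in bf:  # "likely included"; may be a false positive
            for pw in password_set(prefix):  # regenerate passwords and digests
                if digest(pw) == target:
                    return pw
    return None
```

The trade-off is storage against computation. Only the compact filters are stored, and the full digest set is recomputed only for password sets whose filter reports a likely match.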

DOCUMENT SEARCH METHOD, DOCUMENT SEARCH SYSTEM, PROGRAM, AND NON-TRANSITORY COMPUTER READABLE STORAGE MEDIUM
20220004570 · 2022-01-06

Similar documents are retrieved in units of text blocks, enabling highly accurate document search. A specific text block is searched for among a plurality of text blocks created by dividing each of a plurality of search target documents. First, a first search text block, which is part of a search document, is prepared. Full-text search is then performed using at least some of the plurality of text blocks as a first target and the first search text block as the search criterion, yielding a first relevance of each text block in the first target to the first search text block. A second target is selected from the first target according to the level of the first relevance. First similarities are then calculated between each sentence in the first search text block and the sentences in the second target. Finally, at least one text block similar to the first search text block is retrieved using the first similarities.
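The two-stage retrieval might look like this under simplifying assumptions. Term overlap stands in for the full-text relevance score; a production system might use a ranking function such as BM25. The second target is the top `second_target_size` blocks. A block's final score is the mean, over query sentences, of each sentence's best Jaccard similarity to the block's sentences. All function and parameter names are hypothetical.

```python
import re
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def sentences(text: str) -> List[str]:
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def search_blocks(query_block: str, blocks: Dict[str, str],
                  second_target_size: int = 2, top_n: int = 1) -> List[str]:
    q = tokens(query_block)
    # Stage 1: first relevance = shared-term count (stand-in for full-text search).
    first = {bid: len(q & tokens(text)) for bid, text in blocks.items()}
    second_target = sorted(first, key=first.get, reverse=True)[:second_target_size]
    # Stage 2: sentence-level similarities against the narrowed second target.
    q_sents = [tokens(s) for s in sentences(query_block)]
    scores = {}
    for bid in second_target:
        b_sents = [tokens(s) for s in sentences(blocks[bid])]
        sims = [max((jaccard(qs, bs) for bs in b_sents), default=0.0) for qs in q_sents]
        scores[bid] = sum(sims) / len(sims) if sims else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

The cheap first stage prunes the candidate set. The more expensive sentence-by-sentence comparison then runs only on the blocks that survive.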

Method and apparatus to more quickly classify additional text entries

A control circuit obtains and utilizes a first set of rules that determines a distance, as a function of a modified Jaccard distance, between each pair of a number of predefined classes. The control circuit uses that information to generate groups of the predefined classes by evaluating the distances between them. For each additional text entry, first probabilities of membership in each of the generated groups are calculated. Second probabilities of membership in a particular predefined class within its corresponding group are also calculated. The first and second probabilities are then used to calculate a resultant probability that a given additional text entry belongs to a particular one of the predefined classes. The control circuit uses that resultant probability to automatically classify the additional text entries, associating each given entry with a particular one of the predefined classes.
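The grouping and probability-combination steps are sketched below. This assumes plain Jaccard distance over each class's vocabulary, because the abstract's modified Jaccard distance is not specified here. It also assumes single-linkage grouping under a distance threshold, with the resultant probability taken as P(group) × P(class | group). The group and in-group probabilities would come from classifiers that are not shown, and all names are hypothetical.

```python
from typing import Dict, List, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    return 1.0 - len(a & b) / len(a | b) if a | b else 0.0


def group_classes(vocab: Dict[str, Set[str]], threshold: float) -> List[Set[str]]:
    """Single-linkage grouping: classes closer than `threshold` share a group."""
    groups: List[Set[str]] = []
    for name in vocab:
        near = [g for g in groups
                if any(jaccard_distance(vocab[name], vocab[m]) < threshold for m in g)]
        merged = {name}.union(*near)  # join every group this class is close to
        groups = [g for g in groups if g not in near] + [merged]
    return groups


def classify(groups: List[Set[str]], p_group: List[float],
             p_class_in_group: Dict[str, float]) -> Dict[str, float]:
    """Resultant P(class) = P(group containing class) * P(class | group)."""
    return {c: p_group[i] * p_class_in_group[c]
            for i, g in enumerate(groups) for c in g}
```

Factoring the decision this way lets the second-stage classifier focus on separating classes that are similar to each other. The first stage only has to separate the coarser groups.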