G06F40/205

Extracting structured data from weblogs
11556598 · 2023-01-17

Methods and apparatus for extracting structured data from weblogs are disclosed. In some examples, the methods and apparatus include a web crawler to access a home page of a weblog and identify a feed associated with the weblog. They also include a feed finder to determine whether items in the feed contain sufficient content for feed-guided segmentation, and a feed classifier to determine whether those items contain the full content of the weblog. Finally, a wrapper maps data found in the feed into a representation of a weblog post and screen scrapes the weblog into that same representation.
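The decision flow described above (feed finder, then feed classifier, then wrapper) can be illustrated with a minimal sketch. It assumes feed items arrive as dicts with hypothetical "title" and "content" keys, uses a crude regex tag stripper instead of a real HTML parser, and picks an arbitrary 10-character threshold for "sufficient content". None of these names or thresholds come from the patent.

```python
import re
from dataclasses import dataclass
from html import unescape


@dataclass
class WeblogPost:
    title: str
    body: str
    source: str  # "feed" if mapped from the feed, "scrape" if cut from the page


def strip_tags(html: str) -> str:
    """Flatten HTML to whitespace-normalized plain text (crude, regex-based)."""
    text = unescape(re.sub(r"<[^>]+>", " ", html))
    return re.sub(r"\s+", " ", text).strip()


def has_sufficient_content(items, min_chars=10):
    """Feed-finder check (assumed rule): every item has enough text to anchor segmentation."""
    return bool(items) and all(len(strip_tags(i["content"])) >= min_chars for i in items)


def is_full_content(item, page_text):
    """Feed-classifier check (assumed rule): text is not truncated and appears verbatim on the page."""
    text = strip_tags(item["content"])
    return bool(text) and not text.endswith(("...", "\u2026")) and text in page_text


def segment_page(items, page_text):
    """Screen scrape: use each item's leading text to locate its post on the page."""
    anchors = []
    for item in items:
        snippet = strip_tags(item["content"]).rstrip(".\u2026 ")[:40]
        pos = page_text.find(snippet) if snippet else -1
        if pos >= 0:
            anchors.append((pos, item["title"]))
    anchors.sort()
    posts = []
    for i, (pos, title) in enumerate(anchors):
        # Each post runs from its anchor up to the next post's anchor.
        end = anchors[i + 1][0] if i + 1 < len(anchors) else len(page_text)
        posts.append(WeblogPost(title, page_text[pos:end].strip(), "scrape"))
    return posts


def extract_posts(items, page_html):
    """Wrapper: map full-content feeds directly, otherwise segment the page."""
    if not has_sufficient_content(items):
        return []  # too little feed text to guide segmentation
    page_text = strip_tags(page_html)
    if all(is_full_content(i, page_text) for i in items):
        return [WeblogPost(i["title"], strip_tags(i["content"]), "feed") for i in items]
    return segment_page(items, page_text)
```

Because this sketch segments flattened text, a scraped post can pick up trailing page text such as the next post's heading; working on the DOM would avoid that.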

Text autocomplete using punctuation marks

A dataset comprising text-based messages can be accessed. Tokens can be generated for the words and punctuation marks contained in the text-based messages, with each token corresponding to one word or one punctuation mark. A vector representation can be generated for each of a plurality of the tokens using natural language processing. For each of a plurality of the text-based messages in the dataset, a sequence of tokens corresponding to that message can be generated. The tokens that represent punctuation marks can be identified. An artificial neural network can be trained to predict the use of punctuation marks in sentence structures. The training uses the generated sequences of tokens and the vector representations of the punctuation-mark tokens in those sequences.
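The data-preparation steps above (one token per word or punctuation mark, identifying punctuation tokens, building per-message sequences) can be sketched as follows. This is an illustrative assumption, not the patented pipeline: integer vocabulary ids stand in for the vector representations (an embedding table would map each id to a vector), the regex tokenizer is a simplification, and the neural network itself is omitted. The output is a list of (preceding-context ids, punctuation id) pairs that such a network could be trained on.

```python
import re
import string

# Words (optionally with one apostrophe part, e.g. "don't") or single punctuation marks.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(message):
    """Split a message into one token per word or punctuation mark."""
    return TOKEN_RE.findall(message.lower())


def is_punctuation(token):
    return all(ch in string.punctuation for ch in token)


def build_vocab(messages):
    """Assign each distinct token an integer id (an embedding table would map id -> vector)."""
    vocab = {}
    for message in messages:
        for token in tokenize(message):
            vocab.setdefault(token, len(vocab))
    return vocab


def punctuation_examples(messages, vocab, window=3):
    """Build (preceding-context ids, punctuation id) pairs for a next-punctuation predictor."""
    examples = []
    for message in messages:
        tokens = tokenize(message)
        ids = [vocab[t] for t in tokens]
        for i, token in enumerate(tokens):
            if is_punctuation(token):
                examples.append((ids[max(0, i - window):i], vocab[token]))
    return examples
```

For the messages "Hi, how are you?" and "See you soon!", this yields three training pairs, one for each of ",", "?" and "!".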

System and method for performing a meaning search using a natural language understanding (NLU) framework

The present disclosure is directed to an agent automation framework that is capable of extracting meaning from user utterances and suitably responding using a search-based natural language understanding (NLU) framework. The NLU framework includes a meaning extraction subsystem capable of detecting multiple alternative meaning representations for a given natural language utterance. Furthermore, the NLU framework includes a meaning search subsystem that enables elastic confidence thresholds (e.g., elastic beam-width meaning searches), forced diversity, and cognitive construction grammar (CCG)-based predictive scoring functions to provide an efficient and effective meaning search. As such, the disclosed meaning extraction subsystem and meaning search subsystem improve the performance, the domain specificity, the inference quality, and/or the efficiency of the NLU framework.
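One plausible reading of "elastic beam-width" search with "forced diversity" is sketched below, purely as an assumption: the beam keeps every candidate meaning whose score lies within a relative margin of the best score (so the width grows or shrinks with the score distribution), up to a hard cap, while admitting at most a fixed number of candidates per intent. The `Meaning` type, the `intent` label and the parameter defaults are hypothetical; scores are assumed to be non-negative confidences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meaning:
    intent: str   # hypothetical label for the kind of meaning representation
    text: str
    score: float  # non-negative confidence from some scoring function


def elastic_beam(candidates, margin=0.2, max_width=5, per_intent=1):
    """Keep candidates within `margin` of the best score, up to `max_width`,
    admitting at most `per_intent` candidates per intent (forced diversity)."""
    ranked = sorted(candidates, key=lambda m: m.score, reverse=True)
    if not ranked:
        return []
    threshold = ranked[0].score * (1 - margin)  # relative to the best candidate
    beam, counts = [], {}
    for m in ranked:
        if m.score < threshold or len(beam) >= max_width:
            break
        if counts.get(m.intent, 0) >= per_intent:
            continue  # skip near-duplicates so other intents get a slot
        counts[m.intent] = counts.get(m.intent, 0) + 1
        beam.append(m)
    return beam
```

Widening `margin` admits lower-scoring alternatives, which is how the beam "stretches" when no single interpretation clearly dominates.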

VOICE INFORMATION PROCESSING METHOD AND ELECTRONIC DEVICE
20230010969 · 2023-01-12

A voice information processing method and an electronic device are provided. The voice information processing method may include the following: a first device (1100) obtains first voice information and, when the first voice information includes a wakeup keyword, sends a voice assistant wakeup instruction to a second device (1200) so that the second device (1200) launches a voice assistant. The first device (1100) then obtains second voice information and sends it to the second device (1200). The second device (1200) uses the voice assistant to determine a voice-triggered event corresponding to the second voice information and feeds target information associated with performing that event back to the first device (1100), which then performs the voice-triggered event based on the target information. The method can reduce the computing burden on the first device (1100).
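The message flow between the two devices can be sketched with two in-process classes. This is a toy model under stated assumptions: the wakeup keyword "hey assistant", the method names, and the "play ..." intent rule are all hypothetical stand-ins, and a real system would exchange these messages over a network link and run actual speech recognition.

```python
WAKEUP_KEYWORD = "hey assistant"  # hypothetical wakeup keyword


class SecondDevice:
    """Runs the voice assistant and resolves utterances on behalf of the first device."""

    def __init__(self):
        self.assistant_running = False

    def handle_wakeup(self):
        self.assistant_running = True  # launch the voice assistant

    def resolve(self, utterance):
        if not self.assistant_running:
            raise RuntimeError("voice assistant has not been launched")
        # Toy intent resolution standing in for the real voice assistant.
        if utterance.startswith("play "):
            return {"event": "play_music", "target": utterance[len("play "):]}
        return {"event": "unknown", "target": None}


class FirstDevice:
    """Captures voice input but offloads interpretation to the second device."""

    def __init__(self, peer):
        self.peer = peer
        self.awake = False
        self.performed = []

    def on_voice(self, text):
        text = text.lower().strip()
        if WAKEUP_KEYWORD in text:
            self.peer.handle_wakeup()  # send the voice assistant wakeup instruction
            self.awake = True
            return None
        if not self.awake:
            return None  # ignore speech until the wakeup keyword is heard
        target_info = self.peer.resolve(text)  # second device does the heavy lifting
        self.performed.append(f"{target_info['event']}:{target_info['target']}")
        return target_info
```

The first device only performs keyword spotting and executes the returned event; all interpretation happens on the second device, which is where the reduced computing burden comes from.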

Structured adversarial training for natural language machine learning tasks

A method includes obtaining first training data having multiple first linguistic samples. The method also includes generating second training data using the first training data and multiple symmetries. The symmetries identify how to modify the first linguistic samples while maintaining structural invariants within the first linguistic samples, and the second training data has multiple second linguistic samples. The method further includes training a machine learning model using at least the second training data. At least some of the second linguistic samples in the second training data are selected during the training based on a likelihood of being misclassified by the machine learning model.
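The two key steps, generating second training data via symmetries and selecting the samples most likely to be misclassified, can be sketched as below. Assumptions: the "symmetries" here are simple string substitutions assumed to preserve the label (the structural invariant), `toy_prob_of_label` is a hypothetical stand-in for the model being trained, and misclassification likelihood is taken as one minus the predicted probability of the true label. In a real loop, `select_hard` would be called with the current model each epoch; the training step itself is omitted.

```python
def swap_names(text):
    return text.replace("Alice", "Bob")


def swap_synonym(text):
    return text.replace("great", "superb")


def apply_symmetries(samples, symmetries):
    """Second training data: each symmetry rewrites the text but keeps the label fixed."""
    generated = []
    for text, label in samples:
        for symmetry in symmetries:
            new_text = symmetry(text)
            if new_text != text:
                generated.append((new_text, label))
    return generated


def select_hard(generated, prob_of_label, k):
    """Pick the k samples the current model is most likely to misclassify,
    i.e. those with the lowest predicted probability for their true label."""
    return sorted(generated, key=lambda ex: prob_of_label(*ex))[:k]


def toy_prob_of_label(text, label):
    # Stand-in for the model being trained: only the word "great" signals "pos".
    p_pos = 0.9 if "great" in text else 0.55
    return p_pos if label == "pos" else 1 - p_pos
```

With this toy model, the synonym-swapped positive sample ("superb" instead of "great") is ranked as hard, which is exactly the kind of gap symmetry-based adversarial selection is meant to expose.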

SYSTEMS AND PROCESSES OF POSITION FULFILLMENT

The present disclosure relates generally to systems and processes for position fulfillment and, more particularly, to systems and methods of identifying and matching human resources to an open employment position within an organization. The method includes: obtaining, by a computer system, one or more profiles from one or more data sources; analyzing, by the computer system, the one or more profiles to parse attributes and find similarities and/or recurring occurrences in the parsed attributes; normalizing the parsed attributes based on at least one of the similarities and the recurring occurrences; and matching the normalized attributes to attributes of an open position.
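The parse, normalize, and match steps can be sketched as follows, under loudly stated assumptions: profiles are comma-separated attribute strings, "similarities" are captured by a hypothetical alias table (`ALIASES`), "recurring occurrences" are attributes appearing in at least two profiles, and the match score is simply the fraction of the position's attributes a profile covers. The patent does not specify any of these rules.

```python
from collections import Counter

# Hypothetical alias table: different spellings treated as the same attribute.
ALIASES = {"py": "python", "js": "javascript", "k8s": "kubernetes"}


def parse_attributes(profile):
    """Parse a comma-separated profile into lower-cased attribute strings."""
    return [a.strip().lower() for a in profile.split(",") if a.strip()]


def normalize(attributes):
    return {ALIASES.get(a, a) for a in attributes}


def recurring_attributes(profiles, min_count=2):
    """Attributes that recur across at least `min_count` normalized profiles."""
    counts = Counter(a for p in profiles for a in normalize(parse_attributes(p)))
    return {a for a, c in counts.items() if c >= min_count}


def match_score(profile, position_attributes):
    """Fraction of the open position's normalized attributes the profile covers."""
    have = normalize(parse_attributes(profile))
    need = normalize(a.lower() for a in position_attributes)
    return len(have & need) / len(need) if need else 0.0


def rank_candidates(profiles, position_attributes):
    """Order candidate names by descending match score."""
    return sorted(profiles, key=lambda name: match_score(profiles[name], position_attributes), reverse=True)
```

Normalizing before matching is what lets a profile listing "Py, k8s" fully satisfy a position asking for "Python, Kubernetes".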