G06F40/242

SYSTEMS AND METHODS FOR DYNAMICALLY REMOVING TEXT FROM DOCUMENTS

Disclosed are techniques for building a dynamic dictionary and using the dictionary to remove phrases or words appearing in and out of context in a document. The techniques include, for example, receiving electronic health record (EHR) data; determining, using natural language processing (NLP), an instance of a personal health information (PHI) phrase in the EHR data based on an NLP system confidence metric being above a threshold; determining another instance of the PHI phrase in the EHR data that does not have the same context as the first instance; removing the instances of the PHI phrase from the EHR data to produce cleaned EHR data; and taking an action based on the cleaned EHR data. The confidence metric can indicate the likelihood that the phrase is a PHI phrase, and the metric can be based at least in part on a first context of the PHI phrase.
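The two-pass idea in this abstract can be sketched as follows. This is a minimal illustration, not the claimed implementation: the detector output format, the function names `build_dictionary` and `remove_phrases`, the 0.9 threshold, and the `[REDACTED]` placeholder are all assumptions made for the example.

```python
import re

def build_dictionary(detections, threshold=0.9):
    """Collect phrases whose detector confidence is above the threshold."""
    return {phrase for phrase, confidence in detections if confidence > threshold}

def remove_phrases(text, dictionary, placeholder="[REDACTED]"):
    """Replace every occurrence of each dictionary phrase, regardless of context."""
    # Longest phrases first, so a longer phrase is not split by a shorter one.
    for phrase in sorted(dictionary, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, placeholder, text, flags=re.IGNORECASE)
    return text

# Hypothetical detector output: (phrase, confidence) pairs from an NLP system.
detections = [("John Smith", 0.97), ("aspirin", 0.12)]
record = "Patient: John Smith. Notes: spoke with john smith about aspirin dosage."
cleaned = remove_phrases(record, build_dictionary(detections))
```

The first mention is caught in a context the detector recognizes ("Patient: ..."); the second, lower-cased mention is removed only because the phrase is already in the dictionary.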

DATA PROCESSING METHOD AND APPARATUS
20230048031 · 2023-02-16

In the field of artificial intelligence, and specifically natural language processing, a data processing method and apparatus are provided. The method includes: determining original text samples on which mask processing has not been performed; and performing mask processing on the original text samples to obtain mask training samples, where the mask processing makes the mask proportions of the mask training samples unfixed, and each mask training sample is used to train a pretrained language model (PLM). Training the PLM with mask training samples whose mask proportions are unfixed can enhance the mode diversity of the PLM's training samples. The features learned by the PLM are therefore also diversified, which can improve both the generalization capability and the natural language understanding capability of the trained PLM.
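One way to picture an unfixed mask proportion is to draw a fresh proportion for each sample before masking. The sketch below assumes a uniform draw between `low` and `high` and a `[MASK]` token; the abstract does not specify the distribution, the range, or the token, so these are illustrative choices only.

```python
import random

def mask_tokens(tokens, ratio, rng, mask_token="[MASK]"):
    """Mask about ratio * len(tokens) positions, chosen uniformly at random."""
    if not tokens:
        return []
    count = min(len(tokens), max(1, round(ratio * len(tokens))))
    positions = set(rng.sample(range(len(tokens)), count))
    return [mask_token if i in positions else tok for i, tok in enumerate(tokens)]

def make_mask_samples(samples, low=0.1, high=0.5, seed=0):
    """Draw a separate mask proportion per sample (uniform range is an assumption)."""
    rng = random.Random(seed)
    masked = []
    for tokens in samples:
        ratio = rng.uniform(low, high)  # proportion varies from sample to sample
        masked.append(mask_tokens(tokens, ratio, rng))
    return masked

originals = [list("abcdefghij") for _ in range(3)]
mask_samples = make_mask_samples(originals)
```

Each masked sample keeps the original length and token order; only the number and position of masked tokens change between samples.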

Predictive time series data object machine learning system

Provided is a method including obtaining a first data object including a first set of data entries, wherein each data entry of the first set of data entries includes text content associated with a time entry. The method includes generating a first data object score using scoring parameters together with the text content and the time entries included in the first set of data entries; determining that the first data object score satisfies a data object score condition; and performing, in response to the first data object score satisfying the data object score condition, a condition-specific action associated with the data object score condition.

Tracking specialized concepts, topics, and activities in conversations

Embodiments are directed to organizing conversation information. A tracker vocabulary may be provided to a universal model to predict a generalized vocabulary associated with the tracker vocabulary. A tracker model may be generated based on the portions of the universal model activated by the tracker vocabulary such that a remainder of the universal model may be excluded from the tracker model. Portions of a conversation stream may be provided to the tracker model. A match score may be generated based on the tracker model and the portions of the conversation stream such that the match score predicts if the portions of the conversation stream may be in the generalized vocabulary predicted for the tracker vocabulary. Tracker metrics may be collected based on the portions of the conversation and the match scores such that the tracker metrics may be included in reports or notifications.

Systems and methods for detecting documentation drop-offs in clinical documentation

In clinical documentation, mere documentation of a condition in a patient's records may not be enough. To be considered sufficiently documented, the patient's record needs to show that no documentation drop-offs (DDOs) have occurred over the course of the patient's stay. However, DDOs can be extremely difficult to detect. To solve this problem, the invention trains time-sensitive deep learning (DL) models on a per condition basis using actual and/or synthetic patient data. Utilizing an ontology, grouped concepts can be generated on the fly from real-time hospital data and used to generate time-series data that can then be analyzed by trained time-sensitive DL models to determine whether a DDO for a condition has occurred during the stay. Non-time-sensitive models can be used to detect all the conditions documented during the stay. Outcomes from the models can be compared to determine whether to notify a user that a DDO has occurred.

Automated malware analysis that automatically clusters sandbox reports of similar malware samples

Disclosed are a system and a method for automatically clustering sandbox analysis reports of similar malware samples. An automated malware analysis process includes receiving, from a sandbox server, the sandbox analysis reports of the similar malware samples at an application programming interface (API) of the clustering server; clustering similar Uniform Resource Locators (URLs) together; and clustering the sandbox analysis reports of events into sandbox report clusters (1-n) based on the URL clustering, static properties of the malware samples, and dynamic properties of the malware samples.

Corpus cleaning method and corpus entry system
11580299 · 2023-02-14

The present disclosure provides a corpus cleaning method and a corpus entry system. The method includes: obtaining an input utterance; generating a predicted value of an information amount of each word in the input utterance according to the context of the input utterance using a pre-trained general model; and determining redundant words according to the predicted value of the information amount of each word, and determining whether to remove the redundant words from the input utterance. In such a manner, the objectivity and accuracy of corpus cleaning can be improved.
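The final step of this method, deciding which words are redundant from their predicted information amounts, can be sketched as below. The per-word scores would come from the pre-trained general model; here they are supplied by hand, and the function name `clean_utterance` and the 0.2 threshold are assumptions made for the example.

```python
def clean_utterance(words, info_scores, threshold=0.2):
    """Split an utterance into kept words and redundant (low-information) words."""
    if len(words) != len(info_scores):
        raise ValueError("each word needs exactly one information score")
    kept = [w for w, s in zip(words, info_scores) if s >= threshold]
    redundant = [w for w, s in zip(words, info_scores) if s < threshold]
    return kept, redundant

# Hypothetical predicted information amounts for each word.
words = ["um", "please", "book", "a", "flight", "to", "Paris"]
scores = [0.02, 0.10, 0.85, 0.25, 0.90, 0.40, 0.95]
kept, redundant = clean_utterance(words, scores)
```

Returning the redundant words separately, rather than dropping them silently, leaves room for the "determining whether to remove" decision described in the abstract.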