EXPERT BOARD CASE SELECTION SYSTEM AND METHOD
20220399130 · 2022-12-15
Inventors
Cpc classification
G16H80/00
PHYSICS
G16H20/40
PHYSICS
G16H50/70
PHYSICS
G16H50/20
PHYSICS
G16H10/60
PHYSICS
International classification
Abstract
The present embodiments relate to generation of a docket of feature groups based on a plurality of identified features from a dataset. For instance, the docket can be used by a board of experts to efficiently review and provide insights to a larger number of cases. Data relating to each of the plurality of scenarios can be processed to extract textual features for each of the plurality of scenarios. Feature vectors can be populated with textual features from the plurality of scenarios, and feature groups can be derived by a docket generation model. The docket generation model can generate a docket comprising a listing of the set of feature groups. The listing of the set of feature groups can be arranged by a number of scenarios corresponding to each of the feature groups and/or a number of extracted textual features for each of the set of feature groups.
Claims
1. A method comprising: receiving, at a computer system, data relating to a plurality of scenarios, wherein each of the scenarios comprise one or more corresponding supporting files; processing the data relating to each of the plurality of scenarios to extract a set of textual features for each of the plurality of scenarios; initializing, for each of the plurality of scenarios, a feature vector comprising a plurality of null values; populating values of each feature vector with corresponding textual features extracted from the data relating to each of the plurality of scenarios; processing, by a docket generation model, each of the feature vectors to: identify a set of feature groups, each feature group comprising a set of scenarios that include one or more extracted textual features that are common across feature vectors corresponding to each scenario of the set of scenarios; and generate a docket comprising a listing of the set of feature groups, the listing arranged by a number of scenarios corresponding to each of the feature groups and/or a number of extracted textual features for each of the set of feature groups; and transmitting the docket to a plurality of client devices.
2. The method of claim 1, wherein processing the data relating to each of the plurality of scenarios to extract the set of textual features for each of the plurality of scenarios further comprises converting each supporting file into unstructured text, wherein the processing is performed on the unstructured text of each supporting file.
3. The method of claim 2, wherein processing the data relating to each of the scenarios on the unstructured text of each supporting file further comprises using a text analytics process to programmatically extract each textual feature from the unstructured text of each supporting file.
4. The method of claim 1, wherein each textual feature is populated in each corresponding feature vector according to a location of the textual feature identified from each corresponding supporting file.
5. The method of claim 1, further comprising: obtaining a response for each of the set of feature groups listed in the docket from any of the client devices.
6. The method of claim 5, further comprising: training the docket generation model using the feature vectors, the set of feature groups, and the responses for each of the set of feature groups to generate a feature hierarchy of the extracted textual features in the feature vectors.
7. The method of claim 6, further comprising: processing each textual feature part of each of the feature groups using the feature hierarchy to generate a cumulative feature score of each feature group, wherein the arrangement of the feature groups is based at least on the cumulative feature score of each feature group.
8. The method of claim 1, wherein the docket generation model includes a random forest model.
9. The method of claim 1, wherein the scenarios relate to medical cases, and the supporting files comprise medical documents.
10. The method of claim 1, wherein the scenarios comprise automatically-generated tickets relating to a datacenter, and wherein the supporting files comprise data relating to the automatically-generated ticket.
11. The method of claim 1, wherein the listing of feature groups are presented as sentences that include the features common to the feature groups.
12. A system comprising: a processor; and a computer-readable medium comprising instructions that, when executed by the processor, cause the processor to: receive data relating to a plurality of scenarios, wherein each of the scenarios comprise one or more corresponding supporting files; process the data relating to each of the plurality of scenarios to extract a set of textual features for each of the plurality of scenarios, wherein processing the data further comprises converting each supporting file into unstructured text; populate, for each of the plurality of scenarios, values of a corresponding feature vector with corresponding textual features extracted from the data relating to each of the plurality of scenarios, wherein each textual feature is populated in each corresponding feature vector according to a location of the textual feature identified from each corresponding supporting file; process each of the feature vectors to: identify a set of feature groups, each feature group comprising a set of scenarios that include one or more extracted textual features that are common across feature vectors corresponding to each scenario of the set of scenarios; and generate a docket comprising a listing of the set of feature groups, the listing arranged by a number of scenarios corresponding to each of the feature groups and/or a number of extracted textual features for each of the set of feature groups; and transmit the docket to a plurality of client devices.
13. The system of claim 12, wherein processing the data relating to each of the scenarios on the unstructured text of each supporting file further comprises using a text analytics process to programmatically extract each textual feature from the unstructured text of each supporting file.
14. The system of claim 12, wherein the instructions further cause the processor to: process each textual feature part of each of the feature groups using a feature hierarchy to generate a cumulative feature score of each feature group, wherein the arrangement of the feature groups is based at least on the cumulative feature score of each feature group.
15. The system of claim 12, wherein the instructions further cause the processor to: obtain a response for each of the set of feature groups listed in the docket from any of the client devices.
16. The system of claim 15, wherein the instructions further cause the processor to: train a docket generation model using the feature vectors, the set of feature groups, and the responses for each of the set of feature groups to generate a feature hierarchy of the extracted textual features in the feature vectors.
17. A computer-implemented method comprising: receiving data relating to a plurality of scenarios, wherein each of the scenarios comprise one or more corresponding supporting files; processing the data relating to each of the plurality of scenarios to extract a set of textual features for each of the plurality of scenarios; populating, for each of the plurality of scenarios, values of a corresponding feature vector with corresponding textual features extracted from the data relating to each of the plurality of scenarios; processing, by a docket generation model, each of the feature vectors to: identify a set of feature groups, each feature group comprising a set of scenarios that include one or more extracted textual features that are common across feature vectors corresponding to each scenario of the set of scenarios; and generate a docket comprising a listing of the set of feature groups, the listing arranged by a number of scenarios corresponding to each of the feature groups and/or a number of extracted textual features for each of the set of feature groups; transmitting the docket to a plurality of client devices; obtaining a response for each of the set of feature groups listed in the docket from any of the client devices; and training the docket generation model using the feature vectors, the set of feature groups, and the responses for each of the set of feature groups to generate a feature hierarchy of the extracted textual features in the feature vectors, wherein the feature hierarchy is configured to be used in generating the arrangement of the listing of the set of feature groups of the docket.
18. The computer-implemented method of claim 17, wherein processing the data relating to each of the plurality of scenarios to extract the set of textual features for each of the plurality of scenarios further comprises converting each supporting file into unstructured text, wherein the processing is performed on the unstructured text of each supporting file.
19. The computer-implemented method of claim 18, wherein processing the data relating to each of the scenarios on the unstructured text of each supporting file further comprises using a text analytics process to programmatically extract each textual feature from the unstructured text of each supporting file.
20. The computer-implemented method of claim 17, wherein each textual feature is populated in each corresponding feature vector according to a location of the textual feature identified from each corresponding supporting file.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0035]
[0036]
[0037]
[0038]
[0039]
[0040]
[0041]
[0042]
[0043]
[0044]
[0045]
[0046]
DETAILED DESCRIPTION OF ONE OR MORE EMBODIMENTS
[0047] A computing device or a connected computer network can obtain and maintain large datasets. For example, the dataset can provide medical data relating to a series of subjects, such as cancer patients. The data included in the datasets can include various types of data, which can be processed to derive insights into the dataset. For example, in a dataset providing medical records for cancer patients, the dataset can be processed to identify features common to groups of patients for further processing.
[0048] In many cases, a group of experts can be organized to review and discuss various topics. In one illustrative example, a group of medical doctors in a medical specialty can discuss cases of a specific type (e.g., a type of cancer) and potential treatment options for such cases. Such discussions across the group of experts can be referred to as an “expert board” or a “tumor board.” Further, in this example, a problem for these groups of experts is to find a time-efficient way for the experts to discuss and consult on a large number of related scenarios quickly and thoroughly.
[0049] As another example, a dataset can relate to tickets specifying various aspects of a datacenter. For example, a ticket can be generated in response to an inability to access a component/application in the datacenter or if network metrics (e.g., data throughput) exceed corresponding thresholds. A ticket can specify various metadata, such as a timestamp, a specific application, module, computer, VM, etc., related to the ticket, and a nature of the ticket (e.g., network-related issue, down server, unable to access an application). In a large datacenter or a cloud computing network, a large volume of tickets can be generated in a specific time duration. Tickets can be generated by computing systems monitoring network metrics or by users interacting with the datacenter. Insights derived from the volume of tickets can include defective applications, code, nodes/devices, etc. Further, the insights can provide details relating to attempted cyber-attacks, such as malware embedded in the datacenter or a distributed denial of service (DDoS) attack at one or more nodes in the datacenter.
[0050] In such examples, a group of experts can be convened to specify actions with respect to various aspects of the datasets. For example, a group of experts can interact on client devices over a network (e.g., the Internet) and discuss groups of scenarios included in the dataset (e.g., relating to cancer patients, relating to tickets generated in a datacenter). Further, the experts can retrieve data (e.g., documents, tables) that is part of the dataset from a computing node to derive further insights into aspects of the dataset.
[0051] However, in many instances, processing the volume of data in the datasets can be resource-intensive and time-intensive. In such meetings, experts may have to manually process data in the datasets to discuss various topics. For example, experts may have to review numerous documents, records, etc., for multiple scenarios to gather information for a topic. As another example, experts may manually review each generated ticket for a datacenter (e.g., chronologically) and attempt to identify aspects of the tickets. This process can be an inefficient use of computing and network resources (e.g., by retrieving multiple pieces of data from the dataset). Further, such processes can miss critical features in the dataset. For example, reviewing a small portion of tickets in a datacenter may not identify an attempted cyber-attack (e.g., a DDoS attack) on the datacenter.
[0052] The present embodiments relate to processing datasets to automatically generate dockets comprising a set of topics derived from the datasets. For example, the dataset comprising data relating to a plurality of scenarios can be processed by a computer system to derive textual features for the plurality of scenarios. The textual features can represent various aspects relating to each scenario, such as conditions, values, etc., identified from medical records relating to a patient. The set of textual features can be populated into a corresponding feature vector. Textual features can be added to the feature vector according to a type of data for each textual feature.
[0053] A docket generation model can further process the feature vectors to derive a docket comprising identified feature groups. For instance, a set of feature vectors can be processed to identify a set of feature groups. Each feature group can include a set of scenarios that include one or more extracted textual features that are common across feature vectors corresponding to each scenario of the set of scenarios. For example, a feature group can include a group of scenarios with a common condition (e.g., a medical condition), a common value, a common identified computing node in a datacenter, etc. The docket generation model can further generate a docket comprising a listing of the set of feature groups. The listing of the set of feature groups can be arranged by a number of scenarios corresponding to each of the feature groups and/or a number of extracted textual features for each of the set of feature groups. For instance, the docket generation model can generate a hierarchical structure of feature groups structured based on various aspects of the feature groups, such as a number of scenarios in a feature group. The generated docket can include a listing of topics based on the feature groups. The docket can be transmitted to a plurality of client devices (e.g., devices associated with a plurality of experts). In response, various actions or responses to the topics in the docket can be obtained. Further, in some instances, the docket generation model can be trained using the responses/actions for the corresponding docket.
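As an illustration only, the grouping and ordering described above can be sketched in a few lines of Python. This sketch is a simple counting stand-in and is not the docket generation model itself; the function name build_docket, the scenario identifiers, and the example feature values are all hypothetical, and feature values are assumed to be hashable.

```python
from collections import defaultdict

def build_docket(feature_vectors):
    """Group scenarios that share a feature value and order the groups by size.

    feature_vectors maps a scenario id to a list of feature values,
    where None marks a feature that was not detected (assumption).
    """
    groups = defaultdict(set)
    for scenario_id, vector in feature_vectors.items():
        for index, value in enumerate(vector):
            if value is not None:
                groups[(index, value)].add(scenario_id)
    # Keep only features that are common to at least two scenarios.
    shared = {key: ids for key, ids in groups.items() if len(ids) >= 2}
    # Largest groups first; ties are broken by feature index, then value.
    return sorted(
        shared.items(),
        key=lambda item: (-len(item[1]), item[0][0], repr(item[0][1])),
    )

# Hypothetical feature vectors: [subtype, diabetic, stage]
vectors = {
    "case-1": ["HER2+", True, None],
    "case-2": ["HER2+", None, None],
    "case-3": ["HER2+", True, "stage 2"],
}
docket = build_docket(vectors)
for (index, value), scenario_ids in docket:
    print(index, value, sorted(scenario_ids))
```

In this sketch the listing is arranged only by the number of scenarios in each group; a trained model or a feature hierarchy could supply a different ordering.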
[0054] In some embodiments, an example of the generated docket can relate to determining topics of discussion of tumor cases by a medical tumor board. It will be appreciated, however, that the present embodiments are not limited to such examples. For instance, the present embodiments can be used to determine cases/scenarios/topics for discussion by any expert panel or group in any area of expertise. Furthermore, the present embodiments may be implemented using other technologies than those disclosed below and those changes are within the scope of the disclosure.
[0055] In some instances, the present embodiments can implement a bucketing schema that can be mutually exclusive and collectively exhaustive in such a way that 1) each submitted case can be classified into exactly one bucket and 2) each of the cases in the same bucket share enough similarities that they can be discussed in a scalable, parallel way. For example, in the oncology tumor board use case, suppose 35 independent tumors were submitted for review. Without any a priori knowledge of what is similar and different about the tumors and their treatment protocols, each one may need to be discussed fully and at length in order to completely characterize the recommended medical response for each tumor. However, if there were some way to know beforehand which tumor classes have very similar treatment regimens that deviate only slightly based on individual tumor characteristics, then the tumors could be grouped together and discussed much more efficiently. This is exactly what the bucketing subprocess in the intake process can accomplish, as discussed in more detail below. The intake process can also provide the ability to upload additional unstructured files, which allows the intake form to be as short as possible. Thus, rather than requiring users to fill out pages of tedious paperwork every time they submit a case for review, the system can request a minimal set of information needed to characterize the case for discussion/review. Then, the system can allow for the convenient submission of raw additional files, which it can automatically ingest to extract additional key contextual details. As the ability to integrate and ingest unstructured data improves over time, the intake form can become shorter.
[0056]
[0057] The front end 102 may further comprise a case submission client 102A, a discussion moderation client 102B and an expert client of each expert. Each of the case submission client 102A, the discussion moderation client 102B, and the expert client may be a computing device with a processor, memory, display and connectivity circuits that allow each client to couple to and communicate with the back end 104. For example, each computing device may be a smartphone, a laptop computer, a terminal, a tablet device, etc. that stores a plurality of lines of instructions that, when executed by the processor, configure the processor to perform the operations of each different client. For example, the case submission client 102A may allow each user of the system who has a scenario to submit to the expert panel to submit that scenario and any other files relevant to the submitted scenario. The user that submits a scenario may be an individual user, a representative of the individual user, or a provider who is working with the individual user or providing advice to the individual user. The case submission client 102A may also present/display the results of the expert panel analysis of the scenario. Unlike other systems that may host live audio and/or video calls about the results of an expert board's second opinion, the system may generate a video after the expert board meets, tag the video with metadata, transcribe the video, and generate summary reports, and the video and summary reports may be sent back to each case submission client 102A.
[0058] The discussion moderation client 102B may be used by an employee of the system (or a consultant) to manage the discussion process of one or more scenarios by the expert panel. Each expert client allows each expert in the expert panel to review each selected scenario and then provide their analysis for each scenario wherein the analysis varies depending on the type of scenario being submitted. For example, if each scenario is a patient cancer case, then the resultant analysis may be a recommended oncological treatment plan for that particular patient. As another example, if the scenario is an architectural plan for a new house for a user, the analysis may be one or more recommendations about that architectural plan. In all cases, the system provides more people with access to the expert panel and thus allows each user to benefit from the recommendations or second opinions of each of the experts on the expert panel.
[0059] In one embodiment, the backend 104 may further include a cloud web server 104A, business logic 104B, and cloud data storage 104C that are all implemented using cloud computing resources. These elements of the backend 104 implement processes to receive each scenario submission and one or more files, process each scenario submission and file, select the scenarios to be discussed by the expert panel, and manage the expert panel discussion as discussed below.
[0060] In one implementation of the system, each end user accesses the system's content and services via an Internet web application (an implementation of the client 102A) via mobile, tablet, or PC end points. This application hosts video, text, other multimedia content, as well as forms for digital case submission. The client 102A allows each user to submit cases for review through an online submission form that may consist of two parts: a structured intake form and unstructured additional file uploads. Each user submitting cases will fill out an intake form created by our team of field experts to quickly extract all of the key relevant details needed to assess and characterize the case. In addition, providers may upload as many additional files as they would like (.pdf, .docx, .jpeg, etc.) in any unstructured format. The information on the intake form can: 1) uniquely characterize the case into one of several “buckets” (designed through expert review for each type of domain (medical, architecture, etc.) to be a mutually exclusive and collectively exhaustive partition of the possibility space in the domain), 2) create a case record for storage in the cloud database, and 3) properly tag the case record with the bucket designation. In addition, various OCR, text analytics, and computer vision algorithms can try to pull various additional contextual pieces of information from the other submitted unstructured files to append to the case record in the database in the appropriate tables.
[0061] The system may have a number of experts who participate in the expert boards. Each expert may have a particular expertise and is chosen by the system to participate in certain expert boards based on the particular expertise of the expert and the topics being discussed by the expert board. In one embodiment, the system may tag each expert to one or more buckets in the area of expertise based on the expertise of the expert. Thus, the panel of experts may be selected by the system using the tags when a particular bucket in an area of expertise (such as HER2 breast cancer in a medical oncology expertise area) is being discussed by the expert panel.
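As an illustration only, selecting a panel from expert tags can be sketched as follows. The function name select_panel, the expert names, and the bucket labels other than the HER2 breast cancer example are hypothetical and are not taken from the disclosure.

```python
def select_panel(expert_tags, bucket):
    """Return the experts tagged to the given bucket, sorted by name."""
    return sorted(name for name, tags in expert_tags.items() if bucket in tags)

# Hypothetical tagging of experts to buckets in a medical oncology area.
expert_tags = {
    "expert_a": {"HER2 breast cancer", "triple-negative breast cancer"},
    "expert_b": {"lung cancer"},
    "expert_c": {"HER2 breast cancer"},
}

panel = select_panel(expert_tags, "HER2 breast cancer")
print(panel)
```

An expert can carry several tags, so the same expert can be selected for more than one bucket.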
[0062] The bucketing for each area of expertise (placing each scenario into a bucket) can create the collectively exhaustive partition of the possibility space for a particular area of expertise. An example of the bucketing model is shown in
[0063] After submission of the scenarios and bucketing of the scenarios, a discussion/analysis of the scenarios for a specific bucket may be scheduled for the panel of experts. The moderator will visit the web application (the discussion moderation client 102B) using their own administrative account with privileged access and perform a “docket generation” process that can pull all submitted scenarios from the selected bucket that have not yet been processed/discussed and organize them into a suggested list of questions/discussion points/expected discussion times for the moderator to use when leading the round table discussion. As a non-limiting example, the selected bucket for a medical tumor board may be patients who have breast cancer, and the scenarios in the bucket are discussed by experts in the field of breast cancer who provide recommended oncological treatments for each patient scenario.
[0064] A roundtable discussion may involve the discussion of several buckets, so the full docket creation process may involve segmenting the conversation into buckets. Next, within each bucket, the process may suggest a high-level introductory question to begin the discussion that will apply to all cases within that bucket. For example, if the discussion is about breast cancer tumors and the bucket is HER2+ cancer, then the initial question may be something like “What is the state-of-the-art in the treatment of HER2+ breast tumors?”
[0065] The above process may then search through the unique additional contextual facts of the cases being discussed within that bucket to identify commonalities and distinction points to generate additional follow-up questions. For the previous example, the docket may then suggest “How does your recommendation change if a patient is diabetic?” or “What if the tumor does not respond to X?”. By identifying these commonalities, the moderator can effectively allow the experts at the table to discuss multiple cases simultaneously, significantly decreasing the amount of time needed to review all submitted cases. However, the selection process is designed such that it will guarantee that all of the relevant details from any given specific case in the bucket will be discussed at some point so that each patient's scenario is fully discussed by the expert panel.
[0066]
[0067] Once the scenario submission is completed, the method then can perform scenario processing (204). This process 204, like the submission process 202, is performed for each scenario that is being submitted. In one embodiment shown in
[0068] The second subprocess, like the above OCR process, is implemented by a plurality of lines of computer code/instructions executed by a processor on a computer of the back end 104 to configure the processor to perform a process. Using the business logic 104B of the backend 104, the second subprocess performs a feature extraction process in which one or more features of interest are extracted from the scenario data. The one or more extracted features of interest are stored in the storage 104C with the other scenario data. An example of this feature extraction subprocess for medical oncology data scenario submissions is shown in
[0069] Returning to
[0070] In some instances, the above method can scale to handle a large number of scenarios in each bucket without having to change/adjust the above process. For illustration, an example of the scalability for oncology is now provided, but this same scalability would exist for all other use cases. In traditional tumor boards, the session lasts approximately 1 hour and anywhere from 6-10 cases may be discussed. In many of the previous solutions discussed above, providers may discuss their cases with their second opinion provider for anywhere from 10 to 30 minutes. In general, various methods can allow for a discussion of anywhere between 2-10 cases per hour, and there is no clear opportunity for scaling. Assuming the best-case scenario of 10 cases per hour, the average is 6 minutes per case, so a discussion of 30 cases can last 180 minutes, or 3 hours.
[0071] In the above disclosed method, if there is a bucket with 30 cases in it, the first 2 minutes of the round table discussion can be dedicated to discussing high-level treatment recommendations that are applicable to all 30 cases. Now, let's assume that in this bucket, there are four additional binary contextual variables that would affect the recommended treatment protocol (e.g., the presence of comorbidity X, whether the stage is higher or lower than 3, the presence of a specific mutation, etc.). These variables would have been identified by the similarity-difference process's analysis of the contextual information stored about each case in the bucket. There can be 2^4 = 16 possible combinations of these binary variables, so at most 16 possible aberrations from the general recommended treatment need to be discussed. Assuming that it takes about 1 minute to discuss one of these 16 possibilities and we have to discuss all of them, all 30 scenarios can be discussed in 2 + 16 = 18 minutes. Compared with the 180 minutes of the traditional approach, this is a 10× speed-up, an entire order of magnitude. Whether these possibilities are discussed in a branching, tree-like fashion or each of the 16 possibilities is just discussed sequentially won't affect the length of the discussion; it will still be 2 + 16 = 18 minutes. Often, many of these 2^4 combinations of binary variables won't be represented by any of the scenarios in the bucket, further decreasing the time taken to discuss all 30 cases. Thus, the disclosed method provides scalability and can perform the task more rapidly than many processes due to the technical processes described above that allow the backend 104 to achieve these benefits.
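The timing comparison above can be checked with a short calculation. The sketch below is illustrative only; the function and parameter names are hypothetical, and the default values (6 minutes per case, a 2-minute introduction, 1 minute per combination) are the figures assumed in the example above.

```python
def traditional_minutes(cases, minutes_per_case=6):
    """Total time when every case is discussed on its own."""
    return cases * minutes_per_case

def bucketed_minutes(binary_variables, intro_minutes=2,
                     minutes_per_combination=1, observed_combinations=None):
    """Total time when cases are discussed per combination of binary variables.

    If fewer combinations actually occur in the bucket, only those are counted.
    """
    combinations = 2 ** binary_variables
    if observed_combinations is not None:
        combinations = min(combinations, observed_combinations)
    return intro_minutes + combinations * minutes_per_combination

baseline = traditional_minutes(30)   # 30 cases at 6 minutes each
bucketed = bucketed_minutes(4)       # 2 + 2**4 minutes
print(baseline, bucketed, baseline / bucketed)
```

Passing observed_combinations models the case in which some of the 16 combinations are not represented in the bucket, which shortens the discussion further.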
[0072] The similarity-difference process may receive as input the known variables in an area of expertise that may affect a recommendation/outcome of the expert panel. The similarity-difference process may identify scenarios that match the known variables and scenarios that do not match the known variables. In the medical oncology example, the similarity-difference process may take the known variables that may affect oncology treatment recommendations and scan the cases in a particular bucket to see which cases match (on those variables) and which ones are different. In a simple example, if the only two important variables are diabetes and allergies and if there are 4 cases in the bucket, the similarity-difference process determines how many cases exist of [non-diabetic, non-allergic], [diabetic, non-allergic], [non-diabetic, allergic], [diabetic, allergic], since these will be the four branches/topics that need to be discussed by the expert panel. However, it is possible that there are only cases in the [non-diabetic, allergic] and [non-diabetic, non-allergic] groups for a particular bucket, in which case the agenda generated for the particular bucket (and used by the moderator) will not ask about diabetes, which simplifies the discussion by the expert panel. In many systems, the diabetes topic would first be discussed and then the expert panel would realize that there are no cases being discussed that have the diabetes variable.
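As an illustration only, the diabetes and allergies example can be sketched as follows. The function names branch_counts and variables_present and the case records are hypothetical, and the sketch assumes that a variable missing from a case record means the condition is absent.

```python
from collections import Counter

def branch_counts(cases, variables):
    """Count the cases in each observed combination of the known binary variables."""
    return Counter(
        tuple(bool(case.get(name, False)) for name in variables) for case in cases
    )

def variables_present(cases, variables):
    """Return the variables that are true for at least one case in the bucket."""
    return [name for name in variables
            if any(case.get(name, False) for case in cases)]

variables = ["diabetic", "allergic"]
# Hypothetical bucket of 4 cases, none of which is diabetic.
cases = [
    {"diabetic": False, "allergic": True},
    {"diabetic": False, "allergic": False},
    {"diabetic": False, "allergic": True},
    {"diabetic": False, "allergic": False},
]

counts = branch_counts(cases, variables)
agenda = variables_present(cases, variables)
print(dict(counts), agenda)
```

Only two of the four possible branches are populated here, and diabetes does not appear in the agenda because no case in the bucket has it.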
[0073]
[0074] First, as shown in
[0075] Second, the non-PII information for each scenario may be used by a bucketing and case creation process 702. This process also may be implemented as a plurality of lines of computer code/instructions executed by a processor of a computer system of the back end 104 that configures the computer system or processor to perform the bucketing process. This process may also be called bucket tagging in which each scenario may be assigned a bucket, based on the information provided in the intake form, and thus all case data rows are tagged with a bucket ID to allow for fast generation of roundtable dockets as discussed below. Note that for each different use case, a different bucket schema will be needed. For example, in the oncology use case, the buckets differentiate different types of tumors as well as unique characteristics of each type of tumor. In a use case for a different area of expertise, the bucket schema may be different.
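As an illustration only, bucket tagging against a mutually exclusive and collectively exhaustive schema can be sketched as follows. The schema, the intake field names (tumor_type, her2), and the bucket IDs are hypothetical; an actual bucket schema would be designed by experts for each area of expertise.

```python
def assign_bucket(intake, schema):
    """Return the single bucket ID whose rule matches the intake form.

    Raises ValueError if zero or several rules match, which would mean the
    schema is not a partition for this case.
    """
    matches = [bucket_id for bucket_id, rule in schema if rule(intake)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one bucket, matched {matches}")
    return matches[0]

# Hypothetical schema: (bucket ID, rule over intake form fields).
schema = [
    ("breast-her2-positive",
     lambda form: form["tumor_type"] == "breast" and form["her2"] is True),
    ("breast-her2-negative",
     lambda form: form["tumor_type"] == "breast" and form["her2"] is False),
    ("other", lambda form: form["tumor_type"] != "breast"),
]

case_rows = [
    {"case_id": 1, "tumor_type": "breast", "her2": True},
    {"case_id": 2, "tumor_type": "lung", "her2": None},
]
# Tag every case data row with its bucket ID.
for row in case_rows:
    row["bucket_id"] = assign_bucket(row, schema)
print([(row["case_id"], row["bucket_id"]) for row in case_rows])
```

The explicit check for exactly one match is one way to surface a schema that overlaps or leaves gaps before a docket is generated from the tagged rows.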
[0076] Third, the system and process 202 may perform an automatic feature extraction process 704 from the unstructured one or more files submitted with the scenario. This process also may be implemented as a plurality of lines of computer code/instructions executed by a processor of a computer system of the back end 104 that configures the computer system or processor to perform the feature extraction process. For each of the one or more file(s) submitted for a scenario, the system may store the raw file (with the necessary security levels) as well as an OCR-generated (process 302) version of the contents.
[0077] To take one example use case, 50 additional features may be identified that are potentially relevant to a discussion for an area of expertise but are not included in the intake form and, for each feature, this process may perform one of multiple text analytics/natural language processing (NLP) techniques to attempt to extract that feature from unstructured text wherein each feature is particular data that may be extracted from each scenario. An example of a feature that may be extracted using the above techniques in the medical area of expertise may be whether or not the patient was hospitalized. This feature is not explicitly on the intake form, but the system can analyze the one or more additional files submitted using the above techniques to find words/phrases/other indications that the patient was hospitalized. The system can then add another column/feature (hospitalized=true or false in this example) to the data. The one or more feature(s) discovered using the above techniques may be applicable to a single bucket of multiple buckets in the area of expertise. As will be apparent, the features for each different area of expertise may be very different so that hospitalization would not be a feature for the finance or architecture area of expertise. For example, in the architecture example, a feature may be the presence of a heat pump in a residence that again may not be explicitly in the intake form but can be discovered by the system.
[0078] The process 704 may use one or more of simple regex keyword searches, complex language models based on deep neural networks, Bayesian topic models, FastText, lexical network analysis, Word2Vec, GloVe, etc., which together include neural networks, topic models, Bayesian models, graphical models, and other vector-space models. The one or more text analytics methods selected may be custom selected for each feature. In one embodiment, the text analytics method for each feature may be manually created to best capture that feature from the unstructured data of the one or more additional files. Then, for each submitted scenario, the process will initialize a feature vector (of length 50, in this example) of all NULL values. Next, the process will loop over each submitted file, apply OCR to it, and then run each feature extraction algorithm over the text. If the process is able to extract information related to a feature, it will update the feature vector at the corresponding index. Otherwise, it will leave that entry NULL. The result is a feature vector for each case that is a mixture of NULL values (features that were not detected) and feature values of various data types (e.g., dictionaries for family histories, Boolean values for the presence of comorbidities, etc.). These vectors will then be combined with the contextual data provided in the intake form to create entries in the Case Data tables.
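The loop described above can be sketched in Python as follows. This is a simplified illustration, not the disclosed implementation: it uses only two hypothetical regex-based extractors instead of 50, it assumes the OCR step has already produced plain text for each file, and the function names and patterns are invented for the example.

```python
import re

def extract_hospitalized(text):
    """Return True if the text mentions a hospitalization, else None (not detected).

    Deliberately naive: a phrase such as "not hospitalized" would also match.
    """
    if re.search(r"\b(hospitali[sz]ed|admitted to (the )?hospital)\b", text, re.IGNORECASE):
        return True
    return None

def extract_family_history(text):
    """Return a dict such as {"mother": "breast"}, or None if nothing is found."""
    found = {}
    pattern = r"\b(mother|father|sister|brother) (?:had|has) ([a-z ]+?) cancer\b"
    for match in re.finditer(pattern, text, re.IGNORECASE):
        found[match.group(1).lower()] = match.group(2).lower()
    return found or None

# One extractor per feature; the list position is the feature's index in the vector.
EXTRACTORS = [extract_hospitalized, extract_family_history]

def build_feature_vector(ocr_texts):
    """Start from all-NULL (None) values and fill in whatever the extractors detect."""
    vector = [None] * len(EXTRACTORS)
    for text in ocr_texts:
        for index, extractor in enumerate(EXTRACTORS):
            value = extractor(text)
            if value is not None:
                # A later file overwrites an earlier value in this sketch.
                vector[index] = value
    return vector

texts = ["Patient was hospitalized in March.", "Mother had breast cancer."]
vector = build_feature_vector(texts)
```

Entries that no extractor can fill stay `None`, which corresponds to the NULL values for undetected features.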
[0079]
[0080]
[0081] To generate a docket, the process may begin by asking a general question soliciting recommendations for any case in bucket B (see HER2 overview and treatment question in
[0082] Using the threshold k, the method may then select the k features with the highest variable importance scores from the random forest model to construct a subvector f_i for each case c_i. The jth element of f_i is the ith case's value for the jth most important feature from the random forest. Let f be the matrix whose ith row is f_i. The method, based on the above, then generates topics for the features in all of the cases in the bucket (again see example in
[0083] After the panel has completed its recommendations for the cases in the bucket, the method may codify the expert recommendations as new w_i vectors and retrain the random forest to generate a fresh set of “most important variables”. This training loop will continually improve the ability of the process to ask questions only about relevant features. For example, in the oncology use case, the system performs an analysis by taking all of the features from all of the cases that the system has already handled and building a model to predict the recommended treatment(s). These recommended treatments may be as simple as a three-element Boolean vector such as [radiation=false, chemo=true, surgery=false]. In building this model, the system may discover that some features are more predictive than others (i.e., some features have a larger impact on the treatment regimen), and these features are the key variables. Moving forward, the docket creation process knows that these are the key variables to discuss when creating topic agendas. Then, the system hosts more tumor boards and collects more data. With this additional data, the analysis above is updated, potentially discovering more or better key variables.
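The selection of the k key variables and the construction of the per-case subvectors can be sketched as below. The sketch assumes that variable importance scores have already been produced by a trained model (for example, the random forest mentioned above); the scores, feature names, and function names shown here are illustrative placeholders, not values from the disclosure.

```python
def top_k_features(importances, k):
    """Return the names of the k features with the highest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]

def build_subvectors(cases, key_features):
    """Row i holds case i's values for the key features, most important first.

    Features that were never extracted for a case stay None (NULL).
    """
    return [[case.get(name) for name in key_features] for case in cases]

# Placeholder importance scores, assumed to come from a previously trained model.
importances = {"hospitalized": 0.10, "her2_positive": 0.55, "diabetic": 0.30, "smoker": 0.05}
cases = [
    {"her2_positive": True, "diabetic": False, "smoker": True},
    {"her2_positive": False, "diabetic": True, "hospitalized": True},
]
key_features = top_k_features(importances, k=2)
f = build_subvectors(cases, key_features)
```

After each panel session, the importance scores would be recomputed from the retrained model and the same selection repeated, so the key variables can change over time.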
[0084] Method for Docket Generation
[0085] As described above, the present embodiments relate to generation of a docket of feature groups based on a plurality of identified features from a dataset.
[0086] At 1202, the method can include receiving data relating to a plurality of scenarios. Each scenario can include a portion of data, such as data relating to a patient or a ticket generated for a datacenter, for example. Each of the scenarios can include one or more corresponding supporting files. Example supporting files can include medical records, metadata relating to a ticket, graph or chart data, etc. In some instances, the scenarios relate to medical cases, and the supporting files comprise medical documents. In other instances, the scenarios comprise automatically generated tickets relating to a datacenter, and the supporting files comprise data relating to the automatically-generated ticket.
[0087] At 1204, the method can include processing the data relating to each of the plurality of scenarios to extract a set of textual features for each of the plurality of scenarios. Each extracted textual feature can include a feature specific to a scenario. An example textual feature can include a condition (e.g., a medical condition) or measured value (e.g., a test result) relating to a first scenario (e.g., a patient). Another example textual feature can relate to an application or network device within a datacenter that corresponds to a given ticket.
[0088] In some instances, processing the data relating to each of the plurality of scenarios to extract the set of textual features for each of the plurality of scenarios further comprises converting each supporting file into unstructured text. The processing can be performed on the unstructured text of each supporting file. Further, processing the data relating to each of the scenarios on the unstructured text of each supporting file can include using a text analytics process to programmatically extract each textual feature from the unstructured text of each supporting file.
[0089] At 1206, the method can include initializing, for each of the plurality of scenarios, a feature vector comprising a plurality of null values. The null values can be replaced with values as relevant textual features are identified.
[0090] At 1208, the method can include populating values of each feature vector with corresponding textual features extracted from the data relating to each of the plurality of scenarios. In some instances, a feature vector can include both null values and populated values. The portions of the feature vector comprising null values can include textual feature types not identified for a specific scenario.
[0091] In some instances, each textual feature is populated in each corresponding feature vector according to a location of the textual feature identified from each corresponding supporting file. For example, a mapping can be generated specifying a feature type assigned to each value in the feature vector. As an example, a first value can be assigned a device ID specifying a device in a datacenter relating to a ticket.
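A minimal sketch of such a mapping for the datacenter ticket example is shown below. The feature types other than the device ID, and the helper names, are assumptions added for illustration.

```python
# Hypothetical mapping from feature type to its position in the feature vector.
FEATURE_INDEX = {"device_id": 0, "application": 1, "error_code": 2}

def new_feature_vector():
    """Initialize a feature vector of null (None) values, one slot per feature type."""
    return [None] * len(FEATURE_INDEX)

def set_feature(vector, feature_type, value):
    """Populate the slot assigned to `feature_type`; other slots are left untouched."""
    vector[FEATURE_INDEX[feature_type]] = value

vector = new_feature_vector()
set_feature(vector, "device_id", "switch-17")
```

Here the first value holds the device ID, while the slots for feature types not identified in the ticket remain null.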
[0092] At 1210, the method can include processing each of the feature vectors to identify a set of feature groups. A docket generation model can process the feature vectors as described herein. Each feature group can include a set of scenarios that include one or more extracted textual features that are common across feature vectors corresponding to each scenario of the set of scenarios. A first example feature group can include a group of scenarios comprising a common medical condition across the group of scenarios. Another example feature group can include a group of tickets comprising a commonly identified network device that originated a ticket in a datacenter. The feature groups can be identified based on comparing feature vector values in the set of feature vectors for each scenario.
[0093] At 1212, the docket generation model can generate a docket comprising a listing of the set of feature groups. The listing of the set of feature groups can be arranged by a number of scenarios corresponding to each of the feature groups and/or a number of extracted textual features for each of the set of feature groups.
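Steps 1210 and 1212 can be illustrated with a simplified Python sketch. It is not the docket generation model itself: it forms one feature group per shared (index, value) pair, assumes feature values are hashable, and treats a feature as "common" when at least two scenarios share it. All three choices, and the function names, are assumptions made for this example.

```python
from collections import defaultdict

def identify_feature_groups(feature_vectors):
    """Map each shared (index, value) pair to the scenario IDs whose vectors contain it."""
    groups = defaultdict(list)
    for scenario_id, vector in feature_vectors.items():
        for index, value in enumerate(vector):
            if value is not None:  # null entries never form a group
                groups[(index, value)].append(scenario_id)
    # Keep only features that are common to two or more scenarios.
    return {feature: ids for feature, ids in groups.items() if len(ids) >= 2}

def generate_docket(groups):
    """List the feature groups, largest number of scenarios first."""
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)

# Ticket feature vectors: index 0 = device ID, index 1 = application (hypothetical).
vectors = {
    "t1": ["switch-17", "billing"],
    "t2": ["switch-17", None],
    "t3": ["switch-17", "billing"],
    "t4": ["router-2", None],
}
docket = generate_docket(identify_feature_groups(vectors))
```

In this example the group of three tickets sharing the same device is listed before the group of two tickets sharing the same application, and the single ticket from the other device forms no group.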
[0094] In some instances, each textual feature that is part of each of the feature groups can be processed using a feature hierarchy to generate a cumulative feature score of each feature group. A feature hierarchy can include a multi-level hierarchy of features, with each feature assigned weights and/or values for use in identifying a priority of each feature group. In some instances, the feature hierarchy can utilize a tree structure, and the arrangement of feature groups on the docket can be based on a location of each feature group in the tree structure. The arrangement of the feature groups can be based at least on the cumulative feature score of each feature group. In some instances, the docket generation model includes a random forest model. In some instances, the listing of feature groups is presented as sentences that include the features common to the feature groups.
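One way to picture the cumulative feature score is the following sketch, which flattens the feature hierarchy into a single table of per-feature weights and sums the weights of the features in each group. The weights, feature names, and group names are hypothetical, and a tree-structured hierarchy could be scored differently.

```python
# Hypothetical per-feature weights standing in for a feature hierarchy.
FEATURE_WEIGHTS = {"her2_positive": 3.0, "diabetic": 2.0, "smoker": 0.5}

def cumulative_score(group_features, weights):
    """Sum the weights of the features in a group; unknown features contribute 0."""
    return sum(weights.get(name, 0.0) for name in group_features)

def arrange_by_score(feature_groups, weights):
    """Return group names ordered from highest to lowest cumulative feature score."""
    return sorted(
        feature_groups,
        key=lambda name: cumulative_score(feature_groups[name], weights),
        reverse=True,
    )

feature_groups = {
    "group_a": ["diabetic", "smoker"],
    "group_b": ["her2_positive"],
    "group_c": ["smoker"],
}
order = arrange_by_score(feature_groups, FEATURE_WEIGHTS)
```

With these placeholder weights, a group with a single highly weighted feature can be placed ahead of a group with several lightly weighted ones.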
[0095] At 1214, the method can include transmitting the docket to a plurality of client devices. In some instances, the method can include obtaining a response for each of the set of feature groups listed in the docket from any of the client devices.
[0096] In some instances, the method can include training the docket generation model using the feature vectors, the set of feature groups, and the responses for each of the set of feature groups to generate a feature hierarchy of the extracted textual features in the feature vectors.
[0097] In another example embodiment, a system is provided. The system can include a processor and a computer-readable medium. The computer-readable medium can comprise instructions that, when executed by the processor, cause the processor to receive data relating to a plurality of scenarios. Each of the scenarios can comprise one or more corresponding supporting files.
[0098] The instructions can further cause the processor to process the data relating to each of the plurality of scenarios to extract a set of textual features for each of the plurality of scenarios. Processing the data further can include converting each supporting file into unstructured text.
[0099] The instructions can further cause the processor to populate, for each of the plurality of scenarios, values of a corresponding feature vector with corresponding textual features extracted from the data relating to each of the plurality of scenarios. Each textual feature can be populated in each corresponding feature vector according to a location of the textual feature identified from each corresponding supporting file.
[0100] The instructions can further cause the processor to process each of the feature vectors to identify a set of feature groups, each feature group comprising a set of scenarios that include one or more extracted textual features that are common across feature vectors corresponding to each scenario of the set of scenarios. The instructions can further cause the processor to generate a docket comprising a listing of the set of feature groups. The listing can be arranged by a number of scenarios corresponding to each of the feature groups and/or a number of extracted textual features for each of the set of feature groups.
[0101] The instructions can further cause the processor to transmit the docket to a plurality of client devices.
[0102] In some instances, processing the data relating to each of the scenarios on the unstructured text of each supporting file further comprises using a text analytics process to programmatically extract each textual feature from the unstructured text of each supporting file.
[0103] In some instances, the instructions further cause the processor to process each textual feature part of each of the feature groups using a feature hierarchy to generate a cumulative feature score of each feature group, wherein the arrangement of the feature groups is based at least on the cumulative feature score of each feature group.
[0104] In some instances, the instructions further cause the processor to obtain a response for each of the set of feature groups listed in the docket from any of the client devices.
[0105] In some instances, the instructions further cause the processor to train a docket generation model using the feature vectors, the set of feature groups, and the responses for each of the set of feature groups to generate a feature hierarchy of the extracted textual features in the feature vectors.
[0106] In another example embodiment, a computer-implemented method is provided. The computer-implemented method can include receiving data relating to a plurality of scenarios. Each of the scenarios can comprise one or more corresponding supporting files.
[0107] The computer-implemented method can also include processing the data relating to each of the plurality of scenarios to extract a set of textual features for each of the plurality of scenarios.
[0108] The computer-implemented method can also include populating, for each of the plurality of scenarios, values of a corresponding feature vector with corresponding textual features extracted from the data relating to each of the plurality of scenarios.
[0109] The computer-implemented method can also include processing, by a docket generation model, each of the feature vectors to identify a set of feature groups, each feature group comprising a set of scenarios that include one or more extracted textual features that are common across feature vectors corresponding to each scenario of the set of scenarios, and generating a docket comprising a listing of the set of feature groups. The listing can be arranged by a number of scenarios corresponding to each of the feature groups and/or a number of extracted textual features for each of the set of feature groups.
[0110] The computer-implemented method can also include transmitting the docket to a plurality of client devices.
[0111] The computer-implemented method can also include obtaining a response for each of the set of feature groups listed in the docket from any of the client devices. The computer-implemented method can also include training the docket generation model using the feature vectors, the set of feature groups, and the responses for each of the set of feature groups to generate a feature hierarchy of the extracted textual features in the feature vectors. The feature hierarchy can be used in arranging the listing of the set of feature groups.
[0112] In some instances, processing the data relating to each of the plurality of scenarios to extract the set of textual features for each of the plurality of scenarios further comprises converting each supporting file into unstructured text. The processing can be performed on the unstructured text of each supporting file.
[0113] In some instances, processing the data relating to each of the scenarios on the unstructured text of each supporting file further comprises using a text analytics process to programmatically extract each textual feature from the unstructured text of each supporting file.
[0114] In some instances, each textual feature is populated in each corresponding feature vector according to a location of the textual feature identified from each corresponding supporting file.
Tumor Board Use Case
[0115] In a use case in which the above-described system and method is being used for a tumor board that is discussing oncology treatment recommendations for different tumors, the PII separation process is very important at the intake stage. Oncology providers who submit the different scenarios for the tumor of each patient will use the online intake form to answer questions about complex tumor cases, attaching additional Electronic Medical Records (EMRs). The intake process carefully separates case data from PII and stores the values accordingly as discussed above. In this use case, each bucket may correspond to different tumor subtypes within individual sites (e.g., Stage II Breast, etc.). The review panel will consist of one or more experts who are considered leaders in the field of oncology research and hail from prestigious institutions.
[0116] The foregoing description, for purpose of explanation, has been with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the disclosure and its practical applications, to thereby enable others skilled in the art to best utilize the disclosure and various embodiments with various modifications as are suited to the particular use contemplated.
[0117] The system and method disclosed herein may be implemented via one or more components, systems, servers, appliances, other subcomponents, or distributed between such elements. When implemented as a system, such systems may include and/or involve, inter alia, components such as software modules, general-purpose CPU, RAM, etc. found in general-purpose computers. In implementations where the innovations reside on a server, such a server may include or involve components such as CPU, RAM, etc., such as those found in general-purpose computers.
[0118] Additionally, the system and method herein may be achieved via implementations with disparate or entirely different software, hardware and/or firmware components, beyond those set forth above. With regard to such other components (e.g., software, processing components, etc.) and/or computer-readable media associated with or embodying the present inventions, for example, aspects of the innovations herein may be implemented consistent with numerous general purpose or special purpose computing systems or configurations. Various exemplary computing systems, environments, and/or configurations that may be suitable for use with the innovations herein may include, but are not limited to: software or other components within or embodied on personal computers, servers or server computing devices such as routing/connectivity components, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, consumer electronic devices, network PCs, other existing computer platforms, distributed computing environments that include one or more of the above systems or devices, etc.
[0119] In some instances, aspects of the system and method may be achieved via or performed by logic and/or logic instructions including program modules, executed in association with such components or circuitry, for example. In general, program modules may include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular instructions herein. The inventions may also be practiced in the context of distributed software, computer, or circuit settings where circuitry is connected via communication buses, circuitry or links. In distributed settings, control/instructions may occur from both local and remote computer storage media including memory storage devices.
[0120] The software, circuitry and components herein may also include and/or utilize one or more types of computer readable media. Computer readable media can be any available media that is resident on, associable with, or can be accessed by such circuits and/or computing components. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and can be accessed by a computing component. Communication media may comprise computer readable instructions, data structures, program modules and/or other components. Further, communication media may include wired media such as a wired network or direct-wired connection; however, no media of any such type herein includes transitory media. Combinations of any of the above are also included within the scope of computer readable media.
[0121] In the present description, the terms component, module, device, etc. may refer to any type of logical or functional software elements, circuits, blocks and/or processes that may be implemented in a variety of ways. For example, the functions of various circuits and/or blocks can be combined with one another into any other number of modules. Each module may even be implemented as a software program stored on a tangible memory (e.g., random access memory, read only memory, CD-ROM memory, hard disk drive, etc.) to be read by a central processing unit to implement the functions of the innovations herein. Or the modules can comprise programming instructions transmitted to a general-purpose computer or to processing/graphics hardware via a transmission carrier wave. Also, the modules can be implemented as hardware logic circuitry implementing the functions encompassed by the innovations herein. Finally, the modules can be implemented using special purpose instructions (SIMD instructions), field programmable logic arrays or any mix thereof which provides the desired level of performance and cost.
[0122] As disclosed herein, features consistent with the disclosure may be implemented via computer-hardware, software, and/or firmware. For example, the systems and methods disclosed herein may be embodied in various forms including, for example, a data processor, such as a computer that also includes a database, digital electronic circuitry, firmware, software, or in combinations of them. Further, while some of the disclosed implementations describe specific hardware components, systems and methods consistent with the innovations herein may be implemented with any combination of hardware, software and/or firmware. Moreover, the above-noted features and other aspects and principles of the innovations herein may be implemented in various environments. Such environments and related applications may be specially constructed for performing the various routines, processes and/or operations according to the invention or they may include a general-purpose computer or computing platform selectively activated or reconfigured by code to provide the necessary functionality. The processes disclosed herein are not inherently related to any particular computer, network, architecture, environment, or other apparatus, and may be implemented by a suitable combination of hardware, software, and/or firmware. For example, various general-purpose machines may be used with programs written in accordance with teachings of the invention, or it may be more convenient to construct a specialized apparatus or system to perform the required methods and techniques.
[0123] Aspects of the method and system described herein, such as the logic, may also be implemented as functionality programmed into any of a variety of circuitry, including programmable logic devices (“PLDs”), such as field programmable gate arrays (“FPGAs”), programmable array logic (“PAL”) devices, electrically programmable logic and memory devices and standard cell-based devices, as well as application specific integrated circuits. Some other possibilities for implementing aspects include: memory devices, microcontrollers with memory (such as EEPROM), embedded microprocessors, firmware, software, etc. Furthermore, aspects may be embodied in microprocessors having software-based circuit emulation, discrete logic (sequential and combinatorial), custom devices, fuzzy (neural) logic, quantum devices, and hybrids of any of the above device types. The underlying device technologies may be provided in a variety of component types, e.g., metal-oxide semiconductor field-effect transistor (“MOSFET”) technologies like complementary metal-oxide semiconductor (“CMOS”), bipolar technologies like emitter-coupled logic (“ECL”), polymer technologies (e.g., silicon-conjugated polymer and metal-conjugated polymer-metal structures), mixed analog and digital, and so on.
[0124] It should also be noted that the various logic and/or functions disclosed herein may be enabled using any number of combinations of hardware, firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, non-volatile storage media in various forms (e.g., optical, magnetic or semiconductor storage media), though again this does not include transitory media. Unless the context clearly requires otherwise, throughout the description, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of “including, but not limited to.” Words using the singular or plural number also include the plural or singular number respectively. Additionally, the words “herein,” “hereunder,” “above,” “below,” and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word “or” is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list and any combination of the items in the list.
[0125] Although certain presently preferred implementations of the invention have been specifically described herein, it will be apparent to those skilled in the art to which the invention pertains that variations and modifications of the various implementations shown and described herein may be made without departing from the spirit and scope of the invention. Accordingly, it is intended that the invention be limited only to the extent required by the applicable rules of law.
[0126] While the foregoing has been with reference to a particular embodiment of the disclosure, it will be appreciated by those skilled in the art that changes in this embodiment may be made without departing from the principles and spirit of the disclosure, the scope of which is defined by the appended claims.