SYSTEM AND METHOD FOR DYNAMIC MODEL TRAINING WITH HUMAN IN THE LOOP
20230121299 · 2023-04-20
Inventors
Cpc classification
G06F18/214
PHYSICS
G06F18/2113
PHYSICS
International classification
G06F18/21
PHYSICS
G06F18/2113
PHYSICS
G06F18/214
PHYSICS
Abstract
An improved neural network is disclosed that supports rapid retraining using human feedback. A weighted ensemble of Extreme Learning Machines (ELMs) is used to implement a model. The ensemble of ELMs may be trained in parallel with a variation in individual parameters gridding a parameter set selected to achieve consistent accurate model results when the model is trained and subsequently retrained when user feedback data become available. An exemplary application is the scoring of resumes.
Claims
1. A method, comprising: training, in parallel, an ensemble of Extreme Learning Machines (ELMs) to implement a data analysis model, each ELM in the ensemble of ELMs being assigned a different set of ELM parameters to grid an ELM parameter space; performing a validation test on each trained ELM in the ensemble of ELMs using a validation data set; and assigning a weight to each ELM in the ensemble of ELMs based on results of the validation test to form a weighted output of the ensemble of ELMs.
2. The method of claim 1, wherein the data analysis model comprises classification.
3. The method of claim 2, wherein the classification comprises scoring media items.
4. The method of claim 3, wherein the classification comprises scoring resumes.
5. The method of claim 1, wherein the parameter space that is gridded includes a number of neurons, regularization coefficients, and initialization of random weights.
6. The method of claim 1, further comprising re-training the ensemble of ELMs using additional training data from user feedback, performing the validation test on each ELM in the ensemble of ELMs, and re-weighting the ensemble of ELMs.
7. The method of claim 3, further comprising re-training the ensemble of ELMs using additional training data from user feedback, performing the validation test on each ELM in the ensemble of ELMs, and re-weighting the ensemble of ELMs.
8. The method of claim 6, where the user feedback comprises voting on scored media items.
9. A computer-implemented method, comprising: receiving a user query for searchable media items; extracting features from the user query; training an ensemble of Extreme Learning Machines (ELMs) to score the searchable media items based at least in part on the extracted features and available training data, each ELM being assigned a different set of ELM parameters to grid an ELM parameter space; performing a validation test on each trained ELM; assigning a weight to each ELM based on results of the validation test; scoring the searchable media items using the weighted output of the ensemble of ELMs; and returning search results to the user query based on the scoring of the searchable media items.
10. The computer-implemented method of claim 9, further comprising receiving user feedback on the scoring, using the user feedback as an additional form of training data, re-training the ensemble of ELMs, performing the validation test on the retrained ensemble of ELMs, re-weighting the ensemble of trained ELMs based on the validation test.
11. The computer-implemented method of claim 10, further comprising re-scoring the searchable media items using the re-weighted and re-trained ensemble of trained ELMs.
12. The computer-implemented method of claim 9, wherein the parameter space that is gridded includes a number of neurons, regularization coefficients, and initialization of random weights.
13. The computer-implemented method of claim 10, wherein user feedback comprises positive votes and negative votes.
14. The computer-implemented method of claim 9, wherein the method further comprises generating a feature dictionary from extracted features and using the feature dictionary in at least one subsequent search query to train the ensemble of ELMs.
15. The computer-implemented method of claim 9, wherein the searchable media items comprise resumes, and the ensembles of ELMs is trained to score resumes.
16. A computer-implemented method, comprising: receiving a user query to search resumes; extracting features from the user query; training an ensemble of Extreme Learning Machines (ELMs) to score the resumes based at least in part on the extracted features and available training data, each ELM being assigned a different set of ELM parameters to grid an ELM parameter space; performing a validation test on each trained ELM using a validation data set; assigning a weight to each ELM based on results of the validation test; scoring the resumes using the weighted output of the ensemble of ELMs; and providing a ranked listing of the resumes to the user.
17. The computer-implemented method of claim 16, further comprising receiving user feedback on the scoring of the resumes, using the user feedback as an additional form of training data, re-training the ensemble of ELMs, performing the validation test on the retrained ensemble of ELMs, re-weighting the ensemble of trained ELMs based on the validation test, re-scoring the resumes using the re-weighted and re-trained ensemble of trained ELMs, and providing a re-ranked listing of the resumes to the user.
18. The computer-implemented method of claim 16, wherein the parameter space that is gridded includes a number of neurons, regularization coefficients, and initialization of random weights.
19. The computer-implemented method of claim 17, wherein user feedback comprises positive votes and negative votes.
20. The computer-implemented method of claim 16, wherein the method further comprises generating a feature dictionary from extracted features and using the feature dictionary in at least one subsequent search query to train the ensemble of ELMs.
21. A non-transitory computer readable medium having instructions which when executed on a processor implement a method to generate a trained machine learning model, comprising: training, in parallel, an ensemble of Extreme Learning Machines (ELMs) to implement a data analysis model, each ELM in the ensemble of ELMs being assigned a different set of ELM parameters to grid an ELM parameter space; performing a validation test on each trained ELM in the ensemble of ELMs using a validation data set; and assigning a weight to each ELM in the ensemble of ELMs based on results of the validation test to form a weighted output of the ensemble of ELMs.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] The present disclosure is illustrated by way of example, and not by way of limitation in the figures of the accompanying drawings in which like reference numerals are used to refer to similar elements.
DETAILED DESCRIPTION
[0016] The present disclosure describes systems and methods for using a weighted ensemble of Extreme Learning Machines (ELMs) to achieve rapid training of a model with consistently high accuracy. The individual ELMs in the ensemble may grid a parameter space for the ELMs, with a validation test being used to determine a weight for each trained ELM in the ensemble. Thus, the more accurate ELMs in the ensemble are assigned a higher weight than less accurate ELMs. The gridding of the parameter space may be chosen to achieve a consistently high accuracy of the weighted ensemble of trained ELMs.
[0017] The model implemented by the weighted ensemble of trained ELMs could be a classification model, but more generally may be other types of models.
[0018] In one implementation, a human in the loop provides feedback on the results of the model. In the case of a classification model, the human user can provide feedback on the accuracy of the scoring performed by the model by voting on search result items. The votes are used to form positive/negative training data. As ELMs can be rapidly trained, the ELM ensemble can be trained in parallel when a search query is initiated, using the same training data. The ELM ensemble can also be retrained based on voting data. In some implementations, a feature dictionary is formed from feature data extracted from search queries. Feature data from a query may be used to train an ELM ensemble.
[0019] ELM Background
[0020] An individual Extreme Learning Machine (ELM) is a particular type of feedforward neural network that has various advantages and disadvantages compared with other machine learning approaches. A review of ELM technology is provided in the article by Mustafa Abbas Abbod Albadr et al., “Extreme Learning Machine: A Review”, International Journal of Applied Engineering Research, ISSN 0973-4562, Vol. 12, No. 14 (2017), pp. 4610-4623, the contents of which are hereby incorporated by reference.
[0021] Additional background on ELMs and on determining weights is found in a paper by G. B. Huang et al., “Extreme Learning Machine: Theory and Applications,” Neurocomputing, 70(1-3): 489-501 (2006), the contents of which are hereby incorporated by reference.
[0022] ELMs have many different applications, including classification. See, e.g., G. B. Huang, et al., “Extreme Learning Machine for Regression and Multiclass Classification,” IEEE Transactions on Systems, Man, and Cybernetics-Part B, Cybernetics, 42(2): 513-529 (2012).
[0023] ELMs have been used for applications such as regression, classification, sparse coding, compression, and feature learning. An ELM may be implemented to have a single layer of hidden nodes, corresponding to a single-layer feedforward network. In an ELM, the input layer weights are randomly assigned and the output layer weights may be obtained by using a generalized inverse of the hidden layer output matrix.
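As a non-limiting illustration of the single-step training described above, the following minimal Python sketch fits one ELM using NumPy. The function names (train_elm, predict_elm), the tanh activation, the normally distributed random weights, and the small ridge term reg are assumptions made for illustration and are not taken from the disclosure. With reg approaching zero, the solved system approaches the generalized-inverse (pseudoinverse) solution.

```python
import numpy as np

def train_elm(X, y, n_hidden=20, reg=1e-3, seed=0):
    """Fit a single-hidden-layer ELM and return (W, b, beta)."""
    rng = np.random.default_rng(seed)
    # Input-layer weights and biases are drawn at random and never tuned.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)  # hidden layer output matrix
    # Output weights are obtained in one step by regularized least squares.
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def predict_elm(model, X):
    """Return the raw ELM output for each row of X."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Toy data (hypothetical): label is 1 when the two features sum to > 0.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X.sum(axis=1) > 0).astype(float)

model = train_elm(X, y)
train_acc = np.mean((predict_elm(model, X) > 0.5) == (y > 0.5))
```

Because only the output weights are solved for, no iterative gradient descent is needed, which is what allows an ELM to be trained quickly.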
[0024] ELMs typically have a single hidden layer of nodes, although some forms of ELM have more than one hidden layer of nodes. The parameters of hidden nodes (not just the weights connecting inputs to hidden nodes) need not be tuned.
[0025] One aspect of ELMs is that in most cases, the output weights of hidden nodes are learned in a single step. ELMs can be trained extremely quickly compared with many other types of neural networks.
[0026] There is a vast number of academic papers describing ELM theory, the applications of ELMs, and variations on ELMs. However, while ELMs are fast to train, they are also known to have problems achieving consistently high accuracy. That is, an individual ELM used as a single hidden layer feedforward network may achieve a high accuracy, but it may not do so consistently. This lack of consistent accuracy can be a barrier in many applications.
[0027] ELM Ensemble System Examples
[0028]
[0029]
[0030] The system 102 may be implemented as computer software code with hardware support such as a network interface, memory, processor(s), and databases 160. The system 102 may be implemented in various ways. As some possibilities, it may be implemented as a server-based system operating in an enterprise environment, a web-based network service, a cloud-based service, or a cloud-assisted service.
[0031] One application of the system 102 is to perform classification (scoring) of searchable media items, with resumes being an example of media items that are searchable by entering search queries based on criteria such as college degrees, years of work experience, a distance range, etc. The searchable media may be a populated database, such as a set of resumes uploaded to a company's database. Alternatively the searchable media could be, for example, media obtainable from other sources such as public databases, websites, web services, etc.
[0032] As illustrated in
[0033] The ELM ensemble 220 is trained based on extracted features and the feature dictionary. Vote feedback data 234, if available, may also be used. Other media training data 232 may be used. The training engine 240 includes a grid parameter selection module 244 to grid a parameter space. A validation and weighting module 246 performs a validation test on each trained ELM and weights each trained ELM based on the results of the validation test. General rules for training or retraining the ELM ensemble 242 may be selected, such as rules for using extracted features and features in the feature dictionary during training.
[0034] Additionally, specific conditions for triggering training/retraining of the ELM ensemble 248 may be provided. As one example, training of the ELM ensemble may be triggered for each new search query. However, there may be scenarios where a new search query is merely a minor variation of earlier search queries such that retraining would be unlikely to change search results. In some implementations, a user may be provided with options to request reevaluation of their search query after they have provided one or more votes. The retraining may thus be triggered in response to a user command. However, other options are possible, such as automatically performing retraining after a selected number of user votes.
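By way of example only, the triggering conditions described above could be combined in a single predicate, as in the following Python sketch. The function name should_retrain, its parameters, and the default vote threshold of 5 are hypothetical and chosen solely for illustration.

```python
def should_retrain(is_new_query, user_requested, votes_since_training,
                   vote_threshold=5):
    """Return True when any illustrative retraining condition is met.

    is_new_query         -- a new search query was submitted
    user_requested       -- the user asked for reevaluation
    votes_since_training -- votes received since the last (re)training
    vote_threshold       -- assumed number of votes that triggers retraining
    """
    return (is_new_query
            or user_requested
            or votes_since_training >= vote_threshold)

# A user has cast two votes and has not asked for reevaluation.
decision = should_retrain(False, False, 2)
```

A deployment might additionally skip retraining when a new query is only a minor variation of an earlier query; how such similarity is measured is left open here.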
[0035] As illustrated in
[0036] However, in an HR application, the searchable media items do not necessarily have to be only text documents or conventional resumes. For example, some professional network websites permit users to upload videos, which may include, for example, people giving talks at technical conferences about their work. In some industries, video resumes are becoming increasingly common as a supplement or even a replacement for conventional resumes. An HR department may, for example, find video information on professional networking sites that may be converted to a text equivalent using natural language processing.
[0037] In a general use scenario, individual users utilize a user device 115 to enter search queries for the searchable media items. This results in scoring the searchable media items and may include presenting the scored media items in a ranked order. The score may also optionally be displayed. An individual user who submitted a query may vote on one or more of the search results. The simplest voting system is a positive or negative (thumbs up or thumbs down) vote about an individual media item. However, more generally, a user can vote on as many media items as they wish. Other types of voting systems can also be used, but a thumbs up/thumbs down voting system is easiest to implement.
[0038] The votes are used as feedback training data. The retraining of the ELM ensemble model 104 can be triggered in different ways. For example, a user interface may provide a button for a user to request retraining of the model (e.g., a “reevaluate” button). Other conditions could be selected to trigger retraining of the model, such as after a pre-selected number of votes. When the ELM ensemble model is retrained, the training engine 102 uses the votes as an additional source of training data. The features extracted from queries by a feature extractor and stored in the feature dictionary may be used to train the ensemble of ELMs. In the training of the ensemble of ELMs, key parameters in a parameter space are gridded, as will be discussed below in more detail. Each trained ELM in the ensemble is tested using a validation data set, with the validation results being used to assign a weight to each ELM in the ensemble.
[0039] Referring to
Example Methods
[0040] Examples are now provided for two different types of methods. At a high level, general methods related to training and using an ensemble of ELMs for classifying searchable media items are presented in
[0041]
[0042] In block 514, the ELMs in the ensemble are retrained in parallel. Each ELM is trained with the same feature set, the same voting data, etc. In block 516, an updated scoring is performed using the retrained ELM ensemble.
[0043] The process can optionally be performed over multiple search queries, if desired. For example, in the case of resumes, a recruiter may make minor variations of a search query to find a candidate. Over the course of several search queries for a candidate, the feature dictionary will build up, along with positive and negative votes for individual resumes.
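The accumulation of a feature dictionary and of votes over several queries can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the class name SearchSession and its methods are hypothetical, and lower-cased whitespace tokenization is only a stand-in for whatever feature extractor an implementation actually uses.

```python
from collections import Counter

class SearchSession:
    """Accumulates query features and votes across search queries."""

    def __init__(self):
        self.feature_dictionary = Counter()  # feature -> times seen
        self.votes = {}                      # item id -> +1 or -1

    def add_query(self, query):
        # Stand-in feature extraction: lower-cased whitespace tokens.
        features = query.lower().split()
        self.feature_dictionary.update(features)
        return features

    def vote(self, item_id, positive):
        # A later vote on the same item replaces the earlier one.
        self.votes[item_id] = 1 if positive else -1

    def training_labels(self):
        # Positive/negative examples usable as additional training data.
        return sorted(self.votes.items())

session = SearchSession()
session.add_query("Python engineer Boston")
session.add_query("senior python engineer")
session.vote("resume_17", positive=True)
session.vote("resume_42", positive=False)
labels = session.training_labels()
```

In this sketch, features that recur across query variations accumulate higher counts, and the vote labels can be merged with other training data when the ensemble is retrained.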
[0044]
[0045] In block 610, extracted features are accessed from which the ensemble of ELMs is trained. More generally, the features may come from a feature dictionary built up from features extracted over a series of search queries. If there is voting feedback data, this may be accessed in block 615. Other training data/sample data that is available is accessed in block 620. The ELM ensemble has a variation in ELM parameters that grids a parameter space. This could be implemented in block 625 with a pre-selected number of ELMs in the ensemble along with a pre-selected gridding of the parameter space in terms of factors such as a number of neurons, regularization coefficients, and initialization of random weights in a first layer. The gridding could be “fine” enough to ensure that at least one ELM in the ensemble will provide accurate results. However, more generally, the gridding could be optimized based on empirical studies, recent search result validation tests, etc.
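One possible way to grid the parameter space is to take the Cartesian product of candidate values for each factor, so that every ELM in the ensemble receives a different parameter set. The Python sketch below illustrates this; the function name make_grid, the dictionary keys, and the specific candidate values are assumptions for illustration only. Here the random-weight initialization is represented by a seed.

```python
from itertools import product

def make_grid(neuron_counts, reg_coeffs, seeds):
    """Return one parameter set per ELM, covering every combination."""
    return [
        {"n_hidden": n, "reg": c, "seed": s}
        for n, c, s in product(neuron_counts, reg_coeffs, seeds)
    ]

# Hypothetical candidate values; real values would be problem dependent.
grid = make_grid(
    neuron_counts=[50, 100, 200],
    reg_coeffs=[1e-3, 1e-1, 10.0],
    seeds=[0, 1],
)
ensemble_size = len(grid)  # 3 * 3 * 2 parameter sets
```

The ensemble size grows multiplicatively with the number of values per factor, so a finer grid trades additional training cost for a better chance that at least one member is accurate.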
[0046] In block 630, each ELM in the ensemble is trained in parallel. The data loading can be performed in parallel and the ELM supports fast training.
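The parallel training of block 630 might be sketched in Python as follows, with every ensemble member receiving the same training data but a different parameter set. The use of a thread pool, the function names, the tanh activation, and the toy data are all assumptions made for illustration; an implementation could equally use separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def train_elm(params, X, y):
    """Train one ELM from a parameter set; returns the fitted member."""
    rng = np.random.default_rng(params["seed"])
    n = params["n_hidden"]
    W = rng.normal(size=(X.shape[1], n))   # random, untuned input weights
    b = rng.normal(size=n)
    H = np.tanh(X @ W + b)                 # hidden layer output matrix
    beta = np.linalg.solve(H.T @ H + params["reg"] * np.eye(n), H.T @ y)
    return {"params": params, "W": W, "b": b, "beta": beta}

def train_ensemble(grid, X, y, max_workers=4):
    """Train all members concurrently on the same data, in grid order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: train_elm(p, X, y), grid))

# Toy data (hypothetical): label depends on the sign of the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(float)

grid = [{"n_hidden": n, "reg": c, "seed": s}
        for n in (10, 20) for c in (0.01, 1.0) for s in (0, 1)]
ensemble = train_ensemble(grid, X, y)
```

Since the members do not depend on one another, the training time of the ensemble can approach that of its slowest member when enough workers are available.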
[0047] In block 635, a validation test is performed for each trained ELM in the ensemble using a validation data set. This results, effectively, in a confidence score for each ELM. In block 640, each ELM is weighted based on the results of its validation test. The weighted outputs of the ELMs are combined into a weighted score. The use of a weighted score helps to achieve consistent accuracy in model results compared with using a single ELM.
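Blocks 635 and 640 can be illustrated with the following minimal Python sketch. The disclosure does not fix a particular weighting formula; normalizing each member's validation accuracy so that the weights sum to one is an assumption made here, as are the function names and the toy predictions.

```python
def validation_accuracy(predicted, actual):
    """Fraction of validation items a member classified correctly."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

def validation_weights(accuracies):
    """Assumed rule: weights proportional to validation accuracy."""
    total = sum(accuracies)
    if total == 0:
        # No member got anything right; fall back to equal weights.
        return [1.0 / len(accuracies)] * len(accuracies)
    return [a / total for a in accuracies]

def ensemble_score(member_scores, weights):
    """Weighted score of one item across all ensemble members."""
    return sum(w * s for w, s in zip(weights, member_scores))

# Hypothetical validation labels and per-member predictions.
actual = [1, 0, 1, 1]
member_predictions = [[1, 0, 1, 1], [1, 1, 1, 0], [0, 1, 0, 1]]
accuracies = [validation_accuracy(p, actual) for p in member_predictions]
weights = validation_weights(accuracies)

# Score a new item given each member's raw score for it.
score = ensemble_score([0.9, 0.4, 0.1], weights)
```

Under this rule, a member that performs poorly on the validation data set contributes little to the final score, which is how the ensemble can remain accurate even when some individual ELMs are not.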
[0048]
[0049] The previous methods may be customized to be specific to the problem of servicing queries for resumes.
[0050]
[0051]
[0052] One of ordinary skill in the art would understand that the gridding strategy could be optimized for particular problems and for specific aspects of a training set. Using the same gridding strategy of the ELM parameter space for a wide variety of problems is not ideal in practice. In other words, determining an optimum gridding strategy is problem dependent and also depends on aspects of the training set.
[0053] The gridding of the ELM parameter space may be customized for a particular problem and for aspects of the training set. For example, if the problem is classifying resumes, then the gridding of the ELM parameter space is chosen based on this problem (classifying resumes) and aspects of the training set used for classifying resumes. In this example, optimizing the gridding strategy can be used, for example, to increase the consistency with which the ensemble of ELMs generates an accurate classification of resumes based on the training set. However, there may also be other practical considerations regarding computing and memory resources in order to quickly train/retrain the ensemble of ELMs within time frames that provide an acceptable user experience. Thus, the gridding strategy might also take into consideration keeping the number, M, of ELMs in the ELM ensemble that need to be trained/retrained within a reasonable limit.
[0054] As previously discussed, the ELM ensemble generates a score. In the context of a classification problem, the score (or corresponding probability) is a natural outcome of using an ELM ensemble in which each learner votes with its weight. This aspect of the ELM ensemble facilitates using the score in classification problems.
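As a simple illustration of how a score or probability can arise from weighted voting, the Python sketch below sums the weights of the learners voting for each class and normalizes the totals. The function name weighted_vote, the class labels, and the example weights are hypothetical.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Map each class label to its share of the total vote weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    norm = sum(totals.values())
    return {label: total / norm for label, total in totals.items()}

# Three learners vote on one resume, each with its validation weight.
probs = weighted_vote(["match", "no_match", "match"], [0.5, 0.3, 0.2])
best = max(probs, key=probs.get)
```

The normalized share for a class can be read as the ensemble's score for that class, and the items can then be ranked by that score.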
[0055] In the above description, for purposes of explanation, numerous specific details were set forth. It will be apparent, however, that the disclosed technologies can be practiced without any given subset of these specific details. In other instances, structures and devices are shown in block diagram form. For example, the disclosed technologies are described in some implementations above with reference to user interfaces and particular hardware.
[0056] Reference in the specification to “one embodiment”, “some embodiments” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least some embodiments of the disclosed technologies. The appearances of the phrase “in some embodiments” in various places in the specification are not necessarily all referring to the same embodiment.
[0057] Some portions of the detailed descriptions above were presented in terms of processes and symbolic representations of operations on data bits within a computer memory. A process can generally be considered a self-consistent sequence of steps leading to a result. The steps may involve physical manipulations of physical quantities. These quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. These signals may be referred to as being in the form of bits, values, elements, symbols, characters, terms, numbers, or the like.
[0058] These and similar terms can be associated with the appropriate physical quantities and can be considered labels applied to these quantities. Unless specifically stated otherwise as apparent from the prior discussion, it is appreciated that throughout the description, discussions utilizing terms, for example, “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, may refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
[0059] The disclosed technologies may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may include a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer.
[0060] The disclosed technologies can take the form of an entirely hardware implementation, an entirely software implementation or an implementation containing both software and hardware elements. In some implementations, the technology is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
[0061] Furthermore, the disclosed technologies can take the form of a computer program product accessible from a non-transitory computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[0062] A computing system or data processing system suitable for storing and/or executing program code will include at least one processor (e.g., a hardware processor) coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
[0063] Input/output or I/O devices (including, but not limited to, keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
[0064] Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
[0065] Finally, the processes and displays presented herein may not be inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the disclosed technologies were not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the technologies as described herein.
[0066] The foregoing description of the implementations of the present techniques and technologies has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the present techniques and technologies to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the present techniques and technologies be limited not by this detailed description. The present techniques and technologies may be implemented in other specific forms without departing from the spirit or essential characteristics thereof. Likewise, the particular naming and division of the modules, routines, features, attributes, methodologies and other aspects are not mandatory or significant, and the mechanisms that implement the present techniques and technologies or its features may have different names, divisions and/or formats. Furthermore, the modules, routines, features, attributes, methodologies and other aspects of the present technology can be implemented as software, hardware, firmware or any combination of the three. Also, wherever a component, an example of which is a module, is implemented as software, the component can be implemented as a standalone program, as part of a larger program, as a plurality of separate programs, as a statically or dynamically linked library, as a kernel loadable module, as a device driver, and/or in every and any other way known now or in the future in computer programming. Additionally, the present techniques and technologies are in no way limited to implementation in any specific programming language, or for any specific operating system or environment. Accordingly, the disclosure of the present techniques and technologies is intended to be illustrative, but not limiting.