DATA SET CLEANING FOR ARTIFICIAL NEURAL NETWORK TRAINING
20210383210 · 2021-12-09
Assignee
Inventors
Cpc classification
G06F18/214
PHYSICS
International classification
Abstract
A technology for cleaning a training data set for a neural network using dirty training data starts by accessing a labeled training data set that comprises relatively dirty labeled data elements. The labeled training data set is divided into a first subset A and a second subset B. The procedure includes cycling between the subsets A and B, including producing refined model-filtered subsets of subsets A and B to provide a cleaned data set. Each refined model-filtered subset can have improved cleanliness and increased numbers of elements.
Claims
1. A computer-implemented method for cleaning training data for a neural network, comprising: accessing a labeled training data set (S); using a first subset (A) of the labeled training data set to train a first model (Model_A) of the neural network; filtering a second subset (B) of the labeled training data set using the first model to provide a first model-filtered subset (B1F) of the second subset; using the model-filtered subset of the second subset to train a first refined model (Model_B1F) of the neural network; filtering the first subset using the first refined model to provide a first refined model-filtered subset (A1F) of the first subset; using the first refined model-filtered subset of the first subset to train a second refined model (Model_A1F) of the neural network; and filtering the second subset of the labeled training data set using the second refined model to provide a second refined model-filtered subset (B2F) of the second subset.
2. The method of claim 1, including: combining the first refined model-filtered subset and the second refined model-filtered subset of the second subset to provide a filtered training set (A1F+B2F), training an output model of a target neural network using the filtered training set, and saving the output model in memory.
3. The method of claim 1, wherein the second refined model-filtered subset (B2F) has a greater number of elements than the first model-filtered subset (B1F).
4. The method of claim 1, wherein the first and second subsets do not overlap.
5. The method of claim 1, wherein said filtering the first subset using the first refined model includes: executing the neural network using the first refined model (Model_B1F) over the first subset (A) to produce classification data classifying data elements of the first subset (A); selecting data elements of the first subset (A) having labels matching the classification data to provide the first refined model-filtered subset (A1F) of the first subset (A).
6. The method of claim 2, loading the output model in an instance of the target neural network in an inference engine.
7. The method of claim 1, including iteratively (i) using a previously provided refined model-filtered subset of one of the first subset and second subset to train an instant refined model of the neural network; (ii) filtering another of first subset and second subset using the instant refined model to provide an instant refined model-filtered subset of the other of the first subset and the second subset; and (iii) determining whether an iteration criterion is met, and if not, then executing (i) to (iii), and if so, then using a combination of a selected one of the refined model-filtered subsets of the first subset (A) and a selected one of the refined model-filtered subsets of the second subset (B) to produce a trained model for the neural network.
8. The method of claim 7, loading the output model in an instance of the target neural network in an inference engine.
9. A computer system configured to clean training data for a neural network, comprising: one or more processors and memory storing computer program instructions configured to execute a process comprising: accessing a labeled training data set (S); using a first subset (A) of the labeled training data set to train a first model (Model_A) of the neural network; filtering a second subset (B) of the labeled training data set using the first model to provide a first model-filtered subset (B1F) of the second subset; using the model-filtered subset of the second subset to train a first refined model (Model_B1F) of the neural network; filtering the first subset using the first refined model to provide a first refined model-filtered subset (A1F) of the first subset; using the first refined model-filtered subset of the first subset to train a second refined model (Model_A1F) of the neural network; and filtering the second subset of the labeled training data set using the second refined model to provide a second refined model-filtered subset (B2F) of the second subset.
10. The system of claim 9, the process including: combining the first refined model-filtered subset and the second refined model-filtered subset of the second subset to provide a filtered training set (A1F+B2F), training an output model of a target neural network using the filtered training set, and saving the output model in memory.
11. The system of claim 9, wherein the second refined model-filtered subset (B2F) has a greater number of elements than the first model-filtered subset (B1F).
12. The system of claim 9, wherein said filtering the first subset using the first refined model includes: executing the neural network using the first refined model (Model_B1F) over the first subset (A) to produce classification data classifying data elements of the first subset (A); selecting data elements of the first subset (A) having labels matching the classification data to provide the first refined model-filtered subset (A1F) of the first subset (A).
13. The system of claim 9, the process including iteratively (i) using a previously provided refined model-filtered subset of one of the first subset and second subset to train an instant refined model of the neural network; (ii) filtering another of first subset and second subset using the instant refined model to provide an instant refined model-filtered subset of the other of the first subset and the second subset; and (iii) determining whether an iteration criterion is met, and if not, then executing (i) to (iii), and if so, then using a combination of a selected one of the refined model-filtered subsets of the first subset (A) and a selected one of the model-filtered subsets of the second subset (B) to produce a trained model for a target neural network.
14. The system of claim 13, the process loading the trained model in an instance of the target neural network in an inference engine.
15. A computer program product configured to support cleaning training data for a neural network, comprising a non-transitory computer readable memory storing computer program instructions configured to execute a process comprising: accessing a labeled training data set (S); using a first subset (A) of the labeled training data set to train a first model (Model_A) of the neural network; filtering a second subset (B) of the labeled training data set using the first model to provide a first model-filtered subset (B1F) of the second subset; using the model-filtered subset of the second subset to train a first refined model (Model_B1F) of the neural network; filtering the first subset using the first refined model to provide a first refined model-filtered subset (A1F) of the first subset; using the first refined model-filtered subset of the first subset to train a second refined model (Model_A1F) of the neural network; and filtering the second subset of the labeled training data set using the second refined model to provide a second refined model-filtered subset (B2F) of the second subset.
16. The computer program product of claim 15, wherein the second refined model-filtered subset (B2F) has a greater number of elements than the first model-filtered subset (B1F).
17. The computer program product of claim 15, wherein the first and second subsets do not overlap.
18. The computer program product of claim 15, the process including iteratively (i) using a previously provided refined model-filtered subset of one of the first subset and second subset to train an instant refined model of the neural network; (ii) filtering another of first subset and second subset using the instant refined model to provide an instant refined model-filtered subset of the other of the first subset and the second subset; and (iii) determining whether an iteration criterion is met, and if not, then executing (i) to (iii), and if so, then using a combination of a selected one of the refined model-filtered subsets of the first subset (A) and a selected one of the model-filtered subsets of the second subset (B) to produce a trained model for the neural network.
19. The computer program product of claim 18, the process loading the trained model in an instance of the target neural network in an inference engine.
20. The computer program product of claim 15, wherein said filtering the first subset using the first refined model includes: executing the neural network using the first refined model (Model_B1F) over the first subset (A) to produce classification data classifying data elements of the first subset (A); selecting data elements of the first subset (A) having labels matching the classification data to provide the first refined model-filtered subset (A1F) of the first subset (A).
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0015]
[0016]
[0017]
[0018]
[0019]
[0020]
[0021]
[0022]
[0023]
[0024]
[0025]
[0026]
[0027]
[0028]
[0029]
DETAILED DESCRIPTION
[0030] A detailed description of embodiments of the present invention is provided with reference to the
[0031]
[0032] As mentioned above, a method for training a neural network to classify defects in a manufacturing line, or for other classification functions, can include a computer implemented process of cleaning the training data set by removing mislabeled elements.
[0033] Images of defects on integrated circuit assemblies taken in a manufacturing assembly line can be classified in many categories, usable as elements of a training data set. These defects vary significantly in counts for a given manufacturing process, and so the training data can have an uneven distribution and can be large. Also, the labeling process for images like this may be done by a person, who can make significant numbers of errors. For example, to build a new neural network model to classify defect categories or types, we first need to provide a labeled image database for training. The image database includes the defect information. One might have 50,000 defect images in the database, with each image labeled by a human with a classification. So one image in the set might be classified as category 9, another image in the set might be classified as category 15, and so on. However, human error and ambiguous cases result in mislabeling. For example, one image in the set which should be classified as defect category 7 might be erroneously classified into category 3. A data set with erroneously classified elements can be referred to as a dirty data set, or a noisy data set.
[0034] An embodiment of the technology described herein can be used to clean a dirty data set, and use the cleaned data set to train an ANN to recognize and classify the defects, improving the manufacturing process. This trained ANN can be used to monitor in-line process defects used, for example, to evaluate the stability and quality of in-line products, or the life of manufactured tools.
[0035]
[0036] The computer implemented process accesses the database to retrieve a first Subset A and a second Subset B of the training data set S (101). In one approach, Subset A and Subset B are selected so that the distribution of dirty data elements in each subset is about equal to the distribution in the overall data set S. Also, Subset A and Subset B can be selected so that the numbers of data elements in the two subsets are about the same. As it is desirable to maximize the number of clean data elements utilized in a training algorithm, Subset A and Subset B can be selected by dividing the training data set S equally, randomly selecting the elements for Subset A and Subset B so as to at least statistically maintain the distribution of dirty elements relatively equal in the two subsets.
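The random, equal division described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `split_equal_random`, the `seed` parameter, and the use of integer indices to stand for data elements are assumptions made for the example.

```python
import random

def split_equal_random(num_elements, seed=0):
    """Randomly divide indices 0..num_elements-1 into two non-overlapping,
    (nearly) equal-sized subsets A and B.

    Hypothetical helper: because assignment is random, the share of dirty
    elements in each subset tends, statistically, to match the share in S.
    """
    indices = list(range(num_elements))
    random.Random(seed).shuffle(indices)
    half = num_elements // 2
    return sorted(indices[:half]), sorted(indices[half:])

# Example: a database of 50,000 labeled defect images, referenced by index.
subset_a, subset_b = split_equal_random(50_000)
```

When the element count is odd, this sketch places the extra element in Subset B, which keeps the sizes "about the same" as the text requires.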
[0037] Next in the flowchart (cycle A), one of the two subsets, such as Subset A, is used to train the neural network to produce a model MODEL_A (102). Using the model MODEL_A, Subset B is filtered to produce, and store in memory, a first model-filtered Subset B1F of Subset B (103) (first Subset B filtering). An example of a technique for filtering a subset using a model is illustrated in
[0038] Next (cycle AB), the model-filtered Subset B1F is used to train the neural network to produce a refined model MODEL_B1F (104). As used herein, the term “refined” is used to indicate that the model was produced using a model-filtered subset (or a refined model-filtered Subset A as in instances below), and does not indicate any relative quality measure of the model. Then, Subset A is filtered, using the refined model MODEL_B1F, to produce and store in memory, a refined model-filtered Subset A1F of Subset A (105) using, for example, the technique described with reference to
[0039] In a next iteration (cycle ABA), the refined model-filtered Subset A1F is used to train the neural network to produce, and store in memory, a refined model MODEL_A1F (106). Then, the refined model MODEL_A1F is used to filter Subset B to produce, and store in memory, a second refined model-filtered Subset B2F of Subset B (107), using for example a technique like that described in
[0040] In this example, no additional filtering cycles may be needed to provide a cleaned training data set to be used in producing a final output model. For example, the cleaned training data set at this stage can comprise a combination of the second refined-model-filtered Subset B2F of Subset B and the first refined-model-filtered Subset A1F of Subset A.
[0041] If no additional filtering cycles are executed, then the computer implemented algorithm can train a neural network using the combination of refined model-filtered subsets, such as a union of Subset A1F and Subset B2F, to produce an output model for the neural network (108). The neural network trained at this stage using the cleaned data set can be the same neural network as used in steps 102, 104 and 106 to produce the refined model-filtered subsets, or it can be a different neural network. The output model can then be stored in an inference engine to be applied in the field, or in memory such as a database, for later use (109).
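The sequence of steps 102 through 108 can be sketched end to end as below. To keep the example self-contained, a one-dimensional nearest-centroid classifier stands in for the neural network, and the data set is synthetic; `train`, `classify`, `filter_subset`, and `make_dirty_set` are hypothetical names, and the 20% label-noise rate is an assumption chosen for illustration.

```python
import random

def train(elements):
    """Stand-in for training: the model is the mean feature value per label."""
    sums, counts = {}, {}
    for feature, label in elements:
        sums[label] = sums.get(label, 0.0) + feature
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(model, feature):
    """Return the label whose centroid is nearest to the feature."""
    return min(model, key=lambda label: abs(feature - model[label]))

def filter_subset(model, elements):
    """Keep only elements whose stored label matches the model's classification."""
    return [(f, y) for f, y in elements if classify(model, f) == y]

def make_dirty_set(n, noise_rate, rng):
    """Synthetic data: class 0 near -5, class 1 near +5, some labels flipped."""
    data = []
    for i in range(n):
        true_label = i % 2
        feature = rng.gauss(-5.0 if true_label == 0 else 5.0, 1.0)
        label = 1 - true_label if rng.random() < noise_rate else true_label
        data.append((feature, label))
    return data

rng = random.Random(0)
S = make_dirty_set(400, 0.2, rng)
rng.shuffle(S)
A, B = S[:200], S[200:]

model_a = train(A)                   # step 102: MODEL_A
B1F = filter_subset(model_a, B)      # step 103: first model-filtered Subset B1F
model_b1f = train(B1F)               # step 104: refined MODEL_B1F
A1F = filter_subset(model_b1f, A)    # step 105: refined model-filtered Subset A1F
model_a1f = train(A1F)               # step 106: refined MODEL_A1F
B2F = filter_subset(model_a1f, B)    # step 107: second refined Subset B2F
cleaned = A1F + B2F                  # A and B do not overlap, so this is their union
output_model = train(cleaned)        # step 108: output model
```

Each model is only ever used to filter the subset it was not trained on, which is the property that lets a model trained on dirty data still reject mislabeled elements in the other half.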
[0042] In the training steps of
[0043]
[0044] The computer implemented process accesses the database to retrieve a first Subset A and a second Subset B of the training data set S (151). In one approach, Subset A and Subset B are selected so that the distribution of dirty data elements in each subset is about equal to the distribution in the overall data set S. Also, Subset A and Subset B can be selected so that the numbers of data elements in the two subsets are the same, or about the same. As it is desirable to maximize the number of clean data elements utilized in a training algorithm, Subset A and Subset B can be selected by dividing the training data set S equally, randomly selecting the elements for Subset A and Subset B so as to at least statistically tend to maintain the distribution of dirty elements relatively equal in the two subsets. Other techniques for selecting the elements of Subset A and Subset B can be applied taking into account the numbers of elements in each category, and other data-content-aware selection techniques.
[0045] Next, in the flowchart, one of the two subsets, such as Subset A, is used to train the neural network to produce a model MODEL_A(0), and indexes for tracking the cycles are set (n=1 and m=1) (152). Using the model MODEL_A(n−1), Subset B is filtered to produce, and store in memory, a first model-filtered Subset BmF of Subset B (153). An example of a technique for filtering a subset using a model is illustrated in
[0046] Next, the model-filtered Subset BmF is used to train the neural network to produce a refined model MODEL_BmF (154). Then, Subset A is filtered, using the refined model MODEL_BmF, to produce, and store in memory, a refined model-filtered Subset AnF of Subset A (155), using, for example, the technique described with reference to
[0047] At this stage, the procedure determines whether an iteration criterion is met. For example, an iteration criterion can be a maximum number of cycles, as indicated by whether the index n or the index m exceeds a threshold. Alternatively, the iteration criterion can be whether the sizes (i.e. numbers of elements) of the refined model-filtered subsets AnF and BmF converge with the sizes of the filtered subsets A(n−1)F and B(m−1)F, respectively (156). Convergence can be indicated for example if the difference in sizes is less than a threshold, where the threshold can be selected according to the particular application and training data set used. For example, the threshold can be on the order of 0.1% to 5%.
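One way to express the iteration criterion of step 156 is sketched below. The function name, the default values, and the choice to measure the size difference relative to the previous filtered subset are assumptions for illustration; the text only states that the threshold can be on the order of 0.1% to 5%.

```python
def iteration_criterion_met(sizes_a, sizes_b, max_cycles=10, rel_threshold=0.01):
    """Decide whether cycling can stop.

    sizes_a: element counts of A1F, A2F, ... produced so far.
    sizes_b: element counts of B1F, B2F, ... produced so far.
    Stops when either cycle count reaches max_cycles, or when the latest
    size of both filtered subsets differs from the previous size by less
    than rel_threshold (a fraction of the previous size).
    """
    if len(sizes_a) >= max_cycles or len(sizes_b) >= max_cycles:
        return True
    if len(sizes_a) < 2 or len(sizes_b) < 2:
        return False  # nothing to compare against yet

    def converged(sizes):
        previous, current = sizes[-2], sizes[-1]
        return abs(current - previous) < rel_threshold * max(previous, 1)

    return converged(sizes_a) and converged(sizes_b)
```

For example, with a 1% threshold, filtered sizes going from 1,590 to 1,595 elements would count as converged, while a change from 1,500 to 1,590 would not.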
[0048] As explained with reference to
[0049] In the case of
[0050] The procedure continues until the iteration criterion of step 156 is met. If the criterion is met at step 156, then refined model-filtered subsets of Subset A and of Subset B are selected. For example, the refined model-filtered subsets having the largest numbers of elements can be selected. The selected model-filtered subsets of Subset A and Subset B are combined to provide a cleaned data set, and the combination is used to train a target neural network to produce an output model (159). The target neural network trained at this stage using the cleaned data set can be the same neural network as used in steps 152, 154 and 157 to produce the refined model-filtered subsets, or it can be a different neural network.
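The selection and combination step can be sketched as follows, assuming each filtered subset produced during cycling has been kept in a list. The name `select_cleaned_set` and the representation of elements as identifiers are hypothetical; choosing the largest subset is the example selection rule given in the text, and other rules are possible.

```python
def select_cleaned_set(filtered_a_history, filtered_b_history):
    """Pick the largest filtered subset of A and of B, then combine them.

    Each history is a list of filtered subsets (A1F, A2F, ... or B1F, B2F, ...),
    and each subset is a list of element identifiers. Subsets A and B are
    assumed not to overlap, so concatenation gives their union.
    """
    best_a = max(filtered_a_history, key=len)
    best_b = max(filtered_b_history, key=len)
    return best_a + best_b

cleaned_ids = select_cleaned_set(
    [[1, 2], [1, 2, 3]],           # A1F, A2F
    [[10, 11, 12], [10, 11]],      # B1F, B2F
)
```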
[0051] Then the output model can be stored in an inference engine to be applied in the field, or in memory such as a database, for later use (160).
[0052] In the training steps of
[0053] In general, the procedure shown in
[0054] (i) using a previously provided refined model-filtered subset of one of the first subset and second subset to train an instant refined model of the neural network;
[0055] (ii) filtering another of first subset and second subset using the instant refined model to provide an instant refined model-filtered subset of the other of the first subset and the second subset; and
[0056] (iii) determining whether an iteration criterion is met, and if not, then executing (i) to (iii), and if so, then using a combination of a selected one of the refined model-filtered subsets of the first subset (A) and a selected one of the model-filtered subsets of the second subset (B) to produce a trained model for the neural network.
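Steps (i) to (iii) can be sketched as a loop that alternates between the two subsets. This is an illustrative sketch under several assumptions: `train` and `classify` are caller-supplied stand-ins for the neural network, the iteration criterion is a pass limit combined with a size-convergence test, and the largest filtered subsets are the ones selected at the end. All function and parameter names are hypothetical.

```python
def clean_by_cycling(subset_a, subset_b, train, classify, max_passes=10, tol=0):
    """Alternate train/filter passes between subsets A and B.

    Elements are (feature, label) pairs. train(elements) returns a model;
    classify(model, feature) returns a label.
    """
    def model_filter(model, elements):
        return [(x, y) for x, y in elements if classify(model, x) == y]

    def sizes_converged(history):
        return all(len(h) >= 2 and abs(len(h[-1]) - len(h[-2])) <= tol
                   for h in history.values())

    subsets = {"A": subset_a, "B": subset_b}
    history = {"A": [], "B": []}
    model, target = train(subset_a), "B"      # initial MODEL_A filters Subset B
    for _ in range(max_passes):
        filtered = model_filter(model, subsets[target])   # step (ii)
        history[target].append(filtered)
        if sizes_converged(history):                      # step (iii)
            break
        model = train(filtered)                           # step (i)
        target = "A" if target == "B" else "B"
    best_a = max(history["A"], key=len, default=[])
    best_b = max(history["B"], key=len, default=[])
    return best_a + best_b

def train_centroids(elements):
    """Toy model: mean feature value per label."""
    groups = {}
    for x, y in elements:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}

def nearest_centroid(model, x):
    return min(model, key=lambda y: abs(x - model[y]))

A = [(-5.0, 0), (-4.0, 0), (5.0, 1), (4.0, 1), (-4.5, 1)]  # last element is mislabeled
B = [(-5.5, 0), (4.5, 1), (5.5, 0), (-3.5, 0)]             # third element is mislabeled
cleaned = clean_by_cycling(A, B, train_centroids, nearest_centroid)
```

In this toy run the two mislabeled elements are removed and the filtered sizes stop changing after the second pass over each subset, so the loop ends on the convergence test rather than the pass limit.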
[0057]
[0058] Assuming that a MODEL_X is provided, the process executes the neural network using MODEL_X (trained using one subset of the training data set) over subset Y (170). MODEL_X can be MODEL_A, MODEL_B1F, MODEL_A1F or, more generally, MODEL_A(n)F or MODEL_B(m)F. The subset Y is the subset (the other subset) not used to train the MODEL_X.
[0059] Then, elements of the subset Y having labels that match the classification data output by the neural network are selected as members of the model-filtered subset of subset Y (171).
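Steps 170 and 171 amount to a label-agreement filter, sketched below. The `predict` callable is a stand-in for executing the neural network with MODEL_X, and the image names and category numbers in the example are hypothetical.

```python
def model_filter(predict, subset_y):
    """Return the model-filtered subset of subset_y.

    predict: callable mapping an element's data to a class label
             (stands in for the neural network executing MODEL_X).
    subset_y: list of (data, label) pairs from the subset that was
              NOT used to train MODEL_X.
    """
    # Step 170: classify every element of subset Y.
    classifications = [predict(data) for data, _ in subset_y]
    # Step 171: keep elements whose label matches the classification.
    return [element for element, predicted in zip(subset_y, classifications)
            if element[1] == predicted]

# Hypothetical example: the model assigns category 7 to an image labeled 3.
model_outputs = {"img_a": 9, "img_b": 7, "img_c": 15}
subset_y = [("img_a", 9), ("img_b", 3), ("img_c", 15)]
kept = model_filter(model_outputs.get, subset_y)
```

An element dropped by this filter is not necessarily mislabeled; the model itself may be wrong, which is one reason later cycles can re-admit elements and grow the filtered subset.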
[0060] The technology can be further described with reference to
[0061]
[0062]
[0063]
[0064]
[0065]
[0066]
[0067]
[0068]
[0069]
[0070] This cycling can continue as discussed above. However, it is seen that the number of elements in the model-filtered subsets is converging on the maximum of 80% for this training data set. So, the cycling can be stopped, and a final training set can be selected.
[0071]
[0072]
[0073] User interface input devices 1238 can include a keyboard; pointing devices such as a mouse, trackball, touchpad, or graphics tablet; a scanner; a touch screen incorporated into the display; audio input devices such as voice recognition systems and microphones; and other types of input devices. In general, use of the term “input device” is intended to include all possible types of devices and ways to input information into computer system 1200.
[0074] User interface output devices 1276 can include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem can include an LED display, a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem can also provide a non-visual display such as audio output devices. In general, use of the term “output device” is intended to include all possible types of devices and ways to output information from computer system 1200 to the user or to another machine or computer system.
[0075] Storage subsystem 1210 stores programming and data constructs that provide the functionality of some or all of the modules and methods described herein to train models for ANNs. These models are generally applied to ANNs executed by deep learning processors 1278.
[0076] In one implementation, the neural networks are implemented using deep learning processors 1278, which can be configurable and reconfigurable processors, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), coarse-grained reconfigurable architectures (CGRAs), graphics processing units (GPUs), and/or other configured devices. Deep learning processors 1278 can be hosted by a deep learning cloud platform such as Google Cloud Platform™, Xilinx™, and Cirrascale™. Examples of deep learning processors 1278 include Google's Tensor Processing Unit (TPU)™, rackmount solutions like GX4 Rackmount Series™, GX149 Rackmount Series™, NVIDIA DGX-1™, Microsoft's Stratix V FPGA™, Graphcore's Intelligent Processor Unit (IPU)™, Qualcomm's Zeroth Platform™ with Snapdragon Processors™, NVIDIA's Volta™, NVIDIA's DRIVE PX™, NVIDIA's JETSON TX1/TX2 MODULE™, Intel's Nirvana™, Movidius VPU™, Fujitsu DPI™, ARM's DynamicIQ™, IBM TrueNorth™, and others.
[0077] The memory subsystem 1222 used in the storage subsystem 1210 can include a number of memories including a main random access memory (RAM) 1234 for storage of instructions and data during program execution and a read only memory (ROM) 1232 in which fixed instructions are stored. The instructions include procedures for cleaning a training data set and procedures for training a neural network using the cleaned data set as described with reference to
[0078] A file storage subsystem 1236 can provide persistent storage for program and data files, including the program and data files described with reference to
[0079] Bus subsystem 1255 provides a mechanism for letting the various components and subsystems of computer system 1200 communicate with each other as intended. Although bus subsystem 1255 is shown schematically as a single bus, alternative implementations of the bus subsystem can use multiple busses.
[0080] Computer system 1200 itself can be of varying types including a personal computer, a portable computer, a workstation, a computer terminal, a network computer, a television, a mainframe, a server farm, a widely-distributed set of loosely networked computers, or any other data processing system or user device. Due to the ever-changing nature of computers and networks, the description of computer system 1200 depicted in
[0081] Embodiments of the technology described herein include computer programs stored on non-transitory computer readable media deployed as memory accessible and readable by computers, including for example, the program and data files described with reference to
[0082] Other implementations of the method described in this section can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation of the method described in this section can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.
[0083] Any data structures and code described or referenced above are stored according to many implementations on a computer-readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system. This includes, but is not limited to, volatile memory, non-volatile memory, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs), DVDs (digital versatile discs or digital video discs), or other media capable of storing computer-readable media now known or later developed. Many other configurations of computer system 1200 are possible having more or fewer components than the computer system depicted in
[0084] A thin platform inference engine can include a processor such as CPU 1272 (for example, a microcomputer), optionally coupled with deep learning processors 1278 storing the parameters of the output trained model, and an input and output port for receiving inputs and transmitting outputs produced by executing the model. The processor may include, for example, a LINUX kernel and an ANN program implemented using executable instructions stored in non-transitory memory accessible by the processor and the deep learning processors, and configured to use the model parameters during inference operations.
[0085] A device used by, or including, an inference engine as described herein, comprises logic to implement ANN operations over input data and a trained model, where the model comprises a set of model parameters, and memory storing the trained model operably coupled to the logic, the trained set of parameters having values computed using a training algorithm that compensates for a dirty training set as described herein.
[0086]
[0087] A number of flowcharts illustrating logic for cleaning training data sets and for training neural networks are included herein. The logic can be implemented using processors programmed using computer programs stored in memory accessible to the processors and executable by the processors, by dedicated logic hardware, including field programmable integrated circuits, and by combinations of dedicated logic hardware and computer programs. With all flowcharts herein, it will be appreciated that many of the steps can be combined, performed in parallel or performed in a different sequence without affecting the functions achieved. In some cases, as the reader will appreciate, a rearrangement of steps will achieve the same results only if certain other changes are made as well. In other cases, as the reader will appreciate, a rearrangement of steps will achieve the same results only if certain conditions are satisfied. Furthermore, it will be appreciated that the flow charts herein show only steps that are pertinent to an understanding of the invention, and it will be understood that numerous additional steps for accomplishing other functions can be performed before, after and between those shown.
[0088] While the present invention is disclosed by reference to the preferred embodiments and examples detailed above, it is to be understood that these examples are intended in an illustrative rather than in a limiting sense. It is contemplated that modifications and combinations will readily occur to those skilled in the art, which modifications and combinations will be within the spirit of the invention and the scope of the following claims.