DEVICE AND METHOD FOR IMAGE PROCESSING
20230033458 · 2023-02-02
Inventors
- Zeju LI (Shenzhen, CN)
- Liang Chen (Shenzhen, CN)
- Gregory SLABAUGH (London, GB)
- Liu Liu (Beijing, CN)
- Zhongqian FU (Beijing, CN)
Cpc classification
G06V10/774
PHYSICS
G06T3/4015
PHYSICS
G06V10/22
PHYSICS
International classification
G06V10/774
PHYSICS
G06T3/40
PHYSICS
G06V10/22
PHYSICS
Abstract
A device comprising an image processor, the image processor being configured to implement: a first machine learning model for performing restoration processing on degraded image data; and a second machine learning model for recognizing areas of an image requiring processing emphasis during the restoration processing; wherein the output of the second machine learning model is an input to the first machine learning model to optimize the restoration processing.
Claims
1. A device comprising an image processor, the image processor being configured to implement: a first machine learning model (f.sub.θ) for performing restoration processing on degraded image data (102); and a second machine learning model (g.sub.ω) for recognizing areas of an image requiring processing emphasis during the restoration processing; wherein the output of the second machine learning model is an input to the first machine learning model to optimize the restoration processing.
2. The device according to claim 1, wherein the first machine learning model is trained according to the steps of: receiving training data comprising the degraded image data and corresponding optimum image data and providing the degraded image data as an initial input to the system; passing the degraded image data to the first machine learning model configured to create reconstructed image data (106) by performing the restoration processing of the degraded image data; determining loss data (402) by comparing the reconstructed image data to the corresponding optimum image data; combining (404) the loss data with a weight map (204) to form weighted loss data (406); and updating the first machine learning model based on the weighted loss data.
3. The device according to claim 1, wherein the second machine learning model is trained according to the steps of: receiving the weighted loss data at the second machine learning model; determining by the second machine learning model a spatial distribution of the loss based on the weighted loss data; and updating the weight map to account for the spatial distribution of the loss derived from the weighted loss data.
4. The device according to claim 1, wherein the second machine learning model is trained to: identify which spatially distributed regions of a degraded image are more susceptible to degradation based on one or more image features; and generate a weight map for use in performing restoration processing on the degraded image such that a greater weighting is applied to the identified regions.
5. A method of training an image processing system, the image processing system comprising a first machine learning model (f.sub.θ), and the method comprising training the first machine learning model by executing the steps of: receiving training data comprising degraded image data (102) and corresponding optimum image data and providing the degraded image data as an input to the system; passing the degraded image data to a first machine learning model configured to create restored image data (106) by restoring the degraded image data; determining loss data (402) by comparing the restored image data to the corresponding optimum image data; combining the loss data with a weight map (204) to form weighted loss data (406) comprising the spatial distribution of the loss data; and updating the first machine learning model based on the weighted loss data.
6. The method according to claim 5, wherein the image processing system comprises a second machine learning model (g.sub.ω) and the method comprises training the second machine learning model by implementing an updating process executing the steps of: receiving the weighted loss data at a second machine learning model; determining by the second machine learning model a spatial distribution of the loss data based on the weighted loss data; and updating the weight map to account for the spatial distribution of the loss derived from the weighted loss data.
7. The method according to claim 6, wherein the updating process is repeated so as to iteratively update the weight map based on weighted loss data generated from a previous weight map and the first machine learning model.
8. The method according to claim 7, wherein in at least some iterations of the method the training data ({T.sub.L, T.sub.H}) is different from the training data received in the previous iteration of the method.
9. The method according to claim 5, wherein the method comprises modifying the first machine learning model by combining the first machine learning model (f.sub.θ) with the second machine learning model (g.sub.ω) to create a modified first machine learning model (f.sub.θ′) such that the modified first machine learning model is trained to focus on regions of a degraded image which are more susceptible to degradation.
10. The method according to claim 9, wherein the method comprises: receiving test data ({V.sub.L, V.sub.H}) comprising degraded image data and corresponding optimum image data and providing the degraded image data as an input to the modified first machine learning model; creating reconstructed image data by restoration processing of the degraded image data; determining loss data by comparing the reconstructed image data to the corresponding optimum image data; and optimizing the second machine learning model based on the loss data.
11. The method according to claim 6, wherein the weight map is generated by the optimized second machine learning model.
12. The method of claim 11, wherein the method comprises updating the optimized second machine learning model by implementing an updating process executing the steps of: receiving weighted loss data at the optimized second machine learning model; determining by the optimized second machine learning model a spatial distribution of the loss data based on the weighted loss data; and updating the optimized second machine learning model to generate a weight map to account for the spatial distribution of the loss derived from the weighted loss data.
13. The method according to claim 12, wherein the method comprises modifying the modified first machine learning model (f.sub.θ′) by combining the updated first machine learning model (f.sub.θ) with the updated optimized second machine learning model to create a second modified first machine learning model such that the second modified first machine learning model is trained to focus on regions of a degraded image which are more susceptible to degradation.
14. The method according to claim 5, wherein the restoration processing is a joint denoising and demosaicing processing and the received degraded image data is RAW image data comprising a red, green or blue value for each sampled pixel, such that the first machine learning model is trained to infer a denoised and demosaiced RGB image from the received RAW image data.
15. A device comprising an image processor, the image processor being configured to implement a method of training an image processing system, the image processing system comprising a first machine learning model (f.sub.θ), and the method comprising training the first machine learning model by executing the steps of: receiving training data comprising degraded image data (102) and corresponding optimum image data and providing the degraded image data as an input to the system; passing the degraded image data to a first machine learning model configured to create restored image data (106) by restoring the degraded image data; determining loss data (402) by comparing the restored image data to the corresponding optimum image data; combining the loss data with a weight map (204) to form weighted loss data (406) comprising the spatial distribution of the loss data; and updating the first machine learning model based on the weighted loss data.
16. The device according to claim 15, wherein the image processing system comprises a second machine learning model (g.sub.ω) and the method comprises training the second machine learning model by implementing an updating process executing the steps of: receiving the weighted loss data at a second machine learning model; determining by the second machine learning model a spatial distribution of the loss data based on the weighted loss data; and updating the weight map to account for the spatial distribution of the loss derived from the weighted loss data.
17. The device according to claim 16, wherein the updating process is repeated so as to iteratively update the weight map based on weighted loss data generated from a previous weight map and the first machine learning model.
18. The device according to claim 17, wherein in at least some iterations of the method the training data ({T.sub.L, T.sub.H}) is different from the training data received in the previous iteration of the method.
19. The device according to claim 15, wherein the method comprises modifying the first machine learning model by combining the first machine learning model (f.sub.θ) with the second machine learning model (g.sub.ω) to create a modified first machine learning model (f.sub.θ′) such that the modified first machine learning model is trained to focus on regions of a degraded image which are more susceptible to degradation.
20. The device according to claim 19, wherein the method comprises: receiving test data ({V.sub.L, V.sub.H}) comprising degraded image data and corresponding optimum image data and providing the degraded image data as an input to the modified first machine learning model; creating reconstructed image data by restoration processing of the degraded image data; determining loss data by comparing the reconstructed image data to the corresponding optimum image data; and optimizing the second machine learning model based on the loss data.
Description
BRIEF DESCRIPTION OF THE FIGURES
[0026] The present invention will now be described by way of example with reference to the accompanying drawings.
DETAILED DESCRIPTION OF THE INVENTION
[0035] The proposed approach aims to emphasize the important characteristics of the training data and as a result improve the model's performance.
[0036] There is proposed a solution to improve image restoration processing performance through better data sampling of training data. Specifically, an end-to-end learning method is used that assigns a different weight to each training image pixel. The different weights are implemented as a weight map. The weight map of each training image is learned by a gradient-based meta-task, herein also referred to as a second machine learning model g.sub.ω.
[0037] The proposed approach comprises an image processing machine learning model (or first machine learning model), learning different weights for different image samples in training based on a parallel meta-learning step using the second machine learning model. The weights are encoded on a per-pixel basis and may therefore be used to form a weight map. The first machine learning model may then be further optimized based on the performance of the machine learning model on another independent dataset.
[0038] The proposed approach comprises training the machine learning model based on the required weights for different pixels of the training images. Existing image restoration methods calculate the loss function of an image sample pair for the network f.sub.θ according to the equation:
[0039] Here, L.sub.train is the loss on the training set {T.sub.L, T.sub.H}, computed with a pixel-wise loss criterion that is usually the L1 or L2 norm. H and W are the height and width of the image sample. T.sub.L(h, w) and T.sub.H(h, w) are the intensities of the low quality, L, and high quality, H, images at pixel (h, w), respectively. The proposed method aims to learn a weight for each pixel. Therefore, the modified loss function L′.sub.train would become:
[0040] Where W (h, w) is the weight of the pixel.
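Written out from the definitions above, the baseline loss and its per-pixel weighted counterpart could take the following form. This is a sketch under two assumptions: the sum over pixels is left unnormalized, and the pixel-wise criterion is denoted \ell, a symbol chosen here for illustration. As in the text, W is both the image width (the upper limit of the inner sum) and, when written W(h, w), the weight of a pixel.

```latex
\begin{align}
\mathcal{L}_{train}\big(f_\theta(T_L), T_H\big)
  &= \sum_{h=1}^{H} \sum_{w=1}^{W} \ell\big(f_\theta(T_L)(h, w),\, T_H(h, w)\big) \\
\mathcal{L}'_{train}\big(f_\theta(T_L), T_H\big)
  &= \sum_{h=1}^{H} \sum_{w=1}^{W} W(h, w)\, \ell\big(f_\theta(T_L)(h, w),\, T_H(h, w)\big)
\end{align}
```

With W(h, w) = 1 at every pixel the second expression reduces to the first, which matches initializing the process with a uniform weight map.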
[0041] A norm is a function that measures the difference between its inputs. In this case, it measures the difference between a ground truth image and one that is restored by the approach.
[0042] The L1 norm is the sum of the absolute differences between each matching color value of each matching pixel in the ground truth and restored images. The L2 norm is the sum of the squared differences between each matching color value of each matching pixel in the ground truth and restored images. In either case, if the image is perfectly restored, it will match the ground truth at every pixel, so the L1 or the L2 norm will be zero.
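The two norms as described above can be illustrated on a tiny example. The following is a minimal sketch in plain Python, assuming images are stored as nested lists of (R, G, B) tuples; the function names and the pixel values are hypothetical and chosen only for illustration.

```python
def l1_norm(restored, truth):
    """Sum of absolute differences over every pixel and color value."""
    return sum(
        abs(r - t)
        for row_r, row_t in zip(restored, truth)
        for px_r, px_t in zip(row_r, row_t)
        for r, t in zip(px_r, px_t)
    )


def l2_norm(restored, truth):
    """Sum of squared differences over every pixel and color value."""
    return sum(
        (r - t) ** 2
        for row_r, row_t in zip(restored, truth)
        for px_r, px_t in zip(row_r, row_t)
        for r, t in zip(px_r, px_t)
    )


# A 2x2 ground truth image and a restoration that is wrong in two color values.
truth = [[(10, 20, 30), (40, 50, 60)],
         [(70, 80, 90), (100, 110, 120)]]
restored = [[(10, 20, 30), (40, 52, 60)],
            [(70, 80, 87), (100, 110, 120)]]

l1 = l1_norm(restored, truth)  # |52 - 50| + |87 - 90| = 5
l2 = l2_norm(restored, truth)  # 2**2 + 3**2 = 13
```

A perfectly restored image gives zero for both functions, for example `l1_norm(truth, truth)`.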
[0043] The norm may be used as an error signal and may be back-propagated through the network during training to adjust the network weights.
[0045] A plurality of sample squares 206 are shown on both weight maps 202 and 204. When training machine learning models for image processing tasks, it is known to train on samples of the training images (and possibly also of test images like those described herein) in order to reduce the processing cost of training. This can make the training more computationally efficient and faster end-to-end. Samples 206 may be taken from the training and test image data based on a standard sample size defined for the specific image processing task for which the machine learning model is being trained.
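One way such samples might be drawn is sketched below. It assumes square samples taken at a uniformly random position, with the same window cropped from the degraded image, the optimum image and the weight map so that the three stay aligned. The function name `sample_patch`, the sample size and the random placement are assumptions made for illustration, not details given in the text.

```python
import random


def sample_patch(degraded, optimum, weight_map, size, rng):
    """Crop the same size x size window from three aligned 2-D arrays."""
    height, width = len(degraded), len(degraded[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)

    def crop(image):
        return [row[left:left + size] for row in image[top:top + size]]

    return crop(degraded), crop(optimum), crop(weight_map), (top, left)


# Toy 4x6 single-channel images whose values encode their own coordinates.
degraded = [[10 * h + w for w in range(6)] for h in range(4)]
optimum = [[100 + 10 * h + w for w in range(6)] for h in range(4)]
weight_map = [[1.0] * 6 for _ in range(4)]

patch_l, patch_h, patch_w, (top, left) = sample_patch(
    degraded, optimum, weight_map, size=2, rng=random.Random(0)
)
```

Because the window is shared, each weight in `patch_w` still refers to the pixel it was generated for.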
[0047] The first structure 302 is the weight generator structure. The weight generator model g.sub.ω, also referred to herein as the second machine learning model, is a neural network which is trained to reweight the image pixels. g.sub.ω is optimized in an outer loop 302 of the training framework. The parameters ω are learned during training.
[0048] The second structure 304 is the restoration network or first machine learning model f.sub.θ. The restoration network is the neural network which reconstructs the high quality image from the corresponding low quality image. f.sub.θ is trained on an image restoration task in the inner loop 304.
[0049] The third structure 306 is the gradient-based meta-learning scheme which steers the process of the outer loop 302 and the inner loop 304. In the third structure 306 the first machine learning model 104 and the second machine learning model are combined to form the modified first machine learning model 308. The second machine learning model g.sub.ω has been updated using the training data set in order to improve the first model's performance in this next phase, which comprises processing previously unseen held-out data, also called the meta-test data set. The created weight map 204 is also optimized in the meta-learning scheme of the third structure 306 by way of a backwards pass to the second machine learning model based on the loss from the modified first machine learning model 308. Training data which has a high chance of leading to good first model performance on the test data may be assigned a high weighting.
[0050] In
[0051] The processes may be initialized with a uniform weight map, which in one implementation of the training process may then be iteratively updated by repeating the updating process to produce an increasingly updated weight map each time until a sufficient level of convergence is reached.
[0052] The next step may then be the processing loop illustrated in the third structure 306, where the first machine learning model and the second machine learning model are combined to provide a modified first machine learning model which is additionally trained to focus on regions of a degraded image which are more susceptible to degradation. This focusing ability results from the modified model now comprising some training directly obtained from the second machine learning model. The modified first model can then be tested on test data, and the resulting loss from the modified first model may be used to further tune the second machine learning model.
[0053] In an alternative implementation, the iterative process of updating the weight map may be performed such that each iteration of the updating of the weight map is performed only after a respective iteration of the process in the third structure 306. That is, the processes of the first and second structures are performed once, and then the processes of the third structure are performed before the processes of the first and second structure are performed again.
[0054] In between iterations of either of the above implementation options the training data may or may not be changed. For example, the tiger image in the example of
[0056] The first step of the proposed training process is to use a weight map 204 during the training of the first machine learning model, otherwise known as the image restoration network. The first iteration may comprise a weight map 204 which has a pre-defined distribution of weights, for example a uniform distribution of weights, or a distribution with a specific shape or pattern. However, in later iterations training may use a weight map 204 derived from the training data 102.
[0057] In one iteration of the core training process, the degraded training data T.sub.L is first passed through the restoration network and the training loss L.sub.train(f.sub.θ(T.sub.L), T.sub.H) is calculated.
[0058] The weight map is applied 404 to the standard loss function given below, to train the image restoration network.
$$\mathcal{L}'_{train}\big(f_\theta(T_L), T_H\big) = \mathcal{L}_{train}\big(f_\theta(T_L), T_H\big) \cdot g_\omega(T_L) \tag{3}$$
[0059] Different from the normal training procedure, the loss L.sub.train(f.sub.θ(T.sub.L), T.sub.H) is weighted by the weight map 204 and becomes L′.sub.train(f.sub.θ(T.sub.L), T.sub.H), as illustrated in equation (3) and in
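The weighting of equation (3) can be read as an element-wise product between a per-pixel loss map and the weight map. The sketch below shows that reading on a 2x2 single-channel example. It assumes an L1 pixel-wise criterion, and the helper names and numeric values are hypothetical.

```python
def pixel_loss_map(restored, optimum):
    """Per-pixel absolute error (an L1 criterion is assumed here)."""
    return [[abs(r - t) for r, t in zip(row_r, row_t)]
            for row_r, row_t in zip(restored, optimum)]


def apply_weight_map(loss_map, weight_map):
    """Element-wise product; the result keeps the spatial layout of the loss."""
    return [[loss * weight for loss, weight in zip(row_l, row_w)]
            for row_l, row_w in zip(loss_map, weight_map)]


def total(loss_map):
    """Scalar loss obtained by summing over all pixels."""
    return sum(sum(row) for row in loss_map)


restored = [[5, 2], [9, 4]]
optimum = [[5, 0], [10, 0]]
weight_map = [[1.0, 0.5], [1.0, 2.0]]  # stand-in for the output of g_omega

loss_map = pixel_loss_map(restored, optimum)            # [[0, 2], [1, 4]]
weighted_loss = apply_weight_map(loss_map, weight_map)  # [[0.0, 1.0], [1.0, 8.0]]
```

Here the unweighted total is 7 and the weighted total is 10.0: the bottom-right pixel, which carries weight 2.0, contributes twice as much to the update as it otherwise would.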
[0060] Based on the weighted loss 406, it is possible to calculate a new state of the restoration network, as shown in step (3) of
$$\theta' = \theta - \alpha \nabla_\theta \big( \mathcal{L}_{train}(f_\theta(T_L), T_H) \cdot g_\omega(T_L) \big) \tag{4}$$
[0061] Here α is the learning rate of f.sub.θ. Note that the updated parameter θ′ is a function of g.sub.ω so it is possible to update θ′ via g.sub.ω.
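The dependence of θ′ on the weight map can be seen on a toy problem. The sketch below replaces the restoration network with a hypothetical one-parameter model f(x) = θ·x and uses a squared-error pixel loss, so that the gradient in equation (4) can be written in closed form. The model, the data and the value of α are assumptions made for illustration only.

```python
def weighted_grad(theta, xs, ys, weights):
    """d/dtheta of sum_i w_i * (theta * x_i - y_i)**2 for the toy model."""
    return sum(2.0 * w * x * (theta * x - y)
               for x, y, w in zip(xs, ys, weights))


def inner_step(theta, xs, ys, weights, alpha):
    """One update in the form of equation (4): theta' = theta - alpha * gradient."""
    return theta - alpha * weighted_grad(theta, xs, ys, weights)


xs, ys = [1.0, 2.0], [2.0, 4.0]  # "degraded" and "optimum" pixel values

# Same starting point and data, two different weight maps.
theta_uniform = inner_step(1.0, xs, ys, [1.0, 1.0], alpha=0.0625)   # 1.625
theta_emphasis = inner_step(1.0, xs, ys, [1.0, 2.0], alpha=0.0625)  # 2.125
```

The two results differ only because the weights differ, which is why a loss evaluated at θ′ can be differentiated with respect to the parameters that produced the weights.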
[0062] Thirdly, V.sub.L is input to the updated restoration network f.sub.θ′, and the meta-learner g.sub.ω is then trained to minimize the loss on the meta-test set (V.sub.L, V.sub.H) with respect to ω based on the second-order gradient. This is illustrated in the right most loop of
[0063] In order to optimize g.sub.ω, there is proposed a meta-learning scheme where g.sub.ω is trained based on the gradient from the meta-test data set (V.sub.L, V.sub.H). Specifically, with the guidance of g.sub.ω, the restoration network f.sub.θ as trained with the meta-training data set is driven to perform better on the meta-test data set. That is, the second machine learning model may be trained using the output loss 408 from the test data set {V.sub.L, V.sub.H} as processed by the modified restoration network.
[0064] Finally, after g.sub.ω is updated, a new iteration of the training process may be started, and the restoration network can then be further updated and modified with the optimized weight map.
[0065] The training process may also be summarized as in the below example code:
[0066] Require:
[0067] {T.sub.L, T.sub.H}: meta-training data; {V.sub.L, V.sub.H}: meta-test data.
[0068] g.sub.ω(T.sub.L): training set weight generator; f.sub.θ(T.sub.L): restoration network.
Initialize g.sub.ω and f.sub.θ.
for each iteration do
    Sample a batch of meta data {T.sub.L, T.sub.H} and {V.sub.L, V.sub.H}.
    Compute the weight map g.sub.ω(T.sub.L) for data T.sub.L.
    θ′ = θ.
    for a sufficient number of times do (inner loop; one iteration may be adequate)
        Calculate a new θ′ with gradient: θ′ = θ′ − α∇.sub.θ(L.sub.train(f.sub.θ(T.sub.L), T.sub.H)·g.sub.ω(T.sub.L)).
    end for
    Update g.sub.ω with the meta-gradient upon θ′ with respect to ω (outer loop): ω ← ω − β∇.sub.ω L.sub.val(f.sub.θ′(V.sub.L), V.sub.H).
    Update f.sub.θ with the renewed weight map: θ ← θ − α∇.sub.θ(L.sub.train(f.sub.θ(T.sub.L), T.sub.H)·g.sub.ω(T.sub.L)).
end for
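The listing above can be exercised end to end on a toy problem. The following is a minimal sketch, not the implementation described in the source. It makes these assumptions: the restoration network is replaced by a one-parameter model f(x) = θ·x, the weight generator is reduced to one free weight per training pixel, the pixel loss is squared error, a single inner-loop step is used, and the meta-gradient is computed in closed form by the chain rule rather than by automatic differentiation. All names and hyperparameters are hypothetical.

```python
def weighted_train_grad(theta, xs, ys, weights):
    """d/dtheta of the weighted training loss sum_i w_i * (theta*x_i - y_i)**2."""
    return sum(2.0 * w * x * (theta * x - y)
               for x, y, w in zip(xs, ys, weights))


def val_grad(theta, vs, us):
    """d/dtheta of the unweighted meta-test loss sum_j (theta*v_j - u_j)**2."""
    return sum(2.0 * v * (theta * v - u) for v, u in zip(vs, us))


def train(xs, ys, vs, us, alpha=0.1, beta=0.1, iterations=300):
    theta = 0.0
    weights = [1.0] * len(xs)  # uniform initial weight map
    for _ in range(iterations):
        # Inner loop, one step: theta' = theta - alpha * weighted gradient.
        theta_prime = theta - alpha * weighted_train_grad(theta, xs, ys, weights)
        # Outer loop: dL_val/dw_i = dL_val/dtheta' * dtheta'/dw_i.
        g_val = val_grad(theta_prime, vs, us)
        meta_grad = [g_val * (-alpha * 2.0 * x * (theta * x - y))
                     for x, y in zip(xs, ys)]
        # Clamp at zero so that the weights stay non-negative.
        weights = [max(0.0, w - beta * g) for w, g in zip(weights, meta_grad)]
        # Update the restoration parameter with the renewed weights.
        theta = theta - alpha * weighted_train_grad(theta, xs, ys, weights)
    return theta, weights


# Two training pixels, the second with a corrupted target; clean meta-test data.
theta, weights = train(xs=[1.0, 1.0], ys=[2.0, 0.0], vs=[1.0], us=[2.0])
```

In this toy run the weight of the corrupted training pixel is driven toward zero while the weight of the clean pixel grows, and θ approaches the gain of 2 implied by the meta-test data.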
[0069] Although the calculation of the second-order gradient requires high computation, it can be calculated efficiently using the finite difference approximation. Specifically, the parameter of g.sub.ω is updated as
$$\omega' = \omega - \beta \nabla_\omega \mathcal{L}_{val}\big(f_{\theta'}(V_L), V_H\big) \tag{5}$$
[0070] Here, β is the learning rate of g.sub.ω.
[0071] According to the chain rule, the gradient in the second term of Eq. 5 can be rewritten as follows.
$$\nabla_\omega \mathcal{L}_{val}\big(f_{\theta'}(V_L), V_H\big) = -\alpha \nabla^2_{\omega,\theta}\big(\mathcal{L}_{train}(f_\theta(T_L), T_H) \cdot g_\omega(T_L)\big)\, \nabla_{\theta'} \mathcal{L}_{val}\big(f_{\theta'}(V_L), V_H\big) \tag{7}$$
[0072] With the finite difference approximation, the right side of Eq. 7 can be rewritten as
[0073] The small scalar ε is empirically chosen as
[0074] As a result of the approximation, the gradient in Eq. 5 can be calculated with two forward and two backward passes. The computational complexity may be reduced from O(θω) to O(θ+ω).
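One common way to avoid forming the mixed second derivative in Eq. 7 explicitly is a central difference: the gradient with respect to ω is evaluated at θ + εv and at θ − εv, where v stands for the meta-test gradient with respect to θ′, and the difference is divided by 2ε. The sketch below checks that idea on a toy loss whose mixed derivative is known in closed form. The central-difference form, the toy loss sum_i w_i·(θ·x_i − y_i)², the value of ε and all names are assumptions made for illustration; the exact expression used in the source is not reproduced here.

```python
def grad_omega(theta, xs, ys):
    """Gradient of sum_i w_i * (theta*x_i - y_i)**2 with respect to each w_i."""
    return [(theta * x - y) ** 2 for x, y in zip(xs, ys)]


def mixed_term_exact(theta, xs, ys, v):
    """Mixed derivative d2L/(dw_i dtheta) multiplied by the scalar v."""
    return [2.0 * x * (theta * x - y) * v for x, y in zip(xs, ys)]


def mixed_term_fd(theta, xs, ys, v, eps=1e-3):
    """Central difference using two evaluations of the gradient w.r.t. the weights."""
    plus = grad_omega(theta + eps * v, xs, ys)
    minus = grad_omega(theta - eps * v, xs, ys)
    return [(p - m) / (2.0 * eps) for p, m in zip(plus, minus)]


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 5.0]
v = -3.0  # stand-in for the meta-test gradient with respect to theta'

exact = mixed_term_exact(0.5, xs, ys, v)   # [9.0, 36.0, 63.0]
approx = mixed_term_fd(0.5, xs, ys, v)
```

Multiplying either result by −α gives the right-hand side of Eq. 7 for this toy loss. The two gradient evaluations cost about as much as ordinary training steps, which is consistent with the reduction in complexity described above.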
[0075] The above series of mathematical steps of the training process are described again below in a structure by structure format similar to the structures of
[0076] The first step may be considered as training the first machine learning model f.sub.θ by using training data comprising degraded image data and corresponding optimum image data, where the degraded image data is provided as the input to the first machine learning model. The degraded image data, having been provided to the first machine learning model, is restored by the restoration processing that the first machine learning model is configured to provide, in order to create restored image data. The image processing system may then determine loss data by comparing the restored image data to the corresponding optimum image data. The loss data may then be combined with a weight map to form weighted loss data which comprises the spatial distribution of the loss data. A first backwards pass of the training process updates the first machine learning model based on the calculated weighted loss data. This process is shown in
[0077] The training of the second machine learning model g.sub.ω may be achieved by implementing an updating process. The updating process is indicated in
[0078] As described elsewhere herein, the weight map updating process may be repeated so as to iteratively update the weight map based on weighted loss data generated from a previous weight map and the first machine learning model. In a yet further iteration the first machine model may be an updated first machine learning model which has been updated to account for a previously updated weight map. It should be appreciated that in at least some iterations of the above described method, the training data may be different from the training data received in the previous iteration of the method. For example, in
[0079] The next step in the method comprises modifying the first machine learning model by combining the first machine learning model with the second machine learning model. This step is shown in
[0080] The modified first machine learning model may then be tested on unseen test data. In a similar process to the initial training of the first machine learning model, test data comprising degraded image data is provided as an input to the modified first machine learning model. The modified first machine learning model is then implemented to create reconstructed image data by restoration processing of the degraded image data. Loss data can subsequently be determined by comparing the reconstructed image data to corresponding optimum image data. However, in the training of the modified first machine learning model with test data, the loss data is not combined with weight map data. This is because, as explained above, the weight map is now intrinsically part of the modified first model. The loss data from the test data may instead be used in a second backwards pass to optimize the second machine learning model. This backwards pass step is shown in
[0081] The updated first machine learning model may be further trained by generating weight maps for further training image data using the now optimized second machine learning model. That is, the updated first machine learning model may be trained according to the above described method of step (1), but wherein the weight map is generated by the optimized second machine learning model, which has previously been trained according to step (4) of the method described above.
[0082] Again, a further round of the above described training loops may ensue, where the optimized second machine learning model is updated by implementing the updating process described above in relation to step (2) of
[0083] Ultimately the above described training method and its various loops may be combined together to result in modifying the modified first machine learning model in a similarly iterative manner, combining the modified first machine learning model with the updated optimized second machine learning model to create a second modified first machine learning model. The second modified first machine learning model is trained to focus its image restoration processing on regions of a degraded image which are more susceptible to degradation.
[0084] In a specific implementation of the above described training method the restoration processing may be a joint denoising and demosaicing processing. In this specific case the received degraded image data may be RAW image data comprising a red, green or blue value for each sampled pixel. Thus, the first machine learning model may be trained to infer a denoised and demosaiced RGB (red, green, blue) image from the received RAW image data.
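To make the form of such RAW input concrete, the sketch below builds a single-value-per-pixel mosaic from a small RGB image. It assumes an RGGB Bayer layout, which is one common color filter arrangement and is not specified in the text. The function name and the pixel values are hypothetical.

```python
def bayer_mosaic(rgb):
    """Keep one color sample per pixel using an RGGB layout (assumed here)."""
    raw = []
    for h, row in enumerate(rgb):
        raw_row = []
        for w, (r, g, b) in enumerate(row):
            if h % 2 == 0 and w % 2 == 0:
                raw_row.append(r)  # red sites: even row, even column
            elif h % 2 == 1 and w % 2 == 1:
                raw_row.append(b)  # blue sites: odd row, odd column
            else:
                raw_row.append(g)  # green sites: the remaining positions
        raw.append(raw_row)
    return raw


rgb = [[(1, 2, 3), (4, 5, 6)],
       [(7, 8, 9), (10, 11, 12)]]
raw = bayer_mosaic(rgb)  # [[1, 5], [8, 12]]
```

Joint denoising and demosaicing is the inverse task: starting from a noisy version of `raw`, the first machine learning model would infer all three color values at every pixel.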
[0086] In one example implementation, the restoration network f.sub.θ may be a convolutional neural network. In this implementation the residual network may comprise sixteen residual blocks with a convolution layer and a rectified linear unit (ReLU) activation layer.
[0087] The machine learning model g.sub.ω may also be formulated as a convolutional neural network in an encoder-decoder architecture, with four downsampling layers and four upsampling layers. To ensure that the generated weight map is always non-negative, a ReLU function may be applied on the output of the machine learning model g.sub.ω.
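The effect of the final ReLU can be shown in isolation. The sketch below applies it to a hypothetical 2x2 array standing in for the raw output of the decoder. The encoder-decoder itself is not modeled, and the names and values are assumptions made for illustration.

```python
def relu(value):
    """Rectified linear unit: negative inputs become zero."""
    return value if value > 0.0 else 0.0


def finalize_weight_map(raw_output):
    """Apply ReLU element-wise so that every weight is non-negative."""
    return [[relu(value) for value in row] for row in raw_output]


raw_output = [[-0.5, 0.2], [1.5, -2.0]]  # hypothetical decoder output
weight_map = finalize_weight_map(raw_output)  # [[0.0, 0.2], [1.5, 0.0]]
```

A weight of zero removes a pixel from the weighted loss entirely, whereas a negative weight would reward restoration errors at that pixel, which is why non-negativity matters.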
[0088] As discussed above, during the training process, the training dataset may be split into two subsets: the meta-training set (T.sub.L, T.sub.H) and the meta-test set (V.sub.L, V.sub.H). The sets (T.sub.L, T.sub.H) and (V.sub.L, V.sub.H) may be swapped between iterations and in some implementations they may be swapped between every iteration.
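The swapping of the two subsets can be expressed as a simple schedule. The sketch below yields the (meta-training, meta-test) pair to use at each iteration. The generator name and the `swap_every` parameter are hypothetical, and `swap_every=1` corresponds to swapping between every iteration.

```python
def split_schedule(subset_a, subset_b, iterations, swap_every=1):
    """Yield (meta_train, meta_test) pairs, swapping roles every `swap_every` iterations."""
    for i in range(iterations):
        if (i // swap_every) % 2 == 0:
            yield subset_a, subset_b
        else:
            yield subset_b, subset_a


every_iteration = list(split_schedule("A", "B", iterations=4))
every_second = list(split_schedule("A", "B", iterations=4, swap_every=2))
```

Over time, swapping lets every sample act both as data the restoration network is fitted to and as held-out data that steers the weight generator.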
[0089] The proposed approach as described above may have multiple advantages over previous approaches. For example, the proposed approach may provide an improved image processing performance without extra computation during inference. This is because compared with conventional methods, the proposed approach only requires extra computation in training.
[0090] The proposed approach may also have improved robustness on imbalanced training data. In low-level vision tasks, it is difficult to balance the training data regarding image characteristics since the image characteristics are hard to describe or quantify and they are likely to be local. A model could overfit on the basic patterns in the dataset but overlook the hard or rare patterns. The proposed approach may reweight the training data and thus result in a more robust model.
[0091] The proposed approach learns how to infer a weight map in an end-to-end fashion without using a separate or pre-training process in the training. The training is instead performed in a nested loop configuration, with loops placed in parallel portions of the training structure.
[0092] The present approach is widely applicable for many low-level vision problems which can be rectified with restoration image processing, including joint denoising and demosaicing, super-resolution, and deblurring.
[0093] The proposed image restoration processing method has been applied to multiple low-level vision tasks including image demosaicing, denoising, super-resolution, and deblurring.
[0097] The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.