Image data processing

Abstract

A method of configuring an image encoder emulator. Input image data is encoded at an encoding stage comprising a network of inter-connected weights, and decoded at a decoding stage to generate a first distorted version of the input image data. The first distorted version is compared with a second distorted version of the input image data generated using an external encoder to determine a distortion difference score. A rate prediction model is used to predict an encoding bitrate associated with encoding the input image data to a quality corresponding to the first distorted version. A rate difference score is determined by comparing the predicted encoding bitrate with an encoding bitrate used by the external encoder to encode the input image data to a quality corresponding to the second distorted version. The weights of the encoding stage are trained based on the distortion difference score and the rate difference score.

Claims

1. A computer-implemented method of configuring an image encoder emulator for use in image data processing, the method comprising: receiving input image data representing at least one image; encoding, at an encoding stage of the image encoder emulator, the input image data to generate encoded image data, the encoding stage comprising a network of inter-connected weights; decoding, at a decoding stage of the image encoder emulator, the encoded image data to generate a first distorted version of the input image data; receiving a second distorted version of the input image data generated using an external image encoder; comparing the first distorted version of the input image data with the second distorted version of the input image data to determine a distortion difference score; predicting, using a rate prediction model, a first encoding bitrate associated with encoding the input image data to a quality corresponding to the first distorted version; receiving a second encoding bitrate used by the external encoder to encode the input image data to a quality corresponding to the second distorted version; determining a rate difference score by comparing the predicted encoding bitrate with the received second encoding bitrate used by the external encoder to encode the input image data to a quality corresponding to the second distorted version; and training the weights of the encoding stage based on the distortion difference score and the rate difference score, thereby to configure the image encoder emulator to emulate behavior of the external image encoder.

2. The method of claim 1, further comprising using an output of the image encoder emulator to train an image preprocessing network configured to preprocess images prior to encoding the preprocessed images with an external image encoder.

3. The method of claim 1, further comprising training the weights of the encoding stage using a back-propagation method to minimize the distortion difference score and/or the rate difference score.

4. The method of claim 1, wherein the decoding stage comprises a set of inter-connected learnable weights, the method comprising training the weights of the decoding stage using a back-propagation method to minimize the distortion difference score and/or the rate difference score.

5. The method of claim 1, wherein the encoding stage and the decoding stage each comprise a convolutional neural network having a U-Net architecture, where layers of the encoding stage are mirrored by layers of the decoding stage.

6. The method of claim 1, wherein the rate prediction model comprises an artificial neural network configured to compress input features from the input image data into feature maps of smaller dimensions and combine the feature maps with a fully-connected neural network to predict the bitrate needed to compress the input image data.

7. The method of claim 1, wherein the distortion difference score is obtained using a distance measure comprising one or more of: a mean absolute error, MAE, a mean squared error, MSE, a normalized MAE, and a normalized MSE.

8. The method of claim 1, wherein the input image data represents a sequence of images, the method comprising using features from neighboring images in the sequence of images to generate the first distorted version and/or the predicted encoding bitrate.

9. The method of claim 1, wherein the comparing the first distorted version of the input image data with the second distorted version of the input image data is performed using a learnable discrimination model that is configured to distinguish between distortions generated via the encoding stage and the decoding stage, and ground truth distortions produced using the external encoder.

10. The method of claim 9, further comprising training the weights of the encoding stage and the discrimination model sequentially, in a series of iterations.

11. The method of claim 1, further comprising: generating a distortion map comprising, for each of a plurality of spatial regions in the first distorted version of the input image data, a distinguishability measure indicating a confidence level in a fidelity of observed distortions in the first distorted version to learned features derived from ground truth distortions; and determining the distortion difference score using the distortion map.

12. The method of claim 1, further comprising quantifying distortions in the first distorted version of the input image data using an image quality score comprising one or more of: peak-signal-to-noise ratio, structural similarity index, SSIM, multiscale quality metrics such as a detail loss metric or multiscale SSIM, metrics based on multiple quality scores and data-driven learning and training, such as a video multi-method assessment fusion, VMAF, and aesthetic quality metrics.

13. The method of claim 1, wherein the external encoder comprises a standards-based image encoder such as an ISO JPEG or ISO MPEG encoder, or a proprietary or royalty-free encoder, such an AOMedia encoder.

14. A computing device comprising: a processor; and a memory, wherein the computing device is arranged to perform, using the processor, a method comprising: receiving input image data representing at least one image; encoding, at an encoding stage of an image encoder emulator, the input image data to generate encoded image data, the encoding stage comprising a network of inter-connected weights; decoding, at a decoding stage of the image encoder emulator, the encoded image data to generate a first distorted version of the input image data; receiving a second distorted version of the input image data generated using an external image encoder; comparing the first distorted version of the input image data with the second distorted version of the input image data to determine a distortion difference score; predicting, using a rate prediction model, a first encoding bitrate associated with encoding the input image data to a quality corresponding to the first distorted version; receiving a second encoding, bitrate used by the external encoder to encode the input image data to a quality corresponding to the second distorted version; determining a rate difference score by comparing the predicted encoding bitrate with the received second encoding bitrate used by the external encoder to encode the input image data to a quality corresponding to the second distorted version; and training the weights of the encoding stage based on the distortion difference score and the rate difference score, thereby to configure the image encoder emulator to emulate behavior of the external image encoder.

15. A non-transitory computer-readable medium comprising computer-executable instructions that, when executed by a processor of a computing device, cause the computing device to perform a method, the method comprising: receiving input image data representing at least one image; encoding, at an encoding stage of an image encoder emulator, the input image data to generate encoded image data, the encoding stage comprising a network of inter-connected weights: decoding, at a decoding stage of the image encoder emulator, the encoded image data to generate a first distorted version of the input image data; receiving a second distorted version of the input image data generated using an external image encoder; comparing the first distorted version of the input image data with the second distorted version of the input image data to determine a distortion difference score; predicting, using a rate prediction model, a first encoding bitrate associated with encoding the input image data to a quality corresponding to the first distorted version; receiving a second encoding bitrate used by the external encoder to encode the input image data to a quality corresponding to the second distorted version; determining a rate difference score by comparing the predicted encoding bitrate with the received second encoding bitrate used by the external encoder to encode the input image data to a quality corresponding to the second distorted version; and training the weights of the encoding stage based on the distortion difference score and the rate difference score, thereby to configure the image encoder emulator to emulate behavior of the external image encoder.

Description

DESCRIPTION OF THE DRAWINGS

(1) Embodiments of the present disclosure will now be described by way of example only with reference to the accompanying schematic drawings of which:

(2) FIG. 1 is a schematic diagram showing a method of processing image data in accordance with embodiments;

(3) FIGS. 2(a) to 2(c) are schematic diagrams showing a preprocessing network in accordance with embodiments;

(4) FIG. 3 is a schematic diagram showing a preprocessing network in accordance with embodiments;

(5) FIG. 4 is a schematic diagram showing a training process in accordance with embodiments;

(6) FIG. 5 is a schematic diagram showing a training process in accordance with embodiments;

(7) FIG. 6 is a schematic diagram showing a learnable rate-distortion model in accordance with embodiments;

(8) FIG. 7 is a table showing distortion and rate estimates in accordance with embodiments;

(9) FIG. 8 is a flowchart showing the steps of a method in accordance with embodiments; and

(10) FIG. 9 is a schematic diagram of a computing device in accordance with embodiments.

DETAILED DESCRIPTION

(11) Embodiments of the present disclosure are now described.

(12) FIG. 1 is a schematic diagram showing a method of processing image data, according to embodiments. In particular, FIG. 1 shows the components of an image or video precoding (or ‘preprocessing’) system, followed by standard encoding and decoding with a third-party (external) encoder and decoder. Embodiments are applicable to batch processing, i.e. processing a group of images or video frames together without delay constraints (e.g., an entire video sequence), as well as to stream processing, i.e. processing only a limited subset of a stream of images or video frames, or even a select subset of a single image, e.g. due to delay or buffering constraints.

(13) The precoder can be a deep video precoding system [18] with quality-rate loss (shown as ‘Q-R loss’ in the leftmost box of FIG. 1). This precoding typically consists of a resizer and a deep quality-rate optimizer component. The resizer can downscale or upscale the input using a non-linear filter, or an artificial neural network. The effect of the resizer is inverted at the post-decoding resizing component on the right-hand side of FIG. 1 and the recovered pixel groups can form a recovered image of the original resolution to be displayed to a viewer after an optional post-processing component, which can be a linear or non-linear filter or an artificial neural network that enhances aesthetic or perceptual aspects of the recovered image. In-between the output of the deep video precoding with Q-R loss and the decoder, an external image or video encoder is used, which may comprise any ISO PEG or ISO MPEG, or AOMedia encoder or other proprietary encoder. In addition, as shown in FIG. 1, the produced bitstream from the encoder can be stored or transmitted over a network to the corresponding decoder.

(14) The deep quality-rate optimizer (DQRO) shown in FIG. 1 comprises any combination of weights connected in a network and having a non-linear function (akin to an activation function of an artificial neural network). An example of such weights is shown in FIG. 2(a), which depicts a combination of inputs x.sub.0, . . . , x.sub.3 with weight coefficients θ and non-linear activation function g( ). The trained embodiments of DQRO instantiations comprise multiple layers of weights and activation functions. An example of the connectivity between weights and inputs is shown in FIG. 2(b), which depicts schematically layers of interconnected weights, forming an artificial neural network. Such embodiments are trained with back-propagation of errors computed at the output layer, using gradient descent methods, as shown in the embodiment of FIG. 2(c). In particular, FIG. 2(c) shows diagram of back-propagation of errors from an intermediate layer (right side of FIG. 2(c)) to the previous intermediate layer using gradient descent.

(15) An example of a trainable component of DQRO is shown in FIG. 3. It consists of a cascade of convolutional (Cony (k×k)) and parametric ReLu (pReLu) layers of weights and activation functions mapping input pixel groups to transformed output pixel groups, e.g. as described with reference to FIG. 2(b). Convolutional layers extend the embodiment in FIG. 2(b) to multiple dimensions, by performing convolution operations between multi-dimensional filters of fixed kernel size (k×k) with learnable weights and the inputs to the layer. Each activation in the output of the convolutional layer only has local (not global) connectivity to a local region of the input. The connectivity of the cascade of convolutional layers and activation functions can also include skip connections, as shown by the connection from the output of the leftmost “Cony (3×3)” layer of FIG. 3 to the summation point of FIG. 3. In addition, the entirety of the cascade of multiple layers (also known as a deep neural network) is trainable end-to-end based on back-propagation of errors from the output layer backwards (e.g. as shown in FIG. 2(c)), using gradient descent methods.

(16) In order to optimize the design of the DQRO according to rate and distortion induced by an external video encoder, a deep generative rate-distortion modeler (DGRDM) is designed and deployed as a system. An example of the interaction between the DGRDM and the DQRO is shown in FIG. 4, where it is emphasized that both systems are trainable based on data. As such, FIG. 4 is a schematic diagram showing iterative training of the deep precoding model and the deep generative rate-distortion model. The terms “Standard Encoder” and “Standard Decoder” used in FIG. 4 refer to any external (third-party) encoder/decoder, the implementation details of which may not be known, but they can be used as black boxes. The training can happen in a number of iterations, e.g. where the DQRO is trained with the weights and instantiation of DGRDM being frozen, and vice-versa. The remainder of the detailed description focuses on the training and use of the DGRDM, as well as its interaction with the DQRO.

(17) An example illustration of the framework for training the DGRDM is shown in FIG. 5. That is, FIG. 5 is a schematic diagram detailing the training process of a generative rate-distortion model. The generative rate-distortion and rate prediction models are trained separately, where both are blind to all operations occurring outside the dashed lines. Dashed lines indicate loss signals used for training different components of the rate-distortion model. The fidelity of generated distortion predictions x.sub.d is measured by a discrimination mod& D(x, x.sub.d, x.sub.e) and an added absolute error term |x.sub.d−x.sub.e|.

(18) The generative model predicts codec distortions as learned directly from sets of representative data, and is complementary to the training process of the DQRO. That is, the generative rate-distortion model is trained separately, and is agnostic to all training operations of the DQRO. Training the generative rate-distortion model comprises multiple components, which will now be described with reference to FIG. 5.

(19) The generative model comprises two stages: (i) an encoding stage, where input images x, are mapped onto a low-dimensional space as latent codes z, and (ii) a decoding stage, where latent codes are mapped hack to a space of equal dimensionality to that of x, to represent distorted frames x.sub.d after predicted codec distortions are applied. Produced distortion predictions are then evaluated by a complementary distortion accuracy assessment model, which uses actual video codecs to produce frame encodings x.sub.c that represent ground truths of frame distortions (i.e. predictions are considered entirely accurate wherever x.sub.d=x.sub.e). FIG. 5 provides an overview of an example architecture which may be used for predicting x.sub.d, where encoding and decoding are respectively implemented as convolutional and deconvolutional architectures. The generative model adopts a U-Net architecture, where feature maps of layers preceding the latent code z are mirrored into counterpart layers occurring after the latent code z. In other words, when the total number of layers is N, the output of each i.sup.th layer is additionally forwarded to the (N−i).sup.th layer as input. Alternatively, the same architecture can be used without skip connections between layers, where this can be done to trade-off the accuracy of generated distortions for reduced complexity of generation.

(20) A distortion accuracy assessment function is used to evaluate predicted codec distortions. In embodiments, predicted codec distortions are evaluated with two measures, where: (i the first measure is provided by a learnable discrimination model that attempts to distinguish generated distortions x.sub.d from ground truth distortions x.sub.e as produced from a specified video coding standard (e.g., AVC/H264, HEVC, or VP9), and (ii) the second measure is calculated as the mean absolute distance |x.sub.d-x.sub.e| between generated frames x.sub.d and ground truth frames x.sub.e that include distortions as produced by a specified video coder. The measures of (i) and (ii) are then translated to a training signal to use for training the generative model. Similar to the generative model G(x), the discrimination model D(x, x.sub.d, x.sub.e) is implemented using a convolutional neural network architecture. The discrimination model produces two-dimensional maps L.sub.D(x, x.sub.d, x.sub.e) that describe the likelihood of generated distortions being sampled from the output of a standard video coder. The two-dimensional maps of L.sub.D(x, x.sub.d, x.sub.e) produce a distinguishability measure for each local spatial region of generated frames x.sub.d. That is, in each respective local region, points in two-dimensional maps express the confidence of the model in the fidelity of the observed distortions to learned features from distortion ground truths. By increasing the dimensions of output maps, higher frequencies of input distortions can be assessed. The confidence map is then averaged to yield the overall confidence score of the distortion assessment model in deciding whether or not observed distortions were generated. The overall confidence measure is subsequently combined with an L1 loss between ground truth distortions (from x.sub.e) and generated distortions (from x.sub.d).

(21) The generative model and distortion assessment models are trained sequentially, where each step of training iterates between updating the weights of each model. Optionally, the ratio of steps dedicated to training each model can be tuned such that the generative model receives more weight updates than the distortion assessment model. The latter option can be used to ensure the generative model exhaustively learns to surpass the knowledge of the distortion assessment model. It should also be noted that all components of the learnable rate-distortion model may be fully trained before training the precoder (DQRO) embodiment.

(22) The example rate prediction model R(x) shown in FIG. 5 predicts coding rates directly from input frames (x). Rate predictions are produced as outputs of a learnable mod& trained via a signal that measures the Mean Square Error (MSE) between predicted rates and ground truth rates returned by a specified video encoder (e.g., AVC/H264 or HEM. The rate prediction model is implemented as a convolutional neural network architecture to gradually compress layer feature maps until the last layer, where learned features are combined with a multi-layer-perceptron to predict a scalar value approximating the bitrate required for compressing input frames. Optionally, the input of the rate prediction model can take in full groups-of-frames (GoP) to improve rate predictions by assessing both the spatial features in each frame and the spatial features included in neighboring frames (which are commonly used for inter-prediction in the block models of ubiquitous video codecs). The rate prediction mod& is trained separately from the generative and distortion assessment models.

(23) FIG. 6 is a schematic diagram showing the learnable rate-distortion model useable for training the precoder of FIG. 4. FIG. 6 also shows an example of the convolutional architecture used for rate prediction. The discriminator is excluded when generating distortions for a given precoding training environment, and only the components needed for producing x.sub.d are retained. Dashed lines indicate the architecture of the generative distortion model and rate prediction model. Deconvolutional layers of the generative model increase the dimensions of output feature maps, until eventually their outputs reach the appropriate dimensions of x.sub.d (which are equal to the height and width of inputs x). Note that the rates produced by the rate prediction model are eventually used to calculate the precoding rate loss of the precoder. The components described previously serve to predict image or video codec distortions in a differentiable manner.

(24) To test the methods described herein, i.e. by emulating distortion effects from encoding, a generative model was used with a composite loss function. One component of this composite loss function is a discriminatory loss function calculated on local spatial neighborhoods. That is, a separate discrimination loss was produced for each local region of generated frames. To capture low frequencies of distortions, an L1 loss was used, while SSIM and a combined L1+SSIM loss have also been evaluated. Concerning the rate model, an MSE loss was used for training the rate prediction model, which provided for an average at 10% in relative rate estimation error. For the generative model, a U-Net architecture [19] was used, where skip connections are added to later layers. This type of architecture was shown to work well for style transfer, and is amongst the best performing architecture.

(25) A curated version of the “Kinetics-720p” dataset was used for training, in some embodiments. Validation took place on a subset of the XIPH-1080p sequence dataset, so there is a notable domain shift in testing. Under this setup, indicative distortion and rate estimation results for the HEVC encoder (under its x265 implementation) are shown in FIG. 7. In particular, VMAF distortion metric and rate estimation relative error are shown for HEVC encoding (under its x265 implementation) with: “medium” encoding preset, various CRF values, various RPS modes (0=intra, 1=inter prediction with 1 reference frame), and various input modes for the deep generative RD modeler. The results validate that, not only the relative rate estimation error is in the vicinity of 10%, but the VMAF distortion estimation is within 1.5-2 points for the entire range of configurations tested. Beyond the presented embodiments, the methods described herein can be realized with the full range of options and adaptivity described in the previous examples, and al such options and their adaptations are covered by this disclosure.

(26) FIG. 8 shows a method 800 for configuring an image encoder emulator for use in image data processing. The method 800 may be performed by a computing device, according to embodiments. The method 800 may be performed at least in part by hardware and/or software. At item 810, input image data representing at least one image is received. At item 820, the input image data is encoded at an encoding stage comprising a network of inter-connected weights to generate encoded image data. At item 830, the encoded image data is decoded at a decoding stage to generate a first distorted version of the input image data. At item 840, the first distorted version of the input image data is compared with a second distorted version of the input image data generated using an external image encoder (and a decoder) to determine a distortion difference score. At item 850, an encoding bitrate associated with encoding the input image data to a quality corresponding to the first distorted version is predicted, using a rate prediction model. At item 860, a rate difference score is determined by comparing the predicted encoding bitrate with an encoding bitrate used by the external encoder to encode the input image data to a quality corresponding to the second distorted version. At item 870, the weights of the encoding stage are trained (e.g. optimized) based on the distortion difference score and the rate difference score.

(27) In embodiments, the method 800 comprises using an output of the (configured) image encoder emulator to train an image preprocessing network configured to preprocess images prior to encoding the preprocessed images with an external image encoder. In embodiments, the method 800 additionally comprises using the trained preprocessing network to preprocess images prior to encoding with the external image encoder. In embodiments, the method 800 comprises encoding the preprocessed images with the external image encoder. The encoded images may be transmitted, for example to a client device for decoding and subsequent display.

(28) Embodiments of the disclosure include the methods described above performed on a computing device, such as the computing device 900 shown in FIG. 9. The computing device 900 comprises a data interface 901, through which data can be sent or received, for example over a network. The computing device 900 further comprises a processor 902 in communication with the data interface 901, and memory 903 in communication with the processor 902. In this way, the computing device 900 can receive data, such as image data or video data, via the data interface 901, and the processor 902 can store the received data in the memory 903, and process it so as to perform the methods of described herein, including configuring an image encoder emulator, using the configured image encoder emulator to train a preprocessing network for preprocessing image data prior to encoding using an external encoder, preprocessing image data using the preprocessing network, and optionally encoding the preprocessed image data.

(29) Each device, module, component, machine or function as described in relation to any of the examples described herein may comprise a processor and/or processing system or may be comprised in apparatus comprising a processor and/or processing system. One or more aspects of the embodiments described herein comprise processes performed by apparatus. In some examples, the apparatus comprises one or more processing systems or processors configured to carry out these processes. In this regard, embodiments may be implemented at least in part by computer software stored in (non-transitory) memory and executable by the processor, or by hardware, or by a combination of tangibly stored software and hardware (and tangibly stored firmware). Embodiments also extend to computer programs, particularly computer programs on or in a carrier, adapted for putting the above described embodiments into practice. The program may be in the form of non-transitory source code, object code, or in any other non-transitory form suitable for use in the implementation of processes according to embodiments. The carrier may be any entity or device capable of carrying the program, such as a RAM, a ROM, or an optical memory device, etc.

(30) Various measures (including methods, apparatus, computing devices and computer program products) are provided for generating image or video distortions (called “generative method”), with these distortions resembling those imposed in the specific image or video when they are processed by an external image or video encoder. The generation method uses the following five steps: (i) an encoding stage, where the input image or video is encoded into a lower-dimensional representation, also known as a latent code; (ii) a decoding stage, where the latent code, or a set of multiple latent codes generated from the input image or video, is mapped into a space of equal dimensionality to the input to represent a distorted version of the input image or video; (iii) the distorted version of the input image or video is coupled with a rate prediction model that estimates the encoding bitrate in bits-per-second or bits-per-pixel to encode the input image or video to the quality that corresponds to the distorted version when using an external image or video encoder, with the rate prediction model comprising a neural network that gradually compresses input features produced from the input image or video into feature maps of smaller dimensions until the last layer of the system and the learned features are combined with a fully-connected neural network to predict the bitrate needed to compress the input image or video; (iv) the measurement of rate and distortion difference between the cascade of steps (i) and (ii) versus the rate and distortion of an external image or video encoder processing the same image or video, where the rate is measured in bits-per-pixel or bits-per-second and distortion is measured using multiple computer-implemented methods that can optionally include perceptual quality metrics measuring visual quality as assessed by human viewers. Back-propagation of the rate and distortion difference from step (iii) is performed and the weights of the encoding and decoding stages of steps (i) and (ii) adjusted based on gradient descend methods.

(31) In embodiments, the generative method is combined with an image or video pre-processor or precoder system that uses the rate and distortion estimates in order to improve the signal quality or perceptual quality of the input images or video prior to the actual encoding of these by an external image or video encoder.

(32) In embodiments, the encoding and decoding stages comprise one or more convolutional neural networks, the parameters of which are trained with back-propagation methods in order to maximize the similarity score.

(33) In embodiments, the convolutional neural network uses a “U-Net” neural network architecture, where layers of the encoding stage are mirrored in layers of the decoding stage.

(34) In embodiments, the generative method and rate estimation method of steps (i)-(iii) and the rate and distortion difference measurement method of step (iv) are trained sequentially one after the other, in a series of iterations.

(35) In embodiments, the rate and distortion difference measurement of step (iv) is using a distance measure between the generated images from steps (i) and (ii) and the decoded images from an external image or video encoding and decoding system that includes one or more of: the mean absolute error (MAE), the mean squared error (MSE), normalized versions of the MAE or MSE.

(36) In embodiments, distortion maps are generated for areas of the input image or video that express the confidence of the distortion assessment method against known ground-truth distortions; the distortion maps can be assessed manually, or can be numerically processed to generate average scores for different areas of images or video, expressing if distortions are visually annoying and if human viewers find them visually similar to the ones produced by ground-truth results with external image or video encoders.

(37) In embodiments, groups of images are used to improve the distortion and rate estimation by using the features extracted from neighboring frames.

(38) In embodiments, the input is downscaled or upscaled using a linear or non-linear filter, or a learnable method based on data and back-propagation based training with gradient descent methods.

(39) In embodiments, the utilized encoder is a standards-based image or video encoder such as an ISO JPEG or ISO MPEG standard encoder, or a proprietary or royalty-free encoder, such as, but not limited to, an AOMedia encoder.

(40) In embodiments, the generative method is combined with an image preprocessing method that is enhancing the input according to distortion or perceptual optimization prior to actual encoding.

(41) In embodiments, the distortion is quantified by one or more of the following objective, perceptual or aesthetic image quality scores: peak-signal-to-noise ratio, structural similarity index metric (SSIM), multiscale quality metrics such as the detail loss metric or multiscale SSIM, metrics based on multiple quality scores and data-driven learning and training, such as the video multi-method assessment fusion (VMAF), or aesthetic quality metrics, such as those described by Deng, Y., Loy, C. C. and Tang, X., in their article: “Image aesthetic assessment: An experimental survey”. IEEE Signal Processing Magazine, 34(4), pp. 80-106, 2017″ and variations of those metrics.

(42) While the present disclosure has been described and illustrated with reference to particular embodiments, it will be appreciated by those of ordinary skill in the art that the disclosure lends itself to many different variations not specifically illustrated herein.

(43) Where in the foregoing description, integers or elements are mentioned which have known, obvious or foreseeable equivalents, then such equivalents are herein incorporated as if individually set forth. Reference should be made to the claims for determining the true scope of the present invention, which should be construed so as to encompass any such equivalents. It will also be appreciated by the reader that integers or features of the disclosure that are described as preferable, advantageous, convenient or the like are optional and do not limit the scope of the independent claims. Moreover, it is to be understood that such optional integers or features, whilst of possible benefit in some embodiments of the disclosure, may not be desirable, and may therefore be absent, in other embodiments.

REFERENCES

(44) [1] Dong, Jie, and Yan Ye. “Adaptive downsampling for high-definition video coding.” IEEE Transactions on Circuits and Systems for Video Technology 24.3 (2014): 480-488.

(45) [2] Douma, Peter, and Motoyuki Koike. “Method and apparatus for video upscaling.” U.S. Pat. No. 8,165,197. 24 Apr. 2012.

(46) [3] Su, Guan-Ming, et al. “Guided image up-sampling in video coding.” U.S. Pat. No. 9,100,660. 4 Aug. 2015.

(47) [4] Shen, Minmin, Ping Xue, and Ci Wang. “Down-sampling based video coding using super-resolution technique.” IEEE Transactions on Circuits and Systems for Video Technology21.6 (2011): 755-765.

(48) [5] van der Schaar, Mihaela, and Mahesh Balakrishnan. “Spatial scalability for fine granular video encoding.” U.S. Pat. No. 6,836,512. 28 Dec. 2004.

(49) [6] Boyce, Jill, et al. “Techniques for layered video encoding and decoding.” U.S. patent application Ser. No. 13/738,138.

(50) [7] Dar, Yehuda, and Alfred M. Bruckstein. “Improving low bit-rate video coding using spatio-temporal down-scaling.” arXiv preprint arXiv:1404.4026 (2014).

(51) [8] Martemyanov, Alexey, et al. “Real-time video coding/decoding.” U.S. Pat. No. 7,336,720. 26 Feb. 2008.

(52) [9] Nguyen, Viet-Anh, Yap-Peng Tan, and Weisi Lin. “Adaptive downsampling/upsampling for better video compression at low bit rate.” Circuits and Systems, 2008. ISCAS 2008. IEEE International Symposium on. IEEE, 2008.

(53) [10] Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. “Reducing the dimensionality of data with neural networks.” science313.5786 (2006): 504-507.

(54) [11] van den Oord, Aaron, et al. “Conditional image generation with pixelcnn decoders.” Advances in Neural Information Processing Systems. 2016.

(55) [12] Theis, Lucas, et al. “Lossy image compression with compressive autoencoders.” arXiv preprint arXiv:1703.00395(2017).

(56) [13] Wu, Chao-Yuan, Nayan Singhal, and Philipp Krähenbülhl. “Video Compression through Image Interpolation.” arXiv preprint arXiv:1804.06919 (2018).

(57) [14] Rippel, Oren, and Lubomir Bourdev. “Real-time adaptive image compression.” arXiv preprint arXiv:1705.05823 (2017).

(58) [15] Wang, Shiqi, et al. “SSIM-motivated rate-distortion optimization for video coding.” IEEE Transactions on Circuits and Systems for Video Technology 22.4 (2011): 516-529.

(59) [16] Li, Chenglin, et al. “Delay-power-rate-distortion optimization of video representations for dynamic adaptive streaming.” IEEE Transactions on Circuits and Systems for Video Technology 28.7 (2017): 1648-1664.

(60) [17] Helmrich, Christian, et al. “Perceptually Optimized Bit-Allocation and Associated Distortion Measure for Block-Based Image or Video Coding.” 2019 Data Compression Conference (DCC). IEEE, 2019.

(61) [18] Xu, Bin, et al. “CNN-based rate-distortion modeling for H. 265/HEVC.” 2017 IEEE Visual Communications and Image Processing (VCIP). IEEE, 2017.

(62) [19] Zhu, Shiping, and Ziyao Xu. “Spatiotemporal visual saliency guided perceptual high efficiency video coding with neural network.” Neurocomputing 275 (2018): 511-522.

(63) [20] E. Bourtsoulatze, A. Chadha, I. Fadeev, V. Giotsas, Y. Andreopoulos, “Deep video precoding,” IEEE Trans. on Circ. and Syst. for Video Technol., to appear in 2020.

(64) [21] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015.

(65) [22] Golub, Gene H., and Charles F. Van Loan. Matrix computations. Vol. 3. JHU Press, 2012.

(66) [23] Deng, Y., Loy, C. C. and Tang, X., “Image aesthetic assessment: An experimental survey,” IEEE Signal Processing Magazine, 34(4), pp. 80-106, 2017″.

Image data processing

Assignee

Inventors

Cpc classification

Classification Explorer

H04N19/103

ELECTRICITY

Classification Explorer

H04N19/176

ELECTRICITY

Classification Explorer

H04N19/12

ELECTRICITY

Classification Explorer

H04N19/85

ELECTRICITY

Classification Explorer

G03G15/5004

PHYSICS

Classification Explorer

G06N3/045

PHYSICS

Classification Explorer

H04N19/61

ELECTRICITY

Classification Explorer

H04N19/177

ELECTRICITY

Classification Explorer

H04N21/2662

ELECTRICITY

Classification Explorer

H04N19/86

ELECTRICITY

Classification Explorer

H04N21/23439

ELECTRICITY

Classification Explorer

H04N19/147

ELECTRICITY

Classification Explorer

H04N19/65

ELECTRICITY

Classification Explorer

G06N3/088

PHYSICS

Classification Explorer

H04N19/126

ELECTRICITY

Classification Explorer

H04N19/172

ELECTRICITY

Classification Explorer

H04N19/184

ELECTRICITY

Classification Explorer

H04N19/44

ELECTRICITY

Classification Explorer

H04N19/136

ELECTRICITY

Classification Explorer

H04N19/154

ELECTRICITY

Classification Explorer

H04N19/182

ELECTRICITY

Classification Explorer

H04N19/124

ELECTRICITY

International classification

Classification Explorer

H04N7/12

ELECTRICITY

Classification Explorer

H04N19/126

ELECTRICITY

Classification Explorer

H04N19/65

ELECTRICITY