Encoding and decoding image data
11582481 · 2023-02-14
Assignee
Inventors
- Djordje Djokovic (London, GB)
- Ioannis Andreopoulos (London, GB)
- Ilya Fadeev (London, GB)
- Srdjan Grce (London, GB)
CPC classification
G06N3/082
PHYSICS
H04N19/85
ELECTRICITY
H04N19/59
ELECTRICITY
H04N19/587
ELECTRICITY
G06N3/049
PHYSICS
H04N19/192
ELECTRICITY
International classification
H04N19/59
ELECTRICITY
H04N19/85
ELECTRICITY
Abstract
Certain aspects of the present disclosure provide techniques for encoding image data for one or more images. In one embodiment, a method includes the steps of downscaling the one or more images, and encoding the one or more downscaled images using an image codec. Another embodiment concerns a computer-implemented method of decoding encoded image data, and a computer-implemented method of encoding and decoding image data.
Claims
1. A computer-implemented method of encoding image data for one or more images using an image codec, wherein the image codec comprises a downscaling process, the method comprising the steps of: downscaling the one or more images in accordance with the initial downscaling process, using an artificial neural network comprising a plurality of layers of neurons; subsequently encoding the one or more downscaled images using the image codec; and iterating, until a stopping condition is reached, the steps of: determining an importance value for a set of neurons or layers of neurons; removing the neurons and/or layers with importance value less than a determined amount from the neural network; and tuning the neural network in accordance with a cost function, wherein the cost function is arranged to determine a predetermined balance between the accuracy of the downscaling performed by the neural network and the complexity of the neural network; applying a monotonic scaling function to the weighting of the neurons of the neural network; and tuning the neural network.
2. The method of claim 1, wherein the one or more images are downscaled using one or more filters.
3. The method of claim 2, wherein a filter of the one or more filters is an edge-detection filter.
4. The method of claim 2, wherein a filter of the one or more filters is a blur filter.
5. The method of claim 2, wherein the parameters of a filter used to downscale an image are determined using the results of the encoding of previous images by the image codec.
6. The method of claim 1, wherein in the removal step, the layer with lowest importance value is removed.
7. The method of claim 1, wherein the stopping condition is that the iterated steps have been performed a predetermined number of times.
8. The method of claim 1, wherein the iterated steps and the applying and tuning steps are iterated until a further stopping condition is reached.
9. The method of claim 1, wherein the step of downscaling the one or more images comprises the step of determining the processes to use to downscale an image using the results of the encoding of previous images by the image codec.
10. The method of claim 1, wherein the encoded image data comprises data indicative of the downscaling performed.
11. The method of claim 1, wherein the image codec is lossy.
12. A computer-implemented method of decoding encoded image data for one or more images, wherein the encoded image data was encoded using a method of encoding image data for one or more images using an image codec, wherein the image codec comprises a downscaling process, the method of encoding image data comprising the steps of: downscaling the one or more images in accordance with the initial downscaling process using an artificial neural network comprising a plurality of layers of neurons; and subsequently encoding the one or more downscaled images using the image codec; iterating, until a stopping condition is reached, the steps of: determining an importance value for a set of neurons or layers of neurons; removing the neurons and/or layers with importance value less than a determined amount from the neural network; and tuning the neural network in accordance with a cost function, wherein the cost function is arranged to determine a predetermined balance between the accuracy of the downscaling performed by the neural network and the complexity of the neural network; applying a monotonic scaling function to the weighting of the neurons of the neural network; and tuning the neural network, wherein the method of decoding encoded image data comprises the steps of: decoding the encoded image data using the image codec to generate one or more downscaled images; and upscaling the one or more images.
13. The method of claim 12, wherein the encoded image data comprises data indicative of the downscaling performed, and the data indicative of the downscaling performed is used when upscaling the one or more images.
14. The method of claim 1, wherein the image data is video data, the one or more images are frames of video, and the image codec is a video codec.
15. A computing device comprising: a processor; and a memory, wherein the computing device is arranged to perform using the processor a method of encoding image data for one or more images using an image codec, wherein the image codec comprises a downscaling process, the method comprising the steps of: downscaling the one or more images in accordance with the initial downscaling process using an artificial neural network comprising a plurality of layers of neurons; subsequently encoding the one or more downscaled images using the image codec; and iterating, until a stopping condition is reached, the steps of: determining an importance value for a set of neurons or layers of neurons; removing the neurons and/or layers with importance value less than a determined amount from the neural network; and tuning the neural network in accordance with a cost function, wherein the cost function is arranged to determine a predetermined balance between the accuracy of the downscaling performed by the neural network and the complexity of the neural network; applying a monotonic scaling function to the weighting of the neurons of the neural network; and tuning the neural network.
16. A non-transitory computer-readable medium comprising instructions that, when executed by a processor of a computing device, cause the computing device to perform a method of encoding image data for one or more images using an image codec, wherein the image codec comprises a downscaling process, the method comprising the steps of: downscaling the one or more images in accordance with the initial downscaling process using an artificial neural network comprising a plurality of layers of neurons; subsequently encoding the one or more downscaled images using the image codec; and iterating, until a stopping condition is reached, the steps of: determining an importance value for a set of neurons or layers of neurons; removing the neurons and/or layers with importance value less than a determined amount from the neural network; and tuning the neural network in accordance with a cost function, wherein the cost function is arranged to determine a predetermined balance between the accuracy of the downscaling performed by the neural network and the complexity of the neural network; applying a monotonic scaling function to the weighting of the neurons of the neural network; and tuning the neural network.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Embodiments of the present invention will now be described by way of example only with reference to the accompanying schematic drawings of which:
DETAILED DESCRIPTION
(15) Embodiments of the invention are now described.
(17) In the first downscaling stage 1 of
(18) Option 1: One or more content-adaptive filters are applied to the IoVF. The content-adaptive filters track edge direction within one or more IoVFs, or flow of pixel or scene activity in time between two or more successive IoVFs, and downscale in Sa/oT by any rational factor a/b, with a, b any integers greater than zero such that b≥a. The integers a, b are chosen adaptively (i.e. based on the content of the IoVF, which may include the results of the overall process on previous IoVF) in each Sa/oT dimension using the adaptivity mechanisms 2 of
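As an illustration of resampling by a rational factor a/b, the following minimal NumPy sketch downscales a 1-D signal by linear interpolation; the function name and the use of plain interpolation (rather than the content-adaptive filters of option 1) are assumptions for clarity:

```python
import numpy as np

def rational_downscale(signal, a, b):
    """Resample a 1-D signal by rational factor a/b (b >= a > 0).

    Linear interpolation stands in for the edge-adaptive filtering of
    option 1; only the change of sampling grid is illustrated.
    """
    assert 0 < a <= b
    n = len(signal)
    m = max(1, (n * a) // b)               # new length after downscaling by a/b
    positions = np.linspace(0, n - 1, m)   # sample positions on the original grid
    return np.interp(positions, np.arange(n), signal)
```

For example, downscaling a length-10 ramp by a factor 1/2 yields 5 samples spanning the same range.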
(20) Option 2: An Sa/oT blur filter kernel is applied to the IoVF. The Sa/oT filter may be, but is not limited to, an Sa/oT Gaussian function followed by downscaling as in option 1 above.
(21) Option 3: An autoencoder is applied to the IoVF. The autoencoder comprises a series of layers such as (but not limited to) those shown in
(22) In certain embodiments, the application of the options 1 to 3 may be iterated within a loop, as now described with reference to
(23) The steps of the iteration process are as follows:
(24) (I) The IoVFs, downscaled by any factor a/b in Sa/oT as per option 1, are upscaled back to their original Sa/oT resolution using an edge-adaptive or autoencoding-based upscaling (designed to approximately invert the downscaling process), and a filter kernel is applied, such as (but not limited to) a fixed or adaptive filter having a frequency response complementary to the Sa/oT blur kernel of option 2 (i.e., being an Sa/oT high-frequency, or "detail", enhancement kernel rather than an Sa/oT blur kernel).
(26) (II) The upscaled IoVFs are downscaled with any of the downscaling processes of options 1-3, and the downscaled error IoVF is computed as the difference between the previously-downscaled IoVF and the result of this downscaling.
(27) (III) The downscaled error IoVF is upscaled following any of the options 1-3, and filtered with the filter kernel of step I above in order to be added to the previously-upscaled IoVF of step I above.
(28) (IV) Steps II and III above are repeated until norm-K (also known as L.sub.K) [15] of the downscaled error IoVFs does not change beyond a threshold between successive iterations, or for a fixed number of iterations (e.g., 3 iterations, or any other number).
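Steps I to IV above can be sketched as follows; the 2x average-pooling downscaler and sample-repetition upscaler are illustrative stand-ins for options 1-3, not the adaptive filters of the embodiment:

```python
import numpy as np

def downscale(x):
    # Stand-in for options 1-3: 2x average pooling (assumes even length).
    return 0.5 * (x[0::2] + x[1::2])

def upscale(x):
    # Stand-in approximate inverse: sample repetition.
    return np.repeat(x, 2)

def iterative_downscale(x, max_iters=3, tol=1e-6):
    d = downscale(x)                 # initial downscaled IoVF
    u = upscale(d)                   # step I: upscale back to original resolution
    prev_norm = None
    for _ in range(max_iters):
        err = d - downscale(u)       # step II: downscaled error IoVF
        u = u + upscale(err)         # step III: upscale error, add to step-I output
        norm = np.linalg.norm(err)   # step IV: stop when norm-K stabilises
        if prev_norm is not None and abs(prev_norm - norm) < tol:
            break
        prev_norm = norm
    return d, u
```

The fixed iteration count (3) and tolerance are example stopping parameters, matching the "3 iterations, or any other number" wording above.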
(29) In embodiments of the invention, the filters of option 1 may include (but are not limited to) edge-adaptive filters that use thresholding to determine edge content in input IoVF or IoVF difference matrices, e.g., local or global thresholding. An example embodiment of such edge-adaptive filters is, but is not limited to, adaptive weighted combinations of adjacent samples from IoVFs in Sa/oT, e.g., for samples A, B, D, E, J, K, M, N of the successive IoVFs 1 & 2 of
R.sub.d=Σ.sub.i=1.sup.8T.sub.t.sub.i(w.sub.i·s.sub.i), with s.sub.1, . . . , s.sub.8 being the samples A, B, D, E, J, K, M, N,
with:
∀iϵ{1, . . . ,8}: T.sub.t.sub.i(x)=x if |x|≥t.sub.i, and T.sub.t.sub.i(x)=0 otherwise,
(30) where ∀i ϵ{1, . . . ,8}: t.sub.i, w.sub.i are thresholds and weights set adaptively in order to align the filtering of the equation R.sub.d to Sa/oT edges in the input IoVFs.
(31) In embodiments of the invention, the “blur” filter kernel of option 2 may include (but is not limited to) the 5×5 Gaussian function of the form:
G[x,y,t]=[1/(2πσ.sup.2)]exp(−(x.sup.2+y.sup.2+t.sup.2)/(2σ.sup.2)),
(32) with −2≤x, y, t≤2 being the function's discrete space-time support and σ being the standard deviation of the Gaussian function.
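A minimal NumPy sketch of this kernel follows, assuming the 5×5×5 discrete support above; normalising the kernel to unit sum is an added assumption (so that the blur preserves mean intensity), not stated in the text:

```python
import numpy as np

def gaussian_kernel_3d(sigma=1.0, support=2):
    # Discrete space-time support -2 <= x, y, t <= 2 gives a 5x5x5 kernel.
    ax = np.arange(-support, support + 1)
    x, y, t = np.meshgrid(ax, ax, ax, indexing="ij")
    g = (1.0 / (2 * np.pi * sigma**2)) * np.exp(
        -(x**2 + y**2 + t**2) / (2 * sigma**2)
    )
    return g / g.sum()  # assumed normalisation to unit sum

k = gaussian_kernel_3d(sigma=1.0)
```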
(33) In embodiments of the invention, the autoencoder of option 3 may include, but is not limited to, autoencoders that, through a series of layers such as those shown in
(34) Finally, in embodiments of the invention, option 4 may include, but is not limited to, any embodiments of options 1-3 within the iterative loop described, and computing the norm-1 or norm-2 between the error signal of successive iterations in order to establish if the norm is below a threshold.
(35) The adaptivity mechanisms 2 used in option 1 above are now discussed in more detail. The adaptivity mechanisms 2 comprise one or both of the following two parts:
(36) (i) Adaptive tuning of parameters for the downscaling ratio in Sa/oT between successive IoVF (with the downscaling ratio changing in each Sa/oT dimension, or within the same IoVF) and/or adaptively changing the filter or autoencoding parameters in Sa/oT in order to achieve the best results on training data or based on a modelling framework that approximates the cost functions used. Embodiments of the invention include, but are not limited to, tuning that uses larger downscaling ratios for high-bitrate encoding and lower ratios for low-bitrate encoding, since the downscaling ratio is expected to vary monotonically with the bitrate utilized for the encoding of the low-resolution IoVFs. For example, for bitrates between 3,500 kbps and 7,000 kbps and HD-resolution video content at 25 to 30 frames-per-second (fps), the tuning of parameters can select: spatial downscaling ratios
(37)
for both the horizontal and vertical spatial dimension; temporal downscaling ratio
(38)
the use of the 5×5 blur kernel of eq. 3 and a corresponding 5×5 detail kernel such as a Laplacian of Gaussians with appropriate choice of standard deviation parameters; and three iterations of the iteration process described above.
(39) On the other hand, for bitrates between 100 kbps and 300 kbps, the adaptive tuning can select: spatial downscaling factors
(40)
for the horizontal dimension;
(41)
for the vertical spatial dimension;
(42)
for the temporal dimension; the use of a 6-layer autoencoder with the mean square error after reconstruction; and two iterations of the iteration process described above.
(43) The choice of these parameters can be made based on a combination of offline training with representative IoVFs, as well as on online features extracted from the present IoVFs to be encoded, such as variance between frames, the rate-distortion characteristics of a fast, single-pass encoding of select frames with the utilized video codec, etc. These tuning options are just some of the options that will be apparent to the skilled person.
(44) (ii) In order to control the complexity of the entire process, adaptive tuning of whether to apply the three options described above within areas of each IoVF based on a similarity criterion between successive IoVF, e.g., computing areas of the error between IoVF L.sub.K=f.sub.K(V.sub.i[x,y]−V.sub.i+1[x,y]) with i being the time index, and (x,y) representing an area within the IoVFs and f.sub.K representing a distance function, e.g., norm-K (also known as L.sub.K norm) [15], with K ϵ{0, . . . , inf}. When similarity is above a certain threshold (with the latter itself tuned adaptively), instead of applying any of the three options, the same downscaled output as in the previous IoVF, or the same parameters as used for the options, are used. For example, blocks of 16×16 can be replicated from the reconstruction of previous frames if their difference at low resolution has norm-2 that is below 0.01. The skilled person will appreciate that this is just one example of a threshold rule, and many other possible distance metrics would be apparent to the skilled person.
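The block-replication rule can be sketched as below; normalising the norm-2 by the block size is an assumption made here for scale invariance, since the text states only that the difference "has norm-2 that is below 0.01":

```python
import numpy as np

def should_replicate(prev_block, cur_block, threshold=0.01):
    """Decide whether a 16x16 low-resolution block can be replicated from
    the previous frame's reconstruction instead of re-applying options 1-3.

    The per-sample normalisation of the norm-2 is an assumption; the
    threshold itself would be tuned adaptively in the embodiment.
    """
    diff = np.linalg.norm(cur_block - prev_block) / cur_block.size
    return diff < threshold
```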
(45) In the next encoding stage 3 of
(46) In the next transmitting stage 4 of
(47) In the next decoding stage 5 of
(48) In the next upscaling stage 6 of
(49) In this upscaling stage 6, adaptive tuning 7 of parameters for the upscaling ratio in Sa/oT between successive IoVF (with the upscaling ratio changing in each Sa/oT dimension and being any rational factor a/b, with a, b any integers greater than zero), or within the same IoVF, and/or adaptively changing the filter or autoencoding parameters in Sa/oT, is performed so as to DtM the parameters used in the downscaling stage 1. This includes the adaptive tuning of whether to apply the DtM steps of the downscaling stage 1, or simply to replicate the same content (or use the same parameters) as in the previous IoVF, as in the adaptivity mechanisms 2.
(51) As noted in the comparison results 8 shown in
(52) It will be appreciated that where video data has a GOP ("group of pictures") structure, i.e. it comprises a series of GOPs, the parameters for downscaling a whole GOP could be determined (i.e. the same parameters used to downscale all images in the GOP), as by their nature the images in a GOP are likely to share characteristics. Parameters for subsequent GOPs could then be determined, so that different GOPs are downscaled using different parameters, if appropriate. Each downscaled GOP can then be encoded and added to the video bitstream using a known video codec in the usual way. A flowchart showing an example method of encoding such video data is shown in
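A per-GOP parameter selection loop might be sketched as follows; `choose_params`, `downscale` and `codec_encode` are hypothetical placeholders for the adaptive tuning, downscaling and codec stages described above:

```python
def encode_video(gops, choose_params, downscale, codec_encode):
    """Per-GOP encoding sketch: one parameter set is determined per GOP
    and reused for every frame in that GOP; different GOPs may receive
    different parameters. All three callables are hypothetical stand-ins.
    """
    bitstream = []
    for gop in gops:
        params = choose_params(gop)                       # tune once per GOP
        low_res = [downscale(frame, params) for frame in gop]
        bitstream.append(codec_encode(low_res, params))   # known video codec
    return bitstream
```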
(53) A method of optimizing an artificial neural network in accordance with an embodiment of the invention is now described. Such a neural network can for example be used as the autoencoder of option 3 of the downscaling stage 1 of the embodiment described above.
(54) The neural network of the present embodiment is a multi-layered neural network, a part of which is shown in
(55) Taking such a DNN, the method of the present embodiment provides pruning and complexity-accuracy scaling to optimize the DNN. The DNN may for example be implemented via any commonly-used library such as TensorFlow, caffe2, etc. Using the method, when a core DNN design like Inception or VGG-16 is fine-tuned for a specific dataset and problem context (e.g. training for IoVF SR as in the above embodiment, but also face recognition, ethnicity/gender recognition, etc.), the method can be used to also add resource-precision scaling.
(56) The method is now described in detail with reference to
(57) In a first step 100, the initial DNN to be optimized is obtained. The DNN will comprise (artificial) neurons in layers, and filters within the layers. The following aspects of the method are then performed.
T1.1) Adaptive Filter Pruning
(58) The filters are pruned to maximize computational efficiency while minimizing the drop in predictive accuracy of the neural network. For each convolutional layer i, the relative importance of each filter F.sub.i inside the layer can be calculated using the sum of its absolute weights Σ.sub.∀j|F.sub.i,j|, with F.sub.i,j the weights of feature map F.sub.i at position j. Visual examples of feature maps are shown in
(59) The steps for pruning M filters from the ith convolutional layer are as follows:
(60) (i) For each DNN layer, loop through all feature maps.
(61) (ii) For each filter F.sub.i in convolutional layer i, evaluate the importance s.sub.i of its neurons (i.e., weights) (step 101 of
(62) (iii) Sort s.sub.i in descending order.
(63) (iv) Prune filters with smallest s.sub.i and remove kernels in the convolutional layer i corresponding to the pruned feature maps, which (step 102 of
(64) (v) Calculate sensitivity to pruning filters for the new kernel matrix for each layer independently by evaluating the resulting pruned network's accuracy on the validation set.
(65) (vi) Create new kernel matrix by copying the remaining kernel weights.
(66) (vii) Retrain the network to compensate for performance degradation (step 103). Retraining can be performed after every single pruning action, after pruning all layers first, or anything in between, until the original accuracy is restored (the "yes" path of step 104). The remaining steps of the method (the "no" path of step 104 and beyond) are described later below.
(67) An example of the result of filter pruning is shown schematically in
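The pruning steps (i)-(vii) above can be sketched as below; the kernel layout (num_filters, kh, kw, in_ch) is an assumed convention for illustration, and the sensitivity evaluation and retraining steps are omitted:

```python
import numpy as np

def prune_filters(kernel, m):
    """Prune the m filters with smallest L1 norm (sum of absolute weights)
    from one convolutional layer.

    kernel: weights with shape (num_filters, kh, kw, in_ch) -- an assumed
    layout. Returns the new kernel matrix and the indices of kept filters.
    """
    # s_i = sum of absolute weights per filter (steps ii-iii)
    scores = np.abs(kernel).reshape(kernel.shape[0], -1).sum(axis=1)
    # drop the m smallest-scoring filters, preserving original order (step iv)
    keep = np.sort(np.argsort(scores)[m:])
    # copy the remaining kernel weights into a new matrix (step vi)
    return kernel[keep].copy(), keep
```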
T1.2) Adaptive Scaling in DNNs
(68) State-of-the-art deep learning libraries use dataflow graphs to define and run multilayer DNN models. Dataflow graphs explicitly define the linear and non-linear processes of a neural network for every layer, giving libraries like TensorFlow the ability to calculate differentials at every level for every node. Efficient calculation of differentials at every level is important to minimize loss functions. For example, a TensorFlow call corresponding to 64 channels of a convolutional layer (with each channel comprising a 3×3 filter that is applied with stride (per dimension) [1, 2]) is: conv1=Conv(data, [3,3,64], [1,2], bias=0.0, stddev=5e-2, padding=‘SAME’, name=‘conv1’);
(69) Such calls are mapped to the corresponding C++ code segments (or directly to custom hardware designs that support TensorFlow) that carry out the data and weight reordering for the actual convolutions and update the filter weights based on the utilized stochastic gradient descent. After the pruning process of T1.1 has been applied, or independent of said pruning process, a-priori scaling with a scaling coefficient matrix that allows for inherent prioritization according to the utilized coefficients is added. For example, in the above conv1 example, a-priori 3×3×64 scaling could be performed following a set of monotonic functions that prioritized within the 64 filters of the convolutional layer, as well as within the 3×3 masks. The specific shape of these monotonic functions could be fixed, or determined based on a learning framework. An example instantiation of the learned coefficients corresponding to the conv1 layer is shown in
(70) Furthermore, in order to provision for incremental DNN training, training can be provided with subsets of the DNN weights, starting from those corresponding to the most significant scaling coefficients, and progressing to those corresponding to the least significant coefficients. Once the DNN weights have converged for a given scaling region, they can be kept constant in order to apply during the forward pass and train subsequent regions. An example (out of many possible) of three retained layers is shown in the leftmost part of
(71) Next, the following steps are performed:
(72) (viii) Optionally retain only a subset of DNN layers (step 105).
(73) (ix) Apply monotonic layer weighting functions to the retained layers (step 106).
(74) (x) Retrain the network to compensate for performance degradation (step 107). Retraining can be performed after every single layer selection and weighting action, after weighting all layers first, or anything in between until the original accuracy is restored (the “yes” path of step 108).
(75) (xi) Once training is stopped in selective retainment and layer weighting (the “no” path of step 108), optionally restart the whole process (the “yes” option of step 109), or alternatively move to the inference step (step 110, following the “no” option of step 109) using new IoVFs or inputs not seen during training and selectively using layers to implement SR or classification inference on the new data.
(76) An example embodiment of layer weighting within TensorFlow [27] is now given. It will be appreciated that the invention is not limited to this embodiment. As this scaling can be implemented with a Hadamard product, e.g., a tf.mul() command and broadcasting in NumPy prior to the conv command in TensorFlow, basic integration of such a framework within state-of-the-art DNN libraries requires no change in their codebase. Instead, it can be achieved by automated parsing of the Python layer of a TensorFlow DNN design (similar for caffe2 and elsewhere) and appropriate instrumentation. In order for the auto-differentiation and gradient flow processes of such libraries to leave the scaling coefficients unchanged, they can be defined as constants or hyperparameters, and the shape parameters of the multidimensional monotonic layering functions of
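Such a-priori scaling via a Hadamard product with NumPy broadcasting might look as follows; the weight layout (3×3 masks by 64 channels) matches the conv1 example above, while the exponential decay constant is an assumption for illustration:

```python
import numpy as np

# conv1 weights: 3x3 spatial masks for 64 output channels (assumed layout)
rng = np.random.default_rng(0)
w = rng.standard_normal((3, 3, 64))

# One monotonically decaying priority per filter, broadcast over the 3x3
# masks; the decay constant 16 is an illustrative assumption.
channel_scale = np.exp(-np.arange(64) / 16.0)

# Hadamard product via broadcasting: each channel's mask is scaled by its
# priority coefficient, prioritizing the earlier filters.
scaled = w * channel_scale
```

Because the scaling is a plain elementwise product, it composes with the convolution without any change to the library codebase, as noted above.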
T1.3) Optimization and Control of Scaling Coefficient Functions
(77) The described graceful degradation in accuracy for resource scaling and DNN layer compaction depends on the choice of the framework to design the shape of the monotonic scaling functions of each layer. Starting from a variety of initializations, e.g., using linearly-, polynomially- or exponentially-decaying functions, adjustment of the shape parameters of the functions within the backpropagation operation carried out within each layer is performed. As a non-limiting example, based on the average differential carried across each dimension of the layering coefficients of
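The linearly-, polynomially- and exponentially-decaying initializations mentioned above could be sketched as follows; the specific decay constants and exponents are assumptions for illustration:

```python
import numpy as np

def scaling_profile(n, kind="exponential"):
    """Candidate initializations for a layer's monotonic scaling coefficients.

    All three profiles decay monotonically from 1.0; the shape parameters
    (quadratic exponent, decay constant n/4) are illustrative assumptions
    and would be adjusted within backpropagation in the embodiment.
    """
    i = np.arange(n)
    if kind == "linear":
        return 1.0 - i / n
    if kind == "polynomial":
        return (1.0 - i / n) ** 2
    if kind == "exponential":
        return np.exp(-i / (n / 4.0))
    raise ValueError(kind)
```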
(78) Example bitrate-PSNR curves obtained from embodiments of the invention include, but are not limited to, those presented in
(79) Embodiments of the invention include the methods described above performed on a computing device, such as the computing device shown in
(80) While the present invention has been described and illustrated with reference to particular embodiments, it will be appreciated by those of ordinary skill in the art that the invention lends itself to many different variations not specifically illustrated herein.
(81) Where in the foregoing description, integers or elements are mentioned which have known, obvious or foreseeable equivalents, then such equivalents are herein incorporated as if individually set forth. Reference should be made to the claims for determining the true scope of the present invention, which should be construed so as to encompass any such equivalents. It will also be appreciated by the reader that integers or features of the invention that are described as preferable, advantageous, convenient or the like are optional and do not limit the scope of the independent claims. Moreover, it is to be understood that such optional integers or features, whilst of possible benefit in some embodiments of the invention, may not be desirable, and may therefore be absent, in other embodiments.
REFERENCES
(82) [1] Dong, Jie, and Yan Ye. “Adaptive downsampling for high-definition video coding.” IEEE Transactions on Circuits and Systems for Video Technology 24.3 (2014): 480-488.
(83) [2] Douma, Peter, and Motoyuki Koike. “Method and apparatus for video upscaling.” U.S. Pat. No. 8,165,197. 24 Apr. 2012.
(84) [3] Su, Guan-Ming, et al. “Guided image up-sampling in video coding.” U.S. Pat. No. 9,100,660. 4 Aug. 2015.
(85) [4] Shen, Minmin, Ping Xue, and Ci Wang. “Down-sampling based video coding using super-resolution technique.” IEEE Transactions on Circuits and Systems for Video Technology 21.6 (2011): 755-765.
(86) [5] van der Schaar, Mihaela, and Mahesh Balakrishnan. “Spatial scalability for fine granular video encoding.” U.S. Pat. No. 6,836,512. 28 Dec. 2004.
(87) [6] Boyce, Jill, et al. “Techniques for layered video encoding and decoding.” U.S. patent application Ser. No. 13/738,138.
(88) [7] Dar, Yehuda, and Alfred M. Bruckstein. “Improving low bit-rate video coding using spatio-temporal down-scaling.” arXiv preprint arXiv: 1404.4026 (2014).
(89) [8] Martemyanov, Alexey, et al. “Real-time video coding/decoding.” U.S. Pat. No. 7,336,720. 26 Feb. 2008.
(90) [9] Nguyen, Viet-Anh, Yap-Peng Tan, and Weisi Lin. “Adaptive downsampling/upsampling for better video compression at low bit rate.” Circuits and Systems, 2008. ISCAS 2008. IEEE International Symposium on. IEEE, 2008.
(91) [10] Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. “Reducing the dimensionality of data with neural networks.” Science 313.5786 (2006): 504-507.
(92) [11] van den Oord, Aaron, et al. “Conditional image generation with pixelcnn decoders.” Advances in Neural Information Processing Systems. 2016.
(93) [12] Theis, Lucas, et al. “Lossy image compression with compressive autoencoders.” arXiv preprint arXiv:1703.00395 (2017).
(94) [13] Wu, Chao-Yuan, Nayan Singhal, and Philipp Krähenbühl. “Video Compression through Image Interpolation.” arXiv preprint arXiv: 1804.06919 (2018).
(95) [14] Rippel, Oren, and Lubomir Bourdev. “Real-time adaptive image compression.” arXiv preprint arXiv: 1705.05823 (2017).
(96) [15] Golub, Gene H., and Charles F. Van Loan. Matrix computations. Vol. 3. JHU Press, 2012.
(97) [16] R. Timofte, et al., “NTIRE 2017 challenge on single image super-resolution: Methods and results,” Proc. Comp. Vis. and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conf. on Comp. Vis. and Pattern Recognition, CVPR, IEEE, 2017, https://goo.gl/TQRT7E.
(98) [17] B. Lim, et al. “Enhanced deep residual networks for single image super-resolution,” Proc. Comp. Vis. and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conf. on Comp. Vis. and Pattern Recogn., CVPR, IEEE, 2017, https://goo.gl/PDSTiV.
(99) [18] C. Dong, et al., “Accelerating the super-resolution convolutional neural network,” Proc. 2016 IEEE Conf. on Comp. Vis. and Pattern Recognition, CVPR, IEEE, 2016, https://goo.gl/Qa1UmX.
(100) [19] Dong, C., Loy, C. C., He, K., Tang, X, “Learning a deep convolutional network for image super-resolution,” Proc. ECCV (2014) 184-199
(101) [20] Dong, C., Loy, C. C., He, K., Tang, X, “Image super-resolution using deep convolutional networks,” IEEE TPAMI 38(2) (2015) 295-307
(102) [21] Yang, C. Y., Yang, M. H., “Fast direct super-resolution by simple functions,” Proc. ICCV. (2013) 561-568
(103) [22] Dong, C., Loy, C., Tang, X.: Accelerating the Super-Resolution Convolutional Neural Network, Proc. ICCV (2016).
(104) [23] Han, Song, Huizi Mao, and William J. Dally, “Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding,” arXiv preprint arXiv: 1510.00149 (2015).
(105) [24] Han, Song, et al., “Learning both weights and connections for efficient neural network,” Advances in neural information processing systems. 2015.
(106) [25] Iandola, Forrest N., et al., “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size,” arXiv preprint arXiv:1602.07360 (2016)
(107) [26] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, 2015.
(108) [27] M. Abadi, “TensorFlow: A system for large-scale machine learning,” Proc. 12th USENIX Symp. on Oper. Syst. Des. and Implem. (OSDI), Savannah, Ga., USA. 2016.
(109) [28] K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recognition in videos,” Proc. Advances in Neural Inf. Process. Syst., NIPS, 2014.
(110) [29] V. Sze, et al., “Hardware for machine learning: Challenges and opportunities,” arXiv preprint, arXiv:1612.07625, 2016.
(111) [30] C. Zhang, et al., “Optimizing FPGA-based accelerator design for deep convolutional neural networks,” Proc. ACM/SIGDA Int. Symp. Field-Prog. Gate Arr., FPGA. ACM, 2015.
(112) [31] T. Chen, et al., “DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning,” Proc. ASPLOS, 2014.
(113) [32] A. Shafiee, “ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars,” Proc. IEEE Int. Symp. on Comp. Archit., ISCA, 2016.
(114) [33] L. Chi, et al., “PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory,” Proc. IEEE Int. Symp. on Comp. Archit., ISCA, 2016.
(115) [34] B. Park, et al., “A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications,” Proc. ISSCC, 2015.
(116) [35] A. Cavigelli, et al., “Origami: A convolutional network accelerator,” Proc. GLVLSI, 2015.
(117) [36] H. Mathieu, et al., “Fast training of convolutional networks through FFTs,” Proc. ICLR, 2014.
(118) [37] C. Yang, et al., “Designing energy-efficient convolutional neural networks using energy-aware pruning,” arXiv preprint arXiv:1611.05128, 2016.
(119) [38] J. Hsu, “For sale: deep learning,” IEEE Spectrum, vol. 53, no. 8, pp. 12-13, August 2016.
(120) [39] M. Han, et al., “Deep compression: Compressing deep neural network with pruning, trained quantization and Huffman coding,” Proc. ICLR, 2016.
(121) [40] K. Chen, “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” Proc. ISSCC, 2016.
(122) [41] M. Courbariaux, and Y. Bengio, “Binarynet: Training deep neural networks with weights and activations constrained to +1 or −1,” arXiv preprint, arXiv:1602.02830, 2016.