Feature extraction by directional wavelet packets for image processing by neural networks
12288374 · 2025-04-29
CPC classification: G06V10/44 (PHYSICS)
International classification: G06V10/00 (PHYSICS)
Abstract
Methods and systems that replace convolutional layers of a convolutional neural network (CNN) with quasi-analytic directional wavelet packet (qWP)-based filters, and which use the qWP-based filters to perform filtering and extract features from image data. The extracted features are then used by the CNN to perform a classification task. The results of the classification task are output to a user.
Claims
1. A method, comprising: applying quasi-analytic directional wavelet packet (qWP)-based filtering to image data to extract feature maps, wherein the applying includes applying the qWP-based filtering to training image data to extract training-based feature maps in a training stage, and applying the qWP-based filtering to a new image to extract a new feature map in a classification stage; applying neural network (NN) processing to the extracted new feature map to classify an object in the new image and/or the new image; and outputting the classified object and/or the new image to a user.
2. The method of claim 1, wherein the applying qWP-based filtering to image data is preceded by generating qWPs using discrete or polynomial splines, and by using the generated qWPs to obtain the qWP-based filters.
3. The method of claim 1, wherein the applying of NN processing to the extracted new feature map to classify the object and/or image includes using the training-based feature maps and the new feature map for classification of the object in the new image and/or of the entire new image.
4. The method of claim 1, wherein the method is performed by a processing unit that is a graphics processing unit.
5. The method of claim 1, wherein the method is performed in a vehicle.
6. The method of claim 1, wherein the neural network is a convolutional neural network.
7. A system, comprising: a processing unit configured to apply quasi-analytic directional wavelet packet (qWP)-based filtering to image data to extract feature maps, wherein the configuration includes a configuration to apply the qWP-based filtering to training image data to extract training-based feature maps in a training stage, and to apply the qWP-based filtering to a new image to extract a new feature map in a classification stage; a neural network (NN) classifying engine configured to apply NN processing to the extracted feature maps to classify an object in the new image and/or the new image; and an interface or input/output device for outputting the classified object and/or the new image to a user.
8. The system of claim 7, wherein the processing unit configuration to apply qWP-based filtering to image data to extract feature maps includes a configuration to generate qWPs using discrete or polynomial splines, and to use the generated qWPs to obtain qWP-based filters used in the qWP-based filtering.
9. The system of claim 7, wherein the NN classifying engine configuration to apply NN processing to the extracted feature maps to classify the object and/or the image includes a configuration to use the training-based feature maps and the new feature map to classify the object in the new image and/or the entire new image.
10. The system of claim 7, wherein the processing unit is a graphics processing unit.
11. The system of claim 7, wherein the system is included in a vehicle.
12. The system of claim 7, wherein the neural network is a convolutional neural network.
13. A method, comprising: replacing convolutional layers of a convolutional neural network (CNN) with quasi-analytic directional wavelet packet (qWP)-based filters; using the qWP-based filters to perform filtering and extract features from image data, wherein the using includes applying the qWP-based filtering to training image data to extract training-based feature maps in a training stage, and applying the qWP-based filtering to a new image to extract a new feature map in a classification stage; using fully connected layers of the CNN to perform a classification task using the extracted features to classify an object in the new image and/or the new image; and outputting the classified object and/or the new image to a user.
14. The method of claim 13, wherein the replacing of the convolutional layers of the CNN with qWP-based filters includes replacing all of the convolutional layers of the CNN with the qWP-based filters.
15. The method of claim 13, wherein the replacing of the convolutional layers of the CNN with qWP-based filters is preceded by generating qWPs using discrete or polynomial splines, and by using the generated qWPs to obtain the qWP-based filters.
16. The method of claim 13, wherein the using of fully connected layers of the CNN to perform a classification task using the extracted features includes using the training-based feature maps and the new feature map to perform the classification task.
17. The method of claim 13, wherein the method is performed in a vehicle.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Non-limiting examples of embodiments disclosed herein are described below with reference to figures attached hereto that are listed following this paragraph. Identical structures, elements or parts that appear in more than one figure are generally labeled with a same numeral in all the figures in which they appear. The drawings and descriptions are meant to illuminate and clarify embodiments disclosed herein, and should not be considered limiting in any way. In the drawings:
DETAILED DESCRIPTION
(20) In contrast with the use of a conventional CNN in step 302 of the illustrated flow, the process of step 304 extracts features with qWP-based filters.
(23) Some advantages of using the process of step 304 instead of that of step 202 include: fast classification computations, since neither convolutional layers nor huge data sets are needed; universal waveforms and filters that do not depend on the specific imaging; and construction of features that fit the structure of the image through the use of directional wavelet packet filters.
(24) Following is a detailed, enabling and exemplary way of implementing a method disclosed herein. The section titled Quasi-analytic directional wavelet packets outlines the design of qWPs. The design exemplarily uses the Hilbert transform (HT) of orthonormal spline-based wavelet packets (sWPs) originating from polynomial and discrete splines. Their shapes and spectra serve as building blocks for the design of the qWPs. The design scheme is illustrated by a diagram in the drawings.
(25) The set of the designed complex qWPs is described in the section titled Directionality of real-valued 2D WPs. The qWPs $\Psi_{+}$ consist of two groups $G_{+}=\{\Psi_{++}\}$ and $G_{-}=\{\Psi_{+-}\}$, whose discrete Fourier transform (DFT) spectra form a variety of tilings of the quadrants $q_0$ and $q_1$ of the frequency domain, respectively (see the drawings).
(26) The size of the covering squares decreases as the decomposition level increases. It is explained later how the directional structure of the qWP spectra determines the structure of the respective waveforms (see Eq. 1). It is shown that the waveforms are close to windowed cosines with multiple frequencies, oriented in multiple directions. The magnitude spectra of the directional qWPs, and the qWPs themselves, are displayed in the drawings.
(27) The section titled WP transforms with quasi-analytic WPs presents an exemplary scheme for signal and image transforms using the designed qWPs. An exemplary scheme for the 1D case is illustrated by a diagram in the drawings.
(28) The section titled Extraction of characteristic features introduces the characteristic features to be extracted from images. The set of the transform coefficients of an image I consists of blocks. A block B comprises the correlation coefficients of the image I with shifts of a certain qWP $\vartheta$, which is close to a windowed cosine with a certain frequency (in the spatial domain) oriented in a certain direction. Thus, the coefficients in block B indicate the presence of fragments in the image I that are oriented (approximately) and oscillating (approximately) as the qWP $\vartheta$. The average of the moduli of the coefficients in block B is a measure of the contribution of such fragments to the image I. This average of the moduli is taken as the characteristic feature of image I related to the qWP $\vartheta$. The collection of all such features (the feature map) characterizes the distribution of directions and oscillations within image I. Note that the average of the moduli is just one possible measure; other measures may be used as well.
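As a concrete illustration only, the following is a minimal Python/numpy sketch of the feature described above; the block array is assumed to hold the transform coefficients of one block B, whose computation is not shown here.

    import numpy as np

    def block_feature(block: np.ndarray) -> float:
        """Characteristic feature of one block B: the average of the
        moduli of its transform coefficients (one possible measure)."""
        return float(np.mean(np.abs(block)))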
(29) Quasi-Analytic Directional Wavelet Packets
(30) The library of orthonormal wavelet packets originating from discrete and polynomial splines of multiple spline orders (sWPs) forms the building blocks for the design of directional wavelet packets, as seen in the drawings.
(31) In an example, the design of directional WPs is achieved by the following steps (a sketch of step 1 follows this paragraph): 1. The Hilbert transform is applied to the set $\{\psi\}$ of orthonormal sWPs, thus producing the set $\{H(\psi)\}$ (see the drawings). 2. The spectra of these waveforms are slightly corrected to obtain an orthonormal set $\{\varphi\}$ of complementary WPs (cWPs) that approximately retain the HT relation with $\{\psi\}$. 3. The complex quasi-analytic WPs (qWPs) are defined as $\psi^{\pm} = \psi \pm i\,\varphi$. 4. 2D directional qWPs are derived as tensor products of the 1D qWPs.
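By way of illustration of step 1, the following Python/numpy sketch applies a discrete periodic Hilbert transform to a real waveform via the FFT; the cosine test waveform is an assumption standing in for an actual sWP.

    import numpy as np

    def hilbert_periodic(psi: np.ndarray) -> np.ndarray:
        """Discrete periodic Hilbert transform: phi = H(psi)."""
        N = len(psi)
        spec = np.fft.fft(psi)
        h = np.zeros(N)
        h[1:N // 2] = 1.0       # positive frequencies
        h[N // 2 + 1:] = -1.0   # negative frequencies (DC, Nyquist untouched)
        return np.fft.ifft(-1j * h * spec).real  # multiply by -i*sign(freq)

    # quasi-analytic pair: psi_plus = psi + i * H(psi)
    psi = np.cos(2 * np.pi * 5 * np.arange(64) / 64)
    phi = hilbert_periodic(psi)      # approximately sin(2*pi*5*k/64)
    psi_plus = psi + 1j * phi        # spectrum concentrated on positive freqs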
(32) Notation: there are $2^m$ 1D WPs of each kind (sWPs, cWPs and qWPs) at the m-th decomposition level, which are denoted by $\{\psi_{[m],l}\}$, $\{\varphi_{[m],l}\}$ and $\{\psi^{\pm}_{[m],l} = \psi_{[m],l} \pm i\,\varphi_{[m],l}\}$, respectively, where $l = 0, \ldots, 2^m - 1$. Consequently, there are $2^{2m}$ 2D tensor-product WPs, which are denoted by $\{\psi_{[m],j,l}\}$, $\{\varphi_{[m],j,l}\}$ and $\{\Psi^{+}_{[m],j,l}\}$, respectively, where $j, l = 0, \ldots, 2^m - 1$. Real 2D qWPs are denoted by $\{\vartheta_{[m],j,l} = \mathrm{Re}(\Psi^{+}_{[m],j,l})\}$.
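Under the notation above, a 2D qWP is the tensor (outer) product of two 1D qWPs, and the real directional waveform is its real part. A short sketch follows; the 1D inputs are assumed to be already designed 1D qWPs.

    import numpy as np

    def qwp_2d(psi_plus_j: np.ndarray, psi_plus_l: np.ndarray):
        """Tensor-product 2D qWP and its real directional part."""
        Psi = np.outer(psi_plus_j, psi_plus_l)  # Psi[k, n] = psi_j[k]*psi_l[n]
        theta = Psi.real                        # real 2D directional qWP
        return Psi, theta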
(34) The localized directional structure of the spectra of the qWPs $\{\Psi_{+}\}$ and $\{\Psi_{-}\}$, seen in the drawings, determines the directionality of the corresponding waveforms.
(35) Directionality of Real-Valued 2D WPs
(36) Assume, for example, that $N = 512$, $m = 3$, $j = 2$, $l = 5$. Denote $\Psi[k,n] \triangleq \Psi_{++[3],2,5}[k,n]$ and $\vartheta[k,n] \triangleq \mathrm{Re}(\Psi[k,n])$. The magnitude spectrum $|\hat{\Psi}[\kappa,\nu]|$, displayed in the drawings, is concentrated within the quadrant $q_0$ of the frequency domain.
(39) The spectrum of a 2D signal that comprises only low frequencies in both directions does not have a directionality; in contrast, the 2D signal $\vartheta$ defined above, whose spectrum is concentrated in one pair of symmetric frequency squares, is oscillating in a single direction.
(41) Thus, the shapes of the real qWPs $\{\vartheta_{[m],j,l}\}$ are close to windowed cosines with multiple frequencies (which depend on the distances of the corresponding frequency squares from the origin), oriented in multiple directions ($2(2^{m+1}-1)$ directions at level m; for example, 14 directions at level $m = 2$). The qWPs derived from $\{\Psi_{+}\}$ are generally oriented to the north-east, while those derived from $\{\Psi_{-}\}$ are generally oriented to the north-west (see the drawings).
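For intuition only, the following sketch synthesizes a windowed cosine oriented in a given direction, the kind of waveform the real qWPs are described as approximating; the Gaussian window and all parameters are assumptions, not the actual qWP design.

    import numpy as np

    def windowed_cosine(N=64, freq=8.0, angle_deg=30.0, sigma=0.2):
        """2D windowed cosine oriented at angle_deg (a qWP-like waveform)."""
        t = np.linspace(-0.5, 0.5, N)
        X, Y = np.meshgrid(t, t)
        a = np.deg2rad(angle_deg)
        window = np.exp(-(X**2 + Y**2) / (2 * sigma**2))
        return window * np.cos(2 * np.pi * freq
                               * (X * np.cos(a) + Y * np.sin(a)))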
(42) The magnitude spectra of the real qWPs $\{\vartheta_{+}\}$ and $\{\vartheta_{-}\}$ from the second decomposition level are displayed in the drawings.
(43) WP Transforms with Quasi-Analytic WPs
(44) In an example, the qWP transforms are executed in the frequency domain using the Fast Fourier Transform (FFT). Assume that the signals to be processed belong to the space $\Pi[N]$ of N-periodic discrete-time signals, where $N = 2^j$.
(45) The set $Z_{[m]}$ of the transform coefficients with the qWPs $\{\psi^{\pm}_{[m],\lambda}\}$ from the decomposition level m consists of $2^{m+1}$ wavelet blocks $Z_{\pm[m],\lambda}$, $\lambda = 0, \ldots, 2^m - 1$, where

$z_{\pm[m],\lambda}[k] = \langle x, \psi^{\pm}_{[m],\lambda}[\cdot - 2^m k] \rangle = \sum_{l=0}^{N-1} \psi^{\pm *}_{[m],\lambda}[l - 2^m k]\, x[l] = y_{[m],\lambda}[k] \mp i\, c_{[m],\lambda}[k],$

$y_{[m],\lambda}[k] = \langle x, \psi_{[m],\lambda}[\cdot - 2^m k] \rangle, \qquad c_{[m],\lambda}[k] = \langle x, \varphi_{[m],\lambda}[\cdot - 2^m k] \rangle. \qquad (2)$

The transforms are implemented in a multiresolution mode by multirate filtering. The structure of the filter bank for the transform of a signal x to the first decomposition level differs from the structure of the filter banks for the subsequent levels. Define the filters $P_{\lambda}$, $F_{\lambda}$, $Q^{\pm}_{\lambda}$ by their impulse responses

$p_{\lambda}[k] = \psi_{[1],\lambda}[k], \quad f_{\lambda}[k] = \varphi_{[1],\lambda}[k], \quad q^{\pm}_{\lambda}[k] = p_{\lambda}[k] \pm i\, f_{\lambda}[k], \quad k = 0, \ldots, N-1, \ \lambda = 0, 1.$

The four blocks $Z_{+[1],0}$, $Z_{+[1],1}$, $Z_{-[1],0}$, $Z_{-[1],1}$ of the first-level transform coefficients are derived by filtering the signal x followed by down-sampling, such as

$z_{\pm[1],\lambda}[k] = \sum_{l=0}^{N-1} q^{\pm}_{\lambda}[l - 2k]\, x[l], \quad \lambda = 0, 1.$

(46) The frequency responses of the filters $P_{\lambda}$ are $\hat{p}_{\lambda}[n] = \sum_{k=0}^{N-1} e^{-2\pi i k n / N}\, p_{\lambda}[k] = \hat{\psi}_{[1],\lambda}[n]$, $n = 0, \ldots, N-1$, $\lambda = 0, 1$. The filters $P_{[m],\lambda}$, $m = 1, \ldots, M$, $\lambda = 0, 1$, for the transforms from the first to the subsequent decomposition levels are defined via their frequency responses $\hat{p}_{[m],\lambda}[n] = \hat{p}_{\lambda}[2^m n]$, $m = 1, \ldots, M$, $\lambda = 0, 1$. Thus, the qWP transform from the first to the second decomposition level is

$z_{[2],0}[k] = \sum_{l=0}^{N/2-1} p_{[1],0}[l - 4k]\, z_{[1],0}[l], \qquad z_{[2],1}[k] = \sum_{l=0}^{N/2-1} p_{[1],1}[l - 4k]\, z_{[1],0}[l],$

$z_{[2],2}[k] = \sum_{l=0}^{N/2-1} p_{[1],1}[l - 4k]\, z_{[1],1}[l], \qquad z_{[2],3}[k] = \sum_{l=0}^{N/2-1} p_{[1],0}[l - 4k]\, z_{[1],1}[l].$

The transforms to the subsequent decomposition levels are executed similarly, using the filters $P_{[m],\lambda}$, $m = 2, \ldots, M$, $\lambda = 0, 1$. The diagram in the drawings illustrates the transform scheme.
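A minimal numpy sketch of one analysis step under the scheme above follows: circular correlation with a complex filter, executed in the frequency domain, then down-sampling by 2 (per Eq. 2, the waveform enters conjugated). The toy filter is an assumption; actual spline-based filters $q^{\pm} = p \pm i f$ are designed as described earlier.

    import numpy as np

    def analysis_step(x: np.ndarray, q: np.ndarray) -> np.ndarray:
        """z[k] = sum_l conj(q[l - 2k]) * x[l] for an N-periodic signal x,
        computed via the FFT, then down-sampled by 2."""
        corr = np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(q)))
        return corr[::2]

    N = 64
    x = np.random.default_rng(0).standard_normal(N)
    q0 = np.exp(2j * np.pi * 3 * np.arange(N) / N) / N  # toy stand-in filter
    z_plus = analysis_step(x, q0)             # one first-level '+' block
    z_minus = analysis_step(x, np.conj(q0))   # the matching '-' block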
(47) Extraction of characteristic features (feature maps): the connection between universal waveforms and the wavelet packet coefficients generated by the universal filters $P_{\lambda}$, $F_{\lambda}$, $Q^{\pm}_{\lambda}$.
(48) The qWP transform coefficients are the inner products of an image X with the 2D complex qWPs:
(49) $z_{[m],j,l}[k,n] = \langle X, \Psi^{+}_{[m],j,l}[\cdot - 2^m k, \cdot - 2^m n] \rangle = y_{[m],j,l}[k,n] - i\, c_{[m],j,l}[k,n]. \qquad (3)$
(50) Consequently, the coefficients $y_{[m],j,l}[k,n] = \mathrm{Re}(z_{[m],j,l}[k,n])$ are the correlation coefficients of the image X with the real directional qWPs $\vartheta_{[m],j,l}[\cdot - 2^m k, \cdot - 2^m n] = \mathrm{Re}(\Psi^{+}_{[m],j,l}[\cdot - 2^m k, \cdot - 2^m n])$. As mentioned, the qWPs $\{\vartheta_{[2],j,l}\}$, $j, l = 0, 1, 2, 3$, from the second decomposition level are displayed in the drawings.
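A hedged 2D counterpart (a sketch, not the patent's exact implementation): the coefficients of Eq. 3 as circular correlations of the image with a complex 2D qWP, computed with the 2D FFT and sampled on the $2^m$ grid.

    import numpy as np

    def qwp_coefficients(X: np.ndarray, Psi: np.ndarray, m: int):
        """z[k, n] = <X, Psi[. - 2^m k, . - 2^m n]> via the 2D FFT;
        y = Re(z) are the correlations with the real qWP theta."""
        corr = np.fft.ifft2(np.fft.fft2(X) * np.conj(np.fft.fft2(Psi)))
        step = 2 ** m
        z = corr[::step, ::step]
        return z, z.real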
(51) Denote by $B_{+[m],j,l} \triangleq \{y_{+[m],j,l}[k,n]\}_{k,n=0}^{N/2^m-1}$ the block of the level-m transform coefficients related to the qWP $\vartheta_{+[m],j,l}$.
(52) The characteristic feature of an image related to this qWP is the average of the moduli of the coefficients in the block:
(53) $F_{+[m],j,l} \triangleq \left(\frac{2^m}{N}\right)^{2} \sum_{k,n=0}^{N/2^m-1} \left| y_{+[m],j,l}[k,n] \right|. \qquad (4)$
(55) Denote by $F_{+[m]}$ the level-m feature map (FM), which is the 2D array $F_{+[m]} \triangleq \{F_{+[m],j,l}\}_{j,l=0}^{2^m-1}$.
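A short sketch assembling the level-m FM of Eq. 4 from the coefficient blocks; the dict-of-blocks input format is an assumption for illustration.

    import numpy as np

    def feature_map(blocks: dict, m: int) -> np.ndarray:
        """blocks maps (j, l) to the coefficient block y_[m],j,l of Eq. 3;
        each FM entry is the average of moduli per Eq. 4."""
        fm = np.zeros((2 ** m, 2 ** m))
        for (j, l), y in blocks.items():
            fm[j, l] = np.mean(np.abs(y))
        return fm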
(56) Each qWP transform coefficient of an image I has a certain physical meaning: it evaluates the presence of an event with a certain direction, oscillating with a certain frequency, in a certain patch of the image. The FMs defined in Eq. 4 inherit these properties, except for the localization in the image. Utilizing these physical meanings of the transform coefficients makes it possible to design a variety of feature extraction schemes that can be optimized for different classes of problems. It is worth mentioning the variability of the qWP libraries (the choice between the generating polynomial and discrete splines, and the selection of the spline order). There is an option to extend the FMs by using the imaginary parts $c_{[m],j,l}[k,n]$ together with the real parts $y_{[m],j,l}[k,n]$ of the transform coefficients (see Eqs. 3 and 4). Various pooling methods are possible.
(59) The number of partitions of each image into blocks at each level and the number of decomposition levels are free parameters, and they are determined in the training stage.
(60) Several computational components of the processing flow are described below with reference to the computer system.
(61) Example of Results
(62) We have conducted several experiments in order to test and demonstrate the feasibility of the qWP-based feature extraction methods for image classification with a modified DCNN. The feasibility tests were carried out on the MNIST database.
(63) We use the library of directional spline-based qWPs, whose design is described above, to show that the extracted features (feature maps) can substantially reduce the size of the training data and thus speed up the classification.
(64) The results presented below confirm that directional wavelet packets originating from splines have the potential to be a constructive tool for high-quality feature extraction from images. We show here that these features can serve as a substitute for several, or even all, of the convolutional layers in a DCNN architecture. The extracted features have adaptation capabilities to the image activities. Classification becomes significantly faster and more efficient because fewer images are needed and no (or very few) convolutional layers are used.
(65) MNIST Database
(66) The feasibility of achieving classification with a small training dataset, while evaluating the performance of the qWPs, was tested on features extracted from the MNIST database of handwritten digits. Sets $S_{5000}$, $S_{4000}$, $S_{3000}$, $S_{2000}$, $S_{1500}$, $S_{1000}$ and $S_{500}$, comprising respectively 5000, 4000, 3000, 2000, 1500, 1000 and 500 MNIST images of size 28×28, were taken as the reference data (RD); a separate set $T_{5000}$ of 5000 images that did not belong to the RD served as the test set. Each image was padded by zeroes to an expanded size of 64×64, see the drawings.
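A minimal sketch of the zero-padding step (the exact placement of the 28×28 digit within the 64×64 frame is an assumption):

    import numpy as np

    def pad_to_64(img28: np.ndarray) -> np.ndarray:
        """Zero-pad a 28x28 MNIST image to 64x64 (digit centered)."""
        out = np.zeros((64, 64), dtype=img28.dtype)
        r = (64 - 28) // 2  # 18 rows/columns of zeros on each side
        out[r:r + 28, r:r + 28] = img28
        return out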
(67) As decision units, simple non-convolutional NNs $N_M$ trained on the 7 sets of FMs were used. The trained NNs were used for classification of the images from set $T_{5000}$. Each NN comprised an input layer consisting of one long short-term memory (LSTM) layer (LSTM is an artificial recurrent neural network (RNN) architecture used in the field of machine learning and deep learning), followed by two to five fully connected layers, with a softmax function used as the activation function in the output layer. No convolutional layer was used, and the neural network was used without any optimization.
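A hedged PyTorch sketch of such a decision unit follows: one LSTM input layer, a few fully connected layers, and a softmax output. The layer sizes, the number of FC layers, and the treatment of a feature map as a sequence of rows are all assumptions; the text above does not fix them.

    import torch
    import torch.nn as nn

    class FMClassifier(nn.Module):
        """LSTM input layer + fully connected layers + softmax output."""
        def __init__(self, fm_size=8, hidden=128, n_classes=10, n_fc=3):
            super().__init__()
            self.lstm = nn.LSTM(input_size=fm_size, hidden_size=hidden,
                                batch_first=True)
            layers = []
            for _ in range(n_fc - 1):
                layers += [nn.Linear(hidden, hidden), nn.ReLU()]
            layers.append(nn.Linear(hidden, n_classes))
            self.fc = nn.Sequential(*layers)

        def forward(self, fm):                # fm: (batch, rows, cols)
            _, (h, _) = self.lstm(fm)         # final hidden state
            return torch.softmax(self.fc(h[-1]), dim=1)

    model = FMClassifier()
    probs = model(torch.randn(4, 8, 8))       # four 8x8 FMs -> (4, 10)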
(68) The qWP transform coefficients are the inner products of the MNIST data with the 2D complex qWPs per Eq. 3, i.e., a 2D convolution of the image with $\Psi$; the real part of the convolution output is used in Eq. 4 for deriving the feature maps.
(69) The classification results for set $T_{5000}$ are given in Table 1:
(70) TABLE 1

    NN                N_5000  N_4000  N_3000  N_2000  N_1500  N_1000  N_500
    Classification %  97.72   97.12   96.78   95.78   94.76   93.08   90.16
(71) The classification results on the MNIST database in Table 1 are compared with the results reported in http://yann.lecun.com/exdb/mnist (Yann LeCun, Corinna Cortes, Christopher J. C. Burges, The MNIST database of handwritten digits), where a conventional state-of-the-art CNN was used for feature extraction and classification. Their training set included 60000 datapoints and their test set included 10000 datapoints, in sharp contrast with our training sets of 3000 datapoints (images) on average and a test set of 5000 datapoints. While the classification accuracies of the two methods are comparable, the conventional CNN of LeCun used 20 times more training images than those used to generate Table 1. That is, a modified CNN disclosed herein achieves accuracy similar to that of a conventional CNN, but needs only 1/20th of the number of training images. A better neural network than the elementary (non-optimized) one used to achieve the results in Table 1 will probably require even fewer training images. This reduces the computation time and memory consumption of a processor in a system running the method.
(72) A method described above may be performed, for example, in a computer system 1600 described with reference to the drawings.
(73) Computer system 1600 may receive inputs from a variety of data sources 1610. The inputs may be received through an interface or input/output (I/O) device (not shown). Non-limiting examples of data sources 1610 may include (in addition to the MNIST database above): Waymo (https://waymo.com), formerly the Google self-driving car project; classification of X-ray images of osteoarthrosis knees, https://doi.org/10.6084/m9.figshare.8139545.v1, 2019; and CheXpert (Stanford Hospital), 224,316 chest radiographs from 65,240 patients, labeled for the presence/absence of 14 pathologies. Data input to classifying engine 1602 may be of versatile structure and formats, and its volume and span (the number of parameters) may be theoretically unlimited. Computer system 1600 may also be referred to as a modified CNN system.
(74) In an exemplary use case, in an initial (training) processing stage (or phase), splines and qWPs are loaded into memory section 1608A. The splines and the qWPs are processed by CPU 1604 to generate waveforms and qWP-based filters 1612. The waveforms and qWP-based filters are generated independently of the imaging data. In the training stage, the input data is loaded and stored into memory section 1608B. The input data is processed in CPU 1604 using waveforms and qWP-based filters 1612. In addition and/or optionally, some of the processing above may be carried out in GPU 1606. The CPU-GPU based processing generates a training-data based feature map 1614 for each input image. The training data-based feature maps are input to neural network classifying engine 1602. This completes the training stage.
(75) In a second, classification stage, a newly arrived image, which did not participate in the training stage, is processed in CPU 1604 (or in addition and/or optionally in GPU 1606) using the waveforms and qWP-based filters 1612 generated in the training stage, to generate a new feature map 1614. NN classifying engine 1602 uses the training-based feature maps and the new feature map for classification of at least one object in the new image and/or of the entire new image to output a classified object/image 1616 to a user. The output may be done through an interface or I/O device (not shown). The user may be for example a driver of a vehicle, a controller of an autonomous vehicle, a physician or surgeon (for medical image data), etc.
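The two stages can be summarized by the following high-level sketch; every function name in it is a hypothetical stand-in for the components described above (filter generation from splines, Eqs. 3-4, and the NN classifying engine), not an API defined by this disclosure.

    import numpy as np

    def design_qwp_filters(n_filters=16, size=64):
        """Stand-in for the data-independent filter generation from splines."""
        rng = np.random.default_rng(1)
        return [rng.standard_normal((size, size)) for _ in range(n_filters)]

    def extract_feature_map(image, filters):
        """Stand-in for Eqs. 3-4: correlate, take real part, average moduli."""
        y = [np.fft.ifft2(np.fft.fft2(image) * np.conj(np.fft.fft2(P))).real
             for P in filters]
        return np.array([np.mean(np.abs(b)) for b in y])

    filters = design_qwp_filters()                  # generated before training
    train_fms = [extract_feature_map(img, filters)  # training stage
                 for img in np.random.rand(10, 64, 64)]
    # ... train the NN classifying engine on train_fms and labels ...
    new_fm = extract_feature_map(np.random.rand(64, 64), filters)
    # ... classify new_fm with the trained engine; output the result ...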
(76) In an exemplary application, a computer system like system 1600 running a method disclosed herein may be incorporated in a vehicle (see the drawings), for example for classifying objects in images acquired by the vehicle's cameras.
(77) In conclusion, the proposed replacement of the convolutional layers in a DCNN architecture with directional wavelet packets, applied to small data sizes, leads to a construction that extracts features with a performance that a fully equipped DCNN achieves only with much bigger datasets and a larger number of convolutional layers.
(78) Some stages of the aforementioned methods may also be implemented in a computer program for running on a computer system, at least including code portions for performing steps of the relevant method when run on a programmable apparatus, such as a computer system, or enabling a programmable apparatus to perform functions of a device or system according to the disclosure.
(79) A computer program is a list of instructions such as a particular application program and/or an operating system. The computer program may for instance include one or more of: a subroutine, a function, a procedure, a method, an implementation, an executable application, an applet, a servlet, a source code, code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system.
(80) The computer program may be stored internally on a non-transitory computer readable medium. All or some of the computer program may be provided on computer readable media permanently, removably or remotely coupled to an information processing system. The computer readable media may include, for example and without limitation, any number of the following: magnetic storage media including disk and tape storage media; optical storage media such as compact disk media (e.g., CD-ROM, CD-R, etc.) and digital video disk storage media; nonvolatile memory storage media including semiconductor-based memory units such as FLASH memory, EEPROM, EPROM, ROM; ferromagnetic digital memories; MRAM; volatile storage media including registers, buffers or caches, main memory, RAM, etc.
(81) A computer process typically includes an executing (running) program or portion of a program, current program values and state information, and the resources used by the operating system to manage the execution of the process. An operating system (OS) is the software that manages the sharing of the resources of a computer and provides programmers with an interface used to access those resources. An operating system processes system data and user input, and responds by allocating and managing tasks and internal system resources as a service to users and programs of the system.
(82) The computer system may for instance include at least one processing unit, associated memory and a number of input/output (I/O) devices. When executing the computer program, the computer system processes information according to the computer program and produces resultant output information via I/O devices.
(83) The connections discussed herein may be any type of connection suitable to transfer signals from or to the respective nodes, units or devices, for example via intermediate devices. Accordingly, unless implied or stated otherwise, the connections may for example be direct connections or indirect connections. The connections may be illustrated or described in reference to being a single connection, a plurality of connections, unidirectional connections, or bidirectional connections. However, different embodiments may vary the implementation of the connections. For example, separate unidirectional connections may be used rather than bidirectional connections and vice versa. Also, a plurality of connections may be replaced with a single connection that transfers multiple signals serially or in a time multiplexed manner. Likewise, single connections carrying multiple signals may be separated out into various different connections carrying subsets of these signals. Therefore, many options exist for transferring signals.
(84) Optionally, the illustrated examples may be implemented as circuitry located on a single integrated circuit or within a same device. Alternatively, the examples may be implemented as any number of separate integrated circuits or separate devices interconnected with each other in a suitable manner. Optionally, suitable parts of the methods may be implemented as software or code representations of physical circuitry, or of logical representations convertible into physical circuitry, such as in a hardware description language of any appropriate type.
(85) Other modifications, variations and alternatives are also possible. The specifications and drawings are, accordingly, to be regarded in an illustrative rather than in a restrictive sense. While certain features of the disclosure have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those of ordinary skill in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the disclosure. It will be appreciated that the embodiments described above are cited by way of example, and various features thereof and combinations of these features can be varied and modified. While various embodiments have been shown and described, it will be understood that there is no intent to limit the disclosure by such disclosure, but rather, it is intended to cover all modifications and alternate constructions falling within the scope of the disclosure, as defined in the appended claims.
(86) Unless otherwise stated, the use of the expression "and/or" between the last two members of a list of options for selection indicates that a selection of one or more of the listed options is appropriate and may be made.
(87) It should be understood that where the claims or specification refer to "a" or "an" element, such reference is not to be construed as there being only one of that element.
(88) All references mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual reference was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present disclosure.