Feature extraction by directional wavelet packets for image processing by neural networks
12288374 · 2025-04-29
CPC classification: G06V10/44 (PHYSICS)
International classification: G06V10/00 (PHYSICS)
Abstract
Methods and systems that replace convolutional layers of a convolutional neural network (CNN) with quasi-analytic directional wavelet packet (qWP)-based filters, and which use the qWP-based filters to perform filtering and extract features from image data. The extracted features are then used by the CNN to perform a classification task. The results of the classification task are output to a user.
Claims
1. A method, comprising: applying quasi-analytic directional wavelet packet (qWP)-based filtering to image data to extract feature maps, wherein the applying includes applying the qWP-based filtering to training image data to extract training-based feature maps in a training stage, and applying the qWP-based filtering to a new image to extract a new feature map in a classification stage; applying neural network (NN) processing to the extracted new feature map to classify an object in the new image and/or the new image; and outputting the classified object and/or the new image to a user.
2. The method of claim 1, wherein the applying qWP-based filtering to image data is preceded by generating qWPs using discrete or polynomial splines, and by using the generated qWPs to obtain the qWP-based filters.
3. The method of claim 1, wherein the applying of NN processing to the extracted new feature map to classify the object and/or image includes using the training-based feature maps and the new feature map for classification of the object in the new image and/or of the entire new image.
4. The method of claim 1, wherein the method is performed by a processing unit that is a graphics processing unit.
5. The method of claim 1, wherein the method is performed in a vehicle.
6. The method of claim 1, wherein the neural network is a convolutional neural network.
7. A system, comprising: a processing unit configured to apply quasi-analytic directional wavelet packet (qWP)-based filtering to image data to extract feature maps, wherein the configuration includes a configuration to apply the qWP-based filtering to training image data to extract training-based feature maps in a training stage, and to apply the qWP-based filtering to a new image to extract a new feature map in a classification stage; a neural network (NN) classifying engine configured to apply NN processing to the extracted feature maps to classify an object in the new image and/or the new image; and an interface or input/output device for outputting the classified object and/or the new image to a user.
8. The system of claim 7, wherein the processing unit configuration to apply qWP-based filtering to image data to extract feature maps includes a configuration to generate qWPs using discrete or polynomial splines, and to use the generated qWPs to obtain qWP-based filters used in the qWP-based filtering.
9. The system of claim 7, wherein the NN classifying engine configuration to apply NN processing to the extracted feature maps to classify the object and/or the image includes a configuration to use the training-based feature maps and the new feature map to classify the object in the new image and/or the entire new image.
10. The system of claim 7, wherein the processing unit is a graphics processing unit.
11. The system of claim 7, wherein the system is included in a vehicle.
12. The system of claim 7, wherein the neural network is a convolutional neural network.
13. A method, comprising: replacing convolutional layers of a convolutional neural network (CNN) with quasi-analytic directional wavelet packet (qWP)-based filters; using the qWP-based filters to perform filtering and extract features from image data, wherein the using includes applying the qWP-based filtering to training image data to extract training-based feature maps in a training stage, and applying the qWP-based filtering to a new image to extract a new feature map in a classification stage; using fully connected layers of the CNN to perform a classification task using the extracted features to classify an object in the new image and/or the new image; and outputting the classified object and/or the new image to a user.
14. The method of claim 13, wherein the replacing of the convolutional layers of the CNN with qWP-based filters includes replacing all of the convolutional layers of the CNN with the qWP-based filters.
15. The method of claim 13, wherein the replacing of the convolutional layers of the CNN with qWP-based filters is preceded by generating qWPs using discrete or polynomial splines, and by using the generated qWPs to obtain the qWP-based filters.
16. The method of claim 13, wherein the using of fully connected layers of the CNN to perform a classification task using the extracted features includes using the training-based feature maps and the new feature map to perform the classification task.
17. The method of claim 13, wherein the method is performed in a vehicle.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Non-limiting examples of embodiments disclosed herein are described below with reference to figures attached hereto that are listed following this paragraph. Identical structures, elements or parts that appear in more than one figure are generally labeled with a same numeral in all the figures in which they appear. The drawings and descriptions are meant to illuminate and clarify embodiments disclosed herein, and should not be considered limiting in any way. In the drawings:
DETAILED DESCRIPTION
(20) In contrast with the use of a conventional CNN in step 302 of the illustrated flow, the process of step 304 extracts features with qWP-based filters.
(23) Some advantages of using the process of step 304 instead of that of step 202 include: fast classification computations, since neither convolutional layers nor huge data sets are needed; universal waveforms and filters that do not depend on the specific imaging; and construction of features that fit the structure of the image through the use of directional wavelet packet filters.
(24) Following is a detailed, enabling and exemplary way of implementing a method disclosed herein. The section titled Quasi-analytic directional wavelet packets outlines the design of qWPs. The design exemplarily uses the Hilbert transform (HT) of orthonormal spline-based wavelet packets (sWPs) originating from polynomial and discrete splines. Their shapes and spectra serve as building blocks for the design of the qWPs. The design scheme is illustrated by a diagram in the drawings.
(25) The set of the designed complex qWPs is described in the section titled Directionality of real-valued 2D WPs. The qWPs $\Psi_{+}$ consist of two groups $G_{+}=\{\Psi_{++}\}$ and $G_{-}=\{\Psi_{+-}\}$, whose discrete Fourier transform (DFT) spectra form a variety of tilings of the quadrants $q_0$ and $q_1$ of the frequency domain, respectively (see the drawings).
(26) The size of the covering squares decreases as the decomposition level increases. It is explained later how the directional structure of the qWP spectra determines the structure of the respective waveforms (see Eq. 1). It is shown that the waveforms are close to windowed cosines with multiple frequencies, oriented in multiple directions. The magnitude spectra of the directional qWPs, and the qWPs themselves, are displayed in the drawings.
(27) The section titled WP transforms with quasi-analytic WPs presents an exemplary scheme for signal and image transforms using the designed qWPs. An exemplary scheme for the 1D case is illustrated by a diagram in the drawings.
(28) The section titled Extraction of characteristic features introduces the characteristic features to be extracted from images. The set of the transform coefficients of an image I consists of blocks. A block B comprises the correlation coefficients of the image I with shifts of a certain qWP $\vartheta$, which is close to a windowed cosine with a certain frequency (in the spatial domain) oriented in a certain direction. Thus, the coefficients in block B indicate the presence of fragments in the image I that are oriented (approximately) and oscillating (approximately) as the qWP $\vartheta$. The average of the moduli of the coefficients in block B is a measure of the contribution of such fragments to the image I. This average of the moduli is taken as the characteristic feature of image I related to the qWP $\vartheta$. The collection of all such features (the feature map) characterizes the distribution of directions and oscillations within image I. Note that the average of the moduli is just one possible measure; other measures may be used as well.
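As a concrete illustration only, the following is a minimal Python/numpy sketch of the feature described above; the block array is assumed to hold the transform coefficients of one block B, whose computation is not shown here.

    import numpy as np

    def block_feature(block: np.ndarray) -> float:
        """Characteristic feature of one block B: the average of the
        moduli of its transform coefficients (one possible measure)."""
        return float(np.mean(np.abs(block)))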
(29) Quasi-Analytic Directional Wavelet Packets
(30) The library of orthonormal wavelet packets originating from discrete and polynomial splines of multiple spline orders (sWPs) forms the building blocks for the design of directional wavelet packets, as seen in the drawings.
(31) In an example, the design of directional WPs is achieved by the following steps (a sketch of step 1 follows this paragraph): 1. The Hilbert transform is applied to the set $\{\psi\}$ of orthonormal sWPs, thus producing the set $\{H(\psi)\}$ (see the drawings). 2. The spectra of these waveforms are slightly corrected to obtain an orthonormal set $\{\varphi\}$ of complementary WPs (cWPs) that approximately retain the HT relation with $\{\psi\}$. 3. The complex quasi-analytic WPs (qWPs) are defined as $\psi^{\pm} = \psi \pm i\,\varphi$. 4. 2D directional qWPs are derived as tensor products of the 1D qWPs.
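By way of illustration of step 1, the following Python/numpy sketch applies a discrete periodic Hilbert transform to a real waveform via the FFT; the cosine test waveform is an assumption standing in for an actual sWP.

    import numpy as np

    def hilbert_periodic(psi: np.ndarray) -> np.ndarray:
        """Discrete periodic Hilbert transform: phi = H(psi)."""
        N = len(psi)
        spec = np.fft.fft(psi)
        h = np.zeros(N)
        h[1:N // 2] = 1.0       # positive frequencies
        h[N // 2 + 1:] = -1.0   # negative frequencies (DC, Nyquist untouched)
        return np.fft.ifft(-1j * h * spec).real  # multiply by -i*sign(freq)

    # quasi-analytic pair: psi_plus = psi + i * H(psi)
    psi = np.cos(2 * np.pi * 5 * np.arange(64) / 64)
    phi = hilbert_periodic(psi)      # approximately sin(2*pi*5*k/64)
    psi_plus = psi + 1j * phi        # spectrum concentrated on positive freqs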
(32) Notation: there are $2^m$ 1D WPs of each kind (sWPs, cWPs and qWPs) at the m-th decomposition level, which are denoted by $\{\psi_{[m],l}\}$, $\{\varphi_{[m],l}\}$ and $\{\psi^{\pm}_{[m],l} = \psi_{[m],l} \pm i\,\varphi_{[m],l}\}$, respectively, where $l = 0, \ldots, 2^m - 1$. Consequently, there are $2^{2m}$ 2D tensor-product WPs, which are denoted by $\{\psi_{[m],j,l}\}$, $\{\varphi_{[m],j,l}\}$ and $\{\Psi^{+}_{[m],j,l}\}$, respectively, where $j, l = 0, \ldots, 2^m - 1$. Real 2D qWPs are denoted by $\{\vartheta_{[m],j,l} = \mathrm{Re}(\Psi^{+}_{[m],j,l})\}$.
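Under the notation above, a 2D qWP is the tensor (outer) product of two 1D qWPs, and the real directional waveform is its real part. A short sketch follows; the 1D inputs are assumed to be already designed 1D qWPs.

    import numpy as np

    def qwp_2d(psi_plus_j: np.ndarray, psi_plus_l: np.ndarray):
        """Tensor-product 2D qWP and its real directional part."""
        Psi = np.outer(psi_plus_j, psi_plus_l)  # Psi[k, n] = psi_j[k]*psi_l[n]
        theta = Psi.real                        # real 2D directional qWP
        return Psi, theta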
(34) The localized directional structure of the spectra of the qWPs $\{\Psi_{+}\}$ and $\{\Psi_{-}\}$, seen in the drawings, determines the directionality of the corresponding waveforms.
(35) Directionality of Real-Valued 2D WPs
(36) Assume, for example, that $N = 512$, $m = 3$, $j = 2$, $l = 5$. Denote $\Psi[k,n] \triangleq \Psi_{++[3],2,5}[k,n]$ and $\vartheta[k,n] \triangleq \mathrm{Re}(\Psi[k,n])$. The magnitude spectrum $|\hat{\Psi}[\kappa,\nu]|$, displayed in the drawings, is concentrated within the quadrant $q_0$ of the frequency domain.
(39) The spectrum of a 2D signal that comprises only low frequencies in both directions does not have a directionality; in contrast, the 2D signal $\vartheta$ defined above, whose spectrum is concentrated in one pair of symmetric frequency squares, is oscillating in a single direction.
(41) Thus, the shapes of the real qWPs $\{\vartheta_{[m],j,l}\}$ are close to windowed cosines with multiple frequencies (which depend on the distances of the corresponding frequency squares from the origin), oriented in multiple directions ($2(2^{m+1}-1)$ directions at level m; for example, 14 directions at level $m = 2$). The qWPs derived from $\{\Psi_{+}\}$ are generally oriented to the north-east, while those derived from $\{\Psi_{-}\}$ are generally oriented to the north-west (see the drawings).
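For intuition only, the following sketch synthesizes a windowed cosine oriented in a given direction, the kind of waveform the real qWPs are described as approximating; the Gaussian window and all parameters are assumptions, not the actual qWP design.

    import numpy as np

    def windowed_cosine(N=64, freq=8.0, angle_deg=30.0, sigma=0.2):
        """2D windowed cosine oriented at angle_deg (a qWP-like waveform)."""
        t = np.linspace(-0.5, 0.5, N)
        X, Y = np.meshgrid(t, t)
        a = np.deg2rad(angle_deg)
        window = np.exp(-(X**2 + Y**2) / (2 * sigma**2))
        return window * np.cos(2 * np.pi * freq
                               * (X * np.cos(a) + Y * np.sin(a)))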
(42) The magnitude spectra of the real qWPs $\{\vartheta_{+}\}$ and $\{\vartheta_{-}\}$ from the second decomposition level are displayed in the drawings.
(43) WP Transforms with Quasi-Analytic WPs
(44) In an example, the qWP transforms are executed in the frequency domain using the Fast Fourier Transform (FFT). Assume that the signals to be processed belong to the space $\Pi[N]$ of N-periodic discrete-time signals, where $N = 2^j$.
(45) The set $Z_{[m]}$ of the transform coefficients with the qWPs $\{\psi^{\pm}_{[m],\lambda}\}$ from the decomposition level m consists of $2^{m+1}$ wavelet blocks $Z_{\pm[m],\lambda}$, $\lambda = 0, \ldots, 2^m - 1$, where

$z_{\pm[m],\lambda}[k] = \langle x, \psi^{\pm}_{[m],\lambda}[\cdot - 2^m k] \rangle = \sum_{l=0}^{N-1} \psi^{\pm *}_{[m],\lambda}[l - 2^m k]\, x[l] = y_{[m],\lambda}[k] \mp i\, c_{[m],\lambda}[k],$

$y_{[m],\lambda}[k] = \langle x, \psi_{[m],\lambda}[\cdot - 2^m k] \rangle, \qquad c_{[m],\lambda}[k] = \langle x, \varphi_{[m],\lambda}[\cdot - 2^m k] \rangle. \qquad (2)$

The transforms are implemented in a multiresolution mode by multirate filtering. The structure of the filter bank for the transform of a signal x to the first decomposition level differs from the structure of the filter banks for the subsequent levels. Define the filters $P_{\lambda}$, $F_{\lambda}$, $Q^{\pm}_{\lambda}$ by their impulse responses

$p_{\lambda}[k] = \psi_{[1],\lambda}[k], \quad f_{\lambda}[k] = \varphi_{[1],\lambda}[k], \quad q^{\pm}_{\lambda}[k] = p_{\lambda}[k] \pm i\, f_{\lambda}[k], \quad k = 0, \ldots, N-1, \ \lambda = 0, 1.$

The four blocks $Z_{+[1],0}$, $Z_{+[1],1}$, $Z_{-[1],0}$, $Z_{-[1],1}$ of the first-level transform coefficients are derived by filtering the signal x followed by down-sampling, such as

$z_{\pm[1],\lambda}[k] = \sum_{l=0}^{N-1} q^{\pm}_{\lambda}[l - 2k]\, x[l], \quad \lambda = 0, 1.$

(46) The frequency responses of the filters $P_{\lambda}$ are $\hat{p}_{\lambda}[n] = \sum_{k=0}^{N-1} e^{-2\pi i k n / N}\, p_{\lambda}[k] = \hat{\psi}_{[1],\lambda}[n]$, $n = 0, \ldots, N-1$, $\lambda = 0, 1$. The filters $P_{[m],\lambda}$, $m = 1, \ldots, M$, $\lambda = 0, 1$, for the transforms from the first to the subsequent decomposition levels are defined via their frequency responses $\hat{p}_{[m],\lambda}[n] = \hat{p}_{\lambda}[2^m n]$, $m = 1, \ldots, M$, $\lambda = 0, 1$. Thus, the qWP transform from the first to the second decomposition level is

$z_{[2],0}[k] = \sum_{l=0}^{N/2-1} p_{[1],0}[l - 4k]\, z_{[1],0}[l], \qquad z_{[2],1}[k] = \sum_{l=0}^{N/2-1} p_{[1],1}[l - 4k]\, z_{[1],0}[l],$

$z_{[2],2}[k] = \sum_{l=0}^{N/2-1} p_{[1],1}[l - 4k]\, z_{[1],1}[l], \qquad z_{[2],3}[k] = \sum_{l=0}^{N/2-1} p_{[1],0}[l - 4k]\, z_{[1],1}[l].$

The transforms to the subsequent decomposition levels are executed similarly, using the filters $P_{[m],\lambda}$, $m = 2, \ldots, M$, $\lambda = 0, 1$. The diagram in the drawings illustrates the transform scheme.
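A minimal numpy sketch of one analysis step under the scheme above follows: circular correlation with a complex filter, executed in the frequency domain, then down-sampling by 2 (per Eq. 2, the waveform enters conjugated). The toy filter is an assumption; actual spline-based filters $q^{\pm} = p \pm i f$ are designed as described earlier.

    import numpy as np

    def analysis_step(x: np.ndarray, q: np.ndarray) -> np.ndarray:
        """z[k] = sum_l conj(q[l - 2k]) * x[l] for an N-periodic signal x,
        computed via the FFT, then down-sampled by 2."""
        corr = np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(q)))
        return corr[::2]

    N = 64
    x = np.random.default_rng(0).standard_normal(N)
    q0 = np.exp(2j * np.pi * 3 * np.arange(N) / N) / N  # toy stand-in filter
    z_plus = analysis_step(x, q0)             # one first-level '+' block
    z_minus = analysis_step(x, np.conj(q0))   # the matching '-' block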
(47) Extraction of characteristic features (feature maps): the connection between universal waveforms and the wavelet packet coefficients generated by the universal filters $P_{\lambda}$, $F_{\lambda}$, $Q^{\pm}_{\lambda}$.
(48) The qWP transform coefficients are the inner products of an image X with the 2D complex qWPs:
(49) $z_{[m],j,l}[k,n] = \langle X, \Psi^{+}_{[m],j,l}[\cdot - 2^m k, \cdot - 2^m n] \rangle = y_{[m],j,l}[k,n] - i\, c_{[m],j,l}[k,n]. \qquad (3)$
(50) Consequently, the coefficients $y_{[m],j,l}[k,n] = \mathrm{Re}(z_{[m],j,l}[k,n])$ are the correlation coefficients of the image X with the real directional qWPs $\vartheta_{[m],j,l}[\cdot - 2^m k, \cdot - 2^m n] = \mathrm{Re}(\Psi^{+}_{[m],j,l}[\cdot - 2^m k, \cdot - 2^m n])$. As mentioned, the qWPs $\{\vartheta_{[2],j,l}\}$, $j, l = 0, 1, 2, 3$, from the second decomposition level are displayed in the drawings.
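A hedged 2D counterpart (a sketch, not the patent's exact implementation): the coefficients of Eq. 3 as circular correlations of the image with a complex 2D qWP, computed with the 2D FFT and sampled on the $2^m$ grid.

    import numpy as np

    def qwp_coefficients(X: np.ndarray, Psi: np.ndarray, m: int):
        """z[k, n] = <X, Psi[. - 2^m k, . - 2^m n]> via the 2D FFT;
        y = Re(z) are the correlations with the real qWP theta."""
        corr = np.fft.ifft2(np.fft.fft2(X) * np.conj(np.fft.fft2(Psi)))
        step = 2 ** m
        z = corr[::step, ::step]
        return z, z.real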
(51) Denote by $B_{+[m],j,l} \triangleq \{y_{+[m],j,l}[k,n]\}_{k,n=0}^{N/2^m-1}$ the block of the level-m transform coefficients related to the qWP $\vartheta_{+[m],j,l}$.
(52) The characteristic feature of an image related to this qWP is the average of the moduli of the coefficients in the block:
(53) $F_{+[m],j,l} \triangleq \left(\frac{2^m}{N}\right)^{2} \sum_{k,n=0}^{N/2^m-1} \left| y_{+[m],j,l}[k,n] \right|. \qquad (4)$
(55) Denote by $F_{+[m]}$ the level-m feature map (FM), which is the 2D array $F_{+[m]} \triangleq \{F_{+[m],j,l}\}_{j,l=0}^{2^m-1}$.
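A short sketch assembling the level-m FM of Eq. 4 from the coefficient blocks; the dict-of-blocks input format is an assumption for illustration.

    import numpy as np

    def feature_map(blocks: dict, m: int) -> np.ndarray:
        """blocks maps (j, l) to the coefficient block y_[m],j,l of Eq. 3;
        each FM entry is the average of moduli per Eq. 4."""
        fm = np.zeros((2 ** m, 2 ** m))
        for (j, l), y in blocks.items():
            fm[j, l] = np.mean(np.abs(y))
        return fm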
(56) Each qWP transform coefficient of an image I has a certain physical meaning: it evaluates the presence of an event with a certain direction, oscillating with a certain frequency, in a certain patch of the image. The FMs defined in Eq. 4 inherit these properties, except for the localization in the image. Utilizing these physical meanings of the transform coefficients makes it possible to design a variety of feature extraction schemes that can be optimized for different classes of problems. It is worth mentioning the variability of the qWP libraries (the choice between the generating polynomial and discrete splines, and the selection of the spline order). There is an option to extend the FMs by using the imaginary parts $c_{[m],j,l}[k,n]$ together with the real parts $y_{[m],j,l}[k,n]$ of the transform coefficients (see Eqs. 3 and 4). Various pooling methods are possible.
(59) The number of partitions of each image into blocks at each level and the number of decomposition levels are free parameters, and they are determined in the training stage.
(60) Several computational components of the processing flow are described below with reference to the computer system.
(61) Example of Results
(62) We have conducted several experiments in order to test and demonstrate the feasibility of the qWP-based feature extraction methods for image classification with a modified DCNN. The feasibility tests were carried out on the MNIST database.
(63) We use the library of directional spline-based qWPs, whose design is described above, to show that the extracted features (feature maps) can substantially reduce the size of the training data and thus speed up the classification.
(64) The results presented below confirm that directional wavelet packets originating from splines have the potential to be a constructive tool for high-quality feature extraction from images. We show here that these features can serve as a substitute for several, or even all, of the convolutional layers in a DCNN architecture. The extracted features have adaptation capabilities to the image activities. Classification becomes significantly faster and more efficient because fewer images are needed and no (or very few) convolutional layers are used.
(65) MNIST Database
(66) The feasibility of achieving classification with a small training dataset, while evaluating the performance of the qWPs, was tested on features extracted from the MNIST database of handwritten digits. Sets $S_{5000}$, $S_{4000}$, $S_{3000}$, $S_{2000}$, $S_{1500}$, $S_{1000}$ and $S_{500}$, comprising respectively 5000, 4000, 3000, 2000, 1500, 1000 and 500 MNIST images of size 28×28, were taken as the reference data (RD); a separate set $T_{5000}$ of 5000 images that did not belong to the RD served as the test set. Each image was padded by zeroes to an expanded size of 64×64, see the drawings.
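A minimal sketch of the zero-padding step (the exact placement of the 28×28 digit within the 64×64 frame is an assumption):

    import numpy as np

    def pad_to_64(img28: np.ndarray) -> np.ndarray:
        """Zero-pad a 28x28 MNIST image to 64x64 (digit centered)."""
        out = np.zeros((64, 64), dtype=img28.dtype)
        r = (64 - 28) // 2  # 18 rows/columns of zeros on each side
        out[r:r + 28, r:r + 28] = img28
        return out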
(67) As decision units, simple non-convolutional NNs $N_M$ trained on the 7 sets of FMs were used. The trained NNs were used for classification of the images from set $T_{5000}$. Each NN comprised an input layer consisting of one long short-term memory (LSTM) layer (LSTM is an artificial recurrent neural network (RNN) architecture used in the field of machine learning and deep learning), followed by two to five fully connected layers, with a softmax function used as the activation function in the output layer. No convolutional layer was used, and the neural network was used without any optimization.
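A hedged PyTorch sketch of such a decision unit follows: one LSTM input layer, a few fully connected layers, and a softmax output. The layer sizes, the number of FC layers, and the treatment of a feature map as a sequence of rows are all assumptions; the text above does not fix them.

    import torch
    import torch.nn as nn

    class FMClassifier(nn.Module):
        """LSTM input layer + fully connected layers + softmax output."""
        def __init__(self, fm_size=8, hidden=128, n_classes=10, n_fc=3):
            super().__init__()
            self.lstm = nn.LSTM(input_size=fm_size, hidden_size=hidden,
                                batch_first=True)
            layers = []
            for _ in range(n_fc - 1):
                layers += [nn.Linear(hidden, hidden), nn.ReLU()]
            layers.append(nn.Linear(hidden, n_classes))
            self.fc = nn.Sequential(*layers)

        def forward(self, fm):                # fm: (batch, rows, cols)
            _, (h, _) = self.lstm(fm)         # final hidden state
            return torch.softmax(self.fc(h[-1]), dim=1)

    model = FMClassifier()
    probs = model(torch.randn(4, 8, 8))       # four 8x8 FMs -> (4, 10)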
(68) The qWP transform coefficients are the inner products of the MNIST data with the 2D complex qWPs per Eq. 3, i.e., a 2D convolution of the image with $\Psi$; the real part of the convolution output is used in Eq. 4 for deriving the feature maps.
(69) The classification results for set $T_{5000}$ are given in Table 1:
(70) TABLE 1

    NN                N_5000  N_4000  N_3000  N_2000  N_1500  N_1000  N_500
    Classification %  97.72   97.12   96.78   95.78   94.76   93.08   90.16
(71) The classification results on the MNIST database in Table 1 are compared with the results reported in http://yann.lecun.com/exdb/mnist (Yann LeCun, Corinna Cortes, Christopher J. C. Burges, The MNIST database of handwritten digits), where a conventional state-of-the-art CNN was used for feature extraction and classification. Their training set included 60000 datapoints and their test set included 10000 datapoints, in sharp contrast with our training sets of 3000 datapoints (images) on average and a test set of 5000 datapoints. While the classification accuracies of the two methods are comparable, the conventional CNN of LeCun used 20 times more training images than those used to generate Table 1. That is, a modified CNN disclosed herein achieves accuracy similar to that of a conventional CNN, but needs only 1/20th of the number of training images. A better neural network than the elementary (non-optimized) one used to achieve the results in Table 1 will probably require even fewer training images. This reduces the computation time and memory consumption of a processor in a system running the method.
(72) A method described above may be performed, for example, in a computer system 1600 described with reference to the drawings.
(73) Computer system 1600 may receive inputs from a variety of data sources 1610. The inputs may be received through an interface or input/output (I/O) device (not shown). Non-limiting examples of data sources 1610 may include (in addition to the MNIST database above): Waymo (https://waymo.com), formerly the Google self-driving car project; classification of X-ray images of osteoarthrosis knees, https://doi.org/10.6084/m9.figshare.8139545.v1, 2019; and CheXpert (Stanford Hospital), 224,316 chest radiographs from 65,240 patients, labeled for the presence/absence of 14 pathologies. Data input to classifying engine 1602 may be of versatile structure and formats, and its volume and span (the number of parameters) may be theoretically unlimited. Computer system 1600 may also be referred to as a modified CNN system.
(74) In an exemplary use case, in an initial (training) processing stage (or phase), splines and qWPs are loaded into memory section 1608A. The splines and the qWPs are processed by CPU 1604 to generate waveforms and qWP-based filters 1612. The waveforms and qWP-based filters are generated independently of the imaging data. In the training stage, the input data is loaded and stored into memory section 1608B. The input data is processed in CPU 1604 using waveforms and qWP-based filters 1612. In addition and/or optionally, some of the processing above may be carried out in GPU 1606. The CPU-GPU based processing generates a training-data based feature map 1614 for each input image. The training data-based feature maps are input to neural network classifying engine 1602. This completes the training stage.
(75) In a second, classification stage, a newly arrived image, which did not participate in the training stage, is processed in CPU 1604 (or in addition and/or optionally in GPU 1606) using the waveforms and qWP-based filters 1612 generated in the training stage, to generate a new feature map 1614. NN classifying engine 1602 uses the training-based feature maps and the new feature map for classification of at least one object in the new image and/or of the entire new image to output a classified object/image 1616 to a user. The output may be done through an interface or I/O device (not shown). The user may be for example a driver of a vehicle, a controller of an autonomous vehicle, a physician or surgeon (for medical image data), etc.
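The two stages can be summarized by the following high-level sketch; every function name in it is a hypothetical stand-in for the components described above (filter generation from splines, Eqs. 3-4, and the NN classifying engine), not an API defined by this disclosure.

    import numpy as np

    def design_qwp_filters(n_filters=16, size=64):
        """Stand-in for the data-independent filter generation from splines."""
        rng = np.random.default_rng(1)
        return [rng.standard_normal((size, size)) for _ in range(n_filters)]

    def extract_feature_map(image, filters):
        """Stand-in for Eqs. 3-4: correlate, take real part, average moduli."""
        y = [np.fft.ifft2(np.fft.fft2(image) * np.conj(np.fft.fft2(P))).real
             for P in filters]
        return np.array([np.mean(np.abs(b)) for b in y])

    filters = design_qwp_filters()                  # generated before training
    train_fms = [extract_feature_map(img, filters)  # training stage
                 for img in np.random.rand(10, 64, 64)]
    # ... train the NN classifying engine on train_fms and labels ...
    new_fm = extract_feature_map(np.random.rand(64, 64), filters)
    # ... classify new_fm with the trained engine; output the result ...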
(76) In an exemplary application, a computer system like system 1600 running a method disclosed herein may be incorporated in a vehicle (see the drawings), for example for classifying objects in images acquired by the vehicle's cameras.
(77) In conclusion, the proposed replacement of the convolutional layers in a DCNN architecture with directional wavelet packets, applied to small data sizes, leads to a construction that extracts features with a performance that a fully equipped DCNN achieves only with much bigger datasets and a larger number of convolutional layers.
(78) Some stages of the aforementioned methods may also be implemented in a computer program for running on a computer system, at least including code portions for performing steps of the relevant method when run on a programmable apparatus, such as a computer system, or enabling a programmable apparatus to perform functions of a device or system according to the disclosure.
(79) A computer program is a list of instructions such as a particular application program and/or an operating system. The computer program may for instance include one or more of: a subroutine, a function, a procedure, a method, an implementation, an executable application, an applet, a servlet, a source code, code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system.
(80) The computer program may be stored internally on a non-transitory computer readable medium. All or some of the computer program may be provided on computer readable media permanently, removably or remotely coupled to an information processing system. The computer readable media may include, for example and without limitation, any number of the following: magnetic storage media including disk and tape storage media; optical storage media such as compact disk media (e.g., CD-ROM, CD-R, etc.) and digital video disk storage media; nonvolatile memory storage media including semiconductor-based memory units such as FLASH memory, EEPROM, EPROM, ROM; ferromagnetic digital memories; MRAM; volatile storage media including registers, buffers or caches, main memory, RAM, etc.
(81) A computer process typically includes an executing (running) program or portion of a program, current program values and state information, and the resources used by the operating system to manage the execution of the process. An operating system (OS) is the software that manages the sharing of the resources of a computer and provides programmers with an interface used to access those resources. An operating system processes system data and user input, and responds by allocating and managing tasks and internal system resources as a service to users and programs of the system.
(82) The computer system may for instance include at least one processing unit, associated memory and a number of input/output (I/O) devices. When executing the computer program, the computer system processes information according to the computer program and produces resultant output information via I/O devices.
(83) The connections discussed herein may be any type of connection suitable to transfer signals from or to the respective nodes, units or devices, for example via intermediate devices. Accordingly, unless implied or stated otherwise, the connections may for example be direct connections or indirect connections. The connections may be illustrated or described in reference to being a single connection, a plurality of connections, unidirectional connections, or bidirectional connections. However, different embodiments may vary the implementation of the connections. For example, separate unidirectional connections may be used rather than bidirectional connections and vice versa. Also, a plurality of connections may be replaced with a single connection that transfers multiple signals serially or in a time multiplexed manner. Likewise, single connections carrying multiple signals may be separated out into various different connections carrying subsets of these signals. Therefore, many options exist for transferring signals.
(84) Optionally, the illustrated examples may be implemented as circuitry located on a single integrated circuit or within a same device. Alternatively, the examples may be implemented as any number of separate integrated circuits or separate devices interconnected with each other in a suitable manner. Optionally, suitable parts of the methods may be implemented as software or code representations of physical circuitry, or of logical representations convertible into physical circuitry, such as in a hardware description language of any appropriate type.
(85) Other modifications, variations and alternatives are also possible. The specifications and drawings are, accordingly, to be regarded in an illustrative rather than in a restrictive sense. While certain features of the disclosure have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those of ordinary skill in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the disclosure. It will be appreciated that the embodiments described above are cited by way of example, and various features thereof and combinations of these features can be varied and modified. While various embodiments have been shown and described, it will be understood that there is no intent to limit the disclosure by such disclosure, but rather, it is intended to cover all modifications and alternate constructions falling within the scope of the disclosure, as defined in the appended claims.
(86) Unless otherwise stated, the use of the expression "and/or" between the last two members of a list of options for selection indicates that a selection of one or more of the listed options is appropriate and may be made.
(87) It should be understood that where the claims or specification refer to "a" or "an" element, such reference is not to be construed as there being only one of that element.
(88) All references mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual reference was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present disclosure.