Fast and robust friction ridge impression minutiae extraction using feed-forward convolutional neural network
11430255 · 2022-08-30
Abstract
Disclosed is a system and method for rapid noise-robust friction ridge impression minutiae extraction from digital signal using fully convolutional feed-forward neural network. The proposed neural network based system outperforms classical approaches and other neural network based systems for minutiae extraction in both speed and accuracy. The minutiae extracted using the system can be used at least for tasks such as biometric identity verification, identification or dactyloscopic analysis.
Claims
1. A neural network system implemented by one or more computers, said neural network system comprising: a convolutional neural network, wherein the convolutional neural network is trained and configured to: for each biometric input signal processed by the neural network system, receive the biometric input signal at a first layer block of the convolutional neural network; pass the biometric input signal through a plurality of layer blocks comprising an increasing number of channels and reduced spatial resolution of the output feature map with respect to the biometric input signal, wherein the plurality of layer blocks comprise layers and each layer comprises a nonlinear activation function; and produce an output feature map by propagating the output of a last layer block into a plurality of convolutional branches; wherein the neural network system further comprises a subsystem, wherein the subsystem is configured to: receive the output feature map from the neural network; decode the output feature map; and output a decoded feature map representing friction ridge impression minutiae.
2. The system of claim 1, wherein decoding the output feature map comprises converting from the convolutional neural network output feature map to a friction ridge impression minutiae numeric representation, wherein minutiae numeric representation includes at least: class, rotation, and location.
3. The system of claim 2, wherein minutia class is one of: line ending, bifurcation, or none of the above.
4. The system of claim 1, wherein the convolutional neural network is a fully convolutional neural network.
5. The system of claim 1, wherein the biometric input signal is a digital friction ridge impression image.
6. The system of claim 1, wherein the output feature map comprises the output of the convolutional layer branches.
7. The system of claim 1, wherein the nonlinear activation function of a layer is a nonlinear pointwise activation function chosen from: Sigmoid, Hyperbolic Tangent, Concatenated ReLU, Leaky ReLU, Maxout, ReLU, ReLU-6, and Parametric ReLU.
8. The system of claim 1, wherein convolution is one of: regular convolution, depthwise separable convolution, or grouped convolution in combination with 1×1 convolutions or other type of convolution.
9. The system of claim 1, wherein the convolutional branches comprise a loss function, and wherein the loss function is a multi-loss function comprising multiple loss components.
10. The system of claim 9, wherein the multi-loss function components comprise at least: positive class loss, negative class loss, localization loss, and orientation loss.
11. The system of claim 10, wherein minutia positive class estimation is a classification problem.
12. The system of claim 10, wherein minutia negative class estimation is a classification problem.
13. The system of claim 10, wherein minutia orientation estimation is a regression problem.
14. The system of claim 10, wherein minutia localization estimation is a regression problem.
15. The system of claim 1, wherein each biometric input signal is either acquired from a biometric reader or loaded from memory.
16. The system of claim 1, wherein the neural network training process includes encoding friction ridge impression minutiae.
17. The system of claim 1, wherein the neural network training process includes generating an augmented biometric input signal.
18. The system of claim 1, wherein the output feature map is comprised of class, orientation, and location channels.
19. The system of claim 1, wherein each feature in the output feature map has a spatial resolution roughly equal to ⅛ of the biometric input signal resolution.
Description
BRIEF DESCRIPTION OF DRAWINGS
(1) The novel features, aspects and advantages of preferred embodiments will be best understood by reference to the following detailed description when read in conjunction with the accompanying drawings and appended claims wherein:
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
(10) The disclosed system (100) for fingerprint minutiae extraction is demonstrated in
(11) Several properties of convolutional neural networks and nonlinear activation functions, as presented below, are very important to the fingerprint minutiae extraction process. Convolutional layers are hugely important due to their locality, which means that when an image is processed with a convolutional layer, local patterns located nearby in pixel space are related. Translation invariance is another important property of a convolutional layer: it provides the neural network with the ability to register the presence of a specific visual pattern regardless of where in the image that pattern appears. In other words, a convolutional network can learn spatial representations and make decisions based on local spatial input. Data in said convolutional network can be represented as a three-dimensional array of size n×h×w, where h and w are spatial dimensions and n is the feature or color channel dimension. The input image has dimensions h×w, i.e. height and width, and n color channels. In an RGB color image n equals 3, where the channels typically represent red, green and blue color values; in a black-and-white image n equals 1, a single grayscale intensity channel. As the raw fingerprint image is fed into a convolutional neural network, the data goes through multiple convolution layers, each of which performs a data transformation. One way to look at said transformation is that a value at a specific location in the input image represents a pixel color value, while in subsequent layers the data is converted into higher-abstraction-level features. Each feature in a higher layer preserves its path-connection to the original locations in the input image, which is also called the receptive field of that feature. Formally, a convolutional layer with activation function ƒ can be characterized by a kernel tensor W ∈ ℝ^(n_o×n_i×k_h×k_w), where n_i is the number of input channels, n_o the number of output channels, and k_h×k_w the spatial kernel size:

y = ƒ(W*x)

where y_o = ƒ(Σ_{i=1..n_i} W_{o,i}*x_i), W_{o,i} = W[o,i,:,:], and * denotes 2D convolution. The computational complexity for the whole output feature map is O(H×W×n_o×n_i×k_h×k_w).
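The convolutional layer formulation above can be sketched in plain numpy. This is a minimal illustration, not the patented implementation; the naive loops, the 'valid' padding choice, and the use of cross-correlation (as is conventional in CNN frameworks) are assumptions of the sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv_layer(x, W):
    """Naive 'valid' convolutional layer: y_o = f(sum_i W[o, i] * x[i]).

    x: (n_i, h, w) input feature map; W: (n_o, n_i, k_h, k_w) kernel tensor.
    Returns an (n_o, h - k_h + 1, w - k_w + 1) output feature map.
    """
    n_o, n_i, k_h, k_w = W.shape
    _, h, w = x.shape
    y = np.zeros((n_o, h - k_h + 1, w - k_w + 1))
    for o in range(n_o):          # each output channel has its own kernel stack
        for i in range(n_i):      # a single kernel deals with all input channels
            for r in range(y.shape[1]):
                for c in range(y.shape[2]):
                    y[o, r, c] += np.sum(W[o, i] * x[i, r:r + k_h, c:c + k_w])
    return relu(y)
```

The four nested loops make the O(H×W×n_o×n_i×k_h×k_w) cost of regular convolution explicit, which is the baseline the depthwise separable factorization below improves on.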
(12) Furthermore, in a preferred embodiment, the depthwise separable convolution operation may be used to improve computational performance for neural network training and inference. It is possible to achieve comparable or even better quality using regular convolution, and comparable speed using grouped convolutions in combination with 1×1 convolutions. It has to be noted that still other alternative convolution operators may be used to achieve similar or better results, but our experiments showed that optimal performance across a set of hardware and software environments is achieved using depthwise separable convolutions. In fact, depthwise separable convolution provides a speed improvement over regular convolution that allows targeting applications executed on hardware lacking a GPU or any other special hardware.
(13) In regular convolution, as presented earlier, a single convolution kernel deals with all n_i input channels. Depthwise separable convolution, on the other hand, splits the convolution into two parts: a depthwise (DW) convolution and a pointwise (PW) convolution. The depthwise convolution focuses on locality by applying n_i 2D convolution kernels separately, one for each of the n_i input channels; convolving over the n_i input channels thus produces an n_i-channel tensor of stacked results. The pointwise (1×1) convolution, in turn, focuses on the relation between channels. To ensure the same output shape as the regular convolution W, DW is defined by the convolution kernel tensor D ∈ ℝ^(n_i×k_h×k_w) and PW by the tensor P ∈ ℝ^(n_o×n_i×1×1), so that

y′_o = ƒ_1(Σ_{i=1..n_i} P_{o,i}׃_0(D_i*x_i))

where P_{o,i} = P[o,i,:,:], D_i = D[i,:,:,:], and ƒ_0 and ƒ_1 are elementwise nonlinear activation functions. The computational complexity for the whole feature map is O(H×W×(n_i×k_h×k_w + n_i×n_o)).
(14) Alternatively, it is possible to switch the convolution order and apply the PW convolution before the DW convolution, obtaining another factorization form, PW+DW.
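The DW+PW factorization can be sketched as follows, again a minimal numpy illustration rather than the patented implementation; kernel shapes, 'valid' padding, and taking both ƒ_0 and ƒ_1 to be ReLU are assumptions of the sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def depthwise_separable_conv(x, D, P):
    """DW + PW sketch: y'_o = f1(sum_i P[o, i] * f0(D[i] * x[i])).

    x: (n_i, h, w) input; D: (n_i, k_h, k_w), one 2D kernel per input channel;
    P: (n_o, n_i) pointwise (1x1) mixing weights. f0 = f1 = ReLU here.
    """
    n_i, k_h, k_w = D.shape
    _, h, w = x.shape
    dw = np.zeros((n_i, h - k_h + 1, w - k_w + 1))
    for i in range(n_i):  # depthwise: locality, each channel convolved separately
        for r in range(dw.shape[1]):
            for c in range(dw.shape[2]):
                dw[i, r, c] = np.sum(D[i] * x[i, r:r + k_h, c:c + k_w])
    dw = relu(dw)
    # pointwise: a 1x1 convolution mixing channels at each spatial position
    y = np.einsum('oi,ihw->ohw', P, dw)
    return relu(y)
```

The spatial loops now cost O(H×W×n_i×k_h×k_w) and the channel mixing O(H×W×n_i×n_o), matching the stated complexity of the factorized form.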
(15) Many options are available for the nonlinear activation function: Sigmoid, Hyperbolic Tangent, Concatenated ReLU, Leaky ReLU, Maxout, ReLU-6 and Parametric ReLU, to name a few. The properties desirable in a nonlinear activation function are: non-saturation of its gradient, which greatly accelerates the convergence of stochastic gradient descent compared to the likes of the sigmoid or hyperbolic tangent functions; reduced likelihood of a vanishing gradient; and sparsity-inducing regularization. ReLU, among several of the other mentioned activation functions, has the properties listed above and is used in a preferred embodiment as the elementwise nonlinear activation function ƒ. ƒ_0 and ƒ_1 may differ, or one of them may be the identity ƒ_i(x)=x, but in a preferred embodiment ƒ, ƒ_0 and ƒ_1 all represent the ReLU pointwise activation function, which is defined as follows:
(16) ƒ(x) = max(0, x)
(17) It is also important to understand the computational superiority of ReLU over activation functions like the sigmoid or hyperbolic tangent, which involve computationally expensive exponential and arithmetic operations; ReLU, on the other hand, can be implemented by simply thresholding a matrix of activations at zero.
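The thresholding observation reduces to a single elementwise maximum, as this small sketch shows:

```python
import numpy as np

def relu(x):
    # ReLU as a single elementwise threshold at zero -- no exponentials,
    # unlike sigmoid 1/(1 + exp(-x)) or the hyperbolic tangent
    return np.maximum(x, 0.0)
```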
(18) It is also important to note that the distribution of each layer's inputs varies significantly during training, at least due to parameter changes in the preceding layer. This distribution variation tends to slow down the training process by requiring lower learning rates and careful parameter initialization. To overcome this problem, batch normalization is used in a preferred embodiment. It allows the use of higher learning rates and increases the neural network's tolerance to initialization parameters. Moreover, batch normalization also acts as a regularization technique which decreases the risk of model overfitting. In fact, in a preferred embodiment, batch normalization is used after the first convolutional layer and, in all depthwise separable convolutions, after the depthwise convolution (DW) and after the pointwise convolution (PW). It should be understood that the placement of batch normalization or alternative regularization means, such as a dropout layer, within the neural network architecture is flexible, and similar or better results can be achieved by ordering the layers in a different manner.
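A batch normalization step of the kind described can be sketched as follows (training-time statistics only; the per-channel scale gamma and shift beta, and the epsilon value, are the usual learned parameters and numerical guard, not values taken from this patent):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization over a batch of feature maps (sketch).

    x: (batch, channels, h, w). Each channel is normalized to zero mean and
    unit variance across the batch and spatial dimensions, then rescaled by
    the learned per-channel parameters gamma (scale) and beta (shift).
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)
```

Stabilizing each layer's input distribution this way is what permits the higher learning rates mentioned above.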
(19) When training a neural network it is also important to define the training target. The training problem is defined with respect to the problem class the neural network is expected to solve. Training methods may be chosen in a multitude of different ways with varying results, but in a preferred embodiment orientation and localization training is defined as a regression problem, and determining the fingerprint minutia class as a classification problem. To evaluate how the neural network is performing at a given training step, with provided input data and an expected output result, we define a loss or error function.
(20) A loss function is necessary to measure the inconsistency between the predicted value y, which is generated by the network for a given input sample, and the actual value ŷ. The evaluated error from incorrect predictions is then used to iteratively adjust the neural network weights or convolutional filter values. The multi-loss function in a preferred embodiment consists of four parts: classification, negative classification, localization regression and orientation regression, as follows:
L(y, ŷ, ¬y, ¬ŷ, l, l̂, o, ô) = m_p·L_cls(y, ŷ) + m_n·L_¬cls(¬y, ¬ŷ) + m_p·L_loc(l, l̂) + m_p·L_ori(o, ô)
(21) Here m_p(ŷ) and m_n(ŷ) are masking factors calculated from the ground truth minutia point confidence value. They are applied to all partial losses so that only the minutia points concerned contribute to the loss. In the multi-loss function, y, ¬y, l and o represent the predicted fingerprint minutia class candidate presence probability, absence probability, localization and orientation, respectively. It should be noted that said multi-loss function can have fewer or more partial loss components calculating loss over fingerprint feature parameters or meta-parameters.
(22) In a preferred embodiment, a softmax cross-entropy sum is used as the partial loss function for positive and negative classification. For localization and orientation, a sum of differences between actual and predicted values is used as the partial regression loss function. Said partial loss functions are combined into the multi-loss function defined earlier, which in turn is used for the overall neural network loss estimation and the resulting iterative weight adjustments. Said weight adjustment is performed by a specific optimizer function. As with other neural network parameters, there is a multitude of optimizers to choose from, Adagrad, Adadelta and RMSprop to name a few, but in a preferred embodiment the Adam optimizer is used.
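The masked multi-loss combination described above can be sketched in numpy. This is an illustrative simplification, not the patented formulation: the tensor layout (one row per feature-map location), one-hot class targets, and absolute-difference regression terms are assumptions of the sketch:

```python
import numpy as np

def softmax_xent(logits, onehot):
    # numerically stable softmax cross-entropy, one value per location
    z = logits - logits.max(axis=-1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(onehot * logp).sum(axis=-1)

def multi_loss(cls_pred, cls_true, loc_pred, loc_true,
               ori_pred, ori_true, m_p, m_n):
    """m_p masks locations holding a ground-truth minutia, m_n the rest, so
    localization and orientation regression only contribute where a minutia
    actually exists."""
    l_pos = (m_p * softmax_xent(cls_pred, cls_true)).sum()
    l_neg = (m_n * softmax_xent(cls_pred, cls_true)).sum()
    l_loc = (m_p * np.abs(loc_pred - loc_true).sum(axis=-1)).sum()
    l_ori = (m_p * np.abs(ori_pred - ori_true)).sum()
    return l_pos + l_neg + l_loc + l_ori
```

Confident, correct predictions with exact regression targets drive all four components toward zero, which is the behaviour the optimizer exploits during weight adjustment.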
(23) Another aspect of the neural network training process that often has a significant impact on training convergence is the method of initializing the neural network connection weights and convolutional filters. The neural network may be initialized in multiple ways. In a preferred embodiment, neural network weights and convolutional filter values are initialized at random. In alternative embodiments, initial values may be set to zeros or to values according to some specific heuristic. In yet another embodiment, the neural network's initial weights or convolutional filter values, or both, are initialized from a neural network previously trained for a different biometric modality or another visual signal set, which is also called transfer learning.
(24) Another way to describe the neural network training process is to divide it into several steps. A generic exemplary neural network training process (300) is demonstrated in
(26) The training data augmentation process (502), wherein the dataset is extended by generating new data (608) from existing data (606) using various data transformation techniques (607), is illustrated in
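One augmentation step of the kind referenced as (607) might look as follows. The particular transformations (right-angle rotation, horizontal flip, additive Gaussian noise) and their parameters are illustrative assumptions, not the patent's specific technique:

```python
import numpy as np

def augment(image, rng):
    """Generate one new training sample from an existing one (sketch).

    image: 2D array with intensities in [0, 1]; rng: numpy random Generator.
    Applies a random multiple-of-90-degree rotation, an optional horizontal
    flip, and small additive Gaussian noise, then clips back to [0, 1].
    """
    out = np.rot90(image, rng.integers(0, 4))
    if rng.integers(0, 2):
        out = out[:, ::-1]
    out = out + rng.normal(0.0, 0.01, out.shape)
    return np.clip(out, 0.0, 1.0)
```

Note that any geometric transformation applied to a fingerprint image must be applied consistently to the ground-truth minutiae coordinates and orientations, otherwise the encoded training targets no longer match the augmented signal.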
(27) The training itself can be carried out using widely available neural network software frameworks such as Caffe, PyTorch or TensorFlow, or using other appropriate means. It is expected that during the training process the overall quality measure of the network will converge to an optimal value. There is a multitude of strategies for choosing when to stop training and how to select the best trained model from among the intermediate trained models, but in general said optimal value usually depends on the training data itself, so the training process is usually halted as soon as there are indications that the trained neural network model is overfitting on test or validation data.
(28) After training is completed and the desired accuracy levels are achieved, one can utilize the trained neural network (800) as illustrated in
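The decoding subsystem that converts the output feature map into the numeric minutiae representation (class, location, rotation; claims 2, 11 and 18) could be sketched as below. The channel layout, the confidence threshold, and the stride of 8 (matching the roughly 1/8 spatial resolution of claim 19) are hypothetical choices for illustration:

```python
import numpy as np

def decode_feature_map(fmap, stride=8, threshold=0.5):
    """Hypothetical decoder for an (h, w, 5) output feature map whose
    per-cell channels are [p_ending, p_bifurcation, dx, dy, theta].

    Cells whose best class probability exceeds the threshold emit one
    minutia (class, x, y, orientation) in input-image coordinates, with
    (dx, dy) refining the coarse cell position."""
    minutiae = []
    h, w, _ = fmap.shape
    for r in range(h):
        for c in range(w):
            p_end, p_bif, dx, dy, theta = fmap[r, c]
            p, cls = max((p_end, 'ending'), (p_bif, 'bifurcation'))
            if p > threshold:
                minutiae.append((cls, c * stride + dx, r * stride + dy, theta))
    return minutiae
```

A cell at feature-map position (r=1, c=0) with offsets (4, 4) thus decodes to image coordinates (4, 12) under the assumed stride.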
(29) In another embodiment, the trained neural network may be used in a dynamic setting (400), where neural network fine-tuning or re-training is performed as signals from the initial dataset are updated or removed, or new signals are added. Firstly, the acquired input data (401) is augmented (402), if necessary, using means (502) similar to those used for the training data. Next, the neural network is fine-tuned on the augmented data in step (403). Finally, the fine-tuned neural network model is stored (404).
(30) In yet another embodiment, the system and method for signal feature extraction can be used for the purposes of classification, acquisition, and person verification or identification over elements or segments of data signals, using the neural network disclosed in the current invention.
(31) It is obvious to one skilled in the art that, due to the nature and the current and foreseeable state of neural network research, the architecture disclosed herein can, apart from fingerprints, be applied to other biometric modalities such as palmprints, footprints, or even veins, irises and faces. In the case of palmprints and footprints, the friction ridge pattern structure is similar to that of fingerprints, so the disclosed method can be applied without significant modifications. Veins, irises and, even more so, faces have visual structures that differ significantly; regardless, vein pattern local feature points and face landmarks have at least locality in common, which is a crucial property of the disclosed method.
(32) As can be understood, the present invention has been described in relation to particular embodiments, which are intended in all respects to be illustrative rather than restrictive. Alternative embodiments will become apparent to those of ordinary skill in the art to which the present invention pertains without departing from its scope. It will be also understood that certain features and subcombinations are of utility and may be employed without reference to other features and subcombinations.
REFERENCES CITED
(33)
U.S. Pat. No. 5,572,597—Fingerprint classification system
U.S. Pat. No. 5,825,907—Neural network system for classifying fingerprints
U.S. Pat. No. 5,892,838—Biometric recognition using a classification neural network
U.S. Pat. No. 7,082,394—Noise-robust feature extraction using multi-layer principal component analysis
US 2006/0215883 A1—Biometric identification apparatus and method using bio signals and artificial neural network
CN 107480649 A—Full convolutional neural network-based fingerprint sweat pore extraction method
Bhavesh Pandya, G. C. A. A. A. A. T. V. A. B. T. M. M., 2018. Fingerprint classification using a deep convolutional neural network. 2018 4th International Conference on Information Management (ICIM), pp. 86-91.
Branka Stojanović, A. N. O. M., 2015. Fingerprint ROI segmentation using Fourier coefficients and neural networks. 2015 23rd Telecommunications Forum Telfor (TELFOR), pp. 484-487.
Darlow, L. N., R. B., 2017. Fingerprint minutiae extraction using deep learning. 2017 IEEE International Joint Conference on Biometrics (IJCB), pp. 22-30.
Dinh-Luan Nguyen, K. C. A. K. J., 2018. Robust Minutiae Extractor: Integrating Deep Networks and Fingerprint Domain Knowledge. 2018 International Conference on Biometrics (ICB).
Hilbert, C.-F. C. E., 1994. Fingerprint classification system. United States of America, Patent No. 5,572,597.
Kai Cao, D.-L. N. C. T. A. K. J., 2018. End-to-End Latent Fingerprint Search.
Sankaran, A. a. P. P. a. V. M. a. S. R., 2014. On latent fingerprint minutiae extraction using stacked denoising sparse AutoEncoders. IJCB 2014—2014 IEEE/IAPR International Joint Conference on Biometrics, pp. 1-7.
Shrein, J. M., 2017. Fingerprint classification using convolutional neural networks and ridge orientation images. 2017 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1-8.
Thomas Pinetz, D. S. R. H.-M. R. S., 2017. Using a U-Shaped Neural Network for minutiae extraction trained from refined, synthetic fingerprints. Proceedings of the OAGM & ARW Joint Workshop 2017, pp. 146-151.
Yao Tang, F. G. J. F., 2017. Latent fingerprint minutia extraction using fully convolutional network. 2017 IEEE International Joint Conference on Biometrics (IJCB), pp. 117-123.
Yao Tang, F. G. J. F. Y. L., 2017. FingerNet: An unified deep network for fingerprint minutiae extraction. 2017 IEEE International Joint Conference on Biometrics (IJCB), pp. 108-116.