MULTI-MODAL FEW-SHOT LEARNING DEVICE FOR USER IDENTIFICATION USING WALKING PATTERN BASED ON DEEP LEARNING ENSEMBLE

Abstract

Disclosed is a multi-modal few-shot learning device for user identification using a walking pattern based on deep learning ensemble. The device includes: a walking data collector configured to collect walking data of a user from a smart insole including any one or more of a pressure sensor, an acceleration sensor, and a gyro sensor; a preprocessor configured to convert a series of time series walking data obtained from each of the sensors included in the smart insole into a unit format data set; and an ensemble learner configured to apply an ensemble learning model that provides one final prediction by independently training CNN series learning and RNN series learning based on the unit-format data set generated by the preprocessor.

Claims

1. A multi-modal few-shot learning device for user identification using a walking pattern based on deep learning ensemble, the device comprising: a walking data collector configured to collect walking data of a user from a smart insole including any one or more of a pressure sensor, an acceleration sensor, and a gyro sensor; a preprocessor configured to convert a series of time series walking data obtained from each of the sensors included in the smart insole into a unit format data set; and an ensemble learner configured to apply an ensemble learning model that provides one final prediction by training CNN series learning and RNN series learning respectively and independently based on the unit-format data set generated by the preprocessor.

2. The device of claim 1, wherein the walking data collector comprises any one or more of n pressure sensors, the acceleration sensor, and the gyro sensor included in the smart insole, and collects the walking data of the user measured from the sensors.

3. The device of claim 2, wherein the n pressure sensors measure the foot pressure of both feet of the user, each at one of three levels: 0, 1, or 2.

4. The device of claim 1, wherein the preprocessor processes the walking data sampled by the smart insole at a sampling rate of 100 Hz.

5. The device of claim 1, wherein the preprocessor further comprises: a unit vectorizer configured to vectorize, in a unit format, each series of time series walking data obtained from the pressure sensor, the acceleration sensor, and the gyro sensor included in the smart insole; a unit minimum length vectorizer configured to find the data having the minimum length among the unit-format vectorized data for each of the pressure sensor, the acceleration sensor, and the gyro sensor, and to equalize the length of the unit-format vectorized data to the minimum length; and a unit vector set part configured to construct a minimum unit format data set from the unit format data equalized to the minimum length.

6. The device of claim 5, wherein the unit vectorizer vectorizes the time series walking data in a unit format by performing a convolution operation on an average of N pressure values and a Gaussian function, wherein the convolution operation is calculated according to $Z(t) = (\bar{X} * y)(t) = \int_{0}^{t} \bar{X}(\tau)\, y(t - \tau)\, d\tau$ (where $\bar{X}(t) = \frac{1}{N} \sum_{i=1}^{N} X_{i}(t)$ indicates the average of the N pressure values, $y(t) = \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{-\frac{t^{2}}{2 \sigma^{2}}}$ indicates the Gaussian function, $X_{i}(t)$ indicates the N pressure values, and σ indicates the standard deviation of the Gaussian function).

7. The device of claim 5, wherein, for each foot, a sorted list [t0, t1, . . . , ti, . . . ] is obtained by applying $\frac{d}{dt} Z(t) = 0$ and $\frac{d^{2}}{dt^{2}} Z(t) > 0$ with respect to every time t, and the unit vectorizer defines the time series walking data as a vectorized time series in a unit format, wherein, because the sampling rate of the insole is 100 Hz and a standard length is defined as [text missing or illegible when filed], a discontinuous variable is calculated according to [text missing or illegible when filed].

8. The device of claim 1, wherein the ensemble learner applies the ensemble learning model to a fully connected network and comprises: a CNN set constructor configured to construct a CNN series learning_vector_data set derived through the CNN series learning based on a minimum unit-format data set; and an RNN set constructor configured to construct an RNN series learning_vector_data set derived through the RNN series learning based on the minimum unit format data set.

9. The device of claim 8, wherein the ensemble learner further comprises a CNN-RNN set constructor configured to, in a test stage, construct an average data vector set of the CNN-series learning_vector_data set and the RNN-series learning_vector_data set, and to construct, from the average data vector set, a final walking data set for identifying the user.

10. The device of claim 8, wherein the CNN series learning and the RNN series learning are independently trained using the CNN-series learning_vector_data set and the RNN-series learning_vector_data set, respectively, and wherein, in a test stage, an individual's softmax scores are calculated by averaging the softmax scores of the CNN and the RNN.

11. The device of claim 8, wherein the CNN series learning or the RNN series learning is defined as [text missing or illegible when filed] (for tri-modal sensing), where a unit step [text missing or illegible when filed] in a standard format, an acceleration [text missing or illegible when filed], and a rotation [text missing or illegible when filed] are used as inputs, and an output of the model is a vector u of softmax probabilities.

12. The device of claim 8, wherein the CNN series learning and the RNN series learning construct an average ensemble model that aggregates the CNN and RNN predictions to provide one final prediction, and an average probability of the CNN and the RNN is calculated according to $M_{\mathrm{CNN+RNN}} = \frac{1}{2}\left(M_{\mathrm{CNN}} + M_{\mathrm{RNN}}\right)$ (where $M_{\mathrm{CNN}}$ indicates a case where only the CNN is activated, $M_{\mathrm{RNN}}$ indicates a case where only the RNN is activated, and $M_{\mathrm{CNN+RNN}}$ indicates a case where both the CNN and the RNN are activated).

13. The device of claim 8, wherein the ensemble learning model embeds the CNN-series learning_vector_data set or the RNN-series learning_vector_data set as vectors of 128 units, wherein a CNN-series learning_vector or an RNN-series learning_vector is connected to a fully connected network layer to form embedding vectors of 256 units, and wherein the embedding vectors are normalized to the same value.

14. The device of claim 1, further comprising: an output part configured to output, through the network trained with the ensemble learning model, walking feature data of the user from each unit format data set obtained from the sensors so as to identify (authenticate) the user from the walking data.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0030] FIG. 1 is a block diagram of the present disclosure.

[0031] FIG. 2 is a block diagram of a preprocessor according to an embodiment of the present disclosure.

[0032] FIGS. 3 and 4 are a block diagram and a schematic diagram of an ensemble learner, according to an embodiment of the present disclosure.

[0033] FIGS. 5 and 6 are a block diagram and a schematic diagram of a few-shot learner according to an embodiment of the present disclosure.

[0034] FIG. 7 is a graph showing a comparison of learning results of an ensemble model, a CNN model, and an RNN model according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

[0035] A multi-modal few-shot learning device for user identification using a walking pattern based on deep learning ensemble according to embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. The present disclosure may make various changes and have various forms, and specific embodiments will be illustrated in drawings and will be described in detail in the present specification. However, these are not intended to limit the present disclosure to specific disclosure forms, and it should be understood that the present disclosure includes all changes, equivalents, or substitutions falling within the spirit and scope of the present disclosure. In describing the drawings, like reference numerals are used for like elements. In the accompanying drawings, the dimensions of the structures might be shown exaggerated for clarity of the disclosure or abridged for a schematic representation of the configurations of some embodiments.

[0036] Also, the terms such as “first” and “second” may be used to describe various components, but those components should not be limited by the terms. The terms are merely used to distinguish one component from other components. For example, without departing from the scope of the present disclosure, a first component may be referred to as a second component, and similarly, a second component may also be referred to as a first component. Meanwhile, unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by those of ordinary skill in the art to which the present disclosure belongs. Unless otherwise defined, all terms, including technical and scientific terms, commonly used and defined in dictionaries are to be interpreted as is customary in the art to which the present disclosure belongs. It will be further understood that terms in common usage should also be interpreted as is customary in the related art and not in an idealized or overly formal sense unless expressly so defined herein.

[0037] FIG. 1 is a block diagram of the present disclosure. Referring to FIG. 1, a multi-modal few-shot learning device for user identification using a walking pattern may include a walking data collector 100, a preprocessor 200, an ensemble learner 300, a few-shot identifier 400 and an output part 500.

[0038] In one embodiment of the present disclosure, the walking data collector 100 collects a user's walking data from a smart insole including any one or more of a pressure sensor, an acceleration sensor, and a gyro sensor; specifically, it collects the user's walking data measured by the n pressure sensors, acceleration sensors, and gyro sensors included in the smart insole. In addition, in one embodiment of the present disclosure, the n pressure sensors measure the foot pressure of both feet of the user, each at one of three levels (0, 1, or 2), and the sampling rate of the smart insole may be 100 Hz.

[0039] In addition, in one embodiment of the present disclosure, the preprocessor 200 converts a series of time series walking data obtained from each of the sensors included in the smart insole into a unit format data set, and the ensemble learner 300 applies an ensemble learning model that provides one final prediction by independently training CNN series learning and RNN series learning based on the unit-format data set generated by the preprocessor 200. The few-shot identifier 400 according to an embodiment of the present disclosure recognizes and excludes walking data having inadequate information (few-shot data) related to the user's walking data. Lastly, the output part 500 outputs a walking data feature for identifying the user by using a value output from the fully connected network trained through the ensemble learner 300. That is, the user's walking feature data is extracted from each unit format data set obtained from the sensors through the network that has learned the ensemble learning model for identifying (authenticating) the user from the walking data. In more detail, in order to accurately extract feature data related to the user from only a small amount of the user's walking data, a function of the few-shot identifier 400 for discriminating whether the collected walking data belongs to the user or to a non-user and a function of the ensemble learner 300 for training the ensemble model in the network with the user's walking data as input may be performed in parallel.

[0040] FIG. 2 is a block diagram of a preprocessor according to an embodiment of the present disclosure.

[0041] Referring to FIG. 2, the preprocessor 200 may include a unit vectorizer 205, a unit minimum length vectorizer 210, and a unit vector set part 215.

[0042] In more detail, the unit vectorizer 205 vectorizes, in a unit format, each series of time series walking data obtained from the pressure sensor, the acceleration sensor, and the gyro sensor included in the smart insole, and the unit minimum length vectorizer 210 finds the data having the minimum length among the unit-format vectorized data for each of the pressure sensor, the acceleration sensor, and the gyro sensor, and equalizes the length of the unit-format vectorized data to the minimum length. Next, the unit vector set part 215 may construct a minimum unit format data set for each of the pressure sensor, the acceleration sensor, and the gyro sensor from the series of unit format data having the minimum length.
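The length equalization performed by the unit minimum length vectorizer 210 can be illustrated with a minimal Python sketch. It assumes, as one possible reading, that equalization means truncating every unit-format vector of a sensor to the shortest vector of that same sensor, and that the unit vector set part 215 then pairs the k-th unit of every sensor into one sample. The function names `equalize_to_min_length` and `build_unit_vector_set` and the sensor keys are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def equalize_to_min_length(
    unit_vectors: Dict[str, Sequence[Sequence[float]]],
) -> Dict[str, List[List[float]]]:
    """Truncate every unit-format vector to the minimum length found for its sensor.

    Assumption: "equalize to the minimum length" is read as per-sensor truncation.
    """
    equalized: Dict[str, List[List[float]]] = {}
    for sensor, vectors in unit_vectors.items():
        min_len = min(len(v) for v in vectors)  # shortest unit for this sensor
        equalized[sensor] = [list(v[:min_len]) for v in vectors]
    return equalized


def build_unit_vector_set(
    equalized: Dict[str, List[List[float]]],
) -> List[Tuple[List[float], ...]]:
    """Pair the k-th unit of every sensor into one sample (hypothetical layout)."""
    sensors = sorted(equalized)  # fixed sensor order for every sample
    count = min(len(equalized[s]) for s in sensors)
    return [tuple(equalized[s][k] for s in sensors) for k in range(count)]
```

Resampling each unit to a common length would be an alternative to truncation; the sketch keeps truncation because it matches "minimum length" most directly.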

[0043] That is, in an optimal embodiment (best mode), the unit vectorizer 205 vectorizes the time series walking data in a unit format by performing a convolution operation on the average of N pressure values and a Gaussian function. The convolution operation is calculated as

[00005] $Z(t) = (\bar{X} * y)(t) = \int_{0}^{t} \bar{X}(\tau)\, y(t - \tau)\, d\tau$

(where $\bar{X}(t) = \frac{1}{N} \sum_{i=1}^{N} X_{i}(t)$ indicates the average of the N pressure values,

[00006] $y(t) = \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{-\frac{t^{2}}{2 \sigma^{2}}}$

indicates the Gaussian function, $X_{i}(t)$ indicates the N pressure values, and σ indicates the standard deviation of the Gaussian function). In addition, for each foot, a sorted list [t0, t1, . . . , ti, . . . ] is obtained by applying

[00007] $\frac{d}{dt} Z(t) = 0 \quad \text{and} \quad \frac{d^{2}}{dt^{2}} Z(t) > 0$

with respect to every time t, and [text missing or illegible when filed] the time series walking data is defined as a vectorized time series in a unit format (where [text missing or illegible when filed]). Since the sampling rate of the insole is 100 Hz and a standard length is defined as [text missing or illegible when filed], a discontinuous variable is calculated according to [text missing or illegible when filed] (where [text missing or illegible when filed] indicates a unit step of both feet of each participant).
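The smoothing and step segmentation of paragraph [0043] can be sketched in Python under several stated assumptions: the integral is evaluated with a rectangle rule at the 100 Hz sampling interval, the conditions dZ/dt = 0 and d²Z/dt² > 0 are approximated by a sample that is lower than its left neighbour and not higher than its right neighbour, and each unit step between consecutive minima is linearly resampled to a standard length. Because σ and the standard length are illegible in the filing, `sigma` and `standard_length` are hypothetical parameters, as are all function names.

```python
import math
from typing import List, Sequence

FS_HZ = 100.0     # insole sampling rate stated in the text
DT = 1.0 / FS_HZ  # sampling interval in seconds


def gaussian(t: float, sigma: float) -> float:
    """y(t) = exp(-t^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return math.exp(-t * t / (2.0 * sigma * sigma)) / math.sqrt(2.0 * math.pi * sigma * sigma)


def average_pressure(channels: Sequence[Sequence[float]]) -> List[float]:
    """X_bar(t): mean of the N pressure channels at every sample."""
    n = len(channels)
    return [sum(sample) / n for sample in zip(*channels)]


def smooth(x_bar: Sequence[float], sigma: float) -> List[float]:
    """Rectangle-rule version of Z(t) = integral_0^t X_bar(tau) y(t - tau) dtau."""
    z = []
    for k in range(len(x_bar)):
        acc = sum(x_bar[j] * gaussian((k - j) * DT, sigma) for j in range(k + 1))
        z.append(acc * DT)
    return z


def local_minima(z: Sequence[float]) -> List[int]:
    """Sorted indices t0, t1, ... approximating dZ/dt = 0 and d2Z/dt2 > 0."""
    return [k for k in range(1, len(z) - 1) if z[k] < z[k - 1] and z[k] <= z[k + 1]]


def resample(segment: Sequence[float], length: int) -> List[float]:
    """Linear interpolation of a segment onto `length` evenly spaced points."""
    if length == 1 or len(segment) == 1:
        return [segment[0]] * length
    last = len(segment) - 1
    out = []
    for i in range(length):
        pos = i * last / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(segment[lo] * (1.0 - frac) + segment[hi] * frac)
    return out


def unit_steps(signal: Sequence[float], boundaries: Sequence[int],
               standard_length: int) -> List[List[float]]:
    """Cut the signal between consecutive minima and resample each unit step."""
    return [resample(signal[a:b + 1], standard_length)
            for a, b in zip(boundaries, boundaries[1:])]
```

For a periodic test signal with a 50-sample period, `local_minima(smooth(...))` returns minima roughly 50 samples apart, and every unit step comes out with exactly `standard_length` samples.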

[0044] FIGS. 3 and 4 are a block diagram and a schematic diagram of an ensemble learner, according to an embodiment of the present disclosure. Referring to FIGS. 3 and 4, the ensemble learner 300 may include a CNN set constructor 305, an RNN set constructor 310, and a CNN-RNN set constructor 315.

[0045] In more detail, accurately extracting a user's walking feature data would ordinarily require a large amount of the user's walking data. However, it is difficult to train a CNN or an RNN by collecting the user's walking data in advance under every environmental condition. Therefore, the network is trained by applying an ensemble model to CNN learning and RNN learning that use only a small amount of the user's walking data, so that feature data of the user's walking can be derived under various changing environmental conditions.

[0046] In more detail, according to an embodiment of the present disclosure, the ensemble learner 300 applies an ensemble learning model to a fully connected network and trains it. The CNN set constructor 305 constructs a CNN-series learning_vector_data set derived through the CNN series learning based on a minimum unit format data set generated by the preprocessor 200. The RNN set constructor 310 constructs an RNN-series learning_vector_data set derived through the RNN series learning based on the minimum unit format data set. In addition, in a test stage, the CNN-RNN set constructor 315 constructs an average data vector set of the CNN-series learning_vector_data set and the RNN-series learning_vector_data set, and constructs from the average data vector set a final walking data set for identifying the user.

[0047] In an optimal embodiment (best mode) of the present disclosure, the CNN series learning and the RNN series learning are independently trained using the CNN-series learning_vector_data set and the RNN-series learning_vector_data set, respectively, and, in a test stage, an individual's softmax scores are calculated by averaging the softmax scores of the CNN and the RNN. That is, the CNN series learning or the RNN series learning is defined as [text missing or illegible when filed] (for tri-modal sensing), where a unit step [text missing or illegible when filed] in a standard format, an acceleration [text missing or illegible when filed], and a rotation [text missing or illegible when filed] are used as inputs, and an output of the model is a softmax probability vector u. In addition, the CNN series learning and the RNN series learning aggregate the CNN and RNN predictions into an average ensemble model that provides one final prediction, and the average probability of the CNN and the RNN is calculated according to

[00008] $M_{\mathrm{CNN+RNN}} = \frac{1}{2}\left(M_{\mathrm{CNN}} + M_{\mathrm{RNN}}\right)$

(where $M_{\mathrm{CNN}}$ indicates a case where only the CNN is activated, $M_{\mathrm{RNN}}$ indicates a case where only the RNN is activated, and $M_{\mathrm{CNN+RNN}}$ indicates a case where both the CNN and the RNN are activated). In addition, the ensemble learning model embeds the CNN-series learning_vector_data set or the RNN-series learning_vector_data set as vectors of 128 units, a CNN-series learning_vector or an RNN-series learning_vector is connected to a fully connected network layer to form embedding vectors of 256 units, and the embedding vectors are normalized to the same value.
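The average ensemble of paragraph [0047] can be sketched as follows, assuming that each network outputs one logit per enrolled user and that averaging means the element-wise mean of the two softmax vectors, as in equation [00008]. The function names are hypothetical, and the CNN and RNN networks themselves (and the 128/256-unit embeddings) are not shown.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one score per enrolled user."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_average(p_cnn: Sequence[float], p_rnn: Sequence[float]) -> List[float]:
    """M_CNN+RNN = (M_CNN + M_RNN) / 2, applied element-wise."""
    if len(p_cnn) != len(p_rnn):
        raise ValueError("CNN and RNN must score the same set of users")
    return [0.5 * (a + b) for a, b in zip(p_cnn, p_rnn)]


def predict_user(cnn_logits: Sequence[float], rnn_logits: Sequence[float]) -> int:
    """Index of the user with the highest averaged softmax probability."""
    probs = ensemble_average(softmax(cnn_logits), softmax(rnn_logits))
    return max(range(len(probs)), key=probs.__getitem__)
```

In `predict_user([2.0, 1.0, 0.0], [0.0, 3.0, 0.0])` the CNN alone favours user 0, but the averaged probabilities favour user 1, which shows how a confident RNN can outweigh a less confident CNN in the final prediction.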

[0048] FIGS. 5 and 6 are a block diagram and a schematic diagram of a few-shot learner according to an embodiment of the present disclosure. Referring to FIGS. 5 and 6, the few-shot identifier 400 includes a few-shot learner 405.

[0049] In embodiments of the present disclosure, the few-shot identifier 400 includes a few-shot learner 405 that utilizes a Support Vector Machine (SVM) to exclude walking data irrelevant to the user's walking data based on an inadequate information (few-shot data) set related to the user's walking data.

[0050] Here, the few-shot learner 405 includes the inadequate information (few-shot data) set related to the user's walking data, and this set includes, within the user-related data, data that is not walking data (unknown known data) and non-user data (unknown unknown data). Therefore, the few-shot learner 405 utilizes the SVM to exclude the inadequate information (few-shot data) set related to the user's walking data and the non-user data (unknown unknown data).

[0051] In an optimal embodiment of the few-shot learner 405, the few-shot learner sets a vector set [text missing or illegible when filed] in a unit format randomly selected from the non-user data (unknown unknown data) (where [text missing or illegible when filed]) and calculates the center of n embedding vectors from an embedding vector set generated by either one of the CNN and RNN networks, the center of the embedding vectors being

[00009] $M_{a} = \frac{1}{n} \sum_{i=1}^{n} V_{i,a}$

The center of the embedding vectors and $\{V_{i,a} \mid 1 \leq i \leq n\}$ are input conditions, and, to solve the optimization, the SVM applies

[00010] $\min_{\alpha} \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{i} \alpha_{j} K(V_{i,a}, V_{j,a})$

with the kernel K (where

[00011] $0 \leq \alpha_{i} \leq \frac{1}{\nu n}, \quad \sum_{i=1}^{n} \alpha_{i} = 1,$

$K(V, V') = \exp\left(-\gamma \lVert V - V' \rVert^{2}\right)$ indicates a radial basis kernel function, $\alpha_{i}$ indicates a Lagrange multiplier, and γ and ν indicate hyperparameters).
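The optimization of paragraph [0051] has the form of the standard one-class SVM dual. The sketch below solves that dual with projected gradient descent in plain Python rather than with a dedicated SVM library. The solver, the class name `OneClassSvmSketch`, the offset ρ recovered from the free multipliers, and the decision rule (sum of α_j K(V_j, V) minus ρ, non-negative meaning "keep") are standard one-class SVM elements supplied as assumptions, not details taken from the filing. The center M_a of equation [00009] is computed by `center`, although this generic solver does not need it.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf(u: Vector, v: Vector, gamma: float) -> float:
    """Radial basis kernel K(V, V') = exp(-gamma * ||V - V'||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def center(vectors: Sequence[Vector]) -> List[float]:
    """M_a = (1/n) * sum_i V_{i,a}."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def project_capped_simplex(y: Sequence[float], cap: float) -> List[float]:
    """Euclidean projection onto {a : 0 <= a_i <= cap, sum_i a_i = 1} by bisection on a shift."""
    lo, hi = min(y) - cap, max(y)  # clipped sum is n*cap >= 1 at lo and 0 at hi
    for _ in range(60):
        tau = 0.5 * (lo + hi)
        if sum(min(max(v - tau, 0.0), cap) for v in y) > 1.0:
            lo = tau
        else:
            hi = tau
    tau = 0.5 * (lo + hi)
    return [min(max(v - tau, 0.0), cap) for v in y]


class OneClassSvmSketch:
    def __init__(self, gamma: float, nu: float, iters: int = 2000) -> None:
        if not 0.0 < nu <= 1.0:
            raise ValueError("nu must lie in (0, 1]")
        self.gamma, self.nu, self.iters = gamma, nu, iters

    def fit(self, vectors: Sequence[Vector]) -> "OneClassSvmSketch":
        n = len(vectors)
        self.sv = [list(v) for v in vectors]
        cap = 1.0 / (self.nu * n)  # box constraint 0 <= alpha_i <= 1/(nu n)
        kmat = [[rbf(a, b, self.gamma) for b in self.sv] for a in self.sv]
        alpha = [1.0 / n] * n  # feasible starting point
        step = 1.0 / n  # kernel entries are <= 1, so the largest eigenvalue is <= n
        for _ in range(self.iters):
            # Gradient of 1/2 alpha^T K alpha is K alpha.
            grad = [sum(kmat[i][j] * alpha[j] for j in range(n)) for i in range(n)]
            alpha = project_capped_simplex([a - step * g for a, g in zip(alpha, grad)], cap)
        self.alpha = alpha
        g = [sum(kmat[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        free = [g[i] for i in range(n) if 1e-9 < alpha[i] < cap - 1e-9]
        if free:  # free multipliers lie exactly on the boundary
            self.rho = sum(free) / len(free)
        else:  # otherwise take the midpoint of the interval allowed by the KKT conditions
            at_cap = [g[i] for i in range(n) if alpha[i] >= cap - 1e-9]
            at_zero = [g[i] for i in range(n) if alpha[i] <= 1e-9]
            self.rho = 0.5 * (max(at_cap, default=min(g)) + min(at_zero, default=max(g)))
        return self

    def decision_function(self, v: Vector) -> float:
        return sum(a * rbf(s, v, self.gamma) for a, s in zip(self.alpha, self.sv)) - self.rho

    def predict(self, v: Vector) -> int:
        """+1: consistent with the training embeddings, -1: excluded."""
        return 1 if self.decision_function(v) >= 0.0 else -1
```

A vector far from every training embedding receives a kernel sum near zero and is therefore rejected, which is the exclusion behaviour described for the few-shot learner 405.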

[0052] FIG. 7 is a graph showing a comparison of learning results of an ensemble model, a CNN model, and an RNN model according to an embodiment of the present disclosure. Referring to FIG. 7, the distribution of accuracy (ACC) as a function of γ and ν is shown for the CNN, RNN, and ensemble models. The selection of γ and ν is an important condition for the overall recognition accuracy. FIG. 7, according to an optimal embodiment of the present disclosure, compares the regions in which ACC is 90% or more (light green to yellow regions) and shows that this region is wider for the ensemble model than for the CNN and RNN models. This indicates that the ensemble model depends only weakly on the selection of γ and ν, which improves the robustness of the recognition result.
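The area comparison of FIG. 7 can be made concrete as the fraction of a (γ, ν) grid whose accuracy reaches 90%. The sketch below assumes the accuracies have already been measured on a regular grid; the function name is hypothetical, and the numbers in the usage line are placeholders, not results from the disclosure.

```python
from typing import Sequence


def high_accuracy_fraction(acc_grid: Sequence[Sequence[float]], threshold: float = 0.9) -> float:
    """Share of (gamma, nu) grid cells whose accuracy is at least `threshold`.

    acc_grid[r][c] is the accuracy measured for the r-th gamma value and c-th nu value.
    """
    cells = [acc for row in acc_grid for acc in row]
    if not cells:
        raise ValueError("accuracy grid is empty")
    return sum(1 for acc in cells if acc >= threshold) / len(cells)


# Placeholder accuracies for illustration only.
print(high_accuracy_fraction([[0.95, 0.91], [0.85, 0.70]]))  # prints 0.5
```

Under this reading, a model with a larger fraction depends less on the choice of γ and ν, which is the interpretation given to FIG. 7.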

[0053] In embodiments of the present disclosure, a method for the multi-modal few-shot learning device for user identification using a walking pattern based on a deep learning ensemble includes: collecting a user's walking data from a smart insole including any one or more of a pressure sensor, an acceleration sensor, and a gyro sensor; converting a series of time series walking data obtained from each of the sensors included in the smart insole into a unit format data set; training a network based on an ensemble learning model for extracting the user's features from the unit format data set; recognizing and excluding walking data having information irrelevant to the user; and extracting the user's feature walking data to identify the user from the walking data through the network trained with the ensemble learning model.

[0054] According to the multi-modal few-shot learning device for user identification using a walking pattern based on deep learning ensemble as described above, the following effects are obtained. First, by recognizing and excluding meaningless walking information through a neural network trained with an ensemble learning model, it is possible to extract only feature data that enables identification of a user. Second, by extracting the user's walking feature data without the meaningless information, it is possible to improve the probability of identifying the user. Third, the learning effect is enhanced by using an ensemble model network that connects CNN and RNN networks. Fourth, it is possible to extract, from walking data, feature walking data that enables identification of an individual even when walking conditions vary.

[0055] Although the preferred embodiments of the present disclosure have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention as disclosed in the accompanying claims. Therefore, the above descriptions and the attached drawings should be interpreted as exemplifying the present disclosure rather than limiting the spirit of the present disclosure.

[Detailed Description of Main Elements]
100: walking data collector
200: preprocessor
205: unit vectorizer
210: unit minimum length vectorizer
215: unit vector set part
300: ensemble learner
305: CNN set constructor
310: RNN set constructor
315: CNN-RNN set constructor
400: few-shot identifier
405: few-shot learner
500: output part

CPC classification
G06N3/044 (PHYSICS)
G06V10/82 (PHYSICS)
G06F18/254 (PHYSICS)
G06N3/08 (PHYSICS)
A61B5/7264 (HUMAN NECESSITIES)
A61B5/112 (HUMAN NECESSITIES)
G06N3/045 (PHYSICS)
G06V40/25 (PHYSICS)
G06F18/24137 (PHYSICS)

International classification
G06N3/08 (PHYSICS)
G06N3/04 (PHYSICS)
