Method and system for CSI-based fine-grained gesture recognition
10592736 · 2020-03-17
Assignee
Inventors
Cpc classification
G06F40/232
PHYSICS
International classification
Abstract
The invention provides a method for CSI-based fine-grained gesture recognition, wherein the method comprises the following steps: determining a start point, an end point, a velocity, a direction and/or an inflection point of at least one stroke gesture in multiple dimensions according to an eigenvalue of channel state information; dividing the strokes according to the start point, the end point, the velocity, the direction and/or the inflection point of the stroke using a machine learning method and forming a stroke sequence; building a stroke decipherment model according to frequencies of the strokes appearing in natural language rules and/or scientific language rules and/or connection rules between the strokes; and dividing and recognizing the stroke sequence as a letter sequence, a radical sequence, a numeral sequence and/or a pattern sequence conforming to the natural language rules and/or the scientific language rules using the stroke decipherment model. The present invention involves recognizing strokes of characters from finger gesture, and then recovering characters from the strokes, so as to enrich types of languages that can be recognized from finger gesture and enhance recognition accuracy of gesture writing.
Claims
1. A method for CSI-based fine-grained gesture recognition, being characterized in that the method comprises steps of: determining a start point, an end point, velocity, direction and/or an inflection point of at least one stroke gesture in multiple dimensions according to an eigenvalue of channel state information; classifying strokes according to the start point, the end point, the velocity, the direction and/or the inflection points of the stroke gesture and forming a stroke sequence; building a stroke decipherment model according to frequencies of the strokes appearing in natural language rules and/or scientific language rules and/or connection rules between the strokes; and dividing and recognizing the stroke sequence as a letter sequence, a radical sequence, a numeral sequence and/or a pattern sequence conforming to the natural language rules and/or the scientific language rules using the stroke decipherment model.
2. The method of claim 1, being characterized in that the method further comprises the following step: building an error-correction model according to a logic sequence of the natural language rules and/or the scientific language rules, wherein the error-correction model evaluates and corrects the letter sequence, the radical sequence, the numeral sequence and/or the pattern sequence according to the logic sequence, and recognizes the letter sequence, the radical sequence, the numeral sequence and/or the pattern sequence as phrases, words, numeral codes and/or patterns arranged in order.
3. The method of claim 2, being characterized in that the step of determining the direction of the stroke gesture comprises: eliminating phase shift using a linear transformation method; selecting a signal that contains no multipath effects according to an arrival time of the signal; building a CSI data matrix and determining an incidence angle of the signal; and determining the direction of the velocity of the stroke gesture according to the incidence angle of the signal, the signal path and the stroke gesture displacement; wherein the linear transformation method for eliminating the phase shift is: determining a measured phase $\hat{\phi}_i$ of an $i$-th sub-carrier through
4. The method of claim 3, being characterized in that the method further comprises the following steps: establishing a mapping relationship between the variation pattern of the channel state information and the strokes; determining corresponding initial strokes according to the variation pattern of the channel state information; determining connection relationship between adjacent initial strokes according to start points, end points, velocities, directions and/or the inflection points of the initial strokes, thereby determining the true strokes and types of the true strokes according to the connection relationship.
5. The method of claim 4, being characterized in that the method comprises the following steps: determining a multi-dimensional stroke track of the strokes according to the start point, the end point, the stroke time, the velocity, the direction and/or the inflection points of the stroke gesture; and determining the true strokes and types thereof according to a matching level between the stroke tracks and the initial strokes.
6. The method of claim 5, being characterized in that: in case the matching level between the stroke track and the initial strokes is not smaller than a predetermined threshold, determining the initial strokes as the true strokes; or in case the matching level between the stroke track and the initial strokes is smaller than the predetermined threshold, performing stroke analysis according to the start point, the end point, the velocity, the direction and/or the inflection points of the stroke gesture using a machine learning method, then determining the true strokes, and classifying the strokes.
7. The method of claim 6, being characterized in that the step of extracting the eigenvalue of the channel state information comprises: collecting a first channel state information of at least one sub-carrier; processing the first channel state information into a second channel state information by means of denoising and/or slicing; performing multi-dimensional measurement on the second channel state information; calibrating the second channel state information; and extracting an eigenvalue of variation in the second channel state information.
8. The method of claim 2, being characterized in that the step of extracting the eigenvalue of the channel state information comprises: collecting a first channel state information of at least one sub-carrier; processing the first channel state information into a second channel state information by means of denoising and/or slicing; performing multi-dimensional measurement on the second channel state information; calibrating the second channel state information; and extracting an eigenvalue of variation in the second channel state information.
9. The method of claim 3, being characterized in that the step of extracting the eigenvalue of the channel state information comprises: collecting a first channel state information of at least one sub-carrier; processing the first channel state information into a second channel state information by means of denoising and/or slicing; performing multi-dimensional measurement on the second channel state information; calibrating the second channel state information; and extracting an eigenvalue of variation in the second channel state information.
10. The method of claim 4, being characterized in that the step of extracting the eigenvalue of the channel state information comprises: collecting a first channel state information of at least one sub-carrier; processing the first channel state information into a second channel state information by means of denoising and/or slicing; performing multi-dimensional measurement on the second channel state information; calibrating the second channel state information; and extracting an eigenvalue of variation in the second channel state information.
11. The method of claim 5, being characterized in that the step of extracting the eigenvalue of the channel state information comprises: collecting a first channel state information of at least one sub-carrier; processing the first channel state information into a second channel state information by means of denoising and/or slicing; performing multi-dimensional measurement on the second channel state information; calibrating the second channel state information; and extracting an eigenvalue of variation in the second channel state information.
12. The method of claim 1, being characterized in that the step of extracting the eigenvalue of the channel state information comprises: collecting a first channel state information of at least one sub-carrier; processing the first channel state information into a second channel state information by means of denoising and/or slicing; performing multi-dimensional measurement on the second channel state information; calibrating the second channel state information; and extracting an eigenvalue of variation in the second channel state information.
13. The method of claim 12, being characterized in that, the step of extracting the eigenvalue of variation in the second channel state information comprises: performing inverse Fourier transform on at least one said second channel state information; and extracting an eigenvalue of at least one stroke gesture using a discrete wavelet transform method.
14. A system for CSI-based fine-grained gesture recognition, being characterized in that the system at least comprises: a signal acquisition module (10), an eigenvalue extraction module (30), a stroke classifying module (40) and a letter recognizing module (50), wherein the signal acquisition module (10) collects channel state information (CSI), the eigenvalue extraction module (30) extracts an eigenvalue of the channel state information; the stroke classifying module (40) determines a start point, an end point, a velocity, a direction and/or an inflection point of at least one stroke gesture in multiple dimensions according to the eigenvalue of channel state information, and the stroke classifying module (40) classifies strokes according to the start point, the end point, the velocity, the direction and/or the inflection points of the stroke gesture using a machine learning method and forms a stroke sequence; the letter recognizing module (50) has a stroke decipherment model built according to frequencies of the strokes appearing in natural language rules and/or scientific language rules and/or connection rules between the strokes; and the letter recognizing module (50) determines a start point, an end point, a velocity, a direction and/or an inflection point of at least one stroke gesture in multiple dimensions according to the eigenvalue of channel state information, and divides and recognizes the stroke sequence as a letter sequence, a radical sequence, a numeral sequence and/or a pattern sequence conforming to the natural language rules and/or the scientific language rules using the stroke decipherment model.
15. The system of claim 14, being characterized in that the system further comprises an error-correction module (60), wherein the error-correction module (60) has an error-correction model built according to logic sequences of the natural language rules and/or the scientific language rules, wherein the error-correction module (60) evaluates and corrects the letter sequences, radical sequences, numeral sequences and/or pattern sequences according to the logic sequences, and recognizes the letter sequences, radical sequences, numeral sequences and/or pattern sequences as phrases, words, numeral codes and/or patterns that are arranged in order.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
DETAILED DESCRIPTIONS OF THE INVENTION
(10) To further illustrate the means and functions by which the present invention achieves its objectives, the following description, in conjunction with the accompanying drawings and preferred embodiments, sets forth the implementation, structure, features and effects of the subject matter of the present invention.
(11) In the present invention, channel state information (CSI) is physical-layer information at the sub-carrier level that describes the channel properties of a communication link. It describes how a signal coming from a transmitting end arrives at a receiving end through air propagation, and reflects the attenuation coefficient of the signal along every transmission path.
(12) The present invention is applicable to characters of various natural languages, such as Simplified Chinese, Traditional Chinese, English and Latin and the like, and also applicable to symbolic characters such as punctuation marks, geometric figures, mathematical formulas and chemical formulas that contain numbers and symbols.
(13) In the present invention, the term "stroke" refers to strokes of characters, numerals and patterns.
(14) In the present invention, a word refers to a single character, while radicals are the components that form a character.
Embodiment 1
(15) Referring to
(16) S1: collecting channel state information (CSI);
(17) S2: extracting an eigenvalue of the channel state information;
(18) S3: determining a type of a stroke according to connection between the eigenvalue and stroke gesture and building a stroke sequence; using a stroke decipherment model to recognize the stroke sequence as a letter sequence, radical sequence, numeral sequence and/or pattern sequence arranged according to logic rules; building an error-correction model according to natural language rules and/or scientific language rules to calibrate at least one letter sequence, radical sequence, numeral sequence and/or pattern sequence.
(19) The steps of the disclosed method for CSI-based fine-grained gesture recognition are described in detail below.
(20) S1: collecting channel state information (CSI).
(21) The present invention uses the existing wireless network and equipment to recognize hand-written characters, numerals and patterns, thereby providing the function of smart text input.
(22) In the present invention, the wireless receiving equipment comprises a wireless signal receiving device. The wireless signals may be signals transmitted over constant waves (e.g., RFID signals), fast-changing transmission signals (e.g., TV signals) or signals transmitted intermittently (e.g., Wi-Fi signals).
(23) The example below explains the present invention with Wi-Fi signals. Preferably, a wireless network interface card is used to receive the Wi-Fi signals. By modifying the kernel, the present invention can use a low-cost commercial wireless network interface card to acquire channel state information (CSI) in the form of sub-carriers transmitted through orthogonal frequency-division multiplexing (OFDM).
(24) Preferably, collecting channel state information comprises steps of S11 through S13.
(25) S11: collecting a first channel state information of at least one sub-carrier.
(26) For example, a wireless access point is taken as the transmitting end of Wi-Fi signals. The transmitting end transmits Wi-Fi signals to a receiving end equipped with an Intel iwl 5300 card. Based on the orthogonal frequency-division multiplexing (OFDM) physical layer of the wireless network, and exploiting the multipath effect in a complex indoor environment, first channel state information of plural sub-carriers can be acquired at the receiving end and used as a distinctive signature.
(27) In the present invention, the first channel state information refers to the initially collected channel state information that has not yet undergone any processing.
(28) Preferably, a wireless router and the wireless network interface card are both placed on a table at a distance of 1 to 2 meters from each other. The closer the wireless router and the wireless network interface card are, the more accurate the resulting recognition of the stroke gesture is. During writing, the user should keep the writing hand on the line between the wireless router and the wireless network interface card, i.e., in the LOS (Line-Of-Sight) propagation path of the wireless signals. The exact position of the user along the path has no significant influence on the accuracy of stroke gesture determination.
(29) Preferably, the channel state information of at least one sub-carrier is demodulated into the first channel state information.
(30) During transmission of data packets in the orthogonal frequency-division multiplexing (OFDM) system, wireless signals are demodulated and output directly to a decoder. The decoder decodes the wireless signals to obtain the demodulated first channel state information of each sub-carrier, from which the amplitude and phase of the corresponding signals are calculated.
(31) S12: performing denoising on the first channel state information.
(32) Denoising channel state information may be achieved by denoising using a filter and/or using a matrix analysis method.
(33) The present invention uses a low-pass Butterworth filter and the robust PCA method to denoise the amplitude of the first channel state information. Because the Butterworth low-pass filter does not greatly distort the phase information of the original signal, and its passband has a flat amplitude response, the signal phase variation caused by the stroke gesture is preserved to the greatest extent. For this reason, a Butterworth low-pass filter is used to filter the first channel state information. By setting reasonable parameters for the filter, most high-frequency ambient noise in the signals can be removed.
(34) If the filtered signal still contains some noise, the robust PCA method may be used for further denoising. Compared with classic PCA, the robust PCA method implemented by the present invention removes noise more efficiently while keeping the effective information. After denoising by the low-pass filter and the robust PCA method, the first channel state information contains less noise and is therefore more accurate.
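The low-pass filtering step can be illustrated with a minimal sketch. It assumes a second-order Butterworth design obtained through the bilinear transform; the filter order, the 100 Hz sampling rate and the 10 Hz cutoff used in the usage note are illustrative assumptions, since the text does not state the filter parameters. The function names are hypothetical, and the robust PCA stage is not shown.

```python
import math


def butterworth_lowpass_coeffs(cutoff_hz, sample_rate_hz):
    """Second-order Butterworth low-pass biquad (bilinear transform).

    The order and the parameters are assumptions for illustration only.
    """
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate_hz
    q = 1.0 / math.sqrt(2.0)  # Q = 1/sqrt(2) yields the Butterworth response
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 - cos_w0) / 2.0 / a0, (1.0 - cos_w0) / a0, (1.0 - cos_w0) / 2.0 / a0]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def lowpass_filter(samples, cutoff_hz, sample_rate_hz):
    """Apply the biquad to a sequence of CSI amplitude samples."""
    b, a = butterworth_lowpass_coeffs(cutoff_hz, sample_rate_hz)
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

For example, `lowpass_filter(amplitudes, 10.0, 100.0)` would smooth one sub-carrier's amplitude series sampled at 100 Hz. A single forward pass adds some phase delay; a forward-backward pass could be used where zero delay matters.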
(35) S13: slicing the denoised first channel state information using an adjustable threshold, so as to generate second channel state information.
(36) Slicing facilitates extraction of the eigenvalue of the channel state information, so that the amplitude and phase of the second channel state information can be determined more clearly. In a static environment, the second channel state information has a stable waveform, whereas movements of a stroke gesture cause significant changes in the amplitude and phase of the channel state information. The present invention uses slicing based on valuation of window scalability to acquire the sections of the second channel state information that correspond to the writing of a single stroke. The initial value of the threshold is set as:
(37)
wherein $\mu_w$ and $\sigma_w$ are the mean value and standard deviation, respectively, in the stroke-writing state, and $\mu_s$ and $\sigma_s$ are the mean value and standard deviation, respectively, when the gesture is in its still state.
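The slicing idea can be sketched as follows. This is only an illustration under stated assumptions: activity is measured as the standard deviation inside a fixed-length sliding window, and the threshold is passed in by the caller rather than computed from the initial-value formula of the text. The function name and parameters are hypothetical.

```python
from statistics import pstdev


def slice_active_segments(samples, window, threshold):
    """Return half-open (start, end) index ranges where the windowed
    standard deviation of the samples exceeds the threshold."""
    active = [
        pstdev(samples[i:i + window]) > threshold
        for i in range(len(samples) - window + 1)
    ]
    segments = []
    start = None
    for i, flag in enumerate(active):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            # The last active window started at i - 1 and covers
            # samples up to index i + window - 2 inclusive.
            segments.append((start, i + window - 1))
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments
```

Each returned range marks a candidate section of channel state information corresponding to one written stroke; the threshold could then be adjusted and the slicing repeated if the sections are too short or too long.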
(38) S2: extracting an eigenvalue of the channel state information.
(39) In the process of performing stroke gesture recognition, it is impossible to recognize the slices of the stroke gesture directly, and an eigenvalue has to be extracted for recognition.
(40) Preferably, multi-dimensional measurement and calibration are performed on the second channel state information of the slices of the stroke gesture.
(41) S21: performing multi-dimensional measurement on the second channel state information.
(42) For example, three-dimensional measurement is performed on the second channel state information. Preferably, the channel state information is a three-dimensional matrix of size Ntx × Nry × N (Ntx representing the number of transmitting antennas, Nry representing the number of receiving antennas, and N in the third dimension representing the number of sub-carriers).
(43) S22: calibrating the second channel state information.
(44) Calibration is performed on the distorted channel state data to remove channel state data that lack dimensions or contain obviously incorrect values, thereby calibrating all data of the second channel state information and keeping only the accurate channel state data.
(45) S23: extracting the eigenvalue of variation of the second channel state information.
(46) Preferably, the steps of extracting the eigenvalue of variation of the second channel state information comprise:
(47) performing inverse Fourier transform on at least one said second channel state information, and extracting the eigenvalue of at least one stroke gesture using the discrete wavelet transform method.
(48) Particularly, inverse Fourier transform is performed on the calibrated second channel state information of the stroke gesture slices, and discrete wavelet transform is performed on the sectioned channel state information to extract the variation pattern of the channel state information and its eigenvalue of the stroke gesture.
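The wavelet part of this step can be illustrated with a short sketch that keeps only the approximation coefficients of a multi-level discrete wavelet transform. The Haar wavelet is an assumption made here for simplicity, since the text does not name the wavelet family; the function names are hypothetical, and the inverse Fourier transform stage is not shown.

```python
import math


def haar_dwt_level(signal):
    """One Haar DWT level: returns (approximation, detail) coefficients."""
    values = list(signal)
    if len(values) % 2:
        values.append(values[-1])  # pad odd-length input with its last sample
    scale = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / scale for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / scale for i in range(0, len(values), 2)]
    return approx, detail


def approximation_features(signal, levels):
    """Repeatedly decompose and keep only the approximation coefficients."""
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, _detail = haar_dwt_level(approx)
    return approx
```

Each level halves the length of the segment, so a few levels give a compact vector that still follows the overall shape of the CSI variation.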
(49) According to a preferred embodiment of the present invention, a connection relationship between the stroke gesture and the variation pattern of the channel state information is established. The present invention uses the random forest algorithm to train on the variation patterns of channel state information of known stroke gesture samples, so as to obtain a stroke decipherment model. The random forest algorithm is a multi-class classification algorithm that can classify eigenvalues of channel states into 7 predefined strokes. The class labels corresponding to the 7 strokes are defined as $Y = [1, 7]$, and $N = 500$ decision trees $T = \{T_1, T_2, \ldots, T_N\}$ are selected to build the random forest, as shown in
(50) Section $x$ of the test CSI represents one of the strokes. The random forest model asks each decision tree $T_i \in T$ to vote for the class label $y \in Y$ of Section $x$ of the test CSI. After all $N$ trees have voted, the random forest classifies the test CSI Section $x$ into one of the strokes by giving the class prediction $C_{rf}^{N}(x)$, which is the majority vote over the 7 class labels of all the trees. The class prediction $C_{rf}^{N}(x)$ is calculated as
$C_{rf}^{N}(x) = \mathrm{Majority}\{C_i(x)\}_{1}^{N}$;
wherein $C_i(x)$ is the prediction of the tree $T_i$. In addition to the predicted class label of the written stroke, the voting probability of each of the 7 strokes is obtained through the random forest method. The voting probability of the stroke class label $y$ for the CSI Section $x$ is the number of trees for which $C_i(x) = y$, divided by the total number of trees $N$, calculated as
(51)
(52) For channel state information of an unknown hand-written stroke, the present invention first acquires the variation patterns of channel state information of the written stroke, and inputs it to the trained stroke decipherment model to obtain a classification result, thereby achieving determination of the handwritten letter stroke.
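The majority vote and the per-stroke voting probability described above can be sketched as follows. The sketch assumes the individual tree predictions are already available as a list of class labels; training the trees is not shown, and the function name is hypothetical.

```python
from collections import Counter


def forest_vote(tree_predictions, labels):
    """Return (majority label, voting probability per label).

    The voting probability of a label is the number of trees that
    predicted it divided by the total number of trees.
    """
    n_trees = len(tree_predictions)
    counts = Counter(tree_predictions)
    probabilities = {label: counts.get(label, 0) / n_trees for label in labels}
    # Ties are broken in favor of the label listed first.
    majority_label = max(labels, key=lambda label: counts.get(label, 0))
    return majority_label, probabilities
```

With five trees voting `[1, 3, 3, 7, 3]` over labels 1 to 7, stroke 3 wins the vote with a voting probability of 3/5. These probabilities are what a later sequence model can consume instead of a single hard label.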
(53) According to a preferred embodiment, the present invention can also determine feature data of stroke gesture according to the mean value, standard deviation, mean absolute deviation and/or maximum of the second channel state information. The class of the stroke gesture can be determined according to the feature data.
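As a minimal sketch, the four statistics named above can be computed for one sliced CSI amplitude segment as shown below. The use of the population standard deviation is an assumption, since the text does not say which estimator is used; the function name and dictionary keys are hypothetical.

```python
from statistics import fmean, pstdev


def csi_segment_features(amplitudes):
    """Mean, standard deviation, mean absolute deviation and maximum
    of one CSI amplitude segment."""
    mean_value = fmean(amplitudes)
    deviations = [abs(a - mean_value) for a in amplitudes]
    return {
        "mean": mean_value,
        "std": pstdev(amplitudes),  # population estimator (assumption)
        "mad": fmean(deviations),
        "max": max(amplitudes),
    }
```

The resulting feature dictionary could be flattened into a vector and passed to the classifier in place of, or together with, the wavelet features.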
(54) The mean value reflects the central tendency of the channel state data. The standard deviation measures the dispersion of the channel state data. The mean absolute deviation measures the average absolute distance of the sequence of channel state data from its mean. The maximum reflects the level of the extreme value. Through discrete wavelet transform (DWT), a discrete CSI amplitude segment $x$ of length $N$ in $\ell^2(\mathbb{Z})$ can be represented using the approximation functions $g$ and the wavelet functions $h$:
(55)
wherein $j_0$ is an arbitrary value,
(56)
Discrete wavelet transform (DWT) decomposes the original CSI signal into approximation coefficients and detail coefficients. The present invention uses approximation coefficients to perform feature extraction because they describe the shape of the signal maintaining the CSI variation pattern. Since basic wavelet functions {g.sub.i.sup.(j.sup.
(57)
(58) Preferably, the present invention determines the start point, end point, direction, velocity, acceleration and/or inflection points of the stroke gesture according to the eigenvalue of the stroke gesture. Where the velocity of the stroke gesture is predetermined as uniform, the present invention may omit collection and calculation for velocity and acceleration of the stroke gesture.
(59) The present invention determines the direction, velocity and inflection points of the stroke gesture through the following process.
(60) The present invention calculates the velocity of the stroke gesture using the eigenvalue of the channel state information in the way shown below.
(61) Based on analysis of the collected and data-processed second channel state data, the energy of the channel state data comprises static CSI energy and dynamic CSI energy.
(62)
(63) The time domain information of the channel state information is converted into frequency domain information through the inverse Fourier transform, and the velocity of the stroke gesture is determined according to the relationship between frequency and wavelength as:
(64)
wherein $\lambda$ represents the wavelength, and $f$ represents the frequency of the sub-carrier. As shown in
(65) The velocity variation and acceleration of the stroke gesture are determined according to the velocity and time of the stroke gesture. According to the velocity variation of the stroke gesture, it is determined whether the stroke gesture turns or stops at the inflection points.
(66) The present invention determines the direction of stroke gesture in the way described below. Through determining the direction of stroke gesture, the present invention records changes in the direction of stroke gesture, so as to support determination of inflection points.
(67) The direction of the velocity of stroke gesture is determined by:
(68) eliminating phase shift using a linear transformation method;
(69) selecting a signal that contains no multipath effects according to an arrival time of the signal;
(70) building a CSI data matrix and determining an incidence angle of the signal; and
(71) determining a direction of the velocity of the stroke gesture according to the incidence angle of the signal, signal path and stroke gesture displacement.
(72) Wherein the method of using the linear transformation method to eliminate phase shift includes:
(73) determining a measured phase $\hat{\phi}_i$ of an $i$-th sub-carrier through
(74)
Therein, $\phi_i$ is the true phase, and $\delta$ is the clock skew at the receiving end of the signal with respect to the transmitting end of the signal, in which the phase shift generated correspondingly is
(75)
$\beta$ is a constant phase shift, $Z$ is measurement noise, $k_i$ is the index of the $i$-th sub-carrier, and $N$ is the FFT size. In the present invention, the sub-carrier index ranges between -28 and 28. Due to the existence of the unknown phase shifts, it is impossible to determine the true phase by relying on a commercial interface card alone.
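The equations of this passage are not reproduced in the text. As a hedged reconstruction, the definitions given here match the phase-offset model commonly used for commodity Wi-Fi CSI, which would read as follows. The symbol names ($\hat{\phi}_i$ for the measured phase, $\phi_i$ for the true phase, $\delta$ for the clock skew, $\beta$ for the constant phase shift) and the sign convention are assumptions, not taken from the original equations.

```latex
\hat{\phi}_i = \phi_i - 2\pi \frac{k_i}{N}\,\delta + \beta + Z
```

Under this assumed model, the clock skew contributes the term $2\pi \frac{k_i}{N}\delta$, which grows linearly with the sub-carrier index $k_i$, while $\beta$ shifts all sub-carriers by the same amount.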
(76) For eliminating the effects of random phase shift, linear transformation is performed on the measured phase.
(77) The first step is to define
(78)
wherein the frequency of the sub-carrier is symmetric, i.e.,
(79)
b is simplified into
(80)
and a linear term $a k_i + b$ is subtracted from the measured phase $\hat{\phi}_i$, so as to eliminate the phase shift caused by $\delta$ and $\beta$, thereby obtaining a linear combination of the true phases (ignoring the measurement noise $Z$), namely
(81)
After unknown phase shift is removed, the stably distributed phase can be obtained and used as an effective characteristic.
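The linear transformation can be sketched in code. Because the defining equations for $a$ and $b$ are not reproduced in the text, this sketch assumes the form commonly used for CSI phase calibration: $a$ is the slope between the first and last measured phases, and $b$ is the mean measured phase. It also assumes unwrapped phases and a symmetric sub-carrier index set. The helper `simulate_measured_phase` and the FFT size of 64 are hypothetical and exist only to exercise the sketch.

```python
import math


def sanitize_phase(measured_phase, subcarrier_index):
    """Subtract the linear term a*k_i + b from each measured phase.

    a and b follow an assumed, commonly used definition (see lead-in).
    """
    n = len(measured_phase)
    a = (measured_phase[-1] - measured_phase[0]) / (
        subcarrier_index[-1] - subcarrier_index[0]
    )
    b = sum(measured_phase) / n
    return [p - a * k - b for p, k in zip(measured_phase, subcarrier_index)]


def simulate_measured_phase(true_phase, subcarrier_index, delta, beta, fft_size=64):
    """Hypothetical noise-free measurement: true phase plus a clock-skew
    term linear in k_i and a constant offset beta."""
    return [
        phi - 2.0 * math.pi * k / fft_size * delta + beta
        for phi, k in zip(true_phase, subcarrier_index)
    ]
```

With symmetric indices, two measurements of the same true phase taken under different values of `delta` and `beta` yield the same sanitized output, which is the property that makes the calibrated phase usable as a stable feature.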
(82) In addition to phase shift, the multipath effects on CSI data have to be removed.
(83) In the present invention, the signal variation caused by the movement of the stroke gesture is extracted from the signal received at the receiving end. This signal variation is also called the stroke gesture path signal. Its characteristic is that a wireless signal reflected by the stroke gesture is received directly by the receiving end without any further reflection, so this wireless signal arrives in the minimal time. The present invention therefore makes the determination according to the receiving time of the wireless signals and selects the wireless signal with the minimal arrival time, thereby eliminating multipath effects and obtaining the stroke gesture path signal.
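Selecting the earliest-arriving component can be illustrated with a small sketch. It assumes that one CSI snapshot is converted to the delay domain with an inverse discrete Fourier transform and that the first tap whose magnitude exceeds a fraction of the strongest tap is taken as the minimal-arrival-time path. The relative threshold of 0.1 and both function names are hypothetical.

```python
import cmath


def impulse_response(csi):
    """Naive inverse DFT of one CSI snapshot (frequency -> delay domain)."""
    n = len(csi)
    return [
        sum(csi[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def earliest_path_index(csi, relative_threshold=0.1):
    """Index of the first delay tap that rises above the noise floor."""
    magnitudes = [abs(h) for h in impulse_response(csi)]
    floor = relative_threshold * max(magnitudes)
    return next(i for i, mag in enumerate(magnitudes) if mag >= floor)
```

For a channel with a weak early path and a strong later path, the function returns the early tap even though it is not the strongest, which matches the selection by arrival time rather than by power.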
(84) An antenna matrix array is built using the time sequence data of the 30 sub-carriers in the CSI data.
(85)
The MUSIC algorithm is used to obtain the incidence angle between stroke gesture and signals:
(86)
(87) The present invention determines the direction of the stroke gesture according to the incidence angles among the wireless signal transmitting end, at least two wireless signal receiving ends, stroke gesture and signals.
(88) As shown in
(89) Preferably, the writing strokes corresponding to the stroke gesture are determined by comprehensively referring to the time, velocity, direction and/or inflection points of the stroke gesture.
(90) For example, the English letter L requires two strokes, a vertical stroke (|) and a horizontal stroke, so the stroke movement has one inflection point. According to the velocity variation of the two stroke gestures when the inflection point appears and the time interval between the end point of the first stroke and the start point of the second stroke, it is determined that the two strokes are connected rather than being two separate strokes.
(91) Preferably, for the basic stroke types of characters, numerals and patterns, the waveforms and inflection points of the corresponding basic stroke samples are classified and stored. The advantage of storing these basic stroke samples is that the number of samples to be stored is reduced. For example, many of the 26 English letters contain the same stroke. For them, the present invention only needs to save that stroke once, and this and other strokes can be combined according to natural or scientific language rules to form letters, words, regular patterns or formulas. The present invention can thus form more letters, words and numerals without saving waveform samples for all characters. This not only saves storage space, but also improves the recognition rate of strokes and characters.
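The time-interval part of this decision can be sketched as follows. The sketch groups consecutive strokes whose gap between the end of one stroke and the start of the next is small; the 0.1 s gap limit and the function name are hypothetical, and the velocity-variation criterion mentioned above is not modeled.

```python
def group_connected_strokes(strokes, max_gap_s=0.1):
    """Group stroke indices into connected runs.

    strokes: list of (start_time, end_time) tuples in writing order.
    Two consecutive strokes are treated as connected when the pause
    between them does not exceed max_gap_s (an assumed value).
    """
    groups = []
    for index, (start, _end) in enumerate(strokes):
        if groups and start - strokes[index - 1][1] <= max_gap_s:
            groups[-1].append(index)
        else:
            groups.append([index])
    return groups
```

For the letter L, the vertical and horizontal strokes would fall into one group because the pause at the inflection point is short, while a stroke written after a longer pause starts a new group.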
(92) S3: determining the type of stroke according to the mapping relationship between the eigenvalue and the stroke gesture, and building a stroke sequence; using the stroke decipherment model to recognize the stroke sequence as a letter sequence, a radical sequence, a numeral sequence and/or a pattern sequence arranged according to logic rules.
(93) Preferably, a mapping list between stroke gesture waveforms and strokes is established according to the CSI data sample of the stroke gesture, so as to establish the mapping relationship between the stroke gesture waveforms and the strokes. Thereby, the type of the stroke can be determined according to the eigenvalues of the waveforms of the stroke gesture.
(94) Preferably, the type and sequence of the stroke are determined using machine learning method according to the connection relationship between the eigenvalues and the stroke gesture.
(95) For example, there are 26 letters in English, but all of them are formed by only 7 kinds of strokes. The 7 strokes forming English letters, as written by a user at the writing position, are collected. The user writes each stroke with his/her finger at least 50 times. The channel state information corresponding to every stroke is collected, and the second channel state information is obtained by denoising and slicing it. The eigenvalue of the second channel state information is then extracted and used for sample training by the random forest method, so as to obtain a stroke decipherment model. The stroke decipherment model is used to classify the stroke gesture.
(96) Preferably, the present invention may further classify stroke gesture of written letters according to the HMM (Hidden Markov Model) algorithm. The HMM method enables the machine to quickly recognize strokes, thereby improving efficiency and accuracy of stroke recognition. Preferably, the sequence of the strokes is determined according to the input sequence of the stroke gesture waveforms.
(97) Typically, the HMM model is composed of three parts and defined as $\lambda = (\pi, A, B)$. The three parameters in $\lambda$ correspond to the initial distribution probability of hidden states, the transition probability matrix of hidden states, and the arrangement probability matrix, respectively. In the present invention, the initial distribution probability of hidden states represents the prior probability of different strokes, namely $\pi_i = P(s_i)$, $i \in [1,8]$, and can be obtained by counting the frequency of the first stroke in the character set $CH = \{A, B, \ldots, Z\}$. Every element $A_{i,j}$ in the transition probability matrix represents the probability of the stroke $s_i$ being written when the previous stroke is $s_j$, namely $A_{i,j} = P(s_i \mid s_j)$, $i,j \in [1,8]$. Similarly, this probability matrix may be obtained by compiling statistics of all the characters. In the arrangement probability matrix, every element $B_{k,i}$ represents the probability of getting a certain observation (CSI section) $o_k$ when the hidden state is $s_i$, namely $B_{k,i} = P(o_k \mid s_i)$, $k \in [1,n]$, $i \in [1,8]$. Assuming that all the parameters of the HMM are known, for the foregoing observation sequence $O = o_1 o_2 \ldots o_n$, the corresponding stroke sequence may be estimated by calculating $\arg\max_{s} P(S = s_1 s_2 \ldots s_n \mid o_1 o_2 \ldots o_n, \lambda)$.
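The counting step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the stroke table `LETTER_STROKES`, the stroke labels and the function name `estimate_hmm_priors` are hypothetical, since the text does not list its stroke decomposition. Note that the code indexes the transition table as `A[prev][cur]`, whereas the text writes $A_{i,j}$ with the current stroke first.

```python
from collections import Counter, defaultdict

# Hypothetical stroke decomposition of a few letters (illustrative only;
# the stroke labels are arbitrary and not taken from the patent).
LETTER_STROKES = {
    "H": ["|", "-", "|"],
    "I": ["|"],
    "L": ["|", "-"],
    "T": ["-", "|"],
    "V": ["\\", "/"],
}


def estimate_hmm_priors(letter_strokes):
    """Return (pi, A) estimated by counting over the character set.

    pi[s]        = fraction of characters whose first stroke is s
    A[prev][cur] = fraction of transitions out of prev that go to cur
    """
    first_counts = Counter(strokes[0] for strokes in letter_strokes.values())
    total_first = sum(first_counts.values())
    pi = {s: c / total_first for s, c in first_counts.items()}

    trans_counts = defaultdict(Counter)
    for strokes in letter_strokes.values():
        for prev, cur in zip(strokes, strokes[1:]):
            trans_counts[prev][cur] += 1
    A = {
        prev: {cur: c / sum(counter.values()) for cur, c in counter.items()}
        for prev, counter in trans_counts.items()
    }
    return pi, A


pi, A = estimate_hmm_priors(LETTER_STROKES)
print(pi)
print(A)
```

With a full 26-letter stroke table the same two passes would give the 8-entry initial distribution and the 8-by-8 transition matrix used in the following derivation.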
(98) The random forest method in the previous step gives each of the 8 strokes a voting probability for a given signal section, so that $P(s_i \mid o_k) = \ddot{P}_{rf}(o_k, i)$, $i \in [1,8]$. The present invention modifies the HMM model and uses it to get all feasible stroke sequences $S$ of characters in $CH$, each with a confidence probability, rather than only the optimal $S$. According to Bayes' Theorem, the probability $P(o_k \mid s_i)$ is estimated as
(99) $P(o_k \mid s_i) = \dfrac{P(s_i \mid o_k)\, P(o_k)}{P(s_i)} = \dfrac{\ddot{P}_{rf}(o_k, i)\, P(o_k)}{P(s_i)}$
wherein $P(o_k)$ and $P(s_i)$ represent the prior probability of a given observation value and of a certain stroke, respectively. Hence, given the observation sequence $O = o_1 o_2 \ldots o_n$, the probability of generating these observations for some feasible sequence of hidden states may be calculated as
(100)
By substituting $P(o_i \mid s_i)$ in the above equation into Equation 1, an equation can be obtained as Equation 2
(101)
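Under the standard first-order HMM factorization, and writing $s_k$ for the stroke hypothesised at position $k$ of a candidate sequence $S$, the two steps above might be written out as follows. The exact form and numbering of the equations are assumptions here; only the Bayes substitution for $P(o_k \mid s_k)$ is taken directly from the text.

```latex
\begin{align}
P(O, S \mid \lambda)
  &= \pi_{s_1}\, P(o_1 \mid s_1) \prod_{k=2}^{n} A_{s_k, s_{k-1}}\, P(o_k \mid s_k) \\
  &= \pi_{s_1}\, \frac{\ddot{P}_{rf}(o_1, s_1)\, P(o_1)}{P(s_1)}
     \prod_{k=2}^{n} A_{s_k, s_{k-1}}\,
     \frac{\ddot{P}_{rf}(o_k, s_k)\, P(o_k)}{P(s_k)}
\end{align}
```

In this form the product $\prod_{k=1}^{n} P(o_k)$ appears as a common factor that does not depend on the candidate sequence $S$.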
(102) For calculating the confidence level of every stroke sequence of the given observation value, the present invention sums the probability of every $S$ calculated using the foregoing equations, and uses the softmax function to normalize the probability of every $S$ as
(103)
(104) Although in the foregoing derivation the prior probability $P(o_k)$, $k \in [1,n]$, is unknown, fortunately, in the step of normalization in Equation 4, all the unknown probabilities cancel. After the normalized probability of all the feasible stroke sequences is obtained, namely the confidence level probability of all candidate characters in the CSI section sequence, the character can be determined by selecting the one having the greatest normalized probability.
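The scoring and normalization described above can be sketched as below. This is a hedged illustration under stated assumptions: the function names, the stroke labels and all numeric values are hypothetical, and plain sum-normalization is used in place of the softmax named in the text because it makes the cancellation of the unknown $P(o_k)$ explicit.

```python
from math import prod


def sequence_score(strokes, votes, pi, A, prior):
    """Unnormalised score of one candidate stroke sequence.

    strokes: candidate stroke labels, one per CSI section
    votes:   votes[k][s] = random-forest vote P(s | o_k) for section k
    pi:      pi[s] = probability that a character starts with stroke s
    A:       A[prev][cur] = P(cur | prev)
    prior:   prior[s] = P(s), overall stroke frequency
    The unknown P(o_k) is omitted: it is the same for every candidate
    and therefore disappears when the scores are normalised.
    """
    emission = prod(votes[k][s] / prior[s] for k, s in enumerate(strokes))
    transition = prod(
        A.get(p, {}).get(c, 0.0) for p, c in zip(strokes, strokes[1:])
    )
    return pi.get(strokes[0], 0.0) * emission * transition


def rank_characters(candidates, votes, pi, A, prior):
    """Return {character: normalised confidence} over feasible candidates.

    A candidate is feasible only if it has as many strokes as there are
    observed CSI sections.
    """
    n = len(votes)
    scores = {
        ch: sequence_score(seq, votes, pi, A, prior)
        for ch, seq in candidates.items()
        if len(seq) == n
    }
    total = sum(scores.values())
    return {ch: sc / total for ch, sc in scores.items()} if total > 0 else {}


# Hypothetical two-stroke observation and a three-letter candidate set.
candidates = {"L": ["|", "-"], "T": ["-", "|"], "I": ["|"]}
votes = [{"|": 0.8, "-": 0.2}, {"|": 0.3, "-": 0.7}]
pi = {"|": 0.75, "-": 0.25}
A = {"|": {"-": 1.0}, "-": {"|": 1.0}}
prior = {"|": 0.5, "-": 0.5}

confidence = rank_characters(candidates, votes, pi, A, prior)
print(max(confidence, key=confidence.get), confidence)
```

In this toy input the candidate "I" is dropped because it has one stroke while two sections were observed, and the remaining confidences sum to one.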
(105) For example, the input sequence of the stroke gesture is / and V, so the recognized stroke sequence is first / and then V.
(106) Preferably, according to the type, velocity, direction, inflection points and/or sequence of the stroke, at least one stroke sequence corresponding to individual letters, individual radicals, individual numerals and individual patterns is built.
(107) Every letter, radical or pattern is formed by at least one stroke. Therefore, according to the type and sequence of strokes, the connection relationship between strokes can be preliminarily determined, thereby forming the stroke sequence corresponding to individual letters, individual radicals, individual numerals, and individual patterns.
(108) S31: building a stroke decipherment model according to the appearing frequency in natural languages, direction, velocity, and inflection points of the strokes or connection rules between the strokes.
(109) A stroke decipherment model that can recognize strokes into individual letters, individual words and patterns is built according to the stroke sequence, appearing frequency, direction, velocity, and inflection points of the strokes or connection rules between the strokes. Preferably, the stroke decipherment model comprehensively estimates the connection relationship between the strokes according to the sequence, direction, velocity and inflection points of the strokes, and determines the connection relationship between the strokes according to connection rules of strokes, thereby forming the stroke sequence into complete letters, words or patterns. Preferably, the stroke decipherment model determines the track of the strokes according to the velocity, direction and inflection points of the strokes, and determines the stroke connection relationship according to the track of the strokes. Some letters have similar strokes. For example, D and b are composed of similar strokes, namely one vertical stroke and a semicircular stroke. The only difference lies in the magnitude of the semicircular part. Thus, the stroke decipherment model determines whether the letter to be recognized is D or b according to the relative position between the start point of the track of the semicircular stroke and the track of the vertical stroke.
(110) S32: building an error-correction model according to natural language rules and/or scientific language rules to calibrate at least one letter sequence, word sequence, numeral sequence and/or pattern sequence.
(111) Preferably, an error-correction model is built. The error-correction model organizes the letter sequence, word sequence, numeral sequence and/or pattern sequence into words, phrases or formula patterns, and performs calibration.
(112) The letter sequence, word sequence, numeral sequence and/or pattern sequence output by the stroke decipherment model may have recognition errors. For example, a letter may be recognized as another letter having similar strokes.
(113) The error-correction model combines the letter sequence, word sequence, numeral sequence and/or pattern sequence according to natural language rules and/or scientific language rules, so as to combine letter sequences into words, combine word sequences into phrases, and combine numerals and patterns into formulas.
(114) Preferably, the error-correction model calibrates the letters, words, numerals, and patterns according to natural language rules and/or scientific language rules. Preferably, the error-correction model calibrates the letters, words, numerals, and patterns according to natural language rules, scientific language rules and logic rules.
(115) For example, when a lowercase a and a lowercase d are written using similar movements, the stroke decipherment model may only be able to differentiate them according to velocity and direction. However, the language model can provide the information that the character sequence "and" has greater probability than the character sequence "dnd", because "and" is a frequently used English word, while "dnd" has no meaning in English. The error-correction model may be used to correct recognition of the corresponding characters.
(116) The error-correction model based on natural language rules may provide information about word sequences. For example, the word sequence "be my guest" has greater probability than the word sequence "be my quest". The two sequences differ only in the letters g and q, which are hard to tell apart. The appearing frequency of word sequences can positively affect recognition of correct characters and words.
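The "and" versus "dnd" example can be sketched as a simple rescoring step. This is a minimal sketch, assuming per-letter recognition confidences are available and that a word-frequency table stands in for the language model; `WORD_FREQ`, `correct_word` and the `floor` value for unknown words are hypothetical names and numbers, not part of the disclosure.

```python
from itertools import product

# Hypothetical word frequencies standing in for a language model.
WORD_FREQ = {"and": 0.9, "end": 0.1}


def correct_word(letter_candidates, word_freq, floor=1e-6):
    """Pick the most probable word given uncertain letters.

    letter_candidates: one {letter: recognition confidence} dict per
        written letter, in writing order.
    word_freq: relative frequency of known words; a word that is not in
        the table receives the small probability `floor`.
    """
    best_word, best_score = None, 0.0
    for combo in product(*(c.items() for c in letter_candidates)):
        word = "".join(letter for letter, _ in combo)
        score = word_freq.get(word, floor)
        for _, conf in combo:
            score *= conf
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# The first letter is slightly more likely "d" than "a" on strokes alone.
letters = [{"d": 0.55, "a": 0.45}, {"n": 1.0}, {"d": 1.0}]
print(correct_word(letters, WORD_FREQ))  # prints "and"
```

Even though the stroke evidence alone slightly favours "d" for the first letter, the word prior outweighs it, which is the behaviour the paragraph above attributes to the error-correction model.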
(117) After the error-correction model is built successfully, samples of stroke gesture waveforms and samples of stroke gesture movement information are used to train the stroke decipherment model and the error-correction model, thereby improving efficiency of stroke gesture recognition.
(118) According to a preferred embodiment of the present invention, the disclosed method further comprises the following steps: establishing a mapping relationship between a variation pattern of the channel state information and the strokes. According to the variation pattern of the channel state information, the corresponding initial stroke is determined. According to the start point, the end point, the velocity, the direction and/or the inflection points of the stroke, the connection relationship between successive initial strokes is determined, thereby determining the true strokes and types of the true strokes according to the connection relationship. This process can advantageously ensure accuracy of stroke recognition and reduce error rate.
(119) Preferably, the disclosed method further comprises the following step: determining a multi-dimensional stroke track of the stroke according to the start point, end point, stroke time, velocity, direction and/or inflection points of the stroke gesture.
(120) Preferably, the present invention encodes strokes according to the start point, end point, velocity, direction and inflection points of stroke gesture for stroke recognition.
(121) As shown in
(122) Preferably, the present invention encodes the strokes according to the acceleration, direction and inflection points of the stroke gesture for stroke recognition.
(123) The letter A is herein taken as an example. The component sequence of strokes of the letter A is expressed as: s: c-ur, s: c-dr, s: c-ul, s: c-r. Through calculation of velocity variation of the stroke gesture made by the antenna matrix, the acceleration of the stroke gesture is obtained. The acceleration data provide important reference that improves accuracy of hand-writing recognition. For example, when a user writes the letter A in the air, the acceleration data can be used to accurately measure the velocity variation and direction variation of the stroke gesture, thereby determining the stroke track of the stroke gesture in the three-dimensional space.
(124) Preferably, the present invention uses the normalization method to process stroke tracks, so as to normalize non-standard stroke tracks into standard stroke tracks, thereby facilitating stroke recognition.
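One possible reading of this normalization step is a translate-and-scale mapping of each track into a unit box, sketched below. The text does not specify which normalization method is used, so the choice of uniform min-max scaling and the name `normalize_track` are assumptions made only for illustration.

```python
def normalize_track(points):
    """Translate and uniformly scale a stroke track into the unit box.

    points: non-empty list of (x, y) or (x, y, z) tuples sampled along
        the stroke.
    Every axis is divided by the largest extent so that the aspect
    ratio of the stroke is preserved.
    """
    dims = range(len(points[0]))
    mins = [min(p[d] for p in points) for d in dims]
    maxs = [max(p[d] for p in points) for d in dims]
    # Fall back to 1.0 for a degenerate track whose points all coincide.
    extent = max(hi - lo for lo, hi in zip(mins, maxs)) or 1.0
    return [tuple((p[d] - mins[d]) / extent for d in dims) for p in points]


track = [(2.0, 1.0), (4.0, 5.0), (6.0, 1.0)]
print(normalize_track(track))  # [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]
```

After such a mapping, tracks written at different sizes or positions can be compared against the same standard stroke templates.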
(125) Preferably, true strokes and their stroke types are determined according to matching level between the stroke tracks and the initial strokes. This step is advantageous because if the stroke tracks highly match the initial strokes, the step of performing recognition according to the characteristics of the stroke such as velocity and direction can be simplified, thereby improving efficiency of recognition and classification of strokes, without undermining accuracy of stroke recognition.
(126) Preferably, the disclosed method further comprises the following step: where the matching level between the stroke track and the initial strokes is not smaller than a predetermined threshold, determining the initial strokes as the true strokes; where the matching level between the stroke track and the initial strokes is smaller than the predetermined threshold, performing stroke analysis according to the start point, the end point, the velocity, the direction and/or the inflection points of the stroke gesture using a machine learning method, determining the true strokes, and classifying the strokes. This step is advantageous because it selects a stroke recognition method that is appropriate to current needs, so as to not only improve accuracy of stroke recognition, but also save time for stroke recognition.
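The threshold rule above amounts to a two-branch decision, sketched here. The function name `resolve_stroke`, the `classify` callback standing in for the machine learning method, and the numeric values are hypothetical.

```python
def resolve_stroke(initial_stroke, matching_level, threshold, features, classify):
    """Return the true stroke label for one stroke gesture.

    initial_stroke: stroke suggested by the CSI variation pattern
    matching_level: how well the stroke track matches the initial stroke
    threshold:      predetermined threshold for accepting the match
    features:       start point, end point, velocity, direction and/or
                    inflection points of the stroke gesture
    classify:       callable implementing the machine-learning analysis
    """
    if matching_level >= threshold:  # "not smaller than" the threshold
        return initial_stroke
    return classify(features)


# Stand-in classifier for illustration; a trained model would go here.
fallback = lambda features: "/"

print(resolve_stroke("|", 0.92, 0.80, {}, fallback))  # prints "|"
print(resolve_stroke("|", 0.55, 0.80, {}, fallback))  # prints "/"
```

The classifier is only invoked on the low-match branch, which is where the time saving described in the paragraph comes from.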
(127) S4: performing overall training for letter recognition and error-correction recognition using stroke gesture samples, so that the present invention can rapidly recognize the character information of stroke gesture.
(128) Preferably, the error-correction model is trained using machine learning, so that it can quickly calibrate the recognized characters, numerals or patterns.
Embodiment 2
(129) This embodiment is a further improvement of Embodiment 1, and those details having been described previously are not repeated herein.
(130) In the present embodiment, the word HI is taken as an example for explaining recognition of stroke gesture.
(131) In the word HI, the letter H is composed of the strokes |, -, |, and the letter I is formed by the stroke |. Each of the strokes takes one second to write. The time interval between two successive strokes of the same letter is 2 seconds. The time interval between the two letters is 5 seconds. The user keeps the gesture still during these intervals.
(132) After receiving all channel state information of the written word HI, the wireless network interface card treats the CSI with denoising and slicing, and uses the discrete wavelet transform to extract characteristics of the channel state information corresponding to every stroke, thereby forming the variation patterns of channel state information corresponding to each stroke. For the word HI, 3 patterns corresponding to the letter H and 1 pattern corresponding to the letter I are collected. All the patterns are input into the stroke decipherment model in sequence for classification, thereby obtaining the probability that each pattern corresponds to each stroke.
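The timing given for the word HI suggests how sliced stroke segments could be grouped into letters by the length of the pause before each stroke. The sketch below assumes the stroke segments have already been detected; the function name `group_strokes` and the 3.5-second cut-off (chosen between the 2-second and 5-second intervals) are assumptions, not values from the disclosure.

```python
def group_strokes(segments, letter_gap=3.5):
    """Group stroke segments into letters by the pause before each stroke.

    segments:   list of (start_s, end_s) times of detected strokes, in order
    letter_gap: a pause longer than this many seconds starts a new letter
    """
    if not segments:
        return []
    letters, current = [], [segments[0]]
    for prev, seg in zip(segments, segments[1:]):
        if seg[0] - prev[1] > letter_gap:
            letters.append(current)
            current = []
        current.append(seg)
    letters.append(current)
    return letters


# "HI": 1 s strokes, 2 s pauses inside H, a 5 s pause before I.
hi_segments = [(0, 1), (3, 4), (6, 7), (12, 13)]
print([len(letter) for letter in group_strokes(hi_segments)])  # [3, 1]
```

The result matches the 3 patterns for H and 1 pattern for I mentioned above.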
(133) The present invention compiles statistics of the appearance frequency of every stroke and the connection rules between strokes of English letters, and builds the stroke decipherment model according to the Hidden Markov Model (HMM), so as to achieve conversion between strokes and letters, and obtain the two most probable letters among the letters corresponding to the series of strokes written for each letter as the candidate letters. For example, the letter H may be determined as the letter H and the letter G, and the letter I may be determined as the letter I and the letter C. The letters are then combined to obtain possible letter sequences. In this case, the possible letter combinations are HI, GI, HC and GC. Then the letter combinations are compared to words existing in the dictionary, thereby determining and recognizing that the word written by the user is HI.
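The combination and dictionary comparison in this example can be sketched in a few lines. The function name `candidate_words` and the tiny dictionary are hypothetical; a real system would use a full word list.

```python
from itertools import product


def candidate_words(top_letters, dictionary):
    """Combine per-position candidate letters and keep dictionary words.

    top_letters: one list of candidate letters per written letter
    dictionary:  set of valid words
    Returns (all combinations, combinations found in the dictionary).
    """
    combos = ["".join(c) for c in product(*top_letters)]
    return combos, [w for w in combos if w in dictionary]


combos, matches = candidate_words([["H", "G"], ["I", "C"]], {"HI", "GO", "HE"})
print(combos)   # ['HI', 'HC', 'GI', 'GC']
print(matches)  # ['HI']
```

Keeping two candidates per letter keeps the number of combinations small (here four) while still letting the dictionary resolve the ambiguity.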
Embodiment 3
(134) This embodiment is a further improvement of Embodiments 1 and 2, and those details having been described previously are not repeated herein.
(135) The present embodiment provides a system for CSI-based fine-grained gesture recognition. It is characterized in that the system at least comprises: a signal acquisition module 10, an eigenvalue extraction module 30, a stroke classifying module 40 and a letter recognizing module 50. The signal acquisition module 10 collects channel state information CSI. The eigenvalue extraction module 30 extracts an eigenvalue of the channel state information. The stroke classifying module 40 according to the eigenvalue of channel state information determines a start point, an end point, a velocity, a direction and/or an inflection point of at least one stroke gesture in multiple dimensions, and the stroke classifying module 40 uses machine learning to classify strokes according to the start point, the end point, the velocity, the direction and/or the inflection points of the stroke gesture, thereby forming a stroke sequence. The letter recognizing module 50 has a stroke decipherment model built according to frequencies of the strokes appearing in natural language rules and/or scientific language rules and/or connection rules between the strokes. The letter recognizing module 50 according to the eigenvalue of channel state information determines a start point, an end point, a velocity, a direction and/or an inflection point of at least one stroke gesture in multiple dimensions, and uses the stroke decipherment model to divide and recognize the stroke sequence as a letter sequence, a radical sequence, a numeral sequence and/or a pattern sequence conforming to the natural language rules and/or the scientific language rules.
(136) Preferably, the system further comprises an error-correction module 60. The error-correction module 60 has an error-correction model built according to the logic sequence of natural language rules and/or scientific language rules. According to the logic sequence, the error-correction module 60 evaluates and corrects the letter sequence, the radical sequence, the numeral sequence and/or the pattern sequence, and recognizes the letter sequence, the radical sequence, the numeral sequence and/or the pattern sequence into phrases, words, numeral codes and/or patterns arranged in order.
(137) Preferably, the disclosed system for CSI-based fine-grained gesture recognition further comprises a signal processing module 20.
(138) As shown in
(139) Preferably, the signal processing module 20 performs denoising and slicing on the channel state information collected by the signal acquisition module 10. The signal processing module 20 sends the second channel state information after denoising and slicing to the eigenvalue extraction module 30. The eigenvalue extraction module 30 performs multi-dimensional measurement and signal calibration on the second channel state information, and performs eigenvalue extraction. Preferably, the eigenvalue extraction module 30 extracts the eigenvalue of the waveform of the stroke gesture. Preferably, the eigenvalue comprises the mean value, standard deviation, mean absolute deviation and/or maximum of the waveform of the stroke gesture. Preferably, the eigenvalue further comprises the incidence angles where at least two receiving ends receive wireless signals.
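The statistical eigenvalues listed above can be computed from one sliced waveform as sketched below. This is a minimal sketch: the function name `waveform_features` is hypothetical, the population form of the standard deviation is an assumption (the text does not say which form is used), and the incidence-angle feature is not covered.

```python
from statistics import fmean, pstdev


def waveform_features(samples):
    """Return the statistical features of one stroke-gesture waveform.

    samples: non-empty sequence of CSI amplitude values for one stroke
    """
    mean = fmean(samples)
    return {
        "mean": mean,
        "std": pstdev(samples),  # population standard deviation (assumed)
        "mad": fmean([abs(x - mean) for x in samples]),  # mean absolute deviation
        "max": max(samples),
    }


print(waveform_features([1.0, 2.0, 3.0, 4.0]))
```

The resulting dictionary could serve as one row of the training data for the stroke classifier described in Embodiment 1.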
(140) The stroke classifying module 40 has a stroke decipherment model. The stroke classifying module 40, according to the information transmitted by the eigenvalue extraction module 30, recognizes the type, velocity, direction and inflection points of the stroke gesture. Preferably, the stroke classifying module 40 determines the type of the stroke and stroke sequence according to the velocity, direction and inflection points of the recognized stroke gesture.
(141) The letter recognizing module 50, according to the stroke sequence, connection rules of the stroke and the appearance frequency of the strokes, combines the strokes recognized by the stroke classifying module 40 and recognizes the combinations as letter sequences, word sequences, numeral sequences and/or pattern sequences.
(142) The error-correction module 60 has an error-correction model. The error-correction module 60 calibrates the letter sequence, word sequence, numeral sequence and/or pattern sequence transmitted by the letter recognizing module 50, and according to natural language rules and/or scientific language rules, recognizes the letter sequence into at least one word, recognizes the word sequence into at least one phrase, recognizes the numeral sequence into at least one number, and recognizes the pattern sequence into at least one pattern. Preferably, the error-correction module 60, according to the mixed sequence of numerals and patterns, recognizes the input as a mathematical formula or a chemical formula.
(143) The disclosed system for CSI-based fine-grained gesture recognition recognizes strokes by collecting stroke gesture information, and then recognizes character information according to stroke sequences. The present invention is advantageous because it helps to enrich the languages and patterns that can be recognized, and outputs diverse language characters or numeral patterns. The present invention recognizes characters based on the velocity, direction and inflection points of stroke gesture, so as to improve recognition accuracy and decrease miss rate. The present invention supports a recognition rate up to 98% or greater.
(144) For example, the stroke decipherment module recognizes the stroke gesture input by the user as letters, numerals and chemical pattern sequences, such as CH$_3$, , and
(145) ##STR00001##
The error-correction model, based on chemical language rules, organizes the letters, numerals and chemical pattern sequences according to the appearance frequency and connection rules of chemical symbols into:
(146) ##STR00002##
(147) Preferably, the user can predetermine the types of input languages. According to the language rules, stroke connection rules and language logic of the selected languages, the stroke decipherment model organizes the input into corresponding letters, radicals, numerals or patterns. According to the language logic and rules of the selected language, the error-correction model corrects the information output by the stroke decipherment model so as to recognize it into words, phrases, numeral codes, patterns or formulas conforming to the specific language logic.
(148) The disclosed system for CSI-based fine-grained gesture recognition may be completely formed by hardware. The signal acquisition module 10 comprises antennas and an Intel iwl 5300 interface card. The signal processing module 20 comprises one or more of a digital signal processor, a signal processing IC and a DSP chip. The eigenvalue extraction module 30 is an application-specific integrated chip for analyzing eigenvalues of channel state information. The stroke classifying module 40 is an application-specific integrated chip for organizing strokes into stroke sequences. The letter recognizing module 50 is an application-specific integrated chip for organizing stroke sequences into language characters. The error-correction module 60 is an application-specific integrated chip for correcting errors in characters.
(149) According to a preferred embodiment of the present invention, the eigenvalue extraction module 30 comprises an application-specific integrated chip and a microprocessor. The stroke classifying module 40 comprises an application-specific integrated chip and a microprocessor. The letter recognizing module 50 comprises an application-specific integrated chip and a microprocessor. The error-correction module 60 comprises an application-specific integrated chip and a microprocessor.
(150) The present invention has been described with reference to the preferred embodiments and it is understood that the embodiments are not intended to limit the scope of the present invention. Moreover, as the contents disclosed herein should be readily understood and can be implemented by a person skilled in the art, all equivalent changes or modifications which do not depart from the concept of the present invention should be encompassed by the appended claims.