DEVICE, METHOD AND PROGRAM FOR DETECTING POSITIONS OF PARTIAL CHARACTER STRINGS

20170302442 · 2017-10-19

Assignee

Inventors

Cpc classification

International classification

Abstract

The positions in a text in which partial character strings in a pattern appear are efficiently detected. A partial-character-string position detecting device 1 takes inputs of a secret text [t] of a text t, a secrete text <p> of a pattern p, a secret text <c> of a vector c, and a secret text <E> of a matrix E and outputs a secret text <H> of a matrix H. A first matrix generating part 20 generates a secret text <F> of a matrix F, in which F[i][j]=E[i][j+i mod n+1] (where it is assumed that E[i][n]=custom-characterc[i]). A second matrix generating part 30 generates a secret text <F′> of a matrix F′, in which F[i][j]=1 is set if c[i]=0 or if c[i]=1 and F[k][j]=1 for every k that is successively c[k]=1, otherwise F[i][j]=0 is set, where k=i, . . . , n−1. A third matrix generating part 40 computes <H[i][j]>=<F[i][j−i mod n+1]>custom-character<c[i]>custom-character> to generate the secrete text <H>.

Claims

1. A partial-character-string position detecting device taking inputs of a secret text <t> of a text t having a length of n, a secret text <p> of a pattern p having a length of m, a secret text <c> of a vector c having a length of m, and a secret text <E> of a matrix E of m rows and n columns and outputting a secret text <H> of a matrix H of m rows and n columns, the partial-character-string position detecting device comprising: a first matrix generating part generating a secret text <F> of a matrix F of m rows and (n+1) columns in which F[i][j]=E[i][j+i mod n+1], where it is assumed that E[i][n]=custom-characterc[i]; a second matrix generating part generating a secret text <F′> of a matrix F′ of m rows and (n+1) columns, wherein, in the secret text <F′>, F[i][j]=1 is set if c[i]=0 or if c[i]=1 and F[k][j]=1 for every k that is successively c[k]=1 when k is incremented by 1 from i, otherwise, F[i][j]=0 is set; and a third matrix generating part computing <H[i][j]>=<F[i][j−i mod n+1]>custom-character<c[i]>custom-charactercustom-character<c[i−1]> to generate the secrete text <H>; wherein p[i] is an i-th element of the pattern p, t[i] is an i-th element of the text t, c[i] is an i-th element of the vector c, E[i][j] is an element in an i-th row of a j-th column of the matrix E, H[i][j] is an element in an i-th row of a j-th column of the matrix H; in the vector c, c[i]=1 is set if p[i] is not a limitless gap representing a character string having an arbitrary length, otherwise, c[i]=0 is set; in the matrix E, E[i][j]=1 is set if c[i]=0 or p[i]=t[j], otherwise, E[i][j]=0 is set; and in the matrix H, H[i][j]=1 is set if p[i] is the leading element of a partial character string resulting from separating the pattern p by the limitless gap and the partial character string appears in the j-th position in the text t, otherwise, H[i][j]=0 is set.

2. The partial-character-string position detecting device according to claim 1, wherein
⊕  [Formula 14] is a binary operation defined by
(x.sub.1,f.sub.1)⊕(x.sub.2,f.sub.2):=(x.sub.1custom-characterx.sub.2,f.sub.1custom-character(custom-characterx.sub.1custom-characterf.sub.2)) and pi:=(c[i],ej[i]); and  [Formula 15] the second matrix generating part sets e′.sub.j[i] as the first element of S.sub.i, where S.sub.i is computed in accordance with the following formula:
S.sub.i:=(((((p.sub.i⊕p.sub.i+1)⊕p.sub.i+2)⊕p.sub.i+3)custom-character)⊕p.sub.m-1).  [Formula 16]

3. A partial character string position detecting method using as inputs a secret text <t> of a text t having a length of n, a secret text <p> of a pattern p having a length of m, a secret text <c> of a vector c having a length of m, and a secret text <E> of a matrix E of m rows and n columns and outputting a secret text <H> of a matrix H of m rows and n columns, the partial-character-string position detecting method comprising: a first matrix generating step of generating, by a first matrix generating part, a secret text <F> of a matrix F of m rows and (n+1) columns in which F[i][j]=E[i][j+1 mod n+1], where it is assumed that E[i][n]=custom-characterc[i]; a second matrix generating step of generating, by a second matrix generating part, a secret text <F′> of a matrix F′ of m rows and (n+1) columns, wherein, in the secret text <F′>, F[i][j]=1 is set if c[i]=0 or if c[i]=1 and F[k][j]=1 for every k that is successively c[k]=1 when k is incremented by 1 from i, otherwise, F[i][j]=0 is set; and a third matrix generating step of computing, by a third matrix generating part, <H[i][j]>=<F[i][j−i mod n+1]>custom-character<c[i]>custom-charactercustom-characterc[i−1]> to generate the secrete text <H>; wherein p[i] is an i-th element of the pattern p, t[i] is an i-th element of the text t, c[i] is an i-th element of the vector c, E[i][j] is an element in an i-th row of a j-th column of the matrix E, H[i][j] is an element in an i-th row of a j-th column of the matrix H; in the vector c, c[i]=1 is set if p[i] is not a limitless gap representing a character string having an arbitrary length, otherwise, c[i]=0 is set; in the matrix E, E[i][j]=1 is set if c[i]=0 or p[i]=t[j], otherwise, E[i][j]=0 is set; and in the matrix H, H[i][j]=1 is set if p[i] is the leading element of a partial character string resulting from separating the pattern p by the limitless gap and the partial character string appears in the j-th position in the text t, otherwise, H[i][j]=0 is set.

4. The partial-character-string position detecting method according to claim 3, wherein
⊕  [Formula 17] is a binary operation defined by
(x.sub.1,f.sub.1)⊕(x.sub.2,f.sub.2):=(x.sub.1custom-characterx.sub.2,f.sub.1custom-character(custom-characterx.sub.1custom-characterf.sub.2)) and pi:=(c[i],ej[i]); and  [Formula 18] the second matrix generating step sets e′.sub.j[i] as the first element of S.sub.i, where S.sub.i is computed in accordance with the following formula:
S.sub.i:=(((((p.sub.i⊕p.sub.i+1)⊕p.sub.i+2)⊕p.sub.i+3)custom-character)⊕p.sub.m-1)  [Formula 19]

5. A non-transitory computer readable medium including computer executable instructions that make a computer function as the partial-character-string position detecting device according to claim 1 or 2.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] FIG. 1 is a diagram illustrating a functional configuration of a partial-character-string position detecting device; and

[0010] FIG. 2 is a diagram illustrating a process flow of a method for detecting positions of partial character strings.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0011] Before describing embodiments, notation and the definitions of terms used herein will be given.

Notation

[0012] A value “a” concealed by encryption or secret sharing is referred to as a secret text of “a” and denoted as <a>. If secret sharing is used for concealment, a set of pieces of a shared secret held by secure computing devices is referred to by <a>.

[0013] The i-th row of a matrix X is denoted by X[i]. The i-th element of a vector u is denoted by u[i]. A whole matrix resulting from concealing the elements of a matrix X is denoted by <X> and is referred to as a secret text of X. A whole vector resulting from concealing the elements of a vector “u” is denoted by <u> and referred to as a secret text of “u”.

[0014] •.sup.T denotes the transpose of •.

[0015] <Addition, Subtraction, Multiplication>

[0016] Addition, subtraction and multiplication take inputs of secret texts <a>, <b> of two values a, b and yield secret texts <c.sub.1>, <c.sub.2> and <c.sub.3>, respectively, as the results of the computations, a+b, a−b, and ab, respectively. The executions of the operations are written as follows:


<c.sub.1>←Add(<a>,<b>),


<c.sub.2>←Sub(<a>,<b>),


<c.sub.3>←Mul(<a>,<b>)  [Formula 1]

[0017] Note that when there is no risk of misunderstanding, Add (<a>, <b>), Sub (<a>, <b>) and Mul (<a>, <b>) are simply denoted as <a>+<b>, <a>−<b> and <a>×<b>, respectively.

[0018] <Logical Operations>

[0019] Logical OR, logical AND, and negation operations take inputs of secret texts <a>, <b> of two values a, bε{0, 1} and yield secret texts <c.sub.1>, <c.sub.2> and <c.sub.3>, respectively, of the results c.sub.1, c.sub.2, and c.sub.3 of logical OR of “a” and “b”, logical AND of “a” and “b” and negation of “a”, respectively. The executions of the operations are written as follows:


<c.sub.1>←<a>custom-character<b>,


<c.sub.2>←<a>custom-character<b>,


<c.sub.3>←custom-character<a>

[0020] The logical operations are accomplished by computations of the following formulas:


<c.sub.1>←<a>+<b>−<a>×<b>,


<c.sub.2>←<a>×<b>,


<c.sub.3>←1−<a>  [Formula 3]

[0021] <Equality Testing>

[0022] Equality testing operations take inputs of secret texts <a>, <b> of two values a, b and yields secret texts <c.sub.1>, <c.sub.2> of truth values c.sub.1, c.sub.2, of a=b, a≠b, respectively. A truth value of 1 represents true and a truth value of 0 represents false.

[0023] The executions of the operations are written as follows.


<c.sub.1>←(<a>custom-character<b>),


<c.sub.2>←(<a>≠<b>)  [Formula 4]

[0024] Concealment, reconstruction, addition, subtraction and multiplication may be accomplished using methods described in Non-patent literature 1. Equality testing may be accomplished using a method described in Ivan Damgard, Matthias Fitzi, Eike, Kitz, Jesper Buus Nielsen and Tomas Toft, “Unconditionally secure constant-rounds multi-party computation for equality, comparison, bits and exponentiation”, TCC, pp. 285-304, 2006 (Reference Literature 1).

[0025] <Pattern Matching>

[0026] Pattern matching is the problem of, given two character strings, i.e. a text t and a pattern, determining whether or not the text t matches a condition described in the pattern p. The text t is a vector in which 0 or more elements of an alphabet Σ={a, b, c, . . . , z} are arranged. The pattern p is a vector that consists of 0 or more alphabetical characters or special symbols. The special symbols include special symbols such as “?” and “*” used in the command LIKE in SQL, which is a language for database operations, and in many shells, which are languages for operating system (OS) software operations. The former special symbol “?” is a special symbol representing any single alphabetic character and is called a wildcard. The latter special symbol “*” is a special symbol representing an alphabetic character string of an arbitrary length greater than or equal to 0 and called a limitless gap. A text t matches a pattern p if the text t is included in a set of character strings that can be represented by the pattern p.

[0027] For example, let a pattern p be p=(a, b, ?, a, b, *). The pattern p can represent t.sub.0=(a, b, c, a, b) and t.sub.1=(a, b, a, a, b, x, x) but cannot represent t.sub.2=(a, b, a, b, a). Therefore, the former two texts t.sub.0 and t.sub.1 matches the pattern p but the latter text t.sub.2 does not match the pattern p.

[0028] A popular method of such pattern matching is to consider a pattern to be an arrangement of partial character strings separated by limitless gaps, compute the positions in which the partial character strings appear in a text, and use information about the positions to determine whether the text matches the pattern.

[0029] Partial character strings s.sub.0, . . . , s.sub.k-1 (si[i]ε(ΣU [?])), which is a pattern p separated by a limitless gap “*”, is considered to be vectors coupled by a limitless gap “*”. Here, k is the number of partial character strings in the pattern p. In this case, it is said that a partial character string s.sub.i appears in a position j in a text t if the following formula is satisfied.


(s.sub.i[0]=p[j]custom-characters.sub.i[0]=?)


custom-character(s.sub.i[1]=p[j+1]custom-characters.sub.i[1]=?)


custom-character . . . custom-character(s.sub.i[λ.sub.i−1]=p[j+λ.sub.i−1]custom-characters.sub.i[λ.sub.i−1]=?)  [Formula 5]

[0030] Here, λ.sub.i is the size of the partial character string

[0031] For example, if a text t=(a, a, b, a, a, b, a) and a pattern p=(*, a, ?, *, b, *), the pattern p can be divided into two partial character strings (a, ?) and (b) by a limitless gap “*”. Therefore, k=2, s.sub.0=(a, ?) and s.sub.1=(b). In this case, s.sub.0=(a, ?) appears in positions j=0, 1, 3, 4 and s.sub.1=(b) appears in positions j=2, 5.

[0032] Embodiments of the present invention will be described below in detail. Note that components that have like functions are given like reference numerals in drawings and repeated description of the components will be omitted.

[0033] As illustrated in FIG. 1, a partial-character-string position detecting device 1 according to an embodiment comprises an input part 10, a first matrix generating part 20, a second matrix generating part 30, a third matrix generating part 40 and an output part 50.

[0034] The partial-character-string position detecting device 1 is a special device configured by installing a special program into a well-known or dedicated computer comprising a central processing unit (CPU), a random access memory (RAM) and other components. The partial-character-string position detecting device 1 executes processes under the control of the CPU, for example. Data input into the partial-character-string position detecting device 1 and data obtained through the processes are stored in the RAM and the data stored in the RAM is read and used in other processes as needed, for example.

[0035] The partial-character-string position detecting device 1 takes inputs of a secret text <t> of a text having a length n, a secret text <p> of a pattern p having a length m, a secret text <c> of a vector c having a length m, and a secret text <E> of a matrix E of m rows and n columns, and outputs a secret text <H> of a matrix H of m rows and n columns.

[0036] In the vector c, c[i]=1 is set if p[i] is not a limitless gap representing a character string having an arbitrary length, otherwise, c[i]=0 is set. Here, p[i] is the i-th element of the pattern p and c[i] is the i-th element of the vector c.

[0037] In the matrix E, E[i][j]=1 is set if c[i]=0 or p[i]=t[j], otherwise, E[i][j]=0 is set. Here, t[i] is the i-th element of the text t. E[i][j] is an element in the i-th row of the j-th column of matrix E.

[0038] In the matrix H, H[i][j]=1 is set if p[i] is the leading element of a partial character string s.sub.λ resulting from separating the pattern p by a limitless gap and the partial character string s.sub.λ appears in the j-th position in the text t, otherwise, H[i][j]=0 is set. Here, H[i][i] is an element in the i-th row of the j-th column of matrix H. Here, λ is an index of a partial character string and λ=0, . . . , L−1, assuming that there are L partial character strings into which the pattern p is separated by limitless gaps.

[0039] A method for detecting the positions of partial character strings according to an embodiment will be described with reference to FIG. 2.

[0040] At step S10, a secret text <t> of a text t, a secret text <p> of a pattern p, a secret text <c> of a vector c, and a secret text <E> of a matrix E are input into the input part 10.

[0041] At step S20, the first matrix generating part 20 generates a secret text <F> of a matrix F of m rows and (n+1) columns in which F[i][j]=E[i][j+i mod n+1] (where it is assumed that E[i][n]=custom-characterc[i]). The matrix F is a matrix E having custom-characterc.sup.T coupled to the last row and being shifted to the left by i in the i-th row, where i=0, . . . , m−1.

[0042] At step S30, the second matrix generating part 30 generates a secret text <F′> of a matrix F′ of m rows and (n+1) columns in which F′[i][j]=1 is set if c[i]=0 or if c[i]=1 and F[k][j]=1 for every k that is successively c[k]=1 when k is incremented by 1 from i, otherwise, F[i][j]=0 is set.

[0043] The matrix F′ may be generated as follows. Let e.sub.j denote the j-th column vector of the matrix F. Vector c and the vector e.sub.j are used to generate a vector e′.sub.j as follows. In the vector e′.sub.j, e′.sub.j[i]=1 is set if c[i]=0custom-character(c[i]=1custom-character(e.sub.j[k]=1 for all successive k that satisfies c[k]=1 at and succeeding c[i])), otherwise, e′.sub.j[i]=0 is set. The matrix F′ is generated as F′[i][j]=e′.sub.j[i] by using the vector e′.sub.j.

[0044] A specific method for computing the vector e′.sub.j is as follows.

[0045] First, consider a binary operation defined by the following formula.


⊕  [Formula 6]


(x.sub.1,f.sub.1)⊕(x.sub.2,f.sub.2):=(x.sub.1custom-characterx.sub.2,f.sub.1custom-character(custom-characterx.sub.1custom-characterf.sub.2))  [Formula 7]

[0046] pi:=(c[i],e.sub.j[i]) is defined and the first element S.sub.i[0] of S.sub.i computed according to the following formula is set as e′.sub.j[i]. This allows a vector e′j that satisfies the condition given above to be generated.


S.sub.i:=(((((p.sub.i⊕p.sub.i+1)⊕p.sub.i+2)⊕p.sub.i+3) . . . )⊕p.sub.m-1)  [Formula 8]

[0047] Using coupling nature of the binary operation


⊕,  [Formula 9]

an approach described in Richard E. Ladner and Michael J. Fischer, “Parallel prefix computation”, J. ACM, vol. 27, no. 4, pp. 831-838, 1980 (Reference literature 2)” can be used. The approach described in Reference literature 2 changes an order relation of a binary operation to increase the efficiency of computation. Using this approach, S.sub.1, . . . , S.sub.m-1 can be computed with binary operations with O(m) times and O(log m) rounds and therefore the vector e′.sub.j can be more efficiently computed.

[0048] At step S40, the third matrix generating part 40 computes <H[i][j]>=<F′[i][j−1 mod (n+1)]>custom-character<c[i]>custom-charactercustom-character<c[i−1> to generate a secret text <H> of the matrix H.

[0049] At step S50, the output part 50 outputs the secret text <H> of matrix H of m rows and n columns. The matrix H indicates that if j exists in the i-th row of the matrix H such that H[i][j]=1, a partial character string s.sub.λ is detected at the j-th position of the text t, where p[i] is the leading element of the partial character string s.sub.λ in the pattern p.

[0050] An example will be used to show that the positions of partial character strings can be detected using the method described above.

[0051] For example, assume that a text t, a pattern p, a vector c and a matrix E given below are input at step S10.

[00001] t = ( a , a , b , a , b , a , b , b , a , b ) , .Math. p T = ( * a b * ? b * ) , c T = ( 0 1 1 0 1 1 0 ) , .Math. E = ( 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 1 0 0 0 1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 ) [ Formula .Math. .Math. 10 ]

[0052] In this example, n=10 and m=7.

[0053] At step S20, a matrix F given below is generated.

[00002] F = ( 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 1 0 0 1 1 0 1 0 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 0 1 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 ) [ Formula .Math. .Math. 11 ]

[0054] At step S30, a matrix F′ given below is generated.

[00003] F = ( 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 1 0 0 0 1 0 1 0 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 0 0 0 1 0 1 0 1 1 0 1 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 ) [ Formula .Math. .Math. 12 ]

[0055] At step S40, a matrix H given below is generated.

[00004] H = ( 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ) [ Formula .Math. .Math. 13 ]

[0056] For example, partial character strings (p[4], p[5])=(?, b) in a pattern p=(*, a, b, *, ?, b, *) appear at t[1]=(a, b), t[3]=(a, b), t[5]=(a, b), t[6]=(b, b), and t[8]=(a, b) in a text t=(a, a, b, a, b, a, b, b, a, b). It can be seen that in the matrix H, elements H[4][1], H[4][3], H[4][5], H[4][6], H[4][8] are 1 of the vector H[4]=(0 1 0 1 0 1 1 0 1 0) and the other elements are 0. In this way, the positions in which partial character strings in the pattern p appear can be detected by using the matrix H.

Effects of the Invention

[0057] The partial character string position detecting technique according to the present invention is capable of detecting the positions in which partial character strings included in a pattern appear in a text in O(log m) rounds with a communication amount of O(mn) when results of character-by-character matching between the text and the pattern are given.

KEY POINTS OF THE INVENTION

[0058] According to the present invention, the positions in which partial character strings included in a pattern appear in a text are computed at once using the results of character-by-character matching, rather than computing the positions on a partial character sting-by-partial character string basis. Consequently, the processing that would require an amount of communication of Ω(m.sup.3n) if the positions are computed on a partial character string-by-partial character string basis can be accomplished with an amount of communication of O(mn).

[0059] It would be understood that the present invention is not limited to the embodiments described above and modifications can be made without departing from the spirit of the present invention. The operations described above may be performed not only in time sequence as is written but also in parallel or individually, depending on the throughput of the devices that perform the processes or requirements.

[0060] [Program and Recording Media]

[0061] If the processing functions of the devices described in the descriptions of the above embodiments are implemented by a computer, processing of the function that each device needs to include is described in a program. The program is executed on the computer to implement the processing functions described above on the computer.

[0062] The program describing the processing can be recorded on a computer-readable recording medium. The computer-readable recording medium may be any medium such as a magnetic recording device, an optical disc, a magneto-optical recording medium, and a semiconductor memory, for example.

[0063] The program may be distributed, for example, by selling, transferring, or lending portable recording media on which the program is recorded, such as DVDs or CD-ROMs. The program may be stored on a storage device of a server computer and transferred from the server computer to other computers over a network, thereby distributing the program.

[0064] A computer that executes the program first stores the program recorded on a portable recording medium or the program transferred from a server computer into a storage device of the computer. When the computer executes the processes, the computer reads the program stored in the storage device of the computer and executes the processes according to the read program. In another mode of execution of the program, the computer may read the program directly from a portable recording medium and may execute the processes according to the program or may execute the processes according to the program each time the program is transferred from the server computer to the computer. Alternatively, the processes may be executed using a so-called ASP (Application Service Provider) service in which the program is not transferred from a server computer to the computer but processing functions are implemented only by instructions to execute the program and acquisition of the results of the execution. It should be noted that the program in this mode comprises information that is made available for use in processing by an electronic computer and is equivalent to a program (such as data that is not direct commands to the computer but has the nature of defining processing performed by the computer).

[0065] While a given program is executed on a computer to configure the present device in this mode, at least part of the processes may be implemented by hardware.