Quantum machine learning algorithm for knowledge graphs

11562278 · 2023-01-24

Abstract

A method of performing an inference task on a knowledge graph comprising semantic triples of entities, wherein entity types are subject, object and predicate, and wherein each semantic triple comprises one of each entity type, using a quantum computing device, wherein a first entity of a first type and a second entity of a second type are given and the inference task is to infer a third entity of the third type. By performing specific steps and choosing values according to specific prescriptions, an efficient and resource-saving method is developed that utilizes the power of quantum computing systems for inference tasks on large knowledge graphs. An advantageous value for a cutoff threshold for a cutoff based on singular values of a singular value tensor decomposition is prescribed, and a sequence of steps is developed in which only the squares of the singular values are of consequence and their signs are not.

Claims

1. A computer-implemented method of performing an inference task on a knowledge graph comprising semantic triples of entities, wherein entity types are subject, object and predicate, and wherein each semantic triple comprises one of each entity type, using a quantum computing device, wherein a first entity of a first type and a second entity of a second type are given and the inference task is to infer a third entity of a third type, comprising at least the steps of: providing a query comprising the first entity and the second entity; modelling the knowledge graph as a partially observed tensor $\hat{\chi}$ in a classical, non-transitory computer-readable memory structure; providing a cutoff threshold, $\tau$; creating, from the partially observed tensor $\hat{\chi}$, a density operator in a quantum random access memory, qRAM, on the quantum computing device; preparing a unitary operator, $U$, based on the created density operator, comprising states on a clock register, C; creating a first entity state $|\chi_s^{(1)}\rangle$ indicating the first entity on an input data register of the qRAM, wherein the first entity state is entangled with a maximally entangled clock register; applying the prepared unitary operator to at least the first entity state; performing thereafter a quantum phase estimation on the clock register, C; performing thereafter a computation on the clock register, C, to recover singular values; creating an auxiliary qubit in an auxiliary register, R, which is entangled with the first entity state resulting from the creating; wherein the auxiliary register, R, has a first eigenstate $|1\rangle_R$ and a second eigenstate $|0\rangle_R$; performing, on a result of the computation to recover the singular values, a singular value projection conditioned on the state of the auxiliary register, R, such that eigenstates whose squared singular values are to one side of the squared cutoff threshold, $\tau^2$, are entangled with the first eigenstate $|1\rangle_R$ of the auxiliary register, R, and such that eigenstates whose squared values are to another side of the squared cutoff threshold, $\tau^2$, or equal to the squared cutoff threshold, $\tau^2$, are entangled with the second eigenstate $|0\rangle_R$ of the auxiliary register, R; measuring the auxiliary register, R, and post-selecting one of the two eigenstates $|0\rangle_R$; tracing out the clock register, C; measuring the result thereof in a canonical basis of the input data register, wherein the canonical basis comprises tensor products of a basis connected to the second entity type and of a basis connected to the third entity type; post-selecting the second entity in the basis connected to the second entity type to infer the third entity.

2. The method of claim 1, wherein the partially observed tensor $\hat{\chi}$ is obtained such that, for each entry of the partially observed tensor $\hat{\chi}$, the entry is, with a probability $p$: directly proportional to a corresponding entry of a complete tensor $\chi$ modelling a complete knowledge graph; and equal to 0 with a probability of $1-p$, with $p$ being smaller than 1.

3. The method of claim 2, wherein the cutoff threshold $\tau$ is chosen as smaller than or equal to a quantity which is inversely proportional to the probability $p$.

4. The method of claim 2, wherein the probability $p$ is chosen to be larger than or equal to a maximum value out of a set of values.

5. The method of claim 4, wherein the set of values comprises at least a value of 0.22.

6. The method of claim 4, wherein the partially observed tensor $\hat{\chi}$ is expressible as a sum of the complete tensor $\chi$ and a noise tensor $\mathcal{N}$, and wherein a desired value $\tilde{\epsilon}>0$ is defined such that the Frobenius norm $\|\cdot\|_F$ of a rank-$r$ approximation $\mathcal{N}_r$ of the noise tensor $\mathcal{N}$ is bounded such that $\|\mathcal{N}_r\|_F\le\tilde{\epsilon}\|\mathcal{A}\|_F$, and wherein the set of values comprises at least one value that is proportional to $r$ and inversely proportional to $\tilde{\epsilon}$ to the $n$-th power, with $n$ integer and $n\ge 1$.

7. The method of claim 6, wherein the set of values comprises at least one value that is proportional to $r$ and that is inversely proportional to the square of $\tilde{\epsilon}$.

8. The method of claim 6, wherein the set of values comprises at least one value that is proportional to a square root of $r$ and that is inversely proportional to $\tilde{\epsilon}$.

9. The method of claim 6, wherein the set of values comprises at least one value that is independent of $r$ and that is inversely proportional to the square of $\tilde{\epsilon}$.

10. A computing system comprising a classical computing device and a quantum computing device, wherein the computing system is configured to perform the method according to claim 1.

Description

BRIEF DESCRIPTION

(1) FIG. 1 shows a schematic flow diagram illustrating a method according to an embodiment of the first aspect of the present invention;

(2) FIG. 2 shows a schematic block diagram of a computing system according to an embodiment of the second aspect of the present invention;

(3) FIG. 3 shows results of measurements of the performance of the present method; and

(4) FIG. 4 shows results of measurement of the present method.

DETAILED DESCRIPTION

(5) Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific embodiments shown and described without departing from the scope of the present invention. Generally, this application is intended to cover any adaptations or variations of the specific embodiments discussed herein.

(6) FIG. 1 shows a schematic flow diagram illustrating a method according to an embodiment of the first aspect of the present invention.

(7) This description contains two parts. The first part contributes to the classical binary tensor sparsification method. In particular, the first binary tensor sparsification condition is derived under which the original tensor can be well approximated by a truncated (or: projected) tensor SVD of its subsampled tensor.

(8) The second part contributes to the method of performing knowledge graph inference on universal quantum computers. In order to handle the tensorized data, a quantum tensor contraction subroutine is described. Then, a quantum sampling method on knowledge graphs using quantum principal component analysis, quantum phase estimation and quantum singular value projection is described. The runtime complexity is analyzed, and it is shown that this sampling-based quantum computation method provides exponential acceleration with respect to the size of the knowledge graph during inference.

(9) All the state-of-the-art algorithms for statistical relational learning on knowledge graphs are implemented on classical computing hardware, e.g., CPUs or GPUs. The major difference between a classical approach and a quantum approach is that the classical algorithm is learning-based, e.g., it back-propagates the gradients of a loss function, while the proposed quantum algorithm is sampling-based. The present method for implicit knowledge inference on knowledge graphs is implemented at least by measuring the quantum states returned by the quantum algorithm, without requiring any particular loss function or gradient update rules.

(10) In the first part, the conditions are shown under which the classical tensor singular value decomposition (tSVD) can be applied to recover a subsampled tensor. These conditions ensure that the quantum counterpart is feasible and performs well in comparison with benchmarking classical algorithms. The second part explains the method of implicit knowledge inference from tensorized data on universal quantum computers. Furthermore, the runtime complexity of the quantum method is analyzed.

(11) As an overview, first some theoretical and practical foundations and considerations for the method are described, and then the method according to an embodiment of the first aspect of the present invention is described in more detail. In addition, we also incidentally describe a computing system according to an embodiment of the second aspect of the present invention.

(12) Part 1: Classical Tensor Singular Value Decomposition

(13) First, singular value decomposition (SVD) of matrices is described. Then, a tensor SVD is introduced and it is shown that a given tensor can be reconstructed with a small error from the low-rank tensor SVD of the subsampled tensor.

(14) The singular value decomposition (SVD) can be defined as follows: let $A\in\mathbb{R}^{m\times n}$; the SVD is a factorization of $A$ of the form $A=U\Sigma V^T$, where $\Sigma$ is a rectangular diagonal matrix with the singular values on the diagonal, and $U\in\mathbb{R}^{m\times m}$ and $V\in\mathbb{R}^{n\times n}$ are orthogonal matrices with $U^TU=UU^T=I_m$ and $V^TV=VV^T=I_n$, wherein $I_m$ is an $m\times m$ identity matrix.

(15) An $N$-way tensor is defined as $\mathcal{A}=(\mathcal{A}_{i_1 i_2\ldots i_N})\in\mathbb{R}^{d_1\times d_2\times\cdots\times d_N}$, where $d_k$ is the $k$-th dimension. Given two tensors $\mathcal{A}$ and $\mathcal{B}$ with the same dimensions, the inner product is defined as

(16) $$\langle\mathcal{A},\mathcal{B}\rangle_F:=\sum_{i_1=1}^{d_1}\cdots\sum_{i_N=1}^{d_N}\mathcal{A}_{i_1 i_2\ldots i_N}\,\mathcal{B}_{i_1 i_2\ldots i_N}.$$

(17) The Frobenius norm is defined as $\|\mathcal{A}\|_F:=\sqrt{\langle\mathcal{A},\mathcal{A}\rangle_F}$. The spectral norm $\|\mathcal{A}\|_\sigma$ of the tensor $\mathcal{A}$ is defined as
$$\max\{\mathcal{A}\otimes_1 x_1\cdots\otimes_N x_N \mid x_k\in S^{d_k-1},\ k=1,\ldots,N\},\qquad(1)$$
where the tensor-vector product is defined as

(18) $$\mathcal{A}\otimes_1 x_1\cdots\otimes_N x_N:=\sum_{i_1=1}^{d_1}\cdots\sum_{i_N=1}^{d_N}\mathcal{A}_{i_1 i_2\ldots i_N}\,x_{1 i_1}x_{2 i_2}\cdots x_{N i_N}$$
and $S^{d_k-1}$ denotes the unit sphere in $\mathbb{R}^{d_k}$.
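The definitions above can be checked numerically. The following is a minimal NumPy sketch, not part of the described method; the function names and the random test tensor are illustrative assumptions. It evaluates the inner product, the Frobenius norm and the tensor-vector product for unit vectors. Any such value is bounded in magnitude by the spectral norm, which is the maximum over all unit vectors and is not computed here.

```python
import numpy as np

def inner_product(a, b):
    """Inner product <A, B>_F of two equally shaped tensors."""
    return float(np.sum(a * b))

def frobenius_norm(a):
    """Frobenius norm ||A||_F = sqrt(<A, A>_F)."""
    return float(np.sqrt(inner_product(a, a)))

def tensor_vector_product(a, vectors):
    """Contract mode k of `a` with vectors[k] for every k; returns a scalar."""
    out = a
    for x in vectors:
        # Contract the current leading mode with x; the next mode becomes leading.
        out = np.tensordot(x, out, axes=([0], [0]))
    return float(out)

# Illustrative data (assumption): a random binary 3-way tensor.
rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(3, 4, 5)).astype(float)
xs = [rng.normal(size=d) for d in A.shape]
xs = [x / np.linalg.norm(x) for x in xs]  # points on the unit spheres S^{d_k - 1}
value = tensor_vector_product(A, xs)
```

Because the spectral norm never exceeds the Frobenius norm, `abs(value)` is at most `frobenius_norm(A)` for any choice of unit vectors.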

(19) In the following, a tensor singular value decomposition (tensor SVD) is described. It is defined in analogy to the matrix singular value decomposition and is described in detail e.g. in “Chen et al.”.

(20) Definition 1. If a tensor $\mathcal{A}\in\mathbb{R}^{d_1\times d_2\times\cdots\times d_N}$ can be written as a sum of rank-1 outer product tensors $\mathcal{A}=\sum_{i=1}^{R}\sigma_i\,u_1^{(i)}\otimes u_2^{(i)}\otimes\cdots\otimes u_N^{(i)}$, with singular values $\sigma_1\ge\sigma_2\ge\cdots\ge\sigma_R$ and $\langle u_k^{(i)},u_k^{(j)}\rangle=\delta_{ij}$ for $k=1,\ldots,N$, then $\mathcal{A}$ has a tensor singular value decomposition with rank $R$.

(21) In real-world applications, only part of the non-zero entries of a given tensor $\mathcal{A}$ can be observed, and the task is to infer unobserved non-zero entries with high probability. This corresponds to recommending items to users given an observed preference matrix, or to implicit knowledge inference given a partially observed relational database. In other words, a “partially observed” tensor representing a knowledge graph is only partially known, since not all semantic triples are known a priori, and the inference task is to infer interesting entities (subject, predicate, or object) of semantic triples which are not contained in the “partially observed” tensor but which would be contained in a hypothetical complete tensor $\mathcal{A}$. The partially observed tensor is herein also designated as sub-sampled or sparsified, and is denoted $\hat{\mathcal{A}}$. Particularly, without further specifying the dimensionality of the tensor, the following subsampling and rescaling scheme proposed in “Achlioptas et al.” is used:

(22) $$\hat{\mathcal{A}}_{i_1 i_2\ldots i_N}=\begin{cases}\dfrac{\mathcal{A}_{i_1 i_2\ldots i_N}}{p} & \text{with probability } p,\\[4pt] 0 & \text{otherwise.}\end{cases}\qquad(2)$$

(23) This means that the non-zero elements of a hypothetical complete tensor $\mathcal{A}$ are independently and identically sampled with the probability $p$ and rescaled afterwards. The sub-sampled tensor can be rewritten as $\hat{\mathcal{A}}=\mathcal{A}+\mathcal{N}$, where $\mathcal{N}$ is a noise tensor. Entries of $\mathcal{N}$ are thus independent random variables with distribution

(24) $$\mathcal{N}_{i_1\ldots i_N}=\begin{cases}\left(\dfrac{1}{p}-1\right)\mathcal{A}_{i_1\ldots i_N} & \text{with probability } p,\\[4pt] -\mathcal{A}_{i_1\ldots i_N} & \text{with probability } 1-p.\end{cases}\qquad(3)$$
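The subsampling and rescaling scheme of Eq. 2 and the resulting noise tensor of Eq. 3 can be illustrated with a short NumPy sketch. The function name `subsample`, the tensor shape and the value p = 0.5 are assumptions chosen only for illustration.

```python
import numpy as np

def subsample(a, p, rng):
    """Keep each entry with probability p and rescale it by 1/p; set it to 0 otherwise (Eq. 2)."""
    mask = rng.random(a.shape) < p
    return np.where(mask, a / p, 0.0)

rng = np.random.default_rng(1)
A = rng.integers(0, 2, size=(6, 5, 4)).astype(float)  # hypothetical complete binary tensor
p = 0.5
A_hat = subsample(A, p, rng)

# Noise tensor N = A_hat - A; on the non-zero entries of A it takes the
# values (1/p - 1) or -1, in line with Eq. 3.
noise = A_hat - A
```

Zero entries of the complete tensor remain zero after subsampling, so only the non-zero entries are effectively sampled, as stated in the text.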

(25) In the following, a 3-dimensional semantic tensor $\chi$ as one example of a tensor $\mathcal{A}$ is of particular interest. The present method builds on the assumption that the original semantic tensor $\chi$ modelling the complete knowledge graph (or, from a different viewpoint, the tensor $\chi$ as the complete knowledge graph) has a low-rank approximation, denoted $\chi_r$, with small rank $r$.

(26) This is a plausible assumption if the knowledge graph contains global and well-defined relational patterns, as has been shown in “Nickel et al.”. Therefore, the question may be posed under what conditions the original tensor $\chi$ can be reconstructed approximately from the low-rank approximation of the subsampled semantic tensor $\hat{\chi}$ derived from the incomplete knowledge graph. In the following, $\hat{\chi}_r$ denotes the rank-$r$ approximation of the subsampled tensor $\hat{\chi}$; and $\hat{\chi}_{|\cdot|>\tau}$ denotes the projection of $\hat{\chi}$ onto the subspaces whose absolute singular values are larger than a predefined threshold $\tau>0$.
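The difference between a rank-r truncation and a threshold projection can be sketched classically. The sketch below is a simplification and not the tensor SVD of Definition 1: as an assumption, it applies an ordinary matrix SVD to the first-mode unfolding of the tensor (subjects as rows, predicate-object pairs as columns). All function names are hypothetical.

```python
import numpy as np

def unfold_first_mode(chi):
    """Reshape a (d1, d2, d3) tensor into a d1 x (d2*d3) matrix."""
    return chi.reshape(chi.shape[0], -1)

def rank_r_approximation(m, r):
    """Keep the r largest singular values of m."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u[:, :r] * s[:r]) @ vt[:r]

def project_above_threshold(m, tau):
    """Keep every singular value larger than tau; the rank is not fixed in advance."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    keep = s > tau  # NumPy returns non-negative singular values, so |s| = s
    return (u[:, keep] * s[keep]) @ vt[keep]

rng = np.random.default_rng(2)
chi_hat = rng.integers(0, 2, size=(4, 2, 3)).astype(float)  # illustrative tensor
M = unfold_first_mode(chi_hat)
M_r = rank_r_approximation(M, 2)
M_tau = project_above_threshold(M, 1.0)
```

The threshold variant needs no comparison between individual singular values, which is the property the description relies on for the quantum projection.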

(27) Thus, with reference to FIG. 1, in a first step S10, a knowledge graph is modelled as a partially observed tensor $\hat{\chi}$ in a classical computer-readable memory structure.

(28) This classical computer-readable memory structure may be any volatile or non-volatile computer-readable memory structure such as a hard drive, a solid state drive and/or the like. The computer-readable memory structure may be part of a classical computing device which in turn may be part of a computing system for performing an inference task on a knowledge graph.
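As a minimal sketch of step S10, observed semantic triples can be stored as a binary 3-way array. The entity names, the helper `triples_to_tensor` and the dense array layout are illustrative assumptions; a practical system would likely use a sparse structure.

```python
import numpy as np

def triples_to_tensor(triples, subjects, predicates, objects):
    """Return a {0, 1} tensor of shape (d1, d2, d3) with a 1 at every observed triple."""
    s_idx = {s: i for i, s in enumerate(subjects)}
    p_idx = {p: i for i, p in enumerate(predicates)}
    o_idx = {o: i for i, o in enumerate(objects)}
    chi = np.zeros((len(subjects), len(predicates), len(objects)))
    for s, p, o in triples:
        chi[s_idx[s], p_idx[p], o_idx[o]] = 1.0
    return chi

# Hypothetical toy knowledge graph.
subjects = ["Alice", "Bob"]
predicates = ["livesIn", "worksFor"]
objects = ["Berlin", "Munich", "Acme"]
triples = [("Alice", "livesIn", "Berlin"), ("Bob", "worksFor", "Acme")]

chi_hat = triples_to_tensor(triples, subjects, predicates, objects)
```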

(29) FIG. 2 shows a schematic block diagram of a computing system 1000 for performing an inference task on a knowledge graph according to an embodiment of the second aspect of the present invention.

(30) FIG. 2 shows that the computing system 1000 comprises two main components: a classical computing device 100 as well as a quantum computing device 200 which are connected by an exchange interface 300 therebetween. The quantum computing device 200 may be implemented in any known way, for example as has been described in “Ma et al.”. Moreover, the computing system 1000 comprises an input interface 10, in particular for receiving an inference task (or: query) with a first entity of a first entity type and a second entity of a second entity type, e.g. (s, p, ?) or (?, p, o) or (s, ?, o) and an output interface 90 for outputting the inferred third entity of the third entity type o, s, or p, respectively.

(31) It shall be understood that the method that is being described with respect to FIG. 1 may be performed using a computing system according to an embodiment of the second aspect of the present invention, in particular the computing system 1000 of FIG. 2 and/or that the computing system 1000 may be adapted to perform the method according to an embodiment of the first aspect of the present invention, in particular the method according to FIG. 1.

(32) In further description of the method of FIG. 1, in the next two theorems, subsample conditions are shown under which the original semantic tensor $\chi$ can be reconstructed approximately from $\hat{\chi}_r$ or $\hat{\chi}_{|\cdot|>\tau}$. The analysis was conducted by bounding the reconstruction errors $\|\chi-\hat{\chi}_r\|_F$ and $\|\chi-\hat{\chi}_{|\cdot|>\tau}\|_F$, respectively. Bounding the reconstruction error ensures a good implicit knowledge inference.

(33) Suppose that we infer on a knowledge graph given the query (s, p, ?), i.e. subject s and predicate p are given, and the inference task at hand is to infer on the object o which completes the semantic triplet—or, in other words, which makes the semantic statement (s, p, o) true.

(34) In the present context, the subjects are a first entity type which corresponds to a dimension of size $d_1$ in $\chi$ and $\hat{\chi}$; the predicates are a second entity type which corresponds to a dimension of size $d_2$ in $\chi$ and $\hat{\chi}$; and the objects are a third entity type which corresponds to a dimension of size $d_3$ in $\chi$ and $\hat{\chi}$.

(35) Then, given an incomplete semantic triple (or: query, or: inference task) such as (s, p, ?), the running time for inferring the correct objects to the query scales, in classical systems, as $\mathcal{O}(d_3)$. This is because the same algorithm is repeated at least $d_3$ times in order to determine possible answers, which wastes considerable computing power, especially as the sizes of knowledge graphs keep growing.

(36) Advantageously, only the top-$n$ returns from the reconstructed tensor $\hat{\chi}$, written as $\hat{\chi}_{sp1},\ldots,\hat{\chi}_{spn}$, are read out, where $n$ is a small integer corresponding to the commonly used Hits@n metric (see e.g. “Ma et al.”). The inference is called successful if the correct object corresponding to the query can be found in the returned list $\hat{\chi}_{sp1},\ldots,\hat{\chi}_{spn}$. It can be proven that the probability of a successful inference is high if the reconstruction error is small enough. Therefore, in the following we provide sub-sampling conditions under which the reconstruction error is unexpectedly small.
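The top-n readout and the success criterion can be sketched as follows. The score vector stands for a reconstructed row for a fixed query (s, p, ?); its values, and the function names, are made up for illustration.

```python
import numpy as np

def top_n_objects(scores, n):
    """Indices of the n largest scores, best first."""
    order = np.argsort(-scores, kind="stable")
    return order[:n].tolist()

def hit_at_n(scores, correct_object, n):
    """True if the correct object index is among the top-n returns."""
    return correct_object in top_n_objects(scores, n)

# Hypothetical reconstructed entries for the query (s, p, ?) over d3 = 5 objects.
scores = np.array([0.05, 0.70, 0.10, 0.60, 0.00])
top3 = top_n_objects(scores, 3)  # -> [1, 3, 2]
```

Averaging `hit_at_n` over a set of test queries gives a Hits@n figure of the kind referred to in the text.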

(37) Without further specifying the dimension of the tensor, let us consider a high-dimensional tensor $\mathcal{A}$. Theorem 1 gives the condition for the subsample probability under which the original tensor $\mathcal{A}$ can be reconstructed approximately from $\hat{\mathcal{A}}_r$.

(38) Theorem 1. Let $\mathcal{A}\in\{0,1\}^{d_1\times d_2\times\cdots\times d_N}$. Suppose that $\mathcal{A}$ can be well approximated by its rank-$r$ tensor SVD $\mathcal{A}_r$. Using the subsampling scheme defined in Eq. 2 with the sample probability

(39) $$p\ge\max\left\{0.22,\ \frac{8r\left(\log\!\left(\frac{2N}{N_0}\right)\sum_{k=1}^{N}d_k+\log\frac{2}{\delta}\right)}{\left(\tilde{\epsilon}\,\|\mathcal{A}\|_F\right)^2}\right\},$$
where $N_0=\log\frac{3}{2}$, the original tensor $\mathcal{A}$ can be reconstructed from the truncated tensor SVD of the subsampled tensor $\hat{\mathcal{A}}$. The error satisfies $\|\mathcal{A}-\hat{\mathcal{A}}_r\|_F\le\epsilon\|\mathcal{A}\|_F$ with probability at least $1-\delta$, where $\epsilon$ is a function of $\tilde{\epsilon}$. In particular, $\tilde{\epsilon}$ together with the sample probability controls the norm of the noise tensor.

(40) In particular it is desired that the Frobenius norm $\|\cdot\|_F$ of a rank-$r$ approximation $\mathcal{N}_r$ of the noise tensor $\mathcal{N}$ is bounded such that $\|\mathcal{N}_r\|_F\le\tilde{\epsilon}\|\mathcal{A}\|_F$.
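The lower bound on the sample probability in Theorem 1 is a closed-form expression and can be evaluated directly. The sketch below assumes natural logarithms, which the text does not state explicitly; the function name and the example values are illustrative only.

```python
import math

def theorem1_min_probability(dims, r, eps_tilde, frob_norm, delta):
    """Right-hand side of the Theorem 1 condition on p (natural logs assumed)."""
    n = len(dims)
    n0 = math.log(3 / 2)
    numerator = 8 * r * (math.log(2 * n / n0) * sum(dims) + math.log(2 / delta))
    return max(0.22, numerator / (eps_tilde * frob_norm) ** 2)

# Illustrative values (assumptions): a 10^4 x 10^2 x 10^4 binary tensor with 10^8
# non-zero entries, so that ||A||_F = sqrt(10^8) = 10^4.
p_min = theorem1_min_probability(
    dims=(10_000, 100, 10_000), r=10, eps_tilde=0.5, frob_norm=1e4, delta=0.1
)
```

A returned value above 1 would indicate that the condition cannot be met for the chosen rank and tolerance, since a probability cannot exceed 1.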

(41) Before the reconstruction error caused by the tensor $\hat{\mathcal{A}}_{|\cdot|>\tau}$ is described, it is briefly discussed why this tensor is introduced. Note that quantum algorithms are fundamentally different from classical algorithms. For example, classical algorithms for matrix factorization approximate a low-rank matrix by projecting it onto a subspace spanned by the eigenspaces possessing the top-$r$ singular values with predefined small $r$. Quantum methods, e.g., quantum singular value estimation, on the other hand, can read and store all singular values of a unitary operator into a quantum register.

(42) However, singular values stored in the quantum register cannot be read out and compared simultaneously, since a quantum state collapses after one measurement; measuring the singular values one by one would also destroy the quantum advantage. Therefore, we perform a projection onto the union of the operator's subspaces whose singular values are larger than a threshold; this step can be implemented on the quantum register without destroying the superposition. Moreover, since quantum principal component analysis is used herein as a subroutine which ignores the sign of singular values during the projection, the reconstruction error given by $\hat{\mathcal{A}}_{|\cdot|\ge\tau}$ is analyzed for the quantum algorithm.

(43) The following Theorem 2 gives the condition under which $\mathcal{A}$ can be reconstructed approximately from $\hat{\mathcal{A}}_{|\cdot|>\tau}$.

(44) Theorem 2. Let $\mathcal{A}\in\{0,1\}^{d_1\times d_2\times\cdots\times d_N}$. Suppose that $\mathcal{A}$ can be well approximated by its rank-$r$ tensor SVD $\mathcal{A}_r$. Using the subsampling scheme defined in Eq. 2 with the sample probability

(45) $$p\ge\max\left\{0.22,\ p_1:=\frac{l_1C_0}{\left(\tilde{\epsilon}\,\|\mathcal{A}\|_F\right)^2},\ p_2:=\frac{rC_0}{\left(\tilde{\epsilon}\,\|\mathcal{A}\|_F\right)^2},\ p_3:=\frac{\sqrt{2rC_0}}{\epsilon_1\tilde{\epsilon}\,\|\mathcal{A}\|_F}\right\},$$
wherein $p<1$, wherein

(46) $$C_0=8\left(\log\!\left(\frac{2N}{N_0}\right)\sum_{k=1}^{N}d_k+\log\frac{2}{\delta}\right),$$
$N_0=\log\frac{3}{2}$, and $l_1$ denotes the largest index of singular values of the tensor $\hat{\mathcal{A}}$ with $\sigma_{l_1}\ge\tau$, and when choosing the threshold as

(47) $$\tau\le\frac{\sqrt{2C_0}}{p\,\tilde{\epsilon}},$$
the original tensor $\mathcal{A}$ can be reconstructed from the projected tensor SVD of $\hat{\mathcal{A}}$. The error satisfies $\|\mathcal{A}-\hat{\mathcal{A}}_{|\cdot|\ge\tau}\|_F\le\epsilon\|\mathcal{A}\|_F$ with probability at least $1-\delta$, where $\epsilon$ is a function of $\tilde{\epsilon}$ and $\epsilon_1$. In particular, $\tilde{\epsilon}$ together with $p_1$ and $p_2$ determines the norm of the noise tensor, and $\epsilon_1$ together with $p_3$ controls the values of those singular values of $\hat{\mathcal{A}}$ that are located outside the projection boundary.

(48) Thus, as shown in the equation above, the threshold $\tau$ is advantageously chosen as smaller than or equal to a quantity which is inversely proportional to the probability $p$ and/or smaller than or equal to a quantity which is inversely proportional to $\tilde{\epsilon}$.

(49) On the other hand, the probability $p$ is advantageously chosen to be larger than or equal to a maximum value out of a set of values, which set of values in the foregoing example comprises four values: 0.22, $p_1$, $p_2$, and $p_3$. In other words, $p$ will always be at least 0.22. Instead of 0.22, another value in the range between 0.2 and 0.24 may be chosen. However, experiments by the inventors have shown that 0.22 is an ideal value in order to ensure desirable properties for the threshold $\tau$. Specifically, experiments as well as a numerical proof by the inventors have shown that 0.22 is the minimal subsample probability at which a subsampled tensor can be reconstructed with a bounded error.

(50) Thus, the set of values comprises:

(51) a) at least one value in the range between 0.2 and 0.24, preferably between 0.21 and 0.23, most preferably 0.22;

(52) b) at least one value $p_2$ that is proportional to $r$ and inversely proportional to $\tilde{\epsilon}$ to the $n$-th power, with $n$ integer and $n\ge 1$, in particular proportional to $r$ and inversely proportional to the square of $\tilde{\epsilon}$;

(53) c) at least one value $p_3$ that is proportional to a square root of $r$ and that is inversely proportional to $\tilde{\epsilon}$;

(54) and/or

(55) d) at least one value $p_1$ that is independent of $r$ and that is inversely proportional to the square of $\tilde{\epsilon}$.
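The set of values and the threshold bound from Theorem 2 can be evaluated with a few lines of Python. This is a sketch under assumptions: natural logarithms are used, the expressions follow the forms of $p_1$, $p_2$, $p_3$ and $\tau$ given in Theorem 2, and all names and numeric inputs are hypothetical.

```python
import math

def theorem2_quantities(dims, r, l1, eps_tilde, eps_1, frob_norm, delta):
    """Return C0, p1, p2, p3 and the resulting lower bound on p (natural logs assumed)."""
    n = len(dims)
    c0 = 8 * (math.log(2 * n / math.log(3 / 2)) * sum(dims) + math.log(2 / delta))
    scale = eps_tilde * frob_norm
    p1 = l1 * c0 / scale**2                          # independent of r
    p2 = r * c0 / scale**2                           # proportional to r
    p3 = math.sqrt(2 * r * c0) / (eps_1 * scale)     # proportional to sqrt(r)
    return {"c0": c0, "p1": p1, "p2": p2, "p3": p3, "p_min": max(0.22, p1, p2, p3)}

def tau_upper_bound(c0, p, eps_tilde):
    """Upper bound on the cutoff threshold; inversely proportional to p and eps_tilde."""
    return math.sqrt(2 * c0) / (p * eps_tilde)

# Illustrative inputs (assumptions).
q = theorem2_quantities(
    dims=(10_000, 100, 10_000), r=10, l1=12, eps_tilde=0.5, eps_1=0.5,
    frob_norm=1e4, delta=0.1,
)
tau_max = tau_upper_bound(q["c0"], p=q["p_min"], eps_tilde=0.5)
```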

(56) In the statements of Theorems 1 and 2 there are data-dependent parameters $r$ and $l_1$ which are unknown a priori. These parameters can be estimated by explicitly performing a tensor SVD on the original and subsampled tensors. However, in practice, mostly the subsampled tensor is given without knowing the subsample probability. For example, given an incomplete semantic tensor, it is usually not known what percentage of information is missing, and therefore the entries in the incomplete tensor cannot be easily rescaled. Fortunately, unlike the prior art, the present invention provides a rational initial guess for the subsample probability numerically, and conversely an initial guess for the lower rank $r$ and the projection threshold $\tau$ as well.

(57) Part 2: Inference on Knowledge Graphs Using Quantum Computers

(58) Quantum Mechanics

(59) For ease of understanding, the Dirac notation of quantum mechanics as used herein is briefly described. Under Dirac's convention, quantum states can be represented as complex-valued vectors in a Hilbert space $\mathcal{H}$. For example, a two-dimensional complex Hilbert space $\mathcal{H}_2$ can describe the quantum state of a spin-1/2 particle, which provides the physical realization of a qubit.

(60) By default, the basis in $\mathcal{H}_2$ for a spin-1/2 qubit reads $|0\rangle=[1,0]^T$ and $|1\rangle=[0,1]^T$. The Hilbert space of an $n$-qubit system has dimension $2^n$, and its computational basis can be chosen as the canonical basis $|i\rangle\in\{|0\rangle,|1\rangle\}^{\otimes n}$, where $\otimes$ represents the tensor product. Hence any quantum state $|\phi\rangle\in\mathcal{H}_{2^n}$ can be written as a quantum superposition

(61) $$|\phi\rangle=\sum_{i=1}^{2^n}\phi_i\,|i\rangle,$$
wherein the squared coefficients $|\phi_i|^2$ can also be interpreted as the probability of observing the canonical basis state $|i\rangle$ after measuring $|\phi\rangle$ in the canonical basis.

(62) Moreover, $\langle\phi|$ is used to represent the conjugate transpose of $|\phi\rangle$, i.e., $(|\phi\rangle)^\dagger=\langle\phi|$. Given two states $|\phi\rangle$ and $|\psi\rangle$, the inner product on the Hilbert space satisfies $\langle\phi|\psi\rangle^*=\langle\psi|\phi\rangle$. A density matrix is an operator (for a pure state, a projection operator) which is used to describe the statistics of a quantum system. For example, the density operator of the mixed state $|\phi\rangle$ in the canonical basis reads $\rho=\sum_{i=1}^{2^n}|\phi_i|^2\,|i\rangle\langle i|$. Moreover, given two subsystems with density matrices $\rho$ and $\sigma$, the density matrix for the whole system is their tensor product, namely $\rho\otimes\sigma$.
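These notational conventions map directly onto array operations. The following NumPy sketch uses a single-qubit example chosen purely for illustration: it builds a superposition, its measurement probabilities, the pure-state density matrix, the diagonal density operator mentioned in the text, and a two-subsystem density matrix.

```python
import numpy as np

ket0 = np.array([1.0, 0.0], dtype=complex)  # |0> = [1, 0]^T
ket1 = np.array([0.0, 1.0], dtype=complex)  # |1> = [0, 1]^T

phi = (ket0 + 1j * ket1) / np.sqrt(2)       # illustrative superposition
psi = ket0

probabilities = np.abs(phi) ** 2            # |phi_i|^2 for each basis state
rho_pure = np.outer(phi, phi.conj())        # |phi><phi|, a projection operator
rho_diag = np.diag(probabilities).astype(complex)  # sum_i |phi_i|^2 |i><i|
rho_joint = np.kron(rho_pure, rho_diag)     # density matrix of the composite system

# np.vdot conjugates its first argument, so np.vdot(phi, psi) is <phi|psi>.
braket = np.vdot(phi, psi)
```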

(63) The time evolution of a quantum state is generated by the Hamiltonian of the system. The Hamiltonian $H$ is a Hermitian operator with $H^\dagger=H$. Let $|\phi(t)\rangle$ denote the quantum state at time $t$ under the evolution of a time-invariant Hamiltonian $H$. Then, according to the Schrödinger equation, $|\phi(t)\rangle=e^{-iHt}|\phi(0)\rangle$,

(64) where the unitary operator $e^{-iHt}$ can be written as the matrix exponentiation of the Hermitian matrix $H$, i.e.,

(65) $$e^{-iHt}=\sum_{n=0}^{\infty}\frac{(-iHt)^n}{n!}.$$

(66) Eigenvectors of the Hamiltonian $H$, denoted $|u_i\rangle$, also form a basis of the Hilbert space. Then the spectral decomposition of the Hamiltonian $H$ reads $H=\sum_i\lambda_i\,|u_i\rangle\langle u_i|$, where $\lambda_i$ is the eigenvalue or the energy level of the system. Therefore, the evolution operator of a time-invariant Hamiltonian can be rewritten as

(67) $$e^{-iHt}=e^{-it\sum_i\lambda_i|u_i\rangle\langle u_i|}=\sum_i e^{-i\lambda_i t}\,|u_i\rangle\langle u_i|,\qquad(4)$$
where we use the observation $(|u_i\rangle\langle u_i|)^n=|u_i\rangle\langle u_i|$ for $n=1,\ldots,\infty$.

(68) When applying it to an arbitrary initial state $|\phi(0)\rangle$, we obtain $|\phi(t)\rangle=e^{-iHt}|\phi(0)\rangle=\sum_i e^{-i\lambda_i t}\beta_i\,|u_i\rangle$, where $\beta_i$ indicates the overlap between the initial state and the eigenbasis of $H$, i.e., $\beta_i:=\langle u_i|\phi(0)\rangle$. Implementing the time evolution operator $e^{-iHt}$ and simulating the dynamics of a quantum system using universal quantum circuits is a challenging task, since it involves the matrix exponentiation of a possibly dense matrix.
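For a small Hermitian matrix, the spectral form of the evolution operator in Eq. 4 can be compared numerically against the truncated power series. The 2×2 Hamiltonian, the time value and the function names below are illustrative assumptions.

```python
import numpy as np

def evolve(h, state, t):
    """Apply e^{-iHt} via the spectral decomposition of the Hermitian matrix h."""
    lam, u = np.linalg.eigh(h)          # columns of u are the eigenvectors |u_i>
    beta = u.conj().T @ state           # beta_i = <u_i | phi(0)>
    return u @ (np.exp(-1j * lam * t) * beta)

def evolution_operator_series(h, t, terms=40):
    """Truncated power series sum_n (-iHt)^n / n!."""
    out = np.eye(h.shape[0], dtype=complex)
    term = np.eye(h.shape[0], dtype=complex)
    for n in range(1, terms):
        term = term @ (-1j * h * t) / n  # builds (-iHt)^n / n! incrementally
        out = out + term
    return out

h = np.array([[1.0, 0.5], [0.5, -1.0]])      # illustrative Hermitian Hamiltonian
phi0 = np.array([1.0, 0.0], dtype=complex)
phi_t = evolve(h, phi0, t=0.7)
```

Both routes require the full matrix classically, which is why the description calls the simulation of a dense Hamiltonian on quantum circuits a challenging task.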

(69) The present invention concerns a method for the inference on knowledge graphs using a quantum computing device 200. In the following we focus on the semantic tensor $\chi\in\{0,1\}^{d_1\times d_2\times d_3}$, with $d_1$, $d_2$, and $d_3$ defined as above, and let $\hat{\chi}$ denote the partially observed tensor.

(70) Since knowledge graphs contain global relational patterns, $\chi$ can be approximated by a lower-rank tensor $\chi_r$, which in turn can be reconstructed approximately from $\hat{\chi}$ via tensor SVD according to Theorems 1 and 2. Since our quantum method is sampling-based instead of learning-based, without loss of generality we consider sampling the correct objects given the query (s, p, ?) as an example and discuss the runtime complexity of one inference. Herein we therefore designate the given subject as a first entity of a first entity type (“subjects”), the predicate as a second entity of a second entity type (“predicates”) and the unknown object as a third entity of a third entity type (“objects”).

(71) The preference matrix of a recommendation system normally contains multiple nonzero entries in a given user-row; item recommendations are made according to the nonzero entries in the user-row by assuming that the user is ‘typical’. However, in a knowledge graph there might be only one nonzero entry in the row (s, p, ⋅). Therefore, advantageously, in the quantum algorithm for inference on a knowledge graph, triples with the given subject s are sampled and then post-selected on the predicate p. This is a feasible step especially if the number of semantic triples with s as subject and p as predicate is $\mathcal{O}(1)$.
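The sample-then-post-select idea can be emulated classically on a toy tensor. The sketch below is not the quantum algorithm: as an assumption, it uses a matrix SVD of the first-mode unfolding, projects the subject row onto the right singular vectors whose squared singular values exceed $\tau^2$, samples (predicate, object) pairs from the squared amplitudes, and keeps only the draws with the queried predicate. The function name and the data are hypothetical.

```python
import numpy as np

def sample_objects(chi_hat, s, p, tau, n_samples, rng):
    """Count sampled objects for the query (s, p, ?) after post-selecting on predicate p."""
    d1, d2, d3 = chi_hat.shape
    m = chi_hat.reshape(d1, d2 * d3)               # subjects x (predicate, object)
    _, sv, vt = np.linalg.svd(m, full_matrices=False)
    basis = vt[sv**2 > tau**2]                     # only squared singular values matter
    projected = (m[s] @ basis.T) @ basis           # projection of the subject row
    weights = projected**2                         # measurement probabilities (unnormalized)
    counts = np.zeros(d3, dtype=int)
    if weights.sum() == 0:
        return counts                              # nothing survives the projection
    draws = rng.choice(d2 * d3, size=n_samples, p=weights / weights.sum())
    for j in draws:
        i2, i3 = divmod(int(j), d3)
        if i2 == p:                                # post-selection on the predicate
            counts[i3] += 1
    return counts

# Toy tensor (assumption): two subjects with one observed triple each.
chi_hat = np.zeros((2, 2, 3))
chi_hat[0, 1, 2] = 1.0
chi_hat[1, 0, 0] = 1.0
counts = sample_objects(chi_hat, s=0, p=1, tau=0.0, n_samples=200,
                        rng=np.random.default_rng(0))
```

The object index with the highest count is the emulated answer; draws that carry a different predicate are discarded, which mirrors the post-selection step.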

(72) The present method comprises preparing and exponentiating a density matrix derived from the tensorized classical data. One of the challenges of quantum machine learning is loading classical data as quantum states and measuring the states, since reading or writing high-dimensional data from quantum states might obliterate the quantum acceleration. Therefore, the quantum Random Access Memory (qRAM) technique was developed (see “Giovannetti et al.”), which can load classical data into quantum states with exponential acceleration. For details about the qRAM technique, reference is made to “Giovannetti et al.”. The basic idea of the present method is to project the observed data onto the eigenspaces of $\hat{\chi}$ whose corresponding singular values have an absolute value larger than a threshold $\tau$. Therefore, we need to create an operator which can reveal the eigenspaces and singular values of $\hat{\chi}$.

(73) As mentioned in the foregoing, in a step S10, a knowledge graph is modelled as a partially observed tensor {circumflex over (χ)} in a classical computer-readable memory structure 110 of a classical computing device 100, see FIG. 2.

(74) In a step S12, which does not have to be performed in this order necessarily, a cutoff threshold τ is provided, which is preferably determined as has been described in the foregoing.

(75) In a step S20, the following density operator (or: density matrix) is created, on the quantum computing device 200, from {circumflex over (χ)} via a tensor contraction scheme:

(76) $$\rho_{\hat{\chi}^{\dagger}\hat{\chi}} := \sum_{i_2 i_3 i_2' i_3'} \sum_{i_1} \hat{\chi}^{\dagger}_{i_1, i_2 i_3}\, \hat{\chi}_{i_1, i_2' i_3'}\, |i_2 i_3\rangle \langle i_2' i_3'|, \qquad (5)$$
where

(77) $$\sum_{i_1} \hat{\chi}^{\dagger}_{i_1, i_2 i_3}\, \hat{\chi}_{i_1, i_2' i_3'}$$
means tensor contraction along the first dimension (here: the subject dimension since the exemplary inference task is (s, p, ?)); a normalization factor is neglected temporarily.

(78) In particular, ρ.sub.{circumflex over (χ)}.sup.†.sub.{circumflex over (χ)} can be prepared via qRAM (see “Giovannetti et al.”) in time
$$\mathcal{O}(\operatorname{polylog}(d_1 d_2 d_3)) \qquad (6)$$
in the following way: First, the quantum state

(79) $$\sum_{i_1 i_2 i_3} \hat{\chi}_{i_1, i_2 i_3}\, |i_1 i_2 i_3\rangle = \sum_{i_1 i_2 i_3} \hat{\chi}_{i_1, i_2 i_3}\, |i_1\rangle \otimes |i_2\rangle \otimes |i_3\rangle$$
is prepared via qRAM, which can be implemented in time 𝒪(polylog(d.sub.1d.sub.2d.sub.3)), where |i.sub.1⟩⊗|i.sub.2⟩⊗|i.sub.3⟩ represents the tensor product of index registers in the canonical basis.

(80) The corresponding density matrix of the quantum state reads

(81) $$\rho = \sum_{i_1 i_2 i_3} \sum_{i_1' i_2' i_3'} \hat{\chi}_{i_1 i_2 i_3}\, |i_1\rangle \otimes |i_2\rangle \otimes |i_3\rangle \langle i_1'| \otimes \langle i_2'| \otimes \langle i_3'|\, \hat{\chi}^{\dagger}_{i_1' i_2' i_3'}.$$

(82) After preparation, a partial trace implemented on the first index register of the density matrix

(83) $$\operatorname{tr}_1(\rho) = \sum_{i_2 i_3} \sum_{i_2' i_3'} \sum_{i_1} \hat{\chi}_{i_1 i_2 i_3}\, |i_2\rangle \otimes |i_3\rangle \langle i_2'| \otimes \langle i_3'|\, \hat{\chi}^{\dagger}_{i_1 i_2' i_3'} = \sum_{i_2 i_3 i_2' i_3'} \sum_{i_1} \hat{\chi}^{\dagger}_{i_1 i_2 i_3}\, \hat{\chi}_{i_1 i_2' i_3'}\, |i_2 i_3\rangle \langle i_2' i_3'|$$
gives the desired operator ρ.sub.{circumflex over (χ)}.sup.†.sub.{circumflex over (χ)}.
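As a purely classical illustration of the contraction and the partial trace described above, the following NumPy sketch builds the amplitude vector of the state, forms its density matrix, and traces out the first index register. The dimensions, the random data and all names (`chi_hat`, `rho_via_trace`, `rho_via_contraction`) are assumptions made for this example; a real-valued tensor is assumed, so the conjugate transpose reduces to a transpose.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, d3 = 4, 3, 2                    # hypothetical sizes: subjects, predicates, objects
chi_hat = rng.random((d1, d2, d3))      # stand-in for the partially observed tensor

# Amplitude encoding: flatten in (i1, i2, i3) order and normalise to unit length.
psi = chi_hat.reshape(-1) / np.linalg.norm(chi_hat)

# Density matrix of the pure state, reshaped so that the i1 indices are explicit.
rho_full = np.outer(psi, psi.conj()).reshape(d1, d2 * d3, d1, d2 * d3)

# Partial trace over the first index register: sum the entries with i1 == i1'.
rho_via_trace = np.einsum("aiaj->ij", rho_full)

# Direct contraction along the subject dimension, here including the normalisation.
X = chi_hat.reshape(d1, d2 * d3)        # unfolding along the first dimension
rho_via_contraction = X.conj().T @ X / np.linalg.norm(X) ** 2
```

Both routes should yield the same (d.sub.2d.sub.3)×(d.sub.2d.sub.3) matrix with unit trace; on a quantum device the matrix is of course never written out explicitly.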

(84) Suppose that {circumflex over (χ)} has a tensor SVD approximation with

(85) $$\hat{\chi} \approx \sum_{i=1}^{R} \sigma_i\, u_1^{(i)} \otimes u_2^{(i)} \otimes u_3^{(i)}.$$
Then the spectral decomposition of the density operator can be written as

(86) $$\rho_{\hat{\chi}^{\dagger}\hat{\chi}} = \frac{1}{\sum_{i=1}^{R} \sigma_i^2} \sum_{i=1}^{R} \sigma_i^2\, |u_2^{(i)}\rangle \otimes |u_3^{(i)}\rangle \langle u_2^{(i)}| \otimes \langle u_3^{(i)}|.$$
In particular, the eigenstates |u.sub.2.sup.(i)⟩⊗|u.sub.3.sup.(i)⟩ of ρ.sub.{circumflex over (χ)}.sup.†.sub.{circumflex over (χ)} form another basis in the Hilbert space of the tensor product of quantum index registers.
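The spectral decomposition can be checked numerically on a synthetic tensor. The sketch below assumes orthonormal factor vectors (generated by QR decompositions) and hypothetical singular values, one of them negative; the names `U1`, `U2`, `U3`, `sigma` and `rho_expected` are chosen for the example only.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, d3, R = 5, 4, 3, 3                   # hypothetical dimensions and rank
sigma = np.array([3.0, -2.0, 1.0])           # hypothetical singular values, one negative

# Orthonormal factor vectors as columns (reduced QR of random matrices).
U1 = np.linalg.qr(rng.standard_normal((d1, R)))[0]
U2 = np.linalg.qr(rng.standard_normal((d2, R)))[0]
U3 = np.linalg.qr(rng.standard_normal((d3, R)))[0]

# chi = sum_i sigma_i * u1^(i) (x) u2^(i) (x) u3^(i)
chi = np.einsum("r,ar,br,cr->abc", sigma, U1, U2, U3)

# Density operator: contraction over the first index, normalised by sum_i sigma_i^2.
X = chi.reshape(d1, d2 * d3)
rho = X.T @ X / np.sum(sigma ** 2)

# Expected spectral form: weights sigma_i^2 on the projectors |u2 (x) u3><u2 (x) u3|.
rho_expected = np.zeros((d2 * d3, d2 * d3))
for i in range(R):
    v = np.kron(U2[:, i], U3[:, i])
    rho_expected += sigma[i] ** 2 * np.outer(v, v)
rho_expected /= np.sum(sigma ** 2)

top_eigenvalues = np.sort(np.linalg.eigvalsh(rho))[-R:]
```

Under these assumptions the nonzero eigenvalues of `rho` equal the normalised squares σ.sub.i.sup.2/Σσ.sub.i.sup.2, independently of the sign of σ.sub.i.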

(87) Then we need to read out the singular values of ρ.sub.{circumflex over (χ)}.sup.†.sub.{circumflex over (χ)} and write them into another quantum register, preferably via the density matrix exponentiation method proposed in “Lloyed et al.”. This operation is also referred to as quantum principal component analysis (qPCA).

(88) In order to write the singular values into a quantum register, in a step S30 the unitary operator

(89) $$U := \sum_{k=0}^{K-1} |k\Delta t\rangle \langle k\Delta t|_C \otimes \exp\left(-ik\Delta t\, \tilde{\rho}_{\hat{\chi}^{\dagger}\hat{\chi}}\right)$$
is prepared, which is the tensor product of a maximally mixed state

(90) $$\sum_{k=0}^{K-1} |k\Delta t\rangle \langle k\Delta t|_C$$
with the exponentiation of the rescaled density matrix

(91) $$\tilde{\rho}_{\hat{\chi}^{\dagger}\hat{\chi}} := \frac{\rho_{\hat{\chi}^{\dagger}\hat{\chi}}}{d_2 d_3}.$$
In particular, the clock register C is needed for the phase estimation, and Δt determines the precision of the estimated singular values.

(92) Recall that the query is (s, p, ?) on the knowledge graph, and that the present method should return triples with subject s. Hence, in a step S40, the quantum state |{circumflex over (χ)}.sub.s.sup.(1)⟩.sub.I is created (or: generated) via qRAM in an input data register I, where {circumflex over (χ)}.sub.s.sup.(1) denotes the s-row of the flattened tensor {circumflex over (χ)} along the first dimension.

(93) After preparing S40 the quantum state |{circumflex over (χ)}.sub.s.sup.(1)⟩.sub.I, in a step S50 the prepared unitary operator U is applied onto

(94) $$\sum_{k=0}^{K-1} |k\Delta t\rangle_C \otimes |\hat{\chi}_s^{(1)}\rangle_I.$$

(95) Implementing the unitary operator U is nontrivial since the exponent {tilde over (ρ)}.sub.{circumflex over (χ)}.sup.†.sub.{circumflex over (χ)} in the operator U could be a dense matrix, and exponentiating a dense matrix can be very involved. Therefore, one can use the dense matrix exponentiation method recently proposed in “Rebentrost et al.”. In particular, one can show that the unitary operator

(96) $$e^{-it\, \tilde{\rho}_{\hat{\chi}^{\dagger}\hat{\chi}}}$$
can be applied to any quantum state up to an arbitrary simulation time t. The total number of steps for simulation is

(97) $$\mathcal{O}\left(\frac{t^2}{\epsilon}\, T_{\tilde{\rho}}\right), \qquad (7)$$
where ϵ is the desired accuracy, and T.sub.{tilde over (ρ)} is the time for accessing the density matrix {tilde over (ρ)}. Hence the unitary operator U can be applied to any quantum state given simulation time t in

(98) $$\mathcal{O}\left(\frac{t^2}{\epsilon}\, T_{\tilde{\rho}}\right)$$
steps on quantum computers.

(99) After applying the unitary operator U onto

(100) $$\sum_{k=0}^{K-1} |k\Delta t\rangle_C \otimes |\hat{\chi}_s^{(1)}\rangle_I$$
we have the following quantum state

(101) $$\sum_{i=1}^{R} \beta_i \left( \sum_{k=0}^{K-1} e^{-ik\Delta t\, \tilde{\sigma}_i^2}\, |k\Delta t\rangle_C \right) \otimes |u_2^{(i)}\rangle_I \otimes |u_3^{(i)}\rangle_I \qquad (8)$$
where

(102) $$\tilde{\sigma}_i := \frac{\sigma_i}{\sqrt{d_2 d_3}}$$
are the rescaled singular values of {tilde over (ρ)}.sub.{circumflex over (χ)}.sup.†.sub.{circumflex over (χ)} (see Eq. 4). Moreover, β.sub.i are the coefficients of |{circumflex over (χ)}.sub.s.sup.(1)⟩.sub.I decomposed in the eigenbasis |u.sub.2.sup.(i)⟩.sub.I⊗|u.sub.3.sup.(i)⟩.sub.I of ρ.sub.{circumflex over (χ)}.sup.†.sub.{circumflex over (χ)}, namely |{circumflex over (χ)}.sub.s.sup.(1)⟩.sub.I=Σ.sub.i=1.sup.Rβ.sub.i|u.sub.2.sup.(i)⟩.sub.I⊗|u.sub.3.sup.(i)⟩.sub.I.
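The structure of the state in Eq. (8) can be reproduced with dense linear algebra on a small example. The following sketch is a classical emulation only: the matrix `rho` is a random unit-trace stand-in for the rescaled density matrix, its eigenvectors are generic rather than product vectors, and the sizes `n`, `K` and the step `dt` are hypothetical. Unlike Eq. (8), the clock register is normalised by 1/√K.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, dt = 6, 8, 0.5                      # hypothetical: input dimension, clock size, time step

# Random real symmetric positive semidefinite matrix with unit trace.
A = rng.standard_normal((n, n))
rho = A.T @ A
rho /= np.trace(rho)
evals, evecs = np.linalg.eigh(rho)        # evals play the role of the rescaled squared singular values

# Normalised input state and its coefficients beta_i in the eigenbasis.
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
beta = evecs.T @ x

# U = sum_k |k><k|_C (x) exp(-i k dt rho): block-diagonal, one block per clock value.
U = np.zeros((K * n, K * n), dtype=complex)
for k in range(K):
    block = (evecs * np.exp(-1j * k * dt * evals)) @ evecs.T
    U[k * n:(k + 1) * n, k * n:(k + 1) * n] = block

# Uniform superposition on the clock register, tensored with the input state.
clock = np.ones(K) / np.sqrt(K)
state = U @ np.kron(clock, x)

# Form of Eq. (8): sum_i beta_i (sum_k e^{-i k dt lambda_i} |k>) (x) |v_i>.
expected = np.zeros(K * n, dtype=complex)
for i in range(n):
    phases = np.exp(-1j * np.arange(K) * dt * evals[i]) / np.sqrt(K)
    expected += beta[i] * np.kron(phases, evecs[:, i])
```

Each eigencomponent of the input state thus acquires a clock-dependent phase whose frequency is set by the corresponding eigenvalue, which is what the subsequent phase estimation reads out.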

(103) In a step S60, a quantum phase estimation on the clock register C is performed, preferably using the quantum phase estimation algorithm proposed in “Kitaev”. The resulting state after phase estimation reads Σ.sub.i=1.sup.Rβ.sub.i|λ.sub.i⟩.sub.C⊗|u.sub.2.sup.(i)⟩.sub.I⊗|u.sub.3.sup.(i)⟩.sub.I, where

(104) $$\lambda_i := \frac{2\pi}{\tilde{\sigma}_i^2}.$$
In fact, it can be shown that the probability amplitude of measuring the register C is maximized when

(105) $$k\Delta t = \left\lfloor \frac{2\pi}{\tilde{\sigma}_i^2} \right\rceil,$$
where ⌊⋅⌉ represents the nearest integer. Therefore, the small time step Δt determines the accuracy of quantum phase estimation. We may choose

(106) $$\Delta t = \mathcal{O}\left(\frac{1}{\epsilon}\right),$$
and the total run time is

(107) $$\mathcal{O}\left(\frac{1}{\epsilon^3}\, T_{\tilde{\rho}}\right) = \mathcal{O}\left(\frac{1}{\epsilon^3} \operatorname{polylog}(d_1 d_2 d_3)\right)$$
according to Eq. (6) and Eq. (7).
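How the clock size and the time step limit the attainable resolution can be illustrated with a classical emulation of a textbook phase-estimation readout. The sketch below is an assumption-laden toy, not the procedure of “Kitaev”: it uses a single hypothetical eigenvalue `theta`, applies a unitary discrete Fourier transform to the clock amplitudes, and adopts the convention that the transformed register index is proportional to the eigenvalue, whereas the text labels the readout by λ.sub.i. It also assumes `dt * theta` is smaller than 2π so that no aliasing occurs.

```python
import numpy as np

K, dt = 64, 1.0            # hypothetical clock size and time step
theta = 0.3                # hypothetical eigenvalue imprinted on the clock phases

# Clock amplitudes e^{-i k dt theta} / sqrt(K) for k = 0, ..., K-1.
k = np.arange(K)
clock = np.exp(-1j * k * dt * theta) / np.sqrt(K)

# Unitary discrete Fourier transform with kernel exp(+2*pi*i*j*k/K).
readout = np.fft.ifft(clock) * np.sqrt(K)
probs = np.abs(readout) ** 2

# The most likely outcome j_star encodes theta up to the grid spacing.
j_star = int(np.argmax(probs))
theta_est = 2 * np.pi * j_star / (K * dt)
resolution = 2 * np.pi / (K * dt)
```

In this convention the estimate lies within half a grid spacing of the true value, so the product K·Δt fixes the precision of the recovered eigenvalue.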

(108) In a step S70, a computation on the clock register C is performed to recover the original singular values of ρ.sub.{circumflex over (χ)}.sup.†.sub.{circumflex over (χ)}, and obtain

(109) $$\sum_{i=1}^{R} \beta_i\, |\sigma_i^2\rangle_C \otimes |u_2^{(i)}\rangle_I \otimes |u_3^{(i)}\rangle_I.$$

(110) For example, in this step S70 the λ.sub.i stored in the clock register C may be converted to σ.sub.i.sup.2, λ.sub.i being a function of σ.sub.i.sup.2. The threshold operations discussed in the following as applied to the σ.sub.i.sup.2 may therefore, in an alternative formulation, also be applied to the λ.sub.i, with the threshold τ being appropriately rescaled.

(111) In a step S90, a quantum singular value projection on the quantum state obtained from the last step S70 is performed. Notice that, classically, this step corresponds to projecting {circumflex over (χ)} onto the subspace {circumflex over (χ)}.sub.|⋅|≥τ. In this way, observed entries will be smoothed and unobserved entries get boosted from which we can infer unobserved triples (s, p, ?) in the test dataset (see Theorem 2).

(112) Quantum singular value projection given the threshold τ>0 can be implemented in the following way. In a step S80, a new auxiliary register R is created on the quantum computing device 200 using an auxiliary qubit and

(113) a unitary operation that maps |σ.sub.i.sup.2⟩.sub.C⊗|0⟩.sub.R to |σ.sub.i.sup.2⟩.sub.C⊗|1⟩.sub.R only if σ.sub.i.sup.2<τ.sup.2; otherwise |0⟩.sub.R remains unchanged. This step of projection gives the state

(114) $$\sum_{i:\, \sigma_i^2 \geq \tau^2} \beta_i\, |\sigma_i^2\rangle_C \otimes |u_2^{(i)}\rangle_I \otimes |u_3^{(i)}\rangle_I \otimes |0\rangle_R + \sum_{i:\, \sigma_i^2 < \tau^2} \beta_i\, |\sigma_i^2\rangle_C \otimes |u_2^{(i)}\rangle_I \otimes |u_3^{(i)}\rangle_I \otimes |1\rangle_R. \qquad (9)$$

(115) In other words, the step S90 means performing, on the result of the computation S70 to recover the singular values, a singular value projection conditioned on the state of the auxiliary register, R, such that eigenstates whose squared singular values are to one side of (here: smaller than) the squared cutoff threshold, τ.sup.2, are entangled with a first eigenstate |1⟩.sub.R of the auxiliary register, R, and such that eigenstates whose squared singular values are to another side of (here: larger than) the squared cutoff threshold, τ.sup.2, or equal to the squared cutoff threshold, τ.sup.2, are entangled with the second eigenstate |0⟩.sub.R of the auxiliary register, R.
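The bookkeeping of the projection can be sketched classically as follows. The singular values, coefficients and threshold are hypothetical numbers chosen for illustration; the array `flag` plays the role of the auxiliary register R, and `p_keep` is the probability of finding R in the state |0⟩.sub.R.

```python
import numpy as np

sigma = np.array([3.0, -2.0, 0.5, -0.1])    # hypothetical singular values; signs may be negative
beta = np.array([0.5, 0.5, 0.5, 0.5])       # hypothetical unit-norm coefficients of the input state
tau = 1.0                                    # hypothetical cutoff threshold

# Flag written into the auxiliary register R: 1 if sigma_i^2 < tau^2, else 0.
flag = (sigma ** 2 < tau ** 2).astype(int)

# Amplitudes of the two branches of Eq. (9).
kept = np.where(flag == 0, beta, 0.0)        # branch entangled with |0>_R
discarded = np.where(flag == 1, beta, 0.0)   # branch entangled with |1>_R

# Probability of measuring |0>_R and the renormalised surviving coefficients.
p_keep = float(np.sum(np.abs(kept) ** 2))
post_selected = kept / np.sqrt(p_keep)
```

Because only σ.sub.i.sup.2 enters the comparison, replacing `sigma` by its absolute value leaves `flag` unchanged.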

(116) One of the major advantages here is that no decision depends on the individual singular values themselves, but only on their squares. This means that possible negative singular values, which may occur in the case of tensors (unlike in the case of matrices), do not have any negative impact on the present method.

(117) In a step S100, the new register R is measured and post-selected on the state |0⟩.sub.R. This gives the projected state

(118) $$\sum_{i:\, \sigma_i^2 \geq \tau^2} \beta_i\, |\sigma_i^2\rangle_C \otimes |u_2^{(i)}\rangle_I \otimes |u_3^{(i)}\rangle_I.$$

(119) In a step S110, the clock register C is traced out such that the following state is obtained:

(120) $$\sum_{i:\, \sigma_i^2 \geq \tau^2} \beta_i\, |u_2^{(i)}\rangle_I \otimes |u_3^{(i)}\rangle_I.$$

(121) The tracing out may be performed e.g. as has been described in “Nielsen et al.”.

(122) In a step S120, the resulting quantum state from the last step S110 is measured in the canonical basis of the input register I to get the triples with subject s.

(123) In a step S130, the sampled triples are post-selected on the predicate p. This will return objects to the inference (s, p, ?) after

(124) $$\mathcal{O}\left(\frac{1}{\epsilon^3} \operatorname{polylog}(d_1 d_2 d_3)\right)$$
steps.

(125) The quantum algorithm is summarized also in the following table Algorithm 1.

(126) TABLE-US-00001 Algorithm 1 Quantum Tensor SVD on KGs
Input: Inference task (s, p, ?)
Output: Possible objects to the inference task
Require: Quantum access to {circumflex over (χ)} stored in a classical memory structure; threshold τ for the singular value projection
1: Create $\rho_{\hat{\chi}^{\dagger}\hat{\chi}}$ via qRAM
2: Create state |{circumflex over (χ)}.sub.s.sup.(1)⟩.sub.I on the input data register I via qRAM
3: Prepare unitary operator U and apply on |{circumflex over (χ)}.sub.s.sup.(1)⟩.sub.I, where $U := \sum_{k=0}^{K-1} |k\Delta t\rangle \langle k\Delta t|_C \otimes \exp\left(-ik\Delta t\, \tilde{\rho}_{\hat{\chi}^{\dagger}\hat{\chi}}\right)$
4: Quantum phase estimation on the clock register C to obtain Σ.sub.i=1.sup.Rβ.sub.i|λ.sub.i⟩.sub.C⊗|u.sub.2.sup.(i)⟩.sub.I⊗|u.sub.3.sup.(i)⟩.sub.I
5: Controlled computation on the clock register C to obtain Σ.sub.i=1.sup.Rβ.sub.i|σ.sub.i.sup.2⟩.sub.C⊗|u.sub.2.sup.(i)⟩.sub.I⊗|u.sub.3.sup.(i)⟩.sub.I
6: Singular value projection given the threshold τ to obtain $\sum_{i:\, \sigma_i^2 \geq \tau^2} \beta_i\, |\sigma_i^2\rangle_C \otimes |u_2^{(i)}\rangle_I \otimes |u_3^{(i)}\rangle_I \otimes |0\rangle_R + \sum_{i:\, \sigma_i^2 < \tau^2} \beta_i\, |\sigma_i^2\rangle_C \otimes |u_2^{(i)}\rangle_I \otimes |u_3^{(i)}\rangle_I \otimes |1\rangle_R$
7: Measure on the register R and post-select the state |0⟩.sub.R
8: Trace out the clock register C
9: Measure the resulting state $\sum_{i:\, |\sigma_i| \geq \tau} \beta_i\, |u_2^{(i)}\rangle_I \otimes |u_3^{(i)}\rangle_I$ in the canonical basis of the input register I
10: Post-select on the predicate p from the sampled triples (s, ⋅, ⋅)
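For orientation, the net effect of Algorithm 1 can be emulated classically on a very small tensor. The sketch below is not the quantum algorithm and offers no speed-up: it substitutes an eigendecomposition of the unfolded, unnormalised matrix for qRAM, phase estimation and the auxiliary register, and it uses the matrix SVD of the unfolding as a stand-in for the tensor SVD. The function name `sample_objects` and the toy tensor `chi_toy` are hypothetical.

```python
import numpy as np

def sample_objects(chi_hat, s, p, tau, n_samples, rng):
    """Project row s onto eigenvectors with eigenvalue >= tau^2, sample, post-select on p."""
    d1, d2, d3 = chi_hat.shape
    X = chi_hat.reshape(d1, d2 * d3)             # unfolding along the subject dimension
    evals, evecs = np.linalg.eigh(X.T @ X)       # eigenvalues are squared singular values of X
    keep = evecs[:, evals >= tau ** 2]           # eigenvectors surviving the cutoff
    projected = keep @ (keep.T @ X[s])           # projection of the s-row (steps 6 to 8)
    norm = np.linalg.norm(projected)
    if norm == 0.0:
        return []
    probs = projected ** 2 / norm ** 2           # measurement probabilities, canonical basis
    draws = rng.choice(d2 * d3, size=n_samples, p=probs)
    preds, objs = np.unravel_index(draws, (d2, d3))
    # Post-selection on the predicate p (step 10).
    return [int(o) for pr, o in zip(preds, objs) if pr == p]

# Toy knowledge graph (hypothetical): 3 subjects, 2 predicates, 4 objects.
chi_toy = np.zeros((3, 2, 4))
chi_toy[0, 1, 2] = 1.0
chi_toy[1, 0, 0] = 1.0
chi_toy[2, 0, 3] = 1.0
objects = sample_objects(chi_toy, s=0, p=1, tau=0.5, n_samples=5,
                         rng=np.random.default_rng(0))
```

In the toy example the only triple with subject 0 has predicate 1 and object 2, so every sample that survives the post-selection names object 2.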

(127) One of the main advantages of the present invention is that a method for implementing implicit knowledge inference from tensorized data, e.g., relational databases such as knowledge graphs, on quantum computing devices is proposed.

(128) The present method shows that knowledge inference from tensorized data can be implemented with exponential acceleration on quantum computing devices. As has been shown, this is much faster and thus less resource-consuming than classical methods.

(129) We also test the classical part of our method, namely the tensor singular value decomposition, on classical devices, since due to technical challenges current quantum devices only have a few universal physical qubits. The simulation results are comparable to those of other benchmark algorithms, which supports the expected performance of the quantum τSVD on future quantum computers.

(130) The acceleration is given by the intrinsic parallel computing of quantum computing devices as described in the foregoing which, however, is only made applicable by the specific technical implementation of the present invention.

(131) In some sense, the present method is based on finding the quantum counterpart of the classical tensor singular value decomposition method. To show that tensor singular value decomposition performs comparably to other classical algorithms, the present method is verified by investigating the performance of classical tensor SVD on the benchmark datasets Kinship and FB15k-237, see e.g. the scientific publication by Kristina Toutanova and Danqi Chen, “Observed versus latent features for knowledge base and text inference.”, in: Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, pages 57-66, 2015.

(132) Given a semantic triple (s, p, o), the value function of τSVD is defined as

(133) $$\eta_{spo} = \sum_{i=1}^{R} \sigma_i\, u_s^{(i)} u_p^{(i)} u_o^{(i)},$$
where u.sub.s, u.sub.p, u.sub.o are vector representations of s, p, o, respectively. The τSVD is trained by minimizing the objective function

(134) $$\mathcal{L} := \frac{1}{|\mathcal{D}_{\mathrm{train}}|} \sum_{(s,p,o) \in \mathcal{D}_{\mathrm{train}}} (y_{spo} - \eta_{spo})^2 + \gamma \left( \|U_s^{\top} U_s - \mathbb{I}_R\|_F + \|U_p^{\top} U_p - \mathbb{I}_R\|_F + \|U_o^{\top} U_o - \mathbb{I}_R\|_F \right)$$
via stochastic gradient descent. The hyper-parameter γ is used to encourage the orthonormality of embedding matrices for subjects, predicates, and objects.
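The value function and the objective can be written down directly in NumPy. The sketch below only evaluates the loss for given parameters; the stochastic gradient descent loop is omitted. The function names `tsvd_scores` and `tsvd_loss`, the array layout (one embedding per row) and the integer encoding of triples are assumptions made for the example.

```python
import numpy as np

def tsvd_scores(sigma, U_s, U_p, U_o, triples):
    """eta_spo = sum_i sigma_i * U_s[s, i] * U_p[p, i] * U_o[o, i] for each row (s, p, o)."""
    s, p, o = triples[:, 0], triples[:, 1], triples[:, 2]
    return np.sum(sigma * U_s[s] * U_p[p] * U_o[o], axis=1)

def tsvd_loss(sigma, U_s, U_p, U_o, triples, y, gamma):
    """Mean squared error plus gamma times the Frobenius-norm orthonormality penalties."""
    R = sigma.shape[0]
    mse = np.mean((y - tsvd_scores(sigma, U_s, U_p, U_o, triples)) ** 2)
    penalty = sum(np.linalg.norm(U.T @ U - np.eye(R), ord="fro")
                  for U in (U_s, U_p, U_o))
    return mse + gamma * penalty
```

For embedding matrices with exactly orthonormal columns the penalty term vanishes and the loss reduces to the mean squared error over the training triples.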
In the following Table 1, the performance of the tensor SVD model is compared with that of other benchmark models, e.g., RESCAL (proposed in Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel, “A three-way model for collective learning on multi-relational data”, in: ICML, volume 11, pages 809-816, 2011), Tucker (L. R. Tucker, “Some mathematical notes on three-mode factor analysis”, Psychometrika, September 1966, Vol. 31, Issue 3, pp. 279-311), and ComplEx (Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard, “Complex embeddings for simple link prediction”, in: International Conference on Machine Learning, pages 2071-2080, 2016).

(135) FIG. 3 and FIG. 4 show the training curves of the τSVD on FB15k-237. They show that τSVD performs reasonably well for small rank, hence we can estimate the projection threshold τ according to Theorem 2.

(136) TABLE-US-00002 TABLE 1 Mean Rank (MR), Hits@3 (@3) and Hits@10 (@10) scores of various models compared on the Kinship and FB15k-237 datasets.
            KINSHIP                 FB15K-237
Methods     MR     @3     @10      MR      @3     @10
RESCAL      3.2    88.8   95.5     291.3   20.7   35.1
TUCKER      2.9    89.8   95.0     276.1   20.9   35.7
COMPLEX     2.2    90.0   97.7     242.7   25.2   39.7
TSVD        2.7    84.8   96.6     365.5   19.4   35.8
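The metrics reported in Table 1 are computed from the rank that a model assigns to the correct entity of each test triple. A minimal sketch, assuming the ranks are already available and using hypothetical values:

```python
import numpy as np

def mean_rank(ranks):
    """Average rank of the correct entity (lower is better)."""
    return float(np.mean(ranks))

def hits_at(ranks, k):
    """Percentage of test triples whose correct entity is ranked within the top k."""
    return 100.0 * float(np.mean(np.asarray(ranks) <= k))

ranks = [1, 2, 4, 12, 3]     # hypothetical ranks of the correct entity for five test triples
mr = mean_rank(ranks)
hits3 = hits_at(ranks, 3)
hits10 = hits_at(ranks, 10)
```

How the ranks themselves are obtained (for example, whether known true triples are filtered out before ranking) is not specified here and is left open in this sketch.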

(137) FIG. 3 shows the mean rank over epochs for rank values R=32, R=128, R=258 and R=512.

(138) FIG. 4 shows the Hits@10 scores over epochs for rank values R=32, R=128, R=258 and R=512.

(139) Noisy intermediate-scale quantum processing units (or: quantum computing devices) are expected to be commercially available in the near future. With the help of these quantum computing devices and the present method, learning and inference on the ever-increasing industrial knowledge graphs can be dramatically accelerated compared to conventional computers.

(140) In short, the invention provides a computer-implemented method of performing an inference task on a knowledge graph comprising semantic triples of entities, wherein entity types are subject, object and predicate, and wherein each semantic triple comprises one of each entity type, using a quantum computing device, wherein a first entity of a first type and a second entity of a second type are given and the inference task is to infer a third entity of the third type.

(141) By performing specific steps and choosing values according to specific prescriptions, an efficient and resource-saving method is developed that utilizes the power of quantum computing systems for inference tasks on large knowledge graphs. In particular, an advantageous value for a cutoff threshold for a cutoff based on singular values of a singular value tensor decomposition is prescribed, and a sequence of steps is developed in which only the squares of the singular values are of consequence and their signs are not.