Method for verifying the identity of a speaker, system therefore and computer readable medium
09792912 · 2017-10-17
CPC classification
G10L17/24 (PHYSICS)
G10L17/26 (PHYSICS)
International classification
G10L17/02 (PHYSICS)
G10L17/24 (PHYSICS)
Abstract
The invention refers to a method of verifying the identity of a speaker based on the speaker's voice, comprising the steps of: receiving (1, 5) a first and a second voice utterance; using biometric voice data to verify (2, 6), based on the received first and/or second voice utterance, that the speaker's voice corresponds to the speaker whose identity is to be verified; and determining (8) the similarity of the two received voice utterances, characterized in that the similarity is determined using biometric voice characteristics of the two voice utterances or data derived from such biometric voice characteristics. The invention further refers to a system (80) for verifying the identity of a speaker based on the speaker's voice, comprising: a component (81) for receiving a first and a second voice utterance; a component (82) for using biometric voice data to verify, based on the received first and/or second voice utterance, that the speaker's voice corresponds to the speaker whose identity is to be verified; and a component (83) for comparing the two received voice utterances in order to determine the similarity of the two voice utterances, characterized in that the similarity is determined using biometric voice characteristics of the two voice utterances or data derived from such biometric voice characteristics.
Claims
1. A method of verifying, using analysis of a speaker's voice, a specific identity of a speaker, the method comprising the steps of: a) receiving, at a first software component of an electronic access control system, a first voice utterance and a second voice utterance; b) verifying, using biometric voice data extracted from the first and second received voice utterances and using a second software component of the electronic access control system, that the speaker's voice corresponds to a specific identity; and c) determining, using a third software component of the electronic access control system, an indicia of similarity of the two received voice utterances, wherein the indicia of similarity is determined using analysis of the extracted biometric voice data of the two received voice utterances; wherein the extracted biometric voice data used for determining the indicia of similarity of the two received voice utterances comprise or are based on a first set of at least n values, wherein the first set of at least n values is determined from a time slice of one of the received voice utterances, the time slice having a length of between 10 and 40 ms, wherein n is a number between 2 and 40; wherein if the speaker's voice has been verified, using biometric voice data, to correspond to the specific identity, the second received voice utterance is requested from the speaker; wherein the speaker is requested to repeat the first voice utterance in order to receive the second voice utterance; and, wherein data derived for the first voice utterance is determined at least 50 times for the first voice utterance and data derived for the second voice utterance is determined at least 50 times for the second voice utterance.
2. The method of claim 1, wherein a set of biometric voice data is extracted from the two received voice utterances and the extracted set of data is used as biometric voice data for verifying that the speaker's voice corresponds to the specific identity and as biometric voice characteristics for determining an indicia of similarity of the two received voice utterances or for deriving data for determining the indicia of similarity of the two received voice utterances.
Description
(1) Preferred embodiments of the invention are disclosed in the figures. The preferred embodiments are not to be understood as imposing a limitation on the invention; rather, they are provided in order to explain a particularly useful way of carrying out the invention.
(11) In item 7 it is decided whether the identity checked in step 6 is considered to be verified or not. If not, the speaker is rejected in item 4; otherwise the method proceeds to step 8. In this step the similarity between the first and the second voice utterance is determined. If the voice utterances are found to be suspiciously similar, the method proceeds to rejection step 4; otherwise it proceeds to acceptance.
(12) The determination of the similarity between the first and the second voice utterance can also be performed directly after the second voice utterance has been received in step 5. The speaker verification of item 6 may then be performed only if the two utterances are found not to be suspiciously similar. The speaker verification of step 6 and the determination of similarity in step 8 may also be processed in parallel. The results of the decisions of items 7 and 9 may then be combined in order to decide whether the speaker is to be rejected or accepted, or further steps may be carried out before deciding about acceptance or rejection.
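By way of illustration only, the following sketch shows one possible way of running the speaker verification of step 6 and the similarity determination of step 8 in parallel and combining the decisions of items 7 and 9. The function names, the threshold and the simple AND-combination are assumptions for this example and are not taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor

def decide_parallel(first_utterance, second_utterance,
                    verify_speaker, utterance_similarity,
                    similarity_threshold: float = 0.95) -> bool:
    """Run steps 6 and 8 concurrently and combine the decisions of items 7 and 9."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        identity_future = pool.submit(verify_speaker, second_utterance)            # step 6
        similarity_future = pool.submit(utterance_similarity,
                                        first_utterance, second_utterance)         # step 8
        identity_verified = identity_future.result()                               # item 7
        suspiciously_similar = similarity_future.result() > similarity_threshold   # item 9
    # Accept only if the identity is verified and no suspicious replay is detected.
    return identity_verified and not suspiciously_similar
```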
(13) Further, instead of the acceptance in item 10 other tests may be carried out in order to check for fraud before accepting a speaker, such as a liveliness test (see PCT/EP2008/010478, FIGS. 4 and 5).
(15) For each time slice, biometric data or biometric characteristics may be calculated. For example, for each time slice the signal portion 15 may be Fourier transformed and the envelope thereof determined, from which characteristic biometric data may be obtained.
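As a rough illustration of this step, the sketch below splits a signal into time slices of between 10 and 40 ms, Fourier transforms each slice and extracts a small set of n envelope values per slice. The slice length, the number of values and the use of cepstral smoothing for the envelope are assumptions chosen for the example, not the patent's specific implementation.

```python
import numpy as np

SLICE_MS = 20      # slice length between 10 and 40 ms (assumed value)
N_VALUES = 12      # n values per slice, with n between 2 and 40 (assumed value)

def frame_features(signal: np.ndarray, sample_rate: int) -> np.ndarray:
    """Return one row of N_VALUES spectral-envelope coefficients per time slice."""
    slice_len = int(sample_rate * SLICE_MS / 1000)
    frames = []
    for start in range(0, len(signal) - slice_len + 1, slice_len):
        portion = signal[start:start + slice_len] * np.hamming(slice_len)
        spectrum = np.abs(np.fft.rfft(portion)) + 1e-10
        # Low-order cepstrum of the log spectrum approximates the spectral envelope.
        cepstrum = np.fft.irfft(np.log(spectrum))
        frames.append(cepstrum[:N_VALUES])
    return np.array(frames)   # shape: (number_of_time_slices, N_VALUES)
```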
(17) For each voice utterance, more than 1,000 or more than 10,000 time slices may be evaluated, giving more than 1,000 or more than 10,000 data points.
(18) The temporal evolution of such a characteristic Cy may be compared between two different voice utterances.
(22) With dynamic time warping, the temporal evolution of a characteristic obtained for the first voice utterance can be aligned in time with the corresponding evolution obtained for the second voice utterance before the two are compared.
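A minimal dynamic-time-warping sketch is given below, assuming the per-slice values of a characteristic of the two utterances are available as one-dimensional sequences; the distance measure and the implementation details are illustrative only.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a)*len(b)) dynamic time warping on two scalar sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Best alignment cost reaching cell (i, j).
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])   # lower cost means more similar temporal evolutions
```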
(24) A set of values which represents the biometric voice characteristics used for determining the similarity of two voice utterances may be compared to a statistical voice model.
(25) A specific value v of the characteristic Cn occurs, according to the different Gaussians G1, G2 and G3, with a different probability W. This probability W according to each of the Gaussians G1, G2, G3 yields a value m, which is shown in vector 24.
(26) Specifically, the probability W(v) that the characteristic Cn has the value v is calculated in order to express the coincidence of a biometric voice characteristic with a statistical voice model. This is an example of deriving a set of values from a set of values of biometric voice characteristics. The derived set of values may have, for example, the same number (l) of values as there are components (l) of the statistical voice model.
(27) Such a derived set 24 may be derived for multiple time slices T. Hence, the temporal evolution of each of the values m can be calculated, similarly to the temporal evolution explained above.
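The following sketch illustrates, under assumed model parameters, how a derived vector such as vector 24 could be computed for each time slice: the probability W(v) of the observed value v is evaluated under each Gaussian component of a statistical voice model, giving one derived value m per component. The means and variances below are placeholders, not values from the patent.

```python
import numpy as np

# Illustrative parameters for three Gaussians G1, G2, G3 of one characteristic Cn.
means = np.array([-1.0, 0.0, 1.5])
variances = np.array([0.5, 0.3, 0.8])

def derived_vector(v: float) -> np.ndarray:
    """Probability density W(v) of the value v under each Gaussian component."""
    return np.exp(-((v - means) ** 2) / (2.0 * variances)) / np.sqrt(2.0 * np.pi * variances)

def derived_series(values_over_time: np.ndarray) -> np.ndarray:
    """One derived vector per time slice, giving the temporal evolution of each value m."""
    return np.vstack([derived_vector(v) for v in values_over_time])
```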
(28) The temporal evolution of any of the values can be used to determine correlations between two voice utterances. In doing so, dynamic time warping may also be performed.
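As an illustrative sketch, the temporal evolution of one such value could be correlated between the two utterances as follows; the simple resampling to a common length is an assumption made for the example, and dynamic time warping could be used for the alignment instead.

```python
import numpy as np

def evolution_correlation(series_a: np.ndarray, series_b: np.ndarray) -> float:
    """Pearson correlation of two 1-D temporal evolutions of possibly unequal length."""
    length = min(len(series_a), len(series_b))
    # Resample both evolutions onto a common time axis before correlating.
    a = np.interp(np.linspace(0, 1, length), np.linspace(0, 1, len(series_a)), series_a)
    b = np.interp(np.linspace(0, 1, length), np.linspace(0, 1, len(series_b)), series_b)
    return float(np.corrcoef(a, b)[0, 1])   # values near 1 suggest suspicious similarity
```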
(31) The data sets 40 and 41, obtained in this way for the first and the second voice utterance, may be compared with each other.
(32) Also, dynamic time warping for the data sets 40, 41 may be carried out in order to compare the two data sets.
(34) If the passive test for falsification considers the voice utterance to be falsified in item 64, a second voice utterance is requested in item 65 and received in item 66. Here, the speaker verification of the second received voice utterance is performed in item 67 and evaluated in item 68. If the identity of the speaker cannot be verified, the speaker is rejected in item 69. If the identity can be verified, the method proceeds to the determination of an exact match in item 70. The determination of the exact match according to the present method is done by calculating the similarity of the two received voice utterances using biometric voice characteristics. If this test indicates a falsification in item 71, the speaker is rejected in item 72; otherwise the speaker is accepted in item 73.
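The decision flow of items 64 to 73 could be sketched as follows. The thresholds and the helper functions verify_speaker() and utterance_similarity() are hypothetical and only stand in for the verification and similarity determinations described above.

```python
VERIFY_THRESHOLD = 0.5        # illustrative score threshold for items 67/68
EXACT_MATCH_THRESHOLD = 0.95  # illustrative threshold for the exact-match test of item 71

def decide(first_utterance, second_utterance, verify_speaker, utterance_similarity) -> bool:
    """Return True to accept the speaker (item 73), False to reject (items 69/72)."""
    if verify_speaker(second_utterance) < VERIFY_THRESHOLD:                 # items 67/68/69
        return False
    similarity = utterance_similarity(first_utterance, second_utterance)    # item 70
    if similarity > EXACT_MATCH_THRESHOLD:                                   # items 71/72
        return False   # suspiciously similar, likely a replayed recording
    return True                                                              # item 73
```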
(35) The determination of the similarity of the two received voice utterances described herein can be carried out as a determination of an exact match in each of the cases mentioned in the above-mentioned PCT application PCT/EP2008/010478. The disclosure of that application is therefore fully incorporated in the present application by reference; each of the methods mentioned in PCT/EP2008/010478 which mentions an exact match is considered to be included and disclosed herein by reference.
(36) A system 80 for verifying the identity of a speaker based on the speaker's voice is shown, comprising a component 81 for receiving a first and a second voice utterance.
(37) Furthermore, a component 82 is shown for using biometric voice data to verify, based on the received first and second voice utterance, that the speaker's voice corresponds to the speaker whose identity is to be verified.
(38) Furthermore, a component 83 for comparing the two received voice utterances in order to determine the similarity of the two voice utterances is shown. This component 83 uses biometric voice characteristics of the two voice utterances or data derived from such biometric voice characteristics in order to determine the similarity of the two voice utterances. The result of the verification of the identity is output by means 85.
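Purely as a structural illustration, system 80 with components 81, 82 and 83 could be modeled as follows; the class and method names are assumptions for this sketch and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class VerificationSystem:                          # system 80
    receive_utterance: Callable[[], bytes]         # component 81: receives an utterance
    verify_identity: Callable[[bytes], bool]       # component 82: biometric verification
    similarity: Callable[[bytes, bytes], float]    # component 83: similarity comparison

    def run(self, exact_match_threshold: float = 0.95) -> bool:
        """Return the verification result that would be output via means 85."""
        first = self.receive_utterance()
        second = self.receive_utterance()
        if not self.verify_identity(first) or not self.verify_identity(second):
            return False
        # Reject if the two utterances are suspiciously similar (possible replay).
        return self.similarity(first, second) <= exact_match_threshold
```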