Privacy-preserving, mutual PUF-based authentication protocol

Abstract

An authentication protocol using a Hardware-Embedded Delay PUF (HELP), which derives randomness from within-die path delay variations that occur along the paths within a hardware implementation of a cryptographic primitive, for example, the Advanced Encryption Standard (AES) algorithm or Secure Hash Algorithm 3 (SHA-3). The digitized timing values which represent the path delays are stored in a database on a secure server (verifier) as an alternative to storing PUF response bitstrings thereby enabling the development of an efficient authentication protocol that provides both privacy and mutual authentication.

Claims

1. A Physically Unclonable Function (PUF) method for authenticating a token by a server to prevent cloning and unauthorized use of Integrated Circuits, the method providing both privacy and mutual identification between the server and the token, the method comprising the steps of: measuring, by the PUF, natural variations that occur in one or more path delays of the PUF; digitizing the measured one or more path delays; storing in a database of the server, the digitized measured one or more path delays; generating a plurality of bitstrings from the digitized measured one or more path delays; comparing the bitstrings of the plurality to bitstrings of the token; and authenticating the token when the comparing step results in one or more matches.

2. The method according to claim 1, wherein the plurality of bitstrings is generated on-the-fly.

3. The method according to claim 1 further comprising the step of generating by the token both a token helper data bitstring and a token bitstring.

4. The method according to claim 3 further comprising the step of generating by the server both a server helper data bitstring and a server bitstring.

5. The method according to claim 4 further comprising the steps: modifying the token bitstring by eliminating one or more bits from the token bitstring; and modifying the server bitstring by eliminating one or more bits from the server bitstring.

6. The method according to claim 5, wherein the modified server bitstring is compared to the modified token bitstring to authenticate the token.

7. The method according to claim 5, further comprising the step of using the server helper data bitstring to modify the server bitstring and using the token helper data bitstring to modify the token bitstring.

8. The method according to claim 5 further comprising the steps of: performing an operation to bitwise AND the token helper data bitstring from the token to obtain a AND'ed token helper data bitstring; and performing an operation to bitwise AND the server helper data bitstring to obtain a AND'ed server helper data bitstring.

9. The method according to claim 8, wherein the one or more bits eliminated from the token bitstring are bits that correspond to bits in the AND'ed token helper data bitstring that are logic 0 because of the bitwise AND operation, and the one or more bits eliminated from the server bitstring are bits that correspond to bits in the AND'ed server helper data bitstring that are logic 0 because of the bitwise AND operation.

10. The method according to claim 1, wherein the PUF is provided in a hardware implementation of a cryptographic primitive.

11. The method according to claim 10, wherein the cryptographic primitive is an Advanced Encryption Standard (AES) algorithm.

12. The method according to claim 10, wherein the cryptographic primitive is a Secure Hash Algorithm 3 (SHA-3).

Description

DESCRIPTION OF THE DRAWING

(1) The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an implementation of the invention and, together with the description, serve to explain the advantages and principles of the invention:

(2) FIG. 1 is a block diagram of a functional unit (FU) including clock strobing method for measuring path delays according to an embodiment of the invention.

(3) FIG. 2 is a graph illustrating random pairings of TV compensated rising and falling PUFNum Differences (PNDc) and PNDc with an applied Modulus (modPNDc) according to an embodiment of the invention.

(4) FIG. 3 is a graph illustrating temperature-voltage compensation (TVCOMP) of PNDc and temperature-voltage (TV) corners according to an embodiment of the invention.

(5) FIG. 4 illustrates a block diagram of a Margin and Dual Helper Data (DHD) algorithm according to an embodiment of the invention.

(6) FIG. 5A illustrates a flow diagram of an enrollment operation of an authentication protocol according to an embodiment of the invention.

(7) FIG. 5B illustrates a flow diagram of authentication operation of an authentication protocol according to an embodiment of the invention.

(8) FIG. 6A illustrates a graph of actual inter-chip hammering distance results using a Mean scaling factor according to an embodiment of the invention.

(9) FIG. 6B illustrates a graph of actual inter-chip hammering distance results using a Max. scaling factor according to an embodiment of the invention.

(10) FIG. 7A illustrates a graph of National Institute of Standards and Technology (NIST) statistical test results for a Margin of 2 using a Mean scaling factor according to an embodiment of the invention.

(11) FIG. 7B illustrates a graph of NIST statistical test results for a Margin of 3 using a Mean scaling factor according to an embodiment of the invention.

(12) FIG. 8A illustrates a graph of true inter-chip hammering distance results using a Mean scaling factor according to an embodiment of the invention.

(13) FIG. 8B illustrates a graph of true inter-chip hammering distance results using a Max. scaling factor according to an embodiment of the invention.

(14) FIG. 9A illustrates a graph of entropy results using a Mean scaling factor according to an embodiment of the invention.

(15) FIG. 9B illustrates a graph of entropy results using a Max. scaling factor according to an embodiment of the invention.

(16) FIG. 10A illustrates a graph of probability of failure results using a Mean scaling factor according to an embodiment of the invention.

(17) FIG. 10B illustrates a graph of probability of failure results using a Max. scaling factor according to an embodiment of the invention.

(18) FIG. 11A illustrates a graph of smallest bitstring size results using a Mean scaling factor according to an embodiment of the invention.

(19) FIG. 11B illustrates a graph of smallest bitstring size using a Max. scaling factor according to an embodiment of the invention.

(20) FIG. 12 illustrates a table of HELP authentication protocol area and runtime overhead.

DETAILED DESCRIPTION OF THE INVENTION

(21) The Hardware embedded Delay PUF (HELP) is a strong PUF that leverages path delay variations in a functional unit as described in the following patent applications incorporated by reference: International Patent Application PCT/US14/053276 filed Aug. 28, 2014, and International Patent Application PCT/US15/065909 filed Dec. 15, 2015. In particular, HELP generates bitstrings from delay variations that exist within existing functional units and provides a large number of CRPs.

(22) The source of entropy (randomness) for HELP is the manufacturing variations that occur in the delays of paths that define the functional unit. HELP measures path delays using a clock strobing technique as illustrated in FIG. 1. The source of entropy is represented by the functional unit, which is an existing on-chip macro that implements, e.g., components of the authentication protocol, i.e., an integer divider or a cryptographic hash function.

(23) As indicated above, a challenge for HELP consists of a 2-vector sequence and a Path-Select-Mask. The Launch Row FFs and Capture Row FFs are also components of the functional unit. The only modification required for the integration of HELP into the functional unit involves the use of a second clock, labeled Clk.sub.2, which drives the Capture Row FFs, and the addition of the XOR gates on the primary outputs PO[x].

(24) The Launch Row FFs in FIG. 1 are used to apply the 2-vector sequences to the primary inputs PI[x] of the functional unit, while the Capture Row FFs are used to measure the path delays at the primary ouputs PO[x]. The path delays are measured by applying a series of launch-capture clocking events (called clock strobing) using Clk.sub.1 and Clk.sub.2 as shown on the left side of FIG. 1. The first vector of the sequence represents the initialization vector. The application of the second vector generates a set of transitions which are timed by the clock strobing technique. The clock strobing technique requires the repeated application of the 2-vector sequence. For each repeated application of this 2-vector test sequence, the phase shift between Clk.sub.1 and Clk.sub.2 is increased by a small fixed t.

(25) The phase shift value between the two clocks is digitally controlled, and is referred to as the launch-capture interval (LCI). The smallest LCI that allows the propagating edge along a path starting from a Launch FF to be captured in a capture FFoccurs when an XOR gate on the output becomes 0is used as the digitized timing value for the path. The digital timing values for a large number of paths can be obtained by repeating the clock strobing operation for multiple 2-vector test sequences. In the following description, the LCI path timing value is referred as a PUFNum or PN.

(26) The signed difference of two randomly selected PNs is referred to as a PNDiff or PND. HELP constructs PND by pairing each of the rising PNs with a falling PN using two linear-feedback shift registers (LFSR). The LFSRs are initialized with a pair of configuration parameters referred to as LFSR seeds.

(27) The authentication protocol according to the invention requires HELP to generate nonces in addition to the PNs. The VHDL module responsible for implementing the PN timing engine generates nonces in parallel with PN generation by leveraging the meta-stability characteristics that exist in a subset of the tested paths. Meta-stability is determined for a path by repeatedly measuring it and then analyzing the variations in the fractional component of the computed average. Those paths that produce two consecutive PN values nearly of equal frequencies are used as a source of true random numbers (TRNG). It should be noted that the random statistical properties associated with the nonces generated in this fashion pass all of the National Institute of Standards and Technology (NIST) statistical test statistical tests.

(28) It should be noted that the ability to dynamically control the fine phase shift of a Clk signal is a common feature of on-chip digital clock managers (DCMs) in FPGAs. For example, Xilinx includes this phase shift capability even on their lowest cost FPGAs. For low-cost components that do not include a DCM, this phase shift capability can be implemented with a small area overhead using a multi-tapped delay chain.

(29) The reliability of a PUF refers to the number of bit flip errors that occur when the bitstring is regenerated. Ideally, the bitstrings are precisely reproduced during regeneration but this is rarely possible with PUFs. The largest source of noise that causes bit flip errors for PUFs is a change in temperature and/or supply voltage (TV noise). Although sample-averaging of path delays is effective at reducing measurement noise, this strategy is not effective for TV noise, and instead a TV compensation (TVCOMP) method is required. The TVCOMP process is described by Equations (1) and (2):

(30) $\begin{matrix} {zval}_{i} = \frac{({PND}_{i} -_{token})}{{Rng}_{token}} & Equation (1) \\ PNDc = {zval}_{i} {Rng}_{ref} +_{ref} & Equation (2) \end{matrix}$

(31) Here, zval.sub.i represents a standardized PND after subtracting a mean .sub.token and dividing by a range Rng.sub.token, with .sub.token and Rng.sub.token derived from the distribution of all PND obtained during regeneration under potentially adverse environmental conditions, referred to as TV corners. The individual zval are then transformed to a set of PNDc (with c for compensated) using two additional configuration parameters, .sub.ref and Rng.sub.ref (ref is for reference). This linear transformation is very effective at reducing TV noise. The noise from environmental variations that remain in the PNDc is called uncompensated TV noise or UC-TVNoise.

(32) The bitstring generation process uses the signed PNDc as a means of both hardening the algorithm against model building and increasing the diversity in the PUF responses. A mod-PNDc is defined by applying a Modulus to the PNDc. According to one embodiment of the invention, the Modulus is a fifth configuration parameter to the HELP algorithm (adding to the .sub.ref, Rng.sub.ref and LFSR seed parameters).

(33) The modulus is necessary because the paths in the functional unit vary in length and this path length bias is captured in the PNDc. The modulus reduces the bias while fully preserving the within-die delay variations, i.e., the most important source of randomness.

(34) FIG. 2 shows a sample set of PNDc (18) computed from pseudo-random pairings of PN measured from chip C.sub.1. Each PNDc is measured 16 times under different TV conditions. One curve line connects the data points obtained under enrollment conditions (25 C., 1.00V) while the remaining curve lines connect data points under a set of regeneration TV corners, for example, all combinations of temperatures 40 C., 0 C., 25 C., 85 C., 100 C. with supply voltages 0.95V, 1.00V and 1.05V. The top of FIG. 2 illustrates the modPNDc values after a Modulus of 20 is applied. The modPNDc is used in the HELP bitstring generation process described below.

(35) According to the invention, an offset technique may be used to further reduce bias effects, particularly when the Modulus is greater than the magnitude of the within-die variations. FIG. 3 provides a graph of a PNDc obtained from a set of 45 chips to illustrate the concept. The line connected points in each curve are generated by the same chip and represent the value of the PNDc measured in the 16 TV corner experiments after they has been TVCOMP'ed. The UC-TVNoise referred to earlier that remains after TVCOMP is annotated on the bottom-most curve. In contrast, within-die variations (WID) are represented by the vertical extension of the individual curves, which is also annotated in FIG. 3. The magnitude of WID for this PNDc is approx. 11 LCIs. If a Modulus of 20 is used, then the position of this group of curvesshown between 131 and 120represents a worst case scenario because the bit generated in the bitstrings (discussed below) would be the same for nearly all chips. The bias that creates this problem can be eliminated by adding a constant of 6 to the points in the all curves as shown on the right side of FIG. 3). This centers the PNDc distribution over 120 and maximizes the entropy contained in this PNDc by making the number of chips which produce a 1 in the generated bitstrings nearly equal to the number that produce a 0. The appropriate offset is computed by the verifier using the stored enrollment data and is encoded in the set of Path-Select-Mask sent to the token.

(36) A Margin technique is used to improve reliability. The Margin technique identifies modPNDc that have the highest probability of introducing bit flip errors. The modPNDc data shown along the top of FIG. 2 is replicated and enlarged as shown by (a) in FIG. 4. The region defined by the Modulus is split into two halves, with the lower half used as the 0 region (between 0 and 9 in (a) of FIG. 4) and the upper half as the 1 region.

(37) Without Margining, bit flips would occur at modPNDc indexes 4, 6, 7, 8, 10 and 14 because some of the values in the groups of PNDc data points from the 16 TV corner experiments cross over the 0-1 lines at 9-10 and 19-0. The Margin technique avoids these bit flip errors by creating weak and strong classes for the bits associated with the modPNDc. The bit associated with a modPNDc is classified as weak if the modPNDc falls within a margin around the 0-1 boundaries, and is classified as a strong bit otherwise. The margin is set ideally to the worst case UC-TVNoise level for the best results, but can be tuned to attain a specific probability of failure in the authentication protocol discussed further below.

(38) A Dual Helper Data (DHD) algorithm is proposed as a means of further reducing bit flip errors. The helper data (HelpD) and response bitstrings (RespBS) for the hardware token are shown by (b) in FIG. 4, while (c) in FIG. 4 shows HelpD and RespBS for the verifier. The values are derived using the token and verifier highlighted data points from the modPNDc shown in (a) in FIG. 4.

(39) Authentication in the field makes use of data stored earlier during enrollment in the Verifier Database. The following operations are carried out to generate the Token and Verifier StrongBS. First, the token generates helper data (Token HelpD) using the Margining technique to produce the Token StrongBS, which are both transmitted to the verifier. For each token stored in the Verifier Database, the verifier computes helper data (Verifier HelpD), and then bitwise AND's it with the received Token HelpD. The verifier constructs the Verifier StrongBS using the AND'ed HelpD while simultaneously eliminating strong bits from the Token's StrongBS that correspond to Token HelpD bits that were changed from 1 to 0 during the AND operation (3 bits are eliminated in this example as shown along the bottom of (c) in FIG. 4). Lastly, the two StrongBS are compared. A successful authentication requires either an exact match between the Token and verifier StrongSB, or a fuzzy match where a match is successful if most, but not all, of the bits match. The AND'ing of the token and verifier's HelpD bitstrings allows the margin to be reduced to approx. one-half of that required if the individual HelpD bitstrings were used by themselves. This is true because a bit flip error can only occur if UC-TVNoise causes a modPNDc to move across both margins, and into the opposite strong bit region, as shown by the caption and illustration in (a) of FIG. 4. If the modPNDc moves but remains in either the 1 or 0 weak bit regions, then the AND operation eliminates it. The smaller margins used with the DHD scheme allow the Modulus to be reduced, which in turn, allows better access to within-die variations.

(40) A privacy-preserving, mutual authentication protocol is now discussed in detail. Path delay information, the PNs, is stored on the verifier instead of response bitstrings. As an example, the PNs can each be represented as a 15-bit values (which provides a range of +/1024 with 4 bits of fixed-point precision).

(41) The protocol employs several parameters, including a Modulus (also referred to as Mod), a .sub.ref and Rng.sub.ref from Equations (1) and (2), a pair of LFSR Seeds (S), a Margin and a Path-Select-Mask, to allow multiple response bitstrings to be generated from a fixed set of PNs. The verifier specifies a set of paths in the Path-Select-Mask and encodes offsets in the unused bits to improve entropy as above.

(42) A challenge is defined as a 2-vector sequence plus a Path-Select-Mask. A one-time interface (implemented on the FPGA as a special programming bitstring) is used during enrollment to allow the token to transfer PNs to the verifier. The protocol separates token identification (ID Phase) from authentication (Authen Phase) to support the privacy preserving component. The protocol does not require any cryptographic primitives nor non-volatile memory (NVM) on the token.

(43) The enrollment operation is graphically illustrated in FIG. 5A. Prior to manufacture, automatic test pattern generation (ATPG) is used to select a set of test vector sequences, {ck}, used as a common set of challenges for all tokens in the ID Phase. The number of vectors depends on the security requirements regarding privacy.

(44) The common challenges are transmitted to the token in a secure environment during enrollment and applied to the functional unit's PIs. The token generated PN are transmitted to the verifier, annotated as {PN.sub.j} in FIG. 5(a). The verifier generates an internal identifier ID.sub.i for each token using VerifierGenID( ) and stores the set {PN.sub.j} under ID.sub.i in the secure database.

(45) A similar process is carried out during the Authen Phase of enrollment except that a distinct set of ATPG-generated challenges are selected using SelectATPG(ID.sub.i) for each token. The number of hazard-free testable paths in typical functional units can be very large, making it possible to create minimally overlapping sets for each token (some overlap is desirable for privacy reasons as discussed below). Note that the task of generating 2-vector sequences for all paths is likely to be computationally infeasible for even moderately sized functional units. However, it is feasible and practical to use ATPG to target random subsets of paths for the enrollment requirements. The set of PNs, {PN.sub.y}, as generated in the Authen Phase are also stored, along with the challenge vectors that are used, in the secure database under ID.sub.i.

(46) The fielded token authenticates using a 3-phase process, Phase 1 is token identification (ID Phase), Phase 2 is verifier authentication (Mutual Phase) and Phase 3 is token authentication (Authen Phase). The operations carried out in the ID Phase are shown graphically in FIG. 5B. The other two phases are nearly identical, with only the differences noted below.

(47) The token initiates the process by transmitting a req. to authen. signal to the verifier. The verifier generates nonce n.sub.2 and transmits it to the token, along with a selected set of challenges ({ck} to the token. It should be noted that the transmitted challenges are typically a subset of those used during enrollment. The token generates a nonce n.sub.1 and transmits it to the verifier. This prevents the adversary from constructing n.sub.2 as a means of carrying out a systematic attack.

(48) The token and verifier compute m=(n.sub.1 XOR n.sub.2) and use the m as an input parameter to the SelParam function. SelParam constructs the parameters Mod, S, .sub.ref, Rng.sub.ref, and Margin using bit-fields from m. The two LFSR Seed parameters Scan be derived directly from a bit-field in m. The remaining parameters are derived using a table lookup operation as a means of constraining them to specific ranges. For example, Mod is lower bounded by the Margin and is constrained to be an even number less than 30. Similarly, .sub.ref and Rng.sub.ref parameters are constrained to a range of fixed-point values. SelParam is carried out on the verifier in the same fashion.

(49) Once the parameters are selected, the bitstring generation process is carried out First, the challenges {ck} are applied to generate a set {PN.sub.j}, referenced as PUF({ck}) in FIG. 5B. The PNDiff, TVCOMP and Modulus operations described above are then applied to the set of PNs using the AppParam procedure with parameters S, .sub.ref, Rng.sub.ref, and Mod parameters to generate the set (modPNDc.sub.j). Next, bitstring generation (BitGenS) is performed on the token using the Margining process described above and shown graphically by (b) in FIG. 4. BitGenS returns both a bitstring bss that is composed of only strong bits under the constraints of the Margin and a helper data string h Both bss and h are transmitted to the verifier.

(50) The verifier carries out a search process by processing each of its stored token i data sets {PN.sub.j}.sub.i using the same parameters. However the DHD scheme, denoted BitGenD in FIG. 5B, is used instead. BitGenD bitwise-ANDs the token's helper data h with the helper data derived for each data set (not shown), and uses it to modify the token's bitstring bss to bss eliminating bits as needed (see bottom of (c) in FIG. 4) and to produce the verifier's StrongBS bss. Lastly, the verifier compares bss with bss and completes the ID Phase successfully if a match is found.

(51) Although this is a compute-intensive operation for large databases because AppParam and BitGenD must be applied to each stored {PN.sub.j}.sub.i in the database, the search operation can be carried out in parallel on multiple CPUs given the independence of the operations if needed.

(52) As indicated, the search terminates when a match is found or the database is exhausted. In the latter case, authentication terminates with failure at the end of the ID Phase. Therefore, the ID Phase also serves as a gateway that prevents an adversary from depleting a token's authentication information on the verifier in a denial-of-service attack. In the former case, the ID.sub.i of the matching verifier data set is passed to Phase 2, verifier authentication (Mutual Phase), and Phase 3, token authentication (Authen Phase). In the Mutual Phase, the same process is carried out except the token and verifier roles are reversed and the search process is omitted. It is also contemplated that the challenges used in the ID Phase can be re-used and only SelParam run using two new nonces (n.sub.3 XOR n.sub.4). The Authen Phase is similar to the ID Phase in that the token is again authenticating to the verifier, but uses a token specific set of challenges {cx}. Similar to the Mutual Phase, the search process is omitted. It is also contemplated that the Authen Phase can be omitted in applications that have lower security requirements, for example, RFID and home automation applications.

(53) Note that token privacy is preserved in the ID Phase because, with high probability, the transmitted information bss and h is different from one run of the protocol to the next, given the diversity of the parameter space provided by the Mod, S, .sub.ref, Rng.sub.ref, Margin. This diversity is exponentially increased as discussed above through the use of the Path-Select-Mask. Moreover, by creating overlap in the challenges used by different tokens in the token authentication phase, tracking is prevented in this phase as well.

(54) It should be noted that HELP uses an error avoidance scheme and therefore, the motivating factor for previously proposed reverse fuzzy extraction schemesfor example, reducing the computing burden associated with error correction on the tokendoes not exist for HELP. As a consequence, it is possible in HELP to implement an efficient helper data scheme in either direction, as proposed in the multiple phases of the authentication scheme.

(55) The Mod, S, .sub.ref, Rng.sub.ref, Margin collectively represent parameters that can be varied within limits to create distinct bitstrings from a set of measured PNs. This feature of the proposed authentication scheme offsets the increased overhead associated with storing multi-bit PNs on the verifier as an alternative to response bitstrings. However, this scheme depends heavily on high statistical quality among the generated StrongBS. This section investigates StrongBS statistical quality using the standard metrics, including Intra-chip hamming distance (HD.sub.intra), Inter-chip hamming distance (HD.sub.inter) and the NIST statistical test tools, as measures of bitstring reproducibility, uniqueness and randomness, respectively.

(56) According to one embodiment of the invention, the protocol is provided in a hardware implementation of the Advanced Encryption Standard (AES) algorithm using an AES data path component referred to as sbox-mixedcol as the source of entropy. In particular, the sbox-mixedcol is a functional unit of a 32-bit column AES that includes 4 copies of the SBOX and 1 copy of the MIXEDCOL.

(57) Data is collected from the sbox-mixedcol functional unit on 45 copies of the Xilinx Zynq 7020 FPGA, however any number of copies as well as any hardware such as ASIC is contemplated. The implementation of sbox-mixedcol requires approx. 3000 LUTs on the Xilinx Zynq 7020 FPGA and provides approx. 8 million paths. However, the protocol has also been demonstrated using a lighter-weight functional unit consisting of single AES SBOX component that possesses approx. 600 LUTs, reducing the overall implementation size (HELP+functional unit) from approx. 6000 LUTs to less than 3000 LUTs.

(58) In particular, a set of 4096 PNs are collected from the 45 chips at each of 16 TV corners. The enrollment data stored in the verifier database is collected at 25 C., 1.00V (nominal conditions), while regeneration data is collected at all combinations of the extended industrial-grade temperature-voltage specification limits for the parts, 40 C., 0 C., 25 C., 85 C., 100 C. and voltages 0.95V, 1.00V and 1.05V. A set of low-noise, high within-die variations paths are selected using Path-Select-Masks from approx. 600 rising and 600 falling 2-vector test sequences.

(59) Test data is generated by applying a set of approx. 1200 challenges to test 2048 paths with rising transitions and 2048 paths with falling transitions. PNDs are created using LFSR-selected pairings of the 2048 rising and 2048 falling edge PNs. Each of the 2048 rising edge PNs can be paired with any of the 2048 falling edge PNs, yielding 4,194,304 possible combinations, however the following results are directed to a subset of 256 of these pairing combinations.

(60) A 2-bit offset scheme, as discussed above, is applied to the PNDc to improve entropy. The verifier computes the offsets using stored enrollment data and uses it to shift the individual PNDc upwards by 0, , , or s the range given by the applied Modulus to better center the distribution over the 0-1 lines.

(61) A set of Moduli between 10 and 30, in steps of size 2, and Margins of size 2 and 3, are also investigated. The minimum value of the Modulus is given by 4*Margin+2 because four weak regions are required as shown by (a) in FIG. 4 and the two strong bit regions must be at least of size 1. For example, the smallest Modulus for a Margin of size 3 is 14, so elements in the histogram for Modulus of 10 and 12 are 0.

(62) The analysis reveals that of the 20 combinations of these parameters, 17 are useful. The only combinations that cannot be used are Modulus of 10 for Margin 2 and Moduli of 14 and 16 for Margin 3. As shown, the bitstring sizes are too small for these combinations of Margin and Moduli.

(63) The analysis also investigates two of the scaling factor combinations given by the .sub.ref and Rng.sub.ref parameters (see Equations (1) and (2)), in particular, the Mean and Maximum recommended values, which are derived from the individual distributions of the 45 chips. It is conservatively estimated that pre and Rng.sub.ref can be independently set to 10 different values between these Mean and Maximum values.

(64) Given these bounds on the configuration parameters, it is possible to generate a total of 4,194,304*17*10*10=7 billion different bitstrings using the same set of paths (PNs). As discussed above, the verifier also applies a Path-Select-Mask to each of the 2-vector sequences, which increases the number of possible bitstrings exponentially.

(65) Inter-chip hamming distance is reported in two waysActual and True. In this section, HD.sub.inter is computed using the StrongBS produced after the application of the DHD method described above.

(66) A set of StrongBS are created by AND'ing pairs of Helper Data bitstrings as follows. First, the enrollment modPNDc is used to create a set of 45 Helper Data bitstrings for each of the 45 chips. Second, Helper Data is computed using the modPNDc collected under each regeneration corner for these 45 chips. For each chip, the enrollment Helper Data bitstring is AND'ed with the corresponding regeneration Helper Data bitstrings.

(67) The 45*15 AND'ed Dual Helper Data bitstrings are used to create a corresponding set of StrongBS using the method shown in (b) and (c) of FIG. 4. It should be noted that the DHD method creates variable-sized bitstrings. The smallest bitstring is used that is produced by one of the chips in the HD.sub.interA analysis. The smallest bitstring sizes are analyzed and discussed below.

(68) HD.sub.interA is computed using the following equation:

(69) ${HD}_{interA} = \frac{1}{CC B T} {.Math.}_{t = 1}^{T} {.Math.}_{i = 1}^{C} {.Math.}_{j = 1}^{C} {.Math.}_{k = 1}^{B} ({StrongBS}_{i, t, k} .Math. {StrongBS}_{j, t, k}) 100$
Equation (3).
B and NC represent number of chips (45), number of regeneration TV corners number of bits (smallest bitstring size) and number of chip combinations (45*44/2=990), respectively. Equation (3) simply sums all the bitwise differences between each of the possible pairing of chip StrongBS, and then converts the sum into a percentage by dividing by the total number of bits that were examined. HD.sub.interA is computed in this fashion for each of the 256 seeds and averaged.

(70) The HD.sub.interA are shown in FIG. 6A and FIG. 6B for each of the Moduli and Margin combinations using Mean and Max. scaling factors for .sub.ref and Rng.sub.ref. The height of the bars are all very close to the ideal of 50%. Although an excellent result, this approach to computing Interchip-HD differs from the traditional approach because corresponding positions in the bitstrings are generated from different modPNDc. The results using the traditional approach, i.e., where the positions of the modPNDc are preserved in the bitstrings, are discussed below.

(71) The StrongBS referenced above are used as input to the NIST statistical test suite. The results using Mean Scaling and only 1 of the 256 LFSR seed pairs are presented in FIG. 7A and FIG. 7B, for Margins of 2 and 3, resp. (the results for other configuration parameters are very similar). NIST test criteria classifies a test category as passed if at least 42 of the 45 chips pass the test. The figure shows all bars are above the threshold line at 42, and therefore all test categories are passed. Bars of height 0 for NIST Tests 1, 2 and 3 identify Moduli that produced bitstrings with sizes less than the NIST requirement for those tests. The pass percentage when the NIST tests are applied to the bitstrings produced from all combinations of the investigated parameters is approx. 98.8%.

(72) FIG. 8A and FIG. 8B each illustrate a graph of true inter-chip hammering distance results using a Mean scaling factor and a Max. scaling factor according to the invention. Entropy results using a Mean scaling factor and a Max. scaling factor are shown in FIG. 9A and FIG. 9B. FIG. 10A and FIG. 10B each illustrate a graph of probability of failure results using a Mean scaling factor and a Max. scaling factor according to the invention. The smallest bitstring size results using a Mean scaling factor and a Max. scaling factor according to the invention are shown in FIG. 11A and FIG. 11B.

(73) Similar to HD.sub.interA, HD.sub.interT is computed as the average percentage across 990 pairings of bitstrings and 256 different pairs of LFSR seeds. However, the full length bitstrings of length 2048 are used and for each pairing of bitstrings, the hamming distance is computed using only bits classified as strong in both bitstrings. Under the Mean scaling factor, the HD.sub.interT vary from 30% to 50% with the smallest value of 30.2% for Margin 3 and Modulus 30 as shown by FIG. 8A. For the Max scaling, most of the HD.sub.interT values are between 40% and 50% with the smallest value of 38.7% as shown by FIG. 8B. These results are also very good and indicate that a 2-bit offset can be used effectively with this range of Moduli.

(74) Similarly, entropy is computed using the strong bits from each enrollment-generated bitstring of length 2048 and the following equation:
H(X)=.sub.i=1.sup.np.sub.i log.sub.2(p.sub.i)+(1p.sub.i)log.sub.2(1p.sub.i)Equation (4).
The frequency p.sub.i of 1s is computed as the fraction of 1s at each bit position using only those chips of the 45 which identify the bit as strong. The entropy values vary over a range from approx. 1240 to over 1900. The ideal value is 2048 in this analysis so these results indicate that each bit contributes between 0.60 and 0.93 bits of entropy.

(75) The Probability of Failure is reported as an exponent x from 10x with a value of 6 indicating 1 chance in 1 million. The HD.sub.intra is computed by pairing the enrollment StrongBS for each chip against each of the 15 regeneration StrongBS under the DHD scheme and then counting the differences (bit flips) across all combinations of the 15 DHD-generated bitstrings. The number of bit flips for all chips are summed and divided by the total number of bits inspected. An average HD.sub.intra is then computed using this process across a set of 256 LFSR seed pairs, which is then converted into an exponent representing the Probability of Failure. The results show that the Probability of Failure varies between 10-2 and 10-4, with the largest (worst case) value at 10-2.4. Therefore, less than 1% of the bits for any authentication differ between the token and verifier under worst case environmental conditions.

(76) The smallest StrongBS sizes are shown in the FIG. 11A and FIG. 11B. Using the condition that at least 80 bits are needed to meet the de facto lightweight security standard, the only parameter combinations that fail to meet this criteria are those noted above, i.e., Modulus of 10 for a Margin of 2 and Moduli of 14 and 16 for a Margin of 3.

(77) FIG. 12 gives the resource utilization and runtime overhead associated with the ID Phase and Mutual Phase of the protocol. The table shown in FIG. 12 lists the resources in the order in which they are used by the authentication protocol, with - indicating repeated use of resources previously listed. The totals at the bottom indicate that area overhead is 6038 LUTs and 1724 FFs while the runtime is approx. 1.25 seconds. An alternative, lighter-weight implementation which uses only a single AES SBOX component yields an area overhead of 2909 LUTs and 952 FFs and a runtime of approx. 2.2 seconds.

(78) The implementation of HELP also requires an 18-bit multiplier and an on-chip BRAM memory of size 7.5 KBytes. The Xilinx IP blocks used in the implementation include a MMCM and a dual-channel (64-bits) AXI-GPIO for implementing communication between the processor and programmable logic components of the Zynq 7020 FPGA. The runtime is measured using an 8-core 3.4 GHz Intel i7 desktop computer as the verifier. The authentication time of 1.25 seconds includes network transmissions between the token and verifier. The exhaustive search carried out on the verifier takes approx. 300 microseconds per entry in the database. The runtime reported uses a database with only a single entry. Therefore, applications that incorporate a relatively small number of tokens (10K or less) require a search time of approx. 1.5 seconds on average, and a total authentication time of approx. 2.75 seconds.

(79) Security properties of HELP that relate to its resistance to model building and to the size of its CRP space are now discussed. The response space refers to the number of bitstrings that each token can generate using the six user-defined parameters described above. The security analysis assumes the verifier securely stores the token's timing information that is collected during enrollment, encrypting it if necessary.

(80) As mentioned previously, the size of the challenge space is 2*(3.sup.n2.sup.n) 2-vector sequences, and the number of response bitstrings is approx. 7 billion excluding the diversity introduced by the Path-Select-Mask. The (n.sub.1 XOR n.sub.2) operation used in the protocol does not allow direct control over these configuration parameters. The Path-Select-Mask increases the number of possible response bitstrings exponentially by changing the set of PNs used in the bitstring generation process. These characteristics of HELP and the protocol collectively add significant resilience to model-building attacks.

(81) Two additional factors further increase HELP's model building resistance. The first is referred to as the distribution effect. The PNs selected by the Path-Select-Mask change the characteristics of the PND distribution, which in turn impacts how each PND is transformed through the TVCOMP process (the TVCOMP process was described earlier in reference to Equation (1) and Equation (2)). In particular, Eq. 1 uses the Ptoken and Rng.sub.token of the measured PND distribution to standardize the PNDs before applying the reverse transformation given by Equation (2). The first transformation makes the final PNDc values dependent on the other components of the PND distribution. Therefore, machine learning techniques designed to learn the relative path delays as a mechanism to break the PUF need to account for this distribution effect.

(82) With the physical model for HELP being more complex than the models developed for the arbiter PUF, it is likely that machine learning (ML) algorithms require much larger training sets to achieve good prediction capability, if it is possible at all. This is true for several reasons. First, the adversary is required to run automatic test pattern generation (ATPG) to generate the vector pairs used in the training phase of the ML attack. Although this is a one-time cost, ATPG requires long runtimes and commonly fails to find vector pairs that test paths in a hazard-free robust manner, which is required to eliminate uncertainly about which path is actually being tested during the training phase. Second, a level of uncertainty will always remain because not all paths are hazard-free robust testable. In particular, the path that dominates the timing for cases where paths reconverge and have nearly equal nominal delays will be different from chip-to-chip. Third, ML algorithms such as Probably Approximately Correct (PAC) that have been effective against arbiter PUFs, guarantee success only when the model is polynomial in size.

(83) The described embodiments are to be considered in all respects only as illustrative and not restrictive, and the scope of the invention is not limited to the foregoing description. Those of skill in the art may recognize changes, substitutions, adaptations and other modifications that may nonetheless come within the scope of the invention and range of the invention.

Privacy-preserving, mutual PUF-based authentication protocol

Inventors

Cpc classification

Classification Explorer

H04L9/0866

ELECTRICITY

Classification Explorer

G06F21/44

PHYSICS

Classification Explorer

H04L9/3234

ELECTRICITY

Classification Explorer

H04L2209/12

ELECTRICITY

Classification Explorer

H04L2209/08

ELECTRICITY

Classification Explorer

G06F21/73

PHYSICS

Classification Explorer

G06F21/445

PHYSICS

Classification Explorer

G06F21/70

PHYSICS

Classification Explorer

G06F7/588

PHYSICS

Classification Explorer

H04L9/3278

ELECTRICITY

Classification Explorer

H04L9/0662

ELECTRICITY

Classification Explorer

G06F21/30

PHYSICS

International classification

Classification Explorer

G06F21/44

PHYSICS

Classification Explorer

G06F7/58

PHYSICS

Classification Explorer

G06F21/34

PHYSICS

Classification Explorer

H04L9/32

ELECTRICITY

Classification Explorer

G06F21/70

PHYSICS

Classification Explorer

G06F21/30

PHYSICS

Classification Explorer

G06F21/73

PHYSICS

Classification Explorer

H04L9/08

ELECTRICITY

Classification Explorer

H04L9/06

ELECTRICITY

Abstract

Claims

Description