CLOCK TREE, HASH ENGINE, COMPUTING CHIP, HASH BOARD AND DATA PROCESSING DEVICE

20220271753 · 2022-08-25

    Abstract

    This disclosure relates to a device performing a hash algorithm. A hash engine includes an operation module performing a hash operation on a data block, and a clock module. The operation module includes operation stages each including registers and a combinational logic module. A digital signal based on the data block is sequentially delivered along the operation stages. Outputs of a first set of registers are coupled to an input of the combinational logic module of the current operation stage. Inputs of a second set of registers are coupled to an output of a combinational logic module of a previous operation stage. A clock signal, provided by the clock module to each operation stage, is sequentially delivered along multi-stage clock driving circuits of the clock module. For the first and second sets of registers, a delivery direction of the digital signal is the same as that of the clock signal.

    Claims

    1. A hash engine, comprising: an input module, configured to receive a data block; an operation module, configured to perform a hash operation on the received data block, the operation module comprising a plurality of operation stages arranged in a pipeline structure such that a digital signal based on the data block is sequentially delivered along the plurality of operation stages, each operation stage among the plurality of operation stages comprising a plurality of registers and a combinational logic module, wherein in each current operation stage, output ends of a first set of registers among the plurality of registers are at least coupled to an input end of the combinational logic module of the current operation stage, and input ends of a second set of registers among the plurality of registers are coupled to an output end of a combinational logic module of a previous operation stage; and a clock module, configured to provide a clock signal to each of the plurality of operation stages, the clock module comprising multi-stage clock driving circuits such that the clock signal from a clock source is sequentially delivered along the multi-stage clock driving circuits, wherein for the first and second sets of registers of the plurality of operation stages, a delivery direction of the digital signal is the same as that of the clock signal.

    2. The hash engine according to claim 1, wherein in each current operation stage, a third set of registers among the plurality of registers each has its input end coupled to an output end of a corresponding register in the previous operation stage, and its output end coupled to an input end of a corresponding register in a next operation stage, wherein for the third set of registers of the plurality of operation stages, a delivery direction of the digital signal is opposite to that of the clock signal.

    3. The hash engine according to claim 2, wherein the clock module further comprises a clock buffer circuit for each register among the plurality of registers, a clock signal end of each register being coupled to an output end of the clock buffer circuit for each register, and wherein an input end of a clock buffer circuit for each register among the first and second sets of registers of each current operation stage is coupled to an output end of a clock driving circuit for the current operation stage.

    4. The hash engine according to claim 3, wherein the plurality of registers of each current operation stage further comprise one or more additional registers, one of the one or more additional registers having its input end coupled to an output end of a specific register among the first set of registers of the current operation stage, its output end coupled to an input end of a register of the next operation stage corresponding to the specific register, and its clock signal end coupled to an output end of a clock buffer circuit for the one additional register.

    5. The hash engine according to claim 4, wherein the hash engine is used for performing a SHA-256 algorithm, the plurality of registers of each current operation stage includes at least first to sixteenth registers (W.sub.0 . . . W.sub.15), the first set of registers includes first, second, tenth and fifteenth registers (W.sub.0, W.sub.1, W.sub.9, W.sub.14), and the second set of registers includes a sixteenth register (W.sub.15), the one or more additional registers include a seventeenth register (W.sub.9_t) and an eighteenth register (W.sub.14_t), wherein: the seventeenth register (W.sub.9_t) has its input end coupled to an output end of the tenth register (W.sub.9) of the current operation stage, its output end coupled to an input end of a ninth register (W.sub.8) of the next operation stage, and its clock signal end coupled to an output end of a clock buffer circuit for the seventeenth register, and the eighteenth register (W.sub.14_t) has its input end coupled to an output end of the fifteenth register (W.sub.14) of the current operation stage, its output end coupled to an input end of a fourteenth register (W.sub.13) of the next operation stage, and its clock signal end coupled to an output end of a clock buffer circuit for the eighteenth register.

    6. The hash engine according to claim 5, wherein the third set of registers includes third to ninth registers (W.sub.2 . . . W.sub.8) and eleventh to fourteenth registers (W.sub.10 . . . W.sub.13).

    7. The hash engine according to claim 6, wherein for the third to ninth registers (W.sub.2 . . . W.sub.8) of each operation stage, an input end of a clock buffer circuit for a k.sup.th register (W.sub.k−1) is coupled to an output end of a clock buffer circuit for a (k−1).sup.th register (W.sub.k−2) of the next operation stage, where k is an integer and 3≤k≤9, and wherein an input end of the clock buffer circuit for the seventeenth register (W.sub.9_t) is coupled to an output end of a clock buffer circuit for the ninth register (W.sub.8) of the next operation stage.

    8. The hash engine according to claim 6, wherein for the third to eighth registers (W.sub.2 . . . W.sub.7) of each operation stage, an input end of a clock buffer circuit for a k.sup.th register (W.sub.k−1) is coupled to an output end of a clock buffer circuit for a (k−2).sup.th register (W.sub.k−3) of an operation stage after next, a clock signal end of a (k−1).sup.th register (W.sub.k−2) of each operation stage is coupled to an output end of a clock buffer circuit for a k.sup.th register (W.sub.k−1) of the previous operation stage, where k is an even number and 3≤k≤8, an input end of the clock buffer circuit for the seventeenth register (W.sub.9_t) is coupled to an output end of a clock buffer circuit for an eighth register (W.sub.7) of an operation stage after next, and wherein a clock signal end of a ninth register (W.sub.8) of each operation stage is coupled to an output end of a clock buffer circuit for a seventeenth register (W.sub.9_t) of the previous operation stage.

    9. The hash engine according to claim 5, wherein the third set of registers includes third to sixth registers (W.sub.2 . . . W.sub.5), eighth to ninth registers (W.sub.7, W.sub.8), and eleventh to fourteenth registers (W.sub.10 . . . W.sub.13), the one or more additional registers further include a nineteenth register (W.sub.6_t) having its input end coupled to an output end of the seventh register (W.sub.6) of the current operation stage, its output end coupled to an input end of a sixth register (W.sub.5) of the next operation stage, and its clock signal end coupled to an output end of a clock buffer circuit for the nineteenth register, the clock buffer circuit for the nineteenth register (W.sub.6_t) having its input end coupled to an output end of a clock buffer circuit for a sixth register (W.sub.5) of the next operation stage, an input end of a clock buffer circuit of a seventh register (W.sub.6) for each current operation stage is coupled to an output end of a clock driving circuit for the current operation stage, and wherein for the third to sixth registers (W.sub.2 . . . W.sub.5) and the eighth to ninth registers (W.sub.7, W.sub.8) of each current operation stage, an input end of a clock buffer circuit for a k.sup.th register (W.sub.k−1) is coupled to an output end of a clock buffer circuit for a (k−1).sup.th register (W.sub.k−2) of the next operation stage, where k is an integer and 3≤k≤6 or 8≤k≤9.

    10. The hash engine according to claim 5, wherein the third set of registers includes third to fifth registers (W.sub.2 . . . W.sub.4), seventh to ninth registers (W.sub.6 . . . W.sub.8), and eleventh to fourteenth registers (W.sub.10 . . . W.sub.13), the one or more additional registers further include a twentieth register (W.sub.5_t) having its input end coupled to an output end of a sixth register (W.sub.5) of the current operation stage, its output end coupled to an input end of a fifth register (W.sub.4) of the next operation stage, and its clock signal end coupled to an output end of a clock buffer circuit for the twentieth register, the clock buffer circuit for the twentieth register (W.sub.5_t) having its input end coupled to an output end of a clock buffer circuit for the fifth register (W.sub.4) of the next operation stage, an input end of a clock buffer circuit of a sixth register (W.sub.5) for each current operation stage is coupled to an output end of a clock driving circuit for the current operation stage, and wherein for the third to fifth registers (W.sub.2 . . . W.sub.4) and the seventh to ninth registers (W.sub.6 . . . W.sub.8) of each operation stage, an input end of a clock buffer circuit for a k.sup.th register (W.sub.k−1) is coupled to an output end of a clock buffer circuit for a (k−1).sup.th register (W.sub.k−2) of the next operation stage, where k is an integer and 3≤k≤5 or 7≤k≤9.

    11. The hash engine according to claim 6, wherein for the eleventh to fourteenth registers (W.sub.10 . . . W.sub.13) of each operation stage, an input end of a clock buffer circuit for a j.sup.th register (W.sub.j−1) is coupled to an output end of a clock buffer circuit for a (j−1).sup.th register (W.sub.j−2) of the next operation stage, where j is an integer and 11≤j≤14, and wherein an input end of the clock buffer circuit for the eighteenth register (W.sub.14_t) is coupled to an output end of a clock buffer circuit for a fourteenth register (W.sub.13) of the next operation stage.

    12. The hash engine according to claim 3, wherein each stage clock driving circuit among the multi-stage clock driving circuits comprises an odd number of inverters.

    13. The hash engine according to claim 3, wherein a clock buffer circuit for each register among the first and second sets of registers comprises two clock buffers, and a clock buffer circuit for each register among the third set of registers comprises one clock buffer.

    14. A clock tree circuit, comprising: a clock source, configured to provide a basic clock signal; and multi-stage clock driving circuits, wherein the basic clock signal from the clock source is sequentially delivered along the multi-stage clock driving circuits, each stage clock driving circuit among the multi-stage clock driving circuits being configured to provide a clock signal for each of a plurality of operation stages, wherein the plurality of operation stages are arranged in a pipeline structure such that a digital signal based on a received data block is sequentially delivered along the plurality of operation stages, each operation stage among the plurality of operation stages comprising a plurality of registers and a combinational logic module, wherein in each current operation stage, a first set of registers among the plurality of registers has their output ends at least coupled to an input end of the combinational logic module of the current operation stage, a second set of registers among the plurality of registers has their input ends coupled to an output end of a combinational logic module of the previous operation stage, and a third set of registers among the plurality of registers has their respective input ends coupled to respective output ends of respective corresponding registers in the previous operation stage, and their respective output ends coupled to respective input ends of respective corresponding registers in a next operation stage, wherein for the first and second sets of registers of the plurality of operation stages, a delivery direction of the digital signal is the same as that of the clock signal, and wherein for the third set of registers of the plurality of operation stages, a delivery direction of the digital signal is opposite to that of the clock signal.

    15. (canceled)

    16. A hash board comprising one or more computing chips each of which comprises one or more hash engines according to claim 1.

    17. (canceled)

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0016] The included drawings are for illustrative purposes and serve only to provide examples of possible structures and arrangements of an inventive apparatus disclosed herein and a method of applying it to a computing device. These drawings in no way limit any change in form and details that may be made to embodiments by those skilled in the art without departing from the essence and scope of the embodiments. The embodiments will be more readily understood by the following detailed description in conjunction with the accompanying drawings, wherein similar reference numerals denote similar structural elements.

    [0017] FIG. 1 is a diagram of a SHA-256 hash engine according to an embodiment of the present disclosure.

    [0018] FIG. 2A is a diagram illustrating driving registers by a forward clock tree in a pipeline architecture.

    [0019] FIG. 2B is a diagram illustrating driving registers by a reverse clock tree in a pipeline architecture.

    [0020] FIGS. 3A and 3B are diagrams illustrating setup time and hold time of a register.

    [0021] FIG. 4 is a schematic diagram illustrating a clock tree structure according to an embodiment of the present disclosure.

    [0022] FIG. 5 is a diagram illustrating a hash engine employing the clock tree structure of FIG. 4 according to an embodiment of the present disclosure.

    [0023] FIG. 6 is a diagram illustrating another hash engine employing the clock tree structure of FIG. 4 according to an embodiment of the present disclosure.

    [0024] FIG. 7 is a schematic diagram illustrating a clock tree structure according to another embodiment of the present disclosure.

    [0025] FIG. 8 is a diagram illustrating a hash engine employing the clock tree structure of FIG. 7 according to an embodiment of the present disclosure.

    [0026] Note that in the embodiments described below, a same reference numeral is shared among different drawings to denote same portions or portions having a same function, and repetitive description thereof will be omitted. In this specification, similar reference numerals and letters are used to denote similar items, and therefore, once a certain item is defined in one drawing, further discussion thereof is not required in subsequent drawings.

    [0027] For ease of understanding, the positions, dimensions, ranges, etc. of structures shown in the drawings and the like do not necessarily represent their actual positions, dimensions, ranges, etc. Therefore, the present disclosure is not limited to the positions, dimensions, ranges, etc. disclosed in the drawings and the like. Further, the drawings are not necessarily drawn to scale, and some features may be enlarged to show details of specific components.

    DETAILED DESCRIPTION

    [0028] Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that relative arrangements of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless otherwise specified.

    [0029] The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit this disclosure, its applications, or uses. That is, a hash engine herein is shown in an exemplary way to illustrate different embodiments of a circuit in the present disclosure and is not intended to be limiting. Those skilled in the art will appreciate that they are merely illustrative of exemplary ways in which the present disclosure can be practiced, rather than exhaustive ways.

    [0030] A technique, method, and device known to one of ordinary skill in the related art may not be discussed in detail, but the technique, method, and device should be regarded as part of the granted specification where appropriate.

    [0031] The present disclosure provides a novel clock tree solution that can be used in any cryptographic algorithm circuit with a pipeline architecture. For ease of description, a SHA-256 hash algorithm circuit is taken as an example for the following explanation. It will be appreciated by those skilled in the art that the SHA-256 is only one example in which the clock tree solution of the present disclosure can be applied, and that the present disclosure can also be applied in another cryptographic algorithm circuit having the pipeline structure.

    [0032] Reference is now made to FIG. 1, which is a diagram of a SHA-256 hash engine according to an embodiment of the present disclosure. Those skilled in the art will appreciate that the following description of the SHA-256 is provided for the purpose of more clearly presenting inventive concepts of the present application and is not intended to be in any way limiting. The SHA-256 mentioned herein includes any known version of SHA-256 and variations and modifications thereof.

    [0033] As shown in FIG. 1, the hash engine 10 may comprise an input module 101, an operation module 102, and a clock module 103. The input module 101 is configured to receive a data block. The operation module 102 can perform a SHA-256 hash operation on the received data block. The clock module 103 is configured to provide a required clock signal for the operation module 102.

    [0034] As shown in FIG. 1, the operation module 102 may comprise a plurality of operation stages, a 1.sup.st stage . . . i.sup.th stage . . . N.sup.th stage, arranged in a pipeline structure. N can be 32, 64, 128, etc. Each operation stage can comprise registers A to H and their corresponding operational logic, registers W.sub.0 to W.sub.15 and their corresponding combinational logic, and a memory for storing a constant K. The registers W.sub.0 to W.sub.15 are commonly referred to as extension registers because they are configured to extend the input data block. The registers A to H are commonly referred to as compression registers because they are configured to compress extended data into a hash value.

    [0035] As shown in FIG. 1, in the registers W.sub.0 to W.sub.15, outputs of the registers W.sub.1 to W.sub.15 of each stage are provided as inputs to registers W.sub.0 to W.sub.14 at a next stage, while outputs of the registers W.sub.0, W.sub.1, W.sub.9, W.sub.14 are provided as inputs to a combinational logic, an output of which is provided as an input to a register W.sub.15 of the next stage. That is, the registers W.sub.0, W.sub.1, W.sub.9, W.sub.14 and W.sub.15 of each stage are related to a combinational logical operation of a previous or current stage, and the remaining registers are not related to the combinational logical operation of the previous or current stage.
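The register movement described in paragraph [0035] can be sketched in software as follows. This is only an illustrative model, assuming the standard SHA-256 message-schedule functions for the combinational logic; the function names (`rotr`, `sigma0`, `sigma1`, `next_stage_w`) are hypothetical and do not appear in the disclosure.

```python
MASK = 0xFFFFFFFF  # SHA-256 operates on 32-bit words


def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def sigma0(x):
    """Standard SHA-256 small sigma-0 function (assumed combinational logic)."""
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def sigma1(x):
    """Standard SHA-256 small sigma-1 function (assumed combinational logic)."""
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def next_stage_w(w):
    """Model one pipeline step of the extension registers.

    w holds W0..W15 of the current stage. W1..W15 shift into W0..W14 of the
    next stage, and W0, W1, W9, W14 feed the combinational logic whose output
    becomes W15 of the next stage.
    """
    assert len(w) == 16
    new_w15 = (sigma1(w[14]) + w[9] + sigma0(w[1]) + w[0]) & MASK
    return w[1:] + [new_w15]
```

In this model only `w[0]`, `w[1]`, `w[9]` and `w[14]` are read by the logic and only the new `W15` is written by it; every other register is a plain shift, which is the distinction the clock tree scheme later relies on.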

    [0036] The clock module 103 can provide a clock signal to the operation module 102, and specifically, to each register in the operation module 102. Typically, the clock signal output by the clock module 103 is derived from a single clock source. However, a chip such as a SHA-256 chip contains a large number of sequential devices such as registers. If the sequential devices are driven directly by a single clock source signal, the load driving capacity becomes a problem, and the excessively long wiring from the clock source to the clock ends of the registers results in excessive latency. Therefore, a clock tree architecture is usually employed to provide the clock signal, i.e., buffers or inverters are inserted between the clock source and the sequential devices to form a clock distribution network. In a pipeline architecture, there are two clock tree structures, namely, a forward clock tree and a reverse clock tree.

    [0037] FIG. 2A is a diagram illustrating driving registers by a forward clock tree in a pipeline architecture. As shown, pipeline operation stages 202-1 . . . 202-N are driven by a clock tree composed of a clock source 200 and multi-stage clock driving circuits 201-1 . . . 201-N. Since a delivery direction (from left to right) of the clock signal is consistent with a data delivery direction (from left to right) of the pipeline, this clock tree is called the forward clock tree.

    [0038] FIG. 2B is a diagram illustrating driving registers by a reverse clock tree in a pipeline architecture. As shown, pipeline operation stages 202-N . . . 202-1 are driven by a clock tree composed of a clock source 200 and multi-stage clock driving circuits 201-1 . . . 201-N. Since a delivery direction (from right to left) of the clock signal is opposite to a data delivery direction (from left to right) of the pipeline, this clock tree is called the reverse clock tree.

    [0039] Regardless of which clock tree structure is employed, requirements for setup time and hold time of the register should be met. FIGS. 3A and 3B are diagrams illustrating setup time and hold time of a register. The setup time T.sub.setup refers to a time during which data must remain stable before a clock edge arrives. If the setup time does not meet the requirement, the data cannot be stably fed into the register at this clock edge. The hold time T.sub.hold refers to a time during which the data must remain stable after the clock edge arrives. If the hold time does not meet the requirement, the data likewise cannot be stably fed into the register.

    [0040] Here, this will be described in detail through a common circuit in digital circuit design. As shown in FIG. 3A, the circuit comprises flip-flops 301 and 303 and combinational logic 302. A data signal Q1 output by the flip-flop 301 is transferred to an input of the flip-flop 303 via the combinational logic 302, and a clock signal CLK controls the flip-flop 303 to capture the data signal. In order to make the data signal be properly captured by the flip-flop 303, the data signal should reach the input of the flip-flop 303 at a time of at least T.sub.setup before the clock edge and hold for at least T.sub.hold after the clock edge.

    [0041] On the basis that T.sub.setup and T.sub.hold are met, a transmission latency range of an intermediate combinational logic circuit can be determined. Assume that a clock cycle is T.sub.clk, an output latency of the flip-flop is T.sub.co, and the latency of the combinational logic is T.sub.comb.

    [0042] For T.sub.setup, it must meet:


    T.sub.clk−T.sub.co−T.sub.comb>T.sub.setup  (Equation 1)

    [0043] Considering a worst case, i.e., greatest output latency of the flip-flop and greatest latency of the combinational logic circuit, the above Equation 1 becomes:


    T.sub.clk−T.sub.co-max−T.sub.comb-max>T.sub.setup  (Equation 2)

    [0044] For T.sub.hold, it must meet:


    T.sub.co+T.sub.comb>T.sub.hold  (Equation 3)

    [0045] Considering a worst case, i.e., least output latency of the flip-flop and least latency of the combinational logic circuit, the above Equation 3 becomes:


    T.sub.co-min+T.sub.comb-min>T.sub.hold  (Equation 4)
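As a worked illustration of Equations 2 and 4, the sketch below computes the latency window that the combinational logic must fall within when both flip-flops see the same clock edge. The function name and the numeric values (in picoseconds) are assumptions chosen for illustration only, not figures from the disclosure.

```python
def comb_delay_window(t_clk, t_co_min, t_co_max, t_setup, t_hold):
    """Return (lower, upper) bounds on the combinational latency.

    lower: T_comb-min must exceed this value (rearranged Equation 4).
    upper: T_comb-max must stay below this value (rearranged Equation 2).
    """
    lower = t_hold - t_co_min           # from T_co-min + T_comb-min > T_hold
    upper = t_clk - t_co_max - t_setup  # from T_clk - T_co-max - T_comb-max > T_setup
    return lower, upper


# Hypothetical timing values in picoseconds.
window = comb_delay_window(t_clk=1000, t_co_min=40, t_co_max=60,
                           t_setup=30, t_hold=50)
```

With these assumed numbers the fastest path through the logic must take more than 10 ps and the slowest path less than 910 ps.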

    [0046] In conjunction with the forward clock tree and the reverse clock tree of FIGS. 2A and 2B, assuming that the latency of the clock driving circuit of each stage is T.sub.clklatency, the above Equations 2 and 4 respectively become the following Equations.

    [0047] For the forward clock tree:

    [0048] considering T.sub.setup:


    T.sub.clk+T.sub.clklatency−T.sub.co-max−T.sub.comb-max>T.sub.setup  (Equation 5)

    [0049] that is,


    T.sub.clk>T.sub.setup+T.sub.co-max+T.sub.comb-max−T.sub.clklatency  (Equation 6)

    [0050] Considering T.sub.hold:


    T.sub.co-min+T.sub.comb-min>T.sub.hold+T.sub.clklatency  (Equation 7)

    that is,


    T.sub.co-min+T.sub.comb-min−T.sub.clklatency>T.sub.hold  (Equation 8)

    [0051] For the reverse clock tree:

    [0052] considering T.sub.setup:


    T.sub.clk−T.sub.clklatency−T.sub.co-max−T.sub.comb-max>T.sub.setup  (Equation 9)

    that is,


    T.sub.clk>T.sub.setup+T.sub.co-max+T.sub.comb-max+T.sub.clklatency  (Equation 10)

    [0053] Considering T.sub.hold:


    T.sub.co-min+T.sub.comb-min>T.sub.hold−T.sub.clklatency  (Equation 11)

    that is,


    T.sub.co-min+T.sub.comb-min+T.sub.clklatency>T.sub.hold  (Equation 12)

    [0054] Comparing Equations 6 and 10, it can be seen that T.sub.clk of the forward clock tree can be smaller, i.e., the clock period can be shorter, and accordingly the frequency of the chip can be higher, so as to achieve higher performance. By contrast, T.sub.clk of the reverse clock tree needs to be greater, i.e., the clock period needs to be longer, so the frequency of the chip becomes lower and its performance is degraded.

    [0055] However, comparing Equations 8 and 12, it can be seen that the hold time of the flip-flop is less easily met when the forward clock tree is employed, and is more easily met when the reverse clock tree is employed. In particular, if the latency of the combinational logic between two flip-flops is very small, or there is no combinational logic at all, i.e., T.sub.comb-min is 0, the hold time requirement is difficult to meet with the forward clock tree.
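The trade-off between the two clock trees can be checked numerically. The sketch below encodes Equations 6 and 10 in one function and Equations 8 and 12 in another, using a signed `skew` argument (an assumed convention: `+T_clklatency` for the forward clock tree, `-T_clklatency` for the reverse clock tree). All names and picosecond values are hypothetical.

```python
def min_clock_period(t_setup, t_co_max, t_comb_max, skew):
    """Lower bound on T_clk: Equation 6 (skew > 0) or Equation 10 (skew < 0)."""
    return t_setup + t_co_max + t_comb_max - skew


def hold_slack(t_hold, t_co_min, t_comb_min, skew):
    """Hold margin: Equation 8 (skew > 0) or Equation 12 (skew < 0).

    A positive result means the hold time requirement is met.
    """
    return t_co_min + t_comb_min - skew - t_hold


# Hypothetical timing values in picoseconds.
T_SETUP, T_HOLD = 30, 50
T_CO_MIN, T_CO_MAX = 40, 60
T_COMB_MAX = 400
T_CLKLATENCY = 50

forward_period = min_clock_period(T_SETUP, T_CO_MAX, T_COMB_MAX, +T_CLKLATENCY)
reverse_period = min_clock_period(T_SETUP, T_CO_MAX, T_COMB_MAX, -T_CLKLATENCY)

# Register-to-register path with no logic in between (T_comb-min = 0).
forward_hold = hold_slack(T_HOLD, T_CO_MIN, 0, +T_CLKLATENCY)
reverse_hold = hold_slack(T_HOLD, T_CO_MIN, 0, -T_CLKLATENCY)
```

Under these assumed values the forward tree permits a 440 ps period against 540 ps for the reverse tree, but a direct register-to-register path has a hold slack of -60 ps on the forward tree and +40 ps on the reverse tree.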

    [0056] A synchronous sequential circuit works normally only if the setup time and the hold time of each flip-flop are both met. The hold time is the more critical of the two: a setup violation can be relieved by lengthening the clock period, whereas the hold conditions above (Equations 8 and 12) do not depend on T.sub.clk, so if the hold time is not met, the chip cannot work normally. Therefore, in the prior art, the reverse clock tree is typically employed to ensure that the requirement for the hold time T.sub.hold is met. This, however, sacrifices the frequency of the chip, resulting in degraded performance.

    [0057] The present disclosure provides a novel clock tree solution that can increase the running frequency of the chip while meeting the requirement for the hold time T.sub.hold, thereby improving the chip performance.

    [0058] FIG. 4 is a schematic diagram illustrating a clock tree structure according to an embodiment of the present disclosure. As shown in FIG. 4, the clock tree may comprise a clock source 400, multi-stage clock driving circuits 401.sub.1 . . . 401.sub.M . . . , a first set of clock buffer circuits 402.sub.1 . . . 402.sub.M . . . , and a second set of clock buffer circuits 406.sub.1 . . . 406.sub.M . . . . An i.sup.th stage clock driving circuit is used for providing a clock for an i.sup.th operation stage of a pipeline. Here, i and M are less than a total stage number N of the pipeline.

    [0059] Here, the i.sup.th operation stage is taken as an example for explanation. As shown in FIG. 4, the i.sup.th operation stage of an operation module comprises a first-class register 403.sub.i, a second-class register 404.sub.i, and a third-class register 407.sub.i. An output end of the first-class register 403.sub.i is connected to an input end of a combinational logic 405.sub.i of the i.sup.th operation stage in addition to an input end of a corresponding register of an (i+1).sup.th operation stage, that is, the output of the first-class register 403.sub.i needs to participate in a combinational logic operation. An input end of the second-class register 404.sub.i is connected to an output end of a combinational logic 405.sub.i−1 of an (i−1).sup.th operation stage, that is, the input of the second-class register 404.sub.i receives the output from the combinational logic of the (i−1).sup.th operation stage. Both the first-class register 403.sub.i and the second-class register 404.sub.i are associated with the combinational logical operation. However, the third-class register 407.sub.i receives output from a corresponding register of the (i−1).sup.th operation stage and provides its own output to the corresponding register of the (i+1).sup.th operation stage, that is, the third-class register 407.sub.i is independent of the combinational logical operation of the (i−1).sup.th operation stage or the i.sup.th operation stage.

    [0060] It should be noted that for simplicity of description, only one register is shown here for each class of registers. It will be appreciated by those skilled in the art that the number of registers of each class is not limited to one, but can be any number according to an actual circuit structure. Taking the SHA-256 circuit shown in FIG. 1 as an example, the first-class register 403.sub.i can include W.sub.0, W.sub.1, W.sub.9, and W.sub.14, the second-class register 404.sub.i can include W.sub.15, and the third-class register 407.sub.i can include W.sub.2 to W.sub.8 and W.sub.10 to W.sub.13. It should be noted that such classification of the registers of the SHA-256 circuit is merely an example, and those skilled in the art can make classification in different ways according to an actual situation, as will be described below.
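The example classification in paragraph [0060] can be written down as a small lookup. This is a sketch of that one example only; the names `COMB_INPUTS`, `COMB_OUTPUTS`, `classify` and `clock_tree_for` are hypothetical, and other classifications are possible as the text notes.

```python
# Indices of W registers whose outputs feed the combinational logic
# (first class) and whose inputs come from it (second class), per FIG. 1.
COMB_INPUTS = {0, 1, 9, 14}
COMB_OUTPUTS = {15}


def classify(k):
    """Return the class of extension register W_k in the example of [0060]."""
    if not 0 <= k <= 15:
        raise ValueError("SHA-256 extension registers are W0..W15")
    if k in COMB_INPUTS:
        return "first"
    if k in COMB_OUTPUTS:
        return "second"
    return "third"


def clock_tree_for(k):
    """First- and second-class registers use the forward clock tree,
    third-class registers use the reverse clock tree."""
    return "reverse" if classify(k) == "third" else "forward"


third_class = [k for k in range(16) if classify(k) == "third"]
```

Here `third_class` collects W2 to W8 and W10 to W13, the registers that only pass data from one stage to the next.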

    [0061] As shown in FIG. 4, according to the embodiment of the present disclosure, for the first-class register 403.sub.i and the second-class register 404.sub.i related to the combinational logical operation, the forward clock tree structure is employed, i.e., clock ends of the first-class register 403.sub.i and the second-class register 404.sub.i of the i.sup.th operation stage are coupled to an output end of the clock buffer circuit 402.sub.i, and an input end of the clock buffer circuit 402.sub.i is coupled to an output end of the clock driving circuit 401.sub.i.

    [0062] For the third-class register 407.sub.i, which is independent of the combinational logical operation of the (i−1).sup.th or i.sup.th operation stage, the reverse clock tree structure is employed, i.e., a clock end of the third-class register 407.sub.i of the i.sup.th operation stage is coupled to an output end of the clock buffer circuit 406.sub.i, and an input end of the clock buffer circuit 406.sub.i is coupled to an output end of a corresponding clock buffer circuit 406.sub.i+1 of the (i+1).sup.th operation stage. The output end of the corresponding clock buffer circuit 406.sub.i+1 is also coupled to a clock end of a corresponding register 407.sub.i+1 of the (i+1).sup.th operation stage. The corresponding register 407.sub.i+1 refers to the register 407.sub.i+1 of the (i+1).sup.th operation stage, to which an output end of the register 407.sub.i of the i.sup.th operation stage is connected. Taking the SHA-256 as an example, an output end of the register W.sub.5 of the i.sup.th operation stage is connected to a register W.sub.4 of the (i+1).sup.th operation stage, so that the clock end of the register W.sub.5 of the i.sup.th operation stage is coupled to the output end of its corresponding clock buffer circuit 406.sub.i, and the input end of the clock buffer circuit 406.sub.i is coupled to the output end of the clock buffer circuit 406.sub.i+1 of the (i+1).sup.th operation stage used for providing the clock signal to the register W.sub.4.

    [0063] That is, for the third-class register, an input end of a clock buffer circuit providing the clock signal to a register W.sub.k of the i.sup.th operation stage is coupled to an output end of a clock buffer circuit of the (i+1).sup.th operation stage used for providing the clock signal to a register W.sub.k−1, and so on, until an M.sup.th operation stage, at which an input end of a clock buffer circuit 406.sub.M providing the clock signal to a third-class register 407.sub.M is coupled to an output end of a clock buffer circuit 402.sub.M of the M.sup.th operation stage used for providing the clock signal to a first-class register 403.sub.M and a second-class register 404.sub.M. Taking the SHA-256 circuit shown in FIG. 1 as an example, an input end of a clock buffer circuit providing the clock signal to a register W.sub.2 of the (M−1).sup.th operation stage should be coupled to an output end of a clock buffer circuit of the M.sup.th operation stage used for providing the clock signal to a register W.sub.1, and W.sub.1 belongs to the first-class register, i.e., an input end of the clock buffer circuit of the M.sup.th operation stage used for providing the clock signal to the register W.sub.1 is coupled to an output end of the clock driving circuit 401.sub.M, so that at the M.sup.th operation stage, output of the clock buffer circuit 402.sub.M providing the clock signal to the register W.sub.1 is, after passing through the clock buffer circuit 406.sub.M again, input to the clock buffer circuit of the (M−1).sup.th operation stage used for providing the clock signal to the register W.sub.2.

    [0064] According to the above Equations 8 and 12, since the first-class register 403.sub.i and the second-class register 404.sub.i participate in the combinational logical operation, T.sub.comb-min is not 0 and often exceeds the latency of the clock signal, so that T.sub.hold can be met even if the forward clock tree is employed. Meanwhile, the third-class register 407.sub.i, which does not participate in the combinational logical operation, can also meet T.sub.hold because it employs the reverse clock tree. At the same time, the clock module overall employs the forward clock tree structure, so that the running frequency of the chip can be improved, and thus the chip performance is improved.
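
    The hold-time argument above can be illustrated with a minimal Python sketch. It uses the standard textbook single-cycle timing inequalities, which are assumed here to have the same form as Equations 8, 9 and 12 (those equations are not reproduced in this sketch). The function names and all delay values are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: textbook timing checks for one register-to-register
# path. All names and delay values (in picoseconds) are hypothetical.

def hold_ok(t_cq, t_comb_min, t_hold, skew):
    """Hold check: new data must not reach the capturing register too early.

    skew = (clock arrival at capturing register)
         - (clock arrival at launching register).
    """
    return t_cq + t_comb_min >= t_hold + skew


def setup_ok(t_period, t_cq, t_comb_max, t_setup, skew):
    """Setup check: data must arrive before the next capturing clock edge."""
    return t_cq + t_comb_max + t_setup <= t_period + skew


T_CQ, T_HOLD, T_SETUP = 30, 40, 35  # assumed register parameters
T_CLK_LATENCY = 50                  # assumed delay of one clock stage

# Forward clock tree: the capturing stage sees the clock later (skew > 0).
FORWARD_SKEW = +T_CLK_LATENCY
# Reverse clock tree: the capturing stage sees the clock earlier (skew < 0).
REVERSE_SKEW = -T_CLK_LATENCY

# First/second-class path: combinational logic sits between the registers.
print(hold_ok(T_CQ, 120, T_HOLD, FORWARD_SKEW))  # True
# Third-class path (direct shift, T_comb_min = 0) on the forward tree.
print(hold_ok(T_CQ, 0, T_HOLD, FORWARD_SKEW))    # False
# The same third-class path on the reverse tree.
print(hold_ok(T_CQ, 0, T_HOLD, REVERSE_SKEW))    # True
```

    Under these assumed numbers, the positive skew of the forward tree also relaxes the setup check, e.g. `setup_ok(400, T_CQ, 380, T_SETUP, FORWARD_SKEW)` holds while the same path with zero skew does not, which is consistent with the statement that the forward clock tree permits a higher running frequency.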

    [0065] An application example of the above inventive concept of the present disclosure will be described below in conjunction with the circuit structure of the SHA-256.

    [0066] FIG. 5 is a diagram illustrating a hash engine employing the clock tree structure of FIG. 4 according to an embodiment of the present disclosure. A solid arrow in FIG. 5 indicates a delivery direction of the clock signal, and a dotted arrow indicates a delivery direction of the data. Note that unnecessary illustrations and descriptions are omitted here to avoid obscuring the subject matter. For example, the hash engine of FIG. 5 omits compression registers A to H, and only extension registers W.sub.0 to W.sub.15 are shown. Further, for simplicity, FIG. 5 shows only data delivery and clock delivery of some of registers in each operation stage, and data delivery and clock delivery of other registers are omitted. Data delivery and clock delivery of registers in operation stages are readily contemplated by those skilled in the art in light of the teachings of the present disclosure.

    [0067] As shown in FIG. 5, the hash engine may comprise a plurality of operation stages, each of which comprises a plurality of registers W.sub.0 to W.sub.15 and is driven by a corresponding clock driving circuit 501. According to the embodiment shown in FIG. 5, the hash engine overall employs the forward clock tree structure, and locally employs the reverse clock tree structure. As described above, the clocks of the registers W.sub.0, W.sub.1, W.sub.9, W.sub.14 and W.sub.15 of each operation stage are coupled to a master clock tree, and the clock of each remaining register W.sub.k (W.sub.2 to W.sub.8 and W.sub.10 to W.sub.13) is passed from the clock of the register W.sub.k−1 of the next operation stage. FIG. 5 omits clock buffer circuits for ease of explanation, and a delivery path of a clock signal of a register is only indicated by the solid arrow. It will be appreciated by those skilled in the art in light of the teachings of the present disclosure that a clock end of each register is coupled to an output end of a corresponding clock buffer circuit.
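
    For orientation, the following Python sketch shows why W.sub.0, W.sub.1, W.sub.9, W.sub.14 and W.sub.15 are the registers tied to combinational logic. It implements the standard SHA-256 message expansion as a 16-word sliding window. The mapping of window positions to the registers W.sub.0 to W.sub.15 is an assumption consistent with the register list above, and the function names are hypothetical.

```python
# Illustrative sketch: SHA-256 message expansion as a 16-word sliding window.
# Window index k is assumed to correspond to extension register W_k.
MASK = 0xFFFFFFFF


def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def next_stage(w):
    """Return the W0..W15 window held by the next operation stage."""
    assert len(w) == 16
    # Only the new W15 needs combinational logic; it reads W0, W1, W9, W14.
    new_w15 = (sigma1(w[14]) + w[9] + sigma0(w[1]) + w[0]) & MASK
    # Every other word is a plain shift: W_k of this stage -> W_(k-1) of next.
    return w[1:] + [new_w15]


def schedule_reference(block_words):
    """Direct array form of the schedule: W_t for t = 0..63."""
    w = list(block_words)
    for t in range(16, 64):
        w.append((sigma1(w[t - 2]) + w[t - 7]
                  + sigma0(w[t - 15]) + w[t - 16]) & MASK)
    return w


window = list(range(16))          # arbitrary example input words
window = next_stage(window)
print(hex(window[15]))            # first expanded word of the example block
```

    In this form, the words at positions 2 to 8 and 10 to 13 are only shifted from stage to stage, which matches the registers that the paragraph above places on the reverse clock tree.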

    [0068] The register W.sub.9 of each operation stage participates in the combinational logical operation, and thus receives the clock signal from the forward clock tree. At the same time, the data of the register W.sub.9 also needs to be transferred to a register W.sub.8 of a next operation stage, thus the clock signal of the register W.sub.8 of the next operation stage needs to be transferred to the register W.sub.9 of the current operation stage to meet the requirement for the reverse clock tree. The register W.sub.14 is similar.

    [0069] To this end, in the embodiment of FIG. 5, each operation stage comprises a seventeenth register W.sub.9_t and an eighteenth register W.sub.14_t in addition to first to sixteenth registers W.sub.0 to W.sub.15.

    [0070] The seventeenth register W.sub.9_t has its input end coupled to an output end of the tenth register W.sub.9, its output end coupled to an input end of a ninth register W.sub.8 of the next operation stage, and its clock signal end coupled to an output end of a clock buffer circuit for the seventeenth register. An input end of the clock buffer circuit for the seventeenth register W.sub.9_t is coupled to an output end of a clock buffer circuit for a ninth register W.sub.8 of the next operation stage. That is, the clock of the seventeenth register W.sub.9_t is transferred from the clock of the W.sub.8 of the next operation stage.

    [0071] The eighteenth register W.sub.14_t has its input end coupled to an output end of the fifteenth register W.sub.14, its output end coupled to an input end of a fourteenth register W.sub.13 of the next operation stage, and its clock signal end coupled to an output end of a clock buffer circuit for the eighteenth register. An input end of the clock buffer circuit for the eighteenth register W.sub.14_t is coupled to an output end of a clock buffer circuit for a fourteenth register W.sub.13 of the next operation stage. That is, the clock of the register W.sub.14_t is transferred from the clock of the register W.sub.13 of the next operation stage.

    [0072] From the perspective of the overall pipeline, the clocks of the registers W.sub.0, W.sub.1, W.sub.9, W.sub.14, W.sub.15 of the i.sup.th operation stage are coupled to the master clock tree. The clock of the register W.sub.9_t of the i.sup.th operation stage is transferred from the clock of the register W.sub.8 of the (i+1).sup.th operation stage. The clock of the register W.sub.8 of the (i+1).sup.th operation stage is transferred from the clock of the register W.sub.7 of the (i+2).sup.th operation stage. And so on, the clock of the register W.sub.2 of the (i+7).sup.th operation stage is transferred from the clock of the register W.sub.1 of the (i+8).sup.th operation stage. The register W.sub.9 of the i.sup.th operation stage transfers the clock to the register W.sub.10 of the (i−1).sup.th operation stage. And so on, the register W.sub.13 of the (i−4).sup.th operation stage transfers the clock to the register W.sub.14_t of the (i−5).sup.th operation stage.
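
    The clock chains described above can be traced mechanically. The following Python sketch is a minimal model of the FIG. 5 rules as stated in the text (master-clocked registers, and every other register taking its clock from the register with the next lower index in the next operation stage). The function names and the string encoding of register names are hypothetical.

```python
# Illustrative sketch of the FIG. 5 clock sourcing rules.
# Registers on the forward (master) clock tree, per the description.
MASTER = {"W0", "W1", "W9", "W14", "W15"}


def clock_source(reg, stage):
    """Return (reg, stage) whose clock feeds `reg`, or None if master-clocked."""
    if reg in MASTER:
        return None
    # "W9_t" and "W14_t" take the index of W9 and W14 for this purpose.
    k = int(reg[1:].split("_")[0])
    return (f"W{k - 1}", stage + 1)


def trace(reg, stage):
    """Follow the reverse clock tree from `reg` back to the master tree."""
    path = [(reg, stage)]
    while True:
        src = clock_source(*path[-1])
        if src is None:
            return path
        path.append(src)


# Stage numbers are relative: 0 stands for the i-th operation stage.
for name, stage in trace("W9_t", 0):
    print(f"{name} @ stage i+{stage}")
```

    With stage 0 standing for the i.sup.th operation stage, the trace for W.sub.9_t ends at W.sub.1 of stage i+8, and `trace("W14_t", 0)` ends at W.sub.9 five stages later, in line with the two chains described above.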

    [0073] By adding the seventeenth register W.sub.9_t and the eighteenth register W.sub.14_t configured as above, both the forward clock tree and the reverse clock tree may be employed for the pipeline structure, so that the requirement for T.sub.hold of the register is met, while the running frequency of the chip is enhanced, and thus the chip performance is improved.

    [0074] FIG. 6 is a diagram illustrating another hash engine employing the clock tree structure of FIG. 4 according to an embodiment of the present disclosure. It should be noted that the same portions as FIG. 5 will not be repeated herein, and only portions different from FIG. 5 will be described.

    [0075] Since the reverse clock tree delays the clock by T.sub.clklatency at each stage in the reverse direction, T.sub.setup of a register may no longer be met according to Equation 9 after the clock has passed through a certain number of stages. To this end, as shown in FIG. 6, each operation stage can further comprise a nineteenth register W.sub.6_t, which is similar in circuit arrangement to the seventeenth register W.sub.9_t and the eighteenth register W.sub.14_t, in addition to the first to sixteenth registers W.sub.0 to W.sub.15 and the seventeenth register W.sub.9_t and the eighteenth register W.sub.14_t. That is, the nineteenth register W.sub.6_t of each operation stage has its input end coupled to an output end of the seventh register W.sub.6 of the current operation stage, its output end coupled to an input end of a sixth register W.sub.5 of the next operation stage, and its clock signal end coupled to an output end of a clock buffer circuit for the register W.sub.6_t. The clock buffer circuit for the register W.sub.6_t has its input end coupled to an output end of a clock buffer circuit for the sixth register W.sub.5 of the next operation stage. That is, a clock of the register W.sub.6_t is transferred from a clock of the register W.sub.5 of the next operation stage.

    [0076] According to the embodiment shown in FIG. 6, the hash engine overall employs the forward clock tree structure, and locally employs a reverse clock tree structure. In each operation stage, in addition to the clocks of the registers W.sub.0, W.sub.1, W.sub.9, W.sub.14, W.sub.15 being coupled to the master clock tree, a clock of W.sub.6 is also coupled to the master clock tree. However, a clock of the remaining register W.sub.k (W.sub.2 to W.sub.5, W.sub.7 to W.sub.8 and W.sub.10 to W.sub.13) is transferred from a clock of W.sub.k−1 of the next operation stage.

    [0077] From the perspective of the overall pipeline, the clocks of the registers W.sub.0, W.sub.1, W.sub.6, W.sub.9, W.sub.14, W.sub.15 of the i.sup.th operation stage are coupled to the master clock tree. A clock of the register W.sub.6_t is transferred from a clock of a register W.sub.5 of the (i+1).sup.th operation stage. The clock of the register W.sub.5 of the (i+1).sup.th operation stage is transferred from a clock of a register W.sub.4 of the (i+2).sup.th operation stage. And so on, a clock of a register W.sub.2 of the (i+4).sup.th operation stage is transferred from a clock of a register W.sub.1 of the (i+5).sup.th operation stage. The register W.sub.6 of the i.sup.th operation stage transfers the clock to a register W.sub.7 of the (i−1).sup.th operation stage. And so on, a register W.sub.8 of the (i−2).sup.th operation stage transfers the clock to a register W.sub.9_t of the (i−3).sup.th operation stage. A register W.sub.9 of the (i−3).sup.th operation stage transfers the clock to a register W.sub.10 of the (i−4).sup.th operation stage, and so on.

    [0078] In the embodiment shown in FIG. 6, since the nineteenth register W.sub.6_t is added, the clock path from W.sub.1 to W.sub.9_t is divided into two parts, and the clock path of each part is shortened relative to the whole reverse clock path, so that the requirement for T.sub.setup of the register can be met.
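
    The shortening of the reverse clock path can be checked by counting hops. The following Python sketch counts how many reverse-tree hops separate a register from the nearest master-clocked register, for the FIG. 5 and FIG. 6 arrangements as described in the text. It assumes one hop per operation stage; the names `reverse_hops`, `FIG5_MASTER` and `FIG6_MASTER` are hypothetical.

```python
# Illustrative sketch: length of the reverse clock path in FIG. 5 vs FIG. 6.
FIG5_MASTER = {"W0", "W1", "W9", "W14", "W15"}
FIG6_MASTER = FIG5_MASTER | {"W6"}  # FIG. 6 also ties W6 to the master tree


def reverse_hops(reg, master):
    """Count reverse-tree hops from `reg` back to a master-clocked register."""
    k = int(reg[1:].split("_")[0])  # "W9_t" -> 9
    name, hops = reg, 0
    while name not in master:
        k -= 1                      # clock comes from W_(k-1) of the next stage
        name = f"W{k}"
        hops += 1
    return hops


print(reverse_hops("W9_t", FIG5_MASTER))  # 8
print(reverse_hops("W6_t", FIG6_MASTER))  # 5
print(reverse_hops("W9_t", FIG6_MASTER))  # 3
```

    If each hop is assumed to add roughly one T.sub.clklatency of reverse delay, the worst case drops from eight hops to five, which is the effect the added register W.sub.6_t is described as providing.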

    [0079] It should be understood by those skilled in the art that the specific insertion position of the nineteenth register is not limited to the position shown in FIG. 6, but can be in another position as long as the requirement for T.sub.setup of the register can be met. The insertion position of the added register is typically selected to be at a register in the middle of the clock path from W.sub.1 to W.sub.9_t. For example, insertion of W.sub.5_t between W.sub.5 and W.sub.6 can also be performed. The circuit arrangement when W.sub.5_t is inserted is similar to that when W.sub.6_t is inserted as described above, which will not be repeated herein.

    [0080] FIG. 7 is a schematic diagram illustrating a clock tree structure according to another embodiment of the present disclosure. The forward clock tree of FIG. 7 is the same as that shown in FIG. 4 and therefore will not be repeated herein. Unlike the portion of the reverse clock tree shown in FIG. 4, in the embodiment of FIG. 7, a clock signal of the third-class register 407.sub.i of the i.sup.th operation stage is transferred from a clock signal of a corresponding register 407.sub.i+2 of the (i+2).sup.th operation stage, and the clock signal provided to the third-class register 407.sub.i of the i.sup.th operation stage is also provided to a corresponding third-class register 407.sub.i+1 of the (i+1).sup.th operation stage.

    [0081] That is, a clock end of the third-class register 407.sub.i of the i.sup.th operation stage is coupled to an output end of a clock buffer circuit 406.sub.i, while an input end of the clock buffer circuit 406.sub.i is coupled to an output end of a corresponding clock buffer circuit 406.sub.i+2 of the (i+2).sup.th operation stage. The output end of the corresponding clock buffer circuit 406.sub.i+2 is also coupled to a clock end of a corresponding register 407.sub.i+2 of the (i+2).sup.th operation stage. At the same time, the clock end of the corresponding register 407.sub.i+1 of the (i+1).sup.th operation stage is also coupled to the output end of the clock buffer circuit 406.sub.i.

    [0082] As described above with reference to FIG. 4, the corresponding registers 407.sub.i+1 of the (i+1).sup.th operation stage and the corresponding registers 407.sub.i+2 of the (i+2).sup.th operation stage refer to the register 407.sub.i+1 of the (i+1).sup.th operation stage and the register 407.sub.i+2 of the (i+2).sup.th operation stage, to which the output signal of the register 407.sub.i of the i.sup.th operation stage is transferred. Taking the SHA-256 as an example, output of the register W.sub.5 of the i.sup.th operation stage is transferred to a register W.sub.4 of the (i+1).sup.th operation stage, and output of the register W.sub.4 of the (i+1).sup.th operation stage is transferred to a register W.sub.3 of the (i+2).sup.th operation stage, so that a clock end of the register W.sub.5 of the i.sup.th operation stage is coupled to an output end of its corresponding clock buffer circuit 406.sub.i, and a clock end of the register W.sub.4 of the (i+1).sup.th operation stage is also coupled to the output end of the clock buffer circuit 406.sub.i, while an input end of the clock buffer circuit 406.sub.i is coupled to an output end of a clock buffer circuit 406.sub.i+2 of the (i+2).sup.th operation stage used for providing a clock signal to the register W.sub.3.

    [0083] That is, an input end of a clock buffer circuit providing the clock signal to a register W.sub.k of the i.sup.th operation stage is coupled to an output end of a clock buffer circuit of the (i+2).sup.th operation stage used for providing the clock signal to a register W.sub.k−2, and so on until the M.sup.th operation stage, at which an input end of a clock buffer circuit 406.sub.M providing the clock signal to a third-class register 407.sub.M is coupled to an output end of a clock buffer circuit 402.sub.M providing the clock signal to a first-class register 403.sub.M and a second-class register 404.sub.M of the M.sup.th operation stage.

    [0084] Taking the SHA-256 as an example, an input end of a clock buffer circuit providing the clock signal to a register W.sub.3 of the (M−2).sup.th operation stage should be coupled to an output end of a clock buffer circuit of the M.sup.th operation stage used for providing the clock signal to a register W.sub.1, and W.sub.1 belongs to the first-class register, that is, an input end of the clock buffer circuit 402.sub.M of the M.sup.th operation stage used for providing the clock signal to a register W.sub.1 is coupled to an output end of a clock driving circuit 401.sub.M, so that at the M.sup.th operation stage, output of the clock buffer circuit 402.sub.M providing the clock signal to the register W.sub.1, after passing through the clock buffer circuit 406.sub.M again, is input to the clock buffer circuit of the (M−2).sup.th operation stage providing the clock signal to the register W.sub.3. This will be described in detail with reference to FIG. 8.

    [0085] Likewise, the clock module in the embodiment overall employs the forward clock tree structure, so that the running frequency of the chip can be improved, and thus the chip performance can be improved. At the same time, since the third-class register which does not participate in the combinational logical operation employs the reverse clock tree, the requirement for T.sub.hold can be met.

    [0086] An application example of the clock tree structure of FIG. 7 is described below in conjunction with the circuit structure of SHA-256.

    [0087] FIG. 8 is a diagram illustrating a hash engine employing the clock tree structure of FIG. 7 according to an embodiment of the present disclosure. Likewise, a solid arrow in FIG. 8 indicates a delivery direction of the clock signal, and a dotted arrow indicates a delivery direction of the data. Further, for simplicity, FIG. 8 shows only data transfer and clock transfer of some of registers in each operation stage, and data transfer and clock transfer of other registers are omitted. According to the disclosure of FIG. 8, data transfer and clock transfer of each register in each operation stage will be readily contemplated by those skilled in the art. As for the same portions as FIGS. 5 and 6, description thereof will be omitted.

    [0088] As shown in FIG. 8, the hash engine comprises a plurality of operation stages, each of which comprises a plurality of registers W.sub.0 to W.sub.15. Further, similar to the embodiment of FIG. 5, each operation stage further comprises a seventeenth register W.sub.9_t and an eighteenth register W.sub.14_t. These registers are driven by the corresponding clock driving circuit 501. The configuration of the eighteenth register W.sub.14_t is the same as that of FIG. 5. The configuration of the seventeenth register W.sub.9_t is different from that of FIG. 5.

    [0089] According to the embodiment shown in FIG. 8, the hash engine overall employs a forward clock tree structure, and locally employs a reverse clock tree structure. As mentioned above, the clocks of the registers W.sub.0, W.sub.1, W.sub.9, W.sub.14 and W.sub.15 of each operation stage are coupled to the master clock tree, and clock transfer from the register W.sub.9 to the register W.sub.14_t is the same as that of FIG. 5. A clock of a k.sup.th register W.sub.k−1 (W.sub.3, W.sub.5 and W.sub.7) is transferred from a clock of a register W.sub.k−3 of the operation stage after next, and a clock of the register W.sub.k−2 (W.sub.2, W.sub.4 and W.sub.6) is the same as that of a register W.sub.k−1 of the previous operation stage, where k is an even number and 3≤k≤8. A clock of a register W.sub.8 is the same as that of a register W.sub.9_t of the previous operation stage.

    [0090] From the perspective of the overall pipeline, the register W.sub.9_t of the i.sup.th operation stage has its input end coupled to an output end of the register W.sub.9, its output end coupled to an input end of a ninth register W.sub.8 of the next operation stage, and its clock signal end coupled to an output end of a clock buffer circuit for the seventeenth register. An input end of the clock buffer circuit for the seventeenth register W.sub.9_t is coupled to an output end of a clock buffer circuit for an eighth register W.sub.7 of the (i+2).sup.th operation stage. That is, a clock signal of the register W.sub.9_t of the i.sup.th operation stage is transferred from a clock of a register W.sub.7 of the (i+2).sup.th operation stage. The clock of the register W.sub.7 of the (i+2).sup.th operation stage is transferred from a clock of a register W.sub.5 of the (i+4).sup.th operation stage. And so on, a clock of a register W.sub.3 of the (i+6).sup.th operation stage is transferred from a clock of a register W.sub.1 of the (i+8).sup.th operation stage.

    [0091] At the same time, a clock end of a register W.sub.8 of the (i+1).sup.th operation stage is also coupled to an output end of a clock buffer circuit for the seventeenth register W.sub.9_t of the i.sup.th operation stage. That is, a clock of the register W.sub.8 of the (i+1).sup.th operation stage is the same as the clock of the register W.sub.9_t of the i.sup.th operation stage. A clock of a register W.sub.6 of the (i+3).sup.th operation stage is the same as the clock of the register W.sub.7 of the (i+2).sup.th operation stage. And so on, a clock of a register W.sub.2 of the (i+7).sup.th operation stage is the same as the clock of the register W.sub.3 of the (i+6).sup.th operation stage.

    [0092] The embodiment of FIG. 8 can also meet the requirement for T.sub.setup of the register due to nearly half reduction in the number of stages of the reverse clock path from W.sub.1 to W.sub.9_t, while meeting the requirement for T.sub.hold of the register. Compared to the embodiment of FIG. 6, the embodiment of FIG. 8 does not need to insert one additional stage of register, so that the number of registers can be further reduced.
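
    The "nearly half" reduction can be reproduced with a small model. The following Python sketch encodes the FIG. 8 rules stated above for the registers W.sub.2 to W.sub.8 and W.sub.9_t only: odd-indexed registers are fed through a buffer from two stages ahead, and even-indexed registers share the buffer output of their neighbour in the previous stage. The function names, the tuple encoding and the restriction to this subset of registers are assumptions made for illustration.

```python
# Illustrative sketch of the FIG. 8 clocking rules (W2..W8 and W9_t only).
MASTER = {"W0", "W1", "W9", "W14", "W15"}
MODELED = {"W2", "W3", "W4", "W5", "W6", "W7", "W8", "W9_t"}


def fig8_clock(reg, stage):
    """Return (kind, source_reg, source_stage) for the clock of `reg`."""
    if reg in MASTER:
        return ("master", reg, stage)
    if reg not in MODELED:
        raise ValueError(f"{reg} is not covered by this sketch")
    k = int(reg[1:].split("_")[0])
    if k % 2 == 1:
        # W3, W5, W7, W9_t: buffered from W_(k-2) of the stage after next.
        return ("buffered", f"W{k - 2}", stage + 2)
    # W2, W4, W6, W8: same clock as W_(k+1) (W9_t for W8) of the previous stage.
    partner = "W9_t" if k == 8 else f"W{k + 1}"
    return ("shared", partner, stage - 1)


def buffer_hops(reg, stage):
    """Count buffered hops back to the master tree; return (hops, root)."""
    hops = 0
    while True:
        kind, src, src_stage = fig8_clock(reg, stage)
        if kind == "master":
            return hops, (reg, stage)
        if kind == "buffered":
            hops += 1
        reg, stage = src, src_stage


print(buffer_hops("W9_t", 0))  # (4, ('W1', 8))
print(buffer_hops("W8", 1))    # (4, ('W1', 8))
```

    In this model the path from W.sub.1 to W.sub.9_t takes four buffered hops, compared with eight single-stage hops when every register is chained to the next stage as in FIG. 5.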

    [0093] In embodiments according to the present disclosure, the aforementioned registers can include edge triggered registers, such as rising edge triggered registers and/or falling edge triggered registers. The register can comprise a D flip-flop (DFF) and/or latch, wherein the latch can, for example, be a latch employing a pulse-type clock signal.

    [0094] According to an embodiment of the present disclosure, each stage clock driving circuit among the aforementioned multi-stage clock driving circuits can comprise an odd number of inverters. For example, each stage clock driving circuit can comprise one inverter.
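
    One consequence of an odd inverter count per stage is that the clock polarity alternates from stage to stage. The following Python sketch illustrates this under two assumptions that are not stated in the text: stages are numbered from 1, and the clock reaching stage n has passed through n driving circuits. The function name is hypothetical.

```python
# Illustrative sketch: clock polarity along the multi-stage driving circuits.

def clock_inverted_at(stage, inverters_per_stage=1):
    """True if the clock at 1-indexed `stage` is inverted relative to the source.

    Assumes the clock has passed through `stage` driving circuits, each
    containing `inverters_per_stage` inverters.
    """
    if stage < 1 or inverters_per_stage < 1:
        raise ValueError("stage and inverters_per_stage must be >= 1")
    return (stage * inverters_per_stage) % 2 == 1


# With an odd count per stage, polarity alternates stage by stage.
print([clock_inverted_at(s, 1) for s in range(1, 5)])
# With an even count per stage (shown only for contrast), it never inverts.
print([clock_inverted_at(s, 2) for s in range(1, 5)])
```

    Whether adjacent stages then use registers triggered on opposite edges, as the mention of rising edge and falling edge triggered registers might suggest, is a design choice that this sketch does not model.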

    [0095] According to an embodiment of the present disclosure, the clock buffer circuit for registers employing the forward clock tree comprises two clock buffers, while the clock buffer circuit for registers employing the reverse clock tree comprises one clock buffer.

    [0096] It will be appreciated by those skilled in the art that although the concepts of the present disclosure have been described above in conjunction with one circuit structure of the SHA-256, the circuit structure is not intended to constitute any limitation of the concepts of the present disclosure. The concepts of the present disclosure can be applied to any known version of SHA-256 and variations and modifications thereof. The concepts of the present disclosure can even be applied to any computing circuit having the pipeline structure and comprising the time sequential devices.

    [0097] According to embodiments of the present disclosure, the hash engine as described above can be implemented as a computing chip.

    [0098] Those skilled in the art will appreciate that the circuit and/or chip according to the present disclosure can be implemented by using a Hardware Description Language (HDL) such as Verilog or VHDL. The HDL description can be synthesized for a cell library designed for a given integrated circuit manufacturing technology and can be modified for timing, power, and other reasons to obtain a final design database, and the final design database can be transmitted to a factory for the production of an integrated circuit by a semiconductor manufacturing system. The semiconductor manufacturing system may produce the integrated circuit by depositing semiconductor material, e.g., on a wafer, which can include a mask, removing material, changing the shape of the deposited material, modifying the material (e.g., modifying a dielectric constant by doping the material or using ultraviolet processing), and so forth. The integrated circuit can include transistors and can also include other circuit elements (e.g., passive elements such as capacitors, resistors, inductors, etc.) and interconnections between the transistors and the circuit elements.

    [0099] According to embodiments of the present disclosure, the computing chip as described above can be comprised in a hash board. Specifically, the hash board can include one or more computing chips. Multiple computing chips can perform computing tasks in parallel.

    [0100] According to embodiments of the present disclosure, the hash board as described above can be comprised in a computing device, which is preferably used for performing cryptocurrency mining. For example, the computing device can be a Bitcoin mining machine. Specifically, the cryptocurrency mining machine can include one or more hash boards. Multiple hash boards can perform computing tasks in parallel, such as executing the SHA-256 algorithm.

    [0101] In all examples shown and discussed herein, any specific value should be construed as exemplary only and not as limiting. Thus, other examples of the exemplary embodiments can have different values.

    [0102] It will be further understood that the terms “comprise” and “include”, when used herein, specify the presence of stated features, entirety, steps, operations, units, and/or components, but do not preclude the presence or addition of one or more other features, entirety, steps, operations, units, components, and/or combinations thereof.

    [0103] While some specific embodiments of the present disclosure have been shown in detail by way of examples, it should be understood by those skilled in the art that the above examples are intended to be illustrative only and do not limit the scope of the present disclosure. It should be appreciated by those skilled in the art that the above embodiments can be modified without departing from the scope and essence of the present disclosure. The scope of the present disclosure is defined by the attached claims.