Dedicated SSR pipeline stage of router for express traversal (EXTRA) NoC

10554584 · 2020-02-04

Abstract

This invention is related to an Express Traversal (EXTRA) Network on Chip (NoC) comprising a number of EXTRA routers. The EXTRA NoC comprises a Buffer Write and Route Computation (BW/RC) pipeline, a Switch Allocation-Local (SA-L) pipeline, a Setup Request (SR) pipeline, a Switch Allocation-Global (SA-G) pipeline, and a Switch Traversal and Link Traversal (ST/LT) pipeline. The BW/RC pipeline is configured to write an incoming flit to an input buffer(s) of a start EXTRA router and compute the route for the incoming head flit by selecting an output port to depart from the start EXTRA router. The SA-L pipeline is configured to arbitrate the start EXTRA router to choose an input port and an output port for a winning flit. The SR pipeline is configured to handle the transmission of a number of SR signals from the start EXTRA router to downstream EXTRA routers.

Claims

1. An Express Traversal (EXTRA) Network on Chip (NoC) comprising a plurality of EXTRA routers, the EXTRA NoC comprising: a buffer write and route computation (BW/RC) pipeline configured to write an incoming flit to an input buffer of a start EXTRA router and compute a route for the incoming head flit by selecting an output port to depart from the start EXTRA router; a switch allocation-local (SA-L) pipeline configured to arbitrate the start EXTRA router to choose an input port and an output port for a winning flit; a setup request (SR) pipeline configured to handle transmission of a plurality of SR signals from the start EXTRA router to downstream EXTRA routers via SR wires; a switch allocation-global (SA-G) pipeline configured to: receive the SR signals from the start EXTRA router via the SR wires; and arbitrate, based on the SR signals received from the start EXTRA router, three signals including: a buffer write enable (BW.sub.ena) signal for a local buffered flit, a bypass mux (BM.sub.sel) signal for a first crossbar switch, and a crossbar select (XB.sub.sel) signal for a second crossbar switch of a selected output port, to build an express path for the winning flit to traverse multiple hops to a destination EXTRA router within one cycle of the start EXTRA router; and a switch traversal and link traversal (ST/LT) pipeline configured to: traverse the winning flit to the selected output port of the start EXTRA router, and transmit the winning flit to the destination EXTRA router bypassing, via the express path, at least one EXTRA router between the start EXTRA router and destination EXTRA router.

2. The EXTRA NoC according to claim 1 wherein the plurality of SR signals are generated by the SA-L pipeline.

3. The EXTRA NoC according to claim 1 wherein the plurality of SR signals are generated by the SR pipeline.

4. The EXTRA NoC according to claim 1 further comprising a plurality of registers inserted between any two adjacent pipelines of the BW/RC, SA-L, SR, SA-G, and ST/LT pipelines.

5. The EXTRA NoC according to claim 4, wherein the plurality of registers are clocked synchronously.

6. A method of traversing flits in an Express Traversal (EXTRA) Network on Chip (NoC) having a plurality of EXTRA routers, the method comprising: (A) in a buffer write and route computation (BW/RC) pipeline: writing an incoming flit to an input buffer(s) of a start EXTRA router, and computing a route for the incoming head flit by selecting an output port to depart from the start EXTRA router; (B) in a switch allocation-local (SA-L) pipeline: arbitrating the start EXTRA router to choose an input port and an output port for a winning flit; (C) in a setup request (SR) pipeline: handling transmission of a plurality of SR signals from the start EXTRA router to downstream EXTRA routers via SR wires; (D) in a switch allocation-global (SA-G) pipeline: receiving the SR signals from the start EXTRA router via the SR wires, and arbitrating, based on the SR signals received from the start EXTRA router, three signals including: a buffer write enable (BW.sub.ena) signal for a local buffered flit, a bypass mux (BM.sub.sel) signal for a first crossbar switch, and a crossbar select (XB.sub.sel) signal for a second crossbar switch of a selected output port, to build an express path for the winning flit to traverse multiple hops to a destination EXTRA router within one cycle of the start EXTRA router; and (E) in a switch traversal and link traversal (ST/LT) pipeline: traversing the winning flit to the selected output port of the start EXTRA router, transmitting the winning flit to a destination EXTRA router, and bypassing, via the express path, at least one EXTRA router between the start EXTRA router and destination EXTRA router.

7. The method according to claim 6 wherein the plurality of SR signals are generated by the SA-L pipeline.

8. The method according to claim 6 wherein the plurality of SR signals are generated by the SR pipeline.

9. The method according to claim 6 wherein a time period of the pipelines is regulated by a plurality of registers inserted between any two adjacent pipelines of the BW/RC, SA-L, SR, SA-G, and ST/LT pipelines.

10. The method according to claim 9, wherein the plurality of registers are clocked synchronously.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) The above advantages and features in accordance with this disclosure are described in the following detailed description and are shown in the following drawings:

(2) FIG. 1 illustrates a SMART router microarchitecture;

(3) FIG. 2 illustrates an example of a flit traversing through a SMART NoC;

(4) FIG. 3 illustrates the SSR wires connecting the SSR to each of the downstream SA-G;

(5) FIG. 4 illustrates a timing diagram of a SMART router pipeline;

(6) FIG. 5 illustrates a router architecture of the SMART router;

(7) FIG. 6 illustrates a timing diagram of an EXTRA router pipeline in accordance with an embodiment of this disclosure;

(8) FIG. 7 illustrates a router architecture of the EXTRA router in accordance with an embodiment of this disclosure;

(9) FIG. 8 illustrates a representative block diagram of the arrangement of the pipeline stages and registers of a SMART router; and

(10) FIG. 9 illustrates a representative block diagram of the arrangement of the pipeline stages and registers of the EXTRA router in accordance with an embodiment of this disclosure.

DETAILED DESCRIPTION

(11) This disclosure relates to an EXTRA NoC. Particularly, this disclosure relates to separating one of the pipeline stages in the SMART NoC to improve the clock frequency of an EXTRA router.

(12) The details of a SMART router can be found in the following reference: T. Krishna et al., "Breaking the On-Chip Latency Barrier Using SMART," in High-Performance Computer Architecture (HPCA), 2013. As this disclosure is a modification of the SMART router, certain details of the SMART router are omitted for brevity.

(13) To enable a higher clock frequency, it is proposed that the SA-G pipeline stage of the SMART router be separated into two pipeline stages. This raises the clock frequency of the EXTRA routers and, as a result, reduces the latency of flits and packets traversing the EXTRA NoC. Further details will now be described.

(14) FIG. 4 illustrates a timing diagram of a SMART router pipeline of the example shown in FIG. 2. FIG. 5 illustrates a router architecture of the SMART router. For simplicity, only two input ports, namely Core.sub.in 610 and West.sub.in 620 and two output ports North.sub.out 630 and East.sub.out 640 are shown.

(15) As mentioned above, there are four pipeline stages for a SMART router. In the example shown in FIG. 4, a winning flit in router R0 needs to traverse to router R3. In other words, the winning flit, chosen from among the buffered (local) flits 615a or 615b in router R0, needs to make three hops to reach router R3. Hence, during the third pipeline stage, SSR signals indicating a 3-hop path request are generated and transmitted to downstream routers R1-R3 so that, during the SA-G pipeline stage, the respective BW.sub.ena, BM.sub.sel, and XB.sub.sel signals are set to build an express path for the winning flit of R0 to traverse multiple hops to router R3 within one cycle.

(16) The example shown in FIG. 4 begins with the first pipeline stage (i.e. BW/RC pipeline), in which an incoming head flit is written to an input buffer(s) and its route is computed by choosing an output port to depart from the start router, based on the destination information in the incoming head flit. In this instance, the start router is the router R0. In the second pipeline stage (i.e. SA-L pipeline), router R0 arbitrates locally to choose input/output port winners. Particularly, router R0 chooses a winning flit from among its buffered (local) flits for each output port. In this instance, assuming the winning flit is selected from among the buffered (local) flits 615a, router R0 arbitrates locally to select Core.sub.in 610 as the input port and East.sub.out 640 as the output port.

(17) In the third pipeline stage (i.e. SA-G pipeline), the routers R0-R3 arbitrate among the SSR signals they received to set the BW.sub.ena, BM.sub.sel, and XB.sub.sel signals accordingly to build an express path for a winning flit in router R0 to traverse multiple hops within one cycle to router R3. Hence, R0 begins the third pipeline stage (i.e. SA-G pipeline) in the third cycle by generating SSR signals, via the SSR generator 710. It then transmits SSR signals, via the register 720, to the downstream routers R1, R2 and R3. In response to receiving the SSR signals from R0, the SA-G of R1 sets BM.sub.sel as bypass and XB.sub.sel as W.sub.in to E.sub.out, the SA-G of R2 sets BM.sub.sel as bypass and XB.sub.sel as W.sub.in to E.sub.out, and the SA-G of R3 sets BW.sub.ena as 1 to receive input and BM.sub.sel to 0 to stop bypass. During the third pipeline stage, instead of the winning flit traversing to the crossbar 670, the winning flit is being delayed by one cycle via the register 660.

(18) In the fourth cycle, routers R0-R3 proceed to the fourth pipeline stage (i.e. ST/LT pipeline) where the winning flit traverses the crossbar switch to the selected output port in router R0 and is subsequently transmitted to router R3 bypassing routers R1 and R2.

(19) As illustrated by the example in FIG. 4, during the third pipeline stage, SSR signals are generated and transmitted to downstream routers so that each downstream SA-G can set its router to the appropriate mode, either bypass mode or normal mode. Since SA-G takes place only after the SSR signals are received from the upstream routers, SSR and SA-G occur serially within the third pipeline stage. Thus, the third pipeline stage typically takes longer than the other three pipeline stages. Each pipeline stage takes a certain amount of time to complete. However, because a single clock is used for all stages, the clock period must be at least as long as the slowest pipeline stage. In this instance, the clock frequency is determined by the third pipeline stage. For example, assuming the first pipeline stage (i.e. BW/RC pipeline) takes 0.8 ns, the second pipeline stage (i.e. SA-L pipeline) takes 0.9 ns, the third pipeline stage (i.e. SSR and SA-G pipeline) takes 2 ns, and the fourth pipeline stage (i.e. ST/LT pipeline) takes 0.9 ns, the maximum clock frequency is set by the third pipeline stage and equates to 0.5 GHz (i.e. the inverse of 2 ns).
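
The clock derivation in this paragraph can be checked with a short calculation. The following is a minimal sketch in Python, not part of the disclosure: the function name `max_clock_ghz` and the dictionary layout are illustrative assumptions, and the stage delays are the example figures given above.

```python
def max_clock_ghz(stage_delays_ns):
    """Return (critical stage name, highest usable clock frequency in GHz).

    The clock period must be at least as long as the slowest stage,
    so the frequency is the inverse of the largest stage delay.
    """
    critical = max(stage_delays_ns, key=stage_delays_ns.get)
    return critical, 1.0 / stage_delays_ns[critical]


# Example SMART stage delays from the description, in ns.
smart_delays_ns = {"BW/RC": 0.8, "SA-L": 0.9, "SSR+SA-G": 2.0, "ST/LT": 0.9}

stage, f_ghz = max_clock_ghz(smart_delays_ns)
print(f"critical stage: {stage}, clock: {f_ghz} GHz")
```

With the example figures, the combined SSR and SA-G stage is the critical one and the result is 0.5 GHz, matching the value stated in the text.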

(20) Similar to the SMART NoC, the EXTRA NoC consists of a number of EXTRA routers for sending messages in packets (or portions of packets known as flits), where the flits can traverse multiple hops within one cycle when the three major control signals are set accordingly. In accordance with an embodiment of this disclosure, the EXTRA router consists of five pipeline stages: 1) BW/RC pipeline, 2) SA-L pipeline, 3) Setup Request (SR) pipeline, 4) SA-G pipeline, and 5) ST/LT pipeline. In the EXTRA router, SR and SA-G are separated into two pipeline stages. Hence, SR and SA-G are still performed serially, but in two separate pipeline stages. This increases the clock frequency and reduces the latency of flits traversing the EXTRA NoC, as will be shown with reference to FIGS. 6 and 7 below.

(21) FIG. 6 illustrates a timing diagram of the EXTRA router pipeline stages with four routers, namely R0, R1, R2 and R3. Similar to the example shown in FIG. 4, FIG. 6 illustrates an example in which a winning flit in R0 needs to make three hops to reach R3.

(22) The first, second and fifth pipeline stages in FIG. 6 are similar to the first, second and fourth pipeline stages in FIG. 4. The main difference between FIGS. 4 and 6 is that the third pipeline stage in FIG. 4 is separated into two pipeline stages. In the EXTRA router, five cycles are required to complete the five pipeline stages for the example shown in FIG. 6.

(23) For brevity, only the third and fourth pipeline stages are discussed, since the first, second and fifth pipeline stages remain the same. In the third pipeline stage (i.e. SR pipeline), an SR signal is generated for the winning flit determined in the SA-L pipeline stage. The SR signals are then transmitted to the downstream routers (i.e. from R0 to R1-R3) via the SR wires 711. Similar to the SSR wires 310, the SR wires are dedicated repeated wires that connect the EXTRA routers so that the upstream SR is communicatively connected to the SA-G of the downstream EXTRA routers. During the third pipeline stage, the SA-L winning flit is delayed by one cycle via the register 660.

(24) The time required to transmit the SR signal to downstream routers depends on the length of the SR wires 711. Thus, the time taken to transmit SR signals to downstream routers increases as the number of hops per cycle (HPC) increases, since longer SR wires 711 are required to connect the upstream router to the downstream routers. Hence, as an alternative, in order to shorten the time period of the third pipeline stage, the SR signals may be generated in the second pipeline stage (i.e. SA-L pipeline). In that case, the SR pipeline essentially handles only the transmission of the SR signals from the start router, R0, to the downstream routers, R1-R3. In other words, the SR signals may be generated either in the SA-L pipeline or in the SR pipeline.

(25) In the fourth pipeline stage (i.e. SA-G pipeline), the SA-G receives SR signals from upstream router R0 and proceeds to arbitrate BW.sub.ena, BM.sub.sel, and XB.sub.sel accordingly. In this instance, R1 sets BM.sub.sel as bypass and XB.sub.sel as W.sub.in to E.sub.out, R2 sets BM.sub.sel as bypass and XB.sub.sel as W.sub.in to E.sub.out, and R3 sets BW.sub.ena as 1 to receive input and BM.sub.sel to 0 to stop bypass. During the fourth stage, the winning flit determined in the SA-L pipeline stage is being delayed by another cycle via the register 650.
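
The control settings described in this paragraph can be sketched in Python for the simplest case of a single SR request with no competing requests. This is an illustration only: the names `Controls` and `sa_g` are hypothetical, arbitration between multiple SR signals is not modelled, and the value `bw_ena = 0` at the bypassed routers is an assumption, since the text gives only BM.sub.sel and XB.sub.sel for R1 and R2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Controls:
    bw_ena: int            # 1 = write the arriving flit into the input buffer
    bm_sel: str            # "bypass", or "local" (0 in the text: stop bypass)
    xb_sel: Optional[str]  # crossbar connection; None if not set by this request


def sa_g(distance: int, hops_requested: int) -> Optional[Controls]:
    """Controls set by the router `distance` hops downstream of the start router."""
    if distance < 1 or distance > hops_requested:
        return None  # the request does not concern this router
    if distance < hops_requested:
        # Intermediate router: let the flit pass straight through.
        # bw_ena = 0 is an assumption (see lead-in).
        return Controls(bw_ena=0, bm_sel="bypass", xb_sel="W_in->E_out")
    # Destination router: stop the bypass and buffer the flit.
    return Controls(bw_ena=1, bm_sel="local", xb_sel=None)


# 3-hop request from R0: R1 and R2 bypass, R3 receives the flit.
for r in (1, 2, 3):
    print(f"R{r}:", sa_g(distance=r, hops_requested=3))
```

For the 3-hop request of the example, R1 and R2 come out in bypass mode with the crossbar set from W.sub.in to E.sub.out, and R3 enables its buffer write.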

(26) FIG. 7 illustrates a router architecture of the EXTRA router in accordance with this disclosure. For simplicity, only two input ports, namely Core.sub.in 610 and West.sub.in 620 and two output ports North.sub.out 630 and East.sub.out 640 are shown.

(27) In order to separate the SSR/SA-G pipeline in the original SMART architecture into two pipeline stages, i.e. SR pipeline and SA-G pipeline, an additional register 650 is added before the input of crossbar switch 670. Particularly, additional register 650 is provided between the register 660 at the output of the SA-L pipeline and the register 661 at the input of the ST/LT pipeline. The additional register 650 is required to delay the winning flit from the start router from traversing to the crossbar switch 670 by one cycle. In other words, the two registers 650 and 660 are required to delay the winning flit by two cycles since the original third pipeline stage is being separated into third and fourth pipeline stages.
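
The effect of the additional register can be illustrated with a small shift-chain model in Python. This is a simplified sketch under stated assumptions: only the delay registers named in this paragraph (660, and 650 in the EXTRA router) are modelled, register 661 and the crossbar itself are left out, and the function name `cycles_to_crossbar` is hypothetical.

```python
def cycles_to_crossbar(n_delay_registers):
    """Count clock edges until a flit leaving SA-L reaches the crossbar input.

    The delay registers are modelled as a shift chain: on every clock edge
    each register takes the value held by its predecessor.
    """
    if n_delay_registers < 1:
        raise ValueError("need at least one register")
    regs = [None] * n_delay_registers
    data_in = "flit"                  # SA-L output, valid before the first edge
    cycle = 0
    while regs[-1] != "flit":
        regs = [data_in] + regs[:-1]  # synchronous shift on the clock edge
        data_in = None                # nothing new follows the flit
        cycle += 1
    return cycle


print("SMART (register 660 only):", cycles_to_crossbar(1))
print("EXTRA (registers 660 and 650):", cycles_to_crossbar(2))
```

One register holds the flit back for one cycle, as in the SMART router; two registers in series hold it back for two cycles, which covers the separate SR and SA-G stages.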

(28) Registers are inserted in between pipeline stages and are clocked synchronously. Hence, register 680 is added before the input of SA-G to separate the SA-G pipeline from the SR pipeline. One skilled in the art will recognise that FIGS. 5 and 7 are meant for the purpose of illustrating the separation of the SA-G pipeline into two pipelines. Hence, only the registers relevant to the separation of the SA-G pipeline are shown in FIGS. 5 and 7.

(29) FIGS. 8 and 9 are representative block diagrams illustrating the arrangement of the registers and the pipeline stages of a SMART router and an EXTRA router respectively. As shown in FIG. 8, the SMART router comprises four pipeline stages 810-840, namely the BW/RC pipeline, SA-L pipeline, SA-G pipeline and ST/LT pipeline, with registers 851-855 inserted between the pipeline stages. As shown in FIG. 9, the EXTRA router comprises five pipeline stages 910-950, namely the BW/RC pipeline, SA-L pipeline, SR pipeline, SA-G pipeline and ST/LT pipeline, with registers 961-966 inserted between the pipeline stages.

(30) The time between clock signals is set to be greater than the longest delay of any pipeline stage, so that when the registers are clocked, the data written to them is the final result of the previous stage. Since the original third pipeline stage is separated into two pipeline stages (i.e. the third and fourth pipeline stages), the time required by the original third pipeline stage is also divided. For example, assuming the first pipeline stage (i.e. BW/RC) takes 0.8 ns, the second pipeline stage (i.e. SA-L) takes 0.9 ns, the third pipeline stage (i.e. SR) takes 1 ns, the fourth pipeline stage (i.e. SA-G) takes 0.9 ns, and the fifth pipeline stage (i.e. ST/LT) takes 0.9 ns, the clock frequency is set by the third pipeline stage and equates to 1 GHz (i.e. the inverse of 1 ns). Consequently, latency is also reduced when compared to the original SMART router, since a higher clock frequency is used: with these example figures, five cycles at 1 ns take 5 ns, whereas four cycles at 2 ns take 8 ns. Particularly, the idle time of the BW/RC, SA-L and ST/LT pipelines in the EXTRA router is reduced, since a higher clock frequency is used when compared to the original SMART router configuration.
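
The comparison between the two routers can be worked through numerically. The Python sketch below uses the example stage delays quoted in this description for the SMART and EXTRA pipelines; the function name `pipeline_summary` is hypothetical, and the latency figure assumes that a flit spends exactly one clock cycle in each stage with no contention.

```python
def pipeline_summary(stage_delays_ns):
    """Return clock period, clock frequency, pipeline latency and idle times."""
    period_ns = max(stage_delays_ns)               # set by the slowest stage
    freq_ghz = 1.0 / period_ns
    latency_ns = len(stage_delays_ns) * period_ns  # one cycle per stage
    # Time each stage sits finished while waiting for the next clock edge.
    idle_ns = [round(period_ns - d, 2) for d in stage_delays_ns]
    return period_ns, freq_ghz, latency_ns, idle_ns


smart = [0.8, 0.9, 2.0, 0.9]       # BW/RC, SA-L, SSR+SA-G, ST/LT
extra = [0.8, 0.9, 1.0, 0.9, 0.9]  # BW/RC, SA-L, SR, SA-G, ST/LT

for name, delays in (("SMART", smart), ("EXTRA", extra)):
    period, freq, latency, idle = pipeline_summary(delays)
    print(f"{name}: period {period} ns, {freq} GHz, "
          f"latency {latency} ns, idle per stage {idle}")
```

Under these assumptions the SMART pipeline runs at 0.5 GHz with an 8 ns pipeline latency, while the EXTRA pipeline runs at 1 GHz with a 5 ns latency, and the idle time of the BW/RC stage drops from 1.2 ns to 0.2 ns.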

(31) The above is a description of embodiments of an EXTRA NoC in accordance with the present disclosure. It is foreseeable that those skilled in the art can and will design alternative EXTRA NoC based on this disclosure that infringe upon this invention as set forth in the following claims.