Dedicated SSR pipeline stage of router for express traversal (EXTRA) NoC
10554584 · 2020-02-04
Assignee
Inventors
Cpc classification
H04L12/433
ELECTRICITY
International classification
Abstract
This invention is related to an Express Traversal (EXTRA) Network on Chip (NoC) comprising a number of EXTRA routers. The EXTRA NoC comprises a Buffer Write and Route Computation (BW/RC) pipeline, a Switch Allocation-Local (SA-L) pipeline, a Setup Request (SR) pipeline, a Switch Allocation-Global (SA-G) pipeline, and a Switch Traversal and Link Traversal (ST/LT) pipeline. The BW/RC pipeline is configured to write an incoming flit to an input buffer(s) of a start EXTRA router and compute the route for the incoming head flit by selecting an output port to depart from the start EXTRA router. The SA-L pipeline is configured to arbitrate the start EXTRA router to choose an input port and an output port for a winning flit. The SR pipeline is configured to handle the transmission of a number of SR signals from the start EXTRA router to downstream EXTRA routers.
Claims
1. An Express Traversal (EXTRA) Network on Chip (NoC) comprising a plurality of EXTRA routers, the EXTRA NoC comprising: a buffer write and route computation (BW/RC) pipeline configured to write an incoming flit to an input buffer of a start EXTRA router and compute a route for the incoming head flit by selecting an output port to depart from the start EXTRA router; a switch allocation-local (SA-L) pipeline configured to arbitrate the start EXTRA router to choose an input port and an output port for a winning flit; a setup request (SR) pipeline configured to handle transmission of a plurality of SR signals from the start EXTRA router to downstream EXTRA routers via SR wires; a switch allocation-global (SA-G) pipeline configured to: receive the SR signals from the start EXTRA router via the SR wires; and arbitrate, based on the SR signals received from the start EXTRA router, three signals including: a buffer write enable (BW.sub.ena) signal for a local buffered flit, a bypass mux (BM.sub.sel) signal for a first crossbar switch, and a crossbar select (XB.sub.sel) signal for a second crossbar switch of a selected output port, to build an express path for the winning flit to traverse multiple hops to a destination EXTRA router within one cycle of the start EXTRA router; and a switch traversal and link traversal (ST/LT) pipeline configured to: traverse the winning flit to the selected output port of the start EXTRA router, and transmit the winning flit to the destination EXTRA router bypassing, via the express path, at least one EXTRA router between the start EXTRA router and destination EXTRA router.
2. The EXTRA NoC according to claim 1 wherein the plurality of SR signals are generated by the SA-L pipeline.
3. The EXTRA NoC according to claim 1 wherein the plurality of SR signals are generated by the SR pipeline.
4. The EXTRA NoC according to claim 1 further comprising a plurality of registers inserted between any two adjacent pipelines of the BW/RC, SA-L, SR, SA-G, and ST/LT pipelines.
5. The EXTRA NoC according to claim 4, wherein the plurality of registers are clocked synchronously.
6. A method of traversing flits in an Express Traversal (EXTRA) Network on Chip (NoC) having a plurality of EXTRA routers, the method comprising: (A) in a buffer write and route computation (BW/RC) pipeline: writing an incoming flit to an input buffer(s) of a start EXTRA router, and computing a route for the incoming head flit by selecting an output port to depart from the start EXTRA router; (B) in a switch allocation-local (SA-L) pipeline: arbitrating the start EXTRA router to choose an input port and an output port for a winning flit; (C) in a setup request (SR) pipeline: handling transmission of a plurality of SR signals from the start EXTRA router to downstream EXTRA routers via SR wires; (D) in a switch allocation-global (SA-G) pipeline: receiving the SR signals from the start EXTRA router via the SR wires, and arbitrating, based on the SR signals received from the start EXTRA router, three signals including: a buffer write enable (BW.sub.ena) signal for a local buffered flit, a bypass mux (BM.sub.sel) signal for a first crossbar switch, and a crossbar select (XB.sub.sel) signal for a second crossbar switch of a selected output port, to build an express path for the winning flit to traverse multiple hops to a destination EXTRA router within one cycle of the start EXTRA router; and (E) in a switch traversal and link traversal (ST/LT) pipeline: traversing the winning flit to the selected output port of the start EXTRA router, transmitting the winning flit to a destination EXTRA router, and bypassing, via the express path, at least one EXTRA router between the start EXTRA router and destination EXTRA router.
7. The method according to claim 6 wherein the plurality of SR signals are generated by the SA-L pipeline.
8. The method according to claim 6 wherein the plurality of SR signals are generated by the SR pipeline.
9. The method according to claim 6 wherein a time period of the pipelines are regulated by a plurality of registers inserted between any two adjacent pipelines of the BW/RC, SA-L, SR, SA-G, and ST/LT pipelines.
10. The method according to claim 9, wherein the plurality of registers are clocked synchronously.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The above advantages and features in accordance with this disclosure are described in the following detailed description and are shown in the following drawings:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
DETAILED DESCRIPTION
(11) This disclosure relates to an EXTRA NoC. Particularly, this disclosure relates to separating one of the pipeline stages in the SMART NoC to improve the clock frequency of an EXTRA router.
(12) The details of a SMART router can be found in the following reference: T. Krishna et al., "Breaking the On-Chip Latency Barrier Using SMART," in High-Performance Computer Architecture (HPCA), 2013. As this disclosure is a modification of the SMART router, certain details of the SMART router are omitted for brevity.
(13) To enable a higher clock frequency, it is proposed that the SA-G pipeline stage be separated into two pipeline stages. Through this method, the clock frequency of the EXTRA routers can be increased. As a result, the latency of flits and packets traversing the EXTRA NoC can be reduced. Further details will now be described.
(14)
(15) As mentioned above, there are four pipeline stages for a SMART router. In the example as shown in
(16) The example as shown in
(17) In the third pipeline stage (i.e. SA-G pipeline), the routers R0-R3 arbitrate among the SSR signals they received to set the BW.sub.ena, BM.sub.sel, and XB.sub.sel signals accordingly to build an express path for a winning flit in router R0 to traverse multiple hops within one cycle to router R3. Hence, R0 begins the third pipeline stage (i.e. SA-G pipeline) in the third cycle by generating SSR signals, via the SSR generator 710. It then transmits SSR signals, via the register 720, to the downstream routers R1, R2 and R3. In response to receiving the SSR signals from R0, the SA-G of R1 sets BM.sub.sel as bypass and XB.sub.sel as W.sub.in to E.sub.out, the SA-G of R2 sets BM.sub.sel as bypass and XB.sub.sel as W.sub.in to E.sub.out, and the SA-G of R3 sets BW.sub.ena as 1 to receive input and BM.sub.sel to 0 to stop bypass. During the third pipeline stage, instead of the winning flit traversing to the crossbar 670, the winning flit is being delayed by one cycle via the register 660.
(18) In the fourth cycle, routers R0-R3 proceed to the fourth pipeline stage (i.e. ST/LT pipeline) where the winning flit traverses the crossbar switch to the selected output port in router R0 and is subsequently transmitted to router R3 bypassing routers R1 and R2.
(19) As illustrated by the example in
(20) Similar to the SMART NoC, the EXTRA NoC consists of a number of EXTRA routers for sending messages in packets (or portions of packets known as flits), where the flits can traverse multiple hops within one cycle when the three major control signals are set accordingly. In accordance with an embodiment of this disclosure, the EXTRA router consists of five pipeline stages: 1) BW/RC pipeline, 2) SA-L pipeline, 3) Setup Request (SR) pipeline, 4) SA-G pipeline, and 5) ST/LT pipeline. In the EXTRA router, SR and SA-G are separated into two pipeline stages. Hence, SR and SA-G are performed serially in two separate pipeline stages. This increases the clock frequency and reduces the latency of flits traversing the EXTRA NoC as will be shown in
(21)
(22) The first, second and fifth pipeline stages processed in
(23) For brevity, only the third and fourth pipeline stages are discussed, since the first, second and fifth pipeline stages remain the same. In the third pipeline stage (i.e. SR pipeline), an SR signal is generated for the winning flit determined in the SA-L pipeline stage. The SR signals are then transmitted to downstream routers (i.e. R0 transmits to R1-R3) via the SR wires 711. Similar to SSR wires 310, SR wires are dedicated repeated wires that connect the EXTRA routers so that the upstream SR is communicatively connected to the SA-G of downstream EXTRA routers. During the third pipeline stage, the SA-L winning flit is delayed by one cycle via the register 660.
(24) The time required to transmit the SR signal to downstream routers depends on the length of the SR wires 711. Thus, the time taken to transmit SR signals to downstream routers increases as HPC (hops per cycle) increases, since longer SR wires 711 are required to connect the upstream router to the downstream routers. Hence, alternatively, in order to shorten the time period for the third pipeline stage, the SR signals may be generated in the second pipeline stage (i.e. SA-L pipeline). Essentially, the SR pipeline handles the transmission of the SR signals from the start router, R0, to downstream routers, R1-R3. In other words, the SR signals may be generated either in the SA-L pipeline or in the SR pipeline.
(25) In the fourth pipeline stage (i.e. SA-G pipeline), the SA-G receives SR signals from upstream router R0 and proceeds to arbitrate BW.sub.ena, BM.sub.sel, and XB.sub.sel accordingly. In this instance, R1 sets BM.sub.sel as bypass and XB.sub.sel as W.sub.in to E.sub.out, R2 sets BM.sub.sel as bypass and XB.sub.sel as W.sub.in to E.sub.out, and R3 sets BW.sub.ena as 1 to receive input and BM.sub.sel to 0 to stop bypass. During the fourth stage, the winning flit determined in the SA-L pipeline stage is being delayed by another cycle via the register 650.
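The per-router settings in this example can be sketched in Python. This is a minimal illustration, not the disclosed arbitration logic: it assumes a single SR signal with no competing requests, a straight west-to-east path, and an SR that encodes the number of hops requested (three here, from R0 to R3). The names `ControlSignals` and `sa_g_decide` are hypothetical, and the value of BW.sub.ena at the bypassed routers is an assumption, since the text does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControlSignals:
    bw_ena: int            # BW.sub.ena: 1 = write the arriving flit to the input buffer
    bm_bypass: bool        # BM.sub.sel: True = bypass, False = 0 (stop bypass)
    xb_sel: Optional[str]  # XB.sub.sel: crossbar connection used by the bypassing flit


def sa_g_decide(hops_from_start: int, hops_requested: int) -> ControlSignals:
    """Signals set by the SA-G of a downstream router for one uncontested SR.

    hops_from_start: distance of this router from the start router (R1 -> 1).
    hops_requested:  assumed SR payload, the hop count of the express path.
    """
    if not 1 <= hops_from_start <= hops_requested:
        raise ValueError("router is not on the requested express path")
    if hops_from_start < hops_requested:
        # Intermediate router: bypass and connect W_in to E_out.
        # bw_ena = 0 is an assumption (the flit is not buffered here).
        return ControlSignals(bw_ena=0, bm_bypass=True, xb_sel="W_in->E_out")
    # Destination router: buffer the flit and stop the bypass.
    return ControlSignals(bw_ena=1, bm_bypass=False, xb_sel=None)


# R0 requests a three-hop express path, so R1 and R2 bypass and R3 receives.
signals = {f"R{i}": sa_g_decide(i, 3) for i in (1, 2, 3)}
```

With these assumptions, `signals["R1"]` and `signals["R2"]` select bypass with `W_in->E_out`, and `signals["R3"]` has `bw_ena` equal to 1 with the bypass stopped, matching the settings described for R1 to R3.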
(26)
(27) In order to separate the SSR/SA-G pipeline in the original SMART architecture into two pipeline stages, i.e. SR pipeline and SA-G pipeline, an additional register 650 is added before the input of crossbar switch 670. Particularly, additional register 650 is provided between the register 660 at the output of the SA-L pipeline and the register 661 at the input of the ST/LT pipeline. The additional register 650 is required to delay the winning flit from the start router from traversing to the crossbar switch 670 by one cycle. In other words, the two registers 650 and 660 are required to delay the winning flit by two cycles since the original third pipeline stage is being separated into third and fourth pipeline stages.
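The effect of the delay registers can be illustrated with a small cycle-by-cycle model. This is a sketch under stated assumptions: each register is modelled as an ideal one-cycle delay element, SA-L completes in the second cycle so that the flit could otherwise reach the crossbar in the third cycle, and the function name `crossbar_cycle` is hypothetical.

```python
from typing import List, Optional


def crossbar_cycle(ready_cycle: int, num_delay_registers: int) -> int:
    """Return the cycle in which the winning flit reaches the crossbar input.

    ready_cycle: first cycle in which the flit could traverse the crossbar
    if no delay registers were present (assumed: the cycle after SA-L).
    """
    registers: List[Optional[str]] = [None] * num_delay_registers
    line_input: Optional[str] = "winning flit"
    cycle = ready_cycle
    while True:
        # Value visible at the end of the delay line during this cycle.
        line_output = registers[-1] if registers else line_input
        if line_output is not None:
            return cycle
        # Clock edge: each register captures the value of its predecessor.
        registers = [line_input] + registers[:-1]
        line_input = None
        cycle += 1


smart_cycle = crossbar_cycle(ready_cycle=3, num_delay_registers=1)  # register 660 only
extra_cycle = crossbar_cycle(ready_cycle=3, num_delay_registers=2)  # registers 660 and 650
```

Under these assumptions, one register places switch traversal in the fourth cycle, as in the four-stage SMART example, and two registers place it in the fifth cycle, which corresponds to the fifth pipeline stage of the EXTRA router.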
(28) Registers are inserted in between pipeline stages and are clocked synchronously. Hence, register 680 is added before the input of SA-G to separate SA-G pipeline from SR pipeline. One skilled in the art will recognise that
(29)
(30) The time between each clock signal is set to be greater than the longest delay between pipeline stages, so that when the registers are clocked, the data that is written to them is the final result of the previous stage. Since the original third pipeline stage is separated into two pipeline stages (i.e. third and fourth pipeline stages), the time required in the original third pipeline stage is also divided. For example, assuming the first pipeline stage (i.e. BW/RC) takes 0.8 ns, the second pipeline stage (i.e. SA-L) takes 0.9 ns, the third pipeline stage (i.e. SR) takes 1 ns, the fourth pipeline stage (i.e. SA-G) takes 0.9 ns, and the fifth pipeline stage (i.e. ST/LT) takes 0.9 ns, the clock period is set by the slowest stage, the third pipeline stage, which gives a clock frequency of 1 GHz (i.e. the inverse of 1 ns). Consequently, latency is also reduced when compared to the original SMART router, since a higher clock frequency is used. Particularly, the idle time for the BW/RC, SA-L and ST/LT pipelines in the EXTRA router is reduced, since a higher clock frequency is used when compared to the original SMART router configuration.
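The arithmetic in this example can be reproduced with a short Python sketch. It uses only the stage delays quoted above and treats the slowest stage as the lower bound on the clock period, ignoring register setup time and clock skew (a simplifying assumption). The function name `clock_from_stages` is hypothetical.

```python
from typing import Dict, Tuple

# Stage delays in nanoseconds, taken from the example in the text.
stage_delays_ns: Dict[str, float] = {
    "BW/RC": 0.8,
    "SA-L": 0.9,
    "SR": 1.0,
    "SA-G": 0.9,
    "ST/LT": 0.9,
}


def clock_from_stages(delays: Dict[str, float]) -> Tuple[float, float, Dict[str, float]]:
    """Return (clock period in ns, frequency in GHz, idle time per stage in ns)."""
    period_ns = max(delays.values())   # the slowest stage bounds the period
    frequency_ghz = 1.0 / period_ns    # 1 / ns = GHz
    # Idle time: the part of each cycle in which a stage has already finished.
    idle_ns = {stage: round(period_ns - d, 3) for stage, d in delays.items()}
    return period_ns, frequency_ghz, idle_ns


period_ns, frequency_ghz, idle_ns = clock_from_stages(stage_delays_ns)
```

Here the SR stage sets the period at 1 ns, giving 1 GHz, and the idle time per cycle is 0.2 ns for BW/RC and 0.1 ns for each of SA-L, SA-G and ST/LT.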
(31) The above is a description of embodiments of an EXTRA NoC in accordance with the present disclosure. It is foreseeable that those skilled in the art can and will design alternative EXTRA NoC based on this disclosure that infringe upon this invention as set forth in the following claims.