ADVANCED CENTRALIZED CHRONOS NoC
20220358069 · 2022-11-10
Inventors
- Stefano GIACONI (San Diego, CA, US)
- Giacomo RINALDI (San Diego, CA, US)
- Matheus GIBILUKA (San Diego, CA, US)
Cpc classification
G06F13/4022
PHYSICS
G06F13/385
PHYSICS
International classification
Abstract
Systems and methods for an Advanced Centralized Chronos Network on Chip (ACC-NoC) design are disclosed. The ACC-NoC is able to efficiently satisfy the interconnect traffic requirements of modern Systems on Chip and simplify top-level timing closure while providing high throughput and low latency. The ACC-NoC in a System on Chip may include a centralized intelligent switch and arbitration engine communicatively coupled to different intellectual property (IP) blocks through a series of one or more Chronos Channels, which transmit data using delay insensitive (DI) codes and quasi-delay-insensitive (QDI) logic.
Claims
1. A Network-on-Chip (NOC) comprising: a switch and arbitration engine; a plurality of intellectual property (IP) block interfaces; communication channels communicatively coupled between the switch and arbitration engine and each of the plurality of IP block interfaces, wherein each of the communication channels is configured to encode data using delay insensitive coding and transmit the encoded data using a quasi-delay insensitive logic circuit and a clock-less temporal compression ratio.
2. The NOC of claim 1, wherein each of the communication channels is configured to serially distribute portions of the encoded data into a plurality of temporal slots based, in part, on the clock-less temporal compression ratio.
3. The NOC of claim 1, wherein the communication channels are configured to decouple a clock of the switch and arbitration engine from the plurality of IP block interfaces.
4. The NOC of claim 1, wherein the communication channels are configured to: transmit data using an asynchronous signal and transform the asynchronous signal into a synchronous domain at each of the plurality of IP block interfaces.
5. A Network-on-Chip (NOC) system comprising: a plurality of intellectual property (IP) blocks; a centralized switch block; and communication channels coupled between the centralized switch block and one or more of the plurality of IP blocks, wherein each of the communication channels is configured (i) to transmit data between the centralized switch block and the one or more of the plurality of IP blocks and (ii) to encode the data using delay insensitive coding and transmit the encoded data using a quasi-delay insensitive logic and a clock-less temporal compression ratio.
6. The NOC system of claim 5, wherein the communication channels are configured to decouple a first clock of the centralized switch block from second clocks of the one or more of the plurality of IP blocks.
7. The NOC system of claim 5, wherein the centralized switch block comprises one of a crossbar and a network-on-chip.
8. The NOC system of claim 5, wherein each of the communication channels is insensitive to process, voltage, and temperature (PVT) variations.
9. The NOC system of claim 5, wherein the communication channels are configured to serially distribute portions of the encoded data into a plurality of temporal slots based, in part, on the clock-less temporal compression ratio and serially transmit the encoded data as temporally-compressed delay-insensitive asynchronous data.
10. The NOC system of claim 5, wherein the delay insensitive coding comprises analog signals.
11. The NOC system of claim 5, wherein a latency of each of the communication channels is independent of clock frequencies of the NOC system.
12. The NOC system of claim 5, wherein each of the communication channels is configured to translate a traditional handshake communication protocol into a compressed delay insensitive communication protocol wherein original control signals are not propagated to the communicative channel but embedded in the data itself.
13. A System on Chip (SoC) comprising: a high speed (HS) switch block; a medium speed (MS) switch block; one or more fast IP blocks; one or more medium speed IP blocks; first communication channels coupled between the HS switch block and each of the one or more fast IP blocks; second communication channels coupled between the MS switch block and each of the one or more medium speed IP blocks; and a third communication channel coupled between the HS switch block and the MS switch block, wherein each of the first communication channels, the second communication channels, and the third communication channel is configured to encode data using delay insensitive coding and transmit the encoded data using a quasi-delay insensitive logic circuit and a clock-less temporal compression ratio.
14. The SoC of claim 13, wherein a latency of each of the first communication channels, the second communication channels, and the third communication channel is independent of a clock frequency of the SoC.
15. The SoC of claim 13, wherein the one or more fast IP blocks comprises one or more of: a double data rate (DDR) block, a microcontroller unit (MCU), an array processor (AP), a tensor processing unit (TPU), and a graphics processing unit (GPU).
16. The SoC of claim 13, wherein the one or more medium speed IP blocks comprises one or more of: an ethernet and a universal serial bus block.
17. The SoC of claim 13, wherein each of the first communication channels, the second communication channels, and the third communication channel includes a first interface and a second interface, wherein a signal frequency at the first interface is decoupled from a signal frequency at the second interface.
18. The SoC of claim 13, wherein a latency of each of the first communication channels is independent of a clock frequency of the HS switch block.
19. The SoC of claim 13, wherein each of the first communication channels, the second communication channels, and the third communication channel is configured to translate a traditional handshake communication protocol into a compressed delay insensitive communication protocol wherein original control signals are not propagated to the communicative channel but embedded in the data itself.
20. The SoC of claim 13, wherein each of the first communication channels, the second communication channels, and the third communication channel is configured to serially distribute portions of the encoded data into a plurality of temporal slots based, in part, on the clock-less temporal compression ratio and serially transmit the encoded data as temporally-compressed delay-insensitive asynchronous data.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0018] The above and other aspects and features of the present inventive concept will be more apparent by describing example embodiments with reference to the accompanying drawings, in which:
[0019]
[0020]
[0021]
DETAILED DESCRIPTION
[0022] While certain embodiments are described, these embodiments are presented by way of example only, and are not intended to limit the scope of protection. The methods and systems described herein may be embodied in a variety of other forms. Furthermore, various omissions, substitutions, and changes in the form of the example methods and systems described herein may be made without departing from the scope of protection.
[0023] This invention describes an Advanced Centralized Chronos NoC that efficiently satisfies the interconnect traffic requirements of modern SoCs, simplifying top-level timing closure while providing high throughput and low latency.
[0024]
[0025] To implement a Chronos Channel in a target technology, different circuits can be employed.
[0026] An encoder 111 transforms the input data (e.g., input data received from a producer IP block to be transmitted to a consumer IP block), which is represented using “m” wires, into encoded data that uses “k” wires and a specific DI code. A Chronos Channel requires “j” encoders 111, where “j” is the size of the input data divided by the size of the DI code of choice. The encoder blocks 111 may also require input control signals to indicate the validity of the data at their inputs. A clock signal (clockA) can be used for synchronous data inputs, and an enable signal (enableA) can be used to enable or disable data consumption in order to fulfil specific data transmission protocol requirements. The encoder blocks 111 also generate an output control signal to indicate when the Chronos Channel is full and cannot accept new data. Note that data at either the inputs or the outputs of an encoder 111 can be digital or analog.
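The encoding step can be illustrated with a minimal sketch. It assumes a 1-of-4 DI code (so “m” = 2 input wires and “k” = 4 output wires per encoder) and reads “the size of the DI code” as the number of data bits one code symbol carries; neither choice is stated in the disclosure, and the function names are hypothetical.

```python
# Illustrative sketch only: the 1-of-4 code is an assumed choice of DI code.
M_BITS = 2   # "m": input wires handled by one encoder (assumption)
K_WIRES = 4  # "k": output wires of one encoder for a 1-of-4 code (assumption)


def encode_symbol(value):
    """Encode one m-bit value as a one-hot pattern on k wires."""
    if not 0 <= value < K_WIRES:
        raise ValueError("value does not fit in m bits")
    return tuple(1 if wire == value else 0 for wire in range(K_WIRES))


def encode_word(word, width_bits):
    """Split a word into m-bit groups (least significant first) and encode each."""
    if width_bits % M_BITS != 0:
        raise ValueError("width must be a multiple of m")
    j = width_bits // M_BITS  # "j": number of encoders the channel needs
    mask = (1 << M_BITS) - 1
    return [encode_symbol((word >> (M_BITS * n)) & mask) for n in range(j)]


# An 8-bit input needs j = 8 / 2 = 4 encoders, each driving 4 wires.
symbols = encode_word(0b11100100, 8)
```

Exactly one wire is high in every encoded symbol, which is what lets a receiver detect data arrival without a clock.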
[0027] The TC 112 splits a “j” sized set of encoded data into “j/i” sets of encoded data, each of size “i”, where “j/i” is the temporal compression ratio. The TC 112 then issues each of the “j/i” sets on its outputs, one at a time. To control the flow of this data, the handshake protocol defined by the choice of DI code is used. Note that the maximum time to transmit each of the “j/i” sets is the slot delay, defined as the target cycle time divided by the compression ratio. In this way, and assuming that the remaining parts of the circuit are also able to consume the data while guaranteeing cycle time performance, all the “j/i” sets will be sent within one cycle time. The outputs of the TC 112 can feed either a repeater 130 or the TD 122 directly. Also, note that if “j/i” is not a natural number, but rather a positive rational number, the TC 112 will use only the required number of its outputs in the transmission of the last slots of data. Nevertheless, the number of slots into which the cycle time is divided will still be a natural number, defined as the ceiling of “j/i”.
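The slot arithmetic described for the TC 112 can be sketched as follows. This is a behavioural model only, not a circuit description; the function names and the example numbers (10 encoded symbols, 4 outputs, a 1200 ps cycle time) are assumptions chosen for illustration.

```python
import math


def temporal_compress(symbols, i):
    """Split j encoded symbols into ceil(j/i) slots of at most i symbols each."""
    if i <= 0:
        raise ValueError("i must be positive")
    return [symbols[start:start + i] for start in range(0, len(symbols), i)]


def slot_delay(cycle_time, j, i):
    """Maximum time available per slot: cycle time divided by the slot count."""
    return cycle_time / math.ceil(j / i)


# j = 10 symbols over i = 4 outputs: j/i = 2.5, so three slots are needed
# and the last slot uses only two of the four outputs.
slots = temporal_compress(list(range(10)), 4)
per_slot_ps = slot_delay(1200, 10, 4)
```

With these numbers the slots hold 4, 4, and 2 symbols, and each slot must complete within one third of the cycle time.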
[0028] Repeaters 130 have memory elements and are capable of holding encoded data and sending it to a next repeater or the TD 122. To control the flow of this data, the handshake protocol defined by the choice of DI code is used. Furthermore, the maximum time to transmit each of the “j/i” sets is also the delay of the slot defined by the target cycle time divided by the compression ratio. Note that repeaters 130 may or may not be required in a Chronos Channel, as they are used to fix slot delay violations in long paths that fail to meet cycle time requirements or to improve signal strength. Also, note that different numbers of repeaters 130 may be required for the different outputs of a TC 112. This is valid because, in a Chronos Channel, there is no global control signal dictating how events flow through the data path. Rather, each path from an output of a TC 112 to the input of a TD 122 has an independent flow control. Again, the only restriction is the specified cycle time.
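A rough way to see why different TC outputs may need different numbers of repeaters is sketched below. It assumes that a path's delay divides evenly among its segments and that a repeater adds no delay of its own; both are simplifications, and the function name and delay values are hypothetical.

```python
def repeaters_needed(path_delay_ps, slot_delay_ps):
    """Fewest repeaters so that every segment of a path fits in one slot delay.

    Simplified model (assumption): delay splits evenly across segments and
    the repeaters themselves contribute no delay.
    """
    if slot_delay_ps <= 0:
        raise ValueError("slot delay must be positive")
    # Ceiling division on integers; a path always has at least one segment.
    segments = max(1, -(-path_delay_ps // slot_delay_ps))
    return segments - 1


# Three TC outputs routed over wires of different length (delays in ps).
slot_delay_ps = 400
path_delays_ps = [300, 900, 1300]
repeaters_per_path = [repeaters_needed(d, slot_delay_ps) for d in path_delays_ps]
```

Because each path has independent flow control, the short path can run without repeaters while the longer ones are segmented separately.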
[0029] The TD 122 merges “q/i” sets of encoded data, each of size “i”, into a single set of encoded data of size “q”. The TD 122 then issues the whole “q” sized set on its outputs, which feed the decoder blocks 121. To control the flow of this data, the handshake protocol defined by the choice of DI code is used. In this circuit, the maximum time to consume each of the “q/i” sets is the slot delay, defined as the target cycle time divided by the compression ratio of the TD 122. Note that, in some embodiments, TDs 122 can have a different compression ratio than that of the TC 112 and can generate sets with a different size from those originally consumed by the TC 112. This is particularly useful when connecting transmitters and receivers with different clock frequencies. Also, if the compression ratio of the TD 122 is a positive rational number rather than a natural number, it will only use the required number of its inputs in the consumption of the last slots of data.
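The merge performed by the TD 122 can be modelled in a few lines. This is a behavioural sketch under the assumption that slots arrive in order; the function name and the placeholder symbol labels are hypothetical.

```python
def temporal_decompress(slots, q):
    """Merge slots of encoded symbols, received in order, into one q-sized set."""
    merged = [symbol for slot in slots for symbol in slot]
    if len(merged) != q:
        raise ValueError("received symbols do not form a q-sized set")
    return merged


# Three slots arriving over i = 4 inputs; the last slot uses only two inputs.
received = [["s0", "s1", "s2", "s3"], ["s4", "s5", "s6", "s7"], ["s8", "s9"]]
full_set = temporal_decompress(received, q=10)
```

The merged set is only handed to the decoders once all slots have arrived, which mirrors the TD issuing the whole “q” sized set at once.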
[0030] The decoder 121 is responsible for transforming input encoded data, which is represented using “k” wires and a specific DI code, back to the original input data that used “m” wires. In various embodiments, the decoder 121 is configured to transform the input encoded data to form a representation of the data signals input to the encoders 111, the representation being compliant with an input data format of the consumer IP block. To decode data, a Chronos Channel needs “q” decoders, as defined by the compression ratio of the TD 122. A decoder block may also require input control signals to indicate that data at its outputs was successfully collected. To do so, a clock signal (clockB) can be used for synchronous data outputs, and an enable signal (enableB) can be used to enable or disable the generation of new data at the outputs of the Chronos Channel, to fulfil specific data transmission protocol requirements. Furthermore, decoders 121 also generate an output control signal to indicate when they are empty, which means there is no data in the Chronos Channel to be consumed. Note that data at either the inputs or the outputs of a decoder 121 can be digital or analog.
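A minimal decoding sketch follows. It assumes a 1-of-4 DI code (“k” = 4 wires carrying “m” = 2 bits) with an all-zero pattern acting as the empty state; the disclosure does not fix the code or the empty-state convention, and the function names are hypothetical.

```python
# Illustrative sketch only: 1-of-4 code with an all-zero "empty" pattern (assumed).
M_BITS = 2   # "m": data wires recovered per decoder (assumption)
K_WIRES = 4  # "k": encoded wires per decoder (assumption)


def decode_symbol(wires):
    """Return the m-bit value of a one-hot symbol, or None if the wires are empty."""
    if len(wires) != K_WIRES:
        raise ValueError("expected k wires")
    high = [n for n, level in enumerate(wires) if level]
    if not high:
        return None  # no data present: the decoder would report "empty"
    if len(high) != 1:
        raise ValueError("not a valid 1-of-4 codeword")
    return high[0]


def decode_word(symbols):
    """Rebuild the original word from symbols ordered least significant first."""
    word = 0
    for n, wires in enumerate(symbols):
        value = decode_symbol(wires)
        if value is None:
            raise ValueError("word is incomplete")
        word |= value << (M_BITS * n)
    return word


word = decode_word([(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)])
```

Validity is carried by the codeword itself: a symbol is either empty, complete, or invalid, so no separate valid signal travels with the data.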
[0031] Another important concept in a Chronos Channel is the definition of TX and RX blocks. As
[0032] Due to the asynchronous communication between TX and RX blocks 110 and 120, a Chronos Channel can interface transmitters and receivers that operate at different frequencies and with different data bus widths (as the compression ratios can be different in the TX and RX blocks 110 and 120). However, to avoid data loss, it must be ensured that the receiver consumes data at least as fast as the producer generates new data. To do so, the output throughput must be greater than or equal to the input throughput. More specifically, recalling
[0033] Coupling controllers to the TX 110 and RX 120 can remove the requirement of constrained frequencies between transmitter and receiver blocks. Such controllers must be able to implement a communication protocol using the control signals provided by the TX and RX blocks 110 and 120. Note that these signals allow implementing a variety of communication protocols, such as (but not limited to) handshake-based or credit-based protocols. The coupling of controllers to a Chronos Channel generates what is called a Chronos Link, and enables leveraging the full flexibility of Chronos Channels. This is because transmitters and receivers connected to Chronos Links can be completely asynchronous to each other, and communication may be established by a handshake procedure without any need to perform complex timing closure. An example of such an implementation is given in U.S. Pat. No. 9,977,853, the disclosure of which is incorporated herein by reference in its entirety.
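As one illustration of a protocol such controllers could implement, the sketch below models generic credit-based flow control. It is not the controller design of the cited patent; the class name, method names, and buffer depth are assumptions.

```python
from collections import deque


class CreditLink:
    """Generic credit-based flow control between a sender and a receiver.

    Hypothetical model: the sender holds one credit per free receiver
    buffer entry and may only send while it has credits left.
    """

    def __init__(self, depth):
        self.credits = depth   # free entries the sender knows about
        self.buffer = deque()  # data waiting at the receiver side

    def send(self, item):
        """Send one item if a credit is available; report whether it was sent."""
        if self.credits == 0:
            return False       # link is full: the sender must wait
        self.credits -= 1
        self.buffer.append(item)
        return True

    def receive(self):
        """Consume one item and return its credit to the sender."""
        if not self.buffer:
            return None        # link is empty: nothing to consume
        self.credits += 1
        return self.buffer.popleft()


link = CreditLink(depth=2)
sent = [link.send("a"), link.send("b"), link.send("c")]
```

The third send is refused because both credits are in use; it can succeed only after the receiver consumes an item, regardless of how the two sides are clocked.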
[0034] Further examples of the Chronos Channel are described in U.S. Pat. Nos. 9,977,852 and 9,977,853, the disclosures of which are incorporated herein by reference in their entireties as if set forth in full.
[0035]
[0036] The proposed architecture of the ACC-NoC in
[0037] The architecture of
[0038] The architecture of