HIGH PRECISION MULTI-CHIP CLOCK SYNCHRONIZATION
20210152269 · 2021-05-20
CPC classification
H04Q9/00 (ELECTRICITY)
H04L7/0008 (ELECTRICITY)
H04J3/0667 (ELECTRICITY)
H03K5/135 (ELECTRICITY)
G06F1/12 (PHYSICS)
International classification
H03K5/135 (ELECTRICITY)
Abstract
A sensor network, which includes a sensor controller serially coupled to a plurality of sensor modules, is configured to allow the sensor controller to program the sensor modules, to allow the sensor modules to transfer measurement data to the sensor controller, and to synchronize the sensor modules to picosecond accuracy via on-chip or on-module custom circuits and a physical layer protocol. The sensor network can be used in PET, LiDAR, or FLIM applications. Synchronization to picosecond accuracy is achieved by using a picosecond time digitization circuit to measure on-chip delays with high accuracy and precision. The delay measurements are directly comparable between separate chips, even with voltage and temperature variations between the chips.
Claims
1. A circuit, comprising: a first integrated circuit (“IC”) and a second IC; a delay loop comprising at least one transmission path between the first IC and the second IC for calculating a propagation delay time, T_latency, between the first IC and the second IC; a synchronizing circuit on the second IC for synchronizing timing of the first IC to timing of the second IC, comprising: a circuit for generating a plurality of timing signals for the second IC; at least one time digitizer unit (TDU) for receiving a global offset, comprising global timing information and the T_latency, and for generating a timestamp, synchronized to the timing of the first IC, in response to an event strobe, the TDU comprising: a circuit for receiving the timing signals and for generating a count representing a plurality of least significant bits (LSBs) of the timestamp in response to the event strobe; and at least one adder for adding the global offset to the LSBs of the timestamp to generate the timestamp for the event.
2. The circuit as set forth in claim 1, wherein the timing circuit for generating a plurality of timing signals comprises a phase lock loop (PLL) for generating timing signals, wherein a gate delay of the PLL comprises a least significant bit of less than or equal to 50 picoseconds for the timestamp.
3. The circuit as set forth in claim 2, wherein the PLL comprises a phase-frequency detector, a charge pump, loop filter, M-stage ring oscillator, phase interpolator and divider circuits for phase locking at least one of the timing signals to a system reference clock.
4. The circuit as set forth in claim 1, wherein the TDU further comprises: a plurality of flip-flops and an encoder for receiving the timing signals from the timing circuit and the strobe and for generating at least one of the least significant bits (LSBs) of the timestamp; a counter for generating a count; and wherein the adder adds the global offset, the count and the LSBs of the timestamp to generate the timestamp for the event.
5. The circuit as set forth in claim 4, wherein the flip-flops and the encoder further generate an n-bit binary representation of the LSBs of the timestamps.
6. The circuit as set forth in claim 1, wherein the delay loop further comprises: a first TDU and a transmitter, located on the first IC, a first transmission path between the first IC and the second IC, a loopback path in the second IC, a second transmission path between the second IC and the first IC, a receiver on the first IC and a second TDU on the first IC; and datapath logic, located on the first IC, for propagating a signal in the delay loop and for calculating the propagation delay time, T_latency, from the timestamps of the first and second TDUs.
7. The circuit as set forth in claim 1, further comprising a plurality of the TDUs for generating a plurality of timestamps on the second IC.
8. A method for synchronizing two integrated circuits, comprising: calculating a propagation delay time, T_latency, between a first IC and a second IC; synchronizing timing of the first IC to timing of the second IC, by: generating a plurality of timing signals for the second IC; receiving a global offset, comprising global timing information and the T_latency; generating a timestamp at the second IC, in at least one time digitizer unit (TDU), synchronized to the timing of the first IC, in response to an event strobe by: receiving the timing signals and generating a count representing a plurality of least significant bits (LSBs) of the timestamp in response to the event strobe; and adding the global offset to the LSBs of the timestamp to generate the timestamp for the event.
9. The method as set forth in claim 8, wherein generating a plurality of timing signals comprises generating a plurality of timing signals in a phase lock loop (PLL), wherein a gate delay of the PLL comprises a least significant bit of less than or equal to 50 picoseconds for the timestamp.
10. The method as set forth in claim 9, wherein generating a plurality of timing signals comprises phase locking at least one of the timing signals to a system reference clock in the PLL that comprises a phase-frequency detector, a charge pump, loop filter, M-stage ring oscillator, phase interpolator and divider circuits.
11. The method as set forth in claim 8, wherein the TDU further comprises: a plurality of flip-flops and an encoder for receiving the timing signals from the timing circuit and the strobe and for generating at least one of the least significant bits (LSBs) of the timestamp; a counter for generating a count; and an adder for adding the global offset, the count and the LSBs of the timestamp to generate the timestamp for the event.
12. The method as set forth in claim 11, further comprising generating an n-bit binary representation of the LSBs of the timestamps.
13. The method as set forth in claim 8, wherein calculating a propagation delay time, T_latency, between a first IC and a second IC further comprises: propagating a signal, from the first IC, through a first TDU and a transmitter, located on the first IC, a first transmission path between the first IC and the second IC, a loopback path in the second IC, a second transmission path between the second IC and the first IC, a receiver on the first IC and a second TDU on the first IC; and calculating the propagation delay time, T_latency, from the timestamps of the first and second TDUs.
14. The method as set forth in claim 8, further comprising generating a plurality of timestamps on the second IC from a plurality of the TDUs.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] The novel features of the invention are set forth in the appended claims. However, for purposes of explanation, several embodiments of the invention are set forth in the following figures.
DETAILED DESCRIPTION
[0016] Networking protocols exist today that allow the time synchronization of local area network (LAN) elements such as routers, switches, and network interface cards within computers, test instrumentation, or factory machines. The Precision Time Protocol (PTP) is designed to be compatible with Ethernet, the dominant LAN technology in use today, and targets sub-microsecond time accuracy. White Rabbit (WR) is based on PTP and Synchronous Ethernet and targets sub-nanosecond accuracy. WR was designed for the instrument synchronization, control, and data transfer required by large-scale particle physics experiments (e.g., those at the LHC at CERN).
[0017] Both PTP and WR rely on network elements with time stamping capability: every compliant network element can record the time at which an Ethernet frame is sent or received. A master-slave hierarchy is specified that allows master network elements to update the current time at slave elements. Because a master and its slave are connected via an electrical or fiber optic cable, there is a time delay (latency) in communication between them. The protocol enables each master to estimate the delay of its link to each slave, so that the master can pass its local time to the slave along with the delay information. The slave then sets its local time to the master's time plus the master-to-slave message delay. As a result, the master and slave time readings are nominally identical.
[0018] PTP and WR do not provide sufficient time synchronization for the aforementioned PET, LiDAR, and FLIM applications, which require time measurement accuracies of one to tens of picoseconds. WR has been shown to achieve timing accuracy of hundreds of picoseconds. It combines a coarse time stamping technology with a complex sequence of message passing between master and slave that lets the slave shift its internal clock in fine phase steps to achieve synchronization. WR is designed to minimize the need for custom integrated circuits designed specifically for the synchronization process.
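One standard way to obtain the delay estimate described above is the delay request-response exchange of PTP (IEEE 1588), which uses four timestamps: t1 when the master sends a Sync message, t2 when the slave receives it, t3 when the slave sends a Delay_Req message, and t4 when the master receives it. The sketch below computes the mean path delay and the slave's clock offset from those timestamps under the protocol's assumption that the path is symmetric. The class and function names are illustrative only and are not taken from any PTP library.

```python
from dataclasses import dataclass


@dataclass
class TwoWayExchange:
    """Timestamps (in picoseconds) from one two-way message exchange."""
    t1: int  # master clock: master sends its message
    t2: int  # slave clock: slave receives it
    t3: int  # slave clock: slave sends its reply
    t4: int  # master clock: master receives the reply


def mean_path_delay(x: TwoWayExchange) -> float:
    # Round trip minus the slave's turnaround time, split evenly.
    # Correct only if master->slave and slave->master delays are equal.
    return ((x.t2 - x.t1) + (x.t4 - x.t3)) / 2


def slave_offset(x: TwoWayExchange) -> float:
    # Amount by which the slave clock leads the master clock.
    return ((x.t2 - x.t1) - (x.t4 - x.t3)) / 2


# Example: true one-way delay 5000 ps, slave clock 300 ps ahead of master.
ex = TwoWayExchange(t1=0, t2=5300, t3=9300, t4=14000)
print(mean_path_delay(ex), slave_offset(ex))  # 5000.0 300.0
```

The symmetric-path assumption built into `mean_path_delay` is exactly what limits accuracy when the two directions of a link have different delays, as discussed below.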
[0019] To achieve accuracy in the range 1-100 ps, custom integrated circuits are required to measure time stamps with sub-100 ps accuracy. These circuits must be combined with a physical layer protocol for measuring the latency in the connections between network elements.
[0021] The sensor network (200) allows a) the sensor controller (210) to configure and program the sensor modules (230, 240, 250 and 260), b) the sensor modules (230, 240, 250 and 260) to transfer measurement data to the sensor controller (210), and c) all the sensor modules (230, 240, 250 and 260) to be synchronized to picosecond accuracy via on-chip or on-module custom circuits and a physical layer protocol.
[0023] A PLL (300), consisting of a 4-stage ring oscillator (302) implemented using differential circuits, a phase interpolator (308), divider circuits (310 and 312), a phase-frequency detector (PFD) (306), and a charge pump (CP) and loop filter (LF) (304), is phase-locked to the system reference clock (305). With the reference clock (305) at 156.25 MHz and the dividers (310 and 312), as shown in
[0024] Note that each TDU (320, 322, 324 and 326) samples the in-phase and quadrature outputs of the divide-by-2 circuit (310). This allows the correct divider state to be sampled while correcting for non-zero divider delay. The correct divider output is chosen based on the sampled state of the ring oscillator stage that clocks the divider.
[0025] The time stamping circuit further includes a counter (314), a shift-left circuit (316), and summing circuits (317, 318, 334, 336, 338 and 332). These circuits (314, 316, 317, 318, 334, 336, 338 and 332) allow a global time offset to be injected into the 64-bit time representation so as to correct the local chip time relative to a master time with a precision of one TDU LSB. The phase interpolator (308) allows the 4 VCO (302) output signals to be shifted together in sub-LSB steps. This fine time control allows chip-to-chip synchronization to better than one LSB.
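The following behavioral sketch shows how a TDU timestamp could be assembled from a coarse count, a fine phase code, and a global offset, in the spirit of claim 4. It rests on hypothetical assumptions: that the flip-flops sample the four single-ended outputs of the 4-stage differential ring oscillator, that those outputs step through the eight-state thermometer sequence listed in the code (one stage delay per step), that the fine phase occupies the 3 least significant bits, and that the sample is valid and bubble-free. The divide-by-2 I/Q selection described above is omitted, and all names and bit widths are illustrative rather than taken from this disclosure.

```python
# Hypothetical sequence of sampled ring-oscillator outputs (b0, b1, b2, b3)
# over one oscillation period; each step is one stage (gate) delay = 1 LSB.
RING_SEQUENCE = [
    (0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0),
    (1, 1, 1, 1), (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1),
]

FINE_BITS = 3            # 8 phases per period -> 3-bit fine code (assumed)
MASK64 = (1 << 64) - 1   # 64-bit time representation


def encode_phase(bits: tuple[int, ...]) -> int:
    """Convert a valid sampled thermometer state to a binary phase index."""
    ones = sum(bits)
    # First half of the period (last bit still 0): ones fill in from b0.
    # Second half (last bit is 1): zeros fill in from b0.
    return ones if bits[-1] == 0 else 2 * len(bits) - ones


def timestamp(count: int, bits: tuple[int, ...], global_offset: int) -> int:
    """Shift the coarse count left, add the fine phase, then the offset."""
    local = (count << FINE_BITS) + encode_phase(bits)
    # Masking wraps the result (including negative offsets) to 64 bits.
    return (local + global_offset) & MASK64


# Every state in the assumed sequence decodes to its position.
assert [encode_phase(s) for s in RING_SEQUENCE] == list(range(8))
print(timestamp(count=10, bits=(1, 1, 1, 0), global_offset=1000))  # 1083
```

Because the offset is added after the local count and phase are combined, a master can correct the slave's time with one-LSB resolution without touching the counter or the ring oscillator itself.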
[0026] An arbitrary number of TDUs can be used on a chip. However, it is important that the propagation delays of the 6 clock signals from the PLL be the same for all TDUs. This can be achieved by trace matching and repeater delay matching methods that are part of the known art in custom integrated circuit design.
[0027] Note that the LSB precision of the time stamping circuit can be improved by running the ring oscillator (302) faster (at the cost of additional power dissipation) or by implementing the circuit (303) in a more advanced process technology. In addition, interpolation between ring oscillator (302) stages can be used to reduce the LSB size further. An LSB below 10 ps should be readily achievable with commercially available 16 nm FinFET CMOS processes.
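As a rough illustration of this trade-off: in an M-stage differential ring oscillator, one oscillation period spans 2M stage delays, so if one LSB equals one stage delay, LSB = 1/(2 M f_osc), and interpolating K sub-steps between adjacent stages divides that by K. The sketch below evaluates this relation; the oscillator frequencies and the interpolation factor are hypothetical values chosen for illustration, not figures from this disclosure.

```python
def lsb_ps(f_osc_hz: float, stages: int = 4, interp: int = 1) -> float:
    """LSB in picoseconds for an M-stage differential ring oscillator.

    One period spans 2 * stages stage delays; optional interpolation
    between adjacent stages divides each stage delay into `interp` steps.
    """
    period_ps = 1e12 / f_osc_hz
    return period_ps / (2 * stages * interp)


# Hypothetical operating points:
print(lsb_ps(2.5e9))            # 50.0  (4 stages at 2.5 GHz)
print(lsb_ps(5.0e9))            # 25.0  (same ring run twice as fast)
print(lsb_ps(5.0e9, interp=4))  # 6.25  (adding 4x interpolation)
```

Doubling the oscillator frequency halves the LSB but, as noted above, costs power; interpolation reduces the LSB without raising the oscillator frequency.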
[0028] Synchronization is performed via a master-slave algorithm where, for two chips next to each other on the daisy chain, the chip electrically closer to the controller is the master and the chip farther from the controller is the slave. The master causes the slave to update its internal time to match that of the master.
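Because each slave becomes the master for the next link, the time reference propagates hop by hop along the daisy chain. The behavioral model below, a sketch under stated assumptions, shows the consequence: each slave adopts its master's time reading plus that link's measured latency, so any latency measurement error is inherited by every chip further down the chain. It assumes (hypothetically) that all chip clocks run at the same rate, that the transferred time value itself is exact, and uses invented per-hop numbers; the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Link:
    true_latency_ps: float      # actual master-to-slave latency of this hop
    measured_latency_ps: float  # T_latency as estimated by the master


def sync_chain(links: list[Link], send_time_ps: float = 0.0) -> list[float]:
    """Return each chip's clock error (ps) after synchronization.

    Index 0 is the first master (the reference, error 0). For each hop, the
    master sends its current time reading; on arrival the slave sets its
    clock to that reading plus the measured latency.
    """
    errors = [0.0]
    for link in links:
        master_reading = send_time_ps + errors[-1]     # master's clock value
        arrival_true = send_time_ps + link.true_latency_ps
        slave_reading = master_reading + link.measured_latency_ps
        errors.append(slave_reading - arrival_true)    # slave's new error
    return errors


chain = [Link(1200.0, 1200.0), Link(850.0, 853.0), Link(1500.0, 1498.0)]
print(sync_chain(chain))  # [0.0, 0.0, 3.0, 1.0]
```

The per-hop latency errors (0, +3, and -2 ps here) accumulate along the chain, which is one reason each link's latency must be measured to picosecond accuracy.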
[0029] The sensor network (200,
[0031] The master can measure the round-trip delay using its on-chip TDUs (414 and 420). The round-trip delay can be expressed as:
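$$T_{\mathrm{round}} = D_{\mathrm{TX1}} + D_{\mathrm{D12}} + D_{\mathrm{RX2}} + D_{\mathrm{LB}} + D_{\mathrm{TX2}} + D_{\mathrm{D21}} + D_{\mathrm{RX1}} \qquad (1)$$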
[0032] DTX1 represents the delay through the transmitter (418) (which may include the serializer) on master IC (412). DRX1 represents the delay through the receiver (which may include the deserializer) on master IC (412). DTX2 and DRX2 represent the transmitter (450) and receiver (435) delays on slave IC (430). DLB represents the delay through the loopback path (440) on slave IC (430). DD12 represents the propagation delay in the interconnect (426) carrying data from master IC (412) to slave IC (430). DD21 represents the propagation delay in the interconnect (455) carrying data from slave IC (430) to master IC (412). Note that the interconnects (426 and 455) may be printed circuit board traces, electrical cables, or fiber optic cables.
[0033] The desired quantity is the latency from master to slave:
$$T_{\mathrm{latency}} = D_{\mathrm{TX1}} + D_{\mathrm{D12}} + D_{\mathrm{RX2}} \qquad (2)$$
[0034] Once the master IC (412) knows T_latency, it can send its own internal time reading to the slave IC (430) along with the value of T_latency. The slave IC (430) then updates its internal time to the master's time plus T_latency. If the link were completely symmetrical and DLB were zero, the latency could be determined simply by halving the measured T_round. In practice this is unrealistic because of implementation details of the serializer/deserializer (SerDes) circuits used to realize modern high-speed data networks. The delay through a SerDes transmitter and receiver can vary by multiple bit intervals depending on the initialization state of the serializer and deserializer subcircuits. Such circuits always include dividers, which initialize in nondeterministic states; therefore, the delay through a serializer or deserializer is not known unless the circuit is explicitly reset. In addition, because of chip-to-chip power supply and temperature variations, there is no guarantee that DTX1 equals DTX2 or that DRX1 equals DRX2. Finally, DLB is not zero.
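Subtracting the expression for T_latency from half of the expression for T_round, using the delay terms defined above, shows exactly what the halving shortcut gets wrong (a worked step added for illustration):

$$\frac{T_{\mathrm{round}}}{2} - T_{\mathrm{latency}} = \frac{D_{\mathrm{LB}} + (D_{\mathrm{TX2}} - D_{\mathrm{TX1}}) + (D_{\mathrm{RX1}} - D_{\mathrm{RX2}}) + (D_{\mathrm{D21}} - D_{\mathrm{D12}})}{2}$$

Each term in the numerator corresponds to one of the effects listed above: the nonzero loopback delay, the transmitter and receiver mismatches between chips, and any asymmetry between the two interconnect directions. Halving is exact only if all of them vanish.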
[0036] Using the on-chip TDUs (512, 514, 532, 536, 548, 550, 565, 580), DTX1, DRX1, DTX2, DRX2 and DLB can be measured. By passing data back and forth between the master IC (510) and the slave IC (530), the master IC (510) can determine the value of DD12+DD21. Assuming the two passive interconnect (525 and 555) delays are equal, DD12 is half of this sum, and the master-to-slave latency T_latency can then be determined using equation (2). For a bidirectional interface (525 and 555) made from matched printed circuit board (PCB) traces, DD12 and DD21 can reasonably be assumed equal. If fiber optic cables are used for the bidirectional interface (525 and 555), the delay asymmetry can be characterized over temperature and cable length and used to correct the value of DD12.
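A minimal sketch of this computation, assuming the master already holds T_round and the five on-chip delay measurements: subtracting the on-chip terms from T_round leaves DD12+DD21; with an asymmetry a = DD12 - DD21, DD12 = (DD12+DD21+a)/2, and equation (2) then gives T_latency. The `asymmetry_ps` parameter is a hypothetical stand-in for the characterized fiber correction (zero for matched PCB traces), and all names and numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class OnChipDelays:
    """Delays (ps) measured with the on-chip TDUs."""
    dtx1: float  # master transmitter
    drx1: float  # master receiver
    dtx2: float  # slave transmitter
    drx2: float  # slave receiver
    dlb: float   # slave loopback path


def master_to_slave_latency(t_round: float, d: OnChipDelays,
                            asymmetry_ps: float = 0.0) -> float:
    """T_latency = DTX1 + DD12 + DRX2, with DD12 recovered from T_round.

    asymmetry_ps is an assumed characterized value of DD12 - DD21.
    """
    # Everything in the round trip that is not interconnect delay.
    on_chip = d.dtx1 + d.drx2 + d.dlb + d.dtx2 + d.drx1
    interconnect_sum = t_round - on_chip          # DD12 + DD21
    dd12 = (interconnect_sum + asymmetry_ps) / 2  # solve for DD12
    return d.dtx1 + dd12 + d.drx2


# Hypothetical numbers: mismatched SerDes delays and a 700 ps loopback.
delays = OnChipDelays(dtx1=900.0, drx1=1100.0, dtx2=1000.0,
                      drx2=1300.0, dlb=700.0)
# Suppose DD12 = DD21 = 2000 ps, so T_round = 5000 + 4000 = 9000 ps.
print(master_to_slave_latency(9000.0, delays))  # 4200.0
```

With these numbers, simply halving T_round would give 4500 ps, 300 ps too high, which matches (700 + 100 - 200)/2 from the loopback and transmitter/receiver mismatch terms.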
[0037] Note that synchronization between network elements, connected via electrical traces on a PCB, electrical cables or fiber optic cables, is disclosed herein. However, these embodiments may be extended to wireless connections between network elements, such as RF links and free-space optics.
[0038] Although the present invention has been described in terms of specific exemplary embodiments, it will be appreciated that various modifications and alterations might be made by those skilled in the art without departing from the spirit and scope of the invention.