Methods and Apparatus for Fault Tolerance in Multi-Wavelength Optical Interconnect Networks
20230318701 · 2023-10-05
Inventors
- Soheil Hashemi (Broomfield, CO, US)
- Ryan Boesch (Louisville, CO, US)
- David R. Thomas (Boulder, CO, US)
Abstract
Systems and methods for robust fault tolerance against runtime failures in multi-wavelength optical links. The described embodiment relies on built-in lane redundancy, so that a failed lane can be detected and repaired during runtime while the link stays online. Failover signaling can use side-band or out-of-band communication.
Claims
1. Apparatus for detecting and repairing faults in an optical communication system, comprising: spaced-apart nodes connected by an optical fiber; each node having an optical engine comprising multiple optical transmitters and multiple optical receivers, including a redundant transmitter and a redundant receiver, a wavelength multiplexer/demultiplexer, and link control circuitry; wherein the optical transmitters emit at differing wavelengths and are coupled into the optical fiber through the wavelength multiplexer/demultiplexer, and additional wavelengths are demultiplexed from the optical fiber to the optical receivers through the wavelength multiplexer/demultiplexer; wherein a lane is defined as a transmitter at an optical engine at a first node, the optical fiber, and a receiver at an optical engine at a second node; wherein the link control circuitry is configured to detect a faulty lane while the apparatus is communicating; and wherein the link control circuitry at the first node and the link control circuitry at the second node are further configured to communicate with each other to identify the faulty lane, send data to a redundant lane formed by the redundant transmitter and the redundant receiver, deskew the redundant lane data, and turn off the faulty lane.
2. The apparatus of claim 1 wherein multiple lanes are designated as redundant lanes to replace multiple faulty lanes.
3. The apparatus of claim 1 wherein the optical multiplexer/demultiplexers are thin-film filter zig-zag multiplexer/demultiplexers with some filter bands reserved for redundant wavelengths.
4. The apparatus of claim 1 wherein each optical engine includes two links, each link comprising multiple optical transmitters and multiple optical receivers including a redundant transmitter and a redundant receiver, and link control circuitry.
5. The apparatus of claim 4 wherein each optical engine comprises 32 links.
6. The apparatus of claim 1 wherein the optical transmitters and optical receivers are configured to be surface normal to the optical engine.
7. The apparatus of claim 6 wherein optical transmitters and optical receivers are directly integrated on a silicon logic layer of an optical engine comprising physical and data link layers.
8. The apparatus of claim 6, wherein the optical transmitters are vertical-cavity surface-emitting lasers with cavities tuned for wavelengths partitioned across a wavelength band, some of those wavelengths being redundant; and wherein the receivers are broadband photodetectors responsive across the wavelength band.
9. A method of detecting and repairing faults in an optical communication system having nodes spaced apart from each other and connected via an optical fiber, the method comprising the steps of: providing at each node an optical engine comprising multiple primary optical transmitters and multiple primary optical receivers, one redundant optical transmitter and one redundant optical receiver, link control circuitry, and a wavelength multiplexer/demultiplexer; providing multiple primary lanes between the nodes, wherein a lane is defined as a primary transmitter at an optical engine at a far-side node, the optical fiber, and a primary receiver at an optical engine at a near-side node; communicating between the nodes via the primary lanes; transmitting from the optical transmitters at differing wavelengths; coupling the differing-wavelength transmissions into the optical fiber via the wavelength multiplexer/demultiplexer; demultiplexing the differing-wavelength transmissions from the optical fiber to the optical receivers via the wavelength multiplexer/demultiplexer; monitoring communication and detecting faulty primary lanes while communicating; identifying a faulty primary lane at the near-side receiver of the faulty primary lane; communicating a failover event for the faulty primary lane from the near side of the faulty primary lane to the far side of the faulty primary lane; creating a redundant lane using a redundant transmitter on the optical engine containing the primary transmitter of the faulty primary lane and a redundant receiver on the optical engine containing the primary receiver of the faulty primary lane; training the redundant lane; also sending the data of the far-side primary transmitter of the faulty primary lane on the far-side redundant transmitter of the redundant lane; deskewing the redundant lane based on data at the primary receiver of the faulty primary lane; and disabling the faulty primary lane after the deskewing step.
10. The method of claim 9 wherein multiple redundant transmitters and redundant receivers are provided to allow multiple redundant lanes to be created.
11. The method of claim 9 wherein the step of detecting a faulty lane evaluates communication errors.
12. The method of claim 11 wherein the step of detecting faulty lanes evaluates retry logs.
13. The method of claim 9 wherein the step of detecting faulty lanes evaluates error counts.
14. The method of claim 9 wherein the step of detecting faulty lanes performs an eye scan.
15. The method of claim 9 wherein the step of detecting faulty lanes utilizes an analog-to-digital converter (ADC) histogram.
16. The method of claim 9, wherein the failover event communicating step is performed using an idle redundant transmitter as a sideband channel.
17. The method of claim 9, wherein the failover event communicating step is performed using an out-of-band fabric manager.
18. The method of claim 9, wherein, during the deskew step, the data sent on the redundant lane mirrors the data sent on the faulty lane.
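Claims 11 through 15 list the evidence a fault detector may weigh: communication errors, retry logs, error counts, eye scans, and ADC histograms. The following is a minimal Python sketch, not the claimed implementation, of how link control logic might combine those signals into a per-lane verdict. The `LaneStats` and `Thresholds` types, every threshold value, and the NRZ histogram heuristic (counting samples that land near the mid-scale decision level) are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LaneStats:
    """Hypothetical per-lane health snapshot gathered by the link control logic."""
    lane_id: int
    crc_errors: int               # errors counted in the current monitoring window
    retries: int                  # link-layer retries attributed to this lane
    eye_height_mv: float          # vertical eye opening from an eye scan, in mV
    adc_histogram: Sequence[int]  # sample counts per ADC code (NRZ signaling assumed)


@dataclass
class Thresholds:
    """Illustrative limits only; a real design would calibrate these."""
    max_crc_errors: int = 10
    max_retries: int = 5
    min_eye_height_mv: float = 20.0
    max_ambiguous_fraction: float = 0.05
    guard_codes: int = 2  # ADC codes on each side of mid-scale counted as ambiguous


def ambiguous_fraction(hist: Sequence[int], guard: int) -> float:
    """Fraction of ADC samples that land close to the mid-scale decision threshold."""
    total = sum(hist)
    if total == 0:
        return 1.0  # no samples at all: treat the lane as dead
    mid = len(hist) // 2  # for an even code count, the threshold sits between mid-1 and mid
    near = sum(hist[max(0, mid - guard): mid + guard])
    return near / total


def faulty_lanes(stats: List[LaneStats], t: Thresholds) -> List[int]:
    """Return the ids of lanes where any health metric crosses its limit."""
    bad = []
    for s in stats:
        if (s.crc_errors > t.max_crc_errors
                or s.retries > t.max_retries
                or s.eye_height_mv < t.min_eye_height_mv
                or ambiguous_fraction(s.adc_histogram, t.guard_codes)
                > t.max_ambiguous_fraction):
            bad.append(s.lane_id)
    return bad


# Usage: a clean bimodal histogram, a collapsed eye, and a smeared histogram.
clean_hist = [0, 0, 500] + [0] * 10 + [500, 0, 0]  # 16 ADC codes
healthy = LaneStats(0, crc_errors=0, retries=0, eye_height_mv=80.0, adc_histogram=clean_hist)
degraded = LaneStats(1, crc_errors=3, retries=1, eye_height_mv=12.0, adc_histogram=clean_hist)
noisy = LaneStats(2, crc_errors=0, retries=0, eye_height_mv=50.0, adc_histogram=[100] * 16)
print(faulty_lanes([healthy, degraded, noisy], Thresholds()))  # [1, 2]
```

Here a lane is flagged as soon as any single metric crosses its limit; a practical detector would likely add hysteresis or longer observation windows so that a transient error burst does not trigger failover.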
Description
DETAILED DESCRIPTION OF THE INVENTION
Table 1
Ref.  Element
100   Carrier board or substrate
110   Link end-points
110A  Near-side link end-point
110B  Far-side link end-point
120   Electrical channel
130   Fabric manager
135   Out-of-band management fabric
150   Optical engine IC (OE)
150A  Near-side OE
150B  Far-side OE
151   Data interface
152   Per-link link control logic
153   Individual link
154   Optical element (transmit or receive)
155   Transmit optical element
156   Receive optical element
157   Redundant transmit element
158   Redundant receive element
159   Redundant lanes
160   Primary lanes
170   Node
200   Optical multiplexer/demultiplexer
250   Optical fiber
500   Initial link training
502   Normal operation
504   Failure detected?
506   Identify faulty lane
508   Signal to link partner to begin switching the problem lane to the redundant lane
510   Redundant lane training
512   Link partner mirrors data from problem lane to redundant lane; deskew redundant lane with this information
514   Link partner disables problem lane transmitter
[0022] Table 1 lists elements of the present invention and their corresponding reference numbers.
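Steps 500 through 514 in Table 1 describe the failover flow. The following is a hypothetical Python model of that sequence for a single link, assuming that traffic rides on logical lanes mapped onto physical lanes and that one spare physical lane stands in for the redundant transmitter/receiver pair. The `Step`, `Link`, and `handle_failure` names are illustrative; the model records only the order of steps and the lane remapping, not optics, training, or timing.

```python
from enum import Enum
from typing import Dict, List, Optional


class Step(Enum):
    """Flowchart steps, numbered as in Table 1."""
    INITIAL_LINK_TRAINING = 500
    NORMAL_OPERATION = 502
    FAILURE_DETECTED = 504
    IDENTIFY_FAULTY_LANE = 506
    SIGNAL_LINK_PARTNER = 508
    REDUNDANT_LANE_TRAINING = 510
    MIRROR_AND_DESKEW = 512
    DISABLE_FAULTY_LANE = 514


class Link:
    """Toy model of one link: logical lanes mapped onto physical lanes.

    Physical lanes 0..n_primary-1 are primary lanes; the rest are redundant spares.
    """

    def __init__(self, n_primary: int, n_redundant: int = 1) -> None:
        self.lane_map: Dict[int, int] = {i: i for i in range(n_primary)}
        self.spares: List[int] = list(range(n_primary, n_primary + n_redundant))
        self.disabled: List[int] = []
        self.notify_channel: Optional[str] = None
        self.log: List[Step] = [Step.INITIAL_LINK_TRAINING, Step.NORMAL_OPERATION]

    def handle_failure(self, faulty_physical: int, sideband_ok: bool = True) -> bool:
        """Run the failover flow for one faulty physical lane; return False if impossible."""
        self.log.append(Step.FAILURE_DETECTED)
        logical = next((lg for lg, ph in self.lane_map.items() if ph == faulty_physical), None)
        self.log.append(Step.IDENTIFY_FAULTY_LANE)
        if logical is None or not self.spares:
            return False  # unknown lane or no spare left: must be escalated elsewhere
        # Step 508: notify the far side over a sideband or out-of-band path (claims 16, 17).
        self.notify_channel = "sideband" if sideband_ok else "out-of-band fabric manager"
        self.log.append(Step.SIGNAL_LINK_PARTNER)
        spare = self.spares.pop(0)
        self.log.append(Step.REDUNDANT_LANE_TRAINING)
        # Step 512: the far side mirrors the faulty lane's data onto the spare,
        # so the near side can align the spare against the still-running primary.
        self.log.append(Step.MIRROR_AND_DESKEW)
        self.lane_map[logical] = spare
        # Step 514: only after deskew is the faulty lane taken out of service.
        self.disabled.append(faulty_physical)
        self.log.append(Step.DISABLE_FAULTY_LANE)
        self.log.append(Step.NORMAL_OPERATION)
        return True


link = Link(n_primary=4, n_redundant=1)
print(link.handle_failure(2), link.lane_map)  # True {0: 0, 1: 1, 2: 4, 3: 3}
print(link.handle_failure(0))                  # False: the only spare is already in use
```

Keeping a logical-to-physical map means upper layers never see the swap; with more than one spare (claims 2 and 10), `spares` simply holds several entries and each failure consumes one.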
[0028] Following lane training, in step 512 the selected far-side redundant transmitter 157 switches to mirroring the faulty lane's data, enabling the near-side receiver to deskew the redundant lane. Finally, the faulty lane is turned off in step 514, and the link resumes normal operation 502.
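Step 512 works because the redundant lane carries the same data as the faulty lane, so the near-side receiver can measure how far the redundant lane is offset from the primary. The document does not say how that offset is found; the Python sketch below assumes a simple bit-level search over a bounded skew window, scoring each candidate shift by the fraction of agreeing bits so that occasional errors on the failing lane do not derail alignment. The `estimate_skew` function and its `max_skew` parameter are hypothetical.

```python
import random
from typing import Sequence


def estimate_skew(primary: Sequence[int], redundant: Sequence[int], max_skew: int) -> int:
    """Return the bit shift k for which redundant[i + k] best matches primary[i].

    A positive k means the redundant lane lags the primary lane. Scoring by the
    fraction of agreeing bits tolerates a few errors on the degraded primary lane.
    """
    best_k, best_score = 0, -1.0
    for k in range(-max_skew, max_skew + 1):
        pairs = [(primary[i], redundant[i + k])
                 for i in range(len(primary)) if 0 <= i + k < len(redundant)]
        if not pairs:
            continue
        score = sum(a == b for a, b in pairs) / len(pairs)
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Usage: the far side mirrors the same bits onto both lanes; the redundant lane
# arrives 3 bits late and the failing primary lane has a few bit errors.
rng = random.Random(1)
bits = [rng.getrandbits(1) for _ in range(256)]
primary = list(bits)
for i in (10, 100, 200):
    primary[i] ^= 1  # injected errors on the faulty lane
redundant = [0, 0, 0] + bits
skew = estimate_skew(primary, redundant, max_skew=8)
aligned = redundant[skew:]  # valid here because skew >= 0 (the redundant lane lags)
print(skew)  # 3
```

The search window must be shorter than any repetition period of the mirrored data, or several shifts would score equally well. Hardware would more likely compare against known alignment markers or training patterns than scan raw payload, but the principle of matching the two copies is the same.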