HEATSINK DEBRIS DETECTION AND DISLODGING

20250305782 · 2025-10-02

    Abstract

    Systems and methods are provided for removing debris from a heatsink. A debris detection and dislodging module receives one or more sensor readings that exceed a configurable threshold at a heatsink of a component to be cooled. In response to the received sensor readings, the debris detection and dislodging module enables a controller to activate a dislodging apparatus to remove debris in fins of the heatsink in the component to be cooled. The dislodging apparatus sweeps the debris in the fins of the component to be cooled until the one or more sensor readings return to an expected level below the threshold. Coolant flowing through a coldplate assembly in the heatsink carries the debris out of the fins of the heatsink.

    Claims

    1. A method for removing debris from a heatsink comprising: receiving, by a debris detection and dislodging module, one or more sensor readings that exceed a configurable threshold at a heatsink of a component to be cooled; in response to the received sensor readings, enabling, by the debris detection and dislodging module, a controller to activate a dislodging apparatus to remove debris in fins of the heatsink in the component to be cooled; and sweeping, by the dislodging apparatus, the debris in the fins of the component to be cooled until the one or more sensor readings returns to an expected level below the threshold, and wherein coolant flowing through a coldplate assembly in the heatsink carries the debris out of the fins of the heatsink.

    2. The method of claim 1, wherein the debris detection and dislodging module generates a notification when a filter requires cleaning.

    3. The method of claim 1, wherein a duration of the sweeping is a configurable count of jobs completed within a configurable window amount of time, wherein a first event triggers starting a timer and accumulating a counter of jobs, and wherein the counter and timer are reset to zero when the time ends without the configurable count of jobs completing.

    4. The method of claim 1, wherein a sweeping is triggered based on extracted sensor data and utilization data of the component to be cooled exceeding a threshold percentage.

    5. The method of claim 1, wherein a utilization threshold is defined as a percent deviation from configured preset sensor readings.

    6. The method of claim 1, wherein the dislodging apparatus performs at least one sweep of the component to be cooled, each of the sweepings beginning at a position with an action of rotating downward to position parallel members between the fins of the heatsink and drawing the dislodging apparatus through the component to be cooled before returning to the beginning position.

    7. The method of claim 1, wherein the dislodging apparatus performs at least one sweep of the component to be cooled, each of the sweepings being an action of moving through the component to be cooled from an initial position until the full area between each of the fins of the component to be cooled is swept before returning to the initial position.

    8. The method of claim 1, wherein the debris detection and dislodging module detects a difference between a baseline value and a configurable threshold to determine debris is present and a dislodge event is required.

    9. A computer program product for removing debris from a heatsink, the computer program product comprising a non-transitory tangible storage device having program code embodied therewith, the program code executable by a processor of a computer to perform a method, the method comprising: receiving, by a debris detection and dislodging module, one or more sensor readings that exceed a configurable threshold at a heatsink of a component to be cooled; in response to the received sensor readings, enabling, by the debris detection and dislodging module, a controller to activate a dislodging apparatus to remove debris in fins of the heatsink in the component to be cooled; and sweeping, by the dislodging apparatus, the debris in the fins of the component to be cooled until the one or more sensor readings returns to an expected level below the threshold, and wherein coolant flowing through a coldplate assembly in the heatsink carries the debris out of the fins of the heatsink.

    10. The computer program product of claim 9, wherein the debris detection and dislodging module generates a notification when the filter requires cleaning.

    11. The computer program product of claim 9, further comprising one or more sensors, wherein the sensors include different sensors that measure temperature, power, pressure, or flow.

    12. The computer program product of claim 9, wherein a duration of the sweeping is a preset amount of time.

    13. The computer program product of claim 9, wherein the dislodging apparatus performs more than one sweep of the component to be cooled.

    14. The computer program product of claim 9, wherein each of the sweepings is an action of moving through the component to be cooled from an initial position until the full area between each of the fins of the component to be cooled is swept before returning to the initial position.

    15. The computer program product of claim 9, wherein the debris detection and dislodging module detects a difference between a baseline value and the threshold to determine debris is present and a dislodge event is required.

    16. A computer system removing debris from a heatsink, comprising: one or more processors; a memory coupled to at least one of the processors; a set of computer program instructions stored in the memory and executed by at least one of the processors in order to perform actions of: receiving, by a debris detection and dislodging module, one or more sensor readings that exceed a configurable threshold at a heatsink of a component to be cooled; in response to the received sensor readings, enabling, by the debris detection and dislodging module, a controller to activate a dislodging apparatus to remove debris in fins of the heatsink in the component to be cooled; and sweeping, by the dislodging apparatus, the debris in the fins of the component to be cooled until the one or more sensor readings returns to an expected level below the threshold, and wherein coolant flowing through a coldplate assembly in the heatsink carries the debris out of the fins of the heatsink.

    17. The computer system of claim 16, wherein the debris detection and dislodging module generates a notification when the filter requires cleaning.

    18. The computer system of claim 16, further comprising one or more sensors, wherein the sensors include different sensors that measure temperature, power, pressure, or flow.

    19. The computer system of claim 16, wherein each of the sweepings is an action of moving through the component to be cooled from an initial position until the full area between each of the fins of the component to be cooled is swept before returning to the initial position.

    20. The computer system of claim 16, wherein the debris detection and dislodging module detects a difference between a baseline value and the threshold to determine debris is present and a dislodge event is required.

    Description

    BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

    [0005] Embodiments are further directed to computer systems and computer program products having substantially the same features as the above-described computer-implemented method.

    [0006] FIG. 1 illustrates the operating environment of a computer server embodying a system for heatsink debris detection and dislodging;

    [0007] FIG. 2 illustrates a network diagram of a system for detecting and dislodging heatsink debris;

    [0008] FIG. 3A illustrates a dislodging apparatus in its rest position during normal operation;

    [0009] FIG. 3B illustrates a dislodging apparatus in its uppermost position during dislodging action;

    [0010] FIG. 3C illustrates an alternate embodiment of a dislodging apparatus;

    [0011] FIG. 3D illustrates a view of the parallel members of FIGS. 3C and 3E;

    [0012] FIG. 3E illustrates an alternate embodiment of a dislodging apparatus;

    [0013] FIG. 4 illustrates a flow chart for the operation of a debris detection and dislodging module; and

    [0014] FIG. 5 illustrates a high-level implementation of a cooling loop.

    DETAILED DESCRIPTION OF THE INVENTION

    [0015] Heatsinks within water-cooled systems represent some of the narrowest channels through which water must flow. These heatsink channels increase the surface area that contacts the chilled water, enabling heat transfer away from the chip. The fins, shown in the heatsinks in FIGS. 3A, 3B, 3C, and 3E, form the heatsink channels, and are primarily where blockages in the water-cooled systems occur. Coldplate heatsink assemblies use both rigid piping and flexible hose systems. The coldplate heatsink assemblies include a constant flow liquid cooling loop, such as that illustrated in FIG. 5, to continuously pull heat away from the component to be cooled. Systems experience an expected amount of wear during normal operation. Metal shavings, small components of the system, and other random debris can detach and begin circulating in the system before collecting in the heatsink channels and creating blockages.

    [0016] These blockages increase water pressure and decrease flow rate, which contribute to increases in operating temperatures. As a result, the chip can fail in the field, causing the system in which it is installed to fail as well.

    [0017] Embodiments of the present invention increase the lifespan of the chip in the system by dislodging small metal pieces and similar debris (debris) before blockages occur in the heatsink channels. The various embodiments include a dislodging apparatus that automatically removes the debris from the heatsink, thereby extending the lifespan of the chip by preventing overheating. When the dislodging apparatus is employed within a coldplate heatsink, the debris is dislodged from between the fins in situ, without opening the system. Embodiments include a method to detect a blockage and cause the dislodging apparatus to eject the blockage such that heatsink cooling is optimized. These novel structural changes are possible without significantly altering flow rate through the system and are controlled via a novel method of blockage remediation without opening the closed system.

    [0018] Embodiments of the present invention can be implemented generally in environments where heatsinks are present, for example, by heatsink manufacturers, electronics manufacturers, and coldplate manufacturers. While embodiments of the present invention are primarily directed to coldplate heatsinks, they can apply to air-cooled heatsinks, as long as there is a device downstream to catch any debris. In that case, air from fans, rather than liquid, would flow between the fins of the heatsink.

    [0019] For example, FIG. 5 is a high-level image of a representative cooling loop 500 where embodiments of the present invention can be practiced. This embodiment shows a closed liquid cooling loop. Here, the pumps drive hot liquid up through the glycol-to-air heat exchanger. The liquid is hot, having drawn heat away from the processors using the coldplate heatsinks in the processor drawers (not shown). Fans blow across the glycol-to-air heat exchanger to cool the liquid, after which the cooled liquid is pumped up through supply manifold hoses (hoses not shown) to the coldplate heatsinks in the processor drawers. In some embodiments, there are four coldplate heatsinks, but other embodiments may have any number of coldplate heatsinks depending on the implementation details. The return path between the processor drawers and the pumps is a preferred location for a filter, although the filter can be elsewhere in the loop.

    [0020] Turning to FIG. 1, an illustration is presented of the operating environment of a networked computer, according to an embodiment of the present invention.

    [0021] Computing environment 100 contains an example of an environment for the execution of at least some of the computer code involved in performing the inventive methods, such as a system for heatsink debris detection and dislodging 200 (system), embodied in the debris detection and dislodging module 210 and the dislodging apparatus 255 of the coldplate assembly 250. In addition to block 200, computing environment 100 includes, for example, computer 101, wide area network (WAN) 102, end user device (EUD) 103, remote server 104, public cloud 105, and private cloud 106. In this embodiment, computer 101 includes processor set 110 (including processing circuitry 120 and cache 121), communication fabric 111, volatile memory 112, persistent storage 113 (including operating system 122 and block 200, as identified above), peripheral device set 114 (including user interface (UI) device set 123, storage 124, and Internet of Things (IoT) sensor set 125), and network module 115. Remote server 104 includes remote database 130. Public cloud 105 includes gateway 140, cloud orchestration module 141, host physical machine set 142, virtual machine set 143, and container set 144.

    [0022] COMPUTER 101 may take the form of a desktop computer, laptop computer, tablet computer, smart phone, smart watch or other wearable computer, mainframe computer, quantum computer or any other form of computer or mobile device now known or to be developed in the future that is capable of running a program, accessing a network, or querying a database, such as remote database 130. As is well understood in the art of computer technology, and depending upon the technology, performance of a computer-implemented method may be distributed among multiple computers and/or between multiple locations. On the other hand, in this presentation of computing environment 100, detailed discussion is focused on a single computer, specifically computer 101, to keep the presentation as simple as possible. Computer 101 may be located in a cloud, even though it is not shown in a cloud in FIG. 1. On the other hand, computer 101 is not required to be in a cloud except to any extent as may be affirmatively indicated.

    [0023] PROCESSOR SET 110 includes one, or more, computer processors of any type now known or to be developed in the future. Processing circuitry 120 may be distributed over multiple packages, for example, multiple, coordinated integrated circuit chips. Processing circuitry 120 may implement multiple processor threads and/or multiple processor cores. Cache 121 is memory that is located in the processor chip package(s) and is typically used for data or code that should be available for rapid access by the threads or cores running on processor set 110. Cache memories are typically organized into multiple levels depending upon relative proximity to the processing circuitry. Alternatively, some, or all, of the cache for the processor set may be located off chip. In some computing environments, processor set 110 may be designed for working with qubits and performing quantum computing.

    [0024] Computer readable program instructions are typically loaded onto computer 101 to cause a series of operational steps to be performed by processor set 110 of computer 101 and thereby effect a computer-implemented method, such that the instructions thus executed will instantiate the methods specified in flowcharts and/or narrative descriptions of computer-implemented methods included in this document (collectively referred to as the inventive methods). These computer readable program instructions are stored in various types of computer readable storage media, such as cache 121 and the other storage media discussed below. The program instructions, and associated data, are accessed by processor set 110 to control and direct performance of the inventive methods. In computing environment 100, at least some of the instructions for performing the inventive methods may be stored in block 200 in persistent storage 113.

    [0025] COMMUNICATION FABRIC 111 is the signal conduction paths that allow the various components of computer 101 to communicate with each other. Typically, this fabric is made of switches and electrically conductive paths, such as the switches and electrically conductive paths that make up busses, bridges, physical input/output ports and the like. Other types of signal communication paths may be used, such as fiber optic communication paths and/or wireless communication paths.

    [0026] VOLATILE MEMORY 112 is any type of volatile memory now known or to be developed in the future. Examples include dynamic type random access memory (RAM) or static type RAM. Typically, the volatile memory is characterized by random access, but this is not required unless affirmatively indicated. In computer 101, the volatile memory 112 is located in a single package and is internal to computer 101, but, alternatively or additionally, the volatile memory may be distributed over multiple packages and/or located externally with respect to computer 101.

    [0027] PERSISTENT STORAGE 113 is any form of non-volatile storage for computers that is now known or to be developed in the future. The non-volatility of this storage means that the stored data is maintained regardless of whether power is being supplied to computer 101 and/or directly to persistent storage 113. Persistent storage 113 may be a read only memory (ROM), but typically at least a portion of the persistent storage allows writing of data, deletion of data and re-writing of data. Some familiar forms of persistent storage include magnetic disks and solid-state storage devices. Operating system 122 may take several forms, such as various known proprietary operating systems or open-source Portable Operating System Interface type operating systems that employ a kernel. The code included in block 200 typically includes at least some of the computer code involved in performing the inventive methods.

    [0028] PERIPHERAL DEVICE SET 114 includes the set of peripheral devices of computer 101. Data communication connections between the peripheral devices and the other components of computer 101 may be implemented in various ways, such as Bluetooth connections, Near-Field Communication (NFC) connections, connections made by cables (such as universal serial bus (USB) type cables), insertion type connections (for example, secure digital (SD) card), connections made through local area communication networks and even connections made through wide area networks such as the internet. In various embodiments, UI device set 123 may include components such as a display screen, speaker, microphone, wearable devices (such as goggles and smart watches), keyboard, mouse, printer, touchpad, game controllers, and haptic devices. Storage 124 is external storage, such as an external hard drive, or insertable storage, such as an SD card. Storage 124 may be persistent and/or volatile. In some embodiments, storage 124 may take the form of a quantum computing storage device for storing data in the form of qubits. In embodiments where computer 101 is required to have a large amount of storage (for example, where computer 101 locally stores and manages a large database) then this storage may be provided by peripheral storage devices designed for storing very large amounts of data, such as a storage area network (SAN) that is shared by multiple, geographically distributed computers. IoT sensor set 125 is made up of sensors that can be used in Internet of Things applications. For example, one sensor may be a thermometer and another sensor may be a motion detector.

    [0029] NETWORK MODULE 115 is the collection of computer software, hardware, and firmware that allows computer 101 to communicate with other computers through WAN 102. Network module 115 may include hardware, such as modems or Wi-Fi signal transceivers, software for packetizing and/or de-packetizing data for communication network transmission, and/or web browser software for communicating data over the internet. In some embodiments, network control functions and network forwarding functions of network module 115 are performed on the same physical hardware device. In other embodiments (for example, embodiments that utilize software-defined networking (SDN)), the control functions and the forwarding functions of network module 115 are performed on physically separate devices, such that the control functions manage several different network hardware devices. Computer readable program instructions for performing the inventive methods can typically be downloaded to computer 101 from an external computer or external storage device through a network adapter card or network interface included in network module 115.

    [0030] WAN 102 is any wide area network (for example, the internet) capable of communicating computer data over non-local distances by any technology for communicating computer data, now known or to be developed in the future. In some embodiments, the WAN may be replaced and/or supplemented by local area networks (LANs) designed to communicate data between devices located in a local area, such as a Wi-Fi network. The WAN and/or LANs typically include computer hardware such as copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and edge servers.

    [0031] END USER DEVICE (EUD) 103 is any computer system that is used and controlled by an end user (for example, an administrator that operates computer 101), and may take any of the forms discussed above in connection with computer 101. For example, EUD 103 can be the external application by which an end user connects to the control node through WAN 102. In some embodiments, EUD 103 may be a client device, such as thin client, heavy client, mainframe computer, desktop computer and so on.

    [0032] REMOTE SERVER 104 is any computer system that serves at least some data and/or functionality to computer 101. Remote server 104 may be controlled and used by the same entity that operates computer 101. Remote server 104 represents the machine(s) that collect and store helpful and useful data for use by other computers, such as computer 101. For example, in a hypothetical case where computer 101 is designed and programmed to provide a recommendation based on historical data, then this historical data may be provided to computer 101 from remote database 130 of remote server 104.

    [0033] PUBLIC CLOUD 105 is any computer system available for use by multiple entities that provides on-demand availability of computer system resources and/or other computer capabilities, especially data storage (cloud storage) and computing power, without direct active management by the user. Cloud computing typically leverages sharing of resources to achieve coherence and economies of scale. The direct and active management of the computing resources of public cloud 105 is performed by the computer hardware and/or software of cloud orchestration module 141. The computing resources provided by public cloud 105 are typically implemented by virtual computing environments that run on various computers making up the computers of host physical machine set 142, which is the universe of physical computers in and/or available to public cloud 105. The virtual computing environments (VCEs) typically take the form of virtual machines from virtual machine set 143 and/or containers from container set 144. It is understood that these VCEs may be stored as images and may be transferred among and between the various physical machine hosts, either as images or after instantiation of the VCE. Cloud orchestration module 141 manages the transfer and storage of images, deploys new instantiations of VCEs and manages active instantiations of VCE deployments. Gateway 140 is the collection of computer software, hardware, and firmware that allows public cloud 105 to communicate through WAN 102.

    [0034] Some further explanation of virtualized computing environments (VCEs) will now be provided. VCEs can be stored as images. A new active instance of the VCE can be instantiated from the image. Two familiar types of VCEs are virtual machines and containers. A container is a VCE that uses operating-system-level virtualization. This refers to an operating system feature in which the kernel allows the existence of multiple isolated user-space instances, called containers. These isolated user-space instances typically behave as real computers from the point of view of programs running in them. A computer program running on an ordinary operating system can utilize all resources of that computer, such as connected devices, files and folders, network shares, CPU power, and quantifiable hardware capabilities. However, programs running inside a container can only use the contents of the container and devices assigned to the container, a feature which is known as containerization.

    [0035] PRIVATE CLOUD 106 is similar to public cloud 105, except that the computing resources are only available for use by a single enterprise. While private cloud 106 is depicted as being in communication with WAN 102, in other embodiments a private cloud may be disconnected from the internet entirely and only accessible through a local/private network. A hybrid cloud is a composition of multiple clouds of different types (for example, private, community or public cloud types), often respectively implemented by different vendors. Each of the multiple clouds remains a separate and discrete entity, but the larger hybrid cloud architecture is bound together by standardized or proprietary technology that enables orchestration, management, and/or data/application portability between the multiple constituent clouds. In this embodiment, public cloud 105 and private cloud 106 are both part of a larger hybrid cloud.

    [0036] FIG. 2 shows a network diagram for a system 200 for detecting and dislodging heatsink debris.

    [0037] The system 200 includes a debris detection and dislodging module 210, one or more sensors 215, and a printed circuit board (PCB) 220, which are all interconnected via a wired and/or wireless network 205. The system 200 also contains at least one motor/actuator 245, a coldplate assembly 250, and a component to be cooled 270. In this context, an actuator is a device that moves an object to a different position, typically in a linear motion, thereby actuating the motion. A motor translates an energy source into rotational motion. In different embodiments, a linear or rotational motor can be used. The actuator can be combined with the motor to move the dislodging apparatus 255. Additionally, embodiments of the present invention can be practiced using one or more motors, one or more actuators, or a combination of motors and actuators.

    [0038] The network 205 may be any wired and/or wireless communication protocol that allows data to be transferred between components of the system (e.g., PCIe, I²C, Bluetooth, Wi-Fi, Cellular (e.g., 3G, 4G, 5G), Ethernet, fiber optics, etc.).

    [0039] The debris detection and dislodging module 210 reads data from the sensors 215 to determine when debris is caught in the heatsink 260 and controls, via motor controller 225, the motor/actuator 245 connected to the dislodging apparatus 255. The debris detection and dislodging module 210 is contained in a memory module (not shown) on the PCB 220 or in another computing device (not shown). The method executed by the debris detection and dislodging module 210 is described further with reference to FIG. 4.
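
    As a minimal sketch of this control flow, and not the implementation of FIG. 4, the loop below reads a sensor, compares the reading against a configurable threshold, and triggers sweeps until the reading returns below the threshold. The function name `sweep_until_clear`, the `read_sensor` and `run_sweep` callables, and the `max_sweeps` safety cap are hypothetical names introduced only for illustration.

```python
from typing import Callable


def sweep_until_clear(
    read_sensor: Callable[[], float],
    run_sweep: Callable[[], None],
    threshold: float,
    max_sweeps: int = 10,
) -> int:
    """Sweep while the reading exceeds the threshold; return the sweep count.

    All names are illustrative assumptions, not taken from the specification.
    max_sweeps is a hypothetical safety cap so the loop always terminates.
    """
    sweeps = 0
    while read_sensor() > threshold and sweeps < max_sweeps:
        run_sweep()  # stands in for asserting the motor enable signal
        sweeps += 1
    return sweeps


# Simulated readings (assumed to be temperatures) that fall after each sweep.
readings = iter([82.0, 79.5, 74.0, 74.0])
count = sweep_until_clear(lambda: next(readings), lambda: None, threshold=75.0)
```

    In this simulated run the loop stops as soon as a reading falls to or below the assumed threshold. A real module would likely also wait between sweeps for the reading to settle.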

    [0040] The sensors 215 include temperature, flow, and pressure sensors, or a combination thereof, located on the component to be cooled 270 (e.g., processor, integrated circuit (IC) module). The sensors 215 detect a change in condition, e.g., an increase in temperature or a decrease in flow, under similar past utilization/conditions, thereby indicating that the heatsink 260 is not performing as efficiently as it had previously.

    [0041] The system 200 and the component to be cooled 270, either separately or together, can track data from similar past utilization/conditions. This data includes utilization, logs, metrics, events, telemetry data, and environmental conditions. Much of this data is already being tracked on computer systems such as the computing environment 100 and is accessible through the computer 101 single point of control, such as the support element (SE) or baseboard management controller (BMC). The data from the sensors 215 are tracked in addition to this utilization/condition data. In some embodiments, additional sensors may not be needed as there may already be thermal diodes on the component being cooled 270 to track temperature. Additionally, there may already be flow sensors in the cooling loop that produce data for other purposes, but that the system 200 can capture and incorporate.

    [0042] When the sensors 215 detect such a change, it is possible that the heatsink 260 has debris caught between its fins. In one or more embodiments, the sensors 215 may be flow sensors in the cooling path of the coldplate, and a change in flow under similar past conditions indicates that the heatsink 260 is not performing as efficiently as it had previously and may have debris caught between its fins.
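
    The comparison against similar past conditions can be sketched as a percent deviation from a stored baseline. This is an illustrative sketch under stated assumptions: the function names, the utilization buckets, the flow values, and the 10 percent limit are all hypothetical and are not specified in the description.

```python
def percent_deviation(baseline: float, current: float) -> float:
    """Signed percent change of the current reading relative to the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / abs(baseline) * 100.0


def debris_suspected(baseline: float, current: float, max_deviation_pct: float) -> bool:
    """True when the reading deviates from the baseline by more than the limit."""
    return abs(percent_deviation(baseline, current)) > max_deviation_pct


# Hypothetical baselines recorded at similar past utilization levels,
# keyed by utilization bucket (percent), with flow in litres per minute.
flow_baseline_lpm = {50: 4.0, 75: 4.0, 100: 4.0}

current_flow_lpm = 3.4  # assumed reading at roughly 75% utilization
flagged = debris_suspected(
    flow_baseline_lpm[75], current_flow_lpm, max_deviation_pct=10.0
)
```

    Keying the baseline by utilization keeps the comparison between like conditions, so a temperature rise caused by a heavier workload is not mistaken for a blockage.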

    [0043] The PCB 220 is the electronics board where signals exist and where components are mounted that control and power motor/actuator 245 connected to dislodging apparatus 255. Other embodiments may not use a PCB 220 directly. For example, a coldplate manifold can have a heatsink, but also provide cooling for components downstream in the airflow or cooling loop path. A heatsink may also be connected to a component not mounted to a board, such as for a component mounted directly off the wafer, or for a component not mounted to a PCB 220, such as a motor.

    [0044] In one or more embodiments, the component to be cooled 270 is mounted on the PCB 220. In one or more embodiments, one or more components 225-240 are separate components that are not connected to a PCB 220. For example, a coldplate manifold could have a heatsink, but also provide cooling for components downstream in the airflow or cooling loop path. A heatsink may also be connected to a component not mounted to a board (e.g., directly off the wafer or another component not mounted to a PCB such as a motor or battery pack).

    [0045] The PCB 220 contains the motor controller 225, a motor enable signal 230, power 235, and a ground 240.

    [0046] The motor controller 225 receives program instructions from the debris detection and dislodging module 210 to generate a motor enable signal 230 for the motor/actuator 245. The power 235 and ground 240 provide the required power to the motor/actuator 245 when required. In one or more embodiments, the power required is dependent upon the weight of the dislodging apparatus 255 and the fluid flow rate through the coldplate assembly 250.

    [0047] The motor/actuator 245 that controls the dislodging apparatus 255 must overcome the gravitational and frictional forces acting on the dislodging apparatus 255, as well as the force of the fluid flowing past it. The motor/actuator 245 should be sized to overcome these forces. A small heatsink 260 with few fins and a low flow rate requires only a small motor with less output power, whereas a large heatsink 260 with many fins and a high flow rate needs more power to operate efficiently, so that the dislodging apparatus 255 neither overloads the motor/actuator 245 nor is stopped by the opposing force.
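
    The sizing consideration can be illustrated with a rough force budget. This is a sketch only: the function name `required_force_n`, the safety factor, and every numeric value are hypothetical, and the specification gives no formula. It simply sums the three opposing forces named above (weight, friction, and fluid force) and applies a margin.

```python
G = 9.81  # standard gravity, m/s^2


def required_force_n(
    mass_kg: float,
    friction_n: float,
    fluid_drag_n: float,
    safety_factor: float = 1.5,
) -> float:
    """Estimate the force (N) the motor/actuator must supply to lift the apparatus.

    Sums the opposing forces named in the text (weight, friction, fluid force)
    and applies a hypothetical safety margin. All values are illustrative.
    """
    return (mass_kg * G + friction_n + fluid_drag_n) * safety_factor


# Assumed figures for a small heatsink with low flow and a large one with high flow.
small = required_force_n(mass_kg=0.01, friction_n=0.05, fluid_drag_n=0.02)
large = required_force_n(mass_kg=0.05, friction_n=0.20, fluid_drag_n=0.30)
```

    Under these assumed figures the larger apparatus needs several times the force of the smaller one, which matches the qualitative point that motor size scales with heatsink size and flow rate.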

    [0048] In one or more embodiments, the motor/actuator 245 may operate off power planes (e.g., 3.3Vdc, 5Vdc, 12Vdc) through a connector mounted to the PCB 220 or may be wired directly to pins/vias on the PCB 220. The motor/actuator 245 controls the dislodging apparatus 255 upon receiving a motor enable signal 230 from the motor controller 225. In one or more embodiments, a motor is used to move the dislodging apparatus 255. In one or more embodiments, a linear or rotational actuator is used to move the dislodging apparatus 255. For example, an actuator is effective where more power is required to move the dislodging apparatus 255 along the linear path along the guide rails 310. A linear motor would be more beneficial if low force is sufficient in the given application.

    [0049] In one or more embodiments, a plurality of motors/actuators 245 can be used to ensure that the dislodging apparatus 255 remains on a desired track (e.g., does not tilt, cantilever, or break).

    [0050] FIGS. 3A and 3B illustrate a dislodging apparatus 255. FIG. 3A illustrates the dislodging apparatus 255 in its rest position during normal operation. FIG. 3B illustrates the dislodging apparatus 255 in its uppermost position during a dislodging action. The dislodging apparatus 255 can be at an acute angle to the base of the heatsink 260, for example 1 mm. In this case, one side of the dislodging apparatus 255 raises up slightly before the other side to form the acute angle, then the entire dislodging apparatus 255 raises up and down while maintaining the acute angle.

    [0051] The coldplate assembly 250 includes the dislodging apparatus 255, the heatsink 260, a filter 265, guide rails 310, and bumpers 320. The return path between the processor drawers and the pumps is a preferred location for the filter 265, although the filter 265 can be elsewhere in the cooling loop.

    [0052] The guide rails 310 ensure that the dislodging apparatus 255 remains straight when activated. The bumpers 320 ensure that the dislodging apparatus 255 and the heatsink 260 do not collide and cause damage to one another. The guide rails 310 and bumpers 320 are each optional. In some embodiments, the linear arm of the motor/actuator 245 can be used in place of these standoffs. When using the guide rails 310 and the bumpers 320, the arm(s) of the motor/actuator 245 can attach to the dislodging apparatus 255 at any point on the dislodging apparatus 255 to move it up and down.

    [0053] The heatsink 260 is located in the liquid flow path of the coldplate assembly 250 and helps to spread heat from the component to be cooled 270 for optimized cooling.

    [0054] The filter 265 is located downstream of the heatsink 260 within the coldplate assembly 250. The filter 265 catches any debris 325 that was dislodged from the heatsink 260 by the dislodging apparatus 255.

    [0055] The component to be cooled 270 (e.g., processor, IC package) is the entity that requires liquid cooling by the coldplate assembly 250. In one or more embodiments, the component to be cooled 270 is mounted to the PCB 220.

    [0056] FIG. 3B illustrates the dislodging apparatus 255 in its uppermost position halfway through a dislodging action with debris 325 removed from the system. The dislodging action ends with the return of the dislodging apparatus 255 to the position shown in FIG. 3A.

    [0057] In FIG. 3B, in response to a dislodge event being triggered when the sensors 215 detect a change in condition beyond a threshold, indicating debris 325, the debris detection and dislodging module 210 sends instructions to the dislodging apparatus 255 to start in the position shown in FIG. 3A, rise to the position shown in FIG. 3B, and return to the position shown in FIG. 3A. During the movement of the dislodging apparatus 255, debris 325 that was lodged between the heatsink 260 fins is jarred free, such that liquid flowing through the coldplate assembly 250 over the heatsink 260 carries the debris 325 to the filter 265.

    [0058] FIG. 3C illustrates an alternate embodiment of a dislodging apparatus 255. In the figure, the dislodging apparatus 255 rotates down into position as shown by arrow 312. The dislodging apparatus 255 comprises at least one perpendicular member positioned adjacent to and perpendicular to the fins of the heatsink 260. The dislodging apparatus 255 is located at the end of the heatsink 260 opposite the end that is attached to the component to be cooled 270.

    [0059] At least one straight parallel member 313 for each channel is attached to the perpendicular member such that the parallel member 313 is positioned in the channel formed between two fins and parallel to the fins (x-axis). Subsequently upon activation, the perpendicular member of the dislodging apparatus 255 rotates downward in the direction shown by arrow 312 causing the attached parallel members 313 to change position from parallel to the fins to perpendicular to the fins (y-axis). The dislodging apparatus 255 is drawn through the fins of the heatsink 260 (either by actuator, motor, or combined actuator/motor) in the direction 311, thereby causing the parallel members 313 to dislodge and remove the debris 325. It should be noted that in some implementations, the dislodging apparatus 255 can be drawn in direction 311A as well or can be repeatedly drawn through the fins of the heatsink 260 as needed.

    [0060] FIG. 3D is a view of hook-like features that are implemented in some embodiments on the parallel members 313 of FIG. 3C and the curved members 314 of FIG. 3E. In FIG. 3D, views a and c are profile views of the hook-like features, and view b is a front view. These hook-like features are sized to occupy the channels between the fins of the heatsink 260 to catch and sweep away debris 325. While the hook-like features are shown as barbs, other shapes are possible, depending on the implementation.

    [0061] FIG. 3E illustrates an alternate embodiment of a dislodging apparatus 255. In the figure, elements numbered similarly to those in FIG. 3C perform similar functions. The dislodging apparatus 255 comprises at least one perpendicular member positioned adjacent to and perpendicular to the fins of the heatsink 260. The dislodging apparatus 255 is located at the end of the heatsink 260 opposite the end that is attached to the component to be cooled 270.

    [0062] At least one curved member 314 for each channel is attached to the perpendicular member such that the curved member 314 is in the channel formed between two fins and is parallel to the fins (x-axis). Subsequently upon activation, the perpendicular member of the dislodging apparatus 255 rotates downward into the position shown by arrow 312, causing the attached curved members 314 to change position from parallel to the fins to perpendicular to the fins (y-axis). In some embodiments, the dislodging apparatus 255 is drawn through the fins of the heatsink 260 (either by actuator, motor, or combined actuator/motor) in the direction 311, thereby causing the curved members 314 to dislodge and remove the debris 325. It should be noted that in some implementations, the dislodging apparatus 255 can be drawn in direction 311A as well or can be repeatedly drawn through the fins of the heatsink 260 in either or both directions as needed. In some implementations, the multiple rows of curved members 314 can be designed such that their downward rotation occupies all the space between the fins, so that the dislodging apparatus 255 need not be drawn through the heatsink 260.

    [0063] Three rows of curved members are shown. However, in both FIGS. 3C and 3E, any number of rows can be implemented.

    [0064] Although both parallel members 313 and curved members 314 are shown, other configurations of members are possible, depending on the implementation and the component to be cooled.

    [0065] FIG. 4 illustrates a flow chart for the operation of the debris detection and dislodging module 210 of the system 200. The debris detection and dislodging module 210 reads data from the sensors 215 to determine when debris is caught in the heatsink 260. The debris detection and dislodging module 210 controls the motor/actuator 245, via the motor controller 225, connected to the dislodging apparatus 255 to dislodge debris.

    [0066] Beginning at 405, the debris detection and dislodging module 210 disables the motor enable signal 230 that controls the motor/actuator 245 via the motor controller 225 (i.e., turns off the motor/actuator 245). In some embodiments, a switch can be implemented to remove the power 235. In one or more embodiments, the debris detection and dislodging module 210 ensures that the dislodging apparatus 255 is in a desired state before disabling the motor enable signal 230 from the motor controller 225 (i.e., the dislodging apparatus 255 is all the way up or all the way down such that it does not hinder the performance of the heatsink 260 by blocking flow).

    [0067] In some embodiments, the position of the dislodging apparatus 255 is detected by reading sensors 215. Additional sensors 215 monitor the location of the dislodging apparatus 255, for example, by extracting the linear position of the motor/actuator 245 arm(s). In some embodiments, the position is instead inferred from the expected time that it takes for the dislodging apparatus 255 to complete one sweep, with an additional buffer added in case a piece of debris slows down movement of the dislodging apparatus 255. The debris detection and dislodging module 210 extracts sensor data along with the current utilization of the component to be cooled. A utilization threshold can be defined as a percent deviation from typical sensor readings under similar utilization (e.g., >5% temperature or pressure increase from the average under similar utilization).
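
    The percent-deviation threshold described above can be sketched as follows. This is a minimal illustration only: the function names, the grouping of historical readings into 10% utilization buckets, and the sample values are assumptions that do not come from this disclosure.

```python
from statistics import mean


def baseline_for(history, utilization_pct, bucket_width=10):
    """Return the average historical reading for the matching utilization bucket.

    `history` is a hypothetical dict mapping a bucket index to a list of past
    readings taken under similar utilization. Returns None if no data exists.
    """
    bucket = int(utilization_pct // bucket_width)
    samples = history.get(bucket)
    return mean(samples) if samples else None


def exceeds_deviation_threshold(reading, baseline, max_deviation_pct=5.0):
    """Return True if `reading` is more than `max_deviation_pct` above `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    deviation_pct = (reading - baseline) / baseline * 100.0
    return deviation_pct > max_deviation_pct


# Placeholder temperature history for the 30%-40% utilization bucket.
history = {3: [60.0, 62.0]}
baseline = baseline_for(history, utilization_pct=35)
triggered = exceeds_deviation_threshold(65.0, baseline)
```

    The sketch checks only for an increase, matching the temperature and pressure example in the text. A flow sensor would presumably need the comparison reversed, since a blockage would be expected to reduce flow; that is an inference rather than something stated here.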

    [0068] In a simpler embodiment, the threshold can be set based on preset conditions derived from lab/development data (e.g., at <10% utilization sensor readings should not exceed x, and at 10%-20% utilization sensor readings should not exceed y).
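
    A preset table of this kind might be implemented as a simple band lookup. In the sketch below, the band boundaries, the limit values standing in for x and y, and the function name are placeholders chosen for illustration; real values would come from the lab/development data mentioned above.

```python
import bisect

# Hypothetical table: upper bound of each utilization band (percent) and the
# maximum permitted sensor reading for that band. Values are placeholders.
BAND_UPPER_BOUNDS = [10, 20, 100]
BAND_LIMITS = [45.0, 55.0, 85.0]


def limit_for_utilization(utilization_pct):
    """Return the preset sensor limit for the band containing utilization_pct."""
    if not 0 <= utilization_pct <= 100:
        raise ValueError("utilization must be between 0 and 100")
    # bisect_right places a value equal to a boundary in the next band up,
    # so exactly 10% falls in the 10%-20% band.
    index = bisect.bisect_right(BAND_UPPER_BOUNDS, utilization_pct)
    # Clamp so that exactly 100% still maps to the last band.
    return BAND_LIMITS[min(index, len(BAND_LIMITS) - 1)]


def reading_exceeds_preset(reading, utilization_pct):
    """Return True if the reading is above the preset limit for its band."""
    return reading > limit_for_utilization(utilization_pct)
```

    A static table trades accuracy for simplicity: it needs no history of readings, but it cannot adapt to conditions that differ from those measured during development.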

    [0069] At 410, the debris detection and dislodging module 210 monitors the one or more sensors 215 (e.g., flow, pressure, temperature, etc.) within the coldplate assembly loop 250 to determine if debris 325 is caught in the heatsink 260.

    [0070] At 415, the debris detection and dislodging module 210 determines if a threshold sensors 215 reading has been triggered.

    [0071] In one or more embodiments, the sensors 215 are temperature sensors located on the component to be cooled 270 (e.g., processor, integrated circuit (IC) module) where a dynamic temperature threshold is used based on the current utilization/conditions. In one or more embodiments, the sensors 215 are flow sensors in the cooling path and the threshold is a specified flow rate based on the current utilization/conditions. In one or more embodiments, the sensors 215 are pressure sensors in the cooling path and the threshold is a specified pressure level based on the current utilization/conditions.

    [0072] If the threshold sensors 215 reading has not been triggered (415 No branch), the debris detection and dislodging module 210 loops back to 410 to continue monitoring.

    [0073] If the threshold sensors 215 reading has been triggered (415 Yes branch), the debris detection and dislodging module 210 continues to 420 to enable the motor controller 225 to activate the motor/actuator 245, thereby controlling the dislodging apparatus 255.

    [0074] At 425, the debris detection and dislodging module 210 delays while the dislodging apparatus 255 completes the job of sweeping through the heatsink 260. The delay can be a preset time with a buffer to account for any slowdown caused by the debris movement or based on sensor or positional readings from the motors/actuators. The exact amount of the delay could be determined in the lab or during development or based on the actuator/motor speed.

    [0075] In some embodiments, separate signals will be sent to change the direction of the actuators/motors: one signal to send the dislodging apparatus 255 up and one signal to send it down. Each of those may use either a time delay (with buffer) or sensor readings as mentioned above. Typical sensor readings under similar utilization conditions are monitored. The sweeping process is started based on a detected difference. The sweeping continues until the sensor readings return to the levels expected for the current utilization. In some embodiments, a maximum number of sweeps, for example twenty, can be defined so that the sweeping does not continue indefinitely if the sensor readings do not return to normal for some reason.
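
    The sweep-until-normal behavior with a sweep cap can be sketched as a bounded loop. The function name and the callables passed to it are hypothetical stand-ins for the sensor read and the up/down motor signals; the default cap of twenty follows the example given above.

```python
def sweep_until_clear(read_sensor, expected_level, sweep_once, max_sweeps=20):
    """Sweep until the reading returns to the expected level or the cap is hit.

    read_sensor:    hypothetical callable returning the current sensor reading.
    expected_level: reading expected for the current utilization.
    sweep_once:     hypothetical callable performing one full up/down sweep.
    Returns (cleared, sweeps_performed).
    """
    sweeps = 0
    while read_sensor() > expected_level:
        if sweeps >= max_sweeps:
            # Readings never returned to normal; stop rather than sweep forever.
            return False, sweeps
        sweep_once()
        sweeps += 1
    return True, sweeps


# Simulated readings that fall below the expected level after two sweeps.
simulated = iter([70.0, 68.0, 60.0])
cleared, count = sweep_until_clear(lambda: next(simulated), 65.0, lambda: None)
```

    Here the loop re-reads the sensor after every sweep, so it stops as soon as the reading recovers instead of always running a fixed number of sweeps.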

    [0076] In one or more embodiments, the dislodging apparatus 255 may complete more than one full sweep. A full sweep action comprises starting at the initial bottom position as shown in FIG. 3A, then moving all the way up through the fins, and all the way back down, or according to the movements described in FIGS. 3A and 3C. In one or more embodiments, the dislodging apparatus 255 is enabled until a reading of the sensors 215 returns to an expected level below the threshold. In one or more embodiments, the dislodging apparatus 255 may be enabled for a preset amount of time. In one or more embodiments, the debris detection and dislodging module 210 uses the threshold to detect a difference from a baseline value to indicate that debris 325 is present and that a dislodge event is required.

    [0077] At 430, the debris detection and dislodging module 210 determines whether a threshold number of jobs (i.e., enablements of the motor controller 225) were run within a threshold amount of time (e.g., three jobs in one hour, five jobs in one day, etc.). This is tracked by a counter within a window of time. The timer starts after the first event is triggered, and both the counter and the timer are reset to zero if the time limit is reached before the counter reaches the threshold number within that window.
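
    The counter-and-timer check at 430 could be tracked with a small helper such as the one below. The class name, method name, and the choice to pass the current time in explicitly (rather than reading a clock) are assumptions made so that the sketch is deterministic and easy to test.

```python
class JobWindowCounter:
    """Count dislodging jobs inside a time window that opens at the first job."""

    def __init__(self, max_jobs, window_seconds):
        self.max_jobs = max_jobs              # e.g., 3 jobs
        self.window_seconds = window_seconds  # e.g., 3600 seconds (one hour)
        self.count = 0
        self.window_start = None              # None means no window is open

    def record_job(self, now):
        """Record a job at time `now` (seconds).

        Returns True if the threshold number of jobs has been reached within
        the current window, which corresponds to the Yes branch at 430.
        """
        window_expired = (self.window_start is not None
                          and now - self.window_start >= self.window_seconds)
        if window_expired:
            # Time limit reached without hitting the threshold: reset both.
            self.count = 0
            self.window_start = None
        if self.window_start is None:
            self.window_start = now  # first event starts the timer
        self.count += 1
        return self.count >= self.max_jobs


counter = JobWindowCounter(max_jobs=3, window_seconds=3600)
results = [counter.record_job(t) for t in (0, 600, 1200)]
```

    In a running system, `now` could be supplied from a monotonic clock such as Python's `time.monotonic()`. What happens to the counter after a notification is raised is not described here, so the sketch leaves it unchanged.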

    [0078] If the threshold number of jobs were not run within a threshold amount of time (430 No branch), this indicates that previous sweeps were successful in removing debris 325 since the threshold was not reached and the debris detection and dislodging module 210 loops back to 405 to disable the motor controller 225.

    [0079] In one or more embodiments, the debris detection and dislodging module 210 may send a notification for successful dislodgement of debris 325. However, the filter 265 does not need changing after every dislodging apparatus 255 operation. The system 200 could be configured to suggest a filter change after a threshold count of successful debris dislodgements has been detected since the last filter change.

    [0080] If a threshold number of jobs were run within a threshold amount of time (430 Yes branch), the debris detection and dislodging module 210 generates a notification (e.g., call home, system reference code (SRC), etc.) such that the heatsink 260 (or subcomponent that contains the heatsink 260) can be serviced (manually cleaned) or replaced due to the inability to remove debris 325 from the heatsink 260.
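
    Taken together, blocks 405 through 430 of FIG. 4 can be sketched as a single control loop. The function below is an illustrative outline only: every callable parameter is a hypothetical hook standing in for hardware or module behavior, and the `max_polls` bound exists solely so that the example terminates.

```python
def run_flow(disable_motor, reading_exceeds_threshold, enable_motor,
             wait_for_sweep, too_many_jobs, notify_service, max_polls):
    """Walk the FIG. 4 flow for at most `max_polls` sensor checks.

    Returns "service" if the 430 Yes branch is taken, otherwise "monitoring".
    """
    disable_motor()                          # 405: motor enable signal off
    for _ in range(max_polls):
        if not reading_exceeds_threshold():  # 410/415: No branch, keep monitoring
            continue
        enable_motor()                       # 420: activate the motor/actuator
        wait_for_sweep()                     # 425: delay while the sweep completes
        if too_many_jobs():                  # 430: Yes branch
            notify_service()                 # heatsink needs service or replacement
            return "service"
        disable_motor()                      # 430: No branch, loop back to 405
    return "monitoring"


# Example run: the first check is clean, the second triggers a sweep.
log = []
checks = iter([False, True])
outcome = run_flow(
    disable_motor=lambda: log.append("disable"),
    reading_exceeds_threshold=lambda: next(checks),
    enable_motor=lambda: log.append("enable"),
    wait_for_sweep=lambda: log.append("wait"),
    too_many_jobs=lambda: False,
    notify_service=lambda: log.append("notify"),
    max_polls=2,
)
```

    Passing the hooks in as parameters keeps the flow logic separate from any particular sensor or motor interface, which this disclosure leaves open.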

    [0081] As may be used herein, the terms substantially and approximately each provide an industry-accepted tolerance for its corresponding term and/or relativity between items. Such an industry-accepted tolerance ranges from less than one percent to fifty percent and corresponds to, but is not limited to, component values, integrated circuit process variations, temperature variations, rise and fall times, and/or thermal noise. Such relativity between items ranges from a difference of a few percent to magnitude differences. As may also be used herein, the term(s) configured to, operably coupled to, coupled to, and/or coupling includes direct coupling between items and/or indirect coupling between items via an intervening item (e.g., an item includes, but is not limited to, a component, an element, a circuit, and/or a module) where, for an example of indirect coupling, the intervening item does not modify the information of a signal but may adjust its current level, voltage level, and/or power level. As may further be used herein, inferred coupling (i.e., where one element is coupled to another element by inference) includes direct and indirect coupling between two items in the same manner as coupled to. As may even further be used herein, the term configured to, operable to, coupled to, or operably coupled to indicates that an item includes one or more of power connections, input(s), output(s), etc., to perform, when activated, one or more of its corresponding functions and may further include inferred coupling to one or more other items. As may still further be used herein, the term associated with includes direct and/or indirect coupling of separate items and/or one item being embedded within another item.

    [0082] One or more embodiments have been described above with the aid of method steps illustrating the performance of specified functions and relationships thereof. The boundaries and sequence of these functional building blocks and method steps have been arbitrarily defined herein for convenience of description. Alternate boundaries and sequences can be defined so long as the specified functions and relationships are appropriately performed. Any such alternate boundaries or sequences are thus within the scope and spirit of the claims. Further, the boundaries of these functional building blocks have been arbitrarily defined for convenience of description. Alternate boundaries could be defined as long as the certain significant functions are appropriately performed. Similarly, flow diagram blocks may also have been arbitrarily defined herein to illustrate certain significant functionality.

    [0083] To the extent used, the flow diagram block boundaries and sequence could have been defined otherwise and still perform the certain significant functionality. Such alternate definitions of both functional building blocks and flow diagram blocks and sequences are thus within the scope and spirit of the claims. One of average skill in the art will also recognize that the functional building blocks, and other illustrative blocks, modules, and components herein, can be implemented as illustrated or by discrete components, application specific integrated circuits, processors executing appropriate software and the like or any combination thereof.

    [0084] The one or more embodiments are used herein to illustrate one or more aspects, one or more features, one or more concepts, and/or one or more examples. A physical embodiment of an apparatus, an article of manufacture, a machine, and/or of a process may include one or more of the aspects, features, concepts, examples, etc. described with reference to one or more of the embodiments discussed herein. Further, from Figure to Figure, the embodiments may incorporate the same or similarly named functions, steps, modules, etc. that may use the same or different reference numbers and, as such, the functions, steps, modules, etc. may be the same or similar functions, steps, modules, etc. or different ones.

    [0085] The term module is used in the description of one or more of the embodiments. A module implements one or more functions via a device such as a processor or other processing device or other hardware that may include or operate in association with a memory that stores operational instructions. A module may operate independently and/or in conjunction with software and/or firmware. As also used herein, a module may contain one or more sub-modules, each of which may be one or more modules.

    [0086] As may further be used herein, a computer readable memory includes one or more memory elements. A memory element may be a separate memory device, multiple memory devices, or a set of memory locations within a memory device. Such a memory device may be a read-only memory, random access memory, volatile memory, non-volatile memory, static memory, dynamic memory, flash memory, cache memory, and/or any device that stores digital information. The memory device may be in the form of a solid-state memory, a hard drive memory, cloud memory, thumb drive, server memory, computing device memory, and/or other physical medium for storing digital information. A computer readable memory/storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

    [0087] While particular combinations of various functions and features of the one or more embodiments have been expressly described herein, other combinations of these features and functions are likewise possible. The present disclosure is not limited by the particular examples disclosed herein and expressly incorporates these other combinations.