Which Cooling Method is Best for AI Transportables?
January 24, 2023
By Braden Cooper, Product Marketing Manager
The most powerful artificial intelligence computing hardware is designed to thrive in a datacenter, where there is uncapped clean power, near-limitless cooling capacity, and no vibration. The growth of AI use cases in vehicles, including automated crop management, autonomous long-haul freight, and military ISR aircraft, necessitates the use of datacenter-oriented hardware in vehicles – particularly for initial developments while more customized size, weight, and power (SWaP) optimized embedded platforms are developed. The transition from friendly environmental conditions to the rigors of the road requires system designs which mitigate the thermal, structural, and other environmental challenges of the transportable application. Thermal design is especially critical, with the latest AI-oriented GPUs and CPUs reaching heat flux densities never before seen. Advanced thermal management designs provide a path to solving the heat flux challenge, but each comes with advantages and disadvantages in implementation. This infographic highlights some of the methods which can be used to cool systems in AI transportable applications.
The best cooling method depends on many variables, from heat flux density to SWaP constraints. With these existing technologies and ongoing industry innovation, powerful enterprise hardware can be used to solve the most demanding AI transportable challenges. The next few years are pivotal in the advancement of thermal management within datacenters, as immersion cooling and improved thermal interface materials see wider adoption. Transitioning these same cooling methods to AI Transportables addresses the need for higher compute capacity at the location of data generation.
With the latest high-performance GPUs and CPUs reaching TDPs of greater than 500 W, innovative cooling solutions are needed to bring maximum performance to the harshest environments. While some cooling methods are acceptable for datacenters, the size, weight, temperature, power, noise, and vibration constraints of vehicles introduce new challenges.
1. Conduction (Natural Convection)
Heat moves from heat-generating components to the case of the system via conduction through a combination of thermal interface materials and heat pipes. The enclosure then dissipates heat to the surrounding environment, often through fins built into the chassis.
Key Factors:
- Heat dissipated through system enclosure
- Heat moves through system via contact with thermal interface materials or heat pipes
Pros:
- High shock/vibration tolerance
- Passive cooling - no added power consumption to cool
Cons:
- Limited cooling capacity
- Limited system performance
- Thermal interface limits repairability
2. Forced Convection (Air/Fans)
Heat is conducted from components to heatsinks, transferred to air provided by fans, then exhausted out of the enclosure. Fan quantity, size, and electrical properties dictate the effectiveness and supported temperature range of the system.
Key Factors:
- Trade-offs between size, noise, power, and cooling capacity
- Uses environment air - no external heat exchanger required
Pros:
- Wide range of supported heat loads and environmental conditions
- Low cost per performance makes good candidate for medium heat requirements
Cons:
- High noise output
- Fan serviceability challenges
- Not effective for high heat output components
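A rough sizing estimate shows where the fan trade-offs come from. The sketch below uses the standard sensible-heat balance, heat = density x volumetric flow x specific heat x temperature rise, to estimate the airflow needed to carry away a given heat load. The function name, the 500 W load, and the 15 K allowable air temperature rise are illustrative assumptions rather than figures from the infographic, and the air properties are approximate sea-level values.

```python
# Approximate properties of dry air near sea level and 20 degrees C.
AIR_DENSITY_KG_M3 = 1.2
AIR_CP_J_KG_K = 1005.0
M3_PER_S_TO_CFM = 2118.88  # cubic metres per second -> cubic feet per minute


def required_airflow_m3_s(heat_w, air_temp_rise_k,
                          density=AIR_DENSITY_KG_M3, cp=AIR_CP_J_KG_K):
    """Airflow needed so the air warms by air_temp_rise_k while absorbing heat_w."""
    if heat_w < 0 or air_temp_rise_k <= 0:
        raise ValueError("heat must be >= 0 and temperature rise must be > 0")
    # Sensible-heat balance: heat = density * flow * cp * temperature rise.
    return heat_w / (density * cp * air_temp_rise_k)


if __name__ == "__main__":
    # Hypothetical example: one 500 W device, 15 K allowed air temperature rise.
    flow = required_airflow_m3_s(500.0, 15.0)
    print(f"{flow:.4f} m^3/s ({flow * M3_PER_S_TO_CFM:.0f} CFM)")
```

Because the required flow is inversely proportional to both air density and allowable temperature rise, thinner air at altitude or a hotter ambient (which leaves less temperature headroom) pushes the airflow requirement up, and with it fan size, noise, and power.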
3. Direct-to-Chip Liquid Cooling
Heat is transferred to a fluid being pumped through a coldplate and cooling loop which touches the primary heat sources within a system. The hot fluid exits the system and is cooled by an external heat exchanger before recirculating into the system. All-in-one systems cool the liquid through an integrated radiator within or attached to the system.
Key Factors:
- Fluid properties and flow rate dictate performance
- Industrial grade components limit risk of leaks
- Variety of fluids to fit different applications and heat loads
Pros:
- Wide range of supported heat loads and environmental conditions
- Low cost per performance makes good candidate for medium heat requirements
Cons:
- Limited effectiveness in extreme ambient temperatures
- Dependent on heat exchanger to cool fluid
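The point that fluid properties and flow rate dictate performance can be made concrete with the sensible-heat balance, heat = mass flow x specific heat x temperature rise. The sketch below estimates the coolant flow a coldplate loop would need for a given heat load. The function name, the 500 W load, and the 10 K coolant temperature rise are illustrative assumptions; the defaults are approximate properties of plain water, not of any particular coolant named in the source.

```python
# Approximate properties of liquid water near room temperature.
WATER_DENSITY_KG_M3 = 997.0
WATER_CP_J_KG_K = 4186.0


def coolant_flow_l_per_min(heat_w, coolant_temp_rise_k,
                           density=WATER_DENSITY_KG_M3, cp=WATER_CP_J_KG_K):
    """Coolant flow (litres/minute) that absorbs heat_w while warming by coolant_temp_rise_k."""
    if heat_w < 0 or coolant_temp_rise_k <= 0:
        raise ValueError("heat must be >= 0 and temperature rise must be > 0")
    mass_flow_kg_s = heat_w / (cp * coolant_temp_rise_k)
    volume_flow_m3_s = mass_flow_kg_s / density
    return volume_flow_m3_s * 1000.0 * 60.0  # m^3/s -> L/min


if __name__ == "__main__":
    # Hypothetical example: one 500 W device, coolant allowed to warm by 10 K.
    print(f"{coolant_flow_l_per_min(500.0, 10.0):.2f} L/min")
```

Passing a lower specific heat, as a water-glycol mix would have, raises the required flow for the same heat load, which is one way the choice of fluid feeds back into pump sizing.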
4. Single-phase Immersion Cooling
The system is immersed in a non-conductive fluid. Heat is transferred to the fluid from the heat-generating components, then the fluid exits the system and is cooled by an external heat exchanger before recirculating into the system. The fluid is often directed across the primary heat sources by pumps to improve cooling capacity and efficiency.
Key Factors:
- Fluid properties and system design dictate performance
- High mass density of fluid changes SWaP profile considerably
Pros:
- High thermal efficiency enables high ambient temperature applications
- Can dampen impact of vibration based on system design
Cons:
- Additional weight limits transportable applications
- Limited field serviceability
- Dependent on heat exchanger to cool fluid
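The weight penalty follows directly from fluid volume and density. The sketch below is a back-of-the-envelope calculation only: the 40 L fill volume and the two densities are hypothetical placeholders chosen to show the spread between a lighter and a denser dielectric fluid, not data for any specific product or system.

```python
def fluid_mass_kg(fluid_volume_l, density_kg_m3):
    """Mass of immersion fluid for a given fill volume (litres) and density (kg/m^3)."""
    if fluid_volume_l < 0 or density_kg_m3 <= 0:
        raise ValueError("volume must be >= 0 and density must be > 0")
    return fluid_volume_l / 1000.0 * density_kg_m3  # litres -> m^3, then * density


if __name__ == "__main__":
    tank_volume_l = 40.0  # hypothetical fill volume
    # Hypothetical densities, for illustration only.
    for label, density in (("lighter fluid", 850.0), ("denser fluid", 1600.0)):
        mass = fluid_mass_kg(tank_volume_l, density)
        print(f"{label} ({density:.0f} kg/m^3): {mass:.1f} kg added")
```

The fluid mass is carried in addition to the enclosure, pumps, and heat exchanger, so an estimate like this belongs early in the SWaP budget for a vehicle installation.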
5. Two-phase Immersion Cooling
The system is immersed in a non-conductive fluid which has a boiling point near the target operating point of a key heat-generating component. Once the fluid reaches its boiling point, it changes phase to a gas and rises to the surface of the system, pulling heat out of the fluid. The gas is then cooled and recondenses on a condensing coil to recirculate within the system.
Key Factors:
- Latent heat property of fluid dictates performance
- High thermal efficiency enables extreme environmental applications
Pros:
- Supports highest ambient temperature of all methods
- Small power overhead to enable cooling cycle
Cons:
- Engineered fluids expensive and application specific
- Fluid property variations at altitude limit aerospace applications
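The role of latent heat can be illustrated with a simple energy balance: at steady state, the heat load equals the rate of vapor generation times the fluid's latent heat of vaporization. The sketch below estimates how much fluid must boil off, and then recondense on the coil, per second. The function name, the 500 W load, and the 100 kJ/kg latent heat are illustrative assumptions, not properties of any specific engineered fluid.

```python
def vapor_generation_g_per_s(heat_w, latent_heat_j_per_kg):
    """Mass of fluid boiled per second (grams) to absorb heat_w at steady state."""
    if heat_w < 0 or latent_heat_j_per_kg <= 0:
        raise ValueError("heat must be >= 0 and latent heat must be > 0")
    # Energy balance: heat = mass rate * latent heat of vaporization.
    return heat_w / latent_heat_j_per_kg * 1000.0  # kg/s -> g/s


if __name__ == "__main__":
    # Hypothetical example: 500 W device, fluid with an assumed 100 kJ/kg latent heat.
    rate = vapor_generation_g_per_s(500.0, 100_000.0)
    print(f"{rate:.1f} g/s of fluid vaporized and recondensed")
```

The condenser has to return vapor to liquid at the same rate, so a fluid with a lower latent heat needs proportionally more vapor handled for the same heat load.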