
Immersion Cooling for Transportable HPC

October 27, 2022


By Braden Cooper, Product Marketing Manager

The latest high-performance computing systems for artificial intelligence (AI) applications generate more heat than ever before. Datacenters have begun adopting immersion cooling, which submerges temperature-sensitive electronics in a non-conductive fluid that efficiently dissipates heat. In parallel, many AI edge applications are transitioning from low-performing embedded systems to solutions that incorporate more advanced enterprise compute hardware. To solve both the thermal and structural challenges of the rugged edge, system integrators are looking to immersion cooling technology to meet their environmental specifications.

The power dissipated by the NVIDIA SXM form factor GPUs used in HGX platforms has increased from up to 500W per GPU in the A100 to 700W per GPU in the H100. Platforms integrating the HGX H100 4-GPU or HGX H100 8-GPU backplanes must therefore dissipate an additional 800W or 1600W of heat, respectively, compared to existing A100-based platforms. The heat generated by these devices has introduced a thermal dissipation requirement beyond what much of the industry's existing cooling infrastructure is equipped to handle. In rugged edge environments, AI compute integrators seek to leverage the advances in datacenter liquid cooling while solving the complexities of environmental parameters for the target application.
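The added heat load quoted above is simple arithmetic on the per-GPU figures. A minimal sketch in Python, using only the wattages stated in this post (the function and constant names are illustrative, and the totals cover GPU heat only, not CPUs, memory, or power conversion losses):

```python
# Per-GPU power figures as quoted in this post (watts).
A100_SXM_WATTS = 500  # "up to 500W" per the text
H100_SXM_WATTS = 700


def added_gpu_heat(gpu_count: int,
                   old_watts: int = A100_SXM_WATTS,
                   new_watts: int = H100_SXM_WATTS) -> int:
    """Extra GPU heat (W) to dissipate after moving from A100 to H100."""
    return gpu_count * (new_watts - old_watts)


if __name__ == "__main__":
    for gpus in (4, 8):
        extra = added_gpu_heat(gpus)
        total = gpus * H100_SXM_WATTS
        print(f"HGX H100 {gpus}-GPU: +{extra} W versus A100, "
              f"{total} W total GPU heat")
```

The 4-GPU case gives the 800W figure and the 8-GPU case gives the 1600W figure cited above.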

The characteristics of a target environment vary with where the system is deployed. For example, an autonomous truck requiring high-performance AI computing may see temperatures ranging from below freezing in winter months to extreme heat, exacerbated by local climates. The trucks will also experience the rigors of road travel, including vibration, shock, and humidity. Meanwhile, integrating a high-performance solution on an aircraft may require a system with a wider operational temperature range that also accounts for the impact of altitude and lower air density on cooling. A common theme in the rugged design of systems for these AI Transportable applications is the need for a robust cooling strategy that supports a wide temperature range and alleviates mechanical stress from external vibration or shock loads.

Immersion cooling is a strong candidate to solve both the thermal and structural elements of these applications. From a thermal standpoint, liquid immersion cooling (either single-phase or two-phase) offers the greatest thermal efficiency of any liquid cooling method. In practice, this efficiency means that liquid-immersed systems can operate at a higher external ambient temperature than systems with direct-to-chip liquid cooling, and at significantly higher ambient temperatures than air-cooled systems (forced or natural convection). On the structural side, the immersion fluid itself acts as a damping mechanism, mitigating the impact of vibration forces on the electronics.
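One way to see why a more efficient cooling path raises the allowable ambient temperature is a lumped thermal-resistance model: component temperature is roughly ambient temperature plus power times the component-to-ambient thermal resistance. The sketch below is a simplified illustration, not a design method. The 85 °C component limit and every thermal-resistance value are hypothetical placeholders, chosen only so that their ordering matches the ranking described above; they are not measurements of any product.

```python
def max_ambient_c(component_limit_c: float,
                  power_w: float,
                  r_th_k_per_w: float) -> float:
    """Highest ambient temperature (deg C) that keeps a component at or
    below its limit, using T_component = T_ambient + P * R_th."""
    return component_limit_c - power_w * r_th_k_per_w


# HYPOTHETICAL component-to-ambient thermal resistances (K/W).
# Placeholder numbers for illustration only.
HYPOTHETICAL_R_TH = {
    "forced air": 0.08,
    "direct-to-chip liquid": 0.05,
    "immersion": 0.04,
}

if __name__ == "__main__":
    GPU_POWER_W = 700.0        # H100 figure quoted in this post
    COMPONENT_LIMIT_C = 85.0   # assumed limit, not a vendor specification
    for method, r_th in HYPOTHETICAL_R_TH.items():
        limit = max_ambient_c(COMPONENT_LIMIT_C, GPU_POWER_W, r_th)
        print(f"{method}: max ambient about {limit:.0f} deg C")
```

The lower the thermal resistance of the cooling path, the smaller the temperature rise at a given power, which leaves more headroom for a hot environment. Real systems would need measured resistances for each stage of the path.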

The two methods of immersion cooling, single-phase and two-phase, each have pros and cons when implemented in edge environments. Single-phase immersion cooling uses a fluid that remains liquid across the entire target temperature range. This method works similarly to air cooling, in that the fluid is typically directed across a heatsink attached to a heat-dissipating surface. The warm fluid is then pumped to an external heat exchanger, and the cooled fluid is recirculated back into the system. By using a specialized immersion fluid, this method cools more efficiently than direct-to-chip cooling, with a lower delta between ambient and maximum fluid temperatures.

Two-phase cooling, by comparison, makes use of the natural heat dissipation properties of the fluid's evaporation cycle. In this method, a fluid is selected whose boiling point is below the maximum operating temperature of the key heat-dissipating components. Once the fluid around the components reaches its boiling point, it changes from a liquid to a gas, pulling heat away from the hot components. The gas is then cooled by contact with a condensing coil until it condenses and returns to the fluid bath. Two-phase cooling is typically more efficient than single-phase, but it adds complexity, as the system must be sufficiently sealed to prevent the gaseous fluid from escaping. Additionally, the lower ambient pressure at altitude changes the fluid's boiling point, requiring either strict altitude limits or a pressurized system to maintain fluid properties.
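The altitude effect on boiling point can be estimated with two textbook relations: the International Standard Atmosphere pressure formula for the troposphere, and the Clausius-Clapeyron equation with a constant heat of vaporization. The sketch below is an approximation under those assumptions, for an unpressurized tank. It uses water purely as a sanity check because its properties are well known; water is not an immersion fluid, and a real two-phase design would substitute the boiling point and heat of vaporization from the chosen fluid's datasheet.

```python
import math

R_GAS = 8.314462618     # universal gas constant, J/(mol*K)
P0_PA = 101_325.0       # ISA sea-level pressure, Pa
T0_K = 288.15           # ISA sea-level temperature, K
LAPSE_K_PER_M = 0.0065  # ISA tropospheric lapse rate, K/m
G_M_S2 = 9.80665        # standard gravity, m/s^2
M_AIR = 0.0289644       # molar mass of dry air, kg/mol


def pressure_at_altitude(alt_m: float) -> float:
    """ISA static pressure (Pa); valid in the troposphere (0 to 11 km)."""
    if not 0.0 <= alt_m <= 11_000.0:
        raise ValueError("model only covers 0 to 11,000 m")
    exponent = G_M_S2 * M_AIR / (R_GAS * LAPSE_K_PER_M)
    return P0_PA * (1.0 - LAPSE_K_PER_M * alt_m / T0_K) ** exponent


def boiling_point_k(pressure_pa: float,
                    t_boil_ref_k: float,
                    dh_vap_j_per_mol: float,
                    p_ref_pa: float = P0_PA) -> float:
    """Clausius-Clapeyron estimate of boiling point (K) at pressure_pa,
    assuming the heat of vaporization is constant over the range."""
    inv_t = (1.0 / t_boil_ref_k
             - R_GAS * math.log(pressure_pa / p_ref_pa) / dh_vap_j_per_mol)
    return 1.0 / inv_t


if __name__ == "__main__":
    # Sanity check with water: boils at 373.15 K at sea level,
    # heat of vaporization about 40,660 J/mol.
    for alt_m in (0, 1500, 3000, 4500):
        p = pressure_at_altitude(alt_m)
        t_c = boiling_point_k(p, 373.15, 40_660.0) - 273.15
        print(f"{alt_m:>5} m: {p / 1000:6.1f} kPa, boils near {t_c:5.1f} deg C")
```

For water, the estimate drops to roughly 90 °C at 3,000 m, which matches everyday experience with high-altitude cooking. A sealed or pressurized tank decouples the fluid from ambient pressure, which is why the article lists pressurization as the alternative to altitude limits.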

OSS Rigel Edge Supercomputer Two-Phase Immersion System


Immersion technology has already made a significant impact on datacenter high-performance computing scale-out, providing an efficient upgrade path to support the next generation of AI computing hardware. Adapting these technologies to rugged edge environments is inevitable as thermal dissipation requirements continue to grow. As AI deployments become more common in autonomous vehicles and other edge domains, system integrators will look to immersion technology as a strong candidate to solve the thermal and structural challenges of these environments.
