Immersion Cooling for Transportable HPC

October 27, 2022

Immersion Cooling

By Braden Cooper, Product Marketing Manager

The latest high-performance computing systems for AI (Artificial Intelligence) applications generate more heat than ever before. Datacenters have begun adopting immersion cooling solutions, which submerge temperature-sensitive electronics in a non-conductive fluid that efficiently dissipates the heat. In parallel, many AI edge applications are transitioning from lower-performance embedded systems to solutions that incorporate more advanced enterprise compute hardware. To solve both the thermal and structural challenges of the rugged edge and meet their environmental specifications, system integrators are looking to immersion cooling technology.

The NVIDIA SXM form factor GPUs used in HGX platforms dissipated up to 500 W per GPU in the A100, and that figure has increased to 700 W per GPU in the H100. Platforms integrating the HGX H100 4-GPU or HGX H100 8-GPU baseboards must therefore dissipate an additional 800 W (4-GPU) to 1600 W (8-GPU) of GPU heat compared to equivalent A100-based platforms. This added heat has introduced a thermal dissipation requirement beyond what much of the existing industry's cooling infrastructure is equipped to handle. In rugged edge environments, AI compute integrators seek to leverage the advances in datacenter liquid cooling while also meeting the environmental parameters of the target application.
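The 800 W and 1600 W figures follow directly from the 200 W per-GPU increase multiplied by the GPU count. The short Python sketch below reproduces that arithmetic; it counts GPU power only and assumes nothing about CPUs, NICs, or other board components, which add further heat in a real platform.

```python
# Per-GPU SXM power figures quoted in the text (watts).
A100_SXM_W = 500
H100_SXM_W = 700

def additional_heat_w(gpu_count: int) -> int:
    """Extra GPU heat (W) of an H100 HGX board vs. an A100 board with the same GPU count."""
    return gpu_count * (H100_SXM_W - A100_SXM_W)

def total_gpu_heat_w(gpu_count: int) -> int:
    """Total GPU heat (W) of an H100 HGX board; excludes CPUs and other components."""
    return gpu_count * H100_SXM_W

for gpus in (4, 8):
    print(f"HGX H100 {gpus}-GPU: +{additional_heat_w(gpus)} W vs. A100, "
          f"{total_gpu_heat_w(gpus)} W total GPU heat")
```

Running it prints an 800 W increase (2800 W of GPU heat) for the 4-GPU board and a 1600 W increase (5600 W of GPU heat) for the 8-GPU board.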

The characterization of a target environment can vary depending on where the system is deployed. For example, an autonomous trucking vehicle requiring high-performance AI computing may see temperatures ranging from below freezing in winter months to extreme heat, with the severity depending on the local climate. The trucks will also endure the rigors of road travel, including vibration, shock, and humidity. Meanwhile, integrating a high-performance solution on an aircraft may require a system that supports an even wider operating temperature range and compensates for the reduced cooling capacity of lower-density air at altitude. A common theme in the rugged design of systems for these AI Transportable applications is the need for a robust cooling strategy that supports a wide temperature range and alleviates mechanical stress from external vibration or shock loads.
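To see why altitude hurts air cooling, note that the heat a fan can carry away scales with air density: Q = ρ · V̇ · c_p · ΔT. The Python sketch below estimates density with the International Standard Atmosphere (ISA) model, valid below 11 km. It assumes equipment cooled by unpressurized outside air and a fan delivering a fixed volumetric flow. The 0.5 m³/s flow and 15 K air temperature rise are hypothetical values chosen only for illustration.

```python
# International Standard Atmosphere (ISA) constants, troposphere (< 11 km).
T0 = 288.15        # sea-level temperature, K
LAPSE = 0.0065     # temperature lapse rate, K/m
P0 = 101_325.0     # sea-level pressure, Pa
G = 9.80665        # gravitational acceleration, m/s^2
M_AIR = 0.0289644  # molar mass of dry air, kg/mol
R = 8.31446        # universal gas constant, J/(mol*K)
CP_AIR = 1005.0    # approximate specific heat of air, J/(kg*K)

def air_density(altitude_m: float) -> float:
    """ISA air density (kg/m^3) at altitude_m metres, valid below 11 km."""
    t = T0 - LAPSE * altitude_m
    p = P0 * (t / T0) ** (G * M_AIR / (R * LAPSE))
    return p * M_AIR / (R * t)  # ideal gas law

def air_heat_removal_w(altitude_m: float, flow_m3_s: float, delta_t_k: float) -> float:
    """Heat (W) carried by a fixed volumetric airflow warmed by delta_t_k."""
    return air_density(altitude_m) * flow_m3_s * CP_AIR * delta_t_k

# HYPOTHETICAL fan flow and allowed air temperature rise, for illustration only.
FLOW_M3_S = 0.5
DELTA_T_K = 15.0

for altitude_m in (0, 3_000, 10_000):
    q = air_heat_removal_w(altitude_m, FLOW_M3_S, DELTA_T_K)
    print(f"{altitude_m:>6} m: {air_density(altitude_m):.3f} kg/m^3, ~{q:,.0f} W removed")
```

Under this model, air at 10,000 m is roughly one third as dense as at sea level. The same fan therefore removes roughly one third of the heat unless airflow or allowable temperature rise increases.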

Immersion cooling is a strong candidate to solve both the thermal and structural elements of these applications. From a thermal standpoint, liquid immersion cooling (either single-phase or two-phase) offers the greatest thermal efficiency of any liquid cooling method. In practice, this efficiency means that liquid-immersed systems can operate at a higher external ambient temperature than systems with a direct-to-chip liquid cooling implementation, and at significantly higher ambient temperatures than air-cooled systems (forced or natural convection). On the structural side, the immersion fluid itself acts as a damping mechanism, mitigating the impact of vibration forces on the electronics.
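The link between cooling efficiency and allowable ambient temperature can be sketched with a lumped steady-state model: T_component = T_ambient + P · R_th. Here R_th is the total component-to-ambient thermal resistance in °C/W, and a more efficient cooling method means a lower R_th. The Python sketch below solves for the highest ambient temperature that keeps a 700 W GPU under a temperature limit. The 85 °C limit and the three R_th values are hypothetical placeholders. They are chosen only to mirror the ordering described above, not measured for any product.

```python
# Lumped steady-state model: T_component = T_ambient + P * R_th,
# where R_th (deg C per W) is the total component-to-ambient thermal resistance.
def max_ambient_c(t_limit_c: float, power_w: float, r_th_c_per_w: float) -> float:
    """Highest ambient temperature (deg C) that keeps the component at or below t_limit_c."""
    return t_limit_c - power_w * r_th_c_per_w

GPU_POWER_W = 700.0  # H100 SXM figure quoted in the text
T_LIMIT_C = 85.0     # HYPOTHETICAL component temperature limit

# HYPOTHETICAL thermal resistances, for illustration only (lower = more efficient cooling).
R_TH_C_PER_W = {
    "air (forced convection)": 0.08,
    "direct-to-chip liquid": 0.06,
    "immersion": 0.04,
}

for method, r_th in R_TH_C_PER_W.items():
    print(f"{method}: max ambient ~{max_ambient_c(T_LIMIT_C, GPU_POWER_W, r_th):.0f} C")
```

With these placeholder numbers, each 0.02 °C/W reduction in R_th buys 14 °C of extra ambient headroom at 700 W. Higher-power parts amplify the benefit of a lower-resistance cooling path.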

The two methods of immersion cooling, single-phase and two-phase, each have pros and cons when implemented in edge environments. Single-phase immersion cooling uses a fluid that remains liquid across the entire target temperature range. This cooling method works similarly to air cooling, in that the fluid is typically directed across a heatsink attached to a heat-dissipating surface. The warmed fluid is then pumped to an external heat exchanger, and the cooled fluid is recirculated back into the system. By using the specialized immersion fluid, this method cools more efficiently than direct-to-chip cooling, with a smaller difference between the ambient temperature and the maximum fluid temperature.

Two-phase cooling, by comparison, exploits the latent heat the fluid absorbs as it evaporates. In this method, a fluid is selected with a boiling point below the maximum operating temperature of the key heat-dissipating components. Once the fluid around the components reaches its boiling point, it changes from a liquid to a gas, carrying heat away from the hot components. The vapor then contacts a condensing coil, condenses back into liquid, and is recirculated into the system. Two-phase cooling is typically more efficient than single-phase, but it adds complexity: the system must be sealed well enough to prevent the vaporized fluid from escaping. Additionally, the lower ambient pressure at altitude lowers the fluid's boiling point, requiring either strict altitude limits or a pressurized system to maintain the fluid's properties.
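The difference between the two methods comes down to sensible versus latent heat. A single-phase fluid absorbs heat by warming up (Q = ṁ · c_p · ΔT), while a two-phase fluid absorbs heat by boiling (Q = ṁ · h_fg). The Python sketch below compares the fluid mass flow each approach needs for the same heat load. It also uses the Clausius-Clapeyron relation to estimate how a lower ambient pressure, as at altitude in an unpressurized enclosure, shifts the boiling point. All fluid properties are hypothetical placeholders for illustration, not the data of any specific immersion fluid.

```python
import math

R_GAS = 8.31446  # universal gas constant, J/(mol*K)

def single_phase_flow_kg_s(heat_w: float, cp_j_kg_k: float, delta_t_k: float) -> float:
    """Mass flow needed to absorb heat_w as sensible heat (Q = m_dot * cp * dT)."""
    return heat_w / (cp_j_kg_k * delta_t_k)

def two_phase_flow_kg_s(heat_w: float, h_fg_j_kg: float) -> float:
    """Mass that must boil per second to absorb heat_w as latent heat (Q = m_dot * h_fg).
    The vapor is condensed and returned, so this is also the circulation rate."""
    return heat_w / h_fg_j_kg

def boiling_point_k(t_ref_k: float, p_ref_pa: float, p_pa: float, dh_vap_j_mol: float) -> float:
    """Clausius-Clapeyron estimate of the boiling point at p_pa, given a
    reference boiling point t_ref_k at p_ref_pa (assumes constant dh_vap)."""
    inv_t = 1.0 / t_ref_k - (R_GAS / dh_vap_j_mol) * math.log(p_pa / p_ref_pa)
    return 1.0 / inv_t

# HYPOTHETICAL values, chosen only to illustrate the relationships.
HEAT_W = 5600.0       # e.g. eight 700 W GPUs
CP = 2000.0           # single-phase fluid specific heat, J/(kg*K)
DELTA_T = 10.0        # allowed single-phase fluid temperature rise, K
H_FG = 100_000.0      # two-phase fluid latent heat of vaporization, J/kg
T_BOIL_SEA = 323.15   # two-phase fluid boiling point at sea-level pressure (50 C), K
DH_VAP = 30_000.0     # molar enthalpy of vaporization, J/mol

print(f"single-phase flow:  {single_phase_flow_kg_s(HEAT_W, CP, DELTA_T):.3f} kg/s")
print(f"two-phase boil-off: {two_phase_flow_kg_s(HEAT_W, H_FG):.3f} kg/s")

# About 70 kPa is the standard-atmosphere pressure near 3,000 m.
t_alt_k = boiling_point_k(T_BOIL_SEA, 101_325.0, 70_000.0, DH_VAP)
print(f"boiling point at 70 kPa: {t_alt_k - 273.15:.1f} C")
```

With these placeholder properties, the two-phase fluid moves the same heat with one fifth of the mass flow. The boiling point drops by roughly 10 °C at 70 kPa, which is why a two-phase system either limits altitude or holds its internal pressure constant.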

OSS' Rigel Edge Supercomputer Two-Phase Immersion System

Immersion technology has already made a significant impact on high-performance computing scale-out in the datacenter, providing an efficient upgrade path to support the next generation of AI computing hardware. Adapting these technologies to rugged edge environments is inevitable as thermal dissipation requirements continue to grow. As AI deployments become more common in autonomous vehicles and other edge domains, system integrators will look to immersion technology as a strong candidate to solve the thermal and structural challenges of these environments.


