By Tom Fries, Government/Defense Sales Manager
A June 4, 2020 congressional report, “Intelligence, Surveillance, and Reconnaissance Design for Great Power Competition,” reinforced the importance of the United States military continuing to develop ISR (intelligence, surveillance, and reconnaissance) capabilities as threats grow around the world. The report states that the goal is to “make rapid sense of that data; securely deliver that data to weapons, weapon systems, and commanders; and possess a workforce that can execute its mission in competition and combat, at a pace greater than the enemy.”
ISR applications often combine FPGAs and GPUs because the two excel when working together in environments that demand low latency: searching through real-time data and providing immediate intelligence to military personnel. ISR equipment often sits in edge transportable tactical command centers near the battlefield. In these scenarios, the designers of the ISR equipment face multiple challenges.
In this post, I will review the unique products One Stop Systems (OSS) offers to meet these challenges.
Designing the ISR system so that the GPUs and FPGAs communicate directly, bypassing the CPU and reducing latency, is important to achieving critical response-time requirements. Also, because edge transportable tactical command centers have limited rack space, valuable space can be saved by increasing the density of GPUs and FPGAs behind the CPUs – for example, 8 GPUs and 8 FPGAs behind a single CPU complex. This creates two challenges, however. The first is that most computer BIOSes are not robust enough to support 16 or more PCIe devices behind a CPU. The second is finding a rugged server that can support a large number of PCIe devices.
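As a rough sketch of what direct device-to-device communication can look like from the GPU side, the example below checks whether a GPU reports GPUDirect RDMA support and allocates a device buffer that an FPGA’s DMA engine could target. The specific attribute and the peer-memory step are assumptions about a typical Linux/CUDA software stack, not an OSS-specific API.

```c
// Minimal sketch (CUDA runtime C API): prepare a GPU buffer that an FPGA DMA
// engine could write into directly, bypassing host memory and the CPU.
// Assumes CUDA 11.3+ for cudaDevAttrGPUDirectRDMASupported and an FPGA
// vendor driver that understands NVIDIA's peer-memory interface.
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int dev = 0, rdma_ok = 0;
    cudaSetDevice(dev);

    // Query whether this GPU/platform reports GPUDirect RDMA capability.
    cudaDeviceGetAttribute(&rdma_ok, cudaDevAttrGPUDirectRDMASupported, dev);
    printf("GPUDirect RDMA supported: %s\n", rdma_ok ? "yes" : "no");

    // Allocate the device buffer the FPGA would DMA into.
    void *gpu_buf = NULL;
    size_t bytes = 64 << 20;               /* 64 MiB capture buffer */
    if (cudaMalloc(&gpu_buf, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // In a real system, the FPGA driver would pin/map gpu_buf for DMA
    // (e.g., via the nvidia-peermem kernel module); that step is vendor
    // specific and omitted here.

    cudaFree(gpu_buf);
    return 0;
}
```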
Added to the above challenges are the mechanical and power requirements of GPUs like the NVIDIA A6000, which are popular in ISR applications. The A6000 is double-wide and requires 300 watts, taxing the thermal and power capabilities of many servers.
OSS provides practical solutions to meet the challenges of edge transportable environments – limited rack space, demanding power and thermal requirements, and rugged systems for harsh environments. Revisiting the requirement of 8 GPUs and 8 FPGAs behind the CPUs, OSS offers a rugged short-depth server, the 3U SDS. This rugged server is only 20” deep, making it ideal for cramped environments. It offers the flexibility of either Intel or AMD PCIe Gen4 CPUs, and features two removable canisters that each hold eight 2.5” SATA or NVMe drives.
To address the requirement of putting 8 GPUs and 8 FPGAs behind the CPU, OSS’s 4UP PCIe Gen4 expansion system can be connected to the server via x16 PCIe Gen4 links, with both copper and fiber cable options. The 3U SDS and 4UP expansion system are a powerful combination.
The 3U SDS motherboards use a custom OSS BIOS that supports in excess of 120 PCIe devices. So, in the use case above, two 4UPs can be attached to the 3U SDS server, one with 8 GPUs and the other with 8 FPGAs. This could be taken a step further, with two additional 4UPs allowing 16 GPUs and 16 FPGAs to be supported from a single CPU complex. Like the 3U SDS, the 4UP is ruggedized to handle harsh environments. The 4UP expansion system also has a variant with I/O slots on the front of the chassis, for situations where quick access to PCIe cards from the front of a rack is needed.
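As a quick illustration of what verifying that kind of device count can look like on a Linux host, the sketch below simply counts the PCIe functions the kernel has enumerated under /sys/bus/pci/devices after the expansion chassis are cabled up. It is a generic sysfs walk, not an OSS utility.

```c
// Minimal sketch: count PCIe functions the Linux kernel has enumerated.
// Useful after cabling up expansion chassis to confirm the BIOS/OS saw
// every GPU, FPGA, switch, and bridge. Generic sysfs, not OSS-specific.
#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    DIR *d = opendir("/sys/bus/pci/devices");
    if (!d) {
        perror("opendir");
        return 1;
    }

    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") && strcmp(e->d_name, ".."))
            count++;                 /* each entry is one PCIe function */
    }
    closedir(d);

    printf("PCIe functions enumerated: %d\n", count);
    return 0;
}
```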
Another strength of the 3U SDS and 4UP expansion system is their versatility in adapting to the requirements of the ISR application. The 3U SDS can stand alone as a powerful compute node with multiple GPUs and/or FPGAs installed in it, or multiple 4UP expansion systems can be attached to it. OSS has designed and built compute node configurations with over 20 GPUs and 30 FPGAs connected to one server.
OSS remains at the forefront of leading-edge PCIe products. Later in 2022, both the 3U SDS and the 4UP expansion system will be available in PCIe Gen5 versions.
The rugged edge computing landscape is becoming increasingly complex, with new generations of technologies, such as the latest AI-focused GPUs, being released annually rather than every 2-3 years. Whether the end application is commercial or defense, rugged edge servers must not only deliver cutting-edge compute performance but also withstand extreme environmental conditions.
When the PCI-SIG formally added support for 675W add-in card devices to the PCI Express Card Electromechanical (CEM) specification in August 2023, NVIDIA’s most powerful CEM GPU, the NVIDIA H100 80GB, had a maximum power consumption of 350W. While some devices were starting to push the limits of datacenter thermal design, high-density systems built from many 675W devices seemed like a distant reality. However, with the power ceiling raised and the need for higher-performing GPUs skyrocketing, the industry quickly came out with devices taking full advantage of the new specification. NVIDIA soon replaced the H100 80GB with the H100 NVL, increasing power density to 400W. While this small jump was manageable for existing installations, NVIDIA then went all-in with the H200 NVL, released in late 2024 at 600W. The rapid transition from 350W to 600W has put power and cooling technologies in the spotlight in a race to solve this next-generation challenge.
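As one hedged example of what keeping an eye on 600W-class cards can look like in software, the sketch below uses NVIDIA’s NVML C API to report each GPU’s power draw and temperature, the two numbers a power and cooling design ultimately has to hold in check. It is a generic monitoring snippet, not tied to any particular server.

```c
// Minimal sketch: read per-GPU power draw and temperature via NVML.
// Assumes the NVIDIA driver and libnvidia-ml are installed; link with -lnvidia-ml.
#include <nvml.h>
#include <stdio.h>

int main(void) {
    if (nvmlInit_v2() != NVML_SUCCESS) {
        fprintf(stderr, "NVML init failed\n");
        return 1;
    }

    unsigned int count = 0;
    nvmlDeviceGetCount_v2(&count);

    for (unsigned int i = 0; i < count; i++) {
        nvmlDevice_t dev;
        unsigned int mw = 0, temp_c = 0;

        nvmlDeviceGetHandleByIndex_v2(i, &dev);
        nvmlDeviceGetPowerUsage(dev, &mw);                        /* milliwatts */
        nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp_c);

        printf("GPU %u: %.1f W, %u C\n", i, mw / 1000.0, temp_c);
    }

    nvmlShutdown();
    return 0;
}
```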