High Density GPUs & FPGAs in Edge Environments

August 23, 2022

By Tom Fries, Government/Defense Sales Manager

A June 4, 2020 congressional report, “Intelligence, Surveillance, and Reconnaissance Design for Great Power Competition”, reinforced how important it is for the United States military to keep developing ISR (intelligence, surveillance, and reconnaissance) capabilities as threats grow around the world. The report states that the goal is to “make rapid sense of that data; securely deliver that data to weapons, weapon systems, and commanders; and possess a workforce that can execute its mission in competition and combat, at a pace greater than the enemy.”

ISR applications often combine FPGAs and GPUs because the two work well together in low-latency environments, searching through real-time data and providing immediate intelligence to military personnel. ISR equipment often sits in edge transportable tactical command centers near the battlefield. In these scenarios, the designers of the ISR equipment face multiple challenges:

  • Optimizing the number of FPGAs and GPUs in limited space
  • Meeting the power requirements of GPUs, and potentially FPGAs, with limited power availability
  • Installing high density GPUs and FPGAs behind CPUs with the limitations of server BIOS
  • Accomplishing the above goals with equipment that can meet the stress of an edge transportable environment 

In this post, I will review the unique products One Stop Systems (OSS) offers to meet these challenges. 

Designing the ISR system so that the GPUs and FPGAs communicate directly, bypassing the CPU and reducing latency, is important to meeting critical response-time requirements. Also, because edge transportable tactical command centers have limited rack space, valuable space can be saved by increasing the density of GPUs and FPGAs behind each CPU, for example 8 GPUs and 8 FPGAs behind a single CPU complex. This creates two challenges, however. The first is that most computer BIOSes are not robust enough to support 16 or more PCIe devices from a CPU. The second is finding a rugged server that can support a large number of PCIe devices.
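To see how many PCIe functions a host has actually enumerated, a Linux system exposes them under /sys/bus/pci/devices, each with a "class" file holding its 24-bit PCI class code. The sketch below is a minimal, hypothetical helper (the function name and grouping are assumptions, not an OSS tool) that tallies functions by base class. Note that FPGA cards report whatever class code their design programs, so the "processing accelerator" bucket is only one place they may appear.

```python
from collections import Counter
from pathlib import Path

# Base-class codes (top byte of the 24-bit PCI class code).
# Only a few are named here; everything else is grouped as "other".
BASE_CLASS_NAMES = {
    0x03: "display controller",      # includes VGA and 3D controllers (most GPUs)
    0x06: "bridge",                  # host bridges, PCI-to-PCI bridges, switch ports
    0x12: "processing accelerator",  # some accelerator/FPGA designs (assumption)
}


def count_pci_functions(sysfs_root="/sys/bus/pci/devices"):
    """Tally enumerated PCI functions by base class from a sysfs-style tree."""
    counts = Counter()
    root = Path(sysfs_root)
    if not root.is_dir():
        return counts  # not Linux, or sysfs not mounted
    for dev in root.iterdir():
        class_file = dev / "class"
        if not class_file.is_file():
            continue
        # The file holds text such as "0x030000".
        class_code = int(class_file.read_text().strip(), 16)
        base_class = class_code >> 16
        counts[BASE_CLASS_NAMES.get(base_class, "other")] += 1
    return counts


if __name__ == "__main__":
    for name, n in sorted(count_pci_functions().items()):
        print(f"{name}: {n}")
```

Running this before and after attaching an expansion chassis gives a rough check that every card was enumerated by the BIOS.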

Added to the above challenges are the mechanical and power requirements of GPUs like the NVIDIA A6000, which is popular in ISR applications. The A6000 is a double-wide card that requires 300 watts, taxing the thermal and power capabilities of many servers.
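The power arithmetic behind that concern is simple to sketch. The example below uses the 300 watt figure quoted above; the function names, the 3000 W supply, and the 800 W of other load are hypothetical values chosen only for illustration, not specifications of any OSS product.

```python
A6000_POWER_W = 300  # per-card figure quoted in the text


def gpu_power_w(gpu_count, watts_per_gpu=A6000_POWER_W):
    """Total GPU power draw for a given number of cards."""
    return gpu_count * watts_per_gpu


def fits_power_budget(gpu_count, available_w, other_load_w=0,
                      watts_per_gpu=A6000_POWER_W):
    """True if the GPUs plus any other load fit within the available power."""
    return gpu_power_w(gpu_count, watts_per_gpu) + other_load_w <= available_w


if __name__ == "__main__":
    print(gpu_power_w(8))   # 2400
    print(gpu_power_w(16))  # 4800
    # Hypothetical chassis: 3000 W available, 800 W for FPGAs, fans, and so on.
    print(fits_power_budget(8, available_w=3000, other_load_w=800))  # False
```

Eight 300 watt cards alone account for 2,400 watts before any FPGAs, CPUs, or cooling are counted, which is why power delivery drives the chassis choice.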

OSS provides practical solutions to meet the challenges of edge transportable environments: limited rack space, robust power and thermal requirements, and rugged systems for harsh environments. Revisiting the requirement of 8 GPUs and 8 FPGAs behind CPUs, OSS offers a rugged short-depth server, the 3U SDS. This rugged server is only 20” deep, making it ideal for cramped environments. It offers the flexibility of either Intel or AMD CPUs with PCIe Gen4, and features two removable canisters that each hold eight 2.5” SATA or NVMe drives.

3U SDS Server

To address the requirement of putting 8 GPUs and 8 FPGAs behind the CPU, OSS’s 4UP PCIe Gen4 expansion system can be connected to the server via x16 Gen4 links (both copper and fiber cable options). The 3U SDS and 4UP expansion system are a powerful combination.  

The 3U SDS motherboards use a custom OSS BIOS that supports in excess of 120 PCIe devices. So, in our above use case, two 4UPs can be attached to the 3U SDS server, one with 8 GPUs and the other with 8 FPGAs. This could be taken a step further by adding two more 4UPs, for a total of 16 GPUs and 16 FPGAs supported from a single CPU complex. Like the 3U SDS, the 4UP is ruggedized to handle harsh environments. The 4UP expansion system also has a variant with I/O slots on the front of the chassis, for situations where quick access to PCIe cards from the front of a rack is needed.
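The two configurations described above can be tallied with a short sketch. The class and function names are hypothetical, and the count covers accelerator cards only: on real hardware, bridges, switch ports, and multi-function devices also consume PCIe device numbers, which the overhead_devices parameter stands in for. The 120 figure is the floor quoted in the text ("in excess of 120").

```python
from dataclasses import dataclass

BIOS_DEVICE_FLOOR = 120  # text says the OSS BIOS supports "in excess of 120"


@dataclass(frozen=True)
class Expansion:
    """One expansion chassis populated with a single card type (illustrative)."""
    card_type: str
    cards: int


def tally(expansions):
    """Total cards per type across all attached expansion chassis."""
    totals = {}
    for e in expansions:
        totals[e.card_type] = totals.get(e.card_type, 0) + e.cards
    return totals


def within_limit(expansions, overhead_devices=0, limit=BIOS_DEVICE_FLOOR):
    """True if cards plus assumed bridge/switch overhead stay within the limit."""
    return sum(e.cards for e in expansions) + overhead_devices <= limit


if __name__ == "__main__":
    two_chassis = [Expansion("GPU", 8), Expansion("FPGA", 8)]
    four_chassis = two_chassis + two_chassis
    print(tally(two_chassis))         # {'GPU': 8, 'FPGA': 8}
    print(tally(four_chassis))        # {'GPU': 16, 'FPGA': 16}
    print(within_limit(four_chassis)) # True
```

Even the four-chassis case places 32 cards against a limit above 120, which leaves headroom for the intermediate switch and bridge devices this sketch does not model.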

4UP Expansion System


Another strength of the 3U SDS and 4UP expansion system is how flexibly they can be configured to the requirements of the ISR application. The 3U SDS can stand alone as a powerful compute node with multiple GPUs and/or FPGAs installed in it. Alternatively, multiple 4UP expansion systems can be attached to the 3U SDS. OSS has designed and built compute node configurations with over 20 GPUs and 30 FPGAs connected to one server.

OSS remains the leader in cutting-edge PCIe products. Later in 2022, both the 3U SDS and 4UP expansion system will be available with PCIe Gen5.
