A previous blog post discussed innovative AI hardware options available to the US Government and its allies for packing a variety of AI applications into a military vehicle on the move. This post takes a deeper dive into the features and benefits that PCIe-based AI Transportable servers focused on military AI provide, and that VPX and Tier 1 server manufacturers lack.
The need to keep US and allied troops out of harm’s way, while still pursuing battlefield superiority, increasingly requires battlefield assets throughout the military theater to become fully autonomous. Currently, most unmanned military vehicles are controlled remotely, but the military is expanding the role of autonomy within surface ships, submarines, aircraft, and land vehicles to identify and act on current and future threats. If autonomous navigation were the only role of the vehicle, then a low-power, embedded VPX system might be able to handle that task. However, it is increasingly common for a single vessel to run multiple AI applications simultaneously, concurrently with autonomous navigation, and VPX technology quickly fails to meet the resulting computational performance requirements of autonomous military AI. Conversely, a PCIe-based server from a Tier 1 OEM such as Dell, HP, or IBM may have enough computational horsepower to run a full suite of military AI applications, but a traditional datacenter server has no chance of surviving the harsh environments a military vehicle is likely to encounter.
Enter AI Transportables.
One Stop Systems’ line of AI Transportable hardware is designed from the ground up to take the latest technologies found in the most powerful datacenters and optimize them for performance and reliability in land, sea, and air vehicles that require an entire suite of military AI inference applications. This process encompasses more than taking a datacenter server and repackaging it into a semi-rugged design, as some companies do. Nor is AI Transportable hardware built on the concept of taking an embedded VPX design and enhancing it by customizing the standard; our hardware is not designed to break power barriers imposed by the VPX standard, and we do not use other tricks to squeeze more performance out of VPX. An AI Transportable server is purpose-built with every consideration of the rigors of land, sea, and air vehicles. These considerations include (but are not limited to) unique power inputs, specialized cooling requirements, networking, dense I/O scalability, and single-pane-of-glass out-of-band monitoring, management, and control, all centered around handling multiple AI applications in a vehicle.
Figure 1 shows how AI Transportable hardware checks all the boxes required by military AI applications as compared to a typical Tier 1 OEM server, semi-rugged server and VPX.
When multiple AI applications need to run simultaneously on a single vehicle, one capability is crucial to the PCIe-based AI Transportable advantage over VPX: Multi-Instance GPU (MIG). A VPX GPU does not support MIG and can run only one small native AI inference application at a time. There are middleware vendors that can add a virtualization layer to a VPX GPU and divide the single GPU into several virtual instances, but this comes with the undesirable drawback of increased latency. Conversely, on an AI Transportable server such as the OSS Rigel Edge Supercomputer, four NVIDIA A100 80GB GPUs can run a single large AI inference application using NVLink, or each A100 GPU can be split into seven MIG instances, allowing up to 28 separate native AI inference applications to run simultaneously on a single server (Figure 1). MIG also has the added benefit of allowing the user to individually configure the ideal amount of GPU Tensor cores, memory, and cache for each application (see the NVIDIA MIG blog here). In addition, each MIG instance can be securely isolated with confidential computing at the hardware and hypervisor level in the latest available GPUs.
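The 28-application figure follows directly from the MIG limits. A minimal sketch (the function name is ours, not an NVIDIA API), assuming each A100 80GB is partitioned into its maximum of seven single-slice (1g.10gb) MIG instances:

```python
# Hypothetical model of MIG capacity on a 4x A100 80GB system.
# Per NVIDIA's published MIG profiles, an A100 80GB supports up to
# seven 1g.10gb instances; each instance hosts one isolated application.

MIG_SLICES_PER_A100 = 7   # max 1g.10gb instances per A100 80GB
NUM_GPUS = 4              # A100 GPUs in the Rigel example


def max_concurrent_apps(num_gpus: int,
                        slices_per_gpu: int = MIG_SLICES_PER_A100) -> int:
    """One isolated inference application per MIG instance."""
    return num_gpus * slices_per_gpu


print(max_concurrent_apps(NUM_GPUS))  # 28
```

In practice the partitioning itself is done with `nvidia-smi mig` on the host; the sketch only captures the capacity arithmetic.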
An example of MIG serving an AI-intensive workload on a military vehicle is an autonomous drone (Figure 2). The drone could use the Rigel AI Transportable system with 28 available MIG instances: 10 allocated to the autonomous navigation AI function, while nine other applications (threat detection, object recognition, countermeasure deployment, autonomous weapon selection, predictive maintenance, terrain mapping, RF spectrum analysis, multi-vehicle orchestration, and natural language processing) each utilize two MIG instances, for a total of 28, on the same server, all sharing data from a single set of input sensors via the large shared system memory space. VPX would need to split these tasks across multiple blades, each with its own memory domain, introducing network latency just to share the critical sensor information and slowing AI reaction time.
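The drone's MIG budget can be written out explicitly to confirm the allocation adds up. This is an illustrative sketch of the example above, not a real scheduler configuration; the application names come from the text:

```python
# Hypothetical MIG budget for the autonomous drone example:
# 10 instances for navigation, 2 each for nine supporting applications.

mig_budget = {
    "autonomous_navigation": 10,
    "threat_detection": 2,
    "object_recognition": 2,
    "countermeasure_deployment": 2,
    "autonomous_weapon_selection": 2,
    "predictive_maintenance": 2,
    "terrain_mapping": 2,
    "rf_spectrum_analysis": 2,
    "multi_vehicle_orchestration": 2,
    "natural_language_processing": 2,
}

TOTAL_MIG_INSTANCES = 28  # 4 GPUs x 7 instances each

# The ten applications exactly consume the 28 available instances.
assert sum(mig_budget.values()) == TOTAL_MIG_INSTANCES
print(f"{len(mig_budget)} applications across {TOTAL_MIG_INSTANCES} MIG instances")
```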
Now, VPX still plays a vital role for the government and its allies in specific low-compute applications, ultra-rugged deployments, or where only a single AI inference application is required. However, for program managers open to disruptive technology, AI Transportable servers can be a preferable alternative to VPX hardware for rugged HPC applications: AI Transportable hardware can be reliably deployed in 90% of ultra-rugged environments, which may require higher compute performance than current VPX hardware can offer (see the blog post here). The most powerful GPU announced on the VPX roadmap is the dual NVIDIA A4500, which will offer 35 TFLOPS of FP32 with 32GB of RAM total and a PCIe interconnect between GPUs. However, the A4500 VPX GPU is intended more for professional graphics than for performing AI inference in vehicles. By contrast, the OSS Rigel Rugged Supercomputer is designed for AI Transportable applications and uses four full-featured A100 GPUs delivering 80 TFLOPS of FP32. For AI inference applications such as autonomous navigation, the Rigel Rugged Supercomputer offers over 4,800 TOPS of INT8 performance with 80GB of RAM per GPU, an NVLink interconnect, and MIG support that can run up to 28 AI applications at one time across the GPUs (Figure 3).
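Using only the FP32 figures quoted above (vendor-quoted numbers, not independent benchmarks), the throughput gap between the two platforms works out as follows:

```python
# Comparing the FP32 figures quoted in this post: the dual-A4500 VPX
# roadmap part versus the 4x A100 Rigel. Both values are taken from
# the text above, not measured.

vpx_dual_a4500_fp32_tflops = 35.0   # dual NVIDIA A4500 VPX (roadmap)
rigel_4x_a100_fp32_tflops = 80.0    # OSS Rigel, four A100 GPUs

ratio = rigel_4x_a100_fp32_tflops / vpx_dual_a4500_fp32_tflops
print(f"Rigel offers roughly {ratio:.1f}x the FP32 throughput")  # ~2.3x
```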
To take the comparison between VPX hardware and AI Transportable servers one step further, the VPX bladed architecture takes an extremely inefficient approach to scaling performance: adding more blades across a networked backplane, or adding more systems to service additional applications. Networking VPX blades together on a backplane requires a separate memory domain for each blade and a method for sharing memory and caches between GPUs, FPGAs, and processors. AI Transportable servers utilize PCIe standards, meaning RDMA moves memory data between processing modules over PCIe and Ethernet, and CXL will create a coherent cache between these modules when PCIe Gen5 becomes generally available in early 2023. Additionally, VPX announcements tend to lag AI Transportable and Tier 1 OEM announcements by three years or more; the most current VPX product roadmap announcements define upcoming products as PCIe Gen4 only, which does not support CXL cache coherency. AI Transportable servers like the OSS Rigel Edge Supercomputer take memory sharing to the datacenter level by using RDMA over 256Gb PCIe or 200Gb Ethernet, sharing up to 4TB of main memory instead of dispersing memory across multiple processing blades, and cache-coherent NVLink between GPU processing modules is shipping in AI Transportable servers today. By mid-2023, AI Transportable servers will utilize PCIe Gen5 with speeds up to 512Gb and Ethernet networking speeds of up to 400Gb. Additionally, with H100 GPU support, AI Transportable servers will offer 6x the inference capacity and support both NVLink and CXL cache coherency to minimize AI workload latency.
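A quick sanity check on where the 256Gb and 512Gb figures come from: they are the raw per-direction link rate of a x16 PCIe slot (per-lane transfer rate times lane count). The helper below is our own illustration; note that effective payload bandwidth is slightly lower after 128b/130b encoding overhead:

```python
# Raw per-direction PCIe link rate for an x16 slot. At Gen3 and later,
# 1 GT/s corresponds to 1 Gb/s of raw line rate per lane; 128b/130b
# encoding shaves off ~1.5% in usable payload bandwidth.

def pcie_raw_gbps(transfer_rate_gt: float, lanes: int = 16) -> float:
    """Raw link rate in Gb/s per direction, before encoding overhead."""
    return transfer_rate_gt * lanes


gen4 = pcie_raw_gbps(16.0)  # PCIe Gen4: 16 GT/s per lane
gen5 = pcie_raw_gbps(32.0)  # PCIe Gen5: 32 GT/s per lane
print(gen4, gen5)  # 256.0 512.0
```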
Finally, there is a significant price differential between VPX and AI Transportable servers. In the MQ-9 drone example above, a multi-VPX solution under consideration, which could handle only 5 of the 10 AI inference applications running on the drone, would cost over $500,000. An equivalent AI Transportable server that handles all 10 AI inference applications, with plenty of room to spare, carries a price tag under $250,000. It is easy to see why AI Transportable servers are attractive across the military AI landscape: they provide the necessary reliability in harsh environments without compromising on datacenter-class performance, and they can keep up with the AI processing demands of the Pentagon, US Federal Agencies, allied militaries, and the Chief Digital and Artificial Intelligence Officer (CDAO).