Preservation of VPX Investment and AI Transportable GPU Platforms

May 24, 2022

By Jim Reardon, One Stop Systems

The OpenVPX standards occupy a special place in deployed military compute platforms. Designed as a successor to VME64, OpenVPX (ANSI/VITA 65) and related standards such as SOSA have made it possible to package COTS systems that meet the challenging environmental and electrical requirements of military vehicles across all services. AI Transportable systems incorporate GPUs and switched fabrics that were not envisioned by the existing standard, and they threaten to force significant new investment in product development before the deployment of next-generation systems such as C4ISR.

AI Transportable systems, such as One Stop Systems' Rigel, achieve extraordinary compute performance through multiple GPUs and new-generation switched fabrics such as PCIe Gen4 and NVIDIA® NVLink™. Today, these elements do not map well onto the legacy standard. The significant challenges include power supply, conduction cooling, and an OpenVPX backplane definition that restricts PCIe lane capacity. To preserve the significant industry and government investment in OpenVPX products, a hybrid solution is needed to gain the performance benefits of GPUs, such as NVIDIA® A100 Tensor Core GPUs, in next-generation deployments.
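To give a sense of why PCIe lane capacity matters, the short Python sketch below computes the theoretical one-direction bandwidth of a PCIe link from its generation and lane count. The per-lane signalling rates and the 128b/130b encoding are standard PCIe figures; the function name and the link widths shown are illustrative assumptions, and real throughput is lower once protocol overhead is included.

```python
# Raw signalling rate per lane in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

# Gen3 through Gen5 use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def link_gb_per_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY / 8 * lanes


# Illustrative link widths only; not taken from any backplane profile.
for lanes in (4, 8, 16):
    print(f"Gen4 x{lanes}: {link_gb_per_s(4, lanes):.1f} GB/s")
```

Under these assumptions a Gen4 x16 link tops out near 31.5 GB/s per direction, and halving the lane count halves that figure, which is why a backplane that restricts lanes also restricts how fast a GPU can be fed.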

With the introduction of Rigel, we can now offer product extensions that support existing OpenVPX and SOSA-compliant sensor I/O while offering direct access to the performance of Rigel GPUs where new and more demanding applications call for it. Building on OSS experience with PCIe Gen4 (and now Gen5) expansion, such a combination is now architecturally feasible. It offers reduced sensor latency, with the potential to bypass host memory buffers completely, while preserving two decades of industry and government investment in existing SOSA interface solutions.

At the heart of a Rigel system is a PCIe Gen4 switched fabric with the capacity to support the NVIDIA HGX™ module, which itself consists of four NVIDIA® A100 GPUs. Each GPU features an external 16-lane PCIe connection, as well as a private NVIDIA NVLink connection to the other GPUs. Rigel manages this complexity with a versatile PCIe switched fabric and related management software that allow dynamic or fixed lane routing between GPUs, hosts, memory, and I/O according to application demands. The PCIe requirements of SOSA-compliant accessories in VPX format are met by extending the PCIe Gen4 host bus adapter expansion technology developed by OSS.

Figure 1 – Conceptual Hybrid Rigel
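The idea of fixed versus dynamic lane routing can be pictured with a toy model. The Python sketch below is not the Rigel management software or its API; the `Fabric` class, its method names, the 96-lane budget, and the endpoint names are all hypothetical, chosen only to show how pinned GPU links and reassignable I/O links might share one lane budget.

```python
from dataclasses import dataclass, field

# Link widths accepted by this toy model (hypothetical restriction).
VALID_WIDTHS = (1, 2, 4, 8, 16)


@dataclass
class Fabric:
    """Toy lane budget for a PCIe switch; purely illustrative."""
    total_lanes: int
    routes: dict = field(default_factory=dict)  # name -> (lanes, pinned)

    def free_lanes(self) -> int:
        return self.total_lanes - sum(l for l, _ in self.routes.values())

    def connect(self, name: str, lanes: int, pinned: bool = False) -> None:
        """Reserve lanes for an endpoint; pinned routes model fixed routing."""
        if name in self.routes:
            raise ValueError(f"{name} is already connected")
        if lanes not in VALID_WIDTHS:
            raise ValueError(f"unsupported link width x{lanes}")
        if lanes > self.free_lanes():
            raise RuntimeError("not enough free lanes")
        self.routes[name] = (lanes, pinned)

    def release(self, name: str) -> None:
        """Free a dynamic route; pinned routes cannot be released."""
        _, pinned = self.routes[name]
        if pinned:
            raise RuntimeError(f"{name} is a fixed route")
        del self.routes[name]


fabric = Fabric(total_lanes=96)            # hypothetical switch capacity
for i in range(4):
    fabric.connect(f"gpu{i}", 16, pinned=True)   # four x16 GPU links
fabric.connect("host", 16)
fabric.connect("vpx_sensor_a", 8)          # hypothetical VPX sensor card

# Application demand changes: swap the x8 sensor for an x16 one.
fabric.release("vpx_sensor_a")
fabric.connect("vpx_sensor_b", 16)
print(fabric.free_lanes())
```

In this sketch the GPU links stay fixed while the sensor link is reassigned on demand, which is the general behavior the paragraph above describes; how a real fabric partitions ports is product specific.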

Through OpenVPX extensions to Rigel, new levels of low-latency sensor acquisition are possible. Sensor data can move directly into GPU memory without transiting host memory, which reduces latency and unlocks new levels of sensor bandwidth. With this architecture, the role of the host processor gives way to the GPUs, which offer greater computing power, flexible data formats, and, of course, parallelism.
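A rough way to see the benefit of skipping host memory is to compare transfer times for the two paths. The Python sketch below is a deliberately simplified model: it assumes a store-and-forward staged path, ignores switch latency, protocol overhead, and pipelining, and uses a hypothetical Gen4 x8 sensor card and a hypothetical 64 MB frame. None of these values are measurements of Rigel.

```python
def transfer_ms(size_mb: float, bw_gb_per_s: float) -> float:
    """Time to move size_mb over one link; MB divided by GB/s gives ms."""
    return size_mb / bw_gb_per_s


def staged_ms(size_mb: float, sensor_bw: float, gpu_bw: float) -> float:
    """Sensor -> host memory, then host memory -> GPU, one after the other."""
    return transfer_ms(size_mb, sensor_bw) + transfer_ms(size_mb, gpu_bw)


def direct_ms(size_mb: float, sensor_bw: float, gpu_bw: float) -> float:
    """Sensor -> GPU memory in one pass, limited by the slower link."""
    return transfer_ms(size_mb, min(sensor_bw, gpu_bw))


# Theoretical Gen4 rates: 16 GT/s per lane with 128b/130b encoding.
PER_LANE_GB_S = 16.0 * (128 / 130) / 8
SENSOR_BW = PER_LANE_GB_S * 8    # hypothetical x8 sensor card
GPU_BW = PER_LANE_GB_S * 16      # x16 GPU link
FRAME_MB = 64.0                  # hypothetical sensor frame size

staged = staged_ms(FRAME_MB, SENSOR_BW, GPU_BW)
direct = direct_ms(FRAME_MB, SENSOR_BW, GPU_BW)
print(f"staged: {staged:.2f} ms, direct: {direct:.2f} ms")
```

With these assumed link widths the staged path takes 1.5 times as long as the direct path. The exact ratio depends entirely on the chosen widths and on effects the model leaves out, so it should be read as an illustration of the trend rather than a prediction.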

In AI Transportables, GPUs and switched fabrics will play a key role in lifting legacy application performance limits. Combining Rigel with OSS host bus adapter and extender technology extends the life of legacy SOSA interfaces. With the managed switch fabric at the heart of these applications, even the GPU elements can be replaced with newer generations while other elements of the system are preserved for reuse.

If you would like to learn more about hybrid Rigel, please get in touch!
