By Braden Cooper, Director of Products
The promise of next-generation autonomous systems and advanced sensor fusion is ultimately bottlenecked by one thing: data throughput. Our industry is generating an absolute torrent of high-resolution, real-time input, from 4K multi-spectral cameras to high-resolution LiDAR arrays, that must be ingested, processed, and acted upon without delay. In environments that demand true real-time response (where a decision latency of milliseconds can mean the difference between mission success and failure), traditional server architecture falters. At that point, the physics of moving data starts to matter more than raw processing power.
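To put the torrent in numbers, the sketch below estimates the raw, uncompressed output of a single camera. The 3840x2160 resolution corresponds to 4K, but the 16-bit sample depth, 30 frames/s, and four spectral bands are illustrative assumptions rather than the specifications of any particular sensor. Real camera links may also carry packed or compressed data.

```python
"""Back-of-envelope raw data rates for uncompressed camera streams.

All sensor parameters are illustrative assumptions, not vendor specifications.
"""


def camera_rate_bytes_per_s(width: int, height: int, bytes_per_pixel: int,
                            fps: int, bands: int = 1) -> int:
    """Raw (uncompressed) output of one camera in bytes per second."""
    return width * height * bytes_per_pixel * fps * bands


# Hypothetical 4K sensor: 3840x2160 pixels, 16-bit (2-byte) samples, 30 frames/s.
mono_4k = camera_rate_bytes_per_s(3840, 2160, 2, 30)
# The same sensor capturing four spectral bands (assumed band count).
multispectral_4k = camera_rate_bytes_per_s(3840, 2160, 2, 30, bands=4)

GB = 1e9  # decimal gigabyte, matching how link bandwidth is usually quoted
print(f"4K mono:   {mono_4k / GB:.3f} GB/s")
print(f"4K 4-band: {multispectral_4k / GB:.3f} GB/s")
print(f"4-band cameras per 200 GB/s: {200 * GB / multispectral_4k:.0f}")
```

Under these assumptions, one 4-band camera produces roughly 2 GB/s. A 200 GB/s budget therefore corresponds to about a hundred such streams before LiDAR and other sensors are counted.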
We consistently see customer specifications that require sustained, bidirectional throughput approaching 200 GB/s for a single edge computing node. That is an extremely high figure: today, achieving it often requires aggregating four PCIe Gen5 x16 links in a large enterprise server chassis. Because most high-density servers struggle to expand beyond this in terms of available PCIe lanes, we look to the bandwidth density of PCIe Gen6 to consolidate this throughput into a smaller, rugged, edge-focused form factor. This architectural consolidation is key to realistically breaking through this performance level for real-time compute.
At these rates, the host CPU, once the brain of the system, becomes its most critical bottleneck. In what is known as the 'CPU Tax,' the primary processor spends precious cycles simply marshalling data between I/O devices and accelerators, and every host memory copy adds latency. In rugged high-performance computing, where power and thermal constraints are already severe, this host-level inefficiency is an architectural flaw we must engineer out of the system.
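The link counts above can be sanity-checked from the published per-lane rates: 32 GT/s with 128b/130b encoding for Gen5, and 64 GT/s for Gen6. The minimal sketch below treats the 200 GB/s target as a per-direction requirement (an assumption) and ignores protocol overhead (TLP and DLLP headers, plus Gen6 FLIT CRC/FEC). The results are therefore raw upper bounds, not achievable payload rates.

```python
import math

# Per-lane signaling rate (GT/s) and line-encoding efficiency per generation.
# Gen5 uses 128b/130b encoding. Gen6 uses PAM4 signaling in FLIT mode, which
# drops 128b/130b; its FLIT CRC/FEC and all TLP/DLLP protocol overhead are
# ignored here, so these figures are raw upper bounds.
PCIE_GENS = {
    "Gen5": (32.0, 128 / 130),
    "Gen6": (64.0, 1.0),
}


def link_gbytes_per_s(gen: str, lanes: int = 16) -> float:
    """Raw per-direction bandwidth of one link in GB/s (1 GB = 1e9 bytes)."""
    gt_per_s, efficiency = PCIE_GENS[gen]
    return gt_per_s * efficiency * lanes / 8  # one bit per transfer per lane


def links_needed(target_gb_s: float, gen: str, lanes: int = 16) -> int:
    """Smallest number of identical links whose raw bandwidth meets the target."""
    return math.ceil(target_gb_s / link_gbytes_per_s(gen, lanes))


TARGET = 200.0  # GB/s, treated here as a per-direction requirement (assumption)
for gen in PCIE_GENS:
    print(f"{gen} x16: {link_gbytes_per_s(gen):6.1f} GB/s per direction, "
          f"{links_needed(TARGET, gen)} link(s) for {TARGET:.0f} GB/s")
```

With these assumptions, a Gen5 x16 link tops out near 63 GB/s per direction, so four are needed. A Gen6 x16 link reaches 128 GB/s raw, so two suffice, which is consistent with a dual-path Gen6 design.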
To solve this data torrent problem, we must consider not only faster components but also a fundamentally different data-flow architecture. The critical enabler in the switch fabric, the digital plumbing of the system, is PCIe Gen6.
With Gen6, the raw bandwidth leap lets internal fabric capacity keep pace with exploding sensor ingest, delivering that data to the accelerator complex. This is not merely an incremental speed bump; it is an architectural enabler. For a platform to sustain 200 GB/s, we must strategically deploy a dual-path Gen6 ingest layer coupled with a low-latency GPU switch. This pipe has the headroom to absorb, for example, the upcoming shift to native GMSL cameras and LiDAR sensors (projected to add 10 Gbps of aggregate sensor bandwidth for high-end autonomous vehicles). Without PCIe Gen6, this torrent of data would immediately saturate the interconnect, forcing throttling and unacceptable latency. The Gen6 fabric is what translates theoretical maximums into guaranteed, sustained performance in the field.
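One way to reason about a dual-path ingest layer is as a load-balancing problem: each sensor stream must land on one of two Gen6 x16 paths without pushing either past a planning budget. The sketch below uses a simple longest-first greedy heuristic, a standard load-balancing approach and not necessarily how any OSS platform assigns streams. Every stream rate and the 80% derating are hypothetical. The exception is the 10 Gbps GMSL/LiDAR increment from the text, which is converted to GB/s to keep units consistent.

```python
import heapq

GEN6_X16_RAW_GB_S = 128.0  # raw per-direction line rate, protocol overhead ignored
PLANNING_DERATE = 0.8      # hypothetical policy: plan to 80% of raw bandwidth


def gbps_to_gb_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8


def assign_to_paths(streams: dict[str, float], n_paths: int = 2):
    """Greedy longest-first assignment of streams (GB/s) to the least-loaded path.

    Returns one (load_gb_s, [stream names]) tuple per path.
    """
    heap = [(0.0, i) for i in range(n_paths)]  # (current load, path index)
    members: list[list[str]] = [[] for _ in range(n_paths)]
    loads = [0.0] * n_paths
    for name, rate in sorted(streams.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)  # least-loaded path so far
        loads[idx] = load + rate
        members[idx].append(name)
        heapq.heappush(heap, (loads[idx], idx))
    return list(zip(loads, members))


# Hypothetical aggregate streams in GB/s, plus the projected 10 Gbps increment.
streams = {
    "camera_bank_a": 70.0,
    "camera_bank_b": 60.0,
    "lidar_array": 40.0,
    "radar_and_misc": 20.0,
    "gmsl_lidar_increment": gbps_to_gb_s(10.0),  # 10 Gbps == 1.25 GB/s
}

budget = GEN6_X16_RAW_GB_S * PLANNING_DERATE
paths = assign_to_paths(streams)
for i, (load, names) in enumerate(paths):
    status = "ok" if load <= budget else "OVER BUDGET"
    print(f"path {i}: {load:6.2f} / {budget:.1f} GB/s [{status}] {names}")
```

Note the unit conversion: 10 Gbps is only 1.25 GB/s, so the projected sensor increment is small relative to the aggregate load. The margin that matters is the total across all streams on each path.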
The crucial step in eliminating the ‘CPU Tax’ and achieving predictable, sustained throughput is the implementation of a zero-copy architecture. This is where GPUDirect / RDMA technology provides the essential blueprint.
The legacy data path looks like this: Sensor → Ingest Card → CPU/System Memory → GPU Memory. Each stop is a source of latency and wasted cycles. GPUDirect / RDMA enables direct peer-to-peer data transfer, taking the host CPU and system memory out of the data path entirely. The ingest card, connected over PCIe Gen6, can push data straight into the GPU's memory space. This radically cuts latency, keeps the data in motion, and frees the CPU to focus on its actual jobs: control, high-level path planning, and operating system management. By eliminating host copies, the zero-copy approach lets processing begin as soon as data arrives, which is the central goal of a true real-time data processing architecture.
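The cost of the legacy path can be made concrete by counting how many times each payload byte crosses system memory. The sketch below compares three simplified paths: a bounce buffer in pageable memory (with an extra CPU copy into pinned staging memory), a bounce buffer in pinned memory, and a GPUDirect-style peer-to-peer transfer. The touch counts are modeling assumptions that ignore descriptor and control traffic, caches, and driver details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPath:
    """One ingest-to-GPU data path, described by its host-memory touches."""
    name: str
    host_writes: int  # times each payload byte is written to system memory
    host_reads: int   # times each payload byte is read from system memory
    cpu_copies: int   # CPU-driven memcpy passes over the payload


# Simplified models (assumptions, not measurements):
# - bounce_pageable: ingest DMA -> host buffer, CPU memcpy into a pinned
#   staging buffer, then DMA from the staging buffer to the GPU.
# - bounce_pinned: ingest DMA lands directly in pinned memory, then DMA to GPU.
# - p2p_direct: GPUDirect-style peer-to-peer DMA from ingest card to GPU memory.
PATHS = [
    DataPath("bounce_pageable", host_writes=2, host_reads=2, cpu_copies=1),
    DataPath("bounce_pinned", host_writes=1, host_reads=1, cpu_copies=0),
    DataPath("p2p_direct", host_writes=0, host_reads=0, cpu_copies=0),
]


def host_memory_traffic_gb_s(path: DataPath, ingest_gb_s: float) -> float:
    """System-memory bandwidth consumed by the payload alone (GB/s)."""
    return ingest_gb_s * (path.host_writes + path.host_reads)


INGEST_GB_S = 200.0
for p in PATHS:
    print(f"{p.name:16s} host DRAM traffic: "
          f"{host_memory_traffic_gb_s(p, INGEST_GB_S):6.0f} GB/s, "
          f"CPU copy passes: {p.cpu_copies}")
```

At 200 GB/s of ingest, the pinned bounce path already consumes 400 GB/s of host DRAM bandwidth and the pageable one 800 GB/s. The peer-to-peer path consumes none for the payload itself, which is the 'CPU Tax' that a zero-copy design removes.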

This architectural blueprint is what separates a system capable of handling a momentary data burst from one that can sustain peak performance for the full duration of a mission in the most demanding rugged high-performance computing environments.
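The difference between absorbing a burst and sustaining it comes down to a simple rate balance: if the pipeline drains data more slowly than it ingests, any staging buffer eventually overflows. The sketch below computes that time, assuming the buffer starts empty. The 64 GB buffer and the drain rates are hypothetical values chosen for illustration.

```python
import math


def seconds_until_overflow(buffer_gb: float, ingest_gb_s: float,
                           drain_gb_s: float) -> float:
    """Time until an initially empty staging buffer overflows.

    Returns math.inf when the pipeline drains at least as fast as it ingests,
    i.e. the rate can be sustained indefinitely.
    """
    surplus = ingest_gb_s - drain_gb_s  # GB/s accumulating in the buffer
    if surplus <= 0:
        return math.inf
    return buffer_gb / surplus


BUFFER_GB = 64.0  # hypothetical staging-buffer size
INGEST_GB_S = 200.0

for drain in (150.0, 190.0, 200.0):  # hypothetical sustained drain rates
    t = seconds_until_overflow(BUFFER_GB, INGEST_GB_S, drain)
    label = "sustainable" if math.isinf(t) else f"overflows after {t:.1f} s"
    print(f"drain {drain:.0f} GB/s: {label}")
```

With a 50 GB/s deficit, a 64 GB buffer rides out a burst for about 1.3 seconds. Only when the drain rate matches or exceeds ingest does the time become unbounded, which is what sustaining peak performance for a full mission requires.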
While GPUDirect and high-speed fabrics solve the immediate I/O bottleneck, the continuous increase in model size and the complexity of sensor fusion datasets point toward the next logical challenge: memory management.
As we scale these edge computing nodes, we are rapidly hitting the limits of local memory provisioning and access efficiency, which makes memory the next logical bottleneck to solve in the throughput challenge. Compute Express Link (CXL) is positioned to address it by enabling memory disaggregation and pooling. CXL allows processors and accelerators (CPUs and GPUs) to access a shared pool of high-capacity, low-latency memory, optimizing access to the enormous datasets required by next-generation AI. Integrating CXL into rugged, high-throughput systems is a key item on the OSS product roadmap, but while the ecosystem matures, our immediate focus remains on validated zero-copy I/O. As the technology and ecosystem continue to develop, CXL is the clear direction for solving the ultimate memory bottleneck and keeping our platforms future-proof.
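The capacity argument for pooling can be illustrated with a toy accounting model. It compares four consumers that each own a fixed 128 GB of local memory against the same 512 GB total shared as one pool. The working-set sizes are hypothetical, and the model deliberately ignores latency, CXL topology, and coherency, assuming any consumer can draw any amount from the pool. It is not a CXL API.

```python
def static_provisioning(demands_gb: list[int], per_node_gb: int) -> tuple[int, int]:
    """Each consumer owns a fixed local slice; returns (unmet, stranded) in GB."""
    unmet = sum(max(0, d - per_node_gb) for d in demands_gb)
    stranded = sum(max(0, per_node_gb - d) for d in demands_gb)
    return unmet, stranded


def pooled_provisioning(demands_gb: list[int], pool_gb: int) -> tuple[int, int]:
    """All consumers draw from one shared pool; returns (unmet, idle) in GB."""
    total = sum(demands_gb)
    return max(0, total - pool_gb), max(0, pool_gb - total)


# Hypothetical per-accelerator working sets (GB) and a fixed 128 GB per node.
DEMANDS_GB = [96, 32, 200, 64]
PER_NODE_GB = 128
POOL_GB = PER_NODE_GB * len(DEMANDS_GB)  # same total capacity, shared

print("static (unmet, stranded):", static_provisioning(DEMANDS_GB, PER_NODE_GB))
print("pooled (unmet, idle):    ", pooled_provisioning(DEMANDS_GB, POOL_GB))
```

Under static provisioning, the 200 GB working set falls 72 GB short while 192 GB sits stranded on other nodes. With the same capacity pooled, every demand is met with 120 GB to spare.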
The industry bottleneck is clear: scaling a real-time data processing architecture beyond 100 GB/s of sustained throughput hits an immediate wall in traditional server architectures because of the 'CPU Tax' and fabric saturation. The architectural blueprint detailed here, a zero-copy pipeline enabled by PCIe Gen6 and GPUDirect, is the critical mitigation strategy.
At OSS, our focus is on translating this theory into deployable reality for the most demanding edge computing applications. We are actively building and validating platforms that strategically integrate high-speed interconnects and low-latency fabrics to ensure predictable, sustained performance in rugged high-performance computing environments. By focusing on eliminating the host-level copy cycles and planning for the next-generation memory bottleneck with CXL, we are demonstrating a clear path forward for the industry to move beyond theoretical limits and deliver the guaranteed throughput required for next-generation autonomous systems.
This discussion is part of an ongoing architectural conversation about the future of high-throughput systems at the edge. We encourage engineers and developers working on these challenging architectures to engage with us and the broader community on these critical design points.