The AI Transportable Hardware Path: From Ingesting Data to Actionable Intelligence
June 28, 2022
By David Warren-Angelucci, OSS Channel Sales Manager
HPC Hardware for AI Workflows at the Edge
The building blocks of an AI workflow are the same as those of any computational workflow (a minimal code sketch follows the list):
Acquire data
Store that data
Compute the data
Make educated decisions based on the computational output
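To make the four steps concrete, here is a minimal sketch of the loop an edge application might run, written as standalone CUDA C++. This is illustrative only, not OSS software: acquire_samples, persist_to_flash, and act_on_result are hypothetical stand-ins for application-specific sensor, storage, and decision logic, and the trivial kernel stands in for real inference or analytics.

```cpp
// Minimal sketch of the four-step edge AI loop (hypothetical helper functions).
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// 1. Acquire: stand-in for reading a block of sensor samples.
std::vector<float> acquire_samples(size_t n) { return std::vector<float>(n, 1.0f); }

// 2. Store: stand-in for persisting the raw block to the flash array.
void persist_to_flash(const std::vector<float>& block) { /* write to NVMe */ }

// 3. Compute: a trivial kernel standing in for real inference or analytics.
__global__ void scale(float* data, size_t n, float k) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= k;
}

// 4. Act: stand-in for the application's decision logic.
void act_on_result(float score) { std::printf("decision input: %f\n", score); }

int main() {
    const size_t n = 1 << 20;
    std::vector<float> block = acquire_samples(n);   // 1. acquire
    persist_to_flash(block);                         // 2. store

    float* d = nullptr;                              // 3. compute on the GPU
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, block.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(unsigned)((n + 255) / 256), 256>>>(d, n, 2.0f);
    cudaMemcpy(block.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);

    act_on_result(block[0]);                         // 4. act on the output
    return 0;
}
```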
While most AI workflows run in the controlled environment of a datacenter, where servers have the HPC resources their applications need, many current AI applications require some or all of the workflow steps to be performed out in the field, in harsh environmental conditions. Until now, companies with applications on the ‘edge’ have had to rely on low-performance hardware or accept the latency of uploading data to the cloud. Rugged edge-computing devices, like industrial PCs and IoT devices, can withstand extreme environmental factors, but they do not come close to the computational performance of servers in a datacenter. Because of this, AI applications on the ‘edge’ have had to compromise on performance, but not anymore!
With our latest line of “AI Transportable” products, One Stop Systems (OSS) supplies rugged appliances that deliver datacenter-class performance for AI workflows in cars, planes, trucks, ships, drones, and other environments that have never before been able to support HPC hardware. The products in the AI Transportable line are rugged, datacenter-class HPC systems tailored to each of the four steps in the AI workflow. Companies with edge applications that require the highest-performance compute cannot compromise; they need the components of the datacenter in the field.
With our “AI Transportable” product line, OSS brings the power of the datacenter to the edge!
OSS designs and manufactures high-performance computing systems uniquely positioned to support each stage of the AI Transportable workflow, with a range of products tailored to meet the needs of each stage based on the requirements of the application.
The 4 Stages of the AI Workflow
The ultimate goal of the AI workflow is to process raw data into actionable intelligence. OSS provides hardware platforms which expedite AI workflows and significantly reduce the time to take action.
The four fundamental building blocks of an AI workflow are: gathering raw data from sensors and other I/O devices (OSS has products that acquire significant amounts of data at high speed), storing that data (OSS has products that support high-density storage in a small footprint), computing that data (OSS specializes in multi-GPU platforms for high-speed analytics, inference, AI training, and retraining), and making intelligent decisions based on the knowledge gained from that data.
1. Data Acquisition
Ingesting data from various sensors and I/O devices is a fundamental part of many edge applications. Acquiring large amounts of data at high speed requires all-flash arrays with high-speed sensor inputs.
OSS has a variety of products built to address those requirements, each with its own unique benefits. Some of our data-ingest servers, like the 2U flash-storage array pictured on the left, are designed for speed and utility. This server supports 24 2.5” SSD bays (up to 367TB using 15TB drives), so the chassis offers flexible capacity while maintaining high bandwidth and low latency, with throughput of over 50GB/s.
Other data-ingest servers, like the rugged 4U FSAn-4 flash-storage array pictured on the right, are designed with density and utility in mind. With 32 slots for PCIe NVMe flash add-in cards in four removable canisters, it supports up to 400TB of data at double the bandwidth of traditional 2.5” SSDs, with net data throughput of 30GB/s.
2. Data Storage
In addition to the two ingest servers above, we have other storage devices designed with density, speed, and utility in mind.
The SB2000, pictured on the left, supports 24 2.5” drive bays in a 2U chassis. The drives are individually hot-swappable, or they can be removed in groups of eight, which increases the utility of the chassis for the user. Like the 2U flash-storage array, the SB2000 supports up to 367TB of storage and boasts over 50GB/s of data throughput.
With similar utility in mind, we are currently developing our newest flagship rugged storage server, the Centauri NVMe: a 4U-tall, half-rack-wide chassis designed to take up minimal space while offering maximum utility for applications out in the field. It is fully ruggedized and supports up to 8 NVMe SSDs in a single hot-swappable canister.
The benefit of the hot-swappable canister is minimal downtime between saturating the drives and swapping in fresh ones. The application’s data-recording process can continue without the delay of removing each drive individually or uploading data to the primary datacenter.
3. Compute
Once the data has been collected and stored, it needs to be computed. The customers and applications OSS targets require real-time multi-GPU computing out in the field, so OSS offers a wide range of GPU-accelerated systems. Some are rackmount units designed to connect to an existing server and simply scale its GPU resources; others are designed with space limitations in mind and provide a powerful all-in-one solution for rugged multi-GPU computing in the field.
The chassis shown on the left is our EB4400, designed to support up to 4 traditional GPUs. The EB4400 is unique in that it can be used as an expansion resource for a customer’s existing server or as a stand-alone GPU server.
If more than 4 GPUs are required in a single 4U space, our 4U Pro shares the EB4400’s design but supports 8 GPUs, offering twice the capacity.
The Rigel Compute Server on the right is our flagship rugged GPU server. With a rugged chassis designed around NVIDIA’s HGX A100 4-GPU board, Rigel harnesses the significant power of four SXM GPUs connected through an NVLink topology in the world’s highest-performing fully rugged GPU solution.
Individually, SXM GPUs are already faster than traditional PCIe GPUs, and the NVLink topology of the HGX A100 4-GPU board connects all four SXM GPUs with full-mesh peer-to-peer communication, delivering the highest throughput possible in a four-GPU system.
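For a sense of what full-mesh peer-to-peer looks like from software, the short CUDA runtime sketch below queries every GPU pair and enables direct peer access where the hardware allows it. This is generic CUDA host code, not Rigel-specific software; on an NVLink-connected HGX A100 4-GPU board, every pair would be expected to report peer access.

```cpp
// Query and enable GPU peer-to-peer access across all device pairs.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int a = 0; a < count; ++a) {
        for (int b = 0; b < count; ++b) {
            if (a == b) continue;
            int ok = 0;
            cudaDeviceCanAccessPeer(&ok, a, b);   // can GPU a reach GPU b directly?
            std::printf("GPU %d -> GPU %d: %s\n", a, b,
                        ok ? "peer access supported" : "no peer access");
            if (ok) {
                cudaSetDevice(a);
                cudaDeviceEnablePeerAccess(b, 0); // enable direct a -> b loads/stores
            }
        }
    }
    return 0;
}
```

Once peer access is enabled, cudaMemcpyPeer transfers and direct loads/stores between the GPUs travel over NVLink rather than bouncing through host memory.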
Customers around the globe already use the HGX A100 4-GPU board in NVIDIA DGX workstations in their labs, but our Rigel Compute Server gives them the ability to bring that immense GPU power into harsh environments. Rigel can be used for a wide range of edge applications; it can be flange-mounted to the side of a truck and run on 48V DC power, or installed in an airplane for real-time surveillance operations.
4. Actionable Intelligence
The last couple of products are hybrid approaches, combining the purely storage and purely GPU solutions shown previously.
The 3U Short Depth Server on the left is designed with both space limitations and harsh environments in mind. It supports up to four double-wide add-in cards (like GPUs) and has 16 NVMe/SATA SSD bays in two removable canisters. The flexibility and utility of this all-in-one server, paired with its rugged design, suit a wide variety of applications that need more than only storage or only GPU computing.
On the right, you’ll see the Rigel NVMe Storage Server mounted side-by-side with the Rigel Compute Server. The potential capacity of this dual-Rigel approach to rugged supercomputing may be overkill for many applications, but for those that require the highest-performance storage and GPU computing at the edge, it is unmatched in the high-performance computing industry.
The Future is Now
The push to support AI applications in the field is becoming increasingly evident. Companies can no longer accept the compromise of uploading data to the cloud to be stored and computed in a datacenter before results are transferred back to the field, and traditional industrial box PCs can no longer support the intense storage and compute requirements of many AI workflows.
One Stop Systems is the solution, leading the industry in rugged HPC systems of varying scale for edge AI applications.