The short answer: Tons of AI Inferencing
By Tim Miller, Vice President, Product Marketing
Typical Artificial Intelligence workflows are well understood. The classic example is image recognition: distinguishing between an image of a cat and an image of a dog. The first step is performed by data scientists, who create a model and train it using large sets of tagged data. The data scientists iteratively refine the model to achieve higher and higher levels of accuracy. Once the refined model is trained, it can be deployed, and new images can be presented to it for inferencing. The AI inferencing process classifies a never-before-seen image as either a cat or a dog.
The process of training more sophisticated models to perform more sophisticated tasks is hugely compute intensive and requires very large sets of training data. Modern approaches to AI model training require powerful computer systems, often with multiple GPUs, which are optimal for the highly parallelized task of AI training. Even with these systems, training a single model can take hours or even days to complete. For this reason, AI model development and training often take place on shared datacenter clusters of GPU servers used by teams of data scientists. Scheduling applications manage queues of training jobs, applying priorities and policies to ensure maximum GPU utilization and fair access across the data science teams.
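The kind of priority-and-policy scheduling described above can be sketched in a few lines. This is a minimal, hypothetical illustration (the class and field names are invented for this example, not taken from any real scheduler): jobs carry a priority and a GPU requirement, ties are broken by submission order for fairness, and the dispatcher skips jobs that do not fit the GPUs currently free.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical training job: lower priority number runs sooner;
# submit_order breaks ties so earlier submissions go first.
@dataclass(order=True)
class TrainingJob:
    priority: int
    submit_order: int
    name: str = field(compare=False)
    gpus_needed: int = field(compare=False)

class JobQueue:
    """Toy sketch of a GPU-cluster job queue applying priorities."""

    def __init__(self):
        self._heap = []
        self._counter = 0

    def submit(self, name, priority, gpus_needed):
        self._counter += 1
        heapq.heappush(
            self._heap,
            TrainingJob(priority, self._counter, name, gpus_needed),
        )

    def next_job(self, free_gpus):
        # Pop the best-priority job that fits the free GPUs;
        # put any skipped (too-large) jobs back on the queue.
        skipped, job = [], None
        while self._heap:
            candidate = heapq.heappop(self._heap)
            if candidate.gpus_needed <= free_gpus:
                job = candidate
                break
            skipped.append(candidate)
        for s in skipped:
            heapq.heappush(self._heap, s)
        return job
```

A real scheduler (e.g. Slurm or Kubernetes with a GPU plugin) layers preemption, quotas, and backfill on top of this basic idea, but the core loop is the same: match queued work to free GPUs under a priority policy.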
In contrast, on the AI inference side, the compute requirement of a single query is low. It still benefits from the parallel architecture of GPUs, but requires only a fraction of a GPU's available compute power.
So, if servicing a single inferencing query is not significantly compute intensive, why do edge-deployed inferencing platforms need powerful GPUs?
Most AI inferencing requirements are outside the datacenter, at the edge, where new data is being sourced and inferencing queries are being generated. Here the measure of effectiveness is the speed with which an answer is provided, and in many applications, real-time response is required. Most importantly, to meet the objectives of the overall AI application, a very large number of inferencing queries must be serviced simultaneously.
To understand why, let’s take the example of an autonomous long-haul truck.
The AI Level 4 (no driver) enabled Peterbilt truck hauling cargo destined for Seattle will leave its starting hub outside Washington, DC, and for the next two days it will operate autonomously across the country, saving its owner cost and time. Driving all day and night, it will encounter rain, snow, wind, variable traffic conditions, and unexpected events like animal crossings or debris in the road. There will be construction lane shifts to navigate outside Chicago. Once in Seattle, it will drop off its cargo, get reloaded, and be back on the road two hours later. Time is money, and maximizing utilization of the quarter-million-dollar truck pumps up the operating company's bottom line.
An upfront investment outfitted the truck with a myriad of sensors, including lidar (light detection and ranging), cameras, and radars, along with a powerful, rugged AI inferencing computer server. The data pumped out by the sensors generates thousands of inquiries a second to the on-board inference engines. The blip coming in from lidar #1 asks: what is it seeing? Is it normal? Do I need to do something? Thousands of such blips continuously stream in from each independent sensor, generating a steady stream of inferencing queries.

The first task to be accomplished is perception and environmental awareness: what is going on around the truck, and how does it relate to the truck's understanding of where it is located? It checks its acquired understanding against its preconceived view, based on continually updated GPS mapping coordinates. The next step is planning and decision-making. Based on the perceived world, what actions are required? Thousands of micro-decisions need to be made. Are we approaching a curve? Has someone slammed on their brakes in the lane ahead? Are we headed up a grade and need more acceleration? Different inference models are tasked with different sets of inquiries, and some answers generate new queries to different engines. All of this operates in real-time; there is no lag time available to wait for decisions. Finally, the actuation stage is where instructions are relayed to the steering and braking systems, with fine-tuned adjustments streaming in continuously.
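The perception, planning, and actuation stages described above can be sketched as a single control cycle. This is a deliberately simplified, hypothetical illustration: the function names (`classify_blip`, `plan_action`, `actuate`) and the toy size-threshold rule are invented stand-ins for real trained models and vehicle interfaces, and a real stack runs thousands of such cycles concurrently across many sensors.

```python
def classify_blip(blip):
    """Perception stage: decide what a sensor return represents.
    A toy threshold stands in for a trained model's inference call."""
    return "obstacle" if blip["size"] > 5 else "clear"

def plan_action(percepts):
    """Planning stage: turn the perceived world into a decision."""
    return "brake" if "obstacle" in percepts else "maintain_speed"

def actuate(action):
    """Actuation stage: relay the instruction to steering/braking."""
    return {"command": action}

def control_cycle(sensor_blips):
    """One pass of the perception -> planning -> actuation loop."""
    percepts = [classify_blip(b) for b in sensor_blips]
    action = plan_action(percepts)
    return actuate(action)
```

In the truck, each stage is backed by different inference models, and the whole cycle must complete within a hard real-time budget, which is exactly where the GPU's ability to service many queries in parallel pays off.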
It is clear the operation of the autonomous truck depends not on a single inference query, but on thousands, arriving simultaneously and posing different types of questions. The AI inference computer supports many different model types, each handling a high bandwidth of incoming questions. The powerful GPU-based system is ideal for handling all of this with low latency, so decisions are made and instructions carried out in real-time.
Not surprisingly, the more GPU power that can be provided, the more sophisticated the response capability. The data scientists can develop more and more sophisticated models that will increase the efficiency, resiliency, and safety of the autonomous truck. The goal of the hardware designer is to provide headroom to allow the software to evolve to higher and higher value solutions.
The autonomous truck is but one example of the need for very high-performance inferencing solutions at the edge. Imagine similar requirements in autonomous mining, agriculture, construction, and oil/gas operations.
Whereas AI training will consume entire GPUs for hours or even days, AI inferencing requires powerful GPUs for real-time responsiveness. If you are an inference engine in an IoT device, you can get by with a low-power embedded processor or GPU, but if you are a long-haul autonomous truck hurtling down the highway in variable conditions, you need to process a flood of "what is that?" and "what should I do?" inquiries simultaneously.
So yes, you do need powerful GPUs for AI inferencing at the edge. Oh, and by the way, they need to be rugged too, but that is a discussion for other blogs. These are not your dad's datacenter systems; these are specialized systems with no-compromise performance, ready for rugged edge deployments. These are AI Transportable systems.