Operating system engineering used to be a specialized field, all about cleverly bending software to overcome hardware limitations. The goal? To present users with a smooth, virtual environment.
This field was driven by two performance needs so universal that they shaped the features of most operating systems. Server versions had to absorb massive data bursts, compensating for slow network links spanning time zones. Because server capacity scaled directly with the number of users, efficient cluster utilization was a critical cost factor.
Maximizing server performance meant sustaining longer bursts of activity: more CPU work could proceed without interruption, and servers could pipeline requests, reading from or writing to the network in larger packets to hide internet latency. Server operating systems were tuned specifically for this.
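The batching idea above can be sketched in a few lines: coalesce many small writes into larger packets so that one network round-trip carries many requests. The `PipelinedWriter` class, the flush threshold, and the packet list are illustrative assumptions, not a real server API.

```python
# Sketch of write batching: small payloads are coalesced into large
# packets, so the per-round-trip network latency is paid far less often.
class PipelinedWriter:
    def __init__(self, send, threshold=64 * 1024):
        self.send = send            # underlying network send function
        self.threshold = threshold  # flush once this many bytes accumulate
        self.buffer = bytearray()

    def write(self, payload: bytes):
        self.buffer += payload
        if len(self.buffer) >= self.threshold:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send(bytes(self.buffer))
            self.buffer.clear()

packets = []                                  # stand-in for the network
w = PipelinedWriter(packets.append, threshold=4096)
for _ in range(1000):
    w.write(b"x" * 100)                       # 1000 small requests...
w.flush()
print(len(packets))                           # ...far fewer packets sent
```

The same total number of bytes crosses the wire, but in a few dozen packets instead of a thousand, which is exactly the trade a latency-bound server wants.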
Client versions were different. The primary concern was responsiveness, especially when typing. CPU utilization was less of a worry. Client CPUs were often oversized, prioritizing a fluid user experience over constant maximum usage. GPUs were essential for handling the user interface and achieving these responsiveness goals.
Chip design is all about trade-offs, so server and client versions had their own matching CPU and GPU types.
Today, GPUs are evolving in two distinct directions. Massive data centers are being built around high-end GPUs costing $20,000 or more, and an investment that large demands high utilization. Even though these GPUs contain specialized units (CUDA, RT, and Tensor cores), some chip area can still sit idle. The stakes are high enough that entire companies are dedicated to tuning GPU code for maximum hardware utilization. Keeping data as close to the CUDA cores as possible can in turn leave the network underutilized, by roughly 30% judging from AWS pricing data.
Edge computing emerged from the PC world as rising budgets put many IoT devices into a single building. Edge GPUs are feature-rich because customers expect instant responses regardless of the task: image recognition and interactivity both demand seamless hardware support. These features can leave the local GPU underutilized, but edge parts offer better thermal and power characteristics than their server counterparts, and client operating systems can drive them easily. The priorities are interactivity and low response time.
These edge environments will still face bottlenecks. Limited fiber bandwidth means devices will likely max out their network capacity when downloading updates or uploading images. A well-designed operating system will maintain a balanced cache to support applications like robotics, surveillance, industrial control, and presentations.
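The "balanced cache" mentioned above can be sketched as a size-bounded LRU store: recently used items (update chunks, captured frames) stay resident, and the oldest are evicted once a memory budget is hit. The `BoundedCache` class, the byte budget, and the key names are illustrative assumptions.

```python
# Minimal sketch of a size-bounded LRU cache an edge OS might keep for
# update chunks or recently captured images.
from collections import OrderedDict

class BoundedCache:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()           # key -> bytes, oldest first

    def put(self, key, value: bytes):
        if key in self.items:
            self.used -= len(self.items.pop(key))
        self.items[key] = value
        self.used += len(value)
        while self.used > self.capacity:     # evict least recently used
            _, old = self.items.popitem(last=False)
            self.used -= len(old)

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as recently used
        return self.items[key]

cache = BoundedCache(capacity_bytes=1024)
cache.put("frame-1", b"a" * 600)
cache.put("frame-2", b"b" * 600)             # pushes frame-1 out
print(cache.get("frame-1"))                  # evicted -> None
```

Real systems add hit-rate accounting and per-application quotas, but the balancing act is the same: stay inside a fixed memory budget while keeping the working set of each workload warm.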
Clock cycles used to be the main constraint. Raspberry Pi and ARM devices often struggled with HTTPS decryption and ZIP decoding: the former is computationally intensive, while the latter needs a fixed 32 KiB sliding window plus Huffman code tables held in memory throughout decompression. Linux distributions tuned for 4 GB of memory or more can be a poor fit for lower-spec ARM designs.
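On a memory-constrained board, the decompression buffers matter as much as the cycles. A streaming decoder keeps usage bounded by processing one chunk at a time instead of loading the whole archive; here is a sketch using Python's `zlib.decompressobj` (the 4 KiB chunk size and the sample payload are illustrative assumptions):

```python
# Streaming DEFLATE decompression: memory stays bounded by the chunk
# size plus zlib's fixed internal state (32 KiB window + code tables),
# regardless of how large the archive is.
import zlib

data = zlib.compress(b"edge telemetry " * 10_000)   # sample archive

d = zlib.decompressobj()
total = 0
for i in range(0, len(data), 4096):                 # feed 4 KiB at a time
    total += len(d.decompress(data[i : i + 4096]))
total += len(d.flush())                             # drain remaining output
print(total)                                        # bytes recovered
```

The decoder's fixed state is what the text refers to: it must be resident for the entire stream, which is cheap on a desktop but noticeable on a small ARM device.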
GPUs consume a lot of energy and generate heat in data centers, requiring complex cooling solutions. Firmware updates that boost performance can shorten hardware lifespan to just 3-4 years, which is unacceptable for many applications.
The traditional GPU applications of gaming, science, and AI training (essentially compression) are expanding into simulations, robotics, and financial decision-making. AI today is very training-heavy. We anticipate that smart, modular models will let users switch features on and off, much as operating systems do. Models will become compressed, searchable archives of their training data, and inference will take over a significant share of the world's computing capacity.