Amazon's Elastic Compute Cloud (EC2) offers businesses the opportunity to rent scalable servers and host applications and services remotely, rather than pay for and manage that infrastructure on their own. The service, which first entered beta a little more than ten years ago, has historically focused on CPUs, but that's changing now, courtesy of a newly unveiled partnership with Nvidia.

According to joint blog posts from both companies, Amazon will now offer P2 instances that include Nvidia's K80 accelerators, which are based on the older Kepler architecture. Those of you who follow the graphics market may be surprised, given that Maxwell has been available since 2014, but Maxwell was explicitly designed as a consumer and workstation product, not a big-iron HPC part. The K80 is based on GK210, not the top-end GK110 parts that formed the basis for the early Titan GPUs and the GTX 780 and GTX 780 Ti. GK210 offers a larger register file and much more shared memory per multiprocessor block, as shown below.

[Figure: GK210 vs. GK110 comparison]
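
Those per-SM resources matter in practice: kernels that stage their working set in shared memory are limited by how much of it each multiprocessor can hand out, so a bigger shared memory and register budget per SM lets more blocks (or larger tiles) stay resident at once. Here's a minimal CUDA sketch of that pattern, a block-level sum reduction staged through __shared__ memory; it's an illustrative example, not code from Amazon or Nvidia, and the 256-element tile is deliberately tiny.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each block loads BLOCK_SIZE elements into shared memory and reduces them
// in place. Kernels that use larger tiles than this toy example lean on the
// per-SM shared memory budget, which is where GK210's extra capacity helps.
constexpr int BLOCK_SIZE = 256;

__global__ void blockSum(const float* in, float* out, int n) {
    __shared__ float tile[BLOCK_SIZE];

    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (gid < n) ? in[gid] : 0.0f;
    __syncthreads();

    // Tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) {
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        }
        __syncthreads();
    }

    if (threadIdx.x == 0) {
        out[blockIdx.x] = tile[0];  // one partial sum per block
    }
}

int main() {
    const int n = 1 << 20;
    const int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, blocks * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    blockSum<<<blocks, BLOCK_SIZE>>>(in, out, n);
    cudaDeviceSynchronize();

    float total = 0.0f;
    for (int i = 0; i < blocks; ++i) total += out[i];
    printf("sum = %.0f (expected %d)\n", total, n);

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```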

The new P2 instances unveiled by Amazon will offer up to 8 K80 GPUs, each with 12GB of RAM and 2,496 CUDA cores. All K80s support ECC memory protection and offer up to 240GB/s of memory bandwidth per GPU. One reason Amazon gave for its decision to offer GPU compute, as opposed to focusing on scaling out with additional CPU cores, is the so-called von Neumann bottleneck. Amazon states: "The well-known von Neumann Bottleneck imposes limits on the value of additional CPU power."
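
If you want to confirm those numbers for yourself on a P2 instance, the CUDA runtime will happily report them. The sketch below is a generic device query I've written for illustration (not Amazon's tooling): it prints each visible GPU's multiprocessor count, per-SM shared memory and registers, ECC state, and the theoretical bandwidth implied by its memory clock and bus width, which for a single GK210 works out to roughly 240GB/s.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, dev);

        // Theoretical bandwidth: memory clock (reported in kHz) * 2 for DDR,
        // times the bus width in bytes, converted to GB/s.
        double bandwidthGBs =
            2.0 * p.memoryClockRate * 1000.0 * (p.memoryBusWidth / 8.0) / 1e9;

        printf("GPU %d: %s (compute %d.%d)\n", dev, p.name, p.major, p.minor);
        printf("  Multiprocessors:         %d\n", p.multiProcessorCount);
        printf("  Shared memory per SM:    %zu KB\n", p.sharedMemPerMultiprocessor / 1024);
        printf("  32-bit registers per SM: %d\n", p.regsPerMultiprocessor);
        printf("  Global memory:           %.1f GB\n", p.totalGlobalMem / 1e9);
        printf("  ECC enabled:             %s\n", p.ECCEnabled ? "yes" : "no");
        printf("  Theoretical bandwidth:   %.0f GB/s\n", bandwidthGBs);
    }
    return 0;
}
```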

This is a significant oversimplification of the problem. When John von Neumann wrote "First Draft of a Report on the EDVAC" in 1945, he described a computer in which program instructions and data were stored in the same pool of memory and accessed over the same bus, as shown below.

[Figure: von Neumann architecture diagram]

In systems that use this model, the CPU can access program instructions or data, but only one or the other at a time. It cannot fetch instructions and data simultaneously, and it cannot move information to or from main memory nearly as quickly as it can perform work on that data once it has been loaded. Because CPU clock speeds increased far faster than memory performance in the early decades of computing, the CPU spent an increasingly large amount of time waiting for data to be retrieved. This wait state became known as the von Neumann bottleneck, and it had become a serious problem by the 1970s.
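
You can see the imbalance between compute and memory in something as simple as SAXPY. The sketch below is the standard textbook kernel (nothing specific to Amazon's service or any particular chip): each element costs two floating-point operations but twelve bytes of memory traffic, so throughput is set almost entirely by memory bandwidth rather than by how quickly the cores can multiply and add. That ratio, not raw arithmetic speed, is what the bottleneck is about.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// SAXPY: y[i] = a * x[i] + y[i]. Two flops per element against 12 bytes of
// traffic (read x, read y, write y), so the kernel saturates memory bandwidth
// long before it saturates the arithmetic units.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = a * x[i] + y[i];
    }
}

int main() {
    const int n = 1 << 24;  // 16M elements, roughly 200MB of traffic per launch
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %.1f (expected 4.0)\n", y[0]);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```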

[Figure: Harvard architecture diagram]

An alternative design, known as the Harvard architecture, offers a solution to this problem. In a Harvard architecture chip, instructions and data have their own separate buses and physical storage. But most chips today, including CPUs built by Intel and AMD, can't be cleanly described as Harvard or von Neumann designs. Like CISC and RISC, which began as terms that defined two different approaches to CPU design and have been muddled by decades of convergence and shared design principles, CPUs today are best described as modified Harvard architectures.

Modern chips from ARM, AMD, and Intel all implement a split L1 cache, with instructions and data stored in separate physical locations. They use branch prediction to decide which code paths are most likely to be executed, and they cache both instructions and data in case that information is needed again. The seminal paper on the von Neumann bottleneck was given in 1977, before many defining features of today's CPU cores had even been invented. GPUs have far more memory bandwidth than CPUs do, but they also operate on far more threads at the same time and have much, much smaller caches relative to the number of threads they keep in flight. They use a very different architecture than CPUs do, but it's subject to its own bottlenecks and choke points as well. I wouldn't call the von Neumann bottleneck solved. When John Backus described it in 1977, he railed against programming standards that enforced it, saying:

Not only is this tube a literal bottleneck for the data traffic of a problem, but, more importantly, it is an intellectual bottleneck that has kept us tied to word-at-a-time thinking instead of encouraging us to think in terms of the larger conceptual units of the task at hand. Thus programming is basically planning and detailing the enormous traffic of words through the von Neumann bottleneck, and much of that traffic concerns not significant data itself, but where to find it.

We've had good luck attacking the von Neumann bottleneck through hardware. But the general consensus seems to be that the changes in programming standards Backus called for never really took root.

I'm not sure why Amazon went down this particular rabbit hole. Incorporating GPUs as part of its EC2 service makes good sense. In the nearly 10 years since Nvidia launched its first programmable PC GPU, the G80, GPUs have proven that they can deliver enormous performance improvements relative to CPUs. Nvidia (and to a lesser extent, AMD) has built a significant business around the use of Tesla cards in HPC, scientific computing, and major industry. Deep learning, AI, and self-driving cars are all hot topics of late, with huge amounts of corporate funding and a number of smaller companies trying to stake out positions in the nascent market.