I believe this is one of the hottest questions in every discussion around picking a new workstation. It is worded in various ways: Which is the best GPU for V-Ray? Or, Which is the best GPU for rendering in 3DS Max / Maya / Rhino 3D / Cinema 4D? etc. In most cases, the answer is none. Not as in "there is no single answer", but more as in "the GPU has nothing to do with rendering". Or does it?
The talk around GPUs and any sort of workstation application has been raging for ages. Blame the interwebs, review sites and blogs, biased sellers that care for nothing but pushing the most expensive cards, or the fact that most if not all purpose-built PCs with a workstation label carry an expensive (I don't say good / bad / fast, just expensive) GPU; add some lack of education and the natural confusion on the buyer's side, and there you have it.
Most of these points are extracted and enriched from my posts in the CGArchitect.com forums, presented here in a Q&A fashion.
The role of the GPU in traditional rendering engines
The GPU is not used for any aspect of "normal" renders with V-Ray Advanced, Mental Ray, Maxwell or any other traditional renderer. It never has been. It will get there, but it is not there right now. The GPU, known for ages as "the graphics card", for the most part does nothing more than display a visual representation of a digital model on the 2D space of a monitor screen. That's it. It doesn't help the CPU; if anything, it's the contrary, delaying it, as the CPU is actively creating queues and preparing frames for the GPU to render on screen. Yes, yes, the GPU is rendering, all the time, the frames you see on the screen; it's just not rendering your final production image.
It would make virtually no difference whether you render something on a workstation or render node with "on-board" graphics/IGP, a GTX 780, a GT 610, a Radeon 7750 or any Quadro. The fact that machines labeled as "workstations" and marketed to people "rendering stuff" come with options for "workstation" GPUs, mainly Quadros and FirePros, has nothing to do with "rendering" itself. It is a delicate balance between vendors trying to create an all-around product, capitalizing on marketing promises of better compatibility and reliability, and increasing their margins by upselling the more expensive workstation cards.
APIs and the Workstation Card advantage
As viewport engines mature, most modern 3D packages offer optimizations that do benefit GTX / Radeon cards. Especially for Autodesk products that have departed from OpenGL (or never used it), like 3DS Max, AutoCAD, Revit and, to a lesser extent, current versions of Maya, Quadro and FirePro workstation cards have little to offer over a mid-range or better GTX/Radeon.
90% of what a workstation card has to offer is OpenGL driver optimizations, most of which are intentionally "left out" of "gaming cards", which are optimized only for Direct3D, traditionally a "gaming" API. For good or for bad, it has been the API of choice for many 3D app devs. This doesn't mean that Direct3D, or Autodesk's particular implementation of it in each of its products, properly utilizes the performance of modern GPUs. Quite the contrary: you will see performance reach a plateau once you get to GTX 760/770 levels. Opting for an on-paper-faster 780 or a shiny 980 won't yield substantial benefits.
Is it Autodesk doing a bad job, or NVIDIA / AMD intentionally leaving drivers without optimizations for 3D apps, trying to boost their workstation lines? Well, perhaps both. The reality is that getting a K2000 instead of a 750 Ti for the aforementioned D3D-based apps is wasting your money. Users interested in SolidWorks, CATIA, NX and other OpenGL-based modelers have many good reasons to consider investing in a Quadro or FirePro, but for ArchViz, the investment in low to mid-range workstation cards returns diminishing, or even negative, results. Twisted, but real.
GPU Accelerated Rendering Engines
Much of this confusion is created by the fact that companies have been experimenting with GPU-accelerated rendering for quite some time now. The process of utilizing the GPU's computing capabilities outside of driving one or more displays is often referred to as general-purpose GPU computing, or GPGPU.
The first commercially available example of GPGPU rendering I was aware of was Octane, but it was followed by others like Iray, V-Ray RT GPU and Maxwell Multilight, to name a few.
With V-Ray being the most popular choice in ArchViz, many are interested in V-Ray RT GPU. We ought to know that the RT engine is different from the original and not (yet) 100% compatible with all the features of V-Ray Advanced.
Unfortunately, the employee who tried to upsell you that expensive GPU as a good choice "for rendering" has no idea when GPGPU started to become a "real thing" in 3DS Max or its plugins, much as in Photoshop or the rest of the Adobe suite, and that is most likely the case for 99% of the people giving advice in forums and blogs: we often just recycle rumors, or whatever aligns with our general perception.
GPGPU implementations utilize either CUDA (NVIDIA's proprietary programming platform for GPGPU) or OpenCL. To a great extent, these languages communicate with the GPU in a very direct way, largely bypassing the drivers; the drivers still have an impact on actual performance, but a much smaller one. What is certain is that there is no benefit in using a Quadro or FirePro card for these, at least not in ArchViz, where fully unlocked FP64 (also referred to as double precision) plays little to no role in GPGPU renderings.
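To make the idea tangible, here is a minimal GPGPU sketch in Python using PyOpenCL (an assumption on my part: you have an OpenCL runtime and the pyopencl package installed; the kernel is a toy, nothing like what a production renderer ships). It adds two float arrays on the GPU, entirely outside the graphics pipeline, and, notably, in single precision (FP32):

```python
# Toy GPGPU example with PyOpenCL (hypothetical setup: any OpenCL-capable
# GPU plus the pyopencl package). Illustrates the concept, not renderer code.
import numpy as np
import pyopencl as cl

a = np.random.rand(1_000_000).astype(np.float32)  # FP32: what GPGPU renderers mostly use
b = np.random.rand(1_000_000).astype(np.float32)

ctx = cl.create_some_context()       # pick a GPU (or other OpenCL device)
queue = cl.CommandQueue(ctx)

mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
c_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

# The kernel runs once per element, spread across the GPU's shader cores:
# the same "many tiny problems in parallel" model a GPGPU renderer relies on.
prg = cl.Program(ctx, """
__kernel void add(__global const float *a,
                  __global const float *b,
                  __global float *c) {
    int gid = get_global_id(0);
    c[gid] = a[gid] + b[gid];
}
""").build()

prg.add(queue, a.shape, None, a_buf, b_buf, c_buf)

c = np.empty_like(a)
cl.enqueue_copy(queue, c, c_buf)     # read the result back from VRAM
```

Swap the toy kernel for millions of ray-bounce evaluations and you have, conceptually, what V-Ray RT GPU and Octane are doing.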
Yes, once again, all the stuff you've been reading about the GTX Titan being a game-changer of a card for its increased DP/FP64 performance, and the perfect "workstation card", was misinformed belief. Regardless of how much time is spent theorizing on forums and blogs, the card or card combination with the higher aggregate of shader count* × base clock will yield the best performance. Thus the GTX Titan does worse in V-Ray RT GPU than, say, 2× GTX 770s.
* Remember to compare directly only same-generation / same-architecture GPUs. The Titan and the 6xx/7xx GPUs other than the 750 Ti are all Kepler cards. The 750 Ti and 9xx cards, along with some 8xx mobile GPUs, are based on Maxwell, which is in general better for compute despite using fewer shaders per cluster.
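As a rough sanity check of that rule of thumb, here is a back-of-the-envelope comparison in Python (the shader counts and base clocks are the published Kepler specs; the metric deliberately ignores memory bandwidth, boost clocks and multi-GPU scaling overhead, so treat it as a heuristic, not a benchmark):

```python
# Crude "shaders x base clock" aggregate for same-generation (Kepler) cards.
kepler_cards = {
    "GTX Titan": (2688, 0.837),  # (shader count, base clock in GHz)
    "GTX 770":   (1536, 1.046),
    "GTX 780":   (2304, 0.863),
}

def aggregate(setup):
    """Sum shader_count * base_clock over every card in the setup."""
    return sum(shaders * clock for shaders, clock in setup)

single_titan = aggregate([kepler_cards["GTX Titan"]])
dual_770     = aggregate([kepler_cards["GTX 770"]] * 2)

print(f"1x GTX Titan: {single_titan:,.0f}")  # ~2,250
print(f"2x GTX 770:   {dual_770:,.0f}")      # ~3,213 -> the dual 770s win
```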
Biased & Unbiased Methods
V-Ray RT GPU and most GPGPU renderers are "unbiased" renderers that use brute force and direct computation to literally calculate, ray by ray and bounce by bounce, the light reaching each and every pixel of the frame, gradually chewing through the whole GI solution. These very small "problems" are a waste for the long, complicated compute threads of modern CPUs: the CPU is "done" with each one very fast, but then has to wait for the next problem in the queue to come up.
Calculating hundreds of thousands or millions of bounces with 8, 12 or 24 threads (depending on the CPU(s) you have) is tedious and takes a lot of time, with the CPU often waiting longer for the answer to go through the processing pipeline than it took to solve it.
This issue was solved long ago in rendering engines, with developers concluding that it was perfectly acceptable to utilize "intelligent shortcuts", which in a nutshell involve grouping neighboring pixels and interpolating a lower-resolution GI solution across multiple similar pixels to speed up rendering (irradiance mapping is one such technique). These techniques characterize a rendering engine as "biased", since it doesn't independently calculate each and every pixel of the final frame, but "cheats" by interpolating results at a predetermined rate.
The game changer came with the massive parallelism built into the hundreds or thousands of simple compute units (aka CUDA cores, shaders, etc.) in modern GPUs. All of a sudden, the direct approach of calculating each and every pixel of a frame individually, and in a timely manner, became possible again, as these little cores are very efficient at exactly these small problems. Instead of grinding through all those repetitive tasks on a handful of CPU threads, you are throwing thousands of shaders at the task.
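To illustrate the trade-off, here is a toy sketch in Python; the shade() function is a hypothetical stand-in for a full per-pixel GI evaluation, and real biased techniques like irradiance mapping are far more sophisticated than the block-and-upsample shortcut below:

```python
import numpy as np

W, H, STEP = 640, 480, 8

def shade(x, y):
    # Hypothetical stand-in for an expensive per-pixel GI evaluation.
    return np.sin(x * 0.01) * np.cos(y * 0.01)

# "Unbiased" brute force: one full evaluation per pixel (307,200 calls).
# Each call is a tiny independent job: ideal for thousands of GPU shaders,
# wasteful for a handful of long CPU pipelines.
brute = np.array([[shade(x, y) for x in range(W)] for y in range(H)])

# "Biased" shortcut: one sample per 8x8 pixel block (4,800 calls),
# then spread the coarse solution back over the full frame.
coarse = np.array([[shade(x, y) for x in range(0, W, STEP)]
                   for y in range(0, H, STEP)])
biased = np.repeat(np.repeat(coarse, STEP, axis=0), STEP, axis=1)[:H, :W]

print(brute.shape, biased.shape)  # both (480, 640); 64x fewer shade() calls
```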
The theoretical speed advantage over using just the CPU (for the same task) is so large that there is often no merit in keeping the CPU in this "loop", even though many GPGPU engines will allow you to: it will just "burn" electricity. Also, for most intensive GPU tasks you need at least one CPU thread "open" to feed data back and forth to the GPU efficiently, so occupying the CPU with something else to 100% of its capacity might even be counter-productive.
Many of the features / options / effects of V-Ray are actually based on these "biased" methods, and are thus (still) unavailable in V-Ray RT GPU. For example, you cannot use the latest VRayBlend materials, displacement is not supported, etc. See more on their support page. Maybe in future versions they will iron everything out as features are added with each generation of the engine, and eventually CPUs will be used less and less in the process, but for most people that time is not "here and now".
Some serious amateurs and professionals have adapted their workflow and watered their wine down to capitalize on the speed benefits of GPGPU rendering, despite the lack of some features, many times with amazing results; so we are not talking down GPGPU rendering as a gimmick. Quite the contrary. It just serves a specialized portion of the current ArchViz industry, and it should be a factor, but not the exclusive factor, in picking that much more expensive GPU.
What about the VRAM buffer? How much RAM do I need on my GPU?
This is pretty complicated to answer. Each application has different requirements, and so does each hardware configuration. Again, the common perception that "more = better" is false; in general, 2GB cards are more than enough even for very complex models, provided the GPU has enough grunt to actually drive a 2GB buffer. Many cards don't, and that is the reason that 64-bit 2GB card for $60 was not the bargain you were hoping for.
Older yet powerful cards with 1-1.5GB can still serve pretty well for viewports, although you might be pushing it with higher resolutions or multi-monitor setups. Very few viewport engines in the ArchViz world can push past 2GB of VRAM, and even when they do, don't expect a massive drop in performance.
For 3DS Max, the #1 reason for the VRAM buffer to fill up all those GBs is massive texture sizes and/or thousands of instances using them. Proper grouping of geometry, layers, proxies and/or less demanding settings can alleviate the weight, and the adaptive degradation features in current 3DS Max / Maya viewport engines help massively on top of that. If you are having issues with very complex models, as in large vegetated scenes and whatnot, it might be the CPU bottlenecking the whole system, and not VRAM.
GPGPU rendering is a different story, as the whole scene, along with all the required assets (mainly the textures), needs to fit inside the VRAM for it to work. In this case it is a "make it or break it" scenario: if it doesn't fit, it won't work, so you will probably have to use less demanding settings, fewer proxies and/or downsampled textures. If GPGPU is a priority, a 4GB card should buy you some leeway for complex scenes. That said, V-Ray RT GPU users usually get bugged by other, more serious limitations before hitting a VRAM ceiling.
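A quick way to sanity-check whether the textures alone will fit: an uncompressed texture costs roughly width × height × bytes per pixel, plus about a third extra if mipmaps are generated. A sketch with made-up texture sizes:

```python
# Back-of-the-envelope VRAM estimate for a scene's textures (hypothetical
# sizes). Uncompressed RGBA8 = 4 bytes/pixel; ~+33% if mipmaps are built.
textures = [
    (4096, 4096),            # e.g. a hero material's diffuse map
    (4096, 4096),            # its bump map
    (2048, 2048),            # an assorted secondary map
] + [(1024, 1024)] * 50      # dozens of smaller maps across the scene

BYTES_PER_PIXEL = 4
MIPMAP_OVERHEAD = 1.33

total = sum(w * h * BYTES_PER_PIXEL * MIPMAP_OVERHEAD for w, h in textures)
print(f"Textures alone: ~{total / 2**30:.2f} GB of VRAM")

# Geometry, acceleration structures and the framebuffer come on top of this;
# if the sum exceeds the card's VRAM, a GPGPU render simply won't run.
```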
It is great to ask people who use the programs you want to use along with the hardware you wish to get, but don't just fall for whatever is out there in the wild. Many times people just speak loudly to justify their own purchases. Doing some additional research doesn't hurt anyone; unless of course money is not an issue, in which case nobody can go wrong with a K6000.