Multithreaded PRMan

September, 2009 (Revised August 2015)

Introduction to Multithreaded PRMan

Pixar's RenderMan supports multithreaded rendering. Within a single prman invocation, multiple threads process the image simultaneously. If multiple processing units are available, this will usually result in a faster time to completion for a given set of input data. The primary advantage of multithreaded rendering over multi-process rendering (the method employed by netrender) is a smaller memory footprint and reduced shading computation. When multi-processing, each process will consume nearly the same amount of memory per frame as a single process would. With multithreading, the total memory footprint will be significantly lower.

By default, prman will determine the number of processing units available and will operate in multithreaded mode. A single invocation of prman will only consume one license by default and will utilize all the processing units on a given host machine. The user can override the default behavior by specifying the number of processing units that will be used with the -t:n option, where n is the number of processing units the user wants the renderer to utilize. The user can also change the default number of processing units via the rendermn.ini file:

/prman/nprocessors  1

This is useful if one wants to override the default behavior, which is to query the system for the number of processing units available. If the nprocessors setting is present, prman will use that value instead of querying the system.
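For example, the following command line (scene.rib is only a placeholder file name) forces the renderer to use four processing units, overriding the default determined from the system or from rendermn.ini:

prman -t:4 scene.rib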

Performance Expectations

prman in multithreaded mode should almost always result in faster render times than prman in single-threaded mode. The amount of speed up is highly scene dependent, as different scenes involve varying usages of resources that must be shared between processing units (RAM and disk). As a general rule of thumb, the scalability of multithreaded rendering improves with more complicated scenes, which is fortunate since it is exactly those kinds of scenes that most need the improved performance.

When using prman in multithreaded mode, one has to reconcile real-time (also known as elapsed-time) statistics with the user-time statistics. When multithreading, user-time is the total amount of processor time consumed by all processing units on the system. This will normally be greater than real-time because multiple processing units are active simultaneously; for example, a render that finishes in 10 minutes of elapsed time while keeping four threads busy may report roughly 40 minutes of user time.

In multithreaded mode prman should use less memory per scene than multi-processing mode. However, some of the rendering subsystems will utilize more memory in multithreaded mode than single-threaded mode to maintain performance. One subsystem that will utilize more memory is texturing (both 2D and 3D). The texturing system will create texture caches per processing unit that will consume slightly more memory than an invocation of prman that utilizes only a single processor. Likewise, ray tracing will create a geometry cache per processor and will consume slightly more memory in multithreaded mode than in single-threaded mode. Finally, hiding can occur simultaneously in multiple buckets, which will lead to increased visible point memory footprint; this is noticeable especially for scenes with high depth complexity.

Performance Tuning

There are several options that can be used to control the performance of multithreaded prman. The first option, of course, is -t:n, where the user can specify n processing units to be used. Performance should increase with the number of processing units utilized. Specifying more processing units than are physically available on the system (often called overscheduling) will most likely result in a slower render time.

The number of threads can also be set explicitly in a RIB file, using:

Option "limits" "int threads" [3]

This is an advanced Ri option; it currently accepts an integer between 1 and 32, with the number of processing units on the executing host being the default.
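As a minimal sketch of where the option belongs, the threads limit is a global option and should appear with the other options, before WorldBegin (the scene content below is only a placeholder):

Option "limits" "int threads" [3]
Display "out.tif" "file" "rgba"
Format 512 512 1
Projection "perspective" "fov" [45]
Translate 0 0 5
WorldBegin
    Sphere 1 -1 1 360
WorldEnd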

Another option that can be used to improve the efficiency of multithreaded prman is the bucket size. This can be controlled with the option:

Option "limits" "bucketsize" [16 16]

As in the single-threaded case, the bucket size controls the trade-off between speed and memory usage. Increasing the bucket size will generally result in a faster, more efficient render, at the cost of increased memory utilization. Decreasing the bucket size will decrease the efficiency but also decrease the memory footprint. In the multithreaded case this is still true, and in fact the effects are magnified. Increasing the bucket size decreases the likelihood of contention between threads for shared resources (increasing speed and efficiency), while at the same time increasing the amount of memory used. In particular, note that the amount of visible point memory is directly proportional to the bucket size times the number of threads; this is an issue for scenes with high depth complexity. For multithreaded rendering, we recommend the same basic guidelines as for single-threaded rendering: leave the bucketsize limit at the default setting unless memory consumption becomes an issue, at which point bucketsize should be decreased gradually until memory consumption becomes acceptable.
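For example, if visible point memory becomes a problem in a scene with high depth complexity, the bucket size could be stepped down from the value shown above to something like:

Option "limits" "bucketsize" [8 8]

The exact value that keeps memory consumption acceptable is scene dependent.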

The ray tracer is tuned to be very effective when multithreading. It will allocate by default a 200 Mbyte geometry cache per thread. This is controlled with the option:

Option "limits" "int geocachememory" [204800]

If a smaller cache per thread is required, it can be specified with this option, but reducing it may significantly slow down ray tracing in a multithreaded render.
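For example, halving the per-thread geometry cache to 100 Mbytes (the value is given in kilobytes, so 204800 corresponds to the 200 Mbyte default shown above) would look like:

Option "limits" "int geocachememory" [102400]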

Plugins

If a scene employs shaders that use old-style RSL plugins, those plugins should be ported to the new format. Old-style RSL plugins force the multithreaded renderer to serialize around a lock that allows only one old-style RSL plugin call to execute at a time, which can significantly reduce the effectiveness of the multithreaded renderer.

When writing RSL plugins or procedural plugins, care should be taken to limit the number and the scope of any locks used to protect global data from simultaneous access. Plugins should try to avoid global data altogether, because each lock can significantly reduce the performance of the multithreaded renderer.
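The following C fragment is only an illustrative sketch of that advice, not the RSL plugin API itself; every name in it is hypothetical. The point is to hold the lock only around the one-time creation of shared data and to keep per-call work outside any lock:

#include <pthread.h>

/* Hypothetical shared table, built lazily by whichever thread gets there first. */
static float g_table[256];
static int g_table_ready = 0;
static pthread_mutex_t g_table_lock = PTHREAD_MUTEX_INITIALIZER;

static const float *get_shared_table(void)
{
    /* Lock only around the one-time initialization of the global data... */
    pthread_mutex_lock(&g_table_lock);
    if (!g_table_ready) {
        for (int i = 0; i < 256; ++i)
            g_table[i] = (float)i / 255.0f;
        g_table_ready = 1;
    }
    pthread_mutex_unlock(&g_table_lock);
    return g_table;
}

float lookup(float u)
{
    /* ...and do the per-sample work without holding any lock. */
    const float *table = get_shared_table();
    int i = (int)(u * 255.0f);
    if (i < 0)   i = 0;
    if (i > 255) i = 255;
    return table[i];
}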

For more information on re-entrant procedurals, please see the Attributes reference documentation.

Additionally, plugins should avoid spawning their own threads. Doing so can lead to detrimental over-subscription and make render farm management difficult. If a plugin must spawn its own threads, then calls back to the renderer should only be made from the original renderer-spawned thread. In order to minimize lock use, the renderer employs thread-local data keyed to the specific threads it spawns; executing renderer calls from other threads that do not have this data may lead to errors and crashes.

For example, if a procedural hair generation plugin were called from prman thread P, and blocked P while running only one thread W from its own managed thread pool, this would not cause over-subscription. However, that worker thread should only perform operations on data local to the plugin. If the worker thread were to call back into the renderer to read a brickmap via the Bkm API or render a brickmap using an RiGeometry call, that file access from a foreign thread would corrupt the renderer's careful accounting and recycling of file descriptors. This can lead to a hard-to-diagnose crash when opening other types of files much later in the render. If that same plugin is reentrant and instead runs 5 worker threads in a -t:n multithreaded run, it could lead to n*5 threads trying to run at the same time. This could lead to heavy over-subscription of computation or I/O resources, ultimately slowing down renders.
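A minimal sketch of the safer pattern described above, again with purely hypothetical names (HairBatch, generate_hairs, and emit_to_renderer stand in for the plugin's own data and for whatever renderer interaction it performs):

#include <pthread.h>

/* Hypothetical payload computed by the plugin. */
typedef struct { float *points; int count; } HairBatch;

static void generate_hairs(HairBatch *batch)      { (void)batch; /* heavy, plugin-local work */ }
static void emit_to_renderer(const HairBatch *batch) { (void)batch; /* hypothetical renderer interaction */ }

/* Worker thread: self-contained computation only, no calls back into the renderer. */
static void *worker(void *arg)
{
    generate_hairs((HairBatch *)arg);
    return NULL;
}

/* Entry point invoked on a renderer-spawned thread. */
void plugin_entry(HairBatch *batch)
{
    pthread_t tid;
    pthread_create(&tid, NULL, worker, batch);
    pthread_join(tid, NULL);   /* block this thread while the single worker runs */

    /* Any brickmap reads or other renderer calls happen here,
       back on the original renderer-spawned thread. */
    emit_to_renderer(batch);
}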