
GPU Particles

A GPU-driven particle system using C++/DirectX 11 compute shaders and indirect rendering to simulate 1M+ particles at 60 FPS in two dispatches and one draw call.

The Goal and Result

This is a GPU-driven particle system built in C++ and DirectX 11. It uses compute shaders with indirect dispatch and an indirect draw call. With more than 1 million particles, each frame needs only 2 dispatches and 1 draw call, and the system runs at 60 FPS on a machine with a GTX 1080 and an i7-8700K.

Resources Setup

Particle Emitter Buffer

// Struct definition for a single particle emitter
struct ParticleEmitter
{
    float4 m_position;
    float m_emissionRate;
};
// Particle Emitters, only 1 is used for now
StructuredBuffer<ParticleEmitter> gParticleEmitters : register(t70);  // array of particle emitters

Particle emitters are stored in a single structured buffer. Each emitter has two simple properties: position and emission rate.

Particle Buffers

// Particle CS buffer 1
RWStructuredBuffer<Particle> gCSMapParticleTarget1UAV : register(u1);  // used by compute shader
StructuredBuffer<Particle> gCSMapParticleComputeTarget1View : register(t71);  // same as above, used by second pass

// Particle CS buffer 2
RWStructuredBuffer<Particle> gCSMapParticleTarget2UAV : register(u2);
StructuredBuffer<Particle> gCSMapParticleComputeTarget2View : register(t72);

To avoid fragmentation, two particle buffers are created and used in a flip-flop (ping-pong) fashion: particles are updated from one buffer into the other. Each buffer has a UAV, used by the compute shader, and an SRV, used by the vertex shader.
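The flip-flop scheme can be sketched on the CPU as a pair of index helpers. This is only an illustration: it assumes the roles swap once per frame, and the names `readBufferIndex` and `writeBufferIndex` are hypothetical, not taken from the engine.

```cpp
#include <cstdint>

// Hypothetical helpers: which of the two particle buffers is read and which
// is written on a given frame, assuming the roles swap every frame.
constexpr std::uint32_t kParticleBufferCount = 2;

// Buffer holding last frame's particles (bound as an SRV or read-only input).
constexpr std::uint32_t readBufferIndex(std::uint64_t frame)
{
    return static_cast<std::uint32_t>(frame % kParticleBufferCount);
}

// Buffer receiving this frame's updated particles (bound as a UAV).
constexpr std::uint32_t writeBufferIndex(std::uint64_t frame)
{
    return static_cast<std::uint32_t>((frame + 1) % kParticleBufferCount);
}
```

The buffer written on frame N is the one read on frame N + 1, so no particle data has to be copied between frames.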

On the CPU side, there’s also a particle definition used to initialize buffers.

struct Particle
{
    Vector4 m_position;
    Vector4 m_velocity;
    Vector4 m_acceleration;
    float m_age;
    float m_lifeTime;
};

When the engine starts, both buffers are created with every particle's age set to -1.0, which marks the particle as inactive.
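A minimal sketch of how that initial data might be built on the CPU before it is uploaded. The `Vector4` stand-in, the constant `kInactiveAge`, and the helpers `makeInitialParticles` and `isActive` are assumptions made for illustration; the engine's own types and buffer-creation code are not shown in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the engine's Vector4 (assumed to be four packed floats).
struct Vector4 { float x, y, z, w; };

struct Particle
{
    Vector4 m_position;
    Vector4 m_velocity;
    Vector4 m_acceleration;
    float m_age;
    float m_lifeTime;
};

// Age value that marks a slot as unused, as described above.
constexpr float kInactiveAge = -1.0f;

// Builds the initial contents of one particle buffer: every slot inactive.
std::vector<Particle> makeInitialParticles(std::size_t maxParticles)
{
    assert(maxParticles > 0);
    Particle inactive{};            // zero-initialize all fields
    inactive.m_age = kInactiveAge;  // mark the slot as not active
    return std::vector<Particle>(maxParticles, inactive);
}

// A particle counts as active once its age is non-negative.
bool isActive(const Particle& p) { return p.m_age >= 0.0f; }
```

The resulting vector could then be passed as the initial data when each structured buffer is created.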

Indirect Draw Buffer

Because all particles are drawn in one call using indirect draw, an indirect draw argument buffer is needed. It is declared as a RWByteAddressBuffer and created with the misc flags D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS | D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS, so that it can be used as draw arguments and modified with the InterlockedAdd atomic instruction.

// Indirect draw args
RWByteAddressBuffer gIndirectDrawArgsUAV : register(u3);
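For reference, DrawIndexedInstancedIndirect reads five consecutive 32-bit values from the argument buffer. The struct below is a sketch of that layout; Direct3D 11 does not ship a struct for it, so the name `DrawIndexedInstancedArgs` and the helper `makeInitialDrawArgs` are hypothetical. It shows why the instance count sits at byte offset 4, the offset passed to InterlockedAdd in the second dispatch.

```cpp
#include <cstddef>
#include <cstdint>

// Mirrors the five 32-bit arguments read by DrawIndexedInstancedIndirect.
// The struct name is illustrative only.
struct DrawIndexedInstancedArgs
{
    std::uint32_t indexCountPerInstance;  // byte offset 0
    std::uint32_t instanceCount;          // byte offset 4
    std::uint32_t startIndexLocation;     // byte offset 8
    std::int32_t  baseVertexLocation;     // byte offset 12
    std::uint32_t startInstanceLocation;  // byte offset 16
};

static_assert(sizeof(DrawIndexedInstancedArgs) == 20,
              "argument block must be five tightly packed 32-bit values");
static_assert(offsetof(DrawIndexedInstancedArgs, instanceCount) == 4,
              "instance count must sit at byte offset 4");

// Initial contents: a fixed index count per particle and zero instances.
constexpr DrawIndexedInstancedArgs makeInitialDrawArgs(std::uint32_t indicesPerParticle)
{
    return DrawIndexedInstancedArgs{ indicesPerParticle, 0u, 0u, 0, 0u };
}
```

For a particle drawn as a two-triangle quad, `makeInitialDrawArgs(6)` would describe six indices per instance; the quad shape is an assumption, since the article does not state the particle mesh.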

Indirect Dispatch Buffer

The number of active particles varies from frame to frame, and a single thread group can hold at most 1024 threads. We therefore need an indirect dispatch whose group count is written by another compute shader. Here the count is derived from the sum of last frame's particle count and the number of particles spawned this frame. The buffer and its shader view are set up in the same way as the indirect draw buffer.

// Indirect dispatch args
RWByteAddressBuffer gIndirectDispatchArgsUAV : register(u4);
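The group count is a rounded-up division of the particle total by the group size. The sketch below does that arithmetic on the CPU for illustration; it assumes the update shader uses 1024 threads per group in a one-dimensional layout, and `DispatchArgs`, `kThreadsPerGroup` and `makeDispatchArgs` are hypothetical names. In the actual system this value is written by the first compute pass.

```cpp
#include <cassert>
#include <cstdint>

// The three group counts (X, Y, Z) consumed by DispatchIndirect.
struct DispatchArgs { std::uint32_t x, y, z; };

// Assumed group size: the 1024-thread maximum mentioned above.
constexpr std::uint32_t kThreadsPerGroup = 1024;

// Direct3D 11 allows at most 65535 thread groups per dimension.
constexpr std::uint32_t kMaxGroupsPerDimension = 65535;

DispatchArgs makeDispatchArgs(std::uint32_t aliveLastFrame,
                              std::uint32_t spawnedThisFrame)
{
    const std::uint32_t total = aliveLastFrame + spawnedThisFrame;
    // Round up so a partially filled last group still gets dispatched.
    const std::uint32_t groups = (total + kThreadsPerGroup - 1) / kThreadsPerGroup;
    assert(groups <= kMaxGroupsPerDimension);
    return DispatchArgs{ groups, 1u, 1u };
}
```

With 1,000,000 particles alive and 3,500 newly spawned, this yields 980 groups, because 979 groups would cover only 1,002,496 threads.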

Pipeline Setup

The pipeline consists of 2 dispatches and 1 indirect draw call. They are split into 3 effects (shader combination groups) within a single render group.

if (iEffect == 0)
{
    // setting up particle system, reset instance count
    // spawn new particles, update instance count

    // for now only support one particle emitter
    pDrawList->setDispatchParams(Vector3(1, 1, 1));

    ParticleSetBufferGPU::bindUAV1(*m_pContext, m_arena, pDrawList);
    ParticleSetBufferGPU::bindUAV2(*m_pContext, m_arena, pDrawList);
    ParticleSetBufferGPU::bindEmitterSRV(*m_pContext, m_arena, pDrawList);
    ParticleSetBufferGPU::bindDrawArgsUAV(*m_pContext, m_arena, pDrawList);
    ParticleSetBufferGPU::bindDispatchArgsUAV(*m_pContext, m_arena, pDrawList);
}
else if (iEffect == 1)
{
    // calculate existing particles, update positions and velocities, and update instance count
    pDrawList->setDispatchIndirect();
    addSAs_ParticleEmitter(pDrawList);
}
else
{
    // perform the indexed instanced indirect draw
    addSAs_ParticleEmitter_Pass2(pDrawList);
}

In the first dispatch, the instance count in the draw argument buffer is reset. Newly emitted particles are added to the particle buffer, and the instance count is updated accordingly.

Then the indirect dispatch arguments are updated so that the next compute pass has enough groups for all active particles.

First dispatch

In the second dispatch, all existing particles are updated from one buffer into the other. Particles that are inactive, or that become inactive during the update, are skipped. Each surviving particle also increments the instance count:

uint originalIndex;
gIndirectDrawArgsUAV.InterlockedAdd(4, 1, originalIndex);

After this update, all active particles occupy indices 0 to n - 1, and the instance count in the draw arguments is n, where n is the number of active particles. An example particle buffer is shown below. Note that newly spawned particles have an age of 0.01; in this frame, 3500 particles were spawned.

Buffers after second dispatch
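The compaction performed by the second dispatch can be imitated on the CPU, with a `std::atomic` counter standing in for the InterlockedAdd on the instance count. This is a sketch under assumptions, not the shader itself: `SimParticle` is a simplified one-dimensional particle, `updateAndCompact` is a hypothetical name, and the simple Euler position update is a guess at the integration step.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified particle for illustration (1D position and velocity).
struct SimParticle { float age; float lifeTime; float position; float velocity; };

// Updates every live particle in src and writes the survivors contiguously
// into dst. Returns the number of survivors (the new instance count).
std::uint32_t updateAndCompact(const std::vector<SimParticle>& src,
                               std::vector<SimParticle>& dst, float dt)
{
    assert(dst.size() >= src.size());
    std::atomic<std::uint32_t> instanceCount{0};  // stands in for draw args offset 4

    for (const SimParticle& p : src)              // one iteration ~ one GPU thread
    {
        if (p.age < 0.0f) continue;               // slot was already inactive

        SimParticle next = p;
        next.age += dt;
        if (next.age >= next.lifeTime) continue;  // particle dies this frame

        next.position += next.velocity * dt;      // assumed Euler integration

        // Reserve the next free slot, like InterlockedAdd returning the old value.
        const std::uint32_t slot = instanceCount.fetch_add(1);
        dst[slot] = next;
    }
    return instanceCount.load();
}
```

On the GPU the threads run in parallel, so the order of survivors in the destination buffer is not guaranteed; the serial loop here happens to preserve it. Slots past the returned count keep whatever the destination held before.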

Finally, when the particles are ready to be drawn, an indexed instanced indirect draw call is issued.

void IndexBufferGPU::drawIndirect(ID3D11Buffer* buffer, PrimitiveTypes::UInt32 bufferOffset)
{
#if APIABSTRACTION_D3D11
    D3D11Renderer* pD3D11Renderer = static_cast<D3D11Renderer*>(m_pContext->getGPUScreen());
    ID3D11Device* pDevice = pD3D11Renderer->m_pD3DDevice;
    ID3D11DeviceContext* pDeviceContext = pD3D11Renderer->m_pD3DContext;

    pDeviceContext->IASetPrimitiveTopology((D3D_PRIMITIVE_TOPOLOGY)(m_apiTopology));
    pDeviceContext->DrawIndexedInstancedIndirect(buffer, bufferOffset);
#endif
}

Another set of shaders indexes into the particle buffer using the instance ID and offsets the vertex position accordingly. The following image shows all particle instances drawn in one draw call.

Indirect draw call
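The per-instance lookup can be sketched in C++ as follows. It only illustrates the idea of using the instance ID as an index into the particle buffer; the actual vertex shader is not shown in this article, and `Float3` and `transformVertex` are hypothetical names. Any billboarding or scaling the real shader may perform is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Float3 { float x, y, z; };

// Offsets one vertex of the shared particle mesh by the position of the
// particle selected by instanceId, as a vertex shader would per instance.
Float3 transformVertex(const Float3& localPos,
                       const std::vector<Float3>& particlePositions,
                       std::uint32_t instanceId)
{
    assert(instanceId < particlePositions.size());
    const Float3& center = particlePositions[instanceId];
    return Float3{ localPos.x + center.x,
                   localPos.y + center.y,
                   localPos.z + center.z };
}
```

Because the compute pass packs live particles into indices 0 to n - 1, every instance ID below the instance count maps to a live particle.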

Performance

Comparing frame rates before and after adding the particle system shows a minimal performance impact: the program still runs smoothly at 60 FPS on a machine with a GTX 1080 and an i7-8700K.

Before adding particle system

After adding particle system, FPS is still 60+

The task manager also shows that the program draws most of its processing power from the GPU while keeping the CPU mostly free.

Result shows GPU is doing all the calculation