GPU · CUDA · C++ · Cloth Simulation · OpenGL

XPBD Cloth Simulation with CUDA

Real-time cloth simulation built on Extended Position-Based Dynamics (XPBD), with parallel CPU and CUDA GPU solvers running inside a from-scratch C++ engine.

Overview

A real-time cloth simulator implementing the XPBD algorithm, prioritizing both physical plausibility and interactive frame rates. XPBD makes constraint stiffness independent of time step and iteration count, and because it corrects positions directly, the simulation stays stable under large time steps where explicit force-based integration tends to blow up.
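
To make the time-step independence concrete, here is a minimal C++ sketch of one XPBD projection for a single distance (stretch) constraint, following the update rule from the XPBD paper. The names Vec3 and solveDistanceXPBD are illustrative and are not taken from the project's code.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float length(Vec3 a) { return std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z); }

// One XPBD projection of C(p0, p1) = |p0 - p1| - restLength.
// w0, w1     : inverse masses (0 pins a particle)
// compliance : inverse stiffness (0 behaves like a rigid constraint)
// lambda     : accumulated Lagrange multiplier, reset to 0 at the start of each time step
inline void solveDistanceXPBD(Vec3& p0, Vec3& p1, float w0, float w1,
                              float restLength, float compliance, float dt,
                              float& lambda) {
    Vec3 d = p0 - p1;
    float len = length(d);
    float wSum = w0 + w1;
    if (len < 1e-8f || wSum <= 0.0f) return;     // degenerate edge or both ends pinned

    float C = len - restLength;
    float alphaTilde = compliance / (dt * dt);   // compliance scaled by the time step
    float dLambda = (-C - alphaTilde * lambda) / (wSum + alphaTilde);
    lambda += dLambda;

    Vec3 n = d * (1.0f / len);                   // gradient of C with respect to p0
    p0 = p0 + n * (w0 * dLambda);
    p1 = p1 - n * (w1 * dLambda);
}
```

With zero compliance, two equal-mass particles are pulled exactly to the rest length in one projection; with positive compliance the correction is softened by the alphaTilde term, which is what ties stiffness to the material parameter rather than to the iteration count.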

My role in this project was to set up the CUDA framework, implement the XPBD solvers in CUDA, and add texture mapping to the cloth surface. The simulation runs on both a CPU solver and a CUDA GPU solver behind a shared ClothSolverBase interface, making it easy to validate the parallel implementation against the reference CPU path.
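
The validation idea can be sketched as follows. Only the name ClothSolverBase comes from the project; the methods step and positions, the helper maxDeviation, and the stand-in GravityOnlySolver are assumptions made for illustration, and the real interface may look different.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Particle { float x, y, z; };

// Hypothetical shape of the shared interface (method names are assumptions).
class ClothSolverBase {
public:
    virtual ~ClothSolverBase() = default;
    virtual void step(float dt) = 0;                               // advance one frame
    virtual const std::vector<Particle>& positions() const = 0;    // host-side positions
};

// Stand-in solver used only to exercise the interface: particles fall under gravity.
class GravityOnlySolver : public ClothSolverBase {
public:
    GravityOnlySolver(std::vector<Particle> p, float g)
        : pos_(std::move(p)), vel_(pos_.size(), 0.0f), g_(g) {}
    void step(float dt) override {
        for (std::size_t i = 0; i < pos_.size(); ++i) {
            vel_[i] += g_ * dt;
            pos_[i].y += vel_[i] * dt;
        }
    }
    const std::vector<Particle>& positions() const override { return pos_; }
private:
    std::vector<Particle> pos_;
    std::vector<float> vel_;
    float g_;
};

// Largest per-particle distance between the results of two solvers.
inline float maxDeviation(const ClothSolverBase& a, const ClothSolverBase& b) {
    const std::vector<Particle>& pa = a.positions();
    const std::vector<Particle>& pb = b.positions();
    std::size_t n = std::min(pa.size(), pb.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        float dx = pa[i].x - pb[i].x, dy = pa[i].y - pb[i].y, dz = pa[i].z - pb[i].z;
        worst = std::max(worst, std::sqrt(dx * dx + dy * dy + dz * dz));
    }
    return worst;
}
```

Stepping a CPU and a GPU implementation from the same initial state and checking that maxDeviation stays under a small tolerance is one simple way to catch a race or an indexing bug in the parallel path.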

Key Features

  • Full XPBD constraint set: stretching, bending, and shrink constraints for cloth integrity and realistic deformation
  • Self-collision: k-NN-based self-collision constraints prevent the cloth from passing through itself or other cloth
  • Collider support: collision handling against sphere and cube / AABB colliders
  • Spatial acceleration: spatial hashing and a KD-tree for fast neighbor queries during collision detection
  • Interactivity: runtime manipulation of the cloth and environment with adjustable solver iterations to trade accuracy against speed
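
As an illustration of the spatial-acceleration item above, here is a minimal CPU-side sketch of a uniform spatial hash with a fixed-radius neighbor query. It is not the project's implementation: the class name SpatialHash, the packed 21-bit cell key, and the fixed-radius query (rather than k-NN) are all assumptions chosen to keep the example short.

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Point { float x, y, z; };

class SpatialHash {
public:
    explicit SpatialHash(float cellSize) : cell_(cellSize) {}

    // Bucket every point index by the grid cell that contains it.
    void build(const std::vector<Point>& pts) {
        grid_.clear();
        for (int i = 0; i < static_cast<int>(pts.size()); ++i)
            grid_[key(coord(pts[i].x), coord(pts[i].y), coord(pts[i].z))].push_back(i);
    }

    // Indices of points within `radius` of `q`. Only the 27 surrounding cells
    // are visited, so `radius` must not exceed the cell size.
    std::vector<int> query(const std::vector<Point>& pts, Point q, float radius) const {
        std::vector<int> out;
        int cx = coord(q.x), cy = coord(q.y), cz = coord(q.z);
        for (int dz = -1; dz <= 1; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            auto it = grid_.find(key(cx + dx, cy + dy, cz + dz));
            if (it == grid_.end()) continue;
            for (int i : it->second) {
                float ex = pts[i].x - q.x, ey = pts[i].y - q.y, ez = pts[i].z - q.z;
                if (ex * ex + ey * ey + ez * ez <= radius * radius) out.push_back(i);
            }
        }
        return out;
    }

private:
    int coord(float v) const { return static_cast<int>(std::floor(v / cell_)); }

    // Packs three cell coordinates into one 64-bit key (21 bits each), which is
    // collision-free as long as coordinates stay within roughly +/- one million cells.
    static std::uint64_t key(int x, int y, int z) {
        const std::uint64_t mask = 0x1FFFFF;
        return ((static_cast<std::uint64_t>(static_cast<std::int64_t>(x)) & mask) << 42) |
               ((static_cast<std::uint64_t>(static_cast<std::int64_t>(y)) & mask) << 21) |
                (static_cast<std::uint64_t>(static_cast<std::int64_t>(z)) & mask);
    }

    float cell_;
    std::unordered_map<std::uint64_t, std::vector<int>> grid_;
};
```

Choosing the cell size close to the collision radius keeps each query to a handful of candidates instead of a scan over every particle.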

GPU / CUDA Implementation

The core of the project is porting the solver to the GPU and making the parallel constraint solve correct and fast.

  • Authored 7 CUDA kernels spanning the full XPBD step: stretch and bending constraints, sphere/AABB collision, and k-NN self-collision
  • Used 8-direction even/odd graph coloring to eliminate write conflicts in the parallel solver: constraint updates are split into independent passes in which no two constraints share a particle, so race conditions are resolved without locking
  • Parallelized bending per 4-particle quad across the entire mesh
  • Stored particle positions and velocities in contiguous VRAM blocks under a unified device-buffer layout, which is what makes the parallel particle-collision pass efficient
  • Applied ping-pong buffering so simultaneous reads and writes during self-collision stay hazard-free
  • GPU threads query a precomputed KD-tree for spatial lookups during collision resolution
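
The coloring idea from the list above can be checked on the CPU. The sketch below assumes one plausible reading of "8-direction even/odd": four edge directions on the particle grid (horizontal, vertical, and the two diagonals), each split by parity into two passes, for eight batches in total. The function names and the batch ordering are illustrative, and the project's actual scheme may differ.

```cpp
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

using Constraint = std::pair<int, int>;   // indices of the two particles an edge connects

// Splits the edges of an n x n particle grid into 8 batches
// (4 edge directions x even/odd parity). Within a batch no particle is shared,
// so every constraint in the batch could be solved by its own thread.
inline std::array<std::vector<Constraint>, 8> colorGridConstraints(int n) {
    std::array<std::vector<Constraint>, 8> batches;
    auto id = [n](int x, int y) { return y * n + x; };
    for (int y = 0; y < n; ++y) {
        for (int x = 0; x < n; ++x) {
            if (x + 1 < n)   // horizontal edges, split by column parity
                batches[0 + (x & 1)].push_back({id(x, y), id(x + 1, y)});
            if (y + 1 < n)   // vertical edges, split by row parity
                batches[2 + (y & 1)].push_back({id(x, y), id(x, y + 1)});
            if (x + 1 < n && y + 1 < n) {
                // diagonal and anti-diagonal edges, split by column parity
                batches[4 + (x & 1)].push_back({id(x, y), id(x + 1, y + 1)});
                batches[6 + (x & 1)].push_back({id(x + 1, y), id(x, y + 1)});
            }
        }
    }
    return batches;
}

// True if no particle index is touched by two constraints of the same batch.
inline bool isConflictFree(const std::vector<Constraint>& batch, int particleCount) {
    std::vector<char> used(static_cast<std::size_t>(particleCount), 0);
    for (const Constraint& c : batch) {
        if (used[c.first] || used[c.second]) return false;
        used[c.first] = 1;
        used[c.second] = 1;
    }
    return true;
}
```

On the GPU each batch would map to one kernel launch (or one synchronized pass), with the batches run back to back inside every solver iteration.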

Result: scaled the cloth from 16x16 to 64x64 (~16x more degrees of freedom) while holding real-time performance.
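
The ping-pong buffering used for self-collision follows a general pattern that can be shown on the CPU: every pass reads only from one buffer and writes only to the other, then the two are swapped. The sketch below uses a stand-in smoothing pass instead of the real self-collision response, and the names PingPong and relaxPass are illustrative.

```cpp
#include <utility>
#include <vector>

// Two same-sized buffers: a pass reads `front`, writes `back`, then swaps them.
struct PingPong {
    std::vector<float> front, back;
    explicit PingPong(std::vector<float> init)
        : front(std::move(init)), back(front.size()) {}
    void swap() { std::swap(front, back); }
};

// Stand-in pass: each element is blended with its two neighbours.
// Because reads come only from `front`, the result does not depend on the
// order in which elements (or GPU threads) are processed.
inline void relaxPass(PingPong& buf) {
    const std::vector<float>& src = buf.front;
    std::vector<float>& dst = buf.back;
    const int n = static_cast<int>(src.size());
    for (int i = 0; i < n; ++i) {
        float left = src[i > 0 ? i - 1 : i];
        float right = src[i + 1 < n ? i + 1 : i];
        dst[i] = 0.5f * src[i] + 0.25f * (left + right);
    }
    buf.swap();   // the freshly written data becomes the next pass's input
}
```

An in-place version of the same loop would let later elements see already-updated neighbours, which is exactly the read/write hazard the second buffer removes.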

[Figures: CPU solver, GPU solver]

Rendering

OpenGL renderer with normal mapping and per-frame TBN reconstruction for the cloth surface, plus an ImGui interface for tuning constraints, solver iterations, and scene parameters on the fly.
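
Because the cloth deforms every frame, its tangent frame has to be rebuilt from the current positions. Below is a minimal sketch of the standard per-triangle tangent/bitangent/normal computation from positions and UVs. The type and function names are illustrative, and whether the project computes this on the CPU, in a kernel, or in a shader is not specified here.

```cpp
#include <cmath>

struct V3 { float x, y, z; };
struct V2 { float u, v; };
struct Frame { V3 tangent, bitangent, normal; };

inline V3 sub(V3 a, V3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline V3 scale(V3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline V3 cross(V3 a, V3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline V3 normalize(V3 a) {
    float len = std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
    return len > 0.0f ? scale(a, 1.0f / len) : a;   // leave zero vectors untouched
}

// Tangent frame of one triangle from its current positions and fixed UVs.
inline Frame triangleTBN(V3 p0, V3 p1, V3 p2, V2 uv0, V2 uv1, V2 uv2) {
    V3 e1 = sub(p1, p0), e2 = sub(p2, p0);
    float du1 = uv1.u - uv0.u, dv1 = uv1.v - uv0.v;
    float du2 = uv2.u - uv0.u, dv2 = uv2.v - uv0.v;
    float det = du1 * dv2 - du2 * dv1;
    float r = (std::fabs(det) > 1e-12f) ? 1.0f / det : 0.0f;   // guard degenerate UVs

    V3 t = scale(sub(scale(e1, dv2), scale(e2, dv1)), r);      // direction of increasing u
    V3 b = scale(sub(scale(e2, du1), scale(e1, du2)), r);      // direction of increasing v
    V3 n = cross(e1, e2);                                      // geometric face normal
    return {normalize(t), normalize(b), normalize(n)};
}
```

Per-vertex frames are typically obtained by accumulating these per-triangle vectors over the triangles that share a vertex and normalizing the sums.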

Takeaways

  • Building a complete application framework — renderer, GUI, input, and scene management — from the ground up
  • Understanding Position-Based Dynamics and XPBD, and why position-correction stays stable where force integration doesn’t
  • Mapping a data-parallel physics workload onto the GPU with CUDA, including the synchronization and memory-layout details that make a naive port either incorrect or slow

References: Müller et al., Position Based Dynamics (2007); Macklin, Müller & Chentanez, XPBD: Position-Based Simulation of Compliant Constrained Dynamics (2016).