Real-time cloth simulation built on Extended Position-Based Dynamics (XPBD), with parallel CPU and CUDA GPU solvers running inside a from-scratch C++ engine.

Overview

A real-time cloth simulator implementing the XPBD algorithm, built to balance physical plausibility with interactive frame rates. XPBD makes constraint stiffness largely independent of time step and iteration count, and because it corrects positions directly, the simulation stays stable under large time steps where explicit force-based integration tends to blow up.
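To make the XPBD update concrete, here is a minimal sketch of one solver iteration for a single distance (stretch) constraint, following the update rule from the XPBD paper. The Vec3 type and the solveDistanceXPBD name are illustrative assumptions and are not taken from the project's code.

```cpp
#include <cassert>
#include <cmath>

// Minimal vector type for illustration only (not the project's math library).
struct Vec3 { float x, y, z; };

inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float length(Vec3 a) { return std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z); }

// One XPBD iteration for the distance constraint C(p1, p2) = |p1 - p2| - restLength.
//   w1, w2     : inverse masses (0 pins a particle)
//   compliance : inverse stiffness alpha (0 means perfectly rigid)
//   lambda     : accumulated Lagrange multiplier, reset to 0 at the start of each time step
inline void solveDistanceXPBD(Vec3& p1, Vec3& p2, float w1, float w2,
                              float restLength, float compliance, float dt,
                              float& lambda) {
    assert(dt > 0.0f);
    const Vec3 d = p1 - p2;
    const float len = length(d);
    const float wSum = w1 + w2;
    if (len < 1e-8f || wSum == 0.0f) return;  // degenerate edge or both ends pinned

    const float C = len - restLength;
    const float alphaTilde = compliance / (dt * dt);  // time-step-scaled compliance
    const float dLambda = (-C - alphaTilde * lambda) / (wSum + alphaTilde);
    lambda += dLambda;

    const Vec3 n = d * (1.0f / len);  // gradient of C with respect to p1
    p1 = p1 + n * (w1 * dLambda);
    p2 = p2 - n * (w2 * dLambda);
}
```

Dividing the compliance by dt squared and carrying lambda across iterations is what separates this from plain PBD, where the effective stiffness drifts with the iteration count.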

My role in this project was to set up the CUDA framework, implement the XPBD solvers in CUDA, and add texture mapping to the cloth surface. The simulation runs on both a CPU solver and a CUDA GPU solver behind a shared ClothSolverBase interface, making it easy to validate the parallel implementation against the reference CPU path.
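As a rough illustration of how a shared interface allows this kind of validation, the sketch below steps two solvers in lockstep and reports how far their particle positions drift apart. Only the ClothSolverBase name comes from the project; the step and positions methods, the maxDivergence helper, and the FreeFallSolver stand-in are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical shape of the shared interface. Only the class name comes from
// the project; the two methods are assumptions made for this sketch.
class ClothSolverBase {
public:
    virtual ~ClothSolverBase() = default;
    virtual void step(float dt) = 0;
    // Flat xyz array of particle positions in host memory.
    virtual std::vector<float> positions() const = 0;
};

// Steps both solvers in lockstep and returns the largest per-component
// position difference observed over all steps.
inline float maxDivergence(ClothSolverBase& reference, ClothSolverBase& candidate,
                           int steps, float dt) {
    float worst = 0.0f;
    for (int s = 0; s < steps; ++s) {
        reference.step(dt);
        candidate.step(dt);
        const std::vector<float> a = reference.positions();
        const std::vector<float> b = candidate.positions();
        assert(a.size() == b.size());
        for (std::size_t i = 0; i < a.size(); ++i)
            worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}

// Stand-in solver used only to exercise the harness: a single particle in free
// fall, with an optional artificial per-step error to mimic a faulty port.
class FreeFallSolver : public ClothSolverBase {
public:
    explicit FreeFallSolver(float errorPerStep = 0.0f) : error_(errorPerStep) {}
    void step(float dt) override {
        vy_ += -9.81f * dt;
        y_ += vy_ * dt + error_;
    }
    std::vector<float> positions() const override { return {0.0f, y_, 0.0f}; }
private:
    float y_ = 0.0f;
    float vy_ = 0.0f;
    float error_ = 0.0f;
};
```

With real CPU and GPU solvers the result would be compared against a small tolerance rather than zero, since a parallel solve applies floating-point corrections in a different order.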

Key Features

Full XPBD constraint set: stretching, bending, and shrink constraints for cloth integrity and realistic deformation
Self-collision: k-NN-based self-collision constraints prevent the cloth from passing through itself or other cloth
Collider support: collision handling against sphere and cube / AABB colliders
Spatial acceleration: spatial hashing and a KD-tree for fast neighbor queries during collision detection
Interactivity: runtime manipulation of the cloth and environment with adjustable solver iterations to trade accuracy against speed
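The spatial hashing mentioned above can be sketched as a uniform grid keyed by integer cell coordinates. This is a generic CPU-side illustration under stated assumptions, not the project's data structure: the SpatialHash class, its methods, and the 21-bits-per-axis key packing are all choices made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Vec3 { float x, y, z; };

// Uniform-grid spatial hash: each point is filed under the cell containing it,
// so a radius query only has to inspect the 27 surrounding cells.
class SpatialHash {
public:
    explicit SpatialHash(float cellSize) : cellSize_(cellSize) { assert(cellSize > 0.0f); }

    void build(const std::vector<Vec3>& points) {
        cells_.clear();
        for (std::size_t i = 0; i < points.size(); ++i) {
            const Vec3& p = points[i];
            cells_[key(cellOf(p.x), cellOf(p.y), cellOf(p.z))].push_back(i);
        }
    }

    // Indices of all points within `radius` of `q`. Requires radius <= cellSize
    // so that the 3x3x3 block of cells around q covers the whole query sphere.
    std::vector<std::size_t> query(const std::vector<Vec3>& points, Vec3 q, float radius) const {
        assert(radius <= cellSize_);
        std::vector<std::size_t> result;
        const int cx = cellOf(q.x), cy = cellOf(q.y), cz = cellOf(q.z);
        for (int dx = -1; dx <= 1; ++dx)
        for (int dy = -1; dy <= 1; ++dy)
        for (int dz = -1; dz <= 1; ++dz) {
            const auto it = cells_.find(key(cx + dx, cy + dy, cz + dz));
            if (it == cells_.end()) continue;
            for (std::size_t i : it->second) {
                const float ex = points[i].x - q.x;
                const float ey = points[i].y - q.y;
                const float ez = points[i].z - q.z;
                if (ex * ex + ey * ey + ez * ez <= radius * radius) result.push_back(i);
            }
        }
        return result;
    }

private:
    int cellOf(float v) const { return static_cast<int>(std::floor(v / cellSize_)); }

    // Packs three cell coordinates into one 64-bit key, 21 bits per axis.
    // Keys are unique as long as every coordinate lies in [-2^20, 2^20).
    static std::uint64_t key(int x, int y, int z) {
        const std::uint64_t mask = 0x1FFFFF;
        return ((static_cast<std::uint64_t>(x) & mask) << 42) |
               ((static_cast<std::uint64_t>(y) & mask) << 21) |
               (static_cast<std::uint64_t>(z) & mask);
    }

    float cellSize_;
    std::unordered_map<std::uint64_t, std::vector<std::size_t>> cells_;
};
```

Setting the cell size to the collision radius keeps each query to a constant number of cells, which is why this structure suits self-collision where the search radius is fixed.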

GPU / CUDA Implementation

The core of the project is porting the solver to the GPU and making the parallel constraint solve correct and fast.

Authored 7 CUDA kernels spanning the full XPBD step: stretch and bending constraints, sphere/AABB collision, and k-NN self-collision
Used 8-direction even/odd graph coloring to eliminate write conflicts in the parallel solver: constraint updates are split into independent passes in which no two constraints touch the same particle, so race conditions are resolved without locking
Parallelized bending per 4-particle quad across the entire mesh
Stored particle positions and velocities in contiguous VRAM blocks under a unified device-buffer layout, which is what makes the parallel particle-collision pass efficient
Applied ping-pong buffering so simultaneous reads and writes during self-collision stay hazard-free
GPU threads query a precomputed KD-tree for spatial lookups during collision resolution
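The even/odd coloring idea can be sketched on the CPU as follows. This is one plausible reading of "8-direction even/odd", assuming four edge directions (horizontal, vertical, and the two diagonals) each split by parity into two passes. The Edge struct and the buildPasses and isConflictFree functions are hypothetical names, not the project's kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Edge { int a, b; };  // particle indices joined by one distance constraint

// Splits the edge constraints of a width x height particle grid into 8 passes:
// 4 directions x even/odd parity. Within a pass no particle appears twice,
// so every constraint in that pass can be solved by its own thread.
inline std::vector<std::vector<Edge>> buildPasses(int width, int height) {
    const int dirs[4][2] = {{1, 0}, {0, 1}, {1, 1}, {1, -1}};
    std::vector<std::vector<Edge>> passes;
    for (const auto& d : dirs) {
        for (int parity = 0; parity < 2; ++parity) {
            std::vector<Edge> pass;
            for (int y = 0; y < height; ++y) {
                for (int x = 0; x < width; ++x) {
                    const int nx = x + d[0];
                    const int ny = y + d[1];
                    if (nx >= width || ny < 0 || ny >= height) continue;
                    // Parity is taken along the axis on which the direction advances.
                    const int along = (d[0] != 0) ? x : y;
                    if (along % 2 != parity) continue;
                    pass.push_back({y * width + x, ny * width + nx});
                }
            }
            passes.push_back(std::move(pass));
        }
    }
    return passes;
}

// True if no particle is referenced by two constraints in the same pass.
inline bool isConflictFree(const std::vector<Edge>& pass, int particleCount) {
    std::vector<char> used(static_cast<std::size_t>(particleCount), 0);
    for (const Edge& e : pass) {
        assert(e.a >= 0 && e.a < particleCount && e.b >= 0 && e.b < particleCount);
        if (used[e.a] || used[e.b]) return false;
        used[e.a] = 1;
        used[e.b] = 1;
    }
    return true;
}
```

On the GPU, each pass would map naturally to one kernel launch with one thread per constraint, and the launch boundary between passes would serve as the synchronization point.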

Result: scaled the cloth from 16x16 to 64x64 (~16x more degrees of freedom) while holding real-time performance.

CPU

GPU

Rendering

OpenGL renderer with normal mapping and per-frame TBN reconstruction for the cloth surface, plus an ImGui interface for tuning constraints, solver iterations, and scene parameters on the fly.
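Because the cloth deforms every frame, the tangent frame used for normal mapping cannot be baked once. The sketch below shows the standard per-triangle tangent/bitangent computation from positions and UVs. The Vec2, Vec3, and TangentFrame types and the triangleTangentFrame function are illustrative assumptions, and the project may compute its TBN differently (for example per vertex or in a shader).

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float u, v; };
struct Vec3 { float x, y, z; };

inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline Vec3 normalized(Vec3 a) {
    const float len = std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
    return len > 0.0f ? a * (1.0f / len) : a;
}

struct TangentFrame {
    Vec3 tangent, bitangent, normal;
    bool valid;  // false when the UV mapping of the triangle is degenerate
};

// Solves e1 = du1*T + dv1*B and e2 = du2*T + dv2*B for T and B, where e1 and e2
// are the triangle edges and (du, dv) are the matching UV deltas.
inline TangentFrame triangleTangentFrame(Vec3 p0, Vec3 p1, Vec3 p2,
                                         Vec2 uv0, Vec2 uv1, Vec2 uv2) {
    const Vec3 e1 = p1 - p0;
    const Vec3 e2 = p2 - p0;
    const float du1 = uv1.u - uv0.u, dv1 = uv1.v - uv0.v;
    const float du2 = uv2.u - uv0.u, dv2 = uv2.v - uv0.v;

    TangentFrame f{};
    f.normal = normalized(cross(e1, e2));
    const float det = du1 * dv2 - du2 * dv1;
    f.valid = std::fabs(det) > 1e-12f;
    if (!f.valid) return f;

    const float r = 1.0f / det;
    f.tangent = normalized((e1 * dv2 - e2 * dv1) * r);
    f.bitangent = normalized((e2 * du1 - e1 * du2) * r);
    return f;
}
```

A common follow-up is to average these per-triangle frames over the triangles sharing each vertex before uploading them to the renderer.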

Takeaways

Building a complete application framework — renderer, GUI, input, and scene management — from the ground up
Understanding Position-Based Dynamics and XPBD, and why direct position correction stays stable where force integration doesn’t
Mapping a data-parallel physics workload onto the GPU with CUDA, including the synchronization and memory-layout details that make a naive port either incorrect or slow

References: Müller et al., Position Based Dynamics (2007); Macklin, Müller & Chentanez, XPBD: Position-Based Simulation of Compliant Constrained Dynamics (2016).

XPBD Cloth Simulation with CUDA
