CUDA Boid Simulation
I implemented a boid simulation in CUDA, leveraging GPU parallelism.
Results Showcase
Below are GIFs showcasing different implementations with 100,000 boids.
Naive
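The naive implementation is the least efficient. To update its velocity (and then its position), each boid checks every other boid in the flock, so the work per frame grows quadratically with the number of boids: roughly ten billion pair checks per frame for 100,000 boids. This results in the lowest FPS (~11.5).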
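Below is a minimal host-side C++ sketch of a naive per-boid update. It assumes the three classic boid rules (cohesion, separation, alignment) share a single neighbourhood radius. The names `naiveVelocityChange` and `Params`, and all rule weights, are hypothetical and not taken from the project. In a CUDA version, one thread would run this function body for one boid, so every thread still performs O(N) distance checks.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal 3D vector type for the sketch.
struct Vec3 {
    float x, y, z;
};

static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static float dist(Vec3 a, Vec3 b) {
    Vec3 d = sub(a, b);
    return std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
}

// Hypothetical rule parameters (illustrative values, not from the original project).
struct Params {
    float radius = 5.0f;           // neighbourhood distance shared by all three rules
    float cohesionScale = 0.01f;   // pull towards the neighbours' centre of mass
    float separationScale = 0.1f;  // push away from nearby neighbours
    float alignmentScale = 0.1f;   // steer towards the neighbours' average velocity
};

// Naive update for one boid: scan every other boid (O(N) per boid, O(N^2) per frame).
Vec3 naiveVelocityChange(std::size_t self, const std::vector<Vec3>& pos,
                         const std::vector<Vec3>& vel, const Params& p) {
    Vec3 centre{0, 0, 0}, separation{0, 0, 0}, avgVel{0, 0, 0};
    int neighbours = 0;
    for (std::size_t j = 0; j < pos.size(); ++j) {
        if (j == self) continue;
        if (dist(pos[j], pos[self]) < p.radius) {
            centre = add(centre, pos[j]);
            separation = sub(separation, sub(pos[j], pos[self]));
            avgVel = add(avgVel, vel[j]);
            ++neighbours;
        }
    }
    if (neighbours == 0) return {0, 0, 0};  // no neighbours: velocity unchanged
    float inv = 1.0f / neighbours;
    Vec3 cohesion = scale(sub(scale(centre, inv), pos[self]), p.cohesionScale);
    Vec3 alignment = scale(sub(scale(avgVel, inv), vel[self]), p.alignmentScale);
    return add(add(cohesion, scale(separation, p.separationScale)), alignment);
}
```

The caller would add the returned change to the boid's velocity and then integrate its position.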
Scattered Grid
In comparison, the scattered grid implementation uses a uniform grid data structure, so each boid only checks boids in nearby grid cells instead of the entire flock. This optimisation increases the FPS by ~5166%.
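Below is a host-side C++ sketch of how a uniform grid can be built each frame. The steps are: compute each boid's cell index, sort the boid indices by cell, and record where each cell's run of boids starts and ends. The names `Grid`, `GridBuffers` and `buildUniformGrid` are hypothetical, and this is a common construction rather than a copy of the project's code. Each loop marked below would be one CUDA thread per element in a GPU version.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical grid description (names are illustrative, not from the project).
struct Grid {
    Vec3 minCorner;   // lowest corner of the simulated volume
    float cellWidth;  // edge length of one cubic cell
    int resolution;   // number of cells along each axis
};

// Flatten a 3D cell coordinate into a 1D index, with x varying fastest.
int flattenCell(int x, int y, int z, int res) {
    return x + y * res + z * res * res;
}

// Convert one coordinate to a cell coordinate, clamped to the grid bounds.
int toCell(float v, float minV, const Grid& g) {
    int c = static_cast<int>(std::floor((v - minV) / g.cellWidth));
    return std::min(std::max(c, 0), g.resolution - 1);
}

struct GridBuffers {
    std::vector<int> sortedBoidIndices;  // boid indices ordered by cell index
    std::vector<int> cellStart;          // first slot in sortedBoidIndices per cell, -1 if empty
    std::vector<int> cellEnd;            // last slot (inclusive) per cell, -1 if empty
};

GridBuffers buildUniformGrid(const std::vector<Vec3>& pos, const Grid& g) {
    const int n = static_cast<int>(pos.size());
    std::vector<int> cellOf(n);
    for (int i = 0; i < n; ++i) {  // one CUDA thread per boid in a GPU version
        cellOf[i] = flattenCell(toCell(pos[i].x, g.minCorner.x, g),
                                toCell(pos[i].y, g.minCorner.y, g),
                                toCell(pos[i].z, g.minCorner.z, g), g.resolution);
    }

    GridBuffers out;
    out.sortedBoidIndices.resize(n);
    std::iota(out.sortedBoidIndices.begin(), out.sortedBoidIndices.end(), 0);
    // Key-value sort by cell index; on the GPU this is typically a parallel sort
    // such as thrust::sort_by_key.
    std::stable_sort(out.sortedBoidIndices.begin(), out.sortedBoidIndices.end(),
                     [&](int a, int b) { return cellOf[a] < cellOf[b]; });

    const int numCells = g.resolution * g.resolution * g.resolution;
    out.cellStart.assign(numCells, -1);
    out.cellEnd.assign(numCells, -1);
    for (int k = 0; k < n; ++k) {  // mark where each cell's run begins and ends
        int c = cellOf[out.sortedBoidIndices[k]];
        if (k == 0 || cellOf[out.sortedBoidIndices[k - 1]] != c) out.cellStart[c] = k;
        if (k == n - 1 || cellOf[out.sortedBoidIndices[k + 1]] != c) out.cellEnd[c] = k;
    }
    return out;
}
```

During the neighbour search, a boid then visits only the cells its neighbourhood radius overlaps. With a cell width of twice the radius that is at most 8 cells; with a cell width equal to the radius it is at most 27 cells. For each visited cell, the boid reads slots `cellStart[c]` through `cellEnd[c]`.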
Coherent Grid
The coherent grid implementation optimises the algorithm further by sorting the position and velocity buffers by grid cell. Boids in the same cell then sit next to each other in memory, which gives more efficient memory access patterns. This increases the FPS by a further ~3.63% over the scattered grid implementation.
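To illustrate the difference in access patterns, here is a host-side C++ sketch. It assumes the sorted boid-index array from the grid step is available, and the function names are hypothetical. The sketch gathers positions and velocities into cell-sorted order, then compares a scattered read (through the index array) with a coherent read (directly from the reordered buffer).

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Gather pos/vel into cell-sorted order so boids sharing a cell are adjacent.
// sortedBoidIndices[k] is the original index of the boid that belongs in slot k.
void reshuffleCoherent(const std::vector<int>& sortedBoidIndices,
                       const std::vector<Vec3>& pos, const std::vector<Vec3>& vel,
                       std::vector<Vec3>& posCoherent, std::vector<Vec3>& velCoherent) {
    const std::size_t n = sortedBoidIndices.size();
    posCoherent.resize(n);
    velCoherent.resize(n);
    for (std::size_t k = 0; k < n; ++k) {  // one CUDA thread per slot in a GPU version
        const int src = sortedBoidIndices[k];
        posCoherent[k] = pos[src];
        velCoherent[k] = vel[src];
    }
}

// Scattered access: an extra indirection, and pos[] reads jump around memory.
Vec3 sumCellScattered(int start, int end, const std::vector<int>& sortedBoidIndices,
                      const std::vector<Vec3>& pos) {
    Vec3 s{0, 0, 0};
    for (int k = start; k <= end; ++k) {
        const Vec3& p = pos[sortedBoidIndices[k]];
        s.x += p.x; s.y += p.y; s.z += p.z;
    }
    return s;
}

// Coherent access: consecutive slots read consecutive memory.
Vec3 sumCellCoherent(int start, int end, const std::vector<Vec3>& posCoherent) {
    Vec3 s{0, 0, 0};
    for (int k = start; k <= end; ++k) {
        const Vec3& p = posCoherent[k];
        s.x += p.x; s.y += p.y; s.z += p.z;
    }
    return s;
}
```

Both reads return the same result. The coherent version costs one extra gather pass per frame in exchange for contiguous, better-coalesced reads in the neighbour search.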
Coherent Grid with Shared Memory
Finally, using shared memory for each individual grid cell increases the FPS by another ~3.86% over the coherent grid implementation.
View the Source Code
The source code is available in my GitHub repository, whose README.md also includes a performance analysis and additional details.