vercidium

Raytraced Audio - No More Voxels

Since I posted my Raytraced Audio video back in April, nearly everything has changed:

The video below showcases a few of these features. I plan to add more wildlife and sound effects to this forest sandbox and release it as a demo for you to try yourself. Stay tuned!

In this post we'll talk about the first major change: voxels.

No More Voxels

The original raytraced audio system was built for Sector's Edge, which had a voxel environment. Casting rays through voxel worlds is incredibly fast, which meant thousands of rays could be cast in real time, even on older laptops.

So when I extracted the raytracing system out of Sector's Edge, it made sense to keep all the original voxel code. It already ran great; all I needed to do was convert non-voxel worlds to voxels.

I spent a couple months on this and created a system that could convert meshes and low poly shapes into voxels in real time on background threads. But I always had the thought in the back of my mind of "how will this work for large outdoor environments?".

There were two issues with the voxel approach

So I started creating a new system that casts rays against low poly 3D primitives (triangles, spheres, prisms, etc.) without converting them to voxels. This has a few benefits:

At first this new system was about 6x slower than voxel raymarching, but after applying the optimisations below, it ended up 8x faster.

Optimisation 1: Bounding Volume Hierarchies

Originally, when casting a ray I was checking if it intersected with every object in the world, which becomes very slow when more objects are added. To alleviate this, I organised the objects into a Bounding Volume Hierarchy (BVH).

This data structure groups objects by their 3D position, so that each ray is only tested against nearby objects, rather than every object in the scene.

There are many articles that explain BVHs already, so the quick explanation is that nearby objects are grouped together within a box. If a ray doesn't intersect with the box, it's guaranteed not to intersect with any object inside it. This reduces the number of objects a ray needs to be tested against.
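As a rough illustration of the box-rejection idea (not the code used in this system), here is a minimal Python sketch. The `Node` class, the `ray_hits_box` slab test and the two-object scene are all hypothetical names made up for this example, and a real BVH would also be built and balanced automatically rather than by hand.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    lo: tuple                                       # min corner of the bounding box
    hi: tuple                                       # max corner of the bounding box
    children: list = field(default_factory=list)
    objects: list = field(default_factory=list)     # hypothetical object names stored at leaves

def ray_hits_box(origin, direction, lo, hi):
    """Slab test: does the ray origin + t * direction (t >= 0) cross the box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this pair of faces: it must start between them.
            if o < a or o > b:
                return False
            continue
        t0, t1 = (a - o) / d, (b - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def candidates(node, origin, direction):
    """Collect only the objects whose enclosing boxes the ray crosses."""
    if not ray_hits_box(origin, direction, node.lo, node.hi):
        return []                                   # skip everything inside this box
    found = list(node.objects)
    for child in node.children:
        found += candidates(child, origin, direction)
    return found

# Hand-built scene: two leaf boxes far apart, wrapped by one root box.
left = Node((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), objects=["rock"])
right = Node((10.0, 0.0, 0.0), (11.0, 1.0, 1.0), objects=["tree"])
root = Node((0.0, 0.0, 0.0), (11.0, 1.0, 1.0), children=[left, right])
```

A ray fired upwards through the right-hand box only returns `"tree"` as a candidate, so the exact ray-primitive test never has to run against `"rock"`.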

For most scenes, performance was on par with voxel raymarching. But when objects were more spread out, the voxel approach was still about 3x faster.

Optimisation 2: Trails

This next optimisation completely left voxel raymarching in the dust. It significantly reduces the number of rays that need to be cast, and can also scale onto compute shaders.

Let's say our world has 30 sounds and 1 listener, and we're casting 1000 rays with 8 bounces each. This means we need to cast 248,000 rays total, to calculate reverb properties and figure out how occluded each sound should be. This is far too many rays for the average CPU to cast in real time.
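One breakdown that reproduces the 248,000 figure is sketched below. It is an assumption about how the total is counted: it includes the reflection rays plus one check per bounce per sound, and leaves out listener checks and permeation rays.

```python
rays = 1000      # rays cast from the listener
bounces = 8      # bounces per ray
voices = 30      # sounds in the world

bounce_rays = rays * bounces              # reflection rays
voice_los_rays = bounce_rays * voices     # one check from every bounce to every sound
total = bounce_rays + voice_los_rays

print(bounce_rays, voice_los_rays, total)
```

Almost all of the cost comes from the per-sound checks, which is why caching them pays off so heavily.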

This new trail-based system caches rays between frames and reuses them, which reduces the number of rays that need to be cast each frame by 85%.

With all optimisations applied, the new raytracer is about 8x faster than voxel raymarching.

Let's dive into how it works. Each time a ray bounces, it checks for line-of-sight to the listener and each voice. It also casts a permeation ray to each voice on the 1st bounce only (configurable).

The naïve approach casts all of these rays every frame, but this is wasteful.

If we look at the full path the ray took, we can break it into four parts:

This green trail will form the foundation of all our raytracing data and optimisations.

Trail Building

The new system creates a trail for each ray, which stores the position of every bounce, the object it bounced off, and how much energy the ray lost (depending on the object's material).
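A trail like this could be represented along the following lines. This is only a sketch: the `Hit` and `Trail` names are hypothetical, and treating energy loss as a fraction that multiplies down the trail is an assumption, not something this post specifies.

```python
from dataclasses import dataclass, field

@dataclass
class Hit:
    position: tuple       # where the ray bounced
    object_id: int        # which object it bounced off
    energy_lost: float    # assumed: fraction of energy absorbed by the material, 0..1

@dataclass
class Trail:
    origin: tuple                                # where the first ray was cast from
    hits: list = field(default_factory=list)     # one Hit per bounce, in order

    def remaining_energy(self):
        """Energy left after every bounce, assuming losses multiply together."""
        energy = 1.0
        for hit in self.hits:
            energy *= 1.0 - hit.energy_lost
        return energy

trail = Trail(origin=(0.0, 0.0, 0.0))
trail.hits.append(Hit((4.0, 0.0, 0.0), object_id=7, energy_lost=0.5))
trail.hits.append(Hit((4.0, 3.0, 0.0), object_id=9, energy_lost=0.5))
```

Storing the object on each hit is what makes the later invalidation checks cheap, because a moved object can be matched against the trail without casting anything.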

Once this trail has been constructed, it can be reused until one of the following events occurs:

Event A - Reflection Updates

Let's say the sphere on hit 6 moves up and down. The first 5 hits in the trail remain unaffected, but the rest of the trail needs to be updated:

We don't need to cast any rays to determine which objects have moved. We can simply set a 'dirty' flag on an object when it moves, and then loop over each hit in the trail to check if the object is dirty. If so, we'll 'trim' the rest of the trail, and cast rays 7 and 8 again.
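The dirty-flag scan can be sketched in a few lines. The function names are hypothetical, and this version drops everything from the first dirty hit onwards; whether the dirty hit itself is kept or re-intersected is a detail this sketch simply assumes.

```python
def first_dirty_hit(hit_object_ids, dirty_ids):
    """Index of the first hit whose object has moved, or None if the trail is clean."""
    for index, object_id in enumerate(hit_object_ids):
        if object_id in dirty_ids:
            return index
    return None

def trim(hit_object_ids, dirty_ids):
    """Return (hits kept, number of hits dropped) without casting any rays."""
    index = first_dirty_hit(hit_object_ids, dirty_ids)
    if index is None:
        return hit_object_ids, 0
    return hit_object_ids[:index], len(hit_object_ids) - index

# Eight hits; the object with id 99 (hit 6) has been flagged as moved.
kept, dropped = trim([10, 11, 12, 13, 14, 99, 15, 16], dirty_ids={99})
```

Here the first five hits survive and only the tail of the trail needs new rays.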

Event B - Intersection Updates

If a new object is added, or an existing object moves, we need to check if it intersects with any part of the trail.

We only need to check against objects that are moving - not the entire BVH. That's because the trail has already been constructed by raytracing against the entire BVH. This makes it very quick to test for new intersections.

If a new intersection occurs, we'll trim the trail from that point onwards, and then rebuild the trail. Note that rebuilding the rest of the trail requires intersecting rays against the entire BVH, because they are now reflected in different directions.
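To illustrate why this test is cheap, here is a sketch that checks each trail segment against a short list of moving spheres only. Using spheres, and the `segment_hits_sphere` and `first_blocked_segment` names, are assumptions for the example; other primitives would need their own segment test.

```python
def segment_hits_sphere(a, b, centre, radius):
    """True if the segment a -> b passes within `radius` of `centre`."""
    ab = [q - p for p, q in zip(a, b)]
    ac = [c - p for p, c in zip(a, centre)]
    length_sq = sum(x * x for x in ab)
    if length_sq == 0.0:
        t = 0.0
    else:
        # Closest point on the segment, clamped so it stays between a and b.
        t = sum(x * y for x, y in zip(ab, ac)) / length_sq
        t = max(0.0, min(1.0, t))
    closest = [p + t * x for p, x in zip(a, ab)]
    dist_sq = sum((c - q) ** 2 for c, q in zip(centre, closest))
    return dist_sq <= radius * radius

def first_blocked_segment(points, moving_spheres):
    """Index of the first trail segment crossed by a moving sphere, else None."""
    for i in range(len(points) - 1):
        for centre, radius in moving_spheres:
            if segment_hits_sphere(points[i], points[i + 1], centre, radius):
                return i
    return None

trail_points = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 10.0, 0.0)]
blocked = first_blocked_segment(trail_points, [((10.0, 5.0, 0.0), 1.0)])
```

The loop only touches the moving objects, so its cost grows with how much is moving rather than with the size of the scene.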

Event C - Listener Updates

If the listener moves, we need to check if we still have line-of-sight between the listener and the 1st hit in the trail. If not, or if the listener has moved a significant distance away from the original raycast position, we must recalculate the whole trail.

But if the listener has only moved a little bit, the trails are still valid and can be re-used. Note the reflection angles remain stable even though the listener is moving:
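The reuse decision could look something like the sketch below. The `max_drift` threshold is a hypothetical tuning value (this post does not say what counts as a significant distance), and the line-of-sight result is passed in rather than raytraced here.

```python
import math

def trail_still_valid(listener, cast_position, max_drift, has_los_to_first_hit):
    """Decide whether a cached trail can be reused after the listener moves."""
    if not has_los_to_first_hit:
        return False                       # first hit is no longer visible: rebuild
    # Reuse only while the listener stays close to where the trail was cast from.
    return math.dist(listener, cast_position) <= max_drift

small_step = trail_still_valid((0.5, 0.0, 0.0), (0.0, 0.0, 0.0), 1.0, True)
big_step = trail_still_valid((3.0, 4.0, 0.0), (0.0, 0.0, 0.0), 1.0, True)
```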

Voice Occlusion

Now that we have a complete trail, we can perform line-of-sight (LOS, yellow rays) checks by casting rays from each point in the trail, to each voice in the scene:

Similar to trail trimming, these LOS checks are also cached and stored in the trail. To confirm that a cached check is still valid, we'll take a different approach depending on whether it passed or failed.

To validate a successful LOS check, we only need to check if any new/dirty objects intersect with the yellow ray. If not, it's still guaranteed to have line-of-sight with the voice.

To validate a failed LOS check (red line), we'll keep track of the primitive that blocked our line-of-sight:

If the voice itself moves:

We can perform these simple validation checks because LOS is either a pass or a fail, so we only need to check if one object is breaking line-of-sight.
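Put together, revalidating one cached check might look like the sketch below. It assumes neither the trail point nor the voice has moved, the `dirty_object_blocks_ray` callback is a hypothetical stand-in for a real ray-primitive test, and the rule that a failed check stays failed while its recorded blocker is untouched is an assumption drawn from the pass/fail reasoning above.

```python
def revalidate_los(cached_visible, blocker_id, dirty_ids, dirty_object_blocks_ray):
    """Return 'visible', 'blocked' or 'recast' for one cached LOS check."""
    if cached_visible:
        # A passing check can only be broken by something new or moved.
        for object_id in dirty_ids:
            if dirty_object_blocks_ray(object_id):
                return "blocked"
        return "visible"
    # Assumed: a failing check still fails while its recorded blocker has not moved.
    if blocker_id not in dirty_ids:
        return "blocked"
    # The blocker moved, so the cached answer can no longer be trusted.
    return "recast"

still_clear = revalidate_los(True, None, {1, 2}, lambda object_id: False)
newly_blocked = revalidate_los(True, None, {1, 2}, lambda object_id: object_id == 2)
blocker_moved = revalidate_los(False, 7, {7}, lambda object_id: False)
```

In the common case where nothing relevant moved, both branches finish without testing against the full BVH.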

Plane Checks

When a ray bounces off a surface, we can create a fake plane that aligns to the normal of the surface (blue dotted line):

Because the reflected ray will be bouncing away from this dotted line, it's guaranteed that it won't intersect with anything on the other side (blue area) of the plane. This means:

This effectively cuts the scene in half on each bounce, and reduces the number of ray-intersection tests we need to perform.
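For a sphere, the plane check comes down to one dot product. The sketch below assumes the surface normal is unit length and points towards the side the reflected ray travels into; `behind_plane` is a hypothetical name.

```python
def behind_plane(centre, radius, hit_point, normal):
    """True if a sphere lies entirely on the far side of the plane through hit_point."""
    # Signed distance from the plane to the sphere centre, along the normal.
    signed_distance = sum((c - p) * n for c, p, n in zip(centre, hit_point, normal))
    return signed_distance < -radius

hit_point = (0.0, 0.0, 0.0)
normal = (0.0, 1.0, 0.0)                    # reflected ray heads into +y

fully_behind = behind_plane((0.0, -5.0, 0.0), 1.0, hit_point, normal)
straddling = behind_plane((0.0, -0.5, 0.0), 1.0, hit_point, normal)
```

A sphere that straddles the plane still has to be tested, so only objects that are completely behind it can be skipped.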

LOS Compute

This raytracing system is now split into two stages: trail building, and line-of-sight checks. Trail building contains the complex bouncing logic and will likely always run on the CPU.

The LOS checks, on the other hand, can theoretically be organised into an array of start + end positions and loaded onto a compute shader. Then all of these LOS rays can be fired on the GPU in parallel.
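The CPU-side packing step could be as simple as the sketch below. The six-floats-per-ray layout and the `pack_los_segments` name are assumptions for illustration, and the actual upload to a compute shader is not shown.

```python
from array import array

FLOATS_PER_RAY = 6   # assumed layout: start x, y, z followed by end x, y, z

def pack_los_segments(trail_points, voice_positions):
    """Flatten every (trail point -> voice) check into one contiguous float buffer."""
    buffer = array("f")                      # 32-bit floats, ready to copy to the GPU
    for point in trail_points:
        for voice in voice_positions:
            buffer.extend(point)             # ray start
            buffer.extend(voice)             # ray end
    return buffer

points = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
voices = [(1.0, 2.0, 3.0), (5.0, 5.0, 5.0), (9.0, 0.0, 1.0)]
packed = pack_los_segments(points, voices)
ray_count = len(packed) // FLOATS_PER_RAY
```

Because every ray is independent, each GPU thread could read its own six floats and write back a single pass/fail result.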

This feature won't be in the first C# SDK release, but I'm planning to work on it soon. I'm also unsure if the latency between the CPU and GPU will affect this; it may be faster to keep it on the CPU.

Reduced Playback Latency

All trails are stored in memory and refreshed in real-time to ensure they are always up to date. This means when a new sound is played, we can instantly perform LOS checks from every position in every trail. This reduces the amount of work needed to initialise new sounds, and reduces the delay before they can play.

Summary

When all of the above optimisations are combined, a huge number of rays can be cached. In the forest demo at the top of this post, it used to take 3.5ms to cast all rays with voxel raymarching.

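With the new system, when the camera is moving around, 85% of rays are cached and it takes 0.4ms to refresh all trails + LOS checks. When the camera is still, 100% of rays are cached and it takes 0.06ms to refresh. Compared to the 3.5ms that voxel raymarching took, this means the new system is roughly 8x faster while the camera is moving and roughly 58x faster while it is still.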

Next Post

The addition of materials has opened up many opportunities for increasing the realism of this raytraced audio system. In the next post I'll talk about how I used materials to implement a new energy-based reverb, occlusion and permeation system.

This new system produces nearly identical results with 1024 vs 32 trails, which means the minimum required hardware can potentially be reduced significantly.

Thanks for reading!