Hey folks, welcome to the digest.
Gaussian Splatting is a technique for representing real-world 3D scenes as a cloud of translucent, colored 3D Gaussians, something like a fuzzy point cloud.
The result is similar to a point cloud generated from a LiDAR scan, but with fewer artifacts. This is remarkable given that unlike a LiDAR scan, it doesn’t require depth information, and unlike NeRFs, it doesn’t use a deep neural network.
Two browser-based gaussian splat viewers have popped up recently, one doing all the rendering client-side and the other using pixel streaming.
Here’s a WebGL gaussian splat viewer from Kevin Kwok. Its README contains a great discussion of the rendering technique.
Rendering gaussian splats is trickier than rendering a typical point cloud.
In a typical point cloud, the points are opaque. When rendering opaque things, GPUs use a depth buffer: for each pixel, they keep track of both its color and the distance (depth) of the nearest geometry drawn so far. To draw each point, they calculate which pixel it affects, then check whether a nearer point has already written to that pixel.
This gives a correct result regardless of the order of the points.
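In WebGL terms, that depth trick is just the depth test. Here's a minimal sketch, assuming a `gl` context and a `drawPoints` helper (hypothetical) that issues the actual draw call:

```ts
// Minimal sketch: depth-tested rendering of opaque points in WebGL.
// Assumes `gl` is a WebGL2RenderingContext and `drawPoints` issues the draw call.
function renderOpaquePoints(gl: WebGL2RenderingContext, drawPoints: () => void) {
  gl.enable(gl.DEPTH_TEST); // keep a per-pixel depth value
  gl.depthFunc(gl.LESS);    // a fragment wins only if it is nearer
  gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
  drawPoints();             // draw order no longer matters
}
```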
With gaussian splats, this doesn’t work, because the points are translucent. To render them correctly, they need to be processed back-to-front. Since the order depends on the camera position, this means the points need to be re-ordered whenever the camera moves.
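Here's a rough sketch of what that re-ordering step looks like (not Kevin's actual code): compute each splat's view-space depth and sort the indices so the farthest splats come first. It assumes a column-major view matrix and an OpenGL-style camera looking down -z:

```ts
// Hedged sketch of the back-to-front ordering step.
// positions: xyz triples, one per splat; view: column-major 4x4 view matrix.
function sortBackToFront(positions: Float32Array, view: Float32Array): Uint32Array {
  const n = positions.length / 3;
  const depth = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    const x = positions[3 * i], y = positions[3 * i + 1], z = positions[3 * i + 2];
    // view-space z: the camera looks down -z, so more negative means farther away
    depth[i] = view[2] * x + view[6] * y + view[10] * z + view[14];
  }
  const order = new Uint32Array(n);
  for (let i = 0; i < n; i++) order[i] = i;
  // ascending view-space z puts the farthest splats first
  order.sort((a, b) => depth[a] - depth[b]);
  return order;
}
```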
Kevin points out that most Gaussian Splatting renderers do this sort directly on the GPU using a bitonic sort.
To target browsers that do not yet support compute shaders (via WebGPU), Kevin instead opts to sort the points on the CPU. Since this is slow (~4 fps), sorting happens outside of the main render loop, in a background thread via a Web Worker.
To keep rendering at a smooth 60fps, the GPU side doesn’t wait for the new point order, and instead uses the last available order. This means that most frames are actually rendered with a stale point order. Kevin takes advantage of the fact that unless the camera has moved a lot, this stale point order will still produce an image reasonably close to the “correct” one.
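The decoupling might look something like the sketch below. The names (`sort-worker.js`, `onCameraMove`, `drawSplats`) are hypothetical, and the real viewer's structure differs, but the shape of the idea is:

```ts
// Hedged sketch: render every frame with the last available order,
// while a worker re-sorts in the background.
const sortWorker = new Worker("sort-worker.js"); // hypothetical worker script
let latestOrder: Uint32Array | null = null;      // last order the worker produced
let sortInFlight = false;

sortWorker.onmessage = (e: MessageEvent<{ order: Uint32Array }>) => {
  latestOrder = e.data.order; // picked up by the GPU on the next frame
  sortInFlight = false;
};

function onCameraMove(view: Float32Array) {
  if (!sortInFlight) {        // drop requests rather than queue them
    sortInFlight = true;
    sortWorker.postMessage({ view });
  }
}

function renderFrame(drawSplats: (order: Uint32Array) => void) {
  // render at 60fps with whatever order we have, even if it is stale
  if (latestOrder) drawSplats(latestOrder);
  requestAnimationFrame(() => renderFrame(drawSplats));
}
```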
You can see artifacts of this approach if you swing the camera around wildly, but even then it self-corrects within a few hundred milliseconds, once the CPU finishes re-sorting the points.
Dylan Ebert of Hugging Face has built another browser-based viewer (repo) that uses pixel streaming over WebRTC instead of running entirely in the browser.
Dylan wrote a short twitter thread about the technique, which takes advantage of on-GPU video encoding. This tracks with our own experiments with remote rendering at Drifting, where we found that CPU-side encoding was often a bigger source of latency than rendering or the network.
Dylan’s explanation of gaussian splatting is also a great two-minute introduction to the technique.
Here’s a SyntaxFM episode with Andrew Lisowski about migrating Descript’s Electron app into the browser.
Some notes I took:
It’s a good episode, give it a listen.