NanoGL: a crafted WebGL(2) microframework
Here at makemepulse we’re often tasked to produce complex and outstanding pieces in WebGL running smoothly on a variety of devices. WebGL, WebVR, AR, you name it… We always try to push ourselves and the browsers/devices to give our users the best experience possible. We’re also often asked how we do it. So today, we’d like to talk a bit about some of the tricks we use, including a framework we have developed specifically, to help us build WebGL productions.
Why
We’ve been using Three.js to make 3D for a long time, but as a very high-level library, many things are hidden from the developer. Sometimes if you want some specific behavior/effects, or more control over the renderer, it can be very tricky to get what you want and keep performance high. Also, we often don’t really want to load a whole engine to display a few 3D objects on the screen, which is the main reason we decided to drop it and build our own microframework, NanoGL: a bunch of lightweight helpers wrapping the WebGL/WebGL2 API, maintained by our very own @_pil_.
Disclaimer: don’t get us wrong, this is not a comparison, nor a “you might not need Three.js” piece. Three.js and other engines are great tools and we couldn’t be more thankful for them!
To us, generally speaking, to make complex things run fast there’s no shortcut: you have to know what’s going on under the hood. So, with a determination to write low-level code with performance front-of-mind, we built an ecosystem of helpers in NanoGL.
Jump to the npm repository to see all the NanoGL modules here.
After our recent projects, we’ve been able to clearly articulate the benefits of NanoGL:
- Light codebase
- Low memory footprint
- Huge control/flexibility thanks to its low-level philosophy (within the limits of what WebGL can offer)
- Easy to load and deal with complex 3D scenes
- Same codebase for desktop, mobile and VR, with zero or only a few platform-specific optimisations
- Easier to debug
- Huge performance boost
Caveat: we use the word ‘crafted’ in the title of this article because each project is different and requires specific code; it’s important to note we’re not trying to create an engine that will handle every case in the world.
The Core: gotta go faster
At the centre of our microframework, you’ll find four essential components we recycle within many of our more ‘high-level’ components:
If you’ve ever tried to write an entire app with the raw WebGL API, you know it’s a damn pain. The idea with these components was to make WebGL/WebGL2 code less verbose: a friendly API for the team.
ArrayBuffer / IndexBuffer
This is basically the geometry. The key goals were to manage attributes easily and, while rendering, bind them seamlessly:
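Here’s a rough sketch of what using them looks like (method names follow the nanogl modules as we remember them; check each module’s README for the exact API):

```js
import ArrayBuffer from 'nanogl/arraybuffer';
import IndexBuffer from 'nanogl/indexbuffer';

// interleaved position (x, y) + uv (u, v) for a quad
const vertices = new Float32Array([
  -1, -1, 0, 0,
   1, -1, 1, 0,
   1,  1, 1, 1,
  -1,  1, 0, 1,
]);
const indices = new Uint16Array([0, 1, 2, 0, 2, 3]);

// wrap the raw GL buffers and declare the attribute layout once
const vbuffer = new ArrayBuffer(gl, vertices);
vbuffer.attrib('aPosition', 2, gl.FLOAT);
vbuffer.attrib('aTexCoord', 2, gl.FLOAT);

const ibuffer = new IndexBuffer(gl, gl.UNSIGNED_SHORT, indices);

// at render time: bind the attributes to a compiled Program and draw
vbuffer.attribPointer(prg); // prg is a nanogl Program (see next section)
ibuffer.bind();
ibuffer.drawTriangles();
```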
Program
We have a geometry now and want to display it. Here comes the program, the heart of any material. The aim of this component is to access and manage your shader and its uniforms easily:
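A minimal sketch of the Program component in use (shader code trimmed down; the setter style is the point, see the bonus note below):

```js
import Program from 'nanogl/program';

const vert = `
  attribute vec2 aPosition;
  uniform mat4 uMVP;
  void main() {
    gl_Position = uMVP * vec4(aPosition, 0.0, 1.0);
  }
`;

const frag = `
  precision mediump float;
  uniform vec3 uColor;
  void main() {
    gl_FragColor = vec4(uColor, 1.0);
  }
`;

// compile + link in one go
const prg = new Program(gl, vert, frag);

// at render time
prg.use();
prg.uMVP(mvpMatrix);       // mvpMatrix: a Float32Array(16), e.g. from gl-matrix
prg.uColor(1.0, 0.4, 0.0); // uniform setters are generated from the shader source
```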
Bonus: Program automatically detects which uniforms are used and their types.
Texture
As with a lot of stuff in GL programming, textures force you to deal with the GL context, and it always feels more comfortable to work with a “texture” object instead:
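Something like this (again, a sketch; exact helper names may differ slightly):

```js
import Texture from 'nanogl/texture';

const tex = new Texture(gl, gl.RGB);

// upload an HTMLImageElement once it has loaded
const img = new Image();
img.onload = () => {
  tex.fromImage(img);
  tex.clamp(); // wrap mode helper, no gl.texParameteri boilerplate
};
img.src = 'textures/diffuse.jpg';

// at render time: bind to a unit and feed the sampler uniform
tex.bind(0);
prg.tDiffuse(0); // sampler uniform set to texture unit 0
```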
Framebuffer Object (FBO)
FBOs are probably one of the most irritating things to deal with in GL programming. At the end of the day, you just want to bind one, draw on it and get the result. It was with this in mind that we created the FBO component:
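Here’s the kind of code it allows (creation options and property names are from memory, so treat this as a sketch):

```js
import Fbo from 'nanogl/fbo';

// an offscreen render target with a depth attachment
const fbo = new Fbo(gl, { depth: true });
fbo.resize(512, 512);

// draw the scene into it
fbo.bind();
gl.viewport(0, 0, 512, 512);
gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
// ... render here ...

// back to the screen, reusing the FBO's colour texture as input
gl.bindFramebuffer(gl.FRAMEBUFFER, null);
fbo.color.bind(0); // fbo.color is a nanogl Texture
```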
These four components are the smallest building blocks, the core, and with just this in place we can already start to sketch some good stuff! Here’s an example (made with only those components, plus nanogl-node and nanogl-camera):
However, if you need to do more advanced tasks you still have to write plain WebGL (instancing, transform feedback, etc.)…
The Rest: take it higher
On top of the building blocks we implemented some “high-level”(ish) features in order to provide an even smoother experience to the user:
- Fast models loading
- Using GPU-ready textures
- Ghost loading assets
- LOD dithering
- GLState
Fast models loading
At some point you’ll need to load heavy, complex models, and when that moment comes it’s important to make it fast and limit loading times (thinking about memory consumption here doesn’t hurt either).
Last time, we talked about the AWD binary format, which keeps various information about your mesh (node tree, materials, bounds, animations…), and since then we’ve looked at the glTF format (version 1.0) and Draco.
While the glTF format is very similar to the AWD format, we found Draco very interesting, so we chose to go even further with AWD and developed an evolved version: AWD-Draco.
On average, file sizes are at least 50% smaller:
Try it here and let us know your results.
Mind. Blown. *boom effect*
Suddenly we can deliver 3D models to the web faster than ever. However, it requires more CPU work to decode than the other versions. The side effect is that if you load an asset from the browser’s cache, AWD-Draco takes longer to display. You can address this issue by loading a plain AWD version when the visitor isn’t loading your content for the first time. This side effect applies to mobile as well.
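A simple way to implement that fallback is a first-visit flag, for instance in localStorage (the file names and the parseAwd helper below are hypothetical):

```js
// First-time visitors get the smaller AWD-Draco file; returning visitors
// likely hit the HTTP cache, where the plain AWD decodes faster.
function modelUrl(name) {
  const returning = localStorage.getItem('visited') === '1';
  localStorage.setItem('visited', '1');
  return returning
    ? `models/${name}.awd`        // bigger file, cheap to decode
    : `models/${name}.draco.awd`; // smaller file, heavier CPU decode
}

fetch(modelUrl('character'))
  .then((res) => res.arrayBuffer())
  .then((buffer) => parseAwd(buffer)); // parseAwd: your AWD/AWD-Draco parser
```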
Using “GPU-ready” textures
It may appear insignificant, but using JPG/PNG formats all over a complex textured 3D scene means performance drops.
Both formats, commonly used on the web, are not understood by the GPU and involve decompression work before uploading; depending on the file size, that can take more or less time. Also, it’s important to be aware of GPU VRAM. We won’t go into deep detail (there’s so much, it could actually be the subject of an article by itself…) but what you need to know is that using JPG/PNG will take more space in VRAM.
The browser will convert those formats into something that the GPU can digest. But as it has to do it fast, the data won’t be optimised… This leads to more texture data in the cache, and that leads to bad performance. That’s why GPU vendors created special formats that can be decompressed directly by the GPU.
We use:
- DXT1 format for desktop
- ETC1 format for Android devices
- PVRTC formats for iOS devices
- JPG/PNG if none of the formats above are supported
Those formats can be decompressed directly by the GPU, so the browser doesn’t have to do it. Straight to the point.
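In practice, we detect what the device supports through WebGL extensions and request the matching file. A minimal sketch (the extension names are standard WebGL ones, the file naming is ours):

```js
// Pick the best texture format the current GPU/driver can decode natively.
function pickTextureFormat(gl) {
  if (gl.getExtension('WEBGL_compressed_texture_s3tc')) return 'dxt';   // desktop
  if (gl.getExtension('WEBGL_compressed_texture_pvrtc') ||
      gl.getExtension('WEBKIT_WEBGL_compressed_texture_pvrtc')) return 'pvr'; // iOS
  if (gl.getExtension('WEBGL_compressed_texture_etc1')) return 'etc';   // Android
  return 'jpg'; // fall back to JPG/PNG
}

const ext = pickTextureFormat(gl);
const url = `textures/diffuse.${ext}`; // e.g. diffuse.dxt, diffuse.pvr, diffuse.etc
```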
Here’s a comparison of these formats versus JPG/PNG:
The differences in performance are huge. You can see that loading so many JPG textures actually blocks the main thread and the framerate drops…
Compress we say! Compress all the things!
On a side note, GPU-ready texture files may be bigger than the regular JPG/PNG. But once in VRAM, they take far less space: 6 times less, and up to 12!
Ghost loading assets
So, now you have GPU-ready textures, but they take longer to load over the network. Sad news indeed, but one way to deal with it is to “ghost load” the assets.
The idea is to load low-quality textures first so the user gets content to play with as fast as possible; from then on, you have time to load the proper textures without any hurry.
Once the HD textures are loaded, just replace the gl texture with the new HD data. Pretty basic principle, but oh, so effective:
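In code, the principle boils down to this (a sketch using the nanogl Texture from the core section; URLs are illustrative):

```js
import Texture from 'nanogl/texture';

function loadImage(src) {
  return new Promise((resolve, reject) => {
    const img = new Image();
    img.onload = () => resolve(img);
    img.onerror = reject;
    img.src = src;
  });
}

const tex = new Texture(gl, gl.RGB);

// 1. load a tiny placeholder first so the scene can start right away
loadImage('textures/diffuse_low.jpg')
  .then((img) => tex.fromImage(img))
  // 2. then ghost-load the HD version in the background, without blocking anything
  .then(() => loadImage('textures/diffuse_hd.jpg'))
  .then((hd) => tex.fromImage(hd)); // re-upload into the same GL texture
```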
It can be a bit tedious. So automating it in your workflow can be very helpful.
LOD
With complex, heavily loaded scenes it’s important to limit the number of vertices, but at some point, when you want detail, you have no choice but to go big. In those cases we use a level of detail (LOD) per mesh.
Via 3D software we generate two or three versions of each mesh:
- the full HD mesh
- a 50% version
- a 10% version
The idea is to render a given version depending on the distance between the camera and the mesh:
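A minimal selection function, with made-up breakpoints and a hypothetical mesh structure holding the three versions:

```js
import { vec3 } from 'gl-matrix';

// Pick a mesh variant from the camera-to-mesh distance.
// The breakpoints are arbitrary: tune them per scene.
function selectLod(mesh, cameraPosition) {
  const d = vec3.distance(cameraPosition, mesh.worldPosition);
  if (d < 10) return mesh.lodHigh;   // full HD mesh
  if (d < 40) return mesh.lodMedium; // 50% version
  return mesh.lodLow;                // 10% version
}

// in the render loop: draw whichever geometry was selected for this frame
const geometry = selectLod(mesh, cameraPosition);
drawMesh(geometry); // drawMesh: your usual draw routine
```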
There’s no real rule to set the ‘breakpoints’, so it’s up to you, but the performance can drastically improve with this technique.
In addition, we add a dithered transition between the two LODs to make the switch transparent to the user:
GLState
The nanogl-state module is also worth a mention. A state machine manager for the WebGL context, it’s designed to be lightning fast (which means fairly unreadable code).
Changing GL state and making API calls can lead to performance drops, so it’s often a good idea to do the bookkeeping work on the CPU side:
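The principle, stripped down to a few lines (this is the idea behind nanogl-state, not its actual API):

```js
// Keep a CPU-side copy of the GL state and only touch the context when
// something actually changes: redundant enable/disable calls are skipped.
class MiniState {
  constructor(gl) {
    this.gl = gl;
    this.blend = false;
    this.depthTest = false;
  }

  apply(cfg) {
    const gl = this.gl;
    if (cfg.blend !== this.blend) {
      cfg.blend ? gl.enable(gl.BLEND) : gl.disable(gl.BLEND);
      this.blend = cfg.blend;
    }
    if (cfg.depthTest !== this.depthTest) {
      cfg.depthTest ? gl.enable(gl.DEPTH_TEST) : gl.disable(gl.DEPTH_TEST);
      this.depthTest = cfg.depthTest;
    }
  }
}

const state = new MiniState(gl);
state.apply({ blend: false, depthTest: true }); // touches the context
state.apply({ blend: false, depthTest: true }); // no-op, nothing changed
```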
In addition, it’s also up to you to optimise your renderer and to group draw calls by material to avoid too much GL state switching.
Here’s an upgraded version of the first demo:
NanoGL PBR
Before
Previously we were on our own when it came to achieving realistic and accurate rendering.
We would tweak values to get satisfying results and store all lighting information in textures, and as soon as the environment and lighting changed it was a question of tweaking again, changing textures again… so painful :( With complex scenes, it can take a lot of time to adjust every single asset.
Today
Today, some smart people have come up with new models and equations to simulate real-life materials. These techniques are better known as ‘Physically Based Rendering’ (PBR). This shift is huge, mostly because it introduced a common vocabulary between artists and developers.
There’s no real standard, but many engines like Unreal Engine / Unity / Marmoset / Sketchfab etc… tend to converge on the same ideas.
https://www.marmoset.co/posts/basic-theory-of-physically-based-rendering/
We also have our very own version: nanogl-pbr.
The goal with this is to be able to produce kick-ass renders and at the same time, hide the complexity and be very modular. We iterate on it project after project for more and better realism. Naturally it became our pièce de résistance and we tend to use it for most things these days.
We built this component on the following ideas.
Inputs transmit data to the shader regardless of their nature: a texture, a constant, a vec3… The benefit here is modularity: you can pick whatever inputs you need and add them to the material. That way, we avoid a supercharged, non-performant uber shader: once the compile function is triggered, it generates a custom shader depending on the inputs, the lighting, etc.:
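To give a flavour of it, here’s roughly what attaching inputs looks like (property and method names are illustrative, not the exact nanogl-pbr API):

```js
// material: a nanogl-pbr material instance (creation omitted here).
// Each input can be fed by a texture, a constant, an attribute…
// only what is actually attached ends up in the generated shader.
material.iAlbedo.attachSampler(albedoTexture);   // texture input
material.iNormal.attachSampler(normalTexture);   // texture input
material.iRoughness.attachConstant(0.35);        // plain constant input

// Compiling walks the attached inputs and the lighting setup, then
// generates a shader containing only the code those inputs require.
material.compile();
```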
This module works best with our workflow, but have a look / play if you’re feeling adventurous!
Quick look at the different components of our PBR workflow:
Post processing
Generally speaking, in graphics, the most beautiful scenes are often built with screen-space effects: post-processing.
The principle is fairly simple: draw your scene to an FBO (seen previously in the article) and then use its texture (its colour) to perform per-pixel operations.
Here’s the big picture:
But there are performance drawbacks with this as soon as you have multiple effects (and multiple FBOs), especially on mobile. Therefore, a good practice is to merge, as much as possible, the effects in one pass:
And we created a helper, nanogl-post, to get back the flexibility lost when everything is merged:
Here’s some sample code:
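Roughly, a post chain is driven like this (the method names follow nanogl-post as we use it, but check the module’s README, they may differ):

```js
import Post from 'nanogl-post';
import Bloom from 'nanogl-post/effects/bloom';

const post = new Post(gl);
post.add(new Bloom(0.5, 2.0)); // bloom parameters are illustrative

function render(width, height) {
  // the helper manages its own FBO: bind it and draw the scene into it
  post.preRender(width, height);
  post.bindColor();

  renderScene(); // your usual scene rendering

  // then all enabled effects are merged into a single fullscreen pass
  post.render();
}
```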
And below is the demo, where you can compare performance with multiple passes versus merged passes:
Demo: burn your computer here
Demo or it didn’t happen
Sure! Inspired by famous material demos, we built our own:
Demo:
- LD (mobile only) / MD / HD / HD with bloom
If you fancy trying the demo with only JPG textures, there you go:
Conclusion
Creating stunning, almost photo-real content on the web is absolutely possible today, and you can do it in a way that offers the same premium experience on any screen or device. But you constantly have to think about the performance cost, and that’s hard (though not impossible) if you use libraries/helpers that you didn’t write.
Yes, it’s a lot of work to fully understand the pipeline and the API… but we truly believe investing that time will mean you get to reap big rewards. Ultimately, it’s this attitude that led us to develop NanoGL: a lightning-quick, uber-optimised tool we now use in 99% of our productions. The investment is worth it even if you eventually end up back with a more high-level library, because you’ll go back with superpowers 💪
So, there you have it: a pretty good idea of our microframework NanoGL (and maybe a few WebGL mysteries demystified). Whilst we can’t share all our secrets, we hope these tips are valuable. For more detail still, please have a read of our behind-the-scenes articles or reach out on Twitter. We are always happy to hear from you.