As I mentioned in a previous post, my wife plays this colour-by-numbers game that has a simple user experience but charges a high subscription price for content. That didn’t seem proportionate to what the game offers, and it seems the colour-by-numbers space in general suffers from price inflation. My theory is that they pay artists or moderators to produce the artwork. If that isn’t the case I don’t see how the price (£15-20 a month) could be justified.

I’m not an artist myself. I may have a little artistic flair, but I’m not skilled enough to produce original artwork to the required standard. Instead I will lean on algorithms and machine learning to source royalty-free images and transform them into colour-by-numbers canvases.

The first step for me is to build the application that lets a user colour by numbers. This will help me decide what format the source artwork needs to be in to render efficiently on a mobile device.

I knew I wanted the game to be cross-platform and easy to develop on a circa 2011 MacBook Air. That sounds like a task for a hybrid application!

Looking at the landscape of 2D web game engines, there isn’t really that much out there at this time. One clear favourite emerges though, and it’s pretty good: PixiJS.

Starting out with PixiJS is pretty straightforward. Quickly, though, you realise everything is left up to you. That’s a bit too much; I don’t want to create an entire game engine from scratch. Maybe it will turn out that way, but at this point Pixi gets the job done and there isn’t tonnes to learn before being able to build something.

The big elephant in the room is Phaser: built on top of PixiJS, it does come out of the box with much more to offer. However, I found Phaser awkward to use, and it doesn’t seem to have the bits that I would want to leverage. So what is it that I want to be able to do in the first place?

Well, the main point of the game is to have a canvas with a bunch-o-squares (let’s call them game cells) that you colour in by selecting a fill colour from a numbered palette and clicking/tapping on cells with the same number as the selected colour. Simple. However, the artwork is going to be too small for a user to accurately tap individual cells when the whole image is visible at once, so we need to be able to zoom. Once you can zoom you’ll probably also want to pan around to move to different parts of the artwork. So that describes a little more of the challenge.
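
The core rule is tiny - something like this (a hypothetical sketch, not the real data model):

// A cell only accepts the selected colour if its number matches the palette
// entry the player has picked. (Hypothetical data model.)
function tryFillCell(cell, selectedNumber, palette) {
  if (cell.filled || cell.number !== selectedNumber) return false;
  cell.colour = palette[selectedNumber];
  cell.filled = true;
  return true;
}

const palette = { 1: 0xff0000, 2: 0x00ff00 };
const cell = { number: 2, filled: false, colour: null };
tryFillCell(cell, 2, palette); // true - the cell is now filled with colour 2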

Even something as simple as this needs some thought about performance. But let’s not optimise before we know what we are dealing with.

My first attempt was to render a simple grid of pre-coloured cells. Being new to PixiJS and WebGL I wasn’t too sure about “the best way”. PixiJS excels at raster graphics, but I will need to generate those programmatically as cells can have many different and unknown colours (if you know PixiJS you are probably already thinking there is a simpler way). Since a cell is a bordered square I leveraged the Graphics object to draw a filled square in the chosen colour. Something like:

// Draw one bordered, filled cell with the Graphics API.
const graphics = new PIXI.Graphics();
graphics.lineStyle(1, 0x000000); // 1px black border around the cell
graphics.beginFill(colour);      // the cell's fill colour
graphics.drawRect(0, 0, width, height);
graphics.endFill();

This seemed to work well for my small initial render. Then I tried to scale it up.

I envisage artwork of perhaps 128x128 cells at most (that’s 16,384 cells to colour, which is probably more than enough). So I gradually increased the number of cells and watched the frame rate plummet.
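
For reference, the naive version was roughly this shape (a sketch assuming a PIXI v5-style setup; createCellGraphics and the sizes are illustrative, not the real code):

// Naive approach: one Graphics object per cell, all of them drawn every frame.
const app = new PIXI.Application({ width: 800, height: 600 });
document.body.appendChild(app.view);

const gridSize = 128; // 128x128 = 16384 cells
const cellSize = 10;

function createCellGraphics(colour) {
  const graphics = new PIXI.Graphics();
  graphics.lineStyle(1, 0x000000);
  graphics.beginFill(colour);
  graphics.drawRect(0, 0, cellSize, cellSize);
  graphics.endFill();
  return graphics;
}

for (let row = 0; row < gridSize; row++) {
  for (let col = 0; col < gridSize; col++) {
    const cell = createCellGraphics(0xffffff);
    cell.x = col * cellSize;
    cell.y = row * cellSize;
    app.stage.addChild(cell);
  }
}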

Running the profiler revealed that a lot of time was being spent drawing lines. That makes obvious sense when you think about how game rendering tends to work, but if you’re not used to it you don’t think about it that way. You may assume a static collection of unchanged cells shouldn’t need much work, but it does: every frame the cells are re-rendered from scratch, and that means redrawing all of those edges.

We can do better; that first approach was naive and there’s plenty of room for improvement.

Firstly, how do we avoid that expensive line drawing? As I said earlier, PixiJS excels at raster graphics, but by drawing lines and shapes we are not leveraging that. Instead we need to take our unchanging cell and turn it into a raster image. PIXI can help here as it lets you cache any display object as a bitmap. That’s not quite good enough for us though. If we did that for all our cells, each cell would essentially be its own texture and all of those textures would have to be sent to the GPU. As good as GPUs are, they can’t store 16,000 textures and access them all quickly.
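
In PIXI that caching is a one-liner (using the Graphics object from the snippet above):

graphics.cacheAsBitmap = true; // render the Graphics once to a texture and reuse it each frame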

What we do have, though, is lots of cells that look the same, so maybe we can create one texture and reuse it. That is certainly the case with the unfilled cells, but filled cells have many different colours. The worst case is a texture per colour, and we also need textures for the numbers in the cells. That means each cell is actually 2 textures: the background fill + border, and the text. With this approach performance is much better and decent enough for 64x64 artwork. The pain, though, is having multiple textures for different colours; there must be a better way.
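
A sketch of that sharing, assuming a PIXI v5-style API (the helper names are mine, not from the real code):

// Render the bordered square once per colour and reuse the resulting texture
// for every sprite of that colour, rather than drawing a Graphics per cell.
function makeCellTexture(renderer, colour, size) {
  const graphics = new PIXI.Graphics();
  graphics.lineStyle(1, 0x000000);
  graphics.beginFill(colour);
  graphics.drawRect(0, 0, size, size);
  graphics.endFill();
  return renderer.generateTexture(graphics);
}

// One texture per distinct colour, shared by many sprites.
const textureByColour = new Map();
function spriteForCell(renderer, colour, size) {
  if (!textureByColour.has(colour)) {
    textureByColour.set(colour, makeCellTexture(renderer, colour, size));
  }
  return new PIXI.Sprite(textureByColour.get(colour));
}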

The holy grail in this situation is to have the GPU render the whole thing using a custom shader. I don’t have the knowledge to do that at this stage. The idea comes from this blog post describing a tile map encoded as a small image whose pixel values are an indirect lookup into a texture atlas containing the tile textures. The shader then looks up the right texture in the atlas for each pixel of the source and can scale it up. That’s my understanding anyhow. It’s essentially what I need, but my tilemap is dynamic, so that source image would need updating when cells are filled in, and the text still needs to scale well too - something much harder to do in a shader.

In reading up about this approach I discovered that you can tint a PIXI display object nearly for free, as it’s done with a simple shader that PIXI already has. Tinting multiplies the texture’s pixel colours by the tint colour, so white pixels take the tint colour exactly and any darker variation is shaded accordingly. Fortunately my cells start off as white (full tint) with black borders/numbers (no tint). So that means I could draw a white cell with my black border as a single texture and then tint it for different coloured cells.
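
In practice that’s just a property on the sprite (reusing the makeCellTexture sketch from above, with app as the PIXI.Application):

// One white, black-bordered texture; tint does all the colouring for free.
const whiteCellTexture = makeCellTexture(app.renderer, 0xffffff, 100);

const filledCell = new PIXI.Sprite(whiteCellTexture);
filledCell.tint = 0x2e86c1; // any palette colour, applied by PIXI's built-in tint shader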

This approach seems cleaner: there’s a single filled texture that gets tinted, an unfilled texture with a border, and a texture for each required number. To boost this a little more we can use a texture atlas, so there’s a single texture sent to the GPU and PIXI will take care of using that atlas in a simple way.
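
A rough sketch of hand-rolling such an atlas, assuming a PIXI v5-style API (RenderTexture.create taking an options object, renderer.render taking the render texture as its second argument); the layout and the 24-entry palette are illustrative:

// Pack the white base cell and the number glyphs into one RenderTexture, then
// expose each region as its own Texture (all backed by the same GPU texture).
const tileSize = 100;
const numbers = 24; // however many palette entries the artwork needs
const columns = 5;  // lay tiles out in a small grid rather than one long strip
const rows = Math.ceil((numbers + 1) / columns);
const atlas = PIXI.RenderTexture.create({ width: columns * tileSize, height: rows * tileSize });

const packing = new PIXI.Container();

// Slot 0 holds the white, black-bordered base cell; slots 1..numbers hold the digits.
const baseCell = new PIXI.Graphics();
baseCell.lineStyle(1, 0x000000);
baseCell.beginFill(0xffffff);
baseCell.drawRect(0, 0, tileSize, tileSize);
baseCell.endFill();
packing.addChild(baseCell);

for (let n = 1; n <= numbers; n++) {
  const label = new PIXI.Text(String(n), { fontSize: 48, fill: 0x000000 });
  label.x = (n % columns) * tileSize + (tileSize - label.width) / 2;
  label.y = Math.floor(n / columns) * tileSize + (tileSize - label.height) / 2;
  packing.addChild(label);
}

app.renderer.render(packing, atlas); // v5 signature: render(displayObject, renderTexture)

// Sub-textures share the atlas's base texture, so only one texture is uploaded to the GPU.
const frameFor = n => new PIXI.Rectangle((n % columns) * tileSize, Math.floor(n / columns) * tileSize, tileSize, tileSize);
const cellTexture = new PIXI.Texture(atlas.baseTexture, frameFor(0));
const numberTexture = n => new PIXI.Texture(atlas.baseTexture, frameFor(n));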

Performance is still not quite at the target of smoothly rendering 128x128 art. The problem is there’s still too much to render each frame, so we need a way to reduce it - what is known as culling. We reduce the work the GPU has to do for a frame by looking at what is actually visible on the screen. In PIXI this process is left up to you: by marking a display object as not visible, it is essentially culled from the render pass.

Being a static layout makes this job fairly straightforward: we can calculate the index bounds of the visible cells with a bit of simple maths, then loop through the cells and mark anything outside those bounds as not visible. That’s why PIXI leaves it up to the developer - for this project it’s really simple, but a project with dynamic moving objects would probably need a general-purpose quad-tree, or maybe not, if there’s some deterministic, specialised way to calculate the intersecting objects.
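
A sketch of that culling pass, assuming the cell sprites live in a 2D array under a pannable/zoomable container (all names here are mine):

// Work out which cell indices intersect the screen and hide everything else.
// `world` is the container being panned (world.x/y) and zoomed (world.scale.x);
// `cells` is a gridSize x gridSize array of sprites parented to it.
function cullCells(cells, world, cellSize, gridSize, screenWidth, screenHeight) {
  const scaled = cellSize * world.scale.x;
  const firstCol = Math.max(0, Math.floor(-world.x / scaled));
  const firstRow = Math.max(0, Math.floor(-world.y / scaled));
  const lastCol = Math.min(gridSize - 1, Math.floor((screenWidth - world.x) / scaled));
  const lastRow = Math.min(gridSize - 1, Math.floor((screenHeight - world.y) / scaled));

  // Touching ~16k visible flags per pan/zoom change is cheap next to drawing them.
  for (let row = 0; row < gridSize; row++) {
    for (let col = 0; col < gridSize; col++) {
      cells[row][col].visible =
        row >= firstRow && row <= lastRow && col >= firstCol && col <= lastCol;
    }
  }
}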

This is fine at zoom, but what about when you zoom out to view the whole picture?

In this scenario you’re zoomed out such that every cell is visible, but it doesn’t have to be interactable. We can say that once a cell drops below a certain on-screen size, we render things differently.

To understand what I mean I need to explain that we want sharp text when zoomed in, so those pre-rendered textures are big (100 pixels square). Multiply that by how many cells there are and you’d need a single texture far too big to fit on the GPU. So what’s needed is a pre-rendered, scaled-down version of the whole artwork. We can calculate what size each cell needs to be to fit into a single texture, and once we reach an appropriate zoom level we can switch from lots of renders to a single sprite object.
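
A sketch of baking that zoomed-out version, again assuming a PIXI v5-style API; the 2048px texture budget and the helper name are assumptions:

// Bake the whole grid into one texture small enough for the GPU, then show it
// as a single sprite once the user zooms out past the threshold.
function bakeOverview(renderer, world, gridSize, cellSize, maxTextureSize = 2048) {
  const overviewCellSize = Math.floor(maxTextureSize / gridSize); // e.g. 16px per cell for 128x128
  const size = overviewCellSize * gridSize;
  const overview = PIXI.RenderTexture.create({ width: size, height: size });

  // Temporarily reset the container's pan/zoom so the bake isn't offset,
  // render it into the texture at the overview scale, then restore.
  const { x, y } = world.position;
  const previousScale = world.scale.x;
  world.position.set(0, 0);
  world.scale.set(overviewCellSize / cellSize);
  renderer.render(world, overview);
  world.position.set(x, y);
  world.scale.set(previousScale);

  return new PIXI.Sprite(overview);
}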

We could still make that interactable, but then instead of PIXI handling the hit test for us we would have to do it ourselves. I’m not saying that’s hard per se: we have a static grid of cells, so the hit test is easy to derive. But then we may as well always lean on that. So at this stage, instead of rewriting the interactivity and figuring that out, we will just make it non-interactive when we make the switch.
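
For what it’s worth, that hand-rolled hit test is just the layout maths inverted (a sketch; the names are mine):

// Map a pointer position back to a cell index on the static grid.
function hitTestCell(pointerX, pointerY, world, cellSize, gridSize) {
  const scaled = cellSize * world.scale.x;
  const col = Math.floor((pointerX - world.x) / scaled);
  const row = Math.floor((pointerY - world.y) / scaled);
  if (col < 0 || row < 0 || col >= gridSize || row >= gridSize) return null;
  return { row, col };
}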

To boost performance even further we could use fewer textures: at certain zoom checkpoints we change how many textures are used, so that all of them fit on the GPU at the same time. When a cell’s colour changes we will need to regenerate the texture for the area that cell is in, but by leveraging the same optimisations this should be fast. The real difficulty here is handling the transition between textures - we want this to be invisible to the user.

At each checkpoint when zooming out, the new texture will be made up of cells that are the size of the scaled version of the previous texture’s cells, and then the new texture’s zoom will be reset to 1. When zooming in, the current texture will become scale 1 and the new texture’s scale will be calculated by taking the ratio of its unscaled size to the size of the current texture’s cells. Phew.
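
Put as code, both directions fall out of one invariant - the on-screen cell size must not change across the swap (a sketch; the names are mine):

// Invariant across a checkpoint swap: scaleIn * cellSizeIn === scaleOut * cellSizeOut,
// i.e. the incoming layer starts at whatever scale keeps cells the same size on screen.
function startScaleForIncomingLayer(cellSizeIn, cellSizeOut, scaleOut) {
  return (cellSizeOut * scaleOut) / cellSizeIn;
}

// Zooming out: the detailed layer's 100px cells are showing at scale 0.16, so an
// overview layer baked with 16px cells starts at (100 * 0.16) / 16 = 1.
startScaleForIncomingLayer(16, 100, 0.16); // 1

// Zooming in: the overview's 16px cells are at scale 1, so the detailed layer's
// 100px cells come back in at (16 * 1) / 100 = 0.16.
startScaleForIncomingLayer(100, 16, 1); // 0.16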

If you’re used to the DOM this all sounds horrible - replacing an entire tree of components with a new tree. But this is what happens anyway in a rendering engine: the previously rendered frame is thrown away and the new frame is rendered from scratch. So while there’s a cost to setting up the new textures, it will still be drastically cheaper than trying to render hundreds or thousands of individual textures each frame.

We could calculate these checkpoints dynamically based on the size of the game area.

This is currently all theory; I still need to implement this behaviour if it turns out to be required - it might not be necessary yet. I have a working game experience, and the more interesting bit of the project is how to get the artwork. That’s what I will be looking at next. There are lots of choices, but I’m looking for something that will run in the cloud, probably as an AWS Lambda or Azure Function. The artwork transformation pipeline is, after all, very functional: take a source, apply some transformations, get the result and then store it somewhere. I’ll definitely need a database, perhaps a document store depending on the output of the transformations, and again that will be in the cloud - the end goal is to have this game backed by a “serverless” architecture. That will keep it relatively cheap to run.