Displaying Animations in OpenGL on iOS using Bink and Shaders
I’ve discussed the issues with fluid 2D animation in an iOS game before. Memory, load time, bundle size, and hardware limitations all contribute to making traditional sprite sheet animation on iOS difficult. We were just able to get by with it in the early days of MinoMonsters. Then the evolutions happened.
Suddenly we needed more space for our monsters. With the hard limits presented by sprite sheet animation, we couldn’t fit more than half of an evolution animation on a full size texture sheet. The device simply could not bear the memory load required for monsters of this size. A different approach was needed.
One Word: Video
Each animation takes up too much memory: we hold every frame of the animation in memory for its entire duration, even though only one frame is visible at a time. A single frame makes up only 3% of the total texture sheet area. That is 97% wasted memory. How can we cut down on this waste?
There has been a lot of engineering effort invested in making computers capable of playing video. What does video actually mean from a software perspective? Video playback pipelines work something like this (my understanding of it, at least; I may have details wrong):
This process occurs every frame: the next frame is pulled from the video file and decompressed, which often involves applying diffs from the video file to the previous frame. At that point the video data exists as several images representing the different channels of the final frame (e.g. RGB). The color planes are then recombined, using either built-in hardware or software, and the resulting image is sent to the display. This process repeats at the framerate of the video, producing a smooth image. The advantage of a system like this is that each frame is pulled from the video file on demand. The only memory required for video decompression is a few buffers to hold the color planes and a destination buffer for the final composite. More on this later.
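The diff-then-recombine step can be sketched with a toy example. Real codecs use motion compensation and entropy coding, and none of this is Bink's actual format; it only illustrates the patch-the-previous-frame idea:

```c
#include <stddef.h>

/* A toy "diff" op: overwrite `len` bytes of the frame at `offset`.
   Real codecs encode motion vectors and residuals; this only
   illustrates how a frame is built by patching the previous one. */
typedef struct {
    size_t offset;
    size_t len;
    const unsigned char *bytes;
} FrameDiff;

/* Decode the next frame in place: start from the previous frame's
   pixels and apply each diff pulled from the "video file". */
void apply_diffs(unsigned char *frame, const FrameDiff *diffs, size_t ndiffs)
{
    for (size_t i = 0; i < ndiffs; i++) {
        for (size_t j = 0; j < diffs[i].len; j++)
            frame[diffs[i].offset + j] = diffs[i].bytes[j];
    }
}
```

Only the changed bytes travel through the pipeline; everything else is reused from the frame already sitting in the buffer.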
A video codec is not something that you can just whip up from scratch, at least not in a startup environment. The system that compresses, stores, and decompresses the image data must be heavily engineered in order to remain efficient yet small in file size. This was not something we could do on our own.
There are tons of video codecs out there. Apple’s iOS APIs support about 20 of them out of the box. iOS even has many built-in facilities that take advantage of the h.264 hardware on the phone for fast video decompression. Unfortunately, we ran into a major issue with most of these formats: the lack of alpha support. Most video codecs were never designed with transparency in mind, but MinoMonsters requires it for compositing monsters with the background environment. I then recalled something I’d seen on the back of retail game boxes and on splash screens before:
Bink Video. What is this mysterious framework whose brand I recognized instantly? After reading the marketing copy, it looked very much like our solution: a self-contained video codec that supports video with alpha, is optimized for games, AND they had just released an iOS version. Yes, please.
What Bink Does
Bink is made up of a proprietary video format and codec. The codec can be embedded directly into your application via a static lib, avoiding any dependency on iOS versions. The C framework then gives you facilities for opening these files, decompressing them, and compositing them on the CPU, plus some examples of how to get the final product onto the screen (which is very platform dependent). After verifying the resultant video files would be small enough, we proceeded with implementing a new rendering system using Bink.
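To give a feel for what driving Bink from C looks like, here is a minimal playback loop. The function names follow the public Bink C API (BinkOpen, BinkWait, BinkDoFrame, BinkNextFrame, BinkClose), but the signatures and the stand-in definitions below are simplified so the sketch compiles on its own; the real declarations come from bink.h and do more:

```c
#include <stdlib.h>

/* --- simplified stand-ins for the real bink.h declarations --- */
typedef struct { unsigned current_frame; unsigned total_frames; } BINK, *HBINK;

static HBINK BinkOpen(const char *name, unsigned flags) {
    (void)name; (void)flags;
    HBINK b = malloc(sizeof *b);
    b->current_frame = 0;
    b->total_frames = 24;          /* pretend: a 24-frame clip */
    return b;
}
static int  BinkWait(HBINK b)      { (void)b; return 0; } /* 0 = time for a new frame */
static void BinkDoFrame(HBINK b)   { (void)b; }           /* decompress into the plane buffers */
static void BinkNextFrame(HBINK b) { b->current_frame++; }
static void BinkClose(HBINK b)     { free(b); }
/* ------------------------------------------------------------- */

/* The playback loop: decode a frame, hand its color planes to the
   renderer, advance. Returns how many frames were decoded. */
unsigned play_clip(const char *path)
{
    unsigned decoded = 0;
    HBINK bink = BinkOpen(path, 0);
    if (!bink) return 0;
    while (bink->current_frame < bink->total_frames) {
        if (!BinkWait(bink)) {      /* pace playback to the clip's framerate */
            BinkDoFrame(bink);      /* decompress the Y/cR/cB/A planes */
            /* ...upload planes, composite, and draw here... */
            BinkNextFrame(bink);
            decoded++;
        }
    }
    BinkClose(bink);
    return decoded;
}
```

The interesting work, of course, happens at the "upload planes, composite, and draw" step, which is what the rest of this post is about.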
The Bink Pipeline
Our first attempt at rendering with Bink looked something like this:
Many video frameworks don’t maintain the video in RGBA channels, instead opting for a lesser-known Y cR cB A representation. This representation consists of a luminance channel, two color channels, and an alpha channel. The reasoning is that the human eye cannot perceive compression in the color channels as easily as it can in the luminance channel. This allows the two color channels to be compressed more heavily, and even to be stored at a smaller resolution than the final video, without sacrificing quality.
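The resolution trick is just chroma subsampling. A rough sketch of how a color plane gets stored at half resolution (a 2x2 box average, which is one common approach, not necessarily Bink's):

```c
/* Downsample a chroma plane by 2 in each dimension using a 2x2 box
   average. This is the kind of resolution loss the eye tolerates in
   the cR/cB planes but would notice immediately in luminance.
   Assumes w and h are even. */
void downsample_chroma(const unsigned char *src, int w, int h,
                       unsigned char *dst)
{
    for (int y = 0; y < h; y += 2) {
        for (int x = 0; x < w; x += 2) {
            int sum = src[y * w + x] + src[y * w + x + 1]
                    + src[(y + 1) * w + x] + src[(y + 1) * w + x + 1];
            dst[(y / 2) * (w / 2) + x / 2] = (unsigned char)(sum / 4);
        }
    }
}
```

A half-resolution plane holds a quarter of the pixels, so the two color planes together cost only as much as half of one full-resolution plane.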
The first step in rendering a Bink video is decompressing these color planes. Below are examples of the separated color planes for a frame of a MinoMonsters animation:
These images then have to be recombined to produce the final frame. The simplest way of doing this is to combine them using the built-in Bink methods, which do the compositing on the CPU (utilizing some of the vector hardware in the iPhone processor):
Following CPU composition, this image then has to be uploaded to the GPU for display in our OpenGL context. No problem.
Well, there was one problem: this approach was slow. We were now asking the CPU to decompress a video file and then recombine it, pixel by pixel, before uploading it to the GPU. A CPU-based approach is plenty powerful if you are only displaying a single video. In MinoMonsters we are always playing at least two videos, and sometimes four. We also started to run up against the limits of GPU bandwidth, since we were uploading a full texture every frame. But there was another way.
Loving the GPU
The GPU is built for this kind of work, taking data buffers and operating on them in parallel. Using the GPU to recombine the image, the pipeline becomes slightly modified:
The key difference: we upload each color plane to the GPU as a separate texture. Then, through the power of OpenGL, we recombine the planes into a render texture that is already on the GPU. We are then instantly ready to display the frame.
YcRcBA Compositing in OpenGL
Recombining the color planes in GLSL is a fairly straightforward process. It can be entirely represented as a matrix multiplication followed by a bias addition. After several iterations, I found this to be the most efficient method of transforming the color vector in GLSL on iOS:
Each color component of the YcRcBA vector is pulled from its individual texture, which has been uploaded to the GPU straight from Bink. The components are then combined into a vector and transformed. The alpha component needs no processing and goes straight to the resultant fragment. The fragment color is then passed down the rendering pipeline to eventually be rendered to a framebuffer bound to a texture. This allows us to use the result in rendering without a download and re-upload step, saving considerable amounts of time.
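As a CPU-side reference for what that shader computes, here is the same matrix-plus-bias shape in C. The coefficients below are the standard BT.601 studio-swing ones; Bink's actual matrix may differ, so treat the numbers as illustrative. In GLSL this is a single mat3 multiply plus a vec3 bias:

```c
static unsigned char clamp255(float v)
{
    if (v < 0.0f)   return 0;
    if (v > 255.0f) return 255;
    return (unsigned char)(v + 0.5f);
}

/* rgb = M * (Y, cB, cR) + bias: one matrix multiply and one bias add,
   the same shape as the GLSL transform. BT.601 coefficients with the
   -16/-128 offsets folded into the bias terms; Bink's own matrix may
   differ. Alpha would pass through untouched. */
void ycc_to_rgb(unsigned char y, unsigned char cb, unsigned char cr,
                unsigned char *r, unsigned char *g, unsigned char *b)
{
    *r = clamp255(1.164f * y + 0.000f * cb + 1.596f * cr - 222.912f);
    *g = clamp255(1.164f * y - 0.391f * cb - 0.813f * cr + 135.488f);
    *b = clamp255(1.164f * y + 2.018f * cb + 0.000f * cr - 276.928f);
}
```

On the GPU the same nine multiplies and three adds run for every fragment in parallel, which is exactly the kind of work it was built for.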
In my previous post I explored the memory costs of using traditional sprite sheet animations. An animation in that system typically ran upwards of 16 MB of memory. We can compare that to Bink.
Let’s examine a worst-case scenario: an animation that takes up the entire screen (as some of our attacks do!). The video file itself is fairly small, averaging about 500 KB, and we load the entire file into memory to reduce filesystem read times. The next item in the memory budget is the buffers Bink requires for decompression: due to the nature of the Bink decompression algorithm, Bink needs two buffers for each color plane. The luminance (Y) and alpha planes are 960x640 at 8 bits per pixel, or 614 KB each. The color planes are only half the resolution of the luminance and alpha planes, 480x320 at 8 bits per pixel, or 153 KB each. That is a total of 1.5 MB per frame buffer, and Bink requires two, bringing us up to 3 MB. 3 MB is great, but we aren’t quite done yet. Each animation has four associated textures on the GPU for uploading the color planes, which make up another 1.5 MB. The final memory requirement is the destination texture: at 960x640 and 32 bits per pixel, it knocks us back another 2.4 MB. This brings us to a grand total of 7.5 MB for a full-screen animation. This is the absolute worst-case scenario, and we’ve achieved a 53% reduction in memory usage!
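That budget is easy to sanity-check in a few lines, using the plane sizes and buffer counts described above:

```c
/* Worst-case memory budget for one full-screen (960x640) Bink
   animation, using the plane sizes and buffer counts from the post. */
typedef struct {
    unsigned video_file;   /* whole video file loaded into memory */
    unsigned decode_bufs;  /* 2 x (Y + A + cR + cB) plane buffers */
    unsigned gl_textures;  /* 1 x each plane, uploaded to the GPU  */
    unsigned dest_texture; /* 960x640 RGBA composite target        */
} BinkBudget;

unsigned bink_budget_bytes(BinkBudget *out)
{
    const unsigned y_plane   = 960 * 640;     /* 8 bpp luminance */
    const unsigned a_plane   = 960 * 640;     /* 8 bpp alpha     */
    const unsigned c_plane   = 480 * 320;     /* 8 bpp, half res */
    const unsigned plane_set = y_plane + a_plane + 2 * c_plane;

    out->video_file   = 500 * 1024;           /* ~500 KB average     */
    out->decode_bufs  = 2 * plane_set;        /* Bink double-buffers */
    out->gl_textures  = plane_set;            /* 4 upload textures   */
    out->dest_texture = 960 * 640 * 4;        /* 32 bpp composite    */
    return out->video_file + out->decode_bufs
         + out->gl_textures + out->dest_texture;
}
```

The total comes out to roughly 7.6 million bytes, which is where the ~7.5 MB figure above comes from.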
This sizable reduction was enough to eliminate memory crashes for the vast majority of our users. I am convinced that it played a big part in pushing our app to the 5 star threshold.
All these savings aren’t free. Aside from the Bink licensing fee (which was not bad), we have taken a lot of load off the system’s memory and pushed it onto the CPU and GPU. This led to a marked decrease in frame rate on the iPhone 4 and 4th generation iPod touch. Is it still playable? Absolutely, but you can feel it. We squeezed as much performance as we could out of these devices, but we just aren’t able to push back up to that 60 FPS gold standard. Our compromise was to lower the target framerate to 30 FPS. This gives us a smoother overall framerate which, I think, is preferable. The iPhone 4S has no problem keeping up, which gives me faith that as new hardware is released, frame rates will once again reach 60 FPS.
Implementing a Bink based rendering system was no small feat for me and our team. My personal understanding of OpenGL was basically zero before embarking on this journey. Many concepts had to be learned from first principles (read: the red book). The going was slow but we emerged out the other side with something that solves our problems. This is what we do as software engineers. Sometimes there is a better way. You just have to figure it out.
This project was one of the most ambitious engineering efforts I’ve ever undertaken. It would not have been possible without the rest of the MinoMonsters team clearing the road for me and letting me dive into the bowels of the OpenGL stack. Also, a big thanks to the team at Bink; they were nothing but helpful and responsive to my questions during the integration process.