Friday, September 28, 2007

Text Rendering

Moonlight has some fairly unique text rendering requirements that I've not seen anywhere else on Linux, Gtk+ application or not. Most applications stick to rendering text horizontally or vertically; they rarely, if ever, apply unusual matrix transformations (e.g. rotations) to text.

When I first implemented text rendering for Moonlight, I used the obvious choice: Pango, using the Cairo backend (since we're using cairo in Moonlight for 2D graphics rendering anyway).

Unfortunately, we ran into some problems...

The first major problem was that when we applied rotation transforms and called pango_cairo_show_layout(), we'd get rendering glitches: each glyph seemed to have its own independent baseline, so the glyphs appeared to jitter from frame to frame.

The second major problem was that calling pango_cairo_show_layout() every frame was very slow.

Thirdly, there appears to be no way to tell pango to load a font face from a specific file path.

As for the rendering performance issue, we considered caching the layout path, but my discussion with Owen suggested that it would gain us nothing. That, plus the perceived difficulty of doing this (since we may have to change brushes mid-stroke), put me off trying.

These problems led me to consider implementing our own text layout/rendering engine to see if we could solve them, since the pango maintainers didn't know offhand what the causes could be and thus had no suggestions for us.

At first, my text layout/rendering engine only rendered glyphs via bitmaps, but even so, it was quite a bit faster than pango.

Seeing this, Chris Toshok, Larry Ewing and I started digging into pango text rendering performance problems a bit more, not quite willing to give up on pango.

Toshok noticed that pango was loading the same font over and over again each frame, so he dug into that and came up with a patch to pango fixing a bug where it used the entire transform matrix as part of the hash instead of just the scaling component (which is all that is needed for uniqueness).
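To illustrate why the hash key matters, here is a minimal, self-contained sketch. It is not pango's actual code or Toshok's patch: the Matrix struct only mirrors the field layout of cairo_matrix_t, and scale_key(), FontCache and count_loads_while_spinning() are hypothetical names made up for this example. The assumption is that a cached font only depends on the x/y scale factors of the transform, so a key built from those alone lets a spinning string keep hitting the same cache entry.

```cpp
#include <cmath>
#include <map>
#include <utility>

// Same field layout as cairo_matrix_t, redefined so the sketch has no dependencies.
struct Matrix { double xx, yx, xy, yy, x0, y0; };

// Build a rotation combined with a uniform scale.
static Matrix rotate_scale(double radians, double scale) {
    double c = std::cos(radians) * scale;
    double s = std::sin(radians) * scale;
    return Matrix{ c, s, -s, c, 0.0, 0.0 };
}

// Hypothetical cache key: only the x and y scale factors, quantized so that
// floating point noise from the rotation does not produce distinct keys.
using ScaleKey = std::pair<long, long>;

static ScaleKey scale_key(const Matrix &m) {
    double sx = std::hypot(m.xx, m.yx);
    double sy = std::hypot(m.xy, m.yy);
    return ScaleKey(std::lround(sx * 1024.0), std::lround(sy * 1024.0));
}

// Toy font cache: the int value stands in for a loaded font.
struct FontCache {
    std::map<ScaleKey, int> fonts;
    int loads = 0;

    int lookup(const Matrix &m) {
        ScaleKey key = scale_key(m);
        auto it = fonts.find(key);
        if (it != fonts.end())
            return it->second;   // hit: rotation alone never changes the key
        ++loads;                 // miss: this is where a font would be loaded
        fonts[key] = loads;
        return loads;
    }
};

// Spin the text for a number of frames at a fixed scale and count font loads.
int count_loads_while_spinning(int frames, double scale) {
    FontCache cache;
    for (int i = 0; i < frames; i++)
        cache.lookup(rotate_scale(i * 0.1, scale));
    return cache.loads;
}
```

With the whole matrix in the key, every new rotation angle would be a cache miss; with the scale-only key, count_loads_while_spinning() reports a single load no matter how many frames the text spins for.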

For one of the very simple text rendering test cases we had (the text "Moonlight in 21 Days" spinning and resizing via cairo matrix transforms), Toshok's patch nearly doubled the speed of pango rendering, from something like 20fps to 40fps (40fps was our cap, so it may have even rendered faster).

Meanwhile, I began looking into that cairo path caching idea and discovered it wasn't nearly as complicated to implement as I had originally feared. The results were just as amazing, again doubling the performance or better, although this was before I had applied Toshok's patch (so don't get the idea that my patch + Toshok's patch = 4x speed improvement).

Not only did my patch make a huge performance improvement, it also got rid of the glyph jittering.

Unfortunately, this still left us with problem #3, as well as a few other problems regarding layout dissimilarities between pango and Microsoft's text layout in Silverlight, so for now it seemed I needed to go back to my own text layout/rendering engine.

Once I had finished adding support for rendering glyph paths, I implemented a similar cairo_path_t caching hack for my own text rendering engine and made it possible to choose which text layout/rendering engine to use at runtime via an environment variable.

Out of curiosity, I decided to compare the performance of my own text layout/rendering engine vs pango on a test case I had of several "Hello" strings, each with a different combination of matrix transforms applied to it in an ongoing animation. One of the "Hello" strings was simply undergoing FontSize changes, which force each of the text layout engines to recalculate the layout (wrapping, etc).

The performance difference was shocking: the pango implementation only gets about 23fps while my home-rolled implementation gets 45fps. (The pango implementation doesn't even render the underline for one of the text strings, apparently due to a bug in my cairo_path_t caching hack. If anyone has any suggestions on how to fix this in mango.cpp, don't hesitate to poke me.)

It might be possible to improve the performance of my home-rolled implementation further if I were to fix my code to use a true FT_Face LRU cache. Right now it simply keeps a hash of loaded FT_Faces with a ref_count; when that ref_count hits 0, the face gets removed from the hash table. This means that each frame it has to load a new FT_Face from FontConfig because the FontSize attribute changes, and since that was the only ref on that particular FT_Face, it goes away and has to be reloaded the next time the size changes back. Oh, and the 45fps was with debug spew turned on showing me whenever a new font got loaded, so turning off that printf() would probably bump me up to 50fps (which is the new fps cap on my machine).
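For the curious, here is roughly what I mean by a "true" LRU cache, as a minimal sketch rather than what font.cpp actually does. Everything in it is an assumption made for illustration: Face is a stand-in for a loaded FT_Face, the string key stands in for whatever identifies a face at a given size, and FaceLRU and loads_for_pulsing_font_size() are hypothetical names. The point is that entries stay alive after their last user drops them and are only evicted when the cache is full.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Stand-in for a loaded FT_Face; a real implementation would wrap FreeType.
struct Face { std::string key; };

class FaceLRU {
public:
    // A capacity of 0 is treated as 1 so that get() always has a face to return.
    explicit FaceLRU(std::size_t capacity) : capacity_(capacity ? capacity : 1) {}

    // Return the cached face for key, "loading" it on a miss.
    const Face &get(const std::string &key) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            // Hit: move the entry to the front (most recently used).
            order_.splice(order_.begin(), order_, it->second);
            return order_.front();
        }
        // Miss: this is where the expensive font load would happen.
        ++loads_;
        order_.push_front(Face{key});
        index_[key] = order_.begin();
        if (order_.size() > capacity_) {
            // Evict the least recently used entry from the back.
            index_.erase(order_.back().key);
            order_.pop_back();
        }
        return order_.front();
    }

    int loads() const { return loads_; }

private:
    std::size_t capacity_;
    std::list<Face> order_;  // most recently used first
    std::unordered_map<std::string, std::list<Face>::iterator> index_;
    int loads_ = 0;
};

// Simulate a TextBlock whose FontSize cycles through four sizes, one per frame.
int loads_for_pulsing_font_size(int frames, std::size_t capacity) {
    FaceLRU cache(capacity);
    for (int i = 0; i < frames; i++) {
        int size = 10 + (i % 4);
        cache.get("Sans:" + std::to_string(size));
    }
    return cache.loads();
}
```

With room for only one face, which is more or less what the ref_count scheme degenerates to in my animation test, every frame is a reload. With room for all four sizes, each face is loaded exactly once.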

As a further test, I removed the "Hello" TextBlock whose FontSize attribute changed each frame. The result was that both the pango and my own text layout/rendering engines jumped to ~50fps.

This suggests pango's layout calculation is where the performance bottleneck is.

I guess I'll have to dig into this problem some more later... Or, even better, maybe one of the pango developers can take a peek at Moonlight's font.cpp and glean some ideas from there that they can apply to pango :)

Code Snippet Licensing

All code posted to this blog is licensed under the MIT/X11 license unless otherwise stated in the post itself.