Thursday, December 27, 2007

Christmas Vacation Hacks

Evolution

There's been discussion recently about Evolution not handling some broken rfc2047-encoded message headers, so I ported some workaround logic from GMime to Camel yesterday. I think this should turn a lot of frowns upside down when Evolution 2.22 is released. The new rfc2047 header decoder in Evolution is now more accepting than Thunderbird's decoder[1], so I hope this will satisfy everyone.

The new rfc2047 decoder will now handle the following cases:

  • encoded-words embedded inside other tokens (e.g. n=?iso-8859-1?q?a=EF?=ve)
  • encoded-words with LWSP in the encoded-text section (e.g. =?iso-8859-1?q?Look ma! there's spaces!?=)
  • some encoded-word tokens with unencoded special chars in the encoded-text section (such as ? marks and other specials - obviously this can't always work if a special sequence occurs)
  • encoded-word tokens with no LWSP between them
  • no longer aborts if the encoded-text cannot be completely converted to UTF-8 from the charset provided, instead it will attempt to find a more suitable charset to use for conversion and/or replace invalid byte sequences with a '?'
This should fix all of the more common broken-mailer cases, or at least all of the ones that are worth even bothering to try and work around.
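To make the lenient behavior in the second and third bullets concrete, here's a minimal, self-contained sketch of a Q-encoded-text decoder that passes raw spaces and other unencoded specials through instead of aborting. This is purely illustrative, not the actual Camel/GMime code:

```c
#include <stddef.h>

/* Map a hex digit to its value, or return -1. */
static int
hexval (char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Leniently decode the encoded-text portion of a Q-encoded word:
 * '_' becomes space and =XX becomes a raw byte as usual, but raw
 * spaces, '?' marks, and malformed =XX sequences are passed through
 * verbatim rather than causing the decoder to abort. */
static size_t
lenient_q_decode (const char *in, size_t len, char *out)
{
    size_t i = 0, n = 0;
    int hi, lo;

    while (i < len) {
        if (in[i] == '_') {
            out[n++] = ' ';
            i++;
        } else if (i + 2 < len && in[i] == '=' &&
                   (hi = hexval (in[i + 1])) != -1 &&
                   (lo = hexval (in[i + 2])) != -1) {
            out[n++] = (char) ((hi << 4) | lo);
            i += 3;
        } else {
            /* a strict decoder would reject this byte outright */
            out[n++] = in[i++];
        }
    }

    out[n] = '\0';

    return n;
}
```

A strict decoder would fail on an input like "Look ma!" (raw spaces); this one just shrugs and keeps going.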

GMime

Did some digging and discovered how to document more things using gtk-doc, so GMime's documentation has been improving. All of the public functions in GMime now appear in the documentation (I had forgotten to add some of the newer API additions to the gmime-sections.txt file), and I've begun writing documentation for each of the doc sections (which I hadn't known how to document previously).

You can find up-to-the-minute GMime documentation here as I continue to chug away at expanding it.

While porting GMime's rfc2047 decoding logic to Evolution, I discovered some areas I could improve in GMime too.

1. After having ported my GMime decoder logic to Camel, I was curious to see how Thunderbird handled broken rfc2047-encoded headers, so I had a peek at mozilla/netwerk/mime/src/nsMIMEHeaderParamImpl.cpp and noted that my new logic in Camel would handle everything Thunderbird handles, and more.

I also took a peek at KMail's rfc2047 header decoding logic and discovered Evolution is now more flexible than KMail as well (again, Evolution now handles everything that KMail's decoder handles, and more).

Saturday, December 15, 2007

35 MPG

According to an article in the NY Times, the US Senate has passed a bill to force cars, small trucks and SUVs to achieve an average 35 miles per gallon by the year 2020. Unbeknownst to me, the current bar is apparently 27.5 mpg for the year 2008, but I can't help but chuckle at that one - perhaps I'm misinformed, but I don't think the average SUV sold today, in 2007, achieves 27.5 mpg or anywhere close to that. Cars, maybe, but not SUVs or small trucks.

Despite my doubts that auto manufacturers will actually adhere to this new standard, let's hope that they at least improve significantly over what they can do today.

It frustrates me to no end seeing so many SUVs on the streets of Cambridge/Boston and the neighboring areas. In my own unscientific estimation, close to 40% of the non-work-related vehicles driving around the city are SUVs and I have to wonder... why?

Is there some off-road driving that needs to be done in the middle of the city that I'm simply not aware of? Maybe they need to drive across Boston Common to avoid traffic?

Are these people driving off to the Great Outdoors on weekends and for some reason need an SUV? Like in those SUV commercials where young couples go off and drive through mountain ranges and settle on some cliff to have a picnic while enjoying the view of nature?

If those couples really enjoy nature so much, you'd think that instead of getting an SUV, they'd have bought a (cheaper) car, driven to a parking lot near some hiking trails, and hiked through the wilderness.

If everyone bought an SUV and tore up the wilderness every weekend to "get away", there'd be nothing to get away to after a few years. Everyone would just be driving to SUV parks (kinda like trailer parks, except even trashier).

Thursday, December 6, 2007

gnome-volume-manager-2.22.0

Just released gnome-volume-manager-2.22.0 - a bit early for GNOME 2.22, but seeing as how I rarely ever hack on this project anymore, I figured I should actually release a new version (last release was 2.17.0 from presumably a year or more ago).

This new release adds integration with ConsoleKit for detecting whether the user is local and whether or not he/she is active at the console.

The other major change it includes is the ability to execute autorun.exe and autorun.inf files on cdroms so long as wine is installed on the system.

I implemented this feature toward the end of the summer because I had attempted to help my grandmother migrate from Windows95 to Linux. Unfortunately, while she was OK with the GNOME desktop itself for the most part, she wasn't comfortable using OpenOffice (she wanted Microsoft Office 97) and she wanted to use the proprietary versions of the Mahjongg/Sudoku/etc games that she was used to running on her old Windows95 machine. As I began to show her how to install and run Windows software under Linux using wine, I realized that the process could be a whole lot simpler and more intuitive if gnome-volume-manager auto-magically ran autorun.exe and autorun.inf files under wine for her when she plopped the cd into the drive.
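As a rough illustration of the first half of that feature, the check might look something like this (a hypothetical sketch, not the actual gnome-volume-manager code; the real thing also has to parse autorun.inf's [autorun] section and actually spawn wine):

```c
#include <stdio.h>
#include <unistd.h>

/* Look for an autorun.exe or autorun.inf at the top level of a
 * mount point, trying a few common capitalizations (cdrom media is
 * frequently inconsistent about case). Returns the path found, or
 * NULL. The caller would then check for wine and hand it the path. */
static const char *
find_autorun (const char *mount_point, char *path, size_t len)
{
    static const char *names[] = {
        "autorun.exe", "Autorun.exe", "AUTORUN.EXE",
        "autorun.inf", "Autorun.inf", "AUTORUN.INF",
    };
    size_t i;

    for (i = 0; i < sizeof (names) / sizeof (names[0]); i++) {
        snprintf (path, len, "%s/%s", mount_point, names[i]);
        if (access (path, R_OK) == 0)
            return path;
    }

    return NULL;
}
```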

...and the rest is, as they say, history.

Saturday, November 10, 2007

Call of Duty 4

I just got my 40" HD widescreen LCD TV Friday morning and today I went out and bought Call of Duty 4 needing a new game to play now that I've beaten Rainbow Six: Vegas.

Wow. Just wow. This game is amazing.

Here's the trailer:

And here's some sample gameplay footage:

Update: Already beat the game... guess I'll go back now and try to get all the achievement points I missed.

Wednesday, November 7, 2007

Parsing Integers

Every once in a while, it becomes necessary to parse an integer from a string. The trick is doing so correctly and without needing an integer type larger than the one you are parsing into (i.e. if you are parsing a 32bit integer, you don't want to need a temporary 64bit value for checking overflow; this is especially desirable when parsing larger int values, such as a 64bit int, because you won't likely have access to a 128bit int type).

The following macro correctly implements an integer parsing routine for a given number of bits:

#define STRTOINT(bits, max)                                     \
static int                                                      \
strtoint##bits (const char *in, char **inend,                   \
                int##bits##_t *retval, int *err)                \
{                                                               \
    register const char *inptr = in;                            \
    int##bits##_t val = 0;                                      \
    int digit, sign = 1;                                        \
                                                                \
    while (*inptr == ' ')                                       \
        inptr++;                                                \
                                                                \
    if (*inptr == '-') {                                        \
        sign = -1;                                              \
        inptr++;                                                \
    }                                                           \
                                                                \
    in = inptr;                                                 \
    while (*inptr >= '0' && *inptr <= '9') {                    \
        digit = (*inptr - '0');                                 \
        if (val > (max / 10)) {                                 \
            *err = EOVERFLOW;                                   \
            return -1;                                          \
        } else if (val == (max / 10)) {                         \
            if (digit > (max % 10) &&                           \
                (sign > 0 || digit > ((max % 10) + 1))) {       \
                *err = EOVERFLOW;                               \
                return -1;                                      \
            }                                                   \
                                                                \
            if (sign < 0)                                       \
                val = (val * sign * 10) - digit;                \
            else                                                \
                val = (val * 10) + digit;                       \
                                                                \
            inptr++;                                            \
            if (*inptr >= '0' && *inptr <= '9') {               \
                *err = EOVERFLOW;                               \
                return -1;                                      \
            }                                                   \
                                                                \
            goto ret;                                           \
        } else {                                                \
            val = (val * 10) + digit;                           \
        }                                                       \
                                                                \
        inptr++;                                                \
    }                                                           \
                                                                \
    if (inptr == in) {                                          \
        *err = EINVAL;                                          \
        return -1;                                              \
    }                                                           \
                                                                \
    val *= sign;                                                \
                                                                \
 ret:                                                           \
    *inend = (char *) inptr;                                    \
    *retval = val;                                              \
                                                                \
    return 0;                                                   \
}

To expand this macro to parse a 32bit integer value from a string, you would do:

STRTOINT(32, 2147483647L)

The resulting function would be named strtoint32().
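Expanded by hand (and slightly compacted, with 2147483647L substituted in), the generated function looks roughly like this, and the error handling at a call site boils down to a single return-value check:

```c
#include <errno.h>
#include <stdint.h>

/* Hand-expanded version of what STRTOINT(32, 2147483647L) generates,
 * shown here for illustration. */
static int
strtoint32 (const char *in, char **inend, int32_t *retval, int *err)
{
    const char *inptr = in;
    int32_t val = 0;
    int digit, sign = 1;

    while (*inptr == ' ')
        inptr++;

    if (*inptr == '-') {
        sign = -1;
        inptr++;
    }

    in = inptr;
    while (*inptr >= '0' && *inptr <= '9') {
        digit = *inptr - '0';
        if (val > (2147483647L / 10)) {
            *err = EOVERFLOW;
            return -1;
        } else if (val == (2147483647L / 10)) {
            /* the last digit decides: the negative range extends
             * one further than the positive range */
            if (digit > (2147483647L % 10) &&
                (sign > 0 || digit > ((2147483647L % 10) + 1))) {
                *err = EOVERFLOW;
                return -1;
            }

            if (sign < 0)
                val = (val * -10) - digit;
            else
                val = (val * 10) + digit;

            inptr++;
            if (*inptr >= '0' && *inptr <= '9') {
                *err = EOVERFLOW;
                return -1;
            }

            goto ret;
        } else {
            val = (val * 10) + digit;
        }

        inptr++;
    }

    if (inptr == in) {
        *err = EINVAL;
        return -1;
    }

    val *= sign;
 ret:
    *inend = (char *) inptr;
    *retval = val;

    return 0;
}
```

Calling it is then just `if (strtoint32 (str, &end, &value, &err) == -1)`, with err set to EOVERFLOW or EINVAL on failure.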

Why implement my own integer parsing routine instead of using libc's strtol(), you ask?

For a variety of reasons:

  • strtol() might not be available, or you might want to implement an integer parsing routine in a language other than C/C++ (a variation of my routine is now used in Mono for all of the Int.Parse() methods).
  • strtol() is locale dependent, which is not always a desirable trait.
  • strtol() requires annoying error checking that my function makes trivial.
  • it's easy to extend my function macro to support integers of any bit size, e.g. 64bit integers (or someday 128bit integers).

To generate functions to parse 8, 16, 32, and 64bit integers, you would write:

STRTOINT(8, 127)
STRTOINT(16, 32767)
STRTOINT(32, 2147483647L)
STRTOINT(64, 9223372036854775807LL)

What about unsigned integers, you ask? Not to worry, I have an implementation for that as well:

#define STRTOUINT(bits, max)                                    \
static int                                                      \
strtouint##bits (const char *in, char **inend,                  \
                 uint##bits##_t *retval, int *err)              \
{                                                               \
    register const char *inptr = in;                            \
    uint##bits##_t val = 0;                                     \
    int digit;                                                  \
                                                                \
    while (*inptr == ' ')                                       \
        inptr++;                                                \
                                                                \
    in = inptr;                                                 \
    while (*inptr >= '0' && *inptr <= '9') {                    \
        digit = (*inptr - '0');                                 \
        if (val > (max / 10)) {                                 \
            *err = EOVERFLOW;                                   \
            return -1;                                          \
        } else if (val == (max / 10) && digit > (max % 10)) {   \
            *err = EOVERFLOW;                                   \
            return -1;                                          \
        } else {                                                \
            val = (val * 10) + digit;                           \
        }                                                       \
                                                                \
        inptr++;                                                \
    }                                                           \
                                                                \
    if (inptr == in) {                                          \
        *err = EINVAL;                                          \
        return -1;                                              \
    }                                                           \
                                                                \
    *inend = (char *) inptr;                                    \
    *retval = val;                                              \
                                                                \
    return 0;                                                   \
}

STRTOUINT(8, 255)
STRTOUINT(16, 65535)
STRTOUINT(32, 4294967295UL)
STRTOUINT(64, 18446744073709551615ULL)

And there you have it.

Note: these routines require [u]int[8,16,32,64]_t to be defined. If you are on Linux, #include <stdint.h> should do the trick.

Update: Miguel's blog goes into more detail on why this code was important for Mono.

Saturday, November 3, 2007

Oompa Loompa

Oompa. Loompa. Doompity dosum. Dwight is now gone which is totally awesome. -- Andy Bernard, The Office

Thursday, November 1, 2007

Re: Squandering one of the industry's best open source talents

Matt Asay, in a recent blog on CNET, tries to imply that Mono/Moonlight are a waste of Miguel's time/talents.

For these reasons I can't help but wonder why he's squandering his talents on writing largely irrelevant code (Mono, Moonlight) that appeals to himself, Novell, Microsoft, and no one else.

Really? No one else? Are you saying that the thousands of users and developers involved in the Mono community are nobodies? Are you saying that OTEE, the developers of Unity3D, are nobodies? Are you implying that Codice Software and the growing list of companies/projects basing their software on Mono are nobodies?

Miguel says that he's doing this to bring the proprietary world into the open-source camp. It's not working, Miguel. You don't convince by capitulating. You convince by winning.

Based on the growing number of users and developers getting involved with the Mono project (and recognizing that there are probably far more out there that just haven't made their presence known), I would say it is working. Because of the Mono project, more people are taking an interest in Open/Free Software, which I and many others would call "winning".

Rather, Miguel de Icaza can turn the industry on its head by putting his knowledge of interoperability and open source to work on developing the next-generation desktop (and not by recreating the "best" of Microsoft on Linux).

It's unfortunate that you are so short-sighted, Matt. That seems to be one of the more important differences between you and Miguel. Miguel sees the big picture. He knows that in order to create the "next-generation desktop", he'll need better tools with which to do it. He also sees that "winning" takes "developers, developers, developers!" He knows that you can't "win" if it takes developers on Linux 2 or 3 times as long to write an application as it does on Windows with proprietary tools.

I think Miguel de Icaza is an exceptional developer. He's also a fantastically effective community leader.

All the more reason to let him continue doing what he does best.

Update: Joe Shaw makes an excellent point.

Update: It appears that Matt Asay has retracted his post and made an apology.

Tuesday, October 2, 2007

Tom Adelstein on Mono

I was just pointed at a nice article by Tom Adelstein called The Mono Project: You Might Expect the Unexpected.

It makes me happy when people publish insightful pieces like this. It's a nice change of pace from all of the lies and false propaganda that so many people are spreading without actually doing any fact checking (they make Fox News look credible).

Thanks Tom.

Friday, September 28, 2007

Text Rendering

Moonlight has some fairly unique text rendering requirements that I've not seen done anywhere else on Linux, in Gtk+ applications or otherwise. Most applications stick to rendering text horizontally or vertically; they rarely, if ever, perform unusual matrix transformations on text (e.g. rotations).

When I first implemented text rendering for Moonlight, I used the obvious choice: Pango, using the Cairo backend (since we're using cairo in Moonlight for 2D graphics rendering anyway).

Unfortunately, we ran into some problems...

The first major problem was that as we applied rotation transforms and called pango_cairo_show_layout(), we'd get rendering glitches: each glyph seemed to have its own independent baseline, so the glyphs appeared to jitter from frame to frame.

The second major problem was that calling pango_cairo_show_layout() each frame had some major performance problems.

Thirdly, there appears to be no way to tell pango to load a font face from a specific file path.

As for the rendering performance issue, we considered caching the layout path, but my discussion with Owen suggested that it would gain us nothing. That, plus the perceived difficulty of doing it (since we may have to change brushes mid-stroke), shied me away from trying.

These problems led me to consider implementing our own text layout/rendering engine to see if we could solve the above problems since the pango maintainers didn't seem to know what the problems could be offhand and thus had no suggestions for us.

At first, my text layout/rendering engine only handled rendering of glyphs via bitmaps, but even so, the result was that this new layout/rendering engine was quite a bit faster than pango.

Seeing this, Chris Toshok, Larry Ewing and I started digging into pango text rendering performance problems a bit more, not quite willing to give up on pango.

Toshok noticed that pango was loading the same font over and over again each frame, so started digging into that aspect a bit and came up with a patch to pango to fix a bug where it used the entire transform matrix as part of the hash instead of just the scaling component (which is all that was needed for uniqueness).
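The idea behind that fix can be sketched as follows (the struct mirrors cairo's matrix layout, but the hash itself is illustrative, not the actual pango patch): two transforms that differ only in rotation or translation should hash to the same font cache entry, while a change in scale should not.

```c
typedef struct {
    double xx, yx, xy, yy, x0, y0;  /* same layout as cairo_matrix_t */
} matrix_t;

/* Hash only the scaling component of a transform: the squared
 * lengths of the two basis columns. Rotation leaves these lengths
 * unchanged and translation (x0/y0) is ignored entirely, so a font
 * that is merely spinning or moving each frame keeps hitting the
 * same cache entry instead of being reloaded. */
static unsigned int
font_transform_hash (const matrix_t *m)
{
    double sx2 = m->xx * m->xx + m->yx * m->yx;
    double sy2 = m->xy * m->xy + m->yy * m->yy;
    /* quantize so floating point noise doesn't split cache entries */
    unsigned int hx = (unsigned int) (sx2 * 256.0);
    unsigned int hy = (unsigned int) (sy2 * 256.0);

    return (hx << 16) ^ hy;
}
```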

For one of the very simple text rendering test cases we had (the text "Moonlight in 21 Days" spinning and resizing via cairo matrix transforms), Toshok's patch nearly doubled the speed of pango rendering from something like 20 to 40fps (40fps was our cap, so it may have even rendered faster).

Meanwhile, I began looking into that cairo path caching idea and discovered it wasn't nearly as complicated to implement as I had originally feared. The results were just as amazing, again doubling the performance or better, although this was before I had applied Toshok's patch (so don't get the idea that my patch + Toshok's patch = 4x speed improvement).

Not only did my patch make a huge performance improvement, it also got rid of the glyph jittering.

Unfortunately, this still left us with problem #3 as well as a few other problems regarding layout dissimilarities between pango and Microsoft's text layout in Silverlight, so for now, it seems I needed to go back to my own text layout/rendering engine.

Once I had finished adding support for rendering glyph paths, I implemented a similar cairo_path_t caching hack for my own text rendering engine and made it possible to choose which text layout/rendering engine to use at runtime via an environment variable.

Out of curiosity, I decided to compare performance of my own text layout/rendering engine vs pango on a test case I had of several "Hello" strings each having different combinations of matrix transforms applied to them in an ongoing animation. One of the "Hello" strings was simply undergoing FontSize changes which cause each of the text layout engines to have to recalculate the layout (wrapping, etc).

The performance difference was shocking... the pango implementation (which doesn't even render the underline for one of the text strings, due to a bug in my cairo_path_t caching hack; if anyone has suggestions on how to fix this in mango.cpp, don't hesitate to poke me) only gets about 23fps, while my home-rolled implementation gets 45fps.

It might be possible to improve the performance of my home-rolled implementation if I were to fix my code to use a true FT_Face LRU cache... right now it simply keeps a hash of loaded FT_Faces with a ref_count; when that ref_count hits 0, the face gets removed from the hash table. This means that each frame it has to load a new FT_Face from FontConfig, because the FontSize attribute changes and, since the cache had the only ref on that particular FT_Face, the face goes away and has to be reloaded the next time the text changes back to that size. Oh, and the 45fps was with debug spew turned on, printing a message whenever a new font got loaded - so turning off that printf() would probably bump me up to 50fps (which is the new fps cap on my machine).
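For the curious, the LRU idea is simple enough to sketch. This toy version keys on font size alone and stores opaque pointers where the real code would hold FT_Faces (hypothetical, not Moonlight's actual font.cpp): instead of dropping a face the moment its ref count hits zero, keep the last N unused faces around so bouncing between two FontSizes doesn't reload from FontConfig every frame.

```c
#include <stdlib.h>

typedef struct entry {
    int key;            /* e.g. font size; the real key would be
                           pattern + size */
    void *face;         /* stand-in for an FT_Face */
    struct entry *next;
} entry_t;

typedef struct {
    entry_t *head;      /* most recently used first */
    int count, max;
} lru_t;

/* Look up a face; on a hit, move it to the front of the list so the
 * tail is always the least recently used entry. */
static void *
lru_lookup (lru_t *lru, int key)
{
    entry_t *prev = NULL, *e;

    for (e = lru->head; e; prev = e, e = e->next) {
        if (e->key == key) {
            if (prev) {
                prev->next = e->next;
                e->next = lru->head;
                lru->head = e;
            }
            return e->face;
        }
    }

    return NULL;
}

/* Insert a freshly loaded face at the front, evicting the least
 * recently used entry once the cache is full. */
static void
lru_insert (lru_t *lru, int key, void *face)
{
    entry_t *e = malloc (sizeof (entry_t));

    e->key = key;
    e->face = face;
    e->next = lru->head;
    lru->head = e;

    if (lru->count < lru->max) {
        lru->count++;
    } else {
        entry_t *prev = NULL;
        for (e = lru->head; e->next; prev = e, e = e->next)
            ;
        prev->next = NULL;
        free (e);       /* real code would also FT_Done_Face() here */
    }
}
```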

As a further test, I removed the "Hello" TextBlock that had the FontSize attribute changes each frame. The result was that both pango and my own text layout/rendering engines jumped to ~50fps.

This suggests pango's layout calculation is where the performance bottleneck is.

I guess I'll have to dig into this problem some more later... Or, even better, maybe one of the pango developers can take a peek at Moonlight's font.cpp and see if they can maybe glean some ideas from there that they can apply to pango :)

Thursday, June 21, 2007

Implementing Silverlight in 21 Days

For those of you that want to know what we're all about, it's like this y'all: this is 10% luck, 20% skill, 15% concentrated power of will, 5% pleasure, 50% pain and 100% reason to remember the name...
  -- Fort Minor

Miguel de Icaza, Chris Toshok, Jackson Harper, Sebastien Pouliot, Everaldo Canuto, Rolf Bjarne Kvinge, Atsushi Enomoto, myself and a few other guys from the kick-ass Mono Team, here at Novell, managed to do the impossible: Implement Silverlight in just 21 days.

3 weeks ago, Miguel sent out a Call-To-Arms email to our internal Mono Team mailing list explaining that he'd been invited to participate in Microsoft's conference in France, where he'd be able to demo whatever he had accomplished as far as reimplementing a version of Silverlight for Linux. I would say that within minutes of the team reading that call to arms, I and a bunch of others replied, volunteering for the mission...

I think I speak for all of us when I say that it was a blast working on this project. I know that, for myself, even after heading home from work at around 8 or 9pm every night, I'd walk in the door, grab a quick bite to eat, and immediately sit down in front of my computer to begin hacking on it again, and I'd almost always find at least Jackson and Toshok on IRC hacking on it too.

I haven't been this enthusiastic about hacking in a long time (and that says a lot seeing as how I said the same thing just a few months ago when I joined the Mono Team to work on MonoDevelop). Even now, I've got ideas I want to experiment with to improve Moonlight even further. I'll probably get back to hacking on it (even though the 21 day hackathon is over) as soon as I finish posting this.

To all the people out there that congratulated us in response to reading Miguel's blog, thanks - we appreciate it. It feels good to both have accomplished something this amazing and be cheered at the same time :-)

To those of you who think this is a waste of time because similar things exist, I have this to say:

nothing ventured = nothing gained.

We, as the Free Software Community, wouldn't have all the great and wonderful things that we have today if the developers didn't "waste their time implementing alternatives." I urge you to take a moment to reflect on that.

If, after that reflection, you still wish to resist change: no one is forcing you to use Mono - you're free to remain in the stone age, living in a cave, trying to write software by scraping coal along the cave walls ;-)

That said, Rock On fellow Mono lovers!

Saturday, May 26, 2007

Great day today

I know a movie star
I’ve got her plastered to my wall.
Just like we’re dear old friends
Like she already knows me.
She’s as perfect as she seems,
Lifts me right out the mezzanine.
I finally fell in love,
I’d been waiting forever.

   -- Guster

Sunday, May 20, 2007

Saturday, May 12, 2007

Implement Interface

MonoDevelop Refactory work update:

This past week, after getting Renaming working for method parameters, I added context menu items that could implement a target interface's properties, methods, and events implicitly or explicitly auto-magically for the programmer.

You can see screenshots of these menus in an older post of mine, Refactory Operations

Friday, May 4, 2007

Godspeed, Robert Love

Today was Robert Love's last day at Novell, Inc.

Take care, Mr. Architect! Hopefully we'll continue to run into each other from time to time.

MonoDevelop Refactoring

Have you ever wanted to rename a variable in a source file and have all of the references to that variable be renamed as well? I'm sure you have. If you are a typical Linux developer, the only option you've ever had available to you would be to perform a Search & Replace action in your editor (bound to Alt-% in my Emacs editor).

...But this solution is hardly perfect. Search & Replace might match substrings you wouldn't want to replace, so you'd have to examine each match manually before accepting the change, which is prone to mistakes. Using a regex Search & Replace might help minimize mistakes, in that you might be able to eliminate substring matches that were part of other identifiers, but you'd still have to manually examine each change if you re-used variable names in different scopes or different functions/methods within the same file.

...Which brings us to the next problem: editors historically used by programmers on Linux have never offered to update source files that depend on the current buffer, so programmers would have to manually update the other source files as well.

Welcome... to the Future.

This past week I've been working on a feature allowing MonoDevelop users to rename local variables, classes, interfaces, class members, methods, events, properties, etc., and have all the appropriate code in the entire project auto-magically updated, no matter where you invoke the rename action (whether it be in the base class, an inner class, some other section of code that uses the class, etc).

Just another example of Mono bringing improvements to Linux developers everywhere.

Monday, April 16, 2007

Childhood Memories

Today, Miguel asked me to take a look at fixing some of the System.Console 2.0 bugs. I managed to fix some of the ReadKey() and ReadLine() bugs, although the Backspace bug illustrated in Iron Python is still out there (it appears that the cursor X,Y position is not correctly kept track of in Mono).

Earlier today, when Miguel was explaining to me what sorts of problems existed in System.Console, one of the things he was hoping I could take a look at (although there was no bug # that I know of) was optimizing Console.Write[Line]().

I didn't actually have time to look at that until tonight when I was sitting at my home computer waiting for my dinner to finish cooking. The solution was fairly simple (for those interested in seeing the patch, check out revision 75806).

An important part of optimizing a section of code is to compare actual running times of the old code vs the new code (and, obviously, to check that the results are correct), so I wrote a simple program that could be measured for performance improvements:

using System;

public class Program {
    static void Main () {
        Console.ForegroundColor = ConsoleColor.Cyan;
        string abc = new String ('a', 100000);
        Console.WriteLine (abc);
    }
}

Using the system `time' command, I got the following times:

pre optimization:

real 0m13.177s
user 0m0.308s
sys 0m0.232s

post optimization:

real 0m0.238s
user 0m0.124s
sys 0m0.004s

Wowzers!

Anyways, the reason the title of this blog is "Childhood Memories" is because after running this test program, I couldn't help but remember my first assembler program for the 6502 that I wrote back when I was 8 years old - it was a program which filled the screen with the character 'A' (which I then compared to a program written in BASIC that did the same thing). The difference in speed here was about the same as it was back then, too, funnily enough :)

Update: This morning I woke up realizing that my optimization patch from last night had a bug, but I had a fix that lost no performance. Later today (thanks in part to Alan prompting me to re-evaluate my need for a temporary buffer, which truly became unnecessary after this morning's fix), I was able to optimize it even further, eliminating the temporary buffer entirely by blitting whole chunks of the input buffer between special escape sequences at a time (we have to handle certain escape sequences specially, as they can relocate the cursor).
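The chunking idea looks something like this (a sketch in C for illustration; the actual patch is in Mono's C# Console code, and a real version would consume entire escape sequences rather than just the escape byte):

```c
#include <stddef.h>

typedef struct {
    size_t offset, length;
    int is_escape;
} segment_t;

/* Split a buffer into runs of plain text and escape characters so
 * each plain run can be blitted with a single write, instead of
 * examining and emitting one character at a time. Only the escape
 * byte itself (0x1b) is treated specially here; escape sequences
 * need a slow path because they can relocate the cursor. */
static size_t
split_segments (const char *buf, size_t len, segment_t *segs, size_t max)
{
    size_t n = 0, start = 0, i = 0;

    while (i < len && n < max) {
        if (buf[i] == '\x1b') {
            if (i > start) {
                segs[n].offset = start;
                segs[n].length = i - start;
                segs[n].is_escape = 0;
                if (++n == max)
                    break;
            }
            segs[n].offset = i;
            segs[n].length = 1;
            segs[n].is_escape = 1;
            n++;
            start = ++i;
        } else {
            i++;
        }
    }

    if (n < max && len > start) {
        segs[n].offset = start;
        segs[n].length = len - start;
        segs[n].is_escape = 0;
        n++;
    }

    return n;
}
```

For a 100,000 character string with no escapes at all (like the test program above), this degenerates to a single write, which is where the speedup comes from.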

Oh, and the Iron Python bug is now solved... it was actually, arguably, a bug in Iron Python, in that it goes behind Mono's back when writing to stdout, making it impossible for us to keep track of the cursor position. They do, however, use Console.In.ReadLine() (it would be better if they simply used Console.ReadLine(), but I digress), and so what I did was make ReadLine() query the terminal for the cursor position (rather than rely on our own state).

Friday, April 13, 2007

Refactory Operations

I think I'm finally finished hacking on the Navigation History feature in MonoDevelop, which started off as more-or-less a direct rip-off of the one in SharpDevelop, but quickly began incorporating improvement ideas from Miguel, Lluis, and myself.

Now on to bigger and more exciting things... namely, refactory features.

A few weeks ago I had taken a look at some of Microsoft's Visual Studio 2005's and SharpDevelop's context-menu-based refactoring functions and have just stubbed a few of them out in current MonoDevelop svn (added the Rename and Implement Interface items):

In the above screenshot, you can see the menu we get when right-clicking on the name of an interface in a class definition.

Here, I right-clicked on the name of the class in the class definition.

Going forward, I'll be looking at the refactoring functionality that Eclipse provides along with Intellij's, ReSharper's and Refactor Pro's features.

So far I've really only had time to watch the Refactor Pro demos. They are pretty awesome, but I think the UI bits (at least) are a bit over the top. Certainly not going to be easy to implement. I'll probably be looking at easier approaches.

Thursday, April 12, 2007

GMime is damn fast

Came across this in the zsh archives:

I've been using the stand-alone:
http://www.fourmilab.ch/webtools/base64/

I never realised I already had it in openssl.
Thanks for the tip.


This led me to see what else I might already have or could get in a ready made package rather than having to use a source. (I do need this util in some form on every box and previously all my boxes were SCO OpenServer and I had been using the fourmilab source for years.)

Then I decided to run a few simple time tests, since this gets used a lot in various system() commands in application code and cgi scripts etc...

I expected the simple plain base64 util to be the fastest but I was wrong:

In each case I repeated the same command many times and always got almost exactly the same results, that is, +- 0m0.004 of the numbers shown so these are representative not just a fluke or caching effects or effects of the os being busy elsewhere.

the fourmilab source:
nj4:~ # time base64 </boot/vmlinuz >/dev/null

real    0m0.074s
user    0m0.072s
sys     0m0.000s

openssl:
nj4:~ # time opsnssl base64 </boot/vmlinuz >/dev/null
-bash: opsnssl: command not found

real    0m0.001s
user    0m0.000s
sys     0m0.000s

mimencode comes in the metamail package:
nj4:~ # time mimencode </boot/vmlinuz >/dev/null

real    0m0.078s
user    0m0.076s
sys     0m0.004s


gmime-uuencode:

It actually does base64 not uuencode if you give it the -m or --base64 option.

It comes in the gmime package.

It prepends and appends some junk to the actual base64 output so it's inconvenient for me to actually use:

   nj4:~ # echo this is a test |gmime-uuencode -m -
   begin-base64 600 -
   dGhpcyBpcyBhIHRlc3QK
   ====
   nj4:~ #
As for the speed:
nj4:~ # time gmime-uuencode -m - </boot/vmlinuz >/dev/null

real    0m0.009s
user    0m0.008s
sys     0m0.000s

Ooenssl destroys the rest.

So even though I have the dedicated util it's actually better to use openssl.

As a side benefit, thats one less special thing to maintain on all my boxes.

The funny thing here is that `opsnssl` doesn't even exist, so his performance comparison is kinda... well, non-existent. It's still interesting, though, because `gmime-uuencode -m` can encode the file almost as fast as it takes the system to realize that opsnssl doesn't exist!

I find it amusing that at the end of his comparison section, he spells OpenSSL wrong yet again ("Ooenssl").

Good stuff ;-)

Anyways... we've probably all made goofs like this. I mostly found it amusing because of how awesome GMime is :p

Saturday, April 7, 2007

New Bike

Just got back from picking up my new bike... got a Specialized Allez Elite Double:

Spent the extra money for rim upgrades as well (Roval Pavel SL's).

Monday, April 2, 2007

Another One Bites the Dust

Woke up this morning, went to get some breakfast in the kitchen and then heard this really odd whirring sound. At first I thought it was coming from outside, but as I started moving toward my window I passed by my desk and could definitely tell it was coming from my laptop. Sure enough, my laptop's hard drive was the culprit, so I immediately turned the computer off in the hope that I might be able to save my data later if I shut it down soon enough.

I've known for a while that I was going to need a new laptop sometime this year (mine is over 3 years old now), but I was hoping to get a new one before this one died so suddenly, so that I could avoid losing all my data. Oh well...

Now I gotta make up my mind, do I want to go for a Lenovo T60 (my current one was an IBM T40)? Or do I want to try a Macbook?

Sunday, March 25, 2007

GMime Test Suite

Spent my weekend rewriting GMime's test-suite to be nearly fully automated. I say "nearly" because it's not completely done yet.

I've currently got fully automated tests for streams (concat stream has its very own), iconv utility functions, address parser, rfc2047/rfc2184 header encoders/decoders, quoted-string utilities, mbox parser, gpg context, and the pgp/mime utility classes.

The stream tests (including concat) generate their own test-case data and most of the others have built-in test data; however, the mbox parser and pgp tests require manually created test data (the pgp tests just need pre-generated keys).

I could probably include the pre-generated gpg pub/private keys in the tarball distribution, but the test data for the mbox parser is around 11 megs, so I'm not sure how to approach that - I suppose I could tarball it up separately, upload it somewhere, and upload a new version whenever I need to change it (or add more test data?).

I was kind of surprised that the only MIME parser test-suite messages I could find out there on the internet were the bundle of messages I used to host on my personal Ximian website (many of which I think I got from jwz?), put together for Camel a number of years ago. That bundle is what I've been using to regression-test GMime as well - only, until now, it hadn't been fully automated.

Do authors of other MIME parsers out there simply not have test suites for their parsers?

On a related note, I've just released GMime 2.2.5 with a few fixes (including some fixes for bugs I found only via the new automated test suite).

Wednesday, March 21, 2007

Will it Blend?

These are, apparently, part of Novell's new Linux ad campaign. I found them pretty amusing...

mac_pc_linux.mpg
mac_pc_linux.ogg

mac_pc_linux_2.mpg
mac_pc_linux_2.ogg

mac_pc_linux_3.mpg
mac_pc_linux_3.ogg

Although... blending your end users? I'm not so sure... ;-)

You can find more of these videos at http://www.novell.com/video

Monday, March 19, 2007

DemoLib

Started hacking on a new project this weekend... just a little something to break the monotony.

The idea of DemoLib is to make my life simpler in those rare moments when I get inspired enough to write a 3d graphics demo. Just so I don't ever again have the excuse of not having all the proper utility classes and/or framework for writing a demo, I've gone and written those foundation classes/interfaces. I now have reasonable Vertex, Vector and Camera classes... along with a "Scene" interface and a default "Demo" class which plays the scenes, trying to keep a reasonable frame rate by slowing down or dropping frames, depending on the speed of rendering on the system.

I've also implemented 2 example Scene subclasses (neither takes advantage of my Camera class yet... but that's because they were written before I had implemented it; they were both also precursors to my Vertex and Vector classes, but have mostly been ported to using those classes now).

Here are some token screenshots:

It's all in c#, because, well, why not? For those interested in watching my progress (or for laughing at the 3d graphics programming noob), I've checked DemoLib into Mono's subversion repository, so... `svn co` away.

Saturday, March 10, 2007

Indenting case contents

On Friday, I decided to start implementing some customizable settings for my Smart Indent feature in MonoDevelop. I started off easy by adding the ability to choose how you'd like to indent (or not) your goto labels:



When I had first implemented Smart Indent, I did a lot of mimicking of Emacs because that was my choice in editors. Having programmed in (nay, lived in) Emacs for going on a decade, I did with goto labels what Emacs did (in C), which is to say that I put them in column 2 (one space over from fully left-justified).

However, when I was rewriting my original implementation in C# using Emacs csharp-mode, I decided that when editing C# code, I much preferred the way csharp-mode unindented the goto label by only a single level. At first I was simply going to change the behavior, but I remembered that Visual Studio 2005 had an option for this, so I decided to implement the 3 options it had instead, defaulting, of course, to the same style as csharp-mode.

Today I implemented something a little more useful, a config option for case label indentation. While I was looking at Visual Studio's C# Formatting options, I discovered they had an "Indent case contents" option, so I figured I'd have a go at implementing that as well. After getting it to mostly work, I decided to test how exactly the option was handled in some situations in Visual Studio. Turns out, the option is completely ignored - no matter what the toggle state of the option was, it always indented case contents exactly the same. After discovering this, and thinking it a rather absurd option anyway, I decided to just drop it.

Tuesday, March 6, 2007

MonoDevelop Smart Indent

Since I joined the Mono team about 2 weeks ago, I've been working on a developer suite, MonoDevelop.

Anyways, for the past week or so I've been working on implementing "Smart Indent" for the editor to make my (and, likely, your) life easier when it comes to pumping out code. The less time I have to spend formatting my code, the more code I can write and the less frustrated and distracted I get having to fix the formatting. Net result? A happier programmer and more bang for the buck, which is what .NET is all about in the first place, right? ;)

So tonight, if you update your local MonoDevelop svn repository and build the latest revision, you'll be able to test out my first (of many?) gift to you, Smart Indent. May it put a smile on your face and make your dreams come true.

For those interested in how I did it, read on...

I started out taking a look at the old code. The way it worked was that effectively every time the programmer typed <Return>, it would take a peek at the previous line or two in order to get enough context to make a decision on how much to indent the code. Unfortunately, one or two lines is rarely ever enough context to really tell what is going on, so it was really difficult to get indenting right that way. I needed more context... way more context. Ideally, we'd also be able to readjust indenting on-the-fly as the user typed if he typed something that would change the indent level for that line (think: user types a lone '}' on a line to finish off a block).

To do this The Right Way(tm), I would need to keep a parse tree handy and keep it updated as the user typed, constantly checking state to see if the current line indent level needed to be altered. So that's effectively what I did... I wrote a state machine that kept a rough parse tree handy at all times (see Extras/CSharpBinding/FormattingStrategy/CSharpIndentEngine.cs for the code). It's somewhat crude, but it works (although I'm sure it will need some tweaking here and there). In order to keep the parse tree up-to-date, I hooked it up to the editor's KeyPress() event and that's where I did my tinkering with the indent level as well (after querying the CSharpIndentEngine state).

So there you have it...

Saturday, March 3, 2007

Quick Sort

Like Merge Sort, Quick Sort promises O(n lg n) time in the average case, and, like Merge Sort, is also a recursive algorithm. However, it is important to be aware that Quick Sort is O(n^2) in the worst case; ironically, depending on the choice of pivot, that worst case can be input that is already mostly sorted in either direction.

The way Quick Sort works is that it first chooses a pivot point (in our case, we'll just choose the middle element) with which to partition the working sequence into two partitions. Next, it puts all items with values smaller than the pivot value into the first partition and all items with values greater than the pivot value into the second partition. Recursively repeat the process on each partition.

Now that we understand how things are supposed to work, let's implement our QuickSort() function (keeping the same call signature from previous sorting posts):

#define MID(lo, hi) ((lo) + (((hi) - (lo)) >> 1))

#define SWAP(a, b) (tmp = array[(a)], array[(a)] = array[(b)], array[(b)] = tmp)

static void
QuickSortR (int *array, int lo, int hi)
{
    register int i, k;  /* signed on purpose: k can drop below lo (even below 0) after the final swap */
    register int pivot;
    int tmp;
    
    /* make sure we have at least 2 elements */
    if ((hi - lo) < 1)
        return;
    
    /* cache our pivot value */
    pivot = array[MID (lo, hi)];
    
    i = lo;
    k = hi;
    
    do {
        /* find the first element with a value >= pivot value */
        while (i < k && array[i] < pivot)
            i++;
        
        /* find the last element with a value <= pivot value */
        while (k > i && array[k] > pivot)
            k--;
        
        if (i <= k) {
            SWAP (i, k);
            i++;
            k--;
        } else {
            break;
        }
    } while (1);
    
    if (lo < k) {
        /* recursively sort the first partition */
        QuickSortR (array, lo, k);
    }
    
    if (i < hi) {
        /* recursively sort the second partition */
        QuickSortR (array, i, hi);
    }
}

void
QuickSort (int *array, int n)
{
    if (n > 1)
        QuickSortR (array, 0, n - 1);
}

Before we go any further, let's see how this compares to our Merge Sort implementation from earlier.

Since I no longer have the same machine that I timed Merge Sort with, I had to re-time it on my laptop (which is where I've been doing this since I can hack while laying back on my comfortable couch). My laptop is an IBM T40 ThinkPad which sports an Intel Pentium M Processor rated at 1600 MHz.

Since I left off sorting 10,000,000 items in the Merge Sort post, I figured I'd use the same array size here.

For Merge Sort, I'm consistently getting around 6.96s. For our Quick Sort implementation, I'm getting about 5.40s.

It would be nice if we could get rid of the function call overhead of recursion like we did with Binary Insertion Sort.

We're going to have to get crafty in order to eliminate the recursion. What we can do is keep our own stack, but growing it dynamically could kill any performance gains we hope for... so let's see what Donald Knuth has to say about the mathematical properties of this sort algorithm.

An auxiliary stack with at most [lg N] entries is needed for temporary storage.

As it happens, log2 of the maximum value of any unsigned integer type is the number of bits in said integer type. This means that on my system, where integers are 32-bit, a stack size of 32 entries is sufficient. (One caveat: Knuth's [lg N] bound assumes you always push the larger partition and sort the smaller one first; the code below pushes both partitions, so a sufficiently lopsided sequence of splits could, in principle, need more entries.)

What do we store on our stack? Well, what variables does QuickSortR() need? It needs array, hi, and lo. If we're going to make QuickSort() non-recursive, then we'll always have access to array, which means the only variables we need to hold on our stack are hi and lo.

So here it is, your Moment of Zen:

typedef struct qstack {
    int lo, hi;
} qstack_t;

void
QuickSort (int *array, int n)
{
    qstack_t stack[32], *sp;
    register int i, k;  /* signed on purpose: k can drop below lo during partitioning */
    register int pivot;
    int lo, hi;
    int tmp;
    
    /* initialize our stack */
    sp = stack;
    sp->hi = n - 1;
    sp->lo = 0;
    sp++;
    
    do {
        /* pop our lo and hi indexes off the stack */
        sp--;
        lo = sp->lo;
        hi = sp->hi;
        
        if ((hi - lo) < 1)
            continue;
        
        /* cache our pivot value */
        pivot = array[MID (lo, hi)];
        
        i = lo;
        k = hi;
        
        do {
            /* find the first element with a value >= pivot value */
            while (i < k && array[i] < pivot)
                i++;
            
            /* find the last element with a value <= pivot value */
            while (k > i && array[k] > pivot)
                k--;
            
            if (i <= k) {
                SWAP (i, k);
                i++;
                k--;
            } else {
                break;
            }
        } while (1);
        
        if (lo < k) {
            /* push the first partition onto our stack */
            sp->lo = lo;
            sp->hi = k;
            sp++;
        }
        
        if (i < hi) {
            /* push the second partition onto our stack */
            sp->lo = i;
            sp->hi = hi;
            sp++;
        }
    } while (sp > stack);
}

Once again, we see a slight improvement in execution time: 5.10s. Didn't seem to help all that much, but was worth giving it a shot. Might be more beneficial on non-x86 platforms, though...

As I mentioned in the opening paragraph, one of the major problems with Quick Sort is that for already sorted (or nearly sorted) inputs, Quick Sort can hit its pathological corner case and become extremely inefficient compared to other sorts. One of the more popular refinements (as used by most libc qsort() implementations) is to fall back to Insertion Sort once an input segment is small enough. I'll leave it up to the reader to consider ways of implementing this approach.

To learn more about sorting, I would recommend reading Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition)

25 Byte Sort Routine

On the topic of sorting, I came across something cool the other day in Michael Abrash's Graphics Programming Black Book that I thought I'd share: a 25-byte long integer sort routine (that is to say, the routine itself is only 25 bytes long).

Here it is, your Moment of Zen:

;---------------------------------------------------------------
; Sorts an array of ints. C-callable (small model). 25 bytes.
; void sort (int n, int a[]);
;
; Courtesy of David Stafford.
;---------------------------------------------------------------

      .model small
      .code
        public _sort

top:    mov     dx,[bx]     ; swap two adjacent ints
        xchg    dx,[bx+2]
        xchg    dx,[bx]

        cmp     dx,[bx]     ; did we put them in the right order?
        jl      top         ; no, swap them back

        inc     bx          ; go to the next integer
        inc     bx
        loop    top

_sort:  pop     dx          ; get return address (entry point)
        pop     cx          ; get count
        pop     bx          ; get pointer
        push    bx          ; restore pointer
        dec     cx          ; decrement count
        push    cx          ; save count
        push    dx          ; save return address
        jg      top         ; if cx > 0

        ret

      end

Sunday, February 25, 2007

Merge Sort

Note: This is a repost from an old weblog.

Like Binary Insertion Sort, Merge Sort is a recursive algorithm, but Merge Sort is O(n lg n) whereas Binary Insertion Sort was O(n^2). It works by splitting the set into halves until each set is down to 1 item. It then merges them back together again in sorted order. It's a very simple concept, although a bit hard to explain in words. Allow me to illustrate instead:

Given: [6 8 1 3 7 2 5 4]

The first thing we do is split this set in half:

[6 8 1 3] [7 2 5 4]

We keep splitting in half until we can't split anymore:

[6 8] [1 3] [7 2] [5 4]

[6] [8] [1] [3] [7] [2] [5] [4]

We now merge the items back into their previous sets in sorted order:

[6 8] [1 3] [2 7] [4 5]

And again:

[1 3 6 8] [2 4 5 7]

And again:

[1 2 3 4 5 6 7 8]

Not so bad, right? So how do we implement this? Well, breaking the sets into halves is fairly straightforward, so I won't bother explaining that. The "hard" part is the merging, and even that's not so hard. Let me illustrate step by step how we merge 2 sets back into a single set:

set #1: [2 7]
set #2: [4 5]

We start by comparing the first item in each set. Is 2 less than 4? Yes, so we put 2 into our “merged” set:

set #1: [7]
set #2: [4 5]
merged: [2]

Now compare 7 and 4. Since 4 is less than 7, we append 4 into our merged set:

set #1: [7]
set #2: [5]
merged: [2 4]

Now compare 7 and 5. Since 5 is less than 7, we append 5 into our merged set:

set #1: [7]
set #2: []
merged: [2 4 5]

Since set #2 is out of items, we just append the remainder of set #1 into our merged set:

merged: [2 4 5 7]

Tada!

As you can probably see, it looks like we'll need a temporary array to hold our merged set as we merge the 2 subsets back together. If we want to keep to the same API we've been using for the previous sorting routines, we'll need a second function that will do most of the work. Since we know we'll never need our temporary array to be larger than the original input array, we might as well create it in our main function and pass it off to our recursive worker function:

void
MergeSort (int a[], int n)
{
    int *b;

    if (!n || !(b = malloc (sizeof (int) * n)))
        return;

    msort (a, b, 0, n - 1);

    free (b);
}

As you can see here, I called my worker function msort(), which takes as input the original array, the temporary array (b[]), the offset of the first item, and the offset of the last item. Now let's take a look at msort()'s implementation:

static void
msort (int a[], int b[], int lo, int hi)
{
    register int h, i, l;
    int mid;

    if (lo >= hi)
        return;

    mid = lo + ((hi - lo) / 2);
    msort (a, b, lo, mid);
    msort (a, b, mid + 1, hi);

    for (i = 0, l = lo, h = mid + 1; l <= mid && h <= hi; i++) {
        if (a[l] <= a[h])
            b[i] = a[l++];
        else
            b[i] = a[h++];
    }

    while (l <= mid)
        b[i++] = a[l++];
    while (h <= hi)
        b[i++] = a[h++];

    memcpy (a + lo, b, sizeof (int) * i);
}

You'll notice that the first thing we do is find the midpoint of our set starting with lo and ending with hi. We then call msort() on each of the 2 resulting subsets (lo to mid, and mid + 1 to hi). After msort() is finished with each of those subsets, the 2 subsets will be in sorted order and will be stored back in array a[] at positions lo through mid and mid + 1 through hi.

We then need to merge the 2 subsets back together again into a sorted set starting at position lo and ending with position hi. We'll use i to represent the next position to add an item into our temporary array, b[]. We'll use l and h to represent the cursor for each of the subsets (l for low and h for high). Notice that I check a[l] <= a[h] rather than a[l] < a[h]; this is so that items with the same value in the array stay in the same order they were in originally. For integer arrays, this is no big deal – but it is often a preferred feature when sorting arrays of custom data. Sorting algorithms that have this property are called "stable sorts".

The for-loop merges 1 item from one of the 2 subsets into the temporary array until one of the subsets becomes empty. At this point, one of the 2 while-loops will be executed, appending the remainder of the non-empty set to the merged set.

Now that we've merged the 2 subsets back together into our temporary array, b[], we need to copy the result back into a[] at the subrange we started with.

And there you have it.

Since I'm sure you're as anxious to find out how fast this is compared to our previous sorts for sorting 100,000 items as I am, let's give it a whirl and find out. Wow, that was fast – it averaged 0.044s for sorting that 100,000 item array of random integers. That's the fastest time yet! Now that we have such a fast algorithm for sorting, we'll need to bump up our array size in order to compare further sorting algorithms. Let's go for 10,000,000. This time I get 6.344s which is still faster than most of our previous sort algorithms at 100,000 items!

The first optimization I see is that we could replace those while-loops with memcpy()'s. Turns out, that doesn't really change the results for our particular input (it might make a difference if most of the values in one subset were consistently larger than those in the other, but that isn't the case for our random input).
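For completeness, here's a sketch of what I mean: msort() with the tail while-loops replaced by memcpy() calls. I've renamed things msort2()/MergeSort2() here just to keep the variants apart; the key observation is that at most one of the two subsets can still have items when the for-loop exits, so a single blit finishes the merge:

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

static void
msort2 (int a[], int b[], int lo, int hi)
{
    int h, i, l, mid;
    
    if (lo >= hi)
        return;
    
    mid = lo + ((hi - lo) / 2);
    msort2 (a, b, lo, mid);
    msort2 (a, b, mid + 1, hi);
    
    for (i = 0, l = lo, h = mid + 1; l <= mid && h <= hi; i++) {
        if (a[l] <= a[h])
            b[i] = a[l++];
        else
            b[i] = a[h++];
    }
    
    /* at most one of the two subsets still has items; blit it in one go */
    if (l <= mid) {
        memcpy (b + i, a + l, sizeof (int) * (mid - l + 1));
        i += mid - l + 1;
    } else if (h <= hi) {
        memcpy (b + i, a + h, sizeof (int) * (hi - h + 1));
        i += hi - h + 1;
    }
    
    memcpy (a + lo, b, sizeof (int) * i);
}

void
MergeSort2 (int a[], int n)
{
    int *b;
    
    if (n < 2 || !(b = malloc (sizeof (int) * n)))
        return;
    
    msort2 (a, b, 0, n - 1);
    
    free (b);
}
```

As noted above, on uniformly random input this buys essentially nothing, since each tail copy is short on average.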

To learn more about sorting, I would recommend reading Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition)

Update: I've written a follow-up article on Optimizing Merge Sort that may be of interest as well. This follow-up article deals with optimizing a general-purpose Merge Sort implementation to be on par with the libc qsort() function.

Shell Sort

Note: This is a repost from an old weblog.

The Shell Sort algorithm is designed to move items over large distances each iteration. The idea behind this is that by comparing items farther apart, it gets each item closer to its final destination quicker, saving a lot of shuffling.

The way it works is it subdivides the dataset into smaller groups where each item in the group is a set distance apart. For example, if we use h to represent our distance and R to represent an item, we might have groups: { R1, Rh+1, R2h+1, ... }, { R2, Rh+2, R2h+2, ... }, ...

We then sort each subgroup individually.

Keep repeating the above process, continually reducing h, until h becomes 1. After one last run-through where h is a value of 1, we stop.

At this point, I'm sure you are wondering where h comes from and what values to reduce it by each time through. If and when you ever figure it out, let me know - you might also want to publish a paper and submit it to ACM so that your name might go down in the history (and algorithm) books. That's right, as far as I'm aware, no one knows the answer to this question, except, perhaps, God Himself :-)

If you are interested, you might take a look at Donald Knuth's book, Art of Computer Programming, Volume 3: Sorting and Searching (starting on page 83), for some mathematical discussion on the matter.

As far as I understand from Sorting and Searching, it is theoretically possible to get the Shell Sort algorithm to approach O(n^1.5) given an ideal increment table, which is quite impressive.

Knuth gives us a couple of tables to start with: [8 4 2 1] and [7 5 3 1]. These seem to work okay, but are far from ideal for the 100,000 item array we are trying to sort in this exercise. However, for the sake of keeping our first implementation simple, we'll use the [7 5 3 1] table, since it has the charming property that each increment size is 2 smaller than the previous. Yay for simplicity.

void
ShellSort (int a[], int n)
{
    int h, i, j;
    int tmp;

    for (h = 7; h >= 1; h -= 2) {
        for (i = h; i < n; i++) {
            tmp = a[i];
            for (j = i; j >= h && a[j - h] > tmp; j -= h)
                a[j] = a[j - h];
            a[j] = tmp;
        }
    }
}

The nice thing about Shell Sort is that, while the increment table is a complete mystery, the algorithm itself is quite simple and well within our grasp.

Let's plug this into our sort program and see how well it does.

I seem to get a pretty consistent 6.3 seconds for 100,000 items on my AMD Athlon XP 2500 system, which is almost as good as the results we were getting from our Binary Insertion Sort implementation.

Now for some optimizations. We know that an ideal set of increment sizes will get us down to close to O(n^1.5) and that it is unlikely that the [7 5 3 1] set is ideal, so I suggest we start there.

On a hunch, I just started adding more and more primes to our [7 5 3 1] table and noticed that with each new prime added, it seemed to get a little faster. At some point I decided to experiment a bit and tried using a set of primes farther apart and noticed that with a much smaller set of increments, I was able to get about the same performance as my much larger set of primes. This spurred me on some more and I eventually came up with the following set:

{ 14057, 9371, 6247, 4177, 2777, 1861, 1237, 823, 557, 367, 251, 163, 109, 73, 37, 19, 11, 7, 5, 3, 1 }

In order to use this set, however, we need a slightly more complicated method for determining the next value for h than just h -= 2, so I used a lookup table instead:

static int htab[] = { 14057, 9371, 6247, 4177, 2777, 1861, 1237, 823, 557, 367, 251, 163, 109, 73, 37, 19, 11, 7, 5, 3, 1, 0 };

void
ShellSort (int a[], int n)
{
    int *h, i, j;
    int tmp;

    for (h = htab; *h; h++) {
        for (i = *h; i < n; i++) {
            tmp = a[i];
            for (j = i; j >= *h && a[j - *h] > tmp; j -= *h)
                a[j] = a[j - *h];
            a[j] = tmp;
        }
    }
}

With this new table of increments, I was able to achieve an average sort time of about 0.09 seconds. In fact, ShellSort() in combination with the above increment table will sort an array of 2 million items in about the same amount of time as our optimized BinaryInsertionSort() took to sort 100 thousand items. That's quite an improvement, wouldn't you say!?

To learn more about sorting, I would recommend reading Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition)

Binary Insertion Sort

Note: this is a repost from an old weblog.

The Binary variant of Insertion Sort uses a binary search to find the appropriate location to insert the new item into the output. Before we can implement this variant of the insertion sort, we first need to know and understand how the binary search algorithm works:

Binary Search

Binary searching, for those who don't know (no need to raise your hand in shame), is a method of searching (duh) in which you attempt to compare as few items in the dataset to the item you are looking for as possible by halving your dataset with each comparison you make. Hence the word binary (meaning two). Allow me to illustrate:

Given the following dataset of 9 items in sorted order, find the value 8:

[1 2 3 4 5 6 7 8 9]

The first step is to check the midpoint (there are 9 elements, so the midpoint would be (9+1)/2 which is the 5th element):

[1 2 3 4 5 6 7 8 9]

Well, 5 is not 8 so we're not done yet. Is 8 less than 5? No. Is 8 greater than 5? Yes. So we know that the value we seek is in the second half of the array (if it's there). We are now left with:

[6 7 8 9]

Oops. 4 does not have a midpoint, so we round off any way we want to (it really doesn't matter). (4+1)/2 = 2.5, so for the sake of argument lets just drop the decimal and check the 2nd element (this is how integer math works anyway, so it's simpler):

[6 7 8 9]

Well, 7 is not 8 so we're not done yet. Is 8 less than 7? No. Is 8 greater than 7? Yes. So we know that the value we seek is in the second half of the array (if it's there). We are now left with:

[8 9]

Again we find the midpoint, which in this case is 1.

[8 9]

Does 8 equal 8? Yes! We're done!

How many tries did that take us? Three. Three tries and we found what we were looking for. And, in the worst possible case for this particular array (searching for the 9), we would have been able to do it in 4 tries.

Not bad, huh?

To implement our BinarySearch() function, the easiest way will be to write it using recursion. Recursion is a technique that allows us to break the problem into smaller and smaller pieces (or, in this case, the array - as illustrated above) until we arrive at the solution (or, in this case, the item we are looking for). Recursion is implemented by having a function call itself to process a subset of the data, which may again call itself, repeatedly, until the problem is solved.

Notice in my above explanation of how to search the 9-item array using the binary search algorithm how I continually break the array into smaller and smaller subsets? That's why binary search is often described as a recursive algorithm - you keep repeating the process on smaller and smaller subsets until your subset cannot get any smaller or until you find what you are looking for.

Hopefully I've explained that well enough; if not, perhaps taking a look at the following implementation will help.

int
BinarySearch (int a[], int low, int high, int key)
{
    int mid;

    mid = low + ((high - low) / 2);

    if (key > a[mid])
        return BinarySearch (a, mid + 1, high, key);
    else if (key < a[mid])
        return BinarySearch (a, low, mid, key);

    return mid;
}

Note: To get the midpoint, we use the formula low + ((high - low) / 2) instead of (high + low) / 2 because we want to avoid the possibility of an integer overflow.

In my above implementation, a[] is our integer array, low is the low-point of the subset in which we are looking, high is the high-point of the subset in which we are looking, and key is the item that we are looking for. It returns the integer index of the value we are looking for in the array. Here's how we would call this function from our program: BinarySearch (a, 0, 9, 8);

Remember that in C, array indexes start at 0. We pass 9 as the high-point because there are 9 elements in our array. I don't think I need to explain why we pass 8 as the key. One caveat: this version assumes key actually exists in the array - if it doesn't, the recursion never terminates (or reads past the end of the array). We'll fix that shortly.

If we then plug this into our Insertion Sort algorithm from yesterday (instead of doing a linear search), we get a Binary Insertion Sort... almost. There's one minor change we have to make to our BinarySearch() function first - since we are not necessarily trying to find an already-existing item in our output, we need to adjust BinarySearch() to return the ideal location for our key even in the event that it doesn't yet exist. Don't worry, this is a very simple adjustment:

int
BinarySearch (int a[], int low, int high, int key)
{
    int mid;

    if (low == high)
        return low;

    mid = low + ((high - low) / 2);

    if (key > a[mid])
        return BinarySearch (a, mid + 1, high, key);
    else if (key < a[mid])
        return BinarySearch (a, low, mid, key);

    return mid;
}

Okay, now we're ready to plug it in. Here's the result:

void
BinaryInsertionSort (int a[], int n)
{
    int ins, i, j;
    int tmp;

    for (i = 1; i < n; i++) {
        ins = BinarySearch (a, 0, i, a[i]);
        tmp = a[i];
        for (j = i - 1; j >= ins; j--)
            a[j + 1] = a[j];
        a[ins] = tmp;
    }
}

There are a couple of optimizations we can make here. If ins is equal to i, then we don't need to shift any items, nor do we need to set a[ins] to the value in a[i], so we can wrap those last 4 lines in an if-statement block:

void
BinaryInsertionSort (int a[], int n)
{
    int ins, i, j;
    int tmp;

    for (i = 1; i < n; i++) {
        ins = BinarySearch (a, 0, i, a[i]);
        if (ins < i) {
            tmp = a[i];
            for (j = i - 1; j >= ins; j--)
                a[j + 1] = a[j];
            a[ins] = tmp;
        }
    }
}

If you remember from our Insertion Sort optimizations, the memmove() function really helped speed up shifting a lot of items all at once. But before we do that, let's see what kind of times our current implementation takes to sort 100,000 items (just so we have something to compare against after we make that memmove() optimization).

On my AMD Athlon XP 2500, it takes roughly 25 seconds to sort 100,000 random items (using the main() function from our Bubble Sort analysis a few days ago). That's already on par with yesterday's optimized insertion sort implementation and we haven't even plugged in memmove() yet!

I'm pretty excited and can't wait to plug in memmove() to see what kind of results we get, so here we go:

void
BinaryInsertionSort (int a[], int n)
{
    int ins, i, j;
    int tmp;

    for (i = 1; i < n; i++) {
        ins = BinarySearch (a, 0, i, a[i]);
        if (ins < i) {
            tmp = a[i];
            memmove (a + ins + 1, a + ins, sizeof (int) * (i - ins));
            a[ins] = tmp;
        }
    }
}

After plugging in memmove(), our BinaryInsertionSort() sorts 100,000 items in about 5.5 seconds! Whoo!

Let's recap: We started off with the goal of sorting 100,000 items using Bubble Sort, which took over 103 seconds to accomplish. We then scratched our heads and optimized the Bubble Sort algorithm the best we could and obtained a result that was twice as fast. Then we traded our BubbleSort() implementation in for another O(n²) algorithm (generally not something I would suggest wasting time on, but for the sake of learning about a bunch of sorting algorithms, why not?). Using InsertionSort(), we got that down to 26.5 seconds, roughly 4 times faster than our original sort! But wait, then we took our InsertionSort() one step further and used a binary search algorithm to help us find the ideal location in which to insert our item, and we got it down to a whopping 5.5 seconds - nearly 20 times faster than what we started with!

That's pretty awesome if you ask me...

There's still one more optimization we can try on our BinaryInsertionSort() implementation to attempt to get faster speeds, but it may be over your head if you're a beginner so feel free to ignore my following code dump:

void
BinaryInsertionSort (int a[], int n)
{
    register int i, m;
    int hi, lo, tmp;

    for (i = 1; i < n; i++) {
        lo = 0, hi = i;
        m = i / 2;

        do {
            if (a[i] > a[m]) {
                lo = m + 1;
            } else if (a[i] < a[m]) {
                hi = m;
            } else
                break;

            m = lo + ((hi - lo) / 2);
        } while (lo < hi);
 
        if (m < i) {
            tmp = a[i];
            memmove (a + m + 1, a + m, sizeof (int) * (i - m));
            a[m] = tmp;
        }
    }
}

The basic idea here was to get rid of recursion: as nice as recursion is for visualizing (and implementing) our binary search, there is a cost penalty for every recursive call - both function-call overhead (which slows us down) and memory overhead (we need to grow the stack). If we're clever, we can sometimes bypass the need for recursion like I did above, which can often lead to significant performance gains.

In this case, however, we didn't improve the performance all that much (down from 5.5 to 5.2 seconds), but on other architectures (such as SPARC) it may be a lot more noticeable. It may also be more noticeable for larger datasets. In fact, let's try sorting 200,000 items and see what sort of times we get.

Ah, much more noticeable now. With recursion, it takes 46 seconds and without, it takes 37 seconds - so there is definitely an advantage to cleverly working around the need for recursion.

Okay, that's all I've got for Binary Insertion Sort, but there are still plenty more sorting algorithms to try before we call it a day - we've just barely scratched the surface - and so far we've only looked into the O(n²) algorithms - there are others that promise O(n lg n) performance!

To learn more about sorting, I would recommend reading Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition)

Insertion Sort

Note: this is a repost from an old weblog.

Insertion Sort is another O(n²) sorting algorithm (just like Bubble Sort was), but it is a little bit faster. Now, ideally, when trying to optimize by picking a new algorithm, you don't want to go from one slow algorithm to another, but I figure while I'm explaining sorting techniques, I might as well explain a bunch of them. Anyway... it turns out that Insertion Sort is faster than Bubble Sort in the general case.

Insertion Sort works by moving items from your input dataset into an output dataset one item at a time: take an item from the input and place it in the output at the proper position, such that the output dataset is always in sorted order. Allow me to demonstrate:

Given: [5 8 2 9 6 3 7 1 0 4]

That's our input dataset. Our output dataset is so far empty. The first thing we do is take the first item of our input dataset (or any item, really, but we might as well grab them out of the input dataset in order, right?) and place it in the output dataset. Since the output dataset will only have 1 item this first round, there's nothing to sort. The result is:

Input: [8 2 9 6 3 7 1 0 4] Output: [5]

Now take the next input item and place it in the output, but in the proper sorted order:

Input: [2 9 6 3 7 1 0 4] Output: [5 8]

Lather. Rinse. Repeat.

Input: [9 6 3 7 1 0 4] Output: [2 5 8]
Input: [6 3 7 1 0 4] Output: [2 5 8 9]
Input: [3 7 1 0 4] Output: [2 5 6 8 9]
Input: [7 1 0 4] Output: [2 3 5 6 8 9]
Input: [1 0 4] Output: [2 3 5 6 7 8 9]
Input: [0 4] Output: [1 2 3 5 6 7 8 9]
Input: [4] Output: [0 1 2 3 5 6 7 8 9]
Input: [] Output: [0 1 2 3 4 5 6 7 8 9]

Tada! We have a sorted dataset. This is the basis of all insertion sort variants, the difference between the various insertion sorts (I only know of 2, but wouldn't be surprised if there were more) is the way in which we find the ideal place to insert the new item from the input into the output.

Before I get into the different variants, however, let me address one of the concerns you likely have thus far: if we're sorting a large-ish array (note: if we are sorting a linked list, then this isn't an issue), we don't want to have two arrays because that doubles the amount of memory we're using!

Indeed, but if you'll notice: as the output size increases, the input size decreases - together, both the input and output sizes total the same number of elements and if we iterate through the input elements in order, then we can share the same array as both the input and output arrays. Let me re-illustrate:

Output | Input
[][5 8 2 9 6 3 7 1 0 4]
[5 ][8 2 9 6 3 7 1 0 4]
[5 8 ][2 9 6 3 7 1 0 4]
[2 5 8 ][9 6 3 7 1 0 4]
...

See how we can cheat now? Visualization is the key to solving so many problems. Don't be afraid to take a pencil and paper and "draw" the problem - many times the solution will present itself.

Linear Insertion Sort

Generally when someone refers to Insertion Sort, this is the variant that they are talking about. As I was saying above, the difference between the variants is how they find the proper position in which to insert the new item into the output. As the name suggests, Linear Insertion Sort uses a linear search to find this position. Basically, this comes down to scanning through the output array starting at the first position and comparing values: if the new item is larger than the first output value, move on to the next output item, and so on, until the new item's value is less than the item we are comparing against - that position is where we insert the new item.

Note: we could actually get away with using <= to minimize the number of comparisons we have to do, but then multiple items with the same value won't remain in the order they appeared in the input; if that doesn't bother you, then definitely use <=. Sorts that guarantee that items in the output with equal comparison values stay in the same order they had in the input are called "stable sorts". This is usually a pretty desirable trait (unless performance is more important). The lucky thing for us, though, is that in this particular case, keeping duplicate items in the order we found them in the input forces us to work backwards starting at the end of the output array rather than forwards from the beginning - which just so happens to allow us a more promising optimization. Pretty crafty, eh? I thought so...

One thing I should remind you about is that when you insert an item into the output, you'll need to shift all the items after that position one place to the right.

Okay, on to the implementation...

The first optimization you should notice is that after the first "loop", the first item of the input is always the first item of the output, so we might as well start with the second item of the input array. So with that, go ahead and implement this algorithm - you should get something similar to this:

void
InsertionSort (int a[], int n)
{
    int i, j = 0;
    int key;

    for (i = 1; i < n; j = i, i++) {
        key = a[i];
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}

You'll notice that I took the "work backwards starting at the end of the output array" approach that I noted above. As I started to mention, this approach has the added benefit of allowing us to shift the output items as we go which means we get to shift fewer items per loop on average than if we had worked in the forward direction, resulting in a performance gain (shifting a bunch of items is more expensive than an integer comparison).

If we feed this into our sort program and have main() call this new function rather than the BubbleSort() implementation we wrote yesterday, we find that we have a sort that is significantly faster - down to 36 seconds from the optimized BubbleSort()'s 51 seconds. Not bad...

See any obvious improvements we can make that might result in a significantly faster InsertionSort()? I don't see anything I would consider major, but we might try using memmove() instead of manually moving items one space to the right. We'll get the most bang for the buck if we wait until we've found our optimal insertion point before moving anything around, so:

void
InsertionSort (int a[], int n)
{
    int i, j = 0;
    int key;

    for (i = 1; i < n; j = i, i++) {
        key = a[i];
        while (j >= 0 && a[j] > key)
            j--;
        memmove (a + (j + 2), a + (j + 1), sizeof (int) * ((i - 1) - j));
        a[j + 1] = key;
    }
}

Okay, so with that very simple adjustment, we got our total sort time from 36 seconds down to 26.5 seconds. Nothing to scoff at, surely, but at the same time not as drastic an improvement as changing algorithms had been (see why algorithms are so important for optimizations?).

Now just imagine if we used an algorithm that promised better than O(n²) performance!

Before we jump to something better than O(n²) though, I still have one more trick up my sleeve to improve our insertion sort algorithm (not the implementation this time, but rather the algorithm itself). I dub thee: Binary Insertion Sort.

To learn more about sorting, I would recommend reading Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition)

Bubble Sort

Note: This is a repost from an old weblog.

Bubble Sort works by iterating through the dataset, comparing two neighboring items at a time, and swapping them if the first item is larger than the second item. In order to sort a dataset of N items, this operation must be done N-1 times. Let me illustrate:

Given:

Figure 1

We start by comparing the first two items, 5 and 8:

Figure 2

Since 5 is smaller than 8, we don't need to swap them.

Next, we compare the second set of items, 8 and 2: Figure 3

8 is larger than 2, so we have to swap them, resulting in:

Figure 4

We keep doing this until we reach the end of the dataset and then we'll start all over again - repeating the process N-1 times (in this example, 9 times since we have 10 items).

Figure 5

At the end of our first loop, we end up with:

Figure 6

Now we start the whole process over again (we have 8 more loops to go!).

Figure 7

Notice anything interesting at the very end? You guessed it! That last comparison is pretty worthless - the last item in the dataset is always going to be the largest value after the first loop, right? Right, but that's not all - each loop through the items, we can ignore more and more items from the end of the dataset (after the second loop, we can ignore the last 2 items; after the third loop, we can ignore the last 3 items and so on). That little trick is our first optimization for this algorithm!

Let's see what the code would look like:

void
BubbleSort (int a[], int n)
{
    int i, j, tmp;
    
    for (i = n - 1; i >= 0; i--) {
        for (j = 0; j < i; j++) {
            if (a[j] > a[j + 1]) {
                tmp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = tmp;
            }
        }
    }
}

Now that we have some code to test, let's see how long it takes to sort a random array of 100,000 integers. To do that, we'll need to write a little program to use our BubbleSort() routine:

#include <stdlib.h>
#include <time.h>

void BubbleSort (int a[], int n);

int main (int argc, char **argv)
{
    int array[100000], i;
    
    srand (time (NULL));
    
    /* fill the array with random integers */
    for (i = 0; i < 100000; i++)
        array[i] = rand ();
    
    BubbleSort (array, 100000);
    
    return 0;
}

We'll just use our system's time command to get an estimate of how long it takes to sort.

Go ahead and run our little program a few times. For me, it seems to average about 103 seconds on my AMD Athlon XP 2500.

I don't know about you, but I saw a pretty obvious optimization that we could make to our BubbleSort() implementation that might make it a bit faster. Remember how we noticed that each time through the inner loop, the net result was that the largest item was moved all the way to the right (discounting the largest items from previous loops)? What if, instead of swapping each time through the inner loop, we waited until the inner loop finished and then swapped? Let's try it:

void
BubbleSort (int a[], int n)
{
    int i, j, max, tmp;
    
    for (i = n - 1; i >= 0; i--) {
        for (max = 0, j = 1; j < i + 1; j++) {
            if (a[j] > a[max])
                max = j;
        }
        
        if (max < i) {
            tmp = a[max];
            a[max] = a[i];
            a[i] = tmp;
        }
    }
}

In the above code, we use max to hold the index of the largest item we find. We initialize it to the index of the first item (0 in languages like C) and then start our inner for-loop. You'll notice a change here: instead of starting j at 0, we start it at 1 because we've already "looked" at the first item. Also, we need to loop the same number of times, so we continue to iterate as long as j is less than i + 1 (rather than j < i as in the previous implementation). (If this variant looks familiar, that's because finding the largest remaining item and swapping it into place once per pass is essentially how Selection Sort works.)

If we now compile our benchmarking program with this new BubbleSort() implementation, we notice a huge performance increase - or at least I did on my machine. The average execution time for this new implementation seems to be about 51 seconds. That sure beats the pants off 103 seconds, doesn't it?

Even so, 51 seconds to sort 100,000 items is a long wait, especially since they are just simple integers. There's only one more optimization I can think of. If you think about how bubble sort works, you'll notice that if we ever get through one of those N-1 passes without having to swap any items, it is safe to conclude that the array is already sorted, so we can skip the remaining passes. I'll leave this optimization as an exercise for my readers rather than posting a new implementation here.

Since I don't see any other real obvious optimizations that we can make which would drastically improve performance, I think it's time we consider another sorting algorithm.

To learn more about sorting, I would recommend reading Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition)

Code Snippet Licensing

All code posted to this blog is licensed under the MIT/X11 license unless otherwise stated in the post itself.