Saturday, May 31, 2008

Interesting link of the day

What IronRuby Running Rails *REALLY* Means and Why Miguel de Icaza Deserves The Credit

Grats to Miguel ;-)

GMime 2.3.1

I've been continuing to make improvements to the API over the past few days and have just made a 2.3.1 release.

The main focus of this release has been a redesign of the header API which has been nagging at me for quite a while now. To fix it, I've done the following:

  • Renamed GMimeHeader to GMimeHeaderList; GMimeHeader felt more like it should refer to a single header (a name/value pair).
  • Introduced a new GMimeHeader, although it is internal-only at the moment. I had originally planned on making it public (hence the rename of the old GMimeHeader struct).
  • Introduced a GMimeHeaderIter, which can be used to iterate forward and backward over the headers and allows you to change header values (but not names). It also allows removal of headers, making it possible to remove any header you want, whereas the old API would only let you remove a header by name (a problem if there was more than one header with the same name).
  • Got rid of g_mime_header_foreach(), because that API sucked; it has been replaced by the far more useful GMimeHeaderIter API.

The GMimeHeaderIter API looks like this:

GMimeHeaderIter *g_mime_header_iter_copy (GMimeHeaderIter *iter);
void g_mime_header_iter_free (GMimeHeaderIter *iter);

gboolean g_mime_header_iter_equal (GMimeHeaderIter *iter1, GMimeHeaderIter *iter2);

gboolean g_mime_header_iter_is_valid (GMimeHeaderIter *iter);

gboolean g_mime_header_iter_first (GMimeHeaderIter *iter);
gboolean g_mime_header_iter_last (GMimeHeaderIter *iter);

gboolean g_mime_header_iter_next (GMimeHeaderIter *iter);
gboolean g_mime_header_iter_prev (GMimeHeaderIter *iter);

gint64 g_mime_header_iter_get_offset (GMimeHeaderIter *iter);
const char *g_mime_header_iter_get_name (GMimeHeaderIter *iter);
gboolean g_mime_header_iter_set_value (GMimeHeaderIter *iter, const char *value);
const char *g_mime_header_iter_get_value (GMimeHeaderIter *iter);

/* if the iter is valid, removes the current header and advances
   to the next - returns TRUE on success or FALSE otherwise */
gboolean g_mime_header_iter_remove (GMimeHeaderIter *iter);
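
To give a feel for it, here's a minimal sketch of dumping a message's headers with the new API (using g_mime_header_list_get_iter(), described below, to obtain the iter):

#include <stdio.h>
#include <gmime/gmime.h>

static void
dump_headers (GMimeHeaderList *headers)
{
    GMimeHeaderIter *iter;

    iter = g_mime_header_list_get_iter (headers);

    /* walk forward over the headers, printing each name/value pair */
    while (g_mime_header_iter_is_valid (iter)) {
        printf ("%s: %s\n", g_mime_header_iter_get_name (iter),
                g_mime_header_iter_get_value (iter));

        if (!g_mime_header_iter_next (iter))
            break;
    }

    g_mime_header_iter_free (iter);
}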

Currently, the way to get a GMimeHeaderIter is to call g_mime_header_list_get_iter(), which returns a newly allocated iter. I'm not sure if this is the best API, though... some other thoughts I've had are:

GMimeHeaderIter *g_mime_header_iter_new (void);
gboolean g_mime_header_list_get_iter (GMimeHeaderList *list, GMimeHeaderIter *iter);

The second function would initialize iter and return TRUE or, if list was empty, return FALSE.
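
Under that scheme, usage would look something like this (hypothetical, of course, since neither function exists yet):

GMimeHeaderIter *iter = g_mime_header_iter_new ();

/* initialize iter to the first header; FALSE would mean the list was empty */
if (g_mime_header_list_get_iter (list, iter)) {
    do {
        /* ... inspect or modify the current header ... */
    } while (g_mime_header_iter_next (iter));
}

g_mime_header_iter_free (iter);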

Another option would be to just have:

GMimeHeaderIter *g_mime_header_iter_new (GMimeHeaderList *list);

Then, if list is empty you'd just get back an invalidated iter. If, later, headers were added to list, then perhaps the iter could auto-magically become valid again. This would more-or-less work the same as the current code - except that the way to instantiate a GMimeHeaderIter is different.

To be honest, now that I've written these 2 ideas down, I think I prefer the first idea.

So... iterators. Since you can have multiple iterators for the same GMimeHeaderList, it is important to invalidate them on at least two occasions:

  • when the GMimeHeaderList is destroyed
  • when the header that an iter points to is removed; either by a call to g_mime_header_list_remove() or via a call to g_mime_header_iter_remove() on another iter instance.

You'll be pleased to note that only iters that reference a header that got removed are invalidated. This fact seems to be an argument in favor of my first idea above, as it would allow re-use of iters once they get invalidated.

Thursday, May 29, 2008

GMime 2.3.0

GMime 2.3.0 has been released which marks the first development release of GMime since 2002. A new age has begun.

This new development release gets rid of off_t in all public interfaces, replacing it with gint64 so that there is no API/ABI breakage when building with Large File support. But more importantly (to me), it also fixes a number of the API inconsistencies and ugliness that I discovered while looking over the GMime-Sharp bindings.

The GMime-Sharp bindings still have a long way to go to be beautiful, but they are becoming more of a priority as I am more interested in using GMime-Sharp myself. If you are a user of the GMime-Sharp bindings, I'd love for you to try out GMime 2.3.0 and give me some feedback on how to improve the API even more. I myself will continue to scrutinize the API and make adjustments as I go - in the end, I hope to have the most beautiful MIME .NET API ever created.

At first, I was only going to fix GMime's public interfaces to use gint64 instead of off_t and save the rest of my changes for a "3.0", or at least a "2.6", but I decided that since I was breaking the ABI anyway, I might as well do some cleanup. I think developers worldwide will thank me after the initial pain of porting their software to the new API (for which I hope to write a script to ease their troubles).

These changes are largely renamings of functions, sometimes shuffling them to become methods on a base class, and so on.

I got rid of GMimePartEncodingType and created a GMimeContentEncoding enum instead, complete with g_mime_content_encoding_to_string() and g_mime_content_encoding_from_string(). I also created a new state object called GMimeEncoding (which I may rename to GMimeEncoder?), which simplifies the base64/QuotedPrintable/UU encoder/decoder APIs a little in that it handles initializing the state variables for you as you encode or decode to any of those encodings.
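
For instance, round-tripping an encoding through the new helpers might look like this (a sketch - the exact enum member spelling, GMIME_CONTENT_ENCODING_BASE64, is my assumption):

GMimeContentEncoding encoding;

/* parse a Content-Transfer-Encoding value into the enum */
encoding = g_mime_content_encoding_from_string ("base64");

if (encoding == GMIME_CONTENT_ENCODING_BASE64)
    printf ("%s\n", g_mime_content_encoding_to_string (encoding));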

As I was updating code to use these new enums/structs, I was able to simplify GMimeFilterBasic so that it no longer needs its own GMimeFilterBasicType enum - instead, you pass the encoding and a boolean (encode vs. decode) to the new g_mime_filter_basic_new() constructor. This has helped simplify a lot of code inside GMime, and I'm sure it will help simplify everyone else's code as well, because there is no longer a need to map GMIME_PART_ENCODING_BASE64 to GMIME_FILTER_BASIC_TYPE_BASE64_ENC or GMIME_FILTER_BASIC_TYPE_BASE64_DEC.
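
Creating a basic filter then reduces to something like this (again a sketch, assuming the constructor takes the encoding enum plus an encode/decode boolean as described above):

GMimeFilter *filter;

/* FALSE = decode; TRUE would encode */
filter = g_mime_filter_basic_new (GMIME_CONTENT_ENCODING_BASE64, FALSE);

/* ... attach it to a GMimeStreamFilter and run the data through ... */

g_object_unref (filter);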

I've also fixed GMimeContentType and GMimeContentDisposition to notify their parent GMimeObject when they are changed so that the following C#ism becomes possible:

part.ContentType.SetParameter ("name", "fubar.txt");

Previously, you'd have to use the GMimeObject's Content-Type helper methods instead of using the Content-Type's methods directly, like so:

part.SetContentTypeParameter ("name", "fubar.txt");

There are quite a few other improvements like this that I've added to the GMime-Sharp API as well. Sure, they may be minor, but they make the API much more polished.

Wednesday, May 28, 2008

A Moment of Zen

And here it is, your Moment of Zen:

Those who assume the worst in other people are, by their very nature, among the worst people.

Those who assume the best in other people are, by their very nature, among the best people.

-- Jeffrey Stedfast

Saturday, May 17, 2008

Debian Language Benchmarks - SumFile

Since someone brought it up the other day on IRC, I figured I'd take a look at some of the Java vs C# benchmarks where Java was claimed to be faster than C#.

Scrolling through the list, there were 3 tests where Java was supposedly more than 2x faster than C# on Mono.

The SumFile test looked fairly simple and I figured the bottleneck there might be related to something silly being done with StandardInput (Miguel replaced the Int.Parse() routines with my more optimal implementations a while back, so I figured that was probably ok).

The first thing I tried to do was reproduce their findings, so I grabbed the Java and C# sources from the Debian site and compiled them.

Both programs read an integer value per line from StandardInput, so I wrote a simple program to generate some input for the test:

#include <stdio.h>
#include <stdlib.h>

int main (int argc, char **argv)
{
    unsigned int max, i;
    int sum = 0;

    max = atoi (argv[1]);

    /* keep a running total that wraps like a 32bit signed int;
       this is the expected answer for the C# and Java programs */
    for (i = 0; i <= max; i++) {
        printf ("%u\n", i);
        sum += i;
    }

    fprintf (stderr, "master value: %d\n", sum);

    return 0;
}

As you can see above, the program prints to stderr the true sum of all the values it outputs, so that we can verify that the results of the C# and Java programs are correct for 32bit signed integers.

Then I compiled the Java and C# versions using the same compile options mentioned on the Debian Language Shootout pages for each language:

javac -classpath '.' sumcol.java

and

gmcs sumcol.cs

I then ran each program using the following commands (using the same java and mono command-line options):

[fejj@moonlight benchmarks]$ time ./output 5000000 | java -server -Xbatch sumcol
master value: 1647668640
2147483647

real    0m4.606s
user    0m5.264s
sys     0m0.276s
[fejj@moonlight benchmarks]$ time ./output 5000000 | mono sumcol.exe
master value: 1647668640
1647668640

real    0m1.415s
user    0m2.136s
sys     0m0.228s

As you can see above, there's no way that Java is 2.3x faster than Mono... so I'm not sure where they are getting these results from. Not only that, but Java is getting the wrong answer!

My best guess is that Java is not wrapping the integer overflow; perhaps the StreamTokenizer version accumulates its sum as a double (StreamTokenizer parses numbers into a double), and Java clamps the double-to-int conversion at Integer.MAX_VALUE - hence the 2147483647 above. I suppose that's okay... but it's a bit odd.

For comparison's sake, here's what I get with the C implementation of sumcol:

[fejj@moonlight benchmarks]$ time ./output 5000000 | ./sumcol
master value: 1647668640
1647668640

real    0m1.043s
user    0m1.604s
sys     0m0.188s

Thinking that Java might be taking a performance hit from the bounds checking, I changed my original number-generating program to always print '1' on each line:

#include <stdio.h>
#include <stdlib.h>

int main (int argc, char **argv)
{
    unsigned int i, max;
    int sum = 0;

    max = atoi (argv[1]);

    for (i = 0; i < max; i++) {
        printf ("1\n");
        sum++;
    }

    fprintf (stderr, "master value: %d\n", sum);

    return 0;
}

Running the C, Mono, and Java versions again, I get the following results:

[fejj@moonlight benchmarks]$ time ./output 5000000 | ./sumcol
master value: 5000000
5000000

real    0m0.601s
user    0m0.828s
sys     0m0.032s
[fejj@moonlight benchmarks]$ time ./output 5000000 | mono sumcol.exe
master value: 5000000
5000000

real    0m0.774s
user    0m0.876s
sys     0m0.064s
[fejj@moonlight benchmarks]$ time ./output 5000000 | java -server -Xbatch sumcol
master value: 5000000
5000000

real    0m0.751s
user    0m0.824s
sys     0m0.096s

Here we can see that Java is slightly faster than Mono when comparing the 'real' and 'user' times.

I figured that in order to get a more accurate reading, I should re-run using a larger value, so...

[fejj@moonlight benchmarks]$ time ./output 21474836 | java sumcol
master value: 21474836
21474836

real    0m2.625s
user    0m3.236s
sys     0m0.304s
[fejj@moonlight benchmarks]$ time ./output 21474836 | mono sumcol.exe 
master value: 21474836
21474836

real    0m3.225s
user    0m3.712s
sys     0m0.272s

Again, Java is faster according to 'real' and 'user' times.

What have we learned?

The Java program appears to be slightly faster if and only if we do not overflow the signed integer; otherwise, Mono takes the prize.

If we revert back to using the original program I wrote to generate input and re-run using our most recent 'max' value, we find:

[fejj@moonlight benchmarks]$ time ./output 21474836 | java sumcol
master value: 370655678
2147483647

real    0m19.157s
user    0m22.757s
sys     0m0.568s
[fejj@moonlight benchmarks]$ time ./output 21474836 | mono sumcol.exe
master value: 370655678
370655678

real    0m6.004s
user    0m9.513s
sys     0m0.940s

Here we see that Mono clearly outperforms Java, no matter how you slice it.

Conclusion?

You can't compare the performance of the Java and C# programs without first understanding the implications of the input provided to them.

Had the Debian Language Shootout used a different input, the performance tests might instead have shown Java getting crushed by Mono for this particular test. Basically, Java just got lucky that the inputs were in its favor.

Update: Oops, I forgot to give specifics of my Mono/Java versions and the specs of my laptop:

Mono: JIT compiler version 1.9 (/trunk/ r103228)

java -version gives the following output:

java version "1.6.0_06"
Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
Java HotSpot(TM) Server VM (build 10.0-b22, mixed mode)

The hardware used was my Lenovo T61 (Core2Duo @ 2.4GHz).

This was running on OpenSuSE 10.3 with the latest updates.

Update 2: I was pointed to http://shootout.alioth.debian.org/download/sumcol-input.txt to use as input for this SumFile test. Below I've pasted the results:

[fejj@moonlight benchmarks]$ time java -server -Xbatch sumcol < ~/sumcol-input.txt 
500

real    0m0.137s
user    0m0.052s
sys     0m0.028s
[fejj@moonlight benchmarks]$ time mono sumcol.exe < ~/sumcol-input.txt 
500

real    0m0.078s
user    0m0.060s
sys     0m0.012s

Sorry guys, but even with this input, Java is not 2x faster than Mono.

Update 3: Since the above comparisons were all done using the StreamTokenizer version of the Java implementation, I figured I should go back and get the run times of the original Java implementation.

Using my original implementation of output.c, each line having a value 1 larger than the previous line, we get:

[fejj@moonlight benchmarks]$ time ./output 5000000 | java -server -Xbatch sumcol
master value: 1647668640
1647668640

real    0m1.492s
user    0m2.272s
sys     0m0.168s

Interesting note: this Java implementation does not get the wrong answer...

The results for "1" on every line are as follows:

[fejj@moonlight benchmarks]$ time ./output 5000000 | java -server -Xbatch sumcol
master value: 5000000
5000000

real    0m0.924s
user    0m0.984s
sys     0m0.176s

Repeating the trial run that totally killed Java's performance with the Tokenizer implementation, we get:

[fejj@moonlight benchmarks]$ time ./output 21474836 | java -server -Xbatch sumcol
master value: 370655678
370655678

real    0m6.564s
user    0m9.893s
sys     0m0.688s

And finally, for sumcol-input.txt, we get:

[fejj@moonlight benchmarks]$ time java -server -Xbatch sumcol < ~/sumcol-input.txt 
500

real    0m0.107s
user    0m0.056s
sys     0m0.032s

Conclusion:

The original Java implementation gets much better results than the StreamTokenizer implementation, which was supposedly faster according to the Benchmarks table. However, it's still not 2x faster than the Mono implementation; in fact, this version of the Java implementation is roughly on par with the C# implementation.

Update 4: One of the comments to this blog post requested I try the Java6 SumCol #3 program which includes its own buffering/tokenizer implementation.

Since this program "cheats" a bit, I figured I'd pit it against my own implementation of equal craftiness ;-)

You can find my C# implementation here.

The results are as follows:

[fejj@moonlight benchmarks]$ time ./output 21474836 | mono sumcol2.exe
master value: 370655678
370655678

real    0m4.996s
user    0m4.904s
sys     0m0.676s
[fejj@moonlight benchmarks]$ time ./output 21474836 | java -server -Xbatch sumcol3
master value: 370655678
370655678

real    0m5.117s
user    0m5.748s
sys     0m1.088s

Am I a crafty bugger or what? ;-)

Running these implementations against sumcol-input.txt, I get the following results:

[fejj@moonlight benchmarks]$ time mono sumcol2.exe < ~/sumcol-input.txt 
500

real    0m0.076s
user    0m0.064s
sys     0m0.016s
[fejj@moonlight benchmarks]$ time java -server -Xbatch sumcol3 < ~/sumcol-input.txt 
500

real    0m0.108s
user    0m0.064s
sys     0m0.040s

Damn I'm good! ;-)

Update 5: It's been suggested that the -server and -Xbatch args might make Java slower, so I've redone the tests (this time using sumcol-input.txt repeated 100,000 times).

Here are the results:

[fejj@moonlight benchmarks]$ time java -server sumcol < sumcol-input100000.txt 
50000000

real    0m19.025s
user    0m18.657s
sys     0m0.412s
[fejj@moonlight benchmarks]$ time java sumcol < sumcol-input100000.txt 
50000000

real    0m20.643s
user    0m20.269s
sys     0m0.364s
[fejj@moonlight benchmarks]$ time mono sumcol.exe < sumcol-input100000.txt 
50000000

real    0m18.884s
user    0m18.245s
sys     0m0.604s

In the case of Java, it looks like the -server flag performs a little bit better on this particular test than running without it.

Overall, the Java and Mono performance for this test are remarkably close.

In the end, I think I can feel pretty confident that Mono's performance on this test is acceptable, and it's not worth wasting any more time on it for now.

Update 6: Okay, one last update to show that a fully managed C# implementation can outperform the fastest C implementation submitted to the Debian Language Benchmarks:

[fejj@moonlight benchmarks]$ time mono sumcol2.exe < sumcol-input100000.txt 
50000000

real    0m2.906s
user    0m2.572s
sys     0m0.308s
[fejj@moonlight benchmarks]$ time ./sumcol < sumcol-input100000.txt 
50000000

real    0m13.967s
user    0m13.277s
sys     0m0.668s

Yes, ladies and gentlemen, that is C# wiping the floor with C.

Saturday, May 10, 2008

Wouldn't it be nice if...

Over the past week, I've been spending some time hacking on Evolution again because of my frustration with the current IMAP backend. This got me to wondering... why hasn't anyone stepped up to the plate and rewritten Evolution's IMAP code yet?

I think the reason can be summed up with the following 2 problems:

1. IMAP is hard

2. Coding something complicated like a multithreaded multiplexed IMAP client library in C is harder.

That got me and Michael Hutchinson wondering... wouldn't it be nice if we could write Camel provider plugins in C# or in any other managed language that we might prefer?

I think Camel, like Evolution itself, should allow developers to implement plugins in C# as well. I really think this might help lessen the burden of implementing new mail protocol backends for Camel/Evolution.

On that note, I've created a new svn module called camel-imap4 which can be built against your installed evolution-data-server devel packages.

Currently, however, it'll probably only work with e-d-s >= 2.23.x because some things (and assumptions) in Camel have changed recently.

One problem I'm having is that the symbol camel_folder_info_new() did not exist in older versions of e-d-s; it was only added recently, and it makes use of g_slice_alloc0(). The problem is that providers used to allocate CamelFolderInfo structures themselves using g_new0(). Why does this pose a problem? There's no guarantee that I'm aware of that you can mix and match g_malloc()/g_slice_free() or g_slice_alloc()/g_free().
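
To make the conflict concrete, the mismatch looks roughly like this (an illustrative fragment, not real provider code):

/* old-style provider code allocates the struct itself: */
CamelFolderInfo *fi = g_new0 (CamelFolderInfo, 1);

/* ...but if camel_folder_info_new() allocates with g_slice_alloc0(),
 * then newer e-d-s code will free folder-info structs with
 * g_slice_free() - and freeing a g_new0()'d pointer with g_slice_free()
 * (or a g_slice_alloc0()'d pointer with g_free()) is not guaranteed
 * to be safe. */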

This makes it difficult for me to implement a plugin that builds and works with my installed version of Evolution (2.12) and also work with Evolution svn (2.23). This is quite unfortunate :(

While I'm at it, allow me to also propose some changes to the GChecksum API. Please, please, please make it so that we need not allocate/free a GChecksum variable each time we need to checksum something.

I propose the ability to do the following:

GChecksum checksum;

g_checksum_init (&checksum, G_CHECKSUM_MD5);
g_checksum_update (&checksum, data, len);
g_checksum_get_digest (&checksum, digest, &len);

Then I'd like to be able to either call g_checksum_init() on checksum again, or have another function to clear the state - maybe g_checksum_clear() - which would allow me to re-use the same checksum variable for calculating the md5 of some other chunk of data.

Camel generates md5sums for a lot of data, sometimes in loops. Having to alloc/free every iteration is inefficient and tedious.
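
To illustrate the pattern I'm after, here's a hypothetical loop (g_checksum_clear() is the proposed addition and does not exist; the messages array and save_digest() helper are made up for the example):

GChecksum checksum;
guint8 digest[16];
gsize len;
guint i;

g_checksum_init (&checksum, G_CHECKSUM_MD5);

for (i = 0; i < n_messages; i++) {
    g_checksum_update (&checksum, messages[i].data, messages[i].len);

    len = sizeof (digest);
    g_checksum_get_digest (&checksum, digest, &len);

    save_digest (messages[i].uid, digest, len);

    /* proposed: clear the state so the same variable can be re-used */
    g_checksum_clear (&checksum);
}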

Update: It now builds and works with Evolution 2.12 (I haven't tested anything else). Better still, the new and improved IMAP back-end for Evolution is now actually working. Whoohoo!

Code Snippet Licensing

All code posted to this blog is licensed under the MIT/X11 license unless otherwise stated in the post itself.