Sunday, April 26, 2009

Async Programming and Bottlenecks

An attendee at Devscovery asked Jeff Richter what the gotchas are when using the AsyncEnumerator. Jeff gave an example specific to the AsyncEnumerator, but here are a couple more that apply to multi-context programming in general. That’s a better name than multi-threaded, since the CLR could execute everything on a single thread and these problems would still exist.

To use Jeff’s example:

If it takes five seconds to fetch an image, fetching two synchronously will take ten seconds. But if you fetch them asynchronously it will take only five. In fact, if you fetch 1,000 asynchronously it will still take only five seconds.

There are two perhaps unobvious gotchas in that scenario: removing the request-generation bottleneck by initiating requests so quickly can create new problems.

If you were creating a photo gallery listing the image sizes, you might run out of memory downloading the images faster than you can free it by de-queuing completed IAsyncResults. I imagine there are other finite resources that could be exhausted this way: network connections, perhaps file handles.

The other potential problem depends on your goal: what happens to the image server? It might have enjoyed having a request-generation bottleneck!

So watch out for the implications of solving a request-generation bottleneck, unless your job is writing the stress-tester.
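
One way to keep both problems in check is to cap the number of requests that can be in flight at once. Here’s a minimal sketch of that throttling idea, written against the plain BCL (SemaphoreSlim, HttpClient, and async/await) rather than the AsyncEnumerator itself, with a made-up URL list:

using System;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

class ThrottledFetch
{
    // Cap in-flight requests so memory, sockets, and the server
    // on the other end are not overwhelmed.
    static readonly SemaphoreSlim Gate = new SemaphoreSlim(10);
    static readonly HttpClient Client = new HttpClient();

    static async Task<byte[]> FetchAsync(string url)
    {
        await Gate.WaitAsync();   // waits here once 10 requests are active
        try
        {
            return await Client.GetByteArrayAsync(url);
        }
        finally
        {
            Gate.Release();       // frees a slot as soon as a download finishes
        }
    }

    static async Task Main()
    {
        // 1,000 images requested at once, but never more than 10 in flight.
        var urls = Enumerable.Range(0, 1000)
                             .Select(i => "http://example.com/img" + i + ".jpg");
        byte[][] images = await Task.WhenAll(urls.Select(FetchAsync));
        Console.WriteLine("Fetched " + images.Length + " images.");
    }
}

With the semaphore set to 10, you still get the win of async I/O, but the worst case is ten buffered images in memory and ten concurrent hits on the image server.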

Friday, April 24, 2009

Compression Benchmarking in .Net

At Devscovery this year, one of the libraries Wintellect distributed was a granular performance “timer” that John Robbins and Jeff Richter wrote. Like System.Diagnostics.Stopwatch it tracks elapsed time, but it also tracks CPU cycles and the number of garbage collections (GCs) by generation, and it has an easy way to run a test for multiple iterations.

To try it out, and to do something useful at the same time, I benchmarked several compression algorithms and libraries. In the results below, ms is time in milliseconds, Kc is kilocycles, and G# is the number of garbage collections in generation #.

The test data was 95 MB of text files I downloaded from the SEC.
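
The harness itself isn’t shown here, but each library ran through the same compress, measure, report pattern. As a rough sketch of the GZipStream pass (using the BCL’s Stopwatch in place of Wintellect’s timer, and a hypothetical input file name):

using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

class GZipBenchmark
{
    static void Main()
    {
        // "filings.txt" is a hypothetical stand-in for the SEC test data.
        byte[] input = File.ReadAllBytes("filings.txt");

        Stopwatch watch = Stopwatch.StartNew();
        byte[] compressed;
        using (MemoryStream output = new MemoryStream())
        {
            // leaveOpen: true so the MemoryStream survives the GZipStream's
            // Dispose, which is what flushes the final compressed block.
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress, true))
                gzip.Write(input, 0, input.Length);
            compressed = output.ToArray();
        }
        watch.Stop();

        Console.WriteLine("{0:N0}ms", watch.ElapsedMilliseconds);
        Console.WriteLine("{0:F2}% = compressed size / uncompressed",
                          100.0 * compressed.Length / input.Length);
    }
}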

System.IO.Compression.GZipStream Compression
      5,620ms  10,930,530Kc (G0=   4, G1=   2, G2=   1)
     25.08% = compressed size / uncompressed
Decompression
      2,696ms   5,253,716Kc (G0=   1, G1=   1, G2=   1)
------------------------------------------------------------
GNU BZip2 Compression
     36,574ms  70,954,961Kc (G0=  16, G1=   1, G2=   1)
     13.82% = compressed size / uncompressed
Decompression
      6,201ms  11,536,973Kc (G0=  10, G1=   1, G2=   1)
------------------------------------------------------------
Xceed BZip2 Compression
    380,690ms 740,282,089Kc (G0= 999, G1=  16, G2=   5)
     13.82% = compressed size / uncompressed
Decompression
     10,295ms  20,068,563Kc (G0=   6, G1=   3, G2=   3)
------------------------------------------------------------
LZMA Compression
    137,585ms 269,214,645Kc (G0=  12, G1=  10, G2=  10)
     13.7% = compressed size / uncompressed
Decompression
      3,628ms   7,081,899Kc (G0=   2, G1=   2, G2=   2)
------------------------------------------------------------
SharpZipLib BZip2 Compression
     88,732ms 172,606,685Kc (G0=3356, G1=  12, G2=   3)
     13.83% = compressed size / uncompressed
Decompression
     11,696ms  22,875,325Kc (G0=   4, G1=   3, G2=   3)

BZip2 and LZMA (7zip) achieve roughly equivalent compression ratios, and each is about twice as good at compressing the test data as GZip. But if you look at the performance of the LZMA and BZip2 libraries, there are huge differences.

LZMA and SharpZipLib are over twice as fast as Xceed. And the GNU BZip2 (C code with a managed wrapper) is roughly three times faster than those two. So taking the time to test a couple of options can pay off, especially if you use compression to speed up moving data over the network.

Wednesday, April 15, 2009

Experiment in Writing

At Devscovery in New York, Scott Hanselman proposed that all software developers should have a blog. His most compelling reasons were that writing a blog helps you become a better communicator, and that it’s a neat way to track what you did in the past. He demonstrated that with some pretty comical first postings of his from back in 2002. It’s something I had thought about doing for a while, so that was my motivation to start.

That was the first talk; the last was an introduction to Microsoft’s Live Framework and Mesh by Jeff Richter, followed by a meandering conversation about cloud computing and whether it will be the next big thing.

Here are some of the questions I’d want to answer before I’d start building software on something like Mesh / Live.

1) Would I bet my business on a service that might not be around in a couple of years, or might become prohibitively expensive?

Where I work that would be called a “supplier risk”: what if the provider decided to discontinue the product, or realized it was key to my business and raised the price to capture some of my profit margin? It’s always a particular risk with a service. Companies are obviously already reliant on services to run internet businesses, but those services are often commodities: multiple companies provide essentially similar data-center hosting and compete on price, and the same goes for internet bandwidth, etc.

2) Related to 1, it doesn’t sound like there will be an option to host my own data; it has to be on Microsoft’s cloud. If the cloud software were sold, the supplier risk would decrease. Independent of supplier risk, I imagine that financial institutions, law firms, and medical care providers wouldn’t want to, or be able to, trust Microsoft to host their data or to be the authentication/authorization provider. Again, that could be solved by releasing the software so you could build your own cloud cluster.

3) Versioning. Is this solved already? If Amazon upgrades their API, perhaps even fixing a bug, but somehow it breaks your code, do you have the option to keep running on their old API? I recently ran into this problem with MSFT: they released a patch that bundled security fixes with other fixes, and one of the changes introduced a bug that broke our code. Since we run our own environment we were able to hold off on the patch until MSFT patched their bug. How does a cloud provider handle problems like that? Maybe it will take a few years for problems like that to come up and shake out.