Friday, April 24, 2009

Compression Benchmarking in .NET

At Devscovery this year, one of the libraries Wintellect distributed was a granular performance “timer” that John Robbins and Jeff Richter wrote. Like System.Diagnostics.Stopwatch, it tracks elapsed time, but it also tracks CPU cycles and the number of garbage collections (GCs) in each generation, and it makes it easy to run a test for multiple iterations.

To try it out and do something useful at the same time, I benchmarked several compression algorithms and libraries. In the results below, ms is time in milliseconds, Kc is kilocycles, and G0, G1, and G2 are the number of garbage collections in each generation.

The test data was 95 MB of text files I downloaded from the SEC.

System.IO.Compression.GZipStream
Compression
      5,620ms  10,930,530Kc (G0=   4, G1=   2, G2=   1)
    25.08% = compressed size / uncompressed size
Decompression
      2,696ms   5,253,716Kc (G0=   1, G1=   1, G2=   1)
------------------------------------------------------------
GNU BZip2
Compression
     36,574ms  70,954,961Kc (G0=  16, G1=   1, G2=   1)
    13.82% = compressed size / uncompressed size
Decompression
      6,201ms  11,536,973Kc (G0=  10, G1=   1, G2=   1)
------------------------------------------------------------
Xceed BZip2
Compression
    380,690ms 740,282,089Kc (G0= 999, G1=  16, G2=   5)
    13.82% = compressed size / uncompressed size
Decompression
     10,295ms  20,068,563Kc (G0=   6, G1=   3, G2=   3)
------------------------------------------------------------
LZMA
Compression
    137,585ms 269,214,645Kc (G0=  12, G1=  10, G2=  10)
    13.7% = compressed size / uncompressed size
Decompression
      3,628ms   7,081,899Kc (G0=   2, G1=   2, G2=   2)
------------------------------------------------------------
SharpZipLib BZip2
Compression
     88,732ms 172,606,685Kc (G0=3356, G1=  12, G2=   3)
    13.83% = compressed size / uncompressed size
Decompression
     11,696ms  22,875,325Kc (G0=   4, G1=   3, G2=   3)
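
Each result above pairs a timing with per-generation GC counts and a compression ratio. For anyone who wants to see the general shape of a harness that collects numbers like these, here is a minimal sketch. It is written in Python rather than .NET, and it is not the Wintellect timer or any of the libraries benchmarked here: `measure` and `benchmark_codecs` are hypothetical names, the codecs are Python's standard-library gzip, bz2, and lzma modules, and CPU time in milliseconds stands in for kilocycles because the standard library has no cycle counter.

```python
import bz2
import gc
import gzip
import lzma
import time


def gc_counts():
    """Collections run so far in each GC generation (G0, G1, G2)."""
    return [gen["collections"] for gen in gc.get_stats()]


def measure(func, iterations=1):
    """Run func `iterations` times; report wall time, CPU time, and GC deltas."""
    gc.collect()  # start clean so earlier garbage is not charged to func
    before = gc_counts()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = None
    for _ in range(iterations):
        result = func()
    cpu_ms = (time.process_time() - cpu_start) * 1000
    wall_ms = (time.perf_counter() - wall_start) * 1000
    after = gc_counts()
    return {
        "ms": wall_ms,
        "cpu_ms": cpu_ms,
        "gc": [a - b for a, b in zip(after, before)],
        "result": result,  # value returned by the last iteration
    }


# Stand-in codecs (assumption: Python stdlib, not the .NET libraries above).
CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def benchmark_codecs(data, iterations=1):
    """Time compression and decompression of non-empty `data` for each codec."""
    rows = []
    for name, (compress, decompress) in CODECS.items():
        comp = measure(lambda: compress(data), iterations)
        packed = comp["result"]
        decomp = measure(lambda: decompress(packed), iterations)
        if decomp["result"] != data:
            raise ValueError(f"{name} round trip was not lossless")
        rows.append({
            "name": name,
            "ratio": len(packed) / len(data),  # compressed size / uncompressed size
            "compress": comp,
            "decompress": decomp,
        })
    return rows


if __name__ == "__main__":
    # Made-up repetitive sample text; real input would be the files under test.
    sample = b"Quarterly report text, repeated to give the codecs something to do.\n" * 50_000
    for row in benchmark_codecs(sample):
        c, d = row["compress"], row["decompress"]
        print(f"{row['name']:5} compress {c['ms']:9.1f}ms gc={c['gc']} "
              f"ratio={row['ratio']:.2%} decompress {d['ms']:9.1f}ms gc={d['gc']}")
```

Forcing a collection before each measurement keeps garbage left over from earlier runs from being counted against the code under test, which matters when the GC counts are part of what you are comparing.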

BZip2 and LZMA (7zip) get roughly equivalent compression ratios, and each is nearly twice as good at compressing the test data as GZip (about 13.8% of the original size versus 25.08%). But if you compare compression times across the LZMA and BZip2 libraries, there are huge differences.

LZMA and SharpZipLib both compress more than twice as fast as Xceed (about 2.8 and 4.3 times as fast, respectively). And the GNU BZip2 (C code) with a managed wrapper is faster still: roughly 2.4 times as fast as SharpZipLib and 3.8 times as fast as LZMA. So taking the time to test a couple of options can pay off, especially if you use compression to speed up moving data over the network.
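
The speed comparisons here come straight from the compression times in the results. A short Python sketch of that arithmetic, with the times and the 95 MB input size copied from this post (the `speedup` and `throughput_mb_per_s` helpers are illustrative names, not part of any library):

```python
# Compression times in milliseconds, copied from the results in this post.
compress_ms = {
    "GZipStream": 5_620,
    "GNU BZip2": 36_574,
    "Xceed BZip2": 380_690,
    "LZMA": 137_585,
    "SharpZipLib BZip2": 88_732,
}


def speedup(slower, faster):
    """How many times as fast `faster` compresses compared with `slower`."""
    return compress_ms[slower] / compress_ms[faster]


def throughput_mb_per_s(library, size_mb=95):
    """Compression throughput in MB/s for the 95 MB test set."""
    return size_mb / (compress_ms[library] / 1000)


if __name__ == "__main__":
    print(f"LZMA vs Xceed:            {speedup('Xceed BZip2', 'LZMA'):.1f}x")               # 2.8x
    print(f"SharpZipLib vs Xceed:     {speedup('Xceed BZip2', 'SharpZipLib BZip2'):.1f}x")  # 4.3x
    print(f"GNU BZip2 vs SharpZipLib: {speedup('SharpZipLib BZip2', 'GNU BZip2'):.1f}x")    # 2.4x
    print(f"GNU BZip2 vs LZMA:        {speedup('LZMA', 'GNU BZip2'):.1f}x")                 # 3.8x
    for name in compress_ms:
        print(f"{name:18} {throughput_mb_per_s(name):6.2f} MB/s")
```

Expressed as throughput, the spread runs from about 16.9 MB/s for GZipStream down to roughly 0.25 MB/s for Xceed BZip2, which is the number to compare against your network speed when deciding whether compressing before sending is worth it.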
