gzip vs. bzip2 vs. rzip for log files
with my curiosity piqued by jeremy’s tests of gzip vs. bzip2 vs. rzip, which used a bunch of mail as the test data, i tried compressing an apache log file with the same three tools, plus lzop:
program | cpu time (s) | compressed size (bytes) |
---|---|---|
gzip | 19.210 | 28,362,079 |
gzip -9 | 32.400 | 27,036,433 |
bzip2 -9 | 849.489 | 15,496,248 |
rzip | 147.460 | 18,823,330 |
lzop | 3.240 | 48,719,254 |
lzop -9 | 80.810 | 32,531,485 |
the original file is 295,927,205 bytes.
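to put the table in relative terms, here is a small python sketch that turns the measured numbers into compression ratios and input throughput. the figures are copied straight from the table; the derived columns are plain arithmetic on them, and reading cpu time as throughput assumes each tool ran single-threaded.

```python
# figures copied from the table: (program, cpu seconds, compressed bytes)
ORIGINAL_BYTES = 295_927_205

RESULTS = [
    ("gzip", 19.210, 28_362_079),
    ("gzip -9", 32.400, 27_036_433),
    ("bzip2 -9", 849.489, 15_496_248),
    ("rzip", 147.460, 18_823_330),
    ("lzop", 3.240, 48_719_254),
    ("lzop -9", 80.810, 32_531_485),
]


def summarize(original, results):
    """return (program, ratio, percent of original, input MB per cpu second) rows."""
    rows = []
    for program, seconds, size in results:
        ratio = original / size                 # how many times smaller
        percent = 100.0 * size / original       # compressed size as % of original
        throughput = original / seconds / 1e6   # input MB (10^6 bytes) per cpu second
        rows.append((program, ratio, percent, throughput))
    return rows


if __name__ == "__main__":
    for program, ratio, percent, mbps in summarize(ORIGINAL_BYTES, RESULTS):
        print(f"{program:10} {ratio:6.2f}x {percent:6.2f}% {mbps:8.2f} MB/s")
```

by this arithmetic bzip2 -9 shrinks the log about 19x versus about 10x for plain gzip, at roughly 44 times the cpu time, while lzop chews through the input at around 91 MB/s.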
it’s too bad rzip can’t decompress to a stream (standard output). that makes it much less attractive for log compression, since you can’t pipe a compressed log straight into another tool.
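for contrast, gzip and bzip2 decompress incrementally, so a compressed log can be searched without ever writing the decompressed file to disk. a minimal python sketch of that, using the standard library's gzip and bz2 modules; the `open_log`/`count_matches` helpers and the `access_log.bz2` file name are hypothetical, and this is roughly what a `bzcat access_log.bz2 | grep` pipeline does.

```python
import bz2
import gzip
import sys


def open_log(path):
    """open a plain, .gz, or .bz2 log for line-by-line text reading.

    gzip and bz2 decompress incrementally, so lines come out as the
    compressed file is read; nothing is written to disk.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def count_matches(path, needle):
    """count log lines containing `needle`, streaming the file."""
    with open_log(path) as f:
        return sum(1 for line in f if needle in line)


if __name__ == "__main__" and len(sys.argv) == 3:
    # hypothetical usage: python logcount.py access_log.bz2 " 404 "
    print(count_matches(sys.argv[1], sys.argv[2]))
```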
Comments
sounds like your log files are so repetitive that they don't even fill the dictionary that maps recurring patterns to short representations. that's the scenario where bzip2 wins over rzip. you might even save a few bytes by dropping bzip2 to -8 or -6.
but mail is more random than log files, and the larger the files get (closer to 1GB and beyond), the more noticeable the difference.
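whether -8 or -6 actually helps is data-dependent, but it is cheap to measure. a minimal sketch using python's bz2 module, whose `compresslevel` (1 to 9) selects the block size the same way bzip2's -1 to -9 flags do; the `compare_bzip2_levels` helper and the `access_log` file name are hypothetical, and sizes from the library may differ slightly from the command-line tool's output.

```python
import bz2
import sys
import time


def compare_bzip2_levels(data, levels=(9, 8, 6)):
    """compress `data` at each bzip2 level; return {level: (bytes, cpu_seconds)}.

    the level chooses the block size (level x 100k), which is what the
    -9, -8 and -6 command-line flags change.
    """
    results = {}
    for level in levels:
        start = time.process_time()
        compressed = bz2.compress(data, compresslevel=level)
        elapsed = time.process_time() - start
        results[level] = (len(compressed), elapsed)
    return results


if __name__ == "__main__" and len(sys.argv) == 2:
    # hypothetical usage: python bzlevels.py access_log  (reads the whole file into memory)
    with open(sys.argv[1], "rb") as f:
        sample = f.read()
    for level, (size, secs) in compare_bzip2_levels(sample).items():
        print(f"-{level}: {size:,} bytes, {secs:.2f}s cpu")
```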