gzip vs. bzip2 vs. rzip for log files

with my curiosity piqued by jeremy’s tests of gzip vs. bzip2 vs. rzip, which used a bunch of mail as the test data, i tried compressing an apache log file with the three tools, plus lzop:

program     cpu time (s)    size (bytes)
gzip              19.210      28,362,079
gzip -9           32.400      27,036,433
bzip2 -9         849.489      15,496,248
rzip             147.460      18,823,330
lzop               3.240      48,719,254
lzop -9           80.810      32,531,485

the original file size is 295,927,205 bytes.
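to put the table in relative terms, here is a small python sketch that turns the numbers above into a percentage of the original size and an approximate input throughput (decimal megabytes of input per cpu second). the figures are copied straight from the table; the helper name `summarize` is just for illustration.

```python
# sizes (bytes) and cpu times (s) copied from the table above.
ORIGINAL_SIZE = 295_927_205

RESULTS = {
    "gzip": (19.210, 28_362_079),
    "gzip -9": (32.400, 27_036_433),
    "bzip2 -9": (849.489, 15_496_248),
    "rzip": (147.460, 18_823_330),
    "lzop": (3.240, 48_719_254),
    "lzop -9": (80.810, 32_531_485),
}


def summarize(original, results):
    """Return (program, percent of original, MB/s of input) tuples, smallest output first."""
    rows = []
    for program, (cpu_seconds, size) in results.items():
        percent = 100.0 * size / original
        # decimal megabytes of *input* processed per cpu second
        throughput = original / cpu_seconds / 1_000_000
        rows.append((program, percent, throughput))
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for program, percent, throughput in summarize(ORIGINAL_SIZE, RESULTS):
        print(f"{program:<10} {percent:5.1f}% of original  {throughput:6.1f} MB/s")
```

with these numbers, the bzip2 -9 output comes to roughly 5.2% of the original and rzip about 6.4%, while plain lzop gets through about 91 MB of input per cpu second but leaves the file at about 16.5% of its original size.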

it’s too bad rzip can’t decompress to a stream. that makes it much less attractive as a log compression solution.
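gzip and bzip2 can both decompress to a stream (zcat and bzcat on the command line), which is what makes them convenient for logs. as a rough sketch, python's standard gzip and bz2 modules can walk a compressed log line by line without ever writing the decompressed file to disk. `count_matching_lines` is a hypothetical helper, not part of any of the tools above.

```python
import bz2
import gzip

# map file suffixes to openers that decompress incrementally as lines are read.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def count_matching_lines(path, needle):
    """Count lines containing `needle` in a compressed log, streaming the data."""
    opener = next((op for suffix, op in OPENERS.items() if path.endswith(suffix)), None)
    if opener is None:
        raise ValueError(f"unsupported compression suffix: {path}")
    count = 0
    # "rt" mode yields decoded text lines; decompression happens chunk by chunk,
    # so memory use stays small even for a multi-hundred-megabyte log.
    with opener(path, "rt", encoding="utf-8", errors="replace") as stream:
        for line in stream:
            if needle in line:
                count += 1
    return count
```

the shell equivalent is something like `zcat access_log.gz | grep -c ' 404 '`, which is exactly the kind of pipeline rzip can't feed.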

comments

sounds like your log files are very repetitive and don't even fill the dictionary that maps frequently used patterns to short representations. that's the scenario where bzip2 wins over rzip. you might even gain a few bytes by dropping bzip2 to -8 or -6.

but mail is more random than log files, and the larger the files are (closer to 1GB or more), the more noticeable the difference becomes.

» asd » december 5, 2005 12:17pm
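one way to try the suggestion about lower bzip2 levels on your own logs is to compress a sample at each level and compare. note that for bzip2 the level selects the block size (100k up to 900k) rather than a dictionary size. this is a sketch using python's bz2 module and `time.process_time` for cpu time; `compare_bzip2_levels` is a made-up name, and results on a sample may not match those for a full 300 MB file.

```python
import bz2
import sys
import time


def compare_bzip2_levels(data, levels=(6, 8, 9)):
    """Return {level: (compressed_size, cpu_seconds)} for the given bzip2 levels."""
    results = {}
    for level in levels:
        start = time.process_time()
        compressed = bz2.compress(data, compresslevel=level)
        elapsed = time.process_time() - start
        # sanity check: the output must round-trip to the original bytes.
        if bz2.decompress(compressed) != data:
            raise RuntimeError(f"round-trip failed at level {level}")
        results[level] = (len(compressed), elapsed)
    return results


if __name__ == "__main__":
    # usage: python compare_levels.py sample_of_access_log
    with open(sys.argv[1], "rb") as f:
        sample = f.read()
    for level, (size, seconds) in compare_bzip2_levels(sample).items():
        print(f"bzip2 -{level}: {size:,} bytes, {seconds:.2f} s cpu")
```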
