Unix Data Compression Shootout
I wanted to try a new-to-me compressor, lz4, but it turned into a full ADHD-fueled file compression shoot-out:
Dang, lz4 is crazy fast!
Data/setup
The corpus is a 2.29 GiB uncompressed tar file consisting of several years' worth of GPS data in various plain-text formats.
The computer is a ThinkPad X260 with the CPU governor set to performance. The CPU is an Intel i5-6200U.
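To rerun a scaled-down version of this setup without the command-line tools, Python's standard-library bindings for the gzip, bzip2 and xz formats (gzip, bz2, lzma) can stand in. This is a sketch under several assumptions. The GPS-like corpus is synthetic and hypothetical, because the real corpus.tar is not available. Only gzip, bzip2 and xz have stdlib bindings, so lz4, zstd and bzip3 are left out. Timing uses os.times().user, which reports the process's user CPU time like the user line of time. The numbers will not match the CLI results above.

```python
import bz2
import gzip
import lzma
import os


def make_corpus(n_lines: int = 20_000) -> bytes:
    """Hypothetical stand-in for corpus.tar: NMEA-style GPS sentences as plain text."""
    rows = []
    for i in range(n_lines):
        rows.append(
            f"$GPGGA,{i % 86400:06d},4807.{i % 1000:03d},N,"
            f"01131.{(i * 7) % 1000:03d},E,1,08,0.9,545.4,M,46.9,M,,*47\n"
        )
    return "".join(rows).encode("ascii")


# Stdlib bindings standing in for the gzip, bzip2 and xz CLIs.
# Each entry is (compress, decompress); levels loosely mirror the -1 / -9 flags.
COMPRESSORS = {
    "gzip -1": (lambda d: gzip.compress(d, compresslevel=1), gzip.decompress),
    "gzip -9": (lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
    "bzip2 -9": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
    "xz -1": (lambda d: lzma.compress(d, preset=1), lzma.decompress),
    "xz -6": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}


def run_shootout(data: bytes):
    """Return (name, user CPU seconds, compressed bytes, ratio) for each compressor."""
    results = []
    for name, (compress, decompress) in COMPRESSORS.items():
        start = os.times().user  # user CPU time of this process so far
        packed = compress(data)
        elapsed = os.times().user - start
        if decompress(packed) != data:  # sanity check: lossless round trip
            raise ValueError(f"{name} did not round-trip")
        results.append((name, elapsed, len(packed), len(data) / len(packed)))
    return results


if __name__ == "__main__":
    corpus = make_corpus()
    # Print smallest output first, like the "sorted by size" view.
    for name, secs, size, ratio in sorted(run_shootout(corpus), key=lambda r: r[2]):
        print(f"{name:<10} {secs:8.3f} {size:>10} {ratio:6.2f}")
```

Because these bindings run in-process on a single thread, user time here stays close to wall-clock time, unlike the multithreaded CLI runs.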
Outcome
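Times are user CPU seconds as reported by time, and ratio is the uncompressed size divided by the compressed size. Because user time is summed across all threads, it overstates the wait for tools that used several cores: xz took 6m15s of wall-clock time for 24m36s of user time, and lz4 took 2.2s for 5.7s.
Chart (grouped by compressor):
command/compressor user time (s)  size (bytes)   ratio
none/cat                   0.077    2462955520    1.00
gzip                      57.283     338289587    7.28
gzip -1                   22.682     400956710    6.14
gzip -9                  113.047     325547190    7.57
bzip2                    319.847     262857414    9.37
bzip2 -1                 255.654     278217711    8.85
bzip2 -9                 326.718     262857414    9.37
bzip3                    205.822     231173201   10.65
zstd                      12.520     321229917    7.67
zstd -1                    8.812     317234226    7.76
zstd -9                   63.019     282940675    8.70
zstd -11                 101.278     281894351    8.74
zstd --ultra -22        7317.944     230075751   10.70
xz                      1476.153     228082956   10.80
xz -1                    201.569     290137816    8.49
xz -9e                  4683.144     212748984   11.58
lz4                        5.744     549838913    4.48
lz4 -1                     5.762     549838913    4.48
lz4 -9                    74.670     434543206    5.67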
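The ratio column and the 2.29 GiB corpus size follow directly from the byte counts. Here is a minimal sketch that reproduces a few of them. The helper names gib and ratio, and the hand-picked SIZES subset, are illustrative rather than part of the original measurements.

```python
# Uncompressed corpus size in bytes (the none/cat row).
CORPUS_BYTES = 2_462_955_520


def gib(n_bytes: int) -> float:
    """Bytes to binary gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


def ratio(compressed_bytes: int, original_bytes: int = CORPUS_BYTES) -> float:
    """Compression ratio as used in the tables: original size / compressed size."""
    return original_bytes / compressed_bytes


# A few compressed sizes copied from the table above.
SIZES = {
    "gzip": 338_289_587,
    "zstd -1": 317_234_226,
    "xz -9e": 212_748_984,
    "lz4": 549_838_913,
}

print(f"corpus: {gib(CORPUS_BYTES):.2f} GiB")  # corpus: 2.29 GiB
for name, size in SIZES.items():
    print(f"{name:<8} {ratio(size):5.2f}")
```

The loop prints 7.28, 7.76, 11.58 and 4.48, matching the ratio column.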
Sorted by size (descending):
command/compressor user time (s)  size (bytes)   ratio
none/cat                   0.077    2462955520    1.00
lz4                        5.744     549838913    4.48
lz4 -1                     5.762     549838913    4.48
lz4 -9                    74.670     434543206    5.67
gzip -1                   22.682     400956710    6.14
gzip                      57.283     338289587    7.28
gzip -9                  113.047     325547190    7.57
zstd                      12.520     321229917    7.67
zstd -1                    8.812     317234226    7.76
xz -1                    201.569     290137816    8.49
zstd -9                   63.019     282940675    8.70
zstd -11                 101.278     281894351    8.74
bzip2 -1                 255.654     278217711    8.85
bzip2                    319.847     262857414    9.37
bzip2 -9                 326.718     262857414    9.37
bzip3                    205.822     231173201   10.65
zstd --ultra -22        7317.944     230075751   10.70
xz                      1476.153     228082956   10.80
xz -9e                  4683.144     212748984   11.58
Sorted by time (ascending):
command/compressor user time (s)  size (bytes)   ratio
none/cat                   0.077    2462955520    1.00
lz4                        5.744     549838913    4.48
lz4 -1                     5.762     549838913    4.48
zstd -1                    8.812     317234226    7.76
zstd                      12.520     321229917    7.67
gzip -1                   22.682     400956710    6.14
gzip                      57.283     338289587    7.28
zstd -9                   63.019     282940675    8.70
lz4 -9                    74.670     434543206    5.67
zstd -11                 101.278     281894351    8.74
gzip -9                  113.047     325547190    7.57
xz -1                    201.569     290137816    8.49
bzip3                    205.822     231173201   10.65
bzip2 -1                 255.654     278217711    8.85
bzip2                    319.847     262857414    9.37
bzip2 -9                 326.718     262857414    9.37
xz                      1476.153     228082956   10.80
xz -9e                  4683.144     212748984   11.58
zstd --ultra -22        7317.944     230075751   10.70
The score divides the compression ratio by user CPU seconds, so higher is better.
Chart (compression ratio / time score, ascending):
command/compressor user time (s)  size (bytes)   ratio  ratio/time
zstd --ultra -22        7317.944     230075751   10.70      0.0015
xz -9e                  4683.144     212748984   11.58      0.0025
xz                      1476.153     228082956   10.80      0.0073
bzip2 -9                 326.718     262857414    9.37      0.0287
bzip2                    319.847     262857414    9.37      0.0293
bzip2 -1                 255.654     278217711    8.85      0.0346
xz -1                    201.569     290137816    8.49      0.0421
bzip3                    205.822     231173201   10.65      0.0518
gzip -9                  113.047     325547190    7.57      0.0669
lz4 -9                    74.670     434543206    5.67      0.0759
zstd -11                 101.278     281894351    8.74      0.0863
gzip                      57.283     338289587    7.28      0.1271
zstd -9                   63.019     282940675    8.70      0.1381
gzip -1                   22.682     400956710    6.14      0.2708
zstd                      12.520     321229917    7.67      0.6124
lz4 -1                     5.762     549838913    4.48      0.7774
lz4                        5.744     549838913    4.48      0.7798
zstd -1                    8.812     317234226    7.76      0.8811
none/cat                   0.077    2462955520    1.00     12.9870 (nonsensical)
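The ratio/time score can be recomputed from the raw numbers. Below is a minimal sketch using a hand-picked subset of rows copied from the tables; RESULTS and score are illustrative names, not the author's script. Recomputing suggests the chart divides the unrounded ratio by user seconds: bzip3 comes out at 0.0518 this way, whereas the rounded 10.65 / 205.822 would give 0.0517.

```python
CORPUS_BYTES = 2_462_955_520  # uncompressed size (none/cat row)

# (command, user CPU seconds, compressed bytes) copied from the tables above;
# only a subset of rows is included for brevity.
RESULTS = [
    ("zstd -1", 8.812, 317_234_226),
    ("lz4", 5.744, 549_838_913),
    ("gzip", 57.283, 338_289_587),
    ("bzip3", 205.822, 231_173_201),
    ("xz -9e", 4683.144, 212_748_984),
]


def score(user_seconds: float, compressed_bytes: int) -> float:
    """Unrounded compression ratio per second of user CPU time (higher is better)."""
    return (CORPUS_BYTES / compressed_bytes) / user_seconds


# Rank best score first.
ranked = sorted(RESULTS, key=lambda r: score(r[1], r[2]), reverse=True)
for cmd, secs, size in ranked:
    print(f"{cmd:<8} {score(secs, size):.4f}")
```

This prints zstd -1 (0.8811), lz4 (0.7798), gzip (0.1271), bzip3 (0.0518) and xz -9e (0.0025), matching the chart. Since time sits in the denominator, a tool that is ten times slower needs a ten times higher ratio just to tie, which is why the fast compressors dominate this score.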
Conclusion
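lz4 is the fastest compressor… but zstd -1 still kicks butt: it has the best meaningful ratio/time score (0.8811) while compressing about 1.7 times better than lz4 (a ratio of 7.76 vs 4.48). While it doesn’t score well overall, bzip3 still provides excellent compression in a reasonable amount of time: its 10.65 ratio comes close to xz's 10.80 in about a seventh of the user time.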
Raw output
tmp $ lscpu |grep i5
Model name: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
tmp $ time cat < corpus.tar |wc -c
2462955520
real 0m1.388s
user 0m0.077s
sys 0m1.497s
tmp $ time gzip < corpus.tar |wc -c
338289587
real 0m57.971s
user 0m57.283s
sys 0m0.633s
tmp $ time bzip2 < corpus.tar |wc -c
262857414
real 5m21.280s
user 5m19.847s
sys 0m1.192s
tmp $ time bzip3 < corpus.tar |wc -c
231173201
real 3m26.608s
user 3m25.822s
sys 0m0.712s
tmp $ time zstd < corpus.tar |wc -c
321229917
real 0m11.717s
user 0m12.520s
sys 0m1.278s
tmp $ time xz < corpus.tar |wc -c
228082956
real 6m15.579s
user 24m36.153s
sys 0m1.481s
tmp $ time lz4 < corpus.tar |wc -c
549838913
real 0m2.190s
user 0m5.744s
sys 0m0.833s
tmp $ time lz4 -9 < corpus.tar |wc -c
434543206
real 0m25.151s
user 1m14.670s
sys 0m0.869s
tmp $ time zstd -9 < corpus.tar |wc -c
282940675
real 1m2.564s
user 1m3.019s
sys 0m1.351s
tmp $ time zstd -11 < corpus.tar |wc -c
281894351
real 1m40.556s
user 1m41.278s
sys 0m1.292s
tmp $ time zstd --ultra -22 < corpus.tar |wc -c
230075751
real 122m1.384s
user 121m57.944s
sys 0m2.642s
tmp $ time xz -9e < corpus.tar |wc -c
212748984
real 78m3.870s
user 78m3.144s
sys 0m1.345s
tmp $
tmp $ time xz -1 < corpus.tar |wc -c
290137816
real 0m50.878s
user 3m21.569s
sys 0m1.083s
tmp $ time zstd -1 < corpus.tar |wc -c
317234226
real 0m8.282s
user 0m8.812s
sys 0m1.162s
tmp $ time gzip -1 < corpus.tar |wc -c
400956710
real 0m23.496s
user 0m22.682s
sys 0m0.721s
tmp $ time gzip -9 < corpus.tar |wc -c
325547190
real 1m55.453s
user 1m53.047s
sys 0m0.730s
tmp $ time bzip2 -1 < corpus.tar |wc -c
278217711
real 4m16.753s
user 4m15.654s
sys 0m1.376s
tmp $ time bzip2 -9 < corpus.tar |wc -c
262857414
real 5m27.726s
user 5m26.718s
sys 0m1.157s
tmp $ time lz4 -1 < corpus.tar |wc -c
549838913
real 0m2.212s
user 0m5.762s
sys 0m0.832s
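The time column in the tables is each run's user line converted to seconds. Below is a small parser sketch. It assumes bash's time keyword output format, which is a field name, whitespace, then NmS.SSSs, as in the transcript. parse_time_block is a hypothetical helper name. Dividing user by real also shows roughly how many cores a run kept busy on average.

```python
import re

# Matches "real"/"user"/"sys" lines such as "user 24m36.153s" (space or tab separated).
TIME_LINE = re.compile(r"^(real|user|sys)\s+(\d+)m([\d.]+)s$")


def parse_time_block(text: str) -> dict[str, float]:
    """Convert bash `time` output lines into seconds per field; other lines are ignored."""
    out = {}
    for line in text.splitlines():
        m = TIME_LINE.match(line.strip())
        if m:
            field, minutes, seconds = m.groups()
            out[field] = int(minutes) * 60 + float(seconds)
    return out


# The default xz run from the transcript.
xz = parse_time_block("""\
tmp $ time xz < corpus.tar |wc -c
228082956
real 6m15.579s
user 24m36.153s
sys 0m1.481s
""")
print(f"user seconds: {xz['user']:.3f}")
print(f"user/real:    {xz['user'] / xz['real']:.1f}")  # average cores kept busy
```

For the default xz run this gives 1476.153 seconds of user time, the value in the tables, and a user/real ratio of about 3.9. Single-threaded runs such as gzip stay close to 1.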