Unix Data Compression Shootout

I wanted to try a new-to-me compressor, lz4, but it turned into a full ADHD-fueled file compression shoot-out:

Dang, lz4 is crazy fast!

Data/setup

The corpus is a 2.29 GiB uncompressed tar file consisting of several years' worth of GPS data in various plain-text formats.

The computer is a ThinkPad X260 with the CPU governor set to performance. The CPU is an Intel i5-6200U (two cores, four threads).
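To double-check the governor before a run, the following Python sketch reads the cpufreq governor of every core. It assumes Linux with the standard cpufreq sysfs interface under /sys/devices/system/cpu. The function names cpu_governors and all_performance are hypothetical helpers, not part of any tool used in this post.

```python
from __future__ import annotations

from pathlib import Path


def cpu_governors(sysfs_cpu: str = "/sys/devices/system/cpu") -> dict[str, str]:
    """Map each CPU directory name (cpu0, cpu1, ...) to its scaling governor."""
    root = Path(sysfs_cpu)
    if not root.is_dir():
        return {}
    governors: dict[str, str] = {}
    # Each CPU with cpufreq support exposes cpuN/cpufreq/scaling_governor.
    for gov_file in root.glob("cpu[0-9]*/cpufreq/scaling_governor"):
        cpu = gov_file.parent.parent.name
        governors[cpu] = gov_file.read_text().strip()
    return governors


def all_performance(sysfs_cpu: str = "/sys/devices/system/cpu") -> bool:
    """True only if governors were found and every one is 'performance'."""
    governors = cpu_governors(sysfs_cpu)
    return bool(governors) and all(g == "performance" for g in governors.values())


if __name__ == "__main__":
    print(cpu_governors())
    print("all performance:", all_performance())
```

If all_performance() returns False, or no governors are found at all, frequency scaling may be skewing the timings.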

Outcome

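Chart (grouped by compressor):

command/compressor  user time (s)  size (bytes)  ratio
none/cat                    0.077    2462955520   1.00
gzip                       57.283     338289587   7.28
gzip -1                    22.682     400956710   6.14
gzip -9                   113.047     325547190   7.57
bzip2                     319.847     262857414   9.37
bzip2 -1                  255.654     278217711   8.85
bzip2 -9                  326.718     262857414   9.37
bzip3                     205.822     231173201  10.65
zstd                       12.520     321229917   7.67
zstd -1                     8.812     317234226   7.76
zstd -9                    63.019     282940675   8.70
zstd -11                  101.278     281894351   8.74
zstd --ultra -22         7317.944     230075751  10.70
xz                       1476.153     228082956  10.80
xz -1                     201.569     290137816   8.49
xz -9e                   4683.144     212748984  11.58
lz4                         5.744     549838913   4.48
lz4 -1                      5.762     549838913   4.48
lz4 -9                     74.670     434543206   5.67

Times are user CPU time in seconds as reported by time in the raw output below, sizes are compressed sizes in bytes, and ratio is uncompressed size divided by compressed size. User time is summed across threads, so the runs that used more than one thread (xz, xz -1 and all three lz4 runs; compare their real and user times below) finished sooner on the wall clock than this column suggests.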
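Each row above comes from a run of the form "time COMMAND < corpus.tar | wc -c" (see the raw output). For repeating the measurements from a script, here is a minimal Python sketch of the same idea. It assumes a Unix system (it relies on the resource module) and that the compressors are on PATH; measure and ratio are hypothetical helper names. One difference: the shell's time keyword (bash's, judging by the output format) times the whole pipeline, so the user figures above also include the small cost of wc -c, whereas this sketch counts only the compressor.

```python
from __future__ import annotations

import os
import resource
import subprocess


def measure(cmd: list[str], path: str) -> tuple[float, int]:
    """Run cmd with stdin redirected from path.

    Returns (user CPU seconds used by cmd, bytes written to stdout),
    roughly what `time cmd < path | wc -c` reports.
    """
    # RUSAGE_CHILDREN accumulates usage of children that have been waited for.
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    size = 0
    with open(path, "rb") as src:
        proc = subprocess.Popen(cmd, stdin=src, stdout=subprocess.PIPE)
        assert proc.stdout is not None
        # Count the output in 1 MiB chunks instead of keeping it in memory.
        while chunk := proc.stdout.read(1 << 20):
            size += len(chunk)
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)
    after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    return after - before, size


def ratio(path: str, compressed_size: int) -> float:
    """Uncompressed size divided by compressed size, as in the ratio column."""
    return os.path.getsize(path) / compressed_size
```

For example, measure(["zstd", "-1"], "corpus.tar") would return the user time and output size for the zstd -1 row. As with the shell's time, the user time here is summed over all of the compressor's threads.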

Sorted by size (descending):

command/compressor  user time (s)  size (bytes)  ratio
none/cat                    0.077    2462955520   1.00
lz4                         5.744     549838913   4.48
lz4 -1                      5.762     549838913   4.48
lz4 -9                     74.670     434543206   5.67
gzip -1                    22.682     400956710   6.14
gzip                       57.283     338289587   7.28
gzip -9                   113.047     325547190   7.57
zstd                       12.520     321229917   7.67
zstd -1                     8.812     317234226   7.76
xz -1                     201.569     290137816   8.49
zstd -9                    63.019     282940675   8.70
zstd -11                  101.278     281894351   8.74
bzip2 -1                  255.654     278217711   8.85
bzip2                     319.847     262857414   9.37
bzip2 -9                  326.718     262857414   9.37
bzip3                     205.822     231173201  10.65
zstd --ultra -22         7317.944     230075751  10.70
xz                       1476.153     228082956  10.80
xz -9e                   4683.144     212748984  11.58

Sorted by time (ascending):

command/compressor  user time (s)  size (bytes)  ratio
none/cat                    0.077    2462955520   1.00
lz4                         5.744     549838913   4.48
lz4 -1                      5.762     549838913   4.48
zstd -1                     8.812     317234226   7.76
zstd                       12.520     321229917   7.67
gzip -1                    22.682     400956710   6.14
gzip                       57.283     338289587   7.28
zstd -9                    63.019     282940675   8.70
lz4 -9                     74.670     434543206   5.67
zstd -11                  101.278     281894351   8.74
gzip -9                   113.047     325547190   7.57
xz -1                     201.569     290137816   8.49
bzip3                     205.822     231173201  10.65
bzip2 -1                  255.654     278217711   8.85
bzip2                     319.847     262857414   9.37
bzip2 -9                  326.718     262857414   9.37
xz                       1476.153     228082956  10.80
xz -9e                   4683.144     212748984  11.58
zstd --ultra -22         7317.944     230075751  10.70

Chart (compression ratio / time score, higher is better):

command/compressor  user time (s)  size (bytes)  ratio  ratio/time
zstd --ultra -22         7317.944     230075751  10.70      0.0015
xz -9e                   4683.144     212748984  11.58      0.0025
xz                       1476.153     228082956  10.80      0.0073
bzip2 -9                  326.718     262857414   9.37      0.0287
bzip2                     319.847     262857414   9.37      0.0293
bzip2 -1                  255.654     278217711   8.85      0.0346
xz -1                     201.569     290137816   8.49      0.0421
bzip3                     205.822     231173201  10.65      0.0518
gzip -9                   113.047     325547190   7.57      0.0669
lz4 -9                     74.670     434543206   5.67      0.0759
zstd -11                  101.278     281894351   8.74      0.0863
gzip                       57.283     338289587   7.28      0.1271
zstd -9                    63.019     282940675   8.70      0.1381
gzip -1                    22.682     400956710   6.14      0.2708
zstd                       12.520     321229917   7.67      0.6124
lz4 -1                      5.762     549838913   4.48      0.7774
lz4                         5.744     549838913   4.48      0.7798
zstd -1                     8.812     317234226   7.76      0.8811
none/cat                    0.077    2462955520   1.00     12.9870 (nonsensical)
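The ratio/time score is just the compression ratio divided by user CPU seconds. The sketch below recomputes it for a handful of rows copied from the charts; UNCOMPRESSED, RESULTS and score are hypothetical names. It uses the exact byte counts rather than the rounded ratio, which appears to be how the chart was computed: 7.76 / 8.812 would give 0.8806 for zstd -1, while the unrounded ratio 7.7638 gives the listed 0.8811.

```python
from __future__ import annotations

UNCOMPRESSED = 2_462_955_520  # bytes: size of corpus.tar (the none/cat row)

# (command, user CPU seconds, compressed bytes) for a few rows of the charts
RESULTS = [
    ("gzip", 57.283, 338_289_587),
    ("bzip3", 205.822, 231_173_201),
    ("zstd -1", 8.812, 317_234_226),
    ("lz4", 5.744, 549_838_913),
    ("xz -9e", 4683.144, 212_748_984),
]


def score(user_seconds: float, compressed: int) -> tuple[float, float]:
    """Return (compression ratio, ratio per user CPU second)."""
    ratio = UNCOMPRESSED / compressed
    return ratio, ratio / user_seconds


# Ascending by score, like the chart: the best trade-off ends up last.
ranked = sorted(RESULTS, key=lambda row: score(row[1], row[2])[1])

for cmd, user_seconds, compressed in ranked:
    ratio, per_second = score(user_seconds, compressed)
    print(f"{cmd:<8} ratio {ratio:5.2f}  ratio/time {per_second:.4f}")
```

The metric rewards speed far more than size: a compressor that is twice as fast but produces a file twice as large scores the same, which is why cat "wins" and is marked nonsensical.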

Conclusion

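lz4 is the fastest compressor… but zstd -1 still kicks butt: for about 1.5 times lz4's user time (8.8 s vs. 5.7 s) it compresses 7.76:1 instead of 4.48:1, and it has the best ratio/time score of any real compressor here. While it doesn't do well on the overall score, bzip3 still provides excellent compression (10.65:1, close to default xz's 10.80:1) in a reasonable amount of time (about 3.4 minutes of user time, versus roughly 25 for default xz).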

Raw output

tmp $ lscpu |grep i5
Model name:                           Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
tmp $ time cat < corpus.tar |wc -c
2462955520

real    0m1.388s
user    0m0.077s
sys 0m1.497s
tmp $ time gzip < corpus.tar |wc -c
338289587

real    0m57.971s
user    0m57.283s
sys 0m0.633s
tmp $ time bzip2 < corpus.tar |wc -c
262857414

real    5m21.280s
user    5m19.847s
sys 0m1.192s
tmp $ time bzip3 < corpus.tar |wc -c
231173201

real    3m26.608s
user    3m25.822s
sys 0m0.712s
tmp $ time zstd < corpus.tar |wc -c
321229917

real    0m11.717s
user    0m12.520s
sys 0m1.278s
tmp $ time xz < corpus.tar |wc -c
228082956

real    6m15.579s
user    24m36.153s
sys 0m1.481s
tmp $ time lz4 < corpus.tar |wc -c
549838913

real    0m2.190s
user    0m5.744s
sys 0m0.833s
tmp $ time lz4 -9 < corpus.tar |wc -c
434543206

real    0m25.151s
user    1m14.670s
sys 0m0.869s
tmp $ time zstd -9 < corpus.tar |wc -c
282940675

real    1m2.564s
user    1m3.019s
sys 0m1.351s
tmp $ time zstd -11 < corpus.tar |wc -c
281894351

real    1m40.556s
user    1m41.278s
sys 0m1.292s
tmp $ time zstd --ultra -22 < corpus.tar |wc -c
230075751

real    122m1.384s
user    121m57.944s
sys 0m2.642s
tmp $ time xz -9e < corpus.tar |wc -c
212748984

real    78m3.870s
user    78m3.144s
sys 0m1.345s
tmp $ 
tmp $ time xz -1 < corpus.tar |wc -c
290137816

real    0m50.878s
user    3m21.569s
sys 0m1.083s
tmp $ time zstd -1 < corpus.tar |wc -c
317234226

real    0m8.282s
user    0m8.812s
sys 0m1.162s
tmp $ time gzip -1 < corpus.tar |wc -c
400956710

real    0m23.496s
user    0m22.682s
sys 0m0.721s
tmp $ time gzip -9 < corpus.tar |wc -c
325547190

real    1m55.453s
user    1m53.047s
sys 0m0.730s
tmp $ time bzip2 -1 < corpus.tar |wc -c
278217711

real    4m16.753s
user    4m15.654s
sys 0m1.376s
tmp $ time bzip2 -9 < corpus.tar |wc -c
262857414

real    5m27.726s
user    5m26.718s
sys 0m1.157s
tmp $ time lz4 -1 < corpus.tar |wc -c
549838913

real    0m2.212s
user    0m5.762s
sys 0m0.832s
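The time column in the charts is the user line from each run above converted to seconds; for xz, 24m36.153s becomes 24 × 60 + 36.153 = 1476.153. A small Python sketch of that conversion, for anyone re-running the benchmark and pasting in their own log. It assumes bash's default time format, which prints minutes and seconds with no hours field (hence 121m57.944s for zstd --ultra -22); time_to_seconds and user_times are hypothetical names.

```python
from __future__ import annotations

import re

# bash's default `time` output prints durations as <minutes>m<seconds>s.
_DURATION = re.compile(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s")


def time_to_seconds(field: str) -> float:
    """Convert a duration such as '24m36.153s' to seconds (about 1476.153)."""
    m = _DURATION.fullmatch(field.strip())
    if m is None:
        raise ValueError(f"unrecognised duration: {field!r}")
    return int(m.group(1) or 0) * 60 + float(m.group(2))


def user_times(log: str) -> list[float]:
    """Collect the 'user' durations from a captured terminal log, in order."""
    return [
        time_to_seconds(line.split()[1])
        for line in log.splitlines()
        if line.startswith("user")
    ]
```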

~~~~~~~~

# 100 Days to Offload 2025 - Day 32