Friday, December 11, 2015

For the love of bits, stop using gzip!

Every time you download a tar.gz, Zod kills a kitten. Every time you generate a tar.gz, well, let's keep this family-safe.

In 2015, there should be very few reasons to generate .tar.gz files. We might as well just use .zip for all the progress we have made since 1992 when gzip was initially released. xz has been a thing since 2009, yet I still see very little adoption of the format. Today, I downloaded Meteor and noticed that it downloads a .tar.gz. I manually downloaded the file and then recompressed it using xz:

$ du -h meteor-bootstrap-os.osx.x86_64.tar.*
139M meteor-bootstrap-os.osx.x86_64.tar.gz
67M meteor-bootstrap-os.osx.x86_64.tar.xz
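
For reference, the recompression itself is a one-liner. Here is a minimal sketch, assuming gzip and xz are both on your PATH; the helper name recompress_to_xz is made up for illustration and is not part of any tool:

```shell
# Recompress a gzip'd tarball as xz without unpacking the archive itself.
# "recompress_to_xz" is a hypothetical helper name; gzip and xz are assumed installed.
recompress_to_xz() {
    src="$1"                  # e.g. some.tar.gz
    dst="${src%.gz}.xz"       # some.tar.gz -> some.tar.xz
    # Decompress to stdout, recompress at the highest preset, print the new name.
    gzip -dc "$src" | xz -9 -c > "$dst" && printf '%s\n' "$dst"
}
```

Running it as recompress_to_xz meteor-bootstrap-os.osx.x86_64.tar.gz and then du -h on both files should give a comparison like the one above.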

Seriously, less than half the size! Maybe it's the amount of time it takes to compress? Let's see:

$ cat meteor-bootstrap-os.osx.x86_64.tar | time xz -9 -c > /dev/null
xz -9 -c > /dev/null 165.21s user 1.14s system 99% cpu 2:47.70 total

$ cat meteor-bootstrap-os.osx.x86_64.tar | time gzip -9 -c > /dev/null
gzip -9 -c > /dev/null 35.03s user 0.23s system 99% cpu 35.583 total

Ok, so compressing takes longer. But you only have to do it once. It's still on the order of reasonable for something that compiles a 600MB tarball in the first place. What about decompressing?

$ time xz -d -c meteor-bootstrap-os.osx.x86_64.tar.xz > /dev/null
4.25s user 0.08s system 99% cpu 4.327 total

$ time gzip -d -c meteor-bootstrap-os.osx.x86_64.tar.gz > /dev/null
1.35s user 0.04s system 99% cpu 1.389 total


... and decompressing takes about three seconds longer. But, wait a second, how long does it take to download the file in the first place? I'm on a decent connection and the file is being hosted on something that delivers the content at an average of (say) 1.5 MB/s. That's about 93 seconds for the .tar.gz and about 45 seconds for the .tar.xz. Since the content is streamed directly to tar (a la: curl ... | tar -xf - ), the slower xz decompression never shows up as a slowdown. We see an overall speedup, because the slowest operation is getting the bits in the first place!
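
To make that reasoning concrete, here is a rough back-of-the-envelope sketch. It assumes download and decompression overlap perfectly when streaming, so the total is simply whichever of the two is slower. The function name streamed_seconds is hypothetical; the sizes come from the du output and the decompression times from the timings earlier in this post. The result scales with whatever rate you actually get.

```shell
# Estimate the wall-clock time of a streamed download-and-decompress.
# Assumption: the two stages overlap fully, so the slower stage dominates.
# $1 = archive size in MB, $2 = download rate in MB/s, $3 = decompression time in s
streamed_seconds() {
    LC_ALL=C awk -v size="$1" -v rate="$2" -v unpack="$3" 'BEGIN {
        download = size / rate
        total = (download > unpack) ? download : unpack
        printf "%.1f\n", total
    }'
}

streamed_seconds 139 1.5 1.389   # .tar.gz
streamed_seconds 67  1.5 4.327   # .tar.xz
```

The decompression time only starts to matter once the download takes less than about four seconds, which for a 67M file means a connection well over ten times faster than mine.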

What about tooling?

OSX: tar -xf some.tar.xz (WORKS!)
Linux: tar -xf some.tar.xz (WORKS!)
Windows: ? (No idea, I haven't touched the platform in a while... should WORK!)
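
Generating the archive is just as easy as unpacking it. A small sketch, assuming a tar built with xz support (both GNU tar and bsdtar accept -J for xz, as far as I know); pack_xz and unpack_xz are names I made up for illustration:

```shell
# Pack a directory as .tar.xz and unpack it again.
# "pack_xz" and "unpack_xz" are hypothetical helper names.
pack_xz() {
    # $1 = source directory, $2 = output archive
    tar -cJf "$2" -C "$1" .
}

unpack_xz() {
    # $1 = archive, $2 = destination directory
    # When reading from a file, tar detects the compression format by itself.
    mkdir -p "$2" && tar -xf "$1" -C "$2"
}
```

One caveat I'm aware of: when the archive arrives on a pipe instead of from a file, GNU tar may not auto-detect the compression, so passing -J explicitly (tar -xJf -) is the safer choice there.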

Why am I picking on Meteor? Well, they place the tagline of "Build apps that are a delight to use, faster than you ever thought possible" right on their homepage. I just ran their install incantation and timed it:

./install.sh 2.32s user 7.67s system 14% cpu 1:10.53 total

70 seconds! Nice job! I must have downloaded it slightly faster than in my initial testing. It also means that the install is extremely limited by download speeds. So ... I can easily imagine this being twice as fast. All that needs to be done is change the compression format and I should be able to install this in 33 seconds!

So, who *does* use xz? kernel.org. Also, the Linux kernel itself optionally supports xz compression of initrd images. Vendors just need to pay attention and turn the flags on. Anyone else want to be part of the elite field of people who use xz? Please?