Speeding up compression with parallel computing

Creating backups and compressing files is always a time-consuming task: for example, creating the daily backup of the Kinja-related databases used to take about 6.5 hours every day. The first part, creating the backup itself, is about a 40-minute job – that's the runtime of innobackupex plus applying the changed logfiles to the database. (I'll write about this later!) The second part is compressing the files before they are copied to the storage server… and this step took about 5.5 hours. I had written this part of the backup script with the old-fashioned compression utility – gzip (and tar, of course!)

# Normally, this is how you compress a tarball on the fly:
tar -czf backup.tar.gz /path/to/backupdir

This is perfectly fine most of the time, but you have to know one thing: gzip uses only a single processor core during the operation, so if you have beefy hardware, you can't even scratch the total throughput your machine is capable of.

So the solution is to parallelize the whole operation.

Here is a good comparison of parallel compression software.

I decided to use pigz, so I modified the relevant part of the backup script like this:

tar -c --use-compress-program=pigz -f backup.tar.gz /path/to/backupdir
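A sketch of how I'd wrap this in practice, with a couple of caveats: pigz defaults to one thread per online core and can be capped with its -p flag, passing arguments inside --use-compress-program requires a reasonably recent GNU tar, and the fallback to plain gzip plus the variable names here are my own additions, not part of the original script:

```shell
#!/bin/sh
# Placeholders – override BACKUP_DIR and ARCHIVE with your real paths.
BACKUP_DIR="${BACKUP_DIR:-/path/to/backupdir}"
ARCHIVE="${ARCHIVE:-backup.tar.gz}"

# Prefer pigz, fall back to gzip if it is not installed.
COMPRESSOR="$(command -v pigz || command -v gzip)"
case "$COMPRESSOR" in
  *pigz) COMPRESSOR="$COMPRESSOR -p 8" ;;  # -p caps pigz's thread count
esac

compress_backup() {
  # pigz writes a standard gzip stream, so the archive still
  # extracts with an ordinary `tar -xzf`.
  tar -c --use-compress-program="$COMPRESSOR" -f "$ARCHIVE" "$BACKUP_DIR"
}
```

Because the output is plain gzip, nothing downstream (restore scripts, `zcat`, `tar -xzf`) needs to change.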

The result can be seen in this article's head image: the compression now completes in about half an hour – that's 11 times faster than before!
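If you want to reproduce the comparison on your own data, a rough benchmark sketch like this works; the function name, output paths, and the seconds-based timing are my own illustrative choices, and the actual speedup depends heavily on your core count and data:

```shell
#!/bin/sh
# Compress the same tree with gzip and (if available) pigz,
# printing wall-clock seconds for each run.
bench_compress() {
  dir="$1"
  start=$(date +%s)
  tar -czf /tmp/bench-gzip.tar.gz "$dir"
  echo "gzip: $(( $(date +%s) - start ))s"
  if command -v pigz >/dev/null 2>&1; then
    start=$(date +%s)
    tar -c --use-compress-program=pigz -f /tmp/bench-pigz.tar.gz "$dir"
    echo "pigz: $(( $(date +%s) - start ))s"
  fi
}
```

On a multi-core machine the pigz run should scale roughly with the number of cores, which is where the ~11x figure above comes from.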

So the rebuild of the developers' database can now finish before the devs start using it. (Hm… I think that will be another post, too.)