fast-archiver tool, useful for pgsql DB backups - Mailing list pgsql-general

From	Mathieu Fenniak
Subject	fast-archiver tool, useful for pgsql DB backups
Date	August 24, 2012 21:48:29
Msg-id	CAHoiPjw9GDk3ab==zPScZnSyWfvZWRQZab4-kaF8tsLW3WvDyg@mail.gmail.com
Responses	Re: fast-archiver tool, useful for pgsql DB backups
List	pgsql-general

Hi pgsql-general,

Has anyone else ever noticed how slow it can be to rsync or tar a pgdata directory with hundreds of thousands or millions of files? I thought this could be done faster with a bit of concurrency, so I wrote a little tool called fast-archiver to do so. My employer (Replicon) has allowed me to release this tool under an open source license, so I wanted to share it with everyone.

fast-archiver is written in Go, and makes use of Go's awesome concurrency capabilities to read and write files in parallel. When you've got lots of small files, this improves throughput considerably.

For a 90 GB PostgreSQL database with over 2,000,000 data files, fast-archiver can create an archive in 27 minutes, compared with 1 hr 23 min for tar.

Piped over an ssh connection, fast-archiver can transfer and write the same dataset on a gigabit network in 1 hr 20 min, compared with 3 hrs for rsync doing the same transfer.

fast-archiver is available at GitHub: https://github.com/replicon/fast-archiver

I hope this is useful to others. :-)

Mathieu

$ time fast-archiver -c -o /dev/null /db/data
skipping symbolic link /db/data/pg_xlog
1008.92user 663.00system 27:38.27elapsed 100%CPU (0avgtext+0avgdata 24352maxresident)k
0inputs+0outputs (0major+1732minor)pagefaults 0swaps

$ time tar -cf - /db/data | cat > /dev/null
tar: Removing leading `/' from member names
tar: /db/data/base/16408/12445.2: file changed as we read it
tar: /db/data/base/16408/12464: file changed as we read it
32.68user 375.19system 1:23:23elapsed 8%CPU (0avgtext+0avgdata 81744maxresident)k
0inputs+0outputs (0major+5163minor)pagefaults 0swaps
