Re: Implementing incremental backup - Mailing list pgsql-hackers

From Ants Aasma
Subject Re: Implementing incremental backup
Date
Msg-id CA+CSw_swVaWb7-tvSh70sdE2564VZZbeoPrD9hBN4K09EvBOwg@mail.gmail.com
In response to Implementing incremental backup  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers
On Wed, Jun 19, 2013 at 1:13 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> I'm thinking of implementing an incremental backup tool for
> PostgreSQL. The use case for the tool would be taking a backup of huge
> database. For that size of database, pg_dump is too slow, even WAL
> archive is too slow/ineffective as well. However even in a TB
> database, sometimes actual modified blocks are not that big, may be
> even several GB. So if we can backup those modified blocks only,
> that would be an effective incremental backup method.

PostgreSQL definitely needs better tools to cope with TB-scale
databases, especially once the ideas for getting rid of anti-wraparound
vacuums materialize and make huge databases more practical.

> For now, my idea is pretty vague.
>
> - Record info about modified blocks. We don't need to remember the
>   whole history of a block if the block was modified multiple times.
>   We just remember that the block was modified since the last
>   incremental backup was taken.
>
> - The info could be obtained by trapping calls to mdwrite() etc. We need
>   to be careful to avoid such blocks used in xlogs and temporary
>   tables to not waste resource.

Unless I'm missing something, the information about modified blocks
can also be obtained by reading WAL, not requiring any modifications
to core.
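
A minimal sketch of the bookkeeping side of this, in Python. It assumes
some WAL reader (not shown, and hypothetical here) already yields one
(relation file, block number) pair per block reference found in a WAL
record. The function name and the path strings are illustrative only.
It keeps just the fact that a block changed, not how many times.

```python
from collections import defaultdict


def collect_modified_blocks(block_refs):
    """Collapse a stream of (relfile, block_no) pairs into the set of
    blocks modified per relation file.

    block_refs is assumed to come from a WAL reader that is not part of
    this sketch; repeated modifications of a block are recorded once.
    """
    modified = defaultdict(set)
    for relfile, block_no in block_refs:
        modified[relfile].add(block_no)
    # Sorted lists make the result easy to turn into sequential reads.
    return {relfile: sorted(blocks) for relfile, blocks in modified.items()}
```

The per-file sets only need to be kept from one incremental backup to
the next, so the memory needed is bounded by the number of distinct
blocks touched in that interval.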

> - If many blocks were modified in a file, we may be able to condense
>   the info as "the whole file was modified" to reduce the amount of
>   info.

You could keep a list of modified block ranges and, when the list gets
too large, merge the ranges that are closest together.
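
A rough sketch of that range list in Python, for one relation file.
The names (normalize, coalesce, max_ranges) are made up for
illustration, and ranges are assumed to be inclusive (first_block,
last_block) pairs. Merging the two neighbours with the smallest gap
over-reports the fewest unmodified blocks at each step.

```python
def normalize(ranges):
    """Sort ranges and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def coalesce(ranges, max_ranges):
    """Return at most max_ranges ranges covering every input block.

    While the list is too long, the two neighbouring ranges separated
    by the smallest gap are merged, so some unmodified blocks between
    them get backed up as well.
    """
    if max_ranges < 1:
        raise ValueError("max_ranges must be at least 1")
    merged = normalize(ranges)
    while len(merged) > max_ranges:
        # Index of the range whose right-hand neighbour is closest.
        i = min(range(len(merged) - 1),
                key=lambda k: merged[k + 1][0] - merged[k][1])
        merged[i:i + 2] = [(merged[i][0], merged[i + 1][1])]
    return merged
```

With max_ranges set to 1 this degenerates into a single range from the
first to the last modified block, which is close to the "whole file
was modified" case from the quoted proposal.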

> - How to take a consistent incremental backup is an issue. I can't
>   think of a clean way other than "locking whole cluster", which is
>   obviously unacceptable. Maybe we should give up "hot backup"?

I don't see why the regular approach wouldn't work here: call
pg_start_backup(), copy out the modified blocks, call pg_stop_backup(),
then copy the WAL needed to recover.

A good feature of the tool would be to apply the incremental backup to
the previous backup while copying out the old blocks, so that you have
the latest full backup available plus the incremental changes needed to
rewind it to the previous version.
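
To illustrate, here is a sketch in Python of applying an incremental
to one file of the previous backup while saving the overwritten
blocks. Everything in it is an assumption for the example: the
function names, the dict-of-blocks format, and a fixed 8192-byte block
size (the PostgreSQL default). A real tool would also have to handle
files that were created, dropped or truncated between backups.

```python
BLOCK_SIZE = 8192  # assumed: PostgreSQL's default block size


def apply_incremental(backup_path, new_blocks):
    """Overwrite blocks in a backed-up file, returning what is needed
    to undo the change.

    new_blocks maps block number -> BLOCK_SIZE bytes of new contents.
    Returns (old_blocks, old_size); old_blocks maps block number to the
    previous contents, or None if the block did not fully exist yet.
    """
    old_blocks = {}
    with open(backup_path, "r+b") as f:
        f.seek(0, 2)  # seek to end to learn the current file size
        old_size = f.tell()
        for block_no in sorted(new_blocks):
            data = new_blocks[block_no]
            if len(data) != BLOCK_SIZE:
                raise ValueError("block %d has wrong size" % block_no)
            offset = block_no * BLOCK_SIZE
            f.seek(offset)
            old = f.read(BLOCK_SIZE)
            old_blocks[block_no] = old if len(old) == BLOCK_SIZE else None
            f.seek(offset)
            f.write(data)
    return old_blocks, old_size


def rewind(backup_path, old_blocks, old_size):
    """Restore the file to its state before apply_incremental()."""
    with open(backup_path, "r+b") as f:
        for block_no, data in old_blocks.items():
            if data is not None:
                f.seek(block_no * BLOCK_SIZE)
                f.write(data)
        # Drop any blocks the incremental appended.
        f.truncate(old_size)
```

The saved old blocks play the role of a reverse delta: keeping one per
incremental run lets you step the full backup back one version at a
time.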

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


