Re: block-level incremental backup - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: block-level incremental backup
Msg-id: CA+Tgmob6qXibHdW4h0=QWgfiStgCHmVrWfczYCwoMu14RO9e5A@mail.gmail.com
In response to: Re: block-level incremental backup (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List: pgsql-hackers
On Wed, Apr 10, 2019 at 10:22 AM Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> Some time ago I implemented an alternative version of the ptrack
> utility (not the one used in pg_probackup) which detects updated
> blocks at the file level. It is very simple, and maybe it could
> someday be integrated into master.

I don't think this is completely crash-safe.  It looks like it
arranges to msync() the ptrack file at appropriate times (although I
haven't exhaustively verified the logic), but it uses MS_ASYNC, so
it's possible that the ptrack file could get updated on disk either
before or after the relation file itself.  I think before is probably
OK -- it just risks having some blocks look modified when they aren't
really -- but after is very much not OK: if the relation block reaches
disk but the map update is lost in a crash, the incremental backup
will silently skip a changed block.  And changing this to use MS_SYNC
would probably be really expensive.  Likely a
better approach would be to hook into the new fsync queue machinery
that Thomas Munro added to PostgreSQL 12.
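For anyone who wants to see the distinction concretely, here is a
minimal standalone sketch (mine, not code from the patch; the
"ptrack.map" filename is just a placeholder) of the two msync() modes
in question:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_BYTES 4096

int
main(void)
{
    int fd = open("ptrack.map", O_RDWR | O_CREAT, 0600);

    if (fd < 0 || ftruncate(fd, MAP_BYTES) < 0)
    {
        perror("open/ftruncate");
        return 1;
    }

    char *map = mmap(NULL, MAP_BYTES, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);

    if (map == MAP_FAILED)
    {
        perror("mmap");
        return 1;
    }

    map[0] = 1;                 /* mark some block as modified */

    /*
     * Cheap: only *schedules* write-back and returns immediately, so
     * the kernel may flush this page before or after the relation
     * data -- no ordering guarantee at all.
     */
    if (msync(map, MAP_BYTES, MS_ASYNC) < 0)
        perror("msync(MS_ASYNC)");

    /*
     * Durable: blocks until the page is on disk, but paying this cost
     * on every map update would likely be prohibitive.
     */
    if (msync(map, MAP_BYTES, MS_SYNC) < 0)
        perror("msync(MS_SYNC)");

    munmap(map, MAP_BYTES);
    close(fd);
    return 0;
}

MS_ASYNC leaving the write-back timing entirely to the kernel is
exactly why the map can reach disk after the relation data.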

It looks like your system maps all the blocks in the system into a
fixed-size map using hashing.  If the number of modified blocks
between the full backup and the incremental backup is large compared
to the size of the ptrack map, you'll start to get a lot of
false-positives.  It will look as if much of the database needs to be
backed up.  For example, in your sample configuration, you have
ptrack_map_size = 1000003. If you've got a 100GB database with 20%
daily turnover, that's about 2.6 million blocks.  If you bump a
random entry ~2.6 million times in a map with 1000003 entries, on
average ~92% of the entries end up getting bumped, so you will get
very little benefit from incremental backup.  This problem drops off
pretty fast if you raise the size of the map, but it's pretty critical
that your map is large enough for the database you've got, or you may
as well not bother.
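If you want to check the arithmetic, here is a throwaway sketch (my
own, not part of the patch) that computes the expected fraction of
entries bumped; compile with -lm:

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double n = 1000003.0;       /* ptrack_map_size entries */
    double k = 2.6e6;           /* distinct block updates per day */

    /*
     * With k random bumps into n slots, the expected fraction of
     * slots left untouched is (1 - 1/n)^k ~= exp(-k/n), so the
     * fraction that looks "modified" is 1 - exp(-k/n).
     */
    printf("fraction bumped: %.3f\n", 1.0 - exp(-k / n));
    return 0;
}

It prints ~0.926, i.e. roughly 92% of the map looks modified.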

It also appears that your system can't really handle resizing of the
map in any friendly way.  So if your data size grows, you may be faced
with either letting the map become progressively less effective, or
throwing it out and losing all the change-tracking history you have
accumulated.
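To spell out why resizing is so unfriendly, here is a toy illustration
(the hash below is my own stand-in, not the one the patch uses): a
block's slot depends on the map size, so bits recorded under the old
size are meaningless under the new one.

#include <stdint.h>
#include <stdio.h>

/* Toy slot function; the real map hashes more than just a blkno. */
static uint64_t
slot(uint64_t blkno, uint64_t map_size)
{
    return (blkno * 2654435761u) % map_size;
}

int
main(void)
{
    uint64_t blkno = 123456;

    /*
     * The same block lands in different slots under different map
     * sizes, so growing the map means discarding the accumulated
     * change history rather than carrying it forward.
     */
    printf("slot at size 1000003: %llu\n",
           (unsigned long long) slot(blkno, 1000003));
    printf("slot at size 2000003: %llu\n",
           (unsigned long long) slot(blkno, 2000003));
    return 0;
}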

None of that is to say that what you're presenting here has no value,
but I think it's possible to do better (and I think we should try).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


