Using base backup exclusion filters to reduce data transferred with pg_rewind - Mailing list pgsql-hackers

From Michael Paquier
Subject Using base backup exclusion filters to reduce data transferred with pg_rewind
Msg-id 20180205071022.GA17337@paquier.xyz
Responses Re: Using base backup exclusion filters to reduce data transferred with pg_rewind
List pgsql-hackers
Hi all,

Many threads have touched on $subject:
[1] https://www.postgresql.org/message-id/CAN-RpxDPE4baiMMJ6TLd6AiUvrG=YrC05tGxrgp4aUutH9j5TQ@mail.gmail.com
[2] https://www.postgresql.org/message-id/7c50423.5ad0.15e8b308b2f.Coremail.chjischj@163.com
[3] https://www.postgresql.org/message-id/1516736993.5599.4.camel@cybertec.at

Thread [2] is a bit different, as it discusses WAL segment data that
ends up being useless; still, it also aims at reducing the amount of
data transferred during a rewind.  I am not tackling that problem in
this patch set; it can be discussed separately.

Attached is a patch set which implements what I have mentioned a couple
of times in those threads: having pg_rewind reuse the same exclusion
filtering rules as base backups.  After a rewind, a node enters
recovery, so a bunch of the data copied during the rewind ends up
being useless.  There have also been many complaints about having to
manually remove replication slot data after every rewind.  With this
patch set, replication slot data gets filtered out, so I believe that
this makes the tool much easier to use.

To get there, I have been hacking the backend code and ended up doing
quite a bit of shuffling around of how system paths are hardcoded in
many places, like pg_replslot, pg_wal, etc.  Just think of all the
paths hardcoded in initdb.c.  So I have introduced a couple of
things:
- src/include/pg_paths.h, a new header which gathers the set of system
file and directory names.  With this facility in place, the rest of the
backend code no longer hardcodes system-related paths, including things
like "base", "global", "pg_tblspc", "global/pg_control", etc.  A fun
consequence of that refactoring is that it is possible to change the
system paths of a PostgreSQL instance with one-line edits to
pg_paths.h.  For example, you could change "base/" to "foo/".  This can
make PostgreSQL more malleable for forks.  It would be simpler to just
hardcode more paths, but I think that this would not be manageable in
the long term, especially if similar logic spreads further.
- src/include/replication/basebackup_paths.h, which moves the exclusion
rules currently in basebackup.c into a header that can be consumed by
both frontends and backends.  This is useful for any backup tool.
- A pg_rewind update to make it use these filters and ensure that they
work correctly.

So the patch set attached is made of the following:
- 0001 gathers all hardcoded system paths into pg_paths.h.  To ease
review, it only switches initdb.c and basebackup.c to use them.
- 0002 spreads the path changes and the use of pg_paths.h across the
core code.
- 0003 moves the last set of definitions, for backup_label,
tablespace_map and pg_internal.init.
- 0004 creates basebackup_paths.h, which can be consumed by pg_rewind.
- 0005 makes the changes for pg_rewind.

0001~0003 can be merged together; I have only split them to ease
review.

I am adding that to the next CF.
Thanks,
--
Michael

