Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog
Date
Msg-id CABUevEwCE=R1rjLYcwJMUoz=EnBFVhT-1TXervFpe8K09s8GdA@mail.gmail.com
Whole thread Raw
In response to [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: [HACKERS] Automatic cleanup of oldest WAL segments withpg_receivexlog
Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog
Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog
List pgsql-hackers

On Thu, Feb 23, 2017 at 7:36 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
Hi all,

When storing WAL segments on a dedicated partition with
pg_receivexlog, for some deployments, the removal of past WAL segments
depends on the frequency of base backups happening on the server. In
short, once a new base backup is taken, it may not be necessary to
keep around those past WAL segments. Of course a lot of things in this
area depend on the retention policy of the deployments. Still, if a
base backup kicks in rather late, there is a risk to have
pg_receivexlog fail because a full disk in the middle of a segment.
Using a replication slot, this would cause Postgres to get down
because of a bloated pg_wal. I am not talking about such cases here :)

On some other types of deployments I work on, one or more standbys are
waiting behind to take over the cluster in case of a failure of the
primary. In such cases, cold backups are still taken and saved
separately. Those are self-contained and can be restored independently
on the rest to put the cluster back on track using a past state.
Archiving using pg_receivexlog still happens to allow the standbys to
catch up if they get offline for a long period of time, something that
may be caused by an outage or simply by the cloning of a new VM that
can take minutes or dozens of minutes to finish deployment. This new
VM can be as well an atomic copy of the primary ready to be used as a
standby. As the primary server may have already recycled the oldest
wal segments in its pg_wal after two checkpoints, archiving plays an
important role in being sure that things can replay successfully. In
short what matters is that the VM cloning does not take longer than 2
checkpoints, but there is no guarantee that the cloning would finish
on time.

In short for such deployments, and contrary to the type of the first
paragraph, we don't care actually care about the past WAL segments, we
do care more about the newest ones (well, mainly about segments older
than the last 2 checkpoints to be correct still having the newest
segments at hand makes replay faster with restore_command with a local
archive). In such cases, I have found useful the possibility to
automatically remove the past WAL segments from the archive partition
if it gets full and allow archiving to move on to the latest data even
if a new base backup has not kicked in to make the past segments
useless.

The idea is really simple, in order to keep the newest WAL history as
large as possible (it is useful for debuggability purposes to exploit
as much as possible the archive partition), we look at the amount free
space available on the partition of the archives when switching to the
next segment, and simply remove as much data as needed to save space
worth one complete segment. Note that compressed WAL segments this may
be several segments removed at once as there is no way to be sure how
much compression will save. The central point of this reasoning is
really to do the decision-making within pg_receivexlog itself as it is
the only point where we know that a new segment is created. So this
makes the cleanup logic independent on the load of Postgres itself.

On any non-Windows systems, statvfs() would be enough to get the
amount of free space available on a partition and it is
posix-compliant. For Windows, there is GetDiskFreeSpace() available.

Is there any interest for a feature like that? I have a non-polished
patch at hand but I can work on that for the upcoming CF if there are
voices in favor of such a feature. The feature could be simply
activated with a dedicated switch, like --clean-oldest-wal,
--clean-tail-wal, or something like that.

I'm not sure this logic belongs in pg_receivexlog. If we put the decision making there, then we lock ourselves into one "type of policy".

Wouldn't this one, along with some other scenarios, be better provided by the "run command at end of segment" function that we've talked about before? And then that external command could implement whatever aging logic would be appropriate for the environment?


--

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: [HACKERS] Documentation improvements for partitioning
Next
From: Simon Riggs
Date:
Subject: Re: [HACKERS] Documentation improvements for partitioning