Thread: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

[HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Michael Paquier

Date:

23 February 2017, 09:36:57

Hi all,

When storing WAL segments on a dedicated partition with
pg_receivexlog, for some deployments, the removal of past WAL segments
depends on the frequency of base backups happening on the server. In
short, once a new base backup is taken, it may not be necessary to
keep around those past WAL segments. Of course a lot of things in this
area depend on the retention policy of the deployments. Still, if a
base backup kicks in rather late, there is a risk of pg_receivexlog
failing because of a full disk in the middle of a segment.
Using a replication slot, this would cause Postgres to go down
because of a bloated pg_wal. I am not talking about such cases here :)

On some other types of deployments I work on, one or more standbys are
waiting behind to take over the cluster in case of a failure of the
primary. In such cases, cold backups are still taken and saved
separately. Those are self-contained and can be restored independently
of the rest to put the cluster back on track using a past state.
Archiving using pg_receivexlog still happens to allow the standbys to
catch up if they go offline for a long period of time, something that
may be caused by an outage or simply by the cloning of a new VM that
can take minutes or tens of minutes to finish deployment. This new
VM can also be an atomic copy of the primary ready to be used as a
standby. As the primary server may have already recycled the oldest
WAL segments in its pg_wal after two checkpoints, archiving plays an
important role in being sure that things can replay successfully. In
short what matters is that the VM cloning does not take longer than 2
checkpoints, but there is no guarantee that the cloning would finish
on time.

In short, for such deployments, and contrary to the type described in
the first paragraph, we don't actually care about the past WAL segments;
we care more about the newest ones (well, mainly about segments older
than the last 2 checkpoints, to be correct; still, having the newest
segments at hand makes replay faster with restore_command with a local
archive). In such cases, I have found it useful to be able to
automatically remove the past WAL segments from the archive partition
if it gets full, and allow archiving to move on to the latest data even
if a new base backup has not kicked in to make the past segments
useless.

The idea is really simple: in order to keep the newest WAL history as
large as possible (it is useful for debuggability purposes to exploit
as much of the archive partition as possible), we look at the amount of
free space available on the partition of the archives when switching to
the next segment, and simply remove as much data as needed to save space
worth one complete segment. Note that with compressed WAL segments this
may mean several segments removed at once, as there is no way to be sure
how much compression will save. The central point of this reasoning is
really to do the decision-making within pg_receivexlog itself, as it is
the only point where we know that a new segment is created. So this
makes the cleanup logic independent of the load of Postgres itself.

On any non-Windows system, statvfs() would be enough to get the
amount of free space available on a partition, and it is
POSIX-compliant. For Windows, there is GetDiskFreeSpace() available.
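
For illustration only, here is a minimal sketch of the cleanup loop described
above, written in Python for brevity rather than in the C of the patch (which
is not shown in this thread). The names make_room, free_bytes and
oldest_first are hypothetical, the 16MB segment size is the default and an
assumption, and sorting by file name only approximates age when several
timelines are present. os.statvfs() is the Unix-only counterpart of the
statvfs() call mentioned above.

```python
import os
import re

# Assumption: default 16MB segments, named with 24 uppercase hex digits,
# optionally gzip-compressed. Partial segments are deliberately not matched.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
SEGMENT_RE = re.compile(r"^[0-9A-F]{24}(\.gz)?$")


def free_bytes(path):
    """Free space available to unprivileged users on the partition of path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def oldest_first(archive_dir):
    """Completed segments in the archive, oldest first by file name."""
    return sorted(n for n in os.listdir(archive_dir) if SEGMENT_RE.match(n))


def make_room(archive_dir, needed=WAL_SEGMENT_SIZE, get_free=free_bytes):
    """Remove the oldest segments until `needed` bytes are free.

    Meant to be called when switching to a new segment. Returns the list
    of removed file names. `get_free` is injectable so the policy can be
    exercised without filling up a real partition.
    """
    removed = []
    for name in oldest_first(archive_dir):
        if get_free(archive_dir) >= needed:
            break
        os.remove(os.path.join(archive_dir, name))
        removed.append(name)
    return removed
```

With compressed segments the loop may remove several files before enough
space for one full segment is available, which matches the behavior described
above.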

Is there any interest in a feature like that? I have a non-polished
patch at hand but I can work on that for the upcoming CF if there are
voices in favor of such a feature. The feature could be simply
activated with a dedicated switch, like --clean-oldest-wal,
--clean-tail-wal, or something like that.

Thanks,
-- 
Michael

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Stephen Frost

Date:

23 February 2017, 16:54:03

Michael,

* Michael Paquier (michael.paquier@gmail.com) wrote:
> Is there any interest for a feature like that? I have a non-polished
> patch at hand but I can work on that for the upcoming CF if there are
> voices in favor of such a feature. The feature could be simply
> activated with a dedicated switch, like --clean-oldest-wal,
> --clean-tail-wal, or something like that.

This sounds interesting, though I wouldn't base it on the amount of free
space on the partition but rather some user-set value (eg:
--max-archive-size=50GB or something).
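
As a rough sketch of what a user-set budget could look like (Python, for
illustration only): --max-archive-size is just the option name floated above,
not an existing switch, and parse_size and prune_to_budget are hypothetical
helpers. Only completed, uncompressed segments are counted and removed here,
which is an assumption of this sketch.

```python
import os
import re

UNITS = {"kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}
SEGMENT_RE = re.compile(r"[0-9A-F]{24}")


def parse_size(text):
    """Turn a value such as '50GB' into a number of bytes."""
    m = re.fullmatch(r"(\d+)\s*(kB|MB|GB|TB)?", text.strip())
    if m is None:
        raise ValueError("invalid size: %r" % text)
    return int(m.group(1)) * UNITS.get(m.group(2), 1)


def prune_to_budget(archive_dir, budget):
    """Remove the oldest segments until the archive fits within `budget`."""
    names = sorted(n for n in os.listdir(archive_dir) if SEGMENT_RE.fullmatch(n))
    sizes = {n: os.path.getsize(os.path.join(archive_dir, n)) for n in names}
    total = sum(sizes.values())
    removed = []
    for name in names:
        if total <= budget:
            break
        os.remove(os.path.join(archive_dir, name))
        total -= sizes[name]
        removed.append(name)
    return removed
```

Compared with a free-space check, this keeps the policy independent of
whatever else shares the partition.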

I am a bit dubious about it in general though.  WAL that you don't have
a base backup for or a replica which needs it is really of very limited
value.  I understand your suggestion that it could be used for
'debugging', but that really seems like a stretch to me.  I would also
be concerned that people would set up their systems using this without
fully understanding it, or being prepared to handle what happens when it
kicks in and starts removing WAL that maybe they should have kept for a
base backup or a replica.  At least if we start failing when the
partition is full then they have alerts telling them that the partition
is full and they have a base backup and WAL to bring it forward to
almost current.

Thanks!

Stephen

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Magnus Hagander

Date:

23 February 2017, 19:10:39

On Thu, Feb 23, 2017 at 7:36 AM, Michael Paquier <michael.paquier@gmail.com> wrote:

> Hi all,
>
> When storing WAL segments on a dedicated partition with
> pg_receivexlog, for some deployments, the removal of past WAL segments
> depends on the frequency of base backups happening on the server. In
> short, once a new base backup is taken, it may not be necessary to
> keep around those past WAL segments. Of course a lot of things in this
> area depend on the retention policy of the deployments. Still, if a
> base backup kicks in rather late, there is a risk of pg_receivexlog
> failing because of a full disk in the middle of a segment.
> Using a replication slot, this would cause Postgres to go down
> because of a bloated pg_wal. I am not talking about such cases here :)
>
> On some other types of deployments I work on, one or more standbys are
> waiting behind to take over the cluster in case of a failure of the
> primary. In such cases, cold backups are still taken and saved
> separately. Those are self-contained and can be restored independently
> of the rest to put the cluster back on track using a past state.
> Archiving using pg_receivexlog still happens to allow the standbys to
> catch up if they go offline for a long period of time, something that
> may be caused by an outage or simply by the cloning of a new VM that
> can take minutes or tens of minutes to finish deployment. This new
> VM can also be an atomic copy of the primary ready to be used as a
> standby. As the primary server may have already recycled the oldest
> WAL segments in its pg_wal after two checkpoints, archiving plays an
> important role in being sure that things can replay successfully. In
> short what matters is that the VM cloning does not take longer than 2
> checkpoints, but there is no guarantee that the cloning would finish
> on time.
>
> In short, for such deployments, and contrary to the type described in
> the first paragraph, we don't actually care about the past WAL segments;
> we care more about the newest ones (well, mainly about segments older
> than the last 2 checkpoints, to be correct; still, having the newest
> segments at hand makes replay faster with restore_command with a local
> archive). In such cases, I have found it useful to be able to
> automatically remove the past WAL segments from the archive partition
> if it gets full, and allow archiving to move on to the latest data even
> if a new base backup has not kicked in to make the past segments
> useless.
>
> The idea is really simple: in order to keep the newest WAL history as
> large as possible (it is useful for debuggability purposes to exploit
> as much of the archive partition as possible), we look at the amount of
> free space available on the partition of the archives when switching to
> the next segment, and simply remove as much data as needed to save space
> worth one complete segment. Note that with compressed WAL segments this
> may mean several segments removed at once, as there is no way to be sure
> how much compression will save. The central point of this reasoning is
> really to do the decision-making within pg_receivexlog itself, as it is
> the only point where we know that a new segment is created. So this
> makes the cleanup logic independent of the load of Postgres itself.
>
> On any non-Windows system, statvfs() would be enough to get the
> amount of free space available on a partition, and it is
> POSIX-compliant. For Windows, there is GetDiskFreeSpace() available.
>
> Is there any interest in a feature like that? I have a non-polished
> patch at hand but I can work on that for the upcoming CF if there are
> voices in favor of such a feature. The feature could be simply
> activated with a dedicated switch, like --clean-oldest-wal,
> --clean-tail-wal, or something like that.

I'm not sure this logic belongs in pg_receivexlog. If we put the decision making there, then we lock ourselves into one "type of policy".

Wouldn't this one, along with some other scenarios, be better provided by the "run command at end of segment" function that we've talked about before? And then that external command could implement whatever aging logic would be appropriate for the environment?

Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Jim Nasby

Date:

23 February 2017, 23:37:41

On 2/23/17 10:10 AM, Magnus Hagander wrote:
> Wouldn't this one, along with some other scenarios, be better provided
> by the "run command at end of segment" function that we've talked about
> before? And then that external command could implement whatever aging
> logic would be appropriate for the environment?

That kind of API led to difficulties with archiving directly from the
database, so I'm not sure it's the best way to go.

ISTM what's really needed is a good way for users to handle retention 
for both WAL as well as base backups. A tool that did that would need to 
understand what WAL is required to safely restore a base backup. It 
should be possible for users to have a separate retention policy for 
just base backups as well as backups that support full PITR. You'd also 
need an easy way to deal with date ranges (so you can do things like 
"delete all backups more than 1 year old").
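
To make the "what WAL is required" part concrete, here is a small hedged
sketch in Python. It assumes the archive contains backup history files named
like 000000010000000000000010.00000028.backup, where the first 24 characters
are the segment in which the base backup started. The helper names
start_segment and removable_segments are hypothetical, and ignoring the
timeline part of the name when comparing is a simplification chosen for this
sketch.

```python
import re

SEGMENT_RE = re.compile(r"[0-9A-F]{24}")
BACKUP_RE = re.compile(r"([0-9A-F]{24})\.[0-9A-F]{8}\.backup")


def start_segment(backup_history_name):
    """First WAL segment needed to restore the given base backup."""
    m = BACKUP_RE.fullmatch(backup_history_name)
    if m is None:
        raise ValueError("not a backup history file: %r" % backup_history_name)
    return m.group(1)


def removable_segments(archived, oldest_backup_history_name):
    """Segments older than anything the oldest kept backup needs.

    Only the log/segment part of the name (after the 8-digit timeline
    prefix) is compared, so timeline history files and backup history
    files are never returned.
    """
    key = start_segment(oldest_backup_history_name)[8:]
    return sorted(n for n in archived
                  if SEGMENT_RE.fullmatch(n) and n[8:] < key)
```

A date-based policy would additionally need a trustworthy timestamp per
backup, which this sketch does not try to provide.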

Perhaps a good starting point would be a tool that lets you list what 
base backups you have, what WAL those backups need, when the backups 
were taken, etc.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Michael Paquier

Date:

24 February 2017, 03:29:54

On Thu, Feb 23, 2017 at 10:54 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Michael,
>
> * Michael Paquier (michael.paquier@gmail.com) wrote:
>> Is there any interest for a feature like that? I have a non-polished
>> patch at hand but I can work on that for the upcoming CF if there are
>> voices in favor of such a feature. The feature could be simply
>> activated with a dedicated switch, like --clean-oldest-wal,
>> --clean-tail-wal, or something like that.
>
> This sounds interesting, though I wouldn't base it on the amount of free
> space on the partition but rather some user-set value (eg:
> --max-archive-size=50GB or something).

Doable. This part is not that hard to use for pg_receivexlog kicked off
as a service if the parameter passed is an environment variable.

> I am a bit dubious about it in general though.  WAL that you don't have
> a base backup for or a replica which needs it is really of very limited
> value.  I understand your suggestion that it could be used for
> 'debugging', but that really seems like a stretch to me.

Putting your hands on what happens in the database at page level
helps. I have used that once to look at page-level data to see that
the page actually got corrupted by an incorrect failover flow (page
got flushed and this node was incorrectly reused as a standby
afterwards).

> I would also
> be concerned that people would set up their systems using this without
> fully understanding it, or being prepared to handle what happens when it
> kicks in and starts removing WAL that maybe they should have kept for a
> base backup or a replica. At least if we start failing when the
> partition is full then they have alerts telling them that the partition
> is full and they have a base backup and WAL to bring it forward to
> almost current.

Sure. Documentation is really key here. No approach is perfect;
each one has its own value. I am of course not suggesting enabling any
of that by default. In my case not getting a failure because of a full
partition mattered more.
-- 
Michael

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Michael Paquier

Date:

24 February 2017, 05:47:28

On Fri, Feb 24, 2017 at 5:37 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> ISTM what's really needed is a good way for users to handle retention for
> both WAL as well as base backups. A tool that did that would need to
> understand what WAL is required to safely restore a base backup. It should
> be possible for users to have a separate retention policy for just base
> backups as well as backups that support full PITR. You'd also need an easy
> way to deal with date ranges (so you can do things like "delete all backups
> more than 1 year old").

Anything other than a limit measured in bytes requires either a lookup
of the file timestamp, which is not reliable with noatime, or a lookup
into the WAL itself to find the commit timestamp that matches the oldest
point in time of the backup policy. That could be made efficient
with an archive command. With pg_receivexlog you could make use
of the end-segment command to scan the completely written segment for
this data before moving on to the next one. At least it gives an
argument for having such a command. David Steele mentioned that he
could make use of such a thing.

> Perhaps a good starting point would be a tool that lets you list what base
> backups you have, what WAL those backups need, when the backups were taken,
> etc.

pg_rman? barman?
-- 
Michael

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Michael Paquier

Date:

24 February 2017, 05:52:16

On Fri, Feb 24, 2017 at 1:10 AM, Magnus Hagander <magnus@hagander.net> wrote:
> I'm not sure this logic belongs in pg_receivexlog. If we put the decision
> making there, then we lock ourselves into one "type of policy".
>
> Wouldn't this one, along with some other scenarios, be better provided by
> the "run command at end of segment" function that we've talked about before?
> And then that external command could implement whatever aging logic would be
> appropriate for the environment?

OK, I forgot a bit about this past discussion. So let's say that we
have a command: why not also allow users to use, at will, a marker %f to
indicate the name of the file just completed? One use case here is to scan
the file for the oldest and/or newest timestamps of the segment just
finished, to implement some retention policy with something else in
charge of the cleanup.

The option name would be --end-segment-command? Any better ideas for names?
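
A hedged sketch of how such a placeholder could be expanded, in Python for
illustration: %f becomes the file name and %% a literal percent sign, as
archive_command does. The function names expand_command and
run_end_segment_command are hypothetical, and passing the result to a shell
is an assumption of this sketch, not a description of the patch.

```python
import subprocess


def expand_command(template, filename):
    """Replace %f with the file name and %% with a literal '%'.

    Any other use of '%' is left untouched.
    """
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "f":
                out.append(filename)
                i += 2
                continue
            if nxt == "%":
                out.append("%")
                i += 2
                continue
        out.append(ch)
        i += 1
    return "".join(out)


def run_end_segment_command(template, filename):
    """Run the expanded command through the shell; return its exit code."""
    return subprocess.run(expand_command(template, filename), shell=True).returncode
```

For example, a template such as "my_cleanup_script %f" (a made-up script
name) would be invoked once per completed file, leaving the retention policy
entirely to that script.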
-- 
Michael

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Jim Nasby

Date:

24 February 2017, 05:56:03

On 2/23/17 8:47 PM, Michael Paquier wrote:
> Anything else than measured in bytes either requires a lookup at the
> file timestamp, which is not reliable with noatime or a lookup at WAL
> itself to decide when is the commit timestamp that matches the oldest
> point in time of the backup policy.

An indication that it'd be nice to have a better way to store this 
information as part of a base backup, or the archived WAL files.

> That could be made performance
> wise with an archive command. With pg_receivexlog you could make use
> of the end-segment command to scan the completely written segment for
> this data before moving on to the next one. At least it gives an
> argument for having such a command. David Steele mentioned that he
> could make use of such a thing.

BTW, I'm not opposed to an end-segment command; I'm just saying I don't 
think having it would really help users very much.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Jim Nasby

Date:

24 February 2017, 05:58:11

On 2/23/17 8:52 PM, Michael Paquier wrote:
> OK, I forgot a bit about this past discussion. So let's say that we
> have a command, why not also allow users to use at will a marker %f to
> indicate the file name just completed? One use case here is to scan
> the file for the oldest and/or newest timestamps of the segment just
> finished to do some retention policy with something else in charge of
> the cleanup.

Why not provide % replacements that contain that info? pg_receivexlog 
has a much better shot at doing that correctly than some random user tool...

(%f could certainly be useful for other things)
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Michael Paquier

Date:

24 February 2017, 06:01:12

On Fri, Feb 24, 2017 at 11:56 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 2/23/17 8:47 PM, Michael Paquier wrote:
>>
>> Anything else than measured in bytes either requires a lookup at the
>> file timestamp, which is not reliable with noatime or a lookup at WAL
>> itself to decide when is the commit timestamp that matches the oldest
>> point in time of the backup policy.
>
> An indication that it'd be nice to have a better way to store this
> information as part of a base backup, or the archived WAL files.

An idea here would be to add in the long header of the segment a
timestamp of when it was created. This is inherent to only the server
generating the WAL.

>> That could be made performance
>> wise with an archive command. With pg_receivexlog you could make use
>> of the end-segment command to scan the completely written segment for
>> this data before moving on to the next one. At least it gives an
>> argument for having such a command. David Steele mentioned that he
>> could make use of such a thing.
>
> BTW, I'm not opposed to an end-segment command; I'm just saying I don't
> think having it would really help users very much.

Thanks. Yes, it's hard to come up with something here that would
satisfy enough users without imposing much of a maintenance penalty.
-- 
Michael

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Jim Nasby

Date:

24 February 2017, 06:12:57

On 2/23/17 9:01 PM, Michael Paquier wrote:
> An idea here would be to add in the long header of the segment a
> timestamp of when it was created. This is inherent to only the server
> generating the WAL.

ISTM it'd be reasonable (maybe even wise) for WAL files to contain info 
about the first and last LSN, commit xid, timestamps, etc.

>>> That could be made performance
>>> wise with an archive command. With pg_receivexlog you could make use
>>> of the end-segment command to scan the completely written segment for
>>> this data before moving on to the next one. At least it gives an
>>> argument for having such a command. David Steele mentioned that he
>>> could make use of such a thing.
>> BTW, I'm not opposed to an end-segment command; I'm just saying I don't
>> think having it would really help users very much.
> Thanks. Yes that's hard to come up here with something that would
> satisfy enough users without giving much maintenance penalty.

Yeah, I think it'd be a decent (though hopefully not huge) amount of work.

As I see it, we got away for years with no replication, but eventually 
realized that we were really leaving a hole in our capabilities by not 
having built-in binary rep. I think we're nearing a similar point with 
handling PITR backups. People have written some great tools to help with 
this, but at some point (PG 11? 13?) there should probably be some 
strong included tools.

I suspect that a huge improvement on the internal tools could be had for 
1/2 or less the effort that's been spent on all the external ones. Of 
course, much of that is because the external tools have helped prove out 
what works and what doesn't.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Magnus Hagander

Date:

25 February 2017, 16:31:12

On Fri, Feb 24, 2017 at 3:56 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

> On 2/23/17 8:47 PM, Michael Paquier wrote:
>> Anything else than measured in bytes either requires a lookup at the
>> file timestamp, which is not reliable with noatime or a lookup at WAL
>> itself to decide when is the commit timestamp that matches the oldest
>> point in time of the backup policy.
>
> An indication that it'd be nice to have a better way to store this information as part of a base backup, or the archived WAL files.
>
>> That could be made performance
>> wise with an archive command. With pg_receivexlog you could make use
>> of the end-segment command to scan the completely written segment for
>> this data before moving on to the next one. At least it gives an
>> argument for having such a command. David Steele mentioned that he
>> could make use of such a thing.
>
> BTW, I'm not opposed to an end-segment command; I'm just saying I don't think having it would really help users very much.

It might not help end users directly, but it could certainly help tools-developers. At least that's what I'd think.

Magnus Hagander

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Magnus Hagander

Date:

25 February 2017, 16:32:57

On Fri, Feb 24, 2017 at 3:52 AM, Michael Paquier <michael.paquier@gmail.com> wrote:

> On Fri, Feb 24, 2017 at 1:10 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> I'm not sure this logic belongs in pg_receivexlog. If we put the decision
>> making there, then we lock ourselves into one "type of policy".
>>
>> Wouldn't this one, along with some other scenarios, be better provided by
>> the "run command at end of segment" function that we've talked about before?
>> And then that external command could implement whatever aging logic would be
>> appropriate for the environment?
>
> OK, I forgot a bit about this past discussion. So let's say that we
> have a command, why not also allow users to use at will a marker %f to
> indicate the file name just completed? One use case here is to scan
> the file for the oldest and/or newest timestamps of the segment just
> finished to do some retention policy with something else in charge of
> the cleanup.
>
> The option name would be --end-segment-command? Any better ideas of names?

Oh, I definitely think such a command should be able to take a placeholder like %f telling which segment it has just processed. In fact, I'd consider it one of the most important features of it :)

Magnus Hagander

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Michael Paquier

Date:

25 February 2017, 17:00:11

On Sat, Feb 25, 2017 at 10:32 PM, Magnus Hagander <magnus@hagander.net> wrote:
> Oh, I definitely think such a command should be able to take a placeholder
> like %f telling which segment it has just processed. In fact, I'd consider
> it one of the most important features of it :)

I cannot think of any other meaningful variables, can you?
-- 
Michael

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Magnus Hagander

Date:

25 February 2017, 18:41:51

On Feb 25, 2017 15:00, "Michael Paquier" <michael.paquier@gmail.com> wrote:

> On Sat, Feb 25, 2017 at 10:32 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> Oh, I definitely think such a command should be able to take a placeholder
>> like %f telling which segment it has just processed. In fact, I'd consider
>> it one of the most important features of it :)
>
> I cannot think about any other meaningful variables, do you?

Not offhand. But one thing that could go to the question of parameter name: what if we finish something that's not a segment? During a timeline switch, for example, we also get other files, don't we? We probably want to trigger at least some command in that case, either with an argument or by a different parameter?

/Magnus

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Michael Paquier

Date:

26 February 2017, 02:24:04

On Sun, Feb 26, 2017 at 12:41 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Feb 25, 2017 15:00, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>
> On Sat, Feb 25, 2017 at 10:32 PM, Magnus Hagander <magnus@hagander.net>
> wrote:
>> Oh, I definitely think such a command should be able to take a placeholder
>> like %f telling which segment it has just processed. In fact, I'd consider
>> it one of the most important features of it :)
>
> I cannot think about any other meaningful variables, do you?
>
>
> Not offhand. But one thing that could go to the question of parameter name -
> what if we finish something that's not a segment. During a time line switch
> for example, we also get other files don't we? We probably want to trigger
> at least some command in that case - either with an argument or by a
> different parameter?

To be consistent with archive_command and restore_command I'd rather
not do that. The command called can decide by itself what to do by
looking at the shape of the argument string.
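
As an illustration of deciding by the shape of the argument, here is a
minimal Python sketch. It assumes the usual naming conventions: 24 hex
digits for a completed segment, the same with a .partial suffix for a
segment still being written, and 8 hex digits followed by .history for a
timeline history file. The function name classify and the returned labels
are hypothetical.

```python
import re

# Ordered list of (label, pattern); the first full match wins.
_PATTERNS = [
    ("segment", re.compile(r"[0-9A-F]{24}")),
    ("partial", re.compile(r"[0-9A-F]{24}\.partial")),
    ("history", re.compile(r"[0-9A-F]{8}\.history")),
]


def classify(name):
    """Return 'segment', 'partial', 'history' or 'other' for a file name."""
    for kind, pattern in _PATTERNS:
        if pattern.fullmatch(name):
            return kind
    return "other"
```

A command receiving %f could then, for instance, apply retention logic only
to "segment" files and simply copy "history" files elsewhere.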
-- 
Michael

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Robert Haas

Date:

26 February 2017, 12:46:08

On Thu, Feb 23, 2017 at 9:40 PM, Magnus Hagander <magnus@hagander.net> wrote:
> I'm not sure this logic belongs in pg_receivexlog. If we put the decision
> making there, then we lock ourselves into one "type of policy".

That's not really true.  We can add other policies - or extensibility
- later.  A more accurate statement, ISTM, would be that initially we
only support one type of policy.  But that's fine; more can be added
later.

> Wouldn't this one, along with some other scenarios, be better provided by
> the "run command at end of segment" function that we've talked about before?
> And then that external command could implement whatever aging logic would be
> appropriate for the environment?

I don't think it's bad to have that, but I don't understand the
resistance to having a policy that by default lets us keep as much WAL
as will fit within our space budget.  That seems like an eminently
sensible thing to want.  Ideally, I'd like to be able to recover any
backup, however old, from that point forward to a point of my
choosing.  But if I run out of disk space, removing the oldest WAL
files I have is more sensible than not accepting new ones.  Sure, I'll
have less ability to go back in time, but I'm less likely to need the
data from 3 or 4 backups ago than I am to need the most recent data.
I'm only going to go back to an older backup if I can't recover from
the most recent one, or if I need some data that was removed or
corrupted some time ago.  It's good to have that ability for as long
as it is sustainable, but when I have to pick, I want the new stuff.

I think we're actually desperately in need of smarter WAL management
tools in core not just in this respect but in a whole bunch of places,
and I think size is an excellent thing for those tools to be
considering.  When Heikki implemented min_wal_size and max_wal_size
(88e982302684246e8af785e78a467ac37c76dee9, February 2015) there was
quite a bit of discussion about how nice it would be to have a HARD
limit on WAL size rather than a soft limit.  When the system gets too
close to the hard limit, processes trying to write WAL slow or stop
until a checkpoint can be completed, allowing for the removal of WAL.
Heroku also previously advocated for such a system, to replace their
ad-hoc system of SIGSTOPping backends for short periods of time (!) to
accomplish the same thing.  When replication slots were added
(858ec11858a914d4c380971985709b6d6b7dd6fc, January 2014) we talked
about how nice it would be if there were a facility to detect when a
replication slot (or combination of slots) was forcing the retention
of too much WAL and, when some threshold is exceeded, disable WAL
retention for those slots to prevent disk space exhaustion.  When
pg_archivecleanup was added (ca65f2190ae20b8bba9aa66e4cab1982b95d109f,
24bfbb5857a1e7ae227b526e64e540752c3b1fe3, June 2010) it documented
that it was really only smart enough to handle the case of an archive
for the benefit of a single standby, and it didn't do anything to help
you if that one standby got far enough behind to fill up the disk.  In
all of these cases, we're still waiting for something smarter to come
along.  This is an enormous practical problem.  "pg_xlog filled up" is
a reasonably common cause of production outages, and "archive
directory filled up" is a reasonably common cause of "pg_xlog filled
up".  I don't mind having a mode where we give the user the tools with
which to build their own solution to these problems, but we shouldn't
ignore the likelihood that many people will want the same
policies, and I'd rather have those commonly-used policies
well-implemented in core than implemented at highly varying levels of
quality in individual installations.

All IMHO, of course...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Michael Paquier

Date:

27 February 2017, 00:48:01

On Fri, Feb 24, 2017 at 11:58 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> Why not provide % replacements that contain that info? pg_receivexlog has a
> much better shot at doing that correctly than some random user tool...
>
> (%f could certainly be useful for other things)

(was unfortunately sent off-list, thanks Jim!)
%f matches what archive_command and restore_command use, so for
consistency this makes sense, at least to me. And I cannot believe that
all users will actually need this argument; see my use case upthread.
-- 
Michael

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Michael Paquier

Date:

27 February 2017, 08:32:00

On Sun, Feb 26, 2017 at 8:24 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> To be consistent with archive_command and restore_command I'd rather
> not do that. The command called can decide by itself what to do by
> looking at the shape of the argument string.

Just before the CF begins, I have taken some time to build up a patch
that implements this --end-segment-command, with %f as placeholder.
The patch is registered here:
https://commitfest.postgresql.org/13/1040/
Comments are welcome.
-- 
Michael

Attachment

pgreceivewal-endseg-cmd.patch

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Magnus Hagander

Date:

27 February 2017, 17:50:51

On Sun, Feb 26, 2017 at 12:24 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Sun, Feb 26, 2017 at 12:41 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Feb 25, 2017 15:00, "Michael Paquier" <michael.paquier@gmail.com> wrote:
>>
>> On Sat, Feb 25, 2017 at 10:32 PM, Magnus Hagander <magnus@hagander.net>
>> wrote:
>>> Oh, I definitely think such a command should be able to take a placeholder
>>> like %f telling which segment it has just processed. In fact, I'd consider
>>> it one of the most important features of it :)
>>
>> I cannot think about any other meaningful variables, do you?
>>
>>
>> Not offhand. But one thing that could go to the question of parameter name -
>> what if we finish something that's not a segment. During a time line switch
>> for example, we also get other files don't we? We probably want to trigger
>> at least some command in that case - either with an argument or by a
>> different parameter?
>
> To be consistent with archive_command and restore_command I'd rather
> not do that. The command called can decide by itself what to do by
> looking at the shape of the argument string.

Not do which one -- trigger the command at all? archive_command triggers on non-segment files, does it not?

If we want to trigger it with other files as well, then it shouldn't be called --end-segment-command, should it?
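
As an illustration of "looking at the shape of the argument string", here is a minimal Python sketch of a called command telling WAL segments apart from other files by name alone. It assumes the usual naming conventions: 24 uppercase hex digits for a segment, the same plus ".partial" for an unfinished one, and 8 hex digits plus ".history" for a timeline history file. The function name classify is hypothetical.

```python
import re

# Naming conventions assumed here; anything else is reported as "other".
_SEGMENT = re.compile(r"[0-9A-F]{24}")
_PARTIAL = re.compile(r"[0-9A-F]{24}\.partial")
_HISTORY = re.compile(r"[0-9A-F]{8}\.history")


def classify(filename: str) -> str:
    """Return 'segment', 'partial', 'history' or 'other' for a file name."""
    if _SEGMENT.fullmatch(filename):
        return "segment"
    if _PARTIAL.fullmatch(filename):
        return "partial"
    if _HISTORY.fullmatch(filename):
        return "history"
    return "other"
```

A command receiving %f could branch on this result, for example acting only on completed segments and ignoring timeline history files.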

Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Magnus Hagander

Date:

27 February 2017, 17:59:00

On Sun, Feb 26, 2017 at 10:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:

> On Thu, Feb 23, 2017 at 9:40 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> I'm not sure this logic belongs in pg_receivexlog. If we put the decision
>> making there, then we lock ourselves into one "type of policy".
>
> That's not really true. We can add other policies - or extensibility
> - later. A more accurate statement, ISTM, would be that initially we
> only support one type of policy. But that's fine; more can be added
> later.

Certainly. In that case, though, we need to plan far enough ahead to make sure the interface, in the form of what the parameters and such are called, is prepared for that.

We also have to decide if we want pg_receivexlog to be an actual "backup manager" that way, or whether it should provide the plumbing for something else to implement it on top of. For example, it seems reasonable that for *most* use cases you want something that manages both the log archive and the base backups together, since they have dependencies. And should that tool really be pg_receivexlog?

>> Wouldn't this one, along with some other scenarios, be better provided by
>> the "run command at end of segment" function that we've talked about before?
>> And then that external command could implement whatever aging logic would be
>> appropriate for the environment?
>
> I don't think it's bad to have that, but I don't understand the
> resistance to having a policy that by default lets us keep as much WAL
> as will fit within our space budget. That seems like an eminently
> sensible thing to want. Ideally, I'd like to be able to recover any
> backup, however old, from that point forward to a point of my
> choosing. But if I run out of disk space, removing the oldest WAL
> files I have is more sensible than not accepting new ones. Sure, I'll
> have less ability to go back in time, but I'm less likely to need the
> data from 3 or 4 backups ago than I am to need the most recent data.
> I'm only going to go back to an older backup if I can't recover from
> the most recent one, or if I need some data that was removed or
> corrupted some time ago. It's good to have that ability for as long
> as it is sustainable, but when I have to pick, I want the new stuff.

That's going to be very much "depends".

I might be much better off deleting an older base backup. Because deleting the oldest xlog might render multiple generations of base backups useless.

The point being that there are a lot of different policies that make sense depending on exactly how you're doing your backups. I'm not saying this one is not sensible in some cases, but I don't think it's all that many.

Which is why I think it's better to implement an interface that gives the flexibility. I'm not saying we can't do both, but once we have command called at the end of the segment, implementing this policy outside pg_receivexlog becomes trivial. That does not hold if they are built in the other order.
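
To make the "trivial outside pg_receivexlog" claim concrete, here is a minimal Python sketch of the space-budget policy as an external command run at the end of each segment. The function name prune_to_budget and the budget parameter are hypothetical, and sorting by file name is a simplification that is only correct within a single timeline. As noted above, removing the oldest WAL can make older base backups unusable, so this is one possible policy, not a recommendation.

```python
import os
import re
from typing import List

SEGMENT_RE = re.compile(r"[0-9A-F]{24}")  # completed segments only


def prune_to_budget(directory: str, budget_bytes: int) -> List[str]:
    """Remove the oldest completed segments until the total fits the budget.

    Returns the names of the files that were removed, oldest first.
    """
    names = sorted(n for n in os.listdir(directory) if SEGMENT_RE.fullmatch(n))
    sizes = {n: os.path.getsize(os.path.join(directory, n)) for n in names}
    total = sum(sizes.values())
    removed = []
    for name in names:
        if total <= budget_bytes:
            break
        os.remove(os.path.join(directory, name))
        total -= sizes[name]
        removed.append(name)
    return removed
```

Files that do not look like completed segments (partial segments, history files) are neither counted nor removed in this sketch.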

> I think we're actually desperately in need of smarter WAL management
> tools in core not just in this respect but in a whole bunch of places,
> and I think size is an excellent thing for those tools to be
> considering. When Heikki implemented min_wal_size and max_wal_size

I definitely agree we need smarter tools there. Right now we are sending people to tools like backrest and barman. Those are fine tools, but at some point we should draw upon the experience of those tools and include something in core that solves at least the most common cases. Just as pg_basebackup solves most of the *backup* cases, and pg_receivexlog most of the *archive* cases, we need something that solves most of the *management* cases. We're still going to send people to the external tools for advanced use cases I think, but right now we have nothing at all in core for helping people with the simple cases of management.

We also don't have any tools to help people *restore* a backup (other than a pg_basebackup -x which you can just uncompress -- but if you want to do PITR you have to do a lot of manual error-prone work).

So don't mistake my opinion for thinking we have sufficient tooling today :)

Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Peter Eisentraut

Date:

04 March 2017, 07:09:05

On 2/27/17 00:32, Michael Paquier wrote:
> On Sun, Feb 26, 2017 at 8:24 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> To be consistent with archive_command and restore_command I'd rather
>> not do that. The command called can decide by itself what to do by
>> looking at the shape of the argument string.
> Just before the CF begins, I have taken some time to build up a patch
> that implements this --end-segment-command, with %f as placeholder.

I think this repeats all the mistakes of archive_command, which
ironically pg_receivexlog was intended to fix, such as: shell commands
not fully portable, improper fsync support, poor error handling, lack of
integration with synchronous replication, inability to handle multiple
actions properly.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Michael Paquier

Date:

04 March 2017, 10:09:39

On Sat, Mar 4, 2017 at 1:09 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 2/27/17 00:32, Michael Paquier wrote:
>> On Sun, Feb 26, 2017 at 8:24 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> To be consistent with archive_command and restore_command I'd rather
>>> not do that. The command called can decide by itself what to do by
>>> looking at the shape of the argument string.
>> Just before the CF begins, I have taken some time to build up a patch
>> that implements this --end-segment-command, with %f as placeholder.
>
> I think this repeats all the mistakes of archive_command, which
> ironically pg_receivexlog was intended to fix, such as: shell commands
> not fully portable, improper fsync support, poor error handling, lack of
> integration with synchronous replication, inability to handle multiple
> actions properly.

Well, that's one reason why I was thinking that having an independent
in-core option to clean up the tail of the oldest segments is
interesting: users don't need to maintain their own infra logic to do
anything. Now this end-segment command can as well be used with a
small binary doing this cleanup, but the monitoring of the thing gets
harder as multiple processes get spawned.
-- 
Michael

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Peter Eisentraut

Date:

06 March 2017, 23:26:11

On 3/4/17 02:09, Michael Paquier wrote:
> Well, that's one reason why I was thinking that having an independent
> in-core option to clean up the tail of the oldest segments is
> interesting: users don't need to maintain their own infra logic to do
> anything. Now this end-segment command can as well be used with a
> small binary doing this cleanup, but the monitoring of the thing gets
> harder as multiple processes get spawned.

I think the initial idea of having an option that does something
specific is better than an invitation to run a general shell command.  I
have some doubts that the proposal to clean up old segments based on
file system space is workable.  For example, that assumes that you are
the only one operating on the file system.  If something else fills up
the file system, this system could then be induced to clean up
everything immediately, without any reference to what you still need.
Also, the various man pages about statvfs() that I found are pretty
pessimistic about how portable it is.

I think something that works similar to pg_archivecleanup that knows
what the last base backup is could work.  In fact, could
pg_archivecleanup not be made to work here?  It's an archive, and it
needs cleaning.
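
A minimal Python sketch of the pg_archivecleanup-style rule, for illustration: given the name of the oldest file that must be kept (typically taken from the last base backup's .backup history file), every segment that sorts before it is removable. The sketch assumes, as pg_archivecleanup is understood to do, that the 8-character timeline prefix is ignored in the comparison. The function name removable_segments is hypothetical.

```python
import re
from typing import Iterable, List

SEGMENT_RE = re.compile(r"[0-9A-F]{24}")


def removable_segments(names: Iterable[str], oldest_kept: str) -> List[str]:
    """List segments that logically precede oldest_kept.

    oldest_kept may be a plain segment name or a backup history file name;
    only its first 24 characters are used, and the timeline part (the
    first 8 characters) is ignored when comparing.
    """
    cutoff = oldest_kept[:24][8:]
    return sorted(
        n for n in names
        if SEGMENT_RE.fullmatch(n) and n[8:] < cutoff
    )
```

Because the cutoff comes from a base backup rather than from free disk space, unrelated files filling the file system cannot trigger removal of WAL that is still needed.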

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Robert Haas

Date:

07 March 2017, 01:16:04

On Mon, Mar 6, 2017 at 3:26 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 3/4/17 02:09, Michael Paquier wrote:
>> Well, that's one reason why I was thinking that having an independent
>> in-core option to clean up the tail of the oldest segments is
>> interesting: users don't need to maintain their own infra logic to do
>> anything. Now this end-segment command can as well be used with a
>> small binary doing this cleanup, but the monitoring of the thing gets
>> harder as multiple processes get spawned.
>
> I think the initial idea of having an option that does something
> specific is better than an invitation to run a general shell command.  I
> have some doubts that the proposal to clean up old segments based on
> file system space is workable.  For example, that assumes that you are
> the only one operating on the file system.  If something else fills up
> the file system, this system could then be induced to clean up
> everything immediately, without any reference to what you still need.
> Also, the various man pages about statvfs() that I found are pretty
> pessimistic about how portable it is.

What if we told pg_receivewal (or pg_receivexlog, whatever that is) a
maximum number of segments to retain before removing old ones?  Like
pg_receivewal --limit-retained-segments=50GB, or something like that.
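
As a worked illustration of how a size such as 50GB maps to a number of segments, here is a minimal Python sketch. It assumes the default 16MB segment size and binary units, and the names max_segments and oldest_beyond_limit are hypothetical. No such option exists; this only shows the arithmetic and the selection of the oldest files.

```python
from typing import Iterable, List

WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # assumed default segment size


def max_segments(limit_bytes: int, segment_bytes: int = WAL_SEGMENT_BYTES) -> int:
    """Number of whole segments that fit within limit_bytes."""
    return limit_bytes // segment_bytes


def oldest_beyond_limit(segment_names: Iterable[str], limit_bytes: int,
                        segment_bytes: int = WAL_SEGMENT_BYTES) -> List[str]:
    """Return the oldest segment names that exceed the retention limit.

    Names are ordered lexically, which matches WAL order on one timeline.
    """
    keep = max_segments(limit_bytes, segment_bytes)
    ordered = sorted(segment_names)
    excess = len(ordered) - keep
    return ordered[:excess] if excess > 0 else []
```

Under these assumptions a 50GB limit corresponds to 50 * 1024 / 16 = 3200 retained segments.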

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Michael Paquier

Date:

07 March 2017, 02:36:47

On Tue, Mar 7, 2017 at 7:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Mar 6, 2017 at 3:26 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> On 3/4/17 02:09, Michael Paquier wrote:
>>> Well, that's one reason why I was thinking that having an independent
>>> in-core option to clean up the tail of the oldest segments is
>>> interesting: users don't need to maintain their own infra logic to do
>>> anything. Now this end-segment command can as well be used with a
>>> small binary doing this cleanup, but the monitoring of the thing gets
>>> harder as multiple processes get spawned.
>>
>> I think the initial idea of having an option that does something
>> specific is better than an invitation to run a general shell command.  I
>> have some doubts that the proposal to clean up old segments based on
>> file system space is workable.  For example, that assumes that you are
>> the only one operating on the file system.  If something else fills up
>> the file system, this system could then be induced to clean up
>> everything immediately, without any reference to what you still need.
>> Also, the various man pages about statvfs() that I found are pretty
>> pessimistic about how portable it is.

You can count Windows in that.

> What if we told pg_receivewal (or pg_receivexlog, whatever that is) a
> maximum number of segments to retain before removing old ones?  Like
> pg_receivewal --limit-retained-segments=50GB, or something like that.

That's of course doable as well by counting the entries available. Now
one reason why I did not do that is because in my case the archiver is
started as a service using a fixed script, and the size to retain can
be flexible as users can decide the VM size at deployment :)
Having this option would be better than nothing, just that it is not
that flexible if you think about it.
-- 
Michael

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Peter Eisentraut

Date:

07 March 2017, 19:08:18

On 3/6/17 17:16, Robert Haas wrote:
> What if we told pg_receivewal (or pg_receivexlog, whatever that is) a
> maximum number of segments to retain before removing old ones?  Like
> pg_receivewal --limit-retained-segments=50GB, or something like that.

That would be doable, but would it solve anyone's problem?  I think
pg_receivewal retention would usually be governed either by the
available base backups, or by some time-based business metric.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Robert Haas

Date:

07 March 2017, 19:16:29

On Tue, Mar 7, 2017 at 11:08 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 3/6/17 17:16, Robert Haas wrote:
>> What if we told pg_receivewal (or pg_receivexlog, whatever that is) a
>> maximum number of segments to retain before removing old ones?  Like
>> pg_receivewal --limit-retained-segments=50GB, or something like that.
>
> That would be doable, but would it solve anyone's problem?  I think
> pg_receivewal retention would usually be governed either by the
> available base backups, or by some time-based business metric.

Well, if the problem you're trying to solve is "retain WAL for as long
as possible without running out of disk space and having everything go
kablooey", then it would solve that problem, and I think that's a very
reasonable problem to want to solve.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Peter Eisentraut

Date:

09 March 2017, 16:54:48

On 3/7/17 11:16, Robert Haas wrote:
> Well, if the problem you're trying to solve is "retain WAL for as long
> as possible without running out of disk space and having everything go
> kablooey", then it would solve that problem, and I think that's a very
> reasonable problem to want to solve.

Could be.  I'm not sure what that means for the presented patch, though.
Or whether it addresses Michael's original use case at all.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Michael Paquier

Date:

10 March 2017, 01:03:04

On Thu, Mar 9, 2017 at 10:54 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 3/7/17 11:16, Robert Haas wrote:
>> Well, if the problem you're trying to solve is "retain WAL for as long
>> as possible without running out of disk space and having everything go
>> kablooey", then it would solve that problem, and I think that's a very
>> reasonable problem to want to solve.
>
> Could be.  I'm not sure what that means for the presented patch, though.
>  Or whether it addresses Michael's original use case at all.

The patch adding an end-segment command does address my problem,
because I just want to be sure that there is enough space left on disk
for one complete segment. And that's fine to check for that when the
last segment is complete. This needs some additional effort but that's
no big deal either.
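
For illustration, a minimal Python sketch of the check described here, run from the end-segment command: is there room left for one more complete segment? It assumes a 16MB segment size and uses Python's shutil.disk_usage, which is available on both Unix and Windows, in place of a direct statvfs() call. The function name has_room_for_segment and the injectable disk_usage parameter are hypothetical.

```python
import shutil

WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # assumed default segment size


def has_room_for_segment(directory: str,
                         segment_bytes: int = WAL_SEGMENT_BYTES,
                         disk_usage=shutil.disk_usage) -> bool:
    """Return True if the file system holding directory has at least
    segment_bytes free.

    disk_usage is injectable so the check can be exercised without
    depending on the real state of the disk.
    """
    return disk_usage(directory).free >= segment_bytes
```

When this returns False the command would remove the oldest segments before the next one is written. The objection raised upthread still applies: free space also shrinks when something else writes to the same file system.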

Having something like --limit-retained-segments partially addresses
it, as long as there is a way to define an automatic mode, based on
statvfs() obviously.
-- 
Michael

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Peter Eisentraut

Date:

10 March 2017, 17:15:43

On 3/9/17 17:03, Michael Paquier wrote:
> Having something like --limit-retained-segments partially addresses
> it, as long as there is a way to define an automatic mode, based on
> statvfs() obviously.

But that is not portable/usable enough, as we have determined, I think.

Have you looked into using inotify for implementing your use case?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

David Steele

Date:

21 March 2017, 19:56:50

Hi Michael,

On 3/10/17 9:15 AM, Peter Eisentraut wrote:
> On 3/9/17 17:03, Michael Paquier wrote:
>> Having something like --limit-retained-segments partially addresses
>> it, as long as there is a way to define an automatic mode, based on
>> statvfs() obviously.
>
> But that is not portable/usable enough, as we have determined, I think.
>
> Have you looked into using inotify for implementing your use case?

This thread has been idle for quite a while.  Please respond and/or post 
a new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be 
marked "Returned with Feedback".

Thanks,
-- 
-David
david@pgmasters.net

Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

From

Michael Paquier

Date:

22 March 2017, 02:47:26

On Wed, Mar 22, 2017 at 1:56 AM, David Steele <david@pgmasters.net> wrote:
> Hi Michael,
>
> On 3/10/17 9:15 AM, Peter Eisentraut wrote:
>>
>> On 3/9/17 17:03, Michael Paquier wrote:
>>>
>>> Having something like --limit-retained-segments partially addresses
>>> it, as long as there is a way to define an automatic mode, based on
>>> statvfs() obviously.
>>
>>
>> But that is not portable/usable enough, as we have determined, I think.
>>
>> Have you looked into using inotify for implementing your use case?
>
>
> This thread has been idle for quite a while.  Please respond and/or post a
> new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be marked
> "Returned with Feedback".

I have no idea what to do here, so I just marked it as returned with feedback.
-- 
Michael