Thread: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary

Hi,

pg_archivecleanup currently takes a WAL file name as input and deletes
the WAL files prior to it [1]. As suggested by Satya (cc-ed) in the
pg_replslotdata thread [2], can we enhance pg_archivecleanup to
automatically detect the last checkpoint LSN (from the control file),
calculate the lowest restart_lsn required by the replication slots, if
any (by reading the replication slot info from the pg_replslot
directory), archive the unneeded WAL files (an archive_command similar
to the one provided in the server config can be given as an input), and
finally delete them? Making pg_archivecleanup an end-to-end solution
will help greatly in disk-full situations caused by WAL file growth
(inactive replication slots, archive command failures, infrequent
checkpoints, etc.).
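
For reference, the mapping from an LSN (such as the last checkpoint's
redo location or a slot's restart_lsn) to the WAL segment file that
contains it is a simple calculation. A minimal sketch in Python,
assuming the default 16MB wal_segment_size and timeline 1 (it mirrors
the server's XLogFileName() naming, but is an illustration only):

```python
# Illustration only: map an LSN string like '0/1700000' to the name of
# the WAL segment file that contains it, assuming the default 16MB
# wal_segment_size. Mirrors the server's XLogFileName() scheme.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE  # 256 for 16MB

def wal_file_for_lsn(lsn, timeline=1):
    hi, lo = lsn.split('/')
    pos = (int(hi, 16) << 32) | int(lo, 16)
    segno = pos // WAL_SEGMENT_SIZE
    return '%08X%08X%08X' % (timeline,
                             segno // SEGMENTS_PER_XLOGID,
                             segno % SEGMENTS_PER_XLOGID)

print(wal_file_for_lsn('0/1700000'))  # -> 000000010000000000000001
```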

Thoughts?

[1] - When used as a standalone program all WAL files logically
preceding the oldestkeptwalfile will be removed from archivelocation.
https://www.postgresql.org/docs/devel/pgarchivecleanup.html
[2] - https://www.postgresql.org/message-id/CAHg%2BQDc9xwN7EmuONT3T91pCqFG6Q-BCe6B-kM-by7r1uPEicg%40mail.gmail.com

Regards,
Bharath Rupireddy.



On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote:
> pg_archivecleanup currently takes a WAL file name as input and deletes
> the WAL files prior to it [1]. As suggested by Satya (cc-ed) in the
> pg_replslotdata thread [2], can we enhance pg_archivecleanup to
> automatically detect the last checkpoint LSN (from the control file),
> calculate the lowest restart_lsn required by the replication slots, if
> any (by reading the replication slot info from the pg_replslot
> directory), archive the unneeded WAL files (an archive_command similar
> to the one provided in the server config can be given as an input), and
> finally delete them? Making pg_archivecleanup an end-to-end solution
> will help greatly in disk-full situations caused by WAL file growth
> (inactive replication slots, archive command failures, infrequent
> checkpoints, etc.).
pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you
suggesting to use it for removing files from pg_wal directory too? No, thanks.
WAL files are a key component for backup and replication. Hence, you cannot
deliberately allow a tool to remove WAL files from PGDATA. IMO this issue
wouldn't occur if you have a monitoring system and alerts and someone to keep
an eye on it. If the disk full situation was caused by a failed archive command
or a disconnected standby, it is easy to figure out; the fix is simple.


--
Euler Taveira

Greetings,

* Euler Taveira (euler@eulerto.com) wrote:
> On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote:
> > pg_archivecleanup currently takes a WAL file name as input to delete
> > the WAL files prior to it [1]. As suggested by Satya (cc-ed) in
> > pg_replslotdata thread [2], can we enhance the pg_archivecleanup to
> > automatically detect the last checkpoint (from control file) LSN,
> > calculate the lowest restart_lsn required by the replication slots, if
> > any (by reading the replication slot info from pg_logical directory),
> > archive the unneeded (an archive_command similar to that of the one
> > provided in the server config can be provided as an input) WAL files
> > before finally deleting them? Making pg_archivecleanup tool as an
> > end-to-end solution will help greatly in disk full situations because
> > of WAL files growth (inactive replication slots, archive command
> > failures, infrequent checkpoint etc.).

The overall idea of having a tool for this isn't a bad idea, but ..

> pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you
> suggesting to use it for removing files from pg_wal directory too? No, thanks.

We definitely shouldn't have it be part of pg_archivecleanup for the
simple reason that it'll be really confusing and almost certainly will
be mis-used.  For my 2c, we should just remove pg_archivecleanup
entirely.

> WAL files are a key component for backup and replication. Hence, you cannot
> deliberately allow a tool to remove WAL files from PGDATA. IMO this issue
> wouldn't occur if you have a monitoring system and alerts and someone to keep
> an eye on it. If the disk full situation was caused by a failed archive command
> or a disconnected standby, it is easy to figure out; the fix is simple.

This is perhaps a bit far- PG does, in fact, remove WAL files from
PGDATA.  Having a tool which will do this safely when the server isn't
able to be brought online due to lack of disk space would certainly be
helpful rather frequently.  I agree that monitoring and alerting are
things that everyone should implement and pay attention to, but that
doesn't happen and instead people end up just blowing away pg_wal and
corrupting their database when, had a tool existed, they could have
avoided that happening and brought the system back online in relatively
short order without any data loss.

Thanks,

Stephen

On Wed, Dec 29, 2021 at 7:27 PM Stephen Frost <sfrost@snowman.net> wrote:
> > On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote:
> > > pg_archivecleanup currently takes a WAL file name as input to delete
> > > the WAL files prior to it [1]. As suggested by Satya (cc-ed) in
> > > pg_replslotdata thread [2], can we enhance the pg_archivecleanup to
> > > automatically detect the last checkpoint (from control file) LSN,
> > > calculate the lowest restart_lsn required by the replication slots, if
> > > any (by reading the replication slot info from pg_logical directory),
> > > archive the unneeded (an archive_command similar to that of the one
> > > provided in the server config can be provided as an input) WAL files
> > > before finally deleting them? Making pg_archivecleanup tool as an
> > > end-to-end solution will help greatly in disk full situations because
> > > of WAL files growth (inactive replication slots, archive command
> > > failures, infrequent checkpoint etc.).
>
> The overall idea of having a tool for this isn't a bad idea, but ..
>
> > pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you
> > suggesting to use it for removing files from pg_wal directory too? No, thanks.
>
> We definitely shouldn't have it be part of pg_archivecleanup for the
> simple reason that it'll be really confusing and almost certainly will
> be mis-used.

+1

> > WAL files are a key component for backup and replication. Hence, you cannot
> > deliberately allow a tool to remove WAL files from PGDATA. IMO this issue
> > wouldn't occur if you have a monitoring system and alerts and someone to keep
> > an eye on it. If the disk full situation was caused by a failed archive command
> > or a disconnected standby, it is easy to figure out; the fix is simple.
>
> This is perhaps a bit far- PG does, in fact, remove WAL files from
> PGDATA.  Having a tool which will do this safely when the server isn't
> able to be brought online due to lack of disk space would certainly be
> helpful rather frequently.  I agree that monitoring and alerting are
> things that everyone should implement and pay attention to, but that
> doesn't happen and instead people end up just blowing away pg_wal and
> corrupting their database when, had a tool existed, they could have
> avoided that happening and brought the system back online in relatively
> short order without any data loss.

Thanks. Yes, an end-to-end tool is helpful in such emergency
situations, and having it in core is more useful than every postgres
vendor developing their own solution, which is often hard to get
right. Also, I agree not to club this idea with pg_archivecleanup.
How about having a new tool like pg_walcleanup/pg_xlogcleanup to help
developers/admins/users in such situations?

Regards,
Bharath Rupireddy.



On Wed, Dec 29, 2021 at 8:06 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Dec 29, 2021 at 7:27 PM Stephen Frost <sfrost@snowman.net> wrote:
> > > On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote:
> > > > pg_archivecleanup currently takes a WAL file name as input to delete
> > > > the WAL files prior to it [1]. As suggested by Satya (cc-ed) in
> > > > pg_replslotdata thread [2], can we enhance the pg_archivecleanup to
> > > > automatically detect the last checkpoint (from control file) LSN,
> > > > calculate the lowest restart_lsn required by the replication slots, if
> > > > any (by reading the replication slot info from pg_logical directory),
> > > > archive the unneeded (an archive_command similar to that of the one
> > > > provided in the server config can be provided as an input) WAL files
> > > > before finally deleting them? Making pg_archivecleanup tool as an
> > > > end-to-end solution will help greatly in disk full situations because
> > > > of WAL files growth (inactive replication slots, archive command
> > > > failures, infrequent checkpoint etc.).
> >
> > The overall idea of having a tool for this isn't a bad idea, but ..
> >
> > > pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you
> > > suggesting to use it for removing files from pg_wal directory too? No, thanks.
> >
> > We definitely shouldn't have it be part of pg_archivecleanup for the
> > simple reason that it'll be really confusing and almost certainly will
> > be mis-used.
>
> +1
>
> > > WAL files are a key component for backup and replication. Hence, you cannot
> > > deliberately allow a tool to remove WAL files from PGDATA. IMO this issue
> > > wouldn't occur if you have a monitoring system and alerts and someone to keep
> > > an eye on it. If the disk full situation was caused by a failed archive command
> > > or a disconnected standby, it is easy to figure out; the fix is simple.
> >
> > This is perhaps a bit far- PG does, in fact, remove WAL files from
> > PGDATA.  Having a tool which will do this safely when the server isn't
> > able to be brought online due to lack of disk space would certainly be
> > helpful rather frequently.  I agree that monitoring and alerting are
> > things that everyone should implement and pay attention to, but that
> > doesn't happen and instead people end up just blowing away pg_wal and
> > corrupting their database when, had a tool existed, they could have
> > avoided that happening and brought the system back online in relatively
> > short order without any data loss.
>
> Thanks. Yes, an end-to-end tool is helpful in such emergency
> situations, and having it in core is more useful than every postgres
> vendor developing their own solution, which is often hard to get
> right. Also, I agree not to club this idea with pg_archivecleanup.
> How about having a new tool like pg_walcleanup/pg_xlogcleanup to help
> developers/admins/users in such situations?

Thanks for the comments. Here's a new tool called pg_walcleaner which
basically deletes (optionally archiving before deletion) the unneeded
WAL files.

Please provide your thoughts and review the patches.

Regards,
Bharath Rupireddy.

Greetings,

* Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote:
> Thanks for the comments. Here's a new tool called pg_walcleaner which
> basically deletes (optionally archiving before deletion) the unneeded
> WAL files.
>
> Please provide your thoughts and review the patches.

Alright, I spent some more time thinking about this and contemplating
what the next steps are... and I feel like the next step is basically
"add a HINT when the server can't start due to being out of disk space
that one should consider running pg_walcleaner" and at that point... why
aren't we just, uh, doing that?  This is all still quite hand-wavy, but
it sure would be nice to be able to avoid downtime due to a broken
archiving setup.  pgbackrest has a way of doing this and while we, of
course, discourage the use of that option, as it means throwing away
WAL, it's an option that users have.  PG could have a similar option.
Basically, to archive_command/library what max_slot_wal_keep_size is for
slots.

That isn't to say that we shouldn't also have a tool like this, but it
generally feels like we're taking a reactive approach here rather than a
proactive one to addressing the root issue.

Thanks,

Stephen

On Mon, Apr 18, 2022 at 7:41 PM Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote:
> > Thanks for the comments. Here's a new tool called pg_walcleaner which
> > basically deletes (optionally archiving before deletion) the unneeded
> > WAL files.
> >
> > Please provide your thoughts and review the patches.
>
> Alright, I spent some more time thinking about this and contemplating
> what the next steps are... and I feel like the next step is basically
> "add a HINT when the server can't start due to being out of disk space
> that one should consider running pg_walcleaner" and at that point... why
> aren't we just, uh, doing that?  This is all still quite hand-wavy, but
> it sure would be nice to be able to avoid downtime due to a broken
> archiving setup.  pgbackrest has a way of doing this and while we, of
> course, discourage the use of that option, as it means throwing away
> WAL, it's an option that users have.  PG could have a similar option.
> Basically, to archive_command/library what max_slot_wal_keep_size is for
> slots.

Thanks. I get your point. The way I see it, postgres should be
self-aware of an about-to-get-full disk (probably when the data
directory size reaches 90% (configurable, of course) of the total disk
size), then freeze new write operations (maybe via a new ALTER SYSTEM
SET READ-ONLY or by setting the default_transaction_read_only GUC),
and then clean up the unneeded WAL files, perhaps by just invoking the
pg_walcleaner tool. I think, so far, this kind of work has been done
outside of postgres. Even then, we might get into out-of-disk
situations depending on how frequently we check the data directory
size against the configurable 90% limit. Detecting the disk usage is
the KEY here. Hence we need an offline invokable tool like
pg_walcleaner.
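
The threshold check itself is simple; a minimal sketch follows, where
the 90% figure and the example path are illustrative, not
recommendations:

```python
# Sketch of the configurable fullness check described above; the 90%
# threshold is an example only.
import shutil

def disk_nearly_full(path, threshold=0.90):
    usage = shutil.disk_usage(path)
    return (usage.used / usage.total) >= threshold

# A real check would pass the PGDATA directory instead of '/'.
print(disk_nearly_full('/', threshold=0.90))
```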

Actually, I was planning to write an extension with a background
worker doing this for us.

> That isn't to say that we shouldn't also have a tool like this, but it
> generally feels like we're taking a reactive approach here rather than a
> proactive one to addressing the root issue.

Agree. The offline tool like pg_walcleaner can help greatly even with
some sort of above internal/external disk space monitoring tools.

Regards,
Bharath Rupireddy.



Greetings,

* Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote:
> On Mon, Apr 18, 2022 at 7:41 PM Stephen Frost <sfrost@snowman.net> wrote:
> > * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote:
> > > Thanks for the comments. Here's a new tool called pg_walcleaner which
> > > basically deletes (optionally archiving before deletion) the unneeded
> > > WAL files.
> > >
> > > Please provide your thoughts and review the patches.
> >
> > Alright, I spent some more time thinking about this and contemplating
> > what the next steps are... and I feel like the next step is basically
> > "add a HINT when the server can't start due to being out of disk space
> > that one should consider running pg_walcleaner" and at that point... why
> > aren't we just, uh, doing that?  This is all still quite hand-wavy, but
> > it sure would be nice to be able to avoid downtime due to a broken
> > archiving setup.  pgbackrest has a way of doing this and while we, of
> > course, discourage the use of that option, as it means throwing away
> > WAL, it's an option that users have.  PG could have a similar option.
> > Basically, to archive_command/library what max_slot_wal_keep_size is for
> > slots.
>
> Thanks. I get your point. The way I see it is that the postgres should
> be self-aware of the about-to-get-full disk (probably when the data
> directory size is 90%(configurable, of course) of total disk size) and
> then freeze the new write operations (may be via new ALTER SYSTEM SET
> READ-ONLY or setting default_transaction_read_only GUC) and then go
> clean the unneeded WAL files by just invoking pg_walcleaner tool
> perhaps. I think, so far, this kind of work has been done outside of
> postgres. Even then, we might get into out-of-disk situations
> depending on how frequently we check the data directory size to
> compute the 90% configurable limit. Detecting the disk size is the KEY
> here. Hence we need an offline invokable tool like pg_walcleaner.

Ugh, last I checked, figuring out if a given filesystem is near being
full is a pain to do in a cross-platform way.  Why not just do exactly
what we already are doing for replication slots, but for
archive_command?  Then we wouldn't need to go into a read-only mode.
Perhaps going into a read-only mode would be an alternative option to
that but we should definitely be letting the admin pick what to do in
such a case.  The idea of going read-only and then *also* removing WAL
files doesn't seem like it's ever the right choice though..?

As for worrying about frequency.. that seems unlikely to be that serious
of an issue, if we just check how far behind we are with each time we
try to run archive_command.  That's basically how the pgbackrest option
works and we've had few issues with it not being 'soon enough'.

>  Actually, I was planning to write an extension with a background
> worker doing this for us.

... but we have a background worker already for archiving that could
handle this for us, doesn't seem like we need another.

> > That isn't to say that we shouldn't also have a tool like this, but it
> > generally feels like we're taking a reactive approach here rather than a
> > proactive one to addressing the root issue.
>
> Agree. The offline tool like pg_walcleaner can help greatly even with
> some sort of above internal/external disk space monitoring tools.

See, this seems like a really bad idea to me.  I'd be very concerned
about people mis-using this tool in some way and automating its usage
strikes me as absolutely exactly that..  Are we sure that we can
guarantee that we don't remove things we shouldn't when this ends up
getting run against a running cluster from someone's automated tooling?
Or when someone sees that it refuses to run for $reason and tries to..
"fix" that?  Seems quite risky to me..  I'd probably want to put similar
caveats around using this tool as I do around pg_resetwal when doing
training- that is, don't ever, ever, ever use this, heh.

Thanks,

Stephen

On Mon, Apr 18, 2022 at 8:48 PM Stephen Frost <sfrost@snowman.net> wrote:
>
> Greeting,
>
> * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote:
> > On Mon, Apr 18, 2022 at 7:41 PM Stephen Frost <sfrost@snowman.net> wrote:
> > > * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote:
> > > > Thanks for the comments. Here's a new tool called pg_walcleaner which
> > > > basically deletes (optionally archiving before deletion) the unneeded
> > > > WAL files.
> > > >
> > > > Please provide your thoughts and review the patches.
> > >
> > > Alright, I spent some more time thinking about this and contemplating
> > > what the next steps are... and I feel like the next step is basically
> > > "add a HINT when the server can't start due to being out of disk space
> > > that one should consider running pg_walcleaner" and at that point... why
> > > aren't we just, uh, doing that?  This is all still quite hand-wavy, but
> > > it sure would be nice to be able to avoid downtime due to a broken
> > > archiving setup.  pgbackrest has a way of doing this and while we, of
> > > course, discourage the use of that option, as it means throwing away
> > > WAL, it's an option that users have.  PG could have a similar option.
> > > Basically, to archive_command/library what max_slot_wal_keep_size is for
> > > slots.
> >
> > Thanks. I get your point. The way I see it is that the postgres should
> > be self-aware of the about-to-get-full disk (probably when the data
> > directory size is 90%(configurable, of course) of total disk size) and
> > then freeze the new write operations (may be via new ALTER SYSTEM SET
> > READ-ONLY or setting default_transaction_read_only GUC) and then go
> > clean the unneeded WAL files by just invoking pg_walcleaner tool
> > perhaps. I think, so far, this kind of work has been done outside of
> > postgres. Even then, we might get into out-of-disk situations
> > depending on how frequently we check the data directory size to
> > compute the 90% configurable limit. Detecting the disk size is the KEY
> > here. Hence we need an offline invokable tool like pg_walcleaner.
>
> Ugh, last I checked, figuring out if a given filesystem is near being
> full is a pain to do in a cross-platform way.  Why not just do exactly
> what we already are doing for replication slots, but for
> archive_command?

Do you mean to say that if the archive_command fails, say, for "some
time" or "some number of attempts", the server should just not bother
about it and let the checkpoint delete the WAL files instead of
running out of disk? If this is the thought, then it's more dangerous,
as we might end up losing the WAL forever. For invalidating
replication slots, it's okay because the required WAL can exist
somewhere (either on the primary or in the archive location).

> > > That isn't to say that we shouldn't also have a tool like this, but it
> > > generally feels like we're taking a reactive approach here rather than a
> > > proactive one to addressing the root issue.
> >
> > Agree. The offline tool like pg_walcleaner can help greatly even with
> > some sort of above internal/external disk space monitoring tools.
>
> See, this seems like a really bad idea to me.  I'd be very concerned
> about people mis-using this tool in some way and automating its usage
> strikes me as absolutely exactly that..  Are we sure that we can
> guarantee that we don't remove things we shouldn't when this ends up
> getting run against a running cluster from someone's automated tooling?
> Or when someone sees that it refuses to run for $reason and tries to..
> "fix" that?  Seems quite risky to me..  I'd probably want to put similar
> caveats around using this tool as I do around pg_resetwal when doing
> training- that is, don't ever, ever, ever use this, heh.

The initial version of the patch doesn't check whether the server
crashed or not before running. I was thinking of looking at the
postmaster.pid or pg_control file (however, they don't guarantee
whether the server is up or crashed, because the server can crash
without deleting postmaster.pid or updating the pg_control file).
Another idea is to let pg_walcleaner fire a sample query ('SELECT 1')
to see if the server is up and running; if yes, exit, otherwise
proceed with its work.

Also, to avoid losing WAL permanently, we must recommend using
archive_command so that the WAL can be moved to an alternative
location (which could be the same archive location that the primary
uses).

And yes, we must have clear usage guidelines in the docs.

Regards,
Bharath Rupireddy.



Greetings,

* Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote:
> On Mon, Apr 18, 2022 at 8:48 PM Stephen Frost <sfrost@snowman.net> wrote:
> > * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote:
> > > On Mon, Apr 18, 2022 at 7:41 PM Stephen Frost <sfrost@snowman.net> wrote:
> > > > * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote:
> > > > > Thanks for the comments. Here's a new tool called pg_walcleaner which
> > > > > basically deletes (optionally archiving before deletion) the unneeded
> > > > > WAL files.
> > > > >
> > > > > Please provide your thoughts and review the patches.
> > > >
> > > > Alright, I spent some more time thinking about this and contemplating
> > > > what the next steps are... and I feel like the next step is basically
> > > > "add a HINT when the server can't start due to being out of disk space
> > > > that one should consider running pg_walcleaner" and at that point... why
> > > > aren't we just, uh, doing that?  This is all still quite hand-wavy, but
> > > > it sure would be nice to be able to avoid downtime due to a broken
> > > > archiving setup.  pgbackrest has a way of doing this and while we, of
> > > > course, discourage the use of that option, as it means throwing away
> > > > WAL, it's an option that users have.  PG could have a similar option.
> > > > Basically, to archive_command/library what max_slot_wal_keep_size is for
> > > > slots.
> > >
> > > Thanks. I get your point. The way I see it is that the postgres should
> > > be self-aware of the about-to-get-full disk (probably when the data
> > > directory size is 90%(configurable, of course) of total disk size) and
> > > then freeze the new write operations (may be via new ALTER SYSTEM SET
> > > READ-ONLY or setting default_transaction_read_only GUC) and then go
> > > clean the unneeded WAL files by just invoking pg_walcleaner tool
> > > perhaps. I think, so far, this kind of work has been done outside of
> > > postgres. Even then, we might get into out-of-disk situations
> > > depending on how frequently we check the data directory size to
> > > compute the 90% configurable limit. Detecting the disk size is the KEY
> > > here. Hence we need an offline invokable tool like pg_walcleaner.
> >
> > Ugh, last I checked, figuring out if a given filesystem is near being
> > full is a pain to do in a cross-platform way.  Why not just do exactly
> > what we already are doing for replication slots, but for
> > archive_command?
>
> Do you mean to say that if the archive_command fails, say, for "some
> time" or "some number of attempts", the server should just not bother
> about it and let the checkpoint delete the WAL files instead of
> running out of disk? If this is the thought, then it's more dangerous,
> as we might end up losing the WAL forever. For invalidating
> replication slots, it's okay because the required WAL can exist
> somewhere (either on the primary or in the archive location).

I was thinking more specifically along the lines of "if there's > X GB
of WAL that hasn't been archived, give up on archiving anything new"
(which is how the pgbackrest option works).

As archiving with this command is optional, it does present the same
risk too.  Perhaps if we flipped it around to require the
archive_command be provided then it'd be a bit better, though we would
also need a way for users to ask for us to just delete the WAL without
archiving it.  There again though ... the server already has a way of
both archiving and removing archived WAL and also has now grown the
archive_library option, something that this tool would be pretty hard to
replicate, I feel like, as it wouldn't be loading the library into a PG
backend anymore.  As we don't have any real archive libraries yet, it's
hard to say if that's going to be an actual issue or not.  Something to
consider though.

> > > > That isn't to say that we shouldn't also have a tool like this, but it
> > > > generally feels like we're taking a reactive approach here rather than a
> > > > proactive one to addressing the root issue.
> > >
> > > Agree. The offline tool like pg_walcleaner can help greatly even with
> > > some sort of above internal/external disk space monitoring tools.
> >
> > See, this seems like a really bad idea to me.  I'd be very concerned
> > about people mis-using this tool in some way and automating its usage
> > strikes me as absolutely exactly that..  Are we sure that we can
> > guarantee that we don't remove things we shouldn't when this ends up
> > getting run against a running cluster from someone's automated tooling?
> > Or when someone sees that it refuses to run for $reason and tries to..
> > "fix" that?  Seems quite risky to me..  I'd probably want to put similar
> > caveats around using this tool as I do around pg_resetwal when doing
> > training- that is, don't ever, ever, ever use this, heh.
>
> The initial version of the patch doesn't check if the server crashed
> or not before running it. I was thinking of looking at the
> postmaster.pid or pg_control file (however they don't guarantee
> whether the server is up or crashed because the server can crash
> without deleting postmaster.pid or updating pg_control file). Another
> idea is to let pg_walcleaner fire a sample query ('SELECT 1') to see
> if the server is up and running, if yes, exit, otherwise proceed with
> its work.

All of which isn't an issue if we don't have an external tool trying to
do this and instead have the server doing it as the server knows its
internal status, that the archive command has been failing long enough
to pass the configuration threshold, and that the WAL isn't needed for
crash recovery.  The ability to avoid having to crash and go through
that process is pretty valuable.  Still, a crash may still happen and
it'd be useful to have a clean way to deal with it.  I'm not really a
fan of having to essentially configure this external command as well as
have the server configured.  Have we settled that there's no way to make
the server archive while there's no space available and before trying to
write out more data?

> Also, to avoid losing WAL permanently, we must recommend using
> archive_command so that the WAL can be moved to an alternative
> location (which could be the same archive location that the primary
> uses).

I agree we should recommend using archive_command or archive_library, of
course, but if that's been done and is working properly then this tool
isn't really needed.  The use-case we're trying to address, I believe,
is something like:

1) archive command starts failing for some reason
2) WAL piles up on the primary
3) primary runs out of disk space, crash happens
4) archive command gets 'fixed' in some fashion
5) WAL is archived and removed from primary
6) primary is restarted and able to go through crash recovery
7) server is online again

Now, I was trying to suggest an approach to addressing the issue at #2,
that is, avoid having WAL pile up without end on the primary and avoid
the crash in the first place.  For users who care more about uptime and
less about WAL, that's likely what they want.

For users who care more about WAL than uptime, it'd be good to have a
way to help them too, but to do that, #4 has to happen and, once that's
done, #5 and following just need to be accomplished in whatever way is
simplest.  The thought I'm having here is that the simplest answer, at
least from the user's perspective, is that the server is able to just be
brought up with the fixed archive command and everything just works-
archiving happens, space is free'd up, and the server comes up and
continues running.

I accept that it isn't this patch's job or mandate to go implement some
new option that I've thought up, and I could be convinced that this
separate tool is just how we're going to have to get #5 accomplished for
now due to the complexity of making the server do the archiving early on
and/or that it has other downsides (if the crash wasn't due to running
out of space, making the server wait to come online until after the WAL
has been archived wouldn't be ideal) that make it a poor choice overall,
but it seems like it's something that's at least worth some thought and
consideration of if there's a way to accomplish this with a bit less
manual user involvement, as that tends to be error-prone.

Thanks,

Stephen

On Tue, Apr 26, 2022 at 12:09 AM Stephen Frost <sfrost@snowman.net> wrote:
>
> I was thinking more specifically along the lines of "if there's > X GB
> of WAL that hasn't been archived, give up on archiving anything new"
> (which is how the pgbackrest option works).

IMO, this option is dangerous because the users might lose the
unarchived WAL files completely. Instead, the archive_command should
be fixed, either before the server runs out of disk space (if someone
or some monitoring system detected the archive_command failure) or
after the server crashes with no space left.

> As archiving with this command is optional, it does present the same
> risk too.  Perhaps if we flipped it around to require the
> archive_command be provided then it'd be a bit better, though we would
> also need a way for users to ask for us to just delete the WAL without
> archiving it.  There again though ... the server already has a way of
> both archiving and removing archived WAL and also has now grown the
> archive_library option, something that this tool would be pretty hard to
> replicate, I feel like, as it wouldn't be loading the library into a PG
> backend anymore.  As we don't have any real archive libraries yet, it's
> hard to say if that's going to be an actual issue or not.  Something to
> consider though.

Actually, why are we assuming that WAL files grow only because of
archive command failures (of course, that's the main cause even in a
well-configured server)? Imagine the checkpoints weren't happening
frequently for whatever reason, or max_slot_wal_keep_size or
wal_keep_size weren't set appropriately (even if they were set properly,
someone, possibly an attacker, could reset them), or high write activity
were generating huge amounts of WAL (especially on smaller servers) -
these too can bring a server down because of WAL growth.
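To make the slot case concrete, here is a small Python sketch (purely illustrative, not part of any patch) of the arithmetic behind "a lagging restart_lsn pins WAL": LSNs are the usual 'hi/lo' hex pairs, and the retained WAL is simply the byte distance between the slot's restart_lsn and the current insert position.

```python
# Sketch (not part of the proposed tool): how much WAL a replication slot
# pins, computed from its restart_lsn and the current insert LSN.  An LSN
# string "X/Y" encodes a 64-bit position as two hex halves.

def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN like '0/5000000' to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL the server must keep for a slot at restart_lsn."""
    return parse_lsn(current_lsn) - parse_lsn(restart_lsn)

# An inactive slot whose restart_lsn lags behind the insert position
# forces the server to retain all WAL in between:
print(retained_wal_bytes("1/0", "0/FF000000"))  # -> 16777216 (16 MiB)
```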

The main purpose of this tool is for DBAs or service engineers to use
it as a last-resort option after the server runs out of disk space, to
quickly bring the server back online so that regular cleanup can take
place or storage scaling operations can be performed if required.
There's a dry-run option in pg_walcleaner to see whether it can free up
any space at all. If required, we can provide an option to archive and
delete at most a specified number of WAL files.
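A minimal sketch of what such a dry run could look like, assuming the tool compares 24-character WAL segment names lexically the way pg_archivecleanup does; the function name and the max_files knob are hypothetical, and timeline history or .partial files are deliberately ignored here:

```python
# Hypothetical sketch of the dry-run decision: given the WAL file name
# that contains the checkpoint redo record, every lexically older
# 24-character WAL file name is a deletion candidate.

def deletable_wal(files, oldest_kept, max_files=None):
    """Return WAL file names that precede oldest_kept, oldest first."""
    candidates = sorted(
        f for f in files
        if len(f) == 24 and f < oldest_kept  # plain WAL segments only
    )
    return candidates if max_files is None else candidates[:max_files]

pg_wal = ["000000010000000000000003", "000000010000000000000004",
          "000000010000000000000005", "000000010000000000000005.backup"]
# Dry run: report what would be removed, delete nothing.
print(deletable_wal(pg_wal, "000000010000000000000005"))
```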

I'm okay with making the archive_command mandatory; if archiving isn't
required to be used with the pg_walcleaner tool, users can set a dummy
value such as archive_command='/bin/true'.
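The archive-then-delete step could follow the %p/%f substitution rules that postgresql.conf already documents for archive_command (%p is the path of the file to archive, %f its bare file name). A hedged Python sketch; the function and its behavior are assumptions about the proposed tool, only the placeholder semantics come from the server docs:

```python
# Sketch of the archive-then-delete step for one WAL file.  The file is
# removed only if the archive command reports success, mirroring how the
# server treats archive_command's exit status.
import os
import subprocess

def archive_then_delete(wal_path: str, archive_command: str) -> bool:
    """Run archive_command on one WAL file; delete it only on success."""
    cmd = (archive_command
           .replace("%p", wal_path)
           .replace("%f", os.path.basename(wal_path)))
    if subprocess.run(cmd, shell=True).returncode != 0:
        return False          # keep the file if archiving failed
    os.remove(wal_path)
    return True

# With the dummy command the file is simply deleted:
# archive_then_delete("pg_wal/000000010000000000000003", "/bin/true")
```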

> > The initial version of the patch doesn't check if the server crashed
> > or not before running it. I was thinking of looking at the
> > postmaster.pid or pg_control file (however they don't guarantee
> > whether the server is up or crashed because the server can crash
> > without deleting postmaster.pid or updating pg_control file). Another
> > idea is to let pg_walcleaner fire a sample query ('SELECT 1') to see
> > if the server is up and running, if yes, exit, otherwise proceed with
> > its work.
>
> All of which isn't an issue if we don't have an external tool trying to
> do this and instead have the server doing it as the server knows its
> internal status, that the archive command has been failing long enough
> to pass the configuration threshold, and that the WAL isn't needed for
> crash recovery.  The ability to avoid having to crash and go through
> that process is pretty valuable.  Still, a crash may still happen and
> it'd be useful to have a clean way to deal with it.  I'm not really a
> fan of having to essentially configure this external command as well as
> have the server configured.  Have we settled that there's no way to make
> the server archive while there's no space available and before trying to
> write out more data?

The pg_walcleaner tool isn't intrusive, in the sense that it doesn't
delete the WAL files that are required for the server to come up (it
checks for the checkpoint redo WAL file). On top of that, it supports an
archive_command too, so no WAL files are lost at all, unlike the
pgbackrest option.
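For reference, the redo-record check boils down to well-known segment-name arithmetic. A sketch assuming the default 16 MB wal_segment_size (the helper name is mine, the naming scheme is the server's):

```python
# Illustrative safety boundary: the WAL segment holding the checkpoint
# redo LSN (and every later one) must survive for crash recovery; only
# lexically older segment names are deletion candidates.

WAL_SEG_SIZE = 16 * 1024 * 1024          # default wal_segment_size
SEGS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE

def wal_file_name(tli: int, lsn: int) -> str:
    """Name of the WAL segment file containing byte position lsn."""
    segno = lsn // WAL_SEG_SIZE
    return "%08X%08X%08X" % (tli, segno // SEGS_PER_XLOGID,
                             segno % SEGS_PER_XLOGID)

# Redo LSN 0/5000123 on timeline 1 lives in segment ...005, so only
# names lexically smaller than this are safe to remove:
print(wal_file_name(1, 0x5000123))  # -> 000000010000000000000005
```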

> > Also, to avoid losing WAL permanently, we must recommend using
> > archive_command so that the WAL can be moved to an alternative
> > location (could be the same archive location that the primary uses).
>
> I agree we should recommend using archive_command or archive_library, of
> course, but if that's been done and is working properly then this tool
> isn't really needed.  The use-case we're trying to address, I believe,
> is something like:
>
> 1) archive command starts failing for some reason
> 2) WAL piles up on the primary
> 3) primary runs out of disk space, crash happens
> 4) archive command gets 'fixed' in some fashion
> 5) WAL is archived and removed from primary
> 6) primary is restarted and able to go through crash recovery
> 7) server is online again
>
> Now, I was trying to suggest an approach to addressing the issue at #2,
> that is, avoid having WAL pile up without end on the primary and avoid
> the crash in the first place.  For users who care more about uptime and
> less about WAL, that's likely what they want.
>
> For users who care more about WAL than uptime, it'd be good to have a
> way to help them too, but to do that, #4 has to happen and, once that's
> done, #5 and following just need to be accomplished in whatever way is
> simplest.  The thought I'm having here is that the simplest answer, at
> least from the user's perspective, is that the server is able to just be
> brought up with the fixed archive command and everything just works-
> archiving happens, space is free'd up, and the server comes up and
> continues running.
>
> I accept that it isn't this patch's job or mandate to go implement some
> new option that I've thought up, and I could be convinced that this
> separate tool is just how we're going to have to get #5 accomplished for
> now due to the complexity of making the server do the archiving early on
> and/or that it has other downsides (if the crash wasn't due to running
> out of space, making the server wait to come online until after the WAL
> has been archived wouldn't be ideal) that make it a poor choice overall,
> but it seems like it's something that's at least worth some thought and
> consideration of if there's a way to accomplish this with a bit less
> manual user involvement, as that tends to be error-prone.

As I said above, let's not assume that archive_command failures are the
only reason (though they are the main cause) for WAL file growth.

The pg_walcleaner proposed here is, IMO, a better option for
developers/DBAs/service engineers to quickly bring up a crashed server,
avoiding the manual work of figuring out which WAL files can be deleted.
Also, if they have fixed the failing archive_command or the wrong
settings that caused the WAL growth, the server will be able to do the
remaining cleanup and come online cleanly. I'm sure many have faced this
problem of server crashes due to running out of disk space at some point
in production environments, especially with older versions of postgres.

Regards,
Bharath Rupireddy.



Greetings,

* Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote:
> On Tue, Apr 26, 2022 at 12:09 AM Stephen Frost <sfrost@snowman.net> wrote:
> > I was thinking more specifically along the lines of "if there's > X GB
> > of WAL that hasn't been archived, give up on archiving anything new"
> > (which is how the pgbackrest option works).
>
> IMO, this option is dangerous because the users might lose the
> unarchived WAL files completely. Instead, the archive_command should be
> fixed either before the server runs out of disk space (if monitoring
> detects the archive_command failure) or after the server crashes with
> no space left.

Certainly, fixing the archive_command would be the best answer.  We're
here because we know that doesn't always happen and are looking for
other ways to avoid the server crashing or to at least help clean things
up if a crash does happen.

> > As archiving with this command is optional, it does present the same
> > risk too.  Perhaps if we flipped it around to require the
> > archive_command be provided then it'd be a bit better, though we would
> > also need a way for users to ask for us to just delete the WAL without
> > archiving it.  There again though ... the server already has a way of
> > both archiving and removing archived WAL and also has now grown the
> > archive_library option, something that this tool would be pretty hard to
> > replicate, I feel like, as it wouldn't be loading the library into a PG
> > backend anymore.  As we don't have any real archive libraries yet, it's
> > hard to say if that's going to be an actual issue or not.  Something to
> > consider though.
>
> Actually, why are we assuming that WAL files grow only because of
> archive command failures (of course, that's the main cause even in a
> well-configured server)? Imagine the checkpoints weren't happening
> frequently for whatever reason, or max_slot_wal_keep_size or
> wal_keep_size weren't set appropriately (even if they were set
> properly, someone, possibly an attacker, could reset them), or high
> write activity were generating huge amounts of WAL (especially on
> smaller servers) - these too can bring a server down because of WAL
> growth.

The option that I outline would probably be the "end" of all of the
above- that is, its job is to absolutely keep the pg_wal filesystem from
running out of space by making sure that pg_wal doesn't grow beyond the
configured size.  That might mean that we don't keep wal_keep_size
amount of WAL or maybe get rid of WAL before max_slot_wal_keep_size.  If
checkpoints aren't happening, that's different and throwing away WAL
would mean corruption on the primary and so we'd just have to accept a
crash in that case but I don't think that's anywhere near as common as
the other cases.  We would probably want to throw warnings if someone
configured wal_keep_size bigger than throw_away_unneeded_wal or
whatever.
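A hypothetical postgresql.conf fragment for the behavior described above; the GUC name is the placeholder from this discussion, not an implemented setting:

```
# Hypothetical configuration sketch -- throw_away_unneeded_wal does not
# exist today; it is the placeholder name used in this thread.
wal_keep_size = '1GB'
throw_away_unneeded_wal = '8GB'   # give up on archiving beyond this cap
# The server would warn if wal_keep_size exceeded the cap above.
```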

> The main purpose of this tool is for DBAs or service engineers to use
> it as a last-resort option after the server runs out of disk space, to
> quickly bring the server back online so that regular cleanup can take
> place or storage scaling operations can be performed if required.
> There's a dry-run option in pg_walcleaner to see whether it can free
> up any space at all. If required, we can provide an option to archive
> and delete at most a specified number of WAL files.

Unfortunately, it's not likely to be very quick if archive command is
set to anything real.  Interesting idea to have an option along the
lines of "please archive X amount of WAL", but there again- the server
could, in theory anyway, start up and get a few WAL segments archived
and then start doing replay from the crash, and if it ran out of space
and archiving was still happening, it could perhaps just wait and try
again.  With the ALTER SYSTEM READ ONLY, it could maybe even just go
read-only if it ran out of space in the first place and continue to try
and archive WAL.  Kind of hard to say how much additional space is
needed to get through crash recovery and so this suggested option would
always end up having to be a pretty broad guess and you absolutely
couldn't start the server while this command is running (which I would
definitely be worried would end up happening...), so that's also
something to think about.

> I'm okay with making the archive_command mandatory; if archiving isn't
> required to be used with the pg_walcleaner tool, users can set a dummy
> value such as archive_command='/bin/true'.

Yeah, that seems at least a bit better.  I imagine a lot of people will
probably use it exactly like that, but this will hopefully at least
discourage them from just throwing it away.

> > > The initial version of the patch doesn't check if the server crashed
> > > or not before running it. I was thinking of looking at the
> > > postmaster.pid or pg_control file (however they don't guarantee
> > > whether the server is up or crashed because the server can crash
> > > without deleting postmaster.pid or updating pg_control file). Another
> > > idea is to let pg_walcleaner fire a sample query ('SELECT 1') to see
> > > if the server is up and running, if yes, exit, otherwise proceed with
> > > its work.
> >
> > All of which isn't an issue if we don't have an external tool trying to
> > do this and instead have the server doing it as the server knows its
> > internal status, that the archive command has been failing long enough
> > to pass the configuration threshold, and that the WAL isn't needed for
> > crash recovery.  The ability to avoid having to crash and go through
> > that process is pretty valuable.  Still, a crash may still happen and
> > it'd be useful to have a clean way to deal with it.  I'm not really a
> > fan of having to essentially configure this external command as well as
> > have the server configured.  Have we settled that there's no way to make
> > the server archive while there's no space available and before trying to
> > write out more data?
>
> The pg_walcleaner tool isn't intrusive, in the sense that it doesn't
> delete the WAL files that are required for the server to come up (it
> checks for the checkpoint redo WAL file). On top of that, it supports
> an archive_command too, so no WAL files are lost at all, unlike the
> pgbackrest option.

Won't be any WAL loss with pgbackrest unless it's specifically
configured to throw it away- again, it's a tradeoff.  Just suggesting
that we could have that be part of core as an option.

> > > Also, to avoid losing WAL permanently, we must recommend using
> > > archive_command so that the WAL can be moved to an alternative
> > > location (could be the same archive location that the primary uses).
> >
> > I agree we should recommend using archive_command or archive_library, of
> > course, but if that's been done and is working properly then this tool
> > isn't really needed.  The use-case we're trying to address, I believe,
> > is something like:
> >
> > 1) archive command starts failing for some reason
> > 2) WAL piles up on the primary
> > 3) primary runs out of disk space, crash happens
> > 4) archive command gets 'fixed' in some fashion
> > 5) WAL is archived and removed from primary
> > 6) primary is restarted and able to go through crash recovery
> > 7) server is online again
> >
> > Now, I was trying to suggest an approach to addressing the issue at #2,
> > that is, avoid having WAL pile up without end on the primary and avoid
> > the crash in the first place.  For users who care more about uptime and
> > less about WAL, that's likely what they want.
> >
> > For users who care more about WAL than uptime, it'd be good to have a
> > way to help them too, but to do that, #4 has to happen and, once that's
> > done, #5 and following just need to be accomplished in whatever way is
> > simplest.  The thought I'm having here is that the simplest answer, at
> > least from the user's perspective, is that the server is able to just be
> > brought up with the fixed archive command and everything just works-
> > archiving happens, space is free'd up, and the server comes up and
> > continues running.
> >
> > I accept that it isn't this patch's job or mandate to go implement some
> > new option that I've thought up, and I could be convinced that this
> > separate tool is just how we're going to have to get #5 accomplished for
> > now due to the complexity of making the server do the archiving early on
> > and/or that it has other downsides (if the crash wasn't due to running
> > out of space, making the server wait to come online until after the WAL
> > has been archived wouldn't be ideal) that make it a poor choice overall,
> > but it seems like it's something that's at least worth some thought and
> > consideration of if there's a way to accomplish this with a bit less
> > manual user involvement, as that tends to be error-prone.
>
> As I said above, let's not assume that archive_command failures are
> the only reason (though they are the main cause) for WAL file growth.

I agree that we should consider other reasons and how the suggested
option might interact with the other existing options we have.  This
doesn't really address the point that I was making though.

> The pg_walcleaner proposed here is, IMO, a better option for
> developers/DBAs/service engineers to quickly bring up a crashed
> server, avoiding the manual work of figuring out which WAL files can
> be deleted. Also, if they have fixed the failing archive_command or
> the wrong settings that caused the WAL growth, the server will be able
> to do the remaining cleanup and come online cleanly. I'm sure many
> have faced this problem of server crashes due to running out of disk
> space at some point in production environments, especially with older
> versions of postgres.

For some folks, it'd be better to avoid having a crash at all, but of
course there's downsides to that too.

For others, it strikes me as clearly simpler if they could just fix the
archive_command and start PG again rather than having to run some other
command while making sure that PG isn't running (and doesn't end up
getting started), and then PG could go do the archiving and, ideally,
also be doing WAL replay as soon as there's enough space to allow it,
and possibly even come back online once crash recovery is done and
continue to archive WAL just like normal operation.  I didn't really see
that idea addressed in your response and was hoping to get your thoughts
on it as a possible alternative approach to having a separate tool such
as this.

Thanks,

Stephen

On 5/3/22 17:17, Stephen Frost wrote:
> * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote:
>>
>> The pg_walcleaner tool isn't intrusive, in the sense that it doesn't
>> delete the WAL files that are required for the server to come up (it
>> checks for the checkpoint redo WAL file). On top of that, it supports
>> an archive_command too, so no WAL files are lost at all, unlike the
>> pgbackrest option.
> 
> Won't be any WAL loss with pgbackrest unless it's specifically
> configured to throw it away- again, it's a tradeoff.  Just suggesting
> that we could have that be part of core as an option.

To be clear, pgBackRest never deletes WAL from the pg_wal directory (or 
modifies that directory in any way). If archive-push-queue-max is 
configured, that simply means it will notify Postgres that WAL has been 
archived once the max queue size has been exceeded (even though it has 
not actually been archived).

This should never lead to WAL that is required for crash recovery being 
deleted, unless there is a bug in Postgres.

But yeah, if they configure it there could be a loss of PITR capability.

Regards,
-David



On 25.04.22 20:39, Stephen Frost wrote:
> All of which isn't an issue if we don't have an external tool trying to
> do this and instead have the server doing it as the server knows its
> internal status, that the archive command has been failing long enough
> to pass the configuration threshold, and that the WAL isn't needed for
> crash recovery.  The ability to avoid having to crash and go through
> that process is pretty valuable.  Still, a crash may still happen and
> it'd be useful to have a clean way to deal with it.  I'm not really a
> fan of having to essentially configure this external command as well as
> have the server configured.  Have we settled that there's no way to make
> the server archive while there's no space available and before trying to
> write out more data?

I would also be in favor of not having an external command and instead 
pursue a solution built into the server along the ways you have 
outlined.  Besides the better integration and less potential for misuse 
that can be achieved that way, maintaining a separate tool has some 
constant overhead and if users only use it every ten years on average, 
it seems not worth it.



On Thu, Jun 30, 2022 at 5:33 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> On 25.04.22 20:39, Stephen Frost wrote:
> > All of which isn't an issue if we don't have an external tool trying to
> > do this and instead have the server doing it as the server knows its
> > internal status, that the archive command has been failing long enough
> > to pass the configuration threshold, and that the WAL isn't needed for
> > crash recovery.  The ability to avoid having to crash and go through
> > that process is pretty valuable.  Still, a crash may still happen and
> > it'd be useful to have a clean way to deal with it.  I'm not really a
> > fan of having to essentially configure this external command as well as
> > have the server configured.  Have we settled that there's no way to make
> > the server archive while there's no space available and before trying to
> > write out more data?
>
> I would also be in favor of not having an external command and instead
> pursue a solution built into the server along the ways you have
> outlined.  Besides the better integration and less potential for misuse
> that can be achieved that way, maintaining a separate tool has some
> constant overhead and if users only use it every ten years on average,
> it seems not worth it.

Thanks for the feedback. My understanding is this: introduce a GUC
(similar to max_slot_wal_keep_size) which, when set, specifies a size
beyond which postgres will not keep WAL files even if archiving is
failing. Am I right?

If my understanding is correct, are we going to say that postgres may
not archive "all" the WAL files that may be needed for PITR if archiving
fails for long enough? Will that be okay in production environments?

Regards,
Bharath Rupireddy.