Thread: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary
pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary
From
Bharath Rupireddy
Date:
Hi, pg_archivecleanup currently takes a WAL file name as input to delete the WAL files prior to it [1]. As suggested by Satya (cc-ed) in pg_replslotdata thread [2], can we enhance the pg_archivecleanup to automatically detect the last checkpoint (from control file) LSN, calculate the lowest restart_lsn required by the replication slots, if any (by reading the replication slot info from pg_logical directory), archive the unneeded (an archive_command similar to that of the one provided in the server config can be provided as an input) WAL files before finally deleting them? Making pg_archivecleanup tool as an end-to-end solution will help greatly in disk full situations because of WAL files growth (inactive replication slots, archive command failures, infrequent checkpoint etc.). Thoughts? [1] - When used as a standalone program all WAL files logically preceding the oldestkeptwalfile will be removed from archivelocation. https://www.postgresql.org/docs/devel/pgarchivecleanup.html [2] - https://www.postgresql.org/message-id/CAHg%2BQDc9xwN7EmuONT3T91pCqFG6Q-BCe6B-kM-by7r1uPEicg%40mail.gmail.com Regards, Bharath Rupireddy.
Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary
From
"Euler Taveira"
Date:
On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote:
pg_archivecleanup currently takes a WAL file name as input to deletethe WAL files prior to it [1]. As suggested by Satya (cc-ed) inpg_replslotdata thread [2], can we enhance the pg_archivecleanup toautomatically detect the last checkpoint (from control file) LSN,calculate the lowest restart_lsn required by the replication slots, ifany (by reading the replication slot info from pg_logical directory),archive the unneeded (an archive_command similar to that of the oneprovided in the server config can be provided as an input) WAL filesbefore finally deleting them? Making pg_archivecleanup tool as anend-to-end solution will help greatly in disk full situations becauseof WAL files growth (inactive replication slots, archive commandfailures, infrequent checkpoint etc.).
pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you
suggesting to use it for removing files from pg_wal directory too? No, thanks.
WAL files are a key component for backup and replication. Hence, you cannot
deliberately allow a tool to remove WAL files from PGDATA. IMO this issue
wouldn't occur if you have a monitoring system and alerts and someone to keep
an eye on it. If the disk full situation was caused by a failed archive command
or a disconnected standby, it is easy to figure out; the fix is simple.
Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary
From
Stephen Frost
Date:
Greetings, * Euler Taveira (euler@eulerto.com) wrote: > On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote: > > pg_archivecleanup currently takes a WAL file name as input to delete > > the WAL files prior to it [1]. As suggested by Satya (cc-ed) in > > pg_replslotdata thread [2], can we enhance the pg_archivecleanup to > > automatically detect the last checkpoint (from control file) LSN, > > calculate the lowest restart_lsn required by the replication slots, if > > any (by reading the replication slot info from pg_logical directory), > > archive the unneeded (an archive_command similar to that of the one > > provided in the server config can be provided as an input) WAL files > > before finally deleting them? Making pg_archivecleanup tool as an > > end-to-end solution will help greatly in disk full situations because > > of WAL files growth (inactive replication slots, archive command > > failures, infrequent checkpoint etc.). The overall idea of having a tool for this isn't a bad idea, but .. > pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you > suggesting to use it for removing files from pg_wal directory too? No, thanks. We definitely shouldn't have it be part of pg_archivecleanup for the simple reason that it'll be really confusing and almost certainly will be mis-used. For my 2c, we should just remove pg_archivecleanup entirely. > WAL files are a key component for backup and replication. Hence, you cannot > deliberately allow a tool to remove WAL files from PGDATA. IMO this issue > wouldn't occur if you have a monitoring system and alerts and someone to keep > an eye on it. If the disk full situation was caused by a failed archive command > or a disconnected standby, it is easy to figure out; the fix is simple. This is perhaps a bit far- PG does, in fact, remove WAL files from PGDATA. Having a tool which will do this safely when the server isn't able to be brought online due to lack of disk space would certainly be helpful rather frequently. I agree that monitoring and alerting are things that everyone should implement and pay attention to, but that doesn't happen and instead people end up just blowing away pg_wal and corrupting their database when, had a tool existed, they could have avoided that happening and brought the system back online in relatively short order without any data loss. Thanks, Stephen
Attachment
Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary
From
Bharath Rupireddy
Date:
On Wed, Dec 29, 2021 at 7:27 PM Stephen Frost <sfrost@snowman.net> wrote: > > On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote: > > > pg_archivecleanup currently takes a WAL file name as input to delete > > > the WAL files prior to it [1]. As suggested by Satya (cc-ed) in > > > pg_replslotdata thread [2], can we enhance the pg_archivecleanup to > > > automatically detect the last checkpoint (from control file) LSN, > > > calculate the lowest restart_lsn required by the replication slots, if > > > any (by reading the replication slot info from pg_logical directory), > > > archive the unneeded (an archive_command similar to that of the one > > > provided in the server config can be provided as an input) WAL files > > > before finally deleting them? Making pg_archivecleanup tool as an > > > end-to-end solution will help greatly in disk full situations because > > > of WAL files growth (inactive replication slots, archive command > > > failures, infrequent checkpoint etc.). > > The overall idea of having a tool for this isn't a bad idea, but .. > > > pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you > > suggesting to use it for removing files from pg_wal directory too? No, thanks. > > We definitely shouldn't have it be part of pg_archivecleanup for the > simple reason that it'll be really confusing and almost certainly will > be mis-used. +1 > > WAL files are a key component for backup and replication. Hence, you cannot > > deliberately allow a tool to remove WAL files from PGDATA. IMO this issue > > wouldn't occur if you have a monitoring system and alerts and someone to keep > > an eye on it. If the disk full situation was caused by a failed archive command > > or a disconnected standby, it is easy to figure out; the fix is simple. > > This is perhaps a bit far- PG does, in fact, remove WAL files from > PGDATA. Having a tool which will do this safely when the server isn't > able to be brought online due to lack of disk space would certainly be > helpful rather frequently. I agree that monitoring and alerting are > things that everyone should implement and pay attention to, but that > doesn't happen and instead people end up just blowing away pg_wal and > corrupting their database when, had a tool existed, they could have > avoided that happening and brought the system back online in relatively > short order without any data loss. Thanks. Yes, the end-to-end tool is helpful in rather eventual situations and having it in the core is more helpful instead of every postgres vendor developing their own solution and many times it's hard to get it right. Also, I agree to not club this idea with pg_archviecleanup. How about having a new tool like pg_walcleanup/pg_xlogcleanup helping the developers/admins/users in eventual situations? Regards, Bharath Rupireddy.
pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)
From
Bharath Rupireddy
Date:
On Wed, Dec 29, 2021 at 8:06 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Wed, Dec 29, 2021 at 7:27 PM Stephen Frost <sfrost@snowman.net> wrote: > > > On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote: > > > > pg_archivecleanup currently takes a WAL file name as input to delete > > > > the WAL files prior to it [1]. As suggested by Satya (cc-ed) in > > > > pg_replslotdata thread [2], can we enhance the pg_archivecleanup to > > > > automatically detect the last checkpoint (from control file) LSN, > > > > calculate the lowest restart_lsn required by the replication slots, if > > > > any (by reading the replication slot info from pg_logical directory), > > > > archive the unneeded (an archive_command similar to that of the one > > > > provided in the server config can be provided as an input) WAL files > > > > before finally deleting them? Making pg_archivecleanup tool as an > > > > end-to-end solution will help greatly in disk full situations because > > > > of WAL files growth (inactive replication slots, archive command > > > > failures, infrequent checkpoint etc.). > > > > The overall idea of having a tool for this isn't a bad idea, but .. > > > > > pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you > > > suggesting to use it for removing files from pg_wal directory too? No, thanks. > > > > We definitely shouldn't have it be part of pg_archivecleanup for the > > simple reason that it'll be really confusing and almost certainly will > > be mis-used. > > +1 > > > > WAL files are a key component for backup and replication. Hence, you cannot > > > deliberately allow a tool to remove WAL files from PGDATA. IMO this issue > > > wouldn't occur if you have a monitoring system and alerts and someone to keep > > > an eye on it. If the disk full situation was caused by a failed archive command > > > or a disconnected standby, it is easy to figure out; the fix is simple. > > > > This is perhaps a bit far- PG does, in fact, remove WAL files from > > PGDATA. Having a tool which will do this safely when the server isn't > > able to be brought online due to lack of disk space would certainly be > > helpful rather frequently. I agree that monitoring and alerting are > > things that everyone should implement and pay attention to, but that > > doesn't happen and instead people end up just blowing away pg_wal and > > corrupting their database when, had a tool existed, they could have > > avoided that happening and brought the system back online in relatively > > short order without any data loss. > > Thanks. Yes, the end-to-end tool is helpful in rather eventual > situations and having it in the core is more helpful instead of every > postgres vendor developing their own solution and many times it's hard > to get it right. Also, I agree to not club this idea with > pg_archviecleanup. How about having a new tool like > pg_walcleanup/pg_xlogcleanup helping the developers/admins/users in > eventual situations? Thanks for the comments. Here's a new tool called pg_walcleaner which basically deletes (optionally archiving before deletion) the unneeded WAL files. Please provide your thoughts and review the patches. Regards, Bharath Rupireddy.
Attachment
Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)
From
Stephen Frost
Date:
Greetings, * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote: > Thanks for the comments. Here's a new tool called pg_walcleaner which > basically deletes (optionally archiving before deletion) the unneeded > WAL files. > > Please provide your thoughts and review the patches. Alright, I spent some more time thinking about this and contemplating what the next steps are... and I feel like the next step is basically "add a HINT when the server can't start due to being out of disk space that one should consider running pg_walcleaner" and at that point... why aren't we just, uh, doing that? This is all still quite hand-wavy, but it sure would be nice to be able to avoid downtime due to a broken archiving setup. pgbackrest has a way of doing this and while we, of course, discourage the use of that option, as it means throwing away WAL, it's an option that users have. PG could have a similar option. Basically, to archive_command/library what max_slot_wal_keep_size is for slots. That isn't to say that we shouldn't also have a tool like this, but it generally feels like we're taking a reactive approach here rather than a proactive one to addressing the root issue. Thanks, Stephen
Attachment
Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)
From
Bharath Rupireddy
Date:
On Mon, Apr 18, 2022 at 7:41 PM Stephen Frost <sfrost@snowman.net> wrote: > > Greetings, > > * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote: > > Thanks for the comments. Here's a new tool called pg_walcleaner which > > basically deletes (optionally archiving before deletion) the unneeded > > WAL files. > > > > Please provide your thoughts and review the patches. > > Alright, I spent some more time thinking about this and contemplating > what the next steps are... and I feel like the next step is basically > "add a HINT when the server can't start due to being out of disk space > that one should consider running pg_walcleaner" and at that point... why > aren't we just, uh, doing that? This is all still quite hand-wavy, but > it sure would be nice to be able to avoid downtime due to a broken > archiving setup. pgbackrest has a way of doing this and while we, of > course, discourage the use of that option, as it means throwing away > WAL, it's an option that users have. PG could have a similar option. > Basically, to archive_command/library what max_slot_wal_keep_size is for > slots. Thanks. I get your point. The way I see it is that the postgres should be self-aware of the about-to-get-full disk (probably when the data directory size is 90%(configurable, of course) of total disk size) and then freeze the new write operations (may be via new ALTER SYSTEM SET READ-ONLY or setting default_transaction_read_only GUC) and then go clean the unneeded WAL files by just invoking pg_walcleaner tool perhaps. I think, so far, this kind of work has been done outside of postgres. Even then, we might get into out-of-disk situations depending on how frequently we check the data directory size to compute the 90% configurable limit. Detecting the disk size is the KEY here. Hence we need an offline invokable tool like pg_walcleaner. Actually, I was planning to write an extension with a background worker doing this for us. > That isn't to say that we shouldn't also have a tool like this, but it > generally feels like we're taking a reactive approach here rather than a > proactive one to addressing the root issue. Agree. The offline tool like pg_walcleaner can help greatly even with some sort of above internal/external disk space monitoring tools. Regards, Bharath Rupireddy.
Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)
From
Stephen Frost
Date:
Greeting, * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote: > On Mon, Apr 18, 2022 at 7:41 PM Stephen Frost <sfrost@snowman.net> wrote: > > * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote: > > > Thanks for the comments. Here's a new tool called pg_walcleaner which > > > basically deletes (optionally archiving before deletion) the unneeded > > > WAL files. > > > > > > Please provide your thoughts and review the patches. > > > > Alright, I spent some more time thinking about this and contemplating > > what the next steps are... and I feel like the next step is basically > > "add a HINT when the server can't start due to being out of disk space > > that one should consider running pg_walcleaner" and at that point... why > > aren't we just, uh, doing that? This is all still quite hand-wavy, but > > it sure would be nice to be able to avoid downtime due to a broken > > archiving setup. pgbackrest has a way of doing this and while we, of > > course, discourage the use of that option, as it means throwing away > > WAL, it's an option that users have. PG could have a similar option. > > Basically, to archive_command/library what max_slot_wal_keep_size is for > > slots. > > Thanks. I get your point. The way I see it is that the postgres should > be self-aware of the about-to-get-full disk (probably when the data > directory size is 90%(configurable, of course) of total disk size) and > then freeze the new write operations (may be via new ALTER SYSTEM SET > READ-ONLY or setting default_transaction_read_only GUC) and then go > clean the unneeded WAL files by just invoking pg_walcleaner tool > perhaps. I think, so far, this kind of work has been done outside of > postgres. Even then, we might get into out-of-disk situations > depending on how frequently we check the data directory size to > compute the 90% configurable limit. Detecting the disk size is the KEY > here. Hence we need an offline invokable tool like pg_walcleaner. Ugh, last I checked, figuring out if a given filesystem is near being full is a pain to do in a cross-platform way. Why not just do exactly what we already are doing for replication slots, but for archive_command? Then we wouldn't need to go into a read-only mode. Perhaps going into a read-only mode would be an alternative option to that but we should definitely be letting the admin pick what to do in such a case. The idea of going read-only and then *also* removing WAL files doesn't seem like it's ever the right choice though..? As for worrying about frequency.. that seems unlikely to be that serious of an issue, if we just check how far behind we are with each time we try to run archive_command. That's basically how the pgbackrest option works and we've had few issues with it not being 'soon enough'. > Actually, I was planning to write an extension with a background > worker doing this for us. ... but we have a background worker already for archiving that could handle this for us, doesn't seem like we need another. > > That isn't to say that we shouldn't also have a tool like this, but it > > generally feels like we're taking a reactive approach here rather than a > > proactive one to addressing the root issue. > > Agree. The offline tool like pg_walcleaner can help greatly even with > some sort of above internal/external disk space monitoring tools. See, this seems like a really bad idea to me. I'd be very concerned about people mis-using this tool in some way and automating its usage strikes me as absolutely exactly that.. Are we sure that we can guarantee that we don't remove things we shouldn't when this ends up getting run against a running cluster from someone's automated tooling? Or when someone sees that it refuses to run for $reason and tries to.. "fix" that? Seems quite risky to me.. I'd probably want to put similar caveats around using this tool as I do around pg_resetwal when doing training- that is, don't ever, ever, ever use this, heh. Thanks, Stephen
Attachment
Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)
From
Bharath Rupireddy
Date:
On Mon, Apr 18, 2022 at 8:48 PM Stephen Frost <sfrost@snowman.net> wrote: > > Greeting, > > * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote: > > On Mon, Apr 18, 2022 at 7:41 PM Stephen Frost <sfrost@snowman.net> wrote: > > > * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote: > > > > Thanks for the comments. Here's a new tool called pg_walcleaner which > > > > basically deletes (optionally archiving before deletion) the unneeded > > > > WAL files. > > > > > > > > Please provide your thoughts and review the patches. > > > > > > Alright, I spent some more time thinking about this and contemplating > > > what the next steps are... and I feel like the next step is basically > > > "add a HINT when the server can't start due to being out of disk space > > > that one should consider running pg_walcleaner" and at that point... why > > > aren't we just, uh, doing that? This is all still quite hand-wavy, but > > > it sure would be nice to be able to avoid downtime due to a broken > > > archiving setup. pgbackrest has a way of doing this and while we, of > > > course, discourage the use of that option, as it means throwing away > > > WAL, it's an option that users have. PG could have a similar option. > > > Basically, to archive_command/library what max_slot_wal_keep_size is for > > > slots. > > > > Thanks. I get your point. The way I see it is that the postgres should > > be self-aware of the about-to-get-full disk (probably when the data > > directory size is 90%(configurable, of course) of total disk size) and > > then freeze the new write operations (may be via new ALTER SYSTEM SET > > READ-ONLY or setting default_transaction_read_only GUC) and then go > > clean the unneeded WAL files by just invoking pg_walcleaner tool > > perhaps. I think, so far, this kind of work has been done outside of > > postgres. Even then, we might get into out-of-disk situations > > depending on how frequently we check the data directory size to > > compute the 90% configurable limit. Detecting the disk size is the KEY > > here. Hence we need an offline invokable tool like pg_walcleaner. > > Ugh, last I checked, figuring out if a given filesystem is near being > full is a pain to do in a cross-platform way. Why not just do exactly > what we already are doing for replication slots, but for > archive_command? Do you mean to say that if the archvie_command fails, say, for "some time" or "some number of attempts", just let the server not bother about it and checkpoint delete the WAL files instead of going out of disk? If this is the thought, then it's more dangerous as we might end up losing the WAL forever. For invalidating replication slots, it's okay because the required WAL can exist somewhere (either on the primary or on the archive location). > > > That isn't to say that we shouldn't also have a tool like this, but it > > > generally feels like we're taking a reactive approach here rather than a > > > proactive one to addressing the root issue. > > > > Agree. The offline tool like pg_walcleaner can help greatly even with > > some sort of above internal/external disk space monitoring tools. > > See, this seems like a really bad idea to me. I'd be very concerned > about people mis-using this tool in some way and automating its usage > strikes me as absolutely exactly that.. Are we sure that we can > guarantee that we don't remove things we shouldn't when this ends up > getting run against a running cluster from someone's automated tooling? > Or when someone sees that it refuses to run for $reason and tries to.. > "fix" that? Seems quite risky to me.. I'd probably want to put similar > caveats around using this tool as I do around pg_resetwal when doing > training- that is, don't ever, ever, ever use this, heh. The initial version of the patch doesn't check if the server crashed or not before running it. I was thinking of looking at the postmaster.pid or pg_control file (however they don't guarantee whether the server is up or crashed because the server can crash without deleting postmaster.pid or updating pg_control file). Another idea is to let pg_walcleaner fire a sample query ('SELECT 1') to see if the server is up and running, if yes, exit, otherwise proceed with its work. Also, to not cause losing of WAL permanently, we must recommend using archvie_command so that the WAL can be moved to an alternative location (could be the same archvie_location that primary uses). And yes, we must have clear usage guidelines in the docs. Regards, Bharath Rupireddy.
Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)
From
Stephen Frost
Date:
Greetings, * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote: > On Mon, Apr 18, 2022 at 8:48 PM Stephen Frost <sfrost@snowman.net> wrote: > > * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote: > > > On Mon, Apr 18, 2022 at 7:41 PM Stephen Frost <sfrost@snowman.net> wrote: > > > > * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote: > > > > > Thanks for the comments. Here's a new tool called pg_walcleaner which > > > > > basically deletes (optionally archiving before deletion) the unneeded > > > > > WAL files. > > > > > > > > > > Please provide your thoughts and review the patches. > > > > > > > > Alright, I spent some more time thinking about this and contemplating > > > > what the next steps are... and I feel like the next step is basically > > > > "add a HINT when the server can't start due to being out of disk space > > > > that one should consider running pg_walcleaner" and at that point... why > > > > aren't we just, uh, doing that? This is all still quite hand-wavy, but > > > > it sure would be nice to be able to avoid downtime due to a broken > > > > archiving setup. pgbackrest has a way of doing this and while we, of > > > > course, discourage the use of that option, as it means throwing away > > > > WAL, it's an option that users have. PG could have a similar option. > > > > Basically, to archive_command/library what max_slot_wal_keep_size is for > > > > slots. > > > > > > Thanks. I get your point. The way I see it is that the postgres should > > > be self-aware of the about-to-get-full disk (probably when the data > > > directory size is 90%(configurable, of course) of total disk size) and > > > then freeze the new write operations (may be via new ALTER SYSTEM SET > > > READ-ONLY or setting default_transaction_read_only GUC) and then go > > > clean the unneeded WAL files by just invoking pg_walcleaner tool > > > perhaps. I think, so far, this kind of work has been done outside of > > > postgres. Even then, we might get into out-of-disk situations > > > depending on how frequently we check the data directory size to > > > compute the 90% configurable limit. Detecting the disk size is the KEY > > > here. Hence we need an offline invokable tool like pg_walcleaner. > > > > Ugh, last I checked, figuring out if a given filesystem is near being > > full is a pain to do in a cross-platform way. Why not just do exactly > > what we already are doing for replication slots, but for > > archive_command? > > Do you mean to say that if the archvie_command fails, say, for "some > time" or "some number of attempts", just let the server not bother > about it and checkpoint delete the WAL files instead of going out of > disk? If this is the thought, then it's more dangerous as we might end > up losing the WAL forever. For invalidating replication slots, it's > okay because the required WAL can exist somewhere (either on the > primary or on the archive location). I was thinking more specifically along the lines of "if there's > X GB of WAL that hasn't been archived, give up on archiving anything new" (which is how the pgbackrest option works). As archiving with this command is optional, it does present the same risk too. Perhaps if we flipped it around to require the archive_command be provided then it'd be a bit better, though we would also need a way for users to ask for us to just delete the WAL without archiving it. There again though ... the server already has a way of both archiving and removing archived WAL and also has now grown the archive_library option, something that this tool would be pretty hard to replicate, I feel like, as it wouldn't be loading the library into a PG backend anymore. As we don't have any real archive libraries yet, it's hard to say if that's going to be an actual issue or not. Something to consider though. > > > > That isn't to say that we shouldn't also have a tool like this, but it > > > > generally feels like we're taking a reactive approach here rather than a > > > > proactive one to addressing the root issue. > > > > > > Agree. The offline tool like pg_walcleaner can help greatly even with > > > some sort of above internal/external disk space monitoring tools. > > > > See, this seems like a really bad idea to me. I'd be very concerned > > about people mis-using this tool in some way and automating its usage > > strikes me as absolutely exactly that.. Are we sure that we can > > guarantee that we don't remove things we shouldn't when this ends up > > getting run against a running cluster from someone's automated tooling? > > Or when someone sees that it refuses to run for $reason and tries to.. > > "fix" that? Seems quite risky to me.. I'd probably want to put similar > > caveats around using this tool as I do around pg_resetwal when doing > > training- that is, don't ever, ever, ever use this, heh. > > The initial version of the patch doesn't check if the server crashed > or not before running it. I was thinking of looking at the > postmaster.pid or pg_control file (however they don't guarantee > whether the server is up or crashed because the server can crash > without deleting postmaster.pid or updating pg_control file). Another > idea is to let pg_walcleaner fire a sample query ('SELECT 1') to see > if the server is up and running, if yes, exit, otherwise proceed with > its work. All of which isn't an issue if we don't have an external tool trying to do this and instead have the server doing it as the server knows its internal status, that the archive command has been failing long enough to pass the configuration threshold, and that the WAL isn't needed for crash recovery. The ability to avoid having to crash and go through that process is pretty valuable. Still, a crash may still happen and it'd be useful to have a clean way to deal with it. I'm not really a fan of having to essentially configure this external command as well as have the server configured. Have we settled that there's no way to make the server archive while there's no space available and before trying to write out more data? > Also, to not cause losing of WAL permanently, we must recommend using > archvie_command so that the WAL can be moved to an alternative > location (could be the same archvie_location that primary uses). I agree we should recommend using archive_command or archive_library, of course, but if that's been done and is working properly then this tool isn't really needed. The use-case we're trying to address, I believe, is something like: 1) archive command starts failing for some reason 2) WAL piles up on the primary 3) primary runs out of disk space, crash happens 4) archive command gets 'fixed' in some fashion 5) WAL is archived and removed from primary 6) primary is restarted and able to go through crash recovery 7) server is online again Now, I was trying to suggest an approach to addressing the issue at #2, that is, avoid having WAL pile up without end on the primary and avoid the crash in the first place. For users who care more about uptime and less about WAL, that's likely what they want. For users who care more about WAL than uptime, it'd be good to have a way to help them too, but to do that, #4 has to happen and, once that's done, #5 and following just need to be accomplished in whatever way is simplest. The thought I'm having here is that the simplest answer, at least from the user's perspective, is that the server is able to just be brought up with the fixed archive command and everything just works- archiving happens, space is free'd up, and the server comes up and continues running. I accept that it isn't this patch's job or mandate to go implement some new option that I've thought up, and I could be convinced that this separate tool is just how we're going to have to get #5 accomplished for now due to the complexity of making the server do the archiving early on and/or that it has other downsides (if the crash wasn't due to running out of space, making the server wait to come online until after the WAL has been archived wouldn't be ideal) that make it a poor choice overall, but it seems like it's something that's at least worth some thought and consideration of if there's a way to accomplish this with a bit less manual user involvement, as that tends to be error-prone. Thanks, Stephen
Attachment
Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)
From
Bharath Rupireddy
Date:
On Tue, Apr 26, 2022 at 12:09 AM Stephen Frost <sfrost@snowman.net> wrote: > > I was thinking more specifically along the lines of "if there's > X GB > of WAL that hasn't been archived, give up on archiving anything new" > (which is how the pgbackrest option works). IMO, this option is dangerous because the users might lose the unarchived WAL files completely. Instead, fixing the archive_command either before the server runs out of disk space (if someone/some monitoring detected the archive_command failure) or after the server crashes with no space. > As archiving with this command is optional, it does present the same > risk too. Perhaps if we flipped it around to require the > archive_command be provided then it'd be a bit better, though we would > also need a way for users to ask for us to just delete the WAL without > archiving it. There again though ... the server already has a way of > both archiving and removing archived WAL and also has now grown the > archive_library option, something that this tool would be pretty hard to > replicate, I feel like, as it wouldn't be loading the library into a PG > backend anymore. As we don't have any real archive libraries yet, it's > hard to say if that's going to be an actual issue or not. Something to > consider though. Actually, why are we just thinking that the WAL files grow up only because of archive command failures (of course, it's the main cause even in a well-configured server)?. Imagine if the checkpoints weren't happening frequently for whatever reasons or the max_slot_wal_keep_size or wal_keep_size weren't set appropriately (even if they set properly, someone, could be an attacker, can reset them), high write activity generating huge WAL files (especially on smaller servers) - these too can be reasons for server going down because of WAL files growth. The main motto of this tool is to use (by DBAs or service engineers) it as a last resort option after the server goes out of disk space to quickly bring the server back online so that the regular cleaning up activity can take place or the storage scale operations can be performed if required. There's a dry run option in pg_walcleaner to see if it can free up some space at all. If required we can provide an option to archive and delete only the maximum of specified number of WAL files. I'm okay with making the archive_command mandatory and if archiving isn't required to be used with the pg_walcleaner tool they can set some dummy values such as archive_command='/bin/true'. > > The initial version of the patch doesn't check if the server crashed > > or not before running it. I was thinking of looking at the > > postmaster.pid or pg_control file (however they don't guarantee > > whether the server is up or crashed because the server can crash > > without deleting postmaster.pid or updating pg_control file). Another > > idea is to let pg_walcleaner fire a sample query ('SELECT 1') to see > > if the server is up and running, if yes, exit, otherwise proceed with > > its work. > > All of which isn't an issue if we don't have an external tool trying to > do this and instead have the server doing it as the server knows its > internal status, that the archive command has been failing long enough > to pass the configuration threshold, and that the WAL isn't needed for > crash recovery. The ability to avoid having to crash and go through > that process is pretty valuable. Still, a crash may still happen and > it'd be useful to have a clean way to deal with it. I'm not really a > fan of having to essentially configure this external command as well as > have the server configured. Have we settled that there's no way to make > the server archive while there's no space available and before trying to > write out more data? The pg_walcleaner tool isn't intrusive in the sense that it doesn't delete the WAL files that are required for the server to come up (as it checks for the checkpoint redo WAL file), apart from this it has archive_command too so no loss of the WAL file(s) at all unlike the pgbackrest option. > > Also, to not cause losing of WAL permanently, we must recommend using > > archvie_command so that the WAL can be moved to an alternative > > location (could be the same archvie_location that primary uses). > > I agree we should recommend using archive_command or archive_library, of > course, but if that's been done and is working properly then this tool > isn't really needed. The use-case we're trying to address, I believe, > is something like: > > 1) archive command starts failing for some reason > 2) WAL piles up on the primary > 3) primary runs out of disk space, crash happens > 4) archive command gets 'fixed' in some fashion > 5) WAL is archived and removed from primary > 6) primary is restarted and able to go through crash recovery > 7) server is online again > > Now, I was trying to suggest an approach to addressing the issue at #2, > that is, avoid having WAL pile up without end on the primary and avoid > the crash in the first place. For users who care more about uptime and > less about WAL, that's likely what they want. > > For users who care more about WAL than uptime, it'd be good to have a > way to help them too, but to do that, #4 has to happen and, once that's > done, #5 and following just need to be accomplished in whatever way is > simplest. The thought I'm having here is that the simplest answer, at > least from the user's perspective, is that the server is able to just be > brought up with the fixed archive command and everything just works- > archiving happens, space is free'd up, and the server comes up and > continues running. > > I accept that it isn't this patch's job or mandate to go implement some > new option that I've thought up, and I could be convinced that this > separate tool is just how we're going to have to get #5 accomplished for > now due to the complexity of making the server do the archiving early on > and/or that it has other downsides (if the crash wasn't due to running > out of space, making the server wait to come online until after the WAL > has been archived wouldn't be ideal) that make it a poor choice overall, > but it seems like it's something that's at least worth some thought and > consideration of if there's a way to accomplish this with a bit less > manual user involvement, as that tends to be error-prone. As I said above, let's not just think that the archive_command failures are the only reasons (of course they are the main causes) for WAL file growth. The pg_walcleaner proposed here is a better option IMO for the developers/DBAs/service engineers to see if they can quickly bring up the crashed server avoiding some manual work of looking at which WAL files can be deleted as such. Also if they fixed the failed archive_command or wrong settings that caused the WAL files growth, the server will be able to do the other cleanup and come online cleanly. I'm sure many would have faced this problem of server crashes due to out of disk space at some point in time in production environments, especially with the older versions of postgres. Regards, Bharath Rupireddy.
Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)
From
Stephen Frost
Date:
Greetings, * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote: > On Tue, Apr 26, 2022 at 12:09 AM Stephen Frost <sfrost@snowman.net> wrote: > > I was thinking more specifically along the lines of "if there's > X GB > > of WAL that hasn't been archived, give up on archiving anything new" > > (which is how the pgbackrest option works). > > IMO, this option is dangerous because the users might lose the > unarchived WAL files completely. Instead, fixing the archive_command > either before the server runs out of disk space (if someone/some > monitoring detected the archive_command failure) or after the server > crashes with no space. Certainly, fixing the archive_command would be the best answer. We're here because we know that doesn't always happen and are looking for other ways to avoid the server crashing or to at least help clean things up if a crash does happen. > > As archiving with this command is optional, it does present the same > > risk too. Perhaps if we flipped it around to require the > > archive_command be provided then it'd be a bit better, though we would > > also need a way for users to ask for us to just delete the WAL without > > archiving it. There again though ... the server already has a way of > > both archiving and removing archived WAL and also has now grown the > > archive_library option, something that this tool would be pretty hard to > > replicate, I feel like, as it wouldn't be loading the library into a PG > > backend anymore. As we don't have any real archive libraries yet, it's > > hard to say if that's going to be an actual issue or not. Something to > > consider though. > > Actually, why are we just thinking that the WAL files grow up only > because of archive command failures (of course, it's the main cause > even in a well-configured server)?. Imagine if the checkpoints weren't > happening frequently for whatever reasons or the > max_slot_wal_keep_size or wal_keep_size weren't set appropriately > (even if they set properly, someone, could be an attacker, can reset > them), high write activity generating huge WAL files (especially on > smaller servers) - these too can be reasons for server going down > because of WAL files growth. The option that I outline would probably be the "end" of all of the above- that is, its job is to absolutely keep the pg_wal filesystem from running out of space by making sure that pg_wal doesn't grow beyond the configured size. That might mean that we don't keep wal_keep_size amount of WAL or maybe get rid of WAL before max_slot_wal_keep_size. If checkpoints aren't happening, that's different and throwing away WAL would mean corruption on the primary and so we'd just have to accept a crash in that case but I don't think that's anywhere near as common as the other cases. We would probably want to throw warnings if someone configured wal_keep_size bigger than throw_away_unneeded_wal or whatever. > The main motto of this tool is to use (by DBAs or service engineers) > it as a last resort option after the server goes out of disk space to > quickly bring the server back online so that the regular cleaning up > activity can take place or the storage scale operations can be > performed if required. There's a dry run option in pg_walcleaner to > see if it can free up some space at all. If required we can provide an > option to archive and delete only the maximum of specified number of > WAL files. Unfortunately, it's not likely to be very quick if archive command is set to anything real. Interesting idea to have an option along the lines of "please archive X amount of WAL", but there again- the server could, in theory anyway, start up and get a few WAL segments archived and then start doing replay from the crash, and if it ran out of space and archiving was still happening, it could perhaps just wait and try again. With the ALTER SYSTEM READ ONLY, it could maybe even just go read-only if it ran out of space in the first place and continue to try and archive WAL. Kind of hard to say how much additional space is needed to get through crash recovery and so this suggested option would always end up having to be a pretty broad guess and you absolutely couldn't start the server while this command is running (which I would definitely be worried would end up happening...), so that's also something to think about. > I'm okay with making the archive_command mandatory and if archiving > isn't required to be used with the pg_walcleaner tool they can set > some dummy values such as archive_command='/bin/true'. Yeah, that seems at least a bit better. I imagine a lot of people will probably use it exactly like that, but this will hopefully at least discourage them from just throwing it away. > > > The initial version of the patch doesn't check if the server crashed > > > or not before running it. I was thinking of looking at the > > > postmaster.pid or pg_control file (however they don't guarantee > > > whether the server is up or crashed because the server can crash > > > without deleting postmaster.pid or updating pg_control file). Another > > > idea is to let pg_walcleaner fire a sample query ('SELECT 1') to see > > > if the server is up and running, if yes, exit, otherwise proceed with > > > its work. > > > > All of which isn't an issue if we don't have an external tool trying to > > do this and instead have the server doing it as the server knows its > > internal status, that the archive command has been failing long enough > > to pass the configuration threshold, and that the WAL isn't needed for > > crash recovery. The ability to avoid having to crash and go through > > that process is pretty valuable. Still, a crash may still happen and > > it'd be useful to have a clean way to deal with it. I'm not really a > > fan of having to essentially configure this external command as well as > > have the server configured. Have we settled that there's no way to make > > the server archive while there's no space available and before trying to > > write out more data? > > The pg_walcleaner tool isn't intrusive in the sense that it doesn't > delete the WAL files that are required for the server to come up (as > it checks for the checkpoint redo WAL file), apart from this it has > archive_command too so no loss of the WAL file(s) at all unlike the > pgbackrest option. Won't be any WAL loss with pgbackrest unless it's specifically configured to throw it away- again, it's a tradeoff. Just suggesting that we could have that be part of core as an option. > > > Also, to not cause losing of WAL permanently, we must recommend using > > > archvie_command so that the WAL can be moved to an alternative > > > location (could be the same archvie_location that primary uses). > > > > I agree we should recommend using archive_command or archive_library, of > > course, but if that's been done and is working properly then this tool > > isn't really needed. The use-case we're trying to address, I believe, > > is something like: > > > > 1) archive command starts failing for some reason > > 2) WAL piles up on the primary > > 3) primary runs out of disk space, crash happens > > 4) archive command gets 'fixed' in some fashion > > 5) WAL is archived and removed from primary > > 6) primary is restarted and able to go through crash recovery > > 7) server is online again > > > > Now, I was trying to suggest an approach to addressing the issue at #2, > > that is, avoid having WAL pile up without end on the primary and avoid > > the crash in the first place. For users who care more about uptime and > > less about WAL, that's likely what they want. > > > > For users who care more about WAL than uptime, it'd be good to have a > > way to help them too, but to do that, #4 has to happen and, once that's > > done, #5 and following just need to be accomplished in whatever way is > > simplest. The thought I'm having here is that the simplest answer, at > > least from the user's perspective, is that the server is able to just be > > brought up with the fixed archive command and everything just works- > > archiving happens, space is free'd up, and the server comes up and > > continues running. > > > > I accept that it isn't this patch's job or mandate to go implement some > > new option that I've thought up, and I could be convinced that this > > separate tool is just how we're going to have to get #5 accomplished for > > now due to the complexity of making the server do the archiving early on > > and/or that it has other downsides (if the crash wasn't due to running > > out of space, making the server wait to come online until after the WAL > > has been archived wouldn't be ideal) that make it a poor choice overall, > > but it seems like it's something that's at least worth some thought and > > consideration of if there's a way to accomplish this with a bit less > > manual user involvement, as that tends to be error-prone. > > As I said above, let's not just think that the archive_command > failures are the only reasons (of course they are the main causes) for > WAL file growth. I agree that we should consider other reasons and how the suggested option might interact with the other existing options we have. This doesn't really address the point that I was making though. > The pg_walcleaner proposed here is a better option IMO for the > developers/DBAs/service engineers to see if they can quickly bring up > the crashed server avoiding some manual work of looking at which WAL > files can be deleted as such. Also if they fixed the failed > archive_command or wrong settings that caused the WAL files growth, > the server will be able to do the other cleanup and come online > cleanly. I'm sure many would have faced this problem of server crashes > due to out of disk space at some point in time in production > environments, especially with the older versions of postgres. For some folks, it'd be better to avoid having a crash at all, but of course there's downsides to that too. For others, it strikes me as clearly simpler if they could just fix the archive_command and start PG again rather than having to run some other command while making sure that PG isn't running (and doesn't end up getting started), and then PG could go do the archiving and, ideally, also be doing WAL replay as soon as there's enough space to allow it, and possibly even come back online once crash recovery is done and continue to archive WAL just like normal operation. I didn't really see that idea addressed in your response and was hoping to get your thoughts on it as a possible alternative approach to having a separate tool such as this. Thanks, Stephen
Attachment
Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)
From
David Steele
Date:
On 5/3/22 17:17, Stephen Frost wrote: > * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote: >> >> The pg_walcleaner tool isn't intrusive in the sense that it doesn't >> delete the WAL files that are required for the server to come up (as >> it checks for the checkpoint redo WAL file), apart from this it has >> archive_command too so no loss of the WAL file(s) at all unlike the >> pgbackrest option. > > Won't be any WAL loss with pgbackrest unless it's specifically > configured to throw it away- again, it's a tradeoff. Just suggesting > that we could have that be part of core as an option. To be clear, pgBackRest never deletes WAL from the pg_wal directory (or modifies that directory in any way). If archive-push-queue-max is configured that simply means it will notify Postgres that WAL have been archived if the max queue size has been exceeded (even though they have not been archived). This should never lead to WAL being required for crash recovery being deleted unless there is a bug in Postgres. But yeah, if they configure it there could be a loss of PITR capability. Regards, -David
Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)
From
Peter Eisentraut
Date:
On 25.04.22 20:39, Stephen Frost wrote: > All of which isn't an issue if we don't have an external tool trying to > do this and instead have the server doing it as the server knows its > internal status, that the archive command has been failing long enough > to pass the configuration threshold, and that the WAL isn't needed for > crash recovery. The ability to avoid having to crash and go through > that process is pretty valuable. Still, a crash may still happen and > it'd be useful to have a clean way to deal with it. I'm not really a > fan of having to essentially configure this external command as well as > have the server configured. Have we settled that there's no way to make > the server archive while there's no space available and before trying to > write out more data? I would also be in favor of not having an external command and instead pursue a solution built into the server along the ways you have outlined. Besides the better integration and less potential for misuse that can be achieved that way, maintaining a separate tool has some constant overhead and if users only use it every ten years on average, it seems not worth it.
Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)
From
Bharath Rupireddy
Date:
On Thu, Jun 30, 2022 at 5:33 PM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > On 25.04.22 20:39, Stephen Frost wrote: > > All of which isn't an issue if we don't have an external tool trying to > > do this and instead have the server doing it as the server knows its > > internal status, that the archive command has been failing long enough > > to pass the configuration threshold, and that the WAL isn't needed for > > crash recovery. The ability to avoid having to crash and go through > > that process is pretty valuable. Still, a crash may still happen and > > it'd be useful to have a clean way to deal with it. I'm not really a > > fan of having to essentially configure this external command as well as > > have the server configured. Have we settled that there's no way to make > > the server archive while there's no space available and before trying to > > write out more data? > > I would also be in favor of not having an external command and instead > pursue a solution built into the server along the ways you have > outlined. Besides the better integration and less potential for misuse > that can be achieved that way, maintaining a separate tool has some > constant overhead and if users only use it every ten years on average, > it seems not worth it. Thanks for the feedback. My understanding is this: introduce a GUC (similar to max_slot_wal_keep_size), when set, beyond which postgres will not keep the WAL files even if archiving is failing, am I right? If my understanding is correct, are we going to say that postgres may not archive "all" the WAL files that may be needed for PITR if the archiving is failing for long enough? Will this be okay in production environments? Regards, Bharath Rupireddy.