Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary) - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)
Msg-id CALj2ACXW2rYU_FmE_ci3UqnLbpzYTnaVW2Z_5NaOZnKzU1jMVg@mail.gmail.com
In response to Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)  (Stephen Frost <sfrost@snowman.net>)
Responses Re: pg_walcleaner - new tool to detect, archive and delete the unneeded wal files (was Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary)  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Mon, Apr 18, 2022 at 7:41 PM Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Bharath Rupireddy (bharath.rupireddyforpostgres@gmail.com) wrote:
> > Thanks for the comments. Here's a new tool called pg_walcleaner which
> > basically deletes (optionally archiving before deletion) the unneeded
> > WAL files.
> >
> > Please provide your thoughts and review the patches.
>
> Alright, I spent some more time thinking about this and contemplating
> what the next steps are... and I feel like the next step is basically
> "add a HINT when the server can't start due to being out of disk space
> that one should consider running pg_walcleaner" and at that point... why
> aren't we just, uh, doing that?  This is all still quite hand-wavy, but
> it sure would be nice to be able to avoid downtime due to a broken
> archiving setup.  pgbackrest has a way of doing this and while we, of
> course, discourage the use of that option, as it means throwing away
> WAL, it's an option that users have.  PG could have a similar option.
> Basically, to archive_command/library what max_slot_wal_keep_size is for
> slots.

Thanks. I get your point. The way I see it, postgres should be aware
of a disk that is about to fill up (for instance, when the data
directory reaches 90% (configurable, of course) of the total disk
size), then freeze new write operations (maybe via a new ALTER SYSTEM
SET READ-ONLY or by setting the default_transaction_read_only GUC), and
then clean up the unneeded WAL files, perhaps simply by invoking the
pg_walcleaner tool. As far as I know, this kind of work has so far been
done outside of postgres. Even then, we might still run out of disk
space, depending on how frequently we check the data directory size
against the configurable 90% limit. Detecting disk usage is the KEY
here. Hence we need an offline, invokable tool like pg_walcleaner.
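A minimal sketch of the threshold check described above, assuming a 90% default and comparing the total size of the files in the data directory against the capacity of the filesystem that holds it. The helper names (data_dir_size, exceeds_threshold, should_start_cleanup) are hypothetical and are not part of PostgreSQL or pg_walcleaner; an in-server version would more likely be C code in a background worker.

```python
import os
import shutil


def data_dir_size(path: str) -> int:
    """Sum the sizes of regular files under `path` (hypothetical helper).

    Symlinks are not followed, so a pg_wal directory symlinked to another
    disk is not counted here.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except FileNotFoundError:
                # WAL segments can be recycled or removed while we walk.
                pass
    return total


def exceeds_threshold(used_bytes: int, total_bytes: int,
                      threshold_pct: float = 90.0) -> bool:
    """True once used_bytes reaches threshold_pct percent of total_bytes."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes * 100.0 / total_bytes >= threshold_pct


def should_start_cleanup(data_dir: str, threshold_pct: float = 90.0) -> bool:
    """Hypothetical trigger: data directory size vs. filesystem capacity."""
    used = data_dir_size(data_dir)
    total = shutil.disk_usage(data_dir).total
    return exceeds_threshold(used, total, threshold_pct)
```

Walking the directory costs time proportional to the number of files, which is one reason such a check can only run periodically, and why a fast-growing WAL volume can still fill the disk between two checks.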

Actually, I was planning to write an extension with a background
worker doing this for us.

> That isn't to say that we shouldn't also have a tool like this, but it
> generally feels like we're taking a reactive approach here rather than a
> proactive one to addressing the root issue.

Agreed. An offline tool like pg_walcleaner can help greatly, even
alongside the internal/external disk space monitoring described above.
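The patches themselves define how pg_walcleaner decides which WAL files are unneeded; this message does not. As a rough illustration only, here is a sketch modeled on pg_archivecleanup's name comparison, assuming the cutoff is a segment the server still needs (for example, the one containing the latest checkpoint's REDO point). The function name removable_segments is hypothetical, and a real tool would also have to respect replication slots, wal_keep_size and files still waiting to be archived.

```python
import re

# A WAL segment file name is 24 uppercase hex digits: 8 for the timeline
# ID, 8 for the "log" number and 8 for the segment number.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def removable_segments(names, cutoff):
    """Return WAL segment names that sort strictly before `cutoff`.

    As in pg_archivecleanup, the timeline part (first 8 hex digits) is
    ignored and only the log/segment part is compared. Only plain segment
    names are considered; history files, backup labels, .partial files,
    etc. are never returned by this sketch.
    """
    if not WAL_NAME.match(cutoff):
        raise ValueError(f"not a WAL segment name: {cutoff!r}")
    return sorted(n for n in names
                  if WAL_NAME.match(n) and n[8:] < cutoff[8:])
```

Comparing names as strings works because every field is fixed-width hexadecimal, so lexicographic order matches numeric order.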

Regards,
Bharath Rupireddy.