Re: O(n) tasks cause lengthy startups and checkpoints - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: O(n) tasks cause lengthy startups and checkpoints
Date
Msg-id CALj2ACUPSG4aMqCEO9L+xSyfLYaL88E377rqV_Jsc7P82+U2xA@mail.gmail.com
In response to Re: O(n) tasks cause lengthy startups and checkpoints  ("Bossart, Nathan" <bossartn@amazon.com>)
Responses Re: O(n) tasks cause lengthy startups and checkpoints
List pgsql-hackers
On Fri, Dec 3, 2021 at 3:01 AM Bossart, Nathan <bossartn@amazon.com> wrote:
>
> On 12/1/21, 6:48 PM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> > +1 for the overall idea of making the checkpoint faster. In fact, we
> > here at our team have been thinking about this problem for a while. If
> > there are a lot of files that checkpoint has to loop over and remove,
> > IMO, that task can be delegated to someone else (maybe a background
> > worker called background cleaner or bg cleaner, of course, we can have
> > a GUC to enable or disable it). The checkpoint can just write some
>
> Right.  IMO it isn't optimal to have critical things like startup and
> checkpointing depend on somewhat-unrelated tasks.  I understand the
> desire to avoid adding additional processes, and maybe it is a bigger
> hammer than what is necessary to reduce the impact, but it seemed like
> a natural solution for this problem.  That being said, I'm all for
> exploring other ways to handle this.

A generic background cleaner process (controllable via a few GUCs)
that deletes files (snapshot, mapping, old WAL, temp files, etc.) or
performs other tasks on behalf of the checkpointer seems to be the
easiest solution.
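
To make the "generic" part concrete, here is a rough sketch in Python
(for brevity, not backend C). Everything in it is an assumption made
for illustration: the CleanupTask type, the function names, and the
idea that each kind of file is described by a directory plus a
predicate are hypothetical and do not correspond to existing
PostgreSQL code.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CleanupTask:
    """One kind of cleanup the cleaner knows about (hypothetical)."""
    name: str                            # e.g. "snapshot", "mapping"
    directory: str                       # where to look for candidates
    is_removable: Callable[[str], bool]  # decides per file name


def run_task(task: CleanupTask) -> int:
    """Enumerate one directory and unlink the files the task marks removable."""
    with os.scandir(task.directory) as entries:
        victims = [e.path for e in entries
                   if e.is_file() and task.is_removable(e.name)]
    for path in victims:
        os.unlink(path)
    return len(victims)


def run_cleaner_cycle(tasks: List[CleanupTask]) -> Dict[str, int]:
    """One pass of the cleaner: run every registered task, report counts."""
    return {task.name: run_task(task) for task in tasks}
```

The point of the sketch is only that adding a new kind of cleanup
means registering one more task, without touching the checkpointer.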

I'm open to other ideas too.

> > Another idea could be to parallelize the checkpoint i.e. IIUC, the
> > tasks that checkpoint do in CheckPointGuts are independent and if we
> > have some counters like (how many snapshot/mapping files that the
> > server generated)
>
> Could you elaborate on this?  Is your idea that the checkpointer would
> create worker processes like autovacuum does?

Yes, I was thinking that the checkpointer would create one or more
dynamic background workers (we can assume one background worker for
now) to delete the files. If the number of files crosses a threshold
(for example, the snapshot file count exceeds it), a new worker is
spawned to enumerate the files and delete the unneeded ones, while the
checkpointer proceeds with its other tasks and finishes the
checkpoint. Having said this, I prefer the background cleaner approach
over the dynamic background worker. The advantage of the background
cleaner is that it can also do other tasks (like other kinds of file
deletion).
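
The threshold-based handoff could look roughly like the following
Python sketch. It is only an illustration under stated assumptions: a
thread stands in for a dynamic background worker, file_count stands in
for a counter the server would maintain, and all function and
parameter names are hypothetical.

```python
import os
import threading
from typing import Callable, Optional


def cleanup_directory(directory: str,
                      is_unneeded: Callable[[str], bool]) -> int:
    """Enumerate the directory and unlink the unneeded files."""
    with os.scandir(directory) as entries:
        victims = [e.path for e in entries
                   if e.is_file() and is_unneeded(e.name)]
    for path in victims:
        os.unlink(path)
    return len(victims)


def checkpoint_remove_files(directory: str,
                            is_unneeded: Callable[[str], bool],
                            file_count: int,
                            threshold: int) -> Optional[threading.Thread]:
    """Hand the cleanup to a worker when the counter exceeds the threshold.

    Returns the worker if one was spawned, else None (cleanup was done
    inline, as the checkpointer does today).
    """
    if file_count > threshold:
        # Stand-in for spawning a dynamic background worker: the caller
        # does not wait and can carry on with the rest of the checkpoint.
        worker = threading.Thread(target=cleanup_directory,
                                  args=(directory, is_unneeded))
        worker.start()
        return worker
    cleanup_directory(directory, is_unneeded)
    return None
```

Below the threshold nothing changes for the checkpoint; above it, the
only cost on the checkpoint path is starting the worker.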

Another idea could be to use the existing background writer to do the
file deletion while the checkpoint is happening. But again, this might
cause problems because the bg writer's flushing of dirty buffers would
get delayed.

Regards,
Bharath Rupireddy.


