Re: O(n) tasks cause lengthy startups and checkpoints - Mailing list pgsql-hackers

From Andres Freund
Subject Re: O(n) tasks cause lengthy startups and checkpoints
Date
Msg-id 20220217231247.llfqjvvuk6rkawel@alap3.anarazel.de
Whole thread Raw
In response to Re: O(n) tasks cause lengthy startups and checkpoints  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: O(n) tasks cause lengthy startups and checkpoints  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
Hi,

On 2022-02-17 14:58:38 -0800, Nathan Bossart wrote:
> On Thu, Feb 17, 2022 at 02:28:29PM -0800, Andres Freund wrote:
> > As far as I understand, the primary concern are logical decoding serialized
> > snapshots, because a lot of them can accumulate if there e.g. is an old unused
> > / far behind slot. It should be easy to reduce the number of those snapshots
> > by e.g. eliding some redundant ones. Perhaps we could also make backends in
> > logical decoding occasionally do a bit of cleanup themselves.
> > 
> > I've not seen reports of the number of mapping files to be an real issue?
> 
> I routinely see all four of these tasks impacting customers, but I'd say
> the most common one is the temporary file cleanup.

I took temp file cleanup and StartupReorderBuffer() "out of consideration" for
custodian, because they're not needed during normal running.


> Besides eliminating some redundant files and having backends perform some
> cleanup, what do you think about skipping the logical decoding cleanup
> during end-of-recovery/shutdown checkpoints?

I strongly disagree with it. Then you might never get the cleanup done, but
keep on operating until you hit corruption issues.


> > The improvements around deleting temporary files and serialized snapshots
> > afaict don't require a dedicated process - they're only relevant during
> > startup. We could use the approach of renaming the directory out of the way as
> > done in this patchset but perform the cleanup in the startup process after
> > we're up.
> 
> Perhaps this is a good place to start.  As I mentioned above, IME the
> temporary file cleanup is the most common problem, so I think even getting
> that one fixed would be a huge improvement.

Cool.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Use generation context to speed up tuplesorts
Next
From: Andres Freund
Date:
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints