Re: O(n) tasks cause lengthy startups and checkpoints - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: O(n) tasks cause lengthy startups and checkpoints
Date
Msg-id 20221127233434.GA878043@nathanxps13
Whole thread Raw
In response to Re: O(n) tasks cause lengthy startups and checkpoints  (Simon Riggs <simon.riggs@enterprisedb.com>)
Responses Re: O(n) tasks cause lengthy startups and checkpoints
List pgsql-hackers
Thanks for taking a look!

On Thu, Nov 24, 2022 at 05:31:02PM +0000, Simon Riggs wrote:
> * not sure I believe that everything it does can always be aborted out
> of and shutdown - to achieve that you will need a
> CHECK_FOR_INTERRUPTS() calls in the loops in patches 5 and 6 at least

I did something like this earlier, but was advised to simply let the
functions finish as usual during shutdown [0].  I think this is what the
checkpointer process does today, anyway.

> * not sure why you want immediate execution of custodian tasks - I
> feel supporting two modes will be a lot harder. For me, I would run
> locally when !IsUnderPostmaster and also in an Assert build, so we can
> test it works right - i.e. running in its own process is just a
> production optimization for performance (which is the stated reason
> for having this)

I added this because 0004 involves requesting a task from the postmaster,
so checking for IsUnderPostmaster doesn't work.  Those tasks would always
run immediately.  However, we could use IsPostmasterEnvironment instead,
which would allow us to remove the "immediate" argument.  I did it this way
in v14.

I'm not sure about running locally in Assert builds.  It's true that would
help ensure there's test coverage for the task logic, but it would also
reduce coverage for the custodian logic.  And in general, I'm worried about
having Assert builds use a different code path than production builds.

> 0005 seems good from what I know
> * There is no check to see if it worked in any sane time

What did you have in mind?  Should the custodian begin emitting WARNINGs
after a while?

> * It seems possible that "Old" might change meaning - will that make
> it break/fail?

I don't believe so.

> Rather than explicitly use DEBUG1 everywhere I would have an
> #define CUSTODIAN_LOG_LEVEL     LOG
> so we can run with it in LOG mode and then set it to DEBUG1 with a one
> line change in a later phase of Beta

I can create a separate patch for this, but I don't think I've ever seen
this sort of thing before.  Is the idea just to help with debugging during
the development phase?

> I can't really comment with knowledge on sub-patches 0002 to 0004.
> 
> Perhaps you should aim to get 1, 5, 6 committed first and then return
> to the others in a later CF/separate thread?

That seems like a good idea since those are all relatively self-contained.
I removed 0002-0004 in v14.

[0] https://postgr.es/m/20220217065938.x2esfdppzypegn5j%40alap3.anarazel.de

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Andrey Borodin
Date:
Subject: Re: Amcheck verification of GiST and GIN
Next
From: Nathan Bossart
Date:
Subject: Re: wake up logical workers after ALTER SUBSCRIPTION