Re: O(n) tasks cause lengthy startups and checkpoints - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: O(n) tasks cause lengthy startups and checkpoints
Date
Msg-id CANbhV-GXQ+owyk_jnowFRJEVX1TDgQrnQrgQb6C87V6jTecpZA@mail.gmail.com
Whole thread Raw
In response to Re: O(n) tasks cause lengthy startups and checkpoints  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: O(n) tasks cause lengthy startups and checkpoints
List pgsql-hackers
On Sun, 27 Nov 2022 at 23:34, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> Thanks for taking a look!
>
> On Thu, Nov 24, 2022 at 05:31:02PM +0000, Simon Riggs wrote:
> > * not sure I believe that everything it does can always be aborted out
> > of and shutdown - to achieve that you will need a
> > CHECK_FOR_INTERRUPTS() calls in the loops in patches 5 and 6 at least
>
> I did something like this earlier, but was advised to simply let the
> functions finish as usual during shutdown [0].  I think this is what the
> checkpointer process does today, anyway.

If we say "The custodian is not an essential process and can shutdown
quickly when requested.", and yet we know its not true in all cases,
then that will lead to misunderstandings and bugs.

If we perform a restart and the custodian is performing extra work
that delays shutdown, then it also delays restart. Given the title of
the thread, we should be looking to improve that, or at least know it
occurred.

> > * not sure why you want immediate execution of custodian tasks - I
> > feel supporting two modes will be a lot harder. For me, I would run
> > locally when !IsUnderPostmaster and also in an Assert build, so we can
> > test it works right - i.e. running in its own process is just a
> > production optimization for performance (which is the stated reason
> > for having this)
>
> I added this because 0004 involves requesting a task from the postmaster,
> so checking for IsUnderPostmaster doesn't work.  Those tasks would always
> run immediately.  However, we could use IsPostmasterEnvironment instead,
> which would allow us to remove the "immediate" argument.  I did it this way
> in v14.

Thanks

> > 0005 seems good from what I know
> > * There is no check to see if it worked in any sane time
>
> What did you have in mind?  Should the custodian begin emitting WARNINGs
> after a while?

I think it might be useful if it logged anything that took an
"extended period", TBD.

Maybe that is already covered by startup process logging. Please tell
me that still works?

> > Rather than explicitly use DEBUG1 everywhere I would have an
> > #define CUSTODIAN_LOG_LEVEL     LOG
> > so we can run with it in LOG mode and then set it to DEBUG1 with a one
> > line change in a later phase of Beta
>
> I can create a separate patch for this, but I don't think I've ever seen
> this sort of thing before.

Much of recovery is coded that way, for the same reason.

> Is the idea just to help with debugging during
> the development phase?

"Just", yes. Tests would be desirable also, under src/test/modules.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Reducing power consumption on idle servers
Next
From: Amit Kapila
Date:
Subject: Re: Perform streaming logical transactions by background workers and parallel apply