Re: O(n) tasks cause lengthy startups and checkpoints - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: O(n) tasks cause lengthy startups and checkpoints
Date
Msg-id 20220217182337.GA3247866@nathanxps13
Whole thread Raw
In response to Re: O(n) tasks cause lengthy startups and checkpoints  (Andres Freund <andres@anarazel.de>)
Responses Re: O(n) tasks cause lengthy startups and checkpoints  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Wed, Feb 16, 2022 at 10:59:38PM -0800, Andres Freund wrote:
> On 2022-02-16 20:14:04 -0800, Nathan Bossart wrote:
>> >> -    while ((spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != NULL)
>> >> +    while (!ShutdownRequestPending &&
>> >> +           (spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != NULL)
>> >
>> > Uh, huh? It strikes me as a supremely bad idea to have functions *silently*
>> > not do their jobs when ShutdownRequestPending is set, particularly without a
>> > huge fat comment.
>>
>> The idea was to avoid delaying shutdown because we're waiting for the
>> custodian to finish relatively nonessential tasks.  Another option might be
>> to just exit immediately when the custodian receives a shutdown request.
> 
> I think we should just not do either of these and let the functions
> finish. For the cases where shutdown really needs to be immediate
> there's, uhm, immediate mode shutdowns.

Alright.

>> > Why does this not open us up to new xid wraparound issues? Before there was a
>> > hard bound on how long these files could linger around. Now there's not
>> > anymore.
>>
>> Sorry, I'm probably missing something obvious, but I'm not sure how this
>> adds transaction ID wraparound risk.  These files are tied to LSNs, and
>> AFAIK they won't impact slots' xmins.
> 
> They're accessed by xid. The LSN is just for cleanup. Accessing files
> left over from a previous transaction with the same xid wouldn't be
> good - we'd read wrong catalog state for decoding...

Okay, that part makes sense to me.  However, I'm still confused about how
this is handled today and why moving cleanup to a separate auxiliary
process makes matters worse.  I've done quite a bit of reading, and I
haven't found anything that seems intended to prevent this problem.  Do you
have any pointers?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Nonrandom scanned_pages distorts pg_class.reltuples set by VACUUM
Next
From: Andres Freund
Date:
Subject: Re: Nonrandom scanned_pages distorts pg_class.reltuples set by VACUUM