On Wed, Feb 16, 2022 at 10:59:38PM -0800, Andres Freund wrote:
> On 2022-02-16 20:14:04 -0800, Nathan Bossart wrote:
>> >> - while ((spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != NULL)
>> >> + while (!ShutdownRequestPending &&
>> >> + (spc_de = ReadDirExtended(spc_dir, "pg_tblspc", LOG)) != NULL)
>> >
>> > Uh, huh? It strikes me as a supremely bad idea to have functions *silently*
>> > not do their jobs when ShutdownRequestPending is set, particularly without a
>> > huge fat comment.
>>
>> The idea was to avoid delaying shutdown because we're waiting for the
>> custodian to finish relatively nonessential tasks. Another option might be
>> to just exit immediately when the custodian receives a shutdown request.
>
> I think we should just not do either of these and let the functions
> finish. For the cases where shutdown really needs to be immediate
> there's, uhm, immediate mode shutdowns.

Alright.

>> > Why does this not open us up to new xid wraparound issues? Before there was a
>> > hard bound on how long these files could linger around. Now there's not
>> > anymore.
>>
>> Sorry, I'm probably missing something obvious, but I'm not sure how this
>> adds transaction ID wraparound risk. These files are tied to LSNs, and
>> AFAIK they won't impact slots' xmins.
>
> They're accessed by xid. The LSN is just for cleanup. Accessing files
> left over from a previous transaction with the same xid wouldn't be
> good - we'd read wrong catalog state for decoding...

Okay, that part makes sense to me. However, I'm still confused about how
this is handled today and why moving cleanup to a separate auxiliary
process makes matters worse. I've done quite a bit of reading, and I
haven't found anything that seems intended to prevent this problem. Do you
have any pointers?
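
To check that I'm reading the concern correctly, here is a toy sketch in C.
It is not PostgreSQL code: ToyFile, toy_write, toy_cleanup, toy_lookup and
toy_stale_hit are all made-up names, and the "epoch" field exists only so
the sketch can tell a stale file from a fresh one. The only things taken
from the discussion above are that lookup goes by xid and that the LSN is
used solely for cleanup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: every name here is hypothetical, none is PostgreSQL code. */
#define TOY_MAX_FILES 8

typedef struct ToyFile
{
    bool     in_use;
    uint32_t xid;    /* lookup key; 32 bits, so values repeat after wraparound */
    uint64_t lsn;    /* consulted only by cleanup */
    uint64_t epoch;  /* which lap of the xid counter wrote the file; not part of the key */
} ToyFile;

static ToyFile toy_files[TOY_MAX_FILES];

/* Forget every file. */
static void toy_reset(void)
{
    for (size_t i = 0; i < TOY_MAX_FILES; i++)
        toy_files[i].in_use = false;
}

/* Record a file written by transaction `xid` at position `lsn`. */
static void toy_write(uint32_t xid, uint64_t lsn, uint64_t epoch)
{
    for (size_t i = 0; i < TOY_MAX_FILES; i++)
    {
        if (!toy_files[i].in_use)
        {
            toy_files[i] = (ToyFile){ true, xid, lsn, epoch };
            return;
        }
    }
    assert(!"toy file table is full");
}

/* Remove every file older than `cutoff_lsn`; returns how many were removed. */
static size_t toy_cleanup(uint64_t cutoff_lsn)
{
    size_t removed = 0;

    for (size_t i = 0; i < TOY_MAX_FILES; i++)
    {
        if (toy_files[i].in_use && toy_files[i].lsn < cutoff_lsn)
        {
            toy_files[i].in_use = false;
            removed++;
        }
    }
    return removed;
}

/* Find a file by xid alone, which is all the reader has to go on. */
static const ToyFile *toy_lookup(uint32_t xid)
{
    for (size_t i = 0; i < TOY_MAX_FILES; i++)
    {
        if (toy_files[i].in_use && toy_files[i].xid == xid)
            return &toy_files[i];
    }
    return NULL;
}

/* True if a lookup by xid returns a file left over from an earlier lap. */
static bool toy_stale_hit(uint32_t xid, uint64_t current_epoch)
{
    const ToyFile *f = toy_lookup(xid);

    return f != NULL && f->epoch != current_epoch;
}
```

In this model, if a file is written for xid 1000 in epoch 0 and cleanup
never gets to it, a lookup for xid 1000 in epoch 1 returns the old file.
Once toy_cleanup() runs with a cutoff past that file's LSN, the lookup
comes back empty. So the hazard, as I understand it, depends entirely on
how long cleanup can be deferred.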

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com