Re: Random pg_upgrade test failure on drongo - Mailing list pgsql-hackers

From Alexander Lakhin
Subject Re: Random pg_upgrade test failure on drongo
Date
Msg-id 5b2d0f4b-9e79-cd13-b932-b1a9162b7205@gmail.com
In response to Re: Random pg_upgrade test failure on drongo  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Random pg_upgrade test failure on drongo
List pgsql-hackers
Hello Amit,

09.01.2024 13:08, Amit Kapila wrote:
>
>> As to checkpoint_timeout, personally I would not increase it, because it
>> seems unbelievable to me that pg_restore (with the cluster containing only
>> two empty databases) can run for longer than 5 minutes. I'd rather
>> investigate such a situation separately, in case we encounter it, but maybe
>> it's only me.
>>
> I feel it is okay to set a higher value of checkpoint_timeout for
> the same reason, though the probability is lower. I feel here it is
> important to explain in the comments why we are using these settings
> in the new test. I have thought of something like: "During the
> upgrade, bgwriter or checkpointer could hold the file handle for some
> removed file. Now, during restore when we try to create the file with
> the same name, it errors out. This behavior is specific to only some
> Windows versions and the probability of seeing this behavior
> is higher in this test because we use wal_level as logical via
> allows_streaming => 'logical' which in turn sets shared_buffers to
> 1MB."
>
> Thoughts?

I would describe that behavior as "During upgrade, when pg_restore performs
CREATE DATABASE, bgwriter or checkpointer may flush buffers and hold a file
handle for pg_largeobject, so a later TRUNCATE pg_largeobject command will
fail if the OS (such as older Windows versions) doesn't completely remove an
unlinked file while it is still open. ..."
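For illustration only, the proposed comment might sit next to the settings in the new node's postgresql.conf roughly as sketched below. This is a hedged sketch, not the committed test: the use of bgwriter_lru_maxpages = 0 (which disables background writing) and the 1h value for checkpoint_timeout are assumptions made for the example; the only fact taken from the discussion is that a timed checkpoint (default 5min) should not fire while the test runs.

```
# During upgrade, when pg_restore performs CREATE DATABASE, bgwriter or
# checkpointer may flush buffers and hold a file handle for pg_largeobject,
# so a later TRUNCATE pg_largeobject fails if the OS (such as older Windows
# versions) doesn't completely remove an unlinked file while it is still open.

# Assumption: stop bgwriter from writing dirty buffers (0 disables
# background writing entirely).
bgwriter_lru_maxpages = 0

# Assumption: push the next timed checkpoint well past the expected test
# run time (the default is 5min; 1h is an illustrative value).
checkpoint_timeout = 1h
```

In a TAP test these lines would typically be appended with the node's append_conf('postgresql.conf', ...) method after init(allows_streaming => 'logical'), so they take effect on top of the 1MB shared_buffers that the logical setting brings in.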

Best regards,
Alexander
