Re: Random pg_upgrade test failure on drongo - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Random pg_upgrade test failure on drongo
Date
Msg-id CAA4eK1Lq75HXRxucGrKzWNk8540kdk9dj0B4-6DMcHAZ+CE5+Q@mail.gmail.com
In response to Re: Random pg_upgrade test failure on drongo  (Alexander Lakhin <exclusion@gmail.com>)
Responses Re: Random pg_upgrade test failure on drongo
List pgsql-hackers
On Tue, Jan 9, 2024 at 4:30 PM Alexander Lakhin <exclusion@gmail.com> wrote:
>
> 09.01.2024 13:08, Amit Kapila wrote:
> >
> >> As to checkpoint_timeout, personally I would not increase it, because it
> >> seems unbelievable to me that pg_restore (with the cluster containing only
> >> two empty databases) can run for longer than 5 minutes. I'd rather
> >> investigate such situation separately, in case we encounter it, but maybe
> >> it's only me.
> >>
> > I feel it is okay to set a higher value of checkpoint_timeout due to
> > the same reason though the probability is less. I feel here it is
> > important to explain in the comments why we are using these settings
> > in the new test. I have thought of something like: "During the
> > upgrade, bgwriter or checkpointer could hold the file handle for some
> > removed file. Now, during restore when we try to create the file with
> > the same name, it errors out. This behavior is specific to only some
> > specific Windows versions and the probability of seeing this behavior
> > is higher in this test because we use wal_level as logical via
> > allows_streaming => 'logical' which in turn sets shared_buffers as
> > 1MB."
> >
> > Thoughts?
>
> I would describe that behavior as "During upgrade, when pg_restore performs
> CREATE DATABASE, bgwriter or checkpointer may flush buffers and hold a file
> handle for pg_largeobject, so a later TRUNCATE pg_largeobject command will
> fail if the OS (such as older Windows versions) doesn't remove an unlinked
> file completely while it is still open. ..."
>

I am slightly hesitant to name any particular system table in the
comments, as this can happen for any other system table as well, so I
have slightly adjusted the comments in the attached. However, I think it
is okay to mention the particular system table name in the commit
message. Let me know what you think.
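
For readers who want to see the failure mode in isolation, here is a
minimal Python sketch. It is not PostgreSQL code and not part of the
attached patch. The function name recreate_while_handle_open and the
file name relation_file are made up for illustration. One handle stands
in for bgwriter or checkpointer keeping a file open; the code then
unlinks the file and tries to create a new file with the same name. On
POSIX systems the unlink removes the name immediately, so the create
succeeds. On Windows the outcome may differ, and with Python's default
file sharing the error may surface at the unlink step rather than at the
create step that the discussion above describes.

```python
import os
import tempfile


def recreate_while_handle_open(directory):
    """Unlink a file that is still open, then try to create the same name.

    Returns "recreated" on success, or "failed: <ExceptionName>" if the OS
    refuses either the unlink or the create while the old handle is open.
    """
    path = os.path.join(directory, "relation_file")  # hypothetical name

    # This handle plays the role of a background process that has flushed
    # a buffer and has not yet closed the file.
    holder = open(path, "wb")
    holder.write(b"old contents")
    holder.flush()

    try:
        os.unlink(path)
        # O_EXCL makes the create fail if the old name is still visible.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return "recreated"
    except OSError as exc:
        return "failed: " + type(exc).__name__
    finally:
        holder.close()
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(os.name, recreate_while_handle_open(tmp))
```

Closing the old handle before the create is what removes the race, which
is why the test settings aim to keep the background processes from
opening these files during the upgrade in the first place.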

--
With Regards,
Amit Kapila.
