Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To: - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:
Date
Msg-id CA+hUKGKa8HNJaA24gqiiFoGy0ysndeVoJsHvX_q1-DVLFaGAmw@mail.gmail.com
In response to Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:
List pgsql-hackers
On Thu, May 12, 2022 at 4:57 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Thu, May 12, 2022 at 3:13 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > error running SQL: 'psql:<stdin>:1: ERROR:  source database
> > "conflict_db_template" is being accessed by other users
> > DETAIL:  There is 1 other session using the database.'
>
> Oh, for this one I think it may just be that the autovacuum worker
> with PID 23757 took longer to exit than the 5 seconds
> CountOtherDBBackends() is prepared to wait, after sending it SIGTERM.

In this test, autovacuum_naptime is set to 1s (per Andres, autovacuum
was implicated when he first saw the problem with pg_upgrade, hence the
desire to crank it up).  That's not necessary: with the active line in
ProcessBarrierSmgrRelease() commented out, the tests reliably reproduce
data corruption even without that setting.  Let's just take it out.

As for skink failing, the timeout was hard-coded to 300s for the whole
test, but apparently that wasn't enough under valgrind.  Let's use the
standard PostgreSQL::Test::Utils::timeout_default (usually 180s)
instead, but reset it for each query we send.
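
To illustrate the difference between one timeout for the whole test and
a timeout that is re-armed per query, here is a rough Python sketch.  It
is not the Perl test code: QueryTimer, FakeClock and run_queries are
hypothetical names invented for the illustration, and the query
durations are made-up numbers standing in for slow runs under valgrind.

```python
class FakeClock:
    """Simulated clock so the sketch runs instantly (hypothetical helper)."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class QueryTimer:
    """A deadline that can be re-armed (hypothetical helper)."""

    def __init__(self, timeout_s, clock):
        self.timeout_s = timeout_s
        self.clock = clock
        self.deadline = None

    def restart(self):
        # Re-arm the deadline relative to the current time.
        self.deadline = self.clock() + self.timeout_s

    def expired(self):
        return self.deadline is not None and self.clock() >= self.deadline


def run_queries(durations, timer, clock, per_query):
    """Return how many queries finished before the timer expired.

    With per_query=False the timer is armed once for the whole run;
    with per_query=True it is re-armed before each query.
    """
    timer.restart()
    completed = 0
    for duration in durations:
        if per_query:
            timer.restart()
        clock.advance(duration)  # pretend the query took this long
        if timer.expired():
            break
        completed += 1
    return completed


# Three slow queries of 150s each (illustrative numbers only).
slow_run = [150, 150, 150]

clock_a = FakeClock()
whole_test = run_queries(slow_run, QueryTimer(300, clock_a), clock_a,
                         per_query=False)

clock_b = FakeClock()
per_query = run_queries(slow_run, QueryTimer(180, clock_b), clock_b,
                        per_query=True)
```

In this simulation the single 300s budget is exhausted partway through
the run, while the shorter 180s budget, re-armed before every query,
lets all three slow queries finish.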

See attached.

