Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To: - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:
Date
Msg-id CA+hUKGKNffOrUqWNzMEf=TGauGVLqz=QB5Kz9axmTw0BgV-a+Q@mail.gmail.com
Whole thread Raw
In response to Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:
List pgsql-hackers
On Sat, May 7, 2022 at 4:52 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> I think we'll probably also want to invent a way
> to report which backend is holding up progress, since without that
> it's practically impossible for an end user to understand why their
> command is hanging.

Simple idea: how about logging the PID of processes that block
progress for too long?  In the attached, I arbitrarily picked 5
seconds as the wait time between LOG messages.  Also, DEBUG1 messages
let you see the processing speed on eg build farm animals.  Thoughts?

To test this, kill -STOP a random backend, and then try an ALTER
DATABASE SET TABLESPACE in another backend.  Example output:

DEBUG:  waiting for all backends to process ProcSignalBarrier generation 1
LOG:  still waiting for pid 1651417 to accept ProcSignalBarrier
STATEMENT:  alter database mydb set tablespace ts1;
LOG:  still waiting for pid 1651417 to accept ProcSignalBarrier
STATEMENT:  alter database mydb set tablespace ts1;
LOG:  still waiting for pid 1651417 to accept ProcSignalBarrier
STATEMENT:  alter database mydb set tablespace ts1;
LOG:  still waiting for pid 1651417 to accept ProcSignalBarrier
STATEMENT:  alter database mydb set tablespace ts1;

... then kill -CONT:

DEBUG:  finished waiting for all backends to process ProcSignalBarrier
generation 1

Another thought is that it might be nice to be able to test with a
dummy PSB that doesn't actually do anything.  You could use it to see
how fast your system processes it, while doing various other things,
and to find/debug problems in other code that fails to handle
interrupts correctly.  Here's an attempt at that.  I guess it could go
into a src/test/modules/something instead of core, but on the other
hand the PSB itself has to be in core anyway, so maybe not.  Thoughts?
 No documentation yet, just seeing if people think this is worth
having... better names/ideas welcome.

To test this, just SELECT pg_test_procsignal_barrier().

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: failures in t/031_recovery_conflict.pl on CI
Next
From: Justin Pryzby
Date:
Subject: should check interrupts in BuildRelationExtStatistics ?