Re: [PROPOSAL] Termination of Background Workers for ALTER/DROP DATABASE - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [PROPOSAL] Termination of Background Workers for ALTER/DROP DATABASE
Date
Msg-id ab4Gf2r-Bd6P7BiW@paquier.xyz
Whole thread Raw
In response to Re: [PROPOSAL] Termination of Background Workers for ALTER/DROP DATABASE  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Thu, Mar 19, 2026 at 09:54:04AM +0900, Michael Paquier wrote:
> For now I am planning for the attached to get more information from
> widowbird, which should take a few days at worst.  That would make
> clear if we have a timing issue with the requests sent to the
> postmaster.  Launching the queries for worker_spi_launch() and
> pg_stat_activity on the database postgres may also improve things, but
> I don't really buy it, even if I may be wrong.

A couple of days later after 79a5911fe65b, widowbird seems to have
cooled down a bit:
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=widowbird&br=master

The extra debugging information is proving to be useful.  For example,
looking at this one for the RENAME TO case:

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=widowbird&dt=2026-03-19%2003%3A35%3A03&stg=module-worker_spi-check

And some problematic pattern logs:
2026-03-19 04:49:07.848 UTC [271557:4] 002_worker_terminate.pl LOG:
statement: ALTER DATABASE testdb RENAME TO renameddb
2026-03-19 04:49:07.849 UTC [271557:5] 002_worker_terminate.pl DEBUG:
attempting worker termination for database 16413
2026-03-19 04:49:07.849 UTC [271557:6] 002_worker_terminate.pl DEBUG:
termination requested for worker (PID 271553) on database 16413
[...]
2026-03-19 04:49:08.732 UTC [271557:19] 002_worker_terminate.pl DEBUG:
attempting worker termination for database 16413
2026-03-19 04:49:08.732 UTC [271557:20] 002_worker_terminate.pl DEBUG:
termination requested for worker (PID 271553) on database 16413

So we are able to send the requests to the workers, and these can take
a long time before being processed by the postmaster.  Querying
directly "postgres" for the worker_spi_launch() and pg_stat_activity
queries seems to have reduced the friction, with less requests to
send.   However, I don't think that this is the end of the story, even
after 79a5911fe65b I have spotted one case of RENAME TO where the
requests were sent for a bit more than 4s, before the postmaster had
the idea to catch up.  RENAME TO is the only one that can get slow
(really no idea why), so I guess that we could always tweak things a
bit more:
1) Extra injection point to increase the timeout (30s or 60s?) and
give the postmaster more room to proceed the requests.
2) Remove this portion of the test, but it would be sad.

I'll keep an eye for more failures, even if the situation is looking
slightly better.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Shinya Kato
Date:
Subject: Re: pg_stat_replication.*_lag sometimes shows NULL during active replication
Next
From: Richard Guo
Date:
Subject: Remove inner joins based on foreign keys