Re: BUG #17973: Reinit of pgstats entry for dropped DB can break autovacuum daemon - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #17973: Reinit of pgstats entry for dropped DB can break autovacuum daemon
Date
Msg-id ZIvBzm0DwRo60Qwk@paquier.xyz
Whole thread Raw
In response to BUG #17973: Reinit of pgstats entry for dropped DB can break autovacuum daemon  (PG Bug reporting form <noreply@postgresql.org>)
List pgsql-bugs
On Thu, Jun 15, 2023 at 06:36:41PM -0700, Will Mortensen wrote:
> On Thu, Jun 15, 2023 at 6:01 PM Michael Paquier <michael@paquier.xyz> wrote:
>> It requires a few manual steps, but I have been able to stuck the
>> autovacuum launcher schedule.  Nice investigation from the reporters.
>>
>> I may be missing something here, but finishing with an inconsistent
>> database list (generated based on the pgstat database entries) in the
>> autovacuum launcher is not something that can happen only because of a
>> worker, right?  A normal backend would call pgstat_update_dbstats()
>> once it exists, re-creating a fresh entry with the dropped database
>> OID.  Is that right?
>
> Yes, sorry, Jacob was able to repro with a normal backend just now. We
> probably should have tried that earlier. :-)

Okay, thanks for confirming.  Yes, I was able to stuck the scheduler
for both.  FWIW, I have initially hardcoded a stop point with an
on-disk file around InitPostgres() before we take the shared lock.
But thinking harder, it is possible to be sneaky with the following
steps to get these incorrect stat entries:
1) LOCK pg_database in a first session, on a database different than
the one to drop.
2) Connect with a second session to the database to drop, stuck
because of the previous LOCK when scanning pg_database to find the
tuple OID, during connection startup, just before taking the shared
lock.
3) Third session executing DROP DATABASE, with a session that
connected before the pg_database lock.
4) Commit transaction of first session to release pg_database lock,
the database is dropped and the second session fails to connect with
the database renamed.

> I'm also unsure if reiniting the pgstats entry (as opposed to creating
> a new one) is actually necessary or just what we happened to observe.
> We're definitely not very familiar with these internals. :-)

The proposed patch does better than that, actually, and takes the
approach of not touching the stats for the database dropped/renamed
by delaying MyDatabaseId which is what the backend uses to update the
database-level pgstat entries.
--
Michael

Attachment

pgsql-bugs by date:

Previous
From: Will Mortensen
Date:
Subject: Re: BUG #17973: Reinit of pgstats entry for dropped DB can break autovacuum daemon
Next
From: Masahiko Sawada
Date:
Subject: Re: BUG #17969: Assert failed in bloom_init() when false_positive_rate = 0.25