BUG #17973: Reinit of pgstats entry for dropped DB can break autovacuum daemon - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #17973: Reinit of pgstats entry for dropped DB can break autovacuum daemon
Msg-id 17973-bca1f7d5c14f601e@postgresql.org
Responses Re: BUG #17973: Reinit of pgstats entry for dropped DB can break autovacuum daemon  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17973
Logged by:          Will Mortensen
Email address:      will@extrahop.com
PostgreSQL version: 15.3
Operating system:   Ubuntu 22.04
Description:

My colleague Jacob Speidel (jacob@extrahop.com) and I have diagnosed a race
condition that, under certain conditions, can cause the autovacuum daemon to
stop launching autovacuum workers until the autovacuum daemon (or the whole
server) is restarted. This obviously causes serious problems for the
server.

We've reproduced this on both 15.2 and REL_15_STABLE (git commit
bd590d1fea1ba9245c791d589eea94d2dbad5a2b). We've never seen a similar issue
on 14 despite very similar conditions occurring frequently. We haven't yet
tried on 16.

In Jacob's repro, the problem begins when AutoVacWorkerMain() is in
InitPostgres() and some other backend drops the database. Specifically, the
autovacuum worker's InitPostgres() is just about to obtain the lock at
<https://github.com/postgres/postgres/blob/bd590d1fea1ba9245c791d589eea94d2dbad5a2b/src/backend/utils/init/postinit.c#L1012>.
The backend dropping the DB marks the DB's pgstats entry as dropped but
can’t free it because its refcount is nonzero, so
AtEOXact_PgStat_DroppedStats() calls pgstat_request_entry_refs_gc(). The
autovacuum worker's InitPostgres() proceeds to call GetDatabaseTuple() and
notices the database has been dropped, at some point calling
pgstat_gc_entry_refs() to release its reference to the DB's pgstats entry,
and the worker decides to exit with a fatal error. But while exiting, it
calls pgstat_report_stat(), which calls pgstat_update_dbstats() for the
dropped DB, which calls pgstat_prep_database_pending(), which calls
pgstat_prep_pending_entry(), which calls pgstat_get_entry_ref() with create
== true, which calls pgstat_reinit_entry() against the DB's pgstats entry.
This sets dropped to false on that entry. Finally, the autovacuum worker
exits.
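
The sequence above can be sketched with a toy Python model. This is only an
illustration under stated assumptions, not PostgreSQL code: StatsEntry,
StatsTable and run_scenario are hypothetical names, and the second reference
holder is an assumption added so that the refcount stays nonzero after the
worker releases its own reference.

```python
from dataclasses import dataclass


@dataclass
class StatsEntry:
    """Toy stand-in for a shared pgstats entry (not the real struct)."""
    refcount: int = 0
    dropped: bool = False


class StatsTable:
    """Toy stand-in for the shared stats hash table."""

    def __init__(self):
        self.entries = {}

    def get_entry_ref(self, key, create):
        entry = self.entries.get(key)
        if entry is None:
            if not create:
                return None
            entry = self.entries[key] = StatsEntry()
        elif entry.dropped:
            if not create:
                return None
            # Models the reinit step from the report: the flag is cleared.
            entry.dropped = False
        entry.refcount += 1
        return entry

    def release_ref(self, key):
        self.entries[key].refcount -= 1
        self._free_if_unused(key)

    def drop(self, key):
        """Mark the entry dropped; return True if it could be freed now."""
        self.entries[key].dropped = True
        return self._free_if_unused(key)

    def _free_if_unused(self, key):
        entry = self.entries[key]
        if entry.dropped and entry.refcount == 0:
            del self.entries[key]
            return True
        return False


def run_scenario():
    table = StatsTable()
    db = "dropped_db"
    table.get_entry_ref(db, create=True)  # autovacuum worker, during startup
    table.get_entry_ref(db, create=True)  # assumed other reference holder
    freed_at_drop = table.drop(db)        # drop: refcount is 2, cannot free
    table.release_ref(db)                 # worker notices the drop, releases
    table.get_entry_ref(db, create=True)  # worker reports stats while exiting
    table.release_ref(db)                 # worker exits
    table.release_ref(db)                 # other holder lets go
    return freed_at_drop, table.entries.get(db)
```

In this model run_scenario() ends with the entry still in the table, with
refcount 0 and dropped == False, so nothing will ever free it.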

The fact that a dropped database now indefinitely has a pgstats entry with
dropped == false seemingly violates some assumptions and confuses the
autovacuum daemon. In particular, rebuild_database_list() will forever
include it in DatabaseList and take it into account when computing the
adl_next_worker for each DB, but do_start_worker() won’t consider the DB
because it's never returned by get_database_list(). In our repros, this
mismatch causes do_start_worker() to get stuck never processing any DB: in
particular, it always sees that for each non-dropped database
adl_next_worker is in the future but within the next autovacuum_naptime,
i.e. skipit = true. This causes do_start_worker() to call
rebuild_database_list() at the end, which again miscomputes adl_next_worker
and pushes it further into the future, so that the situation repeats on the
next call to do_start_worker(), and so on indefinitely.
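
The mismatch can be illustrated with a small Python simulation. It is a
deliberately simplified sketch, not the logic in autovacuum.c: the function
names are hypothetical, it assumes the stale entry always comes first in the
schedule order after a rebuild, and it picks the first non-skipped database
instead of applying the real selection criteria.

```python
NAPTIME_MS = 60_000  # stands in for autovacuum_naptime


def rebuild_schedule(order, now):
    """Spread next-worker times over one naptime, keeping the given order."""
    step = NAPTIME_MS // len(order)
    return {db: now + step * (i + 1) for i, db in enumerate(order)}


def pick_database(schedule, live_dbs, now):
    """Return a live DB to vacuum, or None if every one of them is skipped."""
    for db in live_dbs:
        next_worker = schedule[db]
        if now < next_worker < now + NAPTIME_MS:
            continue  # "skipit": its turn is coming within one naptime
        return db
    return None


def simulate(order, live_dbs, wakeups):
    """Return the list of databases for which a worker was started."""
    now = 0
    schedule = rebuild_schedule(order, now)
    started = []
    for _ in range(wakeups):
        now = min(schedule.values())  # wake when the earliest entry is due
        db = pick_database(schedule, live_dbs, now)
        if db is None:
            # Nothing chosen: rebuild, which pushes every time forward again.
            schedule = rebuild_schedule(order, now)
        else:
            started.append(db)
            schedule[db] = now + NAPTIME_MS
    return started
```

With only live databases, simulate(["a", "b"], ["a", "b"], 4) returns
["a", "b", "a", "b"]. With a stale entry scheduled first,
simulate(["ghost", "a", "b"], ["a", "b"], 100) returns []: every wakeup is
triggered by the stale entry, both live databases are skipped, and the
rebuild moves their times forward again.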

That’s the crux of our issue. Please let us know if any clarification or
more detailed repro steps are needed. Our repro patch just sleeps before and
after the LockSharedObject() call in InitPostgres() (to widen the race
windows) and adds a lot of logging. (Jacob did >90% of the debugging here; I
merely determined how the pgstats entry lost its dropped flag.)

We assume that one fix would be to somehow ensure that the dropped flag
remains true on a dropped database’s pgstats entry until it’s freed, but
also, it seems a bit fragile for autovacuum’s do_start_worker() to sometimes
call rebuild_database_list() and delay all the adl_next_worker times.
Without thinking about it too hard, we wonder if there would still be a
pattern of ongoing DB creates and drops that could cause it to misbehave in
a similar way, never deciding to autovacuum any database even if one lives
long enough that it should be autovacuumed.

