pgsql: Fix corruption of pgstats shared hashtable due to OOM failures - Mailing list pgsql-committers

From Michael Paquier
Subject pgsql: Fix corruption of pgstats shared hashtable due to OOM failures
Date
Msg-id E1uvVkj-0014wL-1s@gemulon.postgresql.org
List pgsql-committers
Fix corruption of pgstats shared hashtable due to OOM failures

A new pgstats entry is created as a two-step process:
- The entry is looked up in the shared hashtable of pgstats, and is
inserted if not found.
- When not found and inserted, its fields are then initialized.  This
part includes a DSA chunk allocation for the stats data of the new entry.

As currently coded, if the DSA chunk allocation fails due to an
out-of-memory failure, an ERROR is generated.  Because the first step
has already inserted the entry, this leaves an inconsistent entry in
the pgstats shared hashtable.  These broken entries can then be found
by other backends, crashing them.

There are only two callers of pgstat_init_entry(), when loading the
pgstats file at startup and when creating a new pgstats entry.  This
commit changes pgstat_init_entry() so that it uses
dsa_allocate_extended() with DSA_ALLOC_NO_OOM, making it return NULL on
allocation failure instead of failing.  This way, a backend failing an
entry creation can
take appropriate cleanup actions in the shared hashtable before throwing
an error.  Currently, this means removing the entry from the shared
hashtable before throwing the error for the allocation failure.
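
The pattern described above can be sketched with a small standalone
model.  This is not the PostgreSQL code: the table, the ToyEntry struct
and every toy_* name are hypothetical stand-ins, and calloc() plays the
role of an allocator that reports failure by returning NULL rather than
by raising an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_TABLE_SIZE 8

/* Hypothetical stand-in for a shared hashtable entry. */
typedef struct ToyEntry
{
    bool  used;   /* slot is visible to lookups */
    int   key;
    void *body;   /* stats data, allocated in the second step */
} ToyEntry;

static ToyEntry toy_table[TOY_TABLE_SIZE];
static bool     toy_simulate_oom = false;   /* test knob, assumption */

/* Allocator in "no OOM error" mode: returns NULL on failure. */
static void *
toy_alloc_no_oom(size_t size)
{
    if (toy_simulate_oom)
        return NULL;
    return calloc(1, size);
}

/* Look up an entry by key; NULL if absent. */
static ToyEntry *
toy_find(int key)
{
    for (int i = 0; i < TOY_TABLE_SIZE; i++)
        if (toy_table[i].used && toy_table[i].key == key)
            return &toy_table[i];
    return NULL;
}

/* Step 1: make the entry visible in the table, body not yet set. */
static ToyEntry *
toy_insert(int key)
{
    for (int i = 0; i < TOY_TABLE_SIZE; i++)
    {
        if (!toy_table[i].used)
        {
            toy_table[i].used = true;
            toy_table[i].key = key;
            toy_table[i].body = NULL;
            return &toy_table[i];
        }
    }
    return NULL;    /* table full */
}

/* Undo step 1 so no other reader can find a half-built entry. */
static void
toy_remove(ToyEntry *entry)
{
    free(entry->body);
    entry->body = NULL;
    entry->used = false;
}

/*
 * Find or create an entry.  On allocation failure the freshly inserted
 * entry is removed before NULL is returned, so the caller can report
 * the error without leaving the table inconsistent.
 */
static ToyEntry *
toy_get_or_create(int key)
{
    ToyEntry *entry = toy_find(key);

    if (entry != NULL)
        return entry;

    entry = toy_insert(key);
    if (entry == NULL)
        return NULL;

    /* Step 2: initialize the entry, which needs an allocation. */
    entry->body = toy_alloc_no_oom(64);
    if (entry->body == NULL)
    {
        toy_remove(entry);   /* cleanup first, error afterwards */
        return NULL;
    }
    return entry;
}
```

In this model, setting toy_simulate_oom before calling
toy_get_or_create() makes the call return NULL, and a later toy_find()
for the same key finds nothing.  Without the toy_remove() call, the
lookup would return an entry whose body is NULL, which is the kind of
inconsistent state the commit message describes.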

Out-of-memory errors are unlikely to happen in the wild, and we usually
do not bother back-patching fixes for them.  However, the problem
dealt with here is a degree worse as it breaks the shared memory state
of pgstats, impacting other processes that may look at an inconsistent
entry that a different process has failed to create.

Author: Mikhail Kot <mikhail.kot@databricks.com>
Discussion: https://postgr.es/m/CAAi9E7jELo5_-sBENftnc2E8XhW2PKZJWfTC3i2y-GMQd2bcqQ@mail.gmail.com
Backpatch-through: 15

Branch
------
REL_15_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/1852ec5db5d8e1c16a39a307fde853d6a7cfa8aa

Modified Files
--------------
src/backend/utils/activity/pgstat.c       | 10 ++++++++++
src/backend/utils/activity/pgstat_shmem.c | 28 +++++++++++++++++++++++++++-
2 files changed, 37 insertions(+), 1 deletion(-)
