pg_replication_origin_drop API potential race condition - Mailing list pgsql-hackers

From Peter Smith
Subject pg_replication_origin_drop API potential race condition
Date
Msg-id CAHut+PuW8DWV5fskkMWWMqzt-x7RPcNQOtJQBp6SdwyRghCk7A@mail.gmail.com
Whole thread Raw
Responses Re: pg_replication_origin_drop API potential race condition  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi Hackers.

As discovered elsewhere [ak0125] there is a potential race condition
in the pg_replication_origin_drop API

The current code of pg_replication_origin_drop looks like:
====
roident = replorigin_by_name(name, false);
Assert(OidIsValid(roident));

replorigin_drop(roident, true);
====

Users cannot deliberately drop a non-existent origin
(replorigin_by_name passes missing_ok = false) but there is still a
small window where concurrent processes may be able to call
replorigin_drop for the same valid roident.

Locking within replorigin_drop guards against concurrent drops so the
1st execution will succeed, but then the 2nd execution would give
internal cache error: elog(ERROR, "cache lookup failed for replication
origin with oid %u", roident);

Some ideas to fix this include:
1. Do nothing except write a comment about this in the code. The
internal ERROR is not ideal for a user API there is no great harm
done.
2. Change the behavior of replorigin_drop to be like
replorigin_drop_IF_EXISTS, so the 2nd execution of this race would
silently do nothing when it finds the roident is already gone.
3. Same as 2, but make the NOP behavior more explicit by introducing a
new "missing_ok" parameter for replorigin_drop.

Thoughts?

----
[ak0125] https://www.postgresql.org/message-id/CAA4eK1%2ByeLwBCkTvTdPM-hSk1fr6jT8KJc362CN8zrGztq_JqQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: WIP: BRIN multi-range indexes
Next
From: Tatsuro Yamada
Date:
Subject: Re: simplifying foreign key/RI checks