Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE
Date
Msg-id CALj2ACWiNgt-+D8O4sP6y+7A3JEyhe5QkOmNMrO-VYCx9hYm_Q@mail.gmail.com
Whole thread
In response to Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE  (Chao Li <li.evan.chao@gmail.com>)
List pgsql-hackers
Hi,

On Wed, Apr 29, 2026 at 8:41 PM Chao Li <li.evan.chao@gmail.com> wrote:
>
> >
<v5-0001-PG16-Fix-pg_get_publication_tables-race-with-conc.txt><v5-0001-Fix-pg_get_publication_tables-race-with-concurren.patch><v5-0001-PG18-Fix-pg_get_publication_tables-race-with-conc.txt><v5-0001-PG17-Fix-pg_get_publication_tables-race-with-conc.txt>
>
> I am afraid this is only a partial fix.

Thanks for reviewing it. Please find my responses below.

> ```
> @@ -1599,12 +1621,18 @@ pg_get_publication_tables(FunctionCallInfo fcinfo, ArrayType *pubnames,
>                 /* Show all columns when the column list is not specified. */
>                 if (nulls[2])
>                 {
> -                       Relation        rel = table_open(relid, AccessShareLock);
> +                       Relation        rel = try_table_open(relid, AccessShareLock);
>                         int                     nattnums = 0;
>                         int16      *attnums;
> -                       TupleDesc       desc = RelationGetDescr(rel);
> +                       TupleDesc       desc;
>                         int                     i;
>
> +                       /* Skip if the relation has been concurrently dropped. */
> +                       if (rel == NULL)
> +                               continue;
> ```
>
> This change uses try_table_open() to detect whether a table has been dropped, but try_table_open() is only called
whenthe column list is not specified. If a table is included in the publication with an explicit column list, then even
ifit is dropped concurrently, it may still be returned by pg_get_publication_tables(). 

Right. The try_table_open() is only needed there because that's the
only code path that actually opens the relation (to enumerate its
columns). The column list path reads from the pg_publication_rel
catalog - it never calls table_open(), so it cannot hit the ERROR.

> So this patch removes the “could not open relation with OID” error, but it does not fully ensure the accuracy of the
returnedtable list. 

IMO, no function returning table OIDs can guarantee they remain valid
- a drop can happen right after we return the OID, and accuracy is in
the caller's hands. All the callers of pg_get_publication_tables()
already handle this by JOINing with pg_class.

> It also introduces inconsistent behavior between tables published with and without column lists.

These two paths do different things - one needs the relation open, the
other doesn't. For callers, the outcome is the same: any stale OID
gets filtered out by their pg_class JOIN.

> To resolve the race condition completely, I think we should try to open the table regardless of whether a column list
isspecified. 

IMO, that would add unnecessary locking overhead in a path that
doesn't need it. The bug is the ERROR, and this patch fixes it.

In short, when no column list is specified,
pg_get_publication_tables() needs to open the relation to enumerate
all publishable columns and return them as an int2vector in the attrs
output column. It currently uses table_open() for this, which errors
out with concurrent table drops. This patch fixes it by using
try_table_open() and skipping the relation if it's been dropped.

Thoughts?

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Richard Guo
Date:
Subject: Re: First draft of PG 19 release notes
Next
From: Xuneng Zhou
Date:
Subject: Re: Implement waiting for wal lsn replay: reloaded