Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE - Mailing list pgsql-hackers

From shveta malik
Subject Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE
Date
Msg-id CAJpy0uDZJWZyMVGEmraq8RUw6sUhPXNXoq2SY-xu0Xm_Zf5uww@mail.gmail.com
Whole thread
In response to Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Fri, May 1, 2026 at 7:00 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> On Wed, Apr 29, 2026 at 8:41 PM Chao Li <li.evan.chao@gmail.com> wrote:
> >
> > >
<v5-0001-PG16-Fix-pg_get_publication_tables-race-with-conc.txt><v5-0001-Fix-pg_get_publication_tables-race-with-concurren.patch><v5-0001-PG18-Fix-pg_get_publication_tables-race-with-conc.txt><v5-0001-PG17-Fix-pg_get_publication_tables-race-with-conc.txt>
> >
> > I am afraid this is only a partial fix.
>
> Thanks for reviewing it. Please find my responses below.
>
> > ```
> > @@ -1599,12 +1621,18 @@ pg_get_publication_tables(FunctionCallInfo fcinfo, ArrayType *pubnames,
> >                 /* Show all columns when the column list is not specified. */
> >                 if (nulls[2])
> >                 {
> > -                       Relation        rel = table_open(relid, AccessShareLock);
> > +                       Relation        rel = try_table_open(relid, AccessShareLock);
> >                         int                     nattnums = 0;
> >                         int16      *attnums;
> > -                       TupleDesc       desc = RelationGetDescr(rel);
> > +                       TupleDesc       desc;
> >                         int                     i;
> >
> > +                       /* Skip if the relation has been concurrently dropped. */
> > +                       if (rel == NULL)
> > +                               continue;
> > ```
> >
> > This change uses try_table_open() to detect whether a table has been dropped, but try_table_open() is only called
whenthe column list is not specified. If a table is included in the publication with an explicit column list, then even
ifit is dropped concurrently, it may still be returned by pg_get_publication_tables(). 
>
> Right. The try_table_open() is only needed there because that's the
> only code path that actually opens the relation (to enumerate its
> columns). The column list path reads from the pg_publication_rel
> catalog - it never calls table_open(), so it cannot hit the ERROR.
>
> > So this patch removes the “could not open relation with OID” error, but it does not fully ensure the accuracy of
thereturned table list. 
>
> IMO, no function returning table OIDs can guarantee they remain valid
> - a drop can happen right after we return the OID, and accuracy is in
> the caller's hands. All the callers of pg_get_publication_tables()
> already handle this by JOINing with pg_class.
>

I agree with Bharath. Also I would like to add one more point. We do have this:

+ /* Skip if the relation has been concurrently dropped. */
+ if (!OidIsValid(schemaid))
+ continue;

So if a table is dropped before we could access its explicitly
mentioned column-list, above should handle it right even without
'try_table_open' done? Moreover, pg_publication_rel will not return
any row for dropped table, so we should be good. For 'table_infos',
the case was different, as we saved the list in first call and are
using that in subsequent call? But that is not the case with
pg_publication_rel.

> > It also introduces inconsistent behavior between tables published with and without column lists.
>
> These two paths do different things - one needs the relation open, the
> other doesn't. For callers, the outcome is the same: any stale OID
> gets filtered out by their pg_class JOIN.
>
> > To resolve the race condition completely, I think we should try to open the table regardless of whether a column
listis specified. 
>
> IMO, that would add unnecessary locking overhead in a path that
> doesn't need it. The bug is the ERROR, and this patch fixes it.
>
> In short, when no column list is specified,
> pg_get_publication_tables() needs to open the relation to enumerate
> all publishable columns and return them as an int2vector in the attrs
> output column. It currently uses table_open() for this, which errors
> out with concurrent table drops. This patch fixes it by using
> try_table_open() and skipping the relation if it's been dropped.
>
> Thoughts?
>
> --
> Bharath Rupireddy
> Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: StringInfo fixes, v19 edition. Plus a few oddities
Next
From: shveta malik
Date:
Subject: Re: Include schema-qualified names in publication error messages.