Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE - Mailing list pgsql-hackers

From Chao Li
Subject Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE
Msg-id AFB59976-10F2-4D42-8C35-A8D4ED40E342@gmail.com
In response to Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE
List pgsql-hackers

> On Apr 29, 2026, at 15:15, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> On Tue, Apr 28, 2026 at 8:28 PM shveta malik <shveta.malik@gmail.com> wrote:
>>
>> On Wed, Apr 29, 2026 at 7:12 AM Ajin Cherian <itsajin@gmail.com> wrote:
>>
>>> One small comment. The includes need to be in alphabetical order.
>>> injection_point.h should come after fmgroids.h
>>>
>>> +#include "utils/injection_point.h"
>>> #include "utils/syscache.h"
>>
>> Also there is a trailing whitespace issue while applying the patch.
>> Other than these, the patch looks good.
>
> Fixed. Please find the attached v5 patch.
>
> The fix is needed only for PG16 and later, not PG15 or PG14. The bug
> was introduced by b7ae03953690 [1] in PG16, which added a table_open()
> call in pg_get_publication_tables(). PG15 and earlier only use
> get_rel_namespace() and syscache lookups, both of which gracefully
> handle dropped relations (returning InvalidOid/false rather than
> erroring).
>
> I verified the bug and the fix on all affected branches. Please find
> the attached version-specific patches for backpatching. Thank you!
>
> [1] b7ae03953690 - Ignore dropped and generated columns from the column list
>
> --
> Bharath Rupireddy
> Amazon Web Services: https://aws.amazon.com
>
<v5-0001-PG16-Fix-pg_get_publication_tables-race-with-conc.txt><v5-0001-Fix-pg_get_publication_tables-race-with-concurren.patch><v5-0001-PG18-Fix-pg_get_publication_tables-race-with-conc.txt><v5-0001-PG17-Fix-pg_get_publication_tables-race-with-conc.txt>

I am afraid this is only a partial fix.

```
@@ -1599,12 +1621,18 @@ pg_get_publication_tables(FunctionCallInfo fcinfo, ArrayType *pubnames,
         /* Show all columns when the column list is not specified. */
         if (nulls[2])
         {
-            Relation    rel = table_open(relid, AccessShareLock);
+            Relation    rel = try_table_open(relid, AccessShareLock);
             int            nattnums = 0;
             int16       *attnums;
-            TupleDesc    desc = RelationGetDescr(rel);
+            TupleDesc    desc;
             int            i;

+            /* Skip if the relation has been concurrently dropped. */
+            if (rel == NULL)
+                continue;
```

This change uses try_table_open() to detect whether a table has been dropped, but try_table_open() is only called when
the column list is not specified. If a table is included in the publication with an explicit column list, then even if
it is dropped concurrently, it may still be returned by pg_get_publication_tables().

So this patch removes the “could not open relation with OID” error, but it does not fully ensure the accuracy of the
returned table list. It also introduces inconsistent behavior between tables published with and without column lists.

To resolve the race condition completely, I think we should try to open the table regardless of whether a column list
is specified.
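
To illustrate the difference, here is a small standalone toy model, not PostgreSQL code. ToyRel, toy_try_open() and the two collect_* functions are hypothetical names made up for this sketch; toy_try_open() only imitates the "return NULL instead of erroring" behavior of try_table_open(), and the "dropped" flag stands in for a concurrent DROP TABLE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a published table; not a PostgreSQL structure. */
typedef struct ToyRel
{
    unsigned    oid;
    bool        dropped;            /* simulates a concurrent DROP TABLE */
    bool        has_column_list;    /* published with an explicit column list */
} ToyRel;

/* Imitates try_table_open(): NULL for a dropped relation, no error. */
const ToyRel *
toy_try_open(const ToyRel *rel)
{
    return rel->dropped ? NULL : rel;
}

/*
 * Models the behavior described above: the existence check runs only
 * when no column list is specified, so dropped tables that have a
 * column list are still returned.
 */
size_t
collect_open_only_without_column_list(const ToyRel *rels, size_t n,
                                      unsigned *out)
{
    size_t      count = 0;

    assert(rels != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (!rels[i].has_column_list && toy_try_open(&rels[i]) == NULL)
            continue;           /* skipped only on this code path */
        out[count++] = rels[i].oid;
    }
    return count;
}

/*
 * Models the suggested behavior: try to open every table, regardless
 * of whether a column list is specified.
 */
size_t
collect_open_always(const ToyRel *rels, size_t n, unsigned *out)
{
    size_t      count = 0;

    assert(rels != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (toy_try_open(&rels[i]) == NULL)
            continue;           /* dropped tables are always skipped */
        out[count++] = rels[i].oid;
    }
    return count;
}
```

With one dropped table in each category, the first function still reports the dropped table that has a column list, while the second reports only the tables that still exist. How the real function should structure the open and close calls is a separate question that this model does not address.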

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/






