BUG #18653: Is it necessary for ATExecDropInherit to acquire an AccessExclusiveLock on the parent table? - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #18653: Is it necessary for ATExecDropInherit to acquire an AccessExclusiveLock on the parent table?
Date
Msg-id 18653-f2a96d3b52fd95a0@postgresql.org
Whole thread Raw
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18653
Logged by:          Fei Changhong
Email address:      feichanghong@qq.com
PostgreSQL version: 17.0
Operating system:   Operating system: centos 8,Kernel version: 5.10.13
Description:

Hi all,

I encountered a puzzling issue when executing "ALTER TABLE NO INHERIT"
concurrently with "SELECT": In an SQL statement, there are two identical
subqueries, but they return different results. This can be reproduced with
the following spec test:

```
setup
{
 CREATE TABLE p (a integer);
 INSERT INTO p VALUES(1);
 CREATE TABLE c1 () INHERITS (p);
 INSERT INTO c1 VALUES(10);
 CREATE TABLE c2 (a integer);
 INSERT INTO c2 VALUES(100);
}

teardown
{
 DROP TABLE IF EXISTS c1, c2, p;
}

session s1
step s1b    { BEGIN; }
step s1delc1    { ALTER TABLE c1 NO INHERIT p; }
step s1c    { COMMIT; }

session s2
step s2sel    { explain analyze select a from p group by a union all select a
from p group by a; }

permutation s1b s1delc1 s2sel s1c
```

The result of the SELECT is as follows: one of the UNION ALL nodes returns 2
rows, while the other returns 1 row:

```
QUERY PLAN
                                       
--------------------------------------------------------------------------------------------------------------------
Append  (cost=60.09..70.28 rows=400 width=4) (actual time=0.022..0.030
rows=3 loops=1)                              
  ->  HashAggregate  (cost=60.09..62.09 rows=200 width=4) (actual
time=0.022..0.023 rows=2 loops=1)                 
        Group Key: p.a
                                       
        Batches: 1  Memory Usage: 40kB
                                       
        ->  Append  (cost=0.00..53.07 rows=2805 width=4) (actual
time=0.007..0.014 rows=2 loops=1)                  
              ->  Seq Scan on p p_1  (cost=0.00..3.55 rows=255 width=4)
(actual time=0.007..0.007 rows=1 loops=1)   
              ->  Seq Scan on c1 p_2  (cost=0.00..35.50 rows=2550 width=4)
(actual time=0.005..0.005 rows=1 loops=1)
  ->  HashAggregate  (cost=4.19..6.19 rows=200 width=4) (actual
time=0.005..0.005 rows=1 loops=1)                   
        Group Key: p_3.a
                                       
        Batches: 1  Memory Usage: 40kB
                                       
        ->  Seq Scan on p p_3  (cost=0.00..3.55 rows=255 width=4) (actual
time=0.002..0.002 rows=1 loops=1)         
Planning Time: 10.051 ms
                                       
Execution Time: 0.161 ms    
```

Based on my analysis, the issue occurs because "ALTER TABLE NO INHERIT" only
holds an AccessExclusiveLock on the parent table.

When generating the plan for the first subquery, find_all_inheritors uses
the catalog snapshot from before the DDL is committed, so it sees table c1,
but gets blocked when trying to lock c1. For the second subquery,
find_all_inheritors uses the catalog snapshot from after the DDL is
committed, so it doesn't see c1.

Is this behavior expected? Alternatively, should we acquire an
AccessExclusiveLock on the parent table in ATExecDropInherit? This might
lead to a deadlock.


pgsql-bugs by date:

Previous
From: Andrei Lepikhov
Date:
Subject: Re: Question of Parallel Hash Join on TPC-H Benchmark
Next
From: Ba Jinsheng
Date:
Subject: Re: Question of Parallel Hash Join on TPC-H Benchmark