Bug about drop index concurrently - Mailing list pgsql-hackers

From 李杰(慎追)
Subject Bug about drop index concurrently
Date
Msg-id cd1b1f74-b3ba-4df7-9355-9df2057b7724.adger.lj@alibaba-inc.com
Whole thread Raw
Responses Re: Bug about drop index concurrently
List pgsql-hackers

Hi hackers,

In recently, I discovered a postgres bug, and I hope I can ask you for the best solution.
The problem is as follows:

postgres=# explain analyze select * from xxx where a=500;

ERROR:  could not open relation with OID 25989

The structure of my table is as follows:

postgres=# \d xxx

                Table "public.xxx"

 Column |  Type   | Collation | Nullable | Default 

--------+---------+-----------+----------+---------

 a      | integer |           |          | 

 b      | text    |           |          | 


postgres=# select count(*) from xxx;

 count  

--------

 800000

(1 row)


postgres=# select * from xxx limit 3;


 a |                b                 

---+----------------------------------

 1 | 203c51477570aa517cfa317155dcc52c

 2 | b58da31baa5c78fee4642fd398bd5909

 3 | c7c475bf0a3ca2fc2afc4812a4d44c58


I opened the log file and saw that the index of table xxx was deleted,

postgres=# drop index CONCURRENTLY idx_xxx ;

DROP INDEX


In order to reproduce this bug, I created and deleted the index again and again on the master.
What is hard to understand is that this bug cannot be repeated 100%.
I wrote a script that loops over the master and runs the following two sentences.

postgres=# create index idx_xxx on xxx (a);

postgres=# drop index CONCURRENTLY idx_xxx ;

postgres=# create index idx_xxx on xxx (a);

postgres=# drop index CONCURRENTLY idx_xxx ;

...

...

...

At the same time, I started two clients in the standby, 

respectively execute the following sql on the table xxx:


postgres=# explain analyze select * from xxx where a=500;

postgres=# \watch 0.1


After a few minutes, the bug will appear.


I finally confirmed my guess, I used an index scan in the standby query,
but deleted the index on the master at the same time.
Curious, I went to read the source code of Postgres. I found that
 regular DROP INDEX commands imposes a AccessExclusiveLock on the table,
 while drop index concurrently commands only used ShareUpdateExclusiveLock.

As we all know, only AccessExclusiveLock and  AccessShareLock ,a select's  lock ,
are mutually exclusive, and AccessShareLock can't block ShareUpdateExclusiveLock.
This is very weird and not desirable.

This is of course, developers must have thought of this, so we can see in the source 
code, before the drop index concurrently, will wait for all transactions using this
 index to wait for detection.

 But this only exists on the master, my query is executed on the standby.
 I use the pg_waldump tool to parse the wal file, and analyze the stantup process,
 I found that there is no similar operation on the standby, so it will appear that 
 when I execute the query on the standby, the index will be deleted by others.

I think this is a bug that will affect the user's experience. we need to fix it.

 I have imagined that the logic that detects the  query transaction and

 waits for it to end is implemented on the standby,but this may increase the

log application delay and the delay is exacerbated that cause the master and backup. 

This is not desirable if the query concurrency is large.


All in all, I expect that you can provide a solution that can use drop index concurrently 

without affecting the master-slave delay.


Sincerely look forward to your reply and thanks.


adger




pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: maintenance_work_mem used by Vacuum
Next
From: Andres Freund
Date:
Subject: Re: Understanding TupleQueue impact and overheads?