Home > mailing lists

Bug about drop index concurrently - Mailing list pgsql-hackers

From	李杰(慎追)
Subject	Bug about drop index concurrently
Date	October 16, 2019 08:34:31
Msg-id	cd1b1f74-b3ba-4df7-9355-9df2057b7724.adger.lj@alibaba-inc.com Whole thread Raw
Responses	Re: Bug about drop index concurrently
List	pgsql-hackers

Tree view

Hi hackers,

In recently, I discovered a postgres bug, and I hope I can ask you for the best solution.

The problem is as follows:

postgres=# explain analyze select * from xxx where a=500;

ERROR: could not open relation with OID 25989

The structure of my table is as follows:

postgres=# \d xxx

Table "public.xxx"

Column | Type | Collation | Nullable | Default

--------+---------+-----------+----------+---------

a | integer | | |

b | text | | |

postgres=# select count(*) from xxx;

count

--------

800000

(1 row)

postgres=# select * from xxx limit 3;

a | b

---+----------------------------------

1 | 203c51477570aa517cfa317155dcc52c

2 | b58da31baa5c78fee4642fd398bd5909

3 | c7c475bf0a3ca2fc2afc4812a4d44c58

I opened the log file and saw that the index of table xxx was deleted,

postgres=# drop index CONCURRENTLY idx_xxx ;

DROP INDEX

In order to reproduce this bug, I created and deleted the index again and again on the master.

What is hard to understand is that this bug cannot be repeated 100%.

I wrote a script that loops over the master and runs the following two sentences.

postgres=# create index idx_xxx on xxx (a);

postgres=# drop index CONCURRENTLY idx_xxx ;

postgres=# create index idx_xxx on xxx (a);

postgres=# drop index CONCURRENTLY idx_xxx ;

...

At the same time, I started two clients in the standby,

respectively execute the following sql on the table xxx:

postgres=# explain analyze select * from xxx where a=500;

postgres=# \watch 0.1

After a few minutes, the bug will appear.

I finally confirmed my guess, I used an index scan in the standby query,

but deleted the index on the master at the same time.

Curious, I went to read the source code of Postgres. I found that

regular DROP INDEX commands imposes a AccessExclusiveLock on the table,

while drop index concurrently commands only used ShareUpdateExclusiveLock.

As we all know, only AccessExclusiveLock and AccessShareLock ,a select's lock ,

are mutually exclusive, and AccessShareLock can't block ShareUpdateExclusiveLock.

This is very weird and not desirable.

This is of course, developers must have thought of this, so we can see in the source

code, before the drop index concurrently, will wait for all transactions using this

index to wait for detection.

But this only exists on the master, my query is executed on the standby.

I use the pg_waldump tool to parse the wal file, and analyze the stantup process,

I found that there is no similar operation on the standby, so it will appear that

when I execute the query on the standby, the index will be deleted by others.

I think this is a bug that will affect the user's experience. we need to fix it.

I have imagined that the logic that detects the query transaction and

waits for it to end is implemented on the standby,but this may increase the

log application delay and the delay is exacerbated that cause the master and backup.

This is not desirable if the query concurrency is large.

All in all, I expect that you can provide a solution that can use drop index concurrently

without affecting the master-slave delay.

Sincerely look forward to your reply and thanks.

adger

pgsql-hackers by date:

From: Amit Kapila
Date: 16 October 2019, 06:48:41
Subject: Re: maintenance_work_mem used by Vacuum

From: Andres Freund
Date: 16 October 2019, 08:40:53
Subject: Re: Understanding TupleQueue impact and overheads?

Bug about drop index concurrently - Mailing list pgsql-hackers

Previous

Next