Re: REINDEX CONCURRENTLY 2.0

From: Andres Freund
Subject: Re: REINDEX CONCURRENTLY 2.0
Date:
Msg-id: 20131118121958.GB20305@awork2.anarazel.de
In response to: Re: REINDEX CONCURRENTLY 2.0 (Michael Paquier <michael.paquier@gmail.com>)
List: pgsql-hackers
On 2013-11-18 19:52:29 +0900, Michael Paquier wrote:
> On Sat, Nov 16, 2013 at 5:09 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2013-11-15 11:40:17 +0900, Michael Paquier wrote:
> >> - 20131114_3_reindex_concurrently.patch, providing the core feature.
> >> Patch 3 needs to have patch 2 applied first. Regression tests,
> >> isolation tests and documentation are included with the patch.
> >
> > Have you addressed my concurrency concerns from the last version?
> I have added documentation in the patch with a better explanation
> about why those choices of implementation are made.

The dropping still isn't safe:
After phase 4 we are in the state:
old index: valid, live, !isdead
new index: !valid, live, !isdead
Then, in phase 5, you do index_concurrent_set_dead() from that state.
There you do WaitForLockers(locktag, AccessExclusiveLock) before
index_set_state_flags(INDEX_DROP_SET_DEAD).
That's not sufficient.
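
In pseudo-C, phase 5 thus essentially does (simplified sketch of my
reading of the patch, not its actual code):

    /* phase 5, simplified */
    WaitForLockers(locktag, AccessExclusiveLock);  /* waits for vxid 1 */

    /* <-- window: a new backend can still open the old, valid index */

    index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);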

Consider what happens with the following sequence:
1) WaitForLockers(locktag, AccessExclusiveLock)
   -> GetLockConflicts() => virtualxact 1
   -> VirtualXactLock(1)
2) virtualxact 2 starts, opens the *old* index since it's currently the
   only valid one.
3) virtualxact 1 finishes
4) index_concurrent_set_dead() does index_set_state_flags(DROP_SET_DEAD)
5) another transaction (vxid 3) starts inserting data into the relation,
   updates only the new index, the old index is dead
6) vxid 2 inserts data, updates only the old index. Since it had the
   index already open the cache invalidations won't be processed.

Now the indexes are out of sync. There's entries only in the old index
and there's entries only in the new index. Not good.

I hate to repeat myself, but you really need to follow the current
protocol for concurrently dropping indexes. That involves *first*
marking the index as invalid so it won't be used for querying anymore,
then waiting for everyone possibly still seeing that entry to finish,
and only *after* that marking the index as dead. You cannot argue away
correctness concerns with potential deadlocks.

c.f. http://www.postgresql.org/message-id/20130926103400.GA2471420@alap2.anarazel.de
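
For reference, the existing DROP INDEX CONCURRENTLY path does roughly
the following (simplified sketch; transaction boundaries, locking and
error handling elided):

    /* 1. mark the index invalid, so queries stop using it */
    index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
    CommitTransactionCommand();
    StartTransactionCommand();

    /* 2. wait for everyone that might still see it as valid */
    WaitForLockers(heaplocktag, AccessExclusiveLock);

    /* 3. only now mark it dead, stopping index maintenance */
    index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
    CommitTransactionCommand();
    StartTransactionCommand();

    /* 4. wait again before physically dropping it */
    WaitForLockers(heaplocktag, AccessExclusiveLock);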


I am also still unconvinced that the logic in index_concurrent_swap() is
correct. It very much needs to explain why no backend can see values
that are inconsistent. E.g. what prevents a backend thinking the old and
new indexes have the same relfilenode? MVCC snapshots don't seem to
protect you against that.
I am not sure there's a problem, but there certainly need to be more
comments explaining why there are none.

Something like the following might be possible:

Backend 1: start reindex concurrently, till phase 4
Backend 2: ExecOpenIndices()
           -> RelationGetIndexList (that list is consistent due to mvcc snapshots!)
Backend 2: -> index_open(old_index) (old relfilenode)
Backend 1: index_concurrent_swap()
           -> CommitTransaction()
           -> ProcArrayEndTransaction() (changes visible to others henceforth!)
Backend 2: -> index_open(new_index)
           => no cache invalidations yet, gets the old relfilenode
Backend 2: ExecInsertIndexTuples()
           => updates the same relation twice, corrupt
Backend 1: still in CommitTransaction()
           -> AtEOXact_Inval() sends out invalidations
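
The underlying window is in CommitTransaction() itself; heavily
simplified from xact.c, the relevant ordering is something like:

    /* CommitTransaction(), heavily simplified */
    latestXid = RecordTransactionCommit();
    ProcArrayEndTransaction(MyProc, latestXid); /* commit visible to others */
    /* ... a backend doing catalog lookups here already sees the swapped
     * entries, but hasn't received the relcache invalidations yet ... */
    AtEOXact_Inval(true);                       /* invalidations sent here */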

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


