Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY |
Date | |
Msg-id | 20170203184248.nwucnik6mmgdbgo4@alvherre.pgsql Whole thread Raw |
In response to | Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY (Pavan Deolasee <pavan.deolasee@gmail.com>) |
Responses |
Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY
Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY |
List | pgsql-hackers |
Pavan Deolasee wrote: > Looking at the history and some past discussions, it seems Tomas reported > somewhat similar problem and Andres proposed a patch here > https://www.postgresql.org/message-id/20140514155204.GE23943@awork2.anarazel.de > which got committed via b23b0f5588d2e2. Not exactly the same issue, but > related to relcache flush happening in index_open(). > > I wonder if we should just add a recheck logic > in RelationGetIndexAttrBitmap() such that if rd_indexvalid is seen as 0 at > the end, we go back and start again from RelationGetIndexList(). Or is > there any code path that relies on finding the old information? Good find on that old report. It occured to me to re-run Tomas' test case with this second patch you proposed (the "ddl" test takes 11 minutes in my laptop), and lo and behold -- your "recheck" code enters an infinite loop. Not good. I tried some more tricks that came to mind, but I didn't get any good results. Pavan and I discussed it further and he came up with the idea of returning the bitmapset we compute, but if an invalidation occurs in index_open, simply do not cache the bitmapsets so that next time this is called they are all computed again. Patch attached. This appears to have appeased both test_decoding's ddl test under CACHE_CLOBBER_ALWAYS, as well as the CREATE INDEX CONCURRENTLY bug. I intend to commit this soon to all branches, to ensure it gets into the next set of minors. Here is a detailed reproducer for the CREATE INDEX CONCURRENTLY bug. Line numbers are as of today's master; comments are supposed to help locating good spots in other branches. Given the use of gdb breakpoints, there's no good way to integrate this with regression tests, which is a pity because this stuff looks a little brittle to me, and if it breaks it is hard to detect. Prep: DROP TABLE IF EXISTS testtab; CREATE TABLE testtab (a int unique, b int, c int); INSERT INTO testtab VALUES (1, 100, 10); CREATE INDEX testindx ON testtab (b); S1: SELECT * FROM testtab; UPDATE testtab SET b = b + 1 WHERE a = 1; SELECT pg_backend_pid(); attach GDB to this session. break relcache.c:4800 # right before entering the per-index loop cont S2: SELECT pg_backend_pid(); DROP INDEX testindx; attach GDB to this session handle SIGUSR1 nostop break indexcmds.c:666 # right after index_create break indexcmds.c:790 # right before the second CommitTransactionCommand cont CREATE INDEX CONCURRENTLY testindx ON testtab (b); -- this blocks in breakpoint 1. Leave it there. S1: UPDATE testtab SET b = b + 1 WHERE a = 1; -- this blocks in breakpoint 1. Leave it there. S2: gdb -> cont -- This blocks waiting for S1 S1: gdb -> cont -- S1 finishes the update. S2 awakens and blocks again in breakpoint 2. S1: -- Now run more UPDATEs: UPDATE testtab SET b = b + 1 WHERE a = 1; This produces a broken HOT chain. This can be done several times; all these updates should be non-HOT because of the index created by CIC, but they are marked HOT. S2: "cont" From this point onwards, update are correct. SELECT * FROM testtab; SET enable_seqscan TO 0; SET enable_bitmapscan TO 0; SELECT * FROM testtab WHERE b between 100 and 110; -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: