Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY
Date
Msg-id 20170203184248.nwucnik6mmgdbgo4@alvherre.pgsql
Whole thread Raw
In response to Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY
Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY
List pgsql-hackers
Pavan Deolasee wrote:

> Looking at the history and some past discussions, it seems Tomas reported
> somewhat similar problem and Andres proposed a patch here
> https://www.postgresql.org/message-id/20140514155204.GE23943@awork2.anarazel.de
> which got committed via b23b0f5588d2e2. Not exactly the same issue, but
> related to relcache flush happening in index_open().
> 
> I wonder if we should just add a recheck logic
> in RelationGetIndexAttrBitmap() such that if rd_indexvalid is seen as 0 at
> the end, we go back and start again from RelationGetIndexList(). Or is
> there any code path that relies on finding the old information?

Good find on that old report.  It occured to me to re-run Tomas' test
case with this second patch you proposed (the "ddl" test takes 11
minutes in my laptop), and lo and behold -- your "recheck" code enters
an infinite loop.  Not good.  I tried some more tricks that came to
mind, but I didn't get any good results.  Pavan and I discussed it
further and he came up with the idea of returning the bitmapset we
compute, but if an invalidation occurs in index_open, simply do not
cache the bitmapsets so that next time this is called they are all
computed again.  Patch attached.

This appears to have appeased both test_decoding's ddl test under
CACHE_CLOBBER_ALWAYS, as well as the CREATE INDEX CONCURRENTLY bug.

I intend to commit this soon to all branches, to ensure it gets into the
next set of minors.

Here is a detailed reproducer for the CREATE INDEX CONCURRENTLY bug.
Line numbers are as of today's master; comments are supposed to help
locating good spots in other branches.  Given the use of gdb
breakpoints, there's no good way to integrate this with regression
tests, which is a pity because this stuff looks a little brittle to me,
and if it breaks it is hard to detect.

Prep:
DROP TABLE IF EXISTS testtab;
CREATE TABLE testtab (a int unique, b int, c int);
INSERT INTO testtab VALUES (1, 100, 10);
CREATE INDEX testindx ON testtab (b);

S1:
SELECT * FROM testtab;
UPDATE testtab SET b = b + 1 WHERE a = 1;
SELECT pg_backend_pid();
  attach GDB to this session.
  break relcache.c:4800
      # right before entering the per-index loop
  cont

S2:
SELECT pg_backend_pid();
DROP INDEX testindx;
  attach GDB to this session
  handle SIGUSR1 nostop
  break indexcmds.c:666
     # right after index_create
  break indexcmds.c:790
     # right before the second CommitTransactionCommand
  cont
CREATE INDEX CONCURRENTLY testindx ON testtab (b);
  -- this blocks in breakpoint 1.  Leave it there.

S1:
UPDATE testtab SET b = b + 1 WHERE a = 1;
  -- this blocks in breakpoint 1. Leave it there.

S2:
  gdb -> cont
  -- This blocks waiting for S1

S1:
  gdb -> cont
  -- S1 finishes the update.  S2 awakens and blocks again in breakpoint 2.

S1:
  -- Now run more UPDATEs:
UPDATE testtab SET b = b + 1 WHERE a = 1;
  This produces a broken HOT chain.
  This can be done several times; all these updates should be non-HOT
  because of the index created by CIC, but they are marked HOT.

S2: "cont"
  From this point onwards, update are correct.

SELECT * FROM testtab;
SET enable_seqscan TO 0;
SET enable_bitmapscan TO 0;
SELECT * FROM testtab WHERE b between 100 and 110;

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
Next
From: Jeff Janes
Date:
Subject: Re: [HACKERS] new autovacuum criterion for visible pages