Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY
Date
Msg-id CABOikdPVi=DxSmaVJz8j+gBMpQ9A+6ka_DgbRYnNnAwaFt01xw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On Sun, Feb 19, 2017 at 3:43 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Feb 17, 2017 at 11:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Ah, nah, scratch that.  If any post-index-build pruning had occurred on a
> page, the evidence would be gone --- the non-matching older tuples would
> be removed and what remained would look consistent.  But it wouldn't
> match the index.  You might be able to find problems if you were willing
> to do the expensive check on *every* HOT chain, but that seems none too
> practical.

If the goal is just to detect tuples that aren't indexed and should
be, what about scanning each index and building a TIDBitmap of all of
the index entries, and then scanning the heap for non-HOT tuples and
probing the TIDBitmap for each one?  If you find a heap entry for
which you didn't find an index entry, go and recheck the index and see
if one got added concurrently.  If not, whine.


This particular case of corruption results in a heap tuple getting indexed by a wrong key (or to be precise, indexed by its old value). So the only way to detect the corruption is to look at each index key and check if it matches with the corresponding heap tuple. We could write some kind of self join that can use a sequential scan and an index-only scan (David Rowley had suggested something of that sort internally here), but we can't guarantee index-only scan on a table which is being concurrently updated. So not sure even that will catch every possible case.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] Parallel Index-only scan