Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY
Date
Msg-id CA+TgmoZ9H_Ekz4creSG-P6fPjM8CyJiMFOnxdT_SqyonT5OD1Q@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY  (Pavan Deolasee <pavan.deolasee@gmail.com>)
List pgsql-hackers
On Fri, Feb 17, 2017 at 11:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> However, you might be able to find it without so much random I/O.
>> I'm envisioning a seqscan over the table, in which you simply look for
>> HOT chains in which the indexed columns aren't all the same.  When you
>> find one, you'd have to do a pretty expensive index lookup to confirm
>> whether things are OK or not, but hopefully that'd be rare enough to not
>> be a big performance sink.
>
> Ah, nah, scratch that.  If any post-index-build pruning had occurred on a
> page, the evidence would be gone --- the non-matching older tuples would
> be removed and what remained would look consistent.  But it wouldn't
> match the index.  You might be able to find problems if you were willing
> to do the expensive check on *every* HOT chain, but that seems none too
> practical.

If the goal is just to detect tuples that aren't indexed and should
be, what about scanning each index and building a TIDBitmap of all of
the index entries, and then scanning the heap for non-HOT tuples and
probing the TIDBitmap for each one?  If you find a heap entry for
which you didn't find an index entry, go and recheck the index and see
if one got added concurrently.  If not, whine.

One problem is that you'd need to make sure that the TIDBitmap didn't
lossify, but that could be prevented either by specifying a very large
maxbytes setting or by using something that serves the same function
instead of a TIDBitmap per se.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] SUBSCRIPTIONS and pg_upgrade
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] Does having a NULL column automatically exclude thetable from the tupleDesc cache?