Home > mailing lists

Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY
Date	February 19, 2017 17:27:53
Msg-id	CA+TgmoYznjLO_QVxrWbLWTo2uow6mknTWWVGPPqyMqGqJ3QLsw@mail.gmail.com Whole thread Raw
In response to	Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses	Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY
List	pgsql-hackers

Tree view

On Sun, Feb 19, 2017 at 3:52 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> This particular case of corruption results in a heap tuple getting indexed
> by a wrong key (or to be precise, indexed by its old value). So the only way
> to detect the corruption is to look at each index key and check if it
> matches with the corresponding heap tuple. We could write some kind of self
> join that can use a sequential scan and an index-only scan (David Rowley had
> suggested something of that sort internally here), but we can't guarantee
> index-only scan on a table which is being concurrently updated. So not sure
> even that will catch every possible case.

Oh, so the problem isn't index entries that are altogether missing?  I
guess I was confused.

You can certainly guarantee an index-only scan if you write the
validation code in C rather than using SQL.  I think the issue is that
if the table is large enough that keeping a TID -> index value mapping
in a hash table is impractical, there's not going to be a real
efficient strategy for this.  Ignoring the question of whether you use
the main executor for this or just roll your own code, your options
for a large table are (1) a multi-batch hash join, (2) a nested loop,
and (3) a merge join.  (2) is easy to implement but will generate a
ton of random I/O if the table is not resident in RAM.  (3) is most
suitable for very large tables but takes more work to code, and is
also likely to be a lot slower for small tables than a hash or
nestloop-based approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Robert Haas
Date: 19 February 2017, 17:21:32
Subject: Re: [HACKERS] Adding new output parameter of pg_stat_statements toidentify operation of the query.

From: Michael Paquier
Date: 19 February 2017, 17:44:03
Subject: Re: [HACKERS] SCRAM authentication, take three

Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY - Mailing list pgsql-hackers

Previous

Next