On Sun, Feb 19, 2017 at 3:52 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> This particular case of corruption results in a heap tuple getting indexed
> by a wrong key (or to be precise, indexed by its old value). So the only way
> to detect the corruption is to look at each index key and check if it
> matches with the corresponding heap tuple. We could write some kind of self
> join that can use a sequential scan and an index-only scan (David Rowley had
> suggested something of that sort internally here), but we can't guarantee
> index-only scan on a table which is being concurrently updated. So not sure
> even that will catch every possible case.
Oh, so the problem isn't index entries that are altogether missing? I
guess I was confused.
You can certainly guarantee an index-only scan if you write the
validation code in C rather than using SQL. I think the issue is that
if the table is large enough that keeping a TID -> index value mapping
in a hash table is impractical, there's not going to be a real
efficient strategy for this. Ignoring the question of whether you use
the main executor for this or just roll your own code, your options
for a large table are (1) a multi-batch hash join, (2) a nested loop,
and (3) a merge join. (2) is easy to implement but will generate a
ton of random I/O if the table is not resident in RAM. (3) is most
suitable for very large tables but takes more work to code, and is
also likely to be a lot slower for small tables than a hash or
nestloop-based approach.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company