Re: Including Snapshot Info with Indexes - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Including Snapshot Info with Indexes
Date
Msg-id 4709ECFA.4020905@enterprisedb.com
Whole thread Raw
In response to Including Snapshot Info with Indexes  ("Gokulakannan Somasundaram" <gokul007@gmail.com>)
Responses Re: Including Snapshot Info with Indexes
Re: Including Snapshot Info with Indexes
List pgsql-hackers
Gokulakannan Somasundaram wrote:
>     Currently The index implementation in Postgresql does not store the
> Snapshot information in the Index. If we add the snapshot information into
> the indexing structure, we will have the following advantages.

This idea has been discussed to death many times before. Please search
the archives.

> a) There can be index only scans like Oracle

IMO, the most promising approach to achieving index-only-scans at the
moment is the Dead Space Map, as discussed in the 8.3 dev cycle.

> b) Unique indexes will become less costly, as older index tuples can be
> found out.

Doesn't seem like a big benefit, considering that in most cases there
won't be any tuples in the index with a duplicate key. A common
exception to that is (non-HOT) updating a row. But in that case, the
page containing the old tuple is already in cache, so the lookup of the
visibility from the heap is cheap.

> c) Even the index scans will get faster, since some of the index tuples
> won't translate into HeapScans.

That's the same as doing an index-only-scan, right?

> d) Deletes and Updates will become slightly costly, as they have to update
> these indexes.

I think you're grossly underestimating the cost of that. For example, on
a table with 3 indexes. a delete currently requires one index lookup +
one heap lookup. With visibility in the indexes, that would require 3
index lookups + one heap lookup. That's 4 vs. 2 page accesses, not
taking into account the non-leaf b-tree pages. The real impact will
depend on what's in cache, but the cost can be very high.

Also, the full visibility information would need 12 bytes of space per
tuple. An index tuple on an int4 key currently takes 12 bytes, so that
would double the index size. Storage size has a big impact on
performance. More bytes means more I/O, less data fits in cache, and
more WAL traffic.

There's non-trivial implementation issues involved as well. You'd need a
way to reliably find all the index pointers for a given heap tuple
(search the archives for "retail vacuum" for the issues involved in
that. Broken user defined functions are a problem for example). And
you'd need to keep them all locked at the same time to modify them all
atomically, which is prone to deadlocks.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: "Gokulakannan Somasundaram"
Date:
Subject: Including Snapshot Info with Indexes
Next
From: Magnus Hagander
Date:
Subject: Re: 8.4 TODO item: make src/port support libpq and ecpg directly