Re: Index-only quals - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Index-only quals
Date
Msg-id 4A904EB1.70100@enterprisedb.com
Whole thread Raw
In response to Re: Index-only quals  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Barring objections, I'm going to apply the indexam API changes part,
>> since that simplifies the code in question regardless of the rest of the
>> work. I'm pretty happy now with the indexfilter patch as well, but want
>> to do some more testing on that before committing. Some more eyeballs
>> would be appreciated as well.
> 
> The first patch definitely needs another editorial pass, eg you have
> complicated the behavior of heap_hot_search_buffer's all_dead parameter
> without adjusting its documentation.

Ok, I'll take a look.

>  The patch as-submitted is really
> quite hard to review, though.  It's hard to convince myself that all the
> logic you removed from index_getnext is all accounted for elsewhere.
> Could you run through the reasoning on that?

The removed logic in index_getnext() dealt with HOT chains. As the code
stands without the patch, index_getnext() returns all visible tuples
from a HOT chain, and therefore needs bookkeeping to keep track of the
current tuple in the current HOT chain.

With a normal MVCC snapshot, only one tuple in a HOT chain can be
visible, so all that logic is unnecessary for a MVCC snapshot. The same
holds for SnapshotSelf and SnapshotNow, but not for SnapshotAny.
However, there is only caller of index_getnext() that uses SnapshotAny:
CLUSTER. I modified heap_hot_search_buffer() so that it can be used to
continue searching a HOT chain after the first visible tuple, and
modified cluster.c to use that for the HOT chain following. This allowed
me to dumb down index_next()/index_fetch() so that index_fetch() only
needs to find the first visible tuple in a HOT chain.

> I think the second patch needs a fair amount of work.  The fact that
> _bt_getindextuple can fail to get the right tuple seems quite bogus;
> what I think it means is that you've attached the functionality in the
> wrong place.  nbtree certainly had its hands on the right tuple at some
> point, and what you should have done was saved the tuple aside at that
> point.  I feel it's important that an indexam either be able to give
> back tuples or not; this "maybe I can" semantics will prevent us from
> doing many interesting things with the capability.  (More about that
> in an upcoming message.)

Yeah, I think you're right. That seemed like the quickest way to get it
done, but I'm seeing that it's also pretty bad from a performance
standpoint - the index page needs to be locked and unlocked for each
tuple, which in a quick microbenchmark makes the index scan slower than
a seqscan when the data is in cache. I'm seeing about 10-15% of CPU time
spent on that locking, and the index scan is about that much slower than
the seq scan.

That can certainly be improved. Unfortunately we can't just return
pointers to the pinned index page in shared buffer, because a pin
doesn't stop page splits or moving tuples like it does on heap pages.
But we can copy all matching tuples to local memory along with the
matching TIDs when we step on a page.

> I concur with the objection to "regurgitate" terminology, it seems a
> bit yucky.

Apparently that word evokes stronger images in native speakers :-).

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Kenneth Marshall
Date:
Subject: Re: Another try at reducing repeated detoast work for PostGIS
Next
From: Roger Leigh
Date:
Subject: [PATCH 1/6] psql: Abstract table formatting characters used for different line types.