Re: Index-only quals - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Index-only quals |
Date | |
Msg-id | 4A904EB1.70100@enterprisedb.com Whole thread Raw |
In response to | Re: Index-only quals (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> Barring objections, I'm going to apply the indexam API changes part, >> since that simplifies the code in question regardless of the rest of the >> work. I'm pretty happy now with the indexfilter patch as well, but want >> to do some more testing on that before committing. Some more eyeballs >> would be appreciated as well. > > The first patch definitely needs another editorial pass, eg you have > complicated the behavior of heap_hot_search_buffer's all_dead parameter > without adjusting its documentation. Ok, I'll take a look. > The patch as-submitted is really > quite hard to review, though. It's hard to convince myself that all the > logic you removed from index_getnext is all accounted for elsewhere. > Could you run through the reasoning on that? The removed logic in index_getnext() dealt with HOT chains. As the code stands without the patch, index_getnext() returns all visible tuples from a HOT chain, and therefore needs bookkeeping to keep track of the current tuple in the current HOT chain. With a normal MVCC snapshot, only one tuple in a HOT chain can be visible, so all that logic is unnecessary for a MVCC snapshot. The same holds for SnapshotSelf and SnapshotNow, but not for SnapshotAny. However, there is only caller of index_getnext() that uses SnapshotAny: CLUSTER. I modified heap_hot_search_buffer() so that it can be used to continue searching a HOT chain after the first visible tuple, and modified cluster.c to use that for the HOT chain following. This allowed me to dumb down index_next()/index_fetch() so that index_fetch() only needs to find the first visible tuple in a HOT chain. > I think the second patch needs a fair amount of work. The fact that > _bt_getindextuple can fail to get the right tuple seems quite bogus; > what I think it means is that you've attached the functionality in the > wrong place. nbtree certainly had its hands on the right tuple at some > point, and what you should have done was saved the tuple aside at that > point. I feel it's important that an indexam either be able to give > back tuples or not; this "maybe I can" semantics will prevent us from > doing many interesting things with the capability. (More about that > in an upcoming message.) Yeah, I think you're right. That seemed like the quickest way to get it done, but I'm seeing that it's also pretty bad from a performance standpoint - the index page needs to be locked and unlocked for each tuple, which in a quick microbenchmark makes the index scan slower than a seqscan when the data is in cache. I'm seeing about 10-15% of CPU time spent on that locking, and the index scan is about that much slower than the seq scan. That can certainly be improved. Unfortunately we can't just return pointers to the pinned index page in shared buffer, because a pin doesn't stop page splits or moving tuples like it does on heap pages. But we can copy all matching tuples to local memory along with the matching TIDs when we step on a page. > I concur with the objection to "regurgitate" terminology, it seems a > bit yucky. Apparently that word evokes stronger images in native speakers :-). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
pgsql-hackers by date: