Home > mailing lists

Re: index-only scans - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: index-only scans
Date	October 7, 2011 22:24:05
Msg-id	CA+TgmoaN6pE3fmEna6SKLHjcV5TXzisk26P88edctQJXzA-fQQ@mail.gmail.com Whole thread
In response to	Re: index-only scans (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tree view

On Fri, Oct 7, 2011 at 8:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 1. The way that nodeIndexscan.c builds up the faux heap tuple is
>> perhaps susceptible to improvement.  I thought about building a
>> virtual tuple, but then what do I do with an OID column, if I have
>> one?  Or maybe this should be done some other way altogether.
>
> I switched it to use a virtual tuple for now, and just not attempt to
> use index-only scans if a system column is required.  We're likely to
> want to rethink this anyway, because as currently constituted the code
> can't do anything with an expression index, and avoiding recalculation
> of an expensive function could be a nice win.  But the approach of
> just building a faux heap tuple fundamentally doesn't work for that.

Figuring out how to fix that problem likely requires more knowledge of
the executor than I have got.

>> 2. Suppose we scan one tuple on a not-all-visible page followed by 99
>> tuples on all-visible pages.  The code as written will hold the pin on
>> the first heap page for the entire scan.  As soon as we hit the end of
>> the scan or another tuple where we have to actually visit the page,
>> the old pin will be released, but until then we hold onto it.
>
> I did not do anything about this issue --- ISTM it needs performance
> testing.

I'm actually less worried about any performance problem than I am
about the possibility of holding up VACUUM.  That can happen the old
way, too, but now the pin could stay on the same page for quite a
while even when the scan is advancing.

I think we maybe ought to think seriously about solving the problem at
the other end, though - either make VACUUM skip pages that it can't
get a cleanup lock on without blocking (except in anti-wraparound
mode) or have it just do the amount of work that can be done with an
exclusive lock (i.e. prune but not defragment, which would work even
in anti-wraparound mode).  That would solve the problems of (1)
undetected VACUUM deadlock vs. a buffer pin, (2) indefinite VACUUM
stall due to a suspended query, and (3) this issue.

>> 3. The code in create_index_path() builds up a bitmapset of heap
>> attributes that get used for any purpose anywhere in the query, and
>> hangs it on the RelOptInfo so it doesn't need to be rebuilt for every
>> index under consideration.  However, if it were somehow possible to
>> have the rel involved without using any attributes at all, we'd
>> rebuild the cache over and over, since it would never become non-NULL.
>
> I dealt with this by the expedient of getting rid of the caching ;-).
> It's not clear to me that it was worth the trouble, and in any case
> it's fundamentally wrong to suppose that every index faces the same
> set of attributes it must supply.  It should not need to supply columns
> that are only needed in indexquals or index predicate conditions.
> I'm not sure how to deal with those refinements cheaply enough, but
> the cache isn't helping.

Oh, hmm.

>> 4. There are a couple of cases that use index-only scans even though
>> the EXPLAIN output sort of makes it look like they shouldn't.  For
>> example, in the above queries, an index-only scan is chosen even
>> though the query does "SELECT *" from the table being scanned.  Even
>> though the EXPLAIN (VERBOSE) output makes it look otherwise, it seems
>> that the target list of an EXISTS query is in fact discarded, e.g.:
>
> The reason it looks that way is that we're choosing to use a "physical
> result tuple" to avoid an ExecProject step at runtime.  There's nothing
> wrong with the logic, it's just that EXPLAIN shows something that might
> mislead people.

I wonder if we oughta do something about that.

I was also thinking we should probably make EXPLAIN ANALYZE display
the number of heap fetches, so that you can see how index-only your
index-only scan actually was.

>> 5. We haven't made any planner changes at all, not even for cost
>> estimation.  It is not clear to me what the right way to do cost
>> estimation here is.
>
> Yeah, me either.  For the moment I put in a hard-wired estimate that
> only 90% of the heap pages would actually get fetched.  This is
> conservative and only meant to ensure that the planner picks an
> index-only-capable plan over an indexscan with a non-covering index,
> all else being equal.  We'll need to do performance testing before
> we can refine that.

Yep.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Robert Haas
Date: 07 October 2011, 22:11:00
Subject: Re: Why does WAL_DEBUG macro need to be defined by default?

From: Bruce Momjian
Date: 07 October 2011, 23:00:16
Subject: Re: Why does WAL_DEBUG macro need to be defined by default?

Re: index-only scans - Mailing list pgsql-hackers

Previous

Next