Re: the big picture for index-only scans - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: the big picture for index-only scans
Date
Msg-id 4DC92EBE020000250003D4ED@gw.wicourts.gov
Whole thread Raw
In response to Re: the big picture for index-only scans  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: the big picture for index-only scans  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Simon Riggs <simon@2ndQuadrant.com> wrote:
>>> This topic has been discussed many times, yet I have never seen
>>> an assessment that explains WHY we would want to do index-only
>>> scans.
>  
>> In databases with this feature, it's not too unusual for a query
>> which uses just an index to run one or more orders of magnitude
>> faster than a query which has to randomly access the heap for
>> each index entry.  That seems like enough evidence of its
>> possible value in PostgreSQL to proceed to the point where
>> benchmarks become possible.  I'm assuming that, like all other
>> features added as performance optimizations, it won't be
>> committed until there are benchmarks showing the net benefit.
>  
>> As a thought experiment, picture the relative costs of scanning a
>> portion of an index in index sequence, and being done, versus
>> scanning a portion of an index in index sequence and jumping to a
>> random heap access for each index entry as you go.
> 
> It's already the case that we'll flip over to a bitmap indexscan,
> and thus get rid of most/all of the "random" page accesses, in
> situations where this is likely to be a big win.  Pointing to the
> performance difference in databases that don't do that is
> therefore not too convincing.
Sure.  Of course, if you're only accessing twenty thousand rows from
a table containing fifty million rows, bitmap index scans could come
out pretty close to random access times; but on the whole I agree
that the scale of benefit in PostgreSQL won't tend to match what
people see in other products.  Note that my words were "enough
evidence of its possible value in PostgreSQL to proceed to the point
where benchmarks become possible."
In particular, we might want to somehow try to make clear to people
that the very wide indexes they are accustomed to creating to allow
this optimization in other products might be inefficient compared to
creating several one-column indexes which would enable bitmap
logical operations.
> I'm inclined to agree that index-only scans might be worth the
> amount of work that's involved
So we agree there.
> ... but I share Simon's desire to see some proof before anything
> gets committed.
And we agree there.  In fact, I can't think of anyone in the
community who doesn't want to see that for *any* purported
performance enhancement.
My overall gut feel is that there will be some circumstances where
the "covering index" optmization is much faster, and some where
people expect it to be, but it isn't.  The trickiest part of this
might be developing a costing model which allows us to make the
right choice most of the time.  And even if we get it perfect, we
can expect questions about why the covering index wasn't used, and
requests for hints so they can force it.  :-(
-Kevin


pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: collateral benefits of a crash-safe visibility map
Next
From: Merlin Moncure
Date:
Subject: Re: hint bit cache v5