Re: proposal: be smarter about i/o patterns in index scan - Mailing list pgsql-hackers

From Jeffrey W. Baker
Subject Re: proposal: be smarter about i/o patterns in index scan
Date
Msg-id 1084997260.19249.16.camel@heat
Whole thread Raw
In response to Re: proposal: be smarter about i/o patterns in index scan  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, 2004-05-19 at 12:55, Tom Lane wrote:
> "Glen Parker" <glenebob@nwlink.com> writes:
> > What am I missing?  Why is a performance bottle neck of this magnitude not
> > on the same list of priorities as PITR, replication, and Win32?
> 
> It's higher on my personal to-do list than most of those ;-).  But those
> things are getting done because there are other developers with other
> priorities.  I suspect also that the set of people competent to make
> this change is much smaller than the set of people able to work on the
> other points.  In my mind, most of the issue is in the planner (figuring
> out what to do with an unsorted-indexscan option) and relatively few
> people have wanted to touch the planner.
> 
> > Here's one answer: If you had to sort every result set, even when an index
> > could have been used, overall performance would still improve by a very
> > large margin.  I'd bet money on it.
> 
> For a counterexample I refer you to our standard solution for
> MAX-using-an-index:
> 
>     SELECT ... FROM table ORDER BY foo DESC LIMIT 1;

Yes, that would stink with unordered index result.

> which would become truly spectacularly bad without an ordered index
> scan.  A more general point is that for any indexscan that returns only
> a small number of index entries (eg, any unique-key search) worrying
> about physical-order access will be wasted effort.  The best you could
> hope for is not to be significantly worse than the existing code in such
> cases, and I'm unconvinced you could achieve even that.

I don't think a TID-sorted index scan would be worse than the current
unsorted scan.  You will produce a list of 1, sort that, and fetch the
tuple.  In the current implementation, you only omit the sort, which is
basically free.

> I can assure you that any patch that completely removes the existing
> behavior will be rejected, because there are plenty of cases where it's
> the right thing.

Okay.

> The main thing that unordered indexscan could do for us is extend the
> usefulness of indexscan plans into relatively-poor-selectivity cases
> where we presently tend to drop back to seqscans.  There would still be
> a selectivity threshold above which you might as well use seqscan, but
> it ought to be higher than the fraction-of-a-percent that we currently
> see for indexscans.  What is unknown, and will be unknown until someone
> tries it, is just what range of selectivity this technique might win
> for.  I think such a range exists; I am not real certain that it is wide
> enough to justify a lot of effort in making the idea a reality.

I don't think selectivity is the main issue.  I think the absolute size
of the result is the issue.  The current implementation fairly well
garauntees one seek for each tuple of the result.  Which means that the
cost to access the result is something on the order of 10ms * rows.  So
for any significant result that needs to be fetched from the heap, the
seek cost is going to overwhelm all other costs quickly.

Even though my index is selective, retrieving only 200k rows of 240m row
table, the cost of the sort is justified.

-jwb



pgsql-hackers by date:

Previous
From: "Marc G. Fournier"
Date:
Subject: Re: commit messages from gforge -> pgsql-committers
Next
From: Nikola Milutinovic
Date:
Subject: Re: [GENERAL] PgSQL 7.4.2 - NaN on Tru64 UNIX - CORRECTION!!!