Re: proposal: be smarter about i/o patterns in index scan - Mailing list pgsql-hackers

From Jeffrey W. Baker
Subject Re: proposal: be smarter about i/o patterns in index scan
Date
Msg-id 1084997895.19249.22.camel@heat
Whole thread Raw
In response to Re: proposal: be smarter about i/o patterns in index scan  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: proposal: be smarter about i/o patterns in index scan  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, 2004-05-19 at 13:10, Tom Lane wrote:
> "Glen Parker" <glenebob@nwlink.com> writes:
> >> Not unless you add yet another sort step after the fetch step, that is
> >> the idea becomes
> >> 1. read index to get candidate TIDs
> >> 2. sort TIDs into physical order
> >> 3. read heap in physical order, check row visibility
> >> 4. sort selected rows into index order
> 
> > Or:
> >   2. Sort AND COPY TID's into physical order.
> >   3. Read heap in phy. order, match results to un-sorted TID list.
> > That sounds quite cheap.
> 
> No, it sounds like handwaving.  What's your "match results" step, and
> does it take less than O(N^2) time?  How do you get to *return* the
> tuples in index order, without having read them all before you return
> the first one?  (Which is really what the problem is with the sort
> alternative, at least for fast-start cases such as the LIMIT 1 example.)
> What happens when you run out of memory for the list of TIDs ... er,
> make that two lists of TIDs?

No doubt, you can't just naively create giant vectors of TIDs and expect
to sort them.  Is there any concept in Pg of an unrealized result?  If
you scanned an index building up a result set that was totally
unrealized, except for the TID and the index columns, you could cheaply
join two such results without ever touching the heap.  You could also
use the existing Sort execution step to sort such a result.  Then don't
touch the heap something accesses a non-index column, or because you are
returning the result somewhere and need to satisfy MVCC visibility
limits.

This would lay down the foundation needed by your TIDExpand, and could
also be a useful concept in other areas.

I acknowledge my own significant handwaving on this subject :)  From
your point of view everyone is going to be handwaving, because we don't
have your depth of experience with this code.

-jwb



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: proposal: be smarter about i/o patterns in index scan
Next
From: Tom Lane
Date:
Subject: Re: proposal: be smarter about i/o patterns in index scan