Re: proposal: be smarter about i/o patterns in index scan - Mailing list pgsql-hackers

From Jeffrey W. Baker
Subject Re: proposal: be smarter about i/o patterns in index scan
Date
Msg-id 1084996574.19249.8.camel@heat
Whole thread Raw
In response to Re: proposal: be smarter about i/o patterns in index scan  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: proposal: be smarter about i/o patterns in index scan  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, 2004-05-19 at 11:56, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > Are you saying that index scan results are sorted by something other
> > than the index key?  Because in my scheme they would still be sorted by
> > index key.
> 
> Not unless you add yet another sort step after the fetch step, that is
> the idea becomes
>     1. read index to get candidate TIDs
>     2. sort TIDs into physical order
>     3. read heap in physical order, check row visibility
>     4. sort selected rows into index order
> 
> That would start to look like an unpleasant amount of overhead, since
> the sort would have to be over the entire output; you couldn't divide
> the scan into reasonable-size batches, whereas steps 1-3 can easily
> be run in batched style.
> 
> Moreover, this'd not be any win compared to just dropping the assumption
> of sorted output; the planner could stick in the same sort for itself if
> it decided the cost was worth it.
> 
> What you probably want to do is support both our traditional indexscan
> style and an "unsorted indexscan" plan type that delivers unsorted
> output (implementing steps 1-3 above).  With appropriate cost estimation
> the planner could make an intelligent choice between the two.

> I'm too lazy to look in the archives right now, but my private notes
> about this say:

Thanks for the notes.  Introducing a new query execution step sounds
like a better/easier idea than was I was going to do, which was to
rework some of the access methods to operate on vectors instead of
scalars.  That idea looks increasingly difficult to implement.

> We could improve indexscan performance if we were to read a bunch of TIDs from
> the index, sort in page number order, and access the heap tuples in page
> number order.  But probably need a batch size in the thousands of TIDs to get
> any really useful effect.

Depends on the query, but I got an improvement by half from batches of
400.

> This would not work very well given the present
> assumption that we hold a pin on the index page until we have fetched the
> tuple (cf lazy VACUUM).  Pinning lots of index pages is no good for
> concurrency or buffer usage.  QUESTION: is that assumption really necessary?
> Can't we just make the code ignore TIDs that lead to deleted tuples?  Worst
> case is that we fetch a newly-inserted tuple that does not actually have the
> expected index value, but won't we ignore it because of MVCC tqual rules?
> (This does NOT work for dirty read or SnapshotNow cases, but we could probably
> set things up to only try this optimization for standard snapshot reads.)
> An even nicer property is that we could loop inside the index AM to scoop up
> tuples, saving locking overhead.  This might be worth doing even when we want
> to keep returning the tuples in index order.

I think this is *definitely* worth doing.  The current implementation in
the vicinity of index_getnext wastes valuable opportunities to optimize
the io and/or to let the kernel do a better job.  Descending into
index_getnext -> heap_getnext for every tuple looks expensive in my
application.

Thanks again for your notes I'll revisit the source code and see if I
can come up with a plan.

-jwb

PS Regarding the archive, I've never seen it work.  I use MARC instead.



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: proposal: be smarter about i/o patterns in index scan
Next
From: "Joshua D. Drake"
Date:
Subject: Re: Call for 7.5 feature completion