Re: proposal: be smarter about i/o patterns in index scan - Mailing list pgsql-hackers
From:           Jeffrey W. Baker
Subject:        Re: proposal: be smarter about i/o patterns in index scan
Msg-id:         1084996574.19249.8.camel@heat
In response to: Re: proposal: be smarter about i/o patterns in index scan (Tom Lane <tgl@sss.pgh.pa.us>)
List:           pgsql-hackers
On Wed, 2004-05-19 at 11:56, Tom Lane wrote:
> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > Are you saying that index scan results are sorted by something other
> > than the index key?  Because in my scheme they would still be sorted
> > by index key.
>
> Not unless you add yet another sort step after the fetch step, that is
> the idea becomes
>      1. read index to get candidate TIDs
>      2. sort TIDs into physical order
>      3. read heap in physical order, check row visibility
>      4. sort selected rows into index order
>
> That would start to look like an unpleasant amount of overhead, since
> the sort would have to be over the entire output; you couldn't divide
> the scan into reasonable-size batches, whereas steps 1-3 can easily
> be run in batched style.
>
> Moreover, this'd not be any win compared to just dropping the
> assumption of sorted output; the planner could stick in the same sort
> for itself if it decided the cost was worth it.
>
> What you probably want to do is support both our traditional indexscan
> style and an "unsorted indexscan" plan type that delivers unsorted
> output (implementing steps 1-3 above).  With appropriate cost
> estimation the planner could make an intelligent choice between the
> two.
>
> I'm too lazy to look in the archives right now, but my private notes
> about this say:

Thanks for the notes.  Introducing a new query execution step sounds
like a better/easier idea than what I was going to do, which was to
rework some of the access methods to operate on vectors instead of
scalars.  That idea looks increasingly difficult to implement.

> We could improve indexscan performance if we were to read a bunch of
> TIDs from the index, sort in page number order, and access the heap
> tuples in page number order.  But probably need a batch size in the
> thousands of TIDs to get any really useful effect.

Depends on the query, but I got an improvement by half from batches of
400.
> This would not work very well given the present assumption that we
> hold a pin on the index page until we have fetched the tuple (cf lazy
> VACUUM).  Pinning lots of index pages is no good for concurrency or
> buffer usage.  QUESTION: is that assumption really necessary?  Can't
> we just make the code ignore TIDs that lead to deleted tuples?  Worst
> case is that we fetch a newly-inserted tuple that does not actually
> have the expected index value, but won't we ignore it because of MVCC
> tqual rules?  (This does NOT work for dirty read or SnapshotNow
> cases, but we could probably set things up to only try this
> optimization for standard snapshot reads.)
>
> An even nicer property is that we could loop inside the index AM to
> scoop up tuples, saving locking overhead.  This might be worth doing
> even when we want to keep returning the tuples in index order.

I think this is *definitely* worth doing.  The current implementation
in the vicinity of index_getnext wastes valuable opportunities to
optimize the I/O and/or to let the kernel do a better job.  Descending
into index_getnext -> heap_getnext for every tuple looks expensive in
my application.

Thanks again for your notes.  I'll revisit the source code and see if I
can come up with a plan.

-jwb

PS Regarding the archive, I've never seen it work.  I use MARC instead.