Re: index-only scans - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: index-only scans |
Date | |
Msg-id | CA+TgmoaN6pE3fmEna6SKLHjcV5TXzisk26P88edctQJXzA-fQQ@mail.gmail.com Whole thread Raw |
In response to | Re: index-only scans (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Fri, Oct 7, 2011 at 8:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> 1. The way that nodeIndexscan.c builds up the faux heap tuple is >> perhaps susceptible to improvement. I thought about building a >> virtual tuple, but then what do I do with an OID column, if I have >> one? Or maybe this should be done some other way altogether. > > I switched it to use a virtual tuple for now, and just not attempt to > use index-only scans if a system column is required. We're likely to > want to rethink this anyway, because as currently constituted the code > can't do anything with an expression index, and avoiding recalculation > of an expensive function could be a nice win. But the approach of > just building a faux heap tuple fundamentally doesn't work for that. Figuring out how to fix that problem likely requires more knowledge of the executor than I have got. >> 2. Suppose we scan one tuple on a not-all-visible page followed by 99 >> tuples on all-visible pages. The code as written will hold the pin on >> the first heap page for the entire scan. As soon as we hit the end of >> the scan or another tuple where we have to actually visit the page, >> the old pin will be released, but until then we hold onto it. > > I did not do anything about this issue --- ISTM it needs performance > testing. I'm actually less worried about any performance problem than I am about the possibility of holding up VACUUM. That can happen the old way, too, but now the pin could stay on the same page for quite a while even when the scan is advancing. I think we maybe ought to think seriously about solving the problem at the other end, though - either make VACUUM skip pages that it can't get a cleanup lock on without blocking (except in anti-wraparound mode) or have it just do the amount of work that can be done with an exclusive lock (i.e. prune but not defragment, which would work even in anti-wraparound mode). That would solve the problems of (1) undetected VACUUM deadlock vs. a buffer pin, (2) indefinite VACUUM stall due to a suspended query, and (3) this issue. >> 3. The code in create_index_path() builds up a bitmapset of heap >> attributes that get used for any purpose anywhere in the query, and >> hangs it on the RelOptInfo so it doesn't need to be rebuilt for every >> index under consideration. However, if it were somehow possible to >> have the rel involved without using any attributes at all, we'd >> rebuild the cache over and over, since it would never become non-NULL. > > I dealt with this by the expedient of getting rid of the caching ;-). > It's not clear to me that it was worth the trouble, and in any case > it's fundamentally wrong to suppose that every index faces the same > set of attributes it must supply. It should not need to supply columns > that are only needed in indexquals or index predicate conditions. > I'm not sure how to deal with those refinements cheaply enough, but > the cache isn't helping. Oh, hmm. >> 4. There are a couple of cases that use index-only scans even though >> the EXPLAIN output sort of makes it look like they shouldn't. For >> example, in the above queries, an index-only scan is chosen even >> though the query does "SELECT *" from the table being scanned. Even >> though the EXPLAIN (VERBOSE) output makes it look otherwise, it seems >> that the target list of an EXISTS query is in fact discarded, e.g.: > > The reason it looks that way is that we're choosing to use a "physical > result tuple" to avoid an ExecProject step at runtime. There's nothing > wrong with the logic, it's just that EXPLAIN shows something that might > mislead people. I wonder if we oughta do something about that. I was also thinking we should probably make EXPLAIN ANALYZE display the number of heap fetches, so that you can see how index-only your index-only scan actually was. >> 5. We haven't made any planner changes at all, not even for cost >> estimation. It is not clear to me what the right way to do cost >> estimation here is. > > Yeah, me either. For the moment I put in a hard-wired estimate that > only 90% of the heap pages would actually get fetched. This is > conservative and only meant to ensure that the planner picks an > index-only-capable plan over an indexscan with a non-covering index, > all else being equal. We'll need to do performance testing before > we can refine that. Yep. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: