Re: the big picture for index-only scans - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: the big picture for index-only scans |
Date | |
Msg-id | BANLkTin5N1g=_TbikGPkL=bjyQMvXLqB+w@mail.gmail.com Whole thread Raw |
In response to | Re: the big picture for index-only scans (Cédric Villemain <cedric.villemain.debian@gmail.com>) |
List | pgsql-hackers |
On Tue, May 10, 2011 at 11:27 AM, Cédric Villemain <cedric.villemain.debian@gmail.com> wrote: > 2011/5/10 Robert Haas <robertmhaas@gmail.com>: >> On Tue, May 10, 2011 at 10:58 AM, Cédric Villemain >> <cedric.villemain.debian@gmail.com> wrote: >>> ANALYZE can do the stats job for 'free' on the pages it collects >>> anyway. So that looks like a good idea. >>> I believe the really lazy vacuum is another topic; even if it will >>> improve the performance of the index only scan to have tables already >>> vacuuumed, the stats should expose that and the function >>> cost_index(_only?)() taking care of that. >> >> I basically agree. The connection is that - as we use the all-visible >> for more things, the performance penalty for failing to vacuum (say) >> an insert-only table will continue to grow. Still, as you say, >> clearly a separate topic. >> >>> The temptation is high to estimate the cost of an "index_scan(only) + >>> ordered(by ctid) table pages fetch if heap required". (this is what I >>> understood from heikki suggestion 3-4. and it makes sense). It may be >>> easier to implement both at once but I didn't find the branch in the >>> Heikki's git repos. (probably removed since the long time) >> >> I was thinking about this as well, at least if I understand you >> correctly. That would be similar to a bitmap index scan, and I think >> it would be a great thing to have, not only because it would allow us >> to get the advantages of index-only scans in situations that are >> well-suited to our current bitmap scans, but also because it could be >> batched. You could allocate a buffer of work_mem bytes and fill it up >> with TIDs; then, when it's full, you sort the buffer and start doing >> the necessary heap fetches in physical order. If you still need more >> rows, you can clear the buffer and go around for another pass. >> >>> Based on ANALYZE stats for the visibility, I believe cost_index and >>> cost_index_only should be very similar functions (well, atm, I don't >>> see the point to split it in 2 functions). >> >> Yeah, I would more imagine modifying the existing function. >> >>>> Any thoughts welcome. Incidentally, if anyone else feels like working >>>> on this, feel free to let me know and I'm happy to step away, from all >>>> of it or from whatever part someone else wants to tackle. I'm mostly >>>> working on this because it's something that I think we really need to >>>> get done, more than having a burning desire to be the one who does it. >>> >>> Indexonly scans are welcome! >>> I believe I can help on 3 and 4, but (really) not sure for 1 and 2. >> >> Well, I have code for #1, and just need reviews, and #2 shouldn't be >> that hard, and with luck I'll twist Bruce's arm into doing it (*waves >> to Bruce*). So #3 and #4 are the next thing to tackle. Any thoughts >> on what/how you'd like to contribute there? > > I can provide initial patchs for cost and analyze, at least. OK, cool. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: