Re: the big picture for index-only scans - Mailing list pgsql-hackers

From Cédric Villemain
Subject Re: the big picture for index-only scans
Date
Msg-id BANLkTi=cefjoKR1hJ5RLEPsvRUVRBxg0dw@mail.gmail.com
Whole thread Raw
In response to Re: the big picture for index-only scans  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: the big picture for index-only scans  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
2011/5/10 Robert Haas <robertmhaas@gmail.com>:
> On Tue, May 10, 2011 at 10:58 AM, Cédric Villemain
> <cedric.villemain.debian@gmail.com> wrote:
>> ANALYZE can do the stats job for 'free' on the pages it collects
>> anyway. So that looks like a good idea.
>> I believe the really lazy vacuum is another topic; even if it will
>> improve the performance of the index only scan to have tables already
>> vacuuumed, the stats should expose that and the function
>> cost_index(_only?)() taking care of that.
>
> I basically agree.  The connection is that - as we use the all-visible
> for more things, the performance penalty for failing to vacuum (say)
> an insert-only table will continue to grow.  Still, as you say,
> clearly a separate topic.
>
>> The temptation is high to estimate the cost of an "index_scan(only) +
>> ordered(by ctid) table pages fetch if heap required". (this is what I
>> understood from heikki suggestion 3-4. and it makes sense). It may be
>> easier to implement both at once but I didn't find the branch in the
>> Heikki's git repos. (probably removed since the long time)
>
> I was thinking about this as well, at least if I understand you
> correctly.  That would be similar to a bitmap index scan, and I think
> it would be a great thing to have, not only because it would allow us
> to get the advantages of index-only scans in situations that are
> well-suited to our current bitmap scans, but also because it could be
> batched.  You could allocate a buffer of work_mem bytes and fill it up
> with TIDs; then, when it's full, you sort the buffer and start doing
> the necessary heap fetches in physical order.  If you still need more
> rows, you can clear the buffer and go around for another pass.
>
>> Based on ANALYZE stats for the visibility, I believe cost_index and
>> cost_index_only should be very similar functions (well, atm, I don't
>> see the point to split it in 2 functions).
>
> Yeah, I would more imagine modifying the existing function.
>
>>> Any thoughts welcome.  Incidentally, if anyone else feels like working
>>> on this, feel free to let me know and I'm happy to step away, from all
>>> of it or from whatever part someone else wants to tackle.  I'm mostly
>>> working on this because it's something that I think we really need to
>>> get done, more than having a burning desire to be the one who does it.
>>
>> Indexonly scans are welcome!
>> I believe I can help on 3 and 4, but (really) not sure for 1 and 2.
>
> Well, I have code for #1, and just need reviews, and #2 shouldn't be
> that hard, and with luck I'll twist Bruce's arm into doing it (*waves
> to Bruce*).  So #3 and #4 are the next thing to tackle.  Any thoughts
> on what/how you'd like to contribute there?

I can provide initial patchs for cost and analyze, at least.

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: the big picture for index-only scans
Next
From: Jesper Krogh
Date:
Subject: Re: crash-safe visibility map, take five