Re: the big picture for index-only scans - Mailing list pgsql-hackers

From Robert Haas
Subject Re: the big picture for index-only scans
Msg-id BANLkTin5N1g=_TbikGPkL=bjyQMvXLqB+w@mail.gmail.com
In response to Re: the big picture for index-only scans  (Cédric Villemain <cedric.villemain.debian@gmail.com>)
List pgsql-hackers
On Tue, May 10, 2011 at 11:27 AM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:
> 2011/5/10 Robert Haas <robertmhaas@gmail.com>:
>> On Tue, May 10, 2011 at 10:58 AM, Cédric Villemain
>> <cedric.villemain.debian@gmail.com> wrote:
>>> ANALYZE can do the stats job for 'free' on the pages it collects
>>> anyway. So that looks like a good idea.
>>> I believe the really lazy vacuum is another topic; even if having tables
>>> already vacuumed will improve the performance of the index-only scan,
>>> the stats should expose that, and the function cost_index(_only?)()
>>> should take care of it.
>>
>> I basically agree.  The connection is that, as we use the all-visible
>> bit for more things, the performance penalty for failing to vacuum (say)
>> an insert-only table will continue to grow.  Still, as you say,
>> clearly a separate topic.
>>
>>> The temptation is high to estimate the cost of an "index_scan(only) +
>>> ordered(by ctid) table pages fetch if heap required". (This is what I
>>> understood from Heikki's suggestions 3-4, and it makes sense.) It may be
>>> easier to implement both at once, but I didn't find the branch in
>>> Heikki's git repo (probably removed a long time ago).
>>
>> I was thinking about this as well, at least if I understand you
>> correctly.  That would be similar to a bitmap index scan, and I think
>> it would be a great thing to have, not only because it would allow us
>> to get the advantages of index-only scans in situations that are
>> well-suited to our current bitmap scans, but also because it could be
>> batched.  You could allocate a buffer of work_mem bytes and fill it up
>> with TIDs; then, when it's full, you sort the buffer and start doing
>> the necessary heap fetches in physical order.  If you still need more
>> rows, you can clear the buffer and go around for another pass.
>>
>>> Based on ANALYZE stats for the visibility, I believe cost_index and
>>> cost_index_only should be very similar functions (well, atm, I don't
>>> see the point of splitting it into 2 functions).
>>
>> Yeah, I would more imagine modifying the existing function.
>>
>>>> Any thoughts welcome.  Incidentally, if anyone else feels like working
>>>> on this, feel free to let me know and I'm happy to step away, from all
>>>> of it or from whatever part someone else wants to tackle.  I'm mostly
>>>> working on this because it's something that I think we really need to
>>>> get done, more than having a burning desire to be the one who does it.
>>>
>>> Index-only scans are welcome!
>>> I believe I can help on 3 and 4, but (really) not sure about 1 and 2.
>>
>> Well, I have code for #1, and just need reviews, and #2 shouldn't be
>> that hard, and with luck I'll twist Bruce's arm into doing it (*waves
>> to Bruce*).  So #3 and #4 are the next thing to tackle.  Any thoughts
>> on what/how you'd like to contribute there?
>
> I can provide initial patches for cost and analyze, at least.

OK, cool.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
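
The batching idea described in the message above (fill a bounded buffer with TIDs, sort it, do the heap fetches in physical order, then clear the buffer and repeat) can be sketched roughly as follows. This is only an illustration, not PostgreSQL code: the names batched_heap_fetch, fetch_heap_tuple, and max_tids_per_batch are hypothetical, and max_tids_per_batch stands in for "however many TIDs fit in work_mem".

```python
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

# A TID is modeled as (heap block number, offset within the block).
TID = Tuple[int, int]
Row = TypeVar("Row")


def _drain(batch: List[TID], fetch_heap_tuple: Callable[[TID], Row]) -> Iterator[Row]:
    """Sort the buffered TIDs so heap blocks are visited in physical order."""
    batch.sort()
    for tid in batch:
        yield fetch_heap_tuple(tid)


def batched_heap_fetch(
    index_tids: Iterable[TID],
    fetch_heap_tuple: Callable[[TID], Row],
    max_tids_per_batch: int,
) -> Iterator[Row]:
    """Hypothetical sketch: buffer TIDs from an index scan, fetch per sorted batch.

    max_tids_per_batch is an assumed stand-in for a work_mem-sized buffer.
    """
    if max_tids_per_batch < 1:
        raise ValueError("max_tids_per_batch must be at least 1")

    batch: List[TID] = []
    for tid in index_tids:
        batch.append(tid)
        if len(batch) >= max_tids_per_batch:
            # Buffer is full: fetch in physical order, then start another pass.
            yield from _drain(batch, fetch_heap_tuple)
            batch.clear()

    # Fetch whatever is left once the index scan is exhausted.
    if batch:
        yield from _drain(batch, fetch_heap_tuple)
```

Because the function is a generator, a caller that needs only a few rows can stop early and later batches are never built. The trade-off is that rows within a batch come back in heap order rather than index order.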


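The thread also suggests that a single cost function could account for index-only scans if ANALYZE exposed visibility statistics. One way to picture that is to scale the expected heap page fetches by the share of pages not known to be all-visible. This is a sketch under that assumption only: estimated_heap_pages_fetched and all_visible_fraction are hypothetical names, and nothing here is taken from an actual cost_index() implementation.

```python
import math


def estimated_heap_pages_fetched(pages_fetched_plain: float,
                                 all_visible_fraction: float) -> int:
    """Hypothetical estimate of heap pages an index-only scan still visits.

    pages_fetched_plain: heap pages a plain index scan is expected to fetch.
    all_visible_fraction: assumed ANALYZE-provided fraction (0.0 to 1.0) of
        heap pages marked all-visible, for which the heap visit is skipped.
    """
    if not 0.0 <= all_visible_fraction <= 1.0:
        raise ValueError("all_visible_fraction must be between 0.0 and 1.0")
    if pages_fetched_plain < 0:
        raise ValueError("pages_fetched_plain must be non-negative")
    # Only pages not known to be all-visible still need a heap visit.
    return math.ceil(pages_fetched_plain * (1.0 - all_visible_fraction))
```

With a fraction of 0.0 the estimate is the same as for a plain index scan, which is why modifying the existing function, rather than adding a second one, looks sufficient in this simplified picture.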