Re: the big picture for index-only scans - Mailing list pgsql-hackers

From Robert Haas
Subject Re: the big picture for index-only scans
Date
Msg-id BANLkTim1rfyYey45m_Cc4xzcn=ugwKMbCQ@mail.gmail.com
Whole thread Raw
In response to Re: the big picture for index-only scans  (Cédric Villemain <cedric.villemain.debian@gmail.com>)
Responses Re: the big picture for index-only scans  (Cédric Villemain <cedric.villemain.debian@gmail.com>)
Re: the big picture for index-only scans  (Cédric Villemain <cedric.villemain.debian@gmail.com>)
Re: the big picture for index-only scans  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Tue, May 10, 2011 at 10:58 AM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:
> ANALYZE can do the stats job for 'free' on the pages it collects
> anyway. So that looks like a good idea.
> I believe the really lazy vacuum is another topic; even if it will
> improve the performance of the index only scan to have tables already
> vacuuumed, the stats should expose that and the function
> cost_index(_only?)() taking care of that.

I basically agree.  The connection is that - as we use the all-visible
for more things, the performance penalty for failing to vacuum (say)
an insert-only table will continue to grow.  Still, as you say,
clearly a separate topic.

> The temptation is high to estimate the cost of an "index_scan(only) +
> ordered(by ctid) table pages fetch if heap required". (this is what I
> understood from heikki suggestion 3-4. and it makes sense). It may be
> easier to implement both at once but I didn't find the branch in the
> Heikki's git repos. (probably removed since the long time)

I was thinking about this as well, at least if I understand you
correctly.  That would be similar to a bitmap index scan, and I think
it would be a great thing to have, not only because it would allow us
to get the advantages of index-only scans in situations that are
well-suited to our current bitmap scans, but also because it could be
batched.  You could allocate a buffer of work_mem bytes and fill it up
with TIDs; then, when it's full, you sort the buffer and start doing
the necessary heap fetches in physical order.  If you still need more
rows, you can clear the buffer and go around for another pass.

> Based on ANALYZE stats for the visibility, I believe cost_index and
> cost_index_only should be very similar functions (well, atm, I don't
> see the point to split it in 2 functions).

Yeah, I would more imagine modifying the existing function.

>> Any thoughts welcome.  Incidentally, if anyone else feels like working
>> on this, feel free to let me know and I'm happy to step away, from all
>> of it or from whatever part someone else wants to tackle.  I'm mostly
>> working on this because it's something that I think we really need to
>> get done, more than having a burning desire to be the one who does it.
>
> Indexonly scans are welcome!
> I believe I can help on 3 and 4, but (really) not sure for 1 and 2.

Well, I have code for #1, and just need reviews, and #2 shouldn't be
that hard, and with luck I'll twist Bruce's arm into doing it (*waves
to Bruce*).  So #3 and #4 are the next thing to tackle.  Any thoughts
on what/how you'd like to contribute there?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Cédric Villemain
Date:
Subject: Re: the big picture for index-only scans
Next
From: Cédric Villemain
Date:
Subject: Re: the big picture for index-only scans