Re: the big picture for index-only scans - Mailing list pgsql-hackers

From Robert Haas
Subject Re: the big picture for index-only scans
Date
Msg-id BANLkTikZY8RzB9ZcqbxZNs3=Rd=8MMkZ0g@mail.gmail.com
Whole thread Raw
In response to Re: the big picture for index-only scans  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: the big picture for index-only scans  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-hackers
On Mon, May 9, 2011 at 10:36 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> 1. The visibility map needs to be crash-safe.  The basic idea of
>> index-only scans is that, instead of checking the heap to find out
>> whether each tuple is visible, we first check the visibility map.  If
>> the visibility map bit is set, then we know all tuples on the page are
>> visible to all transactions, and therefore the tuple of interest is
>> visible to our transaction.  Assuming that a significant number of
>> visibility map bits are set, this should enable us to avoid a fair
>> amount of I/O, especially on large tables, because the visibility map
>> is roughly 8000 times smaller than the heap, and therefore far more
>> practical to keep in cache.
>
> hm, what are the implications for tuple hint bits, short and long
> term?  I'm particularly interested if you think any hint bit i/o
> mitigation strategies are worth pursuing.

Well, I don't really want to let this thread on my project get
hijacked to talk about your project (not that I haven't been guilty of
that myself!) but, in brief, I think the main effect of index-only
scans is that the performance difference between a vacuumed table and
an unvacuumed table is going to increase.  It's already the case that
sequential scanning a table which has been vacuumed (and, therefore,
all the pages are marked all-visible) is noticeably faster than
sequential scanning a table which is not vacuumed (even if all the
hint bits are set).  Index-only scans are going to extend that by
making index scans run faster on a table with lots of all-visible
tables than on one where no pages are all-visible.  So the importance
of vacuuming an insert-only table occasionally (which autovacuum won't
do, at present, until it's needed to prevent XID wraparound) is
already more than zero, and it's going to go up.  But the all-visible
bits aren't quite the same as hint bits: I don't think there's any
impact on hint bits per se.

>> 2. Crash safe visibility map vs. pg_upgrade.  Even if we make the
>> visibility map crash-safe in 9.2, people are going to want to use
>> pg_upgrade to migrate from older versions, bringing their
>> possibly-not-quite-correct visibility map forks along with them.  How
>> should we handle that?  We could (2A) arrange to have pg_upgrade nuke
>> all visibility forks when upgrading from a release where the
>> visibility map is not crash-safe to one where it is;
>
> +1 on 2A.

OK.  Anybody else?

>> 3. Statistics.  I believe that in order to accurately estimate the
>> cost of an index-only scan, we're going to need to know the fraction
>> of tuples that are on pages whose visibility map bits are set.
>
> It would be helpful to know the performance benefit of index only
> scans before knowing how much benefit to attribute here.  Maybe a
> system wide kludge would for starters anyway, like assuming 60% of
> pages can be vis checked from the VM, or a single GUC, Then again,
> maybe not.

Yeah, maybe I should try to beat the main patch into some kind of
shape before working too much on the statistics stuff.  Then we could
actually benchmark it a bit, which would be good.  I don't think that
a system-wide kludge or GUC is going to work for prime time, but it's
probably fine for initial performance testing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: "stored procedures" - use cases?
Next
From: Merlin Moncure
Date:
Subject: Re: crash-safe visibility map, take five