Thread: Re: Page at a time index scan

Re: Page at a time index scan

From: Heikki Linnakangas

As usual, I forgot the attachment. Here you go.

On Mon, 1 May 2006, Heikki Linnakangas wrote:

> Here's a patch that implements page at a time index scans discussed at
> pgsql-hackers earlier. See proposal 1 at:
> http://archives.postgresql.org/pgsql-hackers/2006-03/msg01237.php
>
> It passes the regression tests, and there are no known bugs. There are some
> minor issues I'd like to point out, though:
>
> 1. An index scan now needs to allocate enough memory to hold potentially a
> whole page worth of items. And if you use markpos/restrpos, twice that much.
> I don't know if that's an issue, but I thought I'd bring that up.
>
> 2. Vacuum is now done in one phase, scanning the index in physical order.
> That significantly speeds up index vacuums of large indexes that don't fit
> into memory. However, btbulkdelete doesn't know if the vacuum is a full or
> lazy one. The patch just assumes it's a lazy vacuum, but the API really needs
> to be changed to pass that information earlier than at vacuum_cleanup.
>
> 3. Before the patch, a scan would keep the current page pinned to keep vacuum
> from deleting the current item. The patch doesn't change that behaviour, but
> it now seems to me that even a pin is no longer needed.
>
> The patch needs testing and review, to ensure it doesn't break anything, and
> to see the effect on performance. It doesn't change disk layout or catalogs,
> so you can run it using the same data directory as with the unpatched
> version.
>
> - Heikki
>

- Heikki
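
For readers following the thread, the idea in points 1 and 3 above can be modelled with a small toy. This is only a sketch under stated assumptions, not code from the patch: a leaf page is modelled as a sorted list of (key, heap_tid) pairs, and the class name PageAtATimeScan and all of its fields are hypothetical. It shows the scan copying every matching item from a page in a single visit, then handing items out from its private buffer, and it shows why mark/restore needs a second buffer of the same size.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical stand-in for an index entry: (key, heap_tid).
Item = Tuple[int, int]


@dataclass
class PageAtATimeScan:
    """Toy range scan that copies all matches from a page in one visit."""
    pages: List[List[Item]]          # leaf pages in key order (assumption)
    lo: int                          # inclusive lower bound of the range
    hi: int                          # inclusive upper bound of the range
    page_no: int = -1                # page whose matches are buffered
    buffered: List[Item] = field(default_factory=list)
    pos: int = 0                     # next buffered item to return
    page_visits: int = 0             # how many times a page was read
    marked: Optional[Tuple[int, List[Item], int]] = None

    def _load_next_page(self) -> bool:
        """Visit the next page once and buffer every matching item."""
        if self.page_no + 1 >= len(self.pages):
            return False
        self.page_no += 1
        self.page_visits += 1
        page = self.pages[self.page_no]
        self.buffered = [it for it in page if self.lo <= it[0] <= self.hi]
        self.pos = 0
        return True

    def next(self) -> Optional[Item]:
        """Return the next match from the buffer, loading pages as needed."""
        while self.pos >= len(self.buffered):
            if not self._load_next_page():
                return None
        item = self.buffered[self.pos]
        self.pos += 1
        return item

    def mark(self) -> None:
        # The saved copy is the second page-sized buffer from point 1.
        self.marked = (self.page_no, list(self.buffered), self.pos)

    def restore(self) -> None:
        if self.marked is None:
            raise RuntimeError("restore() called without a prior mark()")
        self.page_no, saved, self.pos = self.marked
        self.buffered = list(saved)


if __name__ == "__main__":
    pages = [[(1, 101), (2, 102), (3, 103)], [(4, 104), (5, 105), (6, 106)]]
    scan = PageAtATimeScan(pages, lo=2, hi=5)
    print(scan.next())   # first match on page 0
    scan.mark()
    print(scan.next(), scan.next())
    scan.restore()       # rewinds using the saved copy, not the page
    print(scan.next())
```

In this toy the scan never looks at a page again while it hands out buffered items, which is the intuition behind the remark in point 3 that a pin may no longer be needed. Whether that holds in the real btree code depends on details the toy does not model.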

Re: Page at a time index scan

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> Here's a patch that implements page at a time index scans discussed at
>> pgsql-hackers earlier. See proposal 1 at:
>> http://archives.postgresql.org/pgsql-hackers/2006-03/msg01237.php

One potential performance lossage from this is that it partially defeats
the keys_are_unique optimization: _bt_checkkeys will be run across all
the matching tuples on the index page even if the waiting caller is
going to stop after the first live one.  (I don't see any way to avoid
that without breaking the entire concept, since we can't know which of
the index entries the caller will think is live.)

I suspect this is not a deal-breaker, but we have to test to make sure
that case isn't getting markedly worse.  The thing to look at would be
unique indexes with expensive comparison functions (eg, text in a
non-C locale).

            regards, tom lane
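
The concern above can be illustrated by counting key comparisons in a toy model. This is a sketch under assumptions, not PostgreSQL code: the entries list stands for the index entries starting at the first match for the key, each loop iteration stands in for one call to the key-check routine, and is_live is a hypothetical callback for the caller's visibility check. The function names are invented for illustration.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for an index entry: (key, heap_tid).
Entry = Tuple[str, int]


def scan_item_at_a_time(
    entries: List[Entry], key: str, is_live: Callable[[int], bool]
) -> Tuple[Optional[int], int]:
    """Check one entry, let the caller test liveness, stop at the first hit."""
    comparisons = 0
    for k, tid in entries:
        comparisons += 1          # one key check per fetched entry
        if k != key:
            break
        if is_live(tid):
            return tid, comparisons
    return None, comparisons


def scan_page_at_a_time(
    entries: List[Entry], key: str, is_live: Callable[[int], bool]
) -> Tuple[Optional[int], int]:
    """Check every matching entry up front, then let the caller pick."""
    comparisons = 0
    matches: List[int] = []
    for k, tid in entries:
        comparisons += 1          # key checks happen before liveness is known
        if k != key:
            break
        matches.append(tid)
    for tid in matches:
        if is_live(tid):
            return tid, comparisons
    return None, comparisons


if __name__ == "__main__":
    # Four versions of the same unique key, only the first one live.
    entries = [("abc", 1), ("abc", 2), ("abc", 3), ("abc", 4), ("abd", 5)]
    live = lambda tid: tid == 1
    print(scan_item_at_a_time(entries, "abc", live))
    print(scan_page_at_a_time(entries, "abc", live))
```

With an expensive comparison function, the gap between the two counts is the extra cost in question. How large it is in practice depends on how many dead versions of a key sit on the same page, which is why it needs measuring rather than guessing.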