Re: Page at a time index scan - Mailing list pgsql-patches

From Simon Riggs
Subject Re: Page at a time index scan
Date
Msg-id 1146577444.9599.450.camel@localhost.localdomain
Whole thread Raw
In response to Page at a time index scan  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Page at a time index scan
List pgsql-patches
On Mon, 2006-05-01 at 22:00 +0300, Heikki Linnakangas wrote:
> Here's a patch that implements page at a time index scans discussed at
> pgsql-hackers earlier. See proposal 1 at:
> http://archives.postgresql.org/pgsql-hackers/2006-03/msg01237.php
>
> It passes regression tests, and there's no known bugs. There's
> some minor issues I'd like to point out, though:
>
> 1. An index scan now needs to allocate enough memory to hold potentially a
> whole page worth of items. And if you use markpos/restrpos, twice that
> much. I don't know if that's an issue, but I thought I'd bring that up.

AFAICS the code:

- allocates memory for the markpos whether or not its ever needed? Most
index scans never call markpos and not all merge joins either, so that
seems wasteful. We could allocate when btmarkpos() is called for the
first time, if ever.

- allocates 1024 offsets in every case. If this were just a unique index
retrieval, that would be too much. When scan->is_multiscan == true, go
straight for 1024, otherwise start with just 32 offsets and double that
when/if required.

> 2. Vacuum is now done in one phase, scanning the index in physical order.
> That significantly speeds up index vacuums of large indexes that don't fit
> into memory.

Also for those that *aren't in* memory. Should speed up medium-sized
VACUUMs also because of sequential disk access.

> However, btbulkdelete doesn't know if the vacuum is a full or
> lazy one. The patch just assumes it's a lazy vacuum, but the API really
> needs to be changed to pass that information earlier than at vacuum_cleanup.

Looks like it needs work. Do you have suggestions while you're there?

> 3. Before the patch, a scan would keep the current page pinned to keep
> vacuum from deleting the current item. The patch doesn't change that
> behaviour, but it now seems to me that even a pin is no longer needed.

Agreed. The pin has two functions:
- keep the page from being moved out of the bufmgr - no need anymore
- stop a vacuum from removing the page - no need anymore. We'll not stop
on a removable row anymore, so no need.

Some of the code doesn't use standard spacing e.g. "if(" should be "if
(", but other than that it looks very neat and well implemented.

Overall, I'm optimistic that this patch will help in a number of ways.
Speeding up a VACUUM index scan is a primary objective and it looks like
that will work well. Also, this looks like it will reduce LWLocking
overhead and encourage sequential memory scans of blocks, both of which
will improve index scan performance. It should also reduce buffer
residency time making shared_buffers more fluid. So, subject to
performance tests of this I'm very interested in this.

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com


pgsql-patches by date:

Previous
From: Dhanaraj M
Date:
Subject: Patch - Have psql show current values for a sequence
Next
From: Heikki Linnakangas
Date:
Subject: Re: Page at a time index scan