Index vacuum improvements - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Index vacuum improvements
Date
Msg-id Pine.OSF.4.61.0603282123020.235785@kosh.hut.fi
Whole thread Raw
Responses Re: Index vacuum improvements
Re: Index vacuum improvements
List pgsql-hackers
Hi,

As we know, index vacuum could be sped up significantly if it didn't have 
to lock every page in left to right direction because of the integrity issue 
described in nbtree/README. We could then scan the index in physical 
order, and AFAICS combine the btbulkdelete and btvacuumcleanup logic to 
just one scan.

Several approaches have been proposed before, see prior discussion:

http://archives.postgresql.org/pgsql-hackers/2004-04/msg00829.php
http://archives.postgresql.org/pgsql-hackers/2005-12/msg00927.php

The proposals this far have tried to solve the problem by changing the 
split and/or vacuum algorithms to make sure that a tuple isn't deleted if 
an index scan is stopped on it. That has proven to be hard, so I'm 
proposing two alternatives that change the index scan instead:

1. Instead of stopping on the first matching tuple, scan the whole index 
page for all matching entries at once. Then move one page to the 
right, and logically stop before the first (not including hikey) item on 
the page. Since the scan is not stopped on any specific tuple, it cannot 
be deleted or moved. When the scan continues, it can just start scanning 
from the beginning of the page.

This relies on the fact that items are never moved, except on page split 
when they're always moved to the right and to a new page. Page deletion is 
already done in two phases ensuring that the page doesn't get deleted 
while a scan is stopped on it.

2. Alternatively, the index scan could store the location of the last 
known non-deletable tuple it has encountered, in addition to the tuple it 
stops on. When a stopped scan continues, it checks if the tuple it was 
stopped on is still on the same page. If it's not, instead of moving 
right to find it, relocate the last known non-deletable tuple and 
continue the scan from there. There can't be any visible tuples between 
the tuple it stopped on and the last known non-deletable tuple, because 
we would have encountered it before, and would know by now that it's 
non-deletable.

What do these ideas sound like? Anything I've missed?

- Heikki


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Issue on Varchar Ordering
Next
From: "Rodrigo Hjort"
Date:
Subject: Re: Issue on Varchar Ordering