Re: Dead Space Map - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Dead Space Map
Date
Msg-id Pine.OSF.4.61.0602280908490.76425@kosh.hut.fi
Whole thread Raw
In response to Re: Dead Space Map  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, 27 Feb 2006, Tom Lane wrote:

> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> On Mon, 27 Feb 2006, Tom Lane wrote:
>>> This strikes me as a fairly bad idea, because it makes VACUUM dependent
>>> on correct functioning of user-written code --- consider a functional
>>> index involving a user-written function that was claimed to be immutable
>>> and is not.
>
>> If the user-defined function is broken, you're in more or less trouble
>> anyway.
>
> Less.  A non-immutable function might result in lookup failures (not
> finding the row you need) but not in database corruption, which is what
> would ensue if VACUUM fails to remove an index tuple.  The index entry
> would eventually point to a wrong table entry, after the table item slot
> gets recycled for another tuple.

It's easy to detect when it happens (that you don't find the row in the 
index). You can then complain loudly or fall back to the full scan.

> Moreover, you haven't pointed to any strong reason to adopt this
> methodology.  It'd only be a win when vacuuming pretty small numbers
> of tuples, which is not the design center for VACUUM, and isn't likely
> to be the case in practice either if you're using autovacuum.  If you're
> removing say 1% of the tuples, you are likely to be hitting every index
> page to do it, meaning that the scan approach will be significantly
> *more* efficient than retail lookups.

It really depends a lot on what kind of indexes the table has, clustering 
order, and what kind of deletions and updates happen. Assuming a random 
distribution of dead tuples, a scan is indeed going to be more efficient 
as you say.

Assuming a tightly packed bunch of dead tuples, however, say after a 
"delete from log where timestamp < now() - 1 month", you would benefit 
from a partial vacuum.

That's certainly something that needs to be tested. It's important to 
implement so that it falls back to full scan when it's significantly 
faster.

- Heikki


pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: Dead Space Map
Next
From: Hans-Juergen Schoenig
Date:
Subject: Re: new feature: LDAP database name resolution