Re: Plans for solving the VACUUM problem - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Plans for solving the VACUUM problem
Msg-id 14010.990164493@sss.pgh.pa.us
In response to Re: Plans for solving the VACUUM problem  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> The only question I have is about the Free Space Map.  It would seem
> better to me if we could get this map closer to the table itself, rather
> than having every table of every database mixed into the same shared
> memory area.  I can just see random table access clearing out most of
> the map cache and perhaps making it less useful.

What random access?  Read transactions will never touch the FSM at all.
As for writes, seems to me the places you are writing are exactly the
places you need info for.

You make a good point, which is that we don't want a schedule-driven
VACUUM to load FSM entries for unused tables into the map at the cost
of throwing out entries that *are* being used.  But it seems to me that
that's easily dealt with if we recognize the risk.
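One way to recognize that risk is sketched below as a toy Python model, not anything in the backend. Entries that VACUUM loads start out marked unused, a writer's lookup marks an entry as used, and VACUUM may only displace unused entries. The class `FreeSpaceMap` and its methods are hypothetical names; a real shared-memory map would also need locking and some way to age the in-use marks.

```python
class FreeSpaceMap:
    """Toy model of a fixed-size shared free space map (hypothetical).

    Entries looked up by writers are marked as in use; entries loaded by
    VACUUM start out unused and may only displace other unused entries.
    """

    def __init__(self, slots: int):
        self.slots = slots
        self.entries: dict[int, int] = {}  # relid -> pages with free space
        self.in_use: set[int] = set()      # relids some writer has asked about

    def lookup(self, relid: int):
        """Writer path: fetch free-space info and mark the entry as in use."""
        if relid in self.entries:
            self.in_use.add(relid)
            return self.entries[relid]
        return None

    def vacuum_record(self, relid: int, free_pages: int) -> bool:
        """VACUUM path: store info without evicting entries writers are using."""
        if relid in self.entries or len(self.entries) < self.slots:
            self.entries[relid] = free_pages
            return True
        idle = [r for r in self.entries if r not in self.in_use]
        if not idle:
            return False  # map is full of active entries: drop the new info
        del self.entries[idle[0]]  # evict an entry no writer has asked for
        self.entries[relid] = free_pages
        return True
```

With two slots, a table that writers have looked up survives while an idle one gets replaced, and once every slot is in active use VACUUM's report for yet another table is simply dropped.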

> It would be nice if we could store the map on the first page of the disk
> table, or store it in a flat file per table.  I know both of these ideas
> will not work,

You said it.  What's wrong with shared memory?  You can't get any closer
than shared memory: keeping maps in the files would mean you'd need to
chew up shared-buffer space to get at them.  (And what was that about
random accesses causing your maps to get dropped?  That would happen
for sure if they live in shared buffers.)

Another problem with keeping stuff in the first page: what happens when
the table gets big enough that 8k of map data isn't really enough?
With a shared-memory area, we can fairly easily allocate a variable
amount of space based on total size of a relation vs. total size of
relations under management.
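As a rough illustration of that proportional split, assume (my assumption, not a settled design) that the map holds a fixed number of page-entry slots. Each relation then gets total_slots * rel_pages / total_pages slots, and the slots left over from rounding go to the relations with the largest fractional shares. The function name `allocate_fsm_slots` is hypothetical.

```python
def allocate_fsm_slots(rel_pages: dict[str, int], total_slots: int) -> dict[str, int]:
    """Split total_slots among relations in proportion to their size in pages.

    Uses largest-remainder rounding so the shares sum exactly to total_slots.
    """
    total_pages = sum(rel_pages.values())
    if total_pages == 0:
        return {rel: 0 for rel in rel_pages}
    # Exact (fractional) share of the map each relation deserves
    shares = {rel: total_slots * pages / total_pages for rel, pages in rel_pages.items()}
    alloc = {rel: int(share) for rel, share in shares.items()}
    leftover = total_slots - sum(alloc.values())
    # Hand the remaining slots to the relations with the largest fractional parts
    by_remainder = sorted(shares, key=lambda r: shares[r] - alloc[r], reverse=True)
    for rel in by_remainder[:leftover]:
        alloc[rel] += 1
    return alloc
```

For example, relations of 6000, 3000 and 1000 pages sharing 1000 slots get 600, 300 and 100 slots respectively.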

It is true that a shared-memory map would be useless at system startup,
until VACUUM has run and filled in some info.  But I don't see that as
a big drawback.  People who aren't developers like us don't restart
their postmasters every five minutes.

> Another advantage of centralization is that we can record update/delete
> counters per table, helping tell vacuum where to vacuum next.  Vacuum
> roaming around looking for old tuples seems wasteful.

Indeed.  But I thought you were arguing against centralization?
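A minimal sketch of such counters, assuming writers bump a shared per-table count on each UPDATE or DELETE and VACUUM resets it. The next table to vacuum is then simply the one with the most changes since its last VACUUM, provided that count clears some threshold. `VacuumCounters` and its methods are made-up names for illustration.

```python
from collections import Counter


class VacuumCounters:
    """Hypothetical shared per-table counts of tuples updated or deleted since the last VACUUM."""

    def __init__(self):
        self.dead = Counter()

    def record_update(self, table: str, n: int = 1) -> None:
        self.dead[table] += n  # an UPDATE leaves the old tuple version behind

    def record_delete(self, table: str, n: int = 1) -> None:
        self.dead[table] += n

    def next_to_vacuum(self, threshold: int):
        """Return the table with the most changes, or None if none reaches threshold."""
        if not self.dead:
            return None
        table, count = self.dead.most_common(1)[0]
        return table if count >= threshold else None

    def vacuumed(self, table: str) -> None:
        """Reset a table's count once VACUUM has processed it."""
        self.dead.pop(table, None)
```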
        regards, tom lane

