Re: Plans for solving the VACUUM problem - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Plans for solving the VACUUM problem
Date
Msg-id 14139.990169129@sss.pgh.pa.us
Whole thread Raw
In response to Re: Plans for solving the VACUUM problem  (Philip Warner <pjw@rhyme.com.au>)
List pgsql-hackers
Philip Warner <pjw@rhyme.com.au> writes:
> At 19:05 17/05/01 -0400, Tom Lane wrote:
>> 1. Forget moving tuples from one page to another.

> Could this be done opportunistically, meaning it builds up a list of
> candidates to move (perhaps based on emptiness of page), then moves a
> subset of these in each pass?

Well, if we move tuples at all then we have a considerably different
animal: to move tuples across pages you must be a transaction so that
you can have an atomic commit for both pages, and that brings back the
issue of how long the transaction runs for and how large its WAL trail
will grow before it can be dropped.  Yeah, you could move a limited
number of tuples, commit, and start again ... but it's not so
lightweight anymore.

Perhaps we will eventually end up with three strengths of VACUUM:
the existing heavy-duty form, the "lazy" form that isn't transactional,
and an intermediate form that is willing to move tuples in simpler
cases (none of that tuple-chain-moving stuff please ;-)).  But I'm
not buying into producing the intermediate form in this go-round.
Let's build the lazy form first and get some experience with it
before we decide if we need yet another kind of VACUUM.

>> To do that, I propose a free space map (FSM) kept in shared memory, which
>> will tell backends which pages of a relation have free space.  Only if the
>> FSM shows no free space available will the relation be extended to insert
>> a new or updated tuple.

> I assume that now is not a good time to bring up memory-mapped files? ;-}

Don't see the relevance exactly ...

> Were you planning on just a free byte count, or something smaller? Dec/RDB
> uses a nast system of DBA-defined thresholds for each storage area: 4 bits,
> where 0=empty, and 1, 2 & 3 indicate above/below thresholds (3 is also
> considered 'full'). The thresholds are usually set based on average record
> sizes. In this day & age, I suspect a 1 byte percentage, or 2 byte count is
> OK unless space is really a premium.

I had toyed with two different representations of the FSM:

1. Bitmap: one bit per page in the relation, set if there's an
"interesting" amount of free space in the page (exact threshold ???).
DEC's approach seems to be a generalization of this.

2. Page list: page number and number of free bytes.  This is six bytes
per page represented; you could maybe compress it to 5 but I'm not sure
there's much point.

I went with #2 mainly because it adapts easily to being forced into a
limited amount of space (just forget the pages with least free space)
which is critical for a table to be kept in shared memory.  A bitmap
would be less forgiving.  #2 also gives you a better chance of going
to a page that actually has enough free space for your tuple, though
you'd still need to be prepared to find out that it no longer has
enough once you get to it.  (Whereupon, you go back to the FSM, fix
the outdated entry, and keep searching.)

> While on the subject of record keeping, it would be great if it was coded
> to collect statistics about it's own operation for Jan's stats package.

Seems like a good idea, but I've seen no details yet about that package...

> This seems to have the potential to create as many false FSM page entries
> as there are backends. Is it really that expensive to lock the FSM table
> entry, subtract a number, then unlock it?

Yes, if you are contending against N other backends to get that lock.
Remember the *whole* point of this design is to avoid locking as much
as possible.  Too many trips to the FSM could throw away the performance
advantage.

> Is it possible/worth adding a 'blocking notification' to the lock manager.
> Then VACUUM could choose to terminate/restart when someone wants to do a
> schema change. This is realy only relevant if the VACUUM will be prolonged.

Seems messier than it's worth ... the VACUUM might not be the only thing
holding off your schema update anyway, and regular transactions aren't
likely to pay any attention.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: Re: Plans for solving the VACUUM problem
Next
From: Hannu Krosing
Date:
Subject: Re: Plans for solving the VACUUM problem