Re: Plans for solving the VACUUM problem - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Plans for solving the VACUUM problem
Date
Msg-id 200105180227.f4I2Rpa13258@candle.pha.pa.us
In response to Plans for solving the VACUUM problem  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Plans for solving the VACUUM problem  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> Free space map details
> ----------------------
> 
> I envision the FSM as a shared hash table keyed by table ID, with each
> entry containing a list of page numbers and free space in each such page.
> 
> The FSM is empty at system startup and is filled by lazy VACUUM as it
> processes each table.  Backends then decrement/remove page entries as they
> use free space.
> 
> Critical point: the FSM is only a hint and does not have to be perfectly
> accurate.  It can omit space that's actually available without harm, and
> if it claims there's more space available on a page than there actually
> is, we haven't lost much except a wasted ReadBuffer cycle.  This allows
> us to take shortcuts in maintaining it.  In particular, we can constrain
> the FSM to a prespecified size, which is critical for keeping it in shared
> memory.  We just discard entries (pages or whole relations) as necessary
> to keep it under budget.  Obviously, we'd not bother to make entries in
> the first place for pages with only a little free space.  Relation entries
> might be discarded on a least-recently-used basis.
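Here is a minimal Python sketch of the bounded, hint-only FSM Tom describes. It is an illustration under stated assumptions, not PostgreSQL code. The class FreeSpaceMap, its methods, and the parameters max_pages (the shared-memory budget, counted in page entries) and min_free (the threshold below which a page is not worth recording) are all hypothetical. A Python OrderedDict stands in for the shared hash table keyed by table ID, and its ordering provides least-recently-used eviction of whole relations.

```python
from collections import OrderedDict


class FreeSpaceMap:
    """Hypothetical sketch of a bounded, hint-only free space map.

    Maps table ID -> {page number: free bytes}. The OrderedDict stands in
    for a shared-memory hash table; its order doubles as LRU order for
    relation entries.
    """

    def __init__(self, max_pages, min_free):
        self.max_pages = max_pages  # assumed budget: total page entries across all tables
        self.min_free = min_free    # assumed threshold: pages with less free space are not recorded
        self.tables = OrderedDict()
        self.total = 0              # page entries currently stored

    def _drop(self, table_id):
        entry = self.tables.pop(table_id, None)
        if entry is not None:
            self.total -= len(entry)

    def record_table(self, table_id, pages):
        """Called by lazy VACUUM after scanning a table; replaces its old entry."""
        self._drop(table_id)
        entry = {p: free for p, free in pages.items() if free >= self.min_free}
        if len(entry) > self.max_pages:
            # One table alone exceeds the budget: keep only its roomiest pages.
            roomiest = sorted(entry.items(), key=lambda kv: kv[1], reverse=True)
            entry = dict(roomiest[: self.max_pages])
        self.tables[table_id] = entry  # new entry lands at the most-recently-used end
        self.total += len(entry)
        # Evict least-recently-used relations until back under budget.
        # The new entry is at the MRU end and fits on its own, so it survives.
        while self.total > self.max_pages:
            self._drop(next(iter(self.tables)))

    def find_page(self, table_id, needed):
        """Called by a backend: return a page believed to have `needed` bytes free, or None."""
        entry = self.tables.get(table_id)
        if entry is None:
            return None
        self.tables.move_to_end(table_id)  # mark the relation as recently used
        for page, free in entry.items():
            if free >= needed:
                return page
        return None

    def use_space(self, table_id, page, used):
        """Called after a backend consumes space on `page`: shrink or drop the entry."""
        entry = self.tables.get(table_id)
        if entry is None or page not in entry:
            return
        remaining = entry[page] - used
        if remaining < self.min_free:
            del entry[page]  # too little left to be worth tracking
            self.total -= 1
        else:
            entry[page] = remaining
```

Because the map is only a hint, none of this has to be exact: a stale entry costs the backend one wasted buffer read, and use_space shrinks or drops it afterward.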

The only question I have is about the Free Space Map.  It would seem
better to me if we could get this map closer to the table itself, rather
than having every table of every database mixed into the same shared
memory area.  I can just see random table access clearing out most of
the map cache and perhaps making it much less useful.

It would be nice if we could store the map on the first page of the disk
table, or store it in a flat file per table.  I know neither of these
ideas will work, but I am just throwing them out to see if someone has a
better idea.

I wonder if cache failures, that is, requests for free space that the map
cannot satisfy, should be what drives the vacuum daemon to vacuum a table.
Sort of like, "Hey, someone is asking for free pages for that table.
Let's go find some!"  That may work really well.

Another advantage of centralization is that we can record update/delete
counters per table, helping tell vacuum where to vacuum next.  Vacuum
roaming around looking for old tuples seems wasteful.
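The two ideas above, letting unsatisfied free-space requests trigger vacuuming and keeping per-table update/delete counters, could be combined roughly as follows. This is a minimal Python sketch under assumptions, not an agreed design. VacuumScheduler, its methods, and the scoring rule (one FSM miss counting as miss_weight dead tuples, default 10) are all hypothetical.

```python
from collections import Counter


class VacuumScheduler:
    """Hypothetical bookkeeping for a vacuum daemon.

    Tracks, per table, updates/deletes since the last vacuum and how often
    backends asked the FSM for free space in that table and got nothing.
    """

    def __init__(self, miss_weight=10):
        self.dead = Counter()    # updates + deletes since last vacuum (each leaves a dead tuple)
        self.misses = Counter()  # FSM lookups for this table that found no free page
        self.miss_weight = miss_weight  # assumed: one FSM miss counts as this many dead tuples

    def note_update_or_delete(self, table_id, n=1):
        self.dead[table_id] += n

    def note_fsm_miss(self, table_id):
        self.misses[table_id] += 1

    def next_table(self):
        """Return the table most worth vacuuming next, or None if there is nothing to do."""
        best, best_score = None, 0
        # Iterate in sorted order so ties are broken deterministically.
        for table_id in sorted(set(self.dead) | set(self.misses)):
            score = self.dead[table_id] + self.miss_weight * self.misses[table_id]
            if score > best_score:
                best, best_score = table_id, score
        return best

    def vacuumed(self, table_id):
        """Reset a table's counters once lazy VACUUM has processed it."""
        self.dead.pop(table_id, None)
        self.misses.pop(table_id, None)
```

The weight is arbitrary here; misses are weighted heavily only because each one is direct evidence that a backend wanted space in that table and the map had none to offer.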

Also, I suppose if we have the map act as a shared table cache (fseek
info), it may offset the disadvantage of having it all centralized.

I know I am tossing out both the advantages and the disadvantages of
centralization, but I thought I would share the ideas.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026