On Fri, 2005-10-28 at 08:31 -0300, Alvaro Herrera wrote:
> Simon Riggs wrote:
> > On Fri, 2005-10-28 at 13:21 +1300, Mark Kirkwood wrote:
> >
> > > regression=# SELECT c.relname, m.relblocknumber, m.blockfreebytes
> > > FROM pg_freespacemap m INNER JOIN pg_class c
> > > ON c.relfilenode = m.relfilenode LIMIT 10;
> >
> >
> > I like this, but not because I want to read it myself, but because I
> > want to make autovacuum responsible for re-allocating free space when it
> > runs out. This way we can have an autoFSM feature in 8.2
>
> What do you mean, re-allocating free space? I don't understand what you
> are proposing.

Moving to -hackers.

FSM currently focuses on reusing holes in a table. It does nothing to
help with the allocation of space for extending tables.

IMHO, there are a few issues with the current FSM implementation. As
usual, I am discussing the very highest end of performance:

1. Data Block Contention: If you have many free blocks in the FSM and
many concurrent UPDATE/INSERTers, then each gets its own data block and
experiences little contention. Once the FSM is used up, each new block
is allocated by relation extension. At this point, all UPDATE/INSERTers
attempt to use the same newly added block, and contention increases as
a result. ISTM that if we were to re-fill the FSM with freshly allocated
blocks then we would be able to continue without data block contention.
(We would still have some index block contention, but that is a separate
issue.)
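
As a rough illustration of point 1 only, here is a toy Python model of
the refill idea. Everything in it (ToyFSM, refill_batch, get_block) is a
hypothetical name for this sketch and not the actual FSM code or API. It
assumes that when the free list runs dry the relation is extended by a
batch of blocks, so that the next several inserters each get a different
block.

```python
from collections import deque


class ToyFSM:
    """Toy per-relation free list. Hypothetical; not PostgreSQL's FSM."""

    def __init__(self, refill_batch=8):
        self.free_blocks = deque()   # block numbers with free space
        self.relation_length = 0     # current relation size, in blocks
        self.refill_batch = refill_batch

    def extend_relation(self, nblocks):
        """Pretend to extend the relation; return the new block numbers."""
        first = self.relation_length
        self.relation_length += nblocks
        return list(range(first, first + nblocks))

    def get_block(self):
        """Hand out a block, refilling from a batch extension when empty."""
        if not self.free_blocks:
            self.free_blocks.extend(self.extend_relation(self.refill_batch))
        return self.free_blocks.popleft()


fsm = ToyFSM(refill_batch=4)
# Four "concurrent" inserters each receive a distinct block.
handed_out = [fsm.get_block() for _ in range(4)]
print(handed_out)
```

The model ignores locking and WAL entirely; it only shows that a batch
refill spreads inserters over several blocks rather than one.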

2. FSM Contention: As the FSM empties, it takes longer and longer to
find a free data block to insert into. When the FSM is empty, the search
time to discover that no free blocks are available is O(N), so the
freespace lock is held for longer the bigger you make the FSM. So
refilling the FSM automatically when it empties again seems like a
reasonable way to reduce contention. (Perhaps another way would be
simply to alter the search algorithm to make it O(1) when the FSM is
empty, which is simpler than it sounds.)
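
One possible way to get the O(1) empty case, purely as a sketch: keep a
count of entries that still have free space, and check it before
scanning. The class and method names below are hypothetical and are not
the structures in the real freespace code; this is an assumption about
how such a shortcut might look, not a description of any existing or
planned implementation.

```python
class CountedFSM:
    """Toy free space table with an O(1) emptiness check. Hypothetical."""

    def __init__(self, nslots):
        self.free_bytes = [0] * nslots  # free bytes recorded per slot
        self.nonempty = 0               # slots with free_bytes > 0

    def record(self, slot, nbytes):
        """Record free space for a slot, keeping the counter in step."""
        had_space = self.free_bytes[slot] > 0
        has_space = nbytes > 0
        self.free_bytes[slot] = nbytes
        self.nonempty += int(has_space) - int(had_space)

    def find(self, needed):
        """Return a slot with at least `needed` bytes free, or None."""
        if self.nonempty == 0:
            return None                 # empty: answered without a scan
        for slot, avail in enumerate(self.free_bytes):
            if avail >= needed:
                return slot
        return None
```

The scan is still O(N) when some entries exist but none is big enough;
only the completely empty case is short-circuited here.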

3. Helping Readahead efficiency: Currently blocks are allocated one at a
time. If many tables are extending at the same time, the blocks from
multiple tables will be intermixed on the disk. Reading the data back
takes more head movement and reduces the I/O rate. Allocating the blocks
on disk in larger chunks would help to reduce that. Doing so would
require us to keep track of the allocated but not yet used blocks, which
is exactly what the FSM already does for us. So automatically refilling
the FSM seems like a possible way of doing that, since the FSM
effectively tracks which relations extend frequently and for which
larger allocations would be a win. (Larger allocations in all cases
would give very poor disk usage that we might call fragmentation, if we
can avoid debating that word.)
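
To make point 3 concrete, here is a small toy simulation. It is only a
sketch under stated assumptions: chunk_blocks and layout are
hypothetical helpers, the "disk" is just a list of block owners, and the
power-of-two sizing rule is an arbitrary choice for illustration rather
than a proposal.

```python
def chunk_blocks(recent_extensions, max_chunk=64):
    """Pick a chunk size that grows with how often a relation extends."""
    chunk = 1
    while chunk < recent_extensions and chunk < max_chunk:
        chunk *= 2
    return min(chunk, max_chunk)


def layout(requests, chunk_for):
    """Return on-disk block ownership for a sequence of block requests.

    requests:  relation names, in the order each needs one more block.
    chunk_for: function mapping a relation name to its chunk size.
    """
    disk = []        # owner of each physical block, in disk order
    reserved = {}    # relation -> preallocated blocks not yet used
    for rel in requests:
        if reserved.get(rel, 0) == 0:
            n = chunk_for(rel)
            disk.extend([rel] * n)   # allocate n contiguous blocks
            reserved[rel] = n
        reserved[rel] -= 1
    return disk


# Two tables, "a" and "b", extending alternately.
print("".join(layout("abababab", lambda rel: 1)))
print("".join(layout("abababab", lambda rel: 4)))
```

With a chunk size of 1 the two tables alternate block by block on disk;
with a chunk size of 4 each table's blocks come out contiguous.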

There are other solutions to the above issues, so I really should have
started with the above as a problem statement rather than driving
straight to a partially thought-through solution.

Do we agree those problems exist?

(I'm not intending to work on these issues myself anytime soon, so I'm
happy for others to go for it.)

Best Regards, Simon Riggs