Re: New FSM patch - Mailing list pgsql-hackers

From: Heikki Linnakangas
Subject: Re: New FSM patch
Msg-id: 48D11497.8050303@enterprisedb.com
In response to: Re: New FSM patch (Zdenek Kotala <Zdenek.Kotala@Sun.COM>)
Responses: Re: New FSM patch (Decibel! <decibel@decibel.org>)
           Re: New FSM patch (Zdenek Kotala <Zdenek.Kotala@Sun.COM>)
List: pgsql-hackers
Zdenek Kotala wrote:
> I tested it with DTrace on Solaris 10 on an 8-CPU SPARC machine, and I 
> got similar results to yours. The main problem in your new 
> implementation is locking. On small tables, where the FSM fits on one 
> page, clients spend about 3/4 of their time waiting on the page lock.

That in itself is not that surprising. The test case does nothing but 
exercise the FSM, so it's pretty obvious that all the time will be spent 
in the FSM, and that the clients will be serialized on the lock on the 
single FSM page, just as they are currently serialized on the 
FreeSpaceMapLock.

I think it's pretty unlikely we'd run into FSM congestion on a small 
table in real life. If you're (non-HOT) updating a table at such a high 
rate, it's not going to stay small for long.

> On medium tables (two-level FSM), the WALInsertLock then becomes 
> significant - it takes 1/4 of the waiting time. Page waits take "only" 
> 1/3.

That's a more interesting case from a scalability point of view.

For a medium-sized table, we could implement a "fast root" optimization, 
similar to B-tree's, to speed up the searches. If the heap is small 
enough that not all FSM levels are needed, we could simply start searches 
from a lower level. That should reduce the page locking significantly. 
However, when searching, the upper levels are locked in shared, not 
exclusive, mode, so it's not clear how much that would help with the 1/3 
that's now spent waiting on pages.
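Just to illustrate the idea, a minimal sketch of choosing the starting 
level, assuming a three-level tree; the constant and the helper name are 
made up for illustration, not from the patch:

    #include "postgres.h"
    #include "storage/block.h"

    /* Illustrative only; the real value depends on the FSM page layout */
    #define SLOTS_PER_FSM_PAGE	4000

    /*
     * Hypothetical helper: return the lowest FSM level whose single page
     * covers the whole heap, so the search can start there instead of at
     * the root (level 2 in a three-level tree).
     */
    static int
    fsm_search_start_level(BlockNumber heap_blocks)
    {
    	if (heap_blocks <= SLOTS_PER_FSM_PAGE)
    		return 0;	/* one leaf page covers the whole heap */
    	if (heap_blocks <= (uint64) SLOTS_PER_FSM_PAGE * SLOTS_PER_FSM_PAGE)
    		return 1;	/* skip the root, start at the middle level */
    	return 2;		/* large heap: start at the root */
    }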

> Suggestions:
> 
> 1) Remove WAL logging. I think the FSM records should be reconstructed 
> during the processing of other WAL records (like insert and update). We 
> probably only need a full-page write on the first modification after a 
> checkpoint.

Hmm, we don't have the infrastructure to do that, but I guess it's 
doable. In case of a crash, the FSM information after recovery obviously 
wouldn't be up-to-date. And the FSM information in a warm standby would 
also lag behind.

One option would be to put RecordPageWithFreeSpace() calls into 
heap_redo, so that the FSM would be updated based on the WAL records we 
write anyway.
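Sketching it, at the end of heap_xlog_insert() in heapam.c; 
XLogRecordPageWithFreeSpace() here is a hypothetical recovery-safe 
variant of RecordPageWithFreeSpace() taking a RelFileNode, since no 
Relation is available during replay, and the threshold is an arbitrary 
pick:

    	/* ...after the tuple has been replayed into the page... */
    	Size		freespace = PageGetHeapFreeSpace(page);

    	PageSetLSN(page, lsn);
    	MarkBufferDirty(buffer);
    	UnlockReleaseBuffer(buffer);

    	/*
    	 * Update the FSM as a side effect of the heap record we're
    	 * replaying anyway, instead of WAL-logging FSM changes
    	 * separately.  Only bother when the page is getting full, to
    	 * keep the overhead down.
    	 */
    	if (freespace < BLCKSZ / 5)
    		XLogRecordPageWithFreeSpace(xlrec->target.node,
    						ItemPointerGetBlockNumber(&xlrec->target.tid),
    						freespace);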

I think we'd still need to WAL-log operations that decrease the amount of 
free space on a page. Otherwise, once we have partial vacuum, we might 
never revisit a page and update the FSM, even though there's usable space 
on the page, leaving that space forgotten forever.

And the torn page problem would need to be handled somehow.

> 2) Break up the lock - use only a shared lock for page locking, and 
> divide the page into smaller parts for exclusive locking (at least for 
> the root page).

That sounds difficult. But I don't think congestion on a single FSM page 
is very likely in real life, unless the root page becomes a bottleneck. 
Can you figure out how much of the lock waiting comes from the 
upper-level and root pages?

> However, your test case is too artificial. I'm going to run an OLTP 
> workload and test it with a "real" workload.

Agreed. It should be possible to emulate the pattern the FSM really sees 
(a rough driver for this is sketched after the list):

- Tables become fuller and fuller until no more tuples fit. Vacuum sweeps 
through them periodically, adding a lot of free space to each page.
- RecordPageWithFreeSpace() is only called when a tuple being inserted 
doesn't fit on the previous page the backend inserted into.
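For what it's worth, a libpq driver along these lines could approximate 
that pattern; the table name, row width, churn rate, and vacuum interval 
are all arbitrary picks, and a real test would run several of these in 
parallel:

    #include <stdio.h>
    #include <libpq-fe.h>

    static void
    run(PGconn *conn, const char *sql)
    {
    	PGresult   *res = PQexec(conn, sql);

    	if (PQresultStatus(res) != PGRES_COMMAND_OK &&
    		PQresultStatus(res) != PGRES_TUPLES_OK)
    		fprintf(stderr, "%s: %s", sql, PQerrorMessage(conn));
    	PQclear(res);
    }

    int
    main(void)
    {
    	PGconn	   *conn = PQconnectdb("");	/* uses PG* environment vars */
    	int			i;

    	if (PQstatus(conn) != CONNECTION_OK)
    	{
    		fprintf(stderr, "%s", PQerrorMessage(conn));
    		return 1;
    	}

    	run(conn, "CREATE TABLE fsmtest (id int, filler text)");

    	for (i = 0; i < 100000; i++)
    	{
    		/* insert/delete churn, roughly mimicking non-HOT updates */
    		run(conn, "INSERT INTO fsmtest "
    				  "SELECT g, repeat('x', 500) "
    				  "FROM generate_series(1, 100) g");
    		run(conn, "DELETE FROM fsmtest WHERE random() < 0.5");

    		/* periodic vacuum frees a lot of space on many pages at once */
    		if (i % 100 == 99)
    			run(conn, "VACUUM fsmtest");
    	}

    	PQfinish(conn);
    	return 0;
    }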

However, even if these artificial tests show that the new implementation 
is X times slower than the old one, the question they can't answer is 
whether that matters. If only, say, 0.01% of the total time were spent in 
the FSM, a 10x slowdown would be acceptable. A full-blown OLTP benchmark 
should give some clue about that.

Other important test cases are various bulk-insert-style tests.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

