Re: New FSM patch - Mailing list pgsql-hackers
From | Heikki Linnakangas
---|---
Subject | Re: New FSM patch
Date |
Msg-id | 48D11497.8050303@enterprisedb.com
In response to | Re: New FSM patch (Zdenek Kotala <Zdenek.Kotala@Sun.COM>)
Responses | Re: New FSM patch; Re: New FSM patch
List | pgsql-hackers
Zdenek Kotala wrote:
> I tested it with DTrace on Solaris 10 and 8CPUs SPARC machine. I got
> similar result as you. Main problem in your new implementation is
> locking. On small tables where FSM fits on one page clients spend about
> 3/4 time to waiting on page lock.

That in itself is not that surprising. The test case does nothing other
than exercise the FSM, so it's pretty obvious that all the time will be
spent in the FSM, and the backends will be serialized on the lock on the
single FSM page, just as they are currently serialized on the
FreeSpaceMapLock.

I think it's pretty unlikely we'd run into FSM congestion on a small
table in real life. If you're (non-HOT) updating a table at such a high
rate, it's not going to stay small for long.

> On medium tables (2level FSM) then InsertWal lock become significant -
> it takes 1/4 of waiting time. Page waiting takes "only" 1/3.

That's a more interesting case from a scalability point of view.

For a medium-sized table, we could implement a "fast root" optimization,
similar to B-tree, to speed up the searches: if the heap is small enough
that not all FSM levels are needed, we could simply start searches from
the lower levels. That should reduce the page locking significantly.
However, when searching, the upper levels are locked in shared, not
exclusive, mode, so it's not clear how much that would help with the 1/3
that's now spent waiting on pages.

> Suggestions:
>
> 1) remove WAL logging. I think that FSM record should be recovered
> during processing of others WAL records (like insert, update). Probably
> only we need full page write on first modification after checkpoint.

Hmm, we don't have the infrastructure to do that, but I guess it's
doable. In case of a crash, the FSM information after recovery obviously
wouldn't be up-to-date, and the FSM information in a warm standby would
also lag behind.

One option would be to put RecordPageWithFreeSpace() calls into
heap_redo, so that the FSM would be updated based on the WAL records we
write anyway (a rough sketch of what I mean is below). I think we'd
still need to WAL-log operations that increase the amount of free space
on a page: otherwise, after we have partial vacuum, we might never
revisit a page and update the FSM even though there's usable space on
it, leaving that space forgotten forever. And the torn-page problem
would need to be handled somehow.

> 2) break lock - use only share lock for page locking and divide page
> for smaller part for exclusive locking (at least for root page)

That sounds difficult. But I don't think congestion on a single FSM page
is very likely in real life, unless the root page becomes a bottleneck.
Can you figure out how much of the lock waiting comes from the upper
level and root pages?

> However, your test case is too artificial. I'm going to run OLTP
> workload and test it with "real" workload.

Agreed. It should be possible to emulate the pattern the FSM really
sees:

- Tables become fuller and fuller, until no more tuples fit. Vacuum
  sweeps through them periodically, adding a lot of free space to each
  page.

- RecordPageWithFreeSpace() is only called when a tuple being inserted
  doesn't fit on the previous page the backend inserted to.

However, even if these artificial tests show that the new implementation
is X times slower than the old one, the question they can't answer is
whether that matters. If only, say, 0.01% of the total time is spent in
the FSM, a 10x slowdown would be acceptable. A full-blown OLTP benchmark
should give some clue on that.
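Coming back to the heap_redo idea above, here's a rough, untested sketch
of what that could look like. XLogRecordPageWithFreeSpace() is a name
I'm making up here for a recovery-side variant of
RecordPageWithFreeSpace() that would take a RelFileNode rather than a
Relation; the real interface would need some thought:

```c
/*
 * Sketch only: refresh the FSM from the heap-insert redo routine,
 * instead of WAL-logging FSM pages separately.  This would live in
 * heapam.c; error handling, full-page images and the init-page case
 * are glossed over.
 */
static void
heap_xlog_insert(XLogRecPtr lsn, XLogRecord *record)
{
    xl_heap_insert *xlrec = (xl_heap_insert *) XLogRecGetData(record);
    BlockNumber blkno = ItemPointerGetBlockNumber(&(xlrec->target.tid));
    Buffer      buffer;
    Page        page;

    /* ... existing redo work: backup blocks, etc. ... */

    buffer = XLogReadBuffer(xlrec->target.node, blkno, false);
    if (!BufferIsValid(buffer))
        return;
    page = BufferGetPage(buffer);

    /* ... apply the tuple, set the page LSN, mark the buffer dirty ... */

    /*
     * The new part: report the page's remaining free space to the FSM,
     * so that the map is roughly up to date after crash recovery and on
     * a warm standby, without separate FSM WAL records.
     */
    XLogRecordPageWithFreeSpace(xlrec->target.node, blkno,
                                PageGetFreeSpace(page));

    UnlockReleaseBuffer(buffer);
}
```

The same kind of call would presumably be needed in the other heap redo
routines (update, delete, and the vacuum/prune records) to catch the
cases where free space goes up as well.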
Other important test cases are various bulk insert kind of tests.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com