Re: Group commit, revised - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Group commit, revised |
Date | |
Msg-id | CA+U5nM+Yj7scbELbftqPi=Zn1Q6SDM+PDgM0npkiRRrc_tS-xg@mail.gmail.com |
In response to | Re: Group commit, revised (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses | Re: Group commit, revised |
List | pgsql-hackers |
On Mon, Jan 30, 2012 at 8:04 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

> So, what's the approach you're working on?

I've had a few days' leave at the end of last week, so no time to fully discuss the next steps with the patch. That's why you were requested not to commit anything.

You've suggested there was no reason to have the WALwriter be involved, which isn't the case, and made other comments about latches that weren't correct either.

The plan here is to allow WAL flush and clog updates to occur concurrently, which allows the clog contention and update time to be completely hidden behind the wait for the WAL flush. That is only possible if we have the WALwriter involved, since we need two processes to be actively involved.

It's a relatively minor change and uses code that is already committed and working, not some just-invented low-level stuff that might not work right. You might then ask, why the delay? Simply because my absence has prevented moving forwards. We'll have a patch tomorrow.

The theory behind this is clear, but needs some explanation.

There are 5 actions that need to occur at commit:

1) insert WAL record
2) optionally flush WAL record
3) mark the clog AND set LSN from (1) if we skipped (2)
4) optionally wait for sync rep
5) remove the proc from the procarray

Dependencies between those actions are these:

Step (3) must always happen before (5), otherwise we get race conditions in visibility checking.
Step (4) must always happen before (5), otherwise we also get race conditions in disaster cases.
Step (1) must always happen before (2), if it happens.
Step (1) must always happen before (3), if we skipped (2).

Notice that step (2) and step (3) are actually independent of each other.
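The dependency rules listed above can be checked mechanically. What follows is only a rough sketch in Python, not PostgreSQL code: the function name `violated_rules` is hypothetical, steps are identified purely by their numbers in the list, and an optional step is modelled by leaving its number out of the order.

```python
# Hypothetical model of the commit-step ordering rules; not PostgreSQL code.
# Steps use the numbers from the list above:
# 1 insert WAL, 2 flush WAL, 3 mark clog, 4 wait for sync rep, 5 leave procarray.

def violated_rules(order):
    """Return the (before, after) rules that the given step order breaks."""
    pos = {step: i for i, step in enumerate(order)}
    rules = [(3, 5), (4, 5)]      # (3) before (5), (4) before (5)
    if 2 in pos:
        rules.append((1, 2))      # (1) before (2), if (2) happens
    else:
        rules.append((1, 3))      # (1) before (3), if (2) was skipped
    # A rule only applies when both of its steps are present in the order.
    return [
        (a, b) for a, b in rules
        if a in pos and b in pos and pos[a] > pos[b]
    ]

# The conventional order satisfies every rule.
assert violated_rules([1, 2, 3, 4, 5]) == []
# Swapping (2) and (3) breaks nothing, since no rule relates them.
assert violated_rules([1, 3, 2, 4, 5]) == []
# Leaving the procarray before marking the clog breaks (3) before (5).
assert violated_rules([1, 2, 4, 5, 3]) == [(3, 5)]
```

Under this model, no rule mentions (2) and (3) together, which is the independence the rest of the argument relies on.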
So an improved design for commit is to:

2) request flush up to LSN, but don't wait
3) mark the clog and set LSN
4) wait for LSN once, either for walwriter or walsender to release us

This is free of race conditions as long as step (3) marks each clog page with a valid LSN, just as we would do for asynchronous commit.

Marking the clog with an LSN ensures that we issue XLogFlush(LSN) on the clog page before it is written, so we always get WAL flushed to the desired LSN before the clog mark appears on disk.

Does this cause any other behaviour? No, because the LSN marked on the clog is always flushed by the time we hit step (5), so there is no delay in any hint bit setting, or any other effect.

So step (2) requests the flush, which is then performed by WALwriter. The backend then performs (3) while the flush takes place and then waits at step (4) to be woken.

We only wait once in step (4), rather than waiting for flush at step (2) and then waiting again at step (4).

So we use the existing code path for TransactionIdAsyncCommitTree(), yet we wait at step (4) using the SyncRep code.

Step (5) happens last, as always.

There are two benefits to this approach:

* The clog update happens "for free" since it is hidden behind a longer running task
* The implementation uses already tested and robust code for SyncRep and AsyncCommit

-- 
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services