Re: Synchronization levels in SR - Mailing list pgsql-hackers
From | Boszormenyi Zoltan |
---|---|
Subject | Re: Synchronization levels in SR |
Date | |
Msg-id | 4C80F0BD.1080109@cybertec.at |
In response to | Re: Synchronization levels in SR (Fujii Masao <masao.fujii@gmail.com>) |
List | pgsql-hackers |
Fujii Masao wrote:
> On Fri, Sep 3, 2010 at 6:43 PM, Boszormenyi Zoltan <zb@cybertec.at> wrote:
>
>> In my patch, when the transactions were waiting for ack from
>> the standby, they have already released all their locks, the wait
>> happened at the latest possible point in CommitTransaction().
>>
>> In Fujii's patch (I am looking at synch_rep_0722.patch, is there
>> a newer one?)
>>
>
> No ;)
>
> We'll have to create the patch based on the result of the recent
> discussion held on other thread.
>
>
>> the wait happens in RecordTransactionCommit()
>> so other transactions still see the sync transaction and most
>> importantly, the locks held by the sync transaction will make
>> the async transactions waiting for the same lock wait too.
>>
>
> The transaction should be invisible to other transactions until
> its replication has been completed.

Invisible? How can it be invisible? You are in RecordTransactionCommit(),
even before calling ProcArrayEndTransaction(MyProc, latestXid) and
releasing the locks the transaction holds.

> So I put the wait before
> CommitTransaction() calls ProcArrayEndTransaction(). Is this unsafe?
>

I don't know whether it's unsafe. In my patch, I only registered the Xid
at the point where you do WaitXLogSend(); this was the safe point to set
up the waiting for sync ack. Otherwise, when the Xid registration for the
sync ack was done in CommitTransaction(), later than
RecordTransactionCommit(), there was a race between the primary and the
standby. The scenario was that the standby received and processed the
COMMIT of certain Xids even before the backend on the primary had properly
registered its Xid, so the backend set up the waiting for sync ack after
this Xid had already been acked by the standby. The result was stuck
backends.

My idea to split up the registration for the wait and the waiting itself
would allow for a transaction-level synchronous setting, i.e. in my patch
the transaction released the locks and did all the post-commit cleanups,
and only *then* waited for the sync ack if needed. In the meantime,
because the locks were already released, other transactions could
progress with their job, allowing e.g. async transactions to progress and
theoretically finish faster than the sync transaction that was waiting
for the ack.

The solution in my patch was not racy: registration of the Xid was done
before XLogInsert() in RecordTransactionCommit(). If the standby acked
the Xid to the primary before the backend reached the end of
CommitTransaction(), then this backend didn't even need to wait, because
the Xid was found in its PGPROC structure and the waiting for sync ack
was torn down.

But with LSNs, you are waiting for XactLastRecEnd, which is set by
XLogInsert(). I don't know if it's safe to call WaitXLogSend() after
XLogFlush() in RecordTransactionCommit(). I remember that in previous
versions of my patch, even if I put the waiting for sync ack directly
after

    latestXid = RecordTransactionCommit();

in CommitTransaction(), there were cases when I got stuck backends after
a pgbench run. I had the primary and the standbys on the same machine on
different ports, so the ack was almost instant, which wouldn't be the
case with a real network. But the race condition was still there; it
just doesn't show up when the network is slower than memory.

In your patch, the waiting happens almost at the end of
RecordTransactionCommit(), so theoretically it has the same race
condition. Am I missing something?

Best regards,
Zoltán Böszörményi

> Regards,
>
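To make the race described above concrete, here is a minimal, single-threaded sketch that replays the worst-case interleaving: the standby acks the Xid immediately after the commit record is shipped. It is only a toy model, not code from either patch. `BackendSlot`, `register_for_ack()`, `standby_ack()` and `backend_gets_stuck()` are hypothetical names invented for this illustration, and the slot is a stand-in for the wait state kept in PGPROC, not the real structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-backend wait state; NOT the real PGPROC. */
typedef struct
{
    uint32_t wait_xid;          /* Xid registered for sync ack, 0 = not waiting */
} BackendSlot;

/* Backend side: announce that this backend will wait for an ack of xid. */
static void
register_for_ack(BackendSlot *slot, uint32_t xid)
{
    assert(xid != 0);           /* 0 is reserved for "not waiting" */
    slot->wait_xid = xid;
}

/*
 * Ack from the standby arrives for xid. A matching registration is torn
 * down; if no registration exists yet, the ack is simply lost.
 */
static bool
standby_ack(BackendSlot *slot, uint32_t xid)
{
    if (slot->wait_xid == xid)
    {
        slot->wait_xid = 0;
        return true;
    }
    return false;
}

/* At the wait point, the backend blocks only if it is still registered. */
static bool
backend_would_block(const BackendSlot *slot)
{
    return slot->wait_xid != 0;
}

typedef enum
{
    REGISTER_BEFORE_COMMIT_RECORD,  /* register, then write the commit record */
    REGISTER_AFTER_COMMIT_RECORD    /* write the commit record, then register */
} RegistrationPoint;

/*
 * Replay the worst case: the standby acks right after the commit record
 * is written and shipped. Returns true if the backend ends up waiting for
 * an ack that has already come and gone.
 */
bool
backend_gets_stuck(RegistrationPoint when)
{
    BackendSlot slot = {0};
    const uint32_t xid = 1234;  /* arbitrary example Xid */

    if (when == REGISTER_BEFORE_COMMIT_RECORD)
        register_for_ack(&slot, xid);

    /* The commit record is written here; the standby replies at once. */
    standby_ack(&slot, xid);

    if (when == REGISTER_AFTER_COMMIT_RECORD)
        register_for_ack(&slot, xid);

    /* No further ack for this Xid will ever arrive. */
    return backend_would_block(&slot);
}
```

In this model, `backend_gets_stuck(REGISTER_AFTER_COMMIT_RECORD)` reports a stuck backend, while `backend_gets_stuck(REGISTER_BEFORE_COMMIT_RECORD)` does not, because the early ack finds the registration and tears it down. A real implementation would also need locking around the slot and a proper sleep/wakeup mechanism, both of which are left out here.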