Re: Synchronous Log Shipping Replication - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Synchronous Log Shipping Replication |
Date | |
Msg-id | 3f0b79eb0809090412r1fa85a81i331c3db7ef833437@mail.gmail.com |
In response to | Re: Synchronous Log Shipping Replication (Simon Riggs <simon@2ndQuadrant.com>) |
List | pgsql-hackers |
On Tue, Sep 9, 2008 at 5:11 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> Yes. We should have a LogwrtRqst pointer and LogwrtResult pointer for
> the send operation. The Write and Send operations can then continue
> independently of one another. XLogInsert() cannot advance to a new page
> while we are waiting to send or write.

Agreed. To realize the various synchronous options, the Write and Send
operations should be treated separately. So, I'll introduce an XLogCtlSend
structure, which is the shared state data for WAL sending. XLogCtlInsert
might need a new field, LogsndResult, which indicates the byte position
that we have already sent. As you say, AdvanceXLInsertBuffer() must check
both the position that we have already written (fsynced) and the position
that we have already sent. I'm working on the detailed design of this now :)

> Notice that the Send process
> might be the bottleneck - that is the price of synchronous replication.

Really? In the benchmark results of my prototype, the bottleneck is still
disk I/O. The communication latency (between the master and the slave) is
smaller than the WAL writing (fsyncing) latency. Of course, I assume that
we use a reasonably fast network such as 1000BASE-T.

What would make the sender process the bottleneck?

> Backends then wait
> * not at all for asynch commit
> * just for Write for local synch commit
> * for both Write and Send for remote synch commit
> (various additional options for what happens to confirm Send)

I'd like to introduce a new parameter, "synchronous_replication", which
specifies whether backends wait for the response from the WAL sender
process. By combining synchronous_commit and synchronous_replication,
users can choose among various options.

> After (or during) XLogInsert backends will sleep in a proc queue,
> similar to LWlocks and protected by a spinlock. When preparing to
> write/send the WAL process should read the proc at the *tail* of the
> queue to see what the next LogwrtRqst should be. Then it performs its
> action and wakes procs up starting with the head of the queue. We would
> add LSN into PGPROC, so WAL processes can check whether the backend
> should be woken. The LSN field can be accessed without spinlocks since
> it is only ever set by the backend itself and only read while a backend
> is sleeping. So we access spinlock, find tail, drop spinlock then read
> LSN of the backend that (was) the tail.

Do you mean only the XLogInsert that handles a "commit record", or every
XLogInsert? Either way, ISTM that the response time would get worse :(

> Another thought occurs that we might measure the time a Send takes and
> specify a limit on how long we are prepared to wait for confirmation.
> Limit=0 => asynchronous. Limit > 0 implies synchronous-up-to-the-limit.
> This would give better user behaviour across a highly variable network
> connection.

From the viewpoint of detecting a network failure, this feature is
necessary. When the network goes down, the WAL sender can be blocked until
it detects the network failure, i.e. the WAL sender keeps waiting for a
response which never comes. A timeout notification is necessary in order
to detect a network failure quickly.

regards

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
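To make the discussion above concrete, here is a minimal sketch in C of the three wait levels, the "written and sent" check before a WAL buffer page is reused, and the head-to-tail wake-up of sleeping backends. It is not PostgreSQL code: every name (Lsn, WaitMode, WalProgress, WaitQueue and the functions) is hypothetical, an LSN is simplified to a single byte position, and spinlocks and real process wake-ups are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an LSN: a plain byte position in the WAL. */
typedef uint64_t Lsn;

/* How far a committing backend waits (illustrative names only). */
typedef enum { WAIT_NONE, WAIT_WRITE, WAIT_WRITE_AND_SEND } WaitMode;

/* Positions each operation has completed (assumed layout). */
typedef struct {
    Lsn written;  /* written and fsynced locally */
    Lsn sent;     /* sent to the standby */
} WalProgress;

/* True once a backend waiting for commit_lsn under `mode` may proceed. */
static bool commit_is_done(const WalProgress *p, Lsn commit_lsn, WaitMode mode)
{
    switch (mode) {
    case WAIT_NONE:
        return true;                       /* asynchronous commit */
    case WAIT_WRITE:
        return p->written >= commit_lsn;   /* local synchronous commit */
    case WAIT_WRITE_AND_SEND:              /* remote synchronous commit */
        return p->written >= commit_lsn && p->sent >= commit_lsn;
    }
    return false;
}

/* A buffer page ending at page_end is reusable only after BOTH the write
 * and the send positions have passed it. */
static bool page_is_reusable(const WalProgress *p, Lsn page_end)
{
    Lsn safe = p->written < p->sent ? p->written : p->sent;
    return safe >= page_end;
}

#define MAX_WAITERS 8

/* Sleeping backends in arrival order, so LSNs increase from head to tail. */
typedef struct {
    Lsn    lsn[MAX_WAITERS];  /* commit LSN each backend waits for */
    size_t head;              /* index of the oldest waiter */
    size_t tail;              /* one past the newest waiter */
} WaitQueue;

static void enqueue(WaitQueue *q, Lsn lsn)
{
    assert(q->tail < MAX_WAITERS);
    q->lsn[q->tail++] = lsn;
}

/* The next request target is the LSN of the tail waiter (0 if empty). */
static Lsn next_request(const WaitQueue *q)
{
    return q->head == q->tail ? 0 : q->lsn[q->tail - 1];
}

/* Wake waiters from the head while they are satisfied; returns the count. */
static size_t wake_waiters(WaitQueue *q, const WalProgress *p, WaitMode mode)
{
    size_t woken = 0;
    while (q->head < q->tail && commit_is_done(p, q->lsn[q->head], mode)) {
        q->head++;
        woken++;
    }
    return woken;
}
```

In this sketch a single slow Send holds back both remote-synchronous waiters and page reuse, which is the trade-off debated above; a timeout on the send wait would bound that delay.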