Re: Synchronous replication, reading WAL for sending - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Synchronous replication, reading WAL for sending |
Date | |
Msg-id | 1230050922.4793.893.camel@ebony.2ndQuadrant |
In response to | Synchronous replication, reading WAL for sending (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses | Re: Synchronous replication, reading WAL for sending |
List | pgsql-hackers |
On Tue, 2008-12-23 at 17:42 +0200, Heikki Linnakangas wrote:

> As the patch stands, whenever XLOG segment is switched in XLogInsert, we
> wait for the segment to be sent to the standby server. That's not good.
> Particularly in asynchronous mode, you'd expect the standby to not have
> any significant ill effect on the master. But in case of a flaky network
> connection, or a busy or dead standby, it can take a long time for the
> standby to respond, or the primary to give up. During that time, all WAL
> insertions on the primary are blocked. (How long is the default TCP
> timeout again?)

Ugh, didn't see that. Get rid of that.

We managed to get rid of the fsync of the control file when we changed
WAL file at start of 8.3. That had a major effect on performance, via
reduced response time profiles. No need to re-introduce a delay in the
same place.

> Another point is that in the future, we really shouldn't require setting
> up archiving and file-based log shipping using external scripts, when
> all you want is replication. It should be enough to restore a base
> backup on the standby, and point it to the IP address of the primary,
> and have it catch up. This is very important, IMHO. It's quite a lot of
> work to set up archiving and log-file shipping, for no obvious reason.
> It's really only needed at the moment because we're building this
> feature from spare parts.

Happy for that to be hidden more from users.

> For those reasons, we need a way to send arbitrary ranges of WAL from
> primary to standby. The current method where the WAL is read from
> wal_buffers obviously only works for very recent WAL pages that are
> still in wal_buffers. The design should be changed so that instead of
> reading from wal_buffers, the WAL is read from filesystem.

There are two basic ways: from memory and from files. Sure we can hide
the two mechanisms in code better, but they will remain fairly distinct.
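[Editorial illustration, not part of the original message or the patch.] The two read paths discussed above, and the idea of hiding them behind a single call, can be sketched roughly as follows. This is a toy model in Python under loud assumptions: `ToyWal`, `read_range`, `on_disk`, `in_memory` and `memory_start` are hypothetical names, a `bytearray` stands in for the WAL segment files, and a bounded `bytearray` stands in for wal_buffers. It does not reflect PostgreSQL's actual data structures or locking.

```python
class ToyWal:
    """Toy write-ahead log: every byte goes to 'disk'; only the newest
    buffer_size bytes are also kept in memory (a stand-in for wal_buffers)."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.on_disk = bytearray()    # hypothetical stand-in for WAL segment files
        self.in_memory = bytearray()  # hypothetical stand-in for wal_buffers
        self.memory_start = 0         # log position of in_memory[0]

    def append(self, data):
        """Write data to the log and keep only the newest bytes in memory."""
        self.on_disk += data
        self.in_memory += data
        overflow = len(self.in_memory) - self.buffer_size
        if overflow > 0:
            # Older bytes fall out of the in-memory window.
            del self.in_memory[:overflow]
            self.memory_start += overflow

    def read_range(self, start, end):
        """Return (source, data) for log positions [start, end).

        Recent ranges are served from memory (the fast path); anything
        that has already left the in-memory window is read from 'disk'.
        """
        if not 0 <= start <= end <= len(self.on_disk):
            raise ValueError("range outside written WAL")
        if start >= self.memory_start:
            lo = start - self.memory_start
            return "memory", bytes(self.in_memory[lo:lo + (end - start)])
        return "file", bytes(self.on_disk[start:end])
```

With an 8-byte window and 12 bytes appended, a request for positions 6 to 10 would be served from memory, while a request starting at position 0 would have to fall back to the file. The caller sees one function, but the two branches stay distinct, which is the trade-off being debated in this thread.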
> Sending directly from wal_buffers can be provided as a fastpath when
> sending recent enough WAL range, but I wouldn't bother complicating the
> code for now.

Sounds like you are saying we should completely replace
write-from-buffers with write-from-file?

Sending from wal_buffers is OK if wal_buffers is large enough. If
streaming replication falls so far behind that we have problems, then
there are larger issues to worry about, such as whether the primary is
being driven too hard for the network to cope.

Copying direct from memory means that a disk problem that occurs on the
primary will never cause corruption on the standby. Reading WAL files
can mean that corruptions get propagated.

The current design allows for file-based WAL sending, if the connection
is so poor that streaming won't work.

If you are seriously suggesting these things now then I'd like to see
some diagrams, designs and descriptions so we can all understand what is
being suggested and how it will cope with all the current requirements.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support