Re: Streaming Replication patch for CommitFest 2009-09 - Mailing list pgsql-hackers
From | Fujii Masao
---|---
Subject | Re: Streaming Replication patch for CommitFest 2009-09
Date | 
Msg-id | 3f0b79eb0909172250m71c942f8n820c94bc8a264176@mail.gmail.com
In response to | Re: Streaming Replication patch for CommitFest 2009-09 (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List | pgsql-hackers
Hi,

On Thu, Sep 17, 2009 at 8:32 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Some random comments:

Thanks for the comments.

> I don't think we need the new PM_SHUTDOWN_3 postmaster state. We can
> treat walsenders the same as the archive process, and kill and wait for
> both of them to die in PM_SHUTDOWN_2 state.

OK, I'll use PM_SHUTDOWN_2 for walsender instead of PM_SHUTDOWN_3.

> I think there's something wrong with the napping in walsender. When I
> perform pg_switch_xlog(), it takes surprisingly long for it to trickle
> to the standby. When I put a little proxy program in between the master
> and slave that delays all messages from the slave to the master by one
> second, it got worse, even though I would expect the master to still
> keep sending WAL at full speed. I get logs like this:

This is probably because the XLOG records following XLOG_SWITCH are sent
to the standby, too. Those records are obviously not used for recovery,
but they are sent anyway because walsender doesn't know where the
XLOG_SWITCH is. The difficulty is that there might be many XLOG_SWITCH
records in the XLOG files which walsender is going to send. How should
walsender learn those locations? One possible solution is to make
walsender parse the XLOG files and search for XLOG_SWITCH, but I think
that is overkill. An XLOG switch is not requested often, and in most
cases it is not sensitive to response time, so I don't think it's worth
changing walsender to skip the XLOG following an XLOG_SWITCH. Thoughts?

> 2009-09-17 14:14:09.932 EEST LOG: xlog send request 0/38000428; send
> 0/38000000; write 0/38000000
> 2009-09-17 14:14:09.932 EEST LOG: xlog read request 0/38000428; send
> 0/38000428; write 0/38000000
>
> It looks like it's having 100 or 200 ms naps in between. Also, I
> wouldn't expect to see so many "read request" acknowledgments from the
> slave.
> The master doesn't really need to know how far the slave is,
> except in synchronous replication when it has requested a flush to
> slave. Another reason why master needs to know is so that the master can
> recycle old log files, but for that we'd really only need an
> acknowledgment once per WAL file or even less.

You mean that a new protocol is required for asking the standby about
the completion location of replication? In the synchronous case, the
backend should not have to wait for an acknowledgement that arrives only
once per XLOG file; that would hurt performance.

> Why does XLogSend() care about page boundaries? Perhaps it's a leftover
> from the old approach that read from wal_buffers?

That is for not sending a partially-filled XLOG *record*, which
simplifies the logic by which the startup process waits for the next
XLOG record to become available; i.e., the startup process doesn't need
to cope with a partially-sent record.

> Do we really need the support for asynchronous backend libpq commands?
> Could walsender just keep blasting WAL to the slave, and only try to
> read an acknowledgment after it has requested one, by setting
> XLOGSTREAM_FLUSH flag. Or maybe we should be putting the socket into
> non-blocking mode.

Yes, that support is required, especially for synchronous replication.
Receiving an acknowledgement should not keep the subsequent XLOG-sending
waiting.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center