Streaming replication status - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Streaming replication status |
Date | |
Msg-id | 4B47A09C.8070704@enterprisedb.com |
List | pgsql-hackers |
I've gone through the patch in detail now. Here's my list of remaining issues:

* If there's no WAL to send, walsender doesn't notice if the client has closed the connection already. This is the issue Fujii reported already. We'll need to add a select() call to the walsender main loop to check if the socket has been closed.

* I removed the feature that archiver was started during recovery. The idea of that was to enable archiving from a standby server, to relieve the master server of that duty, but I found it annoying because it causes trouble if the standby and master are configured to archive to the same location; they will fight over which one copies the file to the archive first. Frankly the feature doesn't seem very useful as the patch stands, because you still have to configure archiving in the master in practice; you can't take an online base backup otherwise, and you have the risk of the standby falling too far behind and having to restore from base backup whenever the standby is disconnected for any reason. Let's revisit this later when it's truly useful.

* We still have a related issue, though: if the standby is configured to archive to the same location as the master (as it always is on my laptop, where I use the postgresql.conf of the master unmodified in the server), right after failover the standby server will try to archive all the old WAL files that were streamed from the master; but they exist already in the archive, as the master archived them already. I'm not sure if this is a pilot error, or if we should do something in the server to tell apart WAL segments streamed from the master and those generated in the standby server after failover. Maybe we should immediately create a .done file for every file received from the master?

* I don't think we should require superuser rights for replication. Although you see all WAL and potentially all data in the system through that, a standby doesn't need any write access to the master, so it would be good practice to create a dedicated account with limited privileges for replication.

* A standby that connects to the master, initiates streaming, and then sits idle stalls recycling of old WAL files in the master. That will eventually lead to a full disk in the master. Do we need some kind of an emergency valve on that?

* Do we really need REPLICATION_DEBUG_ENABLED? The output doesn't seem very useful to me.

* Need to add comments somewhere to note that ReadRecord depends on the fact that a WAL record is always sent as a whole, never split across two messages.

* Do we really need to split the sleep in walsender into NAPTIME_PER_CYCLE increments?

* Walreceiver should flush less aggressively than after each received piece of WAL, as noted by the XXX comment.

* Consider renaming PREPARE_REPLICATION to IDENTIFY_SYSTEM or something.

* What's the change in bgwriter.c for?

* ReadRecord/FetchRecord is a bit of a mess. I earlier tried to refactor it into something simpler a couple of times, but failed. So I'm going to leave it as it is, but if someone else wants to give it a shot, that would be good.

* Documentation. The patch used to move around some sections, but I think that has been partially reverted so that it now just duplicates them. It probably needs other work too; I haven't looked at the docs in any detail.

These are all the issues I know of right now. Assuming no new issues crop up (which often does happen), the patch is ready for committing after those have been addressed.

Attached is my latest version as a patch, also available in my git repository.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
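
The ".done file" idea in the third item can be sketched as follows. The archiver tracks each segment with an empty status file named after the segment (.ready while it is waiting, .done once archived), so creating the .done file up front would make a streamed segment look already archived. The helper name mark_segment_done and its arguments are assumptions for illustration only; this is not how the patch does it, and the server has its own routines for this.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an empty "<status_dir>/<segment>.done" file
 * so that the segment is treated as already archived.
 * Returns 0 on success, -1 if the path is too long or the file
 * cannot be created.
 */
static int
mark_segment_done(const char *status_dir, const char *segment)
{
    char    path[1024];
    int     len;
    int     fd;

    len = snprintf(path, sizeof(path), "%s/%s.done", status_dir, segment);
    if (len < 0 || (size_t) len >= sizeof(path))
        return -1;

    /* O_CREAT without O_EXCL: an existing .done file is not an error. */
    fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    return close(fd);
}
```

Whether marking streamed segments this way is right depends on the open question in the item itself, namely whether archiving to a shared location from both servers is a configuration mistake in the first place.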