Switching timeline over streaming replication - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Switching timeline over streaming replication |
Date | |
Msg-id | 504F737E.1040103@iki.fi |
Responses |
Re: Switching timeline over streaming replication
|
List | pgsql-hackers |
I've been working on the often-requested feature to handle timeline changes over streaming replication. At the moment, if you kill the master and promote a standby server, and you have another standby server that you'd like to keep following the new master server, you need a WAL archive in addition to streaming replication to make it cross the timeline change. Streaming replication will just error out. Having a WAL archive is usually a good idea in complex replication scenarios anyway, but it would be good to not require it.

Attached is a WIP patch for that. It needs cleanup, but works.

Protocol changes
----------------

When we invented the Copy-both mode, we left out any description of how to get out of that mode, simply stating that both ends "may then send CopyData messages until the connection is terminated". The patch makes it possible to return to regular processing by sending a CopyDone message, like in normal Copy-in or Copy-out mode. Either end can take the initiative and send CopyDone, and after doing that may not send any more CopyData messages. When both ends have sent a CopyDone message, and received a CopyDone message from the other end, the connection is out of Copy mode, and the server finishes the command with a CommandComplete message.

Another way to think of it is that when the server sends a CopyDone message, the connection switches from Copy-both to Copy-in mode. And if the client sends a CopyDone message first, the connection goes from Copy-both to Copy-out mode, until the server ends the streaming from its end.

New replication command: TIMELINE_HISTORY
-----------------------------------------

To switch recovery target timeline, a standby needs the timeline history file (e.g. 00000002.history) of the new timeline. The patch adds a new command to the set of commands accepted by walsender, to transmit a given timeline history file from master to slave.
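
To make the CopyDone handshake described under "Protocol changes" concrete, here is a minimal sketch of it as a state machine. This is an illustration only, not code from the patch: the names CopyState, on_copy_done and may_send_copy_data are hypothetical, and the model ignores everything except who has sent CopyDone so far.

```python
from enum import Enum


class CopyState(Enum):
    """Hypothetical model of the connection modes described in the mail."""
    COPY_BOTH = "copy-both"  # neither end has sent CopyDone yet
    COPY_IN = "copy-in"      # server sent CopyDone; client may still send CopyData
    COPY_OUT = "copy-out"    # client sent CopyDone; server may still send CopyData
    DONE = "done"            # both sent CopyDone; server sends CommandComplete


def on_copy_done(state: CopyState, sender: str) -> CopyState:
    """Return the state after `sender` ('client' or 'server') sends CopyDone."""
    if sender not in ("client", "server"):
        raise ValueError(f"unknown sender: {sender!r}")
    if state is CopyState.COPY_BOTH:
        # Either end may take the initiative.
        return CopyState.COPY_IN if sender == "server" else CopyState.COPY_OUT
    if state is CopyState.COPY_IN and sender == "client":
        return CopyState.DONE
    if state is CopyState.COPY_OUT and sender == "server":
        return CopyState.DONE
    # The sender has already sent CopyDone, or the copy is already over.
    raise RuntimeError(f"{sender} may not send CopyDone in state {state.value}")


def may_send_copy_data(state: CopyState, sender: str) -> bool:
    """An end that has sent CopyDone may not send any more CopyData."""
    if state is CopyState.COPY_BOTH:
        return True
    if state is CopyState.COPY_IN:
        return sender == "client"
    if state is CopyState.COPY_OUT:
        return sender == "server"
    return False
```

For example, when the server ends the stream first, the connection passes through COPY_BOTH, COPY_IN and DONE; when the client ends it first, it passes through COPY_BOTH, COPY_OUT and DONE.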

Walsender changes to stream a particular timeline
-------------------------------------------------

The walsender now keeps track of exactly which timeline it is currently streaming; it's not necessarily the latest one anymore. The START_REPLICATION command is extended with a TIMELINE option that the client can use to request streaming from a particular timeline. If the client asks for a timeline that's not the current one, but is part of the history of the server, the walsender knows to read from the correct WAL file for that timeline. Also, the walsender knows where the server's history branched off from that timeline, and will only stream WAL up to that point. When that point is reached, it ends the streaming (with a CopyDone message), and prepares to accept a new replication command. Typically, the walreceiver will then ask to start streaming from the next timeline.

Walreceiver changes
-------------------

Previously, when the timeline reported by the server didn't match the current timeline in the standby, walreceiver simply errored out. Now, it requests any missing timeline history files using the new TIMELINE_HISTORY command, and then tries to start replication from the standby's current timeline, even if that's older than the master's. When the end of the old timeline is reached, walreceiver sets a state variable in shared memory to indicate that, pings the startup process, and waits for new orders from the startup process. The startup process can set receiveStart and timeline in shared memory and ping the walreceiver again, to get the walreceiver to restart streaming from the new starting point [1]. Before the startup process does that, it will scan pg_xlog for new timeline history files if recovery_target_timeline='latest'. It will find any new history files the walreceiver stored there, and switch over to the latest timeline just as it does with a WAL archive.
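
The interplay between the walsender and walreceiver described above can be sketched roughly as follows. This is a simplified model and not the patch's code: the function names are hypothetical, WAL positions are plain integers rather than XLogRecPtr values, and the server's timeline history is assumed to be an ordered list of (timeline ID, switchpoint) pairs for its ancestor timelines.

```python
from typing import List, Optional, Tuple

# (timeline ID, WAL position where the server's history left that timeline).
# Hypothetical in-memory form of a parsed timeline history file.
HistoryEntry = Tuple[int, int]


def stream_end_point(history: List[HistoryEntry], current_tli: int,
                     requested_tli: int) -> Optional[int]:
    """Where the walsender stops streaming the requested timeline.

    Returns None for the server's current timeline (no end point), or the
    switchpoint for an older timeline that is part of the server's history.
    """
    if requested_tli == current_tli:
        return None
    for tli, switchpoint in history:
        if tli == requested_tli:
            return switchpoint
    raise ValueError(f"timeline {requested_tli} is not in the server's history")


def plan_catchup(history: List[HistoryEntry], current_tli: int,
                 standby_tli: int, start_pos: int):
    """List the (timeline, start, end) requests a standby would issue in turn."""
    lineage = [tli for tli, _ in history] + [current_tli]
    if standby_tli not in lineage:
        raise ValueError(f"timeline {standby_tli} is not in the server's history")
    steps = []
    pos = start_pos
    for tli in lineage[lineage.index(standby_tli):]:
        end = stream_end_point(history, current_tli, tli)
        steps.append((tli, pos, end))
        if end is not None:
            pos = end  # the next request starts where this timeline ended
    return steps
```

With a server on timeline 3 whose history is [(1, 0x3000000), (2, 0x5000060)], a standby on timeline 1 at position 0x2000000 would stream timeline 1 up to 0x3000000, then timeline 2 up to 0x5000060, and then follow timeline 3 with no end point.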

Some parts of this patch are just refactoring that probably make sense regardless of the new functionality. For example, I split off the timeline history file related functions to a new file, timeline.c. That's not very much code, but it's fairly isolated, and xlog.c is massive, so I feel that anything that we can move off from xlog.c is a good thing. I also moved off the two functions RestoreArchivedFile() and ExecuteRecoveryCommand(), to a separate file. Those are also not much code, but are fairly isolated. If no-one objects to those changes, and the general direction this work is going to, I'm going to split off those refactorings to separate patches and commit them separately.

I also made the timeline history file a bit more detailed: instead of recording just the WAL segment where the timeline was changed, it now records the exact XLogRecPtr. That was required for the walsender to know the switchpoint, without having to parse the XLOG records (it reads and parses the history file, instead).

[1] Initially, I tried to do this by simply letting walreceiver die and have the startup process launch a new walreceiver process that would reconnect, but it turned out to be hard to rapidly disconnect and connect, because the postmaster, which forks the walreceiver process, does not always have the same idea of whether the walreceiver is active as the startup process does. It would eventually be ok, thanks to timeouts, but would require polling. But not having to disconnect seems nicer, anyway.

- Heikki