Re: Sync Rep: First Thoughts on Code - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Sync Rep: First Thoughts on Code |
Date | |
Msg-id | 3f0b79eb0812110237m55721809k2e4dd872a55899fb@mail.gmail.com |
In response to | Re: Sync Rep: First Thoughts on Code (Simon Riggs <simon@2ndQuadrant.com>) |
List | pgsql-hackers |
Hi,

On Thu, Dec 11, 2008 at 7:09 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On Thu, 2008-12-11 at 11:29 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>> > On Thu, 2008-12-11 at 09:44 +0200, Heikki Linnakangas wrote:
>> >> Simon Riggs wrote:
>> >>> When the WAL starts streaming the *primary* can immediately perform
>> >>> synchronous replication, i.e. commit waits for transfer.
>> >> Until the standby has obtained all the missing log files, it's not
>> >> up-to-date, and there's no guarantee that it can finish the replay. For
>> >> example, imagine that your archive_command is an scp from the primary to
>> >> the standby. If a lightning strikes the primary before some WAL file has
>> >> been copied over to the archive directory in the standby, the standby
>> >> can't catch up. In the primary then, what's the point for a commit to
>> >> wait for transfer, if the reply from the standby doesn't guarantee that
>> >> the transaction is safe in the standby?
>> >
>> > The WAL files will have already left the primary.
>> >
>> > Timeline is this in my understanding
>> > 1 [Primary] Set up continuous archiving
>> > 2 [Primary] Take base backup
>> > 3 [Standby] Connect to primary to initiate streaming
>> > 4 [Primary] Log switch and, optionally, turn off archiving
>> > 5 [Standby] Begin replaying files, initially from archive
>> > 6 [Standby] Switch to replaying WAL records immediately after streaming
>> >
>> > So sync rep would turn on after step 4, so that all intermediate WAL
>> > files have been sent to the archive. If we lose the Primary after this
>> > point then all transactions are accessible to standby. If we lose the
>> > Standby or Archive, then we need to replace them and re-run the above.
>>
>> Between steps 4 and 5, there's no guarantee that all WAL files generated
>> after step 3 and the start of streaming have already been archived.
>> There's a delay between writing a WAL file and when the file has been
>> safely archived. If you lose the primary during that window, the standby
>> will have old WAL files in the archive, the most recent ones in received
>> by walreceiver, but it's missing the WAL files generated just before the
>> switch to streaming mode.

Yes. Since such a standby is unsafe, the user must not promote it to
primary. Instead, the user has to stop the standby (without completing
recovery), restart the primary, and then restart the standby.

>
> I was presuming that the synchronisation was clear, but I'm sorry it
> wasn't. Sync rep would begin only *after* the last WAL file was
> archived.

Agreed. So that the user can confirm whether replication has begun, we
might need to log the name of the switched WAL file.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
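
The safety condition discussed in this message (the standby may be promoted only if every WAL file generated before the switch to streaming has reached the archive) can be illustrated with a minimal sketch. This is not PostgreSQL code: segments are modelled as plain integers rather than real WAL file names, and `missing_segments`, `safe_to_promote`, `first_needed`, and `first_streamed` are hypothetical names chosen for illustration only.

```python
def missing_segments(archived, first_needed, first_streamed):
    """Return the segment numbers in [first_needed, first_streamed)
    that are absent from the archive.

    archived       -- set of segment numbers present in the archive (assumed model)
    first_needed   -- first segment required after the base backup (hypothetical)
    first_streamed -- first segment the standby received by streaming (hypothetical)
    """
    return [seg for seg in range(first_needed, first_streamed)
            if seg not in archived]


def safe_to_promote(archived, first_needed, first_streamed):
    """The standby is safe to promote only if no pre-streaming segment is missing."""
    return not missing_segments(archived, first_needed, first_streamed)


if __name__ == "__main__":
    # Segment 13 was written on the primary but never archived before it was lost.
    archived = {10, 11, 12}
    print(missing_segments(archived, 10, 14))
    print(safe_to_promote(archived, 10, 14))
```

In this model, a gap between the archived segments and the first streamed segment corresponds to the window described above, in which the standby must not be promoted.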