Thread: Clean shutdown and warm standby
Hi, Following the discussion here http://archives.postgresql.org/message-id/49D9E986.8010604@pse-consulting.de, I wrote a small patch which rotates the last XLog file on shutdown so that the archive command is also executed for this file and we are sure we have all the useful XLog files when we perform a clean shutdown of master and switch to the failover server. This rotation is done only if the archive mode is active and an archive command is set. It's currently really difficult to switch easily (i.e. without copying the file manually) to the failover server without any data loss. Is there any problem I've missed? Could we consider the inclusion of such a patch or something similar? Comments? Regards, -- Guillaume
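The patch was sent as an attachment and is not reproduced in this archive. As a rough sketch of the approach being described, assuming the xlog.c facilities of that era (RequestXLogSwitch(), XLogArchivingActive(), XLogArchiveCommandSet()), the addition would sit in ShutdownXLOG() roughly like this, after the shutdown checkpoint; the rest of the thread discusses why this particular placement turns out to be unsafe:

    /* ... in ShutdownXLOG() (xlog.c), heavily simplified ... */
    CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);

    /* proposed: rotate the last XLog file so the archiver picks it up too */
    if (XLogArchivingActive() && XLogArchiveCommandSet())
        (void) RequestXLogSwitch();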
Hi, On Thu, Apr 9, 2009 at 4:11 AM, Guillaume Smet <guillaume.smet@gmail.com> wrote: > Hi, > > Following the discussion here > http://archives.postgresql.org/message-id/49D9E986.8010604@pse-consulting.de > , I wrote a small patch which rotates the last XLog file on shutdown > so that the archive command is also executed for this file and we are > sure we have all the useful XLog files when we perform a clean > shutdown of master + switch to the failover server. This rotation is > done only if the archive mode is active and an archive command is set. > > It's currently really difficult to switch easily (ie without copying > the file manually) to the failover server without any data loss. > > Is there any problem I've missed? RequestXLogSwitch() doesn't wait until the switched WAL file has actually been archived. So, some WAL files may still not exist in the standby server even after a clean shutdown of the primary. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Apr 9, 2009 at 5:00 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > RequestXLogSwitch() doesn't wait until the switched WAL file has > actually been archived. So, some WAL files still may not exist in > the standby server also after clean shutdown of the primary. Thanks for your comment. RequestXLogSwitch() doesn't wait for archiving, but the shutdown process takes care of it AFAICS. As far as I understand the shutdown code, we have the following sequence (describing only the steps involved in the XLog and archiver shutdown):
- postmaster.c line 2693: PM_WAIT_BACKENDS state: we start the bgwriter and shut it down. It calls ShutdownXLOG, which creates the shutdown checkpoint and, with my patch, switches to a new XLog file. Then we are in PM_SHUTDOWN state.
- postmaster.c line 2244: the reaper is called for the bgwriter child that just shut down and wakes the archiver one last time: the archive command is executed for our last XLog file.
Did I miss something? -- Guillaume
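The postmaster-side step Guillaume describes looks roughly like this; this is a paraphrase from memory of the 8.3/8.4-era postmaster.c (identifiers such as PgArchPID, signal_child() and EXIT_STATUS_0() are from that code; details may differ), not a verbatim quote:

    /* in reaper(): the bgwriter has just exited after the shutdown checkpoint */
    if (pid == BgWriterPID)
    {
        BgWriterPID = 0;
        if (EXIT_STATUS_0(exitstatus) && pmState == PM_SHUTDOWN)
        {
            /*
             * The bgwriter wrote the shutdown checkpoint (and, with the
             * proposed patch, performed the segment switch).  Wake the
             * archiver one last time so it archives any remaining .ready
             * segments before the postmaster exits.
             */
            if (PgArchPID != 0)
                signal_child(PgArchPID, SIGUSR2);

            /* ... remainder of the shutdown sequence ... */
        }
    }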
Hi, On Thu, Apr 9, 2009 at 6:36 PM, Guillaume Smet <guillaume.smet@gmail.com> wrote: > On Thu, Apr 9, 2009 at 5:00 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> RequestXLogSwitch() doesn't wait until the switched WAL file has >> actually been archived. So, some WAL files still may not exist in >> the standby server also after clean shutdown of the primary. > > Thanks for your comment. > > RequestXLogSwitch() doesn't wait for archiving but the shutdown > process takes care of it AFAICS. Oh, you are right. I completely forgot about it :( Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hi, On Wed, Apr 8, 2009 at 9:11 PM, I wrote: > Following the discussion here > http://archives.postgresql.org/message-id/49D9E986.8010604@pse-consulting.de > , I wrote a small patch which rotates the last XLog file on shutdown > [snip] Any comment or advice on how I can fix it with a different method if this one is considered wrong? Original message and patch here: http://archives.postgresql.org/message-id/1d4e0c10904081211p2c0f1cdepe620c11d1271ceb2@mail.gmail.com Thanks. -- Guillaume
Guillaume Smet wrote: > On Wed, Apr 8, 2009 at 9:11 PM, I wrote: >> Following the discussion here >> http://archives.postgresql.org/message-id/49D9E986.8010604@pse-consulting.de >> , I wrote a small patch which rotates the last XLog file on shutdown >> [snip] > > Any comment or advice on how I can fix it with a different method if > this one is considered wrong? > > Original message and patch here: > http://archives.postgresql.org/message-id/1d4e0c10904081211p2c0f1cdepe620c11d1271ceb2@mail.gmail.com Sorry for the delay. It's not safe to write WAL after the checkpoint, as RequestXLogSwitch() does. After restart, the system will start inserting WAL from the checkpoint redo point, which is just before the XLOG_SWITCH record, and will overwrite it. It would be nice to have all WAL archived at shutdown, but this is not the way to do it :-(. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Hi, On Fri, Apr 24, 2009 at 3:20 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > It's not safe to write WAL after the checkpoint, as RequestXLogSwitch() > does. After restart, the system will start inserting WAL from the checkpoint > redo point, which is just before the XLOG_SWITCH record, and will overwrite > it. Since, in this case, the WAL file including XLOG_SWITCH exists in the archive, I don't think that it's unsafe, i.e. XLOG_SWITCH would be treated as the last applied record and not be overwritten. WAL records would start to be inserted from the subsequent file (with a new timeline). This is useful for warm standby, but I'm afraid that this may delay Shared Disk Failover, which doesn't need to wait until all the WAL files are archived at shutdown. Is there any solution to this problem? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Fujii Masao wrote: > On Fri, Apr 24, 2009 at 3:20 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> It's not safe to write WAL after the checkpoint, as RequestXLogSwitch() >> does. After restart, the system will start inserting WAL from the checkpoint >> redo point, which is just before the XLOG_SWITCH record, and will overwrite >> it. > > Since, in this case, the WAL file including XLOG_SWITCH exists > in archive, I don't think that it's unsafe, i.e. XLOG_SWITCH would > be treated as the last applied record and not be overwritten. WAL > records would start to be inserted from the subsequent file (with > new timeline). It will be overwritten in a normal non-archive-recovery startup. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Hi, On Mon, Apr 27, 2009 at 8:43 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Fujii Masao wrote: >> >> On Fri, Apr 24, 2009 at 3:20 PM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> >>> It's not safe to write WAL after the checkpoint, as RequestXLogSwitch() >>> does. After restart, the system will start inserting WAL from the >>> checkpoint >>> redo point, which is just before the XLOG_SWITCH record, and will >>> overwrite >>> it. >> >> Since, in this case, the WAL file including XLOG_SWITCH exists >> in archive, I don't think that it's unsafe, i.e. XLOG_SWITCH would >> be treated as the last applied record and not be overwritten. WAL >> records would start to be inserted from the subsequent file (with >> new timeline). > > It will be overwritten in a normal non-archive-recovery startup. Hmm, you mean the case where the system crashes after inserting XLOG_SWITCH and before archiving the WAL file containing it? OK, though it seems unlikely, XLOG_SWITCH would be overwritten by a subsequent non-archive recovery. I have an idea: when the last applied WAL file has archive_status, the system starts WAL insertion from the next file after recovery. But there is still a race condition: the system may crash after XLOG_SWITCH is written (fsynced) and before the .ready file is created. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Fujii Masao wrote: > Hi, > > On Mon, Apr 27, 2009 at 8:43 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> Fujii Masao wrote: >>> On Fri, Apr 24, 2009 at 3:20 PM, Heikki Linnakangas >>> <heikki.linnakangas@enterprisedb.com> wrote: >>>> It's not safe to write WAL after the checkpoint, as RequestXLogSwitch() >>>> does. After restart, the system will start inserting WAL from the >>>> checkpoint >>>> redo point, which is just before the XLOG_SWITCH record, and will >>>> overwrite >>>> it. >>> Since, in this case, the WAL file including XLOG_SWITCH exists >>> in archive, I don't think that it's unsafe, i.e. XLOG_SWITCH would >>> be treated as the last applied record and not be overwritten. WAL >>> records would start to be inserted from the subsequent file (with >>> new timeline). >> It will be overwritten in a normal non-archive-recovery startup. > > Hmm, you mean the case where the system crashes after > inserting XLOG_SWITCH and before archiving the WAL file > containing it? No, no crash is involved. Just a normal server shutdown and start:
1. Server shutdown is initiated.
2. A shutdown checkpoint is recorded at XLOG point 1234; the redo ptr is also 1234.
3. An XLOG_SWITCH record is written at 1235, right after the checkpoint record.
4. The last round of archiving is done. The partial WAL file containing the checkpoint and XLOG_SWITCH record is archived.
5. Postmaster exits.
6. Postmaster is started again. Since the system was shut down cleanly, no WAL recovery is done. The WAL insert pointer is initialized to right after the redo pointer, location 1235, which is also the location of the XLOG_SWITCH record.
7. The next WAL record will be written at 1235, overwriting the XLOG_SWITCH record.
8. When the WAL file fills up, the system will try to archive the same WAL file again, this time with additional WAL records that come after the checkpoint record.
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Hi, On Mon, Apr 27, 2009 at 10:02 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Fujii Masao wrote: >> >> Hi, >> >> On Mon, Apr 27, 2009 at 8:43 PM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> >>> Fujii Masao wrote: >>>> >>>> On Fri, Apr 24, 2009 at 3:20 PM, Heikki Linnakangas >>>> <heikki.linnakangas@enterprisedb.com> wrote: >>>>> >>>>> It's not safe to write WAL after the checkpoint, as RequestXLogSwitch() >>>>> does. After restart, the system will start inserting WAL from the >>>>> checkpoint >>>>> redo point, which is just before the XLOG_SWITCH record, and will >>>>> overwrite >>>>> it. >>>> >>>> Since, in this case, the WAL file including XLOG_SWITCH exists >>>> in archive, I don't think that it's unsafe, i.e. XLOG_SWITCH would >>>> be treated as the last applied record and not be overwritten. WAL >>>> records would start to be inserted from the subsequent file (with >>>> new timeline). >>> >>> It will be overwritten in a normal non-archive-recovery startup. >> >> Hmm, you mean the case where the system crashes after >> inserting XLOG_SWITCH and before archiving the WAL file >> containing it? > > No, no crash is involved. Just a normal server shutdown and start: > > 1. Server shutdown is initiated > 2. A shutdown checkpoint is recorded at XLOG point 1234, redo ptr is also > 1234. > 3. A XLOG_SWITCH record is written at 1235, right after the checkpoint > record. > 4. The last round of archiving is done. The partial WAL file containing the > checkpoint and XLOG_SWITCH record is archived. > 5. Postmaster exits. > > 6. Postmaster is started again. Since the system was shut down cleanly, no > WAL recovery is done. The WAL insert pointer is initialized to right after > the redo pointer, location 1235, which is also the location of the > XLOG_SWITCH record. > 7. The next WAL record written will be written at 1235, overwriting the > XLOG_SWITCH record. > 8. When the WAL file fills up, the system will try to archive the same WAL > file again, this time with additional WAL records that after the checkpoint > record. Oh, you are right. I've missed that case :( Thanks for the detailed description! Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Heikki Linnakangas wrote: > > No, no crash is involved. Just a normal server shutdown and start: > > 1. Server shutdown is initiated > 2. A shutdown checkpoint is recorded at XLOG point 1234, redo ptr is > also 1234. > 3. A XLOG_SWITCH record is written at 1235, right after the checkpoint > record. > 4. The last round of archiving is done. The partial WAL file > containing the checkpoint and XLOG_SWITCH record is archived. > 5. Postmaster exits. > > 6. Postmaster is started again. Since the system was shut down > cleanly, no WAL recovery is done. The WAL insert pointer is > initialized to right after the redo pointer, location 1235, which is > also the location of the XLOG_SWITCH record. > 7. The next WAL record written will be written at 1235, overwriting > the XLOG_SWITCH record. > 8. When the WAL file fills up, the system will try to archive the same > WAL file again, this time with additional WAL records that after the > checkpoint record. So to get this down to a solution, it appears to be correct to execute the RequestXLogSwitch right before CreateCheckPoint? Regards, Andreas
Andreas Pflug wrote: > Heikki Linnakangas wrote: >> >> No, no crash is involved. Just a normal server shutdown and start: >> >> 1. Server shutdown is initiated >> 2. A shutdown checkpoint is recorded at XLOG point 1234, redo ptr is >> also 1234. >> 3. A XLOG_SWITCH record is written at 1235, right after the checkpoint >> record. >> 4. The last round of archiving is done. The partial WAL file >> containing the checkpoint and XLOG_SWITCH record is archived. >> 5. Postmaster exits. >> >> 6. Postmaster is started again. Since the system was shut down >> cleanly, no WAL recovery is done. The WAL insert pointer is >> initialized to right after the redo pointer, location 1235, which is >> also the location of the XLOG_SWITCH record. >> 7. The next WAL record written will be written at 1235, overwriting >> the XLOG_SWITCH record. >> 8. When the WAL file fills up, the system will try to archive the same >> WAL file again, this time with additional WAL records that after the >> checkpoint record. > > So to get this down to a solution, it appears to be correct to execute > the RequestXLogSwitch right before CreateCheckPoint? Hmm, then the checkpoint record isn't archived. That might be acceptable, though, since all data would be safe in the preceding WAL. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Andreas Pflug wrote: >> So to get this down to a solution, it appears to be correct to execute >> the RequestXLogSwitch right before CreateCheckPoint? > Hmm, then the checkpoint record isn't archived. That might be > acceptable, though, since all data would be safe in the preceding WAL. Not at all, because the database would be very unhappy at restart if it can't find the checkpoint record pg_control is pointing to. regards, tom lane
Tom Lane wrote: > Not at all, because the database would be very unhappy at restart > if it can't find the checkpoint record pg_control is pointing to. > So for several weeks now all postings just say how it will _not_ work. Does this boil down to "There's no way to make sure that a graceful failover won't lose data"? Regards, Andreas
Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> Andreas Pflug wrote: >>> So to get this down to a solution, it appears to be correct to execute >>> the RequestXLogSwitch right before CreateCheckPoint? > >> Hmm, then the checkpoint record isn't archived. That might be >> acceptable, though, since all data would be safe in the preceding WAL. > > Not at all, because the database would be very unhappy at restart > if it can't find the checkpoint record pg_control is pointing to. At a normal startup, the checkpoint record would be there as usual. And an archive recovery starts at the location indicated by the backup label. AFAICS calling RequestXLogSwitch() before CreateCheckPoint would be equivalent to calling "pg_switch_xlog()" just before shutting down. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, Apr 28, 2009 at 5:22 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > At a normal startup, the checkpoint record would be there as usual. And an > archive recovery starts at the location indicated by the backup label. > > AFAICS calling RequestXLogSwitch() before CreateCheckPoint would be > equivalent to calling "pg_switch_xlog()" just before shutting down. That's what I had in mind when writing the patch, but I didn't know the implications of this particular checkpoint. So moving the call before CreateCheckPoint is what I really intended now that I have these implications in mind, and I don't know why it would be a problem to miss this checkpoint in the archived logs. -- Guillaume
On Tue, Apr 28, 2009 at 5:35 PM, Guillaume Smet <guillaume.smet@gmail.com> wrote: > On Tue, Apr 28, 2009 at 5:22 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> At a normal startup, the checkpoint record would be there as usual. And an >> archive recovery starts at the location indicated by the backup label. >> >> AFAICS calling RequestXLogSwitch() before CreateCheckPoint would be >> equivalent to calling "pg_switch_xlog()" just before shutting down. > > That's what I had in mind when writing the patch but I didn't know the > implications of this particular checkpoint. > > So moving the call before CreateCheckPoint is what I really intended > now that I have in mind these implications and I don't know why it would be > a problem to miss this checkpoint in the logs archived. So, what do we do about this problem? Should we just call RequestXLogSwitch() before the creation of the shutdown checkpoint, or do we need a more complex patch? If so, can anybody explain the potential problem with this approach so we can figure out how to fix it? Thanks. -- Guillaume
Guillaume Smet wrote: > On Tue, Apr 28, 2009 at 5:35 PM, Guillaume Smet > <guillaume.smet@gmail.com> wrote: >> On Tue, Apr 28, 2009 at 5:22 PM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> At a normal startup, the checkpoint record would be there as usual. And an >>> archive recovery starts at the location indicated by the backup label. >>> >>> AFAICS calling RequestXLogSwitch() before CreateCheckPoint would be >>> equivalent to calling "pg_switch_xlog()" just before shutting down. >> That's what I had in mind when writing the patch but I didn't know the >> implications of this particular checkpoint. >> >> So moving the call before CreateCheckPoint is what I really intended >> now that I have in mind these implications and I don't know why it would be >> a problem to miss this checkpoint in the logs archived. > > What do we decide about this problem? > > Should we just call RequestXLogSwitch() before the creation of the > shutdown checkpoint or do we need a more complex patch? If so can > anybody explain the potential problem of this approach so we can > figure how to fix it? I've committed a patch to do the RequestXLogSwitch() before the shutdown checkpoint, as discussed. It seems safe to me. (sorry for the delay, and thanks for the reminder) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
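In its committed form, the ordering is simply reversed relative to the original proposal: the switch is requested first, so the switch record and everything before it land in the segment that the archiver's final cycle picks up, while the shutdown checkpoint record itself goes into the next, not-yet-archived segment. A simplified rendering (not the verbatim commit, and presumably guarded by the same archiving checks as the original patch):

    /* ... in ShutdownXLOG() (xlog.c), simplified: switch first, then checkpoint ... */
    if (XLogArchivingActive() && XLogArchiveCommandSet())
        (void) RequestXLogSwitch();

    CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);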
On Thu, 2009-05-28 at 14:04 +0300, Heikki Linnakangas wrote: > I've committed a patch to do the RequstXLogSwitch() before shutdown > checkpoint as discussed. It seems safe to me. (sorry for the delay, and > thanks for the reminder) Not sure if that is a fix that will work in all cases. There is a potential timing problem with when the archiver is shut down: that may now be fixed in 8.4; see what you think. Also, if archiving is currently stalled, then files will not be transferred, even if you switch xlogs. So this is at best a partial fix to the problem and the need for a manual check of file contents remains. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > On Thu, 2009-05-28 at 14:04 +0300, Heikki Linnakangas wrote: > >> I've committed a patch to do the RequstXLogSwitch() before shutdown >> checkpoint as discussed. It seems safe to me. (sorry for the delay, and >> thanks for the reminder) > > Not sure if that is a fix that will work in all cases. > > There is a potential timing problem with when the archiver is shutdown: > that may now be fixed in 8.4, see what you think. Can you elaborate? > Also if archiving is currently stalled, then files will not be > transferred, even if you switch xlogs. So this is at best a partial fix > to the problem and the need for a manual check of file contents > remains. Yep. Maybe we should print the filename of the last WAL segment to the log at shutdown, so that you can easily check that you have everything in the archive. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
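Heikki's logging idea is not spelled out further in the thread; a hypothetical sketch of what it could look like at the end of the shutdown sequence, reusing the existing XLByteToPrevSeg()/XLogFileName() macros (the variable names and the exact message text here are made up):

    {
        XLogRecPtr  switchpoint = RequestXLogSwitch();
        char        lastseg[MAXFNAMELEN];
        uint32      endLogId;
        uint32      endLogSeg;

        /* name the segment that holds the last byte before the switch point */
        XLByteToPrevSeg(switchpoint, endLogId, endLogSeg);
        XLogFileName(lastseg, ThisTimeLineID, endLogId, endLogSeg);

        ereport(LOG,
                (errmsg("last WAL segment to be archived before shutdown: %s",
                        lastseg)));
    }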
On Thu, 2009-05-28 at 16:19 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Thu, 2009-05-28 at 14:04 +0300, Heikki Linnakangas wrote: > > > >> I've committed a patch to do the RequstXLogSwitch() before shutdown > >> checkpoint as discussed. It seems safe to me. (sorry for the delay, and > >> thanks for the reminder) > > > > Not sure if that is a fix that will work in all cases. > > > > There is a potential timing problem with when the archiver is shutdown: > > that may now be fixed in 8.4, see what you think. > > Can you elaborate? Is the archiver still alive and working after the log switch occurs? If the archiver is working, but has fallen behind at the point of shutdown, does the archiver operate for long enough to ensure everything is archived up to the point of the log switch prior to the checkpoint? > > Also if archiving is currently stalled, then files will not be > > transferred, even if you switch xlogs. So this is at best a partial fix > > to the problem and the need for a manual check of file contents > > remains. > > Yep. Maybe we should print the filename of the last WAL segment to the > log at shutdown, so that you can easily check that you have everything > in the archive. You still need a script to read that and synchronize file contents. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > On Thu, 2009-05-28 at 16:19 +0300, Heikki Linnakangas wrote: >> Simon Riggs wrote: >>> On Thu, 2009-05-28 at 14:04 +0300, Heikki Linnakangas wrote: >>> >>>> I've committed a patch to do the RequstXLogSwitch() before shutdown >>>> checkpoint as discussed. It seems safe to me. (sorry for the delay, and >>>> thanks for the reminder) >>> Not sure if that is a fix that will work in all cases. >>> >>> There is a potential timing problem with when the archiver is shutdown: >>> that may now be fixed in 8.4, see what you think. >> Can you elaborate? > > Is the archiver still alive and working after the log switch occurs? Yes. > If the archiver is working, but has fallen behind at the point of > shutdown, does the archiver operate for long enough to ensure we are > archived up to the point of the log switch prior to checkpoint? Yes, it archives all pending WAL segments before exiting. Ok, we're good then I guess. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, 2009-05-28 at 16:52 +0300, Heikki Linnakangas wrote: > > If the archiver is working, but has fallen behind at the point of > > shutdown, does the archiver operate for long enough to ensure we are > > archived up to the point of the log switch prior to checkpoint? > > Yes, it archives all pending WAL segments before exiting. I don't think it does; please look again. > Ok, we're good then I guess. No, because as I said, if archive_command has been returning non-zero then the archive will be incomplete. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > On Thu, 2009-05-28 at 16:52 +0300, Heikki Linnakangas wrote: > >>> If the archiver is working, but has fallen behind at the point of >>> shutdown, does the archiver operate for long enough to ensure we are >>> archived up to the point of the log switch prior to checkpoint? >> Yes, it archives all pending WAL segments before exiting. > > I don't think it does, please look again. Still looks ok to me. pgarch_ArchiverCopyLoop() loops until all ready WAL segments have been archived (assuming no errors). >> Ok, we're good then I guess. > > No, because as I said, if archive_command has been returning non-zero > then the archive will be incomplete. Yes. You think that's wrong? How would you like it to behave, then? I don't think you want the shutdown to wait indefinitely until all files have been archived if there's an error. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, 2009-05-28 at 17:21 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > > > I don't think it does, please look again. > > Still looks ok to me. pgarch_ArchiverCopyLoop() loops until all ready > WAL segments have been archived (assuming no errors). No, it doesn't now, though it used to; see line 440. > >> Ok, we're good then I guess. > > > > No, because as I said, if archive_command has been returning non-zero > > then the archive will be incomplete. > > Yes. You think that's wrong? How would you like it to behave, then? I > don't think you want the shutdown to wait indefinitely until all files > have been archived if there's an error. The complaint was that we needed to run a manual step to synchronise the pg_xlog directory on the standby. We still need to do that, even after the patch has been committed, because two cases are not covered, so what is the point of the recent change? It isn't enough. It *might* be enough, most of the time, but you have no way of knowing that is the case and it is dangerous not to check. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > On Thu, 2009-05-28 at 17:21 +0300, Heikki Linnakangas wrote: >> Simon Riggs wrote: >>> I don't think it does, please look again. >> Still looks ok to me. pgarch_ArchiverCopyLoop() loops until all ready >> WAL segments have been archived (assuming no errors). > > No, it doesn't now, though it did used to. line 440. postmaster never sends SIGTERM to pgarch, and postmaster is still alive. >>>> Ok, we're good then I guess. >>> No, because as I said, if archive_command has been returning non-zero >>> then the archive will be incomplete. >> Yes. You think that's wrong? How would you like it to behave, then? I >> don't think you want the shutdown to wait indefinitely until all files >> have been archived if there's an error. > > The complaint was that we needed to run a manual step to synchronise the > pg_xlog directory on the standby. We still need to do that, even after > the patch has been committed because 2 cases are not covered, so what is > the point of the recent change? It isn't enough. It *might* be enough, > most of the time, but you have no way of knowing that is the case and it > is dangerous not to check. So you check. This solves Guillaume's immediate concern. If you have a suggestion for further improvements, I'm all ears. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
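The function under discussion, paraphrased from memory of the 8.4-era pgarch.c rather than quoted verbatim (pgarch_readyXlog(), pgarch_archiveXlog(), got_SIGTERM and NUM_ARCHIVE_RETRIES are identifiers from that file; details may differ): the outer loop keeps going while .ready files remain, so the final archiving cycle drains the whole backlog, and the early return Simon points at fires only on SIGTERM (a whole-system shutdown signalled by init) or postmaster death.

    static void
    pgarch_ArchiverCopyLoop(void)
    {
        char        xlog[MAX_XFN_CHARS + 1];

        while (pgarch_readyXlog(xlog))      /* one iteration per .ready file */
        {
            int         failures = 0;

            for (;;)
            {
                /* stop issuing archive commands on SIGTERM or postmaster death */
                if (got_SIGTERM || !PostmasterIsAlive(true))
                    return;

                if (pgarch_archiveXlog(xlog))
                {
                    pgarch_archiveDone(xlog);   /* rename .ready to .done */
                    break;                      /* move on to the next file */
                }

                /* the command failed: retry a few times, then give up for now */
                if (++failures >= NUM_ARCHIVE_RETRIES)
                    return;
                pg_usleep(1000000L);            /* sleep 1s before retrying */
            }
        }
    }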
Simon Riggs wrote: >>> >>> No, because as I said, if archive_command has been returning non-zero >>> then the archive will be incomplete. >>> >> Yes. You think that's wrong? How would you like it to behave, then? I >> don't think you want the shutdown to wait indefinitely until all files >> have been archived if there's an error. >> > > The complaint was that we needed to run a manual step to synchronise the > pg_xlog directory on the standby. We still need to do that, even after > the patch has been committed because 2 cases are not covered, so what is > the point of the recent change? It isn't enough. It *might* be enough, > most of the time, but you have no way of knowing that is the case and it > is dangerous not to check. > If archiving has stalled, it's not a clean shutdown anyway and I wouldn't expect the WAL archive to be automatically complete. I'd still appreciate a warning that while the shutdown appeared regular, WAL wasn't written completely. But in the "corner case" of shutting down a smoothly running server, the WAL archive should be complete as well. Regards, Andreas
On Thu, May 28, 2009 at 5:02 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > So you check. This solves Guillaume's immediate concern. If you have a > suggestion for further improvements, I'm all ears. Thanks for applying the patch. Yes, the problem is that before this change, even with a working replication and a clean shutdown, you still had to replicate the last WAL file by hand. Personally, I have an eye on each PostgreSQL log file when I switch from one server to another so I can see if anything is going wrong (that said, it could be a problem with more than 2 servers...). This patch just fixes this problem, not the other concerns and corner cases we might have. If we want to go further, we need to agree on what we want exactly and which corner cases we want to cover, but it's probably 8.5 material at this point. -- Guillaume
On Thu, 2009-05-28 at 17:16 +0200, Guillaume Smet wrote: > On Thu, May 28, 2009 at 5:02 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: > > So you check. This solves Guillaume's immediate concern. If you have a > > suggestion for further improvements, I'm all ears. > > Thanks for applying the patch. > > Yes, the problem is that before this change, even with a working > replication and a clean shutdown, you still had to replicate the last > WAL file by hand. Personnally, I have an eye on each postgresql log > file when I switch from one server to another so I can see if anything > is going wrong (that said, it could be a problem with more than 2 > servers...). > > This patch just fixes this problem not the other concerns and corner > cases we might have. If we want to go further, we need to agree on > what we want exactly and which corner cases we want to cover but it's > probably 8.5 material at this point. Your original post wanted to know "we are sure we have all the useful XLog files when we perform a clean shutdown of master". The patch does not solve the problem you stated. You may consider it useful, but a manual check or script execution must still happen. If you feel we have moved forwards, that's good, but since no part of the *safe* maintenance procedure has changed, I don't see that myself. Only the unsafe way of doing it got faster. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Thu, May 28, 2009 at 5:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > If you feel we have moved forwards, that's good, but since no part of > the *safe* maintenance procedure has changed, I don't see that myself. > Only the unsafe way of doing it got faster. I disagree with you. The situation was:
- you stop the master;
- everything seems to be OK in the log files (archiving and so on);
- it's broken anyway as you don't have the last log file;
- you have to copy the last log file manually;
- you can start the slave.
It is now:
- you stop the master;
- if everything is OK in the log files, the last log file has been archived (and yes, I check it manually too) and it's done. If not (and it's the exception, not the rule), I have to copy the missing WAL files manually;
- you can start the slave.
I think it's a step forward, maybe not sufficient for you, but I prefer the situation now to before. It's safer because of the principle of least surprise: I'm pretty sure a lot of people didn't even think that the last WAL file was systematically missing. As Heikki stated it, if you have concrete proposals for how we can fix the other corner cases, we're all ears. Considering my current level of knowledge, that's all I can do by myself. IMHO, that's something that needs to be treated in the massive replication work planned for 8.5. -- Guillaume
On Thu, 2009-05-28 at 17:50 +0200, Guillaume Smet wrote: > I think it's a step forward, maybe not sufficient for you but I prefer > the situation now than before. It's safer because of the principle of > least surprise: I'm pretty sure a lot of people didn't even think that > the last WAL file was systematically missing. If I hadn't spoken out, I think you would have assumed you were safe and so would everybody else. Time is saved only if you perform the step manually - if time saving was your objective, you should have been using a script in the first place. If you're using a script, carry on using it: nothing has changed, you still need to check. > As Heikki stated it, if you have concrete proposals of how we can fix > the other corner cases, we're all ears. Considering my current level > of knowledge, that's all I can do by myself. I'm not sure there is even a solution. Fixing a broken archive_command is not something PostgreSQL can achieve, by definition. It's good you submitted a patch, I have no problem there, BTW, but a patch applied during beta should either fix the problem or not be applied at all. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Thu, May 28, 2009 at 6:06 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Thu, 2009-05-28 at 17:50 +0200, Guillaume Smet wrote: > >> I think it's a step forward, maybe not sufficient for you but I prefer >> the situation now than before. It's safer because of the principle of >> least surprise: I'm pretty sure a lot of people didn't even think that >> the last WAL file was systematically missing. > > If I hadn't spoken out, I think you would have assumed you were safe and > so would everybody else. Time is saved only if you perform the step > manually - if time saving was your objective you should have been using > a script in the first place. If you're using a script, carry on using > it: nothing has changed, you still need to check. You might think that, but I wouldn't have. I will still monitor my log files carefully and check that the last WAL file is received and processed on the slave, as I currently do. I prefer checking it visually to using a script. At least, now, I have a chance of it working without manual intervention. > It's good you submitted a patch, I have no problem there, BTW, but > applying a patch during beta, should either fix the problem or not be > applied at all. Well, I don't think we'll agree on that. Anyway, have a nice day :). -- Guillaume
On Thu, 2009-05-28 at 18:02 +0300, Heikki Linnakangas wrote: > postmaster never sends SIGTERM to pgarch, and postmaster is still alive. Then we have a regression, since we changed the code to make sure the archiver did shut down even if there was a backlog. The reason is that if there is a long backlog at the time we restart, we may cause a long outage while we wait for the archiver to shut down before the postmaster restarts. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > On Thu, 2009-05-28 at 18:02 +0300, Heikki Linnakangas wrote: > >> postmaster never sends SIGTERM to pgarch, and postmaster is still alive. > > Then we have a regression, since we changed the code to make sure the > archiver did shutdown even if there was a backlog. The commit message of the commit that introduced the check for SIGTERM says: " Also, modify the archiver process to notice SIGTERM and refuse to issue any more archive commands if it gets it. The postmaster doesn't ever send it SIGTERM; we assume that any such signal came from init and is a notice of impending whole-system shutdown. In this situation it seems imprudent to try to start new archive commands --- if they aren't extremely quick they're likely to get SIGKILL'd by init. " -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, 2009-05-28 at 19:58 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Thu, 2009-05-28 at 18:02 +0300, Heikki Linnakangas wrote: > > > >> postmaster never sends SIGTERM to pgarch, and postmaster is still alive. > > > > Then we have a regression, since we changed the code to make sure the > > archiver did shutdown even if there was a backlog. > > The commit message of the commit that introduced the check for SIGTERM says: > > " > Also, modify the archiver process to notice SIGTERM and refuse to issue any > more archive commands if it gets it. The postmaster doesn't ever send it > SIGTERM; we assume that any such signal came from init and is a notice of > impending whole-system shutdown. In this situation it seems imprudent > to try > to start new archive commands --- if they aren't extremely quick they're > likely to get SIGKILL'd by init. > " Sounds great, but what does that mean exactly? The same commit message also says: "The new behavior is that the archiver is allowed to run unmolested until the bgwriter has exited; then it is sent SIGUSR2 to tell it to do a final archiving cycle and quit." Thus there is no guarantee that this is sufficient to have archived all the files you would like to archive. The patch does not provide a clean shutdown in all cases and, since you don't know what state it's in, you are still forced to take external action to be safe, exactly as you do already. If I didn't already say so, I came up with exactly the same solution 2 years ago and then later explained it didn't work in all cases. I'm saying the same thing again here now. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes: > Thus there is no guarantee that this is sufficient to have archived all > the files you would like to archive. The patch does not provide a clean > shutdown in all cases and since you don't know what state its in, you > are still forced to take external action to be safe, exactly as you do > already. > If I didn't already say, I came up with exactly the same solution 2 > years ago and then later explained it didn't work in all cases. I'm > saying the same thing again here now. What's your point? Surely the applied patch is a *necessary* component of any attempt to try to ensure archiving is complete at shutdown. I agree that it doesn't cover every risk factor, and there are some risk factors that cannot be covered by Postgres itself. But isn't it a step in a desirable direction? regards, tom lane
On Thu, 2009-05-28 at 18:09 -0400, Tom Lane wrote: > What's your point? Surely the applied patch is a *necessary* component > of any attempt to try to ensure archiving is complete at shutdown. > I agree that it doesn't cover every risk factor, and there are some > risk factors that cannot be covered by Postgres itself. But isn't it > a step in a desirable direction? Well, in one way, yes. I certainly encourage Guillaume to submit more patches and for everybody to review them, as has been done. I turned up late to the party on this, I know. Regrettably, the patch doesn't remove the problem it was supposed to remove and I'm highlighting that there is still a risk of data loss. I suggest that we don't change any docs, and carefully word or even avoid any release note inclusion, to avoid lulling people into stopping safety measures. The patch doesn't cause any problems, though, so we don't need to remove it either. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > Regrettably, the patch doesn't remove the problem it was supposed to > remove and I'm highlighting there is still risk of data loss. I feel that you're moving the goalposts. What exactly is the problem it was supposed to remove in your opinion? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, May 29, 2009 at 2:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > Regrettably, the patch doesn't remove the problem it was supposed to > remove and I'm highlighting there is still risk of data loss. I suggest > that we don't change any docs, and carefully word or even avoid any > release note inclusion to avoid lulling people into stopping safety > measures. I think it's pretty clear that you and the OP are talking about two different problems. To quote Guillaume: "Yes, the problem is that before this change, even with a working replication and a clean shutdown, you still had to replicate the last WAL file by hand." I think that's a pretty legitimate complaint. You seem to think that this wasn't worth fixing at this point in the development cycle, because it was always possible to write a script to copy that last WAL file by hand. That's a judgment call, of course, and you are entitled to your own opinion on the topic, but that doesn't mean that the complaint, as defined by the person complaining, hasn't been fixed. ...Robert
On Fri, 2009-05-29 at 21:46 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > Regrettably, the patch doesn't remove the problem it was supposed to > > remove and I'm highlighting there is still risk of data loss. > > I feel that you're moving the goalposts. What exactly is the problem it > was supposed to remove in your opinion? I feel that you wish to argue this endlessly so that my point is lost and the threat of data loss that was left exposed is forgotten. I'm happy that I understand those threats and will advise my clients accordingly. I've tried to help the community, but in the end, there is a point where I stop trying to do so. Now, in fact. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Fri, 2009-05-29 at 14:54 -0400, Robert Haas wrote: > On Fri, May 29, 2009 at 2:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > Regrettably, the patch doesn't remove the problem it was supposed to > > remove and I'm highlighting there is still risk of data loss. I suggest > > that we don't change any docs, and carefully word or even avoid any > > release note inclusion to avoid lulling people into stopping safety > > measures. > > I think it's pretty clear that you and the OP are talking about two > different problems. To quote Guillaume: > > "Yes, the problem is that before this change, even with a working > replication and a clean shutdown, you still had to replicate the last > WAL file by hand." > > I think that's a pretty legitimate complaint. It's a valid complaint, yes, but only for people that do this manually, which is nobody I ever met, in *production*. (ymmv etc) > You seem to think that this wasn't worth fixing... And for them, it hasn't been completely fixed. That point was not made by the patch author or committer, leaving the impression that it was now completely safe, which, I truly regret to say, is not correct. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Fri, 2009-05-29 at 21:16 -0300, Euler Taveira de Oliveira wrote: > Simon Riggs escreveu: > > And for them, it hasn't been completely fixed. That point was not made > > by patch author or committer, leaving the impression it was now > > completely safe, which, I truly regret to say, is not correct. > > > Simon, could you point out what the patch does not do? If we can't fix it now, > at least we add it to TODO. I already did, in my first post on this thread. I won't repeat myself, so that we can avoid restarting the discussion; forgive me. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs escreveu: > And for them, it hasn't been completely fixed. That point was not made > by patch author or committer, leaving the impression it was now > completely safe, which, I truly regret to say, is not correct. > Simon, could you point out what the patch does not do? If we can't fix it now, at least we can add it to the TODO list. -- Euler Taveira de Oliveira http://www.timbira.com/