Thread: Clean shutdown and warm standby

Clean shutdown and warm standby

From
Guillaume Smet
Date:
Hi,

Following the discussion here
http://archives.postgresql.org/message-id/49D9E986.8010604@pse-consulting.de
, I wrote a small patch which rotates the last XLog file on shutdown
so that the archive command is also executed for this file and we are
sure we have all the useful XLog files when we perform a clean
shutdown of master + switch to the failover server. This rotation is
done only if the archive mode is active and an archive command is set.

It's currently really difficult to switch easily (ie without copying
the file manually) to the failover server without any data loss.

Is there any problem I've missed? Could we consider the inclusion of
such a patch or something similar?

Comments?

Regards,

--
Guillaume

Attachment

Re: Clean shutdown and warm standby

From
Fujii Masao
Date:
Hi,

On Thu, Apr 9, 2009 at 4:11 AM, Guillaume Smet <guillaume.smet@gmail.com> wrote:
> Hi,
>
> Following the discussion here
> http://archives.postgresql.org/message-id/49D9E986.8010604@pse-consulting.de
> , I wrote a small patch which rotates the last XLog file on shutdown
> so that the archive command is also executed for this file and we are
> sure we have all the useful XLog files when we perform a clean
> shutdown of master + switch to the failover server. This rotation is
> done only if the archive mode is active and an archive command is set.
>
> It's currently really difficult to switch easily (ie without copying
> the file manually) to the failover server without any data loss.
>
> Is there any problem I've missed?

RequestXLogSwitch() doesn't wait until the switched WAL file has
actually been archived. So, some WAL files still may not exist in
the standby server also after clean shutdown of the primary.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Clean shutdown and warm standby

From
Guillaume Smet
Date:
On Thu, Apr 9, 2009 at 5:00 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> RequestXLogSwitch() doesn't wait until the switched WAL file has
> actually been archived. So, some WAL files still may not exist in
> the standby server also after clean shutdown of the primary.

Thanks for your comment.

RequestXLogSwitch() doesn't wait for archiving but the shutdown
process takes care of it AFAICS.

As far as I understand the shutdown code, we have the following
sequence (I just explain here the steps involved in the XLog and
archiver shutdown):
- postmaster.c line 2693: PM_WAIT_BACKENDS state: we start the
bgwriter and shut it down. It calls ShutdownXLog which creates the
shutdown checkpoint and, with my patch, switch to a new XLog file.
Then we are in PM_SHUTDOWN state.
- postmaster.c line 2244: the reaper is called for the bgwriter child
just shutdown and  wakens the archiver one last time: the archive
command is executed for our last XLog file.

Did I miss something?

-- 
Guillaume


Re: Clean shutdown and warm standby

From
Fujii Masao
Date:
Hi,

On Thu, Apr 9, 2009 at 6:36 PM, Guillaume Smet <guillaume.smet@gmail.com> wrote:
> On Thu, Apr 9, 2009 at 5:00 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> RequestXLogSwitch() doesn't wait until the switched WAL file has
>> actually been archived. So, some WAL files still may not exist in
>> the standby server also after clean shutdown of the primary.
>
> Thanks for your comment.
>
> RequestXLogSwitch() doesn't wait for archiving but the shutdown
> process takes care of it AFAICS.

Oh, you are right. I completely forgot about it :(

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Clean shutdown and warm standby

From
Guillaume Smet
Date:
Hi,

On Wed, Apr 8, 2009 at 9:11 PM, I wrote:
> Following the discussion here
> http://archives.postgresql.org/message-id/49D9E986.8010604@pse-consulting.de
> , I wrote a small patch which rotates the last XLog file on shutdown
> [snip]

Any comment or advice on how I can fix it with a different method if
this one is considered wrong?

Original message and patch here:
http://archives.postgresql.org/message-id/1d4e0c10904081211p2c0f1cdepe620c11d1271ceb2@mail.gmail.com

Thanks.

-- 
Guillaume


Re: Clean shutdown and warm standby

From
Heikki Linnakangas
Date:
Guillaume Smet wrote:
> On Wed, Apr 8, 2009 at 9:11 PM, I wrote:
>> Following the discussion here
>> http://archives.postgresql.org/message-id/49D9E986.8010604@pse-consulting.de
>> , I wrote a small patch which rotates the last XLog file on shutdown
>> [snip]
> 
> Any comment or advice on how I can fix it with a different method if
> this one is considered wrong?
> 
> Original message and patch here:
> http://archives.postgresql.org/message-id/1d4e0c10904081211p2c0f1cdepe620c11d1271ceb2@mail.gmail.com

Sorry for the delay.

It's not safe to write WAL after the checkpoint, as RequestXLogSwitch() 
does. After restart, the system will start inserting WAL from the 
checkpoint redo point, which is just before the XLOG_SWITCH record, and 
will overwrite it.

It would be nice to have all WAL archived at shutdown, but this is not 
the way to do it :-(.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Clean shutdown and warm standby

From
Fujii Masao
Date:
Hi,

On Fri, Apr 24, 2009 at 3:20 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> It's not safe to write WAL after the checkpoint, as RequestXLogSwitch()
> does. After restart, the system will start inserting WAL from the checkpoint
> redo point, which is just before the XLOG_SWITCH record, and will overwrite
> it.

Since, in this case, the WAL file including XLOG_SWITCH exists
in archive, I don't think that it's unsafe, i.e. XLOG_SWITCH would
be treated as the last applied record and not be overwritten. WAL
records would start to be inserted from the subsequent file (with
new timeline).

This is useful for warm-standby, but I'm afraid that this may delay
Shared Disk Failover which doesn't need to wait until all the WAL
files are archived at shutdown. Is there any solution to this problem?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Clean shutdown and warm standby

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Fri, Apr 24, 2009 at 3:20 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> It's not safe to write WAL after the checkpoint, as RequestXLogSwitch()
>> does. After restart, the system will start inserting WAL from the checkpoint
>> redo point, which is just before the XLOG_SWITCH record, and will overwrite
>> it.
> 
> Since, in this case, the WAL file including XLOG_SWITCH exists
> in archive, I don't think that it's unsafe, i.e. XLOG_SWITCH would
> be treated as the last applied record and not be overwritten. WAL
> records would start to be inserted from the subsequent file (with
> new timeline).

It will be overwritten in a normal non-archive-recovery startup.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Clean shutdown and warm standby

From
Fujii Masao
Date:
Hi,

On Mon, Apr 27, 2009 at 8:43 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Fujii Masao wrote:
>>
>> On Fri, Apr 24, 2009 at 3:20 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>>
>>> It's not safe to write WAL after the checkpoint, as RequestXLogSwitch()
>>> does. After restart, the system will start inserting WAL from the
>>> checkpoint
>>> redo point, which is just before the XLOG_SWITCH record, and will
>>> overwrite
>>> it.
>>
>> Since, in this case, the WAL file including XLOG_SWITCH exists
>> in archive, I don't think that it's unsafe, i.e. XLOG_SWITCH would
>> be treated as the last applied record and not be overwritten. WAL
>> records would start to be inserted from the subsequent file (with
>> new timeline).
>
> It will be overwritten in a normal non-archive-recovery startup.

Hmm, you mean the case where the system crashes after
inserting XLOG_SWITCH and before archiving the WAL file
containing it? Okey, though it seems unlikely, XLOG_SWITCH
would be overwritten by subsequent non-archive-recovery.

I just have an idea; when the last applied WAL file has
archive_status, the system starts WAL insertion from the next
file after recovery. But, there is still race condition that the
system may crash after XLOG_SWITCH is written (fsynced)
and before .ready file is created.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Clean shutdown and warm standby

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> Hi,
> 
> On Mon, Apr 27, 2009 at 8:43 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Fujii Masao wrote:
>>> On Fri, Apr 24, 2009 at 3:20 PM, Heikki Linnakangas
>>> <heikki.linnakangas@enterprisedb.com> wrote:
>>>> It's not safe to write WAL after the checkpoint, as RequestXLogSwitch()
>>>> does. After restart, the system will start inserting WAL from the
>>>> checkpoint
>>>> redo point, which is just before the XLOG_SWITCH record, and will
>>>> overwrite
>>>> it.
>>> Since, in this case, the WAL file including XLOG_SWITCH exists
>>> in archive, I don't think that it's unsafe, i.e. XLOG_SWITCH would
>>> be treated as the last applied record and not be overwritten. WAL
>>> records would start to be inserted from the subsequent file (with
>>> new timeline).
>> It will be overwritten in a normal non-archive-recovery startup.
> 
> Hmm, you mean the case where the system crashes after
> inserting XLOG_SWITCH and before archiving the WAL file
> containing it?

No, no crash is involved. Just a normal server shutdown and start:

1. Server shutdown is initiated
2. A shutdown checkpoint is recorded at XLOG point 1234, redo ptr is 
also 1234.
3. A XLOG_SWITCH record is written at 1235, right after the checkpoint 
record.
4. The last round of archiving is done. The partial WAL file containing 
the checkpoint and XLOG_SWITCH record is archived.
5. Postmaster exits.

6. Postmaster is started again. Since the system was shut down cleanly, 
no WAL recovery is done. The WAL insert pointer is initialized to right 
after the redo pointer, location 1235, which is also the location of the 
XLOG_SWITCH record.
7. The next WAL record written will be written at 1235, overwriting the 
XLOG_SWITCH record.
8. When the WAL file fills up, the system will try to archive the same 
WAL file again, this time with additional WAL records that after the 
checkpoint record.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Clean shutdown and warm standby

From
Fujii Masao
Date:
Hi,

On Mon, Apr 27, 2009 at 10:02 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Fujii Masao wrote:
>>
>> Hi,
>>
>> On Mon, Apr 27, 2009 at 8:43 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>>
>>> Fujii Masao wrote:
>>>>
>>>> On Fri, Apr 24, 2009 at 3:20 PM, Heikki Linnakangas
>>>> <heikki.linnakangas@enterprisedb.com> wrote:
>>>>>
>>>>> It's not safe to write WAL after the checkpoint, as RequestXLogSwitch()
>>>>> does. After restart, the system will start inserting WAL from the
>>>>> checkpoint
>>>>> redo point, which is just before the XLOG_SWITCH record, and will
>>>>> overwrite
>>>>> it.
>>>>
>>>> Since, in this case, the WAL file including XLOG_SWITCH exists
>>>> in archive, I don't think that it's unsafe, i.e. XLOG_SWITCH would
>>>> be treated as the last applied record and not be overwritten. WAL
>>>> records would start to be inserted from the subsequent file (with
>>>> new timeline).
>>>
>>> It will be overwritten in a normal non-archive-recovery startup.
>>
>> Hmm, you mean the case where the system crashes after
>> inserting XLOG_SWITCH and before archiving the WAL file
>> containing it?
>
> No, no crash is involved. Just a normal server shutdown and start:
>
> 1. Server shutdown is initiated
> 2. A shutdown checkpoint is recorded at XLOG point 1234, redo ptr is also
> 1234.
> 3. A XLOG_SWITCH record is written at 1235, right after the checkpoint
> record.
> 4. The last round of archiving is done. The partial WAL file containing the
> checkpoint and XLOG_SWITCH record is archived.
> 5. Postmaster exits.
>
> 6. Postmaster is started again. Since the system was shut down cleanly, no
> WAL recovery is done. The WAL insert pointer is initialized to right after
> the redo pointer, location 1235, which is also the location of the
> XLOG_SWITCH record.
> 7. The next WAL record written will be written at 1235, overwriting the
> XLOG_SWITCH record.
> 8. When the WAL file fills up, the system will try to archive the same WAL
> file again, this time with additional WAL records that after the checkpoint
> record.

Oh, you are right. I've missed that case :(

Thanks for the detailed description!

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Clean shutdown and warm standby

From
Andreas Pflug
Date:
Heikki Linnakangas wrote:
>
> No, no crash is involved. Just a normal server shutdown and start:
>
> 1. Server shutdown is initiated
> 2. A shutdown checkpoint is recorded at XLOG point 1234, redo ptr is 
> also 1234.
> 3. A XLOG_SWITCH record is written at 1235, right after the checkpoint 
> record.
> 4. The last round of archiving is done. The partial WAL file 
> containing the checkpoint and XLOG_SWITCH record is archived.
> 5. Postmaster exits.
>
> 6. Postmaster is started again. Since the system was shut down 
> cleanly, no WAL recovery is done. The WAL insert pointer is 
> initialized to right after the redo pointer, location 1235, which is 
> also the location of the XLOG_SWITCH record.
> 7. The next WAL record written will be written at 1235, overwriting 
> the XLOG_SWITCH record.
> 8. When the WAL file fills up, the system will try to archive the same 
> WAL file again, this time with additional WAL records that after the 
> checkpoint record. 

So to get this down to a solution, it appears to be correct to execute 
the RequestXLogSwitch right before CreateCheckPoint?


Regards,
Andreas



Re: Clean shutdown and warm standby

From
Heikki Linnakangas
Date:
Andreas Pflug wrote:
> Heikki Linnakangas wrote:
>>
>> No, no crash is involved. Just a normal server shutdown and start:
>>
>> 1. Server shutdown is initiated
>> 2. A shutdown checkpoint is recorded at XLOG point 1234, redo ptr is 
>> also 1234.
>> 3. A XLOG_SWITCH record is written at 1235, right after the checkpoint 
>> record.
>> 4. The last round of archiving is done. The partial WAL file 
>> containing the checkpoint and XLOG_SWITCH record is archived.
>> 5. Postmaster exits.
>>
>> 6. Postmaster is started again. Since the system was shut down 
>> cleanly, no WAL recovery is done. The WAL insert pointer is 
>> initialized to right after the redo pointer, location 1235, which is 
>> also the location of the XLOG_SWITCH record.
>> 7. The next WAL record written will be written at 1235, overwriting 
>> the XLOG_SWITCH record.
>> 8. When the WAL file fills up, the system will try to archive the same 
>> WAL file again, this time with additional WAL records that after the 
>> checkpoint record. 
> 
> So to get this down to a solution, it appears to be correct to execute 
> the RequestXLogSwitch right before CreateCheckPoint?

Hmm, then the checkpoint record isn't archived. That might be 
acceptable, though, since all data would be safe in the preceding WAL.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Clean shutdown and warm standby

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Andreas Pflug wrote:
>> So to get this down to a solution, it appears to be correct to execute 
>> the RequestXLogSwitch right before CreateCheckPoint?

> Hmm, then the checkpoint record isn't archived. That might be 
> acceptable, though, since all data would be safe in the preceding WAL.

Not at all, because the database would be very unhappy at restart
if it can't find the checkpoint record pg_control is pointing to.
        regards, tom lane


Re: Clean shutdown and warm standby

From
Andreas Pflug
Date:
Tom Lane wrote:
> Not at all, because the database would be very unhappy at restart
> if it can't find the checkpoint record pg_control is pointing to.
>   

So for several weeks now all postings just say how it will _not_ work.
Does this boil down to "There's no way to make sure that a graceful
failover won't lose data"?

Regards,
Andreas


Re: Clean shutdown and warm standby

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Andreas Pflug wrote:
>>> So to get this down to a solution, it appears to be correct to execute 
>>> the RequestXLogSwitch right before CreateCheckPoint?
> 
>> Hmm, then the checkpoint record isn't archived. That might be 
>> acceptable, though, since all data would be safe in the preceding WAL.
> 
> Not at all, because the database would be very unhappy at restart
> if it can't find the checkpoint record pg_control is pointing to.

At a normal startup, the checkpoint record would be there as usual. And 
an archive recovery starts at the location indicated by the backup label.

AFAICS calling RequestXLogSwitch() before CreateCheckPoint would be 
equivalent to calling "pg_switch_xlog()" just before shutting down.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Clean shutdown and warm standby

From
Guillaume Smet
Date:
On Tue, Apr 28, 2009 at 5:22 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> At a normal startup, the checkpoint record would be there as usual. And an
> archive recovery starts at the location indicated by the backup label.
>
> AFAICS calling RequestXLogSwitch() before CreateCheckPoint would be
> equivalent to calling "pg_switch_xlog()" just before shutting down.

That's what I had in mind when writing the patch but I didn't know the
implications of this particular checkpoint.

So moving the call before CreateCheckPoint is what I really intended
now that I have in mind these implications and I don't why it would be
a problem to miss this checkpoint in the logs archived.

-- 
Guillaume


Re: Clean shutdown and warm standby

From
Guillaume Smet
Date:
On Tue, Apr 28, 2009 at 5:35 PM, Guillaume Smet
<guillaume.smet@gmail.com> wrote:
> On Tue, Apr 28, 2009 at 5:22 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> At a normal startup, the checkpoint record would be there as usual. And an
>> archive recovery starts at the location indicated by the backup label.
>>
>> AFAICS calling RequestXLogSwitch() before CreateCheckPoint would be
>> equivalent to calling "pg_switch_xlog()" just before shutting down.
>
> That's what I had in mind when writing the patch but I didn't know the
> implications of this particular checkpoint.
>
> So moving the call before CreateCheckPoint is what I really intended
> now that I have in mind these implications and I don't know why it would be
> a problem to miss this checkpoint in the logs archived.

What do we decide about this problem?

Should we just call RequestXLogSwitch() before the creation of the
shutdown checkpoint or do we need a more complex patch? If so can
anybody explain the potential problem of this approach so we can
figure how to fix it?

Thanks.

-- 
Guillaume


Re: Clean shutdown and warm standby

From
Heikki Linnakangas
Date:
Guillaume Smet wrote:
> On Tue, Apr 28, 2009 at 5:35 PM, Guillaume Smet
> <guillaume.smet@gmail.com> wrote:
>> On Tue, Apr 28, 2009 at 5:22 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>> At a normal startup, the checkpoint record would be there as usual. And an
>>> archive recovery starts at the location indicated by the backup label.
>>>
>>> AFAICS calling RequestXLogSwitch() before CreateCheckPoint would be
>>> equivalent to calling "pg_switch_xlog()" just before shutting down.
>> That's what I had in mind when writing the patch but I didn't know the
>> implications of this particular checkpoint.
>>
>> So moving the call before CreateCheckPoint is what I really intended
>> now that I have in mind these implications and I don't know why it would be
>> a problem to miss this checkpoint in the logs archived.
> 
> What do we decide about this problem?
> 
> Should we just call RequestXLogSwitch() before the creation of the
> shutdown checkpoint or do we need a more complex patch? If so can
> anybody explain the potential problem of this approach so we can
> figure how to fix it?

I've committed a patch to do the RequstXLogSwitch() before shutdown 
checkpoint as discussed. It seems safe to me. (sorry for the delay, and 
thanks for the reminder)

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Clean shutdown and warm standby

From
Simon Riggs
Date:
On Thu, 2009-05-28 at 14:04 +0300, Heikki Linnakangas wrote:

> I've committed a patch to do the RequstXLogSwitch() before shutdown 
> checkpoint as discussed. It seems safe to me. (sorry for the delay, and 
> thanks for the reminder)

Not sure if that is a fix that will work in all cases. 

There is a potential timing problem with when the archiver is shutdown:
that may now be fixed in 8.4, see what you think.

Also if archiving is currently stalled, then files will not be
transferred, even if you switch xlogs. So this is at best a partial fix
to the problem and the need for a manual check of file contents
remains. 

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Clean shutdown and warm standby

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2009-05-28 at 14:04 +0300, Heikki Linnakangas wrote:
> 
>> I've committed a patch to do the RequstXLogSwitch() before shutdown 
>> checkpoint as discussed. It seems safe to me. (sorry for the delay, and 
>> thanks for the reminder)
> 
> Not sure if that is a fix that will work in all cases. 
> 
> There is a potential timing problem with when the archiver is shutdown:
> that may now be fixed in 8.4, see what you think.

Can you elaborate?

> Also if archiving is currently stalled, then files will not be
> transferred, even if you switch xlogs. So this is at best a partial fix
> to the problem and the need for a manual check of file contents
> remains. 

Yep. Maybe we should print the filename of the last WAL segment to the 
log at shutdown, so that you can easily check that you have everything 
in the archive.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Clean shutdown and warm standby

From
Simon Riggs
Date:
On Thu, 2009-05-28 at 16:19 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Thu, 2009-05-28 at 14:04 +0300, Heikki Linnakangas wrote:
> > 
> >> I've committed a patch to do the RequstXLogSwitch() before shutdown 
> >> checkpoint as discussed. It seems safe to me. (sorry for the delay, and 
> >> thanks for the reminder)
> > 
> > Not sure if that is a fix that will work in all cases. 
> > 
> > There is a potential timing problem with when the archiver is shutdown:
> > that may now be fixed in 8.4, see what you think.
> 
> Can you elaborate?

Is the archiver still alive and working after the log switch occurs?

If the archiver is working, but has fallen behind at the point of
shutdown, does the archiver operate for long enough to ensure we are
archived up to the point of the log switch prior to checkpoint?

> > Also if archiving is currently stalled, then files will not be
> > transferred, even if you switch xlogs. So this is at best a partial fix
> > to the problem and the need for a manual check of file contents
> > remains. 
> 
> Yep. Maybe we should print the filename of the last WAL segment to the 
> log at shutdown, so that you can easily check that you have everything 
> in the archive.

You still need a script to read that and synchronize file contents.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Clean shutdown and warm standby

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2009-05-28 at 16:19 +0300, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> On Thu, 2009-05-28 at 14:04 +0300, Heikki Linnakangas wrote:
>>>
>>>> I've committed a patch to do the RequstXLogSwitch() before shutdown 
>>>> checkpoint as discussed. It seems safe to me. (sorry for the delay, and 
>>>> thanks for the reminder)
>>> Not sure if that is a fix that will work in all cases. 
>>>
>>> There is a potential timing problem with when the archiver is shutdown:
>>> that may now be fixed in 8.4, see what you think.
>> Can you elaborate?
> 
> Is the archiver still alive and working after the log switch occurs?

Yes.

> If the archiver is working, but has fallen behind at the point of
> shutdown, does the archiver operate for long enough to ensure we are
> archived up to the point of the log switch prior to checkpoint?

Yes, it archives all pending WAL segments before exiting.

Ok, we're good then I guess.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Clean shutdown and warm standby

From
Simon Riggs
Date:
On Thu, 2009-05-28 at 16:52 +0300, Heikki Linnakangas wrote:

> > If the archiver is working, but has fallen behind at the point of
> > shutdown, does the archiver operate for long enough to ensure we are
> > archived up to the point of the log switch prior to checkpoint?
> 
> Yes, it archives all pending WAL segments before exiting.

I don't think it does, please look again. 

> Ok, we're good then I guess.

No, because as I said, if archive_command has been returning non-zero
then the archive will be incomplete.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Clean shutdown and warm standby

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2009-05-28 at 16:52 +0300, Heikki Linnakangas wrote:
> 
>>> If the archiver is working, but has fallen behind at the point of
>>> shutdown, does the archiver operate for long enough to ensure we are
>>> archived up to the point of the log switch prior to checkpoint?
>> Yes, it archives all pending WAL segments before exiting.
> 
> I don't think it does, please look again. 

Still looks ok to me. pgarch_ArchiverCopyLoop() loops until all ready 
WAL segments have been archived (assuming no errors).

>> Ok, we're good then I guess.
> 
> No, because as I said, if archive_command has been returning non-zero
> then the archive will be incomplete.

Yes. You think that's wrong? How would you like it to behave, then? I 
don't think you want the shutdown to wait indefinitely until all files 
have been archived if there's an error.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Clean shutdown and warm standby

From
Simon Riggs
Date:
On Thu, 2009-05-28 at 17:21 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > 
> > I don't think it does, please look again. 
> 
> Still looks ok to me. pgarch_ArchiverCopyLoop() loops until all ready 
> WAL segments have been archived (assuming no errors).

No, it doesn't now, though it did used to. line 440.

> >> Ok, we're good then I guess.
> > 
> > No, because as I said, if archive_command has been returning non-zero
> > then the archive will be incomplete.
> 
> Yes. You think that's wrong? How would you like it to behave, then? I 
> don't think you want the shutdown to wait indefinitely until all files 
> have been archived if there's an error.

The complaint was that we needed to run a manual step to synchronise the
pg_xlog directory on the standby. We still need to do that, even after
the patch has been committed because 2 cases are not covered, so what is
the point of the recent change? It isn't enough. It *might* be enough,
most of the time, but you have no way of knowing that is the case and it
is dangerous not to check.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Clean shutdown and warm standby

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2009-05-28 at 17:21 +0300, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> I don't think it does, please look again. 
>> Still looks ok to me. pgarch_ArchiverCopyLoop() loops until all ready 
>> WAL segments have been archived (assuming no errors).
> 
> No, it doesn't now, though it did used to. line 440.

postmaster never sends SIGTERM to pgarch, and postmaster is still alive.

>>>> Ok, we're good then I guess.
>>> No, because as I said, if archive_command has been returning non-zero
>>> then the archive will be incomplete.
>> Yes. You think that's wrong? How would you like it to behave, then? I 
>> don't think you want the shutdown to wait indefinitely until all files 
>> have been archived if there's an error.
> 
> The complaint was that we needed to run a manual step to synchronise the
> pg_xlog directory on the standby. We still need to do that, even after
> the patch has been committed because 2 cases are not covered, so what is
> the point of the recent change? It isn't enough. It *might* be enough,
> most of the time, but you have no way of knowing that is the case and it
> is dangerous not to check.

So you check. This solves Guillaume's immediate concern. If you have a 
suggestion for further improvements, I'm all ears.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Clean shutdown and warm standby

From
Andreas Pflug
Date:
Simon Riggs wrote:
>>>
>>> No, because as I said, if archive_command has been returning non-zero
>>> then the archive will be incomplete.
>>>       
>> Yes. You think that's wrong? How would you like it to behave, then? I 
>> don't think you want the shutdown to wait indefinitely until all files 
>> have been archived if there's an error.
>>     
>
> The complaint was that we needed to run a manual step to synchronise the
> pg_xlog directory on the standby. We still need to do that, even after
> the patch has been committed because 2 cases are not covered, so what is
> the point of the recent change? It isn't enough. It *might* be enough,
> most of the time, but you have no way of knowing that is the case and it
> is dangerous not to check.
>   
If archiving has stalled, it's not a clean shutdown anyway and I
wouldn't expect the wal archive to be automatically complete. I'd still
appreciate a warning that while the shutdown appeared regular, wal
wasn't written completely. But the "corner case" of shutting down a
smoothly running server, the wal archive archive should be complete as well.

Regards,
Andreas



Re: Clean shutdown and warm standby

From
Guillaume Smet
Date:
On Thu, May 28, 2009 at 5:02 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> So you check. This solves Guillaume's immediate concern. If you have a
> suggestion for further improvements, I'm all ears.

Thanks for applying the patch.

Yes, the problem is that before this change, even with a working
replication and a clean shutdown, you still had to replicate the last
WAL file by hand. Personnally, I have an eye on each postgresql log
file when I switch from one server to another so I can see if anything
is going wrong (that said, it could be a problem with more than 2
servers...).

This patch just fixes this problem not the other concerns and corner
cases we might have. If we want to go further, we need to agree on
what we want exactly and which corner cases we want to cover but it's
probably 8.5 material at this point.

-- 
Guillaume


Re: Clean shutdown and warm standby

From
Simon Riggs
Date:
On Thu, 2009-05-28 at 17:16 +0200, Guillaume Smet wrote:
> On Thu, May 28, 2009 at 5:02 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
> > So you check. This solves Guillaume's immediate concern. If you have a
> > suggestion for further improvements, I'm all ears.
> 
> Thanks for applying the patch.
> 
> Yes, the problem is that before this change, even with a working
> replication and a clean shutdown, you still had to replicate the last
> WAL file by hand. Personnally, I have an eye on each postgresql log
> file when I switch from one server to another so I can see if anything
> is going wrong (that said, it could be a problem with more than 2
> servers...).
> 
> This patch just fixes this problem not the other concerns and corner
> cases we might have. If we want to go further, we need to agree on
> what we want exactly and which corner cases we want to cover but it's
> probably 8.5 material at this point.

Your original post wanted to know "we are sure we have all the useful
XLog files when we perform a clean shutdown of master". The patch does
not solve the problem you stated. 

You may consider it useful, but a manual check or script execution must
still happen. 

If you feel we have moved forwards, that's good, but since no part of
the *safe* maintenance procedure has changed, I don't see that myself.
Only the unsafe way of doing it got faster.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Clean shutdown and warm standby

From
Guillaume Smet
Date:
On Thu, May 28, 2009 at 5:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> If you feel we have moved forwards, that's good, but since no part of
> the *safe* maintenance procedure has changed, I don't see that myself.
> Only the unsafe way of doing it got faster.

I disagree with you.

The situation was:
- you stop the master;
- everything seems to be OK in the log files (archiving and so on);
- it's broken anyway as you don't have the last log file;
- you have to copy the last log file manually.
- you can start the slave.

It is now:
- you stop the master;
- if everything is OK in the log files, the last log file has been
archived (and yes I check it manually too) and it's done. If not (and
it's the exception, not the rule) I have to copy manually the missing
WAL files;
- you can start the slave.

I think it's a step forward, maybe not sufficient for you but I prefer
the situation now than before. It's safer because of the principle of
least surprise: I'm pretty sure a lot of people didn't even think that
the last WAL file was systematically missing.

As Heikki stated it, if you have concrete proposals of how we can fix
the other corner cases, we're all ears. Considering my current level
of knowledge, that's all I can do by myself.

IMHO, that's something that needs to be treated in the massive
replication work planned for 8.5.

-- 
Guillaume


Re: Clean shutdown and warm standby

From
Simon Riggs
Date:
On Thu, 2009-05-28 at 17:50 +0200, Guillaume Smet wrote:

> I think it's a step forward, maybe not sufficient for you but I prefer
> the situation now than before. It's safer because of the principle of
> least surprise: I'm pretty sure a lot of people didn't even think that
> the last WAL file was systematically missing.

If I hadn't spoken out, I think you would have assumed you were safe and
so would everybody else. Time is saved only if you perform the step
manually - if time saving was your objective you should have been using
a script in the first place. If you're using a script, carry on using
it: nothing has changed, you still need to check.

> As Heikki stated it, if you have concrete proposals of how we can fix
> the other corner cases, we're all ears. Considering my current level
> of knowledge, that's all I can do by myself.

I'm not sure there is a solution even. Fixing a broken archive_command
is not something PostgreSQL can achieve, by definition.

It's good you submitted a patch, I have no problem there, BTW, but
applying a patch during beta, should either fix the problem or not be
applied at all.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Clean shutdown and warm standby

From
Guillaume Smet
Date:
On Thu, May 28, 2009 at 6:06 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Thu, 2009-05-28 at 17:50 +0200, Guillaume Smet wrote:
>
>> I think it's a step forward, maybe not sufficient for you but I prefer
>> the situation now than before. It's safer because of the principle of
>> least surprise: I'm pretty sure a lot of people didn't even think that
>> the last WAL file was systematically missing.
>
> If I hadn't spoken out, I think you would have assumed you were safe and
> so would everybody else. Time is saved only if you perform the step
> manually - if time saving was your objective you should have been using
> a script in the first place. If you're using a script, carry on using
> it: nothing has changed, you still need to check.

You might think that but I won't have. I will still monitor my log
files carefully and check the last WAL file is received and treated on
the slave as I currently do.

I prefer checking it visually than using a script.

At least, now, I have a chance to have it working without a manual intervention.

> It's good you submitted a patch, I have no problem there, BTW, but
> applying a patch during beta, should either fix the problem or not be
> applied at all.

Well, I don't think we'll agree on that. Anyway, have a nice day :).

-- 
Guillaume


Re: Clean shutdown and warm standby

From
Simon Riggs
Date:
On Thu, 2009-05-28 at 18:02 +0300, Heikki Linnakangas wrote:

> postmaster never sends SIGTERM to pgarch, and postmaster is still alive.

Then we have a regression, since we changed the code to make sure the
archiver did shutdown even if there was a backlog. The reason is that if
there is a long backlog at the time we restart we may cause a long
outage while we wait for the archiver to shutdown, before postmaster
restarts.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Clean shutdown and warm standby

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Thu, 2009-05-28 at 18:02 +0300, Heikki Linnakangas wrote:
> 
>> postmaster never sends SIGTERM to pgarch, and postmaster is still alive.
> 
> Then we have a regression, since we changed the code to make sure the
> archiver did shutdown even if there was a backlog.

The commit message of the commit that introduced the check for SIGTERM says:

"
Also, modify the archiver process to notice SIGTERM and refuse to issue any
more archive commands if it gets it.  The postmaster doesn't ever send it
SIGTERM; we assume that any such signal came from init and is a notice of
impending whole-system shutdown.  In this situation it seems imprudent 
to try
to start new archive commands --- if they aren't extremely quick they're
likely to get SIGKILL'd by init.
"

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Clean shutdown and warm standby

From
Simon Riggs
Date:
On Thu, 2009-05-28 at 19:58 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Thu, 2009-05-28 at 18:02 +0300, Heikki Linnakangas wrote:
> > 
> >> postmaster never sends SIGTERM to pgarch, and postmaster is still alive.
> > 
> > Then we have a regression, since we changed the code to make sure the
> > archiver did shutdown even if there was a backlog.
> 
> The commit message of the commit that introduced the check for SIGTERM says:
> 
> "
> Also, modify the archiver process to notice SIGTERM and refuse to issue any
> more archive commands if it gets it.  The postmaster doesn't ever send it
> SIGTERM; we assume that any such signal came from init and is a notice of
> impending whole-system shutdown.  In this situation it seems imprudent 
> to try
> to start new archive commands --- if they aren't extremely quick they're
> likely to get SIGKILL'd by init.
> "

Sounds great, but what does that mean exactly?

The same commit message also says:

"The new behavior is that the archiver is allowed to run unmolested
until the bgwriter has exited; then it is sent SIGUSR2 to tell it to do
a final archiving cycle and quit."

Thus there is no guarantee that this is sufficient to have archived all
the files you would like to archive. The patch does not provide a clean
shutdown in all cases and since you don't know what state its in, you
are still forced to take external action to be safe, exactly as you do
already. 

If I didn't already say, I came up with exactly the same solution 2
years ago and then later explained it didn't work in all cases. I'm
saying the same thing again here now.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Clean shutdown and warm standby

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> Thus there is no guarantee that this is sufficient to have archived all
> the files you would like to archive. The patch does not provide a clean
> shutdown in all cases and since you don't know what state its in, you
> are still forced to take external action to be safe, exactly as you do
> already. 

> If I didn't already say, I came up with exactly the same solution 2
> years ago and then later explained it didn't work in all cases. I'm
> saying the same thing again here now.

What's your point?  Surely the applied patch is a *necessary* component
of any attempt to try to ensure archiving is complete at shutdown.
I agree that it doesn't cover every risk factor, and there are some
risk factors that cannot be covered by Postgres itself.  But isn't it
a step in a desirable direction?
        regards, tom lane


Re: Clean shutdown and warm standby

From
Simon Riggs
Date:
On Thu, 2009-05-28 at 18:09 -0400, Tom Lane wrote:

> What's your point? Surely the applied patch is a *necessary* component
> of any attempt to try to ensure archiving is complete at shutdown.
> I agree that it doesn't cover every risk factor, and there are some
> risk factors that cannot be covered by Postgres itself.  But isn't it
> a step in a desirable direction?

Well, in one way, yes. I certainly encourage Guillaume to submit more
patches and for everybody to review them, as has been done. I turned up
late to the party on this, I know.

Regrettably, the patch doesn't remove the problem it was supposed to
remove and I'm highlighting there is still risk of data loss. I suggest
that we don't change any docs, and carefully word or even avoid any
release note inclusion to avoid lulling people into stopping safety
measures. 

The patch doesn't cause any problems though so we don't need to remove
it either.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Clean shutdown and warm standby

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> Regrettably, the patch doesn't remove the problem it was supposed to
> remove and I'm highlighting there is still risk of data loss. 

I feel that you're moving the goalposts. What exactly is the problem it 
was supposed to remove in your opinion?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Clean shutdown and warm standby

From
Robert Haas
Date:
On Fri, May 29, 2009 at 2:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Regrettably, the patch doesn't remove the problem it was supposed to
> remove and I'm highlighting there is still risk of data loss. I suggest
> that we don't change any docs, and carefully word or even avoid any
> release note inclusion to avoid lulling people into stopping safety
> measures.

I think it's pretty clear that you and the OP are talking about two
different problems.  To quote Guillaume:

"Yes, the problem is that before this change, even with a working
replication and a clean shutdown, you still had to replicate the last
WAL file by hand."

I think that's a pretty legitimate complaint.  You seem to that this
wasn't worth fixing at this point in the development cycle, because it
was always possible to write a script to copy that last WAL file by
hand.  That's a judgment call, of course, and you are entitled to your
own opinion on the topic, but that doesn't mean that the complaint, as
defined by the person complaining, hasn't been fixed.

...Robert


Re: Clean shutdown and warm standby

From
Simon Riggs
Date:
On Fri, 2009-05-29 at 21:46 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > Regrettably, the patch doesn't remove the problem it was supposed to
> > remove and I'm highlighting there is still risk of data loss. 
> 
> I feel that you're moving the goalposts. What exactly is the problem it 
> was supposed to remove in your opinion?

I feel that you wish to argue this endlessly so that my point is lost
and the threat of data loss that was left exposed is forgotten.

I'm happy that I understand those threats and will advise my clients
accordingly. I've tried to help the community, but in the end, there is
a point where I stop trying to do so. Now, in fact.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Clean shutdown and warm standby

From
Simon Riggs
Date:
On Fri, 2009-05-29 at 14:54 -0400, Robert Haas wrote:
> On Fri, May 29, 2009 at 2:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > Regrettably, the patch doesn't remove the problem it was supposed to
> > remove and I'm highlighting there is still risk of data loss. I suggest
> > that we don't change any docs, and carefully word or even avoid any
> > release note inclusion to avoid lulling people into stopping safety
> > measures.
> 
> I think it's pretty clear that you and the OP are talking about two
> different problems.  To quote Guillaume:
> 
> "Yes, the problem is that before this change, even with a working
> replication and a clean shutdown, you still had to replicate the last
> WAL file by hand."
> 
> I think that's a pretty legitimate complaint.  

It's valid complaint, yes, but only for people that do this manually,
which is nobody I ever met, in *production*. (ymmv etc)

> You seem to think that this wasn't worth fixing...

And for them, it hasn't been completely fixed. That point was not made
by patch author or committer, leaving the impression it was now
completely safe, which, I truly regret to say, is not correct.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Clean shutdown and warm standby

From
Simon Riggs
Date:
On Fri, 2009-05-29 at 21:16 -0300, Euler Taveira de Oliveira wrote:
> Simon Riggs escreveu:
> > And for them, it hasn't been completely fixed. That point was not made
> > by patch author or committer, leaving the impression it was now
> > completely safe, which, I truly regret to say, is not correct.
> > 
> Simon, could you point out what the patch does not do? If we can't fix it now,
> at least we add it to TODO.

I already did, on my first post on this thread. I won't repeat myself,
so that we can avoid restarting the discussion; forgive me.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Clean shutdown and warm standby

From
Euler Taveira de Oliveira
Date:
Simon Riggs escreveu:
> And for them, it hasn't been completely fixed. That point was not made
> by patch author or committer, leaving the impression it was now
> completely safe, which, I truly regret to say, is not correct.
> 
Simon, could you point out what the patch does not do? If we can't fix it now,
at least we add it to TODO.


--  Euler Taveira de Oliveira http://www.timbira.com/