Thread: Re: Re: Problem with PITR recovery

Re: Re: Problem with PITR recovery

From
Simon Riggs
Rob Butler <crodster2k@yahoo.com> wrote on 18.04.2005, 15:05:20:
>
> > I'd say it's very not cool :) It's not what we all
> > expected from PITR.
> > I recall now that Simon mentioned this and has it
> > in his TODO.
> > Another thing I don't understand is what the problem is
> > with generating a WAL file on demand. Probably the TODO
> > should mention this.
>
> This would definitely be a good feature to have.  What
> I would prefer is:
>
> 1) have the pitr stop command write out and close the
> WAL that it is currently using.
>
> 2) have another stored proc which can be invoked at
> any time that will write out and close the WAL that is
> currently in use when that command is executed.
>
> 3) have a feature in postgres that will automatically
> write out and close the WAL if the server hasn't had
> any activity in XX minutes, or hasn't closed a WAL
> file in XX minutes.

Yes, I have been working on a design.

1) is required to make PITR better for low transaction rate users.

3) is required to allow standby replication

2) is a standard feature on other DBMS, but I'd have to consider that as
optional.

Anyway, I'll post more in a few hours on this.

Best Regards, Simon Riggs


Re: Problem with PITR recovery

From
Simon Riggs
On Mon, 2005-04-18 at 16:44 +0200, simon@2ndquadrant.com wrote:
> Rob Butler <crodster2k@yahoo.com> wrote on 18.04.2005, 15:05:20:
> > 
> > > I'd say it's very not cool :) It's not what we all
> > > expected from PITR.
> > > I recall now that Simon mentioned this and has it
> > > in his TODO.
> > > Another thing I don't understand is what the problem is
> > > with generating a WAL file on demand. Probably the TODO
> > > should mention this.
> > 
> > This would definitely be a good feature to have.  What
> > I would prefer is:
> > 
> > 1) have the pitr stop command write out and close the
> > WAL that it is currently using.
> > 
> > 2) have another stored proc which can be invoked at
> > any time that will write out and close the WAL that is
> > currently in use when that command is executed.
> > 
> > 3) have a feature in postgres that will automatically
> > write out and close the WAL if the server hasn't had
> > any activity in XX minutes, or hasn't closed a WAL
> > file in XX minutes.
> 
> Yes, I have been working on a design.
> 
> 1) is required to make PITR better for low transaction rate users.
> 
> 3) is required to allow standby replication 
> 
> 2) is a standard feature on other DBMS, but I'd have to consider that as
> optional.

My plan would be to write a special xlog record for xlog switching. This
would be a special processing instruction, rather than a data/redo
instruction. This would be implemented as another xlog info value on
the xlog_redo resource manager function, XLOG_FILE_SWITCH. (xlog_redo
would simply set a variable to be used elsewhere.)

When written (by XLogInsert), the xlog switch instruction would switch to a
new xlog file, just as if a file had been filled, causing the old file to be
immediately archived. On WAL replay, ReadRecord would read the
instruction, then react by moving to the next file, as if it had
naturally reached EOF. 
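To make the replay half of this concrete, here is a minimal Python sketch, not PostgreSQL code. It models each WAL segment as a list of records and treats an invalid record (the hypothetical GARBAGE kind, standing in for stale bytes left after the switch) as the end of WAL. A reader that honours XLOG_FILE_SWITCH continues into the next segment, while one that ignores it stops early. The Record class and every record kind other than XLOG_FILE_SWITCH are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical record kinds; XLOG_FILE_SWITCH mirrors the name proposed above.
XLOG_FILE_SWITCH = "XLOG_FILE_SWITCH"
GARBAGE = "GARBAGE"  # stale bytes left in the unused tail of a segment


@dataclass
class Record:
    kind: str
    payload: str = ""


def replay(segments: List[List[Record]], honour_switch: bool = True) -> List[str]:
    """Return the payloads recovery would apply, in order.

    Anything that is not a valid record (modelled as GARBAGE) ends recovery,
    just as an invalid record does at the real end of WAL.
    """
    applied: List[str] = []
    for segment in segments:
        for rec in segment:
            if rec.kind == XLOG_FILE_SWITCH:
                if honour_switch:
                    break       # behave as if EOF were reached: go to next file
                continue        # an unaware reader just steps over it
            if rec.kind == GARBAGE:
                return applied  # looks like end of WAL: recovery stops here
            applied.append(rec.payload)
    return applied


wal = [
    [Record("INSERT", "a"), Record(XLOG_FILE_SWITCH), Record(GARBAGE)],
    [Record("INSERT", "b"), Record("INSERT", "c")],
]
print(replay(wal))                       # ['a', 'b', 'c']
print(replay(wal, honour_switch=False))  # ['a']
```

Without the switch record the reader cannot tell the stale tail from a genuine end of WAL, which is why the record has to be an explicit instruction rather than something inferred.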

The wal file could be truncated after the log switch record, though I'd
want to make sure that didn't cause other problems. That is additional
functionality that I would add later when the above all works...

That would be initiated through a single function pg_walfile_switch()
which would be called from 
1) pg_stop_backup()
2) by user command
3) at a specified timeout within archiver (already built in)
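The timeout case (3) needs a rule for when a switch is worthwhile. Here is a minimal sketch, assuming the archiver can see the current insert position and remembers where and when the last switch happened (all parameter names are hypothetical). It switches only when the timeout has elapsed and some WAL has been written since the last switch, so an idle server does not archive empty segments.

```python
def should_switch(now: float, last_switch_time: float,
                  insert_pos: int, last_switch_pos: int,
                  timeout_s: float) -> bool:
    """Return True when a timed xlog switch is worth doing.

    now / last_switch_time are in seconds; insert_pos / last_switch_pos are
    WAL byte positions (hypothetical inputs for this sketch).
    """
    if insert_pos <= last_switch_pos:
        return False  # idle since the last switch: nothing new to archive
    return now - last_switch_time >= timeout_s


# With a 60 s timeout: new WAL and 90 s elapsed -> switch; idle server -> don't.
print(should_switch(90.0, 0.0, 5000, 4096, 60.0))  # True
print(should_switch(90.0, 0.0, 4096, 4096, 60.0))  # False
```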

A shutdown checkpoint would also have the same effect as an
XLOG_FILE_SWITCH instruction, so that the archiver would be able to copy
away the file. Otherwise, we'd have a problem as to which order to write
the messages in at shutdown time. (Not happy about that bit, so
suggestions welcome...)

I'd suggest this as a backpatch for 8.0.x, when completed. I'll commit
to doing this in time for 8.1, possibly sooner.

Comments?

Best Regards, Simon Riggs





Re: Problem with PITR recovery

From
Tom Lane
Simon Riggs <simon@2ndquadrant.com> writes:
> The wal file could be truncated after the log switch record, though I'd
> want to make sure that didn't cause other problems.

Which it would: that would break WAL file recycling.

> That would be initiated through a single function pg_walfile_switch()
> which would be called from 
> 1) pg_stop_backup()
> 2) by user command
> 3) at a specified timeout within archiver (already built in)

I would really, really, like NOT to have a user command for this.
(If pg_stop_backup does it, that already provides an out for anyone
who thinks they need to invoke it manually.)

> A shutdown checkpoint would also have the same effect as an
> XLOG_FILE_SWITCH instruction, so that the archiver would be able to copy
> away the file.

The archiver is stopped before we do the shutdown, no?

> I'd suggest this as a backpatch for 8.0.x, when completed.

Not a chance --- it's a new feature, not a bug fix, and has substantial
risk of breaking things.
        regards, tom lane


Re: Problem with PITR recovery

From
Simon Riggs
On Mon, 2005-04-18 at 19:21 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > The wal file could be truncated after the log switch record, though I'd
> > want to make sure that didn't cause other problems.
> 
> Which it would: that would break WAL file recycling.

Yeah, there are just too many references to the file length for comfort.

> > That would be initiated through a single function pg_walfile_switch()
> > which would be called from 
> > 1) pg_stop_backup()
> > 2) by user command
> > 3) at a specified timeout within archiver (already built in)
> 
> I would really, really, like NOT to have a user command for this.
> (If pg_stop_backup does it, that already provides an out for anyone
> who thinks they need to invoke it manually.)

Actually, me too. Never saw the need for the Oracle command myself.

> > A shutdown checkpoint would also have the same effect as an
> > XLOG_FILE_SWITCH instruction, so that the archiver would be able to copy
> > away the file.
> 
> The archiver is stopped before we do the shutdown, no?

Currently, the bgwriter issues the Shutdown checkpoint and the archiver
is always stopped after the bgwriter has issued the checkpoint and quit.
It should be possible to send the archiver a signal to attempt any remaining
archiving before shutdown.

Of course, this behaviour would only be initiated when
XLogArchivingActive() is true, since it makes no sense otherwise.

> > I'd suggest this as a backpatch for 8.0.x, when completed.
> 
> Not a chance --- it's a new feature, not a bug fix, and has substantial
> risk of breaking things.

No problem for me personally; I only requested it, according to users'
wishes.

Best Regards, Simon Riggs



Re: Problem with PITR recovery

From
Bruce Momjian
Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > The wal file could be truncated after the log switch record, though I'd
> > want to make sure that didn't cause other problems.
> 
> Which it would: that would break WAL file recycling.

Good point. I don't see non-full WAL archiving as a problem for the
backup or shutdown, but I do see an issue with doing archives every X
seconds.  If someone sets that really low (and someone will) we could
easily fill the disk.  However, rather than do it ourselves, maybe we
should make it visible to administrators so they know exactly what is
happening and can undo it in case they need to recover, something like:

archive_command = 'gzip <%p >%f'

so the compression is done in a way that is visible to the
administrator.
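As written, the destination %f has no directory, so the compressed file would land in the server's working directory (the data directory); a real setup would presumably name an archive location. Here is a minimal Python sketch of what such a command might do, assuming a hypothetical archive directory and a `.gz` suffix. To serve as archive_command it would still need a few lines of argument handling that pass %p, %f and the archive path in and exit with the returned status.

```python
import gzip
import os
import shutil


def archive_segment(src_path: str, file_name: str, archive_dir: str) -> int:
    """Compress one WAL segment into archive_dir; return a shell-style status.

    src_path and file_name play the roles of %p and %f. The server treats a
    zero exit status from archive_command as success, so this returns 0 only
    once the compressed copy is complete.
    """
    dest = os.path.join(archive_dir, file_name + ".gz")
    if os.path.exists(dest):
        return 1  # refuse to overwrite a segment that is already archived
    tmp = dest + ".tmp"
    # compresslevel=6 matches the gzip command-line default.
    with open(src_path, "rb") as src, gzip.open(tmp, "wb", compresslevel=6) as dst:
        shutil.copyfileobj(src, dst)
    os.rename(tmp, dest)  # the final name only ever refers to a whole file
    return 0
```

Writing to a temporary name first means a crash part-way through never leaves a truncated file under the name recovery would look for.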

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Problem with PITR recovery

From
Oleg Bartunov
On Tue, 19 Apr 2005, Simon Riggs wrote:

>>> I'd suggest this as a backpatch for 8.0.x, when completed.
>>
>> Not a chance --- it's a new feature, not a bug fix, and has substantial
>> risk of breaking things.
>
> No problem for me personally; I only requested it, according to users'
> wishes.

Users want a deterministic procedure for online backup. At the least, it
should be clearly documented and explained.


>
> Best Regards, Simon Riggs
>
>
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: Problem with PITR recovery

From
Simon Riggs
On Mon, 2005-04-18 at 21:25 -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> > > The wal file could be truncated after the log switch record, though I'd
> > > want to make sure that didn't cause other problems.
> > 
> > Which it would: that would break WAL file recycling.
> 
> Good point. I don't see non-full WAL archiving as a problem for the
> backup or shutdown, but I do see an issue with doing archives every X
> seconds.  If someone sets that really low (and someone will) we could
> easily fill the disk.  

The disk would only fill if the archiver doesn't keep up with
transmitting xlog files to the archive. The archive can fill up if it is
not correctly sized, even now. Switching log files every N seconds would
at least give a very predictable archive sizing calculation which should
actually work against users sizing their archives poorly.
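Simon's point can be put as a rate comparison. Here is a back-of-envelope Python sketch, assuming a steady switch interval and a fixed time to ship each segment (both hypothetical parameters). It ignores bursts and retries, so it only illustrates when pg_xlog grows, not exactly by how much.

```python
def backlog_after(seconds: float, switch_interval_s: float,
                  archive_s_per_segment: float) -> float:
    """Segments still waiting in pg_xlog after `seconds`.

    Fluid approximation: the server closes one segment every
    switch_interval_s, and the archiver needs archive_s_per_segment to ship
    each one. The backlog only grows when shipping is slower than closing.
    """
    produced = seconds / switch_interval_s
    shippable = seconds / archive_s_per_segment
    return max(0.0, produced - shippable)


print(backlog_after(3600, 5, 2))   # 0.0: the archiver keeps up
print(backlog_after(3600, 5, 10))  # 360.0 segments (about 5.6 GB of 16 MB files) queued
```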

> However, rather than do it ourselves, maybe we
> should make it visible to administrators so they know exactly what is
> happening and can undo it in case they need to recover, something like:
> 
> 
>     archive_command = 'gzip <%p >%f'
> 
> so the compression is done in a way that is visible to the
> administrator.

As long as we tell them there's more than one way to do it. Many tape
drives offer hardware compression, for example, so there would be no
gain in doing this twice.

Best Regards, Simon Riggs



Re: Problem with PITR recovery

From
Bruce Momjian
Simon Riggs wrote:
> On Mon, 2005-04-18 at 21:25 -0400, Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Simon Riggs <simon@2ndquadrant.com> writes:
> > > > The wal file could be truncated after the log switch record, though I'd
> > > > want to make sure that didn't cause other problems.
> > > 
> > > Which it would: that would break WAL file recycling.
> > 
> > Good point. I don't see non-full WAL archiving as a problem for the
> > backup or shutdown, but I do see an issue with doing archives every X
> > seconds.  If someone sets that really low (and someone will) we could
> > easily fill the disk.  
> 
> The disk would only fill if the archiver doesn't keep up with
> transmitting xlog files to the archive. The archive can fill up if it is
> not correctly sized, even now. Switching log files every N seconds would
> at least give a very predictable archive sizing calculation which should
> actually work against users sizing their archives poorly.

I was thinking of the archiver filling because of lots of almost-empty
16mb files.  If you archive every five seconds, it is 11 Gigs/hour,
which is not too bad, I guess, but I would bet compression would save
space and I/O load too.
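The figure follows from simple arithmetic: 3600 / 5 = 720 segments per hour, and 720 x 16 MB = 11,520 MB, about 11.25 GB. Here is a small Python helper that reproduces it, assuming the 16 MB default segment size and no compression. The segment_mb parameter shows how a smaller compile-time segment size would change the result.

```python
SECONDS_PER_HOUR = 3600


def archive_mb_per_hour(switch_interval_s: float, segment_mb: int = 16) -> float:
    """Uncompressed archive growth if a full-size segment file is shipped
    every switch_interval_s, however little WAL it actually contains."""
    segments_per_hour = SECONDS_PER_HOUR / switch_interval_s
    return segments_per_hour * segment_mb


mb = archive_mb_per_hour(5)
print(mb, mb / 1024)                         # 11520.0 11.25
print(archive_mb_per_hour(5, segment_mb=1))  # 720.0 with 1 MB segments
```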

> > However, rather than do it ourselves, maybe we
> > should make it visible to administrators so they know exactly what is
> > happening and can undo it in case they need to recover, something like:
> > 
> > 
> >     archive_command = 'gzip <%p >%f'
> > 
> > so the compression is done in a way that is visible to the
> > administrator.
> 
> As long as we tell them there's more than one way to do it. Many tape
> drives offer hardware compression, for example, so there would be no
> gain in doing this twice.

Good point.  I am thinking 'gzip --fast' would be the best option for
copies to another file system.  I see about 0.6 seconds to compress a
16mb WAL file here and I get 16x compression.
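The speed/ratio trade-off behind choosing gzip --fast can be explored with a Python sketch using zlib, which implements the same deflate algorithm gzip uses. The synthetic segment is an assumption: 1 MB of random bytes for the used part and zeros for the rest. Recycled segments actually hold stale WAL rather than zeros, so the printed numbers will not match a real WAL file and are no substitute for measuring your own segments.

```python
import os
import time
import zlib

SEGMENT_BYTES = 16 * 1024 * 1024


def synthetic_segment(used_bytes: int) -> bytes:
    """Stand-in segment: random (incompressible) bytes for the used part,
    zeros for the rest. Only an illustration, not real WAL content."""
    return os.urandom(used_bytes) + bytes(SEGMENT_BYTES - used_bytes)


def compression_report(data: bytes, level: int) -> tuple:
    """Return (compression ratio, seconds taken) for one deflate pass."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    return len(data) / len(packed), time.perf_counter() - start


segment = synthetic_segment(1024 * 1024)  # 1 MB "used", 15 MB unused
for level in (1, 6):  # 1 is gzip --fast, 6 is gzip's default level
    ratio, secs = compression_report(segment, level)
    print(f"level {level}: {ratio:.1f}x in {secs:.2f}s")
```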



Re: Problem with PITR recovery

From
Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I was thinking of the archiver filling because of lots of almost-empty
> 16mb files.  If you archive every five seconds, it is 11 Gigs/hour,
> which is not too bad, I guess, but I would bet compression would save
> space and I/O load too.

If you wanted to archive every few seconds, it would be worth cutting
the size of the segment files.  At the moment I believe the segment
size is a pg_config_manual.h configuration item.  Not sure if it would
be practical to make it run-time configurable, but in any case doing that
would help a lot for people who want short archive cycles.

But really, if that is the concern, I'd think you'd want Slony or some
other near-real-time replication mechanism.  PITR is designed for people
for whom some-small-number-of-minutes is close enough.
        regards, tom lane


Re: Problem with PITR recovery

From
Alvaro Herrera
On Tue, Apr 19, 2005 at 11:05:32AM -0400, Bruce Momjian wrote:
> Simon Riggs wrote:

> > The disk would only fill if the archiver doesn't keep up with
> > transmitting xlog files to the archive. The archive can fill up if it is
> > not correctly sized, even now. Switching log files every N seconds would
> > at least give a very predictable archive sizing calculation which should
> > actually work against users sizing their archives poorly.
> 
> I was thinking of the archiver filling because of lots of almost-empty
> 16mb files.  If you archive every five seconds, it is 11 Gigs/hour,
> which is not too bad, I guess, but I would bet compression would save
> space and I/O load too.

I suggested back then that some command to replace an archive could be
provided.  So some people could use rsync to update the older version of
the XLog file to the new state.  Non-rsync enabled people could use a
temporary file to copy the new file, and then rename to the original
XLog name, substituting the older version.  And as a third way, maybe we
can come up with a sort-of-xdelta that would only update the yet-unused
portion of the old xlog file to the new content.  (Maybe this could be
made to work with tape.)

Everyone here said that there was no need for such a thing because it
would complicate matters.
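Two of the replacement strategies described above can be sketched in a few lines of Python; rsync is left out. The function names are hypothetical, and the tail update assumes the archived copy is a prefix of the newer file, which is what a partially filled segment archived early would look like.

```python
import os


def replace_atomically(path: str, new_content: bytes) -> None:
    """Write the new version beside the old one, then rename it into place,
    so a reader sees either the old file or the new one, never a mix."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(new_content)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic when both names are on the same filesystem


def first_difference(old: bytes, new: bytes, chunk: int = 8192) -> int:
    """Offset of the first byte where old and new differ."""
    limit = min(len(old), len(new))
    pos = 0
    while pos + chunk <= limit and old[pos:pos + chunk] == new[pos:pos + chunk]:
        pos += chunk  # skip whole matching chunks
    while pos < limit and old[pos] == new[pos]:
        pos += 1
    return pos


def update_tail(path: str, new_content: bytes) -> int:
    """Rewrite only the part of the archived copy that changed.

    Returns the offset rewriting started from.
    """
    with open(path, "r+b") as f:
        old = f.read()
        offset = first_difference(old, new_content)
        f.seek(offset)
        f.write(new_content[offset:])
        f.truncate()
    return offset
```

update_tail writes less, but it is not atomic: a crash mid-write leaves a file that is neither version, which the rename approach avoids at the cost of writing the whole file again.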

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)


Re: Problem with PITR recovery

From
Bruce Momjian
Alvaro Herrera wrote:
> On Tue, Apr 19, 2005 at 11:05:32AM -0400, Bruce Momjian wrote:
> > Simon Riggs wrote:
> 
> > > The disk would only fill if the archiver doesn't keep up with
> > > transmitting xlog files to the archive. The archive can fill up if it is
> > > not correctly sized, even now. Switching log files every N seconds would
> > > at least give a very predictable archive sizing calculation which should
> > > actually work against users sizing their archives poorly.
> > 
> > I was thinking of the archiver filling because of lots of almost-empty
> > 16mb files.  If you archive every five seconds, it is 11 Gigs/hour,
> > which is not too bad, I guess, but I would bet compression would save
> > space and I/O load too.
> 
> I suggested back then that some command to replace an archive could be
> provided.  So some people could use rsync to update the older version of
> the XLog file to the new state.  Non-rsync enabled people could use a
> temporary file to copy the new file, and then rename to the original
> XLog name, substituting the older version.  And as a third way, maybe we
> can come up with a sort-of-xdelta that would only update the yet-unused
> portion of the old xlog file to the new content.  (Maybe this could be
> made to work with tape.)
> 
> Everyone here said that there was no need for such a thing because it
> would complicate matters.

I do think we are going to need to go in that direction.  I think the
problem is that we didn't have enough time to come up with a clear
solution to this problem so we delayed it for 8.1.

I agree the idea of overwriting is a nice idea and works for everything
but a tape drive, so it has to be optional in some way.



Re: Problem with PITR recovery

From
Klaus Naumann
Hi Simon,

> Actually, me too. Never saw the need for the Oracle command myself.

It actually has a use. If you want to move your redo logs to a new disk, you
create a new redo log file and then issue an ALTER SYSTEM SWITCH LOGFILE;
to switch to the new logfile. Then you can remove the "old" one
(speaking just of one file for simplification).
Waiting on that event could take ages.

Strictly speaking, this doesn't concern postgresql (yet). But if, in the
future, we support user defined (= changing these parameters while the
db is running) redo log locations, sizes and count, we need a function
to switch the logfile manually. Which I think the pg_stop_backup()
hack is not suitable for.


Re: Problem with PITR recovery

From
Andrew Rawnsley
It is also recommended when creating new standby control files, when
Oracle can't automatically expand the data file capacity on a standby like
it does with a live database. Nothing like seeing the 'Didn't restore XXXX
from sufficiently old backup' message when Oracle is confused (which seems
to be most of the time) about what transactions have been applied where.

This, of course, doesn't matter for postgresql. Thank the gods....

On Apr 20, 2005, at 3:28 AM, Klaus Naumann wrote:

> Hi Simon,
>
>> Actually, me too. Never saw the need for the Oracle command myself.
>
> It actually has a use. If you want to move your redo logs to a new disk, you
> create a new redo log file and then issue an ALTER SYSTEM SWITCH LOGFILE;
> to switch to the new logfile. Then you can remove the "old" one
> (speaking just of one file for simplification).
> Waiting on that event could take ages.
>
> Strictly speaking, this doesn't concern postgresql (yet). But if, in the
> future, we support user defined (= changing these parameters while the
> db is running) redo log locations, sizes and count, we need a function
> to switch the logfile manually. Which I think the pg_stop_backup()
> hack is not suitable for.
>
____________________________

Andrew Rawnsley
Chief Technology Officer
Investor Analytics, LLC
(740) 587-0114
http://www.investoranalytics.com



Re: Problem with PITR recovery

From
Simon Riggs
On Wed, 2005-04-20 at 09:28 +0200, Klaus Naumann wrote:
> 
> > Actually, me too. Never saw the need for the Oracle command myself.
> 
> It actually has a use. If you want to move your redo logs to a new disk, you
> create a new redo log file and then issue an ALTER SYSTEM SWITCH LOGFILE;
> to switch to the new logfile. Then you can remove the "old" one
> (speaking just of one file for simplification).
> Waiting on that event could take ages.
> 
> Strictly speaking, this doesn't concern postgresql (yet). But if, in the
> future, we support user defined (= changing these parameters while the
> db is running) redo log locations, sizes and count, we need a function
> to switch the logfile manually. Which I think the pg_stop_backup()
> hack is not suitable for.

Thanks Klaus - I never tried that online.

We're some way away from functionality for online redo location
migration, I agree. It sounds like we'd still be able to do the log switch
as part of that.

Best Regards, Simon Riggs



Re: Problem with PITR recovery

From
Simon Riggs
On Mon, 2005-04-18 at 23:20 +0100, Simon Riggs wrote:
> My plan would be to write a special xlog record for xlog switching. This
> would be a special processing instruction, rather than a data/redo
> instruction. This would be implemented as another xlog info value on
> the xlog_redo resource manager function, XLOG_FILE_SWITCH. (xlog_redo
> would simply set a variable to be used elsewhere.)
> 
> When written (by XLogInsert), the xlog switch instruction would switch to a
> new xlog file, just as if a file had been filled, causing the old file to be
> immediately archived. 

This has been mostly implemented and posted to PATCHES, though I have a
later patch also. There are some points still to discuss.

Setting the pointer seems to work, but there are 3 pointers, each
protected by a separate lock. All of those are designed to be taken and
held independently.

My understanding is that the correct locking order would be:

WALInsertLock
WALWriteLock
info_lck

XLogInsert uses info_lck first, but then checks everything again once it
acquires WALInsertLock. To switch files, we must ensure that nobody can
insert xlrecs with a record pointer higher than the log switch record.
This is different from checkpoints, where a checkpoint record can
actually occur before records which are logically after it; that must
never happen with a log switch else we'd miss them entirely on wal
replay. 

Next, from XLogInsert with WALInsertLock held, we wait to acquire
WALWriteLock, since an I/O might be in progress currently. When we have
this, we then issue an XLogWrite, during which we update the record
pointer, which is then propagated through to info_lck.

AFAICS this is the only case of unconditionally acquiring all 3 locks.

Do we agree that this is the correct lock sequence, and if it is, do we
think that this leaves open the chance of deadlock at any stage?
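Whether or not this is the right design, the deadlock half of the question comes down to whether every code path takes these locks in one global order. Here is a minimal Python sketch of that discipline, using ordinary threading.Lock objects as stand-ins. In PostgreSQL, info_lck is a spinlock and the other two are lightweight locks, so this models only the ordering, not their behaviour. acquire_in_order is a hypothetical helper.

```python
import threading
from contextlib import ExitStack, contextmanager

# Plain locks as stand-ins for the three PostgreSQL locks, in the proposed order.
WALInsertLock = threading.Lock()
WALWriteLock = threading.Lock()
info_lck = threading.Lock()
LOCK_ORDER = (WALInsertLock, WALWriteLock, info_lck)


@contextmanager
def acquire_in_order(*locks):
    """Take any subset of the locks, always in LOCK_ORDER, releasing in reverse.

    If every code path does this, two threads can never each hold a lock the
    other is waiting for, so lock ordering alone cannot deadlock. It does
    nothing about how long the locks are held.
    """
    wanted = set(locks)
    with ExitStack() as stack:
        for lock in LOCK_ORDER:
            if lock in wanted:
                stack.enter_context(lock)
        yield


counter = 0


def worker() -> None:
    global counter
    for _ in range(1000):
        # Argument order does not matter; acquisition order is always the same.
        with acquire_in_order(info_lck, WALInsertLock, WALWriteLock):
            counter += 1


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000
```

A consistent order rules out deadlock, but holding all three locks at once still serialises every inserter and writer for the duration, which is the concurrency cost of this design.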

> A shutdown checkpoint would also have the same effect as an
> XLOG_FILE_SWITCH instruction, so that the archiver would be able to copy
> away the file. Otherwise, we'd have a problem as to which order to write
> the messages in at shutdown time. (Not happy about that bit, so
> suggestions welcome...)

Treating shutdown checkpoint markers as xlog switches is possible but
gives problems since archive_command is a SUSET variable. On replay we
wouldn't necessarily know whether a shutdown checkpoint was treated as
an xlog switch when it was written, so we'd need to attempt to switch
and look beyond the checkpoint marker, just in case. That makes me
uncomfortable.

Hmmm...

Best Regards, Simon Riggs




Re: Problem with PITR recovery

From
Tom Lane
Simon Riggs <simon@2ndquadrant.com> writes:
> AFAICS this is the only case of unconditionally acquiring all 3 locks.

You just lost me ... I think the above is certainly a bad idea from a
concurrency standpoint, and very possibly a deadlock risk.

In any case you are thinking about it the wrong way.  It is not
LogwrtResult you want to advance, it is the Insert variables that define
what the current WAL buffer page is.

ISTM the correct approach involves having a special case in XLogInsert:
just after inserting an end-of-file record, forcibly advance to the next
buffer, and set it up to be the first page for the next segment rather
than the next page in sequence.  (This is likely best handled as an
extra call to AdvanceXLInsertBuffer that invokes some special-case code
in AdvanceXLInsertBuffer.)  You normally only need the WALInsertLock to
do this.  After that's complete you can release the insert lock, and
then other operations can proceed while you do an XLogFlush to force out
the remaining dirty WAL buffers for the old segment.  Then you're done.
(I think I'd put the XLogFlush in the pg_stop_backup code, not in
XLogInsert proper.)
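The sequence Tom describes can be illustrated with a toy Python model, under stated assumptions. Positions are plain byte offsets, record lengths are made up, and ToyWAL and its methods are hypothetical. The real work of AdvanceXLInsertBuffer (initialising buffer pages, writing out old ones) is reduced to rounding the insert position up to the next segment boundary. The model shows the key property: the switch needs only the insert lock, and the flush happens after that lock is released.

```python
import threading

XLOG_SEG_SIZE = 16 * 1024 * 1024  # default compile-time segment size


class ToyWAL:
    """Toy model of WAL insert and flush positions (byte offsets only)."""

    def __init__(self) -> None:
        self.insert_lock = threading.Lock()  # plays the role of WALInsertLock
        self.write_lock = threading.Lock()   # plays the role of WALWriteLock
        self.insert_pos = 0                  # where the next record will start
        self.flushed_pos = 0                 # WAL before this is on disk

    def insert(self, length: int) -> int:
        """Append a record of `length` bytes; return its end position."""
        with self.insert_lock:
            self.insert_pos += length
            return self.insert_pos

    def insert_switch(self, record_len: int = 32) -> int:
        """Append an end-of-file record, then move the insert position to the
        start of the next segment. Only the insert lock is taken."""
        with self.insert_lock:
            self.insert_pos += record_len
            end_of_switch = self.insert_pos
            # Round up to the next segment boundary: the rest of this segment
            # is never used, and the next record is the first in a new file.
            self.insert_pos = -(-self.insert_pos // XLOG_SEG_SIZE) * XLOG_SEG_SIZE
            return end_of_switch

    def flush(self, upto: int) -> None:
        """Make WAL durable up to `upto`; called after the insert lock is released."""
        with self.write_lock:
            self.flushed_pos = max(self.flushed_pos, upto)


wal = ToyWAL()
wal.insert(100)
end = wal.insert_switch()  # other backends may insert again from here on
wal.flush(end)             # e.g. done from pg_stop_backup, as suggested
print(end, wal.insert(50) == XLOG_SEG_SIZE + 50)  # 132 True
```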
        regards, tom lane


Re: Problem with PITR recovery

From
Tom Lane
Simon Riggs <simon@2ndquadrant.com> writes:
> Treating shutdown checkpoint markers as xlog switches is possible but
> gives problems since archive_command is a SUSET variable. On replay we
> wouldn't necessarily know whether a shutdown checkpoint was treated as
> an xlog switch when it was written, so we'd need to attempt to switch
> and look beyond the checkpoint marker, just in case. That makes me
> uncomfortable.

[ Forgot to respond to this part... ]

I think the only safe way to handle that would be to define a shutdown
checkpoint record as being effectively an end-of-file record ALWAYS,
whether archiving or not.  This would be rather a problem for initdb,
which would go through a new XLOG segment for each of its multiple
calls to a standalone backend --- on the other hand, it's not real
clear why we couldn't fold initdb down to one bootstrap run and one
plain standalone backend run, which'd cut that problem down to the
point of tolerability.

However, this still begs the question of why we are bothering.
I disagree with the goal in this particular case anyhow: I do not
think it's necessary, safe, nor sane for a shutdown to try to archive
the last XLOG segment.  Even if we fixed the xlog mechanism to end the
file there, I really have a problem with the idea that the archiver
should try to start a fresh archiving cycle at shutdown.
        regards, tom lane


Re: Problem with PITR recovery

From
Simon Riggs
On Wed, 2005-04-20 at 15:59 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Treating shutdown checkpoint markers as xlog switches is possible but
> > gives problems since archive_command is a SUSET variable. On replay we
> > wouldn't necessarily know whether a shutdown checkpoint was treated as
> > an xlog switch when it was written, so we'd need to attempt to switch
> > and look beyond the checkpoint marker, just in case. That makes me
> > uncomfortable.
> 
> [ Forgot to respond to this part... ]
> 
> I think the only safe way to handle that would be to define a shutdown
> checkpoint record as being effectively an end-of-file record ALWAYS,
> whether archiving or not.  This would be rather a problem for initdb,
> which would go through a new XLOG segment for each of its multiple
> calls to a standalone backend --- on the other hand, it's not real
> clear why we couldn't fold initdb down to one bootstrap run and one
> plain standalone backend run, which'd cut that problem down to the
> point of tolerability.
> 
> However, this still begs the question of why we are bothering.

That's a big question :-)

> I disagree with the goal in this particular case anyhow: I do not
> think it's necessary, safe, nor sane for a shutdown to try to archive
> the last XLOG segment.  Even if we fixed the xlog mechanism to end the
> file there, I really have a problem with the idea that the archiver
> should try to start a fresh archiving cycle at shutdown.

Right now, I'm happy to leave that part anyhow...

Best Regards, Simon Riggs



Re: Problem with PITR recovery

From
Simon Riggs
On Wed, 2005-04-20 at 15:51 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > AFAICS this is the only case of unconditionally acquiring all 3 locks.
> 
> You just lost me ... I think the above is certainly a bad idea from a
> concurrency standpoint, and very possibly a deadlock risk.

'twas my fear too.

> In any case you are thinking about it the wrong way.  It is not
> LogwrtResult you want to advance, it is the Insert variables that define
> what the current WAL buffer page is.

Yes OK, so that way I don't need the 3 locks. Good.

> ISTM the correct approach involves having a special case in XLogInsert:
> just after inserting an end-of-file record, forcibly advance to the next
> buffer, and set it up to be the first page for the next segment rather
> than the next page in sequence.  (This is likely best handled as an
> extra call to AdvanceXLInsertBuffer that invokes some special-case code
> in AdvanceXLInsertBuffer.)  You normally only need the WALInsertLock to
> do this.  After that's complete you can release the insert lock, and
> then other operations can proceed while you do an XLogFlush to force out
> the remaining dirty WAL buffers for the old segment.  Then you're done.

Good. That's roughly what I'm attempting now, except that I was advancing
the wrong pointer and struggling with (and worried by) the 3-lock problem.

> (I think I'd put the XLogFlush in the pg_stop_backup code, not in
> XLogInsert proper.)

That seems like the way it's done elsewhere.

Best Regards, Simon Riggs





Re: Problem with PITR recovery

From
Bruce Momjian
Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Treating shutdown checkpoint markers as xlog switches is possible but
> > gives problems since archive_command is a SUSET variable. On replay we
> > wouldn't necessarily know whether a shutdown checkpoint was treated as
> > an xlog switch when it was written, so we'd need to attempt to switch
> > and look beyond the checkpoint marker, just in case. That makes me
> > uncomfortable.
> 
> 
> However, this still begs the question of why we are bothering.
> I disagree with the goal in this particular case anyhow: I do not
> think it's necessary, safe, nor sane for a shutdown to try to archive
> the last XLOG segment.  Even if we fixed the xlog mechanism to end the
> file there, I really have a problem with the idea that the archiver
> should try to start a fresh archiving cycle at shutdown.

Doing the archive at server shutdown eliminates one of the "must
document" items, so the system behaves more predictably than it does
now.  It is not required --- it is a usability issue.



Re: Problem with PITR recovery

From
Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> However, this still begs the question of why we are bothering.
>> I disagree with the goal in this particular case anyhow: I do not
>> think it's necessary, safe, nor sane for a shutdown to try to archive
>> the last XLOG segment.  Even if we fixed the xlog mechanism to end the
>> file there, I really have a problem with the idea that the archiver
>> should try to start a fresh archiving cycle at shutdown.

> Doing the archive at server shutdown eliminates one of the "must
> document" items, so the system behaves more predictably than it does
> now.  It is not required --- it is a usability issue.

No, it just replaces a documentation issue with a reliability issue.
We'd have to consider what to say about the prospect that the archiver
is unable to archive that last segment, is kill -9'd by init at some
critical point in the process, etc etc.  I think it's just a bad idea
to promise people that shutting down the postmaster will have any such
effect.
        regards, tom lane


Re: Problem with PITR recovery

From
Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> However, this still begs the question of why we are bothering.
> >> I disagree with the goal in this particular case anyhow: I do not
> >> think it's necessary, safe, nor sane for a shutdown to try to archive
> >> the last XLOG segment.  Even if we fixed the xlog mechanism to end the
> >> file there, I really have a problem with the idea that the archiver
> >> should try to start a fresh archiving cycle at shutdown.
> 
> > Doing the archive at server shutdown eliminates one of the "must
> > document" items, so the system behaves more predictably than it does
> > now.  It is not required --- it is a usability issue.
> 
> No, it just replaces a documentation issue with a reliability issue.
> We'd have to consider what to say about the prospect that the archiver
> is unable to archive that last segment, is kill -9'd by init at some
> critical point in the process, etc etc.  I think it's just a bad idea
> to promise people that shutting down the postmaster will have any such
> effect.

OK, makes sense.  Could we give them a command to archive it before they
shut down?  That would make sense.



Re: Problem with PITR recovery

From
Tom Lane
Date:
> OK, makes sense.  Could we give them a command to archive it before they
> shut down?  That would make sense.

Not if the idea is to be certain you got everything ... I think what we
have to do is document a manual procedure for archiving the last XLOG
file.

But really my question is "what's the use case for this?"  ISTM that
on-line backups are what PITR users want, not something involving
shutting down the postmaster --- and the changes Simon is already making
will be enough to handle those cases.
        regards, tom lane


Re: Problem with PITR recovery

From
"Michael Paesold"
Date:
Tom Lane wrote:

> Bruce Momjian wrote:
>> OK, makes sense.  Could we give them a command to archive it before they
>> shut down?  That would make sense.
>
> Not if the idea is to be certain you got everything ... I think what we
> have to do is document a manual procedure for archiving the last XLOG
> file.

What Bruce would want is a way to "stop new transactions, archive and 
shutdown", which would do this atomically. Then we could have another 
shutdown switch for pg_ctl.

But yeah, documentation of a manual procedure would be OK too, just not
as user-friendly.

Best Regards,
Michael Paesold 



Re: Problem with PITR recovery

From
Bruce Momjian
Date:
Michael Paesold wrote:
> Tom Lane wrote:
> 
> > Bruce Momjian wrote:
> >> OK, makes sense.  Could we give them a command to archive it before they
> >> shut down?  That would make sense.
> >
> > Not if the idea is to be certain you got everything ... I think what we
> > have to do is document a manual procedure for archiving the last XLOG
> > file.
> 
> What Bruce would want is a way to "stop new transactions, archive and 
> shutdown", which would do this atomically. Then we could have another 
> shutdown switch for pg_ctl.

Yeah, probably a separate switch, or an additional switch to pg_ctl would
be best, but then we have to add to pg_ctl.

> But yeah, documentation of a manual procedure would be OK too, just not
> as user-friendly.

Right.  I just hate the 'do this, do that' instructions for PITR.  When
they get too long/complex, I get worried.

I used to use Informix's ontape, which had a bad user interface because
the admin had to make sure it was always running.  Anyway, when you
control-C'ed the process, it would flush out any partially written WAL
file, and you knew you had everything.

I am thinking of a special pg_ctl flag, with -W disabled for it so you
have to wait for the success message.  Of course, we would then have to
document the use of that pg_ctl flag.



Re: Problem with PITR recovery

From
Simon Riggs
Date:
On Thu, 2005-04-21 at 08:57 -0400, Bruce Momjian wrote:
> Michael Paesold wrote:
> > Tom Lane wrote:
> > > Bruce Momjian wrote:
> > >> OK, makes sense.  Could we give them a command to archive it before they
> > >> shut down?  That would make sense.
> > >
> > > Not if the idea is to be certain you got everything ... I think what we
> > > have to do is document a manual procedure for archiving the last XLOG
> > > file.
> > 
> > What Bruce would want is a way to "stop new transactions, archive and 
> > shutdown", which would do this atomically. Then we could have another 
> > shutdown switch for pg_ctl.
> 
> Yeah, probably a separate switch, or an additional switch to pg_ctl would
> be best, but then we have to add to pg_ctl.
> 
> > But yeah, documentation of a manual procedure would be OK too, just not
> > as user-friendly.
> 
> Right.  I just hate the 'do this, do that' instructions for PITR.  When
> they get too long/complex, I get worried.

> I am thinking of a special pg_ctl flag, with -W disabled for it so you
> have to wait for the success message.  Of course, we would then have to
> document the use of that pg_ctl flag.

I'll write the log switch, you decide when/how to invoke it. 

My head hurts.

Best Regards, Simon Riggs