Thread: Re: Re: Problem with PITR recovery
Rob Butler <crodster2k@yahoo.com> wrote on 18.04.2005, 15:05:20: > > > I'd say it's very not cool :) It's not we all > > expected from PITR. > > I recall now Simon mentioned about that and have it > > in his TODO. > > Other thing I don't understand what's the problem to > > generate WAL file > > by demand ? Probably, TODO should says about this. > > This would definetly be a good feature to have. What > I would prefer is: > > 1) have the pitr stop command write out and close the > WAL that it is currently using. > > 2) have another stored proc which can be invoked at > any time that will write out and close the WAL that is > currently in use when that command is executed. > > 3) have a feature in postgres that will automatically > write out and close the WAL if the server hasn't had > any activity in XX minutes, or hasn't closed a WAL > file in XX minutes. Yes, I have been working on a design. 1) is required to make PITR better for low transaction rate users. 3) is required to allow standby replication 2) is a standard feature on other DBMS, but I'd have to consider that as optional. Anyway, I'll post more in a few hours on this. Best Regards, Simon Riggs
On Mon, 2005-04-18 at 16:44 +0200, simon@2ndquadrant.com wrote: > Rob Butler <crodster2k@yahoo.com> wrote on 18.04.2005, 15:05:20: > > > > > I'd say it's very not cool :) It's not we all > > > expected from PITR. > > > I recall now Simon mentioned about that and have it > > > in his TODO. > > > Other thing I don't understand what's the problem to > > > generate WAL file > > > by demand ? Probably, TODO should says about this. > > > > This would definetly be a good feature to have. What > > I would prefer is: > > > > 1) have the pitr stop command write out and close the > > WAL that it is currently using. > > > > 2) have another stored proc which can be invoked at > > any time that will write out and close the WAL that is > > currently in use when that command is executed. > > > > 3) have a feature in postgres that will automatically > > write out and close the WAL if the server hasn't had > > any activity in XX minutes, or hasn't closed a WAL > > file in XX minutes. > > Yes, I have been working on a design. > > 1) is required to make PITR better for low transaction rate users. > > 3) is required to allow standby replication > > 2) is a standard feature on other DBMS, but I'd have to consider that as > optional. My plan would be to write a special xlog record for xlog switching. This would be a special processing instruction, rather than a data/redo instructions. This would be implemented as another xlog info value on the xlog_redo resource manager function, XLOG_FILE_SWITCH. (xlog_redo would simply set a variable to be used elsewhere.) When written the xlog switch instruction (XLogInsert) would switch to a new xlog, just as if a file had been filled, causing it to be immediately archived. On wal replay, ReadRecord would read the instruction, then react by moving to the next file, as if it had naturally reached EOF. The wal file could be truncated after the log switch record, though I'd want to make sure that didn't cause other problems. 
That is additional functionality that I would add later when the above all works...

That would be initiated through a single function pg_walfile_switch()
which would be called from
1) pg_stop_backup()
2) by user command
3) at a specified timeout within archiver (already built in)

A shutdown checkpoint would also have the same effect as an XLOG_FILE_SWITCH instruction, so that the archiver would be able to copy away the file. Otherwise, we'd have a problem as to which order to write the messages in at shutdown time. (Not happy about that bit, so suggestions welcome...)

I'd suggest this as a backpatch for 8.0.x, when completed.

I'll commit to doing this in time for 8.1, possibly sooner.

Comments?

Best Regards, Simon Riggs
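The replay side of this proposal (ReadRecord reacting to a switch record as if it had reached EOF) can be sketched as a toy model in Python. This is not PostgreSQL code: the `Record` class, the `FILE_SWITCH` constant and the list-of-lists representation of segments are all invented for illustration, and the "stale" record only stands in for whatever old bytes a recycled segment file might still hold after the switch record.

```python
from dataclasses import dataclass
from typing import Iterator, List

# Hypothetical record kinds, for illustration only; the real xlog
# record layout is different.
DATA = "DATA"
FILE_SWITCH = "XLOG_FILE_SWITCH"


@dataclass
class Record:
    kind: str
    payload: str = ""


def replay(segments: List[List[Record]]) -> Iterator[str]:
    """Yield data payloads in replay order, honouring switch records."""
    for segment in segments:
        for record in segment:
            if record.kind == FILE_SWITCH:
                # Treat the switch record like EOF: ignore the rest of
                # this segment and continue with the next file.
                break
            yield record.payload


segments = [
    [Record(DATA, "a"), Record(FILE_SWITCH), Record(DATA, "stale")],
    [Record(DATA, "b"), Record(DATA, "c")],
]
print(list(replay(segments)))  # prints ['a', 'b', 'c']
```

Because replay stops at the switch record, nothing after it in the old segment is ever read, which is why the file would not need to be truncated for replay to be correct.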
Simon Riggs <simon@2ndquadrant.com> writes: > The wal file could be truncated after the log switch record, though I'd > want to make sure that didn't cause other problems. Which it would: that would break WAL file recycling. > That would be initiated through a single function pg_walfile_switch() > which would be called from > 1) pg_stop_backup() > 2) by user command > 3) at a specified timeout within archiver (already built in) I would really, really, like NOT to have a user command for this. (If pg_stop_backup does it, that already provides an out for anyone who thinks they need to invoke it manually.) > A shutdown checkpoint would also have the same effect as an > XLOG_FILE_SWITCH instruction, so that the archiver would be able to copy > away the file. The archiver is stopped before we do the shutdown, no? > I'd suggest this as a backpatch for 8.0.x, when completed. Not a chance --- it's a new feature, not a bug fix, and has substantial risk of breaking things. regards, tom lane
On Mon, 2005-04-18 at 19:21 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > The wal file could be truncated after the log switch record, though I'd > > want to make sure that didn't cause other problems. > > Which it would: that would break WAL file recycling. Yeh, there's just too many references to the file length for comfort. > > That would be initiated through a single function pg_walfile_switch() > > which would be called from > > 1) pg_stop_backup() > > 2) by user command > > 3) at a specified timeout within archiver (already built in) > > I would really, really, like NOT to have a user command for this. > (If pg_stop_backup does it, that already provides an out for anyone > who thinks they need to invoke it manually.) Actually, me too. Never saw the need for the Oracle command myself. > > A shutdown checkpoint would also have the same effect as an > > XLOG_FILE_SWITCH instruction, so that the archiver would be able to copy > > away the file. > > The archiver is stopped before we do the shutdown, no? Currently, the bgwriter issues the Shutdown checkpoint and the archiver is always stopped after the bgwriter has issued the checkpoint and quit. It should be possible to send archiver a signal to attempt any remaining archiving before shutdown. Of course, this behaviour would only be initiated when XLogArchivingActive() is true, since it makes no sense otherwise. > > I'd suggest this as a backpatch for 8.0.x, when completed. > > Not a chance --- it's a new feature, not a bug fix, and has substantial > risk of breaking things. No problem for me personally; I only request it, according to users wishes. Best Regards, Simon Riggs
Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > The wal file could be truncated after the log switch record, though I'd > > want to make sure that didn't cause other problems. > > Which it would: that would break WAL file recycling. Good point. I don't see non-full WAL archiving as a problem for the backup or shutdown, but I do see an issue with doing archives every X seconds. If someone sets that really low (and someone will) we could easily fill the disk. However, rather than do it ourselves, maybe we should make it visible to administrators so they know exactly what is happening and can undo it in case they need to recover, something like: archive_command = 'gzip <%p >%f' so the compression is done in a way that is visible to the administrator. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Tue, 19 Apr 2005, Simon Riggs wrote:

>>> I'd suggest this as a backpatch for 8.0.x, when completed.
>>
>> Not a chance --- it's a new feature, not a bug fix, and has substantial
>> risk of breaking things.
>
> No problem for me personally; I only request it, according to users
> wishes.

Users wish for a deterministic online backup procedure. Well, it should at least be clearly documented and explained.

> Best Regards, Simon Riggs

Regards, Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
On Mon, 2005-04-18 at 21:25 -0400, Bruce Momjian wrote: > Tom Lane wrote: > > Simon Riggs <simon@2ndquadrant.com> writes: > > > The wal file could be truncated after the log switch record, though I'd > > > want to make sure that didn't cause other problems. > > > > Which it would: that would break WAL file recycling. > > Good point. I don't see non-full WAL archiving as a problem for the > backup or shutdown, but I do see an issue with doing archives every X > seconds. If someone sets that really low (and someone will) we could > easily fill the disk. The disk would only fill if the archiver doesn't keep up with transmitting xlog files to the archive. The archive can fill up if it is not correctly sized, even now. Switching log files every N seconds would at least give a very predictable archive sizing calculation which should actually work against users sizing their archives poorly. > However, rather than do it ourselves, maybe we > should make it visible to administrators so they know exactly what is > happening and can undo it in case they need to recover, something like: > > > archive_command = 'gzip <%p >%f' > > so the compression is done in a way that is visible to the > administrator. As long as we tell them there's more than one way to do it. Many tape drives offer hardware compression, for example, so there would be no gain in doing this twice. Best Regards, Simon Riggs
Simon Riggs wrote: > On Mon, 2005-04-18 at 21:25 -0400, Bruce Momjian wrote: > > Tom Lane wrote: > > > Simon Riggs <simon@2ndquadrant.com> writes: > > > > The wal file could be truncated after the log switch record, though I'd > > > > want to make sure that didn't cause other problems. > > > > > > Which it would: that would break WAL file recycling. > > > > Good point. I don't see non-full WAL archiving as a problem for the > > backup or shutdown, but I do see an issue with doing archives every X > > seconds. If someone sets that really low (and someone will) we could > > easily fill the disk. > > The disk would only fill if the archiver doesn't keep up with > transmitting xlog files to the archive. The archive can fill up if it is > not correctly sized, even now. Switching log files every N seconds would > at least give a very predictable archive sizing calculation which should > actually work against users sizing their archives poorly. I was thinking of the archiver filling because of lots of almost-empty 16mb files. If you archive every five seconds, it is 11 Gigs/hour, which is not too bad, I guess, but I would bet compression would save space and I/O load too. > > However, rather than do it ourselves, maybe we > > should make it visible to administrators so they know exactly what is > > happening and can undo it in case they need to recover, something like: > > > > > > archive_command = 'gzip <%p >%f' > > > > so the compression is done in a way that is visible to the > > administrator. > > As long as we tell them there's more than one way to do it. Many tape > drives offer hardware compression, for example, so there would be no > gain in doing this twice. Good point. I am thinking 'gzip --fast' would be the best option for copies to another file system. I see about 0.6 seconds to compress a 16mb WAL file here and I get 16x compression. 
--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
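The sizing arithmetic in this message is easy to check. The sketch below assumes one full 16 MB segment is archived per switch interval and treats compression as a flat ratio; the function name and parameters are made up for illustration, and the 16x figure is simply the ratio reported above for one machine, not a general expectation.

```python
SEGMENT_MB = 16  # segment size quoted in the thread


def archive_mb_per_hour(switch_interval_s: float,
                        segment_mb: float = SEGMENT_MB,
                        compression_ratio: float = 1.0) -> float:
    """Archive volume per hour if one segment is shipped per interval."""
    segments_per_hour = 3600 / switch_interval_s
    return segments_per_hour * segment_mb / compression_ratio


raw = archive_mb_per_hour(5)
print(raw, raw / 1024)  # prints 11520.0 11.25
print(archive_mb_per_hour(5, compression_ratio=16))  # prints 720.0
```

So a five-second switch interval gives 720 segments, or about 11 GB, per hour uncompressed, which matches the "11 Gigs/hour" estimate. Smaller segments reduce the figure in the same linear way as compression.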
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I was thinking of the archiver filling because of lots of almost-empty > 16mb files. If you archive every five seconds, it is 11 Gigs/hour, > which is not too bad, I guess, but I would bet compression would save > space and I/O load too. If you wanted to archive every few seconds, it would be worth cutting the size of the segment files. At the moment I believe the segment size is a pg_config_manual.h configuration item. Not sure if it would be practical to make it run-time configurable, but in any case doing that would help a lot for people who want short archive cycles. But really, if that is the concern, I'd think you'd want Slony or some other near-real-time replication mechanism. PITR is designed for people for whom some-small-number-of-minutes is close enough. regards, tom lane
On Tue, Apr 19, 2005 at 11:05:32AM -0400, Bruce Momjian wrote: > Simon Riggs wrote: > > The disk would only fill if the archiver doesn't keep up with > > transmitting xlog files to the archive. The archive can fill up if it is > > not correctly sized, even now. Switching log files every N seconds would > > at least give a very predictable archive sizing calculation which should > > actually work against users sizing their archives poorly. > > I was thinking of the archiver filling because of lots of almost-empty > 16mb files. If you archive every five seconds, it is 11 Gigs/hour, > which is not too bad, I guess, but I would bet compression would save > space and I/O load too. I suggested back then that some command to replace an archive could be provided. So some people could use rsync to update the older version of the XLog file to the new state. Non-rsync enabled people could use a temporary file to copy the new file, and then rename to the original XLog name, substituting the older version. And as a third way, maybe we can come up with a sort-of-xdelta that would only update the yet-unused portion of the old xlog file to the new content. (Maybe this could be made to work with tape.) Everyone here said that there was no need for such a thing because it would complicate matters. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)
Alvaro Herrera wrote: > On Tue, Apr 19, 2005 at 11:05:32AM -0400, Bruce Momjian wrote: > > Simon Riggs wrote: > > > > The disk would only fill if the archiver doesn't keep up with > > > transmitting xlog files to the archive. The archive can fill up if it is > > > not correctly sized, even now. Switching log files every N seconds would > > > at least give a very predictable archive sizing calculation which should > > > actually work against users sizing their archives poorly. > > > > I was thinking of the archiver filling because of lots of almost-empty > > 16mb files. If you archive every five seconds, it is 11 Gigs/hour, > > which is not too bad, I guess, but I would bet compression would save > > space and I/O load too. > > I suggested back then that some command to replace an archive could be > provided. So some people could use rsync to update the older version of > the XLog file to the new state. Non-rsync enabled people could use a > temporary file to copy the new file, and then rename to the original > XLog name, substituting the older version. And as a third way, maybe we > can come up with a sort-of-xdelta that would only update the yet-unused > portion of the old xlog file to the new content. (Maybe this could be > made to work with tape.) > > Everyone here said that there was no need for such a thing because it > would complicate matters. I do think we are going to need to go in that direction. I think the problem is that we didn't have enough time to come up with a clear solution to this problem so we delayed it for 8.1. I agree the idea of overwriting is a nice idea and works for everything but a tape drive, so it has to be optional in some way. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Hi Simon,

> Actually, me too. Never saw the need for the Oracle command myself.

It actually has a use. If you want to move your redo logs to a new disk, you create a new redo log file and then issue an ALTER SYSTEM SWITCH LOGFILE; to switch to the new logfile. Then you can remove the "old" one (speaking of just one file for simplification). Waiting for that switch to happen on its own could take ages.

Strictly speaking, this doesn't concern postgresql (yet). But if, in the future, we support user-defined redo log locations, sizes and count (= changing these parameters while the db is running), we need a function to switch the logfile manually. I don't think the pg_stop_backup() hack is suitable for that.
It is also recommended when creating new standby control files, when Oracle can't automatically expand the data file capacity on a standby like it does with a live database. Nothing like seeing the 'Didn't restore XXXX from sufficiently old backup' message when Oracle is confused (which seems to be most of the time) about what transactions have been applied where.

This, of course, doesn't matter for postgresql. Thank the gods....

On Apr 20, 2005, at 3:28 AM, Klaus Naumann wrote:

> Hi Simon,
>
>> Actually, me too. Never saw the need for the Oracle command myself.
>
> It actually has. If you want to move your redo logs to a new disk, you
> create a new redo log file and then issue a ALTER SYSTEM SWITCH LOGFILE;
> to switch to the new logfile. Then you can remove the "old" one
> (speaking just of one file for simplification).
> Waiting on that event could take ages.
>
> Strictly speaking, this doesn't concern postgresql (yet). But if, at the
> future, we support user defined (= changing these parameters while the
> db is running) redo log locations, sizes and count, we need a function
> to switch the logfile manually. Which I think the pg_stop_backup()
> hack is not suitable for.

____________________________
Andrew Rawnsley
Chief Technology Officer
Investor Analytics, LLC
(740) 587-0114
http://www.investoranalytics.com
On Wed, 2005-04-20 at 09:28 +0200, Klaus Naumann wrote:
> > Actually, me too. Never saw the need for the Oracle command myself.
>
> It actually has. If you want to move your redo logs to a new disk, you
> create a new redo log file and then issue a ALTER SYSTEM SWITCH LOGFILE;
> to switch to the new logfile. Then you can remove the "old" one
> (speaking just of one file for simplification).
> Waiting on that event could take ages.
>
> Strictly speaking, this doesn't concern postgresql (yet). But if, at the
> future, we support user defined (= changing these parameters while the
> db is running) redo log locations, sizes and count, we need a function
> to switch the logfile manually. Which I think the pg_stop_backup()
> hack is not suitable for.

Thanks Klaus - I never tried that online.

We're some way away from functionality for online redo location migration, I agree. Sounds like we'd still be able to do the log switch as part of that.

Best Regards, Simon Riggs
On Mon, 2005-04-18 at 23:20 +0100, Simon Riggs wrote:
> My plan would be to write a special xlog record for xlog switching. This
> would be a special processing instruction, rather than a data/redo
> instructions. This would be implemented as another xlog info value on
> the xlog_redo resource manager function, XLOG_FILE_SWITCH. (xlog_redo
> would simply set a variable to be used elsewhere.)
>
> When written the xlog switch instruction (XLogInsert) would switch to a
> new xlog, just as if a file had been filled, causing it to be
> immediately archived.

This has been mostly implemented and posted to PATCHES, though I have a later patch also. There are some points still to discuss.

Setting the pointer seems to work, but there are 3 pointers, each protected by a separate lock. All of those are designed to be taken and held independently. My understanding is that the correct locking order would be:

WALInsertLock
WALWriteLock
info_lck

XLogInsert uses info_lck first, but then checks everything again once it acquires WALInsertLock.

To switch files, we must ensure that nobody can insert xlrecs with a record pointer higher than the log switch record. This is different from checkpoints, where a checkpoint record can actually occur before records which are logically after it; that must never happen with a log switch, else we'd miss them entirely on wal replay.

Next, from XLogInsert with WALInsertLock held, we wait to acquire WALWriteLock, since an I/O might currently be in progress. When we have this, we then issue an XLogWrite, during which we update the record pointer, which is then propagated through to info_lck.

AFAICS this is the only case of unconditionally acquiring all 3 locks.

Do we agree that this is the correct lock sequence, and if it is, do we think that this leaves open the chance of deadlock at any stage?

> A shutdown checkpoint would also have the same effect as an
> XLOG_FILE_SWITCH instruction, so that the archiver would be able to copy
> away the file. Otherwise, we'd have a problem as to which order to write
> the messages in at shutdown time. (Not happy about that bit, so
> suggestions welcome...)

Treating shutdown checkpoint markers as xlog switches is possible but gives problems since archive_command is a SUSET variable. On replay we wouldn't necessarily know whether a shutdown checkpoint was treated as an xlog switch when it was written, so we'd need to attempt to switch and look beyond the checkpoint marker, just in case. That makes me uncomfortable.

Hmmm...

Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes: > AFAICS this is the only case of unconditionally acquiring all 3 locks. You just lost me ... I think the above is certainly a bad idea from a concurrency standpoint, and very possibly a deadlock risk. In any case you are thinking about it the wrong way. It is not LogwrtResult you want to advance, it is the Insert variables that define what the current WAL buffer page is. ISTM the correct approach involves having a special case in XLogInsert: just after inserting an end-of-file record, forcibly advance to the next buffer, and set it up to be the first page for the next segment rather than the next segment in sequence. (This is likely best handled as an extra call to AdvanceXLInsertBuffer that invokes some special-case code in AdvanceXLInsertBuffer.) You normally only need the WALInsertLock to do this. After that's complete you can release the insert lock, and then other operations can proceed while you do an XLogFlush to force out the remaining dirty WAL buffers for the old segment. Then you're done. (I think I'd put the XLogFlush in the pg_stop_backup code, not in XLogInsert proper.) regards, tom lane
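The pointer arithmetic behind this suggestion can be sketched in Python. The sketch reads the description as: a normal advance moves the insert position to the next page, while the special case after an end-of-file record jumps to the first page of the next segment. It is a simplification under stated assumptions, not the AdvanceXLInsertBuffer code: the WAL position is modelled as a flat byte offset, the 8 kB page size is assumed, and all function names are hypothetical.

```python
SEG_SIZE = 16 * 1024 * 1024  # 16 MB segments, as quoted in the thread
PAGE_SIZE = 8192             # assumed WAL page size for this sketch


def next_page_start(pos: int) -> int:
    """Normal advance: first byte of the page after the one holding pos."""
    return (pos // PAGE_SIZE + 1) * PAGE_SIZE


def next_segment_start(pos: int) -> int:
    """Switch advance: first byte of the segment after the one holding pos."""
    return (pos // SEG_SIZE + 1) * SEG_SIZE


def advance_insert_position(pos: int, switch_segment: bool) -> int:
    """Pick the next insert position; only the switch case skips pages."""
    if switch_segment:
        return next_segment_start(pos)
    return next_page_start(pos)


print(advance_insert_position(100, switch_segment=False))  # prints 8192
print(advance_insert_position(100, switch_segment=True))   # prints 16777216
```

In this model only the insert position changes, which fits the point that WALInsertLock alone should be enough; flushing the old segment's remaining dirty buffers is a separate step done afterwards.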
Simon Riggs <simon@2ndquadrant.com> writes: > Treating shutdown checkpoint markers as xlog switches is possible but > gives problems since archive_command is a SUSET variable. On replay we > wouldn't necessarily know whether a shutdown checkpoint was treated as > an xlog switch when it was written, so we'd need to attempt to switch > and look beyond the checkpoint marker, just in case. That makes me > uncomfortable. [ Forgot to respond to this part... ] I think the only safe way to handle that would be to define a shutdown checkpoint record as being effectively an end-of-file record ALWAYS, whether archiving or not. This would be rather a problem for initdb, which would go through a new XLOG segment for each of its multiple calls to a standalone backend --- on the other hand, it's not real clear why we couldn't fold initdb down to one bootstrap run and one plain standalone backend run, which'd cut that problem down to the point of tolerability. However, this still begs the question of why we are bothering. I disagree with the goal in this particular case anyhow: I do not think it's necessary, safe, nor sane for a shutdown to try to archive the last XLOG segment. Even if we fixed the xlog mechanism to end the file there, I really have a problem with the idea that the archiver should try to start a fresh archiving cycle at shutdown. regards, tom lane
On Wed, 2005-04-20 at 15:59 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Treating shutdown checkpoint markers as xlog switches is possible but
> > gives problems since archive_command is a SUSET variable. On replay we
> > wouldn't necessarily know whether a shutdown checkpoint was treated as
> > an xlog switch when it was written, so we'd need to attempt to switch
> > and look beyond the checkpoint marker, just in case. That makes me
> > uncomfortable.
>
> [ Forgot to respond to this part... ]
>
> I think the only safe way to handle that would be to define a shutdown
> checkpoint record as being effectively an end-of-file record ALWAYS,
> whether archiving or not. This would be rather a problem for initdb,
> which would go through a new XLOG segment for each of its multiple
> calls to a standalone backend --- on the other hand, it's not real
> clear why we couldn't fold initdb down to one bootstrap run and one
> plain standalone backend run, which'd cut that problem down to the
> point of tolerability.
>
> However, this still begs the question of why we are bothering.

That's a big question :-)

> I disagree with the goal in this particular case anyhow: I do not
> think it's necessary, safe, nor sane for a shutdown to try to archive
> the last XLOG segment. Even if we fixed the xlog mechanism to end the
> file there, I really have a problem with the idea that the archiver
> should try to start a fresh archiving cycle at shutdown.

Right now, I'm happy to leave that part anyhow...

Best Regards, Simon Riggs
On Wed, 2005-04-20 at 15:51 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > AFAICS this is the only case of unconditionally acquiring all 3 locks.
>
> You just lost me ... I think the above is certainly a bad idea from a
> concurrency standpoint, and very possibly a deadlock risk.

'twas my fear too.

> In any case you are thinking about it the wrong way. It is not
> LogwrtResult you want to advance, it is the Insert variables that define
> what the current WAL buffer page is.

Yes OK, so that way I don't need the 3 locks. Good.

> ISTM the correct approach involves having a special case in XLogInsert:
> just after inserting an end-of-file record, forcibly advance to the next
> buffer, and set it up to be the first page for the next segment rather
> than the next segment in sequence. (This is likely best handled as an
> extra call to AdvanceXLInsertBuffer that invokes some special-case code
> in AdvanceXLInsertBuffer.) You normally only need the WALInsertLock to
> do this. After that's complete you can release the insert lock, and
> then other operations can proceed while you do an XLogFlush to force out
> the remaining dirty WAL buffers for the old segment. Then you're done.

Good. That was roughly what I'm attempting now, just advancing the wrong pointer and struggling with (and worried by) the 3 lock problem.

> (I think I'd put the XLogFlush in the pg_stop_backup code, not in
> XLogInsert proper.)

That seems like the way it's done elsewhere.

Best Regards, Simon Riggs
Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Treating shutdown checkpoint markers as xlog switches is possible but
> > gives problems since archive_command is a SUSET variable. On replay we
> > wouldn't necessarily know whether a shutdown checkpoint was treated as
> > an xlog switch when it was written, so we'd need to attempt to switch
> > and look beyond the checkpoint marker, just in case. That makes me
> > uncomfortable.
>
> However, this still begs the question of why we are bothering.
> I disagree with the goal in this particular case anyhow: I do not
> think it's necessary, safe, nor sane for a shutdown to try to archive
> the last XLOG segment. Even if we fixed the xlog mechanism to end the
> file there, I really have a problem with the idea that the archiver
> should try to start a fresh archiving cycle at shutdown.

Doing the archive at server shutdown eliminates one of the "must document" items, so the system behaves more predictably than if it does not. It is not required --- it is a usability issue.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> However, this still begs the question of why we are bothering. >> I disagree with the goal in this particular case anyhow: I do not >> think it's necessary, safe, nor sane for a shutdown to try to archive >> the last XLOG segment. Even if we fixed the xlog mechanism to end the >> file there, I really have a problem with the idea that the archiver >> should try to start a fresh archiving cycle at shutdown. > Doing the archive at server shutdown eliminates one of the "must > document" items, so the system behaves more predictably that it does > not. It is not required --- it is a usability issue. No, it just replaces a documentation issue with a reliability issue. We'd have to consider what to say about the prospect that the archiver is unable to archive that last segment, is kill -9'd by init at some critical point in the process, etc etc. I think it's just a bad idea to promise people that shutting down the postmaster will have any such effect. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> However, this still begs the question of why we are bothering. > >> I disagree with the goal in this particular case anyhow: I do not > >> think it's necessary, safe, nor sane for a shutdown to try to archive > >> the last XLOG segment. Even if we fixed the xlog mechanism to end the > >> file there, I really have a problem with the idea that the archiver > >> should try to start a fresh archiving cycle at shutdown. > > > Doing the archive at server shutdown eliminates one of the "must > > document" items, so the system behaves more predictably that it does > > not. It is not required --- it is a usability issue. > > No, it just replaces a documentation issue with a reliability issue. > We'd have to consider what to say about the prospect that the archiver > is unable to archive that last segment, is kill -9'd by init at some > critical point in the process, etc etc. I think it's just a bad idea > to promise people that shutting down the postmaster will have any such > effect. OK, makes sense. Could we give them a command to archive it before they shut down? That would make sense. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
> OK, makes sense. Could we give them a command to archive it before they > shut down? That would make sense. Not if the idea is to be certain you got everything ... I think what we have to do is document a manual procedure for archiving the last XLOG file. But really my question is "what's the use case for this?" ISTM that on-line backups are what PITR users want, not something involving shutting down the postmaster --- and the changes Simon is already making will be enough to handle those cases. regards, tom lane
Tom Lane wrote: > Bruce Momjian wrote: >> OK, makes sense. Could we give them a command to archive it before they >> shut down? That would make sense. > > Not if the idea is to be certain you got everything ... I think what we > have to do is document a manual procedure for archiving the last XLOG > file. What Bruce would want is a way to "stop new transactions, archive and shutdown", which would do this atomically. Then we could have another shutdown switch for pg_ctl. But yea, a documentation for a manual procedure would be ok, too, just not as user friendly. Best Regards, Michael Paesold
Michael Paesold wrote:
> Tom Lane wrote:
>
> > Bruce Momjian wrote:
> >> OK, makes sense. Could we give them a command to archive it before they
> >> shut down? That would make sense.
> >
> > Not if the idea is to be certain you got everything ... I think what we
> > have to do is document a manual procedure for archiving the last XLOG
> > file.
>
> What Bruce would want is a way to "stop new transactions, archive and
> shutdown", which would do this atomically. Then we could have another
> shutdown switch for pg_ctl.

Yea, probably a separate switch, or an additional switch to pg_ctl would be best, but then we have to add to pg_ctl.

> But yea, a documentation for a manual procedure would be ok, too, just not
> as user friendly.

Right. I just hate the 'do this, do that' instructions to PITR. When they get too long/complex, I get worried.

I used to use Informix's ontape, which was a bad user interface because the admin had to be sure it was always running. Anyway, when you control-C'ed the process, it would flush out any partially written wal file and you knew you had everything.

I am thinking a special pg_ctl flag, and disabling -W for that so you have to wait for the success message. Of course we then have to document the use of the pg_ctl flag.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
On Thu, 2005-04-21 at 08:57 -0400, Bruce Momjian wrote: > Michael Paesold wrote: > > Tom Lane wrote: > > > Bruce Momjian wrote: > > >> OK, makes sense. Could we give them a command to archive it before they > > >> shut down? That would make sense. > > > > > > Not if the idea is to be certain you got everything ... I think what we > > > have to do is document a manual procedure for archiving the last XLOG > > > file. > > > > What Bruce would want is a way to "stop new transactions, archive and > > shutdown", which would do this atomically. Then we could have another > > shutdown switch for pg_ctl. > > Yea, probably a separate switch, or an additional switch to pg_clt would > be best, but then we have to add to pg_ctl. > > > But yea, a documentation for a manual procedure would be ok, too, just not > > as user friendly. > > Right. I just hate the 'do this, do that' instructions to PITR. When > they get too long/complex, I get worried. > I am thinking a special pg_ctl flag, and disabling -W for that so you > have to wait for the success message. Of course we then have to > document the use of the pg_ctl flag then. I'll write the log switch, you decide when/how to invoke it. My head hurts. Best Regards, Simon Riggs