Thread: Backup history file should be replicated in Streaming Replication?
Hi,

In the current SR, since a backup history file is not replicated, the
standby always starts an archive recovery without a backup history
file, and a wrong minRecoveryPoint might be used. This is not a problem
for SR itself, but would cause trouble when SR cooperates with Hot
Standby. HS begins accepting read-only queries after recovery reaches
minRecoveryPoint (i.e., after the database has become consistent), so a
wrong minRecoveryPoint could allow read-only queries to run against an
inconsistent database.

A backup history file should be replicated at the beginning of the
standby's recovery. Should this problem be addressed right now, or
should I wait until the current simple SR patch has been committed?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
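(For context, the gate Hot Standby relies on is the
reachedMinRecoveryPoint check in StartupXLOG; an abbreviated excerpt,
mirroring the fragment that appears in the patch later in this thread,
with the log message text illustrative:)

    /* In the WAL replay loop: have we passed our safe starting point? */
    if (!reachedMinRecoveryPoint &&
        XLByteLE(minRecoveryPoint, EndRecPtr))
    {
        reachedMinRecoveryPoint = true;
        ereport(LOG,
                (errmsg("consistent recovery state reached at %X/%X",
                        EndRecPtr.xlogid, EndRecPtr.xrecoff)));
        /* only now may Hot Standby begin accepting read-only queries */
    }

If minRecoveryPoint is too early, this gate opens before the database
is actually consistent, which is exactly the failure mode described
above.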
Fujii Masao wrote:
> In current SR, since a backup history file is not replicated,
> the standby always starts an archive recovery without a backup
> history file, and a wrong minRecoveryPoint might be used. This
> is not a problem for SR itself, but would cause trouble when
> SR cooperates with Hot Standby.

But the backup history file is included in the base backup you start
replication from, right? After that, minRecoveryPoint is stored in the
control file and advanced as the recovery progresses.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Thu, Nov 26, 2009 at 4:55 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Fujii Masao wrote:
>> In current SR, since a backup history file is not replicated,
>> the standby always starts an archive recovery without a backup
>> history file, and a wrong minRecoveryPoint might be used. This
>> is not a problem for SR itself, but would cause trouble when
>> SR cooperates with Hot Standby.
>
> But the backup history file is included in the base backup you start
> replication from, right?

No. A backup history file is created by pg_stop_backup(), so it's not
included in the base backup.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Fujii Masao wrote:
> On Thu, Nov 26, 2009 at 4:55 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Fujii Masao wrote:
>>> In current SR, since a backup history file is not replicated,
>>> the standby always starts an archive recovery without a backup
>>> history file, and a wrong minRecoveryPoint might be used. This
>>> is not a problem for SR itself, but would cause trouble when
>>> SR cooperates with Hot Standby.
>> But the backup history file is included in the base backup you start
>> replication from, right?
>
> No. A backup history file is created by pg_stop_backup(), so it's not
> included in the base backup.

Ah, I see. Yeah, that needs to be addressed regardless of HS, because
you can otherwise start up (= fail over to) the standby too early,
before the minimum recovery point has been reached.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Thu, Nov 26, 2009 at 5:17 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Yeah, that needs to be addressed regardless of HS, because you can
> otherwise start up (= fail over to) the standby too early, before the
> minimum recovery point has been reached.

Okay, I'll address that ASAP.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi,

On Thu, Nov 26, 2009 at 5:20 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Nov 26, 2009 at 5:17 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Yeah, that needs to be addressed regardless of HS, because you can
>> otherwise start up (= fail over to) the standby too early, before the
>> minimum recovery point has been reached.
>
> Okay, I'll address that ASAP.

pg_stop_backup deletes the previous backup history file from pg_xlog,
so replication of a backup history file would fail if even one new
online backup is taken after the base backup for the standby. That
deletion policy is too aggressive for Streaming Replication, I think.

So I'd like to change pg_stop_backup so that it deletes only backup
history files four or more generations old (is four enough?). Thoughts?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
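(A rough, stand-alone sketch of the proposed retention rule, not the
actual pg_stop_backup code; it assumes history file names, which embed
the WAL position and end in ".backup", sort chronologically within a
timeline:)

    #include <dirent.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define KEEP_BACKUP_HISTORY_FILES 4   /* generations to retain */

    /* Does the file name look like a backup history file ("*.backup")? */
    static int
    is_backup_history(const char *name)
    {
        size_t len = strlen(name);
        return len > 7 && strcmp(name + len - 7, ".backup") == 0;
    }

    /* Compare reversed, so the newest (largest) name sorts first. */
    static int
    cmp_desc(const void *a, const void *b)
    {
        return strcmp(*(const char * const *) b, *(const char * const *) a);
    }

    /* Delete all but the newest KEEP_BACKUP_HISTORY_FILES history files. */
    static void
    cleanup_backup_history(const char *xlogdir)
    {
        DIR        *dir = opendir(xlogdir);
        struct dirent *de;
        char       *names[1024];
        int         n = 0, i;

        if (dir == NULL)
            return;
        while ((de = readdir(dir)) != NULL && n < 1024)
            if (is_backup_history(de->d_name))
                names[n++] = strdup(de->d_name);
        closedir(dir);

        qsort(names, n, sizeof(char *), cmp_desc);
        for (i = 0; i < n; i++)
        {
            if (i >= KEEP_BACKUP_HISTORY_FILES)
            {
                char path[4096];

                snprintf(path, sizeof(path), "%s/%s", xlogdir, names[i]);
                unlink(path);   /* drop the 5th-newest and older */
            }
            free(names[i]);
        }
    }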
Fujii Masao wrote:
> pg_stop_backup deletes the previous backup history file from pg_xlog,
> so replication of a backup history file would fail if even one new
> online backup is taken after the base backup for the standby. That
> deletion policy is too aggressive for Streaming Replication, I think.
>
> So I'd like to change pg_stop_backup so that it deletes only backup
> history files four or more generations old (is four enough?).

This is essentially the same problem we have with WAL files and
checkpoints. If the standby falls behind too much, without having an
open connection to the master all the time, the master will delete old
files that are still needed by the standby.

I don't think it's worthwhile to modify pg_stop_backup() like that. We
should address the general problem. At the moment, you're fine if you
also configure WAL archiving and log file shipping, but it would be
nice to have some simpler mechanism to avoid the problem. For example,
a GUC in the master to retain all log files (including backup history
files) for X days. Or some way to register the standby with the master,
so that the master knows it's out there, and still needs the logs, even
when it's not connected.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Fri, Dec 18, 2009 at 11:03 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Or some way to register the standby with the master, so that
> the master knows it's out there, and still needs the logs, even when
> it's not connected.

That is the real answer, I think.

...Robert
On 18.12.09 17:05, Robert Haas wrote:
> On Fri, Dec 18, 2009 at 11:03 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Or some way to register the standby with the master, so that
>> the master knows it's out there, and still needs the logs, even when
>> it's not connected.
>
> That is the real answer, I think.

I'd prefer it if the slave could automatically fetch a new base backup
when it falls behind too far to catch up with the available logs. That
way, old logs don't start piling up on the server if a slave goes
offline for a long time.

The slave could run a configurable shell script in that case, for
example. You could then use that to rsync the data directory from the
server (after a pg_start_backup(), of course).

best regards,
Florian Pflug
On Fri, Dec 18, 2009 at 12:22 PM, Florian Pflug <fgp.phlo.org@gmail.com> wrote:
> On 18.12.09 17:05, Robert Haas wrote:
>> On Fri, Dec 18, 2009 at 11:03 AM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>> Or some way to register the standby with the master, so that
>>> the master knows it's out there, and still needs the logs, even when
>>> it's not connected.
>>
>> That is the real answer, I think.
>
> I'd prefer it if the slave could automatically fetch a new base backup
> when it falls behind too far to catch up with the available logs. That
> way, old logs don't start piling up on the server if a slave goes
> offline for a long time.
>
> The slave could run a configurable shell script in that case, for
> example. You could then use that to rsync the data directory from the
> server (after a pg_start_backup(), of course).

That would be nice to have too, but it's almost certainly much harder
to implement. In particular, there's no hard and fast rule for figuring
out when you've dropped so far behind that resnapping the whole thing
is faster than replaying the WAL bit by bit. And, of course, you'll
have to take the standby down if you go that route, whereas trying to
catch up the WAL lets it stay up throughout the process.

I think (as I did/do with Hot Standby) that the most important thing
here is to get to a point where we have a reasonably good feature that
is of some use, and commit it. It will probably have some annoying
limitations; we can remove those later. I have a feeling that what we
have right now is going to be non-robust in the face of network breaks,
but that is a problem that can be fixed by a future patch.

...Robert
Robert Haas wrote:
> On Fri, Dec 18, 2009 at 12:22 PM, Florian Pflug <fgp.phlo.org@gmail.com> wrote:
>> I'd prefer it if the slave could automatically fetch a new base backup
>> when it falls behind too far to catch up with the available logs. That
>> way, old logs don't start piling up on the server if a slave goes
>> offline for a long time.
>>
>> The slave could run a configurable shell script in that case, for
>> example. You could then use that to rsync the data directory from the
>> server (after a pg_start_backup(), of course).
>
> That would be nice to have too,

Yeah, for small databases, it's probably a better tradeoff. The problem
with keeping WAL around in the master indefinitely is that you will
eventually run out of disk space if the standby disappears for too
long.

> but it's almost certainly much harder
> to implement. In particular, there's no hard and fast rule for
> figuring out when you've dropped so far behind that resnapping the
> whole thing is faster than replaying the WAL bit by bit.

I'd imagine that you take a new base backup only if you have to, i.e.
the old WAL files the slave needs have already been deleted from the
master.

> And, of
> course, you'll have to take the standby down if you go that route,
> whereas trying to catch up the WAL lets it stay up throughout the
> process.

Good point.

> I think (as I did/do with Hot Standby) that the most important thing
> here is to get to a point where we have a reasonably good feature that
> is of some use, and commit it. It will probably have some annoying
> limitations; we can remove those later. I have a feeling that what we
> have right now is going to be non-robust in the face of network
> breaks, but that is a problem that can be fixed by a future patch.

Agreed. About a year ago, I was vocal about not relying on the file
based shipping, but I don't have a problem with relying on it as an
intermediate step, until we add the other options. It's robust as it
is, if you set up WAL archiving.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Hi,

On 18 Dec 2009, at 19:21, Heikki Linnakangas wrote:
> On Fri, Dec 18, 2009 at 12:22 PM, Florian Pflug <fgp.phlo.org@gmail.com> wrote:
>>> I'd prefer it if the slave could automatically fetch a new base backup
>>> when it falls behind too far to catch up with the available logs. That
>>> way, old logs don't start piling up on the server if a slave goes
>>> offline for a long time.

Well, I did propose to consider a state machine with clear transitions
for such problems a while ago, and I think my remarks still apply:
http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg131511.html

Sorry for the non-archives.postgresql.org link; I couldn't find the
mail there.

> Yeah, for small databases, it's probably a better tradeoff. The problem
> with keeping WAL around in the master indefinitely is that you will
> eventually run out of disk space if the standby disappears for too long.

I'd vote for having a setting on the master for how long you keep WALs.
If the slave loses sync and then comes back, either you still have the
required WALs and you're back to catchup, or you don't and you're back
to the base/init dance.

Maybe you want to add a control on the slave to require explicit DBA
action before getting back to taking a base backup from the master,
though, as that could be provided from a nightly PITR backup rather
than the live server.

>> but it's almost certainly much harder
>> to implement. In particular, there's no hard and fast rule for
>> figuring out when you've dropped so far behind that resnapping the
>> whole thing is faster than replaying the WAL bit by bit.
>
> I'd imagine that you take a new base backup only if you have to, i.e.
> the old WAL files the slave needs have already been deleted from the
> master.

Well, consider that a slave can be in one of these states: base, init,
setup, catchup, sync. What you just said reduces to saying which
transitions you can do without resorting to a base backup, and I don't
see that many as soon as the last sync point is no longer available on
the master.

>> I think (as I did/do with Hot Standby) that the most important thing
>> here is to get to a point where we have a reasonably good feature that
>> is of some use, and commit it. It will probably have some annoying
>> limitations; we can remove those later. I have a feeling that what we
>> have right now is going to be non-robust in the face of network
>> breaks, but that is a problem that can be fixed by a future patch.
>
> Agreed. About a year ago, I was vocal about not relying on the file
> based shipping, but I don't have a problem with relying on it as an
> intermediate step, until we add the other options. It's robust as it
> is, if you set up WAL archiving.

I think I'd like to have the feature that a slave never pretends it's
in sync, or soon to be, when clearly it's not. For the asynchronous
case, we can live with it. As soon as we're talking synchronous, you
really want the master to skip any not-in-sync slave at COMMIT. To be
even more clear, a slave that is not in sync is NOT a slave as far as
synchronous replication is concerned.

Regards,
--
dim
Dimitri Fontaine wrote:
> Well, I did propose to consider a state machine with clear transitions
> for such problems a while ago, and I think my remarks still apply:
> http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg131511.html
>
> Sorry for the non-archives.postgresql.org link; I couldn't find the
> mail there.

http://archives.postgresql.org/message-id/87fxcxnjwt.fsf%40hi-media-techno.com

--
Alvaro Herrera                      http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Robert Haas wrote:
> I think (as I did/do with Hot Standby) that the most important thing
> here is to get to a point where we have a reasonably good feature that
> is of some use, and commit it. It will probably have some annoying
> limitations; we can remove those later. I have a feeling that what we
> have right now is going to be non-robust in the face of network
> breaks, but that is a problem that can be fixed by a future patch.

Improving robustness in all the situations where you'd like things to
be better for replication is a never-ending job. As I understand it, a
major issue with this patch right now is how it links to the client
libpq. That's the sort of problem that can make this uncommittable. As
long as the fundamentals are good, it's important not to get lost in
optimizing the end UI here if that comes at the expense of getting
something you can deploy at all.

If Streaming Replication ships with a working core but a horribly
complicated setup/failover mechanism, that's infinitely better than not
shipping at all because resources were diverted toward making things
more robust or easier to set up instead. Also, the pool of authors who
can work on tweaking the smaller details here is larger than the pool
capable of working on the fundamental streaming replication code.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
On Sat, Dec 19, 2009 at 1:03 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> I don't think it's worthwhile to modify pg_stop_backup() like that. We
> should address the general problem. At the moment, you're fine if you
> also configure WAL archiving and log file shipping, but it would be
> nice to have some simpler mechanism to avoid the problem. For example,
> a GUC in the master to retain all log files (including backup history
> files) for X days. Or some way to register the standby with the master,
> so that the master knows it's out there, and still needs the logs, even
> when it's not connected.

I propose a new GUC, replication_reserved_segments (better name?),
which determines the maximum number of WAL files held for the standby.

Design:

* Only the WAL files which are more than replication_reserved_segments
  segments older than the current write segment can be recycled. IOW,
  we can treat a standby that falls up to replication_reserved_segments
  segments behind as if it were always connected to the primary, and
  the WAL files needed by an active standby are not recycled.

* Disjoin the standby which falls more than replication_reserved_segments
  segments behind, in order to avoid a disk-full failure, i.e., a PANIC
  error on the primary. This would only be possible in the asynchronous
  replication case.

Thoughts?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
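(A sketch of the recycling boundary this design implies: step back the
reserved number of segments from the current write position, using the
pre-9.3 (logId, logSeg) arithmetic from xlog_internal.h; the function
name and constants here are illustrative:)

    #include <stdint.h>

    #define XLOG_SEG_SIZE    (16 * 1024 * 1024)          /* default 16MB */
    #define XLogSegsPerFile  (0xFFFFFFFFU / XLOG_SEG_SIZE)

    /*
     * Compute the oldest segment that must be kept for the standby.
     * Segments strictly older than the resulting (*logId, *logSeg)
     * can be recycled; newer ones are reserved.
     */
    static void
    keep_log_seg(uint32_t *logId, uint32_t *logSeg, uint32_t reserved)
    {
        uint32_t i;

        for (i = 0; i < reserved; i++)
        {
            if (*logSeg == 0)
            {
                (*logId)--;              /* borrow from the log-file id */
                *logSeg = XLogSegsPerFile - 1;
            }
            else
                (*logSeg)--;
        }
    }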
On Thu, 2009-11-26 at 17:02 +0900, Fujii Masao wrote:
> On Thu, Nov 26, 2009 at 4:55 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Fujii Masao wrote:
>>> In current SR, since a backup history file is not replicated,
>>> the standby always starts an archive recovery without a backup
>>> history file, and a wrong minRecoveryPoint might be used. This
>>> is not a problem for SR itself, but would cause trouble when
>>> SR cooperates with Hot Standby.
>>
>> But the backup history file is included in the base backup you start
>> replication from, right?
>
> No. A backup history file is created by pg_stop_backup(), so it's not
> included in the base backup.

The backup history file is a slightly quirky way of doing things and
was designed when the transfer mechanism was file-based.

Why don't we just write a new xlog record that contains the information
we need? Copy the contents of the backup history file into the WAL
record, just like we do with prepared transactions. That way it will be
streamed to the standby without any other code being needed for SR,
while we don't need to retest warm standby to check that it still
works.

(The thread diverges onto a second point and this first point seems to
have been a little forgotten.)

--
Simon Riggs           www.2ndQuadrant.com
On Tue, 2009-12-22 at 14:40 +0900, Fujii Masao wrote:
> On Sat, Dec 19, 2009 at 1:03 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> I don't think it's worthwhile to modify pg_stop_backup() like that. We
>> should address the general problem. At the moment, you're fine if you
>> also configure WAL archiving and log file shipping, but it would be
>> nice to have some simpler mechanism to avoid the problem. For example,
>> a GUC in the master to retain all log files (including backup history
>> files) for X days. Or some way to register the standby with the master,
>> so that the master knows it's out there, and still needs the logs, even
>> when it's not connected.
>
> I propose a new GUC, replication_reserved_segments (better name?),
> which determines the maximum number of WAL files held for the standby.
>
> Design:
>
> * Only the WAL files which are more than replication_reserved_segments
>   segments older than the current write segment can be recycled. IOW,
>   we can treat a standby that falls up to replication_reserved_segments
>   segments behind as if it were always connected to the primary, and
>   the WAL files needed by an active standby are not recycled.

(I don't fully understand your words above, sorry.)

Possibly an easier way would be to have a size limit, not a number of
segments. Something like max_reserved_wal = 240GB. We made that change
to shared_buffers a few years back and it was really useful.

The problem I foresee is that doing it this way puts an upper limit on
the size of database we can replicate. While we do a base backup and
transfer it, we must store WAL somewhere. Forcing us to store the WAL
on the master while this happens could be very limiting.

> * Disjoin the standby which falls more than replication_reserved_segments
>   segments behind, in order to avoid a disk-full failure, i.e., a PANIC
>   error on the primary. This would only be possible in the asynchronous
>   replication case.

Or at the start of replication.

--
Simon Riggs           www.2ndQuadrant.com
On Wed, Dec 23, 2009 at 2:42 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The backup history file is a slightly quirky way of doing things and
> was designed when the transfer mechanism was file-based.
>
> Why don't we just write a new xlog record that contains the information
> we need? Copy the contents of the backup history file into the WAL
> record, just like we do with prepared transactions. That way it will be
> streamed to the standby without any other code being needed for SR,
> while we don't need to retest warm standby to check that it still
> works.

This means that we can replace a backup history file with the
corresponding xlog record. I think that we should simplify the code by
making the replacement complete rather than just adding a new xlog
record. Thoughts?

BTW, the current SR code already implements the capability to replicate
a backup history file. But if there is a better and simpler idea, I'll
adopt it.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Wed, 2009-12-23 at 03:28 +0900, Fujii Masao wrote:
> On Wed, Dec 23, 2009 at 2:42 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> The backup history file is a slightly quirky way of doing things and
>> was designed when the transfer mechanism was file-based.
>>
>> Why don't we just write a new xlog record that contains the information
>> we need? Copy the contents of the backup history file into the WAL
>> record, just like we do with prepared transactions. That way it will be
>> streamed to the standby without any other code being needed for SR,
>> while we don't need to retest warm standby to check that it still
>> works.
>
> This means that we can replace a backup history file with the
> corresponding xlog record. I think that we should simplify the code by
> making the replacement complete rather than just adding a new xlog
> record. Thoughts?

We can't do that because it would stop file-based archiving from
working. I don't think we should deprecate that for another release at
least.

--
Simon Riggs           www.2ndQuadrant.com
On Wed, Dec 23, 2009 at 2:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> (I don't fully understand your words above, sorry.)

NM ;-)

> Possibly an easier way would be to have a size limit, not a number of
> segments. Something like max_reserved_wal = 240GB. We made that change
> to shared_buffers a few years back and it was really useful.

For me, a number of segments is intuitive because checkpoint_segments,
which is closely connected with the new parameter, still indicates a
number of segments. But if more people like a size limit, I'll make
that change.

> The problem I foresee is that doing it this way puts an upper limit on
> the size of database we can replicate. While we do a base backup and
> transfer it, we must store WAL somewhere. Forcing us to store the WAL
> on the master while this happens could be very limiting.

Yes. If the size of pg_xlog is relatively small compared to the size of
the database, the primary would not be able to hold all the WAL files
required for the standby, so we would need to use the restore_command,
which retrieves the WAL files from the master's archival area. I'm OK
with requiring that extra step in that special case for now.

>> * Disjoin the standby which falls more than replication_reserved_segments
>> segments behind, in order to avoid a disk-full failure, i.e., a PANIC
>> error on the primary. This would only be possible in the asynchronous
>> replication case.
>
> Or at the start of replication.

Yes. I think that avoiding a PANIC error on the primary should be given
priority over a smooth start of replication.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
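(For what it's worth, the two formulations convert trivially; a hedged
sketch, where max_reserved_wal is the hypothetical size-based setting
and XLOG_SEG_SIZE is the default 16MB segment size:)

    #include <stdint.h>

    #define XLOG_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

    /* Hypothetical: derive a segment count from a byte-size setting. */
    static uint32_t
    reserved_segments_from_bytes(uint64_t max_reserved_wal_bytes)
    {
        return (uint32_t) (max_reserved_wal_bytes / XLOG_SEG_SIZE);
    }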
Simon Riggs <simon@2ndQuadrant.com> writes:
> The backup history file is a slightly quirky way of doing things and
> was designed when the transfer mechanism was file-based.

> Why don't we just write a new xlog record that contains the information
> we need?

Certainly not. The history file is, in the restore-from-archive case,
needed to *find* the xlog data.

However, it's not clear to me why SR should have any need for it.
It's not doing restore from archive.

			regards, tom lane
On Wed, Dec 23, 2009 at 3:41 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> This means that we can replace a backup history file with the
>> corresponding xlog record. I think that we should simplify the code by
>> making the replacement complete rather than just adding a new xlog
>> record. Thoughts?
>
> We can't do that because it would stop file-based archiving from
> working. I don't think we should deprecate that for another release at
> least.

Umm... ISTM that we can do that even in the file-based archiving case:

* pg_stop_backup writes the xlog record corresponding to a backup
  history file, and requests a WAL file switch.

* In PITR or warm standby, the startup process just marks the database
  as inconsistent until it has read that xlog record.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Tue, 2009-12-22 at 14:02 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> The backup history file is a slightly quirky way of doing things and
>> was designed when the transfer mechanism was file-based.
>
>> Why don't we just write a new xlog record that contains the information
>> we need?
>
> Certainly not. The history file is, in the restore-from-archive case,
> needed to *find* the xlog data.
>
> However, it's not clear to me why SR should have any need for it.
> It's not doing restore from archive.

Definitely should not make this *just* in WAL, files still required,
agreed.

It's needed to find the place where the backup stopped, so it defines
the safe stopping point. We could easily pass that info via WAL, when
streaming. It doesn't actually matter until we try to failover.

--
Simon Riggs           www.2ndQuadrant.com
On Wed, Dec 23, 2009 at 4:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> It's needed to find the place where the backup stopped, so it defines
> the safe stopping point. We could easily pass that info via WAL, when
> streaming. It doesn't actually matter until we try to failover.

Right. And it's also needed to cooperate with HS, which begins accepting
read-only queries after recovery reaches that safe stopping point.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Wed, 2009-12-23 at 04:15 +0900, Fujii Masao wrote:
> On Wed, Dec 23, 2009 at 4:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> It's needed to find the place where the backup stopped, so it defines
>> the safe stopping point. We could easily pass that info via WAL, when
>> streaming. It doesn't actually matter until we try to failover.
>
> Right. And it's also needed to cooperate with HS, which begins accepting
> read-only queries after recovery reaches that safe stopping point.

Agreed, hence my interest!

--
Simon Riggs           www.2ndQuadrant.com
Fujii Masao <masao.fujii@gmail.com> writes:
> On Wed, Dec 23, 2009 at 4:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> It's needed to find the place where the backup stopped, so it defines
>> the safe stopping point. We could easily pass that info via WAL, when
>> streaming. It doesn't actually matter until we try to failover.

> Right. And it's also needed to cooperate with HS, which begins accepting
> read-only queries after recovery reaches that safe stopping point.

OK, so the information needed in the WAL record doesn't even include
most of what is in the history file, correct? What you're actually
talking about is identifying a WAL position. Maybe you'd want to
provide the backup label for identification purposes, but not much
else. In that case I concur that this is a better solution than hacking
up something to pass over the history file.

			regards, tom lane
Simon Riggs wrote:
> On Wed, 2009-12-23 at 04:15 +0900, Fujii Masao wrote:
>> On Wed, Dec 23, 2009 at 4:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> It's needed to find the place where the backup stopped, so it defines
>>> the safe stopping point. We could easily pass that info via WAL, when
>>> streaming. It doesn't actually matter until we try to failover.
>> Right. And it's also needed to cooperate with HS, which begins accepting
>> read-only queries after recovery reaches that safe stopping point.
>
> Agreed, hence my interest!

Yeah, that's a great idea.

I was just having a chat with Magnus this morning, and he asked whether
the current patch already provides this, or whether it would be
possible to write a stand-alone utility that connects to a master and
streams WAL files to an archive directory, without setting up a
full-blown standby instance. We came to the conclusion that backup
history files wouldn't be copied as the patch stands, because the
standby has to specifically request them.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> The backup history file is a slightly quirky way of doing things and
>> was designed when the transfer mechanism was file-based.
>
>> Why don't we just write a new xlog record that contains the information
>> we need?
>
> Certainly not. The history file is, in the restore-from-archive case,
> needed to *find* the xlog data.

Hmm, not really. The backup_label file tells where the checkpoint
record is, and that is still needed. AFAICS the backup history file is
only needed to determine the point where the backup was completed, i.e.
the minimum safe stopping point for WAL replay.

I think we could get away without the backup history file altogether.
It's kind of nice to have them in the archive directory, for the DBA,
to easily see the locations where base backups were taken. But if we
write the backup stop location in WAL, the system doesn't really need
the history file for anything.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
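(For reference, what the DBA would be keeping: a backup history file is
plain text of roughly this shape, matching the format strings parsed in
read_backup_label before this change; the values below are invented:)

    START WAL LOCATION: 0/9000020 (file 000000010000000000000009)
    STOP WAL LOCATION: 0/90000D8 (file 000000010000000000000009)
    CHECKPOINT LOCATION: 0/9000058
    START TIME: 2009-12-22 10:00:00 JST
    LABEL: nightly base backup
    STOP TIME: 2009-12-22 10:05:00 JST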
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> I think we could get away without the backup history file altogether.

Hmmm ... actually I was confusing these with timeline history files,
which are definitely not something we can drop. You might be right
that the backup history file could be part of WAL instead. On the
other hand, it's quite comforting that the history file is plain ASCII
and can be examined without any special tools. I'm -1 for removing it,
even if we decide to duplicate the info in a WAL record.

			regards, tom lane
* Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> [091222 15:47]:
> I was just having a chat with Magnus this morning, and he asked whether
> the current patch already provides this, or whether it would be
> possible to write a stand-alone utility that connects to a master and
> streams WAL files to an archive directory, without setting up a
> full-blown standby instance. We came to the conclusion that backup
> history files wouldn't be copied as the patch stands, because the
> standby has to specifically request them.

Please, please, please... I've been watching SR from the sidelines,
basically because I want my WAL fsync'ed on 2 physically separate
machines before the client's COMMIT returns... And no, I'm not really
interested in trying to do RAID over NBD or DRBD, and deal with the
problems and pitfalls that entails...

Being able to write a utility that connects as an "SR" client, but just
synchronously writes WAL into an archive directory setup, is exactly
what I want... And once SR has settled, it's something I'm interested
in working on...

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
On Tue, 2009-12-22 at 22:46 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> On Wed, 2009-12-23 at 04:15 +0900, Fujii Masao wrote:
>>> On Wed, Dec 23, 2009 at 4:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> It's needed to find the place where the backup stopped, so it defines
>>>> the safe stopping point. We could easily pass that info via WAL, when
>>>> streaming. It doesn't actually matter until we try to failover.
>>> Right. And it's also needed to cooperate with HS, which begins accepting
>>> read-only queries after recovery reaches that safe stopping point.
>>
>> Agreed, hence my interest!
>
> Yeah, that's a great idea.
>
> I was just having a chat with Magnus this morning, and he asked whether
> the current patch already provides this, or whether it would be
> possible to write a stand-alone utility that connects to a master and
> streams WAL files to an archive directory, without setting up a
> full-blown standby instance. We came to the conclusion that backup
> history files wouldn't be copied as the patch stands, because the
> standby has to specifically request them.

There isn't any need to write that utility. Read my post about 2-phase
backup and you'll see we are a couple of lines of code away from that.

--
Simon Riggs           www.2ndQuadrant.com
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> I think we could get away without the backup history file altogether.
>
> Hmmm ... actually I was confusing these with timeline history files,
> which are definitely not something we can drop. You might be right
> that the backup history file could be part of WAL instead. On the
> other hand, it's quite comforting that the history file is plain ASCII
> and can be examined without any special tools. I'm -1 for removing it,
> even if we decide to duplicate the info in a WAL record.

OK. How about writing the history file in pg_stop_backup() for
informational purposes only? I.e., never read it, but rely on the WAL
records instead.

I just realized that the current history file fails to recognize this
scenario:

1. pg_start_backup()
2. cp -a $PGDATA data-backup
3. create data-backup/recovery.conf
4. postmaster -D data-backup

That is, starting postmaster on a data directory without ever calling
pg_stop_backup(). Because pg_stop_backup() was not called, the history
file is not there, and recovery won't complain about not reaching the
safe starting point.

That is of course a case of "don't do that!", but perhaps we should
refuse to start up if the backup history file is not found? At least in
the WAL-based approach, I think we should refuse to start up if we
don't see the pg_stop_backup WAL record.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
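(In WAL-record terms, the refusal can hinge on whether the end-of-backup
record was ever replayed; a simplified fragment along the lines of the
patch that appears later in this thread, with the error text
illustrative:)

    /*
     * At the end of recovery: a still-set backupStartPoint means we
     * never saw the end-of-backup record, so the data on disk may be
     * inconsistent and startup must be refused.
     */
    if (InArchiveRecovery &&
        (XLByteLT(EndOfLog, minRecoveryPoint) ||
         !XLogRecPtrIsInvalid(ControlFile->backupStartPoint)))
        ereport(FATAL,
                (errmsg("WAL ends before end of online backup")));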
On Wed, Dec 23, 2009 at 7:50 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> OK. How about writing the history file in pg_stop_backup() for
> informational purposes only? I.e., never read it, but rely on the WAL
> records instead.

Sounds good. I'll make that change as a self-contained patch.

> I just realized that the current history file fails to recognize this
> scenario:
>
> 1. pg_start_backup()
> 2. cp -a $PGDATA data-backup
> 3. create data-backup/recovery.conf
> 4. postmaster -D data-backup
>
> That is, starting postmaster on a data directory without ever calling
> pg_stop_backup(). Because pg_stop_backup() was not called, the history
> file is not there, and recovery won't complain about not reaching the
> safe starting point.
>
> That is of course a case of "don't do that!", but perhaps we should
> refuse to start up if the backup history file is not found? At least in
> the WAL-based approach, I think we should refuse to start up if we
> don't see the pg_stop_backup WAL record.

Agreed.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Wed, 2009-12-23 at 12:50 +0200, Heikki Linnakangas wrote:
> I just realized that the current history file fails to recognize this
> scenario:
>
> 1. pg_start_backup()
> 2. cp -a $PGDATA data-backup
> 3. create data-backup/recovery.conf
> 4. postmaster -D data-backup
>
> That is, starting postmaster on a data directory without ever calling
> pg_stop_backup(). Because pg_stop_backup() was not called, the history
> file is not there, and recovery won't complain about not reaching the
> safe starting point.
>
> That is of course a case of "don't do that!", but perhaps we should
> refuse to start up if the backup history file is not found? At least in
> the WAL-based approach, I think we should refuse to start up if we
> don't see the pg_stop_backup WAL record.

The code has always been capable of starting without this, which was
considered a feature, to be able to start from a hot copy. I would like
to do as you suggest, but it would remove the feature. This would be a
great example of why I don't want too many ways to start HS.

--
Simon Riggs           www.2ndQuadrant.com
On Thu, Dec 24, 2009 at 1:39 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Dec 23, 2009 at 7:50 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> OK. How about writing the history file in pg_stop_backup() for
>> informational purposes only? I.e., never read it, but rely on the WAL
>> records instead.
>
> Sounds good. I'll make that change as a self-contained patch.

Done. Please see the attached patch.

Design:

* pg_stop_backup writes a backup-end xlog record which contains the
  backup starting point.

* In archive recovery, the startup process doesn't mark the database as
  consistent until it has read the backup-end record.

* A backup history file is still created as in the past, but is never
  used.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Thu, Dec 24, 2009 at 5:35 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The code has always been capable of starting without this, which was
> considered a feature, to be able to start from a hot copy. I would like
> to do as you suggest, but it would remove the feature. This would be a
> great example of why I don't want too many ways to start HS.

Please tell me what "a hot copy" is. Do you mean a standalone hot
backup?
http://developer.postgresql.org/pgdocs/postgres/continuous-archiving.html#BACKUP-STANDALONE

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Thu, Dec 24, 2009 at 5:38 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Dec 24, 2009 at 1:39 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, Dec 23, 2009 at 7:50 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>> OK. How about writing the history file in pg_stop_backup() for
>>> informational purposes only? I.e., never read it, but rely on the WAL
>>> records instead.
>>
>> Sounds good. I'll make that change as a self-contained patch.
>
> Done. Please see the attached patch.

I included this and the following patches in my 'replication' branch.
Also, I got rid of the capability to replicate a backup history file,
which made the SR code simpler.
http://archives.postgresql.org/pgsql-hackers/2009-12/msg01931.php

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Fujii Masao wrote:
> On Thu, Dec 24, 2009 at 1:39 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, Dec 23, 2009 at 7:50 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>> OK. How about writing the history file in pg_stop_backup() for
>>> informational purposes only? I.e., never read it, but rely on the WAL
>>> records instead.
>> Sounds good. I'll make that change as a self-contained patch.
>
> Done. Please see the attached patch.
>
> Design:
>
> * pg_stop_backup writes a backup-end xlog record which contains the
>   backup starting point.
>
> * In archive recovery, the startup process doesn't mark the database as
>   consistent until it has read the backup-end record.
>
> * A backup history file is still created as in the past, but is never
>   used.

As the patch stands, reachedBackupEnd is never set to true if starting
from a restore point after the end-of-backup. We'll need to store the
information that we've reached end-of-backup somewhere on disk. Here's
a modified patch that adds a new backupStartPoint field to pg_control
for that, plus some other minor editorialization.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 783725c..63884eb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -515,8 +515,7 @@ static void xlog_outrec(StringInfo buf, XLogRecord *record);
 #endif
 static void issue_xlog_fsync(void);
 static void pg_start_backup_callback(int code, Datum arg);
-static bool read_backup_label(XLogRecPtr *checkPointLoc,
-                  XLogRecPtr *minRecoveryLoc);
+static bool read_backup_label(XLogRecPtr *checkPointLoc);
 static void rm_redo_error_callback(void *arg);
 static int  get_sync_bit(int method);
 
@@ -5355,7 +5354,6 @@ StartupXLOG(void)
     bool        haveBackupLabel = false;
     XLogRecPtr  RecPtr,
                 checkPointLoc,
-                backupStopLoc,
                 EndOfLog;
     uint32      endLogId;
     uint32      endLogSeg;
@@ -5454,7 +5452,7 @@ StartupXLOG(void)
                         recoveryTargetTLI,
                         ControlFile->checkPointCopy.ThisTimeLineID)));
 
-    if (read_backup_label(&checkPointLoc, &backupStopLoc))
+    if (read_backup_label(&checkPointLoc))
     {
         /*
          * When a backup_label file is present, we want to roll forward from
@@ -5597,11 +5595,23 @@ StartupXLOG(void)
         ControlFile->prevCheckPoint = ControlFile->checkPoint;
         ControlFile->checkPoint = checkPointLoc;
         ControlFile->checkPointCopy = checkPoint;
-        if (backupStopLoc.xlogid != 0 || backupStopLoc.xrecoff != 0)
+        if (InArchiveRecovery)
+        {
+            /* initialize minRecoveryPoint if not set yet */
+            if (XLByteLT(ControlFile->minRecoveryPoint, checkPoint.redo))
+                ControlFile->minRecoveryPoint = checkPoint.redo;
+        }
+        else
         {
-            if (XLByteLT(ControlFile->minRecoveryPoint, backupStopLoc))
-                ControlFile->minRecoveryPoint = backupStopLoc;
+            XLogRecPtr  InvalidXLogRecPtr = {0, 0};
+
+            ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
         }
+        /*
+         * set backupStartPoint if we're starting archive recovery from a
+         * base backup.
+         */
+        if (haveBackupLabel)
+            ControlFile->backupStartPoint = checkPoint.redo;
         ControlFile->time = (pg_time_t) time(NULL);
         /* No need to hold ControlFileLock yet, we aren't up far enough */
         UpdateControlFile();
@@ -5703,15 +5713,9 @@ StartupXLOG(void)
 
         InRedo = true;
 
-        if (minRecoveryPoint.xlogid == 0 && minRecoveryPoint.xrecoff == 0)
-            ereport(LOG,
-                    (errmsg("redo starts at %X/%X",
-                            ReadRecPtr.xlogid, ReadRecPtr.xrecoff)));
-        else
-            ereport(LOG,
-                    (errmsg("redo starts at %X/%X, consistency will be reached at %X/%X",
-                            ReadRecPtr.xlogid, ReadRecPtr.xrecoff,
-                            minRecoveryPoint.xlogid, minRecoveryPoint.xrecoff)));
+        ereport(LOG,
+                (errmsg("redo starts at %X/%X",
+                        ReadRecPtr.xlogid, ReadRecPtr.xrecoff)));
 
         /*
          * Let postmaster know we've started redo now, so that it can
@@ -5771,7 +5775,8 @@ StartupXLOG(void)
              * Have we passed our safe starting point?
              */
             if (!reachedMinRecoveryPoint &&
-                XLByteLE(minRecoveryPoint, EndRecPtr))
+                XLByteLE(minRecoveryPoint, EndRecPtr) &&
+                XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
             {
                 reachedMinRecoveryPoint = true;
                 ereport(LOG,
@@ -5877,7 +5882,9 @@ StartupXLOG(void)
      * be further ahead --- ControlFile->minRecoveryPoint cannot have been
      * advanced beyond the WAL we processed.
      */
-    if (InRecovery && XLByteLT(EndOfLog, minRecoveryPoint))
+    if (InArchiveRecovery &&
+        (XLByteLT(EndOfLog, minRecoveryPoint) ||
+         !XLogRecPtrIsInvalid(ControlFile->backupStartPoint)))
     {
         if (reachedStopPoint)   /* stopped because of stop request */
             ereport(FATAL,
@@ -7310,6 +7317,32 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
     {
         /* nothing to do here */
     }
+    else if (info == XLOG_BACKUP_END)
+    {
+        XLogRecPtr  startpoint;
+
+        memcpy(&startpoint, XLogRecGetData(record), sizeof(startpoint));
+
+        if (XLByteEQ(ControlFile->backupStartPoint, startpoint))
+        {
+            /*
+             * We have reached the end of base backup, the point where
+             * pg_stop_backup() was done. The data on disk is now consistent.
+             * Reset backupStartPoint, and update minRecoveryPoint to make
+             * sure we don't allow starting up at an earlier point even if
+             * recovery is stopped and restarted soon after this.
+             */
+            elog(DEBUG1, "end of backup reached");
+
+            LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+            if (XLByteLT(ControlFile->minRecoveryPoint, lsn))
+                ControlFile->minRecoveryPoint = lsn;
+            MemSet(&ControlFile->backupStartPoint, 0, sizeof(XLogRecPtr));
+            UpdateControlFile();
+
+            LWLockRelease(ControlFileLock);
+        }
+    }
 }
 
 void
@@ -7351,6 +7384,14 @@ xlog_desc(StringInfo buf, uint8 xl_info, char *rec)
     {
         appendStringInfo(buf, "xlog switch");
     }
+    else if (info == XLOG_BACKUP_END)
+    {
+        XLogRecPtr  startpoint;
+
+        memcpy(&startpoint, rec, sizeof(XLogRecPtr));
+        appendStringInfo(buf, "backup end: %X/%X",
+                         startpoint.xlogid, startpoint.xrecoff);
+    }
     else
         appendStringInfo(buf, "UNKNOWN");
 }
@@ -7686,10 +7727,14 @@ pg_start_backup_callback(int code, Datum arg)
 /*
  * pg_stop_backup: finish taking an on-line backup dump
  *
- * We remove the backup label file created by pg_start_backup, and instead
- * create a backup history file in pg_xlog (whence it will immediately be
- * archived). The backup history file contains the same info found in
- * the label file, plus the backup-end time and WAL location.
+ * We write an end-of-backup WAL record, and remove the backup label file
+ * created by pg_start_backup, creating a backup history file in pg_xlog
+ * instead (whence it will immediately be archived). The backup history file
+ * contains the same info found in the label file, plus the backup-end time
+ * and WAL location. Before 8.5, the backup-end time was read from the backup
+ * history file at the beginning of archive recovery, but we now use the WAL
+ * record for that and the file is for informational and debug purposes only.
+ *
  * Note: different from CancelBackup which just cancels online backup mode.
  */
 Datum
@@ -7697,6 +7742,7 @@ pg_stop_backup(PG_FUNCTION_ARGS)
 {
     XLogRecPtr  startpoint;
     XLogRecPtr  stoppoint;
+    XLogRecData rdata;
     pg_time_t   stamp_time;
     char        strfbuf[128];
     char        histfilepath[MAXPGPATH];
@@ -7738,22 +7784,6 @@ pg_stop_backup(PG_FUNCTION_ARGS)
     LWLockRelease(WALInsertLock);
 
     /*
-     * Force a switch to a new xlog segment file, so that the backup is valid
-     * as soon as archiver moves out the current segment file. We'll report
-     * the end address of the XLOG SWITCH record as the backup stopping point.
-     */
-    stoppoint = RequestXLogSwitch();
-
-    XLByteToSeg(stoppoint, _logId, _logSeg);
-    XLogFileName(stopxlogfilename, ThisTimeLineID, _logId, _logSeg);
-
-    /* Use the log timezone here, not the session timezone */
-    stamp_time = (pg_time_t) time(NULL);
-    pg_strftime(strfbuf, sizeof(strfbuf),
-                "%Y-%m-%d %H:%M:%S %Z",
-                pg_localtime(&stamp_time, log_timezone));
-
-    /*
      * Open the existing label file
      */
     lfp = AllocateFile(BACKUP_LABEL_FILE, "r");
@@ -7781,6 +7811,30 @@ pg_stop_backup(PG_FUNCTION_ARGS)
                  errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
 
     /*
+     * Write the backup-end xlog record
+     */
+    rdata.data = (char *) (&startpoint);
+    rdata.len = sizeof(startpoint);
+    rdata.buffer = InvalidBuffer;
+    rdata.next = NULL;
+    stoppoint = XLogInsert(RM_XLOG_ID, XLOG_BACKUP_END, &rdata);
+
+    /*
+     * Force a switch to a new xlog segment file, so that the backup is valid
+     * as soon as archiver moves out the current segment file.
+     */
+    RequestXLogSwitch();
+
+    XLByteToSeg(stoppoint, _logId, _logSeg);
+    XLogFileName(stopxlogfilename, ThisTimeLineID, _logId, _logSeg);
+
+    /* Use the log timezone here, not the session timezone */
+    stamp_time = (pg_time_t) time(NULL);
+    pg_strftime(strfbuf, sizeof(strfbuf),
+                "%Y-%m-%d %H:%M:%S %Z",
+                pg_localtime(&stamp_time, log_timezone));
+
+    /*
      * Write the backup history file
      */
     XLByteToSeg(startpoint, _logId, _logSeg);
@@ -8086,33 +8140,18 @@ pg_xlogfile_name(PG_FUNCTION_ARGS)
  * later than the start of the dump, and so if we rely on it as the start
  * point, we will fail to restore a consistent database state.
  *
- * We also attempt to retrieve the corresponding backup history file.
- * If successful, set *minRecoveryLoc to constrain valid PITR stopping
- * points.
- *
  * Returns TRUE if a backup_label was found (and fills the checkpoint
  * location into *checkPointLoc); returns FALSE if not.
  */
 static bool
-read_backup_label(XLogRecPtr *checkPointLoc, XLogRecPtr *minRecoveryLoc)
+read_backup_label(XLogRecPtr *checkPointLoc)
 {
     XLogRecPtr  startpoint;
-    XLogRecPtr  stoppoint;
-    char        histfilename[MAXFNAMELEN];
-    char        histfilepath[MAXPGPATH];
     char        startxlogfilename[MAXFNAMELEN];
-    char        stopxlogfilename[MAXFNAMELEN];
     TimeLineID  tli;
-    uint32      _logId;
-    uint32      _logSeg;
     FILE       *lfp;
-    FILE       *fp;
     char        ch;
 
-    /* Default is to not constrain recovery stop point */
-    minRecoveryLoc->xlogid = 0;
-    minRecoveryLoc->xrecoff = 0;
-
     /*
      * See if label file is present
      */
@@ -8150,45 +8189,6 @@ read_backup_label(XLogRecPtr *checkPointLoc, XLogRecPtr *minRecoveryLoc)
                  errmsg("could not read file \"%s\": %m",
                         BACKUP_LABEL_FILE)));
 
-    /*
-     * Try to retrieve the backup history file (no error if we can't)
-     */
-    XLByteToSeg(startpoint, _logId, _logSeg);
-    BackupHistoryFileName(histfilename, tli, _logId, _logSeg,
-                          startpoint.xrecoff % XLogSegSize);
-
-    if (InArchiveRecovery)
-        RestoreArchivedFile(histfilepath, histfilename, "RECOVERYHISTORY", 0);
-    else
-        BackupHistoryFilePath(histfilepath, tli, _logId, _logSeg,
-                              startpoint.xrecoff % XLogSegSize);
-
-    fp = AllocateFile(histfilepath, "r");
-    if (fp)
-    {
-        /*
-         * Parse history file to identify stop point.
-         */
-        if (fscanf(fp, "START WAL LOCATION: %X/%X (file %24s)%c",
-                   &startpoint.xlogid, &startpoint.xrecoff, startxlogfilename,
-                   &ch) != 4 || ch != '\n')
-            ereport(FATAL,
-                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-                     errmsg("invalid data in file \"%s\"", histfilename)));
-        if (fscanf(fp, "STOP WAL LOCATION: %X/%X (file %24s)%c",
-                   &stoppoint.xlogid, &stoppoint.xrecoff, stopxlogfilename,
-                   &ch) != 4 || ch != '\n')
-            ereport(FATAL,
-                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-                     errmsg("invalid data in file \"%s\"", histfilename)));
-        *minRecoveryLoc = stoppoint;
-        if (ferror(fp) || FreeFile(fp))
-            ereport(FATAL,
-                    (errcode_for_file_access(),
-                     errmsg("could not read file \"%s\": %m",
-                            histfilepath)));
-    }
-
     return true;
 }
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 2435f27..31cc537 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -62,6 +62,7 @@ typedef struct CheckPoint
 #define XLOG_NOOP               0x20
 #define XLOG_NEXTOID            0x30
 #define XLOG_SWITCH             0x40
+#define XLOG_BACKUP_END         0x50
 
 /* System status indicator */
 
@@ -117,7 +118,27 @@ typedef struct ControlFileData
 
     CheckPoint  checkPointCopy; /* copy of last check point record */
 
-    XLogRecPtr  minRecoveryPoint;       /* must replay xlog to here */
+    /*
+     * These two values determine the minimum point we must recover up to
+     * before starting up:
+     *
+     * minRecoveryPoint is updated to the latest replayed LSN whenever we
+     * flush a data change during archive recovery. That guards against
+     * starting archive recovery, aborting it, and restarting with an earlier
+     * stop location. If we've already flushed to disk data changes from WAL
+     * record X, we mustn't start up until we reach X again. Should be zero
+     * when we're not doing archive recovery.
+     *
+     * backupStartPoint is the redo pointer of the backup start checkpoint,
+     * if we are recovering from an online backup and haven't reached the end
+     * of backup yet. It is reset to 0 when the end of backup is reached, and
+     * we mustn't start up before that. A boolean would suffice otherwise, but
+     * we use the redo location as a cross-check when we see an end-of-backup
+     * record, to make sure the end-of-backup record is the counterpart of
+     * the base backup actually used.
+     */
+    XLogRecPtr  minRecoveryPoint;
+    XLogRecPtr  backupStartPoint;
 
     /*
      * This data is used to check for hardware-architecture compatibility of
Simon Riggs wrote:
> On Wed, 2009-12-23 at 12:50 +0200, Heikki Linnakangas wrote:
>
>> I just realized that the current history file fails to recognize this
>> scenario:
>>
>> 1. pg_start_backup()
>> 2. cp -a $PGDATA data-backup
>> 3. create data-backup/recovery.conf
>> 4. postmaster -D data-backup
>>
>> That is, starting postmaster on a data directory without ever calling
>> pg_stop_backup(). Because pg_stop_backup() was not called, the history
>> file is not there, and recovery won't complain about not reaching the
>> safe starting point.
>>
>> That is of course a case of "don't do that!", but perhaps we should
>> refuse to start up if the backup history file is not found? At least in
>> the WAL-based approach, I think we should refuse to start up if we
>> don't see the pg_stop_backup WAL record.
>
> The code has always been capable of starting without this, which was
> considered a feature, to be able to start from a hot copy.

Why is that desirable? The system is in an inconsistent state. To force
it, you can always use pg_resetxlog.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Wed, 2009-12-30 at 15:31 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> The code has always been capable of starting without this, which was
>> considered a feature, to be able to start from a hot copy.
>
> Why is that desirable? The system is in an inconsistent state. To force
> it, you can always use pg_resetxlog.

It's not desirable, for me, but it's been there since 8.0 and I have no
info on whether it's used. If you have a workaround that allows us to
continue to support it, I suggest you document it and then plug the gap
as originally suggested by you.

--
Simon Riggs           www.2ndQuadrant.com
Heikki Linnakangas wrote:
> Here's a modified patch that adds a new backupStartPoint field to
> pg_control for that, plus some other minor editorialization.

I've committed this now.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Mon, Jan 4, 2010 at 9:55 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Heikki Linnakangas wrote:
>> Here's a modified patch that adds a new backupStartPoint field to
>> pg_control for that, plus some other minor editorialization.
>
> I've committed this now.

Thanks a lot!

src/backend/access/transam/xlog.c
> else
> {
>     XLogRecPtr  InvalidXLogRecPtr = {0, 0};
>
>     ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
> }

In my original patch, the above is for the problem discussed in
http://archives.postgresql.org/pgsql-hackers/2009-12/msg02039.php

Since you've already fixed the problem, that code is useless.
How about getting rid of that code?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Fujii Masao wrote:
> On Mon, Jan 4, 2010 at 9:55 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Heikki Linnakangas wrote:
>>> Here's a modified patch that adds a new backupStartPoint field to
>>> pg_control for that, plus some other minor editorialization.
>>
>> I've committed this now.
>
> Thanks a lot!
>
> src/backend/access/transam/xlog.c
>> else
>> {
>>     XLogRecPtr  InvalidXLogRecPtr = {0, 0};
>>
>>     ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
>> }
>
> In my original patch, the above is for the problem discussed in
> http://archives.postgresql.org/pgsql-hackers/2009-12/msg02039.php
>
> Since you've already fixed the problem, that code is useless.
> How about getting rid of that code?

Has this been addressed?

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
On Sun, Feb 7, 2010 at 1:02 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> src/backend/access/transam/xlog.c
>>> else
>>> {
>>>     XLogRecPtr  InvalidXLogRecPtr = {0, 0};
>>>
>>>     ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
>>> }
>>
>> In my original patch, the above is for the problem discussed in
>> http://archives.postgresql.org/pgsql-hackers/2009-12/msg02039.php
>>
>> Since you've already fixed the problem, that code is useless.
>> How about getting rid of that code?
>
> Has this been addressed?

No. We're still waiting for Heikki's comments on that.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Fujii Masao wrote:
> On Sun, Feb 7, 2010 at 1:02 AM, Bruce Momjian <bruce@momjian.us> wrote:
>>> src/backend/access/transam/xlog.c
>>>> else
>>>> {
>>>>     XLogRecPtr  InvalidXLogRecPtr = {0, 0};
>>>>
>>>>     ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
>>>> }
>>> In my original patch, the above is for the problem discussed in
>>> http://archives.postgresql.org/pgsql-hackers/2009-12/msg02039.php
>>>
>>> Since you've already fixed the problem, that code is useless.
>>> How about getting rid of that code?
>> Has this been addressed?
>
> No. We're still waiting for Heikki's comments on that.

I removed that. It only makes a difference if you stop archive
recovery, remove recovery.conf, and start up again, causing the server
to do normal crash recovery. That's a "don't do that" scenario, but it
seems better not to clear minRecoveryPoint, even though we don't check
it during crash recovery. It might be useful debug information, and
also if you then put recovery.conf back, we will enforce that you reach
the minRecoveryPoint again.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Mon, Feb 8, 2010 at 6:11 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> I removed that.

Thanks!

> It only makes a difference if you stop archive recovery, remove
> recovery.conf, and start up again, causing the server to do normal
> crash recovery. That's a "don't do that" scenario, but it seems better
> not to clear minRecoveryPoint, even though we don't check it during
> crash recovery. It might be useful debug information, and also if you
> then put recovery.conf back, we will enforce that you reach the
> minRecoveryPoint again.

This makes sense.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center