Thread: Using pg_upgrade on log-shipping standby servers
I occasionally get questions about how to run pg_upgrade on log-shipping standby servers. The attached documentation patch outlines how to do it. I don't think we can assume that because pg_upgrade was run on the master and standby that they are binary identical, can we? Technically the user files are identical, but the system catalogs and WAL might be different, hence my suggestion to run rsync before allowing the standby to rejoin the primary. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> writes: > + While a Log-Shipping Standby Server (<xref linkend="warm-standby">) can > + be upgraded, the server must be in changed to a primary server to allow > + writes, and after the upgrade it cannot be reused as a standby server. > + (Running <command>rsync</> after the upgrade allows reuse.) "in changed"? This sentence makes no sense at all to me. regards, tom lane
On Tue, Jul 10, 2012 at 12:04:50PM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > + While a Log-Shipping Standby Server (<xref linkend="warm-standby">) can > > + be upgraded, the server must be in changed to a primary server to allow > > + writes, and after the upgrade it cannot be reused as a standby server. > > + (Running <command>rsync</> after the upgrade allows reuse.) > > "in changed"? This sentence makes no sense at all to me. Oops. New wording attached with "in" removed: the server must be changed to a primary server -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Tue, Jul 10, 2012 at 6:17 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Tue, Jul 10, 2012 at 12:04:50PM -0400, Tom Lane wrote: >> Bruce Momjian <bruce@momjian.us> writes: >> > + While a Log-Shipping Standby Server (<xref linkend="warm-standby">) can >> > + be upgraded, the server must be in changed to a primary server to allow >> > + writes, and after the upgrade it cannot be reused as a standby server. >> > + (Running <command>rsync</> after the upgrade allows reuse.) >> >> "in changed"? This sentence makes no sense at all to me. > > Oops. New wording attached with "in" removed: > > the server must be changed to a primary server Don't we normally talk about "must be promoted to a primary server", not changed? And wouldn't it be good if it also mentions that another good option is to just pg_upgrade the master and rebuild the standby? (Unless that's already mentioned somewhere else). What's the actual use case for promoting the slave, upgrading it and then *not* using it, which is what I think this paragraph suggests? And I think the sentence about running rsync is extremely vague - run rsync where and how? What are you actually trying to suggest people do? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Tue, Jul 10, 2012 at 06:21:35PM +0200, Magnus Hagander wrote: > On Tue, Jul 10, 2012 at 6:17 PM, Bruce Momjian <bruce@momjian.us> wrote: > > On Tue, Jul 10, 2012 at 12:04:50PM -0400, Tom Lane wrote: > >> Bruce Momjian <bruce@momjian.us> writes: > >> > + While a Log-Shipping Standby Server (<xref linkend="warm-standby">) can > >> > + be upgraded, the server must be in changed to a primary server to allow > >> > + writes, and after the upgrade it cannot be reused as a standby server. > >> > + (Running <command>rsync</> after the upgrade allows reuse.) > >> > >> "in changed"? This sentence makes no sense at all to me. > > > > Oops. New wording attached with "in" removed: > > > > the server must be changed to a primary server > > Don't we normally talk about "must be promoted to a primary server", > not changed? OK, sure, updated patch attached. > And wouldn't it be good if it also mentions that another good option > is to just pg_upgrade the master and rebuild the standby? (Unless > that's already mentioned somewhere else). I assume they already realize they re-create the standbys. > What's the actual usecase for promoting the slave, upgrading it and > then *not* using it, which is what I think this paragraph suggests? Testing maybe? I feel we have just avoided saying what you can and can't do with the standbys and pg_upgrade, so I think we have to state something. If we just want to say "recreate", let's say that. > And I think the sentence about running rsync is extremely vague - run > rsync where and how? What are you actually trying to suggest people > do? Updated docs attached. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Tue, Jul 10, 2012 at 6:26 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Tue, Jul 10, 2012 at 06:21:35PM +0200, Magnus Hagander wrote: >> On Tue, Jul 10, 2012 at 6:17 PM, Bruce Momjian <bruce@momjian.us> wrote: >> > On Tue, Jul 10, 2012 at 12:04:50PM -0400, Tom Lane wrote: >> >> Bruce Momjian <bruce@momjian.us> writes: >> >> > + While a Log-Shipping Standby Server (<xref linkend="warm-standby">) can >> >> > + be upgraded, the server must be in changed to a primary server to allow >> >> > + writes, and after the upgrade it cannot be reused as a standby server. >> >> > + (Running <command>rsync</> after the upgrade allows reuse.) >> >> >> >> "in changed"? This sentence makes no sense at all to me. >> > >> > Oops. New wording attached with "in" removed: >> > >> > the server must be changed to a primary server >> >> Don't we normally talk about "must be promoted to a primary server", >> not changed? > > OK, sure, updated patch attached. > >> And wouldn't it be good if it also mentions that another good option >> is to just pg_upgrade the master and rebuild the standby? (Unless >> that's already mentioned somewhere else). > > I assume they already realize they re-create the standbys. > >> What's the actual usecase for promoting the slave, upgrading it and >> then *not* using it, which is what I think this paragraph suggests? > > Testing maybe? I feel we have just avoided saying what you can and > can't do with the standbys and pg_upgrade, so I think we have to state > something. If we just want to say "recreate", let's say that. Well, the bottom line is we can't do *anything* with a standby with pg_upgrade. Once you've promoted it, it is no longer a standby, and now you can use pg_upgrade. >> And I think the sentence about running rsync is extremely vague - run >> rsync where and how? What are you actually trying to suggest people >> do? > > Updated docs attached. I suggest just removing the rsync part completely.
You're basically saying "you can set up a new standby after you're done", which is kind of obvious anyway. And if you're going to use rsync from the master to make a new standby, there's no point in running pg_upgrade on the new standby in the first place. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Tue, Jul 10, 2012 at 06:29:24PM +0200, Magnus Hagander wrote: > > Testing maybe? I feel we have just avoided saying what you can and > > can't do with the standbys and pg_upgrade, so I think we have to state > > something. If we just want to say "recreate", let's say that. > > Well, the bottom line is we can'd do *anything* with a standby with pg_upgrade. > > Once you've promoted it, it is no longer a standby, and now you can > use pg_upgrade. > > >> And I think the sentence about running rsync is extremely vague - run > >> rsync where and how? What are you actually trying to suggest people > >> do? > > > > Updated docs attached. > > I suggest just removing the rsync part completely. You're basically > saying "you ca nset up a new standby after you're done", which is kind > of obvious anyway. And if you're going to use rsync fromthe master to > make a new standby, there's no point in running pg_upgrade on the new > standby in the first place. I went the other direction and just said you can't upgrade a standby (as a standby), and to just use rsync --- patch attached. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Tue, Jul 10, 2012 at 6:59 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Tue, Jul 10, 2012 at 06:29:24PM +0200, Magnus Hagander wrote: >> > Testing maybe? I feel we have just avoided saying what you can and >> > can't do with the standbys and pg_upgrade, so I think we have to state >> > something. If we just want to say "recreate", let's say that. >> >> Well, the bottom line is we can'd do *anything* with a standby with pg_upgrade. >> >> Once you've promoted it, it is no longer a standby, and now you can >> use pg_upgrade. >> >> >> And I think the sentence about running rsync is extremely vague - run >> >> rsync where and how? What are you actually trying to suggest people >> >> do? >> > >> > Updated docs attached. >> >> I suggest just removing the rsync part completely. You're basically >> saying "you ca nset up a new standby after you're done", which is kind >> of obvious anyway. And if you're going to use rsync fromthe master to >> make a new standby, there's no point in running pg_upgrade on the new >> standby in the first place. > > I went the other direction and just said you can't upgrade a standby (as > a standby), and to just use rsync --- patch attached. Reads much better now. I'd say "use rsync to rebuild the standbys", but that's more nitpicking :) (And maybe "the simplest way" rather than "the simplest case"? But I'll leave that to someone who has English as their first language) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Tue, Jul 10, 2012 at 07:06:39PM +0200, Magnus Hagander wrote: > >> >> rsync where and how? What are you actually trying to suggest people > >> >> do? > >> > > >> > Updated docs attached. > >> > >> I suggest just removing the rsync part completely. You're basically > >> saying "you ca nset up a new standby after you're done", which is kind > >> of obvious anyway. And if you're going to use rsync fromthe master to > >> make a new standby, there's no point in running pg_upgrade on the new > >> standby in the first place. > > > > I went the other direction and just said you can't upgrade a standby (as > > a standby), and to just use rsync --- patch attached. > > Reads much better now. I'd say "use rsync to rebuild the standbys", > but that's more nitpicking :) (And maybe "the simplest way" rather > than "the simplest case"? But i'll leave that to someone who has > english as their first language) Both changes made; updated patch attached. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Tue, Jul 10, 2012 at 7:27 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Tue, Jul 10, 2012 at 07:06:39PM +0200, Magnus Hagander wrote: >> >> >> rsync where and how? What are you actually trying to suggest people >> >> >> do? >> >> > >> >> > Updated docs attached. >> >> >> >> I suggest just removing the rsync part completely. You're basically >> >> saying "you ca nset up a new standby after you're done", which is kind >> >> of obvious anyway. And if you're going to use rsync fromthe master to >> >> make a new standby, there's no point in running pg_upgrade on the new >> >> standby in the first place. >> > >> > I went the other direction and just said you can't upgrade a standby (as >> > a standby), and to just use rsync --- patch attached. >> >> Reads much better now. I'd say "use rsync to rebuild the standbys", >> but that's more nitpicking :) (And maybe "the simplest way" rather >> than "the simplest case"? But i'll leave that to someone who has >> english as their first language) > > Both change made; updated patch attached. Looks good to me. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Tue, Jul 10, 2012 at 09:10:25PM +0200, Magnus Hagander wrote: > On Tue, Jul 10, 2012 at 7:27 PM, Bruce Momjian <bruce@momjian.us> wrote: > > On Tue, Jul 10, 2012 at 07:06:39PM +0200, Magnus Hagander wrote: > >> >> >> rsync where and how? What are you actually trying to suggest people > >> >> >> do? > >> >> > > >> >> > Updated docs attached. > >> >> > >> >> I suggest just removing the rsync part completely. You're basically > >> >> saying "you ca nset up a new standby after you're done", which is kind > >> >> of obvious anyway. And if you're going to use rsync fromthe master to > >> >> make a new standby, there's no point in running pg_upgrade on the new > >> >> standby in the first place. > >> > > >> > I went the other direction and just said you can't upgrade a standby (as > >> > a standby), and to just use rsync --- patch attached. > >> > >> Reads much better now. I'd say "use rsync to rebuild the standbys", > >> but that's more nitpicking :) (And maybe "the simplest way" rather > >> than "the simplest case"? But i'll leave that to someone who has > >> english as their first language) > > > > Both change made; updated patch attached. > > Looks good to me. OK, applied and backpatched to 9.2. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Tue, 2012-07-10 at 11:50 -0400, Bruce Momjian wrote: > I don't think we can assume that because pg_upgrade was run on the > master and standby that they are binary identical, can we? Technically > the user file are identical, but the system catalogs and WAL might be > different, hence my suggestion to run rsync before allowing the standby > to rejoin the primary. Do you have plans to change that in the future? If we know that the user data files are identical between primary and replica, it would be nice if we could provide a robust way to avoid copying them. Regards, Jeff Davis
On Mon, Jul 16, 2012 at 5:29 PM, Jeff Davis <pgsql@j-davis.com> wrote: > On Tue, 2012-07-10 at 11:50 -0400, Bruce Momjian wrote: >> I don't think we can assume that because pg_upgrade was run on the >> master and standby that they are binary identical, can we? Technically >> the user file are identical, but the system catalogs and WAL might be >> different, hence my suggestion to run rsync before allowing the standby >> to rejoin the primary. > > Do you have plans to change that in the future? > > If we know that the user data files are identical between primary and > replica, it would be nice if we could provide a robust way to avoid > copying them. How about this alternative that may sound crazy, but would lend itself to some unification in archiving: Could pg_upgrade emit WAL segment(s) to provide continuity of a timeline? So something like: * Take down the writable primary for pg_upgrade * Some WAL is emitted and possibly archived * The old version, when reaching the special pg_upgrade WAL, could exit or report its situation having paused replay (as clearly, it cannot proceed). Unsure. * Start up a new version of postgres on the same cluster at that point, which plays the upgrade-WAL. I see this being pretty mechanically intensive, but right now my hands are completely tied as to achieving total continuity of my archives, costing a base-backup's worth of risk window upon upgrade. -- fdr
On Tue, 2012-07-17 at 01:02 -0700, Daniel Farina wrote: > Could pg_upgrade emit WAL segment(s) to provide continuity of a > timeline? So something like: By "segments" did you mean "records"? > * Take down the writable primary for pg_upgrade > * Some WAL is emitted and possibly archived > * The old version, when reaching the special pg_upgrade WAL, could > exit or report its situation having paused replay (as clearly, it > cannot proceed). Unsure. I don't really understand this step. > * Start up a new version of postgres on the same cluster at that > point, which plays the upgrade-WAL. > > I see this being pretty mechanically intensive, but right now my hands > are completely tied as to achieving total continuity of my archives, > costing a base-backup's worth of risk window upon upgrade. Does "continuity of archives" mean "avoid downtime" or "maintain a single WAL sequence"? Regards, Jeff Davis
On Mon, Jul 16, 2012 at 05:29:26PM -0700, Jeff Davis wrote: > On Tue, 2012-07-10 at 11:50 -0400, Bruce Momjian wrote: > > I don't think we can assume that because pg_upgrade was run on the > > master and standby that they are binary identical, can we? Technically > > the user file are identical, but the system catalogs and WAL might be > > different, hence my suggestion to run rsync before allowing the standby > > to rejoin the primary. > > Do you have plans to change that in the future? > > If we know that the user data files are identical between primary and > replica, it would be nice if we could provide a robust way to avoid > copying them. Well, rsync --checksum would work, but both systems have to be down for that. You could snapshot the down primary and rsync --checksum that against the standby, but I am not sure how much that helps us. I can't figure out how to make this work better without adding a whole lot more code to pg_upgrade that might need adjustment for every minor release, i.e. pg_upgrade knows nothing about the WAL file format, and I want to keep it that way. However, I have two ideas. First, I don't know _why_ the primary/standby would be any different after pg_upgrade, so I added the documentation mention because I couldn't _guarantee_ they were the same. Actually, if people can test this, we might be able to say this is safe. Second, the user files (large) are certainly identical, it is only the system tables (small) that _might_ be different, so rsync'ing just those would add the guarantee, but I know of no easy way to rsync just the system tables. Does that help? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
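A sketch of the `rsync --checksum` idea mentioned above: --checksum compares file contents rather than size and timestamp, so only files that really differ (the rewritten catalogs) would be copied, while identical user files are skipped. The toy below fakes that comparison with cksum over two scratch directories; the file names are invented relfilenodes, not real cluster contents.

```shell
# Fake two stopped data directories: the user relation file is identical
# on both sides, the catalog file differs.  A content checksum (what
# rsync --checksum does internally) spots exactly the differing file.
a=$(mktemp -d); b=$(mktemp -d)
printf 'user relation\n' > "$a/16384"; printf 'user relation\n' > "$b/16384"
printf 'catalog, v1\n'   > "$a/1259";  printf 'catalog, v2\n'   > "$b/1259"
for f in 16384 1259; do
    sa=$(cksum < "$a/$f"); sb=$(cksum < "$b/$f")
    [ "$sa" = "$sb" ] || echo "$f differs"
done
# prints: 1259 differs
```

The equivalent real invocation would be something like `rsync --checksum --archive primary:/pgdata/ /pgdata/`, with the caveat Bruce notes: both systems must be down for the comparison to be meaningful.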
On Tue, Jul 17, 2012 at 11:55 AM, Jeff Davis <pgsql@j-davis.com> wrote: > On Tue, 2012-07-17 at 01:02 -0700, Daniel Farina wrote: >> Could pg_upgrade emit WAL segment(s) to provide continuity of a >> timeline? So something like: > > By "segments" did you mean "records"? Yes. It would be nicer not to have to tie it to the WAL segment file size. >> * Take down the writable primary for pg_upgrade >> * Some WAL is emitted and possibly archived >> * The old version, when reaching the special pg_upgrade WAL, could >> exit or report its situation having paused replay (as clearly, it >> cannot proceed). Unsure. > > I don't really understand this step. "Some WAL is emitted and possibly archived" needs a subject in that fragment: "pg_upgrade somehow (directly, or indirectly) emits and/or archives WAL used to complete binary-upgrade". That means that it should appear in the WAL stream and be subject to archive_command, like any other WAL. The sticky part is what the standby should do when it encounters the special wal-upgrade records. It should probably pause replay to allow some other program to stop the old postgres version and start the new version with the same cluster. >> * Start up a new version of postgres on the same cluster at that >> point, which plays the upgrade-WAL. >> >> I see this being pretty mechanically intensive, but right now my hands >> are completely tied as to achieving total continuity of my archives, >> costing a base-backup's worth of risk window upon upgrade. > > Does "continuity of archives" mean "avoid downtime" or "maintain a > single WAL sequence". The latter. -- fdr
On Tue, Jul 17, 2012 at 04:49:39PM -0700, Daniel Farina wrote: > On Tue, Jul 17, 2012 at 11:55 AM, Jeff Davis <pgsql@j-davis.com> wrote: > > On Tue, 2012-07-17 at 01:02 -0700, Daniel Farina wrote: > >> Could pg_upgrade emit WAL segment(s) to provide continuity of a > >> timeline? So something like: > > > > By "segments" did you mean "records"? > > Yes. It would be nicer not to have to tie it to the WAL segment file size. > > >> * Take down the writable primary for pg_upgrade > >> * Some WAL is emitted and possibly archived > >> * The old version, when reaching the special pg_upgrade WAL, could > >> exit or report its situation having paused replay (as clearly, it > >> cannot proceed). Unsure. > > > > I don't really understand this step. > > "Some WAL is emitted and possibly archived" needs a subject in that fragment: > > "pg_upgrade somehow (directly, or indirectly) emits and/or archives > WAL used to complete binary-upgrade". That means that it should > appear in the WAL stream and be subject to archive_command, like any > other WAL. > > The sticky part is what the standby should do when it encounters the > special wal-upgrade records. It should probably pause replay to allow > some other program to stop the old postgres version and start the new > version with the same cluster. WAL is not guaranteed to be the same between PG major versions, so doing anything with WAL is pretty much a no-go. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Tue, Jul 17, 2012 at 9:16 PM, Bruce Momjian <bruce@momjian.us> wrote: > WAL is not guaranteed to be the same between PG major versions, so doing > anything with WAL is pretty much a no-go. I understand that the WAL format changes, sometimes dramatically between versions. What I'm suggesting is that the first WAL-record emitted by the binary upgrade process could be entitled "WAL-stream upgrade to 9.4" that would fail to be understood by old versions or possibly understood to mean "stop replay, you won't even understand what's about to be said." At that point, start up the new version in the same cluster and have it continue replay from that position on forward, which should all be in the new format that it can understand. It need not understand the old format in that case, but the tricky part is this single record that tells the replayer of the old version to stop while a replayer of the new version somehow will know it is the right place to start. One mechanism could be a WAL file segment boundary: the standby could be told to exit when it finishes recovery of the segment 0000000100001234000055CD, and to start the new version beginning recovery at 0000000100001234000055CF (one higher), and that would be the first WAL emitted by pg_upgrade. In principle the same is possible using the fine-grained record position, such as XXXXX/NN, but may be more complex for not much gain. This also means the database would be stuck in an inconsistent state when it starts, not unlike when recovering from an on-line base backup. And that's totally reasonable: the new version has to start up presuming that the database cluster is not yet consistent enough to enter hot standby. Yet another mechanism is to not have the Postgres recovery-process apply the WAL, but rather some special purpose program that knows how to count through and apply specially-formatted WAL segments, and then set the resultant cluster to start recovering from the WAL past this span of specially-formatted WAL.
The crux is to get some continuity in this stream, and there are many ways to slice it. Otherwise, the continuous archives will have a gap while a new base backup is taken of data that mostly rests unchanged. -- fdr
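The segment-boundary handoff floated above needs "next segment name" arithmetic. A toy version for the pre-9.3 24-hex-digit naming (8 digits each of timeline, log, and segment) is below. Purely illustrative: a plain increment of ...55CD yields ...55CE, and real code would also have to honor the version-specific segment-numbering rules, so this is a sketch of the idea rather than anything Postgres actually does.

```shell
# Given the last segment the old server should replay, compute the name
# of the segment at which the new server could begin.  The segment name
# and the field layout (timeline/log/segment, 8 hex digits each) follow
# the pre-9.3 convention used in the discussion above.
last=0000000100001234000055CD
tli=$(printf '%s' "$last" | cut -c1-8)    # timeline ID
log=$(printf '%s' "$last" | cut -c9-16)   # logical log file number
seg=$(printf '%s' "$last" | cut -c17-24)  # segment within the log file
next=$(printf '%s%08X%08X' "$tli" "$((0x$log))" "$((0x$seg + 1))")
echo "$next"    # 0000000100001234000055CE
```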
On Tue, Jul 17, 2012 at 6:02 PM, Bruce Momjian <bruce@momjian.us> wrote: > However, I have two ideas. First, I don't know _why_ the > primary/standby would be any different after pg_upgrade, so I added the > documentation mention because I couldn't _guarantee_ they were the same. > Actually, if people can test this, we might be able to say this is safe. > > Second, the user files (large) are certainly identical, it is only the > system tables (small) that _might_ be different, so rsync'ing just those > would add the guarantee, but I know of no easy way to rsync just the > system tables. I'm scratching my head in confusion here. After pg_upgrade, the master is a completely new cluster. The system catalog contents are completely different, and so are things like the database system identifier and the WAL position - yeah, the latter is approximately the same, but almost doesn't count except in horseshoes. Obviously any attempt to replay WAL from the new cluster on the old cluster is doomed to failure, at least unless we do a bunch more engineering here that hasn't really been thought about yet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jul 18, 2012 at 09:36:51AM -0400, Robert Haas wrote: > On Tue, Jul 17, 2012 at 6:02 PM, Bruce Momjian <bruce@momjian.us> wrote: > > However, I have two ideas. First, I don't know _why_ the > > primary/standby would be any different after pg_upgrade, so I added the > > documentation mention because I couldn't _guarantee_ they were the same. > > Actually, if people can test this, we might be able to say this is safe. > > > > Second, the user files (large) are certainly identical, it is only the > > system tables (small) that _might_ be different, so rsync'ing just those > > would add the guarantee, but I know of no easy way to rsync just the > > system tables. > > I'm scratching my head in confusion here. After pg_upgrade, the > master is a completely new cluster. The system catalog contents are > completely different, and so are things like the database system > identifier and the WAL position - yeah, the latter is approximately > the same, but almost doesn't count except in horseshoes. Obviously > any attempt to replay WAL from the new cluster on the old cluster is > doomed to failure, at least unless we do a bunch more engineering here > that hasn't really been thought about yet. No, the point is they run pg_upgrade on the stopped primary and stopped standbys. Are those the same? I am not really sure. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, Jul 19, 2012 at 2:38 AM, Bruce Momjian <bruce@momjian.us> wrote: > No, the point is they run pg_upgrade on the stopped primary and stopped > standbys. Are those the same? I am not really sure. Of course not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jul 19, 2012 at 09:41:29AM -0400, Robert Haas wrote: > On Thu, Jul 19, 2012 at 2:38 AM, Bruce Momjian <bruce@momjian.us> wrote: > > No, the point is they run pg_upgrade on the stopped primary and stopped > > standbys. Are those the same? I am not really sure. > > Of course not. OK, but why? When the clusters are stopped they are the same, you are running the same initdb on both machines, and running the same pg_upgrade. What would cause the difference, other than the Database System Identifier, which we can deal with? I don't think we can guarantee they are the same, but what would guarantee they are different? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, Jul 19, 2012 at 12:02 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Thu, Jul 19, 2012 at 09:41:29AM -0400, Robert Haas wrote: >> On Thu, Jul 19, 2012 at 2:38 AM, Bruce Momjian <bruce@momjian.us> wrote: >> > No, the point is they run pg_upgrade on the stopped primary and stopped >> > standbys. Are those the same? I am not really sure. >> >> Of course not. > > OK, but why? When the clusters are stopped they are the same, you are > running the same initdb on both matchines, and running the same > pg_upgrade. What would cause the difference, other than the Database > System Identifier, which we can deal with? I don't think we can > guarantee they are the same, but what would guarantee they are > different? There isn't any guarantee that they are different. There's just no guarantee that they are the same, which is enough to make this idea a non-starter. In general, it's pretty easy to understand that if you perform the same series of inserts, updates, and deletes on two systems, you might not end up with the exact same binary contents. There are a lot of reasons for this: any concurrent activity whatsoever - even the exact timing of autovacuum - can cause the same tuples to end up in different places in the two systems. Now, admittedly, in the case of pg_upgrade, you're restoring the dump using a single process with absolutely no concurrent activity and even autovacuum disabled, so the chances of ending up with entirely identical binary contents are probably higher than average. But even there you could have checkpoints trigger at slightly different times while restoring the dumps, and of course checkpoints take buffer locks, and so now a HOT prune might happen on one machine but get skipped on the other one because the checkpointer has dropped the lock but not the pin, and now you're hosed.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jul 19, 2012 at 12:43:23PM -0400, Robert Haas wrote: > On Thu, Jul 19, 2012 at 12:02 PM, Bruce Momjian <bruce@momjian.us> wrote: > > On Thu, Jul 19, 2012 at 09:41:29AM -0400, Robert Haas wrote: > >> On Thu, Jul 19, 2012 at 2:38 AM, Bruce Momjian <bruce@momjian.us> wrote: > >> > No, the point is they run pg_upgrade on the stopped primary and stopped > >> > standbys. Are those the same? I am not really sure. > >> > >> Of course not. > > > > OK, but why? When the clusters are stopped they are the same, you are > > running the same initdb on both matchines, and running the same > > pg_upgrade. What would cause the difference, other than the Database > > System Identifier, which we can deal with? I don't think we can > > guarantee they are the same, but what would guarantee they are > > different? > > There isn't any guarantee that they are different. There's just no > guarantee that they are the same, which is enough to make this idea a > non-starter. > > In general, it's pretty easy to understand that if you perform the > same series of inserts, updates, and deletes on two systems, you might > not end up with the exact same binary contents. There are a lot of > reasons for this: any concurrent activity whatsoever - even the exact > timing of autovacuum - can cause the same tuples can end up in > different places in the two systems. Now, admittedly, in the case of > pg_upgrade, you're restoring the dump using a single process with > absolutely no concurrent activity and even autovacuum disabled, so the > chances of ending up with entirely identical binary contents are > probably higher than average. But even there you could have > checkpoints trigger at slightly different times while restoring the > dumps, and of course checkpoints take buffer locks, and so now a HOT > prune might happen on one machine but get skipped on the other one > because the checkpointer has dropped the lock but not the pin, and now > you're hosed. 
> > Even if you could control for that particular possibility, there are > surely others now and there will be more in the future. I think the checkpoint issue is the ideal killer --- thanks. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Tue, Jul 17, 2012 at 06:02:40PM -0400, Bruce Momjian wrote: > Second, the user files (large) are certainly identical, it is only the > system tables (small) that _might_ be different, so rsync'ing just those > would add the guarantee, but I know of no easy way to rsync just the > system tables. OK, new idea. I said above I didn't know how to copy just the non-user table files (which are not modified by pg_upgrade), but actually, if you use link mode, the user files are the only files with a hard link count of 2. I could create a script that copied from the master to the slave only those files with a link count of one. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
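The link-count trick described above can be demonstrated with a toy example; scratch directories stand in for the old and new clusters, and the file names are invented relfilenodes. In link mode, user data files end up hard-linked into the new cluster (link count 2), while everything pg_upgrade rewrote (the catalogs) exists only once (link count 1) and is exactly the set of files such a script would have to copy to the standby.

```shell
# Fake an old and a new cluster directory.
old=$(mktemp -d); new=$(mktemp -d)
printf 'user table data\n' > "$old/16384"   # stands in for a user relation file
printf 'old catalog\n'     > "$old/1259"    # stands in for a system catalog file
ln "$old/16384" "$new/16384"                # pg_upgrade --link hard-links user files
printf 'new catalog\n'     > "$new/1259"    # catalog files are rewritten, not linked
# Linked user files now have a link count of 2; rewritten files have 1.
find "$new" -type f -links 1    # prints only the rewritten catalog file
find "$new" -type f -links +1   # prints only the hard-linked user file
```

So a "copy the differences" script could feed `find ... -links 1` to rsync or scp and skip the (identical, hard-linked) user data entirely.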
If you're wanting to automatically do some upgrades wouldn't an easier route be: 1) run pg_upgrade, up to the point where it actually starts copying/linking in old cluster data files, and stop the new postmaster. 2) Take a "base backup" style copy (tar, rsync, $FAVOURITE) of the new cluster (small, since without data files) 3) Have pg_upgrade leave a log of exactly which old cluster data files go where in the new cluster That way, anybody, any script, etc. who wants to make a new "standby" from an old one only needs the pg_upgrade base backup (which should be small, no data, just catalog stuff), and the log of which old files to move where. The only pre-condition is that the standby's "old pg" *APPLIED* WAL up to the exact same point as the master's "old pg". In that case the standby's old cluster data files should be close enough (maybe hint bits aside?) to be used. a. On Fri, Jul 20, 2012 at 12:25 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Tue, Jul 17, 2012 at 06:02:40PM -0400, Bruce Momjian wrote: >> Second, the user files (large) are certainly identical, it is only the >> system tables (small) that _might_ be different, so rsync'ing just those >> would add the guarantee, but I know of no easy way to rsync just the >> system tables. > > OK, new idea. I said above I didn't know how to copy just the non-user > table files (which are not modified by pg_upgrade), but actually, if you > use link mode, the user files are the only files with a hard link count > of 2. I could create a script that copied from the master to the slave > only those files with a link count of one. > > -- > Bruce Momjian <bruce@momjian.us> http://momjian.us > EnterpriseDB http://enterprisedb.com > > + It's impossible for everything to be true. 
+ -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
On Fri, Jul 20, 2012 at 12:39:12PM -0400, Aidan Van Dyk wrote: > If you're wanting to automatically do some upgrades wouldn't an easier route be: > > 1) run pg_upgrade, up to the point where it actually starts > copying/linking in old cluster data files, and stop the new > postmaster. > 2) Take a "base backup" style copy (tar, rsync, $FAVOURITE) of the new > cluster (small, since without data files) > 3) Have pg_upgrade leave a log of exactly which old cluster data files > go where in the new cluster > > That way, anybody, any script, etc. who wants to make a new "standby" > from an old one only needs the pg_upgrade base backup (which should > be small, no data, just catalog stuff), and the log of which old files > to move where. > > The only pre-condition is that the standby's "old pg" *APPLIED* WAL up > to the exact same point as the master's "old pg". In that case the > standby's old cluster data files should be close enough (maybe hint bits > aside?) to be used. I am not sure what a base backup is buying us here --- base backup is designed to create a backup while the server is running, and it is down at that point. I think what you are suggesting is to make a data dir copy while just the schema is in place. That is possible, but it opens up all kinds of possible failure cases because pg_upgrade operations have to be done in a specific order --- it feels very fragile. I think the commands to run after pg_upgrade --link completes on both primary and standby might be as easy as: cd /u/pg/pgsql.old/data find . -links 1 -exec cp {} /u/pgsql/data \; Why would we want anything more complicated than this? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On fre, 2012-07-20 at 13:11 -0400, Bruce Momjian wrote: > I think the commands to run after pg_upgrade --link completes on both > primary and standby might be as easy as: > > cd /u/pg/pgsql.old/data > find . -links 1 -exec cp {} /u/pgsql/data \; > > Why would we want anything more complicated than this? In practice these are on different machines, and the way the machines are connected could vary wildly. So an automated solution might be difficult to find.
On Sat, Jul 21, 2012 at 11:24:21AM +0300, Peter Eisentraut wrote: > On fre, 2012-07-20 at 13:11 -0400, Bruce Momjian wrote: > > I think the commands to run after pg_upgrade --link completes on both > > primary and standby might be as easy as: > > > > cd /u/pg/pgsql.old/data > > find . -links 1 -exec cp {} /u/pgsql/data \; > > > > Why would we want anything more complicated than this? > > In practice these are on different machines, and the way the machines > are connected could vary wildly. So an automated solution might be > difficult to find. Yeah, I was thinking of just suggesting scp as a doc example and let users adjust that: cd /u/pg/pgsql.old/data find . -links 1 -exec scp {} postgres@momjian.us:/u/pgsql/data \; -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Sat, Jul 21, 2012 at 8:57 AM, Bruce Momjian <bruce@momjian.us> wrote: > On Sat, Jul 21, 2012 at 11:24:21AM +0300, Peter Eisentraut wrote: >> On fre, 2012-07-20 at 13:11 -0400, Bruce Momjian wrote: >> > I think the commands to run after pg_upgrade --link completes on both >> > primary and standby might be as easy as: >> > >> > cd /u/pg/pgsql.old/data >> > find . -links 1 -exec cp {} /u/pgsql/data \; >> > >> > Why would we want anything more complicated than this? >> >> In practice these are on different machines, and the way the machines >> are connected could vary wildly. So an automated solution might be >> difficult to find. > > Yeah, I was thinking of just suggesting scp as a doc example and let > users adjust that: > > cd /u/pg/pgsql.old/data > find . -links 1 -exec scp {} postgres@momjian.us:/u/pgsql/data \; Relying on the number of hard links seems very fragile. For example, it'll break if you are using copy mode. And it won't work on Windows, either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On mån, 2012-07-23 at 10:08 -0400, Robert Haas wrote: > Relying on the number of hard links seems very fragile. For example, > it'll break if you are using copy mode. And it won't work on Windows, > either. pg_upgrade could remember the list of files that the user would need to copy to the remote server (i.e., the list of files pg_upgrade itself copied or linked) and write that to a file.
On Mon, Jul 23, 2012 at 11:11:27PM +0300, Peter Eisentraut wrote: > On mån, 2012-07-23 at 10:08 -0400, Robert Haas wrote: > > Relying on the number of hard links seems very fragile. For example, > > it'll break if you are using copy mode. And it won't work on Windows, > > either. > > pg_upgrade could remember the list of files that the user would need to > copy to the remote server (i.e., the list of files pg_upgrade itself > copied or linked) and write that to a file. Good idea. A list of file names seems best, but wouldn't that list exceed the maximum size of an argument list? How could we pass that list to a command like scp? And how could we pass that list on Windows? Pg_upgrade already creates a script to analyze the cluster, so we could create another script to upgrade a standby. However, the problem with a script is that I have no idea what command people would use to do the copy. I think I could create a list and pass that into a loop so only the command has to be modified, but again, how do we do that on Windows? Can we create a shell function in Windows and pass the file name as an argument? Another problem is that the standby cluster might create _new_ files that don't exist on the master, e.g. WAL files, and those have to be removed. I am not clear how to do that either, except by removing all files with a hard link count of 1, and again, this is difficult on Windows. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, Jul 26, 2012 at 08:30:40AM -0400, Bruce Momjian wrote: > On Mon, Jul 23, 2012 at 11:11:27PM +0300, Peter Eisentraut wrote: > > On mån, 2012-07-23 at 10:08 -0400, Robert Haas wrote: > > > Relying on the number of hard links seems very fragile. For example, > > > it'll break if you are using copy mode. And it won't work on Windows, > > > either. > > > > pg_upgrade could remember the list of files that the user would need to > > copy to the remote server (i.e., the list of files pg_upgrade itself > > copied or linked) and write that to a file. > > Good idea. A list of file names seems best, but wouldn't that list > exceed the maximum size of an argument list? How could we pass that > list to a command like scp? And how could we pass that list on > Windows? > > Pg_upgrade already creates a script to analyze the cluster, so we could > create another script to upgrade a standby. However, the problem with a > script is that I have no idea what command people would use to do the > copy. I think I could create a list and pass that into a loop so only > the command has to be modified, but again, how do we do that on Windows? > Can we create a shell function in Windows and pass the file name as an > argument? > > Another problem is that the standby cluster might create _new_ files > that don't exist on the master, e.g. WAL files, and those have to be > removed. I am not clear how to do that either, except by removing all > files with a hard link count of 1, and again, this is difficult on > Windows. Maybe we just need to require Perl, and store the file names in the generated Perl script. I know we don't normally require Perl for operation (only development), but Perl seems to be the easiest solution here. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On tor, 2012-07-26 at 08:30 -0400, Bruce Momjian wrote: > On Mon, Jul 23, 2012 at 11:11:27PM +0300, Peter Eisentraut wrote: > > On mån, 2012-07-23 at 10:08 -0400, Robert Haas wrote: > > > Relying on the number of hard links seems very fragile. For example, > > > it'll break if you are using copy mode. And it won't work on Windows, > > > either. > > > > pg_upgrade could remember the list of files that the user would need to > > copy to the remote server (i.e., the list of files pg_upgrade itself > > copied or linked) and write that to a file. > > Good idea. A list of file names seems best, but wouldn't that list > exceed the maximum size of an argument list? How could we pass that > list to a command like scp? xargs > Pg_upgrade already creates a script to analyze the cluster, so we could > create another script to upgrade a standby. However, the problem with a > script is that I have no idea what command people would use to do the > copy. Exactly. Perhaps an example wouldn't hurt, but I wouldn't go too far. > I think I could create a list and pass that into a loop so only > the command has to be modified, but again, how do we do that on Windows? > Can we create a shell function in Windows and pass the file name as an > argument? I don't know, but I assume that somewhere in the known universe there is a way on Windows to say, here is a list of files, copy them to that host. > Another problem is that the standby cluster might create _new_ files > that don't exist on the master, e.g. WAL files, and those have to be > removed. I am not clear how to do that either, except by removing all > files with a hard link count of 1, and again, this is difficult on > Windows. Well, then that would call for another list of files.
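Peter's xargs answer to the argument-list question might look something like the sketch below. A local cp stands in for scp, and the file list and names are invented; xargs batches the arguments so the list can be arbitrarily long without hitting the kernel's argv size limit:

```shell
# Sketch: feeding a pg_upgrade-style file list to a copy command with
# xargs.  A local cp stands in for scp/rsync; the list contents and
# directory layout are invented for illustration.
src=$(mktemp -d) dst=$(mktemp -d)
for f in 1259 2608 2674; do echo "catalog $f" > "$src/$f"; done

# The list pg_upgrade could emit (one relative file name per line):
list=$(mktemp)
printf '%s\n' 1259 2608 2674 > "$list"

# Copy everything on the list; -I{} substitutes each name, one per
# invocation.  With a remote standby this might instead read:
#   xargs -I{} scp "$src/{}" "standby:/u/pgsql/data/{}" < "$list"
xargs -I{} cp "$src/{}" "$dst/{}" < "$list"
```

This sidesteps the ARG_MAX concern entirely, at the cost of assuming newline-safe file names (which is true for the numeric relation file names PostgreSQL uses).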
On Thu, Jul 26, 2012 at 9:59 AM, Peter Eisentraut <peter_e@gmx.net> wrote: >> I think I could create a list and pass that into a loop so only >> the command has to be modified, but again, how do we do that on Windows? >> Can we create a shell function in Windows and pass the file name as an >> argument? > > I don't know, but I assume that somewhere in the known universe there is > a way on Windows to say, here is a list of files, copy them to that > host. > >> Another problem is that the standby cluster might create _new_ files >> that don't exist on the master, e.g. WAL files, and those have to be >> removed. I am not clear how to do that either, except by removing all >> files with a hard link count of 1, and again, this is difficult on >> Windows. > > Well, then that would call for another list of files. I cannot escape the feeling that if we go down this route in any form we're going to spend years tracking down data-loss-inducing bugs. The ones we have on the master are bad enough, but doing it on the standby is almost worse because (1) few enough people will use this functionality that we won't get many bug reports even if it's badly broken and (2) people who are affected may not discover it until something bad has already happened on the master. I don't hear anyone thinking very hard about ways that the master could be different from the standby, and without a lot of careful thought on that topic I think this is all kinds of bad news. Just to take one example, how are you going to ensure that the standby has replayed all the WAL that the master generated prior to the upgrade? If the answer is "shut everything down cleanly and hope for the best", color me unimpressed. IMV, pg_upgrade is not yet sufficiently reliable that we should be looking for new projects that seem certain to make it less reliable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 07/26/2012 09:59 AM, Peter Eisentraut wrote: >> I think I could create a list and pass that into a loop so only >> the command has to be modified, but again, how do we do that on Windows? >> Can we create a shell function in Windows and pass the file name as an >> argument? > I don't know, but I assume that somewhere in the known universe there is > a way on Windows to say, here is a list of files, copy them to that > host. The issue isn't whether or not there is a way known somewhere in the universe to do something, it's whether or not it's reasonable to assume that such a thing is likely to be available. In general we have tried not to assume very much at all about what's available on Windows - not much beyond the simple cmd.exe shell. On Windows I typically use scp to copy things around between machines, but I'd be very wary of creating a Postgres utility that expects it to be present. cheers andrew
On Thu, Jul 26, 2012 at 04:59:46PM +0300, Peter Eisentraut wrote: > On tor, 2012-07-26 at 08:30 -0400, Bruce Momjian wrote: > > On Mon, Jul 23, 2012 at 11:11:27PM +0300, Peter Eisentraut wrote: > > > On mån, 2012-07-23 at 10:08 -0400, Robert Haas wrote: > > > > Relying on the number of hard links seems very fragile. For example, > > > > it'll break if you are using copy mode. And it won't work on Windows, > > > > either. > > > > > > pg_upgrade could remember the list of files that the user would need to > > > copy to the remote server (i.e., the list of files pg_upgrade itself > > > copied or linked) and write that to a file. > > > > Good idea. A list of file names seems best, but wouldn't that list > > exceed the maximum size of an argument list? How could we pass that > > list to a command like scp? > > xargs Yes, good idea. > > Pg_upgrade already creates a script to analyze the cluster, so we could > > create another script to upgrade a standby. However, the problem with a > > script is that I have no idea what command people would use to do the > > copy. > > Exactly. Perhaps an example wouldn't hurt, but I wouldn't go too far. Agreed. > > I think I could create a list and pass that into a loop so only > > the command has to be modified, but again, how do we do that on Windows? > > Can we create a shell function in Windows and pass the file name as an > > argument? > > I don't know, but I assume that somewhere in the known universe there is > a way on Windows to say, here is a list of files, copy them to that > host. No idea. > > Another problem is that the standby cluster might create _new_ files > > that don't exist on the master, e.g. WAL files, and those have to be > > removed. I am not clear how to do that either, except by removing all > > files with a hard link count of 1, and again, this is difficult on > > Windows. > > Well, then that would call for another list of files. Well, not really. 
If we create a list of all user table/index files, then any file not on the list would be removed on the standby, then all the files in the primary not on the list are copied to the standby. One list is less error-prone. This is easy in Unix shell and Perl, but hard on Windows without Perl. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
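A minimal sketch of the one-list idea, using invented file names and two local directories in place of a primary and a remote standby (a real version would run the removal on the standby and pull the non-user files over ssh or rsync):

```shell
# Sketch of the one-list approach: given a list of user relation files
# that are known-identical on both sides, (1) delete every standby file
# not on the list (e.g. stray WAL), then (2) copy every primary file not
# on the list (catalogs etc.).  All names and paths are invented.
primary=$(mktemp -d) standby=$(mktemp -d) keep=$(mktemp)

echo "user data"         > "$primary/16384"; cp "$primary/16384" "$standby/16384"
echo "primary catalogs"  > "$primary/1259"
echo "stale catalogs"    > "$standby/1259"
echo "stray standby WAL" > "$standby/000000010000000000000003"
printf '16384\n' > "$keep"                   # the user-file list

# 1. Remove standby files that are not on the keep list:
( cd "$standby" && find . -type f | sed 's|^\./||' | grep -vxFf "$keep" | xargs -r rm )

# 2. Copy every primary file that is not on the keep list:
( cd "$primary" && find . -type f | sed 's|^\./||' | grep -vxFf "$keep" \
    | xargs -r -I{} cp "$primary/{}" "$standby/{}" )
```

Because both the deletion and the copy are driven by the same list, there is only one place to get wrong, which is the error-resistance property Bruce is after.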
On Thu, Jul 26, 2012 at 10:26:53AM -0400, Robert Haas wrote: > > Well, then that would call for another list of files. > > I cannot escape the feeling that if we go down this route in any form > we're going to spend years tracking down data-loss-inducing bugs. The > ones we have on the master are bad enough, but doing it on the standby > is almost worse because (1) few enough people will use this > functionality that we won't get many bug reports even if it's badly > broken and (2) people who are affected may not discover it until > something bad has already happened on the master. I don't hear anyone > thinking very hard about ways that the master could be different from > the standby, and without a lot of careful thought on that topic I > think this is all kinds of bad news. Just to take one example, how > are you going to ensure that the standby has replayed all the WAL that > the master generated prior to the upgrade? If the answer is "shut > everything down cleanly and hope for the best", color me unimpressed. > > IMV, pg_upgrade is not yet sufficiently reliable that we should be > looking for new projects that seem certain to make it less reliable. The script has to make the primary/standby identical, and guarantee that. That is why one list and removing new standby files is necessary. Are you saying having the primary/standby identical is impossible/unreliable, or that having them the same is insufficent? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, Jul 26, 2012 at 10:40 AM, Bruce Momjian <bruce@momjian.us> wrote: > On Thu, Jul 26, 2012 at 10:26:53AM -0400, Robert Haas wrote: >> > Well, then that would call for another list of files. >> >> I cannot escape the feeling that if we go down this route in any form >> we're going to spend years tracking down data-loss-inducing bugs. The >> ones we have on the master are bad enough, but doing it on the standby >> is almost worse because (1) few enough people will use this >> functionality that we won't get many bug reports even if it's badly >> broken and (2) people who are affected may not discover it until >> something bad has already happened on the master. I don't hear anyone >> thinking very hard about ways that the master could be different from >> the standby, and without a lot of careful thought on that topic I >> think this is all kinds of bad news. Just to take one example, how >> are you going to ensure that the standby has replayed all the WAL that >> the master generated prior to the upgrade? If the answer is "shut >> everything down cleanly and hope for the best", color me unimpressed. >> >> IMV, pg_upgrade is not yet sufficiently reliable that we should be >> looking for new projects that seem certain to make it less reliable. > > The script has to make the primary/standby identical, and guarantee > that. That is why one list and removing new standby files is necessary. > Are you saying having the primary/standby identical is > impossible/unreliable, or that having them the same is insufficent? Having them the same is clearly sufficient, but only if ALL the files are the same. You seem to be taking it on faith that the data files backing non-system tables will be the same, or close enough to the same, that we can just not worry about those. I find that when I don't worry about things, they break. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jul 26, 2012 at 11:03:15AM -0400, Robert Haas wrote: > >> IMV, pg_upgrade is not yet sufficiently reliable that we should be > >> looking for new projects that seem certain to make it less reliable. > > > > The script has to make the primary/standby identical, and guarantee > > that. That is why one list and removing new standby files is necessary. > > Are you saying having the primary/standby identical is > > impossible/unreliable, or that having them the same is insufficient? > > Having them the same is clearly sufficient, but only if ALL the files > are the same. You seem to be taking it on faith that the data files > backing non-system tables will be the same, or close enough to the > same, that we can just not worry about those. I find that when I > don't worry about things, they break. Good point, and an interesting question. The way pg_upgrade works is that it moves the user table/index files to the new cluster at the end, and does not start the server after that. This is why you can always revert to the old cluster in link mode after it completes. If you start the new cluster, you are committed and can't revert in link mode. I guess you could ask if the old cluster operations could have modified these files in some non-standard way, but I can't think of how that could happen. What you would probably want is for the old primary cluster to continue streaming its WAL to the standby cluster _until_ pg_upgrade completes, and then do the pg_upgrade on the old standby, but I can't think of a simple way to do that. What would be really interesting would be to somehow have the old and new clusters both on the standby and have the master stream to the appropriate server, but that seems complicated, and I am not sure if it would help because of the interactions behind-the-scenes in pg_upgrade. 
I don't see the "don't modify the user files" behavior changing anytime soon, and it is documented, so I feel pretty confident that those files were not modified on the primary or standby cluster, and are hence the same, or at least as "the same" as they were when they were running the older major version of Postgres. Is that sufficient? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, Jul 26, 2012 at 12:06 PM, Bruce Momjian <bruce@momjian.us> wrote: > I don't see the "don't modify the user files" behavior changing anytime > soon, and it is documented, so I feel pretty confident that those files > were not modified on the primary or standby cluster, and are hence the > same, or at least as "the same" as they were when they were running the > older major version of Postgres. > > Is that sufficient? Well, at the very least, you need to guarantee that the standby is caught up - i.e. that it replayed all the WAL records that were generated on the master before it was shut down for the final time. I don't think that telling the user that they must be sure to do that is sufficient - you need some kind of built-in safeguard that will complain loudly if it's not the case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jul 26, 2012 at 01:24:19PM -0400, Robert Haas wrote: > On Thu, Jul 26, 2012 at 12:06 PM, Bruce Momjian <bruce@momjian.us> wrote: > > I don't see the "don't modify the user files" behavior changing anytime > > soon, and it is documented, so I feel pretty confident that those files > > were not modified on the primary or standby cluster, and are hence the > > same, or at least as "the same" as they were when they were running the > > older major version of Postgres. > > > > Is that sufficient? > > Well, at the very least, you need to guarantee that the standby is > caught up - i.e. that it replayed all the WAL records that were > generated on the master before it was shut down for the final time. I > don't think that telling the user that they must be sure to do that is > sufficient - you need some kind of built-in safeguard that will > complain loudly if it's not the case. Yes, that would be a problem because the WAL records are deleted by pg_upgrade. Does a shutdown of the standby not already replay all WAL logs? We could also just require them to start the standby in master mode and shut it down. The problem with that is it might run things like autovacuum. I was originally thinking that we would require users to run pg_upgrade on the standby, where you need to first switch into master mode. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, 2012-07-26 at 14:17 -0400, Bruce Momjian wrote: > Yes, that would be a problem because the WAL records are deleted by > pg_upgrade. Does a shutdown of the standby not already replay all WAL > logs? There is no notion of "all WAL logs" because the WAL is infinite. Do you mean "all WAL generated by the master before shutdown" or "all WAL that the standby knows has been generated by the master so far"? Regardless, I don't think the standby attempts to do much after a shutdown is requested. > We could also just require them to just start the standby in > master mode and shut it down. The problem with that is it might run > things like autovacuum. If we had sync rep that waits for application of the WAL, that might be a more robust approach. We could, during shutdown of the master cause the standby to cancel all HS queries, and then change to sync rep and wait for complete catchup. There are a lot of details to work out there, but it might give us a higher confidence that it's doing the right thing. Given two shut-down systems, it should be pretty easy to tell if they have played the same amount of WAL though, right? > I was originally thinking that we would require users to run pg_upgrade > on the standby, where you need to first switch into master mode. That sounds a little strange to me. If the original master has generated WAL that the original standby hasn't seen yet, then this doesn't help because the two systems would be diverged, and you'd need a new base backup anyway. And if they have played exactly the same WAL, what does this accomplish? Regards,Jeff Davis
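One way to check Jeff's last point - that two shut-down clusters stopped at the same WAL position - is to compare the "Latest checkpoint location" line that pg_controldata prints for each data directory. The sample output below is fabricated for illustration; a real check would capture `pg_controldata /u/pg/pgsql.old/data` on the primary and the standby and compare:

```shell
# Sketch: compare the final checkpoint location of two shut-down
# clusters from their pg_controldata output.  The $*_ctl strings below
# are fabricated samples standing in for real pg_controldata runs.
same_checkpoint() {
    # Extract "Latest checkpoint location" from each output and compare.
    a=$(printf '%s\n' "$1" | sed -n 's/^Latest checkpoint location: *//p')
    b=$(printf '%s\n' "$2" | sed -n 's/^Latest checkpoint location: *//p')
    [ -n "$a" ] && [ "$a" = "$b" ]
}

primary_ctl='Latest checkpoint location:           0/3000020
Database cluster state:               shut down'
standby_ctl='Latest checkpoint location:           0/3000020
Database cluster state:               shut down in recovery'

if same_checkpoint "$primary_ctl" "$standby_ctl"; then
    echo "standby stopped at the primary's final checkpoint"
fi
```

This is only a sketch of the comparison itself; whether matching checkpoint locations are by themselves a sufficient caught-up guarantee is exactly what the thread is debating.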
On Thu, Jul 26, 2012 at 2:26 PM, Jeff Davis <pgsql@j-davis.com> wrote: >> I was originally thinking that we would require users to run pg_upgrade >> on the standby, where you need to first switch into master mode. > > That sounds a little strange to me. If the original master has generated > WAL that the original standby hasn't seen yet, then this doesn't help > because the two systems would be diverged, and you'd need a new base > backup anyway. And if they have played exactly the same WAL, what does > this accomplish? This whole approach of coordinating precise content of a standby cluster to run pg_upgrade and then resynchronizing with a also-pg-upgraded primary at a precise WAL position that does not increment to complete the upgrade does not excite me in the slightest: I feel like it is really asking for problems. I think the WAL position should advance and/or have a timeline change when undergoing upgrade so that the system can more reliably report bookkeeping error, and it'd be ideal if WAL was generated that applied those changes. For example: suppose pg_upgrade emitted full-page-write records in the format of the new postgres version on an unoccupied timeline. One can use PG.next tools to report on the first txid in the pg_upgrade generated WAL and then use standard point in time recovery features to halt replay on a PG.previous version, swap to the new timeline, and then start up PG.next on the new timeline, applying the full page writes to its catalogs before becoming consistent. -- fdr
On Thu, Jul 26, 2012 at 3:01 PM, Daniel Farina <daniel@heroku.com> wrote: > For example: suppose pg_upgrade emitted full-page-write records in the > format of the new postgres version on an unoccupied timeline. One can > use PG.next tools to report on the first txid and by txid I meant WAL position, which mucks it up a bit because that's not an option on old Postgres-es. But the overall point is the same: I think pg_upgrade would be much more safe and usable in the standby setting if it emitted data that depend upon a posterior WAL-position at the point the cluster diverges onto a new timeline that is the new version of Postgres, and it would be nice if that data incremented the WAL position rather than being applied out-of-band. And, it seems that emitting full page image records in the format of the new version is a way to accomplish that. -- fdr
On Thu, Jul 26, 2012 at 02:17:22PM -0400, Bruce Momjian wrote: > > > Is that sufficient? > > > > Well, at the very least, you need to guarantee that the standby is > > caught up - i.e. that it replayed all the WAL records that were > > generated on the master before it was shut down for the final time. I > > don't think that telling the user that they must be sure to do that is > > sufficient - you need some kind of built-in safeguard that will > > complain loudly if it's not the case. > > Yes, that would be a problem because the WAL records are deleted by > pg_upgrade. Does a shutdown of the standby not already replay all WAL > logs? We could also just require them to just start the standby in > master mode and shut it down. The problem with that is it might run > things like autovacuum. > > I was originally thinking that we would require users to run pg_upgrade > on the standby, where you need to first switch into master mode. OK, sorry, I was confused. You _have_ to run pg_upgrade on the standby --- there are many things we don't preserve, and we need pg_upgrade to move those user files to the right place --- an obvious example is tablespace files. Database oids aren't even preserved, so the data directory changes. So, you need to change the standby to write mode, run pg_upgrade, then run whatever copy command we design. Is Perl the proper language for that script? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, Jul 26, 2012 at 02:26:16PM -0700, Jeff Davis wrote: > On Thu, 2012-07-26 at 14:17 -0400, Bruce Momjian wrote: > > Yes, that would be a problem because the WAL records are deleted by > > pg_upgrade. Does a shutdown of the standby not already replay all WAL > > logs? > > There is no notion of "all WAL logs" because the WAL is infinite. Do you > mean "all WAL generated by the master before shutdown" or "all WAL that > the standby knows has been generated by the master so far"? > > Regardless, I don't think the standby attempts to do much after a > shutdown is requested. Doesn't matter anymore --- I now know pg_upgrade has to be run on the standby, and it will be in primary mode and do full WAL reading before it starts. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, Jul 26, 2012 at 2:17 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Thu, Jul 26, 2012 at 01:24:19PM -0400, Robert Haas wrote: >> On Thu, Jul 26, 2012 at 12:06 PM, Bruce Momjian <bruce@momjian.us> wrote: >> > I don't see the "don't modify the user files" behavior changing anytime >> > soon, and it is documented, so I feel pretty confident that those files >> > were not modified on the primary or standby cluster, and are hence the >> > same, or at least as "the same" as they were when they were running the >> > older major version of Postgres. >> > >> > Is that sufficient? >> >> Well, at the very least, you need to guarantee that the standby is >> caught up - i.e. that it replayed all the WAL records that were >> generated on the master before it was shut down for the final time. I >> don't think that telling the user that they must be sure to do that is >> sufficient - you need some kind of built-in safeguard that will >> complain loudly if it's not the case. > > Yes, that would be a problem because the WAL records are deleted by > pg_upgrade. Does a shutdown of the standby not already replay all WAL > logs? Not if it's an immediate shutdown, and not if it didn't have them all on disk in the first place. Who is to say it's even caught up? > I was originally thinking that we would require users to run pg_upgrade > on the standby, where you need to first switch into master mode. As Jeff says, that doesn't help anything. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jul 26, 2012 at 7:24 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Jul 26, 2012 at 02:17:22PM -0400, Bruce Momjian wrote:
>> > > Is that sufficient?
>> >
>> > Well, at the very least, you need to guarantee that the standby is
>> > caught up - i.e. that it replayed all the WAL records that were
>> > generated on the master before it was shut down for the final time.  I
>> > don't think that telling the user that they must be sure to do that is
>> > sufficient - you need some kind of built-in safeguard that will
>> > complain loudly if it's not the case.
>>
>> Yes, that would be a problem because the WAL records are deleted by
>> pg_upgrade.  Does a shutdown of the standby not already replay all WAL
>> logs?  We could also just require them to start the standby in
>> master mode and shut it down.  The problem with that is it might run
>> things like autovacuum.
>>
>> I was originally thinking that we would require users to run pg_upgrade
>> on the standby, where you need to first switch into master mode.
>
> OK, sorry, I was confused.  You _have_ to run pg_upgrade on the standby
> --- there are many things we don't preserve, and we need pg_upgrade to
> move those user files to the right place --- an obvious example is
> tablespace files.  Database oids aren't even preserved, so the data
> directory changes.

These are reasons why you CANNOT run pg_upgrade on the standby, not why
you HAVE to.  If you run pg_upgrade on the standby and separately on the
master, you will end up with divergence precisely because of those
things that aren't preserved.

Any approach that calls for pg_upgrade to run on the master and standby
separately is broken.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Jul 27, 2012 at 08:29:20AM -0400, Robert Haas wrote:
> >> Yes, that would be a problem because the WAL records are deleted by
> >> pg_upgrade.  Does a shutdown of the standby not already replay all WAL
> >> logs?  We could also just require them to start the standby in
> >> master mode and shut it down.  The problem with that is it might run
> >> things like autovacuum.
> >>
> >> I was originally thinking that we would require users to run pg_upgrade
> >> on the standby, where you need to first switch into master mode.
> >
> > OK, sorry, I was confused.  You _have_ to run pg_upgrade on the standby
> > --- there are many things we don't preserve, and we need pg_upgrade to
> > move those user files to the right place --- an obvious example is
> > tablespace files.  Database oids aren't even preserved, so the data
> > directory changes.
>
> These are reasons why you CANNOT run pg_upgrade on the standby, not
> why you HAVE to.  If you run pg_upgrade on the standby and separately
> on the master, you will end up with divergence precisely because of
> those things that aren't preserved.
>
> Any approach that calls for pg_upgrade to run on the master and
> standby separately is broken.

Basically, you have to run pg_upgrade on the standby so the user data
files are moved properly, then you would need to run a copy script that
would copy over all the non-user files from the master.

Are you worried that the standby, by becoming a master, will write to
the standby's old-cluster user data files in a way that is inconsistent
with the master?  If so, I think this entire idea can't work.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
On Thu, Jul 26, 2012 at 10:36:59AM -0400, Bruce Momjian wrote:
> > > Pg_upgrade already creates a script to analyze the cluster, so we could
> > > create another script to upgrade a standby.  However, the problem with a
> > > script is that I have no idea what command people would use to do the
> > > copy.
> >
> > Exactly.  Perhaps an example wouldn't hurt, but I wouldn't go too far.
>
> Agreed.
>
> > > I think I could create a list and pass that into a loop so only
> > > the command has to be modified, but again, how do we do that on Windows?
> > > Can we create a shell function in Windows and pass the file name as an
> > > argument?
> >
> > I don't know, but I assume that somewhere in the known universe there is
> > a way on Windows to say, here is a list of files, copy them to that
> > host.
>
> No idea.
>
> > > Another problem is that the standby cluster might create _new_ files
> > > that don't exist on the master, e.g. WAL files, and those have to be
> > > removed.  I am not clear how to do that either, except by removing all
> > > files with a hard link count of 1, and again, this is difficult on
> > > Windows.
> >
> > Well, then that would call for another list of files.
>
> Well, not really.  If we create a list of all user table/index files,
> then any file not on the list would be removed on the standby, and all
> the files in the primary not on the list would be copied to the standby.
> One list is less error-prone.  This is easy in Unix shell and Perl, but
> hard on Windows without Perl.

There was too much concern about pg_upgrade upgrading a standby server,
so I am not going to pursue the issue at this time.  I did add a TODO in
case we ever want to resurrect the idea:

	Consider a way to run pg_upgrade on standby servers
	http://archives.postgresql.org/pgsql-hackers/2012-07/msg00453.php

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
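[Editor's note: the "one list" reconciliation Bruce describes is indeed a
few lines of shell.  The toy sketch below uses flat mock directories in
place of real data directories; real PGDATA layouts have subdirectories
and tablespaces, which are ignored here, and every path and file name is
illustrative.]

```shell
#!/bin/sh
# Toy sketch of the "one list" idea: files named on the user-file list are
# left alone on the standby; standby files NOT on the list are removed
# (covering standby-created files such as new WAL); primary files NOT on
# the list are copied over.

PRIMARY=$(mktemp -d)
STANDBY=$(mktemp -d)
USER_LIST=$(mktemp)      # one relative file name per line

# Build a toy example.
printf '16384\n' > "$USER_LIST"
echo "user data"   > "$PRIMARY/16384"            # user relation, same on both
echo "user data"   > "$STANDBY/16384"
echo "new catalog" > "$PRIMARY/pg_catalog_file"  # non-user file, from primary
echo "old catalog" > "$STANDBY/pg_catalog_file"
echo "stray wal"   > "$STANDBY/000000010000000000000003"  # standby-only WAL

# 1. Remove standby files that are not on the user-file list.
for f in "$STANDBY"/*; do
    name=$(basename "$f")
    grep -qx "$name" "$USER_LIST" || rm -f "$f"
done

# 2. Copy primary files that are not on the user-file list.
for f in "$PRIMARY"/*; do
    name=$(basename "$f")
    grep -qx "$name" "$USER_LIST" || cp "$f" "$STANDBY/$name"
done

ls "$STANDBY" | sort             # the user file plus the primary's catalog
cat "$STANDBY/pg_catalog_file"   # standby catalog now matches the primary
```

This is the easy-on-Unix half of the problem; it sidesteps the hard-link
counting Bruce mentions, and the Windows question remains open.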