Thread: Using pg_upgrade on log-shipping standby servers

Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
I occasionally get questions about how to run pg_upgrade on log-shipping
standby servers.  The attached documentation patch outlines how to do
it.

I don't think we can assume that because pg_upgrade was run on the
master and standby that they are binary identical, can we?  Technically
the user file are identical, but the system catalogs and WAL might be
different, hence my suggestion to run rsync before allowing the standby
to rejoin the primary.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Attachment

Re: Using pg_upgrade on log-shipping standby servers

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> +    While a Log-Shipping Standby Server (<xref linkend="warm-standby">) can
> +    be upgraded, the server must be in changed to a primary server to allow
> +    writes, and after the upgrade it cannot be reused as a standby server.
> +    (Running <command>rsync</> after the upgrade allows reuse.)

"in changed"?  This sentence makes no sense at all to me.
        regards, tom lane


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Tue, Jul 10, 2012 at 12:04:50PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > +    While a Log-Shipping Standby Server (<xref linkend="warm-standby">) can
> > +    be upgraded, the server must be in changed to a primary server to allow
> > +    writes, and after the upgrade it cannot be reused as a standby server.
> > +    (Running <command>rsync</> after the upgrade allows reuse.)
>
> "in changed"?  This sentence makes no sense at all to me.

Oops.  New wording attached with "in" removed:

    the server must be changed to a primary server

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Attachment

Re: Using pg_upgrade on log-shipping standby servers

From
Magnus Hagander
Date:
On Tue, Jul 10, 2012 at 6:17 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, Jul 10, 2012 at 12:04:50PM -0400, Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>> > +    While a Log-Shipping Standby Server (<xref linkend="warm-standby">) can
>> > +    be upgraded, the server must be in changed to a primary server to allow
>> > +    writes, and after the upgrade it cannot be reused as a standby server.
>> > +    (Running <command>rsync</> after the upgrade allows reuse.)
>>
>> "in changed"?  This sentence makes no sense at all to me.
>
> Oops.  New wording attached with "in" removed:
>
>         the server must be changed to a primary server

Don't we normally talk about "must be promoted to a primary server",
not changed?

And wouldn't it be good if it also mentions that another good option
is to just pg_upgrade the master and rebuild the standby? (Unless
that's already mentioned somewhere else).

What's the actual usecase for promoting the slave, upgrading it and
then *not* using it, which is what I think this paragraph suggests?
And I think the sentence about running rsync is extremely vague - run
rsync where and how? What are you actually trying to suggest people
do?

-- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Tue, Jul 10, 2012 at 06:21:35PM +0200, Magnus Hagander wrote:
> On Tue, Jul 10, 2012 at 6:17 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Tue, Jul 10, 2012 at 12:04:50PM -0400, Tom Lane wrote:
> >> Bruce Momjian <bruce@momjian.us> writes:
> >> > +    While a Log-Shipping Standby Server (<xref linkend="warm-standby">) can
> >> > +    be upgraded, the server must be in changed to a primary server to allow
> >> > +    writes, and after the upgrade it cannot be reused as a standby server.
> >> > +    (Running <command>rsync</> after the upgrade allows reuse.)
> >>
> >> "in changed"?  This sentence makes no sense at all to me.
> >
> > Oops.  New wording attached with "in" removed:
> >
> >         the server must be changed to a primary server
>
> Don't we normally talk about "must be promoted to a primary server",
> not changed?

OK, sure, updated patch attached.

> And wouldn't it be good if it also mentions that another good option
> is to just pg_upgrade the master and rebuild the standby? (Unless
> that's already mentioned somewhere else).

I assume they already realize they re-create the standbys.

> What's the actual usecase for promoting the slave, upgrading it and
> then *not* using it, which is what I think this paragraph suggests?

Testing maybe?  I feel we have just avoided saying what you can and
can't do with the standbys and pg_upgrade, so I think we have to state
something.  If we just want to say "recreate", let's say that.

> And I think the sentence about running rsync is extremely vague - run
> rsync where and how? What are you actually trying to suggest people
> do?

Updated docs attached.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Attachment

Re: Using pg_upgrade on log-shipping standby servers

From
Magnus Hagander
Date:
On Tue, Jul 10, 2012 at 6:26 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, Jul 10, 2012 at 06:21:35PM +0200, Magnus Hagander wrote:
>> On Tue, Jul 10, 2012 at 6:17 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> > On Tue, Jul 10, 2012 at 12:04:50PM -0400, Tom Lane wrote:
>> >> Bruce Momjian <bruce@momjian.us> writes:
>> >> > +    While a Log-Shipping Standby Server (<xref linkend="warm-standby">) can
>> >> > +    be upgraded, the server must be in changed to a primary server to allow
>> >> > +    writes, and after the upgrade it cannot be reused as a standby server.
>> >> > +    (Running <command>rsync</> after the upgrade allows reuse.)
>> >>
>> >> "in changed"?  This sentence makes no sense at all to me.
>> >
>> > Oops.  New wording attached with "in" removed:
>> >
>> >         the server must be changed to a primary server
>>
>> Don't we normally talk about "must be promoted to a primary server",
>> not changed?
>
> OK, sure, updated patch attached.
>
>> And wouldn't it be good if it also mentions that another good option
>> is to just pg_upgrade the master and rebuild the standby? (Unless
>> that's already mentioned somewhere else).
>
> I assume they already realize they re-create the standbys.
>
>> What's the actual usecase for promoting the slave, upgrading it and
>> then *not* using it, which is what I think this paragraph suggests?
>
> Testing maybe?  I feel we have just avoided saying what you can and
> can't do with the standbys and pg_upgrade, so I think we have to state
> something.  If we just want to say "recreate", let's say that.

Well, the bottom line is we can'd do *anything* with a standby with pg_upgrade.

Once you've promoted it, it is no longer a standby, and now you can
use pg_upgrade.

>> And I think the sentence about running rsync is extremely vague - run
>> rsync where and how? What are you actually trying to suggest people
>> do?
>
> Updated docs attached.

I suggest just removing the rsync part completely. You're basically
saying "you ca nset up a new standby after you're done", which is kind
of obvious anyway. And if you're going to use rsync fromthe master to
make a new standby, there's no point in running pg_upgrade on the new
standby in the first place.

-- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Tue, Jul 10, 2012 at 06:29:24PM +0200, Magnus Hagander wrote:
> > Testing maybe?  I feel we have just avoided saying what you can and
> > can't do with the standbys and pg_upgrade, so I think we have to state
> > something.  If we just want to say "recreate", let's say that.
>
> Well, the bottom line is we can'd do *anything* with a standby with pg_upgrade.
>
> Once you've promoted it, it is no longer a standby, and now you can
> use pg_upgrade.
>
> >> And I think the sentence about running rsync is extremely vague - run
> >> rsync where and how? What are you actually trying to suggest people
> >> do?
> >
> > Updated docs attached.
>
> I suggest just removing the rsync part completely. You're basically
> saying "you ca nset up a new standby after you're done", which is kind
> of obvious anyway. And if you're going to use rsync fromthe master to
> make a new standby, there's no point in running pg_upgrade on the new
> standby in the first place.

I went the other direction and just said you can't upgrade a standby (as
a standby), and to just use rsync --- patch attached.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Attachment

Re: Using pg_upgrade on log-shipping standby servers

From
Magnus Hagander
Date:
On Tue, Jul 10, 2012 at 6:59 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, Jul 10, 2012 at 06:29:24PM +0200, Magnus Hagander wrote:
>> > Testing maybe?  I feel we have just avoided saying what you can and
>> > can't do with the standbys and pg_upgrade, so I think we have to state
>> > something.  If we just want to say "recreate", let's say that.
>>
>> Well, the bottom line is we can'd do *anything* with a standby with pg_upgrade.
>>
>> Once you've promoted it, it is no longer a standby, and now you can
>> use pg_upgrade.
>>
>> >> And I think the sentence about running rsync is extremely vague - run
>> >> rsync where and how? What are you actually trying to suggest people
>> >> do?
>> >
>> > Updated docs attached.
>>
>> I suggest just removing the rsync part completely. You're basically
>> saying "you ca nset up a new standby after you're done", which is kind
>> of obvious anyway. And if you're going to use rsync fromthe master to
>> make a new standby, there's no point in running pg_upgrade on the new
>> standby in the first place.
>
> I went the other direction and just said you can't upgrade a standby (as
> a standby), and to just use rsync --- patch attached.

Reads much better now. I'd say "use rsync to rebuild the standbys",
but that's more nitpicking :) (And maybe "the simplest way" rather
than "the simplest case"? But i'll leave that to someone who has
english as their first language)

-- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Tue, Jul 10, 2012 at 07:06:39PM +0200, Magnus Hagander wrote:
> >> >> rsync where and how? What are you actually trying to suggest people
> >> >> do?
> >> >
> >> > Updated docs attached.
> >>
> >> I suggest just removing the rsync part completely. You're basically
> >> saying "you ca nset up a new standby after you're done", which is kind
> >> of obvious anyway. And if you're going to use rsync fromthe master to
> >> make a new standby, there's no point in running pg_upgrade on the new
> >> standby in the first place.
> >
> > I went the other direction and just said you can't upgrade a standby (as
> > a standby), and to just use rsync --- patch attached.
>
> Reads much better now. I'd say "use rsync to rebuild the standbys",
> but that's more nitpicking :) (And maybe "the simplest way" rather
> than "the simplest case"? But i'll leave that to someone who has
> english as their first language)

Both change made;  updated patch attached.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Attachment

Re: Using pg_upgrade on log-shipping standby servers

From
Magnus Hagander
Date:
On Tue, Jul 10, 2012 at 7:27 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, Jul 10, 2012 at 07:06:39PM +0200, Magnus Hagander wrote:
>> >> >> rsync where and how? What are you actually trying to suggest people
>> >> >> do?
>> >> >
>> >> > Updated docs attached.
>> >>
>> >> I suggest just removing the rsync part completely. You're basically
>> >> saying "you ca nset up a new standby after you're done", which is kind
>> >> of obvious anyway. And if you're going to use rsync fromthe master to
>> >> make a new standby, there's no point in running pg_upgrade on the new
>> >> standby in the first place.
>> >
>> > I went the other direction and just said you can't upgrade a standby (as
>> > a standby), and to just use rsync --- patch attached.
>>
>> Reads much better now. I'd say "use rsync to rebuild the standbys",
>> but that's more nitpicking :) (And maybe "the simplest way" rather
>> than "the simplest case"? But i'll leave that to someone who has
>> english as their first language)
>
> Both change made;  updated patch attached.

Looks good to me.

-- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Tue, Jul 10, 2012 at 09:10:25PM +0200, Magnus Hagander wrote:
> On Tue, Jul 10, 2012 at 7:27 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Tue, Jul 10, 2012 at 07:06:39PM +0200, Magnus Hagander wrote:
> >> >> >> rsync where and how? What are you actually trying to suggest people
> >> >> >> do?
> >> >> >
> >> >> > Updated docs attached.
> >> >>
> >> >> I suggest just removing the rsync part completely. You're basically
> >> >> saying "you ca nset up a new standby after you're done", which is kind
> >> >> of obvious anyway. And if you're going to use rsync fromthe master to
> >> >> make a new standby, there's no point in running pg_upgrade on the new
> >> >> standby in the first place.
> >> >
> >> > I went the other direction and just said you can't upgrade a standby (as
> >> > a standby), and to just use rsync --- patch attached.
> >>
> >> Reads much better now. I'd say "use rsync to rebuild the standbys",
> >> but that's more nitpicking :) (And maybe "the simplest way" rather
> >> than "the simplest case"? But i'll leave that to someone who has
> >> english as their first language)
> >
> > Both change made;  updated patch attached.
> 
> Looks good to me.

OK, applied and backpatched to 9.2.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Using pg_upgrade on log-shipping standby servers

From
Jeff Davis
Date:
On Tue, 2012-07-10 at 11:50 -0400, Bruce Momjian wrote:
> I don't think we can assume that because pg_upgrade was run on the
> master and standby that they are binary identical, can we?  Technically
> the user file are identical, but the system catalogs and WAL might be
> different, hence my suggestion to run rsync before allowing the standby
> to rejoin the primary.

Do you have plans to change that in the future?

If we know that the user data files are identical between primary and
replica, it would be nice if we could provide a robust way to avoid
copying them.

Regards,Jeff Davis




Re: Using pg_upgrade on log-shipping standby servers

From
Daniel Farina
Date:
On Mon, Jul 16, 2012 at 5:29 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Tue, 2012-07-10 at 11:50 -0400, Bruce Momjian wrote:
>> I don't think we can assume that because pg_upgrade was run on the
>> master and standby that they are binary identical, can we?  Technically
>> the user file are identical, but the system catalogs and WAL might be
>> different, hence my suggestion to run rsync before allowing the standby
>> to rejoin the primary.
>
> Do you have plans to change that in the future?
>
> If we know that the user data files are identical between primary and
> replica, it would be nice if we could provide a robust way to avoid
> copying them.

How about this alternative that may sound crazy, but would lend itself
to some unification in archiving:

Could pg_upgrade emit WAL segment(s) to provide continuity of a
timeline?  So something like:

* Take down the writable primary for pg_upgrade
* Some WAL is emitted and possibly archived
* The old version, when reaching the special pg_upgrade WAL, could
exit or report its situation having paused replay (as clearly, it
cannot proceed). Unsure.
* Start up a new version of postgres on the same cluster at that
point, which plays the upgrade-WAL.

I see this being pretty mechanically intensive, but right now my hands
are completely tied as to achieving total continuity of my archives,
costing a base-backup's worth of risk window upon upgrade.

-- 
fdr


Re: Using pg_upgrade on log-shipping standby servers

From
Jeff Davis
Date:
On Tue, 2012-07-17 at 01:02 -0700, Daniel Farina wrote:
> Could pg_upgrade emit WAL segment(s) to provide continuity of a
> timeline?  So something like:

By "segments" did you mean "records"?

> * Take down the writable primary for pg_upgrade
> * Some WAL is emitted and possibly archived
> * The old version, when reaching the special pg_upgrade WAL, could
> exit or report its situation having paused replay (as clearly, it
> cannot proceed). Unsure.

I don't really understand this step.

> * Start up a new version of postgres on the same cluster at that
> point, which plays the upgrade-WAL.
> 
> I see this being pretty mechanically intensive, but right now my hands
> are completely tied as to achieving total continuity of my archives,
> costing a base-backup's worth of risk window upon upgrade.

Does "continuity of archives" mean "avoid downtime" or "maintain a
single WAL sequence". 

Regards,Jeff Davis



Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Mon, Jul 16, 2012 at 05:29:26PM -0700, Jeff Davis wrote:
> On Tue, 2012-07-10 at 11:50 -0400, Bruce Momjian wrote:
> > I don't think we can assume that because pg_upgrade was run on the
> > master and standby that they are binary identical, can we?  Technically
> > the user file are identical, but the system catalogs and WAL might be
> > different, hence my suggestion to run rsync before allowing the standby
> > to rejoin the primary.
> 
> Do you have plans to change that in the future?
> 
> If we know that the user data files are identical between primary and
> replica, it would be nice if we could provide a robust way to avoid
> copying them.

Well, rsync --checksum would work, but both systems have to be down for
that.  You could snapshot the down primary and rsync --checksum that
against the standby, but I am not sure how much that helps us.

I can't figure out how to make this work better without adding a whole
lot more code to pg_upgrade that might need adjustment for every minor
release, i.e. pg_upgrade knows nothing about the WAL file format, and I
want to keep it that way.

However, I have two ideas.  First, I don't know _why_ the
primary/standby would be any different after pg_upgrade, so I added the
documentation mention because I couldn't _guarantee_ they were the same.
Actually, if people can test this, we might be able to say this is safe.

Second, the user files (large) are certainly identical, it is only the
system tables (small) that _might_ be different, so rsync'ing just those
would add the guarantee, but I know of no easy way to rsync just the
system tables.

Does that help?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Using pg_upgrade on log-shipping standby servers

From
Daniel Farina
Date:
On Tue, Jul 17, 2012 at 11:55 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Tue, 2012-07-17 at 01:02 -0700, Daniel Farina wrote:
>> Could pg_upgrade emit WAL segment(s) to provide continuity of a
>> timeline?  So something like:
>
> By "segments" did you mean "records"?

Yes.  It would be nicer not to have to tie it to the WAL segment file size.

>> * Take down the writable primary for pg_upgrade
>> * Some WAL is emitted and possibly archived
>> * The old version, when reaching the special pg_upgrade WAL, could
>> exit or report its situation having paused replay (as clearly, it
>> cannot proceed). Unsure.
>
> I don't really understand this step.

"Some WAL is emitted and possibly archived" needs a subject in that fragment:

"pg_upgrade somehow (directly, or indirectly) emits and/or archives
WAL used to complete binary-upgrade".  That means that it should
appear in the WAL stream and be subject to archive_command, like any
other WAL.

The sticky part is what the standby should do when it encounters the
special wal-upgrade records.  It should probably pause replay to allow
some other program to stop the old postgres version and start the new
version with the same cluster.

>> * Start up a new version of postgres on the same cluster at that
>> point, which plays the upgrade-WAL.
>>
>> I see this being pretty mechanically intensive, but right now my hands
>> are completely tied as to achieving total continuity of my archives,
>> costing a base-backup's worth of risk window upon upgrade.
>
> Does "continuity of archives" mean "avoid downtime" or "maintain a
> single WAL sequence".

The latter.

-- 
fdr


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Tue, Jul 17, 2012 at 04:49:39PM -0700, Daniel Farina wrote:
> On Tue, Jul 17, 2012 at 11:55 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> > On Tue, 2012-07-17 at 01:02 -0700, Daniel Farina wrote:
> >> Could pg_upgrade emit WAL segment(s) to provide continuity of a
> >> timeline?  So something like:
> >
> > By "segments" did you mean "records"?
> 
> Yes.  It would be nicer not to have to tie it to the WAL segment file size.
> 
> >> * Take down the writable primary for pg_upgrade
> >> * Some WAL is emitted and possibly archived
> >> * The old version, when reaching the special pg_upgrade WAL, could
> >> exit or report its situation having paused replay (as clearly, it
> >> cannot proceed). Unsure.
> >
> > I don't really understand this step.
> 
> "Some WAL is emitted and possibly archived" needs a subject in that fragment:
> 
> "pg_upgrade somehow (directly, or indirectly) emits and/or archives
> WAL used to complete binary-upgrade".  That means that it should
> appear in the WAL stream and be subject to archive_command, like any
> other WAL.
> 
> The sticky part is what the standby should do when it encounters the
> special wal-upgrade records.  It should probably pause replay to allow
> some other program to stop the old postgres version and start the new
> version with the same cluster.

WAL is not guaranteed to be the same between PG major versions, so doing
anything with WAL is pretty much a no-go.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Using pg_upgrade on log-shipping standby servers

From
Daniel Farina
Date:
On Tue, Jul 17, 2012 at 9:16 PM, Bruce Momjian <bruce@momjian.us> wrote:
> WAL is not guaranteed to be the same between PG major versions, so doing
> anything with WAL is pretty much a no-go.

I understand that the WAL format changes, sometimes dramatically
between versions. What I'm suggesting that the first WAL-record
emitted by the binary upgrade process could be entitled "WAL-stream
upgrade to 9.4" that would fail to be understood by old versions or
possibly understood to mean "stop replay, you won't even understand
what's about to be said."

At that point, start up new version in the same cluster and have it
continue replay from that position on forward, which should all be in
the new format that it can understand.  It need not understand the old
format in that case, but the tricky part is this single record that
tells the replayer of the old version to stop while a replayer of the
new version somehow will know it is the right place to start.

One mechanism could be a WAL file segment boundary: the standby could
be told to exit when it finishes recovery of the segment
0000000100001234000055CD, and to start the new version beginning
recovery at 0000000100001234000055CF (one higher), and that would be
the first WAL emitted by pg_upgrade. In principle the same is possible
using the fine-grained record position, such as XXXXX/NN, but may be
more complex for not much gain.

This also means the database would be stuck in an inconsistent state
when it starts, not unlike when recovering from a on-line base backup.And that's totally reasonable: the new version
hasto start up
 
presuming that the database cluster makes not enough sense to enter
hot standby yet.

Yet another mechanism is to not have the Postgres recovery-process
apply the WAL, but rather some special purpose program that knows how
to count through and apply specially-formatted WAL segments, and then
set the resultant cluster to start recovering from the WAL past this
span of specially-formatted WAL.  The crux is to get some continuity
in this stream, and there are many ways to slice it. Otherwise, the
continuous archives will have a gap while a new base backup is taken
of data that mostly rests unchanged.

-- 
fdr


Re: Using pg_upgrade on log-shipping standby servers

From
Robert Haas
Date:
On Tue, Jul 17, 2012 at 6:02 PM, Bruce Momjian <bruce@momjian.us> wrote:
> However, I have two ideas.  First, I don't know _why_ the
> primary/standby would be any different after pg_upgrade, so I added the
> documentation mention because I couldn't _guarantee_ they were the same.
> Actually, if people can test this, we might be able to say this is safe.
>
> Second, the user files (large) are certainly identical, it is only the
> system tables (small) that _might_ be different, so rsync'ing just those
> would add the guarantee, but I know of no easy way to rsync just the
> system tables.

I'm scratching my head in confusion here.  After pg_upgrade, the
master is a completely new cluster.  The system catalog contents are
completely different, and so are things like the database system
identifier and the WAL position - yeah, the latter is approximately
the same, but almost doesn't count except in horseshoes.  Obviously
any attempt to replay WAL from the new cluster on the old cluster is
doomed to failure, at least unless we do a bunch more engineering here
that hasn't really been thought about yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Wed, Jul 18, 2012 at 09:36:51AM -0400, Robert Haas wrote:
> On Tue, Jul 17, 2012 at 6:02 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > However, I have two ideas.  First, I don't know _why_ the
> > primary/standby would be any different after pg_upgrade, so I added the
> > documentation mention because I couldn't _guarantee_ they were the same.
> > Actually, if people can test this, we might be able to say this is safe.
> >
> > Second, the user files (large) are certainly identical, it is only the
> > system tables (small) that _might_ be different, so rsync'ing just those
> > would add the guarantee, but I know of no easy way to rsync just the
> > system tables.
> 
> I'm scratching my head in confusion here.  After pg_upgrade, the
> master is a completely new cluster.  The system catalog contents are
> completely different, and so are things like the database system
> identifier and the WAL position - yeah, the latter is approximately
> the same, but almost doesn't count except in horseshoes.  Obviously
> any attempt to replay WAL from the new cluster on the old cluster is
> doomed to failure, at least unless we do a bunch more engineering here
> that hasn't really been thought about yet.

No, the point is they run pg_upgrade on the stopped primary and stopped
standbys.  Are those the same?  I am not really sure.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Using pg_upgrade on log-shipping standby servers

From
Robert Haas
Date:
On Thu, Jul 19, 2012 at 2:38 AM, Bruce Momjian <bruce@momjian.us> wrote:
> No, the point is they run pg_upgrade on the stopped primary and stopped
> standbys.  Are those the same?  I am not really sure.

Of course not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Thu, Jul 19, 2012 at 09:41:29AM -0400, Robert Haas wrote:
> On Thu, Jul 19, 2012 at 2:38 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > No, the point is they run pg_upgrade on the stopped primary and stopped
> > standbys.  Are those the same?  I am not really sure.
> 
> Of course not.

OK, but why?  When the clusters are stopped they are the same, you are
running the same initdb on both matchines, and running the same
pg_upgrade.  What would cause the difference, other than the Database
System Identifier, which we can deal with?  I don't think we can
guarantee they are the same, but what would guarantee they are
different?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Using pg_upgrade on log-shipping standby servers

From
Robert Haas
Date:
On Thu, Jul 19, 2012 at 12:02 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Jul 19, 2012 at 09:41:29AM -0400, Robert Haas wrote:
>> On Thu, Jul 19, 2012 at 2:38 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> > No, the point is they run pg_upgrade on the stopped primary and stopped
>> > standbys.  Are those the same?  I am not really sure.
>>
>> Of course not.
>
> OK, but why?  When the clusters are stopped they are the same, you are
> running the same initdb on both matchines, and running the same
> pg_upgrade.  What would cause the difference, other than the Database
> System Identifier, which we can deal with?  I don't think we can
> guarantee they are the same, but what would guarantee they are
> different?

There isn't any guarantee that they are different.  There's just no
guarantee that they are the same, which is enough to make this idea a
non-starter.

In general, it's pretty easy to understand that if you perform the
same series of inserts, updates, and deletes on two systems, you might
not end up with the exact same binary contents.  There are a lot of
reasons for this: any concurrent activity whatsoever - even the exact
timing of autovacuum - can cause the same tuples can end up in
different places in the two systems.  Now, admittedly, in the case of
pg_upgrade, you're restoring the dump using a single process with
absolutely no concurrent activity and even autovacuum disabled, so the
chances of ending up with entirely identical binary contents are
probably higher than average.  But even there you could have
checkpoints trigger at slightly different times while restoring the
dumps, and of course checkpoints take buffer locks, and so now a HOT
prune might happen on one machine but get skipped on the other one
because the checkpointer has dropped the lock but not the pin, and now
you're hosed.

Even if you could control for that particular possibility, there are
surely others now and there will be more in the future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Thu, Jul 19, 2012 at 12:43:23PM -0400, Robert Haas wrote:
> On Thu, Jul 19, 2012 at 12:02 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Thu, Jul 19, 2012 at 09:41:29AM -0400, Robert Haas wrote:
> >> On Thu, Jul 19, 2012 at 2:38 AM, Bruce Momjian <bruce@momjian.us> wrote:
> >> > No, the point is they run pg_upgrade on the stopped primary and stopped
> >> > standbys.  Are those the same?  I am not really sure.
> >>
> >> Of course not.
> >
> > OK, but why?  When the clusters are stopped they are the same, you are
> > running the same initdb on both matchines, and running the same
> > pg_upgrade.  What would cause the difference, other than the Database
> > System Identifier, which we can deal with?  I don't think we can
> > guarantee they are the same, but what would guarantee they are
> > different?
> 
> There isn't any guarantee that they are different.  There's just no
> guarantee that they are the same, which is enough to make this idea a
> non-starter.
> 
> In general, it's pretty easy to understand that if you perform the
> same series of inserts, updates, and deletes on two systems, you might
> not end up with the exact same binary contents.  There are a lot of
> reasons for this: any concurrent activity whatsoever - even the exact
> timing of autovacuum - can cause the same tuples can end up in
> different places in the two systems.  Now, admittedly, in the case of
> pg_upgrade, you're restoring the dump using a single process with
> absolutely no concurrent activity and even autovacuum disabled, so the
> chances of ending up with entirely identical binary contents are
> probably higher than average.  But even there you could have
> checkpoints trigger at slightly different times while restoring the
> dumps, and of course checkpoints take buffer locks, and so now a HOT
> prune might happen on one machine but get skipped on the other one
> because the checkpointer has dropped the lock but not the pin, and now
> you're hosed.
> 
> Even if you could control for that particular possibility, there are
> surely others now and there will be more in the future.

I think the checkpoint issue is the ideal killer --- thanks.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Tue, Jul 17, 2012 at 06:02:40PM -0400, Bruce Momjian wrote:
> Second, the user files (large) are certainly identical, it is only the
> system tables (small) that _might_ be different, so rsync'ing just those
> would add the guarantee, but I know of no easy way to rsync just the
> system tables.

OK, new idea.  I said above I didn't know how to copy just the non-user
table files (which are not modified by pg_upgrade), but actually, if you
use link mode, the user files are the only files with a hard link count
of 2.  I could create a script that copied from the master to the slave
only those files with a link count of one.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Using pg_upgrade on log-shipping standby servers

From
Aidan Van Dyk
Date:
If you're wanting to automatically do some upgrades wouldn't an easier route be:

1) run pg_upgrade, up to the point where it actually start's
copying/linking in old cluster data files, and stop the new
postmaster.
2) Take a "base backup" style copy (tar, rsync, $FAVOURITE) of the new
cluster (small, since without data files)
3) Have pg_upgrade leave a log of exactly which old cluster data files
go where in the new cluster

That way, anybody, any script, etc who wants to make a new "standby"
from and old one only needs the pg_upgrade base backup (which should
be small, no data, just catalog stuff), and the log of which old files
to move where.

The only pre-condition is that the standby's "old pg" *APPLIED* WAL up
to the exact same point as the master's "old pg".  In that case the
standby's old cluster data files should same enough (maybe hint bits
off?) to be used.

a.

On Fri, Jul 20, 2012 at 12:25 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, Jul 17, 2012 at 06:02:40PM -0400, Bruce Momjian wrote:
>> Second, the user files (large) are certainly identical, it is only the
>> system tables (small) that _might_ be different, so rsync'ing just those
>> would add the guarantee, but I know of no easy way to rsync just the
>> system tables.
>
> OK, new idea.  I said above I didn't know how to copy just the non-user
> table files (which are not modified by pg_upgrade), but actually, if you
> use link mode, the user files are the only files with a hard link count
> of 2.  I could create a script that copied from the master to the slave
> only those files with a link count of one.
>
> --
>   Bruce Momjian  <bruce@momjian.us>        http://momjian.us
>   EnterpriseDB                             http://enterprisedb.com
>
>   + It's impossible for everything to be true. +
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Fri, Jul 20, 2012 at 12:39:12PM -0400, Aidan Van Dyk wrote:
> If you're wanting to automatically do some upgrades wouldn't an easier route be:
> 
> 1) run pg_upgrade, up to the point where it actually start's
> copying/linking in old cluster data files, and stop the new
> postmaster.
> 2) Take a "base backup" style copy (tar, rsync, $FAVOURITE) of the new
> cluster (small, since without data files)
> 3) Have pg_upgrade leave a log of exactly which old cluster data files
> go where in the new cluster
> 
> That way, anybody, any script, etc who wants to make a new "standby"
> from and old one only needs the pg_upgrade base backup (which should
> be small, no data, just catalog stuff), and the log of which old files
> to move where.
> 
> The only pre-condition is that the standby's "old pg" *APPLIED* WAL up
> to the exact same point as the master's "old pg".  In that case the
> standby's old cluster data files should same enough (maybe hint bits
> off?) to be used.

I am not sure what a base backup is buying us here --- base backup is
designed to create a backup while the server is running, and it is down
at that point.  I think what you are suggesting is to make a data dir
copy while just the schema is in place.  That is possible, but it opens
up all kinds of possible failure cases because pg_upgrade operations
have to be done in a specific order --- it feels very fragile.

I think the commands to run after pg_upgrade --link completes on both
primary and standby might be as easy as:
cd /u/pg/pgsql.old/datafind . -links 1 -exec cp {} /u/pgsql/data \;

Why would we want anything more complicated than this?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Using pg_upgrade on log-shipping standby servers

From
Peter Eisentraut
Date:
On fre, 2012-07-20 at 13:11 -0400, Bruce Momjian wrote:
> I think the commands to run after pg_upgrade --link completes on both
> primary and standby might be as easy as:
> 
>         cd /u/pg/pgsql.old/data
>         find . -links 1 -exec cp {} /u/pgsql/data \;
> 
> Why would we want anything more complicated than this?

In practice these are on different machines, and the way the machines
are connected could vary wildly.  So an automated solution might be
difficult to find.



Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Sat, Jul 21, 2012 at 11:24:21AM +0300, Peter Eisentraut wrote:
> On fre, 2012-07-20 at 13:11 -0400, Bruce Momjian wrote:
> > I think the commands to run after pg_upgrade --link completes on both
> > primary and standby might be as easy as:
> > 
> >         cd /u/pg/pgsql.old/data
> >         find . -links 1 -exec cp {} /u/pgsql/data \;
> > 
> > Why would we want anything more complicated than this?
> 
> In practice these are on different machines, and the way the machines
> are connected could vary wildly.  So an automated solution might be
> difficult to find.

Yeah, I was thinking of just suggesting scp as a doc example and let
users adjust that:
       cd /u/pg/pgsql.old/data       find . -links 1 -exec scp {} postgres@momjian.us:/u/pgsql/data \;


--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Using pg_upgrade on log-shipping standby servers

From
Robert Haas
Date:
On Sat, Jul 21, 2012 at 8:57 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Sat, Jul 21, 2012 at 11:24:21AM +0300, Peter Eisentraut wrote:
>> On fre, 2012-07-20 at 13:11 -0400, Bruce Momjian wrote:
>> > I think the commands to run after pg_upgrade --link completes on both
>> > primary and standby might be as easy as:
>> >
>> >         cd /u/pg/pgsql.old/data
>> >         find . -links 1 -exec cp {} /u/pgsql/data \;
>> >
>> > Why would we want anything more complicated than this?
>>
>> In practice these are on different machines, and the way the machines
>> are connected could vary wildly.  So an automated solution might be
>> difficult to find.
>
> Yeah, I was thinking of just suggesting scp as a doc example and let
> users adjust that:
>
>         cd /u/pg/pgsql.old/data
>         find . -links 1 -exec scp {} postgres@momjian.us:/u/pgsql/data \;

Relying on the number of hard links seems very fragile.  For example,
it'll break if you are using copy mode.  And it won't work on Windows,
either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Using pg_upgrade on log-shipping standby servers

From
Peter Eisentraut
Date:
On mån, 2012-07-23 at 10:08 -0400, Robert Haas wrote:
> Relying on the number of hard links seems very fragile.  For example,
> it'll break if you are using copy mode.  And it won't work on Windows,
> either.

pg_upgrade could remember the list of files that the user would need to
copy to the remote server (i.e., the list of files pg_upgrade itself
copied or linked) and write that to a file.



Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Mon, Jul 23, 2012 at 11:11:27PM +0300, Peter Eisentraut wrote:
> On mån, 2012-07-23 at 10:08 -0400, Robert Haas wrote:
> > Relying on the number of hard links seems very fragile.  For example,
> > it'll break if you are using copy mode.  And it won't work on Windows,
> > either.
> 
> pg_upgrade could remember the list of files that the user would need to
> copy to the remote server (i.e., the list of files pg_upgrade itself
> copied or linked) and write that to a file.

Good idea.  A list of file names seems best, but wouldn't that list
exceed the maximum size of an argument list?  How could we pass that
list to a command like scp?   And how could we pass that list on
Windows?

Pg_upgrade already creates a script to analyze the cluster, so we could
create another script to upgrade a standby.  However, the problem with a
script is that I have no idea what command people would use to do the
copy.  I think I could create a list and pass that into a loop so only
the command has to be modified, but again, how do we do that on Windows?
Can we create a shell function in Windows and pass the file name as an
argument?

Another problem is that the standby cluster might create _new_ files
that don't exist on the master, e.g. WAL files, and those have to be
removed.  I am not clear how to do that either, except by removing all
files with a hard link count of 1, and again, this is difficult on
Windows.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Thu, Jul 26, 2012 at 08:30:40AM -0400, Bruce Momjian wrote:
> On Mon, Jul 23, 2012 at 11:11:27PM +0300, Peter Eisentraut wrote:
> > On mån, 2012-07-23 at 10:08 -0400, Robert Haas wrote:
> > > Relying on the number of hard links seems very fragile.  For example,
> > > it'll break if you are using copy mode.  And it won't work on Windows,
> > > either.
> > 
> > pg_upgrade could remember the list of files that the user would need to
> > copy to the remote server (i.e., the list of files pg_upgrade itself
> > copied or linked) and write that to a file.
> 
> Good idea.  A list of file names seems best, but wouldn't that list
> exceed the maximum size of an argument list?  How could we pass that
> list to a command like scp?   And how could we pass that list on
> Windows?
> 
> Pg_upgrade already creates a script to analyze the cluster, so we could
> create another script to upgrade a standby.  However, the problem with a
> script is that I have no idea what command people would use to do the
> copy.  I think I could create a list and pass that into a loop so only
> the command has to be modified, but again, how do we do that on Windows?
> Can we create a shell function in Windows and pass the file name as an
> argument?
> 
> Another problem is that the standby cluster might create _new_ files
> that don't exist on the master, e.g. WAL files, and those have to be
> removed.  I am not clear how to do that either, except by removing all
> files with a hard link count of 1, and again, this is difficult on
> Windows.

Maybe we just need to require Perl, and store the file names in the
generated Perl script.  I know we don't normally require Perl for
operation (only development), but Perl seems to be the easiest solution
here.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Using pg_upgrade on log-shipping standby servers

From
Peter Eisentraut
Date:
On tor, 2012-07-26 at 08:30 -0400, Bruce Momjian wrote:
> On Mon, Jul 23, 2012 at 11:11:27PM +0300, Peter Eisentraut wrote:
> > On mån, 2012-07-23 at 10:08 -0400, Robert Haas wrote:
> > > Relying on the number of hard links seems very fragile.  For example,
> > > it'll break if you are using copy mode.  And it won't work on Windows,
> > > either.
> > 
> > pg_upgrade could remember the list of files that the user would need to
> > copy to the remote server (i.e., the list of files pg_upgrade itself
> > copied or linked) and write that to a file.
> 
> Good idea.  A list of file names seems best, but wouldn't that list
> exceed the maximum size of an argument list?  How could we pass that
> list to a command like scp?

xargs

> Pg_upgrade already creates a script to analyze the cluster, so we could
> create another script to upgrade a standby.  However, the problem with a
> script is that I have no idea what command people would use to do the
> copy.

Exactly.  Perhaps an example wouldn't hurt, but I wouldn't go too far.

> I think I could create a list and pass that into a loop so only
> the command has to be modified, but again, how do we do that on Windows?
> Can we create a shell function in Windows and pass the file name as an
> argument?

I don't know, but I assume that somewhere in the known universe there is
a way on Windows to say, here is a list of files, copy them to that
host.

> Another problem is that the standby cluster might create _new_ files
> that don't exist on the master, e.g. WAL files, and those have to be
> removed.  I am not clear how to do that either, except by removing all
> files with a hard link count of 1, and again, this is difficult on
> Windows.

Well, then that would call for another list of files.




Re: Using pg_upgrade on log-shipping standby servers

From
Robert Haas
Date:
On Thu, Jul 26, 2012 at 9:59 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> I think I could create a list and pass that into a loop so only
>> the command has to be modified, but again, how do we do that on Windows?
>> Can we create a shell function in Windows and pass the file name as an
>> argument?
>
> I don't know, but I assume that somewhere in the known universe there is
> a way on Windows to say, here is a list of files, copy them to that
> host.
>
>> Another problem is that the standby cluster might create _new_ files
>> that don't exist on the master, e.g. WAL files, and those have to be
>> removed.  I am not clear how to do that either, except by removing all
>> files with a hard link count of 1, and again, this is difficult on
>> Windows.
>
> Well, then that would call for another list of files.

I cannot escape the feeling that if we go down this route in any form
we're going to spend years tracking down data-loss-inducing bugs.  The
ones we have on the master are bad enough, but doing it on the standby
is almost worse because (1) few enough people will use this
functionality that we won't get many bug reports even if it's badly
broken and (2) people who are affected may not discover it until
something bad has already happened on the master.  I don't hear anyone
thinking very hard about ways that the master could be different from
the standby, and without a lot of careful thought on that topic I
think this is all kinds of bad news.  Just to take one example, how
are you going to ensure that the standby has replayed all the WAL that
the master generated prior to the upgrade?  If the answer is "shut
everything down cleanly and hope for the best", color me unimpressed.

IMV, pg_upgrade is not yet sufficiently reliable that we should be
looking for new projects that seem certain to make it less reliable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Using pg_upgrade on log-shipping standby servers

From
Andrew Dunstan
Date:
On 07/26/2012 09:59 AM, Peter Eisentraut wrote:
>> I think I could create a list and pass that into a loop so only
>> the command has to be modified, but again, how do we do that on Windows?
>> Can we create a shell function in Windows and pass the file name as an
>> argument?
> I don't know, but I assume that somewhere in the known universe there is
> a way on Windows to say, here is a list of files, copy them to that
> host.


The issue isn't whether or not there is a way known somewhere in the 
universe to do something, it's whether or not it's reasonable to assume 
that such a thing is likely to be available. In general we have tried 
not to assume very much at all about what's available on Windows - not 
much beyond the simple cmd.exe shell.

On Windows I typically use scp to copy things around between machines, 
but I'd be very wary of creating a Postgres utility that expects it to 
be present.

cheers

andrew


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Thu, Jul 26, 2012 at 04:59:46PM +0300, Peter Eisentraut wrote:
> On tor, 2012-07-26 at 08:30 -0400, Bruce Momjian wrote:
> > On Mon, Jul 23, 2012 at 11:11:27PM +0300, Peter Eisentraut wrote:
> > > On mån, 2012-07-23 at 10:08 -0400, Robert Haas wrote:
> > > > Relying on the number of hard links seems very fragile.  For example,
> > > > it'll break if you are using copy mode.  And it won't work on Windows,
> > > > either.
> > > 
> > > pg_upgrade could remember the list of files that the user would need to
> > > copy to the remote server (i.e., the list of files pg_upgrade itself
> > > copied or linked) and write that to a file.
> > 
> > Good idea.  A list of file names seems best, but wouldn't that list
> > exceed the maximum size of an argument list?  How could we pass that
> > list to a command like scp?
> 
> xargs

Yes, good idea.

> > Pg_upgrade already creates a script to analyze the cluster, so we could
> > create another script to upgrade a standby.  However, the problem with a
> > script is that I have no idea what command people would use to do the
> > copy.
> 
> Exactly.  Perhaps an example wouldn't hurt, but I wouldn't go too far.

Agreed.

> > I think I could create a list and pass that into a loop so only
> > the command has to be modified, but again, how do we do that on Windows?
> > Can we create a shell function in Windows and pass the file name as an
> > argument?
> 
> I don't know, but I assume that somewhere in the known universe there is
> a way on Windows to say, here is a list of files, copy them to that
> host.

No idea.

> > Another problem is that the standby cluster might create _new_ files
> > that don't exist on the master, e.g. WAL files, and those have to be
> > removed.  I am not clear how to do that either, except by removing all
> > files with a hard link count of 1, and again, this is difficult on
> > Windows.
> 
> Well, then that would call for another list of files.

Well, not really.  If we create a list of all user table/index files,
then any file not on the list would be removed on the standby, then all
the files in the primary not on the list are copied to the standby.  
One list is less error-prone.  This is easy in Unix shell and Perl, but
hard on Windows without Perl.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Thu, Jul 26, 2012 at 10:26:53AM -0400, Robert Haas wrote:
> > Well, then that would call for another list of files.
> 
> I cannot escape the feeling that if we go down this route in any form
> we're going to spend years tracking down data-loss-inducing bugs.  The
> ones we have on the master are bad enough, but doing it on the standby
> is almost worse because (1) few enough people will use this
> functionality that we won't get many bug reports even if it's badly
> broken and (2) people who are affected may not discover it until
> something bad has already happened on the master.  I don't hear anyone
> thinking very hard about ways that the master could be different from
> the standby, and without a lot of careful thought on that topic I
> think this is all kinds of bad news.  Just to take one example, how
> are you going to ensure that the standby has replayed all the WAL that
> the master generated prior to the upgrade?  If the answer is "shut
> everything down cleanly and hope for the best", color me unimpressed.
> 
> IMV, pg_upgrade is not yet sufficiently reliable that we should be
> looking for new projects that seem certain to make it less reliable.

The script has to make the primary/standby identical, and guarantee
that.  That is why one list and removing new standby files is necessary.
Are you saying having the primary/standby identical is
impossible/unreliable, or that having them the same is insufficent?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Using pg_upgrade on log-shipping standby servers

From
Robert Haas
Date:
On Thu, Jul 26, 2012 at 10:40 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Jul 26, 2012 at 10:26:53AM -0400, Robert Haas wrote:
>> > Well, then that would call for another list of files.
>>
>> I cannot escape the feeling that if we go down this route in any form
>> we're going to spend years tracking down data-loss-inducing bugs.  The
>> ones we have on the master are bad enough, but doing it on the standby
>> is almost worse because (1) few enough people will use this
>> functionality that we won't get many bug reports even if it's badly
>> broken and (2) people who are affected may not discover it until
>> something bad has already happened on the master.  I don't hear anyone
>> thinking very hard about ways that the master could be different from
>> the standby, and without a lot of careful thought on that topic I
>> think this is all kinds of bad news.  Just to take one example, how
>> are you going to ensure that the standby has replayed all the WAL that
>> the master generated prior to the upgrade?  If the answer is "shut
>> everything down cleanly and hope for the best", color me unimpressed.
>>
>> IMV, pg_upgrade is not yet sufficiently reliable that we should be
>> looking for new projects that seem certain to make it less reliable.
>
> The script has to make the primary/standby identical, and guarantee
> that.  That is why one list and removing new standby files is necessary.
> Are you saying having the primary/standby identical is
> impossible/unreliable, or that having them the same is insufficent?

Having them the same is clearly sufficient, but only if ALL the files
are the same.  You seem to be taking it on faith that the data files
backing non-system tables will be the same, or close enough to the
same, that we can just not worry about those.  I find that when I
don't worry about things, they break.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Thu, Jul 26, 2012 at 11:03:15AM -0400, Robert Haas wrote:
> >> IMV, pg_upgrade is not yet sufficiently reliable that we should be
> >> looking for new projects that seem certain to make it less reliable.
> >
> > The script has to make the primary/standby identical, and guarantee
> > that.  That is why one list and removing new standby files is necessary.
> > Are you saying having the primary/standby identical is
> > impossible/unreliable, or that having them the same is insufficient?
> 
> Having them the same is clearly sufficient, but only if ALL the files
> are the same.  You seem to be taking it on faith that the data files
> backing non-system tables will be the same, or close enough to the
> same, that we can just not worry about those.  I find that when I
> don't worry about things, they break.

Good point, and an interesting question.  The way pg_upgrade works is
that it moves the user table/index files to the new cluster at the end,
and does not start the server after that.  This is why you can always
revert to the old cluster in link mode after it completes.  If you start
the new cluster, you are committed and can't revert in link mode.  I
guess you could ask if the old cluster operations could have modified
these files in some non-standard way, but I can't think of how that
could happen.  

What you would probably want is for the old primary cluster to continue
streaming its WAL to the standby cluster _until_ pg_upgrade completes,
and then do the pg_upgrade on the old standby, but I can't think of a
simple way to do that.

What would be really interesting would be to somehow have the old and
new clusters both on the standby and have the master stream to the
appropriate server, but that seems complicated, and I am not sure if it
would help because of the interactions behind-the-scenes in pg_upgrade.

I don't see the "don't modify the user files" behavior changing anytime
soon, and it is documented, so I feel pretty confident that those files
were not modified on the primary or standby cluster, and are hence the
same, or at least as "the same" as they were when they were running the
older major version of Postgres.

Is that sufficient?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Using pg_upgrade on log-shipping standby servers

From
Robert Haas
Date:
On Thu, Jul 26, 2012 at 12:06 PM, Bruce Momjian <bruce@momjian.us> wrote:
> I don't see the "don't modify the user files" behavior changing anytime
> soon, and it is documented, so I feel pretty confident that those files
> were not modified on the primary or standby cluster, and are hence the
> same, or at least as "the same" as they were when they were running the
> older major version of Postgres.
>
> Is that sufficient?

Well, at the very least, you need to guarantee that the standby is
caught up - i.e. that it replayed all the WAL records that were
generated on the master before it was shut down for the final time.  I
don't think that telling the user that they must be sure to do that is
sufficient - you need some kind of built-in safeguard that will
complain loudly if it's not the case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Thu, Jul 26, 2012 at 01:24:19PM -0400, Robert Haas wrote:
> On Thu, Jul 26, 2012 at 12:06 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > I don't see the "don't modify the user files" behavior changing anytime
> > soon, and it is documented, so I feel pretty confident that those files
> > were not modified on the primary or standby cluster, and are hence the
> > same, or at least as "the same" as they were when they were running the
> > older major version of Postgres.
> >
> > Is that sufficient?
> 
> Well, at the very least, you need to guarantee that the standby is
> caught up - i.e. that it replayed all the WAL records that were
> generated on the master before it was shut down for the final time.  I
> don't think that telling the user that they must be sure to do that is
> sufficient - you need some kind of built-in safeguard that will
> complain loudly if it's not the case.

Yes, that would be a problem because the WAL records are deleted by
pg_upgrade.   Does a shutdown of the standby not already replay all WAL
logs?  We could also just require them to just start the standby in
master mode and shut it down.  The problem with that is it might run
things like autovacuum.

I was originally thinking that we would require users to run pg_upgrade
on the standby, where you need to first switch into master mode.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Using pg_upgrade on log-shipping standby servers

From
Jeff Davis
Date:
On Thu, 2012-07-26 at 14:17 -0400, Bruce Momjian wrote:
> Yes, that would be a problem because the WAL records are deleted by
> pg_upgrade.   Does a shutdown of the standby not already replay all WAL
> logs?

There is no notion of "all WAL logs" because the WAL is infinite. Do you
mean "all WAL generated by the master before shutdown" or "all WAL that
the standby knows has been generated by the master so far"?

Regardless, I don't think the standby attempts to do much after a
shutdown is requested.

>   We could also just require them to just start the standby in
> master mode and shut it down.  The problem with that is it might run
> things like autovacuum.

If we had sync rep that waits for application of the WAL, that might be
a more robust approach. We could, during shutdown of the master cause
the standby to cancel all HS queries, and then change to sync rep and
wait for complete catchup.

There are a lot of details to work out there, but it might give us a
higher confidence that it's doing the right thing.

Given two shut-down systems, it should be pretty easy to tell if they
have played the same amount of WAL though, right?

> I was originally thinking that we would require users to run pg_upgrade
> on the standby, where you need to first switch into master mode.

That sounds a little strange to me. If the original master has generated
WAL that the original standby hasn't seen yet, then this doesn't help
because the two systems would be diverged, and you'd need a new base
backup anyway. And if they have played exactly the same WAL, what does
this accomplish?

Regards,Jeff Davis



Re: Using pg_upgrade on log-shipping standby servers

From
Daniel Farina
Date:
On Thu, Jul 26, 2012 at 2:26 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> I was originally thinking that we would require users to run pg_upgrade
>> on the standby, where you need to first switch into master mode.
>
> That sounds a little strange to me. If the original master has generated
> WAL that the original standby hasn't seen yet, then this doesn't help
> because the two systems would be diverged, and you'd need a new base
> backup anyway. And if they have played exactly the same WAL, what does
> this accomplish?

This whole approach of coordinating precise content of a standby
cluster to run pg_upgrade and then resynchronizing with a
also-pg-upgraded primary at a precise WAL position that does not
increment to complete the upgrade does not excite me in the slightest:
I feel like it is really asking for problems.  I think the WAL
position should advance and/or have a timeline change when undergoing
upgrade so that the system can more reliably report bookkeeping error,
and it'd be ideal if WAL was generated that applied those changes.

For example: suppose pg_upgrade emitted full-page-write records in the
format of the new postgres version on an unoccupied timeline.  One can
use PG.next tools to report on the first txid in the pg_upgrade
generated WAL and then use standard point in time recovery features to
halt replay on a PG.previous version, swap to the new timeline, and
then start up PG.next on the new timeline, applying the full page
writes to its catalogs before becoming consistent.

-- 
fdr


Re: Using pg_upgrade on log-shipping standby servers

From
Daniel Farina
Date:
On Thu, Jul 26, 2012 at 3:01 PM, Daniel Farina <daniel@heroku.com> wrote:
> For example: suppose pg_upgrade emitted full-page-write records in the
> format of the new postgres version on an unoccupied timeline.  One can
> use PG.next tools to report on the first txid

and by txid I meant WAL position, which mucks it up a bit because
that's not an option on old Postgres-es.  But the overall point is the
same: I think pg_upgrade would be much more safe and usable in the
standby setting if it emitted data that depend upon a posterior
WAL-position at the point the cluster diverges onto a new timeline
that is the new version of Postgres, and it would be nice if that data
incremented the WAL position rather than being applied out-of-band.
And, it seems that emitting full page image records in the format of
the new version is a way to accomplish that.

-- 
fdr


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Thu, Jul 26, 2012 at 02:17:22PM -0400, Bruce Momjian wrote:
> > > Is that sufficient?
> > 
> > Well, at the very least, you need to guarantee that the standby is
> > caught up - i.e. that it replayed all the WAL records that were
> > generated on the master before it was shut down for the final time.  I
> > don't think that telling the user that they must be sure to do that is
> > sufficient - you need some kind of built-in safeguard that will
> > complain loudly if it's not the case.
> 
> Yes, that would be a problem because the WAL records are deleted by
> pg_upgrade.   Does a shutdown of the standby not already replay all WAL
> logs?  We could also just require them to just start the standby in
> master mode and shut it down.  The problem with that is it might run
> things like autovacuum.
> 
> I was originally thinking that we would require users to run pg_upgrade
> on the standby, where you need to first switch into master mode.

OK, sorry, I was confused.  You _have_ to run pg_upgrade on the standby
--- there are many things we don't preserve, and we need pg_upgrade to
move those user file to the right place --- a obvious example is
tablespace files.  Database oids aren't even preserved, so the data
directory changes.

So, you need change the standby to write mode, run pg_upgrade, then run
whatever copy command we design.  

Is Perl the proper language for that script?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Thu, Jul 26, 2012 at 02:26:16PM -0700, Jeff Davis wrote:
> On Thu, 2012-07-26 at 14:17 -0400, Bruce Momjian wrote:
> > Yes, that would be a problem because the WAL records are deleted by
> > pg_upgrade.   Does a shutdown of the standby not already replay all WAL
> > logs?
> 
> There is no notion of "all WAL logs" because the WAL is infinite. Do you
> mean "all WAL generated by the master before shutdown" or "all WAL that
> the standby knows has been generated by the master so far"?
> 
> Regardless, I don't think the standby attempts to do much after a
> shutdown is requested.

Doesn't matter anymore --- I now know pg_upgrade has to be run on the
standby, and it will be in primary mode and do full WAL reading before
it starts.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Using pg_upgrade on log-shipping standby servers

From
Robert Haas
Date:
On Thu, Jul 26, 2012 at 2:17 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Jul 26, 2012 at 01:24:19PM -0400, Robert Haas wrote:
>> On Thu, Jul 26, 2012 at 12:06 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> > I don't see the "don't modify the user files" behavior changing anytime
>> > soon, and it is documented, so I feel pretty confident that those files
>> > were not modified on the primary or standby cluster, and are hence the
>> > same, or at least as "the same" as they were when they were running the
>> > older major version of Postgres.
>> >
>> > Is that sufficient?
>>
>> Well, at the very least, you need to guarantee that the standby is
>> caught up - i.e. that it replayed all the WAL records that were
>> generated on the master before it was shut down for the final time.  I
>> don't think that telling the user that they must be sure to do that is
>> sufficient - you need some kind of built-in safeguard that will
>> complain loudly if it's not the case.
>
> Yes, that would be a problem because the WAL records are deleted by
> pg_upgrade.   Does a shutdown of the standby not already replay all WAL
> logs?

Not if it's an immediate shutdown, and not if it didn't have them all
on disk in the first place.  Who is to say it's even caught up?

> I was originally thinking that we would require users to run pg_upgrade
> on the standby, where you need to first switch into master mode.

As Jeff says, that doesn't help anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Using pg_upgrade on log-shipping standby servers

From
Robert Haas
Date:
On Thu, Jul 26, 2012 at 7:24 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Jul 26, 2012 at 02:17:22PM -0400, Bruce Momjian wrote:
>> > > Is that sufficient?
>> >
>> > Well, at the very least, you need to guarantee that the standby is
>> > caught up - i.e. that it replayed all the WAL records that were
>> > generated on the master before it was shut down for the final time.  I
>> > don't think that telling the user that they must be sure to do that is
>> > sufficient - you need some kind of built-in safeguard that will
>> > complain loudly if it's not the case.
>>
>> Yes, that would be a problem because the WAL records are deleted by
>> pg_upgrade.   Does a shutdown of the standby not already replay all WAL
>> logs?  We could also just require them to just start the standby in
>> master mode and shut it down.  The problem with that is it might run
>> things like autovacuum.
>>
>> I was originally thinking that we would require users to run pg_upgrade
>> on the standby, where you need to first switch into master mode.
>
> OK, sorry, I was confused.  You _have_ to run pg_upgrade on the standby
> --- there are many things we don't preserve, and we need pg_upgrade to
> move those user file to the right place --- a obvious example is
> tablespace files.  Database oids aren't even preserved, so the data
> directory changes.

These are reasons why you CANNOT run pg_upgrade on the standby, not
why you HAVE to.  If you run pg_upgrade on the standby and separately
on the master, you will end up with divergence precisely because of
those things that aren't preserved.

Any approach that calls for pg_upgrade to run on the master and
standby separately is broken.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Fri, Jul 27, 2012 at 08:29:20AM -0400, Robert Haas wrote:
> >> Yes, that would be a problem because the WAL records are deleted by
> >> pg_upgrade.   Does a shutdown of the standby not already replay all WAL
> >> logs?  We could also just require them to just start the standby in
> >> master mode and shut it down.  The problem with that is it might run
> >> things like autovacuum.
> >>
> >> I was originally thinking that we would require users to run pg_upgrade
> >> on the standby, where you need to first switch into master mode.
> >
> > OK, sorry, I was confused.  You _have_ to run pg_upgrade on the standby
> > --- there are many things we don't preserve, and we need pg_upgrade to
> > move those user file to the right place --- a obvious example is
> > tablespace files.  Database oids aren't even preserved, so the data
> > directory changes.
> 
> These are reasons why you CANNOT run pg_upgrade on the standby, not
> why you HAVE to.  If you run pg_upgrade on the standby and separately
> on the master, you will end up with divergence precisely because of
> those things that aren't preserved.
> 
> Any approach that calls for pg_upgrade to run on the master and
> standby separately is broken.

Basically, you have to run pg_upgrade on the standby so the user data
files are moved properly, then you would need to run a copy script that
would copy over all the non-user files from the master.

Are you worried that the standby, by becoming a master, will write to
the standby old cluster user data files in a way that is inconsistent
from the master?  If so, I think this entire idea can't work.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: Using pg_upgrade on log-shipping standby servers

From
Bruce Momjian
Date:
On Thu, Jul 26, 2012 at 10:36:59AM -0400, Bruce Momjian wrote:
> > > Pg_upgrade already creates a script to analyze the cluster, so we could
> > > create another script to upgrade a standby.  However, the problem with a
> > > script is that I have no idea what command people would use to do the
> > > copy.
> > 
> > Exactly.  Perhaps an example wouldn't hurt, but I wouldn't go too far.
> 
> Agreed.
> 
> > > I think I could create a list and pass that into a loop so only
> > > the command has to be modified, but again, how do we do that on Windows?
> > > Can we create a shell function in Windows and pass the file name as an
> > > argument?
> > 
> > I don't know, but I assume that somewhere in the known universe there is
> > a way on Windows to say, here is a list of files, copy them to that
> > host.
> 
> No idea.
> 
> > > Another problem is that the standby cluster might create _new_ files
> > > that don't exist on the master, e.g. WAL files, and those have to be
> > > removed.  I am not clear how to do that either, except by removing all
> > > files with a hard link count of 1, and again, this is difficult on
> > > Windows.
> > 
> > Well, then that would call for another list of files.
> 
> Well, not really.  If we create a list of all user table/index files,
> then any file not on the list would be removed on the standby, then all
> the files in the primary not on the list are copied to the standby.  
> One list is less error-prone.  This is easy in Unix shell and Perl, but
> hard on Windows without Perl.

There was too much concern about pg_upgrade upgrading a standby server
that I am not going to peruse the issue at this time.

I did add a TODO in case we ever want to resurrect the idea:
Consider a way to run pg_upgrade on standby servers
http://archives.postgresql.org/pgsql-hackers/2012-07/msg00453.php
 

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +