Thread: Better Upgrades

Better Upgrades

From
David Fetter
Date:
Folks,

While chatting with Bruce about how to make something better than
pg_upgrade, we (and by "we," I mean mostly Bruce) came up with the
following.

What needs improvement:

- pg_upgrade forces a down time event, no matter how cleverly it's done.
- pg_upgrade is very much a blocker for on-disk format changes.

The proposal:

- Add a new script--possibly Perl or Bash, which would:
    - Initdb a new cluster with the new version of PostgreSQL and a
      different port.
    - Start logical replication from the old version to the new
      version.
    - Poll until replication lag falls below a pre-determined default threshold, then:
      * Issue an ALTER SYSTEM on the new server to change its port to the old server's
      * Issue a pg_ctl stop -w to the old server
      * Issue a pg_ctl restart on the new server
      * Happiness!
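
Sketched as the kind of shell script the proposal describes, the steps might look like this. It's a dry-run skeleton: the paths, ports, and the elided replication setup are illustrative assumptions, not part of the proposal.

```shell
#!/bin/sh
# Dry-run sketch of the proposed one-step upgrade script.  All paths,
# ports, and names here are illustrative assumptions.
set -eu

OLD_DATA=/var/lib/postgresql/old   # hypothetical old data directory
NEW_DATA=/var/lib/postgresql/new   # hypothetical new data directory
OLD_PORT=5432
NEW_PORT=5433

run() { echo "$@"; }   # dry run: prints each command; swap echo for "$@" to execute

# 1. Initdb a new cluster with the new version, on a different port.
run initdb -D "$NEW_DATA"
run pg_ctl -D "$NEW_DATA" -o "-p $NEW_PORT" -w start

# 2. Start logical replication from old to new (publication/subscription
#    setup and the schema sync would happen here).

# 3. Once replication has caught up, swap ports and restart.
run psql -p "$NEW_PORT" -c "ALTER SYSTEM SET port = $OLD_PORT"
run pg_ctl -D "$OLD_DATA" -w stop
run pg_ctl -D "$NEW_DATA" restart
```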

Assumptions underlying it:

- Disk and similar resources are cheap enough for most users that
  doubling up during the upgrade is feasible.
- The default upgrade path should require exactly one step.
- Errors do not, by and large, have the capacity to violate an SLA.

The proposal has blockers: 

- We don't actually have logical decoding for DDL, although I'm given
  to understand that Álvaro Herrera has done some yeoman follow-up
  work on Dimitri Fontaine's PoC patches.
- We don't have logical decoding for DCL (GRANT/REVOKE)

We also came up with and, we believe, addressed an important issue,
namely how to ensure continuity.  When we issue a `pg_ctl stop -w`,
that's short for "Cancel current commands and stop cleanly."  At this
point, the new server will not have WAL to replay, so a pg_ctl restart
will load the new configuration and come up pretty much immediately,
and the next try will find a brand new server without a down time
event.

Does this seem worth coding up in its current form?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Better Upgrades

From
"David G. Johnston"
Date:
On Mon, Feb 5, 2018 at 5:09 PM, David Fetter <david@fetter.org> wrote:

The proposal has blockers:

- We don't actually have logical decoding for DDL, although I'm given
  to understand that Álvaro Herrera has done some yeoman follow-up
  work on Dimitri Fontaine's PoC patches.
- We don't have logical decoding for DCL (GRANT/REVOKE)

We also came up with and, we believe, addressed an important issue,
namely how to ensure continuity.  When we issue a `pg_ctl stop -w`,
that's short for "Cancel current commands and stop cleanly."  At this
point, the new server will not have WAL to replay, so a pg_ctl restart
will load the new configuration and come up pretty much immediately,
and the next try will find a brand new server without a down time
event.

To what extent do people think that differing cache contents between the primary and secondary are likely to play a role?  While connection downtime is zero as long as people have their fail-and-reconnect logic in place, queries taking longer than expected due to unexpected disk I/O could cause SLA problems.  pg_prewarm currently fulfills that role, and maybe it isn't that large a problem in practice (the scale I operate at doesn't have this concern), but having some kind of channel to keep cached blocks (at least within PostgreSQL) synchronized would probably be something to at least factor into the design, even if the first pass didn't solve that particular problem.
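
For reference, the per-relation warming mentioned above looks like this with pg_prewarm; a minimal sketch to run on the new server shortly before cutover, where 'accounts' is a placeholder table name:

```sql
-- pg_prewarm ships as a contrib module.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Pull a hot table's heap into shared_buffers on the new server
-- before cutover ('accounts' is a placeholder relation name).
SELECT pg_prewarm('accounts');
```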

David J.



Re: Better Upgrades

From
Craig Ringer
Date:
On 6 February 2018 at 08:09, David Fetter <david@fetter.org> wrote:
Folks,

While chatting with Bruce about how to make something better than
pg_upgrade, we (and by "we," I mean mostly Bruce) came up with the
following.

What needs improvement:

- pg_upgrade forces a down time event, no matter how cleverly it's done.
- pg_upgrade is very much a blocker for on-disk format changes.

The proposal:

- Add a new script--possibly Perl or Bash, which would:
    - Initdb a new cluster with the new version of PostgreSQL and a
      different port.
    - Start logical replication from the old version to the new
      version.
    - Poll until replication lag falls below a pre-determined default threshold, then:
      * Issue an ALTER SYSTEM on the new server to change its port to the old server's
      * Issue a pg_ctl stop -w to the old server
      * Issue a pg_ctl restart on the new server
      * Happiness!

Assumptions underlying it:

- Disk and similar resources are cheap enough for most users that
  doubling up during the upgrade is feasible.
- The default upgrade path should require exactly one step.
- Errors do not, by and large, have the capacity to violate an SLA.

The proposal has blockers:

- We don't actually have logical decoding for DDL, although I'm given
  to understand that Álvaro Herrera has done some yeoman follow-up
  work on Dimitri Fontaine's PoC patches.

Yep, some DDL support would be key. Lots of apps expect to do DDL online (automatic migrations, etc), and after all it's been one of Pg's selling points for a long time. Not having it means people have to do much more prep - and *will* ignore that prep and break things if you expose it as an easy-to-use tool.

You will find dealing with DDL that does full table rewrites to be a challenge. Especially ALTER TABLEs that add a DEFAULT using a non-IMMUTABLE expression and ALTER TYPE ... USING with a non-IMMUTABLE expression. But even for the immutable cases some work is needed to let us cleanly replicate the DDL.

Some DDL may initially need to be disallowed because it plays poorly with logical decoding, e.g. CREATE INDEX CONCURRENTLY. It's fixable, it just requires special case handling where the apply side understands that specific statement and knows how to recover if we have an error partway through applying it.

We (2ndQuadrant) routinely do online migrations like this using pglogical, which is a much more capable tool than in-core logical replication currently is. But it still requires careful preparation and application qualification, and preferably a trial run. Improvements in schema change / DDL handling would be needed to address that.

Logical decoding doesn't support a number of postgres features, which still rules its use out for some customers. No sequence support (though pglogical works around that with periodic sequence sync). No pg_largeobject support. Probably more I'm forgetting.

Long txns, and txns that change huge numbers of rows, especially if they also dirty the catalogs (even temp tables!), are a challenge. Performance isn't great for them, but more importantly we don't start decoding them until they commit, so the replication latency for them can be very large. Care is required to time a cutover so you don't end up stopping writes to the old db and then waiting ages before the new db is caught up.

 
- We don't have logical decoding for DCL (GRANT/REVOKE)

In general there are issues with any command affecting the cluster as a whole: capturing them, ensuring replay order between them if run in different dbs, etc.

The simplest option would be to disallow them during migration (make them ERROR). Or possibly restrict them to a single DB where we permit them and add them to a DDL replication queue.

If you let them run in any db and capture them wherever they ran you run into order-of-apply issues because the logical change streams aren't synchronised between the DBs.
 

We also came up with and, we believe, addressed an important issue,
namely how to ensure continuity.  When we issue a `pg_ctl stop -w`,
that's short for "Cancel current commands and stop cleanly."  At this
point, the new server will not have WAL to replay, so a pg_ctl restart
will load the new configuration and come up pretty much immediately,
and the next try will find a brand new server without a down time
event.

You'll need to be able to make sure the old server stays up after the last user commit for long enough to flush all pending WAL to the new server. There may be committed changes that the new server hasn't applied yet.

If the old server crashes during the grace period you'll have to restart it and retry, since you don't know for sure if the new server got all the changes.

You'll want some kind of read-only mode where you can ensure that exactly one server is writeable by user queries at a time. And to prevent conflicts there'll be some period where both are read-only during the final catchup phase; you have to make the old server read-only, wait until the new server applies pending changes, then make the new server read/write. Some care about when that's done will minimise the write-downtime. And users who're willing to accept conflicts can always force read/write earlier on the new server. But only if they're not using SEQUENCEs.

The cut-over period is needed to make sure all sequences are fully synced over too.

And if 2PC is in use, the cutover must be done during a period where there are no prepared txns on the old master, since we don't currently decode and send txns until COMMIT PREPARED time. So we'd lose prepared-but-not-committed txns if we cut over while they existed.
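
The final read-only catchup phase described above could be driven with stock SQL against the old server; a rough sketch, where 'upgrade_sub' is a hypothetical slot name (and note this doesn't hard-block writes, since sessions may still SET transaction_read_only = off):

```sql
-- 1. Make the old server default to read-only; a reload suffices.
ALTER SYSTEM SET default_transaction_read_only = on;
SELECT pg_reload_conf();

-- 2. Poll until this returns true: the subscriber's slot has confirmed
--    all WAL up to the old server's current insert position.
--    ('upgrade_sub' is a placeholder slot name.)
SELECT confirmed_flush_lsn >= pg_current_wal_lsn() AS caught_up
  FROM pg_replication_slots
 WHERE slot_name = 'upgrade_sub';
```

Only once caught_up is true would the new server be switched to read/write.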

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Better Upgrades

From
Peter Eisentraut
Date:
On 2/5/18 19:09, David Fetter wrote:
> - Add a new script--possibly Perl or Bash, which would:
>     - Initdb a new cluster with the new version of PostgreSQL and a
>       different port.

This will need integration with the packaging system.  You'll want to
carry over settings from the old instance.  You might want to put the
new instance on a different host.

>     - Start logical replication from the old version to the new
>       version.

There is a step missing that does the DDL sync.  And various features
are not handled by logical replication.  So you'll need a pre-check mode
like pg_upgrade.

Also, you need to do this for each database, so you'll need to decide
whether you'll do it all in parallel or sequentially, where you will
continue when it fails part way through, etc.

>     - Poll until replication lag falls below a pre-determined default threshold, then:

Probably the replication lag should be zero, or perhaps you'll even want
to switch to synchronous replication.  Or perhaps you'll switch the
writing node while replication is still catching up.  A lot of that
depends on the application.

>       * Issue an ALTER SYSTEM on the new server to change its port to the old server's

Or you use a connection proxy and have that handle redirecting the
traffic, so you don't need to restart anything.

>       * Issue a pg_ctl stop -w to the old server
>       * Issue a pg_ctl restart on the new server

You can't use pg_ctl when using systemd.

> Does this seem worth coding up in its current form?

Logically, this should be a ten line script.  But I fear making it
actually work in a variety of practical scenarios will probably require
dozens of options and end up very complicated.

At this point, it might be worth more to actually try this procedure by
hand first and work out the details, e.g., how do you do the DDL sync,
how to you convert the configuration files, etc.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Better Upgrades

From
"Joshua D. Drake"
Date:
On 02/05/2018 04:09 PM, David Fetter wrote:
> Does this seem worth coding up in its current form?

No. The pg_upgrade utility is awesome and I have commended Bruce on 
multiple occasions about his work with it. That being said, the
"solution" is to support in-place upgrades and our work should be toward 
that. The idea that we can support an upgrade of the catalog plus
backward and forward support for changes to the page files (upgrading
the page files as they are accessed/written to) is a much longer-term,
more mature and reasonable approach to this problem[1].

JD

1. Props to Theo for bringing this up about a decade ago.


> Best,
> David.


-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
*****     Unless otherwise stated, opinions are my own.   *****



Re: Better Upgrades

From
Craig Ringer
Date:
On 6 February 2018 at 09:51, Joshua D. Drake <jd@commandprompt.com> wrote:
On 02/05/2018 04:09 PM, David Fetter wrote:
Does this seem worth coding up in its current form?

No. The pg_upgrade utility is awesome and I have commended Bruce on multiple occasions about his work with it. That being said, the
"solution" is to support in-place upgrades and our work should be toward that.

Yeah. Streaming upgrade is useful, but IMO it's a workaround for upgrade issues more than a solution in itself. Useful for people who want a conservative upgrade, but not that big a win. Sure you'd like to be able to downgrade again if it doesn't go right, but that requires a two-way sync, which introduces its own problems and failure modes.

Support for reading prior version catalogs in-place is what I see as the user-friendly end goal. Just start Pg12 on top of a pg11 datadir with the --allow-upgrade flag and you're done.
 
But I don't think I'm any keener to do the drudgery required to implement it than anyone else is... and I share others' concerns about the maintenance burden imposed, impact on future catalog change freedom, etc.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Better Upgrades

From
Andreas Joseph Krogh
Date:
On Tuesday, 6 February 2018 at 01:09:18, David Fetter <david@fetter.org> wrote:
Folks,

While chatting with Bruce about how to make something better than
pg_upgrade, we (and by "we," I mean mostly Bruce) came up with the
following.

What needs improvement:

- pg_upgrade forces a down time event, no matter how cleverly it's done.
- pg_upgrade is very much a blocker for on-disk format changes.

The proposal:

- Add a new script--possibly Perl or Bash, which would:
    - Initdb a new cluster with the new version of PostgreSQL and a
      different port.
    - Start logical replication from the old version to the new
      version.
    - Poll until replication lag falls below a pre-determined default threshold, then:
      * Issue an ALTER SYSTEM on the new server to change its port to the old server's
      * Issue a pg_ctl stop -w to the old server
      * Issue a pg_ctl restart on the new server
      * Happiness!

Assumptions underlying it:

- Disk and similar resources are cheap enough for most users that
  doubling up during the upgrade is feasible.
- The default upgrade path should require exactly one step.
- Errors do not, by and large, have the capacity to violate an SLA.

The proposal has blockers:

- We don't actually have logical decoding for DDL, although I'm given
  to understand that Álvaro Herrera has done some yeoman follow-up
  work on Dimitri Fontaine's PoC patches.
- We don't have logical decoding for DCL (GRANT/REVOKE)
 
- Need to add support for decoding large objects.
 
--
Andreas Joseph Krogh

Re: Better Upgrades

From
Daniel Gustafsson
Date:
> On 06 Feb 2018, at 01:09, David Fetter <david@fetter.org> wrote:

> - pg_upgrade is very much a blocker for on-disk format changes.

I wouldn’t call it a blocker, but pg_upgrade across an on-disk format change
would be a very different experience from what we have today since it would
need to read and rewrite data rather than hardlink/copy.  Definitely not a
trivial change though, that I completely agree with.

> The proposal:
>
> - Add a new script--possibly Perl or Bash, which would:
>    - Initdb a new cluster with the new version of PostgreSQL and a
>      different port.
>    - Start logical replication from the old version to the new
>      version.
>    - Poll until replication lag falls below a pre-determined default threshold, then:
>      * Issue an ALTER SYSTEM on the new server to change its port to the old server's
>      * Issue a pg_ctl stop -w to the old server
>      * Issue a pg_ctl restart on the new server
>      * Happiness!

Considering how many people will want to build frontends for upgrade tools, or
include them in automation frameworks, I think it would be wise to make any new
upgrade tool capable of emitting machine-parseable status information (or any
equivalent means of “what is it doing right now” interrogation).  That’s not
specific to this tool of course, more a general observation regarding upgrade
tools and their usage.

cheers ./daniel

Re: Better Upgrades

From
Bruce Momjian
Date:
On Tue, Feb  6, 2018 at 01:51:09PM +0100, Daniel Gustafsson wrote:
> > On 06 Feb 2018, at 01:09, David Fetter <david@fetter.org> wrote:
> 
> > - pg_upgrade is very much a blocker for on-disk format changes.
> 
> I wouldn’t call it a blocker, but pg_upgrade across an on-disk format change
> would be a very different experience from what we have today since it would
> need to read and rewrite data rather than hardlink/copy.  Definitely not a
> trivial change though, that I completely agree with.

Uh, not necessarily.  To allow for on-disk format changes, pg_upgrade
_could_ rewrite the data files as it copies them (not link), or we could
modify the backend to be able to read the old format.  We have already
done that for some changes to data and index types.

If we did add more backward-read capability to the backend, I think we
would need to add some housekeeping so we could know when the
read-old-format code could be removed.

I have to admit that the number of hurdles needed to use logical
replication to reduce downtime made me feel like it is cleaner to just
take the multi-minute downtime using link mode and have it be reliable. 
Even if you can get logical replication to work, the drain/switch over
time could be significant.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Better Upgrades

From
Greg Stark
Date:
On 6 February 2018 at 03:13, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 6 February 2018 at 09:51, Joshua D. Drake <jd@commandprompt.com> wrote:
>>
>> On 02/05/2018 04:09 PM, David Fetter wrote:
>>
>>> Does this seem worth coding up in its current form?
>>
>> No. The pg_upgrade utility is awesome and I have commended Bruce on
>> multiple occasions about his work with it. That being said, the
>> "solution" is to support in-place upgrades and our work should be toward
>> that.
>
> Yeah. Streaming upgrade is useful, but IMO it's a workaround for upgrade
> issues more than a solution in its self. Useful for people who want a
> conservative upgrade, but not that big a win. Sure you'd like to be able to
> downgrade again if it doesn't go right, but that requires a two-way sync,
> which introduces its own problems and failure modes.

My feeling is that worrying about in-place binary upgrades today is
wasted effort. Already the window for installations where this is
useful is narrow -- you have to be big enough that the resources for
deploying a second instance are significant but not so big that the
downtime and risk are untenable. I have the feeling that in-place
binary upgrades are going to end up sapping developer time and imposing
awkward constraints on the system for a legacy feature users are
recommended to never use anyway.

Running distributed systems with replication used to be a niche "HA"
solution. Today it's the norm to have configuration management systems
that automatically provision and deploy however many replicas you want
and set up repmgr/consul/whatever to do automatic failovers. In that
environment creating a couple new logical replicas using the new
version and doing a failover becomes an easy choice.


-- 
greg


Re: Better Upgrades

From
Daniel Gustafsson
Date:
> On 02 Mar 2018, at 12:59, Greg Stark <stark@mit.edu> wrote:

> My feeling is that worrying about in-place binary upgrades today is
> wasted effort. Already the window for installations where this is
> useful is narrow -- you have to be big enough that the resources for
> deploying a second instance are significant but not so big that the
> downtime and risk are untenable.

I might be colorblind from $dayjob, but I don’t think that these installations
(data warehouses et al.) are that uncommon.  They are also installations that
risk staying on an old version due to upgrades being non-trivial (not saying
that in-place is trivial, just that there are places where it may make sense).

> I have the feeling that in-place
> binary upgrades are going to end up sapping developer time

Having worked on supporting the 8.2->8.3 on-disk format change in pg_upgrade
for GPDB, I am not arguing against that.  Not at all.

cheers ./daniel

Re: Better Upgrades

From
Daniel Gustafsson
Date:
> On 02 Mar 2018, at 01:03, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Tue, Feb  6, 2018 at 01:51:09PM +0100, Daniel Gustafsson wrote:
>>> On 06 Feb 2018, at 01:09, David Fetter <david@fetter.org> wrote:
>>
>>> - pg_upgrade is very much a blocker for on-disk format changes.
>>
>> I wouldn’t call it a blocker, but pg_upgrade across an on-disk format change
>> would be a very different experience from what we have today since it would
>> need to read and rewrite data rather than hardlink/copy.  Definitely not a
>> trivial change though, that I completely agree with.
>
> Uh, not necessarily.  To allow for on-disk format changes, pg_upgrade
> _could_ rewrite the data files as it copies them (not link), or we could
> modify the backend to be able to read the old format.  We have already
> done that for some changes to data and index types.

Right, that is another option.  I guess we’ll have to wait and see what the
impact will be for the available options when we get there; until there is an
actual on-disk change to reason about, it’s a fairly academic discussion.

cheers ./daniel

Re: Better Upgrades

From
David Fetter
Date:
On Mon, Mar 05, 2018 at 11:18:20AM +0100, Daniel Gustafsson wrote:
> > On 02 Mar 2018, at 12:59, Greg Stark <stark@mit.edu> wrote:
> 
> > My feeling is that worrying about in-place binary upgrades today
> > is wasted effort. Already the window for installations where this
> > is useful is narrow -- you have to be big enough that the
> > resources for deploying a second instance are significant but not
> > so big that the downtime and risk are untenable.
> 
> I might be colorblind from $dayjob,

I agree.

> but I don’t think that these installations (data warehouses et al.)
> are that uncommon.

Data warehouses are by no means rare. They also need backups, just as
any other system does, and that means at the very least duplicate
storage on at least one separate node, even if that node can't
actually bring the warehouse back up in the case of a catastrophic
failure of the original warehouse.

> They are also installations that risk staying on an old version due
> to upgrades being non-trivial (not saying that in-place is trivial,
> just that there are places where it may make sense).

I see in-place upgrades as the riskiest of the possible options, and
that's not just in the case of PostgreSQL.  Every other system with
that feature has had catastrophic failures that were impossible to
predict in advance.  That reality turns on the fundamentals of
constructing such systems, to wit:

- They take enormous amounts of deep knowledge to get right.
- The people who have that knowledge are not inclined to spend it on
  what amounts to an invisible feature.
- They are perforce done as the last thing before a release, often
  blocking other features and holding up the release process as a
  consequence.
- They can never really be tested for corner cases--see above for one
  of the reasons.
- There can be no realistic back-out plan when something goes wrong.

> > I have the feeling that in-place binary upgrades are going to end
> > up sapping developer time
> 
> Having worked on supporting the 8.2->8.3 on-disk format change in
> pg_upgrade for GPDB, I am not arguing against that.  Not at all.

Your experience reflects one of the fundamental problems with those
systems. It wasn't a one-off.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate